Auto generate mkdocs api reference from swagger
```diff
@@ -22,6 +22,9 @@ var buildTime string = "unknown"
 // @license.name MIT License
 // @license.url https://opensource.org/license/mit/
 // @basePath /api/v1
+// @securityDefinitions.apikey ApiKeyAuth
+// @in header
+// @name X-API-Key
 func main() {
 
     // --version flag to print the version
```
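The three added annotations declare an API-key security scheme for the generated spec: clients authorize by sending the key in an `X-API-Key` header. A sketch of a request authorized under that scheme (the key value is a placeholder):

```bash
# ApiKeyAuth as declared above: the key travels in the X-API-Key header
curl -H "X-API-Key: <your-api-key>" \
  http://localhost:8080/api/v1/instances
```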
```diff
@@ -1,5 +1,6 @@
-mkdocs-material==9.5.3
-mkdocs==1.5.3
-pymdown-extensions==10.7
-mkdocs-git-revision-date-localized-plugin==1.2.4
-mike==2.0.0
+mkdocs-material==9.6.22
+mkdocs==1.6.1
+pymdown-extensions==10.16.1
+mkdocs-git-revision-date-localized-plugin==1.4.7
+mike==2.1.3
+neoteroi-mkdocs==1.1.3
```
```diff
@@ -1,560 +1 @@
-# API Reference
+[OAD(swagger.yaml)]
```

# API Reference
Complete reference for the Llamactl REST API.

## Base URL

All API endpoints are relative to the base URL:

```
http://localhost:8080/api/v1
```

## Authentication

Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances
```

The server supports two types of API keys:
- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
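With both kinds configured, a management call and an inference call would look roughly like this (key values are placeholders):

```bash
# Management key: instance CRUD under /api/v1
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances

# Inference key: OpenAI-compatible endpoints under /v1
curl -H "Authorization: Bearer <inference-api-key>" \
  http://localhost:8080/v1/models
```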
## System Endpoints

### Get Llamactl Version

Get the version information of the llamactl server.

```http
GET /api/v1/version
```

**Response:**
```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```

### Get Llama Server Help

Get help text for the llama-server command.

```http
GET /api/v1/server/help
```

**Response:** Plain text help output from `llama-server --help`

### Get Llama Server Version

Get version information of the llama-server binary.

```http
GET /api/v1/server/version
```

**Response:** Plain text version output from `llama-server --version`

### List Available Devices

List available devices for llama-server.

```http
GET /api/v1/server/devices
```

**Response:** Plain text device list from `llama-server --list-devices`
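All of the system endpoints are simple GETs; for example (assuming authentication is enabled):

```bash
# Build info of llamactl itself, then the devices seen by the llama-server binary
curl -H "Authorization: Bearer <api-key>" http://localhost:8080/api/v1/version
curl -H "Authorization: Bearer <api-key>" http://localhost:8080/api/v1/server/devices
```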
## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/v1/instances
```

**Response:**
```json
[
  {
    "name": "llama2-7b",
    "status": "running",
    "created": 1705312200
  }
]
```

### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/v1/instances/{name}
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Create Instance

Create and start a new instance.

```http
POST /api/v1/instances/{name}
```

**Request Body:** JSON object with instance configuration. Common fields include:

- `backend_type`: Backend type (`llama_cpp`, `mlx_lm`, or `vllm`)
- `backend_options`: Backend-specific configuration
- `auto_restart`: Enable automatic restart on failure
- `max_restarts`: Maximum restart attempts
- `restart_delay`: Delay between restarts in seconds
- `on_demand_start`: Start instance when receiving requests
- `idle_timeout`: Idle timeout in minutes
- `environment`: Environment variables as key-value pairs
- `nodes`: Array with single node name to deploy the instance to (for remote deployments)

See [Managing Instances](managing-instances.md) for complete configuration options.
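For example, a minimal create request could look like this (model path and key are placeholders):

```bash
# Create an instance named my-model backed by llama.cpp
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {"model": "/models/llama-2-7b.gguf"}
  }'
```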
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Update Instance

Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.

```http
PUT /api/v1/instances/{name}
```

**Request Body:** JSON object with configuration fields to update.
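For example, a sketch of enabling automatic restarts on an existing instance:

```bash
# Send only the fields to change
curl -X PUT http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{"auto_restart": true, "max_restarts": 3}'
```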
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Delete Instance

Stop and remove an instance.

```http
DELETE /api/v1/instances/{name}
```

**Response:** `204 No Content`

## Instance Operations

### Start Instance

Start a stopped instance.

```http
POST /api/v1/instances/{name}/start
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

**Error Responses:**
- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance

### Stop Instance

Stop a running instance.

```http
POST /api/v1/instances/{name}/stop
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "stopped",
  "created": 1705312200
}
```

### Restart Instance

Restart an instance (stop then start).

```http
POST /api/v1/instances/{name}/restart
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
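All three lifecycle operations are plain POSTs, e.g.:

```bash
# Stop and start again in one call
curl -X POST -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances/my-model/restart
```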
### Get Instance Logs

Retrieve instance logs.

```http
GET /api/v1/instances/{name}/logs
```

**Query Parameters:**
- `lines`: Number of lines to return (default: all lines; `-1` also returns all)

**Response:** Plain text log output

**Example:**
```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```

### Proxy to Instance

Proxy HTTP requests directly to the llama-server instance.

```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```

This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.

**Example - Check Instance Health:**
```bash
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/proxy/health
```

This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.

**Error Responses:**
- `503 Service Unavailable`: Instance is not running

## OpenAI-Compatible API

Llamactl provides OpenAI-compatible endpoints for inference operations.

### List Models

List all instances in OpenAI-compatible format.

```http
GET /v1/models
```

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama2-7b",
      "object": "model",
      "created": 1705312200,
      "owned_by": "llamactl"
    }
  ]
}
```
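For example:

```bash
# Each llamactl instance is listed as a model
curl -H "Authorization: Bearer <inference-api-key>" \
  http://localhost:8080/v1/models
```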
### Chat Completions, Completions, Embeddings

All OpenAI-compatible inference endpoints are available:

```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```

**Request Body:** Standard OpenAI format with `model` field specifying the instance name

**Example:**
```json
{
  "model": "llama2-7b",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```

The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).
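The same request as a curl call (key value is a placeholder):

```bash
# "model" selects the instance; with on_demand_start enabled it may be started first
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <inference-api-key>" \
  -d '{
    "model": "llama2-7b",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```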
**Error Responses:**
- `400 Bad Request`: Invalid request body or missing instance name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit

## Instance Status Values

Instances can have the following status values:
- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed

## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message description"
}
```

### Common HTTP Status Codes

- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)
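To see the status code alongside the error body, something like:

```bash
# -i prints the HTTP status line and headers with the response
curl -i -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances/does-not-exist
```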
## Examples

### Complete Instance Lifecycle

```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf",
      "gpu_layers": 32
    },
    "environment": {
      "CUDA_VISIBLE_DEVICES": "0",
      "OMP_NUM_THREADS": "8"
    }
  }'

# Check instance status
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model

# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/my-model/logs?lines=50"

# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/stop

# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```

### Remote Node Instance Example

```bash
# Create instance on a specific remote node
curl -X POST http://localhost:8080/api/v1/instances/remote-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf",
      "gpu_layers": 32
    },
    "nodes": ["worker1"]
  }'

# Check status of the remote instance
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/remote-model

# Use the remote instance with the OpenAI-compatible API
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "remote-model",
    "messages": [
      {"role": "user", "content": "Hello from remote node!"}
    ]
  }'
```

### Using the Proxy Endpoint

You can also proxy requests directly to the llama-server instance:

```bash
# Direct proxy to instance (bypasses the OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "prompt": "Hello, world!",
    "n_predict": 50
  }'
```

## Backend-Specific Endpoints

### Parse Commands

Llamactl provides endpoints to parse command strings from different backends into instance configuration options.

#### Parse Llama.cpp Command

Parse a llama-server command string into instance options.

```http
POST /api/v1/backends/llama-cpp/parse-command
```

**Request Body:**
```json
{
  "command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "llama_cpp",
  "llama_server_options": {
    "model": "/path/to/model.gguf",
    "ctx_size": 2048,
    "port": 8080
  }
}
```
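Called with curl, this could look like:

```bash
# Turn an existing llama-server command line into instance options
curl -X POST http://localhost:8080/api/v1/backends/llama-cpp/parse-command \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{"command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"}'
```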
#### Parse MLX-LM Command

Parse an MLX-LM server command string into instance options.

```http
POST /api/v1/backends/mlx/parse-command
```

**Request Body:**
```json
{
  "command": "mlx_lm.server --model /path/to/model --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "mlx_lm",
  "mlx_server_options": {
    "model": "/path/to/model",
    "port": 8080
  }
}
```

#### Parse vLLM Command

Parse a vLLM serve command string into instance options.

```http
POST /api/v1/backends/vllm/parse-command
```

**Request Body:**
```json
{
  "command": "vllm serve /path/to/model --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "vllm",
  "vllm_server_options": {
    "model": "/path/to/model",
    "port": 8080
  }
}
```

**Error Responses for Parse Commands:**
- `400 Bad Request`: Invalid request body, empty command, or parse error
- `500 Internal Server Error`: Encoding error

## Auto-Generated Documentation

The API documentation is automatically generated from code annotations using Swagger/OpenAPI. To regenerate the documentation:

1. Install the swag tool: `go install github.com/swaggo/swag/cmd/swag@latest`
2. Generate docs: `swag init -g cmd/server/main.go -o apidocs`
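As a shell session (swag typically writes `docs.go`, `swagger.json`, and `swagger.yaml` into the output directory):

```bash
# Install the generator, then rebuild the spec from the annotations in main.go
go install github.com/swaggo/swag/cmd/swag@latest
swag init -g cmd/server/main.go -o apidocs
```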
## Swagger Documentation

If swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:

```
http://localhost:8080/swagger/
```

This provides a complete interactive interface for testing all API endpoints.
```diff
@@ -64,6 +64,7 @@ plugins:
       css_dir: css
       javascript_dir: js
       canonical_version: null
+  - neoteroi.mkdocsoad
 
 hooks:
   - docs/readme_sync.py
```
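With the plugin registered, a local docs build would be roughly the following (assuming the pinned requirements shown above live at `docs/requirements.txt`; adjust the path as needed):

```bash
# Install the pinned documentation dependencies, then build the site
pip install -r docs/requirements.txt
mkdocs build
```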