Auto generate mkdocs api reference from swagger
```diff
@@ -22,6 +22,9 @@ var buildTime string = "unknown"
 // @license.name MIT License
 // @license.url https://opensource.org/license/mit/
 // @basePath /api/v1
+// @securityDefinitions.apikey ApiKeyAuth
+// @in header
+// @name X-API-Key
 func main() {
 
     // --version flag to print the version
```
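The three added annotations declare an API-key security scheme for the generated spec: clients authorize by sending the key in an `X-API-Key` header. A sketch of a request authorized under that scheme (the key value is a placeholder):

```bash
# ApiKeyAuth as declared above: the key travels in the X-API-Key header
curl -H "X-API-Key: <your-api-key>" \
  http://localhost:8080/api/v1/instances
```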
```diff
@@ -1,5 +1,6 @@
-mkdocs-material==9.5.3
-mkdocs==1.5.3
-pymdown-extensions==10.7
-mkdocs-git-revision-date-localized-plugin==1.2.4
-mike==2.0.0
+mkdocs-material==9.6.22
+mkdocs==1.6.1
+pymdown-extensions==10.16.1
+mkdocs-git-revision-date-localized-plugin==1.4.7
+mike==2.1.3
+neoteroi-mkdocs==1.1.3
```
```diff
@@ -1,560 +1 @@
-# API Reference
+[OAD(swagger.yaml)]
```

# API Reference
Complete reference for the Llamactl REST API.

## Base URL

All API endpoints are relative to the base URL:

```
http://localhost:8080/api/v1
```

## Authentication

Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances
```

The server supports two types of API keys:
- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
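With both kinds configured, a management call and an inference call would look roughly like this (key values are placeholders):

```bash
# Management key: instance CRUD under /api/v1
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances

# Inference key: OpenAI-compatible endpoints under /v1
curl -H "Authorization: Bearer <inference-api-key>" \
  http://localhost:8080/v1/models
```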
## System Endpoints

### Get Llamactl Version

Get the version information of the llamactl server.

```http
GET /api/v1/version
```

**Response:**
```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```

### Get Llama Server Help

Get help text for the llama-server command.

```http
GET /api/v1/server/help
```

**Response:** Plain text help output from `llama-server --help`

### Get Llama Server Version

Get version information of the llama-server binary.

```http
GET /api/v1/server/version
```

**Response:** Plain text version output from `llama-server --version`

### List Available Devices

List available devices for llama-server.

```http
GET /api/v1/server/devices
```

**Response:** Plain text device list from `llama-server --list-devices`
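All of the system endpoints are simple GETs; for example (assuming authentication is enabled):

```bash
# Build info of llamactl itself, then the devices seen by the llama-server binary
curl -H "Authorization: Bearer <api-key>" http://localhost:8080/api/v1/version
curl -H "Authorization: Bearer <api-key>" http://localhost:8080/api/v1/server/devices
```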
## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/v1/instances
```

**Response:**
```json
[
  {
    "name": "llama2-7b",
    "status": "running",
    "created": 1705312200
  }
]
```

### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/v1/instances/{name}
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Create Instance

Create and start a new instance.

```http
POST /api/v1/instances/{name}
```

**Request Body:** JSON object with instance configuration. Common fields include:

- `backend_type`: Backend type (`llama_cpp`, `mlx_lm`, or `vllm`)
- `backend_options`: Backend-specific configuration
- `auto_restart`: Enable automatic restart on failure
- `max_restarts`: Maximum restart attempts
- `restart_delay`: Delay between restarts in seconds
- `on_demand_start`: Start instance when receiving requests
- `idle_timeout`: Idle timeout in minutes
- `environment`: Environment variables as key-value pairs
- `nodes`: Array with single node name to deploy the instance to (for remote deployments)

See [Managing Instances](managing-instances.md) for complete configuration options.
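For example, a minimal create request could look like this (model path and key are placeholders):

```bash
# Create an instance named my-model backed by llama.cpp
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {"model": "/models/llama-2-7b.gguf"}
  }'
```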
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Update Instance

Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.

```http
PUT /api/v1/instances/{name}
```

**Request Body:** JSON object with configuration fields to update.
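For example, a sketch of enabling automatic restarts on an existing instance:

```bash
# Send only the fields to change
curl -X PUT http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{"auto_restart": true, "max_restarts": 3}'
```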
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Delete Instance

Stop and remove an instance.

```http
DELETE /api/v1/instances/{name}
```

**Response:** `204 No Content`

## Instance Operations

### Start Instance

Start a stopped instance.

```http
POST /api/v1/instances/{name}/start
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

**Error Responses:**
- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance

### Stop Instance

Stop a running instance.

```http
POST /api/v1/instances/{name}/stop
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "stopped",
  "created": 1705312200
}
```

### Restart Instance

Restart an instance (stop then start).

```http
POST /api/v1/instances/{name}/restart
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
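All three lifecycle operations are plain POSTs, e.g.:

```bash
# Stop and start again in one call
curl -X POST -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances/my-model/restart
```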
### Get Instance Logs

Retrieve instance logs.

```http
GET /api/v1/instances/{name}/logs
```

**Query Parameters:**
- `lines`: Number of lines to return (default: all lines; `-1` also returns all)

**Response:** Plain text log output

**Example:**
```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```

### Proxy to Instance

Proxy HTTP requests directly to the llama-server instance.

```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```

This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.

**Example - Check Instance Health:**
```bash
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/proxy/health
```

This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.

**Error Responses:**
- `503 Service Unavailable`: Instance is not running

## OpenAI-Compatible API

Llamactl provides OpenAI-compatible endpoints for inference operations.

### List Models

List all instances in OpenAI-compatible format.

```http
GET /v1/models
```

**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama2-7b",
      "object": "model",
      "created": 1705312200,
      "owned_by": "llamactl"
    }
  ]
}
```
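For example:

```bash
# Each llamactl instance is listed as a model
curl -H "Authorization: Bearer <inference-api-key>" \
  http://localhost:8080/v1/models
```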
### Chat Completions, Completions, Embeddings

All OpenAI-compatible inference endpoints are available:

```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```

**Request Body:** Standard OpenAI format with `model` field specifying the instance name

**Example:**
```json
{
  "model": "llama2-7b",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```

The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).
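The same request as a curl call (key value is a placeholder):

```bash
# "model" selects the instance; with on_demand_start enabled it may be started first
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <inference-api-key>" \
  -d '{
    "model": "llama2-7b",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```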
**Error Responses:**
- `400 Bad Request`: Invalid request body or missing instance name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit

## Instance Status Values

Instances can have the following status values:
- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed

## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message description"
}
```

### Common HTTP Status Codes

- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)
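To see the status code alongside the error body, something like:

```bash
# -i prints the HTTP status line and headers with the response
curl -i -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances/does-not-exist
```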
## Examples

### Complete Instance Lifecycle

```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf",
      "gpu_layers": 32
    },
    "environment": {
      "CUDA_VISIBLE_DEVICES": "0",
      "OMP_NUM_THREADS": "8"
    }
  }'

# Check instance status
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model

# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/my-model/logs?lines=50"

# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/stop

# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```

### Remote Node Instance Example

```bash
# Create instance on a specific remote node
curl -X POST http://localhost:8080/api/v1/instances/remote-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf",
      "gpu_layers": 32
    },
    "nodes": ["worker1"]
  }'

# Check status of the remote instance
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/remote-model

# Use the remote instance with the OpenAI-compatible API
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "remote-model",
    "messages": [
      {"role": "user", "content": "Hello from remote node!"}
    ]
  }'
```

### Using the Proxy Endpoint

You can also proxy requests directly to the llama-server instance:

```bash
# Direct proxy to instance (bypasses the OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "prompt": "Hello, world!",
    "n_predict": 50
  }'
```

## Backend-Specific Endpoints

### Parse Commands

Llamactl provides endpoints to parse command strings from different backends into instance configuration options.

#### Parse Llama.cpp Command

Parse a llama-server command string into instance options.

```http
POST /api/v1/backends/llama-cpp/parse-command
```

**Request Body:**
```json
{
  "command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "llama_cpp",
  "llama_server_options": {
    "model": "/path/to/model.gguf",
    "ctx_size": 2048,
    "port": 8080
  }
}
```
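Called with curl, this could look like:

```bash
# Turn an existing llama-server command line into instance options
curl -X POST http://localhost:8080/api/v1/backends/llama-cpp/parse-command \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <management-api-key>" \
  -d '{"command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"}'
```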
#### Parse MLX-LM Command

Parse an MLX-LM server command string into instance options.

```http
POST /api/v1/backends/mlx/parse-command
```

**Request Body:**
```json
{
  "command": "mlx_lm.server --model /path/to/model --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "mlx_lm",
  "mlx_server_options": {
    "model": "/path/to/model",
    "port": 8080
  }
}
```

#### Parse vLLM Command

Parse a vLLM serve command string into instance options.

```http
POST /api/v1/backends/vllm/parse-command
```

**Request Body:**
```json
{
  "command": "vllm serve /path/to/model --port 8080"
}
```

**Response:**
```json
{
  "backend_type": "vllm",
  "vllm_server_options": {
    "model": "/path/to/model",
    "port": 8080
  }
}
```

**Error Responses for Parse Commands:**
- `400 Bad Request`: Invalid request body, empty command, or parse error
- `500 Internal Server Error`: Encoding error

## Auto-Generated Documentation

The API documentation is automatically generated from code annotations using Swagger/OpenAPI. To regenerate the documentation:

1. Install the swag tool: `go install github.com/swaggo/swag/cmd/swag@latest`
2. Generate docs: `swag init -g cmd/server/main.go -o apidocs`
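As a shell session (swag typically writes `docs.go`, `swagger.json`, and `swagger.yaml` into the output directory):

```bash
# Install the generator, then rebuild the spec from the annotations in main.go
go install github.com/swaggo/swag/cmd/swag@latest
swag init -g cmd/server/main.go -o apidocs
```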
## Swagger Documentation

If swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:

```
http://localhost:8080/swagger/
```

This provides a complete interactive interface for testing all API endpoints.
```diff
@@ -64,6 +64,7 @@ plugins:
       css_dir: css
       javascript_dir: js
       canonical_version: null
+  - neoteroi.mkdocsoad
 
 hooks:
   - docs/readme_sync.py
```
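With the plugin registered, a local docs build would be roughly the following (assuming the pinned requirements shown above live at `docs/requirements.txt`; adjust the path as needed):

```bash
# Install the pinned documentation dependencies, then build the site
pip install -r docs/requirements.txt
mkdocs build
```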