Update api-reference

2025-08-31 16:21:18 +02:00
parent 81a6c14bf6
commit 131b1b407d


Complete reference for the Llamactl REST API.

All API endpoints are relative to the base URL:

```
http://localhost:8080/api/v1
```

## Authentication

Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances
```

The server supports two types of API keys:
- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
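
As a rough sketch, the two key types map onto the two URL prefixes. The key values below are placeholders for whatever your server configuration defines:

```bash
# Management key: instance management endpoints under /api/v1
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances

# Inference key: OpenAI-compatible endpoints under /v1
curl -H "Authorization: Bearer your-inference-api-key" \
  http://localhost:8080/v1/models
```
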
## System Endpoints
### Get Llamactl Version
Get the version information of the llamactl server.
```http
GET /api/v1/version
```
**Response:**
```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```
### Get Llama Server Help
Get help text for the llama-server command.
```http
GET /api/v1/server/help
```
**Response:** Plain text help output from `llama-server --help`
### Get Llama Server Version
Get version information of the llama-server binary.
```http
GET /api/v1/server/version
```
**Response:** Plain text version output from `llama-server --version`
### List Available Devices
List available devices for llama-server.
```http
GET /api/v1/server/devices
```
**Response:** Plain text device list from `llama-server --list-devices`
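
For example, the system endpoints can be queried with plain `curl` (the Authorization header is only needed if authentication is enabled):

```bash
# Llamactl version
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/version

# Devices visible to llama-server
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/server/devices
```
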
## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/v1/instances
```

**Response:**
```json
[
  {
    "name": "llama2-7b",
    "status": "running",
    "created": 1705312200
  }
]
```
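To pull just the instance names out of that array, a small `jq` filter works (assuming `jq` is installed locally):

```bash
curl -s -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances | jq -r '.[].name'
```
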
### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/v1/instances/{name}
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
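For example, to read only the status field of a single instance (again assuming `jq`):

```bash
curl -s -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/llama2-7b | jq -r '.status'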
### Create Instance

Create and start a new instance.

```http
POST /api/v1/instances/{name}
```

**Request Body:** JSON object with instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
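A minimal creation request might look like the following; the `model` field mirrors the lifecycle example later on this page, and any other fields depend on your backend configuration (see Managing Instances):

```bash
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "/models/llama-2-7b.gguf"}'
```
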
### Update Instance

Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.

```http
PUT /api/v1/instances/{name}
```

**Request Body:** JSON object with configuration fields to update.

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
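An update uses the same shape, sending only the fields to change; the field below is illustrative and assumes your configuration exposes a `model` option:

```bash
curl -X PUT http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "/models/llama-2-13b.gguf"}'
```
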
### Delete Instance

Stop and remove an instance.

```http
DELETE /api/v1/instances/{name}
```

**Response:** `204 No Content`
## Instance Operations

### Start Instance
Start a stopped instance.

```http
POST /api/v1/instances/{name}/start
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "starting",
  "created": 1705312200
}
```
**Error Responses:**
- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance
### Stop Instance

Stop a running instance.

```http
POST /api/v1/instances/{name}/stop
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "stopping",
  "created": 1705312200
}
```
### Restart Instance

Restart an instance (stop then start).

```http
POST /api/v1/instances/{name}/restart
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "restarting",
  "created": 1705312200
}
```
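Because start and restart return a transitional status (`starting`, `restarting`), a script typically polls the instance until it reports `running`. A minimal sketch, assuming `jq` is available:

```bash
curl -s -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/start

# Poll until the instance leaves the transitional state (max ~60s)
for i in $(seq 1 30); do
  status=$(curl -s -H "Authorization: Bearer your-api-key" \
    http://localhost:8080/api/v1/instances/my-model | jq -r '.status')
  [ "$status" = "running" ] && break
  sleep 2
done
echo "final status: $status"
```
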
### Get Instance Logs

Retrieve instance logs.

```http
GET /api/v1/instances/{name}/logs
```

**Query Parameters:**

- `lines`: Number of lines to return (default: all lines, use -1 for all)

**Response:** Plain text log output
**Example:**
```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```
### Proxy to Instance
Proxy HTTP requests directly to the llama-server instance.
```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```
This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.
**Example - Check Instance Health:**
```bash
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model/proxy/health
```
This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.
**Error Responses:**
- `503 Service Unavailable`: Instance is not running
## OpenAI-Compatible API
Llamactl provides OpenAI-compatible endpoints for inference operations.
### List Models
List all instances in OpenAI-compatible format.
```http
GET /v1/models
```
**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama2-7b",
      "object": "model",
      "created": 1705312200,
      "owned_by": "llamactl"
    }
  ]
}
```
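Listing the models visible to an OpenAI client is a single request (the inference key is only needed if authentication is enabled):

```bash
curl -H "Authorization: Bearer your-inference-api-key" \
  http://localhost:8080/v1/models
```
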
### Chat Completions, Completions, Embeddings

All OpenAI-compatible inference endpoints are available:

```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```

**Request Body:** Standard OpenAI format with `model` field specifying the instance name

**Example:**
```json
{
  "model": "llama2-7b",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```
The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).

**Error Responses:**

- `400 Bad Request`: Invalid request body or missing model name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit

## Instance Status Values

Instances can have the following status values:

- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed
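
For instance, a simple watchdog could restart any instance that reports `failed` (a sketch assuming `jq` and a management API key):

```bash
name="my-model"
status=$(curl -s -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/$name" | jq -r '.status')

if [ "$status" = "failed" ]; then
  curl -s -X POST -H "Authorization: Bearer your-api-key" \
    "http://localhost:8080/api/v1/instances/$name/restart"
fi
```
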
## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message description"
}
```
Common HTTP status codes:

- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)
## Examples

### Complete Instance Lifecycle

```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "/models/llama-2-7b.gguf"
  }'

# Check instance status
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model

# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/my-model/logs?lines=50"

# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/stop

# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```
### Using the Proxy Endpoint
You can also directly proxy requests to the llama-server instance:
```bash
# Direct proxy to instance (bypasses OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"prompt": "Hello, world!",
"n_predict": 50
}'
```
## Swagger Documentation
If swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:
```
http://localhost:8080/swagger/
```
This provides a complete interactive interface for testing all API endpoints.