API Reference
Complete reference for the Llamactl REST API.
Base URL
All API endpoints are relative to the base URL:
http://localhost:8080/api
Authentication
If authentication is enabled, include the JWT token in the Authorization header:
curl -H "Authorization: Bearer <your-jwt-token>" \
http://localhost:8080/api/instances
Instances
List All Instances
Get a list of all instances.
GET /api/instances
Response:
{
"instances": [
{
"name": "llama2-7b",
"status": "running",
"model_path": "/models/llama-2-7b.gguf",
"port": 8081,
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T12:45:00Z"
}
]
}
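The response can be consumed with any HTTP client; as a minimal sketch, the documented response shape can be filtered client-side for running instances (the payload below embeds the sample above, plus a second hypothetical stopped instance for illustration):

```python
import json

# Sample payload matching the documented /api/instances response shape.
# "mistral-7b" is a hypothetical second instance added for illustration.
sample = json.loads("""
{
  "instances": [
    {"name": "llama2-7b", "status": "running", "port": 8081},
    {"name": "mistral-7b", "status": "stopped", "port": 8082}
  ]
}
""")

# Collect the names of instances currently running.
running = [i["name"] for i in sample["instances"] if i["status"] == "running"]
```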
Get Instance Details
Get detailed information about a specific instance.
GET /api/instances/{name}
Response:
{
"name": "llama2-7b",
"status": "running",
"model_path": "/models/llama-2-7b.gguf",
"port": 8081,
"pid": 12345,
"options": {
"threads": 4,
"context_size": 2048,
"gpu_layers": 0
},
"stats": {
"memory_usage": 4294967296,
"cpu_usage": 25.5,
"uptime": 3600
},
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T12:45:00Z"
}
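The stats fields are raw units: memory_usage is bytes, cpu_usage a percentage, uptime seconds. A small helper to render them human-readable (the formatting is an illustration, not part of the API):

```python
def format_stats(stats: dict) -> str:
    gib = stats["memory_usage"] / (1024 ** 3)    # bytes -> GiB
    hours, rem = divmod(stats["uptime"], 3600)   # seconds -> hours + remainder
    minutes = rem // 60
    return f"{gib:.1f} GiB RAM, {stats['cpu_usage']:.1f}% CPU, up {hours}h{minutes:02d}m"

# Using the sample stats from the response above:
summary = format_stats({"memory_usage": 4294967296, "cpu_usage": 25.5, "uptime": 3600})
# -> "4.0 GiB RAM, 25.5% CPU, up 1h00m"
```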
Create Instance
Create a new instance.
POST /api/instances
Request Body:
{
"name": "my-instance",
"model_path": "/path/to/model.gguf",
"port": 8081,
"options": {
"threads": 4,
"context_size": 2048,
"gpu_layers": 0
}
}
Response:
{
"message": "Instance created successfully",
"instance": {
"name": "my-instance",
"status": "stopped",
"model_path": "/path/to/model.gguf",
"port": 8081,
"created_at": "2024-01-15T14:30:00Z"
}
}
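Before POSTing, the request body can be assembled and sanity-checked client-side. A hedged sketch, with field names taken from the request body above; the validation rules are client-side assumptions, not documented server behavior:

```python
import json

def build_create_request(name: str, model_path: str, port: int, **options) -> str:
    # Basic client-side checks; the server performs its own validation.
    if not name:
        raise ValueError("instance name is required")
    if not 1 <= port <= 65535:
        raise ValueError("port out of range")
    body = {"name": name, "model_path": model_path, "port": port}
    if options:
        body["options"] = options
    return json.dumps(body)

payload = build_create_request("my-instance", "/path/to/model.gguf", 8081, threads=4)
```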
Update Instance
Update an existing instance configuration.
PUT /api/instances/{name}
Request Body:
{
"options": {
"threads": 8,
"context_size": 4096
}
}
Delete Instance
Delete an instance (must be stopped first).
DELETE /api/instances/{name}
Response:
{
"message": "Instance deleted successfully"
}
Instance Operations
Start Instance
Start a stopped instance.
POST /api/instances/{name}/start
Response:
{
"message": "Instance start initiated",
"status": "starting"
}
Stop Instance
Stop a running instance.
POST /api/instances/{name}/stop
Request Body (Optional):
{
"force": false,
"timeout": 30
}
Response:
{
"message": "Instance stop initiated",
"status": "stopping"
}
Restart Instance
Restart an instance (stop then start).
POST /api/instances/{name}/restart
Get Instance Health
Check instance health status.
GET /api/instances/{name}/health
Response:
{
"status": "healthy",
"checks": {
"process": "running",
"port": "open",
"response": "ok"
},
"last_check": "2024-01-15T14:30:00Z"
}
Get Instance Logs
Retrieve instance logs.
GET /api/instances/{name}/logs
Query Parameters:
- lines: Number of lines to return (default: 100)
- follow: Stream logs (boolean)
- level: Filter by log level (debug, info, warn, error)
Response:
{
"logs": [
{
"timestamp": "2024-01-15T14:30:00Z",
"level": "info",
"message": "Model loaded successfully"
}
]
}
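Query parameters are appended to the endpoint in the usual way; a sketch building the logs URL with Python's standard library:

```python
from urllib.parse import urlencode

base = "http://localhost:8080/api/instances/llama2-7b/logs"
params = {"lines": 50, "level": "error"}  # see query parameters above
url = f"{base}?{urlencode(params)}"
# -> http://localhost:8080/api/instances/llama2-7b/logs?lines=50&level=error
```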
Batch Operations
Start All Instances
Start all stopped instances.
POST /api/instances/start-all
Stop All Instances
Stop all running instances.
POST /api/instances/stop-all
System Information
Get System Status
Get overall system status and metrics.
GET /api/system/status
Response:
{
"version": "1.0.0",
"uptime": 86400,
"instances": {
"total": 5,
"running": 3,
"stopped": 2
},
"resources": {
"cpu_usage": 45.2,
"memory_usage": 8589934592,
"memory_total": 17179869184,
"disk_usage": 75.5
}
}
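Both memory_usage and memory_total are bytes, so memory utilisation follows directly; a worked example using the sample values above:

```python
# Sample values from the /api/system/status response above.
status = {"memory_usage": 8589934592, "memory_total": 17179869184}

# 8589934592 / 17179869184 = 0.5, i.e. 50% of RAM in use.
mem_pct = 100 * status["memory_usage"] / status["memory_total"]
```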
Get System Information
Get detailed system information.
GET /api/system/info
Response:
{
"hostname": "server-01",
"os": "linux",
"arch": "amd64",
"cpu_count": 8,
"memory_total": 17179869184,
"version": "1.0.0",
"build_time": "2024-01-15T10:00:00Z"
}
Configuration
Get Configuration
Get current Llamactl configuration.
GET /api/config
Update Configuration
Update Llamactl configuration (requires restart).
PUT /api/config
Authentication
Login
Authenticate and receive a JWT token.
POST /api/auth/login
Request Body:
{
"username": "admin",
"password": "password"
}
Response:
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_at": "2024-01-16T14:30:00Z"
}
Refresh Token
Refresh an existing JWT token.
POST /api/auth/refresh
Error Responses
All endpoints may return error responses in the following format:
{
"error": "Error message",
"code": "ERROR_CODE",
"details": "Additional error details"
}
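A client can branch on the machine-readable code field rather than the human-readable message. A minimal sketch against the documented error shape; the specific code string below is hypothetical:

```python
import json

def describe_error(body: str) -> str:
    err = json.loads(body)
    # "code" is the stable machine-readable field; "error" is for humans.
    return f"[{err['code']}] {err['error']}: {err.get('details', '')}"

# "NOT_FOUND" is an illustrative code, not one documented by Llamactl.
msg = describe_error(
    '{"error": "Instance not found", "code": "NOT_FOUND", '
    '"details": "no instance named foo"}'
)
```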
Common HTTP Status Codes
- 200: Success
- 201: Created
- 400: Bad Request
- 401: Unauthorized
- 403: Forbidden
- 404: Not Found
- 409: Conflict (e.g., instance already exists)
- 500: Internal Server Error
WebSocket API
Real-time Updates
Connect to WebSocket for real-time updates:
const ws = new WebSocket('ws://localhost:8080/api/ws');
ws.onmessage = function(event) {
const data = JSON.parse(event.data);
console.log('Update:', data);
};
Message Types:
- instance_status_changed: Instance status updates
- instance_stats_updated: Resource usage updates
- system_alert: System-level alerts
Rate Limiting
API requests are rate limited to:
- 100 requests per minute for regular endpoints
- 10 requests per minute for resource-intensive operations
Rate limit headers are included in responses:
- X-RateLimit-Limit: Request limit
- X-RateLimit-Remaining: Remaining requests
- X-RateLimit-Reset: Reset time (Unix timestamp)
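When X-RateLimit-Remaining reaches zero, a client can sleep until the reset timestamp. A sketch of the arithmetic; the header values below are illustrative:

```python
def seconds_until_reset(headers: dict, now: int) -> int:
    # X-RateLimit-Reset is a Unix timestamp; never wait a negative amount.
    if int(headers["X-RateLimit-Remaining"]) > 0:
        return 0
    return max(0, int(headers["X-RateLimit-Reset"]) - now)

# Illustrative values: limit exhausted, reset 30 seconds from "now".
wait = seconds_until_reset(
    {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1705330200"},
    now=1705330170,
)
# 1705330200 - 1705330170 = 30 seconds
```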
SDKs and Libraries
Go Client
import "github.com/lordmathis/llamactl-go-client"
client := llamactl.NewClient("http://localhost:8080")
instances, err := client.ListInstances()
Python Client
from llamactl import Client
client = Client("http://localhost:8080")
instances = client.list_instances()
Examples
Complete Instance Lifecycle
# Create instance
curl -X POST http://localhost:8080/api/instances \
-H "Content-Type: application/json" \
-d '{
"name": "example",
"model_path": "/models/example.gguf",
"port": 8081
}'
# Start instance
curl -X POST http://localhost:8080/api/instances/example/start
# Check status
curl http://localhost:8080/api/instances/example
# Stop instance
curl -X POST http://localhost:8080/api/instances/example/stop
# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example