
API Reference

Complete reference for the Llamactl REST API.

Base URL

All API endpoints are relative to the base URL:

http://localhost:8080/api

Authentication

If authentication is enabled, include the JWT token in the Authorization header:

curl -H "Authorization: Bearer <your-jwt-token>" \
  http://localhost:8080/api/instances

Instances

List All Instances

Get a list of all instances.

GET /api/instances

Response:

{
  "instances": [
    {
      "name": "llama2-7b",
      "status": "running",
      "model_path": "/models/llama-2-7b.gguf",
      "port": 8081,
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T12:45:00Z"
    }
  ]
}

Get Instance Details

Get detailed information about a specific instance.

GET /api/instances/{name}

Response:

{
  "name": "llama2-7b",
  "status": "running",
  "model_path": "/models/llama-2-7b.gguf",
  "port": 8081,
  "pid": 12345,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  },
  "stats": {
    "memory_usage": 4294967296,
    "cpu_usage": 25.5,
    "uptime": 3600
  },
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T12:45:00Z"
}

Create Instance

Create a new instance.

POST /api/instances

Request Body:

{
  "name": "my-instance",
  "model_path": "/path/to/model.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  }
}

Response:

{
  "message": "Instance created successfully",
  "instance": {
    "name": "my-instance",
    "status": "stopped",
    "model_path": "/path/to/model.gguf",
    "port": 8081,
    "created_at": "2024-01-15T14:30:00Z"
  }
}

Update Instance

Update an existing instance configuration.

PUT /api/instances/{name}

Request Body:

{
  "options": {
    "threads": 8,
    "context_size": 4096
  }
}
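
For example, this update can be sent with curl (the instance name my-instance is illustrative):

curl -X PUT http://localhost:8080/api/instances/my-instance \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'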

Delete Instance

Delete an instance (must be stopped first).

DELETE /api/instances/{name}

Response:

{
  "message": "Instance deleted successfully"
}

Instance Operations

Start Instance

Start a stopped instance.

POST /api/instances/{name}/start

Response:

{
  "message": "Instance start initiated",
  "status": "starting"
}

Stop Instance

Stop a running instance.

POST /api/instances/{name}/stop

Request Body (Optional):

{
  "force": false,
  "timeout": 30
}

Response:

{
  "message": "Instance stop initiated",
  "status": "stopping"
}
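
For example, a stop request with the optional body can be sent with curl (the instance name and values are illustrative):

curl -X POST http://localhost:8080/api/instances/my-instance/stop \
  -H "Content-Type: application/json" \
  -d '{"force": true, "timeout": 10}'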

Restart Instance

Restart an instance (stop then start).

POST /api/instances/{name}/restart

Get Instance Health

Check instance health status.

GET /api/instances/{name}/health

Response:

{
  "status": "healthy",
  "checks": {
    "process": "running",
    "port": "open",
    "response": "ok"
  },
  "last_check": "2024-01-15T14:30:00Z"
}

Get Instance Logs

Retrieve instance logs.

GET /api/instances/{name}/logs

Query Parameters:

  • lines: Number of lines to return (default: 100)
  • follow: Stream logs (boolean)
  • level: Filter by log level (debug, info, warn, error)

Response:

{
  "logs": [
    {
      "timestamp": "2024-01-15T14:30:00Z",
      "level": "info",
      "message": "Model loaded successfully"
    }
  ]
}
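
For example, to fetch the last 50 error-level log lines (the instance name and parameter values are illustrative):

curl "http://localhost:8080/api/instances/my-instance/logs?lines=50&level=error"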

Batch Operations

Start All Instances

Start all stopped instances.

POST /api/instances/start-all

Stop All Instances

Stop all running instances.

POST /api/instances/stop-all
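
Both batch endpoints are invoked with a plain POST, for example:

curl -X POST http://localhost:8080/api/instances/start-all
curl -X POST http://localhost:8080/api/instances/stop-all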

System Information

Get System Status

Get overall system status and metrics.

GET /api/system/status

Response:

{
  "version": "1.0.0",
  "uptime": 86400,
  "instances": {
    "total": 5,
    "running": 3,
    "stopped": 2
  },
  "resources": {
    "cpu_usage": 45.2,
    "memory_usage": 8589934592,
    "memory_total": 17179869184,
    "disk_usage": 75.5
  }
}

Get System Information

Get detailed system information.

GET /api/system/info

Response:

{
  "hostname": "server-01",
  "os": "linux",
  "arch": "amd64",
  "cpu_count": 8,
  "memory_total": 17179869184,
  "version": "1.0.0",
  "build_time": "2024-01-15T10:00:00Z"
}

Configuration

Get Configuration

Get current Llamactl configuration.

GET /api/config

Update Configuration

Update the Llamactl configuration. A restart of Llamactl is required for changes to take effect.

PUT /api/config
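
For example, the current configuration can be read with a simple GET (add the Authorization header if authentication is enabled); the exact shape of the returned JSON depends on your Llamactl setup:

curl http://localhost:8080/api/config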

Authentication

Login

Authenticate and receive a JWT token.

POST /api/auth/login

Request Body:

{
  "username": "admin",
  "password": "password"
}

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2024-01-16T14:30:00Z"
}
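
For example, using the request body shown above:

curl -X POST http://localhost:8080/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}'

The returned token is then passed in the Authorization header of subsequent requests.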

Refresh Token

Refresh an existing JWT token.

POST /api/auth/refresh
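
A minimal sketch, assuming the current token is sent in the Authorization header (the exact refresh mechanism may differ in your deployment):

curl -X POST http://localhost:8080/api/auth/refresh \
  -H "Authorization: Bearer <your-jwt-token>"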

Error Responses

All endpoints may return error responses in the following format:

{
  "error": "Error message",
  "code": "ERROR_CODE",
  "details": "Additional error details"
}

Common HTTP Status Codes

  • 200: Success
  • 201: Created
  • 400: Bad Request
  • 401: Unauthorized
  • 403: Forbidden
  • 404: Not Found
  • 409: Conflict (e.g., instance already exists)
  • 500: Internal Server Error

WebSocket API

Real-time Updates

Connect to WebSocket for real-time updates:

const ws = new WebSocket('ws://localhost:8080/api/ws');

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Update:', data);
};

Message Types:

  • instance_status_changed: Instance status updates
  • instance_stats_updated: Resource usage updates
  • system_alert: System-level alerts

Rate Limiting

API requests are rate limited to:

  • 100 requests per minute for regular endpoints
  • 10 requests per minute for resource-intensive operations

Rate limit headers are included in responses:

  • X-RateLimit-Limit: Request limit
  • X-RateLimit-Remaining: Remaining requests
  • X-RateLimit-Reset: Reset time (Unix timestamp)
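
These headers can be inspected by asking curl to print response headers, for example:

curl -i http://localhost:8080/api/instances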

SDKs and Libraries

Go Client

import "github.com/lordmathis/llamactl-go-client"

client := llamactl.NewClient("http://localhost:8080")
instances, err := client.ListInstances()

Python Client

from llamactl import Client

client = Client("http://localhost:8080")
instances = client.list_instances()

Examples

Complete Instance Lifecycle

# Create instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example",
    "model_path": "/models/example.gguf",
    "port": 8081
  }'

# Start instance
curl -X POST http://localhost:8080/api/instances/example/start

# Check status
curl http://localhost:8080/api/instances/example

# Stop instance
curl -X POST http://localhost:8080/api/instances/example/stop

# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example