API Reference
Complete reference for the Llamactl REST API.
Base URL
All API endpoints are relative to the base URL:
http://localhost:8080/api/v1
Authentication
Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:
curl -H "Authorization: Bearer <your-api-key>" \
http://localhost:8080/api/v1/instances
The server supports two types of API keys:
- Management API Keys: Required for instance management operations (CRUD operations on instances)
- Inference API Keys: Required for OpenAI-compatible inference endpoints
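For example, a request to an OpenAI-compatible endpoint uses an inference API key instead (the key value below is a placeholder):
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-inference-api-key>" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "Hello"}]}'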
System Endpoints
Get Llamactl Version
Get the version information of the llamactl server.
GET /api/v1/version
Response:
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
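Example (assuming management authentication is enabled; the key is a placeholder):
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/version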
Get Llama Server Help
Get help text for the llama-server command.
GET /api/v1/server/help
Response: Plain text help output from llama-server --help
Get Llama Server Version
Get version information of the llama-server binary.
GET /api/v1/server/version
Response: Plain text version output from llama-server --version
List Available Devices
List available devices for llama-server.
GET /api/v1/server/devices
Response: Plain text device list from llama-server --list-devices
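Example (a minimal sketch querying the device list; the key is a placeholder):
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/server/devices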
Instances
List All Instances
Get a list of all instances.
GET /api/v1/instances
Response:
[
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
]
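Example (the key is a placeholder):
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances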
Get Instance Details
Get detailed information about a specific instance.
GET /api/v1/instances/{name}
Response:
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
Create Instance
Create and start a new instance.
POST /api/v1/instances/{name}
Request Body: JSON object with instance configuration. Common fields include:
- backend_type: Backend type (llama_cpp, mlx_lm, or vllm)
- backend_options: Backend-specific configuration
- auto_restart: Enable automatic restart on failure
- max_restarts: Maximum restart attempts
- restart_delay: Delay between restarts in seconds
- on_demand_start: Start instance when receiving requests
- idle_timeout: Idle timeout in minutes
- environment: Environment variables as key-value pairs
See Managing Instances for complete configuration options.
Response:
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
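Example (a sketch; the instance name, model path, and option values are illustrative):
curl -X POST http://localhost:8080/api/v1/instances/llama2-7b \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf"
    },
    "on_demand_start": true,
    "idle_timeout": 30
  }'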
Update Instance
Update an existing instance configuration. See Managing Instances for available configuration options.
PUT /api/v1/instances/{name}
Request Body: JSON object with configuration fields to update.
Response:
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
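Example (a sketch updating only the restart behavior; values are illustrative):
curl -X PUT http://localhost:8080/api/v1/instances/llama2-7b \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"auto_restart": true, "max_restarts": 3}'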
Delete Instance
Stop and remove an instance.
DELETE /api/v1/instances/{name}
Response: 204 No Content
Instance Operations
Start Instance
Start a stopped instance.
POST /api/v1/instances/{name}/start
Response:
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
Error Responses:
- 409 Conflict: Maximum number of running instances reached
- 500 Internal Server Error: Failed to start instance
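Example (the instance name and key are placeholders):
curl -X POST -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances/llama2-7b/start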
Stop Instance
Stop a running instance.
POST /api/v1/instances/{name}/stop
Response:
{
"name": "llama2-7b",
"status": "stopped",
"created": 1705312200
}
Restart Instance
Restart an instance (stop then start).
POST /api/v1/instances/{name}/restart
Response:
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
Get Instance Logs
Retrieve instance logs.
GET /api/v1/instances/{name}/logs
Query Parameters:
- lines: Number of lines to return (default: all lines, use -1 for all)
Response: Plain text log output
Example:
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
Proxy to Instance
Proxy HTTP requests directly to the llama-server instance.
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the /api/v1/instances/{name}/proxy prefix and forwards the remaining path to the instance.
Example - Check Instance Health:
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model/proxy/health
This forwards the request to http://instance-host:instance-port/health on the actual llama-server instance.
Error Responses:
503 Service Unavailable: Instance is not running
OpenAI-Compatible API
Llamactl provides OpenAI-compatible endpoints for inference operations.
List Models
List all instances in OpenAI-compatible format.
GET /v1/models
Response:
{
"object": "list",
"data": [
{
"id": "llama2-7b",
"object": "model",
"created": 1705312200,
"owned_by": "llamactl"
}
]
}
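Example (uses an inference API key if inference authentication is enabled; the key is a placeholder):
curl -H "Authorization: Bearer <your-inference-api-key>" \
  http://localhost:8080/v1/models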
Chat Completions, Completions, Embeddings, Reranking
All OpenAI-compatible inference endpoints are available:
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
Request Body: Standard OpenAI format, with the model field specifying the instance name
Example:
{
"model": "llama2-7b",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
]
}
The server routes requests to the appropriate instance based on the model field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see Managing Instances.
Error Responses:
- 400 Bad Request: Invalid request body or missing instance name
- 503 Service Unavailable: Instance is not running and on-demand start is disabled
- 409 Conflict: Cannot start instance due to maximum instances limit
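A request to the embeddings endpoint follows the same pattern. Whether an instance can serve embeddings depends on its backend and model, so treat this as a sketch (the instance name is illustrative):
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-inference-api-key>" \
  -d '{
    "model": "llama2-7b",
    "input": "Hello, how are you?"
  }'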
Instance Status Values
Instances can have the following status values:
- stopped: Instance is not running
- running: Instance is running and ready to accept requests
- failed: Instance failed to start or crashed
Error Responses
All endpoints may return error responses in the following format:
{
"error": "Error message description"
}
Common HTTP Status Codes
- 200: Success
- 201: Created
- 204: No Content (successful deletion)
- 400: Bad Request (invalid parameters or request body)
- 401: Unauthorized (missing or invalid API key)
- 403: Forbidden (insufficient permissions)
- 404: Not Found (instance not found)
- 409: Conflict (instance already exists, max instances reached)
- 500: Internal Server Error
- 503: Service Unavailable (instance not running)
Examples
Complete Instance Lifecycle
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/models/llama-2-7b.gguf",
"gpu_layers": 32
},
"environment": {
"CUDA_VISIBLE_DEVICES": "0",
"OMP_NUM_THREADS": "8"
}
}'
# Check instance status
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model
# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
"http://localhost:8080/api/v1/instances/my-model/logs?lines=50"
# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-inference-api-key" \
-d '{
"model": "my-model",
"messages": [
{"role": "user", "content": "Hello!"}
],
"max_tokens": 100
}'
# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model/stop
# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model
Using the Proxy Endpoint
You can also directly proxy requests to the llama-server instance:
# Direct proxy to instance (bypasses OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"prompt": "Hello, world!",
"n_predict": 50
}'
Backend-Specific Endpoints
Parse Commands
Llamactl provides endpoints to parse command strings from different backends into instance configuration options.
Parse Llama.cpp Command
Parse a llama-server command string into instance options.
POST /api/v1/backends/llama-cpp/parse-command
Request Body:
{
"command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"
}
Response:
{
"backend_type": "llama_cpp",
"llama_server_options": {
"model": "/path/to/model.gguf",
"ctx_size": 2048,
"port": 8080
}
}
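Example (a sketch; the command string is illustrative):
curl -X POST http://localhost:8080/api/v1/backends/llama-cpp/parse-command \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"command": "llama-server -m /path/to/model.gguf -c 2048 --port 8080"}'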
Parse MLX-LM Command
Parse an MLX-LM server command string into instance options.
POST /api/v1/backends/mlx/parse-command
Request Body:
{
"command": "mlx_lm.server --model /path/to/model --port 8080"
}
Response:
{
"backend_type": "mlx_lm",
"mlx_server_options": {
"model": "/path/to/model",
"port": 8080
}
}
Parse vLLM Command
Parse a vLLM serve command string into instance options.
POST /api/v1/backends/vllm/parse-command
Request Body:
{
"command": "vllm serve /path/to/model --port 8080"
}
Response:
{
"backend_type": "vllm",
"vllm_server_options": {
"model": "/path/to/model",
"port": 8080
}
}
Error Responses for Parse Commands:
- 400 Bad Request: Invalid request body, empty command, or parse error
- 500 Internal Server Error: Encoding error
Auto-Generated Documentation
The API documentation is automatically generated from code annotations using Swagger/OpenAPI. To regenerate the documentation:
- Install the swag tool:
  go install github.com/swaggo/swag/cmd/swag@latest
- Generate docs:
  swag init -g cmd/server/main.go -o apidocs
Swagger Documentation
If Swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:
http://localhost:8080/swagger/
This provides a complete interactive interface for testing all API endpoints.