# API Reference
Complete reference for the Llamactl REST API.
## Base URL
All API endpoints are relative to the base URL:
```
http://localhost:8080/api/v1
```
## Authentication
Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:
```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances
```
The server supports two types of API keys:
- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
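
For example, assuming both key types are configured, a management key authorizes the instance management endpoints while an inference key authorizes the OpenAI-compatible endpoints (the key values below are placeholders):
```bash
# Management key: instance management operations
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances

# Inference key: OpenAI-compatible inference endpoints
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <inference-api-key>" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hi"}]}'
```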
## System Endpoints
### Get Llamactl Version
Get the version information of the llamactl server.
```http
GET /api/v1/version
```
**Response:**
```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```
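**Example** (add the `Authorization` header if authentication is enabled):
```bash
curl http://localhost:8080/api/v1/version
```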
### Get Llama Server Help
Get help text for the llama-server command.
```http
GET /api/v1/server/help
```
**Response:** Plain text help output from `llama-server --help`
### Get Llama Server Version
Get version information of the llama-server binary.
```http
GET /api/v1/server/version
```
**Response:** Plain text version output from `llama-server --version`
### List Available Devices
List available devices for llama-server.
```http
GET /api/v1/server/devices
```
**Response:** Plain text device list from `llama-server --list-devices`
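
All three server information endpoints are queried the same way; for example (the API key is a placeholder, assuming authentication is enabled):
```bash
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/server/devices
```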
## Instances
### List All Instances
Get a list of all instances.
```http
GET /api/v1/instances
```
**Response:**
```json
[
  {
    "name": "llama2-7b",
    "status": "running",
    "created": 1705312200
  }
]
```
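To pull just the instance names out of this response, you can pipe the output through `jq` (assuming it is installed; the API key is a placeholder):
```bash
curl -s -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances | jq -r '.[].name'
```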
### Get Instance Details
Get detailed information about a specific instance.
```http
GET /api/v1/instances/{name}
```
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
### Create Instance
Create and start a new instance.
```http
POST /api/v1/instances/{name}
```
**Request Body:** JSON object with instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
### Update Instance
Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
```http
PUT /api/v1/instances/{name}
```
**Request Body:** JSON object with configuration fields to update.
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```
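**Example** (the `model` field is illustrative; see [Managing Instances](managing-instances.md) for the full set of options):
```bash
curl -X PUT http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model": "/models/llama-2-7b-chat.gguf"}'
```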
### Delete Instance
Stop and remove an instance.
```http
DELETE /api/v1/instances/{name}
```
**Response:** `204 No Content`
## Instance Operations
### Start Instance
Start a stopped instance.
```http
POST /api/v1/instances/{name}/start
```
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "starting",
  "created": 1705312200
}
```
**Error Responses:**
- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance
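
**Example:**
```bash
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/start
```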
### Stop Instance
Stop a running instance.
```http
POST /api/v1/instances/{name}/stop
```
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "stopping",
  "created": 1705312200
}
```
### Restart Instance
Restart an instance (stop then start).
```http
POST /api/v1/instances/{name}/restart
```
**Response:**
```json
{
  "name": "llama2-7b",
  "status": "restarting",
  "created": 1705312200
}
```
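**Example:**
```bash
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/restart
```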
### Get Instance Logs
Retrieve instance logs.
```http
GET /api/v1/instances/{name}/logs
```
**Query Parameters:**
- `lines`: Number of log lines to return; omit the parameter or pass `-1` to return all lines
**Response:** Plain text log output
**Example:**
```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```
### Proxy to Instance
Proxy HTTP requests directly to the llama-server instance.
```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```
This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.
**Example - Check Instance Health:**
```bash
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/proxy/health
```
This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.
**Error Responses:**
- `503 Service Unavailable`: Instance is not running
## OpenAI-Compatible API
Llamactl provides OpenAI-compatible endpoints for inference operations.
### List Models
List all instances in OpenAI-compatible format.
```http
GET /v1/models
```
**Response:**
```json
{
  "object": "list",
  "data": [
    {
      "id": "llama2-7b",
      "object": "model",
      "created": 1705312200,
      "owned_by": "llamactl"
    }
  ]
}
```
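**Example** (the inference API key is a placeholder):
```bash
curl -H "Authorization: Bearer your-inference-api-key" \
  http://localhost:8080/v1/models
```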
### Chat Completions, Completions, Embeddings
All OpenAI-compatible inference endpoints are available:
```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```
**Request Body:** Standard OpenAI format with `model` field specifying the instance name
**Example:**
```json
{
  "model": "llama2-7b",
  "messages": [
    {
      "role": "user",
      "content": "Hello, how are you?"
    }
  ]
}
```
The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).
**Error Responses:**
- `400 Bad Request`: Invalid request body or missing model name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit
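
Because these endpoints follow the standard OpenAI format, streaming should also work by adding `"stream": true` to the request body. A sketch, assuming the underlying llama-server build supports streaming (`-N` disables curl's output buffering so tokens appear as they arrive):
```bash
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": true
  }'
```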
## Instance Status Values
Instances can have the following status values:
- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed

Transitional values such as `starting`, `stopping`, and `restarting` may also appear briefly in responses while an operation is in progress, as shown in the instance operation examples above.
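
For example, to list only the instances that are currently running (assuming `jq` is installed; the API key is a placeholder):
```bash
curl -s -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances \
  | jq -r '.[] | select(.status == "running") | .name'
```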
## Error Responses
All endpoints may return error responses in the following format:
```json
{
  "error": "Error message description"
}
```
### Common HTTP Status Codes
- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)
## Examples
### Complete Instance Lifecycle
```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "/models/llama-2-7b.gguf"
  }'

# Check instance status
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model

# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/my-model/logs?lines=50"

# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/stop

# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```
### Using the Proxy Endpoint
You can also directly proxy requests to the llama-server instance:
```bash
# Direct proxy to instance (bypasses OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "prompt": "Hello, world!",
    "n_predict": 50
  }'
```
## Swagger Documentation
If Swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:
```
http://localhost:8080/swagger/
```
This provides a complete interactive interface for testing all API endpoints.