llamactl

A control server for managing multiple Llama Server instances with a web-based dashboard.

Features

  • Multi-instance Management: Create, start, stop, restart, and delete multiple llama-server instances
  • Web Dashboard: Modern React-based UI for managing instances
  • Auto-restart: Configurable automatic restart on instance failure
  • Instance Monitoring: Real-time health checks and status monitoring
  • Log Management: View, search, and download instance logs
  • Data Persistence: Persistent storage of instance state
  • REST API: Full API for programmatic control
  • OpenAI Compatible: Route requests to instances by instance name
  • Configuration Management: Comprehensive llama-server parameter support
  • System Information: View llama-server version, devices, and help

Prerequisites

This project requires llama-server from llama.cpp to be installed and available in your PATH.

Install llama.cpp: Follow the installation instructions at https://github.com/ggml-org/llama.cpp
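
To confirm the prerequisite is met, check that the binary is discoverable (a quick sanity check; the exact output depends on your llama.cpp build):

# Verify llama-server is available in your PATH
command -v llama-server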

Installation

Download Prebuilt Binaries

The easiest way to install llamactl is to download a prebuilt binary from the releases page.

Linux/macOS:

# Download the latest release for your platform (example uses the linux-amd64 archive; adjust for your OS and architecture)
curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep tag_name | cut -d '"' -f 4)-linux-amd64.tar.gz | tar -xz

# Move to PATH
sudo mv llamactl /usr/local/bin/

# Run the server
llamactl

Manual Download:

  1. Go to the releases page
  2. Download the appropriate archive for your platform
  3. Extract the archive and move the binary to a directory in your PATH
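
For example, assuming a linux-amd64 archive (substitute the file you actually downloaded):

# Extract the archive and install the binary
tar -xzf llamactl-<version>-linux-amd64.tar.gz
sudo mv llamactl /usr/local/bin/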

Build from Source

If you prefer to build from source or need the latest development version:

Build Requirements

  • Go 1.24 or later
  • Node.js 22 or later (for building the web UI)

Building with Web UI

# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Install Node.js dependencies
cd webui
npm ci

# Build the web UI
npm run build

# Return to project root and build
cd ..
go build -o llamactl ./cmd/server

# Run the server
./llamactl

Configuration

llamactl can be configured via configuration files or environment variables. Configuration is loaded from the following sources, with each later source overriding the previous one:

  1. Hardcoded defaults
  2. Configuration file
  3. Environment variables
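
For example, a port set in the config file is overridden by the corresponding environment variable (values are illustrative):

# config file sets port: 8080, but the environment variable takes precedence
LLAMACTL_PORT=9090 ./llamactl   # server listens on 9090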

Configuration Files

Configuration File Locations

Configuration files are searched in the following locations (in order of precedence):

Linux/macOS:

  • ./llamactl.yaml or ./config.yaml (current directory)
  • $HOME/.config/llamactl/config.yaml
  • /etc/llamactl/config.yaml

Windows:

  • ./llamactl.yaml or ./config.yaml (current directory)
  • %APPDATA%\llamactl\config.yaml
  • %USERPROFILE%\llamactl\config.yaml
  • %PROGRAMDATA%\llamactl\config.yaml

You can also specify the config file path explicitly with the LLAMACTL_CONFIG_PATH environment variable.

API Key Authentication

llamactl supports API key authentication for both the management and the inference (OpenAI-compatible) endpoints. Separate keys are used for the two APIs: management keys grant full access, while inference keys only grant access to the OpenAI-compatible endpoints.

How to Use:

  • Pass your API key in requests using one of:
    • Authorization: Bearer <key> header
    • X-API-Key: <key> header
    • api_key=<key> query parameter

Auto-generated keys: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
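
For example, calling an OpenAI-compatible endpoint with an inference key (key and instance name are placeholders taken from the examples below):

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-inference-abc123" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-instance", "messages": [{"role": "user", "content": "Hello!"}]}'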

Configuration Options

Server Configuration

server:
  host: "0.0.0.0"         # Server host to bind to (default: "0.0.0.0")
  port: 8080              # Server port to bind to (default: 8080)
  allowed_origins: ["*"]  # CORS allowed origins (default: ["*"])
  enable_swagger: false   # Enable Swagger UI (default: false)

Environment Variables:

  • LLAMACTL_HOST - Server host
  • LLAMACTL_PORT - Server port
  • LLAMACTL_ALLOWED_ORIGINS - Comma-separated CORS origins
  • LLAMACTL_ENABLE_SWAGGER - Enable Swagger UI (true/false)
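
For example, enabling Swagger and restricting CORS to a single origin via environment variables (the origin is illustrative):

LLAMACTL_ENABLE_SWAGGER=true \
LLAMACTL_ALLOWED_ORIGINS="https://dashboard.example.com" \
./llamactl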

Instance Configuration

instances:
  port_range: [8000, 9000]           # Port range for instances (default: [8000, 9000])
  data_dir: "~/.local/share/llamactl" # Directory for all llamactl data (default varies by OS)
  configs_dir: "~/.local/share/llamactl/instances" # Directory for instance configs (default: data_dir/instances)
  logs_dir: "~/.local/share/llamactl/logs" # Directory for instance logs (default: data_dir/logs)
  auto_create_dirs: true             # Automatically create data/config/logs directories (default: true)
  max_instances: -1                  # Maximum instances (-1 = unlimited)
  llama_executable: "llama-server"   # Path to llama-server executable
  default_auto_restart: true         # Default auto-restart setting
  default_max_restarts: 3            # Default maximum restart attempts
  default_restart_delay: 5           # Default restart delay in seconds

Environment Variables:

  • LLAMACTL_INSTANCE_PORT_RANGE - Port range (format: "8000-9000" or "8000,9000")
  • LLAMACTL_DATA_DIRECTORY - Data directory path
  • LLAMACTL_INSTANCES_DIR - Instance configs directory path
  • LLAMACTL_LOGS_DIR - Log directory path
  • LLAMACTL_AUTO_CREATE_DATA_DIR - Auto-create data/config/logs directories (true/false)
  • LLAMACTL_MAX_INSTANCES - Maximum number of instances
  • LLAMACTL_LLAMA_EXECUTABLE - Path to llama-server executable
  • LLAMACTL_DEFAULT_AUTO_RESTART - Default auto-restart setting (true/false)
  • LLAMACTL_DEFAULT_MAX_RESTARTS - Default maximum restarts
  • LLAMACTL_DEFAULT_RESTART_DELAY - Default restart delay in seconds
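
For example, both port range formats mentioned above are accepted:

# Equivalent ways to set the instance port range
LLAMACTL_INSTANCE_PORT_RANGE="8000-9000" ./llamactl
LLAMACTL_INSTANCE_PORT_RANGE="8000,9000" ./llamactl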

Auth Configuration

auth:
  require_inference_auth: true           # Require API key for OpenAI endpoints (default: true)
  inference_keys: []                     # List of valid inference API keys
  require_management_auth: true          # Require API key for management endpoints (default: true)
  management_keys: []                    # List of valid management API keys

Environment Variables:

  • LLAMACTL_REQUIRE_INFERENCE_AUTH - Require auth for OpenAI endpoints (true/false)
  • LLAMACTL_INFERENCE_KEYS - Comma-separated inference API keys
  • LLAMACTL_REQUIRE_MANAGEMENT_AUTH - Require auth for management endpoints (true/false)
  • LLAMACTL_MANAGEMENT_KEYS - Comma-separated management API keys
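
For example, supplying multiple inference keys via the environment (keys are placeholders):

LLAMACTL_INFERENCE_KEYS="sk-inference-abc123,sk-inference-def456" ./llamactl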

Example Configuration

server:
  host: "0.0.0.0"
  port: 8080

instances:
  port_range: [8001, 8100]
  data_dir: "/var/lib/llamactl"
  configs_dir: "/var/lib/llamactl/instances"
  logs_dir: "/var/log/llamactl"
  auto_create_dirs: true
  max_instances: 10
  llama_executable: "/usr/local/bin/llama-server"
  default_auto_restart: true
  default_max_restarts: 5
  default_restart_delay: 10

auth:
  require_inference_auth: true
  inference_keys: ["sk-inference-abc123"]
  require_management_auth: true
  management_keys: ["sk-management-xyz456"]

Usage

Starting the Server

# Start with default configuration
./llamactl

# Start with custom config file
LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl

# Start with environment variables
LLAMACTL_PORT=9090 LLAMACTL_LOGS_DIR=/custom/logs ./llamactl

Web Dashboard

Open your browser and navigate to http://localhost:8080 to access the web dashboard.

API Usage

The REST API is available at http://localhost:8080/api/v1. See the Swagger documentation at http://localhost:8080/swagger/ for the complete API reference.
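
If management authentication is enabled, include a management key with each API request, for example via the X-API-Key header (the key is the placeholder from the example configuration above):

curl http://localhost:8080/api/v1/instances \
  -H "X-API-Key: sk-management-xyz456"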

Create an Instance

curl -X POST http://localhost:8080/api/v1/instances/my-instance \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/path/to/model.gguf",
    "gpu_layers": 32,
    "auto_restart": true
  }'

List Instances

curl http://localhost:8080/api/v1/instances

Start/Stop Instance

# Start
curl -X POST http://localhost:8080/api/v1/instances/my-instance/start

# Stop
curl -X POST http://localhost:8080/api/v1/instances/my-instance/stop
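
Restart and delete are also supported (sketched here assuming they follow the same path pattern; see the Swagger docs for the authoritative routes):

# Restart
curl -X POST http://localhost:8080/api/v1/instances/my-instance/restart

# Delete
curl -X DELETE http://localhost:8080/api/v1/instances/my-instance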

OpenAI Compatible Endpoints

Route requests to instances by including the instance name as the model parameter:

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-instance",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Development

Running Tests

# Go tests
go test ./...

# Web UI tests
cd webui
npm test

Development Server

# Start Go server in development mode
go run ./cmd/server

# Start web UI development server (in another terminal)
cd webui
npm run dev

API Documentation

Interactive API documentation is available at http://localhost:8080/swagger/ when the server is running.

License

This project is licensed under the MIT License. See the LICENSE file for details.
