llamactl
A control server for managing multiple Llama Server instances with a web-based dashboard.
Features
- Multi-instance Management: Create, start, stop, restart, and delete multiple llama-server instances
- Web Dashboard: Modern React-based UI for managing instances
- Auto-restart: Configurable automatic restart on instance failure
- Instance Monitoring: Real-time health checks and status monitoring
- Log Management: View, search, and download instance logs
- Data Persistence: Instance state persists across server restarts
- REST API: Full API for programmatic control
- OpenAI Compatible: Route requests to instances by instance name
- Configuration Management: Comprehensive llama-server parameter support
- System Information: View llama-server version, devices, and help
Prerequisites
This project requires llama-server from llama.cpp to be installed and available in your PATH.
Install llama.cpp: Follow the installation instructions at https://github.com/ggml-org/llama.cpp
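Before starting llamactl, you can confirm the binary is discoverable with a quick check (the --version flag is assumed to be supported by your llama-server build):
# Verify llama-server is installed and on your PATH
which llama-server
llama-server --version   # assumed flag; any invocation that prints usage also confirms the install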
Installation
Download Prebuilt Binaries
The easiest way to install llamactl is to download a prebuilt binary from the releases page.
Linux/macOS:
# Download the latest release for your platform
curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep tag_name | cut -d '"' -f 4)-linux-amd64.tar.gz | tar -xz
# Move to PATH
sudo mv llamactl /usr/local/bin/
# Run the server
llamactl
Manual Download:
- Go to the releases page
- Download the appropriate archive for your platform
- Extract the archive and move the binary to a directory in your PATH
Build from Source
If you prefer to build from source or need the latest development version:
Build Requirements
- Go 1.24 or later
- Node.js 22 or later (for building the web UI)
Building with Web UI
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
# Install Node.js dependencies
cd webui
npm ci
# Build the web UI
npm run build
# Return to project root and build
cd ..
go build -o llamactl ./cmd/server
# Run the server
./llamactl
Configuration
llamactl can be configured via configuration files or environment variables. Configuration is loaded from the following sources, with later sources overriding earlier ones:
- Hardcoded defaults
- Configuration file
- Environment variables
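For example, an environment variable takes precedence over the same setting in a config file (a minimal sketch, assuming your config file sets port: 8080):
# config file sets port: 8080, but the environment variable wins
LLAMACTL_PORT=9090 ./llamactl   # server listens on 9090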
Configuration Files
Configuration File Locations
Configuration files are searched in the following locations (in order of precedence):
Linux/macOS:
- ./llamactl.yaml or ./config.yaml (current directory)
- $HOME/.config/llamactl/config.yaml
- /etc/llamactl/config.yaml
Windows:
- ./llamactl.yaml or ./config.yaml (current directory)
- %APPDATA%\llamactl\config.yaml
- %USERPROFILE%\llamactl\config.yaml
- %PROGRAMDATA%\llamactl\config.yaml
You can specify the path to the config file with the LLAMACTL_CONFIG_PATH environment variable.
API Key Authentication
llamactl supports API key authentication for both management and inference (OpenAI-compatible) endpoints, with separate keys for each. Management keys grant full access; inference keys grant access only to the OpenAI-compatible endpoints.
How to Use:
- Pass your API key in requests using one of:
  - Authorization: Bearer <key> header
  - X-API-Key: <key> header
  - api_key=<key> query parameter
Auto-generated keys: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
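For example, the three forms below are equivalent ways to authenticate a request to the instance list endpoint (replace <key> with one of your management keys):
# Bearer token header
curl -H "Authorization: Bearer <key>" http://localhost:8080/api/v1/instances
# Custom header
curl -H "X-API-Key: <key>" http://localhost:8080/api/v1/instances
# Query parameter
curl "http://localhost:8080/api/v1/instances?api_key=<key>"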
Configuration Options
Server Configuration
server:
  host: "0.0.0.0"         # Server host to bind to (default: "0.0.0.0")
  port: 8080              # Server port to bind to (default: 8080)
  allowed_origins: ["*"]  # CORS allowed origins (default: ["*"])
  enable_swagger: false   # Enable Swagger UI (default: false)
Environment Variables:
- LLAMACTL_HOST - Server host
- LLAMACTL_PORT - Server port
- LLAMACTL_ALLOWED_ORIGINS - Comma-separated CORS origins
- LLAMACTL_ENABLE_SWAGGER - Enable Swagger UI (true/false)
Instance Configuration
instances:
  port_range: [8000, 9000]             # Port range for instances (default: [8000, 9000])
  data_dir: "~/.local/share/llamactl"  # Directory for all llamactl data (default varies by OS)
  configs_dir: "~/.local/share/llamactl/instances"  # Directory for instance configs (default: data_dir/instances)
  logs_dir: "~/.local/share/llamactl/logs"          # Directory for instance logs (default: data_dir/logs)
  auto_create_dirs: true               # Automatically create data/config/logs directories (default: true)
  max_instances: -1                    # Maximum instances (-1 = unlimited)
  llama_executable: "llama-server"     # Path to llama-server executable
  default_auto_restart: true           # Default auto-restart setting
  default_max_restarts: 3              # Default maximum restart attempts
  default_restart_delay: 5             # Default restart delay in seconds
Environment Variables:
- LLAMACTL_INSTANCE_PORT_RANGE - Port range (format: "8000-9000" or "8000,9000")
- LLAMACTL_DATA_DIRECTORY - Data directory path
- LLAMACTL_INSTANCES_DIR - Instance configs directory path
- LLAMACTL_LOGS_DIR - Log directory path
- LLAMACTL_AUTO_CREATE_DATA_DIR - Auto-create data/config/logs directories (true/false)
- LLAMACTL_MAX_INSTANCES - Maximum number of instances
- LLAMACTL_LLAMA_EXECUTABLE - Path to llama-server executable
- LLAMACTL_DEFAULT_AUTO_RESTART - Default auto-restart setting (true/false)
- LLAMACTL_DEFAULT_MAX_RESTARTS - Default maximum restarts
- LLAMACTL_DEFAULT_RESTART_DELAY - Default restart delay in seconds
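Both documented formats for the port range behave the same, for example:
# Equivalent ways to set the instance port range
LLAMACTL_INSTANCE_PORT_RANGE="8000-9000" ./llamactl
LLAMACTL_INSTANCE_PORT_RANGE="8000,9000" ./llamactl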
Auth Configuration
auth:
  require_inference_auth: true   # Require API key for OpenAI endpoints (default: true)
  inference_keys: []             # List of valid inference API keys
  require_management_auth: true  # Require API key for management endpoints (default: true)
  management_keys: []            # List of valid management API keys
Environment Variables:
- LLAMACTL_REQUIRE_INFERENCE_AUTH - Require auth for OpenAI endpoints (true/false)
- LLAMACTL_INFERENCE_KEYS - Comma-separated inference API keys
- LLAMACTL_REQUIRE_MANAGEMENT_AUTH - Require auth for management endpoints (true/false)
- LLAMACTL_MANAGEMENT_KEYS - Comma-separated management API keys
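A typical production setup exports the keys before starting the server, rather than relying on auto-generated ones (the key values here are placeholders):
# Set your own management and inference keys
export LLAMACTL_MANAGEMENT_KEYS="sk-management-xyz456"
export LLAMACTL_INFERENCE_KEYS="sk-inference-abc123,sk-inference-def789"
./llamactl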
Example Configuration
server:
  host: "0.0.0.0"
  port: 8080

instances:
  port_range: [8001, 8100]
  data_dir: "/var/lib/llamactl"
  configs_dir: "/var/lib/llamactl/instances"
  logs_dir: "/var/log/llamactl"
  auto_create_dirs: true
  max_instances: 10
  llama_executable: "/usr/local/bin/llama-server"
  default_auto_restart: true
  default_max_restarts: 5
  default_restart_delay: 10

auth:
  require_inference_auth: true
  inference_keys: ["sk-inference-abc123"]
  require_management_auth: true
  management_keys: ["sk-management-xyz456"]
Usage
Starting the Server
# Start with default configuration
./llamactl
# Start with custom config file
LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl
# Start with environment variables
LLAMACTL_PORT=9090 LLAMACTL_LOGS_DIR=/custom/logs ./llamactl
Web Dashboard
Open your browser and navigate to http://localhost:8080 to access the web dashboard.
API Usage
The REST API is available at http://localhost:8080/api/v1. See the Swagger documentation at http://localhost:8080/swagger/ for the complete API reference.
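With the default configuration, management endpoints require a management API key (see Auth Configuration above). The examples below omit the key for brevity; a minimal authenticated request looks like this, using the placeholder key from the example configuration:
curl -H "Authorization: Bearer sk-management-xyz456" http://localhost:8080/api/v1/instances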
Create an Instance
curl -X POST http://localhost:8080/api/v1/instances/my-instance \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/path/to/model.gguf",
    "gpu_layers": 32,
    "auto_restart": true
  }'
List Instances
curl http://localhost:8080/api/v1/instances
Start/Stop Instance
# Start
curl -X POST http://localhost:8080/api/v1/instances/my-instance/start
# Stop
curl -X POST http://localhost:8080/api/v1/instances/my-instance/stop
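The feature list also includes restarting instances. Assuming the restart route mirrors start and stop (an assumption; confirm the exact path in the Swagger documentation):
# Restart (assumed route, following the start/stop pattern)
curl -X POST http://localhost:8080/api/v1/instances/my-instance/restart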
OpenAI Compatible Endpoints
Route requests to instances by including the instance name as the model parameter:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-instance",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
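If inference authentication is enabled (the default), attach an inference key using any of the methods from the authentication section, for example:
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-inference-abc123" \
  -d '{
    "model": "my-instance",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'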
Development
Running Tests
# Go tests
go test ./...
# Web UI tests
cd webui
npm test
Development Server
# Start Go server in development mode
go run ./cmd/server
# Start web UI development server (in another terminal)
cd webui
npm run dev
API Documentation
Interactive API documentation is available at http://localhost:8080/swagger/ when the server is running.
License
This project is licensed under the MIT License. See the LICENSE file for details.