Mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-06 00:54:23 +00:00)

Commit: Improve getting started section
@@ -1,59 +1,144 @@

# Configuration

llamactl works out of the box with sensible defaults, but you can customize its behavior to suit your needs. It can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:

```
Defaults < Configuration file < Environment variables
```

## Default Configuration

Here's the default configuration with all available options:

```yaml
# Server configuration
server:
  host: "0.0.0.0"           # Server host to bind to
  port: 8080                # Server port to bind to
  allowed_origins: ["*"]    # Allowed CORS origins (default: all)
  enable_swagger: false     # Enable Swagger UI for API docs

instances:
  port_range: [8000, 9000]                        # Port range for instances
  data_dir: ~/.local/share/llamactl               # Data directory (platform-specific, see below)
  configs_dir: ~/.local/share/llamactl/instances  # Instance configs directory
  logs_dir: ~/.local/share/llamactl/logs          # Logs directory
  auto_create_dirs: true                          # Auto-create data/config/logs dirs if missing
  max_instances: -1                               # Max instances (-1 = unlimited)
  max_running_instances: -1                       # Max running instances (-1 = unlimited)
  enable_lru_eviction: true                       # Enable LRU eviction for idle instances
  llama_executable: llama-server                  # Path to llama-server executable
  default_auto_restart: true                      # Auto-restart new instances by default
  default_max_restarts: 3                         # Max restarts for new instances
  default_restart_delay: 5                        # Restart delay (seconds) for new instances
  default_on_demand_start: true                   # Default on-demand start setting
  on_demand_start_timeout: 120                    # Default on-demand start timeout in seconds
  timeout_check_interval: 5                       # Idle instance timeout check interval in minutes

# Authentication
auth:
  require_inference_auth: true    # Require auth for inference endpoints
  inference_keys: []              # Keys for inference endpoints
  require_management_auth: true   # Require auth for management endpoints
  management_keys: []             # Keys for management endpoints
```
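
Because configuration-file values take precedence over the defaults above, a config file typically only needs the settings you want to change. A minimal sketch (paths and values here are purely illustrative):

```yaml
# e.g. ~/.config/llamactl/config.yaml on Linux (see file locations below)
server:
  port: 9090                     # Listen on a non-default management port
instances:
  logs_dir: /srv/llamactl/logs   # Keep instance logs on a larger volume
  max_running_instances: 2       # Cap how many instances run at once
```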

## Configuration Files

### Configuration File Locations

Configuration files are searched in the following locations (in order of precedence):

**Linux:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/.config/llamactl/config.yaml`
- `/etc/llamactl/config.yaml`

**macOS:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/Library/Application Support/llamactl/config.yaml`
- `/Library/Application Support/llamactl/config.yaml`

**Windows:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `%APPDATA%\llamactl\config.yaml`
- `%USERPROFILE%\llamactl\config.yaml`
- `%PROGRAMDATA%\llamactl\config.yaml`

You can specify the path to the config file with the `LLAMACTL_CONFIG_PATH` environment variable.
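
For example, to point llamactl at a config file outside the search paths above (the path below is just an illustration):

```bash
# Use an explicit config file location
export LLAMACTL_CONFIG_PATH=/opt/llamactl/config.yaml
llamactl
```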

## Configuration Options

### Server Configuration

```yaml
server:
  host: "0.0.0.0"         # Server host to bind to (default: "0.0.0.0")
  port: 8080              # Server port to bind to (default: 8080)
  allowed_origins: ["*"]  # CORS allowed origins (default: ["*"])
  enable_swagger: false   # Enable Swagger UI (default: false)
```

**Environment Variables:**

- `LLAMACTL_HOST` - Server host
- `LLAMACTL_PORT` - Server port
- `LLAMACTL_ALLOWED_ORIGINS` - Comma-separated CORS origins
- `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)
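
For instance, the server section can be overridden entirely from the environment before starting llamactl (the values below are examples, not defaults):

```bash
# Example server overrides
export LLAMACTL_HOST=127.0.0.1       # Bind only to localhost
export LLAMACTL_PORT=9090            # Use a non-default port
export LLAMACTL_ENABLE_SWAGGER=true  # Serve the Swagger UI
```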

### Instance Configuration

```yaml
instances:
  port_range: [8000, 9000]                          # Port range for instances (default: [8000, 9000])
  data_dir: "~/.local/share/llamactl"               # Directory for all llamactl data (default varies by OS)
  configs_dir: "~/.local/share/llamactl/instances"  # Directory for instance configs (default: data_dir/instances)
  logs_dir: "~/.local/share/llamactl/logs"          # Directory for instance logs (default: data_dir/logs)
  auto_create_dirs: true                            # Automatically create data/config/logs directories (default: true)
  max_instances: -1                                 # Maximum instances (-1 = unlimited)
  max_running_instances: -1                         # Maximum running instances (-1 = unlimited)
  enable_lru_eviction: true                         # Enable LRU eviction for idle instances
  llama_executable: "llama-server"                  # Path to llama-server executable
  default_auto_restart: true                        # Default auto-restart setting
  default_max_restarts: 3                           # Default maximum restart attempts
  default_restart_delay: 5                          # Default restart delay in seconds
  default_on_demand_start: true                     # Default on-demand start setting
  on_demand_start_timeout: 120                      # Default on-demand start timeout in seconds
  timeout_check_interval: 5                         # Default instance timeout check interval in minutes
```

**Environment Variables:**

- `LLAMACTL_INSTANCE_PORT_RANGE` - Port range (format: "8000-9000" or "8000,9000")
- `LLAMACTL_DATA_DIRECTORY` - Data directory path
- `LLAMACTL_INSTANCES_DIR` - Instance configs directory path
- `LLAMACTL_LOGS_DIR` - Log directory path
- `LLAMACTL_AUTO_CREATE_DATA_DIR` - Auto-create data/config/logs directories (true/false)
- `LLAMACTL_MAX_INSTANCES` - Maximum number of instances
- `LLAMACTL_MAX_RUNNING_INSTANCES` - Maximum number of running instances
- `LLAMACTL_ENABLE_LRU_EVICTION` - Enable LRU eviction for idle instances
- `LLAMACTL_LLAMA_EXECUTABLE` - Path to llama-server executable
- `LLAMACTL_DEFAULT_AUTO_RESTART` - Default auto-restart setting (true/false)
- `LLAMACTL_DEFAULT_MAX_RESTARTS` - Default maximum restarts
- `LLAMACTL_DEFAULT_RESTART_DELAY` - Default restart delay in seconds
- `LLAMACTL_DEFAULT_ON_DEMAND_START` - Default on-demand start setting (true/false)
- `LLAMACTL_ON_DEMAND_START_TIMEOUT` - Default on-demand start timeout in seconds
- `LLAMACTL_TIMEOUT_CHECK_INTERVAL` - Default instance timeout check interval in minutes
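
Likewise, a few illustrative overrides for the instances section (values are examples only; note the two accepted port-range formats):

```bash
# Example instance overrides
export LLAMACTL_INSTANCE_PORT_RANGE="8100-8200"               # "8100-8200" or "8100,8200"
export LLAMACTL_MAX_RUNNING_INSTANCES=2
export LLAMACTL_LLAMA_EXECUTABLE=/usr/local/bin/llama-server
export LLAMACTL_DEFAULT_AUTO_RESTART=false
```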

### Authentication Configuration

```yaml
auth:
  require_inference_auth: true    # Require API key for OpenAI endpoints (default: true)
  inference_keys: []              # List of valid inference API keys
  require_management_auth: true   # Require API key for management endpoints (default: true)
  management_keys: []             # List of valid management API keys
```

**Environment Variables:**

- `LLAMACTL_REQUIRE_INFERENCE_AUTH` - Require auth for OpenAI endpoints (true/false)
- `LLAMACTL_INFERENCE_KEYS` - Comma-separated inference API keys
- `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
- `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
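
For example, API keys can be supplied through the environment instead of the config file (the key values below are placeholders):

```bash
# Example: configure authentication via environment variables
export LLAMACTL_REQUIRE_INFERENCE_AUTH=true
export LLAMACTL_INFERENCE_KEYS="inference-key-1,inference-key-2"  # Comma-separated
export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="management-key-1"
```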

## Command Line Options

View all available command line options:
@@ -62,90 +147,13 @@ View all available command line options:

```bash
llamactl --help
```

You can also override configuration using command line flags when starting llamactl.

## Next Steps

- Learn about [Managing Instances](../user-guide/managing-instances.md)
- Set up [Monitoring](../advanced/monitoring.md)
@@ -4,9 +4,19 @@ This guide will walk you through installing Llamactl on your system.

## Prerequisites

You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:

```bash
# Quick install methods:

# Homebrew (macOS)
brew install llama.cpp

# Or build from source - see the llama.cpp docs
```

Additional requirements for building llamactl from source:

- Go 1.24 or later
- Node.js 22 or later
- Git

You will also need sufficient disk space for your models.
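
To confirm the prerequisite is in place, a quick check that `llama-server` is on your `PATH` (assuming a standard install):

```bash
# Verify llama-server is installed and reachable
command -v llama-server
llama-server --help | head -n 5   # Print the first lines of its usage text
```
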
@@ -14,17 +24,18 @@ Before installing Llamactl, ensure you have:

### Option 1: Download Binary (Recommended)

Download the latest release from the [GitHub releases page](https://github.com/lordmathis/llamactl/releases):

```bash
# Linux/macOS - Get the latest version and download
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/

# Or download manually from:
# https://github.com/lordmathis/llamactl/releases/latest

# Windows - Download from the releases page
```

### Option 2: Build from Source
@@ -36,11 +47,12 @@ If you prefer to build from source:

```bash
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the web UI
cd webui && npm ci && npm run build && cd ..

# Build the application
go build -o llamactl ./cmd/server
```

For detailed build instructions, see the [Building from Source](../development/building.md) guide.

## Verification
@@ -28,7 +28,6 @@ You should see the Llamactl web interface.

2. Fill in the instance configuration:
   - **Name**: Give your instance a descriptive name
   - **Model Path**: Path to your Llama.cpp model file
   - **Port**: Port for the instance to run on
   - **Additional Options**: Any extra Llama.cpp parameters

3. Click "Create Instance"
@@ -50,7 +49,6 @@ Here's a basic example configuration for a Llama 2 model:

```json
{
  "name": "llama2-7b",
  "model_path": "/path/to/llama-2-7b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048
  }
}
```
@@ -72,13 +70,70 @@ curl -X POST http://localhost:8080/api/instances \

```bash
  -d '{
    "name": "my-model",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'

# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
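
Once the instance is started, you can also reach the underlying llama-server directly on the instance's port; a minimal check, assuming llama-server's standard `/health` endpoint and the port 8081 from the example above:

```bash
# Query the llama-server health endpoint on the instance port
curl http://localhost:8081/health
```
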
## OpenAI Compatible API

Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.

### Chat Completions

Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write a Python function?"
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```

### Using with Python OpenAI Client

You can also use the official OpenAI Python client:

```python
from openai import OpenAI

# Point the client to your Llamactl server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Llamactl doesn't require API keys by default
)

# Create a chat completion
response = client.chat.completions.create(
    model="my-model",  # Use the name of your instance
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    max_tokens=200,
    temperature=0.7
)

print(response.choices[0].message.content)
```
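
Streaming works through the same client; a short sketch, assuming the backing llama-server instance streams chat completions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Stream tokens as they are generated
stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a haiku about servers"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```
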
### List Available Models

Get a list of running instances (models) in OpenAI-compatible format:

```bash
curl http://localhost:8080/v1/models
```
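
The same listing is available through the Python client shown earlier; a small sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# List running instances exposed as OpenAI-compatible models
for model in client.models.list():
    print(model.id)
```
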
## Next Steps

- Learn more about the [Web UI](../user-guide/web-ui.md)