Mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-06 00:54:23 +00:00)

Commit: Improve getting started section
@@ -1,59 +1,144 @@

# Configuration

llamactl works out of the box with sensible defaults, but you can customize its behavior to suit your needs. It can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:

```
Defaults < Configuration file < Environment variables
```

## Default Configuration

Here's the default configuration with all available options:

```yaml
# Server configuration
server:
  host: "0.0.0.0"           # Server host to bind to
  port: 8080                # Server port to bind to
  allowed_origins: ["*"]    # Allowed CORS origins (default: all)
  enable_swagger: false     # Enable Swagger UI for API docs

instances:
  port_range: [8000, 9000]                        # Port range for instances
  data_dir: ~/.local/share/llamactl               # Data directory (platform-specific, see below)
  configs_dir: ~/.local/share/llamactl/instances  # Instance configs directory
  logs_dir: ~/.local/share/llamactl/logs          # Logs directory
  auto_create_dirs: true                          # Auto-create data/config/logs dirs if missing
  max_instances: -1                               # Max instances (-1 = unlimited)
  max_running_instances: -1                       # Max running instances (-1 = unlimited)
  enable_lru_eviction: true                       # Enable LRU eviction for idle instances
  llama_executable: llama-server                  # Path to llama-server executable
  default_auto_restart: true                      # Auto-restart new instances by default
  default_max_restarts: 3                         # Max restarts for new instances
  default_restart_delay: 5                        # Restart delay (seconds) for new instances
  default_on_demand_start: true                   # Default on-demand start setting
  on_demand_start_timeout: 120                    # Default on-demand start timeout in seconds
  timeout_check_interval: 5                       # Idle instance timeout check interval in minutes

# Authentication
auth:
  require_inference_auth: true    # Require auth for inference endpoints
  inference_keys: []              # Keys for inference endpoints
  require_management_auth: true   # Require auth for management endpoints
  management_keys: []             # Keys for management endpoints
```
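
Because configuration-file values take precedence over the defaults above, a config file typically only needs the settings you want to change. A minimal sketch (paths and values here are purely illustrative):

```yaml
# e.g. ~/.config/llamactl/config.yaml on Linux (see file locations below)
server:
  port: 9090                     # Listen on a non-default management port
instances:
  logs_dir: /srv/llamactl/logs   # Keep instance logs on a larger volume
  max_running_instances: 2       # Cap how many instances run at once
```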

## Configuration Files

### Configuration File Locations

Configuration files are searched in the following locations (in order of precedence):

**Linux:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/.config/llamactl/config.yaml`
- `/etc/llamactl/config.yaml`

**macOS:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/Library/Application Support/llamactl/config.yaml`
- `/Library/Application Support/llamactl/config.yaml`

**Windows:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `%APPDATA%\llamactl\config.yaml`
- `%USERPROFILE%\llamactl\config.yaml`
- `%PROGRAMDATA%\llamactl\config.yaml`

You can specify the path to the config file with the `LLAMACTL_CONFIG_PATH` environment variable.
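
For example, to point llamactl at a config file outside the search paths above (the path below is just an illustration):

```bash
# Use an explicit config file location
export LLAMACTL_CONFIG_PATH=/opt/llamactl/config.yaml
llamactl
```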

## Configuration Options

### Server Configuration

```yaml
server:
  host: "0.0.0.0"         # Server host to bind to (default: "0.0.0.0")
  port: 8080              # Server port to bind to (default: 8080)
  allowed_origins: ["*"]  # CORS allowed origins (default: ["*"])
  enable_swagger: false   # Enable Swagger UI (default: false)
```

**Environment Variables:**

- `LLAMACTL_HOST` - Server host
- `LLAMACTL_PORT` - Server port
- `LLAMACTL_ALLOWED_ORIGINS` - Comma-separated CORS origins
- `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)
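
For instance, the server section can be overridden entirely from the environment before starting llamactl (the values below are examples, not defaults):

```bash
# Example server overrides
export LLAMACTL_HOST=127.0.0.1       # Bind only to localhost
export LLAMACTL_PORT=9090            # Use a non-default port
export LLAMACTL_ENABLE_SWAGGER=true  # Serve the Swagger UI
```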

### Instance Configuration

```yaml
instances:
  port_range: [8000, 9000]                          # Port range for instances (default: [8000, 9000])
  data_dir: "~/.local/share/llamactl"               # Directory for all llamactl data (default varies by OS)
  configs_dir: "~/.local/share/llamactl/instances"  # Directory for instance configs (default: data_dir/instances)
  logs_dir: "~/.local/share/llamactl/logs"          # Directory for instance logs (default: data_dir/logs)
  auto_create_dirs: true                            # Automatically create data/config/logs directories (default: true)
  max_instances: -1                                 # Maximum instances (-1 = unlimited)
  max_running_instances: -1                         # Maximum running instances (-1 = unlimited)
  enable_lru_eviction: true                         # Enable LRU eviction for idle instances
  llama_executable: "llama-server"                  # Path to llama-server executable
  default_auto_restart: true                        # Default auto-restart setting
  default_max_restarts: 3                           # Default maximum restart attempts
  default_restart_delay: 5                          # Default restart delay in seconds
  default_on_demand_start: true                     # Default on-demand start setting
  on_demand_start_timeout: 120                      # Default on-demand start timeout in seconds
  timeout_check_interval: 5                         # Default instance timeout check interval in minutes
```

**Environment Variables:**

- `LLAMACTL_INSTANCE_PORT_RANGE` - Port range (format: "8000-9000" or "8000,9000")
- `LLAMACTL_DATA_DIRECTORY` - Data directory path
- `LLAMACTL_INSTANCES_DIR` - Instance configs directory path
- `LLAMACTL_LOGS_DIR` - Log directory path
- `LLAMACTL_AUTO_CREATE_DATA_DIR` - Auto-create data/config/logs directories (true/false)
- `LLAMACTL_MAX_INSTANCES` - Maximum number of instances
- `LLAMACTL_MAX_RUNNING_INSTANCES` - Maximum number of running instances
- `LLAMACTL_ENABLE_LRU_EVICTION` - Enable LRU eviction for idle instances
- `LLAMACTL_LLAMA_EXECUTABLE` - Path to llama-server executable
- `LLAMACTL_DEFAULT_AUTO_RESTART` - Default auto-restart setting (true/false)
- `LLAMACTL_DEFAULT_MAX_RESTARTS` - Default maximum restarts
- `LLAMACTL_DEFAULT_RESTART_DELAY` - Default restart delay in seconds
- `LLAMACTL_DEFAULT_ON_DEMAND_START` - Default on-demand start setting (true/false)
- `LLAMACTL_ON_DEMAND_START_TIMEOUT` - Default on-demand start timeout in seconds
- `LLAMACTL_TIMEOUT_CHECK_INTERVAL` - Default instance timeout check interval in minutes
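
Likewise, a few illustrative overrides for the instances section (values are examples only; note the two accepted port-range formats):

```bash
# Example instance overrides
export LLAMACTL_INSTANCE_PORT_RANGE="8100-8200"               # "8100-8200" or "8100,8200"
export LLAMACTL_MAX_RUNNING_INSTANCES=2
export LLAMACTL_LLAMA_EXECUTABLE=/usr/local/bin/llama-server
export LLAMACTL_DEFAULT_AUTO_RESTART=false
```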

### Authentication Configuration

```yaml
auth:
  require_inference_auth: true    # Require API key for OpenAI endpoints (default: true)
  inference_keys: []              # List of valid inference API keys
  require_management_auth: true   # Require API key for management endpoints (default: true)
  management_keys: []             # List of valid management API keys
```

**Environment Variables:**

- `LLAMACTL_REQUIRE_INFERENCE_AUTH` - Require auth for OpenAI endpoints (true/false)
- `LLAMACTL_INFERENCE_KEYS` - Comma-separated inference API keys
- `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
- `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
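
For example, API keys can be supplied through the environment instead of the config file (the key values below are placeholders):

```bash
# Example: configure authentication via environment variables
export LLAMACTL_REQUIRE_INFERENCE_AUTH=true
export LLAMACTL_INFERENCE_KEYS="inference-key-1,inference-key-2"  # Comma-separated
export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="management-key-1"
```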

## Command Line Options

View all available command line options:
@@ -62,90 +147,13 @@ View all available command line options:

```bash
llamactl --help
```

You can also override configuration using command line flags when starting llamactl.

## Next Steps

- Learn about [Managing Instances](../user-guide/managing-instances.md)
- Set up [Monitoring](../advanced/monitoring.md)
@@ -4,9 +4,19 @@ This guide will walk you through installing Llamactl on your system.

## Prerequisites

You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:

```bash
# Quick install methods:

# Homebrew (macOS)
brew install llama.cpp

# Or build from source - see the llama.cpp docs
```

Additional requirements for building llamactl from source:

- Go 1.24 or later
- Node.js 22 or later
- Git

You will also need sufficient disk space for your models.
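
To confirm the prerequisite is in place, a quick check that `llama-server` is on your `PATH` (assuming a standard install):

```bash
# Verify llama-server is installed and reachable
command -v llama-server
llama-server --help | head -n 5   # Print the first lines of its usage text
```
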
@@ -14,17 +24,18 @@ Before installing Llamactl, ensure you have:

### Option 1: Download Binary (Recommended)

Download the latest release from the [GitHub releases page](https://github.com/lordmathis/llamactl/releases):

```bash
# Linux/macOS - Get the latest version and download
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/

# Or download manually from:
# https://github.com/lordmathis/llamactl/releases/latest

# Windows - Download from the releases page
```

### Option 2: Build from Source
@@ -36,11 +47,12 @@ If you prefer to build from source:

```bash
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the web UI
cd webui && npm ci && npm run build && cd ..

# Build the application
go build -o llamactl ./cmd/server
```

For detailed build instructions, see the [Building from Source](../development/building.md) guide.

## Verification
@@ -28,7 +28,6 @@ You should see the Llamactl web interface.

2. Fill in the instance configuration:
   - **Name**: Give your instance a descriptive name
   - **Model Path**: Path to your Llama.cpp model file
   - **Port**: Port for the instance to run on
   - **Additional Options**: Any extra Llama.cpp parameters

3. Click "Create Instance"
@@ -50,7 +49,6 @@ Here's a basic example configuration for a Llama 2 model:

```json
{
  "name": "llama2-7b",
  "model_path": "/path/to/llama-2-7b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048
  }
}
```
@@ -72,13 +70,70 @@ curl -X POST http://localhost:8080/api/instances \

```bash
  -d '{
    "name": "my-model",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'

# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
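
Once the instance is started, you can also reach the underlying llama-server directly on the instance's port; a minimal check, assuming llama-server's standard `/health` endpoint and the port 8081 from the example above:

```bash
# Query the llama-server health endpoint on the instance port
curl http://localhost:8081/health
```
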
## OpenAI Compatible API

Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.

### Chat Completions

Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write a Python function?"
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```

### Using with Python OpenAI Client

You can also use the official OpenAI Python client:

```python
from openai import OpenAI

# Point the client to your Llamactl server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Llamactl doesn't require API keys by default
)

# Create a chat completion
response = client.chat.completions.create(
    model="my-model",  # Use the name of your instance
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    max_tokens=200,
    temperature=0.7
)

print(response.choices[0].message.content)
```
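
Streaming works through the same client; a short sketch, assuming the backing llama-server instance streams chat completions:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Stream tokens as they are generated
stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a haiku about servers"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```
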
### List Available Models

Get a list of running instances (models) in OpenAI-compatible format:

```bash
curl http://localhost:8080/v1/models
```
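
The same listing is available through the Python client shown earlier; a small sketch:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# List running instances exposed as OpenAI-compatible models
for model in client.models.list():
    print(model.id)
```
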
## Next Steps

- Learn more about the [Web UI](../user-guide/web-ui.md)