mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-06 00:54:23 +00:00)

Create initial documentation structure

This commit is contained in:

.github/workflows/docs.yml (vendored, new file, 65 lines)
@@ -0,0 +1,65 @@
name: Build and Deploy Documentation

on:
  push:
    branches: [ main ]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'docs-requirements.txt'
      - '.github/workflows/docs.yml'
  pull_request:
    branches: [ main ]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'docs-requirements.txt'

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Needed for git-revision-date-localized plugin

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r docs-requirements.txt

      - name: Build documentation
        run: |
          mkdocs build --strict

      - name: Upload documentation artifact
        if: github.ref == 'refs/heads/main'
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./site

  deploy:
    if: github.ref == 'refs/heads/main'
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

@@ -129,6 +129,50 @@ Use this format for pull request titles:
- Use meaningful component and variable names
- Prefer functional components over class components

## Documentation Development

This project uses MkDocs for documentation. When working on documentation:

### Setup Documentation Environment

```bash
# Install documentation dependencies
pip install -r docs-requirements.txt
```

### Development Workflow

```bash
# Serve documentation locally for development
mkdocs serve
```

The documentation will be available at http://localhost:8000

```bash
# Build static documentation site
mkdocs build
```

The built site will be in the `site/` directory.

### Documentation Structure

- `docs/` - Documentation content (Markdown files)
- `mkdocs.yml` - MkDocs configuration
- `docs-requirements.txt` - Python dependencies for documentation

### Adding New Documentation

When adding new documentation:

1. Create Markdown files in the appropriate `docs/` subdirectory
2. Update the navigation in `mkdocs.yml`
3. Test locally with `mkdocs serve`
4. Submit a pull request

### Documentation Deployment

Documentation is automatically built and deployed to GitHub Pages when changes are pushed to the main branch.

## Getting Help

- Check existing [issues](https://github.com/lordmathis/llamactl/issues)

docs-requirements.txt (new file, 4 lines)
@@ -0,0 +1,4 @@
mkdocs-material==9.5.3
mkdocs==1.5.3
pymdown-extensions==10.7
mkdocs-git-revision-date-localized-plugin==1.2.4

docs/advanced/backends.md (new file, 316 lines)
@@ -0,0 +1,316 @@
# Backends

LlamaCtl supports multiple backends for running large language models. This guide covers the available backends and their configuration.

## Llama.cpp Backend

The primary backend for LlamaCtl, providing robust support for GGUF models.

### Features

- **GGUF Support**: Native support for the GGUF model format
- **GPU Acceleration**: CUDA, OpenCL, and Metal support
- **Memory Optimization**: Efficient memory usage and mapping
- **Multi-threading**: Configurable CPU thread utilization
- **Quantization**: Support for various quantization levels

### Configuration

```yaml
backends:
  llamacpp:
    binary_path: "/usr/local/bin/llama-server"
    default_options:
      threads: 4
      context_size: 2048
      batch_size: 512
    gpu:
      enabled: true
      layers: 35
```

### Supported Options

| Option | Description | Default |
|--------|-------------|---------|
| `threads` | Number of CPU threads | 4 |
| `context_size` | Context window size | 2048 |
| `batch_size` | Batch size for processing | 512 |
| `gpu_layers` | Layers to offload to GPU | 0 |
| `memory_lock` | Lock model in memory | false |
| `no_mmap` | Disable memory mapping | false |
| `rope_freq_base` | RoPE frequency base | 10000 |
| `rope_freq_scale` | RoPE frequency scale | 1.0 |

### GPU Acceleration

#### CUDA Setup

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Verify CUDA installation
nvcc --version
nvidia-smi
```

#### Configuration for GPU

```json
{
  "name": "gpu-accelerated",
  "model_path": "/models/llama-2-13b.gguf",
  "port": 8081,
  "options": {
    "gpu_layers": 35,
    "threads": 2,
    "context_size": 4096
  }
}
```

### Performance Tuning

#### Memory Optimization

```yaml
# For limited-memory systems
options:
  context_size: 1024
  batch_size: 256
  no_mmap: true
  memory_lock: false

# For high-memory systems
options:
  context_size: 8192
  batch_size: 1024
  memory_lock: true
  no_mmap: false
```

#### CPU Optimization

```yaml
# Match thread count to CPU cores
# For an 8-core CPU:
options:
  threads: 6  # Leave 2 cores for the system

# For high-performance CPUs:
options:
  threads: 16
  batch_size: 1024
```

## Future Backends

LlamaCtl is designed to support multiple backends. Planned additions:

### vLLM Backend

High-performance inference engine optimized for serving:

- **Features**: Fast inference, batching, streaming
- **Models**: Supports various model formats
- **Scaling**: Horizontal scaling support

### TensorRT-LLM Backend

NVIDIA's optimized inference engine:

- **Features**: Maximum GPU performance
- **Models**: Optimized for NVIDIA GPUs
- **Deployment**: Production-ready inference

### Ollama Backend

Integration with Ollama for easy model management:

- **Features**: Simplified model downloading
- **Models**: Large model library
- **Integration**: Seamless model switching

## Backend Selection

### Automatic Detection

LlamaCtl can automatically detect the best backend:

```yaml
backends:
  auto_detect: true
  preference_order:
    - "llamacpp"
    - "vllm"
    - "tensorrt"
```
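
To make the preference-order idea concrete, here is a hypothetical Go sketch of how such detection could work. The binary names and the `detectBackend` helper are illustrative assumptions, not LlamaCtl's actual internals:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// available reports whether a backend's server binary can be found on PATH.
func available(binary string) bool {
	_, err := exec.LookPath(binary)
	return err == nil
}

// detectBackend walks the configured preference order and returns the first
// backend whose binary is installed, mirroring the preference_order setting.
func detectBackend(preferenceOrder []string, binaries map[string]string) (string, error) {
	for _, name := range preferenceOrder {
		if bin, ok := binaries[name]; ok && available(bin) {
			return name, nil
		}
	}
	return "", errors.New("no supported backend found")
}

func main() {
	// Hypothetical binary names per backend.
	binaries := map[string]string{
		"llamacpp": "llama-server",
		"vllm":     "vllm",
	}
	name, err := detectBackend([]string{"llamacpp", "vllm", "tensorrt"}, binaries)
	if err != nil {
		fmt.Println("detection failed:", err)
		return
	}
	fmt.Println("selected backend:", name)
}
```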

### Manual Selection

Force a specific backend for an instance:

```json
{
  "name": "manual-backend",
  "backend": "llamacpp",
  "model_path": "/models/model.gguf",
  "port": 8081
}
```

## Backend-Specific Features

### Llama.cpp Features

#### Model Formats

- **GGUF**: Primary format, best compatibility
- **GGML**: Legacy format (limited support)

#### Quantization Levels

- `Q2_K`: Smallest size, lower quality
- `Q4_K_M`: Balanced size and quality
- `Q5_K_M`: Higher quality, larger size
- `Q6_K`: Near-original quality
- `Q8_0`: Minimal loss, largest size

#### Advanced Options

```yaml
advanced:
  rope_scaling:
    type: "linear"
    factor: 2.0
  attention:
    flash_attention: true
    grouped_query: true
```

## Monitoring Backend Performance

### Metrics Collection

Monitor backend-specific metrics:

```bash
# Get backend statistics
curl http://localhost:8080/api/instances/my-instance/backend/stats
```

**Response:**
```json
{
  "backend": "llamacpp",
  "version": "b1234",
  "metrics": {
    "tokens_per_second": 15.2,
    "memory_usage": 4294967296,
    "gpu_utilization": 85.5,
    "context_usage": 75.0
  }
}
```
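
The same endpoint is easy to consume programmatically. Below is a minimal Go sketch that fetches and decodes the response shown above; the struct fields mirror the example payload, so treat the full schema as an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// BackendStats mirrors the example response shown above.
type BackendStats struct {
	Backend string `json:"backend"`
	Version string `json:"version"`
	Metrics struct {
		TokensPerSecond float64 `json:"tokens_per_second"`
		MemoryUsage     int64   `json:"memory_usage"`
		GPUUtilization  float64 `json:"gpu_utilization"`
		ContextUsage    float64 `json:"context_usage"`
	} `json:"metrics"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/api/instances/my-instance/backend/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var stats BackendStats
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		panic(err)
	}
	fmt.Printf("%s (%s): %.1f tok/s, GPU %.1f%%\n",
		stats.Backend, stats.Version,
		stats.Metrics.TokensPerSecond, stats.Metrics.GPUUtilization)
}
```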

### Performance Optimization

#### Benchmark Different Configurations

```bash
# Test various thread counts
for threads in 2 4 8 16; do
  echo "Testing $threads threads"
  curl -X PUT http://localhost:8080/api/instances/benchmark \
    -d "{\"options\": {\"threads\": $threads}}"
  # Run performance test
done
```

#### Memory Usage Optimization

```bash
# Monitor memory usage
watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
```

## Troubleshooting Backends

### Common Llama.cpp Issues

**Model won't load:**
```bash
# Check model file
file /path/to/model.gguf

# Verify format
llama-server --model /path/to/model.gguf --dry-run
```

**GPU not detected:**
```bash
# Check CUDA installation
nvidia-smi

# Verify llama.cpp GPU support
llama-server --help | grep -i gpu
```

**Performance issues:**
```bash
# Check system resources
htop
nvidia-smi

# Verify configuration
curl http://localhost:8080/api/instances/my-instance/config
```

## Custom Backend Development

### Backend Interface

Implement the backend interface for custom backends:

```go
type Backend interface {
    Start(config InstanceConfig) error
    Stop(instance *Instance) error
    Health(instance *Instance) (*HealthStatus, error)
    Stats(instance *Instance) (*Stats, error)
}
```

### Registration

Register your custom backend:

```go
func init() {
    backends.Register("custom", &CustomBackend{})
}
```
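
Putting the interface and registration together, a skeletal custom backend might look like the sketch below. The types (`InstanceConfig`, `Instance`, `HealthStatus`, `Stats`) are the ones from the interface above, and the method bodies are placeholders, not a working backend:

```go
// CustomBackend is a minimal skeleton satisfying the Backend interface.
type CustomBackend struct{}

func (b *CustomBackend) Start(config InstanceConfig) error {
	// Launch the backing inference process, e.g. via os/exec.
	return nil
}

func (b *CustomBackend) Stop(instance *Instance) error {
	// Terminate the process and release any resources.
	return nil
}

func (b *CustomBackend) Health(instance *Instance) (*HealthStatus, error) {
	// Probe the instance, e.g. by hitting its HTTP health endpoint.
	return &HealthStatus{}, nil
}

func (b *CustomBackend) Stats(instance *Instance) (*Stats, error) {
	// Collect runtime metrics for the instance.
	return &Stats{}, nil
}
```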

## Best Practices

### Production Deployments

1. **Resource allocation**: Plan for peak usage
2. **Backend selection**: Choose based on requirements
3. **Monitoring**: Set up comprehensive monitoring
4. **Fallback**: Configure backup backends

### Development

1. **Rapid iteration**: Use smaller models
2. **Resource monitoring**: Track usage patterns
3. **Configuration testing**: Validate settings
4. **Performance profiling**: Optimize bottlenecks

## Next Steps

- Learn about [Monitoring](monitoring.md) backend performance
- Explore [Troubleshooting](troubleshooting.md) guides
- Set up [Production Monitoring](monitoring.md)

docs/advanced/monitoring.md (new file, 420 lines)
@@ -0,0 +1,420 @@
# Monitoring

Comprehensive monitoring setup for LlamaCtl in production environments.

## Overview

Effective monitoring of LlamaCtl involves tracking:

- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts

## Built-in Monitoring

### Health Checks

LlamaCtl provides built-in health monitoring:

```bash
# Check overall system health
curl http://localhost:8080/api/system/health

# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```

### Metrics Endpoint

Access Prometheus-compatible metrics:

```bash
curl http://localhost:8080/metrics
```

**Available Metrics:**

- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
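
Because the endpoint emits the standard Prometheus text format, the metrics can also be read without a Prometheus server. A minimal Go sketch that scrapes the endpoint and prints one of the gauges listed above, using naive line matching rather than a client library:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:8080/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Metric lines look like "name{labels} value"; lines starting
	// with '#' are HELP/TYPE comments and won't match the prefix.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "llamactl_instances_running") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```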

## Prometheus Integration

### Configuration

Add LlamaCtl as a Prometheus target:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Custom Metrics

Enable additional metrics in LlamaCtl:

```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```

## Grafana Dashboards

### LlamaCtl Dashboard

Import the official Grafana dashboard:

1. Download the dashboard JSON from the releases page
2. Import it into Grafana
3. Configure the Prometheus data source

### Key Panels

**Instance Overview:**

- Instance count and status
- Resource usage per instance
- Health status indicators

**Performance Metrics:**

- API response times
- Tokens per second
- Memory usage trends

**System Resources:**

- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)

### Custom Queries

**Instance Uptime:**
```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```

**Memory Usage Percentage:**
```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```

**API Error Rate:**
```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```

## Alerting

### Prometheus Alerts

Configure alerts for critical conditions:

```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```

### Notification Channels

Configure alert notifications:

**Slack Integration:**
```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```

## Log Management

### Centralized Logging

Configure log aggregation:

```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```

### Log Analysis

Use the ELK stack for log analysis:

**Elasticsearch Index Template:**
```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```

**Kibana Visualizations:**

- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns

## Application Performance Monitoring

### OpenTelemetry Integration

Enable distributed tracing:

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
  sampling_rate: 0.1
```

### Custom Spans

Add custom tracing to track operations:

```go
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Track instance startup time
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)
```

## Health Check Configuration

### Readiness Probes

Configure Kubernetes readiness probes:

```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```

### Liveness Probes

Configure liveness probes:

```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

### Custom Health Checks

Implement custom health checks:

```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```

## Performance Profiling

### pprof Integration

Enable Go profiling:

```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```

Access the profiling endpoints:

```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
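
For reference, exposing these endpoints in a Go program takes little more than a blank import of `net/http/pprof`; a minimal sketch follows (whether LlamaCtl wires it up exactly this way is an assumption):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve pprof on localhost only; profiling endpoints should never be public.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```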

### Continuous Profiling

Set up continuous profiling with Pyroscope:

```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```

## Security Monitoring

### Audit Logging

Enable security audit logs:

```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```

### Rate Limiting Monitoring

Track rate-limiting metrics:

```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```

## Troubleshooting Monitoring

### Common Issues

**Metrics not appearing:**

1. Check the Prometheus configuration
2. Verify network connectivity
3. Review LlamaCtl logs for errors

**High memory usage:**

1. Check for memory leaks in profiles
2. Monitor garbage collection metrics
3. Review instance configurations

**Alert fatigue:**

1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression

### Debug Tools

**Monitoring health:**
```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health

# Review logs
tail -f /var/log/llamactl/app.log
```

## Best Practices

### Production Monitoring

1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity and noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts

### Performance Optimization

1. **Baseline establishment**: Know normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Regular performance tuning

## Next Steps

- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)

docs/advanced/troubleshooting.md (new file, 560 lines)
@@ -0,0 +1,560 @@
# Troubleshooting

Common issues and solutions for LlamaCtl deployment and operation.

## Installation Issues

### Binary Not Found

**Problem:** `llamactl: command not found`

**Solutions:**

1. Verify the binary is in your PATH:
   ```bash
   echo $PATH
   which llamactl
   ```

2. Add it to your PATH, or use the full path:
   ```bash
   export PATH=$PATH:/path/to/llamactl
   # or
   /full/path/to/llamactl
   ```

3. Check binary permissions:
   ```bash
   chmod +x llamactl
   ```

### Permission Denied

**Problem:** Permission errors when starting LlamaCtl

**Solutions:**

1. Check file permissions:
   ```bash
   ls -la llamactl
   chmod +x llamactl
   ```

2. Verify directory permissions:
   ```bash
   # Check the models directory
   ls -la /path/to/models/

   # Create the logs directory and set ownership
   sudo mkdir -p /var/log/llamactl
   sudo chown $USER:$USER /var/log/llamactl
   ```

3. Run as an appropriate user:
   ```bash
   # Don't run as root unless necessary
   sudo -u llamactl ./llamactl
   ```

## Startup Issues

### Port Already in Use

**Problem:** `bind: address already in use`

**Solutions:**

1. Find the process using the port:
   ```bash
   sudo netstat -tulpn | grep :8080
   # or
   sudo lsof -i :8080
   ```

2. Kill the conflicting process:
   ```bash
   sudo kill -9 <PID>
   ```

3. Use a different port:
   ```bash
   llamactl --port 8081
   ```

### Configuration Errors

**Problem:** Invalid configuration preventing startup

**Solutions:**

1. Validate the configuration file:
   ```bash
   llamactl --config /path/to/config.yaml --validate
   ```

2. Check the YAML syntax:
   ```bash
   yamllint config.yaml
   ```

3. Fall back to a minimal configuration:
   ```yaml
   server:
     host: "localhost"
     port: 8080
   ```

## Instance Management Issues

### Model Loading Failures

**Problem:** Instance fails to start with model loading errors

**Diagnostic Steps:**

1. Check that the model file exists:
   ```bash
   ls -la /path/to/model.gguf
   file /path/to/model.gguf
   ```

2. Verify the model format:
   ```bash
   # Check if it's a valid GGUF file
   hexdump -C /path/to/model.gguf | head -5
   ```

3. Test with llama.cpp directly:
   ```bash
   llama-server --model /path/to/model.gguf --port 8081
   ```

**Common Solutions:**

- **Corrupted model:** Re-download the model file
- **Wrong format:** Ensure the model is in GGUF format
- **Insufficient memory:** Reduce the context size or use a smaller model
- **Path issues:** Use absolute paths and check file permissions

### Memory Issues

**Problem:** Out-of-memory errors, or the system becomes unresponsive

**Diagnostic Steps:**

1. Check system memory:
   ```bash
   free -h
   cat /proc/meminfo
   ```

2. Monitor memory usage:
   ```bash
   top -p $(pgrep llamactl)
   ```

3. Check the instance's memory requirements:
   ```bash
   curl http://localhost:8080/api/instances/{name}/stats
   ```

**Solutions:**

1. **Reduce context size:**
   ```json
   {
     "options": {
       "context_size": 1024
     }
   }
   ```

2. **Enable memory mapping:**
   ```json
   {
     "options": {
       "no_mmap": false
     }
   }
   ```

3. **Use quantized models:**
   - Try Q4_K_M instead of higher-precision models
   - Use smaller model variants (7B instead of 13B)

### GPU Issues

**Problem:** GPU not detected or not being used

**Diagnostic Steps:**

1. Check GPU availability:
   ```bash
   nvidia-smi
   ```

2. Verify the CUDA installation:
   ```bash
   nvcc --version
   ```

3. Check llama.cpp GPU support:
   ```bash
   llama-server --help | grep -i gpu
   ```

**Solutions:**

1. **Install CUDA drivers:**
   ```bash
   sudo apt update
   sudo apt install nvidia-driver-470 nvidia-cuda-toolkit
   ```

2. **Rebuild llama.cpp with GPU support:**
   ```bash
   cmake -DLLAMA_CUBLAS=ON ..
   make
   ```

3. **Configure GPU layers:**
   ```json
   {
     "options": {
       "gpu_layers": 35
     }
   }
   ```

## Performance Issues

### Slow Response Times

**Problem:** API responses are slow or time out

**Diagnostic Steps:**

1. Check API response times:
   ```bash
   time curl http://localhost:8080/api/instances
   ```

2. Monitor system resources:
   ```bash
   htop
   iotop
   ```

3. Check instance logs:
   ```bash
   curl http://localhost:8080/api/instances/{name}/logs
   ```

**Solutions:**

1. **Optimize the thread count:**
   ```json
   {
     "options": {
       "threads": 6
     }
   }
   ```

2. **Adjust the batch size:**
   ```json
   {
     "options": {
       "batch_size": 512
     }
   }
   ```

3. **Enable GPU acceleration:**
   ```json
   {
     "options": {
       "gpu_layers": 35
     }
   }
   ```

### High CPU Usage

**Problem:** LlamaCtl consuming excessive CPU

**Diagnostic Steps:**

1. Identify CPU-intensive processes:
   ```bash
   top -p $(pgrep -f llamactl)
   ```

2. Check the thread allocation:
   ```bash
   curl http://localhost:8080/api/instances/{name}/config
   ```

**Solutions:**

1. **Reduce the thread count:**
   ```json
   {
     "options": {
       "threads": 4
     }
   }
   ```

2. **Limit concurrent instances:**
   ```yaml
   limits:
     max_instances: 3
   ```

## Network Issues

### Connection Refused

**Problem:** Cannot connect to the LlamaCtl web interface

**Diagnostic Steps:**

1. Check whether the service is running:
   ```bash
   ps aux | grep llamactl
   ```

2. Verify port binding:
   ```bash
   netstat -tulpn | grep :8080
   ```

3. Test local connectivity:
   ```bash
   curl http://localhost:8080/api/health
   ```

**Solutions:**

1. **Check firewall settings:**
   ```bash
   sudo ufw status
   sudo ufw allow 8080
   ```

2. **Bind to the correct interface:**
   ```yaml
   server:
     host: "0.0.0.0"  # Instead of "localhost"
     port: 8080
   ```

### CORS Errors

**Problem:** Web UI shows CORS errors in the browser console

**Solutions:**

1. **Enable CORS in the configuration:**
   ```yaml
   server:
     cors_enabled: true
     cors_origins:
       - "http://localhost:3000"
       - "https://yourdomain.com"
   ```

2. **Use a reverse proxy:**
   ```nginx
   server {
       listen 80;
       location / {
           proxy_pass http://localhost:8080;
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
       }
   }
   ```

## Database Issues

### Startup Database Errors

**Problem:** Database connection failures on startup

**Diagnostic Steps:**

1. Check the database service:
   ```bash
   systemctl status postgresql
   # or
   systemctl status mysql
   ```

2. Test database connectivity:
   ```bash
   psql -h localhost -U llamactl -d llamactl
   ```

**Solutions:**

1. **Start the database service:**
   ```bash
   sudo systemctl start postgresql
   sudo systemctl enable postgresql
   ```

2. **Create the database and user:**
   ```sql
   CREATE DATABASE llamactl;
   CREATE USER llamactl WITH PASSWORD 'password';
   GRANT ALL PRIVILEGES ON DATABASE llamactl TO llamactl;
   ```

## Web UI Issues

### Blank Page or Loading Issues

**Problem:** Web UI doesn't load or shows a blank page

**Diagnostic Steps:**

1. Check the browser console for errors (F12)
2. Verify API connectivity:
   ```bash
   curl http://localhost:8080/api/system/status
   ```

3. Check static file serving:
   ```bash
   curl http://localhost:8080/
   ```

**Solutions:**

1. **Clear the browser cache**
2. **Try a different browser**
3. **Check for JavaScript errors in the console**
4. **Verify API endpoint accessibility**

### Authentication Issues

**Problem:** Unable to log in, or authentication failures

**Diagnostic Steps:**

1. Check the authentication configuration:
   ```bash
   curl http://localhost:8080/api/config | jq .auth
   ```

2. Verify user credentials:
   ```bash
   # Test the login endpoint
   curl -X POST http://localhost:8080/api/auth/login \
     -H "Content-Type: application/json" \
     -d '{"username":"admin","password":"password"}'
   ```

**Solutions:**

1. **Reset the admin password:**
   ```bash
   llamactl --reset-admin-password
   ```

2. **Disable authentication temporarily:**
   ```yaml
   auth:
     enabled: false
   ```

## Log Analysis

### Enable Debug Logging

For detailed troubleshooting, enable debug logging:

```yaml
logging:
  level: "debug"
  output: "/var/log/llamactl/debug.log"
```

### Key Log Patterns

Look for these patterns in the logs:

**Startup issues:**
```
ERRO Failed to start server
ERRO Database connection failed
ERRO Port binding failed
```

**Instance issues:**
```
ERRO Failed to start instance
ERRO Model loading failed
ERRO Process crashed
```

**Performance issues:**
```
WARN High memory usage detected
WARN Request timeout
WARN Resource limit exceeded
```
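
For a quick read on how often these patterns occur, a small throwaway Go program can count them. The log path matches the examples used throughout this guide:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/var/log/llamactl/app.log")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	counts := map[string]int{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Count lines by the level markers shown in the patterns above.
		for _, level := range []string{"ERRO", "WARN"} {
			if strings.Contains(scanner.Text(), level) {
				counts[level]++
			}
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("errors: %d, warnings: %d\n", counts["ERRO"], counts["WARN"])
}
```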

## Getting Help

### Collecting Information

When seeking help, provide:

1. **System information:**
   ```bash
   uname -a
   llamactl --version
   ```

2. **Configuration:**
   ```bash
   llamactl --config-dump
   ```

3. **Logs:**
   ```bash
   tail -100 /var/log/llamactl/app.log
   ```

4. **Error details:**
   - Exact error messages
   - Steps to reproduce
   - Environment details

### Support Channels

- **GitHub Issues:** Report bugs and feature requests
- **Documentation:** Check this documentation first
- **Community:** Join discussions in GitHub Discussions

## Preventive Measures

### Health Monitoring

Set up monitoring to catch issues early:

```bash
# Regular health checks (cron entry)
*/5 * * * * curl -f http://localhost:8080/api/health || alert
```

### Resource Monitoring

Monitor system resources:

```bash
# Disk space monitoring
df -h /var/log/llamactl/
df -h /path/to/models/

# Memory monitoring
free -h
```

### Backup Configuration

Take regular configuration backups:

```bash
# Back up the configuration
cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup

# Back up instance configurations
curl http://localhost:8080/api/instances > instances-backup.json
```

## Next Steps

- Set up [Monitoring](monitoring.md) to prevent issues
- Learn about [Advanced Configuration](backends.md)
- Review [Best Practices](../development/contributing.md)

docs/development/building.md (new file, 464 lines)
@@ -0,0 +1,464 @@
# Building from Source

This guide covers building LlamaCtl from source code for development and production deployment.

## Prerequisites

### Required Tools

- **Go 1.24+**: Download from [golang.org](https://golang.org/dl/)
- **Node.js 22+**: Download from [nodejs.org](https://nodejs.org/)
- **Git**: For cloning the repository
- **Make**: For build automation (optional)

### System Requirements

- **Memory**: 4GB+ RAM for building
- **Disk**: 2GB+ free space
- **OS**: Linux, macOS, or Windows

## Quick Build

### Clone and Build

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the application
go build -o llamactl cmd/server/main.go
```

### Run

```bash
./llamactl
```

## Development Build

### Setup Development Environment

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Install Go dependencies
go mod download

# Install frontend dependencies
cd webui
npm ci
cd ..
```

### Build Components

```bash
# Build the backend only
go build -o llamactl cmd/server/main.go

# Build the frontend only
cd webui
npm run build
cd ..

# Build everything
make build
```

### Development Server

```bash
# Run the backend in development mode
go run cmd/server/main.go --dev

# Run the frontend dev server (separate terminal)
cd webui
npm run dev
```

## Production Build

### Optimized Build

```bash
# Build with optimizations
go build -ldflags="-s -w" -o llamactl cmd/server/main.go

# Or use the Makefile
make build-prod
```

### Build Flags

Common build flags for production:

```bash
go build \
  -ldflags="-s -w -X main.version=1.0.0 -X main.buildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -trimpath \
  -o llamactl \
  cmd/server/main.go
```

**Flag explanations:**

- `-s`: Strip the symbol table
- `-w`: Strip debug information
- `-X`: Set variable values at build time
- `-trimpath`: Remove absolute paths from the binary
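
For `-X` to have any effect, the target package must declare matching string variables. A minimal sketch of what `cmd/server/main.go` would need; the exact variable names LlamaCtl uses are an assumption:

```go
package main

import "fmt"

// Overridden at build time via:
//   go build -ldflags="-X main.version=1.0.0 -X main.buildTime=..."
var (
	version   = "dev"
	buildTime = "unknown"
)

func main() {
	fmt.Printf("llamactl %s (built %s)\n", version, buildTime)
}
```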

## Cross-Platform Building

### Build for Multiple Platforms

```bash
# Linux AMD64
GOOS=linux GOARCH=amd64 go build -o llamactl-linux-amd64 cmd/server/main.go

# Linux ARM64
GOOS=linux GOARCH=arm64 go build -o llamactl-linux-arm64 cmd/server/main.go

# macOS AMD64
GOOS=darwin GOARCH=amd64 go build -o llamactl-darwin-amd64 cmd/server/main.go

# macOS ARM64 (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o llamactl-darwin-arm64 cmd/server/main.go

# Windows AMD64
GOOS=windows GOARCH=amd64 go build -o llamactl-windows-amd64.exe cmd/server/main.go
```

### Automated Cross-Building

Use the provided Makefile:

```bash
# Build all platforms
make build-all

# Build a specific platform
make build-linux
make build-darwin
make build-windows
```

## Build with Docker

### Development Container

```dockerfile
# Dockerfile.dev
FROM golang:1.24-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN go build -o llamactl cmd/server/main.go

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/llamactl .

EXPOSE 8080
CMD ["./llamactl"]
```

```bash
# Build the development image
docker build -f Dockerfile.dev -t llamactl:dev .

# Run the container
docker run -p 8080:8080 llamactl:dev
```

### Production Container

```dockerfile
# Dockerfile
FROM node:22-alpine AS frontend-builder

WORKDIR /app/webui
COPY webui/package*.json ./
RUN npm ci

COPY webui/ ./
RUN npm run build

FROM golang:1.24-alpine AS backend-builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
COPY --from=frontend-builder /app/webui/dist ./webui/dist

RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-s -w" \
    -o llamactl \
    cmd/server/main.go

FROM alpine:latest

RUN apk --no-cache add ca-certificates tzdata
RUN adduser -D -s /bin/sh llamactl

WORKDIR /home/llamactl
COPY --from=backend-builder /app/llamactl .
RUN chown llamactl:llamactl llamactl

USER llamactl
EXPOSE 8080

CMD ["./llamactl"]
```

## Advanced Build Options

### Static Linking

For deployments without external dependencies:

```bash
CGO_ENABLED=0 go build \
  -ldflags="-s -w -extldflags '-static'" \
  -o llamactl-static \
  cmd/server/main.go
```

### Debug Build

Build with debug information:

```bash
go build -gcflags="all=-N -l" -o llamactl-debug cmd/server/main.go
```

### Race Detection Build

Build with race detection (development only):

```bash
go build -race -o llamactl-race cmd/server/main.go
```

## Build Automation

### Makefile

```makefile
# Makefile
VERSION := $(shell git describe --tags --always --dirty)
BUILD_TIME := $(shell date -u +%Y-%m-%dT%H:%M:%SZ)
LDFLAGS := -s -w -X main.version=$(VERSION) -X main.buildTime=$(BUILD_TIME)

.PHONY: build clean test install

build:
	@echo "Building LlamaCtl..."
	@cd webui && npm run build
	@go build -ldflags="$(LDFLAGS)" -o llamactl cmd/server/main.go

build-prod:
	@echo "Building production binary..."
	@cd webui && npm run build
	@CGO_ENABLED=0 go build -ldflags="$(LDFLAGS)" -trimpath -o llamactl cmd/server/main.go

build-all: build-linux build-darwin build-windows

build-linux:
	@GOOS=linux GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-linux-amd64 cmd/server/main.go
	@GOOS=linux GOARCH=arm64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-linux-arm64 cmd/server/main.go

build-darwin:
	@GOOS=darwin GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-darwin-amd64 cmd/server/main.go
	@GOOS=darwin GOARCH=arm64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-darwin-arm64 cmd/server/main.go

build-windows:
	@GOOS=windows GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-windows-amd64.exe cmd/server/main.go

test:
	@go test ./...

clean:
	@rm -f llamactl llamactl-*
	@rm -rf dist/

install: build
	@cp llamactl $(GOPATH)/bin/llamactl
```

### GitHub Actions

```yaml
# .github/workflows/build.yml
name: Build

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.24'

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Install dependencies
        run: |
          go mod download
          cd webui && npm ci

      - name: Run tests
        run: |
          go test ./...
          cd webui && npm test

      - name: Build
        run: make build

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.24'

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Build all platforms
        run: make build-all

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: binaries
          path: dist/
```

## Build Troubleshooting

### Common Issues

**Go version mismatch:**
```bash
# Check the Go version
go version

# Update Go
# Download from https://golang.org/dl/
```

**Node.js issues:**
```bash
# Clear the npm cache
npm cache clean --force

# Remove node_modules and reinstall
rm -rf webui/node_modules
cd webui && npm ci
```

**Build failures:**
```bash
# Clean and rebuild
make clean
go mod tidy
make build
```

### Performance Issues

**Slow builds:**
```bash
# Use the build cache
export GOCACHE=$(go env GOCACHE)

# Parallel builds
export GOMAXPROCS=$(nproc)
```

**Large binary size:**
```bash
# Compress with UPX
upx --best llamactl

# Analyze binary size
go tool nm -size llamactl | head -20
```

## Deployment

### System Service

Create a systemd service:

```ini
# /etc/systemd/system/llamactl.service
[Unit]
Description=LlamaCtl Server
After=network.target

[Service]
Type=simple
User=llamactl
Group=llamactl
ExecStart=/usr/local/bin/llamactl
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

```bash
# Enable and start the service
sudo systemctl enable llamactl
sudo systemctl start llamactl
```

### Configuration

```bash
# Create the configuration directory
sudo mkdir -p /etc/llamactl

# Copy the configuration
sudo cp config.yaml /etc/llamactl/

# Set permissions
sudo chown -R llamactl:llamactl /etc/llamactl
```

## Next Steps

- Configure [Installation](../getting-started/installation.md)
- Set up [Configuration](../getting-started/configuration.md)
- Learn about [Contributing](contributing.md)
373
docs/development/contributing.md
Normal file
373
docs/development/contributing.md
Normal file
@@ -0,0 +1,373 @@
|
|||||||
|
# Contributing
|
||||||
|
|
||||||
|
Thank you for your interest in contributing to LlamaCtl! This guide will help you get started with development and contribution.
|
||||||
|
|
||||||
|
## Development Setup
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- Go 1.24 or later
|
||||||
|
- Node.js 22 or later
|
||||||
|
- `llama-server` executable (from [llama.cpp](https://github.com/ggml-org/llama.cpp))
|
||||||
|
- Git
|
||||||
|
|
||||||
|
### Getting Started
|
||||||
|
|
||||||
|
1. **Fork and Clone**
|
||||||
|
```bash
|
||||||
|
# Fork the repository on GitHub, then clone your fork
|
||||||
|
git clone https://github.com/yourusername/llamactl.git
|
||||||
|
cd llamactl
|
||||||
|
|
||||||
|
# Add upstream remote
|
||||||
|
git remote add upstream https://github.com/lordmathis/llamactl.git
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Install Dependencies**
|
||||||
|
```bash
|
||||||
|
# Go dependencies
|
||||||
|
go mod download
|
||||||
|
|
||||||
|
# Frontend dependencies
|
||||||
|
cd webui && npm ci && cd ..
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Run Development Environment**
|
||||||
|
```bash
|
||||||
|
# Start backend server
|
||||||
|
go run ./cmd/server
|
||||||
|
```
|
||||||
|
|
||||||
|
In a separate terminal:
|
||||||
|
```bash
|
||||||
|
# Start frontend dev server
|
||||||
|
cd webui && npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
## Development Workflow
|
||||||
|
|
||||||
|
### Setting Up Your Environment
|
||||||
|
|
||||||
|
1. **Configuration**
|
||||||
|
Create a development configuration file:
|
||||||
|
```yaml
|
||||||
|
# dev-config.yaml
|
||||||
|
server:
|
||||||
|
host: "localhost"
|
||||||
|
port: 8080
|
||||||
|
logging:
|
||||||
|
level: "debug"
|
||||||
|
```

2. **Test Data**

    Set up test models and instances for development.

### Making Changes

1. **Create a Branch**

    ```bash
    git checkout -b feature/your-feature-name
    ```

2. **Development Commands**

    ```bash
    # Backend
    go test ./... -v              # Run tests
    go test -race ./... -v        # Run with race detector
    go fmt ./... && go vet ./...  # Format and vet code
    go build ./cmd/server         # Build binary

    # Frontend (from webui/ directory)
    npm run test                  # Run tests
    npm run lint                  # Lint code
    npm run type-check            # TypeScript check
    npm run build                 # Build for production
    ```

3. **Code Quality**

    ```bash
    # Run all checks before committing
    make lint
    make test
    make build
    ```

## Project Structure

### Backend (Go)

```
cmd/
├── server/          # Main application entry point
pkg/
├── backends/        # Model backend implementations
├── config/          # Configuration management
├── instance/        # Instance lifecycle management
├── manager/         # Instance manager
├── server/          # HTTP server and routes
├── testutil/        # Test utilities
└── validation/      # Input validation
```

### Frontend (React/TypeScript)

```
webui/src/
├── components/      # React components
├── contexts/        # React contexts
├── hooks/           # Custom hooks
├── lib/             # Utility libraries
├── schemas/         # Zod schemas
└── types/           # TypeScript types
```

## Coding Standards

### Go Code

- Follow standard Go formatting (`gofmt`)
- Use `go vet` and address all warnings
- Write comprehensive tests for new functionality
- Include documentation comments for exported functions
- Use meaningful variable and function names

Example:
```go
// CreateInstance creates a new model instance with the given configuration.
// It validates the configuration and ensures the instance name is unique.
func (m *Manager) CreateInstance(ctx context.Context, config InstanceConfig) (*Instance, error) {
    if err := config.Validate(); err != nil {
        return nil, fmt.Errorf("invalid configuration: %w", err)
    }

    // Implementation...
}
```

### TypeScript/React Code

- Use TypeScript strict mode
- Follow React best practices
- Use functional components with hooks
- Implement proper error boundaries
- Write unit tests for components

Example:
```typescript
interface InstanceCardProps {
  instance: Instance;
  onStart: (name: string) => Promise<void>;
  onStop: (name: string) => Promise<void>;
}

export const InstanceCard: React.FC<InstanceCardProps> = ({
  instance,
  onStart,
  onStop,
}) => {
  // Implementation...
};
```

## Testing

### Backend Tests

```bash
# Run all tests
go test ./...

# Run tests with coverage
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out

# Run specific package tests
go test ./pkg/manager -v

# Run with race detection
go test -race ./...
```

### Frontend Tests

```bash
cd webui

# Run unit tests
npm run test

# Run tests with coverage
npm run test:coverage

# Run E2E tests
npm run test:e2e
```

### Integration Tests

```bash
# Run integration tests (requires llama-server)
go test ./... -tags=integration
```
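
An integration test file carries a build tag so it only compiles with `-tags=integration`. A hypothetical sketch (the test name and package are illustrative, not actual project code):

```go
//go:build integration

package manager_test

import "testing"

// TestStartRealInstance only runs with -tags=integration, since it
// needs a real llama-server binary available on PATH.
func TestStartRealInstance(t *testing.T) {
	t.Skip("sketch only: start an instance here and assert it becomes healthy")
}
```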

## Pull Request Process

### Before Submitting

1. **Update your branch**

    ```bash
    git fetch upstream
    git rebase upstream/main
    ```

2. **Run all tests**

    ```bash
    make test-all
    ```

3. **Update documentation** if needed

4. **Write clear commit messages**

    ```
    feat: add instance health monitoring

    - Implement health check endpoint
    - Add periodic health monitoring
    - Update API documentation

    Fixes #123
    ```

### Submitting a PR

1. **Push your branch**

    ```bash
    git push origin feature/your-feature-name
    ```

2. **Create Pull Request**

    - Use the PR template
    - Provide a clear description
    - Link related issues
    - Add screenshots for UI changes

3. **PR Review Process**

    - Automated checks must pass
    - Code review by maintainers
    - Address feedback promptly
    - Keep PR scope focused

## Issue Guidelines

### Reporting Bugs

Use the bug report template and include:

- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Go version, etc.)
- Relevant logs or error messages
- Minimal reproduction case

### Feature Requests

Use the feature request template and include:

- Clear description of the problem
- Proposed solution
- Alternative solutions considered
- Implementation complexity estimate

### Security Issues

For security vulnerabilities:

- Do NOT create public issues
- Email security@llamactl.dev
- Provide a detailed description
- Allow time for a fix before disclosure

## Development Best Practices

### API Design

- Follow REST principles
- Use consistent naming conventions
- Provide comprehensive error messages
- Include proper HTTP status codes
- Document all endpoints

### Error Handling

```go
// Wrap errors with context
if err := instance.Start(); err != nil {
    return fmt.Errorf("failed to start instance %s: %w", instance.Name, err)
}

// Use structured logging
log.WithFields(log.Fields{
    "instance": instance.Name,
    "error":    err,
}).Error("Failed to start instance")
```

### Configuration

- Use environment variables for deployment
- Provide sensible defaults
- Validate configuration on startup (see the sketch below)
- Support configuration file reloading
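
As a sketch of the "defaults + validate on startup" pattern, here is a minimal loader for the `LLAMACTL_PORT` variable from the Configuration guide (hypothetical helper, not actual project code):

```go
package config

import (
	"fmt"
	"os"
	"strconv"
)

// PortFromEnv reads LLAMACTL_PORT, applies a sensible default,
// and fails fast on invalid values.
func PortFromEnv() (int, error) {
	port := 8080 // default documented in the Configuration guide
	if v := os.Getenv("LLAMACTL_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil {
			return 0, fmt.Errorf("invalid LLAMACTL_PORT %q: %w", v, err)
		}
		port = p
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("LLAMACTL_PORT %d out of range", port)
	}
	return port, nil
}
```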

### Performance

- Profile code for bottlenecks
- Use efficient data structures
- Implement proper caching
- Monitor resource usage

## Release Process

### Version Management

- Use semantic versioning (SemVer)
- Tag releases properly
- Maintain CHANGELOG.md
- Create release notes

### Building Releases

```bash
# Build all platforms
make build-all

# Create release package
make package
```

## Getting Help

### Communication Channels

- **GitHub Issues**: Bug reports and feature requests
- **GitHub Discussions**: General questions and ideas
- **Code Review**: PR comments and feedback

### Development Questions

When asking for help:

1. Check existing documentation
2. Search previous issues
3. Provide a minimal reproduction case
4. Include relevant environment details

## Recognition

Contributors are recognized in:

- CONTRIBUTORS.md file
- Release notes
- Documentation credits
- Annual contributor highlights

Thank you for contributing to LlamaCtl!
154
docs/getting-started/configuration.md
Normal file
@@ -0,0 +1,154 @@

# Configuration

LlamaCtl can be configured through various methods to suit your needs.

## Configuration File

Create a configuration file at `~/.llamactl/config.yaml`:

```yaml
# Server configuration
server:
  host: "0.0.0.0"
  port: 8080
  cors_enabled: true

# Authentication (optional)
auth:
  enabled: false
  # When enabled, configure your authentication method
  # jwt_secret: "your-secret-key"

# Default instance settings
defaults:
  backend: "llamacpp"
  timeout: 300
  log_level: "info"

# Paths
paths:
  models_dir: "/path/to/your/models"
  logs_dir: "/var/log/llamactl"
  data_dir: "/var/lib/llamactl"

# Instance limits
limits:
  max_instances: 10
  max_memory_per_instance: "8GB"
```

## Environment Variables

You can also configure LlamaCtl using environment variables:

```bash
# Server settings
export LLAMACTL_HOST=0.0.0.0
export LLAMACTL_PORT=8080

# Paths
export LLAMACTL_MODELS_DIR=/path/to/models
export LLAMACTL_LOGS_DIR=/var/log/llamactl

# Limits
export LLAMACTL_MAX_INSTANCES=5
```

## Command Line Options

View all available command line options:

```bash
llamactl --help
```

Common options:

```bash
# Specify config file
llamactl --config /path/to/config.yaml

# Set log level
llamactl --log-level debug

# Run on a different port
llamactl --port 9090
```

## Instance Configuration

When creating instances, you can specify various options:

### Basic Options

- `name`: Unique identifier for the instance
- `model_path`: Path to the GGUF model file
- `port`: Port for the instance to listen on

### Advanced Options

- `threads`: Number of CPU threads to use
- `context_size`: Context window size
- `batch_size`: Batch size for processing
- `gpu_layers`: Number of layers to offload to GPU
- `memory_lock`: Lock the model in memory
- `no_mmap`: Disable memory mapping

### Example Instance Configuration

```json
{
  "name": "production-model",
  "model_path": "/models/llama-2-13b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 8,
    "context_size": 4096,
    "batch_size": 512,
    "gpu_layers": 35,
    "memory_lock": true
  }
}
```

## Security Configuration

### Enable Authentication

To enable authentication, update your config file:

```yaml
auth:
  enabled: true
  jwt_secret: "your-very-secure-secret-key"
  token_expiry: "24h"
```

### HTTPS Configuration

For production deployments, configure HTTPS:

```yaml
server:
  tls:
    enabled: true
    cert_file: "/path/to/cert.pem"
    key_file: "/path/to/key.pem"
```

## Logging Configuration

Configure logging levels and outputs:

```yaml
logging:
  level: "info"        # debug, info, warn, error
  format: "json"       # json or text
  output: "/var/log/llamactl/app.log"
```

## Next Steps

- Learn about [Managing Instances](../user-guide/managing-instances.md)
- Explore advanced [Backends](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md)
55
docs/getting-started/installation.md
Normal file
@@ -0,0 +1,55 @@

# Installation

This guide will walk you through installing LlamaCtl on your system.

## Prerequisites

Before installing LlamaCtl, ensure you have:

- Go 1.24 or later (only needed when building from source)
- Git
- Sufficient disk space for your models

## Installation Methods

### Option 1: Download Binary (Recommended)

Download the latest release from our [GitHub releases page](https://github.com/lordmathis/llamactl/releases):

```bash
# Download for Linux
curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-linux-amd64 -o llamactl

# Make executable
chmod +x llamactl

# Move to PATH (optional)
sudo mv llamactl /usr/local/bin/
```

### Option 2: Build from Source

If you prefer to build from source:

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the application
go build -o llamactl cmd/server/main.go
```

For detailed build instructions, see the [Building from Source](../development/building.md) guide.

## Verification

Verify your installation by checking the version:

```bash
llamactl --version
```

## Next Steps

Now that LlamaCtl is installed, continue to the [Quick Start](quick-start.md) guide to get your first instance running!
86
docs/getting-started/quick-start.md
Normal file
@@ -0,0 +1,86 @@

# Quick Start

This guide will help you get LlamaCtl up and running in just a few minutes.

## Step 1: Start LlamaCtl

Start the LlamaCtl server:

```bash
llamactl
```

By default, LlamaCtl will start on `http://localhost:8080`.

## Step 2: Access the Web UI

Open your web browser and navigate to:

```
http://localhost:8080
```

You should see the LlamaCtl web interface.

## Step 3: Create Your First Instance

1. Click the "Add Instance" button
2. Fill in the instance configuration:
    - **Name**: Give your instance a descriptive name
    - **Model Path**: Path to your Llama.cpp model file
    - **Port**: Port for the instance to run on
    - **Additional Options**: Any extra Llama.cpp parameters

3. Click "Create Instance"

## Step 4: Start Your Instance

Once created, you can:

- **Start** the instance by clicking the start button
- **Monitor** its status in real-time
- **View logs** by clicking the logs button
- **Stop** the instance when needed

## Example Configuration

Here's a basic example configuration for a Llama 2 model:

```json
{
  "name": "llama2-7b",
  "model_path": "/path/to/llama-2-7b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048
  }
}
```

## Using the API

You can also manage instances via the REST API:

```bash
# List all instances
curl http://localhost:8080/api/instances

# Create a new instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-model",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'

# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
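
Once the instance is running, you can also talk to the underlying `llama-server` directly on the instance port (assuming the default llama.cpp HTTP API):

```bash
# Send a completion request to the instance started above
curl -X POST http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 32}'
```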

## Next Steps

- Learn more about the [Web UI](../user-guide/web-ui.md)
- Explore the [API Reference](../user-guide/api-reference.md)
- Configure advanced settings in the [Configuration](configuration.md) guide
41
docs/index.md
Normal file
@@ -0,0 +1,41 @@

# LlamaCtl Documentation

Welcome to the LlamaCtl documentation! LlamaCtl is a powerful management tool for Llama.cpp instances that provides both a web interface and REST API for managing large language models.

## What is LlamaCtl?

LlamaCtl is designed to simplify the deployment and management of Llama.cpp instances. It provides:

- **Instance Management**: Start, stop, and monitor multiple Llama.cpp instances
- **Web UI**: User-friendly interface for managing your models
- **REST API**: Programmatic access to all functionality
- **Health Monitoring**: Real-time status and health checks
- **Configuration Management**: Easy setup and configuration options

## Key Features

- 🚀 **Easy Setup**: Quick installation and configuration
- 🌐 **Web Interface**: Intuitive web UI for model management
- 🔧 **REST API**: Full API access for automation
- 📊 **Monitoring**: Real-time health and status monitoring
- 🔒 **Security**: Authentication and access control
- 📱 **Responsive**: Works on desktop and mobile devices

## Quick Links

- [Installation Guide](getting-started/installation.md) - Get LlamaCtl up and running
- [Quick Start](getting-started/quick-start.md) - Your first steps with LlamaCtl
- [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
- [API Reference](user-guide/api-reference.md) - Complete API documentation

## Getting Help

If you need help or have questions:

- Check the [Troubleshooting](advanced/troubleshooting.md) guide
- Visit our [GitHub repository](https://github.com/lordmathis/llamactl)
- Read the [Contributing guide](development/contributing.md) to help improve LlamaCtl

---

Ready to get started? Head over to the [Installation Guide](getting-started/installation.md)!
470
docs/user-guide/api-reference.md
Normal file
@@ -0,0 +1,470 @@

# API Reference

Complete reference for the LlamaCtl REST API.

## Base URL

All API endpoints are relative to the base URL:

```
http://localhost:8080/api
```

## Authentication

If authentication is enabled, include the JWT token in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-jwt-token>" \
  http://localhost:8080/api/instances
```

## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/instances
```

**Response:**
```json
{
  "instances": [
    {
      "name": "llama2-7b",
      "status": "running",
      "model_path": "/models/llama-2-7b.gguf",
      "port": 8081,
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T12:45:00Z"
    }
  ]
}
```

### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/instances/{name}
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "model_path": "/models/llama-2-7b.gguf",
  "port": 8081,
  "pid": 12345,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  },
  "stats": {
    "memory_usage": 4294967296,
    "cpu_usage": 25.5,
    "uptime": 3600
  },
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T12:45:00Z"
}
```

### Create Instance

Create a new instance.

```http
POST /api/instances
```

**Request Body:**
```json
{
  "name": "my-instance",
  "model_path": "/path/to/model.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  }
}
```

**Response:**
```json
{
  "message": "Instance created successfully",
  "instance": {
    "name": "my-instance",
    "status": "stopped",
    "model_path": "/path/to/model.gguf",
    "port": 8081,
    "created_at": "2024-01-15T14:30:00Z"
  }
}
```

### Update Instance

Update an existing instance configuration.

```http
PUT /api/instances/{name}
```

**Request Body:**
```json
{
  "options": {
    "threads": 8,
    "context_size": 4096
  }
}
```

### Delete Instance

Delete an instance (must be stopped first).

```http
DELETE /api/instances/{name}
```

**Response:**
```json
{
  "message": "Instance deleted successfully"
}
```

## Instance Operations

### Start Instance

Start a stopped instance.

```http
POST /api/instances/{name}/start
```

**Response:**
```json
{
  "message": "Instance start initiated",
  "status": "starting"
}
```

### Stop Instance

Stop a running instance.

```http
POST /api/instances/{name}/stop
```

**Request Body (Optional):**
```json
{
  "force": false,
  "timeout": 30
}
```

**Response:**
```json
{
  "message": "Instance stop initiated",
  "status": "stopping"
}
```
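
For example, to force a stop with a shorter grace period:

```bash
# Force-stop the instance after a 10 second timeout
curl -X POST http://localhost:8080/api/instances/llama2-7b/stop \
  -H "Content-Type: application/json" \
  -d '{"force": true, "timeout": 10}'
```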

### Restart Instance

Restart an instance (stop then start).

```http
POST /api/instances/{name}/restart
```

### Get Instance Health

Check instance health status.

```http
GET /api/instances/{name}/health
```

**Response:**
```json
{
  "status": "healthy",
  "checks": {
    "process": "running",
    "port": "open",
    "response": "ok"
  },
  "last_check": "2024-01-15T14:30:00Z"
}
```

### Get Instance Logs

Retrieve instance logs.

```http
GET /api/instances/{name}/logs
```

**Query Parameters:**

- `lines`: Number of lines to return (default: 100)
- `follow`: Stream logs (boolean)
- `level`: Filter by log level (debug, info, warn, error)

**Response:**
```json
{
  "logs": [
    {
      "timestamp": "2024-01-15T14:30:00Z",
      "level": "info",
      "message": "Model loaded successfully"
    }
  ]
}
```
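
For example, to fetch the 50 most recent error-level entries:

```bash
curl "http://localhost:8080/api/instances/llama2-7b/logs?lines=50&level=error"
```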

## Batch Operations

### Start All Instances

Start all stopped instances.

```http
POST /api/instances/start-all
```

### Stop All Instances

Stop all running instances.

```http
POST /api/instances/stop-all
```

## System Information

### Get System Status

Get overall system status and metrics.

```http
GET /api/system/status
```

**Response:**
```json
{
  "version": "1.0.0",
  "uptime": 86400,
  "instances": {
    "total": 5,
    "running": 3,
    "stopped": 2
  },
  "resources": {
    "cpu_usage": 45.2,
    "memory_usage": 8589934592,
    "memory_total": 17179869184,
    "disk_usage": 75.5
  }
}
```

### Get System Information

Get detailed system information.

```http
GET /api/system/info
```

**Response:**
```json
{
  "hostname": "server-01",
  "os": "linux",
  "arch": "amd64",
  "cpu_count": 8,
  "memory_total": 17179869184,
  "version": "1.0.0",
  "build_time": "2024-01-15T10:00:00Z"
}
```

## Configuration

### Get Configuration

Get the current LlamaCtl configuration.

```http
GET /api/config
```

### Update Configuration

Update the LlamaCtl configuration (requires restart).

```http
PUT /api/config
```

## Authentication Endpoints

### Login

Authenticate and receive a JWT token.

```http
POST /api/auth/login
```

**Request Body:**
```json
{
  "username": "admin",
  "password": "password"
}
```

**Response:**
```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2024-01-16T14:30:00Z"
}
```

### Refresh Token

Refresh an existing JWT token.

```http
POST /api/auth/refresh
```

## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message",
  "code": "ERROR_CODE",
  "details": "Additional error details"
}
```

### Common HTTP Status Codes

- `200`: Success
- `201`: Created
- `400`: Bad Request
- `401`: Unauthorized
- `403`: Forbidden
- `404`: Not Found
- `409`: Conflict (e.g., instance already exists)
- `500`: Internal Server Error

## WebSocket API

### Real-time Updates

Connect to the WebSocket endpoint for real-time updates:

```javascript
const ws = new WebSocket('ws://localhost:8080/api/ws');

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Update:', data);
};
```

**Message Types:**

- `instance_status_changed`: Instance status updates
- `instance_stats_updated`: Resource usage updates
- `system_alert`: System-level alerts
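
The exact message schema isn't specified here, but a status update might carry the instance fields shown elsewhere in this reference (hypothetical shape, for illustration only):

```json
{
  "type": "instance_status_changed",
  "data": {
    "name": "llama2-7b",
    "status": "running"
  }
}
```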

## Rate Limiting

API requests are rate limited to:

- **100 requests per minute** for regular endpoints
- **10 requests per minute** for resource-intensive operations

Rate limit headers are included in responses:

- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset time (Unix timestamp)
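
You can inspect these headers on any response:

```bash
# -i includes response headers in the output
curl -i http://localhost:8080/api/instances | grep -i '^x-ratelimit'
```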

## SDKs and Libraries

### Go Client

```go
import "github.com/lordmathis/llamactl-go-client"

client := llamactl.NewClient("http://localhost:8080")
instances, err := client.ListInstances()
```

### Python Client

```python
from llamactl import Client

client = Client("http://localhost:8080")
instances = client.list_instances()
```

## Examples

### Complete Instance Lifecycle

```bash
# Create instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example",
    "model_path": "/models/example.gguf",
    "port": 8081
  }'

# Start instance
curl -X POST http://localhost:8080/api/instances/example/start

# Check status
curl http://localhost:8080/api/instances/example

# Stop instance
curl -X POST http://localhost:8080/api/instances/example/stop

# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example
```

## Next Steps

- Learn about [Managing Instances](managing-instances.md) in detail
- Explore advanced [Backends](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use
171
docs/user-guide/managing-instances.md
Normal file
@@ -0,0 +1,171 @@

# Managing Instances

Learn how to effectively manage your Llama.cpp instances with LlamaCtl.

## Instance Lifecycle

### Creating Instances

Instances can be created through the Web UI or API:

#### Via Web UI

1. Click the "Add Instance" button
2. Fill in the configuration form
3. Click "Create"

#### Via API

```bash
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'
```

### Starting and Stopping

#### Start an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start

# The instance will begin loading the model
```

#### Stop an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop

# Graceful shutdown with configurable timeout
```

### Monitoring Status

Check instance status in real-time:

```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}

# Get health status
curl http://localhost:8080/api/instances/{name}/health
```

## Instance States

Instances can be in one of several states:

- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error

## Configuration Management

### Updating Instance Configuration

Modify instance settings:

```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'
```

!!! note
    Configuration changes require restarting the instance to take effect.
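
To apply the changes, restart the instance using the restart endpoint from the [API Reference](api-reference.md):

```bash
curl -X POST http://localhost:8080/api/instances/{name}/restart
```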

### Viewing Configuration

```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```

## Resource Management

### Memory Usage

Monitor memory consumption:

```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```

### CPU and GPU Usage

Track performance metrics:

- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times

## Troubleshooting Common Issues

### Instance Won't Start

1. **Check model path**: Ensure the model file exists and is readable
2. **Port conflicts**: Verify the port isn't already in use (see the check below)
3. **Resource limits**: Check available memory and CPU
4. **Permissions**: Ensure proper file system permissions
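
A quick way to check for a port conflict (assuming the instance port is 8081):

```bash
# List listeners on the instance port; any output means the port is taken
ss -tlnp | grep 8081
```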

### Performance Issues

1. **Adjust thread count**: Match to your CPU cores
2. **Optimize context size**: Balance memory usage and capability
3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
4. **Batch size tuning**: Optimize for your workload

### Memory Problems

1. **Reduce context size**: Lower memory requirements
2. **Disable memory mapping**: Use the `no_mmap` option
3. **Enable memory locking**: Use `memory_lock` for performance
4. **Monitor system resources**: Check available RAM

## Best Practices

### Production Deployments

1. **Resource allocation**: Plan memory and CPU requirements
2. **Health monitoring**: Set up regular health checks
3. **Graceful shutdowns**: Use proper stop procedures
4. **Backup configurations**: Save instance configurations
5. **Log management**: Configure appropriate logging levels

### Development Environments

1. **Resource sharing**: Use smaller models for development
2. **Quick iterations**: Optimize for fast startup times
3. **Debug logging**: Enable detailed logging for troubleshooting

## Batch Operations

### Managing Multiple Instances

```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances
```

## Next Steps

- Learn about the [Web UI](web-ui.md) interface
- Explore the complete [API Reference](api-reference.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use
216
docs/user-guide/web-ui.md
Normal file
@@ -0,0 +1,216 @@

# Web UI Guide

The LlamaCtl Web UI provides an intuitive interface for managing your Llama.cpp instances.

## Overview

The web interface is accessible at `http://localhost:8080` (or your configured host/port) and provides:

- Instance management dashboard
- Real-time status monitoring
- Configuration management
- Log viewing
- System information

## Dashboard

### Instance Cards

Each instance is displayed as a card showing:

- **Instance name** and status indicator
- **Model information** (name, size)
- **Current state** (stopped, starting, running, error)
- **Resource usage** (memory, CPU)
- **Action buttons** (start, stop, configure, logs)

### Status Indicators

- 🟢 **Green**: Instance is running and healthy
- 🟡 **Yellow**: Instance is starting or stopping
- 🔴 **Red**: Instance has encountered an error
- ⚪ **Gray**: Instance is stopped

## Creating Instances

### Add Instance Dialog

1. Click the **"Add Instance"** button
2. Fill in the required fields:
    - **Name**: Unique identifier for your instance
    - **Model Path**: Full path to your GGUF model file
    - **Port**: Port number for the instance

3. Configure optional settings:
    - **Threads**: Number of CPU threads
    - **Context Size**: Context window size
    - **GPU Layers**: Layers to offload to GPU
    - **Additional Options**: Advanced Llama.cpp parameters

4. Click **"Create"** to save the instance

### Model Path Helper

Use the file browser to select model files:

- Navigate to your models directory
- Select the `.gguf` file
- The path is automatically filled in the form

## Managing Instances

### Starting Instances

1. Click the **"Start"** button on an instance card
2. Watch the status change to "Starting"
3. Monitor progress in the logs
4. The instance becomes "Running" when ready

### Stopping Instances

1. Click the **"Stop"** button
2. The instance gracefully shuts down
3. Status changes to "Stopped"

### Viewing Logs

1. Click the **"Logs"** button on any instance
2. A real-time log viewer opens
3. Filter by log level (Debug, Info, Warning, Error)
4. Search through log entries
5. Download logs for offline analysis

## Configuration Management

### Editing Instance Settings

1. Click the **"Configure"** button
2. Modify settings in the configuration dialog
3. Changes require an instance restart to take effect
4. Click **"Save"** to apply changes

### Advanced Options

Access advanced Llama.cpp options:

```yaml
# Example advanced configuration
options:
  rope_freq_base: 10000
  rope_freq_scale: 1.0
  yarn_ext_factor: -1.0
  yarn_attn_factor: 1.0
  yarn_beta_fast: 32.0
  yarn_beta_slow: 1.0
```

## System Information

### Health Dashboard

Monitor overall system health:

- **System Resources**: CPU, memory, disk usage
- **Instance Summary**: Running/stopped instance counts
- **Performance Metrics**: Request rates, response times

### Resource Usage

Track resource consumption:

- Per-instance memory usage
- CPU utilization
- GPU memory (if applicable)
- Network I/O

## User Interface Features

### Theme Support

Switch between light and dark themes:

1. Click the theme toggle button
2. The setting is remembered across sessions

### Responsive Design

The UI adapts to different screen sizes:

- **Desktop**: Full-featured dashboard
- **Tablet**: Condensed layout
- **Mobile**: Stack-based navigation

### Keyboard Shortcuts

- `Ctrl+N`: Create new instance
- `Ctrl+R`: Refresh dashboard
- `Ctrl+L`: Open logs for selected instance
- `Esc`: Close dialogs

## Authentication

### Login

If authentication is enabled:

1. Navigate to the web UI
2. Enter your credentials
3. A JWT token is stored for the session
4. Automatic logout on token expiry

### Session Management

- Sessions persist across browser restarts
- Logout clears authentication tokens
- Configurable session timeout

## Troubleshooting

### Common UI Issues

**Page won't load:**

- Check if the LlamaCtl server is running
- Verify the correct URL and port
- Check the browser console for errors

**Instance won't start from the UI:**

- Verify the model path is correct
- Check for port conflicts
- Review instance logs for errors

**Real-time updates not working:**

- Check the WebSocket connection
- Verify firewall settings
- Try refreshing the page

### Browser Compatibility

Supported browsers:

- Chrome/Chromium 90+
- Firefox 88+
- Safari 14+
- Edge 90+

## Mobile Access

### Responsive Features

On mobile devices:

- Touch-friendly interface
- Swipe gestures for navigation
- Optimized button sizes
- Condensed information display

### Limitations

Some features may be limited on mobile:

- Log viewing (use horizontal scrolling)
- Complex configuration forms
- File browser functionality

## Next Steps

- Learn about the [API Reference](api-reference.md) for programmatic access
- Set up [Monitoring](../advanced/monitoring.md) for production use
- Explore advanced [Backends](../advanced/backends.md) options
75
mkdocs.yml
Normal file
@@ -0,0 +1,75 @@

site_name: LlamaCtl Documentation
site_description: User documentation for LlamaCtl - A management tool for Llama.cpp instances
site_author: LlamaCtl Team
site_url: https://llamactl.org

repo_name: lordmathis/llamactl
repo_url: https://github.com/lordmathis/llamactl

theme:
  name: material
  palette:
    # Palette toggle for light mode
    - scheme: default
      primary: indigo
      accent: indigo
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    # Palette toggle for dark mode
    - scheme: slate
      primary: indigo
      accent: indigo
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
  features:
    - navigation.tabs
    - navigation.sections
    - navigation.expand
    - navigation.top
    - search.highlight
    - search.share
    - content.code.copy

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.superfences
  - admonition
  - pymdownx.details
  - pymdownx.tabbed:
      alternate_style: true
  - attr_list
  - md_in_html
  - toc:
      permalink: true

nav:
  - Home: index.md
  - Getting Started:
    - Installation: getting-started/installation.md
    - Quick Start: getting-started/quick-start.md
    - Configuration: getting-started/configuration.md
  - User Guide:
    - Managing Instances: user-guide/managing-instances.md
    - Web UI: user-guide/web-ui.md
    - API Reference: user-guide/api-reference.md
  - Advanced:
    - Backends: advanced/backends.md
    - Monitoring: advanced/monitoring.md
    - Troubleshooting: advanced/troubleshooting.md
  - Development:
    - Contributing: development/contributing.md
    - Building from Source: development/building.md

plugins:
  - search
  - git-revision-date-localized

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/lordmathis/llamactl