Create initial documentation structure

2025-08-31 14:27:00 +02:00
parent 7675271370
commit bd31c03f4a
16 changed files with 3514 additions and 0 deletions

docs/advanced/backends.md Normal file

@@ -0,0 +1,316 @@
# Backends
LlamaCtl supports multiple backends for running large language models. This guide covers the available backends and their configuration.
## Llama.cpp Backend
The primary backend for LlamaCtl, providing robust support for GGUF models.
### Features
- **GGUF Support**: Native support for GGUF model format
- **GPU Acceleration**: CUDA, OpenCL, and Metal support
- **Memory Optimization**: Efficient memory usage and mapping
- **Multi-threading**: Configurable CPU thread utilization
- **Quantization**: Support for various quantization levels
### Configuration
```yaml
backends:
  llamacpp:
    binary_path: "/usr/local/bin/llama-server"
    default_options:
      threads: 4
      context_size: 2048
      batch_size: 512
    gpu:
      enabled: true
      layers: 35
```
### Supported Options
| Option | Description | Default |
|--------|-------------|---------|
| `threads` | Number of CPU threads | 4 |
| `context_size` | Context window size | 2048 |
| `batch_size` | Batch size for processing | 512 |
| `gpu_layers` | Layers to offload to GPU | 0 |
| `memory_lock` | Lock model in memory | false |
| `no_mmap` | Disable memory mapping | false |
| `rope_freq_base` | RoPE frequency base | 10000 |
| `rope_freq_scale` | RoPE frequency scale | 1.0 |
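As a concrete example, the Go sketch below applies several of these options to an instance through the HTTP API. It is a minimal sketch only: the `PUT /api/instances/{name}` route mirrors the API calls shown later in this guide, and the model path, port, and option values are illustrative assumptions rather than recommended settings.
```go
package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // Option names follow the table above; the values are illustrative.
    payload := map[string]any{
        "model_path": "/models/llama-2-7b.gguf", // placeholder path
        "port":       8081,
        "options": map[string]any{
            "threads":      8,
            "context_size": 4096,
            "gpu_layers":   35,
            "memory_lock":  true,
        },
    }
    body, err := json.Marshal(payload)
    if err != nil {
        panic(err)
    }

    // Assumed endpoint: create or update the instance named "my-instance".
    req, err := http.NewRequest(http.MethodPut,
        "http://localhost:8080/api/instances/my-instance", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/json")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}
```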
### GPU Acceleration
#### CUDA Setup
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit
# Verify CUDA installation
nvcc --version
nvidia-smi
```
#### Configuration for GPU
```json
{
"name": "gpu-accelerated",
"model_path": "/models/llama-2-13b.gguf",
"port": 8081,
"options": {
"gpu_layers": 35,
"threads": 2,
"context_size": 4096
}
}
```
### Performance Tuning
#### Memory Optimization
```yaml
# For limited memory systems
options:
  context_size: 1024
  batch_size: 256
  no_mmap: true
  memory_lock: false

# For high-memory systems
options:
  context_size: 8192
  batch_size: 1024
  memory_lock: true
  no_mmap: false
```
#### CPU Optimization
```yaml
# Match thread count to CPU cores
# For an 8-core CPU:
options:
  threads: 6  # Leave 2 cores for the system

# For high-performance CPUs:
options:
  threads: 16
  batch_size: 1024
```
## Future Backends
LlamaCtl is designed to support multiple backends. Planned additions:
### vLLM Backend
High-performance inference engine optimized for serving:
- **Features**: Fast inference, batching, streaming
- **Models**: Supports various model formats
- **Scaling**: Horizontal scaling support
### TensorRT-LLM Backend
NVIDIA's optimized inference engine:
- **Features**: Maximum GPU performance
- **Models**: Optimized for NVIDIA GPUs
- **Deployment**: Production-ready inference
### Ollama Backend
Integration with Ollama for easy model management:
- **Features**: Simplified model downloading
- **Models**: Large model library
- **Integration**: Seamless model switching
## Backend Selection
### Automatic Detection
LlamaCtl can automatically detect the best backend:
```yaml
backends:
  auto_detect: true
  preference_order:
    - "llamacpp"
    - "vllm"
    - "tensorrt"
```
### Manual Selection
Force a specific backend for an instance:
```json
{
"name": "manual-backend",
"backend": "llamacpp",
"model_path": "/models/model.gguf",
"port": 8081
}
```
## Backend-Specific Features
### Llama.cpp Features
#### Model Formats
- **GGUF**: Primary format, best compatibility
- **GGML**: Legacy format (limited support)
#### Quantization Levels
- `Q2_K`: Smallest size, lower quality
- `Q4_K_M`: Balanced size and quality
- `Q5_K_M`: Higher quality, larger size
- `Q6_K`: Near-original quality
- `Q8_0`: Minimal loss, largest size
#### Advanced Options
```yaml
advanced:
  rope_scaling:
    type: "linear"
    factor: 2.0
  attention:
    flash_attention: true
    grouped_query: true
```
## Monitoring Backend Performance
### Metrics Collection
Monitor backend-specific metrics:
```bash
# Get backend statistics
curl http://localhost:8080/api/instances/my-instance/backend/stats
```
**Response:**
```json
{
"backend": "llamacpp",
"version": "b1234",
"metrics": {
"tokens_per_second": 15.2,
"memory_usage": 4294967296,
"gpu_utilization": 85.5,
"context_usage": 75.0
}
}
```
### Performance Optimization
#### Benchmark Different Configurations
```bash
# Test various thread counts
for threads in 2 4 8 16; do
  echo "Testing $threads threads"
  curl -X PUT http://localhost:8080/api/instances/benchmark \
    -H "Content-Type: application/json" \
    -d "{\"options\": {\"threads\": $threads}}"
  # Run your performance test here
done
```
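To compare the runs, you can read `tokens_per_second` from the backend stats endpoint shown above after each configuration change. The Go sketch below assumes that endpoint and the response fields from the earlier example; it is not a complete benchmark harness.
```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// statsResponse mirrors the example stats payload shown earlier in this guide.
type statsResponse struct {
    Backend string `json:"backend"`
    Metrics struct {
        TokensPerSecond float64 `json:"tokens_per_second"`
        MemoryUsage     int64   `json:"memory_usage"`
    } `json:"metrics"`
}

func main() {
    // Assumed endpoint; "benchmark" matches the instance updated in the loop above.
    resp, err := http.Get("http://localhost:8080/api/instances/benchmark/backend/stats")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var stats statsResponse
    if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
        panic(err)
    }
    fmt.Printf("backend=%s tokens/s=%.1f memory=%d bytes\n",
        stats.Backend, stats.Metrics.TokensPerSecond, stats.Metrics.MemoryUsage)
}
```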
#### Memory Usage Optimization
```bash
# Monitor memory usage
watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
```
## Troubleshooting Backends
### Common Llama.cpp Issues
**Model won't load:**
```bash
# Check model file
file /path/to/model.gguf
# Verify format
llama-server --model /path/to/model.gguf --dry-run
```
**GPU not detected:**
```bash
# Check CUDA installation
nvidia-smi
# Verify llama.cpp GPU support
llama-server --help | grep -i gpu
```
**Performance issues:**
```bash
# Check system resources
htop
nvidia-smi
# Verify configuration
curl http://localhost:8080/api/instances/my-instance/config
```
## Custom Backend Development
### Backend Interface
Implement the backend interface for custom backends:
```go
type Backend interface {
    Start(config InstanceConfig) error
    Stop(instance *Instance) error
    Health(instance *Instance) (*HealthStatus, error)
    Stats(instance *Instance) (*Stats, error)
}
```
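A minimal sketch of what an implementation might look like, assuming the `InstanceConfig`, `Instance`, `HealthStatus`, and `Stats` types referenced above; the field names, binary path, and health endpoint are illustrative assumptions, not the actual LlamaCtl types:
```go
import (
    "fmt"
    "net/http"
    "os/exec"
)

// CustomBackend is a hypothetical Backend implementation that launches one
// external server process per instance.
type CustomBackend struct {
    processes map[string]*exec.Cmd // keyed by instance name
}

func NewCustomBackend() *CustomBackend {
    return &CustomBackend{processes: make(map[string]*exec.Cmd)}
}

func (b *CustomBackend) Start(config InstanceConfig) error {
    // Binary path, flags, and field names are assumptions for illustration only.
    cmd := exec.Command("/usr/local/bin/custom-server",
        "--model", config.ModelPath,
        "--port", fmt.Sprintf("%d", config.Port))
    if err := cmd.Start(); err != nil {
        return fmt.Errorf("starting custom backend: %w", err)
    }
    b.processes[config.Name] = cmd
    return nil
}

func (b *CustomBackend) Stop(instance *Instance) error {
    if cmd, ok := b.processes[instance.Name]; ok {
        delete(b.processes, instance.Name)
        return cmd.Process.Kill()
    }
    return nil
}

func (b *CustomBackend) Health(instance *Instance) (*HealthStatus, error) {
    // Probe the instance over HTTP; the /health path is an assumption.
    resp, err := http.Get(fmt.Sprintf("http://localhost:%d/health", instance.Port))
    if err != nil {
        return nil, err
    }
    defer resp.Body.Close()
    return &HealthStatus{Healthy: resp.StatusCode == http.StatusOK}, nil
}

func (b *CustomBackend) Stats(instance *Instance) (*Stats, error) {
    // Collect backend-specific metrics here; this sketch returns empty stats.
    return &Stats{}, nil
}
```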
### Registration
Register your custom backend:
```go
func init() {
    backends.Register("custom", &CustomBackend{})
}
```
## Best Practices
### Production Deployments
1. **Resource allocation**: Plan for peak usage
2. **Backend selection**: Choose based on requirements
3. **Monitoring**: Set up comprehensive monitoring
4. **Fallback**: Configure backup backends
### Development
1. **Rapid iteration**: Use smaller models
2. **Resource monitoring**: Track usage patterns
3. **Configuration testing**: Validate settings
4. **Performance profiling**: Optimize bottlenecks
## Next Steps
- Learn about [Monitoring](monitoring.md) backend performance
- Explore [Troubleshooting](troubleshooting.md) guides
- Set up [Production Monitoring](monitoring.md)

docs/advanced/monitoring.md Normal file

@@ -0,0 +1,420 @@
# Monitoring
Comprehensive monitoring setup for LlamaCtl in production environments.
## Overview
Effective monitoring of LlamaCtl involves tracking:
- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts
## Built-in Monitoring
### Health Checks
LlamaCtl provides built-in health monitoring:
```bash
# Check overall system health
curl http://localhost:8080/api/system/health
# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```
### Metrics Endpoint
Access Prometheus-compatible metrics:
```bash
curl http://localhost:8080/metrics
```
**Available Metrics:**
- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
## Prometheus Integration
### Configuration
Add LlamaCtl as a Prometheus target:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
### Custom Metrics
Enable additional metrics in LlamaCtl:
```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```
## Grafana Dashboards
### LlamaCtl Dashboard
Import the official Grafana dashboard:
1. Download dashboard JSON from releases
2. Import into Grafana
3. Configure Prometheus data source
### Key Panels
**Instance Overview:**
- Instance count and status
- Resource usage per instance
- Health status indicators
**Performance Metrics:**
- API response times
- Tokens per second
- Memory usage trends
**System Resources:**
- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)
### Custom Queries
**Instance Uptime:**
```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```
**Memory Usage Percentage:**
```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```
**API Error Rate:**
```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```
## Alerting
### Prometheus Alerts
Configure alerts for critical conditions:
```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```
### Notification Channels
Configure alert notifications:
**Slack Integration:**
```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```
## Log Management
### Centralized Logging
Configure log aggregation:
```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```
### Log Analysis
Use ELK stack for log analysis:
**Elasticsearch Index Template:**
```json
{
"index_patterns": ["llamactl-*"],
"mappings": {
"properties": {
"timestamp": {"type": "date"},
"level": {"type": "keyword"},
"message": {"type": "text"},
"instance": {"type": "keyword"},
"component": {"type": "keyword"}
}
}
}
```
**Kibana Visualizations:**
- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns
## Application Performance Monitoring
### OpenTelemetry Integration
Enable distributed tracing:
```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
  sampling_rate: 0.1
```
### Custom Spans
Add custom tracing to track operations:
```go
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()
// Track instance startup time
span.SetAttributes(
attribute.String("instance.name", name),
attribute.String("model.path", modelPath),
)
```
## Health Check Configuration
### Readiness Probes
Configure Kubernetes readiness probes:
```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```
### Liveness Probes
Configure liveness probes:
```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```
### Custom Health Checks
Implement custom health checks:
```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```
## Performance Profiling
### pprof Integration
Enable Go profiling:
```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```
Access profiling endpoints:
```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile
# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
### Continuous Profiling
Set up continuous profiling with Pyroscope:
```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```
## Security Monitoring
### Audit Logging
Enable security audit logs:
```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```
### Rate Limiting Monitoring
Track rate limiting metrics:
```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```
## Troubleshooting Monitoring
### Common Issues
**Metrics not appearing:**
1. Check Prometheus configuration
2. Verify network connectivity
3. Review LlamaCtl logs for errors
**High memory usage:**
1. Check for memory leaks in profiles
2. Monitor garbage collection metrics
3. Review instance configurations
**Alert fatigue:**
1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression
### Debug Tools
**Monitoring health:**
```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health
# Review logs
tail -f /var/log/llamactl/app.log
```
## Best Practices
### Production Monitoring
1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity and noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts
### Performance Optimization
1. **Baseline establishment**: Know normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Regular performance tuning
## Next Steps
- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)

docs/advanced/troubleshooting.md Normal file

@@ -0,0 +1,560 @@
# Troubleshooting
Common issues and solutions for LlamaCtl deployment and operation.
## Installation Issues
### Binary Not Found
**Problem:** `llamactl: command not found`
**Solutions:**
1. Verify the binary is in your PATH:
```bash
echo $PATH
which llamactl
```
2. Add to PATH or use full path:
```bash
export PATH=$PATH:/path/to/llamactl
# or
/full/path/to/llamactl
```
3. Check binary permissions:
```bash
chmod +x llamactl
```
### Permission Denied
**Problem:** Permission errors when starting LlamaCtl
**Solutions:**
1. Check file permissions:
```bash
ls -la llamactl
chmod +x llamactl
```
2. Verify directory permissions:
```bash
# Check models directory
ls -la /path/to/models/
# Check logs directory
sudo mkdir -p /var/log/llamactl
sudo chown $USER:$USER /var/log/llamactl
```
3. Run with appropriate user:
```bash
# Don't run as root unless necessary
sudo -u llamactl ./llamactl
```
## Startup Issues
### Port Already in Use
**Problem:** `bind: address already in use`
**Solutions:**
1. Find process using the port:
```bash
sudo netstat -tulpn | grep :8080
# or
sudo lsof -i :8080
```
2. Kill the conflicting process:
```bash
sudo kill -9 <PID>
```
3. Use a different port:
```bash
llamactl --port 8081
```
### Configuration Errors
**Problem:** Invalid configuration preventing startup
**Solutions:**
1. Validate configuration file:
```bash
llamactl --config /path/to/config.yaml --validate
```
2. Check YAML syntax:
```bash
yamllint config.yaml
```
3. Use minimal configuration:
```yaml
server:
host: "localhost"
port: 8080
```
## Instance Management Issues
### Model Loading Failures
**Problem:** Instance fails to start with model loading errors
**Diagnostic Steps:**
1. Check model file exists:
```bash
ls -la /path/to/model.gguf
file /path/to/model.gguf
```
2. Verify model format:
```bash
# Check if it's a valid GGUF file
hexdump -C /path/to/model.gguf | head -5
```
3. Test with llama.cpp directly:
```bash
llama-server --model /path/to/model.gguf --port 8081
```
**Common Solutions:**
- **Corrupted model:** Re-download the model file
- **Wrong format:** Ensure the model is in GGUF format (see the format check sketch below)
- **Insufficient memory:** Reduce context size or use smaller model
- **Path issues:** Use absolute paths, check file permissions
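If you want to script the format check, valid GGUF files start with the 4-byte magic `GGUF`. A small Go sketch (the model path is a placeholder):
```go
package main

import (
    "fmt"
    "io"
    "os"
)

func main() {
    f, err := os.Open("/path/to/model.gguf") // placeholder path
    if err != nil {
        panic(err)
    }
    defer f.Close()

    // Read the first four bytes and compare them against the GGUF magic.
    magic := make([]byte, 4)
    if _, err := io.ReadFull(f, magic); err != nil {
        panic(err)
    }
    if string(magic) == "GGUF" {
        fmt.Println("looks like a GGUF file")
    } else {
        fmt.Printf("unexpected magic %q, this is probably not a GGUF file\n", magic)
    }
}
```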
### Memory Issues
**Problem:** Out of memory errors or system becomes unresponsive
**Diagnostic Steps:**
1. Check system memory:
```bash
free -h
cat /proc/meminfo
```
2. Monitor memory usage:
```bash
top -p $(pgrep llamactl)
```
3. Check instance memory requirements:
```bash
curl http://localhost:8080/api/instances/{name}/stats
```
**Solutions:**
1. **Reduce context size:**
```json
{
"options": {
"context_size": 1024
}
}
```
2. **Enable memory mapping:**
```json
{
"options": {
"no_mmap": false
}
}
```
3. **Use quantized models:**
- Try Q4_K_M instead of higher precision models
- Use smaller model variants (7B instead of 13B)
### GPU Issues
**Problem:** GPU not detected or not being used
**Diagnostic Steps:**
1. Check GPU availability:
```bash
nvidia-smi
```
2. Verify CUDA installation:
```bash
nvcc --version
```
3. Check llama.cpp GPU support:
```bash
llama-server --help | grep -i gpu
```
**Solutions:**
1. **Install CUDA drivers:**
```bash
sudo apt update
sudo apt install nvidia-driver-470 nvidia-cuda-toolkit
```
2. **Rebuild llama.cpp with GPU support:**
```bash
cmake -DLLAMA_CUBLAS=ON ..
make
```
3. **Configure GPU layers:**
```json
{
"options": {
"gpu_layers": 35
}
}
```
## Performance Issues
### Slow Response Times
**Problem:** API responses are slow or timeouts occur
**Diagnostic Steps:**
1. Check API response times:
```bash
time curl http://localhost:8080/api/instances
```
2. Monitor system resources:
```bash
htop
iotop
```
3. Check instance logs:
```bash
curl http://localhost:8080/api/instances/{name}/logs
```
**Solutions:**
1. **Optimize thread count:**
```json
{
"options": {
"threads": 6
}
}
```
2. **Adjust batch size:**
```json
{
"options": {
"batch_size": 512
}
}
```
3. **Enable GPU acceleration:**
```json
{
"options": {
"gpu_layers": 35
}
}
```
### High CPU Usage
**Problem:** LlamaCtl consuming excessive CPU
**Diagnostic Steps:**
1. Identify CPU-intensive processes:
```bash
top -p $(pgrep -f llamactl)
```
2. Check thread allocation:
```bash
curl http://localhost:8080/api/instances/{name}/config
```
**Solutions:**
1. **Reduce thread count:**
```json
{
"options": {
"threads": 4
}
}
```
2. **Limit concurrent instances:**
```yaml
limits:
  max_instances: 3
```
## Network Issues
### Connection Refused
**Problem:** Cannot connect to LlamaCtl web interface
**Diagnostic Steps:**
1. Check if service is running:
```bash
ps aux | grep llamactl
```
2. Verify port binding:
```bash
netstat -tulpn | grep :8080
```
3. Test local connectivity:
```bash
curl http://localhost:8080/api/health
```
**Solutions:**
1. **Check firewall settings:**
```bash
sudo ufw status
sudo ufw allow 8080
```
2. **Bind to correct interface:**
```yaml
server:
host: "0.0.0.0" # Instead of "localhost"
port: 8080
```
### CORS Errors
**Problem:** Web UI shows CORS errors in browser console
**Solutions:**
1. **Enable CORS in configuration:**
```yaml
server:
  cors_enabled: true
  cors_origins:
    - "http://localhost:3000"
    - "https://yourdomain.com"
```
2. **Use reverse proxy:**
```nginx
server {
    listen 80;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```
## Database Issues
### Startup Database Errors
**Problem:** Database connection failures on startup
**Diagnostic Steps:**
1. Check database service:
```bash
systemctl status postgresql
# or
systemctl status mysql
```
2. Test database connectivity:
```bash
psql -h localhost -U llamactl -d llamactl
```
**Solutions:**
1. **Start database service:**
```bash
sudo systemctl start postgresql
sudo systemctl enable postgresql
```
2. **Create database and user:**
```sql
CREATE DATABASE llamactl;
CREATE USER llamactl WITH PASSWORD 'password';
GRANT ALL PRIVILEGES ON DATABASE llamactl TO llamactl;
```
## Web UI Issues
### Blank Page or Loading Issues
**Problem:** Web UI doesn't load or shows blank page
**Diagnostic Steps:**
1. Check browser console for errors (F12)
2. Verify API connectivity:
```bash
curl http://localhost:8080/api/system/status
```
3. Check static file serving:
```bash
curl http://localhost:8080/
```
**Solutions:**
1. **Clear browser cache**
2. **Try different browser**
3. **Check for JavaScript errors in console**
4. **Verify API endpoint accessibility**
### Authentication Issues
**Problem:** Unable to login or authentication failures
**Diagnostic Steps:**
1. Check authentication configuration:
```bash
curl http://localhost:8080/api/config | jq .auth
```
2. Verify user credentials:
```bash
# Test login endpoint
curl -X POST http://localhost:8080/api/auth/login \
-H "Content-Type: application/json" \
-d '{"username":"admin","password":"password"}'
```
**Solutions:**
1. **Reset admin password:**
```bash
llamactl --reset-admin-password
```
2. **Disable authentication temporarily:**
```yaml
auth:
  enabled: false
```
## Log Analysis
### Enable Debug Logging
For detailed troubleshooting, enable debug logging:
```yaml
logging:
level: "debug"
output: "/var/log/llamactl/debug.log"
```
### Key Log Patterns
Look for these patterns in logs:
**Startup issues:**
```
ERRO Failed to start server
ERRO Database connection failed
ERRO Port binding failed
```
**Instance issues:**
```
ERRO Failed to start instance
ERRO Model loading failed
ERRO Process crashed
```
**Performance issues:**
```
WARN High memory usage detected
WARN Request timeout
WARN Resource limit exceeded
```
## Getting Help
### Collecting Information
When seeking help, provide:
1. **System information:**
```bash
uname -a
llamactl --version
```
2. **Configuration:**
```bash
llamactl --config-dump
```
3. **Logs:**
```bash
tail -100 /var/log/llamactl/app.log
```
4. **Error details:**
- Exact error messages
- Steps to reproduce
- Environment details
### Support Channels
- **GitHub Issues:** Report bugs and feature requests
- **Documentation:** Check this documentation first
- **Community:** Join discussions in GitHub Discussions
## Preventive Measures
### Health Monitoring
Set up monitoring to catch issues early:
```bash
# Regular health checks
*/5 * * * * curl -f http://localhost:8080/api/health || alert
```
### Resource Monitoring
Monitor system resources:
```bash
# Disk space monitoring
df -h /var/log/llamactl/
df -h /path/to/models/
# Memory monitoring
free -h
```
### Backup Configuration
Regular configuration backups:
```bash
# Backup configuration
cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup
# Backup instance configurations
curl http://localhost:8080/api/instances > instances-backup.json
```
## Next Steps
- Set up [Monitoring](monitoring.md) to prevent issues
- Learn about [Advanced Configuration](backends.md)
- Review [Best Practices](../development/contributing.md)