mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-06 00:54:23 +00:00)

Create initial documentation structure

This commit is contained in:

.github/workflows/docs.yml (vendored, new file, 65 lines)
@@ -0,0 +1,65 @@
name: Build and Deploy Documentation

on:
  push:
    branches: [ main ]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'docs-requirements.txt'
      - '.github/workflows/docs.yml'
  pull_request:
    branches: [ main ]
    paths:
      - 'docs/**'
      - 'mkdocs.yml'
      - 'docs-requirements.txt'

permissions:
  contents: read
  pages: write
  id-token: write

concurrency:
  group: "pages"
  cancel-in-progress: false

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Needed for git-revision-date-localized plugin

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: |
          pip install -r docs-requirements.txt

      - name: Build documentation
        run: |
          mkdocs build --strict

      - name: Upload documentation artifact
        if: github.ref == 'refs/heads/main'
        uses: actions/upload-pages-artifact@v3
        with:
          path: ./site

  deploy:
    if: github.ref == 'refs/heads/main'
    environment:
      name: github-pages
      url: ${{ steps.deployment.outputs.page_url }}
    runs-on: ubuntu-latest
    needs: build
    steps:
      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4

@@ -129,6 +129,50 @@ Use this format for pull request titles:
- Use meaningful component and variable names
- Prefer functional components over class components

## Documentation Development

This project uses MkDocs for documentation. When working on documentation:

### Setup Documentation Environment

```bash
# Install documentation dependencies
pip install -r docs-requirements.txt
```

### Development Workflow

```bash
# Serve documentation locally for development
mkdocs serve
```

The documentation will be available at http://localhost:8000

```bash
# Build static documentation site
mkdocs build
```

The built site will be in the `site/` directory.

### Documentation Structure

- `docs/` - Documentation content (Markdown files)
- `mkdocs.yml` - MkDocs configuration
- `docs-requirements.txt` - Python dependencies for documentation

### Adding New Documentation

When adding new documentation:

1. Create Markdown files in the appropriate `docs/` subdirectory
2. Update the navigation in `mkdocs.yml`
3. Test locally with `mkdocs serve`
4. Submit a pull request

### Documentation Deployment

Documentation is automatically built and deployed to GitHub Pages when changes are pushed to the main branch.

## Getting Help

- Check existing [issues](https://github.com/lordmathis/llamactl/issues)

docs-requirements.txt (new file, 4 lines)
@@ -0,0 +1,4 @@
mkdocs-material==9.5.3
mkdocs==1.5.3
pymdown-extensions==10.7
mkdocs-git-revision-date-localized-plugin==1.2.4

docs/advanced/backends.md (new file, 316 lines)
@@ -0,0 +1,316 @@
# Backends

LlamaCtl supports multiple backends for running large language models. This guide covers the available backends and their configuration.

## Llama.cpp Backend

The primary backend for LlamaCtl, providing robust support for GGUF models.

### Features

- **GGUF Support**: Native support for the GGUF model format
- **GPU Acceleration**: CUDA, OpenCL, and Metal support
- **Memory Optimization**: Efficient memory usage and mapping
- **Multi-threading**: Configurable CPU thread utilization
- **Quantization**: Support for various quantization levels

### Configuration

```yaml
backends:
  llamacpp:
    binary_path: "/usr/local/bin/llama-server"
    default_options:
      threads: 4
      context_size: 2048
      batch_size: 512
    gpu:
      enabled: true
      layers: 35
```

### Supported Options

| Option | Description | Default |
|--------|-------------|---------|
| `threads` | Number of CPU threads | 4 |
| `context_size` | Context window size | 2048 |
| `batch_size` | Batch size for processing | 512 |
| `gpu_layers` | Layers to offload to GPU | 0 |
| `memory_lock` | Lock model in memory | false |
| `no_mmap` | Disable memory mapping | false |
| `rope_freq_base` | RoPE frequency base | 10000 |
| `rope_freq_scale` | RoPE frequency scale | 1.0 |

### GPU Acceleration

#### CUDA Setup

```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit

# Verify CUDA installation
nvcc --version
nvidia-smi
```

#### Configuration for GPU

```json
{
  "name": "gpu-accelerated",
  "model_path": "/models/llama-2-13b.gguf",
  "port": 8081,
  "options": {
    "gpu_layers": 35,
    "threads": 2,
    "context_size": 4096
  }
}
```

### Performance Tuning

#### Memory Optimization

```yaml
# For limited-memory systems
options:
  context_size: 1024
  batch_size: 256
  no_mmap: true
  memory_lock: false

# For high-memory systems
options:
  context_size: 8192
  batch_size: 1024
  memory_lock: true
  no_mmap: false
```

#### CPU Optimization

```yaml
# Match thread count to CPU cores
# For an 8-core CPU:
options:
  threads: 6  # Leave 2 cores for the system

# For high-performance CPUs:
options:
  threads: 16
  batch_size: 1024
```

## Future Backends

LlamaCtl is designed to support multiple backends. Planned additions:

### vLLM Backend

High-performance inference engine optimized for serving:

- **Features**: Fast inference, batching, streaming
- **Models**: Supports various model formats
- **Scaling**: Horizontal scaling support

### TensorRT-LLM Backend

NVIDIA's optimized inference engine:

- **Features**: Maximum GPU performance
- **Models**: Optimized for NVIDIA GPUs
- **Deployment**: Production-ready inference

### Ollama Backend

Integration with Ollama for easy model management:

- **Features**: Simplified model downloading
- **Models**: Large model library
- **Integration**: Seamless model switching

## Backend Selection

### Automatic Detection

LlamaCtl can automatically detect the best backend:

```yaml
backends:
  auto_detect: true
  preference_order:
    - "llamacpp"
    - "vllm"
    - "tensorrt"
```
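
To make the preference-order idea concrete, here is a hypothetical Go sketch of how such detection could work. The binary names and the `detectBackend` helper are illustrative assumptions, not LlamaCtl's actual internals:

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// available reports whether a backend's server binary can be found on PATH.
func available(binary string) bool {
	_, err := exec.LookPath(binary)
	return err == nil
}

// detectBackend walks the configured preference order and returns the first
// backend whose binary is installed, mirroring the preference_order setting.
func detectBackend(preferenceOrder []string, binaries map[string]string) (string, error) {
	for _, name := range preferenceOrder {
		if bin, ok := binaries[name]; ok && available(bin) {
			return name, nil
		}
	}
	return "", errors.New("no supported backend found")
}

func main() {
	// Hypothetical binary names per backend.
	binaries := map[string]string{
		"llamacpp": "llama-server",
		"vllm":     "vllm",
	}
	name, err := detectBackend([]string{"llamacpp", "vllm", "tensorrt"}, binaries)
	if err != nil {
		fmt.Println("detection failed:", err)
		return
	}
	fmt.Println("selected backend:", name)
}
```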

### Manual Selection

Force a specific backend for an instance:

```json
{
  "name": "manual-backend",
  "backend": "llamacpp",
  "model_path": "/models/model.gguf",
  "port": 8081
}
```

## Backend-Specific Features

### Llama.cpp Features

#### Model Formats

- **GGUF**: Primary format, best compatibility
- **GGML**: Legacy format (limited support)

#### Quantization Levels

- `Q2_K`: Smallest size, lower quality
- `Q4_K_M`: Balanced size and quality
- `Q5_K_M`: Higher quality, larger size
- `Q6_K`: Near-original quality
- `Q8_0`: Minimal loss, largest size

#### Advanced Options

```yaml
advanced:
  rope_scaling:
    type: "linear"
    factor: 2.0
  attention:
    flash_attention: true
    grouped_query: true
```

## Monitoring Backend Performance

### Metrics Collection

Monitor backend-specific metrics:

```bash
# Get backend statistics
curl http://localhost:8080/api/instances/my-instance/backend/stats
```

**Response:**
```json
{
  "backend": "llamacpp",
  "version": "b1234",
  "metrics": {
    "tokens_per_second": 15.2,
    "memory_usage": 4294967296,
    "gpu_utilization": 85.5,
    "context_usage": 75.0
  }
}
```
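
The same endpoint is easy to consume programmatically. Below is a minimal Go sketch that fetches and decodes the response shown above; the struct fields mirror the example payload, so treat the full schema as an assumption:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// BackendStats mirrors the example response shown above.
type BackendStats struct {
	Backend string `json:"backend"`
	Version string `json:"version"`
	Metrics struct {
		TokensPerSecond float64 `json:"tokens_per_second"`
		MemoryUsage     int64   `json:"memory_usage"`
		GPUUtilization  float64 `json:"gpu_utilization"`
		ContextUsage    float64 `json:"context_usage"`
	} `json:"metrics"`
}

func main() {
	resp, err := http.Get("http://localhost:8080/api/instances/my-instance/backend/stats")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var stats BackendStats
	if err := json.NewDecoder(resp.Body).Decode(&stats); err != nil {
		panic(err)
	}
	fmt.Printf("%s (%s): %.1f tok/s, GPU %.1f%%\n",
		stats.Backend, stats.Version,
		stats.Metrics.TokensPerSecond, stats.Metrics.GPUUtilization)
}
```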

### Performance Optimization

#### Benchmark Different Configurations

```bash
# Test various thread counts
for threads in 2 4 8 16; do
  echo "Testing $threads threads"
  curl -X PUT http://localhost:8080/api/instances/benchmark \
    -d "{\"options\": {\"threads\": $threads}}"
  # Run performance test
done
```

#### Memory Usage Optimization

```bash
# Monitor memory usage
watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
```

## Troubleshooting Backends

### Common Llama.cpp Issues

**Model won't load:**
```bash
# Check model file
file /path/to/model.gguf

# Verify format
llama-server --model /path/to/model.gguf --dry-run
```

**GPU not detected:**
```bash
# Check CUDA installation
nvidia-smi

# Verify llama.cpp GPU support
llama-server --help | grep -i gpu
```

**Performance issues:**
```bash
# Check system resources
htop
nvidia-smi

# Verify configuration
curl http://localhost:8080/api/instances/my-instance/config
```

## Custom Backend Development

### Backend Interface

Implement the backend interface for custom backends:

```go
type Backend interface {
    Start(config InstanceConfig) error
    Stop(instance *Instance) error
    Health(instance *Instance) (*HealthStatus, error)
    Stats(instance *Instance) (*Stats, error)
}
```

### Registration

Register your custom backend:

```go
func init() {
    backends.Register("custom", &CustomBackend{})
}
```
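
Putting the interface and registration together, a skeletal custom backend might look like the sketch below. The types (`InstanceConfig`, `Instance`, `HealthStatus`, `Stats`) are the ones from the interface above, and the method bodies are placeholders, not a working backend:

```go
// CustomBackend is a minimal skeleton satisfying the Backend interface.
type CustomBackend struct{}

func (b *CustomBackend) Start(config InstanceConfig) error {
	// Launch the backing inference process, e.g. via os/exec.
	return nil
}

func (b *CustomBackend) Stop(instance *Instance) error {
	// Terminate the process and release any resources.
	return nil
}

func (b *CustomBackend) Health(instance *Instance) (*HealthStatus, error) {
	// Probe the instance, e.g. by hitting its HTTP health endpoint.
	return &HealthStatus{}, nil
}

func (b *CustomBackend) Stats(instance *Instance) (*Stats, error) {
	// Collect runtime metrics for the instance.
	return &Stats{}, nil
}
```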

## Best Practices

### Production Deployments

1. **Resource allocation**: Plan for peak usage
2. **Backend selection**: Choose based on requirements
3. **Monitoring**: Set up comprehensive monitoring
4. **Fallback**: Configure backup backends

### Development

1. **Rapid iteration**: Use smaller models
2. **Resource monitoring**: Track usage patterns
3. **Configuration testing**: Validate settings
4. **Performance profiling**: Optimize bottlenecks

## Next Steps

- Learn about [Monitoring](monitoring.md) backend performance
- Explore [Troubleshooting](troubleshooting.md) guides
- Set up [Production Monitoring](monitoring.md)

docs/advanced/monitoring.md (new file, 420 lines)
@@ -0,0 +1,420 @@
# Monitoring

Comprehensive monitoring setup for LlamaCtl in production environments.

## Overview

Effective monitoring of LlamaCtl involves tracking:

- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts

## Built-in Monitoring

### Health Checks

LlamaCtl provides built-in health monitoring:

```bash
# Check overall system health
curl http://localhost:8080/api/system/health

# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```

### Metrics Endpoint

Access Prometheus-compatible metrics:

```bash
curl http://localhost:8080/metrics
```

**Available Metrics:**

- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
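
Because the endpoint emits the standard Prometheus text format, the metrics can also be read without a Prometheus server. A minimal Go sketch that scrapes the endpoint and prints one of the gauges listed above, using naive line matching rather than a client library:

```go
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	resp, err := http.Get("http://localhost:8080/metrics")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	// Metric lines look like "name{labels} value"; lines starting
	// with '#' are HELP/TYPE comments and won't match the prefix.
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "llamactl_instances_running") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```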

## Prometheus Integration

### Configuration

Add LlamaCtl as a Prometheus target:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Custom Metrics

Enable additional metrics in LlamaCtl:

```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```

## Grafana Dashboards

### LlamaCtl Dashboard

Import the official Grafana dashboard:

1. Download the dashboard JSON from the releases page
2. Import it into Grafana
3. Configure the Prometheus data source

### Key Panels

**Instance Overview:**

- Instance count and status
- Resource usage per instance
- Health status indicators

**Performance Metrics:**

- API response times
- Tokens per second
- Memory usage trends

**System Resources:**

- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)

### Custom Queries

**Instance Uptime:**
```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```

**Memory Usage Percentage:**
```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```

**API Error Rate:**
```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```

## Alerting

### Prometheus Alerts

Configure alerts for critical conditions:

```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```

### Notification Channels

Configure alert notifications:

**Slack Integration:**
```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```

## Log Management

### Centralized Logging

Configure log aggregation:

```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```

### Log Analysis

Use the ELK stack for log analysis:

**Elasticsearch Index Template:**
```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```

**Kibana Visualizations:**

- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns

## Application Performance Monitoring

### OpenTelemetry Integration

Enable distributed tracing:

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
  sampling_rate: 0.1
```

### Custom Spans

Add custom tracing to track operations:

```go
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Track instance startup time
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)
```

## Health Check Configuration

### Readiness Probes

Configure Kubernetes readiness probes:

```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```

### Liveness Probes

Configure liveness probes:

```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

### Custom Health Checks

Implement custom health checks:

```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```

## Performance Profiling

### pprof Integration

Enable Go profiling:

```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```

Access the profiling endpoints:

```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
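
For reference, exposing these endpoints in a Go program takes little more than a blank import of `net/http/pprof`; a minimal sketch follows (whether LlamaCtl wires it up exactly this way is an assumption):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve pprof on localhost only; profiling endpoints should never be public.
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
```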

### Continuous Profiling

Set up continuous profiling with Pyroscope:

```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```

## Security Monitoring

### Audit Logging

Enable security audit logs:

```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```

### Rate Limiting Monitoring

Track rate-limiting metrics:

```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```

## Troubleshooting Monitoring

### Common Issues

**Metrics not appearing:**

1. Check the Prometheus configuration
2. Verify network connectivity
3. Review LlamaCtl logs for errors

**High memory usage:**

1. Check for memory leaks in profiles
2. Monitor garbage collection metrics
3. Review instance configurations

**Alert fatigue:**

1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression

### Debug Tools

**Monitoring health:**
```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health

# Review logs
tail -f /var/log/llamactl/app.log
```

## Best Practices

### Production Monitoring

1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity and noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts

### Performance Optimization

1. **Baseline establishment**: Know normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Regular performance tuning

## Next Steps

- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)

docs/advanced/troubleshooting.md (new file, 560 lines)
@@ -0,0 +1,560 @@
# Troubleshooting

Common issues and solutions for LlamaCtl deployment and operation.

## Installation Issues

### Binary Not Found

**Problem:** `llamactl: command not found`

**Solutions:**

1. Verify the binary is in your PATH:
   ```bash
   echo $PATH
   which llamactl
   ```

2. Add it to your PATH, or use the full path:
   ```bash
   export PATH=$PATH:/path/to/llamactl
   # or
   /full/path/to/llamactl
   ```

3. Check binary permissions:
   ```bash
   chmod +x llamactl
   ```

### Permission Denied

**Problem:** Permission errors when starting LlamaCtl

**Solutions:**

1. Check file permissions:
   ```bash
   ls -la llamactl
   chmod +x llamactl
   ```

2. Verify directory permissions:
   ```bash
   # Check the models directory
   ls -la /path/to/models/

   # Create the logs directory and set ownership
   sudo mkdir -p /var/log/llamactl
   sudo chown $USER:$USER /var/log/llamactl
   ```

3. Run as an appropriate user:
   ```bash
   # Don't run as root unless necessary
   sudo -u llamactl ./llamactl
   ```

## Startup Issues

### Port Already in Use

**Problem:** `bind: address already in use`

**Solutions:**

1. Find the process using the port:
   ```bash
   sudo netstat -tulpn | grep :8080
   # or
   sudo lsof -i :8080
   ```

2. Kill the conflicting process:
   ```bash
   sudo kill -9 <PID>
   ```

3. Use a different port:
   ```bash
   llamactl --port 8081
   ```

### Configuration Errors

**Problem:** Invalid configuration preventing startup

**Solutions:**

1. Validate the configuration file:
   ```bash
   llamactl --config /path/to/config.yaml --validate
   ```

2. Check the YAML syntax:
   ```bash
   yamllint config.yaml
   ```

3. Fall back to a minimal configuration:
   ```yaml
   server:
     host: "localhost"
     port: 8080
   ```

## Instance Management Issues

### Model Loading Failures

**Problem:** Instance fails to start with model loading errors

**Diagnostic Steps:**

1. Check that the model file exists:
   ```bash
   ls -la /path/to/model.gguf
   file /path/to/model.gguf
   ```

2. Verify the model format:
   ```bash
   # Check if it's a valid GGUF file
   hexdump -C /path/to/model.gguf | head -5
   ```

3. Test with llama.cpp directly:
   ```bash
   llama-server --model /path/to/model.gguf --port 8081
   ```

**Common Solutions:**

- **Corrupted model:** Re-download the model file
- **Wrong format:** Ensure the model is in GGUF format
- **Insufficient memory:** Reduce the context size or use a smaller model
- **Path issues:** Use absolute paths and check file permissions

### Memory Issues

**Problem:** Out-of-memory errors, or the system becomes unresponsive

**Diagnostic Steps:**

1. Check system memory:
   ```bash
   free -h
   cat /proc/meminfo
   ```

2. Monitor memory usage:
   ```bash
   top -p $(pgrep llamactl)
   ```

3. Check the instance's memory requirements:
   ```bash
   curl http://localhost:8080/api/instances/{name}/stats
   ```

**Solutions:**

1. **Reduce context size:**
   ```json
   {
     "options": {
       "context_size": 1024
     }
   }
   ```

2. **Enable memory mapping:**
   ```json
   {
     "options": {
       "no_mmap": false
     }
   }
   ```

3. **Use quantized models:**
   - Try Q4_K_M instead of higher-precision models
   - Use smaller model variants (7B instead of 13B)

### GPU Issues

**Problem:** GPU not detected or not being used

**Diagnostic Steps:**

1. Check GPU availability:
   ```bash
   nvidia-smi
   ```

2. Verify the CUDA installation:
   ```bash
   nvcc --version
   ```

3. Check llama.cpp GPU support:
   ```bash
   llama-server --help | grep -i gpu
   ```

**Solutions:**

1. **Install CUDA drivers:**
   ```bash
   sudo apt update
   sudo apt install nvidia-driver-470 nvidia-cuda-toolkit
   ```

2. **Rebuild llama.cpp with GPU support:**
   ```bash
   cmake -DLLAMA_CUBLAS=ON ..
   make
   ```

3. **Configure GPU layers:**
   ```json
   {
     "options": {
       "gpu_layers": 35
     }
   }
   ```

## Performance Issues

### Slow Response Times

**Problem:** API responses are slow or time out

**Diagnostic Steps:**

1. Check API response times:
   ```bash
   time curl http://localhost:8080/api/instances
   ```

2. Monitor system resources:
   ```bash
   htop
   iotop
   ```

3. Check instance logs:
   ```bash
   curl http://localhost:8080/api/instances/{name}/logs
   ```

**Solutions:**

1. **Optimize the thread count:**
   ```json
   {
     "options": {
       "threads": 6
     }
   }
   ```

2. **Adjust the batch size:**
   ```json
   {
     "options": {
       "batch_size": 512
     }
   }
   ```

3. **Enable GPU acceleration:**
   ```json
   {
     "options": {
       "gpu_layers": 35
     }
   }
   ```

### High CPU Usage

**Problem:** LlamaCtl consuming excessive CPU

**Diagnostic Steps:**

1. Identify CPU-intensive processes:
   ```bash
   top -p $(pgrep -f llamactl)
   ```

2. Check the thread allocation:
   ```bash
   curl http://localhost:8080/api/instances/{name}/config
   ```

**Solutions:**

1. **Reduce the thread count:**
   ```json
   {
     "options": {
       "threads": 4
     }
   }
   ```

2. **Limit concurrent instances:**
   ```yaml
   limits:
     max_instances: 3
   ```

## Network Issues

### Connection Refused

**Problem:** Cannot connect to the LlamaCtl web interface

**Diagnostic Steps:**

1. Check whether the service is running:
   ```bash
   ps aux | grep llamactl
   ```

2. Verify port binding:
   ```bash
   netstat -tulpn | grep :8080
   ```

3. Test local connectivity:
   ```bash
   curl http://localhost:8080/api/health
   ```

**Solutions:**

1. **Check firewall settings:**
   ```bash
   sudo ufw status
   sudo ufw allow 8080
   ```

2. **Bind to the correct interface:**
   ```yaml
   server:
     host: "0.0.0.0"  # Instead of "localhost"
     port: 8080
   ```

### CORS Errors

**Problem:** Web UI shows CORS errors in the browser console

**Solutions:**

1. **Enable CORS in the configuration:**
   ```yaml
   server:
     cors_enabled: true
     cors_origins:
       - "http://localhost:3000"
       - "https://yourdomain.com"
   ```

2. **Use a reverse proxy:**
   ```nginx
   server {
       listen 80;
       location / {
           proxy_pass http://localhost:8080;
           proxy_set_header Host $host;
           proxy_set_header X-Real-IP $remote_addr;
       }
   }
   ```

## Database Issues

### Startup Database Errors

**Problem:** Database connection failures on startup

**Diagnostic Steps:**

1. Check the database service:
   ```bash
   systemctl status postgresql
   # or
   systemctl status mysql
   ```

2. Test database connectivity:
   ```bash
   psql -h localhost -U llamactl -d llamactl
   ```

**Solutions:**

1. **Start the database service:**
   ```bash
   sudo systemctl start postgresql
   sudo systemctl enable postgresql
   ```

2. **Create the database and user:**
   ```sql
   CREATE DATABASE llamactl;
   CREATE USER llamactl WITH PASSWORD 'password';
   GRANT ALL PRIVILEGES ON DATABASE llamactl TO llamactl;
   ```

## Web UI Issues

### Blank Page or Loading Issues

**Problem:** Web UI doesn't load or shows a blank page

**Diagnostic Steps:**

1. Check the browser console for errors (F12)
2. Verify API connectivity:
   ```bash
   curl http://localhost:8080/api/system/status
   ```

3. Check static file serving:
   ```bash
   curl http://localhost:8080/
   ```

**Solutions:**

1. **Clear the browser cache**
2. **Try a different browser**
3. **Check for JavaScript errors in the console**
4. **Verify API endpoint accessibility**

### Authentication Issues

**Problem:** Unable to log in, or authentication failures

**Diagnostic Steps:**

1. Check the authentication configuration:
   ```bash
   curl http://localhost:8080/api/config | jq .auth
   ```

2. Verify user credentials:
   ```bash
   # Test the login endpoint
   curl -X POST http://localhost:8080/api/auth/login \
     -H "Content-Type: application/json" \
     -d '{"username":"admin","password":"password"}'
   ```

**Solutions:**

1. **Reset the admin password:**
   ```bash
   llamactl --reset-admin-password
   ```

2. **Disable authentication temporarily:**
   ```yaml
   auth:
     enabled: false
   ```

## Log Analysis

### Enable Debug Logging

For detailed troubleshooting, enable debug logging:

```yaml
logging:
  level: "debug"
  output: "/var/log/llamactl/debug.log"
```

### Key Log Patterns

Look for these patterns in the logs:

**Startup issues:**
```
ERRO Failed to start server
ERRO Database connection failed
ERRO Port binding failed
```

**Instance issues:**
```
ERRO Failed to start instance
ERRO Model loading failed
ERRO Process crashed
```

**Performance issues:**
```
WARN High memory usage detected
WARN Request timeout
WARN Resource limit exceeded
```
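
For a quick read on how often these patterns occur, a small throwaway Go program can count them. The log path matches the examples used throughout this guide:

```go
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.Open("/var/log/llamactl/app.log")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	counts := map[string]int{}
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Count lines by the level markers shown in the patterns above.
		for _, level := range []string{"ERRO", "WARN"} {
			if strings.Contains(scanner.Text(), level) {
				counts[level]++
			}
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("errors: %d, warnings: %d\n", counts["ERRO"], counts["WARN"])
}
```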

## Getting Help

### Collecting Information

When seeking help, provide:

1. **System information:**
   ```bash
   uname -a
   llamactl --version
   ```

2. **Configuration:**
   ```bash
   llamactl --config-dump
   ```

3. **Logs:**
   ```bash
   tail -100 /var/log/llamactl/app.log
   ```

4. **Error details:**
   - Exact error messages
   - Steps to reproduce
   - Environment details

### Support Channels

- **GitHub Issues:** Report bugs and feature requests
- **Documentation:** Check this documentation first
- **Community:** Join discussions in GitHub Discussions

## Preventive Measures

### Health Monitoring

Set up monitoring to catch issues early:

```bash
# Regular health checks (cron entry)
*/5 * * * * curl -f http://localhost:8080/api/health || alert
```

### Resource Monitoring

Monitor system resources:

```bash
# Disk space monitoring
df -h /var/log/llamactl/
df -h /path/to/models/

# Memory monitoring
free -h
```

### Backup Configuration

Take regular configuration backups:

```bash
# Back up the configuration
cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup

# Back up instance configurations
curl http://localhost:8080/api/instances > instances-backup.json
```

## Next Steps

- Set up [Monitoring](monitoring.md) to prevent issues
- Learn about [Advanced Configuration](backends.md)
- Review [Best Practices](../development/contributing.md)

docs/development/building.md (new file, 464 lines)
@@ -0,0 +1,464 @@
# Building from Source

This guide covers building LlamaCtl from source code for development and production deployment.

## Prerequisites

### Required Tools

- **Go 1.24+**: Download from [golang.org](https://golang.org/dl/)
- **Node.js 22+**: Download from [nodejs.org](https://nodejs.org/)
- **Git**: For cloning the repository
- **Make**: For build automation (optional)

### System Requirements

- **Memory**: 4GB+ RAM for building
- **Disk**: 2GB+ free space
- **OS**: Linux, macOS, or Windows

## Quick Build

### Clone and Build

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the application
go build -o llamactl cmd/server/main.go
```

### Run

```bash
./llamactl
```

## Development Build

### Setup Development Environment

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Install Go dependencies
go mod download

# Install frontend dependencies
cd webui
npm ci
cd ..
```

### Build Components

```bash
# Build the backend only
go build -o llamactl cmd/server/main.go

# Build the frontend only
cd webui
npm run build
cd ..

# Build everything
make build
```

### Development Server

```bash
# Run the backend in development mode
go run cmd/server/main.go --dev

# Run the frontend dev server (separate terminal)
cd webui
npm run dev
```

## Production Build

### Optimized Build

```bash
# Build with optimizations
go build -ldflags="-s -w" -o llamactl cmd/server/main.go

# Or use the Makefile
make build-prod
```

### Build Flags

Common build flags for production:

```bash
go build \
  -ldflags="-s -w -X main.version=1.0.0 -X main.buildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  -trimpath \
  -o llamactl \
  cmd/server/main.go
```

**Flag explanations:**

- `-s`: Strip the symbol table
- `-w`: Strip debug information
- `-X`: Set variable values at build time
- `-trimpath`: Remove absolute paths from the binary
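
For `-X` to have any effect, the target package must declare matching string variables. A minimal sketch of what `cmd/server/main.go` would need; the exact variable names LlamaCtl uses are an assumption:

```go
package main

import "fmt"

// Overridden at build time via:
//   go build -ldflags="-X main.version=1.0.0 -X main.buildTime=..."
var (
	version   = "dev"
	buildTime = "unknown"
)

func main() {
	fmt.Printf("llamactl %s (built %s)\n", version, buildTime)
}
```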

## Cross-Platform Building

### Build for Multiple Platforms

```bash
# Linux AMD64
GOOS=linux GOARCH=amd64 go build -o llamactl-linux-amd64 cmd/server/main.go

# Linux ARM64
GOOS=linux GOARCH=arm64 go build -o llamactl-linux-arm64 cmd/server/main.go

# macOS AMD64
GOOS=darwin GOARCH=amd64 go build -o llamactl-darwin-amd64 cmd/server/main.go

# macOS ARM64 (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o llamactl-darwin-arm64 cmd/server/main.go

# Windows AMD64
GOOS=windows GOARCH=amd64 go build -o llamactl-windows-amd64.exe cmd/server/main.go
```

### Automated Cross-Building

Use the provided Makefile:

```bash
# Build all platforms
make build-all

# Build a specific platform
make build-linux
make build-darwin
make build-windows
```

## Build with Docker

### Development Container

```dockerfile
# Dockerfile.dev
FROM golang:1.24-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
RUN go build -o llamactl cmd/server/main.go

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=builder /app/llamactl .

EXPOSE 8080
CMD ["./llamactl"]
```

```bash
# Build the development image
docker build -f Dockerfile.dev -t llamactl:dev .

# Run the container
docker run -p 8080:8080 llamactl:dev
```

### Production Container

```dockerfile
# Dockerfile
FROM node:22-alpine AS frontend-builder

WORKDIR /app/webui
COPY webui/package*.json ./
RUN npm ci

COPY webui/ ./
RUN npm run build

FROM golang:1.24-alpine AS backend-builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .
COPY --from=frontend-builder /app/webui/dist ./webui/dist

RUN CGO_ENABLED=0 GOOS=linux go build \
    -ldflags="-s -w" \
    -o llamactl \
    cmd/server/main.go

FROM alpine:latest

RUN apk --no-cache add ca-certificates tzdata
RUN adduser -D -s /bin/sh llamactl

WORKDIR /home/llamactl
COPY --from=backend-builder /app/llamactl .
RUN chown llamactl:llamactl llamactl

USER llamactl
EXPOSE 8080

CMD ["./llamactl"]
```

## Advanced Build Options

### Static Linking

For deployments without external dependencies:

```bash
CGO_ENABLED=0 go build \
  -ldflags="-s -w -extldflags '-static'" \
  -o llamactl-static \
  cmd/server/main.go
```

### Debug Build

Build with debug information:

```bash
go build -gcflags="all=-N -l" -o llamactl-debug cmd/server/main.go
```

### Race Detection Build

Build with race detection (development only):

```bash
go build -race -o llamactl-race cmd/server/main.go
```

## Build Automation

### Makefile

```makefile
# Makefile
VERSION := $(shell git describe --tags --always --dirty)
BUILD_TIME := $(shell date -u +%Y-%m-%dT%H:%M:%SZ)
LDFLAGS := -s -w -X main.version=$(VERSION) -X main.buildTime=$(BUILD_TIME)

.PHONY: build clean test install

build:
	@echo "Building LlamaCtl..."
	@cd webui && npm run build
	@go build -ldflags="$(LDFLAGS)" -o llamactl cmd/server/main.go

build-prod:
	@echo "Building production binary..."
	@cd webui && npm run build
	@CGO_ENABLED=0 go build -ldflags="$(LDFLAGS)" -trimpath -o llamactl cmd/server/main.go

build-all: build-linux build-darwin build-windows

build-linux:
	@GOOS=linux GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-linux-amd64 cmd/server/main.go
	@GOOS=linux GOARCH=arm64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-linux-arm64 cmd/server/main.go

build-darwin:
	@GOOS=darwin GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-darwin-amd64 cmd/server/main.go
	@GOOS=darwin GOARCH=arm64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-darwin-arm64 cmd/server/main.go

build-windows:
	@GOOS=windows GOARCH=amd64 go build -ldflags="$(LDFLAGS)" -o dist/llamactl-windows-amd64.exe cmd/server/main.go

test:
	@go test ./...

clean:
	@rm -f llamactl llamactl-*
	@rm -rf dist/

install: build
	@cp llamactl $(GOPATH)/bin/llamactl
```

### GitHub Actions

```yaml
# .github/workflows/build.yml
name: Build

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.24'

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Install dependencies
        run: |
          go mod download
          cd webui && npm ci

      - name: Run tests
        run: |
          go test ./...
          cd webui && npm test

      - name: Build
        run: make build

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'

    steps:
      - uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v4
        with:
          go-version: '1.24'

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Build all platforms
        run: make build-all

      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: binaries
          path: dist/
```

## Build Troubleshooting

### Common Issues

**Go version mismatch:**
```bash
# Check the Go version
go version

# Update Go
# Download from https://golang.org/dl/
```

**Node.js issues:**
```bash
# Clear the npm cache
npm cache clean --force

# Remove node_modules and reinstall
rm -rf webui/node_modules
cd webui && npm ci
```

**Build failures:**
```bash
# Clean and rebuild
make clean
go mod tidy
make build
```

### Performance Issues

**Slow builds:**
```bash
# Use the build cache
export GOCACHE=$(go env GOCACHE)

# Parallel builds
export GOMAXPROCS=$(nproc)
```

**Large binary size:**
```bash
# Compress with UPX
upx --best llamactl

# Analyze binary size
go tool nm -size llamactl | head -20
```

## Deployment

### System Service

Create a systemd service:

```ini
# /etc/systemd/system/llamactl.service
[Unit]
Description=LlamaCtl Server
After=network.target

[Service]
Type=simple
User=llamactl
Group=llamactl
ExecStart=/usr/local/bin/llamactl
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target
```

```bash
# Enable and start the service
sudo systemctl enable llamactl
sudo systemctl start llamactl
```

### Configuration

```bash
# Create the configuration directory
sudo mkdir -p /etc/llamactl

# Copy the configuration
sudo cp config.yaml /etc/llamactl/

# Set permissions
sudo chown -R llamactl:llamactl /etc/llamactl
```

## Next Steps

- Configure [Installation](../getting-started/installation.md)
- Set up [Configuration](../getting-started/configuration.md)
- Learn about [Contributing](contributing.md)
373
docs/development/contributing.md
Normal file
373
docs/development/contributing.md
Normal file
@@ -0,0 +1,373 @@
|
|||||||
|
# Contributing
|
||||||
|
|
||||||
|
Thank you for your interest in contributing to LlamaCtl! This guide will help you get started with development and contribution.
|
||||||
|
|
||||||
|
## Development Setup
|
||||||
|
|
||||||
|
### Prerequisites
|
||||||
|
|
||||||
|
- Go 1.24 or later
|
||||||
|
- Node.js 22 or later
|
||||||
|
- `llama-server` executable (from [llama.cpp](https://github.com/ggml-org/llama.cpp))
|
||||||
|
- Git
|
||||||
|
|
||||||
|
### Getting Started
|
||||||
|
|
||||||
|
1. **Fork and Clone**
|
||||||
|
```bash
|
||||||
|
# Fork the repository on GitHub, then clone your fork
|
||||||
|
git clone https://github.com/yourusername/llamactl.git
|
||||||
|
cd llamactl
|
||||||
|
|
||||||
|
# Add upstream remote
|
||||||
|
git remote add upstream https://github.com/lordmathis/llamactl.git
|
||||||
|
```
|
||||||
|
|
||||||
|
2. **Install Dependencies**
|
||||||
|
```bash
|
||||||
|
# Go dependencies
|
||||||
|
go mod download
|
||||||
|
|
||||||
|
# Frontend dependencies
|
||||||
|
cd webui && npm ci && cd ..
|
||||||
|
```
|
||||||
|
|
||||||
|
3. **Run Development Environment**
|
||||||
|
```bash
|
||||||
|
# Start backend server
|
||||||
|
go run ./cmd/server
|
||||||
|
```
|
||||||
|
|
||||||
|
In a separate terminal:
|
||||||
|
```bash
|
||||||
|
# Start frontend dev server
|
||||||
|
cd webui && npm run dev
|
||||||
|
```
|
||||||
|
|
||||||
|
## Development Workflow
|
||||||
|
|
||||||
|
### Setting Up Your Environment
|
||||||
|
|
||||||
|
1. **Configuration**
|
||||||
|
Create a development configuration file:
|
||||||
|
```yaml
|
||||||
|
# dev-config.yaml
|
||||||
|
server:
|
||||||
|
host: "localhost"
|
||||||
|
port: 8080
|
||||||
|
logging:
|
||||||
|
level: "debug"
|
||||||
|
```

2. **Test Data**

    Set up test models and instances for development.

### Making Changes

1. **Create a Branch**

    ```bash
    git checkout -b feature/your-feature-name
    ```

2. **Development Commands**

    ```bash
    # Backend
    go test ./... -v              # Run tests
    go test -race ./... -v        # Run with race detector
    go fmt ./... && go vet ./...  # Format and vet code
    go build ./cmd/server         # Build binary

    # Frontend (from webui/ directory)
    npm run test                  # Run tests
    npm run lint                  # Lint code
    npm run type-check            # TypeScript check
    npm run build                 # Build for production
    ```

3. **Code Quality**

    ```bash
    # Run all checks before committing
    make lint
    make test
    make build
    ```

## Project Structure

### Backend (Go)

```
cmd/
├── server/          # Main application entry point
pkg/
├── backends/        # Model backend implementations
├── config/          # Configuration management
├── instance/        # Instance lifecycle management
├── manager/         # Instance manager
├── server/          # HTTP server and routes
├── testutil/        # Test utilities
└── validation/      # Input validation
```

### Frontend (React/TypeScript)

```
webui/src/
├── components/      # React components
├── contexts/        # React contexts
├── hooks/           # Custom hooks
├── lib/             # Utility libraries
├── schemas/         # Zod schemas
└── types/           # TypeScript types
```

## Coding Standards

### Go Code

- Follow standard Go formatting (`gofmt`)
- Use `go vet` and address all warnings
- Write comprehensive tests for new functionality
- Include documentation comments for exported functions
- Use meaningful variable and function names

Example:
```go
// CreateInstance creates a new model instance with the given configuration.
// It validates the configuration and ensures the instance name is unique.
func (m *Manager) CreateInstance(ctx context.Context, config InstanceConfig) (*Instance, error) {
    if err := config.Validate(); err != nil {
        return nil, fmt.Errorf("invalid configuration: %w", err)
    }

    // Implementation...
}
```

### TypeScript/React Code

- Use TypeScript strict mode
- Follow React best practices
- Use functional components with hooks
- Implement proper error boundaries
- Write unit tests for components

Example:
```typescript
interface InstanceCardProps {
  instance: Instance;
  onStart: (name: string) => Promise<void>;
  onStop: (name: string) => Promise<void>;
}

export const InstanceCard: React.FC<InstanceCardProps> = ({
  instance,
  onStart,
  onStop,
}) => {
  // Implementation...
};
```

## Testing

### Backend Tests

```bash
# Run all tests
go test ./...

# Run tests with coverage
go test ./... -coverprofile=coverage.out
go tool cover -html=coverage.out

# Run specific package tests
go test ./pkg/manager -v

# Run with race detection
go test -race ./...
```

### Frontend Tests

```bash
cd webui

# Run unit tests
npm run test

# Run tests with coverage
npm run test:coverage

# Run E2E tests
npm run test:e2e
```

### Integration Tests

```bash
# Run integration tests (requires llama-server)
go test ./... -tags=integration
```
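
An integration test file carries a build tag so it only compiles with `-tags=integration`. A hypothetical sketch (the test name and package are illustrative, not actual project code):

```go
//go:build integration

package manager_test

import "testing"

// TestStartRealInstance only runs with -tags=integration, since it
// needs a real llama-server binary available on PATH.
func TestStartRealInstance(t *testing.T) {
	t.Skip("sketch only: start an instance here and assert it becomes healthy")
}
```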

## Pull Request Process

### Before Submitting

1. **Update your branch**

    ```bash
    git fetch upstream
    git rebase upstream/main
    ```

2. **Run all tests**

    ```bash
    make test-all
    ```

3. **Update documentation** if needed

4. **Write clear commit messages**

    ```
    feat: add instance health monitoring

    - Implement health check endpoint
    - Add periodic health monitoring
    - Update API documentation

    Fixes #123
    ```

### Submitting a PR

1. **Push your branch**

    ```bash
    git push origin feature/your-feature-name
    ```

2. **Create Pull Request**

    - Use the PR template
    - Provide a clear description
    - Link related issues
    - Add screenshots for UI changes

3. **PR Review Process**

    - Automated checks must pass
    - Code review by maintainers
    - Address feedback promptly
    - Keep PR scope focused

## Issue Guidelines

### Reporting Bugs

Use the bug report template and include:

- Steps to reproduce
- Expected vs actual behavior
- Environment details (OS, Go version, etc.)
- Relevant logs or error messages
- Minimal reproduction case

### Feature Requests

Use the feature request template and include:

- Clear description of the problem
- Proposed solution
- Alternative solutions considered
- Implementation complexity estimate

### Security Issues

For security vulnerabilities:

- Do NOT create public issues
- Email security@llamactl.dev
- Provide a detailed description
- Allow time for a fix before disclosure

## Development Best Practices

### API Design

- Follow REST principles
- Use consistent naming conventions
- Provide comprehensive error messages
- Include proper HTTP status codes
- Document all endpoints

### Error Handling

```go
// Wrap errors with context
if err := instance.Start(); err != nil {
    return fmt.Errorf("failed to start instance %s: %w", instance.Name, err)
}

// Use structured logging
log.WithFields(log.Fields{
    "instance": instance.Name,
    "error":    err,
}).Error("Failed to start instance")
```

### Configuration

- Use environment variables for deployment
- Provide sensible defaults
- Validate configuration on startup (see the sketch below)
- Support configuration file reloading
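
As a sketch of the "defaults + validate on startup" pattern, here is a minimal loader for the `LLAMACTL_PORT` variable from the Configuration guide (hypothetical helper, not actual project code):

```go
package config

import (
	"fmt"
	"os"
	"strconv"
)

// PortFromEnv reads LLAMACTL_PORT, applies a sensible default,
// and fails fast on invalid values.
func PortFromEnv() (int, error) {
	port := 8080 // default documented in the Configuration guide
	if v := os.Getenv("LLAMACTL_PORT"); v != "" {
		p, err := strconv.Atoi(v)
		if err != nil {
			return 0, fmt.Errorf("invalid LLAMACTL_PORT %q: %w", v, err)
		}
		port = p
	}
	if port < 1 || port > 65535 {
		return 0, fmt.Errorf("LLAMACTL_PORT %d out of range", port)
	}
	return port, nil
}
```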

### Performance

- Profile code for bottlenecks
- Use efficient data structures
- Implement proper caching
- Monitor resource usage

## Release Process

### Version Management

- Use semantic versioning (SemVer)
- Tag releases properly
- Maintain CHANGELOG.md
- Create release notes

### Building Releases

```bash
# Build all platforms
make build-all

# Create release package
make package
```

## Getting Help

### Communication Channels

- **GitHub Issues**: Bug reports and feature requests
- **GitHub Discussions**: General questions and ideas
- **Code Review**: PR comments and feedback

### Development Questions

When asking for help:

1. Check existing documentation
2. Search previous issues
3. Provide a minimal reproduction case
4. Include relevant environment details

## Recognition

Contributors are recognized in:

- CONTRIBUTORS.md file
- Release notes
- Documentation credits
- Annual contributor highlights

Thank you for contributing to LlamaCtl!
154
docs/getting-started/configuration.md
Normal file
@@ -0,0 +1,154 @@

# Configuration

LlamaCtl can be configured through various methods to suit your needs.

## Configuration File

Create a configuration file at `~/.llamactl/config.yaml`:

```yaml
# Server configuration
server:
  host: "0.0.0.0"
  port: 8080
  cors_enabled: true

# Authentication (optional)
auth:
  enabled: false
  # When enabled, configure your authentication method
  # jwt_secret: "your-secret-key"

# Default instance settings
defaults:
  backend: "llamacpp"
  timeout: 300
  log_level: "info"

# Paths
paths:
  models_dir: "/path/to/your/models"
  logs_dir: "/var/log/llamactl"
  data_dir: "/var/lib/llamactl"

# Instance limits
limits:
  max_instances: 10
  max_memory_per_instance: "8GB"
```

## Environment Variables

You can also configure LlamaCtl using environment variables:

```bash
# Server settings
export LLAMACTL_HOST=0.0.0.0
export LLAMACTL_PORT=8080

# Paths
export LLAMACTL_MODELS_DIR=/path/to/models
export LLAMACTL_LOGS_DIR=/var/log/llamactl

# Limits
export LLAMACTL_MAX_INSTANCES=5
```

## Command Line Options

View all available command line options:

```bash
llamactl --help
```

Common options:

```bash
# Specify config file
llamactl --config /path/to/config.yaml

# Set log level
llamactl --log-level debug

# Run on a different port
llamactl --port 9090
```

## Instance Configuration

When creating instances, you can specify various options:

### Basic Options

- `name`: Unique identifier for the instance
- `model_path`: Path to the GGUF model file
- `port`: Port for the instance to listen on

### Advanced Options

- `threads`: Number of CPU threads to use
- `context_size`: Context window size
- `batch_size`: Batch size for processing
- `gpu_layers`: Number of layers to offload to GPU
- `memory_lock`: Lock the model in memory
- `no_mmap`: Disable memory mapping

### Example Instance Configuration

```json
{
  "name": "production-model",
  "model_path": "/models/llama-2-13b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 8,
    "context_size": 4096,
    "batch_size": 512,
    "gpu_layers": 35,
    "memory_lock": true
  }
}
```

## Security Configuration

### Enable Authentication

To enable authentication, update your config file:

```yaml
auth:
  enabled: true
  jwt_secret: "your-very-secure-secret-key"
  token_expiry: "24h"
```

### HTTPS Configuration

For production deployments, configure HTTPS:

```yaml
server:
  tls:
    enabled: true
    cert_file: "/path/to/cert.pem"
    key_file: "/path/to/key.pem"
```

## Logging Configuration

Configure logging levels and outputs:

```yaml
logging:
  level: "info"        # debug, info, warn, error
  format: "json"       # json or text
  output: "/var/log/llamactl/app.log"
```

## Next Steps

- Learn about [Managing Instances](../user-guide/managing-instances.md)
- Explore advanced [Backends](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md)
55
docs/getting-started/installation.md
Normal file
@@ -0,0 +1,55 @@

# Installation

This guide will walk you through installing LlamaCtl on your system.

## Prerequisites

Before installing LlamaCtl, ensure you have:

- Go 1.24 or later (only needed when building from source)
- Git
- Sufficient disk space for your models

## Installation Methods

### Option 1: Download Binary (Recommended)

Download the latest release from our [GitHub releases page](https://github.com/lordmathis/llamactl/releases):

```bash
# Download for Linux
curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-linux-amd64 -o llamactl

# Make executable
chmod +x llamactl

# Move to PATH (optional)
sudo mv llamactl /usr/local/bin/
```

### Option 2: Build from Source

If you prefer to build from source:

```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl

# Build the application
go build -o llamactl cmd/server/main.go
```

For detailed build instructions, see the [Building from Source](../development/building.md) guide.

## Verification

Verify your installation by checking the version:

```bash
llamactl --version
```

## Next Steps

Now that LlamaCtl is installed, continue to the [Quick Start](quick-start.md) guide to get your first instance running!
86
docs/getting-started/quick-start.md
Normal file
@@ -0,0 +1,86 @@

# Quick Start

This guide will help you get LlamaCtl up and running in just a few minutes.

## Step 1: Start LlamaCtl

Start the LlamaCtl server:

```bash
llamactl
```

By default, LlamaCtl will start on `http://localhost:8080`.

## Step 2: Access the Web UI

Open your web browser and navigate to:

```
http://localhost:8080
```

You should see the LlamaCtl web interface.

## Step 3: Create Your First Instance

1. Click the "Add Instance" button
2. Fill in the instance configuration:
    - **Name**: Give your instance a descriptive name
    - **Model Path**: Path to your Llama.cpp model file
    - **Port**: Port for the instance to run on
    - **Additional Options**: Any extra Llama.cpp parameters

3. Click "Create Instance"

## Step 4: Start Your Instance

Once created, you can:

- **Start** the instance by clicking the start button
- **Monitor** its status in real-time
- **View logs** by clicking the logs button
- **Stop** the instance when needed

## Example Configuration

Here's a basic example configuration for a Llama 2 model:

```json
{
  "name": "llama2-7b",
  "model_path": "/path/to/llama-2-7b-chat.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048
  }
}
```

## Using the API

You can also manage instances via the REST API:

```bash
# List all instances
curl http://localhost:8080/api/instances

# Create a new instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-model",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'

# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
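
Once the instance is running, you can also talk to the underlying `llama-server` directly on the instance port (assuming the default llama.cpp HTTP API):

```bash
# Send a completion request to the instance started above
curl -X POST http://localhost:8081/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "n_predict": 32}'
```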

## Next Steps

- Learn more about the [Web UI](../user-guide/web-ui.md)
- Explore the [API Reference](../user-guide/api-reference.md)
- Configure advanced settings in the [Configuration](configuration.md) guide
41
docs/index.md
Normal file
@@ -0,0 +1,41 @@

# LlamaCtl Documentation

Welcome to the LlamaCtl documentation! LlamaCtl is a powerful management tool for Llama.cpp instances that provides both a web interface and REST API for managing large language models.

## What is LlamaCtl?

LlamaCtl is designed to simplify the deployment and management of Llama.cpp instances. It provides:

- **Instance Management**: Start, stop, and monitor multiple Llama.cpp instances
- **Web UI**: User-friendly interface for managing your models
- **REST API**: Programmatic access to all functionality
- **Health Monitoring**: Real-time status and health checks
- **Configuration Management**: Easy setup and configuration options

## Key Features

- 🚀 **Easy Setup**: Quick installation and configuration
- 🌐 **Web Interface**: Intuitive web UI for model management
- 🔧 **REST API**: Full API access for automation
- 📊 **Monitoring**: Real-time health and status monitoring
- 🔒 **Security**: Authentication and access control
- 📱 **Responsive**: Works on desktop and mobile devices

## Quick Links

- [Installation Guide](getting-started/installation.md) - Get LlamaCtl up and running
- [Quick Start](getting-started/quick-start.md) - Your first steps with LlamaCtl
- [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
- [API Reference](user-guide/api-reference.md) - Complete API documentation

## Getting Help

If you need help or have questions:

- Check the [Troubleshooting](advanced/troubleshooting.md) guide
- Visit our [GitHub repository](https://github.com/lordmathis/llamactl)
- Read the [Contributing guide](development/contributing.md) to help improve LlamaCtl

---

Ready to get started? Head over to the [Installation Guide](getting-started/installation.md)!
470
docs/user-guide/api-reference.md
Normal file
@@ -0,0 +1,470 @@

# API Reference

Complete reference for the LlamaCtl REST API.

## Base URL

All API endpoints are relative to the base URL:

```
http://localhost:8080/api
```

## Authentication

If authentication is enabled, include the JWT token in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-jwt-token>" \
  http://localhost:8080/api/instances
```

## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/instances
```

**Response:**
```json
{
  "instances": [
    {
      "name": "llama2-7b",
      "status": "running",
      "model_path": "/models/llama-2-7b.gguf",
      "port": 8081,
      "created_at": "2024-01-15T10:30:00Z",
      "updated_at": "2024-01-15T12:45:00Z"
    }
  ]
}
```

### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/instances/{name}
```

**Response:**
```json
{
  "name": "llama2-7b",
  "status": "running",
  "model_path": "/models/llama-2-7b.gguf",
  "port": 8081,
  "pid": 12345,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  },
  "stats": {
    "memory_usage": 4294967296,
    "cpu_usage": 25.5,
    "uptime": 3600
  },
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T12:45:00Z"
}
```

### Create Instance

Create a new instance.

```http
POST /api/instances
```

**Request Body:**
```json
{
  "name": "my-instance",
  "model_path": "/path/to/model.gguf",
  "port": 8081,
  "options": {
    "threads": 4,
    "context_size": 2048,
    "gpu_layers": 0
  }
}
```

**Response:**
```json
{
  "message": "Instance created successfully",
  "instance": {
    "name": "my-instance",
    "status": "stopped",
    "model_path": "/path/to/model.gguf",
    "port": 8081,
    "created_at": "2024-01-15T14:30:00Z"
  }
}
```

### Update Instance

Update an existing instance configuration.

```http
PUT /api/instances/{name}
```

**Request Body:**
```json
{
  "options": {
    "threads": 8,
    "context_size": 4096
  }
}
```

### Delete Instance

Delete an instance (must be stopped first).

```http
DELETE /api/instances/{name}
```

**Response:**
```json
{
  "message": "Instance deleted successfully"
}
```

## Instance Operations

### Start Instance

Start a stopped instance.

```http
POST /api/instances/{name}/start
```

**Response:**
```json
{
  "message": "Instance start initiated",
  "status": "starting"
}
```

### Stop Instance

Stop a running instance.

```http
POST /api/instances/{name}/stop
```

**Request Body (Optional):**
```json
{
  "force": false,
  "timeout": 30
}
```

**Response:**
```json
{
  "message": "Instance stop initiated",
  "status": "stopping"
}
```
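
For example, to force a stop with a shorter grace period:

```bash
# Force-stop the instance after a 10 second timeout
curl -X POST http://localhost:8080/api/instances/llama2-7b/stop \
  -H "Content-Type: application/json" \
  -d '{"force": true, "timeout": 10}'
```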

### Restart Instance

Restart an instance (stop then start).

```http
POST /api/instances/{name}/restart
```

### Get Instance Health

Check instance health status.

```http
GET /api/instances/{name}/health
```

**Response:**
```json
{
  "status": "healthy",
  "checks": {
    "process": "running",
    "port": "open",
    "response": "ok"
  },
  "last_check": "2024-01-15T14:30:00Z"
}
```

### Get Instance Logs

Retrieve instance logs.

```http
GET /api/instances/{name}/logs
```

**Query Parameters:**

- `lines`: Number of lines to return (default: 100)
- `follow`: Stream logs (boolean)
- `level`: Filter by log level (debug, info, warn, error)

**Response:**
```json
{
  "logs": [
    {
      "timestamp": "2024-01-15T14:30:00Z",
      "level": "info",
      "message": "Model loaded successfully"
    }
  ]
}
```
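
For example, to fetch the 50 most recent error-level entries:

```bash
curl "http://localhost:8080/api/instances/llama2-7b/logs?lines=50&level=error"
```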

## Batch Operations

### Start All Instances

Start all stopped instances.

```http
POST /api/instances/start-all
```

### Stop All Instances

Stop all running instances.

```http
POST /api/instances/stop-all
```

## System Information

### Get System Status

Get overall system status and metrics.

```http
GET /api/system/status
```

**Response:**
```json
{
  "version": "1.0.0",
  "uptime": 86400,
  "instances": {
    "total": 5,
    "running": 3,
    "stopped": 2
  },
  "resources": {
    "cpu_usage": 45.2,
    "memory_usage": 8589934592,
    "memory_total": 17179869184,
    "disk_usage": 75.5
  }
}
```

### Get System Information

Get detailed system information.

```http
GET /api/system/info
```

**Response:**
```json
{
  "hostname": "server-01",
  "os": "linux",
  "arch": "amd64",
  "cpu_count": 8,
  "memory_total": 17179869184,
  "version": "1.0.0",
  "build_time": "2024-01-15T10:00:00Z"
}
```

## Configuration

### Get Configuration

Get the current LlamaCtl configuration.

```http
GET /api/config
```

### Update Configuration

Update the LlamaCtl configuration (requires restart).

```http
PUT /api/config
```

## Authentication Endpoints

### Login

Authenticate and receive a JWT token.

```http
POST /api/auth/login
```

**Request Body:**
```json
{
  "username": "admin",
  "password": "password"
}
```

**Response:**
```json
{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expires_at": "2024-01-16T14:30:00Z"
}
```

### Refresh Token

Refresh an existing JWT token.

```http
POST /api/auth/refresh
```

## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message",
  "code": "ERROR_CODE",
  "details": "Additional error details"
}
```

### Common HTTP Status Codes

- `200`: Success
- `201`: Created
- `400`: Bad Request
- `401`: Unauthorized
- `403`: Forbidden
- `404`: Not Found
- `409`: Conflict (e.g., instance already exists)
- `500`: Internal Server Error

## WebSocket API

### Real-time Updates

Connect to the WebSocket endpoint for real-time updates:

```javascript
const ws = new WebSocket('ws://localhost:8080/api/ws');

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Update:', data);
};
```

**Message Types:**

- `instance_status_changed`: Instance status updates
- `instance_stats_updated`: Resource usage updates
- `system_alert`: System-level alerts
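
The exact message schema isn't specified here, but a status update might carry the instance fields shown elsewhere in this reference (hypothetical shape, for illustration only):

```json
{
  "type": "instance_status_changed",
  "data": {
    "name": "llama2-7b",
    "status": "running"
  }
}
```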

## Rate Limiting

API requests are rate limited to:

- **100 requests per minute** for regular endpoints
- **10 requests per minute** for resource-intensive operations

Rate limit headers are included in responses:

- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset time (Unix timestamp)
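
You can inspect these headers on any response:

```bash
# -i includes response headers in the output
curl -i http://localhost:8080/api/instances | grep -i '^x-ratelimit'
```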

## SDKs and Libraries

### Go Client

```go
import "github.com/lordmathis/llamactl-go-client"

client := llamactl.NewClient("http://localhost:8080")
instances, err := client.ListInstances()
```

### Python Client

```python
from llamactl import Client

client = Client("http://localhost:8080")
instances = client.list_instances()
```

## Examples

### Complete Instance Lifecycle

```bash
# Create instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example",
    "model_path": "/models/example.gguf",
    "port": 8081
  }'

# Start instance
curl -X POST http://localhost:8080/api/instances/example/start

# Check status
curl http://localhost:8080/api/instances/example

# Stop instance
curl -X POST http://localhost:8080/api/instances/example/stop

# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example
```

## Next Steps

- Learn about [Managing Instances](managing-instances.md) in detail
- Explore advanced [Backends](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use
171
docs/user-guide/managing-instances.md
Normal file
@@ -0,0 +1,171 @@

# Managing Instances

Learn how to effectively manage your Llama.cpp instances with LlamaCtl.

## Instance Lifecycle

### Creating Instances

Instances can be created through the Web UI or API:

#### Via Web UI

1. Click the "Add Instance" button
2. Fill in the configuration form
3. Click "Create"

#### Via API

```bash
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'
```

### Starting and Stopping

#### Start an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start

# The instance will begin loading the model
```

#### Stop an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop

# Graceful shutdown with configurable timeout
```

### Monitoring Status

Check instance status in real-time:

```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}

# Get health status
curl http://localhost:8080/api/instances/{name}/health
```

## Instance States

Instances can be in one of several states:

- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error

## Configuration Management

### Updating Instance Configuration

Modify instance settings:

```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'
```

!!! note
    Configuration changes require restarting the instance to take effect.
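
To apply the changes, restart the instance using the restart endpoint from the [API Reference](api-reference.md):

```bash
curl -X POST http://localhost:8080/api/instances/{name}/restart
```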

### Viewing Configuration

```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```

## Resource Management

### Memory Usage

Monitor memory consumption:

```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```

### CPU and GPU Usage

Track performance metrics:

- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times

## Troubleshooting Common Issues

### Instance Won't Start

1. **Check model path**: Ensure the model file exists and is readable
2. **Port conflicts**: Verify the port isn't already in use (see the check below)
3. **Resource limits**: Check available memory and CPU
4. **Permissions**: Ensure proper file system permissions
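
A quick way to check for a port conflict (assuming the instance port is 8081):

```bash
# List listeners on the instance port; any output means the port is taken
ss -tlnp | grep 8081
```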

### Performance Issues

1. **Adjust thread count**: Match to your CPU cores
2. **Optimize context size**: Balance memory usage and capability
3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
4. **Batch size tuning**: Optimize for your workload

### Memory Problems

1. **Reduce context size**: Lower memory requirements
2. **Disable memory mapping**: Use the `no_mmap` option
3. **Enable memory locking**: Use `memory_lock` for performance
4. **Monitor system resources**: Check available RAM

## Best Practices

### Production Deployments

1. **Resource allocation**: Plan memory and CPU requirements
2. **Health monitoring**: Set up regular health checks
3. **Graceful shutdowns**: Use proper stop procedures
4. **Backup configurations**: Save instance configurations
5. **Log management**: Configure appropriate logging levels

### Development Environments

1. **Resource sharing**: Use smaller models for development
2. **Quick iterations**: Optimize for fast startup times
3. **Debug logging**: Enable detailed logging for troubleshooting

## Batch Operations

### Managing Multiple Instances

```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances
```

## Next Steps

- Learn about the [Web UI](web-ui.md) interface
- Explore the complete [API Reference](api-reference.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use
216
docs/user-guide/web-ui.md
Normal file
@@ -0,0 +1,216 @@

# Web UI Guide

The LlamaCtl Web UI provides an intuitive interface for managing your Llama.cpp instances.

## Overview

The web interface is accessible at `http://localhost:8080` (or your configured host/port) and provides:

- Instance management dashboard
- Real-time status monitoring
- Configuration management
- Log viewing
- System information

## Dashboard

### Instance Cards

Each instance is displayed as a card showing:

- **Instance name** and status indicator
- **Model information** (name, size)
- **Current state** (stopped, starting, running, error)
- **Resource usage** (memory, CPU)
- **Action buttons** (start, stop, configure, logs)

### Status Indicators

- 🟢 **Green**: Instance is running and healthy
- 🟡 **Yellow**: Instance is starting or stopping
- 🔴 **Red**: Instance has encountered an error
- ⚪ **Gray**: Instance is stopped

## Creating Instances

### Add Instance Dialog

1. Click the **"Add Instance"** button
2. Fill in the required fields:
    - **Name**: Unique identifier for your instance
    - **Model Path**: Full path to your GGUF model file
    - **Port**: Port number for the instance

3. Configure optional settings:
    - **Threads**: Number of CPU threads
    - **Context Size**: Context window size
    - **GPU Layers**: Layers to offload to GPU
    - **Additional Options**: Advanced Llama.cpp parameters

4. Click **"Create"** to save the instance

### Model Path Helper

Use the file browser to select model files:

- Navigate to your models directory
- Select the `.gguf` file
- The path is automatically filled in the form

## Managing Instances

### Starting Instances

1. Click the **"Start"** button on an instance card
2. Watch the status change to "Starting"
3. Monitor progress in the logs
4. The instance becomes "Running" when ready

### Stopping Instances

1. Click the **"Stop"** button
2. The instance gracefully shuts down
3. Status changes to "Stopped"

### Viewing Logs

1. Click the **"Logs"** button on any instance
2. A real-time log viewer opens
3. Filter by log level (Debug, Info, Warning, Error)
4. Search through log entries
5. Download logs for offline analysis

## Configuration Management

### Editing Instance Settings

1. Click the **"Configure"** button
2. Modify settings in the configuration dialog
3. Changes require an instance restart to take effect
4. Click **"Save"** to apply changes

### Advanced Options

Access advanced Llama.cpp options:

```yaml
# Example advanced configuration
options:
  rope_freq_base: 10000
  rope_freq_scale: 1.0
  yarn_ext_factor: -1.0
  yarn_attn_factor: 1.0
  yarn_beta_fast: 32.0
  yarn_beta_slow: 1.0
```

## System Information

### Health Dashboard

Monitor overall system health:

- **System Resources**: CPU, memory, disk usage
- **Instance Summary**: Running/stopped instance counts
- **Performance Metrics**: Request rates, response times

### Resource Usage

Track resource consumption:

- Per-instance memory usage
- CPU utilization
- GPU memory (if applicable)
- Network I/O

## User Interface Features

### Theme Support

Switch between light and dark themes:

1. Click the theme toggle button
2. The setting is remembered across sessions

### Responsive Design

The UI adapts to different screen sizes:

- **Desktop**: Full-featured dashboard
- **Tablet**: Condensed layout
- **Mobile**: Stack-based navigation

### Keyboard Shortcuts

- `Ctrl+N`: Create new instance
- `Ctrl+R`: Refresh dashboard
- `Ctrl+L`: Open logs for selected instance
- `Esc`: Close dialogs

## Authentication

### Login

If authentication is enabled:

1. Navigate to the web UI
2. Enter your credentials
3. A JWT token is stored for the session
4. Automatic logout on token expiry

### Session Management

- Sessions persist across browser restarts
- Logout clears authentication tokens
- Configurable session timeout

## Troubleshooting

### Common UI Issues

**Page won't load:**

- Check if the LlamaCtl server is running
- Verify the correct URL and port
- Check the browser console for errors

**Instance won't start from the UI:**

- Verify the model path is correct
- Check for port conflicts
- Review instance logs for errors

**Real-time updates not working:**

- Check the WebSocket connection
- Verify firewall settings
- Try refreshing the page

### Browser Compatibility

Supported browsers:

- Chrome/Chromium 90+
- Firefox 88+
- Safari 14+
- Edge 90+

## Mobile Access

### Responsive Features

On mobile devices:

- Touch-friendly interface
- Swipe gestures for navigation
- Optimized button sizes
- Condensed information display

### Limitations

Some features may be limited on mobile:

- Log viewing (use horizontal scrolling)
- Complex configuration forms
- File browser functionality

## Next Steps

- Learn about the [API Reference](api-reference.md) for programmatic access
- Set up [Monitoring](../advanced/monitoring.md) for production use
- Explore advanced [Backends](../advanced/backends.md) options
75
mkdocs.yml
Normal file
@@ -0,0 +1,75 @@

site_name: LlamaCtl Documentation
site_description: User documentation for LlamaCtl - A management tool for Llama.cpp instances
site_author: LlamaCtl Team
site_url: https://llamactl.org

repo_name: lordmathis/llamactl
repo_url: https://github.com/lordmathis/llamactl

theme:
  name: material
  palette:
    # Palette toggle for light mode
    - scheme: default
      primary: indigo
      accent: indigo
      toggle:
        icon: material/brightness-7
        name: Switch to dark mode
    # Palette toggle for dark mode
    - scheme: slate
      primary: indigo
      accent: indigo
      toggle:
        icon: material/brightness-4
        name: Switch to light mode
  features:
    - navigation.tabs
    - navigation.sections
    - navigation.expand
    - navigation.top
    - search.highlight
    - search.share
    - content.code.copy

markdown_extensions:
  - pymdownx.highlight:
      anchor_linenums: true
  - pymdownx.inlinehilite
  - pymdownx.snippets
  - pymdownx.superfences
  - admonition
  - pymdownx.details
  - pymdownx.tabbed:
      alternate_style: true
  - attr_list
  - md_in_html
  - toc:
      permalink: true

nav:
  - Home: index.md
  - Getting Started:
    - Installation: getting-started/installation.md
    - Quick Start: getting-started/quick-start.md
    - Configuration: getting-started/configuration.md
  - User Guide:
    - Managing Instances: user-guide/managing-instances.md
    - Web UI: user-guide/web-ui.md
    - API Reference: user-guide/api-reference.md
  - Advanced:
    - Backends: advanced/backends.md
    - Monitoring: advanced/monitoring.md
    - Troubleshooting: advanced/troubleshooting.md
  - Development:
    - Contributing: development/contributing.md
    - Building from Source: development/building.md

plugins:
  - search
  - git-revision-date-localized

extra:
  social:
    - icon: fontawesome/brands/github
      link: https://github.com/lordmathis/llamactl