Remove misleading advanced section

2025-12-23 17:44:24 +00:00 · 2025-08-31 16:04:09 +02:00
parent 92af14b350
commit b08f15c5d0
9 changed files with 3 additions and 779 deletions
--- a/docs/advanced/backends.md
+++ b/docs/advanced/backends.md
@@ -1,316 +0,0 @@
 # Backends
 Llamactl supports multiple backends for running large language models. This guide covers the available backends and their configuration.
 ## Llama.cpp Backend
 The primary backend for Llamactl, providing robust support for GGUF models.
 ### Features
 - **GGUF Support**: Native support for GGUF model format
 - **GPU Acceleration**: CUDA, OpenCL, and Metal support
 - **Memory Optimization**: Efficient memory usage and mapping
 - **Multi-threading**: Configurable CPU thread utilization
 - **Quantization**: Support for various quantization levels
 ### Configuration
 ```yaml
 backends:
  llamacpp:
    binary_path: "/usr/local/bin/llama-server"
    default_options:
      threads: 4
      context_size: 2048
      batch_size: 512
    gpu:
      enabled: true
      layers: 35
 ```
 ### Supported Options
 | Option | Description | Default |
 |--------|-------------|---------|
 | `threads` | Number of CPU threads | 4 |
 | `context_size` | Context window size | 2048 |
 | `batch_size` | Batch size for processing | 512 |
 | `gpu_layers` | Layers to offload to GPU | 0 |
 | `memory_lock` | Lock model in memory | false |
 | `no_mmap` | Disable memory mapping | false |
 | `rope_freq_base` | RoPE frequency base | 10000 |
 | `rope_freq_scale` | RoPE frequency scale | 1.0 |
 ### GPU Acceleration
 #### CUDA Setup
 ```bash
 # Install CUDA toolkit
 sudo apt update
 sudo apt install nvidia-cuda-toolkit
 # Verify CUDA installation
 nvcc --version
 nvidia-smi
 ```
 #### Configuration for GPU
 ```json
 {
  "name": "gpu-accelerated",
  "model_path": "/models/llama-2-13b.gguf",
  "port": 8081,
  "options": {
    "gpu_layers": 35,
    "threads": 2,
    "context_size": 4096
  }
 }
 ```
 ### Performance Tuning
 #### Memory Optimization
 ```yaml
 # For limited memory systems
 options:
  context_size: 1024
  batch_size: 256
  no_mmap: true
  memory_lock: false
 ---
 # For high-memory systems
 options:
  context_size: 8192
  batch_size: 1024
  memory_lock: true
  no_mmap: false
 ```
 #### CPU Optimization
 ```yaml
 # Match thread count to CPU cores
 # For 8-core CPU:
 options:
  threads: 6  # Leave 2 cores for system
 ---
 # For high-performance CPUs:
 options:
  threads: 16
  batch_size: 1024
 ```
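
The "leave two cores for the system" rule from the CPU example can be sketched in Go. This is only an illustration under stated assumptions: `suggestThreads` and the reserve of 2 are hypothetical and are not part of Llamactl.

```go
package main

import (
	"fmt"
	"runtime"
)

// suggestThreads is a hypothetical helper: it returns a thread count for a
// machine with `cores` logical CPUs while keeping `reserve` cores free for
// the rest of the system. It never returns less than 1.
func suggestThreads(cores, reserve int) int {
	n := cores - reserve
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	// runtime.NumCPU reports the logical CPUs usable by the current process.
	fmt.Println("suggested threads:", suggestThreads(runtime.NumCPU(), 2))
}
```

The best value still depends on the workload, so a number derived this way is a starting point for benchmarking rather than a final setting.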
 ## Future Backends
 Llamactl is designed to support multiple backends. Planned additions:
 ### vLLM Backend
 High-performance inference engine optimized for serving:
 - **Features**: Fast inference, batching, streaming
 - **Models**: Supports various model formats
 - **Scaling**: Horizontal scaling support
 ### TensorRT-LLM Backend
 NVIDIA's optimized inference engine:
 - **Features**: Maximum GPU performance
 - **Models**: Optimized for NVIDIA GPUs
 - **Deployment**: Production-ready inference
 ### Ollama Backend
 Integration with Ollama for easy model management:
 - **Features**: Simplified model downloading
 - **Models**: Large model library
 - **Integration**: Seamless model switching
 ## Backend Selection
 ### Automatic Detection
 Llamactl can automatically detect the best backend:
 ```yaml
 backends:
  auto_detect: true
  preference_order:
    - "llamacpp"
    - "vllm"
    - "tensorrt"
 ```
 ### Manual Selection
 Force a specific backend for an instance:
 ```json
 {
  "name": "manual-backend",
  "backend": "llamacpp",
  "model_path": "/models/model.gguf",
  "port": 8081
 }
 ```
 ## Backend-Specific Features
 ### Llama.cpp Features
 #### Model Formats
 - **GGUF**: Primary format, best compatibility
 - **GGML**: Legacy format (limited support)
 #### Quantization Levels
 - `Q2_K`: Smallest size, lower quality
 - `Q4_K_M`: Balanced size and quality
 - `Q5_K_M`: Higher quality, larger size
 - `Q6_K`: Near-original quality
 - `Q8_0`: Minimal loss, largest size
 #### Advanced Options
 ```yaml
 advanced:
  rope_scaling:
    type: "linear"
    factor: 2.0
  attention:
    flash_attention: true
    grouped_query: true
 ```
 ## Monitoring Backend Performance
 ### Metrics Collection
 Monitor backend-specific metrics:
 ```bash
 # Get backend statistics
 curl http://localhost:8080/api/instances/my-instance/backend/stats
 ```
 **Response:**
 ```json
 {
  "backend": "llamacpp",
  "version": "b1234",
  "metrics": {
    "tokens_per_second": 15.2,
    "memory_usage": 4294967296,
    "gpu_utilization": 85.5,
    "context_usage": 75.0
  }
 }
 ```
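
A client could decode a payload shaped like the sample response as sketched below. The struct layout is an assumption copied from that sample, not a confirmed schema, and `backendStats`, `parseStats`, and `bytesToGiB` are hypothetical names. The sketch also assumes `memory_usage` is reported in bytes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// backendStats mirrors the sample response; the field set is an assumption
// taken from that sample, not a documented API contract.
type backendStats struct {
	Backend string `json:"backend"`
	Version string `json:"version"`
	Metrics struct {
		TokensPerSecond float64 `json:"tokens_per_second"`
		MemoryUsage     int64   `json:"memory_usage"` // assumed to be bytes
		GPUUtilization  float64 `json:"gpu_utilization"`
		ContextUsage    float64 `json:"context_usage"`
	} `json:"metrics"`
}

// parseStats decodes a stats payload shaped like the sample response.
func parseStats(data []byte) (backendStats, error) {
	var s backendStats
	err := json.Unmarshal(data, &s)
	return s, err
}

// bytesToGiB converts a byte count to gibibytes.
func bytesToGiB(b int64) float64 {
	return float64(b) / (1 << 30)
}

func main() {
	sample := []byte(`{
		"backend": "llamacpp",
		"version": "b1234",
		"metrics": {
			"tokens_per_second": 15.2,
			"memory_usage": 4294967296,
			"gpu_utilization": 85.5,
			"context_usage": 75.0
		}
	}`)

	s, err := parseStats(sample)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Printf("%s: %.1f tok/s, %.1f GiB\n",
		s.Backend, s.Metrics.TokensPerSecond, bytesToGiB(s.Metrics.MemoryUsage))
}
```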
 ### Performance Optimization
 #### Benchmark Different Configurations
 ```bash
 # Test various thread counts
 for threads in 2 4 8 16; do
  echo "Testing $threads threads"
  curl -X PUT http://localhost:8080/api/instances/benchmark \
    -d "{\"options\": {\"threads\": $threads}}"
  # Run performance test
 done
 ```
 #### Memory Usage Optimization
 ```bash
 # Monitor memory usage
 watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
 ```
 ## Troubleshooting Backends
 ### Common Llama.cpp Issues
 **Model won't load:**
 ```bash
 # Check model file
 file /path/to/model.gguf
 # Verify format
 llama-server --model /path/to/model.gguf --dry-run
 ```
 **GPU not detected:**
 ```bash
 # Check CUDA installation
 nvidia-smi
 # Verify llama.cpp GPU support
 llama-server --help | grep -i gpu
 ```
 **Performance issues:**
 ```bash
 # Check system resources
 htop
 nvidia-smi
 # Verify configuration
 curl http://localhost:8080/api/instances/my-instance/config
 ```
 ## Custom Backend Development
 ### Backend Interface
 Implement the backend interface for custom backends:
 ```go
 type Backend interface {
    Start(config InstanceConfig) error
    Stop(instance *Instance) error
    Health(instance *Instance) (*HealthStatus, error)
    Stats(instance *Instance) (*Stats, error)
 }
 ```
 ### Registration
 Register your custom backend:
 ```go
 func init() {
    backends.Register("custom", &CustomBackend{})
 }
 ```
 ## Best Practices
 ### Production Deployments
 1. **Resource allocation**: Plan for peak usage
 2. **Backend selection**: Choose based on requirements
 3. **Monitoring**: Set up comprehensive monitoring
 4. **Fallback**: Configure backup backends
 ### Development
 1. **Rapid iteration**: Use smaller models
 2. **Resource monitoring**: Track usage patterns
 3. **Configuration testing**: Validate settings
 4. **Performance profiling**: Optimize bottlenecks
 ## Next Steps
 - Learn about [Monitoring](monitoring.md) backend performance
 - Explore [Troubleshooting](troubleshooting.md) guides
 - Set up [Production Monitoring](monitoring.md)
--- a/docs/advanced/monitoring.md
+++ b/docs/advanced/monitoring.md
@@ -1,420 +0,0 @@
 # Monitoring
 Comprehensive monitoring setup for Llamactl in production environments.
 ## Overview
 Effective monitoring of Llamactl involves tracking:
 - Instance health and performance
 - System resource usage
 - API response times
 - Error rates and alerts
 ## Built-in Monitoring
 ### Health Checks
 Llamactl provides built-in health monitoring:
 ```bash
 # Check overall system health
 curl http://localhost:8080/api/system/health
 # Check specific instance health
 curl http://localhost:8080/api/instances/{name}/health
 ```
 ### Metrics Endpoint
 Access Prometheus-compatible metrics:
 ```bash
 curl http://localhost:8080/metrics
 ```
 **Available Metrics:**
 - `llamactl_instances_total`: Total number of instances
 - `llamactl_instances_running`: Number of running instances
 - `llamactl_instance_memory_bytes`: Instance memory usage
 - `llamactl_instance_cpu_percent`: Instance CPU usage
 - `llamactl_api_requests_total`: Total API requests
 - `llamactl_api_request_duration_seconds`: API response times
 ## Prometheus Integration
 ### Configuration
 Add Llamactl as a Prometheus target:
 ```yaml
 # prometheus.yml
 scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
 ```
 ### Custom Metrics
 Enable additional metrics in Llamactl:
 ```yaml
 # config.yaml
 monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
 ```
 ## Grafana Dashboards
 ### Llamactl Dashboard
 Import the official Grafana dashboard:
 1. Download dashboard JSON from releases
 2. Import into Grafana
 3. Configure Prometheus data source
 ### Key Panels
 **Instance Overview:**
 - Instance count and status
 - Resource usage per instance
 - Health status indicators
 **Performance Metrics:**
 - API response times
 - Tokens per second
 - Memory usage trends
 **System Resources:**
 - CPU and memory utilization
 - Disk I/O and network usage
 - GPU utilization (if applicable)
 ### Custom Queries
 **Instance Uptime:**
 ```promql
 (time() - llamactl_instance_start_time_seconds) / 3600
 ```
 **Memory Usage Percentage:**
 ```promql
 (llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
 ```
 **API Error Rate:**
 ```promql
 rate(llamactl_api_requests_total{status=~"[45].."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
 ```
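
The error-rate query is a plain ratio: the rate of failed requests divided by the rate of all requests, times 100. The same arithmetic can be sketched in Go; `errorRatePercent` is a hypothetical helper, and the input values are made up for illustration.

```go
package main

import "fmt"

// errorRatePercent returns failed/total as a percentage.
// It returns 0 when total is 0; this guard is a choice made in this sketch
// (PromQL itself would yield NaN for 0/0).
func errorRatePercent(failed, total float64) float64 {
	if total == 0 {
		return 0
	}
	return failed / total * 100
}

func main() {
	// Hypothetical per-second request rates over a 5-minute window.
	fmt.Printf("%.1f%%\n", errorRatePercent(0.5, 20))
}
```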
 ## Alerting
 ### Prometheus Alerts
 Configure alerts for critical conditions:
 ```yaml
 # alerts.yml
 groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Llamactl instance {{ $labels.instance_name }} is down"
      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"
      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
 ```
 ### Notification Channels
 Configure alert notifications:
 **Slack Integration:**
 ```yaml
 # alertmanager.yml
 route:
  group_by: ['alertname']
  receiver: 'slack'
 receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'Llamactl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
 ```
 ## Log Management
 ### Centralized Logging
 Configure log aggregation:
 ```yaml
 # config.yaml
 logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
 ```
 ### Log Analysis
 Use ELK stack for log analysis:
 **Elasticsearch Index Template:**
 ```json
 {
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
 }
 ```
 **Kibana Visualizations:**
 - Log volume over time
 - Error rate by instance
 - Performance trends
 - Resource usage patterns
 ## Application Performance Monitoring
 ### OpenTelemetry Integration
 Enable distributed tracing:
 ```yaml
 # config.yaml
 telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:4318/v1/traces"
  sampling_rate: 0.1
 ```
 ### Custom Spans
 Add custom tracing to track operations:
 ```go
 ctx, span := tracer.Start(ctx, "instance.start")
 defer span.End()
 // Track instance startup time
 span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
 )
 ```
 ## Health Check Configuration
 ### Readiness Probes
 Configure Kubernetes readiness probes:
 ```yaml
 readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
 ```
 ### Liveness Probes
 Configure liveness probes:
 ```yaml
 livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
 ```
 ### Custom Health Checks
 Implement custom health checks:
 ```go
 func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }
    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }
    return nil
 }
 ```
 ## Performance Profiling
 ### pprof Integration
 Enable Go profiling:
 ```yaml
 # config.yaml
 debug:
  pprof_enabled: true
  pprof_port: 6060
 ```
 Access profiling endpoints:
 ```bash
 # CPU profile
 go tool pprof http://localhost:6060/debug/pprof/profile
 # Memory profile
 go tool pprof http://localhost:6060/debug/pprof/heap
 # Goroutine profile
 go tool pprof http://localhost:6060/debug/pprof/goroutine
 ```
 ### Continuous Profiling
 Set up continuous profiling with Pyroscope:
 ```yaml
 # config.yaml
 profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
 ```
 ## Security Monitoring
 ### Audit Logging
 Enable security audit logs:
 ```yaml
 # config.yaml
 audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
 ```
 ### Rate Limiting Monitoring
 Track rate limiting metrics:
 ```bash
 # Monitor rate limit hits
 curl http://localhost:8080/metrics | grep rate_limit
 ```
 ## Troubleshooting Monitoring
 ### Common Issues
 **Metrics not appearing:**
 1. Check Prometheus configuration
 2. Verify network connectivity
 3. Review Llamactl logs for errors
 **High memory usage:**
 1. Check for memory leaks in profiles
 2. Monitor garbage collection metrics
 3. Review instance configurations
 **Alert fatigue:**
 1. Tune alert thresholds
 2. Implement alert severity levels
 3. Use alert routing and suppression
 ### Debug Tools
 **Monitoring health:**
 ```bash
 # Check monitoring endpoints
 curl -v http://localhost:8080/metrics
 curl -v http://localhost:8080/api/health
 # Review logs
 tail -f /var/log/llamactl/app.log
 ```
 ## Best Practices
 ### Production Monitoring
 1. **Comprehensive coverage**: Monitor all critical components
 2. **Appropriate alerting**: Balance sensitivity and noise
 3. **Regular review**: Analyze trends and patterns
 4. **Documentation**: Maintain runbooks for alerts
 ### Performance Optimization
 1. **Baseline establishment**: Know normal operating parameters
 2. **Trend analysis**: Identify performance degradation early
 3. **Capacity planning**: Monitor resource growth trends
 4. **Optimization cycles**: Regular performance tuning
 ## Next Steps
 - Set up [Troubleshooting](troubleshooting.md) procedures
 - Learn about [Backend optimization](backends.md)
 - Configure [Production deployment](../development/building.md)
--- a/docs/getting-started/configuration.md
+++ b/docs/getting-started/configuration.md
@@ -148,15 +148,3 @@ llamactl --help
 ```
 You can also override configuration using command line flags when starting llamactl.
 ## Next Steps
 - Learn about [Managing Instances](../user-guide/managing-instances.md)
 - Explore [Advanced Configuration](../advanced/monitoring.md)
 - Set up [Monitoring](../advanced/monitoring.md)
 ## Next Steps
 - Learn about [Managing Instances](../user-guide/managing-instances.md)
 - Explore [Advanced Configuration](../advanced/monitoring.md)
 - Set up [Monitoring](../advanced/monitoring.md)
--- a/docs/index.md
+++ b/docs/index.md
@@ -40,14 +40,13 @@ Llamactl is designed to simplify the deployment and management of llama-server i
 - [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
 - [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
 - [API Reference](user-guide/api-reference.md) - Complete API documentation
- [Monitoring](advanced/monitoring.md) - Health checks and monitoring
+
 - [Backends](advanced/backends.md) - Backend configuration options
 ## Getting Help
 If you need help or have questions:
- Check the [Troubleshooting](advanced/troubleshooting.md) guide
+- Check the [Troubleshooting](user-guide/troubleshooting.md) guide
 - Visit the [GitHub repository](https://github.com/lordmathis/llamactl)
 - Review the [Configuration Guide](getting-started/configuration.md) for advanced settings
--- a/docs/user-guide/api-reference.md
+++ b/docs/user-guide/api-reference.md
@@ -462,9 +462,3 @@ curl -X POST http://localhost:8080/api/instances/example/stop
 # Delete instance
 curl -X DELETE http://localhost:8080/api/instances/example
 ```
 ## Next Steps
 - Learn about [Managing Instances](managing-instances.md) in detail
 - Explore [Advanced Configuration](../advanced/backends.md)
 - Set up [Monitoring](../advanced/monitoring.md) for production use
--- a/docs/user-guide/managing-instances.md
+++ b/docs/user-guide/managing-instances.md
@@ -163,9 +163,3 @@ curl -X POST http://localhost:8080/api/instances/stop-all
 # Get status of all instances
 curl http://localhost:8080/api/instances
 ```
 ## Next Steps
 - Learn about the [Web UI](web-ui.md) interface
 - Explore the complete [API Reference](api-reference.md)
 - Set up [Monitoring](../advanced/monitoring.md) for production use
--- a/docs/user-guide/troubleshooting.md
+++ b/docs/user-guide/troubleshooting.md
@@ -552,9 +552,3 @@ cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup
 # Backup instance configurations
 curl http://localhost:8080/api/instances > instances-backup.json
 ```
 ## Next Steps
 - Set up [Monitoring](monitoring.md) to prevent issues
 - Learn about [Advanced Configuration](backends.md)
 - Review [Best Practices](../development/contributing.md)
--- a/docs/user-guide/web-ui.md
+++ b/docs/user-guide/web-ui.md
@@ -208,9 +208,3 @@ Some features may be limited on mobile:
 - Log viewing (use horizontal scrolling)
 - Complex configuration forms
 - File browser functionality
 ## Next Steps
 - Learn about [API Reference](api-reference.md) for programmatic access
 - Set up [Monitoring](../advanced/monitoring.md) for production use
 - Explore [Advanced Configuration](../advanced/backends.md) options
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -57,10 +57,7 @@ nav:
    - Managing Instances: user-guide/managing-instances.md
    - Web UI: user-guide/web-ui.md
    - API Reference: user-guide/api-reference.md
-  - Advanced:
+    - Troubleshooting: user-guide/troubleshooting.md
    - Backends: advanced/backends.md
    - Monitoring: advanced/monitoring.md
    - Troubleshooting: advanced/troubleshooting.md
 plugins:
  - search