Remove misleading advanced section

2025-08-31 16:04:09 +02:00
parent 92af14b350
commit b08f15c5d0
9 changed files with 3 additions and 779 deletions


@@ -1,316 +0,0 @@
# Backends
Llamactl supports multiple backends for running large language models. This guide covers the available backends and their configuration.
## Llama.cpp Backend
The primary backend for Llamactl, providing robust support for GGUF models.
### Features
- **GGUF Support**: Native support for GGUF model format
- **GPU Acceleration**: CUDA, OpenCL, and Metal support
- **Memory Optimization**: Efficient memory usage and mapping
- **Multi-threading**: Configurable CPU thread utilization
- **Quantization**: Support for various quantization levels
### Configuration
```yaml
backends:
  llamacpp:
    binary_path: "/usr/local/bin/llama-server"
    default_options:
      threads: 4
      context_size: 2048
      batch_size: 512
    gpu:
      enabled: true
      layers: 35
```
### Supported Options
| Option | Description | Default |
|--------|-------------|---------|
| `threads` | Number of CPU threads | 4 |
| `context_size` | Context window size | 2048 |
| `batch_size` | Batch size for processing | 512 |
| `gpu_layers` | Layers to offload to GPU | 0 |
| `memory_lock` | Lock model in memory | false |
| `no_mmap` | Disable memory mapping | false |
| `rope_freq_base` | RoPE frequency base | 10000 |
| `rope_freq_scale` | RoPE frequency scale | 1.0 |
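For example, here is a minimal sketch of setting several of these options when creating an instance over the HTTP API. The endpoint and payload shape follow the JSON examples in this guide; treat the exact schema as an assumption for your version.
```go
package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    // Hypothetical payload; field names mirror the JSON examples in this guide.
    body := []byte(`{
        "name": "tuned-instance",
        "model_path": "/models/llama-2-7b.gguf",
        "port": 8081,
        "options": {"threads": 8, "context_size": 4096, "gpu_layers": 20}
    }`)

    resp, err := http.Post("http://localhost:8080/api/instances",
        "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    fmt.Println("create instance:", resp.Status)
}
```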
### GPU Acceleration
#### CUDA Setup
```bash
# Install CUDA toolkit
sudo apt update
sudo apt install nvidia-cuda-toolkit
# Verify CUDA installation
nvcc --version
nvidia-smi
```
#### Configuration for GPU
```json
{
  "name": "gpu-accelerated",
  "model_path": "/models/llama-2-13b.gguf",
  "port": 8081,
  "options": {
    "gpu_layers": 35,
    "threads": 2,
    "context_size": 4096
  }
}
```
### Performance Tuning
#### Memory Optimization
```yaml
# For limited memory systems
options:
  context_size: 1024
  batch_size: 256
  no_mmap: true
  memory_lock: false

# For high-memory systems
options:
  context_size: 8192
  batch_size: 1024
  memory_lock: true
  no_mmap: false
```
#### CPU Optimization
```yaml
# Match thread count to CPU cores
# For 8-core CPU:
options:
  threads: 6  # Leave 2 cores for system

# For high-performance CPUs:
options:
  threads: 16
  batch_size: 1024
```
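If you script instance setup, a tiny helper can derive a starting `threads` value from the host CPU. This is a heuristic sketch, not part of the llamactl API:
```go
package main

import (
    "fmt"
    "runtime"
)

// suggestThreads returns a starting point for the threads option:
// all logical cores minus headroom reserved for the OS and llamactl.
func suggestThreads(reserve int) int {
    n := runtime.NumCPU() - reserve
    if n < 1 {
        return 1
    }
    return n
}

func main() {
    // On an 8-core machine this prints 6, matching the example above.
    fmt.Println("suggested threads:", suggestThreads(2))
}
```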
## Future Backends
Llamactl is designed to support multiple backends. Planned additions:
### vLLM Backend
High-throughput inference engine optimized for serving:
- **Features**: Continuous batching, PagedAttention, response streaming
- **Models**: Hugging Face Transformers models
- **Scaling**: Tensor-parallel and multi-GPU serving
### TensorRT-LLM Backend
NVIDIA's optimized inference engine:
- **Features**: Highly optimized inference kernels for NVIDIA GPUs
- **Models**: Requires models compiled into TensorRT engines
- **Deployment**: Production-ready inference
### Ollama Backend
Integration with Ollama for easy model management:
- **Features**: Simplified model downloading
- **Models**: Large model library
- **Integration**: Seamless model switching
## Backend Selection
### Automatic Detection
Llamactl can automatically detect the best backend:
```yaml
backends:
  auto_detect: true
  preference_order:
    - "llamacpp"
    - "vllm"
    - "tensorrt"
```
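Conceptually, the preference order is a first-available scan. The following sketch shows the selection logic the config above implies; it is an illustration, not llamactl's actual implementation:
```go
package main

import (
    "errors"
    "fmt"
)

// pickBackend returns the first backend in the preference order
// that is available on this host.
func pickBackend(available map[string]bool, order []string) (string, error) {
    for _, name := range order {
        if available[name] {
            return name, nil
        }
    }
    return "", errors.New("no supported backend available")
}

func main() {
    avail := map[string]bool{"llamacpp": true} // e.g. only llama-server is installed
    backend, err := pickBackend(avail, []string{"llamacpp", "vllm", "tensorrt"})
    if err != nil {
        panic(err)
    }
    fmt.Println("selected backend:", backend)
}
```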
### Manual Selection
Force a specific backend for an instance:
```json
{
  "name": "manual-backend",
  "backend": "llamacpp",
  "model_path": "/models/model.gguf",
  "port": 8081
}
```
## Backend-Specific Features
### Llama.cpp Features
#### Model Formats
- **GGUF**: Primary format, best compatibility
- **GGML**: Legacy format (limited support)
#### Quantization Levels
- `Q2_K`: Smallest size, lower quality
- `Q4_K_M`: Balanced size and quality
- `Q5_K_M`: Higher quality, larger size
- `Q6_K`: Near-original quality
- `Q8_0`: Minimal loss, largest size
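A useful way to compare levels is bits per weight. The sketch below estimates file size from parameter count using rough, approximate figures; actual sizes vary by model architecture and llama.cpp version:
```go
package main

import "fmt"

// Rough bits-per-weight figures for common llama.cpp quantizations.
// These are rules of thumb, not exact values.
var bitsPerWeight = map[string]float64{
    "Q2_K":   2.6,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K":   6.6,
    "Q8_0":   8.5,
}

func main() {
    const params = 7e9 // 7B-parameter model
    for _, q := range []string{"Q2_K", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"} {
        gb := params * bitsPerWeight[q] / 8 / 1e9
        fmt.Printf("%-7s ~%.1f GB\n", q, gb)
    }
}
```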
#### Advanced Options
```yaml
advanced:
  rope_scaling:
    type: "linear"
    factor: 2.0
  attention:
    flash_attention: true
    grouped_query: true
```
## Monitoring Backend Performance
### Metrics Collection
Monitor backend-specific metrics:
```bash
# Get backend statistics
curl http://localhost:8080/api/instances/my-instance/backend/stats
```
**Response:**
```json
{
  "backend": "llamacpp",
  "version": "b1234",
  "metrics": {
    "tokens_per_second": 15.2,
    "memory_usage": 4294967296,
    "gpu_utilization": 85.5,
    "context_usage": 75.0
  }
}
```
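To consume these stats from code rather than curl, decode only the fields you need. A sketch that takes the response shape above as given:
```go
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

type backendStats struct {
    Backend string `json:"backend"`
    Metrics struct {
        TokensPerSecond float64 `json:"tokens_per_second"`
        MemoryUsage     int64   `json:"memory_usage"`
    } `json:"metrics"`
}

func main() {
    resp, err := http.Get("http://localhost:8080/api/instances/my-instance/backend/stats")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var s backendStats
    if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
        panic(err)
    }
    fmt.Printf("%s: %.1f tok/s, %d MiB\n",
        s.Backend, s.Metrics.TokensPerSecond, s.Metrics.MemoryUsage/(1<<20))
}
```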
### Performance Optimization
#### Benchmark Different Configurations
```bash
# Test various thread counts
for threads in 2 4 8 16; do
  echo "Testing $threads threads"
  curl -X PUT http://localhost:8080/api/instances/benchmark \
    -H "Content-Type: application/json" \
    -d "{\"options\": {\"threads\": $threads}}"
  # Run your benchmark workload here and record the results
done
```
#### Memory Usage Optimization
```bash
# Monitor memory usage
watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
```
## Troubleshooting Backends
### Common Llama.cpp Issues
**Model won't load:**
```bash
# Check model file
file /path/to/model.gguf
# Verify format
llama-server --model /path/to/model.gguf --dry-run
```
**GPU not detected:**
```bash
# Check CUDA installation
nvidia-smi
# Verify llama.cpp GPU support
llama-server --help | grep -i gpu
```
**Performance issues:**
```bash
# Check system resources
htop
nvidia-smi
# Verify configuration
curl http://localhost:8080/api/instances/my-instance/config
```
## Custom Backend Development
### Backend Interface
Implement the backend interface for custom backends:
```go
type Backend interface {
    Start(config InstanceConfig) error
    Stop(instance *Instance) error
    Health(instance *Instance) (*HealthStatus, error)
    Stats(instance *Instance) (*Stats, error)
}
```
### Registration
Register your custom backend:
```go
func init() {
    backends.Register("custom", &CustomBackend{})
}
```
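A minimal skeleton that satisfies the interface might look like the following; the types come from the interface definition above, and the method bodies are placeholders to fill in:
```go
// CustomBackend is a stub implementation of the Backend interface above.
type CustomBackend struct{}

func (b *CustomBackend) Start(config InstanceConfig) error {
    // Launch the inference server process here.
    return nil
}

func (b *CustomBackend) Stop(instance *Instance) error {
    // Terminate the process and release its resources.
    return nil
}

func (b *CustomBackend) Health(instance *Instance) (*HealthStatus, error) {
    // Probe the server, e.g. an HTTP GET against its health endpoint.
    return &HealthStatus{}, nil
}

func (b *CustomBackend) Stats(instance *Instance) (*Stats, error) {
    // Collect throughput and memory figures from the server.
    return &Stats{}, nil
}
```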
## Best Practices
### Production Deployments
1. **Resource allocation**: Plan for peak usage
2. **Backend selection**: Choose based on requirements
3. **Monitoring**: Set up comprehensive monitoring
4. **Fallback**: Configure backup backends
### Development
1. **Rapid iteration**: Use smaller models
2. **Resource monitoring**: Track usage patterns
3. **Configuration testing**: Validate settings
4. **Performance profiling**: Optimize bottlenecks
## Next Steps
- Learn about [Monitoring](monitoring.md) backend performance
- Explore [Troubleshooting](troubleshooting.md) guides
- Set up [Production Monitoring](monitoring.md)


@@ -1,420 +0,0 @@
# Monitoring
Comprehensive monitoring setup for Llamactl in production environments.
## Overview
Effective monitoring of Llamactl involves tracking:
- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts
## Built-in Monitoring
### Health Checks
Llamactl provides built-in health monitoring:
```bash
# Check overall system health
curl http://localhost:8080/api/system/health
# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```
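In automation you typically poll until an instance reports healthy before routing traffic to it. A sketch that only assumes the health endpoint shown above returns HTTP 200 when healthy:
```go
package main

import (
    "fmt"
    "net/http"
    "time"
)

// waitHealthy polls an instance's health endpoint until it returns
// HTTP 200 or the timeout elapses.
func waitHealthy(name string, timeout time.Duration) error {
    url := fmt.Sprintf("http://localhost:8080/api/instances/%s/health", name)
    deadline := time.Now().Add(timeout)
    for time.Now().Before(deadline) {
        resp, err := http.Get(url)
        if err == nil {
            resp.Body.Close()
            if resp.StatusCode == http.StatusOK {
                return nil
            }
        }
        time.Sleep(2 * time.Second)
    }
    return fmt.Errorf("instance %s not healthy after %s", name, timeout)
}

func main() {
    if err := waitHealthy("my-instance", time.Minute); err != nil {
        panic(err)
    }
    fmt.Println("instance is healthy")
}
```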
### Metrics Endpoint
Access Prometheus-compatible metrics:
```bash
curl http://localhost:8080/metrics
```
**Available Metrics:**
- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
## Prometheus Integration
### Configuration
Add Llamactl as a Prometheus target:
```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
### Custom Metrics
Enable additional metrics in Llamactl:
```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```
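If you instrument your own services alongside these built-ins, the standard Prometheus Go client is the usual route. A minimal sketch, unrelated to llamactl's internal metric names:
```go
package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// A custom counter registered on the default registry.
var requestsTotal = promauto.NewCounterVec(
    prometheus.CounterOpts{
        Name: "myapp_requests_total",
        Help: "Total requests handled, by outcome.",
    },
    []string{"status"},
)

func main() {
    requestsTotal.WithLabelValues("ok").Inc()
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9091", nil))
}
```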
## Grafana Dashboards
### Llamactl Dashboard
Import the official Grafana dashboard:
1. Download dashboard JSON from releases
2. Import into Grafana
3. Configure Prometheus data source
### Key Panels
**Instance Overview:**
- Instance count and status
- Resource usage per instance
- Health status indicators
**Performance Metrics:**
- API response times
- Tokens per second
- Memory usage trends
**System Resources:**
- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)
### Custom Queries
**Instance Uptime:**
```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```
**Memory Usage Percentage:**
```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```
**API Error Rate:**
```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```
## Alerting
### Prometheus Alerts
Configure alerts for critical conditions:
```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Llamactl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```
### Notification Channels
Configure alert notifications:
**Slack Integration:**
```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'Llamactl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```
## Log Management
### Centralized Logging
Configure log aggregation:
```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```
### Log Analysis
Use ELK stack for log analysis:
**Elasticsearch Index Template:**
```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```
**Kibana Visualizations:**
- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns
## Application Performance Monitoring
### OpenTelemetry Integration
Enable distributed tracing:
```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
    sampling_rate: 0.1
```
### Custom Spans
Add custom tracing to track operations:
```go
// Assumes an initialized OpenTelemetry tracer (trace.Tracer) in scope.
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Attach instance metadata so traces can be filtered per instance
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)
```
## Health Check Configuration
### Readiness Probes
Configure Kubernetes readiness probes:
```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```
### Liveness Probes
Configure liveness probes:
```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```
### Custom Health Checks
Implement custom health checks:
```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```
## Performance Profiling
### pprof Integration
Enable Go profiling:
```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```
Access profiling endpoints:
```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile
# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
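On the service side, exposing these endpoints in Go takes only the standard library. A sketch of what a setting like `pprof_enabled` would plausibly wire up:
```go
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
)

func main() {
    // Bind to localhost only: profiling endpoints must never be public.
    log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```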
### Continuous Profiling
Set up continuous profiling with Pyroscope:
```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```
## Security Monitoring
### Audit Logging
Enable security audit logs:
```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```
### Rate Limiting Monitoring
Track rate limiting metrics:
```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```
## Troubleshooting Monitoring
### Common Issues
**Metrics not appearing:**
1. Check Prometheus configuration
2. Verify network connectivity
3. Review Llamactl logs for errors
**High memory usage:**
1. Check for memory leaks in profiles
2. Monitor garbage collection metrics
3. Review instance configurations
**Alert fatigue:**
1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression
### Debug Tools
**Monitoring health:**
```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health
# Review logs
tail -f /var/log/llamactl/app.log
```
## Best Practices
### Production Monitoring
1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity and noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts
### Performance Optimization
1. **Baseline establishment**: Know normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Regular performance tuning
## Next Steps
- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)


@@ -148,15 +148,3 @@ llamactl --help
```
You can also override configuration using command line flags when starting llamactl.
## Next Steps
- Learn about [Managing Instances](../user-guide/managing-instances.md)
- Explore [Advanced Configuration](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md)


@@ -40,14 +40,13 @@ Llamactl is designed to simplify the deployment and management of llama-server i
- [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
- [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
- [API Reference](user-guide/api-reference.md) - Complete API documentation
- [Monitoring](advanced/monitoring.md) - Health checks and monitoring
- [Backends](advanced/backends.md) - Backend configuration options
## Getting Help
If you need help or have questions:
- Check the [Troubleshooting](advanced/troubleshooting.md) guide
- Check the [Troubleshooting](user-guide/troubleshooting.md) guide
- Visit the [GitHub repository](https://github.com/lordmathis/llamactl)
- Review the [Configuration Guide](getting-started/configuration.md) for advanced settings


@@ -462,9 +462,3 @@ curl -X POST http://localhost:8080/api/instances/example/stop
# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example
```
## Next Steps
- Learn about [Managing Instances](managing-instances.md) in detail
- Explore [Advanced Configuration](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use


@@ -163,9 +163,3 @@ curl -X POST http://localhost:8080/api/instances/stop-all
# Get status of all instances
curl http://localhost:8080/api/instances
```
## Next Steps
- Learn about the [Web UI](web-ui.md) interface
- Explore the complete [API Reference](api-reference.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use


@@ -552,9 +552,3 @@ cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup
# Backup instance configurations
curl http://localhost:8080/api/instances > instances-backup.json
```
## Next Steps
- Set up [Monitoring](monitoring.md) to prevent issues
- Learn about [Advanced Configuration](backends.md)
- Review [Best Practices](../development/contributing.md)


@@ -208,9 +208,3 @@ Some features may be limited on mobile:
- Log viewing (use horizontal scrolling)
- Complex configuration forms
- File browser functionality
## Next Steps
- Learn about [API Reference](api-reference.md) for programmatic access
- Set up [Monitoring](../advanced/monitoring.md) for production use
- Explore [Advanced Configuration](../advanced/backends.md) options