# Monitoring
This guide describes a comprehensive monitoring setup for LlamaCtl in production environments.

## Overview

Effective monitoring of LlamaCtl involves tracking:

- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts

## Built-in Monitoring

### Health Checks

LlamaCtl provides built-in health monitoring:

```bash
# Check overall system health
curl http://localhost:8080/api/system/health

# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```
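For automation (for example in a deploy script), the health endpoint can be polled until it reports healthy. A minimal sketch, assuming the endpoints above return a non-2xx status while unhealthy; the instance name `my-model` is only a placeholder:

```bash
#!/usr/bin/env bash
# Wait (up to ~60s) for a specific instance to report healthy.
INSTANCE="my-model"   # placeholder instance name
for i in $(seq 1 30); do
  if curl -sf "http://localhost:8080/api/instances/${INSTANCE}/health" > /dev/null; then
    echo "Instance ${INSTANCE} is healthy"
    exit 0
  fi
  sleep 2
done
echo "Timed out waiting for ${INSTANCE}" >&2
exit 1
```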
### Metrics Endpoint

Access Prometheus-compatible metrics:

```bash
curl http://localhost:8080/metrics
```

**Available Metrics:**

- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
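To spot-check a single metric, filter the scrape output. A small sketch; the values shown in the comments are illustrative only:

```bash
# Filter the exposition output for instance counts
curl -s http://localhost:8080/metrics | grep '^llamactl_instances'

# Hypothetical output:
# llamactl_instances_total 3
# llamactl_instances_running 2
```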
## Prometheus Integration

### Configuration

Add LlamaCtl as a Prometheus target:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
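Before reloading Prometheus, it is worth validating the configuration and confirming the target is scraped. A minimal sketch, assuming Prometheus runs locally on its default port 9090:

```bash
# Validate the configuration file
promtool check config prometheus.yml

# Confirm the llamactl target is healthy (requires jq)
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | select(.labels.job == "llamactl") | .health'
```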
### Custom Metrics

Enable additional metrics in LlamaCtl:

```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```

## Grafana Dashboards

### LlamaCtl Dashboard

Import the official Grafana dashboard:

1. Download the dashboard JSON from the releases page
2. Import it into Grafana (manually or via the HTTP API, as sketched below)
3. Configure the Prometheus data source
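The import can also be scripted. A hedged sketch using Grafana's dashboard API; the Grafana URL, the API token, and the `dashboard.json` filename are placeholders:

```bash
# Import a downloaded dashboard via the Grafana HTTP API
curl -s -X POST "http://localhost:3000/api/dashboards/db" \
  -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
```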
### Key Panels

**Instance Overview:**

- Instance count and status
- Resource usage per instance
- Health status indicators

**Performance Metrics:**

- API response times
- Tokens per second
- Memory usage trends

**System Resources:**

- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)

### Custom Queries

**Instance Uptime (hours):**

```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```

**Memory Usage Percentage:**

```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```

**API Error Rate (4xx responses):**

```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```
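Where latency percentiles are too coarse, the same histogram can provide an average. A sketch derived from the standard `_sum`/`_count` series that Prometheus exposes for the `llamactl_api_request_duration_seconds` histogram:

**Average API Latency (seconds):**

```promql
rate(llamactl_api_request_duration_seconds_sum[5m]) / rate(llamactl_api_request_duration_seconds_count[5m])
```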
## Alerting

### Prometheus Alerts

Configure alerts for critical conditions:

```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```
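Rule files are easy to break with indentation mistakes, so validate them before reloading Prometheus. A minimal check using the standard Prometheus tooling:

```bash
# Validate alerting rules before (re)loading Prometheus
promtool check rules alerts.yml
```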
### Notification Channels

Configure alert notifications:

**Slack Integration:**

```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```
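The Alertmanager configuration can be checked and exercised with `amtool`. A hedged sketch, assuming Alertmanager listens on its default port 9093:

```bash
# Validate the Alertmanager configuration
amtool check-config alertmanager.yml

# Fire a synthetic alert to confirm Slack delivery
amtool alert add TestAlert severity=warning \
  --annotation=summary="Test alert from the llamactl monitoring setup" \
  --alertmanager.url=http://localhost:9093
```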
## Log Management

### Centralized Logging

Configure log aggregation:

```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```
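With `output: "json"`, each log line is a JSON object, which makes ad-hoc inspection straightforward. A small sketch using `jq`; the field names mirror the index template in the next section, and the instance name `my-model` is a placeholder:

```bash
# Pretty-print the most recent structured log entry
tail -n 1 /var/log/llamactl/app.log | jq .

# Show only errors for a particular instance (field names per the index template)
jq 'select(.level == "error" and .instance == "my-model")' /var/log/llamactl/app.log
```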
### Log Analysis

Use the ELK stack (Elasticsearch, Logstash, Kibana) for log analysis:

**Elasticsearch Index Template:**

```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```

**Kibana Visualizations:**

- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns

## Application Performance Monitoring

### OpenTelemetry Integration

Enable distributed tracing:

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
    sampling_rate: 0.1
```
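To confirm that spans are actually reaching the tracing backend, query it directly. A hedged sketch against the Jaeger query API, assuming the query service is reachable on its default port 16686 and that traces are reported under the service name `llamactl`:

```bash
# Fetch the most recent trace for the llamactl service and list its span names
curl -s "http://jaeger:16686/api/traces?service=llamactl&limit=1" | \
  jq '.data[0].spans[].operationName'
```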
### Custom Spans

Add custom tracing to track operations:

```go
// Assumes an OpenTelemetry tracer (go.opentelemetry.io/otel/trace)
// and the go.opentelemetry.io/otel/attribute package.
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Attach instance startup attributes to the span
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)
```
## Health Check Configuration

### Readiness Probes

Configure Kubernetes readiness probes:

```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```

### Liveness Probes

Configure liveness probes:

```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

### Custom Health Checks

Implement custom health checks:

```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```
## Performance Profiling

### pprof Integration

Enable Go profiling:

```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```

Access profiling endpoints:

```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
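Profiles can also be captured to files so that snapshots taken before and after a change are comparable. A small sketch using the standard `pprof` endpoints enabled above:

```bash
# Capture a 30-second CPU profile and a heap snapshot to files
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
curl -o heap.pprof "http://localhost:6060/debug/pprof/heap"

# Inspect the saved profiles offline
go tool pprof -top cpu.pprof
go tool pprof -top heap.pprof
```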
### Continuous Profiling

Set up continuous profiling with Pyroscope:

```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```

## Security Monitoring

### Audit Logging

Enable security audit logs:

```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```
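Audit logs are most useful when they are actually reviewed. A hedged sketch for pulling out specific events; it assumes the audit log is written as JSON lines with an `event` field, which may differ from the actual format:

```bash
# List recent instance deletions from the audit log
# (assumes JSON lines with an "event" field; adjust to the actual schema)
jq 'select(.event == "instance.delete")' /var/log/llamactl/audit.log

# Count audit events by type
jq -r '.event' /var/log/llamactl/audit.log | sort | uniq -c | sort -rn
```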
### Rate Limiting Monitoring

Track rate limiting metrics:

```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```

## Troubleshooting Monitoring

### Common Issues

**Metrics not appearing:**

1. Check the Prometheus scrape configuration
2. Verify network connectivity between Prometheus and LlamaCtl
3. Review the LlamaCtl logs for errors

**High memory usage:**

1. Check for memory leaks in heap profiles
2. Monitor garbage collection metrics
3. Review instance configurations

**Alert fatigue:**

1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression

### Debug Tools

**Monitoring health:**

```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health

# Review logs
tail -f /var/log/llamactl/app.log
```
## Best Practices

### Production Monitoring

1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity against noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts

### Performance Optimization

1. **Baseline establishment**: Know your normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Schedule regular performance tuning

## Next Steps

- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)