# Monitoring
This guide describes a comprehensive monitoring setup for LlamaCtl in production environments.

## Overview

Effective monitoring of LlamaCtl involves tracking:

- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts

## Built-in Monitoring

### Health Checks

LlamaCtl provides built-in health monitoring:

```bash
# Check overall system health
curl http://localhost:8080/api/system/health

# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```
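For automation (for example in a deploy script), the health endpoint can be polled until it reports healthy. A minimal sketch, assuming the endpoints above return a non-2xx status while unhealthy; the instance name `my-model` is only a placeholder:

```bash
#!/usr/bin/env bash
# Wait (up to ~60s) for a specific instance to report healthy.
INSTANCE="my-model"   # placeholder instance name
for i in $(seq 1 30); do
  if curl -sf "http://localhost:8080/api/instances/${INSTANCE}/health" > /dev/null; then
    echo "Instance ${INSTANCE} is healthy"
    exit 0
  fi
  sleep 2
done
echo "Timed out waiting for ${INSTANCE}" >&2
exit 1
```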
### Metrics Endpoint

Access Prometheus-compatible metrics:

```bash
curl http://localhost:8080/metrics
```

**Available Metrics:**

- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
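To spot-check a single metric, filter the scrape output. A small sketch; the values shown in the comments are illustrative only:

```bash
# Filter the exposition output for instance counts
curl -s http://localhost:8080/metrics | grep '^llamactl_instances'

# Hypothetical output:
# llamactl_instances_total 3
# llamactl_instances_running 2
```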
## Prometheus Integration

### Configuration

Add LlamaCtl as a Prometheus target:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
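Before reloading Prometheus, it is worth validating the configuration and confirming the target is scraped. A minimal sketch, assuming Prometheus runs locally on its default port 9090:

```bash
# Validate the configuration file
promtool check config prometheus.yml

# Confirm the llamactl target is healthy (requires jq)
curl -s http://localhost:9090/api/v1/targets | \
  jq '.data.activeTargets[] | select(.labels.job == "llamactl") | .health'
```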
### Custom Metrics

Enable additional metrics in LlamaCtl:

```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```

## Grafana Dashboards

### LlamaCtl Dashboard

Import the official Grafana dashboard:

1. Download the dashboard JSON from the releases page
2. Import it into Grafana (manually or via the HTTP API, as sketched below)
3. Configure the Prometheus data source
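The import can also be scripted. A hedged sketch using Grafana's dashboard API; the Grafana URL, the API token, and the `dashboard.json` filename are placeholders:

```bash
# Import a downloaded dashboard via the Grafana HTTP API
curl -s -X POST "http://localhost:3000/api/dashboards/db" \
  -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{\"dashboard\": $(cat dashboard.json), \"overwrite\": true}"
```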
### Key Panels

**Instance Overview:**

- Instance count and status
- Resource usage per instance
- Health status indicators

**Performance Metrics:**

- API response times
- Tokens per second
- Memory usage trends

**System Resources:**

- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)

### Custom Queries

**Instance Uptime (hours):**

```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```

**Memory Usage Percentage:**

```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```

**API Error Rate (4xx responses):**

```promql
rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
```
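Where latency percentiles are too coarse, the same histogram can provide an average. A sketch derived from the standard `_sum`/`_count` series that Prometheus exposes for the `llamactl_api_request_duration_seconds` histogram:

**Average API Latency (seconds):**

```promql
rate(llamactl_api_request_duration_seconds_sum[5m]) / rate(llamactl_api_request_duration_seconds_count[5m])
```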
## Alerting

### Prometheus Alerts

Configure alerts for critical conditions:

```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```
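Rule files are easy to break with indentation mistakes, so validate them before reloading Prometheus. A minimal check using the standard Prometheus tooling:

```bash
# Validate alerting rules before (re)loading Prometheus
promtool check rules alerts.yml
```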
### Notification Channels

Configure alert notifications:

**Slack Integration:**

```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```
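The Alertmanager configuration can be checked and exercised with `amtool`. A hedged sketch, assuming Alertmanager listens on its default port 9093:

```bash
# Validate the Alertmanager configuration
amtool check-config alertmanager.yml

# Fire a synthetic alert to confirm Slack delivery
amtool alert add TestAlert severity=warning \
  --annotation=summary="Test alert from the llamactl monitoring setup" \
  --alertmanager.url=http://localhost:9093
```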
## Log Management

### Centralized Logging

Configure log aggregation:

```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```
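With `output: "json"`, each log line is a JSON object, which makes ad-hoc inspection straightforward. A small sketch using `jq`; the field names mirror the index template in the next section, and the instance name `my-model` is a placeholder:

```bash
# Pretty-print the most recent structured log entry
tail -n 1 /var/log/llamactl/app.log | jq .

# Show only errors for a particular instance (field names per the index template)
jq 'select(.level == "error" and .instance == "my-model")' /var/log/llamactl/app.log
```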
### Log Analysis

Use the ELK stack (Elasticsearch, Logstash, Kibana) for log analysis:

**Elasticsearch Index Template:**

```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```

**Kibana Visualizations:**

- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns

## Application Performance Monitoring

### OpenTelemetry Integration

Enable distributed tracing:

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:14268/api/traces"
    sampling_rate: 0.1
```
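To confirm that spans are actually reaching the tracing backend, query it directly. A hedged sketch against the Jaeger query API, assuming the query service is reachable on its default port 16686 and that traces are reported under the service name `llamactl`:

```bash
# Fetch the most recent trace for the llamactl service and list its span names
curl -s "http://jaeger:16686/api/traces?service=llamactl&limit=1" | \
  jq '.data[0].spans[].operationName'
```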
### Custom Spans

Add custom tracing to track operations:

```go
// Assumes an OpenTelemetry tracer (go.opentelemetry.io/otel/trace)
// and the go.opentelemetry.io/otel/attribute package.
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Attach instance startup attributes to the span
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)
```
## Health Check Configuration

### Readiness Probes

Configure Kubernetes readiness probes:

```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```

### Liveness Probes

Configure liveness probes:

```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

### Custom Health Checks

Implement custom health checks:

```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
    // Check database connectivity
    if err := h.db.Ping(); err != nil {
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```
## Performance Profiling

### pprof Integration

Enable Go profiling:

```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```

Access profiling endpoints:

```bash
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```
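Profiles can also be captured to files so that snapshots taken before and after a change are comparable. A small sketch using the standard `pprof` endpoints enabled above:

```bash
# Capture a 30-second CPU profile and a heap snapshot to files
curl -o cpu.pprof "http://localhost:6060/debug/pprof/profile?seconds=30"
curl -o heap.pprof "http://localhost:6060/debug/pprof/heap"

# Inspect the saved profiles offline
go tool pprof -top cpu.pprof
go tool pprof -top heap.pprof
```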
### Continuous Profiling

Set up continuous profiling with Pyroscope:

```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```

## Security Monitoring

### Audit Logging

Enable security audit logs:

```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```
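Audit logs are most useful when they are actually reviewed. A hedged sketch for pulling out specific events; it assumes the audit log is written as JSON lines with an `event` field, which may differ from the actual format:

```bash
# List recent instance deletions from the audit log
# (assumes JSON lines with an "event" field; adjust to the actual schema)
jq 'select(.event == "instance.delete")' /var/log/llamactl/audit.log

# Count audit events by type
jq -r '.event' /var/log/llamactl/audit.log | sort | uniq -c | sort -rn
```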
### Rate Limiting Monitoring

Track rate limiting metrics:

```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```

## Troubleshooting Monitoring

### Common Issues

**Metrics not appearing:**

1. Check the Prometheus scrape configuration
2. Verify network connectivity between Prometheus and LlamaCtl
3. Review the LlamaCtl logs for errors

**High memory usage:**

1. Check for memory leaks in heap profiles
2. Monitor garbage collection metrics
3. Review instance configurations

**Alert fatigue:**

1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression

### Debug Tools

**Monitoring health:**

```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health

# Review logs
tail -f /var/log/llamactl/app.log
```
## Best Practices

### Production Monitoring

1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity against noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts

### Performance Optimization

1. **Baseline establishment**: Know your normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Schedule regular performance tuning

## Next Steps

- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)