Monitoring
Comprehensive monitoring setup for Llamactl in production environments.
Overview
Effective monitoring of Llamactl involves tracking:
- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts
Built-in Monitoring
Health Checks
Llamactl provides built-in health monitoring:
# Check overall system health
curl http://localhost:8080/api/system/health
# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
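The same two checks can be scripted. The following is a minimal Go sketch, assuming the endpoints answer with a 2xx status when healthy (the response body format is not specified here, so it is ignored). The instance name `my-instance` and the helper names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// instanceHealthURL builds the per-instance health URL, escaping the
// instance name so it is safe as a single path segment.
func instanceHealthURL(base, name string) string {
	return base + "/api/instances/" + url.PathEscape(name) + "/health"
}

// isHealthy reports whether the endpoint answers with a 2xx status.
// Treating any 2xx as healthy is an assumption of this sketch.
func isHealthy(ctx context.Context, client *http.Client, endpoint string) (bool, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
	if err != nil {
		return false, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300, nil
}

func main() {
	base := "http://localhost:8080"
	client := &http.Client{Timeout: 5 * time.Second}
	endpoints := []string{
		base + "/api/system/health",
		instanceHealthURL(base, "my-instance"), // hypothetical instance name
	}
	for _, e := range endpoints {
		ok, err := isHealthy(context.Background(), client, e)
		if err != nil {
			fmt.Printf("%s: unreachable (%v)\n", e, err)
			continue
		}
		fmt.Printf("%s: healthy=%v\n", e, ok)
	}
}
```

The client timeout keeps a hung instance from blocking the whole check loop.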
Metrics Endpoint
Access Prometheus-compatible metrics:
curl http://localhost:8080/metrics
Available Metrics:
- llamactl_instances_total: Total number of instances
- llamactl_instances_running: Number of running instances
- llamactl_instance_memory_bytes: Instance memory usage
- llamactl_instance_cpu_percent: Instance CPU usage
- llamactl_api_requests_total: Total API requests
- llamactl_api_request_duration_seconds: API response times
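For quick scripts that do not go through Prometheus, the metrics text can be read directly. This is a minimal sketch of a parser for the Prometheus text exposition format, assuming samples carry no trailing timestamp. The sample payload is invented for illustration, and `parseMetrics` is a hypothetical helper, not part of Llamactl.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics returns the samples whose series name starts with prefix,
// keyed by the full series (metric name plus any label set).
// Lines with an optional trailing timestamp are not handled here.
func parseMetrics(text, prefix string) map[string]float64 {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		i := strings.LastIndexByte(line, ' ')
		if i < 0 {
			continue
		}
		series, raw := strings.TrimSpace(line[:i]), line[i+1:]
		if !strings.HasPrefix(series, prefix) {
			continue
		}
		v, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			continue // not a numeric sample
		}
		out[series] = v
	}
	return out
}

func main() {
	// Sample payload invented for illustration.
	sample := `# HELP llamactl_instances_total Total number of instances
# TYPE llamactl_instances_total gauge
llamactl_instances_total 3
llamactl_instances_running 2
llamactl_instance_memory_bytes{instance_name="demo"} 1.5e+09
go_goroutines 12
`
	for series, v := range parseMetrics(sample, "llamactl_") {
		fmt.Printf("%s = %g\n", series, v)
	}
}
```

For anything beyond ad hoc checks, a full exposition-format parser is the safer choice, since this sketch ignores timestamps and escaped label values.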
Prometheus Integration
Configuration
Add Llamactl as a Prometheus target:
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
Custom Metrics
Enable additional metrics in Llamactl:
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
Grafana Dashboards
Llamactl Dashboard
Import the official Grafana dashboard:
- Download dashboard JSON from releases
- Import into Grafana
- Configure Prometheus data source
Key Panels
Instance Overview:
- Instance count and status
- Resource usage per instance
- Health status indicators
Performance Metrics:
- API response times
- Tokens per second
- Memory usage trends
System Resources:
- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)
Custom Queries
Instance Uptime:
(time() - llamactl_instance_start_time_seconds) / 3600
Memory Usage Percentage:
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
API Error Rate:
sum(rate(llamactl_api_requests_total{status=~"[45].."}[5m])) / sum(rate(llamactl_api_requests_total[5m])) * 100
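To make the arithmetic behind these queries concrete, here is a small Go sketch that evaluates the same formulas on made-up numbers. The function names and input values are illustrative only.

```go
package main

import "fmt"

// uptimeHours mirrors (time() - start_time_seconds) / 3600.
func uptimeHours(nowUnix, startUnix float64) float64 {
	return (nowUnix - startUnix) / 3600
}

// percent mirrors (part / total) * 100. It returns 0 when total is 0,
// where PromQL would yield NaN or Inf instead.
func percent(part, total float64) float64 {
	if total == 0 {
		return 0
	}
	return part / total * 100
}

func main() {
	// An instance started 7200 seconds ago has been up for 2 hours.
	fmt.Println(uptimeHours(1_700_007_200, 1_700_000_000)) // 2

	// 4 GiB used out of 16 GiB of system memory.
	fmt.Println(percent(4<<30, 16<<30)) // 25

	// 15 failed requests per second out of 120 total.
	fmt.Println(percent(15, 120)) // 12.5
}
```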
Alerting
Prometheus Alerts
Configure alerts for critical conditions:
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Llamactl instance {{ $labels.instance_name }} is down"
      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"
      - alert: APIHighLatency
        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
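The `for:` field in each rule means the alert fires only after its expression has stayed true for that long, and a single false evaluation resets the timer. The following Go sketch models that behaviour as a tiny state machine. It illustrates the semantics and is not Prometheus code. The type names and the 15-second evaluation interval are assumptions.

```go
package main

import "fmt"

type alertState int

const (
	inactive alertState = iota
	pending
	firing
)

func (s alertState) String() string {
	return [...]string{"inactive", "pending", "firing"}[s]
}

// rule tracks one alert with a hold duration, like the `for:` field.
type rule struct {
	holdSeconds int64
	active      bool  // condition was true at the previous evaluation
	activeSince int64 // time the condition first became true
}

// eval updates the rule for one evaluation at time now (in seconds).
func (r *rule) eval(condition bool, now int64) alertState {
	if !condition {
		r.active = false // any false evaluation resets the timer
		return inactive
	}
	if !r.active {
		r.active = true
		r.activeSince = now
	}
	if now-r.activeSince >= r.holdSeconds {
		return firing
	}
	return pending
}

func main() {
	r := &rule{holdSeconds: 60} // like `for: 1m`
	// Whether the instance was down at each 15-second evaluation.
	down := []bool{false, true, true, false, true, true, true, true, true}
	for i, d := range down {
		now := int64(i * 15)
		fmt.Printf("t=%3ds down=%-5v state=%s\n", now, d, r.eval(d, now))
	}
}
```

In this run the brief recovery at t=45s resets the timer, so the alert only fires once the instance has been down continuously from t=60s to t=120s.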
Notification Channels
Configure alert notifications:
Slack Integration:
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'
receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'Llamactl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
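Alertmanager notification templates use Go's template syntax, so the `text` template can be previewed locally. The sketch below renders it with `text/template` against a stripped-down stand-in for the notification data. The `alert` and `payload` types model only the fields this template touches and are assumptions of the example.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alert and payload model only what the template reads; the real
// Alertmanager data structure carries many more fields.
type alert struct {
	Annotations map[string]string
}

type payload struct {
	Alerts []alert
}

// render executes tmplText against p and returns the resulting string.
func render(tmplText string, p payload) (string, error) {
	t, err := template.New("text").Parse(tmplText)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	p := payload{Alerts: []alert{
		{Annotations: map[string]string{"summary": "High memory usage on demo"}},
		{Annotations: map[string]string{"summary": "High API latency detected"}},
	}}

	// The template exactly as configured: summaries are joined with no separator.
	out, err := render(`{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}`, p)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)

	// A variant that ends each summary with a newline.
	out, err = render(`{{ range .Alerts }}{{ .Annotations.summary }}{{ "\n" }}{{ end }}`, p)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Because grouped alerts are concatenated directly, adding `{{ "\n" }}` inside the range may make multi-alert messages easier to read.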
Log Management
Centralized Logging
Configure log aggregation:
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
Log Analysis
Use the ELK stack (Elasticsearch, Logstash, Kibana) for log analysis:
Elasticsearch Index Template:
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
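A log document that fits this mapping has exactly those five fields. As a rough illustration, the Go sketch below encodes one such entry as a JSON line. The `logEntry` type and the sample values are hypothetical and do not describe Llamactl's actual log schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// logEntry mirrors the five fields declared in the index template.
type logEntry struct {
	Timestamp time.Time `json:"timestamp"`
	Level     string    `json:"level"`
	Message   string    `json:"message"`
	Instance  string    `json:"instance"`
	Component string    `json:"component"`
}

// encode renders one entry as a single JSON line.
func encode(e logEntry) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	line, err := encode(logEntry{
		Timestamp: time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC),
		Level:     "error",
		Message:   "health check failed",
		Instance:  "demo",    // hypothetical instance name
		Component: "manager", // hypothetical component name
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Mapping `level`, `instance`, and `component` as `keyword` keeps them exact-match and aggregatable, which is what the per-instance visualizations rely on, while `message` stays full-text searchable.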
Kibana Visualizations:
- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns
Application Performance Monitoring
OpenTelemetry Integration
Enable distributed tracing:
# config.yaml
telemetry:
  enabled: true
  otlp:
    endpoint: "http://jaeger:4318/v1/traces"
  sampling_rate: 0.1
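A `sampling_rate` of 0.1 presumably means that roughly one trace in ten is kept. One common way to do this is to decide from the trace ID itself, so every service handling the same trace reaches the same decision. The Go sketch below illustrates that idea under those assumptions. It is not the OpenTelemetry SDK's implementation, and `shouldSample` is a hypothetical helper.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
)

// shouldSample keeps a trace when the low 8 bytes of its ID, read as a
// 63-bit integer, fall below rate * 2^63. The decision depends only on
// the ID, so it is repeatable for a given trace.
func shouldSample(traceID [16]byte, rate float64) bool {
	if rate <= 0 {
		return false
	}
	if rate >= 1 {
		return true
	}
	threshold := uint64(rate * float64(1<<63))
	x := binary.BigEndian.Uint64(traceID[8:]) >> 1
	return x < threshold
}

func main() {
	const n = 100000
	kept := 0
	for i := 0; i < n; i++ {
		var id [16]byte
		if _, err := rand.Read(id[:]); err != nil {
			panic(err)
		}
		if shouldSample(id, 0.1) {
			kept++
		}
	}
	fmt.Printf("kept %d of %d traces (about 10%% expected)\n", kept, n)
}
```

Lower rates reduce tracing overhead and storage, at the cost of missing some rare slow requests.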
Custom Spans
Add custom tracing to track operations:
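// Assumes tracer := otel.Tracer("llamactl") from go.opentelemetry.io/otel,
// and the attribute package from go.opentelemetry.io/otel/attribute.
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End() // the span's duration records how long startup took
// Attach identifying attributes to the span
span.SetAttributes(
	attribute.String("instance.name", name),
	attribute.String("model.path", modelPath),
)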
Health Check Configuration
Readiness Probes
Configure Kubernetes readiness probes:
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
Liveness Probes
Configure liveness probes:
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
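The two probes serve different purposes: a failing readiness probe takes the pod out of Service endpoints, while a failing liveness probe restarts the container. The Go sketch below shows one way handlers behind these two paths could behave. It is an assumption for illustration, not Llamactl's actual implementation, and `newProbeMux` and `probe` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newProbeMux serves the two probe paths and returns a setter for readiness.
// Liveness answers 200 whenever the process can serve HTTP at all;
// readiness answers 503 until the setter is called with true.
func newProbeMux() (http.Handler, func(bool)) {
	var ready atomic.Bool
	mux := http.NewServeMux()
	mux.HandleFunc("/api/health/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/api/health", func(w http.ResponseWriter, r *http.Request) {
		if !ready.Load() {
			http.Error(w, "starting", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux, ready.Store
}

// probe issues an in-memory GET and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h, setReady := newProbeMux()
	fmt.Println(probe(h, "/api/health/live"), probe(h, "/api/health")) // 200 503
	setReady(true)                                                     // e.g. once startup work has finished
	fmt.Println(probe(h, "/api/health"))                               // 200
}
```

Keeping the liveness handler free of dependency checks avoids restart loops when a downstream service, rather than the process itself, is the problem.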
Custom Health Checks
Implement custom health checks:
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
	// Check database connectivity
	if err := h.db.Ping(); err != nil {
		return fmt.Errorf("database unreachable: %w", err)
	}
	// Check instance responsiveness
	for _, instance := range h.instances {
		if !instance.IsHealthy() {
			return fmt.Errorf("instance %s unhealthy", instance.Name)
		}
	}
	return nil
}
Performance Profiling
pprof Integration
Enable Go profiling:
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
Access profiling endpoints:
# CPU profile
go tool pprof http://localhost:6060/debug/pprof/profile
# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
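For context, Go services usually expose these endpoints through the standard `net/http/pprof` package. The sketch below shows how such a dedicated profiling mux is typically wired up. How Llamactl does this internally is not stated here, so treat `newPprofMux` as a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux registers the standard pprof handlers on their own mux so
// they can be served on a separate port from the main API.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, etc.
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// status issues an in-memory GET and returns the status code.
func status(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newPprofMux()
	// In a real service this would be served with something like:
	//   go http.ListenAndServe("localhost:6060", mux)
	fmt.Println(status(mux, "/debug/pprof/heap?debug=1"))
	fmt.Println(status(mux, "/debug/pprof/goroutine?debug=1"))
}
```

Binding the profiling listener to localhost, or otherwise restricting access to it, is advisable because profiles can reveal internal details of the process.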
Continuous Profiling
Set up continuous profiling with Pyroscope:
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
Security Monitoring
Audit Logging
Enable security audit logs:
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
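The `events` list acts as an allow-list: only the named event types are written to the audit log. The Go sketch below illustrates that filtering. The JSON line format and the `actor` field are assumptions made for the example, not Llamactl's actual audit schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"time"
)

// auditLogger writes one JSON line per enabled event type.
type auditLogger struct {
	w       io.Writer
	enabled map[string]bool
}

func newAuditLogger(w io.Writer, events []string) *auditLogger {
	enabled := make(map[string]bool, len(events))
	for _, e := range events {
		enabled[e] = true
	}
	return &auditLogger{w: w, enabled: enabled}
}

// isEnabled reports whether the event type is in the allow-list.
func (a *auditLogger) isEnabled(event string) bool {
	return a.enabled[event]
}

// record writes the event if it is enabled and reports whether it was written.
func (a *auditLogger) record(event, actor string) (bool, error) {
	if !a.isEnabled(event) {
		return false, nil
	}
	b, err := json.Marshal(map[string]string{
		"time":  time.Now().UTC().Format(time.RFC3339),
		"event": event,
		"actor": actor, // hypothetical field
	})
	if err != nil {
		return false, err
	}
	_, err = a.w.Write(append(b, '\n'))
	return err == nil, err
}

func main() {
	a := newAuditLogger(os.Stdout, []string{"auth.login", "instance.delete"})
	written, _ := a.record("auth.login", "alice")
	fmt.Println("auth.login written:", written)
	written, _ = a.record("instance.create", "alice")
	fmt.Println("instance.create written:", written)
}
```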
Rate Limiting Monitoring
Track rate limiting metrics:
# Monitor rate limit hits
curl -s http://localhost:8080/metrics | grep rate_limit
Troubleshooting Monitoring
Common Issues
Metrics not appearing:
- Check Prometheus configuration
- Verify network connectivity
- Review Llamactl logs for errors
High memory usage:
- Check for memory leaks in profiles
- Monitor garbage collection metrics
- Review instance configurations
Alert fatigue:
- Tune alert thresholds
- Implement alert severity levels
- Use alert routing and suppression
Debug Tools
Monitoring health:
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health
# Review logs
tail -f /var/log/llamactl/app.log
Best Practices
Production Monitoring
- Comprehensive coverage: Monitor all critical components
- Appropriate alerting: Balance sensitivity and noise
- Regular review: Analyze trends and patterns
- Documentation: Maintain runbooks for alerts
Performance Optimization
- Baseline establishment: Know normal operating parameters
- Trend analysis: Identify performance degradation early
- Capacity planning: Monitor resource growth trends
- Optimization cycles: Regular performance tuning
Next Steps
- Set up Troubleshooting procedures
- Learn about Backend optimization
- Configure Production deployment