# Monitoring

This guide covers monitoring LlamaCtl in production environments: health checks, Prometheus metrics, dashboards, alerting, logging, tracing, and profiling.

## Overview

Effective monitoring of LlamaCtl involves tracking:

- Instance health and performance
- System resource usage
- API response times
- Error rates and alerts

## Built-in Monitoring

### Health Checks

LlamaCtl provides built-in health monitoring:

```bash
# Check overall system health
curl http://localhost:8080/api/system/health

# Check specific instance health
curl http://localhost:8080/api/instances/{name}/health
```
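
The same checks can be scripted from Go. The sketch below is not part of LlamaCtl; it assumes the endpoints above report health through the HTTP status code (any 2xx meaning healthy), and the helper names `instanceHealthURL` and `isHealthy` as well as the instance name `my-model` are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"time"
)

// instanceHealthURL builds the per-instance health URL shown above,
// escaping the name so spaces or slashes cannot alter the path.
func instanceHealthURL(baseURL, name string) string {
	return fmt.Sprintf("%s/api/instances/%s/health", baseURL, url.PathEscape(name))
}

// isHealthy reports whether the endpoint answered with a 2xx status.
// Assumption: health is signalled through the HTTP status code.
func isHealthy(client *http.Client, endpoint string) (bool, error) {
	resp, err := client.Get(endpoint)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return resp.StatusCode >= 200 && resp.StatusCode < 300, nil
}

func main() {
	// A timeout keeps a hung server from blocking the check forever.
	client := &http.Client{Timeout: 5 * time.Second}
	endpoint := instanceHealthURL("http://localhost:8080", "my-model")

	ok, err := isHealthy(client, endpoint)
	if err != nil {
		fmt.Println("health check failed:", err)
		return
	}
	fmt.Println(endpoint, "healthy:", ok)
}
```

Escaping the instance name matters when names come from user input, since an unescaped `/` would change which route is requested.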

### Metrics Endpoint

Access Prometheus-compatible metrics:

```bash
curl http://localhost:8080/metrics
```

**Available Metrics:**
- `llamactl_instances_total`: Total number of instances
- `llamactl_instances_running`: Number of running instances
- `llamactl_instance_memory_bytes`: Instance memory usage
- `llamactl_instance_cpu_percent`: Instance CPU usage
- `llamactl_api_requests_total`: Total API requests
- `llamactl_api_request_duration_seconds`: API response times
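
To consume these metrics outside Prometheus, the text exposition format can be read line by line. The following is a simplified sketch, not a full parser: it skips `# HELP` and `# TYPE` comments and assumes label values contain no spaces. The sample payload, including the `instance_name` label value, is made up for illustration.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseMetrics reads Prometheus text exposition format and returns each
// series (metric name plus labels, exactly as written) mapped to its value.
// Simplification: label values are assumed to contain no spaces.
func parseMetrics(text string) (map[string]float64, error) {
	out := make(map[string]float64)
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or HELP/TYPE comment
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			return nil, fmt.Errorf("malformed sample: %q", line)
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		if err != nil {
			return nil, fmt.Errorf("bad value in %q: %w", line, err)
		}
		out[fields[0]] = v
	}
	return out, sc.Err()
}

// sample is a hypothetical /metrics response, not captured output.
const sample = `# HELP llamactl_instances_total Total number of instances
# TYPE llamactl_instances_total gauge
llamactl_instances_total 3
llamactl_instances_running 2
llamactl_instance_memory_bytes{instance_name="llama-7b"} 4.294967296e+09
`

func main() {
	m, err := parseMetrics(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("running %v of %v instances\n",
		m["llamactl_instances_running"], m["llamactl_instances_total"])
}
```

For production use, a maintained parser such as the one in the Prometheus client libraries is a safer choice than hand-rolled parsing.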

## Prometheus Integration

### Configuration

Add LlamaCtl as a Prometheus target:

```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'llamactl'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```

### Custom Metrics

Enable additional metrics in LlamaCtl:

```yaml
# config.yaml
monitoring:
  enabled: true
  prometheus:
    enabled: true
    path: "/metrics"
  metrics:
    - instance_stats
    - api_performance
    - system_resources
```

## Grafana Dashboards

### LlamaCtl Dashboard

Import the official Grafana dashboard:

1. Download the dashboard JSON from the project's releases.
2. Import it into Grafana.
3. Select your Prometheus data source for the dashboard.

### Key Panels

**Instance Overview:**
- Instance count and status
- Resource usage per instance
- Health status indicators

**Performance Metrics:**
- API response times
- Tokens per second
- Memory usage trends

**System Resources:**
- CPU and memory utilization
- Disk I/O and network usage
- GPU utilization (if applicable)

### Custom Queries

**Instance Uptime:**
```promql
(time() - llamactl_instance_start_time_seconds) / 3600
```

**Memory Usage Percentage:**
```promql
(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
```

**API Error Rate:**
```promql
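# Aggregate both sides with sum(); otherwise each error series is divided by itself.
# Counts 4xx and 5xx responses as errors.
sum(rate(llamactl_api_requests_total{status=~"[45].."}[5m])) / sum(rate(llamactl_api_requests_total[5m])) * 100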
```

## Alerting

### Prometheus Alerts

Configure alerts for critical conditions:

```yaml
# alerts.yml
groups:
  - name: llamactl
    rules:
      - alert: InstanceDown
        expr: llamactl_instance_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "LlamaCtl instance {{ $labels.instance_name }} is down"

      - alert: HighMemoryUsage
        expr: llamactl_instance_memory_percent > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance_name }}"

      - alert: APIHighLatency
        expr: histogram_quantile(0.95, sum by (le) (rate(llamactl_api_request_duration_seconds_bucket[5m]))) > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High API latency detected"
```

### Notification Channels

Configure alert notifications:

**Slack Integration:**
```yaml
# alertmanager.yml
route:
  group_by: ['alertname']
  receiver: 'slack'

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        title: 'LlamaCtl Alert'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
```

## Log Management

### Centralized Logging

Configure log aggregation:

```yaml
# config.yaml
logging:
  level: "info"
  output: "json"
  destinations:
    - type: "file"
      path: "/var/log/llamactl/app.log"
    - type: "syslog"
      facility: "local0"
    - type: "elasticsearch"
      url: "http://elasticsearch:9200"
```

### Log Analysis

Use the ELK stack for log analysis:

**Elasticsearch Index Template:**
```json
{
  "index_patterns": ["llamactl-*"],
  "mappings": {
    "properties": {
      "timestamp": {"type": "date"},
      "level": {"type": "keyword"},
      "message": {"type": "text"},
      "instance": {"type": "keyword"},
      "component": {"type": "keyword"}
    }
  }
}
```

**Kibana Visualizations:**
- Log volume over time
- Error rate by instance
- Performance trends
- Resource usage patterns

## Application Performance Monitoring

### OpenTelemetry Integration

Enable distributed tracing:

```yaml
# config.yaml
telemetry:
  enabled: true
  otlp:
    # OTLP over HTTP uses port 4318 (gRPC uses 4317); 14268 is Jaeger's legacy Thrift endpoint
    endpoint: "http://jaeger:4318/v1/traces"
  sampling_rate: 0.1
```

### Custom Spans

Add custom tracing to track operations:

```go
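// Requires imports: "go.opentelemetry.io/otel" and "go.opentelemetry.io/otel/attribute"
tracer := otel.Tracer("llamactl") // tracer name is illustrative

// The span's duration records how long instance startup takes
ctx, span := tracer.Start(ctx, "instance.start")
defer span.End()

// Attach identifying attributes so traces can be filtered per instance
span.SetAttributes(
    attribute.String("instance.name", name),
    attribute.String("model.path", modelPath),
)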
```

## Health Check Configuration

### Readiness Probes

Configure Kubernetes readiness probes:

```yaml
readinessProbe:
  httpGet:
    path: /api/health
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
```

### Liveness Probes

Configure liveness probes:

```yaml
livenessProbe:
  httpGet:
    path: /api/health/live
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

### Custom Health Checks

Implement custom health checks:

```go
func (h *HealthHandler) CustomCheck(ctx context.Context) error {
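    // Check database connectivity, honoring the caller's deadline
    // (assumes h.db is a *sql.DB, which provides PingContext)
    if err := h.db.PingContext(ctx); err != nil {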
        return fmt.Errorf("database unreachable: %w", err)
    }

    // Check instance responsiveness
    for _, instance := range h.instances {
        if !instance.IsHealthy() {
            return fmt.Errorf("instance %s unhealthy", instance.Name)
        }
    }

    return nil
}
```

## Performance Profiling

### pprof Integration

Enable Go profiling:

```yaml
# config.yaml
debug:
  pprof_enabled: true
  pprof_port: 6060
```

Access profiling endpoints:
```bash
# CPU profile (samples for 30 seconds by default; append ?seconds=N to change)
go tool pprof http://localhost:6060/debug/pprof/profile

# Memory profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine profile
go tool pprof http://localhost:6060/debug/pprof/goroutine
```

### Continuous Profiling

Set up continuous profiling with Pyroscope:

```yaml
# config.yaml
profiling:
  enabled: true
  pyroscope:
    server_address: "http://pyroscope:4040"
    application_name: "llamactl"
```

## Security Monitoring

### Audit Logging

Enable security audit logs:

```yaml
# config.yaml
audit:
  enabled: true
  log_file: "/var/log/llamactl/audit.log"
  events:
    - "auth.login"
    - "auth.logout"
    - "instance.create"
    - "instance.delete"
    - "config.update"
```

### Rate Limiting Monitoring

Track rate limiting metrics:

```bash
# Monitor rate limit hits
curl http://localhost:8080/metrics | grep rate_limit
```

## Troubleshooting Monitoring

### Common Issues

**Metrics not appearing:**
1. Check Prometheus configuration
2. Verify network connectivity
3. Review LlamaCtl logs for errors

**High memory usage:**
1. Check for memory leaks in profiles
2. Monitor garbage collection metrics
3. Review instance configurations

**Alert fatigue:**
1. Tune alert thresholds
2. Implement alert severity levels
3. Use alert routing and suppression

### Debug Tools

**Monitoring health:**
```bash
# Check monitoring endpoints
curl -v http://localhost:8080/metrics
curl -v http://localhost:8080/api/health

# Review logs
tail -f /var/log/llamactl/app.log
```

## Best Practices

### Production Monitoring

1. **Comprehensive coverage**: Monitor all critical components
2. **Appropriate alerting**: Balance sensitivity and noise
3. **Regular review**: Analyze trends and patterns
4. **Documentation**: Maintain runbooks for alerts

### Performance Optimization

1. **Baseline establishment**: Know normal operating parameters
2. **Trend analysis**: Identify performance degradation early
3. **Capacity planning**: Monitor resource growth trends
4. **Optimization cycles**: Regular performance tuning

## Next Steps

- Set up [Troubleshooting](troubleshooting.md) procedures
- Learn about [Backend optimization](backends.md)
- Configure [Production deployment](../development/building.md)