Mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-05 16:44:22 +00:00)
Remove misleading advanced section
@@ -1,316 +0,0 @@
-# Backends
-
-Llamactl supports multiple backends for running large language models. This guide covers the available backends and their configuration.
-
-## Llama.cpp Backend
-
-The primary backend for Llamactl, providing robust support for GGUF models.
-
-### Features
-
-- **GGUF Support**: Native support for GGUF model format
-- **GPU Acceleration**: CUDA, OpenCL, and Metal support
-- **Memory Optimization**: Efficient memory usage and mapping
-- **Multi-threading**: Configurable CPU thread utilization
-- **Quantization**: Support for various quantization levels
-
-### Configuration
-
-```yaml
-backends:
-  llamacpp:
-    binary_path: "/usr/local/bin/llama-server"
-    default_options:
-      threads: 4
-      context_size: 2048
-      batch_size: 512
-    gpu:
-      enabled: true
-      layers: 35
-```
-
-### Supported Options
-
-| Option | Description | Default |
-|--------|-------------|---------|
-| `threads` | Number of CPU threads | 4 |
-| `context_size` | Context window size | 2048 |
-| `batch_size` | Batch size for processing | 512 |
-| `gpu_layers` | Layers to offload to GPU | 0 |
-| `memory_lock` | Lock model in memory | false |
-| `no_mmap` | Disable memory mapping | false |
-| `rope_freq_base` | RoPE frequency base | 10000 |
-| `rope_freq_scale` | RoPE frequency scale | 1.0 |
-
-### GPU Acceleration
-
-#### CUDA Setup
-
-```bash
-# Install CUDA toolkit
-sudo apt update
-sudo apt install nvidia-cuda-toolkit
-
-# Verify CUDA installation
-nvcc --version
-nvidia-smi
-```
-
-#### Configuration for GPU
-
-```json
-{
-  "name": "gpu-accelerated",
-  "model_path": "/models/llama-2-13b.gguf",
-  "port": 8081,
-  "options": {
-    "gpu_layers": 35,
-    "threads": 2,
-    "context_size": 4096
-  }
-}
-```
-
-### Performance Tuning
-
-#### Memory Optimization
-
-```yaml
-# For limited memory systems
-options:
-  context_size: 1024
-  batch_size: 256
-  no_mmap: true
-  memory_lock: false
-
-# For high-memory systems
-options:
-  context_size: 8192
-  batch_size: 1024
-  memory_lock: true
-  no_mmap: false
-```
-
-#### CPU Optimization
-
-```yaml
-# Match thread count to CPU cores
-# For 8-core CPU:
-options:
-  threads: 6 # Leave 2 cores for system
-
-# For high-performance CPUs:
-options:
-  threads: 16
-  batch_size: 1024
-```
-
-## Future Backends
-
-Llamactl is designed to support multiple backends. Planned additions:
-
-### vLLM Backend
-
-High-performance inference engine optimized for serving:
-
-- **Features**: Fast inference, batching, streaming
-- **Models**: Supports various model formats
-- **Scaling**: Horizontal scaling support
-
-### TensorRT-LLM Backend
-
-NVIDIA's optimized inference engine:
-
-- **Features**: Maximum GPU performance
-- **Models**: Optimized for NVIDIA GPUs
-- **Deployment**: Production-ready inference
-
-### Ollama Backend
-
-Integration with Ollama for easy model management:
-
-- **Features**: Simplified model downloading
-- **Models**: Large model library
-- **Integration**: Seamless model switching
-
-## Backend Selection
-
-### Automatic Detection
-
-Llamactl can automatically detect the best backend:
-
-```yaml
-backends:
-  auto_detect: true
-  preference_order:
-    - "llamacpp"
-    - "vllm"
-    - "tensorrt"
-```
-
-### Manual Selection
-
-Force a specific backend for an instance:
-
-```json
-{
-  "name": "manual-backend",
-  "backend": "llamacpp",
-  "model_path": "/models/model.gguf",
-  "port": 8081
-}
-```
-
-## Backend-Specific Features
-
-### Llama.cpp Features
-
-#### Model Formats
-
-- **GGUF**: Primary format, best compatibility
-- **GGML**: Legacy format (limited support)
-
-#### Quantization Levels
-
-- `Q2_K`: Smallest size, lower quality
-- `Q4_K_M`: Balanced size and quality
-- `Q5_K_M`: Higher quality, larger size
-- `Q6_K`: Near-original quality
-- `Q8_0`: Minimal loss, largest size
-
-#### Advanced Options
-
-```yaml
-advanced:
-  rope_scaling:
-    type: "linear"
-    factor: 2.0
-  attention:
-    flash_attention: true
-    grouped_query: true
-```
-
-## Monitoring Backend Performance
-
-### Metrics Collection
-
-Monitor backend-specific metrics:
-
-```bash
-# Get backend statistics
-curl http://localhost:8080/api/instances/my-instance/backend/stats
-```
-
-**Response:**
-```json
-{
-  "backend": "llamacpp",
-  "version": "b1234",
-  "metrics": {
-    "tokens_per_second": 15.2,
-    "memory_usage": 4294967296,
-    "gpu_utilization": 85.5,
-    "context_usage": 75.0
-  }
-}
-```
-
-### Performance Optimization
-
-#### Benchmark Different Configurations
-
-```bash
-# Test various thread counts
-for threads in 2 4 8 16; do
-  echo "Testing $threads threads"
-  curl -X PUT http://localhost:8080/api/instances/benchmark \
-    -d "{\"options\": {\"threads\": $threads}}"
-  # Run performance test
-done
-```
-
-#### Memory Usage Optimization
-
-```bash
-# Monitor memory usage
-watch -n 1 'curl -s http://localhost:8080/api/instances/my-instance/stats | jq .memory_usage'
-```
-
-## Troubleshooting Backends
-
-### Common Llama.cpp Issues
-
-**Model won't load:**
-```bash
-# Check model file
-file /path/to/model.gguf
-
-# Verify format
-llama-server --model /path/to/model.gguf --dry-run
-```
-
-**GPU not detected:**
-```bash
-# Check CUDA installation
-nvidia-smi
-
-# Verify llama.cpp GPU support
-llama-server --help | grep -i gpu
-```
-
-**Performance issues:**
-```bash
-# Check system resources
-htop
-nvidia-smi
-
-# Verify configuration
-curl http://localhost:8080/api/instances/my-instance/config
-```
-
-## Custom Backend Development
-
-### Backend Interface
-
-Implement the backend interface for custom backends:
-
-```go
-type Backend interface {
-    Start(config InstanceConfig) error
-    Stop(instance *Instance) error
-    Health(instance *Instance) (*HealthStatus, error)
-    Stats(instance *Instance) (*Stats, error)
-}
-```
-
-### Registration
-
-Register your custom backend:
-
-```go
-func init() {
-    backends.Register("custom", &CustomBackend{})
-}
-```
-
-## Best Practices
-
-### Production Deployments
-
-1. **Resource allocation**: Plan for peak usage
-2. **Backend selection**: Choose based on requirements
-3. **Monitoring**: Set up comprehensive monitoring
-4. **Fallback**: Configure backup backends
-
-### Development
-
-1. **Rapid iteration**: Use smaller models
-2. **Resource monitoring**: Track usage patterns
-3. **Configuration testing**: Validate settings
-4. **Performance profiling**: Optimize bottlenecks
-
-## Next Steps
-
-- Learn about [Monitoring](monitoring.md) backend performance
-- Explore [Troubleshooting](troubleshooting.md) guides
-- Set up [Production Monitoring](monitoring.md)
@@ -1,420 +0,0 @@
-# Monitoring
-
-Comprehensive monitoring setup for Llamactl in production environments.
-
-## Overview
-
-Effective monitoring of Llamactl involves tracking:
-
-- Instance health and performance
-- System resource usage
-- API response times
-- Error rates and alerts
-
-## Built-in Monitoring
-
-### Health Checks
-
-Llamactl provides built-in health monitoring:
-
-```bash
-# Check overall system health
-curl http://localhost:8080/api/system/health
-
-# Check specific instance health
-curl http://localhost:8080/api/instances/{name}/health
-```
-
-### Metrics Endpoint
-
-Access Prometheus-compatible metrics:
-
-```bash
-curl http://localhost:8080/metrics
-```
-
-**Available Metrics:**
-- `llamactl_instances_total`: Total number of instances
-- `llamactl_instances_running`: Number of running instances
-- `llamactl_instance_memory_bytes`: Instance memory usage
-- `llamactl_instance_cpu_percent`: Instance CPU usage
-- `llamactl_api_requests_total`: Total API requests
-- `llamactl_api_request_duration_seconds`: API response times
-
-## Prometheus Integration
-
-### Configuration
-
-Add Llamactl as a Prometheus target:
-
-```yaml
-# prometheus.yml
-scrape_configs:
-  - job_name: 'llamactl'
-    static_configs:
-      - targets: ['localhost:8080']
-    metrics_path: '/metrics'
-    scrape_interval: 15s
-```
-
-### Custom Metrics
-
-Enable additional metrics in Llamactl:
-
-```yaml
-# config.yaml
-monitoring:
-  enabled: true
-  prometheus:
-    enabled: true
-    path: "/metrics"
-  metrics:
-    - instance_stats
-    - api_performance
-    - system_resources
-```
-
-## Grafana Dashboards
-
-### Llamactl Dashboard
-
-Import the official Grafana dashboard:
-
-1. Download dashboard JSON from releases
-2. Import into Grafana
-3. Configure Prometheus data source
-
-### Key Panels
-
-**Instance Overview:**
-- Instance count and status
-- Resource usage per instance
-- Health status indicators
-
-**Performance Metrics:**
-- API response times
-- Tokens per second
-- Memory usage trends
-
-**System Resources:**
-- CPU and memory utilization
-- Disk I/O and network usage
-- GPU utilization (if applicable)
-
-### Custom Queries
-
-**Instance Uptime:**
-```promql
-(time() - llamactl_instance_start_time_seconds) / 3600
-```
-
-**Memory Usage Percentage:**
-```promql
-(llamactl_instance_memory_bytes / llamactl_system_memory_total_bytes) * 100
-```
-
-**API Error Rate:**
-```promql
-rate(llamactl_api_requests_total{status=~"4.."}[5m]) / rate(llamactl_api_requests_total[5m]) * 100
-```
-
-## Alerting
-
-### Prometheus Alerts
-
-Configure alerts for critical conditions:
-
-```yaml
-# alerts.yml
-groups:
-  - name: llamactl
-    rules:
-      - alert: InstanceDown
-        expr: llamactl_instance_up == 0
-        for: 1m
-        labels:
-          severity: critical
-        annotations:
-          summary: "Llamactl instance {{ $labels.instance_name }} is down"
-
-      - alert: HighMemoryUsage
-        expr: llamactl_instance_memory_percent > 90
-        for: 5m
-        labels:
-          severity: warning
-        annotations:
-          summary: "High memory usage on {{ $labels.instance_name }}"
-
-      - alert: APIHighLatency
-        expr: histogram_quantile(0.95, rate(llamactl_api_request_duration_seconds_bucket[5m])) > 2
-        for: 2m
-        labels:
-          severity: warning
-        annotations:
-          summary: "High API latency detected"
-```
-
-### Notification Channels
-
-Configure alert notifications:
-
-**Slack Integration:**
-```yaml
-# alertmanager.yml
-route:
-  group_by: ['alertname']
-  receiver: 'slack'
-
-receivers:
-  - name: 'slack'
-    slack_configs:
-      - api_url: 'https://hooks.slack.com/services/...'
-        channel: '#alerts'
-        title: 'Llamactl Alert'
-        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
-```
-
-## Log Management
-
-### Centralized Logging
-
-Configure log aggregation:
-
-```yaml
-# config.yaml
-logging:
-  level: "info"
-  output: "json"
-  destinations:
-    - type: "file"
-      path: "/var/log/llamactl/app.log"
-    - type: "syslog"
-      facility: "local0"
-    - type: "elasticsearch"
-      url: "http://elasticsearch:9200"
-```
-
-### Log Analysis
-
-Use ELK stack for log analysis:
-
-**Elasticsearch Index Template:**
-```json
-{
-  "index_patterns": ["llamactl-*"],
-  "mappings": {
-    "properties": {
-      "timestamp": {"type": "date"},
-      "level": {"type": "keyword"},
-      "message": {"type": "text"},
-      "instance": {"type": "keyword"},
-      "component": {"type": "keyword"}
-    }
-  }
-}
-```
-
-**Kibana Visualizations:**
-- Log volume over time
-- Error rate by instance
-- Performance trends
-- Resource usage patterns
-
-## Application Performance Monitoring
-
-### OpenTelemetry Integration
-
-Enable distributed tracing:
-
-```yaml
-# config.yaml
-telemetry:
-  enabled: true
-  otlp:
-    endpoint: "http://jaeger:14268/api/traces"
-  sampling_rate: 0.1
-```
-
-### Custom Spans
-
-Add custom tracing to track operations:
-
-```go
-ctx, span := tracer.Start(ctx, "instance.start")
-defer span.End()
-
-// Track instance startup time
-span.SetAttributes(
-    attribute.String("instance.name", name),
-    attribute.String("model.path", modelPath),
-)
-```
-
-## Health Check Configuration
-
-### Readiness Probes
-
-Configure Kubernetes readiness probes:
-
-```yaml
-readinessProbe:
-  httpGet:
-    path: /api/health
-    port: 8080
-  initialDelaySeconds: 30
-  periodSeconds: 10
-```
-
-### Liveness Probes
-
-Configure liveness probes:
-
-```yaml
-livenessProbe:
-  httpGet:
-    path: /api/health/live
-    port: 8080
-  initialDelaySeconds: 60
-  periodSeconds: 30
-```
-
-### Custom Health Checks
-
-Implement custom health checks:
-
-```go
-func (h *HealthHandler) CustomCheck(ctx context.Context) error {
-    // Check database connectivity
-    if err := h.db.Ping(); err != nil {
-        return fmt.Errorf("database unreachable: %w", err)
-    }
-
-    // Check instance responsiveness
-    for _, instance := range h.instances {
-        if !instance.IsHealthy() {
-            return fmt.Errorf("instance %s unhealthy", instance.Name)
-        }
-    }
-
-    return nil
-}
-```
-
-## Performance Profiling
-
-### pprof Integration
-
-Enable Go profiling:
-
-```yaml
-# config.yaml
-debug:
-  pprof_enabled: true
-  pprof_port: 6060
-```
-
-Access profiling endpoints:
-```bash
-# CPU profile
-go tool pprof http://localhost:6060/debug/pprof/profile
-
-# Memory profile
-go tool pprof http://localhost:6060/debug/pprof/heap
-
-# Goroutine profile
-go tool pprof http://localhost:6060/debug/pprof/goroutine
-```
-
-### Continuous Profiling
-
-Set up continuous profiling with Pyroscope:
-
-```yaml
-# config.yaml
-profiling:
-  enabled: true
-  pyroscope:
-    server_address: "http://pyroscope:4040"
-    application_name: "llamactl"
-```
-
-## Security Monitoring
-
-### Audit Logging
-
-Enable security audit logs:
-
-```yaml
-# config.yaml
-audit:
-  enabled: true
-  log_file: "/var/log/llamactl/audit.log"
-  events:
-    - "auth.login"
-    - "auth.logout"
-    - "instance.create"
-    - "instance.delete"
-    - "config.update"
-```
-
-### Rate Limiting Monitoring
-
-Track rate limiting metrics:
-
-```bash
-# Monitor rate limit hits
-curl http://localhost:8080/metrics | grep rate_limit
-```
-
-## Troubleshooting Monitoring
-
-### Common Issues
-
-**Metrics not appearing:**
-1. Check Prometheus configuration
-2. Verify network connectivity
-3. Review Llamactl logs for errors
-
-**High memory usage:**
-1. Check for memory leaks in profiles
-2. Monitor garbage collection metrics
-3. Review instance configurations
-
-**Alert fatigue:**
-1. Tune alert thresholds
-2. Implement alert severity levels
-3. Use alert routing and suppression
-
-### Debug Tools
-
-**Monitoring health:**
-```bash
-# Check monitoring endpoints
-curl -v http://localhost:8080/metrics
-curl -v http://localhost:8080/api/health
-
-# Review logs
-tail -f /var/log/llamactl/app.log
-```
-
-## Best Practices
-
-### Production Monitoring
-
-1. **Comprehensive coverage**: Monitor all critical components
-2. **Appropriate alerting**: Balance sensitivity and noise
-3. **Regular review**: Analyze trends and patterns
-4. **Documentation**: Maintain runbooks for alerts
-
-### Performance Optimization
-
-1. **Baseline establishment**: Know normal operating parameters
-2. **Trend analysis**: Identify performance degradation early
-3. **Capacity planning**: Monitor resource growth trends
-4. **Optimization cycles**: Regular performance tuning
-
-## Next Steps
-
-- Set up [Troubleshooting](troubleshooting.md) procedures
-- Learn about [Backend optimization](backends.md)
-- Configure [Production deployment](../development/building.md)
@@ -148,15 +148,3 @@ llamactl --help
 ```
 
 You can also override configuration using command line flags when starting llamactl.
-
-## Next Steps
-
-- Learn about [Managing Instances](../user-guide/managing-instances.md)
-- Explore [Advanced Configuration](../advanced/monitoring.md)
-- Set up [Monitoring](../advanced/monitoring.md)
-
-## Next Steps
-
-- Learn about [Managing Instances](../user-guide/managing-instances.md)
-- Explore [Advanced Configuration](../advanced/monitoring.md)
-- Set up [Monitoring](../advanced/monitoring.md)
@@ -40,14 +40,13 @@ Llamactl is designed to simplify the deployment and management of llama-server i
 - [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
 - [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
 - [API Reference](user-guide/api-reference.md) - Complete API documentation
-- [Monitoring](advanced/monitoring.md) - Health checks and monitoring
-- [Backends](advanced/backends.md) - Backend configuration options
 
 ## Getting Help
 
 If you need help or have questions:
 
-- Check the [Troubleshooting](advanced/troubleshooting.md) guide
+- Check the [Troubleshooting](user-guide/troubleshooting.md) guide
 - Visit the [GitHub repository](https://github.com/lordmathis/llamactl)
 - Review the [Configuration Guide](getting-started/configuration.md) for advanced settings
 
@@ -462,9 +462,3 @@ curl -X POST http://localhost:8080/api/instances/example/stop
 # Delete instance
 curl -X DELETE http://localhost:8080/api/instances/example
 ```
-
-## Next Steps
-
-- Learn about [Managing Instances](managing-instances.md) in detail
-- Explore [Advanced Configuration](../advanced/backends.md)
-- Set up [Monitoring](../advanced/monitoring.md) for production use
@@ -163,9 +163,3 @@ curl -X POST http://localhost:8080/api/instances/stop-all
 # Get status of all instances
 curl http://localhost:8080/api/instances
 ```
-
-## Next Steps
-
-- Learn about the [Web UI](web-ui.md) interface
-- Explore the complete [API Reference](api-reference.md)
-- Set up [Monitoring](../advanced/monitoring.md) for production use
@@ -552,9 +552,3 @@ cp ~/.llamactl/config.yaml ~/.llamactl/config.yaml.backup
 # Backup instance configurations
 curl http://localhost:8080/api/instances > instances-backup.json
 ```
-
-## Next Steps
-
-- Set up [Monitoring](monitoring.md) to prevent issues
-- Learn about [Advanced Configuration](backends.md)
-- Review [Best Practices](../development/contributing.md)
@@ -208,9 +208,3 @@ Some features may be limited on mobile:
 - Log viewing (use horizontal scrolling)
 - Complex configuration forms
 - File browser functionality
-
-## Next Steps
-
-- Learn about [API Reference](api-reference.md) for programmatic access
-- Set up [Monitoring](../advanced/monitoring.md) for production use
-- Explore [Advanced Configuration](../advanced/backends.md) options
@@ -57,10 +57,7 @@ nav:
     - Managing Instances: user-guide/managing-instances.md
     - Web UI: user-guide/web-ui.md
     - API Reference: user-guide/api-reference.md
-  - Advanced:
-    - Backends: advanced/backends.md
-    - Monitoring: advanced/monitoring.md
-    - Troubleshooting: advanced/troubleshooting.md
+    - Troubleshooting: user-guide/troubleshooting.md
 
 plugins:
   - search