# Managing Instances

Learn how to effectively manage your Llama.cpp instances with Llamactl.
## Instance Lifecycle

### Creating Instances

Instances can be created through the Web UI or API:

#### Via Web UI

- Click the "Add Instance" button
- Fill in the configuration form
- Click "Create"

#### Via API

```bash
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'
```
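The same create request can be scripted. A minimal client-side sketch in Python (the `build_instance_payload` helper and its validation checks are illustrative, not part of llamactl's API; llamactl performs its own validation server-side):

```python
def build_instance_payload(name: str, model_path: str, port: int) -> dict:
    """Build the JSON body for POST /api/instances.

    Hypothetical client-side helper: it only catches obvious mistakes
    before the request is sent.
    """
    if not name:
        raise ValueError("instance name must be non-empty")
    if not (1 <= port <= 65535):
        raise ValueError(f"port out of range: {port}")
    return {"name": name, "model_path": model_path, "port": port}
```

Serialize the result with `json.dumps` and send it as the request body.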
### Starting and Stopping

#### Start an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start

# The instance will begin loading the model
```

#### Stop an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop

# Graceful shutdown with configurable timeout
```
### Monitoring Status

Check instance status in real time:

```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}

# Get health status
curl http://localhost:8080/api/instances/{name}/health
```
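When scripting against these endpoints, it usually pays to poll until the instance is actually serving rather than fire-and-forget. A sketch of such a loop (the `wait_for_state` helper is hypothetical; `fetch_status` stands in for any wrapper around the status endpoint, and the lowercase state strings are an assumption):

```python
import time

def wait_for_state(fetch_status, target="running", timeout=60.0,
                   interval=1.0, sleep=time.sleep):
    """Poll fetch_status() until it returns `target`, hits an error,
    or the timeout expires. Returns True if the target state was reached."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status()
        if state == target:
            return True
        if state == "error":
            raise RuntimeError("instance entered the Error state")
        sleep(interval)
    return False
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a live server.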
### Instance States

Instances can be in one of several states:

- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error
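For client code it can help to model these transitions explicitly. The table below is an illustrative reading of the states above, not an API contract; llamactl enforces the real lifecycle server-side, and the recovery edge from `error` is an assumption:

```python
# One plausible reading of the lifecycle; lowercase keys are assumptions.
VALID_TRANSITIONS = {
    "stopped":  {"starting"},
    "starting": {"running", "error"},
    "running":  {"stopping", "error"},
    "stopping": {"stopped", "error"},
    "error":    {"starting"},  # assumption: restarting retries from Error
}

def can_transition(current: str, target: str) -> bool:
    """Check whether a state change is expected under this model."""
    return target in VALID_TRANSITIONS.get(current, set())
```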
## Configuration Management

### Updating Instance Configuration

Modify instance settings:

```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'
```

!!! note
    Configuration changes require restarting the instance to take effect.
### Viewing Configuration

```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```
## Resource Management

### Memory Usage

Monitor memory consumption:

```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```

### CPU and GPU Usage

Track performance metrics:

- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times
## Troubleshooting Common Issues

### Instance Won't Start

- **Check the model path**: Ensure the model file exists and is readable
- **Port conflicts**: Verify the port isn't already in use
- **Resource limits**: Check available memory and CPU
- **Permissions**: Ensure proper file system permissions
### Performance Issues

- **Adjust thread count**: Match the thread count to your CPU cores
- **Optimize context size**: Balance memory usage against capability
- **GPU offloading**: Use `gpu_layers` for GPU acceleration
- **Batch size tuning**: Optimize the batch size for your workload
### Memory Problems

- **Reduce context size**: Lower memory requirements
- **Disable memory mapping**: Use the `no_mmap` option
- **Enable memory locking**: Use `memory_lock` for performance
- **Monitor system resources**: Check available RAM
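When weighing context size against memory, a back-of-the-envelope KV-cache estimate shows what the setting costs: each transformer layer stores a K and a V tensor of `context_size × n_kv_heads × head_dim` elements. The model shape below is illustrative (a 7B-class model with grouped-query attention), not read from any specific file:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_size,
                   bytes_per_elem=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, each holding
    context_size * n_kv_heads * head_dim elements (fp16 by default)."""
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem

# Assumed shape: 32 layers, 8 KV heads, head_dim 128, 4096 context, fp16.
print(kv_cache_bytes(32, 8, 128, 4096) / 2**20)  # → 512.0 (MiB)
```

Halving the context size halves this figure, which is why reducing it is the first lever to pull.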
## Best Practices

### Production Deployments

- **Resource allocation**: Plan memory and CPU requirements
- **Health monitoring**: Set up regular health checks
- **Graceful shutdowns**: Use proper stop procedures
- **Backup configurations**: Save instance configurations
- **Log management**: Configure appropriate logging levels

### Development Environments

- **Resource sharing**: Use smaller models for development
- **Quick iterations**: Optimize for fast startup times
- **Debug logging**: Enable detailed logging for troubleshooting
## Batch Operations

### Managing Multiple Instances

```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances
```
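The bulk endpoints can also be reproduced selectively in a script, for example starting only instances that are currently stopped. A sketch over the list endpoint's output (the `name`/`status` field names and lowercase status strings are assumptions; check the API Reference for the actual response schema):

```python
def instances_to_start(instances: list) -> list:
    """Given a list of instance objects (assumed shape:
    {"name": ..., "status": ...}), pick the names a selective
    start pass should act on."""
    return [i["name"] for i in instances if i.get("status") == "stopped"]
```

Each returned name can then be fed to the per-instance `/start` endpoint shown earlier.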
## Next Steps

- Learn about the Web UI interface
- Explore the complete API Reference
- Set up Monitoring for production use