# Managing Instances
This guide covers how to create, run, monitor, and troubleshoot llama.cpp instances with Llamactl.
## Instance Lifecycle
### Creating Instances
Instances can be created through the Web UI or the API:
#### Via Web UI
1. Click the "Add Instance" button
2. Fill in the configuration form
3. Click "Create"
#### Via API
```bash
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'
```
### Starting and Stopping
#### Start an Instance
```bash
# Via API (replace {name} with the instance name)
curl -X POST http://localhost:8080/api/instances/{name}/start
# The instance will begin loading the model
```
#### Stop an Instance
```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop
# Graceful shutdown with configurable timeout
```
### Monitoring Status
Check instance status in real time:
```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}
# Get health status
curl http://localhost:8080/api/instances/{name}/health
```
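When scripting against the health endpoint, it can help to poll until the instance is ready. The following is a minimal sketch, not part of Llamactl itself: the function name `wait_for_healthy` and the `LLAMACTL_URL` variable are hypothetical, and it assumes the health endpoint returns a non-2xx status while the instance is not yet ready.

```bash
#!/usr/bin/env bash
# Base URL taken from the examples in this guide; override if yours differs.
LLAMACTL_URL="${LLAMACTL_URL:-http://localhost:8080}"

# wait_for_healthy NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls the health endpoint until curl reports success or the timeout expires.
# Assumption: the endpoint answers with a non-2xx status until the instance is ready.
wait_for_healthy() {
  local name="$1" timeout="${2:-60}" interval="${3:-2}"
  local waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # -f makes curl exit non-zero on HTTP errors; -sS hides progress but keeps errors.
    if curl -fsS -o /dev/null "$LLAMACTL_URL/api/instances/$name/health"; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "instance '$name' not healthy after ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_healthy my-instance 120` would block for up to two minutes after a start request and return non-zero if the instance never reports healthy.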
## Instance States
Instances can be in one of several states:
- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error
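A script that reacts to these states might branch on them as sketched below. The function name `state_action` is hypothetical, and how the API actually reports the state (field name, capitalization) is an assumption, so the input is lower-cased before matching.

```bash
#!/usr/bin/env bash
# state_action STATE
# Prints a suggested next step for a state from the list above.
# Returns 2 for a state name that is not in the list.
state_action() {
  local state
  # Lower-case the input so "Running" and "running" are treated alike.
  state=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$state" in
    running)           echo "ready: send requests" ;;
    starting|stopping) echo "wait: transition in progress" ;;
    stopped)           echo "start: instance is not running" ;;
    error)             echo "inspect: instance encountered an error" ;;
    *)                 echo "unknown state: $1" >&2; return 2 ;;
  esac
}
```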
## Configuration Management
### Updating Instance Configuration
Modify instance settings with a `PUT` request:
```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'
```
!!! note
    Configuration changes require restarting the instance to take effect.
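Because a restart is required, the update and restart can be combined into one step. This is a sketch using only the endpoints shown in this guide; the function name `apply_config` and the `LLAMACTL_URL` variable are hypothetical, and it assumes the stop request returns only after the instance has shut down. If stopping is asynchronous in your setup, poll the instance status before starting it again.

```bash
#!/usr/bin/env bash
# Base URL taken from the examples in this guide; override if yours differs.
LLAMACTL_URL="${LLAMACTL_URL:-http://localhost:8080}"

# apply_config NAME JSON_FILE
# Sends the new options, then stops and starts the instance so they take effect.
# Aborts at the first request that fails.
apply_config() {
  local name="$1" json_file="$2"
  local base="$LLAMACTL_URL/api/instances/$name"
  curl -fsS -X PUT "$base" \
    -H "Content-Type: application/json" \
    --data-binary "@$json_file" || return 1
  curl -fsS -X POST "$base/stop" || return 1
  curl -fsS -X POST "$base/start" || return 1
}
```

Here `JSON_FILE` would contain a body like the one in the example above, for instance `{"options": {"threads": 8}}`.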
### Viewing Configuration
```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```
## Resource Management
### Memory Usage
Monitor memory consumption:
```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```
### CPU and GPU Usage
Track performance metrics:
- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times
## Troubleshooting Common Issues
### Instance Won't Start
1. **Check model path**: Ensure the model file exists and is readable
2. **Port conflicts**: Verify the port isn't already in use
3. **Resource limits**: Check available memory and CPU
4. **Permissions**: Ensure proper file system permissions
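The first two checks above can be automated before creating or starting an instance. The sketch below is illustrative only: the function names are hypothetical, and the port check relies on bash's `/dev/tcp` redirection, so it needs bash rather than plain `sh` and only detects listeners on `127.0.0.1`.

```bash
#!/usr/bin/env bash
# check_model_readable PATH: succeeds if the model file exists and can be read.
check_model_readable() {
  [ -f "$1" ] && [ -r "$1" ]
}

# port_in_use PORT: succeeds if something on localhost accepts a TCP connection.
# Uses bash's /dev/tcp pseudo-device; the subshell closes the socket on exit.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# preflight MODEL_PATH PORT
# Prints each problem found and returns non-zero if there was at least one.
preflight() {
  local model="$1" port="$2" status=0
  if ! check_model_readable "$model"; then
    echo "model file missing or unreadable: $model"
    status=1
  fi
  if port_in_use "$port"; then
    echo "port $port is already in use"
    status=1
  fi
  return "$status"
}
```

Running `preflight /path/to/model.gguf 8081` prints nothing and returns zero when both checks pass.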
### Performance Issues
1. **Adjust thread count**: Match `threads` to the number of CPU cores
2. **Optimize context size**: Balance memory usage and capability
3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
4. **Batch size tuning**: Optimize for your workload
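As a starting point for the thread count, the processor count can be read from the system and turned into an update body. This is a sketch with hypothetical function names; `getconf _NPROCESSORS_ONLN` is widely available on Linux and macOS but reports logical processors, which can exceed the number of physical cores on machines with SMT, so treat the result as an upper bound to tune from.

```bash
#!/usr/bin/env bash
# cpu_count: number of online logical processors.
cpu_count() {
  getconf _NPROCESSORS_ONLN
}

# threads_payload [THREADS]
# Prints a JSON body for the update endpoint, using the "threads" option
# shown earlier in this guide. Defaults to the logical processor count.
threads_payload() {
  local threads="${1:-$(cpu_count)}"
  printf '{"options": {"threads": %d}}\n' "$threads"
}
```

The output can be piped into the update request with `curl -X PUT ... -H "Content-Type: application/json" -d @-`.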
### Memory Problems
1. **Reduce context size**: Lower memory requirements
2. **Disable memory mapping**: Use the `no_mmap` option
3. **Enable memory locking**: Use `memory_lock` to keep the model in RAM and avoid swapping
4. **Monitor system resources**: Check available RAM
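A rough check of available RAM against the model file size can catch obvious mismatches before a start attempt. The sketch below is Linux-only (it reads `MemAvailable` from `/proc/meminfo`), uses hypothetical function names, and treats the file size as a lower bound: the context and other buffers need memory on top of the model weights.

```bash
#!/usr/bin/env bash
# mem_available_kib [MEMINFO_FILE]
# Prints the MemAvailable value (in KiB) from a Linux-style meminfo file.
mem_available_kib() {
  awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# model_fits_in_ram MODEL_PATH [MEMINFO_FILE]
# Succeeds if the model file is smaller than the currently available memory.
# Fails if the available memory cannot be determined.
model_fits_in_ram() {
  local model="$1" meminfo="${2:-/proc/meminfo}"
  local size_kib avail_kib
  size_kib=$(( $(wc -c < "$model") / 1024 ))
  avail_kib=$(mem_available_kib "$meminfo")
  [ -n "$avail_kib" ] && [ "$size_kib" -lt "$avail_kib" ]
}
```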
## Best Practices
### Production Deployments
1. **Resource allocation**: Plan memory and CPU requirements
2. **Health monitoring**: Set up regular health checks
3. **Graceful shutdowns**: Use proper stop procedures
4. **Backup configurations**: Save instance configurations
5. **Log management**: Configure appropriate logging levels
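Configuration backups can be scripted with the config endpoint shown earlier. This is a minimal sketch: the function name `backup_config`, the `LLAMACTL_URL` variable, and the one-file-per-instance layout are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Base URL taken from the examples in this guide; override if yours differs.
LLAMACTL_URL="${LLAMACTL_URL:-http://localhost:8080}"

# backup_config NAME DIR
# Saves the instance's current configuration to DIR/NAME.json.
backup_config() {
  local name="$1" dir="$2"
  mkdir -p "$dir" || return 1
  # Write to a temporary file first so a failed request never
  # overwrites an existing good backup.
  local tmp="$dir/$name.json.tmp"
  if curl -fsS "$LLAMACTL_URL/api/instances/$name/config" > "$tmp"; then
    mv "$tmp" "$dir/$name.json"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Calling `backup_config my-instance ./backups` from a scheduled job would keep the latest configuration of that instance on disk.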
### Development Environments
1. **Resource sharing**: Use smaller models for development
2. **Quick iterations**: Optimize for fast startup times
3. **Debug logging**: Enable detailed logging for troubleshooting
## Batch Operations
### Managing Multiple Instances
```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all
# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all
# Get status of all instances
curl http://localhost:8080/api/instances
```
## Next Steps
- Learn about the [Web UI](web-ui.md) interface
- Explore the complete [API Reference](api-reference.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use