Managing Instances
Learn how to effectively manage your Llama.cpp instances with Llamactl through both the Web UI and API.
Overview
Llamactl provides two ways to manage instances:
- Web UI: Accessible at http://localhost:8080 with an intuitive dashboard
- REST API: Programmatic access for automation and integration
Authentication
If authentication is enabled:
- Navigate to the web UI
- Enter your credentials
- Bearer token is stored for the session
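The same token-based scheme applies to API access: include the key as a Bearer token on every request. A minimal sketch, assuming an API key configured for your llamactl deployment (the key value below is a placeholder):
# Authenticated request against a documented endpoint (key value is a placeholder)
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/instances/{name}/logs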
Theme Support
- Switch between light and dark themes
- Setting is remembered across sessions
Instance Cards
Each instance is displayed as a card showing:
- Instance name
- Health status badge (unknown, ready, error, failed)
- Action buttons (start, stop, edit, logs, delete)
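The same information is available programmatically. As a sketch, assuming a list endpoint exists at /api/instances alongside the per-instance routes used throughout this page:
# List all instances with their status (list endpoint is assumed, not documented here)
curl http://localhost:8080/api/instances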
Create Instance
Via Web UI
- Click the "Create Instance" button on the dashboard
- Enter a unique Name for your instance (only required field)
- Configure model source (choose one):
- Model Path: Full path to your downloaded GGUF model file
- HuggingFace Repo: Repository name (e.g., unsloth/gemma-3-27b-it-GGUF)
- HuggingFace File: Specific file within the repo (optional, uses the default if not specified)
- Configure optional instance management settings:
- Auto Restart: Automatically restart instance on failure
- Max Restarts: Maximum number of restart attempts
- Restart Delay: Delay in seconds between restart attempts
- On Demand Start: Start the instance when a request arrives at the OpenAI-compatible endpoint
- Idle Timeout: Minutes before stopping idle instance (set to 0 to disable)
- Configure optional llama-server backend options:
- Threads: Number of CPU threads to use
- Context Size: Context window size (ctx_size)
- GPU Layers: Number of layers to offload to GPU
- Port: Network port (auto-assigned by llamactl if not specified)
- Additional Parameters: Any other llama-server command line options (see llama-server documentation)
- Click "Create" to save the instance
Via API
# Create instance with local model file
curl -X POST http://localhost:8080/api/instances/my-instance \
-H "Content-Type: application/json" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/model.gguf",
"threads": 8,
"ctx_size": 4096
}
}'
# Create instance with HuggingFace model
curl -X POST http://localhost:8080/api/instances/gemma-3-27b \
-H "Content-Type: application/json" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"hf_repo": "unsloth/gemma-3-27b-it-GGUF",
"hf_file": "gemma-3-27b-it-GGUF.gguf",
"gpu_layers": 32
},
"auto_restart": true,
"max_restarts": 3
}'
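The instance management settings from the Web UI can be supplied in the same request. The sketch below assumes the remaining settings use snake_case field names matching auto_restart and max_restarts above (restart_delay, on_demand_start, and idle_timeout are assumptions):
# Create an on-demand instance that stops after 30 idle minutes
# (restart_delay, on_demand_start, and idle_timeout field names are assumed)
curl -X POST http://localhost:8080/api/instances/on-demand-model \
  -H "Content-Type: application/json" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/path/to/model.gguf"
    },
    "auto_restart": true,
    "max_restarts": 3,
    "restart_delay": 5,
    "on_demand_start": true,
    "idle_timeout": 30
  }'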
Start Instance
Via Web UI
- Click the "Start" button on an instance card
- Watch the status change to "Unknown"
- Monitor progress in the logs
- Instance status changes to "Ready" when ready
Via API
curl -X POST http://localhost:8080/api/instances/{name}/start
Stop Instance
Via Web UI
- Click the "Stop" button on an instance card
- Instance gracefully shuts down
Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop
Edit Instance
Via Web UI
- Click the "Edit" button on an instance card
- Modify settings in the configuration dialog
- Changes require instance restart to take effect
- Click "Update & Restart" to apply changes
Via API
Modify instance settings:
curl -X PUT http://localhost:8080/api/instances/{name} \
-H "Content-Type: application/json" \
-d '{
"backend_options": {
"threads": 8,
"context_size": 4096
}
}'
!!! note
    Configuration changes require restarting the instance to take effect.
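Because changes only take effect after a restart, a stop/start cycle over the documented endpoints applies them:
# Restart the instance to apply the updated configuration
curl -X POST http://localhost:8080/api/instances/{name}/stop
curl -X POST http://localhost:8080/api/instances/{name}/start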
View Logs
Via Web UI
- Click the "Logs" button on any instance card
- Real-time log viewer opens
Via API
Monitor instance activity through the logs:
# Get instance logs
curl http://localhost:8080/api/instances/{name}/logs
Delete Instance
Via Web UI
- Click the "Delete" button on an instance card
- Only stopped instances can be deleted
- Confirm deletion in the dialog
Via API
curl -X DELETE http://localhost:8080/api/instances/{name}
Instance Proxy
Llamactl proxies all requests to the underlying llama-server instances.
# Send a request to the proxied llama-server instance
curl http://localhost:8080/api/instances/{name}/proxy/
See the llama-server documentation for the endpoints available through the proxy.
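For example, llama-server's OpenAI-compatible chat completions endpoint should be reachable by appending its path to the proxy route. A sketch (the sub-path belongs to llama-server, and forwarding of sub-paths through /proxy/ is assumed):
# Chat completion through the proxied llama-server instance
# (assumes sub-paths after /proxy/ are forwarded to llama-server)
curl -X POST http://localhost:8080/api/instances/{name}/proxy/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'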
Instance Health
Via Web UI
- The health status badge is displayed on each instance card
Via API
Check the health status of your instances:
curl http://localhost:8080/api/instances/{name}/proxy/health
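To block until an instance is ready after starting it, this endpoint can be polled. A minimal sketch, assuming it returns a successful HTTP status once the model has finished loading:
# Poll the health endpoint until the instance reports ready
until curl -sf http://localhost:8080/api/instances/my-instance/proxy/health > /dev/null; do
  echo "Waiting for my-instance to become ready..."
  sleep 2
done
echo "my-instance is ready"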

