mirror of
https://github.com/lordmathis/llamactl.git
synced 2025-11-06 00:54:23 +00:00
Update documentation: remove Web UI guide and adjust navigation links
@@ -138,6 +138,6 @@ curl http://localhost:8080/v1/models
 
 ## Next Steps
 
-- Learn more about the [Web UI](../user-guide/web-ui.md)
+- Manage instances with the [Managing Instances](../user-guide/managing-instances.md) guide
 - Explore the [API Reference](../user-guide/api-reference.md)
 - Configure advanced settings in the [Configuration](configuration.md) guide
@@ -37,7 +37,6 @@ Llamactl is designed to simplify the deployment and management of llama-server i
 - [Installation Guide](getting-started/installation.md) - Get Llamactl up and running
 - [Configuration Guide](getting-started/configuration.md) - Detailed configuration options
 - [Quick Start](getting-started/quick-start.md) - Your first steps with Llamactl
-- [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
 - [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
 - [API Reference](user-guide/api-reference.md) - Complete API documentation
 
@@ -1,73 +1,121 @@
 # Managing Instances
 
-Learn how to effectively manage your Llama.cpp instances with Llamactl.
+Learn how to effectively manage your Llama.cpp instances with Llamactl through both the Web UI and API.
 
-## Instance Lifecycle
+## Overview
 
-### Creating Instances
+Llamactl provides two ways to manage instances:
 
-Instances can be created through the Web UI or API:
+- **Web UI**: Accessible at `http://localhost:8080` with an intuitive dashboard
+- **REST API**: Programmatic access for automation and integration
 
-#### Via Web UI
+### Authentication
 
-1. Click "Add Instance" button
-2. Fill in the configuration form
-3. Click "Create"
+If authentication is enabled:
+
+1. Navigate to the web UI
+2. Enter your credentials
+3. Bearer token is stored for the session
+
+### Theme Support
+
+- Switch between light and dark themes
+- Setting is remembered across sessions
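The bearer-token flow above covers API calls as well; a minimal sketch of an authenticated request (the header follows standard bearer auth, and `LLAMACTL_KEY` is a placeholder — check your llamactl configuration for how keys are issued):

```shell
#!/bin/sh
# Placeholder key; substitute a real one from your llamactl config.
LLAMACTL_KEY="sk-example-not-a-real-key"

# Authenticated request to the instances list endpoint.
# Prints a fallback message when no server is listening locally.
curl -s -H "Authorization: Bearer $LLAMACTL_KEY" \
  http://localhost:8080/api/instances || echo "llamactl not reachable"
```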
+
+## Instance Cards
+
+Each instance is displayed as a card showing:
+
+- **Instance name**
+- **Health status badge** (unknown, ready, error, failed)
+- **Action buttons** (start, stop, edit, logs, delete)
+
+## Create Instance
+
+### Via Web UI
+
+1. Click the **"Add Instance"** button on the dashboard
+2. Enter a unique **Name** for your instance (only required field)
+3. Configure model source (choose one):
+   - **Model Path**: Full path to your downloaded GGUF model file
+   - **HuggingFace Repo**: Repository name (e.g., `microsoft/Phi-3-mini-4k-instruct-gguf`)
+   - **HuggingFace File**: Specific file within the repo (optional; uses the default if not specified)
+4. Configure optional instance management settings:
+   - **Auto Restart**: Automatically restart the instance on failure
+   - **Max Restarts**: Maximum number of restart attempts
+   - **Restart Delay**: Delay in seconds between restart attempts
+   - **On Demand Start**: Start the instance when a request arrives at the OpenAI-compatible endpoint
+   - **Idle Timeout**: Minutes before stopping an idle instance (set to 0 to disable)
+5. Configure optional llama-server backend options:
+   - **Threads**: Number of CPU threads to use
+   - **Context Size**: Context window size (ctx_size)
+   - **GPU Layers**: Number of layers to offload to GPU
+   - **Port**: Network port (auto-assigned by llamactl if not specified)
+   - **Additional Parameters**: Any other llama-server command-line options (see the [llama-server documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md))
+6. Click **"Create"** to save the instance
 
-#### Via API
+### Via API
 
 ```bash
-curl -X POST http://localhost:8080/api/instances \
+# Create instance with local model file
+curl -X POST http://localhost:8080/api/instances/my-instance \
   -H "Content-Type: application/json" \
   -d '{
-    "name": "my-instance",
-    "model_path": "/path/to/model.gguf",
-    "port": 8081
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "model": "/path/to/model.gguf",
+      "threads": 8,
+      "ctx_size": 4096
+    }
+  }'
+
+# Create instance with HuggingFace model
+curl -X POST http://localhost:8080/api/instances/phi3-mini \
+  -H "Content-Type: application/json" \
+  -d '{
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "hf_repo": "microsoft/Phi-3-mini-4k-instruct-gguf",
+      "hf_file": "Phi-3-mini-4k-instruct-q4.gguf",
+      "gpu_layers": 32
+    },
+    "auto_restart": true,
+    "max_restarts": 3
 }'
 ```
 
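The create request above can be scripted end to end; a minimal sketch that validates the JSON payload locally before POSTing it, so a typo fails fast instead of returning an error from the server (the instance name and model path are placeholders):

```shell
#!/bin/sh
# Build the create-instance payload; paths and values are placeholders.
PAYLOAD='{
  "backend_type": "llama_cpp",
  "backend_options": {
    "model": "/path/to/model.gguf",
    "threads": 8,
    "ctx_size": 4096
  }
}'

# Validate the JSON locally before sending it anywhere.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"

# POST it (assumes llamactl is listening on localhost:8080).
curl -s -X POST "http://localhost:8080/api/instances/my-instance" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "llamactl not reachable"
```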
-### Starting and Stopping
+## Start Instance
 
-#### Start an Instance
+### Via Web UI
+
+1. Click the **"Start"** button on an instance card
+2. Watch the status change to "Unknown"
+3. Monitor progress in the logs
+4. Instance status changes to "Ready" when ready
+
+### Via API
 
 ```bash
-# Via API
 curl -X POST http://localhost:8080/api/instances/{name}/start
-
-# The instance will begin loading the model
 ```
 
-#### Stop an Instance
+## Stop Instance
+
+### Via Web UI
+
+1. Click the **"Stop"** button on an instance card
+2. Instance gracefully shuts down
+
+### Via API
 
 ```bash
-# Via API
 curl -X POST http://localhost:8080/api/instances/{name}/stop
-
-# Graceful shutdown with configurable timeout
 ```
 
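The start call returns before the model finishes loading, so scripts usually poll until the instance is ready; a sketch using the health endpoint (the retry count, interval, and the `/proxy/health` path are assumptions to verify against your setup):

```shell
#!/bin/sh
# Sketch: start an instance and wait until its health check passes.
NAME="my-instance"
BASE="http://localhost:8080/api/instances/$NAME"

start_and_wait() {
    curl -s -X POST "$BASE/start" > /dev/null || return 1
    # Poll up to 60 times, 5 s apart; -f makes curl fail on HTTP errors.
    i=0
    while [ "$i" -lt 60 ]; do
        if curl -sf "$BASE/proxy/health" > /dev/null; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep 5
    done
    echo "timed out waiting for $NAME" >&2
    return 1
}
```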
-### Monitoring Status
+## Edit Instance
 
-Check instance status in real-time:
+### Via Web UI
 
-```bash
-# Get instance details
-curl http://localhost:8080/api/instances/{name}
-
-# Get health status
-curl http://localhost:8080/api/instances/{name}/health
-```
-
-## Instance States
-
-Instances can be in one of several states:
-
-- **Stopped**: Instance is not running
-- **Starting**: Instance is initializing and loading the model
-- **Running**: Instance is active and ready to serve requests
-- **Stopping**: Instance is shutting down gracefully
-- **Error**: Instance encountered an error
-
-## Configuration Management
-
-### Updating Instance Configuration
+1. Click the **"Edit"** button on an instance card
+2. Modify settings in the configuration dialog
+3. Changes require instance restart to take effect
+4. Click **"Update & Restart"** to apply changes
+
+### Via API
 
 Modify instance settings:
 
 ```bash
@@ -84,82 +132,55 @@ curl -X PUT http://localhost:8080/api/instances/{name} \
 !!! note
     Configuration changes require restarting the instance to take effect.
 
-### Viewing Configuration
+## View Logs
+
+### Via Web UI
+
+1. Click the **"Logs"** button on any instance card
+2. Real-time log viewer opens
+
+### Via API
+
+Check instance logs in real-time:
 
 ```bash
-# Get current configuration
-curl http://localhost:8080/api/instances/{name}/config
+# Get instance logs
+curl http://localhost:8080/api/instances/{name}/logs
 ```
 
-## Resource Management
+## Delete Instance
 
-### Memory Usage
+### Via Web UI
+
+1. Click the **"Delete"** button on an instance card
+2. Only stopped instances can be deleted
+3. Confirm deletion in the dialog
 
-Monitor memory consumption:
+### Via API
+
+```bash
+curl -X DELETE http://localhost:8080/api/instances/{name}
+```
+
+## Instance Proxy
+
+Llamactl proxies all requests to the underlying llama-server instances.
 
 ```bash
-# Get resource usage
-curl http://localhost:8080/api/instances/{name}/stats
+# Send a request to the instance through the proxy
+curl http://localhost:8080/api/instances/{name}/proxy/
 ```
 
-### CPU and GPU Usage
+Check the llama-server [docs](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for more information.
 
-Track performance metrics:
+### Instance Health
 
-- CPU thread utilization
-- GPU memory usage (if applicable)
-- Request processing times
+#### Via Web UI
 
-## Troubleshooting Common Issues
+1. The health status badge is displayed on each instance card
 
-### Instance Won't Start
+#### Via API
 
-1. **Check model path**: Ensure the model file exists and is readable
-2. **Port conflicts**: Verify the port isn't already in use
-3. **Resource limits**: Check available memory and CPU
-4. **Permissions**: Ensure proper file system permissions
+Check the health status of your instances:
 
-### Performance Issues
-
-1. **Adjust thread count**: Match to your CPU cores
-2. **Optimize context size**: Balance memory usage and capability
-3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
-4. **Batch size tuning**: Optimize for your workload
-
-### Memory Problems
-
-1. **Reduce context size**: Lower memory requirements
-2. **Disable memory mapping**: Use `no_mmap` option
-3. **Enable memory locking**: Use `memory_lock` for performance
-4. **Monitor system resources**: Check available RAM
-
-## Best Practices
-
-### Production Deployments
-
-1. **Resource allocation**: Plan memory and CPU requirements
-2. **Health monitoring**: Set up regular health checks
-3. **Graceful shutdowns**: Use proper stop procedures
-4. **Backup configurations**: Save instance configurations
-5. **Log management**: Configure appropriate logging levels
-
-### Development Environments
-
-1. **Resource sharing**: Use smaller models for development
-2. **Quick iterations**: Optimize for fast startup times
-3. **Debug logging**: Enable detailed logging for troubleshooting
-
-## Batch Operations
-
-### Managing Multiple Instances
 
 ```bash
-# Start all instances
-curl -X POST http://localhost:8080/api/instances/start-all
-
-# Stop all instances
-curl -X POST http://localhost:8080/api/instances/stop-all
-
-# Get status of all instances
-curl http://localhost:8080/api/instances
+curl http://localhost:8080/api/instances/{name}/proxy/health
 ```
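The per-instance endpoints above compose into simple batch scripts; a sketch that stops every listed instance (it assumes `GET /api/instances` returns a JSON array of objects with a `name` field — verify the response shape against your llamactl version):

```shell
#!/bin/sh
# Sketch: stop every instance before maintenance.
BASE="http://localhost:8080/api/instances"

stop_all() {
    # List instances, extract names (assumed response shape), stop each.
    curl -s "$BASE" |
        python3 -c 'import json,sys; [print(i["name"]) for i in json.load(sys.stdin)]' |
        while read -r name; do
            curl -s -X POST "$BASE/$name/stop" > /dev/null && echo "stopped $name"
        done
}
```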
@@ -1,210 +0,0 @@
-# Web UI Guide
-
-The Llamactl Web UI provides an intuitive interface for managing your Llama.cpp instances.
-
-## Overview
-
-The web interface is accessible at `http://localhost:8080` (or your configured host/port) and provides:
-
-- Instance management dashboard
-- Real-time status monitoring
-- Configuration management
-- Log viewing
-- System information
-
-## Dashboard
-
-### Instance Cards
-
-Each instance is displayed as a card showing:
-
-- **Instance name** and status indicator
-- **Model information** (name, size)
-- **Current state** (stopped, starting, running, error)
-- **Resource usage** (memory, CPU)
-- **Action buttons** (start, stop, configure, logs)
-
-### Status Indicators
-
-- 🟢 **Green**: Instance is running and healthy
-- 🟡 **Yellow**: Instance is starting or stopping
-- 🔴 **Red**: Instance has encountered an error
-- ⚪ **Gray**: Instance is stopped
-
-## Creating Instances
-
-### Add Instance Dialog
-
-1. Click the **"Add Instance"** button
-2. Fill in the required fields:
-   - **Name**: Unique identifier for your instance
-   - **Model Path**: Full path to your GGUF model file
-   - **Port**: Port number for the instance
-
-3. Configure optional settings:
-   - **Threads**: Number of CPU threads
-   - **Context Size**: Context window size
-   - **GPU Layers**: Layers to offload to GPU
-   - **Additional Options**: Advanced Llama.cpp parameters
-
-4. Click **"Create"** to save the instance
-
-### Model Path Helper
-
-Use the file browser to select model files:
-
-- Navigate to your models directory
-- Select the `.gguf` file
-- Path is automatically filled in the form
-
-## Managing Instances
-
-### Starting Instances
-
-1. Click the **"Start"** button on an instance card
-2. Watch the status change to "Starting"
-3. Monitor progress in the logs
-4. Instance becomes "Running" when ready
-
-### Stopping Instances
-
-1. Click the **"Stop"** button
-2. Instance gracefully shuts down
-3. Status changes to "Stopped"
-
-### Viewing Logs
-
-1. Click the **"Logs"** button on any instance
-2. Real-time log viewer opens
-3. Filter by log level (Debug, Info, Warning, Error)
-4. Search through log entries
-5. Download logs for offline analysis
-
-## Configuration Management
-
-### Editing Instance Settings
-
-1. Click the **"Configure"** button
-2. Modify settings in the configuration dialog
-3. Changes require instance restart to take effect
-4. Click **"Save"** to apply changes
-
-### Advanced Options
-
-Access advanced Llama.cpp options:
-
-```yaml
-# Example advanced configuration
-options:
-  rope_freq_base: 10000
-  rope_freq_scale: 1.0
-  yarn_ext_factor: -1.0
-  yarn_attn_factor: 1.0
-  yarn_beta_fast: 32.0
-  yarn_beta_slow: 1.0
-```
-
-## System Information
-
-### Health Dashboard
-
-Monitor overall system health:
-
-- **System Resources**: CPU, memory, disk usage
-- **Instance Summary**: Running/stopped instance counts
-- **Performance Metrics**: Request rates, response times
-
-### Resource Usage
-
-Track resource consumption:
-
-- Per-instance memory usage
-- CPU utilization
-- GPU memory (if applicable)
-- Network I/O
-
-## User Interface Features
-
-### Theme Support
-
-Switch between light and dark themes:
-
-1. Click the theme toggle button
-2. Setting is remembered across sessions
-
-### Responsive Design
-
-The UI adapts to different screen sizes:
-
-- **Desktop**: Full-featured dashboard
-- **Tablet**: Condensed layout
-- **Mobile**: Stack-based navigation
-
-### Keyboard Shortcuts
-
-- `Ctrl+N`: Create new instance
-- `Ctrl+R`: Refresh dashboard
-- `Ctrl+L`: Open logs for selected instance
-- `Esc`: Close dialogs
-
-## Authentication
-
-### Login
-
-If authentication is enabled:
-
-1. Navigate to the web UI
-2. Enter your credentials
-3. JWT token is stored for the session
-4. Automatic logout on token expiry
-
-### Session Management
-
-- Sessions persist across browser restarts
-- Logout clears authentication tokens
-- Configurable session timeout
-
-## Troubleshooting
-
-### Common UI Issues
-
-**Page won't load:**
-
-- Check if Llamactl server is running
-- Verify the correct URL and port
-- Check browser console for errors
-
-**Instance won't start from UI:**
-
-- Verify model path is correct
-- Check for port conflicts
-- Review instance logs for errors
-
-**Real-time updates not working:**
-
-- Check WebSocket connection
-- Verify firewall settings
-- Try refreshing the page
-
-### Browser Compatibility
-
-Supported browsers:
-
-- Chrome/Chromium 90+
-- Firefox 88+
-- Safari 14+
-- Edge 90+
-
-## Mobile Access
-
-### Responsive Features
-
-On mobile devices:
-
-- Touch-friendly interface
-- Swipe gestures for navigation
-- Optimized button sizes
-- Condensed information display
-
-### Limitations
-
-Some features may be limited on mobile:
-
-- Log viewing (use horizontal scrolling)
-- Complex configuration forms
-- File browser functionality
@@ -55,7 +55,6 @@ nav:
     - Configuration: getting-started/configuration.md
   - User Guide:
     - Managing Instances: user-guide/managing-instances.md
-    - Web UI: user-guide/web-ui.md
    - API Reference: user-guide/api-reference.md
    - Troubleshooting: user-guide/troubleshooting.md
 