diff --git a/docs/getting-started/quick-start.md b/docs/getting-started/quick-start.md
index 6ea5720..4de1065 100644
--- a/docs/getting-started/quick-start.md
+++ b/docs/getting-started/quick-start.md
@@ -138,6 +138,6 @@ curl http://localhost:8080/v1/models
 ## Next Steps
 
-- Learn more about the [Web UI](../user-guide/web-ui.md)
+- Learn how to manage instances in the [Managing Instances](../user-guide/managing-instances.md) guide
 - Explore the [API Reference](../user-guide/api-reference.md)
 - Configure advanced settings in the [Configuration](configuration.md) guide
diff --git a/docs/index.md b/docs/index.md
index 0637fdc..8dc6b1c 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -37,7 +37,6 @@ Llamactl is designed to simplify the deployment and management of llama-server i
 - [Installation Guide](getting-started/installation.md) - Get Llamactl up and running
 - [Configuration Guide](getting-started/configuration.md) - Detailed configuration options
 - [Quick Start](getting-started/quick-start.md) - Your first steps with Llamactl
-- [Web UI Guide](user-guide/web-ui.md) - Learn to use the web interface
 - [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
 - [API Reference](user-guide/api-reference.md) - Complete API documentation
diff --git a/docs/user-guide/managing-instances.md b/docs/user-guide/managing-instances.md
index 14bbd71..9d9e4dc 100644
--- a/docs/user-guide/managing-instances.md
+++ b/docs/user-guide/managing-instances.md
@@ -1,73 +1,121 @@
 # Managing Instances
 
-Learn how to effectively manage your Llama.cpp instances with Llamactl.
+Learn how to effectively manage your Llama.cpp instances with Llamactl through both the Web UI and the API.
 
-## Instance Lifecycle
+## Overview
 
-### Creating Instances
+Llamactl provides two ways to manage instances:
 
-Instances can be created through the Web UI or API:
+- **Web UI**: An intuitive dashboard accessible at `http://localhost:8080`
+- **REST API**: Programmatic access for automation and integration
 
-#### Via Web UI
-1. Click "Add Instance" button
-2. Fill in the configuration form
-3. Click "Create"
+### Authentication
+
+If authentication is enabled:
+
+1. Navigate to the web UI
+2. Enter your credentials
+3. The bearer token is stored for the session
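+
+API requests need the same token. A minimal sketch (the key value is a placeholder and the instance list endpoint is shown for illustration; see the [API Reference](api-reference.md) for details):
+
+```bash
+# Replace YOUR_API_KEY with your configured management key
+curl -H "Authorization: Bearer YOUR_API_KEY" \
+  http://localhost:8080/api/instances
+```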
+
+### Theme Support
+
+- Switch between light and dark themes
+- The setting is remembered across sessions
+
+## Instance Cards
+
+Each instance is displayed as a card showing:
+
+- **Instance name**
+- **Health status badge** (unknown, ready, error, failed)
+- **Action buttons** (start, stop, edit, logs, delete)
+
+## Create Instance
+
+### Via Web UI
+
+1. Click the **"Add Instance"** button on the dashboard
+2. Enter a unique **Name** for your instance (the only required field)
+3. Configure the model source (choose one):
+   - **Model Path**: Full path to your downloaded GGUF model file
+   - **HuggingFace Repo**: Repository name (e.g., `microsoft/Phi-3-mini-4k-instruct-gguf`)
+   - **HuggingFace File**: Specific file within the repo (optional; uses the default if not specified)
+4. Configure optional instance management settings:
+   - **Auto Restart**: Automatically restart the instance on failure
+   - **Max Restarts**: Maximum number of restart attempts
+   - **Restart Delay**: Delay in seconds between restart attempts
+   - **On Demand Start**: Start the instance when a request arrives at the OpenAI-compatible endpoint
+   - **Idle Timeout**: Minutes before stopping an idle instance (set to 0 to disable)
+5. Configure optional llama-server backend options:
+   - **Threads**: Number of CPU threads to use
+   - **Context Size**: Context window size (ctx_size)
+   - **GPU Layers**: Number of layers to offload to GPU
+   - **Port**: Network port (auto-assigned by llamactl if not specified)
+   - **Additional Parameters**: Any other llama-server command line options (see the [llama-server documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md))
+6. Click **"Create"** to save the instance
+
+### Via API
 
-#### Via API
 ```bash
-curl -X POST http://localhost:8080/api/instances \
+# Create instance with local model file
+curl -X POST http://localhost:8080/api/instances/my-instance \
   -H "Content-Type: application/json" \
   -d '{
-    "name": "my-instance",
-    "model_path": "/path/to/model.gguf",
-    "port": 8081
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "model": "/path/to/model.gguf",
+      "threads": 8,
+      "ctx_size": 4096
+    }
+  }'
+
+# Create instance with HuggingFace model
+curl -X POST http://localhost:8080/api/instances/phi3-mini \
  -H "Content-Type: application/json" \
  -d '{
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "hf_repo": "microsoft/Phi-3-mini-4k-instruct-gguf",
+      "hf_file": "Phi-3-mini-4k-instruct-q4.gguf",
+      "gpu_layers": 32
+    },
+    "auto_restart": true,
+    "max_restarts": 3
   }'
 ```
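+
+The instance management settings from the Web UI can be combined with backend options in the same request. A sketch, assuming the on-demand and idle-timeout fields follow the Web UI labels (check the [API Reference](api-reference.md) for the exact field names):
+
+```bash
+# Create an instance that starts on demand and stops when idle
+# ("on_demand_start" and "idle_timeout" are assumed field names)
+curl -X POST http://localhost:8080/api/instances/on-demand-model \
+  -H "Content-Type: application/json" \
+  -d '{
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "model": "/path/to/model.gguf"
+    },
+    "on_demand_start": true,
+    "idle_timeout": 30
+  }'
+```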
 
-### Starting and Stopping
+## Start Instance
 
-#### Start an Instance
+### Via Web UI
+
+1. Click the **"Start"** button on an instance card
+2. The health badge shows "Unknown" while the instance starts and the model loads
+3. Monitor progress in the logs
+4. The badge changes to "Ready" once the instance can serve requests
+
+### Via API
 ```bash
-# Via API
 curl -X POST http://localhost:8080/api/instances/{name}/start
-
-# The instance will begin loading the model
 ```
 
-#### Stop an Instance
+## Stop Instance
+
+### Via Web UI
+
+1. Click the **"Stop"** button on an instance card
+2. The instance shuts down gracefully
+
+### Via API
 ```bash
-# Via API
 curl -X POST http://localhost:8080/api/instances/{name}/stop
-
-# Graceful shutdown with configurable timeout
 ```
 
-### Monitoring Status
+## Edit Instance
 
-Check instance status in real-time:
-
-```bash
-# Get instance details
-curl http://localhost:8080/api/instances/{name}
-
-# Get health status
-curl http://localhost:8080/api/instances/{name}/health
-```
-
-## Instance States
-
-Instances can be in one of several states:
-
-- **Stopped**: Instance is not running
-- **Starting**: Instance is initializing and loading the model
-- **Running**: Instance is active and ready to serve requests
-- **Stopping**: Instance is shutting down gracefully
-- **Error**: Instance encountered an error
-
-## Configuration Management
-
-### Updating Instance Configuration
+### Via Web UI
+
+1. Click the **"Edit"** button on an instance card
+2. Modify settings in the configuration dialog
+3. Changes require an instance restart to take effect
+4. Click **"Update & Restart"** to apply the changes
 
+### Via API
 Modify instance settings:
 
 ```bash
@@ -84,82 +132,55 @@ curl -X PUT http://localhost:8080/api/instances/{name} \
 !!! note
     Configuration changes require restarting the instance to take effect.
 
-### Viewing Configuration
+
+## View Logs
+
+### Via Web UI
+
+1. Click the **"Logs"** button on any instance card
+2. A real-time log viewer opens
+
+### Via API
+
+Retrieve instance logs:
 
 ```bash
-# Get current configuration
-curl http://localhost:8080/api/instances/{name}/config
+# Get instance logs
+curl http://localhost:8080/api/instances/{name}/logs
 ```
 
-## Resource Management
+## Delete Instance
 
-### Memory Usage
+### Via Web UI
+
+1. Stop the instance first (only stopped instances can be deleted)
+2. Click the **"Delete"** button on an instance card
+3. Confirm the deletion in the dialog
 
-Monitor memory consumption:
+### Via API
+```bash
+curl -X DELETE http://localhost:8080/api/instances/{name}
+```
+
+## Instance Proxy
+
+Llamactl proxies all requests to the underlying llama-server instances.
 
 ```bash
-# Get resource usage
-curl http://localhost:8080/api/instances/{name}/stats
+# Access the proxied llama-server API
+curl http://localhost:8080/api/instances/{name}/proxy/
 ```
 
-### CPU and GPU Usage
+See the llama-server [docs](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for more information.
 
-Track performance metrics:
+### Instance Health
 
-- CPU thread utilization
-- GPU memory usage (if applicable)
-- Request processing times
+#### Via Web UI
 
-## Troubleshooting Common Issues
+The health status badge on each instance card shows the current health state.
 
-### Instance Won't Start
+#### Via API
 
-1. **Check model path**: Ensure the model file exists and is readable
-2. **Port conflicts**: Verify the port isn't already in use
-3. **Resource limits**: Check available memory and CPU
-4. **Permissions**: Ensure proper file system permissions
-
-### Performance Issues
-
-1. **Adjust thread count**: Match to your CPU cores
-2. **Optimize context size**: Balance memory usage and capability
-3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
-4. **Batch size tuning**: Optimize for your workload
-
-### Memory Problems
-
-1. **Reduce context size**: Lower memory requirements
-2. **Disable memory mapping**: Use `no_mmap` option
-3. **Enable memory locking**: Use `memory_lock` for performance
-4. **Monitor system resources**: Check available RAM
-
-## Best Practices
-
-### Production Deployments
-
-1. **Resource allocation**: Plan memory and CPU requirements
-2. **Health monitoring**: Set up regular health checks
-3. **Graceful shutdowns**: Use proper stop procedures
-4. **Backup configurations**: Save instance configurations
-5. **Log management**: Configure appropriate logging levels
-
-### Development Environments
-
-1. **Resource sharing**: Use smaller models for development
-2. **Quick iterations**: Optimize for fast startup times
-3. **Debug logging**: Enable detailed logging for troubleshooting
-
-## Batch Operations
-
-### Managing Multiple Instances
+Check the health status of your instances:
 
 ```bash
-# Start all instances
-curl -X POST http://localhost:8080/api/instances/start-all
-
-# Stop all instances
-curl -X POST http://localhost:8080/api/instances/stop-all
-
-# Get status of all instances
-curl http://localhost:8080/api/instances
+curl http://localhost:8080/api/instances/{name}/proxy/health
 ```
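+
+Running instances can also be reached through Llamactl's OpenAI-compatible endpoint, the same endpoint referenced by the **On Demand Start** setting. A sketch, assuming the instance name is used as the model identifier and the standard chat completions route is exposed (see the [API Reference](api-reference.md)):
+
+```bash
+# Send a chat request to a running (or on-demand) instance
+# ("my-instance" is a placeholder for your instance name)
+curl -X POST http://localhost:8080/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "my-instance",
+    "messages": [{"role": "user", "content": "Hello!"}]
+  }'
+```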
diff --git a/docs/user-guide/web-ui.md b/docs/user-guide/web-ui.md
deleted file mode 100644
index 6a3c4c1..0000000
--- a/docs/user-guide/web-ui.md
+++ /dev/null
@@ -1,210 +0,0 @@
-# Web UI Guide
-
-The Llamactl Web UI provides an intuitive interface for managing your Llama.cpp instances.
-
-## Overview
-
-The web interface is accessible at `http://localhost:8080` (or your configured host/port) and provides:
-
-- Instance management dashboard
-- Real-time status monitoring
-- Configuration management
-- Log viewing
-- System information
-
-## Dashboard
-
-### Instance Cards
-
-Each instance is displayed as a card showing:
-
-- **Instance name** and status indicator
-- **Model information** (name, size)
-- **Current state** (stopped, starting, running, error)
-- **Resource usage** (memory, CPU)
-- **Action buttons** (start, stop, configure, logs)
-
-### Status Indicators
-
-- 🟢 **Green**: Instance is running and healthy
-- 🟡 **Yellow**: Instance is starting or stopping
-- 🔴 **Red**: Instance has encountered an error
-- ⚪ **Gray**: Instance is stopped
-
-## Creating Instances
-
-### Add Instance Dialog
-
-1. Click the **"Add Instance"** button
-2. Fill in the required fields:
-   - **Name**: Unique identifier for your instance
-   - **Model Path**: Full path to your GGUF model file
-   - **Port**: Port number for the instance
-
-3. Configure optional settings:
-   - **Threads**: Number of CPU threads
-   - **Context Size**: Context window size
-   - **GPU Layers**: Layers to offload to GPU
-   - **Additional Options**: Advanced Llama.cpp parameters
-
-4. Click **"Create"** to save the instance
-
-### Model Path Helper
-
-Use the file browser to select model files:
-
-- Navigate to your models directory
-- Select the `.gguf` file
-- Path is automatically filled in the form
-
-## Managing Instances
-
-### Starting Instances
-
-1. Click the **"Start"** button on an instance card
-2. Watch the status change to "Starting"
-3. Monitor progress in the logs
-4. Instance becomes "Running" when ready
-
-### Stopping Instances
-
-1. Click the **"Stop"** button
-2. Instance gracefully shuts down
-3. Status changes to "Stopped"
-
-### Viewing Logs
-
-1. Click the **"Logs"** button on any instance
-2. Real-time log viewer opens
-3. Filter by log level (Debug, Info, Warning, Error)
-4. Search through log entries
-5. Download logs for offline analysis
-
-## Configuration Management
-
-### Editing Instance Settings
-
-1. Click the **"Configure"** button
-2. Modify settings in the configuration dialog
-3. Changes require instance restart to take effect
-4. Click **"Save"** to apply changes
-
-### Advanced Options
-
-Access advanced Llama.cpp options:
-
-```yaml
-# Example advanced configuration
-options:
-  rope_freq_base: 10000
-  rope_freq_scale: 1.0
-  yarn_ext_factor: -1.0
-  yarn_attn_factor: 1.0
-  yarn_beta_fast: 32.0
-  yarn_beta_slow: 1.0
-```
-
-## System Information
-
-### Health Dashboard
-
-Monitor overall system health:
-
-- **System Resources**: CPU, memory, disk usage
-- **Instance Summary**: Running/stopped instance counts
-- **Performance Metrics**: Request rates, response times
-
-### Resource Usage
-
-Track resource consumption:
-
-- Per-instance memory usage
-- CPU utilization
-- GPU memory (if applicable)
-- Network I/O
-
-## User Interface Features
-
-### Theme Support
-
-Switch between light and dark themes:
-
-1. Click the theme toggle button
-2. Setting is remembered across sessions
-
-### Responsive Design
-
-The UI adapts to different screen sizes:
-
-- **Desktop**: Full-featured dashboard
-- **Tablet**: Condensed layout
-- **Mobile**: Stack-based navigation
-
-### Keyboard Shortcuts
-
-- `Ctrl+N`: Create new instance
-- `Ctrl+R`: Refresh dashboard
-- `Ctrl+L`: Open logs for selected instance
-- `Esc`: Close dialogs
-
-## Authentication
-
-### Login
-
-If authentication is enabled:
-
-1. Navigate to the web UI
-2. Enter your credentials
-3. JWT token is stored for the session
-4. Automatic logout on token expiry
-
-### Session Management
-
-- Sessions persist across browser restarts
-- Logout clears authentication tokens
-- Configurable session timeout
-
-## Troubleshooting
-
-### Common UI Issues
-
-**Page won't load:**
-- Check if Llamactl server is running
-- Verify the correct URL and port
-- Check browser console for errors
-
-**Instance won't start from UI:**
-- Verify model path is correct
-- Check for port conflicts
-- Review instance logs for errors
-
-**Real-time updates not working:**
-- Check WebSocket connection
-- Verify firewall settings
-- Try refreshing the page
-
-### Browser Compatibility
-
-Supported browsers:
-- Chrome/Chromium 90+
-- Firefox 88+
-- Safari 14+
-- Edge 90+
-
-## Mobile Access
-
-### Responsive Features
-
-On mobile devices:
-
-- Touch-friendly interface
-- Swipe gestures for navigation
-- Optimized button sizes
-- Condensed information display
-
-### Limitations
-
-Some features may be limited on mobile:
-- Log viewing (use horizontal scrolling)
-- Complex configuration forms
-- File browser functionality
diff --git a/mkdocs.yml b/mkdocs.yml
index f9fbe3d..ed4be3a 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -55,7 +55,6 @@ nav:
   - Configuration: getting-started/configuration.md
   - User Guide:
     - Managing Instances: user-guide/managing-instances.md
-    - Web UI: user-guide/web-ui.md
     - API Reference: user-guide/api-reference.md
     - Troubleshooting: user-guide/troubleshooting.md