Create initial documentation structure

2025-08-31 14:27:00 +02:00
parent 7675271370
commit bd31c03f4a
16 changed files with 3514 additions and 0 deletions

docs/user-guide/api-reference.md

@@ -0,0 +1,470 @@
# API Reference
Complete reference for the LlamaCtl REST API.
## Base URL
All API endpoints are relative to the base URL:
```
http://localhost:8080/api
```
## Authentication
If authentication is enabled, include the JWT token in the Authorization header:
```bash
curl -H "Authorization: Bearer <your-jwt-token>" \
  http://localhost:8080/api/instances
```
## Instances
### List All Instances
Get a list of all instances.
```http
GET /api/instances
```
**Response:**
```json
{
"instances": [
{
"name": "llama2-7b",
"status": "running",
"model_path": "/models/llama-2-7b.gguf",
"port": 8081,
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T12:45:00Z"
}
]
}
```
### Get Instance Details
Get detailed information about a specific instance.
```http
GET /api/instances/{name}
```
**Response:**
```json
{
"name": "llama2-7b",
"status": "running",
"model_path": "/models/llama-2-7b.gguf",
"port": 8081,
"pid": 12345,
"options": {
"threads": 4,
"context_size": 2048,
"gpu_layers": 0
},
"stats": {
"memory_usage": 4294967296,
"cpu_usage": 25.5,
"uptime": 3600
},
"created_at": "2024-01-15T10:30:00Z",
"updated_at": "2024-01-15T12:45:00Z"
}
```
### Create Instance
Create a new instance.
```http
POST /api/instances
```
**Request Body:**
```json
{
"name": "my-instance",
"model_path": "/path/to/model.gguf",
"port": 8081,
"options": {
"threads": 4,
"context_size": 2048,
"gpu_layers": 0
}
}
```
**Response:**
```json
{
"message": "Instance created successfully",
"instance": {
"name": "my-instance",
"status": "stopped",
"model_path": "/path/to/model.gguf",
"port": 8081,
"created_at": "2024-01-15T14:30:00Z"
}
}
```
### Update Instance
Update an existing instance configuration.
```http
PUT /api/instances/{name}
```
**Request Body:**
```json
{
"options": {
"threads": 8,
"context_size": 4096
}
}
```
### Delete Instance
Delete an instance (must be stopped first).
```http
DELETE /api/instances/{name}
```
**Response:**
```json
{
"message": "Instance deleted successfully"
}
```
## Instance Operations
### Start Instance
Start a stopped instance.
```http
POST /api/instances/{name}/start
```
**Response:**
```json
{
"message": "Instance start initiated",
"status": "starting"
}
```
### Stop Instance
Stop a running instance.
```http
POST /api/instances/{name}/stop
```
**Request Body (Optional):**
```json
{
"force": false,
"timeout": 30
}
```
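For example, to request a forced stop with a shorter timeout (the values here are illustrative):
```bash
curl -X POST http://localhost:8080/api/instances/llama2-7b/stop \
  -H "Content-Type: application/json" \
  -d '{"force": true, "timeout": 10}'
```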
**Response:**
```json
{
"message": "Instance stop initiated",
"status": "stopping"
}
```
### Restart Instance
Restart an instance (stop then start).
```http
POST /api/instances/{name}/restart
```
### Get Instance Health
Check instance health status.
```http
GET /api/instances/{name}/health
```
**Response:**
```json
{
"status": "healthy",
"checks": {
"process": "running",
"port": "open",
"response": "ok"
},
"last_check": "2024-01-15T14:30:00Z"
}
```
### Get Instance Logs
Retrieve instance logs.
```http
GET /api/instances/{name}/logs
```
**Query Parameters:**
- `lines`: Number of lines to return (default: 100)
- `follow`: Stream logs (boolean)
- `level`: Filter by log level (debug, info, warn, error)
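For example, to fetch the last 50 error-level entries (the instance name is illustrative):
```bash
curl "http://localhost:8080/api/instances/llama2-7b/logs?lines=50&level=error"
```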
**Response:**
```json
{
"logs": [
{
"timestamp": "2024-01-15T14:30:00Z",
"level": "info",
"message": "Model loaded successfully"
}
]
}
```
## Batch Operations
### Start All Instances
Start all stopped instances.
```http
POST /api/instances/start-all
```
### Stop All Instances
Stop all running instances.
```http
POST /api/instances/stop-all
```
## System Information
### Get System Status
Get overall system status and metrics.
```http
GET /api/system/status
```
**Response:**
```json
{
"version": "1.0.0",
"uptime": 86400,
"instances": {
"total": 5,
"running": 3,
"stopped": 2
},
"resources": {
"cpu_usage": 45.2,
"memory_usage": 8589934592,
"memory_total": 17179869184,
"disk_usage": 75.5
}
}
```
### Get System Information
Get detailed system information.
```http
GET /api/system/info
```
**Response:**
```json
{
"hostname": "server-01",
"os": "linux",
"arch": "amd64",
"cpu_count": 8,
"memory_total": 17179869184,
"version": "1.0.0",
"build_time": "2024-01-15T10:00:00Z"
}
```
## Configuration
### Get Configuration
Get current LlamaCtl configuration.
```http
GET /api/config
```
### Update Configuration
Update LlamaCtl configuration (requires restart).
```http
PUT /api/config
```
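A minimal sketch of both calls; the configuration schema isn't reproduced here, so the update payload is read from a placeholder file:
```bash
# Fetch the current configuration
curl http://localhost:8080/api/config

# Apply an updated configuration (config.json is a placeholder file)
curl -X PUT http://localhost:8080/api/config \
  -H "Content-Type: application/json" \
  -d @config.json
```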
## Authentication Endpoints
### Login
Authenticate and receive a JWT token.
```http
POST /api/auth/login
```
**Request Body:**
```json
{
"username": "admin",
"password": "password"
}
```
**Response:**
```json
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expires_at": "2024-01-16T14:30:00Z"
}
```
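A typical login call, capturing the token for subsequent requests (assumes `jq` is installed):
```bash
TOKEN=$(curl -s -X POST http://localhost:8080/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"username": "admin", "password": "password"}' | jq -r '.token')
```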
### Refresh Token
Refresh an existing JWT token.
```http
POST /api/auth/refresh
```
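The request presumably carries the current token in the Authorization header, as with other authenticated endpoints:
```bash
curl -X POST http://localhost:8080/api/auth/refresh \
  -H "Authorization: Bearer $TOKEN"
```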
## Error Responses
All endpoints may return error responses in the following format:
```json
{
"error": "Error message",
"code": "ERROR_CODE",
"details": "Additional error details"
}
```
### Common HTTP Status Codes
- `200`: Success
- `201`: Created
- `400`: Bad Request
- `401`: Unauthorized
- `403`: Forbidden
- `404`: Not Found
- `409`: Conflict (e.g., instance already exists)
- `500`: Internal Server Error
## WebSocket API
### Real-time Updates
Connect to WebSocket for real-time updates:
```javascript
const ws = new WebSocket('ws://localhost:8080/api/ws');

ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log('Update:', data);
};
```
**Message Types:**
- `instance_status_changed`: Instance status updates
- `instance_stats_updated`: Resource usage updates
- `system_alert`: System-level alerts
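The exact message schema isn't specified here, but a hypothetical `instance_status_changed` payload might look like:
```json
{
  "type": "instance_status_changed",
  "instance": "llama2-7b",
  "status": "running"
}
```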
## Rate Limiting
API requests are rate limited to:
- **100 requests per minute** for regular endpoints
- **10 requests per minute** for resource-intensive operations
Rate limit headers are included in responses:
- `X-RateLimit-Limit`: Request limit
- `X-RateLimit-Remaining`: Remaining requests
- `X-RateLimit-Reset`: Reset time (Unix timestamp)
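You can inspect these headers on any response with `curl -i`:
```bash
curl -si http://localhost:8080/api/instances | grep -i '^x-ratelimit'
```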
## SDKs and Libraries
### Go Client
```go
import "github.com/lordmathis/llamactl-go-client"
client := llamactl.NewClient("http://localhost:8080")
instances, err := client.ListInstances()
```
### Python Client
```python
from llamactl import Client

client = Client("http://localhost:8080")
instances = client.list_instances()
```
## Examples
### Complete Instance Lifecycle
```bash
# Create instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "example",
    "model_path": "/models/example.gguf",
    "port": 8081
  }'

# Start instance
curl -X POST http://localhost:8080/api/instances/example/start

# Check status
curl http://localhost:8080/api/instances/example

# Stop instance
curl -X POST http://localhost:8080/api/instances/example/stop

# Delete instance
curl -X DELETE http://localhost:8080/api/instances/example
```
## Next Steps
- Learn about [Managing Instances](managing-instances.md) in detail
- Explore [Advanced Configuration](../advanced/backends.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use

docs/user-guide/managing-instances.md

@@ -0,0 +1,171 @@
# Managing Instances
Learn how to effectively manage your Llama.cpp instances with LlamaCtl.
## Instance Lifecycle
### Creating Instances
Instances can be created through the Web UI or API:
#### Via Web UI
1. Click "Add Instance" button
2. Fill in the configuration form
3. Click "Create"
#### Via API
```bash
curl -X POST http://localhost:8080/api/instances \
-H "Content-Type: application/json" \
-d '{
"name": "my-instance",
"model_path": "/path/to/model.gguf",
"port": 8081
}'
```
### Starting and Stopping
#### Start an Instance
```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start
# The instance will begin loading the model
```
#### Stop an Instance
```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop
# Graceful shutdown with configurable timeout
```
### Monitoring Status
Check instance status in real time:
```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}
# Get health status
curl http://localhost:8080/api/instances/{name}/health
```
## Instance States
Instances can be in one of several states:
- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error
## Configuration Management
### Updating Instance Configuration
Modify instance settings:
```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
-H "Content-Type: application/json" \
-d '{
"options": {
"threads": 8,
"context_size": 4096
}
}'
```
!!! note
    Configuration changes require restarting the instance to take effect.
### Viewing Configuration
```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```
## Resource Management
### Memory Usage
Monitor memory consumption:
```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```
### CPU and GPU Usage
Track performance metrics:
- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times
## Troubleshooting Common Issues
### Instance Won't Start
1. **Check model path**: Ensure the model file exists and is readable
2. **Port conflicts**: Verify the port isn't already in use
3. **Resource limits**: Check available memory and CPU
4. **Permissions**: Ensure proper file system permissions
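The first two checks are easy to run from a shell (the path and port below are examples):
```bash
# Confirm the model file exists and is readable
ls -lh /models/llama-2-7b.gguf

# Check whether anything is already listening on the instance port
ss -ltn | grep 8081
```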
### Performance Issues
1. **Adjust thread count**: Match to your CPU cores
2. **Optimize context size**: Balance memory usage and capability
3. **GPU offloading**: Use `gpu_layers` for GPU acceleration
4. **Batch size tuning**: Optimize for your workload
### Memory Problems
1. **Reduce context size**: Lower memory requirements
2. **Disable memory mapping**: Use `no_mmap` option
3. **Enable memory locking**: Use `memory_lock` for performance
4. **Monitor system resources**: Check available RAM
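For example, the first two adjustments can be applied through the update endpoint, using the option names listed above (restart the instance for them to take effect):
```bash
curl -X PUT http://localhost:8080/api/instances/my-instance \
  -H "Content-Type: application/json" \
  -d '{"options": {"context_size": 1024, "no_mmap": true}}'
```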
## Best Practices
### Production Deployments
1. **Resource allocation**: Plan memory and CPU requirements
2. **Health monitoring**: Set up regular health checks (see the sketch after this list)
3. **Graceful shutdowns**: Use proper stop procedures
4. **Backup configurations**: Save instance configurations
5. **Log management**: Configure appropriate logging levels
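A minimal cron-friendly health check, assuming `jq` is installed (the instance name is a placeholder):
```bash
#!/usr/bin/env bash
# Exit non-zero if the instance is not reporting healthy
STATUS=$(curl -s http://localhost:8080/api/instances/my-instance/health | jq -r '.status')
if [ "$STATUS" != "healthy" ]; then
  echo "Instance unhealthy: $STATUS" >&2
  exit 1
fi
```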
### Development Environments
1. **Resource sharing**: Use smaller models for development
2. **Quick iterations**: Optimize for fast startup times
3. **Debug logging**: Enable detailed logging for troubleshooting
## Batch Operations
### Managing Multiple Instances
```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances
```
## Next Steps
- Learn about the [Web UI](web-ui.md) interface
- Explore the complete [API Reference](api-reference.md)
- Set up [Monitoring](../advanced/monitoring.md) for production use

docs/user-guide/web-ui.md

@@ -0,0 +1,216 @@
# Web UI Guide
The LlamaCtl Web UI provides an intuitive interface for managing your Llama.cpp instances.
## Overview
The web interface is accessible at `http://localhost:8080` (or your configured host/port) and provides:
- Instance management dashboard
- Real-time status monitoring
- Configuration management
- Log viewing
- System information
## Dashboard
### Instance Cards
Each instance is displayed as a card showing:
- **Instance name** and status indicator
- **Model information** (name, size)
- **Current state** (stopped, starting, running, error)
- **Resource usage** (memory, CPU)
- **Action buttons** (start, stop, configure, logs)
### Status Indicators
- 🟢 **Green**: Instance is running and healthy
- 🟡 **Yellow**: Instance is starting or stopping
- 🔴 **Red**: Instance has encountered an error
- ⚪ **Gray**: Instance is stopped
## Creating Instances
### Add Instance Dialog
1. Click the **"Add Instance"** button
2. Fill in the required fields:
- **Name**: Unique identifier for your instance
- **Model Path**: Full path to your GGUF model file
- **Port**: Port number for the instance
3. Configure optional settings:
- **Threads**: Number of CPU threads
- **Context Size**: Context window size
- **GPU Layers**: Layers to offload to GPU
- **Additional Options**: Advanced Llama.cpp parameters
4. Click **"Create"** to save the instance
### Model Path Helper
Use the file browser to select model files:
- Navigate to your models directory
- Select the `.gguf` file
- Path is automatically filled in the form
## Managing Instances
### Starting Instances
1. Click the **"Start"** button on an instance card
2. Watch the status change to "Starting"
3. Monitor progress in the logs
4. Instance becomes "Running" when ready
### Stopping Instances
1. Click the **"Stop"** button
2. Instance gracefully shuts down
3. Status changes to "Stopped"
### Viewing Logs
1. Click the **"Logs"** button on any instance
2. Real-time log viewer opens
3. Filter by log level (Debug, Info, Warning, Error)
4. Search through log entries
5. Download logs for offline analysis
## Configuration Management
### Editing Instance Settings
1. Click the **"Configure"** button
2. Modify settings in the configuration dialog
3. Changes require instance restart to take effect
4. Click **"Save"** to apply changes
### Advanced Options
Access advanced Llama.cpp options:
```yaml
# Example advanced configuration
options:
  rope_freq_base: 10000
  rope_freq_scale: 1.0
  yarn_ext_factor: -1.0
  yarn_attn_factor: 1.0
  yarn_beta_fast: 32.0
  yarn_beta_slow: 1.0
```
## System Information
### Health Dashboard
Monitor overall system health:
- **System Resources**: CPU, memory, disk usage
- **Instance Summary**: Running/stopped instance counts
- **Performance Metrics**: Request rates, response times
### Resource Usage
Track resource consumption:
- Per-instance memory usage
- CPU utilization
- GPU memory (if applicable)
- Network I/O
## User Interface Features
### Theme Support
Switch between light and dark themes:
1. Click the theme toggle button
2. Setting is remembered across sessions
### Responsive Design
The UI adapts to different screen sizes:
- **Desktop**: Full-featured dashboard
- **Tablet**: Condensed layout
- **Mobile**: Stack-based navigation
### Keyboard Shortcuts
- `Ctrl+N`: Create new instance
- `Ctrl+R`: Refresh dashboard
- `Ctrl+L`: Open logs for selected instance
- `Esc`: Close dialogs
## Authentication
### Login
If authentication is enabled:
1. Navigate to the web UI
2. Enter your credentials
3. JWT token is stored for the session
4. Automatic logout on token expiry
### Session Management
- Sessions persist across browser restarts
- Logout clears authentication tokens
- Configurable session timeout
## Troubleshooting
### Common UI Issues
**Page won't load:**
- Check if the LlamaCtl server is running
- Verify the correct URL and port
- Check browser console for errors
**Instance won't start from UI:**
- Verify model path is correct
- Check for port conflicts
- Review instance logs for errors
**Real-time updates not working:**
- Check WebSocket connection
- Verify firewall settings
- Try refreshing the page
### Browser Compatibility
Supported browsers:
- Chrome/Chromium 90+
- Firefox 88+
- Safari 14+
- Edge 90+
## Mobile Access
### Responsive Features
On mobile devices:
- Touch-friendly interface
- Swipe gestures for navigation
- Optimized button sizes
- Condensed information display
### Limitations
Some features may be limited on mobile:
- Log viewing (use horizontal scrolling)
- Complex configuration forms
- File browser functionality
## Next Steps
- Learn about [API Reference](api-reference.md) for programmatic access
- Set up [Monitoring](../advanced/monitoring.md) for production use
- Explore [Advanced Configuration](../advanced/backends.md) options