Add documentation for remote node deployment and configuration
@@ -25,6 +25,11 @@
- **Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
- **Environment Variables**: Set custom environment variables per instance for advanced configuration

### 🔗 Remote Instance Deployment

- **Remote Node Support**: Deploy instances on remote hosts
- **Central Management**: Manage remote instances from a single dashboard
- **Seamless Routing**: Automatic request routing to remote instances

## Quick Start
@@ -70,6 +70,10 @@ auth:
  inference_keys: [] # Keys for inference endpoints
  require_management_auth: true # Require auth for management endpoints
  management_keys: [] # Keys for management endpoints

local_node: "main" # Name of the local node (default: "main")
nodes: # Node configuration for multi-node deployment
  main: # Default local node (empty config)
```

## Configuration Files
@@ -241,12 +245,26 @@ auth:
- `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
- `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys

### Remote Node Configuration

llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally.

```yaml
local_node: "main" # Name of the local node (default: "main")
nodes: # Node configuration map
  main: # Local node (empty address means local)
    address: "" # Not used for local node
    api_key: "" # Not used for local node
  worker1: # Remote worker node
    address: "http://192.168.1.10:8080"
    api_key: "worker1-api-key" # Management API key for authentication
```

**Node Configuration Fields:**

- `local_node`: Specifies which node in the `nodes` map represents the local node
- `nodes`: Map of node configurations
  - `address`: HTTP/HTTPS URL of the remote node (empty for local node)
  - `api_key`: Management API key for authenticating with the remote node

**Environment Variables:**

- `LLAMACTL_LOCAL_NODE` - Name of the local node

## Command Line Options

View all available command line options:

```bash
llamactl --help
```

You can also override configuration using command line flags when starting llamactl.
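In the same spirit, the remote node settings above can also be driven by the documented environment variables when starting each node. The following is only a sketch — the host roles and key value are illustrative, and the worker's key must match the `api_key` configured for that worker on the main node:

```bash
# On the remote worker: require management auth and set the key that the
# main node's nodes.worker1.api_key must match (illustrative value).
export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="worker1-api-key"
llamactl

# On the main node: the local node name can be set via environment variable
# instead of the local_node field in the YAML config.
export LLAMACTL_LOCAL_NODE="main"
llamactl
```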
@@ -157,6 +157,12 @@ cd webui && npm ci && npm run build && cd ..
go build -o llamactl ./cmd/server
```

## Remote Node Installation

For deployments with remote nodes:

- Install llamactl on each node using any of the methods above
- Configure API keys for authentication between nodes
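As a concrete sketch of those two steps on a worker node (the build commands mirror the from-source instructions above; the key value is a placeholder that must match what the main node configures for this worker):

```bash
# Hypothetical worker bootstrap: build from source, then start llamactl with a
# management key that the main node will use to authenticate against this worker.
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
cd webui && npm ci && npm run build && cd ..
go build -o llamactl ./cmd/server
LLAMACTL_MANAGEMENT_KEYS="worker1-api-key" ./llamactl
```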
## Verification

Verify your installation by checking the version:
@@ -168,3 +174,5 @@ llamactl --version
## Next Steps

Now that Llamactl is installed, continue to the [Quick Start](quick-start.md) guide to get your first instance running!

For remote node deployments, see the [Configuration Guide](configuration.md) for node setup instructions.
@@ -126,6 +126,7 @@ POST /api/v1/instances/{name}
- `on_demand_start`: Start instance when receiving requests
- `idle_timeout`: Idle timeout in minutes
- `environment`: Environment variables as key-value pairs
- `nodes`: Array with single node name to deploy the instance to (for remote deployments)

See [Managing Instances](managing-instances.md) for complete configuration options.
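As an illustrative sketch only (all field values are placeholders), a creation request that combines several of these options, including a remote `nodes` target, could look like:

```bash
# Hypothetical request combining on-demand start, idle timeout, environment
# variables, and remote node placement for a new instance.
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {"model": "/models/llama-2-7b.gguf"},
    "on_demand_start": true,
    "idle_timeout": 30,
    "environment": {"CUDA_VISIBLE_DEVICES": "0"},
    "nodes": ["worker1"]
  }'
```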
@@ -405,6 +406,38 @@ curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```

### Remote Node Instance Example

```bash
# Create instance on specific remote node
curl -X POST http://localhost:8080/api/v1/instances/remote-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-2-7b.gguf",
      "gpu_layers": 32
    },
    "nodes": ["worker1"]
  }'

# Check status of remote instance
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/remote-model

# Use remote instance with OpenAI-compatible API
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "remote-model",
    "messages": [
      {"role": "user", "content": "Hello from remote node!"}
    ]
  }'
```
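Once created, the remote instance is managed through the same central API as local ones. As a sketch (assuming the instance-listing endpoint `GET /api/v1/instances` used elsewhere in these docs, plus the delete endpoint shown above):

```bash
# List all instances; remote instances should appear alongside local ones
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances

# Delete the remote instance when it is no longer needed
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/remote-model
```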
### Using the Proxy Endpoint

You can also directly proxy requests to the llama-server instance:
@@ -39,26 +39,27 @@ Each instance is displayed as a card showing:
1. Click the **"Create Instance"** button on the dashboard
2. Enter a unique **Name** for your instance (only required field)
3. **Select Target Node**: Choose which node to deploy the instance to from the dropdown
4. **Choose Backend Type**:
   - **llama.cpp**: For GGUF models using llama-server
   - **MLX**: For MLX-optimized models (macOS only)
   - **vLLM**: For distributed serving and high-throughput inference
5. Configure model source:
   - **For llama.cpp**: GGUF model path or HuggingFace repo
   - **For MLX**: MLX model path or identifier (e.g., `mlx-community/Mistral-7B-Instruct-v0.3-4bit`)
   - **For vLLM**: HuggingFace model identifier (e.g., `microsoft/DialoGPT-medium`)
6. Configure optional instance management settings:
   - **Auto Restart**: Automatically restart instance on failure
   - **Max Restarts**: Maximum number of restart attempts
   - **Restart Delay**: Delay in seconds between restart attempts
   - **On Demand Start**: Start instance when receiving a request to the OpenAI compatible endpoint
   - **Idle Timeout**: Minutes before stopping idle instance (set to 0 to disable)
   - **Environment Variables**: Set custom environment variables for the instance process
7. Configure backend-specific options:
   - **llama.cpp**: Threads, context size, GPU layers, port, etc.
   - **MLX**: Temperature, top-p, adapter path, Python environment, etc.
   - **vLLM**: Tensor parallel size, GPU memory utilization, quantization, etc.
8. Click **"Create"** to save the instance

### Via API
@@ -121,6 +122,18 @@ curl -X POST http://localhost:8080/api/instances/gemma-3-27b \
    "gpu_layers": 32
  }
}'

# Create instance on specific remote node
curl -X POST http://localhost:8080/api/instances/remote-llama \
  -H "Content-Type: application/json" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {
      "model": "/models/llama-7b.gguf",
      "gpu_layers": 32
    },
    "nodes": ["worker1"]
  }'
```

## Start Instance
@@ -227,3 +240,4 @@ Check the health status of your instances:
```bash
curl http://localhost:8080/api/instances/{name}/proxy/health
```
@@ -125,6 +125,30 @@ This helps determine if the issue is with llamactl or with the underlying llama.
http://localhost:8080/api/v1/instances
```

## Remote Node Issues

### Node Configuration

**Problem:** Remote instances not appearing or cannot be managed

**Solutions:**

1. **Verify node configuration:**

   ```yaml
   local_node: "main" # Must match a key in nodes map
   nodes:
     main:
       address: "" # Empty for local node
     worker1:
       address: "http://worker1.internal:8080"
       api_key: "secure-key" # Must match worker1's management key
   ```

2. **Test remote node connectivity:**

   ```bash
   curl -H "Authorization: Bearer remote-node-key" \
     http://remote-node:8080/api/v1/instances
   ```
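If both checks pass, a further sanity check (a sketch; `your-api-key` stands in for the main node's management key) is to confirm that the main node's aggregated instance list includes the instances deployed to the worker:

```bash
# From the main node: remote instances should appear in the central listing.
# If they do not, re-check that the instance's "nodes" value matches a key
# in the nodes map.
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances
```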
## Debugging and Logs

### Viewing Instance Logs