Mirror of https://github.com/lordmathis/llamactl.git
Add Docker support documentation and configuration for backends
README.md (71 lines changed):
@@ -14,6 +14,7 @@
 ### 🔗 Universal Compatibility
 - **OpenAI API Compatible**: Drop-in replacement - route requests by instance name
 - **Multi-Backend Support**: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
+- **Docker Support**: Run backends in containers

 ### 🌐 User-Friendly Interface
 - **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
@@ -32,6 +33,7 @@
 # For llama.cpp: https://github.com/ggml-org/llama.cpp#quick-start
 # For MLX on macOS: pip install mlx-lm
 # For vLLM: pip install vllm
+# Or use Docker - no local installation required

 # 2. Download and run llamactl
 LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
@@ -112,6 +114,7 @@ You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp)
 brew install llama.cpp

 # Or build from source - see llama.cpp docs
+# Or use Docker - no local installation required
 ```

 **For MLX backend (macOS only):**
@@ -139,9 +142,51 @@ python -m venv vllm-env
 source vllm-env/bin/activate
 pip install vllm

-# For production deployments, consider container-based installation
+# Or use Docker - no local installation required
 ```

+## Docker Support
+
+llamactl supports running backends in Docker containers with identical behavior to native execution. This is particularly useful for:
+
+- Production deployments without local backend installation
+- Isolating backend dependencies
+- GPU-accelerated inference using official Docker images
+
+### Docker Configuration
+
+Enable Docker support using the new structured backend configuration:
+
+```yaml
+backends:
+  llama-cpp:
+    command: "llama-server"
+    docker:
+      enabled: true
+      image: "ghcr.io/ggml-org/llama.cpp:server"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
+
+  vllm:
+    command: "vllm"
+    args: ["serve"]
+    docker:
+      enabled: true
+      image: "vllm/vllm-openai:latest"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
+```
+
+### Key Features
+
+- **Host Networking**: Uses `--network host` for seamless port management
+- **GPU Support**: Includes `--gpus all` for GPU acceleration
+- **Environment Variables**: Configure container environment as needed
+- **Flexible Configuration**: Per-backend Docker settings with sensible defaults
+
+### Requirements
+
+- Docker installed and running
+- For GPU support: nvidia-docker2 (Linux) or Docker Desktop with GPU support
+- No local backend installation required when using Docker
+
 ## Configuration

 llamactl works out of the box with sensible defaults.
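The `docker` block added above also accepts an `environment` map, which the configuration hunks below show empty (`environment: {}`). A minimal sketch of how it could be populated for the vLLM backend; the variable names and values are illustrative placeholders, not llamactl defaults:

```yaml
backends:
  vllm:
    command: "vllm"
    args: ["serve"]
    docker:
      enabled: true
      image: "vllm/vllm-openai:latest"
      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
      # Placeholder values - set whatever the containerized backend expects
      environment:
        HF_HOME: "/data/huggingface"
        CUDA_VISIBLE_DEVICES: "0"
```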
@@ -154,9 +199,27 @@ server:
   enable_swagger: false # Enable Swagger UI for API docs

 backends:
-  llama_executable: llama-server # Path to llama-server executable
-  mlx_lm_executable: mlx_lm.server # Path to mlx_lm.server executable
-  vllm_executable: vllm # Path to vllm executable
+  llama-cpp:
+    command: "llama-server"
+    args: []
+    docker:
+      enabled: false
+      image: "ghcr.io/ggml-org/llama.cpp:server"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
+      environment: {}
+
+  vllm:
+    command: "vllm"
+    args: ["serve"]
+    docker:
+      enabled: false
+      image: "vllm/vllm-openai:latest"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
+      environment: {}
+
+  mlx:
+    command: "mlx_lm.server"
+    args: []

 instances:
   port_range: [8000, 9000] # Port range for instances
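For configs written against the old flat keys, the change above is a restructuring of the same settings into per-backend blocks; a minimal before/after sketch:

```yaml
# Old form (removed in this commit)
backends:
  llama_executable: llama-server
---
# New structured form
backends:
  llama-cpp:
    command: "llama-server"
```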
@@ -20,9 +20,27 @@ server:
   enable_swagger: false # Enable Swagger UI for API docs

 backends:
-  llama_executable: llama-server # Path to llama-server executable
-  mlx_lm_executable: mlx_lm.server # Path to mlx_lm.server executable
-  vllm_executable: vllm # Path to vllm executable
+  llama-cpp:
+    command: "llama-server"
+    args: []
+    docker:
+      enabled: false
+      image: "ghcr.io/ggml-org/llama.cpp:server"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
+      environment: {}
+
+  vllm:
+    command: "vllm"
+    args: ["serve"]
+    docker:
+      enabled: false
+      image: "vllm/vllm-openai:latest"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
+      environment: {}
+
+  mlx:
+    command: "mlx_lm.server"
+    args: []

 instances:
   port_range: [8000, 9000] # Port range for instances
@@ -90,18 +108,40 @@ server:
 - `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)

 ### Backend Configuration

 ```yaml
 backends:
-  llama_executable: "llama-server" # Path to llama-server executable (default: "llama-server")
-  mlx_lm_executable: "mlx_lm.server" # Path to mlx_lm.server executable (default: "mlx_lm.server")
-  vllm_executable: "vllm" # Path to vllm executable (default: "vllm")
+  llama-cpp:
+    command: "llama-server"
+    args: []
+    docker:
+      enabled: false # Enable Docker runtime (default: false)
+      image: "ghcr.io/ggml-org/llama.cpp:server"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all"]
+      environment: {}
+
+  vllm:
+    command: "vllm"
+    args: ["serve"]
+    docker:
+      enabled: false
+      image: "vllm/vllm-openai:latest"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
+      environment: {}
+
+  mlx:
+    command: "mlx_lm.server"
+    args: []
+    # MLX does not support Docker
 ```

-**Environment Variables:**
-- `LLAMACTL_LLAMA_EXECUTABLE` - Path to llama-server executable
-- `LLAMACTL_MLX_LM_EXECUTABLE` - Path to mlx_lm.server executable
-- `LLAMACTL_VLLM_EXECUTABLE` - Path to vllm executable
+**Backend Configuration Fields:**
+- `command`: Executable name/path for the backend
+- `args`: Default arguments prepended to all instances
+- `docker`: Docker-specific configuration (optional)
+  - `enabled`: Boolean flag to enable Docker runtime
+  - `image`: Docker image to use
+  - `args`: Additional arguments passed to `docker run`
+  - `environment`: Environment variables for the container (optional)

 ### Instance Configuration
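Because the `docker` section is optional and configured per backend, containerized and native backends can be mixed in a single config; a sketch that runs llama.cpp in Docker while MLX stays native (values taken from the defaults above):

```yaml
backends:
  llama-cpp:
    command: "llama-server"
    args: []
    docker:
      enabled: true  # run this backend in a container
      image: "ghcr.io/ggml-org/llama.cpp:server"
      args: ["run", "--rm", "--network", "host", "--gpus", "all"]

  mlx:
    command: "mlx_lm.server"  # MLX does not support Docker; runs natively
    args: []
```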
@@ -88,6 +88,21 @@ Here are basic example configurations for each backend:
 }
 ```

+## Docker Support
+
+Llamactl can run backends in Docker containers. To enable Docker for a backend, add a `docker` section to that backend in your YAML configuration file (e.g. `config.yaml`) as shown below:
+
+```yaml
+backends:
+  vllm:
+    command: "vllm"
+    args: ["serve"]
+    docker:
+      enabled: true
+      image: "vllm/vllm-openai:latest"
+      args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
+```
+
 ## Using the API

 You can also manage instances via the REST API: