Mirror of https://github.com/lordmathis/llamactl.git, synced 2025-11-06 00:54:23 +00:00
Update README to include MLX backend support and enhance usage instructions
62
README.md
@@ -2,30 +2,28 @@
 
 
 
-**Management server and proxy for multiple llama.cpp instances with OpenAI-compatible API routing.**
+**Management server and proxy for multiple llama.cpp and MLX instances with OpenAI-compatible API routing.**
 
 ## Why llamactl?
 
 🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)
 🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name
-🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
-🔐 **API Key Authentication**: Separate keys for management vs inference access
-📊 **Instance Monitoring**: Health checks, auto-restart, log management
-⚡ **Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
-💡 **On-Demand Instance Start**: Automatically launch instances upon receiving OpenAI-compatible API requests
+🍎 **Multi-Backend Support**: Native support for both llama.cpp and MLX (Apple Silicon optimized)
+🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
+🔐 **API Key Authentication**: Separate keys for management vs inference access
+📊 **Instance Monitoring**: Health checks, auto-restart, log management
+⚡ **Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
+💡 **On-Demand Instance Start**: Automatically launch instances upon receiving OpenAI-compatible API requests
 💾 **State Persistence**: Ensure instances remain intact across server restarts
 
 
 
-**Choose llamactl if**: You need authentication, health monitoring, auto-restart, and centralized management of multiple llama-server instances
-**Choose Ollama if**: You want the simplest setup with strong community ecosystem and third-party integrations
-**Choose LM Studio if**: You prefer a polished desktop GUI experience with easy model management
-
 ## Quick Start
 
 ```bash
-# 1. Install llama-server (one-time setup)
-# See: https://github.com/ggml-org/llama.cpp#quick-start
+# 1. Install backend (one-time setup)
+# For llama.cpp: https://github.com/ggml-org/llama.cpp#quick-start
+# For MLX on macOS: pip install mlx-lm
 
 # 2. Download and run llamactl
 LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
@@ -42,15 +40,21 @@ llamactl
 ### Create and manage instances via web dashboard:
 1. Open http://localhost:8080
 2. Click "Create Instance"
-3. Set model path and GPU layers
-4. Start or stop the instance
+3. Choose backend type (llama.cpp or MLX)
+4. Set model path and backend-specific options
+5. Start or stop the instance
 
 ### Or use the REST API:
 ```bash
-# Create instance
+# Create llama.cpp instance
 curl -X POST localhost:8080/api/v1/instances/my-7b-model \
   -H "Authorization: Bearer your-key" \
-  -d '{"model": "/path/to/model.gguf", "gpu_layers": 32}'
+  -d '{"backend_type": "llama_cpp", "backend_options": {"model": "/path/to/model.gguf", "gpu_layers": 32}}'
 
+# Create MLX instance (macOS)
+curl -X POST localhost:8080/api/v1/instances/my-mlx-model \
+  -H "Authorization: Bearer your-key" \
+  -d '{"backend_type": "mlx_lm", "backend_options": {"model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit"}}'
+
 # Use with OpenAI SDK
 curl -X POST localhost:8080/v1/chat/completions \
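Review note: the hunk above changes the create-instance request body from a flat object to a nested one with `backend_type` and `backend_options`. The Python sketch below only builds the URL, headers, and JSON body that the two curl examples send; it does not contact a server. The helper names `build_create_request` and `migrate_legacy_payload` are hypothetical, the `Content-Type` header is an assumption (the curl examples do not set it), and treating every old flat payload as a `llama_cpp` instance is also an assumption based on the README previously covering llama.cpp only.

```python
import json


def build_create_request(name, backend_type, backend_options,
                         base_url="http://localhost:8080", api_key="your-key"):
    """Return (url, headers, body) mirroring the curl examples in the diff."""
    url = f"{base_url}/api/v1/instances/{name}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        # Assumption: the curl examples in the README do not set this header.
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "backend_type": backend_type,
        "backend_options": backend_options,
    })
    return url, headers, body


def migrate_legacy_payload(old_payload):
    """Wrap a pre-commit flat payload in the nested shape.

    Assumption: every legacy payload describes a llama.cpp instance.
    """
    return {"backend_type": "llama_cpp", "backend_options": dict(old_payload)}


# The MLX example from the diff, expressed with the helper.
mlx_url, mlx_headers, mlx_body = build_create_request(
    "my-mlx-model",
    "mlx_lm",
    {"model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit"},
)

# The old llama.cpp payload, rewritten into the new shape.
migrated = migrate_legacy_payload({"model": "/path/to/model.gguf", "gpu_layers": 32})
```

Here `migrated` has the same structure as the `-d` argument of the new llama.cpp curl call, which makes the before/after difference in this hunk easy to check by eye.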
@@ -85,16 +89,31 @@ go build -o llamactl ./cmd/server
 
 ## Prerequisites
 
+### Backend Dependencies
+
+**For llama.cpp backend:**
 You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
 
 ```bash
-# Quick install methods:
 # Homebrew (macOS)
 brew install llama.cpp
 
 # Or build from source - see llama.cpp docs
 ```
 
+**For MLX backend (macOS only):**
+You need MLX-LM installed:
+
+```bash
+# Install via pip (requires Python 3.8+)
+pip install mlx-lm
+
+# Or in a virtual environment (recommended)
+python -m venv mlx-env
+source mlx-env/bin/activate
+pip install mlx-lm
+```
+
 ## Configuration
 
 llamactl works out of the box with sensible defaults.
@@ -106,6 +125,10 @@ server:
   allowed_origins: ["*"] # Allowed CORS origins (default: all)
   enable_swagger: false # Enable Swagger UI for API docs
 
+backends:
+  llama_executable: llama-server # Path to llama-server executable
+  mlx_lm_executable: mlx_lm.server # Path to mlx_lm.server executable
+
 instances:
   port_range: [8000, 9000] # Port range for instances
   data_dir: ~/.local/share/llamactl # Data directory (platform-specific, see below)
@@ -115,7 +138,6 @@ instances:
   max_instances: -1 # Max instances (-1 = unlimited)
   max_running_instances: -1 # Max running instances (-1 = unlimited)
   enable_lru_eviction: true # Enable LRU eviction for idle instances
-  llama_executable: llama-server # Path to llama-server executable
   default_auto_restart: true # Auto-restart new instances by default
   default_max_restarts: 3 # Max restarts for new instances
   default_restart_delay: 5 # Restart delay (seconds) for new instances
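Review note: the last two hunks move `llama_executable` out of `instances:` and into a new top-level `backends:` section, next to `mlx_lm_executable`. The diff does not say whether llamactl still reads the old key, so a config written against the old README may need a manual edit. The Python sketch below shows one way to do that on an already-parsed config held as a plain dict. The function name `migrate_config` is hypothetical, and the default values are simply the ones shown in the README diff.

```python
import copy

# Defaults as documented in the new `backends:` section of the README.
DEFAULT_BACKENDS = {
    "llama_executable": "llama-server",
    "mlx_lm_executable": "mlx_lm.server",
}


def migrate_config(config):
    """Return a copy of config with instances.llama_executable moved under backends.

    An explicit backends.llama_executable wins over the legacy key.
    The input dict is left unmodified.
    """
    new = copy.deepcopy(config)
    instances = new.get("instances", {})
    backends = new.setdefault("backends", {})

    # Move the legacy key if present.
    legacy = instances.pop("llama_executable", None)
    if legacy is not None:
        backends.setdefault("llama_executable", legacy)

    # Fill in any executable path that is still missing.
    for key, value in DEFAULT_BACKENDS.items():
        backends.setdefault(key, value)
    return new


old_config = {
    "instances": {"llama_executable": "/opt/llama-server", "max_instances": -1},
}
new_config = migrate_config(old_config)
```

After the call, `new_config["backends"]` carries the custom llama-server path plus the documented MLX default, and `new_config["instances"]` no longer contains `llama_executable`.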