Mirror of https://github.com/lordmathis/llamactl.git, synced 2025-11-06 17:14:28 +00:00

Compare commits (11 commits):
- 5aed01b68f
- 3f9caff33b
- 169254c61a
- 8154b8d0ab
- a26d853ad5
- 6203b64045
- 8d9c808be1
- 161cd213c5
- d6e84f0527
- 0846350d41
- dacaca8594
**README.md**

```diff
@@ -2,90 +2,132 @@
-A control server for managing multiple Llama Server instances with a web-based dashboard.
+**Management server for multiple llama.cpp instances with OpenAI-compatible API routing.**
 
-## Features
+## Why llamactl?
 
-- **Multi-instance Management**: Create, start, stop, restart, and delete multiple llama-server instances
-- **Web Dashboard**: Modern React-based UI for managing instances
-- **Auto-restart**: Configurable automatic restart on instance failure
-- **Instance Monitoring**: Real-time health checks and status monitoring
-- **Log Management**: View, search, and download instance logs
-- **Data Persistence**: Persistent storage of instance state.
-- **REST API**: Full API for programmatic control
-- **OpenAI Compatible**: Route requests to instances by instance name
-- **Configuration Management**: Comprehensive llama-server parameter support
-- **System Information**: View llama-server version, devices, and help
-- **API Key Authentication**: Secure access with separate management and inference keys
+🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)
+🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name
+🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
+🔐 **API Key Authentication**: Separate keys for management vs inference access
+📊 **Instance Monitoring**: Health checks, auto-restart, log management
+⚡ **Persistent State**: Instances survive server restarts
 
-## Prerequisites
+**Choose llamactl if**: You need authentication, health monitoring, auto-restart, and centralized management of multiple llama-server instances
 
-This project requires `llama-server` from llama.cpp to be installed and available in your PATH.
+**Choose Ollama if**: You want the simplest setup with strong community ecosystem and third-party integrations
 
-**Install llama.cpp:**
-Follow the installation instructions at https://github.com/ggml-org/llama.cpp
+**Choose LM Studio if**: You prefer a polished desktop GUI experience with easy model management
+
+## Quick Start
+
+```bash
+# 1. Install llama-server (one-time setup)
+# See: https://github.com/ggml-org/llama.cpp#quick-start
+
+# 2. Download and run llamactl
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz | tar -xz
+sudo mv llamactl /usr/local/bin/
+
+# 3. Start the server
+llamactl
+# Access dashboard at http://localhost:8080
+```
+
+## Usage
+
+### Create and manage instances via web dashboard:
+
+1. Open http://localhost:8080
+2. Click "Create Instance"
+3. Set model path and GPU layers
+4. Start or stop the instance
+
+### Or use the REST API:
+
+```bash
+# Create instance
+curl -X POST localhost:8080/api/v1/instances/my-7b-model \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "/path/to/model.gguf", "gpu_layers": 32}'
+
+# Use with OpenAI SDK
+curl -X POST localhost:8080/v1/chat/completions \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "my-7b-model", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
```
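The new Usage section drives everything over plain HTTP, so the same flow works from any language. As a rough Go sketch of the two curl calls above (the API key, instance name, and model path are placeholders; error handling is kept minimal):

```go
// Minimal Go client for the endpoints shown in the new README.
// Key, instance name, and model path below are placeholders.
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// post sends a JSON body with a Bearer key and returns the raw response text.
func post(url, key, body string) (string, error) {
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewBufferString(body))
	if err != nil {
		return "", err
	}
	req.Header.Set("Authorization", "Bearer "+key)
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	b, err := io.ReadAll(resp.Body)
	return string(b), err
}

func main() {
	key := "your-key"

	// Create an instance via the management API.
	out, err := post("http://localhost:8080/api/v1/instances/my-7b-model", key,
		`{"model": "/path/to/model.gguf", "gpu_layers": 32}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)

	// Route an OpenAI-style chat request to it by instance name.
	out, err = post("http://localhost:8080/v1/chat/completions", key,
		`{"model": "my-7b-model", "messages": [{"role": "user", "content": "Hello!"}]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```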
*README.md diff, continued:*

```diff
 ## Installation
 
-### Download Prebuilt Binaries
-
-The easiest way to install llamactl is to download a prebuilt binary from the [releases page](https://github.com/lordmathis/llamactl/releases).
-
-**Linux/macOS:**
+### Option 1: Download Binary (Recommended)
 
 ```bash
-# Download the latest release for your platform
-curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep tag_name | cut -d '"' -f 4)-linux-amd64.tar.gz | tar -xz
+# Linux/macOS - Get latest version and download
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
 
-# Move to PATH
 sudo mv llamactl /usr/local/bin/
 
-# Run the server
-llamactl
+# Or download manually from the releases page:
+# https://github.com/lordmathis/llamactl/releases/latest
+
+# Windows - Download from releases page
 ```
 
-**Manual Download:**
-1. Go to the [releases page](https://github.com/lordmathis/llamactl/releases)
-2. Download the appropriate archive for your platform
-3. Extract the archive and move the binary to a directory in your PATH
+### Option 2: Build from Source
 
-### Build from Source
-
-If you prefer to build from source or need the latest development version:
-
-#### Build Requirements
-
-- Go 1.24 or later
-- Node.js 22 or later (for building the web UI)
-
-#### Building with Web UI
+Requires Go 1.24+ and Node.js 22+
 
 ```bash
-# Clone the repository
 git clone https://github.com/lordmathis/llamactl.git
 cd llamactl
-
-# Install Node.js dependencies
-cd webui
-npm ci
-
-# Build the web UI
-npm run build
-
-# Return to project root and build
-cd ..
+cd webui && npm ci && npm run build && cd ..
 go build -o llamactl ./cmd/server
+```
 
-# Run the server
-./llamactl
-```
+## Prerequisites
+
+You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
+
+```bash
+# Quick install methods:
+# Homebrew (macOS)
+brew install llama.cpp
+
+# Or build from source - see llama.cpp docs
+```
 
 ## Configuration
 
+llamactl works out of the box with sensible defaults.
+
+```yaml
+server:
+  host: "0.0.0.0"         # Server host to bind to
+  port: 8080              # Server port to bind to
+  allowed_origins: ["*"]  # Allowed CORS origins (default: all)
+  enable_swagger: false   # Enable Swagger UI for API docs
+
+instances:
+  port_range: [8000, 9000]                        # Port range for instances
+  data_dir: ~/.local/share/llamactl               # Data directory (platform-specific, see below)
+  configs_dir: ~/.local/share/llamactl/instances  # Instance configs directory
+  logs_dir: ~/.local/share/llamactl/logs          # Logs directory
+  auto_create_dirs: true                          # Auto-create data/config/logs dirs if missing
+  max_instances: -1                               # Max instances (-1 = unlimited)
+  llama_executable: llama-server                  # Path to llama-server executable
+  default_auto_restart: true                      # Auto-restart new instances by default
+  default_max_restarts: 3                         # Max restarts for new instances
+  default_restart_delay: 5                        # Restart delay (seconds) for new instances
+
+auth:
+  require_inference_auth: true   # Require auth for inference endpoints
+  inference_keys: []             # Keys for inference endpoints
+  require_management_auth: true  # Require auth for management endpoints
+  management_keys: []            # Keys for management endpoints
+```
+
+<details><summary><strong>Full Configuration Guide</strong></summary>
+
 llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
 
-1. Hardcoded defaults
-2. Configuration file
-3. Environment variables
+```
+Defaults < Configuration file < Environment variables
+```
 
 ### Configuration Files
```
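The precedence rule above (`Defaults < Configuration file < Environment variables`) amounts to a three-stage overlay. A minimal Go sketch of that idea, not llamactl's actual loader: it assumes a flattened config with just two keys, uses `gopkg.in/yaml.v3` for parsing, and invents `LLAMACTL_HOST` for symmetry with the `LLAMACTL_PORT` variable documented in the old README.

```go
// Illustrative "defaults < file < env" config loading. This is a sketch,
// not llamactl's real loader; the struct covers only two keys and is
// flattened (the real config nests them under "server:").
package main

import (
	"fmt"
	"os"
	"strconv"

	"gopkg.in/yaml.v3"
)

type ServerConfig struct {
	Host string `yaml:"host"`
	Port int    `yaml:"port"`
}

func load(path string) ServerConfig {
	// 1. Hardcoded defaults.
	cfg := ServerConfig{Host: "0.0.0.0", Port: 8080}

	// 2. Configuration file overrides defaults (a missing file is fine).
	if data, err := os.ReadFile(path); err == nil {
		_ = yaml.Unmarshal(data, &cfg)
	}

	// 3. Environment variables override everything.
	// LLAMACTL_HOST is assumed here; LLAMACTL_PORT appears in the docs.
	if host := os.Getenv("LLAMACTL_HOST"); host != "" {
		cfg.Host = host
	}
	if port := os.Getenv("LLAMACTL_PORT"); port != "" {
		if p, err := strconv.Atoi(port); err == nil {
			cfg.Port = p
		}
	}
	return cfg
}

func main() {
	fmt.Printf("%+v\n", load("config.yaml"))
}
```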
*README.md diff, continued:*

```diff
@@ -168,147 +210,8 @@ auth:
 - `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
 - `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
 
-### Example Configuration
+</details>
 
-```yaml
-server:
-  host: "0.0.0.0"
-  port: 8080
-
-instances:
-  port_range: [8001, 8100]
-  data_dir: "/var/lib/llamactl"
-  configs_dir: "/var/lib/llamactl/instances"
-  logs_dir: "/var/log/llamactl"
-  auto_create_dirs: true
-  max_instances: 10
-  llama_executable: "/usr/local/bin/llama-server"
-  default_auto_restart: true
-  default_max_restarts: 5
-  default_restart_delay: 10
-
-auth:
-  require_inference_auth: true
-  inference_keys: ["sk-inference-abc123"]
-  require_management_auth: true
-  management_keys: ["sk-management-xyz456"]
-```
-
-## Usage
-
-### Starting the Server
-
-```bash
-# Start with default configuration
-./llamactl
-
-# Start with custom config file
-LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl
-
-# Start with environment variables
-LLAMACTL_PORT=9090 LLAMACTL_LOG_DIR=/custom/logs ./llamactl
-```
-
-### Authentication
-
-llamactl supports API Key authentication for both management and inference (OpenAI-compatible) endpoints. There are separate keys for management and inference APIs:
-
-- **Management keys** grant full access to instance management
-- **Inference keys** grant access to OpenAI-compatible endpoints
-- Management keys also work for inference endpoints (higher privilege)
-
-**How to Use:**
-Pass your API key in requests using one of:
-- `Authorization: Bearer <key>` header
-- `X-API-Key: <key>` header
-- `api_key=<key>` query parameter
-
-**Auto-generated keys**: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
-
-### Web Dashboard
-
-Open your browser and navigate to `http://localhost:8080` to access the web dashboard.
-
-### API Usage
-
-The REST API is available at `http://localhost:8080/api/v1`. See the Swagger documentation at `http://localhost:8080/swagger/` for complete API reference.
-
-#### Create an Instance
-
-```bash
-curl -X POST http://localhost:8080/api/v1/instances/my-instance \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-management-your-key" \
-  -d '{
-    "model": "/path/to/model.gguf",
-    "gpu_layers": 32,
-    "auto_restart": true
-  }'
-```
-
-#### List Instances
-
-```bash
-curl -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances
-```
-
-#### Start/Stop Instance
-
-```bash
-# Start
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/start
-
-# Stop
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/stop
-```
-
-### OpenAI Compatible Endpoints
-
-Route requests to instances by including the instance name as the model parameter:
-
-```bash
-curl -X POST http://localhost:8080/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-inference-your-key" \
-  -d '{
-    "model": "my-instance",
-    "messages": [{"role": "user", "content": "Hello!"}]
-  }'
-```
-
-## Development
-
-### Running Tests
-
-```bash
-# Go tests
-go test ./...
-
-# Web UI tests
-cd webui
-npm test
-```
-
-### Development Server
-
-```bash
-# Start Go server in development mode
-go run ./cmd/server
-
-# Start web UI development server (in another terminal)
-cd webui
-npm run dev
-```
-
-## API Documentation
-
-Interactive API documentation is available at `http://localhost:8080/swagger/` when the server is running.
-
 ## License
 
-This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file.
```
**LlamaServerOptions struct and JSON field mappings (Go, package `llamacpp`)**

```diff
@@ -15,12 +15,12 @@ type LlamaServerOptions struct {
 	CPUMask        string `json:"cpu_mask,omitempty"`
 	CPURange       string `json:"cpu_range,omitempty"`
 	CPUStrict      int    `json:"cpu_strict,omitempty"`
-	Priority       int    `json:"priority,omitempty"`
+	Prio           int    `json:"prio,omitempty"`
 	Poll           int    `json:"poll,omitempty"`
 	CPUMaskBatch   string `json:"cpu_mask_batch,omitempty"`
 	CPURangeBatch  string `json:"cpu_range_batch,omitempty"`
 	CPUStrictBatch int    `json:"cpu_strict_batch,omitempty"`
-	PriorityBatch  int    `json:"priority_batch,omitempty"`
+	PrioBatch      int    `json:"prio_batch,omitempty"`
 	PollBatch      int    `json:"poll_batch,omitempty"`
 	CtxSize        int    `json:"ctx_size,omitempty"`
 	Predict        int    `json:"predict,omitempty"`
@@ -83,7 +83,7 @@ type LlamaServerOptions struct {
 	Seed        int     `json:"seed,omitempty"`
 	SamplingSeq string  `json:"sampling_seq,omitempty"`
 	IgnoreEOS   bool    `json:"ignore_eos,omitempty"`
-	Temperature float64 `json:"temperature,omitempty"`
+	Temperature float64 `json:"temp,omitempty"`
 	TopK        int     `json:"top_k,omitempty"`
 	TopP        float64 `json:"top_p,omitempty"`
 	MinP        float64 `json:"min_p,omitempty"`
@@ -110,7 +110,7 @@ type LlamaServerOptions struct {
 	JSONSchema     string `json:"json_schema,omitempty"`
 	JSONSchemaFile string `json:"json_schema_file,omitempty"`
 
-	// Server/Example-specific params
+	// Example-specific params
 	NoContextShift bool `json:"no_context_shift,omitempty"`
 	Special        bool `json:"special,omitempty"`
 	NoWarmup       bool `json:"no_warmup,omitempty"`
@@ -150,8 +150,6 @@ type LlamaServerOptions struct {
 	NoPrefillAssistant   bool    `json:"no_prefill_assistant,omitempty"`
 	SlotPromptSimilarity float64 `json:"slot_prompt_similarity,omitempty"`
 	LoraInitWithoutApply bool    `json:"lora_init_without_apply,omitempty"`
-
-	// Speculative decoding params
 	DraftMax  int     `json:"draft_max,omitempty"`
 	DraftMin  int     `json:"draft_min,omitempty"`
 	DraftPMin float64 `json:"draft_p_min,omitempty"`
@@ -199,7 +197,7 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
 
 	// Handle alternative field names
 	fieldMappings := map[string]string{
-		// Official llama-server short forms from the documentation
+		// Common params
 		"t":  "threads",       // -t, --threads N
 		"tb": "threads_batch", // -tb, --threads-batch N
 		"C":  "cpu_mask",      // -C, --cpu-mask M
@@ -207,7 +205,8 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
 		"Cb":  "cpu_mask_batch",  // -Cb, --cpu-mask-batch M
 		"Crb": "cpu_range_batch", // -Crb, --cpu-range-batch lo-hi
 		"c":   "ctx_size",        // -c, --ctx-size N
-		"n":   "predict",         // -n, --predict, --n-predict N
+		"n":         "predict",   // -n, --predict N
+		"n-predict": "predict",   // --n-predict N
 		"b":  "batch_size",  // -b, --batch-size N
 		"ub": "ubatch_size", // -ub, --ubatch-size N
 		"fa": "flash_attn",  // -fa, --flash-attn
@@ -221,6 +220,7 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
 		"dev": "device",          // -dev, --device <dev1,dev2,..>
 		"ot":  "override_tensor", // --override-tensor, -ot
 		"ngl": "gpu_layers",      // -ngl, --gpu-layers, --n-gpu-layers N
+		"n-gpu-layers": "gpu_layers", // --n-gpu-layers N
 		"sm": "split_mode",   // -sm, --split-mode
 		"ts": "tensor_split", // -ts, --tensor-split N0,N1,N2,...
 		"mg": "main_gpu",     // -mg, --main-gpu INDEX
@@ -236,21 +236,32 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
 		"hffv": "hf_file_v", // -hffv, --hf-file-v FILE
 		"hft":  "hf_token",  // -hft, --hf-token TOKEN
 		"v":    "verbose",   // -v, --verbose, --log-verbose
+		"log-verbose": "verbose", // --log-verbose
 		"lv": "verbosity", // -lv, --verbosity, --log-verbosity N
+		"log-verbosity": "verbosity", // --log-verbosity N
+
+		// Sampling params
 		"s": "seed", // -s, --seed SEED
-		"temp": "temperature", // --temp N
 		"l":  "logit_bias",       // -l, --logit-bias
 		"j":  "json_schema",      // -j, --json-schema SCHEMA
 		"jf": "json_schema_file", // -jf, --json-schema-file FILE
+
+		// Example-specific params
 		"sp":   "special",          // -sp, --special
 		"cb":   "cont_batching",    // -cb, --cont-batching
 		"nocb": "no_cont_batching", // -nocb, --no-cont-batching
 		"a":    "alias",            // -a, --alias STRING
+		"embeddings": "embedding", // --embeddings
+		"rerank":     "reranking", // --reranking
 		"to":  "timeout",                // -to, --timeout N
 		"sps": "slot_prompt_similarity", // -sps, --slot-prompt-similarity
+		"draft":       "draft-max", // -draft, --draft-max N
+		"draft-n":     "draft-max", // --draft-n-max N
+		"draft-n-min": "draft_min", // --draft-n-min N
 		"cd":   "ctx_size_draft",   // -cd, --ctx-size-draft N
 		"devd": "device_draft",     // -devd, --device-draft
 		"ngld": "gpu_layers_draft", // -ngld, --gpu-layers-draft
+		"n-gpu-layers-draft": "gpu_layers_draft", // --n-gpu-layers-draft N
 		"md":   "model_draft",        // -md, --model-draft FNAME
 		"ctkd": "cache_type_k_draft", // -ctkd, --cache-type-k-draft TYPE
 		"ctvd": "cache_type_v_draft", // -ctvd, --cache-type-v-draft TYPE
```
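The `fieldMappings` table above feeds `UnmarshalJSON`, which lets the API accept llama-server's short flag names as JSON keys. A sketch of how such a remap can be implemented (the `Opts` type with two fields is illustrative; the project's real method may differ):

```go
// Alias-aware unmarshalling in the style of the fieldMappings table:
// known alias keys are rewritten to canonical names before decoding.
// Illustrative only; llamactl's actual implementation may differ.
package main

import (
	"encoding/json"
	"fmt"
)

// Opts stands in for LlamaServerOptions with just two fields.
type Opts struct {
	CtxSize int `json:"ctx_size,omitempty"`
	Predict int `json:"predict,omitempty"`
}

// Alias -> canonical JSON key, using entries from the diff above.
var fieldMappings = map[string]string{
	"c":         "ctx_size", // -c, --ctx-size N
	"n":         "predict",  // -n, --predict N
	"n-predict": "predict",  // --n-predict N
}

func (o *Opts) UnmarshalJSON(data []byte) error {
	var raw map[string]json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	// Rewrite alias keys to their canonical names.
	for alias, canonical := range fieldMappings {
		if v, ok := raw[alias]; ok {
			delete(raw, alias)
			raw[canonical] = v
		}
	}
	normalized, err := json.Marshal(raw)
	if err != nil {
		return err
	}
	// Decode via a type alias so this method is not called recursively.
	type plain Opts
	return json.Unmarshal(normalized, (*plain)(o))
}

func main() {
	var o Opts
	if err := json.Unmarshal([]byte(`{"c": 4096, "n-predict": 128}`), &o); err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", o) // {CtxSize:4096 Predict:128}
}
```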
**llama-server options tests (Go)**

```diff
@@ -113,7 +113,7 @@ func TestBuildCommandArgs_NumericFields(t *testing.T) {
 		"--threads":    "4",
 		"--ctx-size":   "2048",
 		"--gpu-layers": "16",
-		"--temperature": "0.7",
+		"--temp":  "0.7",
 		"--top-k": "40",
 		"--top-p": "0.9",
 	}
@@ -231,7 +231,7 @@ func TestUnmarshalJSON_StandardFields(t *testing.T) {
 		"verbose": true,
 		"ctx_size": 4096,
 		"gpu_layers": 32,
-		"temperature": 0.7
+		"temp": 0.7
 	}`
 
 	var options llamacpp.LlamaServerOptions
```
**LogsDialog component (React/TypeScript)**

```diff
@@ -11,6 +11,7 @@ import {
   DialogTitle,
 } from '@/components/ui/dialog'
 import { Badge } from '@/components/ui/badge'
+import { instancesApi } from '@/lib/api'
 import {
   RefreshCw,
   Download,
@@ -46,21 +47,15 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
   const refreshIntervalRef = useRef<NodeJS.Timeout | null>(null)
 
   // Fetch logs function
-  const fetchLogs = async (lines?: number) => {
-    if (!instanceName) return
+  const fetchLogs = React.useCallback(
+    async (lines?: number) => {
+      if (!instanceName) return
 
-    setLoading(true)
-    setError(null)
+      setLoading(true)
+      setError(null)
 
-    try {
-      const params = lines ? `?lines=${lines}` : ''
-      const response = await fetch(`/api/v1/instances/${instanceName}/logs${params}`)
-
-      if (!response.ok) {
-        throw new Error(`Failed to fetch logs: ${response.status}`)
-      }
-
-      const logText = await response.text()
-      setLogs(logText)
+      try {
+        const logText = await instancesApi.getLogs(instanceName, lines)
+        setLogs(logText)
 
       // Auto-scroll to bottom
@@ -74,20 +69,22 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
     } finally {
       setLoading(false)
     }
-  }
+    },
+    [instanceName]
+  )
 
   // Initial load when dialog opens
   useEffect(() => {
     if (open && instanceName) {
-      fetchLogs(lineCount)
+      void fetchLogs(lineCount)
     }
-  }, [open, instanceName])
+  }, [open, instanceName, fetchLogs, lineCount])
 
   // Auto-refresh effect
   useEffect(() => {
     if (autoRefresh && isRunning && open) {
       refreshIntervalRef.current = setInterval(() => {
-        fetchLogs(lineCount)
+        void fetchLogs(lineCount)
       }, 2000) // Refresh every 2 seconds
     } else {
       if (refreshIntervalRef.current) {
@@ -101,7 +98,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
         clearInterval(refreshIntervalRef.current)
       }
     }
-  }, [autoRefresh, isRunning, open, lineCount])
+  }, [autoRefresh, isRunning, open, lineCount, fetchLogs])
 
   // Copy logs to clipboard
   const copyLogs = async () => {
@@ -135,7 +132,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
 
   // Apply new line count
   const applyLineCount = () => {
-    fetchLogs(lineCount)
+    void fetchLogs(lineCount)
     setShowSettings(false)
   }
 
@@ -198,7 +195,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
           <Button
             variant="outline"
             size="sm"
-            onClick={() => fetchLogs(lineCount)}
+            onClick={() => void fetchLogs(lineCount)}
             disabled={loading}
           >
             {loading ? (
@@ -290,7 +287,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
         <div className="flex items-center gap-2 w-full">
           <Button
             variant="outline"
-            onClick={copyLogs}
+            onClick={() => void copyLogs()}
             disabled={!logs}
           >
             {copied ? (
```
**ZodFormField component (React/TypeScript)**

```diff
@@ -7,8 +7,8 @@ import { getFieldType, basicFieldsConfig } from '@/lib/zodFormUtils'
 
 interface ZodFormFieldProps {
   fieldKey: keyof CreateInstanceOptions
-  value: any
-  onChange: (key: keyof CreateInstanceOptions, value: any) => void
+  value: string | number | boolean | string[] | undefined
+  onChange: (key: keyof CreateInstanceOptions, value: string | number | boolean | string[] | undefined) => void
 }
 
 const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }) => {
@@ -18,7 +18,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
   // Get type from Zod schema
   const fieldType = getFieldType(fieldKey)
 
-  const handleChange = (newValue: any) => {
+  const handleChange = (newValue: string | number | boolean | string[] | undefined) => {
     onChange(fieldKey, newValue)
   }
 
@@ -29,7 +29,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
         <div className="flex items-center space-x-2">
           <Checkbox
             id={fieldKey}
-            checked={value || false}
+            checked={typeof value === 'boolean' ? value : false}
             onCheckedChange={(checked) => handleChange(checked)}
           />
           <Label htmlFor={fieldKey} className="text-sm font-normal">
@@ -51,10 +51,14 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
           <Input
             id={fieldKey}
             type="number"
-            value={value || ''}
+            step="any" // This allows decimal numbers
+            value={typeof value === 'string' || typeof value === 'number' ? value : ''}
             onChange={(e) => {
               const numValue = e.target.value ? parseFloat(e.target.value) : undefined
-              handleChange(numValue)
+              // Only update if the parsed value is valid or the input is empty
+              if (e.target.value === '' || (numValue !== undefined && !isNaN(numValue))) {
+                handleChange(numValue)
+              }
             }}
             placeholder={config.placeholder}
           />
@@ -101,7 +105,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
           <Input
             id={fieldKey}
             type="text"
-            value={value || ''}
+            value={typeof value === 'string' || typeof value === 'number' ? value : ''}
             onChange={(e) => handleChange(e.target.value || undefined)}
             placeholder={config.placeholder}
           />
```
**zodFormUtils (TypeScript)**

```diff
@@ -1,5 +1,4 @@
-import type { CreateInstanceOptions} from '@/schemas/instanceOptions';
-import { getAllFieldKeys } from '@/schemas/instanceOptions'
+import { type CreateInstanceOptions, getAllFieldKeys } from '@/schemas/instanceOptions'
 
 // Only define the basic fields we want to show by default
 export const basicFieldsConfig: Record<string, {
```
**CreateInstanceOptions schema (TypeScript/Zod)**

```diff
@@ -14,12 +14,12 @@ export const CreateInstanceOptionsSchema = z.object({
   cpu_mask: z.string().optional(),
   cpu_range: z.string().optional(),
   cpu_strict: z.number().optional(),
-  priority: z.number().optional(),
+  prio: z.number().optional(),
   poll: z.number().optional(),
   cpu_mask_batch: z.string().optional(),
   cpu_range_batch: z.string().optional(),
   cpu_strict_batch: z.number().optional(),
-  priority_batch: z.number().optional(),
+  prio_batch: z.number().optional(),
   poll_batch: z.number().optional(),
   ctx_size: z.number().optional(),
   predict: z.number().optional(),
@@ -82,7 +82,7 @@ export const CreateInstanceOptionsSchema = z.object({
   seed: z.number().optional(),
   sampling_seq: z.string().optional(),
   ignore_eos: z.boolean().optional(),
-  temperature: z.number().optional(),
+  temp: z.number().optional(),
   top_k: z.number().optional(),
   top_p: z.number().optional(),
   min_p: z.number().optional(),
@@ -109,7 +109,7 @@ export const CreateInstanceOptionsSchema = z.object({
   json_schema: z.string().optional(),
   json_schema_file: z.string().optional(),
 
-  // Server/Example-specific params
+  // Example-specific params
   no_context_shift: z.boolean().optional(),
   special: z.boolean().optional(),
   no_warmup: z.boolean().optional(),
@@ -149,8 +149,6 @@ export const CreateInstanceOptionsSchema = z.object({
   no_prefill_assistant: z.boolean().optional(),
   slot_prompt_similarity: z.number().optional(),
   lora_init_without_apply: z.boolean().optional(),
-
-  // Speculative decoding params
   draft_max: z.number().optional(),
   draft_min: z.number().optional(),
   draft_p_min: z.number().optional(),
```