11 Commits

Author SHA1 Message Date
5aed01b68f Merge pull request #17 from lordmathis/fix/forbidden-logs
fix: Refactor log fetching to use instancesApi
2025-08-06 19:12:34 +02:00
3f9caff33b Refactor log fetching to use instancesApi 2025-08-06 19:07:25 +02:00
169254c61a Merge pull request #16 from lordmathis/fix/llama-server-options
fix: Missing or wrong llama server options
2025-08-06 18:51:18 +02:00
8154b8d0ab Fix temp in tests 2025-08-06 18:49:36 +02:00
a26d853ad5 Fix missing or wrong llama server options on frontend 2025-08-06 18:40:05 +02:00
6203b64045 Fix missing or wrong llama server options 2025-08-06 18:31:17 +02:00
8d9c808be1 Merge pull request #14 from lordmathis/docs/readme-updates
docs: Update README.md to improve project description
2025-08-05 21:32:20 +02:00
161cd213c5 Update README.md to enhance project description and installation instructions 2025-08-05 21:20:37 +02:00
d6e84f0527 Merge pull request #13 from lordmathis/fix/decimal-input
fix: Allow decimal input for numeric fields in instance configuration
2025-08-05 20:03:31 +02:00
0846350d41 Fix eslint issues in ZodFormField 2025-08-05 19:21:09 +02:00
dacaca8594 Fix number input handling to allow decimal values 2025-08-05 19:15:12 +02:00
7 changed files with 241 additions and 329 deletions

README.md
View File

@@ -2,90 +2,132 @@
 ![Build and Release](https://github.com/lordmathis/llamactl/actions/workflows/release.yaml/badge.svg) ![Go Tests](https://github.com/lordmathis/llamactl/actions/workflows/go_test.yaml/badge.svg) ![WebUI Tests](https://github.com/lordmathis/llamactl/actions/workflows/webui_test.yaml/badge.svg)
-A control server for managing multiple Llama Server instances with a web-based dashboard.
+**Management server for multiple llama.cpp instances with OpenAI-compatible API routing.**
-## Features
+## Why llamactl?
-- **Multi-instance Management**: Create, start, stop, restart, and delete multiple llama-server instances
-- **Web Dashboard**: Modern React-based UI for managing instances
-- **Auto-restart**: Configurable automatic restart on instance failure
-- **Instance Monitoring**: Real-time health checks and status monitoring
-- **Log Management**: View, search, and download instance logs
-- **Data Persistence**: Persistent storage of instance state.
-- **REST API**: Full API for programmatic control
-- **OpenAI Compatible**: Route requests to instances by instance name
-- **Configuration Management**: Comprehensive llama-server parameter support
-- **System Information**: View llama-server version, devices, and help
-- **API Key Authentication**: Secure access with separate management and inference keys
+🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)
+🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name
+🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
+🔐 **API Key Authentication**: Separate keys for management vs inference access
+📊 **Instance Monitoring**: Health checks, auto-restart, log management
+**Persistent State**: Instances survive server restarts
-## Prerequisites
+**Choose llamactl if**: You need authentication, health monitoring, auto-restart, and centralized management of multiple llama-server instances
+**Choose Ollama if**: You want the simplest setup with strong community ecosystem and third-party integrations
+**Choose LM Studio if**: You prefer a polished desktop GUI experience with easy model management
-This project requires `llama-server` from llama.cpp to be installed and available in your PATH.
-**Install llama.cpp:**
-Follow the installation instructions at https://github.com/ggml-org/llama.cpp
+## Quick Start
+```bash
+# 1. Install llama-server (one-time setup)
+# See: https://github.com/ggml-org/llama.cpp#quick-start
+# 2. Download and run llamactl
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz | tar -xz
+sudo mv llamactl /usr/local/bin/
+# 3. Start the server
+llamactl
+# Access dashboard at http://localhost:8080
+```
+## Usage
+### Create and manage instances via web dashboard:
+1. Open http://localhost:8080
+2. Click "Create Instance"
+3. Set model path and GPU layers
+4. Start or stop the instance
+### Or use the REST API:
+```bash
+# Create instance
+curl -X POST localhost:8080/api/v1/instances/my-7b-model \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "/path/to/model.gguf", "gpu_layers": 32}'
+# Use with OpenAI SDK
+curl -X POST localhost:8080/v1/chat/completions \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "my-7b-model", "messages": [{"role": "user", "content": "Hello!"}]}'
+```
 ## Installation
-### Download Prebuilt Binaries
-The easiest way to install llamactl is to download a prebuilt binary from the [releases page](https://github.com/lordmathis/llamactl/releases).
-**Linux/macOS:**
+### Option 1: Download Binary (Recommended)
 ```bash
-# Download the latest release for your platform
-curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep tag_name | cut -d '"' -f 4)-linux-amd64.tar.gz | tar -xz
-# Move to PATH
+# Linux/macOS - Get latest version and download
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
 sudo mv llamactl /usr/local/bin/
-# Run the server
-llamactl
+# Or download manually from the releases page:
+# https://github.com/lordmathis/llamactl/releases/latest
+# Windows - Download from releases page
 ```
-**Manual Download:**
-1. Go to the [releases page](https://github.com/lordmathis/llamactl/releases)
-2. Download the appropriate archive for your platform
-3. Extract the archive and move the binary to a directory in your PATH
-### Build from Source
-If you prefer to build from source or need the latest development version:
-#### Build Requirements
-- Go 1.24 or later
-- Node.js 22 or later (for building the web UI)
-#### Building with Web UI
+### Option 2: Build from Source
+Requires Go 1.24+ and Node.js 22+
 ```bash
-# Clone the repository
 git clone https://github.com/lordmathis/llamactl.git
 cd llamactl
-# Install Node.js dependencies
-cd webui
-npm ci
-# Build the web UI
-npm run build
-# Return to project root and build
-cd ..
+cd webui && npm ci && npm run build && cd ..
 go build -o llamactl ./cmd/server
-# Run the server
-./llamactl
+```
+## Prerequisites
+You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
+```bash
+# Quick install methods:
+# Homebrew (macOS)
+brew install llama.cpp
+# Or build from source - see llama.cpp docs
 ```
 ## Configuration
-llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
-1. Hardcoded defaults
-2. Configuration file
-3. Environment variables
+llamactl works out of the box with sensible defaults.
+```yaml
+server:
+  host: "0.0.0.0"         # Server host to bind to
+  port: 8080              # Server port to bind to
+  allowed_origins: ["*"]  # Allowed CORS origins (default: all)
+  enable_swagger: false   # Enable Swagger UI for API docs
+instances:
+  port_range: [8000, 9000]                       # Port range for instances
+  data_dir: ~/.local/share/llamactl              # Data directory (platform-specific, see below)
+  configs_dir: ~/.local/share/llamactl/instances # Instance configs directory
+  logs_dir: ~/.local/share/llamactl/logs         # Logs directory
+  auto_create_dirs: true                         # Auto-create data/config/logs dirs if missing
+  max_instances: -1                              # Max instances (-1 = unlimited)
+  llama_executable: llama-server                 # Path to llama-server executable
+  default_auto_restart: true                     # Auto-restart new instances by default
+  default_max_restarts: 3                        # Max restarts for new instances
+  default_restart_delay: 5                       # Restart delay (seconds) for new instances
+auth:
+  require_inference_auth: true   # Require auth for inference endpoints
+  inference_keys: []             # Keys for inference endpoints
+  require_management_auth: true  # Require auth for management endpoints
+  management_keys: []            # Keys for management endpoints
+```
+<details><summary><strong>Full Configuration Guide</strong></summary>
+llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
+```
+Defaults < Configuration file < Environment variables
+```
 ### Configuration Files
@@ -168,147 +210,8 @@ auth:
 - `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
 - `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
-### Example Configuration
-```yaml
-server:
-  host: "0.0.0.0"
-  port: 8080
-instances:
-  port_range: [8001, 8100]
-  data_dir: "/var/lib/llamactl"
-  configs_dir: "/var/lib/llamactl/instances"
-  logs_dir: "/var/log/llamactl"
-  auto_create_dirs: true
-  max_instances: 10
-  llama_executable: "/usr/local/bin/llama-server"
-  default_auto_restart: true
-  default_max_restarts: 5
-  default_restart_delay: 10
-auth:
-  require_inference_auth: true
-  inference_keys: ["sk-inference-abc123"]
-  require_management_auth: true
-  management_keys: ["sk-management-xyz456"]
-```
-## Usage
-### Starting the Server
-```bash
-# Start with default configuration
-./llamactl
-# Start with custom config file
-LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl
-# Start with environment variables
-LLAMACTL_PORT=9090 LLAMACTL_LOG_DIR=/custom/logs ./llamactl
-```
-### Authentication
-llamactl supports API Key authentication for both management and inference (OpenAI-compatible) endpoints. There are separate keys for management and inference APIs:
-- **Management keys** grant full access to instance management
-- **Inference keys** grant access to OpenAI-compatible endpoints
-- Management keys also work for inference endpoints (higher privilege)
-**How to Use:**
-Pass your API key in requests using one of:
-- `Authorization: Bearer <key>` header
-- `X-API-Key: <key>` header
-- `api_key=<key>` query parameter
-**Auto-generated keys**: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
-### Web Dashboard
-Open your browser and navigate to `http://localhost:8080` to access the web dashboard.
-### API Usage
-The REST API is available at `http://localhost:8080/api/v1`. See the Swagger documentation at `http://localhost:8080/swagger/` for complete API reference.
-#### Create an Instance
-```bash
-curl -X POST http://localhost:8080/api/v1/instances/my-instance \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-management-your-key" \
-  -d '{
-    "model": "/path/to/model.gguf",
-    "gpu_layers": 32,
-    "auto_restart": true
-  }'
-```
-#### List Instances
-```bash
-curl -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances
-```
-#### Start/Stop Instance
-```bash
-# Start
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/start
-# Stop
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/stop
-```
-### OpenAI Compatible Endpoints
-Route requests to instances by including the instance name as the model parameter:
-```bash
-curl -X POST http://localhost:8080/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-inference-your-key" \
-  -d '{
-    "model": "my-instance",
-    "messages": [{"role": "user", "content": "Hello!"}]
-  }'
-```
-## Development
-### Running Tests
-```bash
-# Go tests
-go test ./...
-# Web UI tests
-cd webui
-npm test
-```
-### Development Server
-```bash
-# Start Go server in development mode
-go run ./cmd/server
-# Start web UI development server (in another terminal)
-cd webui
-npm run dev
-```
-## API Documentation
-Interactive API documentation is available at `http://localhost:8080/swagger/` when the server is running.
+</details>
 ## License
-This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file.

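The updated README routes OpenAI-style requests by instance name. The same call can be made from the official `openai` npm client; this is only a hedged sketch (the package, the `baseURL` value, and the `my-7b-model` instance name are assumptions based on the README examples above, not part of this diff):

```typescript
// Sketch: chat completion against a llamactl-managed instance through the
// OpenAI-compatible endpoint described in the README. Assumes the official
// `openai` npm package and an inference API key.
import OpenAI from "openai";

async function main() {
  const client = new OpenAI({
    baseURL: "http://localhost:8080/v1", // llamactl's OpenAI-compatible base path
    apiKey: "sk-inference-your-key",     // an inference key from the auth config
  });

  const completion = await client.chat.completions.create({
    model: "my-7b-model", // the instance name doubles as the model name
    messages: [{ role: "user", content: "Hello!" }],
  });

  console.log(completion.choices[0].message.content);
}

void main();
```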
View File

@@ -15,12 +15,12 @@ type LlamaServerOptions struct {
 CPUMask        string `json:"cpu_mask,omitempty"`
 CPURange       string `json:"cpu_range,omitempty"`
 CPUStrict      int    `json:"cpu_strict,omitempty"`
-Priority       int    `json:"priority,omitempty"`
+Prio           int    `json:"prio,omitempty"`
 Poll           int    `json:"poll,omitempty"`
 CPUMaskBatch   string `json:"cpu_mask_batch,omitempty"`
 CPURangeBatch  string `json:"cpu_range_batch,omitempty"`
 CPUStrictBatch int    `json:"cpu_strict_batch,omitempty"`
-PriorityBatch  int    `json:"priority_batch,omitempty"`
+PrioBatch      int    `json:"prio_batch,omitempty"`
 PollBatch      int    `json:"poll_batch,omitempty"`
 CtxSize        int    `json:"ctx_size,omitempty"`
 Predict        int    `json:"predict,omitempty"`
@@ -83,7 +83,7 @@ type LlamaServerOptions struct {
 Seed        int     `json:"seed,omitempty"`
 SamplingSeq string  `json:"sampling_seq,omitempty"`
 IgnoreEOS   bool    `json:"ignore_eos,omitempty"`
-Temperature float64 `json:"temperature,omitempty"`
+Temperature float64 `json:"temp,omitempty"`
 TopK        int     `json:"top_k,omitempty"`
 TopP        float64 `json:"top_p,omitempty"`
 MinP        float64 `json:"min_p,omitempty"`
@@ -110,7 +110,7 @@ type LlamaServerOptions struct {
 JSONSchema     string `json:"json_schema,omitempty"`
 JSONSchemaFile string `json:"json_schema_file,omitempty"`

-// Server/Example-specific params
+// Example-specific params
 NoContextShift bool `json:"no_context_shift,omitempty"`
 Special        bool `json:"special,omitempty"`
 NoWarmup       bool `json:"no_warmup,omitempty"`
@@ -150,17 +150,15 @@ type LlamaServerOptions struct {
 NoPrefillAssistant   bool    `json:"no_prefill_assistant,omitempty"`
 SlotPromptSimilarity float64 `json:"slot_prompt_similarity,omitempty"`
 LoraInitWithoutApply bool    `json:"lora_init_without_apply,omitempty"`
-
-// Speculative decoding params
 DraftMax        int     `json:"draft_max,omitempty"`
 DraftMin        int     `json:"draft_min,omitempty"`
 DraftPMin       float64 `json:"draft_p_min,omitempty"`
 CtxSizeDraft    int     `json:"ctx_size_draft,omitempty"`
 DeviceDraft     string  `json:"device_draft,omitempty"`
 GPULayersDraft  int     `json:"gpu_layers_draft,omitempty"`
 ModelDraft      string  `json:"model_draft,omitempty"`
 CacheTypeKDraft string  `json:"cache_type_k_draft,omitempty"`
 CacheTypeVDraft string  `json:"cache_type_v_draft,omitempty"`

 // Audio/TTS params
 ModelVocoder string `json:"model_vocoder,omitempty"`
@@ -199,62 +197,75 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
 // Handle alternative field names
 fieldMappings := map[string]string{
-    // Official llama-server short forms from the documentation
+    // Common params
     "t": "threads", // -t, --threads N
     "tb": "threads_batch", // -tb, --threads-batch N
     "C": "cpu_mask", // -C, --cpu-mask M
     "Cr": "cpu_range", // -Cr, --cpu-range lo-hi
     "Cb": "cpu_mask_batch", // -Cb, --cpu-mask-batch M
     "Crb": "cpu_range_batch", // -Crb, --cpu-range-batch lo-hi
     "c": "ctx_size", // -c, --ctx-size N
-    "n": "predict", // -n, --predict, --n-predict N
+    "n": "predict", // -n, --predict N
+    "n-predict": "predict", // --n-predict N
     "b": "batch_size", // -b, --batch-size N
     "ub": "ubatch_size", // -ub, --ubatch-size N
     "fa": "flash_attn", // -fa, --flash-attn
     "e": "escape", // -e, --escape
     "dkvc": "dump_kv_cache", // -dkvc, --dump-kv-cache
     "nkvo": "no_kv_offload", // -nkvo, --no-kv-offload
     "ctk": "cache_type_k", // -ctk, --cache-type-k TYPE
     "ctv": "cache_type_v", // -ctv, --cache-type-v TYPE
     "dt": "defrag_thold", // -dt, --defrag-thold N
     "np": "parallel", // -np, --parallel N
     "dev": "device", // -dev, --device <dev1,dev2,..>
     "ot": "override_tensor", // --override-tensor, -ot
     "ngl": "gpu_layers", // -ngl, --gpu-layers, --n-gpu-layers N
+    "n-gpu-layers": "gpu_layers", // --n-gpu-layers N
     "sm": "split_mode", // -sm, --split-mode
     "ts": "tensor_split", // -ts, --tensor-split N0,N1,N2,...
     "mg": "main_gpu", // -mg, --main-gpu INDEX
     "m": "model", // -m, --model FNAME
     "mu": "model_url", // -mu, --model-url MODEL_URL
     "hf": "hf_repo", // -hf, -hfr, --hf-repo
     "hfr": "hf_repo", // -hf, -hfr, --hf-repo
     "hfd": "hf_repo_draft", // -hfd, -hfrd, --hf-repo-draft
     "hfrd": "hf_repo_draft", // -hfd, -hfrd, --hf-repo-draft
     "hff": "hf_file", // -hff, --hf-file FILE
     "hfv": "hf_repo_v", // -hfv, -hfrv, --hf-repo-v
     "hfrv": "hf_repo_v", // -hfv, -hfrv, --hf-repo-v
     "hffv": "hf_file_v", // -hffv, --hf-file-v FILE
     "hft": "hf_token", // -hft, --hf-token TOKEN
     "v": "verbose", // -v, --verbose, --log-verbose
+    "log-verbose": "verbose", // --log-verbose
     "lv": "verbosity", // -lv, --verbosity, --log-verbosity N
+    "log-verbosity": "verbosity", // --log-verbosity N
+
+    // Sampling params
     "s": "seed", // -s, --seed SEED
-    "temp": "temperature", // --temp N
     "l": "logit_bias", // -l, --logit-bias
     "j": "json_schema", // -j, --json-schema SCHEMA
     "jf": "json_schema_file", // -jf, --json-schema-file FILE
+
+    // Example-specific params
     "sp": "special", // -sp, --special
     "cb": "cont_batching", // -cb, --cont-batching
     "nocb": "no_cont_batching", // -nocb, --no-cont-batching
     "a": "alias", // -a, --alias STRING
+    "embeddings": "embedding", // --embeddings
+    "rerank": "reranking", // --reranking
     "to": "timeout", // -to, --timeout N
     "sps": "slot_prompt_similarity", // -sps, --slot-prompt-similarity
+    "draft": "draft-max", // -draft, --draft-max N
+    "draft-n": "draft-max", // --draft-n-max N
+    "draft-n-min": "draft_min", // --draft-n-min N
     "cd": "ctx_size_draft", // -cd, --ctx-size-draft N
     "devd": "device_draft", // -devd, --device-draft
     "ngld": "gpu_layers_draft", // -ngld, --gpu-layers-draft
+    "n-gpu-layers-draft": "gpu_layers_draft", // --n-gpu-layers-draft N
     "md": "model_draft", // -md, --model-draft FNAME
     "ctkd": "cache_type_k_draft", // -ctkd, --cache-type-k-draft TYPE
     "ctvd": "cache_type_v_draft", // -ctvd, --cache-type-v-draft TYPE
     "mv": "model_vocoder", // -mv, --model-vocoder FNAME
 }

 // Process alternative field names

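The renamed JSON tags (`temp`, `prio`) and the extra aliases above change what option payloads the backend accepts. A hedged sketch of request bodies that should now be treated as equivalent, assuming the alias mapping is applied before the options are validated (the request shape follows the README's create-instance example):

```typescript
// Sketch: three option payloads the updated field mapping should normalize to
// the same llama-server options. Keys and aliases are taken from the diff above;
// the surrounding request shape is an assumption based on the README.
const canonical = {
  model: "/path/to/model.gguf",
  gpu_layers: 32,
  temp: 0.7, // canonical key after this PR (was "temperature")
  prio: 2,   // canonical key after this PR (was "priority")
};

const withShortFlags = {
  model: "/path/to/model.gguf",
  ngl: 32,   // alias for gpu_layers (-ngl)
  temp: 0.7,
  prio: 2,
};

const withLongAliases = {
  model: "/path/to/model.gguf",
  "n-gpu-layers": 32, // newly added alias for gpu_layers
  temp: 0.7,
  prio: 2,
};

// Any of these could be sent as the body of
// POST /api/v1/instances/my-7b-model (see the README section above).
export { canonical, withShortFlags, withLongAliases };
```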
View File

@@ -109,13 +109,13 @@ func TestBuildCommandArgs_NumericFields(t *testing.T) {
 args := options.BuildCommandArgs()

 expectedPairs := map[string]string{
     "--port": "8080",
     "--threads": "4",
     "--ctx-size": "2048",
     "--gpu-layers": "16",
-    "--temperature": "0.7",
+    "--temp": "0.7",
     "--top-k": "40",
     "--top-p": "0.9",
 }

 for flag, expectedValue := range expectedPairs {
@@ -231,7 +231,7 @@ func TestUnmarshalJSON_StandardFields(t *testing.T) {
     "verbose": true,
     "ctx_size": 4096,
     "gpu_layers": 32,
-    "temperature": 0.7
+    "temp": 0.7
 }`

 var options llamacpp.LlamaServerOptions

View File

@@ -11,6 +11,7 @@ import {
   DialogTitle,
 } from '@/components/ui/dialog'
 import { Badge } from '@/components/ui/badge'
+import { instancesApi } from '@/lib/api'
 import {
   RefreshCw,
   Download,
@@ -46,48 +47,44 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
 const refreshIntervalRef = useRef<NodeJS.Timeout | null>(null)

 // Fetch logs function
-const fetchLogs = async (lines?: number) => {
-  if (!instanceName) return
-
-  setLoading(true)
-  setError(null)
-
-  try {
-    const params = lines ? `?lines=${lines}` : ''
-    const response = await fetch(`/api/v1/instances/${instanceName}/logs${params}`)
-
-    if (!response.ok) {
-      throw new Error(`Failed to fetch logs: ${response.status}`)
-    }
-
-    const logText = await response.text()
-    setLogs(logText)
-
-    // Auto-scroll to bottom
-    setTimeout(() => {
-      if (logContainerRef.current) {
-        logContainerRef.current.scrollTop = logContainerRef.current.scrollHeight
-      }
-    }, 100)
-  } catch (err) {
-    setError(err instanceof Error ? err.message : 'Failed to fetch logs')
-  } finally {
-    setLoading(false)
-  }
-}
+const fetchLogs = React.useCallback(
+  async (lines?: number) => {
+    if (!instanceName) return
+
+    setLoading(true)
+    setError(null)
+
+    try {
+      const logText = await instancesApi.getLogs(instanceName, lines)
+      setLogs(logText)
+
+      // Auto-scroll to bottom
+      setTimeout(() => {
+        if (logContainerRef.current) {
+          logContainerRef.current.scrollTop = logContainerRef.current.scrollHeight
+        }
+      }, 100)
+    } catch (err) {
+      setError(err instanceof Error ? err.message : 'Failed to fetch logs')
+    } finally {
+      setLoading(false)
+    }
+  },
+  [instanceName]
+)

 // Initial load when dialog opens
 useEffect(() => {
   if (open && instanceName) {
-    fetchLogs(lineCount)
+    void fetchLogs(lineCount)
   }
-}, [open, instanceName])
+}, [open, instanceName, fetchLogs, lineCount])

 // Auto-refresh effect
 useEffect(() => {
   if (autoRefresh && isRunning && open) {
     refreshIntervalRef.current = setInterval(() => {
-      fetchLogs(lineCount)
+      void fetchLogs(lineCount)
     }, 2000) // Refresh every 2 seconds
   } else {
     if (refreshIntervalRef.current) {
@@ -101,7 +98,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
       clearInterval(refreshIntervalRef.current)
     }
   }
-}, [autoRefresh, isRunning, open, lineCount])
+}, [autoRefresh, isRunning, open, lineCount, fetchLogs])

 // Copy logs to clipboard
 const copyLogs = async () => {
@@ -135,7 +132,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
 // Apply new line count
 const applyLineCount = () => {
-  fetchLogs(lineCount)
+  void fetchLogs(lineCount)
   setShowSettings(false)
 }
@@ -198,7 +195,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
 <Button
   variant="outline"
   size="sm"
-  onClick={() => fetchLogs(lineCount)}
+  onClick={() => void fetchLogs(lineCount)}
   disabled={loading}
 >
   {loading ? (
@@ -290,7 +287,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
 <div className="flex items-center gap-2 w-full">
   <Button
     variant="outline"
-    onClick={copyLogs}
+    onClick={() => void copyLogs()}
     disabled={!logs}
   >
     {copied ? (

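The 403 errors this PR fixes came from the dialog calling `fetch` directly, without the credentials the rest of the UI sends; routing through `instancesApi.getLogs` reuses the authenticated API client. The helper itself is not part of this diff, so the following is only a hypothetical sketch of what it might look like, with the header name and key storage assumed:

```typescript
// Hypothetical sketch of instancesApi.getLogs; the real implementation lives
// in @/lib/api and is not shown in this diff.
const API_BASE = "/api/v1";

// Assumed: the UI keeps the management API key in memory after it is configured.
let managementKey = "";
export function setManagementKey(key: string): void {
  managementKey = key;
}

export async function getLogs(instanceName: string, lines?: number): Promise<string> {
  const params = lines ? `?lines=${lines}` : "";
  const response = await fetch(`${API_BASE}/instances/${instanceName}/logs${params}`, {
    // Omitting this header is what produced the 403 Forbidden responses.
    headers: { Authorization: `Bearer ${managementKey}` },
  });
  if (!response.ok) {
    throw new Error(`Failed to fetch logs: ${response.status}`);
  }
  return response.text();
}
```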
View File

@@ -7,8 +7,8 @@ import { getFieldType, basicFieldsConfig } from '@/lib/zodFormUtils'
 interface ZodFormFieldProps {
   fieldKey: keyof CreateInstanceOptions
-  value: any
-  onChange: (key: keyof CreateInstanceOptions, value: any) => void
+  value: string | number | boolean | string[] | undefined
+  onChange: (key: keyof CreateInstanceOptions, value: string | number | boolean | string[] | undefined) => void
 }

 const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }) => {
@@ -18,7 +18,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
 // Get type from Zod schema
 const fieldType = getFieldType(fieldKey)

-const handleChange = (newValue: any) => {
+const handleChange = (newValue: string | number | boolean | string[] | undefined) => {
   onChange(fieldKey, newValue)
 }
@@ -29,7 +29,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
 <div className="flex items-center space-x-2">
   <Checkbox
     id={fieldKey}
-    checked={value || false}
+    checked={typeof value === 'boolean' ? value : false}
     onCheckedChange={(checked) => handleChange(checked)}
   />
   <Label htmlFor={fieldKey} className="text-sm font-normal">
@@ -51,10 +51,14 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
 <Input
   id={fieldKey}
   type="number"
-  value={value || ''}
+  step="any" // This allows decimal numbers
+  value={typeof value === 'string' || typeof value === 'number' ? value : ''}
   onChange={(e) => {
     const numValue = e.target.value ? parseFloat(e.target.value) : undefined
-    handleChange(numValue)
+    // Only update if the parsed value is valid or the input is empty
+    if (e.target.value === '' || (numValue !== undefined && !isNaN(numValue))) {
+      handleChange(numValue)
+    }
   }}
   placeholder={config.placeholder}
 />
@@ -101,7 +105,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
 <Input
   id={fieldKey}
   type="text"
-  value={value || ''}
+  value={typeof value === 'string' || typeof value === 'number' ? value : ''}
   onChange={(e) => handleChange(e.target.value || undefined)}
   placeholder={config.placeholder}
 />

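The decimal-input fix combines `step="any"` (so the browser accepts fractional values) with a guard that only commits input that parses cleanly. A small standalone sketch of that guard (a hypothetical helper, not code from the diff):

```typescript
// Hypothetical helper mirroring the guarded number-input onChange above.
function nextNumericValue(raw: string): { commit: boolean; value?: number } {
  const parsed = raw ? parseFloat(raw) : undefined;
  // Commit only when the field is empty (clearing it) or the parse yields a real number.
  const commit = raw === "" || (parsed !== undefined && !Number.isNaN(parsed));
  return commit ? { commit, value: parsed } : { commit: false };
}

nextNumericValue("0.7"); // { commit: true, value: 0.7 }       decimals now accepted
nextNumericValue("");    // { commit: true, value: undefined } clears the field
nextNumericValue("abc"); // { commit: false }                  invalid input is ignored
```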
View File

@@ -1,5 +1,4 @@
-import type { CreateInstanceOptions} from '@/schemas/instanceOptions';
-import { getAllFieldKeys } from '@/schemas/instanceOptions'
+import { type CreateInstanceOptions, getAllFieldKeys } from '@/schemas/instanceOptions'

 // Only define the basic fields we want to show by default
 export const basicFieldsConfig: Record<string, {

View File

@@ -14,12 +14,12 @@ export const CreateInstanceOptionsSchema = z.object({
   cpu_mask: z.string().optional(),
   cpu_range: z.string().optional(),
   cpu_strict: z.number().optional(),
-  priority: z.number().optional(),
+  prio: z.number().optional(),
   poll: z.number().optional(),
   cpu_mask_batch: z.string().optional(),
   cpu_range_batch: z.string().optional(),
   cpu_strict_batch: z.number().optional(),
-  priority_batch: z.number().optional(),
+  prio_batch: z.number().optional(),
   poll_batch: z.number().optional(),
   ctx_size: z.number().optional(),
   predict: z.number().optional(),
@@ -82,7 +82,7 @@ export const CreateInstanceOptionsSchema = z.object({
   seed: z.number().optional(),
   sampling_seq: z.string().optional(),
   ignore_eos: z.boolean().optional(),
-  temperature: z.number().optional(),
+  temp: z.number().optional(),
   top_k: z.number().optional(),
   top_p: z.number().optional(),
   min_p: z.number().optional(),
@@ -109,7 +109,7 @@ export const CreateInstanceOptionsSchema = z.object({
   json_schema: z.string().optional(),
   json_schema_file: z.string().optional(),

-  // Server/Example-specific params
+  // Example-specific params
   no_context_shift: z.boolean().optional(),
   special: z.boolean().optional(),
   no_warmup: z.boolean().optional(),
@@ -149,8 +149,6 @@ export const CreateInstanceOptionsSchema = z.object({
   no_prefill_assistant: z.boolean().optional(),
   slot_prompt_similarity: z.number().optional(),
   lora_init_without_apply: z.boolean().optional(),
-
-  // Speculative decoding params
   draft_max: z.number().optional(),
   draft_min: z.number().optional(),
   draft_p_min: z.number().optional(),
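With these renames, options built in the web UI validate against the same keys the backend now expects. A hedged usage sketch (the schema name and import path come from this diff; the option values are examples):

```typescript
// Sketch: validating create-instance options against the updated Zod schema.
import { CreateInstanceOptionsSchema } from "@/schemas/instanceOptions";

const options = CreateInstanceOptionsSchema.parse({
  ctx_size: 4096,
  gpu_layers: 32,
  temp: 0.7, // renamed from "temperature"
  prio: 2,   // renamed from "priority"
});

// The old keys are no longer part of the schema; a default (non-strict)
// z.object() would silently strip them rather than error.
console.log(options);
```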