Merge pull request #17 from lordmathis/fix/forbidden-logs

fix: Refactor log fetching to use instancesApi
Refactor log fetching to use instancesApi
2025-11-06 09:04:27 +00:00 · 2025-08-06 19:12:34 +02:00 · 2025-08-06 19:07:25 +02:00 · 2025-08-06 18:51:18 +02:00 · 2025-08-06 18:49:36 +02:00 · 2025-08-06 18:40:05 +02:00
7 changed files with 241 additions and 329 deletions
--- a/README.md
+++ b/README.md
@@ -2,90 +2,132 @@

 ![Build and Release](https://github.com/lordmathis/llamactl/actions/workflows/release.yaml/badge.svg) ![Go Tests](https://github.com/lordmathis/llamactl/actions/workflows/go_test.yaml/badge.svg) ![WebUI Tests](https://github.com/lordmathis/llamactl/actions/workflows/webui_test.yaml/badge.svg)

-A control server for managing multiple Llama Server instances with a web-based dashboard.
+**Management server for multiple llama.cpp instances with OpenAI-compatible API routing.**

-## Features
+## Why llamactl?

- **Multi-instance Management**: Create, start, stop, restart, and delete multiple llama-server instances
- **Web Dashboard**: Modern React-based UI for managing instances
- **Auto-restart**: Configurable automatic restart on instance failure
- **Instance Monitoring**: Real-time health checks and status monitoring
- **Log Management**: View, search, and download instance logs
- **Data Persistence**: Persistent storage of instance state.
- **REST API**: Full API for programmatic control
- **OpenAI Compatible**: Route requests to instances by instance name
- **Configuration Management**: Comprehensive llama-server parameter support
- **System Information**: View llama-server version, devices, and help
- **API Key Authentication**: Secure access with separate management and inference keys
+🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)  
+🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name  
+🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)  
+🔐 **API Key Authentication**: Separate keys for management vs inference access  
+📊 **Instance Monitoring**: Health checks, auto-restart, log management  
+⚡ **Persistent State**: Instances survive server restarts

-## Prerequisites
+**Choose llamactl if**: You need authentication, health monitoring, auto-restart, and centralized management of multiple llama-server instances  
+**Choose Ollama if**: You want the simplest setup with strong community ecosystem and third-party integrations  
+**Choose LM Studio if**: You prefer a polished desktop GUI experience with easy model management

-This project requires `llama-server` from llama.cpp to be installed and available in your PATH.
+## Quick Start

-**Install llama.cpp:**
-Follow the installation instructions at https://github.com/ggml-org/llama.cpp
+```bash
+# 1. Install llama-server (one-time setup)
+# See: https://github.com/ggml-org/llama.cpp#quick-start
+
+# 2. Download and run llamactl
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz | tar -xz
+sudo mv llamactl /usr/local/bin/
+
+# 3. Start the server
+llamactl
+# Access dashboard at http://localhost:8080
+```
+
+## Usage
+
+### Create and manage instances via web dashboard:
+1. Open http://localhost:8080
+2. Click "Create Instance"
+3. Set model path and GPU layers
+4. Start or stop the instance
+
+### Or use the REST API:
+```bash
+# Create instance
+curl -X POST localhost:8080/api/v1/instances/my-7b-model \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "/path/to/model.gguf", "gpu_layers": 32}'
+
+# Call the OpenAI-compatible endpoint (the model field selects the instance)
+curl -X POST localhost:8080/v1/chat/completions \
+  -H "Authorization: Bearer your-key" \
+  -d '{"model": "my-7b-model", "messages": [{"role": "user", "content": "Hello!"}]}'
+```

 ## Installation

-### Download Prebuilt Binaries
+### Option 1: Download Binary (Recommended)

-The easiest way to install llamactl is to download a prebuilt binary from the [releases page](https://github.com/lordmathis/llamactl/releases).
-
-**Linux/macOS:**
 ```bash
-# Download the latest release for your platform
-curl -L https://github.com/lordmathis/llamactl/releases/latest/download/llamactl-$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep tag_name | cut -d '"' -f 4)-linux-amd64.tar.gz | tar -xz
-
-# Move to PATH
+# Linux/macOS - Get latest version and download
+LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
 sudo mv llamactl /usr/local/bin/

-# Run the server
-llamactl
+# Or download manually from the releases page:
+# https://github.com/lordmathis/llamactl/releases/latest
+
+# Windows - Download from releases page
 ```

-**Manual Download:**
-1. Go to the [releases page](https://github.com/lordmathis/llamactl/releases)
-2. Download the appropriate archive for your platform
-3. Extract the archive and move the binary to a directory in your PATH
-
-### Build from Source
-
-If you prefer to build from source or need the latest development version:
-
-#### Build Requirements
-
- Go 1.24 or later
- Node.js 22 or later (for building the web UI)
-
-#### Building with Web UI
-
+### Option 2: Build from Source
+Requires Go 1.24+ and Node.js 22+
 ```bash
-# Clone the repository
 git clone https://github.com/lordmathis/llamactl.git
 cd llamactl
-
-# Install Node.js dependencies
-cd webui
-npm ci
-
-# Build the web UI
-npm run build
-
-# Return to project root and build
-cd ..
+cd webui && npm ci && npm run build && cd ..
 go build -o llamactl ./cmd/server
+```

-# Run the server
-./llamactl
+## Prerequisites
+
+You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
+
+```bash
+# Quick install methods:
+# Homebrew (macOS)
+brew install llama.cpp
+
+# Or build from source - see llama.cpp docs
 ```

 ## Configuration

-llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
+llamactl works out of the box with sensible defaults.

-1. Hardcoded defaults
-2. Configuration file
-3. Environment variables
+```yaml
+server:
+  host: "0.0.0.0"                # Server host to bind to
+  port: 8080                     # Server port to bind to
+  allowed_origins: ["*"]         # Allowed CORS origins (default: all)
+  enable_swagger: false          # Enable Swagger UI for API docs
+
+instances:
+  port_range: [8000, 9000]       # Port range for instances
+  data_dir: ~/.local/share/llamactl         # Data directory (platform-specific, see below)
+  configs_dir: ~/.local/share/llamactl/instances  # Instance configs directory
+  logs_dir: ~/.local/share/llamactl/logs    # Logs directory
+  auto_create_dirs: true         # Auto-create data/config/logs dirs if missing
+  max_instances: -1              # Max instances (-1 = unlimited)
+  llama_executable: llama-server # Path to llama-server executable
+  default_auto_restart: true     # Auto-restart new instances by default
+  default_max_restarts: 3        # Max restarts for new instances
+  default_restart_delay: 5       # Restart delay (seconds) for new instances
+
+auth:
+  require_inference_auth: true   # Require auth for inference endpoints
+  inference_keys: []             # Keys for inference endpoints
+  require_management_auth: true  # Require auth for management endpoints
+  management_keys: []            # Keys for management endpoints
+```
+
+<details><summary><strong>Full Configuration Guide</strong></summary>
+
+llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:  
+
+```
+Defaults < Configuration file < Environment variables
+```

 ### Configuration Files

@@ -168,147 +210,8 @@ auth:
 - `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
 - `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys

-### Example Configuration
-
-```yaml
-server:
-  host: "0.0.0.0"
-  port: 8080
-
-instances:
-  port_range: [8001, 8100]
-  data_dir: "/var/lib/llamactl"
-  configs_dir: "/var/lib/llamactl/instances"
-  logs_dir: "/var/log/llamactl"
-  auto_create_dirs: true
-  max_instances: 10
-  llama_executable: "/usr/local/bin/llama-server"
-  default_auto_restart: true
-  default_max_restarts: 5
-  default_restart_delay: 10
-
-auth:
-  require_inference_auth: true
-  inference_keys: ["sk-inference-abc123"]
-  require_management_auth: true
-  management_keys: ["sk-management-xyz456"]
-```
-
-## Usage
-
-### Starting the Server
-
-```bash
-# Start with default configuration
-./llamactl
-
-# Start with custom config file
-LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl
-
-# Start with environment variables
-LLAMACTL_PORT=9090 LLAMACTL_LOG_DIR=/custom/logs ./llamactl
-```
-
-### Authentication
-
-llamactl supports API Key authentication for both management and inference (OpenAI-compatible) endpoints. There are separate keys for management and inference APIs:
-
- **Management keys** grant full access to instance management
- **Inference keys** grant access to OpenAI-compatible endpoints
- Management keys also work for inference endpoints (higher privilege)
-
-**How to Use:**
-Pass your API key in requests using one of:
- `Authorization: Bearer <key>` header
- `X-API-Key: <key>` header
- `api_key=<key>` query parameter
-
-**Auto-generated keys**: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
-
-### Web Dashboard
-
-Open your browser and navigate to `http://localhost:8080` to access the web dashboard.
-
-### API Usage
-
-The REST API is available at `http://localhost:8080/api/v1`. See the Swagger documentation at `http://localhost:8080/swagger/` for complete API reference.
-
-#### Create an Instance
-
-```bash
-curl -X POST http://localhost:8080/api/v1/instances/my-instance \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-management-your-key" \
-  -d '{
-    "model": "/path/to/model.gguf",
-    "gpu_layers": 32,
-    "auto_restart": true
-  }'
-```
-
-#### List Instances
-
-```bash
-curl -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances
-```
-
-#### Start/Stop Instance
-
-```bash
-# Start
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/start
-
-# Stop
-curl -X POST \
-  -H "Authorization: Bearer sk-management-your-key" \
-  http://localhost:8080/api/v1/instances/my-instance/stop
-```
-
-### OpenAI Compatible Endpoints
-
-Route requests to instances by including the instance name as the model parameter:
-
-```bash
-curl -X POST http://localhost:8080/v1/chat/completions \
-  -H "Content-Type: application/json" \
-  -H "Authorization: Bearer sk-inference-your-key" \
-  -d '{
-    "model": "my-instance",
-    "messages": [{"role": "user", "content": "Hello!"}]
-  }'
-```
-
-## Development
-
-### Running Tests
-
-```bash
-# Go tests
-go test ./...
-
-# Web UI tests
-cd webui
-npm test
-```
-
-### Development Server
-
-```bash
-# Start Go server in development mode
-go run ./cmd/server
-
-# Start web UI development server (in another terminal)
-cd webui
-npm run dev
-```
-
-## API Documentation
-
-Interactive API documentation is available at `http://localhost:8080/swagger/` when the server is running.
+</details>

 ## License

-This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
+MIT License - see [LICENSE](LICENSE) file.
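Note on the README hunk above: it states the configuration precedence as `Defaults < Configuration file < Environment variables`. The following is a minimal Go sketch of that layering, not llamactl's actual loader. `LLAMACTL_PORT` appears in the README; `Config`, `FileConfig` and `Resolve` are hypothetical names introduced only for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// Config holds the two server settings used in this sketch.
type Config struct {
	Host string
	Port int
}

// FileConfig mirrors Config with pointer fields so that "not set in the
// file" can be told apart from a zero value. (Hypothetical type.)
type FileConfig struct {
	Host *string
	Port *int
}

// Resolve applies the precedence: defaults < configuration file < environment.
// getenv is injected (os.Getenv in real use) so the function is easy to test.
func Resolve(defaults Config, file FileConfig, getenv func(string) string) Config {
	cfg := defaults

	// Layer 2: values present in the configuration file override defaults.
	if file.Host != nil {
		cfg.Host = *file.Host
	}
	if file.Port != nil {
		cfg.Port = *file.Port
	}

	// Layer 3: environment variables override everything else.
	if v := getenv("LLAMACTL_PORT"); v != "" {
		// An unparsable port is ignored in this sketch rather than treated as fatal.
		if p, err := strconv.Atoi(v); err == nil {
			cfg.Port = p
		}
	}
	return cfg
}

func main() {
	filePort := 9000
	env := map[string]string{"LLAMACTL_PORT": "9090"}
	cfg := Resolve(
		Config{Host: "0.0.0.0", Port: 8080},
		FileConfig{Port: &filePort},
		func(k string) string { return env[k] },
	)
	fmt.Printf("%s:%d\n", cfg.Host, cfg.Port)
}
```

Here the host comes from the defaults, while the port set in the file (9000) is overridden by the environment value (9090). Pointer fields in `FileConfig` are one way to distinguish "unset" from an explicit zero; a real loader may handle this differently.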
--- a/pkg/backends/llamacpp/llama.go
+++ b/pkg/backends/llamacpp/llama.go
@@ -15,12 +15,12 @@ type LlamaServerOptions struct {
 	CPUMask                 string   `json:"cpu_mask,omitempty"`
 	CPURange                string   `json:"cpu_range,omitempty"`
 	CPUStrict               int      `json:"cpu_strict,omitempty"`
-	Priority                int      `json:"priority,omitempty"`
+	Prio                    int      `json:"prio,omitempty"`
 	Poll                    int      `json:"poll,omitempty"`
 	CPUMaskBatch            string   `json:"cpu_mask_batch,omitempty"`
 	CPURangeBatch           string   `json:"cpu_range_batch,omitempty"`
 	CPUStrictBatch          int      `json:"cpu_strict_batch,omitempty"`
-	PriorityBatch           int      `json:"priority_batch,omitempty"`
+	PrioBatch               int      `json:"prio_batch,omitempty"`
 	PollBatch               int      `json:"poll_batch,omitempty"`
 	CtxSize                 int      `json:"ctx_size,omitempty"`
 	Predict                 int      `json:"predict,omitempty"`
@@ -83,7 +83,7 @@ type LlamaServerOptions struct {
 	Seed               int      `json:"seed,omitempty"`
 	SamplingSeq        string   `json:"sampling_seq,omitempty"`
 	IgnoreEOS          bool     `json:"ignore_eos,omitempty"`
-	Temperature        float64  `json:"temperature,omitempty"`
+	Temperature        float64  `json:"temp,omitempty"`
 	TopK               int      `json:"top_k,omitempty"`
 	TopP               float64  `json:"top_p,omitempty"`
 	MinP               float64  `json:"min_p,omitempty"`
@@ -110,7 +110,7 @@ type LlamaServerOptions struct {
 	JSONSchema         string   `json:"json_schema,omitempty"`
 	JSONSchemaFile     string   `json:"json_schema_file,omitempty"`

-	// Server/Example-specific params
+	// Example-specific params
 	NoContextShift       bool    `json:"no_context_shift,omitempty"`
 	Special              bool    `json:"special,omitempty"`
 	NoWarmup             bool    `json:"no_warmup,omitempty"`
@@ -150,17 +150,15 @@ type LlamaServerOptions struct {
 	NoPrefillAssistant   bool    `json:"no_prefill_assistant,omitempty"`
 	SlotPromptSimilarity float64 `json:"slot_prompt_similarity,omitempty"`
 	LoraInitWithoutApply bool    `json:"lora_init_without_apply,omitempty"`
-
-	// Speculative decoding params
-	DraftMax        int     `json:"draft_max,omitempty"`
-	DraftMin        int     `json:"draft_min,omitempty"`
-	DraftPMin       float64 `json:"draft_p_min,omitempty"`
-	CtxSizeDraft    int     `json:"ctx_size_draft,omitempty"`
-	DeviceDraft     string  `json:"device_draft,omitempty"`
-	GPULayersDraft  int     `json:"gpu_layers_draft,omitempty"`
-	ModelDraft      string  `json:"model_draft,omitempty"`
-	CacheTypeKDraft string  `json:"cache_type_k_draft,omitempty"`
-	CacheTypeVDraft string  `json:"cache_type_v_draft,omitempty"`
+	DraftMax             int     `json:"draft_max,omitempty"`
+	DraftMin             int     `json:"draft_min,omitempty"`
+	DraftPMin            float64 `json:"draft_p_min,omitempty"`
+	CtxSizeDraft         int     `json:"ctx_size_draft,omitempty"`
+	DeviceDraft          string  `json:"device_draft,omitempty"`
+	GPULayersDraft       int     `json:"gpu_layers_draft,omitempty"`
+	ModelDraft           string  `json:"model_draft,omitempty"`
+	CacheTypeKDraft      string  `json:"cache_type_k_draft,omitempty"`
+	CacheTypeVDraft      string  `json:"cache_type_v_draft,omitempty"`

 	// Audio/TTS params
 	ModelVocoder      string `json:"model_vocoder,omitempty"`
@@ -199,62 +197,75 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {

 	// Handle alternative field names
 	fieldMappings := map[string]string{
-		// Official llama-server short forms from the documentation
-		"t":    "threads",                // -t, --threads N
-		"tb":   "threads_batch",          // -tb, --threads-batch N
-		"C":    "cpu_mask",               // -C, --cpu-mask M
-		"Cr":   "cpu_range",              // -Cr, --cpu-range lo-hi
-		"Cb":   "cpu_mask_batch",         // -Cb, --cpu-mask-batch M
-		"Crb":  "cpu_range_batch",        // -Crb, --cpu-range-batch lo-hi
-		"c":    "ctx_size",               // -c, --ctx-size N
-		"n":    "predict",                // -n, --predict, --n-predict N
-		"b":    "batch_size",             // -b, --batch-size N
-		"ub":   "ubatch_size",            // -ub, --ubatch-size N
-		"fa":   "flash_attn",             // -fa, --flash-attn
-		"e":    "escape",                 // -e, --escape
-		"dkvc": "dump_kv_cache",          // -dkvc, --dump-kv-cache
-		"nkvo": "no_kv_offload",          // -nkvo, --no-kv-offload
-		"ctk":  "cache_type_k",           // -ctk, --cache-type-k TYPE
-		"ctv":  "cache_type_v",           // -ctv, --cache-type-v TYPE
-		"dt":   "defrag_thold",           // -dt, --defrag-thold N
-		"np":   "parallel",               // -np, --parallel N
-		"dev":  "device",                 // -dev, --device <dev1,dev2,..>
-		"ot":   "override_tensor",        // --override-tensor, -ot
-		"ngl":  "gpu_layers",             // -ngl, --gpu-layers, --n-gpu-layers N
-		"sm":   "split_mode",             // -sm, --split-mode
-		"ts":   "tensor_split",           // -ts, --tensor-split N0,N1,N2,...
-		"mg":   "main_gpu",               // -mg, --main-gpu INDEX
-		"m":    "model",                  // -m, --model FNAME
-		"mu":   "model_url",              // -mu, --model-url MODEL_URL
-		"hf":   "hf_repo",                // -hf, -hfr, --hf-repo
-		"hfr":  "hf_repo",                // -hf, -hfr, --hf-repo
-		"hfd":  "hf_repo_draft",          // -hfd, -hfrd, --hf-repo-draft
-		"hfrd": "hf_repo_draft",          // -hfd, -hfrd, --hf-repo-draft
-		"hff":  "hf_file",                // -hff, --hf-file FILE
-		"hfv":  "hf_repo_v",              // -hfv, -hfrv, --hf-repo-v
-		"hfrv": "hf_repo_v",              // -hfv, -hfrv, --hf-repo-v
-		"hffv": "hf_file_v",              // -hffv, --hf-file-v FILE
-		"hft":  "hf_token",               // -hft, --hf-token TOKEN
-		"v":    "verbose",                // -v, --verbose, --log-verbose
-		"lv":   "verbosity",              // -lv, --verbosity, --log-verbosity N
-		"s":    "seed",                   // -s, --seed SEED
-		"temp": "temperature",            // --temp N
-		"l":    "logit_bias",             // -l, --logit-bias
-		"j":    "json_schema",            // -j, --json-schema SCHEMA
-		"jf":   "json_schema_file",       // -jf, --json-schema-file FILE
-		"sp":   "special",                // -sp, --special
-		"cb":   "cont_batching",          // -cb, --cont-batching
-		"nocb": "no_cont_batching",       // -nocb, --no-cont-batching
-		"a":    "alias",                  // -a, --alias STRING
-		"to":   "timeout",                // -to, --timeout N
-		"sps":  "slot_prompt_similarity", // -sps, --slot-prompt-similarity
-		"cd":   "ctx_size_draft",         // -cd, --ctx-size-draft N
-		"devd": "device_draft",           // -devd, --device-draft
-		"ngld": "gpu_layers_draft",       // -ngld, --gpu-layers-draft
-		"md":   "model_draft",            // -md, --model-draft FNAME
-		"ctkd": "cache_type_k_draft",     // -ctkd, --cache-type-k-draft TYPE
-		"ctvd": "cache_type_v_draft",     // -ctvd, --cache-type-v-draft TYPE
-		"mv":   "model_vocoder",          // -mv, --model-vocoder FNAME
+		// Common params
+		"t":             "threads",         // -t, --threads N
+		"tb":            "threads_batch",   // -tb, --threads-batch N
+		"C":             "cpu_mask",        // -C, --cpu-mask M
+		"Cr":            "cpu_range",       // -Cr, --cpu-range lo-hi
+		"Cb":            "cpu_mask_batch",  // -Cb, --cpu-mask-batch M
+		"Crb":           "cpu_range_batch", // -Crb, --cpu-range-batch lo-hi
+		"c":             "ctx_size",        // -c, --ctx-size N
+		"n":             "predict",         // -n, --predict N
+		"n-predict":     "predict",         // --n-predict N
+		"b":             "batch_size",      // -b, --batch-size N
+		"ub":            "ubatch_size",     // -ub, --ubatch-size N
+		"fa":            "flash_attn",      // -fa, --flash-attn
+		"e":             "escape",          // -e, --escape
+		"dkvc":          "dump_kv_cache",   // -dkvc, --dump-kv-cache
+		"nkvo":          "no_kv_offload",   // -nkvo, --no-kv-offload
+		"ctk":           "cache_type_k",    // -ctk, --cache-type-k TYPE
+		"ctv":           "cache_type_v",    // -ctv, --cache-type-v TYPE
+		"dt":            "defrag_thold",    // -dt, --defrag-thold N
+		"np":            "parallel",        // -np, --parallel N
+		"dev":           "device",          // -dev, --device <dev1,dev2,..>
+		"ot":            "override_tensor", // --override-tensor, -ot
+		"ngl":           "gpu_layers",      // -ngl, --gpu-layers, --n-gpu-layers N
+		"n-gpu-layers":  "gpu_layers",      // --n-gpu-layers N
+		"sm":            "split_mode",      // -sm, --split-mode
+		"ts":            "tensor_split",    // -ts, --tensor-split N0,N1,N2,...
+		"mg":            "main_gpu",        // -mg, --main-gpu INDEX
+		"m":             "model",           // -m, --model FNAME
+		"mu":            "model_url",       // -mu, --model-url MODEL_URL
+		"hf":            "hf_repo",         // -hf, -hfr, --hf-repo
+		"hfr":           "hf_repo",         // -hf, -hfr, --hf-repo
+		"hfd":           "hf_repo_draft",   // -hfd, -hfrd, --hf-repo-draft
+		"hfrd":          "hf_repo_draft",   // -hfd, -hfrd, --hf-repo-draft
+		"hff":           "hf_file",         // -hff, --hf-file FILE
+		"hfv":           "hf_repo_v",       // -hfv, -hfrv, --hf-repo-v
+		"hfrv":          "hf_repo_v",       // -hfv, -hfrv, --hf-repo-v
+		"hffv":          "hf_file_v",       // -hffv, --hf-file-v FILE
+		"hft":           "hf_token",        // -hft, --hf-token TOKEN
+		"v":             "verbose",         // -v, --verbose, --log-verbose
+		"log-verbose":   "verbose",         // --log-verbose
+		"lv":            "verbosity",       // -lv, --verbosity, --log-verbosity N
+		"log-verbosity": "verbosity",       // --log-verbosity N
+
+		// Sampling params
+		"s":  "seed",             // -s, --seed SEED
+		"l":  "logit_bias",       // -l, --logit-bias
+		"j":  "json_schema",      // -j, --json-schema SCHEMA
+		"jf": "json_schema_file", // -jf, --json-schema-file FILE
+
+		// Example-specific params
+		"sp":                 "special",                // -sp, --special
+		"cb":                 "cont_batching",          // -cb, --cont-batching
+		"nocb":               "no_cont_batching",       // -nocb, --no-cont-batching
+		"a":                  "alias",                  // -a, --alias STRING
+		"embeddings":         "embedding",              // --embeddings
+		"rerank":             "reranking",              // --rerank
+		"to":                 "timeout",                // -to, --timeout N
+		"sps":                "slot_prompt_similarity", // -sps, --slot-prompt-similarity
+		"draft":              "draft_max",              // -draft, --draft-max N
+		"draft-n":            "draft_max",              // --draft-n N
+		"draft-n-min":        "draft_min",              // --draft-n-min N
+		"cd":                 "ctx_size_draft",         // -cd, --ctx-size-draft N
+		"devd":               "device_draft",           // -devd, --device-draft
+		"ngld":               "gpu_layers_draft",       // -ngld, --gpu-layers-draft
+		"n-gpu-layers-draft": "gpu_layers_draft",       // --n-gpu-layers-draft N
+		"md":                 "model_draft",            // -md, --model-draft FNAME
+		"ctkd":               "cache_type_k_draft",     // -ctkd, --cache-type-k-draft TYPE
+		"ctvd":               "cache_type_v_draft",     // -ctvd, --cache-type-v-draft TYPE
+		"mv":                 "model_vocoder",          // -mv, --model-vocoder FNAME
 	}

 	// Process alternative field names
--- a/pkg/backends/llamacpp/llama_test.go
+++ b/pkg/backends/llamacpp/llama_test.go
@@ -109,13 +109,13 @@ func TestBuildCommandArgs_NumericFields(t *testing.T) {
 	args := options.BuildCommandArgs()

 	expectedPairs := map[string]string{
-		"--port":        "8080",
-		"--threads":     "4",
-		"--ctx-size":    "2048",
-		"--gpu-layers":  "16",
-		"--temperature": "0.7",
-		"--top-k":       "40",
-		"--top-p":       "0.9",
+		"--port":       "8080",
+		"--threads":    "4",
+		"--ctx-size":   "2048",
+		"--gpu-layers": "16",
+		"--temp":       "0.7",
+		"--top-k":      "40",
+		"--top-p":      "0.9",
 	}

 	for flag, expectedValue := range expectedPairs {
@@ -231,7 +231,7 @@ func TestUnmarshalJSON_StandardFields(t *testing.T) {
 		"verbose": true,
 		"ctx_size": 4096,
 		"gpu_layers": 32,
-		"temperature": 0.7
+		"temp": 0.7
 	}`

 	var options llamacpp.LlamaServerOptions
--- a/webui/src/components/LogDialog.tsx
+++ b/webui/src/components/LogDialog.tsx
@@ -11,6 +11,7 @@ import {
  DialogTitle,
 } from '@/components/ui/dialog'
 import { Badge } from '@/components/ui/badge'
+import { instancesApi } from '@/lib/api'
 import { 
  RefreshCw, 
  Download, 
@@ -46,48 +47,44 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
  const refreshIntervalRef = useRef<NodeJS.Timeout | null>(null)

  // Fetch logs function
-  const fetchLogs = async (lines?: number) => {
-    if (!instanceName) return
-    
-    setLoading(true)
-    setError(null)
-    
-    try {
-      const params = lines ? `?lines=${lines}` : ''
-      const response = await fetch(`/api/v1/instances/${instanceName}/logs${params}`)
+  const fetchLogs = React.useCallback(
+    async (lines?: number) => {
+      if (!instanceName) return
      
-      if (!response.ok) {
-        throw new Error(`Failed to fetch logs: ${response.status}`)
+      setLoading(true)
+      setError(null)
+      
+      try {
+        const logText = await instancesApi.getLogs(instanceName, lines)
+        setLogs(logText)
+        
+        // Auto-scroll to bottom
+        setTimeout(() => {
+          if (logContainerRef.current) {
+            logContainerRef.current.scrollTop = logContainerRef.current.scrollHeight
+          }
+        }, 100)
+      } catch (err) {
+        setError(err instanceof Error ? err.message : 'Failed to fetch logs')
+      } finally {
+        setLoading(false)
      }
-      
-      const logText = await response.text()
-      setLogs(logText)
-      
-      // Auto-scroll to bottom
-      setTimeout(() => {
-        if (logContainerRef.current) {
-          logContainerRef.current.scrollTop = logContainerRef.current.scrollHeight
-        }
-      }, 100)
-    } catch (err) {
-      setError(err instanceof Error ? err.message : 'Failed to fetch logs')
-    } finally {
-      setLoading(false)
-    }
-  }
+    },
+    [instanceName]
+  )

  // Initial load when dialog opens
  useEffect(() => {
    if (open && instanceName) {
-      fetchLogs(lineCount)
+      void fetchLogs(lineCount)
    }
-  }, [open, instanceName])
+  }, [open, instanceName, fetchLogs, lineCount])

  // Auto-refresh effect
  useEffect(() => {
    if (autoRefresh && isRunning && open) {
      refreshIntervalRef.current = setInterval(() => {
-        fetchLogs(lineCount)
+        void fetchLogs(lineCount)
      }, 2000) // Refresh every 2 seconds
    } else {
      if (refreshIntervalRef.current) {
@@ -101,7 +98,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
        clearInterval(refreshIntervalRef.current)
      }
    }
-  }, [autoRefresh, isRunning, open, lineCount])
+  }, [autoRefresh, isRunning, open, lineCount, fetchLogs])

  // Copy logs to clipboard
  const copyLogs = async () => {
@@ -135,7 +132,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({

  // Apply new line count
  const applyLineCount = () => {
-    fetchLogs(lineCount)
+    void fetchLogs(lineCount)
    setShowSettings(false)
  }

@@ -198,7 +195,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
              <Button
                variant="outline"
                size="sm"
-                onClick={() => fetchLogs(lineCount)}
+                onClick={() => void fetchLogs(lineCount)}
                disabled={loading}
              >
                {loading ? (
@@ -290,7 +287,7 @@ const LogsDialog: React.FC<LogsDialogProps> = ({
          <div className="flex items-center gap-2 w-full">
            <Button
              variant="outline"
-              onClick={copyLogs}
+              onClick={() => void copyLogs()}
              disabled={!logs}
            >
              {copied ? (
--- a/webui/src/components/ZodFormField.tsx
+++ b/webui/src/components/ZodFormField.tsx
@@ -7,8 +7,8 @@ import { getFieldType, basicFieldsConfig } from '@/lib/zodFormUtils'

 interface ZodFormFieldProps {
  fieldKey: keyof CreateInstanceOptions
-  value: any
-  onChange: (key: keyof CreateInstanceOptions, value: any) => void
+  value: string | number | boolean | string[] | undefined
+  onChange: (key: keyof CreateInstanceOptions, value: string | number | boolean | string[] | undefined) => void
 }

 const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }) => {
@@ -18,7 +18,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
  // Get type from Zod schema
  const fieldType = getFieldType(fieldKey)

-  const handleChange = (newValue: any) => {
+  const handleChange = (newValue: string | number | boolean | string[] | undefined) => {
    onChange(fieldKey, newValue)
  }

@@ -29,7 +29,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
          <div className="flex items-center space-x-2">
            <Checkbox
              id={fieldKey}
-              checked={value || false}
+              checked={typeof value === 'boolean' ? value : false}
              onCheckedChange={(checked) => handleChange(checked)}
            />
            <Label htmlFor={fieldKey} className="text-sm font-normal">
@@ -51,10 +51,14 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
            <Input
              id={fieldKey}
              type="number"
-              value={value || ''}
+              step="any" // This allows decimal numbers
+              value={typeof value === 'string' || typeof value === 'number' ? value : ''}
              onChange={(e) => {
                const numValue = e.target.value ? parseFloat(e.target.value) : undefined
-                handleChange(numValue)
+                // Only update if the parsed value is valid or the input is empty
+                if (e.target.value === '' || (numValue !== undefined && !isNaN(numValue))) {
+                  handleChange(numValue)
+                }
              }}
              placeholder={config.placeholder}
            />
@@ -101,7 +105,7 @@ const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }
            <Input
              id={fieldKey}
              type="text"
-              value={value || ''}
+              value={typeof value === 'string' || typeof value === 'number' ? value : ''}
              onChange={(e) => handleChange(e.target.value || undefined)}
              placeholder={config.placeholder}
            />
--- a/webui/src/lib/zodFormUtils.ts
+++ b/webui/src/lib/zodFormUtils.ts
@@ -1,5 +1,4 @@
-import type { CreateInstanceOptions} from '@/schemas/instanceOptions';
-import { getAllFieldKeys } from '@/schemas/instanceOptions'
+import { type CreateInstanceOptions, getAllFieldKeys } from '@/schemas/instanceOptions'

 // Only define the basic fields we want to show by default
 export const basicFieldsConfig: Record<string, { 
--- a/webui/src/schemas/instanceOptions.ts
+++ b/webui/src/schemas/instanceOptions.ts
@@ -14,12 +14,12 @@ export const CreateInstanceOptionsSchema = z.object({
  cpu_mask: z.string().optional(),
  cpu_range: z.string().optional(),
  cpu_strict: z.number().optional(),
-  priority: z.number().optional(),
+  prio: z.number().optional(),
  poll: z.number().optional(),
  cpu_mask_batch: z.string().optional(),
  cpu_range_batch: z.string().optional(),
  cpu_strict_batch: z.number().optional(),
-  priority_batch: z.number().optional(),
+  prio_batch: z.number().optional(),
  poll_batch: z.number().optional(),
  ctx_size: z.number().optional(),
  predict: z.number().optional(),
@@ -82,7 +82,7 @@ export const CreateInstanceOptionsSchema = z.object({
  seed: z.number().optional(),
  sampling_seq: z.string().optional(),
  ignore_eos: z.boolean().optional(),
-  temperature: z.number().optional(),
+  temp: z.number().optional(),
  top_k: z.number().optional(),
  top_p: z.number().optional(),
  min_p: z.number().optional(),
@@ -109,7 +109,7 @@ export const CreateInstanceOptionsSchema = z.object({
  json_schema: z.string().optional(),
  json_schema_file: z.string().optional(),

-  // Server/Example-specific params
+  // Example-specific params
  no_context_shift: z.boolean().optional(),
  special: z.boolean().optional(),
  no_warmup: z.boolean().optional(),
@@ -149,8 +149,6 @@ export const CreateInstanceOptionsSchema = z.object({
  no_prefill_assistant: z.boolean().optional(),
  slot_prompt_similarity: z.number().optional(),
  lora_init_without_apply: z.boolean().optional(),
-
-  // Speculative decoding params
  draft_max: z.number().optional(),
  draft_min: z.number().optional(),
  draft_p_min: z.number().optional(),
Author	SHA1	Message	Date
Matúš Námešný	5aed01b68f	Merge pull request #17 from lordmathis/fix/forbidden-logs fix: Refactor log fetching to use instancesApi	2025-08-06 19:12:34 +02:00
LordMathis	3f9caff33b	Refactor log fetching to use instancesApi	2025-08-06 19:07:25 +02:00
Matúš Námešný	169254c61a	Merge pull request #16 from lordmathis/fix/llama-server-options fix: Missing or wrong llama server options	2025-08-06 18:51:18 +02:00
LordMathis	8154b8d0ab	Fix temp in tests	2025-08-06 18:49:36 +02:00
LordMathis	a26d853ad5	Fix missing or wrong llama server options on frontend	2025-08-06 18:40:05 +02:00
LordMathis	6203b64045	Fix missing or wrong llama server options	2025-08-06 18:31:17 +02:00
Matúš Námešný	8d9c808be1	Merge pull request #14 from lordmathis/docs/readme-updates docs: Update README.md to improve project description	2025-08-05 21:32:20 +02:00
LordMathis	161cd213c5	Update README.md to enhance project description and installation instructions	2025-08-05 21:20:37 +02:00
Matúš Námešný	d6e84f0527	Merge pull request #13 from lordmathis/fix/decimal-input fix: Allow decimal input for numeric fields in instance configuration	2025-08-05 20:03:31 +02:00
LordMathis	0846350d41	Fix eslint issues in ZodFormField	2025-08-05 19:21:09 +02:00
LordMathis	dacaca8594	Fix number input handling to allow decimal values	2025-08-05 19:15:12 +02:00