diff --git a/dev/__pycache__/fix_line_endings.cpython-311.pyc b/dev/__pycache__/fix_line_endings.cpython-311.pyc
new file mode 100644
index 0000000..bfd59fb
Binary files /dev/null and b/dev/__pycache__/fix_line_endings.cpython-311.pyc differ
diff --git a/dev/__pycache__/readme_sync.cpython-311.pyc b/dev/__pycache__/readme_sync.cpython-311.pyc
index 9f8937b..3c84a60 100644
Binary files a/dev/__pycache__/readme_sync.cpython-311.pyc and b/dev/__pycache__/readme_sync.cpython-311.pyc differ
diff --git a/dev/fix_line_endings.py b/dev/fix_line_endings.py
new file mode 100644
index 0000000..9555e6b
--- /dev/null
+++ b/dev/fix_line_endings.py
@@ -0,0 +1,60 @@
+"""
+MkDocs hook to fix line endings for proper rendering.
+Automatically adds two spaces at the end of lines that need line breaks.
+"""
+import re
+
+
+def on_page_markdown(markdown, page, config, **kwargs):
+    """
+    Fix line endings in markdown content for proper MkDocs rendering.
+    Adds two spaces at the end of lines that need line breaks.
+    """
+    lines = markdown.split('\n')
+    processed_lines = []
+    in_code_block = False
+
+    for i, line in enumerate(lines):
+        stripped = line.strip()
+
+        # Track code blocks
+        if stripped.startswith('```'):
+            in_code_block = not in_code_block
+            processed_lines.append(line)
+            continue
+
+        # Skip processing inside code blocks
+        if in_code_block:
+            processed_lines.append(line)
+            continue
+
+        # Skip empty lines
+        if not stripped:
+            processed_lines.append(line)
+            continue
+
+        # Skip lines that shouldn't have line breaks:
+        # - Headers (# ## ###)
+        # - Blockquotes (>)
+        # - Table rows (|)
+        # - Lines already ending with two spaces
+        # - YAML front matter and HTML tags
+        # - Standalone punctuation lines
+        if (stripped.startswith('#') or
+            stripped.startswith('>') or
+            '|' in stripped or
+            line.endswith('  ') or
+            stripped.startswith('---') or
+            stripped.startswith('<') or
+            stripped.endswith('>') or
+            stripped in ('.', '!', '?', ':', ';', '```', '---', ',')):
+            processed_lines.append(line)
+            continue
+
+        # Add two spaces to lines that end with regular text or most punctuation
+        if stripped and not in_code_block:
+            processed_lines.append(line.rstrip() + '  ')
+        else:
+            processed_lines.append(line)
+
+    return '\n'.join(processed_lines)
\ No newline at end of file
diff --git a/dev/getting-started/configuration/index.html b/dev/getting-started/configuration/index.html
index 94adbb7..2e50cce 100644
--- a/dev/getting-started/configuration/index.html
+++ b/dev/getting-started/configuration/index.html
@@ -838,12 +838,12 @@

Configuration

-llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
+llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:

Defaults < Configuration file < Environment variables
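For example (the port value here is purely illustrative), an environment variable overrides the same setting from the config file:

LLAMACTL_PORT=9090 llamactl   # listens on 9090 even if the config file sets port: 8080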
 
-llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.
+llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.

Default Configuration

-Here's the default configuration with all available options:
+Here's the default configuration with all available options:

server:
   host: "0.0.0.0"                # Server host to bind to
   port: 8080                     # Server port to bind to
@@ -908,7 +908,7 @@
 

Configuration Files

Configuration File Locations

-Configuration files are searched in the following locations (in order of precedence):
+Configuration files are searched in the following locations (in order of precedence):

Linux:
- ./llamactl.yaml or ./config.yaml (current directory)
- $HOME/.config/llamactl/config.yaml
@@ -922,7 +922,7 @@
- %APPDATA%\llamactl\config.yaml
- %USERPROFILE%\llamactl\config.yaml
- %PROGRAMDATA%\llamactl\config.yaml

-You can specify the path to the config file with the LLAMACTL_CONFIG_PATH environment variable.
+You can specify the path to the config file with the LLAMACTL_CONFIG_PATH environment variable.
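For example, assuming an illustrative path:

LLAMACTL_CONFIG_PATH=/etc/llamactl/config.yaml llamactl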

Configuration Options

Server Configuration

server:
@@ -932,11 +932,11 @@
   allowed_headers: ["*"]  # CORS allowed headers (default: ["*"])
   enable_swagger: false   # Enable Swagger UI (default: false)
 
-Environment Variables:
-- LLAMACTL_HOST - Server host
-- LLAMACTL_PORT - Server port
-- LLAMACTL_ALLOWED_ORIGINS - Comma-separated CORS origins
-- LLAMACTL_ENABLE_SWAGGER - Enable Swagger UI (true/false)
+Environment Variables:
+- LLAMACTL_HOST - Server host
+- LLAMACTL_PORT - Server port
+- LLAMACTL_ALLOWED_ORIGINS - Comma-separated CORS origins
+- LLAMACTL_ENABLE_SWAGGER - Enable Swagger UI (true/false)
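A minimal sketch of overriding these at startup (all values are placeholders):

export LLAMACTL_HOST=127.0.0.1
export LLAMACTL_PORT=8080
export LLAMACTL_ALLOWED_ORIGINS="https://example.com,https://app.example.com"
export LLAMACTL_ENABLE_SWAGGER=true
llamactl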

Backend Configuration

backends:
   llama-cpp:
@@ -968,43 +968,43 @@
     # MLX does not support Docker
     response_headers: {}         # Additional response headers to send with responses
 
-Backend Configuration Fields:
-- command: Executable name/path for the backend
-- args: Default arguments prepended to all instances
-- environment: Environment variables for the backend process (optional)
-- response_headers: Additional response headers to send with responses (optional)
-- docker: Docker-specific configuration (optional)
-  - enabled: Boolean flag to enable Docker runtime
-  - image: Docker image to use
-  - args: Additional arguments passed to docker run
-  - environment: Environment variables for the container (optional)
+Backend Configuration Fields:
+- command: Executable name/path for the backend
+- args: Default arguments prepended to all instances
+- environment: Environment variables for the backend process (optional)
+- response_headers: Additional response headers to send with responses (optional)
+- docker: Docker-specific configuration (optional)
+  - enabled: Boolean flag to enable Docker runtime
+  - image: Docker image to use
+  - args: Additional arguments passed to docker run
+  - environment: Environment variables for the container (optional)
If llamactl is behind an NGINX proxy, X-Accel-Buffering: no response header may be required for NGINX to properly stream the responses without buffering.

-Environment Variables:
-LlamaCpp Backend:
-- LLAMACTL_LLAMACPP_COMMAND - LlamaCpp executable command
-- LLAMACTL_LLAMACPP_ARGS - Space-separated default arguments
-- LLAMACTL_LLAMACPP_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
-- LLAMACTL_LLAMACPP_DOCKER_ENABLED - Enable Docker runtime (true/false)
-- LLAMACTL_LLAMACPP_DOCKER_IMAGE - Docker image to use
-- LLAMACTL_LLAMACPP_DOCKER_ARGS - Space-separated Docker arguments
-- LLAMACTL_LLAMACPP_DOCKER_ENV - Docker environment variables in format "KEY1=value1,KEY2=value2"
-- LLAMACTL_LLAMACPP_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
-VLLM Backend:
-- LLAMACTL_VLLM_COMMAND - VLLM executable command
-- LLAMACTL_VLLM_ARGS - Space-separated default arguments
-- LLAMACTL_VLLM_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
-- LLAMACTL_VLLM_DOCKER_ENABLED - Enable Docker runtime (true/false)
-- LLAMACTL_VLLM_DOCKER_IMAGE - Docker image to use
-- LLAMACTL_VLLM_DOCKER_ARGS - Space-separated Docker arguments
-- LLAMACTL_VLLM_DOCKER_ENV - Docker environment variables in format "KEY1=value1,KEY2=value2"
-- LLAMACTL_VLLM_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
-MLX Backend:
-- LLAMACTL_MLX_COMMAND - MLX executable command
-- LLAMACTL_MLX_ARGS - Space-separated default arguments
-- LLAMACTL_MLX_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
-- LLAMACTL_MLX_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
+Environment Variables:
+LlamaCpp Backend:
+- LLAMACTL_LLAMACPP_COMMAND - LlamaCpp executable command
+- LLAMACTL_LLAMACPP_ARGS - Space-separated default arguments
+- LLAMACTL_LLAMACPP_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
+- LLAMACTL_LLAMACPP_DOCKER_ENABLED - Enable Docker runtime (true/false)
+- LLAMACTL_LLAMACPP_DOCKER_IMAGE - Docker image to use
+- LLAMACTL_LLAMACPP_DOCKER_ARGS - Space-separated Docker arguments
+- LLAMACTL_LLAMACPP_DOCKER_ENV - Docker environment variables in format "KEY1=value1,KEY2=value2"
+- LLAMACTL_LLAMACPP_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
+VLLM Backend:
+- LLAMACTL_VLLM_COMMAND - VLLM executable command
+- LLAMACTL_VLLM_ARGS - Space-separated default arguments
+- LLAMACTL_VLLM_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
+- LLAMACTL_VLLM_DOCKER_ENABLED - Enable Docker runtime (true/false)
+- LLAMACTL_VLLM_DOCKER_IMAGE - Docker image to use
+- LLAMACTL_VLLM_DOCKER_ARGS - Space-separated Docker arguments
+- LLAMACTL_VLLM_DOCKER_ENV - Docker environment variables in format "KEY1=value1,KEY2=value2"
+- LLAMACTL_VLLM_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
+MLX Backend:
+- LLAMACTL_MLX_COMMAND - MLX executable command
+- LLAMACTL_MLX_ARGS - Space-separated default arguments
+- LLAMACTL_MLX_ENV - Environment variables in format "KEY1=value1,KEY2=value2"
+- LLAMACTL_MLX_RESPONSE_HEADERS - Response headers in format "KEY1=value1;KEY2=value2"
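As an illustrative sketch combining the formats above (the CUDA_VISIBLE_DEVICES value is an assumption; X-Accel-Buffering=no mirrors the NGINX note earlier):

export LLAMACTL_LLAMACPP_ENV="CUDA_VISIBLE_DEVICES=0"
export LLAMACTL_LLAMACPP_RESPONSE_HEADERS="X-Accel-Buffering=no"
llamactl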

Instance Configuration

instances:
   port_range: [8000, 9000]                          # Port range for instances (default: [8000, 9000])
@@ -1029,8 +1029,8 @@
 - LLAMACTL_LOGS_DIR - Log directory path
- LLAMACTL_AUTO_CREATE_DATA_DIR - Auto-create data/config/logs directories (true/false)
- LLAMACTL_MAX_INSTANCES - Maximum number of instances
-- LLAMACTL_MAX_RUNNING_INSTANCES - Maximum number of running instances
-- LLAMACTL_ENABLE_LRU_EVICTION - Enable LRU eviction for idle instances
+- LLAMACTL_MAX_RUNNING_INSTANCES - Maximum number of running instances
+- LLAMACTL_ENABLE_LRU_EVICTION - Enable LRU eviction for idle instances
- LLAMACTL_DEFAULT_AUTO_RESTART - Default auto-restart setting (true/false)
- LLAMACTL_DEFAULT_MAX_RESTARTS - Default maximum restarts
- LLAMACTL_DEFAULT_RESTART_DELAY - Default restart delay in seconds
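For example (limits are illustrative), to cap running instances and evict idle ones:

export LLAMACTL_MAX_RUNNING_INSTANCES=2
export LLAMACTL_ENABLE_LRU_EVICTION=true
llamactl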
@@ -1044,13 +1044,13 @@
  require_management_auth: true      # Require API key for management endpoints (default: true)
  management_keys: []                # List of valid management API keys
-Environment Variables:
-- LLAMACTL_REQUIRE_INFERENCE_AUTH - Require auth for OpenAI endpoints (true/false)
-- LLAMACTL_INFERENCE_KEYS - Comma-separated inference API keys
-- LLAMACTL_REQUIRE_MANAGEMENT_AUTH - Require auth for management endpoints (true/false)
-- LLAMACTL_MANAGEMENT_KEYS - Comma-separated management API keys
+Environment Variables:
+- LLAMACTL_REQUIRE_INFERENCE_AUTH - Require auth for OpenAI endpoints (true/false)
+- LLAMACTL_INFERENCE_KEYS - Comma-separated inference API keys
+- LLAMACTL_REQUIRE_MANAGEMENT_AUTH - Require auth for management endpoints (true/false)
+- LLAMACTL_MANAGEMENT_KEYS - Comma-separated management API keys
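For example (key values are placeholders):

export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="mgmt-key-1,mgmt-key-2"
export LLAMACTL_INFERENCE_KEYS="inference-key-1"
llamactl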

Remote Node Configuration

-llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally.
+llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally.

local_node: "main"               # Name of the local node (default: "main")
 nodes:                           # Node configuration map
   main:                          # Local node (empty address means local)
@@ -1060,13 +1060,13 @@
     address: "http://192.168.1.10:8080"
     api_key: "worker1-api-key"   # Management API key for authentication
 
-Node Configuration Fields:
-- local_node: Specifies which node in the nodes map represents the local node
-- nodes: Map of node configurations
-  - address: HTTP/HTTPS URL of the remote node (empty for local node)
-  - api_key: Management API key for authenticating with the remote node
-Environment Variables:
-- LLAMACTL_LOCAL_NODE - Name of the local node
+Node Configuration Fields:
+- local_node: Specifies which node in the nodes map represents the local node
+- nodes: Map of node configurations
+  - address: HTTP/HTTPS URL of the remote node (empty for local node)
+  - api_key: Management API key for authenticating with the remote node
+Environment Variables:
+- LLAMACTL_LOCAL_NODE - Name of the local node
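For example, on the worker node shown in the configuration above (the node name must match its entry in the nodes map):

export LLAMACTL_LOCAL_NODE=worker1
llamactl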

diff --git a/dev/getting-started/installation/index.html b/dev/getting-started/installation/index.html
index 4c0c80e..4c03c7c 100644
--- a/dev/getting-started/installation/index.html
+++ b/dev/getting-started/installation/index.html
@@ -886,20 +886,20 @@

Installation

-This guide will walk you through installing Llamactl on your system.
+This guide will walk you through installing Llamactl on your system.

Prerequisites

Backend Dependencies

-llamactl supports multiple backends. Install at least one:
-For llama.cpp backend (all platforms):
-You need llama-server from llama.cpp installed:
+llamactl supports multiple backends. Install at least one:
+For llama.cpp backend (all platforms):
+You need llama-server from llama.cpp installed:

# Homebrew (macOS/Linux)
 brew install llama.cpp
 # Winget (Windows)
 winget install llama.cpp
 
-Or build from source - see llama.cpp docs
-For MLX backend (macOS only):
-MLX provides optimized inference on Apple Silicon. Install MLX-LM:
+Or build from source - see llama.cpp docs
+For MLX backend (macOS only):
+MLX provides optimized inference on Apple Silicon. Install MLX-LM:

# Install via pip (requires Python 3.8+)
 pip install mlx-lm
 
@@ -908,9 +908,9 @@
 source mlx-env/bin/activate
 pip install mlx-lm
 
-Note: MLX backend is only available on macOS with Apple Silicon (M1, M2, M3, etc.)
-For vLLM backend:
-vLLM provides high-throughput distributed serving for LLMs. Install vLLM:
+Note: MLX backend is only available on macOS with Apple Silicon (M1, M2, M3, etc.)
+For vLLM backend:
+vLLM provides high-throughput distributed serving for LLMs. Install vLLM:

# Install via pip (requires Python 3.8+, GPU required)
 pip install vllm
 
@@ -923,7 +923,7 @@
 

Installation Methods

-Download the latest release from the GitHub releases page:
+Download the latest release from the GitHub releases page:

# Linux/macOS - Get latest version and download
 LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
 curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
@@ -935,12 +935,12 @@
 # Windows - Download from releases page
 

Option 2: Docker

-llamactl provides Dockerfiles for creating Docker images with backends pre-installed. The resulting images include the latest llamactl release with the respective backend.
-Available Dockerfiles (CUDA):
-- llamactl with llama.cpp CUDA: docker/Dockerfile.llamacpp (based on ghcr.io/ggml-org/llama.cpp:server-cuda)
-- llamactl with vLLM CUDA: docker/Dockerfile.vllm (based on vllm/vllm-openai:latest)
-- llamactl built from source: docker/Dockerfile.source (multi-stage build with webui)
-Note: These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at llama.cpp Docker docs. For vLLM, check vLLM docs.
+llamactl provides Dockerfiles for creating Docker images with backends pre-installed. The resulting images include the latest llamactl release with the respective backend.
+Available Dockerfiles (CUDA):
+- llamactl with llama.cpp CUDA: docker/Dockerfile.llamacpp (based on ghcr.io/ggml-org/llama.cpp:server-cuda)
+- llamactl with vLLM CUDA: docker/Dockerfile.vllm (based on vllm/vllm-openai:latest)
+- llamactl built from source: docker/Dockerfile.source (multi-stage build with webui)
+Note: These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at llama.cpp Docker docs. For vLLM, check vLLM docs.

Using Docker Compose

# Clone the repository
 git clone https://github.com/lordmathis/llamactl.git
@@ -955,11 +955,11 @@
 # Or start llamactl with vLLM backend
 docker-compose -f docker/docker-compose.yml up llamactl-vllm -d
 
-Access the dashboard at:
-- llamactl with llama.cpp: http://localhost:8080
-- llamactl with vLLM: http://localhost:8081
+Access the dashboard at:
+- llamactl with llama.cpp: http://localhost:8080
+- llamactl with vLLM: http://localhost:8081

Using Docker Build and Run

-llamactl with llama.cpp CUDA:
+llamactl with llama.cpp CUDA:

docker build -f docker/Dockerfile.llamacpp -t llamactl:llamacpp-cuda .
 docker run -d \
   --name llamactl-llamacpp \
@@ -968,7 +968,7 @@
   -v ~/.cache/llama.cpp:/root/.cache/llama.cpp \
   llamactl:llamacpp-cuda
 

-llamactl with vLLM CUDA:
+llamactl with vLLM CUDA:

docker build -f docker/Dockerfile.vllm -t llamactl:vllm-cuda .
 docker run -d \
   --name llamactl-vllm \
@@ -977,7 +977,7 @@
   -v ~/.cache/huggingface:/root/.cache/huggingface \
   llamactl:vllm-cuda
 

-llamactl built from source:
+llamactl built from source:

docker build -f docker/Dockerfile.source -t llamactl:source .
 docker run -d \
   --name llamactl \
@@ -985,11 +985,11 @@
   llamactl:source
 

Option 3: Build from Source

-Requirements:
-- Go 1.24 or later
-- Node.js 22 or later
-- Git
-If you prefer to build from source:
+Requirements:
+- Go 1.24 or later
+- Node.js 22 or later
+- Git
+If you prefer to build from source:

# Clone the repository
 git clone https://github.com/lordmathis/llamactl.git
 cd llamactl
@@ -1001,16 +1001,16 @@
 go build -o llamactl ./cmd/server
 

Remote Node Installation

-For deployments with remote nodes:
-- Install llamactl on each node using any of the methods above
-- Configure API keys for authentication between nodes
+For deployments with remote nodes:
+- Install llamactl on each node using any of the methods above
+- Configure API keys for authentication between nodes

Verification

-Verify your installation by checking the version:
+Verify your installation by checking the version:

llamactl --version
 

Next Steps

-Now that Llamactl is installed, continue to the Quick Start guide to get your first instance running!
-For remote node deployments, see the Configuration Guide for node setup instructions.
+Now that Llamactl is installed, continue to the Quick Start guide to get your first instance running!
+For remote node deployments, see the Configuration Guide for node setup instructions.

diff --git a/dev/getting-started/quick-start/index.html b/dev/getting-started/quick-start/index.html
index 8c4f988..5b90891 100644
--- a/dev/getting-started/quick-start/index.html
+++ b/dev/getting-started/quick-start/index.html
@@ -880,43 +880,43 @@

Quick Start

-This guide will help you get Llamactl up and running in just a few minutes.
+This guide will help you get Llamactl up and running in just a few minutes.

Step 1: Start Llamactl

-Start the Llamactl server:
+Start the Llamactl server:

llamactl
 
-By default, Llamactl will start on http://localhost:8080.
+By default, Llamactl will start on http://localhost:8080.

Step 2: Access the Web UI

-Open your web browser and navigate to:
+Open your web browser and navigate to:

http://localhost:8080
 
-Login with the management API key. By default it is generated during server startup. Copy it from the terminal output.
-You should see the Llamactl web interface.
+Login with the management API key. By default it is generated during server startup. Copy it from the terminal output.
+You should see the Llamactl web interface.

Step 3: Create Your First Instance

-Click the "Add Instance" button
-Fill in the instance configuration:
-Name: Give your instance a descriptive name
-Backend Type: Choose from llama.cpp, MLX, or vLLM
-Model: Model path or identifier for your chosen backend
+Click the "Add Instance" button
+Fill in the instance configuration:
+Name: Give your instance a descriptive name
+Backend Type: Choose from llama.cpp, MLX, or vLLM
+Model: Model path or identifier for your chosen backend
-Additional Options: Backend-specific parameters
+Additional Options: Backend-specific parameters
-Click "Create Instance"
+Click "Create Instance"

Step 4: Start Your Instance

-Once created, you can:
+Once created, you can:

Example Configurations

-Here are basic example configurations for each backend:
-llama.cpp backend:
+Here are basic example configurations for each backend:
+llama.cpp backend:

{
   "name": "llama2-7b",
   "backend_type": "llama_cpp",
@@ -928,7 +928,7 @@
   }
 }
 

-MLX backend (macOS only):
+MLX backend (macOS only):

{
   "name": "mistral-mlx",
   "backend_type": "mlx_lm",
@@ -939,7 +939,7 @@
   }
 }
 

-vLLM backend:
+vLLM backend:

{
   "name": "dialogpt-vllm",
   "backend_type": "vllm",
@@ -951,7 +951,7 @@
 }
 

Docker Support

-Llamactl can run backends in Docker containers. To enable Docker for a backend, add a docker section to that backend in your YAML configuration file (e.g. config.yaml) as shown below:
+Llamactl can run backends in Docker containers. To enable Docker for a backend, add a docker section to that backend in your YAML configuration file (e.g. config.yaml) as shown below:

backends:
   vllm:
     command: "vllm"
@@ -962,7 +962,7 @@
       args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
 

Using the API

-You can also manage instances via the REST API:
+You can also manage instances via the REST API:

# List all instances
 curl http://localhost:8080/api/instances
 
@@ -980,9 +980,9 @@
 curl -X POST http://localhost:8080/api/instances/my-model/start
 

OpenAI Compatible API

-Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.
+Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.

Chat Completions

-Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:
+Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:

curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
   -d '{
@@ -998,7 +998,7 @@
   }'
 

Using with Python OpenAI Client

-You can also use the official OpenAI Python client:
+You can also use the official OpenAI Python client:

from openai import OpenAI
 
 # Point the client to your Llamactl server
@@ -1020,14 +1020,14 @@
 print(response.choices[0].message.content)
 

List Available Models

-Get a list of running instances (models) in OpenAI-compatible format:
+Get a list of running instances (models) in OpenAI-compatible format:

curl http://localhost:8080/v1/models
 

Next Steps

diff --git a/dev/index.html b/dev/index.html
index 6c1c1a6..8a0380b 100644
--- a/dev/index.html
+++ b/dev/index.html
@@ -843,9 +843,9 @@

Llamactl Documentation

Welcome to the Llamactl documentation!

-Dashboard Screenshot
+Dashboard Screenshot

What is Llamactl?

-Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.
+Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.

Features

🚀 Easy Model Management