Mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-06 09:04:27 +00:00)
Deployed cf20f30 to dev with MkDocs 1.5.3 and mike 2.0.0
@@ -838,12 +838,12 @@
|
||||
|
||||
|
||||
<h1 id="configuration">Configuration<a class="headerlink" href="#configuration" title="Permanent link">¶</a></h1>
|
||||
<p>llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:</p>
|
||||
<p>llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>Defaults < Configuration file < Environment variables
|
||||
</code></pre></div>
|
||||
<p>llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.</p>
|
||||
<p>llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs. </p>
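<p>For example, a value from the configuration file can be overridden at startup through the matching environment variable (a minimal sketch; the port values are illustrative):</p>
<div class="highlight"><pre><code># server.port in config.yaml may say 8080, but the environment variable wins
LLAMACTL_PORT=9090 llamactl
</code></pre></div>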
|
||||
<h2 id="default-configuration">Default Configuration<a class="headerlink" href="#default-configuration" title="Permanent link">¶</a></h2>
|
||||
<p>Here's the default configuration with all available options:</p>
|
||||
<p>Here's the default configuration with all available options: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a><span class="nt">server</span><span class="p">:</span>
|
||||
<a id="__codelineno-1-2" name="__codelineno-1-2" href="#__codelineno-1-2"></a><span class="w"> </span><span class="nt">host</span><span class="p">:</span><span class="w"> </span><span class="s">"0.0.0.0"</span><span class="w"> </span><span class="c1"># Server host to bind to</span>
|
||||
<a id="__codelineno-1-3" name="__codelineno-1-3" href="#__codelineno-1-3"></a><span class="w"> </span><span class="nt">port</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">8080</span><span class="w"> </span><span class="c1"># Server port to bind to</span>
|
||||
@@ -908,7 +908,7 @@
|
||||
</code></pre></div>
|
||||
<h2 id="configuration-files">Configuration Files<a class="headerlink" href="#configuration-files" title="Permanent link">¶</a></h2>
|
||||
<h3 id="configuration-file-locations">Configuration File Locations<a class="headerlink" href="#configuration-file-locations" title="Permanent link">¶</a></h3>
|
||||
<p>Configuration files are searched in the following locations (in order of precedence):</p>
|
||||
<p>Configuration files are searched in the following locations (in order of precedence): </p>
|
||||
<p><strong>Linux:</strong><br />
|
||||
- <code>./llamactl.yaml</code> or <code>./config.yaml</code> (current directory)<br />
|
||||
- <code>$HOME/.config/llamactl/config.yaml</code><br />
|
||||
@@ -922,7 +922,7 @@
|
||||
- <code>%APPDATA%\llamactl\config.yaml</code><br />
|
||||
- <code>%USERPROFILE%\llamactl\config.yaml</code><br />
|
||||
- <code>%PROGRAMDATA%\llamactl\config.yaml</code> </p>
|
||||
<p>You can specify the path to the config file with the <code>LLAMACTL_CONFIG_PATH</code> environment variable.</p>
|
||||
<p>You can specify the path to the config file with the <code>LLAMACTL_CONFIG_PATH</code> environment variable. </p>
|
||||
<h2 id="configuration-options">Configuration Options<a class="headerlink" href="#configuration-options" title="Permanent link">¶</a></h2>
|
||||
<h3 id="server-configuration">Server Configuration<a class="headerlink" href="#server-configuration" title="Permanent link">¶</a></h3>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-2-1" name="__codelineno-2-1" href="#__codelineno-2-1"></a><span class="nt">server</span><span class="p">:</span>
|
||||
@@ -932,11 +932,11 @@
|
||||
<a id="__codelineno-2-5" name="__codelineno-2-5" href="#__codelineno-2-5"></a><span class="w"> </span><span class="nt">allowed_headers</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"*"</span><span class="p p-Indicator">]</span><span class="w"> </span><span class="c1"># CORS allowed headers (default: ["*"])</span>
|
||||
<a id="__codelineno-2-6" name="__codelineno-2-6" href="#__codelineno-2-6"></a><span class="w"> </span><span class="nt">enable_swagger</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span><span class="w"> </span><span class="c1"># Enable Swagger UI (default: false)</span>
|
||||
</code></pre></div>
|
||||
<p><strong>Environment Variables:</strong>
|
||||
- <code>LLAMACTL_HOST</code> - Server host
|
||||
- <code>LLAMACTL_PORT</code> - Server port
|
||||
- <code>LLAMACTL_ALLOWED_ORIGINS</code> - Comma-separated CORS origins
|
||||
- <code>LLAMACTL_ENABLE_SWAGGER</code> - Enable Swagger UI (true/false)</p>
|
||||
<p><strong>Environment Variables:</strong><br />
|
||||
- <code>LLAMACTL_HOST</code> - Server host<br />
|
||||
- <code>LLAMACTL_PORT</code> - Server port<br />
|
||||
- <code>LLAMACTL_ALLOWED_ORIGINS</code> - Comma-separated CORS origins<br />
|
||||
- <code>LLAMACTL_ENABLE_SWAGGER</code> - Enable Swagger UI (true/false) </p>
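<p>As an illustration, the server settings above could be supplied entirely through the environment before starting llamactl (the host, port, and origin list are placeholder values):</p>
<div class="highlight"><pre><code># Override server settings without editing the YAML file
export LLAMACTL_HOST=127.0.0.1
export LLAMACTL_PORT=8080
export LLAMACTL_ALLOWED_ORIGINS="https://example.com,https://app.example.com"
export LLAMACTL_ENABLE_SWAGGER=true
llamactl
</code></pre></div>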
|
||||
<h3 id="backend-configuration">Backend Configuration<a class="headerlink" href="#backend-configuration" title="Permanent link">¶</a></h3>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-3-1" name="__codelineno-3-1" href="#__codelineno-3-1"></a><span class="nt">backends</span><span class="p">:</span>
|
||||
<a id="__codelineno-3-2" name="__codelineno-3-2" href="#__codelineno-3-2"></a><span class="w"> </span><span class="nt">llama-cpp</span><span class="p">:</span>
|
||||
@@ -968,43 +968,43 @@
|
||||
<a id="__codelineno-3-28" name="__codelineno-3-28" href="#__codelineno-3-28"></a><span class="w"> </span><span class="c1"># MLX does not support Docker</span>
|
||||
<a id="__codelineno-3-29" name="__codelineno-3-29" href="#__codelineno-3-29"></a><span class="w"> </span><span class="nt">response_headers</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">{}</span><span class="w"> </span><span class="c1"># Additional response headers to send with responses</span>
|
||||
</code></pre></div>
|
||||
<p><strong>Backend Configuration Fields:</strong>
|
||||
- <code>command</code>: Executable name/path for the backend
|
||||
- <code>args</code>: Default arguments prepended to all instances
|
||||
- <code>environment</code>: Environment variables for the backend process (optional)
|
||||
- <code>response_headers</code>: Additional response headers to send with responses (optional)
|
||||
- <code>docker</code>: Docker-specific configuration (optional)
|
||||
- <code>enabled</code>: Boolean flag to enable Docker runtime
|
||||
- <code>image</code>: Docker image to use
|
||||
- <code>args</code>: Additional arguments passed to <code>docker run</code>
|
||||
- <code>environment</code>: Environment variables for the container (optional)</p>
|
||||
<p><strong>Backend Configuration Fields:</strong><br />
|
||||
- <code>command</code>: Executable name/path for the backend<br />
|
||||
- <code>args</code>: Default arguments prepended to all instances<br />
|
||||
- <code>environment</code>: Environment variables for the backend process (optional)<br />
|
||||
- <code>response_headers</code>: Additional response headers to send with responses (optional)<br />
|
||||
- <code>docker</code>: Docker-specific configuration (optional)<br />
|
||||
- <code>enabled</code>: Boolean flag to enable Docker runtime<br />
|
||||
- <code>image</code>: Docker image to use<br />
|
||||
- <code>args</code>: Additional arguments passed to <code>docker run</code><br />
|
||||
- <code>environment</code>: Environment variables for the container (optional) </p>
|
||||
<blockquote>
|
||||
<p>If llamactl is behind an NGINX proxy, <code>X-Accel-Buffering: no</code> response header may be required for NGINX to properly stream the responses without buffering.</p>
|
||||
</blockquote>
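<p>As a sketch, that header could be added to llama.cpp responses through the response-headers environment variable described below (this assumes the semicolon-separated <code>KEY=value</code> format also applies when only one header is set):</p>
<div class="highlight"><pre><code># Ask NGINX not to buffer streamed responses from llama.cpp instances
export LLAMACTL_LLAMACPP_RESPONSE_HEADERS="X-Accel-Buffering=no"
</code></pre></div>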
|
||||
<p><strong>Environment Variables:</strong></p>
|
||||
<p><strong>LlamaCpp Backend:</strong>
|
||||
- <code>LLAMACTL_LLAMACPP_COMMAND</code> - LlamaCpp executable command
|
||||
- <code>LLAMACTL_LLAMACPP_ARGS</code> - Space-separated default arguments
|
||||
- <code>LLAMACTL_LLAMACPP_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ENABLED</code> - Enable Docker runtime (true/false)
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_IMAGE</code> - Docker image to use
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ARGS</code> - Space-separated Docker arguments
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ENV</code> - Docker environment variables in format "KEY1=value1,KEY2=value2"
|
||||
- <code>LLAMACTL_LLAMACPP_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2"</p>
|
||||
<p><strong>VLLM Backend:</strong>
|
||||
- <code>LLAMACTL_VLLM_COMMAND</code> - VLLM executable command
|
||||
- <code>LLAMACTL_VLLM_ARGS</code> - Space-separated default arguments
|
||||
- <code>LLAMACTL_VLLM_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ENABLED</code> - Enable Docker runtime (true/false)
|
||||
- <code>LLAMACTL_VLLM_DOCKER_IMAGE</code> - Docker image to use
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ARGS</code> - Space-separated Docker arguments
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ENV</code> - Docker environment variables in format "KEY1=value1,KEY2=value2"
|
||||
- <code>LLAMACTL_VLLM_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2"</p>
|
||||
<p><strong>MLX Backend:</strong>
|
||||
- <code>LLAMACTL_MLX_COMMAND</code> - MLX executable command
|
||||
- <code>LLAMACTL_MLX_ARGS</code> - Space-separated default arguments
|
||||
- <code>LLAMACTL_MLX_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"
|
||||
- <code>LLAMACTL_MLX_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2"</p>
|
||||
<p><strong>Environment Variables:</strong> </p>
|
||||
<p><strong>LlamaCpp Backend:</strong><br />
|
||||
- <code>LLAMACTL_LLAMACPP_COMMAND</code> - LlamaCpp executable command<br />
|
||||
- <code>LLAMACTL_LLAMACPP_ARGS</code> - Space-separated default arguments<br />
|
||||
- <code>LLAMACTL_LLAMACPP_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"<br />
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ENABLED</code> - Enable Docker runtime (true/false)<br />
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_IMAGE</code> - Docker image to use<br />
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ARGS</code> - Space-separated Docker arguments<br />
|
||||
- <code>LLAMACTL_LLAMACPP_DOCKER_ENV</code> - Docker environment variables in format "KEY1=value1,KEY2=value2"<br />
|
||||
- <code>LLAMACTL_LLAMACPP_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2" </p>
|
||||
<p><strong>VLLM Backend:</strong><br />
|
||||
- <code>LLAMACTL_VLLM_COMMAND</code> - VLLM executable command<br />
|
||||
- <code>LLAMACTL_VLLM_ARGS</code> - Space-separated default arguments<br />
|
||||
- <code>LLAMACTL_VLLM_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"<br />
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ENABLED</code> - Enable Docker runtime (true/false)<br />
|
||||
- <code>LLAMACTL_VLLM_DOCKER_IMAGE</code> - Docker image to use<br />
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ARGS</code> - Space-separated Docker arguments<br />
|
||||
- <code>LLAMACTL_VLLM_DOCKER_ENV</code> - Docker environment variables in format "KEY1=value1,KEY2=value2"<br />
|
||||
- <code>LLAMACTL_VLLM_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2" </p>
|
||||
<p><strong>MLX Backend:</strong><br />
|
||||
- <code>LLAMACTL_MLX_COMMAND</code> - MLX executable command<br />
|
||||
- <code>LLAMACTL_MLX_ARGS</code> - Space-separated default arguments<br />
|
||||
- <code>LLAMACTL_MLX_ENV</code> - Environment variables in format "KEY1=value1,KEY2=value2"<br />
|
||||
- <code>LLAMACTL_MLX_RESPONSE_HEADERS</code> - Response headers in format "KEY1=value1;KEY2=value2" </p>
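<p>For example, Docker could be enabled for the vLLM backend purely from the environment (a sketch; the image, Docker arguments, and GPU selection are illustrative values):</p>
<div class="highlight"><pre><code># Run vLLM instances inside Docker instead of calling the vllm binary directly
export LLAMACTL_VLLM_DOCKER_ENABLED=true
export LLAMACTL_VLLM_DOCKER_IMAGE="vllm/vllm-openai:latest"
export LLAMACTL_VLLM_DOCKER_ARGS="run --rm --network host --gpus all --shm-size 1g"
export LLAMACTL_VLLM_DOCKER_ENV="CUDA_VISIBLE_DEVICES=0"
</code></pre></div>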
|
||||
<h3 id="instance-configuration">Instance Configuration<a class="headerlink" href="#instance-configuration" title="Permanent link">¶</a></h3>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-4-1" name="__codelineno-4-1" href="#__codelineno-4-1"></a><span class="nt">instances</span><span class="p">:</span>
|
||||
<a id="__codelineno-4-2" name="__codelineno-4-2" href="#__codelineno-4-2"></a><span class="w"> </span><span class="nt">port_range</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="nv">8000</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="nv">9000</span><span class="p p-Indicator">]</span><span class="w"> </span><span class="c1"># Port range for instances (default: [8000, 9000])</span>
|
||||
@@ -1029,8 +1029,8 @@
|
||||
- <code>LLAMACTL_LOGS_DIR</code> - Log directory path<br />
|
||||
- <code>LLAMACTL_AUTO_CREATE_DATA_DIR</code> - Auto-create data/config/logs directories (true/false)<br />
|
||||
- <code>LLAMACTL_MAX_INSTANCES</code> - Maximum number of instances<br />
|
||||
- <code>LLAMACTL_MAX_RUNNING_INSTANCES</code> - Maximum number of running instances
|
||||
- <code>LLAMACTL_ENABLE_LRU_EVICTION</code> - Enable LRU eviction for idle instances
|
||||
- <code>LLAMACTL_MAX_RUNNING_INSTANCES</code> - Maximum number of running instances<br />
|
||||
- <code>LLAMACTL_ENABLE_LRU_EVICTION</code> - Enable LRU eviction for idle instances<br />
|
||||
- <code>LLAMACTL_DEFAULT_AUTO_RESTART</code> - Default auto-restart setting (true/false)<br />
|
||||
- <code>LLAMACTL_DEFAULT_MAX_RESTARTS</code> - Default maximum restarts<br />
|
||||
- <code>LLAMACTL_DEFAULT_RESTART_DELAY</code> - Default restart delay in seconds<br />
|
||||
@@ -1044,13 +1044,13 @@
|
||||
<a id="__codelineno-5-4" name="__codelineno-5-4" href="#__codelineno-5-4"></a><span class="w"> </span><span class="nt">require_management_auth</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"> </span><span class="c1"># Require API key for management endpoints (default: true)</span>
|
||||
<a id="__codelineno-5-5" name="__codelineno-5-5" href="#__codelineno-5-5"></a><span class="w"> </span><span class="nt">management_keys</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[]</span><span class="w"> </span><span class="c1"># List of valid management API keys</span>
|
||||
</code></pre></div>
|
||||
<p><strong>Environment Variables:</strong>
|
||||
- <code>LLAMACTL_REQUIRE_INFERENCE_AUTH</code> - Require auth for OpenAI endpoints (true/false)
|
||||
- <code>LLAMACTL_INFERENCE_KEYS</code> - Comma-separated inference API keys
|
||||
- <code>LLAMACTL_REQUIRE_MANAGEMENT_AUTH</code> - Require auth for management endpoints (true/false)
|
||||
- <code>LLAMACTL_MANAGEMENT_KEYS</code> - Comma-separated management API keys</p>
|
||||
<p><strong>Environment Variables:</strong><br />
|
||||
- <code>LLAMACTL_REQUIRE_INFERENCE_AUTH</code> - Require auth for OpenAI endpoints (true/false)<br />
|
||||
- <code>LLAMACTL_INFERENCE_KEYS</code> - Comma-separated inference API keys<br />
|
||||
- <code>LLAMACTL_REQUIRE_MANAGEMENT_AUTH</code> - Require auth for management endpoints (true/false)<br />
|
||||
- <code>LLAMACTL_MANAGEMENT_KEYS</code> - Comma-separated management API keys </p>
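<p>For instance, authentication can be switched on and keys provided without touching the YAML file (the key values below are placeholders):</p>
<div class="highlight"><pre><code># Require API keys for both endpoint groups
export LLAMACTL_REQUIRE_INFERENCE_AUTH=true
export LLAMACTL_INFERENCE_KEYS="sk-inference-key-1,sk-inference-key-2"
export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="sk-management-key-1"
</code></pre></div>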
|
||||
<h3 id="remote-node-configuration">Remote Node Configuration<a class="headerlink" href="#remote-node-configuration" title="Permanent link">¶</a></h3>
|
||||
<p>llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally.</p>
|
||||
<p>llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally. </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-6-1" name="__codelineno-6-1" href="#__codelineno-6-1"></a><span class="nt">local_node</span><span class="p">:</span><span class="w"> </span><span class="s">"main"</span><span class="w"> </span><span class="c1"># Name of the local node (default: "main")</span>
|
||||
<a id="__codelineno-6-2" name="__codelineno-6-2" href="#__codelineno-6-2"></a><span class="nt">nodes</span><span class="p">:</span><span class="w"> </span><span class="c1"># Node configuration map</span>
|
||||
<a id="__codelineno-6-3" name="__codelineno-6-3" href="#__codelineno-6-3"></a><span class="w"> </span><span class="nt">main</span><span class="p">:</span><span class="w"> </span><span class="c1"># Local node (empty address means local)</span>
|
||||
@@ -1060,13 +1060,13 @@
|
||||
<a id="__codelineno-6-7" name="__codelineno-6-7" href="#__codelineno-6-7"></a><span class="w"> </span><span class="nt">address</span><span class="p">:</span><span class="w"> </span><span class="s">"http://192.168.1.10:8080"</span>
|
||||
<a id="__codelineno-6-8" name="__codelineno-6-8" href="#__codelineno-6-8"></a><span class="w"> </span><span class="nt">api_key</span><span class="p">:</span><span class="w"> </span><span class="s">"worker1-api-key"</span><span class="w"> </span><span class="c1"># Management API key for authentication</span>
|
||||
</code></pre></div>
|
||||
<p><strong>Node Configuration Fields:</strong>
|
||||
- <code>local_node</code>: Specifies which node in the <code>nodes</code> map represents the local node
|
||||
- <code>nodes</code>: Map of node configurations
|
||||
- <code>address</code>: HTTP/HTTPS URL of the remote node (empty for local node)
|
||||
- <code>api_key</code>: Management API key for authenticating with the remote node</p>
|
||||
<p><strong>Environment Variables:</strong>
|
||||
- <code>LLAMACTL_LOCAL_NODE</code> - Name of the local node</p>
|
||||
<p><strong>Node Configuration Fields:</strong><br />
|
||||
- <code>local_node</code>: Specifies which node in the <code>nodes</code> map represents the local node<br />
|
||||
- <code>nodes</code>: Map of node configurations<br />
|
||||
- <code>address</code>: HTTP/HTTPS URL of the remote node (empty for local node)<br />
|
||||
- <code>api_key</code>: Management API key for authenticating with the remote node </p>
|
||||
<p><strong>Environment Variables:</strong><br />
|
||||
- <code>LLAMACTL_LOCAL_NODE</code> - Name of the local node </p>
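<p>On the remote host itself, the node name can be set from the environment so that instance knows which entry in the <code>nodes</code> map it represents (the node name <code>worker1</code> is illustrative):</p>
<div class="highlight"><pre><code># Run on the remote host that the main node registers as "worker1"
export LLAMACTL_LOCAL_NODE=worker1
llamactl
</code></pre></div>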
@@ -886,20 +886,20 @@
|
||||
|
||||
|
||||
<h1 id="installation">Installation<a class="headerlink" href="#installation" title="Permanent link">¶</a></h1>
|
||||
<p>This guide will walk you through installing Llamactl on your system.</p>
|
||||
<p>This guide will walk you through installing Llamactl on your system. </p>
|
||||
<h2 id="prerequisites">Prerequisites<a class="headerlink" href="#prerequisites" title="Permanent link">¶</a></h2>
|
||||
<h3 id="backend-dependencies">Backend Dependencies<a class="headerlink" href="#backend-dependencies" title="Permanent link">¶</a></h3>
|
||||
<p>llamactl supports multiple backends. Install at least one:</p>
|
||||
<p><strong>For llama.cpp backend (all platforms):</strong></p>
|
||||
<p>You need <code>llama-server</code> from <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> installed:</p>
|
||||
<p>llamactl supports multiple backends. Install at least one: </p>
|
||||
<p><strong>For llama.cpp backend (all platforms):</strong> </p>
|
||||
<p>You need <code>llama-server</code> from <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> installed: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a><span class="c1"># Homebrew (macOS/Linux)</span>
|
||||
<a id="__codelineno-0-2" name="__codelineno-0-2" href="#__codelineno-0-2"></a>brew<span class="w"> </span>install<span class="w"> </span>llama.cpp
|
||||
<a id="__codelineno-0-3" name="__codelineno-0-3" href="#__codelineno-0-3"></a><span class="c1"># Winget (Windows)</span>
|
||||
<a id="__codelineno-0-4" name="__codelineno-0-4" href="#__codelineno-0-4"></a>winget<span class="w"> </span>install<span class="w"> </span>llama.cpp
|
||||
</code></pre></div>
|
||||
<p>Or build from source - see llama.cpp docs</p>
|
||||
<p><strong>For MLX backend (macOS only):</strong></p>
|
||||
<p>MLX provides optimized inference on Apple Silicon. Install MLX-LM:</p>
|
||||
<p>Or build from source - see llama.cpp docs </p>
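<p>A quick, optional sanity check is to confirm the <code>llama-server</code> binary is on your <code>PATH</code> before configuring llamactl:</p>
<div class="highlight"><pre><code># Confirm the llama.cpp server binary is installed and reachable
command -v llama-server
llama-server --help | head -n 5
</code></pre></div>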
|
||||
<p><strong>For MLX backend (macOS only):</strong> </p>
|
||||
<p>MLX provides optimized inference on Apple Silicon. Install MLX-LM: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a><span class="c1"># Install via pip (requires Python 3.8+)</span>
|
||||
<a id="__codelineno-1-2" name="__codelineno-1-2" href="#__codelineno-1-2"></a>pip<span class="w"> </span>install<span class="w"> </span>mlx-lm
|
||||
<a id="__codelineno-1-3" name="__codelineno-1-3" href="#__codelineno-1-3"></a>
|
||||
@@ -908,9 +908,9 @@
|
||||
<a id="__codelineno-1-6" name="__codelineno-1-6" href="#__codelineno-1-6"></a><span class="nb">source</span><span class="w"> </span>mlx-env/bin/activate
|
||||
<a id="__codelineno-1-7" name="__codelineno-1-7" href="#__codelineno-1-7"></a>pip<span class="w"> </span>install<span class="w"> </span>mlx-lm
|
||||
</code></pre></div>
|
||||
<p>Note: MLX backend is only available on macOS with Apple Silicon (M1, M2, M3, etc.)</p>
|
||||
<p><strong>For vLLM backend:</strong></p>
|
||||
<p>vLLM provides high-throughput distributed serving for LLMs. Install vLLM:</p>
|
||||
<p>Note: MLX backend is only available on macOS with Apple Silicon (M1, M2, M3, etc.) </p>
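<p>Optionally, confirm the package is visible to the Python environment llamactl will use (a simple check; adjust for your virtual environment):</p>
<div class="highlight"><pre><code># Verify mlx-lm is installed in the active environment
python -m pip show mlx-lm
</code></pre></div>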
|
||||
<p><strong>For vLLM backend:</strong> </p>
|
||||
<p>vLLM provides high-throughput distributed serving for LLMs. Install vLLM: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-2-1" name="__codelineno-2-1" href="#__codelineno-2-1"></a><span class="c1"># Install via pip (requires Python 3.8+, GPU required)</span>
|
||||
<a id="__codelineno-2-2" name="__codelineno-2-2" href="#__codelineno-2-2"></a>pip<span class="w"> </span>install<span class="w"> </span>vllm
|
||||
<a id="__codelineno-2-3" name="__codelineno-2-3" href="#__codelineno-2-3"></a>
|
||||
@@ -923,7 +923,7 @@
|
||||
</code></pre></div>
|
||||
<h2 id="installation-methods">Installation Methods<a class="headerlink" href="#installation-methods" title="Permanent link">¶</a></h2>
|
||||
<h3 id="option-1-download-binary-recommended">Option 1: Download Binary (Recommended)<a class="headerlink" href="#option-1-download-binary-recommended" title="Permanent link">¶</a></h3>
|
||||
<p>Download the latest release from the <a href="https://github.com/lordmathis/llamactl/releases">GitHub releases page</a>:</p>
|
||||
<p>Download the latest release from the <a href="https://github.com/lordmathis/llamactl/releases">GitHub releases page</a>: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-3-1" name="__codelineno-3-1" href="#__codelineno-3-1"></a><span class="c1"># Linux/macOS - Get latest version and download</span>
|
||||
<a id="__codelineno-3-2" name="__codelineno-3-2" href="#__codelineno-3-2"></a><span class="nv">LATEST_VERSION</span><span class="o">=</span><span class="k">$(</span>curl<span class="w"> </span>-s<span class="w"> </span>https://api.github.com/repos/lordmathis/llamactl/releases/latest<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span><span class="s1">'"tag_name":'</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>sed<span class="w"> </span>-E<span class="w"> </span><span class="s1">'s/.*"([^"]+)".*/\1/'</span><span class="k">)</span>
|
||||
<a id="__codelineno-3-3" name="__codelineno-3-3" href="#__codelineno-3-3"></a>curl<span class="w"> </span>-L<span class="w"> </span>https://github.com/lordmathis/llamactl/releases/download/<span class="si">${</span><span class="nv">LATEST_VERSION</span><span class="si">}</span>/llamactl-<span class="si">${</span><span class="nv">LATEST_VERSION</span><span class="si">}</span>-<span class="k">$(</span>uname<span class="w"> </span>-s<span class="w"> </span><span class="p">|</span><span class="w"> </span>tr<span class="w"> </span><span class="s1">'[:upper:]'</span><span class="w"> </span><span class="s1">'[:lower:]'</span><span class="k">)</span>-<span class="k">$(</span>uname<span class="w"> </span>-m<span class="k">)</span>.tar.gz<span class="w"> </span><span class="p">|</span><span class="w"> </span>tar<span class="w"> </span>-xz
|
||||
@@ -935,12 +935,12 @@
|
||||
<a id="__codelineno-3-9" name="__codelineno-3-9" href="#__codelineno-3-9"></a><span class="c1"># Windows - Download from releases page</span>
|
||||
</code></pre></div>
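<p>After extracting, move the binary somewhere on your <code>PATH</code> (the destination below is a common choice, not a requirement):</p>
<div class="highlight"><pre><code># Install the extracted binary and confirm it runs
sudo mv llamactl /usr/local/bin/
llamactl --version
</code></pre></div>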
|
||||
<h3 id="option-2-docker">Option 2: Docker<a class="headerlink" href="#option-2-docker" title="Permanent link">¶</a></h3>
|
||||
<p>llamactl provides Dockerfiles for creating Docker images with backends pre-installed. The resulting images include the latest llamactl release with the respective backend.</p>
|
||||
<p><strong>Available Dockerfiles (CUDA):</strong>
|
||||
- <strong>llamactl with llama.cpp CUDA</strong>: <code>docker/Dockerfile.llamacpp</code> (based on <code>ghcr.io/ggml-org/llama.cpp:server-cuda</code>)
|
||||
- <strong>llamactl with vLLM CUDA</strong>: <code>docker/Dockerfile.vllm</code> (based on <code>vllm/vllm-openai:latest</code>)
|
||||
- <strong>llamactl built from source</strong>: <code>docker/Dockerfile.source</code> (multi-stage build with webui)</p>
|
||||
<p><strong>Note:</strong> These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md">llama.cpp Docker docs</a>. For vLLM, check <a href="https://docs.vllm.ai/en/v0.6.5/serving/deploying_with_docker.html">vLLM docs</a>.</p>
|
||||
<p>llamactl provides Dockerfiles for creating Docker images with backends pre-installed. The resulting images include the latest llamactl release with the respective backend. </p>
|
||||
<p><strong>Available Dockerfiles (CUDA):</strong><br />
|
||||
- <strong>llamactl with llama.cpp CUDA</strong>: <code>docker/Dockerfile.llamacpp</code> (based on <code>ghcr.io/ggml-org/llama.cpp:server-cuda</code>)<br />
|
||||
- <strong>llamactl with vLLM CUDA</strong>: <code>docker/Dockerfile.vllm</code> (based on <code>vllm/vllm-openai:latest</code>)<br />
|
||||
- <strong>llamactl built from source</strong>: <code>docker/Dockerfile.source</code> (multi-stage build with webui) </p>
|
||||
<p><strong>Note:</strong> These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at <a href="https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md">llama.cpp Docker docs</a>. For vLLM, check <a href="https://docs.vllm.ai/en/v0.6.5/serving/deploying_with_docker.html">vLLM docs</a>. </p>
|
||||
<h4 id="using-docker-compose">Using Docker Compose<a class="headerlink" href="#using-docker-compose" title="Permanent link">¶</a></h4>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-4-1" name="__codelineno-4-1" href="#__codelineno-4-1"></a><span class="c1"># Clone the repository</span>
|
||||
<a id="__codelineno-4-2" name="__codelineno-4-2" href="#__codelineno-4-2"></a>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/lordmathis/llamactl.git
|
||||
@@ -955,11 +955,11 @@
|
||||
<a id="__codelineno-4-11" name="__codelineno-4-11" href="#__codelineno-4-11"></a><span class="c1"># Or start llamactl with vLLM backend</span>
|
||||
<a id="__codelineno-4-12" name="__codelineno-4-12" href="#__codelineno-4-12"></a>docker-compose<span class="w"> </span>-f<span class="w"> </span>docker/docker-compose.yml<span class="w"> </span>up<span class="w"> </span>llamactl-vllm<span class="w"> </span>-d
|
||||
</code></pre></div>
|
||||
<p>Access the dashboard at:
|
||||
- llamactl with llama.cpp: http://localhost:8080
|
||||
- llamactl with vLLM: http://localhost:8081</p>
|
||||
<p>Access the dashboard at:<br />
|
||||
- llamactl with llama.cpp: http://localhost:8080<br />
|
||||
- llamactl with vLLM: http://localhost:8081 </p>
|
||||
<h4 id="using-docker-build-and-run">Using Docker Build and Run<a class="headerlink" href="#using-docker-build-and-run" title="Permanent link">¶</a></h4>
|
||||
<p><strong>llamactl with llama.cpp CUDA:</strong>
|
||||
<p><strong>llamactl with llama.cpp CUDA:</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-5-1" name="__codelineno-5-1" href="#__codelineno-5-1"></a>docker<span class="w"> </span>build<span class="w"> </span>-f<span class="w"> </span>docker/Dockerfile.llamacpp<span class="w"> </span>-t<span class="w"> </span>llamactl:llamacpp-cuda<span class="w"> </span>.
|
||||
<a id="__codelineno-5-2" name="__codelineno-5-2" href="#__codelineno-5-2"></a>docker<span class="w"> </span>run<span class="w"> </span>-d<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-5-3" name="__codelineno-5-3" href="#__codelineno-5-3"></a><span class="w"> </span>--name<span class="w"> </span>llamactl-llamacpp<span class="w"> </span><span class="se">\</span>
|
||||
@@ -968,7 +968,7 @@
|
||||
<a id="__codelineno-5-6" name="__codelineno-5-6" href="#__codelineno-5-6"></a><span class="w"> </span>-v<span class="w"> </span>~/.cache/llama.cpp:/root/.cache/llama.cpp<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-5-7" name="__codelineno-5-7" href="#__codelineno-5-7"></a><span class="w"> </span>llamactl:llamacpp-cuda
|
||||
</code></pre></div></p>
|
||||
<p><strong>llamactl with vLLM CUDA:</strong>
|
||||
<p><strong>llamactl with vLLM CUDA:</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-6-1" name="__codelineno-6-1" href="#__codelineno-6-1"></a>docker<span class="w"> </span>build<span class="w"> </span>-f<span class="w"> </span>docker/Dockerfile.vllm<span class="w"> </span>-t<span class="w"> </span>llamactl:vllm-cuda<span class="w"> </span>.
|
||||
<a id="__codelineno-6-2" name="__codelineno-6-2" href="#__codelineno-6-2"></a>docker<span class="w"> </span>run<span class="w"> </span>-d<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-6-3" name="__codelineno-6-3" href="#__codelineno-6-3"></a><span class="w"> </span>--name<span class="w"> </span>llamactl-vllm<span class="w"> </span><span class="se">\</span>
|
||||
@@ -977,7 +977,7 @@
|
||||
<a id="__codelineno-6-6" name="__codelineno-6-6" href="#__codelineno-6-6"></a><span class="w"> </span>-v<span class="w"> </span>~/.cache/huggingface:/root/.cache/huggingface<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-6-7" name="__codelineno-6-7" href="#__codelineno-6-7"></a><span class="w"> </span>llamactl:vllm-cuda
|
||||
</code></pre></div></p>
|
||||
<p><strong>llamactl built from source:</strong>
|
||||
<p><strong>llamactl built from source:</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-7-1" name="__codelineno-7-1" href="#__codelineno-7-1"></a>docker<span class="w"> </span>build<span class="w"> </span>-f<span class="w"> </span>docker/Dockerfile.source<span class="w"> </span>-t<span class="w"> </span>llamactl:source<span class="w"> </span>.
|
||||
<a id="__codelineno-7-2" name="__codelineno-7-2" href="#__codelineno-7-2"></a>docker<span class="w"> </span>run<span class="w"> </span>-d<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-7-3" name="__codelineno-7-3" href="#__codelineno-7-3"></a><span class="w"> </span>--name<span class="w"> </span>llamactl<span class="w"> </span><span class="se">\</span>
|
||||
@@ -985,11 +985,11 @@
|
||||
<a id="__codelineno-7-5" name="__codelineno-7-5" href="#__codelineno-7-5"></a><span class="w"> </span>llamactl:source
|
||||
</code></pre></div></p>
|
||||
<h3 id="option-3-build-from-source">Option 3: Build from Source<a class="headerlink" href="#option-3-build-from-source" title="Permanent link">¶</a></h3>
|
||||
<p>Requirements:
|
||||
- Go 1.24 or later
|
||||
- Node.js 22 or later
|
||||
- Git</p>
|
||||
<p>If you prefer to build from source:</p>
|
||||
<p>Requirements:<br />
|
||||
- Go 1.24 or later<br />
|
||||
- Node.js 22 or later<br />
|
||||
- Git </p>
|
||||
<p>If you prefer to build from source: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-8-1" name="__codelineno-8-1" href="#__codelineno-8-1"></a><span class="c1"># Clone the repository</span>
|
||||
<a id="__codelineno-8-2" name="__codelineno-8-2" href="#__codelineno-8-2"></a>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/lordmathis/llamactl.git
|
||||
<a id="__codelineno-8-3" name="__codelineno-8-3" href="#__codelineno-8-3"></a><span class="nb">cd</span><span class="w"> </span>llamactl
|
||||
@@ -1001,16 +1001,16 @@
|
||||
<a id="__codelineno-8-9" name="__codelineno-8-9" href="#__codelineno-8-9"></a>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>llamactl<span class="w"> </span>./cmd/server
|
||||
</code></pre></div>
|
||||
<h2 id="remote-node-installation">Remote Node Installation<a class="headerlink" href="#remote-node-installation" title="Permanent link">¶</a></h2>
|
||||
<p>For deployments with remote nodes:
|
||||
- Install llamactl on each node using any of the methods above
|
||||
- Configure API keys for authentication between nodes</p>
|
||||
<p>For deployments with remote nodes:<br />
|
||||
- Install llamactl on each node using any of the methods above<br />
|
||||
- Configure API keys for authentication between nodes </p>
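<p>A minimal sketch of the worker side: each remote node runs its own llamactl with management authentication enabled, using a key that the main node can then present via the corresponding <code>api_key</code> in its <code>nodes</code> configuration (key value illustrative):</p>
<div class="highlight"><pre><code># On the remote node
export LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true
export LLAMACTL_MANAGEMENT_KEYS="worker1-api-key"
llamactl
</code></pre></div>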
|
||||
<h2 id="verification">Verification<a class="headerlink" href="#verification" title="Permanent link">¶</a></h2>
|
||||
<p>Verify your installation by checking the version:</p>
|
||||
<p>Verify your installation by checking the version: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-9-1" name="__codelineno-9-1" href="#__codelineno-9-1"></a>llamactl<span class="w"> </span>--version
|
||||
</code></pre></div>
|
||||
<h2 id="next-steps">Next Steps<a class="headerlink" href="#next-steps" title="Permanent link">¶</a></h2>
|
||||
<p>Now that Llamactl is installed, continue to the <a href="../quick-start/">Quick Start</a> guide to get your first instance running!</p>
|
||||
<p>For remote node deployments, see the <a href="../configuration/">Configuration Guide</a> for node setup instructions.</p>
|
||||
<p>Now that Llamactl is installed, continue to the <a href="../quick-start/">Quick Start</a> guide to get your first instance running! </p>
|
||||
<p>For remote node deployments, see the <a href="../configuration/">Configuration Guide</a> for node setup instructions. </p>
@@ -880,43 +880,43 @@
|
||||
|
||||
|
||||
<h1 id="quick-start">Quick Start<a class="headerlink" href="#quick-start" title="Permanent link">¶</a></h1>
|
||||
<p>This guide will help you get Llamactl up and running in just a few minutes.</p>
|
||||
<p>This guide will help you get Llamactl up and running in just a few minutes. </p>
|
||||
<h2 id="step-1-start-llamactl">Step 1: Start Llamactl<a class="headerlink" href="#step-1-start-llamactl" title="Permanent link">¶</a></h2>
|
||||
<p>Start the Llamactl server:</p>
|
||||
<p>Start the Llamactl server: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-0-1" name="__codelineno-0-1" href="#__codelineno-0-1"></a>llamactl
|
||||
</code></pre></div>
|
||||
<p>By default, Llamactl will start on <code>http://localhost:8080</code>.</p>
|
||||
<p>By default, Llamactl will start on <code>http://localhost:8080</code>. </p>
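<p>If that port is already taken, the host and port can be changed at startup (see the Configuration guide for all options):</p>
<div class="highlight"><pre><code># Bind to a different host/port for this run
LLAMACTL_HOST=127.0.0.1 LLAMACTL_PORT=9090 llamactl
</code></pre></div>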
|
||||
<h2 id="step-2-access-the-web-ui">Step 2: Access the Web UI<a class="headerlink" href="#step-2-access-the-web-ui" title="Permanent link">¶</a></h2>
|
||||
<p>Open your web browser and navigate to:</p>
|
||||
<p>Open your web browser and navigate to: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-1-1" name="__codelineno-1-1" href="#__codelineno-1-1"></a>http://localhost:8080
|
||||
</code></pre></div>
|
||||
<p>Log in with the management API key. By default, one is generated during server startup; copy it from the terminal output.</p>
|
||||
<p>You should see the Llamactl web interface.</p>
|
||||
<p>Log in with the management API key. By default, one is generated during server startup; copy it from the terminal output. </p>
|
||||
<p>You should see the Llamactl web interface. </p>
|
||||
<h2 id="step-3-create-your-first-instance">Step 3: Create Your First Instance<a class="headerlink" href="#step-3-create-your-first-instance" title="Permanent link">¶</a></h2>
|
||||
<ol>
|
||||
<li>Click the "Add Instance" button</li>
|
||||
<li>Fill in the instance configuration:</li>
|
||||
<li><strong>Name</strong>: Give your instance a descriptive name</li>
|
||||
<li><strong>Backend Type</strong>: Choose from llama.cpp, MLX, or vLLM</li>
|
||||
<li><strong>Model</strong>: Model path or identifier for your chosen backend</li>
|
||||
<li>Click the "Add Instance" button </li>
|
||||
<li>Fill in the instance configuration: </li>
|
||||
<li><strong>Name</strong>: Give your instance a descriptive name </li>
|
||||
<li><strong>Backend Type</strong>: Choose from llama.cpp, MLX, or vLLM </li>
|
||||
<li><strong>Model</strong>: Model path or identifier for your chosen backend </li>
|
||||
<li>
|
||||
<p><strong>Additional Options</strong>: Backend-specific parameters</p>
|
||||
<p><strong>Additional Options</strong>: Backend-specific parameters </p>
|
||||
</li>
|
||||
<li>
|
||||
<p>Click "Create Instance"</p>
|
||||
<p>Click "Create Instance" </p>
|
||||
</li>
|
||||
</ol>
|
||||
<h2 id="step-4-start-your-instance">Step 4: Start Your Instance<a class="headerlink" href="#step-4-start-your-instance" title="Permanent link">¶</a></h2>
|
||||
<p>Once created, you can:</p>
|
||||
<p>Once created, you can: </p>
|
||||
<ul>
|
||||
<li><strong>Start</strong> the instance by clicking the start button</li>
|
||||
<li><strong>Monitor</strong> its status in real-time</li>
|
||||
<li><strong>View logs</strong> by clicking the logs button</li>
|
||||
<li><strong>Stop</strong> the instance when needed</li>
|
||||
<li><strong>Start</strong> the instance by clicking the start button </li>
|
||||
<li><strong>Monitor</strong> its status in real-time </li>
|
||||
<li><strong>View logs</strong> by clicking the logs button </li>
|
||||
<li><strong>Stop</strong> the instance when needed </li>
|
||||
</ul>
|
||||
<h2 id="example-configurations">Example Configurations<a class="headerlink" href="#example-configurations" title="Permanent link">¶</a></h2>
|
||||
<p>Here are basic example configurations for each backend:</p>
|
||||
<p><strong>llama.cpp backend:</strong>
|
||||
<p>Here are basic example configurations for each backend: </p>
|
||||
<p><strong>llama.cpp backend:</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-2-1" name="__codelineno-2-1" href="#__codelineno-2-1"></a><span class="p">{</span>
|
||||
<a id="__codelineno-2-2" name="__codelineno-2-2" href="#__codelineno-2-2"></a><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"llama2-7b"</span><span class="p">,</span>
|
||||
<a id="__codelineno-2-3" name="__codelineno-2-3" href="#__codelineno-2-3"></a><span class="w"> </span><span class="nt">"backend_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"llama_cpp"</span><span class="p">,</span>
|
||||
@@ -928,7 +928,7 @@
|
||||
<a id="__codelineno-2-9" name="__codelineno-2-9" href="#__codelineno-2-9"></a><span class="w"> </span><span class="p">}</span>
|
||||
<a id="__codelineno-2-10" name="__codelineno-2-10" href="#__codelineno-2-10"></a><span class="p">}</span>
|
||||
</code></pre></div></p>
|
||||
<p><strong>MLX backend (macOS only):</strong>
|
||||
<p><strong>MLX backend (macOS only):</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-3-1" name="__codelineno-3-1" href="#__codelineno-3-1"></a><span class="p">{</span>
|
||||
<a id="__codelineno-3-2" name="__codelineno-3-2" href="#__codelineno-3-2"></a><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"mistral-mlx"</span><span class="p">,</span>
|
||||
<a id="__codelineno-3-3" name="__codelineno-3-3" href="#__codelineno-3-3"></a><span class="w"> </span><span class="nt">"backend_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"mlx_lm"</span><span class="p">,</span>
|
||||
@@ -939,7 +939,7 @@
|
||||
<a id="__codelineno-3-8" name="__codelineno-3-8" href="#__codelineno-3-8"></a><span class="w"> </span><span class="p">}</span>
|
||||
<a id="__codelineno-3-9" name="__codelineno-3-9" href="#__codelineno-3-9"></a><span class="p">}</span>
|
||||
</code></pre></div></p>
|
||||
<p><strong>vLLM backend:</strong>
|
||||
<p><strong>vLLM backend:</strong><br />
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-4-1" name="__codelineno-4-1" href="#__codelineno-4-1"></a><span class="p">{</span>
|
||||
<a id="__codelineno-4-2" name="__codelineno-4-2" href="#__codelineno-4-2"></a><span class="w"> </span><span class="nt">"name"</span><span class="p">:</span><span class="w"> </span><span class="s2">"dialogpt-vllm"</span><span class="p">,</span>
|
||||
<a id="__codelineno-4-3" name="__codelineno-4-3" href="#__codelineno-4-3"></a><span class="w"> </span><span class="nt">"backend_type"</span><span class="p">:</span><span class="w"> </span><span class="s2">"vllm"</span><span class="p">,</span>
|
||||
@@ -951,7 +951,7 @@
|
||||
<a id="__codelineno-4-9" name="__codelineno-4-9" href="#__codelineno-4-9"></a><span class="p">}</span>
|
||||
</code></pre></div></p>
|
||||
<h2 id="docker-support">Docker Support<a class="headerlink" href="#docker-support" title="Permanent link">¶</a></h2>
|
||||
<p>Llamactl can run backends in Docker containers. To enable Docker for a backend, add a <code>docker</code> section to that backend in your YAML configuration file (e.g. <code>config.yaml</code>) as shown below:</p>
|
||||
<p>Llamactl can run backends in Docker containers. To enable Docker for a backend, add a <code>docker</code> section to that backend in your YAML configuration file (e.g. <code>config.yaml</code>) as shown below: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-5-1" name="__codelineno-5-1" href="#__codelineno-5-1"></a><span class="nt">backends</span><span class="p">:</span>
|
||||
<a id="__codelineno-5-2" name="__codelineno-5-2" href="#__codelineno-5-2"></a><span class="w"> </span><span class="nt">vllm</span><span class="p">:</span>
|
||||
<a id="__codelineno-5-3" name="__codelineno-5-3" href="#__codelineno-5-3"></a><span class="w"> </span><span class="nt">command</span><span class="p">:</span><span class="w"> </span><span class="s">"vllm"</span>
|
||||
@@ -962,7 +962,7 @@
|
||||
<a id="__codelineno-5-8" name="__codelineno-5-8" href="#__codelineno-5-8"></a><span class="w"> </span><span class="nt">args</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">"run"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"--rm"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"--network"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"host"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"--gpus"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"all"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"--shm-size"</span><span class="p p-Indicator">,</span><span class="w"> </span><span class="s">"1g"</span><span class="p p-Indicator">]</span>
|
||||
</code></pre></div>
|
||||
<h2 id="using-the-api">Using the API<a class="headerlink" href="#using-the-api" title="Permanent link">¶</a></h2>
|
||||
<p>You can also manage instances via the REST API:</p>
|
||||
<p>You can also manage instances via the REST API: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-6-1" name="__codelineno-6-1" href="#__codelineno-6-1"></a><span class="c1"># List all instances</span>
|
||||
<a id="__codelineno-6-2" name="__codelineno-6-2" href="#__codelineno-6-2"></a>curl<span class="w"> </span>http://localhost:8080/api/instances
|
||||
<a id="__codelineno-6-3" name="__codelineno-6-3" href="#__codelineno-6-3"></a>
|
||||
@@ -980,9 +980,9 @@
|
||||
<a id="__codelineno-6-15" name="__codelineno-6-15" href="#__codelineno-6-15"></a>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>http://localhost:8080/api/instances/my-model/start
|
||||
</code></pre></div>
|
||||
<h2 id="openai-compatible-api">OpenAI Compatible API<a class="headerlink" href="#openai-compatible-api" title="Permanent link">¶</a></h2>
|
||||
<p>Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.</p>
|
||||
<p>Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools. </p>
|
||||
<h3 id="chat-completions">Chat Completions<a class="headerlink" href="#chat-completions" title="Permanent link">¶</a></h3>
|
||||
<p>Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:</p>
|
||||
<p>Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-7-1" name="__codelineno-7-1" href="#__codelineno-7-1"></a>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>http://localhost:8080/v1/chat/completions<span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-7-2" name="__codelineno-7-2" href="#__codelineno-7-2"></a><span class="w"> </span>-H<span class="w"> </span><span class="s2">"Content-Type: application/json"</span><span class="w"> </span><span class="se">\</span>
|
||||
<a id="__codelineno-7-3" name="__codelineno-7-3" href="#__codelineno-7-3"></a><span class="w"> </span>-d<span class="w"> </span><span class="s1">'{</span>
|
||||
@@ -998,7 +998,7 @@
|
||||
<a id="__codelineno-7-13" name="__codelineno-7-13" href="#__codelineno-7-13"></a><span class="s1"> }'</span>
|
||||
</code></pre></div>
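<p>Responses can also be streamed by adding the standard OpenAI-style <code>stream</code> flag (a sketch; it assumes the running instance is named <code>my-model</code> and that the chosen backend supports streamed completions):</p>
<div class="highlight"><pre><code># -N disables curl buffering so tokens appear as they are generated
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'
</code></pre></div>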
|
||||
<h3 id="using-with-python-openai-client">Using with Python OpenAI Client<a class="headerlink" href="#using-with-python-openai-client" title="Permanent link">¶</a></h3>
|
||||
<p>You can also use the official OpenAI Python client:</p>
|
||||
<p>You can also use the official OpenAI Python client: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-8-1" name="__codelineno-8-1" href="#__codelineno-8-1"></a><span class="kn">from</span><span class="w"> </span><span class="nn">openai</span><span class="w"> </span><span class="kn">import</span> <span class="n">OpenAI</span>
|
||||
<a id="__codelineno-8-2" name="__codelineno-8-2" href="#__codelineno-8-2"></a>
|
||||
<a id="__codelineno-8-3" name="__codelineno-8-3" href="#__codelineno-8-3"></a><span class="c1"># Point the client to your Llamactl server</span>
|
||||
@@ -1020,14 +1020,14 @@
|
||||
<a id="__codelineno-8-19" name="__codelineno-8-19" href="#__codelineno-8-19"></a><span class="nb">print</span><span class="p">(</span><span class="n">response</span><span class="o">.</span><span class="n">choices</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">message</span><span class="o">.</span><span class="n">content</span><span class="p">)</span>
|
||||
</code></pre></div>
|
||||
<h3 id="list-available-models">List Available Models<a class="headerlink" href="#list-available-models" title="Permanent link">¶</a></h3>
|
||||
<p>Get a list of running instances (models) in OpenAI-compatible format:</p>
|
||||
<p>Get a list of running instances (models) in OpenAI-compatible format: </p>
|
||||
<div class="highlight"><pre><span></span><code><a id="__codelineno-9-1" name="__codelineno-9-1" href="#__codelineno-9-1"></a>curl<span class="w"> </span>http://localhost:8080/v1/models
|
||||
</code></pre></div>
|
||||
<h2 id="next-steps">Next Steps<a class="headerlink" href="#next-steps" title="Permanent link">¶</a></h2>
|
||||
<ul>
|
||||
<li>Manage instances <a href="../../user-guide/managing-instances/">Managing Instances</a></li>
|
||||
<li>Explore the <a href="../../user-guide/api-reference/">API Reference</a></li>
|
||||
<li>Configure advanced settings in the <a href="../configuration/">Configuration</a> guide</li>
|
||||
<li>Manage instances <a href="../../user-guide/managing-instances/">Managing Instances</a> </li>
|
||||
<li>Explore the <a href="../../user-guide/api-reference/">API Reference</a> </li>
|
||||
<li>Configure advanced settings in the <a href="../configuration/">Configuration</a> guide </li>
|
||||
</ul>
|
||||
|