# Add core concepts to quick-start
This guide will help you get Llamactl up and running in just a few minutes.

## Core Concepts

Before you start, let's clarify a few key terms:

- **Instance**: A running backend server that serves a specific model. Each instance has a unique name and runs independently.
- **Backend**: The inference engine that actually runs the model (llama.cpp, MLX, or vLLM). You need at least one backend installed before creating instances.
- **Node**: In multi-machine setups, a node represents one machine. Most users will just use the default "main" node for single-machine deployments.
- **Proxy Architecture**: Llamactl acts as a proxy in front of your instances. You make requests to llamactl (e.g., `http://localhost:8080/v1/chat/completions`), and it routes them to the appropriate backend instance. This means you don't need to track individual instance ports or endpoints; see the sketch after this list.
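To make the proxy idea concrete, here is a minimal sketch of one request flowing through llamactl. It assumes an instance named `my-model` is already running; the instance name is a placeholder, and the `Authorization` header uses the inference key described in the next section.

```python
import requests

# Talk to llamactl itself, never to an individual instance's port.
# llamactl proxies the request to the instance named in "model".
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer sk-inference-..."},  # placeholder inference key
    json={
        "model": "my-model",  # placeholder: the name of a running instance
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])
```

The URL never changes as instances come and go; only the `model` field does.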
## Authentication

Llamactl uses two types of API keys:

- **Management API Key**: Used to authenticate with the Llamactl management API (creating, starting, stopping instances).
- **Inference API Key**: Used to authenticate requests to the OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, etc.).

By default, authentication is required. If you don't configure these keys in your configuration file, llamactl will auto-generate them and print them to the terminal on startup. You can also configure custom keys or disable authentication entirely; see the [Configuration](configuration.md) guide.
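As a rough sketch of how the two keys divide responsibilities, assuming the server is running on the default port (the management path below is an illustrative assumption, not a documented route):

```python
import requests

BASE = "http://localhost:8080"

# Management API: administrative calls use the management key.
# The exact path here is an assumption for illustration.
instances = requests.get(
    f"{BASE}/api/v1/instances",
    headers={"Authorization": "Bearer sk-management-..."},  # placeholder key
)

# OpenAI-compatible inference endpoints use the inference key instead.
models = requests.get(
    f"{BASE}/v1/models",
    headers={"Authorization": "Bearer sk-inference-..."},  # placeholder key
)
```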
## Start Llamactl

Start the Llamactl server:

```bash
llamactl
```

You should see output similar to this on startup:

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️  MANAGEMENT AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔑 Generated Management API Key:

   sk-management-...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️  INFERENCE AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔑 Generated Inference API Key:

   sk-inference-...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️  IMPORTANT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• These keys are auto-generated and will change on restart
• For production, add explicit keys to your configuration
• Copy these keys before they disappear from the terminal
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Llamactl server listening on 0.0.0.0:8080
```
Copy the **Management API Key** from the terminal - you'll need it to access the web UI.

By default, Llamactl will start on `http://localhost:8080`.
## Access the Web UI

Open your web browser and navigate to:

```
http://localhost:8080
```

Log in with the management API key from the terminal output.

You should see the Llamactl web interface.
Once an instance is running, you can query it with the OpenAI Python client:

```python
from openai import OpenAI

# Point the client to your Llamactl server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-inference-api-key"  # Use the inference API key from terminal or config
)

# Create a chat completion ("my-model" is a placeholder for your instance name)
response = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
```
!!! note "API Key"

    If you disabled authentication in your config, you can use any value for `api_key` (e.g., `"not-needed"`). Otherwise, use the inference API key shown in the terminal output on startup.
### List Available Models

Get a list of running instances (models) in OpenAI-compatible format:
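A minimal sketch using the same OpenAI client as in the chat example above (same placeholder key and URL):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-inference-api-key",  # placeholder inference key
)

# Each running instance appears as one model entry, named after the instance
for model in client.models.list().data:
    print(model.id)
```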