Mirror of https://github.com/lordmathis/llamactl.git, synced 2025-11-06 00:54:23 +00:00
Add core concepts to quick-start
@@ -2,6 +2,24 @@
This guide will help you get Llamactl up and running in just a few minutes.

## Core Concepts

Before you start, let's clarify a few key terms:

- **Instance**: A running backend server that serves a specific model. Each instance has a unique name and runs independently.
- **Backend**: The inference engine that actually runs the model (llama.cpp, MLX, or vLLM). You need at least one backend installed before creating instances.
- **Node**: In multi-machine setups, a node represents one machine. Most users will just use the default "main" node for single-machine deployments.
- **Proxy Architecture**: Llamactl acts as a proxy in front of your instances. You make requests to llamactl (e.g., `http://localhost:8080/v1/chat/completions`), and it routes them to the appropriate backend instance. This means you don't need to track individual instance ports or endpoints.
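
To make the routing concrete, here is a minimal sketch of a chat request through the proxy. The instance name `my-model` is a placeholder, and the `Bearer` header follows the usual OpenAI-compatible convention:

```python
import requests

# You always talk to llamactl itself. It reads the "model" field and
# forwards the request to the matching instance, so individual instance
# ports never appear in client code.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer your-inference-api-key"},
    json={
        "model": "my-model",  # placeholder: the name of one of your instances
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```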

## Authentication

Llamactl uses two types of API keys:

- **Management API Key**: Used to authenticate with the Llamactl management API (creating, starting, stopping instances).
- **Inference API Key**: Used to authenticate requests to the OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, etc.).

By default, authentication is required. If you don't configure these keys in your configuration file, llamactl will auto-generate them and print them to the terminal on startup. You can also configure custom keys or disable authentication entirely in the [Configuration](configuration.md) guide.
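
Here is a minimal sketch of where each key goes. The management route below (`/api/v1/instances`) is a hypothetical example, not a path taken from this guide; `/v1/models` is the standard OpenAI-compatible route:

```python
import requests

BASE = "http://localhost:8080"

# Management API: instance lifecycle operations take the management key.
# NOTE: the route here is an assumed placeholder; check the API reference.
r = requests.get(
    f"{BASE}/api/v1/instances",
    headers={"Authorization": "Bearer sk-management-..."},  # management key
)
print(r.status_code)

# Inference API: OpenAI-compatible endpoints take the inference key.
r = requests.get(
    f"{BASE}/v1/models",
    headers={"Authorization": "Bearer sk-inference-..."},  # inference key
)
print(r.status_code)
```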

## Start Llamactl

Start the Llamactl server:
@@ -10,6 +28,33 @@ Start the Llamactl server:
```
llamactl
```

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ MANAGEMENT AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔑 Generated Management API Key:

sk-management-...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ INFERENCE AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

🔑 Generated Inference API Key:

sk-inference-...

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ IMPORTANT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

• These keys are auto-generated and will change on restart
• For production, add explicit keys to your configuration
• Copy these keys before they disappear from the terminal

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Llamactl server listening on 0.0.0.0:8080
```

Copy the **Management API Key** from the terminal - you'll need it to access the web UI.

By default, Llamactl will start on `http://localhost:8080`.

## Access the Web UI
@@ -20,7 +65,7 @@ Open your web browser and navigate to:
```
http://localhost:8080
```

Login with the management API key from the terminal output.

You should see the Llamactl web interface.
@@ -182,7 +227,7 @@ from openai import OpenAI
```python
# Point the client to your Llamactl server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-inference-api-key"  # Use the inference API key from terminal or config
)

# Create a chat completion
```
@@ -198,6 +243,9 @@ response = client.chat.completions.create(
```python
print(response.choices[0].message.content)
```

!!! note "API Key"
    If you disabled authentication in your config, you can use any value for `api_key` (e.g., `"not-needed"`). Otherwise, use the inference API key shown in the terminal output on startup.
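
Putting the fragments above together, a complete version of the example might look like this (the instance name `my-model` is a placeholder for one of your running instances):

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-inference-api-key",
)

response = client.chat.completions.create(
    model="my-model",  # placeholder: use the name of a running instance
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```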

### List Available Models

Get a list of running instances (models) in OpenAI-compatible format:
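
As a sketch, the same client can issue that request; `/v1/models` is the standard OpenAI-compatible route, and each running instance should appear as a model entry:

```python
# Reuses the `client` from the previous example.
models = client.models.list()
for model in models.data:
    print(model.id)
```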