Mirror of https://github.com/lordmathis/llamactl.git (synced 2025-11-05 16:44:22 +00:00)

Refactor documentation headings

@@ -10,17 +10,17 @@
## Features

-### 🚀 Easy Model Management
+**🚀 Easy Model Management**

- **Multiple Models Simultaneously**: Run different models at the same time (7B for speed, 70B for quality)
- **Smart Resource Management**: Automatic idle timeout, LRU eviction, and configurable instance limits
- **Web Dashboard**: Modern React UI for managing instances, monitoring health, and viewing logs

-### 🔗 Flexible Integration
+**🔗 Flexible Integration**

- **OpenAI API Compatible**: Drop-in replacement - route requests to different models by instance name (see the sketch after this list)
- **Multi-Backend Support**: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
- **Docker Ready**: Run backends in containers with full GPU support
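
To make the routing bullet concrete, here is a minimal sketch of an OpenAI-style request addressed to an instance by name. The `/v1/chat/completions` path, the port, the instance name, and the lack of an API key header are illustrative assumptions, not taken from this diff:

```bash
# Hypothetical request: select an instance by using its name as the "model" value.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "my-llama-instance",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```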

-### 🌐 Distributed Deployment
+**🌐 Distributed Deployment**

- **Remote Instances**: Deploy instances on remote hosts
- **Central Management**: Manage everything from a single dashboard with automatic routing

@@ -82,7 +82,7 @@ llamactl provides Dockerfiles for creating Docker images with backends pre-insta

**Note:** These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at [llama.cpp Docker docs](https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md). For vLLM, check [vLLM docs](https://docs.vllm.ai/en/v0.6.5/serving/deploying_with_docker.html).

-#### Using Docker Compose
+**Using Docker Compose**

```bash
# Clone the repository
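# (The rest of this example falls outside the hunk; the commands below are a
#  sketch, and the compose file location is an assumption.)
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
docker compose up -d
```
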
@@ -103,9 +103,9 @@ Access the dashboard at:

- llamactl with llama.cpp: http://localhost:8080
- llamactl with vLLM: http://localhost:8081

-#### Using Docker Build and Run
+**Using Docker Build and Run**

-**llamactl with llama.cpp CUDA:**
+1. llamactl with llama.cpp CUDA:

```bash
docker build -f docker/Dockerfile.llamacpp -t llamactl:llamacpp-cuda .
docker run -d \
@@ -116,7 +116,7 @@ docker run -d \
llamactl:llamacpp-cuda
```
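
The `docker run` command above is split across the hunk boundary. Below is a minimal sketch of a complete invocation; the port mapping, GPU flag, and model volume are illustrative assumptions, and the same pattern applies to the vLLM and source images that follow:

```bash
# Hypothetical complete invocation; adjust ports, volumes, and GPU flags to your setup.
docker run -d \
  --gpus all \
  -p 8080:8080 \
  -v "$HOME/models:/models" \
  llamactl:llamacpp-cuda
```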

-**llamactl with vLLM CUDA:**
+2. llamactl with vLLM CUDA:

```bash
docker build -f docker/Dockerfile.vllm -t llamactl:vllm-cuda .
docker run -d \
@@ -127,7 +127,7 @@ docker run -d \
llamactl:vllm-cuda
```

-**llamactl built from source:**
+3. llamactl built from source:

```bash
docker build -f docker/Dockerfile.source -t llamactl:source .
docker run -d \

@@ -33,7 +33,7 @@ Each instance is displayed as a card showing:

## Create Instance

-### Via Web UI
+**Via Web UI**



@@ -61,7 +61,7 @@ Each instance is displayed as a card showing:

- **vLLM**: Tensor parallel size, GPU memory utilization, quantization, etc.
8. Click **"Create"** to save the instance

-### Via API
+**Via API**

```bash
# Create llama.cpp instance with local model file
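# (The original example continues beyond this hunk; the request below is a
#  hedged sketch. The instance name, model path, and JSON field names are
#  assumptions, not confirmed by this diff.)
curl -X POST http://localhost:8080/api/instances/my-llama \
  -H "Content-Type: application/json" \
  -d '{
        "backend_type": "llama_cpp",
        "backend_options": {"model": "/path/to/model.gguf"}
      }'
```
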
@@ -138,37 +138,37 @@ curl -X POST http://localhost:8080/api/instances/remote-llama \

## Start Instance

-### Via Web UI
+**Via Web UI**

1. Click the **"Start"** button on an instance card
2. Watch the status change to "Unknown"
3. Monitor progress in the logs
4. Instance status changes to "Ready" once the instance is up

-### Via API
+**Via API**

```bash
curl -X POST http://localhost:8080/api/instances/{name}/start
```

## Stop Instance

-### Via Web UI
+**Via Web UI**

1. Click the **"Stop"** button on an instance card
2. The instance shuts down gracefully

-### Via API
+**Via API**

```bash
curl -X POST http://localhost:8080/api/instances/{name}/stop
```

## Edit Instance

-### Via Web UI
+**Via Web UI**

1. Click the **"Edit"** button on an instance card
2. Modify settings in the configuration dialog
3. Changes require an instance restart to take effect
4. Click **"Update & Restart"** to apply changes

-### Via API
+**Via API**

Modify instance settings:

```bash
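# (This example is cut off by the hunk; the PUT endpoint matches the hunk
#  header below, but the JSON fields here are assumptions.)
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{"backend_options": {"ctx_size": 8192}}'
```
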
@@ -188,12 +188,12 @@ curl -X PUT http://localhost:8080/api/instances/{name} \

## View Logs

-### Via Web UI
+**Via Web UI**

1. Click the **"Logs"** button on any instance card
2. A real-time log viewer opens

-### Via API
+**Via API**

Retrieve instance logs:

```bash
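# (The body of this block falls outside the hunk; the call below matches the
#  endpoint shown in the hunk header that follows.)
curl http://localhost:8080/api/instances/{name}/logs
```
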
@@ -203,12 +203,12 @@ curl http://localhost:8080/api/instances/{name}/logs

## Delete Instance

-### Via Web UI
+**Via Web UI**

1. Click the **"Delete"** button on an instance card
2. Only stopped instances can be deleted
3. Confirm deletion in the dialog

-### Via API
+**Via API**

```bash
curl -X DELETE http://localhost:8080/api/instances/{name}
```

@@ -229,11 +229,11 @@ All backends provide OpenAI-compatible endpoints. Check the respective documenta

### Instance Health

-#### Via Web UI
+**Via Web UI**

1. The health status badge is displayed on each instance card

-#### Via API
+**Via API**

Check the health status of your instances:
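
The request itself falls outside this excerpt. As a hedged sketch, a health check following the same `/api/instances/{name}/...` pattern might look like the following; the `/health` path is an assumption:

```bash
# Hypothetical health check; the exact path is not confirmed by this excerpt.
curl http://localhost:8080/api/instances/{name}/health
```
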
@@ -2,7 +2,7 @@

This guide will help you get Llamactl up and running in just a few minutes.

-## Step 1: Start Llamactl
+## Start Llamactl

Start the Llamactl server:
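
The fenced command block sits outside this hunk; judging from the hunk context below, the invocation is simply the binary (assuming `llamactl` is on your PATH):

```bash
llamactl
```
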
@@ -12,7 +12,7 @@ llamactl

By default, Llamactl will start on `http://localhost:8080`.

-## Step 2: Access the Web UI
+## Access the Web UI

Open your web browser and navigate to:

@@ -24,18 +24,18 @@ Login with the management API key. By default it is generated during server star

You should see the Llamactl web interface.

-## Step 3: Create Your First Instance
+## Create Your First Instance

1. Click the "Add Instance" button
2. Fill in the instance configuration:
   - **Name**: Give your instance a descriptive name
   - **Backend Type**: Choose from llama.cpp, MLX, or vLLM
-   - **Model**: Model path or identifier for your chosen backend
+   - **Model**: Model path or huggingface repo
   - **Additional Options**: Backend-specific parameters

3. Click "Create Instance"

-## Step 4: Start Your Instance
+## Start Your Instance

Once created, you can: