diff --git a/README.md b/README.md
index d016634..60aafa9 100644
--- a/README.md
+++ b/README.md
@@ -10,17 +10,17 @@
 ## Features

-### 🚀 Easy Model Management
+**🚀 Easy Model Management**

 - **Multiple Models Simultaneously**: Run different models at the same time (7B for speed, 70B for quality)
 - **Smart Resource Management**: Automatic idle timeout, LRU eviction, and configurable instance limits
 - **Web Dashboard**: Modern React UI for managing instances, monitoring health, and viewing logs

-### 🔗 Flexible Integration
+**🔗 Flexible Integration**

 - **OpenAI API Compatible**: Drop-in replacement - route requests to different models by instance name
 - **Multi-Backend Support**: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
 - **Docker Ready**: Run backends in containers with full GPU support

-### 🌐 Distributed Deployment
+**🌐 Distributed Deployment**

 - **Remote Instances**: Deploy instances on remote hosts
 - **Central Management**: Manage everything from a single dashboard with automatic routing
diff --git a/docs/installation.md b/docs/installation.md
index 413e1fc..9442877 100644
--- a/docs/installation.md
+++ b/docs/installation.md
@@ -82,7 +82,7 @@ llamactl provides Dockerfiles for creating Docker images with backends pre-insta

 **Note:** These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at [llama.cpp Docker docs](https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md). For vLLM, check [vLLM docs](https://docs.vllm.ai/en/v0.6.5/serving/deploying_with_docker.html).

-#### Using Docker Compose
+**Using Docker Compose**

 ```bash
 # Clone the repository
@@ -103,9 +103,9 @@ Access the dashboard at:
 - llamactl with llama.cpp: http://localhost:8080
 - llamactl with vLLM: http://localhost:8081

-#### Using Docker Build and Run
+**Using Docker Build and Run**

-**llamactl with llama.cpp CUDA:**
+1. llamactl with llama.cpp CUDA:
 ```bash
 docker build -f docker/Dockerfile.llamacpp -t llamactl:llamacpp-cuda .
 docker run -d \
@@ -116,7 +116,7 @@ docker run -d \
   llamactl:llamacpp-cuda
 ```

-**llamactl with vLLM CUDA:**
+2. llamactl with vLLM CUDA:
 ```bash
 docker build -f docker/Dockerfile.vllm -t llamactl:vllm-cuda .
 docker run -d \
@@ -127,7 +127,7 @@ docker run -d \
   llamactl:vllm-cuda
 ```

-**llamactl built from source:**
+3. llamactl built from source:
 ```bash
 docker build -f docker/Dockerfile.source -t llamactl:source .
 docker run -d \
diff --git a/docs/managing-instances.md b/docs/managing-instances.md
index 68493fc..9277c6d 100644
--- a/docs/managing-instances.md
+++ b/docs/managing-instances.md
@@ -33,7 +33,7 @@ Each instance is displayed as a card showing:

 ## Create Instance

-### Via Web UI
+**Via Web UI**

 ![Create Instance Screenshot](images/create_instance.png)
@@ -61,7 +61,7 @@ Each instance is displayed as a card showing:
    - **vLLM**: Tensor parallel size, GPU memory utilization, quantization, etc.
 8. Click **"Create"** to save the instance

-### Via API
+**Via API**

 ```bash
 # Create llama.cpp instance with local model file
@@ -138,37 +138,37 @@ curl -X POST http://localhost:8080/api/instances/remote-llama \

 ## Start Instance

-### Via Web UI
+**Via Web UI**

 1. Click the **"Start"** button on an instance card
 2. Watch the status change to "Unknown"
 3. Monitor progress in the logs
 4. Instance status changes to "Ready" when ready

-### Via API
+**Via API**

 ```bash
 curl -X POST http://localhost:8080/api/instances/{name}/start
 ```

 ## Stop Instance

-### Via Web UI
+**Via Web UI**

 1. Click the **"Stop"** button on an instance card
 2. Instance gracefully shuts down

-### Via API
+**Via API**

 ```bash
 curl -X POST http://localhost:8080/api/instances/{name}/stop
 ```

 ## Edit Instance

-### Via Web UI
+**Via Web UI**

 1. Click the **"Edit"** button on an instance card
 2. Modify settings in the configuration dialog
 3. Changes require instance restart to take effect
 4. Click **"Update & Restart"** to apply changes

-### Via API
+**Via API**

 Modify instance settings:

 ```bash
@@ -188,12 +188,12 @@ curl -X PUT http://localhost:8080/api/instances/{name} \

 ## View Logs

-### Via Web UI
+**Via Web UI**

 1. Click the **"Logs"** button on any instance card
 2. Real-time log viewer opens

-### Via API
+**Via API**

 Check instance status in real-time:

 ```bash
@@ -203,12 +203,12 @@ curl http://localhost:8080/api/instances/{name}/logs

 ## Delete Instance

-### Via Web UI
+**Via Web UI**

 1. Click the **"Delete"** button on an instance card
 2. Only stopped instances can be deleted
 3. Confirm deletion in the dialog

-### Via API
+**Via API**

 ```bash
 curl -X DELETE http://localhost:8080/api/instances/{name}
 ```
@@ -229,11 +229,11 @@ All backends provide OpenAI-compatible endpoints. Check the respective documenta

 ### Instance Health

-#### Via Web UI
+**Via Web UI**

 1. The health status badge is displayed on each instance card

-#### Via API
+**Via API**

 Check the health status of your instances:

diff --git a/docs/quick-start.md b/docs/quick-start.md
index 55b901c..7a6dedd 100644
--- a/docs/quick-start.md
+++ b/docs/quick-start.md
@@ -2,7 +2,7 @@

 This guide will help you get Llamactl up and running in just a few minutes.

-## Step 1: Start Llamactl
+## Start Llamactl

 Start the Llamactl server:

@@ -12,7 +12,7 @@ llamactl

 By default, Llamactl will start on `http://localhost:8080`.

-## Step 2: Access the Web UI
+## Access the Web UI

 Open your web browser and navigate to:

@@ -24,18 +24,18 @@ Login with the management API key. By default it is generated during server star

 You should see the Llamactl web interface.

-## Step 3: Create Your First Instance
+## Create Your First Instance

 1. Click the "Add Instance" button
 2. Fill in the instance configuration:
    - **Name**: Give your instance a descriptive name
    - **Backend Type**: Choose from llama.cpp, MLX, or vLLM
-   - **Model**: Model path or identifier for your chosen backend
+   - **Model**: Model path or huggingface repo
    - **Additional Options**: Backend-specific parameters
 3. Click "Create Instance"

-## Step 4: Start Your Instance
+## Start Your Instance

 Once created, you can: