# Quick Start

This guide will help you get Llamactl up and running in just a few minutes.

## Step 1: Start Llamactl

Start the Llamactl server:

```bash
llamactl
```

By default, Llamactl starts on `http://localhost:8080`.
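
To confirm the server is listening, you can hit the root URL from another terminal (a quick sanity check; the exact response body may vary):

```bash
# Expect an HTTP 200 (the web UI) once the server is up
curl -i http://localhost:8080
```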

## Step 2: Access the Web UI

Open your web browser and navigate to:

```
http://localhost:8080
```

Log in with the management API key. By default, it is generated during server startup; copy it from the terminal output.

You should see the Llamactl web interface.
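
The same key also works for scripted access. As a sketch, assuming the management API accepts the key as a bearer token, an authenticated request would look like this:

```bash
# Assumption: the management API accepts the key via a Bearer header.
# Replace <management-api-key> with the value printed at startup.
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/instances
```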

## Step 3: Create Your First Instance

1. Click the "Add Instance" button
2. Fill in the instance configuration:
    - **Name**: Give your instance a descriptive name
    - **Model Path**: Path to your Llama.cpp model file
    - **Additional Options**: Any extra Llama.cpp parameters
3. Click "Create Instance"

## Step 4: Start Your Instance

Once created, you can:

- Start the instance by clicking the start button
- Monitor its status in real time (see the API sketch after this list)
- View logs by clicking the logs button
- Stop the instance when needed
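
Status checks are also possible from the command line. As a hedged sketch, assuming a per-instance GET route that mirrors the collection endpoint shown in the next section, you could poll an instance like this:

```bash
# Assumption: a per-instance GET route exists alongside /api/instances
curl http://localhost:8080/api/instances/my-model
```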

## Example Configuration

Here's a basic example configuration for a Llama 2 model:

```json
{
  "name": "llama2-7b",
  "model_path": "/path/to/llama-2-7b-chat.gguf",
  "options": {
    "threads": 4,
    "context_size": 2048
  }
}
```
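
For reference, these options line up with llama.cpp's own server flags. Assuming llamactl forwards them to the llama-server binary, the roughly equivalent manual invocation would be:

```bash
# Rough equivalent (assumption: llamactl passes options through
# to llama.cpp's llama-server): -t = threads, -c = context size
llama-server -m /path/to/llama-2-7b-chat.gguf -t 4 -c 2048
```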

## Using the API

You can also manage instances via the REST API:

```bash
# List all instances
curl http://localhost:8080/api/instances

# Create a new instance
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-model",
    "model_path": "/path/to/model.gguf"
  }'

# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
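
A matching stop route presumably mirrors the start route; treat the exact path as an assumption and check the API reference:

```bash
# Stop an instance (assumed endpoint, mirroring /start)
curl -X POST http://localhost:8080/api/instances/my-model/stop
```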

## OpenAI Compatible API

Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.

### Chat Completions

Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "Hello! Can you help me write a Python function?"
      }
    ],
    "max_tokens": 150,
    "temperature": 0.7
  }'
```
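
The endpoint should also honor the standard OpenAI `stream` option (assuming the backend supports streaming), returning tokens as server-sent events:

```bash
# -N disables curl's buffering so tokens print as they arrive
# (assumption: the instance supports the standard "stream" option)
curl -N -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Write a haiku about llamas"}],
    "stream": true
  }'
```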

### Using with Python OpenAI Client

You can also use the official OpenAI Python client:

```python
from openai import OpenAI

# Point the client to your Llamactl server
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed"  # Llamactl doesn't require API keys by default
)

# Create a chat completion
response = client.chat.completions.create(
    model="my-model",  # Use the name of your instance
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    max_tokens=200,
    temperature=0.7
)

print(response.choices[0].message.content)
```

### List Available Models

Get a list of running instances (models) in OpenAI-compatible format:

```bash
curl http://localhost:8080/v1/models
```
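
Assuming the response follows the standard OpenAI list shape (`{"object": "list", "data": [...]}`), you can extract just the model IDs with `jq`:

```bash
# Print only the instance names (assumes the standard OpenAI list shape)
curl -s http://localhost:8080/v1/models | jq '.data[].id'
```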

## Next Steps