
Managing Instances

Learn how to manage your llama.cpp instances with Llamactl.

Instance Lifecycle

Creating Instances

Instances can be created through the Web UI or API:

Via Web UI

  1. Click the "Add Instance" button
  2. Fill in the configuration form
  3. Click "Create"

Via API

curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'

Starting and Stopping

Start an Instance

# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start

# The instance will begin loading the model

Stop an Instance

# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop

# The instance shuts down gracefully; the shutdown timeout is configurable

Monitoring Status

Check instance status in real time:

# Get instance details
curl http://localhost:8080/api/instances/{name}

# Get health status
curl http://localhost:8080/api/instances/{name}/health

Instance States

Instances can be in one of several states:

  • Stopped: Instance is not running
  • Starting: Instance is initializing and loading the model
  • Running: Instance is active and ready to serve requests
  • Stopping: Instance is shutting down gracefully
  • Error: Instance encountered an error
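
A start request returns before the model has finished loading, so scripts often need to wait for the Running state. The following is a minimal sketch of such a wait loop, not part of Llamactl itself. It assumes that `GET /api/instances/{name}` returns JSON with a string field named `status` holding one of the state names above; that field name is an assumption, as are the helper names `extract_status`, `classify_state`, and `wait_until_running`.

```shell
#!/usr/bin/env bash
# Sketch: wait for an instance to reach the Running state.
# ASSUMPTION: the instance details JSON contains a string field "status"
# holding one of the documented state names. Adjust to the real schema.

BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print the value of a "status" field found in JSON read from stdin.
# This is a naive text match, not a JSON parser.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Map a state name to an action: 0 = ready, 1 = keep waiting, 2 = give up.
classify_state() {
  case "$1" in
    Running|running)   return 0 ;;
    Starting|starting) return 1 ;;
    *)                 return 2 ;;
  esac
}

# Poll every 2 seconds, up to $2 attempts (default 30).
wait_until_running() {
  name="$1"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    state="$(curl -fsS "$BASE_URL/api/instances/$name" | extract_status)"
    rc=0
    classify_state "$state" || rc=$?
    if [ "$rc" -eq 0 ]; then
      echo "$name is running"
      return 0
    elif [ "$rc" -eq 2 ]; then
      echo "$name is in state '${state:-unknown}', giving up" >&2
      return 1
    fi
    sleep 2
    i=$((i + 1))
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

Call `wait_until_running my-instance` after the start request. Any state other than Starting or Running (including Stopped, Stopping, and Error) ends the wait immediately, so a failed start does not block the script until the timeout.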

Configuration Management

Updating Instance Configuration

Modify instance settings:

curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'

Note: Configuration changes require restarting the instance to take effect.
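
Because new settings apply only after a restart, an update is usually followed by a stop and a start. The sketch below chains the three documented endpoints. The helper names `build_options_payload` and `update_and_restart` are hypothetical, and the script assumes the stop request has completed before the start request is sent; if stopping is asynchronous in your setup, wait for the Stopped state in between.

```shell
#!/usr/bin/env bash
# Sketch: apply new options, then restart the instance so they take effect.
# ASSUMPTION: the stop request completes before start is called.

BASE_URL="${BASE_URL:-http://localhost:8080}"

# Build the JSON body from two integers: thread count and context size.
build_options_payload() {
  printf '{"options":{"threads":%d,"context_size":%d}}' "$1" "$2"
}

# Usage: update_and_restart NAME THREADS CONTEXT_SIZE
# Stops at the first request that fails.
update_and_restart() {
  name="$1"
  payload="$(build_options_payload "$2" "$3")"
  curl -fsS -X PUT "$BASE_URL/api/instances/$name" \
    -H "Content-Type: application/json" \
    -d "$payload" &&
  curl -fsS -X POST "$BASE_URL/api/instances/$name/stop" &&
  curl -fsS -X POST "$BASE_URL/api/instances/$name/start"
}

# Example (requires a running Llamactl server):
#   update_and_restart my-instance 8 4096
```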

Viewing Configuration

# Get current configuration
curl http://localhost:8080/api/instances/{name}/config

Resource Management

Memory Usage

Monitor memory consumption:

# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats

CPU and GPU Usage

Track performance metrics:

  • CPU thread utilization
  • GPU memory usage (if applicable)
  • Request processing times

Troubleshooting Common Issues

Instance Won't Start

  1. Check model path: Ensure the model file exists and is readable
  2. Port conflicts: Verify the port isn't already in use
  3. Resource limits: Check available memory and CPU
  4. Permissions: Ensure proper file system permissions
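
The first two checks and the permissions check can be run locally before sending a start request. This is a rough sketch with hypothetical helper names (`check_model_readable`, `port_in_use`, `preflight`). The port check relies on bash's `/dev/tcp` redirection and only detects listeners reachable on 127.0.0.1, so treat a "free" result as a hint rather than a guarantee.

```shell
#!/usr/bin/env bash
# Sketch: local pre-flight checks before starting an instance.

# Succeeds if the model path is a regular file the current user can read.
check_model_readable() {
  [ -f "$1" ] && [ -r "$1" ]
}

# Succeeds if something accepts TCP connections on 127.0.0.1:$1.
# Uses bash's /dev/tcp; in shells without it the port is reported as free.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Usage: preflight MODEL_PATH PORT
# Returns 0 if both checks pass, 1 otherwise.
preflight() {
  model="$1"
  port="$2"
  status=0
  if ! check_model_readable "$model"; then
    echo "model file missing or unreadable: $model" >&2
    status=1
  fi
  if port_in_use "$port"; then
    echo "port $port is already in use" >&2
    status=1
  fi
  return "$status"
}

# Example:
#   preflight /path/to/model.gguf 8081 && echo "ready to start"
```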

Performance Issues

  1. Adjust thread count: Match to your CPU cores
  2. Optimize context size: Balance memory usage and capability
  3. GPU offloading: Use gpu_layers for GPU acceleration
  4. Batch size tuning: Optimize for your workload
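
As a starting point for the thread count, the number of online CPUs can be read from the system. The sketch below is one way to do that; `detect_cpu_count` and `suggest_threads` are hypothetical helpers. It reports logical CPUs, which may include SMT siblings, so the best value for a given machine may be lower and is worth measuring.

```shell
#!/usr/bin/env bash
# Sketch: suggest a thread count from the number of online logical CPUs.

# Print the number of online logical CPUs, or 1 if it cannot be detected.
detect_cpu_count() {
  n="$(getconf _NPROCESSORS_ONLN 2>/dev/null)" ||
    n="$(nproc 2>/dev/null)" ||
    n="$(sysctl -n hw.ncpu 2>/dev/null)" ||
    n=1
  # Fall back to 1 if the tool printed something that is not a number.
  case "$n" in
    ''|*[!0-9]*) n=1 ;;
  esac
  echo "$n"
}

# Usage: suggest_threads CPUS [MAX]
# Prints CPUS, capped at MAX when MAX is given.
suggest_threads() {
  cpus="$1"
  max="${2:-$cpus}"
  if [ "$cpus" -gt "$max" ]; then
    echo "$max"
  else
    echo "$cpus"
  fi
}

echo "suggested threads: $(suggest_threads "$(detect_cpu_count)")"
```

The optional cap is useful when several instances share one machine and each should use only part of the CPU.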

Memory Problems

  1. Reduce context size: Lower memory requirements
  2. Disable memory mapping: Use no_mmap option
  3. Enable memory locking: Use memory_lock for performance
  4. Monitor system resources: Check available RAM
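
To check available RAM against a model, a quick comparison of the model file size with the memory the kernel reports as available can help. The sketch below is Linux-specific (it reads `/proc/meminfo`) and uses hypothetical helper names. The comparison is only a rough lower bound: it ignores the context cache and other runtime allocations.

```shell
#!/usr/bin/env bash
# Sketch: compare a model file's size with available RAM (Linux only).

# Print MemAvailable in MiB from /proc/meminfo-formatted text on stdin.
mem_available_mib() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }'
}

# Print a file's size in MiB, rounded up.
file_size_mib() {
  bytes="$(wc -c < "$1")"
  echo $(( (bytes + 1048575) / 1048576 ))
}

# Usage: report_memory MODEL_PATH
report_memory() {
  if [ ! -r /proc/meminfo ]; then
    echo "/proc/meminfo not available on this system" >&2
    return 1
  fi
  avail="$(mem_available_mib < /proc/meminfo)"
  size="$(file_size_mib "$1")" || return 1
  echo "model: ${size} MiB, available RAM: ${avail} MiB"
  if [ "$size" -gt "$avail" ]; then
    echo "warning: model file is larger than available RAM" >&2
  fi
}

# Example:
#   report_memory /path/to/model.gguf
```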

Best Practices

Production Deployments

  1. Resource allocation: Plan memory and CPU requirements
  2. Health monitoring: Set up regular health checks
  3. Graceful shutdowns: Use proper stop procedures
  4. Backup configurations: Save instance configurations
  5. Log management: Configure appropriate logging levels
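
Configuration backups can be scripted with the config endpoint described earlier. The following sketch saves one JSON file per instance. The helper names `backup_path` and `backup_configs` are hypothetical, and instance names are passed explicitly because the response format of the instance list is not specified in this guide.

```shell
#!/usr/bin/env bash
# Sketch: save each instance's configuration to a backup directory.
# Uses GET /api/instances/{name}/config; instance names are given as arguments.

BASE_URL="${BASE_URL:-http://localhost:8080}"

# Print the file path used for one instance's backup.
backup_path() {
  printf '%s/%s.json\n' "$1" "$2"
}

# Usage: backup_configs DIR NAME [NAME...]
# Returns 1 if any instance could not be backed up.
backup_configs() {
  dir="$1"
  shift
  mkdir -p "$dir" || return 1
  failed=0
  for name in "$@"; do
    out="$(backup_path "$dir" "$name")"
    if curl -fsS "$BASE_URL/api/instances/$name/config" -o "$out"; then
      echo "saved $out"
    else
      echo "failed to back up $name" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Example (requires a running Llamactl server):
#   backup_configs "backups/$(date +%Y-%m-%d)" my-instance other-instance
```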

Development Environments

  1. Resource sharing: Use smaller models for development
  2. Quick iterations: Optimize for fast startup times
  3. Debug logging: Enable detailed logging for troubleshooting

Batch Operations

Managing Multiple Instances

# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances