# Managing Instances

Learn how to effectively manage your Llama.cpp instances with Llamactl.
## Instance Lifecycle

### Creating Instances

Instances can be created through the Web UI or API:

#### Via Web UI

- Click the "Add Instance" button
- Fill in the configuration form
- Click "Create"

#### Via API

```bash
curl -X POST http://localhost:8080/api/instances \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-instance",
    "model_path": "/path/to/model.gguf",
    "port": 8081
  }'
```
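The same create request can be scripted. A minimal client-side sketch in Python (the `build_instance_payload` helper and its validation checks are illustrative, not part of llamactl's API; llamactl performs its own validation server-side):

```python
def build_instance_payload(name: str, model_path: str, port: int) -> dict:
    """Build the JSON body for POST /api/instances.

    Hypothetical client-side helper: it only catches obvious mistakes
    before the request is sent.
    """
    if not name:
        raise ValueError("instance name must be non-empty")
    if not (1 <= port <= 65535):
        raise ValueError(f"port out of range: {port}")
    return {"name": name, "model_path": model_path, "port": port}
```

Serialize the result with `json.dumps` and send it as the request body.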
### Starting and Stopping

#### Start an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/start

# The instance will begin loading the model
```

#### Stop an Instance

```bash
# Via API
curl -X POST http://localhost:8080/api/instances/{name}/stop

# Graceful shutdown with configurable timeout
```
### Monitoring Status

Check instance status in real time:

```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}

# Get health status
curl http://localhost:8080/api/instances/{name}/health
```
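When scripting against these endpoints, it usually pays to poll until the instance is actually serving rather than fire-and-forget. A sketch of such a loop (the `wait_for_state` helper is hypothetical; `fetch_status` stands in for any wrapper around the status endpoint, and the lowercase state strings are an assumption):

```python
import time

def wait_for_state(fetch_status, target="running", timeout=60.0,
                   interval=1.0, sleep=time.sleep):
    """Poll fetch_status() until it returns `target`, hits an error,
    or the timeout expires. Returns True if the target state was reached."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status()
        if state == target:
            return True
        if state == "error":
            raise RuntimeError("instance entered the Error state")
        sleep(interval)
    return False
```

Injecting `fetch_status` and `sleep` keeps the loop testable without a live server.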
### Instance States

Instances can be in one of several states:

- **Stopped**: Instance is not running
- **Starting**: Instance is initializing and loading the model
- **Running**: Instance is active and ready to serve requests
- **Stopping**: Instance is shutting down gracefully
- **Error**: Instance encountered an error
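For client code it can help to model these transitions explicitly. The table below is an illustrative reading of the states above, not an API contract; llamactl enforces the real lifecycle server-side, and the recovery edge from `error` is an assumption:

```python
# One plausible reading of the lifecycle; lowercase keys are assumptions.
VALID_TRANSITIONS = {
    "stopped":  {"starting"},
    "starting": {"running", "error"},
    "running":  {"stopping", "error"},
    "stopping": {"stopped", "error"},
    "error":    {"starting"},  # assumption: restarting retries from Error
}

def can_transition(current: str, target: str) -> bool:
    """Check whether a state change is expected under this model."""
    return target in VALID_TRANSITIONS.get(current, set())
```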
## Configuration Management

### Updating Instance Configuration

Modify instance settings:

```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
  -H "Content-Type: application/json" \
  -d '{
    "options": {
      "threads": 8,
      "context_size": 4096
    }
  }'
```

!!! note
    Configuration changes require restarting the instance to take effect.
### Viewing Configuration

```bash
# Get current configuration
curl http://localhost:8080/api/instances/{name}/config
```
## Resource Management

### Memory Usage

Monitor memory consumption:

```bash
# Get resource usage
curl http://localhost:8080/api/instances/{name}/stats
```

### CPU and GPU Usage

Track performance metrics:

- CPU thread utilization
- GPU memory usage (if applicable)
- Request processing times
## Troubleshooting Common Issues

### Instance Won't Start

- **Check the model path**: Ensure the model file exists and is readable
- **Port conflicts**: Verify the port isn't already in use
- **Resource limits**: Check available memory and CPU
- **Permissions**: Ensure proper file system permissions
### Performance Issues

- **Adjust thread count**: Match the thread count to your CPU cores
- **Optimize context size**: Balance memory usage against capability
- **GPU offloading**: Use `gpu_layers` for GPU acceleration
- **Batch size tuning**: Optimize the batch size for your workload
### Memory Problems

- **Reduce context size**: Lower memory requirements
- **Disable memory mapping**: Use the `no_mmap` option
- **Enable memory locking**: Use `memory_lock` for performance
- **Monitor system resources**: Check available RAM
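When weighing context size against memory, a back-of-the-envelope KV-cache estimate shows what the setting costs: each transformer layer stores a K and a V tensor of `context_size × n_kv_heads × head_dim` elements. The model shape below is illustrative (a 7B-class model with grouped-query attention), not read from any specific file:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_size,
                   bytes_per_elem=2):
    """Rough KV-cache size: 2 tensors (K and V) per layer, each holding
    context_size * n_kv_heads * head_dim elements (fp16 by default)."""
    return 2 * n_layers * context_size * n_kv_heads * head_dim * bytes_per_elem

# Assumed shape: 32 layers, 8 KV heads, head_dim 128, 4096 context, fp16.
print(kv_cache_bytes(32, 8, 128, 4096) / 2**20)  # → 512.0 (MiB)
```

Halving the context size halves this figure, which is why reducing it is the first lever to pull.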
## Best Practices

### Production Deployments

- **Resource allocation**: Plan memory and CPU requirements
- **Health monitoring**: Set up regular health checks
- **Graceful shutdowns**: Use proper stop procedures
- **Backup configurations**: Save instance configurations
- **Log management**: Configure appropriate logging levels

### Development Environments

- **Resource sharing**: Use smaller models for development
- **Quick iterations**: Optimize for fast startup times
- **Debug logging**: Enable detailed logging for troubleshooting
## Batch Operations

### Managing Multiple Instances

```bash
# Start all instances
curl -X POST http://localhost:8080/api/instances/start-all

# Stop all instances
curl -X POST http://localhost:8080/api/instances/stop-all

# Get status of all instances
curl http://localhost:8080/api/instances
```
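The bulk endpoints can also be reproduced selectively in a script, for example starting only instances that are currently stopped. A sketch over the list endpoint's output (the `name`/`status` field names and lowercase status strings are assumptions; check the API Reference for the actual response schema):

```python
def instances_to_start(instances: list) -> list:
    """Given a list of instance objects (assumed shape:
    {"name": ..., "status": ...}), pick the names a selective
    start pass should act on."""
    return [i["name"] for i in instances if i.get("status") == "stopped"]
```

Each returned name can then be fed to the per-instance `/start` endpoint shown earlier.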
## Next Steps

- Learn about the Web UI interface
- Explore the complete API Reference
- Set up Monitoring for production use