diff --git a/README.md b/README.md
index 3d4f8c9..7723e91 100644
--- a/README.md
+++ b/README.md
@@ -16,6 +16,7 @@ A control server for managing multiple Llama Server instances with a web-based d
 - **OpenAI Compatible**: Route requests to instances by instance name
 - **Configuration Management**: Comprehensive llama-server parameter support
 - **System Information**: View llama-server version, devices, and help
+- **API Key Authentication**: Secure access with separate management and inference keys
 
 ## Prerequisites
 
@@ -86,10 +87,8 @@ llamactl can be configured via configuration files or environment variables. Con
 2. Configuration file
 3. Environment variables
 
-
 ### Configuration Files
 
-
 #### Configuration File Locations
 
 Configuration files are searched in the following locations (in order of precedence):
@@ -107,21 +106,8 @@ Configuration files are searched in the following locations (in order of precede
 
 You can specify the path to config file with `LLAMACTL_CONFIG_PATH` environment variable.
 
-## API Key Authentication
-
-llamactl now supports API Key authentication for both management and inference (OpenAI-compatible) endpoints. The are separate keys for management and inference APIs. Management keys grant full access; inference keys grant access to OpenAI-compatible endpoints
-
-**How to Use:**
-- Pass your API key in requests using one of:
-  - `Authorization: Bearer <key>` header
-  - `X-API-Key: <key>` header
-  - `api_key=<key>` query parameter
-
-**Auto-generated keys**: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
-
 ### Configuration Options
-
 #### Server Configuration
 
 ```yaml
 server:
@@ -138,7 +124,6 @@ server:
 - `LLAMACTL_ALLOWED_ORIGINS` - Comma-separated CORS origins
 - `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)
-
 #### Instance Configuration
 
 ```yaml
 instances:
@@ -167,8 +152,7 @@ instances:
 - `LLAMACTL_DEFAULT_MAX_RESTARTS` - Default maximum restarts
 - `LLAMACTL_DEFAULT_RESTART_DELAY` - Default restart delay in seconds
 
-
-#### Auth Configuration
+#### Authentication Configuration
 
 ```yaml
 auth:
@@ -184,7 +168,6 @@ auth:
 - `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
 - `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
 
-
 ### Example Configuration
 
 ```yaml
@@ -226,6 +209,22 @@ LLAMACTL_CONFIG_PATH=/path/to/config.yaml ./llamactl
 LLAMACTL_PORT=9090 LLAMACTL_LOG_DIR=/custom/logs ./llamactl
 ```
 
+### Authentication
+
+llamactl supports API Key authentication for both management and inference (OpenAI-compatible) endpoints. There are separate keys for management and inference APIs:
+
+- **Management keys** grant full access to instance management
+- **Inference keys** grant access to OpenAI-compatible endpoints
+- Management keys also work for inference endpoints (higher privilege)
+
+**How to Use:**
+Pass your API key in requests using one of:
+- `Authorization: Bearer <key>` header
+- `X-API-Key: <key>` header
+- `api_key=<key>` query parameter
+
+**Auto-generated keys**: If no keys are set and authentication is required, a key will be generated and printed to the terminal at startup. For production, set your own keys in config or environment variables.
+
 ### Web Dashboard
 
 Open your browser and navigate to `http://localhost:8080` to access the web dashboard.
@@ -239,6 +238,7 @@ The REST API is available at `http://localhost:8080/api/v1`. See the Swagger doc
 ```bash
 curl -X POST http://localhost:8080/api/v1/instances/my-instance \
   -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-management-your-key" \
   -d '{
     "model": "/path/to/model.gguf",
     "gpu_layers": 32,
@@ -249,17 +249,22 @@ curl -X POST http://localhost:8080/api/v1/instances/my-instance \
 #### List Instances
 
 ```bash
-curl http://localhost:8080/api/v1/instances
+curl -H "Authorization: Bearer sk-management-your-key" \
+  http://localhost:8080/api/v1/instances
 ```
 
 #### Start/Stop Instance
 
 ```bash
 # Start
-curl -X POST http://localhost:8080/api/v1/instances/my-instance/start
+curl -X POST \
+  -H "Authorization: Bearer sk-management-your-key" \
+  http://localhost:8080/api/v1/instances/my-instance/start
 
 # Stop
-curl -X POST http://localhost:8080/api/v1/instances/my-instance/stop
+curl -X POST \
+  -H "Authorization: Bearer sk-management-your-key" \
+  http://localhost:8080/api/v1/instances/my-instance/stop
 ```
 
 ### OpenAI Compatible Endpoints
@@ -269,6 +274,7 @@ Route requests to instances by including the instance name as the model paramete
 ```bash
 curl -X POST http://localhost:8080/v1/chat/completions \
   -H "Content-Type: application/json" \
+  -H "Authorization: Bearer sk-inference-your-key" \
   -d '{
     "model": "my-instance",
     "messages": [{"role": "user", "content": "Hello!"}]
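
The patch documents three ways to pass an API key (`Authorization: Bearer` header, `X-API-Key` header, `api_key` query parameter), but its curl examples only exercise the first. A minimal sketch of all three against the OpenAI-compatible endpoint shown in the patch, reusing its placeholder key names:

```bash
# Sketch: the three documented key-passing mechanisms. Endpoint, model name,
# and placeholder key values are taken from the patch itself.

# 1. Authorization: Bearer header (the form the patch's examples use)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-inference-your-key" \
  -d '{"model": "my-instance", "messages": [{"role": "user", "content": "Hello!"}]}'

# 2. X-API-Key header
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-API-Key: sk-inference-your-key" \
  -d '{"model": "my-instance", "messages": [{"role": "user", "content": "Hello!"}]}'

# 3. api_key query parameter
curl -X POST "http://localhost:8080/v1/chat/completions?api_key=sk-inference-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "my-instance", "messages": [{"role": "user", "content": "Hello!"}]}'
```

Per the added bullet list, substituting `sk-management-your-key` in any of these should also be accepted, since management keys are the higher-privilege tier.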
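
For production, the patch advises setting your own keys via config or environment variables. A startup sketch under that reading: `LLAMACTL_MANAGEMENT_KEYS` and `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` appear in the patch, while `LLAMACTL_INFERENCE_KEYS` and `LLAMACTL_REQUIRE_INFERENCE_AUTH` are assumed here to mirror them on the inference side.

```bash
# Startup sketch: supply your own comma-separated keys instead of relying on
# the auto-generated one. The two *_INFERENCE_* variable names are assumptions
# mirroring the management-side variables documented in the patch.
LLAMACTL_REQUIRE_MANAGEMENT_AUTH=true \
LLAMACTL_MANAGEMENT_KEYS=sk-management-your-key \
LLAMACTL_REQUIRE_INFERENCE_AUTH=true \
LLAMACTL_INFERENCE_KEYS=sk-inference-your-key \
./llamactl
```

If no keys are set while auth is required, the server instead prints a generated key to the terminal at startup, as the Auto-generated keys note in the patch describes.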