mirror of https://github.com/lordmathis/llamactl.git
synced 2025-11-05 16:44:22 +00:00

Update api-reference

Complete reference for the Llamactl REST API.

All API endpoints are relative to the base URL:

```
http://localhost:8080/api/v1
```

## Authentication

Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances
```

The server supports two types of API keys:

- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
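
For example, a management key authorizes the `/api/v1` management endpoints, while an inference key authorizes the OpenAI-compatible endpoints (the key values below are placeholders):

```bash
# Management key: list instances (placeholder key value)
curl -H "Authorization: Bearer <management-api-key>" \
  http://localhost:8080/api/v1/instances

# Inference key: OpenAI-compatible chat completion (placeholder key value)
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <inference-api-key>" \
  -d '{"model": "llama2-7b", "messages": [{"role": "user", "content": "Hi"}]}'
```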

## System Endpoints

### Get Llamactl Version

Get the version information of the llamactl server.

```http
GET /api/v1/version
```

**Response:**

```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```
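
A quick check (a sketch; assumes the default host and port, with authentication enabled):

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/version
```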

### Get Llama Server Help

Get help text for the llama-server command.

```http
GET /api/v1/server/help
```

**Response:** Plain text help output from `llama-server --help`

### Get Llama Server Version

Get version information of the llama-server binary.

```http
GET /api/v1/server/version
```

**Response:** Plain text version output from `llama-server --version`

### List Available Devices

List available devices for llama-server.

```http
GET /api/v1/server/devices
```

**Response:** Plain text device list from `llama-server --list-devices`
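
For example (assumes authentication is enabled):

```bash
curl -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/server/devices
```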

## Instances

### List All Instances

Get a list of all instances.

```http
GET /api/v1/instances
```

**Response:**

```json
[
  {
    "name": "llama2-7b",
    "status": "running",
    "created": 1705312200
  }
]
```

### Get Instance Details

Get detailed information about a specific instance.

```http
GET /api/v1/instances/{name}
```

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Create Instance

Create and start a new instance.

```http
POST /api/v1/instances/{name}
```

**Request Body:** JSON object with instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
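
A minimal sketch (the `model` field mirrors the lifecycle example later on this page; which fields apply depends on your configuration):

```bash
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"model": "/models/llama-2-7b.gguf"}'
```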

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Update Instance

Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.

```http
PUT /api/v1/instances/{name}
```

**Request Body:** JSON object with configuration fields to update.
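
A sketch (the field shown is illustrative; see [Managing Instances](managing-instances.md) for the fields your build accepts):

```bash
curl -X PUT http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <your-api-key>" \
  -d '{"model": "/models/llama-2-7b.gguf"}'
```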

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "running",
  "created": 1705312200
}
```

### Delete Instance

Stop and remove an instance.

```http
DELETE /api/v1/instances/{name}
```

**Response:** `204 No Content`
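
With `curl -i` you can confirm the status line (output abridged):

```bash
curl -i -X DELETE -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances/my-model
# HTTP/1.1 204 No Content
```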

## Instance Operations

### Start Instance

Start a stopped instance.

```http
POST /api/v1/instances/{name}/start
```

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "starting",
  "created": 1705312200
}
```

**Error Responses:**

- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance
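
For example:

```bash
curl -X POST -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances/my-model/start
```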

### Stop Instance

Stop a running instance.

```http
POST /api/v1/instances/{name}/stop
```

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "stopping",
  "created": 1705312200
}
```

### Restart Instance

Restart an instance (stop then start).

```http
POST /api/v1/instances/{name}/restart
```

**Response:**

```json
{
  "name": "llama2-7b",
  "status": "restarting",
  "created": 1705312200
}
```

### Get Instance Logs

Retrieve instance logs.

```http
GET /api/v1/instances/{name}/logs
```

**Query Parameters:**

- `lines`: Number of lines to return (defaults to all lines; pass `-1` to request all lines explicitly)

**Response:** Plain text log output

**Example:**

```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```

### Proxy to Instance

Proxy HTTP requests directly to the llama-server instance.

```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```

This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.

**Example - Check Instance Health:**

```bash
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/proxy/health
```

This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.

**Error Responses:**

- `503 Service Unavailable`: Instance is not running

## OpenAI-Compatible API

Llamactl provides OpenAI-compatible endpoints for inference operations.

### List Models

List all instances in OpenAI-compatible format.

```http
GET /v1/models
```

**Response:**

```json
{
  "object": "list",
  "data": [
    {
      "id": "llama2-7b",
      "object": "model",
      "created": 1705312200,
      "owned_by": "llamactl"
    }
  ]
}
```
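
For example:

```bash
curl -H "Authorization: Bearer <your-inference-api-key>" \
  http://localhost:8080/v1/models
```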

### Chat Completions, Completions, Embeddings

All OpenAI-compatible inference endpoints are available:

```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```
### Stop All Instances
|
**Request Body:** Standard OpenAI format with `model` field specifying the instance name
|
||||||
|
|
||||||
Stop all running instances.
|
**Example:**
|
||||||
|
|
||||||
```http
|
|
||||||
POST /api/instances/stop-all
|
|
||||||
```
|
|
||||||
|
|
||||||
## System Information
|
|
||||||
|
|
||||||
### Get System Status
|
|
||||||
|
|
||||||
Get overall system status and metrics.
|
|
||||||
|
|
||||||
```http
|
|
||||||
GET /api/system/status
|
|
||||||
```
|
|
||||||
|
|
||||||
**Response:**
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"version": "1.0.0",
|
"model": "llama2-7b",
|
||||||
"uptime": 86400,
|
"messages": [
|
||||||
"instances": {
|
{
|
||||||
"total": 5,
|
"role": "user",
|
||||||
"running": 3,
|
"content": "Hello, how are you?"
|
||||||
"stopped": 2
|
}
|
||||||
},
|
]
|
||||||
"resources": {
|
|
||||||
"cpu_usage": 45.2,
|
|
||||||
"memory_usage": 8589934592,
|
|
||||||
"memory_total": 17179869184,
|
|
||||||
"disk_usage": 75.5
|
|
||||||
}
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||

The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).

**Error Responses:**

- `400 Bad Request`: Invalid request body or missing model name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit

## Instance Status Values

Instances can have the following status values:

- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed
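
A minimal sketch for polling an instance's status via the details endpoint (assumes `jq` is installed):

```bash
curl -s -H "Authorization: Bearer <your-api-key>" \
  http://localhost:8080/api/v1/instances/my-model | jq -r .status
# prints one of: stopped | running | failed
```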
"os": "linux",
|
|
||||||
"arch": "amd64",
|
|
||||||
"cpu_count": 8,
|
|
||||||
"memory_total": 17179869184,
|
|
||||||
"version": "1.0.0",
|
|
||||||
"build_time": "2024-01-15T10:00:00Z"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
## Configuration
|
|
||||||
|
|
||||||
### Get Configuration
|
|
||||||
|
|
||||||
Get current Llamactl configuration.
|
|
||||||
|
|
||||||
```http
|
|
||||||
GET /api/config
|
|
||||||
```
|
|
||||||
|
|
||||||
### Update Configuration
|
|
||||||
|
|
||||||
Update Llamactl configuration (requires restart).
|
|
||||||
|
|
||||||
```http
|
|
||||||
PUT /api/config
|
|
||||||
```
|
|
||||||
|
|
||||||
## Authentication
|
|
||||||
|
|
||||||
### Login
|
|
||||||
|
|
||||||
Authenticate and receive a JWT token.
|
|
||||||
|
|
||||||
```http
|
|
||||||
POST /api/auth/login
|
|
||||||
```
|
|
||||||
|
|
||||||
**Request Body:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"username": "admin",
|
|
||||||
"password": "password"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
**Response:**
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
|
|
||||||
"expires_at": "2024-01-16T14:30:00Z"
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|
||||||
### Refresh Token
|
|
||||||
|
|
||||||
Refresh an existing JWT token.
|
|
||||||
|
|
||||||
```http
|
|
||||||
POST /api/auth/refresh
|
|
||||||
```
|
|
||||||
|
|
||||||

## Error Responses

All endpoints may return error responses in the following format:

```json
{
  "error": "Error message description"
}
```

Common HTTP status codes:

- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)

## Examples

### Complete Instance Lifecycle

```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "model": "/models/llama-2-7b.gguf"
  }'

# Check instance status
curl -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model

# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
  "http://localhost:8080/api/v1/instances/my-model/logs?lines=50"

# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-inference-api-key" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 100
  }'

# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model/stop

# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
  http://localhost:8080/api/v1/instances/my-model
```

### Using the Proxy Endpoint

You can also directly proxy requests to the llama-server instance:

```bash
# Direct proxy to instance (bypasses OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{
    "prompt": "Hello, world!",
    "n_predict": 50
  }'
```

## Swagger Documentation

If swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:

```
http://localhost:8080/swagger/
```

This provides a complete interactive interface for testing all API endpoints.