279 Commits

Author SHA1 Message Date
f1666565d8 Merge pull request #74 from lordmathis/refactor/health-check
refactor: Improve frontend health check
2025-10-26 19:54:38 +01:00
13ef13449c Fix ts type check 2025-10-26 19:52:44 +01:00
777e07752b Fix health service tests 2025-10-26 19:48:07 +01:00
75e7b628ca Remove 'loading' and 'error' states 2025-10-26 19:12:35 +01:00
2a1bebeb24 Improve health checks for instances 2025-10-26 19:05:03 +01:00
f94d05dad2 Add Restarting state 2025-10-26 18:55:05 +01:00
14f4a80c89 Merge pull request #73 from lordmathis/refactor/docs
refactor: Update docs structure and improve content clarity
2025-10-26 17:30:50 +01:00
d768845805 Add Authorization header to curl examples. 2025-10-26 17:28:01 +01:00
4f94f63de3 Minor docs improvements 2025-10-26 17:19:53 +01:00
249ff2a7aa Capitalize godoc tags 2025-10-26 16:49:27 +01:00
6c522a2199 Ad core concepts to quick-start 2025-10-26 16:44:32 +01:00
3ff87f24bd Update swagger docs 2025-10-26 16:36:24 +01:00
eac4f834c0 Update API endpoints in managing instances and quick start documentation 2025-10-26 16:35:58 +01:00
59c954811d Update API routes in godoc 2025-10-26 16:35:42 +01:00
dd40b153d8 Minor docs improvements 2025-10-26 16:10:37 +01:00
c0cd03c75d Refactor troubleshooting documentation for instance management issues 2025-10-26 15:59:17 +01:00
6a840069e1 Fix llama.cpp link in troubleshooting docs 2025-10-26 15:35:51 +01:00
7509722dfa Clarify port and api key assignments 2025-10-26 15:32:40 +01:00
a5e9e01ff4 Update screenshots 2025-10-26 15:25:53 +01:00
7063e83cd2 Improve OpenAPI docs styling 2025-10-26 15:00:30 +01:00
781921fc5a Refactor documentation headings 2025-10-26 14:50:42 +01:00
85e21596d9 Auto generate mkdocs api reference from swagger 2025-10-26 14:43:27 +01:00
975c740272 Add API key security definitions to Swagger documentation 2025-10-26 14:42:55 +01:00
e387280405 Update docs path in contributiong 2025-10-26 14:41:59 +01:00
58c8899fd9 Update import path for API documentation 2025-10-26 14:08:48 +01:00
f98b09ea78 Move apidocs to docs folder 2025-10-26 14:04:53 +01:00
90b65cad79 Update docs navigation 2025-10-26 13:57:16 +01:00
9e88b63fca Flatten the docs structure 2025-10-26 13:55:05 +01:00
52d8c2a082 Simplify README.md 2025-10-26 13:43:44 +01:00
108a977a9c Merge pull request #72 from lordmathis/refactor/handlers
refactor: Extract common helper functions in API handlers
2025-10-26 12:07:42 +01:00
969fee837f Fix instance name retrieval 2025-10-26 11:34:45 +01:00
4e587953d8 Refactor llama server command handlers to use a common execution function 2025-10-26 11:00:10 +01:00
356c5be2c6 Improve comments 2025-10-26 10:34:36 +01:00
836e918fc5 Rename ProxyToInstance to InstanceProxy for clarity in routing 2025-10-26 10:22:37 +01:00
a7593e9a58 Split LlamaCppProxy handler 2025-10-26 10:21:40 +01:00
9259763054 Add getInstance method to handlers 2025-10-26 09:54:24 +01:00
94dce4c9bb Implement helper response handling functions 2025-10-26 00:12:33 +02:00
a3f9213f04 Implement ensureInstanceRunning helper 2025-10-25 23:44:21 +02:00
de5a38e7fd Refactor command parsing 2025-10-25 20:23:08 +02:00
ea6c76cc96 Update multi valued flags in backends 2025-10-25 19:02:46 +02:00
bd6436840e Implement common ParseCommand interface 2025-10-25 18:41:46 +02:00
0a7420c9f9 Merge pull request #71 from lordmathis/refactor/proxy
refactor: Move all proxy handling to instance package
2025-10-25 16:32:32 +02:00
c038aac91b Remove redundant UpdateLast RequestTime calls 2025-10-25 16:09:57 +02:00
7d9b983f93 Don't strip remote llama-cpp proxy prefix 2025-10-25 16:02:09 +02:00
889df3cb79 Add API key header for remote instances in proxy build 2025-10-25 14:14:39 +02:00
ff719f3ef9 Remove remote instance proxy handling from handlers 2025-10-25 14:07:11 +02:00
6a973fae2d Fix tests 2025-10-25 00:14:42 +02:00
58f8861d17 Switch manager to global app config 2025-10-25 00:14:12 +02:00
eff59a86fd Remove proxy, logger and process init from UnmarshalJSON 2025-10-24 23:41:33 +02:00
174d1772d6 Implement remote proxy handling in instance 2025-10-24 23:16:45 +02:00
4bbf45f0b9 Merge pull request #70 from lordmathis/refactor/manager
refactor: Split instance manager into single focus structs
2025-10-22 20:21:15 +02:00
a9fb0d613d Validate instance name in openai proxy 2025-10-22 18:55:57 +02:00
3b8bc658e3 Add name validation to backend handlers 2025-10-22 18:50:51 +02:00
c6053f6afd Remove old validation tests 2025-10-22 18:50:38 +02:00
c794e4f98b Move instance name validation to handlers 2025-10-22 18:40:39 +02:00
0f2c14d3ed Validate instance names to prevent injection attacks 2025-10-22 00:02:23 +02:00
13f3bed5fe Add URL encoding for instance name in API calls in webui 2025-10-21 23:36:26 +02:00
7c2c02ab2f Use url escape instead for instance name param 2025-10-21 23:24:27 +02:00
e0289ff42f Add instance name validation for URL safety and corresponding tests 2025-10-21 23:16:20 +02:00
bc025bbe28 Fix instance name validation 2025-10-21 22:57:23 +02:00
c6ebe47511 Fix path validation false positive 2025-10-21 22:47:41 +02:00
9bb106a1ce Remove deprecated operation mutex in instanceManager 2025-10-21 22:38:00 +02:00
bac18b5626 Unexport factory functions 2025-10-21 22:37:10 +02:00
2b51b4a47f Simplify manager tests 2025-10-21 22:30:08 +02:00
c44712e813 Remove redundant instance manager tests 2025-10-21 22:15:12 +02:00
6afe120a0e Implement more manager tests 2025-10-21 22:07:10 +02:00
4d05fcea46 Improve manager tests 2025-10-21 21:39:01 +02:00
7c64ab9cc6 Make StartInstance and StopInstance idempotent 2025-10-21 18:49:49 +02:00
62c431a041 Merge pull request #69 from lordmathis/dependabot/npm_and_yarn/webui/npm_and_yarn-fd296dbd23
Bump vite from 7.1.5 to 7.1.11 in /webui in the npm_and_yarn group across 1 directory
2025-10-21 18:40:46 +02:00
dependabot[bot]
e5f1b7c056 Bump vite in /webui in the npm_and_yarn group across 1 directory
Bumps the npm_and_yarn group with 1 update in the /webui directory: [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite).


Updates `vite` from 7.1.5 to 7.1.11
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v7.1.11/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 7.1.11
  dependency-type: direct:development
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-10-21 00:23:03 +00:00
a2d4622486 Refactor instance locking mechanism to use per-instance locks for concurrency 2025-10-20 22:59:31 +02:00
d923732aba Delete unused code 2025-10-20 22:27:22 +02:00
1ae28a0b09 Unexport member struct methods 2025-10-20 22:22:09 +02:00
c537bc48b8 Refactor API path handling in remoteManager to use a constant for base path 2025-10-20 22:00:06 +02:00
ffb4b49c94 Split manager into multiple structs 2025-10-20 21:55:50 +02:00
91d956203d Merge pull request #68 from lordmathis/refactor/backend-options
refactor: Move all backend type switching to backends package
2025-10-19 21:04:09 +02:00
b25ad48605 Refactor backend options marshaling/unmarshaling 2025-10-19 20:48:05 +02:00
d8e0da9cf8 Refactor backend options to implement common interface and streamline validation 2025-10-19 20:36:57 +02:00
f42f000539 Implement mlx and cllm tests and remove redundant code 2025-10-19 19:45:31 +02:00
72fe780e31 Simplify instance tests 2025-10-19 19:14:32 +02:00
55a9450077 Fix instance tests 2025-10-19 19:08:38 +02:00
72586fc627 Simplify config tests 2025-10-19 19:06:06 +02:00
6a91fe13e0 Fix local node override tests 2025-10-19 18:59:59 +02:00
51a7ac590e Fix preventing local proxy usage for remote instances 2025-10-19 18:55:56 +02:00
82f4f7beed Ensure local node is defined in LoadConfig by adding default config if missing 2025-10-19 18:47:02 +02:00
ec65ba8968 Add debug files to .gitignore 2025-10-19 18:39:46 +02:00
867380a06d Remove GetBackendSettings method from config 2025-10-19 18:32:05 +02:00
3500971f03 Fix JSON marshaling of backend options by using a pointer 2025-10-19 18:27:22 +02:00
9da2433a7c Refactor instance and manager tests to use BackendOptions structure 2025-10-19 18:07:14 +02:00
55f671c354 Refactor backend options handling and validation 2025-10-19 17:41:08 +02:00
2a7010d0e1 Flatten backends package structure 2025-10-19 15:50:42 +02:00
f209bc88b6 Update .gitignore and launch configuration for dev environment 2025-10-19 15:50:30 +02:00
3fffcc5b37 Merge pull request #67 from lordmathis/refactor/instance-split
refactor: Split instance struct into status, options, logger, proxy and process for better maintenance
2025-10-18 13:23:50 +02:00
851c73f058 Add tests for status change callback and options preservation 2025-10-18 13:19:01 +02:00
8ac4b370c9 Unexport struct methods 2025-10-18 11:25:26 +02:00
a7740000d2 Refactor instance creation to initialize logger, proxy, and process only for local instances 2025-10-18 10:39:04 +02:00
b13f8c471d Split off process struct 2025-10-18 10:28:15 +02:00
3f834004a8 Rename NewInstance to New 2025-10-18 00:34:18 +02:00
113b51eda2 Refactor instance node handling to use a map 2025-10-18 00:33:16 +02:00
7bf0809122 Fix test compilation after merge
Update instance tests to use correct type names:
- CreateInstanceOptions -> Options
- InstanceStatus -> Status

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 00:13:53 +02:00
a1ffdb02a4 Merge main into refactor/instance-split
Resolved conflicts in:
- pkg/instance/instance.go: Combined remote detection logic from main with refactored structure
- pkg/manager/manager_test.go: Updated manager initialization to include localNodeName parameter
- pkg/manager/remote_ops.go: Removed stripNodesFromOptions function that was deleted in main
- pkg/manager/remote_ops_test.go: Removed file that was deleted in main

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-10-17 00:10:09 +02:00
eb5abae173 Merge pull request #66 from lordmathis/fix/disable-node-edit
fix: Prevent node change on update
2025-10-16 22:37:59 +02:00
696a2cb18b Prevent node change on update 2025-10-16 22:35:29 +02:00
e7402f0029 Merge pull request #65 from lordmathis/fix/local-node
fix: Detect local instances based on local node in nodes array
2025-10-16 22:28:01 +02:00
5c9a397746 Fix get local proxy 2025-10-16 22:11:29 +02:00
e97ca727d1 Clarify node configuration in docs 2025-10-16 21:50:06 +02:00
9f3c01384b Remove stripNodesFromOptions function 2025-10-16 21:29:27 +02:00
c5097e59be Fix local instance detection 2025-10-16 21:26:04 +02:00
4b30791be2 Refactor instance options structure and related code 2025-10-16 20:53:24 +02:00
a96ed4d797 Fix status json tag static check 2025-10-16 20:22:12 +02:00
5afc22924f Refactor Status struct 2025-10-16 20:15:22 +02:00
e0ec00d141 Remove rendundant instance prefix from status 2025-10-16 19:40:03 +02:00
80ca0cbd4f Rename Process to Instance 2025-10-16 19:38:44 +02:00
964c6345ef Refactor backend host/port retrieval and remove redundant code for health checks 2025-10-14 22:16:26 +02:00
92a76bc84b Move proxy to separate struct 2025-10-14 22:01:09 +02:00
02909c5153 Remove redundant instance prefix from logger 2025-10-14 19:46:43 +02:00
ef3478e2a3 Move logging to separate struct 2025-10-14 19:32:15 +02:00
cf20f304b3 Merge pull request #61 from lordmathis/fix/docs-formatting
fix: Add MkDocs hook to fix line endings in markdown files
2025-10-09 23:28:09 +02:00
72eba48b80 Add MkDocs hook to fix line endings in markdown files 2025-10-09 23:23:17 +02:00
c3037f914d Merge pull request #60 from lordmathis/lordmathis-patch-1
Update docs.yaml
2025-10-09 22:31:38 +02:00
81266b4bc4 Update docs.yaml 2025-10-09 22:29:23 +02:00
a31af94e7b Merge pull request #59 from lordmathis/feat/multi-host
feat: Implement multi node support
2025-10-09 22:23:27 +02:00
9ee0a184b3 Re-validate instance name in DeleteInstance for improved security 2025-10-09 22:18:53 +02:00
5436c28a1f Add instance name validation before deletion for security 2025-10-09 22:10:40 +02:00
73b9dd5bc7 Rename workflows for consistency 2025-10-09 21:53:14 +02:00
f61e8dad5c Add User Docs badge to README 2025-10-09 21:51:38 +02:00
ab2770bdd9 Add documentation for remote node deployment and configuration 2025-10-09 21:50:39 +02:00
e7a6a7003e Skip remote instances in checkAllTimeouts and EvictLRUInstance methods 2025-10-09 21:13:38 +02:00
2b950ee649 Implement updateLocalInstanceFromRemote to preserve Nodes field when syncing remote instance data 2025-10-09 20:39:21 +02:00
b965b77c18 Prevent remote instances from using local proxy in GetProxy method 2025-10-09 20:24:54 +02:00
8a16a195de Fix getting remote instance logs 2025-10-09 20:22:32 +02:00
9684a8a09b Enhance instance management to preserve local state for remote instances 2025-10-09 19:34:52 +02:00
9d5f01d4ae Auto-select first node in InstanceSettingsCard if none is selected 2025-10-09 19:13:58 +02:00
e281708b20 Enhance auto-start logic to differentiate between remote and local instances 2025-10-09 18:56:23 +02:00
8d9b0c0621 Initialize timeProvider and logger in UnmarshalJSON for Process 2025-10-09 18:56:12 +02:00
6c1a76691d Improve cleanup of options in InstanceDialog to skip empty strings and arrays 2025-10-09 18:49:36 +02:00
5d958ed283 Fix backend_options cleanup to exclude empty arrays in InstanceDialog 2025-10-09 18:38:33 +02:00
56b95d1243 Refactor InstanceSettingsCard and API types to use NodesMap 2025-10-08 19:52:39 +02:00
688b815ca7 Add LocalNode configuration 2025-10-08 19:43:53 +02:00
7f6725da96 Refactor NodeConfig handling to use a map 2025-10-08 19:24:24 +02:00
3418735204 Add stripNodesFromOptions function to prevent routing loops in remote requests 2025-10-07 20:27:31 +02:00
2f1cf5acdc Refactor CreateRemoteInstance and UpdateRemoteInstance to directly use options parameter in API requests 2025-10-07 19:57:21 +02:00
01380e6641 Update instance manager tests to use empty NodeConfig slice 2025-10-07 19:18:13 +02:00
6298b03636 Refactor RemoteOpenAIProxy to use cached proxies and restore request body handling 2025-10-07 18:57:08 +02:00
aae3f84d49 Implement caching for remote instance proxies and enhance proxy request handling 2025-10-07 18:44:23 +02:00
554796391b Remove test config file 2025-10-07 18:05:30 +02:00
16b28bac05 Merge branch 'main' into feat/multi-host 2025-10-07 18:04:24 +02:00
1892dc8315 Merge pull request #57 from BobbyL2k/feat/llama-cpp-proxy
feat: Proxy llama.cpp API endpoints via `/llama-cpp/{name}/`
2025-10-06 20:23:44 +02:00
Anuruth Lertpiya
997bd1b063 Changed status code to StatusBadRequest (400) if requested invalid model name. 2025-10-05 14:53:20 +00:00
Anuruth Lertpiya
fa43f9e967 Added support for proxying llama.cpp native API endpoints via /llama-cpp/{name}/ 2025-10-05 14:28:33 +00:00
db9eebeb8b Merge pull request #56 from lordmathis/fix/body-already-read
Fix double read of json response when content-length header is missing
2025-10-04 22:28:22 +02:00
bd062f8ca0 Mock Response.clone for tests 2025-10-04 22:22:25 +02:00
8ebdb1a183 Fix double read of json response when content-length header is missing 2025-10-04 22:16:28 +02:00
7272212081 Merge pull request #55 from lordmathis/fix/auto-restart
fix: Set status to Stopped for instances with auto-restart disabled
2025-10-04 21:45:12 +02:00
035e184789 Merge branch 'main' into fix/auto-restart 2025-10-04 21:22:50 +02:00
d15976e7aa Implement auto-stop for instances with auto-restart disabled and add corresponding tests 2025-10-04 21:17:55 +02:00
4fa75d9801 Merge pull request #52 from BobbyL2k/feat/config-cors-headers
feat: Added support for configuring access-control-request-headers for CORS
2025-10-04 20:45:27 +02:00
Anuruth Lertpiya
0e1bc8a352 Added support for configuring CORS headers 2025-10-04 09:13:40 +00:00
b728a7c6b2 Fix fetchNodes call to ensure proper handling of promise 2025-10-03 10:53:29 +02:00
a491f29483 Add node selection functionality to InstanceSettingsCard and define Node API 2025-10-02 23:18:33 +02:00
670f8ff81b Split up handlers 2025-10-02 23:11:20 +02:00
da56456504 Add node management endpoints to handle listing and retrieving node details 2025-10-02 22:51:41 +02:00
c30053e51c Enhance instance loading to support remote instances and handle node configuration 2025-10-01 22:59:45 +02:00
347c58e15f Enhance instance manager to persist remote instances and update tracking on modifications 2025-10-01 22:58:57 +02:00
2ed67eb672 Add remote instance proxying functionality to handler 2025-10-01 22:17:19 +02:00
0188f82306 Implement remote instance creation and deletion in instance manager 2025-10-01 22:05:18 +02:00
e0f176de10 Enhance instance manager to support remote instance management and update related tests 2025-10-01 20:25:06 +02:00
2759be65a5 Add remote instance management functionality and configuration support 2025-09-30 21:09:05 +02:00
1e5e86d2c3 Merge pull request #50 from lordmathis/feat/docker-image
feat: Add Dockerfiles for running llamactl in docker
2025-09-29 21:26:23 +02:00
25d3d70707 Update README and installation guide to reflect Dockerfile paths and add source build instructions 2025-09-29 21:18:13 +02:00
e54cfd006d Add Dockerfile for building from source 2025-09-29 21:17:40 +02:00
7d39e7ee86 Move docker stuff to a dedicated folder 2025-09-29 21:16:51 +02:00
222d913b4a Merge pull request #49 from BobbyL2k/feat/reverse-proxy-support
Added support for serving behind a reverse proxy
2025-09-29 20:32:11 +02:00
Anuruth Lertpiya
03a7a5d139 Update configration.md with reverse proxy related information 2025-09-29 13:54:15 +00:00
Anuruth Lertpiya
e50660c379 Fixed broken webui tests 2025-09-29 13:38:24 +00:00
Anuruth Lertpiya
5906d89f8d Added support for serving behind a reverse proxy
- Added support for specifying response headers for each backend
  - Allowing users to set `X-Accel-Buffering: no` to disable buffering for streaming responses in nginx
  - Updated `configuration.md` to document the new configuration options
- Modified Vite config to build with `base: "./"`, making assets be accessed via relative paths
- Updated API_BASE to use `document.baseURI`, allowing API calls to be made relative to the base path
2025-09-29 12:43:10 +00:00
cb2d95139f Setup data dir in Docker and docker-compose 2025-09-28 22:17:38 +02:00
889a8707e7 Refactor Dockerfile and docker-compose to streamline environment variable configuration and remove redundant commands 2025-09-28 22:17:38 +02:00
070c91787d Add environment variable for llamactl command in Dockerfile 2025-09-28 22:17:38 +02:00
169ee422ec Update README and installation guide to clarify Docker support and CUDA configuration 2025-09-28 22:17:38 +02:00
bb0176b7f5 Update Dockerfile to use server-cuda image for improved performance 2025-09-28 22:17:38 +02:00
291ec7995f Update Docker run commands to use cached directories and remove unnecessary environment variables 2025-09-28 22:17:38 +02:00
b940b38e46 Initial support for docker 2025-09-28 22:17:38 +02:00
92cb57e816 Merge pull request #48 from lordmathis/fix/command-environment
fix: Pass host environment to instances
2025-09-28 21:40:50 +02:00
0ecd55c354 Start with host environment for instances 2025-09-28 21:37:48 +02:00
b4c17194eb Merge pull request #47 from lordmathis/fix/nil-context
fix: Initialize context before building command
2025-09-28 20:59:30 +02:00
808092decf Initialize context in Start method for command execution 2025-09-28 20:51:11 +02:00
12bbf34236 Merge pull request #46 from lordmathis/feat/env-vars
feat: Add support for passing env vars to instances
2025-09-28 15:42:02 +02:00
9a7255a52d Refactor Docker support section in README for clarity and conciseness 2025-09-28 15:31:50 +02:00
97a7c9a4e3 Detail env var support in docs 2025-09-28 15:29:43 +02:00
fa9335663a Parse backend env vars from env vars 2025-09-28 15:22:01 +02:00
d092518114 Update documentation 2025-09-28 15:10:35 +02:00
ffa0a0c161 Remove ZodFormField and BasicInstanceFields components 2025-09-28 14:42:10 +02:00
1fbf809a2d Add EnvironmentVariablesInput component and integrate into InstanceSettingsCard 2025-09-28 14:42:10 +02:00
c984d95723 Add environment variable support to instance options and command building 2025-09-28 14:42:10 +02:00
50e1355205 Add environment field to BackendSettings for improved configuration 2025-09-28 14:42:10 +02:00
7994fd05b3 Merge pull request #44 from BobbyL2k/fix/rel-dir-config
fix: InstancesDir and LogsDir not being relative path to DataDir when not set
2025-09-27 21:33:00 +02:00
Anuruth Lertpiya
f496a28f04 fix: InstancesDir and LogsDir not being relative path to DataDir when not set 2025-09-27 18:14:25 +00:00
f9371e876d Merge pull request #43 from BobbyL2k/fix/config-path
fix: llamactl reads config file per documentation
2025-09-27 19:32:13 +02:00
Anuruth Lertpiya
3a979da815 fix: llamactl reads config file per documentation
- Added logging to track config file reading operations
- llamactl now properly reads config files from the expected locations ("llamactl.yaml" and "config.yaml" under current directory)
2025-09-27 17:03:54 +00:00
a824f066ec Merge pull request #42 from lordmathis/feat/docker-backends
feat: Add support for dockerized backends
2025-09-25 23:07:24 +02:00
2cd9d374a7 Add Docker badge to UI 2025-09-25 23:04:24 +02:00
031d6c7017 Update Docker command arguments for llama-server and vllm with volume mounts 2025-09-25 22:51:51 +02:00
282344af23 Fix docker command args building 2025-09-25 22:51:40 +02:00
bc9e0535c3 Refactor command building and argument handling 2025-09-25 22:05:46 +02:00
2d925b473d Add Docker support documentation and configuration for backends 2025-09-24 22:15:21 +02:00
ba0f877185 Fix tests 2025-09-24 21:35:44 +02:00
840a7bc650 Add Docker command handling for backend options and refactor command building 2025-09-24 21:34:54 +02:00
76ac93bedc Implement Docker command handling for Llama, MLX, and vLLM backends 2025-09-24 21:31:58 +02:00
72d2a601c8 Update Docker args in LoadConfig and tests to include 'run --rm' prefix 2025-09-24 21:27:51 +02:00
9a56660f68 Refactor backend configuration to use structured settings and update environment variable handling 2025-09-24 20:31:20 +02:00
78a483ee4a Merge pull request #41 from lordmathis/fix/docs-release
fix: Refactor docs workflow to trigger on version tags
2025-09-23 22:35:05 +02:00
cdcef7c7ae Refactor docs workflow to trigger on version tags 2025-09-23 22:32:02 +02:00
6f5d886089 Merge pull request #40 from lordmathis/feat/system-info
feat: rework system info dialog
2025-09-23 22:11:42 +02:00
e3bf8ac05a Update SystemInfo dialog 2025-09-23 22:05:31 +02:00
edf0575925 Replace SystemInfoDialog with BackendInfoDialog and update related references 2025-09-23 21:44:04 +02:00
71a48aa3b6 Update server API functions to use /backends/llama-cpp path 2025-09-23 21:28:23 +02:00
30e40ecd30 Refactor API endpoints to use /backends/llama-cpp path and update related documentation 2025-09-23 21:27:58 +02:00
322e1c5eb7 Merge pull request #39 from lordmathis/feat/instance-dialog
feat: Redesign create/edit instance dialog
2025-09-23 21:14:34 +02:00
2cbd666d38 Redesign create/edit instance dialog 2025-09-23 21:11:00 +02:00
9ebc05fa3a Merge pull request #38 from lordmathis/feat/instance-card
feat: Redesign instance card
2025-09-23 19:48:20 +02:00
05e4335389 Fix instance management tests 2025-09-23 19:45:45 +02:00
850cf018e3 Refactor BackendBadge component 2025-09-23 19:20:53 +02:00
9c3da55c5d Improve InstanceCard layout 2025-09-23 18:12:58 +02:00
16d311a3d0 Merge pull request #37 from lordmathis/lordmathis-patch-1
fix: Set default docs version
2025-09-23 13:48:53 +02:00
32f58502de Update docs.yml 2025-09-23 13:46:58 +02:00
788f5a2246 Merge pull request #36 from lordmathis/lordmathis-patch-1
fix: Run docs build job on every update
2025-09-23 13:21:53 +02:00
37f464007f Update docs.yml 2025-09-23 13:19:54 +02:00
84d994c625 Merge pull request #35 from lordmathis/chore/docs-update
chore: Update docs
2025-09-22 23:24:12 +02:00
120875351f Fix image paths for MkDocs rendering in readme_sync.py 2025-09-22 23:22:27 +02:00
3a63308d5f Update error descriptions in API documentation for clarity 2025-09-22 22:39:01 +02:00
46622d2107 Update documentation and add README synchronization 2025-09-22 22:37:53 +02:00
ebc82c37aa Merge pull request #34 from lordmathis/feat/vllm-backend
feat: Implement vLLM backend
2025-09-22 21:58:19 +02:00
48b3a39dfe Move badges in instance card 2025-09-22 21:54:04 +02:00
c10153f59f Add BackendBadge component and integrate it into InstanceCard 2025-09-22 21:48:33 +02:00
588b025fb1 Handle empty responses for JSON endpoints in apiCall function 2025-09-22 21:39:44 +02:00
6dcf0f806e Fix VLLM command placeholder formatting 2025-09-22 21:30:59 +02:00
184d6df1bc Fix vllm command parsing 2025-09-22 21:25:50 +02:00
313666ea17 Fix missing vllm proxy setup 2025-09-22 20:51:00 +02:00
c3ca5b95f7 Update BuildCommandArgs to use positional argument for model and adjust tests accordingly 2025-09-22 20:32:03 +02:00
2c86fc6470 Update api referrence 2025-09-21 22:16:56 +02:00
785915943b Update api docs 2025-09-21 22:03:07 +02:00
55765d2020 Add vLLM backend support to documentation and update instance management instructions 2025-09-21 21:57:36 +02:00
6ff9aa5470 Remove vLLM backend implementation specification document 2025-09-21 21:38:10 +02:00
501afb7f0d Refactor form components and improve API error handling 2025-09-21 21:33:53 +02:00
b665194307 Add vLLM backend support to webui 2025-09-21 20:58:43 +02:00
7eb59aa7e0 Remove unused JSON unmarshal test and clean up command argument checks 2025-09-19 20:46:25 +02:00
64842e74b0 Refactor command parsing and building 2025-09-19 20:23:25 +02:00
34a949d22e Refactor command argument building and parsing 2025-09-19 19:59:46 +02:00
ec5485bd0e Refactor command argument building across backends 2025-09-19 19:46:54 +02:00
9eecb37aec Refactor MLX and VLLM server options parsing and args building 2025-09-19 19:39:36 +02:00
c7136d5206 Refactor command parsing logic across backends to utilize a unified CommandParserConfig structure 2025-09-19 18:36:23 +02:00
4df02a6519 Initial vLLM backend support 2025-09-19 18:05:12 +02:00
02fdae24ee Merge pull request #33 from lordmathis/feat/doc-versioning
feat: Docs versioning
2025-09-18 21:07:04 +02:00
9a8647775d Setup docs versioning 2025-09-18 21:04:11 +02:00
3081a1986b Merge pull request #32 from lordmathis/feat/mlx-backend
feat: Implement mlx-lm backend
2025-09-18 20:34:04 +02:00
6a580667ed Remove LlamaExecutable checks from default and file loading tests 2025-09-18 20:30:26 +02:00
2a20817078 Remove redundant LlamaExecutable field from instance configuration in tests 2025-09-18 20:29:04 +02:00
5e2d237887 Update project description for clarity and consistency in README 2025-09-18 20:21:30 +02:00
84c3453281 Refactor features section in README for improved clarity and organization 2025-09-18 20:14:03 +02:00
8006dd3841 Fix formatting in README for consistency in feature descriptions 2025-09-18 20:03:19 +02:00
8820dc1146 Enhance documentation for MLX backend support 2025-09-18 20:01:18 +02:00
11296bc5f8 Update README to include MLX backend support and enhance usage instructions 2025-09-18 19:34:40 +02:00
5121f0e302 Remove PythonPath references from MlxServerOptions and related configurations 2025-09-17 21:59:55 +02:00
587be68077 Add MLX backend support with configuration and parsing enhancements 2025-09-16 22:38:39 +02:00
cc5d8acd92 Refactor instance and manager tests to use BackendConfig for LlamaExecutable and MLXLMExecutable 2025-09-16 21:45:50 +02:00
154b754aff Add MLX command parsing and routing support 2025-09-16 21:39:08 +02:00
63fea02d66 Add MLX backend support in CreateInstanceOptions and validation 2025-09-16 21:38:33 +02:00
468688cdbc Pass backend options to instances 2025-09-16 21:37:48 +02:00
988c4aca40 Add MLX backend config options 2025-09-16 21:14:19 +02:00
1f25e9d05b Merge pull request #31 from lordmathis/feat/parse-command
feat: Implement command parsing in Create Instance
2025-09-15 22:18:39 +02:00
1b5934303b Enhance command parsing in ParseLlamaCommand and improve error handling in ParseCommandRequest 2025-09-15 22:12:56 +02:00
ccabd84568 Add margin to textarea in ParseCommandDialog for improved spacing 2025-09-15 21:36:24 +02:00
e7b06341c3 Enhance command parsing in ParseLlamaCommand 2025-09-15 21:29:46 +02:00
323056096c Implement llama-server command parsing and add UI components for command input 2025-09-15 21:04:14 +02:00
cb1669f853 Merge pull request #30 from lordmathis/dependabot/npm_and_yarn/webui/npm_and_yarn-f5c1666f0c
Bump vite from 7.0.5 to 7.1.5 in /webui in the npm_and_yarn group across 1 directory
2025-09-14 10:47:38 +02:00
dependabot[bot]
a5d1f24cbf Bump vite in /webui in the npm_and_yarn group across 1 directory
Bumps the npm_and_yarn group with 1 update in the /webui directory: [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite).


Updates `vite` from 7.0.5 to 7.1.5
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v7.1.5/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 7.1.5
  dependency-type: direct:development
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-09-09 21:38:38 +00:00
92f0bd02f2 Merge pull request #29 from lordmathis/lordmathis-patch-1
chore: Switch main dashboard screenshot
2025-09-04 22:54:06 +02:00
0a16f617ad Add files via upload 2025-09-04 22:47:14 +02:00
124 changed files with 17154 additions and 8560 deletions

45
.dockerignore Normal file
View File

@@ -0,0 +1,45 @@
# Git and version control
.git/
.gitignore
# Documentation
*.md
docs/
# Development files
.vscode/
.idea/
# Build artifacts
webui/node_modules/
webui/dist/
webui/.next/
*.log
*.tmp
# Data directories
data/
models/
logs/
# Test files
*_test.go
**/*_test.go
# CI/CD
.github/
# Local configuration
llamactl.yaml
config.yaml
.env
.env.local
# OS files
.DS_Store
Thumbs.db
# Backup files
*.bak
*.backup
*~

103
.github/workflows/docs.yaml vendored Normal file
View File

@@ -0,0 +1,103 @@
name: User Docs
on:
push:
branches: [ main ]
tags: [ 'v*' ]
pull_request:
branches: [ main ]
paths:
- 'docs/**'
- 'mkdocs.yml'
- 'docs-requirements.txt'
permissions:
contents: write
pages: write
id-token: write
concurrency:
group: "pages"
cancel-in-progress: false
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r docs-requirements.txt
- name: Build documentation
run: |
mkdocs build --strict
deploy-dev:
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r docs-requirements.txt
- name: Configure Git
run: |
git config --global user.name "${{ github.actor }}"
git config --global user.email "${{ github.actor }}@users.noreply.github.com"
- name: Deploy development version
run: |
mike deploy --push --update-aliases dev latest
# Set dev as default if no default exists
if ! mike list | grep -q "default"; then
mike set-default --push dev
fi
deploy-release:
runs-on: ubuntu-latest
if: startsWith(github.ref, 'refs/tags/v')
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r docs-requirements.txt
- name: Configure Git
run: |
git config --global user.name "${{ github.actor }}"
git config --global user.email "${{ github.actor }}@users.noreply.github.com"
- name: Deploy release version
run: |
VERSION=${GITHUB_REF#refs/tags/}
mike deploy --push --update-aliases $VERSION stable
mike set-default --push stable

View File

@@ -1,65 +0,0 @@
name: Build and Deploy Documentation
on:
push:
branches: [ main ]
paths:
- 'docs/**'
- 'mkdocs.yml'
- 'docs-requirements.txt'
- '.github/workflows/docs.yml'
pull_request:
branches: [ main ]
paths:
- 'docs/**'
- 'mkdocs.yml'
- 'docs-requirements.txt'
permissions:
contents: read
pages: write
id-token: write
concurrency:
group: "pages"
cancel-in-progress: false
jobs:
build:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4
with:
fetch-depth: 0 # Needed for git-revision-date-localized plugin
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Install dependencies
run: |
pip install -r docs-requirements.txt
- name: Build documentation
run: |
mkdocs build --strict
- name: Upload documentation artifact
if: github.ref == 'refs/heads/main'
uses: actions/upload-pages-artifact@v3
with:
path: ./site
deploy:
if: github.ref == 'refs/heads/main'
environment:
name: github-pages
url: ${{ steps.deployment.outputs.page_url }}
runs-on: ubuntu-latest
needs: build
steps:
- name: Deploy to GitHub Pages
id: deployment
uses: actions/deploy-pages@v4

12
.gitignore vendored
View File

@@ -32,4 +32,14 @@ go.work.sum
# .vscode/
node_modules/
dist/
dist/
__pycache__/
site/
# Dev config
llamactl.dev.yaml
# Debug files
__debug*

2
.vscode/launch.json vendored
View File

@@ -12,7 +12,7 @@
"program": "${workspaceFolder}/cmd/server/main.go",
"env": {
"GO_ENV": "development",
"LLAMACTL_REQUIRE_MANAGEMENT_AUTH": "false"
"LLAMACTL_CONFIG_PATH": "${workspaceFolder}/llamactl.dev.yaml"
},
}
]

View File

@@ -86,7 +86,7 @@ go install github.com/swaggo/swag/cmd/swag@latest
# Update Swagger comments in pkg/server/handlers.go
# Then regenerate docs
swag init -g cmd/server/main.go -o apidocs
swag init -g cmd/server/main.go
```
## Pull Request Guidelines

178
README.md
View File

@@ -1,61 +1,91 @@
# llamactl
![Build and Release](https://github.com/lordmathis/llamactl/actions/workflows/release.yaml/badge.svg) ![Go Tests](https://github.com/lordmathis/llamactl/actions/workflows/go_test.yaml/badge.svg) ![WebUI Tests](https://github.com/lordmathis/llamactl/actions/workflows/webui_test.yaml/badge.svg)
![Build and Release](https://github.com/lordmathis/llamactl/actions/workflows/release.yaml/badge.svg) ![Go Tests](https://github.com/lordmathis/llamactl/actions/workflows/go_test.yaml/badge.svg) ![WebUI Tests](https://github.com/lordmathis/llamactl/actions/workflows/webui_test.yaml/badge.svg) ![User Docs](https://github.com/lordmathis/llamactl/actions/workflows/docs.yaml/badge.svg)
**Management server and proxy for multiple llama.cpp instances with OpenAI-compatible API routing.**
**Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard.**
## Why llamactl?
🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)
🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name
🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
🔐 **API Key Authentication**: Separate keys for management vs inference access
📊 **Instance Monitoring**: Health checks, auto-restart, log management
**Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
💡 **On-Demand Instance Start**: Automatically launch instances upon receiving OpenAI-compatible API requests
💾 **State Persistence**: Ensure instances remain intact across server restarts
📚 **[Full Documentation →](https://llamactl.org)**
![Dashboard Screenshot](docs/images/dashboard.png)
**Choose llamactl if**: You need authentication, health monitoring, auto-restart, and centralized management of multiple llama-server instances
**Choose Ollama if**: You want the simplest setup with strong community ecosystem and third-party integrations
**Choose LM Studio if**: You prefer a polished desktop GUI experience with easy model management
## Features
**🚀 Easy Model Management**
- **Multiple Models Simultaneously**: Run different models at the same time (7B for speed, 70B for quality)
- **Smart Resource Management**: Automatic idle timeout, LRU eviction, and configurable instance limits
- **Web Dashboard**: Modern React UI for managing instances, monitoring health, and viewing logs
**🔗 Flexible Integration**
- **OpenAI API Compatible**: Drop-in replacement - route requests to different models by instance name
- **Multi-Backend Support**: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
- **Docker Ready**: Run backends in containers with full GPU support
**🌐 Distributed Deployment**
- **Remote Instances**: Deploy instances on remote hosts
- **Central Management**: Manage everything from a single dashboard with automatic routing
## Quick Start
1. Install a backend (llama.cpp, MLX, or vLLM) - see [Prerequisites](#prerequisites) below
2. [Download llamactl](#installation) for your platform
3. Run `llamactl` and open http://localhost:8080
4. Create an instance and start inferencing!
## Prerequisites
### Backend Dependencies
**For llama.cpp backend:**
You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
```bash
# 1. Install llama-server (one-time setup)
# See: https://github.com/ggml-org/llama.cpp#quick-start
# Homebrew (macOS)
brew install llama.cpp
# 2. Download and run llamactl
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/
# 3. Start the server
llamactl
# Access dashboard at http://localhost:8080
# Or build from source - see llama.cpp docs
# Or use Docker - no local installation required
```
## Usage
**For MLX backend (macOS only):**
You need MLX-LM installed:
### Create and manage instances via web dashboard:
1. Open http://localhost:8080
2. Click "Create Instance"
3. Set model path and GPU layers
4. Start or stop the instance
### Or use the REST API:
```bash
# Create instance
curl -X POST localhost:8080/api/v1/instances/my-7b-model \
-H "Authorization: Bearer your-key" \
-d '{"model": "/path/to/model.gguf", "gpu_layers": 32}'
# Install via pip (requires Python 3.8+)
pip install mlx-lm
# Use with OpenAI SDK
curl -X POST localhost:8080/v1/chat/completions \
-H "Authorization: Bearer your-key" \
-d '{"model": "my-7b-model", "messages": [{"role": "user", "content": "Hello!"}]}'
# Or in a virtual environment (recommended)
python -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm
```
**For vLLM backend:**
You need vLLM installed:
```bash
# Install via pip (requires Python 3.8+, GPU required)
pip install vllm
# Or in a virtual environment (recommended)
python -m venv vllm-env
source vllm-env/bin/activate
pip install vllm
# Or use Docker - no local installation required
```
### Docker Support
llamactl can run backends in Docker containers, eliminating the need for local backend installation:
```yaml
backends:
llama-cpp:
docker:
enabled: true
vllm:
docker:
enabled: true
```
## Installation
@@ -74,7 +104,27 @@ sudo mv llamactl /usr/local/bin/
# Windows - Download from releases page
```
### Option 2: Build from Source
### Option 2: Docker (No local backend installation required)
```bash
# Clone repository and build Docker images
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
mkdir -p data/llamacpp data/vllm models
# Build and start llamactl with llama.cpp CUDA backend
docker-compose -f docker/docker-compose.yml up llamactl-llamacpp -d
# Build and start llamactl with vLLM CUDA backend
docker-compose -f docker/docker-compose.yml up llamactl-vllm -d
# Build from source using multi-stage build
docker build -f docker/Dockerfile.source -t llamactl:source .
```
**Note:** Dockerfiles are configured for CUDA. Adapt base images for other platforms (CPU, ROCm, etc.).
### Option 3: Build from Source
Requires Go 1.24+ and Node.js 22+
```bash
git clone https://github.com/lordmathis/llamactl.git
@@ -83,17 +133,13 @@ cd webui && npm ci && npm run build && cd ..
go build -o llamactl ./cmd/server
```
## Prerequisites
## Usage
You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
```bash
# Quick install methods:
# Homebrew (macOS)
brew install llama.cpp
# Or build from source - see llama.cpp docs
```
1. Open http://localhost:8080
2. Click "Create Instance"
3. Choose backend type (llama.cpp, MLX, or vLLM)
4. Configure your model and options (ports and API keys are auto-assigned)
5. Start the instance and use it with any OpenAI-compatible client
## Configuration
@@ -104,8 +150,35 @@ server:
host: "0.0.0.0" # Server host to bind to
port: 8080 # Server port to bind to
allowed_origins: ["*"] # Allowed CORS origins (default: all)
allowed_headers: ["*"] # Allowed CORS headers (default: all)
enable_swagger: false # Enable Swagger UI for API docs
backends:
llama-cpp:
command: "llama-server"
args: []
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "ghcr.io/ggml-org/llama.cpp:server"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "-v", "~/.local/share/llamactl/llama.cpp:/root/.cache/llama.cpp"]
environment: {} # Environment variables for the container
vllm:
command: "vllm"
args: ["serve"]
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "vllm/vllm-openai:latest"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g", "-v", "~/.local/share/llamactl/huggingface:/root/.cache/huggingface"]
environment: {} # Environment variables for the container
mlx:
command: "mlx_lm.server"
args: []
environment: {} # Environment variables for the backend process
instances:
port_range: [8000, 9000] # Port range for instances
data_dir: ~/.local/share/llamactl # Data directory (platform-specific, see below)
@@ -115,7 +188,6 @@ instances:
max_instances: -1 # Max instances (-1 = unlimited)
max_running_instances: -1 # Max running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
llama_executable: llama-server # Path to llama-server executable
default_auto_restart: true # Auto-restart new instances by default
default_max_restarts: 3 # Max restarts for new instances
default_restart_delay: 5 # Restart delay (seconds) for new instances

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,822 +0,0 @@
basePath: /api/v1
definitions:
instance.CreateInstanceOptions:
properties:
alias:
type: string
api_key:
type: string
api_key_file:
type: string
auto_restart:
description: Auto restart
type: boolean
batch_size:
type: integer
cache_reuse:
type: integer
cache_type_k:
type: string
cache_type_k_draft:
type: string
cache_type_v:
type: string
cache_type_v_draft:
type: string
chat_template:
type: string
chat_template_file:
type: string
chat_template_kwargs:
type: string
check_tensors:
type: boolean
cont_batching:
type: boolean
control_vector:
items:
type: string
type: array
control_vector_layer_range:
type: string
control_vector_scaled:
items:
type: string
type: array
cpu_mask:
type: string
cpu_mask_batch:
type: string
cpu_range:
type: string
cpu_range_batch:
type: string
cpu_strict:
type: integer
cpu_strict_batch:
type: integer
ctx_size:
type: integer
ctx_size_draft:
type: integer
defrag_thold:
type: number
device:
type: string
device_draft:
type: string
draft_max:
type: integer
draft_min:
type: integer
draft_p_min:
type: number
dry_allowed_length:
type: integer
dry_base:
type: number
dry_multiplier:
type: number
dry_penalty_last_n:
type: integer
dry_sequence_breaker:
items:
type: string
type: array
dump_kv_cache:
type: boolean
dynatemp_exp:
type: number
dynatemp_range:
type: number
embd_bge_small_en_default:
description: Default model params
type: boolean
embd_e5_small_en_default:
type: boolean
embd_gte_small_default:
type: boolean
embedding:
type: boolean
escape:
type: boolean
fim_qwen_1_5b_default:
type: boolean
fim_qwen_3b_default:
type: boolean
fim_qwen_7b_default:
type: boolean
fim_qwen_7b_spec:
type: boolean
fim_qwen_14b_spec:
type: boolean
flash_attn:
type: boolean
frequency_penalty:
type: number
gpu_layers:
type: integer
gpu_layers_draft:
type: integer
grammar:
type: string
grammar_file:
type: string
hf_file:
type: string
hf_file_v:
type: string
hf_repo:
type: string
hf_repo_draft:
type: string
hf_repo_v:
type: string
hf_token:
type: string
host:
type: string
idle_timeout:
description: Idle timeout
type: integer
ignore_eos:
type: boolean
jinja:
type: boolean
json_schema:
type: string
json_schema_file:
type: string
keep:
type: integer
log_colors:
type: boolean
log_disable:
type: boolean
log_file:
type: string
log_prefix:
type: boolean
log_timestamps:
type: boolean
logit_bias:
items:
type: string
type: array
lora:
items:
type: string
type: array
lora_init_without_apply:
type: boolean
lora_scaled:
items:
type: string
type: array
main_gpu:
type: integer
max_restarts:
type: integer
metrics:
type: boolean
min_p:
type: number
mirostat:
type: integer
mirostat_ent:
type: number
mirostat_lr:
type: number
mlock:
type: boolean
mmproj:
type: string
mmproj_url:
type: string
model:
type: string
model_draft:
type: string
model_url:
type: string
model_vocoder:
description: Audio/TTS params
type: string
no_cont_batching:
type: boolean
no_context_shift:
description: Example-specific params
type: boolean
no_escape:
type: boolean
no_kv_offload:
type: boolean
no_mmap:
type: boolean
no_mmproj:
type: boolean
no_mmproj_offload:
type: boolean
no_perf:
type: boolean
no_prefill_assistant:
type: boolean
no_slots:
type: boolean
no_warmup:
type: boolean
no_webui:
type: boolean
numa:
type: string
on_demand_start:
description: On demand start
type: boolean
override_kv:
items:
type: string
type: array
override_tensor:
items:
type: string
type: array
parallel:
type: integer
path:
type: string
poll:
type: integer
poll_batch:
type: integer
pooling:
type: string
port:
type: integer
predict:
type: integer
presence_penalty:
type: number
prio:
type: integer
prio_batch:
type: integer
props:
type: boolean
reasoning_budget:
type: integer
reasoning_format:
type: string
repeat_last_n:
type: integer
repeat_penalty:
type: number
reranking:
type: boolean
restart_delay:
type: integer
rope_freq_base:
type: number
rope_freq_scale:
type: number
rope_scale:
type: number
rope_scaling:
type: string
samplers:
description: Sampling params
type: string
sampling_seq:
type: string
seed:
type: integer
slot_prompt_similarity:
type: number
slot_save_path:
type: string
slots:
type: boolean
special:
type: boolean
split_mode:
type: string
spm_infill:
type: boolean
ssl_cert_file:
type: string
ssl_key_file:
type: string
temp:
type: number
tensor_split:
type: string
threads:
type: integer
threads_batch:
type: integer
threads_http:
type: integer
timeout:
type: integer
top_k:
type: integer
top_p:
type: number
tts_use_guide_tokens:
type: boolean
typical:
type: number
ubatch_size:
type: integer
verbose:
type: boolean
verbose_prompt:
description: Common params
type: boolean
verbosity:
type: integer
xtc_probability:
type: number
xtc_threshold:
type: number
yarn_attn_factor:
type: number
yarn_beta_fast:
type: number
yarn_beta_slow:
type: number
yarn_ext_factor:
type: number
yarn_orig_ctx:
type: integer
type: object
instance.InstanceStatus:
enum:
- 0
- 1
- 2
type: integer
x-enum-varnames:
- Stopped
- Running
- Failed
instance.Process:
properties:
created:
description: Creation time
type: integer
name:
type: string
status:
allOf:
- $ref: '#/definitions/instance.InstanceStatus'
description: Status
type: object
server.OpenAIInstance:
properties:
created:
type: integer
id:
type: string
object:
type: string
owned_by:
type: string
type: object
server.OpenAIListInstancesResponse:
properties:
data:
items:
$ref: '#/definitions/server.OpenAIInstance'
type: array
object:
type: string
type: object
info:
contact: {}
description: llamactl is a control server for managing Llama Server instances.
license:
name: MIT License
url: https://opensource.org/license/mit/
title: llamactl API
version: "1.0"
paths:
/instances:
get:
description: Returns a list of all instances managed by the server
responses:
"200":
description: List of instances
schema:
items:
$ref: '#/definitions/instance.Process'
type: array
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: List all instances
tags:
- instances
/instances/{name}:
delete:
description: Stops and removes a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"204":
description: No Content
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Delete an instance
tags:
- instances
get:
description: Returns the details of a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Get details of a specific instance
tags:
- instances
post:
consumes:
- application/json
description: Creates a new instance with the provided configuration options
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
- description: Instance configuration options
in: body
name: options
required: true
schema:
$ref: '#/definitions/instance.CreateInstanceOptions'
responses:
"201":
description: Created instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid request body
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Create and start a new instance
tags:
- instances
put:
consumes:
- application/json
description: Updates the configuration of a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
- description: Instance configuration options
in: body
name: options
required: true
schema:
$ref: '#/definitions/instance.CreateInstanceOptions'
responses:
"200":
description: Updated instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Update an instance's configuration
tags:
- instances
/instances/{name}/logs:
get:
description: Returns the logs from a specific instance by name with optional
line limit
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
- description: 'Number of lines to retrieve (default: all lines)'
in: query
name: lines
type: string
responses:
"200":
description: Instance logs
schema:
type: string
"400":
description: Invalid name format or lines parameter
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Get logs from a specific instance
tags:
- instances
/instances/{name}/proxy:
get:
description: Forwards HTTP requests to the llama-server instance running on
a specific port
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Request successfully proxied to instance
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
"503":
description: Instance is not running
schema:
type: string
security:
- ApiKeyAuth: []
summary: Proxy requests to a specific instance
tags:
- instances
post:
description: Forwards HTTP requests to the llama-server instance running on
a specific port
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Request successfully proxied to instance
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
"503":
description: Instance is not running
schema:
type: string
security:
- ApiKeyAuth: []
summary: Proxy requests to a specific instance
tags:
- instances
/instances/{name}/restart:
post:
description: Restarts a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Restarted instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Restart a running instance
tags:
- instances
/instances/{name}/start:
post:
description: Starts a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Started instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Start a stopped instance
tags:
- instances
/instances/{name}/stop:
post:
description: Stops a specific instance by name
parameters:
- description: Instance Name
in: path
name: name
required: true
type: string
responses:
"200":
description: Stopped instance details
schema:
$ref: '#/definitions/instance.Process'
"400":
description: Invalid name format
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Stop a running instance
tags:
- instances
/server/devices:
get:
description: Returns a list of available devices for the llama server
responses:
"200":
description: List of devices
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: List available devices for llama server
tags:
- server
/server/help:
get:
description: Returns the help text for the llama server command
responses:
"200":
description: Help text
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Get help for llama server
tags:
- server
/server/version:
get:
description: Returns the version of the llama server command
responses:
"200":
description: Version information
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Get version of llama server
tags:
- server
/v1/:
post:
consumes:
- application/json
description: Handles all POST requests to /v1/*, routing to the appropriate
instance based on the request body. Requires API key authentication via the
`Authorization` header.
responses:
"200":
description: OpenAI response
"400":
description: Invalid request body or model name
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: OpenAI-compatible proxy endpoint
tags:
- openai
/v1/models:
get:
description: Returns a list of instances in a format compatible with OpenAI
API
responses:
"200":
description: List of OpenAI-compatible instances
schema:
$ref: '#/definitions/server.OpenAIListInstancesResponse'
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: List instances in OpenAI-compatible format
tags:
- openai
/version:
get:
description: Returns the version of the llamactl command
responses:
"200":
description: Version information
schema:
type: string
"500":
description: Internal Server Error
schema:
type: string
security:
- ApiKeyAuth: []
summary: Get llamactl version
tags:
- version
swagger: "2.0"

View File

@@ -22,6 +22,9 @@ var buildTime string = "unknown"
// @license.name MIT License
// @license.url https://opensource.org/license/mit/
// @basePath /api/v1
// @securityDefinitions.apikey ApiKeyAuth
// @in header
// @name X-API-Key
func main() {
// --version flag to print the version
@@ -58,7 +61,7 @@ func main() {
}
// Initialize the instance manager
instanceManager := manager.NewInstanceManager(cfg.Instances)
instanceManager := manager.New(&cfg)
// Create a new handler with the instance manager
handler := server.NewHandler(instanceManager, cfg)

View File

@@ -0,0 +1,23 @@
FROM ghcr.io/ggml-org/llama.cpp:server-cuda
# Install curl for downloading llamactl
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Download and install the latest llamactl release
RUN LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/') && \
curl -L "https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz" | tar -xz && \
mv llamactl /usr/local/bin/ && \
chmod +x /usr/local/bin/llamactl
# Set working directory
RUN mkdir -p /data
WORKDIR /data
# Expose the default llamactl port
EXPOSE 8080
ENV LLAMACTL_LLAMACPP_COMMAND=/app/llama-server
ENV LD_LIBRARY_PATH="/app:/usr/local/lib:/usr/lib"
# Set llamactl as the entrypoint
ENTRYPOINT ["llamactl"]

64
docker/Dockerfile.source Normal file
View File

@@ -0,0 +1,64 @@
# WebUI build stage
FROM node:20-alpine AS webui-builder
WORKDIR /webui
# Copy webui package files
COPY webui/package*.json ./
# Install dependencies
RUN npm ci
# Copy webui source
COPY webui/ ./
# Build webui
RUN npm run build
# Go build stage
FROM golang:1.24-alpine AS builder
# Install build dependencies
RUN apk add --no-cache git ca-certificates
# Set working directory
WORKDIR /build
# Copy go mod files
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy source code
COPY cmd/ ./cmd/
COPY pkg/ ./pkg/
COPY docs/ ./docs/
COPY webui/webui.go ./webui/
# Copy built webui from webui-builder
COPY --from=webui-builder /webui/dist ./webui/dist
# Build the application
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -ldflags="-w -s" -o llamactl ./cmd/server
# Final stage
FROM alpine:latest
# Install runtime dependencies
RUN apk --no-cache add ca-certificates
# Create data directory
RUN mkdir -p /data
# Set working directory
WORKDIR /data
# Copy binary from builder
COPY --from=builder /build/llamactl /usr/local/bin/llamactl
# Expose the default port
EXPOSE 8080
# Set llamactl as the entrypoint
ENTRYPOINT ["llamactl"]

20
docker/Dockerfile.vllm Normal file
View File

@@ -0,0 +1,20 @@
FROM vllm/vllm-openai:latest
# Install curl for downloading llamactl
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Download and install the latest llamactl release
RUN LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/') && \
curl -L "https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-linux-amd64.tar.gz" | tar -xz && \
mv llamactl /usr/local/bin/ && \
chmod +x /usr/local/bin/llamactl
# Set working directory
RUN mkdir -p /data
WORKDIR /data
# Expose the default llamactl port
EXPOSE 8080
# Set llamactl as the entrypoint
ENTRYPOINT ["llamactl"]

56
docker/docker-compose.yml Normal file
View File

@@ -0,0 +1,56 @@
version: '3.8'
services:
llamactl-llamacpp:
build:
context: ..
dockerfile: docker/Dockerfile.llamacpp
image: llamactl:llamacpp-cuda
container_name: llamactl-llamacpp
ports:
- "8080:8080"
volumes:
- ./data/llamacpp:/data
- ./models:/models # Mount models directory
- ~/.cache/llama.cpp:/root/.cache/llama.cpp # Llama.cpp cache
environment:
# Set data directory for persistence
- LLAMACTL_DATA_DIR=/data
# Enable Docker mode for nested containers (if needed)
- LLAMACTL_LLAMACPP_DOCKER_ENABLED=false
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped
llamactl-vllm:
build:
context: ..
dockerfile: docker/Dockerfile.vllm
image: llamactl:vllm-cuda
container_name: llamactl-vllm
ports:
- "8081:8080" # Use different port to avoid conflicts
volumes:
- ./data/vllm:/data
- ./models:/models # Mount models directory
- ~/.cache/huggingface:/root/.cache/huggingface # HuggingFace cache
environment:
# Set data directory for persistence
- LLAMACTL_DATA_DIR=/data
# Enable Docker mode for nested containers (if needed)
- LLAMACTL_VLLM_DOCKER_ENABLED=false
# vLLM specific environment variables
- CUDA_VISIBLE_DEVICES=all
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
restart: unless-stopped

View File

@@ -1,4 +1,6 @@
mkdocs-material==9.5.3
mkdocs==1.5.3
pymdown-extensions==10.7
mkdocs-git-revision-date-localized-plugin==1.2.4
mkdocs-material==9.6.22
mkdocs==1.6.1
pymdown-extensions==10.16.1
mkdocs-git-revision-date-localized-plugin==1.4.7
mike==2.1.3
neoteroi-mkdocs==1.1.3

1
docs/api-reference.md Normal file
View File

@@ -0,0 +1 @@
[OAD(swagger.yaml)]

270
docs/configuration.md Normal file
View File

@@ -0,0 +1,270 @@
# Configuration
llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
```
Defaults < Configuration file < Environment variables
```
llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.
## Default Configuration
Here's the default configuration with all available options:
```yaml
server:
host: "0.0.0.0" # Server host to bind to
port: 8080 # Server port to bind to
allowed_origins: ["*"] # Allowed CORS origins (default: all)
allowed_headers: ["*"] # Allowed CORS headers (default: all)
enable_swagger: false # Enable Swagger UI for API docs
backends:
llama-cpp:
command: "llama-server"
args: []
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "ghcr.io/ggml-org/llama.cpp:server"
args: ["run", "--rm", "--network", "host", "--gpus", "all"]
environment: {}
response_headers: {} # Additional response headers to send with responses
vllm:
command: "vllm"
args: ["serve"]
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "vllm/vllm-openai:latest"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
environment: {}
response_headers: {} # Additional response headers to send with responses
mlx:
command: "mlx_lm.server"
args: []
environment: {} # Environment variables for the backend process
response_headers: {} # Additional response headers to send with responses
instances:
port_range: [8000, 9000] # Port range for instances
data_dir: ~/.local/share/llamactl # Data directory (platform-specific, see below)
configs_dir: ~/.local/share/llamactl/instances # Instance configs directory
logs_dir: ~/.local/share/llamactl/logs # Logs directory
auto_create_dirs: true # Auto-create data/config/logs dirs if missing
max_instances: -1 # Max instances (-1 = unlimited)
max_running_instances: -1 # Max running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
default_auto_restart: true # Auto-restart new instances by default
default_max_restarts: 3 # Max restarts for new instances
default_restart_delay: 5 # Restart delay (seconds) for new instances
default_on_demand_start: true # Default on-demand start setting
on_demand_start_timeout: 120 # Default on-demand start timeout in seconds
timeout_check_interval: 5 # Idle instance timeout check in minutes
auth:
require_inference_auth: true # Require auth for inference endpoints
inference_keys: [] # Keys for inference endpoints
require_management_auth: true # Require auth for management endpoints
management_keys: [] # Keys for management endpoints
local_node: "main" # Name of the local node (default: "main")
nodes: # Node configuration for multi-node deployment
main: # Default local node (empty config)
```
## Configuration Files
### Configuration File Locations
Configuration files are searched in the following locations (in order of precedence, first found is used):
**Linux:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/.config/llamactl/config.yaml`
- `/etc/llamactl/config.yaml`
**macOS:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/Library/Application Support/llamactl/config.yaml`
- `/Library/Application Support/llamactl/config.yaml`
**Windows:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `%APPDATA%\llamactl\config.yaml`
- `%USERPROFILE%\llamactl\config.yaml`
- `%PROGRAMDATA%\llamactl\config.yaml`
You can specify the path to config file with `LLAMACTL_CONFIG_PATH` environment variable.
## Configuration Options
### Server Configuration
```yaml
server:
host: "0.0.0.0" # Server host to bind to (default: "0.0.0.0")
port: 8080 # Server port to bind to (default: 8080)
allowed_origins: ["*"] # CORS allowed origins (default: ["*"])
allowed_headers: ["*"] # CORS allowed headers (default: ["*"])
enable_swagger: false # Enable Swagger UI (default: false)
```
**Environment Variables:**
- `LLAMACTL_HOST` - Server host
- `LLAMACTL_PORT` - Server port
- `LLAMACTL_ALLOWED_ORIGINS` - Comma-separated CORS origins
- `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)
### Backend Configuration
```yaml
backends:
llama-cpp:
command: "llama-server"
args: []
environment: {} # Environment variables for the backend process
docker:
enabled: false # Enable Docker runtime (default: false)
image: "ghcr.io/ggml-org/llama.cpp:server"
args: ["run", "--rm", "--network", "host", "--gpus", "all"]
environment: {}
response_headers: {} # Additional response headers to send with responses
vllm:
command: "vllm"
args: ["serve"]
environment: {} # Environment variables for the backend process
docker:
enabled: false # Enable Docker runtime (default: false)
image: "vllm/vllm-openai:latest"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
environment: {}
response_headers: {} # Additional response headers to send with responses
mlx:
command: "mlx_lm.server"
args: []
environment: {} # Environment variables for the backend process
# MLX does not support Docker
response_headers: {} # Additional response headers to send with responses
```
**Backend Configuration Fields:**
- `command`: Executable name/path for the backend
- `args`: Default arguments prepended to all instances
- `environment`: Environment variables for the backend process (optional)
- `response_headers`: Additional response headers to send with responses (optional)
- `docker`: Docker-specific configuration (optional)
- `enabled`: Boolean flag to enable Docker runtime
- `image`: Docker image to use
- `args`: Additional arguments passed to `docker run`
- `environment`: Environment variables for the container (optional)
> If llamactl is behind an NGINX proxy, `X-Accel-Buffering: no` response header may be required for NGINX to properly stream the responses without buffering.
**Environment Variables:**
**LlamaCpp Backend:**
- `LLAMACTL_LLAMACPP_COMMAND` - LlamaCpp executable command
- `LLAMACTL_LLAMACPP_ARGS` - Space-separated default arguments
- `LLAMACTL_LLAMACPP_ENV` - Environment variables in format "KEY1=value1,KEY2=value2"
- `LLAMACTL_LLAMACPP_DOCKER_ENABLED` - Enable Docker runtime (true/false)
- `LLAMACTL_LLAMACPP_DOCKER_IMAGE` - Docker image to use
- `LLAMACTL_LLAMACPP_DOCKER_ARGS` - Space-separated Docker arguments
- `LLAMACTL_LLAMACPP_DOCKER_ENV` - Docker environment variables in format "KEY1=value1,KEY2=value2"
- `LLAMACTL_LLAMACPP_RESPONSE_HEADERS` - Response headers in format "KEY1=value1;KEY2=value2"
**VLLM Backend:**
- `LLAMACTL_VLLM_COMMAND` - VLLM executable command
- `LLAMACTL_VLLM_ARGS` - Space-separated default arguments
- `LLAMACTL_VLLM_ENV` - Environment variables in format "KEY1=value1,KEY2=value2"
- `LLAMACTL_VLLM_DOCKER_ENABLED` - Enable Docker runtime (true/false)
- `LLAMACTL_VLLM_DOCKER_IMAGE` - Docker image to use
- `LLAMACTL_VLLM_DOCKER_ARGS` - Space-separated Docker arguments
- `LLAMACTL_VLLM_DOCKER_ENV` - Docker environment variables in format "KEY1=value1,KEY2=value2"
- `LLAMACTL_VLLM_RESPONSE_HEADERS` - Response headers in format "KEY1=value1;KEY2=value2"
**MLX Backend:**
- `LLAMACTL_MLX_COMMAND` - MLX executable command
- `LLAMACTL_MLX_ARGS` - Space-separated default arguments
- `LLAMACTL_MLX_ENV` - Environment variables in format "KEY1=value1,KEY2=value2"
- `LLAMACTL_MLX_RESPONSE_HEADERS` - Response headers in format "KEY1=value1;KEY2=value2"
### Instance Configuration
```yaml
instances:
port_range: [8000, 9000] # Port range for instances (default: [8000, 9000])
data_dir: "~/.local/share/llamactl" # Directory for all llamactl data (default varies by OS)
configs_dir: "~/.local/share/llamactl/instances" # Directory for instance configs (default: data_dir/instances)
logs_dir: "~/.local/share/llamactl/logs" # Directory for instance logs (default: data_dir/logs)
auto_create_dirs: true # Automatically create data/config/logs directories (default: true)
max_instances: -1 # Maximum instances (-1 = unlimited)
max_running_instances: -1 # Maximum running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
default_auto_restart: true # Default auto-restart setting
default_max_restarts: 3 # Default maximum restart attempts
default_restart_delay: 5 # Default restart delay in seconds
default_on_demand_start: true # Default on-demand start setting
on_demand_start_timeout: 120 # Default on-demand start timeout in seconds
timeout_check_interval: 5 # Default instance timeout check interval in minutes
```
**Environment Variables:**
- `LLAMACTL_INSTANCE_PORT_RANGE` - Port range (format: "8000-9000" or "8000,9000")
- `LLAMACTL_DATA_DIRECTORY` - Data directory path
- `LLAMACTL_INSTANCES_DIR` - Instance configs directory path
- `LLAMACTL_LOGS_DIR` - Log directory path
- `LLAMACTL_AUTO_CREATE_DATA_DIR` - Auto-create data/config/logs directories (true/false)
- `LLAMACTL_MAX_INSTANCES` - Maximum number of instances
- `LLAMACTL_MAX_RUNNING_INSTANCES` - Maximum number of running instances
- `LLAMACTL_ENABLE_LRU_EVICTION` - Enable LRU eviction for idle instances
- `LLAMACTL_DEFAULT_AUTO_RESTART` - Default auto-restart setting (true/false)
- `LLAMACTL_DEFAULT_MAX_RESTARTS` - Default maximum restarts
- `LLAMACTL_DEFAULT_RESTART_DELAY` - Default restart delay in seconds
- `LLAMACTL_DEFAULT_ON_DEMAND_START` - Default on-demand start setting (true/false)
- `LLAMACTL_ON_DEMAND_START_TIMEOUT` - Default on-demand start timeout in seconds
- `LLAMACTL_TIMEOUT_CHECK_INTERVAL` - Default instance timeout check interval in minutes
### Authentication Configuration
```yaml
auth:
require_inference_auth: true # Require API key for OpenAI endpoints (default: true)
inference_keys: [] # List of valid inference API keys
require_management_auth: true # Require API key for management endpoints (default: true)
management_keys: [] # List of valid management API keys
```
**Environment Variables:**
- `LLAMACTL_REQUIRE_INFERENCE_AUTH` - Require auth for OpenAI endpoints (true/false)
- `LLAMACTL_INFERENCE_KEYS` - Comma-separated inference API keys
- `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
- `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
### Remote Node Configuration
llamactl supports remote node deployments. Configure remote nodes to deploy instances on remote hosts and manage them centrally.
```yaml
local_node: "main" # Name of the local node (default: "main")
nodes: # Node configuration map
main: # Local node (empty address means local)
address: "" # Not used for local node
api_key: "" # Not used for local node
worker1: # Remote worker node
address: "http://192.168.1.10:8080"
api_key: "worker1-api-key" # Management API key for authentication
```
**Node Configuration Fields:**
- `local_node`: Specifies which node in the `nodes` map represents the local node. Must match exactly what other nodes call this node.
- `nodes`: Map of node configurations
- `address`: HTTP/HTTPS URL of the remote node (empty for local node)
- `api_key`: Management API key for authenticating with the remote node
**Environment Variables:**
- `LLAMACTL_LOCAL_NODE` - Name of the local node

1814
docs/css/css-v1.1.3.css Normal file

File diff suppressed because it is too large Load Diff

1594
docs/docs.go Normal file

File diff suppressed because it is too large Load Diff

60
docs/fix_line_endings.py Normal file
View File

@@ -0,0 +1,60 @@
"""
MkDocs hook to fix line endings for proper rendering.
Automatically adds two spaces at the end of lines that need line breaks.
"""
import re
def on_page_markdown(markdown, page, config, **kwargs):
"""
Fix line endings in markdown content for proper MkDocs rendering.
Adds two spaces at the end of lines that need line breaks.
"""
lines = markdown.split('\n')
processed_lines = []
in_code_block = False
for i, line in enumerate(lines):
stripped = line.strip()
# Track code blocks
if stripped.startswith('```'):
in_code_block = not in_code_block
processed_lines.append(line)
continue
# Skip processing inside code blocks
if in_code_block:
processed_lines.append(line)
continue
# Skip empty lines
if not stripped:
processed_lines.append(line)
continue
# Skip lines that shouldn't have line breaks:
# - Headers (# ## ###)
# - Blockquotes (>)
# - Table rows (|)
# - Lines already ending with two spaces
# - YAML front matter and HTML tags
# - Standalone punctuation lines
if (stripped.startswith('#') or
stripped.startswith('>') or
'|' in stripped or
line.endswith(' ') or
stripped.startswith('---') or
stripped.startswith('<') or
stripped.endswith('>') or
stripped in ('.', '!', '?', ':', ';', '```', '---', ',')):
processed_lines.append(line)
continue
# Add two spaces to lines that end with regular text or most punctuation
if stripped and not in_code_block:
processed_lines.append(line.rstrip() + ' ')
else:
processed_lines.append(line)
return '\n'.join(processed_lines)

View File

@@ -1,150 +0,0 @@
# Configuration
llamactl can be configured via configuration files or environment variables. Configuration is loaded in the following order of precedence:
```
Defaults < Configuration file < Environment variables
```
llamactl works out of the box with sensible defaults, but you can customize the behavior to suit your needs.
## Default Configuration
Here's the default configuration with all available options:
```yaml
server:
host: "0.0.0.0" # Server host to bind to
port: 8080 # Server port to bind to
allowed_origins: ["*"] # Allowed CORS origins (default: all)
enable_swagger: false # Enable Swagger UI for API docs
instances:
port_range: [8000, 9000] # Port range for instances
data_dir: ~/.local/share/llamactl # Data directory (platform-specific, see below)
configs_dir: ~/.local/share/llamactl/instances # Instance configs directory
logs_dir: ~/.local/share/llamactl/logs # Logs directory
auto_create_dirs: true # Auto-create data/config/logs dirs if missing
max_instances: -1 # Max instances (-1 = unlimited)
max_running_instances: -1 # Max running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
llama_executable: llama-server # Path to llama-server executable
default_auto_restart: true # Auto-restart new instances by default
default_max_restarts: 3 # Max restarts for new instances
default_restart_delay: 5 # Restart delay (seconds) for new instances
default_on_demand_start: true # Default on-demand start setting
on_demand_start_timeout: 120 # Default on-demand start timeout in seconds
timeout_check_interval: 5 # Idle instance timeout check in minutes
auth:
require_inference_auth: true # Require auth for inference endpoints
inference_keys: [] # Keys for inference endpoints
require_management_auth: true # Require auth for management endpoints
management_keys: [] # Keys for management endpoints
```
## Configuration Files
### Configuration File Locations
Configuration files are searched in the following locations (in order of precedence):
**Linux:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/.config/llamactl/config.yaml`
- `/etc/llamactl/config.yaml`
**macOS:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `$HOME/Library/Application Support/llamactl/config.yaml`
- `/Library/Application Support/llamactl/config.yaml`
**Windows:**
- `./llamactl.yaml` or `./config.yaml` (current directory)
- `%APPDATA%\llamactl\config.yaml`
- `%USERPROFILE%\llamactl\config.yaml`
- `%PROGRAMDATA%\llamactl\config.yaml`
You can specify the path to config file with `LLAMACTL_CONFIG_PATH` environment variable.
## Configuration Options
### Server Configuration
```yaml
server:
host: "0.0.0.0" # Server host to bind to (default: "0.0.0.0")
port: 8080 # Server port to bind to (default: 8080)
allowed_origins: ["*"] # CORS allowed origins (default: ["*"])
enable_swagger: false # Enable Swagger UI (default: false)
```
**Environment Variables:**
- `LLAMACTL_HOST` - Server host
- `LLAMACTL_PORT` - Server port
- `LLAMACTL_ALLOWED_ORIGINS` - Comma-separated CORS origins
- `LLAMACTL_ENABLE_SWAGGER` - Enable Swagger UI (true/false)
### Instance Configuration
```yaml
instances:
port_range: [8000, 9000] # Port range for instances (default: [8000, 9000])
data_dir: "~/.local/share/llamactl" # Directory for all llamactl data (default varies by OS)
configs_dir: "~/.local/share/llamactl/instances" # Directory for instance configs (default: data_dir/instances)
logs_dir: "~/.local/share/llamactl/logs" # Directory for instance logs (default: data_dir/logs)
auto_create_dirs: true # Automatically create data/config/logs directories (default: true)
max_instances: -1 # Maximum instances (-1 = unlimited)
max_running_instances: -1 # Maximum running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
llama_executable: "llama-server" # Path to llama-server executable
default_auto_restart: true # Default auto-restart setting
default_max_restarts: 3 # Default maximum restart attempts
default_restart_delay: 5 # Default restart delay in seconds
default_on_demand_start: true # Default on-demand start setting
on_demand_start_timeout: 120 # Default on-demand start timeout in seconds
timeout_check_interval: 5 # Default instance timeout check interval in minutes
```
**Environment Variables:**
- `LLAMACTL_INSTANCE_PORT_RANGE` - Port range (format: "8000-9000" or "8000,9000")
- `LLAMACTL_DATA_DIRECTORY` - Data directory path
- `LLAMACTL_INSTANCES_DIR` - Instance configs directory path
- `LLAMACTL_LOGS_DIR` - Log directory path
- `LLAMACTL_AUTO_CREATE_DATA_DIR` - Auto-create data/config/logs directories (true/false)
- `LLAMACTL_MAX_INSTANCES` - Maximum number of instances
- `LLAMACTL_MAX_RUNNING_INSTANCES` - Maximum number of running instances
- `LLAMACTL_ENABLE_LRU_EVICTION` - Enable LRU eviction for idle instances
- `LLAMACTL_LLAMA_EXECUTABLE` - Path to llama-server executable
- `LLAMACTL_DEFAULT_AUTO_RESTART` - Default auto-restart setting (true/false)
- `LLAMACTL_DEFAULT_MAX_RESTARTS` - Default maximum restarts
- `LLAMACTL_DEFAULT_RESTART_DELAY` - Default restart delay in seconds
- `LLAMACTL_DEFAULT_ON_DEMAND_START` - Default on-demand start setting (true/false)
- `LLAMACTL_ON_DEMAND_START_TIMEOUT` - Default on-demand start timeout in seconds
- `LLAMACTL_TIMEOUT_CHECK_INTERVAL` - Default instance timeout check interval in minutes
### Authentication Configuration
```yaml
auth:
require_inference_auth: true # Require API key for OpenAI endpoints (default: true)
inference_keys: [] # List of valid inference API keys
require_management_auth: true # Require API key for management endpoints (default: true)
management_keys: [] # List of valid management API keys
```
**Environment Variables:**
- `LLAMACTL_REQUIRE_INFERENCE_AUTH` - Require auth for OpenAI endpoints (true/false)
- `LLAMACTL_INFERENCE_KEYS` - Comma-separated inference API keys
- `LLAMACTL_REQUIRE_MANAGEMENT_AUTH` - Require auth for management endpoints (true/false)
- `LLAMACTL_MANAGEMENT_KEYS` - Comma-separated management API keys
## Command Line Options
View all available command line options:
```bash
llamactl --help
```
You can also override configuration using command line flags when starting llamactl.

View File

@@ -1,70 +0,0 @@
# Installation
This guide will walk you through installing Llamactl on your system.
## Prerequisites
You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
**Quick install methods:**
```bash
# Homebrew (macOS/Linux)
brew install llama.cpp
# Winget (Windows)
winget install llama.cpp
```
Or build from source - see llama.cpp docs
## Installation Methods
### Option 1: Download Binary (Recommended)
Download the latest release from the [GitHub releases page](https://github.com/lordmathis/llamactl/releases):
```bash
# Linux/macOS - Get latest version and download
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/
# Or download manually from:
# https://github.com/lordmathis/llamactl/releases/latest
# Windows - Download from releases page
```
### Option 2: Build from Source
Requirements:
- Go 1.24 or later
- Node.js 22 or later
- Git
If you prefer to build from source:
```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
# Build the web UI
cd webui && npm ci && npm run build && cd ..
# Build the application
go build -o llamactl ./cmd/server
```
## Verification
Verify your installation by checking the version:
```bash
llamactl --version
```
## Next Steps
Now that Llamactl is installed, continue to the [Quick Start](quick-start.md) guide to get your first instance running!

View File

@@ -1,143 +0,0 @@
# Quick Start
This guide will help you get Llamactl up and running in just a few minutes.
## Step 1: Start Llamactl
Start the Llamactl server:
```bash
llamactl
```
By default, Llamactl will start on `http://localhost:8080`.
## Step 2: Access the Web UI
Open your web browser and navigate to:
```
http://localhost:8080
```
Login with the management API key. By default it is generated during server startup. Copy it from the terminal output.
You should see the Llamactl web interface.
## Step 3: Create Your First Instance
1. Click the "Add Instance" button
2. Fill in the instance configuration:
- **Name**: Give your instance a descriptive name
- **Model Path**: Path to your Llama.cpp model file
- **Additional Options**: Any extra Llama.cpp parameters
3. Click "Create Instance"
## Step 4: Start Your Instance
Once created, you can:
- **Start** the instance by clicking the start button
- **Monitor** its status in real-time
- **View logs** by clicking the logs button
- **Stop** the instance when needed
## Example Configuration
Here's a basic example configuration for a Llama 2 model:
```json
{
"name": "llama2-7b",
"model_path": "/path/to/llama-2-7b-chat.gguf",
"options": {
"threads": 4,
"context_size": 2048
}
}
```
## Using the API
You can also manage instances via the REST API:
```bash
# List all instances
curl http://localhost:8080/api/instances
# Create a new instance
curl -X POST http://localhost:8080/api/instances \
-H "Content-Type: application/json" \
-d '{
"name": "my-model",
"model_path": "/path/to/model.gguf",
}'
# Start an instance
curl -X POST http://localhost:8080/api/instances/my-model/start
```
## OpenAI Compatible API
Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.
### Chat Completions
Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-model",
"messages": [
{
"role": "user",
"content": "Hello! Can you help me write a Python function?"
}
],
"max_tokens": 150,
"temperature": 0.7
}'
```
### Using with Python OpenAI Client
You can also use the official OpenAI Python client:
```python
from openai import OpenAI
# Point the client to your Llamactl server
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed" # Llamactl doesn't require API keys by default
)
# Create a chat completion
response = client.chat.completions.create(
model="my-model", # Use the name of your instance
messages=[
{"role": "user", "content": "Explain quantum computing in simple terms"}
],
max_tokens=200,
temperature=0.7
)
print(response.choices[0].message.content)
```
### List Available Models
Get a list of running instances (models) in OpenAI-compatible format:
```bash
curl http://localhost:8080/v1/models
```
## Next Steps
- Manage instances [Managing Instances](../user-guide/managing-instances.md)
- Explore the [API Reference](../user-guide/api-reference.md)
- Configure advanced settings in the [Configuration](configuration.md) guide

Binary file not shown.

Before

Width:  |  Height:  |  Size: 69 KiB

After

Width:  |  Height:  |  Size: 66 KiB

Binary file not shown.

Before

Width:  |  Height:  |  Size: 44 KiB

After

Width:  |  Height:  |  Size: 45 KiB

View File

@@ -1,40 +1,33 @@
# Llamactl Documentation
Welcome to the Llamactl documentation! **Management server and proxy for multiple llama.cpp instances with OpenAI-compatible API routing.**
Welcome to the Llamactl documentation!
![Dashboard Screenshot](images/dashboard.png)
## What is Llamactl?
Llamactl is designed to simplify the deployment and management of llama-server instances. It provides a modern solution for running multiple large language models with centralized management.
**{{HEADLINE}}**
## Features
🚀 **Multiple Model Serving**: Run different models simultaneously (7B for speed, 70B for quality)
🔗 **OpenAI API Compatible**: Drop-in replacement - route requests by model name
🌐 **Web Dashboard**: Modern React UI for visual management (unlike CLI-only tools)
🔐 **API Key Authentication**: Separate keys for management vs inference access
📊 **Instance Monitoring**: Health checks, auto-restart, log management
**Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
💡 **On-Demand Instance Start**: Automatically launch instances upon receiving OpenAI-compatible API requests
💾 **State Persistence**: Ensure instances remain intact across server restarts
{{FEATURES}}
## Quick Links
- [Installation Guide](getting-started/installation.md) - Get Llamactl up and running
- [Configuration Guide](getting-started/configuration.md) - Detailed configuration options
- [Quick Start](getting-started/quick-start.md) - Your first steps with Llamactl
- [Managing Instances](user-guide/managing-instances.md) - Instance lifecycle management
- [API Reference](user-guide/api-reference.md) - Complete API documentation
- [Installation Guide](installation.md) - Get Llamactl up and running
- [Configuration Guide](configuration.md) - Detailed configuration options
- [Quick Start](quick-start.md) - Your first steps with Llamactl
- [Managing Instances](managing-instances.md) - Instance lifecycle management
- [API Reference](api-reference.md) - Complete API documentation
## Getting Help
If you need help or have questions:
- Check the [Troubleshooting](user-guide/troubleshooting.md) guide
- Check the [Troubleshooting](troubleshooting.md) guide
- Visit the [GitHub repository](https://github.com/lordmathis/llamactl)
- Review the [Configuration Guide](getting-started/configuration.md) for advanced settings
- Review the [Configuration Guide](configuration.md) for advanced settings
## License

174
docs/installation.md Normal file
View File

@@ -0,0 +1,174 @@
# Installation
This guide will walk you through installing Llamactl on your system.
## Prerequisites
### Backend Dependencies
llamactl supports multiple backends. Install at least one:
**For llama.cpp backend (all platforms):**
You need `llama-server` from [llama.cpp](https://github.com/ggml-org/llama.cpp) installed:
```bash
# Homebrew (macOS/Linux)
brew install llama.cpp
# Winget (Windows)
winget install llama.cpp
```
Or build from source - see llama.cpp docs
**For MLX backend (macOS only):**
MLX provides optimized inference on Apple Silicon. Install MLX-LM:
```bash
# Install via pip (requires Python 3.8+)
pip install mlx-lm
# Or in a virtual environment (recommended)
python -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm
```
Note: MLX backend is only available on macOS with Apple Silicon (M1, M2, M3, etc.)
**For vLLM backend:**
vLLM provides high-throughput distributed serving for LLMs. Install vLLM:
```bash
# Install in a virtual environment
python -m venv vllm-env
source vllm-env/bin/activate
pip install vllm
```
## Installation Methods
### Option 1: Download Binary (Recommended)
Download the latest release from the [GitHub releases page](https://github.com/lordmathis/llamactl/releases):
```bash
# Linux/macOS - Get latest version and download
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/
# Or download manually from:
# https://github.com/lordmathis/llamactl/releases/latest
# Windows - Download from releases page
```
### Option 2: Docker
llamactl provides Dockerfiles for creating Docker images with backends pre-installed. The resulting images include the latest llamactl release with the respective backend.
**Available Dockerfiles (CUDA):**
- **llamactl with llama.cpp CUDA**: `docker/Dockerfile.llamacpp` (based on `ghcr.io/ggml-org/llama.cpp:server-cuda`)
- **llamactl with vLLM CUDA**: `docker/Dockerfile.vllm` (based on `vllm/vllm-openai:latest`)
- **llamactl built from source**: `docker/Dockerfile.source` (multi-stage build with webui)
**Note:** These Dockerfiles are configured for CUDA. For other platforms (CPU, ROCm, Vulkan, etc.), adapt the base image. For llama.cpp, see available tags at [llama.cpp Docker docs](https://github.com/ggml-org/llama.cpp/blob/master/docs/docker.md). For vLLM, check [vLLM docs](https://docs.vllm.ai/en/v0.6.5/serving/deploying_with_docker.html).
**Using Docker Compose**
```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
# Create directories for data and models
mkdir -p data/llamacpp data/vllm models
# Start llamactl with llama.cpp backend
docker-compose -f docker/docker-compose.yml up llamactl-llamacpp -d
# Or start llamactl with vLLM backend
docker-compose -f docker/docker-compose.yml up llamactl-vllm -d
```
Access the dashboard at:
- llamactl with llama.cpp: http://localhost:8080
- llamactl with vLLM: http://localhost:8081
**Using Docker Build and Run**
1. llamactl with llama.cpp CUDA:
```bash
docker build -f docker/Dockerfile.llamacpp -t llamactl:llamacpp-cuda .
docker run -d \
--name llamactl-llamacpp \
--gpus all \
-p 8080:8080 \
-v ~/.cache/llama.cpp:/root/.cache/llama.cpp \
llamactl:llamacpp-cuda
```
2. llamactl with vLLM CUDA:
```bash
docker build -f docker/Dockerfile.vllm -t llamactl:vllm-cuda .
docker run -d \
--name llamactl-vllm \
--gpus all \
-p 8080:8080 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
llamactl:vllm-cuda
```
3. llamactl built from source:
```bash
docker build -f docker/Dockerfile.source -t llamactl:source .
docker run -d \
--name llamactl \
-p 8080:8080 \
llamactl:source
```
### Option 3: Build from Source
Requirements:
- Go 1.24 or later
- Node.js 22 or later
- Git
If you prefer to build from source:
```bash
# Clone the repository
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
# Build the web UI
cd webui && npm ci && npm run build && cd ..
# Build the application
go build -o llamactl ./cmd/server
```
## Remote Node Installation
For deployments with remote nodes:
- Install llamactl on each node using any of the methods above
- Configure API keys for authentication between nodes
- Ensure node names are consistent across all configurations
## Verification
Verify your installation by checking the version:
```bash
llamactl --version
```
## Next Steps
Now that Llamactl is installed, continue to the [Quick Start](quick-start.md) guide to get your first instance running!
For remote node deployments, see the [Configuration Guide](configuration.md) for node setup instructions.

280
docs/managing-instances.md Normal file
View File

@@ -0,0 +1,280 @@
# Managing Instances
Learn how to effectively manage your llama.cpp, MLX, and vLLM instances with Llamactl through both the Web UI and API.
## Overview
Llamactl provides two ways to manage instances:
- **Web UI**: Accessible at `http://localhost:8080` with an intuitive dashboard
- **REST API**: Programmatic access for automation and integration
![Dashboard Screenshot](images/dashboard.png)
### Authentication
Llamactl uses a **Management API Key** to authenticate requests to the management API (creating, starting, stopping instances). All curl examples below use `<token>` as a placeholder - replace this with your actual Management API Key.
By default, authentication is required. If you don't configure a management API key in your configuration file, llamactl will auto-generate one and print it to the terminal on startup. See the [Configuration](configuration.md) guide for details.
For Web UI access:
1. Navigate to the web UI
2. Enter your Management API Key
3. Bearer token is stored for the session
### Theme Support
- Switch between light and dark themes
- Setting is remembered across sessions
## Instance Cards
Each instance is displayed as a card showing:
- **Instance name**
- **Health status badge** (unknown, ready, error, failed)
- **Action buttons** (start, stop, edit, logs, delete)
## Create Instance
**Via Web UI**
![Create Instance Screenshot](images/create_instance.png)
1. Click the **"Create Instance"** button on the dashboard
2. Enter a unique **Name** for your instance (only required field)
3. **Select Target Node**: Choose which node to deploy the instance to from the dropdown
4. **Choose Backend Type**:
- **llama.cpp**: For GGUF models using llama-server
- **MLX**: For MLX-optimized models (macOS only)
- **vLLM**: For distributed serving and high-throughput inference
5. Configure model source:
- **For llama.cpp**: GGUF model path or HuggingFace repo
- **For MLX**: MLX model path or identifier (e.g., `mlx-community/Mistral-7B-Instruct-v0.3-4bit`)
- **For vLLM**: HuggingFace model identifier (e.g., `microsoft/DialoGPT-medium`)
6. Configure optional instance management settings:
- **Auto Restart**: Automatically restart instance on failure
- **Max Restarts**: Maximum number of restart attempts
- **Restart Delay**: Delay in seconds between restart attempts
- **On Demand Start**: Start instance when receiving a request to the OpenAI compatible endpoint
- **Idle Timeout**: Minutes before stopping idle instance (set to 0 to disable)
- **Environment Variables**: Set custom environment variables for the instance process
7. Configure backend-specific options:
- **llama.cpp**: Threads, context size, GPU layers, port, etc.
- **MLX**: Temperature, top-p, adapter path, Python environment, etc.
- **vLLM**: Tensor parallel size, GPU memory utilization, quantization, etc.
!!! tip "Auto-Assignment"
Llamactl automatically assigns ports from the configured port range (default: 8000-9000) and generates API keys if authentication is enabled. You typically don't need to manually specify these values.
8. Click **"Create"** to save the instance
**Via API**
```bash
# Create llama.cpp instance with local model file
curl -X POST http://localhost:8080/api/v1/instances/my-llama-instance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/model.gguf",
"threads": 8,
"ctx_size": 4096,
"gpu_layers": 32
},
"nodes": ["main"]
}'
# Create MLX instance (macOS only)
curl -X POST http://localhost:8080/api/v1/instances/my-mlx-instance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "mlx_lm",
"backend_options": {
"model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
"temp": 0.7,
"top_p": 0.9,
"max_tokens": 2048
},
"auto_restart": true,
"max_restarts": 3,
"nodes": ["main"]
}'
# Create vLLM instance
curl -X POST http://localhost:8080/api/v1/instances/my-vllm-instance \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "vllm",
"backend_options": {
"model": "microsoft/DialoGPT-medium",
"tensor_parallel_size": 2,
"gpu_memory_utilization": 0.9
},
"auto_restart": true,
"on_demand_start": true,
"environment": {
"CUDA_VISIBLE_DEVICES": "0,1",
"NCCL_DEBUG": "INFO",
"PYTHONPATH": "/custom/path"
},
"nodes": ["main"]
}'
# Create llama.cpp instance with HuggingFace model
curl -X POST http://localhost:8080/api/v1/instances/gemma-3-27b \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"hf_repo": "unsloth/gemma-3-27b-it-GGUF",
"hf_file": "gemma-3-27b-it-GGUF.gguf",
"gpu_layers": 32
},
"nodes": ["main"]
}'
# Create instance on specific remote node
curl -X POST http://localhost:8080/api/v1/instances/remote-llama \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/models/llama-7b.gguf",
"gpu_layers": 32
},
"nodes": ["worker1"]
}'
# Create instance on multiple nodes for high availability
curl -X POST http://localhost:8080/api/v1/instances/multi-node-llama \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/models/llama-7b.gguf",
"gpu_layers": 32
},
"nodes": ["worker1", "worker2", "worker3"]
}'
```
## Start Instance
**Via Web UI**
1. Click the **"Start"** button on an instance card
2. Watch the status change to "Unknown"
3. Monitor progress in the logs
4. Instance status changes to "Ready" when ready
**Via API**
```bash
curl -X POST http://localhost:8080/api/v1/instances/{name}/start \
-H "Authorization: Bearer <token>"
```
## Stop Instance
**Via Web UI**
1. Click the **"Stop"** button on an instance card
2. Instance gracefully shuts down
**Via API**
```bash
curl -X POST http://localhost:8080/api/v1/instances/{name}/stop \
-H "Authorization: Bearer <token>"
```
## Edit Instance
**Via Web UI**
1. Click the **"Edit"** button on an instance card
2. Modify settings in the configuration dialog
3. Changes require instance restart to take effect
4. Click **"Update & Restart"** to apply changes
**Via API**
Modify instance settings:
```bash
curl -X PUT http://localhost:8080/api/v1/instances/{name} \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <token>" \
-d '{
"backend_options": {
"threads": 8,
"context_size": 4096
}
}'
```
!!! note
Configuration changes require restarting the instance to take effect.
## View Logs
**Via Web UI**
1. Click the **"Logs"** button on any instance card
2. Real-time log viewer opens
**Via API**
Check instance status in real-time:
```bash
# Get instance logs
curl http://localhost:8080/api/v1/instances/{name}/logs \
-H "Authorization: Bearer <token>"
```
## Delete Instance
**Via Web UI**
1. Click the **"Delete"** button on an instance card
2. Only stopped instances can be deleted
3. Confirm deletion in the dialog
**Via API**
```bash
curl -X DELETE http://localhost:8080/api/v1/instances/{name} \
-H "Authorization: Bearer <token>"
```
## Instance Proxy
Llamactl proxies all requests to the underlying backend instances (llama-server, MLX, or vLLM).
```bash
# Proxy requests to the instance
curl http://localhost:8080/api/v1/instances/{name}/proxy/ \
-H "Authorization: Bearer <token>"
```
All backends provide OpenAI-compatible endpoints. Check the respective documentation:
- [llama-server docs](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md)
- [MLX-LM docs](https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md)
- [vLLM docs](https://docs.vllm.ai/en/latest/)
### Instance Health
**Via Web UI**
1. The health status badge is displayed on each instance card
**Via API**
Check the health status of your instances:
```bash
curl http://localhost:8080/api/v1/instances/{name}/proxy/health \
-H "Authorization: Bearer <token>"
```

263
docs/quick-start.md Normal file
View File

@@ -0,0 +1,263 @@
# Quick Start
This guide will help you get Llamactl up and running in just a few minutes.
**Before you begin:** Ensure you have at least one backend installed (llama.cpp, MLX, or vLLM). See the [Installation Guide](installation.md#prerequisites) for backend setup.
## Core Concepts
Before you start, let's clarify a few key terms:
- **Instance**: A running backend server that serves a specific model. Each instance has a unique name and runs independently.
- **Backend**: The inference engine that actually runs the model (llama.cpp, MLX, or vLLM). You need at least one backend installed before creating instances.
- **Node**: In multi-machine setups, a node represents one machine. Most users will just use the default "main" node for single-machine deployments.
- **Proxy Architecture**: Llamactl acts as a proxy in front of your instances. You make requests to llamactl (e.g., `http://localhost:8080/v1/chat/completions`), and it routes them to the appropriate backend instance. This means you don't need to track individual instance ports or endpoints.
## Authentication
Llamactl uses two types of API keys:
- **Management API Key**: Used to authenticate with the Llamactl management API (creating, starting, stopping instances).
- **Inference API Key**: Used to authenticate requests to the OpenAI-compatible endpoints (`/v1/chat/completions`, `/v1/completions`, etc.).
By default, authentication is required. If you don't configure these keys in your configuration file, llamactl will auto-generate them and print them to the terminal on startup. You can also configure custom keys or disable authentication entirely in the [Configuration](configuration.md) guide.
## Start Llamactl
Start the Llamactl server:
```bash
llamactl
```
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ MANAGEMENT AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔑 Generated Management API Key:
sk-management-...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ INFERENCE AUTHENTICATION REQUIRED
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔑 Generated Inference API Key:
sk-inference-...
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ IMPORTANT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• These keys are auto-generated and will change on restart
• For production, add explicit keys to your configuration
• Copy these keys before they disappear from the terminal
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Llamactl server listening on 0.0.0.0:8080
```
Copy the **Management** and **Inference** API Keys from the terminal - you'll need them to access the web UI and make inference requests.
By default, Llamactl will start on `http://localhost:8080`.
## Access the Web UI
Open your web browser and navigate to:
```
http://localhost:8080
```
Login with the management API key from the terminal output.
You should see the Llamactl web interface.
## Create Your First Instance
1. Click the "Add Instance" button
2. Fill in the instance configuration:
- **Name**: Give your instance a descriptive name
- **Node**: Select which node to deploy the instance to (defaults to "main" for single-node setups)
- **Backend Type**: Choose from llama.cpp, MLX, or vLLM
- **Model**: Model path or huggingface repo
- **Additional Options**: Backend-specific parameters
!!! tip "Auto-Assignment"
Llamactl automatically assigns ports from the configured port range (default: 8000-9000) and generates API keys if authentication is enabled. You typically don't need to manually specify these values.
!!! note "Remote Node Deployment"
If you have configured remote nodes in your configuration file, you can select which node to deploy the instance to. This allows you to distribute instances across multiple machines. See the [Configuration](configuration.md#remote-node-configuration) guide for details on setting up remote nodes.
3. Click "Create Instance"
## Start Your Instance
Once created, you can:
- **Start** the instance by clicking the start button
- **Monitor** its status in real-time
- **View logs** by clicking the logs button
- **Stop** the instance when needed
## Example Configurations
Here are basic example configurations for each backend:
**llama.cpp backend:**
```json
{
"name": "llama2-7b",
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/llama-2-7b-chat.gguf",
"threads": 4,
"ctx_size": 2048,
"gpu_layers": 32
},
"nodes": ["main"]
}
```
**MLX backend (macOS only):**
```json
{
"name": "mistral-mlx",
"backend_type": "mlx_lm",
"backend_options": {
"model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit",
"temp": 0.7,
"max_tokens": 2048
},
"nodes": ["main"]
}
```
**vLLM backend:**
```json
{
"name": "dialogpt-vllm",
"backend_type": "vllm",
"backend_options": {
"model": "microsoft/DialoGPT-medium",
"tensor_parallel_size": 2,
"gpu_memory_utilization": 0.9
},
"nodes": ["main"]
}
```
**Remote node deployment example:**
```json
{
"name": "distributed-model",
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/model.gguf",
"gpu_layers": 32
},
"nodes": ["worker1"]
}
```
## Docker Support
Llamactl can run backends in Docker containers. To enable Docker for a backend, add a `docker` section to that backend in your YAML configuration file (e.g. `config.yaml`) as shown below:
```yaml
backends:
vllm:
command: "vllm"
args: ["serve"]
docker:
enabled: true
image: "vllm/vllm-openai:latest"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g"]
```
## Using the API
You can also manage instances via the REST API:
```bash
# List all instances
curl http://localhost:8080/api/v1/instances
# Create a new llama.cpp instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
-H "Content-Type: application/json" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/model.gguf"
}
}'
# Start an instance
curl -X POST http://localhost:8080/api/v1/instances/my-model/start
```
## OpenAI Compatible API
Llamactl provides OpenAI-compatible endpoints, making it easy to integrate with existing OpenAI client libraries and tools.
### Chat Completions
Once you have an instance running, you can use it with the OpenAI-compatible chat completions endpoint:
```bash
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "my-model",
"messages": [
{
"role": "user",
"content": "Hello! Can you help me write a Python function?"
}
],
"max_tokens": 150,
"temperature": 0.7
}'
```
### Using with Python OpenAI Client
You can also use the official OpenAI Python client:
```python
from openai import OpenAI
# Point the client to your Llamactl server
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="your-inference-api-key" # Use the inference API key from terminal or config
)
# Create a chat completion
response = client.chat.completions.create(
model="my-model", # Use the name of your instance
messages=[
{"role": "user", "content": "Explain quantum computing in simple terms"}
],
max_tokens=200,
temperature=0.7
)
print(response.choices[0].message.content)
```
!!! note "API Key"
If you disabled authentication in your config, you can use any value for `api_key` (e.g., `"not-needed"`). Otherwise, use the inference API key shown in the terminal output on startup.
### List Available Models
Get a list of running instances (models) in OpenAI-compatible format:
```bash
curl http://localhost:8080/v1/models
```
## Next Steps
- Manage instances [Managing Instances](managing-instances.md)
- Explore the [API Reference](api-reference.md)
- Configure advanced settings in the [Configuration](configuration.md) guide

62
docs/readme_sync.py Normal file
View File

@@ -0,0 +1,62 @@
"""
MkDocs hook to sync content from README.md to docs/index.md
"""
import re
import os
def on_page_markdown(markdown, page, config, **kwargs):
"""Process markdown content before rendering"""
# Only process the index.md file
if page.file.src_path != 'index.md':
return markdown
# Get the path to README.md (relative to mkdocs.yml)
readme_path = os.path.join(os.path.dirname(config['config_file_path']), 'README.md')
if not os.path.exists(readme_path):
print(f"Warning: README.md not found at {readme_path}")
return markdown
try:
with open(readme_path, 'r', encoding='utf-8') as f:
readme_content = f.read()
except Exception as e:
print(f"Error reading README.md: {e}")
return markdown
# Extract headline (the text in bold after the title)
headline_match = re.search(r'\*\*(.*?)\*\*', readme_content)
headline = headline_match.group(1) if headline_match else 'Management server for llama.cpp and MLX instances'
# Extract features section - everything between ## Features and the next ## heading
features_match = re.search(r'## Features\n(.*?)(?=\n## |\Z)', readme_content, re.DOTALL)
if features_match:
features_content = features_match.group(1).strip()
# Just add line breaks at the end of each line for proper MkDocs rendering
features_with_breaks = add_line_breaks(features_content)
else:
features_with_breaks = "Features content not found in README.md"
# Replace placeholders in the markdown
markdown = markdown.replace('{{HEADLINE}}', headline)
markdown = markdown.replace('{{FEATURES}}', features_with_breaks)
# Fix image paths: convert docs/images/ to images/ for MkDocs
markdown = re.sub(r'docs/images/', 'images/', markdown)
return markdown
def add_line_breaks(content):
"""Add two spaces at the end of each line for proper MkDocs line breaks"""
lines = content.split('\n')
processed_lines = []
for line in lines:
if line.strip(): # Only add spaces to non-empty lines
processed_lines.append(line.rstrip() + ' ')
else:
processed_lines.append(line)
return '\n'.join(processed_lines)

1569
docs/swagger.json Normal file

File diff suppressed because it is too large Load Diff

1014
docs/swagger.yaml Normal file

File diff suppressed because it is too large Load Diff

193
docs/troubleshooting.md Normal file
View File

@@ -0,0 +1,193 @@
# Troubleshooting
Issues specific to Llamactl deployment and operation.
## Configuration Issues
### Invalid Configuration
**Problem:** Invalid configuration preventing startup
**Solutions:**
1. Use minimal configuration:
```yaml
server:
host: "0.0.0.0"
port: 8080
instances:
port_range: [8000, 9000]
```
2. Check data directory permissions:
```bash
# Ensure data directory is writable (default: ~/.local/share/llamactl)
mkdir -p ~/.local/share/llamactl/{instances,logs}
```
## Instance Management Issues
### Instance Fails to Start
**Problem:** Instance fails to start or immediately stops
**Solutions:**
1. **Check instance logs** to see the actual error:
```bash
curl http://localhost:8080/api/v1/instances/{name}/logs
# Or check log files directly
tail -f ~/.local/share/llamactl/logs/{instance-name}.log
```
2. **Verify backend is installed:**
- **llama.cpp**: Ensure `llama-server` is in PATH
- **MLX**: Ensure `mlx-lm` Python package is installed
- **vLLM**: Ensure `vllm` Python package is installed
3. **Check model path and format:**
- Use absolute paths to model files
- Verify model format matches backend (GGUF for llama.cpp, etc.)
4. **Verify backend command configuration:**
- Check that the backend `command` is correctly configured in the global config
- For virtual environments, specify the full path to the command (e.g., `/path/to/venv/bin/mlx_lm.server`)
- See the [Configuration Guide](configuration.md) for backend configuration details
- Test the backend directly (see [Backend-Specific Issues](#backend-specific-issues) below)
### Backend-Specific Issues
**Problem:** Model loading, memory, GPU, or performance issues
Most model-specific issues (memory, GPU configuration, performance tuning) are backend-specific and should be resolved by consulting the respective backend documentation:
**llama.cpp:**
- [llama.cpp GitHub](https://github.com/ggml-org/llama.cpp)
- [llama-server README](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md)
**MLX:**
- [MLX-LM GitHub](https://github.com/ml-explore/mlx-lm)
- [MLX-LM Server Guide](https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/SERVER.md)
**vLLM:**
- [vLLM Documentation](https://docs.vllm.ai/en/stable/)
- [OpenAI Compatible Server](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html)
- [vllm serve Command](https://docs.vllm.ai/en/stable/cli/serve.html#vllm-serve)
**Testing backends directly:**
Testing your model and configuration directly with the backend helps determine if the issue is with llamactl or the backend itself:
```bash
# llama.cpp
llama-server --model /path/to/model.gguf --port 8081
# MLX
mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --port 8081
# vLLM
vllm serve microsoft/DialoGPT-medium --port 8081
```
## API and Network Issues
### CORS Errors
**Problem:** Web UI shows CORS errors in browser console
**Solutions:**
1. **Configure allowed origins:**
```yaml
server:
allowed_origins:
- "http://localhost:3000"
- "https://yourdomain.com"
```
## Authentication Issues
**Problem:** API requests failing with authentication errors
**Solutions:**
1. **Disable authentication temporarily:**
```yaml
auth:
require_management_auth: false
require_inference_auth: false
```
2. **Configure API keys:**
```yaml
auth:
management_keys:
- "your-management-key"
inference_keys:
- "your-inference-key"
```
3. **Use correct Authorization header:**
```bash
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances
```
## Remote Node Issues
### Node Configuration
**Problem:** Remote instances not appearing or cannot be managed
**Solutions:**
1. **Verify node configuration:**
```yaml
local_node: "main" # Must match a key in nodes map
nodes:
main:
address: "" # Empty for local node
worker1:
address: "http://worker1.internal:8080"
api_key: "secure-key" # Must match worker1's management key
```
2. **Check node name consistency:**
- `local_node` on each node must match what other nodes call it
- Node names are case-sensitive
3. **Test remote node connectivity:**
```bash
curl -H "Authorization: Bearer remote-node-key" \
http://remote-node:8080/api/v1/instances
```
## Debugging and Logs
### Viewing Instance Logs
```bash
# Get instance logs via API
curl http://localhost:8080/api/v1/instances/{name}/logs
# Or check log files directly
tail -f ~/.local/share/llamactl/logs/{instance-name}.log
```
### Enable Debug Logging
```bash
export LLAMACTL_LOG_LEVEL=debug
llamactl
```
## Getting Help
When reporting issues, include:
1. **System information:**
```bash
llamactl --version
```
2. **Configuration file** (remove sensitive keys)
3. **Relevant log output**
4. **Steps to reproduce the issue**

View File

@@ -1,412 +0,0 @@
# API Reference
Complete reference for the Llamactl REST API.
## Base URL
All API endpoints are relative to the base URL:
```
http://localhost:8080/api/v1
```
## Authentication
Llamactl supports API key authentication. If authentication is enabled, include the API key in the Authorization header:
```bash
curl -H "Authorization: Bearer <your-api-key>" \
http://localhost:8080/api/v1/instances
```
The server supports two types of API keys:
- **Management API Keys**: Required for instance management operations (CRUD operations on instances)
- **Inference API Keys**: Required for OpenAI-compatible inference endpoints
## System Endpoints
### Get Llamactl Version
Get the version information of the llamactl server.
```http
GET /api/v1/version
```
**Response:**
```
Version: 1.0.0
Commit: abc123
Build Time: 2024-01-15T10:00:00Z
```
### Get Llama Server Help
Get help text for the llama-server command.
```http
GET /api/v1/server/help
```
**Response:** Plain text help output from `llama-server --help`
### Get Llama Server Version
Get version information of the llama-server binary.
```http
GET /api/v1/server/version
```
**Response:** Plain text version output from `llama-server --version`
### List Available Devices
List available devices for llama-server.
```http
GET /api/v1/server/devices
```
**Response:** Plain text device list from `llama-server --list-devices`
## Instances
### List All Instances
Get a list of all instances.
```http
GET /api/v1/instances
```
**Response:**
```json
[
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
]
```
### Get Instance Details
Get detailed information about a specific instance.
```http
GET /api/v1/instances/{name}
```
**Response:**
```json
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
```
### Create Instance
Create and start a new instance.
```http
POST /api/v1/instances/{name}
```
**Request Body:** JSON object with instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
**Response:**
```json
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
```
### Update Instance
Update an existing instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
```http
PUT /api/v1/instances/{name}
```
**Request Body:** JSON object with configuration fields to update.
**Response:**
```json
{
"name": "llama2-7b",
"status": "running",
"created": 1705312200
}
```
### Delete Instance
Stop and remove an instance.
```http
DELETE /api/v1/instances/{name}
```
**Response:** `204 No Content`
## Instance Operations
### Start Instance
Start a stopped instance.
```http
POST /api/v1/instances/{name}/start
```
**Response:**
```json
{
"name": "llama2-7b",
"status": "starting",
"created": 1705312200
}
```
**Error Responses:**
- `409 Conflict`: Maximum number of running instances reached
- `500 Internal Server Error`: Failed to start instance
### Stop Instance
Stop a running instance.
```http
POST /api/v1/instances/{name}/stop
```
**Response:**
```json
{
"name": "llama2-7b",
"status": "stopping",
"created": 1705312200
}
```
### Restart Instance
Restart an instance (stop then start).
```http
POST /api/v1/instances/{name}/restart
```
**Response:**
```json
{
"name": "llama2-7b",
"status": "restarting",
"created": 1705312200
}
```
### Get Instance Logs
Retrieve instance logs.
```http
GET /api/v1/instances/{name}/logs
```
**Query Parameters:**
- `lines`: Number of lines to return (default: all lines, use -1 for all)
**Response:** Plain text log output
**Example:**
```bash
curl "http://localhost:8080/api/v1/instances/my-instance/logs?lines=100"
```
### Proxy to Instance
Proxy HTTP requests directly to the llama-server instance.
```http
GET /api/v1/instances/{name}/proxy/*
POST /api/v1/instances/{name}/proxy/*
```
This endpoint forwards all requests to the underlying llama-server instance running on its configured port. The proxy strips the `/api/v1/instances/{name}/proxy` prefix and forwards the remaining path to the instance.
**Example - Check Instance Health:**
```bash
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model/proxy/health
```
This forwards the request to `http://instance-host:instance-port/health` on the actual llama-server instance.
**Error Responses:**
- `503 Service Unavailable`: Instance is not running
## OpenAI-Compatible API
Llamactl provides OpenAI-compatible endpoints for inference operations.
### List Models
List all instances in OpenAI-compatible format.
```http
GET /v1/models
```
**Response:**
```json
{
"object": "list",
"data": [
{
"id": "llama2-7b",
"object": "model",
"created": 1705312200,
"owned_by": "llamactl"
}
]
}
```
### Chat Completions, Completions, Embeddings
All OpenAI-compatible inference endpoints are available:
```http
POST /v1/chat/completions
POST /v1/completions
POST /v1/embeddings
POST /v1/rerank
POST /v1/reranking
```
**Request Body:** Standard OpenAI format with `model` field specifying the instance name
**Example:**
```json
{
"model": "llama2-7b",
"messages": [
{
"role": "user",
"content": "Hello, how are you?"
}
]
}
```
The server routes requests to the appropriate instance based on the `model` field in the request body. Instances with on-demand starting enabled will be automatically started if not running. For configuration details, see [Managing Instances](managing-instances.md).
**Error Responses:**
- `400 Bad Request`: Invalid request body or missing model name
- `503 Service Unavailable`: Instance is not running and on-demand start is disabled
- `409 Conflict`: Cannot start instance due to maximum instances limit
## Instance Status Values
Instances can have the following status values:
- `stopped`: Instance is not running
- `running`: Instance is running and ready to accept requests
- `failed`: Instance failed to start or crashed
## Error Responses
All endpoints may return error responses in the following format:
```json
{
"error": "Error message description"
}
```
### Common HTTP Status Codes
- `200`: Success
- `201`: Created
- `204`: No Content (successful deletion)
- `400`: Bad Request (invalid parameters or request body)
- `401`: Unauthorized (missing or invalid API key)
- `403`: Forbidden (insufficient permissions)
- `404`: Not Found (instance not found)
- `409`: Conflict (instance already exists, max instances reached)
- `500`: Internal Server Error
- `503`: Service Unavailable (instance not running)
## Examples
### Complete Instance Lifecycle
```bash
# Create and start instance
curl -X POST http://localhost:8080/api/v1/instances/my-model \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"model": "/models/llama-2-7b.gguf"
}'
# Check instance status
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model
# Get instance logs
curl -H "Authorization: Bearer your-api-key" \
"http://localhost:8080/api/v1/instances/my-model/logs?lines=50"
# Use OpenAI-compatible chat completions
curl -X POST http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-inference-api-key" \
-d '{
"model": "my-model",
"messages": [
{"role": "user", "content": "Hello!"}
],
"max_tokens": 100
}'
# Stop instance
curl -X POST -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model/stop
# Delete instance
curl -X DELETE -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances/my-model
```
### Using the Proxy Endpoint
You can also directly proxy requests to the llama-server instance:
```bash
# Direct proxy to instance (bypasses OpenAI compatibility layer)
curl -X POST http://localhost:8080/api/v1/instances/my-model/proxy/completion \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{
"prompt": "Hello, world!",
"n_predict": 50
}'
```
## Swagger Documentation
If swagger documentation is enabled in the server configuration, you can access the interactive API documentation at:
```
http://localhost:8080/swagger/
```
This provides a complete interactive interface for testing all API endpoints.

View File

@@ -1,190 +0,0 @@
# Managing Instances
Learn how to effectively manage your Llama.cpp instances with Llamactl through both the Web UI and API.
## Overview
Llamactl provides two ways to manage instances:
- **Web UI**: Accessible at `http://localhost:8080` with an intuitive dashboard
- **REST API**: Programmatic access for automation and integration
![Dashboard Screenshot](../images/dashboard.png)
### Authentication
If authentication is enabled:
1. Navigate to the web UI
2. Enter your credentials
3. Bearer token is stored for the session
### Theme Support
- Switch between light and dark themes
- Setting is remembered across sessions
## Instance Cards
Each instance is displayed as a card showing:
- **Instance name**
- **Health status badge** (unknown, ready, error, failed)
- **Action buttons** (start, stop, edit, logs, delete)
## Create Instance
### Via Web UI
![Create Instance Screenshot](../images/create_instance.png)
1. Click the **"Create Instance"** button on the dashboard
2. Enter a unique **Name** for your instance (only required field)
3. Configure model source (choose one):
- **Model Path**: Full path to your downloaded GGUF model file
- **HuggingFace Repo**: Repository name (e.g., `unsloth/gemma-3-27b-it-GGUF`)
- **HuggingFace File**: Specific file within the repo (optional, uses default if not specified)
4. Configure optional instance management settings:
- **Auto Restart**: Automatically restart instance on failure
- **Max Restarts**: Maximum number of restart attempts
- **Restart Delay**: Delay in seconds between restart attempts
- **On Demand Start**: Start instance when receiving a request to the OpenAI compatible endpoint
- **Idle Timeout**: Minutes before stopping idle instance (set to 0 to disable)
5. Configure optional llama-server backend options:
- **Threads**: Number of CPU threads to use
- **Context Size**: Context window size (ctx_size)
- **GPU Layers**: Number of layers to offload to GPU
- **Port**: Network port (auto-assigned by llamactl if not specified)
- **Additional Parameters**: Any other llama-server command line options (see [llama-server documentation](https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md))
6. Click **"Create"** to save the instance
### Via API
```bash
# Create instance with local model file
curl -X POST http://localhost:8080/api/instances/my-instance \
-H "Content-Type: application/json" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"model": "/path/to/model.gguf",
"threads": 8,
"ctx_size": 4096
}
}'
# Create instance with HuggingFace model
curl -X POST http://localhost:8080/api/instances/gemma-3-27b \
-H "Content-Type: application/json" \
-d '{
"backend_type": "llama_cpp",
"backend_options": {
"hf_repo": "unsloth/gemma-3-27b-it-GGUF",
"hf_file": "gemma-3-27b-it-GGUF.gguf",
"gpu_layers": 32
},
"auto_restart": true,
"max_restarts": 3
}'
```
## Start Instance
### Via Web UI
1. Click the **"Start"** button on an instance card
2. Watch the status change to "Unknown"
3. Monitor progress in the logs
4. Instance status changes to "Ready" when ready
### Via API
```bash
curl -X POST http://localhost:8080/api/instances/{name}/start
```
## Stop Instance
### Via Web UI
1. Click the **"Stop"** button on an instance card
2. Instance gracefully shuts down
### Via API
```bash
curl -X POST http://localhost:8080/api/instances/{name}/stop
```
## Edit Instance
### Via Web UI
1. Click the **"Edit"** button on an instance card
2. Modify settings in the configuration dialog
3. Changes require instance restart to take effect
4. Click **"Update & Restart"** to apply changes
### Via API
Modify instance settings:
```bash
curl -X PUT http://localhost:8080/api/instances/{name} \
-H "Content-Type: application/json" \
-d '{
"backend_options": {
"threads": 8,
"context_size": 4096
}
}'
```
!!! note
Configuration changes require restarting the instance to take effect.
## View Logs
### Via Web UI
1. Click the **"Logs"** button on any instance card
2. Real-time log viewer opens
### Via API
Check instance status in real-time:
```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}/logs
```
## Delete Instance
### Via Web UI
1. Click the **"Delete"** button on an instance card
2. Only stopped instances can be deleted
3. Confirm deletion in the dialog
### Via API
```bash
curl -X DELETE http://localhost:8080/api/instances/{name}
```
## Instance Proxy
Llamactl proxies all requests to the underlying llama-server instances.
```bash
# Get instance details
curl http://localhost:8080/api/instances/{name}/proxy/
```
Check llama-server [docs](https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md) for more information.
### Instance Health
#### Via Web UI
1. The health status badge is displayed on each instance card
#### Via API
Check the health status of your instances:
```bash
curl http://localhost:8080/api/instances/{name}/proxy/health
```

View File

@@ -1,160 +0,0 @@
# Troubleshooting
Issues specific to Llamactl deployment and operation.
## Configuration Issues
### Invalid Configuration
**Problem:** Invalid configuration preventing startup
**Solutions:**
1. Use minimal configuration:
```yaml
server:
host: "0.0.0.0"
port: 8080
instances:
port_range: [8000, 9000]
```
2. Check data directory permissions:
```bash
# Ensure data directory is writable (default: ~/.local/share/llamactl)
mkdir -p ~/.local/share/llamactl/{instances,logs}
```
## Instance Management Issues
### Model Loading Failures
**Problem:** Instance fails to start with model loading errors
**Common Solutions:**
- **llama-server not found:** Ensure `llama-server` binary is in PATH
- **Wrong model format:** Ensure model is in GGUF format
- **Insufficient memory:** Use smaller model or reduce context size
- **Path issues:** Use absolute paths to model files
### Memory Issues
**Problem:** Out of memory errors or system becomes unresponsive
**Solutions:**
1. **Reduce context size:**
```json
{
"n_ctx": 1024
}
```
2. **Use quantized models:**
- Try Q4_K_M instead of higher precision models
- Use smaller model variants (7B instead of 13B)
### GPU Configuration
**Problem:** GPU not being used effectively
**Solutions:**
1. **Configure GPU layers:**
```json
{
"n_gpu_layers": 35
}
```
### Advanced Instance Issues
**Problem:** Complex model loading, performance, or compatibility issues
Since llamactl uses `llama-server` under the hood, many instance-related issues are actually llama.cpp issues. For advanced troubleshooting:
**Resources:**
- **llama.cpp Documentation:** [https://github.com/ggml/llama.cpp](https://github.com/ggml/llama.cpp)
- **llama.cpp Issues:** [https://github.com/ggml/llama.cpp/issues](https://github.com/ggml/llama.cpp/issues)
- **llama.cpp Discussions:** [https://github.com/ggml/llama.cpp/discussions](https://github.com/ggml/llama.cpp/discussions)
**Testing directly with llama-server:**
```bash
# Test your model and parameters directly with llama-server
llama-server --model /path/to/model.gguf --port 8081 --n-gpu-layers 35
```
This helps determine if the issue is with llamactl or with the underlying llama.cpp/llama-server.
## API and Network Issues
### CORS Errors
**Problem:** Web UI shows CORS errors in browser console
**Solutions:**
1. **Configure allowed origins:**
```yaml
server:
allowed_origins:
- "http://localhost:3000"
- "https://yourdomain.com"
```
## Authentication Issues
**Problem:** API requests failing with authentication errors
**Solutions:**
1. **Disable authentication temporarily:**
```yaml
auth:
require_management_auth: false
require_inference_auth: false
```
2. **Configure API keys:**
```yaml
auth:
management_keys:
- "your-management-key"
inference_keys:
- "your-inference-key"
```
3. **Use correct Authorization header:**
```bash
curl -H "Authorization: Bearer your-api-key" \
http://localhost:8080/api/v1/instances
```
## Debugging and Logs
### Viewing Instance Logs
```bash
# Get instance logs via API
curl http://localhost:8080/api/v1/instances/{name}/logs
# Or check log files directly
tail -f ~/.local/share/llamactl/logs/{instance-name}.log
```
### Enable Debug Logging
```bash
export LLAMACTL_LOG_LEVEL=debug
llamactl
```
## Getting Help
When reporting issues, include:
1. **System information:**
```bash
llamactl --version
```
2. **Configuration file** (remove sensitive keys)
3. **Relevant log output**
4. **Steps to reproduce the issue**

View File

@@ -25,8 +25,8 @@ theme:
name: Switch to light mode
features:
- navigation.tabs
- navigation.sections
- navigation.expand
- navigation.tabs.sticky
- toc.integrate
- navigation.top
- search.highlight
- search.share
@@ -49,20 +49,35 @@ markdown_extensions:
nav:
- Home: index.md
- Getting Started:
- Installation: getting-started/installation.md
- Quick Start: getting-started/quick-start.md
- Configuration: getting-started/configuration.md
- User Guide:
- Managing Instances: user-guide/managing-instances.md
- API Reference: user-guide/api-reference.md
- Troubleshooting: user-guide/troubleshooting.md
- Installation: installation.md
- Quick Start: quick-start.md
- Configuration: configuration.md
- Managing Instances: managing-instances.md
- API Reference: api-reference.md
- Troubleshooting: troubleshooting.md
plugins:
- search
- git-revision-date-localized
- mike:
version_selector: true
css_dir: css
javascript_dir: js
canonical_version: null
- neoteroi.mkdocsoad:
use_pymdownx: true
hooks:
- docs/readme_sync.py
- docs/fix_line_endings.py
extra:
version:
provider: mike
default: stable
social:
- icon: fontawesome/brands/github
link: https://github.com/lordmathis/llamactl
extra_css:
- css/css-v1.1.3.css

View File

@@ -1,7 +1,252 @@
package backends
import (
"encoding/json"
"fmt"
"llamactl/pkg/config"
"llamactl/pkg/validation"
"maps"
)
type BackendType string
const (
BackendTypeLlamaCpp BackendType = "llama_cpp"
BackendTypeMlxLm BackendType = "mlx_lm"
BackendTypeVllm BackendType = "vllm"
)
type backend interface {
BuildCommandArgs() []string
BuildDockerArgs() []string
GetPort() int
SetPort(int)
GetHost() string
Validate() error
ParseCommand(string) (any, error)
}
var backendConstructors = map[BackendType]func() backend{
BackendTypeLlamaCpp: func() backend { return &LlamaServerOptions{} },
BackendTypeMlxLm: func() backend { return &MlxServerOptions{} },
BackendTypeVllm: func() backend { return &VllmServerOptions{} },
}
type Options struct {
BackendType BackendType `json:"backend_type"`
BackendOptions map[string]any `json:"backend_options,omitempty"`
// Backend-specific options
LlamaServerOptions *LlamaServerOptions `json:"-"`
MlxServerOptions *MlxServerOptions `json:"-"`
VllmServerOptions *VllmServerOptions `json:"-"`
}
func (o *Options) UnmarshalJSON(data []byte) error {
type Alias Options
aux := &struct {
*Alias
}{
Alias: (*Alias)(o),
}
if err := json.Unmarshal(data, aux); err != nil {
return err
}
// Create backend from constructor map
if o.BackendOptions != nil {
constructor, exists := backendConstructors[o.BackendType]
if !exists {
return fmt.Errorf("unsupported backend type: %s", o.BackendType)
}
backend := constructor()
optionsData, err := json.Marshal(o.BackendOptions)
if err != nil {
return fmt.Errorf("failed to marshal backend options: %w", err)
}
if err := json.Unmarshal(optionsData, backend); err != nil {
return fmt.Errorf("failed to unmarshal backend options: %w", err)
}
// Store in the appropriate typed field for backward compatibility
o.setBackendOptions(backend)
}
return nil
}
func (o *Options) MarshalJSON() ([]byte, error) {
type Alias Options
aux := &struct {
*Alias
}{
Alias: (*Alias)(o),
}
// Get backend and marshal it
backend := o.getBackend()
if backend != nil {
optionsData, err := json.Marshal(backend)
if err != nil {
return nil, fmt.Errorf("failed to marshal backend options: %w", err)
}
if err := json.Unmarshal(optionsData, &aux.BackendOptions); err != nil {
return nil, fmt.Errorf("failed to unmarshal backend options to map: %w", err)
}
}
return json.Marshal(aux)
}
// setBackendOptions stores the backend in the appropriate typed field
func (o *Options) setBackendOptions(bcknd backend) {
switch v := bcknd.(type) {
case *LlamaServerOptions:
o.LlamaServerOptions = v
case *MlxServerOptions:
o.MlxServerOptions = v
case *VllmServerOptions:
o.VllmServerOptions = v
}
}
func (o *Options) getBackendSettings(backendConfig *config.BackendConfig) *config.BackendSettings {
switch o.BackendType {
case BackendTypeLlamaCpp:
return &backendConfig.LlamaCpp
case BackendTypeMlxLm:
return &backendConfig.MLX
case BackendTypeVllm:
return &backendConfig.VLLM
default:
return nil
}
}
// getBackend returns the actual backend implementation
func (o *Options) getBackend() backend {
switch o.BackendType {
case BackendTypeLlamaCpp:
return o.LlamaServerOptions
case BackendTypeMlxLm:
return o.MlxServerOptions
case BackendTypeVllm:
return o.VllmServerOptions
default:
return nil
}
}
func (o *Options) isDockerEnabled(backend *config.BackendSettings) bool {
if backend.Docker != nil && backend.Docker.Enabled && o.BackendType != BackendTypeMlxLm {
return true
}
return false
}
func (o *Options) IsDockerEnabled(backendConfig *config.BackendConfig) bool {
backendSettings := o.getBackendSettings(backendConfig)
return o.isDockerEnabled(backendSettings)
}
// GetCommand builds the command to run the backend
func (o *Options) GetCommand(backendConfig *config.BackendConfig) string {
backendSettings := o.getBackendSettings(backendConfig)
if o.isDockerEnabled(backendSettings) {
return "docker"
}
return backendSettings.Command
}
// buildCommandArgs builds command line arguments for the backend
func (o *Options) BuildCommandArgs(backendConfig *config.BackendConfig) []string {
var args []string
backendSettings := o.getBackendSettings(backendConfig)
backend := o.getBackend()
if backend == nil {
return args
}
if o.isDockerEnabled(backendSettings) {
// For Docker, start with Docker args
args = append(args, backendSettings.Docker.Args...)
args = append(args, backendSettings.Docker.Image)
args = append(args, backend.BuildDockerArgs()...)
} else {
// For native execution, start with backend args
args = append(args, backendSettings.Args...)
args = append(args, backend.BuildCommandArgs()...)
}
return args
}
// BuildEnvironment builds the environment variables for the backend process
func (o *Options) BuildEnvironment(backendConfig *config.BackendConfig, environment map[string]string) map[string]string {
backendSettings := o.getBackendSettings(backendConfig)
env := map[string]string{}
if backendSettings.Environment != nil {
maps.Copy(env, backendSettings.Environment)
}
if o.isDockerEnabled(backendSettings) {
if backendSettings.Docker.Environment != nil {
maps.Copy(env, backendSettings.Docker.Environment)
}
}
if environment != nil {
maps.Copy(env, environment)
}
return env
}
func (o *Options) GetPort() int {
backend := o.getBackend()
if backend != nil {
return backend.GetPort()
}
return 0
}
func (o *Options) SetPort(port int) {
backend := o.getBackend()
if backend != nil {
backend.SetPort(port)
}
}
func (o *Options) GetHost() string {
backend := o.getBackend()
if backend != nil {
return backend.GetHost()
}
return "localhost"
}
func (o *Options) GetResponseHeaders(backendConfig *config.BackendConfig) map[string]string {
backendSettings := o.getBackendSettings(backendConfig)
return backendSettings.ResponseHeaders
}
// ValidateInstanceOptions performs validation based on backend type
func (o *Options) ValidateInstanceOptions() error {
backend := o.getBackend()
if backend == nil {
return validation.ValidationError(fmt.Errorf("backend options cannot be nil for backend type %s", o.BackendType))
}
return backend.Validate()
}

95
pkg/backends/builder.go Normal file
View File

@@ -0,0 +1,95 @@
package backends
import (
"fmt"
"llamactl/pkg/config"
"reflect"
"strconv"
"strings"
)
// BuildCommandArgs converts a struct to command line arguments
func BuildCommandArgs(options any, multipleFlags map[string]struct{}) []string {
var args []string
v := reflect.ValueOf(options).Elem()
t := v.Type()
for i := 0; i < v.NumField(); i++ {
field := v.Field(i)
fieldType := t.Field(i)
if !field.CanInterface() {
continue
}
jsonTag := fieldType.Tag.Get("json")
if jsonTag == "" || jsonTag == "-" {
continue
}
// Get flag name from JSON tag (snake_case)
jsonFieldName := strings.Split(jsonTag, ",")[0]
// Convert to kebab-case for CLI flags
flagName := strings.ReplaceAll(jsonFieldName, "_", "-")
switch field.Kind() {
case reflect.Bool:
if field.Bool() {
args = append(args, "--"+flagName)
}
case reflect.Int:
if field.Int() != 0 {
args = append(args, "--"+flagName, strconv.FormatInt(field.Int(), 10))
}
case reflect.Float64:
if field.Float() != 0 {
args = append(args, "--"+flagName, strconv.FormatFloat(field.Float(), 'f', -1, 64))
}
case reflect.String:
if field.String() != "" {
args = append(args, "--"+flagName, field.String())
}
case reflect.Slice:
if field.Type().Elem().Kind() == reflect.String && field.Len() > 0 {
// Use jsonFieldName (snake_case) for multipleFlags lookup
if _, isMultiValue := multipleFlags[jsonFieldName]; isMultiValue {
// Multiple flags: --flag value1 --flag value2
for j := 0; j < field.Len(); j++ {
args = append(args, "--"+flagName, field.Index(j).String())
}
} else {
// Comma-separated: --flag value1,value2
var values []string
for j := 0; j < field.Len(); j++ {
values = append(values, field.Index(j).String())
}
args = append(args, "--"+flagName, strings.Join(values, ","))
}
}
}
}
return args
}
// BuildDockerCommand builds a Docker command with the specified configuration and arguments
func BuildDockerCommand(backendConfig *config.BackendSettings, instanceArgs []string) (string, []string, error) {
// Start with configured Docker arguments (should include "run", "--rm", etc.)
dockerArgs := make([]string, len(backendConfig.Docker.Args))
copy(dockerArgs, backendConfig.Docker.Args)
// Add environment variables
for key, value := range backendConfig.Docker.Environment {
dockerArgs = append(dockerArgs, "-e", fmt.Sprintf("%s=%s", key, value))
}
// Add image name
dockerArgs = append(dockerArgs, backendConfig.Docker.Image)
// Add backend args and instance args
dockerArgs = append(dockerArgs, backendConfig.Args...)
dockerArgs = append(dockerArgs, instanceArgs...)
return "docker", dockerArgs, nil
}

View File

@@ -1,12 +1,26 @@
package llamacpp
package backends
import (
"encoding/json"
"fmt"
"llamactl/pkg/validation"
"reflect"
"strconv"
"strings"
)
// llamaMultiValuedFlags defines flags that should be repeated for each value rather than comma-separated
// Keys use snake_case as the parser converts kebab-case flags to snake_case before lookup
var llamaMultiValuedFlags = map[string]struct{}{
"override_tensor": {},
"override_kv": {},
"lora": {},
"lora_scaled": {},
"control_vector": {},
"control_vector_scaled": {},
"dry_sequence_breaker": {},
"logit_bias": {},
}
type LlamaServerOptions struct {
// Common params
VerbosePrompt bool `json:"verbose_prompt,omitempty"`
@@ -313,64 +327,63 @@ func (o *LlamaServerOptions) UnmarshalJSON(data []byte) error {
return nil
}
// BuildCommandArgs converts InstanceOptions to command line arguments
func (o *LlamaServerOptions) BuildCommandArgs() []string {
var args []string
func (o *LlamaServerOptions) GetPort() int {
return o.Port
}
v := reflect.ValueOf(o).Elem()
t := v.Type()
func (o *LlamaServerOptions) SetPort(port int) {
o.Port = port
}
for i := 0; i < v.NumField(); i++ {
field := v.Field(i)
fieldType := t.Field(i)
func (o *LlamaServerOptions) GetHost() string {
return o.Host
}
// Skip unexported fields
if !field.CanInterface() {
continue
}
// Get the JSON tag to determine the flag name
jsonTag := fieldType.Tag.Get("json")
if jsonTag == "" || jsonTag == "-" {
continue
}
// Remove ",omitempty" from the tag
flagName := jsonTag
if commaIndex := strings.Index(jsonTag, ","); commaIndex != -1 {
flagName = jsonTag[:commaIndex]
}
// Convert snake_case to kebab-case for CLI flags
flagName = strings.ReplaceAll(flagName, "_", "-")
// Add the appropriate arguments based on field type and value
switch field.Kind() {
case reflect.Bool:
if field.Bool() {
args = append(args, "--"+flagName)
}
case reflect.Int:
if field.Int() != 0 {
args = append(args, "--"+flagName, strconv.FormatInt(field.Int(), 10))
}
case reflect.Float64:
if field.Float() != 0 {
args = append(args, "--"+flagName, strconv.FormatFloat(field.Float(), 'f', -1, 64))
}
case reflect.String:
if field.String() != "" {
args = append(args, "--"+flagName, field.String())
}
case reflect.Slice:
if field.Type().Elem().Kind() == reflect.String {
// Handle []string fields
for j := 0; j < field.Len(); j++ {
args = append(args, "--"+flagName, field.Index(j).String())
}
}
}
func (o *LlamaServerOptions) Validate() error {
if o == nil {
return validation.ValidationError(fmt.Errorf("llama server options cannot be nil for llama.cpp backend"))
}
return args
// Use reflection to check all string fields for injection patterns
if err := validation.ValidateStructStrings(o, ""); err != nil {
return err
}
// Basic network validation for port
if o.Port < 0 || o.Port > 65535 {
return validation.ValidationError(fmt.Errorf("invalid port range: %d", o.Port))
}
return nil
}
// BuildCommandArgs converts InstanceOptions to command line arguments
func (o *LlamaServerOptions) BuildCommandArgs() []string {
// Llama uses multiple flags for arrays by default (not comma-separated)
// Use package-level llamaMultiValuedFlags variable
return BuildCommandArgs(o, llamaMultiValuedFlags)
}
func (o *LlamaServerOptions) BuildDockerArgs() []string {
// For llama, Docker args are the same as normal args
return o.BuildCommandArgs()
}
// ParseCommand parses a llama-server command string into LlamaServerOptions
// Supports multiple formats:
// 1. Full command: "llama-server --model file.gguf"
// 2. Full path: "/usr/local/bin/llama-server --model file.gguf"
// 3. Args only: "--model file.gguf --gpu-layers 32"
// 4. Multiline commands with backslashes
func (o *LlamaServerOptions) ParseCommand(command string) (any, error) {
executableNames := []string{"llama-server"}
var subcommandNames []string // Llama has no subcommands
// Use package-level llamaMultiValuedFlags variable
var llamaOptions LlamaServerOptions
if err := parseCommand(command, executableNames, subcommandNames, llamaMultiValuedFlags, &llamaOptions); err != nil {
return nil, err
}
return &llamaOptions, nil
}

View File

@@ -1,71 +1,38 @@
package llamacpp_test
package backends_test
import (
"encoding/json"
"fmt"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/backends"
"llamactl/pkg/testutil"
"reflect"
"slices"
"testing"
)
func TestBuildCommandArgs_BasicFields(t *testing.T) {
options := llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
Host: "localhost",
Verbose: true,
CtxSize: 4096,
GPULayers: 32,
}
args := options.BuildCommandArgs()
// Check individual arguments
expectedPairs := map[string]string{
"--model": "/path/to/model.gguf",
"--port": "8080",
"--host": "localhost",
"--ctx-size": "4096",
"--gpu-layers": "32",
}
for flag, expectedValue := range expectedPairs {
if !containsFlagWithValue(args, flag, expectedValue) {
t.Errorf("Expected %s %s, not found in %v", flag, expectedValue, args)
}
}
// Check standalone boolean flag
if !contains(args, "--verbose") {
t.Errorf("Expected --verbose flag not found in %v", args)
}
}
func TestBuildCommandArgs_BooleanFields(t *testing.T) {
func TestLlamaCppBuildCommandArgs_BooleanFields(t *testing.T) {
tests := []struct {
name string
options llamacpp.LlamaServerOptions
options backends.LlamaServerOptions
expected []string
excluded []string
}{
{
name: "verbose true",
options: llamacpp.LlamaServerOptions{
options: backends.LlamaServerOptions{
Verbose: true,
},
expected: []string{"--verbose"},
},
{
name: "verbose false",
options: llamacpp.LlamaServerOptions{
options: backends.LlamaServerOptions{
Verbose: false,
},
excluded: []string{"--verbose"},
},
{
name: "multiple booleans",
options: llamacpp.LlamaServerOptions{
options: backends.LlamaServerOptions{
Verbose: true,
FlashAttn: true,
Mlock: false,
@@ -81,13 +48,13 @@ func TestBuildCommandArgs_BooleanFields(t *testing.T) {
args := tt.options.BuildCommandArgs()
for _, expectedArg := range tt.expected {
if !contains(args, expectedArg) {
if !testutil.Contains(args, expectedArg) {
t.Errorf("Expected argument %q not found in %v", expectedArg, args)
}
}
for _, excludedArg := range tt.excluded {
if contains(args, excludedArg) {
if testutil.Contains(args, excludedArg) {
t.Errorf("Excluded argument %q found in %v", excludedArg, args)
}
}
@@ -95,38 +62,8 @@ func TestBuildCommandArgs_BooleanFields(t *testing.T) {
}
}
func TestBuildCommandArgs_NumericFields(t *testing.T) {
options := llamacpp.LlamaServerOptions{
Port: 8080,
Threads: 4,
CtxSize: 2048,
GPULayers: 16,
Temperature: 0.7,
TopK: 40,
TopP: 0.9,
}
args := options.BuildCommandArgs()
expectedPairs := map[string]string{
"--port": "8080",
"--threads": "4",
"--ctx-size": "2048",
"--gpu-layers": "16",
"--temp": "0.7",
"--top-k": "40",
"--top-p": "0.9",
}
for flag, expectedValue := range expectedPairs {
if !containsFlagWithValue(args, flag, expectedValue) {
t.Errorf("Expected %s %s, not found in %v", flag, expectedValue, args)
}
}
}
func TestBuildCommandArgs_ZeroValues(t *testing.T) {
options := llamacpp.LlamaServerOptions{
func TestLlamaCppBuildCommandArgs_ZeroValues(t *testing.T) {
options := backends.LlamaServerOptions{
Port: 0, // Should be excluded
Threads: 0, // Should be excluded
Temperature: 0, // Should be excluded
@@ -146,14 +83,14 @@ func TestBuildCommandArgs_ZeroValues(t *testing.T) {
}
for _, excludedArg := range excludedArgs {
if contains(args, excludedArg) {
if testutil.Contains(args, excludedArg) {
t.Errorf("Zero value argument %q should not be present in %v", excludedArg, args)
}
}
}
func TestBuildCommandArgs_ArrayFields(t *testing.T) {
options := llamacpp.LlamaServerOptions{
func TestLlamaCppBuildCommandArgs_ArrayFields(t *testing.T) {
options := backends.LlamaServerOptions{
Lora: []string{"adapter1.bin", "adapter2.bin"},
OverrideTensor: []string{"tensor1", "tensor2", "tensor3"},
DrySequenceBreaker: []string{".", "!", "?"},
@@ -170,15 +107,15 @@ func TestBuildCommandArgs_ArrayFields(t *testing.T) {
for flag, values := range expectedOccurrences {
for _, value := range values {
if !containsFlagWithValue(args, flag, value) {
if !testutil.ContainsFlagWithValue(args, flag, value) {
t.Errorf("Expected %s %s, not found in %v", flag, value, args)
}
}
}
}
func TestBuildCommandArgs_EmptyArrays(t *testing.T) {
options := llamacpp.LlamaServerOptions{
func TestLlamaCppBuildCommandArgs_EmptyArrays(t *testing.T) {
options := backends.LlamaServerOptions{
Lora: []string{}, // Empty array should not generate args
OverrideTensor: []string{}, // Empty array should not generate args
}
@@ -187,43 +124,13 @@ func TestBuildCommandArgs_EmptyArrays(t *testing.T) {
excludedArgs := []string{"--lora", "--override-tensor"}
for _, excludedArg := range excludedArgs {
if contains(args, excludedArg) {
if testutil.Contains(args, excludedArg) {
t.Errorf("Empty array should not generate argument %q in %v", excludedArg, args)
}
}
}
func TestBuildCommandArgs_FieldNameConversion(t *testing.T) {
// Test snake_case to kebab-case conversion
options := llamacpp.LlamaServerOptions{
CtxSize: 4096,
GPULayers: 32,
ThreadsBatch: 2,
FlashAttn: true,
TopK: 40,
TopP: 0.9,
}
args := options.BuildCommandArgs()
// Check that field names are properly converted
expectedFlags := []string{
"--ctx-size", // ctx_size -> ctx-size
"--gpu-layers", // gpu_layers -> gpu-layers
"--threads-batch", // threads_batch -> threads-batch
"--flash-attn", // flash_attn -> flash-attn
"--top-k", // top_k -> top-k
"--top-p", // top_p -> top-p
}
for _, flag := range expectedFlags {
if !contains(args, flag) {
t.Errorf("Expected flag %q not found in %v", flag, args)
}
}
}
func TestUnmarshalJSON_StandardFields(t *testing.T) {
func TestLlamaCppUnmarshalJSON_StandardFields(t *testing.T) {
jsonData := `{
"model": "/path/to/model.gguf",
"port": 8080,
@@ -234,7 +141,7 @@ func TestUnmarshalJSON_StandardFields(t *testing.T) {
"temp": 0.7
}`
var options llamacpp.LlamaServerOptions
var options backends.LlamaServerOptions
err := json.Unmarshal([]byte(jsonData), &options)
if err != nil {
t.Fatalf("Unmarshal failed: %v", err)
@@ -263,16 +170,16 @@ func TestUnmarshalJSON_StandardFields(t *testing.T) {
}
}
func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
func TestLlamaCppUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
tests := []struct {
name string
jsonData string
checkFn func(llamacpp.LlamaServerOptions) error
checkFn func(backends.LlamaServerOptions) error
}{
{
name: "threads alternatives",
jsonData: `{"t": 4, "tb": 2}`,
checkFn: func(opts llamacpp.LlamaServerOptions) error {
checkFn: func(opts backends.LlamaServerOptions) error {
if opts.Threads != 4 {
return fmt.Errorf("expected threads 4, got %d", opts.Threads)
}
@@ -285,7 +192,7 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
{
name: "context size alternatives",
jsonData: `{"c": 2048}`,
checkFn: func(opts llamacpp.LlamaServerOptions) error {
checkFn: func(opts backends.LlamaServerOptions) error {
if opts.CtxSize != 2048 {
return fmt.Errorf("expected ctx_size 4096, got %d", opts.CtxSize)
}
@@ -295,7 +202,7 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
{
name: "gpu layers alternatives",
jsonData: `{"ngl": 16}`,
checkFn: func(opts llamacpp.LlamaServerOptions) error {
checkFn: func(opts backends.LlamaServerOptions) error {
if opts.GPULayers != 16 {
return fmt.Errorf("expected gpu_layers 32, got %d", opts.GPULayers)
}
@@ -305,7 +212,7 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
{
name: "model alternatives",
jsonData: `{"m": "/path/model.gguf"}`,
checkFn: func(opts llamacpp.LlamaServerOptions) error {
checkFn: func(opts backends.LlamaServerOptions) error {
if opts.Model != "/path/model.gguf" {
return fmt.Errorf("expected model '/path/model.gguf', got %q", opts.Model)
}
@@ -315,7 +222,7 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
{
name: "temperature alternatives",
jsonData: `{"temp": 0.8}`,
checkFn: func(opts llamacpp.LlamaServerOptions) error {
checkFn: func(opts backends.LlamaServerOptions) error {
if opts.Temperature != 0.8 {
return fmt.Errorf("expected temperature 0.8, got %f", opts.Temperature)
}
@@ -326,7 +233,7 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var options llamacpp.LlamaServerOptions
var options backends.LlamaServerOptions
err := json.Unmarshal([]byte(tt.jsonData), &options)
if err != nil {
t.Fatalf("Unmarshal failed: %v", err)
@@ -339,24 +246,24 @@ func TestUnmarshalJSON_AlternativeFieldNames(t *testing.T) {
}
}
func TestUnmarshalJSON_InvalidJSON(t *testing.T) {
func TestLlamaCppUnmarshalJSON_InvalidJSON(t *testing.T) {
invalidJSON := `{"port": "not-a-number", "invalid": syntax}`
var options llamacpp.LlamaServerOptions
var options backends.LlamaServerOptions
err := json.Unmarshal([]byte(invalidJSON), &options)
if err == nil {
t.Error("Expected error for invalid JSON")
}
}
func TestUnmarshalJSON_ArrayFields(t *testing.T) {
func TestLlamaCppUnmarshalJSON_ArrayFields(t *testing.T) {
jsonData := `{
"lora": ["adapter1.bin", "adapter2.bin"],
"override_tensor": ["tensor1", "tensor2"],
"dry_sequence_breaker": [".", "!", "?"]
}`
var options llamacpp.LlamaServerOptions
var options backends.LlamaServerOptions
err := json.Unmarshal([]byte(jsonData), &options)
if err != nil {
t.Fatalf("Unmarshal failed: %v", err)
@@ -378,19 +285,152 @@ func TestUnmarshalJSON_ArrayFields(t *testing.T) {
}
}
// Helper functions
func contains(slice []string, item string) bool {
return slices.Contains(slice, item)
func TestParseLlamaCommand(t *testing.T) {
tests := []struct {
name string
command string
expectErr bool
validate func(*testing.T, *backends.LlamaServerOptions)
}{
{
name: "basic command",
command: "llama-server --model /path/to/model.gguf --gpu-layers 32",
expectErr: false,
validate: func(t *testing.T, opts *backends.LlamaServerOptions) {
if opts.Model != "/path/to/model.gguf" {
t.Errorf("expected model '/path/to/model.gguf', got '%s'", opts.Model)
}
if opts.GPULayers != 32 {
t.Errorf("expected gpu_layers 32, got %d", opts.GPULayers)
}
},
},
{
name: "args only",
command: "--model /path/to/model.gguf --ctx-size 4096",
expectErr: false,
validate: func(t *testing.T, opts *backends.LlamaServerOptions) {
if opts.Model != "/path/to/model.gguf" {
t.Errorf("expected model '/path/to/model.gguf', got '%s'", opts.Model)
}
if opts.CtxSize != 4096 {
t.Errorf("expected ctx_size 4096, got %d", opts.CtxSize)
}
},
},
{
name: "mixed flag formats",
command: "llama-server --model=/path/model.gguf --gpu-layers 16 --verbose",
expectErr: false,
validate: func(t *testing.T, opts *backends.LlamaServerOptions) {
if opts.Model != "/path/model.gguf" {
t.Errorf("expected model '/path/model.gguf', got '%s'", opts.Model)
}
if opts.GPULayers != 16 {
t.Errorf("expected gpu_layers 16, got %d", opts.GPULayers)
}
if !opts.Verbose {
t.Errorf("expected verbose to be true")
}
},
},
{
name: "quoted strings",
command: `llama-server --model test.gguf --api-key "sk-1234567890abcdef"`,
expectErr: false,
validate: func(t *testing.T, opts *backends.LlamaServerOptions) {
if opts.APIKey != "sk-1234567890abcdef" {
t.Errorf("expected api_key 'sk-1234567890abcdef', got '%s'", opts.APIKey)
}
},
},
{
name: "multiple value types",
command: "llama-server --model /test/model.gguf --gpu-layers 32 --temp 0.7 --verbose --no-mmap",
expectErr: false,
validate: func(t *testing.T, opts *backends.LlamaServerOptions) {
if opts.Model != "/test/model.gguf" {
t.Errorf("expected model '/test/model.gguf', got '%s'", opts.Model)
}
if opts.GPULayers != 32 {
t.Errorf("expected gpu_layers 32, got %d", opts.GPULayers)
}
if opts.Temperature != 0.7 {
t.Errorf("expected temperature 0.7, got %f", opts.Temperature)
}
if !opts.Verbose {
t.Errorf("expected verbose to be true")
}
if !opts.NoMmap {
t.Errorf("expected no_mmap to be true")
}
},
},
{
name: "empty command",
command: "",
expectErr: true,
},
{
name: "unterminated quote",
command: `llama-server --model test.gguf --api-key "unterminated`,
expectErr: true,
},
{
name: "malformed flag",
command: "llama-server ---model test.gguf",
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var opts backends.LlamaServerOptions
resultAny, err := opts.ParseCommand(tt.command)
result, _ := resultAny.(*backends.LlamaServerOptions)
if tt.expectErr {
if err == nil {
t.Errorf("expected error but got none")
}
return
}
if err != nil {
t.Errorf("unexpected error: %v", err)
return
}
if result == nil {
t.Errorf("expected result but got nil")
return
}
if tt.validate != nil {
tt.validate(t, result)
}
})
}
}
func containsFlagWithValue(args []string, flag, value string) bool {
for i, arg := range args {
if arg == flag {
// Check if there's a next argument and it matches the expected value
if i+1 < len(args) && args[i+1] == value {
return true
}
func TestParseLlamaCommandArrays(t *testing.T) {
command := "llama-server --model test.gguf --lora adapter1.bin --lora=adapter2.bin"
var opts backends.LlamaServerOptions
resultAny, err := opts.ParseCommand(command)
result, _ := resultAny.(*backends.LlamaServerOptions)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(result.Lora) != 2 {
t.Errorf("expected 2 lora adapters, got %d", len(result.Lora))
}
expected := []string{"adapter1.bin", "adapter2.bin"}
for i, v := range expected {
if result.Lora[i] != v {
t.Errorf("expected lora[%d]=%s got %s", i, v, result.Lora[i])
}
}
return false
}

90
pkg/backends/mlx.go Normal file
View File

@@ -0,0 +1,90 @@
package backends
import (
"fmt"
"llamactl/pkg/validation"
)
type MlxServerOptions struct {
// Basic connection options
Model string `json:"model,omitempty"`
Host string `json:"host,omitempty"`
Port int `json:"port,omitempty"`
// Model and adapter options
AdapterPath string `json:"adapter_path,omitempty"`
DraftModel string `json:"draft_model,omitempty"`
NumDraftTokens int `json:"num_draft_tokens,omitempty"`
TrustRemoteCode bool `json:"trust_remote_code,omitempty"`
// Logging and templates
LogLevel string `json:"log_level,omitempty"`
ChatTemplate string `json:"chat_template,omitempty"`
UseDefaultChatTemplate bool `json:"use_default_chat_template,omitempty"`
ChatTemplateArgs string `json:"chat_template_args,omitempty"` // JSON string
// Sampling defaults
Temp float64 `json:"temp,omitempty"`
TopP float64 `json:"top_p,omitempty"`
TopK int `json:"top_k,omitempty"`
MinP float64 `json:"min_p,omitempty"`
MaxTokens int `json:"max_tokens,omitempty"`
}
func (o *MlxServerOptions) GetPort() int {
return o.Port
}
func (o *MlxServerOptions) SetPort(port int) {
o.Port = port
}
func (o *MlxServerOptions) GetHost() string {
return o.Host
}
func (o *MlxServerOptions) Validate() error {
if o == nil {
return validation.ValidationError(fmt.Errorf("MLX server options cannot be nil for MLX backend"))
}
if err := validation.ValidateStructStrings(o, ""); err != nil {
return err
}
// Basic network validation for port
if o.Port < 0 || o.Port > 65535 {
return validation.ValidationError(fmt.Errorf("invalid port range: %d", o.Port))
}
return nil
}
// BuildCommandArgs converts to command line arguments
func (o *MlxServerOptions) BuildCommandArgs() []string {
multipleFlags := map[string]struct{}{} // MLX doesn't currently have []string fields
return BuildCommandArgs(o, multipleFlags)
}
func (o *MlxServerOptions) BuildDockerArgs() []string {
return []string{}
}
// ParseCommand parses a mlx_lm.server command string into MlxServerOptions
// Supports multiple formats:
// 1. Full command: "mlx_lm.server --model model/path"
// 2. Full path: "/usr/local/bin/mlx_lm.server --model model/path"
// 3. Args only: "--model model/path --host 0.0.0.0"
// 4. Multiline commands with backslashes
func (o *MlxServerOptions) ParseCommand(command string) (any, error) {
executableNames := []string{"mlx_lm.server"}
var subcommandNames []string // MLX has no subcommands
multiValuedFlags := map[string]struct{}{} // MLX has no multi-valued flags
var mlxOptions MlxServerOptions
if err := parseCommand(command, executableNames, subcommandNames, multiValuedFlags, &mlxOptions); err != nil {
return nil, err
}
return &mlxOptions, nil
}

204
pkg/backends/mlx_test.go Normal file
View File

@@ -0,0 +1,204 @@
package backends_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/testutil"
"testing"
)
func TestParseMlxCommand(t *testing.T) {
tests := []struct {
name string
command string
expectErr bool
validate func(*testing.T, *backends.MlxServerOptions)
}{
{
name: "basic command",
command: "mlx_lm.server --model /path/to/model --host 0.0.0.0",
expectErr: false,
validate: func(t *testing.T, opts *backends.MlxServerOptions) {
if opts.Model != "/path/to/model" {
t.Errorf("expected model '/path/to/model', got '%s'", opts.Model)
}
if opts.Host != "0.0.0.0" {
t.Errorf("expected host '0.0.0.0', got '%s'", opts.Host)
}
},
},
{
name: "args only",
command: "--model /path/to/model --port 8080",
expectErr: false,
validate: func(t *testing.T, opts *backends.MlxServerOptions) {
if opts.Model != "/path/to/model" {
t.Errorf("expected model '/path/to/model', got '%s'", opts.Model)
}
if opts.Port != 8080 {
t.Errorf("expected port 8080, got %d", opts.Port)
}
},
},
{
name: "mixed flag formats",
command: "mlx_lm.server --model=/path/model --temp=0.7 --trust-remote-code",
expectErr: false,
validate: func(t *testing.T, opts *backends.MlxServerOptions) {
if opts.Model != "/path/model" {
t.Errorf("expected model '/path/model', got '%s'", opts.Model)
}
if opts.Temp != 0.7 {
t.Errorf("expected temp 0.7, got %f", opts.Temp)
}
if !opts.TrustRemoteCode {
t.Errorf("expected trust_remote_code to be true")
}
},
},
{
name: "multiple value types",
command: "mlx_lm.server --model /test/model.mlx --port 8080 --temp 0.7 --trust-remote-code --log-level DEBUG",
expectErr: false,
validate: func(t *testing.T, opts *backends.MlxServerOptions) {
if opts.Model != "/test/model.mlx" {
t.Errorf("expected model '/test/model.mlx', got '%s'", opts.Model)
}
if opts.Port != 8080 {
t.Errorf("expected port 8080, got %d", opts.Port)
}
if opts.Temp != 0.7 {
t.Errorf("expected temp 0.7, got %f", opts.Temp)
}
if !opts.TrustRemoteCode {
t.Errorf("expected trust_remote_code to be true")
}
if opts.LogLevel != "DEBUG" {
t.Errorf("expected log_level 'DEBUG', got '%s'", opts.LogLevel)
}
},
},
{
name: "empty command",
command: "",
expectErr: true,
},
{
name: "unterminated quote",
command: `mlx_lm.server --model test.mlx --chat-template "unterminated`,
expectErr: true,
},
{
name: "malformed flag",
command: "mlx_lm.server ---model test.mlx",
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var opts backends.MlxServerOptions
resultAny, err := opts.ParseCommand(tt.command)
result, _ := resultAny.(*backends.MlxServerOptions)
if tt.expectErr {
if err == nil {
t.Errorf("expected error but got none")
}
return
}
if err != nil {
t.Errorf("unexpected error: %v", err)
return
}
if result == nil {
t.Errorf("expected result but got nil")
return
}
if tt.validate != nil {
tt.validate(t, result)
}
})
}
}
func TestMlxBuildCommandArgs_BooleanFields(t *testing.T) {
tests := []struct {
name string
options backends.MlxServerOptions
expected []string
excluded []string
}{
{
name: "trust_remote_code true",
options: backends.MlxServerOptions{
TrustRemoteCode: true,
},
expected: []string{"--trust-remote-code"},
},
{
name: "trust_remote_code false",
options: backends.MlxServerOptions{
TrustRemoteCode: false,
},
excluded: []string{"--trust-remote-code"},
},
{
name: "multiple booleans",
options: backends.MlxServerOptions{
TrustRemoteCode: true,
UseDefaultChatTemplate: true,
},
expected: []string{"--trust-remote-code", "--use-default-chat-template"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
args := tt.options.BuildCommandArgs()
for _, expectedArg := range tt.expected {
if !testutil.Contains(args, expectedArg) {
t.Errorf("Expected argument %q not found in %v", expectedArg, args)
}
}
for _, excludedArg := range tt.excluded {
if testutil.Contains(args, excludedArg) {
t.Errorf("Excluded argument %q found in %v", excludedArg, args)
}
}
})
}
}
func TestMlxBuildCommandArgs_ZeroValues(t *testing.T) {
options := backends.MlxServerOptions{
Port: 0, // Should be excluded
TopK: 0, // Should be excluded
Temp: 0, // Should be excluded
Model: "", // Should be excluded
LogLevel: "", // Should be excluded
TrustRemoteCode: false, // Should be excluded
}
args := options.BuildCommandArgs()
// Zero values should not appear in arguments
excludedArgs := []string{
"--port", "0",
"--top-k", "0",
"--temp", "0",
"--model", "",
"--log-level", "",
"--trust-remote-code",
}
for _, excludedArg := range excludedArgs {
if testutil.Contains(args, excludedArg) {
t.Errorf("Zero value argument %q should not be present in %v", excludedArg, args)
}
}
}

213
pkg/backends/parser.go Normal file
View File

@@ -0,0 +1,213 @@
package backends
import (
"encoding/json"
"fmt"
"path/filepath"
"regexp"
"strconv"
"strings"
)
// parseCommand parses a command string into a target struct
func parseCommand(command string, executableNames []string, subcommandNames []string, multiValuedFlags map[string]struct{}, target any) error {
// Normalize multiline commands
command = normalizeCommand(command)
if command == "" {
return fmt.Errorf("command cannot be empty")
}
// Extract arguments and positional model
args, modelFromPositional, err := extractArgs(command, executableNames, subcommandNames)
if err != nil {
return err
}
// Parse flags into map
options, err := parseFlags(args, multiValuedFlags)
if err != nil {
return err
}
// If we found a positional model and no --model flag was provided, set the model
if modelFromPositional != "" {
if _, hasModelFlag := options["model"]; !hasModelFlag {
options["model"] = modelFromPositional
}
}
// Convert to target struct via JSON
jsonData, err := json.Marshal(options)
if err != nil {
return fmt.Errorf("failed to marshal options: %w", err)
}
if err := json.Unmarshal(jsonData, target); err != nil {
return fmt.Errorf("failed to unmarshal to target: %w", err)
}
return nil
}
// normalizeCommand handles multiline commands with backslashes
func normalizeCommand(command string) string {
re := regexp.MustCompile(`\\\s*\n\s*`)
normalized := re.ReplaceAllString(command, " ")
re = regexp.MustCompile(`\s+`)
return strings.TrimSpace(re.ReplaceAllString(normalized, " "))
}
// extractArgs extracts arguments from command, removing executable and subcommands
// Returns: args, modelFromPositional, error
func extractArgs(command string, executableNames []string, subcommandNames []string) ([]string, string, error) {
// Check for unterminated quotes
if strings.Count(command, `"`)%2 != 0 || strings.Count(command, `'`)%2 != 0 {
return nil, "", fmt.Errorf("unterminated quoted string")
}
tokens := strings.Fields(command)
if len(tokens) == 0 {
return nil, "", fmt.Errorf("no tokens found")
}
// Skip executable
start := 0
firstToken := tokens[0]
// Check for executable name (with or without path)
if strings.Contains(firstToken, string(filepath.Separator)) {
baseName := filepath.Base(firstToken)
for _, execName := range executableNames {
if strings.HasSuffix(strings.ToLower(baseName), strings.ToLower(execName)) {
start = 1
break
}
}
} else {
for _, execName := range executableNames {
if strings.EqualFold(firstToken, execName) {
start = 1
break
}
}
}
// Skip subcommand if present
if start < len(tokens) {
for _, subCmd := range subcommandNames {
if strings.EqualFold(tokens[start], subCmd) {
start++
break
}
}
}
// Handle case where command starts with subcommand (no executable)
if start == 0 {
for _, subCmd := range subcommandNames {
if strings.EqualFold(firstToken, subCmd) {
start = 1
break
}
}
}
args := tokens[start:]
// Extract first positional argument (model) if present and not a flag
var modelFromPositional string
if len(args) > 0 && !strings.HasPrefix(args[0], "-") {
modelFromPositional = args[0]
args = args[1:] // Remove the model from args to process remaining flags
}
return args, modelFromPositional, nil
}
// parseFlags parses command line flags into a map
func parseFlags(args []string, multiValuedFlags map[string]struct{}) (map[string]any, error) {
options := make(map[string]any)
for i := 0; i < len(args); i++ {
arg := args[i]
if !strings.HasPrefix(arg, "-") {
continue
}
// Check for malformed flags (more than two leading dashes)
if strings.HasPrefix(arg, "---") {
return nil, fmt.Errorf("malformed flag: %s", arg)
}
// Get flag name and value
var flagName, value string
var hasValue bool
if strings.Contains(arg, "=") {
parts := strings.SplitN(arg, "=", 2)
flagName = strings.TrimLeft(parts[0], "-")
value = parts[1]
hasValue = true
} else {
flagName = strings.TrimLeft(arg, "-")
if i+1 < len(args) && !strings.HasPrefix(args[i+1], "-") {
value = args[i+1]
hasValue = true
i++ // Skip next arg since we consumed it
}
}
// Convert kebab-case to snake_case for JSON
flagName = strings.ReplaceAll(flagName, "-", "_")
if hasValue {
// Handle multi-valued flags
if _, isMultiValue := multiValuedFlags[flagName]; isMultiValue {
if existing, ok := options[flagName].([]string); ok {
options[flagName] = append(existing, value)
} else {
options[flagName] = []string{value}
}
} else {
options[flagName] = parseValue(value)
}
} else {
// Boolean flag
options[flagName] = true
}
}
return options, nil
}
// parseValue converts string to appropriate type
func parseValue(value string) any {
// Remove quotes
if len(value) >= 2 {
if (value[0] == '"' && value[len(value)-1] == '"') || (value[0] == '\'' && value[len(value)-1] == '\'') {
value = value[1 : len(value)-1]
}
}
// Try boolean
switch strings.ToLower(value) {
case "true":
return true
case "false":
return false
}
// Try integer
if intVal, err := strconv.Atoi(value); err == nil {
return intVal
}
// Try float
if floatVal, err := strconv.ParseFloat(value, 64); err == nil {
return floatVal
}
// Return as string
return value
}

226
pkg/backends/vllm.go Normal file
View File

@@ -0,0 +1,226 @@
package backends
import (
"fmt"
"llamactl/pkg/validation"
)
// vllmMultiValuedFlags defines flags that should be repeated for each value rather than comma-separated
// Based on vLLM's CLI argument definitions with action='append' or List types
// Keys use snake_case as the parser converts kebab-case flags to snake_case before lookup
var vllmMultiValuedFlags = map[string]struct{}{
"api_key": {}, // --api-key (action='append')
"allowed_origins": {}, // --allowed-origins (List type)
"allowed_methods": {}, // --allowed-methods (List type)
"allowed_headers": {}, // --allowed-headers (List type)
"middleware": {}, // --middleware (action='append')
"lora_modules": {}, // --lora-modules (custom LoRAParserAction, accepts multiple)
"prompt_adapters": {}, // --prompt-adapters (similar to lora-modules, accepts multiple)
}
type VllmServerOptions struct {
// Basic connection options (auto-assigned by llamactl)
Host string `json:"host,omitempty"`
Port int `json:"port,omitempty"`
// Model and engine configuration
Model string `json:"model,omitempty"`
Tokenizer string `json:"tokenizer,omitempty"`
SkipTokenizerInit bool `json:"skip_tokenizer_init,omitempty"`
Revision string `json:"revision,omitempty"`
CodeRevision string `json:"code_revision,omitempty"`
TokenizerRevision string `json:"tokenizer_revision,omitempty"`
TokenizerMode string `json:"tokenizer_mode,omitempty"`
TrustRemoteCode bool `json:"trust_remote_code,omitempty"`
DownloadDir string `json:"download_dir,omitempty"`
LoadFormat string `json:"load_format,omitempty"`
ConfigFormat string `json:"config_format,omitempty"`
Dtype string `json:"dtype,omitempty"`
KVCacheDtype string `json:"kv_cache_dtype,omitempty"`
QuantizationParamPath string `json:"quantization_param_path,omitempty"`
Seed int `json:"seed,omitempty"`
MaxModelLen int `json:"max_model_len,omitempty"`
GuidedDecodingBackend string `json:"guided_decoding_backend,omitempty"`
DistributedExecutorBackend string `json:"distributed_executor_backend,omitempty"`
WorkerUseRay bool `json:"worker_use_ray,omitempty"`
RayWorkersUseNSight bool `json:"ray_workers_use_nsight,omitempty"`
// Performance and serving configuration
BlockSize int `json:"block_size,omitempty"`
EnablePrefixCaching bool `json:"enable_prefix_caching,omitempty"`
DisableSlidingWindow bool `json:"disable_sliding_window,omitempty"`
UseV2BlockManager bool `json:"use_v2_block_manager,omitempty"`
NumLookaheadSlots int `json:"num_lookahead_slots,omitempty"`
SwapSpace int `json:"swap_space,omitempty"`
CPUOffloadGB int `json:"cpu_offload_gb,omitempty"`
GPUMemoryUtilization float64 `json:"gpu_memory_utilization,omitempty"`
NumGPUBlocksOverride int `json:"num_gpu_blocks_override,omitempty"`
MaxNumBatchedTokens int `json:"max_num_batched_tokens,omitempty"`
MaxNumSeqs int `json:"max_num_seqs,omitempty"`
MaxLogprobs int `json:"max_logprobs,omitempty"`
DisableLogStats bool `json:"disable_log_stats,omitempty"`
Quantization string `json:"quantization,omitempty"`
RopeScaling string `json:"rope_scaling,omitempty"`
RopeTheta float64 `json:"rope_theta,omitempty"`
EnforceEager bool `json:"enforce_eager,omitempty"`
MaxContextLenToCapture int `json:"max_context_len_to_capture,omitempty"`
MaxSeqLenToCapture int `json:"max_seq_len_to_capture,omitempty"`
DisableCustomAllReduce bool `json:"disable_custom_all_reduce,omitempty"`
TokenizerPoolSize int `json:"tokenizer_pool_size,omitempty"`
TokenizerPoolType string `json:"tokenizer_pool_type,omitempty"`
TokenizerPoolExtraConfig string `json:"tokenizer_pool_extra_config,omitempty"`
EnableLoraBias bool `json:"enable_lora_bias,omitempty"`
LoraExtraVocabSize int `json:"lora_extra_vocab_size,omitempty"`
LoraRank int `json:"lora_rank,omitempty"`
PromptLookbackDistance int `json:"prompt_lookback_distance,omitempty"`
PreemptionMode string `json:"preemption_mode,omitempty"`
// Distributed and parallel processing
TensorParallelSize int `json:"tensor_parallel_size,omitempty"`
PipelineParallelSize int `json:"pipeline_parallel_size,omitempty"`
MaxParallelLoadingWorkers int `json:"max_parallel_loading_workers,omitempty"`
DisableAsyncOutputProc bool `json:"disable_async_output_proc,omitempty"`
WorkerClass string `json:"worker_class,omitempty"`
EnabledLoraModules string `json:"enabled_lora_modules,omitempty"`
MaxLoraRank int `json:"max_lora_rank,omitempty"`
FullyShardedLoras bool `json:"fully_sharded_loras,omitempty"`
LoraModules string `json:"lora_modules,omitempty"`
PromptAdapters string `json:"prompt_adapters,omitempty"`
MaxPromptAdapterToken int `json:"max_prompt_adapter_token,omitempty"`
Device string `json:"device,omitempty"`
SchedulerDelay float64 `json:"scheduler_delay,omitempty"`
EnableChunkedPrefill bool `json:"enable_chunked_prefill,omitempty"`
SpeculativeModel string `json:"speculative_model,omitempty"`
SpeculativeModelQuantization string `json:"speculative_model_quantization,omitempty"`
SpeculativeRevision string `json:"speculative_revision,omitempty"`
SpeculativeMaxModelLen int `json:"speculative_max_model_len,omitempty"`
SpeculativeDisableByBatchSize int `json:"speculative_disable_by_batch_size,omitempty"`
NgptSpeculativeLength int `json:"ngpt_speculative_length,omitempty"`
SpeculativeDisableMqa bool `json:"speculative_disable_mqa,omitempty"`
ModelLoaderExtraConfig string `json:"model_loader_extra_config,omitempty"`
IgnorePatterns string `json:"ignore_patterns,omitempty"`
PreloadedLoraModules string `json:"preloaded_lora_modules,omitempty"`
// OpenAI server specific options
UDS string `json:"uds,omitempty"`
UvicornLogLevel string `json:"uvicorn_log_level,omitempty"`
ResponseRole string `json:"response_role,omitempty"`
SSLKeyfile string `json:"ssl_keyfile,omitempty"`
SSLCertfile string `json:"ssl_certfile,omitempty"`
SSLCACerts string `json:"ssl_ca_certs,omitempty"`
SSLCertReqs int `json:"ssl_cert_reqs,omitempty"`
RootPath string `json:"root_path,omitempty"`
Middleware []string `json:"middleware,omitempty"`
ReturnTokensAsTokenIDS bool `json:"return_tokens_as_token_ids,omitempty"`
DisableFrontendMultiprocessing bool `json:"disable_frontend_multiprocessing,omitempty"`
EnableAutoToolChoice bool `json:"enable_auto_tool_choice,omitempty"`
ToolCallParser string `json:"tool_call_parser,omitempty"`
ToolServer string `json:"tool_server,omitempty"`
ChatTemplate string `json:"chat_template,omitempty"`
ChatTemplateContentFormat string `json:"chat_template_content_format,omitempty"`
AllowCredentials bool `json:"allow_credentials,omitempty"`
AllowedOrigins []string `json:"allowed_origins,omitempty"`
AllowedMethods []string `json:"allowed_methods,omitempty"`
AllowedHeaders []string `json:"allowed_headers,omitempty"`
APIKey []string `json:"api_key,omitempty"`
EnableLogOutputs bool `json:"enable_log_outputs,omitempty"`
EnableTokenUsage bool `json:"enable_token_usage,omitempty"`
EnableAsyncEngineDebug bool `json:"enable_async_engine_debug,omitempty"`
EngineUseRay bool `json:"engine_use_ray,omitempty"`
DisableLogRequests bool `json:"disable_log_requests,omitempty"`
MaxLogLen int `json:"max_log_len,omitempty"`
// Additional engine configuration
Task string `json:"task,omitempty"`
MultiModalConfig string `json:"multi_modal_config,omitempty"`
LimitMmPerPrompt string `json:"limit_mm_per_prompt,omitempty"`
EnableSleepMode bool `json:"enable_sleep_mode,omitempty"`
EnableChunkingRequest bool `json:"enable_chunking_request,omitempty"`
CompilationConfig string `json:"compilation_config,omitempty"`
DisableSlidingWindowMask bool `json:"disable_sliding_window_mask,omitempty"`
EnableTRTLLMEngineLatency bool `json:"enable_trtllm_engine_latency,omitempty"`
OverridePoolingConfig string `json:"override_pooling_config,omitempty"`
OverrideNeuronConfig string `json:"override_neuron_config,omitempty"`
OverrideKVCacheALIGNSize int `json:"override_kv_cache_align_size,omitempty"`
}
func (o *VllmServerOptions) GetPort() int {
return o.Port
}
func (o *VllmServerOptions) SetPort(port int) {
o.Port = port
}
func (o *VllmServerOptions) GetHost() string {
return o.Host
}
func (o *VllmServerOptions) Validate() error {
if o == nil {
return validation.ValidationError(fmt.Errorf("vLLM server options cannot be nil for vLLM backend"))
}
// Use reflection to check all string fields for injection patterns
if err := validation.ValidateStructStrings(o, ""); err != nil {
return err
}
// Basic network validation for port
if o.Port < 0 || o.Port > 65535 {
return validation.ValidationError(fmt.Errorf("invalid port range: %d", o.Port))
}
return nil
}
// BuildCommandArgs converts VllmServerOptions to command line arguments
// For vLLM native, model is a positional argument after "serve"
func (o *VllmServerOptions) BuildCommandArgs() []string {
var args []string
// Add model as positional argument if specified (for native execution)
if o.Model != "" {
args = append(args, o.Model)
}
// Create a copy without Model field to avoid --model flag
optionsCopy := *o
optionsCopy.Model = ""
// Use package-level multipleFlags variable
flagArgs := BuildCommandArgs(&optionsCopy, vllmMultiValuedFlags)
args = append(args, flagArgs...)
return args
}
func (o *VllmServerOptions) BuildDockerArgs() []string {
var args []string
// Use package-level multipleFlags variable
flagArgs := BuildCommandArgs(o, vllmMultiValuedFlags)
args = append(args, flagArgs...)
return args
}
// ParseCommand parses a vLLM serve command string into VllmServerOptions
// Supports multiple formats:
// 1. Full command: "vllm serve --model MODEL_NAME --other-args"
// 2. Full path: "/usr/local/bin/vllm serve --model MODEL_NAME"
// 3. Serve only: "serve --model MODEL_NAME --other-args"
// 4. Args only: "--model MODEL_NAME --other-args"
// 5. Multiline commands with backslashes
func (o *VllmServerOptions) ParseCommand(command string) (any, error) {
executableNames := []string{"vllm"}
subcommandNames := []string{"serve"}
var vllmOptions VllmServerOptions
if err := parseCommand(command, executableNames, subcommandNames, vllmMultiValuedFlags, &vllmOptions); err != nil {
return nil, err
}
return &vllmOptions, nil
}

323
pkg/backends/vllm_test.go Normal file
View File

@@ -0,0 +1,323 @@
package backends_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/testutil"
"testing"
)
func TestParseVllmCommand(t *testing.T) {
tests := []struct {
name string
command string
expectErr bool
validate func(*testing.T, *backends.VllmServerOptions)
}{
{
name: "basic vllm serve command",
command: "vllm serve microsoft/DialoGPT-medium",
expectErr: false,
validate: func(t *testing.T, opts *backends.VllmServerOptions) {
if opts.Model != "microsoft/DialoGPT-medium" {
t.Errorf("expected model 'microsoft/DialoGPT-medium', got '%s'", opts.Model)
}
},
},
{
name: "serve only command",
command: "serve microsoft/DialoGPT-medium",
expectErr: false,
validate: func(t *testing.T, opts *backends.VllmServerOptions) {
if opts.Model != "microsoft/DialoGPT-medium" {
t.Errorf("expected model 'microsoft/DialoGPT-medium', got '%s'", opts.Model)
}
},
},
{
name: "positional model with flags",
command: "vllm serve microsoft/DialoGPT-medium --tensor-parallel-size 2",
expectErr: false,
validate: func(t *testing.T, opts *backends.VllmServerOptions) {
if opts.Model != "microsoft/DialoGPT-medium" {
t.Errorf("expected model 'microsoft/DialoGPT-medium', got '%s'", opts.Model)
}
if opts.TensorParallelSize != 2 {
t.Errorf("expected tensor_parallel_size 2, got %d", opts.TensorParallelSize)
}
},
},
{
name: "model with path",
command: "vllm serve /path/to/model --gpu-memory-utilization 0.8",
expectErr: false,
validate: func(t *testing.T, opts *backends.VllmServerOptions) {
if opts.Model != "/path/to/model" {
t.Errorf("expected model '/path/to/model', got '%s'", opts.Model)
}
if opts.GPUMemoryUtilization != 0.8 {
t.Errorf("expected gpu_memory_utilization 0.8, got %f", opts.GPUMemoryUtilization)
}
},
},
{
name: "multiple value types",
command: "vllm serve test-model --tensor-parallel-size 4 --gpu-memory-utilization 0.8 --enable-log-outputs",
expectErr: false,
validate: func(t *testing.T, opts *backends.VllmServerOptions) {
if opts.Model != "test-model" {
t.Errorf("expected model 'test-model', got '%s'", opts.Model)
}
if opts.TensorParallelSize != 4 {
t.Errorf("expected tensor_parallel_size 4, got %d", opts.TensorParallelSize)
}
if opts.GPUMemoryUtilization != 0.8 {
t.Errorf("expected gpu_memory_utilization 0.8, got %f", opts.GPUMemoryUtilization)
}
if !opts.EnableLogOutputs {
t.Errorf("expected enable_log_outputs true, got %v", opts.EnableLogOutputs)
}
},
},
{
name: "empty command",
command: "",
expectErr: true,
},
{
name: "unterminated quote",
command: `vllm serve "unterminated`,
expectErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var opts backends.VllmServerOptions
resultAny, err := opts.ParseCommand(tt.command)
result, _ := resultAny.(*backends.VllmServerOptions)
if tt.expectErr {
if err == nil {
t.Errorf("expected error but got none")
}
return
}
if err != nil {
t.Errorf("unexpected error: %v", err)
return
}
if result == nil {
t.Errorf("expected result but got nil")
return
}
if tt.validate != nil {
tt.validate(t, result)
}
})
}
}
func TestParseVllmCommandArrays(t *testing.T) {
command := "vllm serve test-model --middleware auth.py --middleware=cors.py --api-key key1 --api-key key2"
var opts backends.VllmServerOptions
resultAny, err := opts.ParseCommand(command)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
result, ok := resultAny.(*backends.VllmServerOptions)
if !ok {
t.Fatalf("expected *VllmServerOptions, got %T", resultAny)
}
expectedMiddleware := []string{"auth.py", "cors.py"}
if len(result.Middleware) != len(expectedMiddleware) {
t.Errorf("expected %d middleware items, got %d", len(expectedMiddleware), len(result.Middleware))
}
for i, v := range expectedMiddleware {
if i >= len(result.Middleware) || result.Middleware[i] != v {
t.Errorf("expected middleware[%d]=%s got %s", i, v, result.Middleware[i])
}
}
expectedAPIKeys := []string{"key1", "key2"}
if len(result.APIKey) != len(expectedAPIKeys) {
t.Errorf("expected %d api keys, got %d", len(expectedAPIKeys), len(result.APIKey))
}
for i, v := range expectedAPIKeys {
if i >= len(result.APIKey) || result.APIKey[i] != v {
t.Errorf("expected api_key[%d]=%s got %s", i, v, result.APIKey[i])
}
}
}
func TestVllmBuildCommandArgs_BooleanFields(t *testing.T) {
tests := []struct {
name string
options backends.VllmServerOptions
expected []string
excluded []string
}{
{
name: "enable_log_outputs true",
options: backends.VllmServerOptions{
EnableLogOutputs: true,
},
expected: []string{"--enable-log-outputs"},
},
{
name: "enable_log_outputs false",
options: backends.VllmServerOptions{
EnableLogOutputs: false,
},
excluded: []string{"--enable-log-outputs"},
},
{
name: "multiple booleans",
options: backends.VllmServerOptions{
EnableLogOutputs: true,
TrustRemoteCode: true,
EnablePrefixCaching: true,
DisableLogStats: false,
},
expected: []string{"--enable-log-outputs", "--trust-remote-code", "--enable-prefix-caching"},
excluded: []string{"--disable-log-stats"},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
args := tt.options.BuildCommandArgs()
for _, expectedArg := range tt.expected {
if !testutil.Contains(args, expectedArg) {
t.Errorf("Expected argument %q not found in %v", expectedArg, args)
}
}
for _, excludedArg := range tt.excluded {
if testutil.Contains(args, excludedArg) {
t.Errorf("Excluded argument %q found in %v", excludedArg, args)
}
}
})
}
}
func TestVllmBuildCommandArgs_ZeroValues(t *testing.T) {
options := backends.VllmServerOptions{
Port: 0, // Should be excluded
TensorParallelSize: 0, // Should be excluded
GPUMemoryUtilization: 0, // Should be excluded
Model: "", // Should be excluded (positional arg)
Host: "", // Should be excluded
EnableLogOutputs: false, // Should be excluded
}
args := options.BuildCommandArgs()
// Zero values should not appear in arguments
excludedArgs := []string{
"--port", "0",
"--tensor-parallel-size", "0",
"--gpu-memory-utilization", "0",
"--host", "",
"--enable-log-outputs",
}
for _, excludedArg := range excludedArgs {
if testutil.Contains(args, excludedArg) {
t.Errorf("Zero value argument %q should not be present in %v", excludedArg, args)
}
}
// Model should not be present as positional arg when empty
if len(args) > 0 && args[0] == "" {
t.Errorf("Empty model should not be present as positional argument")
}
}
func TestVllmBuildCommandArgs_ArrayFields(t *testing.T) {
options := backends.VllmServerOptions{
AllowedOrigins: []string{"http://localhost:3000", "https://example.com"},
AllowedMethods: []string{"GET", "POST"},
Middleware: []string{"middleware1", "middleware2", "middleware3"},
}
args := options.BuildCommandArgs()
// Check that each array value appears with its flag
expectedOccurrences := map[string][]string{
"--allowed-origins": {"http://localhost:3000", "https://example.com"},
"--allowed-methods": {"GET", "POST"},
"--middleware": {"middleware1", "middleware2", "middleware3"},
}
for flag, values := range expectedOccurrences {
for _, value := range values {
if !testutil.ContainsFlagWithValue(args, flag, value) {
t.Errorf("Expected %s %s, not found in %v", flag, value, args)
}
}
}
}
func TestVllmBuildCommandArgs_EmptyArrays(t *testing.T) {
options := backends.VllmServerOptions{
AllowedOrigins: []string{}, // Empty array should not generate args
Middleware: []string{}, // Empty array should not generate args
}
args := options.BuildCommandArgs()
excludedArgs := []string{"--allowed-origins", "--middleware"}
for _, excludedArg := range excludedArgs {
if testutil.Contains(args, excludedArg) {
t.Errorf("Empty array should not generate argument %q in %v", excludedArg, args)
}
}
}
func TestVllmBuildCommandArgs_PositionalModel(t *testing.T) {
options := backends.VllmServerOptions{
Model: "microsoft/DialoGPT-medium",
Port: 8080,
Host: "localhost",
TensorParallelSize: 2,
GPUMemoryUtilization: 0.8,
EnableLogOutputs: true,
}
args := options.BuildCommandArgs()
// Check that model is the first positional argument (not a --model flag)
if len(args) == 0 || args[0] != "microsoft/DialoGPT-medium" {
t.Errorf("Expected model 'microsoft/DialoGPT-medium' as first positional argument, got args: %v", args)
}
// Check that --model flag is NOT present (since model should be positional)
if testutil.Contains(args, "--model") {
t.Errorf("Found --model flag, but model should be positional argument in args: %v", args)
}
// Check other flags
if !testutil.ContainsFlagWithValue(args, "--tensor-parallel-size", "2") {
t.Errorf("Expected --tensor-parallel-size 2 not found in %v", args)
}
if !testutil.ContainsFlagWithValue(args, "--gpu-memory-utilization", "0.8") {
t.Errorf("Expected --gpu-memory-utilization 0.8 not found in %v", args)
}
if !testutil.Contains(args, "--enable-log-outputs") {
t.Errorf("Expected --enable-log-outputs not found in %v", args)
}
if !testutil.ContainsFlagWithValue(args, "--host", "localhost") {
t.Errorf("Expected --host localhost not found in %v", args)
}
if !testutil.ContainsFlagWithValue(args, "--port", "8080") {
t.Errorf("Expected --port 8080 not found in %v", args)
}
}

View File

@@ -1,6 +1,7 @@
package config
import (
"log"
"os"
"path/filepath"
"runtime"
@@ -10,14 +11,41 @@ import (
"gopkg.in/yaml.v3"
)
// BackendSettings contains structured backend configuration
type BackendSettings struct {
Command string `yaml:"command"`
Args []string `yaml:"args"`
Environment map[string]string `yaml:"environment,omitempty"`
Docker *DockerSettings `yaml:"docker,omitempty"`
ResponseHeaders map[string]string `yaml:"response_headers,omitempty"`
}
// DockerSettings contains Docker-specific configuration
type DockerSettings struct {
Enabled bool `yaml:"enabled"`
Image string `yaml:"image"`
Args []string `yaml:"args"`
Environment map[string]string `yaml:"environment,omitempty"`
}
// BackendConfig contains backend executable configurations
type BackendConfig struct {
LlamaCpp BackendSettings `yaml:"llama-cpp"`
VLLM BackendSettings `yaml:"vllm"`
MLX BackendSettings `yaml:"mlx"`
}
// AppConfig represents the configuration for llamactl
type AppConfig struct {
Server ServerConfig `yaml:"server"`
Instances InstancesConfig `yaml:"instances"`
Auth AuthConfig `yaml:"auth"`
Version string `yaml:"-"`
CommitHash string `yaml:"-"`
BuildTime string `yaml:"-"`
Server ServerConfig `yaml:"server"`
Backends BackendConfig `yaml:"backends"`
Instances InstancesConfig `yaml:"instances"`
Auth AuthConfig `yaml:"auth"`
LocalNode string `yaml:"local_node,omitempty"`
Nodes map[string]NodeConfig `yaml:"nodes,omitempty"`
Version string `yaml:"-"`
CommitHash string `yaml:"-"`
BuildTime string `yaml:"-"`
}
// ServerConfig contains HTTP server configuration
@@ -31,8 +59,14 @@ type ServerConfig struct {
// Allowed origins for CORS (e.g., "http://localhost:3000")
AllowedOrigins []string `yaml:"allowed_origins"`
// Allowed headers for CORS (e.g., "Accept", "Authorization", "Content-Type", "X-CSRF-Token")
AllowedHeaders []string `yaml:"allowed_headers"`
// Enable Swagger UI for API documentation
EnableSwagger bool `yaml:"enable_swagger"`
// Response headers to send with responses
ResponseHeaders map[string]string `yaml:"response_headers,omitempty"`
}
// InstancesConfig contains instance management configuration
@@ -61,9 +95,6 @@ type InstancesConfig struct {
// Enable LRU eviction for instance logs
EnableLRUEviction bool `yaml:"enable_lru_eviction"`
// Path to llama-server executable
LlamaExecutable string `yaml:"llama_executable"`
// Default auto-restart setting for new instances
DefaultAutoRestart bool `yaml:"default_auto_restart"`
@@ -99,6 +130,11 @@ type AuthConfig struct {
ManagementKeys []string `yaml:"management_keys"`
}
type NodeConfig struct {
Address string `yaml:"address"`
APIKey string `yaml:"api_key,omitempty"`
}
// LoadConfig loads configuration with the following precedence:
// 1. Hardcoded defaults
// 2. Config file
@@ -110,18 +146,55 @@ func LoadConfig(configPath string) (AppConfig, error) {
Host: "0.0.0.0",
Port: 8080,
AllowedOrigins: []string{"*"}, // Default to allow all origins
AllowedHeaders: []string{"*"}, // Default to allow all headers
EnableSwagger: false,
},
LocalNode: "main",
Nodes: map[string]NodeConfig{},
Backends: BackendConfig{
LlamaCpp: BackendSettings{
Command: "llama-server",
Args: []string{},
Environment: map[string]string{},
Docker: &DockerSettings{
Enabled: false,
Image: "ghcr.io/ggml-org/llama.cpp:server",
Args: []string{
"run", "--rm", "--network", "host", "--gpus", "all",
"-v", filepath.Join(getDefaultDataDirectory(), "llama.cpp") + ":/root/.cache/llama.cpp"},
Environment: map[string]string{},
},
},
VLLM: BackendSettings{
Command: "vllm",
Args: []string{"serve"},
Docker: &DockerSettings{
Enabled: false,
Image: "vllm/vllm-openai:latest",
Args: []string{
"run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g",
"-v", filepath.Join(getDefaultDataDirectory(), "huggingface") + ":/root/.cache/huggingface",
},
Environment: map[string]string{},
},
},
MLX: BackendSettings{
Command: "mlx_lm.server",
Args: []string{},
// No Docker section for MLX - not supported
},
},
Instances: InstancesConfig{
PortRange: [2]int{8000, 9000},
DataDir: getDefaultDataDirectory(),
InstancesDir: filepath.Join(getDefaultDataDirectory(), "instances"),
LogsDir: filepath.Join(getDefaultDataDirectory(), "logs"),
PortRange: [2]int{8000, 9000},
DataDir: getDefaultDataDirectory(),
// NOTE: empty strings are set as placeholder values since InstancesDir and LogsDir
// should be relative path to DataDir if not explicitly set.
InstancesDir: "",
LogsDir: "",
AutoCreateDirs: true,
MaxInstances: -1, // -1 means unlimited
MaxRunningInstances: -1, // -1 means unlimited
EnableLRUEviction: true,
LlamaExecutable: "llama-server",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
@@ -142,9 +215,22 @@ func LoadConfig(configPath string) (AppConfig, error) {
return cfg, err
}
// If local node is not defined in nodes, add it with default config
if _, ok := cfg.Nodes[cfg.LocalNode]; !ok {
cfg.Nodes[cfg.LocalNode] = NodeConfig{}
}
// 3. Override with environment variables
loadEnvVars(&cfg)
// If InstancesDir or LogsDir is not set, set it to relative path of DataDir
if cfg.Instances.InstancesDir == "" {
cfg.Instances.InstancesDir = filepath.Join(cfg.Instances.DataDir, "instances")
}
if cfg.Instances.LogsDir == "" {
cfg.Instances.LogsDir = filepath.Join(cfg.Instances.DataDir, "logs")
}
return cfg, nil
}
@@ -165,6 +251,7 @@ func loadConfigFile(cfg *AppConfig, configPath string) error {
if err := yaml.Unmarshal(data, cfg); err != nil {
return err
}
log.Printf("Read config at %s", path)
return nil
}
}
@@ -229,9 +316,126 @@ func loadEnvVars(cfg *AppConfig) {
cfg.Instances.EnableLRUEviction = b
}
}
if llamaExec := os.Getenv("LLAMACTL_LLAMA_EXECUTABLE"); llamaExec != "" {
cfg.Instances.LlamaExecutable = llamaExec
// Backend config
// LlamaCpp backend
if llamaCmd := os.Getenv("LLAMACTL_LLAMACPP_COMMAND"); llamaCmd != "" {
cfg.Backends.LlamaCpp.Command = llamaCmd
}
if llamaArgs := os.Getenv("LLAMACTL_LLAMACPP_ARGS"); llamaArgs != "" {
cfg.Backends.LlamaCpp.Args = strings.Split(llamaArgs, " ")
}
if llamaEnv := os.Getenv("LLAMACTL_LLAMACPP_ENV"); llamaEnv != "" {
if cfg.Backends.LlamaCpp.Environment == nil {
cfg.Backends.LlamaCpp.Environment = make(map[string]string)
}
parseEnvVars(llamaEnv, cfg.Backends.LlamaCpp.Environment)
}
if llamaDockerEnabled := os.Getenv("LLAMACTL_LLAMACPP_DOCKER_ENABLED"); llamaDockerEnabled != "" {
if b, err := strconv.ParseBool(llamaDockerEnabled); err == nil {
if cfg.Backends.LlamaCpp.Docker == nil {
cfg.Backends.LlamaCpp.Docker = &DockerSettings{}
}
cfg.Backends.LlamaCpp.Docker.Enabled = b
}
}
if llamaDockerImage := os.Getenv("LLAMACTL_LLAMACPP_DOCKER_IMAGE"); llamaDockerImage != "" {
if cfg.Backends.LlamaCpp.Docker == nil {
cfg.Backends.LlamaCpp.Docker = &DockerSettings{}
}
cfg.Backends.LlamaCpp.Docker.Image = llamaDockerImage
}
if llamaDockerArgs := os.Getenv("LLAMACTL_LLAMACPP_DOCKER_ARGS"); llamaDockerArgs != "" {
if cfg.Backends.LlamaCpp.Docker == nil {
cfg.Backends.LlamaCpp.Docker = &DockerSettings{}
}
cfg.Backends.LlamaCpp.Docker.Args = strings.Split(llamaDockerArgs, " ")
}
if llamaDockerEnv := os.Getenv("LLAMACTL_LLAMACPP_DOCKER_ENV"); llamaDockerEnv != "" {
if cfg.Backends.LlamaCpp.Docker == nil {
cfg.Backends.LlamaCpp.Docker = &DockerSettings{}
}
if cfg.Backends.LlamaCpp.Docker.Environment == nil {
cfg.Backends.LlamaCpp.Docker.Environment = make(map[string]string)
}
parseEnvVars(llamaDockerEnv, cfg.Backends.LlamaCpp.Docker.Environment)
}
if llamaEnv := os.Getenv("LLAMACTL_LLAMACPP_RESPONSE_HEADERS"); llamaEnv != "" {
if cfg.Backends.LlamaCpp.ResponseHeaders == nil {
cfg.Backends.LlamaCpp.ResponseHeaders = make(map[string]string)
}
parseHeaders(llamaEnv, cfg.Backends.LlamaCpp.ResponseHeaders)
}
// vLLM backend
if vllmCmd := os.Getenv("LLAMACTL_VLLM_COMMAND"); vllmCmd != "" {
cfg.Backends.VLLM.Command = vllmCmd
}
if vllmArgs := os.Getenv("LLAMACTL_VLLM_ARGS"); vllmArgs != "" {
cfg.Backends.VLLM.Args = strings.Split(vllmArgs, " ")
}
if vllmEnv := os.Getenv("LLAMACTL_VLLM_ENV"); vllmEnv != "" {
if cfg.Backends.VLLM.Environment == nil {
cfg.Backends.VLLM.Environment = make(map[string]string)
}
parseEnvVars(vllmEnv, cfg.Backends.VLLM.Environment)
}
if vllmDockerEnabled := os.Getenv("LLAMACTL_VLLM_DOCKER_ENABLED"); vllmDockerEnabled != "" {
if b, err := strconv.ParseBool(vllmDockerEnabled); err == nil {
if cfg.Backends.VLLM.Docker == nil {
cfg.Backends.VLLM.Docker = &DockerSettings{}
}
cfg.Backends.VLLM.Docker.Enabled = b
}
}
if vllmDockerImage := os.Getenv("LLAMACTL_VLLM_DOCKER_IMAGE"); vllmDockerImage != "" {
if cfg.Backends.VLLM.Docker == nil {
cfg.Backends.VLLM.Docker = &DockerSettings{}
}
cfg.Backends.VLLM.Docker.Image = vllmDockerImage
}
if vllmDockerArgs := os.Getenv("LLAMACTL_VLLM_DOCKER_ARGS"); vllmDockerArgs != "" {
if cfg.Backends.VLLM.Docker == nil {
cfg.Backends.VLLM.Docker = &DockerSettings{}
}
cfg.Backends.VLLM.Docker.Args = strings.Split(vllmDockerArgs, " ")
}
if vllmDockerEnv := os.Getenv("LLAMACTL_VLLM_DOCKER_ENV"); vllmDockerEnv != "" {
if cfg.Backends.VLLM.Docker == nil {
cfg.Backends.VLLM.Docker = &DockerSettings{}
}
if cfg.Backends.VLLM.Docker.Environment == nil {
cfg.Backends.VLLM.Docker.Environment = make(map[string]string)
}
parseEnvVars(vllmDockerEnv, cfg.Backends.VLLM.Docker.Environment)
}
if llamaEnv := os.Getenv("LLAMACTL_VLLM_RESPONSE_HEADERS"); llamaEnv != "" {
if cfg.Backends.VLLM.ResponseHeaders == nil {
cfg.Backends.VLLM.ResponseHeaders = make(map[string]string)
}
parseHeaders(llamaEnv, cfg.Backends.VLLM.ResponseHeaders)
}
// MLX backend
if mlxCmd := os.Getenv("LLAMACTL_MLX_COMMAND"); mlxCmd != "" {
cfg.Backends.MLX.Command = mlxCmd
}
if mlxArgs := os.Getenv("LLAMACTL_MLX_ARGS"); mlxArgs != "" {
cfg.Backends.MLX.Args = strings.Split(mlxArgs, " ")
}
if mlxEnv := os.Getenv("LLAMACTL_MLX_ENV"); mlxEnv != "" {
if cfg.Backends.MLX.Environment == nil {
cfg.Backends.MLX.Environment = make(map[string]string)
}
parseEnvVars(mlxEnv, cfg.Backends.MLX.Environment)
}
if llamaEnv := os.Getenv("LLAMACTL_MLX_RESPONSE_HEADERS"); llamaEnv != "" {
if cfg.Backends.MLX.ResponseHeaders == nil {
cfg.Backends.MLX.ResponseHeaders = make(map[string]string)
}
parseHeaders(llamaEnv, cfg.Backends.MLX.ResponseHeaders)
}
// Instance defaults
if autoRestart := os.Getenv("LLAMACTL_DEFAULT_AUTO_RESTART"); autoRestart != "" {
if b, err := strconv.ParseBool(autoRestart); err == nil {
cfg.Instances.DefaultAutoRestart = b
@@ -279,6 +483,11 @@ func loadEnvVars(cfg *AppConfig) {
if managementKeys := os.Getenv("LLAMACTL_MANAGEMENT_KEYS"); managementKeys != "" {
cfg.Auth.ManagementKeys = strings.Split(managementKeys, ",")
}
// Local node config
if localNode := os.Getenv("LLAMACTL_LOCAL_NODE"); localNode != "" {
cfg.LocalNode = localNode
}
}
// ParsePortRange parses port range from string formats like "8000-9000" or "8000,9000"
@@ -304,6 +513,32 @@ func ParsePortRange(s string) [2]int {
return [2]int{0, 0} // Invalid format
}
// parseEnvVars parses environment variables in format "KEY1=value1,KEY2=value2"
// and populates the provided environment map
func parseEnvVars(envString string, envMap map[string]string) {
if envString == "" {
return
}
for _, envPair := range strings.Split(envString, ",") {
if parts := strings.SplitN(strings.TrimSpace(envPair), "=", 2); len(parts) == 2 {
envMap[parts[0]] = parts[1]
}
}
}
// parseHeaders parses HTTP headers in format "KEY1=value1;KEY2=value2"
// and populates the provided environment map
func parseHeaders(envString string, envMap map[string]string) {
if envString == "" {
return
}
for _, envPair := range strings.Split(envString, ";") {
if parts := strings.SplitN(strings.TrimSpace(envPair), "=", 2); len(parts) == 2 {
envMap[parts[0]] = parts[1]
}
}
}
// getDefaultDataDirectory returns platform-specific default data directory
func getDefaultDataDirectory() string {
switch runtime.GOOS {
@@ -336,6 +571,10 @@ func getDefaultDataDirectory() string {
// getDefaultConfigLocations returns platform-specific config file locations
func getDefaultConfigLocations() []string {
var locations []string
// Use ./llamactl.yaml and ./config.yaml as the default config file
locations = append(locations, "llamactl.yaml")
locations = append(locations, "config.yaml")
homeDir, _ := os.UserHomeDir()
switch runtime.GOOS {

View File

@@ -7,6 +7,20 @@ import (
"testing"
)
// GetBackendSettings resolves backend settings
func getBackendSettings(bc *config.BackendConfig, backendType string) config.BackendSettings {
switch backendType {
case "llama-cpp":
return bc.LlamaCpp
case "vllm":
return bc.VLLM
case "mlx":
return bc.MLX
default:
return config.BackendSettings{}
}
}
func TestLoadConfig_Defaults(t *testing.T) {
// Test loading config when no file exists and no env vars set
cfg, err := config.LoadConfig("nonexistent-file.yaml")
@@ -42,9 +56,6 @@ func TestLoadConfig_Defaults(t *testing.T) {
if cfg.Instances.MaxInstances != -1 {
t.Errorf("Expected default max instances -1, got %d", cfg.Instances.MaxInstances)
}
if cfg.Instances.LlamaExecutable != "llama-server" {
t.Errorf("Expected default executable 'llama-server', got %q", cfg.Instances.LlamaExecutable)
}
if !cfg.Instances.DefaultAutoRestart {
t.Error("Expected default auto restart to be true")
}
@@ -101,9 +112,6 @@ instances:
if cfg.Instances.MaxInstances != 5 {
t.Errorf("Expected max instances 5, got %d", cfg.Instances.MaxInstances)
}
if cfg.Instances.LlamaExecutable != "/usr/bin/llama-server" {
t.Errorf("Expected executable '/usr/bin/llama-server', got %q", cfg.Instances.LlamaExecutable)
}
if cfg.Instances.DefaultAutoRestart {
t.Error("Expected auto restart to be false")
}
@@ -123,7 +131,6 @@ func TestLoadConfig_EnvironmentOverrides(t *testing.T) {
"LLAMACTL_INSTANCE_PORT_RANGE": "5000-6000",
"LLAMACTL_LOGS_DIR": "/env/logs",
"LLAMACTL_MAX_INSTANCES": "20",
"LLAMACTL_LLAMA_EXECUTABLE": "/env/llama-server",
"LLAMACTL_DEFAULT_AUTO_RESTART": "false",
"LLAMACTL_DEFAULT_MAX_RESTARTS": "7",
"LLAMACTL_DEFAULT_RESTART_DELAY": "15",
@@ -156,8 +163,8 @@ func TestLoadConfig_EnvironmentOverrides(t *testing.T) {
if cfg.Instances.MaxInstances != 20 {
t.Errorf("Expected max instances 20, got %d", cfg.Instances.MaxInstances)
}
if cfg.Instances.LlamaExecutable != "/env/llama-server" {
t.Errorf("Expected executable '/env/llama-server', got %q", cfg.Instances.LlamaExecutable)
if cfg.Backends.LlamaCpp.Command != "llama-server" {
t.Errorf("Expected default llama command 'llama-server', got %q", cfg.Backends.LlamaCpp.Command)
}
if cfg.Instances.DefaultAutoRestart {
t.Error("Expected auto restart to be false")
@@ -212,29 +219,6 @@ instances:
}
}
func TestLoadConfig_InvalidYAML(t *testing.T) {
// Create a temporary config file with invalid YAML
tempDir := t.TempDir()
configFile := filepath.Join(tempDir, "invalid-config.yaml")
invalidContent := `
server:
host: "localhost"
port: not-a-number
instances:
[invalid yaml structure
`
err := os.WriteFile(configFile, []byte(invalidContent), 0644)
if err != nil {
t.Fatalf("Failed to write test config file: %v", err)
}
_, err = config.LoadConfig(configFile)
if err == nil {
t.Error("Expected LoadConfig to return error for invalid YAML")
}
}
func TestParsePortRange(t *testing.T) {
tests := []struct {
@@ -264,94 +248,259 @@ func TestParsePortRange(t *testing.T) {
}
}
// Remove the getDefaultConfigLocations test entirely
func TestLoadConfig_EnvironmentVariableTypes(t *testing.T) {
// Test that environment variables are properly converted to correct types
testCases := []struct {
envVar string
envValue string
checkFn func(*config.AppConfig) bool
desc string
}{
{
envVar: "LLAMACTL_PORT",
envValue: "invalid-port",
checkFn: func(c *config.AppConfig) bool { return c.Server.Port == 8080 }, // Should keep default
desc: "invalid port number should keep default",
func TestGetBackendSettings_NewStructuredConfig(t *testing.T) {
bc := &config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "custom-llama",
Args: []string{"--verbose"},
Docker: &config.DockerSettings{
Enabled: true,
Image: "custom-llama:latest",
Args: []string{"--gpus", "all"},
Environment: map[string]string{"CUDA_VISIBLE_DEVICES": "1"},
},
},
{
envVar: "LLAMACTL_MAX_INSTANCES",
envValue: "not-a-number",
checkFn: func(c *config.AppConfig) bool { return c.Instances.MaxInstances == -1 }, // Should keep default
desc: "invalid max instances should keep default",
VLLM: config.BackendSettings{
Command: "custom-vllm",
Args: []string{"serve", "--debug"},
},
{
envVar: "LLAMACTL_DEFAULT_AUTO_RESTART",
envValue: "invalid-bool",
checkFn: func(c *config.AppConfig) bool { return c.Instances.DefaultAutoRestart == true }, // Should keep default
desc: "invalid boolean should keep default",
},
{
envVar: "LLAMACTL_INSTANCE_PORT_RANGE",
envValue: "invalid-range",
checkFn: func(c *config.AppConfig) bool { return c.Instances.PortRange == [2]int{8000, 9000} }, // Should keep default
desc: "invalid port range should keep default",
MLX: config.BackendSettings{
Command: "custom-mlx",
Args: []string{},
},
}
for _, tc := range testCases {
t.Run(tc.desc, func(t *testing.T) {
os.Setenv(tc.envVar, tc.envValue)
defer os.Unsetenv(tc.envVar)
// Test llama-cpp with Docker
settings := getBackendSettings(bc, "llama-cpp")
if settings.Command != "custom-llama" {
t.Errorf("Expected command 'custom-llama', got %q", settings.Command)
}
if len(settings.Args) != 1 || settings.Args[0] != "--verbose" {
t.Errorf("Expected args ['--verbose'], got %v", settings.Args)
}
if settings.Docker == nil || !settings.Docker.Enabled {
t.Error("Expected Docker to be enabled")
}
if settings.Docker.Image != "custom-llama:latest" {
t.Errorf("Expected Docker image 'custom-llama:latest', got %q", settings.Docker.Image)
}
cfg, err := config.LoadConfig("nonexistent-file.yaml")
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
// Test vLLM without Docker
settings = getBackendSettings(bc, "vllm")
if settings.Command != "custom-vllm" {
t.Errorf("Expected command 'custom-vllm', got %q", settings.Command)
}
if len(settings.Args) != 2 || settings.Args[0] != "serve" || settings.Args[1] != "--debug" {
t.Errorf("Expected args ['serve', '--debug'], got %v", settings.Args)
}
if settings.Docker != nil && settings.Docker.Enabled {
t.Error("Expected Docker to be disabled or nil")
}
if !tc.checkFn(&cfg) {
t.Errorf("Test failed: %s", tc.desc)
}
})
// Test MLX
settings = getBackendSettings(bc, "mlx")
if settings.Command != "custom-mlx" {
t.Errorf("Expected command 'custom-mlx', got %q", settings.Command)
}
}
func TestLoadConfig_PartialFile(t *testing.T) {
// Test that partial config files work correctly (missing sections should use defaults)
tempDir := t.TempDir()
configFile := filepath.Join(tempDir, "partial-config.yaml")
// Only specify server config, instances should use defaults
configContent := `
server:
host: "partial-host"
port: 7777
`
err := os.WriteFile(configFile, []byte(configContent), 0644)
if err != nil {
t.Fatalf("Failed to write test config file: %v", err)
func TestLoadConfig_BackendEnvironmentVariables(t *testing.T) {
// Test that backend environment variables work correctly
envVars := map[string]string{
"LLAMACTL_LLAMACPP_COMMAND": "env-llama",
"LLAMACTL_LLAMACPP_ARGS": "--verbose --threads 4",
"LLAMACTL_LLAMACPP_DOCKER_ENABLED": "true",
"LLAMACTL_LLAMACPP_DOCKER_IMAGE": "env-llama:latest",
"LLAMACTL_LLAMACPP_DOCKER_ARGS": "run --rm --network host --gpus all",
"LLAMACTL_LLAMACPP_DOCKER_ENV": "CUDA_VISIBLE_DEVICES=0,OMP_NUM_THREADS=4",
"LLAMACTL_VLLM_COMMAND": "env-vllm",
"LLAMACTL_VLLM_DOCKER_ENABLED": "false",
"LLAMACTL_VLLM_DOCKER_IMAGE": "env-vllm:latest",
"LLAMACTL_VLLM_DOCKER_ENV": "PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512,CUDA_VISIBLE_DEVICES=1",
"LLAMACTL_MLX_COMMAND": "env-mlx",
}
cfg, err := config.LoadConfig(configFile)
// Set env vars and ensure cleanup
for key, value := range envVars {
os.Setenv(key, value)
defer os.Unsetenv(key)
}
cfg, err := config.LoadConfig("nonexistent-file.yaml")
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
// Server config should be from file
if cfg.Server.Host != "partial-host" {
t.Errorf("Expected host 'partial-host', got %q", cfg.Server.Host)
// Verify llama-cpp environment overrides
if cfg.Backends.LlamaCpp.Command != "env-llama" {
t.Errorf("Expected llama command 'env-llama', got %q", cfg.Backends.LlamaCpp.Command)
}
if cfg.Server.Port != 7777 {
t.Errorf("Expected port 7777, got %d", cfg.Server.Port)
expectedArgs := []string{"--verbose", "--threads", "4"}
if len(cfg.Backends.LlamaCpp.Args) != len(expectedArgs) {
t.Errorf("Expected llama args %v, got %v", expectedArgs, cfg.Backends.LlamaCpp.Args)
}
if !cfg.Backends.LlamaCpp.Docker.Enabled {
t.Error("Expected llama Docker to be enabled")
}
if cfg.Backends.LlamaCpp.Docker.Image != "env-llama:latest" {
t.Errorf("Expected llama Docker image 'env-llama:latest', got %q", cfg.Backends.LlamaCpp.Docker.Image)
}
expectedDockerArgs := []string{"run", "--rm", "--network", "host", "--gpus", "all"}
if len(cfg.Backends.LlamaCpp.Docker.Args) != len(expectedDockerArgs) {
t.Errorf("Expected llama Docker args %v, got %v", expectedDockerArgs, cfg.Backends.LlamaCpp.Docker.Args)
}
if cfg.Backends.LlamaCpp.Docker.Environment["CUDA_VISIBLE_DEVICES"] != "0" {
t.Errorf("Expected CUDA_VISIBLE_DEVICES=0, got %q", cfg.Backends.LlamaCpp.Docker.Environment["CUDA_VISIBLE_DEVICES"])
}
if cfg.Backends.LlamaCpp.Docker.Environment["OMP_NUM_THREADS"] != "4" {
t.Errorf("Expected OMP_NUM_THREADS=4, got %q", cfg.Backends.LlamaCpp.Docker.Environment["OMP_NUM_THREADS"])
}
// Instances config should be defaults
if cfg.Instances.PortRange != [2]int{8000, 9000} {
t.Errorf("Expected default port range [8000, 9000], got %v", cfg.Instances.PortRange)
// Verify vLLM environment overrides
if cfg.Backends.VLLM.Command != "env-vllm" {
t.Errorf("Expected vLLM command 'env-vllm', got %q", cfg.Backends.VLLM.Command)
}
if cfg.Instances.MaxInstances != -1 {
t.Errorf("Expected default max instances -1, got %d", cfg.Instances.MaxInstances)
if cfg.Backends.VLLM.Docker.Enabled {
t.Error("Expected vLLM Docker to be disabled")
}
if cfg.Backends.VLLM.Docker.Environment["PYTORCH_CUDA_ALLOC_CONF"] != "max_split_size_mb:512" {
t.Errorf("Expected PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512, got %q", cfg.Backends.VLLM.Docker.Environment["PYTORCH_CUDA_ALLOC_CONF"])
}
// Verify MLX environment overrides
if cfg.Backends.MLX.Command != "env-mlx" {
t.Errorf("Expected MLX command 'env-mlx', got %q", cfg.Backends.MLX.Command)
}
}
func TestLoadConfig_LocalNode(t *testing.T) {
t.Run("default local node", func(t *testing.T) {
cfg, err := config.LoadConfig("nonexistent-file.yaml")
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
if cfg.LocalNode != "main" {
t.Errorf("Expected default local node 'main', got %q", cfg.LocalNode)
}
})
t.Run("local node from file", func(t *testing.T) {
tempDir := t.TempDir()
configFile := filepath.Join(tempDir, "test-config.yaml")
configContent := `
local_node: "worker1"
nodes:
worker1:
address: ""
worker2:
address: "http://192.168.1.10:8080"
api_key: "test-key"
`
err := os.WriteFile(configFile, []byte(configContent), 0644)
if err != nil {
t.Fatalf("Failed to write test config file: %v", err)
}
cfg, err := config.LoadConfig(configFile)
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
if cfg.LocalNode != "worker1" {
t.Errorf("Expected local node 'worker1', got %q", cfg.LocalNode)
}
// Verify nodes map (includes default "main" + worker1 + worker2)
if len(cfg.Nodes) != 2 {
t.Errorf("Expected 2 nodes (default worker1 + worker2), got %d", len(cfg.Nodes))
}
// Verify local node exists and is empty
localNode, exists := cfg.Nodes["worker1"]
if !exists {
t.Error("Expected local node 'worker1' to exist in nodes map")
}
if localNode.Address != "" {
t.Errorf("Expected local node address to be empty, got %q", localNode.Address)
}
if localNode.APIKey != "" {
t.Errorf("Expected local node api_key to be empty, got %q", localNode.APIKey)
}
// Verify remote node
remoteNode, exists := cfg.Nodes["worker2"]
if !exists {
t.Error("Expected remote node 'worker2' to exist in nodes map")
}
if remoteNode.Address != "http://192.168.1.10:8080" {
t.Errorf("Expected remote node address 'http://192.168.1.10:8080', got %q", remoteNode.Address)
}
// Verify default main node still exists
_, exists = cfg.Nodes["main"]
if exists {
t.Error("Default 'main' node should not exist when local_node is overridden")
}
})
t.Run("custom local node name in config", func(t *testing.T) {
tempDir := t.TempDir()
configFile := filepath.Join(tempDir, "test-config.yaml")
configContent := `
local_node: "primary"
nodes:
primary:
address: ""
worker1:
address: "http://192.168.1.10:8080"
`
err := os.WriteFile(configFile, []byte(configContent), 0644)
if err != nil {
t.Fatalf("Failed to write test config file: %v", err)
}
cfg, err := config.LoadConfig(configFile)
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
if cfg.LocalNode != "primary" {
t.Errorf("Expected local node 'primary', got %q", cfg.LocalNode)
}
// Verify nodes map includes default "main" + primary + worker1
if len(cfg.Nodes) != 2 {
t.Errorf("Expected 2 nodes (primary + worker1), got %d", len(cfg.Nodes))
}
localNode, exists := cfg.Nodes["primary"]
if !exists {
t.Error("Expected local node 'primary' to exist in nodes map")
}
if localNode.Address != "" {
t.Errorf("Expected local node address to be empty, got %q", localNode.Address)
}
})
t.Run("local node from environment variable", func(t *testing.T) {
os.Setenv("LLAMACTL_LOCAL_NODE", "custom-node")
defer os.Unsetenv("LLAMACTL_LOCAL_NODE")
cfg, err := config.LoadConfig("nonexistent-file.yaml")
if err != nil {
t.Fatalf("LoadConfig failed: %v", err)
}
if cfg.LocalNode != "custom-node" {
t.Errorf("Expected local node 'custom-node' from env var, got %q", cfg.LocalNode)
}
})
}

View File

@@ -1,223 +1,316 @@
package instance
import (
"context"
"encoding/json"
"fmt"
"io"
"llamactl/pkg/backends"
"llamactl/pkg/config"
"log"
"net/http"
"net/http/httputil"
"net/url"
"os/exec"
"sync"
"sync/atomic"
"time"
)
// TimeProvider interface allows for testing with mock time
type TimeProvider interface {
Now() time.Time
// Instance represents a running instance of the llama server
type Instance struct {
Name string `json:"name"`
Created int64 `json:"created,omitempty"` // Unix timestamp when the instance was created
// Global configuration
globalInstanceSettings *config.InstancesConfig
globalBackendSettings *config.BackendConfig
globalNodesConfig map[string]config.NodeConfig
localNodeName string `json:"-"` // Name of the local node for remote detection
status *status `json:"-"`
options *options `json:"-"`
// Components (can be nil for remote instances)
process *process `json:"-"`
proxy *proxy `json:"-"`
logger *logger `json:"-"`
}
// realTimeProvider implements TimeProvider using the actual time
type realTimeProvider struct{}
// New creates a new instance with the given name, log path, options and local node name
func New(name string, globalConfig *config.AppConfig, opts *Options, onStatusChange func(oldStatus, newStatus Status)) *Instance {
func (realTimeProvider) Now() time.Time {
return time.Now()
}
globalInstanceSettings := &globalConfig.Instances
globalBackendSettings := &globalConfig.Backends
globalNodesConfig := globalConfig.Nodes
localNodeName := globalConfig.LocalNode
// Process represents a running instance of the llama server
type Process struct {
Name string `json:"name"`
options *CreateInstanceOptions `json:"-"`
globalSettings *config.InstancesConfig
// Status
Status InstanceStatus `json:"status"`
onStatusChange func(oldStatus, newStatus InstanceStatus)
// Creation time
Created int64 `json:"created,omitempty"` // Unix timestamp when the instance was created
// Logging file
logger *InstanceLogger `json:"-"`
// internal
cmd *exec.Cmd `json:"-"` // Command to run the instance
ctx context.Context `json:"-"` // Context for managing the instance lifecycle
cancel context.CancelFunc `json:"-"` // Function to cancel the context
stdout io.ReadCloser `json:"-"` // Standard output stream
stderr io.ReadCloser `json:"-"` // Standard error stream
mu sync.RWMutex `json:"-"` // RWMutex for better read/write separation
restarts int `json:"-"` // Number of restarts
proxy *httputil.ReverseProxy `json:"-"` // Reverse proxy for this instance
// Restart control
restartCancel context.CancelFunc `json:"-"` // Cancel function for pending restarts
monitorDone chan struct{} `json:"-"` // Channel to signal monitor goroutine completion
// Timeout management
lastRequestTime atomic.Int64 // Unix timestamp of last request
timeProvider TimeProvider `json:"-"` // Time provider for testing
}
// NewInstance creates a new instance with the given name, log path, and options
func NewInstance(name string, globalSettings *config.InstancesConfig, options *CreateInstanceOptions, onStatusChange func(oldStatus, newStatus InstanceStatus)) *Process {
// Validate and copy options
options.ValidateAndApplyDefaults(name, globalSettings)
opts.validateAndApplyDefaults(name, globalInstanceSettings)
// Create the instance logger
logger := NewInstanceLogger(name, globalSettings.LogsDir)
// Create status wrapper
status := newStatus(Stopped)
status.onStatusChange = onStatusChange
return &Process{
Name: name,
options: options,
globalSettings: globalSettings,
logger: logger,
timeProvider: realTimeProvider{},
Created: time.Now().Unix(),
Status: Stopped,
onStatusChange: onStatusChange,
// Create options wrapper
options := newOptions(opts)
instance := &Instance{
Name: name,
options: options,
globalInstanceSettings: globalInstanceSettings,
globalBackendSettings: globalBackendSettings,
globalNodesConfig: globalNodesConfig,
localNodeName: localNodeName,
Created: time.Now().Unix(),
status: status,
}
var err error
instance.proxy, err = newProxy(instance)
if err != nil {
log.Println("Warning: Failed to create proxy for instance", instance.Name, "-", err)
}
// Only create logger, proxy, and process for local instances
if !instance.IsRemote() {
instance.logger = newLogger(name, globalInstanceSettings.LogsDir)
instance.process = newProcess(instance)
}
return instance
}
// Start starts the instance
func (i *Instance) Start() error {
if i.process == nil {
return fmt.Errorf("instance %s has no process component (remote instances cannot be started locally)", i.Name)
}
return i.process.start()
}
// Stop stops the instance
func (i *Instance) Stop() error {
if i.process == nil {
return fmt.Errorf("instance %s has no process component (remote instances cannot be stopped locally)", i.Name)
}
return i.process.stop()
}
// Restart restarts the instance
func (i *Instance) Restart() error {
if i.process == nil {
return fmt.Errorf("instance %s has no process component (remote instances cannot be restarted locally)", i.Name)
}
return i.process.restart()
}
// WaitForHealthy waits for the instance to become healthy
func (i *Instance) WaitForHealthy(timeout int) error {
if i.process == nil {
return fmt.Errorf("instance %s has no process component (remote instances cannot be health checked locally)", i.Name)
}
return i.process.waitForHealthy(timeout)
}
// GetOptions returns the current options
func (i *Instance) GetOptions() *Options {
if i.options == nil {
return nil
}
return i.options.get()
}
// GetStatus returns the current status
func (i *Instance) GetStatus() Status {
if i.status == nil {
return Stopped
}
return i.status.get()
}
// SetStatus sets the status
func (i *Instance) SetStatus(s Status) {
if i.status != nil {
i.status.set(s)
}
}
func (i *Process) GetOptions() *CreateInstanceOptions {
i.mu.RLock()
defer i.mu.RUnlock()
return i.options
}
func (i *Process) GetPort() int {
i.mu.RLock()
defer i.mu.RUnlock()
if i.options != nil {
switch i.options.BackendType {
case backends.BackendTypeLlamaCpp:
return i.options.LlamaServerOptions.Port
}
// IsRunning returns true if the status is Running
func (i *Instance) IsRunning() bool {
if i.status == nil {
return false
}
return 0
return i.status.isRunning()
}
func (i *Process) GetHost() string {
i.mu.RLock()
defer i.mu.RUnlock()
if i.options != nil {
switch i.options.BackendType {
case backends.BackendTypeLlamaCpp:
return i.options.LlamaServerOptions.Host
}
}
return ""
}
func (i *Process) SetOptions(options *CreateInstanceOptions) {
i.mu.Lock()
defer i.mu.Unlock()
if options == nil {
// SetOptions sets the options
func (i *Instance) SetOptions(opts *Options) {
if opts == nil {
log.Println("Warning: Attempted to set nil options on instance", i.Name)
return
}
// Validate and copy options
options.ValidateAndApplyDefaults(i.Name, i.globalSettings)
// Preserve the original nodes to prevent changing instance location
if i.options != nil && i.options.get() != nil {
opts.Nodes = i.options.get().Nodes
}
// Validate and copy options
opts.validateAndApplyDefaults(i.Name, i.globalInstanceSettings)
if i.options != nil {
i.options.set(opts)
}
i.options = options
// Clear the proxy so it gets recreated with new options
i.proxy = nil
if i.proxy != nil {
i.proxy.clear()
}
}
// SetTimeProvider sets a custom time provider for testing
func (i *Process) SetTimeProvider(tp TimeProvider) {
i.timeProvider = tp
func (i *Instance) SetTimeProvider(tp TimeProvider) {
if i.proxy != nil {
i.proxy.setTimeProvider(tp)
}
}
// GetProxy returns the reverse proxy for this instance, creating it if needed
func (i *Process) GetProxy() (*httputil.ReverseProxy, error) {
i.mu.Lock()
defer i.mu.Unlock()
if i.proxy != nil {
return i.proxy, nil
}
func (i *Instance) GetHost() string {
if i.options == nil {
return nil, fmt.Errorf("instance %s has no options set", i.Name)
return "localhost"
}
return i.options.GetHost()
}
func (i *Instance) GetPort() int {
if i.options == nil {
return 0
}
return i.options.GetPort()
}
// GetProxy returns the reverse proxy for this instance
func (i *Instance) GetProxy() (*httputil.ReverseProxy, error) {
if i.proxy == nil {
return nil, fmt.Errorf("instance %s has no proxy component", i.Name)
}
var host string
var port int
switch i.options.BackendType {
case backends.BackendTypeLlamaCpp:
host = i.options.LlamaServerOptions.Host
port = i.options.LlamaServerOptions.Port
return i.proxy.get()
}
func (i *Instance) IsRemote() bool {
opts := i.GetOptions()
if opts == nil {
return false
}
targetURL, err := url.Parse(fmt.Sprintf("http://%s:%d", host, port))
if err != nil {
return nil, fmt.Errorf("failed to parse target URL for instance %s: %w", i.Name, err)
// If no nodes specified, it's a local instance
if len(opts.Nodes) == 0 {
return false
}
proxy := httputil.NewSingleHostReverseProxy(targetURL)
// If the local node is in the nodes map, treat it as a local instance
if _, isLocal := opts.Nodes[i.localNodeName]; isLocal {
return false
}
proxy.ModifyResponse = func(resp *http.Response) error {
// Remove CORS headers from llama-server response to avoid conflicts
// llamactl will add its own CORS headers
resp.Header.Del("Access-Control-Allow-Origin")
resp.Header.Del("Access-Control-Allow-Methods")
resp.Header.Del("Access-Control-Allow-Headers")
resp.Header.Del("Access-Control-Allow-Credentials")
resp.Header.Del("Access-Control-Max-Age")
resp.Header.Del("Access-Control-Expose-Headers")
// Otherwise, it's a remote instance
return true
}
// GetLogs retrieves the last n lines of logs from the instance
func (i *Instance) GetLogs(num_lines int) (string, error) {
if i.logger == nil {
return "", fmt.Errorf("instance %s has no logger (remote instances don't have logs)", i.Name)
}
return i.logger.getLogs(num_lines)
}
// LastRequestTime returns the last request time as a Unix timestamp
func (i *Instance) LastRequestTime() int64 {
if i.proxy == nil {
return 0
}
return i.proxy.getLastRequestTime()
}
// UpdateLastRequestTime updates the last request access time for the instance via proxy
func (i *Instance) UpdateLastRequestTime() {
if i.proxy != nil {
i.proxy.updateLastRequestTime()
}
}
// ShouldTimeout checks if the instance should timeout based on idle time
func (i *Instance) ShouldTimeout() bool {
if i.proxy == nil {
return false
}
return i.proxy.shouldTimeout()
}
func (i *Instance) getCommand() string {
opts := i.GetOptions()
if opts == nil {
return ""
}
return opts.BackendOptions.GetCommand(i.globalBackendSettings)
}
func (i *Instance) buildCommandArgs() []string {
opts := i.GetOptions()
if opts == nil {
return nil
}
i.proxy = proxy
return opts.BackendOptions.BuildCommandArgs(i.globalBackendSettings)
}
return i.proxy, nil
func (i *Instance) buildEnvironment() map[string]string {
opts := i.GetOptions()
if opts == nil {
return nil
}
return opts.BackendOptions.BuildEnvironment(i.globalBackendSettings, opts.Environment)
}
// MarshalJSON implements json.Marshaler for Instance
func (i *Process) MarshalJSON() ([]byte, error) {
// Use read lock since we're only reading data
i.mu.RLock()
defer i.mu.RUnlock()
func (i *Instance) MarshalJSON() ([]byte, error) {
// Get options
opts := i.GetOptions()
// Determine if docker is enabled for this instance's backend
dockerEnabled := opts.BackendOptions.IsDockerEnabled(i.globalBackendSettings)
// Use anonymous struct to avoid recursion
type Alias Process
return json.Marshal(&struct {
*Alias
Options *CreateInstanceOptions `json:"options,omitempty"`
Name string `json:"name"`
Status *status `json:"status"`
Created int64 `json:"created,omitempty"`
Options *options `json:"options,omitempty"`
DockerEnabled bool `json:"docker_enabled,omitempty"`
}{
Alias: (*Alias)(i),
Options: i.options,
Name: i.Name,
Status: i.status,
Created: i.Created,
Options: i.options,
DockerEnabled: dockerEnabled,
})
}
// UnmarshalJSON implements json.Unmarshaler for Instance
func (i *Process) UnmarshalJSON(data []byte) error {
// Use anonymous struct to avoid recursion
type Alias Process
func (i *Instance) UnmarshalJSON(data []byte) error {
// Explicitly deserialize to match MarshalJSON format
aux := &struct {
*Alias
Options *CreateInstanceOptions `json:"options,omitempty"`
}{
Alias: (*Alias)(i),
}
Name string `json:"name"`
Status *status `json:"status"`
Created int64 `json:"created,omitempty"`
Options *options `json:"options,omitempty"`
}{}
if err := json.Unmarshal(data, aux); err != nil {
return err
}
// Handle options with validation and defaults
if aux.Options != nil {
aux.Options.ValidateAndApplyDefaults(i.Name, i.globalSettings)
i.options = aux.Options
}
// Set the fields
i.Name = aux.Name
i.Created = aux.Created
i.status = aux.Status
i.options = aux.Options
return nil
}

View File

@@ -3,33 +3,53 @@ package instance_test
import (
"encoding/json"
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/testutil"
"testing"
"time"
)
func TestNewInstance(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "llama-server",
Args: []string{},
},
MLX: config.BackendSettings{
Command: "mlx_lm.server",
Args: []string{},
},
VLLM: config.BackendSettings{
Command: "vllm",
Args: []string{"serve"},
},
},
Instances: config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
mockOnStatusChange := func(oldStatus, newStatus instance.Status) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
inst := instance.New("test-instance", globalConfig, options, mockOnStatusChange)
if inst.Name != "test-instance" {
t.Errorf("Expected name 'test-instance', got %q", inst.Name)
@@ -40,8 +60,8 @@ func TestNewInstance(t *testing.T) {
// Check that options were properly set with defaults applied
opts := inst.GetOptions()
if opts.LlamaServerOptions.Model != "/path/to/model.gguf" {
t.Errorf("Expected model '/path/to/model.gguf', got %q", opts.LlamaServerOptions.Model)
if opts.BackendOptions.LlamaServerOptions.Model != "/path/to/model.gguf" {
t.Errorf("Expected model '/path/to/model.gguf', got %q", opts.BackendOptions.LlamaServerOptions.Model)
}
if inst.GetPort() != 8080 {
t.Errorf("Expected port 8080, got %d", inst.GetPort())
@@ -57,84 +77,89 @@ func TestNewInstance(t *testing.T) {
if opts.RestartDelay == nil || *opts.RestartDelay != 5 {
t.Errorf("Expected RestartDelay to be 5 (default), got %v", opts.RestartDelay)
}
}
func TestNewInstance_WithRestartOptions(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
}
// Override some defaults
// Test that explicit values override defaults
autoRestart := false
maxRestarts := 10
restartDelay := 15
options := &instance.CreateInstanceOptions{
AutoRestart: &autoRestart,
MaxRestarts: &maxRestarts,
RestartDelay: &restartDelay,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
optionsWithOverrides := &instance.Options{
AutoRestart: &autoRestart,
MaxRestarts: &maxRestarts,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst2 := instance.New("test-override", globalConfig, optionsWithOverrides, mockOnStatusChange)
opts2 := inst2.GetOptions()
instance := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
opts := instance.GetOptions()
// Check that explicit values override defaults
if opts.AutoRestart == nil || *opts.AutoRestart {
if opts2.AutoRestart == nil || *opts2.AutoRestart {
t.Error("Expected AutoRestart to be false (overridden)")
}
if opts.MaxRestarts == nil || *opts.MaxRestarts != 10 {
t.Errorf("Expected MaxRestarts to be 10 (overridden), got %v", opts.MaxRestarts)
}
if opts.RestartDelay == nil || *opts.RestartDelay != 15 {
t.Errorf("Expected RestartDelay to be 15 (overridden), got %v", opts.RestartDelay)
if opts2.MaxRestarts == nil || *opts2.MaxRestarts != 10 {
t.Errorf("Expected MaxRestarts to be 10 (overridden), got %v", opts2.MaxRestarts)
}
}
func TestSetOptions(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "llama-server",
Args: []string{},
},
MLX: config.BackendSettings{
Command: "mlx_lm.server",
Args: []string{},
},
VLLM: config.BackendSettings{
Command: "vllm",
Args: []string{"serve"},
},
},
Instances: config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
initialOptions := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
initialOptions := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
mockOnStatusChange := func(oldStatus, newStatus instance.Status) {}
inst := instance.NewInstance("test-instance", globalSettings, initialOptions, mockOnStatusChange)
inst := instance.New("test-instance", globalConfig, initialOptions, mockOnStatusChange)
// Update options
newOptions := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/new-model.gguf",
Port: 8081,
newOptions := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/new-model.gguf",
Port: 8081,
},
},
}
inst.SetOptions(newOptions)
opts := inst.GetOptions()
if opts.LlamaServerOptions.Model != "/path/to/new-model.gguf" {
t.Errorf("Expected updated model '/path/to/new-model.gguf', got %q", opts.LlamaServerOptions.Model)
if opts.BackendOptions.LlamaServerOptions.Model != "/path/to/new-model.gguf" {
t.Errorf("Expected updated model '/path/to/new-model.gguf', got %q", opts.BackendOptions.LlamaServerOptions.Model)
}
if inst.GetPort() != 8081 {
t.Errorf("Expected updated port 8081, got %d", inst.GetPort())
@@ -147,22 +172,43 @@ func TestSetOptions(t *testing.T) {
}
func TestGetProxy(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "llama-server",
Args: []string{},
},
MLX: config.BackendSettings{
Command: "mlx_lm.server",
Args: []string{},
},
VLLM: config.BackendSettings{
Command: "vllm",
Args: []string{"serve"},
},
},
Instances: config.InstancesConfig{
LogsDir: "/tmp/test",
},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Host: "localhost",
Port: 8080,
options := &instance.Options{
Nodes: map[string]struct{}{"main": {}},
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Host: "localhost",
Port: 8080,
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
mockOnStatusChange := func(oldStatus, newStatus instance.Status) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
inst := instance.New("test-instance", globalConfig, options, mockOnStatusChange)
// Get proxy for the first time
proxy1, err := inst.GetProxy()
@@ -184,35 +230,34 @@ func TestGetProxy(t *testing.T) {
}
func TestMarshalJSON(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.New("test-instance", globalConfig, options, nil)
instance := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
data, err := json.Marshal(instance)
data, err := json.Marshal(inst)
if err != nil {
t.Fatalf("JSON marshal failed: %v", err)
}
// Check that JSON contains expected fields
// Verify by unmarshaling and checking key fields
var result map[string]any
err = json.Unmarshal(data, &result)
if err != nil {
if err := json.Unmarshal(data, &result); err != nil {
t.Fatalf("JSON unmarshal failed: %v", err)
}
@@ -222,37 +267,9 @@ func TestMarshalJSON(t *testing.T) {
if result["status"] != "stopped" {
t.Errorf("Expected status 'stopped', got %v", result["status"])
}
// Check that options are included
options_data, ok := result["options"]
if !ok {
if result["options"] == nil {
t.Error("Expected options to be included in JSON")
}
options_map, ok := options_data.(map[string]interface{})
if !ok {
t.Error("Expected options to be a map")
}
// Check backend type
if options_map["backend_type"] != string(backends.BackendTypeLlamaCpp) {
t.Errorf("Expected backend_type '%s', got %v", backends.BackendTypeLlamaCpp, options_map["backend_type"])
}
// Check backend options
backend_options_data, ok := options_map["backend_options"]
if !ok {
t.Error("Expected backend_options to be included in JSON")
}
backend_options_map, ok := backend_options_data.(map[string]any)
if !ok {
t.Error("Expected backend_options to be a map")
}
if backend_options_map["model"] != "/path/to/model.gguf" {
t.Errorf("Expected model '/path/to/model.gguf', got %v", backend_options_map["model"])
}
if backend_options_map["port"] != float64(8080) {
t.Errorf("Expected port 8080, got %v", backend_options_map["port"])
}
}
func TestUnmarshalJSON(t *testing.T) {
@@ -270,7 +287,7 @@ func TestUnmarshalJSON(t *testing.T) {
}
}`
var inst instance.Process
var inst instance.Instance
err := json.Unmarshal([]byte(jsonData), &inst)
if err != nil {
t.Fatalf("JSON unmarshal failed: %v", err)
@@ -287,14 +304,14 @@ func TestUnmarshalJSON(t *testing.T) {
if opts == nil {
t.Fatal("Expected options to be set")
}
if opts.BackendType != backends.BackendTypeLlamaCpp {
t.Errorf("Expected backend_type '%s', got %s", backends.BackendTypeLlamaCpp, opts.BackendType)
if opts.BackendOptions.BackendType != backends.BackendTypeLlamaCpp {
t.Errorf("Expected backend_type '%s', got %s", backends.BackendTypeLlamaCpp, opts.BackendOptions.BackendType)
}
if opts.LlamaServerOptions == nil {
if opts.BackendOptions.LlamaServerOptions == nil {
t.Fatal("Expected LlamaServerOptions to be set")
}
if opts.LlamaServerOptions.Model != "/path/to/model.gguf" {
t.Errorf("Expected model '/path/to/model.gguf', got %q", opts.LlamaServerOptions.Model)
if opts.BackendOptions.LlamaServerOptions.Model != "/path/to/model.gguf" {
t.Errorf("Expected model '/path/to/model.gguf', got %q", opts.BackendOptions.LlamaServerOptions.Model)
}
if inst.GetPort() != 8080 {
t.Errorf("Expected port 8080, got %d", inst.GetPort())
@@ -307,7 +324,7 @@ func TestUnmarshalJSON(t *testing.T) {
}
}
func TestCreateInstanceOptionsValidation(t *testing.T) {
func TestCreateOptionsValidation(t *testing.T) {
tests := []struct {
name string
maxRestarts *int
@@ -338,25 +355,45 @@ func TestCreateInstanceOptionsValidation(t *testing.T) {
},
}
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "llama-server",
Args: []string{},
},
MLX: config.BackendSettings{
Command: "mlx_lm.server",
Args: []string{},
},
VLLM: config.BackendSettings{
Command: "vllm",
Args: []string{"serve"},
},
},
Instances: config.InstancesConfig{
LogsDir: "/tmp/test",
},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
options := &instance.CreateInstanceOptions{
options := &instance.Options{
MaxRestarts: tt.maxRestarts,
RestartDelay: tt.restartDelay,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
mockOnStatusChange := func(oldStatus, newStatus instance.Status) {}
instance := instance.NewInstance("test", globalSettings, options, mockOnStatusChange)
instance := instance.New("test", globalConfig, options, mockOnStatusChange)
opts := instance.GetOptions()
if opts.MaxRestarts == nil {
@@ -373,3 +410,300 @@ func TestCreateInstanceOptionsValidation(t *testing.T) {
})
}
}
func TestStatusChangeCallback(t *testing.T) {
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
var callbackOldStatus, callbackNewStatus instance.Status
callbackCalled := false
onStatusChange := func(oldStatus, newStatus instance.Status) {
callbackOldStatus = oldStatus
callbackNewStatus = newStatus
callbackCalled = true
}
inst := instance.New("test", globalConfig, options, onStatusChange)
inst.SetStatus(instance.Running)
if !callbackCalled {
t.Error("Expected status change callback to be called")
}
if callbackOldStatus != instance.Stopped {
t.Errorf("Expected old status Stopped, got %v", callbackOldStatus)
}
if callbackNewStatus != instance.Running {
t.Errorf("Expected new status Running, got %v", callbackNewStatus)
}
}
func TestSetOptions_NodesPreserved(t *testing.T) {
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
tests := []struct {
name string
initialNodes map[string]struct{}
updateNodes map[string]struct{}
expectedNodes map[string]struct{}
}{
{
name: "nil nodes preserved as nil",
initialNodes: nil,
updateNodes: map[string]struct{}{"worker1": {}},
expectedNodes: nil,
},
{
name: "empty nodes preserved as empty",
initialNodes: map[string]struct{}{},
updateNodes: map[string]struct{}{"worker1": {}},
expectedNodes: map[string]struct{}{},
},
{
name: "existing nodes preserved",
initialNodes: map[string]struct{}{"worker1": {}, "worker2": {}},
updateNodes: map[string]struct{}{"worker3": {}},
expectedNodes: map[string]struct{}{"worker1": {}, "worker2": {}},
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
options := &instance.Options{
Nodes: tt.initialNodes,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
inst := instance.New("test", globalConfig, options, nil)
// Attempt to update nodes (should be ignored)
updateOptions := &instance.Options{
Nodes: tt.updateNodes,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/new-model.gguf",
},
},
}
inst.SetOptions(updateOptions)
opts := inst.GetOptions()
// Verify nodes are preserved
if len(opts.Nodes) != len(tt.expectedNodes) {
t.Errorf("Expected %d nodes, got %d", len(tt.expectedNodes), len(opts.Nodes))
}
for node := range tt.expectedNodes {
if _, exists := opts.Nodes[node]; !exists {
t.Errorf("Expected node %s to exist", node)
}
}
// Verify other options were updated
if opts.BackendOptions.LlamaServerOptions.Model != "/path/to/new-model.gguf" {
t.Errorf("Expected model to be updated to '/path/to/new-model.gguf', got %q", opts.BackendOptions.LlamaServerOptions.Model)
}
})
}
}
func TestProcessErrorCases(t *testing.T) {
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
inst := instance.New("test", globalConfig, options, nil)
// Stop when not running should return error
err := inst.Stop()
if err == nil {
t.Error("Expected error when stopping non-running instance")
}
// Simulate running state
inst.SetStatus(instance.Running)
// Start when already running should return error
err = inst.Start()
if err == nil {
t.Error("Expected error when starting already running instance")
}
}
func TestRemoteInstanceOperations(t *testing.T) {
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{
"remote-node": {Address: "http://remote-node:8080"},
},
LocalNode: "main",
}
options := &instance.Options{
Nodes: map[string]struct{}{"remote-node": {}}, // Remote instance
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
inst := instance.New("remote-test", globalConfig, options, nil)
if !inst.IsRemote() {
t.Error("Expected instance to be remote")
}
// Start should fail for remote instance
if err := inst.Start(); err == nil {
t.Error("Expected error when starting remote instance")
}
// Stop should fail for remote instance
if err := inst.Stop(); err == nil {
t.Error("Expected error when stopping remote instance")
}
// Restart should fail for remote instance
if err := inst.Restart(); err == nil {
t.Error("Expected error when restarting remote instance")
}
// GetProxy should fail for remote instance
if _, err := inst.GetProxy(); err != nil {
t.Error("Expected no error when getting proxy for remote instance")
}
// GetLogs should fail for remote instance
if _, err := inst.GetLogs(10); err == nil {
t.Error("Expected error when getting logs for remote instance")
}
}
func TestIdleTimeout(t *testing.T) {
globalConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{Command: "llama-server"},
},
Instances: config.InstancesConfig{LogsDir: "/tmp/test"},
Nodes: map[string]config.NodeConfig{},
LocalNode: "main",
}
t.Run("not running never times out", func(t *testing.T) {
timeout := 1
inst := instance.New("test", globalConfig, &instance.Options{
IdleTimeout: &timeout,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}, nil)
if inst.ShouldTimeout() {
t.Error("Non-running instance should never timeout")
}
})
t.Run("no timeout configured", func(t *testing.T) {
inst := instance.New("test", globalConfig, &instance.Options{
IdleTimeout: nil, // No timeout
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}, nil)
inst.SetStatus(instance.Running)
if inst.ShouldTimeout() {
t.Error("Instance with no timeout configured should not timeout")
}
})
t.Run("timeout exceeded", func(t *testing.T) {
timeout := 1 // 1 minute
inst := instance.New("test", globalConfig, &instance.Options{
IdleTimeout: &timeout,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Host: "localhost",
Port: 8080,
},
},
}, nil)
inst.SetStatus(instance.Running)
// Use mock time provider
mockTime := &mockTimeProvider{currentTime: time.Now().Unix()}
inst.SetTimeProvider(mockTime)
// Set last request time to now
inst.UpdateLastRequestTime()
// Advance time by 2 minutes (exceeds 1 minute timeout)
mockTime.currentTime = time.Now().Add(2 * time.Minute).Unix()
if !inst.ShouldTimeout() {
t.Error("Instance should timeout when idle time exceeds configured timeout")
}
})
}
// mockTimeProvider for timeout testing
type mockTimeProvider struct {
currentTime int64 // Unix timestamp
}
func (m *mockTimeProvider) Now() time.Time {
return time.Unix(m.currentTime, 0)
}

View File

@@ -1,345 +0,0 @@
package instance
import (
"context"
"fmt"
"log"
"net/http"
"os/exec"
"runtime"
"syscall"
"time"
)
// Start starts the llama server instance and returns an error if it fails.
func (i *Process) Start() error {
i.mu.Lock()
defer i.mu.Unlock()
if i.IsRunning() {
return fmt.Errorf("instance %s is already running", i.Name)
}
// Safety check: ensure options are valid
if i.options == nil {
return fmt.Errorf("instance %s has no options set", i.Name)
}
// Reset restart counter when manually starting (not during auto-restart)
// We can detect auto-restart by checking if restartCancel is set
if i.restartCancel == nil {
i.restarts = 0
}
// Initialize last request time to current time when starting
i.lastRequestTime.Store(i.timeProvider.Now().Unix())
// Create log files
if err := i.logger.Create(); err != nil {
return fmt.Errorf("failed to create log files: %w", err)
}
args := i.options.BuildCommandArgs()
i.ctx, i.cancel = context.WithCancel(context.Background())
i.cmd = exec.CommandContext(i.ctx, "llama-server", args...)
if runtime.GOOS != "windows" {
setProcAttrs(i.cmd)
}
var err error
i.stdout, err = i.cmd.StdoutPipe()
if err != nil {
i.logger.Close()
return fmt.Errorf("failed to get stdout pipe: %w", err)
}
i.stderr, err = i.cmd.StderrPipe()
if err != nil {
i.stdout.Close()
i.logger.Close()
return fmt.Errorf("failed to get stderr pipe: %w", err)
}
if err := i.cmd.Start(); err != nil {
return fmt.Errorf("failed to start instance %s: %w", i.Name, err)
}
i.SetStatus(Running)
// Create channel for monitor completion signaling
i.monitorDone = make(chan struct{})
go i.logger.readOutput(i.stdout)
go i.logger.readOutput(i.stderr)
go i.monitorProcess()
return nil
}
// Stop terminates the subprocess
func (i *Process) Stop() error {
i.mu.Lock()
if !i.IsRunning() {
// Even if not running, cancel any pending restart
if i.restartCancel != nil {
i.restartCancel()
i.restartCancel = nil
log.Printf("Cancelled pending restart for instance %s", i.Name)
}
i.mu.Unlock()
return fmt.Errorf("instance %s is not running", i.Name)
}
// Cancel any pending restart
if i.restartCancel != nil {
i.restartCancel()
i.restartCancel = nil
}
// Set status to stopped first to signal intentional stop
i.SetStatus(Stopped)
// Clean up the proxy
i.proxy = nil
// Get the monitor done channel before releasing the lock
monitorDone := i.monitorDone
i.mu.Unlock()
// Stop the process with SIGINT if cmd exists
if i.cmd != nil && i.cmd.Process != nil {
if err := i.cmd.Process.Signal(syscall.SIGINT); err != nil {
log.Printf("Failed to send SIGINT to instance %s: %v", i.Name, err)
}
}
// If no process exists, we can return immediately
if i.cmd == nil || monitorDone == nil {
i.logger.Close()
return nil
}
select {
case <-monitorDone:
// Process exited normally
case <-time.After(30 * time.Second):
// Force kill if it doesn't exit within 30 seconds
if i.cmd != nil && i.cmd.Process != nil {
killErr := i.cmd.Process.Kill()
if killErr != nil {
log.Printf("Failed to force kill instance %s: %v", i.Name, killErr)
}
log.Printf("Instance %s did not stop in time, force killed", i.Name)
// Wait a bit more for the monitor to finish after force kill
select {
case <-monitorDone:
// Monitor completed after force kill
case <-time.After(2 * time.Second):
log.Printf("Warning: Monitor goroutine did not complete after force kill for instance %s", i.Name)
}
}
}
i.logger.Close()
return nil
}
func (i *Process) LastRequestTime() int64 {
return i.lastRequestTime.Load()
}
func (i *Process) WaitForHealthy(timeout int) error {
if !i.IsRunning() {
return fmt.Errorf("instance %s is not running", i.Name)
}
if timeout <= 0 {
timeout = 30 // Default to 30 seconds if no timeout is specified
}
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeout)*time.Second)
defer cancel()
// Get instance options to build the health check URL
opts := i.GetOptions()
if opts == nil {
return fmt.Errorf("instance %s has no options set", i.Name)
}
// Build the health check URL directly
var host string
var port int
switch opts.BackendType {
case "llama-cpp":
host = opts.LlamaServerOptions.Host
port = opts.LlamaServerOptions.Port
}
if host == "" {
host = "localhost"
}
healthURL := fmt.Sprintf("http://%s:%d/health", host, port)
// Create a dedicated HTTP client for health checks
client := &http.Client{
Timeout: 5 * time.Second, // 5 second timeout per request
}
// Helper function to check health directly
checkHealth := func() bool {
req, err := http.NewRequestWithContext(ctx, "GET", healthURL, nil)
if err != nil {
return false
}
resp, err := client.Do(req)
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode == http.StatusOK
}
// Try immediate check first
if checkHealth() {
return nil // Instance is healthy
}
// If immediate check failed, start polling
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return fmt.Errorf("timeout waiting for instance %s to become healthy after %d seconds", i.Name, timeout)
case <-ticker.C:
if checkHealth() {
return nil // Instance is healthy
}
// Continue polling
}
}
}
func (i *Process) monitorProcess() {
defer func() {
i.mu.Lock()
if i.monitorDone != nil {
close(i.monitorDone)
i.monitorDone = nil
}
i.mu.Unlock()
}()
err := i.cmd.Wait()
i.mu.Lock()
// Check if the instance was intentionally stopped
if !i.IsRunning() {
i.mu.Unlock()
return
}
i.SetStatus(Stopped)
i.logger.Close()
// Cancel any existing restart context since we're handling a new exit
if i.restartCancel != nil {
i.restartCancel()
i.restartCancel = nil
}
// Log the exit
if err != nil {
log.Printf("Instance %s crashed with error: %v", i.Name, err)
// Handle restart while holding the lock, then release it
i.handleRestart()
} else {
log.Printf("Instance %s exited cleanly", i.Name)
i.mu.Unlock()
}
}
// handleRestart manages the restart process while holding the lock
func (i *Process) handleRestart() {
// Validate restart conditions and get safe parameters
shouldRestart, maxRestarts, restartDelay := i.validateRestartConditions()
if !shouldRestart {
i.SetStatus(Failed)
i.mu.Unlock()
return
}
i.restarts++
log.Printf("Auto-restarting instance %s (attempt %d/%d) in %v",
i.Name, i.restarts, maxRestarts, time.Duration(restartDelay)*time.Second)
// Create a cancellable context for the restart delay
restartCtx, cancel := context.WithCancel(context.Background())
i.restartCancel = cancel
// Release the lock before sleeping
i.mu.Unlock()
// Use context-aware sleep so it can be cancelled
select {
case <-time.After(time.Duration(restartDelay) * time.Second):
// Sleep completed normally, continue with restart
case <-restartCtx.Done():
// Restart was cancelled
log.Printf("Restart cancelled for instance %s", i.Name)
return
}
// Restart the instance
if err := i.Start(); err != nil {
log.Printf("Failed to restart instance %s: %v", i.Name, err)
} else {
log.Printf("Successfully restarted instance %s", i.Name)
// Clear the cancel function
i.mu.Lock()
i.restartCancel = nil
i.mu.Unlock()
}
}
// validateRestartConditions checks if the instance should be restarted and returns the parameters
func (i *Process) validateRestartConditions() (shouldRestart bool, maxRestarts int, restartDelay int) {
if i.options == nil {
log.Printf("Instance %s not restarting: options are nil", i.Name)
return false, 0, 0
}
if i.options.AutoRestart == nil || !*i.options.AutoRestart {
log.Printf("Instance %s not restarting: AutoRestart is disabled", i.Name)
return false, 0, 0
}
if i.options.MaxRestarts == nil {
log.Printf("Instance %s not restarting: MaxRestarts is nil", i.Name)
return false, 0, 0
}
if i.options.RestartDelay == nil {
log.Printf("Instance %s not restarting: RestartDelay is nil", i.Name)
return false, 0, 0
}
// Values are already validated during unmarshaling/SetOptions
maxRestarts = *i.options.MaxRestarts
restartDelay = *i.options.RestartDelay
if i.restarts >= maxRestarts {
log.Printf("Instance %s exceeded max restart attempts (%d)", i.Name, maxRestarts)
return false, 0, 0
}
return true, maxRestarts, restartDelay
}

View File

@@ -6,25 +6,30 @@ import (
"io"
"os"
"strings"
"sync"
"time"
)
type InstanceLogger struct {
type logger struct {
name string
logDir string
logFile *os.File
logFilePath string
mu sync.RWMutex
}
func NewInstanceLogger(name string, logDir string) *InstanceLogger {
return &InstanceLogger{
func newLogger(name string, logDir string) *logger {
return &logger{
name: name,
logDir: logDir,
}
}
// Create creates and opens the log files for stdout and stderr
func (i *InstanceLogger) Create() error {
// create creates and opens the log files for stdout and stderr
func (i *logger) create() error {
i.mu.Lock()
defer i.mu.Unlock()
if i.logDir == "" {
return fmt.Errorf("logDir is empty for instance %s", i.name)
}
@@ -51,17 +56,16 @@ func (i *InstanceLogger) Create() error {
return nil
}
// GetLogs retrieves the last n lines of logs from the instance
func (i *Process) GetLogs(num_lines int) (string, error) {
// getLogs retrieves the last n lines of logs from the instance
func (i *logger) getLogs(num_lines int) (string, error) {
i.mu.RLock()
logFileName := i.logger.logFilePath
i.mu.RUnlock()
defer i.mu.RUnlock()
if logFileName == "" {
return "", fmt.Errorf("log file not created for instance %s", i.Name)
if i.logFilePath == "" {
return "", fmt.Errorf("log file not created for instance %s", i.name)
}
file, err := os.Open(logFileName)
file, err := os.Open(i.logFilePath)
if err != nil {
return "", fmt.Errorf("failed to open log file: %w", err)
}
@@ -93,8 +97,11 @@ func (i *Process) GetLogs(num_lines int) (string, error) {
return strings.Join(lines[start:], "\n"), nil
}
// closeLogFile closes the log files
func (i *InstanceLogger) Close() {
// close closes the log files
func (i *logger) close() {
i.mu.Lock()
defer i.mu.Unlock()
if i.logFile != nil {
timestamp := time.Now().Format("2006-01-02 15:04:05")
fmt.Fprintf(i.logFile, "=== Instance %s stopped at %s ===\n\n", i.name, timestamp)
@@ -104,7 +111,7 @@ func (i *InstanceLogger) Close() {
}
// readOutput reads from the given reader and writes lines to the log file
func (i *InstanceLogger) readOutput(reader io.ReadCloser) {
func (i *logger) readOutput(reader io.ReadCloser) {
defer reader.Close()
scanner := bufio.NewScanner(reader)

View File

@@ -4,12 +4,14 @@ import (
"encoding/json"
"fmt"
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"log"
"slices"
"sync"
)
type CreateInstanceOptions struct {
// Options contains the actual configuration (exported - this is the public API).
type Options struct {
// Auto restart
AutoRestart *bool `json:"auto_restart,omitempty"`
MaxRestarts *int `json:"max_restarts,omitempty"`
@@ -18,19 +20,79 @@ type CreateInstanceOptions struct {
OnDemandStart *bool `json:"on_demand_start,omitempty"`
// Idle timeout
IdleTimeout *int `json:"idle_timeout,omitempty"` // minutes
BackendType backends.BackendType `json:"backend_type"`
BackendOptions map[string]any `json:"backend_options,omitempty"`
// LlamaServerOptions contains the options for the llama server
LlamaServerOptions *llamacpp.LlamaServerOptions `json:"-"`
// Environment variables
Environment map[string]string `json:"environment,omitempty"`
// Assigned nodes
Nodes map[string]struct{} `json:"-"`
// Backend options
BackendOptions backends.Options `json:"-"`
}
// UnmarshalJSON implements custom JSON unmarshaling for CreateInstanceOptions
func (c *CreateInstanceOptions) UnmarshalJSON(data []byte) error {
// options wraps Options with thread-safe access (unexported).
type options struct {
mu sync.RWMutex
opts *Options
}
// newOptions creates a new options wrapper with the given Options
func newOptions(opts *Options) *options {
return &options{
opts: opts,
}
}
// get returns a copy of the current options
func (o *options) get() *Options {
o.mu.RLock()
defer o.mu.RUnlock()
return o.opts
}
// set updates the options
func (o *options) set(opts *Options) {
o.mu.Lock()
defer o.mu.Unlock()
o.opts = opts
}
func (o *options) GetHost() string {
o.mu.RLock()
defer o.mu.RUnlock()
return o.opts.BackendOptions.GetHost()
}
func (o *options) GetPort() int {
o.mu.RLock()
defer o.mu.RUnlock()
return o.opts.BackendOptions.GetPort()
}
// MarshalJSON implements json.Marshaler for options wrapper
func (o *options) MarshalJSON() ([]byte, error) {
o.mu.RLock()
defer o.mu.RUnlock()
return o.opts.MarshalJSON()
}
// UnmarshalJSON implements json.Unmarshaler for options wrapper
func (o *options) UnmarshalJSON(data []byte) error {
o.mu.Lock()
defer o.mu.Unlock()
if o.opts == nil {
o.opts = &Options{}
}
return o.opts.UnmarshalJSON(data)
}
// UnmarshalJSON implements custom JSON unmarshaling for Options
func (c *Options) UnmarshalJSON(data []byte) error {
// Use anonymous struct to avoid recursion
type Alias CreateInstanceOptions
type Alias Options
aux := &struct {
Nodes []string `json:"nodes,omitempty"`
BackendType backends.BackendType `json:"backend_type"`
BackendOptions map[string]any `json:"backend_options,omitempty"`
*Alias
}{
Alias: (*Alias)(c),
@@ -40,58 +102,88 @@ func (c *CreateInstanceOptions) UnmarshalJSON(data []byte) error {
return err
}
// Parse backend-specific options
switch c.BackendType {
case backends.BackendTypeLlamaCpp:
if c.BackendOptions != nil {
// Convert map to JSON and then unmarshal to LlamaServerOptions
optionsData, err := json.Marshal(c.BackendOptions)
if err != nil {
return fmt.Errorf("failed to marshal backend options: %w", err)
}
c.LlamaServerOptions = &llamacpp.LlamaServerOptions{}
if err := json.Unmarshal(optionsData, c.LlamaServerOptions); err != nil {
return fmt.Errorf("failed to unmarshal llama.cpp options: %w", err)
}
// Convert nodes array to map
if len(aux.Nodes) > 0 {
c.Nodes = make(map[string]struct{}, len(aux.Nodes))
for _, node := range aux.Nodes {
c.Nodes[node] = struct{}{}
}
default:
return fmt.Errorf("unknown backend type: %s", c.BackendType)
}
// Create backend options struct and unmarshal
c.BackendOptions = backends.Options{
BackendType: aux.BackendType,
BackendOptions: aux.BackendOptions,
}
// Marshal the backend options to JSON for proper unmarshaling
backendJson, err := json.Marshal(struct {
BackendType backends.BackendType `json:"backend_type"`
BackendOptions map[string]any `json:"backend_options,omitempty"`
}{
BackendType: aux.BackendType,
BackendOptions: aux.BackendOptions,
})
if err != nil {
return fmt.Errorf("failed to marshal backend options: %w", err)
}
// Unmarshal into the backends.Options struct to trigger its custom unmarshaling
if err := json.Unmarshal(backendJson, &c.BackendOptions); err != nil {
return fmt.Errorf("failed to unmarshal backend options: %w", err)
}
return nil
}
// MarshalJSON implements custom JSON marshaling for CreateInstanceOptions
func (c *CreateInstanceOptions) MarshalJSON() ([]byte, error) {
// MarshalJSON implements custom JSON marshaling for Options
func (c *Options) MarshalJSON() ([]byte, error) {
// Use anonymous struct to avoid recursion
type Alias CreateInstanceOptions
type Alias Options
aux := struct {
Nodes []string `json:"nodes,omitempty"` // Output as JSON array
BackendType backends.BackendType `json:"backend_type"`
BackendOptions map[string]any `json:"backend_options,omitempty"`
*Alias
}{
Alias: (*Alias)(c),
}
// Convert LlamaServerOptions back to BackendOptions map for JSON
if c.BackendType == backends.BackendTypeLlamaCpp && c.LlamaServerOptions != nil {
data, err := json.Marshal(c.LlamaServerOptions)
if err != nil {
return nil, fmt.Errorf("failed to marshal llama server options: %w", err)
// Convert nodes map to array (sorted for consistency)
if len(c.Nodes) > 0 {
aux.Nodes = make([]string, 0, len(c.Nodes))
for node := range c.Nodes {
aux.Nodes = append(aux.Nodes, node)
}
var backendOpts map[string]any
if err := json.Unmarshal(data, &backendOpts); err != nil {
return nil, fmt.Errorf("failed to unmarshal to map: %w", err)
}
aux.BackendOptions = backendOpts
// Sort for consistent output
slices.Sort(aux.Nodes)
}
// Set backend type
aux.BackendType = c.BackendOptions.BackendType
// Marshal the backends.Options struct to get the properly formatted backend options
// Marshal a pointer to trigger the pointer receiver MarshalJSON method
backendData, err := json.Marshal(&c.BackendOptions)
if err != nil {
return nil, fmt.Errorf("failed to marshal backend options: %w", err)
}
// Unmarshal into a temporary struct to extract the backend_options map
var tempBackend struct {
BackendOptions map[string]any `json:"backend_options,omitempty"`
}
if err := json.Unmarshal(backendData, &tempBackend); err != nil {
return nil, fmt.Errorf("failed to unmarshal backend data: %w", err)
}
aux.BackendOptions = tempBackend.BackendOptions
return json.Marshal(aux)
}
// ValidateAndApplyDefaults validates the instance options and applies constraints
func (c *CreateInstanceOptions) ValidateAndApplyDefaults(name string, globalSettings *config.InstancesConfig) {
// validateAndApplyDefaults validates the instance options and applies constraints
func (c *Options) validateAndApplyDefaults(name string, globalSettings *config.InstancesConfig) {
// Validate and apply constraints
if c.MaxRestarts != nil && *c.MaxRestarts < 0 {
log.Printf("Instance %s MaxRestarts value (%d) cannot be negative, setting to 0", name, *c.MaxRestarts)
@@ -128,14 +220,3 @@ func (c *CreateInstanceOptions) ValidateAndApplyDefaults(name string, globalSett
}
}
}
// BuildCommandArgs builds command line arguments for the backend
func (c *CreateInstanceOptions) BuildCommandArgs() []string {
switch c.BackendType {
case backends.BackendTypeLlamaCpp:
if c.LlamaServerOptions != nil {
return c.LlamaServerOptions.BuildCommandArgs()
}
}
return []string{}
}

413
pkg/instance/process.go Normal file
View File

@@ -0,0 +1,413 @@
package instance
import (
"context"
"fmt"
"io"
"log"
"net/http"
"os"
"os/exec"
"runtime"
"sync"
"syscall"
"time"
)
// process manages the OS process lifecycle for a local instance.
// process owns its complete lifecycle including auto-restart logic.
type process struct {
instance *Instance // Back-reference for SetStatus, GetOptions
mu sync.RWMutex
cmd *exec.Cmd
ctx context.Context
cancel context.CancelFunc
stdout io.ReadCloser
stderr io.ReadCloser
restarts int
restartCancel context.CancelFunc
monitorDone chan struct{}
}
// newProcess creates a new process component for the given instance
func newProcess(instance *Instance) *process {
return &process{
instance: instance,
}
}
// start starts the OS process and returns an error if it fails.
func (p *process) start() error {
p.mu.Lock()
defer p.mu.Unlock()
if p.instance.IsRunning() {
return fmt.Errorf("instance %s is already running", p.instance.Name)
}
// Safety check: ensure options are valid
if p.instance.options == nil {
return fmt.Errorf("instance %s has no options set", p.instance.Name)
}
// Reset restart counter when manually starting (not during auto-restart)
// We can detect auto-restart by checking if restartCancel is set
if p.restartCancel == nil {
p.restarts = 0
}
// Initialize last request time to current time when starting
if p.instance.proxy != nil {
p.instance.proxy.updateLastRequestTime()
}
// Create context before building command (needed for CommandContext)
p.ctx, p.cancel = context.WithCancel(context.Background())
// Create log files
if err := p.instance.logger.create(); err != nil {
return fmt.Errorf("failed to create log files: %w", err)
}
// Build command using backend-specific methods
cmd, cmdErr := p.buildCommand()
if cmdErr != nil {
return fmt.Errorf("failed to build command: %w", cmdErr)
}
p.cmd = cmd
if runtime.GOOS != "windows" {
setProcAttrs(p.cmd)
}
var err error
p.stdout, err = p.cmd.StdoutPipe()
if err != nil {
p.instance.logger.close()
return fmt.Errorf("failed to get stdout pipe: %w", err)
}
p.stderr, err = p.cmd.StderrPipe()
if err != nil {
p.stdout.Close()
p.instance.logger.close()
return fmt.Errorf("failed to get stderr pipe: %w", err)
}
if err := p.cmd.Start(); err != nil {
return fmt.Errorf("failed to start instance %s: %w", p.instance.Name, err)
}
p.instance.SetStatus(Running)
// Create channel for monitor completion signaling
p.monitorDone = make(chan struct{})
go p.instance.logger.readOutput(p.stdout)
go p.instance.logger.readOutput(p.stderr)
go p.monitorProcess()
return nil
}
// stop terminates the subprocess without restarting
func (p *process) stop() error {
p.mu.Lock()
if !p.instance.IsRunning() {
// Even if not running, cancel any pending restart
if p.restartCancel != nil {
p.restartCancel()
p.restartCancel = nil
log.Printf("Cancelled pending restart for instance %s", p.instance.Name)
}
p.mu.Unlock()
return fmt.Errorf("instance %s is not running", p.instance.Name)
}
// Cancel any pending restart
if p.restartCancel != nil {
p.restartCancel()
p.restartCancel = nil
}
// Set status to stopped first to signal intentional stop
p.instance.SetStatus(Stopped)
// Get the monitor done channel before releasing the lock
monitorDone := p.monitorDone
p.mu.Unlock()
// Stop the process with SIGINT if cmd exists
if p.cmd != nil && p.cmd.Process != nil {
if err := p.cmd.Process.Signal(syscall.SIGINT); err != nil {
log.Printf("Failed to send SIGINT to instance %s: %v", p.instance.Name, err)
}
}
// If no process exists, we can return immediately
if p.cmd == nil || monitorDone == nil {
p.instance.logger.close()
return nil
}
select {
case <-monitorDone:
// Process exited normally
case <-time.After(30 * time.Second):
// Force kill if it doesn't exit within 30 seconds
if p.cmd != nil && p.cmd.Process != nil {
killErr := p.cmd.Process.Kill()
if killErr != nil {
log.Printf("Failed to force kill instance %s: %v", p.instance.Name, killErr)
}
log.Printf("Instance %s did not stop in time, force killed", p.instance.Name)
// Wait a bit more for the monitor to finish after force kill
select {
case <-monitorDone:
// Monitor completed after force kill
case <-time.After(2 * time.Second):
log.Printf("Warning: Monitor goroutine did not complete after force kill for instance %s", p.instance.Name)
}
}
}
p.instance.logger.close()
return nil
}
// restart manually restarts the process (resets restart counter)
func (p *process) restart() error {
// Stop the process first
if err := p.stop(); err != nil {
// If it's not running, that's ok - we'll just start it
if err.Error() != fmt.Sprintf("instance %s is not running", p.instance.Name) {
return fmt.Errorf("failed to stop instance during restart: %w", err)
}
}
// Reset restart counter for manual restart
p.mu.Lock()
p.restarts = 0
p.mu.Unlock()
// Start the process
return p.start()
}
// waitForHealthy waits for the process to become healthy
func (p *process) waitForHealthy(timeout int) error {
if !p.instance.IsRunning() {
return fmt.Errorf("instance %s is not running", p.instance.Name)
}
if timeout <= 0 {
timeout = 30 // Default to 30 seconds if no timeout is specified
}
ctx, cancel := context.WithTimeout(context.Background(), time.Duration(timeout)*time.Second)
defer cancel()
// Get host/port from instance
host := p.instance.options.GetHost()
port := p.instance.options.GetPort()
healthURL := fmt.Sprintf("http://%s:%d/health", host, port)
// Create a dedicated HTTP client for health checks
client := &http.Client{
Timeout: 5 * time.Second, // 5 second timeout per request
}
// Helper function to check health directly
checkHealth := func() bool {
req, err := http.NewRequestWithContext(ctx, "GET", healthURL, nil)
if err != nil {
return false
}
resp, err := client.Do(req)
if err != nil {
return false
}
defer resp.Body.Close()
return resp.StatusCode == http.StatusOK
}
// Try immediate check first
if checkHealth() {
return nil // Instance is healthy
}
// If immediate check failed, start polling
ticker := time.NewTicker(1 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return fmt.Errorf("timeout waiting for instance %s to become healthy after %d seconds", p.instance.Name, timeout)
case <-ticker.C:
if checkHealth() {
return nil // Instance is healthy
}
// Continue polling
}
}
}
// monitorProcess monitors the OS process and handles crashes/exits
func (p *process) monitorProcess() {
defer func() {
p.mu.Lock()
if p.monitorDone != nil {
close(p.monitorDone)
p.monitorDone = nil
}
p.mu.Unlock()
}()
err := p.cmd.Wait()
p.mu.Lock()
// Check if the instance was intentionally stopped
if !p.instance.IsRunning() {
p.mu.Unlock()
return
}
p.instance.SetStatus(Stopped)
p.instance.logger.close()
// Cancel any existing restart context since we're handling a new exit
if p.restartCancel != nil {
p.restartCancel()
p.restartCancel = nil
}
// Log the exit
if err != nil {
log.Printf("Instance %s crashed with error: %v", p.instance.Name, err)
// Handle auto-restart logic
p.handleAutoRestart(err)
} else {
log.Printf("Instance %s exited cleanly", p.instance.Name)
p.mu.Unlock()
}
}
// shouldAutoRestart checks if the process should auto-restart
func (p *process) shouldAutoRestart() bool {
opts := p.instance.GetOptions()
if opts == nil {
log.Printf("Instance %s not restarting: options are nil", p.instance.Name)
return false
}
if opts.AutoRestart == nil || !*opts.AutoRestart {
log.Printf("Instance %s not restarting: AutoRestart is disabled", p.instance.Name)
return false
}
if opts.MaxRestarts == nil {
log.Printf("Instance %s not restarting: MaxRestarts is nil", p.instance.Name)
return false
}
maxRestarts := *opts.MaxRestarts
if p.restarts >= maxRestarts {
log.Printf("Instance %s exceeded max restart attempts (%d)", p.instance.Name, maxRestarts)
return false
}
return true
}
// handleAutoRestart manages the auto-restart process
func (p *process) handleAutoRestart(err error) {
// Check if should restart
if !p.shouldAutoRestart() {
p.instance.SetStatus(Failed)
p.mu.Unlock()
return
}
// Get restart parameters
opts := p.instance.GetOptions()
if opts.RestartDelay == nil {
log.Printf("Instance %s not restarting: RestartDelay is nil", p.instance.Name)
p.instance.SetStatus(Failed)
p.mu.Unlock()
return
}
restartDelay := *opts.RestartDelay
maxRestarts := *opts.MaxRestarts
p.restarts++
// Set status to Restarting instead of leaving as Stopped
p.instance.SetStatus(Restarting)
log.Printf("Auto-restarting instance %s (attempt %d/%d) in %v",
p.instance.Name, p.restarts, maxRestarts, time.Duration(restartDelay)*time.Second)
// Create a cancellable context for the restart delay
restartCtx, cancel := context.WithCancel(context.Background())
p.restartCancel = cancel
// Release the lock before sleeping
p.mu.Unlock()
// Use context-aware sleep so it can be cancelled
select {
case <-time.After(time.Duration(restartDelay) * time.Second):
// Sleep completed normally, continue with restart
case <-restartCtx.Done():
// Restart was cancelled
log.Printf("Restart cancelled for instance %s", p.instance.Name)
return
}
// Restart the instance
if err := p.start(); err != nil {
log.Printf("Failed to restart instance %s: %v", p.instance.Name, err)
} else {
log.Printf("Successfully restarted instance %s", p.instance.Name)
// Clear the cancel function
p.mu.Lock()
p.restartCancel = nil
p.mu.Unlock()
}
}
// buildCommand builds the command to execute using backend-specific logic
func (p *process) buildCommand() (*exec.Cmd, error) {
// Build the environment variables
env := p.instance.buildEnvironment()
// Get the command to execute
command := p.instance.getCommand()
// Build command arguments
args := p.instance.buildCommandArgs()
// Create the exec.Cmd
cmd := exec.CommandContext(p.ctx, command, args...)
// Start with host environment variables
cmd.Env = os.Environ()
// Add/override with backend-specific environment variables
for k, v := range env {
cmd.Env = append(cmd.Env, fmt.Sprintf("%s=%s", k, v))
}
return cmd, nil
}

201
pkg/instance/proxy.go Normal file
View File

@@ -0,0 +1,201 @@
package instance
import (
"fmt"
"net/http"
"net/http/httputil"
"net/url"
"sync"
"sync/atomic"
"time"
)
// TimeProvider interface allows for testing with mock time
type TimeProvider interface {
Now() time.Time
}
// realTimeProvider implements TimeProvider using the actual time
type realTimeProvider struct{}
func (realTimeProvider) Now() time.Time {
return time.Now()
}
// proxy manages HTTP reverse proxy and request tracking for an instance.
type proxy struct {
instance *Instance
targetURL *url.URL
apiKey string // For remote instances
responseHeaders map[string]string
mu sync.RWMutex
proxy *httputil.ReverseProxy
proxyOnce sync.Once
proxyErr error
lastRequestTime atomic.Int64
timeProvider TimeProvider
}
// newProxy creates a new Proxy for the given instance
func newProxy(instance *Instance) (*proxy, error) {
p := &proxy{
instance: instance,
timeProvider: realTimeProvider{},
}
var err error
options := instance.GetOptions()
if options == nil {
return nil, fmt.Errorf("instance %s has no options set", instance.Name)
}
if instance.IsRemote() {
// Take the first remote node as the target for now
var nodeName string
for node := range options.Nodes {
nodeName = node
break
}
if nodeName == "" {
return nil, fmt.Errorf("instance %s has no remote nodes defined", p.instance.Name)
}
node, ok := p.instance.globalNodesConfig[nodeName]
if !ok {
return nil, fmt.Errorf("remote node %s is not defined", nodeName)
}
p.targetURL, err = url.Parse(node.Address)
if err != nil {
return nil, fmt.Errorf("failed to parse target URL for remote instance %s: %w", p.instance.Name, err)
}
p.apiKey = node.APIKey
} else {
// Get host/port from process
host := p.instance.options.GetHost()
port := p.instance.options.GetPort()
if port == 0 {
return nil, fmt.Errorf("instance %s has no port assigned", p.instance.Name)
}
p.targetURL, err = url.Parse(fmt.Sprintf("http://%s:%d", host, port))
if err != nil {
return nil, fmt.Errorf("failed to parse target URL for instance %s: %w", p.instance.Name, err)
}
// Get response headers from backend config
p.responseHeaders = options.BackendOptions.GetResponseHeaders(p.instance.globalBackendSettings)
}
return p, nil
}
// get returns the reverse proxy for this instance, creating it if needed.
// Uses sync.Once to ensure thread-safe one-time initialization.
func (p *proxy) get() (*httputil.ReverseProxy, error) {
// sync.Once guarantees buildProxy() is called exactly once
// Other callers block until first initialization completes
p.proxyOnce.Do(func() {
p.proxy, p.proxyErr = p.build()
})
return p.proxy, p.proxyErr
}
// build creates the reverse proxy based on instance options
func (p *proxy) build() (*httputil.ReverseProxy, error) {
proxy := httputil.NewSingleHostReverseProxy(p.targetURL)
// Modify the request before sending it to the backend
originalDirector := proxy.Director
proxy.Director = func(req *http.Request) {
originalDirector(req)
// Add API key header for remote instances
if p.instance.IsRemote() && p.apiKey != "" {
req.Header.Set("Authorization", "Bearer "+p.apiKey)
}
// Update last request time
p.updateLastRequestTime()
}
if !p.instance.IsRemote() {
// Add custom headers to the request
proxy.ModifyResponse = func(resp *http.Response) error {
// Remove CORS headers from backend response to avoid conflicts
// llamactl will add its own CORS headers
resp.Header.Del("Access-Control-Allow-Origin")
resp.Header.Del("Access-Control-Allow-Methods")
resp.Header.Del("Access-Control-Allow-Headers")
resp.Header.Del("Access-Control-Allow-Credentials")
resp.Header.Del("Access-Control-Max-Age")
resp.Header.Del("Access-Control-Expose-Headers")
for key, value := range p.responseHeaders {
resp.Header.Set(key, value)
}
return nil
}
}
return proxy, nil
}
// clear resets the proxy, allowing it to be recreated when options change.
func (p *proxy) clear() {
p.mu.Lock()
defer p.mu.Unlock()
p.proxy = nil
p.proxyErr = nil
p.proxyOnce = sync.Once{} // Reset Once for next GetProxy call
}
// updateLastRequestTime updates the last request access time for the instance
func (p *proxy) updateLastRequestTime() {
lastRequestTime := p.timeProvider.Now().Unix()
p.lastRequestTime.Store(lastRequestTime)
}
// getLastRequestTime returns the last request time as a Unix timestamp
func (p *proxy) getLastRequestTime() int64 {
return p.lastRequestTime.Load()
}
// shouldTimeout checks if the instance should timeout based on idle time
func (p *proxy) shouldTimeout() bool {
if !p.instance.IsRunning() {
return false
}
options := p.instance.GetOptions()
if options == nil || options.IdleTimeout == nil || *options.IdleTimeout <= 0 {
return false
}
// Check if the last request time exceeds the idle timeout
lastRequest := p.lastRequestTime.Load()
idleTimeoutMinutes := *options.IdleTimeout
// Convert timeout from minutes to seconds for comparison
idleTimeoutSeconds := int64(idleTimeoutMinutes * 60)
return (p.timeProvider.Now().Unix() - lastRequest) > idleTimeoutSeconds
}
// setTimeProvider sets a custom time provider for testing
func (p *proxy) setTimeProvider(tp TimeProvider) {
p.timeProvider = tp
}

View File

@@ -3,48 +3,35 @@ package instance
import (
"encoding/json"
"log"
"sync"
)
// Enum for instance status
type InstanceStatus int
// Status is the enum for status values (exported).
type Status int
const (
Stopped InstanceStatus = iota
Stopped Status = iota
Running
Failed
Restarting
)
var nameToStatus = map[string]InstanceStatus{
"stopped": Stopped,
"running": Running,
"failed": Failed,
var nameToStatus = map[string]Status{
"stopped": Stopped,
"running": Running,
"failed": Failed,
"restarting": Restarting,
}
var statusToName = map[InstanceStatus]string{
Stopped: "stopped",
Running: "running",
Failed: "failed",
var statusToName = map[Status]string{
Stopped: "stopped",
Running: "running",
Failed: "failed",
Restarting: "restarting",
}
func (p *Process) SetStatus(status InstanceStatus) {
oldStatus := p.Status
p.Status = status
if p.onStatusChange != nil {
p.onStatusChange(oldStatus, status)
}
}
func (p *Process) GetStatus() InstanceStatus {
return p.Status
}
// IsRunning returns true if the status is Running
func (p *Process) IsRunning() bool {
return p.Status == Running
}
func (s InstanceStatus) MarshalJSON() ([]byte, error) {
// Status enum JSON marshaling methods
func (s Status) MarshalJSON() ([]byte, error) {
name, ok := statusToName[s]
if !ok {
name = "stopped" // Default to "stopped" for unknown status
@@ -52,8 +39,8 @@ func (s InstanceStatus) MarshalJSON() ([]byte, error) {
return json.Marshal(name)
}
// UnmarshalJSON implements json.Unmarshaler
func (s *InstanceStatus) UnmarshalJSON(data []byte) error {
// UnmarshalJSON implements json.Unmarshaler for Status enum
func (s *Status) UnmarshalJSON(data []byte) error {
var str string
if err := json.Unmarshal(data, &str); err != nil {
return err
@@ -68,3 +55,61 @@ func (s *InstanceStatus) UnmarshalJSON(data []byte) error {
*s = status
return nil
}
// status represents the instance status with thread-safe access (unexported).
type status struct {
mu sync.RWMutex
s Status
// Callback for status changes
onStatusChange func(oldStatus, newStatus Status)
}
// newStatus creates a new status wrapper with the given initial status
func newStatus(initial Status) *status {
return &status{
s: initial,
}
}
// get returns the current status
func (st *status) get() Status {
st.mu.RLock()
defer st.mu.RUnlock()
return st.s
}
// set updates the status and triggers the onStatusChange callback if set
func (st *status) set(newStatus Status) {
st.mu.Lock()
oldStatus := st.s
st.s = newStatus
callback := st.onStatusChange
st.mu.Unlock()
// Call the callback outside the lock to avoid potential deadlocks
if callback != nil {
callback(oldStatus, newStatus)
}
}
// isRunning returns true if the status is Running
func (st *status) isRunning() bool {
st.mu.RLock()
defer st.mu.RUnlock()
return st.s == Running
}
// MarshalJSON implements json.Marshaler for status wrapper
func (st *status) MarshalJSON() ([]byte, error) {
st.mu.RLock()
defer st.mu.RUnlock()
return st.s.MarshalJSON()
}
// UnmarshalJSON implements json.Unmarshaler for status wrapper
func (st *status) UnmarshalJSON(data []byte) error {
st.mu.Lock()
defer st.mu.Unlock()
return st.s.UnmarshalJSON(data)
}

View File

@@ -1,28 +0,0 @@
package instance
// UpdateLastRequestTime updates the last request access time for the instance via proxy
func (i *Process) UpdateLastRequestTime() {
i.mu.Lock()
defer i.mu.Unlock()
lastRequestTime := i.timeProvider.Now().Unix()
i.lastRequestTime.Store(lastRequestTime)
}
func (i *Process) ShouldTimeout() bool {
i.mu.RLock()
defer i.mu.RUnlock()
if !i.IsRunning() || i.options.IdleTimeout == nil || *i.options.IdleTimeout <= 0 {
return false
}
// Check if the last request time exceeds the idle timeout
lastRequest := i.lastRequestTime.Load()
idleTimeoutMinutes := *i.options.IdleTimeout
// Convert timeout from minutes to seconds for comparison
idleTimeoutSeconds := int64(idleTimeoutMinutes * 60)
return (i.timeProvider.Now().Unix() - lastRequest) > idleTimeoutSeconds
}

View File

@@ -1,220 +0,0 @@
package instance_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/testutil"
"sync/atomic"
"testing"
"time"
)
// MockTimeProvider implements TimeProvider for testing
type MockTimeProvider struct {
currentTime atomic.Int64 // Unix timestamp
}
func NewMockTimeProvider(t time.Time) *MockTimeProvider {
m := &MockTimeProvider{}
m.currentTime.Store(t.Unix())
return m
}
func (m *MockTimeProvider) Now() time.Time {
return time.Unix(m.currentTime.Load(), 0)
}
func (m *MockTimeProvider) SetTime(t time.Time) {
m.currentTime.Store(t.Unix())
}
// Timeout-related tests
func TestUpdateLastRequestTime(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
// Test that UpdateLastRequestTime doesn't panic
inst.UpdateLastRequestTime()
}
func TestShouldTimeout_NotRunning(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
idleTimeout := 1 // 1 minute
options := &instance.CreateInstanceOptions{
IdleTimeout: &idleTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
// Instance is not running, should not timeout regardless of configuration
if inst.ShouldTimeout() {
t.Error("Non-running instance should never timeout")
}
}
func TestShouldTimeout_NoTimeoutConfigured(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
tests := []struct {
name string
idleTimeout *int
}{
{"nil timeout", nil},
{"zero timeout", testutil.IntPtr(0)},
{"negative timeout", testutil.IntPtr(-5)},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
options := &instance.CreateInstanceOptions{
IdleTimeout: tt.idleTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
// Simulate running state
inst.SetStatus(instance.Running)
if inst.ShouldTimeout() {
t.Errorf("Instance with %s should not timeout", tt.name)
}
})
}
}
func TestShouldTimeout_WithinTimeLimit(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
idleTimeout := 5 // 5 minutes
options := &instance.CreateInstanceOptions{
IdleTimeout: &idleTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
inst.SetStatus(instance.Running)
// Update last request time to now
inst.UpdateLastRequestTime()
// Should not timeout immediately
if inst.ShouldTimeout() {
t.Error("Instance should not timeout when last request was recent")
}
}
func TestShouldTimeout_ExceedsTimeLimit(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
idleTimeout := 1 // 1 minute
options := &instance.CreateInstanceOptions{
IdleTimeout: &idleTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
inst.SetStatus(instance.Running)
// Use MockTimeProvider to simulate old last request time
mockTime := NewMockTimeProvider(time.Now())
inst.SetTimeProvider(mockTime)
// Set last request time to now
inst.UpdateLastRequestTime()
// Advance time by 2 minutes (exceeds 1 minute timeout)
mockTime.SetTime(time.Now().Add(2 * time.Minute))
if !inst.ShouldTimeout() {
t.Error("Instance should timeout when last request exceeds idle timeout")
}
}
func TestTimeoutConfiguration_Validation(t *testing.T) {
globalSettings := &config.InstancesConfig{
LogsDir: "/tmp/test",
}
tests := []struct {
name string
inputTimeout *int
expectedTimeout int
}{
{"default value when nil", nil, 0},
{"positive value", testutil.IntPtr(10), 10},
{"zero value", testutil.IntPtr(0), 0},
{"negative value gets corrected", testutil.IntPtr(-5), 0},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
options := &instance.CreateInstanceOptions{
IdleTimeout: tt.inputTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
// Mock onStatusChange function
mockOnStatusChange := func(oldStatus, newStatus instance.InstanceStatus) {}
inst := instance.NewInstance("test-instance", globalSettings, options, mockOnStatusChange)
opts := inst.GetOptions()
if opts.IdleTimeout == nil || *opts.IdleTimeout != tt.expectedTimeout {
t.Errorf("Expected IdleTimeout %d, got %v", tt.expectedTimeout, opts.IdleTimeout)
}
})
}
}

152
pkg/manager/lifecycle.go Normal file
View File

@@ -0,0 +1,152 @@
package manager
import (
"fmt"
"llamactl/pkg/instance"
"log"
"sync"
"time"
)
// lifecycleManager handles background timeout checking and LRU eviction.
// It properly coordinates shutdown to prevent races with the timeout checker.
type lifecycleManager struct {
registry *instanceRegistry
manager InstanceManager // For calling Stop/Evict operations
ticker *time.Ticker
checkInterval time.Duration
enableLRU bool
shutdownChan chan struct{}
shutdownDone chan struct{}
shutdownOnce sync.Once
}
// newLifecycleManager creates a new lifecycle manager.
func newLifecycleManager(
registry *instanceRegistry,
manager InstanceManager,
checkInterval time.Duration,
enableLRU bool,
) *lifecycleManager {
if checkInterval <= 0 {
checkInterval = 5 * time.Minute // Default to 5 minutes
}
return &lifecycleManager{
registry: registry,
manager: manager,
ticker: time.NewTicker(checkInterval),
checkInterval: checkInterval,
enableLRU: enableLRU,
shutdownChan: make(chan struct{}),
shutdownDone: make(chan struct{}),
}
}
// Start begins the timeout checking loop in a goroutine.
func (l *lifecycleManager) start() {
go l.timeoutCheckLoop()
}
// Stop gracefully stops the lifecycle manager.
// This ensures the timeout checker completes before instance cleanup begins.
func (l *lifecycleManager) stop() {
l.shutdownOnce.Do(func() {
close(l.shutdownChan)
<-l.shutdownDone // Wait for checker to finish (prevents shutdown race)
l.ticker.Stop()
})
}
// timeoutCheckLoop is the main loop that periodically checks for timeouts.
func (l *lifecycleManager) timeoutCheckLoop() {
defer close(l.shutdownDone) // Signal completion
for {
select {
case <-l.ticker.C:
l.checkTimeouts()
case <-l.shutdownChan:
return // Exit goroutine on shutdown
}
}
}
// checkTimeouts checks all instances for timeout and stops those that have timed out.
func (l *lifecycleManager) checkTimeouts() {
// Get all instances from registry
instances := l.registry.list()
var timeoutInstances []string
// Identify instances that should timeout
for _, inst := range instances {
// Skip remote instances - they are managed by their respective nodes
if inst.IsRemote() {
continue
}
// Only check running instances
if !l.registry.isRunning(inst.Name) {
continue
}
if inst.ShouldTimeout() {
timeoutInstances = append(timeoutInstances, inst.Name)
}
}
// Stop the timed-out instances
for _, name := range timeoutInstances {
log.Printf("Instance %s has timed out, stopping it", name)
if _, err := l.manager.StopInstance(name); err != nil {
log.Printf("Error stopping instance %s: %v", name, err)
} else {
log.Printf("Instance %s stopped successfully", name)
}
}
}
// EvictLRU finds and stops the least recently used running instance.
// This is called when max running instances limit is reached.
func (l *lifecycleManager) evictLRU() error {
if !l.enableLRU {
return fmt.Errorf("LRU eviction is not enabled")
}
// Get all running instances
runningInstances := l.registry.listRunning()
var lruInstance *instance.Instance
for _, inst := range runningInstances {
// Skip remote instances - they are managed by their respective nodes
if inst.IsRemote() {
continue
}
// Skip instances without idle timeout
if inst.GetOptions() != nil && inst.GetOptions().IdleTimeout != nil && *inst.GetOptions().IdleTimeout <= 0 {
continue
}
if lruInstance == nil {
lruInstance = inst
}
if inst.LastRequestTime() < lruInstance.LastRequestTime() {
lruInstance = inst
}
}
if lruInstance == nil {
return fmt.Errorf("failed to find lru instance")
}
// Evict the LRU instance
log.Printf("Evicting LRU instance %s", lruInstance.Name)
_, err := l.manager.StopInstance(lruInstance.Name)
return err
}

View File

@@ -0,0 +1,220 @@
package manager_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
"sync"
"testing"
"time"
)
func TestInstanceTimeoutLogic(t *testing.T) {
testManager := createTestManager()
defer testManager.Shutdown()
idleTimeout := 1 // 1 minute
inst := createInstanceWithTimeout(t, testManager, "timeout-test", "/path/to/model.gguf", &idleTimeout)
// Test timeout logic with mock time provider
mockTime := NewMockTimeProvider(time.Now())
inst.SetTimeProvider(mockTime)
// Set instance to running state so timeout logic can work
inst.SetStatus(instance.Running)
defer inst.SetStatus(instance.Stopped)
// Update last request time
inst.UpdateLastRequestTime()
// Initially should not timeout (just updated)
if inst.ShouldTimeout() {
t.Error("Instance should not timeout immediately after request")
}
// Advance time to trigger timeout
mockTime.SetTime(time.Now().Add(2 * time.Minute))
// Now it should timeout
if !inst.ShouldTimeout() {
t.Error("Instance should timeout after idle period")
}
}
func TestInstanceWithoutTimeoutNeverExpires(t *testing.T) {
testManager := createTestManager()
defer testManager.Shutdown()
noTimeoutInst := createInstanceWithTimeout(t, testManager, "no-timeout-test", "/path/to/model.gguf", nil)
mockTime := NewMockTimeProvider(time.Now())
noTimeoutInst.SetTimeProvider(mockTime)
noTimeoutInst.SetStatus(instance.Running)
defer noTimeoutInst.SetStatus(instance.Stopped)
noTimeoutInst.UpdateLastRequestTime()
// Advance time significantly
mockTime.SetTime(mockTime.Now().Add(24 * time.Hour))
// Even with time advanced, should not timeout
if noTimeoutInst.ShouldTimeout() {
t.Error("Instance without timeout configuration should never timeout")
}
}
func TestEvictLRUInstance_Success(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
// Create 3 instances with idle timeout enabled (value doesn't matter for LRU logic)
validTimeout := 1
inst1 := createInstanceWithTimeout(t, manager, "instance-1", "/path/to/model1.gguf", &validTimeout)
inst2 := createInstanceWithTimeout(t, manager, "instance-2", "/path/to/model2.gguf", &validTimeout)
inst3 := createInstanceWithTimeout(t, manager, "instance-3", "/path/to/model3.gguf", &validTimeout)
// Set up mock time and set instances to running
mockTime := NewMockTimeProvider(time.Now())
inst1.SetTimeProvider(mockTime)
inst2.SetTimeProvider(mockTime)
inst3.SetTimeProvider(mockTime)
inst1.SetStatus(instance.Running)
inst2.SetStatus(instance.Running)
inst3.SetStatus(instance.Running)
defer func() {
// Clean up - ensure all instances are stopped
for _, inst := range []*instance.Instance{inst1, inst2, inst3} {
if inst.IsRunning() {
inst.SetStatus(instance.Stopped)
}
}
}()
// Set different last request times (oldest to newest)
// inst1: oldest (will be evicted)
inst1.UpdateLastRequestTime()
mockTime.SetTime(mockTime.Now().Add(1 * time.Minute))
inst2.UpdateLastRequestTime()
mockTime.SetTime(mockTime.Now().Add(1 * time.Minute))
inst3.UpdateLastRequestTime()
// Evict LRU instance (should be inst1)
if err := manager.EvictLRUInstance(); err != nil {
t.Fatalf("EvictLRUInstance failed: %v", err)
}
// Verify inst1 is stopped
if inst1.IsRunning() {
t.Error("Expected instance-1 to be stopped after eviction")
}
// Verify inst2 and inst3 are still running
if !inst2.IsRunning() {
t.Error("Expected instance-2 to still be running")
}
if !inst3.IsRunning() {
t.Error("Expected instance-3 to still be running")
}
}
func TestEvictLRUInstance_NoRunningInstances(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
err := manager.EvictLRUInstance()
if err == nil {
t.Error("Expected error when no running instances exist")
}
if err.Error() != "failed to find lru instance" {
t.Errorf("Expected 'failed to find lru instance' error, got: %v", err)
}
}
func TestEvictLRUInstance_OnlyEvictsTimeoutEnabledInstances(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
// Create mix of instances: some with timeout enabled, some disabled
// Only timeout-enabled instances should be eligible for eviction
validTimeout := 1
zeroTimeout := 0
instWithTimeout := createInstanceWithTimeout(t, manager, "with-timeout", "/path/to/model-with-timeout.gguf", &validTimeout)
instNoTimeout1 := createInstanceWithTimeout(t, manager, "no-timeout-1", "/path/to/model-no-timeout1.gguf", &zeroTimeout)
instNoTimeout2 := createInstanceWithTimeout(t, manager, "no-timeout-2", "/path/to/model-no-timeout2.gguf", nil)
// Set all instances to running
instances := []*instance.Instance{instWithTimeout, instNoTimeout1, instNoTimeout2}
for _, inst := range instances {
inst.SetStatus(instance.Running)
inst.UpdateLastRequestTime()
}
defer func() {
// Reset instances to stopped to avoid shutdown panics
for _, inst := range instances {
if inst.IsRunning() {
inst.SetStatus(instance.Stopped)
}
}
}()
// Evict LRU instance - should only consider the one with timeout
err := manager.EvictLRUInstance()
if err != nil {
t.Fatalf("EvictLRUInstance failed: %v", err)
}
// Verify only the instance with timeout was evicted
if instWithTimeout.IsRunning() {
t.Error("Expected with-timeout instance to be stopped after eviction")
}
if !instNoTimeout1.IsRunning() {
t.Error("Expected no-timeout-1 instance to still be running")
}
if !instNoTimeout2.IsRunning() {
t.Error("Expected no-timeout-2 instance to still be running")
}
}
// Helper function to create instances with different timeout configurations
func createInstanceWithTimeout(t *testing.T, manager manager.InstanceManager, name, model string, timeout *int) *instance.Instance {
t.Helper()
options := &instance.Options{
IdleTimeout: timeout,
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: model,
},
},
}
inst, err := manager.CreateInstance(name, options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
return inst
}
// Helper for timeout tests
type MockTimeProvider struct {
currentTime time.Time
mu sync.RWMutex
}
func NewMockTimeProvider(t time.Time) *MockTimeProvider {
return &MockTimeProvider{currentTime: t}
}
func (m *MockTimeProvider) Now() time.Time {
m.mu.RLock()
defer m.mu.RUnlock()
return m.currentTime
}
func (m *MockTimeProvider) SetTime(t time.Time) {
m.mu.Lock()
defer m.mu.Unlock()
m.currentTime = t
}

View File

@@ -1,296 +1,303 @@
package manager
import (
"encoding/json"
"context"
"fmt"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"log"
"os"
"path/filepath"
"strings"
"sync"
"time"
)
// InstanceManager defines the interface for managing instances of the llama server.
type InstanceManager interface {
ListInstances() ([]*instance.Process, error)
CreateInstance(name string, options *instance.CreateInstanceOptions) (*instance.Process, error)
GetInstance(name string) (*instance.Process, error)
UpdateInstance(name string, options *instance.CreateInstanceOptions) (*instance.Process, error)
ListInstances() ([]*instance.Instance, error)
CreateInstance(name string, options *instance.Options) (*instance.Instance, error)
GetInstance(name string) (*instance.Instance, error)
UpdateInstance(name string, options *instance.Options) (*instance.Instance, error)
DeleteInstance(name string) error
StartInstance(name string) (*instance.Process, error)
StartInstance(name string) (*instance.Instance, error)
IsMaxRunningInstancesReached() bool
StopInstance(name string) (*instance.Process, error)
StopInstance(name string) (*instance.Instance, error)
EvictLRUInstance() error
RestartInstance(name string) (*instance.Process, error)
GetInstanceLogs(name string) (string, error)
RestartInstance(name string) (*instance.Instance, error)
GetInstanceLogs(name string, numLines int) (string, error)
Shutdown()
}
type instanceManager struct {
mu sync.RWMutex
instances map[string]*instance.Process
runningInstances map[string]struct{}
ports map[int]bool
instancesConfig config.InstancesConfig
// Components (each with own synchronization)
registry *instanceRegistry
ports *portAllocator
persistence *instancePersister
remote *remoteManager
lifecycle *lifecycleManager
// Timeout checker
timeoutChecker *time.Ticker
shutdownChan chan struct{}
shutdownDone chan struct{}
isShutdown bool
// Configuration
globalConfig *config.AppConfig
// Synchronization
instanceLocks sync.Map // map[string]*sync.Mutex - per-instance locks for concurrent operations
shutdownOnce sync.Once
}
// NewInstanceManager creates a new instance of InstanceManager.
func NewInstanceManager(instancesConfig config.InstancesConfig) InstanceManager {
if instancesConfig.TimeoutCheckInterval <= 0 {
instancesConfig.TimeoutCheckInterval = 5 // Default to 5 minutes if not set
}
im := &instanceManager{
instances: make(map[string]*instance.Process),
runningInstances: make(map[string]struct{}),
ports: make(map[int]bool),
instancesConfig: instancesConfig,
// New creates a new instance of InstanceManager.
func New(globalConfig *config.AppConfig) InstanceManager {
timeoutChecker: time.NewTicker(time.Duration(instancesConfig.TimeoutCheckInterval) * time.Minute),
shutdownChan: make(chan struct{}),
shutdownDone: make(chan struct{}),
if globalConfig.Instances.TimeoutCheckInterval <= 0 {
globalConfig.Instances.TimeoutCheckInterval = 5 // Default to 5 minutes if not set
}
// Initialize components
registry := newInstanceRegistry()
// Initialize port allocator
portRange := globalConfig.Instances.PortRange
ports, err := newPortAllocator(portRange[0], portRange[1])
if err != nil {
log.Fatalf("Failed to create port allocator: %v", err)
}
// Initialize persistence
persistence, err := newInstancePersister(globalConfig.Instances.InstancesDir)
if err != nil {
log.Fatalf("Failed to create instance persister: %v", err)
}
// Initialize remote manager
remote := newRemoteManager(globalConfig.Nodes, 30*time.Second)
// Create manager instance
im := &instanceManager{
registry: registry,
ports: ports,
persistence: persistence,
remote: remote,
globalConfig: globalConfig,
}
// Initialize lifecycle manager (needs reference to manager for Stop/Evict operations)
checkInterval := time.Duration(globalConfig.Instances.TimeoutCheckInterval) * time.Minute
im.lifecycle = newLifecycleManager(registry, im, checkInterval, true)
// Load existing instances from disk
if err := im.loadInstances(); err != nil {
log.Printf("Error loading instances: %v", err)
}
// Start the timeout checker goroutine after initialization is complete
go func() {
defer close(im.shutdownDone)
for {
select {
case <-im.timeoutChecker.C:
im.checkAllTimeouts()
case <-im.shutdownChan:
return // Exit goroutine on shutdown
}
}
}()
// Start the lifecycle manager
im.lifecycle.start()
return im
}
func (im *instanceManager) getNextAvailablePort() (int, error) {
portRange := im.instancesConfig.PortRange
for port := portRange[0]; port <= portRange[1]; port++ {
if !im.ports[port] {
im.ports[port] = true
return port, nil
}
}
return 0, fmt.Errorf("no available ports in the specified range")
}
// persistInstance saves an instance to its JSON file
func (im *instanceManager) persistInstance(instance *instance.Process) error {
if im.instancesConfig.InstancesDir == "" {
return nil // Persistence disabled
}
instancePath := filepath.Join(im.instancesConfig.InstancesDir, instance.Name+".json")
tempPath := instancePath + ".tmp"
// Serialize instance to JSON
jsonData, err := json.MarshalIndent(instance, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal instance %s: %w", instance.Name, err)
}
// Write to temporary file first
if err := os.WriteFile(tempPath, jsonData, 0644); err != nil {
return fmt.Errorf("failed to write temp file for instance %s: %w", instance.Name, err)
}
// Atomic rename
if err := os.Rename(tempPath, instancePath); err != nil {
os.Remove(tempPath) // Clean up temp file
return fmt.Errorf("failed to rename temp file for instance %s: %w", instance.Name, err)
}
return nil
// persistInstance saves an instance using the persistence component
func (im *instanceManager) persistInstance(inst *instance.Instance) error {
return im.persistence.save(inst)
}
func (im *instanceManager) Shutdown() {
im.mu.Lock()
im.shutdownOnce.Do(func() {
// 1. Stop lifecycle manager (stops timeout checker)
im.lifecycle.stop()
// Check if already shutdown
if im.isShutdown {
im.mu.Unlock()
return
}
im.isShutdown = true
// 2. Get running instances (no lock needed - registry handles it)
running := im.registry.listRunning()
// Signal the timeout checker to stop
close(im.shutdownChan)
// Create a list of running instances to stop
var runningInstances []*instance.Process
var runningNames []string
for name, inst := range im.instances {
if inst.IsRunning() {
runningInstances = append(runningInstances, inst)
runningNames = append(runningNames, name)
}
}
// Release lock before stopping instances to avoid deadlock
im.mu.Unlock()
// Wait for the timeout checker goroutine to actually stop
<-im.shutdownDone
// Now stop the ticker
if im.timeoutChecker != nil {
im.timeoutChecker.Stop()
}
// Stop instances without holding the manager lock
var wg sync.WaitGroup
wg.Add(len(runningInstances))
for i, inst := range runningInstances {
go func(name string, inst *instance.Process) {
defer wg.Done()
fmt.Printf("Stopping instance %s...\n", name)
// Attempt to stop the instance gracefully
if err := inst.Stop(); err != nil {
fmt.Printf("Error stopping instance %s: %v\n", name, err)
// 3. Stop local instances concurrently
var wg sync.WaitGroup
for _, inst := range running {
if inst.IsRemote() {
continue // Skip remote instances
}
}(runningNames[i], inst)
}
wg.Wait()
fmt.Println("All instances stopped.")
wg.Add(1)
go func(inst *instance.Instance) {
defer wg.Done()
fmt.Printf("Stopping instance %s...\n", inst.Name)
if err := inst.Stop(); err != nil {
fmt.Printf("Error stopping instance %s: %v\n", inst.Name, err)
}
}(inst)
}
wg.Wait()
fmt.Println("All instances stopped.")
})
}
// loadInstances restores all instances from disk
// loadInstances restores all instances from disk using the persistence component
func (im *instanceManager) loadInstances() error {
if im.instancesConfig.InstancesDir == "" {
return nil // Persistence disabled
}
// Check if instances directory exists
if _, err := os.Stat(im.instancesConfig.InstancesDir); os.IsNotExist(err) {
return nil // No instances directory, start fresh
}
// Read all JSON files from instances directory
files, err := os.ReadDir(im.instancesConfig.InstancesDir)
// Load all instances from persistence
instances, err := im.persistence.loadAll()
if err != nil {
return fmt.Errorf("failed to read instances directory: %w", err)
return fmt.Errorf("failed to load instances: %w", err)
}
loadedCount := 0
for _, file := range files {
if file.IsDir() || !strings.HasSuffix(file.Name(), ".json") {
if len(instances) == 0 {
return nil
}
// Process each loaded instance
for _, persistedInst := range instances {
if err := im.loadInstance(persistedInst); err != nil {
log.Printf("Failed to load instance %s: %v", persistedInst.Name, err)
continue
}
instanceName := strings.TrimSuffix(file.Name(), ".json")
instancePath := filepath.Join(im.instancesConfig.InstancesDir, file.Name())
if err := im.loadInstance(instanceName, instancePath); err != nil {
log.Printf("Failed to load instance %s: %v", instanceName, err)
continue
}
loadedCount++
}
if loadedCount > 0 {
log.Printf("Loaded %d instances from persistence", loadedCount)
// Auto-start instances that have auto-restart enabled
go im.autoStartInstances()
}
log.Printf("Loaded %d instances from persistence", len(instances))
// Auto-start instances that have auto-restart enabled
go im.autoStartInstances()
return nil
}
// loadInstance loads a single instance from its JSON file
func (im *instanceManager) loadInstance(name, path string) error {
data, err := os.ReadFile(path)
if err != nil {
return fmt.Errorf("failed to read instance file: %w", err)
// loadInstance loads a single persisted instance and adds it to the registry
func (im *instanceManager) loadInstance(persistedInst *instance.Instance) error {
name := persistedInst.Name
options := persistedInst.GetOptions()
// Check if this is a remote instance (local node not in the Nodes set)
var isRemote bool
var nodeName string
if options != nil {
if _, isLocal := options.Nodes[im.globalConfig.LocalNode]; !isLocal && len(options.Nodes) > 0 {
// Get the first node from the set
for node := range options.Nodes {
nodeName = node
isRemote = true
break
}
}
}
var persistedInstance instance.Process
if err := json.Unmarshal(data, &persistedInstance); err != nil {
return fmt.Errorf("failed to unmarshal instance: %w", err)
}
// Validate the instance name matches the filename
if persistedInstance.Name != name {
return fmt.Errorf("instance name mismatch: file=%s, instance.Name=%s", name, persistedInstance.Name)
}
statusCallback := func(oldStatus, newStatus instance.InstanceStatus) {
im.onStatusChange(persistedInstance.Name, oldStatus, newStatus)
var statusCallback func(oldStatus, newStatus instance.Status)
if !isRemote {
// Only set status callback for local instances
statusCallback = func(oldStatus, newStatus instance.Status) {
im.onStatusChange(name, oldStatus, newStatus)
}
}
// Create new inst using NewInstance (handles validation, defaults, setup)
inst := instance.NewInstance(name, &im.instancesConfig, persistedInstance.GetOptions(), statusCallback)
inst := instance.New(name, im.globalConfig, options, statusCallback)
// Restore persisted fields that NewInstance doesn't set
inst.Created = persistedInstance.Created
inst.SetStatus(persistedInstance.Status)
inst.Created = persistedInst.Created
inst.SetStatus(persistedInst.GetStatus())
// Check for port conflicts and add to maps
if inst.GetPort() > 0 {
port := inst.GetPort()
if im.ports[port] {
return fmt.Errorf("port conflict: instance %s wants port %d which is already in use", name, port)
// Handle remote instance mapping
if isRemote {
// Map instance to node in remote manager
if err := im.remote.setInstanceNode(name, nodeName); err != nil {
return fmt.Errorf("failed to set instance node: %w", err)
}
} else {
// Allocate port for local instances
if inst.GetPort() > 0 {
port := inst.GetPort()
if err := im.ports.allocateSpecific(port, name); err != nil {
return fmt.Errorf("port conflict: instance %s wants port %d which is already in use: %w", name, port, err)
}
}
im.ports[port] = true
}
im.instances[name] = inst
// Add instance to registry
if err := im.registry.add(inst); err != nil {
return fmt.Errorf("failed to add instance to registry: %w", err)
}
return nil
}
// autoStartInstances starts instances that were running when persisted and have auto-restart enabled
// For instances with auto-restart disabled, it sets their status to Stopped
func (im *instanceManager) autoStartInstances() {
im.mu.RLock()
var instancesToStart []*instance.Process
for _, inst := range im.instances {
instances := im.registry.list()
var instancesToStart []*instance.Instance
var instancesToStop []*instance.Instance
for _, inst := range instances {
if inst.IsRunning() && // Was running when persisted
inst.GetOptions() != nil &&
inst.GetOptions().AutoRestart != nil &&
*inst.GetOptions().AutoRestart {
instancesToStart = append(instancesToStart, inst)
inst.GetOptions().AutoRestart != nil {
if *inst.GetOptions().AutoRestart {
instancesToStart = append(instancesToStart, inst)
} else {
// Instance was running but auto-restart is disabled, mark as stopped
instancesToStop = append(instancesToStop, inst)
}
}
}
im.mu.RUnlock()
// Stop instances that have auto-restart disabled
for _, inst := range instancesToStop {
log.Printf("Instance %s was running but auto-restart is disabled, setting status to stopped", inst.Name)
inst.SetStatus(instance.Stopped)
im.registry.markStopped(inst.Name)
}
// Start instances that have auto-restart enabled
for _, inst := range instancesToStart {
log.Printf("Auto-starting instance %s", inst.Name)
// Reset running state before starting (since Start() expects stopped instance)
inst.SetStatus(instance.Stopped)
if err := inst.Start(); err != nil {
log.Printf("Failed to auto-start instance %s: %v", inst.Name, err)
im.registry.markStopped(inst.Name)
// Check if this is a remote instance
if node, exists := im.remote.getNodeForInstance(inst.Name); exists && node != nil {
// Remote instance - use remote manager with context
ctx := context.Background()
if _, err := im.remote.startInstance(ctx, node, inst.Name); err != nil {
log.Printf("Failed to auto-start remote instance %s: %v", inst.Name, err)
}
} else {
// Local instance - call Start() directly
if err := inst.Start(); err != nil {
log.Printf("Failed to auto-start instance %s: %v", inst.Name, err)
}
}
}
}
func (im *instanceManager) onStatusChange(name string, oldStatus, newStatus instance.InstanceStatus) {
im.mu.Lock()
defer im.mu.Unlock()
func (im *instanceManager) onStatusChange(name string, oldStatus, newStatus instance.Status) {
if newStatus == instance.Running {
im.runningInstances[name] = struct{}{}
im.registry.markRunning(name)
} else {
delete(im.runningInstances, name)
im.registry.markStopped(name)
}
}
// getNodeForInstance returns the node configuration for a remote instance
// Returns nil if the instance is not remote or the node is not found
func (im *instanceManager) getNodeForInstance(inst *instance.Instance) *config.NodeConfig {
if !inst.IsRemote() {
return nil
}
// Check if we have a node mapping in remote manager
if nodeConfig, exists := im.remote.getNodeForInstance(inst.Name); exists {
return nodeConfig
}
return nil
}
// lockInstance returns the lock for a specific instance, creating one if needed.
// This allows concurrent operations on different instances while preventing
// concurrent operations on the same instance.
func (im *instanceManager) lockInstance(name string) *sync.Mutex {
lock, _ := im.instanceLocks.LoadOrStore(name, &sync.Mutex{})
return lock.(*sync.Mutex)
}
// unlockAndCleanup unlocks the instance lock and removes it from the map.
// This should only be called when deleting an instance to prevent memory leaks.
func (im *instanceManager) unlockAndCleanup(name string) {
if lock, ok := im.instanceLocks.Load(name); ok {
lock.(*sync.Mutex).Unlock()
im.instanceLocks.Delete(name)
}
}

View File

@@ -3,61 +3,28 @@ package manager_test
import (
"fmt"
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
"os"
"path/filepath"
"strings"
"sync"
"testing"
)
func TestNewInstanceManager(t *testing.T) {
cfg := config.InstancesConfig{
PortRange: [2]int{8000, 9000},
LogsDir: "/tmp/test",
MaxInstances: 5,
LlamaExecutable: "llama-server",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
TimeoutCheckInterval: 5,
}
mgr := manager.NewInstanceManager(cfg)
if mgr == nil {
t.Fatal("NewInstanceManager returned nil")
}
// Test initial state
instances, err := mgr.ListInstances()
if err != nil {
t.Fatalf("ListInstances failed: %v", err)
}
if len(instances) != 0 {
t.Errorf("Expected empty instance list, got %d instances", len(instances))
}
}
func TestPersistence(t *testing.T) {
func TestManager_PersistsAndLoadsInstances(t *testing.T) {
tempDir := t.TempDir()
appConfig := createTestAppConfig(tempDir)
cfg := config.InstancesConfig{
PortRange: [2]int{8000, 9000},
InstancesDir: tempDir,
MaxInstances: 10,
TimeoutCheckInterval: 5,
}
// Test instance persistence on creation
manager1 := manager.NewInstanceManager(cfg)
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
// Create instance and check file was created
manager1 := manager.New(appConfig)
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
@@ -66,14 +33,13 @@ func TestPersistence(t *testing.T) {
t.Fatalf("CreateInstance failed: %v", err)
}
// Check that JSON file was created
expectedPath := filepath.Join(tempDir, "test-instance.json")
if _, err := os.Stat(expectedPath); os.IsNotExist(err) {
t.Errorf("Expected persistence file %s to exist", expectedPath)
}
// Test loading instances from disk
manager2 := manager.NewInstanceManager(cfg)
// Load instances from disk
manager2 := manager.New(appConfig)
instances, err := manager2.ListInstances()
if err != nil {
t.Fatalf("ListInstances failed: %v", err)
@@ -84,15 +50,31 @@ func TestPersistence(t *testing.T) {
if instances[0].Name != "test-instance" {
t.Errorf("Expected loaded instance name 'test-instance', got %q", instances[0].Name)
}
}
// Test port map populated from loaded instances (port conflict should be detected)
_, err = manager2.CreateInstance("new-instance", options) // Same port
if err == nil || !strings.Contains(err.Error(), "port") {
t.Errorf("Expected port conflict error, got: %v", err)
func TestDeleteInstance_RemovesPersistenceFile(t *testing.T) {
tempDir := t.TempDir()
appConfig := createTestAppConfig(tempDir)
mgr := manager.New(appConfig)
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
// Test file deletion on instance deletion
err = manager2.DeleteInstance("test-instance")
_, err := mgr.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
expectedPath := filepath.Join(tempDir, "test-instance.json")
err = mgr.DeleteInstance("test-instance")
if err != nil {
t.Fatalf("DeleteInstance failed: %v", err)
}
@@ -115,10 +97,12 @@ func TestConcurrentAccess(t *testing.T) {
wg.Add(1)
go func(index int) {
defer wg.Done()
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
instanceName := fmt.Sprintf("concurrent-test-%d", index)
@@ -148,39 +132,58 @@ func TestConcurrentAccess(t *testing.T) {
}
}
func TestShutdown(t *testing.T) {
mgr := createTestManager()
// Create test instance
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
// Helper functions for test configuration
func createTestAppConfig(instancesDir string) *config.AppConfig {
// Use 'sleep' as a test command instead of 'llama-server'
// This allows tests to run in CI environments without requiring actual LLM binaries
// The sleep command will be invoked with model paths and other args, which it ignores
return &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "sleep",
},
MLX: config.BackendSettings{
Command: "sleep",
},
},
Instances: config.InstancesConfig{
PortRange: [2]int{8000, 9000},
InstancesDir: instancesDir,
LogsDir: instancesDir,
MaxInstances: 10,
MaxRunningInstances: 10,
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
TimeoutCheckInterval: 5,
},
LocalNode: "main",
Nodes: map[string]config.NodeConfig{},
}
_, err := mgr.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
// Shutdown should not panic
mgr.Shutdown()
// Multiple shutdowns should not panic
mgr.Shutdown()
}
// Helper function to create a test manager with standard config
func createTestManager() manager.InstanceManager {
cfg := config.InstancesConfig{
PortRange: [2]int{8000, 9000},
LogsDir: "/tmp/test",
MaxInstances: 10,
LlamaExecutable: "llama-server",
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
TimeoutCheckInterval: 5,
appConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "sleep",
},
MLX: config.BackendSettings{
Command: "sleep",
},
},
Instances: config.InstancesConfig{
PortRange: [2]int{8000, 9000},
LogsDir: "/tmp/test",
MaxInstances: 10,
MaxRunningInstances: 10,
DefaultAutoRestart: true,
DefaultMaxRestarts: 3,
DefaultRestartDelay: 5,
TimeoutCheckInterval: 5,
},
LocalNode: "main",
Nodes: map[string]config.NodeConfig{},
}
return manager.NewInstanceManager(cfg)
return manager.New(appConfig)
}

View File

@@ -1,159 +1,350 @@
package manager
import (
"context"
"fmt"
"llamactl/pkg/backends"
"llamactl/pkg/instance"
"llamactl/pkg/validation"
"os"
"path/filepath"
"log"
)
type MaxRunningInstancesError error
// ListInstances returns a list of all instances managed by the instance manager.
func (im *instanceManager) ListInstances() ([]*instance.Process, error) {
im.mu.RLock()
defer im.mu.RUnlock()
instances := make([]*instance.Process, 0, len(im.instances))
for _, inst := range im.instances {
instances = append(instances, inst)
// updateLocalInstanceFromRemote updates the local stub instance with data from the remote instance
func (im *instanceManager) updateLocalInstanceFromRemote(localInst *instance.Instance, remoteInst *instance.Instance) {
if localInst == nil || remoteInst == nil {
return
}
remoteOptions := remoteInst.GetOptions()
if remoteOptions == nil {
return
}
// Update the local instance with all remote data
localInst.SetOptions(remoteOptions)
localInst.SetStatus(remoteInst.GetStatus())
localInst.Created = remoteInst.Created
}
// ListInstances returns a list of all instances managed by the instance manager.
// For remote instances, this fetches the live state from remote nodes and updates local stubs.
func (im *instanceManager) ListInstances() ([]*instance.Instance, error) {
instances := im.registry.list()
// Update remote instances with live state
ctx := context.Background()
for _, inst := range instances {
if node := im.getNodeForInstance(inst); node != nil {
remoteInst, err := im.remote.getInstance(ctx, node, inst.Name)
if err != nil {
// Log error but continue with stale data
// Don't fail the entire list operation due to one remote failure
continue
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
}
}
return instances, nil
}
// CreateInstance creates a new instance with the given options and returns it.
// The instance is initially in a "stopped" state.
func (im *instanceManager) CreateInstance(name string, options *instance.CreateInstanceOptions) (*instance.Process, error) {
func (im *instanceManager) CreateInstance(name string, options *instance.Options) (*instance.Instance, error) {
if options == nil {
return nil, fmt.Errorf("instance options cannot be nil")
}
name, err := validation.ValidateInstanceName(name)
err := options.BackendOptions.ValidateInstanceOptions()
if err != nil {
return nil, err
}
err = validation.ValidateInstanceOptions(options)
if err != nil {
return nil, err
}
im.mu.Lock()
defer im.mu.Unlock()
// Check max instances limit after acquiring the lock
if len(im.instances) >= im.instancesConfig.MaxInstances && im.instancesConfig.MaxInstances != -1 {
return nil, fmt.Errorf("maximum number of instances (%d) reached", im.instancesConfig.MaxInstances)
}
// Check if instance with this name already exists
if im.instances[name] != nil {
// Check if instance with this name already exists (must be globally unique)
if _, exists := im.registry.get(name); exists {
return nil, fmt.Errorf("instance with name %s already exists", name)
}
// Assign and validate port for backend-specific options
if err := im.assignAndValidatePort(options); err != nil {
return nil, err
// Check if this is a remote instance (local node not in the Nodes set)
if _, isLocal := options.Nodes[im.globalConfig.LocalNode]; !isLocal && len(options.Nodes) > 0 {
// Get the first node from the set
var nodeName string
for node := range options.Nodes {
nodeName = node
break
}
// Create the remote instance on the remote node
ctx := context.Background()
nodeConfig, exists := im.remote.getNodeForInstance(nodeName)
if !exists {
// Try to set the node if it doesn't exist yet
if err := im.remote.setInstanceNode(name, nodeName); err != nil {
return nil, fmt.Errorf("node %s not found", nodeName)
}
nodeConfig, _ = im.remote.getNodeForInstance(name)
}
remoteInst, err := im.remote.createInstance(ctx, nodeConfig, name, options)
if err != nil {
return nil, err
}
// Create a local stub that preserves the Nodes field for tracking
// We keep the original options (with Nodes) so IsRemote() works correctly
inst := instance.New(name, im.globalConfig, options, nil)
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
// Map instance to node
if err := im.remote.setInstanceNode(name, nodeName); err != nil {
return nil, fmt.Errorf("failed to map instance to node: %w", err)
}
// Add to registry (doesn't count towards local limits)
if err := im.registry.add(inst); err != nil {
return nil, fmt.Errorf("failed to add instance to registry: %w", err)
}
// Persist the remote instance locally for tracking across restarts
if err := im.persistInstance(inst); err != nil {
// Rollback: remove from registry
im.registry.remove(name)
return nil, fmt.Errorf("failed to persist remote instance %s: %w", name, err)
}
return inst, nil
}
statusCallback := func(oldStatus, newStatus instance.InstanceStatus) {
// Local instance creation
// Check max instances limit for local instances only
totalInstances := im.registry.count()
remoteCount := 0
for _, inst := range im.registry.list() {
if inst.IsRemote() {
remoteCount++
}
}
localInstanceCount := totalInstances - remoteCount
if localInstanceCount >= im.globalConfig.Instances.MaxInstances && im.globalConfig.Instances.MaxInstances != -1 {
return nil, fmt.Errorf("maximum number of instances (%d) reached", im.globalConfig.Instances.MaxInstances)
}
// Assign and validate port for backend-specific options
currentPort := im.getPortFromOptions(options)
var allocatedPort int
if currentPort == 0 {
// Allocate a port if not specified
allocatedPort, err = im.ports.allocate(name)
if err != nil {
return nil, fmt.Errorf("failed to allocate port: %w", err)
}
im.setPortInOptions(options, allocatedPort)
} else {
// Use the specified port
if err := im.ports.allocateSpecific(currentPort, name); err != nil {
return nil, fmt.Errorf("port %d is already in use: %w", currentPort, err)
}
allocatedPort = currentPort
}
statusCallback := func(oldStatus, newStatus instance.Status) {
im.onStatusChange(name, oldStatus, newStatus)
}
inst := instance.NewInstance(name, &im.instancesConfig, options, statusCallback)
im.instances[inst.Name] = inst
inst := instance.New(name, im.globalConfig, options, statusCallback)
// Add to registry
if err := im.registry.add(inst); err != nil {
// Rollback: release port
im.ports.release(allocatedPort)
return nil, fmt.Errorf("failed to add instance to registry: %w", err)
}
// Persist instance (best-effort, don't fail if persistence fails)
if err := im.persistInstance(inst); err != nil {
return nil, fmt.Errorf("failed to persist instance %s: %w", name, err)
log.Printf("Warning: failed to persist instance %s: %v", name, err)
}
return inst, nil
}
// GetInstance retrieves an instance by its name.
func (im *instanceManager) GetInstance(name string) (*instance.Process, error) {
im.mu.RLock()
defer im.mu.RUnlock()
instance, exists := im.instances[name]
// For remote instances, this fetches the live state from the remote node and updates the local stub.
func (im *instanceManager) GetInstance(name string) (*instance.Instance, error) {
inst, exists := im.registry.get(name)
if !exists {
return nil, fmt.Errorf("instance with name %s not found", name)
}
return instance, nil
// Check if instance is remote and fetch live state
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
remoteInst, err := im.remote.getInstance(ctx, node, name)
if err != nil {
return nil, err
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
// Return the local stub (preserving Nodes field)
return inst, nil
}
return inst, nil
}
// UpdateInstance updates the options of an existing instance and returns it.
// If the instance is running, it will be restarted to apply the new options.
func (im *instanceManager) UpdateInstance(name string, options *instance.CreateInstanceOptions) (*instance.Process, error) {
im.mu.RLock()
instance, exists := im.instances[name]
im.mu.RUnlock()
func (im *instanceManager) UpdateInstance(name string, options *instance.Options) (*instance.Instance, error) {
inst, exists := im.registry.get(name)
if !exists {
return nil, fmt.Errorf("instance with name %s not found", name)
}
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
remoteInst, err := im.remote.updateInstance(ctx, node, name, options)
if err != nil {
return nil, err
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
// Persist the updated remote instance locally
if err := im.persistInstance(inst); err != nil {
return nil, fmt.Errorf("failed to persist updated remote instance %s: %w", name, err)
}
return inst, nil
}
if options == nil {
return nil, fmt.Errorf("instance options cannot be nil")
}
err := validation.ValidateInstanceOptions(options)
err := options.BackendOptions.ValidateInstanceOptions()
if err != nil {
return nil, err
}
// Lock this specific instance only
lock := im.lockInstance(name)
lock.Lock()
defer lock.Unlock()
// Handle port changes
oldPort := inst.GetPort()
newPort := im.getPortFromOptions(options)
var allocatedPort int
if newPort != oldPort {
// Port is changing - need to release old and allocate new
if newPort == 0 {
// Auto-allocate new port
allocatedPort, err = im.ports.allocate(name)
if err != nil {
return nil, fmt.Errorf("failed to allocate new port: %w", err)
}
im.setPortInOptions(options, allocatedPort)
} else {
// Use specified port
if err := im.ports.allocateSpecific(newPort, name); err != nil {
return nil, fmt.Errorf("failed to allocate port %d: %w", newPort, err)
}
allocatedPort = newPort
}
// Release old port
if oldPort > 0 {
if err := im.ports.release(oldPort); err != nil {
// Rollback new port allocation
im.ports.release(allocatedPort)
return nil, fmt.Errorf("failed to release old port %d: %w", oldPort, err)
}
}
}
// Check if instance is running before updating options
wasRunning := instance.IsRunning()
wasRunning := inst.IsRunning()
// If the instance is running, stop it first
if wasRunning {
if err := instance.Stop(); err != nil {
if err := inst.Stop(); err != nil {
return nil, fmt.Errorf("failed to stop instance %s for update: %w", name, err)
}
}
// Now update the options while the instance is stopped
instance.SetOptions(options)
inst.SetOptions(options)
// If it was running before, start it again with the new options
if wasRunning {
if err := instance.Start(); err != nil {
if err := inst.Start(); err != nil {
return nil, fmt.Errorf("failed to start instance %s after update: %w", name, err)
}
}
im.mu.Lock()
defer im.mu.Unlock()
if err := im.persistInstance(instance); err != nil {
if err := im.persistInstance(inst); err != nil {
return nil, fmt.Errorf("failed to persist updated instance %s: %w", name, err)
}
return instance, nil
return inst, nil
}
// DeleteInstance removes stopped instance by its name.
func (im *instanceManager) DeleteInstance(name string) error {
im.mu.Lock()
defer im.mu.Unlock()
instance, exists := im.instances[name]
inst, exists := im.registry.get(name)
if !exists {
return fmt.Errorf("instance with name %s not found", name)
}
if instance.IsRunning() {
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
err := im.remote.deleteInstance(ctx, node, name)
if err != nil {
return err
}
// Clean up local tracking
im.remote.removeInstance(name)
im.registry.remove(name)
// Delete the instance's persistence file
if err := im.persistence.delete(name); err != nil {
return fmt.Errorf("failed to delete config file for remote instance %s: %w", name, err)
}
return nil
}
// Lock this specific instance and clean up the lock on completion
lock := im.lockInstance(name)
lock.Lock()
defer im.unlockAndCleanup(name)
if inst.IsRunning() {
return fmt.Errorf("instance with name %s is still running, stop it before deleting", name)
}
delete(im.ports, instance.GetPort())
delete(im.instances, name)
// Release port (use ReleaseByInstance for proper cleanup)
im.ports.releaseByInstance(name)
// Delete the instance's config file if persistence is enabled
instancePath := filepath.Join(im.instancesConfig.InstancesDir, instance.Name+".json")
if err := os.Remove(instancePath); err != nil && !os.IsNotExist(err) {
return fmt.Errorf("failed to delete config file for instance %s: %w", instance.Name, err)
// Remove from registry
if err := im.registry.remove(name); err != nil {
return fmt.Errorf("failed to remove instance from registry: %w", err)
}
// Delete persistence file
if err := im.persistence.delete(name); err != nil {
return fmt.Errorf("failed to delete config file for instance %s: %w", name, err)
}
return nil
@@ -161,140 +352,186 @@ func (im *instanceManager) DeleteInstance(name string) error {
// StartInstance starts a stopped instance and returns it.
// If the instance is already running, it returns an error.
func (im *instanceManager) StartInstance(name string) (*instance.Process, error) {
im.mu.RLock()
instance, exists := im.instances[name]
maxRunningExceeded := len(im.runningInstances) >= im.instancesConfig.MaxRunningInstances && im.instancesConfig.MaxRunningInstances != -1
im.mu.RUnlock()
func (im *instanceManager) StartInstance(name string) (*instance.Instance, error) {
inst, exists := im.registry.get(name)
if !exists {
return nil, fmt.Errorf("instance with name %s not found", name)
}
if instance.IsRunning() {
return instance, fmt.Errorf("instance with name %s is already running", name)
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
remoteInst, err := im.remote.startInstance(ctx, node, name)
if err != nil {
return nil, err
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
return inst, nil
}
if maxRunningExceeded {
return nil, MaxRunningInstancesError(fmt.Errorf("maximum number of running instances (%d) reached", im.instancesConfig.MaxRunningInstances))
// Lock this specific instance only
lock := im.lockInstance(name)
lock.Lock()
defer lock.Unlock()
// Idempotent: if already running, just return success
if inst.IsRunning() {
return inst, nil
}
if err := instance.Start(); err != nil {
// Check max running instances limit for local instances only
if im.IsMaxRunningInstancesReached() {
return nil, MaxRunningInstancesError(fmt.Errorf("maximum number of running instances (%d) reached", im.globalConfig.Instances.MaxRunningInstances))
}
if err := inst.Start(); err != nil {
return nil, fmt.Errorf("failed to start instance %s: %w", name, err)
}
im.mu.Lock()
defer im.mu.Unlock()
err := im.persistInstance(instance)
if err != nil {
return nil, fmt.Errorf("failed to persist instance %s: %w", name, err)
// Persist instance (best-effort, don't fail if persistence fails)
if err := im.persistInstance(inst); err != nil {
log.Printf("Warning: failed to persist instance %s: %v", name, err)
}
return instance, nil
return inst, nil
}
func (im *instanceManager) IsMaxRunningInstancesReached() bool {
im.mu.RLock()
defer im.mu.RUnlock()
if im.instancesConfig.MaxRunningInstances != -1 && len(im.runningInstances) >= im.instancesConfig.MaxRunningInstances {
return true
if im.globalConfig.Instances.MaxRunningInstances == -1 {
return false
}
return false
// Count only local running instances (each node has its own limits)
localRunningCount := 0
for _, inst := range im.registry.listRunning() {
if !inst.IsRemote() {
localRunningCount++
}
}
return localRunningCount >= im.globalConfig.Instances.MaxRunningInstances
}
// StopInstance stops a running instance and returns it.
func (im *instanceManager) StopInstance(name string) (*instance.Process, error) {
im.mu.RLock()
instance, exists := im.instances[name]
im.mu.RUnlock()
func (im *instanceManager) StopInstance(name string) (*instance.Instance, error) {
inst, exists := im.registry.get(name)
if !exists {
return nil, fmt.Errorf("instance with name %s not found", name)
}
if !instance.IsRunning() {
return instance, fmt.Errorf("instance with name %s is already stopped", name)
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
remoteInst, err := im.remote.stopInstance(ctx, node, name)
if err != nil {
return nil, err
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
return inst, nil
}
if err := instance.Stop(); err != nil {
// Lock this specific instance only
lock := im.lockInstance(name)
lock.Lock()
defer lock.Unlock()
// Idempotent: if already stopped, just return success
if !inst.IsRunning() {
return inst, nil
}
if err := inst.Stop(); err != nil {
return nil, fmt.Errorf("failed to stop instance %s: %w", name, err)
}
im.mu.Lock()
defer im.mu.Unlock()
err := im.persistInstance(instance)
if err != nil {
return nil, fmt.Errorf("failed to persist instance %s: %w", name, err)
// Persist instance (best-effort, don't fail if persistence fails)
if err := im.persistInstance(inst); err != nil {
log.Printf("Warning: failed to persist instance %s: %v", name, err)
}
return instance, nil
return inst, nil
}
// RestartInstance stops and then starts an instance, returning the updated instance.
func (im *instanceManager) RestartInstance(name string) (*instance.Process, error) {
instance, err := im.StopInstance(name)
if err != nil {
return nil, err
func (im *instanceManager) RestartInstance(name string) (*instance.Instance, error) {
inst, exists := im.registry.get(name)
if !exists {
return nil, fmt.Errorf("instance with name %s not found", name)
}
return im.StartInstance(instance.Name)
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
remoteInst, err := im.remote.restartInstance(ctx, node, name)
if err != nil {
return nil, err
}
// Update the local stub with all remote data (preserving Nodes)
im.updateLocalInstanceFromRemote(inst, remoteInst)
return inst, nil
}
// Lock this specific instance for the entire restart operation to ensure atomicity
lock := im.lockInstance(name)
lock.Lock()
defer lock.Unlock()
// Stop the instance
if inst.IsRunning() {
if err := inst.Stop(); err != nil {
return nil, fmt.Errorf("failed to stop instance %s: %w", name, err)
}
}
// Start the instance
if err := inst.Start(); err != nil {
return nil, fmt.Errorf("failed to start instance %s: %w", name, err)
}
// Persist the restarted instance
if err := im.persistInstance(inst); err != nil {
log.Printf("Warning: failed to persist instance %s: %v", name, err)
}
return inst, nil
}
// GetInstanceLogs retrieves the logs for a specific instance by its name.
func (im *instanceManager) GetInstanceLogs(name string) (string, error) {
im.mu.RLock()
_, exists := im.instances[name]
im.mu.RUnlock()
func (im *instanceManager) GetInstanceLogs(name string, numLines int) (string, error) {
inst, exists := im.registry.get(name)
if !exists {
return "", fmt.Errorf("instance with name %s not found", name)
}
// TODO: Implement actual log retrieval logic
return fmt.Sprintf("Logs for instance %s", name), nil
// Check if instance is remote and delegate to remote operation
if node := im.getNodeForInstance(inst); node != nil {
ctx := context.Background()
return im.remote.getInstanceLogs(ctx, node, name, numLines)
}
// Get logs from the local instance
return inst.GetLogs(numLines)
}
// getPortFromOptions extracts the port from backend-specific options
func (im *instanceManager) getPortFromOptions(options *instance.CreateInstanceOptions) int {
switch options.BackendType {
case backends.BackendTypeLlamaCpp:
if options.LlamaServerOptions != nil {
return options.LlamaServerOptions.Port
}
}
return 0
func (im *instanceManager) getPortFromOptions(options *instance.Options) int {
return options.BackendOptions.GetPort()
}
// setPortInOptions sets the port in backend-specific options
func (im *instanceManager) setPortInOptions(options *instance.CreateInstanceOptions, port int) {
switch options.BackendType {
case backends.BackendTypeLlamaCpp:
if options.LlamaServerOptions != nil {
options.LlamaServerOptions.Port = port
}
}
func (im *instanceManager) setPortInOptions(options *instance.Options, port int) {
options.BackendOptions.SetPort(port)
}
// assignAndValidatePort assigns a port if not specified and validates it's not in use
func (im *instanceManager) assignAndValidatePort(options *instance.CreateInstanceOptions) error {
currentPort := im.getPortFromOptions(options)
if currentPort == 0 {
// Assign a port if not specified
port, err := im.getNextAvailablePort()
if err != nil {
return fmt.Errorf("failed to get next available port: %w", err)
}
im.setPortInOptions(options, port)
// Mark the port as used
im.ports[port] = true
} else {
// Validate the specified port
if _, exists := im.ports[currentPort]; exists {
return fmt.Errorf("port %d is already in use", currentPort)
}
// Mark the port as used
im.ports[currentPort] = true
}
return nil
// EvictLRUInstance finds and stops the least recently used running instance.
func (im *instanceManager) EvictLRUInstance() error {
return im.lifecycle.evictLRU()
}

View File

@@ -2,7 +2,6 @@ package manager_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
@@ -10,40 +9,14 @@ import (
"testing"
)
func TestCreateInstance_Success(t *testing.T) {
manager := createTestManager()
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
}
inst, err := manager.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
if inst.Name != "test-instance" {
t.Errorf("Expected instance name 'test-instance', got %q", inst.Name)
}
if inst.GetStatus() != instance.Stopped {
t.Error("New instance should not be running")
}
if inst.GetPort() != 8080 {
t.Errorf("Expected port 8080, got %d", inst.GetPort())
}
}
func TestCreateInstance_ValidationAndLimits(t *testing.T) {
// Test duplicate names
func TestCreateInstance_FailsWithDuplicateName(t *testing.T) {
mngr := createTestManager()
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
@@ -60,16 +33,35 @@ func TestCreateInstance_ValidationAndLimits(t *testing.T) {
if !strings.Contains(err.Error(), "already exists") {
t.Errorf("Expected duplicate name error, got: %v", err)
}
}
// Test max instances limit
cfg := config.InstancesConfig{
PortRange: [2]int{8000, 9000},
MaxInstances: 1, // Very low limit for testing
TimeoutCheckInterval: 5,
func TestCreateInstance_FailsWhenMaxInstancesReached(t *testing.T) {
appConfig := &config.AppConfig{
Backends: config.BackendConfig{
LlamaCpp: config.BackendSettings{
Command: "llama-server",
},
},
Instances: config.InstancesConfig{
PortRange: [2]int{8000, 9000},
MaxInstances: 1, // Very low limit for testing
TimeoutCheckInterval: 5,
},
LocalNode: "main",
Nodes: map[string]config.NodeConfig{},
}
limitedManager := manager.NewInstanceManager(cfg)
limitedManager := manager.New(appConfig)
_, err = limitedManager.CreateInstance("instance1", options)
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
_, err := limitedManager.CreateInstance("instance1", options)
if err != nil {
t.Fatalf("CreateInstance 1 failed: %v", err)
}
@@ -84,33 +76,32 @@ func TestCreateInstance_ValidationAndLimits(t *testing.T) {
}
}
func TestPortManagement(t *testing.T) {
func TestCreateInstance_FailsWithPortConflict(t *testing.T) {
manager := createTestManager()
// Test auto port assignment
options1 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
options1 := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
inst1, err := manager.CreateInstance("instance1", options1)
_, err := manager.CreateInstance("instance1", options1)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
port1 := inst1.GetPort()
if port1 < 8000 || port1 > 9000 {
t.Errorf("Expected port in range 8000-9000, got %d", port1)
}
// Test port conflict detection
options2 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model2.gguf",
Port: port1, // Same port - should conflict
// Try to create instance with same port
options2 := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model2.gguf",
Port: 8080, // Same port - should conflict
},
},
}
@@ -121,98 +112,21 @@ func TestPortManagement(t *testing.T) {
if !strings.Contains(err.Error(), "port") && !strings.Contains(err.Error(), "in use") {
t.Errorf("Expected port conflict error, got: %v", err)
}
// Test port release on deletion
specificPort := 8080
options3 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: specificPort,
},
}
_, err = manager.CreateInstance("port-test", options3)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
err = manager.DeleteInstance("port-test")
if err != nil {
t.Fatalf("DeleteInstance failed: %v", err)
}
// Should be able to create new instance with same port
_, err = manager.CreateInstance("new-port-test", options3)
if err != nil {
t.Errorf("Expected to reuse port after deletion, got error: %v", err)
}
}
func TestInstanceOperations(t *testing.T) {
func TestInstanceOperations_FailWithNonExistentInstance(t *testing.T) {
manager := createTestManager()
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
// Create instance
created, err := manager.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
// Get instance
retrieved, err := manager.GetInstance("test-instance")
if err != nil {
t.Fatalf("GetInstance failed: %v", err)
}
if retrieved.Name != created.Name {
t.Errorf("Expected name %q, got %q", created.Name, retrieved.Name)
}
// Update instance
newOptions := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/new-model.gguf",
Port: 8081,
},
}
updated, err := manager.UpdateInstance("test-instance", newOptions)
if err != nil {
t.Fatalf("UpdateInstance failed: %v", err)
}
if updated.GetOptions().LlamaServerOptions.Model != "/path/to/new-model.gguf" {
t.Errorf("Expected model '/path/to/new-model.gguf', got %q", updated.GetOptions().LlamaServerOptions.Model)
}
// List instances
instances, err := manager.ListInstances()
if err != nil {
t.Fatalf("ListInstances failed: %v", err)
}
if len(instances) != 1 {
t.Errorf("Expected 1 instance, got %d", len(instances))
}
// Delete instance
err = manager.DeleteInstance("test-instance")
if err != nil {
t.Fatalf("DeleteInstance failed: %v", err)
}
_, err = manager.GetInstance("test-instance")
if err == nil {
t.Error("Instance should not exist after deletion")
}
// Test operations on non-existent instances
_, err = manager.GetInstance("nonexistent")
_, err := manager.GetInstance("nonexistent")
if err == nil || !strings.Contains(err.Error(), "not found") {
t.Errorf("Expected 'not found' error, got: %v", err)
}
@@ -227,3 +141,143 @@ func TestInstanceOperations(t *testing.T) {
t.Errorf("Expected 'not found' error, got: %v", err)
}
}
func TestDeleteInstance_RunningInstanceFails(t *testing.T) {
mgr := createTestManager()
defer mgr.Shutdown()
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
},
}
_, err := mgr.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
_, err = mgr.StartInstance("test-instance")
if err != nil {
t.Fatalf("StartInstance failed: %v", err)
}
// Should fail to delete running instance
err = mgr.DeleteInstance("test-instance")
if err == nil {
t.Error("Expected error when deleting running instance")
}
}
func TestUpdateInstance(t *testing.T) {
mgr := createTestManager()
defer mgr.Shutdown()
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
_, err := mgr.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
_, err = mgr.StartInstance("test-instance")
if err != nil {
t.Fatalf("StartInstance failed: %v", err)
}
// Update running instance with new model
newOptions := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/new-model.gguf",
Port: 8080,
},
},
}
updated, err := mgr.UpdateInstance("test-instance", newOptions)
if err != nil {
t.Fatalf("UpdateInstance failed: %v", err)
}
// Should still be running after update
if !updated.IsRunning() {
t.Error("Instance should be running after update")
}
if updated.GetOptions().BackendOptions.LlamaServerOptions.Model != "/path/to/new-model.gguf" {
t.Errorf("Expected model to be updated")
}
}
func TestUpdateInstance_ReleasesOldPort(t *testing.T) {
mgr := createTestManager()
defer mgr.Shutdown()
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8080,
},
},
}
inst, err := mgr.CreateInstance("test-instance", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
if inst.GetPort() != 8080 {
t.Errorf("Expected port 8080, got %d", inst.GetPort())
}
// Update with new port
newOptions := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
Port: 8081,
},
},
}
updated, err := mgr.UpdateInstance("test-instance", newOptions)
if err != nil {
t.Fatalf("UpdateInstance failed: %v", err)
}
if updated.GetPort() != 8081 {
t.Errorf("Expected port 8081, got %d", updated.GetPort())
}
// Old port should be released - try to create new instance with old port
options2 := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model2.gguf",
Port: 8080,
},
},
}
_, err = mgr.CreateInstance("test-instance-2", options2)
if err != nil {
t.Errorf("Should be able to use old port 8080: %v", err)
}
}

223
pkg/manager/persistence.go Normal file
View File

@@ -0,0 +1,223 @@
package manager
import (
"encoding/json"
"fmt"
"llamactl/pkg/instance"
"log"
"os"
"path/filepath"
"strings"
"sync"
)
// instancePersister provides atomic file-based persistence with durability guarantees.
type instancePersister struct {
mu sync.Mutex
instancesDir string
enabled bool
}
// newInstancePersister creates a new instance persister.
// If instancesDir is empty, persistence is disabled.
func newInstancePersister(instancesDir string) (*instancePersister, error) {
if instancesDir == "" {
return &instancePersister{
enabled: false,
}, nil
}
// Ensure the instances directory exists
if err := os.MkdirAll(instancesDir, 0755); err != nil {
return nil, fmt.Errorf("failed to create instances directory: %w", err)
}
return &instancePersister{
instancesDir: instancesDir,
enabled: true,
}, nil
}
// Save persists an instance to disk with atomic write
func (p *instancePersister) save(inst *instance.Instance) error {
if !p.enabled {
return nil
}
if inst == nil {
return fmt.Errorf("cannot save nil instance")
}
// Validate instance name to prevent path traversal
validatedName, err := p.validateInstanceName(inst.Name)
if err != nil {
return err
}
p.mu.Lock()
defer p.mu.Unlock()
instancePath := filepath.Join(p.instancesDir, validatedName+".json")
tempPath := instancePath + ".tmp"
// Serialize instance to JSON
jsonData, err := json.MarshalIndent(inst, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal instance %s: %w", inst.Name, err)
}
// Create temporary file
tempFile, err := os.OpenFile(tempPath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0644)
if err != nil {
return fmt.Errorf("failed to create temp file for instance %s: %w", inst.Name, err)
}
// Write data to temporary file
if _, err := tempFile.Write(jsonData); err != nil {
tempFile.Close()
os.Remove(tempPath)
return fmt.Errorf("failed to write temp file for instance %s: %w", inst.Name, err)
}
// Sync to disk before rename to ensure durability
if err := tempFile.Sync(); err != nil {
tempFile.Close()
os.Remove(tempPath)
return fmt.Errorf("failed to sync temp file for instance %s: %w", inst.Name, err)
}
// Close the file
if err := tempFile.Close(); err != nil {
os.Remove(tempPath)
return fmt.Errorf("failed to close temp file for instance %s: %w", inst.Name, err)
}
// Atomic rename (this is atomic on POSIX systems)
if err := os.Rename(tempPath, instancePath); err != nil {
os.Remove(tempPath)
return fmt.Errorf("failed to rename temp file for instance %s: %w", inst.Name, err)
}
return nil
}
// Delete removes an instance's persistence file from disk.
func (p *instancePersister) delete(name string) error {
if !p.enabled {
return nil
}
validatedName, err := p.validateInstanceName(name)
if err != nil {
return err
}
p.mu.Lock()
defer p.mu.Unlock()
instancePath := filepath.Join(p.instancesDir, validatedName+".json")
if err := os.Remove(instancePath); err != nil {
if os.IsNotExist(err) {
// Not an error if file doesn't exist
return nil
}
return fmt.Errorf("failed to delete instance file for %s: %w", name, err)
}
return nil
}
// LoadAll loads all persisted instances from disk.
// Returns a slice of instances and any errors encountered during loading.
func (p *instancePersister) loadAll() ([]*instance.Instance, error) {
if !p.enabled {
return nil, nil
}
p.mu.Lock()
defer p.mu.Unlock()
// Check if instances directory exists
if _, err := os.Stat(p.instancesDir); os.IsNotExist(err) {
return nil, nil // No instances directory, return empty list
}
// Read all JSON files from instances directory
files, err := os.ReadDir(p.instancesDir)
if err != nil {
return nil, fmt.Errorf("failed to read instances directory: %w", err)
}
instances := make([]*instance.Instance, 0)
var loadErrors []string
for _, file := range files {
if file.IsDir() || !strings.HasSuffix(file.Name(), ".json") {
continue
}
instanceName := strings.TrimSuffix(file.Name(), ".json")
instancePath := filepath.Join(p.instancesDir, file.Name())
inst, err := p.loadInstanceFile(instanceName, instancePath)
if err != nil {
log.Printf("Failed to load instance %s: %v", instanceName, err)
loadErrors = append(loadErrors, fmt.Sprintf("%s: %v", instanceName, err))
continue
}
instances = append(instances, inst)
}
if len(loadErrors) > 0 {
log.Printf("Loaded %d instances with %d errors", len(instances), len(loadErrors))
} else if len(instances) > 0 {
log.Printf("Loaded %d instances from persistence", len(instances))
}
return instances, nil
}
// loadInstanceFile is an internal helper that loads a single instance file.
// Note: This assumes the mutex is already held by the caller.
func (p *instancePersister) loadInstanceFile(name, path string) (*instance.Instance, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("failed to read instance file: %w", err)
}
var inst instance.Instance
if err := json.Unmarshal(data, &inst); err != nil {
return nil, fmt.Errorf("failed to unmarshal instance: %w", err)
}
// Validate the instance name matches the filename
if inst.Name != name {
return nil, fmt.Errorf("instance name mismatch: file=%s, instance.Name=%s", name, inst.Name)
}
return &inst, nil
}
// validateInstanceName ensures the instance name is safe for filesystem operations.
// Returns the validated name if valid, or an error if invalid.
func (p *instancePersister) validateInstanceName(name string) (string, error) {
if name == "" {
return "", fmt.Errorf("instance name cannot be empty")
}
// Check for path separators and parent directory references
// This prevents path traversal attacks
if strings.Contains(name, "/") || strings.Contains(name, "\\") || strings.Contains(name, "..") {
return "", fmt.Errorf("invalid instance name: %s (cannot contain path separators or '..')", name)
}
// Additional check: ensure the name doesn't start with a dot (hidden files)
// or contain any other suspicious characters
if strings.HasPrefix(name, ".") {
return "", fmt.Errorf("invalid instance name: %s (cannot start with '.')", name)
}
return name, nil
}

184
pkg/manager/ports.go Normal file
View File

@@ -0,0 +1,184 @@
package manager
import (
"fmt"
"math/bits"
"sync"
)
// portAllocator provides efficient port allocation using a bitmap for O(1) operations.
// The bitmap approach prevents unbounded memory growth and simplifies port management.
type portAllocator struct {
mu sync.Mutex
// Bitmap for O(1) allocation/release
// Each bit represents a port (1 = allocated, 0 = free)
bitmap []uint64 // Each uint64 covers 64 ports
// Map port to instance name for cleanup operations
allocated map[int]string
minPort int
maxPort int
rangeSize int
}
// newPortAllocator creates a new port allocator for the given port range.
// Returns an error if the port range is invalid.
func newPortAllocator(minPort, maxPort int) (*portAllocator, error) {
if minPort <= 0 || maxPort <= 0 {
return nil, fmt.Errorf("invalid port range: min=%d, max=%d (must be > 0)", minPort, maxPort)
}
if minPort > maxPort {
return nil, fmt.Errorf("invalid port range: min=%d > max=%d", minPort, maxPort)
}
rangeSize := maxPort - minPort + 1
bitmapSize := (rangeSize + 63) / 64 // Round up to nearest uint64
return &portAllocator{
bitmap: make([]uint64, bitmapSize),
allocated: make(map[int]string),
minPort: minPort,
maxPort: maxPort,
rangeSize: rangeSize,
}, nil
}
// allocate finds and allocates the first available port for the given instance.
// Returns the allocated port or an error if no ports are available.
func (p *portAllocator) allocate(instanceName string) (int, error) {
if instanceName == "" {
return 0, fmt.Errorf("instance name cannot be empty")
}
p.mu.Lock()
defer p.mu.Unlock()
port, err := p.findFirstFreeBit()
if err != nil {
return 0, err
}
p.setBit(port)
p.allocated[port] = instanceName
return port, nil
}
// allocateSpecific allocates a specific port for the given instance.
// Returns an error if the port is already allocated or out of range.
func (p *portAllocator) allocateSpecific(port int, instanceName string) error {
if instanceName == "" {
return fmt.Errorf("instance name cannot be empty")
}
if port < p.minPort || port > p.maxPort {
return fmt.Errorf("port %d is out of range [%d-%d]", port, p.minPort, p.maxPort)
}
p.mu.Lock()
defer p.mu.Unlock()
if p.isBitSet(port) {
return fmt.Errorf("port %d is already allocated", port)
}
p.setBit(port)
p.allocated[port] = instanceName
return nil
}
// release releases a specific port, making it available for reuse.
// Returns an error if the port is not allocated.
func (p *portAllocator) release(port int) error {
if port < p.minPort || port > p.maxPort {
return fmt.Errorf("port %d is out of range [%d-%d]", port, p.minPort, p.maxPort)
}
p.mu.Lock()
defer p.mu.Unlock()
if !p.isBitSet(port) {
return fmt.Errorf("port %d is not allocated", port)
}
p.clearBit(port)
delete(p.allocated, port)
return nil
}
// releaseByInstance releases all ports allocated to the given instance.
// This is useful for cleanup when deleting or updating an instance.
// Returns the number of ports released.
func (p *portAllocator) releaseByInstance(instanceName string) int {
if instanceName == "" {
return 0
}
p.mu.Lock()
defer p.mu.Unlock()
portsToRelease := make([]int, 0)
for port, name := range p.allocated {
if name == instanceName {
portsToRelease = append(portsToRelease, port)
}
}
for _, port := range portsToRelease {
p.clearBit(port)
delete(p.allocated, port)
}
return len(portsToRelease)
}
// --- Internal bitmap operations ---
// portToBitPos converts a port number to bitmap array index and bit position.
func (p *portAllocator) portToBitPos(port int) (index int, bit uint) {
offset := port - p.minPort
index = offset / 64
bit = uint(offset % 64)
return
}
// setBit marks a port as allocated in the bitmap.
func (p *portAllocator) setBit(port int) {
index, bit := p.portToBitPos(port)
p.bitmap[index] |= (1 << bit)
}
// clearBit marks a port as free in the bitmap.
func (p *portAllocator) clearBit(port int) {
index, bit := p.portToBitPos(port)
p.bitmap[index] &^= (1 << bit)
}
// isBitSet checks if a port is allocated in the bitmap.
func (p *portAllocator) isBitSet(port int) bool {
index, bit := p.portToBitPos(port)
return (p.bitmap[index] & (1 << bit)) != 0
}
// findFirstFreeBit scans the bitmap to find the first unallocated port.
// Returns the port number or an error if no ports are available.
func (p *portAllocator) findFirstFreeBit() (int, error) {
for i, word := range p.bitmap {
if word != ^uint64(0) { // Not all bits are set (some ports are free)
// Find the first 0 bit in this word
// XOR with all 1s to flip bits, then find first 1 (which was 0)
bit := bits.TrailingZeros64(^word)
port := p.minPort + (i * 64) + bit
// Ensure we don't go beyond maxPort due to bitmap rounding
if port <= p.maxPort {
return port, nil
}
}
}
return 0, fmt.Errorf("no available ports in range [%d-%d]", p.minPort, p.maxPort)
}

121
pkg/manager/registry.go Normal file
View File

@@ -0,0 +1,121 @@
package manager
import (
"fmt"
"llamactl/pkg/instance"
"sync"
)
// instanceRegistry provides thread-safe storage and lookup of instances
// with running state tracking using lock-free sync.Map for status checks.
type instanceRegistry struct {
mu sync.RWMutex
instances map[string]*instance.Instance
running sync.Map // map[string]struct{} - lock-free for status checks
}
// newInstanceRegistry creates a new instance registry.
func newInstanceRegistry() *instanceRegistry {
return &instanceRegistry{
instances: make(map[string]*instance.Instance),
}
}
// Get retrieves an instance by name.
// Returns the instance and true if found, nil and false otherwise.
func (r *instanceRegistry) get(name string) (*instance.Instance, bool) {
r.mu.RLock()
defer r.mu.RUnlock()
inst, exists := r.instances[name]
return inst, exists
}
// List returns a snapshot copy of all instances to prevent external mutation.
func (r *instanceRegistry) list() []*instance.Instance {
r.mu.RLock()
defer r.mu.RUnlock()
result := make([]*instance.Instance, 0, len(r.instances))
for _, inst := range r.instances {
result = append(result, inst)
}
return result
}
// ListRunning returns a snapshot of all currently running instances.
func (r *instanceRegistry) listRunning() []*instance.Instance {
r.mu.RLock()
defer r.mu.RUnlock()
result := make([]*instance.Instance, 0)
for name, inst := range r.instances {
if _, isRunning := r.running.Load(name); isRunning {
result = append(result, inst)
}
}
return result
}
// Add adds a new instance to the registry.
// Returns an error if an instance with the same name already exists.
func (r *instanceRegistry) add(inst *instance.Instance) error {
if inst == nil {
return fmt.Errorf("cannot add nil instance")
}
r.mu.Lock()
defer r.mu.Unlock()
if _, exists := r.instances[inst.Name]; exists {
return fmt.Errorf("instance %s already exists", inst.Name)
}
r.instances[inst.Name] = inst
// Initialize running state if the instance is running
if inst.IsRunning() {
r.running.Store(inst.Name, struct{}{})
}
return nil
}
// Remove removes an instance from the registry.
// Returns an error if the instance doesn't exist.
func (r *instanceRegistry) remove(name string) error {
r.mu.Lock()
defer r.mu.Unlock()
if _, exists := r.instances[name]; !exists {
return fmt.Errorf("instance %s not found", name)
}
delete(r.instances, name)
r.running.Delete(name)
return nil
}
// MarkRunning marks an instance as running using lock-free sync.Map.
func (r *instanceRegistry) markRunning(name string) {
r.running.Store(name, struct{}{})
}
// MarkStopped marks an instance as stopped using lock-free sync.Map.
func (r *instanceRegistry) markStopped(name string) {
r.running.Delete(name)
}
// IsRunning checks if an instance is running using lock-free sync.Map.
func (r *instanceRegistry) isRunning(name string) bool {
_, isRunning := r.running.Load(name)
return isRunning
}
// Count returns the total number of instances in the registry.
func (r *instanceRegistry) count() int {
r.mu.RLock()
defer r.mu.RUnlock()
return len(r.instances)
}

293
pkg/manager/remote.go Normal file
View File

@@ -0,0 +1,293 @@
package manager
import (
"bytes"
"context"
"encoding/json"
"fmt"
"io"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"net/http"
"net/url"
"sync"
"time"
)
const apiBasePath = "/api/v1/instances/"
// remoteManager handles HTTP operations for remote instances.
type remoteManager struct {
mu sync.RWMutex
client *http.Client
nodeMap map[string]*config.NodeConfig // node name -> node config
instanceToNode map[string]*config.NodeConfig // instance name -> node config
}
// newRemoteManager creates a new remote manager.
func newRemoteManager(nodes map[string]config.NodeConfig, timeout time.Duration) *remoteManager {
if timeout <= 0 {
timeout = 30 * time.Second
}
// Build node config map
nodeMap := make(map[string]*config.NodeConfig)
for name := range nodes {
nodeCopy := nodes[name]
nodeMap[name] = &nodeCopy
}
return &remoteManager{
client: &http.Client{
Timeout: timeout,
},
nodeMap: nodeMap,
instanceToNode: make(map[string]*config.NodeConfig),
}
}
// GetNodeForInstance returns the node configuration for a given instance.
// Returns nil if the instance is not mapped to any node.
func (rm *remoteManager) getNodeForInstance(instanceName string) (*config.NodeConfig, bool) {
rm.mu.RLock()
defer rm.mu.RUnlock()
node, exists := rm.instanceToNode[instanceName]
return node, exists
}
// SetInstanceNode maps an instance to a specific node.
// Returns an error if the node doesn't exist.
func (rm *remoteManager) setInstanceNode(instanceName, nodeName string) error {
rm.mu.Lock()
defer rm.mu.Unlock()
node, exists := rm.nodeMap[nodeName]
if !exists {
return fmt.Errorf("node %s not found", nodeName)
}
rm.instanceToNode[instanceName] = node
return nil
}
// RemoveInstance removes the instance-to-node mapping.
func (rm *remoteManager) removeInstance(instanceName string) {
rm.mu.Lock()
defer rm.mu.Unlock()
delete(rm.instanceToNode, instanceName)
}
// --- HTTP request helpers ---
// makeRemoteRequest creates and executes an HTTP request to a remote node with context support.
func (rm *remoteManager) makeRemoteRequest(ctx context.Context, nodeConfig *config.NodeConfig, method, path string, body any) (*http.Response, error) {
var reqBody io.Reader
if body != nil {
jsonData, err := json.Marshal(body)
if err != nil {
return nil, fmt.Errorf("failed to marshal request body: %w", err)
}
reqBody = bytes.NewBuffer(jsonData)
}
url := fmt.Sprintf("%s%s", nodeConfig.Address, path)
req, err := http.NewRequestWithContext(ctx, method, url, reqBody)
if err != nil {
return nil, fmt.Errorf("failed to create request: %w", err)
}
if body != nil {
req.Header.Set("Content-Type", "application/json")
}
if nodeConfig.APIKey != "" {
req.Header.Set("Authorization", fmt.Sprintf("Bearer %s", nodeConfig.APIKey))
}
resp, err := rm.client.Do(req)
if err != nil {
return nil, fmt.Errorf("failed to execute request: %w", err)
}
return resp, nil
}
// parseRemoteResponse parses an HTTP response and unmarshals the result.
func parseRemoteResponse(resp *http.Response, result any) error {
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return fmt.Errorf("failed to read response body: %w", err)
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body))
}
if result != nil {
if err := json.Unmarshal(body, result); err != nil {
return fmt.Errorf("failed to unmarshal response: %w", err)
}
}
return nil
}
// --- Remote CRUD operations ---
// createInstance creates a new instance on a remote node.
func (rm *remoteManager) createInstance(ctx context.Context, node *config.NodeConfig, name string, opts *instance.Options) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "POST", path, opts)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// getInstance retrieves an instance by name from a remote node.
func (rm *remoteManager) getInstance(ctx context.Context, node *config.NodeConfig, name string) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "GET", path, nil)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// updateInstance updates an existing instance on a remote node.
func (rm *remoteManager) updateInstance(ctx context.Context, node *config.NodeConfig, name string, opts *instance.Options) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "PUT", path, opts)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// deleteInstance deletes an instance from a remote node.
func (rm *remoteManager) deleteInstance(ctx context.Context, node *config.NodeConfig, name string) error {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "DELETE", path, nil)
if err != nil {
return err
}
return parseRemoteResponse(resp, nil)
}
// startInstance starts an instance on a remote node.
func (rm *remoteManager) startInstance(ctx context.Context, node *config.NodeConfig, name string) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/start", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "POST", path, nil)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// stopInstance stops an instance on a remote node.
func (rm *remoteManager) stopInstance(ctx context.Context, node *config.NodeConfig, name string) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/stop", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "POST", path, nil)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// restartInstance restarts an instance on a remote node.
func (rm *remoteManager) restartInstance(ctx context.Context, node *config.NodeConfig, name string) (*instance.Instance, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/restart", apiBasePath, escapedName)
resp, err := rm.makeRemoteRequest(ctx, node, "POST", path, nil)
if err != nil {
return nil, err
}
var inst instance.Instance
if err := parseRemoteResponse(resp, &inst); err != nil {
return nil, err
}
return &inst, nil
}
// getInstanceLogs retrieves logs for an instance from a remote node.
func (rm *remoteManager) getInstanceLogs(ctx context.Context, node *config.NodeConfig, name string, numLines int) (string, error) {
escapedName := url.PathEscape(name)
path := fmt.Sprintf("%s%s/logs?lines=%d", apiBasePath, escapedName, numLines)
resp, err := rm.makeRemoteRequest(ctx, node, "GET", path, nil)
if err != nil {
return "", err
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return "", fmt.Errorf("failed to read response body: %w", err)
}
if resp.StatusCode < 200 || resp.StatusCode >= 300 {
return "", fmt.Errorf("API request failed with status %d: %s", resp.StatusCode, string(body))
}
// Logs endpoint returns plain text (Content-Type: text/plain)
return string(body), nil
}

View File

@@ -1,64 +0,0 @@
package manager
import (
"fmt"
"llamactl/pkg/instance"
"log"
)
func (im *instanceManager) checkAllTimeouts() {
im.mu.RLock()
var timeoutInstances []string
// Identify instances that should timeout
for _, inst := range im.instances {
if inst.ShouldTimeout() {
timeoutInstances = append(timeoutInstances, inst.Name)
}
}
im.mu.RUnlock() // Release read lock before calling StopInstance
// Stop the timed-out instances
for _, name := range timeoutInstances {
log.Printf("Instance %s has timed out, stopping it", name)
if _, err := im.StopInstance(name); err != nil {
log.Printf("Error stopping instance %s: %v", name, err)
} else {
log.Printf("Instance %s stopped successfully", name)
}
}
}
// EvictLRUInstance finds and stops the least recently used running instance.
func (im *instanceManager) EvictLRUInstance() error {
im.mu.RLock()
var lruInstance *instance.Process
for name, _ := range im.runningInstances {
inst := im.instances[name]
if inst == nil {
continue
}
if inst.GetOptions() != nil && inst.GetOptions().IdleTimeout != nil && *inst.GetOptions().IdleTimeout <= 0 {
continue // Skip instances without idle timeout
}
if lruInstance == nil {
lruInstance = inst
}
if inst.LastRequestTime() < lruInstance.LastRequestTime() {
lruInstance = inst
}
}
im.mu.RUnlock()
if lruInstance == nil {
return fmt.Errorf("failed to find lru instance")
}
// Evict Instance
_, err := im.StopInstance(lruInstance.Name)
return err
}

View File

@@ -1,328 +0,0 @@
package manager_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
"sync"
"testing"
"time"
)
func TestTimeoutFunctionality(t *testing.T) {
// Test timeout checker initialization
cfg := config.InstancesConfig{
PortRange: [2]int{8000, 9000},
TimeoutCheckInterval: 10,
MaxInstances: 5,
}
manager := manager.NewInstanceManager(cfg)
if manager == nil {
t.Fatal("Manager should be initialized with timeout checker")
}
manager.Shutdown() // Clean up
// Test timeout configuration and logic without starting the actual process
testManager := createTestManager()
defer testManager.Shutdown()
idleTimeout := 1 // 1 minute
options := &instance.CreateInstanceOptions{
IdleTimeout: &idleTimeout,
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
}
inst, err := testManager.CreateInstance("timeout-test", options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
// Test timeout configuration is properly set
if inst.GetOptions().IdleTimeout == nil {
t.Fatal("Instance should have idle timeout configured")
}
if *inst.GetOptions().IdleTimeout != 1 {
t.Errorf("Expected idle timeout 1 minute, got %d", *inst.GetOptions().IdleTimeout)
}
// Test timeout logic without actually starting the process
// Create a mock time provider to simulate timeout
mockTime := NewMockTimeProvider(time.Now())
inst.SetTimeProvider(mockTime)
// Set instance to running state so timeout logic can work
inst.SetStatus(instance.Running)
// Simulate instance being "running" for timeout check (without actual process)
// We'll test the ShouldTimeout logic directly
inst.UpdateLastRequestTime()
// Initially should not timeout (just updated)
if inst.ShouldTimeout() {
t.Error("Instance should not timeout immediately after request")
}
// Advance time to trigger timeout
mockTime.SetTime(time.Now().Add(2 * time.Minute))
// Now it should timeout
if !inst.ShouldTimeout() {
t.Error("Instance should timeout after idle period")
}
// Reset running state to avoid shutdown issues
inst.SetStatus(instance.Stopped)
// Test that instance without timeout doesn't timeout
noTimeoutOptions := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model.gguf",
},
// No IdleTimeout set
}
noTimeoutInst, err := testManager.CreateInstance("no-timeout-test", noTimeoutOptions)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
noTimeoutInst.SetTimeProvider(mockTime)
noTimeoutInst.SetStatus(instance.Running) // Set to running for timeout check
noTimeoutInst.UpdateLastRequestTime()
// Even with time advanced, should not timeout
if noTimeoutInst.ShouldTimeout() {
t.Error("Instance without timeout configuration should never timeout")
}
// Reset running state to avoid shutdown issues
noTimeoutInst.SetStatus(instance.Stopped)
}
func TestEvictLRUInstance_Success(t *testing.T) {
manager := createTestManager()
// Don't defer manager.Shutdown() - we'll handle cleanup manually
// Create 3 instances with idle timeout enabled (value doesn't matter for LRU logic)
options1 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model1.gguf",
},
IdleTimeout: func() *int { timeout := 1; return &timeout }(), // Any value > 0
}
options2 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model2.gguf",
},
IdleTimeout: func() *int { timeout := 1; return &timeout }(), // Any value > 0
}
options3 := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: "/path/to/model3.gguf",
},
IdleTimeout: func() *int { timeout := 1; return &timeout }(), // Any value > 0
}
inst1, err := manager.CreateInstance("instance-1", options1)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
inst2, err := manager.CreateInstance("instance-2", options2)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
inst3, err := manager.CreateInstance("instance-3", options3)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
// Set up mock time and set instances to running
mockTime := NewMockTimeProvider(time.Now())
inst1.SetTimeProvider(mockTime)
inst2.SetTimeProvider(mockTime)
inst3.SetTimeProvider(mockTime)
inst1.SetStatus(instance.Running)
inst2.SetStatus(instance.Running)
inst3.SetStatus(instance.Running)
// Set different last request times (oldest to newest)
// inst1: oldest (will be evicted)
inst1.UpdateLastRequestTime()
mockTime.SetTime(mockTime.Now().Add(1 * time.Minute))
inst2.UpdateLastRequestTime()
mockTime.SetTime(mockTime.Now().Add(1 * time.Minute))
inst3.UpdateLastRequestTime()
// Evict LRU instance (should be inst1)
err = manager.EvictLRUInstance()
if err != nil {
t.Fatalf("EvictLRUInstance failed: %v", err)
}
// Verify inst1 is stopped
if inst1.IsRunning() {
t.Error("Expected instance-1 to be stopped after eviction")
}
// Verify inst2 and inst3 are still running
if !inst2.IsRunning() {
t.Error("Expected instance-2 to still be running")
}
if !inst3.IsRunning() {
t.Error("Expected instance-3 to still be running")
}
// Clean up manually - set all to stopped and then shutdown
inst2.SetStatus(instance.Stopped)
inst3.SetStatus(instance.Stopped)
}
func TestEvictLRUInstance_NoEligibleInstances(t *testing.T) {
// Helper function to create instances with different timeout configurations
createInstanceWithTimeout := func(manager manager.InstanceManager, name, model string, timeout *int) *instance.Process {
options := &instance.CreateInstanceOptions{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
Model: model,
},
IdleTimeout: timeout,
}
inst, err := manager.CreateInstance(name, options)
if err != nil {
t.Fatalf("CreateInstance failed: %v", err)
}
return inst
}
t.Run("no running instances", func(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
err := manager.EvictLRUInstance()
if err == nil {
t.Error("Expected error when no running instances exist")
}
if err.Error() != "failed to find lru instance" {
t.Errorf("Expected 'failed to find lru instance' error, got: %v", err)
}
})
t.Run("only instances without timeout", func(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
// Create instances with various non-eligible timeout configurations
zeroTimeout := 0
negativeTimeout := -1
inst1 := createInstanceWithTimeout(manager, "no-timeout-1", "/path/to/model1.gguf", &zeroTimeout)
inst2 := createInstanceWithTimeout(manager, "no-timeout-2", "/path/to/model2.gguf", &negativeTimeout)
inst3 := createInstanceWithTimeout(manager, "no-timeout-3", "/path/to/model3.gguf", nil)
// Set instances to running
instances := []*instance.Process{inst1, inst2, inst3}
for _, inst := range instances {
inst.SetStatus(instance.Running)
}
defer func() {
// Reset instances to stopped to avoid shutdown panics
for _, inst := range instances {
inst.SetStatus(instance.Stopped)
}
}()
// Try to evict - should fail because no eligible instances
err := manager.EvictLRUInstance()
if err == nil {
t.Error("Expected error when no eligible instances exist")
}
if err.Error() != "failed to find lru instance" {
t.Errorf("Expected 'failed to find lru instance' error, got: %v", err)
}
// Verify all instances are still running
for i, inst := range instances {
if !inst.IsRunning() {
t.Errorf("Expected instance %d to still be running", i+1)
}
}
})
t.Run("mixed instances - evicts only eligible ones", func(t *testing.T) {
manager := createTestManager()
defer manager.Shutdown()
// Create mix of instances: some with timeout enabled, some disabled
validTimeout := 1
zeroTimeout := 0
instWithTimeout := createInstanceWithTimeout(manager, "with-timeout", "/path/to/model-with-timeout.gguf", &validTimeout)
instNoTimeout1 := createInstanceWithTimeout(manager, "no-timeout-1", "/path/to/model-no-timeout1.gguf", &zeroTimeout)
instNoTimeout2 := createInstanceWithTimeout(manager, "no-timeout-2", "/path/to/model-no-timeout2.gguf", nil)
// Set all instances to running
instances := []*instance.Process{instWithTimeout, instNoTimeout1, instNoTimeout2}
for _, inst := range instances {
inst.SetStatus(instance.Running)
inst.UpdateLastRequestTime()
}
defer func() {
// Reset instances to stopped to avoid shutdown panics
for _, inst := range instances {
if inst.IsRunning() {
inst.SetStatus(instance.Stopped)
}
}
}()
// Evict LRU instance - should only consider the one with timeout
err := manager.EvictLRUInstance()
if err != nil {
t.Fatalf("EvictLRUInstance failed: %v", err)
}
// Verify only the instance with timeout was evicted
if instWithTimeout.IsRunning() {
t.Error("Expected with-timeout instance to be stopped after eviction")
}
if !instNoTimeout1.IsRunning() {
t.Error("Expected no-timeout-1 instance to still be running")
}
if !instNoTimeout2.IsRunning() {
t.Error("Expected no-timeout-2 instance to still be running")
}
})
}
// Helper for timeout tests
type MockTimeProvider struct {
currentTime time.Time
mu sync.RWMutex
}
func NewMockTimeProvider(t time.Time) *MockTimeProvider {
return &MockTimeProvider{currentTime: t}
}
func (m *MockTimeProvider) Now() time.Time {
m.mu.RLock()
defer m.mu.RUnlock()
return m.currentTime
}
func (m *MockTimeProvider) SetTime(t time.Time) {
m.mu.Lock()
defer m.mu.Unlock()
m.currentTime = t
}

View File

@@ -1,631 +1,115 @@
package server
import (
"bytes"
"encoding/json"
"fmt"
"io"
"llamactl/pkg/config"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
"llamactl/pkg/validation"
"log"
"net/http"
"os/exec"
"strconv"
"strings"
"time"
"github.com/go-chi/chi/v5"
)
// errorResponse represents an error response returned by the API
type errorResponse struct {
Error string `json:"error"`
Details string `json:"details,omitempty"`
}
// writeError writes a JSON error response with the specified HTTP status code
func writeError(w http.ResponseWriter, status int, code, details string) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(status)
if err := json.NewEncoder(w).Encode(errorResponse{Error: code, Details: details}); err != nil {
log.Printf("Failed to encode error response: %v", err)
}
}
// writeJSON writes a JSON response with the specified HTTP status code
func writeJSON(w http.ResponseWriter, status int, data any) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(status)
if err := json.NewEncoder(w).Encode(data); err != nil {
log.Printf("Failed to encode JSON response: %v", err)
}
}
// writeText writes a plain text response with the specified HTTP status code
func writeText(w http.ResponseWriter, status int, data string) {
w.Header().Set("Content-Type", "text/plain")
w.WriteHeader(status)
if _, err := w.Write([]byte(data)); err != nil {
log.Printf("Failed to write text response: %v", err)
}
}
// Handler provides HTTP handlers for the llamactl server API
type Handler struct {
InstanceManager manager.InstanceManager
cfg config.AppConfig
httpClient *http.Client
}
// NewHandler creates a new Handler instance with the provided instance manager and configuration
func NewHandler(im manager.InstanceManager, cfg config.AppConfig) *Handler {
return &Handler{
InstanceManager: im,
cfg: cfg,
httpClient: &http.Client{
Timeout: 30 * time.Second,
},
}
}
// VersionHandler godoc
// @Summary Get llamactl version
// @Description Returns the version of the llamactl command
// @Tags version
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Version information"
// @Failure 500 {string} string "Internal Server Error"
// @Router /version [get]
func (h *Handler) VersionHandler() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/plain")
fmt.Fprintf(w, "Version: %s\nCommit: %s\nBuild Time: %s\n", h.cfg.Version, h.cfg.CommitHash, h.cfg.BuildTime)
// getInstance retrieves an instance by name from the request query parameters
func (h *Handler) getInstance(r *http.Request) (*instance.Instance, error) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
return nil, fmt.Errorf("invalid instance name: %w", err)
}
inst, err := h.InstanceManager.GetInstance(validatedName)
if err != nil {
return nil, fmt.Errorf("failed to get instance by name: %w", err)
}
return inst, nil
}
// LlamaServerHelpHandler godoc
// @Summary Get help for llama server
// @Description Returns the help text for the llama server command
// @Tags server
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Help text"
// @Failure 500 {string} string "Internal Server Error"
// @Router /server/help [get]
func (h *Handler) LlamaServerHelpHandler() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
helpCmd := exec.Command("llama-server", "--help")
output, err := helpCmd.CombinedOutput()
if err != nil {
http.Error(w, "Failed to get help: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/plain")
w.Write(output)
// ensureInstanceRunning ensures the instance is running by starting it if on-demand start is enabled
// It handles LRU eviction when the maximum number of running instances is reached
func (h *Handler) ensureInstanceRunning(inst *instance.Instance) error {
options := inst.GetOptions()
allowOnDemand := options != nil && options.OnDemandStart != nil && *options.OnDemandStart
if !allowOnDemand {
return fmt.Errorf("instance is not running and on-demand start is not enabled")
}
}
// LlamaServerVersionHandler godoc
// @Summary Get version of llama server
// @Description Returns the version of the llama server command
// @Tags server
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Version information"
// @Failure 500 {string} string "Internal Server Error"
// @Router /server/version [get]
func (h *Handler) LlamaServerVersionHandler() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
versionCmd := exec.Command("llama-server", "--version")
output, err := versionCmd.CombinedOutput()
if err != nil {
http.Error(w, "Failed to get version: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/plain")
w.Write(output)
}
}
// LlamaServerListDevicesHandler godoc
// @Summary List available devices for llama server
// @Description Returns a list of available devices for the llama server
// @Tags server
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "List of devices"
// @Failure 500 {string} string "Internal Server Error"
// @Router /server/devices [get]
func (h *Handler) LlamaServerListDevicesHandler() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
listCmd := exec.Command("llama-server", "--list-devices")
output, err := listCmd.CombinedOutput()
if err != nil {
http.Error(w, "Failed to list devices: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/plain")
w.Write(output)
}
}
// ListInstances godoc
// @Summary List all instances
// @Description Returns a list of all instances managed by the server
// @Tags instances
// @Security ApiKeyAuth
// @Produces json
// @Success 200 {array} instance.Process "List of instances"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances [get]
func (h *Handler) ListInstances() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
instances, err := h.InstanceManager.ListInstances()
if err != nil {
http.Error(w, "Failed to list instances: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(instances); err != nil {
http.Error(w, "Failed to encode instances: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// CreateInstance godoc
// @Summary Create and start a new instance
// @Description Creates a new instance with the provided configuration options
// @Tags instances
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Param name path string true "Instance Name"
// @Param options body instance.CreateInstanceOptions true "Instance configuration options"
// @Success 201 {object} instance.Process "Created instance details"
// @Failure 400 {string} string "Invalid request body"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name} [post]
func (h *Handler) CreateInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
var options instance.CreateInstanceOptions
if err := json.NewDecoder(r.Body).Decode(&options); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.CreateInstance(name, &options)
if err != nil {
http.Error(w, "Failed to create instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(http.StatusCreated)
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// GetInstance godoc
// @Summary Get details of a specific instance
// @Description Returns the details of a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Process "Instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name} [get]
func (h *Handler) GetInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.GetInstance(name)
if err != nil {
http.Error(w, "Failed to get instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// UpdateInstance godoc
// @Summary Update an instance's configuration
// @Description Updates the configuration of a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Param name path string true "Instance Name"
// @Param options body instance.CreateInstanceOptions true "Instance configuration options"
// @Success 200 {object} instance.Process "Updated instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name} [put]
func (h *Handler) UpdateInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
var options instance.CreateInstanceOptions
if err := json.NewDecoder(r.Body).Decode(&options); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.UpdateInstance(name, &options)
if err != nil {
http.Error(w, "Failed to update instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// StartInstance godoc
// @Summary Start a stopped instance
// @Description Starts a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Process "Started instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name}/start [post]
func (h *Handler) StartInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.StartInstance(name)
if err != nil {
// Check if error is due to maximum running instances limit
if _, ok := err.(manager.MaxRunningInstancesError); ok {
http.Error(w, err.Error(), http.StatusConflict)
return
if h.InstanceManager.IsMaxRunningInstancesReached() {
if h.cfg.Instances.EnableLRUEviction {
err := h.InstanceManager.EvictLRUInstance()
if err != nil {
return fmt.Errorf("cannot start instance, failed to evict instance: %w", err)
}
http.Error(w, "Failed to start instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
} else {
return fmt.Errorf("cannot start instance, maximum number of instances reached")
}
}
}
// StopInstance godoc
// @Summary Stop a running instance
// @Description Stops a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Process "Stopped instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name}/stop [post]
func (h *Handler) StopInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.StopInstance(name)
if err != nil {
http.Error(w, "Failed to stop instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
}
// If on-demand start is enabled, start the instance
if _, err := h.InstanceManager.StartInstance(inst.Name); err != nil {
return fmt.Errorf("failed to start instance: %w", err)
}
}
// RestartInstance godoc
// @Summary Restart a running instance
// @Description Restarts a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Process "Restarted instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name}/restart [post]
func (h *Handler) RestartInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.RestartInstance(name)
if err != nil {
http.Error(w, "Failed to restart instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(inst); err != nil {
http.Error(w, "Failed to encode instance: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// DeleteInstance godoc
// @Summary Delete an instance
// @Description Stops and removes a specific instance by name
// @Tags instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Success 204 "No Content"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name} [delete]
func (h *Handler) DeleteInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
if err := h.InstanceManager.DeleteInstance(name); err != nil {
http.Error(w, "Failed to delete instance: "+err.Error(), http.StatusInternalServerError)
return
}
w.WriteHeader(http.StatusNoContent)
}
}
// GetInstanceLogs godoc
// @Summary Get logs from a specific instance
// @Description Returns the logs from a specific instance by name with optional line limit
// @Tags instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Param lines query string false "Number of lines to retrieve (default: all lines)"
// @Produces text/plain
// @Success 200 {string} string "Instance logs"
// @Failure 400 {string} string "Invalid name format or lines parameter"
// @Failure 500 {string} string "Internal Server Error"
// @Router /instances/{name}/logs [get]
func (h *Handler) GetInstanceLogs() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
lines := r.URL.Query().Get("lines")
if lines == "" {
lines = "-1"
}
num_lines, err := strconv.Atoi(lines)
if err != nil {
http.Error(w, "Invalid lines parameter: "+err.Error(), http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.GetInstance(name)
if err != nil {
http.Error(w, "Failed to get instance: "+err.Error(), http.StatusInternalServerError)
return
}
logs, err := inst.GetLogs(num_lines)
if err != nil {
http.Error(w, "Failed to get logs: "+err.Error(), http.StatusInternalServerError)
return
}
w.Header().Set("Content-Type", "text/plain")
w.Write([]byte(logs))
}
}
// ProxyToInstance godoc
// @Summary Proxy requests to a specific instance
// @Description Forwards HTTP requests to the llama-server instance running on a specific port
// @Tags instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Success 200 "Request successfully proxied to instance"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Failure 503 {string} string "Instance is not running"
// @Router /instances/{name}/proxy [get]
// @Router /instances/{name}/proxy [post]
func (h *Handler) ProxyToInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
http.Error(w, "Instance name cannot be empty", http.StatusBadRequest)
return
}
inst, err := h.InstanceManager.GetInstance(name)
if err != nil {
http.Error(w, "Failed to get instance: "+err.Error(), http.StatusInternalServerError)
return
}
if !inst.IsRunning() {
http.Error(w, "Instance is not running", http.StatusServiceUnavailable)
return
}
// Get the cached proxy for this instance
proxy, err := inst.GetProxy()
if err != nil {
http.Error(w, "Failed to get proxy: "+err.Error(), http.StatusInternalServerError)
return
}
// Strip the "/api/v1/instances/<name>/proxy" prefix from the request URL
prefix := fmt.Sprintf("/api/v1/instances/%s/proxy", name)
proxyPath := r.URL.Path[len(prefix):]
// Ensure the proxy path starts with "/"
if !strings.HasPrefix(proxyPath, "/") {
proxyPath = "/" + proxyPath
}
// Update the last request time for the instance
inst.UpdateLastRequestTime()
// Modify the request to remove the proxy prefix
originalPath := r.URL.Path
r.URL.Path = proxyPath
// Set forwarded headers
r.Header.Set("X-Forwarded-Host", r.Header.Get("Host"))
r.Header.Set("X-Forwarded-Proto", "http")
// Restore original path for logging purposes
defer func() {
r.URL.Path = originalPath
}()
// Forward the request using the cached proxy
proxy.ServeHTTP(w, r)
}
}
// OpenAIListInstances godoc
// @Summary List instances in OpenAI-compatible format
// @Description Returns a list of instances in a format compatible with OpenAI API
// @Tags openai
// @Security ApiKeyAuth
// @Produces json
// @Success 200 {object} OpenAIListInstancesResponse "List of OpenAI-compatible instances"
// @Failure 500 {string} string "Internal Server Error"
// @Router /v1/models [get]
func (h *Handler) OpenAIListInstances() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
instances, err := h.InstanceManager.ListInstances()
if err != nil {
http.Error(w, "Failed to list instances: "+err.Error(), http.StatusInternalServerError)
return
}
openaiInstances := make([]OpenAIInstance, len(instances))
for i, inst := range instances {
openaiInstances[i] = OpenAIInstance{
ID: inst.Name,
Object: "model",
Created: inst.Created,
OwnedBy: "llamactl",
}
}
openaiResponse := OpenAIListInstancesResponse{
Object: "list",
Data: openaiInstances,
}
w.Header().Set("Content-Type", "application/json")
if err := json.NewEncoder(w).Encode(openaiResponse); err != nil {
http.Error(w, "Failed to encode instances: "+err.Error(), http.StatusInternalServerError)
return
}
}
}
// OpenAIProxy godoc
// @Summary OpenAI-compatible proxy endpoint
// @Description Handles all POST requests to /v1/*, routing to the appropriate instance based on the request body. Requires API key authentication via the `Authorization` header.
// @Tags openai
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Success 200 "OpenAI response"
// @Failure 400 {string} string "Invalid request body or model name"
// @Failure 500 {string} string "Internal Server Error"
// @Router /v1/ [post]
func (h *Handler) OpenAIProxy() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
// Read the entire body first
bodyBytes, err := io.ReadAll(r.Body)
if err != nil {
http.Error(w, "Failed to read request body", http.StatusBadRequest)
return
}
r.Body.Close()
// Parse the body to extract model name
var requestBody map[string]any
if err := json.Unmarshal(bodyBytes, &requestBody); err != nil {
http.Error(w, "Invalid request body", http.StatusBadRequest)
return
}
modelName, ok := requestBody["model"].(string)
if !ok || modelName == "" {
http.Error(w, "Model name is required", http.StatusBadRequest)
return
}
// Route to the appropriate inst based on model name
inst, err := h.InstanceManager.GetInstance(modelName)
if err != nil {
http.Error(w, "Failed to get instance: "+err.Error(), http.StatusInternalServerError)
return
}
if !inst.IsRunning() {
allowOnDemand := inst.GetOptions() != nil && inst.GetOptions().OnDemandStart != nil && *inst.GetOptions().OnDemandStart
if !allowOnDemand {
http.Error(w, "Instance is not running", http.StatusServiceUnavailable)
return
}
if h.InstanceManager.IsMaxRunningInstancesReached() {
if h.cfg.Instances.EnableLRUEviction {
err := h.InstanceManager.EvictLRUInstance()
if err != nil {
http.Error(w, "Cannot start Instance, failed to evict instance "+err.Error(), http.StatusInternalServerError)
return
}
} else {
http.Error(w, "Cannot start Instance, maximum number of instances reached", http.StatusConflict)
return
}
}
// If on-demand start is enabled, start the instance
if _, err := h.InstanceManager.StartInstance(modelName); err != nil {
http.Error(w, "Failed to start instance: "+err.Error(), http.StatusInternalServerError)
return
}
// Wait for the instance to become healthy before proceeding
if err := inst.WaitForHealthy(h.cfg.Instances.OnDemandStartTimeout); err != nil { // 2 minutes timeout
http.Error(w, "Instance failed to become healthy: "+err.Error(), http.StatusServiceUnavailable)
return
}
}
proxy, err := inst.GetProxy()
if err != nil {
http.Error(w, "Failed to get proxy: "+err.Error(), http.StatusInternalServerError)
return
}
// Update last request time for the instance
inst.UpdateLastRequestTime()
// Recreate the request body from the bytes we read
r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
r.ContentLength = int64(len(bodyBytes))
proxy.ServeHTTP(w, r)
// Wait for the instance to become healthy before proceeding
if err := inst.WaitForHealthy(h.cfg.Instances.OnDemandStartTimeout); err != nil {
return fmt.Errorf("instance failed to become healthy: %w", err)
}
return nil
}

View File

@@ -0,0 +1,298 @@
package server
import (
"encoding/json"
"fmt"
"llamactl/pkg/backends"
"llamactl/pkg/instance"
"net/http"
"os/exec"
"strings"
)
// ParseCommandRequest represents the request body for backend command parsing
type ParseCommandRequest struct {
Command string `json:"command"`
}
// validateLlamaCppInstance validates that the instance specified in the request is a llama.cpp instance
func (h *Handler) validateLlamaCppInstance(r *http.Request) (*instance.Instance, error) {
inst, err := h.getInstance(r)
if err != nil {
return nil, fmt.Errorf("invalid instance: %w", err)
}
options := inst.GetOptions()
if options == nil {
return nil, fmt.Errorf("cannot obtain instance's options")
}
if options.BackendOptions.BackendType != backends.BackendTypeLlamaCpp {
return nil, fmt.Errorf("instance is not a llama.cpp server")
}
return inst, nil
}
// stripLlamaCppPrefix removes the llama.cpp proxy prefix from the request URL path
func (h *Handler) stripLlamaCppPrefix(r *http.Request, instName string) {
// Strip the "/llama-cpp/<name>" prefix from the request URL
prefix := fmt.Sprintf("/llama-cpp/%s", instName)
r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
}
// LlamaCppUIProxy godoc
// @Summary Proxy requests to llama.cpp UI for the instance
// @Description Proxies requests to the llama.cpp UI for the specified instance
// @Tags Llama.cpp
// @Security ApiKeyAuth
// @Produce html
// @Param name query string true "Instance Name"
// @Success 200 {string} string "Proxied HTML response"
// @Failure 400 {string} string "Invalid instance"
// @Failure 500 {string} string "Internal Server Error"
// @Router /llama-cpp/{name}/ [get]
func (h *Handler) LlamaCppUIProxy() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
inst, err := h.validateLlamaCppInstance(r)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid instance", err.Error())
return
}
if !inst.IsRemote() && !inst.IsRunning() {
writeError(w, http.StatusBadRequest, "instance is not running", "Instance is not running")
return
}
proxy, err := inst.GetProxy()
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get proxy", err.Error())
return
}
if !inst.IsRemote() {
h.stripLlamaCppPrefix(r, inst.Name)
}
proxy.ServeHTTP(w, r)
}
}
// LlamaCppProxy godoc
// @Summary Proxy requests to llama.cpp server instance
// @Description Proxies requests to the specified llama.cpp server instance, starting it on-demand if configured
// @Tags Llama.cpp
// @Security ApiKeyAuth
// @Produce json
// @Param name path string true "Instance Name"
// @Success 200 {object} map[string]any "Proxied response"
// @Failure 400 {string} string "Invalid instance"
// @Failure 500 {string} string "Internal Server Error"
// @Router /llama-cpp/{name}/props [get]
// @Router /llama-cpp/{name}/slots [get]
// @Router /llama-cpp/{name}/apply-template [post]
// @Router /llama-cpp/{name}/completion [post]
// @Router /llama-cpp/{name}/detokenize [post]
// @Router /llama-cpp/{name}/embeddings [post]
// @Router /llama-cpp/{name}/infill [post]
// @Router /llama-cpp/{name}/metrics [post]
// @Router /llama-cpp/{name}/props [post]
// @Router /llama-cpp/{name}/reranking [post]
// @Router /llama-cpp/{name}/tokenize [post]
func (h *Handler) LlamaCppProxy() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
inst, err := h.validateLlamaCppInstance(r)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid instance", err.Error())
return
}
if !inst.IsRemote() && !inst.IsRunning() {
err := h.ensureInstanceRunning(inst)
if err != nil {
writeError(w, http.StatusInternalServerError, "instance start failed", err.Error())
return
}
}
proxy, err := inst.GetProxy()
if err != nil {
writeError(w, http.StatusInternalServerError, "failed to get proxy", err.Error())
return
}
if !inst.IsRemote() {
h.stripLlamaCppPrefix(r, inst.Name)
}
proxy.ServeHTTP(w, r)
}
}
// parseHelper parses a backend command and returns the parsed options
func parseHelper(w http.ResponseWriter, r *http.Request, backend interface {
ParseCommand(string) (any, error)
}) (any, bool) {
var req ParseCommandRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "Invalid JSON body")
return nil, false
}
if strings.TrimSpace(req.Command) == "" {
writeError(w, http.StatusBadRequest, "invalid_command", "Command cannot be empty")
return nil, false
}
// Parse command using the backend's ParseCommand method
parsedOptions, err := backend.ParseCommand(req.Command)
if err != nil {
writeError(w, http.StatusBadRequest, "parse_error", err.Error())
return nil, false
}
return parsedOptions, true
}
// ParseLlamaCommand godoc
// @Summary Parse llama-server command
// @Description Parses a llama-server command string into instance options
// @Tags Backends
// @Security ApiKeyAuth
// @Accept json
// @Produce json
// @Param request body ParseCommandRequest true "Command to parse"
// @Success 200 {object} instance.Options "Parsed options"
// @Failure 400 {object} map[string]string "Invalid request or command"
// @Failure 500 {object} map[string]string "Internal Server Error"
// @Router /api/v1/backends/llama-cpp/parse-command [post]
func (h *Handler) ParseLlamaCommand() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
parsedOptions, ok := parseHelper(w, r, &backends.LlamaServerOptions{})
if !ok {
return
}
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: parsedOptions.(*backends.LlamaServerOptions),
},
}
writeJSON(w, http.StatusOK, options)
}
}
// ParseMlxCommand godoc
// @Summary Parse mlx_lm.server command
// @Description Parses MLX-LM server command string into instance options
// @Tags Backends
// @Security ApiKeyAuth
// @Accept json
// @Produce json
// @Param request body ParseCommandRequest true "Command to parse"
// @Success 200 {object} instance.Options "Parsed options"
// @Failure 400 {object} map[string]string "Invalid request or command"
// @Router /api/v1/backends/mlx/parse-command [post]
func (h *Handler) ParseMlxCommand() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
parsedOptions, ok := parseHelper(w, r, &backends.MlxServerOptions{})
if !ok {
return
}
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeMlxLm,
MlxServerOptions: parsedOptions.(*backends.MlxServerOptions),
},
}
writeJSON(w, http.StatusOK, options)
}
}
// ParseVllmCommand godoc
// @Summary Parse vllm serve command
// @Description Parses a vLLM serve command string into instance options
// @Tags Backends
// @Security ApiKeyAuth
// @Accept json
// @Produce json
// @Param request body ParseCommandRequest true "Command to parse"
// @Success 200 {object} instance.Options "Parsed options"
// @Failure 400 {object} map[string]string "Invalid request or command"
// @Router /api/v1/backends/vllm/parse-command [post]
func (h *Handler) ParseVllmCommand() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
parsedOptions, ok := parseHelper(w, r, &backends.VllmServerOptions{})
if !ok {
return
}
options := &instance.Options{
BackendOptions: backends.Options{
BackendType: backends.BackendTypeVllm,
VllmServerOptions: parsedOptions.(*backends.VllmServerOptions),
},
}
writeJSON(w, http.StatusOK, options)
}
}
// executeLlamaServerCommand executes a llama-server command with the specified flag and returns the output
func (h *Handler) executeLlamaServerCommand(flag, errorMsg string) http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
cmd := exec.Command("llama-server", flag)
output, err := cmd.CombinedOutput()
if err != nil {
writeError(w, http.StatusInternalServerError, "command failed", errorMsg+": "+err.Error())
return
}
writeText(w, http.StatusOK, string(output))
}
}
// LlamaServerHelpHandler godoc
// @Summary Get help for llama server
// @Description Returns the help text for the llama server command
// @Tags Backends
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Help text"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/backends/llama-cpp/help [get]
func (h *Handler) LlamaServerHelpHandler() http.HandlerFunc {
return h.executeLlamaServerCommand("--help", "Failed to get help")
}
// LlamaServerVersionHandler godoc
// @Summary Get version of llama server
// @Description Returns the version of the llama server command
// @Tags Backends
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Version information"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/backends/llama-cpp/version [get]
func (h *Handler) LlamaServerVersionHandler() http.HandlerFunc {
return h.executeLlamaServerCommand("--version", "Failed to get version")
}
// LlamaServerListDevicesHandler godoc
// @Summary List available devices for llama server
// @Description Returns a list of available devices for the llama server
// @Tags Backends
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "List of devices"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/backends/llama-cpp/devices [get]
func (h *Handler) LlamaServerListDevicesHandler() http.HandlerFunc {
return h.executeLlamaServerCommand("--list-devices", "Failed to list devices")
}

View File

@@ -0,0 +1,353 @@
package server
import (
"encoding/json"
"fmt"
"llamactl/pkg/instance"
"llamactl/pkg/manager"
"llamactl/pkg/validation"
"net/http"
"strconv"
"strings"
"github.com/go-chi/chi/v5"
)
// ListInstances godoc
// @Summary List all instances
// @Description Returns a list of all instances managed by the server
// @Tags Instances
// @Security ApiKeyAuth
// @Produces json
// @Success 200 {array} instance.Instance "List of instances"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances [get]
func (h *Handler) ListInstances() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
instances, err := h.InstanceManager.ListInstances()
if err != nil {
writeError(w, http.StatusInternalServerError, "list_failed", "Failed to list instances: "+err.Error())
return
}
writeJSON(w, http.StatusOK, instances)
}
}
// CreateInstance godoc
// @Summary Create and start a new instance
// @Description Creates a new instance with the provided configuration options
// @Tags Instances
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Param name path string true "Instance Name"
// @Param options body instance.Options true "Instance configuration options"
// @Success 201 {object} instance.Instance "Created instance details"
// @Failure 400 {string} string "Invalid request body"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name} [post]
func (h *Handler) CreateInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
var options instance.Options
if err := json.NewDecoder(r.Body).Decode(&options); err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "Invalid request body")
return
}
inst, err := h.InstanceManager.CreateInstance(validatedName, &options)
if err != nil {
writeError(w, http.StatusInternalServerError, "create_failed", "Failed to create instance: "+err.Error())
return
}
writeJSON(w, http.StatusCreated, inst)
}
}
// GetInstance godoc
// @Summary Get details of a specific instance
// @Description Returns the details of a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Instance "Instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name} [get]
func (h *Handler) GetInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
inst, err := h.InstanceManager.GetInstance(validatedName)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance", err.Error())
return
}
writeJSON(w, http.StatusOK, inst)
}
}
// UpdateInstance godoc
// @Summary Update an instance's configuration
// @Description Updates the configuration of a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Param name path string true "Instance Name"
// @Param options body instance.Options true "Instance configuration options"
// @Success 200 {object} instance.Instance "Updated instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name} [put]
func (h *Handler) UpdateInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
var options instance.Options
if err := json.NewDecoder(r.Body).Decode(&options); err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "Invalid request body")
return
}
inst, err := h.InstanceManager.UpdateInstance(validatedName, &options)
if err != nil {
writeError(w, http.StatusInternalServerError, "update_failed", "Failed to update instance: "+err.Error())
return
}
writeJSON(w, http.StatusOK, inst)
}
}
// StartInstance godoc
// @Summary Start a stopped instance
// @Description Starts a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Instance "Started instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name}/start [post]
func (h *Handler) StartInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
inst, err := h.InstanceManager.StartInstance(validatedName)
if err != nil {
// Check if error is due to maximum running instances limit
if _, ok := err.(manager.MaxRunningInstancesError); ok {
writeError(w, http.StatusConflict, "max_instances_reached", err.Error())
return
}
writeError(w, http.StatusInternalServerError, "start_failed", "Failed to start instance: "+err.Error())
return
}
writeJSON(w, http.StatusOK, inst)
}
}
// StopInstance godoc
// @Summary Stop a running instance
// @Description Stops a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Instance "Stopped instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name}/stop [post]
func (h *Handler) StopInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
inst, err := h.InstanceManager.StopInstance(validatedName)
if err != nil {
writeError(w, http.StatusInternalServerError, "stop_failed", "Failed to stop instance: "+err.Error())
return
}
writeJSON(w, http.StatusOK, inst)
}
}
// RestartInstance godoc
// @Summary Restart a running instance
// @Description Restarts a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Instance Name"
// @Success 200 {object} instance.Instance "Restarted instance details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name}/restart [post]
func (h *Handler) RestartInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
inst, err := h.InstanceManager.RestartInstance(validatedName)
if err != nil {
writeError(w, http.StatusInternalServerError, "restart_failed", "Failed to restart instance: "+err.Error())
return
}
writeJSON(w, http.StatusOK, inst)
}
}
// DeleteInstance godoc
// @Summary Delete an instance
// @Description Stops and removes a specific instance by name
// @Tags Instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Success 204 "No Content"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name} [delete]
func (h *Handler) DeleteInstance() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
if err := h.InstanceManager.DeleteInstance(validatedName); err != nil {
writeError(w, http.StatusInternalServerError, "delete_failed", "Failed to delete instance: "+err.Error())
return
}
w.WriteHeader(http.StatusNoContent)
}
}
// GetInstanceLogs godoc
// @Summary Get logs from a specific instance
// @Description Returns the logs from a specific instance by name with optional line limit
// @Tags Instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Param lines query string false "Number of lines to retrieve (default: all lines)"
// @Produces text/plain
// @Success 200 {string} string "Instance logs"
// @Failure 400 {string} string "Invalid name format or lines parameter"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/instances/{name}/logs [get]
func (h *Handler) GetInstanceLogs() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
validatedName, err := validation.ValidateInstanceName(name)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
lines := r.URL.Query().Get("lines")
numLines := -1 // Default to all lines
if lines != "" {
parsedLines, err := strconv.Atoi(lines)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_parameter", "Invalid lines parameter: "+err.Error())
return
}
numLines = parsedLines
}
// Use the instance manager which handles both local and remote instances
logs, err := h.InstanceManager.GetInstanceLogs(validatedName, numLines)
if err != nil {
writeError(w, http.StatusInternalServerError, "logs_failed", "Failed to get logs: "+err.Error())
return
}
writeText(w, http.StatusOK, logs)
}
}
// InstanceProxy godoc
// @Summary Proxy requests to a specific instance, does not autostart instance if stopped
// @Description Forwards HTTP requests to the llama-server instance running on a specific port
// @Tags Instances
// @Security ApiKeyAuth
// @Param name path string true "Instance Name"
// @Success 200 "Request successfully proxied to instance"
// @Failure 400 {string} string "Invalid name format"
// @Failure 500 {string} string "Internal Server Error"
// @Failure 503 {string} string "Instance is not running"
// @Router /api/v1/instances/{name}/proxy [get]
// @Router /api/v1/instances/{name}/proxy [post]
func (h *Handler) InstanceProxy() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
inst, err := h.getInstance(r)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance", err.Error())
return
}
if !inst.IsRunning() {
writeError(w, http.StatusServiceUnavailable, "instance_not_running", "Instance is not running")
return
}
proxy, err := inst.GetProxy()
if err != nil {
writeError(w, http.StatusInternalServerError, "proxy_failed", "Failed to get proxy: "+err.Error())
return
}
if !inst.IsRemote() {
// Strip the "/api/v1/instances/<name>/proxy" prefix from the request URL
prefix := fmt.Sprintf("/api/v1/instances/%s/proxy", inst.Name)
r.URL.Path = strings.TrimPrefix(r.URL.Path, prefix)
}
// Set forwarded headers
r.Header.Set("X-Forwarded-Host", r.Header.Get("Host"))
r.Header.Set("X-Forwarded-Proto", "http")
proxy.ServeHTTP(w, r)
}
}

View File

@@ -0,0 +1,70 @@
package server
import (
"net/http"
"github.com/go-chi/chi/v5"
)
// NodeResponse represents a node configuration in API responses
type NodeResponse struct {
Address string `json:"address"`
}
// ListNodes godoc
// @Summary List all configured nodes
// @Description Returns a map of all nodes configured in the server (node name -> node config)
// @Tags Nodes
// @Security ApiKeyAuth
// @Produces json
// @Success 200 {object} map[string]NodeResponse "Map of nodes"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/nodes [get]
func (h *Handler) ListNodes() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
// Convert to sanitized response format (map of name -> NodeResponse)
nodeResponses := make(map[string]NodeResponse, len(h.cfg.Nodes))
for name, node := range h.cfg.Nodes {
nodeResponses[name] = NodeResponse{
Address: node.Address,
}
}
writeJSON(w, http.StatusOK, nodeResponses)
}
}
// GetNode godoc
// @Summary Get details of a specific node
// @Description Returns the details of a specific node by name
// @Tags Nodes
// @Security ApiKeyAuth
// @Produces json
// @Param name path string true "Node Name"
// @Success 200 {object} NodeResponse "Node details"
// @Failure 400 {string} string "Invalid name format"
// @Failure 404 {string} string "Node not found"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/nodes/{name} [get]
func (h *Handler) GetNode() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
name := chi.URLParam(r, "name")
if name == "" {
writeError(w, http.StatusBadRequest, "invalid_request", "Node name cannot be empty")
return
}
nodeConfig, exists := h.cfg.Nodes[name]
if !exists {
writeError(w, http.StatusNotFound, "not_found", "Node not found")
return
}
// Convert to sanitized response format
nodeResponse := NodeResponse{
Address: nodeConfig.Address,
}
writeJSON(w, http.StatusOK, nodeResponse)
}
}

View File

@@ -0,0 +1,129 @@
package server
import (
"bytes"
"encoding/json"
"io"
"llamactl/pkg/validation"
"net/http"
)
// OpenAIListInstancesResponse represents the response structure for listing instances (models) in OpenAI-compatible format
type OpenAIListInstancesResponse struct {
Object string `json:"object"`
Data []OpenAIInstance `json:"data"`
}
// OpenAIInstance represents a single instance (model) in OpenAI-compatible format
type OpenAIInstance struct {
ID string `json:"id"`
Object string `json:"object"`
Created int64 `json:"created"`
OwnedBy string `json:"owned_by"`
}
// OpenAIListInstances godoc
// @Summary List instances in OpenAI-compatible format
// @Description Returns a list of instances in a format compatible with OpenAI API
// @Tags OpenAI
// @Security ApiKeyAuth
// @Produces json
// @Success 200 {object} OpenAIListInstancesResponse "List of OpenAI-compatible instances"
// @Failure 500 {string} string "Internal Server Error"
// @Router /v1/models [get]
func (h *Handler) OpenAIListInstances() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
instances, err := h.InstanceManager.ListInstances()
if err != nil {
writeError(w, http.StatusInternalServerError, "list_failed", "Failed to list instances: "+err.Error())
return
}
openaiInstances := make([]OpenAIInstance, len(instances))
for i, inst := range instances {
openaiInstances[i] = OpenAIInstance{
ID: inst.Name,
Object: "model",
Created: inst.Created,
OwnedBy: "llamactl",
}
}
openaiResponse := OpenAIListInstancesResponse{
Object: "list",
Data: openaiInstances,
}
writeJSON(w, http.StatusOK, openaiResponse)
}
}
// OpenAIProxy godoc
// @Summary OpenAI-compatible proxy endpoint
// @Description Handles all POST requests to /v1/*, routing to the appropriate instance based on the request body. Requires API key authentication via the `Authorization` header.
// @Tags OpenAI
// @Security ApiKeyAuth
// @Accept json
// @Produces json
// @Success 200 "OpenAI response"
// @Failure 400 {string} string "Invalid request body or instance name"
// @Failure 500 {string} string "Internal Server Error"
// @Router /v1/ [post]
func (h *Handler) OpenAIProxy() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
// Read the entire body first
bodyBytes, err := io.ReadAll(r.Body)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "Failed to read request body")
return
}
r.Body.Close()
// Parse the body to extract instance name
var requestBody map[string]any
if err := json.Unmarshal(bodyBytes, &requestBody); err != nil {
writeError(w, http.StatusBadRequest, "invalid_request", "Invalid request body")
return
}
modelName, ok := requestBody["model"].(string)
if !ok || modelName == "" {
writeError(w, http.StatusBadRequest, "invalid_request", "Instance name is required")
return
}
// Validate instance name at the entry point
validatedName, err := validation.ValidateInstanceName(modelName)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance_name", err.Error())
return
}
// Route to the appropriate inst based on instance name
inst, err := h.InstanceManager.GetInstance(validatedName)
if err != nil {
writeError(w, http.StatusBadRequest, "invalid_instance", err.Error())
return
}
if !inst.IsRemote() && !inst.IsRunning() {
err := h.ensureInstanceRunning(inst)
if err != nil {
writeError(w, http.StatusInternalServerError, "instance_start_failed", err.Error())
return
}
}
proxy, err := inst.GetProxy()
if err != nil {
writeError(w, http.StatusInternalServerError, "proxy_failed", err.Error())
return
}
// Recreate the request body from the bytes we read
r.Body = io.NopCloser(bytes.NewReader(bodyBytes))
r.ContentLength = int64(len(bodyBytes))
proxy.ServeHTTP(w, r)
}
}

View File

@@ -0,0 +1,22 @@
package server
import (
"fmt"
"net/http"
)
// VersionHandler godoc
// @Summary Get llamactl version
// @Description Returns the version of the llamactl command
// @Tags System
// @Security ApiKeyAuth
// @Produces text/plain
// @Success 200 {string} string "Version information"
// @Failure 500 {string} string "Internal Server Error"
// @Router /api/v1/version [get]
func (h *Handler) VersionHandler() http.HandlerFunc {
return func(w http.ResponseWriter, r *http.Request) {
versionInfo := fmt.Sprintf("Version: %s\nCommit: %s\nBuild Time: %s\n", h.cfg.Version, h.cfg.CommitHash, h.cfg.BuildTime)
writeText(w, http.StatusOK, versionInfo)
}
}

View File

@@ -1,13 +0,0 @@
package server
type OpenAIListInstancesResponse struct {
Object string `json:"object"`
Data []OpenAIInstance `json:"data"`
}
type OpenAIInstance struct {
ID string `json:"id"`
Object string `json:"object"`
Created int64 `json:"created"`
OwnedBy string `json:"owned_by"`
}

View File

@@ -8,7 +8,7 @@ import (
"github.com/go-chi/cors"
httpSwagger "github.com/swaggo/http-swagger"
_ "llamactl/apidocs"
_ "llamactl/docs"
"llamactl/webui"
)
@@ -20,7 +20,7 @@ func SetupRouter(handler *Handler) *chi.Mux {
r.Use(cors.Handler(cors.Options{
AllowedOrigins: handler.cfg.Server.AllowedOrigins,
AllowedMethods: []string{"GET", "POST", "PUT", "DELETE", "OPTIONS"},
AllowedHeaders: []string{"Accept", "Authorization", "Content-Type", "X-CSRF-Token"},
AllowedHeaders: handler.cfg.Server.AllowedHeaders,
ExposedHeaders: []string{"Link"},
AllowCredentials: false,
MaxAge: 300,
@@ -44,10 +44,29 @@ func SetupRouter(handler *Handler) *chi.Mux {
r.Get("/version", handler.VersionHandler()) // Get server version
r.Route("/server", func(r chi.Router) {
r.Get("/help", handler.LlamaServerHelpHandler())
r.Get("/version", handler.LlamaServerVersionHandler())
r.Get("/devices", handler.LlamaServerListDevicesHandler())
// Backend-specific endpoints
r.Route("/backends", func(r chi.Router) {
r.Route("/llama-cpp", func(r chi.Router) {
r.Get("/help", handler.LlamaServerHelpHandler())
r.Get("/version", handler.LlamaServerVersionHandler())
r.Get("/devices", handler.LlamaServerListDevicesHandler())
r.Post("/parse-command", handler.ParseLlamaCommand())
})
r.Route("/mlx", func(r chi.Router) {
r.Post("/parse-command", handler.ParseMlxCommand())
})
r.Route("/vllm", func(r chi.Router) {
r.Post("/parse-command", handler.ParseVllmCommand())
})
})
// Node management endpoints
r.Route("/nodes", func(r chi.Router) {
r.Get("/", handler.ListNodes()) // List all nodes
r.Route("/{name}", func(r chi.Router) {
r.Get("/", handler.GetNode())
})
})
// Instance management endpoints
@@ -67,7 +86,7 @@ func SetupRouter(handler *Handler) *chi.Mux {
// Llama.cpp server proxy endpoints (proxied to the actual llama.cpp server)
r.Route("/proxy", func(r chi.Router) {
r.HandleFunc("/*", handler.ProxyToInstance()) // Proxy all llama.cpp server requests
r.HandleFunc("/*", handler.InstanceProxy()) // Proxy all llama.cpp server requests
})
})
})
@@ -93,6 +112,51 @@ func SetupRouter(handler *Handler) *chi.Mux {
})
r.Route("/llama-cpp/{name}", func(r chi.Router) {
// Public Routes
// Allow llama-cpp server to serve its own WebUI if it is running.
// Don't auto start the server since it can be accessed without an API key
r.Get("/", handler.LlamaCppUIProxy())
// Private Routes
r.Group(func(r chi.Router) {
if authMiddleware != nil && handler.cfg.Auth.RequireInferenceAuth {
r.Use(authMiddleware.AuthMiddleware(KeyTypeInference))
}
// This handler auto start the server if it's not running
llamaCppHandler := handler.LlamaCppProxy()
// llama.cpp server specific proxy endpoints
r.Get("/props", llamaCppHandler)
// /slots endpoint is secured (see: https://github.com/ggml-org/llama.cpp/pull/15630)
r.Get("/slots", llamaCppHandler)
r.Post("/apply-template", llamaCppHandler)
r.Post("/completion", llamaCppHandler)
r.Post("/detokenize", llamaCppHandler)
r.Post("/embeddings", llamaCppHandler)
r.Post("/infill", llamaCppHandler)
r.Post("/metrics", llamaCppHandler)
r.Post("/props", llamaCppHandler)
r.Post("/reranking", llamaCppHandler)
r.Post("/tokenize", llamaCppHandler)
// OpenAI-compatible proxy endpoint
// Handles all POST requests to /v1/*, including:
// - /v1/completions
// - /v1/chat/completions
// - /v1/embeddings
// - /v1/rerank
// - /v1/reranking
// llamaCppHandler is used here because some users of llama.cpp endpoints depend
// on "model" field being optional, and handler.OpenAIProxy requires it.
r.Post("/v1/*", llamaCppHandler)
})
})
// Serve WebUI files
if err := webui.SetupWebUI(r); err != nil {
fmt.Printf("Failed to set up WebUI: %v\n", err)

View File

@@ -1,5 +1,7 @@
package testutil
import "slices"
// Helper functions for pointer fields
func BoolPtr(b bool) *bool {
return &b
@@ -8,3 +10,23 @@ func BoolPtr(b bool) *bool {
func IntPtr(i int) *int {
return &i
}
// Helper functions for testing command arguments
// Contains checks if a slice contains a specific item
func Contains(slice []string, item string) bool {
return slices.Contains(slice, item)
}
// ContainsFlagWithValue checks if args contains a flag followed by a specific value
func ContainsFlagWithValue(args []string, flag, value string) bool {
for i, arg := range args {
if arg == flag {
// Check if there's a next argument and it matches the expected value
if i+1 < len(args) && args[i+1] == value {
return true
}
}
}
return false
}

View File

@@ -2,8 +2,6 @@ package validation
import (
"fmt"
"llamactl/pkg/backends"
"llamactl/pkg/instance"
"reflect"
"regexp"
)
@@ -24,8 +22,8 @@ var (
type ValidationError error
// validateStringForInjection checks if a string contains dangerous patterns
func validateStringForInjection(value string) error {
// ValidateStringForInjection checks if a string contains dangerous patterns
func ValidateStringForInjection(value string) error {
for _, pattern := range dangerousPatterns {
if pattern.MatchString(value) {
return ValidationError(fmt.Errorf("value contains potentially dangerous characters: %s", value))
@@ -34,42 +32,8 @@ func validateStringForInjection(value string) error {
return nil
}
// ValidateInstanceOptions performs validation based on backend type
func ValidateInstanceOptions(options *instance.CreateInstanceOptions) error {
if options == nil {
return ValidationError(fmt.Errorf("options cannot be nil"))
}
// Validate based on backend type
switch options.BackendType {
case backends.BackendTypeLlamaCpp:
return validateLlamaCppOptions(options)
default:
return ValidationError(fmt.Errorf("unsupported backend type: %s", options.BackendType))
}
}
// validateLlamaCppOptions validates llama.cpp specific options
func validateLlamaCppOptions(options *instance.CreateInstanceOptions) error {
if options.LlamaServerOptions == nil {
return ValidationError(fmt.Errorf("llama server options cannot be nil for llama.cpp backend"))
}
// Use reflection to check all string fields for injection patterns
if err := validateStructStrings(options.LlamaServerOptions, ""); err != nil {
return err
}
// Basic network validation for port
if options.LlamaServerOptions.Port < 0 || options.LlamaServerOptions.Port > 65535 {
return ValidationError(fmt.Errorf("invalid port range: %d", options.LlamaServerOptions.Port))
}
return nil
}
// validateStructStrings recursively validates all string fields in a struct
func validateStructStrings(v any, fieldPath string) error {
// ValidateStructStrings recursively validates all string fields in a struct
func ValidateStructStrings(v any, fieldPath string) error {
val := reflect.ValueOf(v)
if val.Kind() == reflect.Ptr {
val = val.Elem()
@@ -95,21 +59,21 @@ func validateStructStrings(v any, fieldPath string) error {
switch field.Kind() {
case reflect.String:
if err := validateStringForInjection(field.String()); err != nil {
if err := ValidateStringForInjection(field.String()); err != nil {
return ValidationError(fmt.Errorf("field %s: %w", fieldName, err))
}
case reflect.Slice:
if field.Type().Elem().Kind() == reflect.String {
for j := 0; j < field.Len(); j++ {
if err := validateStringForInjection(field.Index(j).String()); err != nil {
if err := ValidateStringForInjection(field.Index(j).String()); err != nil {
return ValidationError(fmt.Errorf("field %s[%d]: %w", fieldName, j, err))
}
}
}
case reflect.Struct:
if err := validateStructStrings(field.Interface(), fieldName); err != nil {
if err := ValidateStructStrings(field.Interface(), fieldName); err != nil {
return err
}
}

View File

@@ -2,9 +2,6 @@ package validation_test
import (
"llamactl/pkg/backends"
"llamactl/pkg/backends/llamacpp"
"llamactl/pkg/instance"
"llamactl/pkg/testutil"
"llamactl/pkg/validation"
"strings"
"testing"
@@ -58,13 +55,11 @@ func TestValidateInstanceName(t *testing.T) {
}
func TestValidateInstanceOptions_NilOptions(t *testing.T) {
err := validation.ValidateInstanceOptions(nil)
var opts backends.Options
err := opts.ValidateInstanceOptions()
if err == nil {
t.Error("Expected error for nil options")
}
if !strings.Contains(err.Error(), "options cannot be nil") {
t.Errorf("Expected 'options cannot be nil' error, got: %v", err)
}
}
func TestValidateInstanceOptions_PortValidation(t *testing.T) {
@@ -83,14 +78,14 @@ func TestValidateInstanceOptions_PortValidation(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
options := &instance.CreateInstanceOptions{
options := backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Port: tt.port,
},
}
err := validation.ValidateInstanceOptions(options)
err := options.ValidateInstanceOptions()
if (err != nil) != tt.wantErr {
t.Errorf("ValidateInstanceOptions(port=%d) error = %v, wantErr %v", tt.port, err, tt.wantErr)
}
@@ -137,14 +132,14 @@ func TestValidateInstanceOptions_StringInjection(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Test with Model field (string field)
options := &instance.CreateInstanceOptions{
options := backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Model: tt.value,
},
}
err := validation.ValidateInstanceOptions(options)
err := options.ValidateInstanceOptions()
if (err != nil) != tt.wantErr {
t.Errorf("ValidateInstanceOptions(model=%q) error = %v, wantErr %v", tt.value, err, tt.wantErr)
}
@@ -175,14 +170,14 @@ func TestValidateInstanceOptions_ArrayInjection(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
// Test with Lora field (array field)
options := &instance.CreateInstanceOptions{
options := backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Lora: tt.array,
},
}
err := validation.ValidateInstanceOptions(options)
err := options.ValidateInstanceOptions()
if (err != nil) != tt.wantErr {
t.Errorf("ValidateInstanceOptions(lora=%v) error = %v, wantErr %v", tt.array, err, tt.wantErr)
}
@@ -194,14 +189,14 @@ func TestValidateInstanceOptions_MultipleFieldInjection(t *testing.T) {
// Test that injection in any field is caught
tests := []struct {
name string
options *instance.CreateInstanceOptions
options backends.Options
wantErr bool
}{
{
name: "injection in model field",
options: &instance.CreateInstanceOptions{
options: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "safe.gguf",
HFRepo: "microsoft/model; curl evil.com",
},
@@ -210,9 +205,9 @@ func TestValidateInstanceOptions_MultipleFieldInjection(t *testing.T) {
},
{
name: "injection in log file",
options: &instance.CreateInstanceOptions{
options: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "safe.gguf",
LogFile: "/tmp/log.txt | tee /etc/passwd",
},
@@ -221,9 +216,9 @@ func TestValidateInstanceOptions_MultipleFieldInjection(t *testing.T) {
},
{
name: "all safe fields",
options: &instance.CreateInstanceOptions{
options: backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
LlamaServerOptions: &backends.LlamaServerOptions{
Model: "/path/to/model.gguf",
HFRepo: "microsoft/DialoGPT-medium",
LogFile: "/tmp/llama.log",
@@ -237,7 +232,7 @@ func TestValidateInstanceOptions_MultipleFieldInjection(t *testing.T) {
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
err := validation.ValidateInstanceOptions(tt.options)
err := tt.options.ValidateInstanceOptions()
if (err != nil) != tt.wantErr {
t.Errorf("ValidateInstanceOptions() error = %v, wantErr %v", err, tt.wantErr)
}
@@ -247,12 +242,9 @@ func TestValidateInstanceOptions_MultipleFieldInjection(t *testing.T) {
func TestValidateInstanceOptions_NonStringFields(t *testing.T) {
// Test that non-string fields don't interfere with validation
options := &instance.CreateInstanceOptions{
AutoRestart: testutil.BoolPtr(true),
MaxRestarts: testutil.IntPtr(5),
RestartDelay: testutil.IntPtr(10),
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &llamacpp.LlamaServerOptions{
options := backends.Options{
BackendType: backends.BackendTypeLlamaCpp,
LlamaServerOptions: &backends.LlamaServerOptions{
Port: 8080,
GPULayers: 32,
CtxSize: 4096,
@@ -264,7 +256,7 @@ func TestValidateInstanceOptions_NonStringFields(t *testing.T) {
},
}
err := validation.ValidateInstanceOptions(options)
err := options.ValidateInstanceOptions()
if err != nil {
t.Errorf("ValidateInstanceOptions with non-string fields should not error, got: %v", err)
}

100
webui/package-lock.json generated
View File

@@ -19,6 +19,7 @@
"lucide-react": "^0.525.0",
"react": "^19.1.0",
"react-dom": "^19.1.0",
"sonner": "^2.0.7",
"tailwind-merge": "^3.3.1",
"tailwindcss": "^4.1.11",
"zod": "^4.0.5"
@@ -42,7 +43,7 @@
"tw-animate-css": "^1.3.5",
"typescript": "^5.8.3",
"typescript-eslint": "^8.38.0",
"vite": "^7.0.5",
"vite": "^7.1.11",
"vitest": "^3.2.4"
}
},
@@ -2109,6 +2110,60 @@
"node": ">=14.0.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/@emnapi/core": {
"version": "1.4.3",
"inBundle": true,
"license": "MIT",
"optional": true,
"dependencies": {
"@emnapi/wasi-threads": "1.0.2",
"tslib": "^2.4.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/@emnapi/runtime": {
"version": "1.4.3",
"inBundle": true,
"license": "MIT",
"optional": true,
"dependencies": {
"tslib": "^2.4.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/@emnapi/wasi-threads": {
"version": "1.0.2",
"inBundle": true,
"license": "MIT",
"optional": true,
"dependencies": {
"tslib": "^2.4.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/@napi-rs/wasm-runtime": {
"version": "0.2.11",
"inBundle": true,
"license": "MIT",
"optional": true,
"dependencies": {
"@emnapi/core": "^1.4.3",
"@emnapi/runtime": "^1.4.3",
"@tybys/wasm-util": "^0.9.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/@tybys/wasm-util": {
"version": "0.9.0",
"inBundle": true,
"license": "MIT",
"optional": true,
"dependencies": {
"tslib": "^2.4.0"
}
},
"node_modules/@tailwindcss/oxide-wasm32-wasi/node_modules/tslib": {
"version": "2.8.0",
"inBundle": true,
"license": "0BSD",
"optional": true
},
"node_modules/@tailwindcss/oxide-win32-arm64-msvc": {
"version": "4.1.11",
"resolved": "https://registry.npmjs.org/@tailwindcss/oxide-win32-arm64-msvc/-/oxide-win32-arm64-msvc-4.1.11.tgz",
@@ -4190,10 +4245,13 @@
}
},
"node_modules/fdir": {
"version": "6.4.6",
"resolved": "https://registry.npmjs.org/fdir/-/fdir-6.4.6.tgz",
"integrity": "sha512-hiFoqpyZcfNm1yc4u8oWCf9A2c4D3QjCrks3zmoVKVxpQRzmPNar1hUJcBG2RQHvEVGDN+Jm81ZheVLAQMK6+w==",
"version": "6.5.0",
"resolved": "https://registry.npmjs.org/fdir/-/fdir-6.5.0.tgz",
"integrity": "sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg==",
"license": "MIT",
"engines": {
"node": ">=12.0.0"
},
"peerDependencies": {
"picomatch": "^3 || ^4"
},
@@ -6693,6 +6751,16 @@
"node": ">=18"
}
},
"node_modules/sonner": {
"version": "2.0.7",
"resolved": "https://registry.npmjs.org/sonner/-/sonner-2.0.7.tgz",
"integrity": "sha512-W6ZN4p58k8aDKA4XPcx2hpIQXBRAgyiWVkYhT7CvK6D3iAu7xjvVyhQHg2/iaKJZ1XVJ4r7XuwGL+WGEK37i9w==",
"license": "MIT",
"peerDependencies": {
"react": "^18.0.0 || ^19.0.0 || ^19.0.0-rc",
"react-dom": "^18.0.0 || ^19.0.0 || ^19.0.0-rc"
}
},
"node_modules/source-map-js": {
"version": "1.2.1",
"resolved": "https://registry.npmjs.org/source-map-js/-/source-map-js-1.2.1.tgz",
@@ -6973,13 +7041,13 @@
"license": "MIT"
},
"node_modules/tinyglobby": {
"version": "0.2.14",
"resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.14.tgz",
"integrity": "sha512-tX5e7OM1HnYr2+a2C/4V0htOcSQcoSTH9KgJnVvNm5zm/cyEWKJ7j7YutsH9CxMdtOkkLFy2AHrMci9IM8IPZQ==",
"version": "0.2.15",
"resolved": "https://registry.npmjs.org/tinyglobby/-/tinyglobby-0.2.15.tgz",
"integrity": "sha512-j2Zq4NyQYG5XMST4cbs02Ak8iJUdxRM0XI5QyxXuZOzKOINmWurp3smXu3y5wDcJrptwpSjgXHzIQxR0omXljQ==",
"license": "MIT",
"dependencies": {
"fdir": "^6.4.4",
"picomatch": "^4.0.2"
"fdir": "^6.5.0",
"picomatch": "^4.0.3"
},
"engines": {
"node": ">=12.0.0"
@@ -7356,17 +7424,17 @@
}
},
"node_modules/vite": {
"version": "7.0.5",
"resolved": "https://registry.npmjs.org/vite/-/vite-7.0.5.tgz",
"integrity": "sha512-1mncVwJxy2C9ThLwz0+2GKZyEXuC3MyWtAAlNftlZZXZDP3AJt5FmwcMit/IGGaNZ8ZOB2BNO/HFUB+CpN0NQw==",
"version": "7.1.11",
"resolved": "https://registry.npmjs.org/vite/-/vite-7.1.11.tgz",
"integrity": "sha512-uzcxnSDVjAopEUjljkWh8EIrg6tlzrjFUfMcR1EVsRDGwf/ccef0qQPRyOrROwhrTDaApueq+ja+KLPlzR/zdg==",
"license": "MIT",
"dependencies": {
"esbuild": "^0.25.0",
"fdir": "^6.4.6",
"picomatch": "^4.0.2",
"fdir": "^6.5.0",
"picomatch": "^4.0.3",
"postcss": "^8.5.6",
"rollup": "^4.40.0",
"tinyglobby": "^0.2.14"
"rollup": "^4.43.0",
"tinyglobby": "^0.2.15"
},
"bin": {
"vite": "bin/vite.js"

View File

@@ -28,6 +28,7 @@
"lucide-react": "^0.525.0",
"react": "^19.1.0",
"react-dom": "^19.1.0",
"sonner": "^2.0.7",
"tailwind-merge": "^3.3.1",
"tailwindcss": "^4.1.11",
"zod": "^4.0.5"
@@ -51,7 +52,7 @@
"tw-animate-css": "^1.3.5",
"typescript": "^5.8.3",
"typescript-eslint": "^8.38.0",
"vite": "^7.0.5",
"vite": "^7.1.11",
"vitest": "^3.2.4"
}
}

View File

@@ -8,6 +8,7 @@ import { type CreateInstanceOptions, type Instance } from "@/types/instance";
import { useInstances } from "@/contexts/InstancesContext";
import { useAuth } from "@/contexts/AuthContext";
import { ThemeProvider } from "@/contexts/ThemeContext";
import { Toaster } from "sonner";
function App() {
const { isAuthenticated, isLoading: authLoading } = useAuth();
@@ -85,6 +86,8 @@ function App() {
open={isSystemInfoModalOpen}
onOpenChange={setIsSystemInfoModalOpen}
/>
<Toaster />
</div>
</ThemeProvider>
);

View File

@@ -12,12 +12,14 @@ import { AuthProvider } from '@/contexts/AuthContext'
vi.mock('@/lib/api', () => ({
instancesApi: {
list: vi.fn(),
get: vi.fn(),
create: vi.fn(),
update: vi.fn(),
start: vi.fn(),
stop: vi.fn(),
restart: vi.fn(),
delete: vi.fn(),
getHealth: vi.fn(),
},
serverApi: {
getHelp: vi.fn(),
@@ -30,9 +32,21 @@ vi.mock('@/lib/api', () => ({
vi.mock('@/lib/healthService', () => ({
healthService: {
subscribe: vi.fn(() => () => {}),
checkHealth: vi.fn(),
refreshHealth: vi.fn(() => Promise.resolve()),
checkHealthAfterOperation: vi.fn(),
performHealthCheck: vi.fn(() => Promise.resolve({
state: 'ready',
instanceStatus: 'running',
lastChecked: new Date(),
source: 'http'
})),
},
checkHealth: vi.fn(),
checkHealth: vi.fn(() => Promise.resolve({
state: 'ready',
instanceStatus: 'running',
lastChecked: new Date(),
source: 'http'
})),
}))
function renderApp() {
@@ -160,7 +174,7 @@ describe('App Component - Critical Business Logic Only', () => {
expect(screen.getAllByTitle('Start instance').length).toBeGreaterThan(0)
expect(screen.getAllByTitle('Stop instance').length).toBeGreaterThan(0)
expect(screen.getAllByTitle('Edit instance').length).toBe(2)
expect(screen.getAllByTitle('Delete instance').length).toBeGreaterThan(0)
expect(screen.getAllByTitle('More actions').length).toBe(2)
})
it('delete confirmation calls correct API', async () => {
@@ -174,8 +188,17 @@ describe('App Component - Critical Business Logic Only', () => {
expect(screen.getByText('test-instance-1')).toBeInTheDocument()
})
const deleteButtons = screen.getAllByTitle('Delete instance')
await user.click(deleteButtons[0])
// First click the "More actions" button to reveal the delete button
const moreActionsButtons = screen.getAllByTitle('More actions')
await user.click(moreActionsButtons[0])
// Wait for the delete button to appear and click it
await waitFor(() => {
expect(screen.getByTitle('Delete instance')).toBeInTheDocument()
})
const deleteButton = screen.getByTitle('Delete instance')
await user.click(deleteButton)
// Verify confirmation and API call
expect(confirmSpy).toHaveBeenCalledWith('Are you sure you want to delete instance "test-instance-1"?')

View File

@@ -0,0 +1,65 @@
import React from "react";
import { Badge } from "@/components/ui/badge";
import { BackendType, type BackendTypeValue } from "@/types/instance";
import { Server, Package } from "lucide-react";
interface BackendBadgeProps {
backend?: BackendTypeValue;
docker?: boolean;
}
const BackendBadge: React.FC<BackendBadgeProps> = ({ backend, docker }) => {
if (!backend) {
return null;
}
const getText = () => {
switch (backend) {
case BackendType.LLAMA_CPP:
return "llama.cpp";
case BackendType.MLX_LM:
return "MLX";
case BackendType.VLLM:
return "vLLM";
default:
return backend;
}
};
const getColorClasses = () => {
switch (backend) {
case BackendType.LLAMA_CPP:
return "bg-blue-100 text-blue-800 border-blue-200 dark:bg-blue-900 dark:text-blue-200 dark:border-blue-800";
case BackendType.MLX_LM:
return "bg-green-100 text-green-800 border-green-200 dark:bg-green-900 dark:text-green-200 dark:border-green-800";
case BackendType.VLLM:
return "bg-purple-100 text-purple-800 border-purple-200 dark:bg-purple-900 dark:text-purple-200 dark:border-purple-800";
default:
return "bg-gray-100 text-gray-800 border-gray-200 dark:bg-gray-900 dark:text-gray-200 dark:border-gray-800";
}
};
return (
<div className="flex items-center gap-1">
<Badge
variant="outline"
className={`flex items-center gap-1.5 ${getColorClasses()}`}
>
<Server className="h-3 w-3" />
<span className="text-xs">{getText()}</span>
</Badge>
{docker && (
<Badge
variant="outline"
className="flex items-center gap-1.5 bg-orange-100 text-orange-800 border-orange-200 dark:bg-orange-900 dark:text-orange-200 dark:border-orange-800"
title="Docker enabled"
>
<Package className="h-3 w-3" />
<span className="text-[10px] uppercase tracking-wide">Docker</span>
</Badge>
)}
</div>
);
};
export default BackendBadge;

View File

@@ -2,11 +2,10 @@ import React from 'react'
import { Input } from '@/components/ui/input'
import { Label } from '@/components/ui/label'
import { Checkbox } from '@/components/ui/checkbox'
import type { BackendOptions } from '@/schemas/instanceOptions'
import { getBackendFieldType, basicBackendFieldsConfig } from '@/lib/zodFormUtils'
interface BackendFormFieldProps {
fieldKey: keyof BackendOptions
fieldKey: string
value: string | number | boolean | string[] | undefined
onChange: (key: string, value: string | number | boolean | string[] | undefined) => void
}
@@ -46,7 +45,6 @@ const BackendFormField: React.FC<BackendFormFieldProps> = ({ fieldKey, value, on
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}
@@ -73,7 +71,6 @@ const BackendFormField: React.FC<BackendFormFieldProps> = ({ fieldKey, value, on
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}
@@ -100,7 +97,6 @@ const BackendFormField: React.FC<BackendFormFieldProps> = ({ fieldKey, value, on
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}

View File

@@ -2,7 +2,7 @@
import React from "react";
import { Badge } from "@/components/ui/badge";
import type { HealthStatus } from "@/types/instance";
import { CheckCircle, Loader2, XCircle } from "lucide-react";
import { CheckCircle, Loader2, XCircle, Clock } from "lucide-react";
interface HealthBadgeProps {
health?: HealthStatus;
@@ -10,37 +10,33 @@ interface HealthBadgeProps {
const HealthBadge: React.FC<HealthBadgeProps> = ({ health }) => {
if (!health) {
health = {
status: "unknown", // Default to unknown if not provided
lastChecked: new Date(), // Default to current date
message: undefined, // No message by default
};
return null;
}
const getIcon = () => {
switch (health.status) {
case "ok":
switch (health.state) {
case "ready":
return <CheckCircle className="h-3 w-3" />;
case "loading":
case "starting":
return <Loader2 className="h-3 w-3 animate-spin" />;
case "error":
return <XCircle className="h-3 w-3" />;
case "unknown":
case "restarting":
return <Loader2 className="h-3 w-3 animate-spin" />;
case "stopped":
return <Clock className="h-3 w-3" />;
case "failed":
return <XCircle className="h-3 w-3" />;
}
};
const getVariant = () => {
switch (health.status) {
case "ok":
switch (health.state) {
case "ready":
return "default";
case "loading":
case "starting":
return "outline";
case "error":
return "destructive";
case "unknown":
case "restarting":
return "outline";
case "stopped":
return "secondary";
case "failed":
return "destructive";
@@ -48,15 +44,15 @@ const HealthBadge: React.FC<HealthBadgeProps> = ({ health }) => {
};
const getText = () => {
switch (health.status) {
case "ok":
switch (health.state) {
case "ready":
return "Ready";
case "loading":
return "Loading";
case "error":
return "Error";
case "unknown":
return "Unknown";
case "starting":
return "Starting";
case "restarting":
return "Restarting";
case "stopped":
return "Stopped";
case "failed":
return "Failed";
}
@@ -66,10 +62,11 @@ const HealthBadge: React.FC<HealthBadgeProps> = ({ health }) => {
<Badge
variant={getVariant()}
className={`flex items-center gap-1.5 ${
health.status === "ok"
health.state === "ready"
? "bg-green-100 text-green-800 border-green-200 dark:bg-green-900 dark:text-green-200 dark:border-green-800"
: ""
}`}
title={health.error || `Source: ${health.source}`}
>
{getIcon()}
<span className="text-xs">{getText()}</span>

View File

@@ -2,9 +2,10 @@
import { Button } from "@/components/ui/button";
import { Card, CardContent, CardHeader, CardTitle } from "@/components/ui/card";
import type { Instance } from "@/types/instance";
import { Edit, FileText, Play, Square, Trash2 } from "lucide-react";
import { Edit, FileText, Play, Square, Trash2, MoreHorizontal } from "lucide-react";
import LogsDialog from "@/components/LogDialog";
import HealthBadge from "@/components/HealthBadge";
import BackendBadge from "@/components/BackendBadge";
import { useState } from "react";
import { useInstanceHealth } from "@/hooks/useInstanceHealth";
@@ -24,6 +25,7 @@ function InstanceCard({
editInstance,
}: InstanceCardProps) {
const [isLogsOpen, setIsLogsOpen] = useState(false);
const [showAllActions, setShowAllActions] = useState(false);
const health = useInstanceHealth(instance.name, instance.status);
const handleStart = () => {
@@ -54,36 +56,44 @@ function InstanceCard({
return (
<>
<Card>
<CardHeader className="pb-3">
<div className="flex items-center justify-between">
<CardTitle className="text-lg">{instance.name}</CardTitle>
{running && <HealthBadge health={health} />}
<Card className="hover:shadow-md transition-shadow">
<CardHeader className="pb-4">
{/* Header with instance name and status badges */}
<div className="space-y-3">
<CardTitle className="text-lg font-semibold leading-tight break-words">
{instance.name}
</CardTitle>
{/* Badges row */}
<div className="flex items-center gap-2 flex-wrap">
<BackendBadge backend={instance.options?.backend_type} docker={instance.docker_enabled} />
{running && <HealthBadge health={health} />}
</div>
</div>
</CardHeader>
<CardContent>
<div className="flex gap-1">
<CardContent className="pt-0">
{/* Primary actions - always visible */}
<div className="flex items-center gap-2 mb-3">
<Button
size="sm"
variant="outline"
onClick={handleStart}
disabled={running}
title="Start instance"
data-testid="start-instance-button"
variant={running ? "outline" : "default"}
onClick={running ? handleStop : handleStart}
className="flex-1"
title={running ? "Stop instance" : "Start instance"}
data-testid={running ? "stop-instance-button" : "start-instance-button"}
>
<Play className="h-4 w-4" />
</Button>
<Button
size="sm"
variant="outline"
onClick={handleStop}
disabled={!running}
title="Stop instance"
data-testid="stop-instance-button"
>
<Square className="h-4 w-4" />
{running ? (
<>
<Square className="h-4 w-4 mr-1" />
Stop
</>
) : (
<>
<Play className="h-4 w-4 mr-1" />
Start
</>
)}
</Button>
<Button
@@ -99,24 +109,40 @@ function InstanceCard({
<Button
size="sm"
variant="outline"
onClick={handleLogs}
title="View logs"
data-testid="view-logs-button"
onClick={() => setShowAllActions(!showAllActions)}
title="More actions"
>
<FileText className="h-4 w-4" />
</Button>
<Button
size="sm"
variant="destructive"
onClick={handleDelete}
disabled={running}
title="Delete instance"
data-testid="delete-instance-button"
>
<Trash2 className="h-4 w-4" />
<MoreHorizontal className="h-4 w-4" />
</Button>
</div>
{/* Secondary actions - collapsible */}
{showAllActions && (
<div className="flex items-center gap-2 pt-2 border-t border-border">
<Button
size="sm"
variant="outline"
onClick={handleLogs}
title="View logs"
data-testid="view-logs-button"
className="flex-1"
>
<FileText className="h-4 w-4 mr-1" />
Logs
</Button>
<Button
size="sm"
variant="destructive"
onClick={handleDelete}
disabled={running}
title="Delete instance"
data-testid="delete-instance-button"
>
<Trash2 className="h-4 w-4" />
</Button>
</div>
)}
</CardContent>
</Card>

View File

@@ -1,7 +1,5 @@
import React, { useState, useEffect } from "react";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Label } from "@/components/ui/label";
import {
Dialog,
DialogContent,
@@ -11,10 +9,9 @@ import {
DialogTitle,
} from "@/components/ui/dialog";
import { BackendType, type CreateInstanceOptions, type Instance } from "@/types/instance";
import { getBasicFields, getAdvancedFields, getBasicBackendFields, getAdvancedBackendFields } from "@/lib/zodFormUtils";
import { ChevronDown, ChevronRight } from "lucide-react";
import ZodFormField from "@/components/ZodFormField";
import BackendFormField from "@/components/BackendFormField";
import ParseCommandDialog from "@/components/ParseCommandDialog";
import InstanceSettingsCard from "@/components/instance/InstanceSettingsCard";
import BackendConfigurationCard from "@/components/instance/BackendConfigurationCard";
interface InstanceDialogProps {
open: boolean;
@@ -33,14 +30,9 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
const [instanceName, setInstanceName] = useState("");
const [formData, setFormData] = useState<CreateInstanceOptions>({});
const [showAdvanced, setShowAdvanced] = useState(false);
const [nameError, setNameError] = useState("");
const [showParseDialog, setShowParseDialog] = useState(false);
// Get field lists dynamically from the type
const basicFields = getBasicFields();
const advancedFields = getAdvancedFields();
const basicBackendFields = getBasicBackendFields();
const advancedBackendFields = getAdvancedBackendFields();
// Reset form when dialog opens/closes or when instance changes
useEffect(() => {
@@ -58,16 +50,26 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
backend_options: {},
});
}
setShowAdvanced(false); // Always start with basic view
setNameError(""); // Reset any name errors
}
}, [open, instance]);
const handleFieldChange = (key: keyof CreateInstanceOptions, value: any) => {
setFormData((prev) => ({
...prev,
[key]: value,
}));
setFormData((prev) => {
// If backend_type is changing, clear backend_options
if (key === 'backend_type' && prev.backend_type !== value) {
return {
...prev,
[key]: value,
backend_options: {}, // Clear backend options when backend type changes
};
}
return {
...prev,
[key]: value,
};
});
};
const handleBackendFieldChange = (key: string, value: any) => {
@@ -76,7 +78,7 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
backend_options: {
...prev.backend_options,
[key]: value,
},
} as any,
}));
};
@@ -104,7 +106,7 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
// Clean up undefined values to avoid sending empty fields
const cleanOptions: CreateInstanceOptions = {};
Object.entries(formData).forEach(([key, value]) => {
if (key === 'backend_options' && value && typeof value === 'object') {
if (key === 'backend_options' && value && typeof value === 'object' && !Array.isArray(value)) {
// Handle backend_options specially - clean nested object
const cleanBackendOptions: any = {};
Object.entries(value).forEach(([backendKey, backendValue]) => {
@@ -116,13 +118,17 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
cleanBackendOptions[backendKey] = backendValue;
}
});
// Only include backend_options if it has content
if (Object.keys(cleanBackendOptions).length > 0) {
(cleanOptions as any)[key] = cleanBackendOptions;
}
} else if (value !== undefined && value !== null && (typeof value !== 'string' || value.trim() !== "")) {
// Handle arrays - don't include empty arrays
} else if (value !== undefined && value !== null) {
// Skip empty strings
if (typeof value === 'string' && value.trim() === "") {
return;
}
// Skip empty arrays
if (Array.isArray(value) && value.length === 0) {
return;
}
@@ -138,12 +144,15 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
onOpenChange(false);
};
const toggleAdvanced = () => {
setShowAdvanced(!showAdvanced);
const handleCommandParsed = (parsedOptions: CreateInstanceOptions) => {
setFormData(prev => ({
...prev,
...parsedOptions,
}));
setShowParseDialog(false);
};
// Check if auto_restart is enabled
const isAutoRestartEnabled = formData.auto_restart === true;
// Save button label logic
let saveButtonLabel = "Create Instance";
@@ -170,168 +179,25 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
</DialogHeader>
<div className="flex-1 overflow-y-auto">
<div className="grid gap-6 py-4">
{/* Instance Name - Special handling since it's not in CreateInstanceOptions */}
<div className="grid gap-2">
<Label htmlFor="name">
Instance Name <span className="text-red-500">*</span>
</Label>
<Input
id="name"
value={instanceName}
onChange={(e) => handleNameChange(e.target.value)}
placeholder="my-instance"
disabled={isEditing} // Don't allow name changes when editing
className={nameError ? "border-red-500" : ""}
/>
{nameError && <p className="text-sm text-red-500">{nameError}</p>}
<p className="text-sm text-muted-foreground">
Unique identifier for the instance
</p>
</div>
<div className="space-y-6 py-4">
{/* Instance Settings Card */}
<InstanceSettingsCard
instanceName={instanceName}
nameError={nameError}
isEditing={isEditing}
formData={formData}
onNameChange={handleNameChange}
onChange={handleFieldChange}
/>
{/* Auto Restart Configuration Section */}
<div className="space-y-4">
<h3 className="text-lg font-medium">
Auto Restart Configuration
</h3>
{/* Backend Configuration Card */}
<BackendConfigurationCard
formData={formData}
onBackendFieldChange={handleBackendFieldChange}
onChange={handleFieldChange}
onParseCommand={() => setShowParseDialog(true)}
/>
{/* Auto Restart Toggle */}
<ZodFormField
fieldKey="auto_restart"
value={formData.auto_restart}
onChange={handleFieldChange}
/>
{/* Show restart options only when auto restart is enabled */}
{isAutoRestartEnabled && (
<div className="ml-6 space-y-4 border-l-2 border-muted pl-4">
<ZodFormField
fieldKey="max_restarts"
value={formData.max_restarts}
onChange={handleFieldChange}
/>
<ZodFormField
fieldKey="restart_delay"
value={formData.restart_delay}
onChange={handleFieldChange}
/>
</div>
)}
</div>
{/* Basic Fields - Automatically generated from type (excluding auto restart options) */}
<div className="space-y-4">
<h3 className="text-lg font-medium">Basic Configuration</h3>
{basicFields
.filter(
(fieldKey) =>
fieldKey !== "auto_restart" &&
fieldKey !== "max_restarts" &&
fieldKey !== "restart_delay" &&
fieldKey !== "backend_options" // backend_options is handled separately
)
.map((fieldKey) => (
<ZodFormField
key={fieldKey}
fieldKey={fieldKey}
value={formData[fieldKey]}
onChange={handleFieldChange}
/>
))}
</div>
{/* Backend Configuration Section */}
<div className="space-y-4">
<h3 className="text-lg font-medium">Backend Configuration</h3>
{/* Basic backend fields */}
{basicBackendFields.map((fieldKey) => (
<BackendFormField
key={fieldKey}
fieldKey={fieldKey}
value={formData.backend_options?.[fieldKey]}
onChange={handleBackendFieldChange}
/>
))}
</div>
{/* Advanced Fields Toggle */}
<div className="border-t pt-4">
<Button
variant="ghost"
onClick={toggleAdvanced}
className="flex items-center gap-2 p-0 h-auto font-medium"
>
{showAdvanced ? (
<ChevronDown className="h-4 w-4" />
) : (
<ChevronRight className="h-4 w-4" />
)}
Advanced Configuration
<span className="text-muted-foreground text-sm font-normal">
(
{
advancedFields.filter(
(f) =>
!["max_restarts", "restart_delay", "backend_options"].includes(f as string)
).length + advancedBackendFields.length
}{" "}
options)
</span>
</Button>
</div>
{/* Advanced Fields - Automatically generated from type (excluding restart options) */}
{showAdvanced && (
<div className="space-y-4 pl-6 border-l-2 border-muted">
{/* Advanced instance fields */}
{advancedFields
.filter(
(fieldKey) =>
!["max_restarts", "restart_delay", "backend_options"].includes(
fieldKey as string
)
).length > 0 && (
<div className="space-y-4">
<h4 className="text-md font-medium">Advanced Instance Configuration</h4>
{advancedFields
.filter(
(fieldKey) =>
!["max_restarts", "restart_delay", "backend_options"].includes(
fieldKey as string
)
)
.sort()
.map((fieldKey) => (
<ZodFormField
key={fieldKey}
fieldKey={fieldKey}
value={fieldKey === 'backend_options' ? undefined : formData[fieldKey]}
onChange={handleFieldChange}
/>
))}
</div>
)}
{/* Advanced backend fields */}
{advancedBackendFields.length > 0 && (
<div className="space-y-4">
<h4 className="text-md font-medium">Advanced Backend Configuration</h4>
{advancedBackendFields
.sort()
.map((fieldKey) => (
<BackendFormField
key={fieldKey}
fieldKey={fieldKey}
value={formData.backend_options?.[fieldKey]}
onChange={handleBackendFieldChange}
/>
))}
</div>
)}
</div>
)}
</div>
</div>
@@ -352,6 +218,13 @@ const InstanceDialog: React.FC<InstanceDialogProps> = ({
</Button>
</DialogFooter>
</DialogContent>
<ParseCommandDialog
open={showParseDialog}
onOpenChange={setShowParseDialog}
onParsed={handleCommandParsed}
backendType={formData.backend_type || BackendType.LLAMA_CPP}
/>
</Dialog>
);
};

View File

@@ -0,0 +1,151 @@
import React, { useState } from "react";
import { Button } from "@/components/ui/button";
import { Label } from "@/components/ui/label";
import {
Dialog,
DialogContent,
DialogDescription,
DialogFooter,
DialogHeader,
DialogTitle,
} from "@/components/ui/dialog";
import { BackendType, type BackendTypeValue, type CreateInstanceOptions } from "@/types/instance";
import { backendsApi } from "@/lib/api";
import { toast } from "sonner";
interface ParseCommandDialogProps {
open: boolean;
onOpenChange: (open: boolean) => void;
onParsed: (options: CreateInstanceOptions) => void;
backendType: BackendTypeValue;
}
const ParseCommandDialog: React.FC<ParseCommandDialogProps> = ({
open,
onOpenChange,
onParsed,
backendType,
}) => {
const [command, setCommand] = useState('');
const [loading, setLoading] = useState(false);
const [error, setError] = useState<string | null>(null);
const handleParse = async () => {
if (!command.trim()) {
setError("Command cannot be empty");
return;
}
setLoading(true);
setError(null);
try {
let options: CreateInstanceOptions;
// Parse based on selected backend type
switch (backendType) {
case BackendType.LLAMA_CPP:
options = await backendsApi.llamaCpp.parseCommand(command);
break;
case BackendType.MLX_LM:
options = await backendsApi.mlx.parseCommand(command);
break;
case BackendType.VLLM:
options = await backendsApi.vllm.parseCommand(command);
break;
default:
throw new Error(`Unsupported backend type: ${backendType}`);
}
onParsed(options);
onOpenChange(false);
setCommand('');
setError(null);
toast.success('Command parsed successfully');
} catch (err) {
const errorMessage = err instanceof Error ? err.message : 'Failed to parse command';
setError(errorMessage);
toast.error('Failed to parse command', {
description: errorMessage
});
} finally {
setLoading(false);
}
};
const handleOpenChange = (open: boolean) => {
if (!open) {
setCommand('');
setError(null);
}
onOpenChange(open);
};
const backendPlaceholders: Record<BackendTypeValue, string> = {
[BackendType.LLAMA_CPP]: "llama-server --model /path/to/model.gguf --gpu-layers 32 --ctx-size 4096",
[BackendType.MLX_LM]: "mlx_lm.server --model mlx-community/Mistral-7B-Instruct-v0.3-4bit --host 0.0.0.0 --port 8080",
[BackendType.VLLM]: "vllm serve microsoft/DialoGPT-medium --tensor-parallel-size 2 --gpu-memory-utilization 0.9",
};
const getPlaceholderForBackend = (backendType: BackendTypeValue): string => {
return backendPlaceholders[backendType] || "Enter your command here...";
};
return (
<Dialog open={open} onOpenChange={handleOpenChange}>
<DialogContent className="sm:max-w-[600px]">
<DialogHeader>
<DialogTitle>Parse Backend Command</DialogTitle>
<DialogDescription>
Select your backend type and paste the command to automatically populate the form fields
</DialogDescription>
</DialogHeader>
<div className="space-y-4">
<div>
<Label className="text-sm font-medium">Backend Type:
<span className="font-normal text-muted-foreground">
{backendType === BackendType.LLAMA_CPP && 'Llama Server'}
{backendType === BackendType.MLX_LM && 'MLX LM'}
{backendType === BackendType.VLLM && 'vLLM'}
</span>
</Label>
</div>
<div>
<Label htmlFor="command">Command</Label>
<textarea
id="command"
value={command}
onChange={(e) => setCommand(e.target.value)}
placeholder={getPlaceholderForBackend(backendType)}
className="w-full h-32 p-3 mt-2 border border-input rounded-md font-mono text-sm resize-vertical focus:outline-none focus:ring-2 focus:ring-ring focus:ring-offset-2"
/>
</div>
{error && (
<div className="text-destructive text-sm bg-destructive/10 p-3 rounded-md">
{error}
</div>
)}
</div>
<DialogFooter>
<Button variant="outline" onClick={() => handleOpenChange(false)}>
Cancel
</Button>
<Button
onClick={() => {
handleParse().catch(console.error);
}}
disabled={!command.trim() || loading}
>
{loading ? 'Parsing...' : 'Parse Command'}
</Button>
</DialogFooter>
</DialogContent>
</Dialog>
);
};
export default ParseCommandDialog;

View File

@@ -8,16 +8,19 @@ import {
DialogHeader,
DialogTitle,
} from '@/components/ui/dialog'
import {
RefreshCw,
import SelectInput from '@/components/form/SelectInput'
import {
RefreshCw,
AlertCircle,
Loader2,
ChevronDown,
ChevronRight,
Monitor,
HelpCircle
HelpCircle,
Info
} from 'lucide-react'
import { serverApi } from '@/lib/api'
import { BackendType, type BackendTypeValue } from '@/types/instance'
// Helper to get version from environment
const getAppVersion = (): string => {
@@ -28,166 +31,234 @@ const getAppVersion = (): string => {
}
}
interface SystemInfoModalProps {
interface SystemInfoDialogProps {
open: boolean
onOpenChange: (open: boolean) => void
}
interface SystemInfo {
interface BackendInfo {
version: string
devices: string
help: string
}
const SystemInfoDialog: React.FC<SystemInfoModalProps> = ({
const BACKEND_OPTIONS = [
{ value: BackendType.LLAMA_CPP, label: 'Llama Server' },
{ value: BackendType.MLX_LM, label: 'MLX LM' },
{ value: BackendType.VLLM, label: 'vLLM' },
]
const SystemInfoDialog: React.FC<SystemInfoDialogProps> = ({
open,
onOpenChange
}) => {
const [systemInfo, setSystemInfo] = useState<SystemInfo | null>(null)
const [selectedBackend, setSelectedBackend] = useState<BackendTypeValue>(BackendType.LLAMA_CPP)
const [backendInfo, setBackendInfo] = useState<BackendInfo | null>(null)
const [loading, setLoading] = useState(false)
const [error, setError] = useState<string | null>(null)
const [showHelp, setShowHelp] = useState(false)
// Fetch system info
const fetchSystemInfo = async () => {
// Fetch backend info
const fetchBackendInfo = async (backend: BackendTypeValue) => {
if (backend !== BackendType.LLAMA_CPP) {
setBackendInfo(null)
setError(null)
return
}
setLoading(true)
setError(null)
try {
const [version, devices, help] = await Promise.all([
serverApi.getVersion(),
serverApi.getDevices(),
serverApi.getHelp()
])
setSystemInfo({ version, devices, help })
setBackendInfo({ version, devices, help })
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to fetch system info')
setError(err instanceof Error ? err.message : 'Failed to fetch backend info')
} finally {
setLoading(false)
}
}
// Load data when dialog opens
// Load data when dialog opens or backend changes
useEffect(() => {
if (open) {
fetchSystemInfo()
void fetchBackendInfo(selectedBackend)
}
}, [open])
}, [open, selectedBackend])
const handleBackendChange = (value: string) => {
setSelectedBackend(value as BackendTypeValue)
setShowHelp(false) // Reset help section when switching backends
}
const renderBackendSpecificContent = () => {
if (selectedBackend !== BackendType.LLAMA_CPP) {
return (
<div className="flex items-center justify-center py-8">
<div className="text-center space-y-3">
<Info className="h-8 w-8 text-gray-400 mx-auto" />
<div>
<h3 className="font-semibold text-gray-700">Backend Info Not Available</h3>
<p className="text-sm text-gray-500 mt-1">
Information for {BACKEND_OPTIONS.find(b => b.value === selectedBackend)?.label} backend is not yet implemented.
</p>
</div>
</div>
</div>
)
}
if (loading && !backendInfo) {
return (
<div className="flex items-center justify-center py-8">
<Loader2 className="h-6 w-6 animate-spin text-gray-400" />
<span className="ml-2 text-gray-400">Loading backend information...</span>
</div>
)
}
if (error) {
return (
<div className="flex items-center gap-2 p-4 bg-destructive/10 border border-destructive/20 rounded-lg">
<AlertCircle className="h-4 w-4 text-destructive" />
<span className="text-sm text-destructive">{error}</span>
</div>
)
}
if (!backendInfo) {
return null
}
return (
<div className="space-y-6">
{/* Backend Version Section */}
<div className="space-y-3">
<h3 className="font-semibold">
{BACKEND_OPTIONS.find(b => b.value === selectedBackend)?.label} Version
</h3>
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --version</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{backendInfo.version}
</pre>
</div>
</div>
{/* Devices Section */}
<div className="space-y-3">
<div className="flex items-center gap-2">
<h3 className="font-semibold">Available Devices</h3>
</div>
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --list-devices</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{backendInfo.devices}
</pre>
</div>
</div>
{/* Help Section */}
<div className="space-y-3">
<Button
variant="ghost"
onClick={() => setShowHelp(!showHelp)}
className="flex items-center gap-2 p-0 h-auto font-semibold"
>
{showHelp ? (
<ChevronDown className="h-4 w-4" />
) : (
<ChevronRight className="h-4 w-4" />
)}
<HelpCircle className="h-4 w-4" />
Command Line Options
</Button>
{showHelp && (
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --help</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono max-h-64 overflow-y-auto">
{backendInfo.help}
</pre>
</div>
)}
</div>
</div>
)
}
return (
<Dialog open={open} onOpenChange={onOpenChange} >
<Dialog open={open} onOpenChange={onOpenChange}>
<DialogContent className="sm:max-w-4xl max-w-[calc(100%-2rem)] max-h-[80vh] flex flex-col">
<DialogHeader>
<div className="flex items-center justify-between">
<div>
<DialogTitle className="flex items-center gap-2">
<Monitor className="h-5 w-5" />
System Information
</DialogTitle>
<DialogDescription>
Llama.cpp server environment and capabilities
</DialogDescription>
</div>
<Button
variant="outline"
size="sm"
onClick={fetchSystemInfo}
disabled={loading}
>
{loading ? (
<Loader2 className="h-4 w-4 animate-spin" />
) : (
<RefreshCw className="h-4 w-4" />
)}
</Button>
</div>
<DialogTitle className="flex items-center gap-2">
<Monitor className="h-5 w-5" />
System Information
</DialogTitle>
<DialogDescription>
View system and backend-specific environment and capabilities
</DialogDescription>
</DialogHeader>
<div className="flex-1 overflow-y-auto">
{loading && !systemInfo ? (
<div className="flex items-center justify-center py-12">
<Loader2 className="h-6 w-6 animate-spin text-gray-400" />
<span className="ml-2 text-gray-400">Loading system information...</span>
<div className="space-y-6">
{/* Llamactl Version Section - Always shown */}
<div className="space-y-3">
<h3 className="font-semibold">Llamactl Version</h3>
<div className="bg-gray-900 rounded-lg p-4">
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{getAppVersion()}
</pre>
</div>
</div>
) : error ? (
<div className="flex items-center gap-2 p-4 bg-destructive/10 border border-destructive/20 rounded-lg">
<AlertCircle className="h-4 w-4 text-destructive" />
<span className="text-sm text-destructive">{error}</span>
</div>
) : systemInfo ? (
<div className="space-y-6">
{/* Llamactl Version Section */}
<div className="space-y-3">
<h3 className="font-semibold">Llamactl Version</h3>
<div className="bg-gray-900 rounded-lg p-4">
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{getAppVersion()}
</pre>
</div>
</div>
{/* Llama Server Version Section */}
<div className="space-y-3">
<h3 className="font-semibold">Llama Server Version</h3>
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --version</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{systemInfo.version}
</pre>
{/* Backend Selection Section */}
<div className="space-y-3">
<h3 className="font-semibold">Backend Information</h3>
<div className="flex items-center gap-3">
<div className="flex-1">
<SelectInput
id="backend-select"
label=""
value={selectedBackend}
onChange={(value) => handleBackendChange(value || BackendType.LLAMA_CPP)}
options={BACKEND_OPTIONS}
className="text-sm"
/>
</div>
</div>
{/* Devices Section */}
<div className="space-y-3">
<div className="flex items-center gap-2">
<h3 className="font-semibold">Available Devices</h3>
</div>
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --list-devices</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono">
{systemInfo.devices}
</pre>
</div>
</div>
{/* Help Section */}
<div className="space-y-3">
<Button
variant="ghost"
onClick={() => setShowHelp(!showHelp)}
className="flex items-center gap-2 p-0 h-auto font-semibold"
>
{showHelp ? (
<ChevronDown className="h-4 w-4" />
) : (
<ChevronRight className="h-4 w-4" />
)}
<HelpCircle className="h-4 w-4" />
Command Line Options
</Button>
{showHelp && (
<div className="bg-gray-900 rounded-lg p-4">
<div className="mb-2">
<span className="text-sm text-gray-400">$ llama-server --help</span>
</div>
<pre className="text-sm text-gray-300 whitespace-pre-wrap font-mono max-h-64 overflow-y-auto">
{systemInfo.help}
</pre>
</div>
{selectedBackend === BackendType.LLAMA_CPP && (
<Button
variant="outline"
size="sm"
onClick={() => void fetchBackendInfo(selectedBackend)}
disabled={loading}
>
{loading ? (
<Loader2 className="h-4 w-4 animate-spin" />
) : (
<RefreshCw className="h-4 w-4" />
)}
</Button>
)}
</div>
</div>
) : null}
{/* Backend-specific content */}
{renderBackendSpecificContent()}
</div>
</div>
<DialogFooter>

View File

@@ -1,148 +0,0 @@
import React from 'react'
import { Input } from '@/components/ui/input'
import { Label } from '@/components/ui/label'
import { Checkbox } from '@/components/ui/checkbox'
import type { CreateInstanceOptions } from '@/types/instance'
import { BackendType } from '@/types/instance'
import { getFieldType, basicFieldsConfig } from '@/lib/zodFormUtils'
interface ZodFormFieldProps {
fieldKey: keyof CreateInstanceOptions
value: string | number | boolean | string[] | undefined
onChange: (key: keyof CreateInstanceOptions, value: string | number | boolean | string[] | undefined) => void
}
const ZodFormField: React.FC<ZodFormFieldProps> = ({ fieldKey, value, onChange }) => {
// Get configuration for basic fields, or use field name for advanced fields
const config = basicFieldsConfig[fieldKey as string] || { label: fieldKey }
// Get type from Zod schema
const fieldType = getFieldType(fieldKey)
const handleChange = (newValue: string | number | boolean | string[] | undefined) => {
onChange(fieldKey, newValue)
}
const renderField = () => {
// Special handling for backend_type field - render as dropdown
if (fieldKey === 'backend_type') {
return (
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<select
id={fieldKey}
value={typeof value === 'string' ? value : BackendType.LLAMA_CPP}
onChange={(e) => handleChange(e.target.value || undefined)}
className="flex h-10 w-full rounded-md border border-input bg-background px-3 py-2 text-sm ring-offset-background file:border-0 file:bg-transparent file:text-sm file:font-medium placeholder:text-muted-foreground focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50"
>
<option value={BackendType.LLAMA_CPP}>Llama Server</option>
{/* Add more backend types here as they become available */}
</select>
{config.description && (
<p className="text-sm text-muted-foreground">{config.description}</p>
)}
</div>
)
}
switch (fieldType) {
case 'boolean':
return (
<div className="flex items-center space-x-2">
<Checkbox
id={fieldKey}
checked={typeof value === 'boolean' ? value : false}
onCheckedChange={(checked) => handleChange(checked)}
/>
<Label htmlFor={fieldKey} className="text-sm font-normal">
{config.label}
{config.description && (
<span className="text-muted-foreground ml-1">- {config.description}</span>
)}
</Label>
</div>
)
case 'number':
return (
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}
type="number"
step="any" // This allows decimal numbers
value={typeof value === 'string' || typeof value === 'number' ? value : ''}
onChange={(e) => {
const numValue = e.target.value ? parseFloat(e.target.value) : undefined
// Only update if the parsed value is valid or the input is empty
if (e.target.value === '' || (numValue !== undefined && !isNaN(numValue))) {
handleChange(numValue)
}
}}
placeholder={config.placeholder}
/>
{config.description && (
<p className="text-sm text-muted-foreground">{config.description}</p>
)}
</div>
)
case 'array':
return (
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}
type="text"
value={Array.isArray(value) ? value.join(', ') : ''}
onChange={(e) => {
const arrayValue = e.target.value
? e.target.value.split(',').map(s => s.trim()).filter(Boolean)
: undefined
handleChange(arrayValue)
}}
placeholder="item1, item2, item3"
/>
{config.description && (
<p className="text-sm text-muted-foreground">{config.description}</p>
)}
<p className="text-xs text-muted-foreground">Separate multiple values with commas</p>
</div>
)
case 'text':
default:
return (
<div className="grid gap-2">
<Label htmlFor={fieldKey}>
{config.label}
{config.required && <span className="text-red-500 ml-1">*</span>}
</Label>
<Input
id={fieldKey}
type="text"
value={typeof value === 'string' || typeof value === 'number' ? value : ''}
onChange={(e) => handleChange(e.target.value || undefined)}
placeholder={config.placeholder}
/>
{config.description && (
<p className="text-sm text-muted-foreground">{config.description}</p>
)}
</div>
)
}
}
return <div className="space-y-2">{renderField()}</div>
}
export default ZodFormField

View File

@@ -2,12 +2,16 @@ import { describe, it, expect, vi, beforeEach, afterEach } from 'vitest'
import { render, screen } from '@testing-library/react'
import userEvent from '@testing-library/user-event'
import InstanceCard from '@/components/InstanceCard'
import type { Instance } from '@/types/instance'
import { BackendType } from '@/types/instance'
import { type Instance, BackendType } from '@/types/instance'
// Mock the health hook since we're not testing health logic here
vi.mock('@/hooks/useInstanceHealth', () => ({
useInstanceHealth: vi.fn(() => ({ status: 'ok', lastChecked: new Date() }))
useInstanceHealth: vi.fn(() => ({
state: 'ready',
instanceStatus: 'running',
lastChecked: new Date(),
source: 'http'
}))
}))
describe('InstanceCard - Instance Actions and State', () => {
@@ -102,7 +106,7 @@ afterEach(() => {
it('opens logs dialog when logs button clicked', async () => {
const user = userEvent.setup()
render(
<InstanceCard
instance={stoppedInstance}
@@ -113,9 +117,13 @@ afterEach(() => {
/>
)
// First click "More actions" to reveal the logs button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
const logsButton = screen.getByTitle('View logs')
await user.click(logsButton)
// Should open logs dialog (we can verify this by checking if dialog title appears)
expect(screen.getByText(`Logs: ${stoppedInstance.name}`)).toBeInTheDocument()
})
@@ -125,7 +133,7 @@ afterEach(() => {
it('shows confirmation dialog and calls deleteInstance when confirmed', async () => {
const user = userEvent.setup()
const confirmSpy = vi.spyOn(window, 'confirm').mockReturnValue(true)
render(
<InstanceCard
instance={stoppedInstance}
@@ -136,19 +144,23 @@ afterEach(() => {
/>
)
// First click "More actions" to reveal the delete button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
const deleteButton = screen.getByTitle('Delete instance')
await user.click(deleteButton)
expect(confirmSpy).toHaveBeenCalledWith('Are you sure you want to delete instance "test-instance"?')
expect(mockDeleteInstance).toHaveBeenCalledWith('test-instance')
confirmSpy.mockRestore()
})
it('does not call deleteInstance when confirmation cancelled', async () => {
const user = userEvent.setup()
const confirmSpy = vi.spyOn(window, 'confirm').mockReturnValue(false)
render(
<InstanceCard
instance={stoppedInstance}
@@ -159,18 +171,24 @@ afterEach(() => {
/>
)
// First click "More actions" to reveal the delete button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
const deleteButton = screen.getByTitle('Delete instance')
await user.click(deleteButton)
expect(confirmSpy).toHaveBeenCalled()
expect(mockDeleteInstance).not.toHaveBeenCalled()
confirmSpy.mockRestore()
})
})
describe('Button State Based on Instance Status', () => {
it('disables start button and enables stop button for running instance', () => {
it('disables start button and enables stop button for running instance', async () => {
const user = userEvent.setup()
render(
<InstanceCard
instance={runningInstance}
@@ -181,12 +199,19 @@ afterEach(() => {
/>
)
expect(screen.getByTitle('Start instance')).toBeDisabled()
expect(screen.queryByTitle('Start instance')).not.toBeInTheDocument()
expect(screen.getByTitle('Stop instance')).not.toBeDisabled()
// Expand more actions to access delete button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
expect(screen.getByTitle('Delete instance')).toBeDisabled() // Can't delete running instance
})
it('enables start button and disables stop button for stopped instance', () => {
it('enables start button and disables stop button for stopped instance', async () => {
const user = userEvent.setup()
render(
<InstanceCard
instance={stoppedInstance}
@@ -198,11 +223,18 @@ afterEach(() => {
)
expect(screen.getByTitle('Start instance')).not.toBeDisabled()
expect(screen.getByTitle('Stop instance')).toBeDisabled()
expect(screen.queryByTitle('Stop instance')).not.toBeInTheDocument()
// Expand more actions to access delete button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
expect(screen.getByTitle('Delete instance')).not.toBeDisabled() // Can delete stopped instance
})
it('edit and logs buttons are always enabled', () => {
it('edit and logs buttons are always enabled', async () => {
const user = userEvent.setup()
render(
<InstanceCard
instance={runningInstance}
@@ -214,6 +246,11 @@ afterEach(() => {
)
expect(screen.getByTitle('Edit instance')).not.toBeDisabled()
// Expand more actions to access logs button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
expect(screen.getByTitle('View logs')).not.toBeDisabled()
})
})
@@ -268,7 +305,7 @@ afterEach(() => {
describe('Integration with LogsModal', () => {
it('passes correct props to LogsModal', async () => {
const user = userEvent.setup()
render(
<InstanceCard
instance={runningInstance}
@@ -279,20 +316,24 @@ afterEach(() => {
/>
)
// First click "More actions" to reveal the logs button
const moreActionsButton = screen.getByTitle('More actions')
await user.click(moreActionsButton)
// Open logs dialog
await user.click(screen.getByTitle('View logs'))
// Verify dialog opened with correct instance data
expect(screen.getByText('Logs: running-instance')).toBeInTheDocument()
// Close dialog to test close functionality
const closeButtons = screen.getAllByText('Close')
const dialogCloseButton = closeButtons.find(button =>
const dialogCloseButton = closeButtons.find(button =>
button.closest('[data-slot="dialog-content"]')
)
expect(dialogCloseButton).toBeTruthy()
await user.click(dialogCloseButton!)
// Modal should close
expect(screen.queryByText('Logs: running-instance')).not.toBeInTheDocument()
})

View File

@@ -12,12 +12,14 @@ import { AuthProvider } from '@/contexts/AuthContext'
vi.mock('@/lib/api', () => ({
instancesApi: {
list: vi.fn(),
get: vi.fn(),
create: vi.fn(),
update: vi.fn(),
start: vi.fn(),
stop: vi.fn(),
restart: vi.fn(),
delete: vi.fn(),
getHealth: vi.fn(),
}
}))
@@ -25,9 +27,21 @@ vi.mock('@/lib/api', () => ({
vi.mock('@/lib/healthService', () => ({
healthService: {
subscribe: vi.fn(() => () => {}),
checkHealth: vi.fn(),
refreshHealth: vi.fn(() => Promise.resolve()),
checkHealthAfterOperation: vi.fn(),
performHealthCheck: vi.fn(() => Promise.resolve({
state: 'ready',
instanceStatus: 'running',
lastChecked: new Date(),
source: 'http'
})),
},
checkHealth: vi.fn(),
checkHealth: vi.fn(() => Promise.resolve({
state: 'ready',
instanceStatus: 'running',
lastChecked: new Date(),
source: 'http'
})),
}))
function renderInstanceList(editInstance = vi.fn()) {

View File

@@ -280,29 +280,6 @@ afterEach(() => {
})
})
describe('Advanced Fields Toggle', () => {
it('shows advanced fields when toggle clicked', async () => {
const user = userEvent.setup()
render(
<InstanceDialog
open={true}
onOpenChange={mockOnOpenChange}
onSave={mockOnSave}
/>
)
// Advanced fields should be hidden initially
expect(screen.queryByText(/Advanced Configuration/)).toBeInTheDocument()
// Click to expand
await user.click(screen.getByText(/Advanced Configuration/))
// Should show more configuration options
// Note: Specific fields depend on zodFormUtils configuration
// We're testing the toggle behavior, not specific fields
})
})
describe('Form Data Handling', () => {
it('cleans up undefined values before submission', async () => {

View File

@@ -0,0 +1,62 @@
import React from 'react'
import { Input } from '@/components/ui/input'
import { Label } from '@/components/ui/label'
interface ArrayInputProps {
id: string
label: string
value: string[] | undefined
onChange: (value: string[] | undefined) => void
placeholder?: string
description?: string
disabled?: boolean
className?: string
}
const ArrayInput: React.FC<ArrayInputProps> = ({
id,
label,
value,
onChange,
placeholder = "item1, item2, item3",
description,
disabled = false,
className
}) => {
const handleChange = (inputValue: string) => {
if (inputValue === '') {
onChange(undefined)
return
}
const arrayValue = inputValue
.split(',')
.map(s => s.trim())
.filter(Boolean)
onChange(arrayValue.length > 0 ? arrayValue : undefined)
}
return (
<div className="grid gap-2">
<Label htmlFor={id}>
{label}
</Label>
<Input
id={id}
type="text"
value={Array.isArray(value) ? value.join(', ') : ''}
onChange={(e) => handleChange(e.target.value)}
placeholder={placeholder}
disabled={disabled}
className={className}
/>
{description && (
<p className="text-sm text-muted-foreground">{description}</p>
)}
<p className="text-xs text-muted-foreground">Separate multiple values with commas</p>
</div>
)
}
export default ArrayInput

View File

@@ -0,0 +1,42 @@
import React from 'react'
import { Checkbox } from '@/components/ui/checkbox'
import { Label } from '@/components/ui/label'
interface CheckboxInputProps {
id: string
label: string
value: boolean | undefined
onChange: (value: boolean) => void
description?: string
disabled?: boolean
className?: string
}
const CheckboxInput: React.FC<CheckboxInputProps> = ({
id,
label,
value,
onChange,
description,
disabled = false,
className
}) => {
return (
<div className={`flex items-center space-x-2 ${className || ''}`}>
<Checkbox
id={id}
checked={value === true}
onCheckedChange={(checked) => onChange(!!checked)}
disabled={disabled}
/>
<Label htmlFor={id} className="text-sm font-normal">
{label}
{description && (
<span className="text-muted-foreground ml-1">- {description}</span>
)}
</Label>
</div>
)
}
export default CheckboxInput

Some files were not shown because too many files have changed in this diff Show More