| Commit | Message | Date |
|---|---|---|
| 30e40ecd30 | Refactor API endpoints to use /backends/llama-cpp path and update related documentation | 2025-09-23 21:27:58 +02:00 |
| 46622d2107 | Update documentation and add README synchronization | 2025-09-22 22:37:53 +02:00 |
| 184d6df1bc | Fix vllm command parsing | 2025-09-22 21:25:50 +02:00 |
| 313666ea17 | Fix missing vllm proxy setup | 2025-09-22 20:51:00 +02:00 |
| c3ca5b95f7 | Update BuildCommandArgs to use positional argument for model and adjust tests accordingly | 2025-09-22 20:32:03 +02:00 |
| 7eb59aa7e0 | Remove unused JSON unmarshal test and clean up command argument checks | 2025-09-19 20:46:25 +02:00 |
| 64842e74b0 | Refactor command parsing and building | 2025-09-19 20:23:25 +02:00 |
| 34a949d22e | Refactor command argument building and parsing | 2025-09-19 19:59:46 +02:00 |
| ec5485bd0e | Refactor command argument building across backends | 2025-09-19 19:46:54 +02:00 |
| 9eecb37aec | Refactor MLX and VLLM server options parsing and args building | 2025-09-19 19:39:36 +02:00 |
| c7136d5206 | Refactor command parsing logic across backends to utilize a unified CommandParserConfig structure | 2025-09-19 18:36:23 +02:00 |
| 4df02a6519 | Initial vLLM backend support | 2025-09-19 18:05:12 +02:00 |
| 6a580667ed | Remove LlamaExecutable checks from default and file loading tests | 2025-09-18 20:30:26 +02:00 |
| 2a20817078 | Remove redundant LlamaExecutable field from instance configuration in tests | 2025-09-18 20:29:04 +02:00 |
| 5121f0e302 | Remove PythonPath references from MlxServerOptions and related configurations | 2025-09-17 21:59:55 +02:00 |
| cc5d8acd92 | Refactor instance and manager tests to use BackendConfig for LlamaExecutable and MLXLMExecutable | 2025-09-16 21:45:50 +02:00 |
| 154b754aff | Add MLX command parsing and routing support | 2025-09-16 21:39:08 +02:00 |
| 63fea02d66 | Add MLX backend support in CreateInstanceOptions and validation | 2025-09-16 21:38:33 +02:00 |
| 468688cdbc | Pass backend options to instances | 2025-09-16 21:37:48 +02:00 |
| 988c4aca40 | Add MLX backend config options | 2025-09-16 21:14:19 +02:00 |
| 1b5934303b | Enhance command parsing in ParseLlamaCommand and improve error handling in ParseCommandRequest | 2025-09-15 22:12:56 +02:00 |
| e7b06341c3 | Enhance command parsing in ParseLlamaCommand | 2025-09-15 21:29:46 +02:00 |
| 323056096c | Implement llama-server command parsing and add UI components for command input | 2025-09-15 21:04:14 +02:00 |
| d697f83b46 | Update GetProxy method to use BackendTypeLlamaCpp constant for backend type | 2025-09-02 21:56:38 +02:00 |
| 712d28ea42 | Remove port marking logic from CreateInstance method | 2025-09-02 21:56:25 +02:00 |
| d9542ba117 | Refactor instance management to support backend types and options | 2025-09-01 21:59:18 +02:00 |
| 9579930a6a | Simplify LRU eviction tests | 2025-08-31 11:46:16 +02:00 |
| 447f441fd0 | Move LRU eviction to timeout.go | 2025-08-31 11:42:32 +02:00 |
| 27012b6de6 | Split manager tests into multiple test files | 2025-08-31 11:39:44 +02:00 |
| 905e685107 | Add LRU eviction tests for instance management | 2025-08-31 11:30:57 +02:00 |
| d6d4792a0c | Skip eviction for instances without a valid idle timeout | 2025-08-31 00:59:26 +02:00 |
| 894f3c3213 | Refactor StartInstance method to improve max running instances check | 2025-08-31 00:14:29 +02:00 |
| c1fa0faf4b | Add LastRequestTime method and LRU eviction logic for instance management | 2025-08-30 23:59:37 +02:00 |
| 4581d67165 | Enhance instance management: improve on-demand start handling and add LRU eviction logic | 2025-08-30 23:13:08 +02:00 |
| 58cb36bd18 | Refactor instance management: replace CanStartInstance with IsMaxRunningInstancesReached method | 2025-08-30 23:12:58 +02:00 |
| 68253be3e8 | Add CanStartInstance method to check instance start conditions | 2025-08-30 22:47:15 +02:00 |
| a9f1c1a619 | Add LRU eviction configuration for instances | 2025-08-30 22:26:02 +02:00 |
| 74495f8163 | Refactor Shutdown method to improve instance stopping logic and avoid deadlocks | 2025-08-30 22:04:43 +02:00 |
| 9d548e6dda | Remove wrong MaxRunningInstancesError type | 2025-08-28 20:42:56 +02:00 |
| 41d8c41188 | Introduce MaxRunningInstancesError type and handle it in StartInstance handler | 2025-08-28 20:07:03 +02:00 |
| e319731239 | Remove unnecessary read locks from GetStatus and IsRunning methods | 2025-08-28 19:19:28 +02:00 |
| b698c1d0ea | Remove locks from SetStatus | 2025-08-28 19:08:20 +02:00 |
| 227ca7927a | Refactor SetStatus method to capture onStatusChange callback reference before unlocking mutex | 2025-08-28 18:59:26 +02:00 |
| 0b058237fe | Enforce maximum running instances limit in StartInstance method | 2025-08-27 21:18:38 +02:00 |
| ae37055331 | Add onStatusChange callback to instance management for status updates | 2025-08-27 20:54:26 +02:00 |
| b41ebdc604 | Set instance status to Failed when restart conditions are not met | 2025-08-27 19:47:36 +02:00 |
| 1443746add | Refactor instance status management: replace Running boolean with InstanceStatus enum and update related methods | 2025-08-27 19:44:38 +02:00 |
| 615c2ac54e | Add MaxRunningInstances to InstancesConfig and implement IsRunning method | 2025-08-27 18:42:34 +02:00 |
| 1939b45312 | Refactor WaitForHealthy method to use direct health check URL and simplify health check logic | 2025-08-20 15:58:08 +02:00 |
| ddb54763f6 | Add OnDemandStartTimeout configuration and update OpenAIProxy to use it | 2025-08-20 14:25:43 +02:00 |