diff --git a/README.md b/README.md
index 4865174..7f547cc 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,8 @@
 ### ⚡ Smart Operations
 
 - **Instance Monitoring**: Health checks, auto-restart, log management
-- **Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
+- **Smart Resource Management**: Idle timeout, LRU eviction, and configurable instance limits
+- **Environment Variables**: Set custom environment variables per instance for advanced configuration
 
 ![Dashboard Screenshot](docs/images/dashboard.png)
 
@@ -52,7 +53,8 @@ llamactl
 2. Click "Create Instance"
 3. Choose backend type (llama.cpp, MLX, or vLLM)
 4. Set model path and backend-specific options
-5. Start or stop the instance
+5. Configure environment variables (optional)
+6. Start or stop the instance
 
 ### Or use the REST API:
 ```bash
@@ -66,10 +68,10 @@ curl -X POST localhost:8080/api/v1/instances/my-mlx-model \
   -H "Authorization: Bearer your-key" \
   -d '{"backend_type": "mlx_lm", "backend_options": {"model": "mlx-community/Mistral-7B-Instruct-v0.3-4bit"}}'
 
-# Create vLLM instance
+# Create vLLM instance with environment variables
 curl -X POST localhost:8080/api/v1/instances/my-vllm-model \
   -H "Authorization: Bearer your-key" \
-  -d '{"backend_type": "vllm", "backend_options": {"model": "microsoft/DialoGPT-medium", "tensor_parallel_size": 2}}'
+  -d '{"backend_type": "vllm", "backend_options": {"model": "microsoft/DialoGPT-medium", "tensor_parallel_size": 2}, "environment": {"CUDA_VISIBLE_DEVICES": "0,1", "NCCL_DEBUG": "INFO"}}'
 
 # Use with OpenAI SDK
 curl -X POST localhost:8080/v1/chat/completions \
diff --git a/docs/user-guide/api-reference.md b/docs/user-guide/api-reference.md
index 348c1c0..26e01e4 100644
--- a/docs/user-guide/api-reference.md
+++ b/docs/user-guide/api-reference.md
@@ -116,7 +116,18 @@ Create and start a new instance.
 POST /api/v1/instances/{name}
 ```
 
-**Request Body:** JSON object with instance configuration. See [Managing Instances](managing-instances.md) for available configuration options.
+**Request Body:** JSON object with instance configuration. Common fields include:
+
+- `backend_type`: Backend type (`llama_cpp`, `mlx_lm`, or `vllm`)
+- `backend_options`: Backend-specific configuration
+- `auto_restart`: Enable automatic restart on failure
+- `max_restarts`: Maximum restart attempts
+- `restart_delay`: Delay between restarts in seconds
+- `on_demand_start`: Start instance when receiving requests
+- `idle_timeout`: Idle timeout in minutes
+- `environment`: Environment variables as key-value pairs
+
+See [Managing Instances](managing-instances.md) for complete configuration options.
 
 **Response:**
 ```json
@@ -354,7 +365,15 @@ curl -X POST http://localhost:8080/api/v1/instances/my-model \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer your-api-key" \
   -d '{
-    "model": "/models/llama-2-7b.gguf"
+    "backend_type": "llama_cpp",
+    "backend_options": {
+      "model": "/models/llama-2-7b.gguf",
+      "gpu_layers": 32
+    },
+    "environment": {
+      "CUDA_VISIBLE_DEVICES": "0",
+      "OMP_NUM_THREADS": "8"
+    }
   }'
 
 # Check instance status
diff --git a/docs/user-guide/managing-instances.md b/docs/user-guide/managing-instances.md
index e094d42..824c4fe 100644
--- a/docs/user-guide/managing-instances.md
+++ b/docs/user-guide/managing-instances.md
@@ -53,6 +53,7 @@ Each instance is displayed as a card showing:
    - **Restart Delay**: Delay in seconds between restart attempts
    - **On Demand Start**: Start instance when receiving a request to the OpenAI compatible endpoint
    - **Idle Timeout**: Minutes before stopping idle instance (set to 0 to disable)
+   - **Environment Variables**: Set custom environment variables for the instance process
 6. Configure backend-specific options:
    - **llama.cpp**: Threads, context size, GPU layers, port, etc.
    - **MLX**: Temperature, top-p, adapter path, Python environment, etc.
@@ -101,7 +102,12 @@ curl -X POST http://localhost:8080/api/instances/my-vllm-instance \
       "gpu_memory_utilization": 0.9
     },
     "auto_restart": true,
-    "on_demand_start": true
+    "on_demand_start": true,
+    "environment": {
+      "CUDA_VISIBLE_DEVICES": "0,1",
+      "NCCL_DEBUG": "INFO",
+      "PYTHONPATH": "/custom/path"
+    }
   }'
 
 # Create llama.cpp instance with HuggingFace model
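
Usage note (not part of the patch): a minimal end-to-end sketch of the new `environment` field, assuming a llamactl server on `localhost:8080` with API key `your-key`, a model file at `/models/llama-2-7b.gguf`, and that the OpenAI-compatible endpoint takes the instance name as the `model` value, as the README example suggests. Adjust names, paths, and keys to your setup.

```bash
# Create a llama.cpp instance, passing environment variables alongside the
# backend options (field names follow the request-body list documented above).
curl -X POST localhost:8080/api/v1/instances/my-model \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key" \
  -d '{
    "backend_type": "llama_cpp",
    "backend_options": {"model": "/models/llama-2-7b.gguf", "gpu_layers": 32},
    "environment": {"CUDA_VISIBLE_DEVICES": "0"}
  }'

# Query it through the OpenAI-compatible endpoint; the instance name is
# assumed to double as the model identifier, per the README example.
curl -X POST localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Pinning GPUs with `CUDA_VISIBLE_DEVICES`, as above, is a typical use of per-instance environment variables; any key-value pairs in `environment` are exported to the instance process.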