XIM Deployment
A Docker Compose stack that runs vLLM behind the Xerotier wire protocol on your NVIDIA GPU box. Your model, your hardware, your kernel. Once the join-key handshake succeeds, the router treats it as just another worker.
- image: ghcr.io/cloudnull/xerotier-public/xim-vllm-cuda:latest
- compose: compose/compose.agent-nvidia.yaml
- runtime uid:gid: 5152:5152
- enroll endpoint: POST /v1/enroll
Container images, compose files, and the macOS application all live in the public cloudnull/xerotier-public repository. The compose files are under compose/; images are published to ghcr.io/cloudnull/xerotier-public. Select the stack that matches your hardware:
| Stack | Image | Compose File |
|---|---|---|
| NVIDIA CUDA (this page) | ghcr.io/cloudnull/xerotier-public/xim-vllm-cuda:latest | compose/compose.agent-nvidia.yaml |
| AMD ROCm | ghcr.io/cloudnull/xerotier-public/xim-vllm-rocm:latest | compose/compose.agent-amd-rocm.yaml |
| AMD CPU (ZenDNN) | ghcr.io/cloudnull/xerotier-public/xim-vllm-zendnn:latest | compose/compose.agent-amd-cpu-zendnn.yaml |
| Intel / generic CPU | ghcr.io/cloudnull/xerotier-public/xim-vllm-cpu:latest | compose/compose.agent-cpu.yaml |
| Apple Silicon (macOS) | native app (no container) | XIM on macOS |
Prerequisites
Before deploying a XIM node, ensure your infrastructure meets the following requirements.
Recommended Hardware
These values are advisory; the agent does not enforce minimums at startup. Actual VRAM headroom is sized per-model by the agent's VRAM estimator.
| Component | Advisory Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA with 16GB VRAM | NVIDIA A30/H100+ or RTX 3090+ |
| System RAM | 32GB | 64GB+ |
| Disk Space | 100GB SSD | 500GB+ NVMe SSD |
| Network | 100 Mbps | 1 Gbps+ |
Software Requirements
| Software | Version |
|---|---|
| Docker | 24.0+ |
| Docker Compose | 2.20+ |
| NVIDIA Driver | 535+ |
| NVIDIA Container Toolkit | 1.14+ |
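To check an installed version against the minimums in the table, a small comparison helper is usually enough. The sketch below assumes GNU sort with version sorting (-V) is available on the host; the function name version_ge is illustrative and is not part of any Xerotier tooling.

```shell
# version_ge ACTUAL MINIMUM
# Succeeds (exit 0) when ACTUAL is greater than or equal to MINIMUM.
# Hypothetical helper; relies on GNU sort's -V (version sort).
version_ge() {
  actual="$1"
  minimum="$2"
  # After a version sort, the smaller version comes first. If that is
  # MINIMUM (or both are equal), ACTUAL meets the requirement.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n1)" = "$minimum" ]
}
```

For example, `version_ge "$(docker version --format '{{.Server.Version}}')" 24.0` checks the Docker engine, and `version_ge "$(docker compose version --short)" 2.20` checks the Compose plugin.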
NVIDIA Container Toolkit Installation
Install the NVIDIA Container Toolkit to enable GPU access in Docker containers:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify GPU Access
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi
Container Setup
XIM is distributed as a Docker image that bundles vLLM and the xerotier-xim-agent binary for model inference.
Container Details
| Property | Value |
|---|---|
| Image | ghcr.io/cloudnull/xerotier-public/xim-vllm-cuda:latest |
| Home Directory | /var/lib/inference |
| Model Cache | /var/lib/inference/.cache/xerotier/models |
| Config Directory | /var/lib/inference/.config/xerotier |
Pull the Image
docker pull ghcr.io/cloudnull/xerotier-public/xim-vllm-cuda:latest
Enrollment Workflow
On first start, XIM enrolls with the Xerotier router mesh using the join key. Enrollment state (including a signed XEM token) is persisted to /var/lib/inference/.config/xerotier on the host-mounted config volume. On subsequent restarts the agent reconnects automatically by exchanging its persisted refresh credential at /v1/enroll/refresh, so you can remove XEROTIER_AGENT_JOIN_KEY from your environment after the first successful enrollment.
If the config volume is wiped or credentials are rotated, re-supply a fresh join key from the dashboard. See Advanced Configuration for refresh and rotation details.
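Before a restart, it can help to check whether the config volume still holds enrollment state, since that decides whether a join key is needed. The sketch below is a rough heuristic only: it assumes that any file in the config directory counts as persisted state, because the actual file names are not documented here. The function name needs_join_key is hypothetical.

```shell
# needs_join_key CONFIG_DIR
# Succeeds (exit 0) when CONFIG_DIR is missing or empty, i.e. when a join
# key is most likely required. ASSUMPTION: any file in the directory is
# treated as persisted enrollment state; real file names are unknown.
needs_join_key() {
  config_dir="$1"
  # A missing directory means nothing has been persisted yet.
  [ ! -d "$config_dir" ] && return 0
  # ls -A includes dotfiles but omits . and .. so empty output means empty dir.
  [ -z "$(ls -A "$config_dir" 2>/dev/null)" ]
}
```

With the host directories used later on this page, `needs_join_key /data/xerotier/config && echo "supply XEROTIER_AGENT_JOIN_KEY"` prints a reminder only when the directory is missing or empty.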
Environment Variables
Configure the XIM node using environment variables. The following tables list the options needed for deployment.
Required Variables
| Variable | Description |
|---|---|
| XEROTIER_AGENT_JOIN_KEY | Join key for enrolling with the Xerotier router mesh. Obtain from the Agents dashboard. Required on first run only; after successful enrollment the agent reconnects using its persisted XEM refresh credential and this variable can be removed. |
Agent Configuration
| Variable | Default | Description |
|---|---|---|
| XEROTIER_AGENT_MAX_CONCURRENT | auto | Optional ceiling for concurrent inference requests. When unset, the agent auto-configures the ceiling from the resolved GPU count and model size. |
| XEROTIER_AGENT_LOG_LEVEL | info | Log level: trace, debug, info, warning, error |
vLLM Configuration
The model served by XIM is assigned from the Xerotier dashboard during enrollment; there is no operator-supplied model id environment variable. The variables below tune the runtime around the assigned model.
| Variable | Default | Description |
|---|---|---|
| XEROTIER_AGENT_MAX_MODEL_LEN | unset | Maximum sequence length. Defaults to the model's config.json max_position_embeddings when unset. |
| XEROTIER_AGENT_TENSOR_PARALLEL_SIZE | 1 | Tensor parallel size for multi-GPU. Must be set explicitly to use more than one GPU; there is no auto-detection from visible devices. |
| XEROTIER_AGENT_GPU_MEMORY_UTILIZATION | 0.95 | Fraction of each GPU's memory reserved for the engine (0.0-1.0). |
| SHM_SIZE | 90g | Docker Compose shm_size setting. Controls shared memory allocation for the container. Can be specified as a size string (e.g., 90g, 16g) or in bytes. |
| XEROTIER_AGENT_CUDA_VISIBLE_DEVICES | unset (all visible GPUs) | Specific GPU devices to use (comma-separated). When unset, the agent defers to vLLM's device discovery, which uses all GPUs visible to the container. |
Cache Configuration
| Variable | Default | Description |
|---|---|---|
| XEROTIER_AGENT_MODEL_CACHE_PATH | /var/lib/inference/.cache/xerotier/models | Local model cache directory |
| XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB | 100 | Maximum cache size in gigabytes |
GPU Configuration
Configure GPU access and memory allocation. The defaults below assume one model per agent and 95% memory utilization.
Single GPU Setup
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=1
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.95
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0
Multi-GPU Setup
For models that require multiple GPUs (tensor parallelism):
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.95
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0,1
Specific GPU Selection
To use specific GPUs (e.g., GPUs 2 and 3 on a 4-GPU system):
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=2,3
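Because the tensor parallel size is never inferred from the visible devices, the two variables can drift apart when one is edited by hand. One way to keep them in step is to derive the size from the device list. This is a sketch, not agent behavior: count_devices is a hypothetical helper, and it assumes every listed device should take part in tensor parallelism.

```shell
# count_devices LIST
# Prints the number of comma-separated entries in LIST (e.g. "2,3" -> 2).
# Hypothetical helper; prints 0 for an empty list.
count_devices() {
  devices="$1"
  if [ -z "$devices" ]; then
    echo 0
    return
  fi
  # Put each device on its own line, then count the non-empty lines.
  printf '%s\n' "$devices" | tr ',' '\n' | grep -c .
}

# Derive the tensor parallel size from the selected devices.
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=2,3
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=$(count_devices "$XEROTIER_AGENT_CUDA_VISIBLE_DEVICES")
export XEROTIER_AGENT_CUDA_VISIBLE_DEVICES XEROTIER_AGENT_TENSOR_PARALLEL_SIZE
```

Exporting both values before running docker compose makes them available for variable substitution, assuming the compose file reads them from the environment.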
Shared Memory Configuration
Larger models require more shared memory. Adjust SHM_SIZE in your .env file based on model size. The default of 90g is sufficient for most deployments:
| Model Size | Recommended SHM_SIZE |
|---|---|
| 1-8B parameters | 16g |
| 13-34B parameters | 32g |
| 70B+ parameters | 90g (default) |
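The table can be turned into a small lookup when generating a .env file from a script. The sketch below is illustrative only: shm_size_for is a hypothetical helper, and it assumes that model sizes falling between the table's bands (9-12B and 35-69B) should be rounded up to the next band.

```shell
# shm_size_for PARAMS_B
# Prints a suggested SHM_SIZE for a model with PARAMS_B billion parameters
# (whole number). ASSUMPTION: sizes between the documented bands are
# rounded up to the next band.
shm_size_for() {
  params_b="$1"
  if [ "$params_b" -le 8 ]; then
    echo 16g
  elif [ "$params_b" -le 34 ]; then
    echo 32g
  else
    echo 90g
  fi
}
```

For example, `echo "SHM_SIZE=$(shm_size_for 13)" >> .env` appends SHM_SIZE=32g.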
Model Storage
Configure persistent storage for downloaded models to avoid re-downloading on container restart.
Host Directory Setup
Create directories on the host for persistent model and configuration storage:
sudo mkdir -p /data/xerotier/models /data/xerotier/config
Docker Compose with Volume Mounts
Fetch the compose file from the cloudnull/xerotier-public repository and start the agent. The compose file mounts the host directories created above into the container so the model cache and enrollment state persist across restarts.
# Clone the deployment repo (or download just the compose file you need)
git clone https://github.com/cloudnull/xerotier-public.git
cd xerotier-public/compose
# Provide the join key (first run only) and start the NVIDIA agent
export XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
docker compose -f compose.agent-nvidia.yaml up -d
Swap compose.agent-nvidia.yaml for compose.agent-amd-rocm.yaml, compose.agent-amd-cpu-zendnn.yaml, or compose.agent-cpu.yaml to match your accelerator. Each compose file documents its full environment-variable surface in header comments.
KV Cache Offload
The agent ships with vLLM native CPU KV cache offloading enabled by default (25% of system RAM). This reduces Time-to-First-Token (TTFT) for repeated prompt prefixes by spilling evicted KV blocks to host memory instead of recomputing them.
Tuning
See xerotier-xim-agent --help under --kv-offload-size-gb for tuning. Setting the environment variable XEROTIER_AGENT_KV_OFFLOAD_SIZE_GB=0 disables offload.
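To get a feel for what the 25% default means on a given host before overriding it, the figure can be estimated from /proc/meminfo. This is a rough sketch: quarter_ram_gb is a hypothetical helper, and how the agent itself rounds or whether it counts in GB or GiB is an assumption here.

```shell
# quarter_ram_gb MEM_TOTAL_KB
# Prints 25% of MEM_TOTAL_KB expressed in whole GiB, rounded down.
# ASSUMPTION: the agent's own rounding and units may differ.
quarter_ram_gb() {
  mem_total_kb="$1"
  # kB -> GiB divides by 1024*1024; taking a quarter divides by 4 again.
  echo $(( mem_total_kb / (1024 * 1024 * 4) ))
}

# On Linux, MemTotal is reported in kB in /proc/meminfo.
if [ -r /proc/meminfo ]; then
  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "25% of system RAM is about $(quarter_ram_gb "$mem_kb") GiB"
fi
```

If that estimate is more host memory than you want to give to offload, set XEROTIER_AGENT_KV_OFFLOAD_SIZE_GB to a smaller value, or to 0 to disable offload.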
Verification
After starting your XIM node, verify that it is running correctly.
Check Container Logs
docker compose -f compose.agent-nvidia.yaml logs -f agent
Successful enrollment emits a log line containing enrollment successful followed by the assigned agent id. The agent then transitions to ready and begins polling for work.
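For scripted checks, the same marker can be searched for instead of read by eye. The sketch below assumes only that the log contains the phrase enrollment successful, as described above; the exact log format and casing are not specified here, so the match is case-insensitive. The function name enrollment_ok is hypothetical.

```shell
# enrollment_ok
# Reads log text on stdin and succeeds (exit 0) if it contains the
# "enrollment successful" marker. The match ignores case because the
# exact log format is an assumption.
enrollment_ok() {
  grep -qi 'enrollment successful'
}
```

For example, run from the compose directory: `docker compose -f compose.agent-nvidia.yaml logs agent | enrollment_ok && echo "agent enrolled"`.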
Confirm in the Dashboard
Open Infrastructure -> Agents in the Xerotier dashboard. The newly enrolled node appears with status online and reports its accelerator tier, assigned model, and last-seen timestamp. If the node does not appear within a minute of container start, recheck the join key and the /data/xerotier/config volume permissions (UID/GID 5152).
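The permission check mentioned above can be done from the host by comparing the directory's numeric owner with the runtime UID/GID. The sketch below assumes a Linux host with GNU stat; owned_by is a hypothetical helper name.

```shell
# owned_by DIR UID:GID
# Succeeds (exit 0) when DIR's numeric owner and group match UID:GID.
# Uses GNU stat's -c format option, as found on Linux hosts.
owned_by() {
  dir="$1"
  expected="$2"
  [ "$(stat -c '%u:%g' "$dir" 2>/dev/null)" = "$expected" ]
}
```

For example, `owned_by /data/xerotier/config 5152:5152 || sudo chown -R 5152:5152 /data/xerotier/config` corrects ownership only when it does not already match. The same check applies to /data/xerotier/models.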