XIM Deployment

A Docker Compose stack that runs vLLM behind the Xerotier wire protocol on your NVIDIA GPU box. Your model, your hardware, your kernel. Once the join-key handshake succeeds, the router treats the node as just another worker.

| Property | Value |
| --- | --- |
| image | ghcr.io/xerotier/container-agents/xim-vllm-cuda:latest |
| compose | compose/compose.agent-nvidia.yaml |
| runtime uid:gid | 5152:5152 |
| enroll endpoint | POST /v1/enroll |

Container images, compose files, and the macOS application all live in the public xerotier/container-agents repository. The compose files are under compose/; images are published to ghcr.io/xerotier/container-agents. Select the stack that matches your hardware:

| Stack | Image | Compose File |
| --- | --- | --- |
| NVIDIA CUDA (this page) | ghcr.io/xerotier/container-agents/xim-vllm-cuda:latest | compose/compose.agent-nvidia.yaml |
| AMD ROCm | ghcr.io/xerotier/container-agents/xim-vllm-rocm:latest | compose/compose.agent-amd-rocm.yaml |
| AMD CPU (ZenDNN) | ghcr.io/xerotier/container-agents/xim-vllm-zendnn:latest | compose/compose.agent-amd-cpu-zendnn.yaml |
| Intel / generic CPU | ghcr.io/xerotier/container-agents/xim-vllm-cpu:latest | compose/compose.agent-cpu.yaml |
| Apple Silicon (macOS) | native app (no container) | XIM on macOS |

Prerequisites

Before deploying a XIM node, ensure your infrastructure meets the following requirements.

Recommended Hardware

These values are advisory; the agent does not enforce minimums at startup. Actual VRAM headroom is sized per-model by the agent's VRAM estimator.

| Component | Advisory Minimum | Recommended |
| --- | --- | --- |
| GPU | NVIDIA with 16 GB VRAM | NVIDIA A30/H100+ or RTX 3090+ |
| System RAM | 32 GB | 64 GB+ |
| Disk Space | 100 GB SSD | 500 GB+ NVMe SSD |
| Network | 100 Mbps | 1 Gbps+ |

Software Requirements

| Software | Version |
| --- | --- |
| Docker | 24.0+ |
| Docker Compose | 2.20+ |
| NVIDIA Driver | 535+ |
| NVIDIA Container Toolkit | 1.14+ |
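
The minimums above can be checked with a small preflight helper. This is a minimal sketch, not part of the XIM tooling: the function names `version_ge` and `check_min` are hypothetical, and it assumes GNU `sort` with `-V` (version sort) is available on the host.

```bash
#!/usr/bin/env bash

# version_ge A B: succeeds when version A is greater than or equal to version B.
# Assumes GNU sort with -V (version sort).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME INSTALLED REQUIRED: print one status line for a component.
check_min() {
    if version_ge "$2" "$3"; then
        echo "ok   $1 $2 (>= $3)"
    else
        echo "FAIL $1 $2 (< $3)"
        return 1
    fi
}

# Example calls (run on a host where the tools are installed):
#   check_min docker  "$(docker version --format '{{.Server.Version}}')" 24.0
#   check_min compose "$(docker compose version --short)" 2.20
#   check_min driver  "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" 535
```

Because the values are advisory, a FAIL line here is a prompt to upgrade rather than a hard stop.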

NVIDIA Container Toolkit Installation

Install the NVIDIA Container Toolkit to enable GPU access in Docker containers:

Ubuntu/Debian
# Detect the distribution string used in the repository path
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

# Install the repository signing key
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg

# Register the apt repository, pinned to the signing key
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

# Install the toolkit, register the runtime with Docker, and restart the daemon
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker

Verify GPU Access

bash
docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu22.04 nvidia-smi

Container Setup

XIM is distributed as a Docker image that bundles vLLM and the xerotier-xim-agent binary for model inference.

Container Details

| Property | Value |
| --- | --- |
| Image | ghcr.io/xerotier/container-agents/xim-vllm-cuda:latest |
| Home Directory | /var/lib/inference |
| Model Cache | /var/lib/inference/.cache/xerotier/models |
| Config Directory | /var/lib/inference/.config/xerotier |

Pull the Image

bash
docker pull ghcr.io/xerotier/container-agents/xim-vllm-cuda:latest

Enrollment Workflow

On first start, XIM enrolls with the Xerotier router mesh using the join key. Enrollment state (including a signed XEM token) is persisted to /var/lib/inference/.config/xerotier on the host-mounted config volume. On subsequent restarts the agent reconnects automatically by exchanging its persisted refresh credential at /v1/enroll/refresh, so you can remove XEROTIER_AGENT_JOIN_KEY from your environment after the first successful enrollment.

If the config volume is wiped or credentials are rotated, re-supply a fresh join key from the dashboard. See Advanced Configuration for refresh and rotation details.
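
Whether a join key is still needed can be estimated from the host side before starting the container. The sketch below is only a heuristic: the helper name `needs_join_key` is hypothetical, and it assumes that any file in the host-mounted config directory represents persisted enrollment state, since the actual file names the agent writes are not documented here.

```bash
#!/usr/bin/env bash

# needs_join_key CONFIG_DIR: succeeds (exit 0) when no enrollment state appears to exist.
# Assumption: any regular file under CONFIG_DIR counts as persisted state.
needs_join_key() {
    local dir="$1"
    # A missing directory means nothing has been persisted yet.
    [ ! -d "$dir" ] && return 0
    # An empty directory tree means the same; stop at the first file found.
    [ -z "$(find "$dir" -mindepth 1 -type f -print -quit 2>/dev/null)" ]
}

# Example:
#   if needs_join_key /data/xerotier/config; then
#       echo "no enrollment state found: set XEROTIER_AGENT_JOIN_KEY before starting"
#   fi
```

A non-empty directory does not prove the stored credential is still valid, so a rotated credential still calls for a fresh join key.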

Environment Variables

Configure the XIM node using environment variables. The following tables list the options needed for deployment.

Required Variables

| Variable | Description |
| --- | --- |
| XEROTIER_AGENT_JOIN_KEY | Join key for enrolling with the Xerotier router mesh. Obtain from the Agents dashboard. Required on first run only; after successful enrollment the agent reconnects using its persisted XEM refresh credential and this variable can be removed. |

Agent Configuration

| Variable | Default | Description |
| --- | --- | --- |
| XEROTIER_AGENT_MAX_CONCURRENT | auto | Optional ceiling for concurrent inference requests. When unset, the agent auto-configures the ceiling from the resolved GPU count and model size. |
| XEROTIER_AGENT_LOG_LEVEL | info | Log level: trace, debug, info, warning, error |

vLLM Configuration

The model served by XIM is assigned from the Xerotier dashboard during enrollment; there is no operator-supplied model id environment variable. The variables below tune the runtime around the assigned model.

| Variable | Default | Description |
| --- | --- | --- |
| XEROTIER_AGENT_MAX_MODEL_LEN | unset | Maximum sequence length. Defaults to the model's config.json max_position_embeddings when unset. |
| XEROTIER_AGENT_TENSOR_PARALLEL_SIZE | 1 | Tensor parallel size for multi-GPU. Must be set explicitly to use more than one GPU; there is no auto-detection from visible devices. |
| XEROTIER_AGENT_GPU_MEMORY_UTILIZATION | 0.95 | Fraction of each GPU's memory reserved for the engine (0.0-1.0). |
| SHM_SIZE | 90g | Docker Compose shm_size setting. Controls shared memory allocation for the container. Can be specified as a size string (e.g., 90g, 16g) or in bytes. |
| XEROTIER_AGENT_CUDA_VISIBLE_DEVICES | unset (all visible GPUs) | Specific GPU devices to use (comma-separated). When unset, the agent defers to vLLM's device discovery, which uses all GPUs visible to the container. |

Cache Configuration

| Variable | Default | Description |
| --- | --- | --- |
| XEROTIER_AGENT_MODEL_CACHE_PATH | /var/lib/inference/.cache/xerotier/models | Local model cache directory |
| XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB | 100 | Maximum cache size in gigabytes |

GPU Configuration

Configure GPU access and memory allocation. The defaults below assume one model per agent and 95% memory utilization.

Single GPU Setup

.env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=1
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.95
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0

Multi-GPU Setup

For models that require multiple GPUs (tensor parallelism):

.env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.95
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0,1

Specific GPU Selection

To use specific GPUs (e.g., GPUs 2 and 3 on a 4-GPU system):

.env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=2,3
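
Because the tensor parallel size is never derived from the visible devices, the two settings can drift apart. The following sketch checks them against each other before the stack starts. The helper names `count_devices` and `check_tensor_parallel` are hypothetical, and the rule it applies (the tensor parallel size should not exceed the number of listed devices) is an assumption drawn from the examples above rather than a documented agent check. It only applies when XEROTIER_AGENT_CUDA_VISIBLE_DEVICES is set.

```bash
#!/usr/bin/env bash

# count_devices LIST: number of comma-separated entries in a device list.
count_devices() {
    local IFS=','
    local -a ids
    read -r -a ids <<< "$1"
    echo "${#ids[@]}"
}

# check_tensor_parallel TP DEVICES: fails when TP exceeds the listed device count.
check_tensor_parallel() {
    local tp="$1" n
    n="$(count_devices "$2")"
    if [ "$tp" -gt "$n" ]; then
        echo "tensor parallel size $tp exceeds $n visible device(s)" >&2
        return 1
    fi
}

# Example:
#   check_tensor_parallel "$XEROTIER_AGENT_TENSOR_PARALLEL_SIZE" "$XEROTIER_AGENT_CUDA_VISIBLE_DEVICES"
```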

Shared Memory Configuration

Larger models require more shared memory. Adjust SHM_SIZE in your .env file based on model size. The default of 90g is sufficient for most deployments:

| Model Size | Recommended SHM_SIZE |
| --- | --- |
| 1-8B parameters | 16g |
| 13-34B parameters | 32g |
| 70B+ parameters | 90g (default) |
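
The sizing guidance above can be expressed as a lookup. This is a sketch with a hypothetical helper name, `recommend_shm_size`. The table leaves gaps between its rows (9-12B and 35-69B), and rounding those sizes up to the next tier is an assumption made here, not documented behaviour.

```bash
#!/usr/bin/env bash

# recommend_shm_size PARAMS_B: map a model size (whole billions of parameters)
# to a SHM_SIZE value from the guidance above.
# Assumption: sizes between the documented ranges round up to the next tier.
recommend_shm_size() {
    local params_b="$1"
    if [ "$params_b" -le 8 ]; then
        echo 16g
    elif [ "$params_b" -le 34 ]; then
        echo 32g
    else
        echo 90g
    fi
}

# Example: print a line suitable for appending to .env for a 13B model.
#   echo "SHM_SIZE=$(recommend_shm_size 13)"
```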

Model Storage

Configure persistent storage for downloaded models to avoid re-downloading on container restart.

Host Directory Setup

Create directories on the host for persistent model and configuration storage:

bash
sudo mkdir -p /data/xerotier/models /data/xerotier/config

Docker Compose with Volume Mounts

Fetch the compose file from the xerotier/container-agents repository and start the agent. The compose file mounts the host directories created above into the container so the model cache and enrollment state persist across restarts.

bash
# Clone the deployment repo (or download just the compose file you need)
git clone https://github.com/xerotier/container-agents.git
cd container-agents/compose

# Provide the join key (first run only) and start the NVIDIA agent
export XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
docker compose -f compose.agent-nvidia.yaml up -d

Swap compose.agent-nvidia.yaml for compose.agent-amd-rocm.yaml, compose.agent-amd-cpu-zendnn.yaml, or compose.agent-cpu.yaml to match your accelerator. Each compose file documents its full environment-variable surface in header comments.

KV Cache Offload

The agent ships with vLLM's native CPU KV cache offloading enabled by default, sized at 25% of system RAM. Evicted KV blocks are spilled to host memory, where they can be reloaded instead of recomputed, which reduces Time-to-First-Token (TTFT) for repeated prompt prefixes.

Tuning

To tune the offload size, see the --kv-offload-size-gb entry in xerotier-xim-agent --help. Setting the environment variable XEROTIER_AGENT_KV_OFFLOAD_SIZE_GB=0 disables offload.
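
To see roughly how much host memory the 25% default claims on a given machine, the arithmetic can be sketched as below. The helper names are hypothetical, and the agent's exact unit (GB versus GiB) and rounding are not documented here, so treat the result as an estimate.

```bash
#!/usr/bin/env bash

# kv_offload_default_gib MEM_TOTAL_KB: 25% of system RAM in whole GiB, rounded down.
# Assumption: the agent's own unit and rounding may differ from this estimate.
kv_offload_default_gib() {
    local mem_kb="$1"
    echo $(( mem_kb / 4 / 1024 / 1024 ))
}

# host_mem_total_kb: read MemTotal (in kB) from /proc/meminfo on Linux.
host_mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}

# Example:
#   echo "default offload is about $(kv_offload_default_gib "$(host_mem_total_kb)") GiB"
```

On a host that also runs other memory-heavy services, comparing this figure with free memory is a reasonable way to decide whether to lower XEROTIER_AGENT_KV_OFFLOAD_SIZE_GB.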

Verification

After starting your XIM node, verify that it is running correctly.

Check Container Logs

bash
docker compose -f compose.agent-nvidia.yaml logs -f agent

Successful enrollment emits a log line containing enrollment successful followed by the assigned agent id. The agent then transitions to ready and begins polling for work.
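
For scripted deployments, the same check can be automated by scanning the log output for that marker. This is a sketch with a hypothetical helper name, `enrollment_succeeded`. It matches case-insensitively because the exact log format is an assumption.

```bash
#!/usr/bin/env bash

# enrollment_succeeded: reads log text on stdin and succeeds when it contains
# the "enrollment successful" marker (matched case-insensitively).
enrollment_succeeded() {
    grep -qi "enrollment successful"
}

# Example (run from the compose/ directory):
#   docker compose -f compose.agent-nvidia.yaml logs agent | enrollment_succeeded \
#       && echo "agent enrolled"
```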

Confirm in the Dashboard

Open Infrastructure -> Agents in the Xerotier dashboard. The newly enrolled node appears with status online and reports its accelerator tier, assigned model, and last-seen timestamp. If the node does not appear within a minute of container start, recheck the join key and the /data/xerotier/config volume permissions (UID/GID 5152).
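
The permission check mentioned above can be done from the host. The sketch below uses a hypothetical helper, `owned_by`, and relies on GNU `stat` (`-c`), so the flag differs on BSD and macOS. Whether the compose file or the image adjusts ownership on its own is not documented here, so the chown in the example is a suggestion rather than a required step.

```bash
#!/usr/bin/env bash

# owned_by PATH UID GID: succeeds when PATH is owned by the given numeric uid and gid.
# Uses GNU stat; on BSD/macOS the equivalent is `stat -f '%u:%g'`.
owned_by() {
    [ "$(stat -c '%u:%g' "$1")" = "$2:$3" ]
}

# Example: fix ownership only when it does not match the runtime uid:gid.
#   owned_by /data/xerotier/config 5152 5152 \
#       || sudo chown -R 5152:5152 /data/xerotier/config
```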