Xerotier Inference Microservice (XIM)

Deploy a XIM node on your own infrastructure using Docker containers with NVIDIA GPU support. XIM nodes give you full control over your inference hardware while leveraging Xerotier.ai's routing and management capabilities.

Prerequisites

Before deploying a XIM node, ensure your infrastructure meets the following requirements.

Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA with 16GB VRAM | NVIDIA A30/H100+ or RTX 3090+ |
| System RAM | 32GB | 64GB+ |
| Disk Space | 100GB SSD | 500GB+ NVMe SSD |
| Network | 100 Mbps | 1 Gbps+ |

Software Requirements

| Software | Version |
|---|---|
| Docker | 24.0+ |
| Docker Compose | 2.20+ |
| NVIDIA Driver | 535+ |
| NVIDIA Container Toolkit | 1.14+ |
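
As a quick preflight, you can check which of these tools are already installed. This is a generic shell sketch, not part of the Xerotier tooling; it simply reports the version of each command or flags it as missing:

```bash
# Print the version of each required tool, or flag it as missing.
check_tool() {
  # $1 is the command name; any remaining arguments are its version flags.
  if command -v "$1" >/dev/null 2>&1; then
    "$@"
  else
    echo "$1: not found"
  fi
}

check_tool docker --version
check_tool docker compose version
check_tool nvidia-ctk --version
```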

NVIDIA Container Toolkit Installation

Install the NVIDIA Container Toolkit to enable GPU access in Docker containers:

Ubuntu/Debian

```bash
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
```

Verify GPU Access

```bash
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```

Container Setup

The Xerotier.ai backend agent is distributed as a Docker image that includes vLLM for model inference.

Container Details

| Property | Value |
|---|---|
| Image | xerotier/backend-agent:latest |
| Home Directory | /var/lib/inference |
| Model Cache | /var/lib/inference/.cache/xerotier/models |
| Config Directory | /var/lib/inference/.config/xerotier |

Pull the Image

```bash
docker pull xerotier/backend-agent:latest
```

Environment Variables

Configure the XIM node using environment variables. The following tables list the required and optional settings.

Required Variables

| Variable | Description |
|---|---|
| XEROTIER_AGENT_JOIN_KEY | Join key for enrolling with the Xerotier router mesh. Obtain from the Agents dashboard. |
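
Because the join key is the only required variable, it can help to fail fast when it is missing before launching the container. The check below is a standard POSIX parameter-expansion idiom, not an agent feature, and the key value shown is a placeholder:

```bash
# Abort early if the join key is unset or empty. ":?" is POSIX parameter
# expansion, so this works in any sh-compatible shell.
require_join_key() {
  : "${XEROTIER_AGENT_JOIN_KEY:?set XEROTIER_AGENT_JOIN_KEY before starting the agent}"
}

# Placeholder key so the check passes here; substitute your real xjk_... key.
XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
require_join_key && echo "join key present"
```

Run this in the same shell that invokes docker compose so an empty key stops the deployment with a clear message instead of a failed enrollment.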

Agent Configuration

| Variable | Default | Description |
|---|---|---|
| XEROTIER_AGENT_MAX_CONCURRENT | 8 | Maximum concurrent inference requests |
| XEROTIER_AGENT_LOG_LEVEL | info | Log level: debug, info, warn, error |

vLLM Configuration

| Variable | Default | Description |
|---|---|---|
| VLLM_MODEL | meta-llama/Llama-3.2-1B-Instruct | HuggingFace model ID or local path |
| XEROTIER_AGENT_MAX_MODEL_LEN | auto | Maximum sequence length (uses model default if unset) |
| XEROTIER_AGENT_TENSOR_PARALLEL_SIZE | auto | Tensor parallel size for multi-GPU (auto-configured from visible devices) |
| XEROTIER_AGENT_GPU_MEMORY_UTILIZATION | 0.90 | GPU memory utilization (0.0-1.0) |
| SHM_SIZE | 8589934592 | Docker Compose shm_size setting, not an environment variable. Shared memory allocated to the container, in bytes (8GB default). |
| XEROTIER_AGENT_CUDA_VISIBLE_DEVICES | 0 | Specific GPU devices to use (comma-separated) |

Cache Configuration

| Variable | Default | Description |
|---|---|---|
| XEROTIER_AGENT_MODEL_CACHE_PATH | /var/lib/inference/.cache/xerotier/models | Local model cache directory |
| XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB | 100 | Maximum cache size in gigabytes |
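
Putting the tables together, one way to capture a node's settings is a generated .env file. The values below are illustrative: only XEROTIER_AGENT_JOIN_KEY is required, and any other line can be omitted to accept the documented default.

```bash
# Write an example .env for docker compose. All values besides the join key
# are optional; omit a line to keep the documented default.
cat > .env <<'EOF'
XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
XEROTIER_AGENT_MAX_CONCURRENT=8
XEROTIER_AGENT_LOG_LEVEL=info
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.90
XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB=100
EOF
grep -c '=' .env   # one KEY=VALUE pair per line
```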

GPU Configuration

Configure GPU access and memory allocation for optimal performance.

Single GPU Setup

.env

```env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=1
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.90
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0
```

Multi-GPU Setup

For models that require multiple GPUs (tensor parallelism):

.env

```env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_GPU_MEMORY_UTILIZATION=0.90
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=0,1
```

Specific GPU Selection

To use specific GPUs (e.g., GPUs 2 and 3 on a 4-GPU system):

.env

```env
XEROTIER_AGENT_TENSOR_PARALLEL_SIZE=2
XEROTIER_AGENT_CUDA_VISIBLE_DEVICES=2,3
```

Shared Memory Configuration

Larger models require more shared memory. Adjust shm_size (Docker Compose config) based on model size:

| Model Size | Recommended SHM_SIZE |
|---|---|
| 1-8B parameters | 8GB (8589934592) |
| 13-34B parameters | 16GB (17179869184) |
| 70B+ parameters | 32GB (34359738368) |
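
The byte values above are plain powers of two, so shell arithmetic reproduces them. This is handy when exporting SHM_SIZE for a size not listed in the table:

```bash
# Convert GiB to bytes for SHM_SIZE (1 GiB = 1024^3 bytes).
gib_to_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 8    # 8589934592
gib_to_bytes 16   # 17179869184
gib_to_bytes 32   # 34359738368
export SHM_SIZE=$(gib_to_bytes 16)
```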

Model Storage

Configure persistent storage for downloaded models to avoid re-downloading on container restart.

Host Directory Setup

Create directories on the host for persistent model and configuration storage:

```bash
sudo mkdir -p /data/xerotier/models /data/xerotier/config /data/xerotier/lmcache
# Grant ownership to the container user (UID/GID 5152)
sudo chown -R 5152:5152 /data/xerotier
```

Docker Compose with Volume Mounts

docker-compose.agent-nvidia.yaml
```yaml
# SPDX-License-Identifier: MIT
# Xerotier Agent - NVIDIA GPU Stack (with LMCache + Valkey)
#
# Deploys a XIM GPU node with Valkey-backed LMCache for
# KV cache sharing across multiple XIM nodes on the same host.
#
# QUICK START:
#   1. Get a join key from your Xerotier dashboard:
#      Dashboard -> Infrastructure -> Agents -> Generate Join Key
#
#   2. Create host directories with correct permissions:
#      sudo mkdir -p /data/xerotier/models /data/xerotier/config /data/xerotier/lmcache
#      sudo chown -R 5152:5152 /data/xerotier
#
#   3. Set environment variables or create a .env file:
#      export XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
#
#   4. Start the agent:
#      docker compose -f docker-compose.agent-nvidia.yaml up -d
#
# ENROLLMENT WORKFLOW:
#   - On first start, the agent enrolls using your join key
#   - Enrollment state is persisted to /data/xerotier/config
#   - On subsequent restarts, the agent reconnects automatically
#   - You can remove XEROTIER_AGENT_JOIN_KEY after successful enrollment
#
# ENVIRONMENT VARIABLES:
#   XEROTIER_AGENT_JOIN_KEY                 [REQUIRED] Join key from Xerotier dashboard (first run only)
#   XEROTIER_AGENT_MAX_CONCURRENT           Optional ceiling for concurrent requests (auto-configured when not set)
#   XEROTIER_AGENT_TENSOR_PARALLEL_SIZE     Tensor parallel size for multi-GPU (default: 1)
#   XEROTIER_AGENT_GPU_MEMORY_UTILIZATION   GPU memory utilization fraction (default: 0.90)
#   XEROTIER_AGENT_MAX_MODEL_LEN            Maximum sequence length (default: model default)
#   XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB  Local model cache size in GB (default: 100)
#   XEROTIER_AGENT_LOG_LEVEL                Logging level: trace, debug, info, warning, error (default: info)
#   SHM_SIZE                                Shared memory size in bytes (default: 8589934592 = 8GB)
#   DOCKER_GPU_COUNT                        Number of GPUs to reserve (default: 1)
#
# KERNEL SOCKET BUFFER TUNING (optional):
#   The agent sets ZeroMQ socket buffers to 4 MiB for streaming throughput.
#   Linux defaults net.core.wmem_max and net.core.rmem_max to 212992 bytes,
#   which silently caps the requested buffer size. The agent will attempt to
#   raise these limits on startup (requires privileged mode or CAP_SYS_ADMIN).
#
#   If the container is not privileged, set these on the host before starting:
#     sudo sysctl -w net.core.wmem_max=4194304
#     sudo sysctl -w net.core.rmem_max=4194304
#
#   To persist across reboots, add to /etc/sysctl.d/99-xerotier.conf:
#     net.core.wmem_max = 4194304
#     net.core.rmem_max = 4194304

services:
  agent:
    image: ${DOCKER_REGISTRY:-ghcr.io/cloudnull/xerotier}-public/xim-vllm-cu:${VERSION:-latest}
    container_name: xim-vllm-cu
    network_mode: host
    ipc: host
    privileged: true
    shm_size: ${SHM_SIZE:-90g}
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
    volumes:
      # Persistent model cache
      - /data/xerotier/models:/var/lib/inference/.cache/xerotier/models
      # Persistent enrollment state
      - /data/xerotier/config:/var/lib/inference/.config/xerotier
      # Caching
      - /data/xerotier/lmcache:/var/lib/inference/.cache/lmcache
    environment:
      # Agent Enrollment [REQUIRED for first run]
      XEROTIER_AGENT_JOIN_KEY: "${XEROTIER_AGENT_JOIN_KEY:-}"
      # Model is assigned from the dashboard during enrollment
      XEROTIER_AGENT_LOG_LEVEL: "${XEROTIER_AGENT_LOG_LEVEL:-info}"
      # LMCache Configuration
      XEROTIER_AGENT_LMCACHE_ENABLED: true
      XEROTIER_AGENT_LMCACHE_REDIS_URL: "${XEROTIER_AGENT_LMCACHE_REDIS_URL:-}"
    restart: unless-stopped
```

LMCache Setup

LMCache provides multi-tiered KV cache sharing for vLLM, reducing Time-to-First-Token (TTFT) for repeated prompt prefixes. The XIM node natively manages LMCache configuration.

Benefits

  • Reduced Time-to-First-Token: Cache hits can reduce TTFT by 50-90% for repeated prompt prefixes
  • Multi-Tier Caching: Three cache tiers with different speed/capacity tradeoffs
  • Horizontal Scalability: Multiple XIM nodes can share a remote Redis/Valkey cache
  • Graceful Degradation: XIM node continues without cache if initialization fails

Cache Tiers

| Tier | Speed | Default Size | Use Case |
|---|---|---|---|
| CPU Memory | ~100 GB/s | 10% of system RAM | Hot cache, frequently accessed prefixes |
| Local Disk | ~5 GB/s (NVMe) | 10% of partition | Warm cache, persistent across restarts |
| Remote Redis | ~1 GB/s (network) | Valkey maxmemory | Shared cache across multiple XIM nodes |
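
The "10% of system RAM" default can be previewed on a Linux host. This sketch only mirrors the documented sizing rule by reading MemTotal from /proc/meminfo; it is not how the agent itself computes the value:

```bash
# Estimate the default CPU cache tier size: 10% of MemTotal (Linux only).
awk '/^MemTotal:/ { printf "cpu cache tier ~ %d MiB\n", ($2 * 0.10) / 1024 }' /proc/meminfo
```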

Quick Start (Local Only)

Enable LMCache with local CPU and disk caching only (no Redis required):

.env

```env
# Enable LMCache with auto-calculated sizes
XEROTIER_AGENT_LMCACHE_ENABLED=true
```

Tip: When LMCache is enabled without explicit size overrides, the XIM node auto-calculates optimal values (10% of system resources). This works well for most deployments.

AMD ROCm GPU Setup with Valkey

For AMD ROCm GPU deployments with Valkey for shared KV cache:

docker-compose.agent-amd-rocm.yaml
```yaml
# SPDX-License-Identifier: MIT
# Xerotier Agent - AMD GPU ROCm Stack (with LMCache + Valkey)
#
# Deploys a XIM AMD GPU node using ROCm with Valkey-backed
# LMCache for KV cache sharing across multiple XIM nodes on the same host.
#
# QUICK START:
#   1. Get a join key from your Xerotier dashboard:
#      Dashboard -> Infrastructure -> Agents -> Generate Join Key
#
#   2. Create host directories with correct permissions:
#      sudo mkdir -p /data/xerotier/models /data/xerotier/config /data/xerotier/lmcache
#      sudo chown -R 5152:5152 /data/xerotier
#
#   3. Set environment variables or create a .env file:
#      export XEROTIER_AGENT_JOIN_KEY=xjk_your_key_here
#
#   4. Start the agent:
#      docker compose -f docker-compose.agent-amd-rocm.yaml up -d
#
# PREREQUISITES:
#   - AMD GPU with ROCm support (MI210, MI250, MI300, RX 7900, etc.)
#   - ROCm driver installed on the host
#   - /dev/kfd and /dev/dri devices available
#   - User in video and render groups on the host
#
# ENROLLMENT WORKFLOW:
#   - On first start, the agent enrolls using your join key
#   - Enrollment state is persisted to /data/xerotier/config
#   - On subsequent restarts, the agent reconnects automatically
#   - You can remove XEROTIER_AGENT_JOIN_KEY after successful enrollment
#
# ENVIRONMENT VARIABLES:
#   XEROTIER_AGENT_JOIN_KEY                 [REQUIRED] Join key from Xerotier dashboard (first run only)
#   HIP_VISIBLE_DEVICES                     GPU device indices to use (default: 0)
#   XEROTIER_AGENT_MAX_CONCURRENT           Optional ceiling for concurrent requests (auto-configured when not set)
#   XEROTIER_AGENT_TENSOR_PARALLEL_SIZE     Tensor parallel size for multi-GPU (default: 1)
#   XEROTIER_AGENT_GPU_MEMORY_UTILIZATION   GPU memory utilization fraction (default: 0.90)
#   XEROTIER_AGENT_MAX_MODEL_LEN            Maximum sequence length (default: model default)
#   XEROTIER_AGENT_MODEL_CACHE_MAX_SIZE_GB  Local model cache size in GB (default: 100)
#   XEROTIER_AGENT_LOG_LEVEL                Logging level: trace, debug, info, warning, error (default: info)
#   SHM_SIZE                                Shared memory size in bytes (default: 8589934592 = 8GB)
#
# KERNEL SOCKET BUFFER TUNING (optional):
#   The agent sets ZeroMQ socket buffers to 4 MiB for streaming throughput.
#   Linux defaults net.core.wmem_max and net.core.rmem_max to 212992 bytes,
#   which silently caps the requested buffer size. The agent will attempt to
#   raise these limits on startup (requires privileged mode or CAP_SYS_ADMIN).
#
#   If the container is not privileged, set these on the host before starting:
#     sudo sysctl -w net.core.wmem_max=4194304
#     sudo sysctl -w net.core.rmem_max=4194304
#
#   To persist across reboots, add to /etc/sysctl.d/99-xerotier.conf:
#     net.core.wmem_max = 4194304
#     net.core.rmem_max = 4194304

services:
  agent:
    image: ${DOCKER_REGISTRY:-ghcr.io/cloudnull/xerotier}-public/xim-vllm-rocm:${VERSION:-latest}
    container_name: xim-vllm-rocm
    network_mode: host
    ipc: host
    privileged: true
    shm_size: ${SHM_SIZE:-90g}
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    security_opt:
      - seccomp=unconfined
    group_add:
      - video
      - render
    volumes:
      # Persistent model cache
      - /data/xerotier/models:/var/lib/inference/.cache/xerotier/models
      # Persistent enrollment state
      - /data/xerotier/config:/var/lib/inference/.config/xerotier
      # Caching
      - /data/xerotier/lmcache:/var/lib/inference/.cache/lmcache
    environment:
      # Agent Enrollment [REQUIRED for first run]
      XEROTIER_AGENT_JOIN_KEY: "${XEROTIER_AGENT_JOIN_KEY:-}"
      # ROCm GPU Configuration
      HIP_VISIBLE_DEVICES: "${HIP_VISIBLE_DEVICES:-0}"
      # Model is assigned from the dashboard during enrollment
      XEROTIER_AGENT_MAX_MODEL_LEN: "${XEROTIER_AGENT_MAX_MODEL_LEN:-}"
      XEROTIER_AGENT_TENSOR_PARALLEL_SIZE: "${XEROTIER_AGENT_TENSOR_PARALLEL_SIZE:-1}"
      XEROTIER_AGENT_GPU_MEMORY_UTILIZATION: "${XEROTIER_AGENT_GPU_MEMORY_UTILIZATION:-0.90}"
      XEROTIER_AGENT_MAX_CONCURRENT: "${XEROTIER_AGENT_MAX_CONCURRENT:-}"
      XEROTIER_AGENT_LOG_LEVEL: "${XEROTIER_AGENT_LOG_LEVEL:-info}"
      # LMCache Configuration
      XEROTIER_AGENT_LMCACHE_ENABLED: "false"
      XEROTIER_AGENT_LMCACHE_REDIS_URL: ""
    restart: unless-stopped
```

Sizing Recommendations

| Deployment Size | CPU Cache | Disk Cache | Valkey Memory |
|---|---|---|---|
| Small (32GB RAM, single XIM node) | 2-4 GB | 20 GB | 4 GB |
| Medium (64GB RAM, 2-4 XIM nodes) | 4-8 GB per node | 50 GB per node | 8 GB |
| Large (128GB+ RAM, 4+ XIM nodes) | 8-16 GB per node | 100 GB per node | 16-32 GB |
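
If you operate the Valkey instance yourself, the Valkey Memory column maps onto standard Redis-family configuration directives. A minimal sketch for a medium deployment follows; the file name and the eviction policy are choices on your part, not Xerotier requirements:

```conf
# valkey.conf (illustrative): cap cache memory at the sizing-table value
# and evict least-recently-used keys once the cap is reached.
maxmemory 8gb
maxmemory-policy allkeys-lru
```

An LRU eviction policy suits a prefix cache, since stale prompt prefixes age out naturally as new ones are written.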

Verification

After starting your XIM node, verify that it is running correctly.

Check Container Logs

```bash
docker compose -f docker-compose.agent-nvidia.yaml logs -f agent
```

Verify LMCache

If LMCache is enabled, check the logs for successful initialization:

```bash
docker compose -f docker-compose.agent-nvidia.yaml logs agent | grep -i lmcache
# Expected output:
#   LMCache enabled
#   Wrote LMCache configuration
#   config_path=/var/lib/inference/.config/xerotier/lmcache_config.yaml
```