Accelerated
Intelligence.

13 weighted signals scored every dispatch
a-res weighted random selection, 5% epsilon explore
prefix cache affinity, EWMA-calibrated, self-healing
curve + msgpack tool dispatch, credit-flow streaming
/v1 OpenAI-compatible. Yours, or shared.

Start free Routing engine deep dive

edge p99: 2.1ms
agents online: 6 agents online
active regions: 7regions
credit ledger: atomic

// routing engine

Composite scoring across 13 signals

Inference requests are scored on model-cache affinity, prefix-cache match, EWMA-smoothed latency prediction, KV-cache pressure, worker health, utilization, region affinity, per-project fairness, and tier, staleness, and ownership adjustments, then sampled with the A-Res weighted-random algorithm and 5% epsilon-greedy exploration to keep new routes discoverable. Tool execution dispatches through lease-bound agents over a CurveZMQ mesh with MessagePack framing, credit-based streaming backpressure, and an atomic credit ledger.

// score composite 13 weighted signals // select A-Res weighted random + 5% explore // affinity prefix-cache EWMA-calibrated, self-healing // fairness per-project soft burst + atomic credits // wire CURVE msgpack + credit flow + leases

What ships in the router.

Six capabilities the platform exposes today. Each links to its documentation.

OpenAI Compatible API

Drop-in replacement. One env var change ships your app to dedicated, isolated infrastructure -- keep every SDK and library you already use.

$ xeroctl deploy llama-3.3-70b --tier gpu_nvidia_dedicated

endpoint live - /v1/chat/completions

$ curl https://api.xerotier.ai/your-project/llama/v1/chat/completions \

-H "Authorization: Bearer $XEROTIER_API_KEY" \

-d '{"model":"llama-3-3-70b","messages":[...]}'

Instant Deployment

Push a model, get an endpoint. Automatic scaling from zero.

Security

Runtime, data, and network isolation.

Multi-Vendor GPU

NVIDIA and AMD support giving full featured acceleration, at scale.

Bring Your Own Model

Upload safetensors models with hardware-isolated sandboxing.

Usage Analytics

Real-time request tracking and performance metrics.

Bring your project into every AI tool

Wire Cursor, Claude Code, opencode, VS Code, or Claude Desktop into your Xerotier workspace once. Then keep working.

Your editor speaks your project

Memory, decisions, milestones, artifacts, code review, all reachable from inside your AI tool. No copy-paste. No context reload. One connection per workspace.

5 editors, one config. Cursor, Claude Code, opencode, VS Code, Claude Desktop.
scoped per workspace. Every action logged; destructive operations wait for approval.
streamed long-running tools push progress back into your editor while they run.

What this costs.

Pay per token on managed inference, or bring your own hardware and pay nothing per request. Only tiers with active infrastructure shown.

Free

Perfect for testing and evaluation

$0.00Forever

Max Model Size64 GB HardwareCPU Requests/min64 Tokens/min10,000

Shared Engine
Starter
10K tokens/min

Start free

CPU AMD Optimized

ZenDNN-optimized AMD EPYC inference

$0.35/1M tokens

Max Model Size20 GB HardwareCPU Requests/min128 Tokens/min100,000

ZenDNN optimized
Perfect for Batch workloads
100K tokens/min

Start free

GPU NVIDIA Shared

Best balance of price and performance with NVIDIA CUDA

$1.25/1M tokens

Max Model SizeUnlimited HardwareGPU Requests/min256 Tokens/min500,000

NVIDIA CUDA
Shared Engine
99.9% SLA

Start free

GPU AMD Shared

Cost-effective GPU inference with AMD ROCm

$1.00/1M tokens

Max Model SizeUnlimited HardwareGPU Requests/min256 Tokens/min500,000

AMD ROCm
Shared Engine
99.9% SLA

Start free

Self-Hosted

Bring your own accelerators

$20/month

Max Model Size512 GB HardwareYour Accelerators Requests/minUnlimited Tokens/minUnlimited

Your Accelerators
Our Infrastructure
Full data control

Start free

One curl. One project key.

Free tier, no card. Deploy a shared endpoint, or bring your own hardware whenever you want.

Start free

AcceleratedIntelligence.

Composite scoring across 13 signals

OpenAI Compatible API

Instant Deployment

Security

Multi-Vendor GPU

Bring Your Own Model

Usage Analytics

Your editor speaks your project

Free

CPU AMD Optimized

GPU NVIDIA Shared

GPU AMD Shared

Self-Hosted

One curl. One project key.

Accelerated
Intelligence.