Accelerated
Intelligence.

  • 13 weighted signals scored every dispatch
  • a-res weighted random selection, 5% epsilon explore
  • prefix cache affinity, EWMA-calibrated, self-healing
  • curve + msgpack tool dispatch, credit-flow streaming
  • /v1 OpenAI-compatible. Yours, or shared.
edge p99
2.1ms
agents online
6 agents online
active regions
7regions
credit ledger
atomic
// routing engine

Composite scoring across 13 signals

Inference requests are scored on model-cache affinity, prefix-cache match, EWMA-smoothed latency prediction, KV-cache pressure, worker health, utilization, region affinity, per-project fairness, and tier, staleness, and ownership adjustments, then sampled with the A-Res weighted-random algorithm and 5% epsilon-greedy exploration to keep new routes discoverable. Tool execution dispatches through lease-bound agents over a CurveZMQ mesh with MessagePack framing, credit-based streaming backpressure, and an atomic credit ledger.

What ships in the router.

Six capabilities the platform exposes today. Each links to its documentation.

OpenAI Compatible API

Drop-in replacement. One env var change ships your app to dedicated, isolated infrastructure -- keep every SDK and library you already use.

Instant Deployment

Push a model, get an endpoint. Automatic scaling from zero.

Security

Runtime, data, and network isolation.

Multi-Vendor GPU

NVIDIA and AMD support giving full featured acceleration, at scale.

Bring Your Own Model

Upload safetensors models with hardware-isolated sandboxing.

Usage Analytics

Real-time request tracking and performance metrics.

Bring your project into every AI tool

Wire Cursor, Claude Code, opencode, VS Code, or Claude Desktop into your Xerotier workspace once. Then keep working.

Your editor speaks your project

Memory, decisions, milestones, artifacts, code review, all reachable from inside your AI tool. No copy-paste. No context reload. One connection per workspace.

What this costs.

Pay per token on managed inference, or bring your own hardware and pay nothing per request. Only tiers with active infrastructure shown.

Free

Perfect for testing and evaluation

$0.00Forever
Max Model Size64 GB HardwareCPU Requests/min64 Tokens/min10,000
  • Shared Engine
  • Starter
  • 10K tokens/min
Start free

CPU AMD Optimized

ZenDNN-optimized AMD EPYC inference

$0.35/1M tokens
Max Model Size20 GB HardwareCPU Requests/min128 Tokens/min100,000
  • ZenDNN optimized
  • Perfect for Batch workloads
  • 100K tokens/min
Start free

GPU AMD Shared

Cost-effective GPU inference with AMD ROCm

$1.00/1M tokens
Max Model SizeUnlimited HardwareGPU Requests/min256 Tokens/min500,000
  • AMD ROCm
  • Shared Engine
  • 99.9% SLA
Start free

Self-Hosted

Bring your own accelerators

$20/month
Max Model Size512 GB HardwareYour Accelerators Requests/minUnlimited Tokens/minUnlimited
  • Your Accelerators
  • Our Infrastructure
  • Full data control
Start free

One curl. One project key.

Free tier, no card. Deploy a shared endpoint, or bring your own hardware whenever you want.

Start free