// Documentation

Documentation

An OpenAI-compatible inference surface, fully documented. Existing OpenAI SDKs and HTTP clients target it after a base-URL and key swap.

// 12 sections // 54 leaf pages // last verified 2026-05-27

// Overview

Overview

Xerotier.ai is a multi-tenant inference platform that serves open-source AI models behind an OpenAI-compatible API. Change your base URL and API key to start issuing requests against Xerotier-hosted endpoints or against your own XIM nodes.

Key Features

  • Drop-in OpenAI SDKs - Point the official Python or Node.js client at the Xerotier base URL; no fork, no shim.
  • Xerotier Inference Microservice (XIM) - Self-host the same workers Xerotier runs, behind your own perimeter, registered into the router.
  • Custom Model Upload - Push a HuggingFace directory or archive; the router preflights GPU fit and serves it under your project.
  • Streaming Support - SSE token stream with vendor-prefixed `x_*` events for reasoning, tool calls, and metadata.
  • Native KV Cache Offload - vLLM CPU KV offload plus prefix caching, on by default, to cut time-to-first-token on repeated prompts.

Ready to issue a request? Head to the Quickstart for a working curl call, or read Error Handling first if you maintain a strict SDK integration.

// Getting Started

Getting Started

Account to first response in five minutes. Authenticate, point your client at a Xerotier base URL, send a request.

// API Reference

API Reference

Every documented endpoint, request shape, and SSE event the router accepts and emits. Identical to OpenAI where the spec is identical; vendor-prefixed where it diverges.

Chat Completions

POST /v1/chat/completions with full tool-call, streaming, and reasoning-content support.

Responses API

POST /v1/responses for higher-level agentic flows with server-managed turns and built-in tools.

Streaming API

SSE frame format, the response.* and x_* event family, and SDK-side handling notes.

Tool Calling

Tool schemas, parallel calls, and how function results round-trip back through the model.

MCP Integration

Attach external Model Context Protocol servers; their tools surface as native tool calls.

Server-Side Tools

Web search, code interpreter, and file-search hosted by the router, no client wiring.

Web Search

Built-in web fetch with citations; usage is metered separately and surfaced in x_chat.metadata.

Embeddings

POST /v1/embeddings with batched input, base64 output, and per-model dimension control.

Reranking & Scoring

Score query-document pairs or rerank a candidate list against a query, batch-friendly.

Conversations

Server-stored multi-turn threads; resume by id with full reasoning and tool-call history.

Files API

Upload, list, and reference files by id from chat completions, responses, and batch.

Batch API

Submit a JSONL of requests for asynchronous processing at lower per-token cost.

Uploads API

Resumable multipart upload for files larger than the synchronous POST cap.

Stored Completions

List, retrieve, and export completions retained against a project for audit and replay.

Error Handling

OpenAI-shaped error envelopes, retry-after semantics, and which 5xx codes are safe to retry.

Features

Capabilities that extend the OpenAI surface: memory, document workspace, prefix caching, service tiers, SLO targets, and a workspace graph the router uses to route.

Model Management

Upload, share, version, and discover models.

Guides

Field-tested patterns, integrations, and the advanced flags the router exposes.

Platform

Operator surfaces around the inference surface: teams, auth, storage, billing, status, and the webhooks that tie them to your own systems.

Infrastructure

Choose between Xerotier-hosted inference and self-hosted Xerotier Inference Microservice (XIM) nodes you run on your own GPUs.

XIM Guides

End-to-end walkthroughs for operating self-hosted XIM nodes.

Execution Management (XEM)

Long-running agent workflows that the router schedules, approves, retries, and reports on. Same authentication, same observability, same envelopes as the inference surface.

XEM Guides

Task-oriented walkthroughs for building on XEM.

// Tools

Tools

Same surface the dashboard speaks, scripted. The CLI carries chat, responses, models, embeddings, rerank, batches, files, conversations, webhooks, keys, agents, slos, uploads, config, platform ops, exec, templates, approvals, and learnings.

// Most used

Most-used subcommands

All 19 subcommand pages are listed in the sidebar under Tools.