// Getting Started

Introduction

Point any OpenAI-compatible client at a Xerotier base URL and send a chat completion against a shared endpoint. The Python, Node.js, and curl recipes on this page are byte-faithful to the upstream OpenAI shape; the router is a passthrough, not a translation layer.

Three things to know before the first request: the base URL, the SDK compatibility surface, and the model-discovery contract.

Base URL

All API requests use path-based URLs of the form /{externalId}/{endpointSlug}/v1:

URL
https://api.xerotier.ai/proj_ABC123/{endpointSlug}/v1
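The path shape above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name build_base_url and the API_ROOT constant are hypothetical, and the host is taken from the example URL on this page.

```python
from urllib.parse import quote

API_ROOT = "https://api.xerotier.ai"  # host copied from the example URL above


def build_base_url(external_id: str, endpoint_slug: str) -> str:
    """Return the /{externalId}/{endpointSlug}/v1 base URL for an endpoint."""
    if not external_id or not endpoint_slug:
        raise ValueError("external_id and endpoint_slug must be non-empty")
    # Percent-encode each segment so a stray "/" cannot change the path shape.
    return (
        f"{API_ROOT}/{quote(external_id, safe='')}"
        f"/{quote(endpoint_slug, safe='')}/v1"
    )


print(build_base_url("proj_ABC123", "my-endpoint"))
# prints https://api.xerotier.ai/proj_ABC123/my-endpoint/v1
```

The returned string is what an OpenAI-compatible client expects as its base URL.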

OpenAI SDK Compatibility

The OpenAI Python and Node.js SDKs work without modification. Set the base URL and the API key; nothing else changes.

Note on the model field: The model value shown in these examples (deepseek-r1-distill-llama-70b) is informational; the actual model served is determined by your endpoint's configuration. There is no platform-wide static catalog: every model is either user-uploaded or sourced from the shared catalog on your tenancy. Call GET /proj_ABC123/{endpointSlug}/v1/models to discover the model names available on an endpoint.
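Model discovery can be scripted with the standard library alone. The sketch below assumes the /models response follows the OpenAI list shape, {"data": [{"id": ...}]}, which is consistent with the passthrough behavior described on this page but is not confirmed here. The helper names (models_request, model_ids, list_models) are hypothetical.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET {base_url}/models request."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(body: str) -> list[str]:
    """Extract model names, assuming the OpenAI list shape described above."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, api_key: str) -> list[str]:
    """Send the request and return the model names the endpoint reports."""
    request = models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return model_ids(resp.read().decode("utf-8"))
```

Calling list_models("https://api.xerotier.ai/proj_ABC123/my-endpoint/v1", key) performs the network request; the other two functions are pure and can be checked offline.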

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Hello!" }]
});
curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Quickstart

Three steps. Free account, one endpoint, one request.

1. Create an Account

Register at xerotier.ai/auth/register with email, or sign in with GitHub OAuth if it is configured on this tenancy.

2. Create an Endpoint

An endpoint is a named inference URL bound to a specific model and service tier. Each endpoint gets its own slug (e.g., my-endpoint) that forms part of the API URL. Slugs are operator-chosen, lowercase, and immutable for the lifetime of the endpoint. The service tier determines pricing, rate limits, and timeouts.

From your dashboard, browse available models, select a service tier, and click "Create Endpoint" to generate your unique completion URL.

3. Make Your First Request

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100
  }'

Streaming

Add "stream": true and the router emits Server-Sent Events, flushing each model delta as soon as it is produced. The wire format is the OpenAI chat.completion.chunk shape, byte-for-byte.

curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": true
  }'
Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

The response is delivered as Server-Sent Events (SSE). Each event is a data: line containing a JSON chunk with incremental content. The stream ends with data: [DONE].

Response
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1747772400,"model":"deepseek-r1-distill-llama-70b","choices":[{"delta":{"content":"The"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1747772400,"model":"deepseek-r1-distill-llama-70b","choices":[{"delta":{"content":" capital"},"index":0}]}

data: [DONE]
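
For clients that read the raw stream instead of using an SDK, the event format described above can be decoded in a few lines. This is a minimal sketch under the assumption that each event arrives as a single data: line, as in the sample response; the function name iter_deltas is hypothetical, and the sample payloads are trimmed copies of the ones shown above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the content fragments from an iterable of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":"The"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":" capital"},"index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "The capital"
```

Whitespace inside the JSON string is preserved, so the leading space in " capital" survives the line-level strip.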

For detailed streaming documentation including error handling and client examples, see the Streaming API guide.

Authentication

Send a bearer-token API key in the Authorization header on every request. Keys are project-scoped and carry the project slug in their prefix.

HTTP Header
Authorization: Bearer xero_myproject_your_api_key

Creating API Keys

Generate API keys from your API Keys page. You can create multiple keys with different scopes and revoke them at any time.

Security Note: Keep your API keys secure. Do not share them in public repositories or client-side code. Use environment variables to store your keys. The full key value is only shown once at creation time.
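
One way to follow the environment-variable advice is sketched below. The variable name XEROTIER_API_KEY is an assumption chosen for illustration, not a name the platform defines, and load_api_key is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "XEROTIER_API_KEY"  # hypothetical name; use whatever your deployment sets


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    source = os.environ if env is None else env
    key = source.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return key
```

The result can be passed straight to the SDK, for example OpenAI(base_url=..., api_key=load_api_key()), so the key never appears in source control.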

API Key Format

Every key carries the project slug in its prefix:

Format
xero_{project_slug}_{random_token}

The placeholder xero_myproject_your_api_key used in the examples above is the same shape: myproject is the slug, your_api_key stands in for the random token returned at creation time.
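
The prefix makes it possible to tell which project a key belongs to without exposing the token, which is useful for log redaction. The sketch below assumes the project slug itself contains no underscore, so the second underscore ends the slug; that assumption is not stated by the format above. The names parse_api_key and redact are hypothetical.

```python
from typing import NamedTuple


class ParsedKey(NamedTuple):
    project_slug: str
    token: str


def parse_api_key(key: str) -> ParsedKey:
    """Split xero_{project_slug}_{random_token} into slug and token."""
    # Assumption: the slug has no underscore, so the second "_" ends it.
    prefix, sep1, rest = key.partition("_")
    slug, sep2, token = rest.partition("_")
    if prefix != "xero" or not sep1 or not sep2 or not slug or not token:
        raise ValueError("expected xero_{project_slug}_{random_token}")
    return ParsedKey(slug, token)


def redact(key: str) -> str:
    """Return a log-safe form that keeps the slug and hides the token."""
    return f"xero_{parse_api_key(key).project_slug}_***"


print(redact("xero_myproject_your_api_key"))  # prints xero_myproject_***
```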

API Key Scopes

When creating an API key, you select one or more scopes that determine which APIs the key can access:

  • inference - Access to the inference API: chat completions, embeddings, reranking, and model listing. Assigned by default.
  • management - Programmatic access to project management operations: key CRUD, agent CRUD, and join-key management. Required by the xeroctl CLI.
  • execution - Access to execution-surface MCP tools and XEM execution gates. Holders of execution implicitly satisfy per-tool research gates.
  • research - Access to research-surface MCP tools (for example, deep_think) that do not imply full execution rights but still require gated access.

Requests to APIs outside a key's granted scopes receive a 403 Forbidden response with an envelope of {"type":"authentication_error","code":"scope_insufficient"}. See Authentication for full scope semantics.
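
A client can distinguish a scope failure from other 403 responses by inspecting the envelope. This is a sketch only: the source shows the type and code fields but not where they sit in the response body, so the check below accepts them either at the top level or nested under an "error" key, and that nesting is an assumption. The function name is_scope_error is hypothetical.

```python
import json


def is_scope_error(status: int, body: str) -> bool:
    """Return True when a response is the scope_insufficient 403 described above."""
    if status != 403:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False  # not JSON, so not the documented envelope
    if not isinstance(payload, dict):
        return False
    # Assumption: fields may be top-level or wrapped in an "error" object.
    envelope = payload.get("error", payload)
    if not isinstance(envelope, dict):
        return False
    return (
        envelope.get("type") == "authentication_error"
        and envelope.get("code") == "scope_insufficient"
    )
```

Retrying such a request with the same key will not help; the fix is a key that carries the required scope.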

Important: The full API key value is only returned once at creation time. Store it securely as it cannot be retrieved again.

SLO Hints

Two optional headers let a client tell the router what latency budget the request is operating under. The router feeds the targets into composite scoring and prefers backends that can meet them.

Header | Meaning
X-SLO-TTFT-Ms | Target time-to-first-token, in milliseconds. Hint only; the router uses it as routing signal, not as a hard deadline.
X-SLO-TPOT-Ms | Target time-per-output-token, in milliseconds. Same semantics as TTFT: hint, not contract.
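
The two headers can be attached to any request. The following sketch builds them from optional latency targets; the helper name slo_headers is hypothetical, and sending the values as plain integer strings is an assumption, since this page gives only the unit.

```python
from typing import Dict, Optional


def slo_headers(
    ttft_ms: Optional[int] = None, tpot_ms: Optional[int] = None
) -> Dict[str, str]:
    """Build the optional SLO hint headers, omitting any target left unset."""
    headers: Dict[str, str] = {}
    for name, value in (("X-SLO-TTFT-Ms", ttft_ms), ("X-SLO-TPOT-Ms", tpot_ms)):
        if value is None:
            continue
        if value <= 0:
            raise ValueError(f"{name} must be a positive number of milliseconds")
        headers[name] = str(value)
    return headers


print(slo_headers(ttft_ms=500, tpot_ms=40))
# prints {'X-SLO-TTFT-Ms': '500', 'X-SLO-TPOT-Ms': '40'}
```

With the OpenAI Python SDK, the resulting dictionary can be passed per request through the extra_headers argument of client.chat.completions.create.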

Full request/response semantics live in the API Reference.

Next Steps

Where to go from a working first request:

Topic | Description
API Reference | Full parameter reference for chat completions, including tool calling, response formats, and SLO headers.
Streaming API | Deep-dive into SSE streaming, chunk format, error handling, and client examples.
Service Tiers | Understand pricing, rate limits, timeouts, and how to choose the right tier for your workload.
Prefix Caching | How to structure prompts for automatic KV cache reuse and faster time-to-first-token.
Error Handling | Error codes, retry policies, and troubleshooting guidance.
Authentication (deep dive) | IP filtering, key rotation with 24-hour grace, OAuth/SSO, two-factor, and the proxy auth JWT.
Usage Guides | Streaming, rate limit handling, and error handling code examples in Python and Node.js.
xeroctl CLI | Upload models, manage resources, and test endpoints from your terminal.