Introduction
Point any OpenAI-compatible client at a Xerotier base URL and send a chat completion against a shared endpoint. The Python, Node.js, and curl recipes on this page are byte-faithful to the upstream OpenAI shape; the router is a passthrough, not a translation layer.
Three things to know before the first request: the base URL, the SDK compatibility surface, and the model-discovery contract.
Base URL
All API requests use path-based URLs of the form /{externalId}/{endpointSlug}/v1:
https://api.xerotier.ai/proj_ABC123/{endpointSlug}/v1
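The path layout above can be assembled programmatically. The following is a minimal sketch, not an official helper: the function name `build_base_url` and the `API_ROOT` constant are illustrative assumptions, and only the `/{externalId}/{endpointSlug}/v1` shape comes from this page.

```python
API_ROOT = "https://api.xerotier.ai"  # host taken from the example URL on this page


def build_base_url(external_id: str, endpoint_slug: str, api_root: str = API_ROOT) -> str:
    """Return the /{externalId}/{endpointSlug}/v1 base URL for one endpoint.

    build_base_url is a hypothetical helper name, not part of any SDK.
    """
    if not external_id or not endpoint_slug:
        raise ValueError("external_id and endpoint_slug are both required")
    # Strip a trailing slash so the result never contains a double slash.
    return f"{api_root.rstrip('/')}/{external_id}/{endpoint_slug}/v1"


print(build_base_url("proj_ABC123", "my-endpoint"))
```

The resulting string is what an OpenAI-compatible client expects as its base URL.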
OpenAI SDK Compatibility
The OpenAI Python and Node.js SDKs work without modification. Set the base URL and the API key; nothing else changes.
Note on the model field: The model value shown in these examples (deepseek-r1-distill-llama-70b) is informational; the actual model served is determined by your endpoint's configuration. There is no platform-wide static catalog: every model is either user-uploaded or sourced from the shared catalog on your tenancy. Call GET /proj_ABC123/{endpointSlug}/v1/models to discover the model names available on an endpoint.
# Python (openai SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)
response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
// Node.js (openai SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});
const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Hello!" }]
});
# curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
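The model-discovery call mentioned in the note can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the helper names are hypothetical, and the response is assumed to follow the OpenAI list shape (`{"data": [{"id": ...}]}`), which this page does not spell out.

```python
import json
import urllib.request


def models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET {base_url}/models request."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def model_ids(payload: dict) -> list[str]:
    """Extract model names, assuming an OpenAI-style {"data": [{"id": ...}]} list."""
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, api_key: str) -> list[str]:
    """Send the request and return the model names the endpoint reports."""
    with urllib.request.urlopen(models_request(base_url, api_key), timeout=30) as resp:
        return model_ids(json.load(resp))
```

Calling `list_models("https://api.xerotier.ai/proj_ABC123/my-endpoint/v1", api_key)` performs the network request; the other two helpers are pure and can be exercised offline.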
Quickstart
Three steps. Free account, one endpoint, one request.
1. Create an Account
Register at xerotier.ai/auth/register with email, or sign in with GitHub OAuth if it is configured on this tenancy.
2. Create an Endpoint
An endpoint is a named inference URL bound to a specific model
and service tier. Each endpoint gets its own
slug (e.g., my-endpoint) that forms part of the API URL. Slugs are
operator-chosen, lowercase, and immutable for the lifetime of the endpoint. The
service tier determines pricing, rate limits, and timeouts.
From your dashboard, browse available models, select a service tier, and click "Create Endpoint" to generate your unique completion URL.
3. Make Your First Request
# Python (openai SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)
response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)
print(response.choices[0].message.content)
// Node.js (openai SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});
const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
  ],
  max_tokens: 100
});
console.log(response.choices[0].message.content);
# curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100
  }'
Streaming
Add "stream": true and the router emits Server-Sent Events,
flushing each model delta as soon as it is produced. The wire format is
the OpenAI chat.completion.chunk shape, byte-for-byte.
# curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": true
  }'
# Python (openai SDK)
from openai import OpenAI
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)
stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True
)
# Print each content delta as it arrives; some chunks carry no content.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
// Node.js (openai SDK)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});
const stream = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  stream: true
});
// Write each content delta as it arrives; some chunks carry no content.
for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
The response is delivered as Server-Sent Events (SSE). Each event is a
data: line containing a JSON chunk with incremental content.
The stream ends with data: [DONE].
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1747772400,"model":"deepseek-r1-distill-llama-70b","choices":[{"delta":{"content":"The"},"index":0}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1747772400,"model":"deepseek-r1-distill-llama-70b","choices":[{"delta":{"content":" capital"},"index":0}]}
data: [DONE]
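For clients that read the raw stream instead of using an SDK, the events above can be decoded along these lines. This is a minimal sketch: the function names are hypothetical, and it handles only `data:` lines and the `[DONE]` sentinel described on this page, not the full SSE specification.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the delta.content pieces carried by the chunks."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


# Abbreviated copies of the two sample events shown above.
sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"The"},"index":0}]}',
    "",
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" capital"},"index":0}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints "The capital"
```

In practice `lines` would be the decoded lines of the HTTP response body, read incrementally.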
For detailed streaming documentation including error handling and client examples, see the Streaming API guide.
Authentication
Send a bearer-token API key in the Authorization header on every request. Keys are project-scoped and carry the project slug in their prefix.
Authorization: Bearer xero_myproject_your_api_key
Creating API Keys
Generate API keys from your API Keys page. You can create multiple keys with different scopes and revoke them at any time.
Security Note: Keep your API keys secure. Do not share them in public repositories or client-side code. Use environment variables to store your keys. The full key value is only shown once at creation time.
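One way to follow the environment-variable advice is sketched below. The variable name `XEROTIER_API_KEY` and the helper `load_api_key` are assumptions made for illustration; use whatever name your deployment already defines.

```python
import os

# Hypothetical variable name, chosen for this example only.
ENV_VAR = "XEROTIER_API_KEY"


def load_api_key(env=None) -> str:
    """Read the API key from the environment and fail loudly if it is missing."""
    env = os.environ if env is None else env
    key = env.get(ENV_VAR, "").strip()
    if not key:
        raise RuntimeError(f"{ENV_VAR} is not set; export it before running")
    return key
```

The returned value can be passed as `api_key` to the client constructors shown earlier, so the key never appears in source control.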
API Key Format
Every key carries the project slug in its prefix:
xero_{project_slug}_{random_token}
The placeholder xero_myproject_your_api_key used in the examples above is the same shape: myproject is the slug, your_api_key stands in for the random token returned at creation time.
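Because the slug is embedded in the key, a client can recover it for logging or sanity checks without exposing the token. The sketch below assumes that the project slug itself contains no underscore, which this page does not state; the names `parse_api_key` and `redact` are hypothetical.

```python
from typing import NamedTuple


class ParsedKey(NamedTuple):
    project_slug: str
    token: str


def parse_api_key(key: str) -> ParsedKey:
    """Split xero_{project_slug}_{random_token} into its parts.

    Assumption: the slug has no underscore, so the second underscore
    marks the start of the token.
    """
    prefix, _, rest = key.partition("_")
    slug, _, token = rest.partition("_")
    if prefix != "xero" or not slug or not token:
        raise ValueError("expected xero_{project_slug}_{random_token}")
    return ParsedKey(slug, token)


def redact(key: str) -> str:
    """Return a log-safe form that keeps the slug and hides the token."""
    return f"xero_{parse_api_key(key).project_slug}_***"


print(redact("xero_myproject_your_api_key"))  # prints "xero_myproject_***"
```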
API Key Scopes
When creating an API key, you select one or more scopes that determine which APIs the key can access:
- inference: Access to the inference API: chat completions, embeddings, reranking, and model listing. Assigned by default.
- management: Programmatic access to project management operations: key CRUD, agent CRUD, and join-key management. Required by the xeroctl CLI.
- execution: Access to execution-surface MCP tools and XEM execution gates. Holders of execution implicitly satisfy per-tool research gates.
- research: Access to research-surface MCP tools (for example, deep_think) that do not imply full execution rights but still require gated access.
Requests to APIs outside a key's granted scopes receive a 403 Forbidden response with an envelope of {"type":"authentication_error","code":"scope_insufficient"}. See Authentication for full scope semantics.
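A client can distinguish this case from other failures by checking the status code together with the envelope fields. The sketch below is illustrative: `is_scope_error` is a hypothetical name, and because this page does not say whether the envelope is the whole body or nested under an `error` key, the helper accepts both.

```python
def is_scope_error(status: int, body: dict) -> bool:
    """Return True when a response is the 403 scope_insufficient error.

    Assumption: the envelope may sit at the top level of the JSON body
    or under an "error" key; both layouts are checked.
    """
    if status != 403:
        return False
    envelope = body.get("error", body)
    if not isinstance(envelope, dict):
        return False
    return (
        envelope.get("type") == "authentication_error"
        and envelope.get("code") == "scope_insufficient"
    )
```

When it returns True, retrying will not help; the request needs a key that was created with the required scope.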
Important: The full API key value is only returned once at creation time. Store it securely as it cannot be retrieved again.
SLO Hints
Two optional headers let a client tell the router what latency budget the request is operating under. The router feeds the targets into composite scoring and prefers backends that can meet them.
| Header | Meaning |
|---|---|
| X-SLO-TTFT-Ms | Target time-to-first-token, in milliseconds. Hint only; the router uses it as a routing signal, not as a hard deadline. |
| X-SLO-TPOT-Ms | Target time-per-output-token, in milliseconds. Same semantics as TTFT: hint, not contract. |
Full request/response semantics live in the API Reference.
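With the OpenAI Python SDK, per-request headers can be attached through the `extra_headers` argument. The sketch below assumes the header values are plain integer strings, which this page does not specify; `slo_headers` and `chat_with_slo` are hypothetical helper names, and the 500 ms and 50 ms targets in the usage note are arbitrary illustrations.

```python
from typing import Optional


def slo_headers(ttft_ms: Optional[int] = None, tpot_ms: Optional[int] = None) -> dict[str, str]:
    """Build the optional SLO hint headers, omitting any target left unset."""
    headers: dict[str, str] = {}
    if ttft_ms is not None:
        headers["X-SLO-TTFT-Ms"] = str(int(ttft_ms))  # assumed format: integer milliseconds
    if tpot_ms is not None:
        headers["X-SLO-TPOT-Ms"] = str(int(tpot_ms))
    return headers


def chat_with_slo(client, model: str, messages: list, **slo):
    """Send a chat completion with SLO hints attached.

    `client` is expected to be an openai.OpenAI instance configured with
    the Xerotier base URL, as in the earlier examples.
    """
    return client.chat.completions.create(
        model=model,
        messages=messages,
        extra_headers=slo_headers(**slo),
    )
```

For example, `chat_with_slo(client, "deepseek-r1-distill-llama-70b", messages, ttft_ms=500, tpot_ms=50)` asks the router to prefer backends that can meet those targets; since the headers are hints, the response may still be slower.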
Next Steps
Where to go from a working first request:
| Topic | Description |
|---|---|
| API Reference | Full parameter reference for chat completions, including tool calling, response formats, and SLO headers. |
| Streaming API | Deep-dive into SSE streaming, chunk format, error handling, and client examples. |
| Service Tiers | Understand pricing, rate limits, timeouts, and how to choose the right tier for your workload. |
| Prefix Caching | How to structure prompts for automatic KV cache reuse and faster time-to-first-token. |
| Error Handling | Error codes, retry policies, and troubleshooting guidance. |
| Authentication (deep dive) | IP filtering, key rotation with 24-hour grace, OAuth/SSO, two-factor, and the proxy auth JWT. |
| Usage Guides | Streaming, rate limit handling, and error handling code examples in Python and Node.js. |
| xeroctl CLI | Upload models, manage resources, and test endpoints from your terminal. |