# Getting Started

Get up and running with the Xerotier.ai API. Learn how to authenticate, make your first request, and understand the basics of the platform.
## Introduction
Xerotier.ai provides a multi-tenant inference platform that allows you to access open-source AI models through a familiar OpenAI-compatible API. Simply change your base URL and API key to start using Xerotier.ai hosted models.
## Base URL
All API requests use path-based URLs, where `proj_ABC123` is your project ID and `{endpointSlug}` is the slug of the endpoint you created:

```
https://api.xerotier.ai/proj_ABC123/{endpointSlug}/v1
```
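As a sketch, the path components can be assembled with a small helper (the project ID and endpoint slug below are the placeholder values used throughout this page):

```python
def base_url(project_id: str, endpoint_slug: str) -> str:
    """Assemble the path-based base URL for an endpoint."""
    return f"https://api.xerotier.ai/{project_id}/{endpoint_slug}/v1"

print(base_url("proj_ABC123", "my-endpoint"))
# https://api.xerotier.ai/proj_ABC123/my-endpoint/v1
```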
## OpenAI SDK Compatibility

Xerotier.ai is fully compatible with the OpenAI Python and Node.js SDKs. Simply configure the base URL:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Hello!" }]
});
```
```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## Quickstart
Get started with Xerotier.ai in three steps:
### 1. Create an Account
Sign up at xerotier.ai/auth/register to create your free account. No credit card required. You can register with email or sign in with GitHub OAuth if configured.
### 2. Create an Endpoint
An endpoint is a named inference URL bound to a specific model and service tier. Each endpoint gets its own slug (e.g., `my-endpoint`) that forms part of the API URL. The service tier determines pricing, rate limits, and timeouts.
From your dashboard, browse available models, select a service tier, and click "Create Endpoint" to generate your unique completion URL.
### 3. Make Your First Request
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    max_tokens=100
)

print(response.choices[0].message.content)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [
    { role: "system", content: "You are a helpful assistant." },
    { role: "user", content: "What is the capital of France?" }
  ],
  max_tokens: 100
});

console.log(response.choices[0].message.content);
```
```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "max_tokens": 100
  }'
```
## Your First Streaming Request
Streaming delivers tokens as they are generated, reducing perceived latency.
Add `"stream": true` to your request to enable streaming:
```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ],
    "stream": true
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "What is the capital of France?" }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```
The response is delivered as Server-Sent Events (SSE). Each event is a `data:` line containing a JSON chunk with incremental content. The stream ends with `data: [DONE]`.
```
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The"},"index":0}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" capital"},"index":0}]}
data: [DONE]
```
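If you are not using an SDK, a minimal parser for these `data:` lines might look like the sketch below (illustrative only, not the official client logic):

```python
import json

def parse_sse_events(lines):
    """Yield decoded JSON chunks from 'data:' lines of an SSE stream.

    Ignores non-data lines and stops at the [DONE] sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip comments and blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload)

# Using the example chunks shown above:
raw = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"The"},"index":0}]}',
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" capital"},"index":0}]}',
    'data: [DONE]',
]
text = "".join(c["choices"][0]["delta"]["content"] for c in parse_sse_events(raw))
# text == "The capital"
```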
For detailed streaming documentation including error handling and client examples, see the Streaming API guide.
## Authentication
The Xerotier.ai API uses API keys for authentication. Include your API key in the Authorization header of all requests.
```
Authorization: Bearer xero_myproject_your_api_key
```
### Creating API Keys
Generate API keys from your API Keys page. You can create multiple keys with different scopes and revoke them at any time.
> **Security Note:** Keep your API keys secure. Do not share them in public repositories or client-side code. Use environment variables to store your keys. The full key value is only shown once at creation time.
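For example, rather than hard-coding the key, read it from an environment variable (`XEROTIER_API_KEY` below is an illustrative name, not one mandated by the platform):

```python
import os

def load_api_key(var: str = "XEROTIER_API_KEY") -> str:
    """Fetch the API key from the environment; fail fast if it is missing."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set {var} before running")
    return key

# Then construct the client without embedding a secret in source:
# client = OpenAI(base_url="...", api_key=load_api_key())
```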
### API Key Format
Xerotier.ai API keys follow this format:
```
xero_{project_slug}_{random_characters}
```
Each key is scoped to a specific project identified by the slug in the key prefix.
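As an illustration, the project slug can be recovered from a key with a small helper (this assumes the slug itself contains no underscores, which the format above does not guarantee):

```python
def project_slug(api_key: str) -> str:
    """Extract the project slug from a key of the form xero_{slug}_{random}."""
    prefix, slug, _ = api_key.split("_", 2)
    if prefix != "xero":
        raise ValueError("not a Xerotier.ai API key")
    return slug

print(project_slug("xero_myproject_abc123"))  # myproject
```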
### API Key Scopes
When creating an API key, you select one or more scopes that determine which APIs the key can access:
- `inference`: Access to the inference API: chat completions, embeddings, reranking, and model listing. Assigned by default.
- `management`: Programmatic access to project management operations: key CRUD, agent CRUD, and join-key management. Required by the `xeroctl` CLI.
Requests to APIs outside a key's granted scopes receive a 403 Forbidden response.
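The scope rule amounts to a simple membership check. The sketch below illustrates the gateway's behavior as described above; it is not the actual implementation:

```python
def authorize(granted_scopes: set[str], required_scope: str) -> int:
    """Return the HTTP status a scoped request would receive."""
    return 200 if required_scope in granted_scopes else 403

# A key with only the default 'inference' scope cannot call management APIs:
print(authorize({"inference"}, "management"))  # 403
print(authorize({"inference"}, "inference"))   # 200
```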
> **Important:** The full API key value is only returned once at creation time. Store it securely as it cannot be retrieved again.
## Next Steps
Now that you have made your first request, explore these features:
| Topic | Description |
|---|---|
| API Reference | Full parameter reference for chat completions, including tool calling, response formats, and SLO headers. |
| Streaming API | Deep-dive into SSE streaming, chunk format, error handling, and client examples. |
| Service Tiers | Understand pricing, rate limits, timeouts, and how to choose the right tier for your workload. |
| Prefix Caching | Learn how to structure prompts for automatic KV cache reuse and faster time-to-first-token. |
| Error Handling | Error codes, retry policies, and troubleshooting guidance. |
| Authentication | API key scopes, IP filtering, key rotation with grace period, rate limit headers, and security best practices. |
| Usage Guides | Streaming, rate limit handling, and error handling code examples in Python and Node.js. |
| xeroctl CLI | Upload models, manage resources, and test endpoints from your terminal. |
You can also pass `X-SLO-TTFT-Ms` and `X-SLO-TPOT-Ms` request headers to hint at your latency targets. The router uses these to prefer backends that can meet your performance requirements. See the API Reference for details.
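These hints are plain HTTP headers, so with the OpenAI Python SDK they can be attached to every request via the client's `default_headers` option. The helper below is a sketch; its name and the example target values are illustrative:

```python
def slo_headers(ttft_ms: int, tpot_ms: int) -> dict[str, str]:
    """Build the SLO hint headers for time-to-first-token and time-per-output-token."""
    return {
        "X-SLO-TTFT-Ms": str(ttft_ms),
        "X-SLO-TPOT-Ms": str(tpot_ms),
    }

# With the OpenAI Python SDK, attach the hints when constructing the client:
# client = OpenAI(base_url="...", api_key="...", default_headers=slo_headers(200, 30))
print(slo_headers(200, 30))
```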