SDK & Integration Guides

Use any OpenAI-compatible SDK to connect to Xerotier.ai. Change the base URL and API key; everything else works the same.

Overview

The Xerotier API is fully compatible with the OpenAI Chat Completions API. Any SDK or library that supports the OpenAI API can connect to Xerotier by changing the base URL and API key. Xerotier also provides extensions (SLO headers, auto-clamping, cached token reporting) that are available via standard HTTP headers and response fields.

For practical how-to recipes covering streaming, rate limiting, error handling patterns, and log probabilities, see Usage Guides.

OpenAI Compatibility Matrix

The following table shows which OpenAI Chat Completions parameters Xerotier.ai supports.

Supported Request Parameters

| Parameter | Status | Notes |
| --- | --- | --- |
| model | Supported | Model name as configured on your endpoint. |
| messages | Supported | System, user, assistant, tool, and developer message roles. |
| max_tokens | Supported | Subject to auto-clamping if the value exceeds model capacity. |
| max_completion_tokens | Supported | Preferred over max_tokens. Same auto-clamping behavior. |
| temperature | Supported | Sampling temperature (0.0 to 2.0). |
| top_p | Supported | Nucleus sampling parameter. |
| stream | Supported | Server-Sent Events streaming. See Streaming API. |
| stream_options | Supported | Set include_usage: true to receive token usage in the final stream chunk. |
| stop | Supported | Stop sequences (up to 4). |
| tools | Supported | Function/tool calling. See Tool Calling. |
| tool_choice | Supported | auto, none, required, or a specific function. |
| parallel_tool_calls | Supported | Enable parallel tool calls in a single response. |
| logprobs | Supported | Return per-token log probabilities. See API Reference. |
| top_logprobs | Supported | Number of top alternative tokens per position (0-20). Requires logprobs: true. |
| reasoning_effort | Supported | Controls reasoning depth for reasoning models: "low", "medium", or "high". |
| prediction | Supported | Predicted output for speculative decoding. See Predicted Outputs. |
| service_tier | Accepted | Accepted for compatibility but ignored. Tier is determined by endpoint configuration. |
| seed | Supported | Deterministic sampling. Use with system_fingerprint for reproducibility. |
| n | Supported | Number of completions. Only n=1 is supported in streaming mode. |
| frequency_penalty | Supported | Penalty for repeated tokens (-2.0 to 2.0). |
| presence_penalty | Supported | Penalty for tokens already in context (-2.0 to 2.0). |
| logit_bias | Supported | Map of token IDs to bias values (-100 to 100). |
| response_format | Supported | text, json_object, or json_schema (structured output). |
| metadata | Supported | Up to 16 key-value pairs for request metadata. |
| user | Supported | End-user identifier for abuse monitoring. |
| store | Supported | Store the completion for later retrieval via GET /v1/chat/completions/{id}. |
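Several of the parameters above can be combined in a single request. A minimal sketch of building such a payload (all values are illustrative, not recommendations):

```python
# Build a Chat Completions request body combining several supported
# parameters from the table above.
payload = {
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Return a JSON greeting."}],
    "max_completion_tokens": 256,                 # preferred over max_tokens
    "temperature": 0.7,
    "seed": 42,                                   # pair with system_fingerprint
    "response_format": {"type": "json_object"},   # structured output
    "logprobs": True,
    "top_logprobs": 5,                            # requires logprobs: true
}
print(sorted(payload.keys()))
```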

Supported Response Fields

All OpenAI Chat Completions response fields are supported, including:

| Field | Status | Description |
| --- | --- | --- |
| service_tier | Supported | Shows which service tier processed the request. Present in all responses and streaming chunks. |
| system_fingerprint | Supported | Backend configuration identifier for reproducibility tracking. Use with seed for deterministic outputs. |
| message.refusal | Supported | Refusal text when the model declines a request. Streamed via delta.refusal in SSE mode. |
| message.annotations | Supported | Message annotations (URL citations). Defaults to empty array. Reserved for future web search features. |
| logprobs | Supported | Per-token log probabilities with content and refusal arrays. Includes top_logprobs. |
| usage.prompt_tokens_details | Supported | Includes cached_tokens showing how many input tokens were served from prefix cache. |
| usage.completion_tokens_details | Supported | Includes reasoning_tokens, accepted_prediction_tokens, and rejected_prediction_tokens. |
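As an illustration, the usage detail fields above can be read directly from a response body. The JSON below is an abridged sample, not real API output:

```python
import json

# Abridged sample response showing the usage detail fields.
sample = json.loads("""
{
  "usage": {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "prompt_tokens_details": {"cached_tokens": 900},
    "completion_tokens_details": {"reasoning_tokens": 120}
  }
}
""")

usage = sample["usage"]
cached = usage["prompt_tokens_details"]["cached_tokens"]
hit_rate = cached / usage["prompt_tokens"]  # fraction of prompt served from prefix cache
print(f"cache hit rate: {hit_rate:.0%}")    # 75%
```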

Behavioral Differences

  • max_tokens auto-clamping: Instead of returning an error when max_tokens exceeds model capacity, Xerotier automatically clamps the value down. The X-Xerotier-Max-Tokens-Clamped response header indicates the original value was reduced.
  • Service tier: Determined by your endpoint configuration, not by a request parameter. The service_tier request parameter is accepted for compatibility but has no effect on routing.
  • stream_options.include_usage: Defaults to false per the OpenAI specification. When false or omitted, the final streaming chunk does not include a usage object. Token counts are always tracked internally for billing regardless.
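When include_usage is enabled, the final streaming chunk before [DONE] carries an empty choices array plus the usage object. A minimal sketch of handling that chunk, using sample data in place of a live stream:

```python
import json

# Sample final SSE data chunk when stream_options.include_usage is true
# (abridged; a real chunk also carries id, model, and other fields).
chunk = json.loads(
    '{"choices": [], "usage": {"prompt_tokens": 10, '
    '"completion_tokens": 25, "total_tokens": 35}}'
)

# The usage-bearing chunk is recognizable by its empty choices array.
if not chunk["choices"] and chunk.get("usage"):
    print("completion tokens:", chunk["usage"]["completion_tokens"])
```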

Xerotier Extensions

These features are available via standard HTTP headers and work with any SDK that allows custom headers.

Request Headers

| Header | Description |
| --- | --- |
| X-SLO-TTFT-Ms | Target time-to-first-token in milliseconds. Influences routing to meet your latency target. |
| X-SLO-TPOT-Ms | Target time-per-output-token in milliseconds. Influences routing to meet your throughput target. |

Response Headers

| Header | Description |
| --- | --- |
| X-Request-ID | Unique request identifier for debugging and log correlation. |
| X-Xerotier-Worker-ID | Identifies which backend worker served the request. |
| X-Xerotier-Max-Tokens-Clamped | Present when max_tokens was automatically reduced. Value is the original requested amount. |
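For illustration, a short sketch of inspecting these headers after a request completes. The `headers` dict here stands in for real HTTP response headers, with sample values:

```python
# Sample response headers from a completed request (illustrative values).
headers = {
    "X-Request-ID": "req_123abc",
    "X-Xerotier-Worker-ID": "worker-7",
    "X-Xerotier-Max-Tokens-Clamped": "32768",  # original requested value
}

# The clamping header is only present when max_tokens was reduced.
clamped = headers.get("X-Xerotier-Max-Tokens-Clamped")
if clamped is not None:
    print(f"max_tokens was clamped; originally requested {clamped}")
print("request id:", headers["X-Request-ID"])
```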

Python

Use the official openai Python package:

Installation

```bash
pip install openai
```

Basic Request

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

# Response metadata
print(f"Service tier: {response.service_tier}")
print(f"System fingerprint: {response.system_fingerprint}")

# Token usage details
if response.usage.prompt_tokens_details:
    print(f"Cached tokens: {response.usage.prompt_tokens_details.cached_tokens}")
if response.usage.completion_tokens_details:
    print(f"Reasoning tokens: {response.usage.completion_tokens_details.reasoning_tokens}")

# Check for refusal
if response.choices[0].message.refusal:
    print(f"Refusal: {response.choices[0].message.refusal}")
```

SLO Headers

```python
response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_headers={
        "X-SLO-TTFT-Ms": "500",
        "X-SLO-TPOT-Ms": "50",
    },
)
```

Node.js

Use the official openai npm package:

Installation

```bash
npm install openai
```

Basic Request

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
  apiKey: 'xero_myproject_your_api_key',
});

const response = await client.chat.completions.create({
  model: 'deepseek-r1-distill-llama-70b',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);
console.log(`Service tier: ${response.service_tier}`);
console.log(`System fingerprint: ${response.system_fingerprint}`);
console.log(`Cached tokens: ${response.usage?.prompt_tokens_details?.cached_tokens}`);
console.log(`Reasoning tokens: ${response.usage?.completion_tokens_details?.reasoning_tokens}`);
```

SLO Headers

```javascript
const response = await client.chat.completions.create(
  {
    model: 'deepseek-r1-distill-llama-70b',
    messages: [{ role: 'user', content: 'Hello!' }],
  },
  {
    headers: {
      'X-SLO-TTFT-Ms': '500',
      'X-SLO-TPOT-Ms': '50',
    },
  }
);
```

Go

Use net/http directly or an OpenAI Go client library.

Using net/http

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

func main() {
	body := map[string]interface{}{
		"model": "deepseek-r1-distill-llama-70b",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello!"},
		},
	}
	jsonBody, _ := json.Marshal(body)

	req, _ := http.NewRequest("POST",
		"https://api.xerotier.ai/PROJECT_ID/my-endpoint/v1/chat/completions",
		bytes.NewReader(jsonBody))
	req.Header.Set("Authorization", "Bearer xero_myproject_your_api_key")
	req.Header.Set("Content-Type", "application/json")

	// Optional: SLO headers
	req.Header.Set("X-SLO-TTFT-Ms", "500")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	data, _ := io.ReadAll(resp.Body)
	fmt.Println(string(data))

	// Check response headers
	fmt.Println("Request ID:", resp.Header.Get("X-Request-ID"))
	fmt.Println("Worker ID:", resp.Header.Get("X-Xerotier-Worker-ID"))
}
```

curl

Basic Request

```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

With SLO Headers

```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -H "X-SLO-TTFT-Ms: 500" \
  -H "X-SLO-TPOT-Ms: 50" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Streaming

```bash
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Write a poem about AI"}],
    "stream": true
  }'
```

Verbose Output (View Response Headers)

```bash
curl -v https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' 2>&1 | grep -i "x-request-id\|x-xerotier\|x-ratelimit"
```

LangChain

LangChain supports OpenAI-compatible endpoints via the ChatOpenAI class.

Installation

```bash
pip install langchain-openai
```

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
    model="deepseek-r1-distill-llama-70b",
)

response = llm.invoke("What is the capital of France?")
print(response.content)
```

With Streaming

```python
for chunk in llm.stream("Write a poem about AI"):
    print(chunk.content, end="")
```

LlamaIndex

LlamaIndex supports OpenAI-compatible endpoints through its OpenAI LLM class.

Installation

```bash
pip install llama-index-llms-openai
```

```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
    model="deepseek-r1-distill-llama-70b",
)

response = llm.complete("What is the capital of France?")
print(response.text)
```

Migrating from OpenAI

Switching from OpenAI to Xerotier.ai requires only two changes:

  1. Change the base URL from https://api.openai.com/v1 to your Xerotier endpoint URL.
  2. Change the API key from sk-... to xero_projectslug_....

Before (OpenAI)

```python
from openai import OpenAI

client = OpenAI(api_key="sk-your-openai-key")

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

After (Xerotier.ai)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

URL Format

| Format | URL Pattern |
| --- | --- |
| Path-based | https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG/v1 |
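As a convenience, the pattern above can be assembled with a small helper (the function name is illustrative, not part of any SDK):

```python
# Assemble a path-based Xerotier endpoint URL from its two components.
def endpoint_url(project_id: str, endpoint_slug: str) -> str:
    return f"https://api.xerotier.ai/{project_id}/{endpoint_slug}/v1"

print(endpoint_url("proj_ABC123", "my-endpoint"))
# https://api.xerotier.ai/proj_ABC123/my-endpoint/v1
```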

API Key Format

Xerotier API keys use the format xero_{project_slug}_{random_characters}. Create keys in the project dashboard or via the POST /v1/keys API endpoint.
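For a quick client-side sanity check, the documented format can be matched with a regular expression. The exact character classes below are an assumption, not part of the key specification; adjust them if your keys differ:

```python
import re

# Rough check for the documented key shape xero_{project_slug}_{random_characters}.
# Character classes are assumed, not specified.
KEY_PATTERN = re.compile(r"^xero_[a-z0-9-]+_[A-Za-z0-9]+$")

def looks_like_xerotier_key(key: str) -> bool:
    return KEY_PATTERN.match(key) is not None

print(looks_like_xerotier_key("xero_myproject_abc123XYZ"))  # True
print(looks_like_xerotier_key("sk-your-openai-key"))        # False
```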

Xerotier Bonus Features

After migrating, you can take advantage of features not available in the OpenAI API:

  • SLO headers: Set per-request latency targets that influence routing decisions.
  • max_tokens auto-clamping: Requests with excessive max_tokens are automatically adjusted instead of rejected.
  • Prefix cache metrics: See how many tokens were served from cache in each response.
  • Worker identification: Know which backend worker served each request via response headers.