# Usage Guides
Learn how to use streaming responses, understand rate limits, and handle errors effectively.
## Streaming

The Xerotier.ai API supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial results as they are generated.

### Enabling Streaming

Set `stream: true` in your request to enable streaming:
```python
from openai import OpenAI

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL
client = OpenAI(
    base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```
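If you also need the complete text once the stream finishes, accumulate the deltas as they arrive. A minimal pattern, reusing the `stream` object created above:

```python
full_text = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        full_text += delta
        print(delta, end="", flush=True)
print()  # final newline once the stream ends
```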
### Stream Response Format

Each streamed chunk is a JSON object prefixed with `data:`, and the stream ends with a literal `data: [DONE]` sentinel:

```text
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}
data: [DONE]
```
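If you are not using an SDK, you can consume this format directly. Below is a minimal sketch using Python's `requests` library; it assumes the endpoint accepts the same `Authorization: Bearer` header that the SDK sends (the exact auth scheme for raw HTTP requests is an assumption here):

```python
import json

import requests

url = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions"
headers = {"Authorization": "Bearer xero_myproject_your_api_key"}  # assumed auth scheme
payload = {
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Write a poem about AI"}],
    "stream": True,
}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank separator lines between events
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)
```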
### Node.js Streaming Example

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
  apiKey: 'xero_myproject_your_api_key'
});

async function streamChat() {
  const stream = await client.chat.completions.create({
    model: 'deepseek-r1-distill-llama-70b',
    messages: [{ role: 'user', content: 'Write a poem about AI' }],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
}

streamChat();
```
## Rate Limits

Rate limits are applied per API key and vary by pricing tier.

### Rate Limit Headers

API responses include headers indicating your current rate limit status:

| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests per minute for your tier |
| `X-RateLimit-Remaining` | Remaining requests in the current window |
| `X-RateLimit-Reset` | Unix timestamp when the rate limit resets |
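You can inspect these headers from the Python SDK via its `with_raw_response` accessor (available in openai v1.x), which exposes the HTTP response alongside the parsed body. A sketch:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# with_raw_response returns the raw HTTP response; call .parse() for the usual object
raw = client.chat.completions.with_raw_response.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print("Limit:    ", raw.headers.get("X-RateLimit-Limit"))
print("Remaining:", raw.headers.get("X-RateLimit-Remaining"))
print("Reset:    ", raw.headers.get("X-RateLimit-Reset"))

response = raw.parse()  # the usual ChatCompletion object
```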
### Tier Limits
| Tier | Requests/min | Tokens/min |
|---|---|---|
| Free | 20 | 10,000 |
| CPU Standard | 60 | 50,000 |
| CPU AMD Optimized | 80 | 100,000 |
| CPU Intel Optimized | 80 | 100,000 |
| GPU NVIDIA Shared | 100 | 250,000 |
| GPU NVIDIA Dedicated | 500 | 500,000 |
| GPU AMD Shared | 100 | 250,000 |
| GPU AMD Dedicated | 500 | 500,000 |
| GPU Intel Shared | 100 | 250,000 |
| GPU Intel Dedicated | 500 | 500,000 |
| Self-Hosted | Unlimited | Unlimited |
### Handling Rate Limits

When you exceed your rate limit, the API returns a `429 Too Many Requests` response. Implement retries with exponential backoff, honoring the `Retry-After` header when the server provides it:
```python
import time

from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek-r1-distill-llama-70b",
                messages=messages
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Honor Retry-After if the server sent it; otherwise back off exponentially
            retry_after = int(e.response.headers.get("Retry-After", 2 ** attempt))
            time.sleep(retry_after)
```
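The helper can then stand in for a direct client call:

```python
response = make_request_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)
```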
## Error Handling

The API uses standard HTTP status codes and returns detailed error messages in JSON format.

### Error Response Format

```json
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
```
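When you call through the Python SDK, this JSON is surfaced on the raised exception rather than returned as a response body. A sketch assuming the openai v1 SDK, with a deliberately invalid key:

```python
from openai import OpenAI, APIStatusError

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_bad_key"  # deliberately invalid
)

try:
    client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except APIStatusError as e:
    print(e.status_code)  # e.g. 401
    print(e.body)         # the parsed "error" object shown above
```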
### Common Error Codes

| HTTP Status | Type | Description |
|---|---|---|
| 400 | `invalid_request_error` | The request was malformed or missing required parameters |
| 401 | `authentication_error` | Invalid or missing API key |
| 403 | `permission_error` | The API key does not have permission for this operation |
| 404 | `not_found_error` | The requested resource does not exist |
| 429 | `rate_limit_error` | Too many requests; retry after the time specified in the `Retry-After` header |
| 500 | `server_error` | Internal server error; contact support if this persists |
| 503 | `service_unavailable` | The service is temporarily overloaded; retry with exponential backoff |
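Of these, only 429, 500, and 503 are generally worth retrying; the other 4xx statuses indicate a problem with the request itself. Below is a sketch of a wrapper that retries only those statuses with jittered exponential backoff (`create_with_backoff` and the retry policy are illustrative choices, not part of the API):

```python
import random
import time

from openai import APIStatusError, OpenAI

RETRYABLE_STATUSES = {429, 500, 503}

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

def create_with_backoff(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek-r1-distill-llama-70b",
                messages=messages
            )
        except APIStatusError as e:
            if e.status_code not in RETRYABLE_STATUSES or attempt == max_retries - 1:
                raise  # not retryable, or out of attempts
            # Full-jitter backoff: sleep up to 2^attempt seconds, capped at 30
            time.sleep(random.uniform(0, min(30, 2 ** attempt)))
```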
### Handling Errors

```python
from openai import OpenAI, APIError, RateLimitError

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL
client = OpenAI(
    base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
    api_key="xero_myproject_your_api_key"
)

try:
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}")
except APIError as e:
    print(f"API error: {e.message}")
```
### Node.js Error Handling

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
  apiKey: 'xero_myproject_your_api_key'
});

async function makeRequest() {
  try {
    const response = await client.chat.completions.create({
      model: 'deepseek-r1-distill-llama-70b',
      messages: [{ role: 'user', content: 'Hello!' }]
    });
    return response;
  } catch (error) {
    if (error.status === 429) {
      const retryAfter = error.headers?.get('retry-after') || 60;
      console.log(`Rate limited. Retry after ${retryAfter}s`);
    } else if (error.status === 401) {
      console.log('Invalid API key');
    } else {
      console.log(`API error: ${error.message}`);
    }
    throw error;
  }
}
```