Usage Guides

Learn how to use streaming responses, understand rate limits, and handle errors effectively.

Streaming

The Xerotier.ai API supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial results as they are generated.

Enabling Streaming

Set stream: true in your request to enable streaming:

Python
from openai import OpenAI

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL
client = OpenAI(
    base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Stream Response Format

Each streamed chunk is a JSON object on its own line, prefixed with data:. The stream terminates with a data: [DONE] sentinel:

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
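
If you are consuming the stream without an SDK, you can parse the SSE lines yourself. The sketch below uses the httpx library and assumes standard Bearer authentication with the placeholder URL and key from the examples above; any streaming-capable HTTP client works the same way.

Python
import json
import httpx

# Placeholder endpoint and key from the examples above
url = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions"
headers = {"Authorization": "Bearer xero_myproject_your_api_key"}
payload = {
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Write a poem about AI"}],
    "stream": True,
}

with httpx.stream("POST", url, headers=headers, json=payload, timeout=None) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # SSE payload lines start with "data: "; blank keep-alive lines are skipped
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):
            print(delta["content"], end="")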

Node.js Streaming Example

Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
  apiKey: 'xero_myproject_your_api_key'
});

async function streamChat() {
  const stream = await client.chat.completions.create({
    model: 'deepseek-r1-distill-llama-70b',
    messages: [{ role: 'user', content: 'Write a poem about AI' }],
    stream: true
  });

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content;
    if (content) {
      process.stdout.write(content);
    }
  }
}

streamChat();

Rate Limits

Rate limits are applied per API key and vary by pricing tier.

Rate Limit Headers

API responses include headers indicating your current rate limit status:

Header                  Description
X-RateLimit-Limit       Maximum requests per minute for your tier
X-RateLimit-Remaining   Remaining requests in the current window
X-RateLimit-Reset       Unix timestamp when the rate limit resets
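
One way to inspect these headers from the Python SDK is the with_raw_response wrapper, which returns the underlying HTTP response alongside the parsed result. A minimal sketch, using the same placeholder client configuration as above:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# with_raw_response exposes the HTTP response headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}]
)

print("Limit:    ", raw.headers.get("X-RateLimit-Limit"))
print("Remaining:", raw.headers.get("X-RateLimit-Remaining"))
print("Reset at: ", raw.headers.get("X-RateLimit-Reset"))

completion = raw.parse()  # the usual ChatCompletion object
print(completion.choices[0].message.content)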

Tier Limits

Tier                   Requests/min   Tokens/min
Free                   20             10,000
CPU Standard           60             50,000
CPU AMD Optimized      80             100,000
CPU Intel Optimized    80             100,000
GPU NVIDIA Shared      100            250,000
GPU NVIDIA Dedicated   500            500,000
GPU AMD Shared         100            250,000
GPU AMD Dedicated      500            500,000
GPU Intel Shared       100            250,000
GPU Intel Dedicated    500            500,000
Self-Hosted            Unlimited      Unlimited
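
Rather than reacting to 429 responses, you can also throttle on the client to stay under your tier's requests-per-minute cap. Below is a minimal sliding-window sketch (the RequestThrottle class is illustrative, not part of any SDK); pass in the limit for your tier:

Python
import time
from collections import deque

class RequestThrottle:
    """Blocks until a request slot is free within a rolling 60-second window."""

    def __init__(self, requests_per_minute: int):
        self.limit = requests_per_minute
        self.timestamps = deque()

    def acquire(self):
        while True:
            now = time.monotonic()
            # Drop timestamps that have left the 60-second window
            while self.timestamps and now - self.timestamps[0] >= 60:
                self.timestamps.popleft()
            if len(self.timestamps) < self.limit:
                self.timestamps.append(now)
                return
            # Sleep until the oldest request exits the window
            time.sleep(60 - (now - self.timestamps[0]))

# Example: the Free tier allows 20 requests/min
throttle = RequestThrottle(requests_per_minute=20)
throttle.acquire()  # call before each API request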

Handling Rate Limits

When you exceed your rate limit, the API returns a 429 Too Many Requests response. Retry with exponential backoff, honoring the Retry-After header when the server provides one:

Python
import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek-r1-distill-llama-70b",
                messages=messages
            )
        except RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Honor Retry-After if the server sent it; otherwise back off exponentially
            retry_after = e.response.headers.get('Retry-After')
            delay = int(retry_after) if retry_after else 2 ** attempt
            time.sleep(delay)
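
Callers use the helper like a direct request; it raises after the final attempt, so failures surface as exceptions:

Python
response = make_request_with_retry([{"role": "user", "content": "Hello!"}])
print(response.choices[0].message.content)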

Error Handling

The API uses standard HTTP status codes and returns detailed error messages in JSON format.

Error Response Format

{ "error": { "message": "Invalid API key provided", "type": "authentication_error", "code": "invalid_api_key" } }

Common Error Codes

HTTP Status   Type                    Description
400           invalid_request_error   Request was malformed or missing required parameters
401           authentication_error    Invalid or missing API key
403           permission_error        API key does not have permission for this operation
404           not_found_error         The requested resource does not exist
429           rate_limit_error        Too many requests; retry after the time specified in the Retry-After header
500           server_error            Internal server error; contact support if this persists
503           service_unavailable     The service is temporarily overloaded; retry with exponential backoff

Handling Errors

Python
from openai import OpenAI, APIError, RateLimitError

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL
client = OpenAI(
    base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
    api_key="xero_myproject_your_api_key"
)

try:
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}")
except APIError as e:
    print(f"API error: {e.message}")

Node.js Error Handling

Node.js
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
  apiKey: 'xero_myproject_your_api_key'
});

async function makeRequest() {
  try {
    const response = await client.chat.completions.create({
      model: 'deepseek-r1-distill-llama-70b',
      messages: [{ role: 'user', content: 'Hello!' }]
    });
    return response;
  } catch (error) {
    if (error.status === 429) {
      const retryAfter = error.headers?.get('retry-after') || 60;
      console.log(`Rate limited. Retry after ${retryAfter}s`);
    } else if (error.status === 401) {
      console.log('Invalid API key');
    } else {
      console.log(`API error: ${error.message}`);
    }
    throw error;
  }
}