Usage Guides - Xerotier

Streaming

The Xerotier.ai API supports streaming responses using Server-Sent Events (SSE). This allows you to receive partial results as they are generated.

Enabling Streaming

Set stream: true in the request body (stream=True in the Python SDK) to enable streaming:

Python

from openai import OpenAI

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL (use one form or the other, not both)
# client = OpenAI(
#     base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
#     api_key="xero_myproject_your_api_key"
# )

stream = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Write a poem about AI"}],
    stream=True
)

for chunk in stream:
    # Skip chunks that carry no choices or no text delta
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
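
To keep the full reply rather than only printing it, the deltas can be collected into one string. This is a minimal sketch, not part of the Xerotier documentation: collect_stream_text and _fake_chunk are hypothetical helper names, and the fake chunks only imitate the attribute shape of the SDK's chunk objects so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(stream):
    """Join the delta.content pieces of a chat completion stream into one string."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip those.
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content:
            parts.append(content)
    return "".join(parts)


def _fake_chunk(text):
    # Hypothetical stand-in that mimics the shape of an SDK chunk object.
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [_fake_chunk("Hello"), _fake_chunk(None), _fake_chunk(" world")]
print(collect_stream_text(demo))  # prints Hello world
```

In real use, the stream object returned by client.chat.completions.create(..., stream=True) would be passed in place of demo.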
                

Stream Response Format

Each streamed chunk arrives as a line with the prefix data: followed by a JSON object. A final data: [DONE] line marks the end of the stream:

data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}
data: [DONE]
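
If you read the raw SSE stream yourself instead of going through an SDK, each line can be decoded along these lines. This is a sketch under assumptions: iter_sse_content is a hypothetical helper, and it only handles the data: lines and the [DONE] terminator shown above, ignoring every other kind of line.

```python
import json


def iter_sse_content(lines):
    """Yield delta.content strings from raw SSE lines until data: [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


sample = [
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":"Hello"}}]}',
    "",
    'data: {"id":"chatcmpl-abc","choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # prints Hello world
```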
                

Node.js Streaming Example

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
    apiKey: 'xero_myproject_your_api_key'
});

async function streamChat() {
    const stream = await client.chat.completions.create({
        model: 'deepseek-r1-distill-llama-70b',
        messages: [{ role: 'user', content: 'Write a poem about AI' }],
        stream: true
    });

    for await (const chunk of stream) {
        const content = chunk.choices[0]?.delta?.content;
        if (content) {
            process.stdout.write(content);
        }
    }
}

streamChat();
                

Rate Limits

Rate limits are applied per API key and vary by pricing tier.

Rate Limit Headers

API responses include headers indicating your current rate limit status:

Header	Description
X-RateLimit-Limit	Maximum requests per minute for your tier
X-RateLimit-Remaining	Remaining requests in the current window
X-RateLimit-Reset	Unix timestamp when the rate limit resets
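
These three headers can be turned into a small status summary. The sketch below is illustrative only: rate_limit_status is a hypothetical helper, it assumes the header values are plain integers, and it takes a plain dict whose keys are spelled exactly as in the table (real HTTP header lookups are case-insensitive).

```python
import time


def rate_limit_status(headers, now=None):
    """Summarize the X-RateLimit-* headers of a response."""
    now = time.time() if now is None else now
    reset_at = float(headers["X-RateLimit-Reset"])  # Unix timestamp
    return {
        "limit": int(headers["X-RateLimit-Limit"]),
        "remaining": int(headers["X-RateLimit-Remaining"]),
        # Never negative, even if the reset time has already passed
        "seconds_until_reset": max(0.0, reset_at - now),
    }


example = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1700000030",
}
status = rate_limit_status(example, now=1700000000)
print(status["seconds_until_reset"])  # prints 30.0
```

With the OpenAI Python SDK, response headers are typically reachable through the with_raw_response accessor (for example client.chat.completions.with_raw_response.create(...), then .headers and .parse()); check your SDK version before relying on it.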

Tier Limits

Tier	Requests/min	Tokens/min
Free	20	10,000
CPU Standard	60	50,000
CPU AMD Optimized	80	100,000
CPU Intel Optimized	80	100,000
GPU NVIDIA Shared	100	250,000
GPU NVIDIA Dedicated	500	500,000
GPU AMD Shared	100	250,000
GPU AMD Dedicated	500	500,000
GPU Intel Shared	100	250,000
GPU Intel Dedicated	500	500,000
Self-Hosted	Unlimited	Unlimited
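
One conservative way to stay under a requests-per-minute limit is to space requests evenly on the client side. The sketch below copies the Requests/min column from the table; the helper names are hypothetical, the even-spacing policy is an assumption rather than documented server behavior, and the Tokens/min limit is not taken into account.

```python
# Requests/min per tier, copied from the Tier Limits table; None means unlimited.
TIER_REQUESTS_PER_MIN = {
    "Free": 20,
    "CPU Standard": 60,
    "CPU AMD Optimized": 80,
    "CPU Intel Optimized": 80,
    "GPU NVIDIA Shared": 100,
    "GPU NVIDIA Dedicated": 500,
    "GPU AMD Shared": 100,
    "GPU AMD Dedicated": 500,
    "GPU Intel Shared": 100,
    "GPU Intel Dedicated": 500,
    "Self-Hosted": None,
}


def min_interval_seconds(tier):
    """Smallest even spacing between requests that stays within the tier limit."""
    rpm = TIER_REQUESTS_PER_MIN[tier]
    return 0.0 if rpm is None else 60.0 / rpm


def seconds_to_wait(tier, last_request_at, now):
    """How long to pause before the next request, given when the last one was sent."""
    return max(0.0, last_request_at + min_interval_seconds(tier) - now)


print(min_interval_seconds("Free"))  # prints 3.0
print(seconds_to_wait("Free", last_request_at=100.0, now=101.0))  # prints 2.0
```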

Handling Rate Limits

When you exceed rate limits, the API returns a 429 Too Many Requests response. Implement exponential backoff:

Python

import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

def make_request_with_retry(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="deepseek-r1-distill-llama-70b",
                messages=messages
            )
        except RateLimitError as e:
            # Out of attempts: surface the error to the caller
            if attempt == max_retries - 1:
                raise
            # Prefer the server's Retry-After value (in seconds); if it is
            # missing or not numeric, back off exponentially: 1s, 2s, 4s, ...
            retry_after = e.response.headers.get("Retry-After")
            try:
                delay = float(retry_after)
            except (TypeError, ValueError):
                delay = 2 ** attempt
            time.sleep(delay)
                

Error Handling

The API uses standard HTTP status codes and returns detailed error messages in JSON format.

Error Response Format

{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
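
When calling the API without an SDK, the error body can be unpacked into its three fields. This is a minimal sketch assuming the body follows the format shown above; ApiErrorInfo and parse_error_body are hypothetical names, and missing fields fall back to empty strings.

```python
import json
from dataclasses import dataclass


@dataclass
class ApiErrorInfo:
    message: str
    error_type: str
    code: str


def parse_error_body(body):
    """Extract message, type, and code from a JSON error response body."""
    error = json.loads(body).get("error", {})
    return ApiErrorInfo(
        message=error.get("message", ""),
        error_type=error.get("type", ""),
        code=error.get("code", ""),
    )


body = """
{
  "error": {
    "message": "Invalid API key provided",
    "type": "authentication_error",
    "code": "invalid_api_key"
  }
}
"""
info = parse_error_body(body)
print(info.code)  # prints invalid_api_key
```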
                

Common Error Codes

HTTP Status	Type	Description
400	invalid_request_error	Request was malformed or missing required parameters
401	authentication_error	Invalid or missing API key
403	permission_error	API key does not have permission for this operation
404	not_found_error	The requested resource does not exist
429	rate_limit_error	Too many requests. Retry after the time specified in the Retry-After header
500	server_error	Internal server error. Contact support if this persists
503	service_unavailable	The service is temporarily overloaded. Retry with exponential backoff
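
The table suggests that only 429 and 503 are worth retrying automatically. That policy can be sketched as a single function; retry_delay is a hypothetical helper, and the base and cap values are illustrative assumptions rather than documented Xerotier settings.

```python
# Statuses the table describes as retryable; everything else is returned to the caller.
RETRYABLE_STATUSES = {429, 503}


def retry_delay(status, attempt, retry_after=None, base=1.0, cap=60.0):
    """Return seconds to wait before retrying, or None if the error is not retryable.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if status not in RETRYABLE_STATUSES:
        return None
    if status == 429 and retry_after is not None:
        try:
            return float(retry_after)  # server-provided delay in seconds
        except ValueError:
            pass  # not a number of seconds; fall through to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at cap
    return min(cap, base * 2 ** attempt)


print(retry_delay(401, attempt=0))  # prints None
print(retry_delay(429, attempt=0, retry_after="5"))  # prints 5.0
print(retry_delay(503, attempt=3))  # prints 8.0
```

Adding random jitter to the backoff delay is a common refinement when many clients may retry at the same moment.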

Handling Errors

Python

from openai import OpenAI, APIError, RateLimitError

# Path-based URL
client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Or DNS-based URL (use one form or the other, not both)
# client = OpenAI(
#     base_url="https://my-endpoint.proj_ABC123.api.xerotier.ai/v1",
#     api_key="xero_myproject_your_api_key"
# )

try:
    response = client.chat.completions.create(
        model="deepseek-r1-distill-llama-70b",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.response.headers.get('Retry-After')}")
except APIError as e:
    print(f"API error: {e.message}")
                

Node.js Error Handling

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
    baseURL: 'https://api.xerotier.ai/proj_ABC123/my-endpoint/v1',
    apiKey: 'xero_myproject_your_api_key'
});

async function makeRequest() {
    try {
        const response = await client.chat.completions.create({
            model: 'deepseek-r1-distill-llama-70b',
            messages: [{ role: 'user', content: 'Hello!' }]
        });
        return response;
    } catch (error) {
        if (error.status === 429) {
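            // Depending on the SDK version, error.headers may be a Fetch
            // Headers object or a plain object, so handle both shapes.
            const headers = error.headers;
            const retryAfter = (typeof headers?.get === 'function'
                ? headers.get('retry-after')
                : headers?.['retry-after']) || 60;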
            console.log(`Rate limited. Retry after ${retryAfter}s`);
        } else if (error.status === 401) {
            console.log('Invalid API key');
        } else {
            console.log(`API error: ${error.message}`);
        }
        throw error;
    }
}