Documentation
The Xerotier.ai API provides an OpenAI-compatible interface for accessing AI models. Use your existing OpenAI SDK or HTTP client with minimal changes.
Overview
Xerotier.ai is a multi-tenant inference platform that allows you to access open-source AI models through a familiar OpenAI-compatible API. Simply change your base URL and API key to start using Xerotier.ai hosted models.
Key Features
- OpenAI SDK Compatible - Use existing OpenAI Python and Node.js SDKs
- Self-Hosted Agents - Run inference on your own infrastructure
- Custom Model Upload - Upload and serve your own fine-tuned models
- Streaming Support - Real-time token streaming via Server-Sent Events
- LMCache Integration - Reduced Time-to-First-Token with KV cache sharing
Quick Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)
```
Getting Started
New to Xerotier.ai? Start here to learn the basics.
API Reference
Complete reference for all Xerotier.ai API endpoints.
Chat Completions
Generate chat responses using language models with the /v1/chat/completions endpoint.
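If you prefer a raw HTTP client over the SDK, the request body follows the standard OpenAI Chat Completions schema. A sketch of a typical POST body for this endpoint (the sampling parameters are illustrative, not required):

```python
import json

# Body for POST {base_url}/v1/chat/completions, OpenAI-compatible schema.
payload = {
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,   # optional sampling controls
    "max_tokens": 256,
}

body = json.dumps(payload)  # send with Content-Type: application/json
```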
Model Management
Upload, list, and delete custom models in your project.
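Assuming the platform follows the OpenAI convention, listing models is a GET against `/v1/models` with Bearer authentication — worth verifying against the API Reference for your project. A stdlib-only sketch that builds (but does not send) the request:

```python
from urllib import request

# Placeholder project URL and key, as in the Quick Example.
BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

# Assumed OpenAI-compatible route: GET {BASE_URL}/models
req = request.Request(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
)

# Uncomment to send the request:
# with request.urlopen(req) as resp:
#     print(resp.read().decode())
```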
Model Sharing
Publish your models to the public catalog so other users can discover them.
Model Upload
Upload complete HuggingFace model directories, either as a single archive or file by file.
Guides
Learn best practices and advanced usage patterns.
Infrastructure
Deploy and manage your own inference infrastructure.