No hidden fees, no surprises. Scale from free to enterprise with predictable costs. Only tiers with active infrastructure are shown.
Full Service Matrix
| Feature | Free | Self-Hosted |
|---|---|---|
| **Pricing** | | |
| Price per 1M tokens | Free | Free |
| Hourly rate (dedicated) | - | $0.01/hr |
| **Rate Limits** | | |
| Tokens per minute | 10,000 | Unlimited |
| Requests per minute | 20 | Unlimited |
| **Hardware** | | |
| Hardware type | CPU | Your Hardware |
| Max model size | 4 GB | 304 GB |
| **Endpoint Configuration** | | |
| Max replicas | 1 | Unlimited |
| Concurrent requests | 5 | Unlimited |
| Max batch size | 1 | Unlimited |
| Request timeout | 30s | 120s |
| **Features** | | |
| Streaming | Yes | Yes |
| Batching | - | Yes |
| CPU support | Yes | Yes |
| NVIDIA CUDA | - | Yes |
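For a rough sense of the dedicated hourly rate, here is a back-of-the-envelope sketch assuming one replica running non-stop through a 30-day month. The helper name is ours for illustration, not part of any billing API; the $0.01/hr figure comes from the matrix above.

```python
HOURLY_RATE = 0.01  # dedicated self-hosted rate in $/hr, from the matrix above

def monthly_cost(hours_per_day=24, days=30, rate=HOURLY_RATE):
    """Estimated cost of keeping one dedicated replica up for a month."""
    return hours_per_day * days * rate

# One replica running around the clock for a 30-day month:
# 24 * 30 * $0.01 = $7.20
```

Scale down `hours_per_day` if you stop the endpoint outside business hours; you are only billed for hours the endpoint is running.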
Frequently Asked Questions
**How is token usage calculated?**
Token usage includes both input (prompt) tokens and output (completion) tokens. We use the same tokenization as the model you deploy, so counts match what you'd see locally. You're only charged for successful requests.
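Assuming the billing rule above (input plus output tokens, successful requests only), the per-request charge can be sketched as follows. The function name and token counts are illustrative, not part of any API; real counts come from the deployed model's own tokenizer.

```python
def billed_tokens(prompt_tokens, completion_tokens, request_succeeded):
    """Tokens billed for one request: input + output, successful requests only.

    Illustrative helper; actual token counts come from the model's tokenizer.
    """
    if not request_succeeded:
        return 0  # failed requests are not charged
    return prompt_tokens + completion_tokens

# A successful call with a 120-token prompt and a 380-token completion
# is billed for 500 tokens; a failed call is billed for none.
```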
**What is the difference between shared and dedicated GPU?**
Shared GPU tiers run on multi-tenant infrastructure where resources are shared across users. This is cost-effective but latency can vary. Dedicated GPU gives you reserved hardware with guaranteed performance and isolation, making it ideal for production workloads that require consistent latency.
**Can I change an endpoint's tier after deployment?**
No. An endpoint's tier is fixed at creation, but you can always deploy new endpoints on a different tier.
**What happens if I exceed my rate limits?**
Requests exceeding your tier's rate limits receive a 429 (Too Many Requests) response with a Retry-After header. We recommend implementing exponential backoff in your client, and upgrading to a higher tier if you consistently hit the limits.
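The recommended retry behavior can be sketched in Python. Here `send_request` is a hypothetical callable standing in for your HTTP client call; the response object is assumed to expose `status_code` and `headers` as a `requests.Response` does.

```python
import random
import time

def post_with_backoff(send_request, max_retries=5):
    """Retry a request on HTTP 429, honoring Retry-After when present.

    `send_request` is a hypothetical zero-argument callable returning an
    object with `.status_code` and `.headers` (e.g. a requests.Response).
    """
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response
        # Prefer the server-suggested delay; otherwise back off
        # exponentially, with jitter to avoid synchronized retries.
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)
        else:
            delay = (2 ** attempt) + random.random()
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```

Note that `Retry-After` may also arrive as an HTTP date rather than a number of seconds; a production client should handle both forms.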
**Is there a minimum commitment?**
No minimum commitment is required for any tier. Pay-as-you-go pricing means you only pay for what you use, and you can stop or delete endpoints at any time with no penalty.
Ready to get started?
Deploy your first model in minutes with our free tier. No credit card required.