Pay only for what you use
No hidden fees, no surprises. Scale from free to enterprise with predictable costs. Only tiers with active infrastructure are shown.
| Feature | Free | CPU AMD Optimized | GPU NVIDIA Shared | Self-Hosted |
|---|---|---|---|---|
| Pricing | ||||
| Price per 1M tokens | Free | $0.35 | $1.25 | Free |
| Hourly rate (XIM) | - | - | - | $0.03/hr |
| Rate Limits | ||||
| Tokens per minute | 10,000 | 100,000 | 500,000 | Unlimited |
| Requests per minute | 64 | 128 | 256 | Unlimited |
| Hardware | ||||
| Hardware type | CPU | CPU | GPU | Your Accelerators |
| Max model size | 64 GB | 20 GB | Unlimited | 512 GB |
| Endpoint Configuration | ||||
| Max replicas | 1 | 1 | 1 | 1 |
| Concurrent requests | 8 | 24 | 48 | 96 |
| Max batch size | 1 | 8 | 16 | 64 |
| Request timeout | 30s | 600s | 300s | 1800s |
| Features | ||||
| Streaming | ||||
| Batching | ||||
| CPU support | ||||
| NVIDIA CUDA | ||||
| Storage | ||||
| Storage pool | Shared | Shared | Shared | Shared |
| Minimum billable | 1 GB | 1 GB | 1 GB | 1 GB |
| Cold tier retention | 90 days | 90 days | 90 days | 90 days |
| Encryption at rest | ||||
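
As a rough illustration of how the per-token prices and rate limits in the table combine, the sketch below estimates the cost of a workload and the minimum time it needs at the tokens-per-minute limit. It assumes cost scales linearly with token count; the page does not say how input and output tokens are counted. The names `TIERS`, `token_cost`, and `min_minutes` are made up for this example and are not part of any published API. Self-Hosted is left out because it is billed by the hour rather than per token.

```python
from decimal import Decimal

# Values copied from the pricing table above; the dictionary layout is illustrative.
TIERS = {
    "Free": {"price_per_1m": Decimal("0"), "tokens_per_min": 10_000},
    "CPU AMD Optimized": {"price_per_1m": Decimal("0.35"), "tokens_per_min": 100_000},
    "GPU NVIDIA Shared": {"price_per_1m": Decimal("1.25"), "tokens_per_min": 500_000},
}


def token_cost(tier: str, tokens: int) -> Decimal:
    """Estimated cost in dollars, assuming simple linear per-token pricing."""
    return TIERS[tier]["price_per_1m"] * Decimal(tokens) / Decimal(1_000_000)


def min_minutes(tier: str, tokens: int) -> int:
    """Fewest whole minutes needed to push `tokens` through the per-minute limit."""
    limit = TIERS[tier]["tokens_per_min"]
    return -(-tokens // limit)  # ceiling division


if __name__ == "__main__":
    for name in TIERS:
        print(name, token_cost(name, 40_000_000), min_minutes(name, 40_000_000))
```

For a 40M-token job, this puts the GPU tier at $50 and at least 80 minutes, and the CPU tier at $14 and at least 400 minutes, so the cheaper tier takes five times longer at its rate limit.
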
All stored content types (models, completions, responses, conversations, batch files, and uploads) share a single storage pool per project. Content types are not metered separately.
A project is billed for at least 1 GB as soon as it stores any content: billable storage is max(1 GB, actual usage) whenever any stored content exists in the project.
Storage is billed at a per-GB monthly rate determined by your project's storage tier. The effective rate is computed as base rate * markup multiplier, rounded up to the nearest thousandth.
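
The storage rules above (the 1 GB minimum and the rounded-up effective rate) can be sketched as follows. This is a minimal sketch, not the billing system itself: the base rate and markup values in the example are hypothetical, since the page does not publish them, and the function names are invented for illustration. `Decimal` is used so that rounding up to the thousandth is exact.

```python
from decimal import Decimal, ROUND_CEILING

MIN_BILLABLE_GB = Decimal("1")
THOUSANDTH = Decimal("0.001")


def effective_rate(base_rate: Decimal, markup: Decimal) -> Decimal:
    """base rate * markup multiplier, rounded up to the nearest thousandth."""
    return (base_rate * markup).quantize(THOUSANDTH, rounding=ROUND_CEILING)


def billable_gb(actual_gb: Decimal) -> Decimal:
    """max(1 GB, actual usage) when anything is stored, otherwise nothing."""
    if actual_gb <= 0:
        return Decimal("0")
    return max(MIN_BILLABLE_GB, actual_gb)


def monthly_storage_cost(actual_gb: Decimal, base_rate: Decimal, markup: Decimal) -> Decimal:
    """Billable GB times the effective per-GB monthly rate."""
    return billable_gb(actual_gb) * effective_rate(base_rate, markup)


if __name__ == "__main__":
    # Hypothetical inputs: base rate $0.0231 per GB-month, markup multiplier 1.15.
    base, markup = Decimal("0.0231"), Decimal("1.15")
    print(effective_rate(base, markup))                         # 0.027
    print(monthly_storage_cost(Decimal("0.25"), base, markup))  # billed as 1 GB
    print(monthly_storage_cost(Decimal("40"), base, markup))    # billed as 40 GB
```

With these hypothetical inputs the raw product is 0.026565, which rounds up to 0.027 per GB. A project storing 0.25 GB pays the same as one storing exactly 1 GB.
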
Storage is metered on every tier, including Free. No tier includes a free storage allowance: billing starts with the first byte stored, subject to the 1 GB minimum.
Content in cold tier object storage is retained for 90 days and then expires automatically. The hot tier cache provides faster access to recently used content before it moves to cold storage.
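
To make the 90-day retention window concrete, the sketch below computes when a piece of cold tier content would expire. It assumes the clock starts when the content enters the cold tier, which the page does not state explicitly; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

COLD_RETENTION = timedelta(days=90)


def cold_expiry(entered_cold_at: datetime) -> datetime:
    """Moment at which content becomes eligible for automatic expiration."""
    return entered_cold_at + COLD_RETENTION


def is_expired(entered_cold_at: datetime, now: Optional[datetime] = None) -> bool:
    """True once the retention window has fully elapsed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= cold_expiry(entered_cold_at)


if __name__ == "__main__":
    stored = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(cold_expiry(stored).date())  # 2025-04-01
```
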
All cold tier content is encrypted at rest using AES-256-GCM. Encryption keys are managed per-project, ensuring complete data isolation between tenants.
Deploy your first model in minutes with our free tier. No credit card required.