What each tier costs.
Pay per token on managed inference. Bring your own hardware and pay nothing per request. Free tier through shared GPU; only tiers with active infrastructure shown.
Pay per token on managed inference. Bring your own hardware and pay nothing per request. Free tier through shared GPU; only tiers with active infrastructure shown.
Type to narrow columns by tier name, hardware, or feature. Press / to focus, Esc to clear.
| Feature | Free | CPU AMD Optimized | GPU NVIDIA Shared | GPU AMD Shared | Self-Hosted |
|---|---|---|---|---|---|
| Pricing | |||||
| Price per 1M tokens | Free | $0.35 | $1.25 | $1.00 | Free |
| Hourly rate (XIM) | - | - | - | - | $0.03/hr |
| Rate Limits | |||||
| Tokens per minute | 10,000 | 100,000 | 500,000 | 500,000 | Unlimited |
| Requests per minute | 64 | 128 | 256 | 256 | Unlimited |
| Hardware | |||||
| Hardware type | CPU | CPU | GPU | GPU | Your Accelerators |
| Max model size | 64 GB | 20 GB | Unlimited | Unlimited | 512 GB |
| Endpoint Configuration | |||||
| Max replicas | 1 | 1 | 1 | 1 | 1 |
| Concurrent requests | 8 | 24 | 48 | 48 | 96 |
| Max batch size | 1 | 8 | 16 | 16 | 64 |
| Request timeout | 30s | 600s | 300s | 300s | 1800s |
| Features | |||||
| Streaming | |||||
| Batching | |||||
| CPU support | |||||
| NVIDIA CUDA | |||||
| Storage | |||||
| Storage pool (platform-wide) | Shared | Shared | Shared | Shared | Shared |
| Minimum billable (display floor) | 1 GB | 1 GB | 1 GB | 1 GB | 1 GB |
| Cold tier retention | Up to 14 days | Up to 14 days | Up to 14 days | Up to 14 days | Up to 14 days |
| Encryption at rest | |||||
One pool per project. One platform-wide rate. No per-content-type meters; no charges for bytes you do not store.
Storage is billed at a single platform-wide rate, not per service tier: base rate * markup multiplier, rounded up to the nearest thousandth. The current default is $0.013/GB/mo.
The Usage dashboard reports billable storage as max(1 GB, actual usage). Per-cycle charges are prorated by the actual bytes you store, not the floor.
Models, completions, responses, conversations, batch files, and uploads share a single pool per project. No separate meter per content type.
Cold tier content is retained for up to 14 days before automatic expiration, depending on the service tier. A hot tier cache fronts recently used content for faster reads.
Stored content is encrypted with AES-256-GCM across both hot and cold tiers. Keys are managed per project for tenant isolation.
No tier includes free storage. Usage is billed from the first byte you store, subject to the 1 GB display floor.
Token math, tier semantics, rate-limit behavior, storage billing. Stated, not implied.
service_tier request parameter accepts flex, standard, and priority. priority raises routing priority and is billed at 1.25x the standard token cost; flex and standard carry no surcharge.
Move to dedicated hardware when the math says so. The free tier shares GPU; tier upgrades take seconds.