Full Service Matrix

Feature                 Free        CPU (AMD Optimized)   GPU (NVIDIA Shared)   Self-Hosted

Pricing
  Price per 1M tokens   Free        $0.35                 $1.25                 Free
  Hourly rate (XIM)     -           -                     -                     $0.03/hr

Rate Limits
  Tokens per minute     10,000      100,000               500,000               Unlimited
  Requests per minute   64          128                   256                   Unlimited

Hardware
  Hardware type         CPU         CPU                   GPU                   Your Accelerators
  Max model size        64 GB       20 GB                 Unlimited             512 GB

Endpoint Configuration
  Max replicas          1           1                     1                     1
  Concurrent requests   8           24                    48                    96
  Max batch size        1           8                     16                    64
  Request timeout       30s         600s                  300s                  1800s

Features
  Streaming             Yes         Yes                   Yes                   Yes
  Batching              -           Yes                   Yes                   Yes
  CPU support           Yes         Yes                   -                     Yes
  NVIDIA CUDA           -           -                     Yes                   -

Storage
  Storage pool          Shared      Shared                Shared                Shared
  Minimum billable      1 GB        1 GB                  1 GB                  1 GB
  Cold tier retention   90 days     90 days               90 days               90 days
  Encryption at rest    Yes         Yes                   Yes                   Yes
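
As a rough illustration of the per-token prices in the matrix, the sketch below estimates the token charge for a single successful request. The FAQ further down states that token usage counts both input and output tokens, so the two are summed. The tier keys ("free", "cpu", "gpu", "self-hosted") and the function name are hypothetical labels chosen for this example, and the sketch assumes the charge scales linearly per token with no rounding, which the page does not specify. The self-hosted hourly rate is not modeled here.

```python
from decimal import Decimal

# Prices per 1M tokens in USD, copied from the matrix.
# The dictionary keys are illustrative labels, not official tier identifiers.
PRICE_PER_MILLION = {
    "free": Decimal("0"),
    "cpu": Decimal("0.35"),
    "gpu": Decimal("1.25"),
    "self-hosted": Decimal("0"),  # billed by the hour instead; not modeled here
}


def token_cost(tier: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the token charge for one successful request.

    Assumes linear pricing per token; any rounding applied by the
    actual billing system is unknown and not reproduced.
    """
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION[tier] * total_tokens / Decimal(1_000_000)


if __name__ == "__main__":
    # 600k prompt tokens plus 400k completion tokens on the GPU tier.
    print(token_cost("gpu", 600_000, 400_000))
```

Using Decimal rather than float keeps small per-request amounts exact when they are summed over many requests.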

Storage Pricing

Shared Storage Pool

All stored content types -- models, completions, responses, conversations, batch files, and uploads -- share a single storage pool per project. There is no separate metering for each content type.

1 GB Minimum

As soon as any storage is used, a minimum of 1 GB is billed. Billable storage is calculated as max(1 GB, actual usage) whenever any stored content exists in the project.
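
The minimum can be sketched as a small function. This is an illustration only: the function name is hypothetical, and it assumes a project with no stored content is billed for no storage, which is how the phrase "whenever any stored content exists" reads.

```python
def billable_gb(actual_gb: float) -> float:
    """Return the billable storage in GB for a project.

    Assumption: an empty project (no stored content) is billed for 0 GB;
    otherwise the 1 GB minimum applies.
    """
    if actual_gb <= 0:
        return 0.0
    return max(1.0, actual_gb)


if __name__ == "__main__":
    for usage in (0.0, 0.2, 3.5):
        print(usage, "GB stored ->", billable_gb(usage), "GB billed")
```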

Per-GB Monthly Rate

Storage is billed at a per-GB monthly rate determined by your project's storage tier. The effective rate is computed as base rate * markup multiplier, rounded up to the nearest thousandth.
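
A minimal sketch of the effective-rate calculation follows. It interprets "rounded up to the nearest thousandth" as a ceiling to three decimal places of the currency unit, which is an assumption. The function name and the base rate and multiplier values in the usage example are hypothetical and are not actual prices.

```python
from decimal import Decimal, ROUND_CEILING


def effective_rate(base_rate: Decimal, markup_multiplier: Decimal) -> Decimal:
    """Compute base rate * markup multiplier, rounded up to 3 decimal places."""
    raw = base_rate * markup_multiplier
    # ROUND_CEILING always rounds toward positive infinity, i.e. "rounds up".
    return raw.quantize(Decimal("0.001"), rounding=ROUND_CEILING)


if __name__ == "__main__":
    # Hypothetical inputs: 0.023 * 1.15 = 0.02645, which rounds up to 0.027.
    print(effective_rate(Decimal("0.023"), Decimal("1.15")))
```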

Metered Storage

All storage is metered across every tier. No tier includes free storage -- usage is billed from the first byte stored (subject to the 1 GB minimum).

Cold Tier Retention

Content in cold tier object storage is retained for 90 days before it expires automatically. A hot tier cache provides faster access to recently used content before that content moves to cold storage.

Encryption at Rest

All cold tier content is encrypted at rest using AES-256-GCM. Encryption keys are managed per-project, ensuring complete data isolation between tenants.

Frequently Asked Questions

Token usage includes both input (prompt) tokens and output (completion) tokens. We use the same tokenization as the model you deploy, so counts match what you'd see locally. You're only charged for successful requests.
You cannot change the tier of an existing endpoint. However, you can deploy new endpoints on different tiers.
Requests that exceed your tier's rate limits receive a 429 (Too Many Requests) response with a Retry-After header. We recommend implementing exponential backoff in your client. If you consistently hit limits, consider deploying a new endpoint on a higher tier.
No minimum commitment is required for any tier. Pay-as-you-go pricing means you only pay for what you use. You can stop or delete endpoints at any time with no penalty.
All content types share a single storage pool per project. As soon as any storage is used, a minimum of 1 GB is billed. Your billable amount is max(1 GB, actual usage). Storage is billed at a per-GB monthly rate based on your project's storage tier. Cold tier content is retained for 90 days before auto-expiration and is encrypted at rest with AES-256-GCM. See the storage documentation for full details.
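
The rate-limit answer above recommends exponential backoff for 429 responses. A minimal client-side sketch is shown below. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns a (status, headers, body) tuple, so no particular HTTP library or endpoint URL is assumed. The retry counts and delays are illustrative defaults, not values recommended by the service.

```python
import random
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry on HTTP 429 with exponential backoff.

    If the response carries a Retry-After header holding a whole number
    of seconds, that value is used as the delay. Otherwise (header missing,
    or given as an HTTP date) the delay doubles on each attempt, is capped
    at max_delay, and gets random jitter added.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt == max_attempts - 1:
            break  # out of attempts; return the last 429 to the caller

        # Assumes the headers mapping uses this exact key casing.
        retry_after = headers.get("Retry-After")
        if retry_after is not None and retry_after.strip().isdigit():
            delay = float(retry_after)
        else:
            delay = min(max_delay, base_delay * 2 ** attempt)
            delay += random.uniform(0, delay / 2)  # jitter spreads out retries
        sleep(delay)

    return status, headers, body
```

The `sleep` parameter exists so the waiting behavior can be replaced in tests. In real use, `send` would wrap whatever HTTP client call your application already makes.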

Ready to get started?

Deploy your first model in minutes with our free tier. No credit card required.