Storage

One hot/cold pipeline for every stored content type. AES-256-GCM at rest, per-project keys, tier-driven TTL on the hot path, tier-driven retention on the cold path, one billable pool.

All storage types share a unified billing pool per project. There is no separate metering for each content type; your total storage usage across all types determines your billable amount.

Tier Windows

Each service tier has two retention windows: hot and cold. The hot window is the in-memory TTL; the cold window is the durable retention period before automatic expiration. Every section below references this matrix rather than restating it.

Service tier hot and cold retention windows
Service Tier | Hot (Cache) | Cold (Object)
Free | 1 hour | 1 day
CPU | 5 hours | 7 days
GPU Shared | 7 hours | 7 days
GPU Dedicated | 24 hours | 14 days
Self-Hosted | 48 hours | 14 days
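
As a rough illustration, the matrix can be held in a lookup table to estimate when an item leaves each tier. The dictionary keys and the function name are hypothetical (not platform identifiers). The hot-tier estimate assumes the TTL is counted from the write and that no cold tier hit re-populates the cache in the meantime.

```python
from datetime import datetime, timedelta, timezone

# (hot window, cold window) per service tier, copied from the Tier Windows table.
# The keys are illustrative labels, not identifiers used by the platform.
TIER_WINDOWS = {
    "free": (timedelta(hours=1), timedelta(days=1)),
    "cpu": (timedelta(hours=5), timedelta(days=7)),
    "gpu_shared": (timedelta(hours=7), timedelta(days=7)),
    "gpu_dedicated": (timedelta(hours=24), timedelta(days=14)),
    "self_hosted": (timedelta(hours=48), timedelta(days=14)),
}


def expiry_times(tier, created_at):
    """Return (earliest hot eviction, cold expiration) for an item.

    Assumption: both windows start at creation time. A later cold tier hit
    would put the item back in the cache, so the first value is only the
    earliest possible eviction.
    """
    hot_window, cold_window = TIER_WINDOWS[tier]
    return created_at + hot_window, created_at + cold_window


if __name__ == "__main__":
    created = datetime(2025, 1, 1, tzinfo=timezone.utc)
    hot_until, cold_until = expiry_times("gpu_dedicated", created)
    print("hot until:", hot_until.isoformat())
    print("cold until:", cold_until.isoformat())
```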

Storage Types

The following content types are stored by the platform:

Stored content types and their payloads
Type | Description | What Is Stored
Models | Uploaded model weights | GGUF/safetensors files
Completions | Chat completion results | Request/response pairs
Responses | Responses API results | Input/output content
Conversations | Conversation items | Message content
Batch Files | Batch API input/output | JSONL files
Uploads | Multi-part uploads | Assembled files

Hot Tier (Cache)

In-memory layer for recently created or recently accessed content. Reads are typically sub-millisecond. Items are written here on creation and re-populated automatically on cold tier hits.

TTL
Per service tier, see Tier Windows.
Behavior
Auto-populated on write, evicted after TTL.
Availability
Best-effort; cache misses fall through to cold.
Encryption
Same AES-256-GCM payload as cold.

Cold Tier (Object Storage)

Durable layer. All content is written here on creation (dual-write with the hot tier) and persists for a tier-dependent window before automatic expiration. Each project's objects live in their own container with per-project encryption keys.

Retention
Per service tier, see Tier Windows.
Encryption
AES-256-GCM at rest.
Isolation
Per-project containers, per-project keys. Shared object-store infrastructure, isolated at the container and key-prefix level.

Retrieval

When stored content is requested, the system uses an automatic waterfall retrieval strategy. This is transparent to API consumers: the same endpoint serves content regardless of which tier it resides in.

  1. Hot tier check: the in-memory cache is checked first for sub-millisecond access.
  2. Cold tier fallback: if the item is not in the hot tier (e.g., its TTL expired), it is loaded from cold storage and decrypted.
  3. Cache re-population: on a cold tier hit, the item is automatically promoted back to the hot tier so subsequent reads are fast.

Note: If content has expired from both tiers (beyond the cold storage retention period and evicted from cache), requests return a 404. Database rows for stored items persist by default, though some cleanup paths (notably Files and Uploads) may soft-delete the metadata row; see each type's documentation for specifics.
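
The retrieval steps above can be sketched with two dictionaries standing in for the cache and the object store. This is a toy model, not the platform's implementation: the class name, method names, and the injectable clock are all hypothetical, encryption is left out, and the KeyError stands in for the 404 the API returns.

```python
import time


class TieredStore:
    """Toy two-tier store: a TTL-bound dict in front of a retention-bound dict."""

    def __init__(self, hot_ttl, cold_retention, clock=time.monotonic):
        self.hot_ttl = hot_ttl                # seconds an item stays cached
        self.cold_retention = cold_retention  # seconds an item stays durable
        self.clock = clock                    # injectable so the sketch is testable
        self.hot = {}                         # item_id -> (payload, expires_at)
        self.cold = {}                        # item_id -> (payload, expires_at)

    def put(self, item_id, payload):
        """Dual-write: every new item lands in both tiers."""
        now = self.clock()
        self.hot[item_id] = (payload, now + self.hot_ttl)
        self.cold[item_id] = (payload, now + self.cold_retention)

    def get(self, item_id):
        """Return (payload, tier_that_served_it) or raise KeyError."""
        now = self.clock()

        # 1. Hot tier check.
        entry = self.hot.get(item_id)
        if entry is not None and entry[1] > now:
            return entry[0], "hot"
        self.hot.pop(item_id, None)  # drop an expired cache entry, if any

        # 2. Cold tier fallback.
        entry = self.cold.get(item_id)
        if entry is not None and entry[1] > now:
            # 3. Cache re-population with a fresh TTL.
            self.hot[item_id] = (entry[0], now + self.hot_ttl)
            return entry[0], "cold"
        self.cold.pop(item_id, None)

        # Expired from both tiers: the real API answers 404 here.
        raise KeyError(item_id)
```

With a hot TTL of 10 seconds and a cold retention of 100 seconds, a read at 11 seconds is served from cold, and the read immediately after it is served from hot again.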

Encryption

All stored content is encrypted at rest using AES-256-GCM in both tiers. Encryption and decryption are handled transparently: content is encrypted before it is written to either tier and decrypted automatically on retrieval.

Algorithm
AES-256-GCM.
Scope
Per-project encryption keys.
Integrity
Checksum verification on every read.
Key rotation
Supported with zero downtime. Existing ciphertext decrypts against retained prior key versions.
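
One way to picture per-project keys with rotation is a keyring that tags each ciphertext with the key version that produced it. The sketch below uses the third-party `cryptography` package (`pip install cryptography`). The `ProjectKeyring` class, the record layout, and the use of the project id as associated data are assumptions made for illustration, not a description of how the platform stores keys. The GCM authentication tag also stands in for the integrity check here; the platform's checksum may be a separate mechanism.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class ProjectKeyring:
    """Hypothetical per-project keyring that retains prior key versions."""

    def __init__(self):
        self.keys = {}    # version number -> 32-byte AES-256 key
        self.current = 0  # version used for new writes
        self.rotate()

    def rotate(self):
        """Add a new key version; old versions stay available for reads."""
        self.current += 1
        self.keys[self.current] = AESGCM.generate_key(bit_length=256)

    def encrypt(self, plaintext, project_id):
        """Encrypt with the current key; return (version, nonce, ciphertext)."""
        nonce = os.urandom(12)  # 96-bit nonce, unique per encryption
        aead = AESGCM(self.keys[self.current])
        ciphertext = aead.encrypt(nonce, plaintext, project_id.encode())
        return self.current, nonce, ciphertext

    def decrypt(self, record, project_id):
        """Decrypt with whichever key version wrote the record.

        Raises cryptography.exceptions.InvalidTag if the ciphertext or the
        associated project id has been altered.
        """
        version, nonce, ciphertext = record
        aead = AESGCM(self.keys[version])
        return aead.decrypt(nonce, ciphertext, project_id.encode())


if __name__ == "__main__":
    ring = ProjectKeyring()
    old_record = ring.encrypt(b"stored completion", "proj_abc123")
    ring.rotate()
    # Data written before the rotation still decrypts afterwards.
    print(ring.decrypt(old_record, "proj_abc123"))
```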

Storage Billing

All storage types contribute to a single billable total per project. There is no per-type billing; combined usage across every storage type determines the bill.

Storage billing pool, formula, and monitoring
Property | Details
Pool | Shared; all storage types contribute to a single billable total per project
Minimum charge (display) | A 1 GB minimum is applied to the Usage dashboard rollup as soon as any storage is used
Formula (display) | Displayed billable GB = max(1 GB, actual total usage) when usage > 0; 0 when nothing is stored. The underlying time-weighted credit deduction is computed from actual bytes; the 1 GB floor is a dashboard rounding rule, not a ledger floor.
Rate | Per-GB monthly rate from your service tier (visible on the Usage dashboard)
Cost | billable_gb * rate_per_gb
Monitoring | Storage breakdown by type, visible on the Usage dashboard
Quota | Projects may have storage quotas; usage is visible in project settings
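
The display formula and the cost line can be written out as a small calculation. The function names are hypothetical, and the result mirrors only what the dashboard shows; the actual credit deduction is time-weighted and computed from real bytes, which this sketch does not model.

```python
def displayed_billable_gb(actual_gb):
    """Dashboard figure: 0 when nothing is stored, otherwise at least 1 GB."""
    if actual_gb <= 0:
        return 0.0
    return max(1.0, actual_gb)


def displayed_monthly_cost(actual_gb, rate_per_gb):
    """Dashboard estimate: billable_gb * rate_per_gb.

    rate_per_gb is the per-GB monthly rate of your service tier; look it up
    on the Usage dashboard rather than hard-coding it.
    """
    return displayed_billable_gb(actual_gb) * rate_per_gb


if __name__ == "__main__":
    # The rate of 2.0 is a made-up number used only to show the arithmetic.
    for usage_gb in (0.0, 0.25, 3.5):
        print(usage_gb, displayed_billable_gb(usage_gb),
              displayed_monthly_cost(usage_gb, 2.0))
```

Under that made-up rate, 0.25 GB of actual usage displays as 1 GB billable, while 3.5 GB displays unchanged.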

Retention & Lifecycle

Lifecycle by layer: hot, cold, and database metadata
Layer | Duration | Eviction | Notes
Hot (Cache) | See Tier Windows | Auto-eviction after TTL | Non-blocking, best-effort
Cold (Object) | See Tier Windows | Automatic expiration | Encrypted at rest
Database Records | Indefinite | Manual deletion only | Metadata persists after storage expiration. Some cleanup paths (see Files, Uploads) may soft-delete the row.

Per-Type Limits

Per content-type item size and other limits
Type | Max Item Size | Other Limits
Models | Per-tier max_model_size_gb (Free 48 GB; CPU, GPU Shared, and GPU Dedicated unbounded; Self-Hosted 512 GB) | Delivered via the Uploads API; see Service Tiers
Completions | Varies by model context | -
Responses | Varies by model context | -
Conversations | 1 MB per item | 100 items/conversation, 16 metadata keys
Batch Files | 500 MB per file | -
Uploads | 200 MB per chunk (multi-part) | Single-shot file endpoints capped at 200 MB body; multi-part uploads have no documented total-size limit beyond the project storage quota

See the documentation for each API for the full set of constraints: Stored Completions, Responses API, Conversations, Batch API, Files API, Uploads API.
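
A client can check the fixed limits from the table before sending anything. The sketch below is a hedged client-side pre-flight check, not platform code. The function names are hypothetical. It assumes decimal megabytes (the table does not say MB versus MiB) and measures a conversation item as its UTF-8 JSON serialization, which may differ from how the server counts.

```python
import json

MB = 1_000_000  # assumption: decimal megabytes

CONVERSATION_ITEM_MAX_BYTES = 1 * MB
CONVERSATION_MAX_ITEMS = 100
CONVERSATION_MAX_METADATA_KEYS = 16
UPLOAD_CHUNK_MAX_BYTES = 200 * MB


def check_conversation(items, metadata):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(items) > CONVERSATION_MAX_ITEMS:
        problems.append(f"{len(items)} items exceeds {CONVERSATION_MAX_ITEMS}")
    if len(metadata) > CONVERSATION_MAX_METADATA_KEYS:
        problems.append(
            f"{len(metadata)} metadata keys exceeds {CONVERSATION_MAX_METADATA_KEYS}"
        )
    for index, item in enumerate(items):
        size = len(json.dumps(item).encode("utf-8"))
        if size > CONVERSATION_ITEM_MAX_BYTES:
            problems.append(f"item {index} is {size} bytes, over the 1 MB limit")
    return problems


def plan_upload_chunks(total_bytes, chunk_bytes=UPLOAD_CHUNK_MAX_BYTES):
    """Split a file of total_bytes into (offset, length) parts within the chunk cap."""
    if total_bytes < 0 or chunk_bytes <= 0:
        raise ValueError("total_bytes must be >= 0 and chunk_bytes > 0")
    return [
        (offset, min(chunk_bytes, total_bytes - offset))
        for offset in range(0, total_bytes, chunk_bytes)
    ]
```

For a 450 MB file, `plan_upload_chunks(450 * MB)` yields two 200 MB parts followed by one 50 MB part.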

Checking Usage

Storage usage is visible on the Usage dashboard: per-type breakdown (GB), combined total, billable amount (the greater of 1 GB or actual usage when usage is non-zero), and estimated monthly cost at your tier's per-GB rate. See Usage Tracking & Billing for the broader billing system.

Endpoint-level request counts and token usage are exposed via the usage API below. Per-type storage byte counts are dashboard-only.

Replace {project_id} with your project's external id (e.g. proj_abc123) and set XEROTIER_API_KEY to a valid project API key before running these examples.

curl
# Retrieve per-endpoint request counts and token usage
curl "https://xerotier.ai/{project_id}/v1/usage/endpoints" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"
Python
import os

import requests

project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}

response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/endpoints",
    headers=headers,
)
response.raise_for_status()
payload = response.json()

for endpoint in payload["data"]:
    print(
        f"{endpoint['endpointSlug']}: "
        f"{endpoint['requestCount']} requests, "
        f"{endpoint['totalInputTokens']} input tokens"
    )
Node.js
const projectId = "{project_id}";

const response = await fetch(
  `https://xerotier.ai/${projectId}/v1/usage/endpoints`,
  { headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` } }
);

if (!response.ok) {
  throw new Error(`Usage request failed: ${response.status}`);
}

const payload = await response.json();
payload.data.forEach(ep => {
  console.log(`${ep.endpointSlug}: ${ep.requestCount} requests`);
});