Usage Tracking & Billing
Every inference request leaves a row. Tokens for shared agents, hours for XIM nodes, one dashboard, two APIs, CSV on the side. Deleted endpoints keep their history.
- Router API: /{project_id}/v1/usage/*. Bearer auth. {"object":"list","data":[...]}. Script against this.
- Dashboard JSON: /usage/*. Cookie + CSRF. {items, next_cursor, ...}. Web UI only.
- Shared agents: Per-token billing. Cost on every row. Cached tokens discounted, not free.
- XIM nodes: Hourly billing. Row cost is 0. 730-hour period anchored to project creation.
Every inference request generates a usage record that preserves the endpoint slug and model name, so historical data stays queryable even after an endpoint is deleted.
Two billing models can operate simultaneously within a single project:
- Per-token billing for shared (platform-managed) agents
- Hourly billing for XIM (private) nodes
Your total cost is the sum of token costs from shared agents plus hourly costs from XIM nodes. On the dashboard this appears as total_cost = token_cost + hourly_cost.
What Is Tracked
Each inference request records the following metrics. Field names below match the public router API wire shape (snake_case).
| Field | Type | Description |
|---|---|---|
| input_tokens | integer | Number of input (prompt) tokens consumed |
| output_tokens | integer | Number of output (completion) tokens generated |
| cached_tokens | integer | Input tokens served from a cached KV state, whether the per-endpoint prefix cache or a shared KV cache. Reduces latency; see Prefix Cache Impact. |
| cost | number | Token cost in dollars (shared agents only; 0 for XIM). Exposed on the dashboard JSON surface only. |
| ttft_ms | integer | Time to first token in milliseconds (request-log surface) |
| latency_ms | integer | Total request latency in milliseconds (request-log surface) |
| status_code | integer | HTTP status code of the response |
| endpoint_slug | string | Endpoint identifier (preserved after endpoint deletion) |
| model_name | string | Model used for inference (preserved after endpoint deletion) |
| created_at | string | ISO 8601 timestamp of the request |
Shared vs XIM segmentation: Per-XIM segmentation in the dashboard is a derived view computed by joining each usage row to its source agent's tier at render time. It is not a recorded boolean on the row and is not exposed on the public router API.
Deleted Endpoints: When an endpoint is deleted, its usage history is preserved. The endpoint slug and model name are retained on each usage record, so historical data remains accessible and displays as "Deleted" in the usage table.
Billing Models
Xerotier supports two billing models that can operate simultaneously within a single project. For subscription management, credit purchases, and invoice details, see Billing & Subscriptions.
Per-Token Billing (Shared Agents)
Requests served by platform-managed shared agents are billed per token. The cost is calculated at request time based on the model's token pricing and recorded in the cost field of each usage record.
- Cost is calculated per request based on input and output token counts
- Pricing varies by model and service tier
- Cached input tokens are billed at a discounted rate, not at zero; the discount appears as cache_savings on the dashboard endpoint rollup
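The per-request calculation described above can be sketched as follows. This is an illustration only: the prices, the 50% cached-token discount, and the names TokenPricing and token_cost are hypothetical, and the sketch assumes cached_tokens is a subset of input_tokens. Actual rates vary by model and service tier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical prices in dollars per million tokens (not real rates)."""
    input_per_million: float
    output_per_million: float
    cached_discount: float  # assumed fraction taken off the input price for cached tokens


def token_cost(input_tokens, output_tokens, cached_tokens, pricing):
    """Return (cost, cache_savings) for one request under the stated assumptions."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    input_price = pricing.input_per_million / 1_000_000
    output_price = pricing.output_per_million / 1_000_000
    # What the request would cost if no input token had been served from cache.
    full_price_cost = input_tokens * input_price + output_tokens * output_price
    uncached_cost = (input_tokens - cached_tokens) * input_price
    cached_cost = cached_tokens * input_price * (1 - pricing.cached_discount)
    cost = uncached_cost + cached_cost + output_tokens * output_price
    return cost, full_price_cost - cost


# Made-up pricing: $1 per million input tokens, $2 per million output tokens.
pricing = TokenPricing(input_per_million=1.0, output_per_million=2.0, cached_discount=0.5)
cost, savings = token_cost(512, 128, 384, pricing)
print(f"cost=${cost:.6f} cache_savings=${savings:.6f}")
```

The savings figure here is the difference between the full-price cost and the discounted cost, which is one plausible reading of cache_savings rather than a confirmed definition.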
Hourly Billing (XIM Nodes)
XIM nodes are billed based on connected uptime rather than token usage. The cost field on per-request rows is always 0; billing is calculated separately from uptime tracking.
- Billing is based on the time your agent is connected to the platform
- Rates are determined by the agent's service tier; self-hosted CPU and free-tier agents may carry an hourly_rate of 0 and accrue no charge
- Only connected time is billed; disconnected periods are not charged
- Billing periods are 730-hour intervals anchored to your project creation date
Billing Period: Each billing period is exactly 730 hours (approximately one month), starting from the date your project was created. The current period's start and end dates are shown on the usage dashboard.
Usage Dashboard
The usage dashboard at /usage provides a comprehensive view of your project's consumption. It includes:
Summary Cards
- Token usage: Total input and output tokens, segmented by shared vs XIM
- Estimated cost: Combined token cost (shared) and hourly cost (XIM)
- Cache performance: Cache hit rate and total cached tokens
- Credits remaining: Current project credit balance (see Credits)
Charts
- 7-day usage chart: Daily token usage segmented by deployment type
- 7-day cost chart: Daily cost segmented by billing model
- Cache hit rate chart: Daily prefix cache hit rate trend
- Uptime charts: Daily connected/disconnected hours and weekly trend (XIM only)
Endpoint Usage Table
A paginated table showing per-endpoint usage breakdown including:
- Request count, input/output/cached tokens, and cache hit rate
- Token cost and hourly cost (where applicable)
- Endpoint status (Active, Deleted, Disconnected, Failed, or provisioning states)
- Connected hours and uptime percentage (for hourly-billed endpoints)
Agent Uptime Table
For projects with XIM nodes, the dashboard shows agent-level uptime data:
- Agent name and tier
- Hourly rate, connected hours, and uptime percentage
- Hourly cost and online/offline status
Agent uptime is HTML-only. There is no JSON API for per-agent uptime rows; for programmatic access use the CSV export or the per-project uptime summary route below.
Usage APIs
Two distinct usage surfaces exist. The router API is the OpenAI-compatible public surface; the dashboard JSON endpoints are cookie-authenticated and intended for the web UI only.
- Router API (bearer auth), mounted at /{project_id}/v1/usage/*. OpenAI-style {"object": "list", "data": [...]} envelopes with snake_case fields. This is the surface to script against.
- Dashboard JSON (cookie + CSRF), mounted at /usage/* under the logged-in dashboard. Uses {items, next_cursor, prev_cursor, has_more} envelopes. Not bearer-authenticated; intended for the web UI only.
Replace {project_id} with your project's external id (e.g. proj_abc123) and set XEROTIER_API_KEY to a valid project API key before running the examples below.
List Usage Events
GET /{project_id}/v1/usage/events
Retrieve usage events for the project within an optional time window.
| Parameter | Type | Description |
|---|---|---|
| since optional | string | ISO 8601 lower bound (inclusive). Defaults to 7 days ago. |
| until optional | string | ISO 8601 upper bound (exclusive). Defaults to now. |
| limit optional | integer | Page size, default 100, max 500. |
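A minimal sketch of building these query parameters from Python datetimes. The helper name usage_query_params is hypothetical, and the lower bound of 1 on limit is an assumption; only the maximum of 500 is documented. Timestamps are normalised to UTC with a trailing "Z", matching the form shown in the sample responses.

```python
from datetime import datetime, timedelta, timezone

MAX_LIMIT = 500  # documented maximum page size


def usage_query_params(since=None, until=None, limit=100):
    """Build query parameters for a usage route from timezone-aware datetimes."""
    if not 1 <= limit <= MAX_LIMIT:  # lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if since is not None and until is not None and until <= since:
        raise ValueError("until must be later than since")
    params = {"limit": limit}
    for name, value in (("since", since), ("until", until)):
        if value is None:
            continue  # omit the parameter so the server default applies
        if value.tzinfo is None:
            raise ValueError(f"{name} must be timezone-aware")
        params[name] = value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return params


# Example: the last 24 hours, 50 rows per page.
now = datetime.now(timezone.utc)
print(usage_query_params(since=now - timedelta(hours=24), until=now, limit=50))
```

The returned dictionary can be passed as the params argument of requests.get. Validating the window client-side avoids a round trip that would likely end in a 400 response.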
Response
{
  "object": "list",
  "data": [
    {
      "id": "11111111-2222-3333-4444-555555555555",
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "model_id": "bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
      "model_name": "llama-3.1-8b-instruct",
      "input_tokens": 512,
      "output_tokens": 128,
      "cached_tokens": 384,
      "status_code": 200,
      "created_at": "2026-06-15T14:30:00Z"
    }
  ]
}
curl "https://xerotier.ai/{project_id}/v1/usage/events?limit=50" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"

import os
import requests
# Replace with your project's external id, e.g. proj_abc123.
project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/events",
    headers=headers,
    params={"limit": 50},
    timeout=30,
)
response.raise_for_status()
payload = response.json()
# Print a one-line token summary per usage event.
for event in payload["data"]:
    print(
        f"{event['model_name']}: "
        f"{event['input_tokens']}in/{event['output_tokens']}out"
    )

// Replace with your project's external id, e.g. proj_abc123.
const projectId = "{project_id}";
const response = await fetch(
  `https://xerotier.ai/${projectId}/v1/usage/events?limit=50`,
  {
    headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` }
  }
);
if (!response.ok) {
  throw new Error(`Usage request failed: ${response.status}`);
}
const payload = await response.json();
// Print a one-line token summary per usage event.
payload.data.forEach(event => {
  console.log(
    `${event.model_name}: ` +
    `${event.input_tokens}in/${event.output_tokens}out`
  );
});
Each item carries cached_tokens (integer count of input tokens served from a cached KV state). The router API does not emit a precomputed cache_hit_rate; compute it client-side as cached_tokens / input_tokens when needed. The dashboard JSON surface, in contrast, emits cache_hit_rate as a percentage in the 0-100 range on its endpoint rollup.
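A minimal sketch of the client-side calculation, expressed as a percentage to match the dashboard's 0-100 range. The function name cache_hit_rate is hypothetical, the sample events are made up, and the token-weighted aggregation across events is an assumption about how a combined rate should be formed.

```python
def cache_hit_rate(events):
    """Return the token-weighted cache hit rate (0-100) over usage events.

    Returns None when no input tokens were recorded, to avoid dividing by zero.
    """
    total_input = sum(event["input_tokens"] for event in events)
    total_cached = sum(event["cached_tokens"] for event in events)
    if total_input == 0:
        return None
    return 100.0 * total_cached / total_input


# Made-up events shaped like items in the "data" array of /v1/usage/events.
events = [
    {"input_tokens": 512, "output_tokens": 128, "cached_tokens": 384},
    {"input_tokens": 488, "output_tokens": 64, "cached_tokens": 116},
]
print(cache_hit_rate(events))  # 500 cached of 1000 input tokens -> 50.0
```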
List Request Logs
GET /{project_id}/v1/usage/logs
Retrieve request logs for the project within an optional time window, optionally filtered by a free-text query.
| Parameter | Type | Description |
|---|---|---|
| since optional | string | ISO 8601 lower bound (inclusive). Defaults to 7 days ago. |
| until optional | string | ISO 8601 upper bound (exclusive). Defaults to now. |
| q optional | string | Free-text filter (max 200 characters). |
| limit optional | integer | Page size, default 100, max 500. |
Response
{
  "object": "list",
  "data": [
    {
      "id": "11111111-2222-3333-4444-555555555555",
      "request_id": "req_abc123",
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "method": "POST",
      "path": "/v1/chat/completions",
      "status_code": 200,
      "ttft_ms": 45,
      "latency_ms": 890,
      "model_name": "llama-3.1-8b-instruct",
      "created_at": "2026-06-15T14:30:00Z"
    }
  ]
}
curl "https://xerotier.ai/{project_id}/v1/usage/logs?limit=50&q=chat" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"

import os
import requests
# Replace with your project's external id, e.g. proj_abc123.
project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/logs",
    headers=headers,
    params={"limit": 50, "q": "chat"},
    timeout=30,
)
response.raise_for_status()
# Print status, path, and time to first token for each logged request.
for row in response.json()["data"]:
    print(f"{row['status_code']} {row['path']} ttft={row['ttft_ms']}ms")

// Replace with your project's external id, e.g. proj_abc123.
const projectId = "{project_id}";
const url = new URL(`https://xerotier.ai/${projectId}/v1/usage/logs`);
url.searchParams.set("limit", "50");
url.searchParams.set("q", "chat");
const response = await fetch(url, {
  headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` }
});
if (!response.ok) throw new Error(`logs ${response.status}`);
const payload = await response.json();
// Print status, path, and time to first token for each logged request.
for (const row of payload.data) {
  console.log(`${row.status_code} ${row.path} ttft=${row.ttft_ms}ms`);
}
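Once log rows are fetched, they can be reduced to a few health figures. The sketch below is one possible summary, not a platform feature: the function name summarize_logs is hypothetical, the rows are made up, and it assumes ttft_ms may be missing or null on some rows (for example failed requests), which the documentation does not state.

```python
from statistics import median


def summarize_logs(rows):
    """Summarise request-log rows: request count, error rate, median TTFT."""
    total = len(rows)
    # Treat any 4xx or 5xx status as an error.
    errors = sum(1 for row in rows if row["status_code"] >= 400)
    # Skip rows without a TTFT measurement (assumed possible).
    ttfts = [row["ttft_ms"] for row in rows if row.get("ttft_ms") is not None]
    return {
        "requests": total,
        "error_rate": errors / total if total else 0.0,
        "median_ttft_ms": median(ttfts) if ttfts else None,
    }


# Made-up rows shaped like items in the "data" array of /v1/usage/logs.
rows = [
    {"status_code": 200, "ttft_ms": 45, "latency_ms": 890},
    {"status_code": 200, "ttft_ms": 55, "latency_ms": 910},
    {"status_code": 500, "ttft_ms": None, "latency_ms": 30},
    {"status_code": 429, "ttft_ms": 65, "latency_ms": 70},
]
print(summarize_logs(rows))
```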
List Endpoint Usage
GET /{project_id}/v1/usage/endpoints
Retrieve per-endpoint usage aggregates for the project within an optional time window.
| Parameter | Type | Description |
|---|---|---|
| since optional | string | ISO 8601 lower bound (inclusive). Defaults to 7 days ago. |
| until optional | string | ISO 8601 upper bound (exclusive). Defaults to now. |
Response
{
  "object": "list",
  "data": [
    {
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "request_count": 1500,
      "total_input_tokens": 450000,
      "total_output_tokens": 120000,
      "total_cached_tokens": 85000
    }
  ]
}
curl "https://xerotier.ai/{project_id}/v1/usage/endpoints" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"

import os
import requests
# Replace with your project's external id, e.g. proj_abc123.
project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/endpoints",
    headers=headers,
    timeout=30,
)
response.raise_for_status()
# Print the request count for each endpoint.
for row in response.json()["data"]:
    print(f"{row['endpoint_slug']}: {row['request_count']} req")

// Replace with your project's external id, e.g. proj_abc123.
const projectId = "{project_id}";
const response = await fetch(
  `https://xerotier.ai/${projectId}/v1/usage/endpoints`,
  { headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` } }
);
if (!response.ok) throw new Error(`endpoints ${response.status}`);
const payload = await response.json();
// Print the request count for each endpoint.
for (const row of payload.data) {
  console.log(`${row.endpoint_slug}: ${row.request_count} req`);
}
The router endpoint rollup does not include monetary fields (cost, hourly_cost, cache_savings) or status labels. Those are computed for the dashboard view only; consult the Usage dashboard for cost figures.
Get Uptime Summary
GET /{project_id}/v1/usage/uptime
Retrieve a single-service uptime sample for the project's inference availability within an optional time window. Useful as a JSON alternative to the CSV export.
| Parameter | Type | Description |
|---|---|---|
| since optional | string | ISO 8601 lower bound (inclusive). Defaults to 7 days ago. |
| until optional | string | ISO 8601 upper bound (exclusive). Defaults to now. |
Response
{
  "services": {
    "inference": {
      "uptime_percent": 99.95
    }
  }
}
curl "https://xerotier.ai/{project_id}/v1/usage/uptime" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"

import os
import requests
# Replace with your project's external id, e.g. proj_abc123.
project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/uptime",
    headers=headers,
    timeout=30,
)
response.raise_for_status()
payload = response.json()
print(f"inference uptime: {payload['services']['inference']['uptime_percent']}%")

// Replace with your project's external id, e.g. proj_abc123.
const projectId = "{project_id}";
const response = await fetch(
  `https://xerotier.ai/${projectId}/v1/usage/uptime`,
  { headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` } }
);
if (!response.ok) throw new Error(`uptime ${response.status}`);
const payload = await response.json();
console.log(`inference uptime: ${payload.services.inference.uptime_percent}%`);
Dashboard JSON Endpoints
The /usage/events, /usage/logs, /usage/endpoints, and /usage/export/uptime routes are the cookie + CSRF authenticated dashboard JSON endpoints used by the web UI. They use a different envelope ({items, next_cursor, prev_cursor, has_more}) and additional fields (per-endpoint cost, hourly_cost, cache_hit_rate, status_label, etc.). They are not part of the bearer-authenticated public API. Use the router routes above for programmatic access.
Errors
The router usage routes follow the OpenAI error envelope. Common responses:
| Status | Type | When |
|---|---|---|
| 400 | invalid_request_error | Malformed since/until, until < since, or q longer than 200 characters. |
| 401 | authentication_error | Missing or invalid bearer token. |
| 403 | authorization_error | Token does not match the {project_id} in the path. |
| 500 | server_error | Internal storage failure. |
Error body
{
  "error": {
    "type": "invalid_request_error",
    "message": "until must be greater than since",
    "param": "until",
    "code": "invalid_time_range"
  }
}
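A sketch of turning this envelope into a Python exception. The names UsageApiError and parse_usage_error are hypothetical, and treating only 5xx responses as retryable is an assumption rather than documented platform guidance.

```python
class UsageApiError(Exception):
    """Hypothetical exception carrying the fields of the error envelope."""

    def __init__(self, status, error_type, message, param=None, code=None):
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message
        self.param = param
        self.code = code

    @property
    def retryable(self):
        # Assumption: only server-side failures (5xx) are worth retrying.
        return self.status >= 500


def parse_usage_error(status, body):
    """Build a UsageApiError from an HTTP status and a decoded JSON body."""
    error = body.get("error") if isinstance(body, dict) else None
    if not isinstance(error, dict):
        error = {}  # tolerate bodies that do not follow the envelope
    return UsageApiError(
        status,
        error.get("type", "unknown_error"),
        error.get("message", ""),
        error.get("param"),
        error.get("code"),
    )


# The sample error body shown above.
sample = {
    "error": {
        "type": "invalid_request_error",
        "message": "until must be greater than since",
        "param": "until",
        "code": "invalid_time_range",
    }
}
err = parse_usage_error(400, sample)
print(err, err.param, err.retryable)
```

With requests, this could be called as parse_usage_error(response.status_code, response.json()) whenever response.ok is false.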
Uptime Billing
XIM nodes use an uptime-based billing model where you are charged for the time your agent is connected to the platform.
How It Works
- Your XIM node connects to the Xerotier.ai control plane
- Connection and disconnection timestamps are recorded automatically
- Only connected time is billed; gaps between connections are free
- Costs are calculated using the hourly rate from your agent's service tier
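The steps above can be sketched as a small calculation. This is an illustration under stated assumptions, not the platform's billing code: the function name connected_hours is hypothetical, sessions are assumed not to overlap, time outside the billing period is clipped, a still-open session is counted up to "now", and the $0.50 hourly rate is made up.

```python
from datetime import datetime, timezone


def connected_hours(sessions, period_start, period_end, now):
    """Sum connected hours inside [period_start, period_end).

    sessions is a list of (connected_at, disconnected_at) pairs;
    disconnected_at is None while the node is still connected.
    """
    total_seconds = 0.0
    for connected_at, disconnected_at in sessions:
        end = disconnected_at if disconnected_at is not None else now
        # Clip each session to the billing period.
        start = max(connected_at, period_start)
        end = min(end, period_end)
        if end > start:
            total_seconds += (end - start).total_seconds()
    return total_seconds / 3600


utc = timezone.utc
period_start = datetime(2025, 1, 1, 0, 0, tzinfo=utc)
period_end = datetime(2025, 1, 31, 10, 0, tzinfo=utc)
sessions = [
    (datetime(2025, 1, 1, 0, 0, tzinfo=utc), datetime(2025, 1, 1, 12, 0, tzinfo=utc)),
    (datetime(2025, 1, 2, 0, 0, tzinfo=utc), datetime(2025, 1, 2, 6, 30, tzinfo=utc)),
    (datetime(2025, 1, 31, 8, 0, tzinfo=utc), None),  # still connected
]
hours = connected_hours(sessions, period_start, period_end,
                        now=datetime(2025, 2, 1, 0, 0, tzinfo=utc))
hourly_rate = 0.50  # hypothetical tier rate in dollars per hour
print(f"{hours} connected hours, cost ${hours * hourly_rate:.2f}")
```

The gap between the first two sessions contributes nothing to the total, which is the "only connected time is billed" rule in practice.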
Billing Periods
Billing periods are 730-hour intervals (approximately 30.4 days) anchored to your project creation date. For example, if your project was created on January 1st at 00:00 UTC in a non-leap year:
- Period 1: Jan 1 00:00 - Jan 31 10:00 UTC (730 hours)
- Period 2: Jan 31 10:00 - Mar 2 20:00 UTC (730 hours)
- And so on...
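The period boundaries can be reproduced with plain datetime arithmetic. In this sketch the function name billing_period is hypothetical, the year 2025 is an arbitrary non-leap year chosen for the example, and treating each period's end as exclusive is an assumption.

```python
from datetime import datetime, timedelta, timezone

PERIOD = timedelta(hours=730)


def billing_period(created_at, at):
    """Return (period_number, start, end) for the period containing `at`.

    Periods are consecutive 730-hour windows anchored to created_at;
    each window includes its start and excludes its end (assumed).
    """
    if at < created_at:
        raise ValueError("timestamp precedes project creation")
    index = (at - created_at) // PERIOD  # whole periods elapsed so far
    start = created_at + index * PERIOD
    return index + 1, start, start + PERIOD


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
for moment in (datetime(2025, 1, 15, tzinfo=timezone.utc),
               datetime(2025, 2, 10, tzinfo=timezone.utc)):
    number, start, end = billing_period(created, moment)
    print(f"Period {number}: {start:%b %d %H:%M} - {end:%b %d %H:%M} UTC")
```

Because the interval is a fixed number of hours rather than a calendar month, period boundaries drift against the calendar and fall at a different time of day each period.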
Tier Hourly Rates
Each service tier defines an hourly rate for XIM nodes. The rate is displayed on the usage dashboard. See Service Tiers for current pricing. Self-hosted CPU and free-tier agents may have a rate of 0, in which case no hourly charge accrues.
CSV Export
Uptime CSV export is available from the Usage dashboard only. The underlying route GET /usage/export/uptime is cookie + CSRF authenticated (dashboard-only); there is no bearer-authenticated equivalent. For JSON access to an uptime sample, use the public router /{project_id}/v1/usage/uptime route documented above.
CSV Columns
Resource Type,Resource Name,Resource ID,Tier,Hourly Rate,Connected At,Disconnected At,Connected Hours,Cost
The CSV file is suitable for import into spreadsheet applications or billing systems. Defaults to the current 730-hour billing period if no date range is selected in the dashboard.
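As an example of downstream processing, the sketch below totals the Cost column per tier using Python's csv module. The header row is the documented one; the data rows, the value formats (plain decimals, ISO 8601 timestamps), and the function name cost_by_tier are made up for illustration and may not match a real export exactly.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Documented header followed by made-up rows.
SAMPLE_CSV = """\
Resource Type,Resource Name,Resource ID,Tier,Hourly Rate,Connected At,Disconnected At,Connected Hours,Cost
agent,node-a,res_001,standard,0.50,2025-01-01T00:00:00Z,2025-01-01T12:00:00Z,12.0,6.00
agent,node-a,res_001,standard,0.50,2025-01-02T00:00:00Z,2025-01-02T06:30:00Z,6.5,3.25
agent,node-b,res_002,free,0,2025-01-03T00:00:00Z,2025-01-03T01:00:00Z,1.0,0.00
"""


def cost_by_tier(csv_text):
    """Sum the Cost column per Tier, using Decimal to avoid float rounding."""
    totals = defaultdict(Decimal)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Tier"]] += Decimal(row["Cost"])
    return dict(totals)


for tier, total in cost_by_tier(SAMPLE_CSV).items():
    print(f"{tier}: ${total}")
```

For a downloaded file, read its contents into a string, or adapt the function to pass an open file object to csv.DictReader directly.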
Prefix Cache Impact
When prefix caching is enabled on your endpoint, some input tokens may be served from cache rather than being recomputed. These are tracked as cached_tokens in usage records. The same field also counts tokens reused from a shared KV cache where applicable, so it is a "cached input tokens, however served" counter rather than a per-endpoint prefix-cache-only counter.
- Cache hit rate is displayed on the usage dashboard as a daily trend chart
- Per-endpoint cache hit rate is shown in the dashboard endpoint usage table as a percentage (0-100). The router API does not emit a precomputed rate; compute cached_tokens / input_tokens client-side if needed.
- Cached tokens reduce latency (especially TTFT). For shared agents they are billed at a discounted rate, not at zero; for XIM nodes they have no per-token billing impact.
See Prefix Caching for details on how to enable and optimize caching.
Frequently Asked Questions
What happens to usage data when I delete an endpoint?
Usage data is preserved. Each usage record retains the endpoint slug and model name, so historical data remains accessible. Deleted endpoints appear with a "Deleted" status label in the usage table.
How are billing periods calculated?
Billing periods are 730-hour intervals starting from your project creation date. The current period's start and end dates are displayed on the usage dashboard.
Can I have both shared and XIM nodes in the same project?
Yes. Shared agents are billed per token and XIM nodes are billed per hour of connected uptime. The usage dashboard segments these separately so you can see costs from each billing model.
Do cached tokens cost money?
It depends on the billing model. For shared agents, cached input tokens are still billed but at a discounted rate (the discount appears as cache_savings on the dashboard endpoint rollup). For XIM nodes there is no per-token billing, so cached tokens have no direct cost impact; only the hourly rate applies. In both cases, cached tokens reduce latency by avoiding KV-cache recomputation for previously seen prompt prefixes.
What if my XIM node disconnects temporarily?
Only connected time is billed. Disconnection gaps are not charged. Each connection and disconnection event is recorded to calculate your actual connected hours.
How do credits and subscriptions work?
Credits are used for per-token inference billing on shared agents. For details on purchasing credits, managing subscriptions, and handling delinquent accounts, see Billing & Subscriptions. Note: free-tier projects with overdue balances have their dashboard cost figures clamped at the delinquency anchor (billed_token_cost_since) and continue accruing usage without further token-cost accrual until the account is settled.