Usage Tracking & Billing

Every inference request generates a usage record that preserves the endpoint slug and model name, so historical data stays queryable even after an endpoint is deleted.

Two billing models can operate simultaneously within a single project:

Per-token billing for shared (platform-managed) agents
Hourly billing for XIM (private) nodes

Your total cost is the sum of token costs from shared agents plus hourly costs from XIM nodes. On the dashboard this appears as total_cost = token_cost + hourly_cost.

What Is Tracked

Each inference request records the following metrics. Field names below match the public router API wire shape (snake_case).

Field	Type	Description
input_tokens	integer	Number of input (prompt) tokens consumed
output_tokens	integer	Number of output (completion) tokens generated
cached_tokens	integer	Input tokens served from a cached KV state, whether the per-endpoint prefix cache or a shared KV cache. Reduces latency; see Prefix Cache Impact.
cost	number	Token cost in dollars (shared agents only; 0 for XIM). Exposed on the dashboard JSON surface only.
ttft_ms	integer	Time to first token in milliseconds (request-log surface)
latency_ms	integer	Total request latency in milliseconds (request-log surface)
status_code	integer	HTTP status code of the response
endpoint_slug	string	Endpoint identifier (preserved after endpoint deletion)
model_name	string	Model used for inference (preserved after endpoint deletion)
created_at	string	ISO 8601 timestamp of the request

Shared vs XIM segmentation: The shared/XIM split in the dashboard is a derived view, computed at render time by joining each usage row to its source agent's tier. It is not a recorded boolean on the row and is not exposed on the public router API.

Deleted Endpoints: When an endpoint is deleted, its usage history is preserved. The endpoint slug and model name are retained on each usage record, so historical data remains accessible and displays as "Deleted" in the usage table.

Billing Models

Xerotier supports two billing models that can operate simultaneously within a single project. For subscription management, credit purchases, and invoice details, see Billing & Subscriptions.

Per-Token Billing (Shared Agents)

Requests served by platform-managed shared agents are billed per token. The cost is calculated at request time based on the model's token pricing and recorded in the cost field of each usage record.

Cost is calculated per request based on input and output token counts
Pricing varies by model and service tier
Cached input tokens are billed at a discounted rate, not at zero; the discount appears as cache_savings on the dashboard endpoint rollup

Hourly Billing (XIM Nodes)

XIM nodes are billed based on connected uptime rather than token usage. The cost field on per-request rows is always 0; billing is calculated separately from uptime tracking.

Billing is based on the time your agent is connected to the platform
Rates are determined by the agent's service tier; self-hosted CPU and free-tier agents may carry an hourly_rate of 0 and accrue no charge
Only connected time is billed; disconnected periods are not charged
Billing periods are 730-hour intervals anchored to your project creation date

Billing Period: Each billing period is exactly 730 hours (approximately one month), starting from the date your project was created. The current period's start and end dates are shown on the usage dashboard.

Usage Dashboard

The usage dashboard at /usage provides a comprehensive view of your project's consumption. It includes:

Summary Cards

Token usage: Total input and output tokens, segmented by shared vs XIM
Estimated cost: Combined token cost (shared) and hourly cost (XIM)
Cache performance: Cache hit rate and total cached tokens
Credits remaining: Current project credit balance (see Credits)

Charts

7-day usage chart: Daily token usage segmented by deployment type
7-day cost chart: Daily cost segmented by billing model
Cache hit rate chart: Daily prefix cache hit rate trend
Uptime charts: Daily connected/disconnected hours and weekly trend (XIM only)

Endpoint Usage Table

A paginated table showing per-endpoint usage breakdown including:

Request count, input/output/cached tokens, and cache hit rate
Token cost and hourly cost (where applicable)
Endpoint status (Active, Deleted, Disconnected, Failed, or provisioning states)
Connected hours and uptime percentage (for hourly-billed endpoints)

Agent Uptime Table

For projects with XIM nodes, the dashboard shows agent-level uptime data:

Agent name and tier
Hourly rate, connected hours, and uptime percentage
Hourly cost and online/offline status

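Agent uptime is HTML-only. There is no JSON API for per-agent uptime rows. For offline analysis of per-agent uptime, download the CSV export from the dashboard; for bearer-authenticated JSON, use the per-project uptime summary route below, which returns a project-level sample rather than per-agent rows.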

Usage APIs

Two distinct usage surfaces exist. The router API is the OpenAI-compatible public surface; the dashboard JSON endpoints are cookie-authenticated and intended for the web UI only.

Router API (bearer auth), mounted at /{project_id}/v1/usage/*. OpenAI-style {"object": "list", "data": [...]} envelopes with snake_case fields. This is the surface to script against.
Dashboard JSON (cookie + CSRF), mounted at /usage/* under the logged-in dashboard. Uses {items, next_cursor, prev_cursor, has_more} envelopes. Not bearer-authenticated; intended for the web UI only.

Replace {project_id} with your project's external id (e.g. proj_abc123) and set XEROTIER_API_KEY to a valid project API key before running the examples below.

List Usage Events

GET /{project_id}/v1/usage/events

Retrieve usage events for the project within an optional time window.

Parameter	Type	Description
since (optional)	string	ISO 8601 lower bound (inclusive). Defaults to 7 days ago.
until (optional)	string	ISO 8601 upper bound (exclusive). Defaults to now.
limit (optional)	integer	Page size, default 100, max 500.

Response

{
  "object": "list",
  "data": [
    {
      "id": "11111111-2222-3333-4444-555555555555",
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "model_id": "bbbbbbbb-cccc-dddd-eeee-ffffffffffff",
      "model_name": "llama-3.1-8b-instruct",
      "input_tokens": 512,
      "output_tokens": 128,
      "cached_tokens": 384,
      "status_code": 200,
      "created_at": "2026-06-15T14:30:00Z"
    }
  ]
}
                    

curl

curl "https://xerotier.ai/{project_id}/v1/usage/events?limit=50" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"
                

Python

import os
import requests

project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/events",
    headers=headers,
    params={"limit": 50},
)
response.raise_for_status()
payload = response.json()
for event in payload["data"]:
    print(
        f"{event['model_name']}: "
        f"{event['input_tokens']}in/{event['output_tokens']}out"
    )
                

Node.js

const projectId = "{project_id}";
const response = await fetch(
    `https://xerotier.ai/${projectId}/v1/usage/events?limit=50`,
    {
        headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` }
    }
);
if (!response.ok) {
    throw new Error(`Usage request failed: ${response.status}`);
}
const payload = await response.json();
payload.data.forEach(event => {
    console.log(
        `${event.model_name}: ` +
        `${event.input_tokens}in/${event.output_tokens}out`
    );
});
                

Each item carries cached_tokens (integer count of input tokens served from a cached KV state). The router API does not emit a precomputed cache_hit_rate; compute it client-side as cached_tokens / input_tokens when needed. The dashboard JSON surface, in contrast, emits cache_hit_rate as a percentage in the 0-100 range on its endpoint rollup.
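A minimal sketch of that client-side computation in Python, assuming event objects shaped like the response above. The helper names cache_hit_rate and aggregate_cache_hit_rate are hypothetical. Results are scaled to 0-100 to match the dashboard's percentage, and events with no input tokens return None instead of dividing by zero.

```python
from typing import Iterable, Mapping, Optional


def cache_hit_rate(event: Mapping[str, int]) -> Optional[float]:
    """Per-event cache hit rate as a percentage (0-100), or None if there were no input tokens."""
    input_tokens = event["input_tokens"]
    if input_tokens <= 0:
        return None
    return 100.0 * event["cached_tokens"] / input_tokens


def aggregate_cache_hit_rate(events: Iterable[Mapping[str, int]]) -> Optional[float]:
    """Token-weighted hit rate across many events: sum(cached) / sum(input) * 100."""
    total_input = 0
    total_cached = 0
    for event in events:
        total_input += event["input_tokens"]
        total_cached += event["cached_tokens"]
    if total_input <= 0:
        return None
    return 100.0 * total_cached / total_input


# The sample event above: 384 of 512 input tokens were served from cache.
sample = {"input_tokens": 512, "cached_tokens": 384, "output_tokens": 128}
print(cache_hit_rate(sample))  # 75.0
```

The aggregate weights each event by its input tokens rather than averaging per-event rates, so long prompts count proportionally; whether the dashboard computes its rollup the same way is not specified here.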

List Request Logs

GET /{project_id}/v1/usage/logs

Retrieve request logs for the project within an optional time window, optionally filtered by a free-text query.

Parameter	Type	Description
since (optional)	string	ISO 8601 lower bound (inclusive). Defaults to 7 days ago.
until (optional)	string	ISO 8601 upper bound (exclusive). Defaults to now.
q (optional)	string	Free-text filter (max 200 characters).
limit (optional)	integer	Page size, default 100, max 500.

Response

{
  "object": "list",
  "data": [
    {
      "id": "11111111-2222-3333-4444-555555555555",
      "request_id": "req_abc123",
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "method": "POST",
      "path": "/v1/chat/completions",
      "status_code": 200,
      "ttft_ms": 45,
      "latency_ms": 890,
      "model_name": "llama-3.1-8b-instruct",
      "created_at": "2026-06-15T14:30:00Z"
    }
  ]
}
                    

curl

curl "https://xerotier.ai/{project_id}/v1/usage/logs?limit=50&q=chat" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"
                

Python

import os
import requests

project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/logs",
    headers=headers,
    params={"limit": 50, "q": "chat"},
)
response.raise_for_status()
for row in response.json()["data"]:
    print(f"{row['status_code']} {row['path']} ttft={row['ttft_ms']}ms")
                

Node.js

const projectId = "{project_id}";
const url = new URL(`https://xerotier.ai/${projectId}/v1/usage/logs`);
url.searchParams.set("limit", "50");
url.searchParams.set("q", "chat");
const response = await fetch(url, {
    headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` }
});
if (!response.ok) throw new Error(`logs ${response.status}`);
const payload = await response.json();
for (const row of payload.data) {
    console.log(`${row.status_code} ${row.path} ttft=${row.ttft_ms}ms`);
}
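Because malformed since/until values and oversized q strings are rejected with a 400, it can help to validate them before sending. The sketch below, using the hypothetical helpers build_log_params and iso8601_utc, formats timezone-aware datetimes as ISO 8601 UTC strings and checks the documented limits (page size up to 500, q up to 200 characters). The lower bound of 1 for limit is an assumption, and the sketch rejects until <= since, which is at least as strict as the until < since rule in the Errors table.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

MAX_LIMIT = 500      # documented maximum page size
MAX_QUERY_LEN = 200  # documented maximum length of q


def iso8601_utc(moment: datetime) -> str:
    """Format an aware datetime as ISO 8601 in UTC, e.g. 2026-06-15T14:30:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_log_params(
    since: datetime,
    until: datetime,
    q: Optional[str] = None,
    limit: int = 100,
) -> Dict[str, str]:
    """Build query parameters for GET /{project_id}/v1/usage/logs, checking documented limits."""
    if since.tzinfo is None or until.tzinfo is None:
        raise ValueError("since and until must be timezone-aware")
    if until <= since:
        raise ValueError("until must be later than since")
    if not 1 <= limit <= MAX_LIMIT:  # the lower bound of 1 is an assumption
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {
        "since": iso8601_utc(since),
        "until": iso8601_utc(until),
        "limit": str(limit),
    }
    if q is not None:
        if len(q) > MAX_QUERY_LEN:
            raise ValueError(f"q must be at most {MAX_QUERY_LEN} characters")
        params["q"] = q
    return params


# Last 24 hours of logs matching "chat", 50 rows per page.
now = datetime.now(timezone.utc)
print(build_log_params(now - timedelta(hours=24), now, q="chat", limit=50))
```

The returned dictionary can be passed as params= to requests.get in the Python example above.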
                

List Endpoint Usage

GET /{project_id}/v1/usage/endpoints

Retrieve per-endpoint usage aggregates for the project within an optional time window.

Parameter	Type	Description
since (optional)	string	ISO 8601 lower bound (inclusive). Defaults to 7 days ago.
until (optional)	string	ISO 8601 upper bound (exclusive). Defaults to now.

Response

{
  "object": "list",
  "data": [
    {
      "endpoint_id": "66666666-7777-8888-9999-aaaaaaaaaaaa",
      "endpoint_name": "My Endpoint",
      "endpoint_slug": "my-endpoint",
      "request_count": 1500,
      "total_input_tokens": 450000,
      "total_output_tokens": 120000,
      "total_cached_tokens": 85000
    }
  ]
}
                    

curl

curl "https://xerotier.ai/{project_id}/v1/usage/endpoints" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"
                

Python

import os
import requests

project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/endpoints",
    headers=headers,
)
response.raise_for_status()
for row in response.json()["data"]:
    print(f"{row['endpoint_slug']}: {row['request_count']} req")
                

Node.js

const projectId = "{project_id}";
const response = await fetch(
    `https://xerotier.ai/${projectId}/v1/usage/endpoints`,
    { headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` } }
);
if (!response.ok) throw new Error(`endpoints ${response.status}`);
const payload = await response.json();
for (const row of payload.data) {
    console.log(`${row.endpoint_slug}: ${row.request_count} req`);
}
                

The router endpoint rollup does not include monetary fields (cost, hourly_cost, cache_savings) or status labels. Those are computed for the dashboard view only; consult the Usage dashboard for cost figures.
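Although the router rollup omits cost fields, simple ratios can still be derived from its token totals. The hypothetical summarize_endpoints helper below computes a cache hit rate (0-100, the same scale the dashboard uses) and average output tokens per request for each row, then sorts endpoints by request count. It assumes rows shaped like the response above.

```python
from typing import Any, Dict, List, Mapping, Optional


def summarize_endpoints(rows: List[Mapping[str, Any]]) -> List[Dict[str, Any]]:
    """Derive per-endpoint ratios from router rollup rows, busiest endpoints first."""
    summaries: List[Dict[str, Any]] = []
    for row in rows:
        request_count = row["request_count"]
        input_tokens = row["total_input_tokens"]
        # Percentage (0-100); None when the endpoint saw no input tokens.
        hit_rate: Optional[float] = (
            100.0 * row["total_cached_tokens"] / input_tokens if input_tokens else None
        )
        # Mean completion length; None when there were no requests.
        avg_output: Optional[float] = (
            row["total_output_tokens"] / request_count if request_count else None
        )
        summaries.append({
            "endpoint_slug": row["endpoint_slug"],
            "request_count": request_count,
            "cache_hit_rate": hit_rate,
            "avg_output_tokens": avg_output,
        })
    summaries.sort(key=lambda s: s["request_count"], reverse=True)
    return summaries
```

For the sample row above this yields a cache hit rate of roughly 18.89% (85,000 / 450,000) and 80 output tokens per request.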

Get Uptime Summary

GET /{project_id}/v1/usage/uptime

Retrieve a single-service uptime sample for the project's inference availability within an optional time window. Useful as a JSON alternative to the CSV export.

Parameter	Type	Description
since (optional)	string	ISO 8601 lower bound (inclusive). Defaults to 7 days ago.
until (optional)	string	ISO 8601 upper bound (exclusive). Defaults to now.

Response

{
  "services": {
    "inference": {
      "uptime_percent": 99.95
    }
  }
}
                    

curl

curl "https://xerotier.ai/{project_id}/v1/usage/uptime" \
  -H "Authorization: Bearer $XEROTIER_API_KEY"
                

Python

import os
import requests

project_id = "{project_id}"
headers = {"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"}
response = requests.get(
    f"https://xerotier.ai/{project_id}/v1/usage/uptime",
    headers=headers,
)
response.raise_for_status()
payload = response.json()
print(f"inference uptime: {payload['services']['inference']['uptime_percent']}%")
                

Node.js

const projectId = "{project_id}";
const response = await fetch(
    `https://xerotier.ai/${projectId}/v1/usage/uptime`,
    { headers: { "Authorization": `Bearer ${process.env.XEROTIER_API_KEY}` } }
);
if (!response.ok) throw new Error(`uptime ${response.status}`);
const payload = await response.json();
console.log(`inference uptime: ${payload.services.inference.uptime_percent}%`);
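To reason about the uptime figure in absolute terms, a percentage can be converted into minutes of unavailability. This sketch assumes uptime_percent covers the entire [since, until) window that was queried; the downtime_minutes helper is hypothetical. Over the default 7-day window (10,080 minutes), the sample value of 99.95 corresponds to about 5.04 minutes.

```python
from datetime import datetime, timezone


def downtime_minutes(uptime_percent: float, since: datetime, until: datetime) -> float:
    """Minutes of unavailability implied by uptime_percent over [since, until).

    Assumes the percentage covers the whole queried window.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be within 0-100")
    window_minutes = (until - since).total_seconds() / 60.0
    return window_minutes * (100.0 - uptime_percent) / 100.0


# Sample value from the response above over a 7-day window.
since = datetime(2026, 6, 8, tzinfo=timezone.utc)
until = datetime(2026, 6, 15, tzinfo=timezone.utc)
print(round(downtime_minutes(99.95, since, until), 2))  # 5.04
```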
                

Dashboard JSON Endpoints

The /usage/events, /usage/logs, /usage/endpoints, and /usage/export/uptime routes are the cookie + CSRF authenticated dashboard JSON endpoints used by the web UI. They use a different envelope ({items, next_cursor, prev_cursor, has_more}) and additional fields (per-endpoint cost, hourly_cost, cache_hit_rate, status_label, etc.). They are not part of the bearer-authenticated public API. Use the router routes above for programmatic access.

Errors

The router usage routes follow the OpenAI error envelope. Common responses:

Status	Type	When
400	`invalid_request_error`	Malformed `since`/`until`, `until < since`, or `q` longer than 200 characters.
401	`authentication_error`	Missing or invalid bearer token.
403	`authorization_error`	Token does not match the `{project_id}` in the path.
500	`server_error`	Internal storage failure.

Error body

{
  "error": {
    "type": "invalid_request_error",
    "message": "until must be greater than since",
    "param": "until",
    "code": "invalid_time_range"
  }
}
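A minimal Python sketch for surfacing these errors, assuming the envelope shown above. UsageAPIError and parse_error are hypothetical names, and the unknown_error fallback type is a placeholder for bodies that are not JSON (for example, a proxy error page).

```python
import json
from typing import Optional


class UsageAPIError(Exception):
    """Hypothetical client-side exception carrying the fields of the error envelope."""

    def __init__(self, status: int, error_type: str, message: str,
                 param: Optional[str] = None, code: Optional[str] = None) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.type = error_type
        self.message = message
        self.param = param
        self.code = code


def parse_error(status: int, body: str) -> UsageAPIError:
    """Build a UsageAPIError from a response body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return UsageAPIError(
        status,
        error.get("type", "unknown_error"),  # placeholder type for non-JSON bodies
        error.get("message", body),
        error.get("param"),
        error.get("code"),
    )
```

With requests, a caller might write `if not response.ok: raise parse_error(response.status_code, response.text)` in place of raise_for_status(), so that param and code are preserved.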
                    

Uptime Billing

XIM nodes use an uptime-based billing model where you are charged for the time your agent is connected to the platform.

How It Works

Your XIM node connects to the Xerotier.ai control plane
Connection and disconnection timestamps are recorded automatically
Only connected time is billed; gaps between connections are free
Costs are calculated using the hourly rate from your agent's service tier
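The steps above can be sketched as follows. This is an illustration, not the platform's billing code: connected_hours and uptime_cost are hypothetical helpers, each session is a (connected_at, disconnected_at) pair with None meaning the agent is still connected, sessions for one agent are assumed not to overlap, and time is clipped to the billing period. The 2.0 hourly rate is a made-up value, not real pricing.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

# (connected_at, disconnected_at); disconnected_at is None while still connected.
Session = Tuple[datetime, Optional[datetime]]


def connected_hours(sessions: Iterable[Session], period_start: datetime, period_end: datetime) -> float:
    """Sum connected time clipped to [period_start, period_end); gaps contribute nothing."""
    total_seconds = 0.0
    for connected_at, disconnected_at in sessions:
        end = disconnected_at if disconnected_at is not None else period_end
        start = max(connected_at, period_start)
        end = min(end, period_end)
        if end > start:
            total_seconds += (end - start).total_seconds()
    return total_seconds / 3600.0


def uptime_cost(hours: float, hourly_rate: float) -> float:
    """Hourly billing: connected hours times the tier's hourly rate (a rate of 0 accrues nothing)."""
    return hours * hourly_rate


period_start = datetime(2026, 1, 1, tzinfo=timezone.utc)
period_end = period_start + timedelta(hours=730)
sessions = [(datetime(2026, 1, 1, tzinfo=timezone.utc), datetime(2026, 1, 1, 8, tzinfo=timezone.utc))]
hours = connected_hours(sessions, period_start, period_end)
print(hours, uptime_cost(hours, hourly_rate=2.0))  # 8.0 16.0 (hypothetical rate)
```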

Billing Periods

Billing periods are 730-hour intervals (approximately 30.4 days) anchored to your project creation date. For example, if your project was created on January 1st of a non-leap year at 00:00 UTC:

Period 1: Jan 1 00:00 - Jan 31 10:00 UTC (730 hours)
Period 2: Jan 31 10:00 - Mar 2 20:00 UTC (730 hours)
And so on...
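Because periods are a fixed number of hours rather than calendar months, the period containing any timestamp can be found with integer division. A sketch, assuming periods are half-open [start, end), so that a boundary instant such as Jan 31 10:00 belongs to the next period; billing_period is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

PERIOD = timedelta(hours=730)


def billing_period(project_created_at: datetime, at: datetime) -> Tuple[int, datetime, datetime]:
    """Return (1-based period number, start, end) of the 730-hour period containing `at`."""
    if at < project_created_at:
        raise ValueError("timestamp precedes project creation")
    index = (at - project_created_at) // PERIOD  # timedelta // timedelta -> int
    start = project_created_at + index * PERIOD
    return index + 1, start, start + PERIOD


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
print(billing_period(created, datetime(2026, 2, 10, tzinfo=timezone.utc)))
```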

Tier Hourly Rates

Each service tier defines an hourly rate for XIM nodes. The rate is displayed on the usage dashboard. See Service Tiers for current pricing. Self-hosted CPU and free-tier agents may have a rate of 0, in which case no hourly charge accrues.

CSV Export

Uptime CSV export is available from the Usage dashboard only. The underlying route GET /usage/export/uptime is cookie + CSRF authenticated (dashboard-only); there is no bearer-authenticated equivalent. For JSON access to an uptime sample, use the public router /{project_id}/v1/usage/uptime route documented above.

CSV Columns

CSV

Resource Type,Resource Name,Resource ID,Tier,Hourly Rate,Connected At,Disconnected At,Connected Hours,Cost
                

The CSV file is suitable for import into spreadsheet applications or billing systems. If no date range is selected in the dashboard, the export defaults to the current 730-hour billing period.
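A sketch for totalling the export per resource with Python's csv module, assuming the column names listed above. The helper totals_by_resource is hypothetical, and stripping an optional leading "$" from numeric cells is a guess about the export's number format.

```python
import csv
from typing import Any, Dict, Iterable


def _to_float(cell: str) -> float:
    """Parse a numeric cell; an optional leading '$' is stripped (assumed format)."""
    cleaned = (cell or "").strip().lstrip("$")
    return float(cleaned) if cleaned else 0.0


def totals_by_resource(lines: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Sum Connected Hours and Cost per Resource ID from the uptime CSV export."""
    totals: Dict[str, Dict[str, Any]] = {}
    for row in csv.DictReader(lines):
        entry = totals.setdefault(
            row["Resource ID"],
            {"name": row["Resource Name"], "tier": row["Tier"], "hours": 0.0, "cost": 0.0},
        )
        entry["hours"] += _to_float(row["Connected Hours"])
        entry["cost"] += _to_float(row["Cost"])
    return totals
```

Open the downloaded file with newline="" as the csv module recommends, for example `with open("uptime.csv", newline="") as f: totals = totals_by_resource(f)`; the filename is a placeholder.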

Prefix Cache Impact

When prefix caching is enabled on your endpoint, some input tokens may be served from cache rather than being recomputed. These are tracked as cached_tokens in usage records. The same field also counts tokens reused from a shared KV cache where applicable; it is a "cached input tokens, however served" counter rather than a per-endpoint prefix-cache-only counter.

Cache hit rate is displayed on the usage dashboard as a daily trend chart
Per-endpoint cache hit rate is shown in the dashboard endpoint usage table as a percentage (0-100). The router API does not emit a precomputed rate; compute cached_tokens / input_tokens client-side if needed.
Cached tokens reduce latency (especially TTFT). For shared agents they are billed at a discounted rate, not at zero; for XIM nodes they have no per-token billing impact.

See Prefix Caching for details on how to enable and optimize caching.

Frequently Asked Questions

What happens to usage data when I delete an endpoint?

Usage data is preserved. Each usage record retains the endpoint slug and model name, so historical data remains accessible. Deleted endpoints appear with a "Deleted" status label in the usage table.

How are billing periods calculated?

Billing periods are 730-hour intervals starting from your project creation date. The current period's start and end dates are displayed on the usage dashboard.

Can I have both shared and XIM nodes in the same project?

Yes. Shared agents are billed per token and XIM nodes are billed per hour of connected uptime. The usage dashboard segments these separately so you can see costs from each billing model.

Do cached tokens cost money?

It depends on the billing model. For shared agents, cached input tokens are still billed, but at a discounted rate (the discount appears as cache_savings on the dashboard endpoint rollup). For XIM nodes there is no per-token billing, so cached tokens have no direct cost impact; only the hourly rate applies. In both cases, cached tokens reduce latency, especially TTFT, by avoiding recomputation of the KV cache for previously seen prompt prefixes.

What if my XIM node disconnects temporarily?

Only connected time is billed. Disconnection gaps are not charged. Each connection and disconnection event is recorded to calculate your actual connected hours.

How do credits and subscriptions work?

Credits are used for per-token inference billing on shared agents. For details on purchasing credits, managing subscriptions, and handling delinquent accounts, see Billing & Subscriptions. Note: for free-tier projects with overdue balances, dashboard cost figures are clamped at the delinquency anchor (billed_token_cost_since); usage continues to be recorded, but no further token cost accrues until the account is settled.