SLO Tracking

Define service level objectives, track compliance, and use per-request latency targets.

Overview

Service Level Objectives (SLOs) define target performance for your inference endpoints. Xerotier provides two complementary SLO mechanisms:

  • SLO Management API -- Define SLO targets, track compliance over time, and view project-wide summaries through a full CRUD API.
  • Per-request SLO headers -- Pass latency targets on individual requests to hint routing preferences in real time.

SLO Management API

All paths are relative to your endpoint base URL: https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG

Endpoints

Method  Path                              Description
POST    /v1/slos                          Create an SLO definition
GET     /v1/slos                          List SLO definitions
GET     /v1/slos/{id}                     Get an SLO definition
PUT     /v1/slos/{id}                     Update an SLO definition
DELETE  /v1/slos/{id}                     Delete an SLO definition
GET     /v1/slos/{id}/history             Get compliance history
POST    /v1/slos/{id}/calculate           Trigger manual calculation
GET     /v1/slos/summary                  Get project SLO summary
GET     /v1/endpoints/{endpoint_id}/slos  List SLOs for an endpoint

Create SLO

POST /v1/slos

Parameter               Type     Description
name (required)         string   Human-readable name (1-128 ASCII printable characters).
description (optional)  string   Optional description (max 512 characters).
metric (required)       string   Metric to track. See Supported Metrics.
target (required)       number   Target threshold value (must be positive).
comparison (required)   string   Comparison operator. See Comparison Operators.
window_days (required)  integer  Rolling evaluation window in days (1-90).
endpoint_id (optional)  string   Endpoint UUID to scope the SLO. Omit for project-wide SLOs.

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "API Availability",
    "description": "99.9% availability for production API",
    "metric": "availability",
    "target": 99.9,
    "comparison": "greater_than_or_equal",
    "window_days": 30
  }'

Response

{
  "id": "00000000-1111-0000-1111-000000000000",
  "object": "slo",
  "name": "API Availability",
  "description": "99.9% availability for production API",
  "metric": "availability",
  "target": 99.9,
  "comparison": "greater_than_or_equal",
  "window_days": 30,
  "endpoint_id": null,
  "is_active": true,
  "latest_compliance": null,
  "created_at": 1709000000,
  "updated_at": 1709000000
}

List SLOs

GET /v1/slos

Returns a paginated list of SLO definitions. Use limit and after for pagination.

curl
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos?limit=10" \
  -H "Authorization: Bearer xero_myproject_your_api_key"
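
The limit and after parameters suggest cursor-style pagination. The following is a minimal sketch of a loop that walks every page. It assumes the list response has the same envelope as the compliance history response (data, last_id, has_more) and that after takes the ID of the last item on the previous page; neither is confirmed for this endpoint. fetch_page and fake_fetch are hypothetical names, and fake_fetch stands in for the HTTP call so the sketch runs offline.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[dict], dict], limit: int = 10) -> Iterator[dict]:
    """Yield every item, following the `after` cursor until has_more is false."""
    params = {"limit": limit}
    while True:
        page = fetch_page(params)
        yield from page.get("data", [])
        if not page.get("has_more") or not page.get("last_id"):
            break
        # Assumption: `after` is the ID of the last item already received.
        params = {"limit": limit, "after": page["last_id"]}


def fake_fetch(params: dict) -> dict:
    """Offline stand-in for GET /v1/slos; serves five made-up SLOs."""
    items = [{"id": f"slo_{i}"} for i in range(1, 6)]
    start = 0
    if "after" in params:
        start = next(i for i, it in enumerate(items) if it["id"] == params["after"]) + 1
    chunk = items[start:start + params["limit"]]
    return {
        "object": "list",
        "data": chunk,
        "last_id": chunk[-1]["id"] if chunk else None,
        "has_more": start + len(chunk) < len(items),
    }


all_ids = [item["id"] for item in iter_all(fake_fetch, limit=2)]
print(all_ids)
```

In real use, fetch_page would wrap an authenticated GET to /v1/slos and return the parsed JSON body.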

Get SLO

GET /v1/slos/{slo_id}

Returns the SLO definition with its latest compliance data.

curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response (with compliance data)

{
  "id": "00000000-1111-0000-1111-000000000000",
  "object": "slo",
  "name": "API Availability",
  "metric": "availability",
  "target": 99.9,
  "comparison": "greater_than_or_equal",
  "window_days": 30,
  "is_active": true,
  "latest_compliance": {
    "measured_value": 99.95,
    "compliance_percentage": 100.0,
    "is_met": true,
    "total_requests": 150000,
    "conforming_requests": 150000,
    "calculated_at": 1709050000
  },
  "created_at": 1709000000,
  "updated_at": 1709000000
}

Update SLO

PUT /v1/slos/{slo_id}

All fields are optional. Only provided fields are updated.

Parameter               Type     Description
name (optional)         string   Updated name.
description (optional)  string   Updated description (null to clear).
target (optional)       number   Updated target value.
comparison (optional)   string   Updated comparison operator.
window_days (optional)  integer  Updated window (1-90 days).
is_active (optional)    boolean  Enable or disable the SLO.
endpoint_id (optional)  string   Updated endpoint scope (null for project-wide).

curl
curl -X PUT https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "target": 99.95,
    "window_days": 7
  }'
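
Because omitted fields are left unchanged while an explicit null clears description or endpoint_id, a client needs to tell "not provided" apart from "set to null". One way to do that in Python is a sentinel default, sketched below. build_update_body and _UNSET are hypothetical names for illustration, not part of any Xerotier SDK.

```python
import json
from typing import Any

# Sentinel meaning "leave this field untouched". It is distinct from None,
# which serializes to JSON null and asks the API to clear the field.
_UNSET: Any = object()


def build_update_body(*, name=_UNSET, description=_UNSET, target=_UNSET,
                      comparison=_UNSET, window_days=_UNSET,
                      is_active=_UNSET, endpoint_id=_UNSET) -> dict:
    """Return a PUT body containing only the fields the caller passed."""
    fields = {
        "name": name,
        "description": description,
        "target": target,
        "comparison": comparison,
        "window_days": window_days,
        "is_active": is_active,
        "endpoint_id": endpoint_id,
    }
    return {key: value for key, value in fields.items() if value is not _UNSET}


# Change two fields, leave everything else as it is.
print(json.dumps(build_update_body(target=99.95, window_days=7)))

# Clear the description and make the SLO project-wide.
print(json.dumps(build_update_body(description=None, endpoint_id=None)))
```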

Delete SLO

DELETE /v1/slos/{slo_id}

Permanently deletes the SLO definition and all associated history.

curl
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Get Compliance History

GET /v1/slos/{slo_id}/history

Returns a paginated list of compliance calculation entries for the SLO.

curl
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000/history?limit=10" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response

{
  "object": "list",
  "data": [
    {
      "id": "hist_abc123",
      "object": "slo.history",
      "slo_id": "00000000-1111-0000-1111-000000000000",
      "period_start": 1708900000,
      "period_end": 1709000000,
      "measured_value": 99.95,
      "total_requests": 150000,
      "conforming_requests": 150000,
      "compliance_percentage": 100.0,
      "is_met": true,
      "calculated_at": 1709000100
    }
  ],
  "first_id": "hist_abc123",
  "last_id": "hist_abc123",
  "has_more": false
}
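
The period_start, period_end, and calculated_at values look like Unix timestamps in seconds; that reading is an assumption, not something the response states. Under that assumption, a history entry can be turned into a readable line as sketched here. describe_period is a hypothetical helper name.

```python
from datetime import datetime, timezone


def describe_period(entry: dict) -> str:
    """Format one history entry, treating timestamps as Unix seconds (assumed)."""
    start = datetime.fromtimestamp(entry["period_start"], tz=timezone.utc)
    end = datetime.fromtimestamp(entry["period_end"], tz=timezone.utc)
    hours = (end - start).total_seconds() / 3600
    status = "met" if entry["is_met"] else "not met"
    return (
        f"{start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M} UTC "
        f"({hours:.1f} h): {status}, {entry['compliance_percentage']}% compliant"
    )


# Entry copied from the sample response above.
sample_entry = {
    "period_start": 1708900000,
    "period_end": 1709000000,
    "compliance_percentage": 100.0,
    "is_met": True,
}
line = describe_period(sample_entry)
print(line)
```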

Trigger Calculation

POST /v1/slos/{slo_id}/calculate

Triggers an on-demand compliance calculation for the SLO and returns the result.

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000/calculate \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Project Summary

GET /v1/slos/summary

Returns an overview of all active SLOs for the project with their current compliance status.

curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/summary \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response

{
  "object": "slo.summary",
  "total_active": 3,
  "total_met": 2,
  "total_not_met": 1,
  "total_unevaluated": 0,
  "slos": [
    {
      "id": "slo_1",
      "name": "API Availability",
      "metric": "availability",
      "target": 99.9,
      "status": "met",
      "compliance_percentage": 99.98,
      "measured_value": 99.98,
      "last_calculated_at": 1709050000
    },
    {
      "id": "slo_2",
      "name": "P95 Latency",
      "metric": "total_latency_ms",
      "target": 500,
      "status": "not_met",
      "compliance_percentage": 94.2,
      "measured_value": 520.0,
      "last_calculated_at": 1709050000
    }
  ]
}
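
A common use of the summary is to surface only the SLOs that are currently missed. The sketch below filters a parsed summary on status == "not_met", using the field names from the sample response. unmet_slos is a hypothetical helper, and the sample data is trimmed to the fields it reads.

```python
def unmet_slos(summary: dict) -> list[str]:
    """Return one human-readable line per SLO whose status is not_met."""
    return [
        f"{slo['name']}: measured {slo['measured_value']} vs target {slo['target']}"
        for slo in summary.get("slos", [])
        if slo.get("status") == "not_met"
    ]


# Trimmed copy of the sample summary response.
sample_summary = {
    "object": "slo.summary",
    "slos": [
        {"name": "API Availability", "target": 99.9,
         "status": "met", "measured_value": 99.98},
        {"name": "P95 Latency", "target": 500,
         "status": "not_met", "measured_value": 520.0},
    ],
}

alerts = unmet_slos(sample_summary)
for alert in alerts:
    print(alert)
```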

Limits

  • Max 50 SLO definitions per project.
  • Window range: 1-90 days.
  • Name length: 1-128 ASCII printable characters.
  • Description length: Max 512 characters.

Supported Metrics

Metric            Description                           Unit
ttft_ms           Time to first token                   milliseconds
tpot_ms           Time per output token                 milliseconds
total_latency_ms  Total end-to-end request latency      milliseconds
availability      Percentage of successful requests     percent (0-100)
error_rate        Percentage of failed requests         percent (0-100)
throughput_rps    Average requests per second           requests/second

Comparison Operators

Comparison             Description                       Typical Use
less_than              Metric must be less than target   Latency, error rate
less_than_or_equal     Metric must be at most target     Latency, error rate
greater_than           Metric must exceed target         Throughput
greater_than_or_equal  Metric must be at least target    Availability
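
The limits, metric names, and comparison operators listed above can be checked on the client before calling POST /v1/slos. The following is a rough sketch of such a check. validate_slo is a hypothetical helper, and it assumes "ASCII printable" means the characters 0x20 through 0x7E; the server's actual validation may differ.

```python
SUPPORTED_METRICS = {
    "ttft_ms", "tpot_ms", "total_latency_ms",
    "availability", "error_rate", "throughput_rps",
}
COMPARISONS = {
    "less_than", "less_than_or_equal",
    "greater_than", "greater_than_or_equal",
}
# Assumption: "ASCII printable" covers space (0x20) through tilde (0x7E).
_PRINTABLE_ASCII = {chr(code) for code in range(0x20, 0x7F)}


def validate_slo(body: dict) -> list[str]:
    """Return a list of problems found in a create-SLO body (empty if none)."""
    errors = []
    name = body.get("name")
    if (not isinstance(name, str) or not 1 <= len(name) <= 128
            or not set(name) <= _PRINTABLE_ASCII):
        errors.append("name must be 1-128 printable ASCII characters")
    description = body.get("description")
    if description is not None and (
            not isinstance(description, str) or len(description) > 512):
        errors.append("description must be at most 512 characters")
    if body.get("metric") not in SUPPORTED_METRICS:
        errors.append("metric is not a supported metric")
    target = body.get("target")
    if (isinstance(target, bool) or not isinstance(target, (int, float))
            or target <= 0):
        errors.append("target must be a positive number")
    if body.get("comparison") not in COMPARISONS:
        errors.append("comparison is not a supported operator")
    window = body.get("window_days")
    if (isinstance(window, bool) or not isinstance(window, int)
            or not 1 <= window <= 90):
        errors.append("window_days must be an integer from 1 to 90")
    return errors


good = {"name": "API Availability", "metric": "availability", "target": 99.9,
        "comparison": "greater_than_or_equal", "window_days": 30}
print(validate_slo(good))
print(validate_slo({**good, "window_days": 120}))
```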

Compliance Calculation

SLO compliance is measured as the percentage of requests within the evaluation window that meet the defined target:

compliance_percentage = (conforming_requests / total_requests) * 100

A request is "conforming" if its measured metric value satisfies the comparison operator against the target. The SLO as a whole is reported as met when the measured value for the window satisfies the same comparison. For example, with metric=availability, target=99.9, comparison=greater_than_or_equal, the SLO is met when the measured availability is 99.9% or higher.
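
As a worked example of the formula and the comparison check, the sketch below recomputes compliance_percentage from request counts and evaluates a measured value against a target. The function names are hypothetical, and returning None for an empty window is an assumption about how a zero-request period might be treated.

```python
import operator
from typing import Optional

# Map each documented comparison name to the matching Python operator.
_OPS = {
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
}


def compliance_percentage(conforming: int, total: int) -> Optional[float]:
    """(conforming_requests / total_requests) * 100, or None with no traffic."""
    if total == 0:
        return None  # Assumption: nothing to evaluate in an empty window.
    return conforming / total * 100


def satisfies(measured: float, comparison: str, target: float) -> bool:
    """True when `measured <comparison> target` holds."""
    return _OPS[comparison](measured, target)


# Values taken from the sample responses in this document.
print(compliance_percentage(150000, 150000))
print(satisfies(99.95, "greater_than_or_equal", 99.9))
print(satisfies(520.0, "less_than_or_equal", 500))
```

With the sample values, the availability SLO satisfies its target (99.95 is at least 99.9), while a measured latency of 520.0 ms does not satisfy a 500 ms upper bound.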

Compliance calculations run automatically on a periodic schedule. Use the POST /v1/slos/{id}/calculate endpoint to trigger an on-demand calculation.

SLO Request Headers

Include SLO targets as request headers on chat completion requests. These are optional hints -- they do not enforce hard limits or cause requests to fail if targets are missed.

Header         Type     Description
X-SLO-TTFT-Ms  integer  Target time-to-first-token in milliseconds (e.g., 500 for a 500ms TTFT target).
X-SLO-TPOT-Ms  integer  Target time-per-output-token in milliseconds (e.g., 30 for 30ms per token).

You can send one or both headers. When they are present, the router prefers backends that are predicted to meet the specified latency targets. The targets are best-effort: if no backend is predicted to meet them, the router still selects the best available backend. If neither header is provided, the router uses its default routing strategy without SLO-aware adjustments.
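
Since either header may be omitted, a small helper can build only the ones the caller wants. The sketch below is illustrative: slo_headers is a hypothetical name, and rejecting non-positive values is an assumption (the table above only states that the values are integers).

```python
from typing import Optional


def slo_headers(ttft_ms: Optional[int] = None,
                tpot_ms: Optional[int] = None) -> dict:
    """Build the per-request SLO headers, skipping any target left as None."""
    headers = {}
    for header, value in (("X-SLO-TTFT-Ms", ttft_ms), ("X-SLO-TPOT-Ms", tpot_ms)):
        if value is None:
            continue
        # Assumption: targets must be positive integers.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{header} must be a positive integer, got {value!r}")
        headers[header] = str(value)  # HTTP header values are strings.
    return headers


auth = {"Authorization": "Bearer xero_myproject_your_api_key"}
request_headers = {**auth, **slo_headers(ttft_ms=500, tpot_ms=30)}
print(request_headers)
```

The resulting dictionary can be passed as the headers argument of whichever HTTP client sends the chat completion request.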

Examples

Create an Availability SLO

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Availability",
    "metric": "availability",
    "target": 99.9,
    "comparison": "greater_than_or_equal",
    "window_days": 30
  }'

Create a Latency SLO

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "P95 TTFT Target",
    "metric": "ttft_ms",
    "target": 500,
    "comparison": "less_than_or_equal",
    "window_days": 7
  }'

Per-Request SLO Headers

curl
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -H "X-SLO-TTFT-Ms: 500" \
  -H "X-SLO-TPOT-Ms: 30" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Python

Python
import requests

base = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1"
headers = {"Authorization": "Bearer xero_myproject_your_api_key"}

# Create an SLO
slo = requests.post(f"{base}/slos", headers=headers, json={
    "name": "Chat Latency",
    "metric": "total_latency_ms",
    "target": 1000,
    "comparison": "less_than",
    "window_days": 7
}).json()
print(f"Created SLO: {slo['id']}")

# Check compliance summary
summary = requests.get(f"{base}/slos/summary", headers=headers).json()
for entry in summary["slos"]:
    print(f"{entry['name']}: {entry['status']} ({entry['compliance_percentage']}%)")

# Send a request with SLO headers
response = requests.post(
    f"{base}/chat/completions",
    headers={
        **headers,
        "Content-Type": "application/json",
        "X-SLO-TTFT-Ms": "500",
        "X-SLO-TPOT-Ms": "30",
    },
    json={
        "model": "llama-3.1-8b",
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)

Node.js

Node.js
const base = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1";
const headers = {
  "Authorization": "Bearer xero_myproject_your_api_key",
  "Content-Type": "application/json"
};

// Create an SLO
const sloResponse = await fetch(`${base}/slos`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    name: "Chat Latency",
    metric: "total_latency_ms",
    target: 1000,
    comparison: "less_than",
    window_days: 7
  })
});
const slo = await sloResponse.json();
console.log(`Created SLO: ${slo.id}`);

// Check compliance summary
const summaryResponse = await fetch(`${base}/slos/summary`, { headers });
const summary = await summaryResponse.json();
for (const entry of summary.slos) {
  console.log(`${entry.name}: ${entry.status} (${entry.compliance_percentage}%)`);
}

// Send a request with SLO headers
const inferenceResponse = await fetch(`${base}/chat/completions`, {
  method: "POST",
  headers: {
    ...headers,
    "X-SLO-TTFT-Ms": "500",
    "X-SLO-TPOT-Ms": "30"
  },
  body: JSON.stringify({
    model: "llama-3.1-8b",
    messages: [{ role: "user", content: "Hello!" }]
  })
});