SLO Tracking
Define service level objectives, track compliance, and use per-request latency targets.
Overview
Service Level Objectives (SLOs) define target performance for your inference endpoints. Xerotier provides two complementary SLO mechanisms:
- SLO Management API -- Define SLO targets, track compliance over time, and view project-wide summaries through a full CRUD API.
- Per-request SLO headers -- Pass latency targets on individual requests to hint routing preferences in real time.
SLO Management API
All paths are relative to your endpoint base URL:
https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/slos | Create an SLO definition |
| GET | /v1/slos | List SLO definitions |
| GET | /v1/slos/{id} | Get an SLO definition |
| PUT | /v1/slos/{id} | Update an SLO definition |
| DELETE | /v1/slos/{id} | Delete an SLO definition |
| GET | /v1/slos/{id}/history | Get compliance history |
| POST | /v1/slos/{id}/calculate | Trigger manual calculation |
| GET | /v1/slos/summary | Get project SLO summary |
| GET | /v1/endpoints/{endpoint_id}/slos | List SLOs for an endpoint |
Create SLO
POST /v1/slos
| Parameter | Type | Description |
|---|---|---|
| name (required) | string | Human-readable name (1-128 ASCII printable characters). |
| description (optional) | string | Optional description (max 512 characters). |
| metric (required) | string | Metric to track. See Supported Metrics. |
| target (required) | number | Target threshold value (must be positive). |
| comparison (required) | string | Comparison operator. See Comparison Operators. |
| window_days (required) | integer | Rolling evaluation window in days (1-90). |
| endpoint_id (optional) | string | Endpoint UUID to scope the SLO. Omit for project-wide SLOs. |
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "API Availability",
"description": "99.9% availability for production API",
"metric": "availability",
"target": 99.9,
"comparison": "greater_than_or_equal",
"window_days": 30
}'
Response
{
"id": "00000000-1111-0000-1111-000000000000",
"object": "slo",
"name": "API Availability",
"description": "99.9% availability for production API",
"metric": "availability",
"target": 99.9,
"comparison": "greater_than_or_equal",
"window_days": 30,
"endpoint_id": null,
"is_active": true,
"latest_compliance": null,
"created_at": 1709000000,
"updated_at": 1709000000
}
List SLOs
GET /v1/slos
Returns a paginated list of SLO definitions. Use limit and after for pagination.
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos?limit=10" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Get SLO
GET /v1/slos/{slo_id}
Returns the SLO definition with its latest compliance data.
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response (with compliance data)
{
"id": "00000000-1111-0000-1111-000000000000",
"object": "slo",
"name": "API Availability",
"metric": "availability",
"target": 99.9,
"comparison": "greater_than_or_equal",
"window_days": 30,
"is_active": true,
"latest_compliance": {
"measured_value": 99.95,
"compliance_percentage": 100.0,
"is_met": true,
"total_requests": 150000,
"conforming_requests": 150000,
"calculated_at": 1709050000
},
"created_at": 1709000000,
"updated_at": 1709000000
}
Update SLO
PUT /v1/slos/{slo_id}
All fields are optional. Only provided fields are updated.
| Parameter | Type | Description |
|---|---|---|
| name (optional) | string | Updated name. |
| description (optional) | string | Updated description (null to clear). |
| target (optional) | number | Updated target value. |
| comparison (optional) | string | Updated comparison operator. |
| window_days (optional) | integer | Updated window (1-90 days). |
| is_active (optional) | boolean | Enable or disable the SLO. |
| endpoint_id (optional) | string | Updated endpoint scope (null for project-wide). |
curl -X PUT https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"target": 99.95,
"window_days": 7
}'
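Because only provided fields are updated and null clears a value, a client needs to distinguish "not provided" from "set to null". The following is a minimal sketch of one way to build such a payload; the helper name build_slo_update and the sentinel are hypothetical and not part of the Xerotier API.

```python
import json
from typing import Any

# Sentinel meaning "the caller did not supply this field". It is distinct
# from None, which is serialized as JSON null to clear a value.
_UNSET: Any = object()


def build_slo_update(*, name=_UNSET, description=_UNSET, target=_UNSET,
                     comparison=_UNSET, window_days=_UNSET,
                     is_active=_UNSET, endpoint_id=_UNSET) -> dict:
    """Return a PUT body containing only the fields the caller supplied."""
    fields = {
        "name": name,
        "description": description,
        "target": target,
        "comparison": comparison,
        "window_days": window_days,
        "is_active": is_active,
        "endpoint_id": endpoint_id,
    }
    return {key: value for key, value in fields.items() if value is not _UNSET}


# Change the target and clear the description; every other field is omitted.
payload = build_slo_update(target=99.95, description=None)
print(json.dumps(payload))  # {"description": null, "target": 99.95}
```

The resulting dictionary can be passed as the JSON body of the PUT request shown above.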
Delete SLO
DELETE /v1/slos/{slo_id}
Permanently deletes the SLO definition and all associated history.
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000 \
-H "Authorization: Bearer xero_myproject_your_api_key"
Get Compliance History
GET /v1/slos/{slo_id}/history
Returns a paginated list of compliance calculation entries for the SLO.
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000/history?limit=10" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response
{
"object": "list",
"data": [
{
"id": "hist_abc123",
"object": "slo.history",
"slo_id": "00000000-1111-0000-1111-000000000000",
"period_start": 1708900000,
"period_end": 1709000000,
"measured_value": 99.95,
"total_requests": 150000,
"conforming_requests": 150000,
"compliance_percentage": 100.0,
"is_met": true,
"calculated_at": 1709000100
}
],
"first_id": "hist_abc123",
"last_id": "hist_abc123",
"has_more": false
}
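To collect every history entry, a client can keep requesting pages until has_more is false. The sketch below assumes that the after parameter accepts the last_id of the previous page and that other list endpoints share this response shape; neither is confirmed here. The iter_pages helper and the in-memory pages are illustrative only.

```python
from typing import Callable, Iterator, Optional


def iter_pages(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item across pages.

    fetch_page(after) must return one parsed list response with the
    "data", "has_more", and "last_id" keys shown above.
    """
    after = None
    while True:
        page = fetch_page(after)
        yield from page["data"]
        if not page.get("has_more"):
            break
        # Assumption: the cursor for the next page is the previous last_id.
        after = page["last_id"]


# In-memory stand-in for the API, used only to demonstrate the loop.
_PAGES = {
    None: {"data": [{"id": "hist_1"}, {"id": "hist_2"}],
           "last_id": "hist_2", "has_more": True},
    "hist_2": {"data": [{"id": "hist_3"}],
               "last_id": "hist_3", "has_more": False},
}

ids = [item["id"] for item in iter_pages(_PAGES.__getitem__)]
print(ids)  # ['hist_1', 'hist_2', 'hist_3']
```

In real use, fetch_page would issue the GET request with limit and, when set, after as query parameters and return the parsed JSON.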
Trigger Calculation
POST /v1/slos/{slo_id}/calculate
Triggers an on-demand compliance calculation for the SLO and returns the result.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/00000000-1111-0000-1111-000000000000/calculate \
-H "Authorization: Bearer xero_myproject_your_api_key"
Project Summary
GET /v1/slos/summary
Returns an overview of all active SLOs for the project with their current compliance status.
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos/summary \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response
{
"object": "slo.summary",
"total_active": 2,
"total_met": 1,
"total_not_met": 1,
"total_unevaluated": 0,
"slos": [
{
"id": "slo_1",
"name": "API Availability",
"metric": "availability",
"target": 99.9,
"status": "met",
"compliance_percentage": 99.98,
"measured_value": 99.98,
"last_calculated_at": 1709050000
},
{
"id": "slo_2",
"name": "P95 Latency",
"metric": "total_latency_ms",
"target": 500,
"status": "not_met",
"compliance_percentage": 94.2,
"measured_value": 520.0,
"last_calculated_at": 1709050000
}
]
}
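A common use of the summary is to surface only the SLOs that are currently failing. The sketch below filters a parsed summary response by its status field; the unmet_slos helper is hypothetical, and the sample data simply mirrors the response above.

```python
def unmet_slos(summary: dict) -> list:
    """Return the summary entries whose status is "not_met"."""
    return [slo for slo in summary.get("slos", []) if slo.get("status") == "not_met"]


# Sample data copied from the example response; not live output.
summary = {
    "object": "slo.summary",
    "slos": [
        {"id": "slo_1", "name": "API Availability", "metric": "availability",
         "target": 99.9, "status": "met", "measured_value": 99.98},
        {"id": "slo_2", "name": "P95 Latency", "metric": "total_latency_ms",
         "target": 500, "status": "not_met", "measured_value": 520.0},
    ],
}

for slo in unmet_slos(summary):
    print(f"{slo['name']}: measured {slo['measured_value']} against target {slo['target']}")
```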
Limits
- Max 50 SLO definitions per project.
- Window range: 1-90 days.
- Name length: 1-128 ASCII printable characters.
- Description length: Max 512 characters.
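Most of these limits can be checked on the client before a request is sent. The following is a rough sketch, assuming "ASCII printable" means code points 0x20 through 0x7E (space included); the validate_slo_limits helper is hypothetical, and the per-project cap of 50 definitions can only be enforced by the server.

```python
def validate_slo_limits(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload passes."""
    errors = []

    name = payload.get("name")
    if not isinstance(name, str) or not 1 <= len(name) <= 128:
        errors.append("name must be 1-128 characters")
    elif not all(0x20 <= ord(ch) <= 0x7E for ch in name):
        errors.append("name must contain only printable ASCII")

    description = payload.get("description")
    if description is not None and (
        not isinstance(description, str) or len(description) > 512
    ):
        errors.append("description must be a string of at most 512 characters")

    window = payload.get("window_days")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(window, bool) or not isinstance(window, int) or not 1 <= window <= 90:
        errors.append("window_days must be an integer from 1 to 90")

    target = payload.get("target")
    if isinstance(target, bool) or not isinstance(target, (int, float)) or target <= 0:
        errors.append("target must be a positive number")

    return errors


print(validate_slo_limits({"name": "API Availability", "target": 99.9, "window_days": 30}))
print(validate_slo_limits({"name": "", "target": -1, "window_days": 120}))
```

The first call prints an empty list; the second reports the name, window, and target problems.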
Supported Metrics
| Metric | Description | Unit |
|---|---|---|
| ttft_ms | Time to first token | milliseconds |
| tpot_ms | Time per output token | milliseconds |
| total_latency_ms | Total end-to-end request latency | milliseconds |
| availability | Percentage of successful requests | percent (0-100) |
| error_rate | Percentage of failed requests | percent (0-100) |
| throughput_rps | Average requests per second | requests/second |
Comparison Operators
| Comparison | Description | Typical Use |
|---|---|---|
| less_than | Metric must be less than target | Latency, error rate |
| less_than_or_equal | Metric must be at most target | Latency, error rate |
| greater_than | Metric must exceed target | Throughput |
| greater_than_or_equal | Metric must be at least target | Availability |
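The four operators map naturally onto ordinary numeric comparisons. The sketch below illustrates that reading using Python's standard operator module; the satisfies helper is hypothetical, and the server's exact evaluation (for example, any rounding) is not documented here.

```python
import operator

# Comparison names come from the table above; the mapping to Python
# operators is an illustration, not the service's implementation.
COMPARISONS = {
    "less_than": operator.lt,
    "less_than_or_equal": operator.le,
    "greater_than": operator.gt,
    "greater_than_or_equal": operator.ge,
}


def satisfies(measured: float, comparison: str, target: float) -> bool:
    """Return True if `measured <comparison> target` holds."""
    try:
        compare = COMPARISONS[comparison]
    except KeyError:
        raise ValueError(f"unknown comparison: {comparison!r}") from None
    return compare(measured, target)


print(satisfies(99.95, "greater_than_or_equal", 99.9))  # True
print(satisfies(520.0, "less_than_or_equal", 500))      # False
```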
Compliance Calculation
SLO compliance is measured as the percentage of requests within the evaluation window that meet the defined target:
compliance_percentage = (conforming_requests / total_requests) * 100
A request is "conforming" if the measured metric value satisfies the comparison operator against the target. For example, with metric=availability, target=99.9, comparison=greater_than_or_equal, the SLO is met when the measured availability is 99.9% or higher.
Compliance calculations run automatically on a periodic schedule. Use the POST /v1/slos/{id}/calculate endpoint to trigger an on-demand calculation.
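As a worked example of the formula, the sketch below counts conforming requests for a latency target with a less_than_or_equal comparison and converts the counts to a percentage. The sample latencies are made up, and returning None for an empty window is an assumption; the service's behavior with zero requests is not documented here.

```python
from typing import Optional, Sequence


def count_conforming(latencies_ms: Sequence[float], target_ms: float) -> int:
    """Count requests whose latency is at most the target."""
    return sum(1 for value in latencies_ms if value <= target_ms)


def compliance_percentage(conforming_requests: int,
                          total_requests: int) -> Optional[float]:
    """Apply compliance_percentage = (conforming / total) * 100."""
    if total_requests == 0:
        return None  # Assumption: no requests means nothing to evaluate.
    return conforming_requests / total_requests * 100


samples = [120.0, 480.0, 510.0, 300.0]      # illustrative latencies in ms
conforming = count_conforming(samples, 500)  # 3 of the 4 samples are <= 500
print(compliance_percentage(conforming, len(samples)))  # 75.0
```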
SLO Request Headers
Include SLO targets as request headers on chat completion requests. These are optional hints -- they do not enforce hard limits or cause requests to fail if targets are missed.
| Header | Type | Description |
|---|---|---|
| X-SLO-TTFT-Ms | integer | Target time-to-first-token in milliseconds (e.g., 500 for 500ms TTFT target). |
| X-SLO-TPOT-Ms | integer | Target time-per-output-token in milliseconds (e.g., 30 for 30ms per token). |
You can send one or both headers. If neither is provided, the router uses its default routing strategy without SLO-aware adjustments. When they are present, the router prefers backends that are predicted to meet the specified latency targets. The headers are best-effort hints and do not guarantee that a target will be met: if no backend is predicted to meet it, the router still selects the best available backend.
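Since either header may be omitted, a small helper can build only the headers a caller actually wants. This is a minimal sketch: the slo_headers function is hypothetical, and rejecting non-positive values is an assumption, because the table above specifies only that the values are integers.

```python
from typing import Optional


def slo_headers(ttft_ms: Optional[int] = None,
                tpot_ms: Optional[int] = None) -> dict:
    """Build the per-request SLO headers, skipping targets that are not set."""
    headers = {}
    for name, value in (("X-SLO-TTFT-Ms", ttft_ms), ("X-SLO-TPOT-Ms", tpot_ms)):
        if value is None:
            continue
        # Assumption: targets must be positive integers (bool is excluded
        # because it is a subclass of int in Python).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
        headers[name] = str(value)
    return headers


print(slo_headers(ttft_ms=500, tpot_ms=30))
print(slo_headers(tpot_ms=30))
```

The returned dictionary can be merged into the Authorization and Content-Type headers of a chat completion request.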
Examples
Create an Availability SLO
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "Production Availability",
"metric": "availability",
"target": 99.9,
"comparison": "greater_than_or_equal",
"window_days": 30
}'
Create a Latency SLO
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/slos \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"name": "P95 TTFT Target",
"metric": "ttft_ms",
"target": 500,
"comparison": "less_than_or_equal",
"window_days": 7
}'
Per-Request SLO Headers
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-H "X-SLO-TTFT-Ms: 500" \
-H "X-SLO-TPOT-Ms: 30" \
-d '{
"model": "llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}]
}'
Python
import requests
base = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1"
headers = {"Authorization": "Bearer xero_myproject_your_api_key"}
# Create an SLO
slo = requests.post(f"{base}/slos", headers=headers, json={
"name": "Chat Latency",
"metric": "total_latency_ms",
"target": 1000,
"comparison": "less_than",
"window_days": 7
}).json()
print(f"Created SLO: {slo['id']}")
# Check compliance summary
summary = requests.get(f"{base}/slos/summary", headers=headers).json()
for entry in summary["slos"]:
    print(f"{entry['name']}: {entry['status']} ({entry['compliance_percentage']}%)")
# Send a request with SLO headers
response = requests.post(
f"{base}/chat/completions",
headers={
**headers,
"Content-Type": "application/json",
"X-SLO-TTFT-Ms": "500",
"X-SLO-TPOT-Ms": "30",
},
json={
"model": "llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}],
},
)
Node.js
const base = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1";
const headers = {
"Authorization": "Bearer xero_myproject_your_api_key",
"Content-Type": "application/json"
};
// Create an SLO
const sloResponse = await fetch(`${base}/slos`, {
method: "POST",
headers,
body: JSON.stringify({
name: "Chat Latency",
metric: "total_latency_ms",
target: 1000,
comparison: "less_than",
window_days: 7
})
});
const slo = await sloResponse.json();
console.log(`Created SLO: ${slo.id}`);
// Check compliance summary
const summaryResponse = await fetch(`${base}/slos/summary`, { headers });
const summary = await summaryResponse.json();
for (const entry of summary.slos) {
console.log(`${entry.name}: ${entry.status} (${entry.compliance_percentage}%)`);
}
// Send a request with SLO headers
const inferenceResponse = await fetch(`${base}/chat/completions`, {
method: "POST",
headers: {
...headers,
"X-SLO-TTFT-Ms": "500",
"X-SLO-TPOT-Ms": "30"
},
body: JSON.stringify({
model: "llama-3.1-8b",
messages: [{ role: "user", content: "Hello!" }]
})
});