Error Handling

Error codes, fault categories, retry policies, and troubleshooting guidance.

Error Response Format

All error responses follow the OpenAI-compatible JSON format:

JSON
{
  "error": {
    "message": "The model 'nonexistent-model' does not exist.",
    "type": "not_found_error",
    "code": "model_not_found",
    "param": null
  }
}

Error Object Fields

  • message (string) -- Human-readable error description. Sensitive information (file paths, IPs, tokens, UUIDs) is automatically sanitized.
  • type (string) -- Error category (e.g., invalid_request_error, not_found_error, server_error).
  • code (string | null) -- Machine-readable error code. See Error Codes below.
  • param (string | null) -- The request parameter that caused the error, if applicable.
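As a sketch, a client can pull these fields out of a parsed error body (the sample payload here mirrors the example above):

```python
import json

def parse_error(body: str):
    """Extract the standard error fields from an error response body."""
    error = json.loads(body).get("error", {})
    return (
        error.get("message"),
        error.get("type"),
        error.get("code"),
        error.get("param"),
    )

body = '{"error": {"message": "The model \'nonexistent-model\' does not exist.", "type": "not_found_error", "code": "model_not_found", "param": null}}'
message, err_type, code, param = parse_error(body)
```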

Error Codes

The following error codes can appear in inference API responses. Each code maps to a specific HTTP status, fault category, and retry policy.

  • invalid_request -- 400, client fault, not retryable. Request parameters are invalid (bad JSON, context length exceeded, invalid max_tokens).
  • json_parse_error -- 400, client fault, not retryable. The request body could not be parsed as valid JSON.
  • authentication_error -- 401, client fault, not retryable. Invalid or missing API key. Check your Authorization header.
  • model_not_found -- 404, client fault, not retryable. The requested model is not found or not loaded on any backend.
  • project_not_found -- 404, client fault, not retryable. The specified project does not exist or you do not have access.
  • endpoint_not_found -- 404, client fault, not retryable. The specified endpoint does not exist within this project.
  • completion_not_found -- 404, client fault, not retryable. The specified stored completion does not exist.
  • response_not_found -- 404, client fault, not retryable. The specified response does not exist.
  • capacity_exceeded -- 429, agent fault, retryable. Backend worker is at capacity. Retry after the suggested delay.
  • quota_exceeded -- 429, client fault, not retryable. Your account or endpoint has exceeded its usage quota. Upgrade your tier or wait for the quota to reset.
  • endpoint_inactive -- 503, agent fault, retryable. The endpoint is not currently active. It may be provisioning or disabled.
  • preempted -- 503, agent fault, retryable. Request was preempted by a higher-priority request. Retry immediately or after a short delay.
  • backend_unavailable -- 503, agent fault, retryable. Backend inference engine is unavailable (down, unreachable, or returning 5xx errors).
  • timeout -- 408, network fault, retryable. Request timed out before completion. May be a connect timeout, read timeout, or deadline exceeded.
  • invalid_state -- 409, client fault, not retryable. The resource is not in a valid state for the requested operation (e.g., cancelling an already completed response).
  • cancelled -- 499, client fault, not retryable. Request was cancelled by the client (connection closed) or by the router.
  • internal_error -- 500, agent fault, retryable. An unexpected internal error occurred. The router will attempt to retry on a different backend.

Distinguishing 429 Errors

Both capacity_exceeded and quota_exceeded return HTTP 429, but they have different meanings and retry behavior:

  • capacity_exceeded -- Transient. The backend is temporarily overloaded. Retry after the delay indicated in the response or the Retry-After header.
  • quota_exceeded -- Persistent. Your account has hit its usage limit. Do not retry -- you need to upgrade your plan or wait for the quota period to reset.

Check the code field in the error response to distinguish them.
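A minimal client-side check for this distinction might look like the following sketch (the error body shape follows the format above):

```python
def should_retry_429(error_body: dict) -> bool:
    """True only for the transient 429 variant (capacity_exceeded)."""
    return error_body.get("error", {}).get("code") == "capacity_exceeded"

should_retry_429({"error": {"code": "capacity_exceeded"}})  # transient: retry
should_retry_429({"error": {"code": "quota_exceeded"}})     # persistent: do not retry
```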

Fault Categories

Errors are classified into three fault categories, each with a different retry policy:

  • Client fault -- no retries. The request is invalid; fix the request before retrying.
  • Agent fault -- up to 3 retries, 1s initial backoff, 30s max backoff, 2x multiplier. Backend issue; the router retries on a different backend automatically.
  • Network fault -- up to 5 retries, 0.5s initial backoff, 60s max backoff, 2x multiplier. Network or timeout issue; the router retries with aggressive backoff.

The router performs internal retries for agent and network faults before returning an error to the client. If all retries are exhausted, the final error is returned to the client with the appropriate HTTP status code.
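The delay sequences implied by these policies can be reproduced with a small helper (a sketch; the router's actual timing and any jitter are internal):

```python
def backoff_schedule(retries, initial, cap, multiplier=2.0):
    """Delays (in seconds) the retry policy would use before each retry."""
    delays, d = [], initial
    for _ in range(retries):
        delays.append(min(d, cap))
        d *= multiplier
    return delays

# Agent faults: 3 retries starting at 1s, capped at 30s.
agent = backoff_schedule(3, 1.0, 30.0)
# Network faults: 5 retries starting at 0.5s, capped at 60s.
network = backoff_schedule(5, 0.5, 60.0)
```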

Inference-Specific Errors

The following scenarios are common during inference and have specific handling guidance:

Context Length Exceeded (400)

Your prompt exceeds the model's maximum context length. The error message includes details about the limit. Reduce your prompt length or use a model with a larger context window.

Max Tokens Invalid (400)

The max_tokens or max_completion_tokens value is invalid (negative, zero, or exceeds the model's limit). Adjust the value to be within the model's supported range.

Invalid Reasoning Effort (400)

The reasoning_effort parameter must be one of "low", "medium", or "high". Any other value (including uppercase variants like "LOW" or numeric strings like "1") returns a 400 error with type: "invalid_request_error" and param: "reasoning_effort". Omit the field entirely if you do not need reasoning effort control.
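A client-side pre-check can catch this before the request is sent. A sketch mirroring the rule (the server performs the authoritative validation):

```python
VALID_EFFORTS = {"low", "medium", "high"}

def check_reasoning_effort(value):
    """Mirror the server-side rule: exact lowercase strings only."""
    if value is not None and value not in VALID_EFFORTS:
        raise ValueError(f"invalid reasoning_effort: {value!r}")

check_reasoning_effort("medium")  # accepted
check_reasoning_effort(None)      # omitted field is fine
```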

Invalid Logprobs Configuration (400)

Two logprobs validation rules are enforced:

  • top_logprobs without logprobs -- Setting top_logprobs requires logprobs: true. If logprobs is omitted or set to false while top_logprobs is present, the request returns a 400 error with type: "invalid_request_error" and param: "top_logprobs".
  • top_logprobs out of range -- The top_logprobs value must be between 0 and 20 (inclusive). Values outside this range return a 400 error with type: "invalid_request_error" and param: "top_logprobs".
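Both rules can be mirrored client-side before sending the request. A sketch over the request parameters dict:

```python
def validate_logprobs(params: dict):
    """Mirror the two server-side logprobs rules (client-side sketch)."""
    top = params.get("top_logprobs")
    if top is None:
        return
    if not params.get("logprobs"):
        raise ValueError("top_logprobs requires logprobs: true")
    if not 0 <= top <= 20:
        raise ValueError("top_logprobs must be between 0 and 20")

validate_logprobs({"logprobs": True, "top_logprobs": 5})  # valid
```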

Model Not Loaded (404)

The model specified in your endpoint configuration is not currently loaded on any backend. This can happen if no backends are available for your tier or if the model has been removed. Check your endpoint configuration and backend status.

Preempted (503)

Your request was interrupted by a higher-priority request on the same backend. This only affects preemptable tiers (Free, GPU Shared). The request was not completed -- retry it. The router may include partial token counts in the error response via partialInputTokens and partialOutputTokens fields.
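When retrying a preempted request, the partial token counts (if present) can be logged for accounting. A sketch, assuming the fields sit inside the error object; adjust if your responses place them elsewhere:

```python
def partial_token_counts(error_body: dict):
    """Read partial token counts from a preemption error, defaulting to 0.

    Assumes partialInputTokens/partialOutputTokens appear inside the
    "error" object (an assumption; check your actual response shape).
    """
    err = error_body.get("error", {})
    return (
        err.get("partialInputTokens", 0),
        err.get("partialOutputTokens", 0),
    )

body = {"error": {"code": "preempted",
                  "partialInputTokens": 120,
                  "partialOutputTokens": 34}}
```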

Request Timeout (408)

The request did not complete within the tier's timeout limit. Timeouts vary by tier (30 seconds for Free, 300 seconds for CPU/GPU, 1800 seconds for XIM). Consider using streaming to avoid timeouts on long-running generation, or reduce the max_tokens value.

Streaming Errors

Errors during streaming behave differently depending on when they occur:

Pre-Stream Errors

If an error occurs before any tokens are generated (e.g., invalid request, model not found), you receive a standard HTTP error response with the appropriate status code. No SSE events are sent.

Mid-Stream Errors

If an error occurs after streaming has started (HTTP 200 has already been sent), the error is delivered as an SSE event:

SSE Error Event
event: error
data: {"error":{"message":"Backend connection lost","type":"server_error","code":"backend_unavailable"}}

After an error event, the stream is terminated with a data: [DONE] sentinel. The error types that can appear mid-stream are:

  • api_error -- Internal server error during generation.
  • timeout_error -- Request deadline exceeded during generation.
  • stream_idle_timeout -- No data received from backend within the idle timeout period.
  • cancelled -- Request was cancelled by the client or router.
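A stream consumer therefore has to watch for the error event as well as the [DONE] sentinel. A minimal sketch of the dispatch logic over already-split SSE lines (real clients should use a proper SSE parser):

```python
import json

def process_sse_lines(lines):
    """Walk SSE lines, collecting data payloads and surfacing mid-stream errors."""
    events, is_error = [], False
    for line in lines:
        if line.startswith("event: "):
            # An "event: error" line marks the next data payload as an error.
            is_error = line[len("event: "):].strip() == "error"
        elif line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":
                break
            data = json.loads(payload)
            if is_error:
                raise RuntimeError(data["error"]["message"])
            events.append(data)
    return events
```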

Handling Partial Responses

When a mid-stream error occurs, you may have received partial content. Concatenate all delta.content values received before the error to get the partial response. Decide whether to use the partial content or retry the full request based on your application requirements.
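Concretely, accumulating the deltas as they arrive leaves you with whatever partial text was produced before the error. A sketch over already-parsed chat-completion chunk objects:

```python
def accumulate_content(chunks):
    """Join delta.content across chunks, skipping empty deltas."""
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

chunks = [
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
    {"choices": [{"delta": {}}]},  # e.g. a role-only or final chunk
]
```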

Retry Guidance

The router handles most retries internally, but if the final response is an error, use these guidelines for client-side retries:

Retryable Errors

  • capacity_exceeded (429) -- Wait for the duration in the Retry-After header, then retry.
  • preempted (503) -- Retry immediately or after 1-2 seconds.
  • backend_unavailable (503) -- Wait 10-30 seconds, then retry. The Retry-After header provides a specific delay.
  • timeout (408) -- Retry after 5 seconds. Consider reducing prompt length or max_tokens.
  • internal_error (500) -- Retry after 10 seconds. If persistent, contact support.

Non-Retryable Errors

  • invalid_request (400) -- Fix the request. Check parameters, prompt length, and model name.
  • model_not_found (404) -- Verify the model is configured for your endpoint.
  • quota_exceeded (429) -- Upgrade your plan or wait for the quota period to reset.
  • cancelled (499) -- The client disconnected. No retry needed unless the disconnection was unintentional.

Exponential Backoff Example

Python
import time, random

def request_with_retry(make_request, max_retries=3):
    delay = 1.0
    for attempt in range(max_retries + 1):
        response = make_request()
        if response.status_code == 200:
            return response
        if response.status_code not in (408, 429, 500, 503):
            raise Exception(f"Non-retryable error: {response.status_code}")
        if attempt == max_retries:
            break  # out of retries; do not sleep again
        # Honor Retry-After when the server provides it.
        retry_after = response.headers.get("Retry-After")
        if retry_after:
            delay = float(retry_after)
        # Add up to 10% jitter to avoid synchronized retries.
        jitter = random.uniform(0, delay * 0.1)
        time.sleep(delay + jitter)
        delay = min(delay * 2, 60)  # exponential backoff, capped at 60s
    raise Exception("Max retries exhausted")

Node.js Exponential Backoff Example

Node.js
async function requestWithRetry(makeRequest, maxRetries = 3) {
  let delay = 1000;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await makeRequest();
    if (response.ok) {
      return response;
    }
    if (![408, 429, 500, 503].includes(response.status)) {
      throw new Error(`Non-retryable error: ${response.status}`);
    }
    if (attempt === maxRetries) {
      break; // out of retries; do not sleep again
    }
    // Honor Retry-After when the server provides it.
    const retryAfter = response.headers.get("Retry-After");
    if (retryAfter) {
      delay = parseFloat(retryAfter) * 1000;
    }
    // Add up to 10% jitter to avoid synchronized retries.
    const jitter = Math.random() * delay * 0.1;
    await new Promise((r) => setTimeout(r, delay + jitter));
    delay = Math.min(delay * 2, 60000); // exponential backoff, capped at 60s
  }
  throw new Error("Max retries exhausted");
}

Troubleshooting

My request returns 400 Bad Request

  • Check that your request body is valid JSON.
  • Verify the messages array is present and non-empty.
  • Check that your prompt does not exceed the model's context length.
  • Verify that max_tokens / max_completion_tokens are positive integers within the model's limit.
  • Check that temperature is between 0.0 and 2.0.
  • If using reasoning_effort, verify the value is one of "low", "medium", or "high".

My request returns 401 Unauthorized

  • Verify your API key is correct and has not been revoked.
  • Check the Authorization header format: Bearer xero_...
  • Ensure the API key belongs to the correct project.

I am getting 429 Too Many Requests

  • Check the code field: capacity_exceeded is transient (retry), quota_exceeded is persistent (upgrade or wait).
  • Respect the Retry-After header.
  • Reduce your request rate or implement client-side rate limiting.
  • Consider upgrading to a higher tier for increased rate limits.

My responses are slow

  • Use streaming to reduce perceived latency.
  • Check if your prompts are optimized for prefix caching. See Prefix Caching.
  • Consider a GPU tier for latency-sensitive workloads.
  • Reduce max_tokens if you do not need long responses.

My streaming connection drops

  • Check your client's read timeout -- it must exceed the tier's idle stream timeout (120s for Free, 600s for CPU/GPU, 3600s for XIM).
  • The server sends heartbeat comments every 15 seconds to keep the connection alive. If you are behind a proxy, ensure it does not strip SSE events or impose its own timeout shorter than the idle timeout.
  • If you receive an event: error, the stream has been terminated by the server. Check the error type for the cause.
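One way to apply the first point is to derive the client read timeout from the tier's idle stream timeout plus a safety margin. A sketch (the tier keys here are illustrative labels, not API values):

```python
# Idle stream timeouts by tier, in seconds, from the guidance above.
IDLE_TIMEOUTS = {"free": 120, "cpu": 600, "gpu": 600, "xim": 3600}

def client_read_timeout(tier: str, margin: float = 30.0) -> float:
    """Pick a client read timeout safely above the tier's idle timeout."""
    return IDLE_TIMEOUTS[tier] + margin

client_read_timeout("free")  # pass this as your HTTP client's read timeout
```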

When to Contact Support

Contact support if:

  • You receive persistent internal_error (500) responses that do not resolve with retries.
  • You see backend_unavailable (503) errors consistently for more than 5 minutes.
  • Your usage metrics do not match your billing.
  • You suspect unauthorized access to your account or API keys.

When contacting support, include: your project ID, the endpoint name, the request ID from the X-Request-ID response header, the error response body, and the approximate time of the issue.