Responses API
Stateful generation without rebuilding the conversation each turn. Chain responses by id, queue long jobs in the background, stream incremental reasoning, and let the server keep the message history. Same wire shape as the OpenAI Responses API.
Overview
When to Use Responses API
- Multi-turn conversations: Chain responses together with previous_response_id instead of resending the full message history.
- Persistent storage: Responses are stored and retrievable by ID for later reference.
- Background processing: Queue long-running requests and poll for completion.
- Client SDK support: The OpenAI Python and Node.js SDKs natively support the Responses API.
When to Use Chat Completions
- You need full control over the message history.
- You are using a client or tool that only supports the Chat Completions API.
- You do not need server-side storage of responses.
Translation layer. Internally, every Responses API request is converted to a Chat Completion, routed through the same inference pipeline, and converted back to the Response format. Model behavior is identical.
Quick Start
Create a response with a simple text input:
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
response = client.responses.create(
model="llama-3.1-8b",
input="What is the capital of France?"
)
print(response.output[0].content[0].text)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key"
});
const response = await client.responses.create({
model: "llama-3.1-8b",
input: "What is the capital of France?"
});
console.log(response.output[0].content[0].text);
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What is the capital of France?"
}'
Response
{
"id": "resp_abc123def456ghi789jkl012",
"object": "response",
"model": "llama-3.1-8b",
"status": "completed",
"output": [
{
"type": "message",
"id": "msg_001",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "The capital of France is Paris."
}
],
"status": "completed"
}
],
"usage": {
"input_tokens": 12,
"output_tokens": 8,
"total_tokens": 20
},
"created_at": 1706123456,
"service_tier": "default",
"store": true,
"metadata": null
}
Authentication
All Responses API endpoints require a valid API key with the
inference scope. Pass it in the Authorization header:
Authorization: Bearer xero_myproject_your_api_key
See Authentication & Security for details on creating and managing API keys.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a response |
| GET | /v1/responses | List responses |
| GET | /v1/responses/{response_id} | Get a response by ID |
| DELETE | /v1/responses/{response_id} | Delete a response |
| POST | /v1/responses/{response_id}/cancel | Cancel an in-progress response |
| GET | /v1/responses/{response_id}/input_items | List input items for a response |
All paths are relative to your endpoint base URL:
https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG
Create Response
POST /v1/responses
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Used by the router for validation gates (e.g. model_not_found, invalid_model_id); the endpoint configuration ultimately determines which backend model serves the request. |
| input (required) | string or array | Input content. Can be a plain text string, an array of messages ({role, content}), or an array of input items ({type, role, content, call_id, output}). |
| instructions (optional) | string | System/developer instructions. Prepended as a system message if not already present from the response chain. |
| stream (optional) | boolean | If true, the response is streamed as Server-Sent Events. Default: false |
| store (optional) | boolean | Whether to persist the response for later retrieval. Default: true |
| background (optional) | boolean | If true, the request returns immediately with a queued status. Poll the response ID for completion. Default: false |
| previous_response_id (optional) | string | ID of a previous response to chain onto. The previous response's context is automatically prepended. See Conversation Chaining. |
| conversation (optional) | object | Link this response to a server-side conversation. Pass {"id": "conv_xxx"} to prepend the conversation's existing items as context and append the response output as new items. See Conversations. |
| max_output_tokens (optional) | integer | Maximum number of output tokens to generate. |
| temperature (optional) | number | Sampling temperature (0.0-2.0). Higher values produce more random output. |
| top_p (optional) | number | Nucleus sampling parameter (0.0-1.0). |
| tools (optional) | array | Tool definitions the model may call. See Tool Calling. |
| tool_choice (optional) | string or object | Controls tool selection: "auto", "none", "required", or {"type":"function","function":{"name":"fn_name"}} |
| parallel_tool_calls (optional) | boolean | Allow multiple tool calls in a single response. Default: true |
| text (optional) | object | Text format configuration. Supports {"format":{"type":"text"}}, {"format":{"type":"json_object"}}, or {"format":{"type":"json_schema","json_schema":{...}}} |
| reasoning (optional) | object | Reasoning configuration for reasoning models: {"effort": "..."}, where effort is "low", "medium", or "high". |
| metadata (optional) | object | Up to 16 key-value pairs. Keys max 64 characters, values max 512 characters. |
| user (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| truncation (optional) | string | Truncation strategy when input exceeds the model context window. "auto" (default) drops middle items to fit; "disabled" returns an error if input exceeds the context. |
| service_tier (optional) | string | Requested service tier. Accepted for API compatibility; the effective tier is resolved from the calling project's tier first, then the endpoint's configured tier. |
| include (optional) | array | Filter which fields appear in the response. Supported values: file_search_call.results (include full document chunk text in file_search results). |
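The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `check_request_limits` is a hypothetical helper name, and the document does not say whether the server rejects or clamps out-of-range values, so this only reports what falls outside the documented ranges.

```python
def check_request_limits(params: dict) -> list[str]:
    """Return a list of problems; an empty list means the documented limits are met."""
    problems = []

    # `model` and `input` are the only required parameters.
    for required in ("model", "input"):
        if required not in params:
            problems.append(f"missing required parameter: {required}")

    # Documented ranges: temperature 0.0-2.0, top_p 0.0-1.0.
    temperature = params.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")
    top_p = params.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")

    # Documented metadata limits: 16 pairs, 64-character keys, 512-character values.
    metadata = params.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"metadata key exceeds 64 characters: {key[:16]}...")
        if len(str(value)) > 512:
            problems.append(f"metadata value exceeds 512 characters for key {key[:16]!r}")
    return problems
```

Calling `check_request_limits({"model": "llama-3.1-8b", "input": "hi"})` returns an empty list; a request with `"temperature": 3` returns one problem string.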
Input Formats
The input field accepts three formats:
"input": "What is the capital of France?"
"input": [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "What is 2+2?"}
]
"input": [
{"type": "message", "role": "user", "content": "Call get_weather for Paris"},
{"type": "function_call_output", "call_id": "call_abc", "output": "{\"temp\":18}"}
]
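To make the relationship between the three input formats concrete, here is a rough sketch of how they might be flattened into a Chat Completions style message list. This is an illustration only: the server's actual conversion is not shown in this document, `to_chat_messages` is a hypothetical name, and mapping `function_call_output` items to `tool` role messages is an assumption.

```python
def to_chat_messages(input_value, instructions=None):
    """Flatten a Responses-style `input` into a list of chat messages."""
    messages = []
    if instructions:
        # `instructions` is documented as being prepended as a system message.
        messages.append({"role": "system", "content": instructions})

    # Format 1: a plain text string becomes a single user message.
    if isinstance(input_value, str):
        messages.append({"role": "user", "content": input_value})
        return messages

    for item in input_value:
        # Format 2 items have no "type" key; treat them as messages.
        item_type = item.get("type", "message")
        if item_type == "message":
            messages.append({"role": item["role"], "content": item["content"]})
        elif item_type == "function_call_output":
            # Assumption: tool results map to Chat Completions "tool" messages.
            messages.append({
                "role": "tool",
                "tool_call_id": item["call_id"],
                "content": item["output"],
            })
        else:
            raise ValueError(f"unsupported input item type: {item_type}")
    return messages
```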
List Responses
GET /v1/responses
Returns a paginated list of responses for the project, ordered by creation time (newest first).
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| after (optional) | string | Cursor for forward pagination. Pass the id of the last response from the previous page. |
| limit (optional) | integer | Number of responses to return. Default: 20. Maximum: 100. |
Response
{
"object": "list",
"data": [
{
"id": "resp_abc123",
"object": "response",
"model": "llama-3.1-8b",
"status": "completed",
"created_at": 1706123456,
"completed_at": 1706123458,
"input_tokens": 12,
"output_tokens": 8,
"store": true,
"metadata": null
}
],
"first_id": "resp_abc123",
"last_id": "resp_abc123",
"has_more": false
}
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=20" \
-H "Authorization: Bearer xero_myproject_your_api_key"
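Cursor pagination over this endpoint can be sketched as a generator that keeps passing `last_id` as `after` until `has_more` is false. `fetch_page` is a hypothetical callable standing in for whatever issues `GET /v1/responses` and returns the decoded JSON body; it is not part of any SDK.

```python
def iter_responses(fetch_page, limit=20):
    """Yield every response, following the `after` cursor page by page."""
    after = None
    while True:
        page = fetch_page(after=after, limit=limit)
        yield from page["data"]
        # Stop when the server reports no further pages (or returns an empty page).
        if not page.get("has_more") or not page["data"]:
            return
        # The cursor is the id of the last response on the current page.
        after = page["last_id"]
```

Because it is a generator, callers can stop early (for example after finding a specific response) without fetching the remaining pages.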
Get Response
GET /v1/responses/{response_id}
Retrieves the full response object for a stored response, including output content.
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response
{
"id": "resp_abc123",
"object": "response",
"model": "llama-3.1-8b",
"status": "completed",
"created_at": 1706123456,
"completed_at": 1706123458,
"input_tokens": 12,
"output_tokens": 24,
"output": [
{
"type": "message",
"role": "assistant",
"content": [
{"type": "output_text", "text": "Paris is the capital of France."}
]
}
],
"store": true,
"metadata": null
}
See Response Object for the full field list. Returns 404 if the response does not exist or belongs to a different project.
Delete Response
DELETE /v1/responses/{response_id}
Permanently deletes a stored response and its associated content from both hot and cold storage. This action cannot be undone.
curl -X DELETE \
"https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response
{
"id": "resp_abc123",
"object": "response.deleted",
"deleted": true
}
Cancel Response
POST /v1/responses/{response_id}/cancel
Cancels an in-progress response. Only responses with status in_progress
or queued can be cancelled. Completed, failed, or already-cancelled
responses return a 400 error.
curl -X POST \
"https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Returns the updated response object with status: "cancelled".
List Input Items
GET /v1/responses/{response_id}/input_items
Returns the input items that were submitted with the response request.
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| after (optional) | string | Cursor for forward pagination. |
| limit (optional) | integer | Number of items to return. Default: 20. Maximum: 100. |
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items" \
-H "Authorization: Bearer xero_myproject_your_api_key"
Response Object
A completed response contains the model's output, usage information, and metadata.
{
"id": "resp_abc123def456ghi789jkl012",
"object": "response",
"model": "llama-3.1-8b",
"status": "completed",
"previous_response_id": null,
"output": [
{
"type": "message",
"id": "msg_001",
"role": "assistant",
"content": [
{
"type": "output_text",
"text": "The capital of France is Paris.",
"annotations": []
}
],
"status": "completed"
}
],
"usage": {
"input_tokens": 12,
"output_tokens": 8,
"total_tokens": 20,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens_details": {
"reasoning_tokens": 0
}
},
"created_at": 1706123456,
"completed_at": 1706123457,
"service_tier": "default",
"store": true,
"metadata": null
}
Status Values
| Status | Description |
|---|---|
| queued | Request received, waiting for processing. |
| in_progress | Inference is actively running. |
| completed | Response generated successfully. |
| failed | An error occurred during generation. |
| cancelled | Cancelled by the user before completion. |
| incomplete | Generation stopped early (max tokens, content filter, etc.). |
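For background requests, these status values drive a polling loop: keep fetching the response until it leaves `queued` and `in_progress`. The sketch below assumes a hypothetical `get_response` callable that returns the response as a dict (for example by calling `GET /v1/responses/{response_id}` and decoding the JSON body); the interval and timeout defaults are arbitrary choices, not documented values.

```python
import time

# Every status other than queued and in_progress is terminal.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}


def wait_for_response(get_response, response_id, interval=1.0, timeout=300.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until the response reaches a terminal status, then return it."""
    deadline = clock() + timeout
    while True:
        response = get_response(response_id)
        if response["status"] in TERMINAL_STATUSES:
            return response
        if clock() >= deadline:
            raise TimeoutError(
                f"{response_id} still {response['status']} after {timeout}s"
            )
        sleep(interval)
```

`sleep` and `clock` are injectable so the loop can be exercised in tests without real waiting.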
Output Item Types
| Type | Description |
|---|---|
| message | Assistant text message. Contains role, content[] (array of content parts), and status. |
| function_call | Tool/function call. Contains call_id, name, arguments (JSON string), and status. |
| web_search_call | Web search tool execution. Contains id, status (in_progress, searching, completed). |
| file_search_call | Document search tool execution. Contains id, status, and optionally results (when include contains file_search_call.results). |
| reasoning | Reasoning summary from models that produce think-tag content. Contains id and summary text. |
Echo Fields
The response object includes echo fields that mirror request parameters, making it easy to see the exact configuration used.
| Field | Type | Description |
|---|---|---|
| temperature | number or null | The temperature value used for this response. |
| top_p | number or null | The top_p value used for this response. |
| max_output_tokens | integer or null | The maximum output tokens configured for this response. |
| tools | array or null | The tools that were available for this response. |
| tool_choice | string, object, or null | The tool choice setting used for this response. |
| text | object or null | The text format configuration used for this response. |
| reasoning | object or null | The reasoning configuration used for this response. |
| truncation | string or null | The truncation strategy used for this response. |
| instructions | string or null | The system instructions used for this response. |
| parallel_tool_calls | boolean or null | Whether parallel tool calls were enabled. |
Streaming
Set "stream": true to receive the response as Server-Sent Events.
The Responses API uses named event types for structured streaming.
Lifecycle Events
| Event | Description |
|---|---|
| response.created | Response record created (status: queued). Contains the initial response object. |
| response.in_progress | Inference started (status: in_progress). |
| response.completed | Response completed normally. Contains the final response object with full usage data. |
| response.failed | An error occurred during generation. Contains the response object with error details. Mid-stream errors are surfaced via this event; the Responses stream does not emit a separate line named event: error. |
Output Item Events
| Event | Description |
|---|---|
| response.output_item.added | New output item (message, function_call, or reasoning) started. Contains output_index and initial item object. |
| response.output_item.done | Output item finished. Contains the completed item object. |
| response.content_part.added | New content part added to an output item. Contains output_index, content_index, and initial part object. |
| response.content_part.done | Content part finished. Contains the completed part object. |
Text Delta Events
| Event | Description |
|---|---|
| response.output_text.delta | Incremental text content. Contains item_id, output_index, content_index, and delta string. |
| response.output_text.done | Text content part complete. Contains item_id, output_index, content_index, and full accumulated text. |
Tool Call Events
| Event | Description |
|---|---|
| response.function_call_arguments.done | Function call arguments complete. Contains output_index, item_id, and full arguments JSON string. (Arguments are emitted as a single terminal event; no incremental .delta stream is produced.) |
Reasoning Summary Events
Emitted by models that produce think-tag reasoning content (e.g. Qwen3, DeepSeek-R1).
| Event | Description |
|---|---|
| response.reasoning_summary_part.added | Reasoning summary part started within a reasoning output item. |
| response.reasoning_summary_text.delta | Incremental reasoning summary text. Contains item_id, output_index, summary_index, and delta string. |
| response.reasoning_summary_text.done | Reasoning summary text complete. Contains the full accumulated text. |
| response.reasoning_summary_part.done | Reasoning summary part finished. |
Streaming Example
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
stream = client.responses.create(
model="llama-3.1-8b",
input="What is the capital of France?",
stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key"
});
const stream = await client.responses.create({
model: "llama-3.1-8b",
input: "What is the capital of France?",
stream: true
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
curl --no-buffer -X POST \
https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What is the capital of France?",
"stream": true
}'
event: response.created
data: {"id":"resp_abc123","object":"response","status":"queued","model":"llama-3.1-8b"}
event: response.in_progress
data: {"id":"resp_abc123","status":"in_progress"}
event: response.output_item.added
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}
event: response.content_part.added
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}
event: response.output_text.delta
data: {"item_id":"msg_001","output_index":0,"content_index":0,"delta":"The capital of France is Paris."}
event: response.content_part.done
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":"The capital of France is Paris."}}
event: response.output_item.done
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"The capital of France is Paris."}],"status":"completed"}}
event: response.completed
data: {"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":8,"total_tokens":20}}
event: done
data: [DONE]
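For consumers that do not use an SDK, the wire format above can be read with a small Server-Sent Events parser. This is a minimal sketch that assumes events are separated by blank lines, as SSE framing normally is; `parse_sse` and `collect_text` are illustrative names, and only the `event:` and `data:` fields are handled.

```python
import json


def parse_sse(lines):
    """Yield (event_name, data) pairs from an iterable of SSE text lines."""
    event_name, data_parts = None, []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = None, []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips a single leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event_name, "\n".join(data_parts)


def collect_text(lines):
    """Concatenate every response.output_text.delta until [DONE]."""
    chunks = []
    for event_name, data in parse_sse(lines):
        if data == "[DONE]":
            break
        if event_name == "response.output_text.delta":
            chunks.append(json.loads(data)["delta"])
    return "".join(chunks)
```

`collect_text` accepts any iterable of lines, so it can be fed from a file, a list in a test, or a line iterator over an HTTP response body.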
Vendor Events
In addition to standard response.* events, the platform emits
vendor-prefixed events (x_*) for Xerotier-specific features such
as research mode, deep think, and user interaction. Vendor events ride on the
Responses stream in two wire shapes:
- The dominant shape is a bare data: {"type":"x_*",...} line with no preceding event: name. Standard OpenAI SDK clients ignore unrecognized data lines, so this shape is wire-compatible.
- The x_artifact.* family on the Responses path uses a named SSE event (event: x_artifact.created\ndata: {json}\n\n) with camelCase payload keys. Strict OpenAI SDK clients will surface these as unknown event types.
Some vendor families listed below are emitted only by the Xerotier dashboard
surface and are not produced on the public Responses stream;
those rows are marked dashboard-only. Public SDK consumers should
not depend on dashboard-only events arriving on /v1/responses.
Compatibility. Standard OpenAI SDK clients ignore unrecognized data lines. Vendor events are only relevant when building a custom stream consumer that wants to render research progress, deep think status, or artifact notifications.
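A custom stream consumer has to cope with both wire shapes. One possible approach, sketched here, is to take the event type from the `event:` line when there is one and fall back to the `type` field of the JSON body otherwise. `classify_event` is a hypothetical helper; it expects an already-parsed event name (or `None`) and the raw `data:` string.

```python
import json


def classify_event(event_name, data):
    """Return (kind, event_type, payload) for one parsed SSE event."""
    if data == "[DONE]":
        return "done", None, None
    payload = json.loads(data)
    # Named events carry the type on the `event:` line; bare vendor
    # lines carry it in the "type" field of the JSON body.
    event_type = event_name or payload.get("type")
    if event_type is None:
        return "unknown", None, payload
    kind = "vendor" if event_type.startswith("x_") else "standard"
    return kind, event_type, payload
```

A consumer can then route `"vendor"` events to optional UI features and silently drop any type it does not recognize.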
Research Events (x_research.*)
Emitted during agentic research loops (research mode). Each event is a
data: line containing a JSON object with a type
field, name (tool name), arguments (JSON string),
and optional metadata.
| Event type | Description |
|---|---|
| x_research.searching | Web search tool invoked. arguments contains the search query JSON. |
| x_research.reading | URL fetch tool invoked. arguments contains {"url":"..."}. |
| x_research.code_searching | Code search tool invoked (GitLab or local index). |
| x_research.calculating | Calculator tool invoked. |
| x_research.result | Tool returned a result. metadata contains a brief summary of the result. |
| x_research.complete (dashboard-only) | Research loop finished. Emitted only by the Xerotier dashboard surface; the public Responses stream does not produce this event. Dashboard payload contains elapsed_ms, input_tokens, output_tokens, iterations, and sources counts. |
data: {"type":"x_research.searching","name":"x_web_search","arguments":"{\"query\":\"Paris weather\"}"}
data: {"type":"x_research.reading","name":"x_fetch_url","arguments":"{\"url\":\"https://example.com/\"}"}
data: {"type":"x_research.result","name":"x_web_search","arguments":"{\"query\":\"Paris weather\"}","metadata":{"summary":"Paris is currently 18C and sunny."}}
data: {"type":"x_research.complete","elapsed_ms":4200,"input_tokens":3200,"output_tokens":480,"iterations":3,"sources":5}
Deep Think Events (x_deep_think.*)
Emitted during deep think (multi-step research with planning and synthesis). Clients can use these to render a sub-task progress panel.
| Event type | Description |
|---|---|
| x_deep_think.plan_created | Planning phase complete. Contains title and total_subtasks. |
| x_deep_think.discovery_started | Target-focused discovery phase begun. Contains message. |
| x_deep_think.discovery_completed | Discovery phase complete. Contains message. |
| x_deep_think.subtask_started | A sub-task has begun. Contains subtask_id, subtask_index, subtask_query, and total_subtasks. |
| x_deep_think.subtask_completed | A sub-task has finished. Contains subtask_index, input_tokens, and output_tokens. |
| x_deep_think.subtask_artifact_saved (dashboard-only) | A sub-task research artifact was persisted. Dashboard payload contains artifact_id, artifact_name, and subtask_index. |
| x_deep_think.artifact_created (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event. |
| x_deep_think.artifact_saved (dashboard-only) | The deep think synthesis artifact was persisted. Dashboard payload contains artifact_name and artifact_id. |
| x_deep_think.synthesizing | Synthesis phase begun. |
| x_deep_think.completed (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event. |
| x_deep_think.memories_created (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event. |
| x_deep_think.error (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event. |
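A sub-task progress panel could be driven by folding these payloads into a small state dict. The sketch below is one way to do that; `track_deep_think` and the keys of the returned dict are illustrative choices, and it relies only on the events documented as emitted on the public stream.

```python
def track_deep_think(events):
    """Fold x_deep_think.* payloads (dicts) into a progress summary."""
    progress = {
        "title": None, "total": 0, "started": 0, "completed": 0,
        "phase": None, "input_tokens": 0, "output_tokens": 0,
    }
    for event in events:
        kind = event.get("type", "")
        if kind == "x_deep_think.plan_created":
            progress["phase"] = "planned"
            progress["title"] = event.get("title")
            progress["total"] = event.get("total_subtasks", 0)
        elif kind == "x_deep_think.discovery_started":
            progress["phase"] = "discovery"
        elif kind == "x_deep_think.subtask_started":
            progress["phase"] = "subtasks"
            progress["started"] += 1
        elif kind == "x_deep_think.subtask_completed":
            progress["completed"] += 1
            progress["input_tokens"] += event.get("input_tokens", 0)
            progress["output_tokens"] += event.get("output_tokens", 0)
        elif kind == "x_deep_think.synthesizing":
            progress["phase"] = "synthesizing"
        # Other event types are ignored on purpose.
    return progress
```

Since `x_deep_think.completed` is not currently emitted, a consumer built this way would treat the standard `response.completed` event, not a vendor event, as the signal that the run has finished.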
Ask User Events (x_ask_user.*)
Emitted when the model needs clarification before it can continue. The stream pauses and the application should prompt the user, then resume with the answer.
| Event type | Description |
|---|---|
| x_ask_user.question | The model requires user input before continuing. Contains ask_user_id (correlation ID) and question text. Submit the answer via the chat answer endpoint. |
| x_ask_user.pending_state | Captures the assistant content and tool calls accumulated before the pause, enabling conversation resumption after the user answers. |
Artifact Events (x_artifact.*)
Emitted when code artifacts (code blocks, documents) are created or updated
during a generation. On the Responses path these are written as named
SSE events (event: x_artifact.created\ndata: {json}\n\n)
with camelCase payload keys. This differs from the
chat-completions surface, which emits the same family as bare
data: lines with snake_case keys.
| Event type | Description |
|---|---|
| x_artifact.created | New artifact created. Responses-path payload contains artifactId, identifier, title, language, contentType, and contentBase64 (base64-encoded artifact bytes). |
| x_artifact.updated | Existing artifact updated. Same payload shape as x_artifact.created with the new version of the content. |
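Because the artifact body arrives base64-encoded, a consumer has to decode `contentBase64` before using it. A minimal sketch, assuming the Responses-path camelCase payload described above; `decode_artifact` and the snake_case keys of its return value are illustrative.

```python
import base64


def decode_artifact(payload):
    """Turn an x_artifact.created / x_artifact.updated payload into a plain dict."""
    return {
        "id": payload["artifactId"],
        "identifier": payload.get("identifier"),
        "title": payload.get("title"),
        "language": payload.get("language"),
        "content_type": payload.get("contentType"),
        # contentBase64 holds the raw artifact bytes, base64-encoded.
        "content": base64.b64decode(payload["contentBase64"]),
    }
```

The decoded `content` is left as bytes; whether to decode it as text depends on `content_type`.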
Context Fork Event (x_context_fork) (dashboard-only)
Emitted by the Xerotier dashboard surface when the user's message triggered
creation of a new conversation branch. The public Responses stream does not
emit this event. Unlike other vendor events the name has no
.<suffix> segment.
Chat Metadata Event (x_chat.metadata) (dashboard-only)
Emitted by the Xerotier dashboard surface as the final data event before
[DONE]. The public Responses stream (/v1/responses)
does not emit this event; public SDK consumers should not wait for it.
Payload uses camelCase keys (dashboard convention) rather than the snake_case
used by the rest of the vendor surface.
| Field | Description |
|---|---|
| type | "x_chat.metadata" |
| messageId | Server-assigned external ID for the persisted assistant message. |
| userMessageId | External ID for the persisted user message. |
| sequence | Monotonically increasing sequence number for the assistant message within the conversation. |
| context | Context budget breakdown: systemTokens, summaryTokens, retrievedTokens, recentTokens, fileTokens, currentMessageTokens, totalTokens, inputBudget, retrievedCount, recentCount, usedSemanticRetrieval, semanticRetrievalActive, chunkSelectionMethod. |
| usage | Combined token usage including model inference plus any research or deep think overhead: input_tokens, output_tokens, total_tokens. |
Analyst Events (x_analyst.*)
Emitted when the analyst mode builds or refreshes the workspace context brief before generating a response.
| Event type | Description |
|---|---|
| x_analyst.context_gathering | Workspace context gathering has begun. |
| x_analyst.context_completed | Context gathering finished. Contains counts of gathered items. |
| x_analyst.context_brief_created | The LLM-generated context brief is ready. The brief summarizes the workspace for the response. |
| x_analyst.context_refreshed | A previously cached context brief was refreshed due to workspace changes. |
Conversation Chaining
Use previous_response_id to build multi-turn conversations without
resending the full message history. The server automatically retrieves the
previous response's context and prepends it to your new input.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What is the capital of France?"
}'
# Returns: {"id": "resp_abc123", ...}
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What about Germany?",
"previous_response_id": "resp_abc123"
}'
Chain Requirements
- The previous response must exist and belong to the same project.
- The previous response must be in a terminal state (completed, failed, cancelled, or incomplete).
- Maximum chain depth is 50 responses.
- Circular references are detected and rejected.
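The rules above can be illustrated with a walk over `previous_response_id` links. This is only a sketch of the checks, run against a hypothetical in-memory `records` dict rather than the real store; the server performs its own validation, and how it counts depth (for instance whether the new response is included) is not specified here, so this version counts existing ancestors only.

```python
MAX_CHAIN_DEPTH = 50
TERMINAL = {"completed", "failed", "cancelled", "incomplete"}


def check_chain(records, previous_response_id):
    """Return the chain length ending at previous_response_id, or raise ValueError."""
    if previous_response_id not in records:
        raise ValueError("not_found")
    if records[previous_response_id]["status"] not in TERMINAL:
        raise ValueError("invalid_state")

    seen, current, depth = set(), previous_response_id, 0
    while current is not None:
        if current in seen:
            raise ValueError("circular reference")
        seen.add(current)
        depth += 1
        if depth > MAX_CHAIN_DEPTH:
            raise ValueError("chain_depth_exceeded")
        # Follow the link to the parent; a missing record ends the walk.
        current = records.get(current, {}).get("previous_response_id")
    return depth
```

The error strings reuse the codes from the Error Handling table where one exists; "circular reference" has no documented code and is a placeholder.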
Tool Calling
Define function tools in the tools parameter. The model may generate
function_call output items that your application executes.
{
"model": "llama-3.1-8b",
"input": "What is the weather in Paris?",
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string", "description": "City name"}
},
"required": ["location"]
}
}
}
]
}
{
"id": "resp_tool123",
"status": "completed",
"output": [
{
"type": "function_call",
"id": "fc_001",
"call_id": "call_abc123",
"name": "get_weather",
"arguments": "{\"location\":\"Paris\"}",
"status": "completed"
}
]
}
{
"model": "llama-3.1-8b",
"previous_response_id": "resp_tool123",
"input": [
{
"type": "function_call_output",
"call_id": "call_abc123",
"output": "{\"temperature\": 18, \"condition\": \"sunny\"}"
}
]
}
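The application-side step between the second and third payloads above, executing the requested function and packaging its result, might look like the following. `run_tool_calls` and the `registry` mapping of tool names to Python callables are hypothetical; the sketch assumes each handler accepts the parsed arguments as keyword arguments and returns something JSON-serializable.

```python
import json


def run_tool_calls(output_items, registry):
    """Execute each function_call item and build function_call_output input items."""
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # skip message, reasoning, and other item types
        handler = registry[item["name"]]
        # `arguments` is a JSON string, not an object.
        arguments = json.loads(item["arguments"])
        result = handler(**arguments)
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # must echo the call_id from the model
            "output": json.dumps(result),
        })
    return results
```

The returned list can be sent as the `input` of the follow-up request, together with `previous_response_id` set to the response that contained the function calls.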
Storage & Retention
Responses are stored by default ("store": true) using the
platform's standard two-tier storage architecture. Content is encrypted at
rest and retained based on the endpoint's service tier. For details on
storage tiers, encryption, retention, and billing, see
Storage.
Set "store": false to skip storage entirely. The response will
still be returned but will not be retrievable by ID afterward.
Error Handling
Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters. |
| 400 | invalid_state | Previous response is not in a terminal state, or response cannot be cancelled. |
| 400 | chain_depth_exceeded | Response chain exceeds the maximum depth of 50. |
| 401 | authentication_error | Invalid or missing API key. |
| 404 | not_found | Response or previous response not found. |
| 429 | rate_limit_exceeded | Too many requests. Check Retry-After header. |
| 503 | capacity_exceeded | No available workers. Check Retry-After header. |
{
"error": {
"message": "Previous response is not complete: resp_abc123",
"type": "invalid_request_error",
"code": "invalid_state"
}
}
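For the two retryable statuses in the table (429 and 503), a client can honor `Retry-After` and fall back to exponential backoff when the header is missing or not a number of seconds. In this sketch `send` is a hypothetical zero-argument callable returning `(status, headers, body)`; the attempt count and base delay are arbitrary defaults, and header lookup assumes the exact key `Retry-After`.

```python
import time

RETRYABLE = {429, 503}


def call_with_retry(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status not in RETRYABLE or attempt == max_attempts:
            return status, headers, body
        try:
            # Prefer the server's hint when it is a plain number of seconds.
            delay = float(headers.get("Retry-After"))
        except (TypeError, ValueError):
            # Missing or date-formatted header: exponential backoff instead.
            delay = base_delay * 2 ** (attempt - 1)
        sleep(delay)
```

Other 4xx errors such as `invalid_request` or `not_found` are returned immediately, since repeating the same request would not change the outcome.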
Client Integrations
OpenCode
OpenCode supports the Responses API via the @ai-sdk/openai-compatible adapter. Configure it in ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"provider": {
"xerotier": {
"npm": "@ai-sdk/openai-compatible",
"name": "xerotier",
"options": {
"baseURL": "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
"headers": {
"Authorization": "Bearer xero_myproject_your_api_key"
}
},
"models": {
"my-model": {
"name": "llama-3.1-8b",
"reasoning": true,
"tool_call": true,
"tools": true
}
}
}
},
"model": "xerotier/my-model"
}
See OpenCode Integration for full configuration details and troubleshooting.
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
# Non-streaming
response = client.responses.create(
model="llama-3.1-8b",
input="What is the capital of France?"
)
print(response.output[0].content[0].text)
# Streaming
stream = client.responses.create(
model="llama-3.1-8b",
input="Explain quantum computing in simple terms.",
stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
# Conversation chaining
first = client.responses.create(
model="llama-3.1-8b",
input="What is the capital of France?"
)
second = client.responses.create(
model="llama-3.1-8b",
input="What about Germany?",
previous_response_id=first.id
)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key",
});
// Non-streaming
const response = await client.responses.create({
model: "llama-3.1-8b",
input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);
// Streaming
const stream = await client.responses.create({
model: "llama-3.1-8b",
input: "Explain quantum computing in simple terms.",
stream: true,
});
for await (const event of stream) {
if (event.type === "response.output_text.delta") {
process.stdout.write(event.delta);
}
}
curl (Non-Streaming)
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What is the capital of France?"
}'
curl (Streaming)
curl --no-buffer -X POST \
https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"input": "What is the capital of France?",
"stream": true
}'
List and Retrieve
# List responses
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=10" \
-H "Authorization: Bearer xero_myproject_your_api_key"
# Get a specific response
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
-H "Authorization: Bearer xero_myproject_your_api_key"
# Get input items
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items \
-H "Authorization: Bearer xero_myproject_your_api_key"
# Cancel an in-progress response
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel \
-H "Authorization: Bearer xero_myproject_your_api_key"
# Delete a response
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
-H "Authorization: Bearer xero_myproject_your_api_key"