Responses API
Create model responses with built-in conversation state, tool orchestration, and persistent storage. Compatible with the OpenAI Responses API.
Overview
The Responses API is a higher-level interface for generating model outputs. Unlike the Chat Completions API where you manage conversation history yourself, the Responses API handles state automatically through response chaining.
When to Use Responses API
- Multi-turn conversations -- Chain responses together with `previous_response_id` instead of resending the full message history.
- Persistent storage -- Responses are stored and retrievable by ID for later reference.
- Background processing -- Queue long-running requests and poll for completion.
- Client SDK support -- OpenAI Python/Node.js SDKs natively support the Responses API.
When to Use Chat Completions
- You need full control over the message history.
- You are using a client or tool that only supports the Chat Completions API.
- You do not need server-side storage of responses.
Translation layer. Internally, every Responses API request is converted to a Chat Completion, routed through the same inference pipeline, and converted back to the Response format. Model behavior is identical.
Quick Start
Create a response with a simple text input:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)

print(response.output[0].content[0].text)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?"
});

console.log(response.output[0].content[0].text);
```

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
```
Response
```json
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  },
  "created_at": 1706123456,
  "service_tier": "default",
  "store": true,
  "metadata": null
}
```
Authentication
All Responses API endpoints require a valid API key with the `inference` scope. Pass it in the `Authorization` header:

```
Authorization: Bearer xero_myproject_your_api_key
```
See Authentication & Security for details on creating and managing API keys.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a response |
| GET | /v1/responses | List responses |
| GET | /v1/responses/{response_id} | Get a response by ID |
| DELETE | /v1/responses/{response_id} | Delete a response |
| POST | /v1/responses/{response_id}/cancel | Cancel an in-progress response |
| GET | /v1/responses/{response_id}/input_items | List input items for a response |
All paths are relative to your endpoint base URL:

```
https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG
```
Create Response
POST /v1/responses
Request Body
| Parameter | Type | Description |
|---|---|---|
| `model` (required) | string | Model identifier. Informational only -- the endpoint configuration determines the actual model used. |
| `input` (required) | string \| array | Input content. Can be a plain text string, an array of messages (`{role, content}`), or an array of input items (`{type, role, content, call_id, output}`). |
| `instructions` (optional) | string | System/developer instructions. Prepended as a system message if not already present from the response chain. |
| `stream` (optional) | boolean | If `true`, the response is streamed as Server-Sent Events. Default: `false` |
| `store` (optional) | boolean | Whether to persist the response for later retrieval. Default: `true` |
| `background` (optional) | boolean | If `true`, the request returns immediately with a `queued` status. Poll the response ID for completion. Default: `false` |
| `previous_response_id` (optional) | string | ID of a previous response to chain onto. The previous response's context is automatically prepended. See Conversation Chaining. |
| `max_output_tokens` (optional) | integer | Maximum number of output tokens to generate. |
| `temperature` (optional) | number | Sampling temperature (0.0-2.0). Higher values produce more random output. |
| `top_p` (optional) | number | Nucleus sampling parameter (0.0-1.0). |
| `tools` (optional) | array | Tool definitions the model may call. See Tool Calling. |
| `tool_choice` (optional) | string \| object | Controls tool selection: `"auto"`, `"none"`, `"required"`, or `{"type":"function","function":{"name":"fn_name"}}` |
| `parallel_tool_calls` (optional) | boolean | Allow multiple tool calls in a single response. Default: `true` |
| `text` (optional) | object | Text format configuration. Supports `{"format":{"type":"text"}}`, `{"format":{"type":"json_object"}}`, or `{"format":{"type":"json_schema","json_schema":{...}}}` |
| `reasoning` (optional) | object | Reasoning configuration for reasoning models. `{"effort":"low\|medium\|high"}` |
| `metadata` (optional) | object | Up to 16 key-value pairs. Keys max 64 characters, values max 512 characters. |
| `user` (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| `truncation` (optional) | string | Truncation strategy when input exceeds the model context window. `"auto"` (default) drops middle items to fit; `"disabled"` returns an error if input exceeds the context. |
| `service_tier` (optional) | string | Requested service tier. Accepted for API compatibility, but the endpoint's configured tier is always used for routing. |
| `include` (optional) | array | Filter which fields appear in the response. Supported values: `file_search_call.results` (include full document chunk text in file_search results). |
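As a sketch of how several of these parameters combine, the payload below requests schema-constrained JSON output via the `text` format option. The schema, field names, and metadata values here are illustrative, not part of the API surface:

```python
# Illustrative request payload combining parameters from the table above.
# The schema name, properties, and metadata key are examples only.
payload = {
    "model": "llama-3.1-8b",
    "input": "Extract the city and country from: 'I flew to Paris, France.'",
    "temperature": 0.2,
    "max_output_tokens": 200,
    "text": {
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "place",
                "schema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "country": {"type": "string"},
                    },
                    "required": ["city", "country"],
                },
            },
        }
    },
    "metadata": {"source": "docs-example"},
}
```

The same dictionary can be passed as keyword arguments to `client.responses.create(...)` or serialized as the JSON body of a raw `POST /v1/responses` request.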
Input Formats
The `input` field accepts three formats.

A plain text string:

```json
"input": "What is the capital of France?"
```

An array of messages:

```json
"input": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"},
  {"role": "user", "content": "What is 2+2?"}
]
```

An array of input items:

```json
"input": [
  {"type": "message", "role": "user", "content": "Call get_weather for Paris"},
  {"type": "function_call_output", "call_id": "call_abc", "output": "{\"temp\":18}"}
]
```
Response Object
A completed response contains the model's output, usage information, and metadata.
```json
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "previous_response_id": null,
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": null
    }
  },
  "created_at": 1706123456,
  "completed_at": 1706123457,
  "service_tier": "default",
  "store": true,
  "metadata": null
}
```
Status Values
| Status | Description |
|---|---|
| `queued` | Request received, waiting for processing. |
| `in_progress` | Inference is actively running. |
| `completed` | Response generated successfully. |
| `failed` | An error occurred during generation. |
| `cancelled` | Cancelled by the user before completion. |
| `incomplete` | Generation stopped early (max tokens, content filter, etc.). |
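For background requests (`"background": true`), a client typically polls the response ID until it reaches one of the terminal statuses above. A minimal sketch, assuming the OpenAI SDK client shown elsewhere in this page (the polling interval and helper names are illustrative):

```python
import time

# Terminal statuses from the table above: once a response reaches one of
# these, its status will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}

def is_terminal(status: str) -> bool:
    """Return True if the given status is terminal."""
    return status in TERMINAL_STATUSES

def wait_for_response(client, response_id: str, interval: float = 1.0):
    """Poll a background response until it reaches a terminal status.

    `client` is an OpenAI SDK client configured with your endpoint base URL;
    the retrieve call corresponds to GET /v1/responses/{response_id}.
    """
    while True:
        response = client.responses.retrieve(response_id)
        if is_terminal(response.status):
            return response
        time.sleep(interval)
```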
Output Item Types
| Type | Description |
|---|---|
| `message` | Assistant text message. Contains `role`, `content[]` (array of content parts), and `status`. |
| `function_call` | Tool/function call. Contains `call_id`, `name`, `arguments` (JSON string), and `status`. |
| `web_search_call` | Web search tool execution. Contains `id` and `status` (`in_progress`, `searching`, `completed`). |
| `file_search_call` | Document search tool execution. Contains `id`, `status`, and optionally `results` (when `include` contains `file_search_call.results`). |
| `reasoning` | Reasoning summary from models that produce think-tag content. Contains `id` and summary text. |
Echo Fields
The response object includes echo fields that mirror request parameters, making it easy to see the exact configuration used.
| Field | Type | Description |
|---|---|---|
| `temperature` | number \| null | The temperature value used for this response. |
| `top_p` | number \| null | The top_p value used for this response. |
| `max_output_tokens` | integer \| null | The maximum output tokens configured for this response. |
| `tools` | array \| null | The tools that were available for this response. |
| `tool_choice` | string \| object \| null | The tool choice setting used for this response. |
| `text` | object \| null | The text format configuration used for this response. |
| `reasoning` | object \| null | The reasoning configuration used for this response. |
| `truncation` | string \| null | The truncation strategy used for this response. |
| `instructions` | string \| null | The system instructions used for this response. |
| `parallel_tool_calls` | boolean \| null | Whether parallel tool calls were enabled. |
Streaming
Set `"stream": true` to receive the response as Server-Sent Events. The Responses API uses named event types for structured streaming.
SSE Event Types
| Event | Description |
|---|---|
| `response.created` | Response record created (status: `queued`). |
| `response.in_progress` | Inference started (status: `in_progress`). |
| `response.output_item.added` | New output item (message or function_call) started. |
| `response.content_part.added` | New content part added to an output item. |
| `response.output_text.delta` | Incremental text content. |
| `response.content_part.done` | Content part finished. |
| `response.output_item.done` | Output item finished. |
| `response.function_call_arguments.done` | Function call arguments finished streaming. |
| `response.web_search_call.in_progress` | Web search tool invoked, search starting. |
| `response.web_search_call.searching` | Query submitted to search engine. |
| `response.web_search_call.completed` | Web search results returned. |
| `response.file_search_call.in_progress` | File search tool invoked. |
| `response.file_search_call.searching` | Document search query running. |
| `response.file_search_call.completed` | Document search results returned. |
| `response.reasoning_summary_part.added` | Reasoning summary output started. |
| `response.reasoning_summary_text.delta` | Incremental reasoning summary text. |
| `response.reasoning_summary_text.done` | Reasoning summary text complete. |
| `response.reasoning_summary_part.done` | Reasoning summary part finished. |
| `error` | Streaming error occurred. Contains an error object with `type`, `message`, `code`. |
| `response.completed` | Response completed with full usage data. |
Streaming Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
  stream: true
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```

```bash
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
```
```
event: response.created
data: {"id":"resp_abc123","object":"response","status":"queued","model":"llama-3.1-8b"}

event: response.in_progress
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"output_index":0,"content_index":0,"delta":"The capital of France is Paris."}

event: response.content_part.done
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":"The capital of France is Paris."}}

event: response.output_item.done
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"The capital of France is Paris."}],"status":"completed"}}

event: response.completed
data: {"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":8,"total_tokens":20}}

data: [DONE]
```
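If you consume the stream without an SDK, each record is an `event:` line followed by a `data:` line, as shown above. A minimal parser for the text deltas (a sketch; a production client should also handle `error` events and reconnection):

```python
import json

def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of raw SSE lines.

    Stops at the "[DONE]" sentinel that terminates the stream.
    """
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield event, json.loads(data)

def collect_text(lines):
    """Concatenate all output_text deltas from a raw SSE stream."""
    text = ""
    for event, data in iter_sse_events(lines):
        if event == "response.output_text.delta":
            text += data["delta"]
    return text
```

The `lines` argument can be anything that yields decoded lines, such as `response.iter_lines()` from an HTTP client.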
Conversation Chaining
Use previous_response_id to build multi-turn conversations without
resending the full message history. The server automatically retrieves the
previous response's context and prepends it to your new input.
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
# Returns: {"id": "resp_abc123", ...}
```

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What about Germany?",
    "previous_response_id": "resp_abc123"
  }'
```
Chain Requirements
- The previous response must exist and belong to the same project.
- The previous response must be in a terminal state (completed, failed, cancelled, or incomplete).
- Maximum chain depth is 50 responses.
- Circular references are detected and rejected.
Tool Calling
Define function tools in the `tools` parameter. The model may generate `function_call` output items that your application executes.
Request with a tool definition:

```json
{
  "model": "llama-3.1-8b",
  "input": "What is the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

Response containing a `function_call` output item:

```json
{
  "id": "resp_tool123",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\":\"Paris\"}",
      "status": "completed"
    }
  ]
}
```

Submitting the tool result on a follow-up request:

```json
{
  "model": "llama-3.1-8b",
  "previous_response_id": "resp_tool123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc123",
      "output": "{\"temperature\": 18, \"condition\": \"sunny\"}"
    }
  ]
}
```
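Putting the steps together, the client-side glue between the model's `function_call` items and the follow-up `function_call_output` input can be sketched like this (the local `get_weather` implementation and the helper name are illustrative):

```python
import json

def get_weather(location: str) -> dict:
    # Illustrative stand-in; a real tool would call an actual weather API.
    return {"temperature": 18, "condition": "sunny"}

def build_tool_outputs(output_items, tools):
    """Execute each function_call output item and build the
    function_call_output input items for the follow-up request.

    `output_items` are dicts shaped like the function_call item above;
    `tools` maps tool names to local Python callables.
    """
    results = []
    for item in output_items:
        if item["type"] != "function_call":
            continue
        fn = tools[item["name"]]
        args = json.loads(item["arguments"])
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(fn(**args)),
        })
    return results
```

The returned list is sent as `input` on the follow-up request, with `previous_response_id` set to the ID of the response that contained the tool calls.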
Storage & Retention
Responses are stored by default (`"store": true`) using the platform's standard two-tier storage architecture. Content is encrypted at rest and retained for 90 days. For details on storage tiers, encryption, retention, and billing, see Storage.
Set `"store": false` to skip storage entirely. The response will still be returned but will not be retrievable by ID afterward.
Error Handling
Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | `invalid_request` | Missing or invalid parameters. |
| 400 | `invalid_state` | Previous response is not in a terminal state, or response cannot be cancelled. |
| 400 | `chain_depth_exceeded` | Response chain exceeds the maximum depth of 50. |
| 401 | `authentication_error` | Invalid or missing API key. |
| 404 | `not_found` | Response or previous response not found. |
| 429 | `rate_limit_exceeded` | Too many requests. Check the `Retry-After` header. |
| 503 | `capacity_exceeded` | No available workers. Check the `Retry-After` header. |
```json
{
  "error": {
    "message": "Previous response is not complete: resp_abc123",
    "type": "invalid_request_error",
    "code": "invalid_state"
  }
}
```
Client Integrations
OpenCode
OpenCode supports the Responses API via the `@ai-sdk/openai-compatible` adapter. Configure it in `~/.config/opencode/opencode.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "xerotier": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "xerotier",
      "options": {
        "baseURL": "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
        "headers": {
          "Authorization": "Bearer xero_myproject_your_api_key"
        }
      },
      "models": {
        "my-model": {
          "name": "llama-3.1-8b",
          "reasoning": true,
          "tool_call": true,
          "tools": true
        }
      }
    }
  },
  "model": "xerotier/my-model"
}
```
See OpenCode Integration for full configuration details and troubleshooting.
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Non-streaming
response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)

# Streaming
stream = client.responses.create(
    model="llama-3.1-8b",
    input="Explain quantum computing in simple terms.",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

# Conversation chaining
first = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
second = client.responses.create(
    model="llama-3.1-8b",
    input="What about Germany?",
    previous_response_id=first.id
)
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// Non-streaming
const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);

// Streaming
const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "Explain quantum computing in simple terms.",
  stream: true,
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```
curl (Non-Streaming)
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
```
curl (Streaming)
```bash
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
```
List and Retrieve
```bash
# List responses
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=10" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get a specific response
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get input items
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Cancel an in-progress response
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Delete a response
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"
```