Responses API

Create model responses with built-in conversation state, tool orchestration, and persistent storage. Compatible with the OpenAI Responses API.

Overview

The Responses API is a higher-level interface for generating model outputs. Unlike the Chat Completions API where you manage conversation history yourself, the Responses API handles state automatically through response chaining.

When to Use Responses API

  • Multi-turn conversations -- Chain responses together with previous_response_id instead of resending the full message history.
  • Persistent storage -- Responses are stored and retrievable by ID for later reference.
  • Background processing -- Queue long-running requests and poll for completion.
  • Client SDK support -- OpenAI Python/Node.js SDKs natively support the Responses API.

When to Use Chat Completions

  • You need full control over the message history.
  • You are using a client or tool that only supports the Chat Completions API.
  • You do not need server-side storage of responses.

Translation layer. Internally, every Responses API request is converted to a Chat Completions request, routed through the same inference pipeline, and the result is converted back to the Response format. Model behavior is identical across the two APIs.

Quick Start

Create a response with a simple text input:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?"
});
console.log(response.output[0].content[0].text);
curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'

Response

{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  },
  "created_at": 1706123456,
  "service_tier": "default",
  "store": true,
  "metadata": null
}

Authentication

All Responses API endpoints require a valid API key with the inference scope. Pass it in the Authorization header:

HTTP Header
Authorization: Bearer xero_myproject_your_api_key

See Authentication & Security for details on creating and managing API keys.

Endpoints

Method Path Description
POST /v1/responses Create a response
GET /v1/responses List responses
GET /v1/responses/{response_id} Get a response by ID
DELETE /v1/responses/{response_id} Delete a response
POST /v1/responses/{response_id}/cancel Cancel an in-progress response
GET /v1/responses/{response_id}/input_items List input items for a response

All paths are relative to your endpoint base URL: https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG

Create Response

POST /v1/responses

Request Body

Parameter Type Description
model (required) string Model identifier. Informational only -- the endpoint configuration determines the actual model used.
input (required) string | array Input content. Can be a plain text string, an array of messages ({role, content}), or an array of input items ({type, role, content, call_id, output}).
instructions (optional) string System/developer instructions. Prepended as a system message if not already present from the response chain.
stream (optional) boolean If true, the response is streamed as Server-Sent Events. Default: false
store (optional) boolean Whether to persist the response for later retrieval. Default: true
background (optional) boolean If true, the request returns immediately with a queued status. Poll the response ID for completion. Default: false
previous_response_id (optional) string ID of a previous response to chain onto. The previous response's context is automatically prepended. See Conversation Chaining.
max_output_tokens (optional) integer Maximum number of output tokens to generate.
temperature (optional) number Sampling temperature (0.0-2.0). Higher values produce more random output.
top_p (optional) number Nucleus sampling parameter (0.0-1.0).
tools (optional) array Tool definitions the model may call. See Tool Calling.
tool_choice (optional) string | object Controls tool selection: "auto", "none", "required", or {"type":"function","function":{"name":"fn_name"}}
parallel_tool_calls (optional) boolean Allow multiple tool calls in a single response. Default: true
text (optional) object Text format configuration. Supports {"format":{"type":"text"}}, {"format":{"type":"json_object"}}, or {"format":{"type":"json_schema","json_schema":{...}}}
reasoning (optional) object Reasoning configuration for reasoning models. {"effort":"low|medium|high"}
metadata (optional) object Up to 16 key-value pairs. Keys max 64 characters, values max 512 characters.
user (optional) string End-user identifier for abuse monitoring and usage tracking.
truncation (optional) string Truncation strategy when input exceeds the model context window. "auto" (default) drops middle items to fit; "disabled" returns an error if input exceeds the context.
service_tier (optional) string Requested service tier. Accepted for API compatibility but the endpoint's configured tier is always used for routing.
include (optional) array Filter which fields appear in the response. Supported values: file_search_call.results (include full document chunk text in file_search results).
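The documented metadata limits (at most 16 key-value pairs, keys up to 64 characters, values up to 512 characters) can be checked client-side before sending a request. A minimal sketch in Python; the helper name is illustrative, not part of any SDK:

```python
def validate_metadata(metadata: dict) -> bool:
    """Check a metadata dict against the documented limits:
    at most 16 key-value pairs, keys up to 64 characters,
    values up to 512 characters."""
    if len(metadata) > 16:
        return False
    return all(
        len(key) <= 64 and len(str(value)) <= 512
        for key, value in metadata.items()
    )
```

Running this before the request avoids a round trip that would end in a 400 invalid_request error.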

Input Formats

The input field accepts three formats:

Plain text (simplest)
"input": "What is the capital of France?"
Message array
"input": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"},
  {"role": "user", "content": "What is 2+2?"}
]
Input items (supports function outputs)
"input": [
  {"type": "message", "role": "user", "content": "Call get_weather for Paris"},
  {"type": "function_call_output", "call_id": "call_abc", "output": "{\"temp\":18}"}
]
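Because input accepts all three shapes, client code that inspects or appends to conversation history often normalizes them into a single list first. A small sketch; the helper is illustrative, not part of the API:

```python
def normalize_input(value):
    """Normalize the three accepted input shapes into a list.

    A plain string becomes a single user message; message arrays and
    input-item arrays are passed through unchanged.
    """
    if isinstance(value, str):
        return [{"role": "user", "content": value}]
    return list(value)
```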

Response Object

A completed response contains the model's output, usage information, and metadata.

JSON
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "previous_response_id": null,
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": null }
  },
  "created_at": 1706123456,
  "completed_at": 1706123457,
  "service_tier": "default",
  "store": true,
  "metadata": null
}

Status Values

Status Description
queued Request received, waiting for processing.
in_progress Inference is actively running.
completed Response generated successfully.
failed An error occurred during generation.
cancelled Cancelled by the user before completion.
incomplete Generation stopped early (max tokens, content filter, etc.).
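A background request can be polled until it reaches one of the four terminal statuses above. A sketch assuming the OpenAI SDK's client.responses.retrieve method; the helper names and polling parameters are illustrative:

```python
import time

# Terminal statuses from the table above.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}

def is_terminal(status: str) -> bool:
    return status in TERMINAL_STATUSES

def wait_for_response(client, response_id, interval=1.0, timeout=120.0):
    """Poll a background response until it leaves queued/in_progress."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = client.responses.retrieve(response_id)
        if is_terminal(response.status):
            return response
        time.sleep(interval)
    raise TimeoutError(f"response {response_id} still running after {timeout}s")
```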

Output Item Types

Type Description
message Assistant text message. Contains role, content[] (array of content parts), and status.
function_call Tool/function call. Contains call_id, name, arguments (JSON string), and status.
web_search_call Web search tool execution. Contains id, status (in_progress, searching, completed).
file_search_call Document search tool execution. Contains id, status, and optionally results (when include contains file_search_call.results).
reasoning Reasoning summary from models that produce think-tag content. Contains id and summary text.

Echo Fields

The response object includes echo fields that mirror request parameters, making it easy to see the exact configuration used.

Field Type Description
temperature number | null The temperature value used for this response.
top_p number | null The top_p value used for this response.
max_output_tokens integer | null The maximum output tokens configured for this response.
tools array | null The tools that were available for this response.
tool_choice string | object | null The tool choice setting used for this response.
text object | null The text format configuration used for this response.
reasoning object | null The reasoning configuration used for this response.
truncation string | null The truncation strategy used for this response.
instructions string | null The system instructions used for this response.
parallel_tool_calls boolean | null Whether parallel tool calls were enabled.

Streaming

Set "stream": true to receive the response as Server-Sent Events. The Responses API uses named event types for structured streaming.

SSE Event Types

Event Description
response.created Response record created (status: queued).
response.in_progress Inference started (status: in_progress).
response.output_item.added New output item (message or function_call) started.
response.content_part.added New content part added to an output item.
response.output_text.delta Incremental text content.
response.content_part.done Content part finished.
response.output_item.done Output item finished.
response.function_call_arguments.done Function call arguments finished streaming.
response.web_search_call.in_progress Web search tool invoked, search starting.
response.web_search_call.searching Query submitted to search engine.
response.web_search_call.completed Web search results returned.
response.file_search_call.in_progress File search tool invoked.
response.file_search_call.searching Document search query running.
response.file_search_call.completed Document search results returned.
response.reasoning_summary_part.added Reasoning summary output started.
response.reasoning_summary_text.delta Incremental reasoning summary text.
response.reasoning_summary_text.done Reasoning summary text complete.
response.reasoning_summary_part.done Reasoning summary part finished.
error Streaming error occurred. Contains error object with type, message, code.
response.completed Response completed with full usage data.
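A consumer that only needs the final text can filter the event stream for response.output_text.delta events and concatenate their payloads. A minimal sketch over plain event dicts (SDK event objects expose the same fields as attributes):

```python
def collect_output_text(events):
    """Concatenate the delta payloads of response.output_text.delta events."""
    parts = [
        event.get("delta", "")
        for event in events
        if event.get("type") == "response.output_text.delta"
    ]
    return "".join(parts)
```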

Streaming Example

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
  stream: true
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
curl
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
SSE Output
event: response.created
data: {"id":"resp_abc123","object":"response","status":"queued","model":"llama-3.1-8b"}

event: response.in_progress
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"output_index":0,"content_index":0,"delta":"The capital of France is Paris."}

event: response.content_part.done
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":"The capital of France is Paris."}}

event: response.output_item.done
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"The capital of France is Paris."}],"status":"completed"}}

event: response.completed
data: {"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":8,"total_tokens":20}}

data: [DONE]

Conversation Chaining

Use previous_response_id to build multi-turn conversations without resending the full message history. The server automatically retrieves the previous response's context and prepends it to your new input.

First turn
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
# Returns: {"id": "resp_abc123", ...}
Second turn (chained)
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What about Germany?",
    "previous_response_id": "resp_abc123"
  }'

Chain Requirements

  • The previous response must exist and belong to the same project.
  • The previous response must be in a terminal state (completed, failed, cancelled, or incomplete).
  • Maximum chain depth is 50 responses.
  • Circular references are detected and rejected.
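Client code can check these limits before issuing a request by walking previous_response_id links. A sketch where get_previous is any lookup from a response ID to its previous_response_id, or None at the head of the chain (in practice a wrapper around response retrieval); the helper is illustrative:

```python
def chain_depth(get_previous, response_id, max_depth=50):
    """Count the responses in a chain, rejecting cycles and over-deep chains."""
    depth, seen, current = 0, set(), response_id
    while current is not None:
        if current in seen:
            raise ValueError("circular reference in response chain")
        if depth >= max_depth:
            raise ValueError(f"chain exceeds maximum depth of {max_depth}")
        seen.add(current)
        depth += 1
        current = get_previous(current)
    return depth
```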

Tool Calling

Define function tools in the tools parameter. The model may generate function_call output items that your application executes.

Request with tools
{
  "model": "llama-3.1-8b",
  "input": "What is the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
Response with function call
{
  "id": "resp_tool123",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\":\"Paris\"}",
      "status": "completed"
    }
  ]
}
Follow-up with function output
{
  "model": "llama-3.1-8b",
  "previous_response_id": "resp_tool123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc123",
      "output": "{\"temperature\": 18, \"condition\": \"sunny\"}"
    }
  ]
}
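The round trip above can be driven from application code: execute each function_call output item against a local registry, then send the result back as a function_call_output input item. A sketch with a hypothetical local get_weather implementation:

```python
import json

def get_weather(location):
    # Hypothetical local implementation; replace with a real lookup.
    return {"temperature": 18, "condition": "sunny"}

TOOL_REGISTRY = {"get_weather": get_weather}

def execute_function_call(item):
    """Run one function_call output item and build the follow-up input item."""
    fn = TOOL_REGISTRY[item["name"]]
    arguments = json.loads(item["arguments"])
    result = fn(**arguments)
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],
        "output": json.dumps(result),
    }
```

The returned dict goes into the input array of the follow-up request, alongside previous_response_id, exactly as in the JSON example above.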

Storage & Retention

Responses are stored by default ("store": true) using the platform's standard two-tier storage architecture. Content is encrypted at rest and retained for 90 days. For details on storage tiers, encryption, retention, and billing, see Storage.

Set "store": false to skip storage entirely. The response will still be returned but will not be retrievable by ID afterward.

Error Handling

Common Error Codes

HTTP Status Error Code Description
400 invalid_request Missing or invalid parameters.
400 invalid_state Previous response is not in a terminal state, or response cannot be cancelled.
400 chain_depth_exceeded Response chain exceeds the maximum depth of 50.
401 authentication_error Invalid or missing API key.
404 not_found Response or previous response not found.
429 rate_limit_exceeded Too many requests. Check Retry-After header.
503 capacity_exceeded No available workers. Check Retry-After header.
Error Response
{
  "error": {
    "message": "Previous response is not complete: resp_abc123",
    "type": "invalid_request_error",
    "code": "invalid_state"
  }
}
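For 429 and 503 responses, honor the Retry-After header when present and fall back to capped exponential backoff otherwise. A small sketch of the delay calculation; the function name and defaults are illustrative:

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before retrying a 429/503 request.

    Uses the server's Retry-After value when available, otherwise
    exponential backoff (base * 2**attempt) capped at `cap` seconds.
    """
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```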

Client Integrations

opencode

OpenCode supports the Responses API via the @ai-sdk/openai-compatible adapter. Configure it in ~/.config/opencode/opencode.json:

opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "xerotier": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "xerotier",
      "options": {
        "baseURL": "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
        "headers": {
          "Authorization": "Bearer xero_myproject_your_api_key"
        }
      },
      "models": {
        "my-model": {
          "name": "llama-3.1-8b",
          "reasoning": true,
          "tool_call": true,
          "tools": true
        }
      }
    }
  },
  "model": "xerotier/my-model"
}

See OpenCode Integration for full configuration details and troubleshooting.

Python (OpenAI SDK)

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Non-streaming
response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)

# Streaming
stream = client.responses.create(
    model="llama-3.1-8b",
    input="Explain quantum computing in simple terms.",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

# Conversation chaining
first = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
second = client.responses.create(
    model="llama-3.1-8b",
    input="What about Germany?",
    previous_response_id=first.id
)

Node.js (OpenAI SDK)

JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// Non-streaming
const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);

// Streaming
const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "Explain quantum computing in simple terms.",
  stream: true,
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

curl (Non-Streaming)

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'

curl (Streaming)

curl
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'

List and Retrieve

curl
# List responses
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=10 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get a specific response
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get input items
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Cancel an in-progress response
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Delete a response
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"