Responses API
Create model responses with built-in conversation state, tool orchestration, and persistent storage. Compatible with the OpenAI Responses API.
Overview
The Responses API is a higher-level interface for generating model outputs. Unlike the Chat Completions API where you manage conversation history yourself, the Responses API handles state automatically through response chaining.
When to Use Responses API
- Multi-turn conversations -- Chain responses together with `previous_response_id` instead of resending the full message history.
- Persistent storage -- Responses are stored and retrievable by ID for later reference.
- Background processing -- Queue long-running requests and poll for completion.
- Client SDK support -- OpenAI Python/Node.js SDKs natively support the Responses API.
When to Use Chat Completions
- You need full control over the message history.
- You are using a client or tool that only supports the Chat Completions API.
- You do not need server-side storage of responses.
Translation layer. Internally, every Responses API request is converted to a Chat Completion, routed through the same inference pipeline, and converted back to the Response format. Model behavior is identical.
Quick Start
Create a response with a simple text input:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)

print(response.output[0].content[0].text)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?"
});

console.log(response.output[0].content[0].text);
```

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
```
Response
```json
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  },
  "created_at": 1706123456,
  "service_tier": "default",
  "store": true,
  "metadata": null
}
```
Authentication
All Responses API endpoints require a valid API key with the `inference` scope. Pass it in the `Authorization` header:

```
Authorization: Bearer xero_myproject_your_api_key
```
See Authentication & Security for details on creating and managing API keys.
Endpoints
| Method | Path | Description |
|---|---|---|
| POST | /v1/responses | Create a response |
| GET | /v1/responses | List responses |
| GET | /v1/responses/{response_id} | Get a response by ID |
| DELETE | /v1/responses/{response_id} | Delete a response |
| POST | /v1/responses/{response_id}/cancel | Cancel an in-progress response |
| GET | /v1/responses/{response_id}/input_items | List input items for a response |
All paths are relative to your endpoint base URL:

```
https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG
```
Create Response
POST /v1/responses
Request Body
| Parameter | Type | Description |
|---|---|---|
| `model` (required) | string | Model identifier. Informational only -- the endpoint configuration determines the actual model used. |
| `input` (required) | string \| array | Input content. Can be a plain text string, an array of messages (`{role, content}`), or an array of input items (`{type, role, content, call_id, output}`). |
| `instructions` (optional) | string | System/developer instructions. Prepended as a system message if not already present from the response chain. |
| `stream` (optional) | boolean | If `true`, the response is streamed as Server-Sent Events. Default: `false` |
| `store` (optional) | boolean | Whether to persist the response for later retrieval. Default: `true` |
| `background` (optional) | boolean | If `true`, the request returns immediately with a `queued` status. Poll the response ID for completion. Default: `false` |
| `previous_response_id` (optional) | string | ID of a previous response to chain onto. The previous response's context is automatically prepended. See Conversation Chaining. |
| `max_output_tokens` (optional) | integer | Maximum number of output tokens to generate. |
| `temperature` (optional) | number | Sampling temperature (0.0-2.0). Higher values produce more random output. |
| `top_p` (optional) | number | Nucleus sampling parameter (0.0-1.0). |
| `tools` (optional) | array | Tool definitions the model may call. See Tool Calling. |
| `tool_choice` (optional) | string \| object | Controls tool selection: `"auto"`, `"none"`, `"required"`, or `{"type":"function","function":{"name":"fn_name"}}` |
| `parallel_tool_calls` (optional) | boolean | Allow multiple tool calls in a single response. Default: `true` |
| `text` (optional) | object | Text format configuration. Supports `{"format":{"type":"text"}}`, `{"format":{"type":"json_object"}}`, or `{"format":{"type":"json_schema","json_schema":{...}}}` |
| `reasoning` (optional) | object | Reasoning configuration for reasoning models. `{"effort":"low\|medium\|high"}` |
| `metadata` (optional) | object | Up to 16 key-value pairs. Keys max 64 characters, values max 512 characters. |
| `user` (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| `truncation` (optional) | string | Truncation strategy when input exceeds the model context window. `"auto"` (default) drops middle items to fit; `"disabled"` returns an error if input exceeds the context. |
| `service_tier` (optional) | string | Requested service tier. Accepted for API compatibility, but the endpoint's configured tier is always used for routing. |
| `include` (optional) | array | Filter which fields appear in the response. Supported values: `file_search_call.results` (include full document chunk text in file_search results). |
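As a sketch of how several of these parameters combine, the payload below requests schema-constrained JSON output via the `text` format option. The schema, field names, and metadata values here are illustrative, not part of the API surface:

```python
# Illustrative request payload combining parameters from the table above.
# The schema name, properties, and metadata key are examples only.
payload = {
    "model": "llama-3.1-8b",
    "input": "Extract the city and country from: 'I flew to Paris, France.'",
    "temperature": 0.2,
    "max_output_tokens": 200,
    "text": {
        "format": {
            "type": "json_schema",
            "json_schema": {
                "name": "place",
                "schema": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"},
                        "country": {"type": "string"},
                    },
                    "required": ["city", "country"],
                },
            },
        }
    },
    "metadata": {"source": "docs-example"},
}
```

The same dictionary can be passed as keyword arguments to `client.responses.create(...)` or serialized as the JSON body of a raw `POST /v1/responses` request.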
Input Formats
The `input` field accepts three formats.

A plain text string:

```json
"input": "What is the capital of France?"
```

An array of messages:

```json
"input": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"},
  {"role": "user", "content": "What is 2+2?"}
]
```

An array of input items:

```json
"input": [
  {"type": "message", "role": "user", "content": "Call get_weather for Paris"},
  {"type": "function_call_output", "call_id": "call_abc", "output": "{\"temp\":18}"}
]
```
Response Object
A completed response contains the model's output, usage information, and metadata.
```json
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "previous_response_id": null,
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20,
    "input_tokens_details": {
      "cached_tokens": 0
    },
    "output_tokens_details": {
      "reasoning_tokens": null
    }
  },
  "created_at": 1706123456,
  "completed_at": 1706123457,
  "service_tier": "default",
  "store": true,
  "metadata": null
}
```
Status Values
| Status | Description |
|---|---|
| `queued` | Request received, waiting for processing. |
| `in_progress` | Inference is actively running. |
| `completed` | Response generated successfully. |
| `failed` | An error occurred during generation. |
| `cancelled` | Cancelled by the user before completion. |
| `incomplete` | Generation stopped early (max tokens, content filter, etc.). |
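For background requests (`"background": true`), a client typically polls the response ID until it reaches one of the terminal statuses above. A minimal sketch, assuming the OpenAI SDK client shown elsewhere in this page (the polling interval and helper names are illustrative):

```python
import time

# Terminal statuses from the table above: once a response reaches one of
# these, its status will not change again.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "incomplete"}

def is_terminal(status: str) -> bool:
    """Return True if the given status is terminal."""
    return status in TERMINAL_STATUSES

def wait_for_response(client, response_id: str, interval: float = 1.0):
    """Poll a background response until it reaches a terminal status.

    `client` is an OpenAI SDK client configured with your endpoint base URL;
    the retrieve call corresponds to GET /v1/responses/{response_id}.
    """
    while True:
        response = client.responses.retrieve(response_id)
        if is_terminal(response.status):
            return response
        time.sleep(interval)
```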
Output Item Types
| Type | Description |
|---|---|
| `message` | Assistant text message. Contains `role`, `content[]` (array of content parts), and `status`. |
| `function_call` | Tool/function call. Contains `call_id`, `name`, `arguments` (JSON string), and `status`. |
| `web_search_call` | Web search tool execution. Contains `id` and `status` (`in_progress`, `searching`, `completed`). |
| `file_search_call` | Document search tool execution. Contains `id`, `status`, and optionally `results` (when `include` contains `file_search_call.results`). |
| `reasoning` | Reasoning summary from models that produce think-tag content. Contains `id` and summary text. |
Echo Fields
The response object includes echo fields that mirror request parameters, making it easy to see the exact configuration used.
| Field | Type | Description |
|---|---|---|
| `temperature` | number \| null | The temperature value used for this response. |
| `top_p` | number \| null | The top_p value used for this response. |
| `max_output_tokens` | integer \| null | The maximum output tokens configured for this response. |
| `tools` | array \| null | The tools that were available for this response. |
| `tool_choice` | string \| object \| null | The tool choice setting used for this response. |
| `text` | object \| null | The text format configuration used for this response. |
| `reasoning` | object \| null | The reasoning configuration used for this response. |
| `truncation` | string \| null | The truncation strategy used for this response. |
| `instructions` | string \| null | The system instructions used for this response. |
| `parallel_tool_calls` | boolean \| null | Whether parallel tool calls were enabled. |
Streaming
Set `"stream": true` to receive the response as Server-Sent Events. The Responses API uses named event types for structured streaming.
SSE Event Types
| Event | Description |
|---|---|
| `response.created` | Response record created (status: `queued`). |
| `response.in_progress` | Inference started (status: `in_progress`). |
| `response.output_item.added` | New output item (message or function_call) started. |
| `response.content_part.added` | New content part added to an output item. |
| `response.output_text.delta` | Incremental text content. |
| `response.content_part.done` | Content part finished. |
| `response.output_item.done` | Output item finished. |
| `response.function_call_arguments.done` | Function call arguments finished streaming. |
| `response.web_search_call.in_progress` | Web search tool invoked, search starting. |
| `response.web_search_call.searching` | Query submitted to search engine. |
| `response.web_search_call.completed` | Web search results returned. |
| `response.file_search_call.in_progress` | File search tool invoked. |
| `response.file_search_call.searching` | Document search query running. |
| `response.file_search_call.completed` | Document search results returned. |
| `response.reasoning_summary_part.added` | Reasoning summary output started. |
| `response.reasoning_summary_text.delta` | Incremental reasoning summary text. |
| `response.reasoning_summary_text.done` | Reasoning summary text complete. |
| `response.reasoning_summary_part.done` | Reasoning summary part finished. |
| `error` | Streaming error occurred. Contains an error object with `type`, `message`, `code`. |
| `response.completed` | Response completed with full usage data. |
Streaming Example
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
  stream: true
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```

```bash
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
```
```
event: response.created
data: {"id":"resp_abc123","object":"response","status":"queued","model":"llama-3.1-8b"}

event: response.in_progress
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"output_index":0,"content_index":0,"delta":"The capital of France is Paris."}

event: response.content_part.done
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":"The capital of France is Paris."}}

event: response.output_item.done
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"The capital of France is Paris."}],"status":"completed"}}

event: response.completed
data: {"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":8,"total_tokens":20}}

data: [DONE]
```
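If you consume the stream without an SDK, each record is an `event:` line followed by a `data:` line, as shown above. A minimal parser for the text deltas (a sketch; a production client should also handle `error` events and reconnection):

```python
import json

def iter_sse_events(lines):
    """Yield (event, data) pairs from an iterable of raw SSE lines.

    Stops at the "[DONE]" sentinel that terminates the stream.
    """
    event = None
    for line in lines:
        line = line.strip()
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return
            yield event, json.loads(data)

def collect_text(lines):
    """Concatenate all output_text deltas from a raw SSE stream."""
    text = ""
    for event, data in iter_sse_events(lines):
        if event == "response.output_text.delta":
            text += data["delta"]
    return text
```

The `lines` argument can be anything that yields decoded lines, such as `response.iter_lines()` from an HTTP client.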
Conversation Chaining
Use previous_response_id to build multi-turn conversations without
resending the full message history. The server automatically retrieves the
previous response's context and prepends it to your new input.
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
# Returns: {"id": "resp_abc123", ...}
```

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What about Germany?",
    "previous_response_id": "resp_abc123"
  }'
```
Chain Requirements
- The previous response must exist and belong to the same project.
- The previous response must be in a terminal state (completed, failed, cancelled, or incomplete).
- Maximum chain depth is 50 responses.
- Circular references are detected and rejected.
Tool Calling
Define function tools in the `tools` parameter. The model may generate `function_call` output items that your application executes.
Request with a tool definition:

```json
{
  "model": "llama-3.1-8b",
  "input": "What is the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
```

Response containing a `function_call` output item:

```json
{
  "id": "resp_tool123",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\":\"Paris\"}",
      "status": "completed"
    }
  ]
}
```

Submitting the tool result on a follow-up request:

```json
{
  "model": "llama-3.1-8b",
  "previous_response_id": "resp_tool123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc123",
      "output": "{\"temperature\": 18, \"condition\": \"sunny\"}"
    }
  ]
}
```
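Putting the steps together, the client-side glue between the model's `function_call` items and the follow-up `function_call_output` input can be sketched like this (the local `get_weather` implementation and the helper name are illustrative):

```python
import json

def get_weather(location: str) -> dict:
    # Illustrative stand-in; a real tool would call an actual weather API.
    return {"temperature": 18, "condition": "sunny"}

def build_tool_outputs(output_items, tools):
    """Execute each function_call output item and build the
    function_call_output input items for the follow-up request.

    `output_items` are dicts shaped like the function_call item above;
    `tools` maps tool names to local Python callables.
    """
    results = []
    for item in output_items:
        if item["type"] != "function_call":
            continue
        fn = tools[item["name"]]
        args = json.loads(item["arguments"])
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(fn(**args)),
        })
    return results
```

The returned list is sent as `input` on the follow-up request, with `previous_response_id` set to the ID of the response that contained the tool calls.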
Storage & Retention
Responses are stored by default (`"store": true`) using the platform's standard two-tier storage architecture. Content is encrypted at rest and retained for 90 days. For details on storage tiers, encryption, retention, and billing, see Storage.
Set `"store": false` to skip storage entirely. The response will still be returned but will not be retrievable by ID afterward.
Error Handling
Common Error Codes
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | `invalid_request` | Missing or invalid parameters. |
| 400 | `invalid_state` | Previous response is not in a terminal state, or response cannot be cancelled. |
| 400 | `chain_depth_exceeded` | Response chain exceeds the maximum depth of 50. |
| 401 | `authentication_error` | Invalid or missing API key. |
| 404 | `not_found` | Response or previous response not found. |
| 429 | `rate_limit_exceeded` | Too many requests. Check the `Retry-After` header. |
| 503 | `capacity_exceeded` | No available workers. Check the `Retry-After` header. |
```json
{
  "error": {
    "message": "Previous response is not complete: resp_abc123",
    "type": "invalid_request_error",
    "code": "invalid_state"
  }
}
```
Client Integrations
OpenCode
OpenCode supports the Responses API via the `@ai-sdk/openai-compatible` adapter. Configure it in `~/.config/opencode/opencode.json`:
```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "xerotier": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "xerotier",
      "options": {
        "baseURL": "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
        "headers": {
          "Authorization": "Bearer xero_myproject_your_api_key"
        }
      },
      "models": {
        "my-model": {
          "name": "llama-3.1-8b",
          "reasoning": true,
          "tool_call": true,
          "tools": true
        }
      }
    }
  },
  "model": "xerotier/my-model"
}
```
See OpenCode Integration for full configuration details and troubleshooting.
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Non-streaming
response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)

# Streaming
stream = client.responses.create(
    model="llama-3.1-8b",
    input="Explain quantum computing in simple terms.",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

# Conversation chaining
first = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
second = client.responses.create(
    model="llama-3.1-8b",
    input="What about Germany?",
    previous_response_id=first.id
)
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// Non-streaming
const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);

// Streaming
const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "Explain quantum computing in simple terms.",
  stream: true,
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
```
curl (Non-Streaming)
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
```
curl (Streaming)
```bash
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
```
List and Retrieve
```bash
# List responses
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=10" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get a specific response
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get input items
curl https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Cancel an in-progress response
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Delete a response
curl -X DELETE https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123 \
  -H "Authorization: Bearer xero_myproject_your_api_key"
```