// API Reference

Responses API

Stateful generation without rebuilding the conversation each turn. Chain responses by id, queue long jobs in the background, stream incremental reasoning, and let the server keep the message history. Same wire shape as the OpenAI Responses API.

Overview

When to Use Responses API

  • Multi-turn conversations: Chain responses together with previous_response_id instead of resending the full message history.
  • Persistent storage: Responses are stored and retrievable by ID for later reference.
  • Background processing: Queue long-running requests and poll for completion.
  • Client SDK support: The OpenAI Python and Node.js SDKs natively support the Responses API.

When to Use Chat Completions

  • You need full control over the message history.
  • You are using a client or tool that only supports the Chat Completions API.
  • You do not need server-side storage of responses.

Translation layer. Internally, every Responses API request is converted to a Chat Completion, routed through the same inference pipeline, and converted back to the Response format. Model behavior is identical.

Quick Start

Create a response with a simple text input:

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?"
});
console.log(response.output[0].content[0].text);
curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'

Response

{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris."
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20
  },
  "created_at": 1706123456,
  "service_tier": "default",
  "store": true,
  "metadata": null
}

Authentication

All Responses API endpoints require a valid API key with the inference scope. Pass it in the Authorization header:

HTTP Header
Authorization: Bearer xero_myproject_your_api_key

See Authentication & Security for details on creating and managing API keys.

Endpoints

Method | Path | Description
POST | /v1/responses | Create a response
GET | /v1/responses | List responses
GET | /v1/responses/{response_id} | Get a response by ID
DELETE | /v1/responses/{response_id} | Delete a response
POST | /v1/responses/{response_id}/cancel | Cancel an in-progress response
GET | /v1/responses/{response_id}/input_items | List input items for a response

All paths are relative to your endpoint base URL: https://api.xerotier.ai/proj_ABC123/ENDPOINT_SLUG

Create Response

POST /v1/responses

Request Body

Parameter | Type | Description
model (required) | string | Model identifier. Used by the router for validation gates (e.g. model_not_found, invalid_model_id); the endpoint configuration ultimately determines which backend model serves the request.
input (required) | string or array | Input content. Can be a plain text string, an array of messages ({role, content}), or an array of input items ({type, role, content, call_id, output}).
instructions (optional) | string | System/developer instructions. Prepended as a system message if not already present from the response chain.
stream (optional) | boolean | If true, the response is streamed as Server-Sent Events. Default: false
store (optional) | boolean | Whether to persist the response for later retrieval. Default: true
background (optional) | boolean | If true, the request returns immediately with a queued status. Poll the response ID for completion. Default: false
previous_response_id (optional) | string | ID of a previous response to chain onto. The previous response's context is automatically prepended. See Conversation Chaining.
conversation (optional) | object | Link this response to a server-side conversation. Pass {"id": "conv_xxx"} to prepend the conversation's existing items as context and append the response output as new items. See Conversations.
max_output_tokens (optional) | integer | Maximum number of output tokens to generate.
temperature (optional) | number | Sampling temperature (0.0-2.0). Higher values produce more random output.
top_p (optional) | number | Nucleus sampling parameter (0.0-1.0).
tools (optional) | array | Tool definitions the model may call. See Tool Calling.
tool_choice (optional) | string or object | Controls tool selection: "auto", "none", "required", or {"type":"function","function":{"name":"fn_name"}}
parallel_tool_calls (optional) | boolean | Allow multiple tool calls in a single response. Default: true
text (optional) | object | Text format configuration. Supports {"format":{"type":"text"}}, {"format":{"type":"json_object"}}, or {"format":{"type":"json_schema","json_schema":{...}}}
reasoning (optional) | object | Reasoning configuration for reasoning models. {"effort":"low|medium|high"}
metadata (optional) | object | Up to 16 key-value pairs. Keys max 64 characters, values max 512 characters.
user (optional) | string | End-user identifier for abuse monitoring and usage tracking.
truncation (optional) | string | Truncation strategy when input exceeds the model context window. "auto" (default) drops middle items to fit; "disabled" returns an error if input exceeds the context.
service_tier (optional) | string | Requested service tier. Accepted for API compatibility; the effective tier is resolved from the calling project's tier first, then the endpoint's configured tier.
include (optional) | array | Filter which fields appear in the response. Supported values: file_search_call.results (include full document chunk text in file_search results).
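
Several of the limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not the server's validation logic: validate_request is a hypothetical helper name, and it only covers the two required fields, the documented temperature and top_p ranges, and the documented metadata limits.

```python
from typing import Any


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a Create Response body (empty if none)."""
    problems = []

    # model and input are the only required parameters.
    for field in ("model", "input"):
        if field not in body:
            problems.append(f"missing required field: {field}")

    # Documented sampling ranges.
    temperature = body.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be within 0.0-2.0")
    top_p = body.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be within 0.0-1.0")

    # Documented metadata limits: 16 pairs, 64-char keys, 512-char values.
    metadata = body.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata allows at most 16 key-value pairs")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append("metadata key exceeds 64 characters")
        if len(str(value)) > 512:
            problems.append("metadata value exceeds 512 characters")

    return problems
```

An empty list means the body passed these local checks; the server may still reject it for reasons this sketch does not model.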

Input Formats

The input field accepts three formats:

Plain text (simplest)
"input": "What is the capital of France?"
Message array
"input": [
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi there!"},
  {"role": "user", "content": "What is 2+2?"}
]
Input items (supports function outputs)
"input": [
  {"type": "message", "role": "user", "content": "Call get_weather for Paris"},
  {"type": "function_call_output", "call_id": "call_abc", "output": "{\"temp\":18}"}
]

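Client code that accepts all three formats can fold them into the input-item shape before further processing. This is an illustrative sketch only: normalize_input is a hypothetical helper, and treating a plain string as a single user message is an assumption about how the formats relate, not a documented server rule.

```python
def normalize_input(value):
    """Convert any of the three accepted input formats to a list of input items."""
    # Plain text: assumed here to be equivalent to one user message.
    if isinstance(value, str):
        return [{"type": "message", "role": "user", "content": value}]

    items = []
    for entry in value:
        item = dict(entry)  # copy so the caller's data is not mutated
        # Message-array entries carry no "type"; mark them as messages.
        item.setdefault("type", "message")
        items.append(item)
    return items
```

Entries that already carry a type, such as function_call_output, pass through unchanged.
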
List Responses

GET /v1/responses

Returns a paginated list of responses for the project, ordered by creation time (newest first).

Query Parameters

Parameter | Type | Description
after (optional) | string | Cursor for forward pagination. Pass the id of the last response from the previous page.
limit (optional) | integer | Number of responses to return. Default: 20. Maximum: 100.

Response

{
  "object": "list",
  "data": [
    {
      "id": "resp_abc123",
      "object": "response",
      "model": "llama-3.1-8b",
      "status": "completed",
      "created_at": 1706123456,
      "completed_at": 1706123458,
      "input_tokens": 12,
      "output_tokens": 8,
      "store": true,
      "metadata": null
    }
  ],
  "first_id": "resp_abc123",
  "last_id": "resp_abc123",
  "has_more": false
}
curl
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=20" \
  -H "Authorization: Bearer xero_myproject_your_api_key"
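
Walking every page means passing last_id back as the after cursor until has_more is false. The sketch below shows that loop in isolation. iter_responses and fetch_page are hypothetical names; fetch_page stands in for whatever HTTP call the application uses and is assumed to return the list object shown above.

```python
from typing import Callable, Iterator, Optional


def iter_responses(
    fetch_page: Callable[[Optional[str], int], dict],
    limit: int = 20,
) -> Iterator[dict]:
    """Yield every response by following the `after` cursor page by page."""
    after = None
    while True:
        page = fetch_page(after, limit)
        yield from page["data"]
        # Stop when the server reports no further pages (or returns nothing).
        if not page.get("has_more") or not page["data"]:
            return
        # The id of the last item on this page is the cursor for the next one.
        after = page["last_id"]
```

Keeping the HTTP call behind fetch_page lets the same loop work with any client library.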

Get Response

GET /v1/responses/{response_id}

Retrieves the full response object for a stored response, including output content.

curl
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response

{
  "id": "resp_abc123",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "created_at": 1706123456,
  "completed_at": 1706123458,
  "input_tokens": 12,
  "output_tokens": 24,
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {"type": "output_text", "text": "Paris is the capital of France."}
      ]
    }
  ],
  "store": true,
  "metadata": null
}

See Response Object for the full field list. Returns 404 if the response does not exist or belongs to a different project.

Delete Response

DELETE /v1/responses/{response_id}

Permanently deletes a stored response and its associated content from both hot and cold storage. This action cannot be undone.

curl
curl -X DELETE \
  "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response

{ "id": "resp_abc123", "object": "response.deleted", "deleted": true }

Cancel Response

POST /v1/responses/{response_id}/cancel

Cancels an in-progress response. Only responses with status in_progress or queued can be cancelled. Completed, failed, or already-cancelled responses return a 400 error.

curl
curl -X POST \
  "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Returns the updated response object with status: "cancelled".

List Input Items

GET /v1/responses/{response_id}/input_items

Returns the input items that were submitted with the response request.

Query Parameters

Parameter | Type | Description
after (optional) | string | Cursor for forward pagination.
limit (optional) | integer | Number of items to return. Default: 20. Maximum: 100.
curl
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Response Object

A completed response contains the model's output, usage information, and metadata.

JSON
{
  "id": "resp_abc123def456ghi789jkl012",
  "object": "response",
  "model": "llama-3.1-8b",
  "status": "completed",
  "previous_response_id": null,
  "output": [
    {
      "type": "message",
      "id": "msg_001",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "The capital of France is Paris.",
          "annotations": []
        }
      ],
      "status": "completed"
    }
  ],
  "usage": {
    "input_tokens": 12,
    "output_tokens": 8,
    "total_tokens": 20,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  },
  "created_at": 1706123456,
  "completed_at": 1706123457,
  "service_tier": "default",
  "store": true,
  "metadata": null
}

Status Values

Status | Description
queued | Request received, waiting for processing.
in_progress | Inference is actively running.
completed | Response generated successfully.
failed | An error occurred during generation.
cancelled | Cancelled by the user before completion.
incomplete | Generation stopped early (max tokens, content filter, etc.).
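
These statuses drive two client-side decisions: whether a response can still be cancelled (queued or in_progress) and when a background request has finished (any other status). The sketch below encodes both and adds a simple polling loop. The helper names, the polling interval, and the poll limit are all assumptions; get_response stands in for a call to GET /v1/responses/{response_id}.

```python
import time
from typing import Callable

# Statuses after which a response will not change again.
TERMINAL = {"completed", "failed", "cancelled", "incomplete"}
# Statuses for which the cancel endpoint is documented to succeed.
CANCELLABLE = {"queued", "in_progress"}


def is_terminal(status: str) -> bool:
    return status in TERMINAL


def can_cancel(status: str) -> bool:
    return status in CANCELLABLE


def wait_for_response(
    get_response: Callable[[], dict],
    interval: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the response reaches a terminal status, then return it."""
    for _ in range(max_polls):
        response = get_response()
        if is_terminal(response["status"]):
            return response
        sleep(interval)
    raise TimeoutError("response did not reach a terminal status in time")
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.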

Output Item Types

Type | Description
message | Assistant text message. Contains role, content[] (array of content parts), and status.
function_call | Tool/function call. Contains call_id, name, arguments (JSON string), and status.
web_search_call | Web search tool execution. Contains id, status (in_progress, searching, completed).
file_search_call | Document search tool execution. Contains id, status, and optionally results (when include contains file_search_call.results).
reasoning | Reasoning summary from models that produce think-tag content. Contains id and summary text.
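
Because output can hold several item types, indexing output[0] is only safe when the first item is known to be a message; a reasoning or function_call item may come first. A minimal sketch of a type-aware reader follows. output_text and function_calls are hypothetical helper names operating on the response as a plain dict.

```python
def output_text(response: dict) -> str:
    """Concatenate the text of every output_text part in message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip reasoning, function_call, and search items
        for part in item.get("content", []):
            if part.get("type") == "output_text":
                parts.append(part["text"])
    return "".join(parts)


def function_calls(response: dict) -> list[dict]:
    """Return the function_call items the application is expected to execute."""
    return [
        item
        for item in response.get("output", [])
        if item.get("type") == "function_call"
    ]
```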

Echo Fields

The response object includes echo fields that mirror request parameters, making it easy to see the exact configuration used.

Field | Type | Description
temperature | number or null | The temperature value used for this response.
top_p | number or null | The top_p value used for this response.
max_output_tokens | integer or null | The maximum output tokens configured for this response.
tools | array or null | The tools that were available for this response.
tool_choice | string, object, or null | The tool choice setting used for this response.
text | object or null | The text format configuration used for this response.
reasoning | object or null | The reasoning configuration used for this response.
truncation | string or null | The truncation strategy used for this response.
instructions | string or null | The system instructions used for this response.
parallel_tool_calls | boolean or null | Whether parallel tool calls were enabled.

Streaming

Set "stream": true to receive the response as Server-Sent Events. The Responses API uses named event types for structured streaming.

Lifecycle Events

Event | Description
response.created | Response record created (status: queued). Contains the initial response object.
response.in_progress | Inference started (status: in_progress).
response.completed | Response completed normally. Contains the final response object with full usage data.
response.failed | An error occurred during generation. Contains the response object with error details. Mid-stream errors are surfaced via this event; the Responses stream does not emit a separate line named event: error.

Output Item Events

Event | Description
response.output_item.added | New output item (message, function_call, or reasoning) started. Contains output_index and initial item object.
response.output_item.done | Output item finished. Contains the completed item object.
response.content_part.added | New content part added to an output item. Contains output_index, content_index, and initial part object.
response.content_part.done | Content part finished. Contains the completed part object.

Text Delta Events

Event | Description
response.output_text.delta | Incremental text content. Contains item_id, output_index, content_index, and delta string.
response.output_text.done | Text content part complete. Contains item_id, output_index, content_index, and full accumulated text.

Tool Call Events

Event | Description
response.function_call_arguments.done | Function call arguments complete. Contains output_index, item_id, and full arguments JSON string. (Arguments are emitted as a single terminal event; no incremental .delta stream is produced.)

Reasoning Summary Events

Emitted by models that produce think-tag reasoning content (e.g. Qwen3, DeepSeek-R1).

Event | Description
response.reasoning_summary_part.added | Reasoning summary part started within a reasoning output item.
response.reasoning_summary_text.delta | Incremental reasoning summary text. Contains item_id, output_index, summary_index, and delta string.
response.reasoning_summary_text.done | Reasoning summary text complete. Contains the full accumulated text.
response.reasoning_summary_part.done | Reasoning summary part finished.

Streaming Example

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

stream = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
  stream: true
});

for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}
curl
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'
SSE Output
event: response.created
data: {"id":"resp_abc123","object":"response","status":"queued","model":"llama-3.1-8b"}

event: response.in_progress
data: {"id":"resp_abc123","status":"in_progress"}

event: response.output_item.added
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[],"status":"in_progress"}}

event: response.content_part.added
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":""}}

event: response.output_text.delta
data: {"item_id":"msg_001","output_index":0,"content_index":0,"delta":"The capital of France is Paris."}

event: response.content_part.done
data: {"output_index":0,"content_index":0,"part":{"type":"output_text","text":"The capital of France is Paris."}}

event: response.output_item.done
data: {"output_index":0,"item":{"type":"message","id":"msg_001","role":"assistant","content":[{"type":"output_text","text":"The capital of France is Paris."}],"status":"completed"}}

event: response.completed
data: {"id":"resp_abc123","object":"response","status":"completed","usage":{"input_tokens":12,"output_tokens":8,"total_tokens":20}}

event: done
data: [DONE]
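
When no SDK is available, the raw stream can be read directly. The following is a minimal sketch of a parser for the event/data framing shown above, plus a helper that rebuilds the text from delta events. parse_sse and collect_text are hypothetical names, and the parser is deliberately simplified: it assumes events are separated by a blank line with \n line endings and ignores other SSE fields such as id and retry.

```python
import json
from typing import Iterator, Optional, Tuple


def parse_sse(text: str) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs; event_name is None for bare data lines."""
    for block in text.strip().split("\n\n"):
        name, data_lines = None, []
        for line in block.splitlines():
            if line.startswith("event:"):
                name = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            yield name, "\n".join(data_lines)


def collect_text(text: str) -> str:
    """Concatenate every response.output_text.delta until the [DONE] marker."""
    chunks = []
    for name, data in parse_sse(text):
        if data == "[DONE]":
            break
        if name == "response.output_text.delta":
            chunks.append(json.loads(data)["delta"])
    return "".join(chunks)
```

In a real client the same logic would run incrementally over the response body rather than over a complete string.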

Vendor Events

In addition to standard response.* events, the platform emits vendor-prefixed events (x_*) for Xerotier-specific features such as research mode, deep think, and user interaction. Vendor events ride on the Responses stream in two wire shapes:

  • The dominant shape is a bare data: {"type":"x_*",...} line with no preceding event: name. Standard OpenAI SDK clients ignore unrecognized data lines, so this shape is wire-compatible.
  • The x_artifact.* family on the Responses path uses a named SSE event: event: x_artifact.created\ndata: {json}\n\n with camelCase payload keys. Strict OpenAI SDK clients will surface these as unknown event types.

Some vendor families listed below are emitted only by the Xerotier dashboard surface and are not produced on the public Responses stream; those rows are marked dashboard-only. Public SDK consumers should not depend on dashboard-only events arriving on /v1/responses.

Compatibility. Standard OpenAI SDK clients ignore unrecognized data lines. Vendor events are only relevant when building a custom stream consumer that wants to render research progress, deep think status, or artifact notifications.
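
A custom consumer has to recognise both wire shapes. The sketch below classifies one already-parsed SSE event, given its event name (or None for a bare data line) and its data string. classify_event is a hypothetical helper, and the three labels it returns are illustrative, not part of the API.

```python
import json
from typing import Optional, Tuple


def classify_event(name: Optional[str], data: str) -> Tuple[str, Optional[dict]]:
    """Return (kind, payload) where kind is 'done', 'vendor', or 'standard'."""
    if data == "[DONE]":
        return "done", None
    payload = json.loads(data)

    # Named shape, e.g. "event: x_artifact.created".
    if name is not None and name.startswith("x_"):
        return "vendor", payload

    # Bare shape: no event name, the x_* type lives inside the JSON body.
    if name is None and str(payload.get("type", "")).startswith("x_"):
        return "vendor", payload

    return "standard", payload
```

A consumer that does not render vendor features can simply drop everything classified as vendor.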

Research Events (x_research.*)

Emitted during agentic research loops (research mode). Each event is a data: line containing a JSON object with a type field, name (tool name), arguments (JSON string), and optional metadata.

Event type | Description
x_research.searching | Web search tool invoked. arguments contains the search query JSON.
x_research.reading | URL fetch tool invoked. arguments contains {"url":"..."}.
x_research.code_searching | Code search tool invoked (GitLab or local index).
x_research.calculating | Calculator tool invoked.
x_research.result | Tool returned a result. metadata contains a brief summary of the result.
x_research.complete (dashboard-only) | Research loop finished. Emitted only by the Xerotier dashboard surface; the public Responses stream does not produce this event. Dashboard payload contains elapsed_ms, input_tokens, output_tokens, iterations, and sources counts.
x_research event examples
data: {"type":"x_research.searching","name":"x_web_search","arguments":"{\"query\":\"Paris weather\"}"}

data: {"type":"x_research.reading","name":"x_fetch_url","arguments":"{\"url\":\"https://example.com/\"}"}

data: {"type":"x_research.result","name":"x_web_search","arguments":"{\"query\":\"Paris weather\"}","metadata":{"summary":"Paris is currently 18C and sunny."}}

data: {"type":"x_research.complete","elapsed_ms":4200,"input_tokens":3200,"output_tokens":480,"iterations":3,"sources":5}

Deep Think Events (x_deep_think.*)

Emitted during deep think (multi-step research with planning and synthesis). Clients can use these to render a sub-task progress panel.

Event type | Description
x_deep_think.plan_created | Planning phase complete. Contains title and total_subtasks.
x_deep_think.discovery_started | Target-focused discovery phase begun. Contains message.
x_deep_think.discovery_completed | Discovery phase complete. Contains message.
x_deep_think.subtask_started | A sub-task has begun. Contains subtask_id, subtask_index, subtask_query, and total_subtasks.
x_deep_think.subtask_completed | A sub-task has finished. Contains subtask_index, input_tokens, and output_tokens.
x_deep_think.subtask_artifact_saved (dashboard-only) | A sub-task research artifact was persisted. Dashboard payload contains artifact_id, artifact_name, and subtask_index.
x_deep_think.artifact_created (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event.
x_deep_think.artifact_saved (dashboard-only) | The deep think synthesis artifact was persisted. Dashboard payload contains artifact_name and artifact_id.
x_deep_think.synthesizing | Synthesis phase begun.
x_deep_think.completed (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event.
x_deep_think.memories_created (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event.
x_deep_think.error (declared, not currently emitted) | Declared in the event vocabulary but not emitted by the current Responses pipeline. Do not depend on this event.

Ask User Events (x_ask_user.*)

Emitted when the model needs clarification before it can continue. The stream pauses and the application should prompt the user, then resume with the answer.

Event type | Description
x_ask_user.question | The model requires user input before continuing. Contains ask_user_id (correlation ID) and question text. Submit the answer via the chat answer endpoint.
x_ask_user.pending_state | Captures the assistant content and tool calls accumulated before the pause, enabling conversation resumption after the user answers.

Artifact Events (x_artifact.*)

Emitted when code artifacts (code blocks, documents) are created or updated during a generation. On the Responses path these are written as named SSE events (event: x_artifact.created\ndata: {json}\n\n) with camelCase payload keys. This differs from the chat-completions surface, which emits the same family as bare data: lines with snake_case keys.

Event type | Description
x_artifact.created | New artifact created. Responses-path payload contains artifactId, identifier, title, language, contentType, and contentBase64 (base64-encoded artifact bytes).
x_artifact.updated | Existing artifact updated. Same payload shape as x_artifact.created with the new version of the content.
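
Since the artifact body arrives base64-encoded, a consumer has to decode contentBase64 before displaying or saving it. A minimal sketch follows, assuming the standard base64 alphabet (the encoding variant is not specified above). decode_artifact is a hypothetical helper and the returned dict shape is illustrative.

```python
import base64
import json


def decode_artifact(data: str) -> dict:
    """Parse an x_artifact.created/updated data payload and decode its content."""
    payload = json.loads(data)
    # Assumption: standard (not URL-safe) base64, as produced by b64encode.
    content = base64.b64decode(payload["contentBase64"])
    return {
        "artifact_id": payload["artifactId"],
        "title": payload.get("title"),
        "language": payload.get("language"),
        "content_type": payload.get("contentType"),
        "content": content,  # raw bytes; decode as text only if appropriate
    }
```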

Context Fork Event (x_context_fork) (dashboard-only)

Emitted by the Xerotier dashboard surface when the user's message triggered creation of a new conversation branch. The public Responses stream does not emit this event. Unlike other vendor events the name has no .<suffix> segment.

Chat Metadata Event (x_chat.metadata) (dashboard-only)

Emitted by the Xerotier dashboard surface as the final data event before [DONE]. The public Responses stream (/v1/responses) does not emit this event; public SDK consumers should not wait for it. Payload uses camelCase keys (dashboard convention) rather than the snake_case used by the rest of the vendor surface.

Field | Description
type | "x_chat.metadata"
messageId | Server-assigned external ID for the persisted assistant message.
userMessageId | External ID for the persisted user message.
sequence | Monotonically increasing sequence number for the assistant message within the conversation.
context | Context budget breakdown: systemTokens, summaryTokens, retrievedTokens, recentTokens, fileTokens, currentMessageTokens, totalTokens, inputBudget, retrievedCount, recentCount, usedSemanticRetrieval, semanticRetrievalActive, chunkSelectionMethod.
usage | Combined token usage including model inference plus any research or deep think overhead: input_tokens, output_tokens, total_tokens.

Analyst Events (x_analyst.*)

Emitted when the analyst mode builds or refreshes the workspace context brief before generating a response.

Event type | Description
x_analyst.context_gathering | Workspace context gathering has begun.
x_analyst.context_completed | Context gathering finished. Contains counts of gathered items.
x_analyst.context_brief_created | The LLM-generated context brief is ready. The brief summarizes the workspace for the response.
x_analyst.context_refreshed | A previously cached context brief was refreshed due to workspace changes.

Conversation Chaining

Use previous_response_id to build multi-turn conversations without resending the full message history. The server automatically retrieves the previous response's context and prepends it to your new input.

First turn
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'
# Returns: {"id": "resp_abc123", ...}
Second turn (chained)
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What about Germany?",
    "previous_response_id": "resp_abc123"
  }'

Chain Requirements

  • The previous response must exist and belong to the same project.
  • The previous response must be in a terminal state (completed, failed, cancelled, or incomplete).
  • Maximum chain depth is 50 responses.
  • Circular references are detected and rejected.
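
The requirements above amount to a backwards walk over previous_response_id links. The sketch below models that walk against an in-memory dict to make the rules concrete. It is not the server's implementation: check_chain and the store layout are hypothetical, the project-ownership check is omitted, and counting depth as the number of responses already in the chain is an assumption.

```python
MAX_CHAIN_DEPTH = 50
TERMINAL = {"completed", "failed", "cancelled", "incomplete"}


def check_chain(store: dict, previous_response_id: str) -> list:
    """Return the chain's ids, newest first, or raise ValueError with a reason."""
    head = store.get(previous_response_id)
    if head is None:
        raise ValueError("not_found")
    if head["status"] not in TERMINAL:
        raise ValueError("invalid_state")

    chain, seen = [], set()
    current = previous_response_id
    while current is not None:
        if current in seen:
            raise ValueError("circular_reference")
        if current not in store:
            raise ValueError("not_found")
        seen.add(current)
        chain.append(current)
        if len(chain) > MAX_CHAIN_DEPTH:
            raise ValueError("chain_depth_exceeded")
        # Follow the link to the response this one was chained onto.
        current = store[current].get("previous_response_id")
    return chain
```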

Tool Calling

Define function tools in the tools parameter. The model may generate function_call output items that your application executes.

Request with tools
{
  "model": "llama-3.1-8b",
  "input": "What is the weather in Paris?",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string", "description": "City name"}
          },
          "required": ["location"]
        }
      }
    }
  ]
}
Response with function call
{
  "id": "resp_tool123",
  "status": "completed",
  "output": [
    {
      "type": "function_call",
      "id": "fc_001",
      "call_id": "call_abc123",
      "name": "get_weather",
      "arguments": "{\"location\":\"Paris\"}",
      "status": "completed"
    }
  ]
}
Follow-up with function output
{
  "model": "llama-3.1-8b",
  "previous_response_id": "resp_tool123",
  "input": [
    {
      "type": "function_call_output",
      "call_id": "call_abc123",
      "output": "{\"temperature\": 18, \"condition\": \"sunny\"}"
    }
  ]
}
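
The three payloads above form a loop the application drives: read the function_call items, run the matching local function, and send each result back as a function_call_output chained onto the same response. A minimal sketch of the middle step follows. build_tool_followup and the handlers mapping are hypothetical names, and error handling for unknown tools or malformed arguments is left out.

```python
import json
from typing import Callable, Dict


def build_tool_followup(
    response: dict,
    handlers: Dict[str, Callable[..., object]],
    model: str,
) -> dict:
    """Run every function_call in `response` and build the follow-up request body."""
    outputs = []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        # arguments is a JSON string, so it must be parsed before the call.
        arguments = json.loads(item["arguments"])
        result = handlers[item["name"]](**arguments)
        outputs.append({
            "type": "function_call_output",
            "call_id": item["call_id"],  # ties the output to the original call
            "output": json.dumps(result),
        })
    return {
        "model": model,
        "previous_response_id": response["id"],
        "input": outputs,
    }
```

The returned dict can be posted to /v1/responses as the next turn.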

Storage & Retention

Responses are stored by default ("store": true) using the platform's standard two-tier storage architecture. Content is encrypted at rest and retained based on the endpoint's service tier. For details on storage tiers, encryption, retention, and billing, see Storage.

Set "store": false to skip storage entirely. The response will still be returned but will not be retrievable by ID afterward.

Error Handling

Common Error Codes

HTTP Status | Error Code | Description
400 | invalid_request | Missing or invalid parameters.
400 | invalid_state | Previous response is not in a terminal state, or response cannot be cancelled.
400 | chain_depth_exceeded | Response chain exceeds the maximum depth of 50.
401 | authentication_error | Invalid or missing API key.
404 | not_found | Response or previous response not found.
429 | rate_limit_exceeded | Too many requests. Check Retry-After header.
503 | capacity_exceeded | No available workers. Check Retry-After header.
Error Response
{
  "error": {
    "message": "Previous response is not complete: resp_abc123",
    "type": "invalid_request_error",
    "code": "invalid_state"
  }
}
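
Of the codes above, only 429 and 503 point at Retry-After, which makes them the natural candidates for automatic retry. The sketch below computes how long to wait before the next attempt. retry_delay is a hypothetical helper; the exponential backoff fallback and its base and cap values are assumptions, used only when the header is missing or is not a number of seconds.

```python
from typing import Mapping, Optional

RETRYABLE = {429, 503}


def retry_delay(
    status: int,
    headers: Mapping[str, str],
    attempt: int,
    base: float = 1.0,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the error is not retryable."""
    if status not in RETRYABLE:
        return None
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # header was not a number of seconds; fall back to backoff
    # Exponential backoff: base, 2*base, 4*base, ... capped at `cap`.
    return min(cap, base * 2 ** attempt)
```

The header lookup here is case-sensitive; most HTTP libraries expose a case-insensitive mapping that can be passed in directly.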

Client Integrations

opencode

OpenCode supports the Responses API via the @ai-sdk/openai-compatible adapter. Configure it in ~/.config/opencode/opencode.json:

opencode.json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "xerotier": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "xerotier",
      "options": {
        "baseURL": "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
        "headers": {
          "Authorization": "Bearer xero_myproject_your_api_key"
        }
      },
      "models": {
        "my-model": {
          "name": "llama-3.1-8b",
          "reasoning": true,
          "tool_call": true,
          "tools": true
        }
      }
    }
  },
  "model": "xerotier/my-model"
}

See OpenCode Integration for full configuration details and troubleshooting.

Python (OpenAI SDK)

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Non-streaming
response = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
print(response.output[0].content[0].text)

# Streaming
stream = client.responses.create(
    model="llama-3.1-8b",
    input="Explain quantum computing in simple terms.",
    stream=True
)
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)

# Conversation chaining
first = client.responses.create(
    model="llama-3.1-8b",
    input="What is the capital of France?"
)
second = client.responses.create(
    model="llama-3.1-8b",
    input="What about Germany?",
    previous_response_id=first.id
)

Node.js (OpenAI SDK)

JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// Non-streaming
const response = await client.responses.create({
  model: "llama-3.1-8b",
  input: "What is the capital of France?",
});
console.log(response.output[0].content[0].text);

// Streaming
const stream = await client.responses.create({
  model: "llama-3.1-8b",
  input: "Explain quantum computing in simple terms.",
  stream: true,
});
for await (const event of stream) {
  if (event.type === "response.output_text.delta") {
    process.stdout.write(event.delta);
  }
}

curl (Non-Streaming)

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?"
  }'

curl (Streaming)

curl
curl --no-buffer -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "input": "What is the capital of France?",
    "stream": true
  }'

List and Retrieve

curl
# List responses (URL quoted so the shell does not interpret "?")
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses?limit=10" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get a specific response
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Get input items
curl "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/input_items" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Cancel an in-progress response
curl -X POST "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123/cancel" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

# Delete a response
curl -X DELETE "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses/resp_abc123" \
  -H "Authorization: Bearer xero_myproject_your_api_key"