Tool Calling

Give the model a function signature and let it decide when to call it. The router accepts the same JSON Schema tools your OpenAI code already declares, with the same parallel-call semantics, structured-output enforcement via strict mode, and log probabilities you can rank against.

tool_choice: defaults to auto when tools are provided
parallel_tool_calls: defaults to true; model-dependent
function.name: must match [a-zA-Z0-9_-]+; router-validated, HTTP 400 on mismatch
strict mode: requires required + additionalProperties: false at every nesting level

How it works

Tool calling lets models emit structured function calls your application executes. The wire shape matches OpenAI chat completions, so any SDK or raw HTTP client that already speaks tools + tool_choice works against this router without changes.

  1. You declare tools in the request as JSON Schema.
  2. The model decides whether to call a tool and emits the call with arguments.
  3. Your application executes the function and returns the result as a tool-role message.
  4. The model uses the result to produce the next assistant turn.
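
Steps 2 and 3 can be sketched as a small dispatcher. This is a minimal sketch, assuming the assistant message is a plain dict in the chat-completions shape used on this page; `TOOL_REGISTRY`, `get_weather`, and `run_tool_calls` are hypothetical names, and the weather function returns canned data rather than calling a real service.

```python
import json


def get_weather(location: str) -> dict:
    # Hypothetical stand-in for a real weather lookup.
    return {"location": location, "temp": 18, "condition": "sunny"}


# Hypothetical registry mapping declared tool names to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute every tool call in an assistant turn; return tool-role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = TOOL_REGISTRY[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a parsed object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**args)),
        })
    return results


assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_paris",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_turn)
```

A production dispatcher would also need to handle unknown tool names and malformed argument strings, which this sketch leaves out.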

Do not combine tools with response_format: json_schema in the same request. The two structured-output paths are mutually exclusive: pick tool calling when the model should choose which schema to emit, or pick response_format when there is one fixed schema and no branching.
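
A client-side guard can catch this combination before the request is sent. The sketch below assumes the request body is a plain dict; `check_structured_output_paths` is a hypothetical helper, and this page does not specify how the router itself responds when both are present.

```python
def check_structured_output_paths(body: dict) -> None:
    """Raise ValueError if a request combines tools with response_format json_schema."""
    has_tools = bool(body.get("tools"))
    response_format = body.get("response_format") or {}
    if has_tools and response_format.get("type") == "json_schema":
        raise ValueError("use either tools or response_format: json_schema, not both")
```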

Tool Definition

Tools are defined using JSON Schema in the tools array:

JSON
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name (e.g., 'Paris')"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "Temperature unit"
            }
          },
          "required": ["location", "unit"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  ]
}

Tool Definition Fields

Tool definition fields: type, function.name, function.description, function.parameters, function.strict.
| Field | Required | Type | Description |
|---|---|---|---|
| type | required | string | Must be "function". |
| function.name | required | string | Name of the function. Must match [a-zA-Z0-9_-]+. The router rejects requests whose function name violates this pattern with an HTTP 400 validation error before the model is invoked. |
| function.description | optional | string | Description of what the function does. Helps the model decide when to use it. |
| function.parameters | optional | object | JSON Schema describing the function's parameters. |
| function.strict | optional | boolean | When true, the model strictly follows the parameter schema (structured output mode). When strict mode is enabled, all parameters must be listed in required and additionalProperties must be false at every level of the schema. Nested objects that leave additionalProperties unset (or set to true) silently break strict-mode dispatch. Omit or set false for non-strict mode. |
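
Because a nested object with a missing additionalProperties fails silently, it can help to lint schemas before sending them. The following is a minimal sketch, not the router's own validation: `strict_schema_problems` and `valid_function_name` are hypothetical helpers, and the walker only follows `properties` and array `items` (it ignores constructs such as `anyOf` or `$ref`).

```python
import re

# Pattern from the function.name row above.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def valid_function_name(name: str) -> bool:
    """True if the whole name matches the documented pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


def strict_schema_problems(schema: dict, path: str = "parameters") -> list[str]:
    """Return strict-mode violations found in a JSON Schema, with their paths."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# Example: the nested "options" object is optional and leaves
# additionalProperties unset, so two problems are reported.
example = {
    "type": "object",
    "properties": {
        "location": {"type": "string"},
        "options": {
            "type": "object",
            "properties": {"unit": {"type": "string"}},
            "required": ["unit"],
        },
    },
    "required": ["location"],
    "additionalProperties": False,
}
problems = strict_schema_problems(example)
```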

Tool Choice

The tool_choice parameter controls when the model calls tools:

Accepted tool_choice values and their behavior on chat completions and the Responses API.
| Value | Behavior |
|---|---|
| "auto" | Model decides whether to call a tool or generate text. When tools are provided and tool_choice is omitted, the model behaves as if "auto" were passed. When no tools are provided, the field is ignored. |
| "none" | Model will not call any tools. Useful when you want the model to generate a text response even though tools are defined. |
| "required" | Model must call at least one tool. The response will always contain tool_calls. |
| {"type": "function", "function": {"name": "get_weather"}} | Chat completions: force the model to call a specific function by name. |
| {"type": "function", "name": "get_weather"} | Responses API: same intent as the row above with the flatter Responses-API tool shape (no nested function wrapper). |
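
The two forced-call shapes differ only in nesting, so a small helper can build either one. This is an illustrative sketch; `forced_tool_choice` and its `api` argument are hypothetical names, not part of any SDK.

```python
def forced_tool_choice(name: str, api: str = "chat") -> dict:
    """Build a tool_choice value that forces a call to the named function."""
    if api == "chat":
        # Chat completions nest the name under "function".
        return {"type": "function", "function": {"name": name}}
    if api == "responses":
        # The Responses API uses the flatter shape.
        return {"type": "function", "name": name}
    raise ValueError(f"unknown api: {api!r}")
```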

Parallel Tool Calls

When tools are provided, the model may call multiple tools in a single response. This is controlled by the parallel_tool_calls parameter.

parallel_tool_calls values and their behavior per response turn.
| Value | Behavior |
|---|---|
| true (default) | The model may generate multiple tool calls in a single response, each with a unique id. |
| false | The model generates at most one tool call per response turn. |

When multiple tool calls are returned, each has a unique id and its own function with name and arguments. You must return results for all tool calls before sending the next message. Each result is a tool-role message with the matching tool_call_id.
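
One way to enforce the "results for all tool calls" rule on the client is to check the message list before sending. A minimal sketch, assuming messages are plain dicts; `missing_tool_results` is a hypothetical helper that only inspects the most recent assistant turn containing tool_calls.

```python
def missing_tool_results(messages: list[dict]) -> list[str]:
    """Return ids from the latest tool-calling assistant turn that have no tool-role result."""
    expected: list[str] = []
    answered: set[str] = set()
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # A new tool-calling turn resets what we are waiting for.
            expected = [call["id"] for call in msg["tool_calls"]]
            answered = set()
        elif msg["role"] == "tool":
            answered.add(msg["tool_call_id"])
    return [call_id for call_id in expected if call_id not in answered]
```

An empty list means every call has a matching tool-role message and the follow-up request is safe to send.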

Final behavior depends on the model. Smaller open-source models may only emit one tool call per response even when parallel_tool_calls: true.

Multi-Turn Conversation Flow

Step 1: Send Message with Tool Definitions

The same request is shown for Python, Node.js, and curl; pick one.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    tools=tools,
    tool_choice="auto"
)
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get weather for a city",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" }
      },
      required: ["location"]
    }
  }
}];

const response = await client.chat.completions.create({
  model: "llama-3.1-8b",
  messages: [
    { role: "user", content: "What is the weather in Paris and London?" }
  ],
  tools: tools,
  tool_choice: "auto"
});
curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Step 2: Model Responds with Tool Calls

JSON Response
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_paris",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Paris\"}"
          }
        },
        {
          "id": "call_london",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"London\"}"
          }
        }
      ]
    },
    "finish_reason": "tool_calls"
  }]
}

Step 3: Send Tool Results

Execute the functions and send results back as tool-role messages, each with the matching tool_call_id. The original assistant turn containing the tool_calls must remain in messages so the model can correlate results with calls; do not strip it before sending the follow-up.

JSON Request Body
{
  "model": "llama-3.1-8b",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris and London?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_paris", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}},
      {"id": "call_london", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_paris", "content": "{\"temp\": 18, \"condition\": \"sunny\"}"},
    {"role": "tool", "tool_call_id": "call_london", "content": "{\"temp\": 14, \"condition\": \"cloudy\"}"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}
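
Assembling this message list by hand is error-prone, mainly because it is easy to drop the assistant turn. The sketch below shows one way to build it, assuming plain-dict messages and an already-computed mapping from tool call id to result; `build_follow_up` is a hypothetical helper.

```python
import json


def build_follow_up(history: list[dict], assistant_message: dict, results: dict) -> list[dict]:
    """Return the messages for the follow-up request.

    `results` maps each tool_call id to a JSON-serializable result.
    """
    messages = list(history)
    # Keep the assistant turn with its tool_calls so results can be correlated.
    messages.append(assistant_message)
    for call in assistant_message["tool_calls"]:
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(results[call["id"]]),
        })
    return messages
```

Looking results up by id (rather than by position) raises a KeyError if any call was left unanswered, which surfaces the problem before the request is sent.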

Step 4: Model Generates Final Response

JSON Response
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The weather in Paris is 18C and sunny. In London, it is 14C and cloudy."
    },
    "finish_reason": "stop"
  }]
}

Streaming Tool Calls

When streaming is enabled, tool calls are delivered incrementally across multiple chunks. The first chunk includes the tool call id, type, and function name. Subsequent chunks stream the arguments string incrementally. See the Streaming API for details.
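
Reassembling streamed tool calls amounts to concatenating the argument fragments per call. The sketch below works on chunks already parsed into dicts and assumes the OpenAI-style delta layout, where each fragment carries an `index` identifying which tool call it belongs to; that field is an assumption based on the OpenAI wire format, not something this page states. `accumulate_tool_calls` is a hypothetical helper.

```python
def accumulate_tool_calls(chunks: list[dict]) -> list[dict]:
    """Merge streamed tool-call deltas into complete tool calls, ordered by index."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for delta in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(delta["index"], {
                    "id": None,
                    "type": "function",
                    "function": {"name": "", "arguments": ""},
                })
                # The first fragment carries the id and the function name.
                if delta.get("id"):
                    slot["id"] = delta["id"]
                fn = delta.get("function") or {}
                if fn.get("name"):
                    slot["function"]["name"] = fn["name"]
                # Later fragments append pieces of the arguments string.
                slot["function"]["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

The arguments string is only valid JSON once the stream has finished, so parse it after accumulation rather than per chunk.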

Tool-using streams do not emit a final usage chunk unless the request includes stream_options: {"include_usage": true}. Without it, the stream terminates at data: [DONE] with no aggregated token counts; set include_usage to true if you need to bill or report token usage for tool-using conversations.

Structured Output (response_format)

The response_format parameter controls the output format. Use it when you need the model to return valid JSON or conform to a specific schema.

Text Mode (Default)

JSON
{"response_format": {"type": "text"}}

Unstructured text output. This is the default behavior when response_format is not specified.

JSON Object Mode

JSON
{
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "system", "content": "Respond with valid JSON."},
    {"role": "user", "content": "List three colors with their hex codes."}
  ]
}

Forces the model to output valid JSON. You should include instructions in the system message about the expected JSON structure. The model will always return a parseable JSON object, but the schema is not enforced.
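
Since the structure is not enforced in this mode, the caller has to check it. A minimal sketch, assuming the assistant message content is available as a string; `parse_json_object` and the key names in the usage line are hypothetical.

```python
import json


def parse_json_object(content: str, required_keys: set[str]) -> dict:
    """Parse json_object output and verify the top-level keys the caller expects."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Hypothetical usage with a literal string standing in for the model output.
colors = parse_json_object('{"colors": [{"name": "red", "hex": "#ff0000"}]}', {"colors"})
```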

JSON Schema Mode

JSON
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "color_list",
      "description": "A list of colors with their hex codes",
      "schema": {
        "type": "object",
        "properties": {
          "colors": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "hex": {"type": "string", "pattern": "^#[0-9a-fA-F]{6}$"}
              },
              "required": ["name", "hex"],
              "additionalProperties": false
            }
          }
        },
        "required": ["colors"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
}

Forces the model to output JSON that conforms to the provided JSON Schema. Strict mode (strict: true) is the same constraint documented in Tool Definition: every property listed in required, additionalProperties: false at every nesting level. Outside strict mode the schema is a hint, not a guarantee.

json_schema Fields

Fields accepted inside response_format.json_schema.
| Field | Required | Type | Description |
|---|---|---|---|
| name | required | string | Name of the schema. Used for identification in multi-schema scenarios. |
| description | optional | string | Description of the schema, helping the model understand what to produce. |
| schema | optional | object | The JSON Schema definition describing the required output structure. |
| strict | optional | boolean | When true, enforces strict schema conformance. Requires all properties to be in required and additionalProperties: false. |

When to Use Each Mode

Response format selection guide.
| Mode | Use Case |
|---|---|
| text | General text generation, chat, creative writing. |
| json_object | Simple structured extraction where exact schema is flexible. |
| json_schema | Data pipelines, API integrations, and anywhere you need guaranteed schema conformance. |

Log Probabilities

Set logprobs: true to receive log probability information for each output token. This is useful for confidence scoring, classification, and understanding model behavior.

Parameters that enable and shape logprobs output.
| Parameter | Type | Description |
|---|---|---|
| logprobs | boolean | Enable log probability output. Default: false. |
| top_logprobs | integer | Number of most likely tokens to return probabilities for (0-20). Requires logprobs: true. The router does not enforce this cap; values above 20 may be rejected or silently clamped by the inference backend. |

When enabled, each choice in the response includes a logprobs object containing the log probabilities for the generated tokens. For streaming responses, logprobs are included in each chunk.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,
    top_logprobs=3,
    max_tokens=5
)

# Access log probabilities
for token_info in response.choices[0].logprobs.content:
    print(f"Token: {token_info.token}, Logprob: {token_info.logprob}")
Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
  logprobs: true,
  top_logprobs: 3,
  max_tokens: 5
});

// Access log probabilities
for (const tokenInfo of response.choices[0].logprobs.content) {
  console.log(`Token: ${tokenInfo.token}, Logprob: ${tokenInfo.logprob}`);
}
curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    "logprobs": true,
    "top_logprobs": 3,
    "max_tokens": 5
  }'

# The response choices[0].logprobs.content will contain:
# - token: the generated token string
# - logprob: log probability (0.0 = 100% confident, more negative = less confident)
# - bytes: UTF-8 byte representation
# - top_logprobs: array of the top N alternative tokens with their probabilities
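
Log probabilities are natural logarithms, so exponentiating gives a probability between 0 and 1. The sketch below is one simple way to turn the per-token entries into confidence numbers, assuming each entry is a dict with `token` and `logprob` keys as listed above; `token_probabilities` and `sequence_confidence` are hypothetical helpers, and multiplying token probabilities is only one of several possible confidence measures.

```python
import math


def token_probabilities(content: list[dict]) -> list[tuple[str, float]]:
    """Convert each token's logprob into a plain probability."""
    return [(entry["token"], math.exp(entry["logprob"])) for entry in content]


def sequence_confidence(content: list[dict]) -> float:
    """Joint probability of the generated tokens (product of token probabilities)."""
    return math.exp(sum(entry["logprob"] for entry in content))
```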

For more detailed examples including Python confidence scoring, streaming logprobs, and classification patterns, see the Log Probabilities Guide.

Completion Storage

When store: true is set in a chat completion request, the full request and response are stored for later retrieval. This is useful for auditing, debugging, and building datasets from production traffic.

Storing a Completion

JSON
{
  "model": "llama-3.1-8b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "store": true,
  "metadata": {"session_id": "abc123", "user_type": "premium"}
}

The metadata parameter (up to 16 key-value pairs) is stored alongside the completion for filtering and organization. Each key is capped at 64 characters and each value at 512 characters; requests that exceed these limits are rejected with an HTTP 400 validation error.
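
These limits are easy to check before sending, which avoids a round trip that ends in a 400. A minimal sketch, assuming keys and values are strings and that lengths are counted in characters as stated above; `metadata_problems` is a hypothetical helper.

```python
MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def metadata_problems(metadata: dict[str, str]) -> list[str]:
    """Return a description of each documented metadata limit the dict exceeds."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key {key[:20]!r}... is longer than {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"value for {key[:20]!r} is longer than {MAX_VALUE_CHARS} characters")
    return problems
```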

Completion API Endpoints

All paths below are relative to https://api.xerotier.ai; the project external ID (proj_ABC123 in the examples on this page) and the endpoint slug are required path prefixes. See the API Reference for the full URL pattern.

REST verbs and paths for managing stored chat completions.
| Method | Endpoint | Description |
|---|---|---|
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions | List stored completions. |
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Retrieve a stored completion by ID. |
| POST | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Update completion metadata. |
| DELETE | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Delete a stored completion. |
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions/{id}/messages | Retrieve the input messages for a stored completion. |
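
The paths above all share one prefix, so a single helper can produce every URL. This is an illustrative sketch that only builds strings and sends nothing; `stored_completion_url` is a hypothetical helper, and the completion id in the usage line is a made-up placeholder.

```python
from typing import Optional
from urllib.parse import quote

BASE_URL = "https://api.xerotier.ai"


def stored_completion_url(project_id: str, endpoint_slug: str,
                          completion_id: Optional[str] = None,
                          messages: bool = False) -> str:
    """Build the URL for the list, single-completion, or messages endpoint."""
    url = f"{BASE_URL}/{project_id}/{endpoint_slug}/v1/chat/completions"
    if completion_id is not None:
        # Percent-encode the id so it cannot alter the path structure.
        url += f"/{quote(completion_id, safe='')}"
        if messages:
            url += "/messages"
    return url


list_url = stored_completion_url("proj_ABC123", "my-endpoint")
messages_url = stored_completion_url("proj_ABC123", "my-endpoint", "example-id", messages=True)
```

The same URL serves GET, POST, and DELETE for a single completion; only the HTTP method changes.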

Retention

Stored completions have tier-dependent retention periods. See Service Tiers for hot storage, cold storage, and total retention durations per tier. Completions are automatically purged after the retention period expires.