Tool Calling
Give the model a function signature and let it pick when to call it. You get the same JSON Schema tools your OpenAI code already declares, the same parallel-call semantics, structured-output enforcement via strict mode, and log probabilities you can use to rank candidate outputs.
How it works
Tool calling lets models emit structured function calls your application
executes. The wire shape matches OpenAI chat completions, so any SDK or
raw HTTP client that already speaks tools + tool_choice
works against this router without changes.
- You declare tools in the request as JSON Schema.
- The model decides whether to call a tool and emits the call with arguments.
- Your application executes the function and returns the result as a tool-role message.
- The model uses the result to produce the next assistant turn.
Do not combine tools with response_format: json_schema
in the same request. The two structured-output paths are mutually
exclusive: pick tool calling when the model should choose which
schema to emit, or pick response_format when there is one fixed
schema and no branching.
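A client-side guard can catch the conflicting combination before the request leaves your application. The sketch below is a minimal check on a request body held as a Python dict; `check_structured_output_paths` is a hypothetical helper name, not part of any SDK, and it makes no claim about how the router itself responds to such a request.

```python
def check_structured_output_paths(payload: dict) -> None:
    """Raise ValueError if a request mixes tools with response_format json_schema."""
    has_tools = bool(payload.get("tools"))
    response_format = payload.get("response_format") or {}
    uses_json_schema = response_format.get("type") == "json_schema"
    if has_tools and uses_json_schema:
        raise ValueError(
            "tools and response_format: json_schema are mutually exclusive; pick one"
        )


# A tools-only request passes the check without raising.
check_structured_output_paths({"tools": [{"type": "function"}]})
```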
Tool Definition
Tools are defined using JSON Schema in the tools array:
{
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name (e.g., 'Paris')"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location", "unit"],
"additionalProperties": false
},
"strict": true
}
}
]
}
Tool Definition Fields
| Field | Type | Description |
|---|---|---|
| type (required) | string | Must be "function". |
| function.name (required) | string | Name of the function. Must match [a-zA-Z0-9_-]+. The router rejects requests whose function name violates this pattern with an HTTP 400 validation error before the model is invoked. |
| function.description (optional) | string | Description of what the function does. Helps the model decide when to use it. |
| function.parameters (optional) | object | JSON Schema describing the function's parameters. |
| function.strict (optional) | boolean | When true, the model strictly follows the parameter schema (structured output mode). When strict mode is enabled, all parameters must be listed in required and additionalProperties must be false at every level of the schema. Nested objects that leave additionalProperties unset (or set to true) silently break strict-mode dispatch. Omit or set false for non-strict mode. |
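Because a nested object with a missing additionalProperties breaks strict mode silently, it can help to lint tool definitions before sending them. The following is a minimal sketch of such a check, covering only the rules in the table above (the name pattern, every property listed in required, and additionalProperties: false at each object level). `strict_schema_problems` and `tool_problems` are hypothetical helper names, and the check does not attempt to cover every JSON Schema keyword.

```python
import re

NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def strict_schema_problems(schema: dict, path: str = "parameters") -> list[str]:
    """Return human-readable strict-mode violations found in one schema."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        # Recurse so nested objects are held to the same rules.
        for name, sub in properties.items():
            problems += strict_schema_problems(sub, f"{path}.{name}")
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems += strict_schema_problems(schema["items"], f"{path}[]")
    return problems


def tool_problems(tool: dict) -> list[str]:
    """Check the function name pattern and, for strict tools, the schema."""
    function = tool.get("function", {})
    problems = []
    if not NAME_PATTERN.fullmatch(function.get("name", "")):
        problems.append("function.name must match [a-zA-Z0-9_-]+")
    if function.get("strict"):
        problems += strict_schema_problems(function.get("parameters", {}))
    return problems


# Illustrative tool that forgets to list "unit" in required.
weather = {"type": "function", "function": {
    "name": "get_weather", "strict": True, "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}, "unit": {"type": "string"}},
        "required": ["location"],
        "additionalProperties": False}}}
print(tool_problems(weather))
```

Running the lint in a unit test or at application start-up surfaces a schema mistake locally rather than as degraded tool dispatch at request time.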
Tool Choice
The tool_choice parameter controls when the model calls tools:
| Value | Behavior |
|---|---|
| "auto" | Model decides whether to call a tool or generate text. When tools are provided and tool_choice is omitted, the model behaves as if "auto" were passed. When no tools are provided, the field is ignored. |
| "none" | Model will not call any tools. Useful when you want the model to generate a text response even though tools are defined. |
| "required" | Model must call at least one tool. The response will always contain tool_calls. |
| {"type": "function", "function": {"name": "get_weather"}} | Chat completions: force the model to call a specific function by name. |
| {"type": "function", "name": "get_weather"} | Responses API: same intent as the row above with the flatter Responses-API tool shape (no nested function wrapper). |
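Since the forced-function value differs between the two APIs, a small helper can keep the shapes from being mixed up. This is a sketch only; `forced_tool_choice` and its `api` argument are hypothetical names introduced for illustration.

```python
def forced_tool_choice(name: str, api: str = "chat") -> dict:
    """Build the tool_choice value that forces a call to one named function."""
    if api == "chat":
        # Chat completions nests the name under "function".
        return {"type": "function", "function": {"name": name}}
    if api == "responses":
        # The Responses API uses the flatter shape with no wrapper.
        return {"type": "function", "name": name}
    raise ValueError(f"unknown api: {api!r}")


chat_choice = forced_tool_choice("get_weather")
responses_choice = forced_tool_choice("get_weather", api="responses")
```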
Parallel Tool Calls
When tools are provided, the model may call multiple tools in a single response.
This is controlled by the parallel_tool_calls parameter.
| Value | Behavior |
|---|---|
| true (default) | The model may generate multiple tool calls in a single response, each with a unique id. |
| false | The model generates at most one tool call per response turn. |
When multiple tool calls are returned, each has a unique id and
its own function with name and arguments.
You must return results for all tool calls before sending the next message.
Each result is a tool-role message with the matching
tool_call_id.
Final behavior depends on the model. Smaller open-source models may only emit
one tool call per response even when parallel_tool_calls: true.
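One way to make sure every tool call gets a result is to build the tool-role messages in a single pass over tool_calls. The sketch below assumes the tool calls are plain dicts in the response shape shown on this page; `REGISTRY`, `run_tool_calls`, and the stand-in `get_weather` implementation are hypothetical. Failures are reported back as an error payload so that no tool_call_id is left unanswered.

```python
import json


def get_weather(location: str) -> dict:
    # Stand-in implementation; a real one would query a weather service.
    return {"location": location, "temp": 18, "condition": "sunny"}


# Maps function names declared in "tools" to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Execute every call and return one tool-role message per call."""
    messages = []
    for call in tool_calls:
        name = call["function"]["name"]
        try:
            arguments = json.loads(call["function"]["arguments"])
            result = REGISTRY[name](**arguments)
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Unknown tool, bad argument names, or malformed JSON arguments.
            result = {"error": f"{type(exc).__name__}: {exc}"}
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id the model emitted
            "content": json.dumps(result),
        })
    return messages
```

The returned list can be appended to the conversation directly after the assistant turn that contained the tool calls.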
Multi-Turn Conversation Flow
Step 1: Send Message with Tool Definitions
Equivalent code in three transports (Python SDK, Node.js SDK, and curl, in that order); pick one.
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
tools = [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}]
response = client.chat.completions.create(
model="llama-3.1-8b",
messages=[
{"role": "user", "content": "What is the weather in Paris and London?"}
],
tools=tools,
tool_choice="auto"
)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key"
});
const tools = [{
type: "function",
function: {
name: "get_weather",
description: "Get weather for a city",
parameters: {
type: "object",
properties: {
location: { type: "string" }
},
required: ["location"]
}
}
}];
const response = await client.chat.completions.create({
model: "llama-3.1-8b",
messages: [
{ role: "user", content: "What is the weather in Paris and London?" }
],
tools: tools,
tool_choice: "auto"
});
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"messages": [
{"role": "user", "content": "What is the weather in Paris and London?"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}],
"tool_choice": "auto"
}'
Step 2: Model Responds with Tool Calls
{
"choices": [{
"message": {
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "call_paris",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"Paris\"}"
}
},
{
"id": "call_london",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"location\":\"London\"}"
}
}
]
},
"finish_reason": "tool_calls"
}]
}
Step 3: Send Tool Results
Execute the functions and send results back as tool-role messages, each with the matching tool_call_id. The original assistant turn containing the tool_calls must remain in messages so the model can correlate results with calls; do not strip it before sending the follow-up.
{
"model": "llama-3.1-8b",
"messages": [
{"role": "user", "content": "What is the weather in Paris and London?"},
{"role": "assistant", "content": null, "tool_calls": [
{"id": "call_paris", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}},
{"id": "call_london", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}}
]},
{"role": "tool", "tool_call_id": "call_paris", "content": "{\"temp\": 18, \"condition\": \"sunny\"}"},
{"role": "tool", "tool_call_id": "call_london", "content": "{\"temp\": 14, \"condition\": \"cloudy\"}"}
],
"tools": [{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a city",
"parameters": {
"type": "object",
"properties": {"location": {"type": "string"}},
"required": ["location"]
}
}
}]
}
Step 4: Model Generates Final Response
{
"choices": [{
"message": {
"role": "assistant",
"content": "The weather in Paris is 18C and sunny. In London, it is 14C and cloudy."
},
"finish_reason": "stop"
}]
}
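Steps 1 through 4 can be folded into one loop that keeps calling the model until it stops requesting tools. The sketch below is transport-agnostic: `create` is any callable that takes `messages` and `tools` and returns a response dict in the shape shown above. `run_conversation`, `fake_create`, and the `max_rounds` safety limit are hypothetical names for illustration; `fake_create` is a scripted stand-in so the example runs without a network call.

```python
import json
from typing import Callable


def run_conversation(create: Callable[..., dict], messages: list[dict],
                     tools: list[dict], registry: dict, max_rounds: int = 5) -> str:
    """Loop until the model stops requesting tools, then return its text."""
    for _ in range(max_rounds):
        response = create(messages=messages, tools=tools)
        choice = response["choices"][0]
        message = choice["message"]
        # Keep the assistant turn, including tool_calls, in the history.
        messages.append(message)
        if choice["finish_reason"] != "tool_calls":
            return message["content"]
        for call in message["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            result = registry[call["function"]["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": json.dumps(result)})
    raise RuntimeError("model kept requesting tools past max_rounds")


def fake_create(**kwargs) -> dict:
    # Scripted stand-in for the HTTP call: a tool call first, then an answer.
    if kwargs["messages"][-1]["role"] == "tool":
        return {"choices": [{"message": {"role": "assistant",
                                         "content": "Paris is 18C and sunny."},
                             "finish_reason": "stop"}]}
    return {"choices": [{"message": {"role": "assistant", "content": None,
                                     "tool_calls": [{
                                         "id": "call_paris", "type": "function",
                                         "function": {"name": "get_weather",
                                                      "arguments": '{"location":"Paris"}'}}]},
                         "finish_reason": "tool_calls"}]}


history = [{"role": "user", "content": "What is the weather in Paris?"}]
answer = run_conversation(
    fake_create, history, tools=[],
    registry={"get_weather": lambda location: {"temp": 18, "condition": "sunny"}},
)
```

In real use, `create` would wrap the SDK or HTTP call from Step 1 and convert the response to a dict; the loop itself would stay the same.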
Streaming Tool Calls
When streaming is enabled, tool calls are delivered incrementally across
multiple chunks. The first chunk includes the tool call id,
type, and function name. Subsequent chunks stream
the arguments string incrementally. See the
Streaming API for details.
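Reassembling streamed tool calls means concatenating the arguments fragments per call before parsing them. The sketch below works on chunks already decoded into dicts and assumes the OpenAI chunk layout (`choices[0].delta.tool_calls`, with an `index` field identifying which call a fragment belongs to); that layout is an assumption based on the wire-compatibility statement above, and `collect_tool_calls` is a hypothetical helper.

```python
def collect_tool_calls(chunks) -> list[dict]:
    """Merge streamed tool-call deltas into complete calls, ordered by index."""
    calls: dict[int, dict] = {}
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # for example, a usage-only chunk
        delta = chunk["choices"][0].get("delta", {})
        for piece in delta.get("tool_calls") or []:
            call = calls.setdefault(piece["index"], {
                "id": None, "type": "function",
                "function": {"name": "", "arguments": ""},
            })
            if piece.get("id"):
                call["id"] = piece["id"]
            function = piece.get("function") or {}
            if function.get("name"):
                call["function"]["name"] = function["name"]
            # Argument text arrives in fragments; append in arrival order.
            call["function"]["arguments"] += function.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Illustrative chunks: one call split across three deltas, one sent whole.
chunks = [
    {"choices": [{"delta": {"tool_calls": [{"index": 0, "id": "call_paris",
        "type": "function", "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '{"location":'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 0,
        "function": {"arguments": '"Paris"}'}}]}}]},
    {"choices": [{"delta": {"tool_calls": [{"index": 1, "id": "call_london",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location":"London"}'}}]}}]},
    {"choices": [{"delta": {}, "finish_reason": "tool_calls"}]},
]
tool_calls = collect_tool_calls(chunks)
```

Parse each `arguments` string with a JSON parser only after the stream has finished, since intermediate fragments are not valid JSON on their own.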
Tool-using streams do not emit a final usage chunk unless the request includes
stream_options: {"include_usage": true}. Without it, the stream
terminates at data: [DONE] with no aggregated token counts; set
include_usage to true if you need to bill or report
token usage for tool-using conversations.
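A minimal sketch of opting in to the usage chunk and reading it back follows. It assumes the usage data arrives as a `usage` field on a decoded chunk, as in the OpenAI streaming format; `stream_request` and `final_usage` are hypothetical helpers, and the token count in the sample is made up for illustration.

```python
def stream_request(payload: dict) -> dict:
    """Return a copy of the request with streaming and usage reporting enabled."""
    options = {**payload.get("stream_options", {}), "include_usage": True}
    return {**payload, "stream": True, "stream_options": options}


def final_usage(chunks):
    """Return the last usage object seen in the stream, or None if absent."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]
    return usage


request = stream_request({"model": "llama-3.1-8b", "messages": []})
sample_chunks = [
    {"choices": [{"delta": {"content": "Hi"}}]},
    {"choices": [], "usage": {"total_tokens": 42}},  # illustrative values
]
usage = final_usage(sample_chunks)
```

A `None` result is a signal that the request was sent without include_usage, which is worth logging if billing depends on the counts.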
Structured Output (response_format)
The response_format parameter controls the output format.
Use it when you need the model to return valid JSON or conform to a specific schema.
Text Mode (Default)
{"response_format": {"type": "text"}}
Unstructured text output. This is the default behavior when response_format is not specified.
JSON Object Mode
{
"response_format": {"type": "json_object"},
"messages": [
{"role": "system", "content": "Respond with valid JSON."},
{"role": "user", "content": "List three colors with their hex codes."}
]
}
Forces the model to output valid JSON. You should include instructions in the system message about the expected JSON structure. The model will always return a parseable JSON object, but the schema is not enforced.
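Because JSON object mode does not enforce a schema, the application still has to check that the keys it relies on are present. The following is a minimal sketch of that check on the message content; `parse_json_object` and the `required_keys` argument are hypothetical, and the expected keys are whatever your system message asked for.

```python
import json


def parse_json_object(content: str, required_keys: set[str]) -> dict:
    """Parse model output and confirm it is an object with the expected keys."""
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


# Illustrative content, as it might appear in choices[0].message.content.
parsed = parse_json_object('{"colors": [{"name": "red", "hex": "#FF0000"}]}', {"colors"})
```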
JSON Schema Mode
{
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "color_list",
"description": "A list of colors with their hex codes",
"schema": {
"type": "object",
"properties": {
"colors": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"hex": {"type": "string", "pattern": "^#[0-9a-fA-F]{6}$"}
},
"required": ["name", "hex"],
"additionalProperties": false
}
}
},
"required": ["colors"],
"additionalProperties": false
},
"strict": true
}
}
}
Forces the model to output JSON that conforms to the provided JSON Schema.
Strict mode (strict: true) is the same constraint documented in
Tool Definition: every property listed in
required, additionalProperties: false at every
nesting level. Outside strict mode the schema is a hint, not a guarantee.
json_schema Fields
| Field | Type | Description |
|---|---|---|
| name (required) | string | Name of the schema. Used for identification in multi-schema scenarios. |
| description (optional) | string | Description of the schema, helping the model understand what to produce. |
| schema (optional) | object | The JSON Schema definition describing the required output structure. |
| strict (optional) | boolean | When true, enforces strict schema conformance. Requires all properties to be in required and additionalProperties: false. |
When to Use Each Mode
| Mode | Use Case |
|---|---|
| text | General text generation, chat, creative writing. |
| json_object | Simple structured extraction where exact schema is flexible. |
| json_schema | Data pipelines, API integrations, and anywhere you need guaranteed schema conformance. |
Log Probabilities
Set logprobs: true to receive log probability information for
each output token. This is useful for confidence scoring, classification,
and understanding model behavior.
| Parameter | Type | Description |
|---|---|---|
| logprobs | boolean | Enable log probability output. Default: false. |
| top_logprobs | integer | Number of most likely tokens to return probabilities for (0-20). Requires logprobs: true. The router does not enforce this cap; values above 20 may be rejected or silently clamped by the inference backend. |
When enabled, each choice in the response includes a logprobs
object containing the log probabilities for the generated tokens. For streaming
responses, logprobs are included in each chunk.
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
response = client.chat.completions.create(
model="deepseek-r1-distill-llama-70b",
messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
logprobs=True,
top_logprobs=3,
max_tokens=5
)
# Access log probabilities
for token_info in response.choices[0].logprobs.content:
print(f"Token: {token_info.token}, Logprob: {token_info.logprob}")
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key"
});
const response = await client.chat.completions.create({
model: "deepseek-r1-distill-llama-70b",
messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
logprobs: true,
top_logprobs: 3,
max_tokens: 5
});
// Access log probabilities
for (const tokenInfo of response.choices[0].logprobs.content) {
console.log(`Token: ${tokenInfo.token}, Logprob: ${tokenInfo.logprob}`);
}
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-r1-distill-llama-70b",
"messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
"logprobs": true,
"top_logprobs": 3,
"max_tokens": 5
}'
# The response choices[0].logprobs.content will contain:
# - token: the generated token string
# - logprob: log probability (0.0 = 100% confident, more negative = less confident)
# - bytes: UTF-8 byte representation
# - top_logprobs: array of the top N alternative tokens with their log probabilities
For more detailed examples including Python confidence scoring, streaming logprobs, and classification patterns, see the Log Probabilities Guide.
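As a quick illustration of confidence scoring, a log probability converts to a probability with the exponential function, and the top_logprobs entries for the first token can be summed per label. The sketch below operates on logprobs.content entries as plain dicts; `token_confidences` and `label_probabilities` are hypothetical helpers, and the sample values are invented for the example rather than taken from a real response.

```python
import math


def token_confidences(logprob_content: list[dict]) -> list[tuple[str, float]]:
    """Convert each token's log probability to a probability in [0, 1]."""
    return [(item["token"], math.exp(item["logprob"])) for item in logprob_content]


def label_probabilities(first_token: dict, labels: list[str]) -> dict[str, float]:
    """Sum the probability mass in top_logprobs that falls on each label."""
    scores = {label: 0.0 for label in labels}
    for alt in first_token.get("top_logprobs", []):
        # Tokens often carry leading spaces or different casing.
        key = alt["token"].strip().lower()
        if key in scores:
            scores[key] += math.exp(alt["logprob"])
    return scores


# Invented sample entry in the shape of choices[0].logprobs.content[0].
first = {
    "token": "Yes", "logprob": math.log(0.9),
    "top_logprobs": [
        {"token": "Yes", "logprob": math.log(0.9)},
        {"token": " no", "logprob": math.log(0.1)},
    ],
}
scores = label_probabilities(first, ["yes", "no"])
```

Normalizing case and whitespace matters here because tokenizers frequently emit variants such as " Yes" and "yes" as separate tokens.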
Completion Storage
When store: true is set in a chat completion request, the full
request and response are stored for later retrieval. This is useful for
auditing, debugging, and building datasets from production traffic.
Storing a Completion
{
"model": "llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}],
"store": true,
"metadata": {"session_id": "abc123", "user_type": "premium"}
}
The metadata parameter (up to 16 key-value pairs) is stored
alongside the completion for filtering and organization. Each key is capped
at 64 characters and each value at 512 characters; requests that exceed these
limits are rejected with an HTTP 400 validation error.
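The limits above are easy to check before sending, which avoids a round trip that ends in a 400. This is a sketch under two assumptions: values are strings, and length is counted in Python characters via `len`. `metadata_problems` is a hypothetical helper name.

```python
def metadata_problems(metadata: dict) -> list[str]:
    """Return a list of violations of the documented metadata limits."""
    problems = []
    if len(metadata) > 16:
        problems.append(f"{len(metadata)} pairs exceeds the limit of 16")
    for key, value in metadata.items():
        if len(key) > 64:
            problems.append(f"key starting {key[:20]!r} exceeds 64 characters")
        if len(str(value)) > 512:
            problems.append(f"value for key starting {key[:20]!r} exceeds 512 characters")
    return problems


ok = metadata_problems({"session_id": "abc123", "user_type": "premium"})
```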
Completion API Endpoints
All paths below are relative to https://api.xerotier.ai. The project
external ID ({project_id}, for example proj_ABC123) and the endpoint slug
({endpoint_slug}) are required prefixes. See the
API Reference for the full URL pattern.
| Method | Endpoint | Description |
|---|---|---|
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions | List stored completions. |
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Retrieve a stored completion by ID. |
| POST | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Update completion metadata. |
| DELETE | /{project_id}/{endpoint_slug}/v1/chat/completions/{id} | Delete a stored completion. |
| GET | /{project_id}/{endpoint_slug}/v1/chat/completions/{id}/messages | Retrieve the input messages for a stored completion. |
Retention
Stored completions have tier-dependent retention periods. See Service Tiers for hot storage, cold storage, and total retention durations per tier. Completions are automatically purged after the retention period expires.