Tool Calling & Advanced Features

How it works

Tool calling lets models emit structured function calls that your application executes. The wire format matches OpenAI chat completions. Any SDK or raw HTTP client that already sends the tools and tool_choice parameters works against this router without code changes beyond the base URL and API key.

1. You declare tools in the request as JSON Schema.
2. The model decides whether to call a tool and emits the call with arguments.
3. Your application executes the function and returns the result as a tool-role message.
4. The model uses the result to produce the next assistant turn.

Do not combine tools with response_format: json_schema in the same request. The two structured-output paths are mutually exclusive: pick tool calling when the model should choose which schema to emit, or pick response_format when there is one fixed schema and no branching.
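A minimal client-side guard can catch this combination before the request is sent. The sketch assumes the request body is a plain Python dict shaped like the JSON bodies on this page. `check_structured_output_mode` is a hypothetical helper, not part of any SDK, and it does not reproduce the router's own validation.

```python
def check_structured_output_mode(body: dict) -> None:
    """Raise ValueError if a request body combines tools with response_format json_schema."""
    has_tools = bool(body.get("tools"))
    response_format = body.get("response_format") or {}
    if has_tools and response_format.get("type") == "json_schema":
        raise ValueError(
            "tools and response_format json_schema are mutually exclusive; "
            "use tool calling or one fixed schema, not both"
        )


if __name__ == "__main__":
    weather_tool = {"type": "function", "function": {"name": "get_weather"}}
    check_structured_output_mode({"tools": [weather_tool]})  # tool calling only: passes
    try:
        check_structured_output_mode({
            "tools": [weather_tool],
            "response_format": {"type": "json_schema", "json_schema": {"name": "color_list"}},
        })
    except ValueError as err:
        print(f"rejected: {err}")
```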

Tool Definition

Tools are defined using JSON Schema in the tools array:

JSON

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name (e.g., 'Paris')"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "Temperature unit"
            }
          },
          "required": ["location", "unit"],
          "additionalProperties": false
        },
        "strict": true
      }
    }
  ]
}

Tool Definition Fields

Tool definition fields: type, function.name, function.description, function.parameters, function.strict.
Field	Type	Description
type (required)	string	Must be `"function"`.
function.name (required)	string	Name of the function. Must match `[a-zA-Z0-9_-]+`. The router rejects requests whose function name violates this pattern with an HTTP 400 validation error before the model is invoked.
function.description (optional)	string	Description of what the function does. Helps the model decide when to use it.
function.parameters (optional)	object	JSON Schema describing the function's parameters.
function.strict (optional)	boolean	When `true`, the model strictly follows the parameter schema (structured output mode). When strict mode is enabled, all parameters must be listed in `required` and `additionalProperties` must be `false` at every level of the schema. Nested objects that leave `additionalProperties` unset (or set to `true`) silently break strict-mode dispatch. Omit or set `false` for non-strict mode.
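The naming and strict-mode rules in the table can be checked before a request leaves your application. This is a sketch only. `tool_definition_problems` and `strict_schema_problems` are hypothetical helpers that walk `properties` and array `items`. They ignore composition keywords such as `anyOf` and `$ref`, so a clean result does not prove the router will accept the schema.

```python
import re

# Same pattern as the function.name row in the table, applied to the whole string.
NAME_PATTERN = re.compile(r"[a-zA-Z0-9_-]+")


def strict_schema_problems(schema: dict, path: str = "parameters") -> list:
    """List strict-mode violations: each object must require every property and set additionalProperties to false."""
    problems = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        missing = sorted(set(properties) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {', '.join(missing)}")
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, subschema in properties.items():
            problems.extend(strict_schema_problems(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


def tool_definition_problems(tool: dict) -> list:
    """Check one entry of the tools array against the type, name, and strict rules."""
    problems = []
    if tool.get("type") != "function":
        problems.append('type must be "function"')
    function = tool.get("function", {})
    if not NAME_PATTERN.fullmatch(function.get("name", "")):
        problems.append("function.name must match [a-zA-Z0-9_-]+")
    if function.get("strict"):
        problems.extend(strict_schema_problems(function.get("parameters", {})))
    return problems


if __name__ == "__main__":
    # A strict tool that leaves an optional "unit" parameter out of required.
    tool = {
        "type": "function",
        "function": {
            "name": "get_weather",
            "strict": True,
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["location"],
                "additionalProperties": False,
            },
        },
    }
    for problem in tool_definition_problems(tool):
        print(problem)  # prints: parameters: not in required: unit
```

Returning a list of problems instead of raising on the first one lets you report every violation at once. This helps when a schema has several nested objects.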

Tool Choice

The tool_choice parameter controls when the model calls tools:

Accepted tool_choice values and their behavior on chat completions and the Responses API.
Value	Behavior
`"auto"`	Model decides whether to call a tool or generate text. When tools are provided and `tool_choice` is omitted, the model behaves as if `"auto"` were passed. When no tools are provided, the field is ignored.
`"none"`	Model will not call any tools. Useful when you want the model to generate a text response even though tools are defined.
`"required"`	Model must call at least one tool. The response will always contain tool_calls.
`{"type": "function", "function": {"name": "get_weather"}}`	Chat completions: force the model to call a specific function by name.
`{"type": "function", "name": "get_weather"}`	Responses API: same intent as the row above with the flatter Responses-API tool shape (no nested `function` wrapper).
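The two forced-function shapes differ only in the nested `function` wrapper. That difference is easy to mix up when one codebase talks to both APIs. Below is a small sketch of a helper that builds either shape. `force_tool_choice` and its `api` argument are hypothetical names.

```python
def force_tool_choice(name: str, api: str = "chat") -> dict:
    """Build a tool_choice value that forces the model to call one named function."""
    if api == "chat":
        # Chat completions: the name sits inside a nested "function" object.
        return {"type": "function", "function": {"name": name}}
    if api == "responses":
        # Responses API: flatter shape, no nested "function" wrapper.
        return {"type": "function", "name": name}
    raise ValueError(f"unknown api: {api!r}")


if __name__ == "__main__":
    print(force_tool_choice("get_weather"))
    print(force_tool_choice("get_weather", api="responses"))
```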

Parallel Tool Calls

When tools are provided, the model may call multiple tools in a single response. This is controlled by the parallel_tool_calls parameter.

parallel_tool_calls values and their behavior per response turn.
Value	Behavior
`true` (default)	The model may generate multiple tool calls in a single response, each with a unique `id`.
`false`	The model generates at most one tool call per response turn.

When multiple tool calls are returned, each has a unique id and its own function with name and arguments. You must return results for all tool calls before sending the next message. Each result is a tool-role message with the matching tool_call_id.

Actual behavior depends on the model. Smaller open-source models may emit only one tool call per response even when parallel_tool_calls is true.
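The following sketch returns a result for every call, each keyed by its `tool_call_id`. It assumes your tools are local Python functions registered by name. `get_weather` here is a stand-in that returns hard-coded data matching the sample results on this page. `TOOLS` and `run_tool_calls` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str) -> dict:
    # Stand-in implementation with hard-coded data; replace with a real weather lookup.
    fake = {"Paris": {"temp": 18, "condition": "sunny"}, "London": {"temp": 14, "condition": "cloudy"}}
    return fake.get(location, {"error": f"no data for {location}"})


# Registry mapping the function names declared in `tools` to local callables.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list:
    """Execute every tool call in an assistant message and return one tool-role message per call."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        try:
            args = json.loads(call["function"]["arguments"] or "{}")
            output = TOOLS[name](**args)
        except Exception as exc:  # report the failure to the model instead of dropping the call
            output = {"error": f"{type(exc).__name__}: {exc}"}
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id the model generated
            "content": json.dumps(output),
        })
    return results


if __name__ == "__main__":
    message = {"role": "assistant", "content": None, "tool_calls": [
        {"id": "call_paris", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}},
        {"id": "call_london", "type": "function",
         "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}},
    ]}
    for tool_message in run_tool_calls(message):
        print(tool_message)
```

Failures go back to the model as an `error` object in the tool message instead of being dropped. The next request therefore still carries exactly one result per call. Whether the model recovers gracefully from such an error depends on the model.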

Multi-Turn Conversation Flow

Step 1: Send Message with Tool Definitions

The same request in Python, Node.js, and curl:

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    tools=tools,
    tool_choice="auto"
)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    apiKey: "xero_myproject_your_api_key"
});

const tools = [{
    type: "function",
    function: {
        name: "get_weather",
        description: "Get weather for a city",
        parameters: {
            type: "object",
            properties: {
                location: { type: "string" }
            },
            required: ["location"]
        }
    }
}];

const response = await client.chat.completions.create({
    model: "llama-3.1-8b",
    messages: [
        { role: "user", content: "What is the weather in Paris and London?" }
    ],
    tools: tools,
    tool_choice: "auto"
});

curl

curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Step 2: Model Responds with Tool Calls

JSON Response

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_paris",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Paris\"}"
          }
        },
        {
          "id": "call_london",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"London\"}"
          }
        }
      ]
    },
    "finish_reason": "tool_calls"
  }]
}

Step 3: Send Tool Results

Execute the functions and send results back as tool-role messages, each with the matching tool_call_id. The original assistant turn containing the tool_calls must remain in messages so the model can correlate results with calls; do not strip it before sending the follow-up.

JSON Request Body

{
  "model": "llama-3.1-8b",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris and London?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_paris", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}},
      {"id": "call_london", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_paris", "content": "{\"temp\": 18, \"condition\": \"sunny\"}"},
    {"role": "tool", "tool_call_id": "call_london", "content": "{\"temp\": 14, \"condition\": \"cloudy\"}"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}
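Step 3 describes two mistakes: a stripped assistant turn and a missing result. Both can be detected before the follow-up request is sent. The sketch assumes messages are dicts shaped like the request body above. `tool_message_problems` is a hypothetical helper, and it treats any non-tool message as the end of the previous turn's results.

```python
def tool_message_problems(messages: list) -> list:
    """Check that every tool-role message answers an open tool_call, and every tool_call gets an answer."""
    problems = []
    pending = set()  # tool_call ids requested by the latest assistant turn and not yet answered
    for msg in messages:
        if msg.get("role") == "tool":
            call_id = msg.get("tool_call_id")
            if call_id in pending:
                pending.discard(call_id)
            else:
                # Typical cause: the assistant turn with tool_calls was stripped from messages.
                problems.append(f"tool message {call_id!r} does not answer an open tool_call")
            continue
        # Any non-tool message closes the window for the previous assistant turn's results.
        if pending:
            problems.append("missing tool results for: " + ", ".join(sorted(pending)))
        if msg.get("role") == "assistant":
            pending = {call["id"] for call in (msg.get("tool_calls") or [])}
        else:
            pending = set()
    if pending:
        problems.append("missing tool results for: " + ", ".join(sorted(pending)))
    return problems


if __name__ == "__main__":
    stripped = [
        {"role": "user", "content": "What is the weather in Paris and London?"},
        {"role": "tool", "tool_call_id": "call_paris", "content": "{\"temp\": 18}"},
    ]
    print(tool_message_problems(stripped))
```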
                

Step 4: Model Generates Final Response

JSON Response

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The weather in Paris is 18C and sunny. In London, it is 14C and cloudy."
    },
    "finish_reason": "stop"
  }]
}

Streaming Tool Calls

When streaming is enabled, tool calls are delivered incrementally across multiple chunks. The first chunk for each tool call includes its id, type, and function name. Subsequent chunks stream the arguments string in fragments. See the Streaming API for details.

Tool-using streams do not emit a final usage chunk unless the request includes stream_options: {"include_usage": true}. Without it, the stream terminates at data: [DONE] with no aggregated token counts; set include_usage to true if you need to bill or report token usage for tool-using conversations.
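The sketch below reassembles the streamed fragments. It assumes each SSE `data:` payload has already been parsed into a dict. It also assumes that tool-call deltas carry an `index` field, as in OpenAI's chat.completion.chunk format; this page does not document that field. `collect_stream` is a hypothetical helper. The returned usage is only populated when the request set `stream_options: {"include_usage": true}`.

```python
import json


def collect_stream(chunks):
    """Rebuild streamed tool calls (and optional usage) from parsed chat.completion.chunk dicts."""
    calls = {}  # keyed by each tool call's index within the assistant message
    usage = None
    for chunk in chunks:
        # With include_usage, a final chunk carries usage (assumed to have an empty choices list).
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices") or []:
            for part in (choice.get("delta") or {}).get("tool_calls") or []:
                slot = calls.setdefault(part["index"], {"id": None, "type": None, "name": None, "arguments": ""})
                function = part.get("function") or {}
                # id, type, and name arrive on the first chunk for each call; later chunks omit them.
                for key, value in (("id", part.get("id")), ("type", part.get("type")), ("name", function.get("name"))):
                    if value:
                        slot[key] = value
                # Arguments arrive as string fragments; only the full concatenation is valid JSON.
                slot["arguments"] += function.get("arguments") or ""
    return [calls[i] for i in sorted(calls)], usage


if __name__ == "__main__":
    # Illustrative chunks in the assumed wire shape; token counts are made up.
    chunks = [
        {"choices": [{"index": 0, "delta": {"role": "assistant", "tool_calls": [
            {"index": 0, "id": "call_paris", "type": "function",
             "function": {"name": "get_weather", "arguments": ""}}]}}]},
        {"choices": [{"index": 0, "delta": {"tool_calls": [
            {"index": 0, "function": {"arguments": "{\"location\":"}}]}}]},
        {"choices": [{"index": 0, "delta": {"tool_calls": [
            {"index": 0, "function": {"arguments": "\"Paris\"}"}}]}}]},
        {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
        {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 9, "total_tokens": 21}},
    ]
    calls, usage = collect_stream(chunks)
    print(calls[0]["name"], json.loads(calls[0]["arguments"]), usage)
```

Parse the arguments only after the stream finishes. An intermediate fragment such as `{"location":` is not valid JSON on its own.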

Structured Output (response_format)

The response_format parameter controls the output format. Use it when you need the model to return valid JSON or conform to a specific schema.

Text Mode (Default)

JSON

{"response_format": {"type": "text"}}

Unstructured text output. This is the default behavior when response_format is not specified.

JSON Object Mode

JSON

{
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "system", "content": "Respond with valid JSON."},
    {"role": "user", "content": "List three colors with their hex codes."}
  ]
}

Forces the model to output valid JSON. You should include instructions in the system message about the expected JSON structure. The model will always return a parseable JSON object, but the schema is not enforced.
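`json_object` mode guarantees parseable JSON but not any particular shape, so your code must check the shape. The sketch assumes the system message asked for an object of the form `{"colors": [{"name": ..., "hex": ...}]}`. The example above only says "Respond with valid JSON", so this shape is an assumption. `parse_colors` is a hypothetical helper.

```python
import json
import re

HEX = re.compile(r"#[0-9a-fA-F]{6}")


def parse_colors(content: str) -> list:
    """Parse a json_object reply and verify the shape the system message asked for."""
    data = json.loads(content)  # raises json.JSONDecodeError (a ValueError) if the reply is not JSON
    colors = data.get("colors") if isinstance(data, dict) else None
    if not isinstance(colors, list):
        raise ValueError('expected an object with a "colors" array')
    for item in colors:
        ok = (isinstance(item, dict)
              and isinstance(item.get("name"), str)
              and isinstance(item.get("hex"), str)
              and HEX.fullmatch(item["hex"]) is not None)
        if not ok:
            raise ValueError(f"unexpected color entry: {item!r}")
    return colors


if __name__ == "__main__":
    print(parse_colors('{"colors": [{"name": "red", "hex": "#FF0000"}]}'))
```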

JSON Schema Mode

JSON

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "color_list",
      "description": "A list of colors with their hex codes",
      "schema": {
        "type": "object",
        "properties": {
          "colors": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "hex": {"type": "string", "pattern": "^#[0-9a-fA-F]{6}$"}
              },
              "required": ["name", "hex"],
              "additionalProperties": false
            }
          }
        },
        "required": ["colors"],
        "additionalProperties": false
      },
      "strict": true
    }
  }
}

Forces the model to output JSON that conforms to the provided JSON Schema. Strict mode (strict: true) is the same constraint documented in Tool Definition: every property listed in required, additionalProperties: false at every nesting level. Outside strict mode the schema is a hint, not a guarantee.

json_schema Fields

Fields accepted inside response_format.json_schema.
Field	Type	Description
name (required)	string	Name of the schema. Used for identification in multi-schema scenarios.
description (optional)	string	Description of the schema, helping the model understand what to produce.
schema (optional)	object	The JSON Schema definition describing the required output structure.
strict (optional)	boolean	When `true`, enforces strict schema conformance. Requires every property to be listed in `required` and `additionalProperties: false` at every nesting level.

When to Use Each Mode

Response format selection guide.
Mode	Use Case
`text`	General text generation, chat, creative writing.
`json_object`	Simple structured extraction where exact schema is flexible.
`json_schema`	Data pipelines, API integrations, and anywhere you need guaranteed schema conformance (guaranteed only with `strict: true`).

Log Probabilities

Set logprobs: true to receive log probability information for each output token. This is useful for confidence scoring, classification, and understanding model behavior.

Parameters that enable and shape logprobs output.
Parameter	Type	Description
logprobs	boolean	Enable log probability output. Default: false.
top_logprobs	integer	Number of most likely tokens to return probabilities for (0-20). Requires `logprobs: true`. The router does not enforce this cap; values above 20 may be rejected or silently clamped by the inference backend.

When enabled, each choice in the response includes a logprobs object containing the log probabilities for the generated tokens. For streaming responses, logprobs are included in each chunk.

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,
    top_logprobs=3,
    max_tokens=5
)

# Access log probabilities
for token_info in response.choices[0].logprobs.content:
    print(f"Token: {token_info.token}, Logprob: {token_info.logprob}")

Node.js

import OpenAI from "openai";

const client = new OpenAI({
    baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
    model: "deepseek-r1-distill-llama-70b",
    messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
    logprobs: true,
    top_logprobs: 3,
    max_tokens: 5
});

// Access log probabilities
for (const tokenInfo of response.choices[0].logprobs.content) {
    console.log(`Token: ${tokenInfo.token}, Logprob: ${tokenInfo.logprob}`);
}

curl

curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
  "model": "deepseek-r1-distill-llama-70b",
  "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
  "logprobs": true,
  "top_logprobs": 3,
  "max_tokens": 5
}'

# The response choices[0].logprobs.content will contain:
# - token: the generated token string
# - logprob: log probability (0.0 = 100% confident, more negative = less confident)
# - bytes: UTF-8 byte representation
# - top_logprobs: array of the top N alternative tokens with their log probabilities

For more detailed examples including Python confidence scoring, streaming logprobs, and classification patterns, see the Log Probabilities Guide.
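A logprob converts back to a probability with the exponential function. For example, a logprob of -0.105 corresponds to a probability of roughly 90%. The sketch assumes natural-log values, as in OpenAI's API, and entries shaped like the `choices[0].logprobs.content` fields listed in the curl example. `to_probabilities` is a hypothetical helper.

```python
import math


def to_probabilities(logprobs_content):
    """Return (token, probability, [(alternative, probability), ...]) for each generated token."""
    rows = []
    for entry in logprobs_content:
        # exp() undoes the natural log: logprob 0.0 -> probability 1.0.
        alternatives = [(alt["token"], math.exp(alt["logprob"])) for alt in entry.get("top_logprobs") or []]
        rows.append((entry["token"], math.exp(entry["logprob"]), alternatives))
    return rows


if __name__ == "__main__":
    # Illustrative input in the documented shape; the numbers are made up, not real model output.
    content = [{
        "token": "Yes",
        "logprob": -0.105,
        "top_logprobs": [{"token": "Yes", "logprob": -0.105}, {"token": "No", "logprob": -2.4}],
    }]
    for token, prob, alternatives in to_probabilities(content):
        print(f"{token!r}: {prob:.1%}", [(alt, round(p, 3)) for alt, p in alternatives])
```

With the official Python SDK the entries are model objects rather than dicts. One way to reuse this helper is to convert them first, for example with `[t.model_dump() for t in response.choices[0].logprobs.content]`.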

Completion Storage

When store: true is set in a chat completion request, the full request and response are stored for later retrieval. This is useful for auditing, debugging, and building datasets from production traffic.

Storing a Completion

JSON

{
  "model": "llama-3.1-8b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "store": true,
  "metadata": {"session_id": "abc123", "user_type": "premium"}
}

The metadata parameter (up to 16 key-value pairs) is stored alongside the completion for filtering and organization. Each key is capped at 64 characters and each value at 512 characters; requests that exceed these limits are rejected with an HTTP 400 validation error.
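Checking the documented metadata limits client-side avoids a round trip that ends in HTTP 400. This is a sketch. `metadata_problems` is a hypothetical helper, and it assumes string values. Python's `len` counts Unicode code points, which may not match how the router counts characters.

```python
MAX_PAIRS, MAX_KEY_CHARS, MAX_VALUE_CHARS = 16, 64, 512  # limits stated above


def metadata_problems(metadata: dict) -> list:
    """Return a list of limit violations; an empty list means the metadata is within the limits."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs; at most {MAX_PAIRS} allowed")
    for key, value in metadata.items():
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key {key[:20]!r}... exceeds {MAX_KEY_CHARS} characters")
        if not isinstance(value, str):  # assumption: values are strings
            problems.append(f"value for {key!r} is not a string")
        elif len(value) > MAX_VALUE_CHARS:
            problems.append(f"value for {key!r} exceeds {MAX_VALUE_CHARS} characters")
    return problems


if __name__ == "__main__":
    print(metadata_problems({"session_id": "abc123", "user_type": "premium"}))
```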

Completion API Endpoints

All paths below are relative to https://api.xerotier.ai. Each path starts with the project external ID (for example, proj_ABC123) followed by the endpoint slug, and both prefixes are required. See the API Reference for the full URL pattern.

REST verbs and paths for managing stored chat completions.
Method	Endpoint	Description
GET	/{project_id}/{endpoint_slug}/v1/chat/completions	List stored completions.
GET	/{project_id}/{endpoint_slug}/v1/chat/completions/{id}	Retrieve a stored completion by ID.
POST	/{project_id}/{endpoint_slug}/v1/chat/completions/{id}	Update completion metadata.
DELETE	/{project_id}/{endpoint_slug}/v1/chat/completions/{id}	Delete a stored completion.
GET	/{project_id}/{endpoint_slug}/v1/chat/completions/{id}/messages	Retrieve the input messages for a stored completion.
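You can compose the paths in the table and prepare requests with the Python standard library. This is a sketch only. `completions_url` and `update_metadata_request` are hypothetical helpers, and `chatcmpl-123` is a made-up ID. The update body shape `{"metadata": {...}}` is an assumption modeled on OpenAI's stored-completions API; this page does not specify it. The request is built but not sent; pass it to `urllib.request.urlopen` to send it.

```python
import json
from typing import Optional
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://api.xerotier.ai"


def completions_url(project_id: str, endpoint_slug: str,
                    completion_id: Optional[str] = None, messages: bool = False) -> str:
    """Build /{project_id}/{endpoint_slug}/v1/chat/completions[/{id}[/messages]] on the API root."""
    url = f"{API_ROOT}/{quote(project_id)}/{quote(endpoint_slug)}/v1/chat/completions"
    if completion_id is not None:
        url += f"/{quote(completion_id)}"
        if messages:
            url += "/messages"
    return url


def update_metadata_request(project_id: str, endpoint_slug: str, completion_id: str,
                            api_key: str, metadata: dict) -> Request:
    """Prepare, but do not send, the POST that updates a stored completion's metadata."""
    return Request(
        completions_url(project_id, endpoint_slug, completion_id),
        data=json.dumps({"metadata": metadata}).encode("utf-8"),  # body shape is an assumption
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    print(completions_url("proj_ABC123", "my-endpoint", "chatcmpl-123", messages=True))
    req = update_metadata_request("proj_ABC123", "my-endpoint", "chatcmpl-123",
                                  "xero_myproject_your_api_key", {"reviewed": "true"})
    print(req.get_method(), req.full_url)
```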

Retention

Stored completions have tier-dependent retention periods. See Service Tiers for hot storage, cold storage, and total retention durations per tier. Completions are automatically purged after the retention period expires.