Tool Calling & Advanced Features

Function calling, structured output, log probabilities, and completion storage.

Tool Calling (Function Calling)

Tool calling allows models to generate structured function calls that your application can execute. This enables AI agents, data extraction, and integration with external APIs and services.

The flow works as follows:

  1. You define available tools (functions) in your request.
  2. The model decides whether to call a tool and generates the call with arguments.
  3. Your application executes the function and returns the result.
  4. The model uses the result to generate a final response.
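As a sketch, the four steps above can be wrapped in a loop. The helper names `apply_tool_calls` and `run_with_tools` are our own, not part of any SDK; `model_dump()` is assumed to be available on the SDK's message objects (it is in current openai-python, which uses Pydantic models):

```python
import json

def apply_tool_calls(messages, assistant_message, tool_impls):
    """Step 3: execute each requested tool and append the assistant turn
    plus one tool-role result message per call."""
    messages = messages + [assistant_message]
    for call in assistant_message["tool_calls"]:
        fn = tool_impls[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })
    return messages

def run_with_tools(client, model, messages, tools, tool_impls):
    """Steps 2-4: loop until the model stops asking for tools."""
    while True:
        response = client.chat.completions.create(
            model=model, messages=messages, tools=tools, tool_choice="auto"
        )
        choice = response.choices[0]
        if choice.finish_reason != "tool_calls":
            return choice.message.content
        # model_dump() turns the SDK message object back into a plain dict
        messages = apply_tool_calls(
            messages, choice.message.model_dump(), tool_impls
        )
```

`tool_impls` maps function names to local Python callables, so the loop works for any number of tools without per-tool branching.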

Tool Definition

Tools are defined using JSON Schema in the tools array:

JSON
{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "City name (e.g., 'Paris')"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"],
              "description": "Temperature unit"
            }
          },
          "required": ["location"]
        },
        "strict": false
      }
    }
  ]
}

Tool Definition Fields

Field                            Type     Description
type (required)                  string   Must be "function".
function.name (required)         string   Name of the function. Must match [a-zA-Z0-9_-]+.
function.description (optional)  string   Description of what the function does. Helps the model decide when to use it.
function.parameters (optional)   object   JSON Schema describing the function's parameters.
function.strict (optional)       boolean  When true, the model strictly follows the parameter schema. Default: false.

Tool Choice

The tool_choice parameter controls when the model calls tools:

Value                                                      Behavior
"auto"                                                     Model decides whether to call a tool or generate text. This is the default when tools are provided.
"none"                                                     Model will not call any tools. Useful when you want the model to generate a text response even though tools are defined.
"required"                                                 Model must call at least one tool. The response will always contain tool_calls.
{"type": "function", "function": {"name": "get_weather"}}  Force the model to call a specific function by name.
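The four tool_choice values can be sketched as request payloads. The `force_function` helper is our own shorthand for the object form in the last table row; the tool definition matches the get_weather example above:

```python
def force_function(name):
    """Build the tool_choice value that forces one named function."""
    return {"type": "function", "function": {"name": name}}

base = {
    "model": "llama-3.1-8b",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }],
}

let_model_decide = {**base, "tool_choice": "auto"}       # default with tools
text_only = {**base, "tool_choice": "none"}              # never call tools
must_call = {**base, "tool_choice": "required"}          # always call a tool
force_weather = {**base, "tool_choice": force_function("get_weather")}
```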

Parallel Tool Calls

Set parallel_tool_calls: true to allow the model to generate multiple tool calls in a single response. Each tool call has a unique id and its own function with name and arguments. You must return results for all tool calls before sending the next message.
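Since parallel tool calls are independent of each other, they can be executed concurrently before the results are sent back. A minimal sketch, assuming tool calls are plain dicts shaped like the response examples below (the `run_parallel_calls` helper is our own):

```python
import concurrent.futures
import json

def run_parallel_calls(tool_calls, tool_impls):
    """Execute independent tool calls concurrently; return one tool-role
    message per call, each tagged with its matching tool_call_id."""
    def run_one(call):
        fn = tool_impls[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        return {
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        }
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so results line up with the calls
        return list(pool.map(run_one, tool_calls))
```

Threads suit I/O-bound tool implementations (HTTP lookups, database queries); for CPU-bound work a process pool would be the analogous choice.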

Multi-Turn Conversation Flow

Step 1: Send Message with Tool Definitions

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[
        {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    tools=tools,
    tool_choice="auto"
)

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const tools = [{
  type: "function",
  function: {
    name: "get_weather",
    description: "Get weather for a city",
    parameters: {
      type: "object",
      properties: {
        location: { type: "string" }
      },
      required: ["location"]
    }
  }
}];

const response = await client.chat.completions.create({
  model: "llama-3.1-8b",
  messages: [
    { role: "user", content: "What is the weather in Paris and London?" }
  ],
  tools: tools,
  tool_choice: "auto"
});

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.1-8b",
    "messages": [
      {"role": "user", "content": "What is the weather in Paris and London?"}
    ],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {"type": "string"}
          },
          "required": ["location"]
        }
      }
    }],
    "tool_choice": "auto"
  }'

Step 2: Model Responds with Tool Calls

JSON Response
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_paris",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"Paris\"}"
          }
        },
        {
          "id": "call_london",
          "type": "function",
          "function": {
            "name": "get_weather",
            "arguments": "{\"location\":\"London\"}"
          }
        }
      ]
    },
    "finish_reason": "tool_calls"
  }]
}

Step 3: Send Tool Results

Execute the functions and send results back as tool-role messages, each with the matching tool_call_id:

JSON Request Body
{
  "model": "llama-3.1-8b",
  "messages": [
    {"role": "user", "content": "What is the weather in Paris and London?"},
    {"role": "assistant", "content": null, "tool_calls": [
      {"id": "call_paris", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"Paris\"}"}},
      {"id": "call_london", "type": "function", "function": {"name": "get_weather", "arguments": "{\"location\":\"London\"}"}}
    ]},
    {"role": "tool", "tool_call_id": "call_paris", "content": "{\"temp\": 18, \"condition\": \"sunny\"}"},
    {"role": "tool", "tool_call_id": "call_london", "content": "{\"temp\": 14, \"condition\": \"cloudy\"}"}
  ],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"]
      }
    }
  }]
}

Step 4: Model Generates Final Response

JSON Response
{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "The weather in Paris is 18°C and sunny. In London, it is 14°C and cloudy."
    },
    "finish_reason": "stop"
  }]
}

Streaming Tool Calls

When streaming is enabled, tool calls are delivered incrementally across multiple chunks. The first chunk includes the tool call id, type, and function name. Subsequent chunks stream the arguments string incrementally. See the Streaming API for details.
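Reassembling the incremental deltas can be sketched as follows. The `accumulate_tool_calls` helper is our own; it takes the delta.tool_calls entries collected across chunks, shown here as plain dicts for illustration:

```python
def accumulate_tool_calls(deltas):
    """Reassemble streamed tool calls: the first delta for each index
    carries id/type/name; later deltas carry argument fragments that
    must be concatenated in order."""
    calls = {}
    for delta in deltas:
        idx = delta["index"]
        call = calls.setdefault(idx, {
            "id": None,
            "type": "function",
            "function": {"name": "", "arguments": ""},
        })
        if delta.get("id"):
            call["id"] = delta["id"]
        fn = delta.get("function", {})
        if fn.get("name"):
            call["function"]["name"] = fn["name"]
        call["function"]["arguments"] += fn.get("arguments", "")
    return [calls[i] for i in sorted(calls)]
```

Only once the stream finishes is each arguments string guaranteed to be complete, parseable JSON.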

Structured Output (response_format)

The response_format parameter controls the output format. Use it when you need the model to return valid JSON or conform to a specific schema.

Text Mode (Default)

JSON
{"response_format": {"type": "text"}}

Unstructured text output. This is the default behavior when response_format is not specified.

JSON Object Mode

JSON
{
  "response_format": {"type": "json_object"},
  "messages": [
    {"role": "system", "content": "Respond with valid JSON."},
    {"role": "user", "content": "List three colors with their hex codes."}
  ]
}

Forces the model to output valid JSON. You should include instructions in the system message about the expected JSON structure. The model will always return a parseable JSON object, but the schema is not enforced.
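Because only validity is guaranteed, the client side should parse and then read the structure defensively. A minimal sketch (the `parse_colors` helper and the "colors" key are our own convention from the example prompt, not part of the API):

```python
import json

def parse_colors(content):
    """json_object mode guarantees the content parses as JSON, but not
    its shape, so fall back to an empty list if the key is missing."""
    data = json.loads(content)  # should never raise in json_object mode
    return data.get("colors", [])
```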

JSON Schema Mode

JSON
{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "color_list",
      "schema": {
        "type": "object",
        "properties": {
          "colors": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "name": {"type": "string"},
                "hex": {"type": "string", "pattern": "^#[0-9a-fA-F]{6}$"}
              },
              "required": ["name", "hex"]
            }
          }
        },
        "required": ["colors"]
      },
      "strict": true
    }
  }
}

Forces the model to output JSON that conforms to the provided JSON Schema. With strict: true, generation is constrained so the output matches the schema exactly. This is the most reliable way to get structured data from models.
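Even so, a defensive client-side check costs little. A sketch against the color_list schema above (the `check_color_list` helper is our own; with strict: true it should never raise, but it catches surprises at integration boundaries):

```python
import json
import re

# Mirrors the "pattern" constraint from the color_list schema above.
HEX = re.compile(r"^#[0-9a-fA-F]{6}$")

def check_color_list(content):
    """Parse and re-validate a color_list response client-side."""
    data = json.loads(content)
    for color in data["colors"]:
        if not isinstance(color["name"], str) or not HEX.match(color["hex"]):
            raise ValueError(f"schema violation: {color!r}")
    return data
```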

When to Use Each Mode

Mode         Use Case
text         General text generation, chat, creative writing.
json_object  Simple structured extraction where exact schema is flexible.
json_schema  Data pipelines, API integrations, and anywhere you need guaranteed schema conformance.

Log Probabilities

Set logprobs: true to receive log probability information for each output token. This is useful for confidence scoring, classification, and understanding model behavior.

Parameter     Type     Description
logprobs      boolean  Enable log probability output. Default: false.
top_logprobs  integer  Number of most likely tokens to return probabilities for (0-20). Requires logprobs: true.

When enabled, each choice in the response includes a logprobs object containing the log probabilities for the generated tokens. For streaming responses, logprobs are included in each chunk.

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",
    messages=[{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    logprobs=True,
    top_logprobs=3,
    max_tokens=5
)

# Access log probabilities
for token_info in response.choices[0].logprobs.content:
    print(f"Token: {token_info.token}, Logprob: {token_info.logprob}")

Node.js
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key"
});

const response = await client.chat.completions.create({
  model: "deepseek-r1-distill-llama-70b",
  messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
  logprobs: true,
  top_logprobs: 3,
  max_tokens: 5
});

// Access log probabilities
for (const tokenInfo of response.choices[0].logprobs.content) {
  console.log(`Token: ${tokenInfo.token}, Logprob: ${tokenInfo.logprob}`);
}

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-distill-llama-70b",
    "messages": [{"role": "user", "content": "Is the sky blue? Answer yes or no."}],
    "logprobs": true,
    "top_logprobs": 3,
    "max_tokens": 5
  }'

# The response choices[0].logprobs.content will contain:
# - token: the generated token string
# - logprob: log probability (0.0 = 100% confident, more negative = less confident)
# - bytes: UTF-8 byte representation
# - top_logprobs: array of the top N alternative tokens with their probabilities

For more detailed examples including Python confidence scoring, streaming logprobs, and classification patterns, see the Log Probabilities Guide.
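As a quick sketch of the confidence-scoring idea: a log probability converts back to a plain probability via exp, and per-token logprobs add where probabilities would multiply (the helper names are our own):

```python
import math

def token_confidence(logprob):
    """Probability of a single token: exp(logprob).
    0.0 means certainty; about -0.69 means roughly 50%."""
    return math.exp(logprob)

def answer_confidence(token_logprobs):
    """Joint probability of a multi-token answer: since the tokens'
    probabilities multiply, their logprobs sum before exponentiating."""
    return math.exp(sum(token_logprobs))
```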

Completion Storage

When store: true is set in a chat completion request, the full request and response are stored for later retrieval. This is useful for auditing, debugging, and building datasets from production traffic.

Storing a Completion

JSON
{
  "model": "llama-3.1-8b",
  "messages": [{"role": "user", "content": "Hello!"}],
  "store": true,
  "metadata": {"session_id": "abc123", "user_type": "premium"}
}

The metadata parameter (up to 16 key-value pairs) is stored alongside the completion for filtering and organization.

Completion API Endpoints

Method  Endpoint                            Description
GET     /v1/chat/completions                List stored completions.
GET     /v1/chat/completions/{id}           Retrieve a stored completion by ID.
POST    /v1/chat/completions/{id}           Update completion metadata.
DELETE  /v1/chat/completions/{id}           Delete a stored completion.
GET     /v1/chat/completions/{id}/messages  Retrieve the input messages for a stored completion.
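These endpoints can be called with any HTTP client. A minimal sketch using only the standard library, assuming the same base URL, placeholder API key, and Bearer authentication as the examples earlier on this page (the helper names are our own):

```python
import json
import urllib.request

# Base URL and key reused from the examples above (placeholders).
BASE = "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

def completion_url(completion_id=None, messages=False):
    """Build a URL for the stored-completion endpoints in the table."""
    url = f"{BASE}/chat/completions"
    if completion_id is not None:
        url += f"/{completion_id}"
        if messages:
            url += "/messages"
    return url

def get_stored_completion(completion_id):
    """GET /v1/chat/completions/{id} and decode the JSON body."""
    req = urllib.request.Request(
        completion_url(completion_id),
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```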

Retention

Stored completions have tier-dependent retention periods. See Service Tiers for hot storage, cold storage, and total retention durations per tier. Completions are automatically purged after the retention period expires.