Server-Side Tools Reference

Complete reference for all built-in tools executed server-side during chat completions and responses.

Overview

The Xerotier router includes 11 built-in tools that execute server-side during chat completions and responses. When a request opts in to server-side tooling, the router injects tool definitions into the model request and intercepts any tool calls the model makes. The router executes each tool, feeds the results back to the model, and repeats this loop until the model produces a final content response that is streamed to the client.

The tools fall into four categories:

  • Research -- tools for gathering information from external sources.
  • Content -- tools for creating and updating persistent artifacts.
  • Interaction -- tools for requesting input from the user mid-stream.
  • Knowledge -- tools for saving, recalling, and searching stored data.

Tool Summary

Tool | Category | Description
web_search | Research | Built-in web search
fetch_url | Research | Fetch and extract text from web pages and PDFs
code_search | Research | Search and browse code on GitHub
gitlab_code_search | Research | Search and browse code on GitLab
calculator | Research | Evaluate mathematical expressions safely
create_artifact | Content | Create persistent artifacts (code, docs, HTML, SVG, diagrams)
update_artifact | Content | Update a previously created artifact with new content
ask_user | Interaction | Ask the user a clarifying question with optional structured input
save_memory | Knowledge | Save facts, preferences, or instructions to per-chat memory
recall_memory | Knowledge | Search saved memories using semantic similarity
file_search | Knowledge | Search content of uploaded documents

Enabling Server-Side Tools

Research tools (web_search, fetch_url, code_search, gitlab_code_search, calculator) must be explicitly opted into. Other tools are injected automatically based on context:

  • create_artifact and update_artifact -- always injected in chat context.
  • ask_user -- always injected in chat context.
  • save_memory and recall_memory -- always injected in chat context.
  • file_search -- injected when the chat has uploaded documents.

Chat Completions API

Add web_search_options to a standard /v1/chat/completions request. Use the x_tools array to select which research tools to enable:

JSON
{ "model": "my-model", "messages": [ {"role": "user", "content": "Search for the latest Rust async patterns"} ], "stream": true, "web_search_options": { "search_context_size": "medium", "max_iterations": 5, "x_tools": ["web_search", "fetch_url", "code_search", "calculator"] } }

Responses API

Include a web_search_preview tool in the tools array:

JSON
{ "model": "my-model", "input": "Find the latest research on transformer architectures", "stream": true, "tools": [ { "type": "web_search_preview", "search_context_size": "medium", "x_tools": ["web_search", "fetch_url", "code_search"] } ] }

x_tools Selection

Behavior | Details
Omitted or empty | Defaults to ["web_search", "fetch_url"].
web_search included | fetch_url is auto-included for URL follow-up.
Invalid names | Silently ignored. Falls back to defaults if all are invalid.
code_search | Available when GitHub code search is enabled for your endpoint.
gitlab_code_search | Available when GitLab code search is enabled for your endpoint.
file_search | Available when file search is enabled. Searches uploaded documents in the chat context.
calculator | Always available. Evaluates mathematical expressions server-side.
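The selection rules above can be sketched in Python. This is an illustrative sketch only; the constant and function names are not part of the router API.

```python
# Hypothetical sketch of the x_tools selection rules; names are illustrative.
DEFAULT_TOOLS = ["web_search", "fetch_url"]
KNOWN_TOOLS = {"web_search", "fetch_url", "code_search",
               "gitlab_code_search", "calculator", "file_search"}

def resolve_x_tools(requested):
    # Omitted or empty -> defaults
    if not requested:
        return list(DEFAULT_TOOLS)
    # Invalid names are silently ignored
    valid = [t for t in requested if t in KNOWN_TOOLS]
    # All names invalid -> fall back to defaults
    if not valid:
        return list(DEFAULT_TOOLS)
    # web_search auto-includes fetch_url for URL follow-up
    if "web_search" in valid and "fetch_url" not in valid:
        valid.append("fetch_url")
    return valid
```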

fetch_url

Fetches and extracts readable text content from web pages and PDF documents. Supports single-URL fetch, parallel multi-URL fetch, link discovery for site exploration, and automatic same-domain link following.

Parameters

Parameter | Type | Description
url | string (required) | A single URL to fetch content from. Must be http or https.
urls | string[] (optional) | Multiple URLs to fetch in parallel (max 5). Prefer this over url; if both are provided, they are merged and deduplicated.
discover_links | boolean (optional) | Extract links from fetched pages for site exploration. Default: false.
max_follow | integer (optional) | Max discovered same-domain links to auto-follow (0-3, default 0). Only used with discover_links.

Modes

  • Single URL -- provide only url without discovery flags. Returns extracted plain text directly.
  • Multi-URL -- provide urls array for parallel fetching. Returns a JSON object with a pages array, each containing url, content, and error fields.
  • Discovery -- set discover_links: true to extract links from fetched pages. Combine with max_follow to automatically fetch discovered same-domain links.

SSRF Protection

All URL fetches are protected against Server-Side Request Forgery (SSRF). Requests to private and internal IP ranges are blocked: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 127.0.0.0/8, 169.254.0.0/16, ::1, and fe80::/10.
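A minimal sketch of this kind of guard, using Python's standard ipaddress module. The function name is hypothetical; the router's actual implementation is not shown here.

```python
# Illustrative SSRF guard: reject addresses in private, loopback, or
# link-local ranges before fetching. The name is_blocked_ip is hypothetical.
import ipaddress

def is_blocked_ip(ip_str: str) -> bool:
    ip = ipaddress.ip_address(ip_str)
    # is_private covers 10/8, 172.16/12, 192.168/16, 127/8, 169.254/16,
    # and ::1; is_link_local covers fe80::/10
    return ip.is_private or ip.is_loopback or ip.is_link_local
```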

Character Budget

Multi-URL fetches share a total budget of 24,000 characters, distributed equally across all successfully fetched pages. Pages exceeding their share are truncated.
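The equal-share split can be sketched as follows. This assumes a simple integer division of the budget; the router's exact rounding behavior is not documented.

```python
# Sketch of the equal-share truncation: the 24,000-character budget is
# divided evenly across successfully fetched pages, and any page longer
# than its share is truncated. Assumed behavior, not the actual code.
TOTAL_BUDGET = 24_000

def apportion(pages: list[str]) -> list[str]:
    if not pages:
        return []
    share = TOTAL_BUDGET // len(pages)
    return [p[:share] for p in pages]
```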

Use Cases

  • Reading documentation pages in full
  • Extracting article content for summarization
  • Crawling related pages on the same domain
  • Fetching PDF text content

Example: Single Fetch

JSON
{ "name": "fetch_url", "arguments": { "url": "https://docs.example.com/api/authentication" } }

Example: Multi-Fetch with Discovery

JSON
{ "name": "fetch_url", "arguments": { "urls": [ "https://docs.example.com/guide/getting-started", "https://docs.example.com/guide/configuration" ], "discover_links": true, "max_follow": 2 } }

Example Multi-Fetch Response

JSON
{ "discover_links_enabled": true, "total_pages": 4, "pages": [ { "url": "https://docs.example.com/guide/getting-started", "content": "Getting Started. Install the SDK with npm install...", "error": false, "discovered_links": [ {"url": "https://docs.example.com/guide/authentication", "text": "Authentication"}, {"url": "https://docs.example.com/guide/advanced", "text": "Advanced Usage"} ] }, { "url": "https://docs.example.com/guide/configuration", "content": "Configuration. Set environment variables to customize...", "error": false, "discovered_links": [] }, { "url": "https://docs.example.com/guide/authentication", "content": "Authentication. Use API keys or OAuth2 tokens...", "error": false, "followed_from_discovery": true }, { "url": "https://docs.example.com/guide/advanced", "content": "Advanced Usage. Configure retry policies and timeouts...", "error": false, "followed_from_discovery": true } ] }

SSE Event

SSE
data: {"type":"x_research.reading","name":"fetch_url","arguments":"{\"url\":\"https://docs.example.com/api/authentication\"}"}

calculator

Evaluates mathematical expressions server-side using a safe recursive-descent parser. No code execution -- only arithmetic, functions, and constants are supported.

Parameters

Parameter | Type | Description
expression | string (required) | Mathematical expression to evaluate. Maximum 1,000 characters.

Supported Operations

Category | Supported
Operators | + - * / ^
Functions | sqrt, log, ln, sin, cos, tan, abs, floor, ceil, round, min, max
Constants | pi, e
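For illustration, a whitelist-based safe evaluator covering these operations can be sketched with Python's ast module. Note the real tool uses its own recursive-descent parser, so this is an analogy under assumed semantics (log as base-10, ^ as exponentiation), not the actual implementation.

```python
# Illustrative whitelist evaluator: only the listed operators, functions,
# and constants are reachable; anything else raises ValueError.
import ast
import math
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv,
        ast.Pow: operator.pow, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "log": math.log10, "ln": math.log,
          "sin": math.sin, "cos": math.cos, "tan": math.tan,
          "abs": abs, "floor": math.floor, "ceil": math.ceil,
          "round": round, "min": min, "max": max}
_CONSTS = {"pi": math.pi, "e": math.e}

def safe_eval(expr: str) -> float:
    # In calculator syntax '^' means exponentiation
    tree = ast.parse(expr.replace("^", "**"), mode="eval")
    return _eval(tree.body)

def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name) and node.id in _CONSTS:
        return _CONSTS[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in _FUNCS and not node.keywords):
        return _FUNCS[node.func.id](*[_eval(a) for a in node.args])
    raise ValueError("unsupported expression element")
```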

Use Cases

  • Unit conversions and dimensional analysis
  • Financial calculations (compound interest, amortization)
  • Scientific computation (trigonometry, logarithms)
  • Quick arithmetic the model should not hallucinate

Example Tool Call

JSON
{ "name": "calculator", "arguments": { "expression": "10000 * (1 + 0.05)^3" } }

Example Response

JSON
{ "expression": "10000 * (1 + 0.05)^3", "result": 11576.25 }

More Examples

Expression | Result
sqrt(144) + 2^3 | 20.0
sin(pi/2) | 1.0
log(1000) | 3.0
max(42, 17) * min(3, 5) | 126.0
abs(-273.15) + ceil(2.1) | 276.15

SSE Event

SSE
data: {"type":"x_research.calculating","name":"calculator","arguments":"{\"expression\":\"10000 * (1 + 0.05)^3\"}"}

create_artifact

Creates a persistent artifact such as a code file, document, HTML page, SVG image, or Mermaid diagram. Artifacts are stored in the chat and can be viewed, downloaded, and updated. This tool is always injected in chat context -- no opt-in required.

Parameters

Parameter | Type | Description
identifier | string (required) | Stable slug for this artifact. 1-128 characters; ASCII alphanumeric, hyphens, underscores, and dots only. Used to reference the artifact for later updates.
title | string (required) | Human-readable display title for the artifact.
type | string (required) | MIME content type for the artifact content.
language | string (optional) | Programming language for syntax highlighting (e.g. python, swift, javascript). Omit for non-code artifacts.
content | string (required) | The full content of the artifact. Maximum 1 MB.

Supported MIME Types

MIME Type | Use For
text/markdown | Markdown documents, reports, notes
text/html | HTML pages, interactive previews
text/plain | Plain text files, configuration
image/svg+xml | SVG vector graphics
text/x-mermaid | Mermaid diagram definitions
application/json | JSON data files
text/x-{language} | Code files (e.g. text/x-python, text/x-swift)

Identifier Rules

  • 1-128 characters long
  • ASCII alphanumeric characters, hyphens, underscores, and dots only
  • Slug format -- lowercase recommended (e.g. my-component.tsx, data-pipeline)
  • Must be unique within the chat for creation; reuse the same identifier with update_artifact
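The rules above translate to a simple validation check. This is a sketch; the regex and function name are illustrative, not the router's actual code.

```python
# Illustrative identifier validation: 1-128 chars, ASCII alphanumerics
# plus hyphen, underscore, and dot.
import re

_IDENT_RE = re.compile(r"^[A-Za-z0-9._-]{1,128}$")

def is_valid_identifier(ident: str) -> bool:
    return bool(_IDENT_RE.fullmatch(ident))
```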

Versioning

Creating an artifact sets it to version 1. Subsequent updates via update_artifact increment the version number. All versions are retained for history.

Use Cases

  • Generating code files with syntax highlighting
  • Creating documentation and technical reports
  • Building interactive HTML previews
  • Rendering architecture diagrams with Mermaid

Example: Code File

JSON
{ "name": "create_artifact", "arguments": { "identifier": "fibonacci.py", "title": "Fibonacci Generator", "type": "text/x-python", "language": "python", "content": "def fibonacci(n: int) -> list[int]:\n \"\"\"Generate the first n Fibonacci numbers.\"\"\"\n if n <= 0:\n return []\n if n == 1:\n return [0]\n seq = [0, 1]\n for _ in range(2, n):\n seq.append(seq[-1] + seq[-2])\n return seq\n\nif __name__ == \"__main__\":\n print(fibonacci(10))\n" } }

Example: Mermaid Diagram

JSON
{ "name": "create_artifact", "arguments": { "identifier": "auth-flow", "title": "Authentication Flow", "type": "text/x-mermaid", "content": "sequenceDiagram\n participant U as User\n participant A as API Gateway\n participant S as Auth Service\n participant D as Database\n U->>A: POST /login\n A->>S: Validate credentials\n S->>D: Query user\n D-->>S: User record\n S-->>A: JWT token\n A-->>U: 200 OK + token\n" } }

SSE Event

SSE
data: {"type":"x_artifact.created","identifier":"fibonacci.py","title":"Fibonacci Generator","version":1}

update_artifact

Updates a previously created artifact with new content, creating a new version. The update is a full replacement -- the entire content is replaced, not a diff or patch. This tool is always injected in chat context.

Parameters

Parameter | Type | Description
identifier | string (required) | The identifier of the artifact to update. Must match a previously created artifact in this chat.
content | string (required) | The complete updated content. Replaces the previous version entirely.

Use Cases

  • Iterating on code based on user feedback
  • Refining documents and reports
  • Updating diagrams with new components
  • Fixing bugs in previously generated code

Example Tool Call

JSON
{ "name": "update_artifact", "arguments": { "identifier": "fibonacci.py", "content": "from functools import lru_cache\n\n@lru_cache(maxsize=None)\ndef fibonacci(n: int) -> int:\n \"\"\"Return the nth Fibonacci number (memoized).\"\"\"\n if n < 2:\n return n\n return fibonacci(n - 1) + fibonacci(n - 2)\n\ndef fibonacci_sequence(count: int) -> list[int]:\n \"\"\"Generate the first count Fibonacci numbers.\"\"\"\n return [fibonacci(i) for i in range(count)]\n\nif __name__ == \"__main__\":\n print(fibonacci_sequence(10))\n" } }

SSE Event

SSE
data: {"type":"x_artifact.updated","identifier":"fibonacci.py","version":2}

ask_user

Pauses the response and asks the user a clarifying question. The model uses this when ambiguity prevents a useful response. Supports multiple interaction styles: free-text questions, multiple-choice options, confirmation prompts, rating scales, and structured form fields. This tool is always injected in chat context.

Parameters

Parameter | Type | Description
question | string (required) | The clarifying question to ask the user.
options | string[] (optional) | List of suggested answers for the user to pick from.
allow_free_text | boolean (optional) | Whether the user can type a free-text answer in addition to picking an option. Default: true.
multi_select | boolean (optional) | Whether the user can select more than one option. Default: false.
style | string (optional) | Card style. One of: default, confirm, rating, form. Default: default.
fields | object[] (optional) | Form fields for form style cards. Each field has label, key, and optional placeholder and required.

Fields Sub-Parameters (for form style)

Field | Type | Description
label | string (required) | Display label for the input field.
key | string (required) | Machine-readable key for the field value.
placeholder | string (optional) | Placeholder text shown in the input.
required | boolean (optional) | Whether the field must be filled. Default: false.

Card Styles

Default Style

A question with optional multiple-choice options and free-text input.

JSON
{ "name": "ask_user", "arguments": { "question": "Which programming language would you like the example in?", "options": ["Python", "JavaScript", "Go", "Rust"], "allow_free_text": true } }

Confirm Style

A Yes/No confirmation prompt.

JSON
{ "name": "ask_user", "arguments": { "question": "This will overwrite the existing configuration. Proceed?", "style": "confirm" } }

Rating Style

A 1-5 rating scale.

JSON
{ "name": "ask_user", "arguments": { "question": "How satisfied are you with this solution?", "style": "rating" } }

Form Style

Labeled input fields for collecting structured data.

JSON
{ "name": "ask_user", "arguments": { "question": "Please provide the database connection details:", "style": "form", "fields": [ {"label": "Host", "key": "host", "placeholder": "localhost", "required": true}, {"label": "Port", "key": "port", "placeholder": "5432", "required": true}, {"label": "Database Name", "key": "db_name", "placeholder": "myapp_production", "required": true}, {"label": "Username", "key": "username", "placeholder": "admin"}, {"label": "Password", "key": "password", "placeholder": "********"} ] } }

Use Cases

  • Disambiguating vague requirements before generating code
  • Collecting structured configuration input
  • Getting user confirmation before destructive operations
  • Offering choices when multiple valid approaches exist

SSE Events

SSE
data: {"type":"x_ask_user.question","question":"Which programming language?","options":["Python","JavaScript","Go","Rust"],"correlation_id":"ask_abc123"} data: {"type":"x_ask_user.pending_state","correlation_id":"ask_abc123","content":"..."}

save_memory

Saves a fact, preference, or instruction to the per-chat memory store. The content is embedded as a vector and persisted to the database for future semantic recall. Near-duplicate content is automatically deduplicated via embedding similarity comparison. This tool is always injected in chat context and requires a chat ID.

Parameters

Parameter | Type | Description
content | string (required) | The fact, preference, or instruction to save to memory.
category | string (optional) | Category of the memory entry. One of: preference, fact, instruction. When omitted, the system infers the category from the content.

Deduplication

When saving, the system checks for existing memories with very high semantic similarity. If a near-duplicate is found, the save is skipped and the existing memory is returned instead. This prevents the memory store from accumulating redundant entries.
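This dedup check can be sketched with cosine similarity over embeddings. The 0.95 threshold and all names here are assumptions for illustration, not documented values.

```python
# Illustrative near-duplicate detection: if a stored memory's embedding is
# close enough to the new one, skip the save and return the existing id.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_near_duplicate(new_vec, existing, threshold=0.95):
    # existing maps memory_id -> embedding vector
    for mem_id, vec in existing.items():
        if cosine(new_vec, vec) >= threshold:
            return mem_id
    return None
```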

Use Cases

  • Storing user formatting preferences
  • Remembering project details and deadlines
  • Saving coding style instructions
  • Persisting domain-specific knowledge for the conversation

Example Tool Call

JSON
{ "name": "save_memory", "arguments": { "content": "User prefers TypeScript with strict mode enabled for all code examples", "category": "preference" } }

Example Response

JSON
{ "status": "saved", "memory_id": "mem_x7k9p2", "content": "User prefers TypeScript with strict mode enabled for all code examples", "category": "preference" }

For full details on memory architecture, passive injection, and the management API, see Chat Memory.

recall_memory

Searches the per-chat memory store using semantic similarity. Returns the top matching memories ranked by relevance score. This tool is always injected in chat context and requires a chat ID.

Parameters

Parameter | Type | Description
query | string (required) | Natural language query to search memories against.

Return Value

Returns a JSON object with a memories array. Each memory includes id, content, category, relevance (0-1 similarity score), and created_at.

Use Cases

  • Retrieving user preferences before generating output
  • Looking up previously mentioned facts and context
  • Checking for saved instructions before starting a task

Example Tool Call

JSON
{ "name": "recall_memory", "arguments": { "query": "What programming language does the user prefer?" } }

Example Response

JSON
{ "memories": [ { "id": "mem_x7k9p2", "content": "User prefers TypeScript with strict mode enabled for all code examples", "category": "preference", "relevance": 0.94, "created_at": "2026-03-09T14:22:00Z" }, { "id": "mem_r3m8q1", "content": "Always use ESLint with the recommended ruleset", "category": "instruction", "relevance": 0.78, "created_at": "2026-03-09T14:18:00Z" } ] }

For full details on memory architecture, passive injection, and the management API, see Chat Memory.

Agentic Loop

When server-side tools are enabled, Xerotier uses an agentic loop to iteratively call tools and feed results back to the model. This loop is the foundational mechanism that powers all server-side tool execution, including research workflows and Deep Think.

How the Loop Works

  1. Model response -- The model generates a response that may include one or more tool calls.
  2. Tool execution -- Xerotier executes the requested tools server-side (e.g., web_search, fetch_url, code_search).
  3. Result injection -- Tool results are appended to the conversation as tool-role messages.
  4. Follow-up -- The model generates another response based on the tool results. This response may include additional tool calls.
  5. Termination -- Steps 2-4 repeat until the model produces a final text response (no tool calls) or the iteration limit is reached.
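The steps above can be sketched as a loop. The helpers call_model and execute_tool are hypothetical stand-ins for the router's internals, not a real API.

```python
# Schematic agentic loop: call the model, execute any tool calls it makes,
# append results as tool-role messages, and repeat until a final answer
# or the iteration limit.
MAX_ITERATIONS = 5

def agentic_loop(messages, call_model, execute_tool):
    for _ in range(MAX_ITERATIONS):
        response = call_model(messages)                  # step 1
        tool_calls = response.get("tool_calls") or []
        if not tool_calls:                               # step 5: final answer
            return response["content"]
        for call in tool_calls:                          # steps 2-3
            result = execute_tool(call["name"], call["arguments"])
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": result})
    # Iteration limit reached: prompt for a final answer with what we have
    messages.append({"role": "user",
                     "content": "Answer now using the information gathered."})
    return call_model(messages)["content"]
```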

Limits

Limit | Value | Description
Maximum iterations per request | 5 | The loop runs at most 5 tool-call rounds before forcing a final response.
Per-tool execution timeout | 15 seconds | Each individual tool call must complete within 15 seconds or it is cancelled.
Tool call rate limit | 45 / minute | Maximum of 45 tool calls per minute per endpoint to prevent abuse.

Automatic Behavior

The agentic loop runs automatically whenever tools such as web_search, fetch_url, code_search, or calculator are enabled via web_search_options or the web_search_preview tool type. No additional client-side configuration is required -- the router handles all tool execution, result marshalling, and follow-up model calls transparently.

If the model does not invoke any tools, the loop completes in a single iteration and the response is returned directly. When the maximum iteration limit is reached, the model is prompted to produce a final answer using the information gathered so far.

Note: The Deep Think feature extends this agentic loop with a multi-phase planning and synthesis layer on top. Each Deep Think sub-task runs its own agentic loop independently.

Deep Think

Deep Think performs extended, multi-phase autonomous research on a query. The system decomposes the question into sub-tasks, executes each independently through the agentic loop, and synthesizes all findings into a comprehensive report. During execution the SSE stream emits progress events so clients can display real-time status for each sub-task.

How It Works

  1. Planning -- The model decomposes the user query into 3-7 focused sub-tasks, each with a specific search question and tool set.
  2. Execution -- Each sub-task runs through the agentic loop sequentially. All research tools (web_search, fetch_url, code_search, gitlab_code_search, calculator) are available per sub-task.
  3. Synthesis -- All sub-task results are combined and the model produces a single comprehensive report streamed as normal SSE content chunks.

Enabling Deep Think

Add x_deep_think: true to the web_search_options object. All research tools are automatically enabled for deep think requests.

JSON
{ "model": "my-model", "messages": [ {"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"} ], "stream": true, "web_search_options": { "search_context_size": "medium", "x_deep_think": true, "x_tools": ["web_search", "fetch_url", "code_search", "calculator"] } }

web_search_options Fields

Field | Type | Default | Description
x_deep_think | boolean (optional) | false | When true, activates the multi-phase deep think pipeline instead of the single agentic loop.

Deep Think Lifecycle

The SSE stream for a deep think request follows this sequence:

Sequence
1. x_deep_think.plan_created -- plan ready, includes title + sub-task count
2. x_deep_think.subtask_started -- sub-task N begins (repeated per sub-task)
   x_research.searching / x_research.reading -- tool-level events within sub-task
3. x_deep_think.subtask_completed -- sub-task N finished
   ... (repeat 2-3 for each sub-task) ...
4. x_deep_think.synthesizing -- synthesis phase begins
   data: {"choices":[...]} -- normal SSE content chunks (final report)
5. x_deep_think.completed -- pipeline finished, includes artifact info
6. data: [DONE] -- stream terminates

Limits

Limit | Value | Description
Max sub-tasks | 7 | Maximum sub-tasks the planner can create per request.
Sub-task iterations | 5 | Maximum agentic loop iterations per sub-task.
Sub-task token budget | 16,000 | Token budget per sub-task (input + output).

Deep Think Events (SSE)

Deep think progress events are emitted as inline SSE data lines with a type field prefixed by x_deep_think.. Clients should check for this prefix and render progress UI accordingly.

Event Type | Fields | Description
x_deep_think.plan_created | title, total_subtasks | Emitted after the planning phase. Contains the research title and number of sub-tasks.
x_deep_think.subtask_started | subtask_id, subtask_query, subtask_index, total_subtasks | Emitted when a sub-task begins execution.
x_deep_think.subtask_completed | subtask_id, subtask_index | Emitted when a sub-task finishes.
x_deep_think.synthesizing | message | Emitted when synthesis begins. Content chunks follow as normal SSE.
x_deep_think.completed | title, artifact_name | Emitted when the pipeline finishes. The report is auto-saved as an artifact.
x_deep_think.error | message | Emitted if a fatal error occurs during deep think.
x_deep_think.discovery_started | mode, message | Emitted when the discovery phase begins (target-focused mode only).
x_deep_think.discovery_completed | pages_fetched, site_map, message | Emitted when the discovery phase completes. Includes page count and site structure summary.
x_deep_think.artifact_created | artifact_type, artifact_title, message | Emitted when a structured artifact (table, matrix, findings list) is created from sub-task results.

Example Event Stream

SSE
data: {"type":"x_deep_think.plan_created","title":"Quantum Computing Advances","total_subtasks":4} data: {"type":"x_deep_think.subtask_started","subtask_id":"1","subtask_query":"Latest quantum error correction breakthroughs","subtask_index":0,"total_subtasks":4} data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"quantum error correction 2026\"}"} data: {"type":"x_deep_think.subtask_completed","subtask_id":"1","subtask_index":0} data: {"type":"x_deep_think.subtask_started","subtask_id":"2","subtask_query":"Major quantum hardware milestones","subtask_index":1,"total_subtasks":4} ... data: {"type":"x_deep_think.synthesizing","message":"Synthesizing final report..."} data: {"choices":[{"index":0,"delta":{"content":"# Quantum Computing Advances\n\n"},"finish_reason":null}]} ... data: {"type":"x_deep_think.completed","title":"Quantum Computing Advances","artifact_name":"deep-think-20260227-143012.md"} data: [DONE]

SSE Events Reference

When streaming, server-side tool execution emits inline SSE events so clients can display progress indicators. All vendor-specific events use the x_ prefix for OpenAI spec compliance.
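A client can route these events with a small dispatcher. This sketch assumes each SSE data line carries either [DONE] or a JSON object with an optional type field, as in the examples in this document; the function name is illustrative.

```python
# Illustrative SSE line classifier: distinguishes x_-prefixed vendor events
# from normal content chunks and the [DONE] terminator.
import json

def classify_sse_line(line: str) -> str:
    # Returns the vendor event type, "content", "done", or "ignore".
    if not line.startswith("data: "):
        return "ignore"
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return "done"
    obj = json.loads(payload)
    event_type = obj.get("type", "")
    if event_type.startswith("x_"):
        return event_type
    return "content"
```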

Research Events

Event Type | Tool | Description
x_research.searching | web_search | Emitted when a web search begins execution.
x_research.reading | fetch_url | Emitted when a URL fetch begins execution.
x_research.code_searching | code_search, gitlab_code_search | Emitted when a code search begins execution.
x_research.calculating | calculator | Emitted when a calculator evaluation begins.
x_research.tool_call | Any | Emitted for tool invocations that do not match a specific research event type. Includes tool_name, message.
x_research.result | All research tools | Emitted when a tool call completes with its result.
x_research.complete | -- | Emitted when the entire research phase finishes. Includes elapsed_ms, input_tokens, output_tokens, iterations, sources.

Artifact Events

Event Type | Description
x_artifact.created | Emitted when a new artifact is created. Includes identifier, title, version.
x_artifact.updated | Emitted when an existing artifact is updated. Includes identifier, version.

Ask User Events

Event Type | Description
x_ask_user.question | Emitted when the model asks a clarifying question. Includes question, options, correlation_id.
x_ask_user.pending_state | Emitted with the assistant's partial content and state while awaiting user response.

Deep Think Events

Event Type | Description
x_deep_think.plan_created | Emitted after the planning phase. Includes title, total_subtasks.
x_deep_think.subtask_started | Emitted when a sub-task begins. Includes subtask_id, subtask_query, subtask_index, total_subtasks.
x_deep_think.subtask_completed | Emitted when a sub-task finishes. Includes subtask_id, subtask_index.
x_deep_think.synthesizing | Emitted when synthesis begins. Includes message.
x_deep_think.completed | Emitted when the pipeline finishes. Includes title, artifact_name.
x_deep_think.error | Emitted if a fatal error occurs. Includes message.
x_deep_think.discovery_started | Emitted when discovery phase begins (target-focused mode). Includes mode, message.
x_deep_think.discovery_completed | Emitted when discovery phase completes. Includes pages_fetched, site_map, message.
x_deep_think.artifact_created | Emitted when a structured artifact is created. Includes artifact_type, artifact_title, message.

File Search Events (Responses API)

Event Type | Description
response.file_search_call.in_progress | Emitted when a server-side file search invocation begins. Includes item_id, output_index.
response.file_search_call.searching | Emitted while the file search is actively executing.
response.file_search_call.completed | Emitted when the file search finishes. Includes the completed output item with results.

Chat Metadata Events

Event Type | Description
x_chat.metadata | Emitted at the end of a response stream with aggregated usage metadata, including research token counts. Includes usage (object with research token breakdown).

Example SSE Stream

SSE
data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"rust async patterns 2026\"}"} data: {"type":"x_research.result","name":"web_search","tool_call_id":"call_1"} data: {"type":"x_research.reading","name":"fetch_url","arguments":"{\"url\":\"https://blog.rust-lang.org/...\"}"} data: {"type":"x_research.result","name":"fetch_url","tool_call_id":"call_2"} data: {"type":"x_research.calculating","name":"calculator","arguments":"{\"expression\":\"2^32\"}"} data: {"type":"x_research.result","name":"calculator","tool_call_id":"call_3"} data: {"type":"x_research.complete","elapsed_ms":4200,"input_tokens":12500,"output_tokens":850,"iterations":3,"sources":5} data: {"choices":[{"index":0,"delta":{"content":"Based on my research..."},"finish_reason":null}]} ... data: [DONE]

Rate Limiting & Caching

Per-Project Rate Limiting

Tool calls are rate limited to 45 calls per minute per project. When the limit is exceeded, the tool returns an error response instead of executing.

Rate Limit Error Response

JSON
{ "error": "Research tool rate limit exceeded. Try again in 12 seconds." }

Result Caching

Identical tool calls (same function name and arguments) within a 5-minute window return cached results without re-executing. This prevents redundant network calls when the model re-invokes the same search or fetch. Auto-fetched URLs from web_search are also cached, so subsequent explicit fetch_url calls for the same URL are free.
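A cache key built from the tool name plus canonicalized arguments makes logically identical calls hit the same entry. This sketch assumes JSON-serializable arguments; the helper names and cache layout are illustrative.

```python
# Illustrative result cache: key on tool name + sorted-key JSON of the
# arguments, and honor a 5-minute TTL on lookups.
import hashlib
import json

CACHE_TTL = 300.0  # seconds (5 minutes)

def cache_key(name: str, arguments: dict) -> str:
    # Sort keys so logically identical calls hash identically.
    canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{name}:{canonical}".encode()).hexdigest()

def lookup(cache: dict, key: str, now: float):
    entry = cache.get(key)
    if entry and now - entry["at"] < CACHE_TTL:
        return entry["result"]
    return None
```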

Tool Limits

Limit | Value | Description
Rate limit | 45/min | Maximum tool calls per minute per project.
Cache TTL | 5 minutes | How long identical tool results are cached.
Auto-fetch count | 2 | Number of top URLs auto-fetched from web_search results.
Max URLs per fetch | 5 | Maximum URLs per fetch_url call.
Max follow links | 3 | Maximum same-domain links to auto-follow in discovery mode.
Max total pages | 8 | Hard ceiling on total pages fetched per fetch_url call.
Character budget | 24,000 | Character budget for multi-URL fetch results, distributed across pages.

Full API Examples

Chat Completions with Research Tools

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "What are the key differences between Rust and Go for building web servers?"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "x_tools": ["web_search", "fetch_url", "code_search"]
    }
  }'

Chat Completions with All Tools Enabled

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Compare the async runtimes in Rust and explain the tradeoffs"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "high",
      "max_iterations": 8,
      "x_tools": [
        "web_search",
        "fetch_url",
        "code_search",
        "gitlab_code_search",
        "calculator"
      ]
    }
  }'
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "user", "content": "Compare the async runtimes in Rust and explain the tradeoffs"}
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "high",
            "max_iterations": 8,
            "x_tools": [
                "web_search",
                "fetch_url",
                "code_search",
                "gitlab_code_search",
                "calculator"
            ]
        }
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
Node.js (OpenAI SDK)
import OpenAI from "openai"; const client = new OpenAI({ baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1", apiKey: "xero_myproject_your_api_key", }); const stream = await client.chat.completions.create({ model: "my-model", messages: [ { role: "user", content: "Compare the async runtimes in Rust and explain the tradeoffs" } ], stream: true, web_search_options: { search_context_size: "high", max_iterations: 8, x_tools: [ "web_search", "fetch_url", "code_search", "gitlab_code_search", "calculator" ], }, }); for await (const chunk of stream) { const content = chunk.choices?.[0]?.delta?.content; if (content) process.stdout.write(content); } console.log();

Responses API with Web Search and File Search

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "input": "Compare the uploaded design spec with current best practices for REST API design",
    "stream": true,
    "tools": [
      {
        "type": "web_search_preview",
        "search_context_size": "medium",
        "x_tools": ["web_search", "fetch_url"]
      },
      {
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"]
      }
    ]
  }'
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.responses.create(
    model="my-model",
    input="Compare the uploaded design spec with current best practices for REST API design",
    stream=True,
    tools=[
        {
            "type": "web_search_preview",
            "search_context_size": "medium",
            "x_tools": ["web_search", "fetch_url"],
        },
        {
            "type": "file_search",
            "vector_store_ids": ["vs_abc123"],
        },
    ],
)

for event in response:
    if hasattr(event, "type"):
        print(event)
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.responses.create({
  model: "my-model",
  input: "Compare the uploaded design spec with current best practices for REST API design",
  stream: true,
  tools: [
    {
      type: "web_search_preview",
      search_context_size: "medium",
      x_tools: ["web_search", "fetch_url"],
    },
    {
      type: "file_search",
      vector_store_ids: ["vs_abc123"],
    },
  ],
});

for await (const event of stream) {
  console.log(event);
}
```

Multi-Tool Conversation Flow

This example shows a typical multi-tool chain where the model searches for information, fetches a page for detail, and uses the calculator to verify a number -- all within a single agentic loop.

curl
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "What is the current price of gold per ounce, and how much would 3.5 troy ounces cost in EUR at today'\''s exchange rate?"
      }
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "max_iterations": 6,
      "x_tools": ["web_search", "fetch_url", "calculator"]
    }
  }'
```
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# The model may chain: web_search -> fetch_url -> calculator,
# all server-side in a single request
stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "What is the current price of gold per ounce, "
                       "and how much would 3.5 troy ounces cost in EUR "
                       "at today's exchange rate?"
        }
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "medium",
            "max_iterations": 6,
            "x_tools": ["web_search", "fetch_url", "calculator"]
        }
    },
)

for chunk in stream:
    raw = chunk.model_dump()

    # Check for research progress events
    if "type" in raw and raw["type"].startswith("x_research."):
        event_type = raw["type"]
        if event_type == "x_research.searching":
            print(f"[Searching] {raw.get('arguments', '')}")
        elif event_type == "x_research.reading":
            print(f"[Reading] {raw.get('arguments', '')}")
        elif event_type == "x_research.calculating":
            print(f"[Calculating] {raw.get('arguments', '')}")
        elif event_type == "x_research.complete":
            print(f"[Research complete] {raw.get('iterations', 0)} iterations, "
                  f"{raw.get('elapsed_ms', 0)}ms")
        continue

    # Normal content chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// The model may chain: web_search -> fetch_url -> calculator,
// all server-side in a single request
const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    {
      role: "user",
      content:
        "What is the current price of gold per ounce, " +
        "and how much would 3.5 troy ounces cost in EUR " +
        "at today's exchange rate?",
    }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "medium",
    max_iterations: 6,
    x_tools: ["web_search", "fetch_url", "calculator"],
  },
});

for await (const chunk of stream) {
  const raw = chunk;

  // Check for research progress events
  if (raw.type && raw.type.startsWith("x_research.")) {
    if (raw.type === "x_research.searching") {
      console.log(`[Searching] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.reading") {
      console.log(`[Reading] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.calculating") {
      console.log(`[Calculating] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.complete") {
      console.log(`[Research complete] ${raw.iterations || 0} iterations, ${raw.elapsed_ms || 0}ms`);
    }
    continue;
  }

  // Normal content chunks
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```

Deep Think

curl
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "x_deep_think": true,
      "x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
    }
  }'
```

Python (OpenAI SDK)
```python
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"}
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "medium",
            "x_deep_think": True,
            "x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
        }
    },
)

for chunk in stream:
    raw = chunk.model_dump()

    # Check for deep think progress events
    if "type" in raw and raw["type"].startswith("x_deep_think."):
        print(f"[{raw['type']}]", json.dumps(raw, indent=2))
        continue

    # Normal content chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    { role: "user", content: "Comprehensive analysis of quantum computing advances in 2026" }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "medium",
    x_deep_think: true,
    x_tools: ["web_search", "fetch_url", "code_search", "calculator"],
  },
});

for await (const chunk of stream) {
  const raw = chunk;

  // Check for deep think progress events
  if (raw.type && raw.type.startsWith("x_deep_think.")) {
    console.log(`[${raw.type}]`, JSON.stringify(raw, null, 2));
    continue;
  }

  // Normal content chunks
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```

Deep think requests use streaming and emit progress events inline. The final report is streamed as normal content chunks after all sub-tasks complete. The report is also auto-saved as an artifact in the chat interface.
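The examples above repeat the same dispatch logic: route any `x_`-prefixed extension event (`x_research.*`, `x_deep_think.*`, `x_artifact.*`) to a handler and treat everything else as normal content. A minimal sketch of that pattern factored into a reusable generator; `iter_content` and `on_event` are illustrative names, not part of the OpenAI SDK:

```python
def iter_content(stream, on_event=print):
    """Yield normal content text from a chat completion stream,
    passing Xerotier extension events (type starts with "x_") to on_event."""
    for chunk in stream:
        raw = chunk.model_dump()
        event_type = raw.get("type") or ""
        if event_type.startswith("x_"):
            # Progress/artifact event injected by the router, not model content
            on_event(raw)
            continue
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content
```

With this helper the consuming loop reduces to `for text in iter_content(stream): print(text, end="")`, and the same callback works for research, deep think, and artifact events alike.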

Artifact Creation Flow

curl
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "Create a Python script that implements a binary search tree with insert, search, and in-order traversal methods."
      }
    ],
    "stream": true
  }'
```
Python (OpenAI SDK)
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# In the chat context, create_artifact and update_artifact
# are always available. The model will use them when appropriate.
stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "Create a Python script that implements a "
                       "binary search tree with insert, search, and "
                       "in-order traversal methods."
        }
    ],
    stream=True,
)

for chunk in stream:
    raw = chunk.model_dump()

    # Check for artifact events
    if "type" in raw and raw["type"].startswith("x_artifact."):
        if raw["type"] == "x_artifact.created":
            print(f"\n[Artifact created] {raw.get('identifier', '')} "
                  f"- {raw.get('title', '')} (v{raw.get('version', 1)})")
        elif raw["type"] == "x_artifact.updated":
            print(f"\n[Artifact updated] {raw.get('identifier', '')} "
                  f"(v{raw.get('version', '')})")
        continue

    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
Node.js (OpenAI SDK)
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// In the chat context, create_artifact and update_artifact
// are always available. The model will use them when appropriate.
const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    {
      role: "user",
      content:
        "Create a Python script that implements a " +
        "binary search tree with insert, search, and " +
        "in-order traversal methods.",
    }
  ],
  stream: true,
});

for await (const chunk of stream) {
  const raw = chunk;

  // Check for artifact events
  if (raw.type && raw.type.startsWith("x_artifact.")) {
    if (raw.type === "x_artifact.created") {
      console.log(`\n[Artifact created] ${raw.identifier || ""} ` +
        `- ${raw.title || ""} (v${raw.version || 1})`);
    } else if (raw.type === "x_artifact.updated") {
      console.log(`\n[Artifact updated] ${raw.identifier || ""} ` +
        `(v${raw.version || ""})`);
    }
    continue;
  }

  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```