Server-Side Tools Reference
Complete reference for all built-in tools executed server-side during chat completions and responses.
Overview
The Xerotier router includes 11 built-in tools that execute server-side during chat completions and responses. When a request opts in to server-side tooling, the router injects tool definitions into the model request and intercepts any tool calls the model makes. The router executes each tool, feeds the results back to the model, and repeats this loop until the model produces a final content response that is streamed to the client.
The tools fall into four categories:
- Research -- tools for gathering information from external sources.
- Content -- tools for creating and updating persistent artifacts.
- Interaction -- tools for requesting input from the user mid-stream.
- Knowledge -- tools for saving, recalling, and searching stored data.
Tool Summary
| Tool | Category | Description |
|---|---|---|
| web_search | Research | Built-in web search |
| fetch_url | Research | Fetch and extract text from web pages and PDFs |
| code_search | Research | Search and browse code on GitHub |
| gitlab_code_search | Research | Search and browse code on GitLab |
| calculator | Research | Evaluate mathematical expressions safely |
| create_artifact | Content | Create persistent artifacts (code, docs, HTML, SVG, diagrams) |
| update_artifact | Content | Update a previously created artifact with new content |
| ask_user | Interaction | Ask the user a clarifying question with optional structured input |
| save_memory | Knowledge | Save facts, preferences, or instructions to per-chat memory |
| recall_memory | Knowledge | Search saved memories using semantic similarity |
| file_search | Knowledge | Search content of uploaded documents |
Enabling Server-Side Tools
Research tools (web_search, fetch_url, code_search, gitlab_code_search, calculator) must be explicitly opted into. Other tools are injected automatically based on context:
- create_artifact and update_artifact -- always injected in chat context.
- ask_user -- always injected in chat context.
- save_memory and recall_memory -- always injected in chat context.
- file_search -- injected when the chat has uploaded documents.
Chat Completions API
Add web_search_options to a standard
/v1/chat/completions request. Use the x_tools
array to select which research tools to enable:
{
"model": "my-model",
"messages": [
{"role": "user", "content": "Search for the latest Rust async patterns"}
],
"stream": true,
"web_search_options": {
"search_context_size": "medium",
"max_iterations": 5,
"x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
}
}
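The request above can be sent with any OpenAI-compatible HTTP client. A minimal payload-construction sketch in Python; the endpoint URL, API-key environment variable, and choice of HTTP library in the comment are assumptions, not part of the router's documented interface:

```python
def build_research_request(prompt: str, tools: list) -> dict:
    """Build a chat completion payload with server-side research tools enabled."""
    return {
        "model": "my-model",
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "web_search_options": {
            "search_context_size": "medium",
            "max_iterations": 5,
            "x_tools": tools,
        },
    }


payload = build_research_request(
    "Search for the latest Rust async patterns",
    ["web_search", "fetch_url", "code_search", "calculator"],
)

# Hypothetical endpoint and auth -- adjust to your deployment:
# import os, requests
# resp = requests.post(
#     "http://localhost:8080/v1/chat/completions",
#     headers={"Authorization": f"Bearer {os.environ['XEROTIER_API_KEY']}"},
#     json=payload,
#     stream=True,
# )
```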
Responses API
Include a web_search_preview tool in the
tools array:
{
"model": "my-model",
"input": "Find the latest research on transformer architectures",
"stream": true,
"tools": [
{
"type": "web_search_preview",
"search_context_size": "medium",
"x_tools": ["web_search", "fetch_url", "code_search"]
}
]
}
x_tools Selection
| Behavior | Details |
|---|---|
| Omitted or empty | Defaults to ["web_search", "fetch_url"]. |
| web_search included | fetch_url is auto-included for URL follow-up. |
| Invalid names | Silently ignored. Falls back to defaults if all are invalid. |
| code_search | Available when GitHub code search is enabled for your endpoint. |
| gitlab_code_search | Available when GitLab code search is enabled for your endpoint. |
| file_search | Available when file search is enabled. Searches uploaded documents in the chat context. |
| calculator | Always available. Evaluates mathematical expressions server-side. |
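The selection rules above can be sketched as a small normalization function. This is an illustrative model of the documented behavior, not the router's actual code; per-endpoint availability gating for the code-search and file-search tools is omitted:

```python
VALID_TOOLS = {"web_search", "fetch_url", "code_search",
               "gitlab_code_search", "calculator"}
DEFAULT_TOOLS = ["web_search", "fetch_url"]


def normalize_x_tools(requested=None):
    """Apply the x_tools selection rules: invalid names are silently
    ignored, an omitted/empty/all-invalid list falls back to the
    defaults, and fetch_url is auto-included alongside web_search."""
    selected = [t for t in (requested or []) if t in VALID_TOOLS]
    if not selected:
        return list(DEFAULT_TOOLS)
    if "web_search" in selected and "fetch_url" not in selected:
        selected.append("fetch_url")  # URL follow-up support
    return selected
```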
web_search
Searches the web using built-in web search. Returns
structured results with titles, URLs, and snippets. The top 2 URLs
are automatically fetched and appended as fetched_pages
for immediate context enrichment.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | The search query to look up on the web. |
Return Value
JSON object with the following fields:
- answer -- direct answer from the search engine, if available.
- abstract -- summary text, if available.
- results -- array of result objects, each with title, url, and snippet.
- fetched_pages -- array of auto-fetched page content (top 2 URLs), each with url and content.
Auto-Fetch Enrichment
After search results are returned, the router automatically fetches the top 2 URLs in parallel and appends the extracted text to the result. This means the model receives both search snippets and full page content in a single tool call, reducing round trips. Each auto-fetched page shares an equal portion of a 12,000-character budget.
Use Cases
- Current events and breaking news
- Fact checking and verification
- Product research and comparisons
- Documentation and API reference lookup
Example Tool Call
{
"name": "web_search",
"arguments": {
"query": "rust async trait stabilization 2026"
}
}
Example Response
{
"answer": "",
"abstract": "",
"results": [
{
"title": "Async Trait Methods Stabilized in Rust 1.85",
"url": "https://blog.rust-lang.org/2026/02/20/async-traits.html",
"snippet": "Rust 1.85 stabilizes async fn in traits, enabling..."
},
{
"title": "Understanding async traits in Rust",
"url": "https://docs.rs/async-trait/latest/guide",
"snippet": "A comprehensive guide to using async trait methods..."
}
],
"fetched_pages": [
{
"url": "https://blog.rust-lang.org/2026/02/20/async-traits.html",
"content": "Announcing Rust 1.85. We are happy to announce that async fn in traits..."
},
{
"url": "https://docs.rs/async-trait/latest/guide",
"content": "Async Trait Guide. This guide covers the fundamentals of async trait..."
}
]
}
SSE Event
data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"rust async trait stabilization 2026\"}"}
fetch_url
Fetches and extracts readable text content from web pages and PDF documents. Supports single-URL fetch, parallel multi-URL fetch, link discovery for site exploration, and automatic same-domain link following.
Parameters
| Parameter | Type | Description |
|---|---|---|
| url (required) | string | A single URL to fetch content from. Must be http or https. |
| urls (optional) | string[] | Multiple URLs to fetch in parallel (max 5). Prefer this over url; if both are provided, they are merged and deduplicated. |
| discover_links (optional) | boolean | Extract links from fetched pages for site exploration. Default: false. |
| max_follow (optional) | integer | Max discovered same-domain links to auto-follow (0-3, default 0). Only used with discover_links. |
Modes
- Single URL -- provide only url without discovery flags. Returns extracted plain text directly.
- Multi-URL -- provide the urls array for parallel fetching. Returns a JSON object with a pages array, each entry containing url, content, and error fields.
- Discovery -- set discover_links: true to extract links from fetched pages. Combine with max_follow to automatically fetch discovered same-domain links.
SSRF Protection
All URL fetches are protected against Server-Side Request Forgery (SSRF). Requests to private and internal IP ranges are blocked: 10.x.x.x, 172.16.x.x-172.31.x.x, 192.168.x.x, 127.x.x.x, 169.254.x.x, ::1, and fe80::/10.
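The blocked ranges map onto the standard address classifications, so the check can be sketched with Python's ipaddress module. This is a simplified model; the router also resolves hostnames to IPs before checking, which is omitted here:

```python
import ipaddress


def is_blocked_address(ip_text: str) -> bool:
    """Return True for private, loopback, link-local, or otherwise
    reserved addresses -- the ranges listed above all fall into these
    categories (10.x, 172.16-31.x, 192.168.x are private; 127.x and
    ::1 are loopback; 169.254.x.x and fe80::/10 are link-local)."""
    ip = ipaddress.ip_address(ip_text)
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
```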
Character Budget
Multi-URL fetches share a total budget of 24,000 characters, distributed equally across all successfully fetched pages. Pages exceeding their share are truncated.
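The budget arithmetic can be sketched as an equal-split truncation. The 24,000-character total comes from this section; the code itself is illustrative, not the router's implementation:

```python
def apportion_budget(pages, total_chars=24_000):
    """Give each successfully fetched page an equal share of the
    character budget and truncate pages that exceed their share."""
    if not pages:
        return []
    share = total_chars // len(pages)
    return [text[:share] for text in pages]
```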
Use Cases
- Reading documentation pages in full
- Extracting article content for summarization
- Crawling related pages on the same domain
- Fetching PDF text content
Example: Single Fetch
{
"name": "fetch_url",
"arguments": {
"url": "https://docs.example.com/api/authentication"
}
}
Example: Multi-Fetch with Discovery
{
"name": "fetch_url",
"arguments": {
"urls": [
"https://docs.example.com/guide/getting-started",
"https://docs.example.com/guide/configuration"
],
"discover_links": true,
"max_follow": 2
}
}
Example Multi-Fetch Response
{
"discover_links_enabled": true,
"total_pages": 4,
"pages": [
{
"url": "https://docs.example.com/guide/getting-started",
"content": "Getting Started. Install the SDK with npm install...",
"error": false,
"discovered_links": [
{"url": "https://docs.example.com/guide/authentication", "text": "Authentication"},
{"url": "https://docs.example.com/guide/advanced", "text": "Advanced Usage"}
]
},
{
"url": "https://docs.example.com/guide/configuration",
"content": "Configuration. Set environment variables to customize...",
"error": false,
"discovered_links": []
},
{
"url": "https://docs.example.com/guide/authentication",
"content": "Authentication. Use API keys or OAuth2 tokens...",
"error": false,
"followed_from_discovery": true
},
{
"url": "https://docs.example.com/guide/advanced",
"content": "Advanced Usage. Configure retry policies and timeouts...",
"error": false,
"followed_from_discovery": true
}
]
}
SSE Event
data: {"type":"x_research.reading","name":"fetch_url","arguments":"{\"url\":\"https://docs.example.com/api/authentication\"}"}
code_search
Searches public code on GitHub via the Code Search API. Operates in three distinct modes: Search, Review, and Browse. Code search is available when enabled for your endpoint.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query (optional) | string | Code search query. Supports GitHub qualifiers: repo:owner/name, org:name, path:dir/, filename:name, extension:ext. Use this OR repository, not both. |
| repository (optional) | string | GitHub repository to review or browse. Accepts owner/repo or a full URL (e.g. https://github.com/owner/repo). |
| path (optional) | string | Fetch a specific file or directory listing from the repository. Requires repository. |
| paths (optional) | string[] | Fetch multiple files or directories from the repository (max 10). Requires repository. |
| language (optional) | string | Filter by programming language (e.g. swift, python, javascript). Only used with query. |
| max_results (optional) | integer | Maximum results (1-30, default 5). Full file content is fetched for the top 3. Only used with query. |
Modes
- Search mode -- provide query to search code across GitHub. Returns code snippets with repository, path, URL, and full file content for the top 3 results.
- Review mode -- provide repository alone to get a full overview: metadata, languages, README content, directory tree, and recent commits.
- Browse mode -- provide repository with path or paths to fetch specific files or directory listings. Use review first to discover the structure, then browse to read specific files.
Use Cases
- Finding implementations of specific algorithms or patterns
- Reviewing open-source project architecture
- Comparing code patterns across repositories
- Discovering how libraries are used in practice
Example: Search Mode
{
"name": "code_search",
"arguments": {
"query": "async stream handler repo:apple/swift-nio",
"language": "swift",
"max_results": 5
}
}
Example: Review Mode
{
"name": "code_search",
"arguments": {
"repository": "vapor/vapor"
}
}
Example: Browse Mode
{
"name": "code_search",
"arguments": {
"repository": "apple/swift-nio",
"paths": ["Sources/NIOCore/Channel.swift", "Sources/NIOCore/EventLoop.swift"]
}
}
SSE Event
data: {"type":"x_research.code_searching","name":"code_search","arguments":"{\"query\":\"async stream handler\",\"language\":\"swift\"}"}
gitlab_code_search
Searches code on a GitLab instance. Identical in structure to
code_search but targets GitLab instead of GitHub. Code
search is available when enabled for your endpoint.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query (optional) | string | Code search query. Use specific function names, class names, error messages, or patterns. Use this OR repository, not both. |
| repository (optional) | string | GitLab project to review or browse. Accepts namespace/project or a full URL (e.g. https://gitlab.com/group/project). |
| path (optional) | string | Fetch a specific file or directory listing from the project. Requires repository. |
| paths (optional) | string[] | Fetch multiple files or directories from the project (max 10). Requires repository. |
| language (optional) | string | Filter by file extension (e.g. swift, python, javascript). Only used with query. |
| max_results (optional) | integer | Maximum results (1-20, default 5). Full file content is fetched for the top 3. Only used with query. |
Modes
The three modes are identical to code_search:
- Search mode -- provide query to find code across GitLab with snippets and full file content.
- Review mode -- provide repository alone for a full overview (metadata, languages, README, directory tree, recent commits).
- Browse mode -- provide repository with path or paths to fetch specific files or directory listings.
Use Cases
- Searching internal codebases hosted on GitLab
- Reviewing GitLab project structure and documentation
- Browsing specific files in private repositories
Example: Search Mode
{
"name": "gitlab_code_search",
"arguments": {
"query": "async websocket handler middleware",
"language": "python",
"max_results": 5
}
}
Example: Review Mode
{
"name": "gitlab_code_search",
"arguments": {
"repository": "https://gitlab.com/myorg/backend-service"
}
}
SSE Event
data: {"type":"x_research.code_searching","name":"gitlab_code_search","arguments":"{\"query\":\"async websocket handler\"}"}
calculator
Evaluates mathematical expressions server-side using a safe recursive-descent parser. No code execution -- only arithmetic, functions, and constants are supported.
Parameters
| Parameter | Type | Description |
|---|---|---|
| expression (required) | string | Mathematical expression to evaluate. Maximum 1,000 characters. |
Supported Operations
| Category | Supported |
|---|---|
| Operators | + - * / ^ |
| Functions | sqrt, log, ln, sin, cos, tan, abs, floor, ceil, round, min, max |
| Constants | pi, e |
Use Cases
- Unit conversions and dimensional analysis
- Financial calculations (compound interest, amortization)
- Scientific computation (trigonometry, logarithms)
- Quick arithmetic the model should not hallucinate
Example Tool Call
{
"name": "calculator",
"arguments": {
"expression": "10000 * (1 + 0.05)^3"
}
}
Example Response
{
"expression": "10000 * (1 + 0.05)^3",
"result": 11576.25
}
More Examples
| Expression | Result |
|---|---|
| sqrt(144) + 2^3 | 20.0 |
| sin(pi/2) | 1.0 |
| log(1000) | 3.0 |
| max(42, 17) * min(3, 5) | 126.0 |
| abs(-273.15) + ceil(2.1) | 276.15 |
SSE Event
data: {"type":"x_research.calculating","name":"calculator","arguments":"{\"expression\":\"10000 * (1 + 0.05)^3\"}"}
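To illustrate what "safe evaluation" means here, the sketch below evaluates the same grammar (the operators, functions, and constants tabled above) by walking a parsed AST with a whitelist, rather than calling eval. It is a Python illustration, not the router's actual recursive-descent parser:

```python
import ast
import math
import operator

# Whitelists mirroring the Supported Operations table above.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "log": math.log10, "ln": math.log,
          "sin": math.sin, "cos": math.cos, "tan": math.tan,
          "abs": abs, "floor": math.floor, "ceil": math.ceil,
          "round": round, "min": min, "max": max}
_CONSTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression without executing arbitrary code."""
    # The docs use ^ for exponentiation; Python's AST expects **.
    tree = ast.parse(expression.replace("^", "**"), mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id in _FUNCS:
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError("unsupported syntax")  # anything else is rejected

    return walk(tree)
```

Note that log maps to base-10 (matching log(1000) = 3.0 in the examples table) while ln is the natural logarithm.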
create_artifact
Creates a persistent artifact such as a code file, document, HTML page, SVG image, or Mermaid diagram. Artifacts are stored in the chat and can be viewed, downloaded, and updated. This tool is always injected in chat context -- no opt-in required.
Parameters
| Parameter | Type | Description |
|---|---|---|
| identifier (required) | string | Stable slug for this artifact. 1-128 characters, ASCII alphanumeric, hyphens, underscores, and dots only. Used to reference the artifact for later updates. |
| title (required) | string | Human-readable display title for the artifact. |
| type (required) | string | MIME content type for the artifact content. |
| language (optional) | string | Programming language for syntax highlighting (e.g. python, swift, javascript). Omit for non-code artifacts. |
| content (required) | string | The full content of the artifact. Maximum 1 MB. |
Supported MIME Types
| MIME Type | Use For |
|---|---|
| text/markdown | Markdown documents, reports, notes |
| text/html | HTML pages, interactive previews |
| text/plain | Plain text files, configuration |
| image/svg+xml | SVG vector graphics |
| text/x-mermaid | Mermaid diagram definitions |
| application/json | JSON data files |
| text/x-{language} | Code files (e.g. text/x-python, text/x-swift) |
Identifier Rules
- 1-128 characters long
- ASCII alphanumeric characters, hyphens, underscores, and dots only
- Slug format -- lowercase recommended (e.g. my-component.tsx, data-pipeline)
- Must be unique within the chat for creation; reuse the same identifier with update_artifact
Versioning
Creating an artifact sets it to version 1. Subsequent updates via
update_artifact increment the version number. All versions
are retained for history.
Use Cases
- Generating code files with syntax highlighting
- Creating documentation and technical reports
- Building interactive HTML previews
- Rendering architecture diagrams with Mermaid
Example: Code File
{
"name": "create_artifact",
"arguments": {
"identifier": "fibonacci.py",
"title": "Fibonacci Generator",
"type": "text/x-python",
"language": "python",
"content": "def fibonacci(n: int) -> list[int]:\n \"\"\"Generate the first n Fibonacci numbers.\"\"\"\n if n <= 0:\n return []\n if n == 1:\n return [0]\n seq = [0, 1]\n for _ in range(2, n):\n seq.append(seq[-1] + seq[-2])\n return seq\n\nif __name__ == \"__main__\":\n print(fibonacci(10))\n"
}
}
Example: Mermaid Diagram
{
"name": "create_artifact",
"arguments": {
"identifier": "auth-flow",
"title": "Authentication Flow",
"type": "text/x-mermaid",
"content": "sequenceDiagram\n participant U as User\n participant A as API Gateway\n participant S as Auth Service\n participant D as Database\n U->>A: POST /login\n A->>S: Validate credentials\n S->>D: Query user\n D-->>S: User record\n S-->>A: JWT token\n A-->>U: 200 OK + token\n"
}
}
SSE Event
data: {"type":"x_artifact.created","identifier":"fibonacci.py","title":"Fibonacci Generator","version":1}
update_artifact
Updates a previously created artifact with new content, creating a new version. The update is a full replacement -- the entire content is replaced, not a diff or patch. This tool is always injected in chat context.
Parameters
| Parameter | Type | Description |
|---|---|---|
| identifier (required) | string | The identifier of the artifact to update. Must match a previously created artifact in this chat. |
| content (required) | string | The complete updated content. Replaces the previous version entirely. |
Use Cases
- Iterating on code based on user feedback
- Refining documents and reports
- Updating diagrams with new components
- Fixing bugs in previously generated code
Example Tool Call
{
"name": "update_artifact",
"arguments": {
"identifier": "fibonacci.py",
"content": "from functools import lru_cache\n\n@lru_cache(maxsize=None)\ndef fibonacci(n: int) -> int:\n \"\"\"Return the nth Fibonacci number (memoized).\"\"\"\n if n < 2:\n return n\n return fibonacci(n - 1) + fibonacci(n - 2)\n\ndef fibonacci_sequence(count: int) -> list[int]:\n \"\"\"Generate the first count Fibonacci numbers.\"\"\"\n return [fibonacci(i) for i in range(count)]\n\nif __name__ == \"__main__\":\n print(fibonacci_sequence(10))\n"
}
}
SSE Event
data: {"type":"x_artifact.updated","identifier":"fibonacci.py","version":2}
ask_user
Pauses the response and asks the user a clarifying question. The model uses this when ambiguity prevents a useful response. Supports multiple interaction styles: free-text questions, multiple-choice options, confirmation prompts, rating scales, and structured form fields. This tool is always injected in chat context.
Parameters
| Parameter | Type | Description |
|---|---|---|
| question (required) | string | The clarifying question to ask the user. |
| options (optional) | string[] | List of suggested answers for the user to pick from. |
| allow_free_text (optional) | boolean | Whether the user can type a free-text answer in addition to picking an option. Default: true. |
| multi_select (optional) | boolean | Whether the user can select more than one option. Default: false. |
| style (optional) | string | Card style. One of: default, confirm, rating, form. Default: default. |
| fields (optional) | object[] | Form fields for form-style cards. Each field has label, key, and optional placeholder and required. |
Fields Sub-Parameters (for form style)
| Field | Type | Description |
|---|---|---|
| label (required) | string | Display label for the input field. |
| key (required) | string | Machine-readable key for the field value. |
| placeholder (optional) | string | Placeholder text shown in the input. |
| required (optional) | boolean | Whether the field must be filled. Default: false. |
Card Styles
Default Style
A question with optional multiple-choice options and free-text input.
{
"name": "ask_user",
"arguments": {
"question": "Which programming language would you like the example in?",
"options": ["Python", "JavaScript", "Go", "Rust"],
"allow_free_text": true
}
}
Confirm Style
A Yes/No confirmation prompt.
{
"name": "ask_user",
"arguments": {
"question": "This will overwrite the existing configuration. Proceed?",
"style": "confirm"
}
}
Rating Style
A 1-5 rating scale.
{
"name": "ask_user",
"arguments": {
"question": "How satisfied are you with this solution?",
"style": "rating"
}
}
Form Style
Labeled input fields for collecting structured data.
{
"name": "ask_user",
"arguments": {
"question": "Please provide the database connection details:",
"style": "form",
"fields": [
{"label": "Host", "key": "host", "placeholder": "localhost", "required": true},
{"label": "Port", "key": "port", "placeholder": "5432", "required": true},
{"label": "Database Name", "key": "db_name", "placeholder": "myapp_production", "required": true},
{"label": "Username", "key": "username", "placeholder": "admin"},
{"label": "Password", "key": "password", "placeholder": "********"}
]
}
}
Use Cases
- Disambiguating vague requirements before generating code
- Collecting structured configuration input
- Getting user confirmation before destructive operations
- Offering choices when multiple valid approaches exist
SSE Events
data: {"type":"x_ask_user.question","question":"Which programming language?","options":["Python","JavaScript","Go","Rust"],"correlation_id":"ask_abc123"}
data: {"type":"x_ask_user.pending_state","correlation_id":"ask_abc123","content":"..."}
save_memory
Saves a fact, preference, or instruction to the per-chat memory store. The content is embedded as a vector and persisted to the database for future semantic recall. Near-duplicate content is automatically deduplicated via embedding similarity comparison. This tool is always injected in chat context and requires a chat ID.
Parameters
| Parameter | Type | Description |
|---|---|---|
| content (required) | string | The fact, preference, or instruction to save to memory. |
| category (optional) | string | Category of the memory entry. One of: preference, fact, instruction. When omitted, the system infers the category from the content. |
Deduplication
When saving, the system checks for existing memories with very high semantic similarity. If a near-duplicate is found, the save is skipped and the existing memory is returned instead. This prevents the memory store from accumulating redundant entries.
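The dedup check can be sketched as a similarity comparison over stored embedding vectors. The 0.95 threshold and the in-memory store are illustrative assumptions; the docs specify only "very high semantic similarity":

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def save_with_dedup(store, content, embedding, threshold=0.95):
    """Skip the save when an existing memory is a near-duplicate,
    returning the existing entry instead. `store` is a list of
    (content, embedding) pairs standing in for the database."""
    for existing_content, existing_vec in store:
        if cosine_similarity(embedding, existing_vec) >= threshold:
            return {"status": "duplicate", "content": existing_content}
    store.append((content, embedding))
    return {"status": "saved", "content": content}
```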
Use Cases
- Storing user formatting preferences
- Remembering project details and deadlines
- Saving coding style instructions
- Persisting domain-specific knowledge for the conversation
Example Tool Call
{
"name": "save_memory",
"arguments": {
"content": "User prefers TypeScript with strict mode enabled for all code examples",
"category": "preference"
}
}
Example Response
{
"status": "saved",
"memory_id": "mem_x7k9p2",
"content": "User prefers TypeScript with strict mode enabled for all code examples",
"category": "preference"
}
For full details on memory architecture, passive injection, and the management API, see Chat Memory.
recall_memory
Searches the per-chat memory store using semantic similarity. Returns the top matching memories ranked by relevance score. This tool is always injected in chat context and requires a chat ID.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | Natural language query to search memories against. |
Return Value
Returns a JSON object with a memories array. Each memory
includes id, content, category,
relevance (0-1 similarity score), and
created_at.
Use Cases
- Retrieving user preferences before generating output
- Looking up previously mentioned facts and context
- Checking for saved instructions before starting a task
Example Tool Call
{
"name": "recall_memory",
"arguments": {
"query": "What programming language does the user prefer?"
}
}
Example Response
{
"memories": [
{
"id": "mem_x7k9p2",
"content": "User prefers TypeScript with strict mode enabled for all code examples",
"category": "preference",
"relevance": 0.94,
"created_at": "2026-03-09T14:22:00Z"
},
{
"id": "mem_r3m8q1",
"content": "Always use ESLint with the recommended ruleset",
"category": "instruction",
"relevance": 0.78,
"created_at": "2026-03-09T14:18:00Z"
}
]
}
For full details on memory architecture, passive injection, and the management API, see Chat Memory.
file_search
Searches the content of documents uploaded to the current chat session. Returns relevant text passages from PDFs, code files, and other uploaded documents. This tool is injected automatically when the chat has uploaded documents.
Parameters
| Parameter | Type | Description |
|---|---|---|
| query (required) | string | The search query to find relevant passages in uploaded documents. |
| max_results (optional) | integer | Maximum number of results to return. Default: 5. |
Return Value
Returns a JSON object with a results array. Each result
includes:
- filename -- the name of the uploaded file.
- chunk_text -- the matching text passage.
- relevance_score -- similarity score (0-1).
- artifact_id -- identifier of the uploaded document.
Use Cases
- Answering questions from uploaded PDFs and reports
- Finding code patterns in uploaded source files
- Cross-referencing information across multiple uploaded documents
Example Tool Call
{
"name": "file_search",
"arguments": {
"query": "authentication flow for API keys",
"max_results": 3
}
}
Example Response
{
"results": [
{
"filename": "api-design-spec.pdf",
"chunk_text": "API Key Authentication: All requests must include a valid API key in the Authorization header using the Bearer scheme. Keys are project-scoped and can be rotated via the dashboard...",
"relevance_score": 0.91,
"artifact_id": "doc_8f3k2m"
},
{
"filename": "security-review.md",
"chunk_text": "The authentication middleware validates API keys against the project_api_keys table. Rate limiting is applied per-key with configurable thresholds...",
"relevance_score": 0.83,
"artifact_id": "doc_p2n7x4"
}
]
}
For full details on document upload, processing pipeline, and management, see Document Workspace.
Agentic Loop
When server-side tools are enabled, Xerotier uses an agentic loop to iteratively call tools and feed results back to the model. This loop is the foundational mechanism that powers all server-side tool execution, including research workflows and Deep Think.
How the Loop Works
1. Model response -- The model generates a response that may include one or more tool calls.
2. Tool execution -- Xerotier executes the requested tools server-side (e.g., web_search, fetch_url, code_search).
3. Result injection -- Tool results are appended to the conversation as tool-role messages.
4. Follow-up -- The model generates another response based on the tool results. This response may include additional tool calls.
5. Termination -- Steps 2-4 repeat until the model produces a final text response (no tool calls) or the iteration limit is reached.
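The steps above can be sketched as a single function. Here model and tools are stand-in interfaces (a callable returning a dict with an optional tool_calls list, and a name-to-callable map), not the router's real types:

```python
def agentic_loop(model, tools, messages, max_iterations=5):
    """Run model/tool rounds until a final text response appears or
    the iteration limit is reached."""
    for _ in range(max_iterations):
        response = model(messages)
        tool_calls = response.get("tool_calls")
        if not tool_calls:
            return response  # final content, no tool calls: terminate
        for call in tool_calls:
            result = tools[call["name"]](**call["arguments"])
            # Results go back into the conversation as tool-role messages.
            messages.append({"role": "tool", "name": call["name"],
                             "content": result})
    # Limit reached: prompt for a final answer from what was gathered.
    messages.append({"role": "system",
                     "content": "Answer now using the tool results above."})
    return model(messages)
```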
Limits
| Limit | Value | Description |
|---|---|---|
| Maximum iterations per request | 5 | The loop runs at most 5 tool-call rounds before forcing a final response. |
| Per-tool execution timeout | 15 seconds | Each individual tool call must complete within 15 seconds or it is cancelled. |
| Tool call rate limit | 45 / minute | Maximum of 45 tool calls per minute per endpoint to prevent abuse. |
Automatic Behavior
The agentic loop runs automatically whenever tools such as
web_search, fetch_url,
code_search, or calculator are enabled
via web_search_options or the
web_search_preview tool type. No additional client-side
configuration is required -- the router handles all tool execution,
result marshalling, and follow-up model calls transparently.
If the model does not invoke any tools, the loop completes in a single iteration and the response is returned directly. When the maximum iteration limit is reached, the model is prompted to produce a final answer using the information gathered so far.
Note: The Deep Think feature extends this agentic loop with a multi-phase planning and synthesis layer on top. Each Deep Think sub-task runs its own agentic loop independently.
Deep Think
Deep Think performs extended, multi-phase autonomous research on a query. The system decomposes the question into sub-tasks, executes each independently through the agentic loop, and synthesizes all findings into a comprehensive report. During execution the SSE stream emits progress events so clients can display real-time status for each sub-task.
How It Works
- Planning -- The model decomposes the user query into 3-7 focused sub-tasks, each with a specific search question and tool set.
- Execution -- Each sub-task runs through the agentic loop sequentially. All research tools (web_search, fetch_url, code_search, gitlab_code_search, calculator) are available per sub-task.
- Synthesis -- All sub-task results are combined and the model produces a single comprehensive report streamed as normal SSE content chunks.
Enabling Deep Think
Add x_deep_think: true to the
web_search_options object. All research tools are
automatically enabled for deep think requests.
{
"model": "my-model",
"messages": [
{"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"}
],
"stream": true,
"web_search_options": {
"search_context_size": "medium",
"x_deep_think": true,
"x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
}
}
web_search_options Fields
| Field | Type | Default | Description |
|---|---|---|---|
| x_deep_think (optional) | boolean | false | When true, activates the multi-phase deep think pipeline instead of the single agentic loop. |
Deep Think Lifecycle
The SSE stream for a deep think request follows this sequence:
1. x_deep_think.plan_created -- plan ready, includes title + sub-task count
2. x_deep_think.subtask_started -- sub-task N begins (repeated per sub-task)
   - x_research.searching / x_research.reading -- tool-level events within the sub-task
3. x_deep_think.subtask_completed -- sub-task N finished
   ... (steps 2-3 repeat for each sub-task) ...
4. x_deep_think.synthesizing -- synthesis phase begins
   - data: {"choices":[...]} -- normal SSE content chunks (final report)
5. x_deep_think.completed -- pipeline finished, includes artifact info
6. data: [DONE] -- stream terminates
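A client can separate the progress events in this lifecycle from the report content by checking the type prefix on each data line. A transport-agnostic sketch that takes an iterable of raw SSE lines:

```python
import json


def parse_deep_think_stream(lines):
    """Split a deep think SSE stream into x_deep_think.* progress
    events and the assembled final-report text."""
    progress, chunks = [], []
    for line in lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # stream terminated
        event = json.loads(data)
        if event.get("type", "").startswith("x_deep_think."):
            progress.append(event)
        elif "choices" in event:  # normal content chunk
            delta = event["choices"][0].get("delta", {})
            chunks.append(delta.get("content") or "")
    return progress, "".join(chunks)
```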
Limits
| Limit | Value | Description |
|---|---|---|
| Max sub-tasks | 7 | Maximum sub-tasks the planner can create per request. |
| Sub-task iterations | 5 | Maximum agentic loop iterations per sub-task. |
| Sub-task token budget | 16,000 | Token budget per sub-task (input + output). |
Deep Think Events (SSE)
Deep think progress events are emitted as inline SSE data lines with a
type field prefixed by x_deep_think.. Clients
should check for this prefix and render progress UI accordingly.
| Event Type | Fields | Description |
|---|---|---|
| x_deep_think.plan_created | title, total_subtasks | Emitted after the planning phase. Contains the research title and number of sub-tasks. |
| x_deep_think.subtask_started | subtask_id, subtask_query, subtask_index, total_subtasks | Emitted when a sub-task begins execution. |
| x_deep_think.subtask_completed | subtask_id, subtask_index | Emitted when a sub-task finishes. |
| x_deep_think.synthesizing | message | Emitted when synthesis begins. Content chunks follow as normal SSE. |
| x_deep_think.completed | title, artifact_name | Emitted when the pipeline finishes. The report is auto-saved as an artifact. |
| x_deep_think.error | message | Emitted if a fatal error occurs during deep think. |
| x_deep_think.discovery_started | mode, message | Emitted when the discovery phase begins (target-focused mode only). |
| x_deep_think.discovery_completed | pages_fetched, site_map, message | Emitted when the discovery phase completes. Includes page count and site structure summary. |
| x_deep_think.artifact_created | artifact_type, artifact_title, message | Emitted when a structured artifact (table, matrix, findings list) is created from sub-task results. |
Example Event Stream
```
data: {"type":"x_deep_think.plan_created","title":"Quantum Computing Advances","total_subtasks":4}

data: {"type":"x_deep_think.subtask_started","subtask_id":"1","subtask_query":"Latest quantum error correction breakthroughs","subtask_index":0,"total_subtasks":4}

data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"quantum error correction 2026\"}"}

data: {"type":"x_deep_think.subtask_completed","subtask_id":"1","subtask_index":0}

data: {"type":"x_deep_think.subtask_started","subtask_id":"2","subtask_query":"Major quantum hardware milestones","subtask_index":1,"total_subtasks":4}

...

data: {"type":"x_deep_think.synthesizing","message":"Synthesizing final report..."}

data: {"choices":[{"index":0,"delta":{"content":"# Quantum Computing Advances\n\n"},"finish_reason":null}]}

...

data: {"type":"x_deep_think.completed","title":"Quantum Computing Advances","artifact_name":"deep-think-20260227-143012.md"}

data: [DONE]
```
SSE Events Reference
When streaming, server-side tool execution emits inline SSE events so clients can display progress indicators. All vendor-specific events carry the `x_` prefix, which keeps the stream compatible with the OpenAI spec: standard clients can safely ignore unknown event types.
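If you consume the raw SSE stream without an SDK, the dispatch rule is simple: a `data:` line whose JSON carries a `type` starting with `x_` is a vendor progress event, a line with `choices` is a normal content chunk, and `data: [DONE]` terminates the stream. A minimal sketch of that dispatch (the helper name and return shape are illustrative, not part of the API):

```python
import json

def classify_sse_line(line: str):
    """Classify one SSE data line: ('done' | 'event' | 'content' | None, payload)."""
    if not line.startswith("data: "):
        return (None, None)  # comments / keep-alive lines
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return ("done", None)
    obj = json.loads(payload)
    etype = obj.get("type", "")
    if etype.startswith("x_"):
        return ("event", etype)  # vendor progress event
    if "choices" in obj:
        delta = obj["choices"][0].get("delta") or {}
        return ("content", delta.get("content") or "")
    return (None, obj)

# Replay a couple of lines in the shape shown in the event tables in this section
for line in [
    'data: {"type":"x_research.searching","name":"web_search","arguments":"{}"}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    "data: [DONE]",
]:
    kind, value = classify_sse_line(line)
    print(kind, value)
```

The same prefix check covers every event family in the tables that follow (`x_research.`, `x_artifact.`, `x_ask_user.`, `x_deep_think.`, `x_chat.`).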
Research Events
| Event Type | Tool | Description |
|---|---|---|
| `x_research.searching` | `web_search` | Emitted when a web search begins execution. |
| `x_research.reading` | `fetch_url` | Emitted when a URL fetch begins execution. |
| `x_research.code_searching` | `code_search`, `gitlab_code_search` | Emitted when a code search begins execution. |
| `x_research.calculating` | `calculator` | Emitted when a calculator evaluation begins. |
| `x_research.tool_call` | Any | Emitted for tool invocations that do not match a specific research event type. Includes `tool_name`, `message`. |
| `x_research.result` | All research tools | Emitted when a tool call completes with its result. |
| `x_research.complete` | -- | Emitted when the entire research phase finishes. Includes `elapsed_ms`, `input_tokens`, `output_tokens`, `iterations`, `sources`. |
Artifact Events
| Event Type | Description |
|---|---|
| `x_artifact.created` | Emitted when a new artifact is created. Includes `identifier`, `title`, `version`. |
| `x_artifact.updated` | Emitted when an existing artifact is updated. Includes `identifier`, `version`. |
Ask User Events
| Event Type | Description |
|---|---|
| `x_ask_user.question` | Emitted when the model asks a clarifying question. Includes `question`, `options`, `correlation_id`. |
| `x_ask_user.pending_state` | Emitted with the assistant's partial content and state while awaiting the user's response. |
Deep Think Events
| Event Type | Description |
|---|---|
| `x_deep_think.plan_created` | Emitted after the planning phase. Includes `title`, `total_subtasks`. |
| `x_deep_think.subtask_started` | Emitted when a sub-task begins. Includes `subtask_id`, `subtask_query`, `subtask_index`, `total_subtasks`. |
| `x_deep_think.subtask_completed` | Emitted when a sub-task finishes. Includes `subtask_id`, `subtask_index`. |
| `x_deep_think.synthesizing` | Emitted when synthesis begins. Includes `message`. |
| `x_deep_think.completed` | Emitted when the pipeline finishes. Includes `title`, `artifact_name`. |
| `x_deep_think.error` | Emitted if a fatal error occurs. Includes `message`. |
| `x_deep_think.discovery_started` | Emitted when the discovery phase begins (target-focused mode). Includes `mode`, `message`. |
| `x_deep_think.discovery_completed` | Emitted when the discovery phase completes. Includes `pages_fetched`, `site_map`, `message`. |
| `x_deep_think.artifact_created` | Emitted when a structured artifact is created. Includes `artifact_type`, `artifact_title`, `message`. |
File Search Events (Responses API)
| Event Type | Description |
|---|---|
| `response.file_search_call.in_progress` | Emitted when a server-side file search invocation begins. Includes `item_id`, `output_index`. |
| `response.file_search_call.searching` | Emitted while the file search is actively executing. |
| `response.file_search_call.completed` | Emitted when the file search finishes. Includes the completed output item with results. |
Chat Metadata Events
| Event Type | Description |
|---|---|
| `x_chat.metadata` | Emitted at the end of a response stream with aggregated usage metadata, including research token counts. Includes `usage` (object with research token breakdown). |
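As one illustrative way to pick up the aggregated numbers, a client can watch for the final `x_chat.metadata` event and keep its `usage` object. The exact keys inside `usage` depend on your deployment, so this sketch just returns whatever breakdown is present rather than assuming field names:

```python
import json

def extract_usage(sse_payloads):
    """Return the usage object from the last x_chat.metadata event, if any."""
    usage = None
    for payload in sse_payloads:
        event = json.loads(payload)
        if event.get("type") == "x_chat.metadata":
            usage = event.get("usage")
    return usage

# Hypothetical final event; real field names come from your stream
final = '{"type":"x_chat.metadata","usage":{"input_tokens":12500,"output_tokens":850}}'
print(extract_usage([final]))
```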
Example SSE Stream
```
data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"rust async patterns 2026\"}"}

data: {"type":"x_research.result","name":"web_search","tool_call_id":"call_1"}

data: {"type":"x_research.reading","name":"fetch_url","arguments":"{\"url\":\"https://blog.rust-lang.org/...\"}"}

data: {"type":"x_research.result","name":"fetch_url","tool_call_id":"call_2"}

data: {"type":"x_research.calculating","name":"calculator","arguments":"{\"expression\":\"2^32\"}"}

data: {"type":"x_research.result","name":"calculator","tool_call_id":"call_3"}

data: {"type":"x_research.complete","elapsed_ms":4200,"input_tokens":12500,"output_tokens":850,"iterations":3,"sources":5}

data: {"choices":[{"index":0,"delta":{"content":"Based on my research..."},"finish_reason":null}]}

...

data: [DONE]
```
Rate Limiting & Caching
Per-Project Rate Limiting
Tool calls are rate-limited to 45 calls per minute per project. When the limit is exceeded, the tool call returns an error response instead of executing.
Rate Limit Error Response
```json
{
  "error": "Research tool rate limit exceeded. Try again in 12 seconds."
}
```
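Clients can treat this as a retryable condition. A minimal retry sketch, assuming the error text follows the "Try again in N seconds" wording shown above (the helper names are illustrative, and real code should parse the message defensively):

```python
import re
import time

def retry_delay_seconds(error_message: str, default: float = 60.0) -> float:
    """Parse 'Try again in N seconds' out of a rate-limit error message."""
    match = re.search(r"Try again in (\d+) seconds?", error_message)
    return float(match.group(1)) if match else default

def call_with_retry(invoke, max_attempts: int = 3):
    """invoke() returns a dict; a dict with an 'error' key means the call was rejected."""
    for attempt in range(max_attempts):
        result = invoke()
        if "error" not in result:
            return result
        if attempt < max_attempts - 1:
            time.sleep(retry_delay_seconds(result["error"]))
    return result

print(retry_delay_seconds("Research tool rate limit exceeded. Try again in 12 seconds."))
```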
Result Caching
Identical tool calls (same function name and arguments) within a 5-minute
window return cached results without re-executing. This prevents redundant
network calls when the model re-invokes the same search or fetch.
Auto-fetched URLs from `web_search` are also cached, so a subsequent explicit `fetch_url` call for the same URL is served from the cache rather than triggering another network request.
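The cache key is, in effect, the tool name plus its arguments. A sketch of how such a keyed TTL cache can work -- the key derivation here is illustrative, since the router's internal scheme is not specified:

```python
import hashlib
import json
import time

class ToolResultCache:
    """TTL cache keyed by (function name, canonicalized JSON arguments)."""

    def __init__(self, ttl_seconds: float = 300.0):  # 5-minute window
        self.ttl = ttl_seconds
        self._store = {}

    @staticmethod
    def _key(name: str, arguments: dict) -> str:
        # Sort keys so {"a":1,"b":2} and {"b":2,"a":1} hit the same entry
        canonical = json.dumps(arguments, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(f"{name}:{canonical}".encode()).hexdigest()

    def get(self, name: str, arguments: dict):
        entry = self._store.get(self._key(name, arguments))
        if entry and time.monotonic() - entry[0] < self.ttl:
            return entry[1]
        return None

    def put(self, name: str, arguments: dict, result) -> None:
        self._store[self._key(name, arguments)] = (time.monotonic(), result)

cache = ToolResultCache()
cache.put("web_search", {"query": "rust async"}, {"hits": 5})
print(cache.get("web_search", {"query": "rust async"}))  # same args -> cached result
print(cache.get("web_search", {"query": "go async"}))    # different args -> None
```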
Tool Limits
| Limit | Value | Description |
|---|---|---|
| Rate limit | 45/min | Maximum tool calls per minute per project. |
| Cache TTL | 5 minutes | How long identical tool results are cached. |
| Auto-fetch count | 2 | Number of top URLs auto-fetched from web_search results. |
| Max URLs per fetch | 5 | Maximum URLs per fetch_url call. |
| Max follow links | 3 | Maximum same-domain links to auto-follow in discovery mode. |
| Max total pages | 8 | Hard ceiling on total pages fetched per fetch_url call. |
| Character budget | 24,000 | Character budget for multi-URL fetch results, distributed across pages. |
Full API Examples
Chat Completions with Research Tools
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "What are the key differences between Rust and Go for building web servers?"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "x_tools": ["web_search", "fetch_url", "code_search"]
    }
  }'
```
Chat Completions with All Tools Enabled
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Compare the async runtimes in Rust and explain the tradeoffs"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "high",
      "max_iterations": 8,
      "x_tools": [
        "web_search", "fetch_url", "code_search",
        "gitlab_code_search", "calculator"
      ]
    }
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "user", "content": "Compare the async runtimes in Rust and explain the tradeoffs"}
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "high",
            "max_iterations": 8,
            "x_tools": [
                "web_search", "fetch_url", "code_search",
                "gitlab_code_search", "calculator"
            ]
        }
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    { role: "user", content: "Compare the async runtimes in Rust and explain the tradeoffs" }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "high",
    max_iterations: 8,
    x_tools: [
      "web_search", "fetch_url", "code_search",
      "gitlab_code_search", "calculator"
    ],
  },
});

for await (const chunk of stream) {
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```
Responses API with Web Search and File Search
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "input": "Compare the uploaded design spec with current best practices for REST API design",
    "stream": true,
    "tools": [
      {
        "type": "web_search_preview",
        "search_context_size": "medium",
        "x_tools": ["web_search", "fetch_url"]
      },
      {
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"]
      }
    ]
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.responses.create(
    model="my-model",
    input="Compare the uploaded design spec with current best practices for REST API design",
    stream=True,
    tools=[
        {
            "type": "web_search_preview",
            "search_context_size": "medium",
            "x_tools": ["web_search", "fetch_url"],
        },
        {
            "type": "file_search",
            "vector_store_ids": ["vs_abc123"],
        },
    ],
)

for event in response:
    if hasattr(event, "type"):
        print(event)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.responses.create({
  model: "my-model",
  input: "Compare the uploaded design spec with current best practices for REST API design",
  stream: true,
  tools: [
    {
      type: "web_search_preview",
      search_context_size: "medium",
      x_tools: ["web_search", "fetch_url"],
    },
    {
      type: "file_search",
      vector_store_ids: ["vs_abc123"],
    },
  ],
});

for await (const event of stream) {
  console.log(event);
}
```
Multi-Tool Conversation Flow
This example shows a typical multi-tool chain where the model searches for information, fetches a page for detail, and uses the calculator to verify a number -- all within a single agentic loop.
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "What is the current price of gold per ounce, and how much would 3.5 troy ounces cost in EUR at today'\''s exchange rate?"
      }
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "max_iterations": 6,
      "x_tools": ["web_search", "fetch_url", "calculator"]
    }
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# The model may chain: web_search -> fetch_url -> calculator,
# all server-side in a single request
stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "What is the current price of gold per ounce, "
                       "and how much would 3.5 troy ounces cost in EUR "
                       "at today's exchange rate?"
        }
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "medium",
            "max_iterations": 6,
            "x_tools": ["web_search", "fetch_url", "calculator"]
        }
    },
)

for chunk in stream:
    raw = chunk.model_dump()
    # Check for research progress events
    if "type" in raw and raw["type"].startswith("x_research."):
        event_type = raw["type"]
        if event_type == "x_research.searching":
            print(f"[Searching] {raw.get('arguments', '')}")
        elif event_type == "x_research.reading":
            print(f"[Reading] {raw.get('arguments', '')}")
        elif event_type == "x_research.calculating":
            print(f"[Calculating] {raw.get('arguments', '')}")
        elif event_type == "x_research.complete":
            print(f"[Research complete] {raw.get('iterations', 0)} iterations, "
                  f"{raw.get('elapsed_ms', 0)}ms")
        continue
    # Normal content chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// The model may chain: web_search -> fetch_url -> calculator,
// all server-side in a single request
const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    {
      role: "user",
      content: "What is the current price of gold per ounce, " +
        "and how much would 3.5 troy ounces cost in EUR " +
        "at today's exchange rate?",
    }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "medium",
    max_iterations: 6,
    x_tools: ["web_search", "fetch_url", "calculator"],
  },
});

for await (const chunk of stream) {
  const raw = chunk;
  // Check for research progress events
  if (raw.type && raw.type.startsWith("x_research.")) {
    if (raw.type === "x_research.searching") {
      console.log(`[Searching] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.reading") {
      console.log(`[Reading] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.calculating") {
      console.log(`[Calculating] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.complete") {
      console.log(`[Research complete] ${raw.iterations || 0} iterations, ${raw.elapsed_ms || 0}ms`);
    }
    continue;
  }
  // Normal content chunks
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```
Deep Think (curl)
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "x_deep_think": true,
      "x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
    }
  }'
```
Deep Think (Python)
```python
import json

from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "user", "content": "Comprehensive analysis of quantum computing advances in 2026"}
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "medium",
            "x_deep_think": True,
            "x_tools": ["web_search", "fetch_url", "code_search", "calculator"]
        }
    },
)

for chunk in stream:
    raw = chunk.model_dump()
    # Check for deep think progress events
    if "type" in raw and raw["type"].startswith("x_deep_think."):
        print(f"[{raw['type']}]", json.dumps(raw, indent=2))
        continue
    # Normal content chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
Deep Think (JavaScript)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    { role: "user", content: "Comprehensive analysis of quantum computing advances in 2026" }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "medium",
    x_deep_think: true,
    x_tools: ["web_search", "fetch_url", "code_search", "calculator"],
  },
});

for await (const chunk of stream) {
  const raw = chunk;
  // Check for deep think progress events
  if (raw.type && raw.type.startsWith("x_deep_think.")) {
    console.log(`[${raw.type}]`, JSON.stringify(raw, null, 2));
    continue;
  }
  // Normal content chunks
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```
Deep think requests use streaming and emit progress events inline. The final report is streamed as normal content chunks after all sub-tasks complete. The report is also auto-saved as an artifact in the chat interface.
Artifact Creation Flow
```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "Create a Python script that implements a binary search tree with insert, search, and in-order traversal methods."
      }
    ],
    "stream": true
  }'
```
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# In the chat context, create_artifact and update_artifact
# are always available. The model will use them when appropriate.
stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {
            "role": "user",
            "content": "Create a Python script that implements a "
                       "binary search tree with insert, search, and "
                       "in-order traversal methods."
        }
    ],
    stream=True,
)

for chunk in stream:
    raw = chunk.model_dump()
    # Check for artifact events
    if "type" in raw:
        if raw["type"] == "x_artifact.created":
            print(f"\n[Artifact created] {raw.get('identifier', '')} "
                  f"- {raw.get('title', '')} (v{raw.get('version', 1)})")
        elif raw["type"] == "x_artifact.updated":
            print(f"\n[Artifact updated] {raw.get('identifier', '')} "
                  f"(v{raw.get('version', '')})")
        continue
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

// In the chat context, create_artifact and update_artifact
// are always available. The model will use them when appropriate.
const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    {
      role: "user",
      content: "Create a Python script that implements a " +
        "binary search tree with insert, search, and " +
        "in-order traversal methods.",
    }
  ],
  stream: true,
});

for await (const chunk of stream) {
  const raw = chunk;
  // Check for artifact events
  if (raw.type) {
    if (raw.type === "x_artifact.created") {
      console.log(`\n[Artifact created] ${raw.identifier || ""} ` +
        `- ${raw.title || ""} (v${raw.version || 1})`);
    } else if (raw.type === "x_artifact.updated") {
      console.log(`\n[Artifact updated] ${raw.identifier || ""} ` +
        `(v${raw.version || ""})`);
    }
    continue;
  }
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```