Web Search

Built-in server-side web search for chat completions and responses. The model searches the web, fetches top results, and synthesizes answers -- all within a single API request.

Overview

The web_search server-side tool searches the web using built-in search and returns structured results including titles, URLs, and snippets. The top 2 result URLs are automatically fetched in parallel and appended as fetched_pages, providing immediate full-page context enrichment without a second tool call.

Web search runs server-side inside the agentic loop. When the model decides to search, the router executes the tool, injects the results back into the conversation, and the model continues generating. Clients receive a single streaming response without managing the tool loop themselves.

When to Use Web Search

  • Current events and breaking news that the model may not know
  • Fact checking and real-time data verification
  • Product research, pricing, and comparisons
  • Documentation and API reference lookup
  • Any query that benefits from up-to-date information

Note: Web search is a research tool and requires explicit opt-in. See Enabling Web Search below. Other tools such as create_artifact and ask_user are injected automatically. See Server-Side Tools for the full tool reference.

Enabling Web Search

Web search must be opted into on each request. There are two ways to enable it depending on which API you are using.

Chat Completions API

Add a web_search_options object to your /v1/chat/completions request. Set x_tools to include "web_search":

```json
{
  "model": "my-model",
  "messages": [
    {"role": "user", "content": "What are the latest Rust async patterns?"}
  ],
  "stream": true,
  "web_search_options": {
    "search_context_size": "medium",
    "x_tools": ["web_search", "fetch_url"]
  }
}
```

Responses API

Include a tool object with "type": "web_search_preview" in the tools array:

```json
{
  "model": "my-model",
  "input": "Find the latest research on transformer architectures",
  "stream": true,
  "tools": [
    {
      "type": "web_search_preview",
      "search_context_size": "medium",
      "x_tools": ["web_search", "fetch_url"]
    }
  ]
}
```

web_search_options Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `search_context_size` | optional string | `"medium"` | Controls how much search context is used. One of: `low`, `medium`, `high`. |
| `max_iterations` | optional integer | `5` | Maximum agentic loop iterations. Range 1-10. |
| `x_tools` | optional string[] | `["web_search", "fetch_url"]` | Which research tools to enable. When omitted or empty, defaults to `["web_search", "fetch_url"]`. Including `web_search` auto-includes `fetch_url`. |
| `x_deep_think` | optional boolean | `false` | Enables the multi-phase Deep Think pipeline. See the Deep Think section. |
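Taken together, a fully specified options object might look like the sketch below. The values are purely illustrative, not recommendations:

```python
# Illustrative web_search_options payload exercising every documented field.
# The specific values chosen here are examples only.
web_search_options = {
    "search_context_size": "high",           # one of: low, medium, high
    "max_iterations": 8,                     # must fall in the 1-10 range
    "x_tools": ["web_search", "fetch_url"],  # web_search auto-includes fetch_url
    "x_deep_think": False,                   # opt into the Deep Think pipeline
}
```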

web_search Tool Parameters

When the model decides to invoke web search, it generates a tool call with the following argument:

| Parameter | Type | Description |
|---|---|---|
| `query` | required string | The search query to look up on the web. The model formulates this from the user message context. |

Example Tool Call (generated by model)

```json
{
  "name": "web_search",
  "arguments": {
    "query": "rust async trait stabilization 2026"
  }
}
```

Response Format

The tool returns a JSON object with the following fields. The router injects this as a tool-role message back to the model.

| Field | Type | Description |
|---|---|---|
| `answer` | string | Direct answer from the search engine, if available. May be empty. |
| `abstract` | string | Summary text from the search engine, if available. May be empty. |
| `results` | object[] | Array of search result objects, each with `title`, `url`, and `snippet`. |
| `fetched_pages` | object[] | Array of auto-fetched page content for the top 2 URLs. Each entry has `url` and `content`. |
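A minimal sketch of walking a result object with these fields (the sample values here are invented for illustration):

```python
import json

# A tool result shaped like the fields described above (illustrative values).
tool_result = json.loads("""
{
  "answer": "",
  "abstract": "",
  "results": [
    {"title": "Example", "url": "https://example.com", "snippet": "..."}
  ],
  "fetched_pages": [
    {"url": "https://example.com", "content": "Full page text..."}
  ]
}
""")

# results give breadth via snippets; fetched_pages give depth for the top URLs.
for r in tool_result["results"]:
    print(f"{r['title']} ({r['url']}): {r['snippet']}")
for page in tool_result["fetched_pages"]:
    print(f"fetched {page['url']}: {len(page['content'])} chars")
```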

Example Response

```json
{
  "answer": "",
  "abstract": "",
  "results": [
    {
      "title": "Async Trait Methods Stabilized in Rust 1.85",
      "url": "https://blog.rust-lang.org/2026/02/20/async-traits.html",
      "snippet": "Rust 1.85 stabilizes async fn in traits, enabling..."
    },
    {
      "title": "Understanding async traits in Rust",
      "url": "https://docs.rs/async-trait/latest/guide",
      "snippet": "A comprehensive guide to using async trait methods..."
    }
  ],
  "fetched_pages": [
    {
      "url": "https://blog.rust-lang.org/2026/02/20/async-traits.html",
      "content": "Announcing Rust 1.85. We are happy to announce that async fn in traits..."
    },
    {
      "url": "https://docs.rs/async-trait/latest/guide",
      "content": "Async Trait Guide. This guide covers the fundamentals of async trait..."
    }
  ]
}
```

Auto-Fetch Enrichment

After returning search results, the router automatically fetches the top 2 result URLs in parallel and appends the extracted text as fetched_pages. This gives the model both the search snippets and full page content in a single tool call, reducing the number of loop iterations needed.

Each auto-fetched page shares an equal portion of a 12,000-character budget. Pages exceeding their share are truncated.
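The equal-share truncation described above can be sketched as follows. This illustrates the stated behavior only; it is not the router's actual implementation:

```python
AUTO_FETCH_BUDGET = 12_000  # total characters shared across auto-fetched pages

def truncate_pages(pages: list[dict]) -> list[dict]:
    """Give each auto-fetched page an equal share of the character budget."""
    if not pages:
        return []
    share = AUTO_FETCH_BUDGET // len(pages)
    return [
        {"url": p["url"], "content": p["content"][:share]}
        for p in pages
    ]

# With the default of 2 auto-fetched pages, each page's share is 6,000 chars.
pages = [{"url": "https://a.example", "content": "x" * 10_000},
         {"url": "https://b.example", "content": "y" * 2_000}]
trimmed = truncate_pages(pages)
```

Pages shorter than their share (the second page above) pass through untouched.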

Auto-fetched URLs are also cached. If the model subsequently calls fetch_url for a URL that was already auto-fetched, the cached result is returned instantly without a network request.

SSE Events

When streaming, the router emits inline SSE events to indicate web search progress. All vendor-specific events carry the x_ prefix to keep them clearly separated from fields defined by the OpenAI spec.

| Event Type | Fields | Description |
|---|---|---|
| `x_research.searching` | `name`, `arguments` | Emitted when a web search begins. Contains the tool name and JSON-encoded arguments. |
| `x_research.result` | `name`, `tool_call_id` | Emitted when the web search completes and results are available. |
| `x_research.reading` | `name`, `arguments` | Emitted when auto-fetch or an explicit `fetch_url` call begins fetching a URL. |
| `x_research.complete` | `elapsed_ms`, `input_tokens`, `output_tokens`, `iterations`, `sources` | Emitted when the entire research phase finishes, before content chunks begin. |
| `x_chat.metadata` | `usage` | Emitted at the end of a response stream with aggregated usage metadata including research token counts. |
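A small dispatcher over these event types might look like the sketch below, assuming each `data:` line has already been JSON-decoded into a dict:

```python
def describe_event(event: dict) -> str:
    """Map a parsed SSE payload to a human-readable progress line (sketch)."""
    etype = event.get("type", "")
    if etype == "x_research.searching":
        return f"searching: {event.get('arguments')}"
    if etype == "x_research.reading":
        return f"reading: {event.get('arguments')}"
    if etype == "x_research.result":
        return f"tool finished: {event.get('name')} ({event.get('tool_call_id')})"
    if etype == "x_research.complete":
        return (f"research done in {event.get('elapsed_ms')}ms, "
                f"{event.get('iterations')} iterations")
    if etype == "x_chat.metadata":
        return f"usage: {event.get('usage')}"
    return "content chunk"  # anything without an x_ type is a normal delta
```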

Example SSE Stream

```text
data: {"type":"x_research.searching","name":"web_search","arguments":"{\"query\":\"rust async trait stabilization 2026\"}"}

data: {"type":"x_research.result","name":"web_search","tool_call_id":"call_1"}

data: {"type":"x_research.reading","name":"fetch_url","arguments":"{\"url\":\"https://blog.rust-lang.org/...\"}"}

data: {"type":"x_research.result","name":"fetch_url","tool_call_id":"call_2"}

data: {"type":"x_research.complete","elapsed_ms":3100,"input_tokens":8400,"output_tokens":620,"iterations":2,"sources":3}

data: {"choices":[{"index":0,"delta":{"content":"Based on my research..."},"finish_reason":null}]}

...

data: [DONE]
```

Code Examples

curl -- Chat Completions

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {"role": "user", "content": "What are the latest Rust async patterns?"}
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "x_tools": ["web_search", "fetch_url"]
    }
  }'
```

Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

stream = client.chat.completions.create(
    model="my-model",
    messages=[
        {"role": "user", "content": "What are the latest Rust async patterns?"}
    ],
    stream=True,
    extra_body={
        "web_search_options": {
            "search_context_size": "medium",
            "x_tools": ["web_search", "fetch_url"],
        }
    },
)

for chunk in stream:
    raw = chunk.model_dump()

    # Handle research progress events
    if "type" in raw and raw["type"].startswith("x_research."):
        if raw["type"] == "x_research.searching":
            print(f"[Searching] {raw.get('arguments', '')}")
        elif raw["type"] == "x_research.complete":
            print(f"[Done] {raw.get('iterations', 0)} iterations, "
                  f"{raw.get('elapsed_ms', 0)}ms")
        continue

    # Normal content chunks
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

print()
```

Node.js (OpenAI SDK)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.chat.completions.create({
  model: "my-model",
  messages: [
    { role: "user", content: "What are the latest Rust async patterns?" }
  ],
  stream: true,
  web_search_options: {
    search_context_size: "medium",
    x_tools: ["web_search", "fetch_url"],
  },
});

for await (const chunk of stream) {
  const raw = chunk;

  // Handle research progress events
  if (raw.type && raw.type.startsWith("x_research.")) {
    if (raw.type === "x_research.searching") {
      console.log(`[Searching] ${raw.arguments || ""}`);
    } else if (raw.type === "x_research.complete") {
      console.log(`[Done] ${raw.iterations || 0} iterations, ${raw.elapsed_ms || 0}ms`);
    }
    continue;
  }

  // Normal content chunks
  const content = chunk.choices?.[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
console.log();
```

curl -- Responses API

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/responses \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "input": "Find the latest research on transformer architectures",
    "stream": true,
    "tools": [
      {
        "type": "web_search_preview",
        "search_context_size": "medium",
        "x_tools": ["web_search", "fetch_url"]
      }
    ]
  }'
```

Python -- Responses API

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

response = client.responses.create(
    model="my-model",
    input="Find the latest research on transformer architectures",
    stream=True,
    tools=[
        {
            "type": "web_search_preview",
            "search_context_size": "medium",
            "x_tools": ["web_search", "fetch_url"],
        }
    ],
)

for event in response:
    if hasattr(event, "type"):
        print(event)
```

Node.js -- Responses API

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const stream = await client.responses.create({
  model: "my-model",
  input: "Find the latest research on transformer architectures",
  stream: true,
  tools: [
    {
      type: "web_search_preview",
      search_context_size: "medium",
      x_tools: ["web_search", "fetch_url"],
    }
  ],
});

for await (const event of stream) {
  console.log(event);
}
```

Multi-Tool Chain (web_search + fetch_url + calculator)

This example shows a query that may trigger a chain of tool calls within the same agentic loop: searching for a price, fetching a page for detail, and using the calculator to convert currencies.

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/chat/completions \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "my-model",
    "messages": [
      {
        "role": "user",
        "content": "What is the current price of gold per ounce, and how much would 3.5 troy ounces cost in EUR at today'\''s exchange rate?"
      }
    ],
    "stream": true,
    "web_search_options": {
      "search_context_size": "medium",
      "max_iterations": 6,
      "x_tools": ["web_search", "fetch_url", "calculator"]
    }
  }'
```

Error Handling

Rate Limiting

Tool calls are rate limited to 45 calls per minute per project. When the limit is exceeded, the tool returns an error object instead of search results. The model receives this as a tool result and may include a message about the rate limit in its response.

```json
{
  "error": "Research tool rate limit exceeded. Try again in 12 seconds."
}
```

Result Caching

Identical tool calls (same query) within a 5-minute window return cached results without re-executing the search. This prevents redundant network calls when the model re-invokes the same query.
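This query-keyed, time-bounded caching can be sketched with a simple TTL map. Again, this illustrates the documented behavior rather than the router's actual code; `run_search` stands in for the real search execution:

```python
import time

CACHE_TTL = 300  # seconds, i.e. the documented 5-minute window
_result_cache: dict[str, tuple[float, dict]] = {}

def cached_search(query: str, run_search) -> dict:
    """Return the cached result for an identical query within the TTL window."""
    now = time.monotonic()
    hit = _result_cache.get(query)
    if hit is not None and now - hit[0] < CACHE_TTL:
        return hit[1]  # re-invoked query: no network call
    result = run_search(query)
    _result_cache[query] = (now, result)
    return result
```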

Limits Reference

| Limit | Value | Description |
|---|---|---|
| Rate limit | 45 calls/min | Maximum tool calls per minute per project. |
| Cache TTL | 5 minutes | How long identical tool results are cached. |
| Max iterations | 5 (default) | Maximum agentic loop iterations per request. Configurable up to 10. |
| Auto-fetch count | 2 | Number of top URLs auto-fetched from web search results. |
| Auto-fetch budget | 12,000 chars | Total character budget shared across auto-fetched pages. |
| Tool execution timeout | 15 seconds | Each tool call must complete within 15 seconds. |