Reranking & Scoring

Rank documents by relevance to a query or compute pairwise similarity scores using cross-encoder models. Two endpoints are provided: /v1/rerank for ranked document retrieval and /v1/score for raw pairwise similarity scoring.

Overview

Reranking and scoring APIs use cross-encoder models that jointly encode a query and a document (or two texts) to produce a single relevance score. Cross-encoders are more accurate than bi-encoder dot-product similarity for retrieval tasks because they attend to both inputs simultaneously, but they are slower -- making them ideal as a second-stage reranker on a small candidate set retrieved by a fast bi-encoder embedding search.

  • Reranking (/v1/rerank) -- Takes a query and an array of documents, scores each document against the query, and returns results sorted by relevance score descending. Follows the Cohere/Jina reranking API convention.
  • Scoring (/v1/score) -- Takes a reference text and an array of candidate texts, and returns a raw similarity score for each pair. Follows the vLLM scoring API convention.

Both routes require a scoring endpoint -- a deployed endpoint configured with task_mode: "score". General text generation and embedding endpoints reject reranking and scoring requests. These requests are always synchronous (non-streaming) and are routed through the same tier-aware scheduler as all other inference requests.
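The retrieve-then-rerank pattern from the overview can be sketched client-side. The helpers below (build_rerank_request, apply_rerank) are illustrative names, not part of any SDK; the request body they build matches the /v1/rerank parameters documented in this section.

```python
def build_rerank_request(query, candidates, top_n=None):
    """Build a /v1/rerank request body from first-stage search candidates."""
    body = {
        "model": "bge-reranker-v2-m3",  # accepted for compatibility; overridden server-side
        "query": query,
        "documents": candidates,
        "return_documents": False,  # we already hold the candidate texts client-side
    }
    if top_n is not None:
        body["top_n"] = top_n
    return body


def apply_rerank(candidates, results):
    """Map ranked results back onto the original candidate texts via index."""
    return [(candidates[r["index"]], r["relevance_score"]) for r in results]
```

POST the body to the /v1/rerank URL (as in the curl examples in this section) and pass the response's results array to apply_rerank to get (text, score) pairs in relevance order.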

Reranking Endpoint

POST /proj_ABC123/endpoint-slug/v1/rerank

Scores each document against the query and returns results ranked by relevance score in descending order. Optionally limits results to the top N most relevant documents.

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan.",
      "France is a country in Western Europe."
    ],
    "top_n": 2,
    "return_documents": true
  }'

Rerank Request Parameters

Parameter Type Description
model (required) string Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model.
query (required) string The search query or reference text to rank documents against.
documents (required) array of strings Array of document texts to score and rank. At least one document is required.
top_n (optional) integer Maximum number of results to return, sorted by relevance score descending. When omitted, all documents are returned.
return_documents (optional) boolean Whether to include the original document text in each result object. Defaults to true.
service_tier (optional) string Priority level: "flex", "default", or "priority". Affects scheduling and billing. See Service Tiers.

Rerank Response

Response

{
  "results": [
    {
      "index": 1,
      "relevance_score": 0.987654,
      "document": { "text": "Paris is the capital of France." }
    },
    {
      "index": 3,
      "relevance_score": 0.654321,
      "document": { "text": "France is a country in Western Europe." }
    }
  ],
  "model": "bge-reranker-v2-m3",
  "usage": { "total_tokens": 148 }
}
Field Type Description
results array Ranked results sorted by relevance_score descending. Length is at most top_n if specified.
results[].index integer Original zero-based position of this document in the input documents array.
results[].relevance_score number Relevance score between the query and this document. Higher values indicate stronger relevance. Range is model-dependent (typically 0.0 to 1.0).
results[].document object | null Present when return_documents is true. Contains text with the original document string.
model string Model identifier used for reranking.
usage object Token usage statistics. Contains total_tokens.
usage.total_tokens integer Total tokens consumed by the reranking request across all query-document pairs.
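Because results[].index always refers to positions in the original documents array, clients can recover document text even when return_documents is false. A minimal sketch (ranked_texts is a hypothetical helper, not part of any SDK):

```python
def ranked_texts(response, original_documents):
    """Recover (text, relevance_score) pairs from a rerank response,
    whether or not the server echoed the documents back."""
    out = []
    for r in response["results"]:
        doc = r.get("document")  # object when return_documents is true, else absent/null
        text = doc["text"] if doc else original_documents[r["index"]]
        out.append((text, r["relevance_score"]))
    return out
```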

Scoring Endpoint

POST /proj_ABC123/endpoint-slug/v1/score

Computes pairwise similarity scores between a single reference text (text_1) and one or more candidate texts (text_2). Unlike /v1/rerank, results are returned in input order without sorting. This endpoint passes the raw vLLM scoring response back to the client.

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "text_1": "hello world",
    "text_2": ["hi there", "goodbye", "greetings"]
  }'

Score Request Parameters

Parameter Type Description
model (required) string Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model.
text_1 (required) string The reference text to score against. Each element in text_2 is compared to this text.
text_2 (required) array of strings Array of candidate texts to compare with text_1. At least one candidate is required.
service_tier (optional) string Priority level: "flex", "default", or "priority". See Service Tiers.

Score Response

The score endpoint returns the raw vLLM scoring response. The structure follows the vLLM /v1/score format with results in the same order as the input text_2 array.

Response

{
  "model": "bge-reranker-v2-m3",
  "results": [
    { "index": 0, "score": 0.812345 },
    { "index": 1, "score": 0.023456 },
    { "index": 2, "score": 0.753210 }
  ],
  "usage": { "total_tokens": 72 }
}
Field Type Description
model string Model identifier used for scoring.
results array Scored results in the same order as the input text_2 array.
results[].index integer Zero-based position of this result in the input text_2 array.
results[].score number Similarity score between text_1 and the corresponding text_2 element. Range is model-dependent.
usage object | null Token usage statistics when reported by the backend. Contains total_tokens.
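Since results preserve input order and carry an index, picking the best-matching candidate is straightforward. A small sketch (best_match is a hypothetical helper, not part of any SDK):

```python
def best_match(text_2, response):
    """Return the candidate from text_2 with the highest similarity score.

    Relies on each result carrying an index into the input text_2 array.
    """
    top = max(response["results"], key=lambda r: r["score"])
    return text_2[top["index"]], top["score"]
```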

Code Examples

Reranking -- Python (httpx)

Python
import httpx

BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

response = httpx.post(
    f"{BASE_URL}/rerank",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is the capital of France?",
        "documents": [
            "Berlin is the capital of Germany.",
            "Paris is the capital of France.",
            "Tokyo is the capital of Japan.",
            "France is a country in Western Europe.",
        ],
        "top_n": 2,
        "return_documents": True,
    },
)
response.raise_for_status()
data = response.json()

for result in data["results"]:
    print(f"Rank: index={result['index']} score={result['relevance_score']:.4f}")
    if "document" in result:
        print(f"  Text: {result['document']['text']}")

Reranking -- Node.js

JavaScript
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";

const response = await fetch(`${BASE_URL}/rerank`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bge-reranker-v2-m3",
    query: "What is the capital of France?",
    documents: [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan.",
      "France is a country in Western Europe.",
    ],
    top_n: 2,
    return_documents: true,
  }),
});

if (!response.ok) {
  const error = await response.json();
  throw new Error(`Rerank failed: ${error.error.message}`);
}

const data = await response.json();
for (const result of data.results) {
  console.log(`index=${result.index} score=${result.relevance_score.toFixed(4)}`);
  if (result.document) {
    console.log(`  Text: ${result.document.text}`);
  }
}

Reranking -- curl

curl
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan."
    ],
    "top_n": 2,
    "return_documents": true
  }'

Scoring -- Python

Python
import httpx

BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

response = httpx.post(
    f"{BASE_URL}/score",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "bge-reranker-v2-m3",
        "text_1": "hello world",
        "text_2": ["hi there", "goodbye", "greetings"],
    },
)
response.raise_for_status()
data = response.json()

for result in data["results"]:
    print(f"index={result['index']} score={result['score']:.6f}")

Scoring -- Node.js

JavaScript
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";

const response = await fetch(`${BASE_URL}/score`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bge-reranker-v2-m3",
    text_1: "hello world",
    text_2: ["hi there", "goodbye", "greetings"],
  }),
});

if (!response.ok) {
  const error = await response.json();
  throw new Error(`Score failed: ${error.error.message}`);
}

const data = await response.json();
for (const result of data.results) {
  console.log(`index=${result.index} score=${result.score.toFixed(6)}`);
}

Scoring -- curl

curl
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "text_1": "hello world",
    "text_2": ["hi there", "goodbye", "greetings"]
  }'

xeroctl CLI

Both endpoints are available via the xeroctl CLI. See xeroctl rerank & score for full CLI documentation.

Shell
# Rerank inline documents
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "What is the capital of France?" \
  --documents "Paris is the capital." "Berlin is the capital of Germany." \
  --top-n 1 \
  --return-documents

# Score similarity between a reference and candidates
xeroctl score \
  --endpoint my-rerank-endpoint \
  --text1 "hello world" \
  --text2 "hi there" "goodbye" "greetings"

# Rerank from files
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "search term" \
  doc1.txt doc2.txt doc3.txt

# Output as JSON
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "query" \
  --documents "doc1" "doc2" \
  --output json

Error Handling

Reranking and scoring endpoints return the same error format as all other inference endpoints. See Error Handling for the full reference. Errors specific to these endpoints are listed below.

HTTP Status Error Code Description
400 invalid_request Missing or invalid parameters. Check that query / text_1 and documents / text_2 are present and non-empty.
400 model_not_scoring The endpoint is not configured for scoring. Create or use an endpoint with task_mode: "score" for reranking and scoring requests.
401 authentication_error Invalid or missing API key.
403 endpoint_restricted API key is restricted to a different endpoint, or the client IP is blocked.
404 endpoint_not_found No endpoint with the given slug exists in this project.
429 rate_limit_exceeded Per-key or per-endpoint request rate limit hit. Respect Retry-After and use exponential backoff.
503 endpoint_inactive The endpoint is not currently active. It may be provisioning or disabled.
503 model_provisioning The model is being loaded onto a worker. Retry after a short delay.
500 internal_error An unexpected internal error occurred. If persistent, contact support with the X-Request-ID value.
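For the retryable statuses above (429 and 503), a client should honor Retry-After when present and otherwise fall back to exponential backoff with jitter. A sketch with hypothetical helper names; the transport call is abstracted as a callable so any HTTP client fits:

```python
import random
import time

RETRYABLE = {429, 503}  # rate_limit_exceeded, endpoint_inactive / model_provisioning


def retry_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait before the next attempt: honor Retry-After when the
    server sends it, otherwise exponential backoff with jitter."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)


def post_with_retries(post, max_attempts=5, base=1.0):
    """`post` performs one request and returns (status_code, retry_after, payload).
    Retries on 429/503; raises once the retry budget is exhausted."""
    for attempt in range(max_attempts):
        status, retry_after, payload = post()
        if status not in RETRYABLE:
            return status, payload
        time.sleep(retry_delay(attempt, retry_after, base=base))
    raise RuntimeError("retry budget exhausted")
```

Wire `post` to whatever client you use (httpx, fetch via a proxy, etc.), returning the status code, the Retry-After header if any, and the parsed body.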