Reranking & Scoring
Second-stage relevance for retrieval pipelines. Run a fast bi-encoder, then send the top candidates through /v1/rerank or /v1/score for cross-encoder ranking. /v1/rerank returns a Cohere-shaped response and /v1/score returns a vLLM-shaped one; both are served by an endpoint running in score mode.
Overview
Reranking and scoring APIs use cross-encoder models that jointly encode a query and a document (or two texts) to produce a single relevance score. Cross-encoders are more accurate than bi-encoder dot-product similarity for retrieval tasks because they attend to both inputs simultaneously, but they are slower, making them ideal as a second-stage reranker on a small candidate set retrieved by a fast bi-encoder embedding search.
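The two-stage flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the service's implementation: first_stage, two_stage_search, and the rerank callable are hypothetical names, and embeddings are assumed to be precomputed plain Python lists. In practice rerank would wrap a POST to /v1/rerank and return its results array.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical signature: (query, candidates, top_n) -> rerank "results" array
Reranker = Callable[[str, List[str], int], List[Dict]]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(x * y for x, y in zip(a, b))


def first_stage(query_vec: Vector, doc_vecs: Sequence[Vector], k: int) -> List[int]:
    """Cheap bi-encoder pass: corpus indices of the k highest dot products."""
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: dot(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]


def two_stage_search(
    query: str,
    query_vec: Vector,
    docs: Sequence[str],
    doc_vecs: Sequence[Vector],
    rerank: Reranker,
    k: int = 100,
    top_n: int = 10,
) -> List[Tuple[int, float]]:
    """Retrieve k candidates cheaply, then let the cross-encoder order them."""
    candidate_ids = first_stage(query_vec, doc_vecs, k)
    candidates = [docs[i] for i in candidate_ids]
    results = rerank(query, candidates, top_n)
    # results[].index is relative to the candidate list, so map it back
    # to the position in the full corpus.
    return [(candidate_ids[r["index"]], r["relevance_score"]) for r in results]
```

Keeping k small bounds the cost of the slower cross-encoder pass, while top_n controls how many reranked results reach the caller.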
Both /v1/rerank and /v1/score require an endpoint configured with task_mode: "score". General text generation and embedding endpoints reject reranking and scoring requests. The request-side model field is accepted for API compatibility but is overridden server-side by the endpoint's configured model. Both routes are always synchronous (non-streaming).
Pick the right endpoint
| /v1/rerank | /v1/score |
|---|---|
| Query plus an array of documents. | Reference text plus an array of candidate texts. |
| Results sorted by relevance_score descending; supports top_n. | Results returned in input order; no sorting, no top_n. |
| Cohere / Jina rerank response shape. | Raw vLLM /v1/score response shape. |
| Use as a second-stage reranker over candidates returned by a bi-encoder. | Use for pairwise similarity scoring where input order matters. |
Reranking Endpoint
POST /:project_id/:endpoint_slug/v1/rerank
Scores each document against the query and returns results ranked by relevance score in descending order. Optionally limits results to the top N most relevant documents.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan.",
      "France is a country in Western Europe."
    ],
    "top_n": 2,
    "return_documents": true
  }'
Rerank Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| query (required) | string | The search query or reference text to rank documents against. |
| documents (required) | array of strings | Array of document texts to score and rank. At least one document is required; the array is capped at 1000 entries (requests exceeding this limit are rejected with HTTP 400 invalid_request). |
| top_n (optional) | integer | Maximum number of results to return, sorted by relevance score descending. When omitted, all documents are returned. |
| return_documents (optional) | boolean | Whether to include the original document text in each result object. Defaults to true. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". Affects scheduling and billing. Unrecognized values are silently treated as "default". See Service Tiers. |
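The constraints in the table above can be checked client-side before a request is sent. The helper below is a minimal sketch, not part of any official SDK: build_rerank_request is a hypothetical name, and rejecting unknown service_tier values locally is a design choice made here because the server silently falls back to "default", which can hide typos.

```python
from typing import Dict, List, Optional, Sequence

MAX_DOCUMENTS = 1000  # documented cap on the documents array
SERVICE_TIERS = {"flex", "default", "priority"}


def build_rerank_request(
    query: str,
    documents: Sequence[str],
    *,
    model: str = "bge-reranker-v2-m3",  # overridden server-side anyway
    top_n: Optional[int] = None,
    return_documents: bool = True,
    service_tier: Optional[str] = None,
) -> Dict:
    """Build a /v1/rerank JSON body, failing fast on documented limits."""
    if not query:
        raise ValueError("query must be a non-empty string")
    if not documents:
        raise ValueError("documents must contain at least one entry")
    if len(documents) > MAX_DOCUMENTS:
        raise ValueError(f"documents is capped at {MAX_DOCUMENTS} entries")

    payload: Dict = {
        "model": model,
        "query": query,
        "documents": list(documents),
        "return_documents": return_documents,
    }
    # Optional fields are only sent when the caller sets them.
    if top_n is not None:
        payload["top_n"] = top_n
    if service_tier is not None:
        if service_tier not in SERVICE_TIERS:
            raise ValueError(f"unknown service_tier: {service_tier!r}")
        payload["service_tier"] = service_tier
    return payload
```

The returned dictionary can be passed directly as the json argument of an HTTP client call.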
Rerank Response
{
  "results": [
    {
      "index": 1,
      "relevance_score": 0.987654,
      "document": {
        "text": "Paris is the capital of France."
      }
    },
    {
      "index": 3,
      "relevance_score": 0.654321,
      "document": {
        "text": "France is a country in Western Europe."
      }
    }
  ],
  "model": "<endpoint-model-uuid>",
  "usage": {
    "total_tokens": 148
  }
}
| Field | Type | Description |
|---|---|---|
| results | array | Ranked results sorted by relevance_score descending. Length is at most top_n if specified. |
| results[].index | integer | Original zero-based position of this document in the input documents array. |
| results[].relevance_score | number | Relevance score between the query and this document. Higher values indicate stronger relevance. Range is model-dependent (typically 0.0 to 1.0). |
| results[].document | object | null | Present when return_documents is true. Contains text with the original document string. |
| model | string | The endpoint's configured model identifier (a UUID string), not the human-readable model value supplied in the request. The request-side model field is accepted for API compatibility but is replaced server-side. |
| usage | object | Token usage statistics. Contains total_tokens. |
| usage.total_tokens | integer | Total tokens consumed by the reranking request across all query-document pairs. Best-effort: reported as 0 when the upstream worker omits a usage block. |
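When return_documents is false, each result carries only index and relevance_score, so the caller has to join results back to the input array. The sketch below shows that join on a hand-written response shaped like the example above; attach_documents is an illustrative helper name, not part of the API.

```python
from typing import Dict, List, Sequence, Tuple


def attach_documents(
    results: List[Dict], documents: Sequence[str]
) -> List[Tuple[int, float, str]]:
    """Pair each ranked result with the input text it refers to."""
    return [
        (r["index"], r["relevance_score"], documents[r["index"]])
        for r in results
    ]


documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
    "France is a country in Western Europe.",
]
# Hand-written response, as if return_documents had been false.
response = {
    "results": [
        {"index": 1, "relevance_score": 0.987654},
        {"index": 3, "relevance_score": 0.654321},
    ],
    "usage": {"total_tokens": 148},
}

ranked = attach_documents(response["results"], documents)
# usage.total_tokens is best-effort, so tolerate a missing block.
tokens = (response.get("usage") or {}).get("total_tokens", 0)
for index, score, text in ranked:
    print(f"{score:.4f}  [{index}] {text}")
```

Because index always refers to the original documents array, the join stays correct regardless of top_n.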
Scoring Endpoint
POST /:project_id/:endpoint_slug/v1/score
Computes pairwise similarity scores between a single reference text
(text_1) and one or more candidate texts (text_2).
Unlike /v1/rerank, results are returned in input order
without sorting. This endpoint passes the raw vLLM scoring response
back to the client.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "text_1": "hello world",
    "text_2": ["hi there", "goodbye", "greetings"]
  }'
Score Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| text_1 (required) | string | The reference text to score against. Each element in text_2 is compared to this text. |
| text_2 (required) | array of strings | Array of candidate texts to compare with text_1. At least one candidate is required; the array is capped at 1000 entries (requests exceeding this limit are rejected with HTTP 400 invalid_request). |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". Affects scheduling and billing. Unrecognized values are silently treated as "default". See Service Tiers. |
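Candidate lists longer than the 1000-entry cap have to be split across several requests. The sketch below shows one way to batch text_2 and restore global indices afterwards. The helper names are hypothetical, and it assumes each text_1/text_2 pair is scored independently of the others in the same request, so scores from different batches remain comparable.

```python
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

MAX_CANDIDATES = 1000  # documented cap on the text_2 array


def batch_candidates(
    text_2: Sequence[str], size: int = MAX_CANDIDATES
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (offset, batch) pairs, each batch within the size cap."""
    for offset in range(0, len(text_2), size):
        yield offset, list(text_2[offset:offset + size])


def merge_results(batches: Iterable[Tuple[int, List[Dict]]]) -> List[Dict]:
    """Combine per-batch results, shifting each index by its batch offset."""
    merged: List[Dict] = []
    for offset, results in batches:
        for r in results:
            merged.append({"index": offset + r["index"], "score": r["score"]})
    return merged
```

Each batch would be sent as the text_2 array of its own /v1/score request; pairing the response's results array with the batch offset and passing the pairs to merge_results yields indices relative to the full candidate list.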
Score Response
The score endpoint returns the raw vLLM scoring response. The
structure follows the vLLM /v1/score format with results
in the same order as the input text_2 array.
{
  "model": "<endpoint-model-uuid>",
  "results": [
    {
      "index": 0,
      "score": 0.812345
    },
    {
      "index": 1,
      "score": 0.023456
    },
    {
      "index": 2,
      "score": 0.753210
    }
  ],
  "usage": {
    "total_tokens": 72
  }
}
| Field | Type | Description |
|---|---|---|
| model | string | Pass-through from the upstream worker response; not guaranteed to be present. When present, it is the endpoint's configured model identifier (typically a UUID string), not the human-readable model value supplied in the request. |
| results | array | Scored results in the same order as the input text_2 array. |
| results[].index | integer | Zero-based position of this result in the input text_2 array. |
| results[].score | number | Similarity score between text_1 and the corresponding text_2 element. Range is model-dependent. |
| usage | object | null | Token usage statistics when reported by the backend. Contains total_tokens. |
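Because /v1/score returns results in input order and has no top_n, any ranking has to happen on the client. A minimal sketch of that step, using the example scores above (rank_scores is an illustrative helper name):

```python
from typing import Dict, List, Optional, Sequence, Tuple


def rank_scores(
    results: List[Dict], text_2: Sequence[str], top_n: Optional[int] = None
) -> List[Tuple[int, float, str]]:
    """Sort score results by descending score and optionally truncate."""
    ordered = sorted(results, key=lambda r: r["score"], reverse=True)
    if top_n is not None:
        ordered = ordered[:top_n]
    return [(r["index"], r["score"], text_2[r["index"]]) for r in ordered]


text_2 = ["hi there", "goodbye", "greetings"]
results = [
    {"index": 0, "score": 0.812345},
    {"index": 1, "score": 0.023456},
    {"index": 2, "score": 0.753210},
]

top = rank_scores(results, text_2, top_n=2)
for index, score, text in top:
    print(f"{score:.6f}  [{index}] {text}")
```

If sorted output with top_n is all that is needed, /v1/rerank already provides it server-side; this helper is mainly useful when the raw input-order scores are also required.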
Code Examples
All examples target an endpoint configured with
task_mode: "score". Substitute your project
external id, endpoint slug, and an API key minted from your
project.
Reranking
# Python (httpx): rerank four documents and keep the top two
import httpx

BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

response = httpx.post(
    f"{BASE_URL}/rerank",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "bge-reranker-v2-m3",
        "query": "What is the capital of France?",
        "documents": [
            "Berlin is the capital of Germany.",
            "Paris is the capital of France.",
            "Tokyo is the capital of Japan.",
            "France is a country in Western Europe.",
        ],
        "top_n": 2,
        "return_documents": True,
    },
)
response.raise_for_status()
data = response.json()

# Results arrive sorted by relevance_score, highest first
for result in data["results"]:
    print(f"Rank: index={result['index']} score={result['relevance_score']:.4f}")
    # document may be absent or null when return_documents is false
    if result.get("document"):
        print(f"  Text: {result['document']['text']}")
// JavaScript (fetch); top-level await requires an ES module context
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";

const response = await fetch(`${BASE_URL}/rerank`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bge-reranker-v2-m3",
    query: "What is the capital of France?",
    documents: [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan.",
      "France is a country in Western Europe.",
    ],
    top_n: 2,
    return_documents: true,
  }),
});

if (!response.ok) {
  const error = await response.json();
  throw new Error(`Rerank failed: ${error.error.message}`);
}

const data = await response.json();
for (const result of data.results) {
  console.log(`index=${result.index} score=${result.relevance_score.toFixed(4)}`);
  if (result.document) {
    console.log(`  Text: ${result.document.text}`);
  }
}
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
      "Berlin is the capital of Germany.",
      "Paris is the capital of France.",
      "Tokyo is the capital of Japan."
    ],
    "top_n": 2,
    "return_documents": true
  }'
Scoring
Pairwise similarity, in input order.
# Python (httpx): score three candidates against one reference text
import httpx

BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"

response = httpx.post(
    f"{BASE_URL}/score",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "bge-reranker-v2-m3",
        "text_1": "hello world",
        "text_2": ["hi there", "goodbye", "greetings"],
    },
)
response.raise_for_status()
data = response.json()

# Results are in the same order as the text_2 input
for result in data["results"]:
    print(f"index={result['index']} score={result['score']:.6f}")
// JavaScript (fetch); top-level await requires an ES module context
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";

const response = await fetch(`${BASE_URL}/score`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "bge-reranker-v2-m3",
    text_1: "hello world",
    text_2: ["hi there", "goodbye", "greetings"],
  }),
});

if (!response.ok) {
  const error = await response.json();
  throw new Error(`Score failed: ${error.error.message}`);
}

const data = await response.json();
for (const result of data.results) {
  console.log(`index=${result.index} score=${result.score.toFixed(6)}`);
}
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-reranker-v2-m3",
    "text_1": "hello world",
    "text_2": ["hi there", "goodbye", "greetings"]
  }'
xeroctl CLI
Both endpoints are available via the xeroctl CLI.
See xeroctl rerank & score for
full CLI documentation.
# Rerank inline documents
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "What is the capital of France?" \
  --documents "Paris is the capital." "Berlin is the capital of Germany." \
  --top-n 1 \
  --return-documents

# Score similarity between a reference and candidates
xeroctl score \
  --endpoint my-rerank-endpoint \
  --text1 "hello world" \
  --text2 "hi there" "goodbye" "greetings"

# Rerank from files
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "search term" \
  doc1.txt doc2.txt doc3.txt

# Output as JSON
xeroctl rerank \
  --endpoint my-rerank-endpoint \
  --query "query" \
  --documents "doc1" "doc2" \
  --output json
Error Handling
Reranking and scoring endpoints return the same error format as all other inference endpoints. See Error Handling for the full reference. Errors specific to these endpoints are listed below.
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters. Check that query / text_1 and documents / text_2 are present and non-empty. |
| 400 | model_not_scoring | The endpoint is not configured for scoring. Create or use an endpoint with task_mode: "score" for reranking and scoring requests. |
| 401 | authentication_error | Invalid or missing API key. |
| 403 | endpoint_restricted | API key is restricted to a different endpoint, or the client IP is blocked. |
| 404 | endpoint_not_found | No endpoint with the given slug exists in this project. |
| 429 | rate_limit_exceeded | Per-key or per-endpoint request rate limit hit. Respect Retry-After and use exponential backoff. |
| 429 / 503 | capacity_exceeded | No worker had available capacity to schedule the request within its deadline. Retry after a short delay; the exact status reflects the underlying routing failure. |
| 503 | endpoint_inactive | The endpoint is not currently active. It may be provisioning or disabled. |
| 503 | model_provisioning | The model is being loaded onto a worker. Retry after a short delay. |
| 500 | invalid_tier | The endpoint is configured with a service tier that the router cannot resolve. Indicates a misconfigured endpoint rather than a transient fault; contact the project administrator. |
| 500 | invalid_model_id | The endpoint's configured model identifier could not be resolved. Indicates a misconfigured endpoint rather than a transient fault; contact the project administrator. |
| 500 | invalid_worker_response | The upstream worker returned a response that could not be parsed into the expected reranking or scoring shape. Retry; if persistent, contact support. |
| 500 | internal_error | An unexpected internal error occurred. Note: unlike /v1/chat/completions and /v1/responses, the /v1/rerank and /v1/score handlers do not currently emit an X-Request-ID response header. When contacting support, include a sample request, the endpoint slug, and the approximate request time. See Error Handling for the response-id mechanism used by other endpoints. |
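The retry guidance in the table can be folded into a small client-side policy. The following is a sketch under assumptions rather than a prescribed algorithm: retry_delay and the base and cap values are illustrative, the set of retryable codes is read off the descriptions above, and Retry-After is assumed to carry a number of seconds (an HTTP-date value falls back to plain backoff here).

```python
from typing import Optional

# Codes whose descriptions above suggest retrying. Misconfiguration codes
# such as invalid_tier and invalid_model_id are deliberately excluded.
RETRYABLE_CODES = {
    "rate_limit_exceeded",
    "capacity_exceeded",
    "model_provisioning",
    "invalid_worker_response",
}


def retry_delay(
    error_code: str,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before the next attempt, or None to give up.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if error_code not in RETRYABLE_CODES:
        return None
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Not a plain number of seconds; use backoff instead.
    # Exponential backoff: base, 2*base, 4*base, ... capped at cap.
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for the returned number of seconds and resend the request, stopping when the function returns None or a chosen attempt limit is reached. Adding random jitter to the backoff value is a common refinement when many clients retry at once.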