Reranking & Scoring
Rank documents by relevance to a query, or compute pairwise similarity
scores, using cross-encoder models. Two endpoints are provided:
/v1/rerank for ranked document retrieval
and /v1/score for raw similarity scoring.
Overview
Reranking and scoring APIs use cross-encoder models that jointly encode a query and a document (or two texts) to produce a single relevance score. Cross-encoders are more accurate than bi-encoder dot-product similarity for retrieval tasks because they attend to both inputs simultaneously, but they are slower -- making them ideal as a second-stage reranker on a small candidate set retrieved by a fast bi-encoder embedding search.
- Reranking (/v1/rerank) -- Takes a query and an array of documents, scores each document against the query, and returns results sorted by relevance score descending. Follows the Cohere/Jina reranking API convention.
- Scoring (/v1/score) -- Takes a reference text and an array of candidate texts, and returns a raw similarity score for each pair. Follows the vLLM scoring API convention.
Both APIs require a scoring endpoint -- one configured with
task_mode: "score". General text generation or embedding
endpoints will reject reranking and scoring requests. These requests
are always synchronous (non-streaming) and are routed through the same
tier-aware scheduler as all other inference requests.
Reranking Endpoint
POST /proj_ABC123/endpoint-slug/v1/rerank
Scores each document against the query and returns results ranked by relevance score in descending order. Optionally limits results to the top N most relevant documents.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe."
],
"top_n": 2,
"return_documents": true
}'
Rerank Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| query (required) | string | The search query or reference text to rank documents against. |
| documents (required) | array of strings | Array of document texts to score and rank. At least one document is required. |
| top_n (optional) | integer | Maximum number of results to return, sorted by relevance score descending. When omitted, all documents are returned. |
| return_documents (optional) | boolean | Whether to include the original document text in each result object. Defaults to true. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". Affects scheduling and billing. See Service Tiers. |
Rerank Response
Response
{
"results": [
{
"index": 1,
"relevance_score": 0.987654,
"document": {
"text": "Paris is the capital of France."
}
},
{
"index": 3,
"relevance_score": 0.654321,
"document": {
"text": "France is a country in Western Europe."
}
}
],
"model": "bge-reranker-v2-m3",
"usage": {
"total_tokens": 148
}
}
| Field | Type | Description |
|---|---|---|
| results | array | Ranked results sorted by relevance_score descending. Length is at most top_n if specified. |
| results[].index | integer | Original zero-based position of this document in the input documents array. |
| results[].relevance_score | number | Relevance score between the query and this document. Higher values indicate stronger relevance. Range is model-dependent (typically 0.0 to 1.0). |
| results[].document | object or null | Present when return_documents is true. Contains text with the original document string. |
| model | string | Model identifier used for reranking. |
| usage | object | Token usage statistics. Contains total_tokens. |
| usage.total_tokens | integer | Total tokens consumed by the reranking request across all query-document pairs. |
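Because each result carries the original zero-based index, the input text can always be recovered client-side, which is handy when return_documents is false. A small sketch, using an illustrative response:

```python
# The documents array sent in the request:
documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
]

# Illustrative results with return_documents omitted (scores only):
results = [
    {"index": 1, "relevance_score": 0.99},
    {"index": 2, "relevance_score": 0.10},
]

# Map each result back to its original input via `index`.
ranked = [(documents[r["index"]], r["relevance_score"]) for r in results]
print(ranked[0])  # ('Paris is the capital of France.', 0.99)
```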
Scoring Endpoint
POST /proj_ABC123/endpoint-slug/v1/score
Computes pairwise similarity scores between a single reference text
(text_1) and one or more candidate texts (text_2).
Unlike /v1/rerank, results are returned in input order
without sorting. This endpoint passes the raw vLLM scoring response
back to the client.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"]
}'
Score Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| text_1 (required) | string | The reference text to score against. Each element in text_2 is compared to this text. |
| text_2 (required) | array of strings | Array of candidate texts to compare with text_1. At least one candidate is required. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". See Service Tiers. |
Score Response
The score endpoint returns the raw vLLM scoring response. The
structure follows the vLLM /v1/score format with results
in the same order as the input text_2 array.
Response
{
"model": "bge-reranker-v2-m3",
"results": [
{
"index": 0,
"score": 0.812345
},
{
"index": 1,
"score": 0.023456
},
{
"index": 2,
"score": 0.753210
}
],
"usage": {
"total_tokens": 72
}
}
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier used for scoring. |
| results | array | Scored results in the same order as the input text_2 array. |
| results[].index | integer | Zero-based position of this result in the input text_2 array. |
| results[].score | number | Similarity score between text_1 and the corresponding text_2 element. Range is model-dependent. |
| usage | object or null | Token usage statistics when reported by the backend. Contains total_tokens. |
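Since results come back in the same order as text_2, the index field lines up with the request array, so selecting the closest candidate is direct. A short sketch against the illustrative response above:

```python
# The text_2 array sent in the request:
text_2 = ["hi there", "goodbye", "greetings"]

# Illustrative response results (input order preserved):
results = [
    {"index": 0, "score": 0.812345},
    {"index": 1, "score": 0.023456},
    {"index": 2, "score": 0.753210},
]

# Pick the candidate with the highest similarity to text_1.
best = max(results, key=lambda r: r["score"])
print(text_2[best["index"]])  # hi there
```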
Code Examples
Reranking -- Python (httpx)
import httpx
BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"
response = httpx.post(
f"{BASE_URL}/rerank",
headers={
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe.",
],
"top_n": 2,
"return_documents": True,
},
)
response.raise_for_status()
data = response.json()
for result in data["results"]:
    print(f"Rank: index={result['index']} score={result['relevance_score']:.4f}")
    if result.get("document"):
        print(f"  Text: {result['document']['text']}")
Reranking -- Node.js
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";
const response = await fetch(`${BASE_URL}/rerank`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "bge-reranker-v2-m3",
query: "What is the capital of France?",
documents: [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe.",
],
top_n: 2,
return_documents: true,
}),
});
if (!response.ok) {
const error = await response.json();
throw new Error(`Rerank failed: ${error.error.message}`);
}
const data = await response.json();
for (const result of data.results) {
console.log(`index=${result.index} score=${result.relevance_score.toFixed(4)}`);
if (result.document) {
console.log(` Text: ${result.document.text}`);
}
}
Reranking -- curl
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan."
],
"top_n": 2,
"return_documents": true
}'
Scoring -- Python
import httpx
BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"
response = httpx.post(
f"{BASE_URL}/score",
headers={
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"],
},
)
response.raise_for_status()
data = response.json()
for result in data["results"]:
    print(f"index={result['index']} score={result['score']:.6f}")
Scoring -- Node.js
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";
const response = await fetch(`${BASE_URL}/score`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "bge-reranker-v2-m3",
text_1: "hello world",
text_2: ["hi there", "goodbye", "greetings"],
}),
});
if (!response.ok) {
const error = await response.json();
throw new Error(`Score failed: ${error.error.message}`);
}
const data = await response.json();
for (const result of data.results) {
console.log(`index=${result.index} score=${result.score.toFixed(6)}`);
}
Scoring -- curl
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"]
}'
xeroctl CLI
Both endpoints are available via the xeroctl CLI.
See xeroctl rerank & score for
full CLI documentation.
# Rerank inline documents
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "What is the capital of France?" \
--documents "Paris is the capital." "Berlin is the capital of Germany." \
--top-n 1 \
--return-documents
# Score similarity between a reference and candidates
xeroctl score \
--endpoint my-rerank-endpoint \
--text1 "hello world" \
--text2 "hi there" "goodbye" "greetings"
# Rerank from files
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "search term" \
doc1.txt doc2.txt doc3.txt
# Output as JSON
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "query" \
--documents "doc1" "doc2" \
--output json
Error Handling
Reranking and scoring endpoints return the same error format as all other inference endpoints. See Error Handling for the full reference. Errors specific to these endpoints are listed below.
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters. Check that query / text_1 and documents / text_2 are present and non-empty. |
| 400 | model_not_scoring | The endpoint is not configured for scoring. Create or use an endpoint with task_mode: "score" for reranking and scoring requests. |
| 401 | authentication_error | Invalid or missing API key. |
| 403 | endpoint_restricted | API key is restricted to a different endpoint, or the client IP is blocked. |
| 404 | endpoint_not_found | No endpoint with the given slug exists in this project. |
| 429 | rate_limit_exceeded | Per-key or per-endpoint request rate limit hit. Respect Retry-After and use exponential backoff. |
| 503 | endpoint_inactive | The endpoint is not currently active. It may be provisioning or disabled. |
| 503 | model_provisioning | The model is being loaded onto a worker. Retry after a short delay. |
| 500 | internal_error | An unexpected internal error occurred. If persistent, contact support with the X-Request-ID value. |