Reranking & Scoring
Rank documents by relevance to a query, or compute pairwise similarity
scores, using cross-encoder models. Two endpoints are provided:
/v1/rerank for ranked document retrieval
and /v1/score for raw similarity scoring.
Overview
Reranking and scoring APIs use cross-encoder models that jointly encode a query and a document (or two texts) to produce a single relevance score. Cross-encoders are more accurate than bi-encoder dot-product similarity for retrieval tasks because they attend to both inputs simultaneously, but they are slower -- making them ideal as a second-stage reranker on a small candidate set retrieved by a fast bi-encoder embedding search.
- Reranking (/v1/rerank) -- Takes a query and an array of documents, scores each document against the query, and returns results sorted by relevance score descending. Follows the Cohere/Jina reranking API convention.
- Scoring (/v1/score) -- Takes a reference text and an array of candidate texts, and returns a raw similarity score for each pair. Follows the vLLM scoring API convention.
Both APIs require a scoring endpoint -- one configured with
task_mode: "score". General text generation or embedding
endpoints will reject reranking and scoring requests. These requests
are always synchronous (non-streaming) and are routed through the same
tier-aware scheduler as all other inference requests.
Reranking Endpoint
POST /proj_ABC123/endpoint-slug/v1/rerank
Scores each document against the query and returns results ranked by relevance score in descending order. Optionally limits results to the top N most relevant documents.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe."
],
"top_n": 2,
"return_documents": true
}'
Rerank Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| query (required) | string | The search query or reference text to rank documents against. |
| documents (required) | array of strings | Array of document texts to score and rank. At least one document is required. |
| top_n (optional) | integer | Maximum number of results to return, sorted by relevance score descending. When omitted, all documents are returned. |
| return_documents (optional) | boolean | Whether to include the original document text in each result object. Defaults to true. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". Affects scheduling and billing. See Service Tiers. |
Rerank Response
Response
{
"results": [
{
"index": 1,
"relevance_score": 0.987654,
"document": {
"text": "Paris is the capital of France."
}
},
{
"index": 3,
"relevance_score": 0.654321,
"document": {
"text": "France is a country in Western Europe."
}
}
],
"model": "bge-reranker-v2-m3",
"usage": {
"total_tokens": 148
}
}
| Field | Type | Description |
|---|---|---|
| results | array | Ranked results sorted by relevance_score descending. Length is at most top_n if specified. |
| results[].index | integer | Original zero-based position of this document in the input documents array. |
| results[].relevance_score | number | Relevance score between the query and this document. Higher values indicate stronger relevance. Range is model-dependent (typically 0.0 to 1.0). |
| results[].document | object or null | Present when return_documents is true. Contains text with the original document string. |
| model | string | Model identifier used for reranking. |
| usage | object | Token usage statistics. Contains total_tokens. |
| usage.total_tokens | integer | Total tokens consumed by the reranking request across all query-document pairs. |
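Because each result carries the original zero-based index, the input text can always be recovered client-side, which is handy when return_documents is false. A small sketch, using an illustrative response:

```python
# The documents array sent in the request:
documents = [
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
    "Tokyo is the capital of Japan.",
]

# Illustrative results with return_documents omitted (scores only):
results = [
    {"index": 1, "relevance_score": 0.99},
    {"index": 2, "relevance_score": 0.10},
]

# Map each result back to its original input via `index`.
ranked = [(documents[r["index"]], r["relevance_score"]) for r in results]
print(ranked[0])  # ('Paris is the capital of France.', 0.99)
```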
Scoring Endpoint
POST /proj_ABC123/endpoint-slug/v1/score
Computes pairwise similarity scores between a single reference text
(text_1) and one or more candidate texts (text_2).
Unlike /v1/rerank, results are returned in input order
without sorting. This endpoint passes the raw vLLM scoring response
back to the client.
curl -X POST https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"]
}'
Score Request Parameters
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Accepted for API compatibility but overridden server-side by the endpoint's configured model. |
| text_1 (required) | string | The reference text to score against. Each element in text_2 is compared to this text. |
| text_2 (required) | array of strings | Array of candidate texts to compare with text_1. At least one candidate is required. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". See Service Tiers. |
Score Response
The score endpoint returns the raw vLLM scoring response. The
structure follows the vLLM /v1/score format with results
in the same order as the input text_2 array.
Response
{
"model": "bge-reranker-v2-m3",
"results": [
{
"index": 0,
"score": 0.812345
},
{
"index": 1,
"score": 0.023456
},
{
"index": 2,
"score": 0.753210
}
],
"usage": {
"total_tokens": 72
}
}
| Field | Type | Description |
|---|---|---|
| model | string | Model identifier used for scoring. |
| results | array | Scored results in the same order as the input text_2 array. |
| results[].index | integer | Zero-based position of this result in the input text_2 array. |
| results[].score | number | Similarity score between text_1 and the corresponding text_2 element. Range is model-dependent. |
| usage | object or null | Token usage statistics when reported by the backend. Contains total_tokens. |
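Since results come back in the same order as text_2, the index field lines up with the request array, so selecting the closest candidate is direct. A short sketch against the illustrative response above:

```python
# The text_2 array sent in the request:
text_2 = ["hi there", "goodbye", "greetings"]

# Illustrative response results (input order preserved):
results = [
    {"index": 0, "score": 0.812345},
    {"index": 1, "score": 0.023456},
    {"index": 2, "score": 0.753210},
]

# Pick the candidate with the highest similarity to text_1.
best = max(results, key=lambda r: r["score"])
print(text_2[best["index"]])  # hi there
```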
Code Examples
Reranking -- Python (httpx)
import httpx
BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"
response = httpx.post(
f"{BASE_URL}/rerank",
headers={
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe.",
],
"top_n": 2,
"return_documents": True,
},
)
response.raise_for_status()
data = response.json()
for result in data["results"]:
    print(f"Rank: index={result['index']} score={result['relevance_score']:.4f}")
    if result.get("document"):
        print(f"  Text: {result['document']['text']}")
Reranking -- Node.js
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";
const response = await fetch(`${BASE_URL}/rerank`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "bge-reranker-v2-m3",
query: "What is the capital of France?",
documents: [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan.",
"France is a country in Western Europe.",
],
top_n: 2,
return_documents: true,
}),
});
if (!response.ok) {
const error = await response.json();
throw new Error(`Rerank failed: ${error.error.message}`);
}
const data = await response.json();
for (const result of data.results) {
console.log(`index=${result.index} score=${result.relevance_score.toFixed(4)}`);
if (result.document) {
console.log(` Text: ${result.document.text}`);
}
}
Reranking -- curl
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/rerank \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"query": "What is the capital of France?",
"documents": [
"Berlin is the capital of Germany.",
"Paris is the capital of France.",
"Tokyo is the capital of Japan."
],
"top_n": 2,
"return_documents": true
}'
Scoring -- Python
import httpx
BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1"
API_KEY = "xero_myproject_your_api_key"
response = httpx.post(
f"{BASE_URL}/score",
headers={
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"],
},
)
response.raise_for_status()
data = response.json()
for result in data["results"]:
    print(f"index={result['index']} score={result['score']:.6f}")
Scoring -- Node.js
const BASE_URL = "https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1";
const API_KEY = "xero_myproject_your_api_key";
const response = await fetch(`${BASE_URL}/score`, {
method: "POST",
headers: {
Authorization: `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: "bge-reranker-v2-m3",
text_1: "hello world",
text_2: ["hi there", "goodbye", "greetings"],
}),
});
if (!response.ok) {
const error = await response.json();
throw new Error(`Score failed: ${error.error.message}`);
}
const data = await response.json();
for (const result of data.results) {
console.log(`index=${result.index} score=${result.score.toFixed(6)}`);
}
Scoring -- curl
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-rerank-endpoint/v1/score \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-reranker-v2-m3",
"text_1": "hello world",
"text_2": ["hi there", "goodbye", "greetings"]
}'
xeroctl CLI
Both endpoints are available via the xeroctl CLI.
See xeroctl rerank & score for
full CLI documentation.
# Rerank inline documents
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "What is the capital of France?" \
--documents "Paris is the capital." "Berlin is the capital of Germany." \
--top-n 1 \
--return-documents
# Score similarity between a reference and candidates
xeroctl score \
--endpoint my-rerank-endpoint \
--text1 "hello world" \
--text2 "hi there" "goodbye" "greetings"
# Rerank from files
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "search term" \
doc1.txt doc2.txt doc3.txt
# Output as JSON
xeroctl rerank \
--endpoint my-rerank-endpoint \
--query "query" \
--documents "doc1" "doc2" \
--output json
Error Handling
Reranking and scoring endpoints return the same error format as all other inference endpoints. See Error Handling for the full reference. Errors specific to these endpoints are listed below.
| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | invalid_request | Missing or invalid parameters. Check that query / text_1 and documents / text_2 are present and non-empty. |
| 400 | model_not_scoring | The endpoint is not configured for scoring. Create or use an endpoint with task_mode: "score" for reranking and scoring requests. |
| 401 | authentication_error | Invalid or missing API key. |
| 403 | endpoint_restricted | API key is restricted to a different endpoint, or the client IP is blocked. |
| 404 | endpoint_not_found | No endpoint with the given slug exists in this project. |
| 429 | rate_limit_exceeded | Per-key or per-endpoint request rate limit hit. Respect Retry-After and use exponential backoff. |
| 503 | endpoint_inactive | The endpoint is not currently active. It may be provisioning or disabled. |
| 503 | model_provisioning | The model is being loaded onto a worker. Retry after a short delay. |
| 500 | internal_error | An unexpected internal error occurred. If persistent, contact support with the X-Request-ID value. |