// API Reference

Embeddings

Text in, vector out, on the OpenAI /v1/embeddings shape. Run semantic search, clustering, recommendation, or classification against your own endpoint, with the same batching, encoding-format options, and tier billing as the inference pipeline.

Use vector embeddings for:

  • Semantic search: Find documents similar to a query by comparing embedding vectors.
  • Clustering: Group documents by topic or theme.
  • Recommendations: Suggest similar content based on vector proximity.
  • Classification: Categorize text using nearest-neighbor approaches.
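
As a rough illustration of the vector comparison these use cases rely on, the sketch below ranks documents by cosine similarity to a query. It uses toy three-dimensional vectors in place of real embeddings, and the helper names cosine_similarity and rank_by_similarity are hypothetical, not part of the API. Cosine similarity is one common metric; this page does not prescribe one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return (doc_index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for embeddings returned by the endpoint.
docs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.1, 0.0]
ranking = rank_by_similarity(query, docs)
print([index for index, _ in ranking])
```

The same ranking function can back a recommendation or nearest-neighbor classification step; only the source of the vectors changes.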

All requests are routed through the same inference pipeline as chat completions, with full support for rate limiting, billing, and service tier priority.

Quick Start

Before running any snippet on this page, replace two placeholders: the model name bge-large-en-v1.5 and the API key xero_myproject_your_api_key. There is no static model catalog, so ask your project administrator which embedding model is configured for your endpoint, and substitute an API key minted from your project. The endpoint must be configured with task_mode = "embed".

Generate an embedding for a single text input:

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "The capital of France is Paris."
  }'
Response
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
    }
  ],
  "model": "bge-large-en-v1.5",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Request Format

POST /:project_id/:endpoint_slug/v1/embeddings

Replace :project_id with your project external ID and :endpoint_slug with the slug of an embedding endpoint configured in your project.

Request Body

| Parameter | Type | Description |
| --- | --- | --- |
| model (required) | string | Model identifier. Informational only; the endpoint configuration determines the actual model used. |
| input (required) | string \| array | Text to embed. Accepts a string, array of strings, array of token integers, or array of token integer arrays. See Input Formats. |
| encoding_format (optional) | string | Output encoding: "float" (default) or "base64". See Encoding Formats. |
| dimensions (optional) | integer | Desired output dimensionality. The model must support dimension reduction. |
| user (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". See Service Tiers. |
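
To make the path and body layout concrete, here is a minimal sketch that assembles the request with the Python standard library and omits optional parameters that were not set. The helper name build_embeddings_request and the BASE_URL constant are hypothetical; the host is copied from the Quick Start snippet, and the project ID, slug, and key are the same placeholders used elsewhere on this page.

```python
import json
import urllib.request

BASE_URL = "https://api.xerotier.ai"  # host taken from the Quick Start example


def build_embeddings_request(project_id, endpoint_slug, api_key, model, input_value,
                             encoding_format=None, dimensions=None,
                             user=None, service_tier=None):
    """Assemble (but do not send) a POST request for the embeddings route."""
    url = f"{BASE_URL}/{project_id}/{endpoint_slug}/v1/embeddings"
    body = {"model": model, "input": input_value}
    optional = {
        "encoding_format": encoding_format,
        "dimensions": dimensions,
        "user": user,
        "service_tier": service_tier,
    }
    # Leave unset optional parameters out of the body entirely.
    body.update({key: value for key, value in optional.items() if value is not None})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request(
    "proj_ABC123", "my-endpoint", "xero_myproject_your_api_key",
    "bge-large-en-v1.5", "The capital of France is Paris.",
    service_tier="flex",
)
print(request.full_url)
```

Passing the returned object to urllib.request.urlopen would send it; the sketch stops short of that so it can run without network access.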

Response Format

| Field | Type | Description |
| --- | --- | --- |
| object | string | Always "list". |
| data | array | Array of embedding objects. One per input. |
| data[].object | string | Always "embedding". |
| data[].index | integer | Index of the corresponding input. |
| data[].embedding | array \| string | The embedding vector. Float array by default, base64 string if encoding_format is "base64". |
| model | string | Model identifier. |
| usage | object | Token usage, with prompt_tokens and total_tokens (equal for embeddings). |
| service_tier | string \| null | Reserved for future use. The field is currently always null in router-emitted responses; the assigned tier is not yet echoed back from the worker. Do not depend on this field for routing decisions. |
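
The sketch below shows one way to consume a response of this shape: it checks the top-level object field and returns the vectors ordered by data[].index. The sample payload is made up for illustration (two-element vectors, deliberately out of order), and the helper name vectors_in_input_order is hypothetical. Sorting by index is a defensive choice here, not a statement that the service returns items out of order.

```python
def vectors_in_input_order(response):
    """Return the embedding payloads ordered by their input index."""
    if response.get("object") != "list":
        raise ValueError("unexpected top-level object type")
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hypothetical decoded JSON body; real vectors are much longer.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "bge-large-en-v1.5",
    "usage": {"prompt_tokens": 6, "total_tokens": 6},
    "service_tier": None,
}

vectors = vectors_in_input_order(sample_response)
print(len(vectors))
```

When encoding_format is "base64", each returned element is a string rather than a float list, so a decoding step would follow.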

Input Formats

The input field accepts four formats:

Single String

Embed a single text passage:

JSON
"input": "The capital of France is Paris."

String Array

Embed multiple texts in a single request. Each string produces one embedding vector:

JSON
"input": [ "The capital of France is Paris.", "Berlin is the capital of Germany.", "Tokyo is the capital of Japan." ]

Token Array

Pass pre-tokenized input as an array of token integers:

JSON
"input": [1234, 5678, 9012, 3456]

Token Array of Arrays

Multiple pre-tokenized inputs:

JSON
"input": [ [1234, 5678, 9012], [3456, 7890, 1234] ]

Encoding Formats

Float (Default)

Embeddings are returned as arrays of floating-point numbers:

JSON
"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]

Base64

When encoding_format is "base64", embeddings are returned as base64-encoded little-endian float32 byte arrays. This reduces response payload size by approximately 30%.

JSON
"embedding": "AGF2Pz..."

Batch Embeddings

For large-scale embedding workloads, use the Batch API to process embeddings asynchronously at lower priority. Create a JSONL file where each line targets /v1/embeddings:

JSONL
{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document"}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document"}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Third document"}}

Upload the file, create a batch with "endpoint": "/v1/embeddings", and retrieve the output file when processing completes. See the Batch API docs for the full workflow.

Note the path asymmetry: synchronous embedding requests POST to the endpoint-scoped path /:project_id/:endpoint_slug/v1/embeddings, while batch JSONL entries use the bare /v1/embeddings path. The Batch API dispatches each line to the project's configured embedding endpoint internally.
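
A batch input file in this shape can be generated programmatically. The following is a minimal sketch that serializes one JSON object per line with the bare /v1/embeddings path; the helper name build_batch_jsonl and the (custom_id, text) pair layout are assumptions made for the example.

```python
import json


def build_batch_jsonl(documents, model):
    """Serialize (custom_id, text) pairs as JSONL, one request per line."""
    lines = []
    for custom_id, text in documents:
        entry = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/embeddings",  # bare path, not the endpoint-scoped one
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(entry))
    return "".join(line + "\n" for line in lines)


jsonl = build_batch_jsonl(
    [("doc-1", "First document"), ("doc-2", "Second document")],
    model="bge-large-en-v1.5",
)
print(jsonl, end="")
```

Writing the returned string to a file gives the upload artifact; json.dumps keeps each entry on a single line even when the text contains newlines, because they are escaped.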

Service Tiers

The optional service_tier parameter adjusts processing priority for individual embedding requests:

| Tier | Priority | Cost | Use Case |
| --- | --- | --- | --- |
| flex | Low | Base rate | Background indexing, non-urgent workloads |
| default | Normal | Base rate | Standard production traffic |
| priority | High | 1.25x base rate | Real-time search, latency-critical pipelines |

See Service Tiers for details on how priority affects scheduling and billing.
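
For budgeting, the multipliers in the table can be applied to a token count. The sketch below assumes billing is proportional to total_tokens and uses a made-up base rate per million tokens; both are assumptions for illustration, so check the Service Tiers page for the actual billing rules. The names TIER_MULTIPLIER and estimate_cost are hypothetical.

```python
# Multipliers copied from the tier table above.
TIER_MULTIPLIER = {"flex": 1.0, "default": 1.0, "priority": 1.25}


def estimate_cost(total_tokens, base_rate_per_million, tier="default"):
    """Estimate cost, assuming per-token billing (an assumption, see lead-in)."""
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown service tier: {tier!r}")
    if total_tokens < 0:
        raise ValueError("total_tokens must be non-negative")
    return total_tokens / 1_000_000 * base_rate_per_million * TIER_MULTIPLIER[tier]


# 0.10 per million tokens is a hypothetical rate, not a published price.
default_cost = estimate_cost(2_000_000, 0.10, "default")
priority_cost = estimate_cost(2_000_000, 0.10, "priority")
print(priority_cost / default_cost)
```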

Error Handling

The embeddings handler can emit any of the codes listed below in addition to the generic auth, rate-limit, and capacity errors shared by every router route. Inference engine failures are also surfaced with dynamically mapped HTTP status codes derived from the underlying error class, so callers should treat the table as the documented set and fall back to the envelope code field for any value not listed here.

Client request errors

| HTTP Status | Error Code | Description |
| --- | --- | --- |
| 400 | invalid_input | The input field is missing, empty, exceeds 2048 items, or otherwise fails request validation. |
| 400 | model_not_embedding | The target endpoint exists but is not configured for embeddings (task_mode is not "embed"). |
| 404 | endpoint_not_found | The :endpoint_slug in the URL does not match any endpoint in the project. |
| 408 | timeout | The worker did not produce a response within the deadline. Other inference-engine errors are mapped dynamically (see note above). |

Auth and quota

| HTTP Status | Error Code | Description |
| --- | --- | --- |
| 401 | authentication_error | Invalid or missing API key. |
| 402 | insufficient_quota / billing_delinquent | Project has no remaining credit, or billing is past due. The credit-hold reservation step rejects the request before dispatch. |
| 429 | rate_limit_exceeded | Too many requests. Check the Retry-After header. |

Server (5xx)

| HTTP Status | Error Code | Description |
| --- | --- | --- |
| 500 | model_not_found | The endpoint references a model identifier that the router could not resolve. |
| 500 | invalid_tier | The endpoint's configured service tier is unknown to the router. |
| 500 | invalid_model_id | The resolved model identifier could not be parsed when constructing the worker request. |
| 500 | invalid_embeddings_response | The worker reply could not be decoded as a valid embeddings response. |

Capacity and availability

| HTTP Status | Error Code | Description |
| --- | --- | --- |
| 503 | endpoint_inactive | The target endpoint exists but is currently disabled. |
| 503 | model_provisioning | The endpoint is starting up and not yet ready to serve traffic. Retry after a short delay. |
| 503 | capacity_exceeded | No available workers. The Retry-After header is set on the rate-limited path but may be absent on some capacity-rejection branches. |
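
Because Retry-After may be missing on some capacity rejections, a client needs a fallback delay. The sketch below is one possible policy, not a documented one: it honors a numeric Retry-After header when present and otherwise uses capped exponential backoff. The set of retryable codes is an assumption drawn from the descriptions above (treating timeout as retryable is a judgment call), and retry_delay with its base and cap defaults is a hypothetical helper.

```python
# Codes assumed safe to retry, based on the table descriptions above.
RETRYABLE_CODES = {
    "rate_limit_exceeded",
    "model_provisioning",
    "capacity_exceeded",
    "timeout",
}


def retry_delay(error_code, headers, attempt, base=0.5, cap=30.0):
    """Return seconds to wait before retrying, or None if not retryable.

    attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    """
    if error_code not in RETRYABLE_CODES:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(0.0, float(value))
        except ValueError:
            pass  # HTTP-date form or garbage: fall back to backoff
    return min(cap, base * (2 ** attempt))


print(retry_delay("rate_limit_exceeded", {"Retry-After": "7"}, attempt=0))
print(retry_delay("capacity_exceeded", {}, attempt=3))
print(retry_delay("invalid_input", {}, attempt=0))
```

A production client would usually add random jitter to the backoff value so that many clients do not retry in lockstep; it is left out here to keep the output deterministic.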

Client Examples

Python (OpenAI SDK)

Python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# Single embedding
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input="The capital of France is Paris.",
)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple embeddings
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=[
        "Document one",
        "Document two",
        "Document three",
    ],
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")
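
The input argument in calls like these can take any of the four shapes described under Input Formats. A client-side pre-flight check, sketched below, can catch a malformed value before it costs a round trip. The helper name classify_input and its return labels are hypothetical, and rejecting empty strings and empty lists is an assumption about what "empty" means in the invalid_input description.

```python
def _is_token(value):
    """True for plain integers; bool is excluded although it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def classify_input(value):
    """Name the input shape, or raise ValueError if it matches none of the four."""
    if isinstance(value, str):
        if not value:
            raise ValueError("input string is empty")
        return "string"
    if isinstance(value, list) and value:
        if all(isinstance(item, str) for item in value):
            return "string_array"
        if all(_is_token(item) for item in value):
            return "token_array"
        if all(
            isinstance(item, list) and item and all(_is_token(tok) for tok in item)
            for item in value
        ):
            return "token_array_of_arrays"
    raise ValueError("input does not match any supported format")


print(classify_input("The capital of France is Paris."))
print(classify_input([[1234, 5678, 9012], [3456, 7890, 1234]]))
```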

Node.js (OpenAI SDK)

JavaScript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const response = await client.embeddings.create({
  model: "bge-large-en-v1.5",
  input: "The capital of France is Paris.",
});

console.log(`Dimensions: ${response.data[0].embedding.length}`);

curl: Multiple Inputs

curl
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": [
      "First document to embed",
      "Second document to embed"
    ]
  }'
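
The error table lists a limit of 2048 items per request, so a larger corpus has to be split across several multi-input calls. A minimal sketch of that split follows; the helper name chunk_inputs and the constant name are hypothetical, and the limit value is the one given in the invalid_input description.

```python
MAX_ITEMS_PER_REQUEST = 2048  # from the invalid_input row of the error table


def chunk_inputs(texts, size=MAX_ITEMS_PER_REQUEST):
    """Yield consecutive slices of texts, each at most size items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


corpus = [f"document {n}" for n in range(5000)]
chunks = list(chunk_inputs(corpus))
print([len(chunk) for chunk in chunks])
```

Each chunk becomes the input array of one request. Since data[].index restarts at zero in every response, the caller has to add the chunk's starting offset to map vectors back to corpus positions.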

curl: Base64 Encoding

curl
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "Hello world",
    "encoding_format": "base64"
  }'
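
A response to a base64 request carries each vector as a string, which the client has to unpack. Going by the description under Encoding Formats (base64-encoded little-endian float32), a decoder can be sketched with the standard library as below. The helper names are hypothetical, and the encoder exists only to produce a test value for the round trip; it is not something the API requires.

```python
import base64
import struct


def decode_base64_embedding(encoded):
    """Decode a base64 string of little-endian float32 values into floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def encode_base64_embedding(values):
    """Inverse of the decoder, used here only to build sample data."""
    packed = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(packed).decode("ascii")


# Values chosen to be exactly representable in float32.
sample = [0.5, -1.25, 2.0]
encoded = encode_base64_embedding(sample)
decoded = decode_base64_embedding(encoded)
print(decoded == sample)
```

Values that are not exactly representable in float32 will come back slightly different from their decimal form, which is expected for 32-bit storage.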