Embeddings API

Generate vector representations of text using the OpenAI-compatible /v1/embeddings endpoint. Use embeddings for search, clustering, recommendations, and classification.

Overview

The Embeddings API converts text input into dense vector representations (embeddings) that capture semantic meaning. These vectors can be used for:

  • Semantic search -- Find documents similar to a query by comparing embedding vectors.
  • Clustering -- Group documents by topic or theme.
  • Recommendations -- Suggest similar content based on vector proximity.
  • Classification -- Categorize text using nearest-neighbor approaches.
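
All four use cases reduce to comparing vectors. As an illustrative sketch, semantic search can rank documents by cosine similarity between a query embedding and each document embedding (pure Python, no dependencies; in practice the vectors would come from this endpoint):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scores = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]
```

For large corpora you would typically hand this off to a vector database instead of scanning in Python, but the ranking principle is the same.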

All requests are routed through the same inference pipeline as chat completions, with full support for rate limiting, billing, and service tier priority.

Quick Start

Generate an embedding for a single text input:

```shell
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "The capital of France is Paris."
  }'
```

Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
    }
  ],
  "model": "bge-large-en-v1.5",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```

Request Format

POST /v1/embeddings

Request Body

| Parameter | Type | Description |
| --- | --- | --- |
| `model` (required) | string | Model identifier. Informational only -- the endpoint configuration determines the actual model used. |
| `input` (required) | string \| array | Text to embed. Accepts a string, array of strings, array of token integers, or array of token integer arrays. See Input Formats. |
| `encoding_format` (optional) | string | Output encoding: `"float"` (default) or `"base64"`. See Encoding Formats. |
| `dimensions` (optional) | integer | Desired output dimensionality. The model must support dimension reduction. |
| `user` (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| `service_tier` (optional) | string | Priority level: `"flex"`, `"default"`, or `"priority"`. See Service Tiers. |

Response Format

| Field | Type | Description |
| --- | --- | --- |
| `object` | string | Always `"list"`. |
| `data` | array | Array of embedding objects, one per input. |
| `data[].object` | string | Always `"embedding"`. |
| `data[].index` | integer | Index of the corresponding input. |
| `data[].embedding` | array \| string | The embedding vector. Float array by default; base64 string if `encoding_format` is `"base64"`. |
| `model` | string | Model identifier. |
| `usage` | object | Token usage: `prompt_tokens` and `total_tokens` (equal for embeddings). |
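
When parsing the response yourself, it is safest to order embeddings by the `index` field rather than relying on array position. A minimal helper, assuming the response has already been parsed into a dict:

```python
def vectors_by_index(response: dict) -> list:
    """Extract embeddings from a /v1/embeddings response, ordered by input index."""
    data = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in data]
```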

Input Formats

The input field accepts four formats:

Single String

Embed a single text passage:

```json
"input": "The capital of France is Paris."
```

String Array

Embed multiple texts in a single request. Each string produces one embedding vector:

```json
"input": [
  "The capital of France is Paris.",
  "Berlin is the capital of Germany.",
  "Tokyo is the capital of Japan."
]
```
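
Endpoints typically cap the number of inputs per request; the exact limit depends on your endpoint configuration, so treat the batch size as something to look up rather than a given. A helper to split a corpus into request-sized chunks:

```python
def chunked(items: list, size: int):
    """Yield successive fixed-size batches from a list of inputs."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```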

Token Array

Pass pre-tokenized input as an array of token integers:

```json
"input": [1234, 5678, 9012, 3456]
```

Token Array of Arrays

Multiple pre-tokenized inputs:

```json
"input": [
  [1234, 5678, 9012],
  [3456, 7890, 1234]
]
```

Encoding Formats

Float (Default)

Embeddings are returned as arrays of floating-point numbers:

```json
"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
```

Base64

When encoding_format is "base64", embeddings are returned as base64-encoded little-endian float32 byte arrays. This reduces response payload size by approximately 30%.

```json
"embedding": "AGF2Pz..."
```
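
Since the payload is little-endian float32 bytes, decoding takes a few lines of standard-library Python:

```python
import base64
import struct

def decode_embedding(b64: str) -> list:
    """Decode a base64-encoded little-endian float32 embedding vector."""
    raw = base64.b64decode(b64)
    count = len(raw) // 4  # 4 bytes per float32
    return list(struct.unpack(f"<{count}f", raw))
```

Note that decoded values are float32, so they may differ from float-encoded responses in the last few decimal digits.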

Batch Embeddings

For large-scale embedding workloads, use the Batch API to process embeddings asynchronously at lower priority. Create a JSONL file where each line targets /v1/embeddings:

```jsonl
{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document"}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document"}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Third document"}}
```

Upload the file, create a batch with "endpoint": "/v1/embeddings", and retrieve the output file when processing completes. See the Batch API docs for the full workflow.

Service Tiers

The optional service_tier parameter adjusts processing priority for individual embedding requests:

| Tier | Priority | Cost | Use Case |
| --- | --- | --- | --- |
| `flex` | Low | Base rate | Background indexing, non-urgent workloads |
| `default` | Normal | Base rate | Standard production traffic |
| `priority` | High | 1.25x base rate | Real-time search, latency-critical pipelines |

See Service Tiers for details on how priority affects scheduling and billing.

Error Handling

| HTTP Status | Error Code | Description |
| --- | --- | --- |
| 400 | `invalid_request` | Missing or invalid parameters (bad input format, unsupported encoding, etc.). |
| 401 | `authentication_error` | Invalid or missing API key. |
| 429 | `rate_limit_exceeded` | Too many requests. Check the `Retry-After` header. |
| 503 | `capacity_exceeded` | No available workers. Check the `Retry-After` header. |
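
Both 429 and 503 are retryable and advertise a `Retry-After` delay. A transport-agnostic retry sketch, where `send_request` is a hypothetical callable you supply that returns the status code, response headers, and parsed body:

```python
import time

RETRYABLE_STATUSES = {429, 503}

def embed_with_retry(send_request, max_retries: int = 5):
    """Retry 429/503 responses, honoring the Retry-After header (default 1s)."""
    for attempt in range(max_retries + 1):
        status, headers, body = send_request()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt == max_retries:
            break
        time.sleep(float(headers.get("Retry-After", 1)))
    raise RuntimeError(f"giving up after {max_retries} retries (last status {status})")
```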

Client Examples

Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# Single embedding
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input="The capital of France is Paris.",
)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple embeddings
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=["Document one", "Document two", "Document three"],
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")
```

Node.js (OpenAI SDK)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const response = await client.embeddings.create({
  model: "bge-large-en-v1.5",
  input: "The capital of France is Paris.",
});
console.log(`Dimensions: ${response.data[0].embedding.length}`);
```

curl (Multiple Inputs)

```shell
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": [
      "First document to embed",
      "Second document to embed"
    ]
  }'
```

curl (Base64 Encoding)

```shell
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "Hello world",
    "encoding_format": "base64"
  }'
```