# Embeddings API

Generate vector representations of text using the OpenAI-compatible `/v1/embeddings` endpoint. Use embeddings for search, clustering, recommendations, and classification.
## Overview

The Embeddings API converts text input into dense vector representations (embeddings) that capture semantic meaning. These vectors can be used for:
- Semantic search -- Find documents similar to a query by comparing embedding vectors.
- Clustering -- Group documents by topic or theme.
- Recommendations -- Suggest similar content based on vector proximity.
- Classification -- Categorize text using nearest-neighbor approaches.
All requests are routed through the same inference pipeline as chat completions, with full support for rate limiting, billing, and service tier priority.
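All four use cases reduce to comparing embedding vectors. As a minimal, dependency-free sketch of that comparison step, a cosine-similarity helper in plain Python:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

To rank documents against a query, embed the query, then sort documents by their similarity to it.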
## Quick Start

Generate an embedding for a single text input:

```bash
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "The capital of France is Paris."
  }'
```
### Response

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
    }
  ],
  "model": "bge-large-en-v1.5",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
## Request Format

```
POST /v1/embeddings
```

### Request Body

| Parameter | Type | Description |
|---|---|---|
| `model` (required) | string | Model identifier. Informational only -- the endpoint configuration determines the actual model used. |
| `input` (required) | string \| array | Text to embed. Accepts a string, an array of strings, an array of token integers, or an array of token-integer arrays. See Input Formats. |
| `encoding_format` (optional) | string | Output encoding: `"float"` (default) or `"base64"`. See Encoding Formats. |
| `dimensions` (optional) | integer | Desired output dimensionality. The model must support dimension reduction. |
| `user` (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| `service_tier` (optional) | string | Priority level: `"flex"`, `"default"`, or `"priority"`. See Service Tiers. |
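As an illustration, a request body combining several optional parameters might look like the following (the 256-dimension value is hypothetical and only valid if the endpoint's configured model supports dimension reduction):

```json
{
  "model": "bge-large-en-v1.5",
  "input": "Shorter vectors for a memory-constrained index",
  "dimensions": 256,
  "service_tier": "flex"
}
```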
## Response Format

| Field | Type | Description |
|---|---|---|
| `object` | string | Always `"list"`. |
| `data` | array | Array of embedding objects, one per input. |
| `data[].object` | string | Always `"embedding"`. |
| `data[].index` | integer | Index of the corresponding input. |
| `data[].embedding` | array \| string | The embedding vector: a float array by default, or a base64 string when `encoding_format` is `"base64"`. |
| `model` | string | Model identifier. |
| `usage` | object | Token usage: `prompt_tokens` and `total_tokens` (equal for embeddings). |
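Because each item in `data` carries the index of the input it corresponds to, a robust way to join inputs with their vectors is to key on `index` rather than assume array order. A small sketch:

```python
def pair_inputs_with_embeddings(inputs, data):
    """Return {input_text: embedding}, matching by each item's `index` field."""
    by_index = {item["index"]: item["embedding"] for item in data}
    return {text: by_index[i] for i, text in enumerate(inputs)}
```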
## Input Formats

The `input` field accepts four formats:

### Single String

Embed a single text passage:

```json
"input": "The capital of France is Paris."
```

### String Array

Embed multiple texts in a single request. Each string produces one embedding vector:

```json
"input": [
  "The capital of France is Paris.",
  "Berlin is the capital of Germany.",
  "Tokyo is the capital of Japan."
]
```

### Token Array

Pass pre-tokenized input as an array of token integers:

```json
"input": [1234, 5678, 9012, 3456]
```

### Token Array of Arrays

Embed multiple pre-tokenized inputs:

```json
"input": [
  [1234, 5678, 9012],
  [3456, 7890, 1234]
]
```
## Encoding Formats

### Float (Default)

Embeddings are returned as arrays of floating-point numbers:

```json
"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
```

### Base64

When `encoding_format` is `"base64"`, embeddings are returned as base64-encoded little-endian float32 byte arrays. This reduces response payload size by approximately 30%.

```json
"embedding": "AGF2Pz..."
```
## Batch Embeddings

For large-scale embedding workloads, use the Batch API to process embeddings asynchronously at lower priority. Create a JSONL file where each line targets `/v1/embeddings`:

```jsonl
{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document"}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document"}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Third document"}}
```

Upload the file, create a batch with `"endpoint": "/v1/embeddings"`, and retrieve the output file when processing completes. See the Batch API docs for the full workflow.
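A JSONL file in the shape shown above can be generated with a short script (the document texts and IDs here are placeholders):

```python
import json

documents = {
    "doc-1": "First document",
    "doc-2": "Second document",
    "doc-3": "Third document",
}

# Write one request object per line, as required by the JSONL batch format.
with open("embeddings_batch.jsonl", "w") as f:
    for custom_id, text in documents.items():
        request = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/embeddings",
            "body": {"model": "bge-large-en-v1.5", "input": text},
        }
        f.write(json.dumps(request) + "\n")
```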
## Service Tiers

The optional `service_tier` parameter adjusts processing priority for individual embedding requests:

| Tier | Priority | Cost | Use Case |
|---|---|---|---|
| `flex` | Low | Base rate | Background indexing, non-urgent workloads |
| `default` | Normal | Base rate | Standard production traffic |
| `priority` | High | 1.25x base rate | Real-time search, latency-critical pipelines |

See Service Tiers for details on how priority affects scheduling and billing.
## Error Handling

| HTTP Status | Error Code | Description |
|---|---|---|
| 400 | `invalid_request` | Missing or invalid parameters (bad input format, unsupported encoding, etc.). |
| 401 | `authentication_error` | Invalid or missing API key. |
| 429 | `rate_limit_exceeded` | Too many requests. Check the `Retry-After` header. |
| 503 | `capacity_exceeded` | No available workers. Check the `Retry-After` header. |
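Both retryable statuses advertise a `Retry-After` header, so clients can honor it in a retry loop. A transport-agnostic sketch (the `send` callable and its `(status, headers, body)` return shape are assumptions for illustration, not part of the API):

```python
import time

def send_with_retries(send, max_attempts=5):
    """Call `send` until it succeeds, sleeping per Retry-After on 429/503.

    `send` is any zero-argument callable returning (status, headers, body).
    """
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in (429, 503):
            return status, body
        # Prefer the server's hint; fall back to exponential backoff.
        delay = float(headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("Retries exhausted")
```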
## Client Examples

### Python (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key",
)

# Single embedding
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input="The capital of France is Paris.",
)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple embeddings
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=[
        "Document one",
        "Document two",
        "Document three",
    ],
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")
```
### Node.js (OpenAI SDK)

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const response = await client.embeddings.create({
  model: "bge-large-en-v1.5",
  input: "The capital of France is Paris.",
});
console.log(`Dimensions: ${response.data[0].embedding.length}`);
```
### curl (Multiple Inputs)

```bash
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": [
      "First document to embed",
      "Second document to embed"
    ]
  }'
```
### curl (Base64 Encoding)

```bash
curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "Hello world",
    "encoding_format": "base64"
  }'
```