Embeddings
Text in, vector out, using the OpenAI-compatible /v1/embeddings request and response shape. Run semantic search, clustering, recommendation, or classification against your own endpoint, with the same batching, encoding-format options, and tier billing as the inference pipeline.
Use vector embeddings for:
- Semantic search: find documents similar to a query by comparing embedding vectors.
- Clustering: group documents by topic or theme.
- Recommendations: suggest similar content based on vector proximity.
- Classification: categorize text using nearest-neighbor approaches.
All requests are routed through the same inference pipeline as chat completions, with full support for rate limiting, billing, and service tier priority.
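Semantic search, recommendations, and nearest-neighbor classification all come down to comparing vectors. The sketch below assumes you already have embeddings from this endpoint and ranks them with cosine similarity. Cosine similarity is a common choice, but this page does not prescribe a metric. The function names and the toy 2-D vectors are hypothetical and only illustrate the idea.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], docs: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 2-D vectors for illustration; real embeddings are much longer.
docs = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [0.7, 0.7]}
print(rank_by_similarity([1.0, 0.1], docs, top_k=2))
```

For large corpora you would usually hand this step to a vector index rather than a linear scan. The scoring logic stays the same.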
Quick Start
Generate an embedding for a single text input:
Replace the two placeholders before running any snippet on this page: the model name bge-large-en-v1.5 and the API key xero_myproject_your_api_key. The URLs also use proj_ABC123 and my-endpoint as stand-ins for your project ID and endpoint slug (see Request Format). There is no static model catalog, so ask your project administrator which embedding model is configured for your endpoint, and substitute an API key minted from your project. The endpoint must be configured with task_mode = "embed".
curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": "The capital of France is Paris."
}'
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
}
],
"model": "bge-large-en-v1.5",
"usage": {
"prompt_tokens": 8,
"total_tokens": 8
}
}
Request Format
POST /:project_id/:endpoint_slug/v1/embeddings
Replace :project_id with your project's external ID and :endpoint_slug with the slug of an embedding endpoint configured in your project.
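Here is a standard-library sketch of how the path placeholders map onto a concrete request. It assumes the https://api.xerotier.ai host used throughout this page. The helper names `build_embeddings_request` and `send_embeddings_request` are hypothetical. Building the request is kept separate from sending it, so you can inspect the URL and headers without making a network call.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.xerotier.ai"  # host used in the examples on this page


def build_embeddings_request(
    project_id: str, endpoint_slug: str, api_key: str, body: dict
) -> urllib.request.Request:
    """Build (without sending) a POST to /:project_id/:endpoint_slug/v1/embeddings."""
    path = "/{}/{}/v1/embeddings".format(
        urllib.parse.quote(project_id, safe=""),
        urllib.parse.quote(endpoint_slug, safe=""),
    )
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_embeddings_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

To run it, call `send_embeddings_request(build_embeddings_request("proj_ABC123", "my-endpoint", key, {"model": "bge-large-en-v1.5", "input": "Hello"}))` with your own values substituted.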
Request Body
| Parameter | Type | Description |
|---|---|---|
| model (required) | string | Model identifier. Informational only; the endpoint configuration determines the actual model used. |
| input (required) | string or array | Text to embed. Accepts a string, array of strings, array of token integers, or array of token integer arrays. See Input Formats. |
| encoding_format (optional) | string | Output encoding: "float" (default) or "base64". See Encoding Formats. |
| dimensions (optional) | integer | Desired output dimensionality. The model must support dimension reduction. |
| user (optional) | string | End-user identifier for abuse monitoring and usage tracking. |
| service_tier (optional) | string | Priority level: "flex", "default", or "priority". See Service Tiers. |
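The table above translates into a small body builder that checks the enumerated values before sending. This is a sketch under one assumption: it rejects values outside the documented sets, and the server may be stricter or more lenient. `build_embeddings_body` is a hypothetical helper name. Optional fields are left out of the body when they are not set, so the server defaults apply.

```python
from typing import Optional, Union

ENCODING_FORMATS = {"float", "base64"}
SERVICE_TIERS = {"flex", "default", "priority"}


def build_embeddings_body(
    model: str,
    inputs: Union[str, list],
    *,
    encoding_format: Optional[str] = None,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
    service_tier: Optional[str] = None,
) -> dict:
    """Assemble a request body, omitting optional fields that were not set."""
    if encoding_format is not None and encoding_format not in ENCODING_FORMATS:
        raise ValueError(f"encoding_format must be one of {sorted(ENCODING_FORMATS)}")
    if dimensions is not None and (
        isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions <= 0
    ):
        raise ValueError("dimensions must be a positive integer")
    if service_tier is not None and service_tier not in SERVICE_TIERS:
        raise ValueError(f"service_tier must be one of {sorted(SERVICE_TIERS)}")

    body = {"model": model, "input": inputs}
    optional = {
        "encoding_format": encoding_format,
        "dimensions": dimensions,
        "user": user,
        "service_tier": service_tier,
    }
    # Only send fields the caller actually set.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```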
Response Format
| Field | Type | Description |
|---|---|---|
| object | string | Always "list". |
| data | array | Array of embedding objects. One per input. |
| data[].object | string | Always "embedding". |
| data[].index | integer | Index of the corresponding input. |
| data[].embedding | array or string | The embedding vector. Float array by default, base64 string if encoding_format is "base64". |
| model | string | Model identifier. |
| usage | object | Token usage. prompt_tokens and total_tokens (equal for embeddings). |
| service_tier | string or null | Reserved for future use. The field is currently always null in router-emitted responses; the assigned tier is not yet echoed back from the worker. Do not depend on this field for routing decisions. |
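The response fields above are enough to map each vector back to its input. The sketch below orders vectors by `data[].index` and ignores `service_tier`, because that field is always null today. Sorting by index is a defensive assumption, since this page does not say whether `data` is always returned in input order. `vectors_from_response` is a hypothetical name, and it handles only the default float encoding.

```python
def vectors_from_response(response: dict) -> list[list[float]]:
    """Return embedding vectors ordered by data[].index (float encoding only)."""
    if response.get("object") != "list":
        raise ValueError(f"unexpected object type: {response.get('object')!r}")
    vectors = []
    # Sort defensively so vectors[i] always corresponds to input i.
    for item in sorted(response["data"], key=lambda d: d["index"]):
        embedding = item["embedding"]
        if isinstance(embedding, str):
            raise ValueError("got a base64 string; decode it before calling this helper")
        vectors.append([float(x) for x in embedding])
    return vectors


sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.5, -0.5]},
        {"object": "embedding", "index": 0, "embedding": [0.0023, -0.0091]},
    ],
    "model": "bge-large-en-v1.5",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
    "service_tier": None,  # currently always null; not used here
}
print(vectors_from_response(sample))
```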
Input Formats
The input field accepts four formats:
Single String
Embed a single text passage:
"input": "The capital of France is Paris."
String Array
Embed multiple texts in a single request. Each string produces one embedding vector:
"input": [
"The capital of France is Paris.",
"Berlin is the capital of Germany.",
"Tokyo is the capital of Japan."
]
Token Array
Pass pre-tokenized input as an array of token integers:
"input": [1234, 5678, 9012, 3456]
Token Array of Arrays
Multiple pre-tokenized inputs:
"input": [
[1234, 5678, 9012],
[3456, 7890, 1234]
]
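Before sending, you can tell the four input shapes apart and predict how many vectors to expect. This is a sketch with stated assumptions. A flat integer array is treated as one pre-tokenized input, which gives one embedding. The 2048-item ceiling comes from the invalid_input row under Error Handling and is assumed to apply to the top-level list length. `classify_input` is a hypothetical helper.

```python
MAX_INPUT_ITEMS = 2048  # from the invalid_input row in Error Handling (assumed top-level length)


def _is_token(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def classify_input(value) -> tuple[str, int]:
    """Return (format_name, expected_number_of_embeddings) for an `input` value."""
    if isinstance(value, str):
        if not value:
            raise ValueError("input must not be empty")
        return "string", 1
    if not isinstance(value, list) or not value:
        raise ValueError("input must be a non-empty string or list")
    if len(value) > MAX_INPUT_ITEMS:
        raise ValueError(f"input has {len(value)} items; limit is {MAX_INPUT_ITEMS}")
    if all(isinstance(v, str) for v in value):
        return "string_array", len(value)
    if all(_is_token(v) for v in value):
        return "token_array", 1
    if all(isinstance(v, list) and v and all(_is_token(t) for t in v) for v in value):
        return "token_array_of_arrays", len(value)
    raise ValueError("mixed or unsupported input element types")
```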
Encoding Formats
Float (Default)
Embeddings are returned as arrays of floating-point numbers:
"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
Base64
When encoding_format is "base64", embeddings are returned
as base64-encoded little-endian float32 byte arrays. This reduces response payload
size by approximately 30%.
"embedding": "AGF2Pz..."
Batch Embeddings
For large-scale embedding workloads, use the Batch API
to process embeddings asynchronously at lower priority. Create a JSONL file where
each line targets /v1/embeddings:
{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document"}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document"}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Third document"}}
Upload the file, create a batch with "endpoint": "/v1/embeddings",
and retrieve the output file when processing completes. See the
Batch API docs for the full workflow.
Note the path asymmetry: synchronous embedding requests POST to the
endpoint-scoped path
/:project_id/:endpoint_slug/v1/embeddings, while batch
JSONL entries use the bare /v1/embeddings path. The Batch
API dispatches each line to the project's configured embedding
endpoint internally.
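For more than a handful of documents, you would normally generate the JSONL file rather than write it by hand. The sketch below emits one line per document in the format shown above, using the bare /v1/embeddings path that batch entries require. `build_batch_jsonl` is a hypothetical helper. Using the document IDs as `custom_id` values is an assumption that makes each output line easy to join back to its source.

```python
import json


def build_batch_jsonl(docs: dict[str, str], model: str) -> str:
    """Render one /v1/embeddings batch line per (custom_id, text) pair."""
    lines = []
    for custom_id, text in docs.items():
        entry = {
            "custom_id": custom_id,
            "method": "POST",
            "url": "/v1/embeddings",  # bare path; the Batch API resolves the endpoint
            "body": {"model": model, "input": text},
        }
        lines.append(json.dumps(entry, ensure_ascii=False))
    return "\n".join(lines) + "\n"


jsonl = build_batch_jsonl(
    {"doc-1": "First document", "doc-2": "Second document", "doc-3": "Third document"},
    model="bge-large-en-v1.5",
)
print(jsonl, end="")
```

Write the returned string to a `.jsonl` file (UTF-8) and upload that file as the batch input.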
Service Tiers
The optional service_tier parameter adjusts processing priority
for individual embedding requests:
| Tier | Priority | Cost | Use Case |
|---|---|---|---|
| flex | Low | Base rate | Background indexing, non-urgent workloads |
| default | Normal | Base rate | Standard production traffic |
| priority | High | 1.25x base rate | Real-time search, latency-critical pipelines |
See Service Tiers for details on how priority affects scheduling and billing.
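The cost column gives only relative multipliers, so an estimate needs a base rate from your own pricing. In the sketch below, the per-1K-token unit and the rate value are hypothetical placeholders. The only figures taken from this page are the multipliers (1.0 for flex and default, 1.25 for priority) and the use of `usage.total_tokens`.

```python
TIER_MULTIPLIERS = {"flex": 1.0, "default": 1.0, "priority": 1.25}


def estimate_cost(
    total_tokens: int, base_rate_per_1k_tokens: float, service_tier: str = "default"
) -> float:
    """Estimate request cost from usage.total_tokens and a hypothetical per-1K-token base rate."""
    if service_tier not in TIER_MULTIPLIERS:
        raise ValueError(f"unknown service_tier: {service_tier!r}")
    return total_tokens / 1000 * base_rate_per_1k_tokens * TIER_MULTIPLIERS[service_tier]


rate = 0.0001  # hypothetical base rate, for illustration only
print(estimate_cost(1_000_000, rate, "flex"))
print(estimate_cost(1_000_000, rate, "priority"))
```

According to the table, flex and default cost the same. Choosing flex changes scheduling priority, not price.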
Error Handling
The embeddings handler can emit any of the codes listed below, in addition to the generic authentication, rate-limit, and capacity errors shared by every router route. Inference engine failures are also surfaced with HTTP status codes mapped dynamically from the underlying error class. Treat the table as the documented set, and fall back to the error envelope's code field for any value not listed here.
| HTTP Status | Error Code | Description |
|---|---|---|
| Client request errors | | |
| 400 | invalid_input | The input field is missing, empty, exceeds 2048 items, or otherwise fails request validation. |
| 400 | model_not_embedding | The target endpoint exists but is not configured for embeddings (task_mode is not "embed"). |
| 404 | endpoint_not_found | The :endpoint_slug in the URL does not match any endpoint in the project. |
| 408 | timeout | The worker did not produce a response within the deadline. Other inference-engine errors are mapped dynamically (see note above). |
| Auth and quota | | |
| 401 | authentication_error | Invalid or missing API key. |
| 402 | insufficient_quota / billing_delinquent | Project has no remaining credit, or billing is past due. The credit-hold reservation step rejects the request before dispatch. |
| 429 | rate_limit_exceeded | Too many requests. Check the Retry-After header. |
| Server (5xx) | | |
| 500 | model_not_found | The endpoint references a model identifier that the router could not resolve. |
| 500 | invalid_tier | The endpoint's configured service tier is unknown to the router. |
| 500 | invalid_model_id | The resolved model identifier could not be parsed when constructing the worker request. |
| 500 | invalid_embeddings_response | The worker reply could not be decoded as a valid embeddings response. |
| Capacity and availability | | |
| 503 | endpoint_inactive | The target endpoint exists but is currently disabled. |
| 503 | model_provisioning | The endpoint is starting up and not yet ready to serve traffic. Retry after a short delay. |
| 503 | capacity_exceeded | No available workers. The Retry-After header is set on the rate-limited path but may be absent on some capacity-rejection branches. |
Client Examples
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
api_key="xero_myproject_your_api_key"
)
# Single embedding
response = client.embeddings.create(
model="bge-large-en-v1.5",
input="The capital of France is Paris."
)
print(f"Dimensions: {len(response.data[0].embedding)}")
# Multiple embeddings
response = client.embeddings.create(
model="bge-large-en-v1.5",
input=[
"Document one",
"Document two",
"Document three"
]
)
for item in response.data:
print(f"Index {item.index}: {len(item.embedding)} dimensions")
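A single request accepts at most 2048 inputs (see invalid_input under Error Handling), so a larger corpus has to be split across several requests. This sketch reuses an OpenAI SDK client like the one above and preserves input order by sorting each response on `index`. `chunked` and `embed_corpus` are hypothetical helpers, and the default batch size of 256 is an arbitrary assumption that you should tune.

```python
from typing import Iterator, Sequence

MAX_INPUT_ITEMS = 2048  # per the invalid_input row in Error Handling


def chunked(texts: Sequence[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts."""
    if not 1 <= size <= MAX_INPUT_ITEMS:
        raise ValueError(f"size must be between 1 and {MAX_INPUT_ITEMS}")
    for start in range(0, len(texts), size):
        yield list(texts[start:start + size])


def embed_corpus(client, model: str, texts: Sequence[str], size: int = 256) -> list[list[float]]:
    """Embed texts in batches with an OpenAI SDK client; output order matches input order."""
    vectors: list[list[float]] = []
    for batch in chunked(texts, size):
        response = client.embeddings.create(model=model, input=batch)
        # Sort by index so each vector lines up with its text within the batch.
        for item in sorted(response.data, key=lambda d: d.index):
            vectors.append(item.embedding)
    return vectors
```

With the client above: `vectors = embed_corpus(client, "bge-large-en-v1.5", corpus)`.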
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
apiKey: "xero_myproject_your_api_key",
});
const response = await client.embeddings.create({
model: "bge-large-en-v1.5",
input: "The capital of France is Paris.",
});
console.log(`Dimensions: ${response.data[0].embedding.length}`);
curl (Multiple Inputs)
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": [
"First document to embed",
"Second document to embed"
]
}'
curl (Base64 Encoding)
curl -X POST \
https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
-H "Authorization: Bearer xero_myproject_your_api_key" \
-H "Content-Type: application/json" \
-d '{
"model": "bge-large-en-v1.5",
"input": "Hello world",
"encoding_format": "base64"
}'