Embeddings API - Xerotier

Use vector embeddings for:

Semantic search: find documents similar to a query by comparing embedding vectors.
Clustering: group documents by topic or theme.
Recommendations: suggest similar content based on vector proximity.
Classification: categorize text using nearest-neighbor approaches.
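Each of these uses comes down to comparing vectors. Below is a minimal sketch of cosine-similarity ranking in plain Python. It assumes the vectors have already been fetched from this API. The function names are illustrative and not part of any Xerotier SDK. Cosine similarity is a common choice of metric, but this page does not prescribe one.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Score every document against the query and return (doc_id, score), best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy 3-dimensional vectors stand in for real embeddings.
query = [0.9, 0.1, 0.0]
docs = {"paris": [0.8, 0.2, 0.1], "tokyo": [0.1, 0.9, 0.3]}
print(rank_by_similarity(query, docs)[0][0])  # paris
```

For large collections, a vector database or a NumPy matrix product would replace the Python loop. The scoring logic stays the same.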

All requests are routed through the same inference pipeline as chat completions, with full support for rate limiting, billing, and service tier priority.

Quick Start

Generate an embedding for a single text input:

Replace the placeholders before running any snippet on this page. These are the model name bge-large-en-v1.5, the API key xero_myproject_your_api_key, and the project ID proj_ABC123 and endpoint slug my-endpoint in the URL. There is no static model catalog: ask your project administrator which embedding model is configured for your endpoint, and substitute an API key minted from your project. The endpoint must be configured with task_mode = "embed".

curl

curl -X POST https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "The capital of France is Paris."
  }'

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]
    }
  ],
  "model": "bge-large-en-v1.5",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}

Request Format

POST /:project_id/:endpoint_slug/v1/embeddings

Replace :project_id with your project external ID and :endpoint_slug with the slug of an embedding endpoint configured in your project.
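The synchronous URL and the SDK base_url used in the client examples share the same prefix. Below is a small sketch that builds both. API_ROOT is taken from the examples on this page. The helper names are hypothetical. It relies on OpenAI-compatible SDKs appending /embeddings to base_url themselves.

```python
from urllib.parse import quote

API_ROOT = "https://api.xerotier.ai"

def sdk_base_url(project_id: str, endpoint_slug: str) -> str:
    """base_url for OpenAI-compatible SDKs, which append /embeddings themselves."""
    # Percent-encode each path segment defensively; the IDs and slugs on this
    # page only use URL-safe characters, so they pass through unchanged.
    return f"{API_ROOT}/{quote(project_id, safe='')}/{quote(endpoint_slug, safe='')}/v1"

def embeddings_url(project_id: str, endpoint_slug: str) -> str:
    """Full URL for synchronous embedding requests (e.g. with curl)."""
    return sdk_base_url(project_id, endpoint_slug) + "/embeddings"

print(embeddings_url("proj_ABC123", "my-endpoint"))
# https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings
```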

Request Body

Parameter	Type	Description
model (required)	string	Model identifier. Informational only: the endpoint configuration determines the actual model used.
input (required)	string or array	Text to embed. Accepts a string, an array of strings, an array of token integers, or an array of token-integer arrays. See Input Formats.
encoding_format (optional)	string	Output encoding: `"float"` (default) or `"base64"`. See Encoding Formats.
dimensions (optional)	integer	Desired output dimensionality. The model must support dimension reduction.
user (optional)	string	End-user identifier for abuse monitoring and usage tracking.
service_tier (optional)	string	Priority level: `"flex"`, `"default"`, or `"priority"`. See Service Tiers.
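Some of these checks can run client-side before a request is sent. Below is a minimal sketch that assembles the body and rejects out-of-range `encoding_format` and `service_tier` values locally rather than waiting for a 400. `build_embeddings_body` is a hypothetical helper. Whether a given `dimensions` value is accepted still depends on the model, so it only checks that the value is a positive integer.

```python
from typing import Optional, Union

# Allowed values from the Request Body table.
ENCODING_FORMATS = {"float", "base64"}
SERVICE_TIERS = {"flex", "default", "priority"}

EmbeddingInput = Union[str, list[str], list[int], list[list[int]]]

def build_embeddings_body(
    model: str,
    inputs: EmbeddingInput,
    encoding_format: Optional[str] = None,
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
    service_tier: Optional[str] = None,
) -> dict:
    """Return a JSON-ready request body, leaving out optional fields that are unset."""
    if encoding_format is not None and encoding_format not in ENCODING_FORMATS:
        raise ValueError(f"unsupported encoding_format: {encoding_format!r}")
    if service_tier is not None and service_tier not in SERVICE_TIERS:
        raise ValueError(f"unsupported service_tier: {service_tier!r}")
    if dimensions is not None and (
        isinstance(dimensions, bool) or not isinstance(dimensions, int) or dimensions <= 0
    ):
        raise ValueError("dimensions must be a positive integer")
    body = {"model": model, "input": inputs}
    optional = {
        "encoding_format": encoding_format,
        "dimensions": dimensions,
        "user": user,
        "service_tier": service_tier,
    }
    # Only send fields the caller actually set, so server defaults still apply.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```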

Response Format

Field	Type	Description
object	string	Always `"list"`.
data	array	Array of embedding objects. One per input.
data[].object	string	Always `"embedding"`.
data[].index	integer	Index of the corresponding input.
data[].embedding	array or string	The embedding vector: a float array by default, or a base64 string if `encoding_format` is `"base64"`.
model	string	Model identifier.
usage	object	Token usage. `prompt_tokens` and `total_tokens` (equal for embeddings).
service_tier	string or null	Reserved for future use. The field is currently always `null` in router-emitted responses; the assigned tier is not yet echoed back from the worker. Do not depend on this field for routing decisions.
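Each `data` entry carries the index of its input, so a client can map vectors back to inputs without relying on array order. Below is a minimal sketch that works on the parsed JSON dict. It assumes float encoding and rejects base64 strings rather than decoding them. This page does not say whether `data` can arrive out of order, so sorting by `index` is only a defensive choice.

```python
def embeddings_in_input_order(response: dict) -> list[list[float]]:
    """Return the float vectors from an embeddings response, ordered by data[].index."""
    if response.get("object") != "list":
        raise ValueError("expected an object of type 'list'")
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = []
    for item in items:
        embedding = item["embedding"]
        if isinstance(embedding, str):
            # encoding_format was "base64"; decode before calling this helper.
            raise ValueError("base64 embedding; decode it first")
        vectors.append(embedding)
    return vectors
```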

Input Formats

The input field accepts four formats:

Single String

Embed a single text passage:

JSON

"input": "The capital of France is Paris."

String Array

Embed multiple texts in a single request. Each string produces one embedding vector:

JSON

"input": [
  "The capital of France is Paris.",
  "Berlin is the capital of Germany.",
  "Tokyo is the capital of Japan."
]

Token Array

Pass pre-tokenized input as an array of token integers:

JSON

"input": [1234, 5678, 9012, 3456]

Token Array of Arrays

Multiple pre-tokenized inputs:

JSON

"input": [
  [1234, 5678, 9012],
  [3456, 7890, 1234]
]
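Below is a sketch of a client-side check that reports which of the four formats a value uses. The 2048-item limit comes from the `invalid_input` row under Error Handling. Two assumptions apply. The limit is checked against the top-level list length for every array form. "Empty" is read as covering both an empty string and an empty list. `classify_input` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
MAX_INPUT_ITEMS = 2048  # from the invalid_input row under Error Handling

def _is_token(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)

def classify_input(value) -> str:
    """Return which of the four documented input formats `value` uses."""
    if isinstance(value, str):
        if not value:
            raise ValueError("input must not be empty")
        return "string"
    if not isinstance(value, list) or not value:
        raise ValueError("input must be a non-empty string or list")
    if len(value) > MAX_INPUT_ITEMS:
        raise ValueError(f"input has more than {MAX_INPUT_ITEMS} items")
    if all(isinstance(item, str) for item in value):
        return "string_array"
    if all(_is_token(item) for item in value):
        return "token_array"
    if all(isinstance(item, list) and all(_is_token(t) for t in item) for item in value):
        return "token_array_of_arrays"
    raise ValueError("mixed or unsupported input format")
```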
                

Encoding Formats

Float (Default)

Embeddings are returned as arrays of floating-point numbers:

JSON

"embedding": [0.0023, -0.0091, 0.0152, 0.0371, ...]

Base64

When encoding_format is "base64", embeddings are returned as base64-encoded little-endian float32 byte arrays. This reduces response payload size by approximately 30%.

JSON

"embedding": "AGF2Pz..."

Batch Embeddings

For large-scale embedding workloads, use the Batch API to process embeddings asynchronously at lower priority. Create a JSONL file where each line targets /v1/embeddings:

JSONL

{"custom_id": "doc-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document"}}
{"custom_id": "doc-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document"}}
{"custom_id": "doc-3", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Third document"}}

Upload the file, create a batch with "endpoint": "/v1/embeddings", and retrieve the output file when processing completes. See the Batch API docs for the full workflow.

Note the path asymmetry: synchronous embedding requests POST to the endpoint-scoped path /:project_id/:endpoint_slug/v1/embeddings, while batch JSONL entries use the bare /v1/embeddings path. The Batch API dispatches each line to the project's configured embedding endpoint internally.
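For more than a handful of documents, the JSONL file is easier to generate than to write by hand. Below is a minimal sketch using only the standard library. `write_embedding_batch` is a hypothetical helper. Keying documents by a dict keeps each custom_id unique, which presumably matters for matching output lines back to inputs. Note that it writes the bare /v1/embeddings path.

```python
import json

def write_embedding_batch(path: str, docs: dict[str, str], model: str) -> int:
    """Write one /v1/embeddings request per document and return the line count."""
    with open(path, "w", encoding="utf-8") as f:
        for custom_id, text in docs.items():
            line = {
                "custom_id": custom_id,
                "method": "POST",
                "url": "/v1/embeddings",  # bare path, not the endpoint-scoped one
                "body": {"model": model, "input": text},
            }
            f.write(json.dumps(line) + "\n")
    return len(docs)

# Example:
# write_embedding_batch("embeddings.jsonl",
#                       {"doc-1": "First document", "doc-2": "Second document"},
#                       "bge-large-en-v1.5")
```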

Service Tiers

The optional service_tier parameter adjusts processing priority for individual embedding requests:

Tier	Priority	Cost	Use Case
`flex`	Low	Base rate	Background indexing, non-urgent workloads
`default`	Normal	Base rate	Standard production traffic
`priority`	High	1.25x base rate	Real-time search, latency-critical pipelines

See Service Tiers for details on how priority affects scheduling and billing.
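Below is a rough comparison of tier costs. It assumes billing is linear in `usage.total_tokens` and uses a hypothetical per-million-token base rate, since actual pricing is not stated on this page. Only the multipliers come from the table above.

```python
# Tier multipliers from the Service Tiers table. The base rate is a
# hypothetical placeholder; use your project's actual pricing.
TIER_MULTIPLIER = {"flex": 1.0, "default": 1.0, "priority": 1.25}

def estimate_cost(total_tokens: int, base_rate_per_million: float, tier: str = "default") -> float:
    """Estimate cost, assuming billing is linear in total_tokens."""
    if tier not in TIER_MULTIPLIER:
        raise ValueError(f"unknown service tier: {tier!r}")
    return total_tokens / 1_000_000 * base_rate_per_million * TIER_MULTIPLIER[tier]

# 10M tokens at a made-up base rate of 0.02 per million tokens:
for tier in ("flex", "default", "priority"):
    print(tier, round(estimate_cost(10_000_000, 0.02, tier), 4))
```

Because `flex` and `default` share the base rate, the choice between them is about scheduling priority, not price.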

Error Handling

In addition to the generic authentication, rate-limit, and capacity errors shared by every router route, the embeddings handler can emit the codes listed below. Inference-engine failures are also surfaced with HTTP status codes mapped dynamically from the underlying error class. Treat this table as the documented set, and for any code not listed here, rely on the code field of the error envelope.

HTTP Status	Error Code	Description
Client request errors
400	`invalid_input`	The `input` field is missing, empty, exceeds 2048 items, or otherwise fails request validation.
400	`model_not_embedding`	The target endpoint exists but is not configured for embeddings (`task_mode` is not `"embed"`).
404	`endpoint_not_found`	The `:endpoint_slug` in the URL does not match any endpoint in the project.
408	`timeout`	The worker did not produce a response within the deadline. Other inference-engine errors are mapped dynamically (see note above).
Auth and quota
401	`authentication_error`	Invalid or missing API key.
402	`insufficient_quota` / `billing_delinquent`	Project has no remaining credit, or billing is past due. The credit-hold reservation step rejects the request before dispatch.
429	`rate_limit_exceeded`	Too many requests. Check the `Retry-After` header.
Server (5xx)
500	`model_not_found`	The endpoint references a model identifier that the router could not resolve.
500	`invalid_tier`	The endpoint's configured service tier is unknown to the router.
500	`invalid_model_id`	The resolved model identifier could not be parsed when constructing the worker request.
500	`invalid_embeddings_response`	The worker reply could not be decoded as a valid embeddings response.
Capacity and availability
503	`endpoint_inactive`	The target endpoint exists but is currently disabled.
503	`model_provisioning`	The endpoint is starting up and not yet ready to serve traffic. Retry after a short delay.
503	`capacity_exceeded`	No available workers. The `Retry-After` header is set on the rate-limited path but may be absent on some capacity-rejection branches.
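The table documents `Retry-After` on 429 responses and notes that it may be missing from some 503 responses. Below is a minimal sketch of a retry decision under three assumptions:

- Only `timeout`, `rate_limit_exceeded`, `model_provisioning`, and `capacity_exceeded` are treated as transient. This is a judgment call drawn from the descriptions above, not a documented policy.
- `Retry-After` carries delay-seconds. The HTTP-date form falls back to backoff.
- Exponential backoff with full jitter fills in when the header is absent.

`retry_delay` and its defaults are hypothetical.

```python
import random
from typing import Optional

# (status, code) pairs treated as transient; everything else fails fast.
RETRYABLE = {
    (408, "timeout"),
    (429, "rate_limit_exceeded"),
    (503, "model_provisioning"),
    (503, "capacity_exceeded"),
}

def retry_delay(status: int, code: str, retry_after: Optional[str], attempt: int,
                base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Seconds to wait before retrying, or None if the error should not be retried."""
    if (status, code) not in RETRYABLE:
        return None  # e.g. endpoint_inactive, auth, quota, and 5xx config errors
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date or malformed header: fall back to backoff
    # Full jitter: uniform delay between 0 and the capped exponential bound.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```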

Client Examples

Python (OpenAI SDK)

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
    api_key="xero_myproject_your_api_key"
)

# Single embedding
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input="The capital of France is Paris."
)
print(f"Dimensions: {len(response.data[0].embedding)}")

# Multiple embeddings
response = client.embeddings.create(
    model="bge-large-en-v1.5",
    input=[
        "Document one",
        "Document two",
        "Document three"
    ]
)
for item in response.data:
    print(f"Index {item.index}: {len(item.embedding)} dimensions")

Node.js (OpenAI SDK)

JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.xerotier.ai/proj_ABC123/my-endpoint/v1",
  apiKey: "xero_myproject_your_api_key",
});

const response = await client.embeddings.create({
  model: "bge-large-en-v1.5",
  input: "The capital of France is Paris.",
});

console.log(`Dimensions: ${response.data[0].embedding.length}`);

curl, Multiple Inputs

curl

curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": [
      "First document to embed",
      "Second document to embed"
    ]
  }'

curl, Base64 Encoding

curl

curl -X POST \
  https://api.xerotier.ai/proj_ABC123/my-endpoint/v1/embeddings \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "bge-large-en-v1.5",
    "input": "Hello world",
    "encoding_format": "base64"
  }'
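The base64 request above returns each embedding as a string of little-endian float32 bytes. Below is a minimal sketch that decodes one back into floats with Python's standard library. It assumes the string has already been taken from `data[].embedding`. The decoded values carry float32 precision, so they may differ in the last digits from the `"float"` output. With NumPy installed, `numpy.frombuffer(raw, dtype="<f4")` does the same job.

```python
import base64
import struct

def decode_base64_embedding(encoded: str) -> list[float]:
    """Decode a base64 string of little-endian float32 values into Python floats."""
    raw = base64.b64decode(encoded)
    if len(raw) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(raw) // 4
    # "<" = little-endian, "f" = 4-byte IEEE 754 float32
    return list(struct.unpack(f"<{count}f", raw))

# Round-trip check with values that are exact in float32.
sample = base64.b64encode(struct.pack("<3f", 0.5, -1.0, 2.25)).decode("ascii")
print(decode_base64_embedding(sample))  # [0.5, -1.0, 2.25]
```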