Batch API

Process large sets of inference requests asynchronously. Upload a JSONL file, create a batch, and retrieve results when processing completes.

Overview

The Batch API enables asynchronous processing of large request sets at lower priority than synchronous traffic. Batches use idle worker capacity and have a 24-hour completion deadline.

The workflow is:

  • Upload -- Upload a JSONL file containing your requests via the Files API.
  • Create -- Create a batch referencing the uploaded file.
  • Poll -- Check batch status until it reaches a terminal state.
  • Download -- Download the output file containing results.

Batch requests use separate rate limit quotas (2x synchronous values) and are billed at the base per-token rate for your endpoint's tier. Batch input and output files are stored using the platform's two-tier storage architecture. For details on storage tiers, encryption, retention, and billing, see Storage.

Supported Endpoints

Batch processing supports the following target endpoints:

Endpoint               Description
/v1/chat/completions   Batch chat completion requests
/v1/embeddings         Batch embedding generation
/v1/responses          Batch Responses API processing

All lines within a single batch must target the same endpoint. The target endpoint is specified both in the batch creation request and in each JSONL line's url field.
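Because a mixed-endpoint file or a duplicate custom_id will fail validation only after upload, it can be cheaper to check the file locally first. A minimal pre-upload check (a hypothetical helper, not part of any client library) might look like:

```python
import json

def validate_batch_file(path, endpoint):
    """Check that every line targets `endpoint` and that custom_ids are unique."""
    seen = set()
    with open(path) as f:
        for n, line in enumerate(f, start=1):
            req = json.loads(line)
            if req["url"] != endpoint:
                raise ValueError(f"line {n}: url {req['url']!r} != {endpoint!r}")
            if req["custom_id"] in seen:
                raise ValueError(f"line {n}: duplicate custom_id {req['custom_id']!r}")
            seen.add(req["custom_id"])
    return len(seen)
```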

JSONL Format

Each line in the input file must be a valid JSON object with the following fields:

Field       Type     Description
custom_id   string   Required. Unique correlation ID. Must be unique within the file.
method      string   Required. Must be "POST".
url         string   Required. Target endpoint path (e.g., /v1/chat/completions).
body        object   Required. Request body matching the target endpoint format.
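These fields can be assembled programmatically rather than by hand. A sketch that writes one chat-completion request per prompt (the model name and request defaults are illustrative):

```python
import json

def write_batch_file(path, prompts, model="llama-3.1-70b"):
    """Write one chat-completion request per prompt, keyed by index."""
    with open(path, "w") as f:
        for i, prompt in enumerate(prompts):
            line = {
                "custom_id": f"req-{i}",
                "method": "POST",
                "url": "/v1/chat/completions",
                "body": {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": 100,
                },
            }
            # JSONL: exactly one JSON object per line.
            f.write(json.dumps(line) + "\n")

write_batch_file("requests.jsonl", ["Hello", "What is 2+2?"])
```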

Chat Completions Example

JSONL
{"custom_id": "req-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-70b", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 100}}
{"custom_id": "req-2", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "llama-3.1-70b", "messages": [{"role": "user", "content": "What is 2+2?"}], "max_tokens": 100}}

Embeddings Example

JSONL
{"custom_id": "emb-1", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "First document text"}}
{"custom_id": "emb-2", "method": "POST", "url": "/v1/embeddings", "body": {"model": "bge-large-en-v1.5", "input": "Second document text"}}

Responses API Example

JSONL
{"custom_id": "resp-1", "method": "POST", "url": "/v1/responses", "body": {"model": "llama-3.1-8b", "input": "Summarize this document."}}
{"custom_id": "resp-2", "method": "POST", "url": "/v1/responses", "body": {"model": "llama-3.1-8b", "input": "Translate to French: Hello world"}}

Note: Batch Responses API requests do not support background: true or stream: true. The store parameter defaults to false for batch responses.

Create Batch

POST /:project_id/v1/batches

Parameter          Type     Description
input_file_id      string   Required. ID of the uploaded JSONL file.
endpoint           string   Required. Target endpoint: /v1/chat/completions, /v1/embeddings, or /v1/responses.
completion_window  string   Required. Must be "24h".
metadata           object   Optional. Custom key-value pairs (max 16 pairs).
curl
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/batches \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123def456",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {"description": "nightly evaluation"}
  }'

Response

{
  "id": "batch_abc123def456",
  "object": "batch",
  "endpoint": "/v1/chat/completions",
  "input_file_id": "file-abc123def456",
  "output_file_id": null,
  "error_file_id": null,
  "status": "validating",
  "completion_window": "24h",
  "created_at": 1709000000,
  "expires_at": 1709086400,
  "completed_at": null,
  "cancelled_at": null,
  "metadata": {"description": "nightly evaluation"},
  "request_counts": {
    "total": 0,
    "completed": 0,
    "failed": 0
  }
}
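Note that expires_at in the example response is exactly created_at plus the 24-hour completion window, in Unix seconds:

```python
created_at = 1709000000
expires_at = 1709086400

# 24h completion window = 86400 seconds
assert expires_at - created_at == 24 * 3600
```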

List Batches

GET /:project_id/v1/batches

curl
curl "https://api.xerotier.ai/proj_ABC123/v1/batches?limit=20" \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Get Batch

GET /:project_id/v1/batches/{batch_id}

curl
curl https://api.xerotier.ai/proj_ABC123/v1/batches/batch_abc123def456 \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Cancel Batch

POST /:project_id/v1/batches/{batch_id}/cancel

Cancellation stops dispatching new requests. In-flight requests complete normally. The batch transitions to cancelled once all in-flight requests finish.

curl
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/batches/batch_abc123def456/cancel \
  -H "Authorization: Bearer xero_myproject_your_api_key"

Output Format

When a batch completes, the output_file_id references a JSONL file containing results. Each line corresponds to a request:

Successful Request

JSONL
{ "id": "batch_req_abc123", "custom_id": "req-1", "response": { "status_code": 200, "request_id": "req-xyz", "body": { "id": "chatcmpl-abc", "object": "chat.completion", "choices": [...], "usage": {...} } }, "error": null }

Failed Request

Failed requests appear in the error_file_id JSONL file:

JSONL
{ "id": "batch_req_def456", "custom_id": "req-2", "response": null, "error": { "code": "server_error", "message": "Model unavailable" } }

Download the output file using the Files API:

curl
curl https://api.xerotier.ai/proj_ABC123/v1/files/file-output789/content \
  -H "Authorization: Bearer xero_myproject_your_api_key" \
  -o results.jsonl
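Results are not guaranteed to appear in input order, so downstream code should correlate by custom_id. A sketch (index_results is a hypothetical helper) that merges the output and error files into one lookup:

```python
import json

def index_results(output_path, error_path=None):
    """Map custom_id -> result row from the output (and optional error) JSONL files."""
    results = {}
    for path in filter(None, [output_path, error_path]):
        with open(path) as f:
            for line in f:
                row = json.loads(line)
                results[row["custom_id"]] = row
    return results
```

Rows whose "error" field is null succeeded; the rest carry an error code and message as shown above.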

Status Lifecycle

Status       Description
validating   Input file is being parsed and validated.
in_progress  Requests are being processed.
finalizing   Building output and error files.
completed    All requests processed. Output files available.
failed       Unrecoverable error during processing.
expired      24-hour deadline exceeded.
cancelling   Cancel requested, draining in-flight requests.
cancelled    Cancellation complete.
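Of these, completed, failed, expired, and cancelled are terminal; validating, in_progress, finalizing, and cancelling are transient. A polling loop should stop only on a terminal status. A sketch of such a loop, written against an injected getter so it works for any of the client stacks shown below (wait_for_terminal is a hypothetical helper, not part of any SDK):

```python
import time

TERMINAL = {"completed", "failed", "expired", "cancelled"}

def wait_for_terminal(get_batch, batch_id, interval=10, timeout=86400):
    """Poll get_batch(batch_id) until the batch reaches a terminal status.

    `get_batch` is any callable returning the batch object as a dict,
    e.g. a wrapper around GET /:project_id/v1/batches/{batch_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        batch = get_batch(batch_id)
        if batch["status"] in TERMINAL:
            return batch
        time.sleep(interval)
    raise TimeoutError(f"batch {batch_id} still not terminal after {timeout}s")
```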

CLI Commands

The xeroctl CLI provides batch management commands:

Full batch workflow
# Upload input file
xeroctl files upload --purpose batch requests.jsonl

# Create batch
xeroctl batches create \
  --input-file file-abc123def456 \
  --endpoint /v1/chat/completions \
  --endpoint-slug my-endpoint

# Check batch status
xeroctl batches get batch_abc123def456 --endpoint-slug my-endpoint

# List all batches
xeroctl batches list --endpoint-slug my-endpoint

# Cancel a batch
xeroctl batches cancel batch_abc123def456 --endpoint-slug my-endpoint

# Download output when complete
xeroctl files download file-output789 -o results.jsonl

Client Examples

Python (requests)

Python
import time

import requests

headers = {
    "Authorization": "Bearer xero_myproject_your_api_key",
    "Content-Type": "application/json"
}
base = "https://api.xerotier.ai/proj_ABC123/v1"

# Step 1: Upload a JSONL input file
with open("requests.jsonl", "rb") as f:
    upload = requests.post(
        f"{base}/files",
        headers={"Authorization": headers["Authorization"]},
        files={"file": ("requests.jsonl", f)},
        data={"purpose": "batch"}
    ).json()
print(f"Uploaded file: {upload['id']}")

# Step 2: Create a batch
batch = requests.post(f"{base}/batches", headers=headers, json={
    "input_file_id": upload["id"],
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {"description": "nightly evaluation"}
}).json()
print(f"Batch created: {batch['id']} (status: {batch['status']})")

# Step 3: Poll for completion
while True:
    status = requests.get(
        f"{base}/batches/{batch['id']}",
        headers=headers
    ).json()
    print(f"Status: {status['status']} ({status['request_counts']})")
    if status["status"] in ("completed", "failed", "expired", "cancelled"):
        break
    time.sleep(10)

# Step 4: Download results
if status.get("output_file_id"):
    output = requests.get(
        f"{base}/files/{status['output_file_id']}/content",
        headers={"Authorization": headers["Authorization"]}
    )
    with open("results.jsonl", "wb") as f:
        f.write(output.content)
    print("Results saved to results.jsonl")

# List batches
batches = requests.get(f"{base}/batches?limit=20", headers=headers).json()
for b in batches["data"]:
    print(f"{b['id']} - {b['status']}")

# Cancel a batch
requests.post(f"{base}/batches/{batch['id']}/cancel", headers=headers)

Node.js (fetch)

JavaScript
import { readFile, writeFile } from "node:fs/promises";

const base = "https://api.xerotier.ai/proj_ABC123/v1";
const headers = {
  "Authorization": "Bearer xero_myproject_your_api_key",
  "Content-Type": "application/json"
};

// Step 1: Upload a JSONL input file
const fileData = await readFile("requests.jsonl");
const formData = new FormData();
formData.append("file", new Blob([fileData]), "requests.jsonl");
formData.append("purpose", "batch");
const uploadRes = await fetch(`${base}/files`, {
  method: "POST",
  headers: { "Authorization": headers.Authorization },
  body: formData
});
const upload = await uploadRes.json();
console.log(`Uploaded file: ${upload.id}`);

// Step 2: Create a batch
const batchRes = await fetch(`${base}/batches`, {
  method: "POST",
  headers,
  body: JSON.stringify({
    input_file_id: upload.id,
    endpoint: "/v1/chat/completions",
    completion_window: "24h",
    metadata: { description: "nightly evaluation" }
  })
});
const batch = await batchRes.json();
console.log(`Batch created: ${batch.id} (${batch.status})`);

// Step 3: Poll for completion
let status;
while (true) {
  const res = await fetch(`${base}/batches/${batch.id}`, { headers });
  status = await res.json();
  console.log(`Status: ${status.status}`);
  if (["completed", "failed", "expired", "cancelled"].includes(status.status)) break;
  await new Promise(r => setTimeout(r, 10000));
}

// Step 4: Download results
if (status.output_file_id) {
  const outputRes = await fetch(
    `${base}/files/${status.output_file_id}/content`,
    { headers: { "Authorization": headers.Authorization } }
  );
  const outputData = await outputRes.arrayBuffer();
  await writeFile("results.jsonl", Buffer.from(outputData));
  console.log("Results saved to results.jsonl");
}

Error Handling

HTTP Status  Error Code            Description
400          invalid_request       Malformed request or invalid parameters.
400          validation_error      JSONL validation failed (malformed lines, duplicate custom_ids, etc.).
401          authentication_error  Invalid or missing API key.
404          not_found             Batch or input file not found.
429          rate_limit_exceeded   Too many requests. Check the Retry-After header.
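When a 429 carries a Retry-After header, waiting that long before resending is usually enough. A sketch of that pattern (post_with_retry is a hypothetical helper; it only assumes a requests.Response-like object with .status_code and .headers):

```python
import time

def post_with_retry(do_post, max_attempts=5):
    """Call do_post() until it returns a non-429 response.

    Honors the Retry-After header when present; otherwise falls back
    to exponential backoff (1s, 2s, 4s, ...).
    """
    for attempt in range(max_attempts):
        resp = do_post()
        if resp.status_code != 429:
            return resp
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    return resp  # still rate-limited after max_attempts
```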