Model Versioning
Semver-tracked model versions with promote and rollback semantics. Versions are real rows; the active one is a pointer you can move.
- Wire shape: snake_case
- Auth scope: project key, xero_<project>_<secret>
- Rollback: pointer move; live workers continue until reload
01 Model Versioning
Every model in Xerotier has a semantic version (semver) string. When you
upload a model, it is assigned version 1.0.0 by default. You
can create new versions of a model, track version history, and control
which version is active.
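The pointer semantics can be pictured with a small in-memory model. This is a sketch for intuition only, not Xerotier's implementation: the `VersionStore` class and its method names are hypothetical. The only behaviors taken from this page are the `1.0.0` default, the single active version, and promote or rollback moving a pointer without touching the rows.

```python
class VersionStore:
    """Hypothetical in-memory stand-in for one model's version rows."""

    def __init__(self):
        # An uploaded model starts at 1.0.0, which is also the active version.
        self.rows = {"1.0.0": {"version_notes": None}}
        self.active = "1.0.0"

    def create(self, version, notes=None):
        if version in self.rows:
            raise ValueError(f"version {version} already exists")
        # Assumption: creating a row does not move the pointer by itself.
        self.rows[version] = {"version_notes": notes}

    def promote(self, version):
        if version not in self.rows:
            raise KeyError(version)
        self.active = version  # only the pointer changes; rows are untouched

    # Rollback is the same pointer move, aimed at an older row.
    rollback = promote

    def listing(self):
        # Derive is_latest from the pointer so at most one row is flagged.
        return [
            {"version": v, "is_latest": v == self.active, **row}
            for v, row in self.rows.items()
        ]


store = VersionStore()
store.create("2.0.0", notes="Improved accuracy")
store.promote("2.0.0")
store.rollback("1.0.0")
print(store.listing())
```

Because `is_latest` is derived from a single pointer, the "at most one latest version per model" rule holds by construction in this sketch.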
The version-management routes are mounted under the same per-project
inference scope as /v1/chat/completions; the API key in the
Authorization header is the same key form
(xero_<project>_<secret>) you use for inference
requests. No separate management scope is required.
Version Response Fields (wire shape)
All version responses use snake_case JSON keys. Request bodies also use snake_case as the canonical form; camelCase keys may decode on input, but prefer snake_case, which the rest of the OpenAI-compatible surface uses everywhere.
| Wire key | Type | Description |
|---|---|---|
| id | string | Version row UUID. |
| object | string | Always "model.version" on a single-version response. |
| version | string | Semver version string (e.g., "1.0.0", "2.1.3"). Defaults to "1.0.0". |
| is_latest | boolean | Whether this is the active version. At most one version per model is marked as latest. |
| created | integer | Unix epoch seconds when the version row was created. |
| version_notes | string or null | Optional description of changes in this version. |
| status | string or null | Version lifecycle status (mirrors the parent model's status field). |
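On the client side, the wire shape above can be mirrored with a small typed container. The sketch below is illustrative: `ModelVersion` and `from_wire` are hypothetical names, and the wire key `object` is stored as `object_type` only to avoid shadowing the Python builtin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    id: str
    object_type: str            # wire key: "object"
    version: str
    is_latest: bool
    created: int                # Unix epoch seconds
    version_notes: Optional[str] = None
    status: Optional[str] = None

    @classmethod
    def from_wire(cls, payload: dict) -> "ModelVersion":
        # Non-nullable keys raise KeyError if absent; nullable keys become None.
        return cls(
            id=payload["id"],
            object_type=payload["object"],
            version=payload["version"],
            is_latest=payload["is_latest"],
            created=payload["created"],
            version_notes=payload.get("version_notes"),
            status=payload.get("status"),
        )


sample = {
    "id": "c4f0...77",
    "object": "model.version",
    "version": "2.0.0",
    "is_latest": True,
    "created": 1701000000,
    "version_notes": "Improved accuracy",
    "status": "ready",
}
print(ModelVersion.from_wire(sample))
```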
Version Management with xeroctl
```shell
# List versions for a model (default action when only the model id is given)
xeroctl models versions <model-id>

# Create a new version
xeroctl models versions <model-id> --create 2.0.0 --notes "Improved accuracy"

# Promote a version to latest
xeroctl models versions <model-id> --promote 2.0.0

# Rollback to a previous version
xeroctl models versions <model-id> --rollback 1.0.0
```
Version Management with the API
```shell
# List versions for a model
curl https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123"

# Create a new version
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "2.0.0",
    "version_notes": "Improved accuracy"
  }'

# Promote a version to latest
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions/2.0.0/promote \
  -H "Authorization: Bearer xero_my-project_abc123"

# Rollback to a previous version
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions/1.0.0/rollback \
  -H "Authorization: Bearer xero_my-project_abc123"
```
Example GET .../versions response (OpenAI list envelope):
```json
{
  "object": "list",
  "data": [
    {
      "id": "9b1d...e2",
      "object": "model.version",
      "version": "1.0.0",
      "is_latest": false,
      "created": 1700000000,
      "version_notes": null,
      "status": "ready"
    },
    {
      "id": "c4f0...77",
      "object": "model.version",
      "version": "2.0.0",
      "is_latest": true,
      "created": 1701000000,
      "version_notes": "Improved accuracy",
      "status": "ready"
    }
  ]
}
```
```python
import requests

headers = {"Authorization": "Bearer xero_my-project_abc123"}
base = "https://api.xerotier.ai/proj_ABC123/v1"
model_id = "MODEL_ID"

# List versions (envelope: {"object": "list", "data": [...]})
versions = requests.get(
    f"{base}/models/{model_id}/versions",
    headers=headers
).json()
for v in versions.get("data", []):
    print(f"  {v['version']} (latest={v['is_latest']})")

# Create a new version
response = requests.post(
    f"{base}/models/{model_id}/versions",
    headers=headers,
    json={"version": "2.0.0", "version_notes": "Improved accuracy"}
)
print(f"Created version: {response.json()}")

# Promote a version
requests.post(
    f"{base}/models/{model_id}/versions/2.0.0/promote",
    headers=headers
)

# Rollback to a previous version
requests.post(
    f"{base}/models/{model_id}/versions/1.0.0/rollback",
    headers=headers
)
```
```javascript
const headers = {
  "Authorization": "Bearer xero_my-project_abc123",
  "Content-Type": "application/json"
};
const base = "https://api.xerotier.ai/proj_ABC123/v1";
const modelId = "MODEL_ID";

// List versions (envelope: { object: "list", data: [...] })
const versionsResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  { headers }
);
const versions = await versionsResponse.json();
for (const v of (versions.data || [])) {
  console.log(`  ${v.version} (latest=${v.is_latest})`);
}

// Create a new version
const createResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      version: "2.0.0",
      version_notes: "Improved accuracy"
    })
  }
);
console.log("Created version:", await createResponse.json());

// Promote a version
await fetch(
  `${base}/models/${modelId}/versions/2.0.0/promote`,
  { method: "POST", headers }
);

// Rollback to a previous version
await fetch(
  `${base}/models/${modelId}/versions/1.0.0/rollback`,
  { method: "POST", headers }
);
```
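Once the list envelope is in hand, two common client-side tasks are finding the active version and ordering versions numerically rather than lexically. The helpers below are a hypothetical sketch. They assume every version string is a plain MAJOR.MINOR.PATCH triple, as in the examples on this page, and do not handle pre-release or build suffixes.

```python
def semver_key(version: str) -> tuple:
    """Turn "2.1.3" into (2, 1, 3) so versions compare numerically."""
    major, minor, patch = version.split(".")
    return (int(major), int(minor), int(patch))


def active_version(envelope: dict):
    """Return the entry flagged is_latest, or None if no entry is flagged."""
    for entry in envelope.get("data", []):
        if entry.get("is_latest"):
            return entry
    return None


def sorted_versions(envelope: dict) -> list:
    """Return version strings ordered from oldest to newest by semver."""
    return sorted((e["version"] for e in envelope.get("data", [])), key=semver_key)


envelope = {
    "object": "list",
    "data": [
        {"version": "10.0.0", "is_latest": False},
        {"version": "2.0.0", "is_latest": True},
        {"version": "1.0.0", "is_latest": False},
    ],
}
print(active_version(envelope)["version"])  # 2.0.0
print(sorted_versions(envelope))            # ['1.0.0', '2.0.0', '10.0.0']
```

A plain string sort would place "10.0.0" before "2.0.0", which is why the key converts each component to an integer.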
02 Model Metadata
Each model carries extensive metadata that affects routing, inference behavior, and display in the catalog.
Core Properties
Field names below are the snake_case JSON keys emitted on the wire
(matching the RouterModelExtended CodingKeys).
| Wire key | Type | Description |
|---|---|---|
| name | string | Model display name. |
| format | string | Storage format: safetensors, bin, exl2, or directory. |
| size_bytes | integer | Total model size in bytes. |
| status | string | Current state: uploading, validating, ready, or error. |
| architecture | string or null | Model architecture family (e.g., "llama", "qwen", "mistral"). |
| parameter_count | integer or null | Number of model parameters. |
| context_length | integer or null | Maximum context window in tokens. Affects max_tokens auto-clamping at the router. |
| workload_type | string or null | Workload classification (see Workload Types below). |
| is_shared | boolean or null | Whether the model is published to the shared catalog. |
| catalog_role | string or null | Catalog role: deployable or shared. |
Additional model-metadata columns extracted at validation time
(architecture-derived fields such as hidden size, layer count, vocab
size, torch dtype, and MoE expert counts) are stored on the database
row and used internally for VRAM estimation. They are not surfaced on
the public RouterModelExtended response today; consult the
model catalog admin tooling if you need to inspect them.
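The context_length property is described as affecting max_tokens auto-clamping at the router, but the exact rule is not documented on this page. The sketch below assumes the simplest plausible rule, namely that prompt tokens plus completion tokens must fit inside the context window. The function name `clamp_max_tokens` and its parameters are hypothetical.

```python
from typing import Optional


def clamp_max_tokens(requested: int, prompt_tokens: int,
                     context_length: Optional[int]) -> int:
    """Cap requested completion tokens to what the context window can hold.

    Assumed rule (not confirmed by the Xerotier docs):
    max_tokens <= context_length - prompt_tokens.
    """
    if context_length is None:
        # context_length is nullable on the wire; with no known limit, pass through.
        return requested
    remaining = max(context_length - prompt_tokens, 0)
    return min(requested, remaining)


print(clamp_max_tokens(4096, 7000, 8192))  # 1192
print(clamp_max_tokens(256, 7000, 8192))   # 256
print(clamp_max_tokens(4096, 100, None))   # 4096
```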
03 Model Lifecycle
Models progress through a series of status transitions from upload to availability. The diagram is authoritative; the table below mirrors it for screen readers and search.
```mermaid
stateDiagram-v2
    direction LR
    [*] --> uploading
    uploading --> validating: upload complete
    validating --> ready: metadata valid
    validating --> error: validation failed
    error --> validating: revalidate
    ready --> [*]
```
| Status | Description |
|---|---|
| uploading | Model files are being uploaded. Not yet available for inference. |
| validating | Upload complete. The system is validating model files, extracting metadata (architecture, parameters, context length), and checking compatibility. |
| ready | Validation passed. The model can be assigned to endpoints and loaded on backends. |
| error | Validation failed. The validationError field carries the diagnostic. Fix the upstream artifact and trigger a revalidate (see below) or re-upload. |
Trigger revalidation with xeroctl models revalidate <model-id>
or POST /{project_id}/v1/models/{model_id}/revalidate. This
re-checks model files and refreshes metadata without a re-upload.
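Client code that tracks model status can encode the lifecycle as a transition table. The sketch below copies the transitions from the state diagram in this section; the helper names `can_transition` and `is_servable` are hypothetical, and the re-upload path out of error is not drawn in the diagram, so it is not modeled here.

```python
# Allowed transitions, taken from the lifecycle state diagram.
TRANSITIONS = {
    "uploading": {"validating"},        # upload complete
    "validating": {"ready", "error"},   # metadata valid / validation failed
    "error": {"validating"},            # revalidate
    "ready": set(),                     # no outgoing status change in the diagram
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())


def is_servable(status: str) -> bool:
    """Only ready models can be assigned to endpoints and loaded on backends."""
    return status == "ready"


print(can_transition("error", "validating"))  # True
print(can_transition("ready", "uploading"))   # False
print(is_servable("validating"))              # False
```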
Model Loading
When a model is assigned to an endpoint and a request arrives, the router sends a load request to a compatible backend. The backend auto-configures:
- Context length: auto-detected from the model config if not specified.
- Max sequences: auto-calculated based on available resources.
- Quantization: selected based on model size versus available GPU VRAM (see Quantization).
04 Workload Types
Each model can be tagged with a workload type that describes its primary use case. Workload type is used for filtering in the model catalog and does not affect inference behavior or routing.
| Type | Description |
|---|---|
| chat | General-purpose conversational models. Default workload type. |
| code | Code generation and completion models. |
| reasoning | Models optimized for chain-of-thought and analytical tasks. |
| embedding | Text embedding models for semantic search and similarity. |
| multilingual | Models with strong multilingual support. |
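Since workload type only drives catalog filtering, a client can apply the same filter locally to a list of model objects. The sketch below is illustrative: `filter_by_workload` is a hypothetical helper, the model names are made up, and models whose workload_type is null are simply treated as non-matching.

```python
WORKLOAD_TYPES = {"chat", "code", "reasoning", "embedding", "multilingual"}


def filter_by_workload(models: list, workload_type: str) -> list:
    """Return the models tagged with the given workload type."""
    if workload_type not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload type: {workload_type}")
    return [m for m in models if m.get("workload_type") == workload_type]


# Hypothetical catalog entries using the snake_case wire keys.
catalog = [
    {"name": "example-chat-7b", "workload_type": "chat"},
    {"name": "example-coder-13b", "workload_type": "code"},
    {"name": "example-embed-small", "workload_type": "embedding"},
    {"name": "example-untagged", "workload_type": None},
]
print([m["name"] for m in filter_by_workload(catalog, "code")])  # ['example-coder-13b']
```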
05 Quantization
Quantization reduces model size and memory requirements by using lower-precision number formats. Xerotier supports both pre-quantized models and runtime quantization.
Pre-Quantized Models
Some models are distributed with quantization already applied. The fields below are camelCase on the wire because they originate on the backend-agent load acknowledgment, not on the public model API.
| Field | Surface | Description |
|---|---|---|
| preQuantizationMethod | backend-internal | Method used: compressed-tensors, gptq, or awq. |
| preQuantizationBits | backend-internal | Bit-width of the pre-quantized weights. Common values are 4 and 8; the field is a free-form integer and may carry other widths if the upstream artifact does. |
Runtime Quantization
If a model does not fit in available GPU memory at full precision, the backend agent can apply runtime quantization automatically. The load acknowledgment carries:
| Field | Surface | Description |
|---|---|---|
| appliedQuantization | backend-internal | Method applied (e.g., bitsandbytes, fp8, bitsandbytes-fp4, awq, gptq). |
| appliedQuantizationReason | backend-internal | Free-form rationale string set by the backend agent describing why the method was chosen. |
Runtime quantization is transparent to the user. The model functions the same way, with slightly reduced precision and a smaller memory footprint.
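The backend agent's actual selection policy is not documented here. As a rough illustration of the "fits in GPU memory" check, the sketch below estimates weight memory as parameter count times bits per weight and picks the widest candidate that fits. Everything in it is an assumption: the function names, the candidate widths, and the weights-only estimate, which ignores KV cache, activations, and runtime overhead.

```python
def weight_bytes(parameter_count: int, bits: int) -> int:
    """Approximate weight memory: one value of `bits` bits per parameter."""
    return parameter_count * bits // 8


def pick_bit_width(parameter_count: int, free_vram_bytes: int,
                   candidates=(16, 8, 4)):
    """Return the widest candidate whose weights fit, or None if none fit."""
    for bits in candidates:
        if weight_bytes(parameter_count, bits) <= free_vram_bytes:
            return bits
    return None


GIB = 1024 ** 3
params = 7_000_000_000  # hypothetical 7B-parameter model

print(pick_bit_width(params, 24 * GIB))  # 16
print(pick_bit_width(params, 10 * GIB))  # 8
print(pick_bit_width(params, 2 * GIB))   # None
```

Trying the widest format first mirrors the description above: quantization is applied only when the model does not fit at full precision.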
MoE Model Support
Mixture-of-Experts (MoE) models are supported. MoE-specific metadata fields are stored on the model row (total experts and experts activated per token) and consumed by the backend agent at load time to select tuned kernel configurations. The agent picks an MoE kernel profile automatically; there is no per-request public API knob.