// Model Management

Model Versioning

Semver-tracked model versions with promote and rollback semantics. Versions are real rows; the active one is a pointer you can move.

// wire shape
snake_case
// auth scope
project key, xero_<project>_<secret>
// rollback
pointer move; live workers continue until reload

01 Model Versioning

Every model in Xerotier has a semantic version (semver) string. When you upload a model, it is assigned version 1.0.0 by default. You can create new versions of a model, track version history, and control which version is active.
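
As a mental model only, the "versions are rows, the active one is a pointer" idea can be sketched in a few lines of Python. The class and method names here are hypothetical, and the sketch assumes that creating a version does not move the pointer; it is not the server implementation.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedModel:
    """Toy in-memory stand-in for a model and its version rows (hypothetical)."""

    # A freshly uploaded model starts with version 1.0.0.
    versions: list[str] = field(default_factory=lambda: ["1.0.0"])
    # The movable pointer to the active ("latest") version.
    active: str = "1.0.0"

    def create(self, version: str) -> None:
        # Assumption: creating a version adds a row but leaves the pointer alone.
        if version in self.versions:
            raise ValueError(f"version {version} already exists")
        self.versions.append(version)

    def _move_pointer(self, version: str) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        self.active = version

    def promote(self, version: str) -> None:
        self._move_pointer(version)

    def rollback(self, version: str) -> None:
        # Same pointer move as promote; no version row is removed.
        self._move_pointer(version)


model = VersionedModel()
model.create("2.0.0")
model.promote("2.0.0")
model.rollback("1.0.0")
print(model.active, model.versions)
```

After the rollback, both version rows still exist and only the pointer has changed.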

The version-management routes are mounted under the same per-project inference scope as /v1/chat/completions; the API key in the Authorization header is the same key form (xero_<project>_<secret>) you use for inference requests. No separate management scope is required.

Version Response Fields (wire shape)

All version responses use snake_case JSON keys. Request bodies should use snake_case too, since it is the canonical form. camelCase keys may decode on input, but the rest of the OpenAI-compatible surface uses snake_case everywhere.

Wire key       Type            Description
id             string          Version row UUID.
object         string          Always "model.version" on a single-version response.
version        string          Semver version string (e.g., "1.0.0", "2.1.3"). Defaults to "1.0.0".
is_latest      boolean         Whether this is the active version. At most one version per model is marked as latest.
created        integer         Unix epoch seconds when the version row was created.
version_notes  string | null   Optional description of changes in this version.
status         string | null   Version lifecycle status (mirrors the parent model's status field).
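
A client might map the fields above onto a typed object. The following is a minimal sketch; the `ModelVersion` class and `from_wire` helper are illustrative names, not part of any Xerotier SDK, and the sample payload uses placeholder values.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ModelVersion:
    """Client-side view of a single version object (hypothetical helper)."""

    id: str
    object: str
    version: str
    is_latest: bool
    created: int
    version_notes: Optional[str] = None
    status: Optional[str] = None

    @classmethod
    def from_wire(cls, payload: dict[str, Any]) -> "ModelVersion":
        # Single-version responses always carry object == "model.version".
        if payload.get("object") != "model.version":
            raise ValueError(f"unexpected object type: {payload.get('object')!r}")
        return cls(
            id=payload["id"],
            object=payload["object"],
            version=payload.get("version", "1.0.0"),
            is_latest=bool(payload["is_latest"]),
            created=int(payload["created"]),
            # Nullable fields: absent and null are both treated as None here.
            version_notes=payload.get("version_notes"),
            status=payload.get("status"),
        )


sample = {
    "id": "VERSION_ROW_UUID",  # placeholder, not a real id
    "object": "model.version",
    "version": "2.0.0",
    "is_latest": True,
    "created": 1701000000,
    "version_notes": None,
    "status": "ready",
}
parsed = ModelVersion.from_wire(sample)
print(parsed.version, parsed.is_latest)
```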

Version Management with xeroctl

Shell
# List versions for a model (default action when only the model id is given)
xeroctl models versions <model-id>

# Create a new version
xeroctl models versions <model-id> --create 2.0.0 --notes "Improved accuracy"

# Promote a version to latest
xeroctl models versions <model-id> --promote 2.0.0

# Rollback to a previous version
xeroctl models versions <model-id> --rollback 1.0.0

Version Management with the API

curl
# List versions for a model
curl https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123"

# Create a new version
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "2.0.0",
    "version_notes": "Improved accuracy"
  }'

# Promote a version to latest
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions/2.0.0/promote \
  -H "Authorization: Bearer xero_my-project_abc123"

# Rollback to a previous version
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions/1.0.0/rollback \
  -H "Authorization: Bearer xero_my-project_abc123"

Example GET .../versions response (OpenAI list envelope):

{
  "object": "list",
  "data": [
    {
      "id": "9b1d...e2",
      "object": "model.version",
      "version": "1.0.0",
      "is_latest": false,
      "created": 1700000000,
      "version_notes": null,
      "status": "ready"
    },
    {
      "id": "c4f0...77",
      "object": "model.version",
      "version": "2.0.0",
      "is_latest": true,
      "created": 1701000000,
      "version_notes": "Improved accuracy",
      "status": "ready"
    }
  ]
}

Python
import requests

headers = {"Authorization": "Bearer xero_my-project_abc123"}
base = "https://api.xerotier.ai/proj_ABC123/v1"
model_id = "MODEL_ID"

# List versions (envelope: {"object": "list", "data": [...]})
versions = requests.get(
    f"{base}/models/{model_id}/versions",
    headers=headers
).json()
for v in versions.get("data", []):
    print(f"  {v['version']} (latest={v['is_latest']})")

# Create a new version
response = requests.post(
    f"{base}/models/{model_id}/versions",
    headers=headers,
    json={"version": "2.0.0", "version_notes": "Improved accuracy"}
)
print(f"Created version: {response.json()}")

# Promote a version
requests.post(
    f"{base}/models/{model_id}/versions/2.0.0/promote",
    headers=headers
)

# Rollback to a previous version
requests.post(
    f"{base}/models/{model_id}/versions/1.0.0/rollback",
    headers=headers
)

Node.js
// Requires Node.js 18+ (built-in fetch) and an ES module context (top-level await).
const headers = {
  "Authorization": "Bearer xero_my-project_abc123",
  "Content-Type": "application/json"
};
const base = "https://api.xerotier.ai/proj_ABC123/v1";
const modelId = "MODEL_ID";

// List versions (envelope: { object: "list", data: [...] })
const versionsResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  { headers }
);
const versions = await versionsResponse.json();
for (const v of (versions.data || [])) {
  console.log(`  ${v.version} (latest=${v.is_latest})`);
}

// Create a new version
const createResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      version: "2.0.0",
      version_notes: "Improved accuracy"
    })
  }
);
console.log("Created version:", await createResponse.json());

// Promote a version
await fetch(
  `${base}/models/${modelId}/versions/2.0.0/promote`,
  { method: "POST", headers }
);

// Rollback to a previous version
await fetch(
  `${base}/models/${modelId}/versions/1.0.0/rollback`,
  { method: "POST", headers }
);
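
Once a list envelope has been fetched, two common follow-ups are finding the active version and ordering the history. The sketch below assumes plain MAJOR.MINOR.PATCH strings (pre-release and build suffixes are not handled); `semver_key` and `active_version` are illustrative helper names, and the entries are trimmed to the fields the helpers read.

```python
from typing import Optional


def semver_key(version: str) -> tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a sortable tuple."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def active_version(envelope: dict) -> Optional[dict]:
    """Return the entry flagged is_latest, or None if no entry is flagged."""
    flagged = [v for v in envelope.get("data", []) if v.get("is_latest")]
    # The docs state at most one version per model is marked latest.
    if len(flagged) > 1:
        raise ValueError("more than one version is flagged is_latest")
    return flagged[0] if flagged else None


envelope = {
    "object": "list",
    "data": [
        {"version": "10.0.0", "is_latest": False},
        {"version": "2.0.0", "is_latest": True},
        {"version": "1.0.0", "is_latest": False},
    ],
}

# Numeric ordering: a plain string sort would place "10.0.0" before "2.0.0".
history = sorted((v["version"] for v in envelope["data"]), key=semver_key)
current = active_version(envelope)
print(history)
print(current["version"])
```

Note that the active version need not be the highest one: after a rollback, the pointer can sit on an older version.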

02 Model Metadata

Each model carries extensive metadata that affects routing, inference behavior, and display in the catalog.

Core Properties

Field names below are the snake_case JSON keys emitted on the wire (matching the RouterModelExtended CodingKeys).

Wire key         Type             Description
name             string           Model display name.
format           string           Storage format: safetensors, bin, exl2, or directory.
size_bytes       integer          Total model size in bytes.
status           string           Current state: uploading, validating, ready, or error.
architecture     string | null    Model architecture family (e.g., "llama", "qwen", "mistral").
parameter_count  integer | null   Number of model parameters.
context_length   integer | null   Maximum context window in tokens. Affects max_tokens auto-clamping at the router.
workload_type    string | null    Workload classification (see Workload Types below).
is_shared        boolean | null   Whether the model is published to the shared catalog.
catalog_role     string | null    Catalog role: deployable or shared.

Additional model-metadata columns extracted at validation time (architecture-derived fields such as hidden size, layer count, vocab size, torch dtype, and MoE expert counts) are stored on the database row and used internally for VRAM estimation. They are not surfaced on the public RouterModelExtended response today; consult the model catalog admin tooling if you need to inspect them.
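
The exact clamping rule the router applies to max_tokens is not spelled out here. As a rough illustration only, the sketch below assumes that prompt tokens plus completion tokens must fit within context_length; the function name and its arguments are hypothetical.

```python
from typing import Optional


def clamp_max_tokens(
    requested: int, prompt_tokens: int, context_length: Optional[int]
) -> int:
    """Clamp a requested max_tokens to what still fits in the context window.

    Assumption: prompt and completion share one window of context_length tokens.
    """
    if context_length is None:
        # context_length is nullable on the wire; with no limit known, pass through.
        return requested
    remaining = max(context_length - prompt_tokens, 0)
    return min(requested, remaining)


print(clamp_max_tokens(4096, 7000, 8192))  # request exceeds the remaining window
print(clamp_max_tokens(256, 100, 8192))    # request already fits
```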

03 Model Lifecycle

Models progress through a series of status transitions from upload to availability. The diagram is authoritative; the table below mirrors it for screen readers and search.

stateDiagram-v2
    direction LR
    [*] --> uploading
    uploading --> validating: upload complete
    validating --> ready: metadata valid
    validating --> error: validation failed
    error --> validating: revalidate
    ready --> [*]
// fig 03.1 -- upload to ready, with the revalidate loop

Status      Description
uploading   Model files are being uploaded. Not yet available for inference.
validating  Upload complete. The system is validating model files, extracting metadata (architecture, parameters, context length), and checking compatibility.
ready       Validation passed. The model can be assigned to endpoints and loaded on backends.
error       Validation failed. The validationError field carries the diagnostic. Fix the upstream artifact and trigger a revalidate (see below) or re-upload.
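
Client code that tracks model status can encode the diagram as a small transition table. This is a sketch that mirrors only the edges drawn in the diagram; the helper names are illustrative.

```python
# Allowed status transitions, copied edge for edge from the state diagram.
ALLOWED_TRANSITIONS = {
    "uploading": {"validating"},        # upload complete
    "validating": {"ready", "error"},   # metadata valid / validation failed
    "error": {"validating"},            # revalidate
    "ready": set(),                     # no outgoing status transition in the diagram
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram contains an edge current -> target."""
    return target in ALLOWED_TRANSITIONS.get(current, set())


def is_servable(status: str) -> bool:
    """Only a ready model can be assigned to endpoints and loaded on backends."""
    return status == "ready"


print(can_transition("error", "validating"))
print(can_transition("uploading", "ready"))
```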

Trigger revalidation with xeroctl models revalidate <model-id> or POST /{project_id}/v1/models/{model_id}/revalidate. This re-checks model files and refreshes metadata without a re-upload.
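
After triggering a revalidate, a client will usually want to wait until the model settles in ready or error. The sketch below keeps the polling loop separate from HTTP: `fetch_status` is a hypothetical callable you supply (for example, one that reads the model's status field from the API; that read is not documented in this section), and the demo drives the loop with scripted values instead of a live call.

```python
import time
from typing import Callable


def wait_until_settled(
    fetch_status: Callable[[], str],
    poll_seconds: float = 2.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the model status is 'ready' or 'error', then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in ("ready", "error"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"model still {status!r} after {timeout_seconds}s")
        sleep(poll_seconds)


# Demo with a scripted status source instead of a live API call.
scripted = iter(["validating", "validating", "ready"])
final = wait_until_settled(lambda: next(scripted), sleep=lambda _seconds: None)
print(final)
```

Injecting `sleep` and `fetch_status` keeps the loop easy to test and lets you choose your own HTTP client and backoff.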

Model Loading

When a model is assigned to an endpoint and a request arrives, the router sends a load request to a compatible backend. The backend auto-configures:

  • Context length: auto-detected from the model config if not specified.
  • Max sequences: auto-calculated from available resources.
  • Quantization: selected based on model size versus available GPU VRAM (see Quantization).

04 Workload Types

Each model can be tagged with a workload type that describes its primary use case. Workload type is used for filtering in the model catalog and does not affect inference behavior or routing.

Type          Description
chat          General-purpose conversational models. Default workload type.
code          Code generation and completion models.
reasoning     Models optimized for chain-of-thought and analytical tasks.
embedding     Text embedding models for semantic search and similarity.
multilingual  Models with strong multilingual support.
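
Because workload type is a catalog filter, client-side filtering is straightforward. The sketch below assumes model objects carry the workload_type wire key and treats a null value as the default type chat; that null handling is an assumption, and the model names are made up.

```python
WORKLOAD_TYPES = ("chat", "code", "reasoning", "embedding", "multilingual")


def filter_by_workload(models: list[dict], workload_type: str) -> list[dict]:
    """Return the models tagged with the given workload type."""
    if workload_type not in WORKLOAD_TYPES:
        raise ValueError(f"unknown workload type: {workload_type!r}")
    # Assumption: a null workload_type falls back to the default, "chat".
    return [m for m in models if (m.get("workload_type") or "chat") == workload_type]


catalog = [
    {"name": "example-chat", "workload_type": None},
    {"name": "example-coder", "workload_type": "code"},
    {"name": "example-embed", "workload_type": "embedding"},
]
code_models = [m["name"] for m in filter_by_workload(catalog, "code")]
chat_models = [m["name"] for m in filter_by_workload(catalog, "chat")]
print(code_models, chat_models)
```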

05 Quantization

Quantization reduces model size and memory requirements by using lower-precision number formats. Xerotier supports both pre-quantized models and runtime quantization.

Pre-Quantized Models

Some models are distributed with quantization already applied. The fields below are camelCase on the wire because they originate on the backend-agent load acknowledgment, not on the public model API.

Field                  Surface           Description
preQuantizationMethod  backend-internal  Method used: compressed-tensors, gptq, or awq.
preQuantizationBits    backend-internal  Bit-width of the pre-quantized weights. Common values are 4 and 8; the field is a free-form integer and may carry other widths if the upstream artifact does.

Runtime Quantization

If a model does not fit in available GPU memory at full precision, the backend agent can apply runtime quantization automatically. The load acknowledgment carries:

Field                      Surface           Description
appliedQuantization        backend-internal  Method applied (e.g., bitsandbytes, fp8, bitsandbytes-fp4, awq, gptq).
appliedQuantizationReason  backend-internal  Free-form rationale string set by the backend agent describing why the method was chosen.

Runtime quantization is transparent to the user. The model behaves the same way, at slightly reduced numerical precision and with a smaller memory footprint.
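
The backend agent's selection logic is internal, but the "model size versus available VRAM" idea can be illustrated with back-of-the-envelope arithmetic. This sketch assumes 16-bit weights at full precision, counts weight memory only (no KV cache or activation overhead), and tries fp8 and then bitsandbytes-fp4; the function names and the ordering are assumptions, not the agent's actual policy.

```python
from typing import Optional

# (method, bits per weight), tried from highest precision to lowest.
# "none" means full precision, assumed here to be 16-bit weights.
CANDIDATES = [("none", 16), ("fp8", 8), ("bitsandbytes-fp4", 4)]

GIB = 1024**3


def weight_bytes(parameter_count: int, bits: int) -> int:
    """Approximate memory needed for the weights alone."""
    return parameter_count * bits // 8


def pick_quantization(parameter_count: int, free_vram_bytes: int) -> Optional[str]:
    """Return the highest-precision candidate whose weights fit, else None."""
    for method, bits in CANDIDATES:
        if weight_bytes(parameter_count, bits) <= free_vram_bytes:
            return method
    return None


print(pick_quantization(7_000_000_000, 24 * GIB))
print(pick_quantization(70_000_000_000, 80 * GIB))
print(pick_quantization(70_000_000_000, 48 * GIB))
```

A real estimate would also budget for the KV cache, which grows with context length and the number of concurrent sequences.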

MoE Model Support

Mixture-of-Experts (MoE) models are supported. MoE-specific metadata fields are stored on the model row (total experts and experts activated per token) and consumed by the backend agent at load time to select tuned kernel configurations. The agent picks an MoE kernel profile automatically; there is no per-request public API knob.