# Model Versioning & Properties

Version management, model metadata, lifecycle states, and quantization options.

## Model Versioning
Every model in Xerotier has a semantic version (semver) string. When you
upload a model, it is assigned version 1.0.0 by default. You
can create new versions of a model, track version history, and control
which version is active.
### Version Fields

| Field | Type | Description |
|---|---|---|
| `version` | string | Semver version string (e.g., `"1.0.0"`, `"2.1.3"`). Defaults to `"1.0.0"`. |
| `parentVersionId` | UUID \| null | Reference to the previous version. Null for the first version. |
| `isLatest` | boolean | Whether this is the active version. Only one version per model name per project should be marked as latest. |
| `versionNotes` | string \| null | Description of changes in this version. |
### Version Management with xeroctl

```bash
# List versions for a model
xeroctl models versions list <model-id>

# Create a new version
xeroctl models versions create <model-id> 2.0.0 --notes "Improved accuracy"

# Promote a version to latest
xeroctl models versions promote <model-id> 2.0.0

# Rollback to a previous version
xeroctl models versions rollback <model-id> 1.0.0
```
Endpoints resolve to the latest version of a model by default. When you promote a version, the endpoint automatically picks up the new version on subsequent requests.
### Version Management with the API

```bash
# List versions for a model
curl https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123"

# Create a new version
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions \
  -H "Authorization: Bearer xero_my-project_abc123" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "2.0.0",
    "versionNotes": "Improved accuracy"
  }'

# Promote a version to latest
curl -X POST https://api.xerotier.ai/proj_ABC123/v1/models/MODEL_ID/versions/2.0.0/promote \
  -H "Authorization: Bearer xero_my-project_abc123"
```
```python
import requests

headers = {"Authorization": "Bearer xero_my-project_abc123"}
base = "https://api.xerotier.ai/proj_ABC123/v1"
model_id = "MODEL_ID"

# List versions
versions = requests.get(
    f"{base}/models/{model_id}/versions",
    headers=headers
).json()
for v in versions.get("versions", []):
    print(f"  {v['version']} (latest={v['isLatest']})")

# Create a new version
response = requests.post(
    f"{base}/models/{model_id}/versions",
    headers=headers,
    json={"version": "2.0.0", "versionNotes": "Improved accuracy"}
)
print(f"Created version: {response.json()}")

# Promote a version
requests.post(
    f"{base}/models/{model_id}/versions/2.0.0/promote",
    headers=headers
)
```
```javascript
const headers = {
  "Authorization": "Bearer xero_my-project_abc123",
  "Content-Type": "application/json"
};
const base = "https://api.xerotier.ai/proj_ABC123/v1";
const modelId = "MODEL_ID";

// List versions
const versionsResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  { headers }
);
const versions = await versionsResponse.json();
for (const v of versions.versions || []) {
  console.log(`  ${v.version} (latest=${v.isLatest})`);
}

// Create a new version
const createResponse = await fetch(
  `${base}/models/${modelId}/versions`,
  {
    method: "POST",
    headers,
    body: JSON.stringify({
      version: "2.0.0",
      versionNotes: "Improved accuracy"
    })
  }
);
console.log("Created version:", await createResponse.json());

// Promote a version
await fetch(
  `${base}/models/${modelId}/versions/2.0.0/promote`,
  { method: "POST", headers }
);
```
## Model Metadata

Each model carries extensive metadata that affects routing, inference behavior, and display in the catalog.

### Core Properties

| Field | Type | Description |
|---|---|---|
| `name` | string | Model display name. |
| `format` | string | Storage format: `safetensors`, `bin`, `exl2`, or `directory`. |
| `sizeBytes` | integer | Total model size in bytes. |
| `status` | string | Current state: `uploading`, `validating`, `ready`, or `error`. |
| `architecture` | string \| null | Model architecture family (e.g., `"llama"`, `"qwen"`, `"mistral"`). |
| `parameterCount` | integer \| null | Number of model parameters. |
| `contextLength` | integer \| null | Maximum context window in tokens. Affects `max_tokens` auto-clamping at the router. |
| `license` | string \| null | Model license identifier. |
| `isMultimodal` | boolean | Whether the model supports image/multimodal input. Default: `false`. |
### Generation Defaults

| Field | Type | Description |
|---|---|---|
| `defaultTemperature` | double \| null | Model's default temperature if not specified in the request. |
| `defaultTopP` | double \| null | Model's default `top_p` if not specified in the request. |
| `chatTemplate` | string \| null | Jinja2 template for message formatting (from `tokenizer_config.json`). |
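How these defaults interact with request parameters can be sketched as follows. The precedence shown (request value, then model default, then a generic fallback) follows the table's wording, but the fallback values and function name are assumptions:

```python
def resolve_sampling_params(request: dict, model: dict) -> dict:
    """Pick temperature/top_p: the request value wins, else the model's
    default, else a generic server-wide fallback."""
    fallbacks = {"temperature": 1.0, "top_p": 1.0}
    resolved = {}
    for param, model_field in [("temperature", "defaultTemperature"),
                               ("top_p", "defaultTopP")]:
        if request.get(param) is not None:
            resolved[param] = request[param]
        elif model.get(model_field) is not None:
            resolved[param] = model[model_field]
        else:
            resolved[param] = fallbacks[param]
    return resolved

model = {"defaultTemperature": 0.6, "defaultTopP": None}
print(resolve_sampling_params({"temperature": None, "top_p": 0.9}, model))
# -> {'temperature': 0.6, 'top_p': 0.9}
```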
## Model Lifecycle

Models progress through a series of status transitions from upload to availability:

| Status | Description |
|---|---|
| `uploading` | Model files are being uploaded. Not yet available for inference. |
| `validating` | Upload complete. The system is validating model files, extracting metadata (architecture, parameters, context length), and checking compatibility. |
| `ready` | Validation passed. The model can be assigned to endpoints and loaded on backends. |
| `error` | Validation failed. The `validationError` field contains details about what went wrong. Fix the issue and re-upload or revalidate. |
You can trigger revalidation of a model using `xeroctl models revalidate <model-id>` or the `POST /models/:modelId/revalidate` API endpoint. This re-checks model files and updates metadata without re-uploading.
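After triggering a revalidation, a client typically polls until the model leaves `validating`. A minimal sketch of such a poll loop; how the model object is fetched is left to a callable you supply, since the exact status-polling endpoint is not documented on this page:

```python
import time

TERMINAL_STATUSES = {"ready", "error"}

def wait_for_validation(fetch_model, poll_interval=2.0, timeout=300.0):
    """Poll fetch_model() until the model's status is 'ready' or 'error'.

    fetch_model is any callable returning the model object as a dict,
    e.g. one that GETs the model resource with your project API key.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        model = fetch_model()
        if model.get("status") in TERMINAL_STATUSES:
            return model
        time.sleep(poll_interval)
    raise TimeoutError("model did not finish validating in time")
```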
## Model Loading
When a model is assigned to an endpoint and a request arrives, the router sends a load request to a compatible backend. The backend auto-configures:
- Context length -- Auto-detected from model config if not specified.
- Max sequences -- Auto-calculated based on available resources.
- Quantization -- Selected based on model size vs available GPU VRAM (see Quantization).
## Workload Types

Each model can be tagged with a workload type that describes its primary use case:

| Type | Description |
|---|---|
| `chat` | General-purpose conversational models. Default workload type. |
| `code` | Code generation and completion models. |
| `reasoning` | Models optimized for chain-of-thought and analytical tasks. |
| `embedding` | Text embedding models for semantic search and similarity. |
| `multilingual` | Models with strong multilingual support. |
Workload type is used for filtering in the model catalog and does not affect inference behavior or routing.
## Quantization
Quantization reduces model size and memory requirements by using lower-precision number formats. Xerotier supports both pre-quantized models and runtime quantization.
### Pre-Quantized Models

Some models are distributed with quantization already applied. These models have `isPreQuantized: true` and include details about the method used:

| Field | Description |
|---|---|
| `preQuantizationMethod` | Method used: `compressed-tensors`, `gptq`, or `awq`. |
| `preQuantizationBits` | Precision: 4 or 8 bits. |
### Runtime Quantization
If a model does not fit in available GPU memory at full precision, the backend agent can apply runtime quantization automatically. The load acknowledgment includes:
- `appliedQuantization` -- The method applied (e.g., `"bitsandbytes"`, `"fp8"`, `"bitsandbytes-fp4"`, `"awq"`, `"gptq"`).
- `quantizationReason` -- Why this method was chosen: `"native_fits"`, `"pre_quantized"`, `"runtime_quantization"`, or `"cannot_fit"`.
Runtime quantization is transparent to the user. The model functions the same way, but with slightly reduced precision and memory footprint.
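The selection logic could be sketched as below. The reason strings come from the documented enumeration, but the size thresholds and the choice of specific methods at each tier are assumptions, not Xerotier's actual algorithm:

```python
def choose_quantization(model: dict, gpu_free_bytes: int):
    """Illustrative runtime quantization selection.

    Returns (applied_quantization, quantization_reason)."""
    size = model["sizeBytes"]
    if model.get("isPreQuantized"):
        # Already quantized at rest: load as-is.
        return model.get("preQuantizationMethod"), "pre_quantized"
    if size <= gpu_free_bytes:
        return None, "native_fits"
    if size / 2 <= gpu_free_bytes:   # ~8-bit roughly halves the footprint
        return "bitsandbytes", "runtime_quantization"
    if size / 4 <= gpu_free_bytes:   # ~4-bit roughly quarters it
        return "bitsandbytes-fp4", "runtime_quantization"
    return None, "cannot_fit"
```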
## MoE Model Support

Mixture-of-Experts (MoE) models are supported. MoE-specific metadata fields include `numExperts` (total experts in the model) and `numExpertsPerTok` (experts activated per token). MoE models can benefit from tuned kernel configurations, configurable via the backend agent's `--enable-moe-config` flag.