XEM Architecture
Four deployable components, one concern each. XIM reasons on GPU hosts, XEM executes next to the target infrastructure, the router brokers every message, and the browser frontend is a thin proxy. The router is the only piece that knows the other three exist.
Overview
The split lives along operational boundaries. Inference is stateless across requests. Execution is stateless across invocations. The router carries all shared state, database, queue, approval engine, audit log, and arbitrates every message between the other three.
Concept Map
flowchart TB
subgraph Frontend
Chat["Immersive chat
(SSE + fetch)"]
Ops["/ops dashboard
(SSE + fetch)"]
end
subgraph Router["Router (HTTPS)"]
InfAPI["Inference API
/:project_id/:endpoint_slug/v1/responses"]
ExecAPI["Exec API
/:project_id/v1/exec/*"]
MgmtAPI["Management API
(multiple namespaces)"]
Sched["Tier scheduler
+ worker health"]
Approval["Approval engine"]
Audit["Audit log
(Postgres)"]
PG["Postgres
(durable state)"]
Redis["Redis
(optional rate-limit + cache)"]
InfAPI --> Sched
InfAPI --> Audit
ExecAPI --> Approval
ExecAPI --> Audit
MgmtAPI --> Audit
Sched --> PG
Approval --> PG
Audit --> PG
Sched -. "optional" .-> Redis
end
XIM["XIM Agent
(inference)
vLLM / GPU"]
XEM["XEM Agent
(execution)
Tool bundles"]
Infra["Target infra
(k8s, OpenStack, cloud, ...)"]
Chat --> Router
Ops --> Router
Sched -. "CurveZMQ + MessagePack" .-> XIM
Approval -. "CurveZMQ + MessagePack" .-> XEM
XEM --> Infra
XIM, Inference
The XIM Agent (Xerotier Inference Microservice) is the reasoning
tier. Each XIM host enrolls with the router via a single-use
join key by posting to POST /v1/enroll, receives a
signed bearer token, and refreshes it through
POST /v1/enroll/refresh before the token's
expiry window closes. XIM presents one or more vLLM-served
models; the router picks a XIM for each request based on
model availability, worker health, tier, and prefix-cache
affinity.
Minting and rotating join keys
XIM does not know about XEM. When the model emits a tool call, the call goes back to the router, not directly to an executor.
XEM, Execution
The XEM Agent (Xerotier Execution Microservice) is the action
tier. Each XEM host enrolls with the router through the same
POST /v1/enroll + POST /v1/enroll/refresh
pair as XIM, publishes a capability manifest naming every tool
it can execute (the manifest names capabilities only, credential
material never leaves the XEM host), and waits for dispatched
invocations.
Once enrolled, the XEM drives a periodic lease-renewal loop
that re-arms the scheduler with
leaseDurationMs = leaseRenewalIntervalMs * 3; if
the agent misses heartbeats past
XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS, the router
declares the worker stale and stops dispatching to it.
XEM requires no GPU, no model weights, and no vLLM, it is a
single Swift binary alongside its local credentials. A XEM can
serve multiple operational workspaces; each workspace binds to
one or more XEMs via the workspace_agent_bindings
table.
Router
The router is the control plane. It owns every database table, every authentication decision, every scope check, every rate limit, and every approval gate. It is the only component a customer sees on the wire; XIM and XEM both sit behind the router's CurveZMQ mesh.
The router's major subsystems are the inference scheduler (XIM selection), the exec dispatcher (XEM selection), the approval engine (human-in-the-loop gates), the audit log writer, and the SSE streamer that fans events back to the frontend.
Frontend
The frontend is Swift 6.2 + Hummingbird + Mustache + vanilla
JavaScript. It renders the immersive chat, the
/ops dashboard, and every administrative surface. It
is a thin proxy, every mutation goes to the router over a
scoped API key. No business logic lives in the frontend.
The Agentic Loop
A single conversation turn flows through every component:
- The user types in the immersive chat; the frontend posts
to
/v1/responses. - The router picks a XIM, dispatches inference, and streams
tokens back as
response.*SSE events. - The model emits a
tool_callsresponse naming one or more tools. The router'sServerToolRegistryPartitionerclassifies each call: server-side built-ins run inside the router, and any remaining call lands on the workspace's XEM. (x_execitself is the SSE event-namespace prefix for execution progress, not a flag on the tool-call envelope.) - The router validates the scope, looks up the call's risk classification in the manifest, and either auto-approves (read-only, non-destructive) or opens an approval gate (destructive, irreversible).
- Once approved, the router dispatches to the workspace's XEM via CurveZMQ.
- The XEM executes the tool, streams progress updates as SSE events, and returns the final result.
- The router re-enters inference with the tool result appended; the model synthesizes a reply.
- The loop continues until the model emits a final reply with no tool calls.
Three agentic primitives layer on top of the base loop:
auto-fork-branch (the router forks the chat
branch before any irreversible call so the operator can roll
back), auto-artifact emission (large tool
results divert to the artifact store instead of inlining --
the threshold is currently a non-tunable default), and
sub-agentic request_subplan
(the model can spawn a nested loop with a restricted scope).
The mockup tool family
(x_add_mockup_file / x_update_mockup)
is the canonical agentic surface for incremental multi-file
bundles and rides this same loop.
Data Plane
- Frontend to router: HTTPS with API-key
authentication. Every mutation runs under a scoped API
key; the
executionscope is the only constant required by the exec router, while other surfaces enforce their own scope sets. - Router to XIM: HTTP to vLLM on an internal network; the router owns the scheduler and does not expose XIM directly.
- Router to XEM: CurveZMQ on the mesh port, MessagePack-framed envelopes with W3C traceContext headers for distributed tracing. Every frame is encrypted with the XEM's per-enrollment CURVE key.
- Browser to router: SSE over HTTPS for streaming; fetch-POST for commands.
Where State Lives
| State | Owner | Notes |
|---|---|---|
| Chats, branches, memories, artifacts | Postgres (router) | Workspace-scoped. Branch and memory writes are serialized per conversation. |
| Approval state + audit trail | Postgres (router) | Every approval decision recorded immutably in
execution_audit_log. |
| Rate-limit windows | Redis or in-memory (router) | Default in-memory; switch to Redis via
XEROTIER_CACHE_BACKEND. XEM
invocations enforce a four-layer sliding window
(project / workspace / agent / global); per-user
API rate limiting is a separate middleware
tuned by XEROTIER_API_RATE_LIMIT_RPM. |
| Worker health + prefix-cache index | Router in-memory (derived) | Rehydrated from heartbeats; no durable state. |
| Tool credentials | XEM local filesystem | Never leaves the XEM host. The router only stores the capability manifest (which tools exist), not the credential material itself. |
| Model weights | XIM local filesystem | Pulled from object storage at startup; cached on the XIM host. |