// Execution Management (XEM)

XEM Architecture

Four deployable components, one concern each. XIM reasons on GPU hosts, XEM executes next to the target infrastructure, the router brokers every message, and the browser frontend is a thin proxy. The router is the only piece that knows the other three exist.

Overview

The split follows operational boundaries. Inference is stateless across requests. Execution is stateless across invocations. The router carries all shared state (database, queue, approval engine, audit log) and arbitrates every message between the other three.

Concept Map

flowchart TB
    subgraph Frontend
        Chat["Immersive chat<br/>(SSE + fetch)"]
        Ops["/ops dashboard<br/>(SSE + fetch)"]
    end

    subgraph Router["Router (HTTPS)"]
        InfAPI["Inference API<br/>/:project_id/:endpoint_slug/v1/responses"]
        ExecAPI["Exec API<br/>/:project_id/v1/exec/*"]
        MgmtAPI["Management API<br/>(multiple namespaces)"]
        Sched["Tier scheduler<br/>+ worker health"]
        Approval["Approval engine"]
        Audit["Audit log<br/>(Postgres)"]
        PG["Postgres<br/>(durable state)"]
        Redis["Redis<br/>(optional rate-limit + cache)"]

        InfAPI --> Sched
        InfAPI --> Audit
        ExecAPI --> Approval
        ExecAPI --> Audit
        MgmtAPI --> Audit
        Sched --> PG
        Approval --> PG
        Audit --> PG
        Sched -. "optional" .-> Redis
    end

    XIM["XIM Agent<br/>(inference)<br/>vLLM / GPU"]
    XEM["XEM Agent<br/>(execution)<br/>Tool bundles"]
    Infra["Target infra<br/>(k8s, OpenStack, cloud, ...)"]

    Chat --> Router
    Ops --> Router
    Sched -. "CurveZMQ + MessagePack" .-> XIM
    Approval -. "CurveZMQ + MessagePack" .-> XEM
    XEM --> Infra

// frontend -> router brokers every message -> XIM reasons, XEM executes

XIM: Inference

The XIM Agent (Xerotier Inference Microservice) is the reasoning tier. Each XIM host enrolls with the router by sending a single-use join key to POST /v1/enroll, receives a signed bearer token, and refreshes it through POST /v1/enroll/refresh before the token's expiry window closes. A XIM exposes one or more vLLM-served models; the router picks a XIM for each request based on model availability, worker health, tier, and prefix-cache affinity.
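
The selection criteria above can be pictured as a filter-then-rank step. The following is a minimal sketch, not the router's scheduler: the `Worker` fields, the "lower tier number wins" ordering, and the choice to rank tier ahead of cache affinity are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    # Hypothetical view of an enrolled XIM host; field names are illustrative.
    name: str
    models: set[str]
    healthy: bool
    tier: int  # assumed: lower number means a more preferred tier
    cached_prefixes: set[str] = field(default_factory=set)


def pick_xim(workers: list[Worker], model: str, prefix_hash: str) -> Worker | None:
    """Filter on model availability and health, then rank by tier and affinity."""
    candidates = [w for w in workers if w.healthy and model in w.models]
    if not candidates:
        return None
    # Sort key: preferred tier first, then workers that already hold the prefix
    # (False sorts before True, so a cache hit wins within a tier).
    return min(candidates, key=lambda w: (w.tier, prefix_hash not in w.cached_prefixes))
```

Unhealthy workers and workers that do not serve the requested model never reach the ranking step, so a cache hit on a stale host cannot attract traffic.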

XIM does not know about XEM. When the model emits a tool call, the call goes back to the router, not directly to an executor.

XEM: Execution

The XEM Agent (Xerotier Execution Microservice) is the action tier. Each XEM host enrolls with the router through the same POST /v1/enroll + POST /v1/enroll/refresh pair as XIM, publishes a capability manifest naming every tool it can execute (the manifest names capabilities only; credential material never leaves the XEM host), and waits for dispatched invocations.
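
Because both agent types must refresh their bearer token before it expires, an agent needs some rule for when to call POST /v1/enroll/refresh. The sketch below shows one plausible rule, refreshing after a fixed fraction of the token lifetime. The function name and the 0.75 fraction are assumptions; the actual refresh policy is not described here.

```python
def next_refresh_at(issued_at_s: float, expires_at_s: float, fraction: float = 0.75) -> float:
    """Return the time at which to refresh, as a fraction of the token lifetime.

    `fraction` is a hypothetical tuning value, not a documented setting.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be strictly between 0 and 1")
    if expires_at_s <= issued_at_s:
        raise ValueError("token must expire after it was issued")
    lifetime_s = expires_at_s - issued_at_s
    return issued_at_s + lifetime_s * fraction
```

Refreshing well before expiry leaves time to retry if the first refresh attempt fails.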

Once enrolled, the XEM drives a periodic lease-renewal loop that re-arms the scheduler with leaseDurationMs = leaseRenewalIntervalMs * 3; if the agent misses heartbeats past XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS, the router declares the worker stale and stops dispatching to it.
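
The lease arithmetic and the staleness check can be written out directly. This is a sketch of the relationships stated above; the function names are illustrative, and the timeout argument stands in for whatever value XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS holds in a given deployment.

```python
def lease_duration_ms(lease_renewal_interval_ms: int) -> int:
    """Lease length as stated above: three renewal intervals."""
    return lease_renewal_interval_ms * 3


def is_stale(last_heartbeat_ms: int, now_ms: int, heartbeat_timeout_ms: int) -> bool:
    """True once the gap since the last heartbeat exceeds the timeout."""
    return now_ms - last_heartbeat_ms > heartbeat_timeout_ms
```

With an illustrative 10-second renewal interval the lease lasts 30 seconds, which leaves room for roughly two missed renewals before the lease lapses.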

XEM requires no GPU, no model weights, and no vLLM; it is a single Swift binary alongside its local credentials. A XEM can serve multiple operational workspaces; each workspace binds to one or more XEMs via the workspace_agent_bindings table.

Router

The router is the control plane. It owns every database table, every authentication decision, every scope check, every rate limit, and every approval gate. It is the only component a customer sees on the wire; XIM and XEM both sit behind the router's CurveZMQ mesh.

The router's major subsystems are the inference scheduler (XIM selection), the exec dispatcher (XEM selection), the approval engine (human-in-the-loop gates), the audit log writer, and the SSE streamer that fans events back to the frontend.

Frontend

The frontend is Swift 6.2 + Hummingbird + Mustache + vanilla JavaScript. It renders the immersive chat, the /ops dashboard, and every administrative surface. It is a thin proxy: every mutation goes to the router under a scoped API key. No business logic lives in the frontend.

The Agentic Loop

A single conversation turn flows through every component:

1. The user types in the immersive chat; the frontend posts to /v1/responses.
2. The router picks a XIM, dispatches inference, and streams tokens back as response.* SSE events.
3. The model emits a tool_calls response naming one or more tools. The router's ServerToolRegistryPartitioner classifies each call: server-side built-ins run inside the router, and any remaining call lands on the workspace's XEM. (x_exec itself is the SSE event-namespace prefix for execution progress, not a flag on the tool-call envelope.)
4. The router validates the scope, looks up the call's risk classification in the manifest, and either auto-approves (read-only, non-destructive) or opens an approval gate (destructive, irreversible).
5. Once approved, the router dispatches to the workspace's XEM via CurveZMQ.
6. The XEM executes the tool, streams progress updates as SSE events, and returns the final result.
7. The router re-enters inference with the tool result appended; the model synthesizes a reply.
8. The loop continues until the model emits a final reply with no tool calls.
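
The classification and approval steps in the loop above amount to a small routing decision per tool call. The sketch below illustrates that decision only; the tool names, the `MANIFEST` and `SERVER_SIDE` tables, the returned labels, and the choice to reject tools missing from the manifest are all assumptions, not the router's actual behavior.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    READ_ONLY = "read_only"
    DESTRUCTIVE = "destructive"


@dataclass
class ToolCall:
    name: str


# Hypothetical capability manifest: tool name -> risk classification.
MANIFEST = {"list_pods": Risk.READ_ONLY, "delete_volume": Risk.DESTRUCTIVE}
# Hypothetical set of built-ins that run inside the router.
SERVER_SIDE = {"web_search"}


def route(call: ToolCall) -> str:
    """Decide where a tool call goes and whether it needs a human gate."""
    if call.name in SERVER_SIDE:
        return "run_in_router"
    risk = MANIFEST.get(call.name)
    if risk is None:
        # Assumption: a tool absent from the manifest is refused outright.
        return "reject"
    if risk is Risk.READ_ONLY:
        return "dispatch_to_xem"  # auto-approved
    return "open_approval_gate"   # waits for an operator decision
```

Keeping the decision in one function mirrors the document's point that approval lives in the router: the XEM only ever sees calls that have already passed this check.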

Three agentic primitives layer on top of the base loop: auto-fork-branch (the router forks the chat branch before any irreversible call so the operator can roll back), auto-artifact emission (large tool results divert to the artifact store instead of inlining; the threshold is currently a non-tunable default), and sub-agentic request_subplan (the model can spawn a nested loop with a restricted scope). The mockup tool family (x_add_mockup_file / x_update_mockup) is the canonical agentic surface for incremental multi-file bundles and rides this same loop.

Data Plane

Frontend to router: HTTPS with API-key authentication. Every mutation runs under a scoped API key; the execution scope is the only constant required by the exec router, while other surfaces enforce their own scope sets.
Router to XIM: HTTP to vLLM on an internal network; the router owns the scheduler and does not expose XIM directly.
Router to XEM: CurveZMQ on the mesh port, MessagePack-framed envelopes with W3C traceContext headers for distributed tracing. Every frame is encrypted with the XEM's per-enrollment CURVE key.
Browser to router: SSE over HTTPS for streaming; fetch-POST for commands.
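
To make the tracing headers on the router-to-XEM link concrete, the sketch below builds a W3C Trace Context `traceparent` value and attaches it to an envelope. The `traceparent` layout (version, 16-byte trace ID, 8-byte parent ID, flags) follows the W3C specification; the envelope shape and its field names are hypothetical, and the MessagePack encoding and CURVE encryption steps are omitted because they need third-party libraries.

```python
import re
import secrets

# version "00", 32 hex chars of trace ID, 16 hex chars of parent ID, 2 hex flag chars
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")


def new_traceparent(sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` value (version 00)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    parent_id = secrets.token_hex(8)  # 8 random bytes -> 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def make_envelope(tool: str, args: dict) -> dict:
    # Hypothetical envelope shape; the real field names are not specified here.
    return {"headers": {"traceparent": new_traceparent()}, "tool": tool, "args": args}
```

In a real deployment the resulting dictionary would be serialized and encrypted before it leaves the router; only the header construction is shown here.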

Where State Lives

// state ownership across the four components
State	Owner	Notes
Chats, branches, memories, artifacts	Postgres (router)	Workspace-scoped. Branch and memory writes are serialized per conversation.
Approval state + audit trail	Postgres (router)	Every approval decision recorded immutably in `execution_audit_log`.
Rate-limit windows	Redis or in-memory (router)	Default in-memory; switch to Redis via `XEROTIER_CACHE_BACKEND`. XEM invocations enforce a four-layer sliding window (project / workspace / agent / global); per-user API rate limiting is a separate middleware tuned by `XEROTIER_API_RATE_LIMIT_RPM`.
Worker health + prefix-cache index	Router in-memory (derived)	Rehydrated from heartbeats; no durable state.
Tool credentials	XEM local filesystem	Never leaves the XEM host. The router only stores the capability manifest (which tools exist), not the credential material itself.
Model weights	XIM local filesystem	Pulled from object storage at startup; cached on the XIM host.
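
The four-layer sliding window mentioned in the rate-limit row can be sketched as four independent windows that must all have room before an invocation is admitted. This is an in-memory illustration under assumed limits; the class, the function, and every number are hypothetical, and the real enforcement order and limits are not documented here.

```python
from __future__ import annotations

from collections import defaultdict, deque


class SlidingWindow:
    """Allow at most `limit` events per key within the last `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.events: dict[str, deque[float]] = defaultdict(deque)

    def has_room(self, key: str, now: float) -> bool:
        q = self.events[key]
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        return len(q) < self.limit

    def record(self, key: str, now: float) -> None:
        self.events[key].append(now)


def allow(layers: dict[str, SlidingWindow], keys: dict[str, str], now: float) -> bool:
    """Admit only if every layer has room, then record the event in all of them."""
    if not all(layers[name].has_room(keys[name], now) for name in layers):
        return False
    for name in layers:
        layers[name].record(keys[name], now)
    return True
```

Checking every layer before recording in any of them means a rejected invocation does not consume budget at the layers that still had room.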

Deploy Your First XEM

Walk a blank host to a XEM agent serving a workspace.

XEM Overview

The shipping surface and scope of XEM.