// Execution Management (XEM)

XEM Architecture

Four deployable components, one concern each. XIM reasons on GPU hosts, XEM executes next to the target infrastructure, the router brokers every message, and the browser frontend is a thin proxy. The router is the only piece that knows the other three exist.

Overview

The split follows operational boundaries. Inference is stateless across requests, and execution is stateless across invocations. The router holds all shared state (database, queue, approval engine, audit log) and arbitrates every message between the other three components.

Concept Map

flowchart TB
    subgraph Frontend
        Chat["Immersive chat<br/>(SSE + fetch)"]
        Ops["/ops dashboard<br/>(SSE + fetch)"]
    end
    subgraph Router["Router (HTTPS)"]
        InfAPI["Inference API<br/>/:project_id/:endpoint_slug/v1/responses"]
        ExecAPI["Exec API<br/>/:project_id/v1/exec/*"]
        MgmtAPI["Management API<br/>(multiple namespaces)"]
        Sched["Tier scheduler<br/>+ worker health"]
        Approval["Approval engine"]
        Audit["Audit log<br/>(Postgres)"]
        PG["Postgres<br/>(durable state)"]
        Redis["Redis<br/>(optional rate-limit + cache)"]
        InfAPI --> Sched
        InfAPI --> Audit
        ExecAPI --> Approval
        ExecAPI --> Audit
        MgmtAPI --> Audit
        Sched --> PG
        Approval --> PG
        Audit --> PG
        Sched -.->|optional| Redis
    end
    XIM["XIM Agent<br/>(inference)<br/>vLLM / GPU"]
    XEM["XEM Agent<br/>(execution)<br/>Tool bundles"]
    Infra["Target infra<br/>(k8s, OpenStack, cloud, ...)"]
    Chat --> Router
    Ops --> Router
    Sched -.->|CurveZMQ + MessagePack| XIM
    Approval -.->|CurveZMQ + MessagePack| XEM
    XEM --> Infra
// frontend -> router brokers every message -> XIM reasons, XEM executes

XIM, Inference

The XIM Agent (Xerotier Inference Microservice) is the reasoning tier. Each XIM host enrolls with the router by presenting a single-use join key to POST /v1/enroll. It receives a signed bearer token and refreshes that token through POST /v1/enroll/refresh before it expires. Each XIM serves one or more models through vLLM. For every request, the router picks a XIM based on model availability, worker health, tier, and prefix-cache affinity.
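
The selection rule above can be sketched as a filter-then-rank pass. This is a minimal sketch under stated assumptions, not the router's scheduler. The `XIMCandidate` type and `pickXIM` are hypothetical. So are three of its rules: tier is treated as an exact match, prefix-cache affinity is measured as a count of already-cached prefix blocks, and ties are broken on in-flight load.

```swift
// Hypothetical view of one enrolled XIM as the scheduler might see it.
struct XIMCandidate {
    let id: String
    let models: Set<String>      // models served through vLLM
    let healthy: Bool            // derived from heartbeats
    let tier: Int                // assumed: exact-match tier
    let cachedPrefixBlocks: Int  // assumed: prompt-prefix blocks already cached here
    let inFlight: Int            // assumed: requests currently running
}

/// Hard filters first (health, tier, model), then prefer the largest
/// prefix-cache hit, breaking ties by lowest in-flight load.
func pickXIM(_ candidates: [XIMCandidate], model: String, tier: Int) -> XIMCandidate? {
    candidates
        .filter { $0.healthy && $0.tier == tier && $0.models.contains(model) }
        .min { a, b in
            if a.cachedPrefixBlocks != b.cachedPrefixBlocks {
                return a.cachedPrefixBlocks > b.cachedPrefixBlocks  // more reuse ranks first
            }
            return a.inFlight < b.inFlight
        }
}

let pool = [
    XIMCandidate(id: "xim-a", models: ["llama"], healthy: true, tier: 1, cachedPrefixBlocks: 0, inFlight: 1),
    XIMCandidate(id: "xim-b", models: ["llama"], healthy: true, tier: 1, cachedPrefixBlocks: 4, inFlight: 3),
    XIMCandidate(id: "xim-c", models: ["llama"], healthy: false, tier: 1, cachedPrefixBlocks: 9, inFlight: 0),
]
print(pickXIM(pool, model: "llama", tier: 1)?.id ?? "none")  // xim-b
```

In this sketch, prefix-cache reuse outranks load, so xim-b wins despite more in-flight work. A real scheduler could weigh these signals differently.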

XIM does not know about XEM. When the model emits a tool call, the call goes back to the router, not directly to an executor.

XEM, Execution

The XEM Agent (Xerotier Execution Microservice) is the action tier. Each XEM host enrolls with the router through the same POST /v1/enroll and POST /v1/enroll/refresh pair as XIM. It then publishes a capability manifest naming every tool it can execute and waits for dispatched invocations. The manifest names capabilities only; credential material never leaves the XEM host.
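
Below is a minimal sketch of what a capability manifest could look like, assuming a JSON-style shape. The `CapabilityManifest`, `ToolCapability`, and `RiskClass` types, their field names, and the tool names are all hypothetical. The manifest carries a per-tool risk classification of the kind the router looks up before dispatch, and it deliberately has no place for credentials. JSON is used here only because Foundation ships an encoder; on the mesh, envelopes are MessagePack-framed.

```swift
import Foundation

// Hypothetical risk classes; the manifest's real vocabulary may differ.
enum RiskClass: String, Codable {
    case readOnly = "read_only"  // auto-approvable
    case destructive             // opens an approval gate
    case irreversible            // opens an approval gate

    var requiresApproval: Bool { self != .readOnly }
}

struct ToolCapability: Codable {
    let name: String
    let risk: RiskClass
}

// Names capabilities only: there is deliberately no field for credentials.
struct CapabilityManifest: Codable {
    let agentID: String
    let tools: [ToolCapability]
}

let manifest = CapabilityManifest(
    agentID: "xem-01",
    tools: [
        ToolCapability(name: "k8s.get_pods", risk: .readOnly),
        ToolCapability(name: "k8s.delete_namespace", risk: .irreversible),
    ]
)

let encoder = JSONEncoder()
encoder.outputFormatting = [.sortedKeys]
let data = try encoder.encode(manifest)
print(String(decoding: data, as: UTF8.self))
```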

Once enrolled, the XEM runs a periodic lease-renewal loop. Each renewal re-arms the router's scheduler with leaseDurationMs = leaseRenewalIntervalMs * 3, so a single missed renewal does not expire the lease. If the agent misses heartbeats for longer than XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS, the router declares the worker stale and stops dispatching to it.
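
The lease arithmetic and the router's stale check are simple enough to show directly. This is a minimal sketch: `LeaseConfig` and `isStale` are hypothetical names. The 30-second fallback used when XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS is unset is a placeholder, not the router's actual default.

```swift
import Foundation

// Hypothetical: each renewal extends the lease to three renewal intervals,
// so roughly two consecutive missed renewals fit before it lapses.
struct LeaseConfig {
    let leaseRenewalIntervalMs: Int
    var leaseDurationMs: Int { leaseRenewalIntervalMs * 3 }
}

/// Router side: a worker is stale once its last heartbeat is older than the timeout.
func isStale(lastHeartbeatMs: Int, nowMs: Int, heartbeatTimeoutMs: Int) -> Bool {
    nowMs - lastHeartbeatMs > heartbeatTimeoutMs
}

// Read the timeout from the environment; the fallback is a placeholder value.
let heartbeatTimeoutMs = ProcessInfo.processInfo
    .environment["XEROTIER_ROUTER_HEARTBEAT_TIMEOUT_MS"]
    .flatMap { Int($0) } ?? 30_000

let cfg = LeaseConfig(leaseRenewalIntervalMs: 10_000)
print(cfg.leaseDurationMs)  // 30000
print(isStale(lastHeartbeatMs: 0, nowMs: 45_000, heartbeatTimeoutMs: heartbeatTimeoutMs))
```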

XEM requires no GPU, no model weights, and no vLLM; it is a single Swift binary that runs alongside its local credentials. A XEM can serve multiple operational workspaces, and each workspace binds to one or more XEMs via the workspace_agent_bindings table.
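
The many-to-many relationship can be illustrated with an in-memory mirror of workspace_agent_bindings rows. This is a sketch under assumptions. The `Binding` type, its field names, and `eligibleAgents` are illustrative, not the actual schema or query. Filtering out stale agents at lookup time is also assumed.

```swift
// Hypothetical in-memory mirror of workspace_agent_bindings rows.
struct Binding {
    let workspaceID: String
    let agentID: String
}

/// XEMs a workspace may dispatch to: bound to it and not currently stale.
func eligibleAgents(for workspace: String,
                    bindings: [Binding],
                    staleAgents: Set<String>) -> [String] {
    bindings
        .filter { $0.workspaceID == workspace && !staleAgents.contains($0.agentID) }
        .map(\.agentID)
        .sorted()
}

let bindings = [
    Binding(workspaceID: "ops-prod", agentID: "xem-01"),
    Binding(workspaceID: "ops-prod", agentID: "xem-02"),
    Binding(workspaceID: "ops-staging", agentID: "xem-01"),  // one XEM, two workspaces
]
print(eligibleAgents(for: "ops-prod", bindings: bindings, staleAgents: ["xem-02"]))  // ["xem-01"]
```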

Router

The router is the control plane. It owns every database table, every authentication decision, every scope check, every rate limit, and every approval gate. It is the only component a customer sees on the wire; XIM and XEM both sit behind the router's CurveZMQ mesh.

The router's major subsystems are the inference scheduler (XIM selection), the exec dispatcher (XEM selection), the approval engine (human-in-the-loop gates), the audit log writer, and the SSE streamer that fans events back to the frontend.

Frontend

The frontend is built with Swift 6.2, Hummingbird, Mustache, and vanilla JavaScript. It renders the immersive chat, the /ops dashboard, and every administrative surface. It is a thin proxy: every mutation goes to the router under a scoped API key, and no business logic lives in the frontend.

The Agentic Loop

A single conversation turn flows through every component:

  1. The user types in the immersive chat; the frontend posts to /v1/responses.
  2. The router picks a XIM, dispatches inference, and streams tokens back as response.* SSE events.
  3. The model emits a tool_calls response naming one or more tools. The router's ServerToolRegistryPartitioner classifies each call: server-side built-ins run inside the router, and any remaining call lands on the workspace's XEM. (x_exec itself is the SSE event-namespace prefix for execution progress, not a flag on the tool-call envelope.)
  4. The router validates the scope, looks up the call's risk classification in the manifest, and either auto-approves (read-only, non-destructive) or opens an approval gate (destructive, irreversible).
  5. Once approved, the router dispatches to the workspace's XEM via CurveZMQ.
  6. The XEM executes the tool and streams progress updates back to the router, which relays them to the frontend as x_exec SSE events. It then returns the final result.
  7. The router re-enters inference with the tool result appended; the model synthesizes a reply.
  8. The loop continues until the model emits a final reply with no tool calls.
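
The eight steps above reduce to a loop that alternates inference and execution until the model stops calling tools. This is a minimal sketch, and every name in it is hypothetical: `ToolCall`, `ModelTurn`, `LoopError`, and `runTurn`. Injected closures stand in for the XIM, the ServerToolRegistryPartitioner, the approval engine, and the XEM. A boolean `destructive` flag stands in for the manifest's risk classification. The iteration cap and throwing on a denied approval are assumptions, not the router's behavior.

```swift
// Hypothetical types; the router's real envelopes and registry differ.
struct ToolCall {
    let name: String
    let destructive: Bool  // stand-in for the manifest's risk classification
}

struct ModelTurn {
    let text: String
    let toolCalls: [ToolCall]
}

enum LoopError: Error {
    case approvalDenied(String)  // assumed: a denial ends the turn
    case tooManyIterations       // assumed safety cap
}

/// Runs one conversation turn until the model replies without tool calls.
func runTurn(
    infer: ([String]) -> ModelTurn,       // steps 2 and 7: inference on a XIM
    isServerTool: (String) -> Bool,       // step 3: built-in or XEM tool?
    approve: (ToolCall) -> Bool,          // step 4: approval gate
    runOnRouter: (ToolCall) -> String,    // step 3: server-side built-in
    dispatchToXEM: (ToolCall) -> String,  // steps 5 and 6
    transcript initial: [String],
    maxIterations: Int = 8
) throws -> String {
    var transcript = initial
    for _ in 0..<maxIterations {
        let turn = infer(transcript)
        if turn.toolCalls.isEmpty { return turn.text }  // step 8: final reply
        for call in turn.toolCalls {
            if call.destructive && !approve(call) {
                throw LoopError.approvalDenied(call.name)
            }
            let result = isServerTool(call.name) ? runOnRouter(call) : dispatchToXEM(call)
            transcript.append("tool \(call.name): \(result)")  // step 7: append result
        }
    }
    throw LoopError.tooManyIterations
}

var inferenceCalls = 0
let reply = try runTurn(
    infer: { transcript in
        inferenceCalls += 1
        return transcript.count == 1
            ? ModelTurn(text: "", toolCalls: [ToolCall(name: "k8s.get_pods", destructive: false)])
            : ModelTurn(text: "3 pods running", toolCalls: [])
    },
    isServerTool: { _ in false },
    approve: { _ in true },
    runOnRouter: { _ in "" },
    dispatchToXEM: { _ in "3 pods" },
    transcript: ["user: how many pods?"]
)
print(reply, inferenceCalls)  // 3 pods running 2
```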

Three agentic primitives layer on top of the base loop:

  • Auto-fork-branch: the router forks the chat branch before any irreversible call so the operator can roll back.
  • Auto-artifact emission: large tool results divert to the artifact store instead of being inlined. The size threshold is currently a fixed, non-tunable default.
  • Sub-agentic request_subplan: the model can spawn a nested loop with a restricted scope.

The mockup tool family (x_add_mockup_file / x_update_mockup) is the canonical agentic surface for incremental multi-file bundles and rides this same loop.
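
Two of these primitives are small decisions the router makes around each tool call. This is a sketch under assumptions. The threshold below is a placeholder because the real default is not documented here. `ToolOutput`, `emit`, and `rollbackPoint` are hypothetical names, and treating the fork's ID as the rollback point is an illustrative choice.

```swift
// Placeholder: the real threshold is a fixed internal default not given here.
let artifactThresholdBytes = 64 * 1024

enum ToolOutput: Equatable {
    case inline(String)
    case artifact(id: String, bytes: Int)
}

/// Auto-artifact emission: large results go to the artifact store and only
/// a reference is inlined into the conversation.
func emit(result: String, store: (String) -> String) -> ToolOutput {
    let size = result.utf8.count
    return size > artifactThresholdBytes ? .artifact(id: store(result), bytes: size) : .inline(result)
}

/// Auto-fork-branch: before an irreversible call, fork the current branch
/// and return the fork's ID as the rollback point; otherwise no fork.
func rollbackPoint(irreversible: Bool, branch: String, fork: (String) -> String) -> String? {
    irreversible ? fork(branch) : nil
}

let small = emit(result: "ok", store: { _ in "unused" })
let large = emit(result: String(repeating: "x", count: 100_000), store: { _ in "art-1" })
print(small, large)
```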

Data Plane

  • Frontend to router: HTTPS with API-key authentication. Every mutation runs under a scoped API key. The exec router requires only the execution scope, while other surfaces enforce their own scope sets.
  • Router to XIM: HTTP to vLLM on an internal network; the router owns the scheduler and does not expose XIM directly.
  • Router to XEM: CurveZMQ on the mesh port, MessagePack-framed envelopes with W3C traceContext headers for distributed tracing. Every frame is encrypted with the XEM's per-enrollment CURVE key.
  • Browser to router: SSE over HTTPS for streaming; fetch-POST for commands.
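
The router-to-XEM envelope carries W3C Trace Context headers. This is a minimal sketch of building a `traceparent` value and attaching it to an envelope. The value is version `00`, a 16-byte trace ID, an 8-byte parent ID and a flags byte, all in lowercase hex. `ExecEnvelope` and its fields are hypothetical, and MessagePack serialization and CURVE encryption are left out.

```swift
// Hypothetical envelope; the real MessagePack frame layout is not shown here.
struct ExecEnvelope {
    let invocationID: String
    let tool: String
    let traceContext: [String: String]  // W3C Trace Context headers
}

/// Lowercase, zero-padded hex encoding.
func hex(_ bytes: [UInt8]) -> String {
    bytes.map { byte -> String in
        let digits = String(byte, radix: 16)
        return byte < 16 ? "0" + digits : digits
    }.joined()
}

/// Builds a W3C `traceparent` value: version-traceid-parentid-flags.
/// Returns nil for wrong lengths or all-zero IDs, which the spec treats as invalid.
func traceparent(traceID: [UInt8], parentID: [UInt8], sampled: Bool) -> String? {
    guard traceID.count == 16, parentID.count == 8,
          traceID.contains(where: { $0 != 0 }),
          parentID.contains(where: { $0 != 0 }) else { return nil }
    return "00-\(hex(traceID))-\(hex(parentID))-\(sampled ? "01" : "00")"
}

let tp = traceparent(traceID: (1...16).map { UInt8($0) },
                     parentID: (1...8).map { UInt8($0) },
                     sampled: true)!
let envelope = ExecEnvelope(invocationID: "inv-1", tool: "k8s.get_pods",
                            traceContext: ["traceparent": tp])
print(tp)  // 00-0102030405060708090a0b0c0d0e0f10-0102030405060708-01
```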

Where State Lives

// state ownership across the four components
| State | Owner | Notes |
|---|---|---|
| Chats, branches, memories, artifacts | Postgres (router) | Workspace-scoped. Branch and memory writes are serialized per conversation. |
| Approval state + audit trail | Postgres (router) | Every approval decision is recorded immutably in execution_audit_log. |
| Rate-limit windows | Redis or in-memory (router) | In-memory by default; switch to Redis via XEROTIER_CACHE_BACKEND. XEM invocations enforce a four-layer sliding window (project / workspace / agent / global). Per-user API rate limiting is a separate middleware tuned by XEROTIER_API_RATE_LIMIT_RPM. |
| Worker health + prefix-cache index | Router in-memory (derived) | Rehydrated from heartbeats; no durable state. |
| Tool credentials | XEM local filesystem | Never leave the XEM host. The router stores only the capability manifest (which tools exist), not the credential material. |
| Model weights | XIM local filesystem | Pulled from object storage at startup; cached on the XIM host. |
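
The four-layer window in the rate-limit row can be sketched as one sliding-window log per layer key. An invocation is admitted only if every layer has headroom. This is a sketch under assumptions: the `SlidingWindowLimiter` type, the key scheme and the limits are illustrative. So is the rule that a hit is recorded in all layers only when the invocation is admitted. A Redis backend would replace the in-memory dictionary.

```swift
// Hypothetical four-layer limiter (in-memory backend).
enum Layer { case project, workspace, agent, global }

struct SlidingWindowLimiter {
    let windowMs: Int
    let limits: [Layer: Int]                 // max invocations per window per layer key
    private var hits: [String: [Int]] = [:]  // layer key -> admitted timestamps (ms)

    init(windowMs: Int, limits: [Layer: Int]) {
        self.windowMs = windowMs
        self.limits = limits
    }

    /// Admits an invocation only if all four layers have headroom,
    /// then records it against every layer.
    mutating func admit(project: String, workspace: String, agent: String, nowMs: Int) -> Bool {
        let keys: [(Layer, String)] = [
            (.project, "project:\(project)"), (.workspace, "workspace:\(workspace)"),
            (.agent, "agent:\(agent)"), (.global, "global"),
        ]
        let cutoff = nowMs - windowMs
        for (_, key) in keys {
            // Slide the window: forget hits at or before the cutoff.
            hits[key, default: []].removeAll { $0 <= cutoff }
        }
        for (layer, key) in keys where hits[key, default: []].count >= limits[layer, default: .max] {
            return false
        }
        for (_, key) in keys {
            hits[key, default: []].append(nowMs)
        }
        return true
    }
}

var limiter = SlidingWindowLimiter(
    windowMs: 1_000,
    limits: [.project: 100, .workspace: 2, .agent: 100, .global: 1_000]
)
var results: [Bool] = []
for i in 0..<3 {
    results.append(limiter.admit(project: "p1", workspace: "w1", agent: "xem-01", nowMs: i * 10))
}
print(results)  // [true, true, false]
```

Checking every layer before recording anything keeps a rejection from consuming quota in the layers that still had room.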
