XEM tutorials - Xerotier

1. Author Your First Chat Template (drain-node)

Goal: ship a chat template that encodes the drain-node runbook and proves the operational safety invariants during a live run.

Step 1, seed a file payload from the platform template. The CLI has no copy verb; export the existing template as JSON, rename it, and save it locally.

shell

xeroctl templates show drain-node -o json \
    | jq '.name = "drain-node-acme"' \
    > drain-node-acme.json

Step 2, edit the template. Open drain-node-acme.json. The file payload uses these top-level fields:

{
  "name": "drain-node-acme",
  "system_prompt": "...",
  "description": "Drain a Kubernetes node safely.",
  "allowed_tools": ["kubectl_get", "kubectl_pdb_list",
                    "kubectl_node_cordon", "kubectl_node_drain",
                    "kubectl_node_uncordon", "kubectl_pod_list"],
  "approval_policy": "elevated",
  "applicable_workspace_types": ["kubernetes"]
}

Customize the system_prompt so that the procedure enumerates the pre-conditions (get pods, check PDBs, identify DaemonSet pods), the cordon/drain/verify sequence, and the uncordon rollback on failure. The prompt should not call for any tool outside allowed_tools. approval_policy accepts one of standard, elevated, or restricted.
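A quick local check before Step 3 can catch a misspelled policy or an empty tool list. The following is a minimal Python sketch, not part of xeroctl: check_template is a hypothetical helper, and the list of required fields is an assumption taken from the example payload above. Only the three approval_policy values come from this tutorial.

```python
from typing import List

# Values this tutorial lists for approval_policy.
APPROVAL_POLICIES = {"standard", "elevated", "restricted"}

# Assumption: every top-level field in the example payload is required.
REQUIRED_FIELDS = (
    "name", "system_prompt", "description",
    "allowed_tools", "approval_policy", "applicable_workspace_types",
)


def check_template(payload: dict) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in payload]

    policy = payload.get("approval_policy")
    if policy is not None and policy not in APPROVAL_POLICIES:
        problems.append(f"unknown approval_policy: {policy!r}")

    tools = payload.get("allowed_tools")
    if tools is not None:
        if not isinstance(tools, list) or not tools:
            problems.append("allowed_tools must be a non-empty list")
        elif not all(isinstance(t, str) and t for t in tools):
            problems.append("allowed_tools entries must be non-empty strings")
        elif len(set(tools)) != len(tools):
            problems.append("allowed_tools contains duplicates")
    return problems
```

Load drain-node-acme.json with json.load and pass the resulting dict to check_template; any returned strings point at fields to fix before running xeroctl templates create.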

Step 3, create the template at project scope.

shell

xeroctl templates create --file drain-node-acme.json

Step 4, apply the template to a workspace. Applying opens a chat against the workspace pinned to the current template version:

shell

xeroctl templates apply drain-node-acme --workspace kubernetes-staging

In the returned chat, ask the assistant to drain a test node and approve each gated tool call. Then verify that the final summary reports every pod evicted and no PDBs violated.

Step 5, promote to platform scope. Once the template has run cleanly in staging, promote it so every project in the deployment can apply it:

shell

xeroctl templates promote drain-node-acme --scope platform

The promote verb only accepts --scope platform; workspace-to-workspace copy is not a supported operation. To run the template in a second workspace, call xeroctl templates apply against that workspace.

2. Deploy XEM on a Jumphost

Goal: run a XEM on a locked-down jumphost via docker compose, with credentials mounted read-only and a systemd supervisor.

Start from the published compose.agent-xem.yaml in the xerotier/container-agents repository (see Deploy Your First XEM for the base setup and the compose/.env file), then layer a hardening override and a systemd supervisor on top of it. Target credentials (kubeconfig, AWS) go in the base file's read-only /data/xerotier-xem/credentials mount.

Hardening override. Save this alongside the base file as compose/compose.override.yml. Docker Compose merges it onto the base xem-agent service, adding a read-only root filesystem, dropping all capabilities, blocking privilege escalation, and mounting a writable tmpfs at /tmp.

services:
  xem-agent:
    read_only: true
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
    tmpfs:
      - /tmp

Systemd supervisor. Save as /etc/systemd/system/xerotier-xem.service so the stack returns after a reboot. WorkingDirectory points at the cloned compose/ directory so both files and the .env resolve.

[Unit]
Description=Xerotier XEM (docker compose)
After=docker.service network-online.target
Requires=docker.service
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/xerotier-public/compose
ExecStart=/usr/bin/docker compose -f compose.agent-xem.yaml -f compose.override.yml up -d
ExecStop=/usr/bin/docker compose -f compose.agent-xem.yaml -f compose.override.yml down
TimeoutStartSec=0

[Install]
WantedBy=multi-user.target

Bootstrap sequence. Mint a join key and set it (with the router URL and registration name) in compose/.env, then enable the unit.

shell

xeroctl agents join-keys --create \
    --name xem-jumphost-01 \
    --region us-east-1 \
    --router-addr tcp://router.example.com:5555 \
    -o json | jq -r '.join_key'
# Put the printed token in compose/.env as XEROTIER_AGENT_JOIN_KEY=...

sudo systemctl daemon-reload
sudo systemctl enable --now xerotier-xem
journalctl -u xerotier-xem -f
xeroctl agents list --region us-east-1

--name, --region, and at least one --router-addr are required by the join-key creation endpoint; the agent tier is determined at enrollment by the agent binary, not by a flag on the join key.

The unit is now supervised. The Docker Compose restart policy handles process-level crashes, systemd handles boot-time recovery, and the router re-admits the XEM on reconnect using its CURVE key.

3. Connect a Slack Bot to Approval Webhooks

Goal: deliver pending approvals to a Slack channel, let reviewers approve/reject from a Slack message, and round-trip the decision back to the router.

Step 1, create a Slack app. In api.slack.com/apps, create an app, enable Incoming Webhooks for a channel, and enable Interactivity with a request URL pointing at your bot service (for example https://bots.example.com/slack/interact). Note the signing secret.

Step 2, register a webhook endpoint with the router.

shell

xeroctl webhooks --create \
    --url https://bots.example.com/approvals \
    --events 'exec.approval_requested,exec.approval_timed_out' \
    --secret "$(openssl rand -hex 32)"

Webhook subscription is a flat command; there is no endpoints subgroup. The signing secret is supplied via --secret. Generate one locally (for example with openssl rand -hex 32) and store it alongside the bot service so signature verification can find it.

Step 3, handle exec.approval_requested. On receipt, verify the X-Webhook-Signature header against the signing secret, then post a Slack message with Approve / Reject buttons. The button value carries the approval ID.
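The handler in Step 3 might look like the Python sketch below. The signature scheme is an assumption: this tutorial names the X-Webhook-Signature header but not the algorithm, so the sketch assumes a hex-encoded HMAC-SHA256 over the raw request body (check Webhook Events for the actual format). The function names, the action_id values, and the summary text are hypothetical; the button layout uses Slack's Block Kit.

```python
import hashlib
import hmac


def verify_webhook_signature(secret: str, raw_body: bytes, header_value: str) -> bool:
    """Assumed scheme: header carries hex(HMAC-SHA256(secret, raw_body))."""
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = header_value.strip().lower()
    # compare_digest runs in constant time, so it does not leak the mismatch position.
    return hmac.compare_digest(expected.encode(), received.encode())


def approval_message(approval_id: str, summary: str) -> dict:
    """Slack message body whose Approve / Reject buttons carry the approval ID."""
    def button(label: str, action_id: str, style: str) -> dict:
        return {
            "type": "button",
            "style": style,
            "action_id": action_id,  # hypothetical IDs, read back in the interaction handler
            "text": {"type": "plain_text", "text": label},
            "value": approval_id,
        }

    return {
        "text": f"Approval requested: {summary}",  # fallback text for notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {
                "type": "actions",
                "elements": [
                    button("Approve", "approve", "primary"),
                    button("Reject", "reject", "danger"),
                ],
            },
        ],
    }
```

Verify against the raw bytes as received, before any JSON parsing, since re-serializing the body can change whitespace and invalidate the signature. The dict returned by approval_message is what the bot would POST as JSON to the Incoming Webhook URL from Step 1.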

Step 4, handle the Slack interaction. Verify Slack's signature. Read the button value to get the approval ID, then call the router:

POST /:project_id/v1/exec/approvals/:id/approve
Authorization: Bearer <bot-api-key>
Content-Type: application/json

{
  "note": "Approved via Slack by <user>"
}

The router validates that the bot's API key carries the execution scope and that the user's delegation includes approval rights on the workspace.
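Step 4 can be sketched in Python as follows. The signature check follows Slack's v0 request-signing scheme (X-Slack-Signature and X-Slack-Request-Timestamp headers). Everything on the router side beyond the route and body shown above is an assumption: the base URL is a placeholder, the action_id values are whatever the bot put on its buttons, and the reject request is assumed to take the same note body as approve. The function builds the request without sending it, so the caller decides how to transmit and retry.

```python
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request
from typing import Optional, Tuple


def verify_slack_signature(signing_secret: str, timestamp: str, raw_body: bytes,
                           signature: str, now: Optional[float] = None) -> bool:
    """Check Slack's v0 signature and reject requests more than 5 minutes old."""
    now = time.time() if now is None else now
    try:
        if abs(now - int(timestamp)) > 300:
            return False  # stale timestamp: possible replay
    except ValueError:
        return False
    base = b"v0:" + timestamp.encode() + b":" + raw_body
    expected = "v0=" + hmac.new(signing_secret.encode(), base, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())


def parse_interaction(raw_body: bytes) -> Tuple[str, str, str]:
    """Return (action_id, approval_id, slack_user_id) from a block_actions payload."""
    form = urllib.parse.parse_qs(raw_body.decode())
    payload = json.loads(form["payload"][0])  # Slack sends JSON in a form field
    action = payload["actions"][0]
    return action["action_id"], action["value"], payload["user"]["id"]


def build_decision_request(base_url: str, project_id: str, approval_id: str,
                           decision: str, api_key: str, user: str) -> urllib.request.Request:
    """Build (but do not send) the POST to the approve or reject route."""
    if decision not in ("approve", "reject"):
        raise ValueError(f"unsupported decision: {decision}")
    safe_id = urllib.parse.quote(approval_id, safe="")
    url = f"{base_url}/{project_id}/v1/exec/approvals/{safe_id}/{decision}"
    verb = "Approved" if decision == "approve" else "Rejected"
    # Assumption: reject accepts the same {"note": ...} body as approve.
    body = json.dumps({"note": f"{verb} via Slack by {user}"}).encode()
    return urllib.request.Request(url, data=body, method="POST", headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
```

Pass the returned request to urllib.request.urlopen (or translate it to the HTTP client the bot already uses) only after verify_slack_signature returns True.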

Step 5, close the loop. The router does not emit an approval-resolved webhook today; once the bot's POST to /approve (or /reject) returns 200, update the Slack message with the final decision and disable the buttons to prevent double-actions. Subscribe to exec.approval_timed_out to handle the timeout case from the router side.
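One way to close the loop is to replace the original message through the response_url that Slack includes in the interaction payload. The Python sketch below assumes that approach; the function names and message wording are hypothetical. Omitting the actions block from the replacement is what removes the buttons.

```python
import json
import urllib.request


def resolved_message(decision: str, reviewer: str, approval_id: str) -> dict:
    """Replacement message: states the outcome and carries no actions block."""
    text = f"Approval {approval_id} {decision} by {reviewer}."
    return {
        "replace_original": True,  # overwrite the message that held the buttons
        "text": text,
        "blocks": [{"type": "section", "text": {"type": "mrkdwn", "text": text}}],
    }


def build_update_request(response_url: str, message: dict) -> urllib.request.Request:
    """Build (but do not send) the POST to the interaction's response_url."""
    return urllib.request.Request(
        response_url,
        data=json.dumps(message).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Send the update only after the router has answered 200, so the Slack message never claims a decision the router did not record.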

See Webhook Events for the full payload schema.

4. Build a Custom Tool

Goal: ship a site-specific tool via /var/lib/xerotier/tools/custom/, with no XEM rebuild needed.

Step 1, write the tool manifest. Save as /var/lib/xerotier/tools/custom/restart_cache.json:

{
  "name": "restart_cache",
  "description": "Restart a named Redis cache cluster.",
  "risk": "destructive",
  "idempotent": false,
  "timeout_seconds": 60,
  "parameters": {
    "type": "object",
    "required": ["cluster"],
    "properties": {
      "cluster": {
        "type": "string",
        "pattern": "^[a-z0-9-]{1,63}$"
      },
      "reason": {"type": "string"}
    }
  },
  "command": {
    "argv": ["/usr/local/bin/restart-cache.sh",
             "--cluster", "",
             "--reason", ""],
    "stdin": null,
    "env": {},
    "cwd": "/srv/ops"
  }
}
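
To see what the parameters block admits, the Python sketch below checks candidate arguments against the same required list and pattern. It is a local pre-flight aid only: check_args is a hypothetical helper, it covers just the schema keywords this manifest uses, and how the XEM itself validates arguments is not described here.

```python
import re
from typing import List

# Mirrors the "parameters" object of the restart_cache manifest.
PARAMETERS = {
    "type": "object",
    "required": ["cluster"],
    "properties": {
        "cluster": {"type": "string", "pattern": "^[a-z0-9-]{1,63}$"},
        "reason": {"type": "string"},
    },
}


def check_args(args: dict, schema: dict = PARAMETERS) -> List[str]:
    """Check required keys, string types, and patterns; return a list of problems."""
    problems = [f"missing required parameter: {key}"
                for key in schema.get("required", []) if key not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # this manifest does not forbid extra keys
        if spec.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
            continue
        pattern = spec.get("pattern")
        if pattern and isinstance(value, str) and not re.search(pattern, value):
            problems.append(f"{key} does not match {pattern}")
    return problems
```

The anchored pattern is what keeps shell metacharacters out of the cluster name: a value such as "a; rm -rf /" fails the check, while "sessions-primary" passes. The reason parameter has no pattern, so the script should treat it as untrusted free text.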

Step 2, supply the executable. /usr/local/bin/restart-cache.sh is a shell script the XEM host owns; it exits 0 on success and writes a JSON summary to stdout.
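
Before dropping the manifest in place, it can help to confirm that the script honors this contract. The Python harness below is a sketch for local testing, not a description of how the XEM invokes tools: run_tool is a hypothetical helper that runs an argv list with the manifest's 60-second timeout, requires exit code 0, and parses stdout as JSON.

```python
import json
import subprocess
from typing import Any, List, Optional


def run_tool(argv: List[str], timeout_seconds: int = 60,
             cwd: Optional[str] = None) -> Any:
    """Run the command, require exit code 0, and parse stdout as one JSON document."""
    done = subprocess.run(
        argv,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout_seconds,  # raises subprocess.TimeoutExpired on overrun
        check=False,
    )
    if done.returncode != 0:
        raise RuntimeError(f"exit {done.returncode}: {done.stderr.strip()}")
    # json.loads raises ValueError if stdout is not valid JSON.
    return json.loads(done.stdout)
```

For example, run_tool(["/usr/local/bin/restart-cache.sh", "--cluster", "sessions-test", "--reason", "smoke-test"], cwd="/srv/ops") against a non-production cluster (the cluster name here is illustrative) should return the parsed summary. A RuntimeError, ValueError, or TimeoutExpired means the script does not yet meet the contract.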

Step 3, let the XEM pick up the manifest. The XEM watches its tools directory and reloads on file change; no explicit reload command is needed. After the file is in place, confirm the agent re-registered the tool on its next heartbeat:

shell

xeroctl agents <agent-id> -o json | jq '.tools[] | select(.name == "restart_cache")'

The router accepts the updated manifest on the next heartbeat; restart_cache is now invokable from any workspace bound to the XEM.

Step 4, exercise the tool.

shell

xeroctl exec invoke \
    --workspace cache-prod \
    --tool restart_cache \
    --args '{"cluster":"sessions-primary","reason":"memory-leak"}'

Because the manifest declares destructive risk, the router gates the call behind the workspace's approval policy. After approval, the XEM shells out to the script, and the script's JSON summary returns as the tool result.

Security reminder: every custom tool inherits the XEM host's filesystem access. Review the argv, env, and cwd for each tool; treat the custom tools directory as privileged configuration and version-control it.