Agent Types

Xerotier.ai supports two types of agents for serving inference requests: self-hosted (private) agents that you operate, and shared agents managed by the platform.

Overview

Agents are the compute workers that execute model inference. When a request arrives at your endpoint, the Router selects an appropriate agent based on availability, capacity, and ownership.

  • Self-Hosted Agents: Owned and operated by you, running on your own infrastructure
  • Shared Agents: Platform-managed agents available to all authorized users

Tip: You can use both agent types together. Self-hosted agents handle your primary workloads while shared agents provide automatic fallback.

Quick Comparison

Aspect          Self-Hosted (Private)        Shared
Ownership       User/Project                 Platform
Access          Owner-only                   Any authorized user
Enrollment      Join key required            Pre-provisioned
Data isolation  Single-tenant                Multi-tenant
Control         Full lifecycle management    None
Cost            User provides hardware       Platform cost
Model caching   Dedicated cache              Shared pool
Fallback        Can use shared as backup     N/A

Self-Hosted Agent Features

Self-hosted agents give you full control over your inference infrastructure while leveraging Xerotier.ai routing and management capabilities.

Join Key Enrollment

To connect a self-hosted agent to Xerotier.ai, you need a join key. Join keys are secure, time-limited tokens that authorize an agent to register with the platform.

Join Key Format

xjk_{projectslug}_{randomstring}

Join keys encode the project binding, region assignment, tier configuration, and expiration time. They are created from the Agents dashboard.
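
As an illustration of the format above, here is a minimal sketch that splits a join key string into its visible parts. The function name `split_join_key` and the `JoinKeyParts` type are hypothetical, and the sketch assumes the random string contains no underscores (so the last underscore separates it from the project slug); the platform's actual parsing rules may differ.

```python
from typing import NamedTuple

JOIN_KEY_PREFIX = "xjk"


class JoinKeyParts(NamedTuple):
    """Visible components of a join key (hypothetical helper type)."""
    project_slug: str
    random_part: str


def split_join_key(key: str) -> JoinKeyParts:
    """Split a key of the form xjk_{projectslug}_{randomstring}.

    Assumption: the random string has no underscores, so the LAST
    underscore separates it from the project slug.
    """
    prefix, sep, rest = key.partition("_")
    if prefix != JOIN_KEY_PREFIX or not sep:
        raise ValueError("join key must start with 'xjk_'")
    slug, sep, random_part = rest.rpartition("_")
    if not sep or not slug or not random_part:
        raise ValueError("join key needs a project slug and a random string")
    return JoinKeyParts(slug, random_part)
```

This only checks the shape of the string. It does not verify that the key is valid or unexpired; that check happens on the platform side when the agent connects.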

Lifecycle States

Self-hosted agents progress through the following states:

State         Description
pending       Agent registered but not yet connected
active        Agent is online and serving requests
suspended     Owner has temporarily disabled the agent
disconnected  Agent lost connection (will reconnect automatically)
dead          Agent has been offline beyond the grace period
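
The states above can be modeled as a small state machine. The sketch below is an assumption inferred from the state descriptions: the transition table, `can_transition`, and `is_routable` are hypothetical and are not part of a documented Xerotier.ai API. In particular, whether a dead agent can ever return to service is not stated here, so the sketch treats `dead` as terminal.

```python
from enum import Enum


class AgentState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DISCONNECTED = "disconnected"
    DEAD = "dead"


# Assumed transitions, inferred from the state descriptions:
#   pending      -> active        (agent connects)
#   active       -> suspended     (owner suspends)
#   active       -> disconnected  (connection lost)
#   suspended    -> active        (owner resumes)
#   disconnected -> active        (automatic reconnect)
#   disconnected -> dead          (grace period exceeded)
ALLOWED_TRANSITIONS = {
    AgentState.PENDING: {AgentState.ACTIVE},
    AgentState.ACTIVE: {AgentState.SUSPENDED, AgentState.DISCONNECTED},
    AgentState.SUSPENDED: {AgentState.ACTIVE},
    AgentState.DISCONNECTED: {AgentState.ACTIVE, AgentState.DEAD},
    AgentState.DEAD: set(),
}


def can_transition(current: AgentState, target: AgentState) -> bool:
    """Return True if the assumed lifecycle allows current -> target."""
    return target in ALLOWED_TRANSITIONS[current]


def is_routable(state: AgentState) -> bool:
    """Only active agents are online and serving requests."""
    return state is AgentState.ACTIVE
```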

Owner Controls

As an agent owner, you can:

  • Suspend: Temporarily stop the agent from receiving requests
  • Resume: Re-enable a suspended agent
  • Remove: Permanently delete the agent from the platform
  • View logs: Access agent event history and audit trail
  • Monitor: Real-time health metrics and status

Region Configuration

Agents are assigned to a region during enrollment. Regions are free-form strings up to 24 ASCII characters that help organize agents geographically or logically.

Examples: us-east-1, eu-west, datacenter-a, gpu-cluster-01
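
A region string can be checked locally before it is used for enrollment. The helper below is a hypothetical sketch of the stated constraint (at most 24 ASCII characters). It additionally assumes that an empty string is not a valid region, which this page does not state.

```python
MAX_REGION_LENGTH = 24


def is_valid_region(region: str) -> bool:
    """Check the documented constraint: up to 24 ASCII characters.

    Assumption: an empty region is rejected.
    """
    return 0 < len(region) <= MAX_REGION_LENGTH and region.isascii()
```

For example, `is_valid_region("gpu-cluster-01")` passes, while a 25-character name or a name containing non-ASCII characters does not.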

Model Caching

Self-hosted agents maintain a dedicated model cache that is not shared with other users. This provides faster model loading, predictable performance, and data isolation.

Shared Agent Features

Shared agents are platform-managed infrastructure available to all users.

No Enrollment Needed

Shared agents are pre-provisioned and ready to use. When you create an endpoint, it can immediately route to shared agents without any setup.

Platform-Managed

Xerotier.ai handles all operational aspects:

  • Provisioning and scaling
  • Health monitoring and recovery
  • Software updates and patches
  • Hardware maintenance

Multi-Tenant with Isolation

Shared agents serve requests from multiple projects. Request isolation is enforced through process-level sandboxing, memory isolation, network segmentation, and request authentication.

Security: Even on shared agents, your requests and data are isolated from other users. Each request runs in a sandboxed environment with memory cleared between requests.

Routing Behavior

The Router uses the following logic to select an agent for each request:

  1. User-owned agents take precedence: If you have active agents, they are used first
  2. Automatic fallback: When all your agents are busy or offline, requests can fall back to shared agents
  3. Regional affinity: Requests prefer agents in the same or nearby regions
  4. Load balancing: Multiple agents of the same type are load-balanced

Note: Fallback to shared agents must be enabled in your endpoint settings. When disabled, requests will queue until an agent becomes available.
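
The selection order described above can be sketched as follows. This is an illustrative model, not the Router's implementation: the `Agent` fields and `select_agent` are hypothetical, regional affinity is simplified to an exact region match (the "nearby regions" case is not modeled), and load balancing is assumed to mean picking the agent with the fewest active requests.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Agent:
    agent_id: str
    owned: bool           # True for your self-hosted agents
    region: str
    available: bool       # online and able to accept a request
    active_requests: int  # current load (assumed balancing metric)


def select_agent(
    agents: Sequence[Agent],
    request_region: str,
    fallback_enabled: bool,
) -> Optional[Agent]:
    """Pick an agent, or return None if the request has to queue."""

    def pick(pool: Sequence[Agent]) -> Optional[Agent]:
        candidates = [a for a in pool if a.available]
        if not candidates:
            return None
        # Same-region agents sort first, then the least-loaded agent.
        return min(
            candidates,
            key=lambda a: (a.region != request_region, a.active_requests),
        )

    # 1. User-owned agents take precedence.
    chosen = pick([a for a in agents if a.owned])
    # 2. Fall back to shared agents only if the endpoint allows it.
    if chosen is None and fallback_enabled:
        chosen = pick([a for a in agents if not a.owned])
    return chosen
```

A return value of `None` corresponds to the queueing case: no owned agent is available and fallback is disabled (or no shared agent is available either).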

Frequently Asked Questions

Can I use both agent types simultaneously?

Yes. You can have self-hosted agents for your primary workloads while using shared agents as a fallback. Configure this in your endpoint settings under "Fallback Options."

What happens if my agent goes offline?

When an agent disconnects, it enters the disconnected state and its requests are routed to your other available agents. If none of your agents are available and fallback is enabled, requests go to shared agents. If fallback is disabled, requests are queued until an agent becomes available.

How do I migrate from shared to self-hosted?

To migrate:

  1. Generate a join key from the Agents dashboard.
  2. Deploy an agent on your infrastructure using the Xerotier.ai agent Docker image.
  3. Start the agent with the join key. The agent registers and becomes available for routing.

Your endpoints then start using your agent automatically.

Are my requests visible to other users on shared agents?

No. Even on shared agents, requests are isolated. Each request runs in a sandboxed environment, memory is cleared between requests, network access is restricted to your project, and logs and metrics are project-scoped.

What hardware do I need for self-hosting?

Requirements depend on your model and workload. GPU agents need an NVIDIA GPU with CUDA support (vLLM compatible). CPU agents need a modern x86_64 CPU. You also need sufficient RAM to load your models, a stable network connection, and Docker with GPU passthrough for GPU agents.

Can I run multiple agents?

Yes. Running multiple agents provides redundancy (requests continue if one agent fails), scaling (handle more concurrent requests), and geographic distribution (deploy agents closer to your users).

How do join keys work?

Join keys are signed JSON Web Tokens (JWTs) that contain the project ID, region, tier ID, and expiration time. The Router verifies the token when an agent connects. Join keys are single-use: once a key has been used, it is invalidated and cannot be reused.
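
For debugging, the claims of a JWT can be inspected without verifying the signature. The sketch below is generic JWT handling, not a Xerotier.ai API. `exp` is the standard JWT expiration claim; any other claim names (for example for the project ID, region, or tier ID) are not documented on this page, and neither is how the signed token relates to the `xjk_` string form, so the code makes no assumption about them.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT WITHOUT verifying its signature.

    Use for inspection only; never trust unverified claims.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with padding stripped.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """Compare the standard 'exp' claim (seconds since epoch) to now."""
    now = time.time() if now is None else now
    return now >= claims["exp"]
```

Because the signature is not checked here, this is suitable only for local inspection, such as confirming that a key has not expired before starting an agent with it.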

What regions can agents serve?

Regions are configured when generating the join key. The region string is free-form (you define your own naming scheme) and can be up to 24 ASCII characters. It is used for routing decisions and does not have to match the agent's physical location.