Agent Types
Xerotier.ai supports two types of agents for serving inference requests: self-hosted (private) agents that you operate, and shared agents managed by the platform.
Overview
Agents are the compute workers that execute model inference. When a request arrives at your endpoint, the Router selects an appropriate agent based on availability, capacity, and ownership.
- Self-Hosted Agents: Owned and operated by you, running on your own infrastructure
- Shared Agents: Platform-managed agents available to all authorized users
Tip: You can use both agent types together. Self-hosted agents handle your primary workloads while shared agents provide automatic fallback.
Quick Comparison
| Aspect | Self-Hosted (Private) | Shared |
|---|---|---|
| Ownership | User/Project | Platform |
| Access | Owner-only | Any authorized user |
| Enrollment | Join key required | Pre-provisioned |
| Data isolation | Single-tenant | Multi-tenant |
| Control | Full lifecycle management | None |
| Cost | User provides hardware | Platform cost |
| Model caching | Dedicated cache | Shared pool |
| Fallback | Can use shared as backup | N/A |
Self-Hosted Agent Features
Self-hosted agents give you full control over your inference infrastructure while leveraging Xerotier.ai routing and management capabilities.
Join Key Enrollment
To connect a self-hosted agent to Xerotier.ai, you need a join key. Join keys are secure, time-limited tokens that authorize an agent to register with the platform.
Join Key Format
xjk_{projectslug}_{randomstring}
Join keys encode the project binding, region assignment, tier configuration, and expiration time. They are created from the Agents dashboard.
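As a rough illustration of the format above, the following sketch splits a join key into its project slug and random part. The character sets allowed in each part are not documented here, so the pattern is an assumption, and `parse_join_key` and `JoinKeyParts` are hypothetical names rather than part of any Xerotier.ai SDK.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the slug uses lowercase letters, digits, and hyphens, and the
# random part is alphanumeric. The real character sets may differ.
JOIN_KEY_PATTERN = re.compile(r"xjk_([a-z0-9-]+)_([A-Za-z0-9]+)")


class JoinKeyParts(NamedTuple):
    project_slug: str
    random_part: str


def parse_join_key(key: str) -> Optional[JoinKeyParts]:
    """Return the slug and random part, or None if the shape does not match."""
    match = JOIN_KEY_PATTERN.fullmatch(key.strip())
    if match is None:
        return None
    return JoinKeyParts(match.group(1), match.group(2))
```

A check like this only catches copy-and-paste mistakes early; it says nothing about whether the key is valid or unexpired, which only the platform can decide.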
Lifecycle States
Self-hosted agents progress through the following states:
| State | Description |
|---|---|
| pending | Agent registered but not yet connected |
| active | Agent is online and serving requests |
| suspended | Owner has temporarily disabled the agent |
| disconnected | Agent lost connection (will reconnect automatically) |
| dead | Agent has been offline beyond the grace period |
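Client code that reads agent status may find it convenient to model these states as an enumeration. The sketch below assumes the state is delivered as one of the lowercase strings in the table; `AgentState`, `parse_state`, and `can_serve_requests` are illustrative names, not a documented API.

```python
from enum import Enum


class AgentState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DISCONNECTED = "disconnected"
    DEAD = "dead"


def parse_state(raw: str) -> AgentState:
    """Convert a state string (as shown in the table) into an AgentState."""
    return AgentState(raw.strip().lower())


def can_serve_requests(state: AgentState) -> bool:
    """Per the table, only an active agent is online and serving requests."""
    return state is AgentState.ACTIVE
```

An unknown string raises `ValueError` from the `Enum` lookup, which makes unexpected states visible instead of silently ignoring them.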
Owner Controls
As an agent owner, you can:
- Suspend: Temporarily stop the agent from receiving requests
- Resume: Re-enable a suspended agent
- Remove: Permanently delete the agent from the platform
- View logs: Access agent event history and audit trail
- Monitor: Real-time health metrics and status
Region Configuration
Agents are assigned to a region during enrollment. Regions are free-form strings up to 24 ASCII characters that help organize agents geographically or logically.
us-east-1
eu-west
datacenter-a
gpu-cluster-01
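The two stated constraints (at most 24 characters, ASCII only) can be checked locally before you choose a region name. This is a minimal sketch. Rejecting the empty string is an assumption, since only the upper bound is stated, and `is_valid_region` is a hypothetical helper.

```python
MAX_REGION_LENGTH = 24


def is_valid_region(region: str) -> bool:
    """Check the documented limits: ASCII only, at most 24 characters."""
    # Assumption: an empty region is rejected; only the upper bound is documented.
    return 0 < len(region) <= MAX_REGION_LENGTH and region.isascii()
```

All four example names above pass this check, while a 25-character name or one containing non-ASCII letters does not.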
Model Caching
Self-hosted agents maintain a dedicated model cache that is not shared with other users. This provides faster model loading, predictable performance, and data isolation.
Routing Behavior
The Router uses the following logic to select an agent for each request:
- User-owned agents take precedence: If you have active agents, they are used first
- Automatic fallback: When all your agents are busy or offline, requests can fall back to shared agents
- Regional affinity: Requests prefer agents in the same or nearby regions
- Load balancing: Requests are distributed across multiple agents of the same type
Note: Fallback to shared agents must be enabled in your endpoint settings. When disabled, requests will queue until an agent becomes available.
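The selection rules above can be sketched as a small function. This is not the Router's actual implementation: the `Agent` fields, the `in_flight` and `capacity` load metrics, and the tie-breaking order are all assumptions made for illustration, and "nearby" regions are not modeled (only an exact region match is preferred).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Agent:
    name: str
    owned: bool      # True for a self-hosted agent owned by the caller
    state: str       # lifecycle state, e.g. "active"
    region: str
    in_flight: int   # hypothetical load metric: requests currently running
    capacity: int    # hypothetical limit on concurrent requests


def _available(agent: Agent) -> bool:
    return agent.state == "active" and agent.in_flight < agent.capacity


def select_agent(agents: List[Agent], request_region: str,
                 fallback_enabled: bool) -> Optional[Agent]:
    """Pick an agent, or return None when the request should be queued."""

    def rank(agent: Agent):
        # Same-region agents sort first, then the least-loaded agent.
        return (agent.region != request_region, agent.in_flight)

    owned = [a for a in agents if a.owned and _available(a)]
    if owned:
        return min(owned, key=rank)
    if fallback_enabled:
        shared = [a for a in agents if not a.owned and _available(a)]
        if shared:
            return min(shared, key=rank)
    return None  # no agent available: the caller queues the request
```

Note that ownership is checked before region, so an available owned agent in another region is chosen over a shared agent in the request's region. That ordering follows the "user-owned agents take precedence" rule as read here; the real Router may weigh these factors differently.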
Frequently Asked Questions
Can I use both agent types simultaneously?
Yes. You can have self-hosted agents for your primary workloads while using shared agents as a fallback. Configure this in your endpoint settings under "Fallback Options."
What happens if my agent goes offline?
When an agent disconnects, it enters the disconnected state. Requests are routed to other available agents. If no agents are available and fallback is enabled, requests go to shared agents. If fallback is disabled, requests are queued until an agent becomes available.
How do I migrate from shared to self-hosted?
Generate a join key from the Agents dashboard, deploy an agent on your infrastructure using the Xerotier.ai agent Docker image, and start the agent with the join key. The agent then registers and becomes available for routing, and your endpoints automatically start using it.
Are my requests visible to other users on shared agents?
No. Even on shared agents, requests are isolated. Each request runs in a sandboxed environment, memory is cleared between requests, network access is restricted to your project, and logs and metrics are project-scoped.
What hardware do I need for self-hosting?
Requirements depend on your model and workload. GPU agents need an NVIDIA GPU with CUDA support (vLLM compatible). CPU agents need a modern x86_64 CPU. You also need sufficient RAM to load your models, a stable network connection, and Docker with GPU passthrough for GPU agents.
Can I run multiple agents?
Yes. Running multiple agents provides redundancy (requests continue if one agent fails), scaling (handle more concurrent requests), and geographic distribution (deploy agents closer to your users).
How do join keys work?
Join keys are signed JWTs (JSON Web Tokens) that contain the project ID, region, tier ID, and expiration time. The Router verifies the token when an agent connects. Once used, the join key is invalidated and cannot be reused.
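To make the expiry and single-use behavior concrete, here is a minimal in-memory sketch. It does not model signature verification or the token's claims, and `JoinKeyRegistry` is a hypothetical class for illustration, not how the platform stores keys.

```python
import time
from typing import Dict, Optional


class JoinKeyRegistry:
    """Tracks issued keys so that each one can be redeemed at most once."""

    def __init__(self) -> None:
        self._expires_at: Dict[str, float] = {}

    def issue(self, key: str, ttl_seconds: float, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._expires_at[key] = now + ttl_seconds

    def redeem(self, key: str, now: Optional[float] = None) -> bool:
        """Return True only for a known, unexpired key that was not used before."""
        now = time.time() if now is None else now
        # pop() removes the key, so any later attempt with it fails.
        expires_at = self._expires_at.pop(key, None)
        return expires_at is not None and now < expires_at
```

In this sketch the first `redeem` call for a fresh key succeeds and every later call with the same key fails, as does any call made after the key's expiry time.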
What regions can agents serve?
Regions are configured when generating the join key. The region string can be up to 24 ASCII characters and is free-form (you define your own naming scheme). It is used for routing decisions and does not have to match the agent's physical location.