Why We Exist

The AI infrastructure industry has a problem: it treats customers like obstacles between product and profit.

You have seen it. Pricing designed to confuse. Documentation written to generate support tickets. "Enterprise" features that should be standard. Security treated as an upsell. Complexity manufactured to justify consulting fees.

We started Xerotier.ai because we were tired of being on the receiving end of this. We wanted inference routing that worked out of the box, was priced by what you use, and was documented as if the engineers writing it actually wanted you to succeed.

Turns out that was not a small ask. So we built it ourselves.

Xero (zero) + tier = zero lock-in, zero bullshit. Your hardware, your rules. Works on the hyperscalers when you need them. Runs on your metal when you do not. The cloud was supposed to be liberating - we are making it that way again.

What We Build

Xerotier.ai is a complete AI inference platform. You bring your models, we handle the hard parts: routing requests to the right hardware, managing capacity across tiers, scaling to zero when idle, and scaling back up when you need it.
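The scale-to-zero behavior above can be sketched as a simple idle-based decision. This is an illustrative toy, not Xerotier.ai's actual implementation; the function name, the threshold, and the single-replica assumption are all invented for the example.

```python
# Hedged sketch of scale-to-zero: if a model deployment has been idle
# past a threshold, release its hardware; when traffic is waiting,
# scale back up. All names and values here are illustrative.
IDLE_TIMEOUT_S = 300  # assumed idle threshold; configurable in practice

def desired_replicas(last_request_ts: float, pending_requests: int,
                     now: float) -> int:
    """Return how many replicas a model deployment should run."""
    if pending_requests > 0:
        return 1  # traffic waiting: scale back up from zero
    if now - last_request_ts > IDLE_TIMEOUT_S:
        return 0  # idle past the threshold: release the hardware
    return 1      # recently active: keep a warm replica
```

A real autoscaler would also account for queue depth and model load time, but the core decision is this small.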

OpenAI-compatible API. A one-line change in your code. Bring your own models or use shared ones. Self-host on your metal or run on ours. We do not care how you deploy - we care that it works.
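Concretely, "one-line change" means the request shape stays identical and only the base URL moves. A minimal sketch, assuming a hypothetical `api.xerotier.ai` endpoint (not a documented URL):

```python
import json

OPENAI_BASE = "https://api.openai.com/v1"
XEROTIER_BASE = "https://api.xerotier.ai/v1"  # hypothetical endpoint

def build_chat_request(base_url: str, model: str, prompt: str):
    """Build the (url, json_body) pair for an OpenAI-style chat call.

    The body is the same either way; only base_url differs.
    """
    url = base_url + "/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, body

# Pointing existing client code at a new backend is exactly this diff:
url, body = build_chat_request(XEROTIER_BASE, "my-model", "Hello")
```

With an OpenAI SDK you would make the same swap by setting its `base_url` option and leaving everything else alone.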

Multi-vendor GPU support. NVIDIA, AMD, Intel - we do not care whose silicon you prefer. Eleven service tiers, from free CPU to dedicated GPU. No vendor lock-in on the hardware that matters most.

Why Swift? Everyone else builds inference routing in Python. We did not. Swift gives us memory safety without garbage collection pauses, native async/await for streaming, and the kind of performance you need when every millisecond of routing overhead is a millisecond your model is not generating tokens. Infrastructure code should be boring and fast. Swift lets us be both.

Inference Router

The traffic cop. Routes requests by tier, handles auth, manages rate limits, streams responses. Written in Swift because performance matters.
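The tier-aware routing described above can be sketched like this (in Python for brevity; the real router is Swift). Prefer a healthy backend at the request's tier, then degrade to lower tiers. The data model and names are invented for the example, not Xerotier.ai's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Backend:
    name: str
    tier: int        # higher = more capable hardware
    healthy: bool = True

def route(backends, tier: int):
    """Pick a healthy backend at the requested tier, degrading downward.

    Returns None when nothing healthy exists at or below the tier.
    """
    for t in range(tier, -1, -1):  # tier, tier-1, ..., 0
        for b in backends:
            if b.tier == t and b.healthy:
                return b
    return None
```

Graceful degradation is the point: an unhealthy GPU tier routes to the next tier down instead of failing the request outright.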

Backend Agent

The worker. Runs alongside vLLM on your GPU boxes. ZeroMQ transport, automatic model loading, health reporting. Deploy anywhere.
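The health reporting above can be sketched as a heartbeat payload plus a staleness check on the receiving side. In the real system this would travel over ZeroMQ; the field names and timeout here are assumptions for illustration.

```python
import json
import time

HEARTBEAT_TIMEOUT_S = 15  # assumed; a real deployment would tune this

def health_payload(node_id: str, loaded_models: list, gpu_free_mib: int) -> str:
    """Build the JSON heartbeat a worker might publish (shape is illustrative)."""
    return json.dumps({
        "node": node_id,
        "ts": time.time(),
        "models": loaded_models,
        "gpu_free_mib": gpu_free_mib,
    })

def is_stale(payload: str, now: float) -> bool:
    """True if the last heartbeat is older than the timeout."""
    return now - json.loads(payload)["ts"] > HEARTBEAT_TIMEOUT_S
```

A router that drops stale nodes from its pool gets automatic failover almost for free: stop hearing from a box, stop sending it traffic.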

Control Plane

The dashboard. Model management, usage analytics, billing, team access. GDPR-compliant by design, not by afterthought.

How We Operate

What We Do Not Do

  • Hide pricing until you talk to sales. Our pricing is on the website. If you need a custom deal, email us. We will respond like humans.
  • Train on your data. Your prompts are yours. Your responses are yours. We route traffic, we do not harvest it.
  • Require annual contracts to access basic features. Pay monthly. Leave when you want. We would rather earn your business every month than trap you with paperwork.
  • Build complexity for job security. Every feature we ship has to justify its existence. If we cannot explain why it matters in two sentences, we do not ship it.
  • Pretend we are perfect. We ship bugs. We have outages. When we do, we tell you what happened and what we are doing about it. No spin, no PR-approved non-apologies.

Opinionated Defaults

We make decisions so you do not have to. Security on by default. Sane rate limits. Automatic failover. You can override everything, but you should not have to.

Operator-First Design

We build for the person who gets paged at 3 AM, not the person who demos at conferences. Clear errors. Actionable logs. Recovery that does not require a PhD.

Radical Simplicity

If a feature requires a tutorial, we failed. If configuration requires a consultant, we failed. If debugging requires our help, we failed.

Sustainable Business

No circus. We charge money for our product and we use that money to make the product better. Revolutionary, we know.

Lover of Open Source, Hater of Nonsense

Who We Are

Kevin Carter (Cloudnull) founded Xerotier.ai after spending over a decade building infrastructure at scale - the kind that runs data centers, deploys clouds, and handles the traffic that keeps the internet functioning. OpenStack, bare metal, hypervisors. The unsexy stuff that everything else depends on.

The move to AI infrastructure was not a pivot, it was a continuation. The same problems that plague traditional infrastructure plague AI inference: unfair scheduling, wasted resources, security as an afterthought, complexity as a business model. The same solutions work too: fair queuing, graceful degradation, secure defaults, radical simplicity.

Most infrastructure fails not because the problem was hard, but because someone optimized for the wrong thing. They optimized for developer experience instead of operator experience. For deployment speed instead of recovery speed. For looking clever instead of being reliable. We optimize for systems that work when no one is watching.

Xerotier.ai is built by people who have been on call. Who have debugged production at 3 AM. Who have inherited codebases from people who optimized for conference talks instead of maintainability. We build the tools we wish existed.

We are not trying to build a unicorn. We are trying to build a company that makes useful software, charges fair prices, and treats customers like adults. If that sounds unremarkable, consider how rare it actually is.

Ready to try inference that just works?

Free tier available. No credit card required. No sales call. Just sign up and start routing.