Pricing

Start free. Pro recovers the waste across your whole fleet.

Every engine (vLLM · SGLang · TGI) is free forever: the measurement that finds your floor costs nothing. Pro is $49 per deployment per month (less at fleet scale), so the price tracks the size of your fleet.

Measured, not estimated
$49 / deployment / mo recovers $1,984 / mo

From a real, A/B-tested A10G run: cold $2.24/1M → cached $0.26/1M tokens, at ~1B output tokens/mo.
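
For readers who want to check the headline number against their own volume, here is a minimal sketch of the arithmetic. The function name and parameters are illustrative, not part of squiz. With the rounded per-1M prices quoted above the result is about $1,980/mo; the $1,984 headline presumably comes from the unrounded measurements, which is an assumption on our part.

```python
def monthly_recovery(cold_per_1m: float, cached_per_1m: float,
                     output_tokens_per_month: float) -> float:
    """Dollars recovered per month when the $/1M-token cost drops
    from cold_per_1m to cached_per_1m at the given monthly volume."""
    millions_of_tokens = output_tokens_per_month / 1_000_000
    return (cold_per_1m - cached_per_1m) * millions_of_tokens


# Figures from the A10G run quoted above (rounded as published).
recovered = monthly_recovery(2.24, 0.26, 1_000_000_000)
print(f"recovered: about ${recovered:,.0f}/mo")
```

Swap in your own cold and cached $/1M figures and monthly token volume to see what the same drop would be worth on your deployment.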

Free

Free
Your live $/token floor across a small fleet — free forever, no card.
  • vLLM + SGLang + TGI (engines never gated)
  • Live $/1M-token + your efficient knee
  • Recover view at your volume
  • Exact config recommendations
  • Up to 3 deployments — measure your whole fleet
  • Latest sweep + recover view
Start free

Pro

Most popular
$49 / deployment / mo
$49/deployment · $39 each from the 5th · 20+ → Scale
Always-on cost recovery across every deployment in your fleet.
  • Everything in Free — on every deployment
  • Continuous monitoring per deployment — scale to your whole fleet
  • In-console cost-drift & SLO alerts
  • Verify every fix — re-measure your floor to prove it moved, or roll back
  • Full floor-over-time history & trends (last 2,000 sweeps)
  • Fleet showback (per feature / team / deployment)
  • Priority support
Get started →

Scale

Let’s talk
Fleet pricing or pay-from-savings, for 20+ deployments.
  • Everything in Pro, across unlimited deployments
  • Volume fleet pricing — or pay-from-savings (we take a cut of what we recover)
  • Annual contract & invoicing
  • SSO · RBAC · on-prem control plane
  • Dedicated recovery reviews + priority support
Contact us →

A deployment is one model-serving endpoint you point squiz at: one vLLM/SGLang/TGI server, on any number of GPUs. Pro is billed per monitored deployment: $49 each for the first four, $39 each from the 5th, and fleets of 20+ move to Scale. Run 1 or 100. See real measured floors in the cross-GPU benchmark.
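
The per-deployment pricing can be sketched as a small function. This is an illustration under one stated assumption: the $39 rate applies only to the 5th deployment onward, while the first four stay at $49. The function name is hypothetical, and fleets of 20 or more return `None` because Scale pricing is negotiated rather than listed.

```python
from typing import Optional

FULL_PRICE = 49        # $/deployment/mo for deployments 1-4
DISCOUNT_PRICE = 39    # $/deployment/mo from the 5th deployment
FULL_PRICE_SLOTS = 4   # deployments billed at the full price
SCALE_THRESHOLD = 20   # 20+ deployments move to the Scale tier


def pro_monthly_cost(deployments: int) -> Optional[int]:
    """Monthly Pro bill in dollars, or None when the fleet is large
    enough (20+) that Scale pricing applies instead."""
    if deployments < 0:
        raise ValueError("deployments must be non-negative")
    if deployments >= SCALE_THRESHOLD:
        return None  # Scale tier: custom pricing, not computed here
    at_full = min(deployments, FULL_PRICE_SLOTS)
    at_discount = max(deployments - FULL_PRICE_SLOTS, 0)
    return at_full * FULL_PRICE + at_discount * DISCOUNT_PRICE


for n in (1, 4, 5, 10, 20):
    print(n, pro_monthly_cost(n))
```

Note that the count is of monitored endpoints, so a 70B model served across 4 GPUs still counts as one deployment in this calculation.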

Is it really free?

Yes — up to three deployments, every engine, the cost x-ray and recover view, forever. No card.

What’s a deployment?

One model-serving endpoint you point squiz at — a single vLLM/SGLang/TGI server, on any number of GPUs (a 70B across 4 GPUs is one deployment, not four). We meter the endpoints you monitor, not a moving GPU count — $49 each, $39 from the 5th.

Will it touch my traffic?

Never. The agent reads /metrics only — nothing in your request path.
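
To make "reads /metrics only" concrete, here is a rough sketch of what a read-only scrape looks like. It is not squiz's agent code. It assumes the endpoint serves the Prometheus text format without timestamps, and the base URL, function names, and sample payload are illustrative. The only network call is a GET to `/metrics`; nothing is sent to the inference routes.

```python
import urllib.request
from typing import Dict

# Illustrative payload in Prometheus text format (assumed, no timestamps).
SAMPLE = """\
# HELP vllm:generation_tokens_total Number of generation tokens processed.
# TYPE vllm:generation_tokens_total counter
vllm:generation_tokens_total{model_name="demo"} 1200.0
vllm:generation_tokens_total{model_name="other"} 300.0
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Sum sample values per metric name, ignoring labels and comments."""
    totals: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]  # drop the {label="..."} part
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def scrape(base_url: str) -> Dict[str, float]:
    """Read-only GET of <base_url>/metrics; base_url is a placeholder."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=5) as resp:
        return parse_metrics(resp.read().decode("utf-8"))


print(parse_metrics(SAMPLE))
```

Because the scrape sits beside the server rather than in front of it, a failed or slow scrape cannot delay or drop a user request.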