Start free. Pro recovers the waste across your whole fleet.
Every engine (vLLM · SGLang · TGI) is free forever: the measurement that finds your floor costs nothing. Pro is $49 per deployment (less at fleet scale), so your bill grows only as your fleet does.
One real, A/B-proven A10G run: cold $2.24/1M tokens → cached $0.26/1M tokens, at ~1B output tokens/mo.
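To put the A10G per-token rates in monthly terms, the sketch below multiplies each rate by the stated volume. It assumes the volume is exactly 1B output tokens a month (the page says "~1B") and that both rates hold flat across it. `monthly_cost_cents` is a hypothetical helper for this illustration, not part of squiz. It uses integer cents so the arithmetic stays exact.

```python
# Assumption: exactly 1B output tokens/mo (the run states ~1B) and flat per-token rates.
TOKENS_PER_MONTH = 1_000_000_000
COLD_CENTS_PER_1M = 224    # cold: $2.24 per 1M tokens
CACHED_CENTS_PER_1M = 26   # cached: $0.26 per 1M tokens


def monthly_cost_cents(cents_per_1m: int, tokens: int) -> int:
    """Hypothetical helper: monthly cost in cents at a flat per-1M-token rate."""
    # Floor division; exact here because the volume is a whole number of millions.
    return cents_per_1m * tokens // 1_000_000


cold = monthly_cost_cents(COLD_CENTS_PER_1M, TOKENS_PER_MONTH)
cached = monthly_cost_cents(CACHED_CENTS_PER_1M, TOKENS_PER_MONTH)
saved = cold - cached

print(f"cold ${cold / 100:,.2f}  cached ${cached / 100:,.2f}  "
      f"saved ${saved / 100:,.2f}/mo ({saved / cold:.1%})")
# prints: cold $2,240.00  cached $260.00  saved $1,980.00/mo (88.4%)
```

At that volume the gap between the two rates comes to about $1,980 a month for this single deployment.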
Free
- vLLM + SGLang + TGI (engines never gated)
- Live $/1M-token + your efficient knee
- Recover view at your volume
- Exact config recommendations
- Up to 3 deployments — measure your whole fleet
- Latest sweep
Pro
Most popular
- Everything in Free, on every deployment
- Continuous monitoring per deployment — scale to your whole fleet
- In-console cost-drift & SLO alerts
- Verify every fix — re-measure your floor to prove it moved, or roll back
- Full floor-over-time history & trends (last 2,000 sweeps)
- Fleet showback (per feature / team / deployment)
- Priority support
Scale
- Everything in Pro, across unlimited deployments
- Volume fleet pricing — or pay-from-savings (we take a cut of what we recover)
- Annual contract & invoicing
- SSO · RBAC · on-prem control plane
- Dedicated recovery reviews + priority support
A deployment is one model-serving endpoint you point squiz at: one vLLM/SGLang/TGI server, on any number of GPUs. Pro is billed per monitored deployment: $49 each, then $39 each from the 5th onward; at 20 or more deployments you move to Scale. Run 1 or 100. See real measured floors in the cross-GPU benchmark.
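The Pro price rule can be worked through as a small calculation. This is a minimal sketch under one reading of the rule: the page does not say whether $39 applies only from the 5th deployment onward (graduated) or to every deployment once you reach five, so the sketch assumes graduated pricing. It also doesn't state a billing period, so totals are simply dollars per period. `pro_price` is a hypothetical helper, not a squiz API.

```python
from typing import Optional

# Assumed graduated reading: $49 for each of the first four deployments,
# $39 for each deployment from the 5th onward, Scale at 20 or more.
PRO_BASE_PRICE = 49
PRO_VOLUME_PRICE = 39
VOLUME_FROM = 5   # the 5th deployment is the first at the volume price
SCALE_AT = 20     # 20+ deployments belong on the Scale plan


def pro_price(deployments: int) -> Optional[int]:
    """Hypothetical helper: Pro total in dollars, or None if the count needs Scale."""
    if deployments < 0:
        raise ValueError("deployments must be non-negative")
    if deployments >= SCALE_AT:
        return None  # Scale uses volume or pay-from-savings pricing, not a list price
    at_base = min(deployments, VOLUME_FROM - 1)   # first four at $49
    at_volume = deployments - at_base             # the 5th onward at $39
    return at_base * PRO_BASE_PRICE + at_volume * PRO_VOLUME_PRICE


print(pro_price(10))  # 430 (4 × $49 + 6 × $39)
```

Under the other reading (all deployments at $39 once you have five), ten deployments would come to $390 instead.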
Yes — up to three deployments, every engine, the cost x-ray and recover view, forever. No card.
One model-serving endpoint you point squiz at — a single vLLM/SGLang/TGI server, on any number of GPUs (a 70B across 4 GPUs is one deployment, not four). We meter the endpoints you monitor, not a moving GPU count — $49 each, $39 from the 5th.
Never. The agent reads /metrics only — nothing in your request path.