Pricing

Start free. Pro recovers the waste across your whole fleet.

Every engine (vLLM · SGLang · TGI) is free forever: the measurement that finds your floor costs nothing. Pro is $49 per deployment per month (less at fleet scale), so the price tracks the size of your fleet.

Measured, not estimated

$49 / deployment / mo recovers $1,984 / mo

A real, A/B-proven A10G run: cold $2.24 → cached $0.26 per 1M tokens, at ~1B output tokens/mo.
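For readers who want to check the arithmetic at their own volume, here is a minimal sketch. It uses only the rounded figures quoted above; the function and constant names are illustrative and not part of squiz. With the rounded prices the result is about $1,980 per month, so the $1,984 headline presumably comes from unrounded measurements (an assumption, since the page does not show them).

```python
# Rounded figures as displayed on this page.
COLD_USD_PER_1M = 2.24        # measured cost per 1M tokens, cold
CACHED_USD_PER_1M = 0.26      # measured cost per 1M tokens, cached
TOKENS_PER_MONTH = 1_000_000_000  # "~1B output tokens/mo"
PRO_USD_PER_MONTH = 49        # one Pro deployment


def monthly_recovery(cold_per_1m: float, cached_per_1m: float,
                     tokens_per_month: int) -> float:
    """Dollars recovered per month: price gap per 1M tokens times volume in millions."""
    return (cold_per_1m - cached_per_1m) * tokens_per_month / 1_000_000


recovered = monthly_recovery(COLD_USD_PER_1M, CACHED_USD_PER_1M, TOKENS_PER_MONTH)
net_of_pro = recovered - PRO_USD_PER_MONTH

print(f"recovered:  ${recovered:,.0f}/mo")
print(f"net of Pro: ${net_of_pro:,.0f}/mo")
```

Swap in your own per-1M prices and monthly token volume to see where the break-even against the $49 fee falls.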

Free

Your live $/token floor across a small fleet — free forever, no card.

vLLM + SGLang + TGI (engines never gated)
Live $/1M-token + your efficient knee
Recover view at your volume
Exact config recommendations
Up to 3 deployments — measure your whole fleet
Latest sweep + recover view

Start free

Pro

Scale

Let’s talk

Fleet pricing or pay-from-savings, for 20+ deployments.

Everything in Pro, across unlimited deployments
Volume fleet pricing — or pay-from-savings (we take a cut of what we recover)
Annual contract & invoicing
SSO · RBAC · on-prem control plane
Dedicated recovery reviews + priority support

A deployment is one model-serving endpoint you point squiz at: one vLLM/SGLang/TGI server, on any number of GPUs. Pro is billed per monitored deployment: $49 each, $39 each from the 5th onward, and 20 or more moves to Scale. Run 1 or 100. See real measured floors in the cross-GPU benchmark.
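The Pro tiers can be sketched as a small calculator. This assumes graduated pricing (the first four deployments stay at $49 and only the 5th onward is billed at $39), which is one reading of "$39 from the 5th"; the page does not spell out whether the lower rate is retroactive. The function name is hypothetical, and Scale pricing is custom, so the sketch declines to quote it.

```python
BASE_USD = 49          # per deployment, deployments 1-4
VOLUME_USD = 39        # per deployment, from the 5th (assumed non-retroactive)
VOLUME_STARTS_AT = 5
SCALE_THRESHOLD = 20   # 20+ deployments moves to Scale (custom pricing)


def pro_monthly_cost(deployments: int) -> int:
    """Monthly Pro bill in USD for a given number of monitored deployments."""
    if deployments < 0:
        raise ValueError("deployments must be non-negative")
    if deployments >= SCALE_THRESHOLD:
        raise ValueError("20+ deployments is Scale: pricing is negotiated")
    at_base = min(deployments, VOLUME_STARTS_AT - 1)
    at_volume = max(deployments - (VOLUME_STARTS_AT - 1), 0)
    return at_base * BASE_USD + at_volume * VOLUME_USD


for n in (1, 4, 5, 19):
    print(f"{n:>2} deployments: ${pro_monthly_cost(n)}/mo")
```

Under this reading, four deployments cost $196 per month and the fifth adds $39.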

Is it really free?

Yes — up to three deployments, every engine, the cost x-ray and recover view, forever. No card.

What’s a deployment?

One model-serving endpoint you point squiz at — a single vLLM/SGLang/TGI server, on any number of GPUs (a 70B across 4 GPUs is one deployment, not four). We meter the endpoints you monitor, not a moving GPU count — $49 each, $39 from the 5th.

Will it touch my traffic?

Never. The agent reads /metrics only — nothing in your request path.
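To illustrate what an out-of-band metrics read looks like in general, here is a minimal sketch of fetching and parsing a Prometheus-style /metrics page. It is not squiz's agent: the function names, the sample metric names, and the simplified parsing are all assumptions made for illustration. The only network call is a plain GET against the metrics URL, separate from any inference request.

```python
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """GET the metrics page; this never sends an inference request."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text: str) -> dict[str, float]:
    """Simplified parse of Prometheus text format into {series: value}."""
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        if "}" in line:  # series with labels: name{...} value [timestamp]
            end = line.rindex("}") + 1
            series, rest = line[:end], line[end:]
        else:            # series without labels: name value [timestamp]
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            samples[series] = float(fields[0])
    return samples


# Placeholder sample; real engines expose their own metric names.
SAMPLE = """\
# TYPE demo_generation_tokens_total counter
demo_generation_tokens_total{model="demo"} 1234.0
demo_requests_running 2
"""

parsed = parse_metrics(SAMPLE)
print(parsed)
```

Against a live server you would pass the endpoint's own metrics URL to `fetch_metrics` and feed the result to `parse_metrics`.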