AI Cost Optimization
Optimize AI cost without breaking the product.
P402 surfaces routing, caching, and prompt-shape recommendations grounded in your own outcome data, so the savings are real and the quality risk is named.
For engineering leaders and platform owners who need to cut AI cost without cutting accepted-output quality.
The problem
Optimization advice without outcome data is a guess.
Every blog post says "switch to a smaller model for that workflow." None of them know your quality bar. The advice fits everyone and helps no one.
Real optimization needs a quality-adjusted view: cost per accepted output, not just cost per token. Without outcome status on the ledger, the smaller-model recommendation is a coin flip.
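To make the difference concrete, here is a minimal sketch that compares the two views for a large and a small model. The `ModelRun` type, the field names, and every number are invented for illustration; none of them come from P402.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelRun:
    # Hypothetical aggregate for one workflow on one model.
    name: str
    total_cost_usd: float
    total_tokens: int
    accepted_outputs: int


def cost_per_1k_tokens(run: ModelRun) -> float:
    return run.total_cost_usd / run.total_tokens * 1000


def cost_per_accepted_output(run: ModelRun) -> Optional[float]:
    # Without any accepted outputs the metric is undefined, not zero.
    if run.accepted_outputs == 0:
        return None
    return run.total_cost_usd / run.accepted_outputs


# Made-up numbers: the small model is cheaper per token,
# but far fewer of its outputs are accepted.
large = ModelRun("large", total_cost_usd=100.0, total_tokens=10_000_000, accepted_outputs=950)
small = ModelRun("small", total_cost_usd=40.0, total_tokens=10_000_000, accepted_outputs=300)

for run in (large, small):
    print(run.name, cost_per_1k_tokens(run), cost_per_accepted_output(run))
```

With these numbers the small model wins on cost per token and loses on cost per accepted output, which is the case a token-only view cannot see.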
What P402 does
One ledger. Owner, budget, policy, outcome, evidence.
Each card answers nine questions: what was found, where, current cost, suggested change, projected savings, quality risk, evidence, what-if-approved, rollback.
Optimize ranks workflows by cost per accepted output. Workflows with poor outcome attachment surface first as data gaps, not as fake wins.
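One way to picture that ordering is the sketch below. It assumes a hypothetical `WorkflowStats` record and a 0.8 coverage threshold, and it assumes the most expensive workflows per accepted output are listed first among those with enough outcome data. It is not P402's actual ranking code.

```python
from dataclasses import dataclass


@dataclass
class WorkflowStats:
    # Hypothetical per-workflow aggregate; field names are assumptions.
    name: str
    cost_usd: float
    events: int
    events_with_outcome: int
    accepted_outputs: int


def coverage(w: WorkflowStats) -> float:
    # Share of events that have an outcome attached.
    return w.events_with_outcome / w.events if w.events else 0.0


def cost_per_accepted(w: WorkflowStats) -> float:
    # No accepted outputs means the workflow is infinitely expensive per accepted output.
    return w.cost_usd / w.accepted_outputs if w.accepted_outputs else float("inf")


def rank(workflows, min_coverage=0.8):
    # Data gaps come first, worst coverage at the top.
    gaps = sorted((w for w in workflows if coverage(w) < min_coverage), key=coverage)
    # Then workflows with enough outcome data, most expensive per accepted output first.
    scored = sorted(
        (w for w in workflows if coverage(w) >= min_coverage),
        key=cost_per_accepted,
        reverse=True,
    )
    return [("data_gap", w.name) for w in gaps] + [("ranked", w.name) for w in scored]


workflows = [
    WorkflowStats("B", cost_usd=200.0, events=100, events_with_outcome=100, accepted_outputs=100),
    WorkflowStats("A", cost_usd=500.0, events=100, events_with_outcome=50, accepted_outputs=40),
    WorkflowStats("C", cost_usd=500.0, events=100, events_with_outcome=90, accepted_outputs=50),
]
print(rank(workflows))
```

Workflow A has outcomes on only half of its events, so it is reported as a data gap instead of being scored on partial evidence.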
retry_cost_usd is treated as its own savings target. A workflow that costs $4k/mo with 30% of that spend going to retries is a $1.2k/mo savings opportunity, evidence-attached.
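The arithmetic behind that example, as a small sketch. It assumes "30% retries" means 30% of the workflow's spend is attributable to retries; the function name `retry_savings_target` is hypothetical.

```python
def retry_savings_target(monthly_cost_usd: float, retry_cost_share: float) -> float:
    """Spend attributable to retries, under the assumption stated above."""
    if not 0.0 <= retry_cost_share <= 1.0:
        raise ValueError("retry_cost_share must be between 0 and 1")
    return monthly_cost_usd * retry_cost_share


# $4k/mo workflow with 30% of spend going to retries.
target = retry_savings_target(4000.0, 0.30)
print(f"${target:,.0f}/mo")
```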
Semantic-cache hits are recorded as savings against the would-be inference cost. Finance sees the savings, not just the cost.
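A minimal sketch of how that could look on a ledger, assuming each entry carries the price the inference would have cost and a cache-hit flag. `LedgerEntry`, its fields, and the simplification that a cache lookup costs nothing are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LedgerEntry:
    # Hypothetical ledger row.
    workflow: str
    cache_hit: bool
    would_be_cost_usd: float  # price of the inference had it actually run


def summarize(entries):
    # Misses were paid for; hits avoided the would-be inference cost.
    spend = sum(e.would_be_cost_usd for e in entries if not e.cache_hit)
    saved = sum(e.would_be_cost_usd for e in entries if e.cache_hit)
    return {"spend_usd": spend, "cache_savings_usd": saved}


entries = [
    LedgerEntry("support-triage", cache_hit=False, would_be_cost_usd=0.5),
    LedgerEntry("support-triage", cache_hit=True, would_be_cost_usd=0.5),
    LedgerEntry("support-triage", cache_hit=True, would_be_cost_usd=0.25),
]
print(summarize(entries))
```

Recording the hit with its would-be cost is what lets the savings appear as a line item next to the spend.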
Proof
Per recommendation card. No black-box advice.
Savings are gated on outcome attachment. No outcome data, no fake savings claim.
Every recommendation includes its undo path. Reversible by construction.
Questions
AI cost optimization: FAQ
How does Optimize know my quality bar?
You attach outcome status and quality_score to events as the work resolves. The ranking uses cost per accepted output computed from those values, not a universal threshold.
Can we run recommendations without outcome data?
You can see them, but they're labelled as low-confidence. Optimize will not project a savings number unless outcome attachment is above your minimum coverage threshold.
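The gate can be sketched like this. The function name, the returned fields, and the choice to treat coverage exactly at the threshold as sufficient are assumptions, not P402's API.

```python
from typing import Optional


def gate_recommendation(
    raw_savings_usd: float,
    events: int,
    events_with_outcome: int,
    min_coverage: float,
) -> dict:
    # Coverage is the share of events that have an outcome attached.
    coverage = events_with_outcome / events if events else 0.0
    if coverage < min_coverage:
        # Still visible, but labelled low-confidence and without a savings number.
        projected: Optional[float] = None
        confidence = "low"
    else:
        projected = raw_savings_usd
        confidence = "normal"
    return {
        "coverage": coverage,
        "confidence": confidence,
        "projected_savings_usd": projected,
    }


print(gate_recommendation(1200.0, events=100, events_with_outcome=40, min_coverage=0.8))
print(gate_recommendation(1200.0, events=100, events_with_outcome=90, min_coverage=0.8))
```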
Does P402 auto-apply recommendations?
No. Auto-apply is intentionally off the table. Every recommendation is reviewed; rollback is one click. Engineering teams need to own the change.
How are projected savings calculated?
Counterfactual on the last N events: re-price the workflow under the recommended change, apply the cache or route delta, cap by quality risk. The math is in the card, not hidden.
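As a rough sketch of that calculation for a route change: re-price each recent event at the recommended route's token prices, take the difference from what was actually paid, then discount by quality risk. The `Event` fields, the per-1k-token pricing, and the reading of "cap by quality risk" as multiplying by `1 - quality_risk` are all assumptions; the card itself shows the real math.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical ledger event.
    input_tokens: int
    output_tokens: int
    cost_usd: float  # what was actually paid


def projected_savings(
    events,
    new_input_price_per_1k: float,
    new_output_price_per_1k: float,
    quality_risk: float,
) -> float:
    if not 0.0 <= quality_risk <= 1.0:
        raise ValueError("quality_risk must be between 0 and 1")
    actual = sum(e.cost_usd for e in events)
    # Counterfactual: the same events priced under the recommended route.
    repriced = sum(
        e.input_tokens / 1000 * new_input_price_per_1k
        + e.output_tokens / 1000 * new_output_price_per_1k
        for e in events
    )
    raw = max(actual - repriced, 0.0)  # never project negative savings
    # Assumed cap: the riskier the change, the less of the raw saving is claimed.
    return raw * (1.0 - quality_risk)


last_n = [
    Event(input_tokens=1000, output_tokens=1000, cost_usd=0.5),
    Event(input_tokens=1000, output_tokens=1000, cost_usd=0.5),
]
print(projected_savings(last_n, 0.125, 0.125, quality_risk=0.25))
```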
What about prompt-shape recommendations?
Optimize identifies workflows with high context_waste_usd. The card names the redundant context shape; the change is for your team to ship.