>_ DOCS / HOW-TO GUIDES
CHOOSE
ROUTING MODE.
Every P402 request routes to a different provider depending on your priority: cost, quality, speed, or a balanced blend of all three.
How to Set a Mode
Pass mode in the p402 extension block. It applies to that request only. Omitting it defaults to cost.
curl -s -X POST https://p402.io/api/v2/chat/completions \
-H "Authorization: Bearer $P402_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Review this contract clause..."}],
"p402": {
"mode": "quality",
"cache": true,
"session_id": "sess_01jx..."
}
}' | jq .The Four Modes
Selects the provider with the lowest per-token price for the request complexity. Combined with semantic caching, repeated queries cost $0.
- ✓High-volume batch processing
- ✓Classification and extraction
- ✓Summarisation pipelines
- ✓Budget agents with hard caps
- ✗Legal or medical analysis requiring highest accuracy
- ✗Nuanced creative writing
Routes to the provider with the highest ELO benchmark score for the task type. Prefers frontier models: GPT-4o, Claude Opus, Gemini 1.5 Pro.
- ✓Complex reasoning and math
- ✓Code generation and review
- ✓Legal/medical/financial analysis
- ✓Customer-facing chat that must be right
- ✗High-volume low-stakes tasks (use cost mode)
Selects the provider with the lowest current p95 latency. Prefers inference-optimised providers: Groq, Together, Fireworks.
- ✓Real-time chat interfaces
- ✓Streaming user-facing responses
- ✓Time-sensitive agentic loops
- ✓Interactive coding assistants
- ✗Batch jobs where latency doesn't matter
Equal (1/3) weight on cost, quality, and latency scores. A good default for mixed workloads where no single factor dominates.
- ✓General-purpose agents
- ✓Mixed workloads
- ✓Prototyping before tuning
- ✗Workloads with a clear primary constraint — use the specific mode instead