>_ DOCS / EXPLANATION
THE ROUTING ENGINE.
Every request passes through three stages in order: semantic cache lookup, provider scoring, and execution with automatic failover. This page explains exactly how each stage works.
The Three Stages
1. Semantic cache lookup. The request is embedded with Google's text-embedding-004 and compared against recent responses stored in Redis. If cosine similarity exceeds 0.92, the cached response is returned immediately: no LLM call, no cost.
2. Provider scoring. All healthy providers are scored against your chosen routing mode (cost, speed, quality, or balanced). The top scorer becomes the primary candidate; the next two are retained as failover backups.
3. Execution with failover. The primary provider is called. On any failure (rate limit, timeout, 5xx) the router automatically retries with the next-ranked provider. This happens transparently; your client sees one clean response.
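The three stages compose into a single pipeline, which can be sketched as follows. This is a hypothetical illustration: none of the function names are P402 internals, and the stubs stand in for Redis, the scorer, and real provider APIs.

```typescript
// Hypothetical sketch of the three-stage pipeline; the stubs stand in for
// Redis, the provider scorer, and real provider APIs.

type Candidate = { name: string; score: number };

// Stage 1 stub: pretend nothing similar is cached.
async function cacheLookup(_prompt: string): Promise<string | null> {
  return null;
}

// Stage 2 stub: a pre-scored candidate list, best first.
function scoreProviders(_mode: string): Candidate[] {
  return [
    { name: "deepseek", score: 0.91 },
    { name: "anthropic", score: 0.88 },
    { name: "openrouter", score: 0.85 },
  ];
}

// Stage 3 stub: the primary fails here, forcing a failover.
async function callProvider(c: Candidate, _prompt: string): Promise<string> {
  if (c.name === "deepseek") throw new Error("429 rate limited");
  return `response from ${c.name}`;
}

async function route(prompt: string) {
  const hit = await cacheLookup(prompt); // Stage 1: semantic cache
  if (hit !== null) return { provider: "cache", cached: true, failover: false, body: hit };

  const candidates = scoreProviders("cost").slice(0, 3); // Stage 2: top three
  for (let i = 0; i < candidates.length; i++) {          // Stage 3: execute
    try {
      const body = await callProvider(candidates[i], prompt);
      return { provider: candidates[i].name, cached: false, failover: i > 0, body };
    } catch {
      // rate limit, timeout, 5xx: fall through to the next-ranked candidate
    }
  }
  throw new Error("all provider candidates failed");
}
```

With these stubs the primary (deepseek) fails, so the request is served by anthropic and the result carries failover: true, mirroring the transparent retry described above.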
Scoring Algorithm
Each provider candidate receives a composite score between 0 and 1. The weights shift depending on the routing mode you specify. A provider that is down or degraded is penalized before any mode-specific scoring.
| Factor | Source | Description |
|---|---|---|
| success_rate | DB: facilitator_health | 7-day rolling success ratio. A provider at 99% beats one at 95%. |
| p95_settle_ms | DB: facilitator_health | 95th-percentile latency in ms. Weighted heavily in speed mode. |
| cost_per_1k_tokens | lib/ai-providers/registry.ts | Input + output token price. Primary factor in cost mode. |
| reputation_score | ERC-8004 on-chain | On-chain reputation from ERC-8004 registry. Normalized 0–1. |
| health_status | Live health probe | healthy=1.0, degraded=0.5, down=0. Applied as a multiplier. |
Each factor feeds one weight category: cost_per_1k_tokens drives Cost, p95_settle_ms drives Speed, reputation_score drives Quality, and success_rate drives Reliability; health_status is applied afterwards as a multiplier. The per-mode weights:

| Mode | Cost | Speed | Quality | Reliability |
|---|---|---|---|---|
| cost | 70% | 10% | 10% | 10% |
| speed | 10% | 70% | 10% | 10% |
| quality | 10% | 10% | 70% | 10% |
| balanced | 25% | 25% | 25% | 25% |
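Putting the two tables together, a composite score might be computed like this. The factor names, mode weights, and health multiplier come from the tables above; the cost and latency normalizations are assumptions for illustration only.

```typescript
// Sketch of mode-weighted scoring. Factors, weights, and the health
// multiplier are from the tables above; the normalizations are assumed.

type Health = "healthy" | "degraded" | "down";

interface Factors {
  successRate: number;     // 0-1, 7-day rolling success ratio
  p95SettleMs: number;     // 95th-percentile latency in ms
  costPer1kTokens: number; // input + output price in USD
  reputation: number;      // 0-1, normalized ERC-8004 reputation
  health: Health;
}

// [cost, speed, quality, reliability] weights per routing mode
const WEIGHTS: Record<string, [number, number, number, number]> = {
  cost:     [0.70, 0.10, 0.10, 0.10],
  speed:    [0.10, 0.70, 0.10, 0.10],
  quality:  [0.10, 0.10, 0.70, 0.10],
  balanced: [0.25, 0.25, 0.25, 0.25],
};

const HEALTH_MULTIPLIER: Record<Health, number> = { healthy: 1.0, degraded: 0.5, down: 0 };

function compositeScore(f: Factors, mode: string): number {
  const [wCost, wSpeed, wQuality, wReliability] = WEIGHTS[mode];
  // Assumed normalizations: cheaper and faster map toward 1.
  const costScore = 1 / (1 + f.costPer1kTokens);
  const speedScore = 1 / (1 + f.p95SettleMs / 1000);
  const weighted =
    wCost * costScore +
    wSpeed * speedScore +
    wQuality * f.reputation +
    wReliability * f.successRate;
  // health_status is a multiplier, so a down provider scores 0 in every mode
  return weighted * HEALTH_MULTIPLIER[f.health];
}
```

Because the multiplier is applied last, a provider marked down scores 0 in every mode, no matter how cheap or fast it is.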
Semantic Cache Detail
The cache is tenant-scoped: one tenant's responses never leak to another. The 0.92 similarity threshold is the empirically calibrated default: high enough to avoid false positives (serving a cached answer to a genuinely different question), low enough to catch paraphrased duplicates.
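The gate itself reduces to a cosine comparison against that threshold. A minimal sketch, where the 0.92 constant is the documented default and the toy vectors stand in for real text-embedding-004 embeddings:

```typescript
// Cosine-similarity gate for the semantic cache. The 0.92 threshold is the
// documented default; real vectors would come from text-embedding-004.

const CACHE_THRESHOLD = 0.92;

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function isCacheHit(query: number[], stored: number[]): boolean {
  return cosineSimilarity(query, stored) > CACHE_THRESHOLD;
}

// Identical embeddings score 1.0 (hit); orthogonal ones score 0 (miss).
isCacheHit([1, 0], [1, 0]); // → true
isCacheHit([1, 0], [0, 1]); // → false
```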
Opt out of caching
To bypass the cache for a specific request (e.g., real-time data queries), set "cache": false in the p402 object. The request will still be routed normally but the response will not be stored.
Automatic Failover
The router holds a ranked list of three provider candidates for every request. If the top-ranked provider fails for any reason, the router immediately retries with the next candidate — no delay, no error surfaced to your client.
Failover is transparent
The p402_metadata.provider field in the response tells you which provider actually served the request. If failover occurred, this will differ from what you might expect based on your mode. You can use this to diagnose provider degradation.
OpenRouter Meta-Provider
P402 treats OpenRouter as a single provider that proxies 300+ models. When your routing mode selects OpenRouter, the model within OpenRouter is further optimized based on your mode. This means you get access to every new frontier model the moment OpenRouter adds it — no adapter changes required on your end.
- GPT-4.1, Claude 4.5, and Gemini 2.0 Pro are available the day they land on OpenRouter.
- One OPENROUTER_API_KEY covers all 300+ models. P402 adds a transparent 1% routing fee on top.
- If your primary direct provider (e.g. Anthropic) fails, OpenRouter serves as a deep fallback pool.
- Force a specific model with "model": "anthropic/claude-opus-4" in configuration.
Per-Request Configuration
The p402 object in your request body controls routing behavior for that request. All fields are optional; omitting them uses account defaults.
```jsonc
{
  "messages": [...],
  "p402": {
    "mode": "cost",          // "cost" | "speed" | "quality" | "balanced"
    "cache": true,           // true = use semantic cache (default: true)
    "session_id": "ses_...", // budget-capped session (optional)
    "max_cost_usd": 0.01,    // hard ceiling per request (optional)
    "provider": "anthropic", // force a specific provider (optional)
    "model": "claude-opus-4" // force a specific model (optional)
  }
}
```

Routing Decision in the Response
Every response includes a p402_metadata field that tells you exactly what happened: which provider was selected, what it cost, and whether the response came from cache.
```jsonc
{
  "p402_metadata": {
    "provider": "deepseek", // Provider that served the request
    "model": "deepseek-v3", // Model used
    "cost_usd": 0.0003,     // What you were charged
    "direct_cost": 0.0031,  // What GPT-4o would have cost
    "savings": 0.0028,      // Savings from intelligent routing
    "input_tokens": 24,
    "output_tokens": 187,
    "cached": false,        // true = served from semantic cache
    "latency_ms": 1240,     // Time from request to first token
    "mode": "cost",         // Mode used for this request
    "failover": false       // true = primary provider failed, used fallback
  }
}
```
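On the client side, this metadata can be inspected directly, for example to flag failovers or report savings. The field names below mirror the example response; the inspect helper itself is illustrative, not part of P402.

```typescript
// Consuming p402_metadata on the client. Field names mirror the example
// response above; the inspect helper is illustrative, not part of P402.

interface P402Metadata {
  provider: string;
  model: string;
  cost_usd: number;
  direct_cost: number;
  savings: number;
  cached: boolean;
  failover: boolean;
}

function inspect(meta: P402Metadata): string {
  if (meta.cached) return "served from semantic cache (zero provider cost)";
  if (meta.failover) {
    // Frequent failovers signal that the mode's usual primary is degraded.
    return `failover: served by fallback ${meta.provider}`;
  }
  return `served by primary ${meta.provider} (${meta.model}), saved $${meta.savings}`;
}

// Using the values from the example response:
const meta: P402Metadata = {
  provider: "deepseek",
  model: "deepseek-v3",
  cost_usd: 0.0003,
  direct_cost: 0.0031,
  savings: 0.0028,
  cached: false,
  failover: false,
};

inspect(meta); // → "served by primary deepseek (deepseek-v3), saved $0.0028"
```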