BILLING GUARD
Six independent layers of protection on every request. No overruns. No surprise bills. No runaway agents.
Design principle
Every layer is designed to fail closed — when it trips, it rejects the request rather than allowing it through. Layers are evaluated in order (1 → 6). The first layer that trips short-circuits the rest. Each failure returns a machine-readable error code so your agent can handle it programmatically.
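The ordered, short-circuiting evaluation described above can be pictured with a minimal sketch. The `Request` type, the `Layer` signature, and the `evaluate` function are hypothetical names chosen for illustration; they are not part of the P402 API, and the real implementation may differ.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    # Hypothetical request shape, for illustration only.
    api_key: str
    estimated_cost_usd: float


# A layer returns an error code when it trips, or None to let the request continue.
Layer = Callable[[Request], Optional[str]]


def evaluate(layers: list[Layer], request: Request) -> Optional[str]:
    """Run the layers in order and return the first error code, or None if all pass."""
    for layer in layers:
        code = layer(request)
        if code is not None:
            # Fail closed and short-circuit: reject now, skip the remaining layers.
            return code
    return None
```

A caller would reject the request whenever `evaluate` returns a code and forward it to a provider only when the result is `None`.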
1. Rate limit
A sliding-window counter prevents request floods. Limits apply per API key. When the limit is hit, the Retry-After header indicates when the window resets.
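For readers unfamiliar with the technique, here is a minimal in-memory sketch of a sliding-window counter keyed by API key. It assumes the common two-bucket approximation (the previous window's count is weighted by how much of it still overlaps the sliding window); the class name, parameters, and limits are hypothetical, and P402's actual algorithm and storage are not documented here.

```python
import time


class SlidingWindowCounter:
    """Approximate sliding-window rate limiter (illustrative, single process)."""

    def __init__(self, limit: int, window_s: float, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        # api_key -> (window index, count in that window, count in the previous window)
        self.state = {}

    def allow(self, api_key: str) -> tuple[bool, float]:
        """Return (allowed, retry_after_seconds)."""
        now = self.clock()
        idx = int(now // self.window_s)
        cur_idx, cur, prev = self.state.get(api_key, (idx, 0, 0))
        if idx == cur_idx + 1:
            prev, cur = cur, 0  # rolled into the next window
        elif idx > cur_idx + 1:
            prev, cur = 0, 0  # idle for a full window or more
        elapsed = (now % self.window_s) / self.window_s
        # Weight the previous window by the fraction still inside the sliding window.
        estimated = prev * (1 - elapsed) + cur
        if estimated + 1 > self.limit:
            self.state[api_key] = (idx, cur, prev)
            # Seconds until the current fixed window ends.
            return False, self.window_s - now % self.window_s
        self.state[api_key] = (idx, cur + 1, prev)
        return True, 0.0
```

The second element of the tuple plays the role of the Retry-After value: it is the time left in the current window, after which the counter starts to drain.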
2. Daily spend cap
A hard daily spend cap prevents runaway costs from buggy agents or prompt-injection attacks. The cap resets at 00:00 UTC. You can raise or lower it in Dashboard → Settings → Billing.
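The reset-at-midnight-UTC behaviour can be sketched as a counter that is keyed to the current UTC date. This is only an illustration: `DailySpendCap` and `try_spend` are hypothetical names, and the sketch keeps state in memory, whereas a real service would persist it.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


class DailySpendCap:
    """Hard daily cap that resets when the UTC date changes (illustrative)."""

    def __init__(self, cap_usd: Decimal):
        self.cap_usd = cap_usd
        self.day = None
        self.spent_usd = Decimal("0")

    def try_spend(self, cost_usd: Decimal, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        today = now.astimezone(timezone.utc).date()
        if today != self.day:
            # First request of a new UTC day: start counting from zero again.
            self.day, self.spent_usd = today, Decimal("0")
        if self.spent_usd + cost_usd > self.cap_usd:
            return False  # would exceed the cap: reject
        self.spent_usd += cost_usd
        return True
```

`Decimal` is used instead of `float` so that repeated small additions do not drift away from the cap.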
3. Concurrency limit
Caps the number of requests being processed simultaneously per tenant. This prevents a single burst from monopolising provider quota and degrading other tenants.
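A per-tenant in-flight counter is one simple way to picture this layer. The sketch below is an assumption about the general shape, not P402's implementation; `ConcurrencyLimiter`, `try_acquire`, and `release` are hypothetical names.

```python
import threading
from collections import defaultdict


class ConcurrencyLimiter:
    """Caps the number of in-flight requests per tenant (illustrative)."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self._in_flight = defaultdict(int)
        self._lock = threading.Lock()

    def try_acquire(self, tenant_id: str) -> bool:
        # Check and increment under one lock so two requests cannot both
        # take the last free slot.
        with self._lock:
            if self._in_flight[tenant_id] >= self.max_in_flight:
                return False
            self._in_flight[tenant_id] += 1
            return True

    def release(self, tenant_id: str) -> None:
        # Call when the request finishes, whether it succeeded or failed.
        with self._lock:
            self._in_flight[tenant_id] -= 1
```

Because the counter is kept per tenant, one tenant reaching its limit does not affect the slots available to another.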
4. Anomaly detection (Sentinel)
Gemini 3 Flash (Sentinel) analyses spend velocity in real time. If spend spikes 10× above your 7-day baseline (for example, because a prompt-injection attack is flooding your agent), the Sentinel pauses traffic and alerts you via email. You can resume from the Dashboard.
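The 10× threshold is easiest to see with numbers. The sketch below is a plain arithmetic check, not the Sentinel itself: how the baseline is actually computed is not specified here, so it assumes the baseline is the mean hourly spend over the last seven days, and the function name and parameters are hypothetical.

```python
def is_spend_spike(
    current_hourly_usd: float,
    last_7_days_usd: list[float],
    multiplier: float = 10.0,
) -> bool:
    """Return True when the current hourly spend is at least `multiplier` times the baseline."""
    if len(last_7_days_usd) < 7:
        return False  # assumption: no verdict without a full 7-day history
    # Assumed baseline: mean spend per hour across the last 7 days.
    baseline_hourly_usd = sum(last_7_days_usd[-7:]) / (7 * 24)
    return baseline_hourly_usd > 0 and current_hourly_usd >= multiplier * baseline_hourly_usd
```

Under these assumptions, a tenant that spends $12 per day has a baseline of $0.50 per hour, so traffic would be flagged once spend reaches $5.00 per hour.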
5. Per-request cost cap
A ceiling on the cost of any single LLM call. Prevents accidentally expensive requests (e.g. sending a 100k-token context to GPT-4o) from draining your budget.
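To make the ceiling concrete, the following sketch estimates a call's cost from token counts and compares it with a cap. The per-million-token prices are placeholders chosen for the arithmetic, not real provider prices, and both function names are hypothetical; only the `REQUEST_COST_EXCEEDED` code comes from this page.

```python
from typing import Optional


def estimate_cost_usd(
    input_tokens: int,
    max_output_tokens: int,
    input_usd_per_mtok: float,
    output_usd_per_mtok: float,
) -> float:
    """Worst-case cost of one call, assuming the full output budget is used."""
    return (
        input_tokens * input_usd_per_mtok + max_output_tokens * output_usd_per_mtok
    ) / 1_000_000


def check_request_cost(estimated_usd: float, max_cost_usd: float) -> Optional[str]:
    """Return an error code if the estimate exceeds the cap, else None."""
    return "REQUEST_COST_EXCEEDED" if estimated_usd > max_cost_usd else None


# Placeholder prices: $2.00 per million input tokens, $8.00 per million output tokens.
estimate = estimate_cost_usd(100_000, 1_000, 2.0, 8.0)  # about $0.208
result = check_request_cost(estimate, max_cost_usd=0.05)
```

With these placeholder prices, a 100k-token prompt costs roughly four times a $0.05 cap, so the request is rejected before it reaches a provider.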
6. Session budget reservation
Before routing any request to an LLM provider, P402 atomically reserves the estimated cost from the session budget. If the reservation fails (for example, because concurrent requests are racing for the same budget), the request is rejected before any tokens are generated. You never overspend a session, even under concurrent load.
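The reason the reservation must be atomic is that a separate "check, then deduct" lets two concurrent requests both pass the check. A minimal in-process sketch, assuming a lock-protected balance in integer cents, is shown below; `SessionBudget`, `reserve`, and `settle` are hypothetical names, and a real service would more likely use an atomic operation in its datastore.

```python
import threading


class SessionBudget:
    """Reserve-then-settle budget in integer cents (illustrative)."""

    def __init__(self, budget_cents: int):
        self._remaining = budget_cents
        self._lock = threading.Lock()

    @property
    def remaining_cents(self) -> int:
        with self._lock:
            return self._remaining

    def reserve(self, estimated_cents: int) -> bool:
        # The check and the deduction happen under one lock, so concurrent
        # requests can never reserve more than what remains.
        with self._lock:
            if estimated_cents > self._remaining:
                return False
            self._remaining -= estimated_cents
            return True

    def settle(self, reserved_cents: int, actual_cents: int) -> None:
        # After the call completes, return any unused part of the reservation.
        with self._lock:
            self._remaining += reserved_cents - actual_cents
```

With a 100-cent budget and fifty threads each trying to reserve 3 cents, exactly 33 reservations succeed and the balance never goes negative.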
Error Response Format
All Billing Guard errors return a consistent JSON body with a machine-readable code field.
// HTTP 402
{
  "error": {
    "code": "SESSION_BUDGET_EXCEEDED",
    "message": "Session sess_01jx... has no remaining budget ($0.00 of $5.00).",
    "request_id": "req_01jx...",
    "session_remaining_usd": 0.00
  }
}
Handling Billing Guard Errors
import os
import time

import openai

P402_API_KEY = os.environ["P402_API_KEY"]
client = openai.OpenAI(api_key=P402_API_KEY, base_url="https://p402.io/api/v2")

BILLING_GUARD_CODES = {
    "RATE_LIMIT_EXCEEDED",
    "DAILY_SPEND_LIMIT_EXCEEDED",
    "CONCURRENCY_LIMIT_EXCEEDED",
    "ANOMALY_DETECTED",
    "REQUEST_COST_EXCEEDED",
    "SESSION_BUDGET_EXCEEDED",
}


def ask(prompt: str, session_id: str, retries: int = 1) -> str | None:
    try:
        resp = client.chat.completions.create(
            model="auto",
            messages=[{"role": "user", "content": prompt}],
            extra_body={"p402": {"session_id": session_id, "mode": "cost", "cache": True}},
        )
        return resp.choices[0].message.content
    except openai.APIStatusError as e:
        # Catch APIStatusError, not RateLimitError: the SDK raises RateLimitError
        # only for HTTP 429, while budget errors arrive as HTTP 402.
        code = e.response.json().get("error", {}).get("code", "")
        if code not in BILLING_GUARD_CODES:
            raise  # Not a Billing Guard error
        if code == "SESSION_BUDGET_EXCEEDED":
            print("Budget exhausted: create a new session.")
            return None
        if code == "RATE_LIMIT_EXCEEDED" and retries > 0:
            retry_after = int(e.response.headers.get("Retry-After", 5))
            print(f"Rate limited: retrying in {retry_after}s")
            time.sleep(retry_after)
            return ask(prompt, session_id, retries - 1)  # retry once by default
        if code == "ANOMALY_DETECTED":
            print("Anomaly detected: check Dashboard > Intelligence to resume.")
            return None
        raise  # Billing Guard error with no handler above: re-raise
Configure Your Limits
All configurable limits can be adjusted in Dashboard → Settings → Billing. The per-request cost cap can also be set on an individual request with max_cost_usd:
# Reject this request if it would cost more than $0.05
curl -s -X POST https://p402.io/api/v2/chat/completions \
  -H "Authorization: Bearer $P402_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "..."}],
    "p402": {
      "mode": "quality",
      "session_id": "sess_01jx...",
      "max_cost_usd": 0.05
    }
  }'