BILLING GUARD

Six independent layers of protection on every request. No overruns. No surprise bills. No runaway agents.

Design principle

Every layer is designed to fail closed — when it trips, it rejects the request rather than allowing it through. Layers are evaluated in order (1 → 6). The first layer that trips short-circuits the rest. Each failure returns a machine-readable error code so your agent can handle it programmatically.
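The ordering and short-circuit behaviour described above can be sketched as follows. This is a minimal illustration, not P402's implementation: the `Request` type, the `Layer` signature, the `evaluate` helper, and the `GUARD_UNAVAILABLE` code are all hypothetical, and treating a crashing layer as a rejection is an assumed reading of "fail closed".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Request:
    """Hypothetical request shape, for illustration only."""
    api_key: str
    estimated_cost_usd: float


# A layer returns an error code when it trips, or None when the request may pass.
Layer = Callable[[Request], Optional[str]]


def evaluate(request: Request, layers: List[Layer]) -> Optional[str]:
    """Run layers in order and return the first error code, or None if all pass."""
    for layer in layers:
        try:
            code = layer(request)
        except Exception:
            # Assumption: a layer that cannot decide rejects the request.
            return "GUARD_UNAVAILABLE"  # hypothetical code, not in the docs
        if code is not None:
            return code  # short-circuit: later layers never run
    return None


# Usage: the second layer trips, so the third is never evaluated.
calls: List[str] = []


def rate_limit(req: Request) -> Optional[str]:
    calls.append("rate_limit")
    return None


def daily_cap(req: Request) -> Optional[str]:
    calls.append("daily_cap")
    return "DAILY_SPEND_LIMIT_EXCEEDED"


def concurrency(req: Request) -> Optional[str]:
    calls.append("concurrency")
    return None


result = evaluate(Request("key", 0.01), [rate_limit, daily_cap, concurrency])
print(result, calls)
```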

1. Rate limit
Too many requests per minute

A sliding-window counter prevents request floods. Limits apply per API key. The Retry-After header indicates when the window resets.

Default: 60 req/min (Free), 600 req/min (Pro), custom (Enterprise)
Retry: Respect the Retry-After header and implement exponential backoff.
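One way to combine the two retry recommendations is a delay helper that prefers Retry-After and falls back to exponential backoff with jitter. The sketch below is illustrative: `backoff_delay` and its `base` and `cap` defaults are assumptions, not values published by P402.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date value: fall back to backoff below
    # Full jitter: a random delay in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


print(backoff_delay(0, "7"))  # server-provided value wins
print(backoff_delay(3))       # somewhere between 0 and 8 seconds
```

The jitter spreads retries from many clients over time, so they do not all hit the limiter again at the same instant.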

2. Daily circuit breaker
Spend exceeds daily cap

A hard daily spend cap prevents runaway costs from buggy agents or prompt-injection attacks. Resets at 00:00 UTC. You can raise or lower your cap in Dashboard → Settings → Billing.

Default: $10/day (Free), $1,000/day (Pro), custom (Enterprise)
Retry: Wait for the daily reset or contact support to raise the cap.
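Because the cap resets at 00:00 UTC, a client that hits the daily limit can work out how long to pause instead of polling. A minimal sketch, assuming a hypothetical helper named `seconds_until_daily_reset`:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_daily_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` (a timezone-aware datetime) until the next 00:00 UTC."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()


# One hour before midnight UTC.
sample = datetime(2025, 1, 1, 23, 0, tzinfo=timezone.utc)
print(seconds_until_daily_reset(sample))
```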

3. Concurrency cap
Too many simultaneous in-flight requests

Limits the number of requests being processed simultaneously per tenant. Prevents a single burst from monopolising provider quota and degrading other tenants.

Default: 5 concurrent (Free), 50 concurrent (Pro), custom (Enterprise)
Retry: Queue requests client-side. Use a semaphore or p-limit.
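In Python, the client-side queue can be an `asyncio.Semaphore` (p-limit plays the same role in JavaScript). The sketch below uses a stand-in coroutine instead of a real API call; `run_limited`, `fake_request`, and `demo` are hypothetical names, and the limit of 5 mirrors the Free-tier default listed for this layer.

```python
import asyncio

MAX_IN_FLIGHT = 5  # Free-tier default for this layer


async def run_limited(factories, limit: int = MAX_IN_FLIGHT):
    """Run coroutine factories with at most `limit` of them in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with sem:  # waits here while `limit` calls are already running
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))


async def demo():
    in_flight = 0
    peak = 0

    async def fake_request():
        # Stand-in for a real API call; tracks how many run simultaneously.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return "ok"

    results = await run_limited([fake_request] * 20)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), peak)
```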

4. Anomaly detection
Spend pattern deviates from baseline (Sentinel AI)

Gemini 3 Flash (Sentinel) analyses spend velocity in real time. If spend spikes 10× above your 7-day baseline — e.g. from a prompt-injection attack flooding your agent — the Sentinel pauses traffic and alerts you via email. You can resume from the Dashboard.

Retry: Review the anomaly report in Dashboard → Intelligence. Resume manually.
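The threshold itself is simple arithmetic, even though the Sentinel is described as model-based. The sketch below only illustrates a 10× comparison against a 7-day baseline; how P402 actually computes the baseline or measures spend velocity is not stated here, so the mean-of-daily-spend baseline and the `is_spend_spike` helper are assumptions.

```python
from typing import List


def is_spend_spike(daily_spend_usd: List[float],
                   current_rate_usd_per_day: float,
                   factor: float = 10.0) -> bool:
    """True when the current spend rate is at least `factor` times the
    mean daily spend over the last 7 days (assumed baseline definition)."""
    window = daily_spend_usd[-7:]
    if not window:
        return False  # assumption: no history means no baseline to compare with
    baseline = sum(window) / len(window)
    return baseline > 0 and current_rate_usd_per_day >= factor * baseline


history = [2.0] * 7                   # $2/day for the last week
print(is_spend_spike(history, 25.0))  # 25 >= 10 * 2
print(is_spend_spike(history, 19.0))  # 19 <  10 * 2
```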

5. Per-request cap
Single request cost exceeds cap

A ceiling on the cost of any single LLM call. Prevents accidentally expensive requests (e.g. passing a 100k-token context window to GPT-4o) from draining your budget.

Default: $0.50/request (configurable)
Retry: Reduce prompt length or switch to cost-mode routing ("mode": "cost").
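A client can approximate this check before sending a request. The sketch below estimates cost from token counts; the per-million-token prices are placeholders rather than real provider prices, and `estimate_cost_usd` and `exceeds_cap` are hypothetical helpers.

```python
def estimate_cost_usd(prompt_tokens: int, max_output_tokens: int,
                      input_price_per_mtok: float,
                      output_price_per_mtok: float) -> float:
    """Upper-bound cost of one call, given prices in USD per million tokens."""
    return (prompt_tokens * input_price_per_mtok
            + max_output_tokens * output_price_per_mtok) / 1_000_000


def exceeds_cap(cost_usd: float, cap_usd: float = 0.50) -> bool:
    """True when the estimate is above the per-request cap."""
    return cost_usd > cap_usd


# Placeholder prices: $2.50 per million input tokens, $10.00 per million output tokens.
cost = estimate_cost_usd(100_000, 1_000, 2.50, 10.00)
print(round(cost, 2), exceeds_cap(cost))
```

Doubling the prompt to 200,000 tokens at the same placeholder prices pushes the estimate just over the default $0.50 cap.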

6. Atomic budget reservation
Session budget exhausted mid-flight

Before routing any request to an LLM provider, P402 atomically reserves the estimated cost from the session budget. If the reservation fails, for example because concurrent requests have already claimed the remaining budget, the request is rejected before any tokens are generated. A session is never overspent, even under concurrent load.

Retry: Fund the session (POST /sessions/:id/fund) or create a new one.
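The reserve-before-route idea can be illustrated with an in-process check-and-decrement under a lock. This is a sketch only: a hosted service would need the same atomicity in its datastore, and the `SessionBudget` class is hypothetical rather than P402's implementation.

```python
import threading


class SessionBudget:
    """Tracks remaining budget in whole cents to avoid float drift."""

    def __init__(self, total_usd: float) -> None:
        self._remaining_cents = round(total_usd * 100)
        self._lock = threading.Lock()

    def reserve(self, cost_usd: float) -> bool:
        """Atomically reserve `cost_usd`; return False if the budget is short."""
        cents = round(cost_usd * 100)
        with self._lock:  # the check and the decrement happen as one step
            if cents > self._remaining_cents:
                return False
            self._remaining_cents -= cents
            return True

    @property
    def remaining_usd(self) -> float:
        with self._lock:
            return self._remaining_cents / 100


# 20 concurrent requests race for a $5.00 session at $0.50 each.
budget = SessionBudget(5.00)
outcomes = []


def worker() -> None:
    outcomes.append(budget.reserve(0.50))


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(outcomes), budget.remaining_usd)
```

Without the lock, two threads could both pass the check before either decrements, which is exactly the overspend the reservation step is meant to rule out.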

Error Response Format

All Billing Guard errors return a consistent JSON body with a machine-readable code field.

// HTTP 402
{
  "error": {
    "code": "SESSION_BUDGET_EXCEEDED",
    "message": "Session sess_01jx... has no remaining budget ($0.00 of $5.00).",
    "request_id": "req_01jx...",
    "session_remaining_usd": 0.00
  }
}
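A client might parse this body into a small typed object before branching on the code. The sketch below assumes that only `code` is always present and treats the other fields as optional; the `BillingGuardError` dataclass and `parse_error` helper are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class BillingGuardError:
    code: str
    message: str = ""
    request_id: Optional[str] = None
    session_remaining_usd: Optional[float] = None  # assumed optional


def parse_error(body: str) -> Optional[BillingGuardError]:
    """Return the parsed error, or None if the body has no usable error object."""
    try:
        err = json.loads(body).get("error")
    except (json.JSONDecodeError, AttributeError):
        return None  # not JSON, or JSON that is not an object
    if not isinstance(err, dict) or "code" not in err:
        return None
    return BillingGuardError(
        code=err["code"],
        message=err.get("message", ""),
        request_id=err.get("request_id"),
        session_remaining_usd=err.get("session_remaining_usd"),
    )


sample_body = json.dumps({
    "error": {
        "code": "SESSION_BUDGET_EXCEEDED",
        "message": "Session has no remaining budget.",
        "session_remaining_usd": 0.0,
    }
})
parsed = parse_error(sample_body)
print(parsed.code, parsed.session_remaining_usd)
```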

Handling Billing Guard Errors

python
import os
import time

import openai

P402_API_KEY = os.environ["P402_API_KEY"]
client = openai.OpenAI(api_key=P402_API_KEY, base_url="https://p402.io/api/v2")

BILLING_GUARD_CODES = {
    "RATE_LIMIT_EXCEEDED",
    "DAILY_SPEND_LIMIT_EXCEEDED",
    "CONCURRENCY_LIMIT_EXCEEDED",
    "ANOMALY_DETECTED",
    "REQUEST_COST_EXCEEDED",
    "SESSION_BUDGET_EXCEEDED",
}

def ask(prompt: str, session_id: str, retries: int = 1) -> str | None:
    try:
        resp = client.chat.completions.create(
            model="auto",
            messages=[{"role": "user", "content": prompt}],
            extra_body={"p402": {"session_id": session_id, "mode": "cost", "cache": True}},
        )
        return resp.choices[0].message.content

    # APIStatusError covers every non-2xx response, including HTTP 402;
    # openai.RateLimitError alone only matches HTTP 429.
    except openai.APIStatusError as e:
        body = e.response.json().get("error", {})
        code = body.get("code", "")

        if code not in BILLING_GUARD_CODES:
            raise  # Not a Billing Guard error: re-raise

        if code == "SESSION_BUDGET_EXCEEDED":
            print("Budget exhausted: fund the session or create a new one.")
            return None
        if code == "RATE_LIMIT_EXCEEDED" and retries > 0:
            retry_after = int(e.response.headers.get("Retry-After", 5))
            print(f"Rate limited: retrying in {retry_after}s")
            time.sleep(retry_after)
            return ask(prompt, session_id, retries - 1)  # retry once by default
        if code == "ANOMALY_DETECTED":
            print("Anomaly detected: check Dashboard > Intelligence to resume.")
            return None

        raise  # Billing Guard error with no handler above: re-raise

Configure Your Limits

All configurable limits can be adjusted in Dashboard → Settings → Billing. The per-request cap can also be set on an individual request through the max_cost_usd field:

bash
# Reject the request if it would cost more than $0.05
curl -s -X POST https://p402.io/api/v2/chat/completions \
  -H "Authorization: Bearer $P402_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "..."}],
    "p402": {
      "mode": "quality",
      "session_id": "sess_01jx...",
      "max_cost_usd": 0.05
    }
  }'