BILLING GUARD.

Six independent layers of protection on every request. No overruns. No surprise bills. No runaway agents.

Design principle

Every layer fails closed: when it trips, it rejects the request rather than letting it through. Layers are evaluated in order (1 → 6), and the first layer that trips short-circuits the rest, so later layers never run for that request. Each failure returns a machine-readable error code so your agent can handle it programmatically.
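The ordered, short-circuiting evaluation can be pictured with a small sketch. This is an illustration only, not P402's implementation: the `Rejection` type, the `evaluate` function, and the request fields are hypothetical, and only two of the six layers are modelled, using the Free-tier defaults.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rejection:
    code: str
    http_status: int


# A layer inspects the request and returns a Rejection, or None to let it pass.
Layer = Callable[[dict], Optional[Rejection]]


def rate_limit(req: dict) -> Optional[Rejection]:
    # Free-tier default: 60 requests per minute.
    if req["requests_this_minute"] > 60:
        return Rejection("RATE_LIMIT_EXCEEDED", 429)
    return None


def daily_cap(req: dict) -> Optional[Rejection]:
    # Free-tier default: $10 per day.
    if req["spent_today_usd"] >= 10.0:
        return Rejection("DAILY_SPEND_LIMIT_EXCEEDED", 402)
    return None


def evaluate(req: dict, layers: list[Layer]) -> Optional[Rejection]:
    """Run layers in order and return the first rejection, or None if all pass."""
    for layer in layers:
        rejection = layer(req)
        if rejection is not None:
            return rejection  # short-circuit: later layers are not evaluated
    return None
```

A request that would trip both layers reports only the first one, which is why a client may see `RATE_LIMIT_EXCEEDED` even when its daily cap is also exhausted.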

Rate limit

Too many requests per minute

RATE_LIMIT_EXCEEDED

HTTP 429

A sliding-window counter prevents request floods. Limits apply per API key. The Retry-After header indicates when the window resets.

Default: 60 req/min (Free), 600 req/min (Pro), custom (Enterprise)

Retry: Respect the Retry-After header and implement exponential backoff.
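One way to combine the two pieces of retry advice is to prefer the server's Retry-After value and fall back to jittered exponential backoff when it is missing or unparseable. The sketch below is a client-side helper under those assumptions; the function name and the `base` and `cap` defaults are illustrative, not part of the P402 API.

```python
import random
from typing import Optional


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Retry-After given as a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # e.g. an HTTP-date; fall back to computed backoff
    # Exponential growth, capped, with "full jitter" so that many clients
    # do not all retry at the same instant.
    ceiling = min(cap, base * 2 ** attempt)
    return random.uniform(0.0, ceiling)
```

Typical use is `time.sleep(backoff_delay(attempt, response.headers.get("Retry-After")))` inside a bounded retry loop.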

Daily circuit breaker

Spend exceeds daily cap

DAILY_SPEND_LIMIT_EXCEEDED

HTTP 402

A hard daily spend cap prevents runaway costs from buggy agents or prompt-injection attacks. Resets at 00:00 UTC. You can raise or lower your cap in Dashboard → Settings → Billing.

Default: $10/day (Free), $1,000/day (Pro), custom (Enterprise)

Retry: Wait for the daily reset or contact support to raise the cap.
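Because the cap resets at 00:00 UTC, an agent that hits `DAILY_SPEND_LIMIT_EXCEEDED` can compute how long to pause instead of polling. A minimal sketch, assuming the caller passes a timezone-aware datetime (the function name is hypothetical):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def seconds_until_daily_reset(now: Optional[datetime] = None) -> float:
    """Seconds from `now` until the next 00:00 UTC."""
    if now is None:
        now = datetime.now(timezone.utc)
    now = now.astimezone(timezone.utc)  # normalise any aware datetime to UTC
    next_midnight = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_midnight - now).total_seconds()
```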

Concurrency cap

Too many simultaneous in-flight requests

CONCURRENCY_LIMIT_EXCEEDED

HTTP 429

Limits the number of requests being processed simultaneously per tenant. Prevents a single burst from monopolising provider quota and degrading other tenants.

Default: 5 concurrent (Free), 50 concurrent (Pro), custom (Enterprise)

Retry: Queue requests client-side. Use a semaphore or a library such as p-limit.
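In Python, the client-side queue can be a plain `asyncio.Semaphore`. The sketch below keeps at most `limit` jobs in flight; `run_bounded` is a hypothetical helper, and the default of 5 mirrors the Free-tier cap.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def run_bounded(jobs: Iterable[Callable[[], Awaitable]], limit: int = 5) -> list:
    """Run zero-argument coroutine functions with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def guarded(job):
        async with sem:  # waits here while `limit` jobs are already running
            return await job()

    # gather() preserves the input order of the results.
    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Each job would wrap one API call, for example `lambda: call_model(prompt)` for some async `call_model` of your own.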

Anomaly detection

Spend pattern deviates from baseline (Sentinel AI)

ANOMALY_DETECTED

HTTP 402

Gemini 3 Flash (Sentinel) analyses spend velocity in real time. If spend spikes 10× above your 7-day baseline — e.g. from a prompt-injection attack flooding your agent — the Sentinel pauses traffic and alerts you via email. You can resume from the Dashboard.

Retry: Review the anomaly report in Dashboard → Intelligence. Resume manually.
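The 10× rule can be made concrete with a little arithmetic. How the Sentinel actually computes its baseline is not documented here, so the sketch below assumes the simplest reading: the baseline is the average hourly spend over the previous seven days, and the current rate is compared against ten times that figure. All names are hypothetical.

```python
def is_spend_anomalous(current_usd_per_hour: float,
                       last_7_days_usd: list[float],
                       multiplier: float = 10.0) -> bool:
    """True if the current hourly spend exceeds `multiplier` x the baseline."""
    hours = len(last_7_days_usd) * 24
    baseline_per_hour = sum(last_7_days_usd) / hours
    if baseline_per_hour == 0:
        # Assumption: with no spend history there is nothing to compare to.
        return False
    return current_usd_per_hour > multiplier * baseline_per_hour
```

With seven days at $2.40/day the baseline is $0.10/hour, so a burst at $1.50/hour would be flagged while $0.90/hour would not.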

Per-request cap

Single request cost exceeds cap

REQUEST_COST_EXCEEDED

HTTP 402

A ceiling on the cost of any single LLM call. Prevents accidentally expensive requests (e.g. passing a 100k-token context window to GPT-4o) from draining your budget.

Default: $0.50/request (configurable)

Retry: Reduce prompt length or switch to a cost-mode routing call.
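A client can avoid most `REQUEST_COST_EXCEEDED` rejections by estimating cost before sending. The sketch below is a rough pre-flight check; the prices in the usage note are placeholders chosen for the arithmetic, not published rates for any model, and P402's own estimate may differ.

```python
def estimate_cost_usd(prompt_tokens: int, max_output_tokens: int,
                      input_usd_per_mtok: float, output_usd_per_mtok: float) -> float:
    """Upper-bound cost of one call, given per-million-token prices."""
    return (prompt_tokens * input_usd_per_mtok
            + max_output_tokens * output_usd_per_mtok) / 1_000_000
```

At placeholder prices of $2.50 and $10.00 per million tokens, a 100k-token prompt with up to 4k output tokens comes to about $0.29, under the $0.50 default, while a 200k-token prompt comes to about $0.54 and would be rejected.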

Atomic budget reservation

Session budget exhausted mid-flight

SESSION_BUDGET_EXCEEDED

HTTP 402

Before routing any request to an LLM provider, P402 atomically reserves the estimated cost from the session budget. If the reservation fails (concurrent requests racing to the same budget), the request is rejected before any tokens are generated. You never overspend a session — even under concurrent load.

Retry: Fund the session (POST /sessions/:id/fund) or create a new one.
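The key property is that the check and the deduction happen as one step, so two racing requests cannot both spend the same remaining budget. The in-process sketch below shows the idea with a lock; it is not how P402 stores budgets (a real service would more likely use an atomic database operation), and the class and its integer-cents representation are assumptions made for illustration.

```python
import threading


class SessionBudget:
    """Session budget in integer cents with an atomic reserve step."""

    def __init__(self, total_cents: int) -> None:
        self._remaining = total_cents
        self._lock = threading.Lock()

    @property
    def remaining_cents(self) -> int:
        with self._lock:
            return self._remaining

    def reserve(self, estimate_cents: int) -> bool:
        """Deduct the estimate if it fits; otherwise reject without changes."""
        with self._lock:  # check and deduct under one lock: no race window
            if estimate_cents > self._remaining:
                return False  # fail closed: the request is never routed
            self._remaining -= estimate_cents
            return True
```

With a $5.00 budget and twenty concurrent requests estimated at $1.00 each, exactly five reservations succeed and the rest are rejected before any tokens are generated.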

Error Response Format

All Billing Guard errors return a consistent JSON body with a machine-readable code field.

// HTTP 402
{
  "error": {
    "code": "SESSION_BUDGET_EXCEEDED",
    "message": "Session sess_01jx... has no remaining budget ($0.00 of $5.00).",
    "request_id": "req_01jx...",
    "session_remaining_usd": 0.00
  }
}
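Since every error shares this envelope, a client can parse it once and branch on `code`. The sketch below is one possible parser; the `GuardError` type is hypothetical, and fields other than `code`, `message`, and `request_id` (such as `session_remaining_usd`) are collected into `extra` because this page does not list which extra fields each code carries.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GuardError:
    code: str
    message: str
    request_id: Optional[str]
    extra: dict = field(default_factory=dict)


def parse_guard_error(body: str) -> Optional[GuardError]:
    """Parse a Billing Guard error body; return None if it is not one."""
    try:
        err = json.loads(body)["error"]
        code = err["code"]
    except (ValueError, KeyError, TypeError):
        return None  # not JSON, or not the expected envelope
    known = {"code", "message", "request_id"}
    extra = {k: v for k, v in err.items() if k not in known}
    return GuardError(code, err.get("message", ""), err.get("request_id"), extra)
```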

Handling Billing Guard Errors

python

import os
import time

import openai

P402_API_KEY = os.environ["P402_API_KEY"]
client = openai.OpenAI(api_key=P402_API_KEY, base_url="https://p402.io/api/v2")

BILLING_GUARD_CODES = {
    "RATE_LIMIT_EXCEEDED",
    "DAILY_SPEND_LIMIT_EXCEEDED",
    "CONCURRENCY_LIMIT_EXCEEDED",
    "ANOMALY_DETECTED",
    "REQUEST_COST_EXCEEDED",
    "SESSION_BUDGET_EXCEEDED",
}

def ask(prompt: str, session_id: str, retries: int = 1) -> str | None:
    try:
        resp = client.chat.completions.create(
            model="auto",
            messages=[{"role": "user", "content": prompt}],
            extra_body={"p402": {"session_id": session_id, "mode": "cost", "cache": True}},
        )
        return resp.choices[0].message.content

    # Catch APIStatusError, not RateLimitError: RateLimitError only covers
    # HTTP 429, while most Billing Guard codes are returned with HTTP 402.
    except openai.APIStatusError as e:
        body = e.response.json().get("error", {})
        code = body.get("code", "")

        if code not in BILLING_GUARD_CODES:
            raise  # Not a Billing Guard error: re-raise unchanged

        if code == "SESSION_BUDGET_EXCEEDED":
            print("Budget exhausted: fund the session or create a new one.")
            return None
        if code == "RATE_LIMIT_EXCEEDED" and retries > 0:
            # Assumes Retry-After is a whole number of seconds.
            retry_after = int(e.response.headers.get("Retry-After", 5))
            print(f"Rate limited: retrying in {retry_after}s")
            time.sleep(retry_after)
            return ask(prompt, session_id, retries - 1)  # bounded retry
        if code == "ANOMALY_DETECTED":
            print("Anomaly detected: check Dashboard > Intelligence to resume.")
            return None

        raise  # Billing Guard code with no handler here: re-raise
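As the number of handled codes grows, a lookup table can be easier to maintain than a chain of `if` statements. The table below restates the Retry guidance from each layer; the three action labels (`wait`, `change`, `stop`) are a hypothetical classification for client code, not something the API returns.

```python
# "wait": safe to retry automatically after a delay.
# "change": retry only after altering the request or funding the session.
# "stop": needs time to pass or a human decision.
RECOVERY = {
    "RATE_LIMIT_EXCEEDED": ("wait", "Sleep for Retry-After, then retry with backoff."),
    "CONCURRENCY_LIMIT_EXCEEDED": ("wait", "Queue client-side and retry when a slot frees up."),
    "REQUEST_COST_EXCEEDED": ("change", "Shorten the prompt or use cost-mode routing."),
    "SESSION_BUDGET_EXCEEDED": ("change", "Fund the session or create a new one."),
    "DAILY_SPEND_LIMIT_EXCEEDED": ("stop", "Wait for the 00:00 UTC reset or raise the cap."),
    "ANOMALY_DETECTED": ("stop", "Review the anomaly report and resume manually."),
}


def recovery_for(code: str) -> tuple[str, str]:
    """Return (action, hint) for a code; unknown codes should be propagated."""
    return RECOVERY.get(code, ("raise", "Not a Billing Guard code; propagate the error."))
```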

Configure Your Limits

All configurable limits can be adjusted in Dashboard → Settings → Billing. The per-request cap can also be overridden on an individual call by passing max_cost_usd in the p402 object:

bash

# max_cost_usd: reject the request if it would cost more than $0.05.
# (JSON does not allow comments, so the note lives here, not in the body.)
curl -s -X POST https://p402.io/api/v2/chat/completions \
  -H "Authorization: Bearer $P402_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "..."}],
    "p402": {
      "mode": "quality",
      "session_id": "sess_01jx...",
      "max_cost_usd": 0.05
    }
  }'
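From Python, the same override would presumably travel in the `p402` object passed through the OpenAI SDK's `extra_body`, as in the error-handling example. The helper below only builds that dictionary; `p402_options` is a hypothetical name, and the positive-value check is a client-side convenience rather than a documented rule.

```python
from typing import Optional


def p402_options(session_id: str, mode: str = "cost",
                 max_cost_usd: Optional[float] = None) -> dict:
    """Build the extra_body payload carrying P402 routing options."""
    opts: dict = {"session_id": session_id, "mode": mode}
    if max_cost_usd is not None:
        if max_cost_usd <= 0:
            raise ValueError("max_cost_usd must be positive")
        opts["max_cost_usd"] = max_cost_usd
    return {"p402": opts}
```

Usage would look like `client.chat.completions.create(model="auto", messages=[...], extra_body=p402_options(session_id, "quality", 0.05))`.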
