
How billing works

Quota meters every request in dollar-denominated credits. Users see real money — "$8.50 remaining" — and the platform bills against that balance atomically as requests run.

01 What credits are

A credit is one one-millionth of a dollar. 1,000,000 credits = $1.00. Storing balances as integer credits avoids floating-point drift in the ledger and lets per-token costs land at six decimal places without rounding artifacts.
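Because balances are integers, a dollar amount only needs converting once, at the edge. The sketch below is not a Quota SDK function. `dollarsToCredits` is a hypothetical helper that parses a dollar string straight into credits without floating-point multiplication. The $0.000003 per-token rate is an illustrative number.

```typescript
// Hypothetical helper: parse a dollar amount into integer credits
// (1 credit = $0.000001) without floating-point multiplication.
const CREDITS_PER_DOLLAR = 1_000_000;

function dollarsToCredits(amount: string): number {
  // Optional minus sign, whole dollars, and at most six decimal places.
  const match = /^(-)?(\d+)(?:\.(\d{1,6}))?$/.exec(amount);
  if (match === null) {
    throw new Error(`not a dollar amount with at most six decimals: ${amount}`);
  }
  const [, sign, whole, fraction = ""] = match;
  // Right-pad the fraction to six digits: "8.5" means 500000 credits, not 5.
  const credits =
    Number(whole) * CREDITS_PER_DOLLAR + Number(fraction.padEnd(6, "0"));
  return sign === "-" ? -credits : credits;
}

// Integer sums are exact: 1,000 tokens at an illustrative $0.000003 per
// token come to exactly 3,000 credits ($0.003).
const perToken = dollarsToCredits("0.000003");
let total = 0;
for (let i = 0; i < 1_000; i++) {
  total += perToken;
}
console.log(perToken, total); // 3 3000
```

Amounts with more than six decimal places are rejected outright rather than rounded, because no whole number of credits represents them.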

  • Balances are dollar-denominated. Render them as dollars; integers are an implementation detail.
  • Costs are deducted on every request. Quota calculates token-by-token cost from the model's pricing table and applies your markup before deducting.
  • Balance never expires. Funded credits stay on the account until they're spent.

02 How spending is calculated

Cost is a function of input tokens, output tokens, and the model's per-token rates. If your app sets a markup percentage, it's applied on top of the base cost. In the formula the percentage is written as a fraction, so a 20% markup is 0.20:

effective_cost = base_cost * (1 + developer_markup_percentage)
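Here is a sketch of the formula computed in integer credits. Several parts of it are assumptions rather than documented behavior:

- the `ModelRates` shape, with rates given in credits per 1M tokens;
- the made-up rates of $3.00 and $15.00 per 1M tokens;
- expressing the markup in basis points (2,000 = 20% = 0.20);
- rounding fractional credits up.

```typescript
// Sketch of effective_cost = base_cost * (1 + developer_markup_percentage),
// kept in integer credits. Rate units and rounding are assumptions.
interface ModelRates {
  // Hypothetical units: credits per 1M tokens ($3.00 per 1M = 3_000_000).
  inputPerMillion: number;
  outputPerMillion: number;
}

function baseCostCredits(
  inputTokens: number,
  outputTokens: number,
  rates: ModelRates,
): number {
  const scaled =
    inputTokens * rates.inputPerMillion + outputTokens * rates.outputPerMillion;
  // Assumption: fractional credits round up to the next whole credit.
  return Math.ceil(scaled / 1_000_000);
}

// markupBps is the markup in basis points: 20% (0.20) is 2_000.
function effectiveCostCredits(
  inputTokens: number,
  outputTokens: number,
  rates: ModelRates,
  markupBps: number,
): number {
  const base = baseCostCredits(inputTokens, outputTokens, rates);
  // base * (1 + markupBps / 10_000), with one division at the end.
  return Math.ceil((base * (10_000 + markupBps)) / 10_000);
}

const rates: ModelRates = { inputPerMillion: 3_000_000, outputPerMillion: 15_000_000 };
const baseCost = baseCostCredits(1_000, 500, rates); // 10500 credits = $0.0105
const charged = effectiveCostCredits(1_000, 500, rates, 2_000); // 12600 credits = $0.0126
console.log(baseCost, charged, charged - baseCost); // 10500 12600 2100
```

The difference between `charged` and `baseCost` is the developer's markup share for that request.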

Per-model rates live on the pricing page. Developers keep 100% of their markup — Quota doesn't take a platform cut. Markup earnings are paid out via Stripe Connect daily, with a 7-day delay for chargeback protection.

03 How a request is billed

Quota uses a two-step billing model: a pre-request reservation, then a post-request reconciliation against actual token usage. For non-streaming requests the actual cost is known as soon as the response returns. Reconciliation can therefore settle the exact charge within the amount already reserved.

The reservation gate (where 402s come from)

Before a request runs, Quota atomically reserves enough credits to cover the maximum expected cost for the model and prompt. If the balance can't cover the reservation, the request is rejected with 402 insufficient_credits before any tokens are generated. Nothing is charged.
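The reserve-then-reconcile flow can be sketched with an in-memory wallet. `CreditWallet` and its methods are hypothetical. They model the behavior described above, not Quota's implementation. A production ledger would make each step a single database transaction, so two concurrent requests can't both pass the gate.

```typescript
// Result of a reservation attempt; the failure case mirrors the 402 above.
type ReserveResult =
  | { ok: true; holdId: number }
  | { ok: false; status: 402; code: "insufficient_credits" };

// Hypothetical in-memory wallet holding an integer credit balance.
class CreditWallet {
  private readonly holds = new Map<number, number>();
  private nextHoldId = 1;

  constructor(private balance: number) {}

  get balanceCredits(): number {
    return this.balance;
  }

  // Step 1: check and deduct in one synchronous call, so nothing else in
  // this single-threaded process can run between the check and the deduction.
  reserve(maxCost: number): ReserveResult {
    if (this.balance < maxCost) {
      return { ok: false, status: 402, code: "insufficient_credits" };
    }
    this.balance -= maxCost;
    const holdId = this.nextHoldId++;
    this.holds.set(holdId, maxCost);
    return { ok: true, holdId };
  }

  // Step 2: release the hold and charge what the request actually cost.
  reconcile(holdId: number, actualCost: number): void {
    const reserved = this.holds.get(holdId);
    if (reserved === undefined) {
      throw new Error(`unknown hold ${holdId}`);
    }
    this.holds.delete(holdId);
    this.balance += reserved - actualCost;
  }
}

const wallet = new CreditWallet(100);
const first = wallet.reserve(60); // balance is 40 while the request runs
const second = wallet.reserve(50); // 40 < 50, so this request is rejected
console.log(second.ok ? "admitted" : `${second.status} ${second.code}`); // 402 insufficient_credits
if (first.ok) wallet.reconcile(first.holdId, 45); // used 45 of the 60 reserved
console.log(wallet.balanceCredits); // 55
```

Keeping the hold separate from the balance lets reconciliation hand back unused credits. The first request reserved 60 and used 45, so 15 flow back.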

Streaming responses can overshoot the reservation

For streaming responses, the actual token count is only known after the stream finishes — and a generous response can cost more than the reservation set aside. Quota records the truth: it deducts the extra credits and writes the ledger entry, even if doing so pushes the balance below zero. The platform doesn't refuse a response the user already received.

The window is bounded: only one in-flight stream can overshoot at a time, and the next request hits the same atomic reservation gate as before — so the balance can't spiral.
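Two tiny hypothetical helpers show how an overshoot settles. The settled balance may go negative, but the next request still has to pass the same `balance >= reservation` check. The 20-credit reservation and 25-credit actual cost are illustrative. They reproduce the -$0.000005 shortfall used as this page's example.

```typescript
// Hypothetical helper: settle a finished stream by returning the hold and
// deducting the actual cost, even when that takes the balance below zero.
function settleStream(
  balanceDuringStream: number,
  reserved: number,
  actualCost: number,
): number {
  return balanceDuringStream + reserved - actualCost;
}

// Hypothetical helper: the reservation gate every new request passes through.
function admits(balance: number, reservation: number): boolean {
  return balance >= reservation;
}

let balance = 20; // $0.000020
const reserved = 20; // the stream's reservation takes the whole balance
balance -= reserved; // 0 while streaming
balance = settleStream(balance, reserved, 25); // the stream actually cost 25 credits
console.log(balance); // -5, i.e. -$0.000005
console.log(admits(balance, 10)); // false: the next request gets a 402
balance += 4_050_000; // the user buys the Starter package ($4.05 in balance)
console.log(balance, admits(balance, 10)); // 4049995 true
```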

Negative balances are possible during streaming

After an overshoot the balance can sit below zero. Every subsequent request is rejected at the reservation gate until the user tops up.

Example balance after a shortfall: -$0.000005 (-5 credits)

The next request is rejected with a 402, and the user-facing message reads:

{
  "error": {
    "code": "insufficient_credits",
    "message": "Insufficient credits. The previous streaming response used more credits than reserved; current balance is -$0.000005. Top up to continue."
  }
}
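A client can recognize this error from the status and body alone. The sketch below assumes only the JSON shape shown above. `needsTopUp` is a hypothetical helper, and the 402 status comes from the reservation gate described earlier.

```typescript
// The error body shape shown above.
interface QuotaErrorBody {
  error: { code: string; message: string };
}

// Runtime check that a parsed JSON value has that shape.
function isQuotaErrorBody(body: unknown): body is QuotaErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const error = (body as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return false;
  const { code, message } = error as { code?: unknown; message?: unknown };
  return typeof code === "string" && typeof message === "string";
}

// Hypothetical helper: should this response send the user to a top-up flow?
function needsTopUp(status: number, body: unknown): boolean {
  return (
    status === 402 &&
    isQuotaErrorBody(body) &&
    body.error.code === "insufficient_credits"
  );
}

const example: unknown = {
  error: {
    code: "insufficient_credits",
    message:
      "Insufficient credits. The previous streaming response used more credits than reserved; current balance is -$0.000005. Top up to continue.",
  },
};
console.log(needsTopUp(402, example)); // true
```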

Checking a balance

curl https://api.usequota.ai/v1/balance \
  -H "Authorization: Bearer $QUOTA_API_KEY"

The response reports the balance in integer credits; divide by 1,000,000 to render dollars. See the Balance API reference for the full surface, including the developer wallet and OAuth user wallet response shapes.
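The conversion to dollars can be done with integer math, so no float rounding creeps into what users see. `formatCredits` is a hypothetical helper. Its display rule is an assumption: at least two decimals, at most six, with trailing zeros trimmed. That rule was chosen to reproduce the "$8.50" and "-$0.000005" strings on this page. Read the credit count out of the /v1/balance response according to the Balance API's documented shape.

```typescript
const CREDITS_PER_DOLLAR = 1_000_000;

// Hypothetical helper: render integer credits as a dollar string using only
// integer arithmetic. Shows at least two decimals and at most six.
function formatCredits(credits: number): string {
  if (!Number.isSafeInteger(credits)) {
    throw new Error(`expected integer credits, got ${credits}`);
  }
  const sign = credits < 0 ? "-" : "";
  const abs = Math.abs(credits);
  const remainder = abs % CREDITS_PER_DOLLAR;
  // Exact integer division: subtract the remainder before dividing.
  const dollars = (abs - remainder) / CREDITS_PER_DOLLAR;
  // Six-digit fraction, trailing zeros trimmed, then padded back to cents.
  const fraction = String(remainder)
    .padStart(6, "0")
    .replace(/0+$/, "")
    .padEnd(2, "0");
  return `${sign}$${dollars}.${fraction}`;
}

console.log(formatCredits(8_500_000)); // $8.50
console.log(formatCredits(-5)); // -$0.000005
console.log(formatCredits(1_000_000)); // $1.00
```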

04 Topping up

End users buy balance through Stripe Checkout. Larger packages carry a smaller markup because Stripe's fixed per-transaction fees are spread over a bigger purchase.

  • Starter: $5.00 buys $4.05 in balance (19% markup)
  • Basic: $10.00 buys $8.50 in balance (15% markup)
  • Plus: $25.00 buys $22.50 in balance (10% markup)
  • Pro: $50.00 buys $46.50 in balance (8% markup)

1. List packages

curl https://api.usequota.ai/v1/packages

2. Create a checkout session

curl -X POST https://api.usequota.ai/api/payments/checkout \
  -H "Authorization: Bearer $QUOTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "package_id": "basic",
    "success_url": "https://yourapp.com/success",
    "cancel_url": "https://yourapp.com/cancel"
  }'

Redirect the user to the checkout_url in the response. Stripe sends a webhook when payment completes, and the balance is credited automatically.

The HTTP wire format is snake_case (checkout_url, session_id); the official SDKs map it to camelCase { url, sessionId }.
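Apps that call the HTTP API directly can keep the snake_case/camelCase boundary in one place. This is a sketch, not the SDK source. `checkoutBody` and `toCheckoutSession` are hypothetical helpers, and the checkout URL and session ID in the usage lines are placeholders.

```typescript
// Wire (snake_case) shapes, matching the curl example and the note above.
interface CheckoutRequestWire {
  package_id: string;
  success_url: string;
  cancel_url: string;
}

interface CheckoutResponseWire {
  checkout_url: string;
  session_id: string;
}

// camelCase shape matching the SDKs' { url, sessionId }.
interface CheckoutSession {
  url: string;
  sessionId: string;
}

// Hypothetical helper: build the JSON body for POST /api/payments/checkout.
function checkoutBody(packageId: string, successUrl: string, cancelUrl: string): string {
  const wire: CheckoutRequestWire = {
    package_id: packageId,
    success_url: successUrl,
    cancel_url: cancelUrl,
  };
  return JSON.stringify(wire);
}

// Hypothetical helper: map the wire response to the camelCase shape.
function toCheckoutSession(wire: CheckoutResponseWire): CheckoutSession {
  return { url: wire.checkout_url, sessionId: wire.session_id };
}

// Sending it (Node 18+ or a browser), then redirecting to session.url:
// const res = await fetch("https://api.usequota.ai/api/payments/checkout", {
//   method: "POST",
//   headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
//   body: checkoutBody("basic", "https://yourapp.com/success", "https://yourapp.com/cancel"),
// });
// const session = toCheckoutSession(await res.json());

// Placeholder values, for illustration only.
const session = toCheckoutSession({
  checkout_url: "https://example.com/placeholder-checkout",
  session_id: "placeholder_session",
});
console.log(session.url, session.sessionId);
```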