Billing modes

Quota supports developer billing, user billing via OAuth, and sandbox mode. Pick the one that matches who you want to charge and how closely you want local testing to exercise production billing.

Set per API key

When you create an API key, pass billing_mode: "developer", "user", or "test". The dashboard labels the "test" wire value as Sandbox. Each key has a single mode, so if you need more than one behavior, create one key per mode.
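A minimal sketch of building the key-creation payload in TypeScript. It uses the same two fields as the curl request in the Sandbox section (name and billing_mode). The helper name buildCreateKeyBody and the example key names are hypothetical. Only the three billing_mode wire values come from this page.

```typescript
// The three wire values accepted by billing_mode (from this page).
// The dashboard shows "test" as "Sandbox".
type BillingMode = "developer" | "user" | "test";

const BILLING_MODES: readonly BillingMode[] = ["developer", "user", "test"];

interface CreateKeyBody {
  name: string;
  billing_mode: BillingMode;
}

// Hypothetical helper: validates the mode and returns the JSON body you
// would POST to https://api.usequota.ai/developers/keys.
function buildCreateKeyBody(name: string, mode: string): CreateKeyBody {
  if (!BILLING_MODES.includes(mode as BillingMode)) {
    throw new Error(
      `Unknown billing_mode "${mode}"; expected one of ${BILLING_MODES.join(", ")}`,
    );
  }
  return { name, billing_mode: mode as BillingMode };
}

// The mode is set per key, so one key per behavior:
const bodies = [
  buildCreateKeyBody("prod-internal", "developer"),
  buildCreateKeyBody("prod-consumer", "user"),
  buildCreateKeyBody("local-sandbox", "test"),
];

console.log(JSON.stringify(bodies[2]));
// → {"name":"local-sandbox","billing_mode":"test"}
```

Validating on the client catches typos such as "sandbox" (the dashboard label) before they reach the API, since the wire value is "test".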

01 Developer billing (default)

Every request is charged to your developer balance. New API keys default to this mode — it's the simplest setup and works with plain API-key auth.

Best for:

  • Internal tools and admin dashboards
  • Free-tier features where you absorb the AI cost
  • Fixed-price subscriptions with predictable usage

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1",
  apiKey: process.env.QUOTA_API_KEY,
});

// Cost is deducted from YOUR developer balance.
await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

02 User billing via OAuth

Each end user connects their own Quota wallet via OAuth and pays for their own usage. Your app sets a markup percentage and earns revenue on every request. Quota handles the wallet, the top-up flow, and the payouts.

Best for:

  • Consumer apps where users bring their own wallet
  • Marketplaces and plugin ecosystems
  • Anywhere you want per-use revenue rather than subscription revenue

How it works:

  • Users top up in dollar packages ($5–$50)
  • The balance is universal and works across every Quota app
  • You keep 100% of your markup (no platform fee)
  • Payouts go out daily via Stripe Connect, with a 7-day delay

// After the user connects via OAuth, your server holds their
// access token. Pass it through on chat requests:
await fetch("https://api.usequota.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + userAccessToken,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  }),
});
// Cost (base + your markup) is deducted from the user's balance.
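The charge in the comment at the end of that request (base cost plus your markup) can be worked through as follows. This sketch assumes the markup is a percentage of the base provider cost and uses integer micro-dollars to avoid floating-point drift. The unit, the rounding rule, and the function name userCharge are illustrative assumptions, not Quota's documented accounting.

```typescript
// Amounts are integer micro-dollars (1 USD = 1000000 micros).
// This unit and the rounding rule are assumptions for illustration.
interface UserCharge {
  baseMicros: number;   // provider cost of the request
  markupMicros: number; // your revenue (you keep 100% of the markup)
  totalMicros: number;  // amount deducted from the user's balance
}

// Hypothetical helper: markup is applied as a percentage of the base cost
// and rounded to the nearest micro-dollar.
function userCharge(baseMicros: number, markupPercent: number): UserCharge {
  if (!Number.isInteger(baseMicros) || baseMicros < 0) {
    throw new Error("baseMicros must be a non-negative integer");
  }
  if (!(markupPercent >= 0)) {
    throw new Error("markupPercent must be a non-negative number");
  }
  const markupMicros = Math.round((baseMicros * markupPercent) / 100);
  return { baseMicros, markupMicros, totalMicros: baseMicros + markupMicros };
}

// A $0.002 request with a 20% markup.
const charge = userCharge(2000, 20);
console.log(charge.markupMicros, charge.totalMicros); // → 400 2400
```

In that example the user's balance drops by $0.0024, and $0.0004 of it is your revenue. Integer units keep many small per-request charges from accumulating floating-point error.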

03 Sandbox mode

Sandbox mode is an API-key mode for local development, QA, and integration tests. It returns synthetic OpenAI, Anthropic, or Google responses without calling the upstream provider, while still running Quota's metering, balance, and ledger code paths.

Use Sandbox when you want to:

  • Verify response shapes without provider credentials or cost
  • Test insufficient-credit and per-user billing behavior
  • Exercise webhooks, ledgers, and usage reporting end to end

Sandbox still records Quota usage

Sandbox skips provider calls, so there is no OpenAI, Anthropic, or Google charge. It still deducts metered Quota credits and writes ledger entries so your billing integration behaves like production.

To create a Sandbox key, set billing_mode to "test" when you create the key:

curl -X POST https://api.usequota.ai/developers/keys \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "local-sandbox",
    "billing_mode": "test"
  }'
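Because the mode belongs to the key, one way to use Sandbox is to keep a Sandbox key alongside your production key and pick one per environment. A minimal sketch, assuming a hypothetical QUOTA_SANDBOX_API_KEY variable next to the QUOTA_API_KEY used in the Developer billing example, and a NODE_ENV-style environment name; adapt both to your own configuration.

```typescript
// Hypothetical config: QUOTA_SANDBOX_API_KEY holds the key created with
// billing_mode "test"; QUOTA_API_KEY holds a production key.
type Env = Record<string, string | undefined>;

function pickQuotaApiKey(env: Env): string {
  // Assumed convention: development and test runs use the Sandbox key.
  const useSandbox = env.NODE_ENV === "development" || env.NODE_ENV === "test";
  const name = useSandbox ? "QUOTA_SANDBOX_API_KEY" : "QUOTA_API_KEY";
  const key = env[name];
  if (!key) {
    throw new Error(`Missing ${name} for NODE_ENV=${env.NODE_ENV ?? "(unset)"}`);
  }
  return key;
}

// In a test run this returns the Sandbox key, so requests get synthetic
// provider responses while still exercising Quota metering and the ledger.
const apiKey = pickQuotaApiKey({
  NODE_ENV: "test",
  QUOTA_API_KEY: "prod-key-placeholder",
  QUOTA_SANDBOX_API_KEY: "sandbox-key-placeholder",
});
console.log(apiKey); // → sandbox-key-placeholder
```

You would then pass the selected key as apiKey to the client from the Developer billing example. Failing loudly on a missing key avoids silently falling back to a production key during tests.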

04 Choosing between modes

  • Developer billing (default): You pay for everything. Simplest setup. Best for internal tools and apps that bundle AI into a fixed price.
  • User billing (OAuth): Users pay from their own wallet. You set a markup percentage and keep 100% of it. Best for consumer apps and marketplaces.
  • Sandbox mode: Mock provider responses with real Quota metering and ledger entries. Best for local development, QA, and integration tests.
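The guidance above can be condensed into a small decision function. This is a sketch of the page's rules of thumb, not an official API: the Situation fields and the function name are hypothetical, and the return values are the billing_mode wire values.

```typescript
type BillingMode = "developer" | "user" | "test";

// Hypothetical description of your use case.
interface Situation {
  testing: boolean;     // local development, QA, or integration tests
  endUsersPay: boolean; // should end users pay from their own Quota wallet?
}

function recommendBillingMode(s: Situation): BillingMode {
  // Sandbox: mock provider responses, real Quota metering and ledger.
  if (s.testing) return "test";
  // User billing: users connect via OAuth and you keep the markup.
  if (s.endUsersPay) return "user";
  // Developer billing: you pay for everything (the default).
  return "developer";
}

console.log(recommendBillingMode({ testing: false, endUsersPay: true })); // → user
```

Testing is checked first because Sandbox is about how you run requests, not who pays in production; the same app typically has a Sandbox key for tests and a developer or user key in production.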

Comparison

| Consideration | Developer billing | User billing (OAuth) | Sandbox mode |
|---|---|---|---|
| Who pays | Developer | End user (own wallet) | Developer (sandbox bills the developer wallet) |
| Setup | API key only | OAuth flow | API key with billing_mode: "test" |
| Token handling | n/a | You hold the user's token | n/a |
| Developer revenue | None | 100% of markup, paid via Stripe Connect | None |

05 Rate limits

All billing modes share the same rate limits. The default is 100 requests per minute per API key; this is a soft cap rather than a hard ceiling. Auth endpoints have stricter, separate limits of 3–5 requests per minute. See Authentication for rate-limit headers and per-key overrides.
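If you send many requests through one key, you may want to pace them on the client instead of relying on the server. A minimal sliding-window sketch that allows at most 100 requests in any 60-second window, matching the default per-key limit. It is client-side only, the class name is hypothetical, and it assumes nothing about how Quota enforces the soft cap; the rate-limit headers and per-key overrides documented in Authentication should take precedence.

```typescript
// Client-side sliding-window limiter (illustrative; not part of any Quota SDK).
class SlidingWindowLimiter {
  private timestamps: number[] = [];

  constructor(
    private readonly limit = 100,      // requests allowed per window
    private readonly windowMs = 60000, // window length: one minute
  ) {}

  // Records the request and returns true if it fits in the current window.
  tryAcquire(nowMs: number = Date.now()): boolean {
    // Drop timestamps that have left the window.
    while (this.timestamps.length > 0 && nowMs - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }

  // Milliseconds until the next request would be allowed (0 if allowed now).
  msUntilNextSlot(nowMs: number = Date.now()): number {
    if (this.timestamps.length < this.limit) return 0;
    return Math.max(0, this.timestamps[0] + this.windowMs - nowMs);
  }
}

// 120 requests at the same instant: the first 100 pass, the rest wait.
const limiter = new SlidingWindowLimiter();
let accepted = 0;
for (let i = 0; i < 120; i++) {
  if (limiter.tryAcquire(0)) accepted++;
}
console.log(accepted); // → 100
console.log(limiter.msUntilNextSlot(1000)); // → 59000
```

Passing the clock in as nowMs keeps the limiter deterministic in tests; in production you would call tryAcquire() with the default Date.now() and sleep for msUntilNextSlot() when it returns false.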