Billing modes

Quota supports developer billing, user billing via OAuth, and sandbox mode. Pick the one that matches who you want to charge and how closely you want local testing to exercise production billing.

Set per API key

Billing mode is set per API key. When you create a key, pass billing_mode: "developer", "user", or "test". The dashboard labels the "test" wire value as Sandbox. If you need more than one behavior, create one key per mode.
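Since each key carries a single billing mode, an app that needs several behaviors has to hold several keys and choose between them at runtime. The sketch below shows one way to do that. The helper name pickApiKey and the environment variable names other than QUOTA_API_KEY are hypothetical and not part of Quota's API.

```typescript
// Wire values for billing_mode; "test" is the one the dashboard labels Sandbox.
type BillingMode = "developer" | "user" | "test";

// Hypothetical mapping from billing mode to the env var that stores that key.
const KEY_ENV_VARS: Record<BillingMode, string> = {
  developer: "QUOTA_API_KEY",
  user: "QUOTA_USER_BILLING_API_KEY",
  test: "QUOTA_SANDBOX_API_KEY",
};

// Returns the key for the requested mode, or throws if it is not configured.
function pickApiKey(
  mode: BillingMode,
  env: Record<string, string | undefined>,
): string {
  const name = KEY_ENV_VARS[mode];
  const key = env[name];
  if (!key) {
    throw new Error(`Missing ${name} for billing mode "${mode}"`);
  }
  return key;
}
```

In a Node.js process this could be called as pickApiKey("test", process.env) in local runs and pickApiKey("developer", process.env) in production.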

01 Developer billing (default)

Every request is charged to your developer balance. New API keys default to this mode — it's the simplest setup and works with plain API-key auth.

Best for:

Internal tools and admin dashboards
Free-tier features where you absorb the AI cost
Fixed-price subscriptions with predictable usage

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1",
  apiKey: process.env.QUOTA_API_KEY,
});

// Cost is deducted from YOUR developer balance.
await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

02 User billing via OAuth

Each end user connects their own Quota wallet via OAuth and pays for their own usage. Your app sets a markup percentage and earns revenue on every request. Quota handles the wallet, the top-up flow, and the payouts.

Best for:

Consumer apps where users bring their own wallet
Marketplaces and plugin ecosystems
Anywhere you want per-use revenue rather than subscription

Users top up in dollar packages ($5–$50)
Balance is universal — works across every Quota app
You keep 100% of your markup (no platform fee)
Payouts via Stripe Connect, daily, with a 7-day delay

// After the user connects via OAuth, your server holds their
// access token. Pass it through on chat requests:
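const res = await fetch("https://api.usequota.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    "Authorization": "Bearer " + userAccessToken,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  }),
});
// Surface HTTP errors instead of silently ignoring them.
if (!res.ok) {
  throw new Error("Quota request failed with HTTP " + res.status);
}
const completion = await res.json();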
// Cost (base + your markup) is deducted from the user's balance.
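To make "base + your markup" concrete, here is a rough arithmetic sketch. It assumes the markup is a percentage of the base cost, added on top and rounded to the nearest micro-dollar. This page does not specify Quota's exact formula or rounding, and the function name and micro-dollar unit are illustrative choices.

```typescript
interface ChargeBreakdown {
  baseMicros: number;   // base cost, in millionths of a dollar
  markupMicros: number; // developer's markup on this request
  totalMicros: number;  // amount deducted from the user's balance
}

// Assumption: markup = base * markupPercent / 100, rounded to a whole micro-dollar.
function estimateUserCharge(
  baseMicros: number,
  markupPercent: number,
): ChargeBreakdown {
  if (!Number.isFinite(baseMicros) || baseMicros < 0) {
    throw new RangeError("baseMicros must be a non-negative number");
  }
  if (!Number.isFinite(markupPercent) || markupPercent < 0) {
    throw new RangeError("markupPercent must be a non-negative number");
  }
  const markupMicros = Math.round((baseMicros * markupPercent) / 100);
  return { baseMicros, markupMicros, totalMicros: baseMicros + markupMicros };
}

// A $0.02 base cost with a 25% markup.
const example = estimateUserCharge(20000, 25);
// example.markupMicros === 5000 and example.totalMicros === 25000
```

Under these assumptions the user would pay $0.025 for the request and the developer would earn $0.005 of it.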

03 Sandbox mode

Sandbox mode is an API-key mode for local development, QA, and integration tests. It returns synthetic OpenAI, Anthropic, or Google responses without calling the upstream provider, while still running Quota's metering, balance, and ledger code paths.

Use Sandbox when you want to:

Verify response shapes without provider credentials or cost
Test insufficient-credit and per-user billing behavior
Exercise webhooks, ledgers, and usage reporting end to end

Sandbox still records Quota usage

Sandbox skips provider calls, so there is no OpenAI, Anthropic, or Google charge. It still deducts metered Quota credits and writes ledger entries so your billing integration behaves like production.

curl -X POST https://api.usequota.ai/developers/keys \
  -H "Authorization: Bearer $SESSION_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "local-sandbox",
    "billing_mode": "test"
  }'
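Once a sandbox key exists, a response-shape check for an integration test might look like the sketch below. It validates only a small subset of the public OpenAI chat completion format. The name isChatCompletionLike is hypothetical, and the exact set of fields present in Quota's synthetic responses is an assumption.

```typescript
// Subset of the OpenAI chat completion response that this check cares about.
interface ChatCompletionLike {
  choices: Array<{ message: { role: string; content: string | null } }>;
}

// Type guard: true when `body` has at least one choice with a well-formed message.
function isChatCompletionLike(body: unknown): body is ChatCompletionLike {
  if (typeof body !== "object" || body === null) return false;
  const choices = (body as { choices?: unknown }).choices;
  if (!Array.isArray(choices) || choices.length === 0) return false;
  return choices.every((choice) => {
    const message = (choice as { message?: unknown } | null)?.message;
    if (typeof message !== "object" || message === null) return false;
    const { role, content } = message as { role?: unknown; content?: unknown };
    return (
      typeof role === "string" &&
      (typeof content === "string" || content === null)
    );
  });
}
```

A test could send a chat request with the sandbox key, parse the JSON body, and assert that isChatCompletionLike(body) is true before reading choices[0].message.content.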

04 Choosing between modes

Developer billing (default)	You pay for everything. Simplest setup. Best for internal tools and apps that bundle AI into a fixed price.
User billing (OAuth, with markup)	Users pay from their own wallet. You set a markup % and keep 100% of it. Best for consumer apps and marketplaces.
Sandbox mode	Mock provider responses with real Quota metering and ledger entries. Best for local development, QA, and integration tests.

Comparison

Consideration	Developer billing	User billing (OAuth)	Sandbox mode
Who pays	Developer	End user (own wallet)	Developer (sandbox bills the developer wallet)
Setup	API key only	OAuth flow	API key with billing_mode: "test"
Token handling	n/a	You hold the user's token	n/a
Developer revenue	None	100% of markup, paid via Stripe Connect	None

05 Rate limits

All billing modes share the same rate limits. The default is 100 requests per minute per API key. This is a soft cap, not a hard ceiling. Auth endpoints use separate, stricter limits (3–5 requests per minute). See Authentication for rate-limit headers and per-key overrides.
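A client can absorb occasional throttling by retrying with exponential backoff. The sketch below assumes that throttled requests come back with HTTP 429, which this page does not confirm. The helper names and the delay values are illustrative, and the rate-limit headers described under Authentication are not used here.

```typescript
// Delay before a zero-based retry attempt: 500 ms, 1 s, 2 s, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Calls `send` again while it reports HTTP 429 (assumed throttling status),
// up to maxRetries extra attempts, then returns the last response as-is.
async function withRateLimitRetry<T extends { status: number }>(
  send: () => Promise<T>,
  maxRetries = 3,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    const res = await send();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    await sleep(backoffDelayMs(attempt));
  }
}
```

With fetch, this could wrap a request as withRateLimitRetry(() => fetch(url, init)), because a fetch Response exposes a numeric status field. The sleep parameter is injectable so tests can skip real waiting.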

→ How billing works

The reservation gate, streaming reconciliation, and how a balance can dip negative.

→ Tokens and actors

The credentials you'll touch when integrating Quota — who each one represents, which endpoints they call, what gets billed.