
OpenAI compatibility

Quota is a credit wallet for AI usage: one API key, every major model (OpenAI / Anthropic / Google), and built-in billing so you can charge end-users for their usage instead of eating the bill yourself. The wire format is OpenAI-shaped, so the easiest way in is to point your existing OpenAI SDK at Quota: same endpoint path, same request body, same streaming envelope, and same error structure, plus a quota block on every response. The parameters Quota actually forwards to the upstream provider are listed below.

What this page is for
If you've already shipped an OpenAI integration, scan the Supported parameters table to make sure nothing your code relies on is silently ignored. Then change two lines (baseURL and apiKey) and ship.

Point your OpenAI SDK at Quota

Two lines change. Everything else — model names that start with gpt-, the chat.completions.create call, streaming, tool calling — stays exactly the same.

POST https://api.usequota.ai/v1/chat/completions
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1", // <- change 1 of 2
  apiKey: process.env.QUOTA_API_KEY,      // <- change 2 of 2
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0].message.content);
console.log(response.quota); // { credits_used, balance_after, ... }

Provider-prefixed models

Through the same endpoint you can also call Anthropic and Google models by prefixing the model name. No extra SDKs or API keys — Quota translates the request into each provider's native shape and translates the response back.

// Same OpenAI SDK, same call, just a prefixed model.
const claude = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Summarize this" }],
});
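
The routing rule is simple enough to mirror on the client, for example to tag logs by provider. A minimal sketch, assuming only the two prefixes this page documents (anthropic/ and google/); providerFor is a hypothetical helper, not part of any Quota SDK.

```typescript
type Provider = "openai" | "anthropic" | "google";

// Hypothetical helper that mirrors the routing rule described on this page:
// bare model names go to OpenAI, while "anthropic/" and "google/" prefixes
// go to those providers.
function providerFor(model: string): Provider {
  if (model.startsWith("anthropic/")) return "anthropic";
  if (model.startsWith("google/")) return "google";
  return "openai";
}

console.log(providerFor("gpt-4o-mini")); // "openai"
console.log(providerFor("anthropic/claude-sonnet-4.6")); // "anthropic"
```

This only predicts where a request will be routed; the routing itself happens on Quota's side.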

Supported parameters

Quota forwards a defined set of OpenAI chat-completions parameters to the upstream provider. Anything outside that set is silently dropped today — the request still succeeds and the response still comes back, but the parameter has no effect. We'd rather you know up front than debug it in production.
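
Because dropped parameters fail silently, a client-side pre-flight check can surface them before a request goes out. A minimal sketch: the forwarded names are copied from the table below, and droppedParams is a hypothetical helper rather than anything Quota ships.

```typescript
// Parameters this page lists as forwarded (fully or partially).
const FORWARDED = new Set([
  "model",
  "messages",
  "max_tokens",
  "max_completion_tokens",
  "temperature",
  "stream",
  "tools",
  "tool_choice",
  "parallel_tool_calls",
  "reasoning_effort",
]);

// Hypothetical helper: returns the request keys that would have no effect.
function droppedParams(body: Record<string, unknown>): string[] {
  return Object.keys(body).filter((key) => !FORWARDED.has(key));
}

const dropped = droppedParams({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  top_p: 0.9,
  response_format: { type: "json_object" },
});
console.log(dropped); // ["top_p", "response_format"]
```

Logging the result in development is one way to catch a parameter your code relies on. The list needs to be kept in sync with the table by hand.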

| Parameter | Forwarded? | Notes |
| --- | --- | --- |
| model | Yes | Required. Bare names route to OpenAI; anthropic/… and google/… prefixes route to those providers. |
| messages | Yes | Required. Translated per provider: Anthropic system messages get hoisted to the top-level system field, Google to systemInstruction. |
| max_tokens | Yes | Forwarded as max_tokens, or auto-rewritten to max_completion_tokens for o-series and GPT-5+ reasoning models. |
| max_completion_tokens | Yes | Accepted as an alias for max_tokens. Same effect. |
| temperature | Yes | Forwarded for non-reasoning models. Reasoning models (o-series, GPT-5+) ignore sampling parameters per OpenAI's own contract; Quota drops it for those models to avoid an upstream error. |
| stream | Yes | SSE chunks come back in the same shape as OpenAI. The final chunk also carries a quota billing block. |
| tools | Yes | OpenAI tool schema is the canonical shape. Anthropic and Google requests are translated automatically. |
| tool_choice | Yes | auto, none, required, and named function selection are all forwarded. |
| parallel_tool_calls | Yes | Forwarded to OpenAI for non-reasoning models. Reasoning models don't accept it; Quota drops it for them. |
| reasoning_effort | Partial | Forwarded only for reasoning models (o-series, GPT-5+). Silently dropped for chat models that don't support it. |
| top_p | Dropped | Not forwarded today. Use temperature. |
| n | Dropped | Not forwarded. Quota always returns a single choice. Make multiple requests if you need multiple completions. |
| stop | Dropped | Stop sequences are not forwarded today. |
| seed | Dropped | Not forwarded. Determinism is provider-side only. |
| response_format | Dropped | JSON mode and structured outputs are not forwarded today. Ask for JSON in the prompt and parse the response yourself. |
| frequency_penalty | Dropped | Not forwarded. |
| presence_penalty | Dropped | Not forwarded. |
| logit_bias | Dropped | Not forwarded. |
| logprobs | Dropped | Not forwarded. Token-level probabilities aren't surfaced today. |
| top_logprobs | Dropped | Not forwarded (depends on logprobs). |
| user | Dropped | Not forwarded. For per-user attribution, pass the user's OAuth access token (quota_token_…) as the bearer; Quota bills the user's wallet directly. |
| store | Dropped | OpenAI's conversation-storage flag is not forwarded. |
| metadata | Dropped | Not forwarded today. |
| modalities | Dropped | Audio/image output modalities are not forwarded. Quota's chat endpoint is text + tool calls today. |

Cross-provider notes
Anthropic and Google have their own native parameter sets that don't map cleanly onto OpenAI's shape. The forwarded list above is what Quota guarantees across all three providers. Provider-specific extras (Anthropic's thinking block, Google's safety settings, etc.) aren't available through the OpenAI-shaped endpoint today — see the per-provider compat pages once they land.
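
The model-dependent rows of the table (max_tokens, temperature, parallel_tool_calls, reasoning_effort) can be summarized as one small function. This sketch only predicts, on the client, what this page says happens for OpenAI models; it is not Quota's implementation. The isReasoningModel pattern is a rough assumption about what "o-series and GPT-5+" means by name, and the upstream name used for the token limit on non-reasoning models is also assumed.

```typescript
interface ChatParams {
  model: string;
  max_tokens?: number;
  max_completion_tokens?: number;
  temperature?: number;
  parallel_tool_calls?: boolean;
  reasoning_effort?: string;
}

// Assumption: a name-based guess at "o-series and GPT-5+".
// Quota's real model classification is not documented on this page.
function isReasoningModel(model: string): boolean {
  return /^(o\d|gpt-5)/.test(model);
}

// Predicts which of the model-dependent parameters take effect upstream.
function effectiveParams(p: ChatParams): Record<string, unknown> {
  const reasoning = isReasoningModel(p.model);
  const out: Record<string, unknown> = { model: p.model };

  // max_completion_tokens is accepted as an alias for max_tokens.
  const limit = p.max_tokens ?? p.max_completion_tokens;
  if (limit !== undefined) {
    out[reasoning ? "max_completion_tokens" : "max_tokens"] = limit;
  }

  if (reasoning) {
    // temperature and parallel_tool_calls are dropped for reasoning models.
    if (p.reasoning_effort !== undefined) out.reasoning_effort = p.reasoning_effort;
  } else {
    // reasoning_effort is dropped for non-reasoning chat models.
    if (p.temperature !== undefined) out.temperature = p.temperature;
    if (p.parallel_tool_calls !== undefined) out.parallel_tool_calls = p.parallel_tool_calls;
  }
  return out;
}

console.log(effectiveParams({ model: "o3-mini", max_tokens: 256, temperature: 0.2 }));
// { model: "o3-mini", max_completion_tokens: 256 }
```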

Response shape

The same envelope OpenAI returns, plus a quota block with the actual cost, post-call balance, and billing mode. Always check it in production — streaming responses can charge slightly more than the pre-reservation once token usage is finalised.

{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello. How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "quota": {
    "credits_used": 18500,
    "balance_before": 8500000,
    "balance_after":  8481500,
    "billing_mode":   "developer",
    "reservation_id": "rsv_5f3a..."
  }
}
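
In TypeScript, the OpenAI SDK's response type does not declare a quota field, so reading it needs a cast or a small guard. A minimal sketch, with field names taken from the example above; the types are inferred from that single example and may be incomplete, and readQuota is a hypothetical helper.

```typescript
// Shape inferred from the example response on this page.
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  billing_mode: string;
  reservation_id: string;
}

// Returns the quota block if the response carries one, otherwise undefined.
function readQuota(response: unknown): QuotaBlock | undefined {
  if (typeof response !== "object" || response === null) return undefined;
  const quota = (response as { quota?: unknown }).quota;
  if (typeof quota !== "object" || quota === null) return undefined;
  const q = quota as Partial<QuotaBlock>;
  const hasNumbers =
    typeof q.credits_used === "number" &&
    typeof q.balance_before === "number" &&
    typeof q.balance_after === "number";
  return hasNumbers ? (quota as QuotaBlock) : undefined;
}

const quota = readQuota({
  quota: {
    credits_used: 18500,
    balance_before: 8500000,
    balance_after: 8481500,
    billing_mode: "developer",
    reservation_id: "rsv_5f3a...",
  },
});

if (quota) {
  // In the example above, the balances differ by exactly credits_used.
  console.log(quota.balance_before - quota.credits_used === quota.balance_after); // true
}
```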

Streaming

Streaming uses Server-Sent Events in the OpenAI shape: data: {...}\n\n chunks terminated by data: [DONE]. Existing OpenAI SDK iterators work without changes. The only Quota-specific addition is that the final chunk carries the quota billing block alongside usage. See the Chat completions reference for the full chunk schema and a streaming example.
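
To make the wire format concrete, here is a small parser for an already-buffered SSE body. It is a sketch under assumptions: the sample payloads are illustrative, and placing the billing block in a top-level quota field on the final chunk is an assumption based on the description above, not the documented chunk schema. Real streams arrive incrementally, which the SDK iterators handle for you.

```typescript
// Parse a fully buffered SSE body into JSON chunks, stopping at [DONE].
function parseSseChunks(raw: string): Array<Record<string, unknown>> {
  const chunks: Array<Record<string, unknown>> = [];
  for (const event of raw.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue; // skip blanks and non-data lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    chunks.push(JSON.parse(payload));
  }
  return chunks;
}

// Illustrative stream: two content deltas, then a final chunk with billing.
const raw =
  'data: {"choices":[{"index":0,"delta":{"content":"Hel"}}]}\n\n' +
  'data: {"choices":[{"index":0,"delta":{"content":"lo"}}]}\n\n' +
  'data: {"choices":[],"usage":{"total_tokens":20},"quota":{"credits_used":18500}}\n\n' +
  "data: [DONE]\n\n";

const chunks = parseSseChunks(raw);
const last = chunks[chunks.length - 1];
console.log(chunks.length); // 3
console.log(last?.quota); // { credits_used: 18500 }
```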

Limitations and known differences

  • Parameter coverage is the table above. Anything not listed as forwarded is dropped silently today. We don't reject the request, so a stray top_p or response_format won't blow up — but it also won't do anything.
  • Single completion per request. n isn't forwarded. If you need k samples, fan out k requests.
  • No native JSON mode yet. response_format isn't forwarded. Prompt for JSON and parse on your side.
  • Reasoning models drop temperature and parallel_tool_calls. For o-series and GPT-5+ models, Quota strips both before forwarding, because those models don't accept them upstream.
  • Per-user attribution uses the bearer, not the user field. The OpenAI user parameter is dropped. To bill a specific end user's wallet, pass their OAuth access token (quota_token_…) as the Authorization bearer instead of your API key — see user billing via OAuth.
  • Endpoint surface. Quota implements /v1/chat/completions, /v1/audio/speech (TTS), and /v1/audio/transcriptions (STT). All three are OpenAI-shaped and drop-in for the OpenAI SDK with a swapped baseURL. See Audio for the per-route reference. The /v1/responses, /v1/embeddings, /v1/images/*, and /v1/files surfaces are not implemented.
  • Quota-only routes. WS /v1/audio/voice-conversion (voice-to-voice) has no OpenAI SDK shape — it's a WebSocket with a Quota-defined auth frame. Use the @usequota/core / @usequota/nextjs helpers, or speak the protocol from Audio › V2V.
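
For the single-completion limitation, fanning out k requests can be sketched as a small concurrent helper. sampleK and makeRequest are hypothetical names; in real code, makeRequest would wrap a call such as client.chat.completions.create, and each response would carry its own quota block.

```typescript
// Run k independent requests concurrently and collect results in order.
// `makeRequest` stands in for a real API call.
async function sampleK<T>(
  k: number,
  makeRequest: (index: number) => Promise<T>,
): Promise<T[]> {
  return Promise.all(Array.from({ length: k }, (_, i) => makeRequest(i)));
}

// Usage with a stub in place of a real request:
sampleK(3, async (i) => `completion ${i}`).then((results) => {
  console.log(results); // ["completion 0", "completion 1", "completion 2"]
});
```

Promise.all rejects as soon as any one request fails; Promise.allSettled is the alternative if partial results are acceptable.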

What's next