OpenAI compatibility

Quota is a credit wallet for AI usage: one API key, every major model (OpenAI / Anthropic / Google), and built-in billing so you can charge end-users for their usage instead of eating the bill yourself. The wire format is OpenAI-shaped, so the easiest way in is to point your existing OpenAI SDK at Quota: same endpoint path, same request body, same streaming envelope, same error structure, plus a quota block on every response. Not every OpenAI parameter is forwarded to the upstream provider; the Supported parameters table below lists the ones that are.

What this page is for

If you've already shipped an OpenAI integration, scan the Supported parameters table to make sure nothing your code relies on is silently ignored. Then change two lines (baseURL and apiKey) and ship.

Point your OpenAI SDK at Quota

Two lines change. Everything else — model names that start with gpt-, the chat.completions.create call, streaming, tool calling — stays exactly the same.

POST https://api.usequota.ai/v1/chat/completions

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1", // <- change 1 of 2
  apiKey: process.env.QUOTA_API_KEY,      // <- change 2 of 2
});

const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});

console.log(response.choices[0].message.content);
console.log(response.quota); // { credits_used, balance_after, ... }

Provider-prefixed models

Through the same endpoint you can also call Anthropic and Google models by prefixing the model name. No extra SDKs or API keys — Quota translates the request into each provider's native shape and translates the response back.

// Same OpenAI SDK, same call, just a prefixed model.
const claude = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Summarize this" }],
});
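
The prefix rule can be mirrored on the client, for example to validate a model name before sending it. This is a minimal sketch: `providerForModel` is a hypothetical helper, not part of any Quota SDK, and returning `null` for an unrecognized prefix is an assumption, since this page does not say how Quota handles one.

```typescript
// Hypothetical client-side helper; mirrors the prefix rule described on this page.
type Provider = "openai" | "anthropic" | "google";

function providerForModel(model: string): Provider | null {
  if (model.startsWith("anthropic/")) return "anthropic";
  if (model.startsWith("google/")) return "google";
  // Bare names (no prefix) route to OpenAI.
  if (!model.includes("/")) return "openai";
  // Assumption: any other prefix is treated as unrecognized.
  return null;
}
```

A check like this only catches typos in the prefix; whether a given model name exists is still decided server-side.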

Supported parameters

Quota forwards a defined set of OpenAI chat-completions parameters to the upstream provider. Anything outside that set is silently dropped today — the request still succeeds and the response still comes back, but the parameter has no effect. We'd rather you know up front than debug it in production.

Parameter	Forwarded?	Notes
model	Yes	Required. Bare names route to OpenAI; `anthropic/…` and `google/…` prefixes route to those providers.
messages	Yes	Required. Translated per provider: system messages are hoisted to the top-level `system` field for Anthropic and to `systemInstruction` for Google.
max_tokens	Yes	Forwarded as `max_tokens`, or auto-rewritten to `max_completion_tokens` for o-series and GPT-5+ reasoning models.
max_completion_tokens	Yes	Accepted as an alias for max_tokens. Same effect.
temperature	Yes	Forwarded for non-reasoning models. Reasoning models (o-series, GPT-5+) don't accept custom sampling parameters, so Quota drops it for those models to avoid an upstream error.
stream	Yes	SSE chunks come back in the same shape as OpenAI. The final chunk also carries a `quota` billing block.
tools	Yes	OpenAI tool schema is the canonical shape. Anthropic and Google requests are translated automatically.
tool_choice	Yes	`auto`, `none`, `required`, and named function selection are all forwarded.
parallel_tool_calls	Yes	Forwarded to OpenAI for non-reasoning models. Reasoning models don't accept it; Quota drops it for them.
reasoning_effort	Partial	Forwarded only for reasoning models (o-series, GPT-5+). Silently dropped for chat models that don't support it.
top_p	Dropped	Not forwarded today. Use temperature.
n	Dropped	Not forwarded. Quota always returns a single choice. Make multiple requests if you need multiple completions.
stop	Dropped	Stop sequences are not forwarded today.
seed	Dropped	Not forwarded. Determinism is provider-side only.
response_format	Dropped	JSON mode and structured outputs are not forwarded today. Ask for JSON in the prompt and parse the response yourself.
frequency_penalty	Dropped	Not forwarded.
presence_penalty	Dropped	Not forwarded.
logit_bias	Dropped	Not forwarded.
logprobs	Dropped	Not forwarded. Token-level probabilities aren't surfaced today.
top_logprobs	Dropped	Not forwarded (depends on logprobs).
user	Dropped	Not forwarded. For per-user attribution, pass the user's OAuth access token (`quota_token_…`) as the bearer — Quota bills the user's wallet directly.
store	Dropped	OpenAI's conversation-storage flag is not forwarded.
metadata	Dropped	Not forwarded today.
modalities	Dropped	Audio/image output modalities are not forwarded. Quota's chat endpoint is text + tool calls today.
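
Because dropped parameters fail silently, a development-time check can surface them before a request goes out. The sketch below copies the "Dropped" rows from the table by hand; `DROPPED_PARAMS` and `findDroppedParams` are hypothetical names, and the list would need updating whenever the table changes.

```typescript
// Parameters the table marks as "Dropped". Maintained by hand from the docs.
const DROPPED_PARAMS = new Set<string>([
  "top_p", "n", "stop", "seed", "response_format", "frequency_penalty",
  "presence_penalty", "logit_bias", "logprobs", "top_logprobs", "user",
  "store", "metadata", "modalities",
]);

// Returns the keys of a request body that Quota would silently drop.
function findDroppedParams(requestBody: Record<string, unknown>): string[] {
  return Object.keys(requestBody).filter((key) => DROPPED_PARAMS.has(key));
}

// Example: warn in development before sending the request.
const exampleBody = { model: "gpt-4o-mini", messages: [], top_p: 0.9, seed: 42 };
const droppedKeys = findDroppedParams(exampleBody);
if (droppedKeys.length > 0) {
  console.warn("Quota will ignore: " + droppedKeys.join(", "));
}
```

This sketch does not cover the model-dependent rows (`temperature`, `parallel_tool_calls`, `reasoning_effort`), since detecting a reasoning model from its name would be a guess.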

Cross-provider notes

Anthropic and Google have their own native parameter sets that don't map cleanly onto OpenAI's shape. The forwarded list above is what Quota guarantees across all three providers. Provider-specific extras (Anthropic's thinking block, Google's safety settings, etc.) aren't available through the OpenAI-shaped endpoint today — see the per-provider compat pages once they land.

Response shape

The same envelope OpenAI returns, plus a quota block with the actual cost, post-call balance, and billing mode. Always check it in production — streaming responses can charge slightly more than the pre-reservation once token usage is finalised.

{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello. How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "quota": {
    "credits_used": 18500,
    "balance_before": 8500000,
    "balance_after":  8481500,
    "billing_mode":   "developer",
    "reservation_id": "rsv_5f3a..."
  }
}
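
The OpenAI SDK's response types don't declare a `quota` field, so TypeScript code needs a small guard to read it. The following is a sketch based only on the example response above: `QuotaBlock` and `getQuota` are hypothetical names, and marking every field except `credits_used` and `balance_after` as optional is an assumption.

```typescript
// Shape inferred from the example response on this page, not an official type.
interface QuotaBlock {
  credits_used: number;
  balance_after: number;
  balance_before?: number;
  billing_mode?: string;
  reservation_id?: string;
}

// Returns the quota block if the response carries one, otherwise null.
function getQuota(response: unknown): QuotaBlock | null {
  if (typeof response !== "object" || response === null) return null;
  const quota = (response as { quota?: unknown }).quota;
  if (typeof quota !== "object" || quota === null) return null;
  const fields = quota as Record<string, unknown>;
  const valid =
    typeof fields.credits_used === "number" &&
    typeof fields.balance_after === "number";
  return valid ? (quota as QuotaBlock) : null;
}
```

In the example above, `balance_before - credits_used` equals `balance_after` (8500000 - 18500 = 8481500); whether that holds for every billing mode isn't stated here, so treat the server's `balance_after` as authoritative.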

Streaming

Streaming uses Server-Sent Events in the OpenAI shape: data: {...}\n\n chunks terminated by data: [DONE]. Existing OpenAI SDK iterators work without changes. The only Quota-specific addition is that the final chunk carries the quota billing block alongside usage. See the Chat completions reference for the full chunk schema and a streaming example.
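
If you consume the stream without the SDK, the billing block can be picked out of the raw SSE text. The sketch below assumes each event is a single `data:` line separated by a blank line, as described above; `quotaFromSse` is a hypothetical helper, and a production parser would also handle partial buffers and `\r\n` line endings.

```typescript
// Scans a complete SSE body and returns the last quota block seen before [DONE].
function quotaFromSse(sseText: string): Record<string, unknown> | null {
  let quota: Record<string, unknown> | null = null;
  for (const event of sseText.split("\n\n")) {
    const line = event.trim();
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as { quota?: Record<string, unknown> };
    if (chunk.quota) quota = chunk.quota;
  }
  return quota;
}
```

With the OpenAI SDK iterator the same idea applies: check each chunk for a `quota` property and keep the last one you see.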

Limitations and known differences

Parameter coverage is the table above. Anything not listed as forwarded is dropped silently today. We don't reject the request, so a stray top_p or response_format won't blow up — but it also won't do anything.
Single completion per request. n isn't forwarded. If you need k samples, fan out k requests.
No native JSON mode yet. response_format isn't forwarded. Prompt for JSON and parse on your side.
Reasoning models drop sampling. For o-series and GPT-5+ models, Quota strips temperature and parallel_tool_calls before forwarding, because those models don't accept them and the upstream call would otherwise fail.
Per-user attribution uses the bearer, not the user field. The OpenAI user parameter is dropped. To bill a specific end user's wallet, pass their OAuth access token (quota_token_…) as the Authorization bearer instead of your API key — see user billing via OAuth.
Endpoint surface. Quota implements /v1/chat/completions, /v1/audio/speech (TTS), and /v1/audio/transcriptions (STT). All three are OpenAI-shaped and work as drop-ins for the OpenAI SDK with a swapped baseURL. See Audio for the per-route reference. The /v1/responses, /v1/embeddings, /v1/images/*, and /v1/files surfaces are not implemented.
Quota-only routes. WS /v1/audio/voice-conversion (voice-to-voice) has no OpenAI SDK shape — it's a WebSocket with a Quota-defined auth frame. Use the @usequota/core / @usequota/nextjs helpers, or speak the protocol from Audio › V2V.
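
For the JSON-mode limitation above, "parse on your side" usually means tolerating a little prose around the JSON. Here is one possible sketch, assuming the model returns a single JSON object or array somewhere in its reply; `parseJsonReply` is a hypothetical helper, not a Quota or OpenAI API.

```typescript
// Extracts and parses the first JSON object or array found in a completion.
// Throws if the reply is empty, contains no JSON, or the JSON is malformed.
function parseJsonReply(content: string | null): unknown {
  if (!content) throw new Error("empty completion");
  const start = content.search(/[\[{]/);
  const end = Math.max(content.lastIndexOf("}"), content.lastIndexOf("]"));
  if (start === -1 || end < start) {
    throw new Error("no JSON found in completion");
  }
  // Slice from the first opening bracket to the last closing bracket.
  return JSON.parse(content.slice(start, end + 1));
}
```

Typical use is `parseJsonReply(response.choices[0].message.content)` inside a try/catch, retrying the request on failure. The slice is deliberately naive: stray braces in the surrounding prose will make `JSON.parse` throw, so validate the parsed value before trusting it.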

What's next

→ Chat completions reference

Full schema for the OpenAI-shaped endpoint, including streaming chunks, the quota block, and error envelopes.

→ Balance endpoint

Read the wallet's current credit balance from the same key you call chat completions with.

→ Developer vs. user billing

Choose whether your account or your end user pays — and how Quota attributes each request.