OpenAI compatibility
Quota is a credit wallet for AI usage: one API key, every major model (OpenAI / Anthropic / Google), and built-in billing so you can charge end-users for their usage instead of eating the bill yourself. The wire format is OpenAI-shaped, so the easiest way in is to point your existing OpenAI SDK at Quota: same endpoint path, same request body, same streaming envelope, and same error structure, plus a quota block on every response. The parameters Quota actually forwards to the upstream provider are listed below.
Point your OpenAI SDK at Quota
Two lines change. Everything else — model names that start with gpt-, the chat.completions.create call, streaming, tool calling — stays exactly the same.
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1", // <- changed
  apiKey: process.env.QUOTA_API_KEY, // <- changed
});
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(response.choices[0].message.content);
console.log(response.quota); // { credits_used, balance_after, ... }
Provider-prefixed models
Through the same endpoint you can also call Anthropic and Google models by prefixing the model name. No extra SDKs or API keys — Quota translates the request into each provider's native shape and translates the response back.
// Same OpenAI SDK, same call, just a prefixed model.
const claude = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Summarize this" }],
});
Supported parameters
Quota forwards a defined set of OpenAI chat-completions parameters to the upstream provider. Anything outside that set is silently dropped today — the request still succeeds and the response still comes back, but the parameter has no effect. We'd rather you know up front than debug it in production.
| Parameter | Forwarded? | Notes |
|---|---|---|
| model | Yes | Required. Bare names route to OpenAI; anthropic/… and google/… prefixes route to those providers. |
| messages | Yes | Required. Translated per provider — Anthropic system messages get hoisted to the top-level system field, Google to systemInstruction. |
| max_tokens | Yes | Forwarded as max_tokens, or auto-rewritten to max_completion_tokens for o-series and GPT-5+ reasoning models. |
| max_completion_tokens | Yes | Accepted as an alias for max_tokens. Same effect. |
| temperature | Yes | Forwarded for non-reasoning models. Reasoning models (o-series, GPT-5+) ignore sampling parameters per OpenAI's own contract — Quota drops it for those models to avoid an upstream error. |
| stream | Yes | SSE chunks come back in the same shape as OpenAI. The final chunk also carries a quota billing block. |
| tools | Yes | OpenAI tool schema is the canonical shape. Anthropic and Google requests are translated automatically. |
| tool_choice | Yes | auto, none, required, and named function selection are all forwarded. |
| parallel_tool_calls | Yes | Forwarded to OpenAI for non-reasoning models. Reasoning models don't accept it; Quota drops it for them. |
| reasoning_effort | Partial | Forwarded only for reasoning models (o-series, GPT-5+). Silently dropped for chat models that don't support it. |
| top_p | Dropped | Not forwarded today. Use temperature. |
| n | Dropped | Not forwarded. Quota always returns a single choice. Make multiple requests if you need multiple completions. |
| stop | Dropped | Stop sequences are not forwarded today. |
| seed | Dropped | Not forwarded. Determinism is provider-side only. |
| response_format | Dropped | JSON mode and structured outputs are not forwarded today. Ask for JSON in the prompt and parse the response yourself. |
| frequency_penalty | Dropped | Not forwarded. |
| presence_penalty | Dropped | Not forwarded. |
| logit_bias | Dropped | Not forwarded. |
| logprobs | Dropped | Not forwarded. Token-level probabilities aren't surfaced today. |
| top_logprobs | Dropped | Not forwarded (depends on logprobs). |
| user | Dropped | Not forwarded. For per-user attribution, pass the user's OAuth access token (quota_token_…) as the bearer — Quota bills the user's wallet directly. |
| store | Dropped | OpenAI's conversation-storage flag is not forwarded. |
| metadata | Dropped | Not forwarded today. |
| modalities | Dropped | Audio/image output modalities are not forwarded. Quota's chat endpoint is text + tool calls today. |
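Because dropped parameters fail silently, a small client-side check can flag them before a request goes out. The following is a minimal sketch, not part of any Quota SDK: `FORWARDED_PARAMS` and `findDroppedParams` are hypothetical names, and the set is copied by hand from the table above, so it goes stale if the table changes. It does not model the per-model cases (`temperature`, `parallel_tool_calls`, and `reasoning_effort` are treated as forwarded regardless of model).

```typescript
// Parameters the table above marks as forwarded ("Yes" or "Partial").
// Assumption: copied by hand from this page, not fetched from Quota.
const FORWARDED_PARAMS = new Set<string>([
  "model",
  "messages",
  "max_tokens",
  "max_completion_tokens",
  "temperature",
  "stream",
  "tools",
  "tool_choice",
  "parallel_tool_calls",
  "reasoning_effort",
]);

// Hypothetical helper: returns the keys of a request body that are not in
// the forwarded set, i.e. the ones that would have no effect upstream.
function findDroppedParams(body: Record<string, unknown>): string[] {
  return Object.keys(body).filter((key) => !FORWARDED_PARAMS.has(key));
}

const dropped = findDroppedParams({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello" }],
  top_p: 0.9,
  response_format: { type: "json_object" },
});

if (dropped.length > 0) {
  // Prints: Quota will ignore: top_p, response_format
  console.warn(`Quota will ignore: ${dropped.join(", ")}`);
}
```

Running a check like this in development or CI is one way to catch a stray parameter early; whether to warn or throw is up to you.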
Response shape
The same envelope OpenAI returns, plus a quota block with the actual cost, post-call balance, and billing mode. Always check it in production — streaming responses can charge slightly more than the pre-reservation once token usage is finalised.
{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello. How can I help?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  },
  "quota": {
    "credits_used": 18500,
    "balance_before": 8500000,
    "balance_after": 8481500,
    "billing_mode": "developer",
    "reservation_id": "rsv_5f3a..."
  }
}
Streaming
Streaming uses Server-Sent Events in the OpenAI shape: data: {...}\n\n chunks terminated by data: [DONE]. Existing OpenAI SDK iterators work without changes. The only Quota-specific addition is that the final chunk carries the quota billing block alongside usage. See the Chat completions reference for the full chunk schema and a streaming example.
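To make the "final chunk carries the quota block" point concrete, here is a minimal sketch of accumulating a stream while keeping the billing block. The `QuotaBlock` fields mirror the non-streaming response sample on this page; the authoritative chunk schema is in the Chat completions reference, so treat these types as assumptions. `ChunkLike`, `StreamState`, and `applyChunk` are hypothetical names, and the sample chunks (including `rsv_example`) are made-up placeholders.

```typescript
// Assumed shape of the quota block, taken from the response sample on this page.
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  billing_mode: string;
  reservation_id: string;
}

// Structural view of a streamed chunk: only the fields this sketch reads.
interface ChunkLike {
  choices: { delta?: { content?: string | null } }[];
  quota?: QuotaBlock;
}

interface StreamState {
  text: string;
  quota?: QuotaBlock;
}

// Pure reducer: appends any text delta and remembers the latest quota block.
function applyChunk(state: StreamState, chunk: ChunkLike): StreamState {
  const delta = chunk.choices[0]?.delta?.content ?? "";
  return { text: state.text + delta, quota: chunk.quota ?? state.quota };
}

// Made-up chunks standing in for a real stream.
const sampleChunks: ChunkLike[] = [
  { choices: [{ delta: { content: "Hel" } }] },
  { choices: [{ delta: { content: "lo" } }] },
  {
    choices: [],
    quota: {
      credits_used: 18500,
      balance_before: 8500000,
      balance_after: 8481500,
      billing_mode: "developer",
      reservation_id: "rsv_example", // placeholder, not a real reservation
    },
  },
];

const finalState = sampleChunks.reduce<StreamState>(applyChunk, { text: "" });
console.log(finalState.text, finalState.quota?.credits_used); // Hello 18500
```

With the OpenAI SDK you would call the same reducer inside `for await (const chunk of stream)`. The SDK's chunk type does not declare a `quota` field, so a cast to a local type such as `ChunkLike` is likely needed.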
Limitations and known differences
- Parameter coverage is the table above. Anything not listed as forwarded is dropped silently today. We don't reject the request, so a stray `top_p` or `response_format` won't blow up, but it also won't do anything.
- Single completion per request. `n` isn't forwarded. If you need k samples, fan out k requests.
- No native JSON mode yet. `response_format` isn't forwarded. Prompt for JSON and parse on your side.
- Reasoning models drop sampling. For o-series and GPT-5+ models, Quota strips `temperature` and `parallel_tool_calls` before forwarding, same as if you called OpenAI directly with a reasoning model.
- Per-user attribution uses the bearer, not the `user` field. The OpenAI `user` parameter is dropped. To bill a specific end user's wallet, pass their OAuth access token (`quota_token_…`) as the `Authorization` bearer instead of your API key; see user billing via OAuth.
- Endpoints surface. Quota implements `/v1/chat/completions`, `/v1/audio/speech` (TTS), and `/v1/audio/transcriptions` (STT), all OpenAI-shaped and all drop-in for the OpenAI SDK with a swapped `baseURL`. See Audio for the per-route reference. The `/v1/responses`, `/v1/embeddings`, `/v1/images/*`, and `/v1/files` surfaces are not implemented.
- Quota-only routes. `WS /v1/audio/voice-conversion` (voice-to-voice) has no OpenAI SDK shape; it's a WebSocket with a Quota-defined auth frame. Use the `@usequota/core` / `@usequota/nextjs` helpers, or speak the protocol from Audio › V2V.
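For the "No native JSON mode yet" limitation above, "parse on your side" usually means tolerating a little noise around the JSON. The following is a minimal sketch under one assumption: models sometimes wrap JSON in a Markdown code fence, so the helper unwraps a fenced block if it finds one before parsing. `parseJsonReply` and `JsonResult` are hypothetical names, not part of any Quota or OpenAI SDK.

```typescript
// Result type so callers handle bad output explicitly instead of catching.
type JsonResult =
  | { ok: true; value: unknown }
  | { ok: false; error: string };

// Three backticks, built at runtime so this sample can itself sit in a fence.
const FENCE = "`".repeat(3);
// Matches the first fenced block, with an optional "json" language tag.
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Hypothetical helper: parses an assistant reply that was prompted to be JSON.
function parseJsonReply(content: string): JsonResult {
  const match = FENCED_BLOCK.exec(content);
  // Use the fenced body if present, otherwise the whole reply.
  const candidate = (match?.[1] ?? content).trim();
  try {
    return { ok: true, value: JSON.parse(candidate) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

const parsed = parseJsonReply('{"city": "Paris", "population_millions": 2.1}');
if (parsed.ok) {
  console.log(parsed.value);
} else {
  console.error("Model did not return valid JSON:", parsed.error);
}
```

The helper only checks that the reply is syntactically valid JSON. Validating the parsed value against the shape you asked for is still your job, and retrying the request on `ok: false` is a reasonable fallback.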