Chat completions
OpenAI-compatible chat endpoint that routes each request to any model Quota supports (OpenAI, Anthropic, or Google) and bills it against the wallet tied to the bearer token.
Authorization
Send a Quota-issued bearer token in the Authorization header. Quota issues three token types, and only API keys and OAuth access tokens are meant for chat:
| Prefix | Token type | Use |
| --- | --- | --- |
| sk-quota-… | API key | Server-to-server developer key. Bills the developer's account. |
| sess_… | Session token | Browser/CLI session token from /auth/login. Use it for account-management endpoints, not chat. |
| quota_token_… | OAuth access token | End-user access token issued by /oauth/token. Bills the user's wallet directly. |
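Each token type has a distinct prefix, so a client can reject a session token before the request goes out. The sketch below assumes prefix checks are enough to tell the types apart. classifyQuotaToken and assertChatToken are hypothetical helpers, not part of any Quota SDK. The expected wallet values are the ones this page documents for the response's quota.wallet field.

```typescript
// Hypothetical helper: classify a Quota bearer token by its documented prefix.
type QuotaTokenKind = "api_key" | "session" | "oauth_access";

interface TokenInfo {
  kind: QuotaTokenKind;
  usableForChat: boolean;
  // Value expected in the response's quota.wallet (null: not for chat).
  expectedWallet: "developer" | "oauth_user" | null;
}

function classifyQuotaToken(token: string): TokenInfo | null {
  if (token.startsWith("sk-quota-")) {
    return { kind: "api_key", usableForChat: true, expectedWallet: "developer" };
  }
  if (token.startsWith("quota_token_")) {
    return { kind: "oauth_access", usableForChat: true, expectedWallet: "oauth_user" };
  }
  if (token.startsWith("sess_")) {
    // Session tokens are for account-management endpoints, not chat.
    return { kind: "session", usableForChat: false, expectedWallet: null };
  }
  return null; // Not a recognized Quota token prefix.
}

// Fail fast instead of sending a request with the wrong kind of token.
function assertChatToken(token: string): void {
  const info = classifyQuotaToken(token);
  if (!info || !info.usableForChat) {
    throw new Error("Use an sk-quota- API key or a quota_token_ OAuth token for chat.");
  }
}
```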
Request body
JSON. The parameters below are forwarded to the upstream provider. Any OpenAI-shaped request fields not listed here are silently dropped — see OpenAI compatibility for the full list.
| Parameter | Type | Description |
| --- | --- | --- |
| model | string, required | Quota model name. Non-OpenAI models take a provider prefix, e.g. anthropic/claude-sonnet-4.6 or google/gemini-2.5-pro. Bare names default to OpenAI. See Supported models. |
| messages | array<Message>, required | Conversation, oldest first. Each message has a role (system, user, assistant, or tool) and content given as a string or an array of content parts. |
| max_tokens | integer | Hard upper bound on completion length. Forwarded as max_completion_tokens for o-series and GPT-5+ models, which require the newer field. |
| temperature | number, 0–2 | Sampling temperature. Default 1. Ignored by reasoning models. |
| stream | boolean | When true, the response is a stream of SSE chunks. The final event, sent after data: [DONE], carries the full quota billing block. |
| tools | array<Tool> | Tool/function definitions. Routed natively per provider: OpenAI-shape function tools are translated to Anthropic and Google tool schemas automatically. |
| tool_choice | "none", "auto", "required", or object | Controls whether the model calls a tool. Pass {"type":"function","function":{"name":"..."}} to force a specific tool. |
| parallel_tool_calls | boolean | Allow the model to emit multiple tool calls in one turn. Ignored by reasoning models. |
| reasoning_effort | "low", "medium", or "high" | Reasoning models only (o-series, GPT-5 reasoning). Trades latency for answer quality. |
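The max_tokens forwarding rule can be expressed as a small function over the model name. Quota does this rename server-side, so you don't need it in your client. The sketch only illustrates the documented behavior and is not Quota's code. The regexes that decide what counts as "o-series" and "GPT-5+" are assumptions. Prefixed models return null because this page doesn't say which field other providers receive.

```typescript
type TokenLimitField = "max_tokens" | "max_completion_tokens";

// Illustrative version of the documented rule for bare (OpenAI) model names.
function openAiTokenLimitField(model: string): TokenLimitField | null {
  // anthropic/... and google/... models go to other providers (not covered here).
  if (model.includes("/")) return null;
  // o-series reasoning models: o1, o3, ...
  if (/^o\d/.test(model)) return "max_completion_tokens";
  // GPT-5 and later: gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, ...
  const match = /^gpt-(\d+)/.exec(model);
  if (match && Number(match[1]) >= 5) return "max_completion_tokens";
  return "max_tokens"; // e.g. gpt-4o, gpt-4o-mini
}

type ChatBody = { model: string; max_tokens?: number } & Record<string, unknown>;

// Rename max_tokens only when the target model requires the newer field.
function withUpstreamTokenLimit(body: ChatBody): Record<string, unknown> {
  if (body.max_tokens === undefined) return { ...body };
  if (openAiTokenLimitField(body.model) !== "max_completion_tokens") return { ...body };
  const { max_tokens, ...rest } = body;
  return { ...rest, max_completion_tokens: max_tokens };
}
```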
Response
The envelope matches OpenAI's, plus a quota block with the actual cost and the post-call balance. Always check it in production: once usage is reconciled, a streaming response may cost slightly more than the amount reserved.
```json
{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 32,
    "total_tokens": 50
  },
  "quota": {
    "credits_used": 28500,
    "balance_before": 8500000,
    "balance_after": 8471500,
    "ledger_id": "led_5f3a...",
    "wallet": "developer",
    "billing_mode": "developer"
  }
}
```
Error responses
All errors return a JSON body of the form { "error": { "code": "...", "message": "..." } }.
| Code | HTTP status | Meaning |
| --- | --- | --- |
| invalid_api_key | 401 | Token missing, revoked, or issued for the wrong environment. |
| insufficient_credits | 402 | The credit reservation exceeds the wallet balance. The body includes required_credits and the current balance. |
| model_not_allowed | 403 | Your plan or OAuth scope does not include the requested model. |
| rate_limit_exceeded | 429 | The default limit is 100 requests/min per key. It is a default, not a fixed cap: higher per-key limits are available on request. |
| provider_unavailable | 503 | The upstream provider failed. For non-streaming calls, Quota retries idempotently before returning this error. |
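Because a success response and an error response have different shapes, a client usually needs one place that handles both. The sketch below is one way to do that. QuotaBlock mirrors the example response above. The QuotaResult type, the interpretQuotaResponse name, and the choice of which codes count as retryable are assumptions, not part of the API.

```typescript
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  ledger_id: string;
  wallet: string; // "developer" or "oauth_user"
  billing_mode: string; // legacy field; branch on wallet instead
}

interface QuotaError {
  code: string;
  message: string;
}

type QuotaResult =
  | { ok: true; quota: QuotaBlock; body: any }
  | { ok: false; status: number; error: QuotaError; retryable: boolean };

// 429 and 503 may succeed on a later attempt. Quota already retries 503s for
// non-streaming calls, so a client-side retry there is a last resort.
const RETRYABLE_CODES = new Set(["rate_limit_exceeded", "provider_unavailable"]);

// Interpret a parsed non-streaming response body.
function interpretQuotaResponse(status: number, body: any): QuotaResult {
  if (status >= 200 && status < 300) {
    // Non-streaming successes carry a quota block; treat its absence as a bug.
    if (!body?.quota) throw new Error("Quota response is missing the quota block");
    return { ok: true, quota: body.quota as QuotaBlock, body };
  }
  const error: QuotaError = body?.error ?? { code: "unknown", message: `HTTP ${status}` };
  return { ok: false, status, error, retryable: RETRYABLE_CODES.has(error.code) };
}
```

A 402 is not retryable as-is: the caller has to top up, or pick a cheaper model, before trying again.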
Supported models
Use the canonical Quota model name. Provider routing is determined by the prefix; OpenAI is the default when there is no prefix.
| Provider | Prefix | Models |
| --- | --- | --- |
| OpenAI | none | gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, o1, o3. |
| Anthropic | anthropic/ | anthropic/claude-opus-4.6, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.5, anthropic/claude-haiku-4.5, anthropic/claude-opus-4.1, anthropic/claude-3.7-sonnet. |
| Google | google/ | google/gemini-2.5-pro, google/gemini-2.5-flash, google/gemini-2.0-flash, google/gemini-2.0-flash-lite. |
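The routing rule fits in a few lines. In this sketch, routeModel and its return shape are hypothetical. name is the Quota model name with the prefix removed, and it may not match the provider's own model ID.

```typescript
type Provider = "openai" | "anthropic" | "google";

// Illustrative prefix routing: a bare name means OpenAI.
function routeModel(model: string): { provider: Provider; name: string } {
  const slash = model.indexOf("/");
  if (slash === -1) return { provider: "openai", name: model };
  const prefix = model.slice(0, slash);
  const name = model.slice(slash + 1);
  if (prefix === "anthropic" || prefix === "google") {
    return { provider: prefix, name };
  }
  throw new Error(`Unknown provider prefix "${prefix}" in model "${model}"`);
}
```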
Examples
Streaming
Stream tokens as Server-Sent Events. The OpenAI-format stream ends with a data: [DONE] sentinel. Quota then emits one final event carrying the billing block.
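A client that stops reading at [DONE] never sees the billing event. The sketch below is an incremental parser that keeps that event. It assumes "\n" line endings, and it assumes the post-[DONE] event is a JSON object with a "quota" key, as described above. QuotaStreamParser and its fields are hypothetical names.

```typescript
class QuotaStreamParser {
  private buffer = "";
  private done = false;
  readonly chunks: unknown[] = []; // OpenAI-format chunks received before [DONE]
  quota: Record<string, unknown> | null = null; // billing block received after [DONE]

  // Feed decoded text as it arrives; one event may span several calls.
  feed(text: string): void {
    this.buffer += text;
    // A blank line ends each SSE event.
    let sep = this.buffer.indexOf("\n\n");
    while (sep !== -1) {
      const rawEvent = this.buffer.slice(0, sep);
      this.buffer = this.buffer.slice(sep + 2);
      this.handleEvent(rawEvent);
      sep = this.buffer.indexOf("\n\n");
    }
  }

  private handleEvent(rawEvent: string): void {
    // Collect the event's data: lines, dropping one optional leading space.
    const data = rawEvent
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") return; // comments or keep-alives
    if (data === "[DONE]") {
      this.done = true;
      return;
    }
    const payload = JSON.parse(data);
    if (!this.done) {
      this.chunks.push(payload);
    } else if (payload !== null && typeof payload === "object" && "quota" in payload) {
      this.quota = payload.quota;
    }
  }
}
```

With fetch, decode each piece of response.body with a TextDecoder (passing { stream: true }) and hand the resulting text to feed. Once the stream closes, quota holds the billing block if Quota sent one.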
```typescript
import OpenAI from "openai";

// Point the official OpenAI SDK at Quota's OpenAI-compatible endpoint.
const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1",
  apiKey: process.env.QUOTA_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Stream me a story." }],
  stream: true,
});

// Print each content delta as it arrives.
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

// Final SSE event after [DONE]:
// data: {"quota":{"credits_used":...,"balance_after":...,...}}
// The SDK's stream iterator stops at [DONE], so read the raw SSE stream to capture it.
```
Tool calling
One OpenAI-shape tools array works across providers. Quota translates to Anthropic and Google tool schemas on the way out.
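The request below sends only the tool definitions. When the model decides to call a tool, you run the tool yourself and send the result back in a tool-role message. The sketch below covers that second half. It assumes Quota returns tool calls in the OpenAI shape (message.tool_calls, each with a JSON-string arguments field) for every provider. toolResultMessages and the handler map are hypothetical.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON string
}

interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}

type ToolMessage = { role: "tool"; tool_call_id: string; content: string };
type ToolHandler = (args: Record<string, unknown>) => unknown;

// Run each requested tool and build the tool-role messages to send back.
function toolResultMessages(
  message: AssistantMessage,
  handlers: Record<string, ToolHandler>,
): ToolMessage[] {
  return (message.tool_calls ?? []).map((call): ToolMessage => {
    const handler = handlers[call.function.name];
    const result = handler
      ? handler(JSON.parse(call.function.arguments))
      : { error: `Unknown tool: ${call.function.name}` };
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}
```

Append the assistant message and the returned tool messages to messages. Then call the endpoint again with the same tools so the model can write its final answer.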
```typescript
// Reuses the Quota-configured client from the streaming example.
const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  }],
  tool_choice: "auto",
});

// If the model chose to call get_weather, the calls appear here.
console.log(response.choices[0].message.tool_calls);
```
User-billing (OAuth token)
When the bearer is a quota_token_… OAuth token, the request bills the end user's wallet instead of the developer's. The response's quota.wallet field is "oauth_user", and the legacy billing_mode field is "user". Branch on wallet, which is unambiguous.
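```typescript
// The user's token, fetched after the OAuth callback.
// session stands in for wherever your app stores it.
const response = await fetch("https://api.usequota.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${session.quotaToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Summarize this" }],
  }),
});

// Errors also come back as JSON: { "error": { "code", "message" } }.
const data = await response.json();
if (!response.ok) throw new Error(data.error?.message ?? `HTTP ${response.status}`);
// "oauth_user" confirms that the end user's wallet was charged.
console.log(data.quota.wallet, data.quota.balance_after);
```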
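The advice to branch on wallet can be wrapped in a small helper. The two wallet values come from this page. describeCharge and its message wording are hypothetical. The choice to show a balance only to the wallet's owner is a suggested design, not a Quota requirement.

```typescript
type Wallet = "developer" | "oauth_user";

interface ChargeSummary {
  wallet: Wallet;
  credits_used: number;
  balance_after: number;
}

function describeCharge(quota: ChargeSummary): string {
  switch (quota.wallet) {
    case "oauth_user":
      // The end user paid, so their remaining balance is theirs to see.
      return `Used ${quota.credits_used} credits from your wallet; ${quota.balance_after} remaining.`;
    case "developer":
      // The developer account paid; keep its balance out of end-user UI.
      return `Used ${quota.credits_used} credits from the developer account.`;
    default:
      throw new Error(`Unexpected wallet: ${String(quota.wallet)}`);
  }
}
```

With the fetch example, this becomes describeCharge(data.quota) once the response has been parsed.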