Chat completions

OpenAI-compatible chat endpoint that routes each request to any model Quota supports (OpenAI, Anthropic, or Google) and bills it against the wallet tied to the bearer token.

POST https://api.usequota.ai/v1/chat/completions
Drop-in for the OpenAI SDK
Set baseURL: "https://api.usequota.ai/v1" and most code keeps working — including streaming and tool calls. Quota forwards a focused subset of the OpenAI request schema; see OpenAI compatibility for the full list of supported and silently-dropped parameters.

Authorization

Send a Quota-issued bearer token in the Authorization header. There are three valid token types:

sk-quota-… (API key): Server-to-server developer key. Bills the developer's account.
sess_… (session token): Browser/CLI session token from /auth/login. Use for account-management endpoints, not chat.
quota_token_… (OAuth access token): End-user access token issued by /oauth/token. Bills the user's wallet directly.
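
Because the token type is visible in its prefix, a client can catch a misrouted credential before sending a request. The sketch below assumes the three prefixes listed above are exhaustive; classifyToken and canCallChat are hypothetical helper names, not part of any Quota SDK.

```typescript
// Hypothetical helper: infers the token type from the prefixes documented above.
type TokenKind = "api_key" | "session" | "oauth" | "unknown";

function classifyToken(token: string): TokenKind {
  if (token.startsWith("sk-quota-")) return "api_key";
  if (token.startsWith("quota_token_")) return "oauth";
  if (token.startsWith("sess_")) return "session";
  return "unknown";
}

// Session tokens are meant for account-management endpoints, not chat,
// so only API keys and OAuth access tokens pass this check.
function canCallChat(token: string): boolean {
  const kind = classifyToken(token);
  return kind === "api_key" || kind === "oauth";
}
```

A check like this is only a convenience for earlier, clearer failures; the server remains the authority on whether a token is accepted.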

Request body

JSON. The parameters below are forwarded to the upstream provider. Any OpenAI-shaped request fields not listed here are silently dropped; see OpenAI compatibility for the full list.

model (string, required): Provider-prefixed for non-OpenAI models, e.g. anthropic/claude-sonnet-4.6 or google/gemini-2.5-pro. Bare names default to OpenAI. See supported models.
messages (array<Message>, required): Conversation, oldest first. Each message has a role (system, user, assistant, or tool) and a content string or part array.
max_tokens (integer): Hard upper bound on completion length. Forwarded as max_completion_tokens for o-series and GPT-5+ models, which require the newer field.
temperature (number, 0–2): Sampling temperature. Default 1. Ignored by reasoning models.
stream (boolean): When true, returns SSE chunks. A final event, sent after the data: [DONE] sentinel, carries the full quota billing block.
tools (array<Tool>): Tool/function definitions. Routed natively per provider: OpenAI-shape function tools translate to Anthropic and Google tool schemas automatically.
tool_choice ("none" | "auto" | "required" | object): Controls whether the model calls a tool. Pass {"type":"function","function":{"name":"..."}} to force a specific tool.
parallel_tool_calls (boolean): Allow the model to emit multiple tool calls in one turn. Ignored by reasoning models.
reasoning_effort ("low" | "medium" | "high"): Reasoning models only (o-series, GPT-5 reasoning). Trades latency for answer quality.
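
Since unlisted fields are dropped without an error, it can help to detect them on the client before sending. The following is a minimal sketch that assumes the parameter list on this page is complete; FORWARDED_PARAMS and splitRequest are hypothetical names introduced for illustration.

```typescript
// Parameter names transcribed from the list above (assumed complete).
const FORWARDED_PARAMS = [
  "model",
  "messages",
  "max_tokens",
  "temperature",
  "stream",
  "tools",
  "tool_choice",
  "parallel_tool_calls",
  "reasoning_effort",
] as const;

// Hypothetical helper: separates the fields that would be forwarded from the
// names of the fields that would be silently dropped.
function splitRequest(body: Record<string, unknown>): {
  forwarded: Record<string, unknown>;
  dropped: string[];
} {
  const allowed = new Set<string>(FORWARDED_PARAMS);
  const forwarded: Record<string, unknown> = {};
  const dropped: string[] = [];
  for (const [key, value] of Object.entries(body)) {
    if (allowed.has(key)) {
      forwarded[key] = value;
    } else {
      dropped.push(key);
    }
  }
  return { forwarded, dropped };
}
```

Logging the dropped array during development makes it obvious when existing OpenAI code relies on a field that will have no effect here.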

Response

Same envelope as OpenAI, plus a quota block with the actual cost and post-call balance. Always check it in production: a streaming response may be charged slightly more than the reserved amount once usage reconciles.

{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 32,
    "total_tokens": 50
  },
  "quota": {
    "credits_used":   28500,
    "balance_before": 8500000,
    "balance_after":  8471500,
    "ledger_id":      "led_5f3a...",
    "wallet":         "developer",
    "billing_mode":   "developer"
  }
}
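
As a sketch of what checking the quota block might look like, the types below are inferred from the sample response and are assumptions rather than a published schema. The consistency check relies on the identity that holds in the sample (balance_before minus credits_used equals balance_after); isConsistent and isRunningLow are hypothetical helpers.

```typescript
// Field names come from the sample response above; the types are assumed.
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  ledger_id: string;
  wallet: string;
  billing_mode: string;
}

// Hypothetical helper: true when the three balance fields agree with each other.
function isConsistent(q: QuotaBlock): boolean {
  return q.balance_before - q.credits_used === q.balance_after;
}

// Hypothetical helper: true when the remaining balance could not cover
// another call that costs as much as this one did.
function isRunningLow(q: QuotaBlock): boolean {
  return q.balance_after < q.credits_used;
}

// Values copied from the sample response.
const sampleQuota: QuotaBlock = {
  credits_used: 28500,
  balance_before: 8500000,
  balance_after: 8471500,
  ledger_id: "led_5f3a...",
  wallet: "developer",
  billing_mode: "developer",
};
```

With the sample values, 8500000 - 28500 = 8471500, so the block is consistent and the balance is far from running low.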

Error responses

All errors return a JSON body of shape { "error": { "code", "message" } }.

invalid_api_key (401): Token missing, revoked, or for the wrong environment.
insufficient_credits (402): Reservation exceeds balance. Body includes required_credits and current balance.
model_not_allowed (403): Your plan or OAuth scope does not include that model.
rate_limit_exceeded (429): Default 100 req/min per key. This is not a hard cap; higher per-key limits are available on request.
provider_unavailable (503): Upstream provider failure. Quota retries idempotently for non-streaming calls before surfacing the error.
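
One way a client might act on these codes is sketched below. The error envelope follows the shape documented above; the mapping from code to action and the backoff constants are assumptions chosen for illustration, not Quota recommendations, and nextAction and backoffMs are hypothetical names.

```typescript
// Envelope shape documented above.
interface QuotaErrorBody {
  error: { code: string; message: string };
}

type Action = "retry" | "add_credits" | "fix_credentials" | "change_model" | "give_up";

// Hypothetical policy: maps a documented error code to a possible next step.
function nextAction(body: QuotaErrorBody): Action {
  switch (body.error.code) {
    case "rate_limit_exceeded":
    case "provider_unavailable":
      return "retry";
    case "insufficient_credits":
      return "add_credits";
    case "invalid_api_key":
      return "fix_credentials";
    case "model_not_allowed":
      return "change_model";
    default:
      return "give_up";
  }
}

// Exponential backoff with a cap; the base and cap values are arbitrary choices.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}
```

Because Quota already retries non-streaming calls before returning provider_unavailable, a client-side retry for that code is probably best kept to a small number of attempts.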

Supported models

Use the canonical Quota model name. Provider routing is determined by the prefix; OpenAI is the default when there is no prefix.

OpenAI (no prefix): gpt-4o, gpt-4o-mini, gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, o1, o3.
Anthropic (anthropic/ prefix): anthropic/claude-opus-4.6, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.5, anthropic/claude-haiku-4.5, anthropic/claude-opus-4.1, anthropic/claude-3.7-sonnet.
Google (google/ prefix): google/gemini-2.5-pro, google/gemini-2.5-flash, google/gemini-2.0-flash, google/gemini-2.0-flash-lite.
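
The prefix rule can be mirrored on the client, for example to label usage by provider in logs. This is a sketch of the rule as described above, not Quota's routing code; providerFor is a hypothetical helper, and how Quota treats an unrecognized prefix is not stated here, so the fallback to OpenAI for anything else is an assumption.

```typescript
type Provider = "openai" | "anthropic" | "google";

// Hypothetical helper: derives the provider from the model name prefix.
// Names without a recognized prefix are treated as OpenAI (assumed fallback).
function providerFor(model: string): Provider {
  if (model.startsWith("anthropic/")) return "anthropic";
  if (model.startsWith("google/")) return "google";
  return "openai";
}
```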

Examples

Streaming

Stream tokens with Server-Sent Events. The OpenAI envelope ends with a data: [DONE] sentinel, then Quota emits one final event with the billing block.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1",
  apiKey: process.env.QUOTA_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Stream me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Final SSE event after [DONE]:
//   data: {"quota":{"credits_used":...,"balance_after":...,...}}
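
An SDK stream iterator may stop at the [DONE] sentinel and never surface the billing event, so reading the raw SSE text is one way to capture it. The sketch below assumes each event is a single data: line and that the billing event has a top-level quota key, as in the comment above; parseSse and StreamResult are hypothetical names.

```typescript
interface StreamResult {
  text: string;
  quota: Record<string, unknown> | null;
}

// Hypothetical helper: collects streamed content and the trailing billing
// event from raw SSE text.
function parseSse(raw: string): StreamResult {
  let text = "";
  let quota: Record<string, unknown> | null = null;
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue; // skip blank lines and other fields
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "[DONE]") continue; // keep reading: the billing event follows
    const event = JSON.parse(payload);
    if (event.quota) {
      quota = event.quota; // assumed shape: {"quota": {...}}
    } else {
      text += event.choices?.[0]?.delta?.content ?? "";
    }
  }
  return { text, quota };
}
```

In a real client the raw text would come from the response body of a fetch call with stream: true. Network chunks can split a line in two, so incoming data should be buffered and only complete lines passed to a parser like this one.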

Tool calling

One OpenAI-shape tools array works across providers. Quota translates to Anthropic and Google tool schemas on the way out.

// Reuses the client configured in the streaming example.
const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  }],
  tool_choice: "auto",
});
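
After a request like the one above, the assistant message may contain tool calls that the application has to execute and answer. The sketch below assumes Quota returns them in the standard OpenAI tool_calls shape regardless of provider, which follows from the "same envelope as OpenAI" statement but is not spelled out here. The handler body and its temp_c value are made up for illustration.

```typescript
// OpenAI-shape tool call and messages (assumed to apply to all providers).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}
interface AssistantMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: ToolCall[];
}
interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

// Hypothetical local implementations, keyed by tool name.
const handlers = new Map<string, (args: { city: string }) => string>([
  ["get_weather", ({ city }) => JSON.stringify({ city, temp_c: 21 })],
]);

// Runs every requested tool and returns one tool message per call.
function runToolCalls(message: AssistantMessage): ToolMessage[] {
  return (message.tool_calls ?? []).map((call): ToolMessage => {
    const handler = handlers.get(call.function.name);
    const content = handler
      ? handler(JSON.parse(call.function.arguments))
      : JSON.stringify({ error: `unknown tool ${call.function.name}` });
    return { role: "tool", tool_call_id: call.id, content };
  });
}
```

To continue the conversation, append the assistant message and the returned tool messages to the messages array and send a second request with the same tools definition.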

User-billing (OAuth token)

When the bearer is a quota_token_… OAuth token, the request bills the end-user's wallet instead of the developer's. The response's quota.wallet field will be "oauth_user" (and the legacy billing_mode field "user"). Branch on wallet — it's unambiguous.

// session.quotaToken is the user's OAuth token, stored by your app after the OAuth callback.
const response = await fetch("https://api.usequota.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${session.quotaToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Summarize this" }],
  }),
});
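
Once the response body is parsed, branching on quota.wallet could look like the sketch below. It handles only the two values that appear on this page ("developer" in the sample response and "oauth_user" for OAuth tokens) and treats anything else as an error; payerFor is a hypothetical helper.

```typescript
type Payer = "developer" | "end_user";

// Hypothetical helper: maps the documented quota.wallet values to who paid.
function payerFor(quota: { wallet: string }): Payer {
  switch (quota.wallet) {
    case "developer":
      return "developer";
    case "oauth_user":
      return "end_user";
    default:
      // Other values are not documented on this page.
      throw new Error(`Unrecognized quota.wallet value: ${quota.wallet}`);
  }
}
```

With the fetch call above, this would be applied to the quota field of the object returned by await response.json().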