Chat completions

OpenAI-compatible chat endpoint that routes each request to any model Quota supports (OpenAI, Anthropic, or Google) and bills it against the wallet tied to the bearer token.

POST https://api.usequota.ai/v1/chat/completions

Drop-in for the OpenAI SDK

Set baseURL: "https://api.usequota.ai/v1" and most code keeps working — including streaming and tool calls. Quota forwards a focused subset of the OpenAI request schema; see OpenAI compatibility for the full list of supported and silently-dropped parameters.

Authorization

Send a Quota-issued bearer token in the Authorization header. Quota issues three token types; only the API key and the OAuth access token are meant for this endpoint:

sk-quota-…	API key	Server-to-server developer key. Bills the developer's account.
sess_…	Session token	Browser/CLI session token from /auth/login. Use for account-management endpoints, not chat.
quota_token_…	OAuth access token	End-user access token issued by /oauth/token. Bills the user's wallet directly.
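
A minimal sketch of telling the token types apart by prefix before sending a request. The prefixes come from the table above; the helper name `classifyToken` and the labels it returns are illustrative and not part of any Quota SDK.

```typescript
// Hypothetical helper: classify a bearer token by the prefixes listed above.
type TokenKind = "api_key" | "session" | "oauth" | "unknown";

interface TokenInfo {
  kind: TokenKind;
  // True for the token types the table describes as billing a wallet.
  usableForChat: boolean;
}

function classifyToken(token: string): TokenInfo {
  if (token.startsWith("sk-quota-")) return { kind: "api_key", usableForChat: true };
  if (token.startsWith("quota_token_")) return { kind: "oauth", usableForChat: true };
  if (token.startsWith("sess_")) return { kind: "session", usableForChat: false };
  return { kind: "unknown", usableForChat: false };
}
```

A check like this can fail fast in client code when a session token is passed to the chat endpoint by mistake.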

Request body

JSON. The parameters below are forwarded to the upstream provider. Any OpenAI-shaped request fields not listed here are silently dropped — see OpenAI compatibility for the full list.

model	string, required	Provider-prefixed for non-OpenAI models: `anthropic/claude-sonnet-4.6`, `google/gemini-2.5-pro`. Bare names default to OpenAI. See supported models.
messages	array<Message>, required	Conversation, oldest first. Each message has a `role` (`system`, `user`, `assistant`, or `tool`) and a `content` string or part array.
max_tokens	integer	Hard upper bound on completion length. Forwarded as `max_completion_tokens` for o-series and GPT-5+ models, which require the newer field.
temperature	number, 0–2	Sampling temperature. Default 1. Ignored by reasoning models.
stream	boolean	When `true`, returns SSE chunks. The final chunk carries the full `quota` billing block.
tools	array<Tool>	Tool/function definitions. Routed natively per provider: OpenAI-shape `function` tools translate to Anthropic and Google tool schemas automatically.
tool_choice	"none" | "auto" | "required" | object	Controls whether the model calls a tool. Pass `{"type":"function","function":{"name":"..."}}` to force a specific tool.
parallel_tool_calls	boolean	Allow the model to emit multiple tool calls in one turn. Ignored by reasoning models.
reasoning_effort	"low" | "medium" | "high"	Reasoning models only (o-series, GPT-5 reasoning). Trades latency for answer quality.
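
Because unlisted fields are dropped without an error, it can help to flag them on the client. The sketch below assumes the table above is the complete forwarded set; `findDroppedFields` is a hypothetical helper, and the OpenAI compatibility page remains the authoritative list.

```typescript
// Field names copied from the parameter table above; anything else is
// assumed to be dropped before the request reaches the provider.
const FORWARDED_FIELDS = new Set<string>([
  "model", "messages", "max_tokens", "temperature", "stream",
  "tools", "tool_choice", "parallel_tool_calls", "reasoning_effort",
]);

// Hypothetical helper: list the keys of a request body that are not forwarded.
function findDroppedFields(body: Record<string, unknown>): string[] {
  return Object.keys(body).filter((key) => !FORWARDED_FIELDS.has(key));
}

const dropped = findDroppedFields({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hi" }],
  top_p: 0.9,
  seed: 42,
});
// dropped is ["top_p", "seed"]
```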

Response

Same envelope as OpenAI, plus a quota block with the actual cost and post-call balance. Always check it in production — streaming responses may charge slightly more than reserved once usage reconciles.

{
  "id": "chatcmpl-9f3a2b1c",
  "object": "chat.completion",
  "created": 1746823412,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a haiku..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 32,
    "total_tokens": 50
  },
  "quota": {
    "credits_used":   28500,
    "balance_before": 8500000,
    "balance_after":  8471500,
    "ledger_id":      "led_5f3a...",
    "wallet":         "developer",
    "billing_mode":   "developer"
  }
}
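
One way to check the billing block is to type it and verify its arithmetic. This sketch assumes that `balance_after` always equals `balance_before - credits_used`, which holds for the sample above but is not stated as a guarantee; `QuotaBlock` and `quotaIsConsistent` are illustrative names.

```typescript
// Shape of the `quota` block as it appears in the sample response above.
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  ledger_id: string;
  wallet: string;
  billing_mode: string;
}

// Hypothetical check: does the reported balance change match the charge?
function quotaIsConsistent(q: QuotaBlock): boolean {
  return q.balance_before - q.credits_used === q.balance_after;
}

const sampleQuota: QuotaBlock = {
  credits_used: 28500,
  balance_before: 8500000,
  balance_after: 8471500,
  ledger_id: "led_5f3a...",
  wallet: "developer",
  billing_mode: "developer",
};
// quotaIsConsistent(sampleQuota) is true: 8500000 - 28500 = 8471500
```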

Error responses

All errors return a JSON body of shape { "error": { "code", "message" } }.

invalid_api_key	401	Token missing, revoked, or for the wrong environment.
insufficient_credits	402	Reservation exceeds balance. Body includes `required_credits` and current `balance`.
model_not_allowed	403	Your plan or OAuth scope does not include that model.
rate_limit_exceeded	429	Default 100 req/min per key (not a hard cap). Higher per-key limits available on request.
provider_unavailable	503	Upstream provider failure. Quota retries idempotently for non-streaming calls before surfacing the error.
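
A sketch of client-side handling for the codes above. The retry policy (retry only 429 and 503, with capped exponential backoff) is an assumption and not documented Quota behavior; the helper names and delay values are illustrative.

```typescript
// Error body shape as described above.
interface QuotaErrorBody {
  error: { code: string; message: string };
}

// Assumed policy: 429 and 503 may succeed later; 401, 402, and 403 need a fix first.
function isRetryable(status: number): boolean {
  return status === 429 || status === 503;
}

// Capped exponential backoff in milliseconds. Attempt numbers start at 0.
function backoffMs(attempt: number, baseMs = 500, capMs = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Build a one-line log message from the status and the error body.
function describeError(status: number, body: QuotaErrorBody): string {
  const action = isRetryable(status) ? "retry later" : "do not retry";
  return `${status} ${body.error.code}: ${body.error.message} (${action})`;
}
```

With these defaults the delays run 500 ms, 1 s, 2 s, 4 s, and then stay at 8 s.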

Supported models

Use the canonical Quota model name. Provider routing is determined by the prefix; OpenAI is the default when there is no prefix.

OpenAI	no prefix	`gpt-4o`, `gpt-4o-mini`, `gpt-5`, `gpt-5-mini`, `gpt-5-nano`, `gpt-5-pro`, `o1`, `o3`.
Anthropic	anthropic/	`anthropic/claude-opus-4.6`, `anthropic/claude-sonnet-4.6`, `anthropic/claude-opus-4.5`, `anthropic/claude-sonnet-4.5`, `anthropic/claude-haiku-4.5`, `anthropic/claude-opus-4.1`, `anthropic/claude-3.7-sonnet`.
Google	google/	`google/gemini-2.5-pro`, `google/gemini-2.5-flash`, `google/gemini-2.0-flash`, `google/gemini-2.0-flash-lite`.
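
The prefix rule can be mirrored on the client, for example to label usage by provider. This is a sketch of the rule as described above, not Quota's routing code; `providerFor` is a hypothetical name, and treating unrecognized prefixes as "unknown" is an assumption.

```typescript
type Provider = "openai" | "anthropic" | "google" | "unknown";

// Bare names map to OpenAI; otherwise the text before the first "/" decides.
function providerFor(model: string): Provider {
  const slash = model.indexOf("/");
  if (slash === -1) return "openai";
  const prefix = model.slice(0, slash);
  if (prefix === "anthropic") return "anthropic";
  if (prefix === "google") return "google";
  return "unknown";
}
```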

Examples

Streaming

Stream tokens with Server-Sent Events. The OpenAI envelope ends with a data: [DONE] sentinel, then Quota emits one final event with the billing block.

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.usequota.ai/v1",
  apiKey: process.env.QUOTA_API_KEY,
});

const stream = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Stream me a story." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
// Final SSE event after [DONE]:
//   data: {"quota":{"credits_used":...,"balance_after":...,...}}
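
An SDK loop like the one above may stop at the [DONE] sentinel and never surface the billing event that follows it. One option is to read the raw SSE body and parse it directly. The sketch below assumes one JSON payload per `data:` line, as in the comment above; `parseQuotaStream` is a hypothetical helper and the sample payload is made up for illustration.

```typescript
interface StreamResult {
  text: string;
  quota: Record<string, unknown> | null;
}

// Parse a complete SSE body: concatenate content deltas and keep the quota block.
function parseQuotaStream(raw: string): StreamResult {
  let text = "";
  let quota: Record<string, unknown> | null = null;
  for (const line of raw.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    // Skip the sentinel but keep reading: the billing event comes after it.
    if (payload === "" || payload === "[DONE]") continue;
    const event = JSON.parse(payload);
    if (event.quota) quota = event.quota;
    const delta = event.choices?.[0]?.delta?.content;
    if (typeof delta === "string") text += delta;
  }
  return { text, quota };
}

// Made-up stream body for illustration.
const sampleStream = [
  'data: {"choices":[{"index":0,"delta":{"content":"Once"}}]}',
  'data: {"choices":[{"index":0,"delta":{"content":" upon"}}]}',
  "data: [DONE]",
  'data: {"quota":{"credits_used":28500,"balance_after":8471500}}',
].join("\n\n");

const streamResult = parseQuotaStream(sampleStream);
// streamResult.text is "Once upon"; streamResult.quota holds the billing block
```

In a real client the body would be read incrementally from the `fetch` response stream; parsing the full text at once keeps the sketch short.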

Tool calling

One OpenAI-shape tools array works across providers. Quota translates to Anthropic and Google tool schemas on the way out.

const response = await client.chat.completions.create({
  model: "anthropic/claude-sonnet-4.6",
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools: [{
    type: "function",
    function: {
      name: "get_weather",
      description: "Get the current weather for a city.",
      parameters: {
        type: "object",
        properties: { city: { type: "string" } },
        required: ["city"],
      },
    },
  }],
  tool_choice: "auto",
});
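
To complete the loop, the tool calls in the assistant message are executed and their results sent back as `tool` messages. The sketch below assumes the response follows the OpenAI envelope, with `tool_calls` on the assistant message; `runToolCalls`, the call id, and the weather data are made up for illustration.

```typescript
// OpenAI-shape tool call, as found in choices[0].message.tool_calls.
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ToolMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Run each requested tool and build the `tool` messages to send back.
function runToolCalls(
  calls: ToolCall[],
  handlers: Record<string, ToolHandler | undefined>,
): ToolMessage[] {
  return calls.map((call): ToolMessage => {
    const handler = handlers[call.function.name];
    // Arguments arrive as a JSON string and must be parsed before use.
    const result = handler
      ? handler(JSON.parse(call.function.arguments))
      : { error: `unknown tool: ${call.function.name}` };
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

const toolMessages = runToolCalls(
  [{ id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Lisbon"}' } }],
  { get_weather: (args) => ({ city: args.city, tempC: 21 }) },
);
// toolMessages[0].content is '{"city":"Lisbon","tempC":21}'
```

Append the assistant message and these `tool` messages to `messages`, then call the endpoint again to get the final answer.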

User-billing (OAuth token)

When the bearer is a quota_token_… OAuth token, the request bills the end-user's wallet instead of the developer's. The response's quota.wallet field will be "oauth_user" (and the legacy billing_mode field "user"). Branch on wallet — it's unambiguous.

// The user's token, fetched after the OAuth callback.
const response = await fetch("https://api.usequota.ai/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${session.quotaToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Summarize this" }],
  }),
});
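
After parsing the response, the payer can be read from `quota.wallet`. This sketch handles only the two wallet values that appear on this page; `payerFor` is a hypothetical helper and any other value is reported as "unknown".

```typescript
// Subset of the `quota` block needed to decide who paid.
interface QuotaBilling {
  wallet: string;
  credits_used: number;
  balance_after: number;
}

// Branch on `wallet`, not on the legacy `billing_mode` field.
function payerFor(quota: QuotaBilling): "end_user" | "developer" | "unknown" {
  if (quota.wallet === "oauth_user") return "end_user";
  if (quota.wallet === "developer") return "developer";
  return "unknown";
}
```

In the example above this would be called as `payerFor((await response.json()).quota)`.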

→ OpenAI compatibility

The full matrix of which OpenAI request fields Quota forwards, translates, or silently drops.

→ Balance API

Read the wallet balance for a developer key or OAuth-bound user. Useful for pre-flight checks.