
Messages

Anthropic-compatible Messages endpoint. Point the Anthropic SDK at Quota and your existing code keeps working — including streaming, tool use, and system prompts — billed against the wallet on the bearer token.

POST https://api.usequota.ai/v1/messages
Drop-in for the Anthropic SDK
Set baseURL: "https://api.usequota.ai" on the Anthropic SDK and the rest of your code stays the same. Quota proxies the request to Anthropic, deducts credits, and returns Anthropic's response with a quota billing block added.

Request

JSON body matching Anthropic's Messages schema. The model, max_tokens, and messages fields are required.

curl https://api.usequota.ai/v1/messages \
  -H "Authorization: Bearer $QUOTA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-sonnet-4.6",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Write a haiku about credits."}
    ]
  }'
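
For callers not using curl, the same request can be sketched with fetch. This is a minimal illustration rather than an official client: the helper names buildMessagesRequest and sendMessage are hypothetical, and the only details taken from this page are the URL, the bearer header, and the three required body fields.

```typescript
// Endpoint from this page.
const QUOTA_MESSAGES_URL = "https://api.usequota.ai/v1/messages";

// Only the three required fields are modelled here.
interface MessagesRequest {
  model: string;
  max_tokens: number;
  messages: { role: "user" | "assistant"; content: string }[];
}

// Hypothetical helper: builds the fetch arguments without sending anything,
// which makes the request easy to inspect or log.
function buildMessagesRequest(apiKey: string, body: MessagesRequest) {
  return {
    url: QUOTA_MESSAGES_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Hypothetical helper: sends the request and returns the parsed JSON body.
async function sendMessage(apiKey: string, body: MessagesRequest): Promise<unknown> {
  const { url, init } = buildMessagesRequest(apiKey, body);
  const res = await fetch(url, init);
  return res.json();
}
```

Splitting construction from sending keeps the network call in one place, so the request shape can be checked without spending credits.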

Request body

model (string, required): Provider-prefixed model ID. For Anthropic, use anthropic/claude-opus-4.6, anthropic/claude-sonnet-4.6, anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.5, anthropic/claude-haiku-4.5, anthropic/claude-opus-4.1, or anthropic/claude-3.7-sonnet. Bare Anthropic-style IDs are normalized.
max_tokens (integer, required): Maximum tokens to generate. Required by Anthropic. Used as the upper bound when reserving credits before the call.
messages (array<Message>, required): Conversation, oldest first. Each message has a role (user or assistant) and a content string or array of content blocks.
system (string): System prompt. Sent as Anthropic's top-level system field, not as a message.
temperature (number): Sampling temperature, 0–1. Default 1.
stream (boolean): When true, returns Server-Sent Events. The final event is a quota_usage event with the billing block.
tools (array<Tool>): Tool definitions in Anthropic format: { name, description, input_schema }. See Tool use below.
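
The fields above can be written down as TypeScript types, with a small local check that runs before the request is sent. This is a sketch: validateBody is a hypothetical helper, and the positive-integer and non-empty-array rules are assumptions layered on top of the "required" markers in the list, not documented Quota behaviour.

```typescript
type ContentBlock = { type: string; [key: string]: unknown };

interface Message {
  role: "user" | "assistant";
  content: string | ContentBlock[];
}

interface Tool {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface MessagesBody {
  model: string;
  max_tokens: number;
  messages: Message[];
  system?: string;
  temperature?: number;
  stream?: boolean;
  tools?: Tool[];
}

// Hypothetical helper: returns a list of problems found locally.
// An empty list means the body passed these client-side checks only.
function validateBody(body: Partial<MessagesBody>): string[] {
  const problems: string[] = [];
  if (typeof body.model !== "string" || body.model.length === 0) {
    problems.push("model is required");
  }
  if (typeof body.max_tokens !== "number" || !Number.isInteger(body.max_tokens) || body.max_tokens < 1) {
    problems.push("max_tokens must be a positive integer");
  }
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    problems.push("messages must be a non-empty array");
  }
  if (body.temperature !== undefined && (body.temperature < 0 || body.temperature > 1)) {
    problems.push("temperature must be between 0 and 1");
  }
  return problems;
}
```

Catching these locally avoids a round trip that would likely end in a 400 bad_request.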

Response

Anthropic-shaped envelope plus a quota block with the actual cost and post-call balance. Always check quota.balance_after before queuing the next request.

{
  "id": "msg_01ABC...",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-6-20260217",
  "content": [
    { "type": "text", "text": "Credits drift like leaves..." }
  ],
  "stop_reason": "end_turn",
  "usage": {
    "input_tokens": 18,
    "output_tokens": 32
  },
  "quota": {
    "credits_used": 28500,
    "balance_before": 8500000,
    "balance_after": 8471500,
    "wallet": "developer",
    "billing_mode": "developer"
  }
}

Billing metadata is also returned as response headers: X-Quota-Credits-Used, X-Quota-Balance, and X-Quota-Markup (when applied).
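
One way to act on the advice about checking quota.balance_after is sketched below. The field names come from the sample response above; readQuota, canQueueNext, and the minReserve threshold are hypothetical names introduced for illustration.

```typescript
interface QuotaBlock {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  wallet: string;
  billing_mode: string;
}

// Hypothetical helper: pulls the quota block out of a parsed response body.
// Returns null when the block is missing or lacks the numeric fields.
function readQuota(response: unknown): QuotaBlock | null {
  if (typeof response !== "object" || response === null) return null;
  const quota = (response as { quota?: unknown }).quota;
  if (typeof quota !== "object" || quota === null) return null;
  const q = quota as Partial<QuotaBlock>;
  if (typeof q.credits_used !== "number" || typeof q.balance_after !== "number") return null;
  return q as QuotaBlock;
}

// Hypothetical policy: only queue another request while the wallet stays at
// or above an app-chosen reserve (minReserve is not a Quota setting).
function canQueueNext(quota: QuotaBlock, minReserve: number): boolean {
  return quota.balance_after >= minReserve;
}
```

With the sample response, readQuota returns a block whose balance_after is 8471500, so canQueueNext(quota, 1000000) is true.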

Streaming

Set stream: true to receive Anthropic-format SSE. Each event has both an event: line and a data: line.

event: message_start
data: {"type":"message_start","message":{"id":"msg_01ABC","model":"claude-sonnet-4-6-20260217",...}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: message_stop
data: {"type":"message_stop"}

event: quota_usage
data: {"type":"quota_usage","quota":{"credits_used":28500,"balance_before":8500000,"balance_after":8471500,"ledger_id":"led_...","wallet":"developer","billing_mode":"developer"}}
quota_usage event shape
The terminal quota_usage event nests its payload under data.quota.{...} — not at the top level. Read credits_used, balance_after, ledger_id, and billing_mode from there. When app-level markup is configured, the event also includes markup_credits.

Reserved credits are returned in SSE response headers up front: X-Quota-Credits-Reserved and X-Quota-Balance-Reserved. Final amounts (which may be slightly higher) come in the quota_usage event.
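
To show where the billing payload sits in the stream, here is a minimal parser over an already-buffered SSE body. It assumes each event has exactly one event: line and one data: line and that events are separated by blank lines, as in the sample above. parseSse, collectText, and findQuotaUsage are hypothetical helper names, and a production client would parse incrementally instead of buffering the whole body.

```typescript
interface SseEvent {
  event: string;
  data: string;
}

interface StreamQuota {
  credits_used: number;
  balance_before: number;
  balance_after: number;
  ledger_id: string;
  wallet: string;
  billing_mode: string;
  markup_credits?: number; // present only when app-level markup is configured
}

// Splits a complete SSE body into events (one event: and one data: line each).
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const chunk of text.split(/\r?\n\r?\n/)) {
    let event = "";
    let data = "";
    for (const line of chunk.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice("event:".length).trim();
      else if (line.startsWith("data:")) data = line.slice("data:".length).trim();
    }
    if (event !== "" && data !== "") events.push({ event, data });
  }
  return events;
}

// Concatenates the text from every text_delta in the stream.
function collectText(events: SseEvent[]): string {
  let text = "";
  for (const e of events) {
    if (e.event !== "content_block_delta") continue;
    const payload = JSON.parse(e.data) as { delta?: { type?: string; text?: string } };
    if (payload.delta?.type === "text_delta" && typeof payload.delta.text === "string") {
      text += payload.delta.text;
    }
  }
  return text;
}

// Reads the billing block from the quota_usage event, nested under "quota".
function findQuotaUsage(events: SseEvent[]): StreamQuota | null {
  const last = events.filter((e) => e.event === "quota_usage").pop();
  if (!last) return null;
  const payload = JSON.parse(last.data) as { quota?: StreamQuota };
  return payload.quota ?? null;
}
```

Note that findQuotaUsage reads payload.quota and not the top level of the data object, matching the event shape described above.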

Tool use

Pass Anthropic-shaped tool definitions in the tools array. The model emits tool_use content blocks; reply with tool_result blocks on the next turn.

const message = await client.messages.create({
  model: "anthropic/claude-sonnet-4.6",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get the current weather for a city.",
      input_schema: {
        type: "object",
        properties: {
          city: { type: "string" },
        },
        required: ["city"],
      },
    },
  ],
  messages: [
    { role: "user", content: "What's the weather in Lisbon?" },
  ],
});

// message.content may include a tool_use block:
//   { type: "tool_use", id: "toolu_...", name: "get_weather",
//     input: { city: "Lisbon" } }
// Reply with a tool_result block on the next turn.
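
The follow-up turn can be sketched as a pure function that takes the assistant's content blocks and returns the message list for the next request. The tool_result shape (type, tool_use_id, content) follows Anthropic's Messages format; findToolUses, nextTurn, and the handlers map are hypothetical names, and the weather handler used in the usage note is a stand-in, not a real lookup.

```typescript
interface ToolUseBlock {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface TextBlock {
  type: "text";
  text: string;
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

type AssistantBlock = ToolUseBlock | TextBlock;

interface Turn {
  role: "user" | "assistant";
  content: string | AssistantBlock[] | ToolResultBlock[];
}

// Hypothetical handler type: maps a tool's input to a string result.
type ToolHandler = (input: Record<string, unknown>) => string;

// Picks out the tool_use blocks from an assistant message.
function findToolUses(content: AssistantBlock[]): ToolUseBlock[] {
  return content.filter((b): b is ToolUseBlock => b.type === "tool_use");
}

// Appends the assistant turn and a user turn carrying one tool_result per
// tool_use block. The returned array is the messages field of the next request.
function nextTurn(
  messages: Turn[],
  assistantContent: AssistantBlock[],
  handlers: Record<string, ToolHandler>,
): Turn[] {
  const results: ToolResultBlock[] = findToolUses(assistantContent).map((use) => {
    const handler = handlers[use.name];
    return {
      type: "tool_result",
      tool_use_id: use.id,
      content: handler ? handler(use.input) : `No handler registered for ${use.name}`,
    };
  });
  return [
    ...messages,
    { role: "assistant", content: assistantContent },
    { role: "user", content: results },
  ];
}
```

For the example above, calling nextTurn with a get_weather handler yields three turns: the original user question, the assistant's tool_use turn, and a user turn whose tool_result carries the matching tool_use_id.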

Billing a user's wallet

For end-user-pays flows, pass the user's OAuth access token (quota_token_…) as the bearer instead of your API key. Quota debits the user's wallet rather than yours. See Sign in with Quota or Connect Quota Wallet for the end-to-end OAuth flow. In Sandbox mode (billing_mode: "test") Quota returns a synthetic Anthropic response and writes a sandbox ledger entry — no provider call is made, but Quota usage is still metered.
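
Choosing which credential goes in the Authorization header can be sketched as below. The quota_token_ prefix is taken from the paragraph above; chooseBearer is a hypothetical helper, and rejecting a user token without that prefix is an assumption made here so that a malformed token never falls back silently to the developer wallet.

```typescript
const USER_TOKEN_PREFIX = "quota_token_";

interface BearerChoice {
  token: string;
  payer: "user" | "developer";
}

// Hypothetical helper: returns the bearer to send and whose wallet is debited.
function chooseBearer(developerApiKey: string, userAccessToken?: string): BearerChoice {
  if (userAccessToken === undefined) {
    return { token: developerApiKey, payer: "developer" };
  }
  if (!userAccessToken.startsWith(USER_TOKEN_PREFIX)) {
    // Assumption: fail loudly instead of billing the developer by accident.
    throw new Error("User access token does not look like a Quota OAuth token");
  }
  return { token: userAccessToken, payer: "user" };
}
```

The returned token is what goes after "Bearer " in the Authorization header; the rest of the request is unchanged.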

Errors

Errors use Anthropic's envelope shape, not the OpenAI-style { error: { code, message } } wrapper:

{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Not authenticated",
    "hint": "Verify your API key is correct and active...",
    "docs_url": "https://usequota.ai/docs/errors#invalid_api_key"
  }
}

401 invalid_api_key: Token missing, revoked, or for the wrong environment.
400 bad_request: Invalid request body.
402 insufficient_credits: Reservation exceeds wallet balance.
404 user_not_found: External user ID does not exist on this app. Fund the user first.
404 model_not_found: Unknown model or no pricing configured.
429 rate_limit_exceeded: Quota rate limit hit. Default 100 rpm; per-key overrides on request.
502 upstream_error: Anthropic returned an error.
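
A client can recognise this envelope and decide what to do with it roughly as follows. The envelope fields come from the example above; isErrorEnvelope, isRetryable, and describeError are hypothetical helpers, and treating only 429 and 502 as retryable is an assumption about sensible client behaviour, not a documented Quota policy.

```typescript
interface QuotaErrorEnvelope {
  type: "error";
  error: {
    type: string;
    message: string;
    hint?: string;
    docs_url?: string;
  };
}

// Type guard for the Anthropic-style envelope shown above.
function isErrorEnvelope(body: unknown): body is QuotaErrorEnvelope {
  if (typeof body !== "object" || body === null) return false;
  const b = body as { type?: unknown; error?: unknown };
  if (b.type !== "error" || typeof b.error !== "object" || b.error === null) return false;
  const e = b.error as { type?: unknown; message?: unknown };
  return typeof e.type === "string" && typeof e.message === "string";
}

// Assumption: rate limits and upstream failures may succeed on retry;
// 400, 401, 402, and 404 need a change on the caller's side first.
function isRetryable(status: number): boolean {
  return status === 429 || status === 502;
}

// Builds a single log line from the status code and envelope.
function describeError(status: number, body: unknown): string {
  if (!isErrorEnvelope(body)) return `HTTP ${status}: unrecognised error body`;
  const hint = body.error.hint ? ` (${body.error.hint})` : "";
  return `HTTP ${status} ${body.error.type}: ${body.error.message}${hint}`;
}
```

For a 402 insufficient_credits response, isRetryable returns false, which is a cue to top up the wallet or lower max_tokens instead of resending the same request.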