Rate limits & quotas
Two independent limits gate every API key:
- Burst rate limit (RPM): burst protection. Hit it → 429 rate_limited.
- Monthly credit quota: your plan budget. Hit it → soft cap charges overage, hard cap returns 429 quota_exhausted.
Soft cap vs hard cap
Each plan starts soft-capped: once the monthly quota is used up, requests
keep working and the extra credits are billed at the per-credit price
shown in your dashboard. If you'd rather get 429 quota_exhausted instead,
switch the plan to hard cap.
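As a rough illustration of the soft-cap arithmetic, the sketch below estimates an overage charge. The function name and all figures are hypothetical; the real per-credit price is whatever your dashboard shows, and actual billing may round or aggregate differently.

```python
from decimal import Decimal


def overage_charge(used_credits: int, monthly_quota: int,
                   price_per_credit: Decimal) -> Decimal:
    """Estimate the soft-cap overage charge for one billing month.

    Hypothetical helper: credits beyond the monthly quota are assumed to be
    billed at a flat per-credit price, as described for soft-capped plans.
    """
    # Credits used beyond the quota; zero while still inside the plan budget.
    overage_credits = max(0, used_credits - monthly_quota)
    return overage_credits * price_per_credit


# Made-up numbers: 12,500 credits used on a 10,000-credit plan
# leaves 2,500 overage credits to bill.
charge = overage_charge(12_500, 10_000, Decimal("0.002"))
print(charge)
```

On a hard-capped plan there is no overage to estimate, because requests past the quota are rejected with 429 quota_exhausted instead.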
Burst rate limit
Requests are gated by a GCRA limiter sized to your plan's RPM. Bursts up
to the full per-minute budget are fine. When you exceed the limit, the
429 response includes retry_after_ms, the time to wait in milliseconds
before retrying, in error.details:
{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests",
    "details": { "retry_after_ms": 1850, "limit_rps": 10 }
  }
}
SSE connections
The streaming endpoint is gated separately: your plan defines the
maximum number of concurrent SSE connections and the maximum number
of mints per connection. Exceeding the connection limit returns
429 rate_limited; exceeding the per-connection mint limit returns
400 validation_failed.
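Because two different conditions share status 429, a client has to look at error.code before deciding whether to retry. The sketch below is one possible way to do that, not an official client. The function name, the exponential fallback, and its 30-second ceiling are assumptions. It also assumes that every error response uses the same envelope as the rate_limited example, and that retry_after_ms may be missing from some rate_limited responses (for instance on the streaming endpoint).

```python
from typing import Optional


def retry_delay_seconds(status: int, body: dict, attempt: int) -> Optional[float]:
    """Return how long to wait before retrying, or None if a retry won't help.

    Hypothetical helper; `attempt` is the zero-based count of retries so far.
    """
    if status != 429:
        # 400 validation_failed and other statuses need a changed request,
        # not a retry of the same one.
        return None

    error = body.get("error") or {}
    if error.get("code") != "rate_limited":
        # 429 quota_exhausted: the monthly budget is spent, waiting a few
        # seconds will not change the outcome.
        return None

    details = error.get("details") or {}
    retry_after_ms = details.get("retry_after_ms")
    if isinstance(retry_after_ms, (int, float)):
        # Prefer the server's hint when it is present.
        return retry_after_ms / 1000.0

    # Assumed fallback: capped exponential backoff (1s, 2s, 4s, ... up to 30s).
    return min(2.0 ** attempt, 30.0)


sample = {
    "error": {
        "code": "rate_limited",
        "message": "Too many requests",
        "details": {"retry_after_ms": 1850, "limit_rps": 10},
    }
}
print(retry_delay_seconds(429, sample, attempt=0))
```

A caller would sleep for the returned number of seconds and resend, or surface the error to the user when the result is None.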
Usage reporting
Live usage is visible in the dashboard. Every request emits a usage event with the credits consumed, status code, latency, and country, so you can drill into spikes or unexpected spend.