RealFlow.so

Rate limits & quotas

Two independent limits gate every API key:

  • Per-minute rate limit (RPM): burst protection. Hit it → 429 rate_limited.
  • Monthly credit quota: your plan budget. Hit it → on a soft cap, further usage is billed as overage; on a hard cap, requests return 429 quota_exhausted.

Soft cap vs hard cap

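Each plan starts soft-capped: once the monthly quota is used up, requests keep working and the extra credits are billed at the per-credit overage price shown in your dashboard. If you would rather have requests fail with 429 quota_exhausted than incur overage, switch the plan to hard cap.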
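
The two 429 codes call for different client behavior: rate_limited is transient, while quota_exhausted will presumably keep failing until the quota resets or the cap setting changes. A minimal sketch of that decision follows. It assumes quota_exhausted arrives in the same error envelope as the rate_limited sample under Burst rate limit; the function name plan_for_429 and the 1000 ms fallback are hypothetical and not part of the API.

```python
import json


def plan_for_429(body: str, default_wait_ms: int = 1000) -> tuple[str, int]:
    """Decide what to do with a 429 response body.

    Returns (action, wait_ms), where action is:
      "retry" for rate_limited (wait_ms taken from details.retry_after_ms),
      "stop"  for quota_exhausted (retrying is unlikely to help),
      "raise" for any code this sketch does not recognize.
    """
    error = json.loads(body).get("error", {})
    code = error.get("code")
    if code == "rate_limited":
        details = error.get("details") or {}
        # Fall back to a hypothetical default if the field is missing.
        return "retry", int(details.get("retry_after_ms", default_wait_ms))
    if code == "quota_exhausted":
        return "stop", 0
    return "raise", 0
```

A caller would sleep for wait_ms on "retry" and surface an alert on "stop" rather than looping.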

Burst rate limit

Requests are gated by a GCRA (generic cell rate algorithm) limiter sized to your plan's RPM. Bursts up to the full per-minute budget are fine. When you exceed the limit, the 429 response includes retry_after_ms, the time to wait before retrying:

{
  "error": {
    "code": "rate_limited",
    "message": "Too many requests",
    "details": { "retry_after_ms": 1850, "limit_rps": 10 }
  }
}
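
To illustrate why a full-budget burst passes and where a retry delay comes from, here is a toy GCRA limiter. It is a sketch of the general algorithm, not the service's implementation: the class name Gcra, the millisecond clock, and the choice of a burst tolerance equal to the whole per-minute budget are assumptions based on the description above.

```python
class Gcra:
    """Toy GCRA limiter: `rpm` requests per minute, bursts up to `rpm`."""

    def __init__(self, rpm: int):
        # Spacing between requests at the sustained rate.
        self.interval_ms = 60_000 / rpm
        # Tolerance that lets `rpm` requests arrive back to back.
        self.tolerance_ms = self.interval_ms * (rpm - 1)
        # Theoretical arrival time (TAT) of the next conforming request.
        self.tat_ms = 0.0

    def check(self, now_ms: float) -> float:
        """Return 0.0 if the request is allowed, else the ms to wait."""
        earliest = self.tat_ms - self.tolerance_ms
        if now_ms < earliest:
            # Rejected: state is unchanged, caller should wait this long.
            return earliest - now_ms
        # Accepted: push the TAT forward by one interval.
        self.tat_ms = max(self.tat_ms, now_ms) + self.interval_ms
        return 0.0
```

With a hypothetical budget of 600 RPM, 600 calls to check(0) are all allowed, and the next one is told to wait one interval (100 ms). Capacity then returns gradually, one request per interval, rather than all at once at the top of the minute.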

SSE connections

The streaming endpoint is gated separately: your plan defines the maximum number of concurrent SSE connections and the maximum number of mints per connection. Exceeding the connection limit returns 429 rate_limited; exceeding the per-connection mint limit returns 400 validation_failed.
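
A client can check both limits before opening any stream. The sketch below splits a list of mints into per-connection batches and refuses if the plan's connection limit would be exceeded. The function name plan_connections, the representation of mints as strings, and the limit values in the usage note are hypothetical; take the real limits from your plan.

```python
def plan_connections(
    mints: list[str],
    max_mints_per_conn: int,
    max_connections: int,
) -> list[list[str]]:
    """Split mints into batches, one batch per SSE connection.

    Raises ValueError if the batches would need more concurrent
    connections than the plan allows.
    """
    if max_mints_per_conn < 1 or max_connections < 1:
        raise ValueError("limits must be positive")
    batches = [
        mints[i:i + max_mints_per_conn]
        for i in range(0, len(mints), max_mints_per_conn)
    ]
    if len(batches) > max_connections:
        raise ValueError(
            f"{len(mints)} mints need {len(batches)} connections, "
            f"but the limit is {max_connections}"
        )
    return batches
```

For example, five mints with a limit of two per connection and three connections yield three batches; with only two connections allowed, the function raises instead of letting the server reject the request.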

Usage reporting

Live usage is visible in the dashboard. Every request emits a usage event with the credits consumed, status code, latency, and country, so you can drill into spikes or unexpected spend.