Rate Limits
The input caps MuBit enforces, how it behaves under load, and how the SDK backs off.
The SDK-facing data and control runtime does not currently enforce per-route request quotas and does not return X-RateLimit-* or Retry-After headers. The limits that apply today are the input-size caps and the overload behavior below. Quota headers may be added later — don't build clients that depend on them yet.
Input caps (enforced today)
| Limit | Where it applies | Over the limit |
|---|---|---|
| 1000 items per request | control.ingest, control.batch_insert | 400 (InvalidArgument) |
| 1000 results | limit parameter of list_run_history, list_projects, list_skills | Silently clamped to 1000 |
| llm_override.timeout_ms ∈ [1000, 600000] ms | prompt / skill optimize, query overrides | Clamped into range |
Chunk bulk writes into ≤1000-item batches and paginate large lists.
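As a minimal sketch of the chunking advice, the helper below splits a large write into batches that stay under the 1000-item cap. The names `chunked`, `ingest_in_batches`, `send_batch`, and `clamp_timeout_ms` are hypothetical and not part of the MuBit SDK; `send_batch` stands in for whatever wrapper you write around control.ingest. The clamp helper only mirrors the documented range so a client can predict the effective timeout.

```python
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")

MAX_ITEMS_PER_REQUEST = 1000         # item cap from the table above
TIMEOUT_MS_RANGE = (1_000, 600_000)  # llm_override.timeout_ms bounds from the table above


def chunked(items: Sequence[T], size: int = MAX_ITEMS_PER_REQUEST) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def ingest_in_batches(items: Sequence[T], send_batch: Callable[[Sequence[T]], None]) -> int:
    """Send `items` through `send_batch` in capped batches.

    `send_batch` is a hypothetical wrapper around control.ingest.
    Returns the number of requests issued.
    """
    requests = 0
    for batch in chunked(items):
        send_batch(batch)
        requests += 1
    return requests


def clamp_timeout_ms(value: int) -> int:
    """Mirror the documented clamp so the effective timeout is predictable client-side."""
    low, high = TIMEOUT_MS_RANGE
    return max(low, min(high, value))
```

With 2,500 items this issues three requests of 1000, 1000, and 500 items. Sending a request above the cap is rejected outright rather than truncated, which is why the split happens on the client.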
Overload behavior
When a backend dependency is saturated, calls fail with a retryable status rather than a quota error:
- 429 (ResourceExhausted): an upstream dependency (e.g. an LLM provider used by reflect / query) is throttling. MuBit applies its own outbound rate limiting to those providers.
- 503 (Unavailable / FailedPrecondition): a backend is temporarily unavailable.
Both are safe to retry with backoff. The SDK already retries them — there is no Retry-After header to honor, so use the SDK's exponential backoff. See Retries.
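For callers that reach the API without the SDK, or that layer their own retries on top, the following is a rough sketch of capped exponential backoff with jitter. It is not the SDK's actual retry policy, which this page does not specify. `CallFailed`, `call_with_backoff`, and the default delays are assumptions made for illustration.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 503}  # the two overload statuses described above


class CallFailed(Exception):
    """Hypothetical error type carrying an HTTP-style status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"call failed with status {status}")
        self.status = status


def call_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on 429/503 with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except CallFailed as err:
            last_attempt = attempt == max_attempts - 1
            # Non-retryable statuses and the final attempt propagate unchanged.
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise
            # The ceiling doubles each attempt; the actual wait is random below it.
            ceiling = min(max_delay_s, base_delay_s * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise ValueError("max_attempts must be >= 1")
```

Because no Retry-After header is returned, the delay is computed entirely on the client. The jitter spreads retries from many clients so they do not hit a recovering backend at the same moment.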
Per-tenant request limits do apply to the platform / instance-management API (the console control plane), configured by the operator via MUBIT_PLATFORM_RATE_LIMIT_REQUESTS_PER_MINUTE. That governs instance CRUD and admin traffic — not your remember / recall / query data calls.
Reducing pressure
- Batch writes with control.ingest (≤1000 items) instead of N synchronous remember() calls.
- Cache get_context() within a single LLM turn; the same context is usually safe to reuse across tool calls in that turn.
- Bound result sizes with recall(limit=N) to cut downstream token spend.
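One way to apply the caching advice is a small turn-scoped memoizer, sketched below. `TurnContextCache` and the `turn_id` argument are hypothetical; the cache only assumes you can pass it a zero-argument callable that wraps your own get_context() call and that your agent loop has some identifier for the current turn.

```python
from typing import Callable, Generic, Optional, TypeVar

C = TypeVar("C")


class TurnContextCache(Generic[C]):
    """Memoize one context fetch per LLM turn."""

    def __init__(self, fetch: Callable[[], C]) -> None:
        self._fetch = fetch  # e.g. a closure that calls get_context()
        self._turn_id: Optional[str] = None
        self._value: Optional[C] = None

    def get(self, turn_id: str) -> C:
        """Return the cached context for `turn_id`, fetching once when the turn changes."""
        if turn_id != self._turn_id:
            # Fetch first so a failed fetch leaves the previous entry untouched.
            self._value = self._fetch()
            self._turn_id = turn_id
        return self._value
```

Every tool call within a turn then calls `cache.get(turn_id)` and only the first one reaches the backend. Keying on the turn rather than on a time-to-live keeps the cached context from leaking into the next turn, where it may be stale.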