
Retries and Idempotency

When to retry MuBit calls, how to make writes safe to repeat, and what the SDK does for you automatically.

What the SDK retries automatically

By default the SDK retries transient failures with exponential backoff and jitter:

  • mubit.ServerError — any 5xx (500/503) from the server.
  • mubit.TransportError whose .code is transient: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL, CANCELLED, CONNECTION_ERROR, or TIMEOUT (includes network-level failures before any response arrived).

It does not retry AuthError (401), ValidationError (400/409), or UnsupportedFeatureError. These are caller errors: repeating the same request fails the same way and only adds load.

Retries are tuned process-wide through environment variables (there is no per-call or per-client RetryPolicy object):

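| Env var              | Default | Meaning                                          |
|----------------------|---------|--------------------------------------------------|
| MUBIT_RETRY_ATTEMPTS | 3       | Total attempts, including the first (minimum 1). |
| MUBIT_RETRY_BASE_MS  | 200     | Base delay, in ms.                               |
| MUBIT_RETRY_CAP_MS   | 5000    | Maximum delay per retry, in ms.                  |
| MUBIT_RETRY_JITTER   | 0.2     | ± jitter fraction (0.0 disables jitter).         |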

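Backoff is exponential with factor 2: each retry waits twice as long as the previous one, starting from MUBIT_RETRY_BASE_MS and never exceeding MUBIT_RETRY_CAP_MS. Each delay is also randomized by up to ± MUBIT_RETRY_JITTER.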

.env
MUBIT_RETRY_ATTEMPTS=5
MUBIT_RETRY_BASE_MS=500
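To see what these settings mean in practice, here is a minimal sketch of the nominal delay schedule. It assumes the first retry waits exactly MUBIT_RETRY_BASE_MS and each later retry doubles that, up to MUBIT_RETRY_CAP_MS, with jitter ignored. The SDK's internals may differ, and `nominal_delays` is a hypothetical helper, not part of the SDK.

```python
def nominal_delays(attempts: int, base_ms: int, cap_ms: int) -> list[int]:
    """Return the un-jittered wait (ms) before each retry.

    Assumption: retry n (0-based) waits base_ms * 2**n, capped at cap_ms.
    There are attempts - 1 retries because the first attempt is not a retry.
    """
    attempts = max(attempts, 1)  # MUBIT_RETRY_ATTEMPTS has a minimum of 1
    return [min(cap_ms, base_ms * 2**n) for n in range(attempts - 1)]


# Defaults: 3 attempts -> 2 retries.
print(nominal_delays(3, 200, 5000))  # [200, 400]
# The .env example above: 5 attempts, 500 ms base, default 5000 ms cap.
print(nominal_delays(5, 500, 5000))  # [500, 1000, 2000, 4000]
```

With the default jitter of 0.2, each of these values would vary by up to 20% in either direction, assuming jitter is applied to the nominal delay.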

Idempotency keys

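remember() (and the underlying control.ingest) carries an idempotency key, so a repeated write returns the existing entry instead of creating a duplicate. If you don't pass one, the key defaults to the item id: item_id if you set it, otherwise an auto-generated remember-<timestamp>. That auto-generated value is created per call, so it generally won't match across separate remember() calls made when a queue worker redelivers a message. Pin the key explicitly to dedupe across those retries: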

client.remember(
    session_id=run_id, agent_id="support-agent",
    content="…",
    intent="fact",
    idempotency_key=f"ticket-{ticket_id}-fact-1",
)
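The dedupe behavior can be modeled with a tiny in-memory store. This is a hypothetical sketch of the semantics described above, not MuBit's server logic: `FakeStore`, its `ingest` method, and the counter standing in for a timestamp are illustrative assumptions (item_id is left out of the model). It shows why a pinned key collapses redelivered writes into one entry while a key generated fresh on each call does not.

```python
import itertools


class FakeStore:
    """Hypothetical stand-in for an idempotent ingest endpoint."""

    def __init__(self):
        self.entries = {}                # idempotency_key -> stored content
        self._clock = itertools.count()  # fake, ever-increasing "timestamp"

    def ingest(self, content, idempotency_key=None):
        # Mimic a per-call default key such as remember-<timestamp>.
        if idempotency_key is None:
            idempotency_key = f"remember-{next(self._clock)}"
        # A repeated key returns the existing entry instead of adding a new one.
        if idempotency_key not in self.entries:
            self.entries[idempotency_key] = content
        return idempotency_key, self.entries[idempotency_key]


store = FakeStore()
# A queue worker delivers the same message twice with a pinned key: one entry.
store.ingest("refund approved", idempotency_key="ticket-42-fact-1")
store.ingest("refund approved", idempotency_key="ticket-42-fact-1")
print(len(store.entries))  # 1

# Without a pinned key, each call gets a fresh default key: a duplicate appears.
store.ingest("refund approved")
store.ingest("refund approved")
print(len(store.entries))  # 3
```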

record_outcome also accepts an idempotency_key on the wire, so a retried outcome write reinforces once rather than double-counting. Other writes are naturally idempotent through their own ids; for example, archive is keyed on the block id and register_agent on the agent id.

When to retry yourself

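The SDK's built-in retries cover most cases. Add your own outer retry loop with a longer budget only when you need one, for example in queue workers or batch jobs. Keep in mind that every outer attempt already runs the SDK's internal retries, so the two attempt counts multiply: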

  • Retry transient TransportError and ServerError.
  • Don't retry AuthError, ValidationError/AlreadyExistsError, or UnsupportedFeatureError — fix the call instead.
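Those two rules amount to a small classification function. The sketch below uses stand-in exception classes so it runs without the SDK; in real code you would check mubit.ServerError, mubit.TransportError, and so on instead. The `should_retry` name and the stand-in classes are hypothetical; the transient code set is the one listed earlier on this page.

```python
TRANSIENT_CODES = frozenset({
    "UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED", "ABORTED",
    "INTERNAL", "CANCELLED", "CONNECTION_ERROR", "TIMEOUT",
})


# Hypothetical stand-ins for the SDK exceptions so the sketch runs offline.
class ServerError(Exception):
    pass


class TransportError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


class AuthError(Exception):
    pass


class ValidationError(Exception):
    pass


def should_retry(exc: BaseException) -> bool:
    """Return True only for failures that a retry might fix."""
    if isinstance(exc, ServerError):
        return True
    if isinstance(exc, TransportError):
        return getattr(exc, "code", None) in TRANSIENT_CODES
    # AuthError, ValidationError/AlreadyExistsError, UnsupportedFeatureError,
    # and anything unrecognized: fix the call instead of retrying.
    return False


print(should_retry(TransportError("UNAVAILABLE")))  # True
print(should_retry(TransportError("NOT_FOUND")))    # False (not in the transient set)
print(should_retry(AuthError("bad key")))           # False
```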

Recommended pattern

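import random
import time

from mubit import ServerError, TransportError

# TransportError codes worth retrying (the same set the SDK retries internally).
_TRANSIENT = {"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED",
              "ABORTED", "INTERNAL", "CANCELLED", "CONNECTION_ERROR", "TIMEOUT"}


def with_retry(fn, max_attempts=4, base_ms=300):
    """Call fn(), retrying transient MuBit failures with jittered exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last server error
        except TransportError as e:
            if getattr(e, "code", None) not in _TRANSIENT or attempt == max_attempts - 1:
                raise  # non-transient code, or out of attempts
        # Wait base_ms * 2^attempt with ±20% jitter, converted to seconds.
        time.sleep(base_ms * (2 ** attempt) * (0.8 + random.random() * 0.4) / 1000)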
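To check the loop's behavior without a server, you can run the same pattern against stand-in exceptions and a deliberately flaky function. Everything except the retry logic is hypothetical scaffolding: the stand-in classes replace the mubit ones, `flaky_write` simulates two UNAVAILABLE failures, and `sleep` is injected so the demo doesn't actually wait. This copy re-raises the last error once attempts run out.

```python
import random
import time

_TRANSIENT = {"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED",
              "ABORTED", "INTERNAL", "CANCELLED", "CONNECTION_ERROR", "TIMEOUT"}


# Hypothetical stand-ins for mubit.ServerError / mubit.TransportError.
class ServerError(Exception):
    pass


class TransportError(Exception):
    def __init__(self, code):
        super().__init__(code)
        self.code = code


def with_retry(fn, max_attempts=4, base_ms=300, sleep=time.sleep):
    """Retry loop with an injectable sleep so it can be exercised in tests."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerError:
            if attempt == max_attempts - 1:
                raise
        except TransportError as e:
            if e.code not in _TRANSIENT or attempt == max_attempts - 1:
                raise
        sleep(base_ms * (2 ** attempt) * (0.8 + random.random() * 0.4) / 1000)


calls = []  # one entry per call to flaky_write
waits = []  # seconds the loop would have slept


def flaky_write():
    """Fails twice with a transient error, then succeeds."""
    calls.append(1)
    if len(calls) < 3:
        raise TransportError("UNAVAILABLE")
    return "ok"


print(with_retry(flaky_write, sleep=waits.append))  # ok
print(len(calls), len(waits))                       # 3 2
```

Pairing this loop with a pinned idempotency_key, as in the remember() example, is what makes the outer retries safe: if a timeout hides a write that actually landed, the repeat returns the existing entry instead of adding a second one.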

See also

  • Errors — status codes and the SDK exception taxonomy
  • Rate limits — input caps and overload behavior