Retries and Idempotency

When to retry Mubit calls, how to make writes safe to repeat, and what the SDK does for you automatically.

What the SDK retries automatically

By default the SDK retries transient failures with exponential backoff and jitter:

mubit.ServerError — any 5xx (500/503) from the server.
mubit.TransportError whose .code is transient: UNAVAILABLE, DEADLINE_EXCEEDED, RESOURCE_EXHAUSTED, ABORTED, INTERNAL, CANCELLED, CONNECTION_ERROR, or TIMEOUT (includes network-level failures before any response arrived).

It does not retry AuthError (401), ValidationError (400/409), or UnsupportedFeatureError. Those are caller errors: the same request fails the same way until the call is fixed, so retrying only adds load and delay.

Retries are tuned process-wide through environment variables (there is no per-call or per-client RetryPolicy object):

Env var	Default	Meaning
`MUBIT_RETRY_ATTEMPTS`	`3`	Total attempts including the first (min `1`).
`MUBIT_RETRY_BASE_MS`	`200`	Base delay in ms.
`MUBIT_RETRY_CAP_MS`	`5000`	Maximum delay per retry.
`MUBIT_RETRY_JITTER`	`0.2`	± jitter fraction (`0.0` disables jitter).

Backoff is exponential (factor 2) off MUBIT_RETRY_BASE_MS, capped at MUBIT_RETRY_CAP_MS.

.env

MUBIT_RETRY_ATTEMPTS=5
MUBIT_RETRY_BASE_MS=500
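
To see what these settings imply, the sketch below computes the wait before each retry. It assumes the wait before retry n is base × 2^(n−1), scaled by a random factor within ± the jitter fraction and then clamped to the cap. The SDK's exact formula, including whether jitter or the cap is applied first, is not stated here, and `retry_delays_ms` is a hypothetical helper, not an SDK function.

```python
import random


def retry_delays_ms(attempts=3, base_ms=200, cap_ms=5000, jitter=0.2, rng=random.random):
    """Return the wait (in ms) before each retry, under the assumptions above.

    Hypothetical helper for reasoning about the env vars; not an SDK function.
    """
    attempts = max(1, attempts)  # MUBIT_RETRY_ATTEMPTS has a minimum of 1
    delays = []
    for retry in range(attempts - 1):  # the first attempt has no wait
        delay = base_ms * (2 ** retry)  # exponential, factor 2
        # Scale by a random factor in [1 - jitter, 1 + jitter).
        delay *= 1 + jitter * (2 * rng() - 1)
        delays.append(min(delay, cap_ms))  # assumed: cap applied last
    return delays


# Values from the .env example above, with jitter disabled for readability.
print(retry_delays_ms(attempts=5, base_ms=500, jitter=0.0))
```

Under these assumptions, the .env values above give waits of 500, 1000, 2000, and 4000 ms before jitter, so a call that fails on every attempt spends about 7.5 seconds waiting before the error reaches your code.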

Idempotency keys

remember() (and the underlying control.ingest) carry an idempotency key so a repeated write returns the existing entry instead of creating a duplicate. If you don't pass one, the key defaults to the item id (item_id, else an auto-generated remember-<timestamp>). Pin it explicitly to dedupe across retries from a queue worker:

client.remember(
    session_id=run_id, agent_id="support-agent",
    content="…",
    intent="fact",
    idempotency_key=f"ticket-{ticket_id}-fact-1",
)
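
The effect of a pinned key can be illustrated with a toy in-memory store. This is not the Mubit client: `FakeMemoryStore` and `fact_key` are hypothetical names, and the sketch only models the behavior described above, where a repeated write with the same key returns the existing entry.

```python
class FakeMemoryStore:
    """Toy stand-in for an idempotent write API (hypothetical, not the SDK)."""

    def __init__(self):
        self._by_key = {}

    def remember(self, content, idempotency_key):
        # A repeated key returns the stored entry instead of writing again.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        entry = {"id": len(self._by_key) + 1, "content": content}
        self._by_key[idempotency_key] = entry
        return entry

    def __len__(self):
        return len(self._by_key)


def fact_key(ticket_id, index):
    """Derive the key from stable job data so every redelivery yields the same key."""
    return f"ticket-{ticket_id}-fact-{index}"


store = FakeMemoryStore()
first = store.remember("customer prefers email", fact_key(4711, 1))
# The queue redelivers the same job: same key, so no duplicate entry.
again = store.remember("customer prefers email", fact_key(4711, 1))
print(first is again, len(store))
```

Build the key only from data that is identical on every delivery of the job. Timestamps, random values, and attempt counters change between deliveries and defeat deduplication.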

record_outcome also accepts an idempotency_key, so a retried outcome write reinforces once rather than double-counting. From SDK v0.12.0 it is a typed kwarg/option in all three languages; on earlier versions it is a wire-level field only, so send it via POST /v2/control/outcome directly. Other writes are naturally idempotent by their own ids (e.g. archive keys on the block id, register_agent on the agent id).

When to retry yourself

The SDK's built-in retries cover most cases. Wrap a longer outer budget (queue workers, batch jobs) only when you need one:

Retry transient TransportError and ServerError.
Don't retry AuthError, ValidationError/AlreadyExistsError, or UnsupportedFeatureError — fix the call instead.

Recommended pattern

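import random
import time

from mubit import ServerError, TransportError

# TransportError codes that are safe to retry.
_TRANSIENT = {"UNAVAILABLE", "DEADLINE_EXCEEDED", "RESOURCE_EXHAUSTED",
              "ABORTED", "INTERNAL", "CANCELLED", "CONNECTION_ERROR", "TIMEOUT"}


def with_retry(fn, max_attempts=4, base_ms=300):
    """Call fn(), retrying transient failures with exponential backoff and ±20% jitter."""
    for attempt in range(max_attempts):
        last_attempt = attempt == max_attempts - 1
        try:
            return fn()
        except ServerError:
            if last_attempt:
                raise  # out of attempts: surface the real error
        except TransportError as e:
            # Non-transient codes and the final attempt both propagate.
            if getattr(e, "code", None) not in _TRANSIENT or last_attempt:
                raise
        # Exponential backoff (factor 2) with ±20% jitter, converted to seconds.
        time.sleep(base_ms * (2 ** attempt) * (0.8 + random.random() * 0.4) / 1000)
    raise RuntimeError("retries exhausted")  # only reachable when max_attempts < 1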
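
For queue workers and batch jobs, an outer budget is often easier to express as a deadline than as an attempt count. The following is a sketch under that assumption: `retry_within_budget`, its parameters, and the `is_transient` predicate are hypothetical and not part of the SDK. The clock and sleep functions are injectable so the loop can be exercised without waiting.

```python
import random
import time


def retry_within_budget(fn, is_transient, budget_s=60.0, base_s=0.3, cap_s=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fn() until it succeeds, hits a non-transient error, or runs out of budget."""
    deadline = clock() + budget_s
    attempt = 0
    while True:
        try:
            return fn()
        except Exception as exc:
            if not is_transient(exc):
                raise  # caller error: retrying will not help
            # Exponential backoff with ±20% jitter, capped per wait.
            delay = min(cap_s, base_s * (2 ** attempt)) * (0.8 + random.random() * 0.4)
            if clock() + delay > deadline:
                raise  # budget exhausted: surface the last real error
            sleep(delay)
            attempt += 1


# Example with a stand-in failure type. With the SDK, is_transient would check
# for ServerError or a TransportError whose code is in the transient set.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary")
    return "ok"


result = retry_within_budget(flaky, lambda e: isinstance(e, ConnectionError),
                             sleep=lambda s: None)
print(result, calls["n"])
```

Because an outer loop like this replays the whole call, pair it with a pinned idempotency key so a write that succeeded on the server but failed in transit is not stored twice.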
