Production memory systems need more than good demos. MuBit gives you explicit lifecycle boundaries and diagnostics so you can tell whether the system is learning, compacting safely, and retrieving the right evidence.

Operational checklist

| Area | Track this |
|---|---|
| Freshness | Ingest latency, from accepted to done |
| Retrieval | Query and context latency, weak-evidence rates |
| Learning loop | Reflection volume, outcome recording coverage, surfaced strategies |
| Memory quality | `memory_health` results, contradictions, stale entries |
| Compaction safety | Checkpoint cadence and checkpoint failures |
| Coordination | Handoff and feedback visibility across agents |
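Two of the checklist signals, ingest freshness and outcome recording coverage, reduce to simple arithmetic once you have the raw records. The sketch below assumes you log when each ingest request was accepted and when it finished, and that you can list the IDs of reflected lessons/rules and of those with at least one recorded outcome. `IngestEvent`, `freshness_p95`, and `outcome_coverage` are hypothetical helpers, not part of MuBit's API.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class IngestEvent:
    # Hypothetical log record; timestamps in seconds.
    accepted_at: float
    done_at: float


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def freshness_p95(events: Iterable[IngestEvent]) -> float:
    """95th-percentile accepted-to-done ingest latency, in seconds."""
    return nearest_rank_percentile([e.done_at - e.accepted_at for e in events], 95)


def outcome_coverage(reflected_ids: Iterable[str], ids_with_outcomes: Iterable[str]) -> float:
    """Fraction of reflected lessons/rules with at least one recorded outcome.

    Returns 0.0 when nothing has been reflected yet."""
    reflected = set(reflected_ids)
    if not reflected:
        return 0.0
    return len(reflected & set(ids_with_outcomes)) / len(reflected)
```

A coverage value near zero alongside a steady stream of reflections is the "Learning appears inactive" pattern listed under failure modes.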

Consistency model

  • Keep a deterministic run_id / session_id mapping across writes and reads, so the same run or session always resolves to the same IDs.
  • Use getContext rather than reconstructing large prompts manually.
  • Treat checkpoints as explicit lifecycle boundaries.
  • Use diagnose and memory_health before changing retrieval prompts or weights.
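One way to keep the run_id / session_id mapping deterministic is to derive both IDs from stable application keys instead of generating random ones. A minimal sketch using name-based UUIDs (UUIDv5 from the standard library); the namespace URL and the `session_id_for` / `run_id_for` helpers are hypothetical, and it assumes MuBit accepts arbitrary string identifiers for these fields, so check the API reference for any format constraints.

```python
import json
import uuid

# Hypothetical, fixed namespace for this application. Every writer and reader
# must use the same value, so keep it in shared config rather than regenerating it.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/my-agent-app")


def _stable_uuid(*parts: str) -> str:
    # json.dumps gives an unambiguous encoding, so ("a:b", "c") and ("a", "b:c")
    # never collide the way a naive ":".join(...) would.
    return str(uuid.uuid5(APP_NAMESPACE, json.dumps(parts)))


def session_id_for(user_id: str, conversation_id: str) -> str:
    """The same user + conversation always yields the same session_id."""
    return _stable_uuid("session", user_id, conversation_id)


def run_id_for(session_id: str, run_index: int) -> str:
    """The same session + run index always yields the same run_id."""
    return _stable_uuid("run", session_id, str(run_index))
```

Calling the same helpers on both the write path and the read path (for example, before fetching context) keeps the mapping consistent without a lookup table.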

LLM telemetry

MuBit tracks all internal LLM calls (ingestion routing, query synthesis, reflection, snapshots) with Prometheus metrics available at the /metrics endpoint.
| Metric | Labels | Description |
|---|---|---|
| `mubit_llm_calls_total` | `task`, `provider`, `model`, `success` | Total LLM call count by task and outcome |
| `mubit_llm_call_duration_seconds` | `task`, `provider` | Call latency histogram (buckets from 0.1 s to 30 s) |
| `mubit_llm_tokens_total` | `task`, `provider`, `token_type` | Token consumption (prompt vs. completion) |
| `mubit_llm_retries_total` | `task`, `provider` | Retry attempts caused by rate limits or errors |
| `mubit_agent_degraded_total` | `agent`, `reason` | Agent fallbacks to heuristic mode |
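The counters in the table above are exposed in the Prometheus text format, so a quick health read does not require a full Prometheus setup. A minimal sketch, assuming the `success` label takes the values `"true"` and `"false"`; the `task`, `provider`, and `model` values in the sample text are made up, and the parser handles only the simple sample lines needed here. It is not a replacement for a real client library.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Tiny parser for Prometheus text-format sample lines.

    Skips comments and blank lines; label values are returned without unescaping."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, label_str, value = match.groups()
        samples.append((name, dict(LABEL_RE.findall(label_str or "")), float(value)))
    return samples


def success_rate_by_task(samples: List[Sample]) -> Dict[str, float]:
    """Share of mubit_llm_calls_total with success="true", grouped by task.

    Assumes the success label uses the strings "true" / "false"."""
    ok: Dict[str, float] = defaultdict(float)
    total: Dict[str, float] = defaultdict(float)
    for name, labels, value in samples:
        if name != "mubit_llm_calls_total":
            continue
        task = labels.get("task", "")
        total[task] += value
        if labels.get("success") == "true":
            ok[task] += value
    return {task: ok[task] / total[task] for task in total if total[task] > 0}


# Made-up scrape output; label values are illustrative only.
SAMPLE_TEXT = """
# TYPE mubit_llm_calls_total counter
mubit_llm_calls_total{task="reflection",provider="p",model="m",success="true"} 45
mubit_llm_calls_total{task="reflection",provider="p",model="m",success="false"} 5
mubit_llm_calls_total{task="query_synthesis",provider="p",model="m",success="true"} 10
"""

if __name__ == "__main__":
    print(success_rate_by_task(parse_samples(SAMPLE_TEXT)))
```

Because the counters are cumulative since process start, this gives a lifetime ratio; in Prometheus you would normally compute the same ratio over a time window with `rate()`.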
Storage health is also tracked:
| Metric | Description |
|---|---|
| `mubit_storage_compaction_pending` | Pending compaction work (0 = healthy) |
| `mubit_storage_write_stall` | Whether writes are being throttled (0 or 1) |
| `mubit_disk_total_bytes` | Total disk capacity, in bytes |
| `mubit_disk_used_bytes` | Disk space used, in bytes |
| `mubit_disk_usage_pct` | Disk usage, as a percentage |
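The storage gauges translate directly into a pass/fail check. A minimal sketch, assuming you already have the latest value of each gauge (for example, from a parsed /metrics scrape) and that `mubit_disk_usage_pct` is on a 0 to 100 scale. The 85% disk threshold and the `check_storage` helper are hypothetical, not MuBit defaults. Missing gauges are treated as healthy, except that disk usage is derived from the byte gauges when the percentage is absent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StorageHealth:
    healthy: bool
    reasons: List[str] = field(default_factory=list)


def check_storage(values: Dict[str, float], disk_pct_limit: float = 85.0) -> StorageHealth:
    """Evaluate the latest storage gauge values (metric name -> value)."""
    reasons: List[str] = []
    # Per the table above, 0 pending compaction work means healthy.
    if values.get("mubit_storage_compaction_pending", 0) > 0:
        reasons.append("compaction backlog")
    # 1 means writes are currently being throttled.
    if values.get("mubit_storage_write_stall", 0) >= 1:
        reasons.append("writes throttled")
    pct = values.get("mubit_disk_usage_pct")
    if pct is None and values.get("mubit_disk_total_bytes"):
        # Fall back to the byte gauges when the percentage gauge is missing.
        pct = 100.0 * values.get("mubit_disk_used_bytes", 0) / values["mubit_disk_total_bytes"]
    if pct is not None and pct >= disk_pct_limit:
        reasons.append(f"disk {pct:.1f}% used")
    return StorageHealth(healthy=not reasons, reasons=reasons)
```

A write stall usually matters more than a transient compaction backlog, so you may want to alert on the two with different urgency.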
All of the LLM and storage metrics above can be scraped by Prometheus at a 15-second interval. The LLM Activity page in the user console provides a dashboard view.
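`mubit_llm_call_duration_seconds` is a histogram, so latency percentiles are estimated from cumulative bucket counts rather than read directly. In Prometheus this is the job of `histogram_quantile()`; the sketch below applies the same linear-interpolation idea in simplified form (non-negative bucket bounds only) so you can sanity-check a dashboard number by hand. The bucket boundaries in the example are hypothetical: the page only states that the buckets span 0.1 s to 30 s.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by bound,
    ending with (math.inf, total_count)."""
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("last bucket must have an +Inf upper bound")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Quantile lies in the overflow bucket: report the largest finite bound.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical cumulative counts for one task/provider pair.
    example = [(0.1, 10), (0.5, 40), (1.0, 70), (5.0, 95), (30.0, 100), (math.inf, 100)]
    for q in (0.5, 0.95, 0.99):
        print(q, histogram_quantile(q, example))
```

Interpolated values are only as precise as the bucket layout: any quantile that lands in a wide bucket (such as 5 s to 30 s here) is a coarse estimate.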

Failure modes and troubleshooting

| Symptom | Root cause | Fix |
|---|---|---|
| Learning appears inactive | Reflections exist, but outcomes are never recorded | Record outcomes against the reflected lessons/rules |
| Important details disappear after long runs | No checkpoint was saved before compaction | Save checkpoints before summarization or window resets |
| Debugging memory quality is slow | No memory diagnostics in the workflow | Add `memory_health` and `diagnose` to incident review |
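The checkpoint-before-compaction fix can be enforced in code rather than left to convention. A minimal sketch of a hypothetical in-process context buffer: `save_checkpoint` and `summarize` are callbacks you supply (in a MuBit integration the first would call your client's checkpoint operation, whose exact API is not shown here), and `max_items` is an illustrative compaction trigger. Because the checkpoint runs first and any exception propagates, a failed checkpoint leaves the window untouched instead of summarizing details away.

```python
from typing import Callable, List


class ContextWindow:
    """Hypothetical context buffer that never compacts without a checkpoint."""

    def __init__(
        self,
        max_items: int,
        save_checkpoint: Callable[[List[str]], None],
        summarize: Callable[[List[str]], str],
    ) -> None:
        self.max_items = max_items
        self.items: List[str] = []
        self._save_checkpoint = save_checkpoint
        self._summarize = summarize

    def append(self, item: str) -> None:
        self.items.append(item)
        if len(self.items) > self.max_items:
            self.compact()

    def compact(self) -> None:
        # Persist the full window first; if this raises, nothing is summarized away.
        self._save_checkpoint(list(self.items))
        summary = self._summarize(self.items)
        self.items = [summary]
```

Surfacing the exception (rather than swallowing it) is also what makes checkpoint failures visible to the compaction-safety row of the checklist.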

Next steps