Prompts drift. An agent that was well-tuned two weeks ago now misses cases, misroutes escalations, or over-hedges. The MuBit control plane ships an optimization loop that uses recorded outcomes to propose a better prompt, a diff view to review it, and a one-click approval to activate it, all without touching deployed SDK code.

This recipe shows the end-to-end flow. Every SDK step below has a Console equivalent inline: use the console when you want human-in-the-loop review, and the SDK when you want to automate, schedule, or pass an explicit llm_override. Both paths call the same control-plane endpoints and produce identical PromptVersion rows.

The loop at a glance

Run agents → Record outcomes → Optimize → Review diff → Activate
                                     ↑                        │
                                     └────── (next cycle) ────┘
Every step is a single control-plane call. You can wire this into CI, a cron, or trigger it manually from the console’s Agent Card.

1. Record outcomes while agents run

Every interaction that ends with a judgeable result should call record_outcome (run-level) or record_step_outcome (per-step, for dense feedback). This is the signal the optimizer reads.
Python
client.record_outcome(
    run_id=run_id,
    reference_id=evidence_id,          # the specific fact / lesson / archive block the outcome is about
    outcome="success",                  # "success" | "failure" | "partial" | "neutral"
    signal=0.8,                         # -1.0..1.0
    rationale="Customer confirmed the refund was processed correctly",
    agent_id="triage",
)
For multi-step agents, also record per-step signal:
Python
client.record_step_outcome(
    run_id=run_id,
    step_id="2026-04-17T09-12-route",
    step_name="routing",
    outcome="partial",
    signal=0.3,
    rationale="Routed to billing but should have gone to compliance",
    directive_hint="Check billing AND compliance scopes before routing",
    agent_id="triage",
)
The optimizer weighs failures (signal < 0) and the rationale / directive_hint fields heavily. Invest in writing short, specific rationales: they become the raw material the synthesized candidate is built from.
Console equivalent: outcomes are recorded from your agent code, not the console — the console reads them back under Agents → your agent → Runs (/app/projects/<pid>/agents/<aid>/runs). Even when you drive optimization entirely from the UI, the record_outcome / record_step_outcome call in your agent loop is still the signal source.

2. Trigger an optimization

When you have enough outcomes to form an opinion (empirically: ~10–20 outcomes with at least a few negatives), ask the control plane to propose a candidate.
Python
resp = client.optimize_prompt(
    agent_id="triage",
    project_id=project_id,
    # Optional: steer which model does the synthesis
    llm_override={
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "temperature": 0.2,
    },
)

candidate = resp["candidate"]
print(resp["optimization_summary"])   # human-readable rationale
print(resp["confidence"])              # 0..1
print(resp["activated"])               # False by default — human review first
The response includes:
  • candidate — a new PromptVersion row with status="candidate" and source="optimization".
  • optimization_summary — what the optimizer changed and why.
  • confidence — the optimizer’s self-reported confidence.
  • activated — whether the candidate was auto-activated (default: false).
Console equivalent: open the agent’s Prompts tab (/app/projects/<pid>/agents/<aid>/prompts) and click Suggest Optimization on the Active System Prompt card. A new row appears in the Version History table with status: candidate and source: optimization, auto-expanded to show the candidate prompt, and a pending-candidate banner appears at the top of the page. The console uses the instance’s default optimizer model — for llm_override you have to use the SDK.
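The "~10–20 outcomes with at least a few negatives" heuristic can be enforced in code before you call optimize_prompt. A minimal sketch, assuming you keep the signal values your agent loop submits; the threshold of three negatives is an arbitrary choice, not an SDK requirement:

```python
def ready_to_optimize(signals, min_total=10, min_negative=3):
    """Gate optimization on the volume and mix of recorded outcome signals.

    signals: list of floats in -1.0..1.0, as passed to record_outcome.
    Returns True only when there is enough data, including failures,
    for the optimizer to learn from.
    """
    negatives = sum(1 for s in signals if s < 0)
    return len(signals) >= min_total and negatives >= min_negative
```

Calling this in a scheduled job before optimize_prompt keeps you from generating candidates off thin samples.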

3. Review the diff

Never promote a candidate blind. Fetch the diff against the currently active version:
Python
active = client.get_prompt(agent_id="triage")
diff = client.get_prompt_diff(
    agent_id="triage",
    version_a_id=active["prompt"]["version_id"],
    version_b_id=candidate["version_id"],
)
print(diff["diff_text"])   # unified diff format
Console equivalent: click Review on the pending-candidate banner, or Compare in the Version History row. That opens /app/projects/<pid>/agents/<aid>/compare/<vid> with the same diff_text rendered in a split view, the optimization_summary in a muted caption above the diff, and an Approve & Activate button at the top.

What to check:
  1. Does the summary match the diff? If the summary says “tightened escalation criteria” but the diff rewrites the tone, the optimizer hallucinated.
  2. Are edits localized? Small, targeted edits ship safely. A full rewrite needs a canary.
  3. Does the outcome count justify the change? The optimizer can synthesize a confident-looking candidate from 3 outcomes. Wait for more data.
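Check 2 (edit locality) can be approximated mechanically from the diff_text returned by get_prompt_diff. A rough sketch, assuming standard unified-diff output; the 0.3 threshold is a starting point to tune, not a recommendation from the SDK:

```python
def diff_locality(diff_text: str) -> float:
    """Fraction of hunk lines that are additions or deletions.

    Low values mean surgical edits; values near 1.0 mean a full rewrite.
    """
    changed = total = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---", "@@")):
            continue  # skip file headers and hunk markers
        total += 1
        if line.startswith(("+", "-")):
            changed += 1
    return changed / total if total else 0.0


def needs_canary(diff_text: str, threshold: float = 0.3) -> bool:
    """Flag candidates whose diff is too sweeping to ship without a canary."""
    return diff_locality(diff_text) > threshold
```

This is a heuristic, not a substitute for reading the diff; it just tells you which review posture to take.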

4. Test the candidate

Before activating, run the candidate side-by-side with the active prompt on a known replay set. Use branching for reversibility:
Python
# Snapshot current run so we can compare before / after
checkpoint = client.checkpoint(run_id=run_id, label="pre-candidate-evaluation")

# Run replay traffic. Capture outcomes for both branches.
# (Your replay harness, not shown.)
Or, for a controlled canary, activate the candidate for a fraction of traffic by routing some runs to a duplicated agent with agent_id="triage-canary" whose prompt is the candidate.
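The canary split above can be made deterministic by hashing the run id, so a given run always lands on the same agent. A sketch; the agent ids follow the example in the text, and the 10% fraction is an assumption to adjust:

```python
import hashlib


def pick_agent(run_id: str, canary_fraction: float = 0.10) -> str:
    """Route a stable fraction of runs to the canary agent.

    Hashing the run id gives a deterministic, roughly uniform split
    without any shared state between workers.
    """
    bucket = int(hashlib.sha256(run_id.encode()).hexdigest(), 16) % 1000
    return "triage-canary" if bucket < canary_fraction * 1000 else "triage"
```

Record outcomes for both agent ids as usual; comparing their signal distributions tells you whether the candidate actually wins.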

5. Activate the winner

Once you’re satisfied, promote the candidate:
Python
client.activate_prompt_version(
    agent_id="triage",
    version_id=candidate["version_id"],
)
Activation is atomic — in-flight runs continue with the old prompt; new runs see the new one. The previously active version transitions to retired and remains available for rollback.
Console equivalent: click Approve & Activate on the compare page, or Approve on the pending-candidate banner in the Prompts tab. The console flips the status badges, retires the prior active version, and returns you to the Prompts tab — no further confirmation step.

6. Rollback if something breaks

If the new prompt regresses, every prior version is still addressable. List versions, pick one, and reactivate:
Python
versions = client.list_prompt_versions(agent_id="triage")
prior_active = next(
    v for v in versions["versions"]
    if v["status"] == "retired" and v["source"] != "rollback"
)
client.activate_prompt_version(
    agent_id="triage",
    version_id=prior_active["version_id"],
)
The newly activated version takes source="rollback" so your audit log reflects intent.
Console equivalent: every retired version stays in Version History on the Prompts tab. Click Compare on a retired row to confirm the diff, then Approve & Activate. The activation is recorded with source: rollback just like the SDK path.

Skill optimization

Exactly the same loop works for skills — swap the method names:
  • optimize_skill(project_id, skill_id, llm_override?)
  • list_skill_versions(skill_id)
  • get_skill_diff(skill_id, version_a_id, version_b_id)
  • activate_skill_version(skill_id, version_id)
Skills include both parameters_schema and instructions in the diff, so review both sections of the unified diff.
Console equivalent: open a project’s Skills tab → pick a skill (/app/projects/<pid>/skills/<sid>). The Active Definition card has separate editable fields for Description, Parameters Schema, and Instructions. Suggest Optimization creates a candidate; the compare page at .../compare/<vid> renders a unified diff across all three fields; Approve & Activate promotes it.
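The skill loop can be scripted the same way as the prompt loop. A sketch using the method names listed above, assuming list_skill_versions mirrors the response shape of list_prompt_versions (a "versions" list whose entries carry version_id and status fields):

```python
def propose_skill_candidate(client, project_id: str, skill_id: str):
    """Create a skill candidate and return it with its diff for human review."""
    resp = client.optimize_skill(project_id=project_id, skill_id=skill_id)
    candidate = resp["candidate"]

    # Find the currently active version to diff against.
    versions = client.list_skill_versions(skill_id=skill_id)
    active = next(v for v in versions["versions"] if v["status"] == "active")

    diff = client.get_skill_diff(
        skill_id=skill_id,
        version_a_id=active["version_id"],
        version_b_id=candidate["version_id"],
    )
    # Review diff["diff_text"] (both parameters_schema and instructions),
    # then promote with client.activate_skill_version(...).
    return candidate, diff["diff_text"]
```

As with prompts, keep activation as a separate, deliberate step after review.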

Automating the loop

A common pattern: run the optimize step nightly, but never auto-activate. Post the diff into a Slack channel or create a ticket for a human to approve.
Python
# Cron: nightly per agent
import mubit

client = mubit.Client()
PROJECT = "..."                       # your project id

for agent_id in ("triage", "billing", "escalation"):
    resp = client.optimize_prompt(agent_id=agent_id, project_id=PROJECT)
    if resp["confidence"] < 0.6:
        continue                      # too speculative; skip
    candidate = resp["candidate"]
    active = client.get_prompt(agent_id=agent_id)
    diff = client.get_prompt_diff(
        agent_id=agent_id,
        version_a_id=active["prompt"]["version_id"],
        version_b_id=candidate["version_id"],
    )
    # notify_slack: your own webhook or ticketing helper (not shown)
    notify_slack(agent_id, resp["optimization_summary"], diff["diff_text"])