Prompt Optimization Lifecycle
Capture outcomes → optimize → diff → activate. A human-in-the-loop workflow for evolving agent prompts from real execution data.
Prompts drift. An agent that was well-tuned two weeks ago now misses cases, misroutes escalations, or over-hedges. The MuBit control plane ships an optimization loop that uses recorded outcomes to propose a better prompt, a diff view to review it, and a one-click approval to activate it — without touching deployed SDK code.
This recipe shows the end-to-end flow. Every SDK step below has a Console equivalent inline — use the console when you want human-in-the-loop review and the SDK when you want to automate or schedule. Both paths call the same control-plane endpoints and produce identical PromptVersion rows.
The loop at a glance
```
Run agents → Record outcomes → Optimize → Review diff → Activate
    ↑                                                      │
    └──────────────────── (next cycle) ────────────────────┘
```

Every step is driven by control-plane calls, so you can wire the loop into CI or a cron job, or trigger it manually from the console's Agent Card.
1. Record outcomes while agents run
Every interaction that ends with a judgeable result should call record_outcome (run-level) or record_step_outcome (per-step, for dense feedback). This is the signal the optimizer reads.
```python
client.record_outcome(
    session_id=run_id,         # falls back to the client's run_id if omitted
    reference_id=evidence_id,  # the specific fact / lesson / archive block the outcome is about
    outcome="success",         # "success" | "failure" | "partial" | "neutral"
    signal=0.8,                # -1.0..1.0
    rationale="Customer confirmed the refund was processed correctly",
    agent_id="triage",
)
```

For multi-step agents, also record per-step signal:
```python
client.record_step_outcome(
    run_id=run_id,
    step_id="2026-04-17T09-12-route",
    step_name="routing",
    outcome="partial",
    signal=0.3,
    rationale="Routed to billing but should have gone to compliance",
    directive_hint="Check billing AND compliance scopes before routing",
    agent_id="triage",
)
```

The optimizer weighs failures (signal < 0) and the rationale and directive_hint fields heavily. Invest in writing short, specific rationales: they are the raw material the synthesized candidate is built from.
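Because the optimizer reads these fields directly, it can help to validate a payload before sending it. A minimal sketch, assuming only the value ranges documented above (outcome is one of the four labels, signal lies in -1.0..1.0); the helper name validate_outcome and the 15-character rationale floor are hypothetical choices for illustration, not part of the MuBit SDK.

```python
VALID_OUTCOMES = {"success", "failure", "partial", "neutral"}


def validate_outcome(outcome: str, signal: float, rationale: str,
                     min_rationale_chars: int = 15) -> list[str]:
    """Return a list of problems with an outcome payload (empty if it looks fine).

    Hypothetical helper: checks the documented outcome labels and signal range,
    plus an arbitrary minimum rationale length chosen for illustration.
    """
    problems = []
    if outcome not in VALID_OUTCOMES:
        problems.append(f"unknown outcome {outcome!r}")
    if not -1.0 <= signal <= 1.0:
        problems.append(f"signal {signal} outside -1.0..1.0")
    if len(rationale.strip()) < min_rationale_chars:
        problems.append("rationale too short to guide the optimizer")
    return problems


print(validate_outcome("partial", 0.3, "Routed to billing but should have gone to compliance"))  # []
print(validate_outcome("meh", 1.5, "ok"))  # three problems reported
```

Only call record_outcome / record_step_outcome when the returned list is empty; otherwise fix the payload at the source.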
Console equivalent: outcomes are recorded from your agent code, not the console — the console reads them back under Agents → your agent → Runs (/app/projects/<pid>/agents/<aid>/runs). Even when you drive optimization entirely from the UI, the record_outcome / record_step_outcome call in your agent loop is still the signal source.
2. Trigger an optimization
When you have enough outcomes to form an opinion (empirically: ~10–20 outcomes with at least a few negatives), ask the control plane to propose a candidate.
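That rule of thumb can be encoded as a gate in front of optimize_prompt. A sketch under assumptions: it takes a plain list of recorded signal values (however you collect them), counts signal < 0 as a negative (matching how the optimizer weighs failures), and uses hypothetical thresholds of 10 outcomes and 3 negatives to stand in for "~10–20 outcomes with at least a few negatives".

```python
def ready_to_optimize(signals: list[float], min_outcomes: int = 10,
                      min_negatives: int = 3) -> bool:
    """Hypothetical gate: enough outcomes overall, and enough failures to learn from."""
    negatives = sum(1 for s in signals if s < 0)
    return len(signals) >= min_outcomes and negatives >= min_negatives


# 12 outcomes, 3 of them negative: worth asking for a candidate
signals = [0.8, 0.6, -0.5, 0.9, 0.3, -1.0, 0.7, 0.2, -0.4, 0.8, 0.5, 0.1]
print(ready_to_optimize(signals))  # True
```

Requiring negatives as well as volume matters: twenty uniformly positive outcomes give the optimizer nothing to correct.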
```python
resp = client.optimize_prompt(
    agent_id="triage",
    project_id=project_id,
)
candidate = resp["candidate"]
print(resp["optimization_summary"])  # human-readable rationale
print(resp["confidence"])            # 0..1
print(resp["activated"])             # False by default: human review comes first
```

Steering the synthesis model: you can override which model writes the candidate via the llm field (an LlmOverride), but only over the gRPC transport. The HTTP optimize endpoint (the SDK's default transport) ignores any override and uses the instance's default optimizer model, exactly like the console. To pass an override, construct a gRPC client and supply llm:
```python
client = mubit.Client(transport="grpc")  # the override is dropped on the default HTTP transport
resp = client.optimize_prompt(
    agent_id="triage",
    project_id=project_id,
    llm={
        "provider": "anthropic",
        "model": "claude-sonnet-4-6",
        "temperature": 0.2,
    },
)
```

The response includes:
- candidate: a new PromptVersion row with status="candidate" and source="optimization".
- optimization_summary: what the optimizer changed and why.
- confidence: the optimizer's self-reported confidence (0..1).
- activated: whether the candidate was auto-activated (default: false).
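A script that feeds reviewers can sanity-check these fields before anyone spends time on a candidate. A minimal sketch working on a plain dict shaped like the response described above; the function name pending_candidate, the 0.6 confidence floor, and the example IDs are illustrative assumptions, not SDK behavior.

```python
def pending_candidate(resp: dict, min_confidence: float = 0.6):
    """Return the candidate from an optimize_prompt-style response if it awaits review, else None.

    Assumes the documented fields: candidate (with status and source),
    confidence, and activated. The confidence floor is an example value.
    """
    if resp["activated"]:
        return None  # auto-activated: review the live version instead
    cand = resp["candidate"]
    if cand["status"] != "candidate" or cand["source"] != "optimization":
        return None  # not an optimizer proposal
    if resp["confidence"] < min_confidence:
        return None  # too speculative to spend reviewer time on
    return cand


example = {
    "candidate": {"version_id": "v-123", "status": "candidate", "source": "optimization"},
    "optimization_summary": "Tightened escalation criteria",
    "confidence": 0.72,
    "activated": False,
}
print(pending_candidate(example)["version_id"])  # v-123
```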
Console equivalent: open the agent's Prompts tab (/app/projects/<pid>/agents/<aid>/prompts) and click Suggest Optimization on the Active System Prompt card. A new row appears in the Version History table with status: candidate and source: optimization, auto-expanded to show the candidate prompt, and a pending-candidate banner appears at the top of the page. The console uses the instance's default optimizer model — as does the SDK over its default HTTP transport. To pick a different synthesis model, use the gRPC transport with an llm override (see above).
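Because the HTTP transport drops an llm override silently, a thin wrapper can make that mistake loud. A sketch, assuming you track yourself which transport a client was built with; build_optimize_kwargs is a hypothetical helper whose result is meant to be unpacked into client.optimize_prompt(**kwargs).

```python
from typing import Optional


def build_optimize_kwargs(agent_id: str, project_id: str, transport: str = "http",
                          llm: Optional[dict] = None) -> dict:
    """Build keyword arguments for optimize_prompt without silently losing an override.

    Hypothetical helper: per the note above, only the gRPC transport honors an
    llm override, so passing one with any other transport raises instead.
    """
    kwargs = {"agent_id": agent_id, "project_id": project_id}
    if llm is not None:
        if transport != "grpc":
            raise ValueError("llm override is ignored over HTTP; use mubit.Client(transport='grpc')")
        kwargs["llm"] = llm
    return kwargs


kwargs = build_optimize_kwargs(
    "triage", "proj-123", transport="grpc",
    llm={"provider": "anthropic", "model": "claude-sonnet-4-6", "temperature": 0.2},
)
# With a gRPC client: resp = client.optimize_prompt(**kwargs)
```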
3. Review the diff
Never promote a candidate blind. Fetch the diff against the currently active version:
```python
active = client.get_prompt(agent_id="triage")
diff = client.get_prompt_diff(
    agent_id="triage",
    version_a_id=active["version"]["version_id"],
    version_b_id=candidate["version_id"],
)
print(diff["diff_text"])  # unified diff format
```

Console equivalent: click Review on the pending-candidate banner, or Compare in the Version History row. That opens /app/projects/<pid>/agents/<aid>/compare/<vid> with the same diff_text rendered in a split view, the optimization_summary in a muted caption above the diff, and an Approve & Activate button at the top.
What to check:
- Does the summary match the diff? If the summary says "tightened escalation criteria" but the diff rewrites the tone, the optimizer hallucinated.
- Are edits localized? Small, targeted edits ship safely. A full rewrite needs a canary.
- Does the outcome count justify the change? The optimizer can synthesize a confident-looking candidate from 3 outcomes. Wait for more data.
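The localization check can be partly automated by counting changed lines in diff_text. A rough sketch, assuming diff_text is a standard unified diff (as the code comment above states); the 25% "needs a canary" threshold is an arbitrary example, and the demo builds its own diff with Python's difflib rather than calling MuBit.

```python
import difflib


def diff_stats(diff_text: str) -> tuple[int, int]:
    """Count added and removed lines in a unified diff, skipping the ---/+++ file headers.

    Rough heuristic: a removed line whose text itself starts with '--' would be
    mistaken for a header and skipped.
    """
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            added += 1
        elif line.startswith("-"):
            removed += 1
    return added, removed


def needs_canary(diff_text: str, active_line_count: int, max_changed_ratio: float = 0.25) -> bool:
    """Flag candidates that change more than max_changed_ratio of the active prompt's lines."""
    added, removed = diff_stats(diff_text)
    return max(added, removed) / max(active_line_count, 1) > max_changed_ratio


active_prompt = [
    "You are a triage agent.",
    "Route billing questions to billing.",
    "Escalate legal threats to compliance.",
    "Be concise.",
    "Confirm the customer's account before acting.",
]
candidate_prompt = active_prompt.copy()
candidate_prompt[1] = "Check billing AND compliance scopes before routing."
diff_text = "\n".join(difflib.unified_diff(active_prompt, candidate_prompt,
                                           "active", "candidate", lineterm=""))
print(diff_stats(diff_text))                        # (1, 1)
print(needs_canary(diff_text, len(active_prompt)))  # False
```

A small line count does not prove the edit is safe, so the summary-versus-diff check still needs a human.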
4. Shadow test (optional but recommended)
Before activating, run the candidate side-by-side with the active prompt on a known replay set. Use branching for reversibility:
```python
# Snapshot the current run so we can compare before / after
checkpoint = client.checkpoint(run_id=run_id, label="pre-candidate-evaluation")
# Run replay traffic and capture outcomes for both branches.
# (Your replay harness, not shown.)
```

Or, for a controlled canary, activate the candidate for a fraction of traffic by routing some runs to a duplicated agent with agent_id="triage-canary" whose prompt is the candidate.
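For the canary route, hashing the run ID keeps each run pinned to the same prompt, and comparing mean signal per branch gives a first-pass verdict for either the replay or the canary. A sketch under assumptions: the 10% fraction, the choose_agent_id / compare_branches names, and the 0.05 minimum margin are hypothetical; "triage-canary" is the duplicated agent described above.

```python
import hashlib
from statistics import mean


def choose_agent_id(run_id: str, base: str = "triage", canary_fraction: float = 0.1) -> str:
    """Deterministically send roughly canary_fraction of runs to the canary agent."""
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # uniform in [0, 1)
    return f"{base}-canary" if bucket < canary_fraction else base


def compare_branches(active_signals: list[float], candidate_signals: list[float],
                     min_margin: float = 0.05) -> str:
    """Pick a winner by mean signal; call it a tie if the gap is within min_margin."""
    gap = mean(candidate_signals) - mean(active_signals)
    if gap > min_margin:
        return "candidate"
    if gap < -min_margin:
        return "active"
    return "tie"


routes = [choose_agent_id(f"run-{i}") for i in range(1000)]
print(routes.count("triage-canary"))  # roughly 100
print(compare_branches([0.2, 0.5, -0.3], [0.6, 0.7, 0.1]))  # candidate
```

Mean signal is a deliberately simple statistic; with small samples, treat a "candidate" verdict as a reason to keep collecting outcomes rather than as proof.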
5. Activate the winner
Once you're satisfied, promote the candidate:
```python
client.activate_prompt_version(
    agent_id="triage",
    version_id=candidate["version_id"],
)
```

Activation is atomic: in-flight runs continue with the old prompt, and new runs see the new one. The previously active version transitions to retired and remains available for rollback.
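To make those status transitions concrete, here is a toy in-memory model of the behavior described above: exactly one active version, with the previous one retired rather than deleted. It is an illustration only, not how the control plane stores PromptVersion rows, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPromptRegistry:
    """Toy model: version_id -> status, with at most one 'active' version."""
    statuses: dict[str, str] = field(default_factory=dict)

    def add_candidate(self, version_id: str) -> None:
        self.statuses[version_id] = "candidate"

    def activate(self, version_id: str) -> None:
        if version_id not in self.statuses:
            raise KeyError(version_id)
        # Retire whatever is active now; it stays addressable for rollback.
        for vid, status in self.statuses.items():
            if status == "active":
                self.statuses[vid] = "retired"
        self.statuses[version_id] = "active"

    def active(self) -> str:
        return next(vid for vid, s in self.statuses.items() if s == "active")


reg = ToyPromptRegistry({"v1": "active"})
reg.add_candidate("v2")
reg.activate("v2")
print(reg.active(), reg.statuses["v1"])  # v2 retired
```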
Console equivalent: click Approve & Activate on the compare page, or Approve on the pending-candidate banner in the Prompts tab. The console flips the status badges, retires the prior active version, and returns you to the Prompts tab — no further confirmation step.
6. Rollback if something breaks
If the new prompt regresses, every prior version is still addressable. List versions, pick one, and reactivate:
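```python
versions = client.list_prompt_versions(agent_id="triage")
# Take the first retired version that is not itself a rollback activation.
# This depends on the order list_prompt_versions returns; confirm it is newest first.
prior_active = next(
    (
        v for v in versions["versions"]
        if v["status"] == "retired" and v["source"] != "rollback"
    ),
    None,  # avoid StopIteration when nothing has been retired yet
)
if prior_active is None:
    raise RuntimeError("No retired prompt version available to roll back to")
client.activate_prompt_version(
    agent_id="triage",
    version_id=prior_active["version_id"],
)
```

The newly activated version takes source="rollback" so your audit log reflects intent.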
Console equivalent: every retired version stays in Version History on the Prompts tab. Click Compare on a retired row to confirm the diff, then Approve & Activate. The activation is recorded with source: rollback just like the SDK path.
Skill optimization
Exactly the same loop works for skills — swap the method names:
- optimize_skill(project_id, skill_id): as with optimize_prompt, an llm override only applies over the gRPC transport
- list_skill_versions(skill_id)
- get_skill_diff(skill_id, version_a_id, version_b_id)
- activate_skill_version(skill_id, version_id)
Skill diffs cover parameters_schema as well as instructions, so review every section of the unified diff, not only the instruction text.
Console equivalent: open a project's Skills tab → pick a skill (/app/projects/<pid>/skills/<sid>). The Active Definition card has separate editable fields for Description, Parameters Schema, and Instructions. Suggest Optimization creates a candidate; the compare page at .../compare/<vid> renders a unified diff across all three fields; Approve & Activate promotes it.
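When reviewing a skill candidate outside the console, a field-by-field diff keeps schema changes from hiding inside instruction edits. A sketch assuming the active and candidate definitions are plain dicts with description, parameters_schema and instructions keys (the key names follow the fields named above; the dict shape is an assumption), using Python's difflib rather than get_skill_diff.

```python
import difflib
import json


def skill_field_diffs(active: dict, candidate: dict,
                      fields=("description", "parameters_schema", "instructions")) -> dict[str, str]:
    """Return a unified diff per field, omitting fields that did not change."""
    diffs = {}
    for name in fields:
        a, b = active.get(name, ""), candidate.get(name, "")
        # Pretty-print non-string values (e.g. JSON schemas) so key-level changes get their own lines.
        if not isinstance(a, str):
            a = json.dumps(a, indent=2, sort_keys=True)
        if not isinstance(b, str):
            b = json.dumps(b, indent=2, sort_keys=True)
        lines = list(difflib.unified_diff(a.splitlines(), b.splitlines(),
                                          f"active/{name}", f"candidate/{name}", lineterm=""))
        if lines:
            diffs[name] = "\n".join(lines)
    return diffs


active_skill = {
    "description": "Issue a refund",
    "parameters_schema": {"type": "object", "properties": {"amount": {"type": "number"}}},
    "instructions": "Refund only after confirming the order.",
}
candidate_skill = dict(active_skill, parameters_schema={
    "type": "object",
    "properties": {"amount": {"type": "number"}, "reason": {"type": "string"}},
})
for name, text in skill_field_diffs(active_skill, candidate_skill).items():
    print(text)  # only parameters_schema changed here
```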
Automating the loop
A common pattern: run the optimize step nightly, but never auto-activate. Post the diff into a Slack channel or create a ticket for a human to approve.
```python
# Cron: nightly per agent
import mubit

PROJECT = "<your-project-id>"  # placeholder: your project ID


def notify_slack(agent_id, summary, diff_text):
    """Hypothetical hook: post to Slack or open a ticket for human approval."""
    ...


client = mubit.Client()
for agent_id in ("triage", "billing", "escalation"):
    resp = client.optimize_prompt(agent_id=agent_id, project_id=PROJECT)
    if resp["confidence"] < 0.6:
        continue  # too speculative; skip
    candidate = resp["candidate"]
    active = client.get_prompt(agent_id=agent_id)
    diff = client.get_prompt_diff(
        agent_id=agent_id,
        version_a_id=active["version"]["version_id"],
        version_b_id=candidate["version_id"],
    )
    # Never auto-activate: hand the summary and diff to a human.
    notify_slack(agent_id, resp["optimization_summary"], diff["diff_text"])
```

Related pages
- Projects, Agents, Skills, Prompts — the resource model behind the lifecycle.
- Step-Level Outcomes — dense reward signal that feeds better optimizations.
- Activity & Audit Trail — inspect what outcomes were available when the optimizer ran.