llm_override. Both paths call the same control-plane endpoints and produce identical PromptVersion rows.
The loop at a glance
1. Record outcomes while agents run
Every interaction that ends with a judgeable result should call record_outcome (run-level) or record_step_outcome (per-step, for dense feedback). This is the signal the optimizer reads.
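The original code sample did not survive extraction. As a minimal sketch (the client object, method names, keyword parameters, and the 0..1 score scale are assumptions, not confirmed API), recording run-level and step-level outcomes might look like this:

```python
# Hypothetical sketch: `client`, the keyword names, and the 0..1 score
# scale are assumptions; substitute your SDK's real client and parameters.

def record_run_result(client, run_id: str, resolved: bool) -> dict:
    # Run-level signal: one judgeable result for the whole run.
    return client.record_outcome(
        run_id=run_id,
        score=1.0 if resolved else 0.0,
        label="resolved" if resolved else "unresolved",
    )

def record_step_result(client, run_id: str, step_id: str, ok: bool) -> dict:
    # Step-level signal: dense per-step feedback for the optimizer.
    return client.record_step_outcome(
        run_id=run_id,
        step_id=step_id,
        score=1.0 if ok else 0.0,
    )
```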
Console equivalent: outcomes are recorded from your agent code, not the console — the console reads them back under Agents → your agent → Runs (/app/projects/<pid>/agents/<aid>/runs). Even when you drive optimization entirely from the UI, the record_outcome / record_step_outcome call in your agent loop is still the signal source.
2. Trigger an optimization
When you have enough outcomes to form an opinion (empirically: ~10–20 outcomes with at least a few negatives), ask the control plane to propose a candidate.
- candidate — a new PromptVersion row with status="candidate" and source="optimization".
- optimization_summary — what the optimizer changed and why.
- confidence — the optimizer’s self-reported confidence.
- activated — whether the candidate was auto-activated (default: false).
Console equivalent: open the agent’s Prompts tab (/app/projects/<pid>/agents/<aid>/prompts) and click Suggest Optimization on the Active System Prompt card. A new row appears in the Version History table with status: candidate and source: optimization, auto-expanded to show the candidate prompt, and a pending-candidate banner appears at the top of the page. The console uses the instance’s default optimizer model — for llm_override you have to use the SDK.
3. Review the diff
Never promote a candidate blind. Fetch the diff against the currently active version:
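A hedged sketch, assuming prompt-side method names that mirror the documented skill variants (list_skill_versions, get_skill_diff); none of these spellings are confirmed:

```python
# Hypothetical sketch: list_prompt_versions and get_prompt_diff are
# assumed by analogy with the documented skill methods.

def fetch_candidate_diff(client, agent_id: str) -> str:
    versions = client.list_prompt_versions(agent_id)
    active = next(v for v in versions if v["status"] == "active")
    candidate = next(v for v in versions if v["status"] == "candidate")
    diff = client.get_prompt_diff(agent_id, active["id"], candidate["id"])
    return diff["diff_text"]
```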
Console equivalent: click Review on the pending-candidate banner, or Compare in the Version History row. That opens /app/projects/<pid>/agents/<aid>/compare/<vid> with the same diff_text rendered in a split view, the optimization_summary in a muted caption above the diff, and an Approve & Activate button at the top.
What to check:
- Does the summary match the diff? If the summary says “tightened escalation criteria” but the diff rewrites the tone, the optimizer hallucinated.
- Are edits localized? Small, targeted edits ship safely. A full rewrite needs a canary.
- Does the outcome count justify the change? The optimizer can synthesize a confident-looking candidate from 3 outcomes. Wait for more data.
4. Shadow test (optional but recommended)
Before activating, run the candidate side-by-side with the active prompt on a known replay set. Use branching for reversibility:
The branch creates a canary agent with agent_id="triage-canary" whose prompt is the candidate.
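A hedged sketch of the comparison itself: run_agent, the judge callback, and the "triage" agent id are stand-ins, with only "triage-canary" taken from the text above:

```python
# Hypothetical sketch: `client.run_agent`, the judge callback, and the
# "triage" id are assumptions; only "triage-canary" appears in the docs.

def shadow_compare(client, replay_set, judge,
                   active_agent="triage", canary_agent="triage-canary"):
    # Run every replay item through both agents and tally which prompt wins.
    wins = {"active": 0, "candidate": 0}
    for item in replay_set:
        active_out = client.run_agent(active_agent, item)
        canary_out = client.run_agent(canary_agent, item)
        wins["candidate" if judge(canary_out, active_out) else "active"] += 1
    return wins
```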
5. Activate the winner
Once you’re satisfied, promote the candidate:
The previously active version is marked retired and remains available for rollback.
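A hedged sketch of the activation call, assuming activate_prompt_version by analogy with the documented activate_skill_version(skill_id, version_id):

```python
# Hypothetical sketch: `activate_prompt_version` is assumed by analogy
# with the documented activate_skill_version(skill_id, version_id).

def promote(client, agent_id: str, candidate_id: str) -> dict:
    version = client.activate_prompt_version(agent_id, candidate_id)
    # The prior active version is retired, not deleted, so rollback stays possible.
    return version
```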
Console equivalent: click Approve & Activate on the compare page, or Approve on the pending-candidate banner in the Prompts tab. The console flips the status badges, retires the prior active version, and returns you to the Prompts tab — no further confirmation step.
6. Rollback if something breaks
If the new prompt regresses, every prior version is still addressable. List versions, pick one, and reactivate:
Pass source="rollback" so your audit log reflects intent.
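Putting the rollback together as a hedged sketch (the method names and the source keyword are assumptions, mirroring the skill variants):

```python
# Hypothetical sketch: method names and the `source` keyword are assumptions.

def rollback(client, agent_id: str, target_version_id: str) -> dict:
    versions = client.list_prompt_versions(agent_id)
    target = next(v for v in versions if v["id"] == target_version_id)
    if target["status"] != "retired":
        raise ValueError("roll back only to a retired version")
    # Record intent in the audit log via the source field.
    return client.activate_prompt_version(
        agent_id, target["id"], source="rollback",
    )
```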
Console equivalent: every retired version stays in Version History on the Prompts tab. Click Compare on a retired row to confirm the diff, then Approve & Activate. The activation is recorded with source: rollback just like the SDK path.
Skill optimization
Exactly the same loop works for skills — swap the method names:
- optimize_skill(project_id, skill_id, llm_override?)
- list_skill_versions(skill_id)
- get_skill_diff(skill_id, version_a_id, version_b_id)
- activate_skill_version(skill_id, version_id)
The optimizer can touch both parameters_schema and instructions in the diff, so review both sections of the unified diff.
Console equivalent: open a project’s Skills tab → pick a skill (/app/projects/<pid>/skills/<sid>). The Active Definition card has separate editable fields for Description, Parameters Schema, and Instructions. Suggest Optimization creates a candidate; the compare page at .../compare/<vid> renders a unified diff across all three fields; Approve & Activate promotes it.
Automating the loop
A common pattern: run the optimize step nightly, but never auto-activate. Post the diff into a Slack channel or create a ticket for a human to approve.
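A hedged sketch of that nightly job; every client method and the notify hook are assumptions, and the point is the shape: propose, never activate, route the diff to a human:

```python
# Hypothetical sketch: all client methods and the `notify` hook are
# assumptions. The pattern (propose nightly, never auto-activate,
# hand the diff to a human channel) is what matters.

def nightly_optimize(client, project_id: str, agent_id: str, notify) -> str:
    result = client.optimize_prompt(project_id=project_id, agent_id=agent_id)
    if result["activated"]:
        # Defensive: this workflow relies on auto-activation staying off.
        raise RuntimeError("candidate was auto-activated; expected manual review")
    candidate = result["candidate"]
    versions = client.list_prompt_versions(agent_id)
    active = next(v for v in versions if v["status"] == "active")
    diff = client.get_prompt_diff(agent_id, active["id"], candidate["id"])
    notify(
        f"Prompt candidate {candidate['id']} "
        f"(confidence {result['confidence']:.2f}):\n{diff['diff_text']}"
    )
    return candidate["id"]
```

The notify callable can post to Slack, open a ticket, or append to a review queue; approval still happens through the compare page or the SDK activation call.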
Related pages
- Projects, Agents, Skills, Prompts — the resource model behind the lifecycle.
- Step-Level Outcomes — dense reward signal that feeds better optimizations.
- Activity & Audit Trail — inspect what outcomes were available when the optimizer ran.