Step-Level Outcomes and Process Rewards
Record per-step reward signals during agentic runs so reflection produces richer, step-attributed lessons.
Run-level outcomes tell you whether the whole run succeeded. Step-level outcomes tell you which specific step caused success or failure. MuBit stores these signals and feeds them into reflection so lessons are attributed to the exact step that mattered.
Prerequisites
- MuBit client initialized with a valid API key
- A multi-step agent run where each step produces an observable result
Flow
- Execute an agent step (tool call, LLM inference, decision).
- Record a step outcome with signal, rationale, and optional directive hint.
- Repeat for each step in the run.
- Reflect with `include_step_outcomes` set to `true` to produce step-attributed lessons. This is a wire-level field on the reflect request. The typed `reflect()` helpers do not forward it, so use the low-level passthrough (`client.control.reflect({...})` in JS), or send the field directly over gRPC/HTTP from Python.
- Use `record_outcome()` at the end for the overall run-level signal, passing the `reference_id` of the lesson, evidence item, or archive block the outcome is about.
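The typed Python helper drops `include_step_outcomes`, so Python callers have to put that field on the wire themselves. Below is a minimal sketch that builds the HTTP request with only the standard library. It makes several assumptions:

- The body is JSON, with `run_id` and `include_step_outcomes` at the top level, mirroring the JS passthrough arguments.
- The base URL and the bearer-token `Authorization` header are hypothetical placeholders. Check the Control HTTP reference for the real host and auth scheme.

The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute the real MuBit host and auth scheme.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"


def build_reflect_request(run_id: str) -> urllib.request.Request:
    """Build (but do not send) a reflect request with step outcomes enabled."""
    body = {
        "run_id": run_id,  # assumed wire name, mirroring the JS passthrough
        "include_step_outcomes": True,  # serialized as JSON `true`
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/control/reflect",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


req = build_reflect_request("run-123")
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to log or inspect the exact payload when lessons come back generic.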
Minimal implementation example
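A minimal sketch of the first three Flow steps: execute each step, then record one step outcome for it. It rests on a few assumptions:

- `client.record_step_outcome` accepts the keyword arguments listed in the field reference, as the lane example does.
- `RecordingClient` and `run_agent` are hypothetical. `RecordingClient` stands in for a real MuBit client so the sketch runs offline.
- The binary `1.0`/`-1.0` signal is an illustrative choice, not a MuBit convention.

Reflection and the run-level `record_outcome()` call follow the last two Flow steps.

```python
from typing import Any, Callable


class RecordingClient:
    """Hypothetical stand-in for a MuBit client: it only stores the calls it receives."""

    def __init__(self) -> None:
        self.step_outcomes: list[dict[str, Any]] = []

    def record_step_outcome(self, **fields: Any) -> None:
        self.step_outcomes.append(fields)


def run_agent(client, run_id: str, steps: list[tuple[str, Callable[[], bool]]]) -> None:
    """Execute each step, then record one step outcome per step."""
    for index, (name, action) in enumerate(steps):
        ok = action()  # tool call, LLM inference, or decision
        status = "succeeded" if ok else "failed"
        fields: dict[str, Any] = {
            "step_id": f"{run_id}-step-{index}",  # unique within the run
            "step_name": name,
            "outcome": "success" if ok else "failure",
            "signal": 1.0 if ok else -1.0,  # illustrative binary reward
            "rationale": f"{name} {status}",
        }
        if not ok:
            # Optional hindsight guidance, only attached to failed steps here.
            fields["directive_hint"] = f"Check inputs before running {name}"
        client.record_step_outcome(**fields)


client = RecordingClient()
run_agent(client, "run-42", [("search", lambda: True), ("summarize", lambda: False)])
```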
Field reference
| Field | Type | Required | Description |
|---|---|---|---|
| step_id | string | yes | Unique step identifier within the run |
| step_name | string | no | Human-readable label for the step |
| outcome | string | yes | success, failure, partial, or neutral |
| signal | float | no | Reward signal from -1.0 (worst) to 1.0 (best) |
| rationale | string | no | Explanation of why the outcome was assigned |
| directive_hint | string | no | Hindsight guidance for future runs |
| agent_id | string | no | Agent that performed the step |
| metadata_json | string | no | Arbitrary structured metadata |
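The constraints in this table can be checked client-side before the call:

- `step_id` and `outcome` are required.
- `outcome` takes one of four values.
- `signal` ranges from -1.0 to 1.0.
- `metadata_json` is a string.

Below is a minimal sketch. `build_step_outcome` is a hypothetical helper, not part of the MuBit SDK. Two details are assumptions: that `metadata_json` expects a JSON-encoded string (as its name suggests), and that out-of-range signals should be rejected rather than clamped.

```python
import json
from typing import Any, Optional

VALID_OUTCOMES = {"success", "failure", "partial", "neutral"}


def build_step_outcome(
    step_id: str,
    outcome: str,
    *,
    step_name: Optional[str] = None,
    signal: Optional[float] = None,
    rationale: Optional[str] = None,
    directive_hint: Optional[str] = None,
    agent_id: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Check a step outcome against the field reference and drop unset optional fields."""
    if not step_id:
        raise ValueError("step_id is required")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")
    if signal is not None and not -1.0 <= signal <= 1.0:
        raise ValueError("signal must be between -1.0 and 1.0")
    fields = {
        "step_id": step_id,
        "outcome": outcome,
        "step_name": step_name,
        "signal": signal,
        "rationale": rationale,
        "directive_hint": directive_hint,
        "agent_id": agent_id,
        # metadata_json is a string, so structured metadata is serialized first
        "metadata_json": json.dumps(metadata) if metadata is not None else None,
    }
    return {key: value for key, value in fields.items() if value is not None}


fields = build_step_outcome(
    "plan-v1", "success", signal=0.9, agent_id="planner", metadata={"attempt": 1}
)
```

The resulting dict could then be passed as `client.record_step_outcome(**fields)`, assuming the keyword names match the field names as in the lane example.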
Combining with lane-scoped memory
Step outcomes work naturally with lanes. If your multi-agent system uses lane-scoped memory, record step outcomes from each agent and then reflect with both lane and step context:
```python
# Assumes `client` is an initialized MuBit client (see Prerequisites).

# Agent "planner" records its step outcomes in the run
client.record_step_outcome(
    step_id="plan-v1",
    step_name="planning",
    outcome="success",
    signal=0.9,
    agent_id="planner",
)

# Reflect across all step outcomes. include_step_outcomes is a wire-level
# field the typed helper does not forward, so this call on its own does not
# request step-attributed lessons. Send the field directly over gRPC/HTTP
# (POST /v2/control/reflect with "include_step_outcomes": true), or use the
# JS control passthrough: client.control.reflect({ run_id, include_step_outcomes: true }).
client.reflect()
```

Failure modes and troubleshooting
| Symptom | Root cause | Fix |
|---|---|---|
| Reflection produces generic lessons despite step outcomes | include_step_outcomes not set, or dropped by the typed helper | Send include_step_outcomes: true as a wire-level field on the reflect request: client.control.reflect({ run_id, include_step_outcomes: true }) in JS. From Python, send it directly over gRPC, or POST it to /v2/control/reflect over HTTP; the typed reflect() helper does not forward it |
| Step outcome not accepted | Missing step_id or outcome | Both fields are required |
| Lessons lack step attribution | Step outcomes recorded after reflection | Record step outcomes before calling reflect() |
| Too many step outcomes dilute signal | Recording outcomes for trivial steps | Only record outcomes for decision-significant steps |
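The last two rows concern caller discipline: when to record step outcomes, and for which steps. Below is a minimal sketch that enforces both rules in a buffer. It assumes the caller decides per step whether that step is decision-significant. `StepOutcomeBuffer` is hypothetical, not part of MuBit. In a real run, the dicts returned by `mark_reflected()` would each be passed to `client.record_step_outcome` before reflection is triggered.

```python
from typing import Any


class StepOutcomeBuffer:
    """Hypothetical helper enforcing two rules: only record decision-significant
    steps, and never record a step outcome after reflection has started."""

    def __init__(self) -> None:
        self.outcomes: list[dict[str, Any]] = []
        self.reflected = False

    def record(self, significant: bool, **fields: Any) -> bool:
        """Buffer a step outcome; return False if the step was skipped as trivial."""
        if self.reflected:
            raise RuntimeError("step outcome recorded after reflection")
        if not significant:
            return False  # skip trivial steps so they do not dilute the signal
        self.outcomes.append(fields)
        return True

    def mark_reflected(self) -> list[dict[str, Any]]:
        """Freeze the buffer and return the outcomes to record before reflecting."""
        self.reflected = True
        return list(self.outcomes)


buf = StepOutcomeBuffer()
buf.record(True, step_id="plan-v1", outcome="success", signal=0.9)
buf.record(False, step_id="log-1", outcome="neutral")  # trivial, skipped
pending = buf.mark_reflected()
```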
Next steps
- Review the full HTTP contract at Control HTTP reference.
- Review the gRPC surface at Control gRPC reference.
- See Lane-Scoped Multi-Agent Memory for memory isolation patterns.