Run-level outcomes tell you whether the whole run succeeded. Step-level outcomes tell you which specific step caused success or failure. MuBit stores these signals and feeds them into reflection so lessons are attributed to the exact step that mattered.

Prerequisites

  • MuBit client initialized with a valid API key
  • A multi-step agent run where each step produces an observable result

Flow

  1. Execute an agent step (tool call, LLM inference, decision).
  2. Record a step outcome with signal, rationale, and optional directive hint.
  3. Repeat for each step in the run.
  4. Reflect with include_step_outcomes=True to produce step-attributed lessons.
  5. Use record_outcome() at the end for the overall run-level signal.

Minimal implementation example

step_outcomes.py
from mubit import Client
import os

run_id = "agent:planner:task-123"
client = Client(
    endpoint=os.getenv("MUBIT_ENDPOINT", "https://api.mubit.ai"),
    api_key=os.environ["MUBIT_API_KEY"],
    run_id=run_id,
    transport="http",
)

# Step 1: Planning (call_llm is a placeholder for your own LLM helper)
plan = call_llm("Break down the task into sub-steps")
client.record_step_outcome(
    step_id="step-1-planning",
    step_name="initial_planning",
    outcome="success",
    signal=0.8,
    rationale="Generated a clear 3-step plan with dependencies identified",
    directive_hint="Include sub-task dependencies explicitly in plans",
)

# Step 2: Tool call (execute_tool is a placeholder for your own tool runner)
tool_result = execute_tool("search_api", query="relevant docs")
client.record_step_outcome(
    step_id="step-2-search",
    step_name="search_api",
    outcome="failure",
    signal=-0.6,
    rationale="Search returned no results — query was too narrow",
    directive_hint="Use broader search terms before narrowing",
)

# Step 3: Recovery
recovery = call_llm("Retry with broader query")
client.record_step_outcome(
    step_id="step-3-recovery",
    step_name="search_retry",
    outcome="success",
    signal=0.9,
    rationale="Broader query found the target document",
)

# Reflect with step outcomes to produce step-attributed lessons
lessons = client.reflect(include_step_outcomes=True)

# Run-level outcome
client.record_outcome(
    outcome="success",
    signal=0.7,
    rationale="Task completed after one retry",
)

Field reference

| Field | Type | Required | Description |
|---|---|---|---|
| step_id | string | yes | Unique step identifier within the run |
| step_name | string | no | Human-readable label for the step |
| outcome | string | yes | success, failure, partial, or neutral |
| signal | float | no | Reward signal from -1.0 (worst) to 1.0 (best) |
| rationale | string | no | Explanation of why the outcome was assigned |
| directive_hint | string | no | Hindsight guidance for future runs |
| agent_id | string | no | Agent that performed the step |
| metadata_json | string | no | Arbitrary structured metadata |
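Because metadata_json is a string field, structured metadata must be serialized before recording. The helper below is a hypothetical local pre-check that mirrors the constraints in the table above; it is not part of the MuBit client:

```python
import json

VALID_OUTCOMES = {"success", "failure", "partial", "neutral"}

def build_step_outcome(step_id, outcome, signal=None, metadata=None):
    """Hypothetical pre-validation mirroring the field reference table."""
    if not step_id:
        raise ValueError("step_id is required")
    if outcome not in VALID_OUTCOMES:
        raise ValueError(f"outcome must be one of {sorted(VALID_OUTCOMES)}")
    if signal is not None and not -1.0 <= signal <= 1.0:
        raise ValueError("signal must be between -1.0 and 1.0")
    return {
        "step_id": step_id,
        "outcome": outcome,
        "signal": signal,
        # metadata_json is a string: serialize structured metadata yourself
        "metadata_json": json.dumps(metadata) if metadata is not None else None,
    }

fields = build_step_outcome("step-1-planning", "success", 0.8, {"tokens": 512})
```

The resulting dict can then be splatted into record_step_outcome(**fields).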

Combining with lane-scoped memory

Step outcomes work naturally with lanes. If your multi-agent system uses lane-scoped memory, record step outcomes from each agent and then reflect with both lane and step context:
# Agent "planner" records its step outcomes in the run
client.record_step_outcome(
    step_id="plan-v1",
    step_name="planning",
    outcome="success",
    signal=0.9,
    agent_id="planner",
)

# Reflect across all step outcomes
client.reflect(include_step_outcomes=True)
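If you also keep local copies of what each agent recorded, grouping them by agent_id shows which lane's steps drove the overall signal. This is a plain-Python sketch over local records, not a MuBit API call:

```python
from collections import defaultdict

# Local copies of recorded step outcomes (illustrative data)
step_outcomes = [
    {"agent_id": "planner", "step_id": "plan-v1", "signal": 0.9},
    {"agent_id": "executor", "step_id": "exec-1", "signal": -0.4},
    {"agent_id": "executor", "step_id": "exec-2", "signal": 0.6},
]

def signal_by_agent(outcomes):
    """Sum the reward signal per agent to see which lane drove the run."""
    totals = defaultdict(float)
    for outcome in outcomes:
        totals[outcome["agent_id"]] += outcome["signal"]
    return dict(totals)
```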

Failure modes and troubleshooting

| Symptom | Root cause | Fix |
|---|---|---|
| Reflection produces generic lessons despite step outcomes | include_step_outcomes not set | Pass include_step_outcomes=True in reflect() |
| Step outcome not accepted | Missing step_id or outcome | Both fields are required |
| Lessons lack step attribution | Step outcomes recorded after reflection | Record step outcomes before calling reflect() |
| Too many step outcomes dilute signal | Recording outcomes for trivial steps | Only record outcomes for decision-significant steps |
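One way to avoid diluting the signal is a thin wrapper that only records decision-significant steps. A minimal sketch, assuming any client object with a record_step_outcome method; the helper name and threshold are illustrative, not part of MuBit:

```python
def record_if_significant(client, threshold=0.3, **fields):
    """Record a step outcome only when |signal| clears the threshold (hypothetical helper)."""
    signal = fields.get("signal")
    if signal is not None and abs(signal) < threshold:
        return False  # trivial step: skip to keep reflection input focused
    client.record_step_outcome(**fields)
    return True
```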

Next steps