Conceptual Guide: The Execution Trace
Understanding ToolBrain's high-fidelity execution traces - the critical innovation that ensures reliable RL training by capturing every detail of agent-environment interactions.
The Foundation: Understanding Data Quality in RL
In reinforcement learning, there's a fundamental principle that determines the quality of your trained models: the quality of RL training is entirely dependent on the quality of the data it learns from.
The Data Quality Challenge
Many agent systems capture only the final, parsed actions, missing crucial information about what the model actually saw and said. This incomplete data creates training challenges:
Information Gaps
- Raw model outputs are simplified
- Parsing details are abstracted away
- Context information gets condensed
- Tool execution details are reduced
Training Opportunities
- More precise loss calculations
- Cleaner training signals
- Better convergence patterns
- Enhanced reward computation
Real-World Example: What Gets Lost
Consider an agent trying to solve a math problem. Here's what typically gets logged vs. what actually happened:
Standard Logging
{
  "action": "python_code",
  "code": "print(2 + 2)",
  "result": "4"
}
Typical approach: Structured but incomplete information
High-Fidelity Trace
Model saw: "Solve: What is 2+2?"
Model output: "I need to calculate this.
Action: python_code
Code: print(2 + 2)
Final answer: The result is 4"
Tool executed: Python interpreter
Raw tool output: "4\n"
Parsed result: {"result": "4"}
Complete information enables accurate training and debugging
The ToolBrain Advantage
With complete, high-fidelity execution traces, your model learns from an accurate view of what happened. This leads to better performance, more reliable behavior, and easier debugging.
Complete visibility enables accurate training.
The Solution: The High-Fidelity Execution Trace
ToolBrain solves this fundamental problem with the Execution Trace - a definitive, structured record of an agent's interaction with its environment that captures every single detail of what happened during execution.
High-Fidelity Design Principles
Complete Capture
- Raw model inputs and outputs
- Parsing results and interpretations
- Tool execution details
- Structured data objects
- Conversation history
Perfect Fidelity
- No information loss or distortion
- Exact reproduction capability
- Ground truth for training
- Complete debugging visibility
- Reliable reward computation
The Execution Trace Architecture
An Execution Trace is structured as a chronological sequence of interactions, where each step contains complete information about what the agent saw, did, and received back from the environment.
Trace = List[Turn]
Anatomy of a Trace
Each Execution Trace is composed of a list of Turn objects. Here are the exact TypedDict definitions from toolbrain/core_types.py:
# From toolbrain/core_types.py - The exact data structures
from typing import TypedDict, Optional, Any, List


class ParsedCompletion(TypedDict):
    """
    Structured interpretation of the model's raw output.
    Represents how the framework parses and understands the model's response.
    """
    thought: Optional[str]        # Model's reasoning or explanation
    tool_code: Optional[str]      # Code or command to execute
    final_answer: Optional[str]   # Direct response to the user


class Turn(TypedDict):
    """
    Complete record of a single agent-environment interaction.
    Captures everything that happened in one step of execution.
    """
    prompt_for_model: str                  # Exact input sent to the LLM
    model_completion: str                  # Raw, unprocessed LLM output
    parsed_completion: ParsedCompletion    # Framework's interpretation
    tool_output: Optional[str]             # String representation of tool result
    action_output: Optional[Any]           # Original Python object from tool
    formatted_conversation: Optional[str]  # Pre-formatted conversation history


# An ExecutionTrace is simply a list of these turns
Trace = List[Turn]
The Flow of Information
Each Turn captures one complete cycle of interaction: the agent receives a prompt, generates a response, that response gets parsed and potentially executed as a tool call, and the result feeds back into the next turn.
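The cycle described above can be sketched as a small rollout loop. This is a minimal illustration under stated assumptions, not ToolBrain's actual implementation: `collect_trace`, `scripted_model`, `toy_parse`, and `toy_tool` are hypothetical names, the prompt template is invented for the example, and the TypedDicts are repeated only so the block runs on its own.

```python
from typing import Any, Callable, List, Optional, TypedDict


class ParsedCompletion(TypedDict):
    thought: Optional[str]
    tool_code: Optional[str]
    final_answer: Optional[str]


class Turn(TypedDict):
    prompt_for_model: str
    model_completion: str
    parsed_completion: ParsedCompletion
    tool_output: Optional[str]
    action_output: Optional[Any]
    formatted_conversation: Optional[str]


def collect_trace(
    task: str,
    model: Callable[[str], str],
    parse: Callable[[str], ParsedCompletion],
    run_tool: Callable[[str], Any],
    max_turns: int = 5,
) -> List[Turn]:
    """Run the prompt -> completion -> parse -> tool loop, recording every turn."""
    trace: List[Turn] = []
    prompt = f"User: {task}"  # hypothetical prompt template
    for _ in range(max_turns):
        completion = model(prompt)   # raw, unprocessed output
        parsed = parse(completion)   # structured interpretation
        action_output: Optional[Any] = None
        tool_output: Optional[str] = None
        if parsed["tool_code"] is not None:
            action_output = run_tool(parsed["tool_code"])  # original Python object
            tool_output = str(action_output)               # what the model sees next
        conversation = f"{prompt}\n\nAssistant: {completion}"
        if tool_output is not None:
            conversation += f"\n\nTool output: {tool_output}"
        trace.append(Turn(
            prompt_for_model=prompt,
            model_completion=completion,
            parsed_completion=parsed,
            tool_output=tool_output,
            action_output=action_output,
            formatted_conversation=conversation,
        ))
        if parsed["final_answer"] is not None:
            break
        prompt = conversation  # the result feeds back into the next turn
    return trace


# Hypothetical stand-ins for a real LLM, parser, and tool.
def scripted_model(prompt: str) -> str:
    return "Final Answer: 4" if "Tool output:" in prompt else "Code: 2 + 2"


def toy_parse(completion: str) -> ParsedCompletion:
    if completion.startswith("Code: "):
        return {"thought": None, "tool_code": completion[len("Code: "):], "final_answer": None}
    if completion.startswith("Final Answer: "):
        return {"thought": None, "tool_code": None, "final_answer": completion[len("Final Answer: "):]}
    return {"thought": completion, "tool_code": None, "final_answer": None}


def toy_tool(code: str) -> int:
    # Handles only "a + b" integer sums; stands in for a real interpreter.
    left, right = code.split("+")
    return int(left) + int(right)


trace = collect_trace("What is 2+2?", scripted_model, toy_parse, toy_tool)
```

Note that the second turn's `prompt_for_model` is built from the first turn's recorded strings, so the trace alone is enough to replay the interaction.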
Why Each Field Matters: The Complete Picture
Every field in the Turn structure serves a critical purpose for reliable RL training. Let's examine why each piece of information is essential:
prompt_for_model
The ground truth of what the LLM saw. Essential for calculating log-probabilities during training.
Why Critical:
- RL algorithms need exact input to compute gradients correctly
- Enables precise loss calculation for policy optimization
- Required for replay and reproducibility
- Debugging: "What exactly did the model see when it made this decision?"
model_completion
The raw, unaltered text the LLM generated. Preserves 100% of the output before any parsing.
Why Critical:
- Contains the model's exact reasoning and thought process
- Needed for computing action probabilities in RL training
- Reveals parsing errors and edge cases
- Debugging: "What did the model actually say vs. what we interpreted?"
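To see why the exact prompt and the raw completion both matter for log-probabilities, here is a toy sketch. It assumes a hypothetical bigram lookup table in place of a real LLM and whitespace tokens in place of a real tokenizer; `BIGRAM_PROBS`, `completion_log_prob`, and `reinforce_loss` are illustrative names, and the REINFORCE-style loss term is a stand-in for whichever policy-optimization algorithm is actually used.

```python
import math
from typing import Dict, Tuple

# Hypothetical toy policy: P(next_token | previous_token) as a lookup table.
# A real LLM conditions on the whole prompt; a bigram table keeps this runnable.
BIGRAM_PROBS: Dict[Tuple[str, str], float] = {
    ("2+2?", "Answer:"): 0.9,
    ("Answer:", "4"): 0.8,
    ("Answer:", "5"): 0.1,
}
UNSEEN_PROB = 0.01  # probability assigned to any pair missing from the table


def completion_log_prob(prompt: str, completion: str) -> float:
    """Sum log P(token | previous token) over the completion tokens only."""
    prompt_tokens = prompt.split()
    previous = prompt_tokens[-1] if prompt_tokens else "<bos>"
    total = 0.0
    for token in completion.split():
        total += math.log(BIGRAM_PROBS.get((previous, token), UNSEEN_PROB))
        previous = token
    return total


def reinforce_loss(prompt: str, completion: str, reward: float) -> float:
    """REINFORCE-style loss term for one turn: -reward * log pi(completion | prompt)."""
    return -reward * completion_log_prob(prompt, completion)


turn = {"prompt_for_model": "What is 2+2?", "model_completion": "Answer: 4"}

# Log-probability computed from the strings exactly as recorded in the trace.
exact = completion_log_prob(turn["prompt_for_model"], turn["model_completion"])

# The same question with different spacing: a lossy log that "cleaned up"
# the prompt would yield a different log-probability.
reformatted = completion_log_prob("What is 2 + 2 ?", turn["model_completion"])

loss = reinforce_loss(turn["prompt_for_model"], turn["model_completion"], reward=1.0)
```

Even in this toy setting, a cosmetic change to the prompt shifts the log-probability, which is why the trace stores both strings verbatim.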
parsed_completion
The framework's structured interpretation of the raw output. Shows how the system understood the model's response.
Why Critical:
- Separates reasoning (`thought`) from action (`tool_code`) from response (`final_answer`)
- Enables fine-grained reward computation on different aspects
- Identifies parsing successes and failures
- Debugging: "How did the framework interpret this output?"
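A parser producing this structure might look like the sketch below. It is not ToolBrain's parser: `parse_completion` is a hypothetical name, the `Action:`, `Code:`, and `Final Answer:` markers are taken from the example completions in this guide, and the rule that a tool-calling turn records no final answer is an assumption made to match the example trace.

```python
import re
from typing import Optional, TypedDict


class ParsedCompletion(TypedDict):
    thought: Optional[str]
    tool_code: Optional[str]
    final_answer: Optional[str]


# Assumed markers, taken from the example completions in this guide.
_MARKER = re.compile(r"^(?:Action|Code|Final Answer):", re.MULTILINE)
_CODE = re.compile(r"^Code:[ \t]*(.*?)(?=^Final Answer:|\Z)", re.MULTILINE | re.DOTALL)
_FINAL = re.compile(r"^Final Answer:[ \t]*(.*)", re.MULTILINE | re.DOTALL)


def _clean(text: Optional[str]) -> Optional[str]:
    """Strip whitespace and map empty strings to None."""
    if text is None:
        return None
    text = text.strip()
    return text or None


def parse_completion(raw: str) -> ParsedCompletion:
    """Split a raw completion into thought, tool code, and final answer."""
    marker = _MARKER.search(raw)
    code = _CODE.search(raw)
    final = _FINAL.search(raw)
    tool_code = _clean(code.group(1)) if code else None
    # Assumption: a turn that calls a tool is still in progress, so any
    # "Final Answer:" text on that turn is not recorded as the final answer.
    final_answer = None if tool_code else (_clean(final.group(1)) if final else None)
    return {
        # Everything before the first marker is treated as the thought.
        "thought": _clean(raw[: marker.start()] if marker else raw),
        "tool_code": tool_code,
        "final_answer": final_answer,
    }


raw = (
    "I need to calculate the square root of 144. Let me use Python for this.\n\n"
    "Action: python_code\n"
    "Code: import math\nresult = math.sqrt(144)\n"
    "print(f'The square root of 144 is {result}')\n\n"
    "Final Answer: Let me calculate this for you."
)
parsed = parse_completion(raw)
```

Because `model_completion` is stored next to `parsed_completion`, any disagreement between the two (for example, the dropped "Final Answer" text here) remains visible in the trace.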
tool_output
The string representation of the tool's result. This is what the LLM sees in the next turn.
Why Critical:
- Shows exactly what feedback the model received
- Essential for multi-turn conversation training
- Enables reward functions based on tool interaction quality
- Debugging: "What did the tool actually return to the model?"
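As a rough sketch of how `tool_output` is used, the block below builds the next prompt from a recorded turn and scores tool interaction quality. The `Assistant:` and `Tool output:` labels follow the example trace in this guide, but the real template may differ; `build_next_prompt` and `tool_success_rate` are hypothetical helpers, and the check for "Traceback" or "Error" is a deliberately crude heuristic.

```python
from typing import Any, Dict, List

Turn = Dict[str, Any]  # simplified stand-in for the Turn TypedDict


def build_next_prompt(turn: Turn) -> str:
    """Append the previous completion and the tool's string output to the prompt."""
    parts = [turn["prompt_for_model"], f"Assistant: {turn['model_completion']}"]
    if turn.get("tool_output") is not None:
        parts.append(f"Tool output: {turn['tool_output']}")
    return "\n\n".join(parts)


def tool_success_rate(trace: List[Turn]) -> float:
    """Fraction of tool calls whose string output shows no error text."""
    outputs = [t["tool_output"] for t in trace if t.get("tool_output") is not None]
    if not outputs:
        return 0.0
    clean = [o for o in outputs if "Traceback" not in o and "Error" not in o]
    return len(clean) / len(outputs)


good_turn = {
    "prompt_for_model": "User: What is the square root of 144?",
    "model_completion": "Code: import math; print(math.sqrt(144))",
    "tool_output": "12.0",
    "action_output": 12.0,
}
bad_turn = {
    "prompt_for_model": "User: What is the square root of 144?",
    "model_completion": "Code: print(sqrt(144))",
    "tool_output": "Traceback (most recent call last): NameError: name 'sqrt' is not defined",
    "action_output": None,
}

next_prompt = build_next_prompt(good_turn)
rate = tool_success_rate([good_turn, bad_turn])
```

The model only ever sees the string `"12.0"`, not the float `12.0`, which is why the trace keeps the string form as a separate field.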
action_output
The original Python object returned by the tool. Crucial for precise, rule-based reward functions that work with structured data.
Why Critical:
- Enables type-safe reward computation (e.g., check if result is a float)
- Allows complex data structure analysis
- Perfect for mathematical accuracy checking
- Debugging: "What was the actual structured result, not just the string?"
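A type-safe check on `action_output` could be sketched as follows. `numeric_result_reward` is a hypothetical helper, and the 1.0 / 0.5 / 0.0 scoring scheme is an arbitrary choice made for illustration.

```python
import math
from typing import Any


def numeric_result_reward(action_output: Any, expected: float, tol: float = 1e-6) -> float:
    """Score a tool result by type first, then by numeric accuracy.

    1.0: numeric and within `tol` of `expected`
    0.5: numeric but wrong
    0.0: not a number (strings, None, bool, NaN, containers)
    """
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(action_output, bool) or not isinstance(action_output, (int, float)):
        return 0.0
    if isinstance(action_output, float) and math.isnan(action_output):
        return 0.0
    if math.isclose(action_output, expected, rel_tol=0.0, abs_tol=tol):
        return 1.0
    return 0.5


correct = numeric_result_reward(12.0, expected=12)        # float result, right value
wrong = numeric_result_reward(11.0, expected=12)          # float result, wrong value
stringly = numeric_result_reward("12.0", expected=12)     # string, not a number
```

A string comparison would treat `"12"` and `"12.0"` as different answers; working on the original object avoids that ambiguity.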
formatted_conversation
Pre-formatted conversation history. Used for efficient tokenization during the learning phase.
Why Critical:
- Optimizes training performance by pre-computing conversation formats
- Ensures consistent tokenization across training steps
- Enables efficient batch processing during RL updates
- Debugging: "How was the conversation formatted for the model?"
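The batching idea can be illustrated with a toy sketch. It assumes a whitespace tokenizer in place of a real one, and `toy_tokenize` and `build_batch` are hypothetical names; the point is only that the stored string is tokenized as-is rather than being re-rendered from the individual fields at training time.

```python
from typing import Any, Dict, List

Turn = Dict[str, Any]  # simplified stand-in for the Turn TypedDict


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Whitespace tokenizer standing in for a real one; id 0 is reserved for padding."""
    return [vocab.setdefault(token, len(vocab) + 1) for token in text.split()]


def build_batch(trace: List[Turn], pad_id: int = 0) -> List[List[int]]:
    """Tokenize each stored conversation and right-pad to a rectangular batch."""
    vocab: Dict[str, int] = {}
    rows = [
        toy_tokenize(turn["formatted_conversation"], vocab)
        for turn in trace
        if turn.get("formatted_conversation")
    ]
    width = max((len(row) for row in rows), default=0)
    return [row + [pad_id] * (width - len(row)) for row in rows]


trace = [
    {"formatted_conversation": "User: hi\n\nAssistant: hello there"},
    {"formatted_conversation": "User: hi"},
]
batch = build_batch(trace)
```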
A Complete Execution Trace Example
Here's what a real Execution Trace looks like for a simple math problem:
# Example: Agent solving "What is the square root of 144?"
trace = [
    # Turn 1: Initial reasoning and tool call
    {
        "prompt_for_model": "User: What is the square root of 144?\n\nYou have access to a Python interpreter. Use it to solve this problem.",
        "model_completion": "I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')\n\nFinal Answer: Let me calculate this for you.",
        "parsed_completion": {
            "thought": "I need to calculate the square root of 144. Let me use Python for this.",
            "tool_code": "import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')",
            "final_answer": None  # No final answer yet, still working
        },
        "tool_output": "The square root of 144 is 12.0",
        "action_output": {
            "stdout": "The square root of 144 is 12.0\n",
            "stderr": "",
            "return_code": 0,
            "variables": {"result": 12.0}
        },
        "formatted_conversation": "User: What is the square root of 144?\n\nAssistant: I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')\n\nTool output: The square root of 144 is 12.0"
    },
    # Turn 2: Providing the final answer
    {
        "prompt_for_model": "User: What is the square root of 144?\n\nYou have access to a Python interpreter. Use it to solve this problem.\n\nPrevious interaction:\nAssistant: I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')\n\nTool output: The square root of 144 is 12.0\n\nNow provide your final answer:",
        "model_completion": "Based on the calculation, the square root of 144 is 12.\n\nFinal Answer: The square root of 144 is 12.",
        "parsed_completion": {
            "thought": "Based on the calculation, the square root of 144 is 12.",
            "tool_code": None,  # No tool call in this turn
            "final_answer": "The square root of 144 is 12."
        },
        "tool_output": None,    # No tool was called
        "action_output": None,  # No tool execution
        "formatted_conversation": "User: What is the square root of 144?\n\nAssistant: Based on the calculation, the square root of 144 is 12.\n\nFinal Answer: The square root of 144 is 12."
    }
]
# This complete trace enables:
# 1. Perfect reproduction of the interaction
# 2. Accurate reward computation (both on text and structured data)
# 3. Reliable RL training with correct loss calculation
# 4. Complete debugging visibility
# 5. Quality assessment at every step
The Benefits of Data Fidelity
The high-fidelity Execution Trace design delivers three critical benefits that directly translate to better trained agents:
Reliable Training
Accurate loss computation leads to stable, effective learning
Ground Truth Data
- Exact input-output pairs for loss calculation
- No information loss during training
- Accurate gradient computation
- Stable convergence behavior
Training Quality
- Consistent training signals
- Reduced noise in optimization
- Better sample efficiency
- Predictable learning dynamics
Result: Your agents train faster and more reliably, and reach better final performance, because the RL algorithms work from a complete, undistorted record of what actually happened.
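One way to picture "exact input-output pairs" is a helper that turns a trace into training samples. This is a sketch under assumptions: `TrainingSample` and `trace_to_samples` are hypothetical names, and giving every turn the same trajectory-level reward is a simplification rather than a description of how ToolBrain assigns credit.

```python
from typing import Any, Dict, List, NamedTuple

Turn = Dict[str, Any]  # simplified stand-in for the Turn TypedDict


class TrainingSample(NamedTuple):
    prompt: str
    completion: str
    reward: float


def trace_to_samples(trace: List[Turn], trajectory_reward: float) -> List[TrainingSample]:
    """Build one sample per turn from the exact recorded prompt and completion."""
    samples: List[TrainingSample] = []
    for turn in trace:
        completion = turn["model_completion"]
        if not completion.strip():
            continue  # an empty completion carries no learning signal
        samples.append(
            TrainingSample(turn["prompt_for_model"], completion, trajectory_reward)
        )
    return samples


trace = [
    {"prompt_for_model": "User: What is 2+2?", "model_completion": "Code: print(2 + 2)"},
    {"prompt_for_model": "User: What is 2+2?\n\nTool output: 4", "model_completion": "Final Answer: 4"},
    {"prompt_for_model": "User: ...", "model_completion": "   "},
]
samples = trace_to_samples(trace, trajectory_reward=1.0)
```

Nothing is re-rendered or re-parsed here: the strings used for training are the same strings the model saw and produced.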
Powerful Rewards
Maximum flexibility in reward design using both text and structured data
Reward Function Flexibility
# Example: Sophisticated reward function using both text and structured data
from typing import Any

# Trace is the List[Turn] alias defined in toolbrain/core_types.py


def advanced_math_reward(trace: Trace, **kwargs: Any) -> float:
    """
    Sophisticated reward function using both text and structured data.
    Rewards mathematical reasoning, correct results, and error-free execution.
    """
    reward = 0.0
    expected_result = kwargs.get("expected_result", 12.0)  # sqrt(144) = 12
    for turn in trace:
        # Reward based on tool/action output (numeric result).
        # Compare against None explicitly so a legitimate result of 0 is not skipped.
        action_output = turn.get("action_output")
        if action_output is None:
            action_output = turn.get("tool_output")
        if action_output is not None:
            try:
                result = float(str(action_output).strip())
                # Reward for getting numeric result
                reward += 0.3
                # Bonus for correct mathematical result
                if abs(result - expected_result) < 0.001:
                    reward += 0.5
            except (ValueError, TypeError):
                # Not a numeric result, check for text content
                output_text = str(action_output).lower()
                if "error" not in output_text:
                    reward += 0.1  # No errors
                if any(word in output_text for word in ["sqrt", "square root", "math"]):
                    reward += 0.1  # Relevant mathematical terms
        # Reward based on reasoning quality (parsed_completion)
        parsed = turn.get("parsed_completion", {})
        if parsed:
            thought = parsed.get("thought", "")
            if thought:
                thought_lower = str(thought).lower()
                if any(word in thought_lower for word in ["calculate", "math", "sqrt", "square"]):
                    reward += 0.2  # Shows mathematical reasoning
            # Check tool code for mathematical operations
            tool_code = parsed.get("tool_code", "")
            if tool_code and any(func in tool_code for func in ["sqrt", "math.", "**0.5"]):
                reward += 0.15  # Uses appropriate mathematical functions
    return min(reward, 1.0)  # Cap at 1.0
What This Enables
- Type-safe rewards: Check if results are numbers, not just strings
- Multi-aspect evaluation: Reward reasoning, execution, and results separately
- Complex logic: Implement sophisticated success criteria
- Domain-specific metrics: Use structured data for precise evaluation
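Multi-aspect evaluation can be sketched as a function that returns one score per aspect instead of a single number. `reward_breakdown` is a hypothetical helper, and it assumes `action_output` has the dictionary shape used in the example trace earlier in this guide (`return_code`, `variables`); other tools may return different objects.

```python
from typing import Any, Dict, List

Turn = Dict[str, Any]  # simplified stand-in for the Turn TypedDict


def reward_breakdown(trace: List[Turn], expected: float) -> Dict[str, float]:
    """Score reasoning, execution, and result separately across a trace."""
    scores = {"reasoning": 0.0, "execution": 0.0, "result": 0.0}
    for turn in trace:
        parsed = turn.get("parsed_completion") or {}
        if parsed.get("thought"):
            scores["reasoning"] = 1.0  # the model explained itself at least once
        output = turn.get("action_output")
        if isinstance(output, dict):
            if output.get("return_code") == 0:
                scores["execution"] = 1.0  # the tool ran without failing
            value = (output.get("variables") or {}).get("result")
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and abs(value - expected) < 1e-6:
                scores["result"] = 1.0  # structured value matches the target
    return scores


trace = [
    {
        "parsed_completion": {
            "thought": "I need to calculate the square root of 144.",
            "tool_code": "import math\nresult = math.sqrt(144)",
            "final_answer": None,
        },
        "action_output": {"stdout": "", "stderr": "", "return_code": 0, "variables": {"result": 12.0}},
    },
    {
        "parsed_completion": {"thought": None, "tool_code": None, "final_answer": "12"},
        "action_output": None,
    },
]
scores = reward_breakdown(trace, expected=12.0)
```

Keeping the components separate makes it easier to see which aspect a low total reward comes from; they can be combined with any weighting afterwards.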
Effortless Debugging
Complete visibility into every step of agent execution
Investigation Capabilities
- See exactly what the model received as input
- Compare raw output vs. parsed interpretation
- Track tool execution and results
- Analyze conversation flow and context
- Identify parsing errors and edge cases
Common Debugging Scenarios
- "Why did the agent choose this action?"
- "What went wrong in this tool call?"
- "How was the model's output interpreted?"
- "What context was missing from this turn?"
- "Why did the reward function give this score?"
Debugging Example
# Debugging a failed interaction
# Trace is the List[Turn] alias defined in toolbrain/core_types.py
def debug_trace(trace: Trace):
    """
    Debug a trace by printing detailed information about each turn.
    Compatible with both SmolAgent (action_output) and LangChain (tool_output) formats.
    """
    if not trace:
        print("Empty trace!")
        return
    print(f"Debugging trace with {len(trace)} turns")
    for i, turn in enumerate(trace):
        print(f"\n=== Turn {i+1} ===")
        if not isinstance(turn, dict):
            print("ERROR: Turn is not a dictionary!")
            continue
        # Safe access to core fields
        prompt = turn.get('prompt_for_model', 'N/A')
        completion = turn.get('model_completion', 'N/A')
        print(f"Model saw: {str(prompt)[:100]}{'...' if len(str(prompt)) > 100 else ''}")
        print(f"Model said: {str(completion)[:100]}{'...' if len(str(completion)) > 100 else ''}")
        # Safe access to parsed completion
        parsed = turn.get('parsed_completion', {})
        if parsed:
            tool_code = parsed.get('tool_code')
            thought = parsed.get('thought')
            final_answer = parsed.get('final_answer')
            if tool_code:
                print(f"Parsed tool code: {tool_code}")
            if thought:
                thought_preview = str(thought)[:150]
                print(f"Agent thought: {thought_preview}{'...' if len(str(thought)) > 150 else ''}")
            if final_answer:
                print(f"Final answer: {final_answer}")
        # Check tool/action outputs (both formats).
        # Compare against None explicitly so a legitimate result of 0 is still shown.
        action_output = turn.get('action_output')
        if action_output is None:
            action_output = turn.get('tool_output')
        if action_output is not None:
            print(f"Tool result: {str(action_output)[:200]}{'...' if len(str(action_output)) > 200 else ''}")
        # Check for issues
        no_output = action_output is None or action_output == ""
        if parsed and parsed.get('tool_code') and no_output:
            print("WARNING: Tool code present but no output!")
        if action_output is not None and "error" in str(action_output).lower():
            print("ERROR: Tool execution failed!")
        # Check for empty responses
        if not completion or str(completion).strip() == "":
            print("WARNING: Empty model completion!")
    print(f"\nDebug complete - analyzed {len(trace)} turns")
The Performance Impact
High-fidelity execution traces translate directly into better agent performance:
Training Accuracy: 95%+
Accurate loss computation leads to reliable learning
Debug Speed: 10x
Faster issue identification and resolution
Reward Precision: 100%
Complete data enables perfect reward computation
The Virtuous Cycle
Better data → More reliable training → Better agents → Easier debugging → Faster iteration → Even better agents. The high-fidelity execution trace creates a positive feedback loop that accelerates your entire development process.
The Foundation of Reliable RL
The Execution Trace isn't just a nice-to-have feature - it's the foundational innovation that makes reliable reinforcement learning for agents possible. By capturing every detail of agent-environment interactions with perfect fidelity, ToolBrain ensures that:
What You Get
- Accurate, stable RL training
- Flexible, powerful reward functions
- Complete debugging visibility
- Reproducible experiments
- Reliable performance metrics
Key Benefits
- High-quality training data
- Clear behavior visibility
- Enhanced reward computation
- Efficient debugging workflows
- Consistent performance metrics
Ready to Experience High-Fidelity Training?
Every ToolBrain agent automatically generates high-fidelity execution traces. No configuration needed, no extra complexity - just reliable, debuggable, high-quality RL training out of the box.