ToolBrain

Conceptual Guide: The Execution Trace

Understanding ToolBrain's high-fidelity execution traces - the critical innovation that ensures reliable RL training by capturing every detail of agent-environment interactions.

The Foundation: Understanding Data Quality in RL

A basic principle of reinforcement learning sets the ceiling for your trained models: RL training can only be as good as the data it learns from.

The Data Quality Challenge

Many agent systems capture only the final, parsed actions, missing crucial information about what the model actually saw and said. This incomplete data creates training challenges:

Information Gaps

  • Raw model outputs are simplified
  • Parsing details are abstracted away
  • Context information gets condensed
  • Tool execution details are reduced

Training Opportunities

  • More precise loss calculations
  • Cleaner training signals
  • Better convergence patterns
  • Enhanced reward computation

Real-World Example: What Gets Lost

Consider an agent trying to solve a math problem. Here's what typically gets logged vs. what actually happened:

Standard Logging
{
  "action": "python_code",
  "code": "print(2 + 2)",
  "result": "4"
}

Typical approach: Structured but incomplete information

High-Fidelity Trace
Model saw: "Solve: What is 2+2?"
Model output: "I need to calculate this.
Action: python_code
Code: print(2 + 2)
Final answer: The result is 4"
Tool executed: Python interpreter
Raw tool output: "4\n"
Parsed result: {"result": "4"}

Complete information enables accurate training and debugging

The ToolBrain Advantage

With complete, high-fidelity execution traces, your model learns from an accurate view of what actually happened. This leads to better performance, more reliable behavior, and easier debugging. Complete visibility enables accurate training.

The Solution: The High-Fidelity Execution Trace

ToolBrain solves this fundamental problem with the Execution Trace - a definitive, structured record of an agent's interaction with its environment that captures every single detail of what happened during execution.

High-Fidelity Design Principles

Complete Capture

  • Raw model inputs and outputs
  • Parsing results and interpretations
  • Tool execution details
  • Structured data objects
  • Conversation history

Perfect Fidelity

  • No information loss or distortion
  • Exact reproduction capability
  • Ground truth for training
  • Complete debugging visibility
  • Reliable reward computation

The Execution Trace Architecture

An Execution Trace is structured as a chronological sequence of interactions, where each step contains complete information about what the agent saw, did, and received back from the environment.

Structure: Trace = List[Turn]

Anatomy of a Trace

Each Execution Trace is composed of a list of Turn objects. Here are the exact TypedDict definitions from toolbrain/core_types.py:

# From toolbrain/core_types.py - The exact data structures

from typing import TypedDict, Optional, Any, List

class ParsedCompletion(TypedDict):
    """
    Structured interpretation of the model's raw output.
    Represents how the framework parses and understands the model's response.
    """
    thought: Optional[str]        # Model's reasoning or explanation
    tool_code: Optional[str]      # Code or command to execute
    final_answer: Optional[str]   # Direct response to the user

class Turn(TypedDict):
    """
    Complete record of a single agent-environment interaction.
    Captures everything that happened in one step of execution.
    """
    prompt_for_model: str                    # Exact input sent to the LLM
    model_completion: str                    # Raw, unprocessed LLM output  
    parsed_completion: ParsedCompletion      # Framework's interpretation
    tool_output: Optional[str]               # String representation of tool result
    action_output: Optional[Any]             # Original Python object from tool
    formatted_conversation: Optional[str]    # Pre-formatted conversation history

# An ExecutionTrace is simply a list of these turns
Trace = List[Turn]

The Flow of Information

Each Turn captures one complete cycle of interaction: the agent receives a prompt, generates a response, that response gets parsed and potentially executed as a tool call, and the result feeds back into the next turn.
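
The cycle described above can be sketched as a small rollout loop. This is a hypothetical illustration, not ToolBrain's actual implementation: `collect_trace`, `fake_model`, `fake_parse`, and `fake_tool` are invented names, and the way the next prompt is assembled is an assumption. Only the Turn field names come from the definitions above.

```python
from typing import Any, Callable, List, Optional, TypedDict


class ParsedCompletion(TypedDict):
    thought: Optional[str]
    tool_code: Optional[str]
    final_answer: Optional[str]


class Turn(TypedDict):
    prompt_for_model: str
    model_completion: str
    parsed_completion: ParsedCompletion
    tool_output: Optional[str]
    action_output: Optional[Any]
    formatted_conversation: Optional[str]


def collect_trace(
    task: str,
    model: Callable[[str], str],
    parse: Callable[[str], ParsedCompletion],
    run_tool: Callable[[str], Any],
    max_turns: int = 5,
) -> List[Turn]:
    """Hypothetical rollout loop: one Turn per prompt/response/tool cycle."""
    trace: List[Turn] = []
    prompt = task
    for _ in range(max_turns):
        completion = model(prompt)  # raw text, stored untouched
        parsed = parse(completion)
        action_output: Optional[Any] = None
        tool_output: Optional[str] = None
        if parsed["tool_code"] is not None:
            action_output = run_tool(parsed["tool_code"])  # original object
            tool_output = str(action_output)               # what the model sees
        trace.append(Turn(
            prompt_for_model=prompt,
            model_completion=completion,
            parsed_completion=parsed,
            tool_output=tool_output,
            action_output=action_output,
            formatted_conversation=None,
        ))
        if parsed["final_answer"] is not None:
            break
        # The tool result feeds back into the next turn's prompt (assumed layout).
        prompt = f"{prompt}\n\nAssistant: {completion}\n\nTool output: {tool_output}"
    return trace


# Stand-ins for a real model, parser, and tool (all hypothetical).
def fake_model(prompt: str) -> str:
    return "Final Answer: 4" if "Tool output:" in prompt else "Code: 2 + 2"


def fake_parse(text: str) -> ParsedCompletion:
    if text.startswith("Final Answer:"):
        answer = text[len("Final Answer:"):].strip()
        return {"thought": None, "tool_code": None, "final_answer": answer}
    code = text[len("Code:"):].strip()
    return {"thought": None, "tool_code": code, "final_answer": None}


def fake_tool(code: str) -> Any:
    return eval(code, {"__builtins__": {}})  # toy evaluator, arithmetic only


demo_trace = collect_trace("What is 2+2?", fake_model, fake_parse, fake_tool)
print(len(demo_trace), demo_trace[-1]["parsed_completion"]["final_answer"])
```

Note that the loop stores both the original tool result (`action_output`) and its string form (`tool_output`), so nothing has to be reconstructed after the fact.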

Why Each Field Matters: The Complete Picture

Every field in the Turn structure serves a critical purpose for reliable RL training. Let's examine why each piece of information is essential:


prompt_for_model

The ground truth of what the LLM saw. Essential for calculating log-probabilities during training.

Why Critical:

  • RL algorithms need exact input to compute gradients correctly
  • Enables precise loss calculation for policy optimization
  • Required for replay and reproducibility
  • Debugging: "What exactly did the model see when it made this decision?"
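
To make the log-probability point concrete, here is a minimal sketch of how a stored prompt and completion could be turned into a training example in which only the completion tokens contribute to the loss. Whitespace splitting stands in for a real tokenizer, the per-token log-probabilities are made-up numbers, and `build_training_example` and `sequence_log_prob` are hypothetical helpers, not ToolBrain functions.

```python
from typing import List, Tuple


def build_training_example(prompt: str, completion: str) -> Tuple[List[str], List[int]]:
    """Return (tokens, loss_mask); the mask is 1 only for completion tokens.

    Whitespace splitting is a stand-in for a real tokenizer (assumption).
    """
    prompt_tokens = prompt.split()
    completion_tokens = completion.split()
    tokens = prompt_tokens + completion_tokens
    loss_mask = [0] * len(prompt_tokens) + [1] * len(completion_tokens)
    return tokens, loss_mask


def sequence_log_prob(token_log_probs: List[float], loss_mask: List[int]) -> float:
    """Sum per-token log-probabilities over the completion tokens only."""
    return sum(lp for lp, keep in zip(token_log_probs, loss_mask) if keep == 1)


turn = {
    "prompt_for_model": "User: What is 2+2?",
    "model_completion": "The answer is 4",
}
tokens, mask = build_training_example(turn["prompt_for_model"], turn["model_completion"])

# Made-up per-token log-probs; a real trainer would get these from the policy model.
fake_log_probs = [-0.5] * len(tokens)
completion_log_prob = sequence_log_prob(fake_log_probs, mask)
print(mask, completion_log_prob)
```

If the stored prompt differed from what the model actually saw, the boundary between masked and scored tokens would shift, and the computed log-probability would no longer correspond to the sampled action.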

model_completion

The raw, unaltered text the LLM generated. Preserves 100% of the output before any parsing.

Why Critical:

  • Contains the model's exact reasoning and thought process
  • Needed for computing action probabilities in RL training
  • Reveals parsing errors and edge cases
  • Debugging: "What did the model actually say vs. what we interpreted?"

parsed_completion

The framework's structured interpretation of the raw output. Shows how the system understood the model's response.

Why Critical:

  • Separates reasoning (`thought`) from action (`tool_code`) from response (`final_answer`)
  • Enables fine-grained reward computation on different aspects
  • Identifies parsing successes and failures
  • Debugging: "How did the framework interpret this output?"
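
As an illustration of how a raw completion might be split into these three fields, the sketch below parses the "Action: / Code: / Final Answer:" layout used in this guide's examples. The layout and the `parse_completion` helper are assumptions for illustration; ToolBrain's real parser is not shown in this guide and may work differently.

```python
import re
from typing import Optional, TypedDict


class ParsedCompletion(TypedDict):
    thought: Optional[str]
    tool_code: Optional[str]
    final_answer: Optional[str]


def parse_completion(raw: str) -> ParsedCompletion:
    """Toy parser for the 'Action: / Code: / Final Answer:' layout (assumed format)."""
    final_match = re.search(r"Final Answer:\s*(.*)", raw, flags=re.DOTALL)
    # Everything before "Final Answer:" may hold a thought and a tool call.
    body = raw[: final_match.start()] if final_match else raw
    code_match = re.search(r"Code:\s*(.*)", body, flags=re.DOTALL)
    before_code = body[: code_match.start()] if code_match else body
    # Text before the first "Action:" marker is treated as the thought.
    thought = before_code.split("Action:")[0].strip()
    return {
        "thought": thought or None,
        "tool_code": code_match.group(1).strip() if code_match else None,
        "final_answer": final_match.group(1).strip() if final_match else None,
    }


answer_only = parse_completion(
    "Based on the calculation, the square root of 144 is 12.\n\n"
    "Final Answer: The square root of 144 is 12."
)
tool_call = parse_completion(
    "Let me use Python.\n\nAction: python_code\nCode: print(1)"
)
print(answer_only)
print(tool_call)
```

Keeping `model_completion` next to this parsed form means a parser bug can be diagnosed later by re-running the parser on the stored raw text.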

tool_output

The string representation of the tool's result. This is what the LLM sees in the next turn.

Why Critical:

  • Shows exactly what feedback the model received
  • Essential for multi-turn conversation training
  • Enables reward functions based on tool interaction quality
  • Debugging: "What did the tool actually return to the model?"

action_output

The original Python object returned by the tool. Crucial for precise, rule-based reward functions that work with structured data.

Why Critical:

  • Enables type-safe reward computation (e.g., check if result is a float)
  • Allows complex data structure analysis
  • Perfect for mathematical accuracy checking
  • Debugging: "What was the actual structured result, not just the string?"
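
A type-aware check of this kind could look like the sketch below. It assumes, for illustration only, that a tool returns either a bare number or a dictionary with a "variables" mapping, as in the example trace in this guide; `numeric_result_reward` is a hypothetical helper.

```python
import math
from typing import Any, Dict, List


def numeric_result_reward(trace: List[Dict[str, Any]], expected: float) -> float:
    """Return 1.0 if any turn's action_output holds a number close to `expected`.

    Assumed shapes: a bare number, or a dict with a "variables" mapping.
    Strings are deliberately not parsed; that is the job of text-based rewards.
    """
    for turn in trace:
        output = turn.get("action_output")
        if isinstance(output, dict):
            candidates = list(output.get("variables", {}).values())
        elif output is not None:
            candidates = [output]
        else:
            candidates = []
        for value in candidates:
            # bool is a subclass of int, so exclude it explicitly.
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and math.isclose(value, expected, abs_tol=1e-3):
                return 1.0
    return 0.0


structured = [{"action_output": {"variables": {"result": 12.0}}}]
string_only = [{"action_output": "12.0"}]
print(numeric_result_reward(structured, 12.0), numeric_result_reward(string_only, 12.0))
```

The string-only trace scores zero here on purpose: the check rewards a genuine numeric object, which is only possible because the original Python value was kept.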

formatted_conversation

Pre-formatted conversation history. Used for efficient tokenization during the learning phase.

Why Critical:

  • Optimizes training performance by pre-computing conversation formats
  • Ensures consistent tokenization across training steps
  • Enables efficient batch processing during RL updates
  • Debugging: "How was the conversation formatted for the model?"
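
The idea of formatting once and storing the result can be sketched as follows. The "User: / Assistant: / Tool output:" layout mirrors the example trace in this guide, but how ToolBrain actually builds this field is not specified here; `format_conversation` is a hypothetical helper.

```python
from typing import Any, Dict, List


def format_conversation(user_message: str, turns: List[Dict[str, Any]]) -> str:
    """Render a conversation as one string, in the layout used by this guide's examples."""
    parts = [f"User: {user_message}"]
    for turn in turns:
        parts.append(f"Assistant: {turn['model_completion']}")
        if turn.get("tool_output") is not None:
            parts.append(f"Tool output: {turn['tool_output']}")
    return "\n\n".join(parts)


turns = [
    {"model_completion": "Let me compute.", "tool_output": "12.0"},
    {"model_completion": "Final Answer: 12", "tool_output": None},
]
text = format_conversation("What is the square root of 144?", turns)
print(text)
```

Because the string is computed at collection time, every later training step tokenizes exactly the same text.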

A Complete Execution Trace Example

Here's what a real Execution Trace looks like for a simple math problem:

# Example: Agent solving "What is the square root of 144?"
trace = [
    # Turn 1: Initial reasoning and tool call
    {
        "prompt_for_model": "User: What is the square root of 144?\n\nYou have access to a Python interpreter. Use it to solve this problem.",
        
        "model_completion": "I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')",
        
        "parsed_completion": {
            "thought": "I need to calculate the square root of 144. Let me use Python for this.",
            "tool_code": "import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')",
            "final_answer": None  # No final answer yet, still working
        },
        
        "tool_output": "The square root of 144 is 12.0",
        
        "action_output": {
            "stdout": "The square root of 144 is 12.0\n",
            "stderr": "",
            "return_code": 0,
            "variables": {"result": 12.0}
        },
        
        "formatted_conversation": "User: What is the square root of 144?\n\nAssistant: I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')\n\nTool output: The square root of 144 is 12.0"
    },
    
    # Turn 2: Providing the final answer
    {
        "prompt_for_model": "User: What is the square root of 144?\n\nYou have access to a Python interpreter. Use it to solve this problem.\n\nPrevious interaction:\nAssistant: I need to calculate the square root of 144. Let me use Python for this.\n\nAction: python_code\nCode: import math\nresult = math.sqrt(144)\nprint(f'The square root of 144 is {result}')\n\nTool output: The square root of 144 is 12.0\n\nNow provide your final answer:",
        
        "model_completion": "Based on the calculation, the square root of 144 is 12.\n\nFinal Answer: The square root of 144 is 12.",
        
        "parsed_completion": {
            "thought": "Based on the calculation, the square root of 144 is 12.",
            "tool_code": None,  # No tool call in this turn
            "final_answer": "The square root of 144 is 12."
        },
        
        "tool_output": None,  # No tool was called
        "action_output": None,  # No tool execution
        
        "formatted_conversation": "User: What is the square root of 144?\n\nAssistant: Based on the calculation, the square root of 144 is 12.\n\nFinal Answer: The square root of 144 is 12."
    }
]

# This complete trace enables:
# 1. Perfect reproduction of the interaction
# 2. Accurate reward computation (both on text and structured data)
# 3. Reliable RL training with correct loss calculation
# 4. Complete debugging visibility
# 5. Quality assessment at every step
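
Once a trace is available as a list of turns, simple summaries fall out of a single pass over it. The sketch below uses an abbreviated copy of the two-turn trace above (long strings shortened); `summarize_trace` is a hypothetical helper that relies only on the documented Turn fields.

```python
from typing import Any, Dict, List, Optional


def summarize_trace(trace: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count turns and tool calls, and pick up the last final answer, if any."""
    tool_calls = 0
    final_answer: Optional[str] = None
    for turn in trace:
        parsed = turn.get("parsed_completion") or {}
        if parsed.get("tool_code") is not None:
            tool_calls += 1
        if parsed.get("final_answer") is not None:
            final_answer = parsed["final_answer"]  # keep the latest one
    return {"turns": len(trace), "tool_calls": tool_calls, "final_answer": final_answer}


# Abbreviated version of the square-root trace (prompts and completions shortened).
short_trace = [
    {
        "prompt_for_model": "User: What is the square root of 144?",
        "model_completion": "Action: python_code\nCode: import math ...",
        "parsed_completion": {
            "thought": "I need to calculate the square root of 144.",
            "tool_code": "import math\nresult = math.sqrt(144)",
            "final_answer": None,
        },
        "tool_output": "The square root of 144 is 12.0",
        "action_output": {"variables": {"result": 12.0}},
        "formatted_conversation": None,
    },
    {
        "prompt_for_model": "User: What is the square root of 144? ...",
        "model_completion": "Final Answer: The square root of 144 is 12.",
        "parsed_completion": {
            "thought": None,
            "tool_code": None,
            "final_answer": "The square root of 144 is 12.",
        },
        "tool_output": None,
        "action_output": None,
        "formatted_conversation": None,
    },
]

summary = summarize_trace(short_trace)
print(summary)
```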

The Benefits of Data Fidelity

The high-fidelity Execution Trace design delivers three critical benefits that directly translate to better trained agents:

Reliable Training

Accurate loss computation leads to stable, effective learning

Ground Truth Data

  • Exact input-output pairs for loss calculation
  • No information loss during training
  • Accurate gradient computation
  • Stable convergence behavior

Training Quality

  • Consistent training signals
  • Reduced noise in optimization
  • Better sample efficiency
  • Predictable learning dynamics

Result: Your agents can train faster and more reliably, and reach better final performance, because the RL algorithms work from a complete, undistorted record of what actually happened.

Powerful Rewards

Maximum flexibility in reward design using both text and structured data

Reward Function Flexibility

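# Example: Sophisticated reward function using both text and structured data
from typing import Any

from toolbrain.core_types import Trace  # Trace = List[Turn], as defined above


def advanced_math_reward(trace: Trace, **kwargs: Any) -> float: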
    """
    Sophisticated reward function using both text and structured data.
    Rewards mathematical reasoning, correct results, and error-free execution.
    """
    reward = 0.0
    expected_result = kwargs.get("expected_result", 12.0)  # sqrt(144) = 12
    
    for turn in trace:
        # Reward based on tool/action output (numeric result)
        action_output = turn.get("action_output")
        if action_output is None:  # fall back without discarding falsy results such as 0
            action_output = turn.get("tool_output")
        if action_output is not None:
            try:
                result = float(str(action_output).strip())
                # Reward for getting numeric result
                reward += 0.3
                
                # Bonus for correct mathematical result
                if abs(result - expected_result) < 0.001:
                    reward += 0.5
            except (ValueError, TypeError):
                # Not a numeric result, check for text content
                output_text = str(action_output).lower()
                if "error" not in output_text:
                    reward += 0.1  # No errors
                
                if any(word in output_text for word in ["sqrt", "square root", "math"]):
                    reward += 0.1  # Relevant mathematical terms
        
        # Reward based on reasoning quality (parsed_completion)
        parsed = turn.get("parsed_completion", {})
        if parsed:
            thought = parsed.get("thought", "")
            if thought:
                thought_lower = str(thought).lower()
                if any(word in thought_lower for word in ["calculate", "math", "sqrt", "square"]):
                    reward += 0.2  # Shows mathematical reasoning
            
            # Check tool code for mathematical operations
            tool_code = parsed.get("tool_code", "")
            if tool_code and any(func in tool_code for func in ["sqrt", "math.", "**0.5"]):
                reward += 0.15  # Uses appropriate mathematical functions
    
    return min(reward, 1.0)  # Cap at 1.0

What This Enables

  • Type-safe rewards: Check if results are numbers, not just strings
  • Multi-aspect evaluation: Reward reasoning, execution, and results separately
  • Complex logic: Implement sophisticated success criteria
  • Domain-specific metrics: Use structured data for precise evaluation
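
Multi-aspect evaluation can be sketched by scoring each aspect separately and combining the scores afterwards. The aspect names, the weights, and the helpers `score_aspects` and `combine` are all hypothetical choices for illustration.

```python
from typing import Any, Dict, List


def score_aspects(trace: List[Dict[str, Any]]) -> Dict[str, float]:
    """Score reasoning, execution, and answer separately (each 0.0 or 1.0)."""
    scores = {"reasoning": 0.0, "execution": 0.0, "answer": 0.0}
    for turn in trace:
        parsed = turn.get("parsed_completion") or {}
        if parsed.get("thought"):
            scores["reasoning"] = 1.0   # the model explained itself at least once
        if parsed.get("tool_code") and turn.get("action_output") is not None:
            scores["execution"] = 1.0   # a tool call produced a result
        if parsed.get("final_answer"):
            scores["answer"] = 1.0      # the model committed to an answer
    return scores


def combine(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the per-aspect scores."""
    return sum(weights[name] * value for name, value in scores.items())


example_trace = [
    {
        "parsed_completion": {"thought": "Use Python.", "tool_code": "print(12.0)", "final_answer": None},
        "action_output": 12.0,
    },
    {
        "parsed_completion": {"thought": None, "tool_code": None, "final_answer": "12"},
        "action_output": None,
    },
]
aspect_scores = score_aspects(example_trace)
total = combine(aspect_scores, {"reasoning": 0.2, "execution": 0.3, "answer": 0.5})
print(aspect_scores, total)
```

Returning the per-aspect dictionary as well as the total makes it easier to answer "why did the reward function give this score?" later.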

Effortless Debugging

Complete visibility into every step of agent execution

Investigation Capabilities

  • See exactly what the model received as input
  • Compare raw output vs. parsed interpretation
  • Track tool execution and results
  • Analyze conversation flow and context
  • Identify parsing errors and edge cases
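
Comparing raw output with its parsed interpretation can be automated. The sketch below flags turns where a marker appears in the raw text but the matching parsed field is empty. The "Final Answer:" and "Code:" markers follow the layout of this guide's examples and are an assumption about the output format; `find_parse_mismatches` is a hypothetical helper.

```python
from typing import Any, Dict, List


def find_parse_mismatches(trace: List[Dict[str, Any]]) -> List[str]:
    """Report turns whose raw completion and parsed fields disagree."""
    # Marker in the raw text -> parsed field expected to be filled (assumed layout).
    checks = {"Final Answer:": "final_answer", "Code:": "tool_code"}
    issues: List[str] = []
    for index, turn in enumerate(trace, start=1):
        raw = turn.get("model_completion") or ""
        parsed = turn.get("parsed_completion") or {}
        for marker, field in checks.items():
            if marker in raw and not parsed.get(field):
                issues.append(f"turn {index}: raw output contains '{marker}' but {field} is empty")
    return issues


suspicious_trace = [
    {
        "model_completion": "Action: python_code\nCode: print(1)\n\nFinal Answer: done",
        "parsed_completion": {"thought": None, "tool_code": "print(1)", "final_answer": None},
    },
    {
        "model_completion": "Final Answer: 1",
        "parsed_completion": {"thought": None, "tool_code": None, "final_answer": "1"},
    },
]
issues = find_parse_mismatches(suspicious_trace)
print(issues)
```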

Common Debugging Scenarios

  • "Why did the agent choose this action?"
  • "What went wrong in this tool call?"
  • "How was the model's output interpreted?"
  • "What context was missing from this turn?"
  • "Why did the reward function give this score?"

Debugging Example

# Debugging a failed interaction
from toolbrain.core_types import Trace  # Trace = List[Turn], as defined above


def debug_trace(trace: Trace) -> None:
    """
    Debug a trace by printing detailed information about each turn.
    Compatible with both SmolAgent (action_output) and LangChain (tool_output) formats.
    """
    if not trace:
        print("Empty trace!")
        return

    print(f"Debugging trace with {len(trace)} turns")

    for i, turn in enumerate(trace):
        print(f"\n=== Turn {i+1} ===")

        if not isinstance(turn, dict):
            print("ERROR: Turn is not a dictionary!")
            continue
        
        # Safe access to core fields
        prompt = turn.get('prompt_for_model', 'N/A')
        completion = turn.get('model_completion', 'N/A')
        
        print(f"Model saw: {str(prompt)[:100]}{'...' if len(str(prompt)) > 100 else ''}")
        print(f"Model said: {str(completion)[:100]}{'...' if len(str(completion)) > 100 else ''}")
        
        # Safe access to parsed completion
        parsed = turn.get('parsed_completion', {})
        if parsed:
            tool_code = parsed.get('tool_code')
            thought = parsed.get('thought')
            final_answer = parsed.get('final_answer')
            
            if tool_code:
                print(f"Parsed tool code: {tool_code}")
            
            if thought:
                thought_preview = str(thought)[:150]
                print(f"Agent thought: {thought_preview}{'...' if len(str(thought)) > 150 else ''}")
                
            if final_answer:
                print(f"Final answer: {final_answer}")
        
        # Check tool/action outputs (both formats)
        action_output = turn.get('action_output') or turn.get('tool_output')
        if action_output is not None:
            print(f"Tool result: {str(action_output)[:200]}{'...' if len(str(action_output)) > 200 else ''}")
        
        # Check for issues
        if parsed and parsed.get('tool_code') and not action_output:
            print("āš ļø  WARNING: Tool code present but no output!")
        
        if action_output and "error" in str(action_output).lower():
            print("āŒ ERROR: Tool execution failed!")
            
        # Check for empty responses
        if not completion or str(completion).strip() == "":
            print("āš ļø  WARNING: Empty model completion!")

    print(f"\nāœ… Debug complete - analyzed {len(trace)} turns")

The Performance Impact

High-fidelity execution traces translate directly into better agent performance:

Training Accuracy

95%+

Accurate loss computation leads to reliable learning

Debug Speed

10x

Faster issue identification and resolution

Reward Precision

100%

Complete data enables perfect reward computation

The Virtuous Cycle

Better data → More reliable training → Better agents → Easier debugging → Faster iteration → Even better agents. The high-fidelity execution trace creates a positive feedback loop that accelerates your entire development process.

The Foundation of Reliable RL

The Execution Trace isn't just a nice-to-have feature - it's the foundational innovation that makes reliable reinforcement learning for agents possible. By capturing every detail of agent-environment interactions with perfect fidelity, ToolBrain ensures that:

What You Get

  • Accurate, stable RL training
  • Flexible, powerful reward functions
  • Complete debugging visibility
  • Reproducible experiments
  • Reliable performance metrics

Key Benefits

  • High-quality training data
  • Clear behavior visibility
  • Enhanced reward computation
  • Efficient debugging workflows
  • Consistent performance metrics

Ready to Experience High-Fidelity Training?

Every ToolBrain agent automatically generates high-fidelity execution traces. No configuration needed, no extra complexity - just reliable, debuggable, high-quality RL training out of the box.