ToolBrain

ToolBrain is a lightweight framework for training your existing tool-using agents with reinforcement learning (RL) through a small, high-level API.

Get Started in Minutes

The code below is a complete, runnable example. It trains a simple agent to use a multiply tool. Copy, paste, and run it to see ToolBrain in action.

quickstart_example.py
# 1. Imports and Component Definition
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match
from smolagents import CodeAgent, TransformersModel, tool

@tool
def multiply(a: int, b: int) -> int:
    """
    Multiply two integers.

    Args:
        a (int): First factor.
        b (int): Second factor.

    Returns:
        int: Product of a and b.
    """
    return a * b

# Define a standard agent (CPU-compatible)
model = TransformersModel("Qwen/Qwen2.5-0.5B-Instruct", max_new_tokens=128)
agent = CodeAgent(model=model, tools=[multiply], max_steps=1)

# 2. Initialize the Brain with your agent and a built-in reward
brain = Brain(
    agent,
    reward_func=reward_exact_match,
    algorithm="GRPO"
)

# 3. Define a task and start training!
dataset = [{"query": "What is 8 multiplied by 7?", "gold_answer": "56"}]
brain.train(dataset, num_iterations=10)

# 4. Save your trained agent
brain.save("./my_first_trained_agent")

The Power Behind the Simplicity

🧠

Unified API & Architecture

The Brain class abstracts away the entire RL loop: you focus on your agent while ToolBrain handles the training.

🎯

Flexible, Hybrid Rewards

Use built-in rewards such as reward_exact_match, write your own reward functions in plain Python, or score agent outputs with LLM-as-a-Judge.
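As an illustration of what a hand-written reward might look like, here is a minimal sketch that gives full credit when the last number in the agent's answer matches the gold answer. The function name `reward_numeric_match`, its parameters, and its signature are hypothetical; the exact signature ToolBrain expects for `reward_func` is not shown on this page, so treat this as standalone scoring logic you would adapt to it.

```python
import re


def reward_numeric_match(final_answer: str, gold_answer: str, tol: float = 1e-6) -> float:
    """Return 1.0 if the last number in final_answer equals gold_answer, else 0.0.

    Hypothetical helper: the signature is an assumption, not ToolBrain's API.
    """
    # Collect every integer or decimal number that appears in the answer text.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", final_answer)
    if not numbers:
        return 0.0
    try:
        target = float(gold_answer)
    except ValueError:
        # A non-numeric gold answer cannot be matched by this reward.
        return 0.0
    # Compare the last number mentioned, which is usually the final result.
    return 1.0 if abs(float(numbers[-1]) - target) <= tol else 0.0


score = reward_numeric_match("The product is 56.", "56")
```

Unlike a strict string comparison, this tolerates surrounding text and formatting differences such as "56.0" versus "56".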

🔧

Intelligent Tool & Data Management

Automatically select relevant tools with Tool Retrieval and generate new training tasks with Zero-Learn.

⚡

Efficient & Advanced Training

Out-of-the-box support for GRPO, DPO, and Knowledge Distillation, all accelerated by Unsloth and QLoRA.
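To give a feel for what GRPO does with rewards, here is a minimal sketch of its group-relative advantage step: several rollouts of the same query are scored, and each reward is normalized against the group's mean and standard deviation. This illustrates the general technique only, not ToolBrain's internal implementation. The function name `group_relative_advantages` and the `eps` parameter are hypothetical, and the population standard deviation is used here even though implementations differ on that detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption for this sketch
    # eps avoids division by zero when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of one query: two answered correctly, two did not.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat the group average receive a positive advantage and are reinforced; those below it receive a negative one. When all rollouts score the same, every advantage is zero and the group contributes no learning signal.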


Ready to get started?

Check out the code, contribute, or star the repository on GitHub.