ToolBrain is a lightweight framework that lets you train your existing tool-using agents with RL, using a powerful and intuitive API.
Get Started in Minutes
The code below is a complete, runnable example. It trains a simple agent to use a multiply tool. Copy, paste, and run it to see ToolBrain in action.
# 1. Imports and Component Definition
from toolbrain import Brain
from toolbrain.rewards import reward_exact_match
from smolagents import CodeAgent, TransformersModel, tool
@tool
def multiply(a: int, b: int) -> int:
    """
    Multiply two integers.

    Args:
        a (int): First factor.
        b (int): Second factor.

    Returns:
        int: Product of a and b.
    """
    return a * b
# Define a standard agent (CPU-compatible)
model = TransformersModel("Qwen/Qwen2.5-0.5B-Instruct", max_new_tokens=128)
agent = CodeAgent(model=model, tools=[multiply], max_steps=1)
# 2. Initialize the Brain with your agent and a built-in reward
brain = Brain(
    agent,
    reward_func=reward_exact_match,
    algorithm="GRPO"
)
# 3. Define a task and start training!
dataset = [{"query": "What is 8 multiplied by 7?", "gold_answer": "56"}]
brain.train(dataset, num_iterations=10)
# 4. Save your trained agent
brain.save("./my_first_trained_agent")

The Power Behind the Simplicity
Unified API & Architecture
The Brain class abstracts away the entire RL loop. You focus on your agent, we handle the training.
Flexible, Hybrid Rewards
Use our built-in rewards like reward_exact_match, write your own Python functions, or leverage our powerful LLM-as-a-Judge.
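To make "write your own Python functions" concrete, here is a minimal sketch of the scoring logic a custom reward might contain. The function name `reward_numeric_match` and its two string arguments are hypothetical: this page does not show the signature ToolBrain expects from a custom reward, so treat this as standalone logic to adapt rather than a drop-in `reward_func`.

```python
import re


def reward_numeric_match(final_answer: str, gold_answer: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the answer equals the gold value.

    The (final_answer, gold_answer) signature is an assumption for illustration;
    adapt it to whatever arguments the training framework actually passes.
    """
    # Collect every integer or decimal number that appears in the agent's answer.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", final_answer)
    if not numbers:
        return 0.0
    try:
        # Compare numerically so "56" and "56.0" count as the same answer.
        return 1.0 if float(numbers[-1]) == float(gold_answer) else 0.0
    except ValueError:
        # The gold answer was not numeric, so this reward cannot score it.
        return 0.0
```

Unlike a strict string comparison, this version tolerates surrounding text such as "8 times 7 is 56.", which can matter when a small model wraps its answer in a sentence.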
Intelligent Tool & Data Management
Automatically select relevant tools with Tool Retrieval and generate new training tasks with Zero-Learn.
Efficient & Advanced Training
Out-of-the-box support for GRPO, DPO, and Knowledge Distillation, all accelerated by Unsloth and QLoRA.
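For intuition about what GRPO does with the rewards, the sketch below shows the group-relative advantage step the algorithm is named for: several attempts at the same query are scored, and each reward is normalized against the mean and standard deviation of its own group. This illustrates the general technique, not ToolBrain's internal code; the function name, the `eps` constant, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    Illustrative only; real implementations may use a sample standard
    deviation or a different epsilon.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every attempt earns the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled attempts at the same query, scored 1.0 (correct) or 0.0 (wrong).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Correct attempts end up with positive advantages and incorrect ones with negative advantages. When every attempt in a group earns the same reward, all advantages are zero and that group contributes no learning signal.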