OpenThoughts-Agent details open SFT data recipes
The OpenThoughts-Agent collaboration has released a systematic six-stage data curation pipeline and a 100K training set for agentic AI models. Fine-tuning Qwen3-32B on this dataset yields a state-of-the-art open-data agentic model that averages 44.8% accuracy across seven benchmarks.
While most agentic models are trained on narrow, closed datasets targeting specific benchmarks, this work offers a blueprint for building general-purpose AI agents. It suggests that systematic data filtering and scale matter far more than complex model architectures.
- **Ablation-backed design**: Built on more than 100 controlled ablation experiments covering task sourcing, mixing, rollout generation, and filtering.
- **Frontier-guided filtering**: Finds that filtering tasks by teacher-model token usage and keeping only agentic traces longer than 5 turns dramatically improve downstream agent capability.
- **SOTA benchmark performance**: The resulting OpenThinkerAgent-32B outperforms Nemotron-Terminal-32B by 3.9 percentage points, showing strong out-of-distribution generalization.
- **RL task synthesis**: Demonstrates a novel RL task-sourcing method (pymethods2test) that converts competitive programming problems into single-function Python unit tests for reinforcement learning on 8B models.
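
To make the filtering highlight concrete, here is a minimal Python sketch of a trace filter that keeps only rollouts longer than 5 turns and within a teacher-token budget. It is an illustration under stated assumptions, not the project's actual code: the `Trace` record, its field names, the `filter_traces` helper, and the token bounds are all hypothetical, and the summary does not say in which direction the token-usage filter runs, so both bounds are left configurable.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Trace:
    """Hypothetical record for one teacher rollout (field names are assumptions)."""
    task_id: str
    num_turns: int       # number of agent turns in the rollout
    teacher_tokens: int  # tokens the teacher model spent on the task


def filter_traces(
    traces: List[Trace],
    min_turns_exclusive: int = 5,               # "above 5 turns", per the summary
    min_teacher_tokens: int = 0,                # assumed lower bound, not from the source
    max_teacher_tokens: Optional[int] = None,   # assumed upper bound, not from the source
) -> List[Trace]:
    """Keep traces with more than `min_turns_exclusive` turns and in-range token usage."""
    kept = []
    for trace in traces:
        if trace.num_turns <= min_turns_exclusive:
            continue  # rollout too short to count as a multi-turn agentic trace
        if trace.teacher_tokens < min_teacher_tokens:
            continue  # teacher spent fewer tokens than the assumed floor
        if max_teacher_tokens is not None and trace.teacher_tokens > max_teacher_tokens:
            continue  # teacher spent more tokens than the assumed ceiling
        kept.append(trace)
    return kept


# Toy data, invented purely for illustration.
sample = [
    Trace("a", num_turns=3, teacher_tokens=900),
    Trace("b", num_turns=5, teacher_tokens=4000),
    Trace("c", num_turns=6, teacher_tokens=4000),
    Trace("d", num_turns=12, teacher_tokens=50000),
]
kept_ids = [t.task_id for t in filter_traces(sample, max_teacher_tokens=20000)]
```

With the toy data, traces `a` and `b` fall at or below the 5-turn cutoff and `d` exceeds the assumed token ceiling, so only `c` survives.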
DISCOVERED
2026-06-26
PUBLISHED
2026-06-26
AUTHOR
Discover AI