OpenThoughts-Agent details open SFT data recipes

// 1d ago · RESEARCH PAPER

The OpenThoughts-Agent collaboration has released a systematic six-stage data curation pipeline and a 100K training set for agentic AI models. Fine-tuning Qwen3-32B on this dataset yields a state-of-the-art open-data agentic model that scores 44.8% average accuracy across seven benchmarks.

// ANALYSIS

Most agentic models are trained on narrow, closed datasets that target specific benchmarks; this work instead offers an open blueprint for building general-purpose agents. Its results suggest that systematic data filtering and scale matter more than complex model architectures.

- **Ablation-backed design**: Based on over 100 controlled ablation experiments analyzing task sourcing, mixing, rollout generation, and filtering.
- **Frontier-guided filtering**: Finds that filtering tasks based on teacher model token usage and keeping agentic traces above 5 turns dramatically improves downstream agent capability.
- **SOTA benchmark performance**: The resulting OpenThinkerAgent-32B outperforms Nemotron-Terminal-32B by 3.9 percentage points, showing strong out-of-distribution generalization.
- **RL task synthesis**: Demonstrates a novel RL task sourcing method (pymethods2test) that converts competitive programming problems into single-function Python unit tests for reinforcing 8B models.
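
The frontier-guided filtering idea can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Trace` schema, the function names, and both token thresholds are hypothetical. The summary does not say which direction the token-usage filter cuts, so the sketch assumes a simple keep-band. Only the "above 5 turns" rule comes from the text.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One teacher rollout on one task (hypothetical schema)."""
    task_id: str
    num_turns: int       # agent/environment turns in the rollout
    teacher_tokens: int  # tokens the teacher model spent on the task


# "Above 5 turns" is taken from the summary. The token bounds are
# placeholder assumptions, not values reported by the authors.
MIN_TURNS_EXCLUSIVE = 5
MIN_TEACHER_TOKENS = 2_000
MAX_TEACHER_TOKENS = 60_000


def keep_trace(trace: Trace) -> bool:
    """Return True if the trace passes both illustrative filters."""
    long_enough = trace.num_turns > MIN_TURNS_EXCLUSIVE
    in_token_band = MIN_TEACHER_TOKENS <= trace.teacher_tokens <= MAX_TEACHER_TOKENS
    return long_enough and in_token_band


def filter_traces(traces: list[Trace]) -> list[Trace]:
    """Keep only traces that pass keep_trace, preserving input order."""
    return [t for t in traces if keep_trace(t)]


if __name__ == "__main__":
    sample = [
        Trace("short-rollout", num_turns=3, teacher_tokens=10_000),
        Trace("kept-rollout", num_turns=8, teacher_tokens=10_000),
        Trace("trivial-task", num_turns=8, teacher_tokens=500),
    ]
    print([t.task_id for t in filter_traces(sample)])
```

In practice the thresholds would be chosen by ablation, as the summary describes, rather than fixed up front.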

// TAGS

openthoughts-agent, agent, synthetic-data, dataset, fine-tuning, open-source, ai-coding, llm

DISCOVERED

1d ago

2026-06-26

PUBLISHED

1d ago

2026-06-26

RELEVANCE

9/10

AUTHOR

Discover AI
