OPEN_SOURCE
REDDIT // 4h ago · BENCHMARK RESULT
Distil Labs beats GLM-5 with synthetic traces
Distil Labs shows that noisy production traces are better used as context for synthetic data generation than as direct training labels. A Qwen3-1.7B student fine-tuned this way beats GLM-5 744B on multi-turn tool-calling, while direct training on the same traces degrades sharply.
// ANALYSIS
The hot take is simple: for agent fine-tuning, trace quality matters less than trace interpretation. Distil Labs is making a strong case that the right pipeline can turn messy production logs into cleaner supervision than humans can realistically curate at scale.
- Synthetic generation stays near the curated-data ceiling across corruption modes, while direct training collapses on noisy labels, schema drift, and domain mixing.
- The schema-first setup matters for tool-calling because correct function names and parameter shapes are part of the task, not just the data.
- The result is strongest as a methodology signal: small models can compete with huge teachers when the training signal is cleaned up before SFT.
- The evaluation is still limited to one restaurant-booking domain and an LLM-as-a-judge setup, so the generalization claim is promising but not proven.
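The schema-first filtering idea can be illustrated with a minimal sketch. The article does not publish Distil Labs' actual pipeline or schemas, so the tool names (`book_table`, `search_restaurants`) and parameter shapes below are hypothetical stand-ins for a restaurant-booking agent; the point is only that noisy trace steps are validated against a declared schema and used as context, rather than trained on directly.

```python
import json

# Hypothetical tool schema (assumption: the real schema is not published).
TOOL_SCHEMA = {
    "book_table": {"required": {"restaurant", "party_size", "time"}},
    "search_restaurants": {"required": {"query"}},
}

def validate_call(raw: str):
    """Schema-first filter: keep a trace step only if the tool call
    parses as JSON and matches a known function name with all
    required parameters present."""
    try:
        call = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return None
    spec = TOOL_SCHEMA.get(call.get("name")) if isinstance(call, dict) else None
    if spec is None:
        return None  # schema drift: unknown or missing tool name
    args = call.get("arguments", {})
    if not isinstance(args, dict) or not spec["required"].issubset(args):
        return None  # malformed or missing required parameters
    return call

# Noisy production traces: only schema-valid steps survive to serve as
# context for synthetic generation; the rest are dropped instead of
# becoming direct training labels.
traces = [
    '{"name": "book_table", "arguments":'
    ' {"restaurant": "Nopa", "party_size": 2, "time": "19:00"}}',
    '{"name": "reserve", "arguments": {"restaurant": "Nopa"}}',  # drifted name
    'not json at all',
]
kept = [c for t in traces if (c := validate_call(t)) is not None]
print(len(kept))  # → 1
```

In this framing, a corrupted step does not poison the SFT labels; it is simply excluded before the synthetic-generation stage, which is one plausible reading of why the synthetic pipeline stays near the curated ceiling while direct training collapses.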
// TAGS
distil-labs · fine-tuning · benchmark · agent · llm · open-source
DISCOVERED
4h ago
2026-04-16
PUBLISHED
23h ago
2026-04-15
RELEVANCE
9/10
AUTHOR
party-horse