Distil Labs beats GLM-5 with synthetic traces
REDDIT · 4h ago · BENCHMARK RESULT

Distil Labs shows that noisy production traces are better used as context for synthetic data generation than as direct training labels. A Qwen3-1.7B student fine-tuned this way beats GLM-5 744B on multi-turn tool-calling, while direct training on the same traces degrades sharply as noise increases.
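A minimal sketch of the contrast, with entirely hypothetical function and field names (the source does not publish the actual pipeline): direct SFT takes the logged assistant turn verbatim as the label, noise included, while the synthetic route feeds the noisy trace to a teacher as context and trains on the regenerated turn instead.

```python
# Hypothetical sketch of "traces as labels" vs. "traces as context".
# Names and data shapes are invented for illustration.

def direct_sft_example(trace):
    """Use the logged assistant turn verbatim as the training label.
    Any noise in the trace (typoed tool name, schema drift) becomes
    supervision the student will imitate."""
    return {"prompt": trace["user"], "label": trace["assistant"]}

def synthetic_sft_example(trace, teacher):
    """Use the trace only as context: ask a teacher model to
    regenerate a clean assistant turn, then train on that instead."""
    prompt = (
        "Here is a noisy production trace:\n"
        f"user: {trace['user']}\n"
        f"assistant: {trace['assistant']}\n"
        "Rewrite the assistant turn as a correct, schema-valid tool call."
    )
    return {"prompt": trace["user"], "label": teacher(prompt)}

# Toy stand-in for a large teacher model.
def toy_teacher(prompt):
    return '{"tool": "book_table", "args": {"party_size": 2}}'

trace = {
    "user": "Book a table for two",
    "assistant": '{"tool": "bok_table", "args": {"size": "two"}}',  # noisy
}

direct = direct_sft_example(trace)
synthetic = synthetic_sft_example(trace, toy_teacher)
print(direct["label"])     # noisy label kept verbatim
print(synthetic["label"])  # regenerated, cleaned label
```

The design point is that the noisy trace never appears on the label side of the synthetic example; it only conditions generation.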

// ANALYSIS

The hot take is simple: for agent fine-tuning, trace quality matters less than trace interpretation. Distil Labs is making a strong case that the right pipeline can turn messy production logs into cleaner supervision than humans can realistically curate at scale.

  • Synthetic generation stays near the curated-data ceiling across corruption modes, while direct training collapses on noisy labels, schema drift, and domain mixing.
  • The schema-first setup matters for tool-calling because correct function names and parameter shapes are part of the task, not just the data.
  • The result is strongest as a methodology signal: small models can compete with huge teachers when the training signal is cleaned up before SFT.
  • The evaluation is still limited to one restaurant-booking domain and an LLM-as-a-judge setup, so the generalization claim is promising but not proven.
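The schema-first point in the bullets can be made concrete with a small filter. This is a hedged sketch, not the Distil Labs implementation: the tool schema and names are invented, but the idea is that function names and parameter shapes are validated before an example enters the SFT set.

```python
# Hypothetical schema-first filter for synthetic tool-calling data.
# The tool schema below is invented for illustration.
import json

TOOL_SCHEMAS = {
    "book_table": {"party_size": int, "time": str},
}

def is_schema_valid(raw_call: str) -> bool:
    """Accept a candidate tool call only if it parses, names a known
    function, and matches the declared parameter names and types."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return False
    schema = TOOL_SCHEMAS.get(call.get("tool"))
    if schema is None:
        return False  # unknown function name
    args = call.get("args", {})
    if set(args) != set(schema):
        return False  # missing or extra parameters
    return all(isinstance(args[k], t) for k, t in schema.items())

good = '{"tool": "book_table", "args": {"party_size": 2, "time": "19:00"}}'
bad = '{"tool": "bok_table", "args": {"size": "two"}}'
print(is_schema_valid(good))  # True
print(is_schema_valid(bad))   # False
```

Filtering like this is what keeps a corrupted function name or drifted parameter shape out of the training signal, which is the failure mode the bullets attribute to direct training.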
// TAGS
distil-labs · fine-tuning · benchmark · agent · llm · open-source

DISCOVERED

4h ago

2026-04-16

PUBLISHED

23h ago

2026-04-15

RELEVANCE

9/10

AUTHOR

party-horse