REDDIT // 4h ago // BENCHMARK RESULT

FINAL-Bench’s Darwin-36B-Opus hits 88.4% on GPQA Diamond

Darwin-36B-Opus is a 36-billion-parameter mixture-of-experts language model released on Hugging Face by FINAL-Bench and built with the Darwin V7 evolutionary breeding engine. It is derived from two public parents: Qwen/Qwen3.6-35B-A3B as the father and a Claude Opus 4.6 reasoning-distilled variant of that base as the mother. The release claims the model preserves the distilled reasoning behavior of the mother while keeping the father’s expert topology, and says the automated breeding process can produce a bfloat16 checkpoint in under an hour on a single GPU. Its headline result is 88.4% on GPQA Diamond, which the post presents as a new high point for the Darwin family.
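
The post does not spell out how the Darwin V7 breeding step actually recombines the parents. As a rough illustration of the general idea of parameter-space recombination between two same-architecture checkpoints, the sketch below keeps the MoE routing/expert weights from the father and blends the remaining weights toward the mother. It is not FINAL-Bench's pipeline: the mother repo name, the key-name heuristics, and the blend ratio are all placeholders.

```python
# Sketch only: generic parameter-space "breeding" of two same-architecture
# parents, NOT the Darwin V7 engine described in the post.
import torch
from transformers import AutoModelForCausalLM

FATHER = "Qwen/Qwen3.6-35B-A3B"             # expert-topology source, named in the post
MOTHER = "example-org/opus-distilled-qwen"   # hypothetical reasoning-distilled variant
BLEND = 0.5                                  # hypothetical mixing weight

father = AutoModelForCausalLM.from_pretrained(FATHER, torch_dtype=torch.bfloat16)
mother = AutoModelForCausalLM.from_pretrained(MOTHER, torch_dtype=torch.bfloat16)

mother_state = mother.state_dict()
child_state = {}
for name, f_param in father.state_dict().items():
    m_param = mother_state[name]
    # "experts"/"gate" is a naive heuristic for MoE expert and router weights.
    if "experts" in name or "gate" in name or not f_param.dtype.is_floating_point:
        # Preserve the father's expert topology (and non-float buffers) as-is.
        child_state[name] = f_param.clone()
    else:
        # Blend the remaining weights toward the mother's distilled behavior.
        child_state[name] = (1.0 - BLEND) * f_param + BLEND * m_param

father.load_state_dict(child_state)
father.save_pretrained("darwin-child-bf16")  # deployable bf16 checkpoint
```

In practice a 36B-scale merge would be done shard-by-shard over the safetensors files rather than holding both full models in memory, but the recombination logic is the same.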

// ANALYSIS

Strong benchmark-driven release, but the real story is the evolutionary training pipeline rather than a brand-new base model.

  • The model is positioned as a recombination of two public Qwen-derived parents, not a conventional retrain from scratch.
  • The claimed 88.4% GPQA Diamond score is the main proof point and the reason this post reads like a benchmark release.
  • If the result holds up across independent evals, this is notable because it suggests the Darwin pipeline can reliably preserve reasoning gains through breeding.
  • The practical appeal is portability: a deployable bf16 checkpoint and Hugging Face availability make it easy for the open-model crowd to test (see the sketch after this list).
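
For anyone who wants to poke at the release directly, a minimal smoke test with transformers looks roughly like the sketch below. The repo id is an assumption (the post names the model but not the exact Hugging Face path), and the prompt is just a stand-in GPQA-style question.

```python
# Minimal local smoke test of a bf16 checkpoint pulled from Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

REPO = "FINAL-Bench/Darwin-36B-Opus"  # assumed repo id; substitute the actual release path

tokenizer = AutoTokenizer.from_pretrained(REPO)
model = AutoModelForCausalLM.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = ("A buffer contains 0.10 M acetic acid and 0.10 M sodium acetate. "
          "Estimate its pH and explain the reasoning.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Independent GPQA Diamond numbers would still require a proper eval harness run; this only confirms the checkpoint loads and generates.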
// TAGS
llm · moe · huggingface · qwen · reasoning · benchmark · gpqa · open-source

DISCOVERED: 4h ago (2026-04-25)

PUBLISHED: 5h ago (2026-04-25)

RELEVANCE: 9/10

AUTHOR: jacek2023