OPEN_SOURCE
REDDIT · 4h ago · BENCHMARK RESULT
LFM2.5-350M wins Reddit summarization evals
On a 3x Mac mini cluster, the author used GRPO to fine-tune LFM2.5-350M for 64-token Reddit post summarization and found that quality-aware rewards beat length-only training. The best setup, ROUGE-L + METEOR plus a length penalty, hit a 2.769 composite score and a 44.3% pass rate on a 200-sample SmolTLDR test set.
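A composite reward of this shape can be sketched as follows. This is a hedged illustration, not the author's implementation: the ROUGE-L formulation is standard LCS-based F1, but the length-penalty slope, the 64-token budget handling, and the omission of the METEOR term here are all assumptions.

```python
# Sketch of a quality-aware composite reward: ROUGE-L F1 minus a length
# penalty. METEOR (part of the post's best mix) is left out for brevity;
# the penalty slope and whitespace tokenization are assumed, not sourced.

def lcs_len(a, b):
    # classic dynamic-programming longest common subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def length_penalty(candidate: str, budget: int = 64) -> float:
    # penalize only tokens over the 64-token budget (linear slope assumed)
    over = max(0, len(candidate.split()) - budget)
    return 0.1 * over

def composite_reward(candidate: str, reference: str, budget: int = 64) -> float:
    return rouge_l_f1(candidate, reference) - length_penalty(candidate, budget)
```

The point of the composite is that the model can no longer game the reward by emitting short but unfaithful text: overlap quality and length pressure trade off inside one scalar.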
// ANALYSIS
This is a solid small-model RL result: the win is not just that the model got better, but that reward shaping clearly mattered more than brute length pressure. It’s still a narrow benchmark, but the infra story is the more interesting part because it shows real fine-tuning experiments running on commodity Apple silicon.
- The best configuration outperformed length-only training on composite score, faithfulness, coverage, conciseness, and pass rate.
- METEOR + ROUGE-L was the strongest reward mix, which fits because it captures paraphrase and overlap better than BLEU alone.
- The 3x Mac mini, MLX, vLLM-metal, SyncPS setup shows heterogeneous local clusters can support meaningful RL fine-tuning work.
- The evaluation is still limited by a 200-example test set and judge-based scoring, so this reads as promising evidence rather than a definitive benchmark victory.
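For readers unfamiliar with GRPO, the core mechanic is group-relative advantage: several completions are sampled per prompt, scored by the reward, and each completion's advantage is its reward standardized against its own group. A minimal sketch, assuming the generic recipe rather than the author's exact group size or normalization:

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# Group size, epsilon, and the use of population std are assumptions
# about the general method, not details from the post.
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    # standardize each sampled completion's reward against the
    # other samples drawn for the same prompt
    mu = statistics.fmean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Because advantages are relative within the group, reward shaping matters a great deal: any signal (like the ROUGE-L + METEOR mix) that separates good summaries from bad ones inside a group directly becomes the policy gradient's direction.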
// TAGS
lfm2-5-350m, smol, cluster, llm, fine-tuning, benchmark, mlops
DISCOVERED
2026-04-26 (4h ago)
PUBLISHED
2026-04-26 (6h ago)
RELEVANCE
8/10
AUTHOR
East-Muffin-6472