OPEN_SOURCE
REDDIT · 4h ago · BENCHMARK RESULT
LFM2.5-350M wins Reddit summarization evals
On a 3x Mac mini cluster, the author used GRPO to fine-tune LFM2.5-350M for 64-token Reddit post summarization and found that quality-aware rewards beat length-only training. The best setup, ROUGE-L + METEOR plus a length penalty, hit a 2.769 composite score and a 44.3% pass rate on a 200-sample SmolTLDR test set.
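A composite reward of this shape can be sketched as follows. This is a hedged illustration, not the author's implementation: the ROUGE-L formulation is standard LCS-based F1, but the length-penalty slope, the 64-token budget handling, and the omission of the METEOR term here are all assumptions.

```python
# Sketch of a quality-aware composite reward: ROUGE-L F1 minus a length
# penalty. METEOR (part of the post's best mix) is left out for brevity;
# the penalty slope and whitespace tokenization are assumed, not sourced.

def lcs_len(a, b):
    # classic dynamic-programming longest common subsequence length
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def length_penalty(candidate: str, budget: int = 64) -> float:
    # penalize only tokens over the 64-token budget (linear slope assumed)
    over = max(0, len(candidate.split()) - budget)
    return 0.1 * over

def composite_reward(candidate: str, reference: str, budget: int = 64) -> float:
    return rouge_l_f1(candidate, reference) - length_penalty(candidate, budget)
```

The point of the composite is that the model can no longer game the reward by emitting short but unfaithful text: overlap quality and length pressure trade off inside one scalar.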
// ANALYSIS
This is a solid small-model RL result: the win is not just that the model got better, but that reward shaping clearly mattered more than brute length pressure. It’s still a narrow benchmark, but the infra story is the more interesting part because it shows real fine-tuning experiments running on commodity Apple silicon.
- The best configuration outperformed length-only training on composite score, faithfulness, coverage, conciseness, and pass rate.
- METEOR + ROUGE-L was the strongest reward mix, which fits because it captures paraphrase and overlap better than BLEU alone.
- The 3x Mac mini, MLX, vLLM-metal, SyncPS setup shows heterogeneous local clusters can support meaningful RL fine-tuning work.
- The evaluation is still limited by a 200-example test set and judge-based scoring, so this reads as promising evidence rather than a definitive benchmark victory.
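For readers unfamiliar with GRPO, the core mechanic is group-relative advantage: several completions are sampled per prompt, scored by the reward, and each completion's advantage is its reward standardized against its own group. A minimal sketch, assuming the generic recipe rather than the author's exact group size or normalization:

```python
# Hedged sketch of GRPO's group-relative advantage computation.
# Group size, epsilon, and the use of population std are assumptions
# about the general method, not details from the post.
import statistics

def grpo_advantages(group_rewards: list[float], eps: float = 1e-6) -> list[float]:
    # standardize each sampled completion's reward against the
    # other samples drawn for the same prompt
    mu = statistics.fmean(group_rewards)
    sigma = statistics.pstdev(group_rewards)
    return [(r - mu) / (sigma + eps) for r in group_rewards]
```

Because advantages are relative within the group, reward shaping matters a great deal: any signal (like the ROUGE-L + METEOR mix) that separates good summaries from bad ones inside a group directly becomes the policy gradient's direction.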
// TAGS
lfm2-5-350m, smol, cluster, llm, fine-tuning, benchmark, mlops
DISCOVERED
2026-04-26 (4h ago)
PUBLISHED
2026-04-26 (6h ago)
RELEVANCE
8/10
AUTHOR
East-Muffin-6472