OPEN_SOURCE
REDDIT // BENCHMARK RESULT
minRLM sidesteps the GPT-5.4-mini prompting slump
minRLM is an open-source Recursive Language Model (RLM) wrapper that keeps large inputs out of the prompt and instead lets the model write Python to inspect them. On a 12-task, 1,800-eval suite, it held near GPT-5-mini-level performance while vanilla GPT-5.4-mini prompting fell 22.3 points; on AIME 2025, the REPL lifted accuracy from 0% (vanilla) to 80%.
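The mechanism described above can be sketched as a short loop: the large input stays in a REPL variable, the model emits Python snippets to inspect it, and only the printed output flows back into the conversation. This is a minimal illustration, not minRLM's actual API; `llm_call`, `run_snippet`, and the `FINAL:` convention are hypothetical stand-ins.

```python
# Minimal sketch of an RLM-style scaffold. Assumptions: the real minRLM
# interface differs; `llm_call` is a hypothetical stand-in for a model call.
import contextlib
import io


def run_snippet(code: str, ctx: str) -> str:
    """Execute model-written Python against the out-of-prompt context.

    The large input lives in the REPL variable `ctx`; only the snippet's
    printed output is returned to the model.
    """
    ns = {"ctx": ctx}
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, ns)  # a real deployment would sandbox this
    return buf.getvalue()


def rlm_answer(question: str, ctx: str, llm_call) -> str:
    # The model never sees `ctx` directly -- only whatever its own
    # snippets choose to print, truncated to bound the feedback size.
    observation = ""
    for _ in range(4):  # bounded number of REPL turns
        reply = llm_call(question, observation)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        observation = run_snippet(reply, ctx)[:2000]
    return observation.strip()


# Toy stand-in model: first turn greps the context, second turn answers.
def toy_model(question, observation):
    if not observation:
        return "print([l for l in ctx.splitlines() if 'answer' in l][0])"
    return "FINAL: " + observation.strip()


doc = "filler\nthe answer is 42\nmore filler"
print(rlm_answer("What is the answer?", doc, toy_model))  # → the answer is 42
```

The payoff hinted at in the benchmark numbers comes from the same shape: token cost scales with what the snippets print, not with the raw input size.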
// ANALYSIS
This is a scaffold win more than a model win: once the default prompt got terser, the REPL put the missing reasoning budget back into the loop.
- GPT-5.4-mini's raw-prompt drop, plus the official RLM's slide, points to prompt-style brittleness that vanilla benchmarks won't catch.
- minRLM's best gains show up on structured or compute-heavy tasks like AIME, OOLONG, BrowseComp, CodeQA, and LongBench V2.
- The downside still matters: code retrieval and some short-context tasks favor vanilla, so this is a tradeoff architecture, not a universal replacement.
- On GPT-5.4-mini, minRLM uses about 5.1x fewer tokens and about 3.2x less cost than the official RLM, which makes the approach much more deployable.
// TAGS
minrlm · llm · reasoning · benchmark · open-source · research
DISCOVERED
13d ago
2026-03-29
PUBLISHED
14d ago
2026-03-29
RELEVANCE
9/10
AUTHOR
cov_id19