minRLM beats GPT-5.4-mini prompting slump
OPEN_SOURCE
REDDIT // 13d ago · BENCHMARK RESULT


minRLM is an open-source Recursive Language Model wrapper that keeps large inputs out of the prompt and instead lets the model write Python to inspect them. On a 12-task, 1,800-eval suite it stayed near GPT-5-mini performance while vanilla prompting of GPT-5.4-mini fell 22.3 points; on AIME 2025 it jumped from 0% with vanilla prompting to 80% with the REPL.
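The mechanism described above can be sketched in a few lines of Python. Everything here is illustrative, not minRLM's actual API: `scripted_model`, the `FINAL:` convention, and the history format are assumptions. The key property matches the description: the large document lives only in the REPL namespace, never in the history sent to the model.

```python
def scripted_model(history):
    """Deterministic stand-in for the LLM call: returns Python code
    to execute, or a FINAL: answer once it has seen an observation."""
    steps_taken = sum(1 for h in history if h.startswith(">>>"))
    if steps_taken == 0:
        return "document.count('needle')"       # inspect the data via code
    observation = history[-1].splitlines()[-1]  # read the last REPL result
    return f"FINAL: needle appears {observation} times"

def rlm_loop(question, document, model=scripted_model, max_steps=5):
    namespace = {"document": document}  # data stays out of the prompt
    history = [f"Question: {question}", f"len(document) = {len(document)}"]
    for _ in range(max_steps):
        code = model(history)           # model only ever sees `history`
        if code.startswith("FINAL:"):
            return code[len("FINAL:"):].strip()
        try:
            result = eval(code, namespace)  # run the model's snippet
        except Exception as exc:
            result = f"error: {exc}"
        history.append(f">>> {code}\n{result}")
    return None

answer = rlm_loop("How many needles?", "hay needle hay needle hay")
# answer == "needle appears 2 times"
```

The design point the benchmark numbers reflect: because only the question, a length hint, and short REPL observations reach the model, token usage scales with the number of inspection steps rather than with document size.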

// ANALYSIS

This is a scaffold win more than a model win: once the default prompt got terser, the REPL put the missing reasoning budget back into the loop.

  • GPT-5.4-mini's raw-prompt drop, plus the official RLM's slide, points to prompt-style brittleness that vanilla benchmarks won't catch.
  • minRLM's best gains show up on structured or compute-heavy tasks like AIME, OOLONG, BrowseComp, CodeQA, and LongBench V2.
  • The downside still matters: code retrieval and some short-context tasks favor vanilla, so this is a tradeoff architecture, not a universal replacement.
  • On GPT-5.4-mini, minRLM uses about 5.1x fewer tokens and about 3.2x less cost than the official RLM, which makes the approach much more deployable.
// TAGS
minrlm · llm · reasoning · benchmark · open-source · research

DISCOVERED

13d ago (2026-03-29)

PUBLISHED

14d ago (2026-03-29)

RELEVANCE

9/10

AUTHOR

cov_id19