MRCR v2 chart challenges Gemini long-context crown
A Reddit post in r/singularity highlights an MRCR v2 screenshot claiming Gemini 3.1 Pro drops from 71.9% at 128K context to 25.9% at 1M tokens, while Claude Opus is shown at 78.3%. The thread’s core takeaway is that advertised context window size does not guarantee strong retrieval quality at extreme lengths.
This is the kind of benchmark narrative that can quickly reshape developer model choices, even before broader independent replication lands.
- The discussion separates “can accept 1M+ tokens” from “can reliably retrieve across 1M+ tokens,” which matters for production RAG and document QA.
- Claude Opus’s reported score advantage in the post reinforces a growing market focus on long-context quality, not just window marketing.
- If these gaps hold across independent evals, teams may prefer smaller effective windows with higher retrieval consistency over larger but less reliable contexts.
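To put the reported Gemini 3.1 Pro figures in perspective, here is a minimal sketch of the arithmetic behind the drop. It uses only the two scores quoted in the post (71.9% at 128K, 25.9% at 1M); the helper name `retrieval_drop` is hypothetical, and the numbers remain unverified screenshot values rather than replicated results.

```python
def retrieval_drop(short_score: float, long_score: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = short_score - long_score   # percentage points lost
    relative = absolute / short_score     # share of the short-context score lost
    return absolute, relative


# Scores as reported in the Reddit screenshot (not independently verified).
absolute, relative = retrieval_drop(71.9, 25.9)
print(f"{absolute:.1f} points, {relative:.0%} relative drop")
# prints "46.0 points, 64% relative drop"
```

On these figures, the model would lose roughly two thirds of its 128K retrieval score by the time the context reaches 1M tokens.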
DISCOVERED
2026-03-14
PUBLISHED
2026-03-13
AUTHOR
Additional-Alps-8209