Mnemosyne tops LongMemEval at 87.4%
REDDIT · 7h ago · BENCHMARK RESULT

Mnemosyne reports 87.4% raw accuracy on LongMemEval, a 500-question benchmark, with retrieval running entirely on a single laptop over 111K indexed facts and no cloud compute in the retrieval path. The system pairs deterministic structured indexing with a semantic fallback and nightly consolidation to keep memory fast, inspectable, and local-first.
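The "deterministic index first, semantic fallback second" pattern can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Mnemosyne's code: the fact schema, the `retrieve` function, and the example facts are all hypothetical, and a bag-of-words cosine stands in for a real embedding model.

```python
import math
import re
from collections import Counter

# Hypothetical fact store; the schema is an assumption, not Mnemosyne's.
FACTS = [
    {"subject": "user", "attribute": "favorite_editor", "value": "Neovim",
     "text": "The user prefers Neovim as their editor"},
    {"subject": "user", "attribute": "monthly_budget", "value": "1200",
     "text": "The user set a monthly budget of 1200 dollars"},
    {"subject": "user", "attribute": "pet", "value": "a cat named Miso",
     "text": "The user adopted a cat named Miso last spring"},
]

# Deterministic index: exact (subject, attribute) key -> fact.
INDEX = {(f["subject"], f["attribute"]): f for f in FACTS}


def tokens(text):
    """Lowercased word counts; a crude stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(subject, attribute, query_text):
    """Try the exact key first; fall back to similarity search on a miss."""
    hit = INDEX.get((subject, attribute))
    if hit is not None:
        return hit["value"], "deterministic"
    q = tokens(query_text)
    best = max(FACTS, key=lambda f: cosine(q, tokens(f["text"])))
    return best["value"], "semantic"


exact = retrieve("user", "monthly_budget", "what is the budget")
fuzzy = retrieve("user", "animal", "which cat did the user adopt")
print(exact, fuzzy)
```

The exact-key path returns the same answer every time and is easy to inspect, which is why it suits numbers and preferences; the fallback only runs when no structured key matches.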

// ANALYSIS

This is a credible systems result, not just a flashy benchmark claim: the score looks driven by retrieval architecture, not brute-force model size.

  • Deterministic Spine indexing is the most interesting piece here; it should outperform embedding-only retrieval on exact facts, numbers, and preferences.
  • The 65.4% Multi-Session score is the real limitation, because cross-session accumulation and state drift are where memory systems usually break.
  • Because scoring used a flexible judge and the embeddings came from a cloud service, the result is promising but not yet a clean offline SOTA claim.
  • Flat RAM usage and low SSD I/O are the operational proof point: the architecture appears practical on consumer hardware.
  • The next meaningful benchmark is numeric state persistence across sessions, not more surface-level recall.
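To make the cross-session point concrete, here is a toy sketch of why numeric state is harder than recall. It is not taken from Mnemosyne or LongMemEval; the event log, the `set`/`add` operations, and both function names are assumptions made for illustration.

```python
# Hypothetical session log: (session_id, key, operation, amount).
EVENTS = [
    (1, "savings", "set", 200),
    (2, "savings", "add", 150),
    (3, "savings", "add", -50),
]


def latest_fact_only(events, key):
    """Naive recall: return the amount from the most recent matching entry."""
    matching = [e for e in events if e[1] == key]
    return max(matching, key=lambda e: e[0])[3]


def replay_state(events, key):
    """Fold every update in session order to reconstruct the current value."""
    total = 0
    for _, k, op, amount in sorted(events, key=lambda e: e[0]):
        if k != key:
            continue
        total = amount if op == "set" else total + amount
    return total


naive = latest_fact_only(EVENTS, "savings")
replayed = replay_state(EVENTS, "savings")
print(naive, replayed)
```

Retrieving only the newest matching fact yields the last delta, while the correct answer requires accumulating every update across sessions. A benchmark on numeric state persistence would probe exactly that gap.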
// TAGS
mnemosyne · benchmark · memory · rag · embedding · self-hosted · agent

DISCOVERED

7h ago

2026-04-17

PUBLISHED

8h ago

2026-04-17

RELEVANCE

9/10

AUTHOR

YakaaAaaAa