Mnemosyne tops LongMemEval at 87.4%

// 45d ago · BENCHMARK RESULT

Mnemosyne reports 87.4% raw accuracy on LongMemEval, a 500-question benchmark, with retrieval running locally on a single laptop over 111K indexed facts and no cloud compute in the retrieval path. The system pairs deterministic structured indexing with a semantic fallback and nightly consolidation to keep memory fast, inspectable, and local-first.
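The "deterministic index first, semantic fallback second" pattern described above can be sketched as follows. This is a minimal illustration, not Mnemosyne's actual code: the `MemoryStore` class, its method names, and the key layout are hypothetical, and a bag-of-words cosine similarity stands in for real embeddings.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def bow(text):
    """Bag-of-words vector; a stand-in for a real embedding (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


@dataclass
class MemoryStore:
    # Hypothetical layout: exact facts keyed by (subject, attribute).
    exact: dict = field(default_factory=dict)
    # Free-text notes searched only when the exact lookup misses.
    notes: list = field(default_factory=list)

    def add_fact(self, subject, attribute, value):
        self.exact[(subject.lower(), attribute.lower())] = value

    def add_note(self, text):
        self.notes.append(text)

    def lookup(self, subject, attribute, query_text):
        # 1) Deterministic path: exact key match, no similarity scoring.
        key = (subject.lower(), attribute.lower())
        if key in self.exact:
            return ("exact", self.exact[key])
        # 2) Fallback path: pick the most similar free-text note.
        query_vec = bow(query_text)
        scored = [(cosine(query_vec, bow(n)), n) for n in self.notes]
        best_score, best_note = max(scored, default=(0.0, None))
        if best_score == 0.0:
            return ("miss", None)
        return ("semantic", best_note)


store = MemoryStore()
store.add_fact("user", "budget", 1200)
store.add_note("The user prefers window seats on flights")
store.add_note("The user adopted a dog named Biscuit")

exact_hit = store.lookup("User", "Budget", "what is the budget")
fuzzy_hit = store.lookup("user", "seat", "which seats does the user prefer")
```

Returning the path taken ("exact", "semantic", or "miss") alongside the value is one way to keep lookups inspectable: a caller can tell whether an answer came from a keyed fact or from a similarity guess.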

// ANALYSIS

This is a credible systems result, not just a flashy benchmark claim: the score appears to be driven by retrieval architecture rather than brute-force model size.

  • Deterministic Spine indexing is the most interesting piece here; it should outperform embedding-only retrieval on exact facts, numbers, and preferences.
  • The 65.4% Multi-Session score is the real limitation, because cross-session accumulation and state drift are where memory systems usually break.
  • Because scoring used a flexible judge and the embeddings came from a cloud service, the result is promising but not yet a clean offline SOTA claim.
  • Flat RAM usage and low SSD I/O are the operational proof point: the architecture appears practical on consumer hardware.
  • The next meaningful benchmark is numeric state persistence across sessions, not more surface-level recall.
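A check for numeric state persistence across sessions could look roughly like the sketch below. It assumes a hypothetical event format of `(op, key, amount)` tuples grouped by session; neither the format nor the function names come from Mnemosyne or LongMemEval. The idea is to replay all updates to get the ground-truth final state, then score a memory system's answers against it.

```python
def replay(sessions):
    """Apply every update in order and return the ground-truth final state."""
    state = {}
    for session in sessions:
        for op, key, amount in session:
            if op == "set":
                state[key] = amount
            elif op == "add":
                state[key] = state.get(key, 0) + amount
            else:
                raise ValueError(f"unknown op: {op}")
    return state


def score(expected, answers):
    """Fraction of keys whose answered value matches the ground truth exactly."""
    if not expected:
        return 0.0
    correct = sum(1 for key, value in expected.items() if answers.get(key) == value)
    return correct / len(expected)


# Three sessions that touch the same values, so the right answer
# depends on accumulating state rather than recalling one statement.
sessions = [
    [("set", "savings", 500)],
    [("add", "savings", 150), ("set", "rent", 900)],
    [("add", "savings", -200)],
]
expected = replay(sessions)

# Hypothetical answers from a memory system that got "rent" wrong.
answers = {"savings": 450, "rent": 850}
accuracy = score(expected, answers)
```

Exact-match scoring is deliberate here: for numeric state, a nearly right value is still a wrong one, which is what separates this kind of test from surface-level recall.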
// TAGS
mnemosyne, benchmark, memory, rag, embedding, self-hosted, agent

DISCOVERED: 45d ago (2026-04-17)
PUBLISHED: 45d ago (2026-04-17)
RELEVANCE: 9/10
AUTHOR: YakaaAaaAa