Mnemosyne tops LongMemEval at 87.4%

// BENCHMARK RESULT

Mnemosyne reports 87.4% raw accuracy on LongMemEval, a 500-question benchmark (437 of 500 questions), while running retrieval locally on a single laptop with 111K indexed facts and no cloud compute in the retrieval path. The system pairs deterministic structured indexing with a semantic fallback and nightly consolidation to keep memory fast, inspectable, and local-first.
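The article does not publish Mnemosyne's internals, so the following is only a minimal sketch of the general pattern it describes: a deterministic structured index queried first, a semantic fallback when no exact key matches, and an offline consolidation pass. Every name here (`Fact`, `HybridMemory`, `consolidate`) is hypothetical, the (entity, attribute) key scheme is an assumption, and the bag-of-words `embed` function is a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fact:
    entity: str
    attribute: str
    value: str
    session: int  # session in which the fact was stated


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a real embedding model: lowercase token counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class HybridMemory:
    def __init__(self) -> None:
        self.facts: list[Fact] = []
        # Structured index (assumed key scheme): (entity, attribute) -> facts.
        self.index: dict[tuple[str, str], list[Fact]] = {}

    def add(self, fact: Fact) -> None:
        self.facts.append(fact)
        self.index.setdefault((fact.entity, fact.attribute), []).append(fact)

    def query(self, entity: str, attribute: str, text: str) -> Optional[Fact]:
        # 1) Deterministic path: exact key lookup, newest session wins.
        hits = self.index.get((entity, attribute))
        if hits:
            return max(hits, key=lambda f: f.session)
        # 2) Semantic fallback: best cosine match against every stored fact.
        q = embed(text)
        best_score, best_fact = 0.0, None
        for f in self.facts:
            score = cosine(q, embed(f"{f.entity} {f.attribute} {f.value}"))
            if score > best_score:
                best_score, best_fact = score, f
        return best_fact  # None if nothing overlaps at all

    def consolidate(self) -> int:
        # Offline ("nightly") pass: keep only the newest fact per key.
        # Returns how many superseded facts were dropped.
        before = len(self.facts)
        latest = {k: max(v, key=lambda f: f.session) for k, v in self.index.items()}
        self.facts = list(latest.values())
        self.index = {k: [f] for k, f in latest.items()}
        return before - len(self.facts)


if __name__ == "__main__":
    mem = HybridMemory()
    mem.add(Fact("user", "coffee_order", "oat latte", session=1))
    mem.add(Fact("user", "coffee_order", "flat white", session=4))
    mem.add(Fact("user", "home_city", "Lisbon", session=2))
    print(mem.query("user", "coffee_order", "what coffee").value)  # flat white
    print(mem.query("user", "hometown", "which city is home").value)  # Lisbon
    print(mem.consolidate())  # 1
```

The exact-key path answers precise questions deterministically and inspectably, while the fallback only runs when no key matches; in a real system the fallback would use dense embeddings rather than token overlap.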

// ANALYSIS

This is a credible systems result rather than just a flashy benchmark claim: the score appears to come from retrieval architecture, not brute-force model size.

– Deterministic Spine indexing is the most interesting piece here; it should outperform embedding-only retrieval on exact facts, numbers, and preferences.
– The 65.4% Multi-Session score is the real limitation, because cross-session accumulation and state drift are where memory systems usually break.
– The flexible judge and cloud embeddings make this promising, but not yet a clean offline SOTA claim.
– Flat RAM usage and low SSD I/O are the operational proof point: the architecture appears practical on consumer hardware.
– The next meaningful benchmark is numeric state persistence across sessions, not more surface-level recall.
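To make the last point concrete, here is a minimal sketch of what a numeric state-persistence probe could look like. It is a hypothetical illustration, not part of LongMemEval or Mnemosyne: `Update`, `true_state`, `last_mention`, and `probe` are invented names, and the "set"/"delta" update model is an assumption. It shows why recalling the most recently mentioned number is not the same as tracking accumulated state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    session: int
    kind: str  # hypothetical: "set" overwrites the value, anything else is a delta
    amount: int


def true_state(updates: list[Update]) -> int:
    # Ground truth: replay every update in session order.
    state = 0
    for u in sorted(updates, key=lambda x: x.session):
        state = u.amount if u.kind == "set" else state + u.amount
    return state


def last_mention(updates: list[Update]) -> int:
    # Recall-only baseline: returns the most recently mentioned number.
    return max(updates, key=lambda x: x.session).amount


def probe(answer_fn, cases: list[list[Update]]) -> float:
    # Fraction of cases where answer_fn reproduces the accumulated state.
    correct = sum(answer_fn(c) == true_state(c) for c in cases)
    return correct / len(cases)


if __name__ == "__main__":
    budget = [Update(1, "set", 100), Update(2, "delta", -30), Update(3, "delta", 45)]
    print(true_state(budget), last_mention(budget))  # 115 45
    print(probe(last_mention, [budget]))  # 0.0
```

A recall-only memory that surfaces the latest mention scores zero here even though every individual fact was retrievable, which is the gap a cross-session state benchmark would expose.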

// TAGS

mnemosyne · benchmark · memory · rag · embedding · self-hosted · agent

DISCOVERED

2026-04-17

PUBLISHED

2026-04-17

RELEVANCE

9/10

AUTHOR

YakaaAaaAa
