mnemos ships persistent memory, adds verifier

// 1h ago · BENCHMARK RESULT

mnemos is a Go-based, MCP-native persistent memory layer for AI coding agents that keeps scoped project, workspace, and global memory in local SQLite and plugs into Claude Code, Cursor, Windsurf, and Codex CLI. The post also includes a paired-run verifier showing lift where memory matters most, especially when the model's priors are weak or wrong.
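To make the scoped-memory idea concrete, here is a minimal sketch of project, workspace, and global memory in one local SQLite file. The post does not describe mnemos's actual schema or lookup rules, so the table layout, the `open_store`, `remember`, and `recall` helpers, and the narrowest-scope-wins precedence are all assumptions made for illustration. It is written in Python rather than Go purely for brevity.

```python
import sqlite3

# Hypothetical scope order, narrowest first. The precedence rule is an
# assumption for illustration, not something the post states about mnemos.
SCOPES = ("project", "workspace", "global")


def open_store(path=":memory:"):
    """Open (or create) a local SQLite store with one scoped memory table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS memory ("
        " scope TEXT NOT NULL"
        "   CHECK (scope IN ('project', 'workspace', 'global')),"
        " key TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (scope, key))"
    )
    return db


def remember(db, scope, key, value):
    """Insert a memory, overwriting any earlier value for the same scope and key."""
    db.execute(
        "INSERT OR REPLACE INTO memory (scope, key, value) VALUES (?, ?, ?)",
        (scope, key, value),
    )
    db.commit()


def recall(db, key):
    """Return (scope, value) from the narrowest scope that has the key, else None."""
    for scope in SCOPES:
        row = db.execute(
            "SELECT value FROM memory WHERE scope = ? AND key = ?",
            (scope, key),
        ).fetchone()
        if row is not None:
            return scope, row[0]
    return None
```

With this layout, a project-level entry shadows a global one for the same key, which is one plausible way a project convention could override a general default.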

// ANALYSIS

The strongest part here is the eval shape: same prompt, same model, memory toggled on and off, plus a separate capture test for whether agents actually retain corrections. That makes this feel more credible than a typical memory-layer launch post, even if the sample size is still small. The 5/5 vs 0/5 results on session-start and protocol-convention cases suggest mnemos mainly helps where the model lacks project-specific priors, while the 5/5 vs 5/5 results on common best-practice cases are a good sign it is filling gaps without disturbing what the model already knows. The jump from 7% to 53% on correction capture is the more important result, because write-side memory is usually where these systems fail. The UserPromptSubmit hook is the key architectural move: it turns correction detection into a non-skippable boundary event instead of a voluntary follow-through. The ceiling still matters: 53% means prompt framing alone is not enough, and the remaining misses look like task prompts that bury the correction too deeply. This is useful benchmark work, but n=5 per scenario is not enough to claim broad generalization yet.
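The paired-run shape described above (same prompt, same model, memory toggled on and off) can be summarized with a small tally. This is a sketch under stated assumptions, not the verifier that ships with mnemos: the `PairedResult` type, the `lift` definition, and the `classify` labels are hypothetical names introduced here. The pass counts in the usage note are the ones reported in the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Pass counts for one scenario, run with and without memory (hypothetical type)."""
    scenario: str
    passes_with_memory: int
    passes_without_memory: int
    runs: int  # runs per arm; the post reports n=5

    @property
    def lift(self) -> float:
        """Difference in pass rate between the memory-on and memory-off arms."""
        return (self.passes_with_memory - self.passes_without_memory) / self.runs


def classify(result: PairedResult) -> str:
    """Label a scenario by the sign of its lift (labels are illustrative)."""
    if result.lift > 0:
        return "memory helps"
    if result.lift < 0:
        return "memory hurts"
    return "no change"


if __name__ == "__main__":
    results = [
        PairedResult("session-start", 5, 0, 5),
        PairedResult("protocol-convention", 5, 0, 5),
        PairedResult("common best-practice", 5, 5, 5),
    ]
    for r in results:
        print(f"{r.scenario}: lift={r.lift:+.2f} ({classify(r)})")
```

With only five runs per arm, the lift is a coarse signal: it separates "always fails without memory" from "already passes without memory", but it cannot resolve small effects.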

// TAGS

mnemos, mcp, ai-coding, coding-agent, agent-memory, evaluation, testing, local-first

DISCOVERED

1h ago

2026-05-08

PUBLISHED

2h ago

2026-05-07

RELEVANCE

9/10

AUTHOR

snozberryface
