Roampal benchmark shrugs off poisoned memories

// 51d ago · BENCHMARK RESULT

Roampal was benchmarked locally on LoCoMo with a 20B model across roughly 2,000 questions, including adversarial cases. It scored about 85% on non-adversarial questions and about 76% overall, and it lost only about 4 points after roughly 1,100 poisoned memories were injected.

// ANALYSIS

Hot take: this looks less like a model-size story and more like a memory-policy story. Tiering, promotion, decay, and outcome scoring seem to matter more than the raw 20B backbone: the architecture alone added 22 points, so memory management, rather than the model alone, is doing most of the work.

Poisoning barely moved the needle, which suggests the retrieval stack is resilient when it learns from outcomes instead of trusting every stored fact equally. The core reliability mechanism was pulled after it hurt every test, a useful reminder that extra trust logic can backfire if it hardens the wrong memories.

One caveat: the adversarial LoCoMo questions did not have ground-truth answers, so the author labeled them before running all five categories, which makes the result more bespoke than a clean leaderboard score.

For RAG systems, this is a strong hint that outcome-weighted memory and tiered decay may beat naive semantic retrieval, especially when the corpus gets noisy over time.
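To make the idea of outcome-weighted memory with tiered decay concrete, here is a minimal sketch. It is not Roampal's implementation: the tier names, half-lives, the `Memory` fields, and the multiplicative scoring formula are all assumptions chosen for illustration. It shows how a stored fact that looks highly similar to the query can still rank below a less similar one if its past retrievals led to bad outcomes.

```python
from dataclasses import dataclass

# Hypothetical tiers and half-lives (in days); real systems would tune these.
HALF_LIFE_DAYS = {"short_term": 1.0, "long_term": 30.0}


@dataclass
class Memory:
    text: str
    similarity: float   # semantic similarity to the query, assumed in [0, 1]
    age_days: float     # time since the memory was stored
    tier: str           # key into HALF_LIFE_DAYS
    successes: int = 0  # retrievals that led to a good outcome
    failures: int = 0   # retrievals that led to a bad outcome


def outcome_score(m: Memory) -> float:
    """Laplace-smoothed success rate; 0.5 when there is no feedback yet."""
    return (m.successes + 1) / (m.successes + m.failures + 2)


def decay(m: Memory) -> float:
    """Exponential decay with a tier-specific half-life."""
    return 0.5 ** (m.age_days / HALF_LIFE_DAYS[m.tier])


def rank_score(m: Memory) -> float:
    """Assumed scoring rule: similarity weighted by outcomes and freshness."""
    return m.similarity * outcome_score(m) * decay(m)


def retrieve(memories: list[Memory], k: int = 3) -> list[Memory]:
    """Return the top-k memories by outcome-weighted, decayed score."""
    return sorted(memories, key=rank_score, reverse=True)[:k]


# Illustrative data (made up): the poisoned entry matches the query better,
# but its track record is poor, so it ranks below the trusted entry.
trusted = Memory("User moved to Lisbon", 0.80, 10.0, "long_term", successes=8, failures=1)
poisoned = Memory("User moved to Oslo", 0.95, 10.0, "long_term", successes=0, failures=6)

top = retrieve([poisoned, trusted], k=1)[0]
```

A naive semantic retriever that sorts by `similarity` alone would return the poisoned entry here. The outcome weighting is what demotes it, and that only works once enough feedback has accumulated for each memory.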

// TAGS

roampal, llm, rag, benchmark, research, self-hosted

DISCOVERED

51d ago

2026-04-30

PUBLISHED

51d ago

2026-04-30

RELEVANCE

8/10

AUTHOR

Roampal
