Roampal benchmark shrugs off poisoned memories
REDDIT // 6h ago // BENCHMARK RESULT

Roampal was benchmarked locally on LoCoMo with a 20B model across roughly 2,000 questions, including adversarial cases. It scored about 85% on non-adversarial questions and about 76% overall, and it lost only about 4 percentage points after roughly 1,100 poisoned memories were injected.

// ANALYSIS

Hot take: this looks less like a model-size story and more like a memory-policy story.

Tiering, promotion, decay, and outcome scoring seem to matter more than the raw 20B backbone.

The adversarial LoCoMo questions had no ground-truth answers, so the author labeled them before running all five categories. That makes the result more bespoke than a clean leaderboard score.

Poisoning barely moved the needle, which suggests the retrieval stack is resilient when it learns from outcomes instead of trusting every stored fact equally.

The architecture alone added 22 points, which is the real takeaway: memory management is doing most of the work, not just the model.

The author pulled the core reliability mechanism after it hurt every test, a useful reminder that extra trust logic can backfire if it hardens the wrong memories.

This is a strong hint for RAG systems: outcome-weighted memory and tiered decay may beat naive semantic retrieval, especially when the corpus gets noisy over time.
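
To make the contrast between naive semantic retrieval and outcome-weighted retrieval with decay concrete, here is a minimal Python sketch. It is not Roampal's implementation: the `Memory` class, the `outcome_score` and `decay` functions, the 30-day half-life, and the sample memories are all hypothetical, chosen only to show how a poisoned memory with high similarity can drop in rank once past outcomes are taken into account.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    """Hypothetical memory record; field names are assumptions for illustration."""
    text: str
    similarity: float      # semantic similarity to the query, in [0, 1]
    successes: int = 0     # times this memory contributed to a good answer
    failures: int = 0      # times it contributed to a bad answer
    age_days: float = 0.0  # days since the memory was last confirmed useful


def outcome_score(m: Memory) -> float:
    """Laplace-smoothed success rate; 0.5 for a memory with no history."""
    return (m.successes + 1) / (m.successes + m.failures + 2)


def decay(m: Memory, half_life_days: float = 30.0) -> float:
    """Exponential decay: the weight halves every half_life_days (assumed value)."""
    return 0.5 ** (m.age_days / half_life_days)


def rank_naive(memories: list[Memory]) -> list[Memory]:
    """Rank by semantic similarity only, trusting every stored fact equally."""
    return sorted(memories, key=lambda m: m.similarity, reverse=True)


def rank_weighted(memories: list[Memory]) -> list[Memory]:
    """Rank by similarity scaled by past outcomes and recency."""
    return sorted(
        memories,
        key=lambda m: m.similarity * outcome_score(m) * decay(m),
        reverse=True,
    )


# Made-up example data: one poisoned memory that matches the query well
# but has a record of bad outcomes, and one older memory that has helped.
memories = [
    Memory("poisoned: user lives in Paris", similarity=0.92,
           successes=0, failures=4, age_days=2),
    Memory("user moved to Berlin in March", similarity=0.85,
           successes=5, failures=0, age_days=10),
]

print("naive:   ", rank_naive(memories)[0].text)
print("weighted:", rank_weighted(memories)[0].text)
```

In this toy setup the naive ranker surfaces the poisoned memory first because it has the highest similarity, while the weighted ranker prefers the memory with the better track record. Multiplying the three factors is only one possible design; a real system might combine them differently or apply them per tier.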

// TAGS
roampal, llm, rag, benchmark, research, self-hosted

DISCOVERED: 2026-04-30 (6h ago)

PUBLISHED: 2026-04-30 (6h ago)

RELEVANCE: 8/10

AUTHOR: Roampal