Engram tops LoCoMo with no-LLM retrieval
OPEN_SOURCE ↗
REDDIT // 10h ago · BENCHMARK RESULT

The Engram team reports 93.9% R@5 on LoCoMo using a zero-LLM retrieval pipeline built from chunking, timestamp prepending, speaker-name injection, and a local reranker. The bigger value is the engineering lesson: conversational memory retrieval improves substantially when you encode conversation structure at ingestion instead of hoping the retriever infers it later.
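The ingestion side of that recipe is simple enough to sketch. Below is a minimal, hypothetical version of the idea (the post does not publish Engram's actual code; the `Turn` type, window sizes, and timestamp format here are assumptions): slide an overlapping window over the session, and prepend the timestamp and speaker name to every line so the structure travels with the text.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str      # e.g. "Caroline" (hypothetical)
    timestamp: str    # e.g. "2024-05-12 14:03" (format is an assumption)
    text: str         # raw first-person utterance

def chunk_session(turns, window=4, overlap=2):
    """Slide an overlapping window over the session so each chunk
    stays fact-sized instead of smearing signal across the whole
    conversation. Window/overlap values are illustrative."""
    step = window - overlap
    chunks = []
    for start in range(0, max(len(turns) - overlap, 1), step):
        window_turns = turns[start:start + window]
        # Encode structure at ingestion: prepend the timestamp and
        # inject the speaker name into every line, so "I adopted a cat"
        # becomes retrievable for "When did Caroline adopt a cat?"
        lines = [f"[{t.timestamp}] {t.speaker}: {t.text}" for t in window_turns]
        chunks.append("\n".join(lines))
    return chunks
```

Each chunk is then embedded or indexed as-is, so dense and sparse retrievers see names and times as ordinary tokens rather than metadata they would otherwise need filters to reach.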

// ANALYSIS

This reads less like a benchmark brag and more like a practical recipe for making chat-memory retrieval stop being dumb. The speaker-name injection result is the most interesting part, because it exposes a common failure mode in first-person conversation logs that standard retrieval stacks miss.

  • Chunking long sessions into smaller overlapping windows preserves fact-level signal instead of smearing it across an entire conversation
  • Prepending timestamps helps both dense and sparse retrieval answer time-scoped questions without relying on brittle metadata filters
  • Injecting speaker names closes the gap between first-person turns and name-referenced questions, which is why multi-hop recall jumps hard
  • The numbers are retrieval-only, so they do not compare cleanly with end-to-end QA F1 claims from other systems
  • The stack is still compute-heavy on CPU, but it is a credible local alternative if you want no API dependency and can tolerate reranking latency
// TAGS
engram · benchmark · search · llm · rag · open-source · agent

DISCOVERED

10h ago

2026-04-17

PUBLISHED

10h ago

2026-04-17

RELEVANCE

9 / 10

AUTHOR

Mediocre-Tip-5683