OPEN_SOURCE
REDDIT // 5h ago // PRODUCT UPDATE
Shadows proves memory isn't enough
Omar Megawer’s Shadows is a multi-agent system with shared project memory and per-agent memory, but the update argues that better retrieval alone still leaves agents making the wrong call. The real gap is aggregation across sessions, abstention, and understanding which preference dimension matters now.
// ANALYSIS
This is a sharp reminder that “just add memory” is not a full agent strategy. Retrieval can be near-perfect and the system can still fail at judgment, which is where most production pain actually lives.
- LongMemEval numbers are strong on recall_all@5, but the overall score stays meaningfully lower because multi-session aggregation and preference handling are still weak
- The post's better idea is classifying the query before retrieval, so "latest," "earliest," and "all history" queries each get different memory behavior
- The shared-memory plus per-agent-memory setup pushes the product toward portable identity, not just sticky context
- The orchestration layer matters as much as the memory layer: agents need to route work, reject bad fits, and ask peers directly instead of bouncing everything through a manager
- The benchmark framing is useful, but the bigger takeaway is that agent evals need to measure decision quality, not just whether the right fact was retrieved
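
The classify-before-retrieve idea in the list above can be sketched roughly as follows. This is an illustration only, not Shadows' actual implementation: `QueryMode`, `Memory`, `classify_query`, `select_memories`, and the cue-word table are all hypothetical names, and a keyword match stands in for whatever classifier the real system uses.

```python
import re
from dataclasses import dataclass
from enum import Enum


class QueryMode(Enum):
    LATEST = "latest"
    EARLIEST = "earliest"
    ALL_HISTORY = "all_history"
    DEFAULT = "default"


@dataclass(frozen=True)
class Memory:
    session: int  # assumed: higher number means a more recent session
    text: str


# Hypothetical cue words; checked in this order, first match wins.
_CUES = {
    QueryMode.ALL_HISTORY: ("all", "every", "how many", "history", "so far"),
    QueryMode.EARLIEST: ("first", "earliest", "originally", "initially"),
    QueryMode.LATEST: ("latest", "current", "currently", "now", "most recent"),
}


def classify_query(query: str) -> QueryMode:
    """Pick a temporal mode from whole-word cue matches in the query."""
    q = query.lower()
    for mode, cues in _CUES.items():
        if any(re.search(rf"\b{re.escape(cue)}\b", q) for cue in cues):
            return mode
    return QueryMode.DEFAULT


def select_memories(query: str, candidates: list[Memory], k: int = 5) -> list[Memory]:
    """Apply mode-specific selection to candidates that a retriever already ranked."""
    if not candidates:
        return []
    mode = classify_query(query)
    by_session = sorted(candidates, key=lambda m: m.session)
    if mode is QueryMode.LATEST:
        return by_session[-1:]      # only the newest statement counts
    if mode is QueryMode.EARLIEST:
        return by_session[:1]       # only the oldest statement counts
    if mode is QueryMode.ALL_HISTORY:
        return by_session           # keep everything, in chronological order
    return candidates[:k]           # fall back to plain top-k by relevance


mems = [
    Memory(2, "prefers tabs"),
    Memory(1, "prefers spaces"),
    Memory(3, "prefers 2-space indent"),
]
print(select_memories("What is my current indentation preference?", mems))
```

The point of the split is that the same three retrieved memories produce different answers depending on the mode, which is a judgment that retrieval quality alone cannot make.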
// TAGS
shadows, agent, rag, benchmark, llm, reasoning
DISCOVERED
5h ago
2026-04-19
PUBLISHED
5h ago
2026-04-19
RELEVANCE
8/10
AUTHOR
MegaWa7edBas