OPEN_SOURCE
REDDIT · 14d ago · MODEL RELEASE
EverMemOS unveils 4B model, 100M-token memory
EverMind's open-source EverMemOS pairs a 4B backbone with Memory Sparse Attention, claiming 100M-token-scale inference for AI agents. The project ships an API, docs, demos, and evaluation reports, and frames itself as a long-term memory layer rather than a bolt-on RAG add-on.
// ANALYSIS
This is the right direction for agent memory: not bigger prompts, but a memory layer that behaves like infrastructure.
- MSA internalizes retrieval into the model, which could reduce the usual RAG mismatch between search and generation.
- The 100M-token claim depends on KV-cache compression plus CPU/GPU offloading, so latency and hardware cost will matter as much as benchmark scores.
- EverMind is productizing the research with a self-hosted repo, API docs, demos, evaluation scripts, and cloud positioning, which makes it easier to test than most long-context papers.
- If the benchmark gains survive messy, changing data, EverMemOS could become a serious alternative to Mem0, Zep, and other agent-memory stacks.
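The mechanism the bullets describe can be sketched in miniature. This is an illustrative toy, not EverMemOS's actual MSA implementation (which is not detailed here): recent KV blocks stay "hot" (GPU-resident), older blocks are offloaded to "cold" storage, and attention fetches back only the top-k cold blocks whose summary key best matches the current query. All names and sizes below are invented for the sketch.

```python
import numpy as np

BLOCK = 4          # tokens per KV block (toy size)
HOT_BLOCKS = 2     # most recent blocks kept resident
TOP_K = 2          # cold blocks fetched back per query

def split_blocks(keys, values):
    """Group the KV cache into fixed-size blocks, each with a summary key."""
    n = (len(keys) // BLOCK) * BLOCK
    kb = keys[:n].reshape(-1, BLOCK, keys.shape[-1])
    vb = values[:n].reshape(-1, BLOCK, values.shape[-1])
    summaries = kb.mean(axis=1)  # one cheap summary vector per block
    return kb, vb, summaries

def sparse_attend(query, keys, values):
    """Attend over hot blocks plus the top-k retrieved cold blocks."""
    kb, vb, summaries = split_blocks(keys, values)
    hot = list(range(max(0, len(kb) - HOT_BLOCKS), len(kb)))
    cold = [i for i in range(len(kb)) if i not in hot]
    # score cold blocks by summary-key similarity; fetch only the best TOP_K
    scores = np.array([summaries[i] @ query for i in cold])
    fetched = [cold[i] for i in np.argsort(scores)[::-1][:TOP_K]] if cold else []
    idx = sorted(hot + fetched)
    k = kb[idx].reshape(-1, kb.shape[-1])
    v = vb[idx].reshape(-1, vb.shape[-1])
    logits = k @ query
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ v, len(idx) * BLOCK  # output, tokens actually attended

rng = np.random.default_rng(0)
d = 8
keys = rng.normal(size=(32, d))
values = rng.normal(size=(32, d))
query = rng.normal(size=d)
out, attended = sparse_attend(query, keys, values)
print(attended, "of", len(keys), "tokens attended")  # 16 of 32
```

The point of the sketch is the cost profile: per-query compute scales with the blocks actually touched, not the full cache, while the retrieval latency for fetched cold blocks is exactly the CPU/GPU transfer cost the bullet above flags as the open question.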
// TAGS
evermemos · llm · agent · open-source · self-hosted · inference · benchmark
DISCOVERED
2026-03-28
PUBLISHED
2026-03-28
RELEVANCE
9/10
AUTHOR
Photochromism