OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Liquid AI's LFM2-24B-A2B hits 50 tok/s in browser
Liquid AI's LFM2-24B-A2B MoE model is being demoed in the browser via WebGPU, with the 24B variant hitting about 50 tokens per second on an M4 Max. The same setup pushes its 8B A1B sibling past 100 tokens per second, and Liquid has published the demo source plus optimized ONNX weights on Hugging Face.
// ANALYSIS
This is the kind of number that makes browser AI feel less like a stunt and more like a real deployment path.
- The MoE setup matters: roughly 24B total parameters but only about 2B active per token keeps per-step compute low enough for client-side inference.
- WebGPU plus Transformers.js/ONNX removes the server hop, which is a real win for privacy, latency, and offline-capable apps.
- The 8B A1B result above 100 tok/s is the more immediately shippable target for interactive tools, while the 24B run shows the family can scale without falling off a speed cliff.
- A public Space plus published weights/source lowers the barrier for developers who want to fork the stack and build local assistants or tool-using UIs.
- Caveat: these numbers come from a high-end Mac, so real-world throughput will vary with browser, quantization, context length, and prompt length.
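The compute argument in the first bullet can be sketched with back-of-envelope arithmetic. This is a rough illustration, not a measurement: it assumes the common rule of thumb of ~2 FLOPs per active parameter per generated token, and uses only the figures from the post (24B total, ~2B active, ~50 tok/s).

```python
# Why ~2B active parameters makes client-side decoding plausible.
# Rule of thumb (assumption): ~2 FLOPs per active parameter per token,
# i.e. one multiply-add per weight in the forward pass.

def decode_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * active_params

dense_24b = decode_flops_per_token(24e9)  # hypothetical dense 24B model
moe_a2b = decode_flops_per_token(2e9)     # LFM2-24B-A2B: ~2B active/token

print(f"dense 24B: {dense_24b:.1e} FLOPs/token")
print(f"MoE A2B:   {moe_a2b:.1e} FLOPs/token")
print(f"compute ratio: {dense_24b / moe_a2b:.0f}x")  # 12x

# At the reported ~50 tok/s, the per-token time budget is:
print(f"budget at 50 tok/s: {1000 / 50:.0f} ms/token")  # 20 ms
```

Note the caveat this sketch hides: decoding is usually memory-bandwidth-bound, and all 24B parameters still have to fit in (quantized) memory; the MoE win is that only the routed experts' weights are read per token.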
// TAGS
lfm2-24b-a2b · llm · inference · gpu · edge-ai · open-weights · benchmark
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
RELEVANCE
8/10
AUTHOR
xenovatech