OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Liquid AI's LFM2-24B-A2B hits 50 tok/s in browser
Liquid AI's LFM2-24B-A2B MoE model is being demoed in the browser via WebGPU, with the 24B variant hitting about 50 tokens per second on an M4 Max. The same setup pushes its 8B A1B sibling past 100 tokens per second, and Liquid has published the demo source plus optimized ONNX weights on Hugging Face.
// ANALYSIS
This is the kind of number that makes browser AI feel less like a stunt and more like a real deployment path.
- The MoE setup matters: roughly 24B total parameters but only about 2B active per token keeps per-step compute low enough for client-side inference.
- WebGPU plus Transformers.js/ONNX removes the server hop, which is a real win for privacy, latency, and offline-capable apps.
- The 8B A1B result above 100 tok/s is the more immediately shippable target for interactive tools, while the 24B run shows the family can scale without falling off a speed cliff.
- A public Space plus published weights/source lowers the barrier for developers who want to fork the stack and build local assistants or tool-using UIs.
- Caveat: these numbers come from a high-end Mac, so real-world throughput will vary with browser, quantization, context length, and prompt length.
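The compute argument in the first bullet can be sketched with back-of-envelope arithmetic. This is a rough illustration, not a measurement: it assumes the common rule of thumb of ~2 FLOPs per active parameter per generated token, and uses only the figures from the post (24B total, ~2B active, ~50 tok/s).

```python
# Why ~2B active parameters makes client-side decoding plausible.
# Rule of thumb (assumption): ~2 FLOPs per active parameter per token,
# i.e. one multiply-add per weight in the forward pass.

def decode_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2 * active_params

dense_24b = decode_flops_per_token(24e9)  # hypothetical dense 24B model
moe_a2b = decode_flops_per_token(2e9)     # LFM2-24B-A2B: ~2B active/token

print(f"dense 24B: {dense_24b:.1e} FLOPs/token")
print(f"MoE A2B:   {moe_a2b:.1e} FLOPs/token")
print(f"compute ratio: {dense_24b / moe_a2b:.0f}x")  # 12x

# At the reported ~50 tok/s, the per-token time budget is:
print(f"budget at 50 tok/s: {1000 / 50:.0f} ms/token")  # 20 ms
```

Note the caveat this sketch hides: decoding is usually memory-bandwidth-bound, and all 24B parameters still have to fit in (quantized) memory; the MoE win is that only the routed experts' weights are read per token.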
// TAGS
lfm2-24b-a2b · llm · inference · gpu · edge-ai · open-weights · benchmark
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
RELEVANCE
8/10
AUTHOR
xenovatech