
llama.cpp adds DeepSeek v4 Flash support

AICrier tracks AI developer news across Product Hunt, GitHub, Hacker News, YouTube, X, arXiv, and more.

// 45d ago · OPEN-SOURCE RELEASE

This experimental fork of llama.cpp adds DeepSeek-V4-Flash support with a GGUF quantization strategy aimed at fitting the 284B-parameter MoE model on Macs with 128GB of RAM. The author says selective 2-bit quantization of the routed experts, plus Q8 for the shared weights, already produces usable chat quality and around 21 tok/s on an M3 Max after Metal tuning.
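The memory arithmetic behind this tradeoff can be sketched as a back-of-the-envelope estimate. This is not the fork's actual sizing logic, and `mixed_quant_footprint_gb` is a hypothetical helper. Only the 284B total comes from the article. ROUTED_FRACTION (the share of weights living in routed experts) is a placeholder assumption. ROUTED_BPW treats "2-bit" as exactly 2 bits per weight, whereas real GGUF 2-bit formats add per-block scale overhead. SHARED_BPW assumes the GGUF Q8_0 layout of 32 int8 weights plus one fp16 scale per block, i.e. 8.5 bits per weight.

```python
def mixed_quant_footprint_gb(total_params: float, routed_fraction: float,
                             routed_bpw: float, shared_bpw: float) -> float:
    """Hypothetical helper: approximate weight storage in GB (1e9 bytes).

    Routed-expert weights are stored at routed_bpw bits per weight;
    all remaining (shared) weights at shared_bpw bits per weight.
    """
    routed_params = total_params * routed_fraction
    shared_params = total_params - routed_params
    total_bits = routed_params * routed_bpw + shared_params * shared_bpw
    return total_bits / 8 / 1e9  # bits -> bytes -> decimal GB


TOTAL_PARAMS = 284e9      # from the article
ROUTED_FRACTION = 0.95    # ASSUMPTION: placeholder share of weights in routed experts
ROUTED_BPW = 2.0          # ASSUMPTION: idealized 2-bit, ignoring per-block scale overhead
SHARED_BPW = 8.5          # ASSUMPTION: GGUF Q8_0 (32 int8 weights + 1 fp16 scale per block)

mixed = mixed_quant_footprint_gb(TOTAL_PARAMS, ROUTED_FRACTION, ROUTED_BPW, SHARED_BPW)
# routed_fraction=0.0 means every weight is stored at SHARED_BPW, i.e. a uniform Q8 build
uniform_q8 = mixed_quant_footprint_gb(TOTAL_PARAMS, 0.0, ROUTED_BPW, SHARED_BPW)

print(f"mixed 2-bit/Q8 weights: {mixed:.1f} GB")
print(f"uniform Q8 weights:     {uniform_q8:.1f} GB")
```

With these assumed inputs the mixed scheme lands well under 128GB, while a uniform Q8 build would need roughly 300GB. The estimate covers weights only: the KV cache, runtime buffers, and memory used by macOS itself all eat into the remaining headroom.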

// ANALYSIS

This is a strong proof of concept for local MoE inference, but it is still very much an experiment rather than a broadly validated release. The interesting part is not just that DeepSeek V4 Flash runs locally, but that the quantization scheme tries to preserve precision where quality depends on it and cut size where the loss hurts least.

  • The repo targets a very specific tradeoff: 2-bit routed experts, Q8 shared weights, and a GGUF build sized to Apple Silicon memory limits.
  • The reported 21 tok/s on an M3 Max is the real signal here: not frontier speed, but fast enough to make large local models feel practical.
  • At 284B parameters, this still sits well above the usual local-LLM comfort zone, so 128GB of unified memory remains a hard requirement for most users.
  • The author’s quality comparison against Qwen 3.6 27B is promising, but it is anecdotal until proper benchmarks land.
  • For the llama.cpp ecosystem, this is the kind of release that expands the boundary of what “runs locally” can mean, especially for MoE models.
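Why routed experts are the natural place to cut precision can be illustrated with a toy router. The sketch below is a hypothetical, simplified DeepSeek-style MoE routing loop, not code from the fork or from llama.cpp. Every token passes through the shared expert, while each token is dispatched to only `top_k` of `num_routed` routed experts. The expert counts, the function name `count_expert_usage`, and the random router scores are illustrative assumptions, not DeepSeek V4 Flash's real configuration.

```python
import random


def count_expert_usage(num_tokens: int, num_routed: int, top_k: int, seed: int = 0):
    """Hypothetical toy MoE router: count how many tokens each expert processes.

    Every token runs through the shared expert; each token is also dispatched
    to the top_k routed experts with the highest router scores. The scores are
    random stand-ins for a learned router (an assumption for illustration).
    """
    rng = random.Random(seed)
    shared_tokens = 0
    routed_tokens = [0] * num_routed
    for _ in range(num_tokens):
        shared_tokens += 1  # the shared expert sees every token
        scores = [rng.random() for _ in range(num_routed)]
        chosen = sorted(range(num_routed), key=lambda e: scores[e], reverse=True)[:top_k]
        for expert in chosen:
            routed_tokens[expert] += 1  # each routed expert sees only a subset
    return shared_tokens, routed_tokens


# Illustrative numbers only: 64 routed experts, 4 active per token.
shared, routed = count_expert_usage(num_tokens=10_000, num_routed=64, top_k=4)
print("shared expert tokens:", shared)
print("mean tokens per routed expert:", sum(routed) / len(routed))
print("busiest routed expert:", max(routed))
```

In this toy setup the shared expert processes all 10,000 tokens, while each routed expert averages 625 (10,000 × 4 / 64). One plausible reading of the fork's design, offered as intuition rather than the author's stated rationale: quantization error in shared weights touches every token, whereas error in a single routed expert touches only the tokens routed to it, and the routed experts hold most of the parameters, so that is where aggressive quantization saves the most memory.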
// TAGS
llama.cpp · deepseek-v4-flash · llm · inference · open-source · self-hosted

DISCOVERED

45d ago

2026-04-26

PUBLISHED

45d ago

2026-04-26

RELEVANCE

9/10

AUTHOR

antirez