OPEN_SOURCE
REDDIT // BENCHMARK RESULT
FastFlowLM runs Qwen3.5-4B on AMD NPU
On a Ryzen AI 7 350 XDNA2 NPU, FastFlowLM v0.9.36 and Lemonade v10.0.1 push Qwen3.5-4B to about 15 tok/s decode, stay well below 50°C, and report an 85.6% VLMEvalKit score. The stack also supports tool calling and context lengths up to 256k tokens, though that full context target is more realistic on larger-memory systems than this 32GB laptop.
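As a concrete illustration of what the tool-calling support might look like in practice: a minimal sketch, assuming an OpenAI-compatible chat-completions endpoint, which both projects are generally described as providing (verify against each project's docs). The port, model id, and the get_cpu_temp tool are placeholders for illustration, not documented values.

```python
# Minimal sketch: sending a tool schema to a local OpenAI-compatible
# endpoint of the kind FastFlowLM and Lemonade expose. BASE_URL, the
# port, the model id, and the tool itself are assumptions.
import requests

BASE_URL = "http://localhost:8000/v1"  # placeholder port

tools = [{
    "type": "function",
    "function": {
        "name": "get_cpu_temp",  # hypothetical tool, for illustration only
        "description": "Read the current package temperature in Celsius.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "qwen3.5-4b",  # placeholder model id
        "messages": [{"role": "user", "content": "How hot is the chip right now?"}],
        "tools": tools,
    },
    timeout=120,
)
resp.raise_for_status()
msg = resp.json()["choices"][0]["message"]
# A tool-capable model should respond with a tool_calls entry instead of text.
print(msg.get("tool_calls") or msg.get("content"))
```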
// ANALYSIS
FastFlowLM is making AMD NPU laptops feel like a real local-AI tier, not a novelty. The interesting part is the combination of low thermals, tool calling, and multimodal support, which turns a demo into something developers can actually build on.
- ~15 tok/s decode at 1k context is respectable for a laptop NPU, but the 9.6 tok/s figure at 32k shows long-context overhead still matters (a rough client-side measurement sketch follows this list).
- Prefill lands at 378-493 tok/s, and image TTFT is 3.7s at 720p versus 7.5s at 1080p, so lightweight vision workflows are viable.
- Tool-calling support makes this relevant for agents and local copilots, not just benchmark bragging rights.
- Support across all XDNA 2 NPUs makes this a platform story, though the 256k-token ceiling is mostly aspirational on a 32GB laptop.
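Numbers like TTFT and decode tok/s can be approximated client-side from a streaming request. A rough sketch, again assuming an OpenAI-compatible endpoint with placeholder port and model id; chunk counts only approximate token counts, so treat the output as a sanity check rather than a benchmark.

```python
# Rough sketch of estimating TTFT (~prefill cost) and decode throughput
# from a streaming chat-completions request. Endpoint, port, and model id
# are assumptions; one SSE chunk is treated as roughly one token.
import json
import time

import requests

BASE_URL = "http://localhost:8000/v1"  # placeholder port

start = time.perf_counter()
first_token_at = None
n_chunks = 0

with requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "qwen3.5-4b",  # placeholder model id
        "messages": [{"role": "user", "content": "Write a haiku about NPUs."}],
        "stream": True,
        "max_tokens": 256,
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events arrive as lines of the form "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"].get("content")
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # time to first token
            n_chunks += 1

end = time.perf_counter()
if first_token_at is not None and n_chunks > 0:
    print(f"TTFT: {first_token_at - start:.2f}s")
    print(f"decode: {n_chunks / (end - first_token_at):.1f} chunks/s (~tok/s)")
```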
// TAGS
llm · multimodal · inference · benchmark · edge-ai · fastflowlm
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
RELEVANCE
8/10
AUTHOR
BandEnvironmental834