FastFlowLM runs Qwen3.5-4B on AMD NPU
REDDIT // BENCHMARK RESULT · 17d ago


On a Ryzen AI 7 350 XDNA2 NPU, FastFlowLM v0.9.36 and Lemonade v10.0.1 push Qwen3.5-4B to about 15 tok/s decode, stay well below 50°C, and report an 85.6% VLMEvalKit score. The stack also supports tool calling and up to 256k tokens, though that full context target is more realistic on larger-memory systems than this 32GB laptop.
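To put the reported decode rates in concrete terms, generation time scales inversely with throughput. A minimal sketch (the 15 tok/s and 9.6 tok/s figures are from the post; the 512-token reply length is an illustrative assumption):

```python
# Back-of-envelope timing from the reported decode rates.
def generation_time_s(num_tokens: int, tok_per_s: float) -> float:
    """Seconds to decode num_tokens at a steady tok_per_s rate."""
    return num_tokens / tok_per_s

# At ~15 tok/s (1k context), a 512-token reply takes ~34 s;
# at 9.6 tok/s (32k context), the same reply stretches to ~53 s.
short_ctx = generation_time_s(512, 15.0)
long_ctx = generation_time_s(512, 9.6)
print(f"512 tokens @ 15 tok/s:  {short_ctx:.1f} s")
print(f"512 tokens @ 9.6 tok/s: {long_ctx:.1f} s")
```

That roughly 1.6x slowdown at long context is the overhead the analysis below flags.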

// ANALYSIS

FastFlowLM is making AMD NPU laptops feel like a real local-AI tier, not a novelty. The interesting part is the mix of low thermals, tool-calling, and multimodal support, which is what turns a demo into something developers can actually build on.

  • 15 tok/s decode at 1k context is respectable for a laptop NPU, but the 9.6 tok/s figure at 32k shows long-context overhead still matters.
  • Prefill lands at 378-493 tok/s, and image TTFT is 3.7s at 720p versus 7.5s at 1080p, so lightweight vision workflows are viable.
  • Tool-calling support makes this relevant for agents and local copilots, not just benchmark bragging rights.
  • Support across all XDNA 2 NPUs makes this a platform story, and the 256k-token ceiling matters more on bigger-memory systems than this 32GB laptop.
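The post doesn't show the tool-calling wire format. If FastFlowLM follows the OpenAI-style chat schema that most local runners emulate, a request would look roughly like this; the model id, endpoint path, and `get_weather` tool are illustrative assumptions, not confirmed FastFlowLM specifics:

```python
import json

# Hypothetical OpenAI-style tool-calling payload. The model id and the
# get_weather tool are assumptions for illustration only.
payload = {
    "model": "qwen3.5-4b",
    "messages": [
        {"role": "user", "content": "What's the weather in Austin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

# Serialize as you would for a POST to a local /v1/chat/completions-style
# endpoint; the server replies with a tool_calls message the agent executes.
body = json.dumps(payload)
print(body[:60])
```

The point of the sketch: if the schema really is OpenAI-compatible, existing agent frameworks can target the NPU backend without code changes.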
// TAGS
llm · multimodal · inference · benchmark · edge-ai · fastflowlm

DISCOVERED

2026-03-25

PUBLISHED

2026-03-25

RELEVANCE

8/10

AUTHOR

BandEnvironmental834