OPEN_SOURCE
REDDIT // BENCHMARK RESULT
FastFlowLM runs Qwen3.5-4B on AMD NPU
On a Ryzen AI 7 350 XDNA2 NPU, FastFlowLM v0.9.36 and Lemonade v10.0.1 push Qwen3.5-4B to about 15 tok/s decode, stay well below 50°C, and report an 85.6% VLMEvalKit score. The stack also supports tool calling and context lengths up to 256k tokens, though that full context target is more realistic on larger-memory systems than this 32GB laptop.
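As a concrete illustration of what the tool-calling support might look like in practice: a minimal sketch, assuming an OpenAI-compatible chat-completions endpoint, which both projects are generally described as providing (verify against each project's docs). The port, model id, and the get_cpu_temp tool are placeholders for illustration, not documented values.

```python
# Minimal sketch: sending a tool schema to a local OpenAI-compatible
# endpoint of the kind FastFlowLM and Lemonade expose. BASE_URL, the
# port, the model id, and the tool itself are assumptions.
import requests

BASE_URL = "http://localhost:8000/v1"  # placeholder port

tools = [{
    "type": "function",
    "function": {
        "name": "get_cpu_temp",  # hypothetical tool, for illustration only
        "description": "Read the current package temperature in Celsius.",
        "parameters": {"type": "object", "properties": {}, "required": []},
    },
}]

resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "qwen3.5-4b",  # placeholder model id
        "messages": [{"role": "user", "content": "How hot is the chip right now?"}],
        "tools": tools,
    },
    timeout=120,
)
resp.raise_for_status()
msg = resp.json()["choices"][0]["message"]
# A tool-capable model should respond with a tool_calls entry instead of text.
print(msg.get("tool_calls") or msg.get("content"))
```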
// ANALYSIS
FastFlowLM is making AMD NPU laptops feel like a real local-AI tier, not a novelty. The interesting part is the combination of low thermals, tool calling, and multimodal support, which turns a demo into something developers can actually build on.
- ~15 tok/s decode at 1k context is respectable for a laptop NPU, but the 9.6 tok/s figure at 32k shows long-context overhead still matters (a rough client-side measurement sketch follows this list).
- Prefill lands at 378-493 tok/s, and image TTFT is 3.7s at 720p versus 7.5s at 1080p, so lightweight vision workflows are viable.
- Tool-calling support makes this relevant for agents and local copilots, not just benchmark bragging rights.
- Support across all XDNA 2 NPUs makes this a platform story, though the 256k-token ceiling is mostly aspirational on a 32GB laptop.
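Numbers like TTFT and decode tok/s can be approximated client-side from a streaming request. A rough sketch, again assuming an OpenAI-compatible endpoint with placeholder port and model id; chunk counts only approximate token counts, so treat the output as a sanity check rather than a benchmark.

```python
# Rough sketch of estimating TTFT (~prefill cost) and decode throughput
# from a streaming chat-completions request. Endpoint, port, and model id
# are assumptions; one SSE chunk is treated as roughly one token.
import json
import time

import requests

BASE_URL = "http://localhost:8000/v1"  # placeholder port

start = time.perf_counter()
first_token_at = None
n_chunks = 0

with requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "qwen3.5-4b",  # placeholder model id
        "messages": [{"role": "user", "content": "Write a haiku about NPUs."}],
        "stream": True,
        "max_tokens": 256,
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events arrive as lines of the form "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"].get("content")
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter()  # time to first token
            n_chunks += 1

end = time.perf_counter()
if first_token_at is not None and n_chunks > 0:
    print(f"TTFT: {first_token_at - start:.2f}s")
    print(f"decode: {n_chunks / (end - first_token_at):.1f} chunks/s (~tok/s)")
```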
// TAGS
llm · multimodal · inference · benchmark · edge-ai · fastflowlm
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
RELEVANCE
8/10
AUTHOR
BandEnvironmental834