OPEN_SOURCE
REDDIT // BENCHMARK RESULT
Lemonade NPU Crushes TTFT, Vulkan Wins Decode
A Reddit benchmark on a Ryzen AI 9 HX370 compares Lemonade's NPU and hybrid backends against llama.cpp Vulkan at a 24.6k context window. The results suggest Lemonade is materially better for long-context prefill, especially time-to-first-token, while llama.cpp on the iGPU retains the edge for raw generation throughput once decoding starts.
// ANALYSIS
Hot take: if your workload is dominated by giant prompt ingestion, the NPU is the right place to spend silicon; if you live in steady-state chat generation, Vulkan on the iGPU still looks stronger.
- Lemonade NPU/hybrid wins hard on TTFT, with the Qwen3 4B test showing a dramatic first-token advantage over llama.cpp Vulkan.
- llama.cpp still leads on TPS, especially on the smaller LFM 1.2B run, where Vulkan nearly doubles NPU throughput.
- The comparison is useful but not perfectly apples-to-apples because the quantization formats and backends differ.
- The data supports a practical split: NPU/hybrid for RAG-style long-context prefill, iGPU Vulkan for faster decode-heavy interactions.
- The takeaway is about workload shape, not absolute winner status; long-context latency and decode speed are optimizing for different bottlenecks.
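The workload-shape argument above can be sketched with a simple latency model: total response time is roughly TTFT (prefill) plus output tokens divided by decode TPS. The backend profiles below are hypothetical illustrations, not numbers from the Reddit benchmark; they only show how prompt-heavy vs. generation-heavy workloads flip which backend wins.

```python
# Toy latency model: total response time = TTFT (prefill) + tokens_out / decode_TPS.
# Backend numbers are hypothetical, chosen only to mirror the qualitative result
# (NPU = fast prefill, iGPU Vulkan = fast decode), not taken from the benchmark.

def total_latency(ttft_s: float, decode_tps: float, tokens_out: int) -> float:
    """End-to-end seconds for one response."""
    return ttft_s + tokens_out / decode_tps

# Hypothetical backend profiles.
npu = {"ttft_s": 5.0, "decode_tps": 15.0}    # fast long-context prefill
igpu = {"ttft_s": 40.0, "decode_tps": 28.0}  # slow prefill, fast decode

# RAG-style: huge prompt, short answer -> prefill dominates, NPU wins.
rag_npu = total_latency(npu["ttft_s"], npu["decode_tps"], 150)
rag_igpu = total_latency(igpu["ttft_s"], igpu["decode_tps"], 150)
print(f"RAG-style:  NPU {rag_npu:.1f}s vs iGPU {rag_igpu:.1f}s")

# Chat-style: short prompt (TTFT small for both), long answer -> decode dominates.
chat_npu = total_latency(0.5, npu["decode_tps"], 2000)
chat_igpu = total_latency(0.5, igpu["decode_tps"], 2000)
print(f"Chat-style: NPU {chat_npu:.1f}s vs iGPU {chat_igpu:.1f}s")
```

Under these assumed profiles the NPU wins the prefill-dominated case and the iGPU wins the decode-dominated one, which is the practical split the bullets describe.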
// TAGS
lemonade · llamacpp · npu · igpu · vulkan · amd · ryzen-ai · local-llm · long-context · benchmark
DISCOVERED
2026-03-31
PUBLISHED
2026-03-31
RELEVANCE
8/10
AUTHOR
Final-Frosting7742