Lemonade NPU Crushes TTFT, Vulkan Wins Decode
OPEN_SOURCE
REDDIT · 11d ago · BENCHMARK RESULT


A Reddit benchmark on a Ryzen AI 9 HX370 compares Lemonade's NPU and hybrid backends against llama.cpp's Vulkan backend at a 24.6k-token context window. The results suggest Lemonade is materially better for long-context prefill, especially time-to-first-token (TTFT), while llama.cpp on the iGPU retains the edge for raw generation throughput once decoding starts.

// ANALYSIS

Hot take: if your workload is dominated by giant prompt ingestion, the NPU is the right place to spend silicon; if you live in steady-state chat generation, Vulkan on the iGPU still looks stronger.

  • Lemonade NPU/hybrid wins hard on TTFT, with the Qwen3 4B test showing a dramatic first-token advantage over llama.cpp Vulkan.
  • llama.cpp still leads on decode TPS, especially on the smaller LFM 1.2B run, where Vulkan nearly doubles NPU throughput.
  • The comparison is useful but not perfectly apples-to-apples because the quantization formats and backends differ.
  • The data supports a practical split: NPU/hybrid for RAG-style long-context prefill, iGPU Vulkan for faster decode-heavy interactions.
  • The takeaway is about workload shape, not absolute winner status; long-context latency and decode speed are optimizing for different bottlenecks.
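The TTFT-versus-TPS split above comes down to how each number is computed from a streamed response. A minimal sketch, using hypothetical token-arrival timestamps rather than a real Lemonade or llama.cpp run: TTFT is the gap from request start to the first token (dominated by prefill), while decode TPS only counts tokens after the first one (steady-state generation).

```python
def measure_stream(token_times, start_time):
    """Compute TTFT and decode-phase tokens/sec from per-token
    arrival timestamps (seconds). The first token's latency is
    prefill-bound; TPS deliberately excludes it."""
    ttft = token_times[0] - start_time
    decode_tokens = len(token_times) - 1
    decode_span = token_times[-1] - token_times[0]
    tps = decode_tokens / decode_span if decode_span > 0 else float("inf")
    return ttft, tps

# Simulated long-context run: 12 s prefill before the first token,
# then 100 decode tokens arriving at a steady 20 tok/s.
start = 0.0
times = [12.0] + [12.0 + 0.05 * i for i in range(1, 101)]
ttft, tps = measure_stream(times, start)
```

A backend can win one metric and lose the other: shrinking the 12 s prefill improves TTFT without touching TPS, which is exactly the workload-shape split the benchmark illustrates.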
// TAGS
lemonade, llamacpp, npu, igpu, vulkan, amd, ryzen-ai, local-llm, long-context, benchmark

DISCOVERED

11d ago

2026-03-31

PUBLISHED

11d ago

2026-03-31

RELEVANCE

8 / 10

AUTHOR

Final-Frosting7742