FastFlowLM Linux support exposes benchmark spread
OPEN_SOURCE
REDDIT · 21d ago · BENCHMARK RESULT


On an HP ZBook Ultra G1a with a Ryzen AI Max+ 395, FastFlowLM was benchmarked on Linux across a broad mix of supported models at 0, 10k, 20k, 40k, and 70k context depths. The results show a clear split: small LFM2.5 and Gemma-family models stay fast, while larger models and longer-context workloads lose speed quickly.

// ANALYSIS

FastFlowLM’s Linux support looks solid enough to run real-world local AI tests, but the numbers make the usual tradeoff obvious: model choice matters more than raw runtime hype.

  • `lfm2.5-tk:1.2b` and `lfm2.5-it:1.2b` are the short-context speed leaders, landing around 64 tok/s generation.
  • Long-context usage is the stress test, and bigger models pay for it hard; `qwen3:8b` falls from 10.3 tok/s at 0 context to 3.6 tok/s at 70k.
  • `gpt-oss:20b` is the most interesting middle ground, with solid prefill at moderate context but a steady slide as the window grows.
  • `gemma3` and `medgemma` stay comparatively stable across deeper contexts, which suggests those families are better tuned for this stack.
  • `deepseek-r1:8b` is the oddball: generation stays flat while prefill scales up sharply, suggesting a different runtime profile than the rest.
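The long-context penalty in these bullets is easy to quantify. As a sketch (the helper function is illustrative, not part of FastFlowLM; the tok/s figures for `qwen3:8b` are the ones quoted above):

```python
# Illustrative helper: fraction of generation speed lost between the
# shallowest and deepest measured context depth. Not a FastFlowLM API.

def slowdown(tok_s_by_context: dict[int, float]) -> float:
    """Return the relative throughput drop from min to max context depth."""
    shallow = tok_s_by_context[min(tok_s_by_context)]
    deep = tok_s_by_context[max(tok_s_by_context)]
    return 1 - deep / shallow

# qwen3:8b generation speed from the post: 10.3 tok/s at 0 context,
# 3.6 tok/s at 70k context.
qwen3_8b = {0: 10.3, 70_000: 3.6}
print(f"qwen3:8b loses {slowdown(qwen3_8b):.0%} of its generation speed at 70k")
```

That works out to roughly a 65% drop, which is why the flatter curves of the Gemma-family models stand out by comparison.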
// TAGS
fastflowlm · benchmark · inference · llm · edge-ai · self-hosted

DISCOVERED

2026-03-21

PUBLISHED

2026-03-21

RELEVANCE

8/10

AUTHOR

spaceman_