OPEN_SOURCE
REDDIT // 21d ago · BENCHMARK RESULT
FastFlowLM Linux support exposes benchmark spread
On an HP ZBook Ultra G1a with Ryzen AI Max+ 395, FastFlowLM was benchmarked on Linux across a broad mix of supported models at 0, 10k, 20k, 40k, and 70k context depths. The results show a clear split: small LFM2.5 and Gemma-family models stay fast, while larger and longer-context workloads lose speed quickly.
// ANALYSIS
FastFlowLM’s Linux support looks legit enough to run real-world local AI tests, but the numbers make the usual tradeoff obvious: model choice matters more than raw runtime hype.
- `lfm2.5-tk:1.2b` and `lfm2.5-it:1.2b` are the short-context speed leaders, landing around 64 tok/s generation.
- Long-context usage is the stress test, and bigger models pay for it hard; `qwen3:8b` falls from 10.3 tok/s at 0 context to 3.6 tok/s at 70k.
- `gpt-oss:20b` is the most interesting middle ground, with solid prefill at moderate context but a steady slide as the window grows.
- `gemma3` and `medgemma` stay comparatively stable across deeper contexts, which suggests those families are better tuned for this stack.
- `deepseek-r1:8b` is the oddball: generation stays flat while prefill scales up sharply, suggesting a different runtime profile from the rest.
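To put the long-context penalty in concrete terms, here is a minimal sketch that computes the relative generation-speed drop from the tok/s figures quoted above; `relative_slowdown` is a hypothetical helper, not part of FastFlowLM.

```python
def relative_slowdown(toks_short: float, toks_long: float) -> float:
    """Fraction of generation speed lost going from a short to a long context."""
    return (toks_short - toks_long) / toks_short

# qwen3:8b figures from the benchmark: 10.3 tok/s at 0 context, 3.6 tok/s at 70k
drop = relative_slowdown(10.3, 3.6)
print(f"qwen3:8b loses {drop:.0%} of its generation speed at 70k context")  # → 65%
```

By the same measure, the roughly flat `gemma3` and `medgemma` numbers would yield a much smaller fraction, which is what makes those families look better tuned for this stack.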
// TAGS
fastflowlm · benchmark · inference · llm · edge-ai · self-hosted
DISCOVERED
21d ago
2026-03-21
PUBLISHED
21d ago
2026-03-21
RELEVANCE
8 / 10
AUTHOR
spaceman_