OPEN_SOURCE
REDDIT // 21d ago · BENCHMARK RESULT
FastFlowLM Linux support exposes benchmark spread
On an HP ZBook Ultra G1a with Ryzen AI Max+ 395, FastFlowLM was benchmarked on Linux across a broad mix of supported models at 0, 10k, 20k, 40k, and 70k context depths. The results show a clear split: small LFM2.5 and Gemma-family models stay fast, while larger and longer-context workloads lose speed quickly.
// ANALYSIS
FastFlowLM’s Linux support looks legit enough to run real-world local AI tests, but the numbers make the usual tradeoff obvious: model choice matters more than raw runtime hype.
- `lfm2.5-tk:1.2b` and `lfm2.5-it:1.2b` are the short-context speed leaders, landing around 64 tok/s generation.
- Long-context usage is the stress test, and bigger models pay for it hard; `qwen3:8b` falls from 10.3 tok/s at 0 context to 3.6 tok/s at 70k.
- `gpt-oss:20b` is the most interesting middle ground, with solid prefill at moderate context but a steady slide as the window grows.
- `gemma3` and `medgemma` stay comparatively stable across deeper contexts, which suggests those families are better tuned for this stack.
- `deepseek-r1:8b` is the oddball: generation stays flat while prefill scales up sharply, suggesting a different runtime profile from the rest.
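To put the long-context penalty in concrete terms, here is a minimal sketch that computes the relative generation-speed drop from the tok/s figures quoted above; `relative_slowdown` is a hypothetical helper, not part of FastFlowLM.

```python
def relative_slowdown(toks_short: float, toks_long: float) -> float:
    """Fraction of generation speed lost going from a short to a long context."""
    return (toks_short - toks_long) / toks_short

# qwen3:8b figures from the benchmark: 10.3 tok/s at 0 context, 3.6 tok/s at 70k
drop = relative_slowdown(10.3, 3.6)
print(f"qwen3:8b loses {drop:.0%} of its generation speed at 70k context")  # → 65%
```

By the same measure, the roughly flat `gemma3` and `medgemma` numbers would yield a much smaller fraction, which is what makes those families look better tuned for this stack.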
// TAGS
fastflowlm · benchmark · inference · llm · edge-ai · self-hosted
DISCOVERED
21d ago
2026-03-21
PUBLISHED
21d ago
2026-03-21
RELEVANCE
8 / 10
AUTHOR
spaceman_