Raspberry Pi 5 boosts Gemma 4 via NVMe
A detailed performance update for running the newly released Gemma 4 and other large language models on a Raspberry Pi 5 demonstrates that hardware bottlenecks can be mitigated with consumer-grade upgrades. By switching from USB 3.0 to a PCIe NVMe SSD HAT+, the user doubled disk read speeds to 798 MB/sec, resulting in a 1.5x to 2x improvement in tokens per second for models that exceed the Pi's 16GB RAM. The benchmarks cover a wide range of architectures, including Google’s Gemma 4 variants, Qwen 3.5, and Mistral 3, providing a definitive guide for hobbyists looking to maximize local inference on low-cost edge hardware.
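The quoted 798 MB/sec figure is a sequential-read number, which can be reproduced on a stock Raspberry Pi OS install with standard tools. A minimal sketch, assuming the NVMe drive appears as `/dev/nvme0n1` (the device name and file path are assumptions, not from the source):

```shell
# Buffered sequential-read timing of the raw NVMe device (requires root).
sudo hdparm -t /dev/nvme0n1

# Alternative: read a large existing file with the page cache bypassed,
# so the result reflects the drive rather than RAM.
dd if=/some/large/file of=/dev/null bs=1M iflag=direct status=progress
```

hdparm reports MB/sec directly; with dd, divide bytes read by elapsed time. Running the same command against the old USB 3.0 drive gives the baseline for the roughly 2x comparison.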
The Raspberry Pi 5 has evolved into a viable platform for edge LLMs, but only if you abandon microSD cards in favor of NVMe storage for memory swapping.
- NVMe SSDs are the critical enabler for running "swapped" models like Gemma 4 26B or Qwen 3.5 122B, preventing the system from stalling during large context processing.
- Gemma 4’s "Effective" (E2B and E4B) variants are the current gold standard for edge performance, delivering usable text generation speeds even at 32k context.
- Thermal trade-offs are significant: the HAT+ restricts airflow, raising temperatures by up to 15°C compared to earlier SSD-less setups, making active cooling mandatory.
- Dense models above 30B parameters remain largely theoretical for real-time use, with speeds often dipping below 1 token per second despite high-speed swap.
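For the "swapped" models above, the working set spills out of the Pi's 16GB RAM onto the SSD, so swap must live on the NVMe drive rather than a microSD card. A minimal configuration sketch for Raspberry Pi OS, assuming the NVMe filesystem is mounted at / (the swap-file path and 8 GB size are illustrative assumptions, not values from the source):

```shell
# Create and enable an 8 GB swap file on the NVMe-backed filesystem.
sudo fallocate -l 8G /var/swapfile
sudo chmod 600 /var/swapfile     # swap files must not be world-readable
sudo mkswap /var/swapfile
sudo swapon /var/swapfile

# Persist across reboots.
echo '/var/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# Confirm the new swap device is active.
swapon --show
```

Raspberry Pi OS also ships dphys-swapfile; raising `CONF_SWAPSIZE` in `/etc/dphys-swapfile` achieves the same end, provided its swap file lives on the NVMe filesystem.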
DISCOVERED
2026-04-05
PUBLISHED
2026-04-05
AUTHOR
honuvo