Cougar hits 16.1 t/s on Raspberry Pi 5
Cougar is a minimalist, dependency-free LLM engine written in Rust for high-performance inference on the Raspberry Pi. It achieves significant speedups through a custom SIMD compiler and Stride-4 Sketching to bypass memory bandwidth bottlenecks.
Cougar is a strong example of hardware-aware software engineering: it shows that a specialized LLM runner can significantly outperform general-purpose frameworks on edge devices. Its custom SIMD compiler (Eä) and techniques like Stride-4 Sketching ease memory bandwidth bottlenecks, while vertical layer fusion keeps intermediate results close to the CPU and so reduces cache misses.
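The summary does not say how Cougar implements its fusion, so the following is only a generic sketch of the idea behind fusing consecutive operations, not Cougar's code. It assumes a gated activation step (SiLU on one vector, multiplied elementwise by another), and the function names `silu`, `gated_unfused`, and `gated_fused` are hypothetical. The unfused version writes an intermediate buffer to memory and reads it back; the fused version computes the same result in a single pass.

```rust
/// SiLU activation: x * sigmoid(x), written as x / (1 + e^-x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}

/// Unfused: two passes over the data. The intermediate `activated`
/// buffer is written out in full and then read back in.
fn gated_unfused(gate: &[f32], up: &[f32]) -> Vec<f32> {
    let activated: Vec<f32> = gate.iter().map(|&g| silu(g)).collect();
    activated.iter().zip(up).map(|(&a, &u)| a * u).collect()
}

/// Fused: one pass. Each activated value is consumed immediately,
/// so no intermediate buffer is allocated or re-read.
fn gated_fused(gate: &[f32], up: &[f32]) -> Vec<f32> {
    gate.iter().zip(up).map(|(&g, &u)| silu(g) * u).collect()
}
```

Both functions return the same values for the same inputs; the fused form simply touches memory less, which is the property that matters on a bandwidth-limited board like the Raspberry Pi 5.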
DISCOVERED
63d ago
2026-03-25
PUBLISHED
63d ago
2026-03-25
AUTHOR
Acceptable_Analyst45