Cactus adds hybrid cloud fallback and Needle model
Cactus Compute updated its low-latency mobile NPU engine with a hybrid architecture that combines zero-copy memory mapping with automatic cloud fallback. The release includes "Needle," a 26M-parameter model optimized for fast, local tool-calling.
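To make "zero-copy memory mapping" concrete, here is a minimal Python sketch of the general technique. It is not Cactus's implementation and does not read the .cact format: the file layout (a flat array of native float32 values) and the helper names `write_weights` and `read_weights` are assumptions made purely for illustration. The idea is that the operating system pages weight data in on demand, so the process never loads the whole file into its own heap.

```python
import array
import mmap
import os
import tempfile


def write_weights(path, values):
    """Write values as a flat array of native float32 (hypothetical layout)."""
    with open(path, "wb") as f:
        array.array("f", values).tofile(f)


def read_weights(path, start, count):
    """Return `count` floats starting at index `start` without reading the whole file.

    The file is memory-mapped read-only and viewed as float32 through a
    memoryview, so only the pages that are actually touched get loaded.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The views must be released before the mapping is closed;
            # the nested context managers take care of that ordering.
            with memoryview(mapped) as raw, raw.cast("f") as weights:
                return [weights[i] for i in range(start, start + count)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_weights(path, [0.5, 1.5, 2.5, 3.5])
        print(read_weights(path, 1, 2))
```

Only the requested elements are copied into Python objects here; a real inference engine would instead hand the mapped region directly to its compute kernels.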
Cactus aims to be the "Ollama for mobile," giving developers access to the dedicated NPU silicon in modern smartphones. Zero-copy memory mapping and a proprietary .cact format reduce RAM overhead tenfold, making models with 1B+ parameters viable on mid-range hardware. Native SDKs for Flutter and React Native spare developers the C++ boilerplate typically required for mobile ML. The hybrid router switches dynamically between local NPU execution and cloud APIs based on battery level, latency, and task complexity. The newly released "Needle" model, at 26M parameters, is optimized for fast, local tool-calling, so a phone can act as an on-device agent.
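The routing decision described above can be sketched as a small policy function. This is an illustrative sketch only, not the Cactus SDK: the names `DeviceState`, `Policy`, and `choose_route`, the threshold values, and the 0-to-1 complexity score are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"   # run on the on-device NPU
    CLOUD = "cloud"   # send the request to a cloud API


@dataclass(frozen=True)
class DeviceState:
    battery_pct: float            # remaining battery, 0-100
    est_local_latency_ms: float   # estimated on-device response time
    online: bool                  # whether a cloud API is reachable


@dataclass(frozen=True)
class Policy:
    # All thresholds are hypothetical values for illustration.
    min_battery_pct: float = 20.0
    max_local_latency_ms: float = 500.0
    max_local_complexity: float = 0.6


def choose_route(state, complexity, policy=Policy()):
    """Pick a route from battery, latency, and task complexity (0.0-1.0)."""
    if not state.online:
        return Route.LOCAL  # no network: local execution is the only option
    if state.battery_pct < policy.min_battery_pct:
        return Route.CLOUD  # preserve battery by offloading
    if state.est_local_latency_ms > policy.max_local_latency_ms:
        return Route.CLOUD  # local inference would be too slow
    if complexity > policy.max_local_complexity:
        return Route.CLOUD  # task likely exceeds the small local model
    return Route.LOCAL


if __name__ == "__main__":
    healthy = DeviceState(battery_pct=80.0, est_local_latency_ms=120.0, online=True)
    print(choose_route(healthy, complexity=0.2))
    print(choose_route(healthy, complexity=0.9))
```

Checking connectivity first keeps the function total: when the device is offline, it always returns the local route regardless of the other signals. A production router would presumably weigh these signals rather than apply fixed cutoffs.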
DISCOVERED: 2026-05-17
PUBLISHED: 2026-05-17
AUTHOR: Better Stack