1-bit Bonsai 4B breaks stock llama.cpp
PrismML's 1-bit Bonsai 4B GGUF model won't load in stock llama.cpp because it uses a custom ggml tensor type that the official Windows binary doesn't support. Rebuilding upstream llama.cpp from scratch with CMake won't help, since the rebuilt binary would still lack that type. The fix is to use PrismML's llama.cpp fork, which the model card notes includes the 1-bit kernel support.
The interesting part here is less the model itself than the packaging reality: a 1-bit GGUF can still be unusable on mainstream runtimes until the backend learns the new tensor type.
- PrismML is pitching Bonsai as a low-footprint edge model, with public claims around sub-1GB memory use and Apache 2.0 availability
- The error message points to a format/runtime mismatch, not a broken download or Windows-specific build issue
- Users trying bleeding-edge quantization formats should expect to follow the model vendor's runtime fork, at least until upstream support lands
- This is a good reminder that “GGUF” does not automatically mean “runs everywhere in llama.cpp” when custom kernels or novel tensor types are involved
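One way to confirm a format/runtime mismatch before blaming the download is to read the GGUF header and list the ggml type ID each tensor declares. The sketch below is a minimal reader, assuming a little-endian GGUF version 2 or 3 file; the function names and the `known_ids` parameter are hypothetical, and the set of IDs a given build supports has to come from the `ggml_type` enum in that build's `ggml.h`. It is not taken from PrismML's tooling.

```python
import struct
from collections import Counter

# Byte widths of fixed-size GGUF metadata value types (type id -> size).
_FIXED = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9


def _read(f, fmt):
    """Read one little-endian scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _skip_string(f):
    """GGUF strings are a uint64 length followed by that many bytes."""
    f.seek(_read(f, "<Q"), 1)


def _skip_value(f, vtype):
    """Skip one metadata value of the given GGUF value type."""
    if vtype in _FIXED:
        f.seek(_FIXED[vtype], 1)
    elif vtype == _STRING:
        _skip_string(f)
    elif vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown metadata value type {vtype}")


def tensor_type_counts(f):
    """Return a Counter mapping ggml type ID -> number of tensors."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF v2 or later")
    n_tensors = _read(f, "<Q")
    n_kv = _read(f, "<Q")
    for _ in range(n_kv):
        _skip_string(f)                  # metadata key
        _skip_value(f, _read(f, "<I"))   # value type, then value
    counts = Counter()
    for _ in range(n_tensors):
        _skip_string(f)                  # tensor name
        n_dims = _read(f, "<I")
        f.seek(8 * n_dims, 1)            # uint64 per dimension
        counts[_read(f, "<I")] += 1      # ggml type ID
        f.seek(8, 1)                     # uint64 data offset
    return counts


def unsupported_types(counts, known_ids):
    """Type IDs present in the file but absent from known_ids (hypothetical)."""
    return sorted(t for t in counts if t not in known_ids)
```

To use it, open the model with `open(path, "rb")`, pass the file object to `tensor_type_counts`, and compare the result against the IDs your runtime knows. Any ID that `unsupported_types` reports is a tensor type the binary cannot decode, which matches the kind of load failure described above.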
DISCOVERED
2026-04-01
PUBLISHED
2026-04-01
AUTHOR
Weekly_Inflation7571
