1-bit Bonsai 8B hits 65.7 MMLU-R
Prism ML's 1-bit Bonsai 8B is a true 1-bit model based on the Qwen 3 architecture, achieving a 65.7 MMLU-R score with a 1.15GB footprint. Using binary weights and grouped scaling, it delivers up to 6.2x faster inference and 80% lower energy consumption than full-precision models.
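As a rough illustration of what "binary weights and grouped scaling" can mean, the sketch below keeps one sign bit per weight and one shared scale per group of weights. This is not Prism ML's actual method or file format: the function names `quantize_1bit` and `dequantize_1bit`, the default group size of 128, and the choice of mean absolute value as the scale are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_1bit(weights: List[float], group_size: int = 128) -> Tuple[List[int], List[float]]:
    """Split weights into groups; keep one sign bit per weight and one scale per group."""
    bits: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # For a sign-only code, the mean absolute value is the scale that
        # minimizes squared reconstruction error within the group.
        scales.append(sum(abs(w) for w in group) / len(group))
        bits.extend(1 if w >= 0 else 0 for w in group)
    return bits, scales


def dequantize_1bit(bits: List[int], scales: List[float], group_size: int = 128) -> List[float]:
    """Rebuild approximate weights as +scale or -scale, using each weight's group scale."""
    return [(1.0 if b else -1.0) * scales[i // group_size] for i, b in enumerate(bits)]


# Tiny example with a hypothetical group size of 2 so the result is easy to check by hand.
weights = [0.5, -1.5, 2.0, -1.0]
bits, scales = quantize_1bit(weights, group_size=2)
approx = dequantize_1bit(bits, scales, group_size=2)
print(bits, scales, approx)
```

Each weight costs one bit, and the per-group scales add only a small overhead, which is where the compression comes from. A production kernel would pack the bits into machine words and dequantize on the fly rather than use Python lists.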
- True 1-bit quantization (binary weights) compresses the model to 1.15GB, making 8B-parameter intelligence viable for smartphones and edge hardware.
- The 65.7 MMLU-R score highlights an impressive "Intelligence Density," though it still trails Llama 3.1 8B's 72.9 score.
- Custom dequantization kernels enable 6.2x faster inference on consumer hardware like the RTX 4090.
- Current adoption is limited by the requirement for specialized forks of llama.cpp and custom runtime environments.
- The model's success suggests that binary weight optimization may eventually outpace ternary (1.58-bit) quantization for edge deployment.
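The 1.15GB figure is roughly what a back-of-envelope count gives under a few assumptions that the summary does not state: about 8.2 billion parameters, every parameter stored as one bit, and one 16-bit scale shared by each group of 128 weights. The helper names below are hypothetical, and the real layout may differ (for example, some tensors may be kept at higher precision).

```python
def bits_per_weight(group_size: int = 128, scale_bits: int = 16) -> float:
    """One sign bit per weight plus the per-group scale amortized over the group."""
    return 1.0 + scale_bits / group_size


def footprint_gb(n_params: float, group_size: int = 128, scale_bits: int = 16) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight(group_size, scale_bits) / 8 / 1e9


# ASSUMPTION: parameter count is not given in the summary; 8.2e9 is illustrative.
ASSUMED_PARAMS = 8.2e9

print(f"{bits_per_weight():.3f} bits per weight")
print(f"{footprint_gb(ASSUMED_PARAMS):.2f} GB for weights")
```

Under these assumptions the effective cost is 1.125 bits per weight, so the scales account for only a small share of the footprint.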
DISCOVERED
2026-04-01
PUBLISHED
2026-03-31
AUTHOR
OmarBessa