1-bit Bonsai 8B delivers up to 8x inference speedup
PrismML has released 1-bit Bonsai 8B, a model that fits 8 billion parameters into 1.15 GB of VRAM. It delivers up to 8x faster inference on edge devices while maintaining output quality competitive with standard FP16 models.
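The headline numbers are internally consistent; a quick back-of-envelope check (decimal gigabytes assumed, since the article does not specify) reproduces the roughly 14x memory reduction cited below.

```python
# Back-of-envelope check of the article's memory figures.
params = 8e9                          # 8 billion parameters
fp16_gb = params * 2 / 1e9            # FP16 stores 2 bytes/param -> 16.0 GB
bonsai_gb = 1.15                      # reported VRAM footprint

print(fp16_gb / bonsai_gb)            # ~13.9, the "roughly 14x" reduction
print(bonsai_gb * 1e9 * 8 / params)   # ~1.15 effective bits per parameter
```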
1-bit Bonsai 8B shows that extreme quantization is viable when the model is trained for it, rather than quantized after the fact with lossy post-training methods. Weights take only the ternary values {-1, 0, +1}, cutting memory usage by roughly 14x compared to FP16, while quantization-aware training prevents the quality collapse typical of post-training approaches. The weights are compact, but the KV cache remains a memory bottleneck at long context windows. The same principles could scale to 2-bit or 4-bit architectures, and the Apache 2.0 license makes the model an open alternative to research like Microsoft's BitNet.
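To make the training-time approach concrete, here is a minimal sketch of ternary quantization-aware training in PyTorch, following the absmean scheme from BitNet b1.58 (the research the article names as related). PrismML has not published Bonsai's training code, so the layer and function names here are illustrative, not the model's actual implementation.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor scale.

    Absmean scheme as in BitNet b1.58; whether Bonsai 8B uses the
    same scheme is an assumption.
    """
    scale = w.abs().mean().clamp(min=eps)    # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)   # ternary codes in {-1, 0, +1}
    return w_q, scale

class TernaryLinear(torch.nn.Linear):
    """Linear layer trained quantization-aware: the forward pass sees
    ternary weights, while the straight-through estimator lets
    gradients update the latent full-precision weights unchanged."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w_q, scale = ternary_quantize(self.weight)
        # Straight-through estimator: detach the quantization error so
        # the backward pass treats quantization as the identity.
        w_ste = self.weight + (w_q * scale - self.weight).detach()
        return torch.nn.functional.linear(x, w_ste, self.bias)
```

The design point is that the rounding error is baked into the forward pass from the start, so the network learns weights that survive quantization; after training, the ternary codes can be bit-packed for deployment, which is where the memory savings come from.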
DISCOVERED: 2026-04-02
PUBLISHED: 2026-04-02
AUTHOR: True_Tangerine_4706