OPEN_SOURCE
YT · YOUTUBE // 4h ago · MODEL RELEASE
PrismML launches ternary Bonsai models
PrismML’s Ternary Bonsai is a 1.58-bit model family in 8B, 4B, and 1.7B sizes, using ternary weights to cut memory by about 9x versus standard 16-bit models. The company says the release improves on its 1-bit Bonsai line while keeping the footprint and throughput attractive for consumer and edge deployment.
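A quick back-of-the-envelope check on the headline numbers, since "1.58-bit" and the ~9x figure fall out of the same arithmetic: each ternary weight carries log2(3) ≈ 1.585 bits of information. The sketch below is a minimal illustration of that math, not PrismML code; the 8B parameter count comes from the release, everything else is plain arithmetic.

```python
import math

# A ternary weight takes one of three values, so its information content is
# log2(3) ~= 1.585 bits: the "1.58-bit" in the family name.
bits_per_ternary_weight = math.log2(3)   # ~1.585
bits_per_fp16_weight = 16.0

# Ideal compression vs. a 16-bit baseline is ~10x; real deployments land
# closer to the quoted ~9x once packing and non-ternary pieces are counted.
print(f"ideal ratio: {bits_per_fp16_weight / bits_per_ternary_weight:.1f}x")  # ~10.1x

# Rough weight footprint for the 8B variant under these assumptions,
# consistent with the quoted 1.75 GB once runtime overhead is added.
params = 8e9
print(f"~{params * bits_per_ternary_weight / 8 / 1e9:.2f} GB")  # ~1.59 GB
```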
// ANALYSIS
This is a strong compression story: PrismML is no longer just chasing the smallest possible model; it’s optimizing for the more useful point where a little extra memory buys a lot more capability.
- The core design is fully ternary, with weights constrained to `{-1, 0, +1}` across embeddings, attention, MLPs, and the LM head (see the quantization sketch after this list).
- PrismML claims the 8B model scores a 75.5 benchmark average, about 5 points better than its 1-bit 8B predecessor, while staying at 1.75 GB.
- The deployment angle is the real hook: native MLX support on Apple devices and reported throughput of 82 toks/sec on an M4 Pro make this feel practical, not just academic (a minimal loading sketch also follows the list).
- Apache 2.0 licensing matters here because it lowers friction for experimentation and downstream packaging.
- The big question is how these numbers hold up outside PrismML’s own benchmark setup, especially across real workloads and longer-context use.
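The release doesn’t spell out PrismML’s quantization recipe, but "fully ternary" usually means something like the absmean scheme popularized by the BitNet b1.58 line: scale each weight tensor by its mean absolute value, then round to the nearest of {-1, 0, +1}. A minimal NumPy sketch of that assumed scheme, with hypothetical names throughout:

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-5):
    """Absmean ternary quantization (BitNet b1.58-style, assumed here):
    scale by the mean absolute value, then round each weight to the
    nearest of {-1, 0, +1}."""
    scale = np.abs(w).mean() + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q.astype(np.int8), scale

# Inference only needs the packed ternary weights plus one per-tensor scale:
# (q @ x) * scale approximates the full-precision w @ x.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, scale = ternary_quantize(w)
x = rng.normal(size=(8,)).astype(np.float32)
y = (q.astype(np.float32) @ x) * scale
```

The zero state is what distinguishes 1.58-bit from true 1-bit quantization: it lets the quantizer zero out small weights instead of forcing them to ±1.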
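On the deployment bullet: if the MLX support works the way other open releases do, loading comes down to the standard mlx_lm two-liner. The repo id below is hypothetical (the release doesn’t give a model path), and this assumes PrismML ships weights in an mlx_lm-compatible format.

```python
# pip install mlx-lm  (Apple silicon only)
from mlx_lm import load, generate

# Hypothetical repo id; the actual model path isn't given in the release.
model, tokenizer = load("PrismML/ternary-bonsai-8b")
text = generate(model, tokenizer,
                prompt="Explain ternary weights in one sentence.",
                max_tokens=64)
print(text)
```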
// TAGS
llm · edge-ai · inference · benchmark · open-source · ternary-bonsai
DISCOVERED
4h ago · 2026-04-19
PUBLISHED
4h ago · 2026-04-19
RELEVANCE
9/10
AUTHOR
AI Search