GLM-5.2 reportedly hits 120 tok/s on Blackwell tinyboxes
According to a rumor relayed by Tiny Corp, Zhipu AI's upcoming GLM-5.2 model is running at 120 tokens per second on two networked Blackwell-based tinyboxes. The hardware is estimated to cost $150,000. If accurate, this points to a possible shift toward cost-effective, decentralized local hardware clusters for running frontier large language models.
If the rumor holds, local AI hardware clusters are starting to encroach on cloud dominance by bringing high-speed, frontier-class inference within reach of enterprise budgets.
- Two networked Blackwell tinyboxes serving a frontier model would demonstrate that Tiny Corp's architecture can handle multi-GPU, high-bandwidth workloads.
- At 120 tokens per second, generation is fast enough to make real-time, multi-step agentic workflows practical for local deployments.
- The estimated $150,000 price tag lowers the entry barrier for organizations that want data sovereignty and predictable operating costs instead of cloud API billing.
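To put the reported figures in perspective, the sketch below works out the arithmetic behind the cost claim: how many tokens a 120 tok/s setup could generate per day, and what the $150,000 hardware estimate comes to per million tokens. Only the 120 tok/s and $150,000 figures come from the rumor. The three-year amortization period, the 100% utilization, and the function names are illustrative assumptions, and the calculation ignores power, hosting, staffing, and any extra throughput from serving concurrent requests.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tokens_per_day(tokens_per_second: float, utilization: float = 1.0) -> float:
    """Tokens generated in one day at a steady rate.

    utilization is the assumed fraction of the day spent generating (0.0 to 1.0).
    """
    return tokens_per_second * SECONDS_PER_DAY * utilization


def hardware_cost_per_million_tokens(
    hardware_cost_usd: float,
    tokens_per_second: float,
    amortization_years: float,
    utilization: float = 1.0,
) -> float:
    """Hardware cost spread over every token generated during the amortization period.

    Covers the purchase price only; electricity and operations are not included.
    """
    total_tokens = (
        tokens_per_day(tokens_per_second, utilization) * 365 * amortization_years
    )
    return hardware_cost_usd / total_tokens * 1_000_000


if __name__ == "__main__":
    RUMORED_TOKENS_PER_SECOND = 120    # from the rumor
    ESTIMATED_HARDWARE_COST = 150_000  # from the rumor, in USD
    ASSUMED_AMORTIZATION_YEARS = 3     # assumption, not from the source

    daily = tokens_per_day(RUMORED_TOKENS_PER_SECOND)
    cost = hardware_cost_per_million_tokens(
        ESTIMATED_HARDWARE_COST,
        RUMORED_TOKENS_PER_SECOND,
        ASSUMED_AMORTIZATION_YEARS,
    )
    print(f"Tokens per day: {daily:,.0f}")
    print(f"Hardware cost per million tokens: ${cost:.2f}")
```

Under these assumptions, the setup would produce about 10.4 million tokens per day, and the hardware alone would come to roughly $13 per million tokens. That figure is very sensitive to the inputs: if 120 tok/s is a single-stream rate and the cluster can batch many requests at once, the effective cost per token would be much lower, while lower utilization would push it higher.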
DISCOVERED
4h ago
2026-06-21
PUBLISHED
5h ago
2026-06-21
AUTHOR
AravSrinivas