OPEN_SOURCE
HN · HACKER_NEWS // 5h ago // INFRASTRUCTURE
Google splits TPU 8 for agents
Google detailed its eighth-generation TPU architecture with two specialized chips: TPU 8t for large-scale training and TPU 8i for low-latency inference, serving, and reasoning workloads. The split pairs hardware specialization with Axion hosts, new interconnect designs, native FP4, PyTorch preview support, and claimed gains over Ironwood.
// ANALYSIS
Google is admitting the obvious: one accelerator shape no longer fits frontier training and agentic inference. The interesting move is less the chip spec than the full-stack bet that custom silicon, networking, storage, and JAX/XLA/PyTorch integration can claw back efficiency against Nvidia’s broader ecosystem.
- TPU 8t targets massive training scale with 9,600-chip superpods, SparseCore, native FP4 (see the number-grid sketch after this list), Virgo networking, TPUDirect storage, and claimed 2.7x training price-performance over Ironwood.
- TPU 8i is tuned for inference bottlenecks: 288 GB HBM, 384 MB on-chip SRAM, a Collectives Acceleration Engine, and a Boardfly topology to reduce all-to-all latency for MoE and reasoning models.
- The PyTorch preview matters because TPU adoption has historically been gated as much by software friction as by silicon quality; a minimal usage sketch follows this list.
- Google's advantage is vertical integration; its weakness remains availability and developer mindshare outside Google Cloud.
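Native FP4 is easiest to appreciate by seeing how coarse the number grid actually is. Below is a minimal Python sketch, assuming the common E2M1 layout (1 sign, 2 exponent, 1 mantissa bit) used by the OCP MX spec and Nvidia's FP4; Google has not published the TPU 8 encoding, and real deployments add per-block scaling factors omitted here.

```python
import torch

# All positive magnitudes representable in FP4 E2M1; the full grid is
# these eight values plus their negations (15 distinct values, since
# -0 and +0 coincide).
E2M1_GRID = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FULL_GRID = torch.cat([-E2M1_GRID, E2M1_GRID]).unique()

def quantize_fp4(x: torch.Tensor) -> torch.Tensor:
    """Round each element to the nearest representable FP4 value."""
    # Distance from every element to every grid point; pick the closest.
    idx = (x.unsqueeze(-1) - FULL_GRID).abs().argmin(dim=-1)
    return FULL_GRID[idx]

x = torch.randn(4) * 3
print(x)
print(quantize_fp4(x))
```

With so few representable values, the format only works because accompanying per-block scale factors keep tensors inside that narrow dynamic range, which is why native hardware support matters.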
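On the software-friction point: the existing route for PyTorch on TPU is the torch_xla package, and the announced preview presumably builds on that path. A minimal sketch, assuming a TPU VM with torch_xla installed; the model and shapes are arbitrary illustrations, not anything Google shipped.

```python
import torch
import torch_xla.core.xla_model as xm

# Acquire the XLA device (a TPU core when run on a TPU VM).
device = xm.xla_device()

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
).to(device)

x = torch.randn(8, 1024, device=device)
y = model(x)

# torch_xla traces ops lazily; mark_step() cuts the traced graph and
# hands it to the XLA compiler for execution on the TPU.
xm.mark_step()
print(y.shape)
```

The lazy-tracing model is exactly the kind of friction the bullet refers to: code runs, but performance depends on where graph breaks fall, which is a different mental model from eager CUDA PyTorch.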
// TAGS
tpu-8t · tpu-8i · cloud · inference · gpu · llm · mlops · benchmark
DISCOVERED: 2026-04-22 (5h ago)
PUBLISHED: 2026-04-22 (8h ago)
RELEVANCE: 8/10
AUTHOR: meetpateltech