Google splits TPU 8 for agents
Google detailed its eighth-generation TPU architecture, which comes as two specialized chips: TPU 8t for large-scale training and TPU 8i for low-latency inference, serving, and reasoning workloads. Google pairs the split with Axion hosts, new interconnect designs, native FP4 support, a PyTorch preview, and claimed gains over Ironwood.
Google is admitting the obvious: one accelerator shape no longer fits frontier training and agentic inference. The interesting move is less the chip spec than the full-stack bet that custom silicon, networking, storage, and JAX/XLA/PyTorch integration can claw back efficiency against Nvidia’s broader ecosystem.
- TPU 8t targets massive training scale with 9,600-chip superpods, SparseCore, native FP4, Virgo networking, TPUDirect storage, and a claimed 2.7x improvement in training price-performance over Ironwood.
- TPU 8i is tuned for inference bottlenecks: 288 GB HBM, 384 MB on-chip SRAM, a Collectives Acceleration Engine, and Boardfly topology to reduce all-to-all latency for MoE and reasoning models.
- The PyTorch preview matters because TPU adoption has historically been gated as much by software friction as by silicon quality.
- Google's advantage is vertical integration; its weakness remains availability and developer mindshare outside Google Cloud.
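A rough sizing exercise shows why on-package memory capacity and low-precision formats matter so much for serving large models. This is a back-of-envelope sketch with several assumptions. It treats 1 GB as 10^9 bytes and counts model weights only, ignoring KV cache, activations, and runtime overhead. The FP4 row is hypothetical for TPU 8i, because the summary above lists native FP4 under TPU 8t. The function and table names are illustrative and do not come from any Google or TPU API.

```python
# Hypothetical back-of-envelope sizing: how many model weights fit in a given
# HBM capacity at different storage precisions. Weights only: KV cache,
# activations, and runtime overhead (which share the same memory) are ignored.

HBM_GB = 288  # TPU 8i HBM capacity as reported; assumes decimal gigabytes

# Bytes per parameter for common weight formats (FP4 packs two values per byte).
BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "fp4": 0.5,
}


def max_params_billions(hbm_gb: float, fmt: str) -> float:
    """Upper bound on parameters (in billions) whose weights fit in hbm_gb."""
    # With 1 GB = 1e9 bytes, GB / (bytes per param) gives billions of params.
    return hbm_gb / BYTES_PER_PARAM[fmt]


for fmt in ("bf16", "fp8", "fp4"):
    print(f"{fmt}: ~{max_params_billions(HBM_GB, fmt):.0f}B params")
# prints:
# bf16: ~144B params
# fp8: ~288B params
# fp4: ~576B params
```

Halving bytes per parameter doubles the weight budget per chip. In practice, the KV cache for long reasoning traces eats into that headroom, so real capacity is lower.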
Discovered: 2026-04-22
Published: 2026-04-22
Author: meetpateltech