Google splits TPUs for agents
Google introduced its eighth-generation TPUs as two specialized chips: TPU 8t for large-scale training and TPU 8i for low-latency inference. Both are aimed at AI agents and frontier workloads, with general availability planned later this year through Google Cloud’s AI Hypercomputer.
Google is turning the TPU into a full-stack cloud wedge, not just a faster accelerator story.
- Splitting training and inference acknowledges that agent workloads stress memory, latency, and utilization differently than classic batch training.
- TPU 8t targets frontier-scale training with 9,600-chip superpods, 2 PB of shared high-bandwidth memory, and 121 ExaFlops of compute.
- TPU 8i is the more interesting developer signal: more SRAM, higher memory bandwidth, and lower-latency collective operations are tuned for reasoning loops and multi-agent serving.
- Native support for JAX, PyTorch, SGLang, vLLM, MaxText, and bare-metal access makes Google’s custom silicon less of a walled-garden bet than past TPU generations.
- The catch is still availability and lock-in: these gains matter most if customers can actually get capacity and are willing to build around Google Cloud.
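For a rough sense of scale, the TPU 8t superpod figures quoted above can be divided down to per-chip numbers. This is back-of-envelope arithmetic, not a published spec: it assumes decimal units (1 PB = 10^15 bytes, 1 ExaFlop = 10^18 FLOP/s), that memory and compute are spread evenly across all 9,600 chips, and it says nothing about the numeric precision behind the 121 ExaFlops figure, which the summary does not state. The function and variable names are illustrative.

```python
# Back-of-envelope per-chip figures from the quoted superpod totals.
# Assumptions (not from the source): decimal units, even split across chips.

CHIPS_PER_SUPERPOD = 9_600
SHARED_HBM_BYTES = 2e15        # 2 PB, assuming 1 PB = 10**15 bytes
SUPERPOD_FLOPS = 121e18        # 121 ExaFlops, precision unspecified


def per_chip(total: float, chips: int = CHIPS_PER_SUPERPOD) -> float:
    """Divide a superpod-wide total evenly across its chips."""
    return total / chips


hbm_per_chip_gb = per_chip(SHARED_HBM_BYTES) / 1e9
pflops_per_chip = per_chip(SUPERPOD_FLOPS) / 1e15

print(f"HBM per chip:     ~{hbm_per_chip_gb:.0f} GB")
print(f"Compute per chip: ~{pflops_per_chip:.1f} PFLOP/s")
```

Under those assumptions the totals work out to roughly 208 GB of HBM and about 12.6 PFLOP/s per chip.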
Discovered: 2026-04-22
Published: 2026-04-22
Author: xnx