HN · HACKER_NEWS // 5h ago // INFRASTRUCTURE
Google splits TPUs for agents
Google introduced its eighth-generation TPUs as two specialized chips: TPU 8t for large-scale training and TPU 8i for low-latency inference. Both are aimed at AI agents and frontier workloads, with general availability planned later this year through Google Cloud’s AI Hypercomputer.
// ANALYSIS
Google is turning TPU into a full-stack cloud wedge, not just a faster accelerator story.
- Splitting training and inference acknowledges that agent workloads stress memory, latency, and utilization differently than classic batch training.
- TPU 8t targets frontier-scale training with 9,600-chip superpods, 2 PB of shared high-bandwidth memory, and 121 ExaFlops of compute.
- TPU 8i is the more interesting developer signal: more SRAM, higher memory bandwidth, and lower-latency collective operations are tuned for reasoning loops and multi-agent serving.
- Native support for JAX, PyTorch, SGLang, vLLM, MaxText, and bare-metal access makes Google's custom silicon less of a walled-garden bet than past TPU generations.
- The catch is still availability and lock-in: these gains matter most if customers can actually get capacity and are willing to build around Google Cloud.
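The framework support called out above means existing JAX code targets these chips without modification. A minimal sketch, not TPU 8i-specific: JAX discovers whatever accelerator backend is present (TPU, GPU, or CPU fallback) and `jax.jit` compiles for it transparently, which is what makes the "native support" claim a low-friction migration story.

```python
# Generic JAX backend-discovery sketch; runs unchanged on TPU, GPU, or CPU.
# Nothing here is TPU 8t/8i-specific -- that is the point of the portability claim.
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this lists TpuDevice entries; elsewhere it falls back
# to whatever backend JAX finds (e.g. CPU).
devices = jax.devices()
print(devices)

@jax.jit  # compiled via XLA for the discovered backend
def matmul(a, b):
    return a @ b

x = jnp.ones((1024, 1024))
y = matmul(x, x)
print(y.shape)  # (1024, 1024)
```

The same program, pointed at a TPU 8i VM, would compile for that backend with no source changes; serving stacks like vLLM and SGLang sit a layer above this and inherit the same portability.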
// TAGS
google-tpu · tpu-8t · tpu-8i · cloud · inference · agent · llm
DISCOVERED
5h ago
2026-04-22
PUBLISHED
8h ago
2026-04-22
RELEVANCE
9/10
AUTHOR
xnx