HN · HACKER_NEWS// 5h agoINFRASTRUCTURE

Google splits TPUs for agents

Google introduced its eighth-generation TPUs as two specialized chips: TPU 8t for large-scale training and TPU 8i for low-latency inference. Both are aimed at AI agents and frontier workloads, with general availability planned later this year through Google Cloud’s AI Hypercomputer.

// ANALYSIS

Google is turning TPU into a full-stack cloud wedge, not just a faster accelerator story.

–Splitting training and inference acknowledges that agent workloads stress memory, latency, and utilization differently than classic batch training.
–TPU 8t targets frontier-scale training with 9,600-chip superpods, 2 PB of shared high-bandwidth memory, and 121 ExaFlops of compute.
–TPU 8i is the more interesting developer signal: more SRAM, higher memory bandwidth, and lower-latency collective operations are tuned for reasoning loops and multi-agent serving.
–Native support for JAX, PyTorch, SGLang, vLLM, MaxText, and bare-metal access makes Google’s custom silicon less of a walled-garden bet than past TPU generations.
–The catch is still availability and lock-in: these gains matter most if customers can actually get capacity and are willing to build around Google Cloud.

// TAGS

google-tputpu-8ttpu-8icloudinferenceagentllm

DISCOVERED

5h ago

2026-04-22

PUBLISHED

8h ago

2026-04-22

RELEVANCE

9/ 10

AUTHOR

xnx