NVIDIA GB300 NVL72 hits 20x AgentPerf throughput
NVIDIA's GB300 NVL72 platform achieved record performance on Artificial Analysis's new AA-AgentPerf benchmark, delivering up to 20x more concurrent agents per megawatt than previous H200 systems. The platform leverages full-stack optimizations like high-speed NVLink fabrics and DeepGEMM to handle the complex, multi-turn reasoning requirements of advanced AI agents.
As the industry shifts from simple chatbots to long-horizon agentic workflows, hardware optimization is evolving from raw token latency to high-density, multi-turn reasoning throughput.
* The shift to "relay-style" agent workflows makes traditional token-per-second benchmarks less relevant compared to multi-turn agent trajectories.
* Fusing 72 GPUs over a single NVLink domain is crucial for maintaining low-latency state sharing in complex, iterative reasoning tasks.
* Hardware-software co-design using DeepGEMM, MXFP4 precision, and SGLang routing is proving to be a major differentiator for scaling agent concurrency.
DISCOVERED
2h ago
2026-06-22
PUBLISHED
2h ago
2026-06-22
RELEVANCE
AUTHOR
DIY Smart Code