Can the ASUS Ascent GX10 hit 50 t/s inference?
A Reddit user is exploring whether the ASUS Ascent GX10, a compact AI workstation powered by the NVIDIA GB10 Grace Blackwell Superchip, can reliably hit 50 tokens per second (t/s) when running the Qwen3.5-122B-A3B model. The question is whether the device's roughly 300 GB/s of memory bandwidth and Blackwell architecture can sustain that decode speed for a large Mixture-of-Experts (MoE) model while keeping latency low.
The ASUS Ascent GX10 is a significant milestone for local LLM enthusiasts, putting enterprise-grade Blackwell silicon in a consumer-accessible form factor.
* **Hardware Parity:** Since the GX10 is essentially a rebadged NVIDIA DGX Spark, scripts optimized for the GB10 Superchip are highly likely to run without modification. Thermal management in the ASUS "QuietFlow" chassis may still cause slight performance variance compared to server-grade cooling.
* **MoE Efficiency:** The "A3B" suffix indicates that although the model has 122B total parameters, only about 3B are active per token. Decode speed is limited mainly by how many weight bytes must be read from memory for each token, so this sparse activation keeps per-token traffic small relative to the GX10's 300 GB/s memory bandwidth, making 50 t/s a realistic and possibly conservative target.
* **TTFT Performance:** With up to 1 petaFLOP of FP4 compute (NVIDIA's quoted peak figure), the GX10 should be able to achieve a sub-5-second time to first token (TTFT) for an 8,000-token prompt, since the prefill operation is compute-bound rather than memory-bound.
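
The bandwidth and compute arguments above can be checked with a back-of-envelope estimate. The sketch below is only a rough model, not a benchmark: it assumes every decoded token streams all active weights from memory exactly once, that prefill costs about 2 FLOPs per active parameter per token, and it ignores attention, KV-cache traffic, and routing overhead. The function names and the utilization values are hypothetical choices for illustration; the 3B, 300 GB/s, 1 petaFLOP, and 8,000-token figures are the ones quoted in the post.

```python
def decode_ceiling_tps(active_params: float, bytes_per_param: float,
                       bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode tokens/s if each token reads all active weights once."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bytes_per_s / bytes_per_token


def prefill_seconds(active_params: float, prompt_tokens: int,
                    peak_flops: float, utilization: float) -> float:
    """Rough prefill time: ~2 FLOPs per active parameter per prompt token.

    Attention cost (which grows with prompt length) is deliberately ignored.
    """
    total_flops = 2.0 * active_params * prompt_tokens
    return total_flops / (peak_flops * utilization)


ACTIVE_PARAMS = 3e9      # "A3B": about 3B active parameters per token
BANDWIDTH = 300e9        # bytes/s, the figure quoted in the post
PEAK_FLOPS = 1e15        # 1 petaFLOP, the quoted peak
PROMPT_TOKENS = 8_000

# Decode ceiling at a few weight precisions (bytes per parameter are assumptions).
for label, bytes_per_param in [("16-bit", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    ceiling = decode_ceiling_tps(ACTIVE_PARAMS, bytes_per_param, BANDWIDTH)
    print(f"{label}: ~{ceiling:.0f} t/s bandwidth ceiling")

# Prefill time at two assumed fractions of peak compute actually achieved.
for utilization in (0.05, 0.20):
    seconds = prefill_seconds(ACTIVE_PARAMS, PROMPT_TOKENS, PEAK_FLOPS, utilization)
    print(f"utilization {utilization:.0%}: ~{seconds:.2f} s prefill")
```

Under these assumptions the bandwidth ceiling sits at 50 t/s or above at every precision shown, and prefill stays under 5 seconds even at a small fraction of peak compute. Real throughput will land below the ceilings, so the estimate only shows that the targets are plausible, not that they are met.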
DISCOVERED
2026-04-11
PUBLISHED
2026-04-11
AUTHOR
kuhunaxeyive