ASUS Ascent GX10 hits 50 t/s inference
REDDIT · 1d ago · BENCHMARK RESULT

A Reddit user is asking whether the ASUS Ascent GX10, a compact AI workstation built on the NVIDIA GB10 Grace Blackwell Superchip, can reliably sustain 50 tokens per second (t/s) when running the Qwen3.5-122B-A3B model. The open question is whether the device's roughly 300 GB/s of memory bandwidth (NVIDIA's published figure for the GB10 is 273 GB/s) and Blackwell architecture can hold that decode speed for a large Mixture-of-Experts (MoE) model while keeping latency low.

// ANALYSIS

The ASUS Ascent GX10 is a significant milestone for local LLM enthusiasts, putting enterprise-grade Blackwell silicon in a consumer-accessible form factor.

* **Hardware Parity:** The GX10 is essentially a rebadged NVIDIA DGX Spark built on the same GB10 Superchip, so scripts optimized for the GB10 are highly likely to run without modification. Thermal management in the ASUS "QuietFlow" chassis may still cause slight performance variance compared to server-grade cooling.

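* **MoE Efficiency:** The "A3B" suffix indicates that although the model has 122B total parameters, only about 3B are active per token. Token generation is typically memory-bandwidth-bound, so the per-token cost scales with the active weights that must be read rather than with the full 122B. At 4- to 8-bit quantization that is roughly 1.5 to 3 GB per token, which leaves headroom under the GX10's 300 GB/s memory bandwidth and makes 50 t/s a realistic, arguably conservative, target.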

* **TTFT Performance:** With up to 1 petaFLOP of peak FP4 compute, the GX10 should comfortably achieve a sub-5-second time to first token (TTFT) for an 8,000-token prompt, because the prefill operation is compute-bound rather than memory-bound.
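
The bandwidth and compute arguments in the bullets above can be checked with back-of-envelope arithmetic. The sketch below is a rough estimate, not a benchmark. It takes the 3B active parameters, 300 GB/s, 1 petaFLOP, and 8,000-token figures from this summary. The bytes-per-parameter values, the 10% compute utilization, and the rule of thumb of about 2 FLOPs per active parameter per token are assumptions, and the function names are hypothetical. It ignores KV-cache traffic, attention cost, and expert-routing overhead.

```python
# Back-of-envelope estimates; all inputs are assumptions or figures quoted in the summary.

ACTIVE_PARAMS = 3e9        # "A3B": ~3B parameters active per token (from the summary)
BANDWIDTH_BPS = 300e9      # 300 GB/s memory bandwidth (from the summary)
PEAK_FLOPS = 1e15          # 1 petaFLOP peak compute (from the summary)
PROMPT_TOKENS = 8000       # prompt length used for the TTFT claim


def decode_ceiling_tps(active_params, bytes_per_param, bandwidth_bps):
    """Upper bound on decode speed if each token must read all active weights once."""
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_bps / bytes_per_token


def prefill_seconds(active_params, prompt_tokens, peak_flops, utilization):
    """Rough prefill time using ~2 FLOPs per active parameter per token."""
    total_flops = 2 * active_params * prompt_tokens
    return total_flops / (peak_flops * utilization)


if __name__ == "__main__":
    # Assumed weight precisions: 4-bit, 8-bit, and 16-bit.
    for label, bytes_per_param in [("4-bit", 0.5), ("8-bit", 1.0), ("16-bit", 2.0)]:
        tps = decode_ceiling_tps(ACTIVE_PARAMS, bytes_per_param, BANDWIDTH_BPS)
        print(f"{label} weights: decode ceiling ~{tps:.0f} t/s")

    # Assumed 10% of peak compute actually achieved during prefill.
    ttft = prefill_seconds(ACTIVE_PARAMS, PROMPT_TOKENS, PEAK_FLOPS, 0.10)
    print(f"prefill for {PROMPT_TOKENS} tokens: ~{ttft:.2f} s")
```

Under these assumptions the decode ceiling works out to about 200, 100, and 50 t/s for 4-, 8-, and 16-bit weights respectively, and prefill for 8,000 tokens takes about 0.48 s at 10% utilization. Real throughput will land below these ceilings, so the 50 t/s target depends heavily on the quantization used.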

// TAGS
asus-ascent-gx10 · qwen3-5 · nvidia-blackwell · local-llm · moe · inference · hardware · grace-superchip

DISCOVERED
2026-04-11 (1d ago)

PUBLISHED
2026-04-11 (1d ago)

RELEVANCE
9/10

AUTHOR
kuhunaxeyive