ASUS Ascent GX10 hits 50 t/s inference

// 105d ago · BENCHMARK RESULT

A Reddit user is exploring whether the ASUS Ascent GX10, a compact AI workstation powered by the NVIDIA GB10 Grace Blackwell Superchip, can reliably hit 50 tokens per second (t/s) when running the Qwen3.5-122B-A3B model. The question is whether the device's roughly 300 GB/s of memory bandwidth and Blackwell architecture can sustain that speed for large Mixture-of-Experts (MoE) models while keeping latency low.

// ANALYSIS

The ASUS Ascent GX10 is a notable milestone for local LLM enthusiasts, putting Blackwell silicon in a compact, consumer-accessible form factor.

* **Hardware Parity:** Since the GX10 is built on the same GB10 Superchip as NVIDIA's DGX Spark, scripts optimized for that platform are highly likely to run without modification, though thermal management in the ASUS "QuietFlow" chassis may cause slight performance variance under sustained load.

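* **MoE Efficiency:** The "A3B" suffix indicates that while the model has 122B total parameters, only about 3B are active per token. Token generation is typically limited by memory bandwidth, and sparse activation means each token needs to stream only the active weights rather than all 122B. On the GX10's roughly 300 GB/s of memory bandwidth, that makes 50 t/s a realistic, and arguably conservative, target, although the achieved rate will still depend on quantization and context length.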

* **TTFT Performance:** With a quoted peak of roughly 1 petaFLOP of low-precision (FP4) compute, the GX10 should reach a sub-5-second time to first token (TTFT) for an 8,000-token prompt, since prefill is compute-bound rather than memory-bound.
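
The two estimates above can be checked with back-of-the-envelope arithmetic. The sketch below is a rough roofline-style calculation, not a benchmark. It assumes that decoding streams the active weights once per token and that prefill costs about 2 FLOPs per active parameter per prompt token. The function names, the bytes-per-parameter values, and the utilization figures are illustrative assumptions. The 3B active parameters, 300 GB/s, 1 petaFLOP, and 8,000-token figures are taken from the text above.

```python
def decode_ceiling_tps(active_params: float, bytes_per_param: float,
                       bandwidth_gbs: float) -> float:
    """Upper bound on decode tokens/s if each token streams the active weights once.

    Ignores KV-cache reads, routing overhead, and kernel inefficiency,
    so real throughput will be lower.
    """
    bytes_per_token = active_params * bytes_per_param
    return bandwidth_gbs * 1e9 / bytes_per_token


def prefill_seconds(active_params: float, prompt_tokens: int,
                    peak_flops: float, utilization: float) -> float:
    """Rough prefill time, assuming ~2 FLOPs per active parameter per token.

    Ignores attention FLOPs, which grow with prompt length.
    """
    total_flops = 2.0 * active_params * prompt_tokens
    return total_flops / (peak_flops * utilization)


# Figures quoted in the article.
ACTIVE_PARAMS = 3e9      # "A3B": ~3B active parameters per token
BANDWIDTH_GBS = 300.0    # memory bandwidth in GB/s
PEAK_FLOPS = 1e15        # ~1 petaFLOP quoted peak
PROMPT_TOKENS = 8000

# Bytes per parameter for a few weight precisions (assumed, for illustration).
for label, bytes_per_param in [("4-bit", 0.5), ("8-bit", 1.0), ("16-bit", 2.0)]:
    ceiling = decode_ceiling_tps(ACTIVE_PARAMS, bytes_per_param, BANDWIDTH_GBS)
    print(f"{label} weights: decode ceiling ~{ceiling:.0f} t/s")

# Hypothetical fractions of peak compute actually achieved during prefill.
for utilization in (0.05, 0.20):
    seconds = prefill_seconds(ACTIVE_PARAMS, PROMPT_TOKENS, PEAK_FLOPS, utilization)
    print(f"{utilization:.0%} of peak: prefill ~{seconds:.2f} s")
```

Under these assumptions the bandwidth ceiling is about 200 t/s with 4-bit weights and about 50 t/s with 16-bit weights, so whether 50 t/s is comfortable depends heavily on quantization. The prefill estimate stays under one second even at 5% of peak compute, which is consistent with the sub-5-second TTFT claim.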

// TAGS

asus-ascent-gx10, qwen3-5, nvidia-blackwell, local-llm, moe, inference, hardware, grace-superchip

DISCOVERED

105d ago

2026-04-11

PUBLISHED

105d ago

2026-04-11

RELEVANCE

9/10

AUTHOR

kuhunaxeyive
