OPEN_SOURCE
REDDIT · 13d ago · BENCHMARK RESULT
GPT-OSS 120B boosts throughput on DGX Spark
OpenAI's GPT-OSS 120B is being benchmarked on NVIDIA's DGX Spark, and the thread is really about serving-stack efficiency rather than model quality. The OP reports about 32 tps in vLLM on a Q4_K_S build, while commenters say native MXFP4 with llama.cpp should push it much closer to 50-60 tps.
// ANALYSIS
This looks more like a stack mismatch than a hard hardware ceiling. GPT-OSS 120B is sparse, open-weight, and native MXFP4, so the fastest path is usually to respect the model's format and let the runtime/kernel stack do the heavy lifting.
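As a sketch of the "respect the model's format" path, this is roughly what a native-MXFP4 llama.cpp serving invocation looks like. The model filename is hypothetical, and flag spellings should be checked against your llama.cpp build; this is a config sketch, not a verified recipe.

```shell
# Assumption: a GGUF export of GPT-OSS 120B in its native MXFP4 format,
# served with llama.cpp's llama-server. Verify flag names for your build.
llama-server \
  -m gpt-oss-120b-mxfp4.gguf \
  -ngl 99 \
  --flash-attn \
  -c 8192
```

The key choice is skipping a re-quantization to Q4_K_S entirely: the weights stay in the format the model shipped in, and the runtime's MXFP4 kernels do the work.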
- OpenAI says GPT-OSS 120B has 117B total parameters but only 5.1B active per token, so per-token decode is dominated by memory movement and kernel efficiency.
- The thread's own numbers line up with that story: ~32 tps in vLLM/Q4_K_S, roughly 50 tps after switching to llama.cpp/MXFP4, and one reply expecting around 60 tps on DGX Spark.
- NVIDIA's DGX Spark blog says llama.cpp optimizations have lifted performance by about 35% on average, reinforcing that runtime choice is the biggest lever.
- If you care about response quality, the win is native precision plus flash attention, batching, and context tuning, not a harsher quant.
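A quick back-of-envelope check makes the "stack mismatch, not hardware ceiling" reading concrete. Assuming DGX Spark's quoted ~273 GB/s unified-memory bandwidth and ~4.25 bits per weight for MXFP4 (4-bit values plus block-scale overhead; both figures are assumptions, not from the thread), the bandwidth-bound decode ceiling for 5.1B active parameters is:

```python
# Back-of-envelope decode-rate ceiling for a sparse MoE model.
# Assumed inputs (not from the Reddit thread):
#   - DGX Spark unified memory bandwidth: ~273 GB/s
#   - MXFP4 effective size: ~4.25 bits/weight including scales
ACTIVE_PARAMS = 5.1e9          # active parameters per decoded token
BITS_PER_WEIGHT = 4.25         # MXFP4 payload + per-block scale overhead
BANDWIDTH_BPS = 273e9          # assumed memory bandwidth in bytes/s

bytes_per_token = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8   # weight bytes read per token
ceiling_tps = BANDWIDTH_BPS / bytes_per_token
print(f"memory-bandwidth ceiling: ~{ceiling_tps:.0f} tps")
```

Under these assumptions the ceiling comes out near 100 tps, so the ~50-60 tps people expect from llama.cpp/MXFP4 is a plausible real-world fraction of it, while the ~32 tps vLLM/Q4_K_S figure sits far enough below to suggest kernel or runtime overhead rather than a size problem (Q4_K_S is roughly the same bits-per-weight as MXFP4).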
// TAGS
gpt-oss-120b · dgx-spark · open-weights · inference · gpu · benchmark · llm
DISCOVERED
13d ago
2026-03-29
PUBLISHED
13d ago
2026-03-29
RELEVANCE
8/10
AUTHOR
AdamLangePL