OPEN_SOURCE
REDDIT · 32d ago · INFRASTRUCTURE
Atlas pushes GB10 inference past 115 tok/s
Atlas, a pure Rust LLM inference engine for NVIDIA DGX Spark and GB10 systems, says its new Qwen3.5-35B container reaches roughly 115 tokens per second with speculative decoding and NVFP4 optimizations. The release matters because it positions Atlas as a faster, OpenAI-compatible alternative to stock vLLM images for local high-end inference workloads.
// ANALYSIS
Atlas is interesting because it is not just another benchmark post — it is an attempt to own the full local inference stack on DGX Spark and turn niche hardware into a serious developer platform.
- The headline claim is a 3.1x speedup over the community-standard vLLM image, which is a big enough jump to matter for anyone serving local models interactively
- Atlas is pitching operational simplicity as much as raw speed: pure Rust, no Python stack, OpenAI-compatible serving, and a container that should be runnable in minutes (see the sketch after this list)
- The roadmap broadens the story beyond one model, with Qwen3.5-122B, Nemotron, ASUS Ascent GX10, and even Strix Halo mentioned as next targets
- The biggest caveat is trust: community reaction on NVIDIA’s forum has already pushed for reproducible benchmarks and open-source code before treating Atlas as a new default
- If the team follows through on broader hardware support and a credible open-source release, Atlas could become one of the more important local inference projects around GB10-class systems
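Because Atlas advertises an OpenAI-compatible API, any standard OpenAI client should be able to talk to the container once it is running. The sketch below is illustrative only: the local port, endpoint path, and model identifier are assumptions rather than values confirmed by the post, so the Atlas container's own documentation is the place to check for the real ones.

```python
# Minimal sketch of calling an OpenAI-compatible local server such as Atlas.
# Assumptions (not confirmed by the post): endpoint http://localhost:8000/v1,
# model id "qwen3.5-35b", and that the local server ignores the API key.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local Atlas endpoint
    api_key="unused",                     # local servers typically accept any key
)

resp = client.chat.completions.create(
    model="qwen3.5-35b",  # assumed identifier for the Qwen3.5-35B container
    messages=[{"role": "user", "content": "Explain speculative decoding in one sentence."}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```

The point of OpenAI compatibility is exactly this: existing clients, SDKs, and tooling built against the hosted API can be repointed at the local container by changing the base URL, without rewriting application code.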
// TAGS
atlas · llm · inference · gpu · self-hosted · api
DISCOVERED
32d ago
2026-03-10
PUBLISHED
36d ago
2026-03-07
RELEVANCE
8/10
AUTHOR
Live-Possession-6726