Nemotron 120B hits 17 tok/s on AMD Strix Halo

// BENCHMARK RESULT · 124d ago

Early benchmarks show NVIDIA's Nemotron-3 Super 120B-A12B model running at 14-17 tokens per second on AMD's flagship Strix Halo platform. With 128GB of unified LPDDR5x memory shared with the Radeon 8060S iGPU, the mobile workstation chip can handle context windows of up to 384k tokens, a workload that previously required multi-GPU server setups.

// ANALYSIS

Strix Halo is proving to be a genuine Apple Silicon competitor for local LLM workloads, breaking the 24GB VRAM barrier for PC laptops. Its unified memory architecture lets the iGPU address nearly the entire 128GB RAM pool, which makes 120B+ models viable on a single mobile chip. A peak of 17 tok/s on a 120B MoE model puts it in direct competition with high-end Mac Studio (M2/M3 Ultra) configurations.
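The arithmetic behind that claim can be sketched in a few lines. This is a rough back-of-envelope estimate, not a measurement. The 120B total and 128GB figures come from the post. Everything else is an assumption introduced here for illustration: that "A12B" means about 12B active parameters per token, a 4.5 bits-per-weight quantization, and a 256 GB/s theoretical peak memory bandwidth. The function names are hypothetical.

```python
# Back-of-envelope sizing for a large MoE model on a unified-memory machine.
# All constants marked "assumption" are illustrative, not taken from the benchmark.

def weight_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return total_params_b * bits_per_weight / 8


def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Bandwidth-bound upper limit on decode speed.

    Assumes each generated token reads every active weight exactly once
    and ignores KV-cache traffic and compute time.
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


TOTAL_PARAMS_B = 120.0     # from the model name (120B)
UNIFIED_MEMORY_GB = 128.0  # from the post
ACTIVE_PARAMS_B = 12.0     # assumption: "A12B" read as 12B active parameters
BITS_PER_WEIGHT = 4.5      # assumption: 4-bit quantization plus overhead
BANDWIDTH_GB_S = 256.0     # assumption: theoretical peak, not measured

footprint = weight_footprint_gb(TOTAL_PARAMS_B, BITS_PER_WEIGHT)
ceiling = decode_ceiling_tok_s(ACTIVE_PARAMS_B, BITS_PER_WEIGHT, BANDWIDTH_GB_S)

print(f"weights: ~{footprint:.1f} GB of {UNIFIED_MEMORY_GB:.0f} GB unified memory")
print(f"bandwidth-bound decode ceiling: ~{ceiling:.1f} tok/s")
```

Under these assumptions the weights take roughly 67.5 GB, well above 24GB but comfortably inside the 128GB pool. The decode ceiling comes out near 38 tok/s, so the reported 14-17 tok/s sits below the theoretical limit, as real throughput normally does. Changing the quantization or bandwidth constants shifts both numbers proportionally.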

// TAGS

amd, ryzen-ai-max-plus-395, strix-halo, gpu, llm, local-llm, open-weights, nemotron

DISCOVERED

124d ago

2026-03-23

PUBLISHED

124d ago

2026-03-22

RELEVANCE

8/10

AUTHOR

Mediocre_Paramedic22
