OPEN_SOURCE
YT · YOUTUBE // MODEL RELEASE
Mercury 2 hits 1,000 tok/sec
Inception Labs has launched Mercury 2, a diffusion-based reasoning LLM that generates text through parallel refinement rather than sequential next-token decoding. The pitch is simple but important for production teams: 1,009 tokens/sec on NVIDIA Blackwell GPUs, a 128K context window, native tool use, structured JSON output, and an OpenAI-compatible API aimed at latency-sensitive AI workloads.
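Because the API is advertised as OpenAI-compatible, adoption should mostly mean repointing an existing client. A minimal sketch using the official openai Python SDK follows; the base URL and model identifier are assumptions for illustration, not confirmed values from Inception Labs' documentation:

```python
# Minimal sketch of hitting an OpenAI-compatible endpoint with the official
# openai Python SDK. The base_url and model name are hypothetical; substitute
# the real values from Inception Labs' docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # hypothetical model identifier
    messages=[
        {
            "role": "user",
            "content": "Return a JSON object with keys 'summary' and 'severity' for this stack trace: ...",
        }
    ],
    # Mirrors OpenAI's JSON mode; the launch claims structured JSON output,
    # though the exact flag Mercury 2 supports is an assumption here.
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```

If the compatibility claim holds, the only diff against an existing OpenAI integration is the constructor arguments, which is exactly the low-friction migration story the launch is selling.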
// ANALYSIS
Mercury 2 is one of the clearest shots yet at the autoregressive status quo: if the quality holds up in real workloads, speed stops being a UX tax and starts becoming a product advantage.
- The real story is not just headline throughput: diffusion-based generation changes the latency curve for agent loops, coding copilots, voice systems, and RAG pipelines (see the pass-count sketch after this list).
- Inception is positioning Mercury 2 as a drop-in API replacement, which lowers adoption friction for teams already built around OpenAI-style tooling.
- The model looks strongest for structured output, search, real-time interaction, and coding assistance, where sub-second responsiveness matters more than squeezing out every last point of frontier reasoning quality.
- This launch also puts pressure on mainstream model vendors to show better speed-quality tradeoffs, not just bigger benchmark numbers.
- Outside commentary already frames Mercury 2 as part of a likely hybrid future, where diffusion models handle fast draft generation and slower autoregressive models handle high-stakes refinement; a minimal version of that pattern is sketched below.
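The latency argument in the first bullet can be made concrete with a toy pass-count model. This is an illustration, not Mercury 2's actual decoding procedure: the fixed step count of 16 is an arbitrary assumption, and real diffusion LMs may scale refinement steps with quality targets.

```python
# Toy illustration (not Mercury 2's actual algorithm) of why parallel
# refinement changes the latency curve. An autoregressive decoder needs one
# forward pass per generated token; a diffusion-style decoder refines every
# position in parallel over a fixed number of denoising steps.

def autoregressive_passes(num_tokens: int) -> int:
    # One model forward pass per generated token.
    return num_tokens

def diffusion_passes(num_tokens: int, denoise_steps: int = 16) -> int:
    # All positions update together, so the pass count tracks the number of
    # refinement steps rather than the sequence length.
    return denoise_steps

for n in (64, 512, 4096):
    print(f"{n:5d} tokens: autoregressive={autoregressive_passes(n):5d} passes, "
          f"diffusion={diffusion_passes(n):3d} passes")
```

Under this simplification, autoregressive cost grows linearly with output length while the diffusion side stays flat, which is the structural reason speed could become a product advantage for long agent loops rather than just a benchmark number.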
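The hybrid pattern from the last bullet is speculation from outside commentary, but it is easy to sketch. Everything below is hypothetical: the model names are placeholders, and both clients are assumed to be constructed as in the OpenAI-compatible example above.

```python
# Hypothetical draft-then-refine pipeline: a fast diffusion model drafts,
# a slower autoregressive model refines. Model names are placeholders.

def hybrid_answer(fast_client, slow_client, prompt: str) -> str:
    # Fast, cheap draft from the diffusion model.
    draft = fast_client.chat.completions.create(
        model="mercury-2",  # hypothetical fast drafter
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    # Slower, higher-stakes refinement pass.
    refined = slow_client.chat.completions.create(
        model="frontier-autoregressive-model",  # placeholder name
        messages=[{
            "role": "user",
            "content": f"Improve this draft answer for correctness and clarity:\n\n{draft}",
        }],
    ).choices[0].message.content
    return refined
```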
// TAGS
mercury-2 · llm · reasoning · inference · api
DISCOVERED
2026-03-06
PUBLISHED
2026-03-06
RELEVANCE
10 / 10
AUTHOR
AI Revolution