OPEN_SOURCE
LOBSTERS // 40d ago · PRODUCT LAUNCH
Mercury diffusion coder models hit 1,109 tok/s
Inception Labs’ Mercury paper introduces diffusion-based coding LLMs (Mini and Small) that generate tokens in parallel, reporting 1,109 and 737 tokens/sec respectively on NVIDIA H100 GPUs. The paper claims up to 10x throughput gains over speed-optimized autoregressive models while staying competitive on coding quality benchmarks and Copilot Arena.
// ANALYSIS
This is a serious attempt to break the autoregressive latency ceiling for coding assistants, and the speed-quality tradeoff looks compelling if independent, real-world evaluations hold up.
- The key technical bet is parallel denoising over discrete tokens, which attacks the serial decode bottleneck directly (see the sketch after this list).
- Reported throughput numbers are large enough to materially change UX for autocomplete, agent loops, and iterative coding chat.
- Quality claims are strong but still benchmark-heavy, so production reliability across messy enterprise codebases is the next proof point.
- If diffusion LLM serving matures, incumbent “fast” autoregressive coding models could face real pricing and latency pressure.
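To make the parallel-denoising bet concrete, here is a minimal sketch of generation by iterative unmasking, assuming a MaskGIT-style confidence schedule. Mercury’s actual sampler is not public; `denoiser`, `MASK_ID`, `SEQ_LEN`, and `NUM_STEPS` are illustrative assumptions, not Inception Labs’ API.

```python
import torch

MASK_ID = 0       # assumed special mask/"noise" token id
SEQ_LEN = 64      # tokens generated per block
NUM_STEPS = 8     # denoising steps; each step refines every position at once

def generate(denoiser, prompt_ids: torch.Tensor) -> torch.Tensor:
    # Start from a fully masked (maximally noised) sequence.
    x = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long)
    for step in range(NUM_STEPS):
        # One forward pass scores every position in parallel -- unlike
        # autoregressive decode, which needs one pass per emitted token.
        logits = denoiser(prompt_ids, x)        # (1, SEQ_LEN, vocab)
        conf, pred = logits.softmax(-1).max(-1)
        # Commit the most confident positions; leave the rest masked so
        # later steps can still revise them.
        k = SEQ_LEN * (step + 1) // NUM_STEPS   # unmasking budget grows
        keep = conf.topk(k, dim=-1).indices
        x = torch.full_like(x, MASK_ID)
        x.scatter_(1, keep, pred.gather(1, keep))
    return x
```

With these illustrative numbers, the loop costs 8 forward passes for 64 tokens where an autoregressive decoder would pay 64, which is the source of the claimed throughput gap; the open question is whether quality survives that parallelism on real codebases.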
// TAGS
mercury-coder · llm · ai-coding · inference · research
DISCOVERED
2026-03-03 (40d ago)
PUBLISHED
2026-02-25 (46d ago)
RELEVANCE
9/10