Inception's Mercury 2, the first commercial-scale reasoning diffusion LLM, is now available for production deployment on Baseten.

// 45d ago · PRODUCT LAUNCH


Baseten has announced that Inception's Mercury 2 is now live on its platform, making Baseten the first inference platform to deliver production-grade reasoning diffusion LLMs (dLLMs) to developers. Unlike traditional autoregressive models, which generate tokens one at a time, Mercury 2 uses a diffusion architecture to generate and refine multiple tokens in parallel, enabling speeds of over 1,000 tokens per second on widely deployed NVIDIA GPUs. Partners such as Augment Code have already deployed Mercury 2 in production, reporting a 90% reduction in inference costs and an 82% drop in latency for critical workloads while maintaining quality comparable to speed-optimized models like Claude 3 Haiku and GPT-5 mini.
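To put the headline figures in concrete terms, here is a minimal arithmetic sketch. The 1,000 tokens per second rate and the 90% and 82% reductions come from the announcement; the 500-token response length and the baseline latency and cost values are hypothetical, chosen only for illustration.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady throughput."""
    return num_tokens / tokens_per_second


def apply_reduction(baseline: float, reduction_pct: float) -> float:
    """Value that remains after a percentage reduction from a baseline."""
    return baseline * (1 - reduction_pct / 100)


if __name__ == "__main__":
    # Hypothetical 500-token response at the quoted 1,000 tokens/s.
    print(generation_seconds(500, 1_000))        # 0.5 seconds

    # Hypothetical baselines: 2.0 s latency and $100 of inference spend.
    print(round(apply_reduction(2.0, 82), 2))    # 0.36 seconds after an 82% drop
    print(round(apply_reduction(100.0, 90), 2))  # 10.0 dollars after a 90% cut
```

Because the announcement quotes "over" 1,000 tokens per second, the 0.5-second figure is an upper bound for that hypothetical response.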

// ANALYSIS

Diffusion architectures are a fundamental shift away from the token-by-token bottleneck of autoregressive LLMs, and Mercury 2's throughput on widely deployed NVIDIA GPUs suggests that raw speed does not require specialized custom silicon.

- **Parallel refinement**: By drafting the output and refining it over parallel passes, dLLMs avoid the one-token-per-step constraint of sequential generation. The speedup comes from the architecture itself rather than from decoding-time patches.
- **Large cost and latency benefits**: Early production metrics from Augment Code (90% cost reduction, 82% latency drop) indicate that dLLMs can substantially improve the economics of high-throughput agent loops.
- **Suited to targeted agentic tasks**: Mercury 2 is not a replacement for high-intelligence frontier models, but its speed makes it a strong fit for sub-second tasks such as tool routing, code completion, and real-time voice agents.
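The parallel-refinement point can be illustrated with a toy sketch that counts forward passes. This is not Mercury 2's actual algorithm, which the announcement does not detail. It assumes a simplified scheme in which every position starts masked and each pass commits the highest-confidence guesses. `toy_predict`, `tokens_per_pass`, and the random confidences are hypothetical stand-ins for a real model.

```python
import random

MASK = None  # placeholder for a position that has not been committed yet


def toy_predict(target, draft, rng):
    """Hypothetical stand-in for one model forward pass.

    Returns a (token, confidence) guess for every masked position at once.
    A real dLLM would compute these with a neural network; here the guess
    is simply the target token paired with a random confidence.
    """
    return {i: (target[i], rng.random())
            for i, tok in enumerate(draft) if tok is MASK}


def autoregressive_decode(target):
    """One forward pass per token, strictly left to right."""
    out, passes = [], 0
    for tok in target:
        out.append(tok)  # each token waits for the one before it
        passes += 1
    return out, passes


def parallel_refine_decode(target, tokens_per_pass, seed=0):
    """Commit the most confident masked positions on each pass."""
    rng = random.Random(seed)
    draft = [MASK] * len(target)
    passes = 0
    while any(tok is MASK for tok in draft):
        guesses = toy_predict(target, draft, rng)
        # Rank masked positions by confidence and keep the top few.
        best = sorted(guesses, key=lambda i: guesses[i][1], reverse=True)
        for i in best[:tokens_per_pass]:
            draft[i] = guesses[i][0]
        passes += 1
    return draft, passes


if __name__ == "__main__":
    target = "route this request to the search tool".split()  # 7 tokens
    _, ar_passes = autoregressive_decode(target)
    _, pr_passes = parallel_refine_decode(target, tokens_per_pass=3)
    print(ar_passes, pr_passes)  # 7 3
```

In this toy, the sequential decoder needs one pass per token, while the parallel decoder needs only as many passes as it takes to commit every position. Real systems trade the number of passes against output quality, so the ratio here is illustrative rather than a performance claim.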

// TAGS

baseten, inception-labs, mercury-2, diffusion-llm, dllm, ai-inference, machine-learning, cloud-infrastructure

DISCOVERED

45d ago

2026-06-15

PUBLISHED

45d ago

2026-06-15

RELEVANCE

8/10

AUTHOR

phylera14
