Mercury 2 enters production on Baseten

// 2h ago | MODEL RELEASE

Mercury 2 is a proprietary diffusion large language model (LLM) developed by Inception Labs that generates tokens in parallel to achieve inference speeds exceeding 1,000 tokens per second on NVIDIA GPUs. Designed for high-speed reasoning and agentic workflows, the model is now available in production on Baseten, with early adopters like Augment Code reporting a 90% reduction in inference costs.

// ANALYSIS

Diffusion-based text generation produces many tokens in parallel rather than one at a time, which sidesteps the sequential bottleneck of autoregressive decoding and could make real-time agentic reasoning practical at scale.

* Generating 1,000+ tokens per second on NVIDIA GPUs lowers the latency floor for multi-agent reasoning loops, where each step must wait for the previous step's output.

* The 90% inference-cost reduction reported by Augment Code, if it holds for other workloads, makes high-frequency model calls economically viable for enterprise coding and reasoning applications.

* Deployment on Baseten simplifies the production serving and scaling process for developer integrations.
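
The arithmetic behind the first two points can be sketched in a few lines of Python. This is a rough illustration and not a benchmark: only the 1,000 tokens-per-second and 90% figures come from the summary above, while the step count, tokens per step, baseline throughput, and baseline cost are hypothetical values chosen to make the numbers easy to follow.

```python
def loop_generation_seconds(steps: int, tokens_per_step: int,
                            tokens_per_second: float) -> float:
    """Generation time for a sequential agent loop, ignoring network,
    prompt processing, and tool-call overhead."""
    return steps * tokens_per_step / tokens_per_second


def cost_after_reduction(baseline_cost: float, reduction: float) -> float:
    """Cost remaining after a fractional reduction (0.90 means 90% cheaper)."""
    return baseline_cost * (1.0 - reduction)


# Hypothetical workload: 10 sequential agent steps, 500 output tokens each.
STEPS = 10
TOKENS_PER_STEP = 500

# 1,000 tokens/s is the figure quoted in the summary.
fast_seconds = loop_generation_seconds(STEPS, TOKENS_PER_STEP, 1000.0)

# 100 tokens/s is an assumed baseline for comparison, not a measured number.
slow_seconds = loop_generation_seconds(STEPS, TOKENS_PER_STEP, 100.0)

# 100.0 is an assumed baseline spend in arbitrary currency units.
remaining_cost = cost_after_reduction(100.0, 0.90)

print(f"generation time at 1,000 tok/s: {fast_seconds:.1f} s")
print(f"generation time at assumed 100 tok/s baseline: {slow_seconds:.1f} s")
print(f"cost after a 90% reduction on 100 units: {remaining_cost:.1f}")
```

Because each step in an agent loop waits on the previous one, generation time scales linearly with the number of steps, so a throughput gain applies to the whole loop rather than to a single call.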

// TAGS
llm, diffusion-model, reasoning, artificial-intelligence, baseten, inception-ai

DISCOVERED: 2h ago (2026-06-11)

PUBLISHED: 2h ago (2026-06-11)

RELEVANCE: 8/10

AUTHOR: _inception_ai