Perplexity says ROSE serves most traffic with CuTeDSL

// 45d ago INFRASTRUCTURE

Perplexity’s research post describes how its in-house Runtime-Optimized Serving Engine, ROSE, powers most of the company’s production and API traffic across embeddings, ranking, and trillion-parameter MoE workloads. The team says it adopted CuTeDSL as the primary GPU programming environment to ship specialized kernels faster, improve debugging, and squeeze peak performance out of NVIDIA Hopper and Blackwell systems. The writeup also outlines how Perplexity uses compile-time specialization, JIT compilation, and kernel split strategies for prefill versus decode, and it notes early experiments with having AI help write these kernels.
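As a rough illustration of the compile-time specialization and prefill/decode split mentioned above, the sketch below uses plain Python only. It is not CuTeDSL and not Perplexity's code; the names `specialize_kernel`, `phase`, and `head_dim` are hypothetical. The idea is that a kernel variant is built once per configuration, with its constants baked in, and then reused from a cache.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def specialize_kernel(phase: str, head_dim: int):
    """Build and cache a scoring function specialized for one configuration.

    Hypothetical stand-in for JIT compilation: the cache key plays the role
    of the compile-time parameters, so each variant is built only once.
    """
    if phase not in ("prefill", "decode"):
        raise ValueError(f"unknown phase: {phase}")

    # Constant fixed at specialization time instead of being passed per call.
    scale = head_dim ** -0.5

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    if phase == "prefill":
        # Prefill variant: many query rows at once, returns a score matrix.
        def kernel(q_rows, k_rows):
            return [[scale * dot(q, k) for k in k_rows] for q in q_rows]
    else:
        # Decode variant: a single query row, returns one row of scores.
        def kernel(q_row, k_rows):
            return [scale * dot(q_row, k) for k in k_rows]

    return kernel


# Usage: the second lookup with the same key reuses the cached variant.
decode = specialize_kernel("decode", 4)
prefill = specialize_kernel("prefill", 4)
keys = [[2, 0, 0, 0], [0, 2, 0, 0]]
decode_scores = decode([1, 0, 0, 0], keys)
prefill_scores = prefill([[1, 0, 0, 0]], keys)
```

In a real GPU stack the cached object would be a compiled kernel rather than a Python closure, and the specialization key would typically cover more parameters (data type, tile sizes, target architecture); those details are beyond what the summary states.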

// ANALYSIS

This is a strong signal that infrastructure quality is a product feature at Perplexity, not a back-office concern.

– ROSE looks like a serious internal platform, not a narrow inference wrapper, given it handles embeddings, scoring, MoE routing, and multiple product APIs.
– CuTeDSL is being chosen for iteration speed plus low-level control, which suggests the team values shipping kernel changes quickly without giving up hardware-specific tuning.
– The emphasis on Hopper and Blackwell peak performance makes this a competitive infrastructure story as much as a research story.
– The “AI writing kernels” angle is interesting, but the more important takeaway is that kernel authoring is being treated as an engineering workflow worth automating.

// TAGS

perplexity, rose, cutedsl, inference, gpu, nvidia, kernels, llm, moe, infrastructure

DISCOVERED

45d ago

2026-05-06

PUBLISHED

45d ago

2026-05-06

RELEVANCE

8/10

AUTHOR

AravSrinivas

// KEEP READING

More AI developer news from the feed

UPDATE 37m ago

Tesana AI generates complex game elements

Tesana AI has announced a capability allowing users to generate complex game mechanics, such as a fully functional boss fight for an underwater first-person shooter (FPS), using a single text prompt. This capability is part of their AI-powered game generation platform, which aims to democratize game development by translating natural language prompts into environments, assets, and game logic, potentially reducing the time required to design complex game features from months to minutes.

OPEN SOURCE 1h ago

Anthropic open-sources launch-your-agent skill for Claude Code

Anthropic has released /launch-your-agent, an open-source skill that allows developers to build, deploy, and schedule Claude Managed Agents directly from the Claude Code CLI. Through an interactive terminal interview, the skill scopes a v0 agent, deploys it to the developer's account, and automatically grades its performance.

NEWS 1h ago

Developer criticizes GLM-5.2 agent-loop performance

AI developer EXM7777 shared a critical assessment of the GLM-5.2 model on X, arguing that those praising the model are relying on benchmark cards rather than running it in practical, multi-step agent environments. The critique highlights a gap between the model's reported test-set achievements and its actual usability in production-level developer agent loops.