Perplexity says ROSE serves most traffic with CuTeDSL
X · 4h ago · INFRASTRUCTURE

Perplexity’s research post describes how its in-house Runtime-Optimized Serving Engine, ROSE, powers most of the company’s production and API traffic across embeddings, ranking, and trillion-parameter MoE workloads. The team says it adopted CuTeDSL as the primary GPU programming environment to ship specialized kernels faster, improve debugging, and squeeze peak performance out of NVIDIA Hopper and Blackwell systems. The writeup also outlines how Perplexity uses compile-time specialization, JIT compilation, and kernel split strategies for prefill versus decode, and it notes early experiments with having AI help write these kernels.
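To make the combination of compile-time specialization, JIT compilation, and separate prefill and decode kernels more concrete, here is a minimal pure-Python sketch. It is not Perplexity's code and does not use the CuTeDSL API. The names build_kernel and run_attention_scale, the tile sizes, and the scaling operation are all hypothetical, chosen only to show the pattern: build one variant per (phase, head_dim) on first use with the constants baked in, cache it, and dispatch on whether the request is prefill or decode.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def build_kernel(phase: str, head_dim: int):
    """Build (once) a function specialized for a phase and head dimension.

    The scale constant is baked into generated source, so the built function
    has no runtime branch on it. lru_cache plays the role of a JIT cache:
    each (phase, head_dim) pair is compiled only on first use.
    """
    # Hypothetical tile sizes: prefill processes many tokens, decode one.
    tile_tokens = 128 if phase == "prefill" else 1
    scale = head_dim ** -0.5  # standard attention scaling factor

    src = (
        "def kernel(scores):\n"
        f"    # specialized for phase={phase!r}, head_dim={head_dim}\n"
        f"    return [s * {scale!r} for s in scores]\n"
    )
    namespace = {}
    exec(src, namespace)  # stand-in for a real JIT compile step
    fn = namespace["kernel"]
    fn.tile_tokens = tile_tokens  # attach metadata for inspection
    return fn


def run_attention_scale(scores, num_new_tokens: int, head_dim: int):
    """Pick the prefill or decode variant and run it on the scores."""
    phase = "prefill" if num_new_tokens > 1 else "decode"
    kernel = build_kernel(phase, head_dim)
    return phase, kernel(scores)
```

Calling run_attention_scale([8.0, 16.0], num_new_tokens=2, head_dim=64) selects the prefill variant, while num_new_tokens=1 selects the decode variant. Repeated calls with the same phase and head_dim reuse the cached function, which can be checked with build_kernel.cache_info(). A real serving engine would key the cache on many more properties (dtype, GPU architecture, layout), but the dispatch-and-cache structure is the same idea.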

// ANALYSIS

This is a strong signal that infrastructure quality is a product feature at Perplexity, not a back-office concern.

  • ROSE looks like a serious internal platform, not a narrow inference wrapper, given it handles embeddings, scoring, MoE routing, and multiple product APIs.
  • CuTeDSL is being chosen for iteration speed plus low-level control, which suggests the team values shipping kernel changes quickly without giving up hardware-specific tuning.
  • The emphasis on Hopper and Blackwell peak performance makes this a competitive infrastructure story as much as a research story.
  • The “AI writing kernels” angle is interesting, but the more important takeaway is that kernel authoring is being treated as an engineering workflow worth automating.

// TAGS

perplexity, rose, cutedsl, inference, gpu, nvidia, kernels, llm, moe, infrastructure

DISCOVERED

4h ago

2026-05-06

PUBLISHED

5h ago

2026-05-06

RELEVANCE

8/10

AUTHOR

AravSrinivas