cuBLAS bug hits RTX 5090 performance
NVIDIA's cuBLAS library contains a critical dispatcher bug that slashes RTX 5090 performance by 60% in batched FP32 workloads. The software fails to escalate to optimized kernels on Blackwell consumer hardware, trapping the high-end GPU in inefficient legacy execution paths while professional counterparts receive proper heuristics.
The regression amounts to a "silent tax" on consumer hardware. The cuBLAS dispatcher for sm_120 fails to select larger tile sizes on RTX GPUs, defaulting to tiny 128x32 kernels that leave the chip underutilized. A custom kernel of roughly 300 lines that leverages the Tensor Memory Accelerator (TMA) beats cuBLAS by up to 70%. The failure mirrors Pascal-era bugs in which consumer cards were routed through Maxwell kernels, suggesting persistent legacy debt in NVIDIA's dispatch logic. While the tensor-core FP16 paths remain fast, the FP32 SGEMM regression hits scientific computing and non-tensor ML workloads hard.
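The batched FP32 path described above can be checked with a small timing harness around cuBLAS's strided-batched SGEMM. This is a minimal sketch, not the article's benchmark: the matrix size (1024x1024), batch count (64), and the derived TFLOP/s figure are illustrative assumptions, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    // Hypothetical workload: a batch of 64 square FP32 GEMMs, n = 1024.
    const int n = 1024, batch = 64;
    const long long stride = (long long)n * n;
    float *A, *B, *C;
    cudaMalloc(&A, sizeof(float) * stride * batch);
    cudaMalloc(&B, sizeof(float) * stride * batch);
    cudaMalloc(&C, sizeof(float) * stride * batch);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;

    // Warm-up call so the timed run measures the steady-state kernel choice.
    cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n, n, &alpha,
                              A, n, stride, B, n, stride,
                              &beta, C, n, stride, batch);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    cublasSgemmStridedBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                              n, n, n, &alpha,
                              A, n, stride, B, n, stride,
                              &beta, C, n, stride, batch);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    // Each GEMM performs 2*n^3 FLOPs; scale by the batch count.
    double tflops = 2.0 * n * n * n * (double)batch / (ms * 1e-3) / 1e12;
    printf("batched FP32 SGEMM: %.2f ms, %.2f TFLOP/s\n", ms, tflops);

    cublasDestroy(handle);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Profiling the same binary (for example with Nsight Systems or Nsight Compute) reveals which SGEMM tile variant cuBLAS actually dispatched; on sm_120 consumer parts the article reports a 128x32 tile where a larger tile was expected.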
DISCOVERED: 2026-04-10
PUBLISHED: 2026-04-10
AUTHOR: NoVibeCoding