OPEN_SOURCE
REDDIT // 14d ago · RESEARCH PAPER
LLMs top radiologists without seeing images
Stanford researchers found that LLMs outperform radiologists by 10% on medical imaging benchmarks even when the images are withheld. The models act as "superhuman guessers" by exploiting clinical context, revealing a fundamental flaw in current multimodal evaluation methods.
// ANALYSIS
This study exposes a massive "shortcut" in medical AI: models are often just very good at medical trivia rather than visual diagnosis.
- Qwen 2.5 reached the top of a chest X-ray leaderboard without looking at a single image, even on private datasets.
- The "superhuman" performance suggests LLMs capture subtle clinical correlations in text that human experts typically overlook.
- For developers, this highlights the necessity of "blind" control tests to ensure multimodal models are actually performing visual reasoning (see the sketch after this list).
- Results challenge existing benchmarks that don't account for text-based leakage.
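A minimal form of such a blind control, sketched in Python below: run the benchmark twice, once with the image and once with it withheld, and compare accuracies. The predict wrapper and case fields are hypothetical placeholders for your own harness, not anything from the study.

# Blind-control sketch: evaluate a multimodal model with and without
# images. `predict(question, context, image)` and the case fields are
# hypothetical stand-ins, not the study's actual evaluation harness.
from typing import Callable, Optional, Sequence

def blind_control_gap(
    predict: Callable[[str, str, Optional[bytes]], str],
    cases: Sequence[dict],  # each: {"question", "context", "image", "label"}
) -> tuple[float, float]:
    """Return (full_accuracy, blind_accuracy) over the benchmark cases."""
    full_hits = blind_hits = 0
    for case in cases:
        # Full run: the model sees the image alongside the clinical context.
        if predict(case["question"], case["context"], case["image"]) == case["label"]:
            full_hits += 1
        # Blind run: image withheld, so only text-based shortcuts can score.
        if predict(case["question"], case["context"], None) == case["label"]:
            blind_hits += 1
    n = len(cases)
    return full_hits / n, blind_hits / n

# If full_accuracy - blind_accuracy is near zero, the benchmark is
# answerable from text alone and is not testing visual reasoning.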
// TAGS
mirage · llm · multimodal · research · benchmark · qwen-2-5 · safety
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
RELEVANCE
8/10
AUTHOR
Tolopono