OPEN_SOURCE
REDDIT // 14d ago · RESEARCH PAPER
LLMs top radiologists without seeing images
Stanford researchers found that LLMs outperform radiologists by 10% on medical imaging benchmarks even when the images are withheld. The models act as "superhuman guessers" by exploiting clinical context, revealing a fundamental flaw in current multimodal evaluation methods.
// ANALYSIS
This study exposes a massive "shortcut" in medical AI: models are often just very good at medical trivia rather than visual diagnosis.
- Qwen 2.5 reached the top of a chest X-ray leaderboard without looking at a single image, even on private datasets.
- The "superhuman" performance suggests LLMs capture subtle clinical correlations in text that human experts typically overlook.
- For developers, this highlights the necessity of "blind" control tests to ensure multimodal models are actually performing visual reasoning (see the sketch after this list).
- Results challenge existing benchmarks that don't account for text-based leakage.
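A minimal form of such a blind control, sketched in Python below: run the benchmark twice, once with the image and once with it withheld, and compare accuracies. The predict wrapper and case fields are hypothetical placeholders for your own harness, not anything from the study.

# Blind-control sketch: evaluate a multimodal model with and without
# images. `predict(question, context, image)` and the case fields are
# hypothetical stand-ins, not the study's actual evaluation harness.
from typing import Callable, Optional, Sequence

def blind_control_gap(
    predict: Callable[[str, str, Optional[bytes]], str],
    cases: Sequence[dict],  # each: {"question", "context", "image", "label"}
) -> tuple[float, float]:
    """Return (full_accuracy, blind_accuracy) over the benchmark cases."""
    full_hits = blind_hits = 0
    for case in cases:
        # Full run: the model sees the image alongside the clinical context.
        if predict(case["question"], case["context"], case["image"]) == case["label"]:
            full_hits += 1
        # Blind run: image withheld, so only text-based shortcuts can score.
        if predict(case["question"], case["context"], None) == case["label"]:
            blind_hits += 1
    n = len(cases)
    return full_hits / n, blind_hits / n

# If full_accuracy - blind_accuracy is near zero, the benchmark is
# answerable from text alone and is not testing visual reasoning.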
// TAGS
mirage · llm · multimodal · research · benchmark · qwen-2-5 · safety
DISCOVERED
2026-03-29
PUBLISHED
2026-03-29
RELEVANCE
8/10
AUTHOR
Tolopono