LLMs top radiologists without seeing images
OPEN_SOURCE
REDDIT · RESEARCH PAPER · 14d ago


Stanford researchers found that LLMs outperform radiologists by 10% on medical imaging benchmarks even when the images are withheld entirely. The models act as "superhuman guessers," exploiting clinical context in the accompanying text, which reveals a fundamental flaw in how multimodal models are currently evaluated.

// ANALYSIS

This study exposes a massive "shortcut" in medical AI: models are often just very good at medical trivia rather than visual diagnosis.

  • Qwen 2.5 reached the top of a chest X-ray leaderboard without looking at a single image, even on private datasets.
  • The "superhuman" performance suggests LLMs capture subtle clinical correlations in text that human experts typically overlook.
  • For developers, this highlights the necessity of "blind" control tests to ensure multimodal models are actually performing visual reasoning.
  • Results challenge existing benchmarks that don't account for text-based leakage.
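The "blind" control test from the third bullet can be sketched as a simple paired evaluation: score the model once with the image and once with it withheld, then compare. A minimal sketch follows; `predict` and the case fields are hypothetical placeholders, not an API from the paper.

```python
# Hedged sketch of a "blind" control test for a multimodal benchmark.
# `predict(question, context, image)` stands in for any model call;
# passing image=None simulates the text-only ("blind") condition.

def accuracy(preds, labels):
    """Fraction of predictions that exactly match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def blind_control(predict, cases):
    """Run every case twice: with the image, and with it withheld.

    Returns (full_acc, blind_acc, gap). A gap near zero means the
    text context already leaks the answer, so the benchmark is not
    measuring visual reasoning at all.
    """
    labels = [c["label"] for c in cases]
    full   = [predict(c["question"], c["context"], c["image"]) for c in cases]
    blind  = [predict(c["question"], c["context"], None) for c in cases]
    full_acc, blind_acc = accuracy(full, labels), accuracy(blind, labels)
    return full_acc, blind_acc, full_acc - blind_acc
```

A model that only reads the clinical context will score identically in both conditions, which is exactly the failure mode the study describes.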
// TAGS
mirage · llm · multimodal · research · benchmark · qwen-2-5 · safety

DISCOVERED

2026-03-29 (14d ago)

PUBLISHED

2026-03-29 (14d ago)

RELEVANCE

8 / 10

AUTHOR

Tolopono