MIT Total Uncertainty catches overconfident LLMs
OPEN_SOURCE
REDDIT · RESEARCH PAPER · 17d ago

MIT researchers built Total Uncertainty, a black-box uncertainty metric that combines self-consistency with disagreement across similar LLMs. It is meant to catch confident-but-wrong answers that repeated prompting alone can miss, especially in high-stakes settings.

// ANALYSIS

This is not a magic truth meter, but it is a much better abstention signal than re-sampling a single model and trusting a repeated answer. The useful shift is treating uncertainty as a cross-model problem: a model can be consistently wrong on its own, but confident hallucinations tend to get exposed when independent models disagree.

  • Works on generated text alone, so it can be applied to closed models without logits or hidden states.
  • Best on tasks with a single correct answer, like factual QA, translation, and math; open-ended prompts will stay messy by nature.
  • The paper finds a small, scale-matched ensemble from different companies works best, which is a pragmatic way to estimate epistemic uncertainty.
  • More auxiliary models improve calibration, but in production this still means extra API calls and vendor coordination.
  • Strong fit for selective abstention, routing, and safety checks where a confident hallucination is worse than saying "I’m not sure."
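The abstention recipe the bullets describe can be sketched as a simple score: sample the primary model several times (self-consistency), check how often a small cross-vendor ensemble agrees with its modal answer (cross-model disagreement), and abstain when combined uncertainty is high. This is an illustrative toy under stated assumptions, not the paper's exact estimator; the multiplicative combination and the 0.5 threshold are choices made here for clarity.

```python
from collections import Counter

def total_uncertainty(primary_answers, auxiliary_answers):
    """Toy sketch of a black-box total-uncertainty score in [0, 1].

    primary_answers:   repeated samples from the main model.
    auxiliary_answers: one (or more) answers per auxiliary model.
    Higher score = more uncertain. Not the paper's exact estimator.
    """
    # Self-consistency: how often the main model repeats its modal answer.
    counts = Counter(primary_answers)
    top_answer, top_count = counts.most_common(1)[0]
    self_consistency = top_count / len(primary_answers)

    # Cross-model agreement: fraction of auxiliary answers that match
    # the main model's modal answer (1.0 if no auxiliaries are given).
    if auxiliary_answers:
        cross_agreement = sum(
            a == top_answer for a in auxiliary_answers
        ) / len(auxiliary_answers)
    else:
        cross_agreement = 1.0

    # Combine: low uncertainty only when the model agrees with itself
    # AND with the ensemble. The product is an assumption of this sketch.
    return 1.0 - self_consistency * cross_agreement

def should_abstain(primary_answers, auxiliary_answers, threshold=0.5):
    # Selective abstention: refuse to answer when uncertainty is high.
    return total_uncertainty(primary_answers, auxiliary_answers) > threshold
```

The confident-hallucination case is the interesting one: if the primary model answers `"A"` three times in a row but two auxiliary models both say something else, self-consistency is 1.0, cross-agreement is 0.0, and the score jumps to 1.0, so repeated prompting alone would have missed exactly the failure this catches.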
// TAGS
llm · reasoning · research · safety · total-uncertainty

DISCOVERED: 2026-03-25 (17d ago)

PUBLISHED: 2026-03-25 (17d ago)

RELEVANCE: 8/10

AUTHOR: DryDeer775