OPEN_SOURCE
REDDIT · 17d ago · RESEARCH PAPER
MIT Total Uncertainty catches overconfident LLMs
MIT researchers built Total Uncertainty, a black-box uncertainty metric that combines self-consistency with disagreement across similar LLMs. It is meant to catch confident-but-wrong answers that repeated prompting alone can miss, especially in high-stakes settings.
// ANALYSIS
This is not a magic truth meter, but it is a much better abstention signal than asking the model twice and trusting the repeat answer. The useful shift here is treating uncertainty as a cross-model problem, which is exactly where confident hallucinations get exposed.
- Works on generated text alone, so it can be applied to closed models without logits or hidden states.
- Best on tasks with a single correct answer, like factual QA, translation, and math; open-ended prompts will stay messy by nature.
- The paper finds a small, scale-matched ensemble from different companies works best, which is a pragmatic way to estimate epistemic uncertainty.
- More auxiliary models improve calibration, but in production this still means extra API calls and vendor coordination.
- Strong fit for selective abstention, routing, and safety checks where a confident hallucination is worse than saying "I'm not sure."
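The abstention logic described above can be sketched as follows. This is a toy illustration, not the paper's actual formula: the function names, the agreement-based scoring, and the threshold are all assumptions made for clarity. The core idea it demonstrates is combining within-model self-consistency with cross-model agreement, so a confidently repeated answer that auxiliary models contradict still gets flagged.

```python
from collections import Counter

def total_uncertainty(self_samples, aux_answers):
    """Toy Total-Uncertainty-style score (illustrative, not the paper's formula).

    self_samples: repeated answers from the primary model (self-consistency).
    aux_answers:  one answer each from scale-matched auxiliary models.
    Returns a score in [0, 1]; higher means more uncertain.
    """
    majority, count = Counter(self_samples).most_common(1)[0]
    self_consistency = count / len(self_samples)  # within-model agreement
    cross_agreement = (
        sum(a == majority for a in aux_answers) / len(aux_answers)
        if aux_answers else 1.0
    )  # how often peers agree with the primary model's majority answer
    # Uncertain if the model disagrees with itself OR with its peers.
    return 1.0 - self_consistency * cross_agreement

def should_abstain(self_samples, aux_answers, threshold=0.5):
    return total_uncertainty(self_samples, aux_answers) > threshold

# Self-consistent but contradicted by every auxiliary model -> abstain:
print(should_abstain(["Paris"] * 5, ["Lyon", "Lyon", "Lyon"]))  # True
```

This is exactly the case repeated prompting misses: self-consistency alone scores the first example as fully confident, and only the cross-model term exposes the disagreement.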
// TAGS
llm · reasoning · research · safety · total-uncertainty
DISCOVERED
2026-03-25
PUBLISHED
2026-03-25
RELEVANCE
8/10
AUTHOR
DryDeer775