Local LLMs fail 18th-century calendar test
REDDIT · 8d ago · NEWS

A historical genealogy puzzle shared on r/LocalLLaMA exposes a persistent reasoning gap in consumer-grade open models: when evaluating 1740s birth records, they fail to account for the Old Style convention of starting the year on March 25, which was in use alongside the Julian calendar. Under that convention, January 1747 follows December 1747 rather than December 1746, so the gap between November 1746 and January 1747 is 14 months, not 2. Frontier models identify this correctly, while local models often hallucinate medical miracles or dismiss the records as typos.
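
The 14-month figure can be checked with a short calculation. The following is a minimal Python sketch, assuming the records label years as starting on March 25 and ignoring the separate 11-day offset between the Julian and Gregorian calendars. The function names `january_start_year` and `months_between` are illustrative and do not come from the original post.

```python
def january_start_year(labeled_year: int, month: int, day: int = 1) -> int:
    """Convert a year label from a March 25-start calendar to a January 1-start year.

    Assumption: the record uses the Old Style convention in which the year
    number changes on March 25. The default day of 1 treats a bare "March"
    as falling before the changeover.
    """
    # January 1 through March 24 sit at the END of the labeled year,
    # so in January-start numbering they belong to the following year.
    if (month, day) < (3, 25):
        return labeled_year + 1
    return labeled_year


def months_between(start: tuple[int, int], end: tuple[int, int]) -> int:
    """Whole months from start to end, each given as (labeled_year, month)."""
    (y1, m1), (y2, m2) = start, end
    years = january_start_year(y2, m2) - january_start_year(y1, m1)
    return years * 12 + (m2 - m1)


# Naive modern reading: November 1746 to January 1747 looks like 2 months.
naive_gap = (1747 - 1746) * 12 + (1 - 11)

# Old Style reading: "January 1747" is January 1748 in January-start numbering.
old_style_gap = months_between((1746, 11), (1747, 1))

print(naive_gap, old_style_gap)  # 2 14
```

The sketch compares year labels only, which is enough to show where the naive reading goes wrong. A full date conversion would also need the day-level calendar offset.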

// ANALYSIS

This "tricky question" is a sharp reminder that benchmark scores often mask fundamental reasoning failures in smaller models. Consumer-grade models running on hardware like 64GB Macs frequently fail to connect the 1752 calendar shift to specific date calculations, with failure modes ranging from "medical miracles" to circular reasoning. The gap highlights the difference between a model knowing a fact and reasoning with it: even high-performing open-weight models like Qwen struggle when historical context has to override modern defaults.

// TAGS
llm · local-llms · reasoning · benchmark · reasoning-gap

DISCOVERED

8d ago

2026-04-03

PUBLISHED

8d ago

2026-04-03

RELEVANCE

6/10

AUTHOR

Murgatroyd314