Local LLMs fail 18th-century calendar test

// NEWS · 99d ago

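A historical genealogy puzzle shared on r/LocalLLaMA reveals a persistent reasoning gap in consumer-grade open models: when evaluating 1740s birth records, they fail to account for the Old Style convention, in use with the Julian calendar until the 1752 reform, of starting the new year on March 25th. Frontier models correctly identify the 14-month gap between November 1746 and January 1747 (that January falls near the end of the year that began in March 1747, not two months after November 1746). Local models often hallucinate medical miracles to explain two births two months apart, or dismiss the records as typos.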
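The arithmetic behind the 14-month answer can be sketched in a few lines of Python. This is a minimal illustration, not the puzzle author's method: the function names `year_from_january` and `months_between` are hypothetical, the day of the month is assumed to be the 1st because the summary gives only months, and the sketch handles only the year-numbering convention, not the separate day offset between the Julian and Gregorian calendars.

```python
def year_from_january(os_year: int, month: int, day: int) -> int:
    """Renumber an Old Style date (year starts 25 March) so the year starts 1 January.

    Hypothetical helper: dates from 1 January through 24 March sit at the
    END of the Old Style year, so they belong to the next January-based year.
    """
    if (month, day) < (3, 25):
        return os_year + 1
    return os_year


def months_between(y1: int, m1: int, y2: int, m2: int) -> int:
    """Whole calendar months from (y1, m1) to (y2, m2)."""
    return (y2 - y1) * 12 + (m2 - m1)


# Dates as written in the records; day 1 is an assumption for illustration.
first = (1746, 11, 1)   # November 1746
second = (1747, 1, 1)   # January 1747

# Naive modern reading: treat both years as starting on 1 January.
naive_gap = months_between(first[0], first[1], second[0], second[1])

# Old Style reading: renumber each year before subtracting.
gap = months_between(
    year_from_january(*first), first[1],
    year_from_january(*second), second[1],
)

print(f"naive reading: {naive_gap} months")
print(f"Old Style reading: {gap} months")
```

The naive reading gives the biologically implausible 2-month gap that pushes models toward "miracle" or "typo" explanations; renumbering January 1747 into the following January-based year gives the 14-month gap.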

// ANALYSIS

This "tricky question" is a sharp reminder that benchmark scores often mask fundamental reasoning failures in smaller models. Consumer-grade models running on hardware like 64GB Macs frequently fail to connect the 1752 calendar shift to the specific date calculation, with failure modes ranging from "medical miracles" to circular reasoning. The gap highlights the difference between a model knowing a fact and reasoning with it: even high-performing open-weight models like Qwen struggle when historical context has to override modern defaults.

// TAGS

llm, local-llms, reasoning, benchmark, reasoning-gap

DISCOVERED

99d ago

2026-04-03

PUBLISHED

99d ago

2026-04-03

RELEVANCE

6/10

AUTHOR

Murgatroyd314
