OPEN_SOURCE
REDDIT // 38d ago // PRODUCT LAUNCH
Marcus claims dataset scores AI skepticism at scale
This open-source project extracts and scores 2,218 testable claims from 474 Gary Marcus Substack posts using two independent LLM pipelines plus a reconciliation layer. The published results show strong support for specific technical critiques and weaker support for market-crash predictions. The project clearly states that all labels are LLM-scored rather than human-verified.
// ANALYSIS
Useful meta-research, but the strongest value is methodological transparency rather than definitive truth claims.
- Dual-pipeline scoring (Claude and Codex) plus reconciliation is stronger than single-model judgment and makes disagreement visible.
- The dataset highlights a key pattern for AI discourse: specific, falsifiable technical claims age better than broad market narratives.
- The repo includes methods and outputs, which makes this reproducible for auditing other public AI commentators.
- Because scoring is automated, downstream users should treat labels as evidence-weighted signals, not final adjudications.
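
To illustrate how a reconciliation layer can make disagreement between two pipelines visible, here is a minimal sketch. It is not the project's actual code: the function names, the label vocabulary ("supported", "contradicted", "mixed"), and the sample data are all hypothetical, and the agree-or-flag rule is just one simple reconciliation strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reconciled:
    claim_id: str
    label_a: str          # label from the first pipeline
    label_b: str          # label from the second pipeline
    final: Optional[str]  # agreed label, or None when the pipelines disagree


def reconcile(labels_a: dict[str, str], labels_b: dict[str, str]) -> list[Reconciled]:
    """Keep a label only when both pipelines agree; otherwise flag it (final=None)."""
    shared = sorted(labels_a.keys() & labels_b.keys())
    return [
        Reconciled(
            claim_id=cid,
            label_a=labels_a[cid],
            label_b=labels_b[cid],
            final=labels_a[cid] if labels_a[cid] == labels_b[cid] else None,
        )
        for cid in shared
    ]


def agreement_rate(rows: list[Reconciled]) -> float:
    """Fraction of shared claims on which the two pipelines agree."""
    if not rows:
        return 0.0
    return sum(r.final is not None for r in rows) / len(rows)


# Hypothetical sample data, not taken from the dataset.
pipeline_a = {"c1": "supported", "c2": "contradicted", "c3": "mixed"}
pipeline_b = {"c1": "supported", "c2": "mixed", "c3": "mixed"}

rows = reconcile(pipeline_a, pipeline_b)
disputed = [r.claim_id for r in rows if r.final is None]
print(f"agreement: {agreement_rate(rows):.2f}, disputed: {disputed}")
```

Keeping disputed claims as explicit `None` entries, rather than silently picking one pipeline's label, is what lets downstream users treat the output as a signal to audit and not as a final adjudication.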
// TAGS
llm, research, benchmark, open-source, safety
DISCOVERED
38d ago
2026-03-05
PUBLISHED
39d ago
2026-03-04
RELEVANCE
7/10
AUTHOR
davegoldblatt