Marcus claims dataset scores AI skepticism at scale
This open-source project extracts and scores 2,218 testable claims from 474 Gary Marcus Substack posts using two independent LLM pipelines plus a reconciliation layer. The published results show strong support for specific technical critiques and weaker support for market-crash predictions, with the clear caveat that all labels are LLM-scored rather than human-verified.
Useful meta-research, but its strongest value lies in methodological transparency rather than in definitive truth claims.
- Dual-pipeline scoring (Claude and Codex) plus reconciliation is stronger than single-model judgment and makes disagreement visible.
- The dataset highlights a key pattern for AI discourse: specific, falsifiable technical claims age better than broad market narratives.
- The repo includes methods and outputs, which makes this reproducible for auditing other public AI commentators.
- Because scoring is automated, downstream users should treat labels as evidence-weighted signals, not final adjudications.
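The idea of scoring each claim twice and surfacing disagreement can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the label strings, the `Reconciled` type, and the rule of leaving a claim unresolved when the two pipelines differ are hypothetical and are not taken from the project's repository, whose actual reconciliation logic may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Reconciled:
    """Outcome of comparing two independent labels for one claim (hypothetical type)."""
    claim_id: str
    label: Optional[str]  # None when the two pipelines disagree
    agreed: bool


def reconcile(claim_id: str, label_a: str, label_b: str) -> Reconciled:
    """Keep the label when both pipelines agree; otherwise flag the claim.

    Assumed rule for illustration only: a disagreement is left unresolved
    (label=None) so it stays visible instead of being silently overridden.
    """
    if label_a == label_b:
        return Reconciled(claim_id, label_a, True)
    return Reconciled(claim_id, None, False)


def agreement_rate(results: List[Reconciled]) -> float:
    """Fraction of claims on which the two pipelines gave the same label."""
    if not results:
        return 0.0
    return sum(r.agreed for r in results) / len(results)


# Hypothetical labels from two pipelines for two claims.
results = [
    reconcile("claim-1", "supported", "supported"),
    reconcile("claim-2", "supported", "unsupported"),
]
```

In this sketch, a downstream user could treat `agreement_rate(results)` as a rough indicator of how much weight the labels deserve, and review the claims where `agreed` is `False` by hand.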
DISCOVERED: 2026-03-05
PUBLISHED: 2026-03-04
AUTHOR: davegoldblatt