Marcus claims dataset scores AI skepticism at scale
REDDIT // 38d ago // PRODUCT LAUNCH

This open-source project extracts and scores 2,218 testable claims from 474 Gary Marcus Substack posts using two independent LLM pipelines plus a reconciliation layer. The published results show strong support for specific technical critiques and weaker support for market-crash predictions, with an explicit caveat that all labels are LLM-scored rather than human-verified.

// ANALYSIS

Useful meta-research, but the strongest value is methodological transparency rather than definitive truth claims.

  • Dual-pipeline scoring (Claude and Codex) plus reconciliation is stronger than single-model judgment and makes disagreement visible.
  • The dataset highlights a key pattern for AI discourse: specific, falsifiable technical claims age better than broad market narratives.
  • The repo includes methods and outputs, so the results can be reproduced and the same approach reused to audit other public AI commentators.
  • Because scoring is automated, downstream users should treat labels as evidence-weighted signals, not final adjudications.
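
To make the dual-pipeline point concrete, here is a minimal Python sketch of a reconciliation step that compares two sets of labels claim by claim and surfaces disagreements instead of hiding them. It is an illustration only, not the project's actual code: the label names, the `reconcile` and `agreement_rate` functions, and the `"disputed"` marker are all assumptions made for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reconciled:
    claim_id: str
    label: str    # the shared label, or "disputed" when the pipelines differ
    agreed: bool


def reconcile(scores_a: dict[str, str], scores_b: dict[str, str]) -> list[Reconciled]:
    """Compare two pipelines' labels for every claim that both of them scored.

    Hypothetical rule: matching labels pass through unchanged, and
    mismatches are flagged as "disputed" for later review.
    """
    results = []
    for claim_id in sorted(scores_a.keys() & scores_b.keys()):
        a, b = scores_a[claim_id], scores_b[claim_id]
        if a == b:
            results.append(Reconciled(claim_id, a, True))
        else:
            results.append(Reconciled(claim_id, "disputed", False))
    return results


def agreement_rate(results: list[Reconciled]) -> float:
    """Fraction of jointly scored claims on which both pipelines agreed."""
    if not results:
        return 0.0
    return sum(r.agreed for r in results) / len(results)


if __name__ == "__main__":
    # Toy labels, invented for illustration; not taken from the dataset.
    pipeline_a = {"c1": "supported", "c2": "mixed", "c3": "unsupported"}
    pipeline_b = {"c1": "supported", "c2": "unsupported", "c4": "mixed"}
    merged = reconcile(pipeline_a, pipeline_b)
    for row in merged:
        print(row.claim_id, row.label, row.agreed)
    print("agreement:", agreement_rate(merged))
```

Keeping disputed claims as their own category, rather than forcing a tie-break, is one way to treat labels as signals: a reader can see which claims the two models could not agree on and check those by hand first.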
// TAGS
llm · research · benchmark · open-source · safety

DISCOVERED

38d ago

2026-03-05

PUBLISHED

39d ago

2026-03-04

RELEVANCE

7/10

AUTHOR

davegoldblatt