Local outlets limit Internet Archive access
Nieman Lab reports that 342 local outlets in its updated sample now limit the Internet Archive’s crawlers, with major chains such as McClatchy, Advance Local, Tribune Publishing, MediaNews Group, and USA Today Co. driving much of the change. The stated concern is that AI companies could use the Wayback Machine as a back door to scrape journalism for training data, undercutting publishers’ licensing leverage. The cost is a weaker public record for researchers, journalists, and anyone who relies on archives when sites change or disappear.
The defensive logic is understandable, but the tradeoff is brutal: publishers may be protecting leverage against AI scraping while making the historical record less accessible for everyone else.
- The scale matters: this is no longer a handful of publishers; it is a broad shift across local news infrastructure.
- The risk is indirect but real: blocking the Internet Archive does not just affect bots; it affects future reporting, scholarship, and verification.
- The strategy is uneven: some publishers are blocking the Archive while still allowing major AI crawlers, which suggests this is as much about bargaining power as about preservation.
- The long-term gap is obvious: if publishers do not maintain their own durable archives, they are effectively outsourcing their memory to the Archive and then cutting off the backup.
DISCOVERED
2026-05-21
PUBLISHED
2026-05-21
AUTHOR
jaredwiener