Quit Emailing Yourself

Reddit blocks Internet Archive to end sneaky AI scraping

1 min read | Saved February 14, 2026

reddit, internet-archive, ai-scraping, privacy, content-restrictions

Do you care about this?

Reddit is stopping the Internet Archive from indexing its threads after discovering that AI companies were scraping data from the archived content. Going forward, the Internet Archive will save only screenshots of Reddit's homepage, so it will no longer capture deleted posts or user interactions. Reddit also says that archived copies of deleted content raise privacy issues for its users.

If you do, here's more

Reddit is blocking the Internet Archive (IA) from indexing its content over concerns about AI firms scraping data from archived Reddit pages. Previously, the IA's Wayback Machine could archive Reddit threads, profiles, and comments, which preserved a record of user activity and discussions. Now Reddit will allow only screenshots of its homepage to be archived. That reduces the Wayback Machine's Reddit coverage to snapshots of popular posts and headlines; it can no longer serve as a comprehensive backup of deleted content or a window into individual subreddit cultures.

Reddit has not named the AI companies involved in the scraping, but it confirmed to Ars Technica that it has observed violations of its scraping policies. Reddit spokesperson Tim Rathschmidt said there might be ways for the Internet Archive to improve its defenses against AI scraping, which suggests access could be reopened if the IA addresses these concerns. Beyond the block, Reddit is also raising privacy issues, noting that the Wayback Machine's archiving of deleted content poses problems for user privacy.
