They would most likely use the browsers they offer users to scrape content and stream it back to an endpoint for ingest and processing as users browse Reddit. Think of the RECAP extension for PACER (which scrapes PACER while a user browses it and ships the data to the Internet Archive), or ArchiveTeam’s Warrior VM. You can’t defend against scraping when every user’s browser is a crawler node that looks like a human because it is a human.
At least, this is how I would engineer a public browser operating as an adversarial distributed crawler network.
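As a rough sketch of the collection hook described above, here is what a minimal WebExtension-style content script could look like. This is an illustration under stated assumptions, not anyone’s actual implementation: the ingest URL and payload shape are hypothetical, and a real deployment would add batching, dedup, and consent handling.

```ts
// content-script.ts — minimal sketch of passive, browse-along collection.
// The page was already loaded by the user; we only copy what they saw,
// so no extra requests hit the target site.

const INGEST_URL = "https://ingest.example.com/pages"; // hypothetical endpoint

function capture(): void {
  const payload = {
    url: location.href,
    capturedAt: new Date().toISOString(),
    // The rendered DOM as the user sees it, post-JavaScript.
    html: document.documentElement.outerHTML,
  };

  // Fire-and-forget upload to the ingest endpoint for processing.
  void fetch(INGEST_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}

// Capture once the page has finished loading.
if (document.readyState === "complete") {
  capture();
} else {
  window.addEventListener("load", capture);
}
```

The point of the pattern is that the target site never sees the upload: from Reddit’s side there is only an ordinary human session, and the exfiltration happens out-of-band to a server the site operator can’t observe or rate-limit.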