Aggressive AI scrapers are making it kinda suck to run wikis

lemmydividebyzero@reddthat.com · 10 hours ago


definitemaybe@lemmy.ca · edit-2 4 hours ago

And so the Tragedy of the Commons plays out, yet again.

There’s no cost to being a selfish asshole, so it’s sadly not surprising that many individual actors are destroying the public Internet. Like, how can we align incentives to stop this? Regulations/laws are mostly pointless since the very same tactics used to dodge bot detection also make it incredibly hard to identify the originator.

The only other disincentive with a real cost that I can think of would be to poison the data fed to scrapers. That seems expensive to set up, though.

I think TFA has the best solution idea: make it easy to scrape all the useful data using a low-cost standardized system. Then there’s no incentive to scrape the website using a stupid, expensive crawler in the first place.

Edit: actually, LLMs make poisoning the data fairly reasonable… When there’s a high volume of requests for outdated pages, edit pages, or other rarely accessed pages, have the server return a pre-cached parody version of the root page instead. Pre-build one parody copy of each page with a standardized prompt, like “rewrite this page like it comes from an academic journal of medicine or economics with APA citations for every fact.”
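
For what it's worth, here's a rough sketch of how the "serve a parody when rare pages get hammered" check might look. Everything in it is an assumption for illustration: the URL markers for "rarely accessed" pages, the 60-second window, the threshold of 20, and the names `DecoyGate` and `pick_page` are all made up, and it counts rare-page requests globally rather than per client, since per-client tracking is exactly what the crawlers dodge. The parody copies are assumed to be pre-built elsewhere and just looked up here.

```python
import time
from collections import deque

# Hypothetical URL fragments that mark rarely accessed pages; a real wiki will differ.
RARE_MARKERS = ("action=edit", "action=history", "oldid=")


class DecoyGate:
    """Tracks recent rare-page requests in a sliding time window."""

    def __init__(self, window=60.0, limit=20):
        self.window = window  # assumed window length, in seconds
        self.limit = limit    # assumed number of rare hits tolerated per window
        self.hits = deque()   # timestamps of recent rare-page requests

    def is_rare(self, url):
        return any(marker in url for marker in RARE_MARKERS)

    def should_serve_decoy(self, url, now=None):
        # Normal page views are never counted and never get the decoy.
        if not self.is_rare(url):
            return False
        now = time.monotonic() if now is None else now
        self.hits.append(now)
        # Drop timestamps that have fallen out of the window.
        while self.hits and now - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) > self.limit


def pick_page(gate, url, page_id, real_pages, parody_pages, now=None):
    """Return the parody copy when the gate trips, otherwise the real page."""
    if gate.should_serve_decoy(url, now) and page_id in parody_pages:
        return parody_pages[page_id]
    return real_pages[page_id]
```

The point of the design is that the expensive part (the LLM rewrite) happens once per page, offline, and the request path is just a counter plus a dictionary lookup. A human who occasionally opens an old revision still gets the real page as long as overall rare-page traffic stays under the threshold.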