• definitemaybe@lemmy.ca
    4 hours ago

    And so the Tragedy of the Commons plays out, yet again.

    There’s no cost to being a selfish asshole, so it’s sadly not surprising that many individual actors are destroying the public Internet. Like, how can we align incentives to stop this? Regulations and laws are mostly pointless, since the very same tactics used to dodge bot detection also make it incredibly hard to identify who is behind the traffic.

    The only other disincentive with a real cost that I can think of would be to poison the data served to scrapers. That seems expensive to set up, though.

    I think TFA has the best solution idea: make it easy to scrape all the useful data using a low-cost standardized system. Then there’s no incentive to scrape the website using a stupid, expensive crawler in the first place.

    Edit: actually, LLMs make poisoning the data fairly reasonable… When there’s a high volume of requests for outdated pages/edit pages/other rarely accessed pages, have the server serve a pre-cached parody version of the root page instead. Pre-build one parody copy of each page with a standardized prompt, like “rewrite this page like it comes from an academic journal of medicine or economics with APA citations for every fact.”
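
Rough sketch of the serving side of that idea, in Python. Everything named here is made up for illustration: the `DecoyRouter` class, the `/edit/`, `/history/` and `/old/` URL prefixes, and the threshold and window values are assumptions, not anything from TFA or a real server. It also assumes the parody pages were already generated offline and are just sitting in a dict, so no LLM call happens at request time.

```python
import time
from collections import deque

# Hypothetical URL prefixes for pages that real visitors rarely request.
RARE_PREFIXES = ("/edit/", "/history/", "/old/")


class DecoyRouter:
    """Serve pre-built parody pages when rarely used URLs get hammered."""

    def __init__(self, real_pages, decoy_pages, threshold=100, window=60.0):
        self.real_pages = real_pages      # path -> real content
        self.decoy_pages = decoy_pages    # canonical path -> pre-built parody
        self.threshold = threshold        # rare hits tolerated per window
        self.window = window              # window length in seconds
        self.rare_hits = deque()          # timestamps of recent rare hits

    @staticmethod
    def canonical(path):
        # Map e.g. "/edit/foo" to "/foo" (assumed URL scheme).
        for prefix in RARE_PREFIXES:
            if path.startswith(prefix):
                return "/" + path[len(prefix):]
        return path

    def serve(self, path, now=None):
        now = time.monotonic() if now is None else now
        if not path.startswith(RARE_PREFIXES):
            # Normal pages are never poisoned.
            return self.real_pages.get(path, "404")

        # Record this hit and drop timestamps that fell out of the window.
        self.rare_hits.append(now)
        while self.rare_hits and now - self.rare_hits[0] > self.window:
            self.rare_hits.popleft()

        if len(self.rare_hits) > self.threshold:
            # Too many rare-page requests: hand back the cached parody.
            decoy = self.decoy_pages.get(self.canonical(path))
            if decoy is not None:
                return decoy
        return self.real_pages.get(path, "404")


if __name__ == "__main__":
    router = DecoyRouter(
        real_pages={"/foo": "real foo", "/edit/foo": "edit form for foo"},
        decoy_pages={"/foo": "parody foo"},
        threshold=3,
    )
    for t in range(5):
        print(t, router.serve("/edit/foo", now=float(t)))
```

The counter here is global rather than per-IP on purpose, since the whole problem is that the crawlers spread requests across many addresses. A real deployment would need to tune the threshold so a burst of legitimate editors doesn't get the parody.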