cross-posted from: https://ibbit.at/post/219495

From Fark.com RSS via this RSS feed. Fark comments are available here.

---

By Wednesday morning, Anthropic representatives had used a copyright takedown request to force the removal of more than 8,000 copies and adaptations of the raw Claude Code instructions - known as source code - that developers had shared on programming platform GitHub.
It later narrowed its takedown request to cover just 96 copies and adaptations, saying its initial ask had reached more GitHub accounts than intended.

Source [web-archive]

---

Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model’s weights during training, and whether those memorized data can be extracted in the model’s outputs.

While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models… We investigate this question using a two-phase procedure…

We evaluate our procedure on four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, and we measure extraction success with a score computed from a block-based approximation of longest common substring…

Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs…

…we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984…

Source: https://arxiv.org/pdf/2601.02671
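For readers unfamiliar with the metric mentioned in the abstract, here is a minimal sketch of a plain longest-common-substring comparison at the word level. This is an assumption-laden illustration, not the paper's method: the abstract describes a "block-based approximation" whose details are not quoted here, and the function names `longest_common_run` and `coverage` are hypothetical.

```python
from difflib import SequenceMatcher


def longest_common_run(generated: str, reference: str) -> int:
    """Length, in words, of the longest run of words shared verbatim."""
    gen_words = generated.split()
    ref_words = reference.split()
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent tokens, which would otherwise skew results on long texts.
    matcher = SequenceMatcher(None, gen_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(gen_words), 0, len(ref_words))
    return match.size


def coverage(generated: str, reference: str) -> float:
    """Longest shared run as a fraction of the reference length (hypothetical score)."""
    ref_len = len(reference.split())
    if ref_len == 0:
        return 0.0
    return longest_common_run(generated, reference) / ref_len


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"
    generated = "a quick brown fox jumps high over the lazy dog"
    print(longest_common_run(generated, reference))
    print(round(coverage(generated, reference), 3))
```

A score near 1.0 would mean the model output reproduces almost the whole reference in one unbroken run; an exact comparison like this is quadratic in the worst case, which is presumably why a block-based approximation is used on book-length texts.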

  • OctopusNemeses@lemmy.world
    31 minutes ago

    Standard US tech industry. Copy and steal in a mad dash to monopolize the market. Sue everyone else into oblivion. Crown yourself the inventor and the winner.

  • Kojichan@lemmy.world
    4 hours ago

    Dude… imagine if companies in real life had the ability to remove your memories of their products because they don't like their intellectual property existing in certain demographics.

    Kinda feels like Total Recall…

  • aarch0x40@piefed.social
    7 hours ago

    Phase 1: Steal IP

    Phase 2: Claim IP rights over stolen IP

    Phase 3: Sell IP rights under Chapter 7

      • supersquirrel@sopuli.xyz
        2 hours ago

        I used to post a bunch of articles there. I love that community, but I recently had a bad experience with the way the Lemmy World team treated my concerns (and many others' concerns) about Jordan Lund and the selective, problematic moderation of the Lemmy World politics and global politics communities. See this post:

        https://sopuli.xyz/post/42630105

        Also see the post I made on the Lemmy World Support community, which was locked for frivolous reasons.

        https://sopuli.xyz/post/42631979

        I know this is a strong reaction, but I stand by it. This is very important for the health of the Fediverse.

        If I were part of the Fuck AI community on Lemmy World, I would be concerned that, as more and more news articles and journalism come out about how some US tech companies were knowingly complicit in the Palestinian Genocide as a profit-seeking venture, the Lemmy World team would start to become pretty uncomfortable with the Fuck AI community.

        I would love love love if y’all migrated somewhere that was more willing to confront the ways in which AI has become a tool of obfuscation for complicity in a globalized system of violence that transcends the confines of any one single country.

  • one_old_coder@piefed.social
    5 hours ago

    I wish we'd stop blaming LLMs for using commercial training data; it gives me a valid excuse to use BitTorrent nowadays. Also, I enjoy watching companies fight each other.

  • sixdripb@lemmy.world
    4 hours ago

    Suchir had to die, for these fuckers to have a single neuron activation. … reality’s cooked