cross-posted from: https://ibbit.at/post/219495

From Fark.com RSS via this RSS feed. Fark comments are available here.

---

By Wednesday morning, Anthropic representatives had used a copyright takedown request to force the removal of more than 8,000 copies and adaptations of the raw Claude Code instructions - known as source code - that developers had shared on programming platform GitHub.
It later narrowed its takedown request to cover just 96 copies and adaptations, saying its initial ask had reached more GitHub accounts than intended.

Source [web-archive]

---

Many unresolved legal questions over LLMs and copyright center on memorization: whether specific training data have been encoded in the model’s weights during training, and whether those memorized data can be extracted in the model’s outputs.

While many believe that LLMs do not memorize much of their training data, recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models… We investigate this question using a two-phase procedure…

We evaluate our procedure on four production LLMs: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, and we measure extraction success with a score computed from a block-based approximation of longest common substring…

Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs…

…we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984…

Source: https://arxiv.org/pdf/2601.02671
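
To make the "longest common substring" idea in the excerpt above more concrete, here is a minimal Python sketch. It is not the paper's actual metric, which the excerpt does not spell out. The function names, the `min_block` threshold, and the use of `difflib` matching blocks as a stand-in for a "block-based approximation" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def longest_common_substring(a: str, b: str) -> str:
    """Exact longest common substring via dynamic programming.

    Runs in O(len(a) * len(b)) time, which is why approximations are
    attractive for book-length texts.
    """
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def block_overlap_score(reference: str, generated: str, min_block: int = 20) -> float:
    """Hypothetical score: fraction of `reference` covered by long matching blocks.

    Only blocks of at least `min_block` characters count, so short
    coincidental matches (common words, punctuation) are ignored.
    """
    if not reference:
        return 0.0
    matcher = SequenceMatcher(None, reference, generated, autojunk=False)
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_block
    )
    return covered / len(reference)
```

A score near 1.0 would mean the output reproduces almost all of the reference in long verbatim runs. A score near 0.0 would mean only short, likely coincidental overlaps.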

      • supersquirrel@sopuli.xyz · 4 hours ago

        I used to post a bunch of articles there. I love that community, but I recently had a bad experience with the way the Lemmy World team treated my concerns (and many others’ concerns) about Jordan Lund and the selective, problematic moderation of the Lemmy World politics and global politics communities. See this post here.

        https://sopuli.xyz/post/42630105

        Also see the post I made on the Lemmy World Support community, which was locked for frivolous reasons.

        https://sopuli.xyz/post/42631979

        I know this is a strong reaction, but I stand by it. This is very important for the health of the Fediverse.

        If I were part of the Fuck AI community on Lemmy World, I would be nervous that, as more and more news articles and journalism come out about how some US tech companies were knowingly complicit in the Palestinian Genocide as a profit-seeking venture, the Lemmy World team would begin to become pretty uncomfortable with the Fuck AI community.

        I would love love love if y’all migrated somewhere that was more willing to confront the ways in which AI has become a tool of obfuscation for complicity in a globalized system of violence that transcends the confines of any one single country.