Seems like the easiest fix is to consider the output of LLMs to be a derivative work of the training data.
No need for a new license: if you're training on GPL code, the code produced by the LLM is GPL.
You are not gonna protect abstract ideas using copyright. Essentially, what he's proposing implies turning this "TGPL" into some sort of viral NDA, which is a different category of contract.
It's harder to convince someone that a content-focused license like the GPLv3 also protects abstract ideas than it is to create a new form of contract/license designed specifically to protect abstract ideas (not just the content itself) from being spread in ways you don't want.
LLMs don’t have anything to do with abstract ideas, they quite literally produce derivative content based on their training data & prompt.
LLMs abstract information from the content through an algorithm (what they store is the result of a series of tests/analyses, not the content itself, but a set of characteristics/ideas). If that's derivative, then ALL abstract ideas are derivative. It's not possible to form abstractions without collecting data derived from a source you are observing.
If derivative abstractions were already something copyright could protect, then lawmakers wouldn't have had to create patents, etc.
Let me know if you convince any lawmakers, and I’ll show you some lawmakers about to be invited to expensive “business” trips and lunches by lobbyists.
The same can be said of the approach described in the article: the "GPLv4" would be useless unless the resulting weights are considered a derivative work.
A paint manufacturer can’t claim copyright on paintings made using that paint.
Indeed. I suspect it would need to be framed around national security and national interests, to have any realistic chance of success. AI is being seen as a necessity for the future of many countries … embrace it, or be steamrolled in the future by those who did, so a soft touch is being embraced.
Copyright and licensing uncertainty could hinder that, and the status quo today in many places is to not treat training as copyright infringement (e.g. the US), or to require an explicit opt-out (e.g. the EU). A lack of international agreements means it's all a bit wishy-washy, and hard to prove and enforce.
Things get (only slightly) easier if the material is behind a terms-of-service wall.