One of the four essential freedoms is the freedom to study the software and modify it. Studying means training your brain on the open source code. Can one use their brain to write proprietary code after they have studied some copylefted code?
If you study a code base and then implement something similar yourself without attribution, there is a good chance that you are committing a form of plagiarism.
In other contexts, such as academic writing, this approach might be considered a pretty clear and uncontroversial case of plagiarism.
Also, what if one implements proprietary software that is completely different from the open source project they studied? They may still use knowledge they obtained while studying it, e.g. by reusing algorithms, patterns, or even code formatting. This is a common case for LLM coding assistants.
There are tools better suited for that than large language models, and they run faster on a regular laptop CPU than the round trip to a supercomputer in an AI data center.
That’s not the topic we were discussing, right?
You listed a bunch of use cases for LLMs that aren’t plagiarism, and they all seem to be better solved by different tools.
So what is the case we are speaking about? “Hey LLM, write an OS kernel that is fully compatible with Linux, designed like Linux, using the same algorithms as Linux and the same code style as Linux”?
If you have Linux in the training data, the outcome, if it is at all remotely useful, would likely include plagiarism.
Are there similar cases in the wild?
There is no such word as “plagiarism” in free licenses or in copyright law. One either violates copyrights or patents, or one does not. Copyleft licenses do not forbid what you call plagiarism. If you want to forbid that, as well as training LLMs on your code, you need a new type of license. However, I’m unsure whether such a license could be considered free by the FSF or approved by the OSI.
Plagiarism is a form of copyright infringement if there is substantial similarity.
Open source licenses build on top of intellectual property laws.
So, everything depends on how you define substantial similarity. My opinion is that if there are no copy-and-pasted chunks of code (except for trivial ones), there is no substantial similarity.
https://en.wikipedia.org/wiki/Substantial_similarity
I live in another country; however, the idea is the same as what I wrote above: it is all about direct copying.
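To make the “direct copying” reading above concrete, here is a rough, purely illustrative sketch in Python (my own addition, not anything taken from the licenses or the law): it treats “copy-and-pasted chunks” as long runs of tokens shared verbatim between two sources, which is roughly what simple code-clone detectors measure.

```python
# Purely illustrative sketch (not a legal test): approximate "direct copying"
# as the longest run of tokens that two sources share verbatim.
import difflib
import re


def tokenize(source):
    # Naive tokenizer: identifiers, numbers, and single non-space characters.
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def longest_shared_run(a, b):
    """Length, in tokens, of the longest chunk that appears verbatim in both sources."""
    ta, tb = tokenize(a), tokenize(b)
    matcher = difflib.SequenceMatcher(None, ta, tb, autojunk=False)
    return matcher.find_longest_match(0, len(ta), 0, len(tb)).size


original = "for (i = 0; i < n; i++) { sum += a[i]; }"
rewrite = "for (j = 0; j < len; j++) { total += v[j]; }"

# A short shared run like this is the "trivial" case from the discussion;
# where trivial ends and substantial begins is a judgment call, not a constant.
print(longest_shared_run(original, rewrite))
```

The sketch only measures literal overlap, not derivation or studied knowledge; where the threshold between a trivial idiom and a substantial chunk lies is exactly the judgment call the thread is arguing about.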