Also, what if someone implements proprietary software that is completely different from the open source project they studied? They may still use knowledge gained from studying it, e.g. by reusing algorithms, patterns, or even code formatting. This is a common case for LLM coding assistants.
There are tools better suited to that than large language models, and they run faster on a regular laptop CPU than the round trip to a supercomputer in an AI data center.
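To make the tooling point concrete, here is a minimal sketch of the code-formatting case, assuming clang-format is installed and a .clang-format file copied from the studied project sits in the working directory (the file name my_driver.c is hypothetical). It runs locally in milliseconds, no data-center round trip involved.

    # A minimal sketch: adopting a studied project's code style with a
    # local formatter instead of an LLM. Assumes clang-format is installed
    # and a .clang-format file from that project is in the working directory;
    # my_driver.c is a hypothetical file name.
    import subprocess

    result = subprocess.run(
        ["clang-format", "-style=file", "my_driver.c"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)  # the source reformatted in the project's style, locally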
That’s not the topic we were discussing, right?
You listed a bunch of use cases for LLMs that aren’t plagiarism and they all seem to be better solved by different tools.
So what is the case we are speaking about? “Hey LLM, write an OS kernel that is fully compatible with Linux, is designed like Linux, uses the same algorithms as Linux, and follows the same code style as Linux”?
If you have Linux in the training data, the outcome, if at all remotely useful, would likely include plagiarism.
Are there similar cases in the wild?
There are cases in the wild of LLMs straight up pasting the GPL into files unprompted.