Edit: If you have advice to make it clearer, I'm all ears. The "fractions of a penny" scene seems to perfectly capture the tech grift mentality, so I wanted to use it as the base. But I wanted to steer away from framing it as "stealing", because I don't think copyright is a good basis for AI criticism. I think the fragmentation, noise, and distrust it injects into our global collaboration infrastructure is the key part. But I really don't know how to put that succinctly. We don't have many historical analogues for this, so there's no good shorthand.
Does clean-rooming also work the other way round, where open source models reverse engineer proprietary binaries, from which one could then make "clean" open source copies?
Like, as far as legal basis? Yes, as I understand it — but I am not a lawyer.
But if you're hoping to leverage an LLM... Part of the reason they're so good at producing replacements for e.g. React is that React's source code is in the training data, along with its test suites and a ton of commentary about that code.
So you’d be at a big disadvantage. That’s on top of the basic legibility challenges of decompiled binaries.
I’m lost
Basically, generative AI is killing open source — between slop contributions overwhelming maintainers, the increasing feasibility of "clean-rooming" open source software to strip away any obligations that private companies might otherwise have towards projects they depend on, and models siphoning traffic away from the documentation sites that get people involved.
Sleep helped. Also, the FOSDEM link you sent — I was not aware that was going on.