Now we are facing unprecedented growth of AI as a whole. Do you think it is time for the FSF to draft a new version of the GPL that addresses the new challenges AI poses to software development, to keep protecting users’ freedom?
I keep saying “no” to this sort of thing, for a variety of reasons.
- “You can use this code for anything you want as long as you don’t work in a field that I don’t like” is pretty much the opposite of the spirit of the GPL.
- The enormous companies slurping up all content available on the Internet do not care about copyright. The GPL already forbids adapting and redistributing code without licensing under the GPL, and they’re not doing that. So another clause that says “hey, if you’re training an AI, leave me out” is wasted text that nobody is going to read.
- Making “AI” an issue instead of “big corporate abuse” means that academics and hobbyists can’t legally train a language model on your code, even if they would otherwise comply with the license.
- The FSF has never cared about anything unless Stallman personally cared about it on his personal computer, and they’ve recently proven that he matters to them more than the community, so we probably shouldn’t ever expect a new GPL.
- The GPL already has so many problems (because it’s built around one person’s personal focuses), problems the FSF either ignores or isolates in random silos (like the AGPL, as if the web were still a fringe thing), that AI barely seems relevant.
I mean, I get it. The language-model people are exhausting, and their disinterest in copyright law is unpleasant. But asking an organization that doesn’t care to add restrictions to a license that the companies don’t read isn’t going to solve the problem.
The problem with recent AI is about fair use of data, not about copyright. To solve the AI problem, we need laws that stop the abuse of data rather than laws that stop the copying of code.
Some portion of the “data” fed into these models is copyrighted, though. GitHub’s Copilot is trained on code. Does it violate the GPL to train an AI model on all the GPL source code published out there?
Too soon. The GPL is a license that aligns prevailing copyright law with certain ideological goals. There is no prevailing copyright law regarding AI yet, so there is nothing to base a copyright license on.
First step: introduce AI into copyright law (and pray The Mouse doesn’t introduce it first).
It might be time to start thinking about it; however, it will depend on whether the legal system reaches a consensus on whether AI output needs to provide attribution.
There is already consensus, it just hasn’t been concluded explicitly yet.
There is no “AI” and there’s no “learning”, so there’s no new, unbeaten path in the law, like some would have you believe. LLMs are data processing software that take input data and output other data. In order to use the input data you have to conform to its licensing, and you can’t hide behind arguments like “I don’t know what the software is doing with the data” or “I can’t identify the input data in the output data anymore”.
LLM companies will eventually be found guilty of copyright infringement, and they’ll settle and start observing licensing terms like everybody else. There are plenty of media companies with lots of money and a vested interest in copyright.
That’s not how copyrights work. They only care about copying or replicating that data. The hint is in the name.
Copyright is not just about copying the data. It’s a name that stuck, but it’s more accurate to call it “author’s rights”. The law awards the rights holder extensive rights, including deciding how the data is used.
And (as an aside) permission by omission doesn’t work as an excuse either: if the right to use the data in some way hasn’t been explicitly granted, it most likely doesn’t apply.
No, that’s not what copyrights are. The idea that they’re “author’s rights” has no basis in law.
Why insist on arguing this point when a simple visit to Wikipedia will show I’m right?
You are mistaken and don’t seem to fully grasp what copyright is.
A copyright is a type of intellectual property that gives its owner the exclusive right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, educational, or musical form.
Notice what it states besides copying? First paragraph on Wikipedia, come on.
I think if we want a GPLv4, it should not be made by the FSF.
Why is that, out of curiosity?
The FSF is a non-working organization which refuses to let go of its horrible founder. I hoped it would move on; it didn’t, and refused to despite massive amounts of community backlash. I no longer believe they should have any role in representing the Free Software movement.
I really like Stallman, the man who made me think about the importance of free software. In my opinion he is essential to the free software movement, even with some “controversial” ideas. I like the way he defends his ideas; that’s something rare nowadays.
I mean, I think his ideas on free software are good generally but his behaviour and opinions on other topics are pretty fucking terrible. I don’t understand why people want to defend that part. The FSF can function without him and defend the ideas of Free Software.
I don’t want to defend all of his ideas, but I do want to point out that he’s most definitely neurodivergent, while having been used as a tackle target on several fronts by the same people claiming to defend people like him.
If you look at those “opinions” from a purely abstract, “internal musings” point of view, you realize that his main fault has been musing aloud while staying in the spotlight, repeating a PR stunt routine taken from an age when Bill Cosby was seen as a family role model.
Indeed, his ideas are often very controversial. He is an old man with old habits, and I think he has some deficiency in how he communicates with people who disagree with his ideas, which makes everything even worse. I don’t know what will become of the FSF after him, for good and for bad.
Some of his ideas are very harmful and he is an abuser. I don’t know what to tell you.
But ironically they are the “owner” of the license; no one else can modify it.
The GPL is a license made by the FSF; not sure who else could make a new version other than them. Other entities make their own licenses, which may or may not be compatible with the GPL.
Anyone can make a license and make it compatible with the GPL, but yes, I forgot that the FSF doesn’t allow modifications of the GPL, which is pretty fucking weird the more I think about it.
It makes sense, because it makes “GPL” act as a trademark and a guarantee that the license is what it’s supposed to be. And they get that without even paying for a trademark registration, which also makes it a brilliant abuse of copyright law (which, arguably, is what the whole license is about in the first place).
As in to keep it from being used to train AI? I think GPL and especially AGPL already cover that, but nobody cares and FSF can’t afford to litigate it.
The GPL won’t work to prevent AI training unless the anti-AI lawsuits succeed. There’s a huge open question about the legal status of these data sets.
First of all: do these models contain the original text? I personally think they do (they’re like a lossy compression method for text, in a way) but it’s impossible to point at a weight and say “that’s the word printf”.
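As a rough illustration of that “lossy compression” point, here’s a minimal sketch (assuming the Hugging Face transformers library and the public gpt2 checkpoint, which is only a stand-in model here): prompt the model with the start of the standard GPL notice and see how much of it comes back verbatim. Whether a given model actually regurgitates the exact wording depends on the model and the decoding settings, but it shows why “no single weight contains the text” and “the text can still come back out” aren’t contradictory.

```python
# Sketch: probe a language model for memorized boilerplate.
# Assumes transformers + torch are installed; "gpt2" is just an example checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The opening of the standard GPL notice, which appears countless times in public code.
prompt = ("This program is free software: you can redistribute it and/or modify "
          "it under the terms of the GNU General Public License as published by")

inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                     pad_token_id=tok.eos_token_id)  # greedy decoding, no sampling
print(tok.decode(out[0], skip_special_tokens=True))
# If the continuation matches the real notice word for word, the model has memorized it,
# even though you still can't point at any individual weight and say "that's printf".
```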
Second of all: are these transformations fair use or just derivative works? If they’re derivative works, AI companies will need to pay up; if they’re considered fair use, there’s no copyright protection for these cases.
Lastly: for many AI companies, the models themselves aren’t actually shared. The question then becomes whether the text generated by a maybe-derivative work is also a derivative work, or a separate work. If the output of AI models is a separate work (which wouldn’t be copyrightable, as it’s automatically generated, fun!), the GPL still won’t have any effect, because it only governs the spread of code, not particular uses.
Then there’s the fact that scientific research is pretty much excluded from copyright obligations altogether. Scientists sharing data sets and models is often completely acceptable despite existing copyright rules. The line becomes blurrier when scientific research gets turned into a for-profit company.
Right now, things can very much go either way. There are some high-profile lawsuits against Stable Diffusion which come down to “how much does copyright apply to AI models and datasets?”. It’ll probably be a few years before we have an answer, or maybe they’ll end up like the DMCA lawsuits: settled for a boatload of money, because the copyright industry is often better off without clear guidance on what is or isn’t fair use.
There are also unfortunate implications. Since AI possibly doesn’t need to care about copyright, would a GPLv4 prevent you from uploading source code to companies like GitHub or GitLab? How much effort would websites need to put into blocking scrapers from downloading the source code and violating the license? Are online, unauthenticated git repositories even allowed at that point? What about locally trained AI: can you use a local version of Copilot to help develop open source projects?
If it turns out copyright doesn’t really apply to AI, any license you add immediately becomes irrelevant, defeating the point. I hope it doesn’t come to that, but at this point it can go either way.
There’s also the fact that GPL is ultimately about using copyright to reduce the harm that copyright can cause to people’s rights.
If we look through the cases that could exist with AI law:
- Training can legally use copyrighted materials without a licence, but models cannot be copyrighted: this is probably a net win for software freedom; people could even train models on commercial software and quickly generate F/L/OSS software. It would undermine AGPL-style protection, though: companies could benefit from F/L/OSS and use means other than copyright to undermine users’ rights, and there would be nothing a licence could do to change that.
- Training can legally use copyrighted materials without a licence, models can be copyrighted: This would allow companies to benefit heavily from F/L/OSS, but not share back. However, it would also allow F/L/OSS to benefit from commercial software where the source is available.
- Training cannot legally use copyrighted materials without complying with the licence, models cannot be copyrighted (or models can be copyrighted, outputs can’t be copyrighted): this is probably the worst case for F/L/OSS, because proprietary software couldn’t be used for training, while proprietary developers could still use a model trained on F/L/OSS by someone else.
- Training cannot legally use copyrighted materials without complying with licence, models can be copyrighted, outputs can be copyrighted: In this case, GPLv2 and GPLv3 probably make the model and its outputs a derivative work, so it is more or less status quo.
Richard Stallman talked about this topic here: https://framatube.org/w/1DbsMfwygx7rTjdBR4DPXp
Can’t find timestamp tho.
GPLv3 already covers all of that. Programs that train AI have normal licensing applied. Programs that were modified by AI must be under the GPL too. The neural network itself is not a program; it’s a format, and it’s always modifiable anyway since there is no source code. You can take any neural network and train it further without the data it was trained on before.
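For that last claim, here’s a minimal sketch of what “training it further without the original data” looks like in practice (assuming the Hugging Face transformers and datasets libraries, with gpt2 again standing in for “any neural network”): the released weights are all you need; the original corpus never enters the picture.

```python
# Sketch: continue training a released model on new text, without its original training data.
# Assumes transformers, datasets and torch are installed; "gpt2" is only an example checkpoint.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token                         # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")  # weights only; no corpus comes with them

# Whatever new text you happen to have; the original data set is not required.
new_texts = ["int main(void) { return 0; }", 'printf("hello, world\\n");']
ds = Dataset.from_dict({"text": new_texts}).map(
    lambda batch: tok(batch["text"], truncation=True, max_length=64), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),  # causal LM objective
)
trainer.train()  # updates the released weights using only the new text
```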
To license what? Code or text? I don’t think either would have enough impact or adoption.