We are now facing unprecedented growth of AI as a whole. Do you think it is time for the FSF to draft a new version of the GPL that addresses the new challenges AI brings to software development, so it keeps protecting users’ freedom?
A new GPL won’t work to keep AI away from your code, unless the anti-AI lawsuits succeed. There’s a huge open question about the legal status of these data sets.
First of all: do these models contain the original text? I personally think they do (they’re like a lossy compression method for text, in a way) but it’s impossible to point at a weight and say “that’s the word printf”.
Second of all: are these transformations fair use or derivative works? If they’re derivative works, AI companies will need to pay up; if they’re considered fair use, copyright offers no protection in these cases.
Lastly: for many AI companies, the models themselves aren’t actually shared. The question then becomes if the text generated by a maybe-derivative work is also a derivative work, or if it’s a separate work. If the output of AI models is a separate work (which wouldn’t be copyrightable as they’re automatically generated, fun!), GPL still won’t have any effect because it only affects the spread of code, not the particular uses.
Then there’s the fact that scientific research is pretty much excluded from copyright obligations altogether. Scientists sharing data sets and models is often completely acceptable despite existing copyright rules. The line becomes blurrier when scientific research gets turned into a for-profit company.
Right now, things can very much go either way. There are some high-profile lawsuits against Stable Diffusion which come down to “how much does copyright apply to AI models and datasets?”. It’ll probably be a few years before we have an answer, or maybe they’ll end up like the DMCA lawsuits, settled for a boatload of money because the copyright industry is often better off without clear guidance on what is or isn’t fair use.
There are also unfortunate implications. Since AI possibly doesn’t need to care about copyright, does GPLv4 prevent you from uploading source code to companies like GitHub or GitLab? How much effort do websites need to put into blocking scrapers from downloading the source code and violating the license? Are online, unauthenticated git repositories even allowed at that point? What about locally trained AI: can you use a local version of Copilot to help develop open source projects?
If it turns out copyright doesn’t really apply to AI, any license you add immediately becomes irrelevant, defeating the point. I hope it doesn’t come to that, but at this point it can go either way.
There’s also the fact that GPL is ultimately about using copyright to reduce the harm that copyright can cause to people’s rights.
If we look through the cases that could exist with AI law: