Lots of people on Lemmy really dislike AI’s current implementations and use cases.
I’m trying to understand what people would want to be happening right now.
Destroy gen AI? Implement laws? Hoping all companies use it for altruistic purposes to help all of mankind?
Thanks for the discourse. Please keep it civil, but happy to be your punching bag.
If we’re going pie in the sky I would want to see any models built on work they didn’t obtain permission for to be shut down.
Failing that, any models built on stolen work should be released to the public for free.
If we’re going pie in the sky I would want to see any models built on work they didn’t obtain permission for to be shut down.
I’m going to ask the tough question: Why?
Search engines work because they can download and store everyone’s copyrighted works without permission. If you take away that ability, we’d all lose the ability to search the Internet.
Copyright law lets you download whatever TF you want. It isn’t until you distribute said copyrighted material that you violate copyright law.
Before generative AI, Google screwed around internally with all those copyrighted works in dozens of different ways. They never asked permission from any of those copyright holders.
Why is that OK but doing the same with generative AI is not? I mean, really think about it! I’m not being ridiculous here, this is a serious distinction.
If OpenAI did all the same downloading of copyrighted content as Google and screwed around with it internally to train AI then never released a service to the public would that be different?
If I’m an artist that makes paintings and someone pays me to copy someone else’s copyrighted work. That’s on me to make sure I don’t do that. It’s not really the problem of the person that hired me to do it unless they distribute the work.
However, if I use a copier to copy a book then start selling or giving away those copies that’s my problem: I would’ve violated copyright law. However, is it Xerox’s problem? Did they do anything wrong by making a device that can copy books?
If you believe that it’s not Xerox’s problem then you’re on the side of the AI companies. Because those companies that make LLMs available to the public aren’t actually distributing copyrighted works. They are, however, providing a tool that can do that (sort of). Just like a copier.
If you paid someone to study a million books and write a novel in the style of some other author you have not violated any law. The same is true if you hire an artist to copy another artist’s style. So why is it illegal if an AI does it? Why is it wrong?
My argument is that there’s absolutely nothing illegal about it. They’re clearly not distributing copyrighted works. Not intentionally, anyway. That’s on the user. If someone constructs a prompt with the intention of copying something as closely as possible… To me, that is no different than walking up to a copier with a book. You’re using a general-purpose tool specifically to do something that’s potentially illegal.
So the real question is this: Do we treat generative AI like a copier or do we treat it like an artist?
If you’re just angry that AI is taking people’s jobs say that! Don’t beat around the bush with nonsense arguments about using works without permission… Because that’s how search engines (and many other things) work. When it comes to using copyrighted works, not everything requires consent.
Like the other comments say, LLMs (the thing you’re calling AI) don’t think. They aren’t intelligent. If I steal other people’s work and copy pieces of it and distribute it as if I made it, that’s wrong. That’s all LLMs are doing. They aren’t “being inspired” or anything like that. That requires thought. They are copying data and creating outputs based on weights that tell it how and where to put copied material.
I think the largest issue is people hearing the term “AI” and taking it at face value. There’s no intelligence, only an algorithm. It’s a convoluted algorithm that is hard to tell what going on just by looking at it, but it is an algorithm. There are no thoughts, only weights that are trained on data to generate predictable outputs based on given inputs. If I write an algorithm that steals art and reorganizes into unique pieces, that’s still stealing their art.
For a current example, the stuff going on with Marathon is pretty universally agreed upon to be bad and wrong. However, you’re arguing if it was an LLM that copied the artist’s work into their product it would be fine. That doesn’t seem reasonable, does it?
My argument is that the LLM is just a tool. It’s up to the person that used that tool to check for copyright infringement. Not the maker of the tool.
Big company LLMs were trained on hundreds of millions of books. They’re using an algorithm that’s built on that training. To say that their output is somehow a derivative of hundreds of millions of works is true! However, how do you decide the amount you have to pay each author for that output? Because they don’t have to pay for the input; only the distribution matters.
My argument is that is far too diluted to matter. Far too many books were used to train it.
If you train an AI with Stephen King’s works and nothing else then yeah: Maybe you have a copyright argument to make when you distribute the output of that LLM. But even then, probably not because it’s not going to be that identical. It’ll just be similar. You can’t copyright a style.
Having said that, with the right prompt it would be easy to use that Stephen King LLM to violate his copyright. The point I’m making is that until someone actually does use such a prompt no copyright violation has occurred. Even then, until it is distributed publicly it really isn’t anything of consequence.
I run local models. The other day I was writing some code and needed to implement simplex noise, and LLMs are great for writing all the boilerplate stuff. I asked it to do it, and it did alright although I had to modify it to make it actually work because it hallucinated some stuff. I decided to look it up online, and it was practically an exact copy of this, down to identical comments and everything.
It is not too diluted to matter. You just don’t have the knowledge to recognize what it copies.
AI models produced from copyrighted training data should need a license from the copyright holder to train using their data. This means most of the wild west land grab that is going on will not be legal. In general I’m not a huge fan of the current state of copyright at all, but that would put it on an even business footing with everything else.
I’ve got no idea how to fix the screeds of slop that is polluting search of all kinds now. These sorts of problems ( along the lines of email spam ) seem to be absurdly hard to fix outside of walled gardens.
See, I’m troubled by that one because it sounds good on paper, but in practice that means that Google and Meta, who can certainly build licenses into their EULAs trivially, would become the only government-sanctioned entities who can train AI. Established corpos were actively lobbying for similar measures early on.
And of course good luck getting China to give a crap, which in that scenario would be a better outcome, maybe.
Like you, I think copyright is broken past all functionality at this point. I would very much welcome an entire reconceptualization of it to support not just specific AI regulation but regulation of big data, fair use and user generated content. We need a completely different framework.
I’d like for it to be forgotten, because it’s not AI.
Thank you.
It has to come from the C suite to be “AI”. Otherwise it’s just sparkling ML.
I want real, legally-binding regulation, that’s completely agnostic about the size of the company. OpenAI, for example, needs to be regulated with the same intensity as a much smaller company. And OpenAI should have no say in how they are regulated.
I want transparent and regular reporting on energy consumption by any AI company, including where they get their energy and how much they pay for it.
Before any model is released to the public, I want clear evidence that the LLM will tell me if it doesn’t know something, and will never hallucinate or make something up.
Every step of any deductive process needs to be citable and traceable.
… I want clear evidence that the LLM … will never hallucinate or make something up.
Nothing else you listed matters: That one reduces to “Ban all Generative AI”. Actually worse than that, it’s “Ban all machine learning models”.
If “they have to use good data and actually fact check what they say to people” kills “all machine leaning models” then it’s a death they deserve.
The fact is that you can do the above, it’s just much, much harder (you have to work with data from trusted sources), much slower (you have to actually validate that data), and way less profitable (your AI will be able to reply to way less questions) then pretending to be the “answer to everything machine.”
The way generative AI works means no matter how good the data it’s still gonna bullshit and lie, it won’t “know” if it knows something or not. It’s a chaotic process, no ML algorithm has ever produced 100% correct results.
I’m not anti AI, but I wish the people who are would describe what they are upset about a bit more eloquently, and decipherable. The environmental impact I completely agree with. Making every google search run a half cooked beta LLM isn’t the best use of the worlds resources. But every time someone gets on their soapbox in the comments it’s like they don’t even know the first thing about the math behind it. Like just figure out what you’re mad about before you start an argument. It comes across as childish to me
It feels like we’re being delivered the sort of stuff we’d consider flim-flam if a human did it, but lapping it up bevause the machine did it.
“Sure, boss, let me write this code (wrong) or outline this article (in a way that loses key meaning)!” If you hired a human who acted like that, we’d have them on an improvement plan in days and sacked in weeks.
So you dislike that the people selling LLMs are hyping up their product? They know they’re all dumb and hallucinate, their business model is enough people thinking it’s useful that someone pays them to host it. If the hype dies Sam Altman is back in a closet office at Microsoft, so he hypes it up.
I actually don’t use any LLMs, I haven’t found any smart ones. Text to image and image to image models are incredible though, and I understand how they work a lot more.
I expect the hype people to do hype, but I’m frustrated that the consumers are also being hypemen. So much of this stuff, especially at the corporate level, is FOMO rather than actually delivered value.
If it was any other expensive and likely vendor-lockin-inducing adventure, it would be behind years of careful study and down-to-the-dime estimates of cost and yield. But the same people who historically took 5 years to decide to replace an IBM Wheelwriter with a PC and a laser printer are rushing to throw AI at every problem up to and including the men’s toilet on the third floor being clogged.
Stop selling it a loss.
When each ugly picture costs $1.75, and every needless summary or expansion costs 59 cents, nobody’s going to want it.