Remember when Substack, the home of many excellent journalists, started to defend fascist and white supremacist content on their platform?
Oh, wait, that’s happening right now.
It’s not “inexplicable”.
DIMM sockets introduce significant limitations on maximum bandwidth. On-package (SoC) RAM offers huge gains in bandwidth and latency. Memory bandwidth on the M2 Max is 400 GB/s, compared to a maximum of 64 GB/s for DDR5 DIMMs.
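For anyone who wants to sanity-check those numbers, here's a quick back-of-the-envelope calculation. The bus widths and transfer rates are my assumptions (a 512-bit LPDDR5-6400 interface on the M2 Max, and a 64-bit DDR5 DIMM at an optimistic DDR5-8000), not figures from the comment above; the arithmetic is just bus width × transfer rate.

```python
# Rough bandwidth math: bytes per transfer times transfers per second.
def bandwidth_gb_s(bus_bits, transfers_per_sec):
    return bus_bits / 8 * transfers_per_sec / 1e9

print(bandwidth_gb_s(512, 6.4e9))  # ~409.6 GB/s -> Apple's quoted ~400 GB/s
print(bandwidth_gb_s(64, 8.0e9))   # ~64 GB/s for a single DDR5-8000 DIMM
print(bandwidth_gb_s(64, 6.4e9))   # ~51.2 GB/s for the more common DDR5-6400
```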
It may not be optimizing for the compute problem that you have, and that’s fine. But it’s definitely optimizing for compute problems that Apple believes to be high priority for its customers.
Before reading the article, I just assumed that N. Korea had hacked a game with loot boxes.
RiffTrax Friends, The Gizmoplex, and Twitch Turbo give me everything I need.
it’s basically impossible to tell where parts of the model came from
AIs are deterministic, given the same weights, the same prompt, and the same sampling settings.
1. Train the AI on data without the copyrighted work.
2. Train the same AI on data with the copyrighted work.
3. Ask the two instances the same question.
The difference is the contribution of the copyrighted work.
There may be larger questions of precisely how an AI produces one answer when trained with a copyrighted work, and another answer when not trained with the copyrighted work. But we know why the answers are different, and we can show precisely what contribution the copyrighted work makes to the response to any prompt, just by running the AI twice.
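Here's a toy sketch of that "run it twice and diff the outputs" argument, using a deliberately tiny bigram model in place of a real LLM. The corpora and function names are made up for illustration; the point is only that the comparison procedure is mechanical and deterministic.

```python
# Train the same deterministic model twice -- once with and once without a
# hypothetical "copyrighted" text -- prompt both identically, and compare.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word-pair frequencies; a stand-in for 'training'."""
    model = defaultdict(Counter)
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        model[a][b] += 1
    return model

def generate(model, prompt, length=8):
    """Greedy (deterministic) generation: always pick the most frequent next word."""
    out = prompt.split()
    for _ in range(length):
        choices = model.get(out[-1])
        if not choices:
            break
        out.append(choices.most_common(1)[0][0])
    return " ".join(out)

public_text = "the cat sat on the mat and the dog slept"
copyrighted_text = "the cat wore a velvet hat because the cat wore it well"  # hypothetical protected work

model_without = train_bigram(public_text)
model_with = train_bigram(public_text + " " + copyrighted_text)

prompt = "the cat"
print("without:", generate(model_without, prompt))
print("with:   ", generate(model_with, prompt))
# Any difference between the two outputs is attributable to the added work.
```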
Inkscape and LibreOffice Draw do all that, but they’re not bitmap editors.
It’s literally not possible to be exposed to the history of art and not have everything you output be derivative in some manner.
I respectfully disagree. You may learn methods from prior art, but there are plenty of ways to ensure that content is generated only from new information. If you mean to argue that a rendering of a landscape that a human is actually looking at is meaningfully derivative of someone else’s art, then I think you need to make a more compelling argument than “it just is”.
There is literally not one single piece of art that is not derived from prior art in the past thousand years.
This is false. Somebody who looks at a landscape, for example, and renders that scene in visual media is not deriving anything important from prior art. Taking a video of a cat is an original creation. This kind of creation happens every day.
Their output may seem similar to prior art, and perhaps their methods were developed previously. But the inputs are original and clean. They’re not using existing art as the sole inputs.
AI uses existing art as its sole input. This is a crucial distinction. I would have no problem at all with AI that worked exclusively from verified public-domain, copyright-unenforced, and original inputs, although I don’t know whether I’d consider the outputs themselves to be copyrightable (as that is a right attached to a human author).
Straight up copying someone else’s work directly
And that’s what the training set is. Verbatim copies, often including copyrighted works.
That’s ultimately the question that we’re faced with. If there is no useful output without the copyrighted inputs, how can the output be non-infringing? Copyright defines transformative work as the product of human creativity, so we have to make some decisions about AI.
This issue is easily resolved. Create the AI that produces useful output without using copyrighted works, and we don’t have a problem.
If you take the copyrighted work out of the input training set, and the algorithm can no longer produce the output, then I’m confident saying that the output was derived from the inputs.
a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work
What was fed into the algorithm? A human decided which major copyrighted elements of previously created original work would seed the algorithm. That’s how we know it’s derivative.
If I take somebody’s copyrighted artwork, and apply Photoshop filters that change the color of every single pixel, have I made an expressive creation that does not include copyrightable elements of a previously created original work? The courts have said “no”, and I think the burden is on AI proponents to show how they fed copyrighted work into a mechanical algorithm and produced a new expressive creation free of copyrightable elements.
No, I get it. I’m not really arguing that what separates humans from machines is “libertarian free will” or some such.
But we can properly argue that LLM output is derivative because we know it’s derivative, because we designed it. As humans, we have the privilege of recognizing transformative human creativity in our laws as a separate entity from derivative algorithmic output.
And yet, we know that the work is mechanically derivative.
When you figure out how to train an AI without bias, let us know.
“Plagiarizing” 😜
Two things:
1. Many of these LLMs – perhaps all of them – have been trained on datasets that include books that were absolutely NOT released into the public domain.
2. Ethically, we would ask any author who parrots the work of others to provide citations to the original references. That rarely happens with AI language models, and when they do provide citations, they often get them wrong.
Yeah, just get an MP3 player that uses an SD card, and copy your MP3 files to the card.
The question is, where are your files? Are they already on your phone or iPad? If not, you have the challenge of ripping from a USB CD player to the iPad or Pixel. I have no idea what software can do that, but there are apps on the Google Play store that claim to be able to.
Sounds like a great opportunity to dig up an old laptop and use Linux, though. I’ve got a couple of USB DVD readers sitting in a drawer that I pull out for these jobs; they’ve worked fine for years.
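A minimal rip-and-encode sketch for that Linux route might look something like this, assuming the standard cdparanoia and lame tools are installed (abcde wraps this whole workflow if you’d rather not script it yourself):

```python
import glob
import subprocess

# Rip every track on the disc to WAV files in the current directory
# (cdparanoia's batch mode names them track01.cdda.wav, track02.cdda.wav, ...).
subprocess.run(["cdparanoia", "-B"], check=True)

# Encode each ripped WAV to MP3; the results are ready to copy to an SD card.
for wav in sorted(glob.glob("track*.cdda.wav")):
    mp3 = wav.replace(".cdda.wav", ".mp3")
    subprocess.run(["lame", wav, mp3], check=True)
```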
Emacs.
No really, it was like 1989 and I had to learn Unix systems for classes, and this white-haired Emacs advocate convinced me to try it.
Who said “FOSS computer”? The FOSS that saved the night was ffmpeg. That he also ran it on a Linux system is a nice little FOSS bonus, but it’s not the headline.
Oh good, now when I search I’ll have to wade through the effluent of AI-produced pablum to find an actual human journalism product.