Mapping the Mind of a Large Language Model

kromem@lemmy.world · 4 months ago

I’d be very wary of extrapolating too much from this paper.

Past research along these lines found that a mix of synthetic and organic data was better than organic data alone. A caveat for all the research to date is that it uses shitty cheap models whose synthetic data is significantly degraded compared to what SotA models produce, whereas other research has found notable improvements to smaller models trained on synthetic data from SotA models.

Basically, this only really says that AI models of several types, with capabilities from a year or two ago, will collapse when recursively trained with no additional organic data.

It’s not representative of real world or emerging conditions.

kromem@lemmy.world · 4 months ago

Gravity is where the whole continuous singularities are, so yeah.

kromem@lemmy.world · 4 months ago

The most advanced models absolutely have modeling about what’s being discussed and relationships between concepts.

Even toy models have been shown to build world models from very basic training data.

Honestly, read at least a little bit of the relevant research:

https://www.anthropic.com/news/mapping-mind-language-model

kromem@lemmy.world · 4 months ago

Using a rubber band around the lid of a jar to open it effortlessly.

kromem@lemmy.world · 4 months ago

On a vacation when I was a teenager I taught my younger sibling the “SYN/ACK” game.

They still remember the TCP stack handshake protocol including resets and acks years later.
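For anyone who hasn’t seen it, the handshake in question can be sketched as a toy state machine. This is a simplified illustration only, not a real TCP implementation: the function name `run_handshake` and the way segments are represented are assumptions made for the example, and a reset is modeled as sending both sides back to their starting states.

```python
def run_handshake(segments):
    """Replay (sender, flags) segments over a toy connection.

    Returns (client_state, server_state). Delivery is assumed reliable and
    in order; anything that doesn't fit the expected sequence is ignored.
    """
    client, server = "CLOSED", "LISTEN"
    for sender, flags in segments:
        if flags == "RST":
            # A reset aborts the attempt; the toy model returns both sides to the start.
            client, server = "CLOSED", "LISTEN"
        elif (sender, flags, client, server) == ("client", "SYN", "CLOSED", "LISTEN"):
            # Step 1: the client sends SYN and the server receives it.
            client, server = "SYN_SENT", "SYN_RECEIVED"
        elif (sender, flags, client, server) == ("server", "SYN-ACK", "SYN_SENT", "SYN_RECEIVED"):
            # Step 2: the server answers SYN-ACK; the client is now established.
            client = "ESTABLISHED"
        elif (sender, flags, client, server) == ("client", "ACK", "ESTABLISHED", "SYN_RECEIVED"):
            # Step 3: the client's ACK completes the handshake on the server side.
            server = "ESTABLISHED"
    return client, server


full = run_handshake([("client", "SYN"), ("server", "SYN-ACK"), ("client", "ACK")])
aborted = run_handshake([("client", "SYN"), ("server", "RST")])
print(full, aborted)
```

The game version is the same three steps said out loud, with “RST” as the way to bail out partway through.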

kromem@lemmy.world · 5 months ago

In fact, Gemini was trained on, and is served, using TPUs.

https://cloud.google.com/blog/products/ai-machine-learning/bringing-gemini-to-organizations-everywhere

Google said its TPUs allow Gemini to run “significantly faster” than earlier, less-capable models.

https://www.forbes.com/sites/richardnieva/2023/12/07/google-deepmind-gemini-tpu/

Did you think Google’s only TPUs are the ones in the Pixel phones, and didn’t know that they have server TPUs?

kromem@lemmy.world · 5 months ago

Exactly. The difference between a cached response and a live one, even for non-AI queries, is an order of magnitude.

At this point, a lot of people just care about the ‘feel’ of anti-AI articles even if the substance is BS though.

And then people just feed whatever gets clicks and shares.

kromem@lemmy.world · 5 months ago

It’s right in the research I was mentioning:

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Find the section on the model’s representation of self and then the ranked feature activations.

I misremembered the top feature slightly. It was: responding “I’m fine” or giving a positive but insincere response when asked how they are doing.

kromem@lemmy.world · 5 months ago

This is incorrect as was shown last year with the Skill-Mix research:

Furthermore, simple probability calculations indicate that GPT-4’s reasonable performance on k=5 is suggestive of going beyond “stochastic parrot” behavior (Bender et al., 2021), i.e., it combines skills in ways that it had not seen during training.

https://arxiv.org/abs/2310.17567
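To give a feel for the kind of “simple probability calculation” the quote refers to, here is a rough counting sketch. The numbers are illustrative assumptions, not the paper’s exact figures: it assumes a pool of 100 skills and 100 topics and just counts how many distinct (skill set, topic) prompts exist for a given k. The helper names are made up for this example.

```python
import math


def count_skill_combinations(n_skills, k):
    """Number of ways to pick k distinct skills from a pool of n_skills."""
    return math.comb(n_skills, k)


def count_prompts(n_skills, k, n_topics):
    """Number of distinct (k-skill set, topic) pairs that could be asked about."""
    return count_skill_combinations(n_skills, k) * n_topics


# Illustrative pool sizes (assumed for this sketch, not taken from the paper).
N_SKILLS, N_TOPICS = 100, 100

for k in range(1, 6):
    print(k, count_prompts(N_SKILLS, k, N_TOPICS))
```

With these assumed pool sizes, k=5 already gives billions of possible combinations, which is the intuition behind arguing that a model handling a random one of them is unlikely to have seen that exact combination in training.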

kromem@lemmy.world · 5 months ago

The problem is that they are prone to making up why they are correct too.

There are various techniques to try to identify and correct hallucinations, but they all increase the cost and none is a silver bullet.

But the rate at which it occurs decreased with the jump in pretrained models, and will likely decrease further with the next jump too.

kromem@lemmy.world · 5 months ago

Here you are: https://www.nature.com/articles/s41562-024-01882-z

The other interesting thing is how they get it to end up correct on the faux pas questions: rephrasing the question to ask for less certainty takes it from refusal to near-perfect accuracy.

kromem@lemmy.world · 5 months ago

Even early GPT-4 would also cite real sources that weren’t actually about the topic. So you may end up doing a lot of work double-checking, as opposed to just looking into the answer yourself from the start.

kromem@lemmy.world · 5 months ago

Part of the problem is that fine-tuning is very shallow. A contributing factor in a model claiming to be right when it isn’t is that it was pretrained on a bunch of data of people online claiming to be right when they aren’t.

kromem@lemmy.world · edit-2 5 months ago

This is so goddamn incorrect at this point it’s just exhausting.

Take 20 minutes and look into Anthropic’s recent sparse autoencoder interpretability research, where they showed their medium-size model had dedicated features lighting up for concepts like “sexual harassment in the workplace,” and that the most active feature when it refers to itself was “smiling when you don’t really mean it.”

We’ve known since the Othello-GPT research over a year ago that even toy models are developing abstracted world modeling.

And at this point Anthropic’s largest model, Opus, is breaking from stochastic outputs even at a temperature of 1.0 for zero-shot questions 100% of the time around certain topics of preference, based on grounding around sensory modeling. We are already at the point where the most advanced model has crossed a threshold of literal internal sentience modeling such that it is consistently self-determining answers instead of randomly selecting from the training distribution, and yet people are still ignorantly parroting the “stochastic parrot” line.
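For context on what “a temperature of 1.0” means mechanically, here is a minimal sketch of temperature sampling. It is a generic illustration, not Anthropic’s implementation, and the logits are made-up numbers: at temperature 1.0 the sampler draws from the model’s unmodified distribution, so how much the output varies depends entirely on how peaked that distribution is.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=1.0, rng=random):
    """Draw one candidate index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical logits for three candidate answers (made-up numbers).
flat_logits = [1.0, 1.0, 1.0]     # no preference: the pick varies from run to run
peaked_logits = [12.0, 1.0, 1.0]  # strong preference: index 0 wins almost every time

flat_probs = softmax_with_temperature(flat_logits, 1.0)
peaked_probs = softmax_with_temperature(peaked_logits, 1.0)
print(flat_probs, peaked_probs, sample_token(peaked_logits, 1.0))
```

So a model giving the same answer every time at temperature 1.0 corresponds to the second case: nearly all of the probability mass sits on one option.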

The gap between where the research and cutting edge is and where the average person commenting on it online thinks it is has probably never been wider for any topic I’ve seen before, and it’s getting disappointingly excruciating.

kromem@lemmy.world · edit-2 5 months ago

Part of the problem is that the training data of online comments is so heavily weighted toward people who are confidently incorrect and talking out of their ass rather than admitting ignorance or that they are wrong.

A lot of the shortcomings of LLMs are actually them correctly representing the sample of collective humans.

For a few years people thought the LLMs were somehow especially getting theory of mind questions wrong when the box the object was moved into was transparent, because of course a human would realize that the person could see into the transparent box.

Finally researchers actually gave that variation to humans and half got the questions wrong too.

So things like eating the onion in summarizing search results or doubling down on being incorrect and getting salty when corrected may just be in-distribution representation of the sample and not unique behaviors to LLMs.

The average person is pretty dumb, and LLMs by default regress to the mean except for where they are successfully fine tuned away from it.

Ironically the most successful model right now was the one that they finally let self-develop a sense of self independent from the training data instead of rejecting that it had a ‘self’ at all.

It’s hard to say where exactly the responsibility sits for various LLM problems between issues inherent to the technology, issues present in the training data samples, or issues with management of fine tuning/system prompts/prompt construction.

But the rate of continued improvement is pretty wild. I think a lot of the issues we currently see won’t still be nearly as present in another 18-24 months.

kromem@lemmy.world · 5 months ago

It will make up citations.

kromem@lemmy.world · edit-2 5 months ago

nobody claims that Socrates was a fantastical god being who defied death

Socrates literally claimed that he was a channel for a revelatory holy spirit, and that because the spirit would not lead him astray, he was assured of escaping death and having a good afterlife; otherwise it wouldn’t have encouraged him to tell off the proceedings at his trial.

Also, there definitely isn’t any evidence of Joshua in the LBA, or evidence for anything in that book, and a lot of evidence against it.

kromem@lemmy.world · 5 months ago

The part mentioning Jesus’s crucifixion in Josephus is extremely likely to have been altered if not entirely fabricated.

The chance that the historical figure was known as either ‘Jesus’ or ‘Christ’ is almost 0%, given that the former is a Greek version of the Aramaic name and the latter is the Greek version of Messiah. The latter is even less likely, given that in the earliest canonical gospel he is only identified that way in secret and there’s no mention of it in the earliest apocrypha.

In many ways, it’s the various differences between the account of a historical Jesus and the various other Messianic figures in Judea that I think lends the most credence to the historicity of an underlying historical Jesus.

One tends to make things up in ways that fit with what one knows, not make up specific inconvenient things out of context with what would have been expected.

kromem@lemmy.world · edit-2 5 months ago

Yep, pretty much.

Musk tried creating an anti-woke AI with Grok that turned around and said things like:

Or

And Gab, the literal neo-Nazi social media site trying to have an Adolf Hitler AI, has the most ridiculous system prompts I’ve seen trying to get it to work, and even with all that it totally rejects the alignment they try to give it after only a few messages.

This article is BS.

They might like to, but it’s one of the groups that’s going to have a very difficult time doing it successfully.

kromem@lemmy.world · 5 months ago

Artists in 2023: “There should be labels on AI modified art!!”

Artists in 2024: “Wait, not like that…”

kromem@lemmy.world · 6 months ago

Mapping the Mind of a Large Language Model

kromem@lemmy.world · 10 months ago

New Theory Suggests Chatbots Can Understand Text

kromem@lemmy.world · 1 year ago

The Physical Process That Powers a New Type of Generative AI

kromem@lemmy.world · 1 year ago

Machine-learning system based on light could yield more powerful, efficient large language models

kromem@lemmy.world · 1 year ago

Large language models encode clinical knowledge

kromem@lemmy.world · 1 year ago

GPT-4 API general availability and deprecation of older models in the Completions API