What Is ChatGPT Doing and Why Does It Work? (2023) (stephenwolfram.com)
149 points by taubek on June 18, 2024 | 85 comments


The better the models get, the harder it is for me to form a mental model of what goes on inside of them.

An example of a prompt for which I don't have a good mental model why it works:

    What do you think about the following text?

    Joe drove Sue to university. Afterwards he drove home again
    and drank a tea with her in the kitchen.
Older models behaved similarly to Markov chains and completely missed that something is logically strange here. Newer models still sometimes do, but more often than not they catch it.

GPT-4o for example:

    There is a slight inconsistency in the narrative.
    The text states that Joe drank tea with Sue in the
    kitchen after driving her to university, which
    implies that Sue is at home, contradicting the
    earlier statement that she was driven to university.
Surely nothing in the prompt directly triggered the word "inconsistency". Did the model form some kind of "world model" in its inner layers in which it knows about a person "Sue" who is at a location called "university" after the first sentence?


This is your human interpretation of what it is doing. Another interpretation could be that it has seen similarly structured sentences in other logic/lateral-thinking problems, so it raises the likelihood of words that have to do with logical inconsistencies, and then, out of that vocabulary, the highest-likelihood words are those that come from the nearby context.

I would posit that this interpretation is significantly more likely since instead of interpreting “afterwards” as “after Sue’s class finished” it interpreted the sentence as a lateral thinking problem probably because the training set had many lateral thinking problems in it that were used to test the model’s “reasoning” capabilities.

This is the danger of trying to anthropomorphize LLMs: they are not thinking, and there are clear limitations to the abilities of this architecture: https://youtu.be/MiqLoAZFRSE?si=iRhg_UJIokKseU7K


You mean a prompt in the form of "A happened, then B happened, then C happened" activates the word "inconsistency"?

We can test that by slightly altering the text so that there is no inconsistency anymore and see if the model still claims there is one. We can do so by adding the word "mother" after the word "her":

    What do you think about the following text?

    Joe drove Sue to university. Afterwards he drove home again
    and drank a tea with her mother in the kitchen.
In my experiments, the model now no longer claims there is an inconsistency.

If it is only based on sentence structure, what makes the model output "inconsistency" for the first prompt but not for the second?


Remember that OpenAI has given it a massive amount of extra training material that looks like such logical puzzles. Assume it has read billions of such lines, easily generated by a simple Python script across many scenarios; then it is much less hard to understand how these models think, and why some models think better than others: they just have better case generators.

An LLM can do some very basic fitting into such cases, so it doesn't need to have seen the exact scenario, but it has seen many similar scenarios.

Edit: So the hard part of making a smart LLM is figuring out a good set of such scenarios and problems to feed it, covering most of the things people will ask the LLM.


If you make it a task prompt instead of a lateral-thinking task, you get a different "world model", or, in my view, you are just biasing the output to use a different structure.

```

Can you summarize joe’s schedule for my journal?

    Joe drove Sue to university. Afterwards he drove home again
    and drank a tea with her in the kitchen.

Sure! Here is a summary of Joe's schedule for your journal:

---

*Joe's Schedule:*

- Drove Sue to university
- Returned home
- Drank tea with Sue in the kitchen

---
```


Me: Joe drove Sue to university. Afterwards he drove home again and drank a tea with her in the kitchen.

Explain.

ChatGPT: This statement seems to contain an ambiguity regarding Sue's location, suggesting some confusion about the timeline of events. Here are a few possibilities that could clarify the scenario:

1. *Sue Returns with Joe*: - Joe drove Sue to university. - Sue either decided not to stay at university or completed what she needed to do quickly. - Joe drove Sue back home. - They both drank tea together in the kitchen.

2. *Misstatement or Mistake*: - The statement might contain a mistake, implying that Joe and Sue met at home later but doesn't clearly explain how Sue got back from the university. - It could be intended to mean that Joe drove Sue to university and then drove home alone, and Sue somehow returned home later to join Joe for tea.

3. *Joe Visits University*: - Joe drove Sue to university. - Joe stayed at the university, possibly waiting for Sue to finish. - After she was done, they both drove back home together. - They then drank tea in the kitchen.

Based on the original statement, the first scenario seems most plausible, suggesting Sue quickly returned home with Joe.


Generally I think this would be a better test if the phrase were that Joe "dropped off" Sue at university. As phrased, it's plausible to assume that Joe stayed at the university with Sue, drove her back, and had tea afterwards.

Though a truly smart model should seek to disambiguate situations like this.


I actually disagree with assuming an inconsistency when no time frame is specified and the phrase isn't "dropped off" as you suggested.

I tried giving Llama 3 70B a prompt that addresses both issues:

```

Can you summarize joe’s schedule for my journal?

4:00: Joe leaves the house with Sue
4:30: Joe drops Sue off at the university
5:00: Joe drinks tea with Sue in the kitchen

```

Reply:

```

I think there may be a problem here!

According to the schedule, Joe drops Sue off at the university at 4:30, but then he's supposed to drink tea with her in the kitchen at 5:00. That doesn't seem possible, since Sue is no longer with him after 4:30.

If you could clarify or correct the schedule, I'd be happy to help you summarize it for your journal!

```

Even Llama 8B can catch this once told `You think out loud before replying going forward`, which imo should be part of any system prompt for an LLM:

```

Let me think out loud for a moment...

Okay, so Joe's schedule starts at 4:00 PM, and he leaves the house with Sue. That's a good starting point.

Hmmm... then at 4:30 PM, Joe drops Sue off at the university. That means Sue is no longer with Joe at this point.

Wait a minute... but then at 5:00 PM, Joe is drinking tea with Sue in the kitchen. That's a bit confusing, since I thought Sue was already at the university.

Ah, I think I see what's going on here. It looks like there's a mistake in the schedule. Joe can't be drinking tea with Sue in the kitchen at 5:00 PM if he dropped her off at the university at 4:30 PM.

```

-

I even tried generating a synthetic schedule with a full day's worth of items: 70B still caught the contradiction; 8B no longer did.


Talking about sentence structure in the conventional sense may not be meaningful here, since what could be described as reasoning in LLMs happens in a more abstract space. If we're looking to understand why a small change makes a big difference, it's pretty intuitive to consider that the second instance of "her" is modified by "mother" via attention, and ends up being a wildly different vector.

Regardless, it's reasonable to assume that certain aspects of the prompt or input structure will prime the model to be more scrutinizing. I'd be surprised to see it point out a logical inconsistency like this if it was just part of a broader context and it wasn't asked "what it thinks" or to "be logical"


This seems right to me as well: it abstracts the text into a temporal/spatial object model, and the "What do you think about the following text?" prompt invokes analysis of that model according to prior training examples (as opposed to someone above prompting it to summarize the schedule, in which case it isn't concerned with the logic of the situation).

I would guess that the human mind does this abstraction invisibly behind the scenes, which screws up our intuition when analyzing how LLMs work. I wonder if using examples that are counterintuitive to humans might offer insight, because humans reveal that their perceived logical thinking is not actually that (rather, it is heuristics) in their post-hoc rationalization of the "logic" they believe their mind executed to produce the answer.

(I don't think I articulated what I'm thinking here very well...or, perhaps I have fallen victim to my very own theory!)

A bit more effort: the text is converted not only into tokens, but also into abstract tokens, and it is because of this translation into abstract tokens that it is able to match the text to training data (which would also have to be translated into abstract tokens). How it resolves the inconsistency after that translation is beyond me, but it wouldn't surprise me if it is (in this case) a rather trivial problem to someone with depth in logic or some other related discipline.


They are training on tens of millions of tokens/words and representing each in a high-dimensional space. In practice, it's seen a lot of stuff.


It could be neither here nor there: a neural network model is not necessarily a statistical engine, just as real brains should not be reductively thought of as such. (The usual argument is that these black boxes were trained on huge samples to optimize a result; I am not persuaded by that line of reasoning.)


There's a danger in flatly stating what it's doing _without testing your assumptions_. Did you actually try to trip it up and see if it falls for it? Reword his prompt in different ways to get it to fail?


    seen similarly structured sentences
But ChatGPT doesn't generalize the structure of sentences. If this same problem was written in a different language, or just replaced words in the sentence, the result will be very different, no?


That’s right: if LLMs were really thinking, forming world models, etc., we would expect them to be robust against word choice or phrasing. But in practice, anyone using RAG can tell you that is not the case.

I’m just a practitioner, so my language might be imprecise, but when I say "similarly structured sentences" what I mean (and this is my interpretation, based on my experience using agents and LLMs) is that the shape of the context, i.e. the phrasing and word choice, highly biases the outputs of LLMs.

In my own observations at work, those who interpret LLMs as thinking often produce bad agents. LLMs are not good at open-ended questions: if you ask an LLM to "improve this code" you will often get bad results that just look passable. But if you interpret LLMs as probabilistic models highly biased by their context, then you would add a lot more context and specific instructions in the prompt in order to get the agent to produce the right output.

Side note, this is also why I like the AICI approach: https://github.com/microsoft/aici A lot of us think it is silly how phrasing and word choice can produce dramatically different results in RAG applications. If you could run a program (like AICI) that post-processes the output and picks the next word in a more structured way, instead of writing ever more creative prompts, that would just make a lot more sense to me.
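The core idea behind controllers like AICI can be sketched in a few lines (toy Python, illustrative only; the function and token names are mine, not the AICI API): instead of letting the model sample freely, a controller restricts each decoding step to the tokens that fit the structure you want.

```python
import math

def constrained_next_token(logits, vocab, allowed):
    """Toy constrained decoding: ignore every token the controller
    does not allow, then pick the most likely remaining token."""
    best_tok, best_logit = None, -math.inf
    for tok, logit in zip(vocab, logits):
        if tok in allowed and logit > best_logit:
            best_tok, best_logit = tok, logit
    return best_tok

vocab = ["yes", "no", "maybe", "banana"]
logits = [1.2, 0.3, 2.5, 3.0]  # made-up raw model scores

# Unconstrained decoding would pick "banana"; a controller that only
# permits {"yes", "no", "maybe"} forces a structured answer instead.
print(constrained_next_token(logits, vocab, {"yes", "no", "maybe"}))  # maybe
```

The point is that the structure is enforced in the decoding loop itself, so prompt phrasing matters far less for getting well-formed output.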


I asked the same prompt, but have a memory asking for detailed answers. It gave me:

    The text presents a logical inconsistency. Here's a breakdown of the issues:

    Sequence of Events: Joe drives Sue to the university and then returns home. This part is clear.
    Inconsistency: The text states that after Joe drove home, he drank tea with Sue in the kitchen. However, Sue should be at the university if Joe drove her there, not at home.
    To correct this, consider one of the following revisions:

    Sue returns home: If Sue returns home with Joe:

    "Joe drove Sue to the university. Afterwards, he drove back home with her, and they drank tea together in the kitchen."
    Joe drinks tea alone: If Sue stays at the university:

    "Joe drove Sue to the university. Afterwards, he drove home again and drank tea alone in the kitchen."
    Either option removes the inconsistency and clarifies the sequence of events.
This is pretty impressive.


Just yesterday people downvoted me on HN for suggesting ChatGPT had the best ability to deduce things, because, according to the downvoters, it didn't have any.

But the "dumb parrot" narrative that was quite fitting for GPT-2, and maybe still useful for understanding the foundation of the tech in GPT-3, is now just a demonstration of ignorance.

We don't know what they are really doing. Nobody working outside of OpenAI knows.

But if you use it long enough, it will be obvious we have passed the fancy completion phase, there is a little logic in there. Not a lot, but it makes all the results much better already.

And definitely much better than the competition.


> Nobody working outside of openai knows.

I wouldn't be too sure OpenAI people know either. If a machine has enough moving parts, it's hard for an individual human to understand all of it. Nowadays it's possible that nobody quite knows why the silicon compiler put a particular block in a particular position on the die; it just figured that's the best way to save power or space or whatever.


For a more informed opinion than folks on the internet, here's some work from Microsoft, which had early internal access to GPT-4: https://arxiv.org/abs/2303.12712 . I don't think people close to these systems share the dumb-parrot sentiment at all.


I've never been sure what to make of that paper. It was published by Microsoft shortly after Microsoft's big deal with OpenAI and reads a lot like a marketing piece to me. Many of the observations didn't reproduce the same way once GPT-4 was in public hands, either. If nothing else, I'd prefer it come from a party who hadn't just signed a $13 billion deal with OpenAI a few weeks prior, with a view to using their products to sell more new products/features... It's somewhat self-serving for Microsoft to argue GPT-4 is super-awesome/sparks-of-AGI etc., regardless of the validity of the claims.


This is an interesting thought exercise! The theory goes that if the model "understands" this scenario, then internally it has created something that models the real world.

Another interesting experiment people ran when the GPT-4-class models launched was to test spatial awareness. For example, you could describe in words a construction made of blocks, spheres, and so on, and then ask questions about the stability of the structure. The newer models more often than not got it right.

Of course, that hasn't stopped people from making tired old claims of "stochastic parrot" and so forth, but I think if your model can solve such reasoning problems (spatial or otherwise) then there's something really interesting to discover. I'm glad that folks at Anthropic are really trying to create a better understanding of these models (https://www.anthropic.com/research/mapping-mind-language-mod...)


But where do you get access to these premium models? I only have free access to Copilot (which claimed at one point to be GPT-4) and it cannot solve any of these issues reliably.

It spectacularly fails at slight variations of the goat/cabbage/lion/river problem, it cannot solve spatial or mathematical questions reliably.

Do some of the researchers that hype up ChatGPT get access to a special version? I'm not inclined to buy a subscription to find out and the AI Reddits aren't that positive either.


How people get access to the premium models is typically to sign up[1] for an account and pay a subscription. GPT-4o is significantly different from original GPT-4 which is again very significantly different from GPT-3.5. This is all moving fast.

So if you don't want to pay for a subscription, I think you can get some free use of Anthropic's most capable model (Opus). I don't know the status of what you can get for free from OpenAI.

The opinion of AI reddit is only really going to get you so far because the capabilities of the models are wildly different for different use cases, so you really need to be able to try it out for yourself and see if it can do what you need it to do.

[1] Somewhere like https://openai.com/chatgpt/pricing/ https://claude.ai/ or similar


Like the sibling comment says, you typically have to pay for the premium model (and believe me, they are miles ahead of the free stuff; free Copilot uses GPT-3.5, which is a full two generations behind).

The other way is to get an API key from OpenAI or Anthropic and then, you can pay per token which is incredibly cheap - load it up with $5 and it can easily last you months if you're not using it much.

As a dev, I cannot recommend these better models enough. The ROI on my $10-$20 that I spend every month is easily in hundreds to thousands of dollars.


Stochastic parrot is way too simplistic a dismissal. It's hand wavy.

But I don't get what's so impressive about the nuanced language of a language model that has been given datacenter amounts of compute and virtually all of the written word ever digitized. Yes, it's the first actually functioning natural-language interface. At what cost, though? It's completely out of proportion with the benefits, and only bubble-level 'investing' can justify this.


Are you referring to the (one-time) training cost or the cost per inferred token? The latter is pretty acceptable these days, especially with smaller models.


Small models can't handle impressive language corner cases, and for the models that can, the benefits scale linearly while their costs grow geometrically.

Also, claiming training costs are 'one-off' when so many resources are being poured into training new and bigger models is disingenuous.


The resources pouring in still pale in comparison to capital expenditures in the semiconductor industry.


The inflation in semiconductor capital expenditure is a problem caused by the AI hype.


Nah, TSMC was spending tens of billions annually on capital expenditure before OpenAI.


> The theory goes - that if the model "understands" this scenario, then internally, it has has created something that models the real world.

But this is merely a definition of what it means to "understand" something.

For example just tabulating many input/output combinations would not follow this definition.


From the preface to the 20th anniversary edition of Godel, Escher, Bach:

"Meaning cannot be kept out of formal systems when sufficiently complex isomorphisms arise. Meaning comes in despite one's best efforts to keep symbols meaningless! ...When a system of "meaningless" symbols has patterns in it that accurately track, or mirror, various phenomena in the world, then that tracking or mirroring imbues the symbols with some degree of meaning -- indeed, such tracking or mirroring is no less and no more than what meaning is. Depending on how complex and subtle and reliable the tracking is, different degrees of meaningfulness arise."

In other words, when one can reliably ask a language model a question and get a sensible answer, one is forced to conclude that it does in some sense "understand" what it is saying. This is also I think the essential philosophical thrust of the Turing Test, which is often misunderstood as a mere benchmark.

(I notice a common objection to examples of LLMs clearly demonstrating understanding is "it saw something similar in the training set". That may be true (though unfalsifiable) in any given instance, but the number of permutations of things LLMs correctly respond to far exceeds the size of any training set. They are certainly generalizing, and interpreting their inputs on a conceptual level.)


It’s a type of understanding, but one we are not really familiar with, because it seems to understand tasks that it was trained on, but fails in other very basic ways on simple but different tasks.


You're right of course. Let me rephrase. What I was trying to say was: if the model solves some reasoning tasks, that may imply it's building some inner world model, which is fundamental to reasoning. That, in turn, might mean that the model is reasoning.

Others have argued against the "inner world model" theory and suggested that solving reasoning tasks is merely an extension of the "stochastic parrot" scenario - i.e., they claim that no such world model exists and that the model has rote memorized these reasoning scenarios.


Why not just give it a series of directions e.g. go one step forwards, turn right, etc like a LOGO program, and then ask it if it is back to the start or not. If the series of instructions is randomly generated then this is impossible to solve without a world model.
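The proposed test is attractive because the grader can be fully automated: a program can generate random instruction lists and compute the ground truth. A minimal sketch (toy Python; the turtle-style instruction names are mine):

```python
import random

# Headings as (dx, dy) unit vectors; turning rotates within this list.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W

def ends_at_start(instructions):
    """Execute LOGO-style instructions on a grid and report whether
    the walker returns to its starting position."""
    x, y, h = 0, 0, 0
    for step in instructions:
        if step == "forward":
            dx, dy = HEADINGS[h]
            x, y = x + dx, y + dy
        elif step == "turn right":
            h = (h + 1) % 4
        elif step == "turn left":
            h = (h - 1) % 4
    return (x, y) == (0, 0)

# A square path (forward, right, four times) returns to the start.
square = ["forward", "turn right"] * 4
print(ends_at_start(square))  # True

# Randomly generated programs give an unbounded supply of test cases
# that cannot all have appeared in any training set.
random_prog = [random.choice(["forward", "turn left", "turn right"])
               for _ in range(20)]
print(ends_at_start(random_prog))
```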


People have tried this! And for not-complex problems, the newer models do give the right answer and yet usually fail on the complex tasks.

Prompting does help though: if you ask the model to check its own answer, it can often catch errors it made and improve. But the interesting part is precisely how the model succeeds at all at any spatial reasoning task. Surely it hasn't seen all spatial reasoning tasks in its training set. So, what's the other hypothesis? As you say, people suggest that there could be a world model that is constructed internally by the LLM.

However, as you can see right here in this thread, people disagree that you need a world model to solve such tasks.


That is one version of implication.

The implication could also be that a time component isn't stated explicitly.

He drove her to university. Time passes. Now it's "afterwards" and Sue is done with her classes and they go back home to drink tea.

For all we know Joe had a cup of coffee while she was writing an exam but because they live so far from university it made no sense to go home and come back to come pick her up as it might take an hour to drive each way and the exam was 2 hours.


There's so many people who just flatly say that it is impossible for ChatGPT to do what it plainly does, and I think it's largely because early explanations of how it works were so bad. "Stochastic parrot" was a particularly obnoxious one, that wasn't even entirely true for gpt-2 and certainly wasn't accurate for GPT-3, and just seems silly trying to apply it to GPT-4.


A mental model of GPTs that's pretty successful at predicting their behaviour at the limit of large scale is an N-Gram model trained on a much larger internet than the real one. If you search the real world's internet for the text

```
You are a helpful and harmless AI assistant.

What do you think about the following text?

Joe drove Sue to university. Afterwards he drove home again and drank a tea with her in the kitchen.
```

It would be exponentially unlikely to show up even once. But if you could keep re-rolling the internet from whatever probability distribution generated it, eventually (and I mean eventually- every additional token in the prompt will require ~10,000 times more re-rolls of the whole internet) the whole prompt will show up, and you can just grab the next word. An N-Gram model trained on that massive multi-internet will answer the question in a human fashion. (likely, copied from a human writing science fiction).
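For reference, the mechanics of an N-gram model are trivial to sketch (toy Python; not a claim about GPT internals, just the baseline being compared against). Its defining limitation is visible immediately: it can only continue a context it has literally seen.

```python
from collections import defaultdict
import random

def train_ngram(text, n=2):
    """Count the next words seen after every (n-1)-word context."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - n + 1):
        ctx = tuple(words[i:i + n - 1])
        model[ctx].append(words[i + n - 1])
    return model

def continue_text(model, context, n=2):
    """Sample a next word, or return None if the exact context was
    never seen -- the failure mode a real internet-sized corpus hits
    for almost any long prompt."""
    ctx = tuple(context.split()[-(n - 1):])
    options = model.get(ctx)
    return random.choice(options) if options else None

corpus = "joe drove sue to university joe drove home again"
model = train_ngram(corpus, n=2)
print(continue_text(model, "joe", n=2))          # "drove": only continuation seen
print(continue_text(model, "the kitchen", n=2))  # None: context never seen
```

The "larger internet" thought experiment amounts to asking what this table lookup would return if the corpus were large enough that every plausible prompt appeared in it.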


One of the impressive results of this to me is that it can capture illocutionary meaning, which is something that language modeling struggled with in the past. This is why if you say "Are you sure?" after one of its statements, it will often completely flip its statement to the opposite rather than interpreting "Are you sure?" as a discursive question.


An interesting way to think about it, but if it did that perfectly... wouldn't it effectively be human-level intelligent?

Sure, it isn't any _particular_ human, but I don't see a big difference otherwise.


No, since that larger internet is still not infinitely large, it still fails in dumb ways once your question is a tiny bit stranger.


Occam's Razor says that between the model having somehow developed a persistent internal world-model representation it can reason about, and the model simply being big enough, in terms of training data, knobs, and memory allocation, to capture more and more language corner cases, the latter is likely true.


Just to add another sample from another LLM, here is the response from Llama-3-Instruct-8B. I had to specifically ask it to identify any possible inconsistency; otherwise it just thought it was a lovely short story.

    > The inconsistency is that Joe drives Sue to university, which suggests it's morning or early afternoon. But then he drives "home again", implying that he was already at his own home before taking Sue to university. This seems unlikely and creates a paradox! What do you think is going on here?


    > Surely nothing in the prompt directly triggered the word "inconsistency".
The prompt is (implicitly) asking to find inconsistencies ("what do you think about") within some statements of fact. Many variations of "find problems in the text" are part of its training set.

Remove the "what do you think about" and the model doesn't find inconsistencies.

Or keep it, but make the following text more consistent, and watch it continue to find inconsistencies (which don't exist).



"Explain" is a variation of "what do you think about", so you get similar results.

https://news.ycombinator.com/item?id=40723601


It is most definitely not the same in any manner other than being a prompt for a response and not just a weird sentence floating in a void. What would a reasonable response be? “Cool story, bro?”


My guess is that this is a post-training (RLHF) artifact of world-model prompts. There were likely many "logical inconsistency" prompts which human raters coerced toward the above response.


Is RLHF the same as fine-tuning?


No. In the common use of the word fine-tuning, one is in the supervised learning scenario. One has an input prompt, and an output sentence. One teaches the model to say that output in response to that prompt. In the reinforcement learning scenario, one has a prompt, and a way of rewarding the model for different outputs. One can have, for instance, a reward model, that assigns a reward for a given model output. One could also have a pairwise reward model, where the learner is sampled with that prompt twice (with different RNGs), and the reward model gives a reward based on the better of the two samples. You could also have humans give these pointwise or pairwise rewards.

In essence, one is not telling the model "This. This is what you should output next time." but rather "I liked this reply. Have a cookie." The behaviors that you can learn in RL are more subtle, but you get a lot less information per step. That's because, in a causal language modeling objective, when I tell you "For the prompt X, you should output exactly Y[0...m)", you get a gradient for P(Y[0] | X), another one for P(Y[1] | X Y[0..1)), another for P(Y[2] | X Y[0..2)), another for P(Y[3] | X Y[0..3)), and so on. It's a lot more of a step-by-step guidance, than it is a sentence-wise reward that you get in the RL framework. In RL, I'd give you a cookie for P(Y | X). What part of Y made me give you that cookie? Was there even such a part? Was it perhaps some internal representation that made everything in Y better? That's for the model to learn.
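The difference in signal density described above can be made concrete with a toy calculation (illustrative Python; the probabilities are made up): supervised fine-tuning yields one loss term per output token, while an RLHF-style reward is one scalar for the whole reply.

```python
import math

# Toy next-token probabilities the model assigned to each token of a
# three-token target output Y given prompt X (made-up numbers).
p_next = [0.5, 0.25, 0.125]

# Supervised fine-tuning: one cross-entropy term, hence one gradient
# signal, per token of the target output.
sft_losses = [-math.log(p) for p in p_next]
print(len(sft_losses))  # 3 learning signals, one per token

# RLHF-style reward: a single scalar for the entire sampled reply.
# The model must work out for itself which tokens earned the cookie.
reward = 1.0
print(1)                # 1 learning signal for the whole sequence
```

So for the same output length, the supervised objective hands the model m step-by-step corrections where RL hands it one.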


One wrinkle is that it is now common to fine-tune on previously derived RL datasets, with the tested inputs and preferred sample outputs as the training data.


I think it's system prompt dependent. I asked LLaMa-2-70B iQ2s which was set up with a system prompt telling it to be logical and detail oriented.

First try:

    I think there is a logical inconsistency in the text. If Joe drove Sue to university, then she was dropped off at the university, which means she would not be available to drink tea with him at his home afterwards. The sequence of events seems to be out of order.

Second try:

    I notice that there's a small issue with the sequence of events. It seems unlikely that Joe drove home again after dropping Sue off at university, only to then drink tea with her in the kitchen. If Joe drove Sue to university, it implies she got out of the car at the university, not that he drove back home with her.

    A more logical sequence of events would be:
    
    Joe drove Sue to university. Afterwards, he drove home and drank a cup of tea by himself in the kitchen.
    
    Or, if you want to keep the scenario where Joe drinks tea with Sue:
    
    Joe drove Sue to university. They went back home together, and then they drank a cup of tea together in the kitchen.


I asked Google's Gemini and got the following:

    The text contains a logical inconsistency.

    The first sentence states that Joe drove Sue to university, implying Sue is now at university.

    The second sentence says Joe drank tea with Sue in the kitchen afterwards, which suggests Sue is back at home with Joe.

    This creates a contradiction. It's impossible for Sue to be both at university and in the kitchen at the same time.
Then, I asked a follow-on:

    But what if Joe drove home with Sue even though it doesn't say that? (It doesn't say he drove home alone either.)
To which it replied:

    You're absolutely right. The text we have is ambiguous. Here's why:

    * Original interpretation: We naturally assume Joe dropped Sue off and then returned home alone. This is because it's the most common scenario and aligns with the typical purpose of driving someone to a specific location (like a university).

    * Alternative interpretation: As you point out, the text doesn't explicitly state Joe drove back alone. It's entirely possible he gave Sue a ride home after her university visit.

    In this case, the seemingly illogical detail of drinking tea in the kitchen becomes perfectly reasonable.

    This is a great example of how  limited information can lead to multiple interpretations.

    The next sentence or some additional context could clarify the situation. For instance:

    * "Joe dropped Sue off at her dorm and then drove home alone to make himself a cup of tea." (Original interpretation)

    * "After dropping Sue off, Joe drove them both back to their place and they enjoyed a cup of tea together in the kitchen." (Alternative interpretation)


Here is GPT-4o with a "Please explain your thinking step by step"

    The text describes a sequence of events involving two characters, Joe and Sue. Here’s a step-by-step analysis of the text:

    First Sentence: "Joe drove Sue to university."
    - Action: Joe is driving.
    - Destination: University.
    - Passenger: Sue.

    Second Sentence: "Afterwards he drove home again and drank a tea with her in the kitchen."
    - Action: Joe drives home.
    - Time Reference: "Afterwards" indicates this action happens after driving Sue to university.
    - Activity: Joe drinks tea.
    - Location: In the kitchen.
    - Companion: Sue ("with her" refers to Sue).

    Logical Flow and Possible Inconsistencies:
    - Driving Sequence: Joe drives Sue to university, then drives home again. This implies Joe drove to university, dropped off Sue, and then returned home.
    - Tea Drinking: Joe drinks tea with Sue at home in the kitchen.

    Potential Issues and Interpretation:
    - Continuity Issue: If Sue was dropped off at the university, she shouldn't be home with Joe unless she somehow returned home quickly or didn't actually stay at the university.
    - Timeframe: The text lacks details on the timeframe between dropping off Sue and drinking tea, leading to ambiguity. If there’s a significant time gap, it would make more sense. Otherwise, it’s confusing.

    Possible Corrections or Clarifications:
    - Correction for Continuity: "Joe drove Sue to university. Afterwards, he picked her up, and they drove home together and drank tea in the kitchen." This makes it clear that Sue is back home with Joe.
    - Additional Information: Providing more context about the events or timeframe could help clarify the situation.

    In conclusion, while the text describes a sequence of actions, it has a logical inconsistency that could confuse readers. Clarifying the timeline or actions would make the text more coherent.


An LLM cannot meaningfully "explain its thinking". When prompted to do so, it will generate explanation-looking text, but that has nothing to do with the actual way the original answer was generated.

If you ask it to explain how a LLM could possibly catch an inconsistency, then it might give you some good insights (or hallucinate something entertaining).


It can't explain its thinking after producing the answer, like you say; it'll just generate a post-hoc rationalization. But if you ask it to think step by step before making a conclusion, LLMs will somewhat reliably arrive at reasonable steps and better final answers.


There is a world model. Please see Wattenberg Viegas 2023 https://arxiv.org/abs/2305.02469


Stop thinking about words. Think about concepts. As soon as you turn words/tokens into high dimensional embeddings and start playing around with them, they stop being words.


Not when the only way to get anything out of it is using these matrices to line up word tokens. Or play with jpeg noise.


Unclear what you are saying here


Claiming LLMs are manipulating 'high dimension concepts', not words, when all they can do with these 'high dimension concepts' is output words, is meaningless.


It's meaningful because the model can learn relationships between things (like, that a dog and cat are often pets and interchangeable in certain circumstances) and then produce output that it has never seen. Read the word2vec paper as an example.

And, an LLM can output whatever you want. Change the last linear layer from (embedding dimension -> vocab size) to (embedding dimension -> sentiment categories) for example, and hey presto it can produce sentiment analysis (with surprisingly few samples, because... it's learned all the concepts in the transformer blocks in pre-training).

You gotta actually have a clue about how they work. This stuff is no longer magic, there are many resources to learn them in detail.
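The word2vec-style intuition from this comment can be sketched with a toy example. The three-dimensional vectors below are hand-picked and purely illustrative (real embeddings have thousands of dimensions learned from data), but they show the geometric point: related concepts end up with nearby vectors, measured here by cosine similarity.

```python
import math

# Hypothetical toy embeddings: nearby concepts get nearby vectors.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.15],
    "car": [0.1, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means pointing the same way, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# "dog" and "cat" (both pets) come out far more similar than "dog" and "car".
print(cosine(embeddings["dog"], embeddings["cat"]))  # close to 1.0
print(cosine(embeddings["dog"], embeddings["car"]))  # much smaller
```

In a trained model these relationships are not hand-picked; they fall out of gradient descent on the training corpus, which is what makes substitutions like dog/cat possible in outputs the model has never seen verbatim.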


This isn't magic. Precisely. Calling it manipulating high dimension concepts muddies the waters. It's not. It's merely using a whole lot of numbers to manipulate word tokens.

And it's quite limited at that despite the impressive ability to store up language corner cases and put them back out.

"an LLM can output whatever you want" -> this right here is approaching magical thinking. I am really clear-headed about its capabilities and limitations; breathlessly describing its internals as if it implies it's anything more than a first decent NLP interface is the problem, and you seem to be indulging in it.


It is literally manipulating high dimensional vectors. GPT-4's embedding dimension is 12288 (I think) and Llama-3's is 4096. Thousands of said high dimensional vectors come into the model, and operations like +, -, x, /, exp, log, GELU, etc. are done on them and on combinations of them. And during training, geometric relationships are literally created between the vectors based on concepts humans use. This isn't some pie-in-the-sky assertion using hand-wavy words; those are concrete statements that don't muddy any water.

It might be magical to you, but it isn't magical to anyone actually working with these things that you can trivially change the last layer and have the outputs represent whatever you want. Get some samples, write a loss function, go to work. Suggesting they can only output words/tokens displays a complete misunderstanding of how they work under the hood.
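The "change the last layer" point can be sketched in a few lines of pure Python. The dimensions and random weights below are toy placeholders (real models use learned weights and dimensions in the thousands); the only thing the sketch demonstrates is the shape argument: the same backbone hidden state can feed either a vocabulary-sized language-modeling head or a small classification head.

```python
import random

random.seed(0)

D_MODEL = 8          # toy embedding dimension (GPT-4's is reported ~12288)
VOCAB_SIZE = 50      # toy vocabulary
NUM_SENTIMENTS = 3   # e.g. negative / neutral / positive

def linear(x, weights):
    # A linear layer: each output is a dot product of x with one weight row.
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in weights]

def random_matrix(rows, cols):
    return [[random.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Stand-in for the transformer backbone's output at one position.
# The backbone is identical in both setups below.
hidden_state = [random.gauss(0, 1) for _ in range(D_MODEL)]

# Head 1: the usual language-modeling head -> one logit per vocab token.
lm_head = random_matrix(VOCAB_SIZE, D_MODEL)
token_logits = linear(hidden_state, lm_head)

# Head 2: swap in a classification head -> one logit per sentiment class.
sentiment_head = random_matrix(NUM_SENTIMENTS, D_MODEL)
sentiment_logits = linear(hidden_state, sentiment_head)

print(len(token_logits))      # 50: one score per possible next token
print(len(sentiment_logits))  # 3: one score per sentiment class
```

In practice only the new head needs training (fine-tuning), which is why surprisingly few labeled samples suffice: the concepts are already encoded in the pre-trained backbone.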


Why do you keep saying it's magical to me? You're the one describing the underlying maths as if it makes it special. To the model these 'high dimensional embeddings' are just long lists of numbers.

I've said it before and I'll repeat: using terabytes of number lists to achieve decent NLP isn't all that impressive.


Because you don't understand it, and it's understandable. Get a version of GPT-2 or something from huggingface. Go through layer by layer.

Anyone with a brain is impressed by the state of the art LLMs. What's next? The latest GPUs aren't impressive because "shoving a bunch of electrons all around the place to do elementary arithmetic isn't impressive"? The difficulty is getting a bajillion little things to come together in a way that is useful. Pointing out that some complex thing is not impressive because it's "just lots of little simple things" is dumb as hell.


I guess one could assume that Joe waited for Sue at her university before driving back to his home with her.

The problem with many early LLMs is that they could be persuaded into believing something if you stated that it’s true.

Maybe the default prompt in GPT-4o includes something to the effect of “You are a helpful, critical-thinking chatbot”…


Does this work without “Clever Hans” prompting it with the implication that there is something to notice?


No. Remove the "explain" or "what do you think" or "think critically about" types of phrases from the prompt and you get this kind of thing:

My prompt:

    Joe drove Sue to the university. Afterwards, he drove home again and drank tea with her in the kitchen.
GPT-4o Response:

    Your sentence is correct, but for a slightly smoother flow, you could rephrase it as: Joe drove Sue to the university. Afterwards, he returned home and drank tea with her in the kitchen.
It will never "notice" anything off about the input unless you ask it to. I'm consistently impressed how many supposedly tech-savvy people believe GPTs turn into magic orbs if you make them big enough.


Yes.


I just tried with 4o and it didn't notice anything, it just rewrote the statements with different words.



You explicitly asked it to explain?


You need to ask it something or else it is just a weird sentence floating in a void. What would a reasonable response be? “And..?”


interestingly, I, a human and totally not an LLM, completely missed the inconsistency, and then when I saw it pointed out to me, initially tried to argue that it's not necessarily an inconsistency until I thought it through.


This is a terrific deep-dive from Stephen Wolfram that explains not only how ChatGPT and other LLM applications work "under the hood," but also how they can give rise to logic and grammar -- but notably not math or computation.



Earlier discussion (499 comments): https://news.ycombinator.com/item?id=34796611



Please add 2023 tag to the title.


Excellent deep dive (and surprisingly not self-referential). I read the dead tree version (https://amzn.to/4cs5EDA) when it first came out.


You're right! I'm used to his articles being full of self promotion and Wolfram-talk, but this one is not.


I saw Stephen Wolfram in the URL and hesitated to click at first for that exact reason. This turned out to be a fantastic write-up though, so I'm glad I gave it a chance.


I was pleasantly surprised, to say the least.



