
I don't think it's guaranteed, but I do think it's very plausible, because we've seen these models gain emergent abilities at every iteration, just from sheer scaling. So extrapolation tells us that they may keep gaining more capabilities (we don't know exactly how they do it, though, so of course it's all speculation).

I don't think many people would describe GPT-4 as a stochastic parrot anymore... when the paper that coined (or at least popularized) the term came out in early 2021, the term made a lot of sense. In late 2023, with models that at the very least show clear signs of creativity (I'm sticking to that because whether they "reason" or not is more controversial), it has been relegated to reductionist philosophical arguments; it's not really a practical description anymore.



I don’t think we should throw out the stochastic parrot so easily. As you say, there are “clear signs of creativity”, but that could just be it getting significantly better as a stochastic parrot. We have no real test to tell mimicry apart from reasoning, and as you note, we can also only speculate about how any of it works. I don’t think the term is reductionist in light of that; maybe cautious or pessimistic.


They can write original stories in a setting deliberately designed not to be found in the training set (https://arxiv.org/abs/2310.08433). To me that's rather strong evidence that they are beyond stochastic parrots by now, although I must concede that we know so little about how everything works that, well, who knows.


I didn't look at the paper but... How do you design a setting in a way that you're sure there isn't a similar one in the training set, when we don't even precisely know what the training set for the various GPT models was?


Basically by making it unlikely enough to exist.

The setting in the paper is about narrating a single combat between Ignatius J. Reilly and a pterodactyl. Ignatius J. Reilly is a literary character with some very idiosyncratic characteristics, who appears in a single book, where he of course didn't engage in single combat at all or interact with pterodactyls. He doesn't seem to have been the target of fanfiction either (which could be a problem if characters like, say, Harry Potter or Darth Vader were used instead), so the paper argues that it's very unlikely that a story like that had ever been written at all prior to this paper.


Well, we've been writing stories for thousands of years, so I'm a bit skeptical that the concept of "unlikely enough to exist" is a thing. More to the specific example, maybe there isn't a story about this specific character fighting a pterodactyl, but surely there are tons of stories of people fighting all kind of animals, and maybe there are some about someone fighting a pterodactyl too.


Sure, but the evaluation explicitly addresses (among other points) how well that specific character is characterized. If an LLM took a pre-existing story about (say) Superman fighting a pterodactyl, and changed Superman to Ignatius J. Reilly, it wouldn't get a high rating.


> very least show clear signs of creativity

Do you know how that “creativity” is achieved? It’s done with a random number generator. Instead of having the LLM always pick the single most likely next token, they have it sample from the distribution over likely next tokens; how sharply that sampling concentrates on the top candidates depends on the “temperature”.

Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.
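
To make that concrete, here's a minimal sketch of temperature sampling (toy numpy code, not any particular library's API; top_k is just one common way to trim the candidate set):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=None):
        # Temperature rescales the logits before softmax: low values
        # sharpen the distribution toward the single most likely token,
        # high values flatten it toward uniform randomness.
        if temperature == 0:
            return int(np.argmax(logits))  # greedy: always the top token
        logits = np.asarray(logits, dtype=np.float64) / temperature
        if top_k is not None:
            # Keep only the k highest-scoring tokens, mask out the rest.
            cutoff = np.sort(logits)[-top_k]
            logits = np.where(logits < cutoff, -np.inf, logits)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))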

The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense, but because it seems to produce “impressive” results.


I know that, but to me that statement isn't much more helpful than "modern AI is just matrix multiplication" or "human intelligence is just electric current through neurons".

Saying that it's done with a random number generator doesn't really explain the wonder of achieving meaningful creative output, as in being able to generate literature, for example.


> Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.

Sounds like some people I know, at both extremes.

> The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense, but because it seems to produce “impressive” results.

They have been designed and trained to solve natural language processing tasks, and are already outperforming humans on many of those tasks. The transformer architecture is extremely well thought out, based on extensive R&D. The attention mechanism is a brilliant design. Can you explain exactly which part of the transformer architecture is poorly designed?


> They have been designed and trained to solve natural language processing tasks

They aren’t really designed to do anything, actually. LLMs are models of human language - it’s literally in the name: Large Language Model.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

I’m sorry but I don’t trust something that uses a random number generator as part of its output generation.


> They aren’t really designed to do anything, actually. LLMs are models of human language - it’s literally in the name: Large Language Model.

No. And the article you linked to does not say that (because Wolfram is not an idiot).

Transformers are designed and trained specifically for solving NLP tasks.

> I’m sorry but I don’t trust something that uses a random number generator as part of its output generation.

The human brain also has stochastic behaviour.


People use the term "stochastic parrot" in different ways... some just as a put-down ("it's just autocomplete"), but others, like Geoff Hinton, acknowledge that there is of course some truth to it (an LLM is, at the end of the day, a system whose (only) goal is to predict "what would a human say"), while pointing out the depth of "understanding" needed to be really good at this.

There are fundamental limitations to LLMs, though - a limit to what can be learned by training a system to predict the next word from a fixed training corpus. It can get REALLY good at that task, as we've seen, to the extent that it's not just predicting the next word but rather predicting an entire continuation/response that is statistically consistent with the training set. However, what is fundamentally missing is any grounding in anything other than the training set, which is what causes hallucinations/bullshitting. In a biological intelligent system, predicting reality is the goal, not just predicting what "sounds good".

LLMs are a good start inasmuch as they prove the power of prediction as a form of feedback, but to match biological systems we need a closed-loop cognitive architecture that can predict and then self-correct based on the mismatch between reality and prediction (which is what our cortex does).
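
To sketch the contrast (everything here - corpus, env, model - is a hypothetical stand-in, not a real training pipeline):

    # Open loop: the only feedback is the fixed training text itself.
    def train_llm(model, corpus):
        for context, next_token in corpus:
            model.update(model.prediction_loss(context, next_token))

    # Closed loop: predict, act, observe, then learn from the mismatch
    # between the prediction and what reality actually did.
    def run_closed_loop_agent(model, env):
        observation = env.reset()
        while True:
            predicted = model.predict_next(observation)
            observation = env.step(model.choose_action(observation))
            model.update(model.mismatch(predicted, observation))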

For all of the glib prose that an LLM can generate, even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good), it doesn't have the intelligence of even a simple animal like a rat that doesn't use language at all, but is grounded in reality.


> even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good)

It was trained not only to "sound good" aesthetically but also to solve a wide range of NLP tasks accurately. It not only "seems to" understand the prompt but it actually does have a mechanical understanding of it. With ~100 layers in the network it mechanically builds a model of very abstract concepts at the higher layers.

> it doesn't have the intelligence of even a simple animal

It has higher intelligence than humans by some metrics, but no consciousness.


> It was trained not only to "sound good" aesthetically but also to solve a wide range of NLP tasks accurately.

Was it? I've only heard of pre-training (predict next word) and subsequent RLHF + SFT "alignment" (incl. aligning to goal of being conversational). AFAIK the NLP skills that these LLMs achieve are all emergent rather than explicitly trained.

I'm not sure we can really say the net fully understands even if it answers as if it does - it was only trained to "predict next word", which in effect means being trained to generate a human-like response. It will have learnt enough to accomplish that goal, and no more (training loss stops improving once the goal is met).

Contrast this with an animal, which has a much richer type of feedback - reality - and continual (aka online) learning. The animal truly understands its actions - i.e. it has learnt to accurately predict what will happen as a result of them.

The LLM does not understand its own output in this sense - it exists only in a world of words, and has no idea if the ideas it is expressing are true or not (hence all the hallucinating/bullshitting). It only knew enough to generate something that sounded like what a person might say.


> Was it? I've only heard of pre-training (predict next word) and subsequent RLHF + SFT "alignment" (incl. aligning to goal of being conversational). AFAIK the NLP skills that these LLMs achieve are all emergent rather than explicitly trained.

I believe you are right about that. I did some research after reading your comment. Transformers were certainly designed for NLP, but with large enough models the abilities can emerge without necessarily being explicitly trained for.

> I'm not sure we can really say the net fully understands even if it answers as if it does - it was only trained to "predict next word", which in effect means being trained to generate a human-like response.

It depends on your definition of "understand". If that requires consciousness then there is no universally agreed formal definition.

Natural Language Understanding (NLU) is a subset of Natural Language Processing (NLP). If we take the word "understanding" as used in an academic and technical context then yes they do understand quite well. In order to simply "predict the next word" they learn an abstract model of syntax, semantics, meaning, relationships, etc, from the text.

> and has no idea if the ideas it is expressing are true or not (hence all the hallucinating/bullshitting).

That is not really an issue when solving tasks that fit within its context window. It is an issue for factual recall. The model is not a type of database that stores its training set verbatim. Humans have analogous problems with long-term memory recall. I can think straight within my working memory, but my brain will "hallucinate" to some extent when recalling distant memories.


The context window only has to do with the size of the input it has access to - it's not related to what it's outputting, which is ultimately constrained by what it was trained on.

If you ask it a question where the training data (or input data = context) either didn't include the answer, or where it was not obvious how to get the right answer, that will not (unfortunately) stop it from confidently answering!


> The context window only has to do with the size of the input it has access to - it's not related to what it's outputting, which is ultimately constrained by what it was trained on.

Wait a minute. You are completely missing the entire "attention mechanism" thing, which is what makes transformers so capable. For each output token generated in sequence, the attention mechanism evaluates the current token's relationship to all tokens in the context window, weighing their relevance. There are multiple "attention heads" running in parallel (16 in GPT-3.5). For each layer of the neural network there is an attention mechanism, independently processing the entire context window for each token. There are ~100 layers in ChatGPT. So we have 100 layers times 16 attention heads = 1600 attention mechanisms evaluating the entire context window, over many layers of abstraction, for each output token.
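
In code, the core computation looks roughly like this (a minimal numpy sketch of multi-head self-attention; shapes and weight names are illustrative, and the causal mask used during generation is omitted for brevity):

    import numpy as np

    def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
        # x: (T, d_model) representations of all T tokens in the context.
        # Every position attends to every position, once per head; a real
        # transformer repeats this in every layer.
        T, d_model = x.shape
        d_head = d_model // n_heads
        def split(m):  # (T, d_model) -> (n_heads, T, d_head)
            return m.reshape(T, n_heads, d_head).transpose(1, 0, 2)
        q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, T, T)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over context
        out = (weights @ v).transpose(1, 0, 2).reshape(T, d_model)
        return out @ Wo  # recombine the heads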


I'm not sure what your point is... Hallucinations are where the net hasn't seen enough training data similar/related to the prompt to enable it to generate a good continuation/response. Of course, in cases where it is sufficiently trained and the context contains what it needs, it can make full use of it, even copying context words to the output (zero-shot learning) when appropriate.

The real issue isn't that the net often "makes a statistical guess" rather than saying "I don't know", but rather that when it does make errors it has no way to self-detect the error and learn from the mistake, as a closed-loop biological system is able to do.


I was responding to this.

> The context window only has to do with the size of the input it has access to - it's not related to what it's outputting

The sequential token generation process is closely related to the content of the context window at every step.
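
A toy decoding loop makes that explicit - every step re-reads the entire current context (model here is a hypothetical callable from token ids to next-token logits):

    import numpy as np

    def generate(model, prompt_ids, n_new):
        ids = list(prompt_ids)
        for _ in range(n_new):
            logits = model(ids)  # attention reads the whole current context
            probs = np.exp(logits - np.max(logits))
            probs /= probs.sum()
            ids.append(int(np.random.choice(len(probs), p=probs)))
        return ids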

Maybe I misunderstood your point. I know these things can hallucinate when asked about obscure facts that they weren't sufficiently trained on.


> If you ask it a question where the training data (or input data = context) either didn't include the answer, or where it was not obvious how to get the right answer, that will not (unfortunately) stop it from confidently answering!

I haven't found this to be the case in my experience. I use ChatGPT-4. It often tells me when it doesn't know or have enough information.

If you haven't used GPT-4, I recommend signing up for a month. It is next level, way better than 3.5 (10x the parameter count). (No, I'm not being paid to recommend it.)


You can predict performance on certain tasks before training, and it's continuous:

https://twitter.com/mobav0/status/1653048872795791360


I read that paper back in the day and honestly I don't find it very meaningful.

What they find is that for every emergent ability where an evaluation metric seems to have a sudden jump, there is some other underlying metric that is continuous.

The thing is that the metric with the jump is the one people actually care about (like actually being able to answer questions correctly, etc.), while the continuous one is an internal metric. I don't think that refutes the existence of emergent abilities; it just explains a little bit of how they arise.
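
To illustrate with made-up numbers (not from the paper): if per-token accuracy improves smoothly, an all-or-nothing exact-match metric over a 20-token answer can still look like a sudden jump.

    import numpy as np

    per_token_acc = np.linspace(0.5, 0.99, 10)  # continuous internal metric
    answer_len = 20                             # every token must be right
    exact_match = per_token_acc ** answer_len   # metric people actually report
    for p, em in zip(per_token_acc, exact_match):
        print(f"per-token {p:.2f} -> exact match {em:.4f}")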



