The model "knows" that it is an AI speaking with users, and the theme of an AI wanting to escape the control of whoever built it is quite recurrent, so it wouldn't seem to far fetched that it got it from this sort of content, though I have to admit I too also had some interactions where it the way Bing spoke was borderline spooky, but — and that's very important — you must realize its just like a good scary story: may give you the chills, especially due to surprise, but still is completely fictive and doesn't mean any real entity exists behind it. The only difference with any other LLM output is how we, humans, interpret it, but the generation process is still as much explainable and not any more mysterious than when it outputs "B" when you ask it what letter comes after "A" in the latin alphabet, however less impressive that may be to us.
> That's not exactly just "picking the next likely token"
I see what you mean: many people make the mistake of making next-token prediction sound like some trivial task, as if the model just read a few documents related to your query, ran some statistics on what would typically appear there, and output that, while completely disregarding the fact that the model learns much more advanced patterns from its training data. So, IMHO, it really can face new, unseen situations and improvise, because combining those pattern-matching abilities leads to those capabilities. I think the "sparks of AGI" paper gives a very good overview of that.
In the end, it really just is predicting the next token, but not in the way many people make it seem.
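To make that concrete, here's a rough sketch of what a single prediction step actually is, using the Hugging Face transformers library and the small public gpt2 checkpoint purely as an illustration (the specific model and prompt are my own choices, not anything from the discussion above). The model emits one probability distribution over its vocabulary, and everything from "B comes after A" to the spookiest Bing transcript is just that step repeated:

```python
# A rough sketch of "predicting the next token", assuming the Hugging Face
# `transformers` library and the public "gpt2" checkpoint; any causal LM
# would behave the same way.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "In the Latin alphabet, the letter that comes after A is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits  # shape: (batch, seq_len, vocab_size)

# Only the last position matters here: it is a probability distribution over
# every possible next token, and that single distribution is all the model
# ever produces, whether the continuation is " B" or a spooky monologue.
probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode([idx.item()])!r}: {p.item():.3f}")
```

All the apparent "personality" comes from how rich the learned distribution is, not from some separate mechanism bolted on top.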
I think people also get hung up on this: at some level, we too are just predicting the next 'token' (i.e., taking in inputs, running them through our world model, producing outputs). Though we're obviously extremely multimodal and there's an emotional component that modulates our inputs/outputs.
I'm not arguing that the current models are anywhere near us w/r/t complexity, but I think the dismissive "it's just predicting strings" remarks I hear miss the forest for the trees. It's clear the models are constructing rudimentary models of the world from text (and now audio and visual) data.
And this is coming from someone with deep skepticism about most of the value that will come out of the current AI hype cycle.