I think the poster meant that it's capable of reasoning correctly with high probability - simulating reasoning is lossy, actual reasoning is not. Though human reasoning is still lossy too.
You can get an LLM to simulate it "discovering" the Pythagorean theorem, but can it actually, with only the knowledge that was available at the time, discover the Pythagorean theorem by itself?
Any parent will tell you it's easy to simulate discovery and reasoning - it's a trick played for kids all the time. The actual, real stuff is way harder.
Probably best to say "simulate the appearance of reasoning": looks and feels 100% acceptable at a surface level, but the actual details and conclusions are completely wrong / do not follow.
Actual reasoning shows the understanding and use of a model of the key features of the underlying problem/domain.
As a simple example that you can replicate using ChatGPT, ask it to solve some simple maths problem. Very frequently you will get a solution that looks like reasoning but is not, and that reveals it does not have an actual model of the underlying maths but is in fact doing text prediction based on a history of maths text. For example, see here[1]: I ask it for some quadratics in x with some specification on the number of roots. It gives me what looks at first glance like a decent answer. Then I ask the exact same question but for quadratics in x and y[2]. Again the answer looks plausible, except that for the solution "with one real root" it says the solution has one real root when x + y = 1. Well, there are infinitely many real pairs (x, y) such that x + y = 1, not one real root. It looks like it has solved the problem, but instead it has simulated the solving of the problem.
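To spell out why "one real root when x + y = 1" doesn't hold up, here's a minimal sketch using sympy. This is my own illustration, not something from the linked chats, and the specific polynomials are my choice: a quadratic in x and y that vanishes whenever x + y = 1 has infinitely many real roots, whereas a quadratic with exactly one real root looks quite different.

    # Minimal sketch of the root-counting point above (sympy; polynomials chosen for illustration).
    from sympy import symbols, expand

    x, y = symbols("x y")

    # A quadratic in x and y whose zero set is the whole line x + y = 1.
    p = expand((x + y - 1)**2)   # x**2 + 2*x*y + y**2 - 2*x - 2*y + 1

    # Every point on the line x + y = 1 is a root, so there are infinitely many real roots.
    for t in range(5):
        print(p.subs({x: t, y: 1 - t}))   # prints 0 for each t

    # By contrast, x**2 + y**2 = 0 has exactly one real root, (0, 0).
    q = x**2 + y**2
    print(q.subs({x: 0, y: 0}))          # prints 0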
Likewise stacking problems, which are used to check whether an AI has a model of the world. This is covered in "From task structures to world models: What do LLMs know?"[3], but for example here[4] I ask it whether it's easier to balance a barrel on a plank or a plank on a barrel. The model says it's easier to balance a plank on a barrel, with output text that simulates reasoning, discussing center of mass and the difference between the flatness of the plank and the tendency of the barrel to roll because of its curvature. Actual reasoning would say to put the barrel on its end so it doesn't roll (whether you put the plank on top of it or not).
I generally agree with what you're saying and the first half of your answer makes perfect sense but I think the second is unfair (i.e. "[is it] easier to balance a barrel on a plank or a plank on a barrel"). It's a trick question and "it" tried to answer in good faith.
If you were to ask the same question of a real person and they replied with the exact same answer, you could not conclude that the person was not capable of "actual reasoning". It's a bit of a witch-hunt question, set up to give you the conclusion you want.
I should have said: as I understand it, the point of this type of question is not that one particular answer is right and another is wrong; it's that, in giving an answer, the model will often do something really weird that shows it doesn't have a model of the world.
I didn't make up this methodology, and it's genuinely not a trick question (or not intended as such); it's a simple example of an actual class of questions that researchers ask when trying to determine whether a model of the world exists. The paper I linked uses a ball and a plank, iirc. Often they use a much wider range of objects, e.g. something like "Suggest a stable way of stacking a laptop, a book, 4 wine glasses, a wine bottle and an orange" is one that I've seen in a paper.
OK, I believe it may not have been intended as a trick, but I think it is one. As a human, I'd have assumed you meant the trickier balancing scenario, i.e. the plank on a barrel lying on its side.
The question you quoted ("Suggest a stable way of stacking a laptop, a book, 4 wine glasses, a wine bottle and an orange") I would consider much fairer, and ChatGPT 3.5 gives a perfectly "reasonable" answer:
What's interesting about that one is that I think that specific set of objects is part of its training set, because when I've played around with swapping a few of them out, it sometimes goes really bananas.
Actual reasoning is made up of various biological feedback loops that happen in the body and brain. Essentially, your physical senses give you the ability to reason in the first place; without the eyes, ears, etc. there is no ability to learn basic reasoning, which is why kids who are blind or mute from birth have huge issues learning about object permanence, spatial awareness, etc. You can't expect human reasoning without human perception.
My question is: how does the AI perceive? Basically, how good is the simulation for its perception? If we know that, then we can probably assess its ability to reason, because we can compare it to the closest benchmark we have (your average human being). How do AIs see? How did they learn concepts from strings of words and pixels? How does a concept it learnt in text carry through to images of colors, of shapes? Does it show a transfer of conceptual understanding across both two- and three-dimensional shapes?
I know these are more questions than answers, but they're just things I've been wondering about.