> To what degree can these large language models arrive at these same conclusions, and by what process?
By having visual understanding more deeply integrated into the thought process, in my opinion. Then they wouldn't be Large Language Models, of course. There are several concepts I remember and operate on by visualizing them, even visualizing motion. If I want to add numbers, I visualize the carry jumping on top of the next number. If I don't trust one of the additions, I go back, but I can't say whether that's because I "mark" the uncertainty somehow.
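For what it's worth, that mental picture maps pretty directly onto ordinary column addition. Here's a rough Python sketch of what the carry is doing (just my own illustration; the function name and details are made up, not anything the models actually do):

```python
def add_with_carry(a: str, b: str) -> str:
    """Column addition on decimal strings, with the carry made explicit."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits, carry = [], 0
    for x, y in zip(reversed(a), reversed(b)):
        total = int(x) + int(y) + carry  # the carry "jumps" onto the next column
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))

print(add_with_carry("47", "58"))  # 105
```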
When I think about my different groups of friends, a visual representation forms in the back of my mind.
Thinking about my flight route somehow forms a mini map, and I can compare distances between places and so on.
This helps enormously with logical tasks like programming and math.
I think it's something we all learned growing up, by playing with the objects around us.