Humans are, I think, set up to intuitively understand 3D space, since it's what we run around and try to survive in. Language models, on the other hand, are set up to understand language, which humans can also do, though I think with a different part of the brain. There's probably no reason you couldn't set up a model to understand 3D space; I guess they already do that a bit with self-driving cars. A lot of animals, like cats and squirrels, are also pretty good with 3D space but less so with language.