Our own understanding of spatial reasoning is tied in many respects to hand-eye coordination, muscle memory, and our other senses: we learn to conceptualize "balance" by watching and feeling things fall, and by sensing when they are "about to" fall.
What GPT works with is not "text", although text is its interface, but symbols. Its billions of parameters encode many different syntaxes and the relationships between them. That's why GPT can translate between languages and explain the same idea in different words or as different personas.
So when we ask it to solve a spatial problem, we aren't getting a result based on muscle memory and visual estimation, like "oh, it's about a third of the way down the number line". Instead, GPT has devised some internal syntax that frames the spatial problem in symbolic terms. It doesn't use words as we know them to reach the solution; it has grasped some deeper underlying symbolic pattern in how we talk about a subject like a physics problem.
And this often works! But it also explains why its mathematical reasoning breaks down in seemingly elementary ways and quickly veers into illogical solutions: it is drawing on an alien means of "intuiting" answers.
We can certainly call it intelligent in some respects, just not in the same respects that we are.