This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to.
The observation that the long prompt only made the model think 'a second longer' may have more to do with the fact that it already knows the images. Why think harder if you already know the answer?
It might be more useful, but as is, it is still dispositive: 5.5 is significantly worse than o3 at geo-guessing. And the “magic” prompt doesn’t matter that much, at least in o3’s case.