mickeyp 4d ago

This test would be a lot more useful if the author used images the models obviously hadn't seen before. Pulling images from Wikipedia? They'll have seen 'em before, and the metadata, and all the pages they were casually linked to.

The observation that the long prompt only made the model think 'a second longer' may have more to do with the model already knowing the images. Why think harder if you already know the answer?

At no point does the author contemplate that.

vessenes 4d ago

It might be more useful, but as is, it is still dispositive: 5.5 is significantly worse than o3 at geo-guessing. And the “magic” prompt doesn’t matter that much, at least in o3’s case.

vintermann 4d ago

The author says they threw in some indoor images, presumably from around where they were.
