0 points · ACCount37 · 6mo ago

That looks like something today's autoregressive models can solve, no architectural changes needed.

What you need is good image understanding (at least GPT-5 tier), training for general-purpose reasoning over images, and then some domain-specific training, or at least some few-shot guidance to get the model to adopt the correct reasoning patterns.

If I had to guess which model would be able to do it best out of the box, few-shot, I'd say Gemini 3 Pro.

There is nothing preventing an autoregressive LLM from revisiting images and rewriting the texts as new clues come in. This is how they can solve puzzles like sudoku.

vintermann · 6mo ago

Try for yourself, if you want to:

https://urn.digitalarkivet.no/URN:NBN:no-a1450-rg60085808000...
