Assuming for a moment that the cost per task scales linearly with compute, it costs a little more than $1 million to get that score on the public eval.
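For a back-of-envelope version of that estimate (both figures below are my assumptions, not stated in the thread: a rough per-task cost at high-compute settings and a rough task count for the public eval):

```python
# Back-of-envelope total cost for the public eval.
# Both numbers are assumptions for illustration only.
COST_PER_TASK_USD = 2_900  # assumed cost per task at high-compute settings
NUM_TASKS = 400            # assumed number of tasks in the public eval

total_usd = COST_PER_TASK_USD * NUM_TASKS
print(f"${total_usd:,}")  # $1,160,000 -- "a little more than $1 million"
```

Swap in whatever per-task figure you trust; the point is only that plausible numbers land in the low seven figures.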
The results are cool, but man, this sounds like such a busted approach.
It feels a lot less like a breakthrough when the solution looks so much like simple brute force.
But you might be right: who cares? Does it really matter how crude the solution is if we can achieve true AGI and bring the cost down by making compute more efficient?
That’s the thing that’s interesting to me, though, and I had the same first reaction. It’s a very different problem from brute-forcing chess. The model gets one chance to come to the correct answer: running through thousands or millions of options means nothing if it can’t determine which one is correct. And each of these visual problems involves combinations of different interacting concepts. Solving them requires understanding, not mimicry. So no matter how inefficient and “stupid” these models are, they can be said to understand these novel problems. That’s a direct counter to everyone who ever called them stochastic parrots and said they were a dead end on the road to AGI, only ever searching within their training distribution.
The compute costs are currently disappointing, but so was the cost of sequencing the first whole human genome. That went from $3 billion to a few hundred bucks at your local doctor’s.
That is roughly 10^9 times more compute required, or roughly the US military budget per half hour, to get the intelligence of one (!) STEM graduate (not any kind of superhuman intelligence).
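To put a rough dollar figure on that comparison (the annual budget number below is my assumption, in the ballpark of recent US defense budgets, not a figure from the thread):

```python
# What "US military budget per half hour" works out to,
# assuming an annual budget of roughly $850B (assumption).
ANNUAL_BUDGET_USD = 850e9
HALF_HOURS_PER_YEAR = 365 * 24 * 2  # 17,520

per_half_hour = ANNUAL_BUDGET_USD / HALF_HOURS_PER_YEAR
print(f"${per_half_hour / 1e6:.0f}M")  # roughly $49M per half hour
```

So the comparison is on the order of tens of millions of dollars, whatever exact budget figure you plug in.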
Of course, algorithms will get better, but this particular approach feels like wading into a plateau of efficiency improvements, very, very far down the x-axis.