I think the real story is that Google is badly lagging their competitors in this space and keeps issuing press releases claiming they are pulling ahead. In reality they are getting very little traction vs. OpenAI.
I’ll be very interested to see how LLMs continue to evolve over the next year. I suspect we are close to a model that will outperform 80% of human experts across 80% of cognitive tasks.
Pro is available now - Ultra will take a few months to arrive.
Your analogy is odd because you're just posing a situation that is analogous to what the situation would look like if you turned out to be right. From the rate of improvement recently, I'd say we're more at the first flight test stage. Yes, of course the jump from a vehicle that can't fly to one that can is in some sense a 'bigger leap' than others in the development cycle, but we still eventually got to the moon.
DocVQA is a benchmark with a very strong SOTA. GPT-4 achieves 88.4, Gemini 90.9. It's only a 2.5-point increase, but a ~22% error reduction, which is massive for real-life use cases where the error tolerance is lower.
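The arithmetic behind the ~22% figure can be checked directly: the gain looks small in absolute accuracy, but it removes a large share of the remaining errors.

```python
# DocVQA accuracy figures from the comment above (percent).
gpt4_acc = 88.4
gemini_acc = 90.9

# Remaining error is what's left after accuracy.
gpt4_err = 100 - gpt4_acc      # 11.6
gemini_err = 100 - gemini_acc  # 9.1

abs_gain = gemini_acc - gpt4_acc                        # 2.5 points
rel_err_reduction = (gpt4_err - gemini_err) / gpt4_err  # ~0.216

print(f"Absolute gain: {abs_gain:.1f} points")
print(f"Relative error reduction: {rel_err_reduction:.1%}")
```

A 2.5-point gain on top of 88.4 shrinks the error rate from 11.6% to 9.1%, i.e. roughly 22% fewer mistakes.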
GPT-2 February 2019
GPT-3 June 2020
GPT-3.5 December 2022
GPT-4 March 2023
Note that GPT-3 to GPT-4 took almost 3 years!
Breadth, for example, means better multimodality and real-world actions/control. These are capabilities we've barely scratched the surface of.
But improving depth of current capabilities (like writing or coding) is harder if you're already 90% of the way to human-level competence and all of your training data is generated by human output. This isn't like chess or go where you can generate unlimited training data and guarantee superhuman performance with enough compute. There are more fixed limitations determined by data when it comes to domains where it's challenging to create quality synthetic data.
Their top line claim is multimodality.