I think the real story is that Google is badly lagging their competitors in this space and keeps issuing press releases claiming they are pulling ahead. In reality they are getting very little traction vs. OpenAI.
I’ll be very interested to see how LLMs continue to evolve over the next year. I suspect we are close to a model that will outperform 80% of human experts across 80% of cognitive tasks.
Pro is available now - Ultra will take a few months to arrive.
Your analogy is odd because you're just posing a situation that is analogous to what the situation would look like if you turned out to be right. From the rate of improvement recently, I'd say we're more at the first flight test stage. Yes, of course the jump from a vehicle that can't fly to one that can is in some sense a 'bigger leap' than others in the development cycle, but we still eventually got to the moon.
DocVQA is a benchmark with a very strong SOTA. GPT-4 achieves 88.4, Gemini 90.9. It's only a 2.5-point increase, but a ~22% error reduction, which is massive for real-life use cases where the error tolerance is lower.
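The arithmetic behind the ~22% figure can be checked directly: the gain looks small in absolute accuracy, but it removes a large share of the remaining errors.

```python
# DocVQA accuracy figures from the comment above (percent).
gpt4_acc = 88.4
gemini_acc = 90.9

# Remaining error is what's left after accuracy.
gpt4_err = 100 - gpt4_acc      # 11.6
gemini_err = 100 - gemini_acc  # 9.1

abs_gain = gemini_acc - gpt4_acc                        # 2.5 points
rel_err_reduction = (gpt4_err - gemini_err) / gpt4_err  # ~0.216

print(f"Absolute gain: {abs_gain:.1f} points")
print(f"Relative error reduction: {rel_err_reduction:.1%}")
```

A 2.5-point gain on top of 88.4 shrinks the error rate from 11.6% to 9.1%, i.e. roughly 22% fewer mistakes.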
GPT-2 February 2019
GPT-3 June 2020
GPT-3.5 December 2022
GPT-4 March 2023
Note that GPT-3 to GPT-4 took almost 3 years!
Breadth, for example, means better multimodality and real-world actions/control. These are capabilities we've barely scratched the surface of.
But improving depth of current capabilities (like writing or coding) is harder if you're already 90% of the way to human-level competence and all of your training data is generated by human output. This isn't like chess or go where you can generate unlimited training data and guarantee superhuman performance with enough compute. There are more fixed limitations determined by data when it comes to domains where it's challenging to create quality synthetic data.
Their top line claim is multimodality.