Evidently, all these models still fall short.
Modern robots would struggle to fold socks and put them in a drawer, but they're great at making cars.
The 2-4-6 game comes to mind: people keep proposing triples that confirm their guessed rule instead of ones that could refute it. They may well have verified that the AI works, but it's hard to learn the skill of thinking about how to falsify a belief.
I'm talking about regular people, who actually use these tools productively and can tell the models are pulling off tasks that were previously unachievable.
And yet... every interface to every LLM has a "ChatGPT can make mistakes. Check important info." style disclaimer.
The hype around this stuff may be deafening, but it's often not entirely the direct fault of the model vendors themselves, who even put out lengthy papers describing their models' many flaws.
Humans fuck up all the time.