0 points · msoad · 2y ago · 0 comments

They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not an improvement like GPT3->GPT4. I tested a bunch of programming stuff and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many, many benchmarks on which all models are stuck at 50-70% success.

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715

cube2222 2y ago

I think the live demo that happened on the livestream is best to get a feel for this model[0].

I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.

Really, just watch the live demo. I linked directly to where it starts.

Importantly, this makes the interaction a lot more "human-like".

[0]: https://youtu.be/DQacCB9tDaw?t=557

fvdessen 2y ago

The demo is impressive but personally, as a commercial user, for my practical use cases, the only thing I care about is how smart it is, how accurate are its answers and how vast is its knowledge. These haven’t changed much since GPT-4, yet they should, as IMHO it is still borderline in its abilities to be really that useful

CapcomGo 2y ago

But that's not the point of this update

fvdessen 2y ago

I know, and I know my comment is dismissive of the incredible work shown here, as we’re shown sci-fi level tech. But I feel I have this kettle, that boils water in 10min, and it really should boil it in 1, but instead is now voice operated.

I hope the next version delivers on being smarter, as this update instead of making me excited, makes me feel they’ve reached a plateau on the improvement of the core value and are distracting us with fluff instead

jll29 2y ago

There is room for more than one use case and large language model type.

I predict there will be a zoo (more precisely tree, as in "family tree") of models and derived models for particular application purposes, and there will be continued development of enhanced "universal"/foundational models as well. Some will focus on minimizing memory, others on minimizing pre-training or fine-tuning energy consumption, some need high accuracy, others hard realtime speed, yet others multimodality like GPT-4o, some multilinguality, and so on.

Previous language models that encoded dictionaries for spellcheckers etc. never got standardized (for instance, compare aspell dictionaries to the ones from LibreOffice to the language model inside CMU PocketSphinx) so that you could use them across applications or operating systems. As these models are becoming more common, it would be interesting to see this aspect improve this time around.

https://www.rev.com/blog/resources/the-5-best-open-source-sp...

CooCooCaCha 2y ago

I disagree, transfer learning and generalization are hugely powerful and specialized models won't be as good because their limited scope limits their ability to generalize and transfer knowledge from one domain to another.

I think people who emphasize specialized models are operating under a false assumption that by focusing the model it'll be able to go deeper in that domain. However, the opposite seems to be true.

Granted, specialized models like AlphaFold are superior in their domain but I think that'll be less true as models become more capable at general learning.

whyever 2y ago

They say it's twice as fast/cheap, which might matter for your use case.

minimaxir 2y ago

It's twice as fast/cheap relative to GPT-4-turbo, which is still expensive compared to GPT-3.5-turbo and Claude Haiku.

https://openai.com/api/pricing/

fvdessen 2y ago

I’d much rather have it be slower, more expensive, but smarter

ben_w 2y ago

I understand your point, and agree that it is "borderline" in its abilities — though I would instead phrase it as "it feels like a junior developer or an industrial placement student, and assume it is of a similar level in all other skills", as this makes it clearer when it is or isn't a good choice, and it also manages expectations away from both extremes I frequently encounter (that it's either Cmdr Data already, or that it's a no good terrible thing only promoted by the people who were previously selling Bitcoin as a solution to all the economics).

That said, given the price tag, when AI becomes genuinely expert then I'm probably not going to have a job and neither will anyone else (modulo how much electrical power those humanoid robots need, as the global electricity supply is currently only 250 W/capita).

In the meantime, making it a properly real-time conversational partner… wow. Also, that's kinda what you need for real-time translation, because: «be this, that different languages the word order totally alter and important words at entirely different places in the sentence put», and real-time "translation" (even when done by a human) therefore requires having a good idea what the speaker was going to say before they get there, and being able to back-track when (as is inevitable) the anticipated topic was actually something completely different and so the "translation" wasn't.

fvdessen 2y ago

I guess I feel like I’ll get to keep my job a while longer and this is strangely disappointing…

A real time translator would be a killer app indeed, and it seems not so far away, but note how you have to prompt the interaction with ‘Hey ChatGPT’; it does not interject on its own. It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)

Keyframe 2y ago

One thing I've noticed is that the more context I give it, and the more precise that context is, the "smarter" it is. There are limits to it of course. But I cannot help but think that's where the next barrier will be brought down: an agent, or several, that tag along with everything I do throughout the day to have the full context. That way, I'll get smarter and more to-the-point help, and I won't spend much time explaining the context. But that will open a dark can that I'm not sure people will want to open: having an AI track everything you do all the time (even if only in certain contexts like business hours / env).

coffeebeqn 2y ago

There are definitely multiple dimensions these things are getting better in. The popular focus has been on the big expensive training runs but inference, context size, algorithms, etc. are all getting better fast

abdullin 2y ago

I have a few LLM benchmarks that were extracted from real products.

GPT-4o got slightly better overall. Ability to reason improved more than the rest.

RupertEisenhart 2y ago

It's faster, smarter and cheaper over the API. Better than a kick in the teeth.

aaroninsf 2y ago

Absolutely agree.

This model isn't about benchmark chasing or being a better code generator; it's entirely explicitly focused on pushing prior results into the frame of multi-modal interaction.

It's still a WIP, most of the videos show awkwardness where its capacity to understand the "flow" of human speech is still vestigial. It doesn't understand how humans pause and give one another space for such pauses yet.

But it has some indeed magic ability to share a deictic frame of reference.

I have been waiting for this specific advance, because it is going to significantly quiet the "stochastic parrot" line of wilfully-myopic criticism.

It is very hard to make blustery claims about "glorified Markov token generation" when using language in a way that requires both a shared world model and an understanding of interlocutor intent, focus, etc.

This is edging closer to the moment when it becomes very hard to argue that system does not have some form of self-model and a world model within which self, other, and other objects and environments exist with inferred and explicit relationships.

This is just the beginning. It will be very interesting to see how strong its current abilities are in this domain; it's one thing to have object classification—another thing entirely to infer "scripts plans goals..." and things like intent, and, deixis. E.g. how well does it now understand "us" and "them" and "this" vs "that"?

Exciting times. Scary times. Yee hawwwww.

nicklecompte 2y ago

What part of this makes you think GPT-4 suddenly developed a world model? I find this comment ridiculous and bizarre. Do you seriously think snappy response time + fake emotions is an indicator of intelligence? It seems like you are just getting excited and throwing out a bunch of words without even pretending to explain yourself:

> using language in a way that requires both a shared world model

Where? What example of GPT-4o requires a shared world model? The customer support example?

The reason GPT-4 does not have any meaningful world model (in the sense that rats have meaningful world models) is that it freely believes contradictory facts without being confused, freely confabulates without having brain damage, and it has no real understanding of quantity or causality. Nothing in GPT-4o fixes that, and gpt2-chatbot certainly had the same problems with hallucinations and failing the same pigeon-level math problems that all other GPTs fail.

famouswaffles 2y ago

One of the most interesting things about the advent of LLMs is people bringing out all sorts of "reasons" GPT doesn't have true 'insert property' but all those reasons freely occur in humans as well

>that it freely believes contradictory facts without being confused,

Humans do this. You do this. I guess you don't have a meaningful world model.

>freely confabulates without having brain damage

Humans do this

>and it has no real understanding of quantity or causality.

Well this one is just wrong.

DonHopkins 2y ago

>But it has some indeed magic ability to share a deictic frame of reference.

They really Put That There!

https://www.youtube.com/watch?v=RyBEUyEtxQo

Oh, shit.

razodactyl 2y ago

In my view, this was in response to the machine being colourblind haha

ChuckMcM 2y ago

I expect the really solid use case here will be voice interfaces to applications that don't suck. Something I am still surprised at is that vendors like Apple have yet to allow me to train the voice to text model so that it only responds to me and not someone else.

So local modelling (completely offline but per speaker aware and responsive), with a really flexible application API. Sort of the GTK or QT equivalent for voice interactions. Also custom naming, so instead of "Hey Siri" or "Hey Google" I could say, "Hey idiot" :-)

Definitely some interesting tech here.

OJFord 2y ago

I assume (because they don't address it or look at all fazed) the audio cutting in and out is just an artefact of the stream?

throwthrowuknow 2y ago

Haven’t tried it but from work I’ve done on voice interaction this happens a lot when you have a big audience making noise. The interruption feature will likely have difficulty in noisy environments.

OJFord 2y ago

Yeah that was actually my first thought (though no professional experience with it/on that side) - it's just that the commenter I replied to was so hyped about it and how fluid & natural it was, and I thought that made it really jarring.

mvdtnz 2y ago

Interesting that they decided to keep the horrible ChatGPT tone ("wow you're doing a live demo right now?!"). It comes across just so much worse in voice. I don't need my "AI" speaking to me like I'm a toddler.

practice9 2y ago

It is cringe overenthusiastic, but a proper instructions/system prompt will fix that mostly

slibhb 2y ago

You can tell it not to talk like this using custom prompts.

marvin 2y ago

One of the linked demos is it being sarcastic, so maybe you can make it remember to be a little more edgy.

yieldcrv 2y ago

tell it to speak to you differently

with a GPT you can modify the system prompt

maest 2y ago

It still refuses to go outside the deeply sanitised tone that "alignment" enforces on you.

baumgarn 2y ago

it should be possible to imitate any voice you want like your actual parents soon enough

goatlover 2y ago

That won't be Black Mirror levels of creepy /s

throwthrowuknow 2y ago

Did you miss the part where they simply asked it to change its manner of speaking and the amount of emotion it used?

clhodapp 2y ago

Call me overly paranoid/skeptical, but I'm not convinced that this isn't a human reading (and embellishing) a script. The "AI" responses in the script may well have actually been generated by their LLM, providing a defense against it being fully fake, but I'm just not buying some of these "AI" voices.

We'll have to see when end users actually get access to the voice features "in the coming weeks".

gabiruh 2y ago

It's weird that the "airplane mode" seems to be ON on the phone during the entire presentation.

arthurcolle 2y ago

This was on purpose - they connected it to the internet via a USB-C cable it appears, for consistent internet instead of having it switch WiFi

Probably some kinks there they are working out

OJFord 2y ago

> Probably some kinks there they are working out

Or just a good idea for a live demo on a congested network/environment with a lot of media present, at least one live video stream (the one we're watching the recording of), etc.

At least that's how I understood it, not that they had a problem with it (consistently or under regular conditions, or specific to their app).

_flux 2y ago

And eliminate the chance of some prankster affecting the demo by attacking the wifi.

simoes 2y ago

They mention at the beginning of the video that they are using hardwired internet for reliability reasons.

sitkack 2y ago

You would want to make sure that it is always going over WiFi for the demo and doesn't start using the cellular network for a random reason.

rightbyte 2y ago

You can turn off mobile data. They probably just wanted wired internet.

spaceman_2020 2y ago

This is going straight into 'Her' territory

snthpy 2y ago

Hectic!

Thanks for this.

modeless 2y ago

"not that much better" is extremely impressive, because it's a much smaller and much faster model. Don't worry, GPT-5 is coming and it will be better.

talldayo 2y ago

Chalmers: "GPT-5? A vastly-improved model that somehow reduces the compute overhead while providing better answers with the same hardware architecture? At this time of year? In this kind of market?"

Skinner: "Yes."

Chalmers: "May I see it?"

Skinner: "No."

AaronFriel 2y ago

It has only been a little over one year since GPT-4 was announced, and it was at the time the largest and most expensive model ever trained. It might still be.

Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".

Even Meta is still baking their 400B parameter model.

1024core 2y ago

As Altman said (paraphrasing): GPT-4 is the _worst_ model you will ever have to deal with in your life (or something to that effect).

bamboozled 2y ago

Legit love progress

famouswaffles 2y ago

GPT-3 was released in 2020 and GPT-4 in 2023. Now we all expect 5 sooner than that but you're acting like we've been waiting years lol.

skepticATX 2y ago

The increased expectations are a direct result of LLM proponents continually hyping exponential capabilities increase.

pwdisswordfishc 2y ago

Incidentally, this dialogue works equally well, if not better, with David Chalmers versus B.F. Skinner, as with the Simpsons characters.

dialup_sounds 2y ago

Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"

Skinner (looking up): No, mother, it's just the Nvidia GPUs.

dialup_sounds 2y ago

Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"

Skinner (looking up): "No, mother, it's just the Nvidia GPUs."

dlivingston 2y ago

"Seymour, the house is on fire!"

"No, mother, that's just the H100s."

TIPSIO 2y ago

Obviously given enough time there will always be better models coming.

But I am not convinced it will be another GPT-4 moment. Seems like big focus on tacking together multi-modal clever tricks vs straight better intelligence AI.

Hope they prove me wrong!

kmeisthax 2y ago

The problem with "better intelligence" is that OpenAI is running out of human training data to pillage. Training AI on the output of AI smooths over the data distribution, so all the AIs wind up producing same-y output. So OpenAI stopped scraping text back in 2021 or so - because that's when the open web turned into an ocean of AI piss. I've heard rumors that they've started harvesting closed captions out of YouTube videos to try and make up the shortfall of data, but that seems like a way to stave off the inevitable[0].

Multimodal is another way to stave off the inevitable, because these AI companies already are training multiple models on different piles of information. If you have to train a text model and an image model, why split your training data in half when you could train a combined model on a combined dataset?

[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.

pbhjpbhj 2y ago

>harvesting closed captions out of YouTube videos

I'd bet a lot of YouTubers are using LLMs to write and/or edit content. So we pass that through a human presentation. Then introduce some errors in the form of transcription. Then feed the output in as part of a training corpus ... we plateaued real quick.

It seems like it's hard to get past a level of human intelligence at which there's a large enough corpus of training data or trainers?

Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?

llm_trw 2y ago

>[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.

Whisper models are better than anything google has. In fact the higher quality whisper models are better than humans when it comes to transcribing text with punctuation.

WhitneyLand 2y ago

Why do you think they’re using Google auto-captioning?

I would expect they’re using their own speech-to-text, which is still a model but way better quality and potentially customizable to better suit their needs

marvin 2y ago

At some point, algorithms for reasoning and long-term planning will be figured out. Data won’t be the holy grail forever, and neither will asymptotically approaching human performance in all domains.

littlestymaar 2y ago

I don't think a bigger model would make sense for OpenAI: it's much more important for them that they keep driving inference cost down, because there's no viable business model if they don't.

Improving the instruction tuning and the RLHF step, increasing the training size, working on multilingual capabilities, etc. make sense as ways to improve quality, but I think increasing model size doesn't. Being able to advertise a big breakthrough may make sense in terms of marketing, but I don't believe it's going to happen for two reasons:

- you don't release intermediate steps when you want to be able to advertise big gains, because it raises the baseline and reduces the effectiveness of your "big gains" in terms of marketing.

- I don't think they would benefit from an arms race with Meta, trying to keep a significant edge. Meta is likely to be able to catch up eventually on performance, but they are not so much of a threat in terms of business. Focusing on keeping a performance edge instead of making their business viable would be a strategic blunder.

jononor 2y ago

What is OpenAI's business model if their models are second-best? Why would people pay them and not Meta/Google/Microsoft - who can afford to sell at very low margins, since they have existing very profitable businesses that keep them afloat?

littlestymaar 2y ago

That's the question OpenAI needs to find an answer to if they want to end up viable.

They have the brand recognition (for ChatGPT) and that's a good start, but that's not enough. Providing a best in class user experience (which seems to be their focus now, with multimodality), a way to lock down their customers in some kind of walled garden, building some kind of network effect (what they tried with their marketplace for community-built “GPTs” last fall but I'm not sure it's working), something else?

At the end of the day they have no technological moat, so they'll need to build a business one, or perish.

For most tasks, pretty much every model from their competitors is more than good enough already, and it's only going to get worse as everyone improves. Being marginally better on 2% of tasks isn't going to be enough.

mupuff1234 2y ago

And how can one be so sure of that?

Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.

diego_sandoval 2y ago

> Seems to me that performance is converging

It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.

In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.

drawnwren 2y ago

Compute power is not stagnating, but the availability of training data is. It's not like there's a second stackoverflow or reddit to scrape.

wavemode 2y ago

> video generation also seemed kind of stagnant before Sora

I take the opposite view. I don't think video generation was stagnating at all, and was in fact probably the area of generative AI that was seeing the biggest active strides. I'm highly optimistic about the future trajectory of image and video models.

By contrast, text generation has not improved significantly, in my opinion, for more than a year now, and even the improvement we saw back then was relatively marginal compared to GPT-3.5 (that is, for most day-to-day use cases we didn't really go from "this model can't do this task" to "this model can now do this task". It was more just "this model does these pre-existing tasks, in somewhat more detail".)

If OpenAI really is secretly cooking up some huge reasoning improvements for their text models, I'll eat my hat. But for now I'm skeptical.

scarmig 2y ago

Yeah. There are lots of things we can do with existing capabilities, but in terms of progressing beyond them all of the frontier models seem like they're a hair's breadth from each other. That is not what one would predict if LLMs had a much higher ceiling than we are currently at.

I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.

andrepd 2y ago

Exactly. People like to point at the start of a logistic curve and go "behold! an exponential"
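To illustrate the point about logistic curves, here is a minimal sketch comparing a logistic curve with the exponential it resembles early on. The ceiling, rate, and midpoint values are made up for illustration and are not fitted to any model benchmark.

```python
import math

# Hypothetical parameters, chosen only to make the shapes visible.
CEILING = 100.0   # level at which the logistic curve saturates
RATE = 1.0        # growth rate
MIDPOINT = 10.0   # time at which the logistic reaches half its ceiling


def logistic(t: float) -> float:
    """Logistic growth: looks exponential early, then flattens at CEILING."""
    return CEILING / (1.0 + math.exp(-RATE * (t - MIDPOINT)))


def exponential(t: float) -> float:
    """The exponential that the logistic approximates well before MIDPOINT."""
    return CEILING * math.exp(RATE * (t - MIDPOINT))


if __name__ == "__main__":
    for t in (0, 2, 4, 6, 8, 10, 12):
        print(f"t={t:>2}  logistic={logistic(t):10.4f}  exponential={exponential(t):10.4f}")
```

Well before the midpoint the two columns are nearly indistinguishable, so early data alone cannot tell you which curve you are on; they only diverge as the logistic approaches its ceiling.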

aantix 2y ago

The use of AI in the research of AI accelerates everything.

thefaux 2y ago

I'm not sure of this. The jury is still out on most ai tools. Even if it is true, it may be in a kind of strange reverse way: people innovating by asking what ai can't do and directing their attention there.

jcd000 2y ago

I bet this will also cause model regressions.

moomoo11 2y ago

I really hope GPT5 is good. GPT4 sucks at programming.

cududa 2y ago

It's excellent at programming if you actually know the problem you're trying to solve and the technology. You need to guide it with actual knowledge you have. Also, you have to adapt your communication style to get good results. Once you 'crack the pattern' you'll have a massive productivity boost

partiallypro 2y ago

In my experience 3.5 was better at programming than 4, and I don't know why.

twsted 2y ago

It's better than at least 50% of the developers I know.

Jensson 2y ago

A developer that just pastes in code from gpt-4 without checking what it wrote is a horror scenario, I don't think half of the developers you know are really that bad.

viking123 2y ago

What kind of people are you working with?

idontpost 2y ago

It's not better than any of the developers I work with.

Trying to talk it into writing anything other than toy code is an exercise in banging my head against the wall.

verdverm 2y ago

Look to a specialized model instead of a general purpose one

moomoo11 2y ago

Any suggestions? Thanks

I have tried Phind, and on anything beyond mega-junior-tier questions it suffers as well and gives bad answers.

jameshart 2y ago

I think this comment is easily misread as implying that this GPT4o model is based on some old GPT2 chatbot - that’s very much not what you meant to say, though.

This model has been being tested under a code name of ‘gpt2-chatbot’ but it is very much a new GPT4+-level model, with new multimodal capabilities - but apparently some impressive work around inference speed.

Highlighting so people don’t get the impression this is just OpenAI slapping a new label on something a generation out of date.

lossolo 2y ago

I agree. I tried a few programming problems that, let's say, seem to be out of the distribution of their training data and which GPT4 failed to solve before. The model couldn't find a similar pattern and failed to solve them again. What's interesting is that one of these problems was solved by Opus, which seems to indicate that the majority of progress in the last months should be attributed to the quality/source of the training data.

aixpert 2y ago

useless anecdata but I find the new model very frustrating, often completely ignoring what I say in follow up queries. it's giving me serious Siri vibes

(text input in web version)

maybe it's programmed to completely ignore swearing but how could I not swear after it gave me repeatedly info about you.com when I try to address it in second person

dragonwriter 2y ago

> As many highlighted there, the model is not an improvement like GPT3->GPT4.

The improvements they seem to be hyping are in multimodality and speed (also price – half that of GPT-4 Turbo – though that’s their choice and could be promotional, but I expect it’s at least in part, like speed, a consequence of greater efficiency), not so much producing better output for the same pure-text inputs.

kybercore 2y ago

the model scores 60 points higher in lmsys than the best gpt 4 turbo model from april, that's still a pretty significant jump in text capability
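For a rough sense of what a 60-point gap means, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the leaderboard scores follow the standard Elo convention (base 10, 400-point scale) and ignores ties; the function name is made up for illustration.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side under the standard Elo formula.

    rating_gap is (rating of model A) - (rating of model B).
    Assumes the usual base-10, 400-point scale; ties are ignored.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    gap = 60  # the difference quoted in the comment above
    print(f"Expected win rate at +{gap} Elo: {elo_expected_score(gap):.1%}")
```

Under those assumptions a 60-point lead works out to the newer model being preferred in somewhat under 60% of pairwise comparisons, which is a noticeable but not overwhelming edge.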

avereveard 2y ago

I tested a few use cases in the chat, and it's not particularly more intelligent but they seem to have solved laziness. I had to categorize my expenses to do some budgeting for the family, and in gpt 4 I had to go ten by ten, confirm the suggested category, download the file; it took two days as I was constantly hitting the limit. gpt4o did most of the grunt work, then communicated anomalies in bulk, asked for suggestions for these, and provided a downloadable link in two answers, calling the code interpreter multiple times, and working toward the goal on its own.

and the prompt wasn't a monstrosity, and it wasn't even that good, it was just one line "I need help to categorize these expenses" and off it went. hope it won't get enshittified like turbo, because this finally feels as great as 3.5 was for goal seeking.

ozzydave 2y ago

Heh - I'm using ChatGPT for the same thing! Works 10X better than Rocket Money, which was supposed to be an improvement on Mint but meh.

vitorgrs 2y ago

They are admitting that it is the im-also-a-good-gpt2-chatbot. There were 3... Don't ask me why.

The "gpt2-chatbot" was the worst of the three.

j / k navigate · click thread line to collapse

0 comments

cube22222y ago

I think the live demo that happened on the livestream is best to get a feel for this model[0].

Really, just watch the live demo. I linked directly to where it starts.

Importantly, this makes the interaction a lot more "human-like".

[0]: https://youtu.be/DQacCB9tDaw?t=557

fvdessen2y ago

CapcomGo2y ago

But that's not the point of this update

fvdessen2y ago

5 more replies

jll292y ago

There is room for more than one use case and large language model type.

https://www.rev.com/blog/resources/the-5-best-open-source-sp...

CooCooCaCha2y ago

I think people who emphasis specialized models are operating under a false assumption that by focusing the model it'll be able to go deeper in that domain. However, the opposite seems to be true.

Granted, specialized models like AlphaFold are superior in their domain but I think that'll be less true as models become more capable at general learning.

whyever2y ago

They say it's twice as fast/cheap, which might matter for your use case.

minimaxir2y ago

It's twice as fast/cheap relative to GPT-4-turbo, which is still expensive compared to GPT-3.5-turbo and Claude Haiku.

https://openai.com/api/pricing/

2 more replies

fvdessen2y ago

I’d much rather have it be slower, more expensive, but smarter

3 more replies

ben_w2y ago

fvdessen2y ago

I guess I feel like I’ll get to keep my job a while longer and this is strangely disappointing…

1 more reply

Keyframe2y ago

coffeebeqn2y ago

abdullin2y ago

I have a few LLM benchmarks that were extracted from real products.

GPT-4o got slightly better overall. Ability to reason improved more than the rest.

RupertEisenhart2y ago

Its faster, smarter and cheaper over the API. Better than a kick in the teeth.

aaroninsf2y ago

Absolutely agree.

This model isn't about basemark chasing or being a better code generator; it's entirely explicitly focused on pushing prior results into the frame of multi-modal interaction.

But it has some indeed magic ability to share a deictic frame of reference.

I have been waiting for this specific advance, because it is going to significantly quiet the "stochastic parrot" line of wilfully-myopic criticism.

Exciting times. Scary times. Yee hawwwww.

nicklecompte2y ago

> using language in a way that requires both a shared world model

Where? What example of GPT-4o requires a shared world model? The customer support example?

famouswaffles2y ago

One of the most interesting things about the advent of LLMs is people bringing out all sorts of "reasons" GPT doesn't have true 'insert property' but all those reasons freely occur in humans as well

>that it freely believes contradictory facts without being confused,

Humans do this. You do this. I guess you don't have a meaningful world model.

>freely confabulates without having brain damage

Humans do this

>and it has no real understanding of quantity or causality.

Well this one is just wrong.

6 more replies

DonHopkins2y ago

>But it has some indeed magic ability to share a deictic frame of reference.

They really Put That There!

https://www.youtube.com/watch?v=RyBEUyEtxQo

Oh, shit.

razodactyl2y ago

In my view, this was in response to the machine being colourblind haha

ChuckMcM2y ago

Definitely some interesting tech here.

OJFord2y ago

I assume (because they don't address it or look at all phased) the audio cutting in and out is just an artefact of the stream?

throwthrowuknow2y ago

OJFord2y ago

mvdtnz2y ago

practice92y ago

It is cringe overenthusiastic, but a proper instructions/system prompt will fix that mostly

slibhb2y ago

You can tell it not to talk like this using custom prompts.

marvin2y ago

One of the linked demos is it being sarcastic, so maybe you can make it remember to be a little more edgy.

yieldcrv2y ago

tell it to speak to you differently

with a GPT you can modify the system prompt

maest2y ago

It still refuses to go outside the deeply sanitise tone that "alignment" enforces on you.

baumgarn2y ago

it should be possible to imitate any voice you want like your actual parents soon enough

goatlover2y ago

That won't be Black Mirror levels of creepy /s

throwthrowuknow2y ago

Did you miss the part where they simply asked it to change its manner of speaking and the amount of emotion it used?

clhodapp2y ago

We'll have to see when end users actually get access to the voice features "in the coming weeks".

gabiruh2y ago

It's weird that the "airplane mode" seems to be ON on the phone during the entire presentation.

arthurcolle2y ago

This was on purpose - they connected it to the internet via a USB-C cable it appears, for consistent internet instead of having it switch WiFi

Probably some kinks there they are working out

OJFord2y ago

> Probably some kinks there they are working out

Or just a good idea for a live demo on a congested network/environment with a lot of media present, at least one live video stream (the one we're watching the recording of), etc.

At least that's how I understood it, not that they had a problem with it (consistently or under regular conditions, or specific to their app).

_flux2y ago

And eliminate the chance of some prankster affecting the demo by attacking the wifi.

simoes2y ago

They mention at the beginning of the video that they are using hardwired internet for reliability reasons.

sitkack2y ago

You would want to make sure that it is always going over WiFi for the demo and doesn't start using the cellular network for a random reason.

rightbyte2y ago

You can turn off mobile data. They probably just wanted wired internet.

spaceman_20202y ago

This is going straight into 'Her' territory

snthpy2y ago

Hectic!

Thanks for this.

modeless2y ago

"not that much better" is extremely impressive, because it's a much smaller and much faster model. Don't worry, GPT-5 is coming and it will be better.

talldayo2y ago

Chalmers: "GPT-5? A vastly-improved model that somehow reduces the compute overhead while providing better answers with the same hardware architecture? At this time of year? In this kind of market?"

Skinner: "Yes."

Chalmers: "May I see it?"

Skinner: "No."

AaronFriel2y ago

It has only been a little over one year since GPT-4 was announced, and it was at the time the largest and most expensive model ever trained. It might still be.

Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".

Even Meta is still baking their 400B parameter model.

1024core2y ago

As Altman said (paraphrasing): GPT-4 is the _worst_ model you will ever have to deal with in your life (or something to that effect).

bamboozled2y ago

Legit love progress

famouswaffles2y ago

GPT-3 was released in 2020 and GPT-4 in 2023. Now we all expect 5 sooner than that but you're acting like we've been waiting years lol.

skepticATX2y ago

The increased expectations are a direct result of LLM proponents continually hyping exponential capabilities increase.

pwdisswordfishc2y ago

Incidentally, this dialogue works equally well, if not better, with David Chalmers versus B.F. Skinner, as with the Simpsons characters.

dialup_sounds2y ago

Agnes (voice): "SEYMOUR, THE HOUSE IS ON FIRE!"

Skinner (looking up): "No, mother, it's just the Nvidia GPUs."

dlivingston2y ago

"Seymour, the house is on fire!"

"No, mother, that's just the H100s."

TIPSIO2y ago

Obviously given enough time there will always be better models coming.

But I am not convinced it will be another GPT-4 moment. Seems like big focus on tacking together multi-modal clever tricks vs straight better intelligence AI.

Hope they prove me wrong!

kmeisthax2y ago

pbhjpbhj2y ago

>harvesting closed captions out of YouTube videos

It seems like it's hard to get past a level of human intelligence at which there's a large enough corpus of training data or trainers?

Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?

llm_trw2y ago

Whisper models are better than anything google has. In fact the higher quality whisper models are better than humans when it comes to transcribing text with punctuation.

WhitneyLand2y ago

Why do you think they’re using Google auto-captioning?

I would expect they’re using their own speech-to-text, which is still a model but way better quality and potentially customizable to better suit their needs.

marvin2y ago

littlestymaar2y ago

I don't think a bigger model would make sense for OpenAI: it's much more important for them that they keep driving inference cost down, because there's no viable business model if they don't.

- you don't release intermediate steps when you want to be able to advertise big gains, because it raises the baseline and reduces the effectiveness of your "big gains" in terms of marketing.

jononor2y ago

littlestymaar2y ago

That's the question OpenAI needs to find an answer to if they want to end up viable.

At the end of the day they have no technological moat, so they'll need to build a business one, or perish.

mupuff12342y ago

And how can one be so sure of that?

Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.

diego_sandoval2y ago

> Seems to me that performance is converging

It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.

In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.

drawnwren2y ago

Compute power is not stagnating, but the availability of training data is. It's not like there's a second stackoverflow or reddit to scrape.

wavemode2y ago

> video generation also seemed kind of stagnant before Sora

If OpenAI really is secretly cooking up some huge reasoning improvements for their text models, I'll eat my hat. But for now I'm skeptical.

scarmig2y ago

I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.

andrepd2y ago

Exactly. People like to point at the start of a logistic curve and go "behold! an exponential"

aantix2y ago

The use of AI in the research of AI accelerates everything.

thefaux2y ago

jcd0002y ago

I bet this will also cause model regressions.

moomoo112y ago

I really hope GPT5 is good. GPT4 sucks at programming.

cududa2y ago

partiallypro2y ago

In my experience 3.5 was better at programming than 4, and I don't know why.

twsted2y ago

It's better than at least 50% of the developers I know.

Jensson2y ago

A developer that just pastes in code from gpt-4 without checking what it wrote is a horror scenario, I don't think half of the developers you know are really that bad.

viking1232y ago

What kind of people are you working with?

idontpost2y ago

It's not better than any of the developers I work with.

Trying to talk it into writing anything other than toy code is an exercise in banging my head against the wall.

verdverm2y ago

Look to a specialized model instead of a general purpose one

moomoo112y ago

Any suggestions? Thanks

I have tried Phind, and on anything beyond mega-junior-tier questions it suffers as well and gives bad answers.

jameshart2y ago

I think this comment is easily misread as implying that this GPT4o model is based on some old GPT2 chatbot - that’s very much not what you meant to say, though.

Highlighting so people don’t get the impression this is just OpenAI slapping a new label on something a generation out of date.

lossolo2y ago

aixpert2y ago

useless anecdata but I find the new model very frustrating, often completely ignoring what I say in follow up queries. it's giving me serious Siri vibes

(text input in web version)

maybe it's programmed to completely ignore swearing but how could I not swear after it gave me repeatedly info about you.com when I try to address it in second person

dragonwriter2y ago

> As many highlighted there, the model is not an improvement like GPT3->GPT4.

kybercore2y ago

the model scores 60 points higher in lmsys than the best gpt 4 turbo model from april, that's still a pretty significant jump in text capability
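
For a rough sense of what a 60-point gap means, here is a minimal sketch that assumes the leaderboard rating behaves like a standard Elo rating (base 10, scale 400) and ignores ties. The function name is hypothetical and this is not the leaderboard's own code; it only converts a rating gap into the expected share of head-to-head wins for the higher-rated model.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected win share for the higher-rated side.

    Assumes the standard Elo logistic curve (base 10, scale 400).
    `rating_gap` is the higher rating minus the lower rating.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


if __name__ == "__main__":
    # A 60-point gap, the figure quoted in the comment above.
    print(f"{elo_expected_score(60):.3f}")
```

Under those assumptions a 60-point lead corresponds to winning somewhat under 60% of pairwise comparisons, so it is a real gap but far from a blowout.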

avereveard2y ago

ozzydave2y ago

Heh - I'm using ChatGPT for the same thing! Works 10X better than Rocket Money, which was supposed to be an improvement on Mint but meh.

vitorgrs2y ago

They are admitting that it is the im-also-a-good-gpt2-chatbot. There were 3... Don't ask me why.

The "gpt2-chatbot" was the worst of the three.
