I really should package it up so people can try it. The one problem that makes it a little unnatural is that determining when the user is done talking is tough. What's needed is a speech conversation turn-taking dataset and model; that's missing from off-the-shelf speech recognition systems. But it should be trivial for a company like OpenAI to build. That's what I'd work on right now if I were there, because truly natural voice conversations are going to unlock a whole new set of users and use cases for these models.
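For context on why this is the unnatural part: the off-the-shelf approach is just voice activity detection plus a fixed silence timeout, so pausing mid-thought gets you cut off. Here's a minimal sketch of that naive baseline (my code, assuming the webrtcvad package and 16 kHz, 16-bit mono PCM; a learned turn-taking model would replace the fixed threshold):

```python
# Naive endpointing: declare the turn over after END_OF_TURN_MS of silence.
# The fixed threshold is exactly what a turn-taking model would replace.
import webrtcvad

SAMPLE_RATE = 16000
FRAME_MS = 30            # webrtcvad accepts 10/20/30 ms frames
END_OF_TURN_MS = 700     # arbitrary: too short clips you off, too long lags

def user_is_done(frames):
    """frames: iterable of 30 ms chunks of 16-bit mono PCM bytes."""
    vad = webrtcvad.Vad(3)           # aggressiveness 0-3
    silence_ms = 0
    for frame in frames:
        if vad.is_speech(frame, SAMPLE_RATE):
            silence_ms = 0           # speech resets the silence counter
        else:
            silence_ms += FRAME_MS
            if silence_ms >= END_OF_TURN_MS:
                return True          # probably done... or just thinking
    return False
```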
Total end-to-end latency is a few hundred milliseconds: from speech-to-text, to the LLM, then to a POS lookup to validate the SKU (so no hallucinations are possible!), and finally back to generated speech. The latency is starting to feel really natural. Building out a general system to achieve this low latency will, I think, end up being a big unlock for enabling diverse applications.
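To make the shape concrete, here's a toy sketch of that pipeline (the three stage functions are dummy stand-ins for the STT, LLM, and TTS calls; the point is the POS catalog check sitting between the LLM and the speech output):

```python
# Toy version of the low-latency voice-ordering pipeline described above.
# transcribe/complete/speak are stubs standing in for Whisper/GPT/TTS.

def transcribe(audio: bytes) -> str:           # STT stage
    return "one large oat-milk latte"

def complete(text: str) -> tuple[str, str]:    # LLM stage: picks a SKU + reply
    return "SKU-4217", f"Got it, adding {text} to your order."

def speak(reply: str) -> bytes:                # TTS stage
    return reply.encode()

def handle_utterance(audio: bytes, pos_catalog: set[str]) -> bytes:
    text = transcribe(audio)
    sku, reply = complete(text)
    if sku not in pos_catalog:                 # validate against the POS, so a
        reply = "Sorry, we don't carry that."  # hallucinated SKU never ships
    return speak(reply)

print(handle_utterance(b"<pcm audio>", {"SKU-4217"}))
```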
Yep - it needs to be ready as soon as I'm done talking, and I need to be able to interrupt it. If those things can be done, then it can also start tentatively talking if I pause and immediately stop if I continue.
I don't want to have to think about how to structure the interaction as an explicit call/response chain, nor do I want to have to be super careful to keep talking until I've finished my thought to prevent it from doing its thing at the wrong time.
> determining when the user is done talking is tough.
Sometimes that task is tough for the speaker too, not just the listener. Courteous interruptions, or the lack thereof, might be a shibboleth for determining when we are speaking to an AI.

I was just googling a bit to see what's out there now for whisper/llama combos and came across this: https://github.com/yacineMTB/talk
There's a demo linked on the GitHub page that seems relatively fast at responding conversationally, but still maybe 1-2 seconds at times. Impressive that it's entirely offline.
Is there any extra work OpenAI's product might be doing that yours isn't, contributing to this latency, considering the scale they operate at and any reputational risks to their brand?
With a few tweaks this is a general-purpose solver for robotics planning. There are still a few hard problems between this and a working solution, but it is one of the hard problems solved.
Will we be seeing general-purpose robots performing simple labor powered by ChatGPT within the next half decade?
1. It's not smart enough to recognize from the initial image that this is a bolt-style seat lock (which a human can).
2. The manual is not shown to the viewer, so I can't infer how the model knows this is a 4mm bolt (or if it is just guessing given that's the most likely one).
3. I don't understand how it can know the toolbox is using metric Allen wrenches.
Additionally, is this just the same vision model that exists in Bing Chat?
https://www.deepmind.com/blog/rt-2-new-model-translates-visi...
You have someone with a toolbox and a manual (seriously, who has a manual for their bike?), asking the most basic question about how to lower a seatpost. My 5-year-old kid knows how to do that.
Surely there's a better way to demonstrate the groundbreaking impact of AI on humanity than this. I dunno, something like how to tie my shoelaces.
Yeah, but with an enormous ecological footprint.
Also, not suitable for small lightweight robots like drones.
For driving - https://wayve.ai/thinking/lingo-natural-language-autonomous-...
I can already see an "Alexa/Siri/Google Home" replacement, a "Google Image Search" replacement; ed-tech startups that were solving problems with AI by taking a photo are also doomed, and more to follow.
1. Domain-specific AI - Training an AI model on highly technical and specific topics that general-purpose AI models don't excel at.
2. Integration - If you're going to build on an existing AI model, don't focus on adding more capabilities. Instead, focus on integrating it into companies' and users' existing workflows. Use it to automate internal processes and connect systems in ways that weren't previously possible. This adds a lot of value and isn't something that companies developing AI models are liable to do themselves.
The two will often go hand-in-hand.
And the ability to ingest images was a highlight and a big part of the hype of the GPT-4 announcement back in March: https://openai.com/research/gpt-4
Rather than die, why not just pivot to doing multi-modal on top of Llama 2 or some open source model or whatever? It wouldn’t be a huge change
A lot of businesses/governments/etc can’t use OpenAI due to their own policies that prohibit sending their data to third party services. They’ll pay for something they can run on-premise or in their own private cloud
I wouldn’t count focused, revenue-oriented players with Meta’s shit in their pocket out just yet.
Because history shows that the first out of the gate is not the definitive winner much of the time. We aren't still using Gopher. We aren't searching with AltaVista. We don't connect to the internet with AOL.
AI is going to change many things. That is all the more reason to keep working on how best to make it work, not give up and assume that efforts are "doomed" just because someone else built a functional tool first.
BTW, I expect these technologies to be democratized and the training to be in the hands of more people, if not everyone.
Most of them accurately detect that it's a sunk-cost fallacy to continue, but it looks like a form of positive thinking... and that's the power of community!
ChatGPT already made it so that you could easily copy & paste any full-text question and receive an answer with 90% accuracy. The only flaw was that problems that also used diagrams or figures would be out of ChatGPT's domain.
With image support, students could just take screenshots or document scans and have ChatGPT give them a valid answer. From what I’ve seen, more students than not will gladly abuse this functionality. The counter would be to either leave the grading system behind, or to force in-person schooling with no homework, only supervised schoolwork.
I mean, what is the point of doing schoolwork when some of the greatest minds of our time have decided the best way for the species to progress is to be replaced by machines?
Imagine you're 16 years old right now: you know about ChatGPT, you know about OpenAI and their plans, and you're being told you need to study hard to get a good career... but you're also reading up on what the future looks like according to the technocracy.
You'd be pretty fucking confused right now wouldn't you?
It must be really hard at the moment to want to study and not cheat...
This is obviously not easy or going to happen without time and resources, but that is how adaptation goes.
They can still log in on their phone to cheat though. I wonder if OpenAI will add linked accounts and parental controls at some point. Instance 2 of ChatGPT might "tell" on the kid for cheating by informing Instance 1 running the AI Teacher plugin.
A proper notice about them removing the feature would've been nice. Maybe I missed it (someone please correct me if wrong), but the last I heard officially it was temporarily disabled while they fix something. Next thing I know, it's completely gone from the platform without another peep.
OpenAI is killing it, right? People are coming up with interesting use cases, but the main way most people interact with AI appears to be ChatGPT.
However, they still don't seem to be able to nail image generation; all the cool stuff keeps happening in Midjourney and Stable Diffusion.
If the API is available in time (Halloween), my multi-modal talking skeleton head with an ESP32 camera that makes snarky comments about your costume just got slightly easier on the software side.
Ironically, this is basically the exact line of reasoning for why I didn't embark on any such endeavors.
There's a recent paper by Hugging Face called IDEFICS[2] that claims to be an open-source implementation of Flamingo (an older paper about few-shot multi-modal task understanding), and I think this space will be heating up soon.
Just now I opened the app, went to settings, went to "New Features", and all I saw was Bing browsing disabled (unable to enable). OK, I didn't even know that was a thing that worked at one point. Maybe I need an update? Go to the App Store; nope, I'm up to date. Kill the app, relaunch, open settings; now "New Features" isn't even listed. I can promise you I won't be browsing the settings part of this app regularly to see if there is a new feature. Heck, not only do they not email/push about new features, they don't even message in-app about them. I really don't understand.
Maybe they are doing so well they don't have to care about communicating with customers right now, but it really annoys me and I wish they did better.
I suspect they do care about communicating with customers, but it's total chaos and carnage internally.
I do love these companies that succeed in spite of their marketing & design and not because of it. It shows you have something very special.
Sounds like their marketing is doing just fine. If you were to just leave and forget about it, then sure, they need to work on their retention. But you won’t, so they don’t.
> We are deploying image and voice capabilities gradually
>
> OpenAI’s goal is to build AGI that is safe and beneficial. We believe in making our tools available gradually, which allows us to make improvements and refine risk mitigations over time while also preparing everyone for more powerful systems in the future. This strategy becomes even more important with advanced models involving voice and vision.
Agreed. Other notable mentions: choosing "ChatGPT" as their product name and not having mobile apps.
Frustratingly, at least the image gen is live on Bing, but I guess Microsoft is paying more than me for access.
Sarcasm aside, I understand your complaint, but still, a little funny.
I also wonder how Apple (& Google) are going to be able to provide this for free? I would love to be a fly on the wall in the meetings they have about this; imagine all the innovator's-dilemma-like discussions they'd be forced to have ("we have to do this" vs. "this will eat up our margins").
This might be a little out there, but I think Apple is making the correct move in letting the dust settle. Similar to how Zuckerberg burned $20 billion for Apple to come out with Vision Pro, I see something similar playing out with Llama. Although this is a low-conviction take, because software is Facebook's ballgame (hardware not so much).
It’s the same reason why an Uber in NYC used to cost $20 and now costs $80 for the same trip. Venture capital subsidizing market capture.
Imagine how much they would have to pay for testers at that scale.
I really really hope this is available in more languages than English.
Also, Google: where's Gemini?
The LLM boom of the last year (OpenAI, Llama, et al.) has me giddy as a software person. It's a reach, but I truly feel like I'm watching the pyramids of our time get made.
Just as the GUI made computer software available to billions, LLMs will be the next revolution.
I'm just as excited as you! The only downside is that it now makes me feel bad that I'm not doing anything with it yet.
From a convenience perspective, it saves me LOADS of time over texting myself on Signal about my specs/design rabbit holes, then copying & pasting into Firefox and getting into the discussion. So yeah, happy for this.
I think this could bring back Google Glass, actually. Imagine wearing them while cooking, and having ChatGPT give you active recipe instructions as well as real-time feedback. I could see that within the next 1-3 years.
Anyone know the details?
I also heard it was able to do near-perfect CAPTCHA solves in the beta?
Does anyone know if you can throw in a PDF that has no OCR on it and have it summarize it with this?
Jokes aside, I have paused my subscription because even GPT4 seemed to become dumber at tasks to the point that I barely used it, but the constant influx of new features is tempting me to renew it just to check them out...
After maybe 3 iterations, GPT-4 started claiming that it is not capable of reading from a Word document, even though it had done exactly that the previous 3 times. I have to click the regenerate button to get it to work.
Digital Artists, Illustrators, Writers, Novelists, News anchors, Copywriters, Translators, Programmers (to a lesser extent), etc.
We'll have to wait a bit until it can solve the P vs NP problem or other unsolved mathematical problems unsupervised with a transparent proof which mathematicians can rigorously check themselves.
Not really. A malevolent AGI doesn't need to move to get anything it needs (it could ask / manipulate / bribe people to do all the stuff requiring movement).
We should be fine as long as it's not a malevolent AGI with enough resources to kick physical things off in the direction it wants.
So no, but maybe less than it used to?
I'm not sure what to think about the fact that I would benefit from a couple of cameras in my fridge connected to an app that would remind me to buy X or Y and tell me that I defrosted something in the fridge three days ago and it's probably best to chuck it in the bin already.
Sadly, they lost the "open" a long time ago... It would be wonderful to have these models open-sourced...
It doesn't really need to do much besides writing down my tasks/todos and updating them, occasionally maybe providing feedback or writing a code snippet. This all seems within the current capabilities of OpenAI's offering.
Sadly, voice chat is still not available on PC, where I do my development.
Fingers crossed we get there soon, though.
One part of that is about preventing it from producing "illegal" output, their example being the production of nitroglycerine, which is decidedly not illegal to make in the US generally (particularly if not using it as an explosive, though usually unwise) and possible to make accidentally when otherwise performing nitration (which is in general dangerous) -- so it's pretty pointless to outlaw at a small scale in any case. It's certainly not illegal to learn about. (And it's generally of only minimal risk to the public, since anyone making it in any quantity is more likely to blow themselves up than anything else.)
Today, learning about it is as simple as picking up a book or doing an internet search -- https://www.google.com/search?q=how+do+you+make+nitroglyceri.... But in OpenAI's world you just get detected by the censorship and told no. At least they've cut back on the offensive finger-wagging.
As LLM systems replace search, I fear that we're moving in a dark direction where the narrow-minded morality and childlike understanding of the law of a small number of office workers who have never even picked up a screwdriver or test tube and made something physical (and the fine-tuning sweatshops they direct) classify everything they don't personally understand as too dangerous to even learn about.
One company hobbling their product wouldn't be a big deal, but they're pushing for government controls to prevent competition, and even if they miss, these efforts may stick everyone else with similar hobbling.
I'm more interested in this. I wonder how it performs compared to competitors' models, or even open-source ones?
> analyze a complex graph for work-related data
Does this mean that I can take a screenshot of e.g. Apple stock chart and it will be able to reason about it and provide insights and analysis?
GPT-4 currently can display images but cannot reason about or understand them at all. I think it's one thing to have some image recognition and be able to detect that the picture "contains a time-series chart that appears to be displaying Apple stock" vs. "Apple stock appears to be 40% up YTD but 10% down from its all-time high from earlier in July, closing at $176 as of the last recorded date".
I'm very curious how capable ChatGPT will be at actually reasoning about complex graphical data.
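For what it's worth, here's a sketch of how that request might look through the API; the image-input message shape and the model name are my assumptions, since vision isn't generally available in the API yet:

```python
# Hypothetical chart-analysis request; model name and message shape assumed.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("aapl_chart.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "How is this stock doing YTD, and how far is it "
                     "off its recent high?"},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```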
Alexa just launched their own LLM-based service last week.
"The phrase “potato, potahto” comes from a song titled “Let’s Call the Whole Thing Off”, written by George and Ira Gershwin for the 1937 film “Shall We Dance”, starring Fred Astaire and Ginger Rogers. The song humorously highlights regional differences in American English pronunciation. The lyrics go through a series of words with alternate pronunciations, like “tomato, tomahto” and “potato, potahto”. The idea is that, despite these differences, we should move past them, hence the line “let’s call the whole thing off”. Over time, the phrase has been adopted in everyday language to signify a minor disagreement or difference in opinion that isn’t worth arguing about."
It's comparing American and British pronunciations, not different regional American ones. Also, "let's call the whole thing off" suggests they should break up over their differences, with the bridge and later choruses then involving a change of heart ("let's call the calling off off").
The ability to have a real-time back-and-forth feels truly magical and allows for much denser conversation. It also opens up the opportunity for multiple people to talk to a chatbot at once, which is fun.
Where’s that Gemini, Google?
1. According to the demo, they seem to pair voice input with TTS output. What if I wanna use voice to describe a program I want it to write?
2. Furthermore, if you're gonna do a voice assistant, why not go the full way with wake-words and VAD? (See the sketch after this list.)
3. Not releasing it to everyone is potentially a way to create a hype cycle prior to users discovering that the multimodality is rather meh.
4. The bike demo could actually use visual feedback to show what it's talking about, à la Segment Anything. It's pretty confusing to get a paragraph-long explanation of which tool to pick.
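On point 2, wake-words are pretty approachable these days. A minimal loop sketch using Picovoice's Porcupine (the pvporcupine and pvrecorder packages; the access key and keyword choice are placeholders):

```python
# Minimal wake-word loop: block until a keyword is heard, then hand the
# microphone over to VAD plus streaming transcription (not shown here).
import pvporcupine
from pvrecorder import PvRecorder

porcupine = pvporcupine.create(access_key="YOUR_PICOVOICE_KEY",
                               keywords=["jarvis"])  # a built-in keyword
recorder = PvRecorder(frame_length=porcupine.frame_length)
recorder.start()
try:
    while True:
        frame = recorder.read()            # one frame of 16-bit PCM samples
        if porcupine.process(frame) >= 0:  # >= 0 is the detected keyword index
            print("Wake word detected; start listening for the command")
finally:
    recorder.stop()
    porcupine.delete()
```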
In my https://chatcraft.org, we added voice incrementally, so I can swap between typing and voice. We can also combine it with function calling, etc. We also use the OpenAI APIs, except in our case there is no weird waitlist: you pop in your API key and get access to voice input immediately.
Are you sure you're not the one who's asking for a cool demo?
3. Rolling out releases gradually is something most tech companies do these days, particularly when they could attract a large audience and consume a lot of resources. There are solid technical reasons for this.
You may not need to roll things out gradually for a small site, but things are different at scale.
Patiently awaiting rollout so I can chat about implementing UIs I like, and have GPT-4 deliver a boilerplate with an implemented layout... Figma/XD plugins will probably arrive very soon too.
UX/UI design is probably solved at this point.
Not an issue now, but maybe in the future if these tools end up becoming full-blown replacements for educators and educational resources.
Maybe it will not be called the Chat API but rather the Multimodal API.
;)
https://en.m.wikipedia.org/wiki/Project_Milo
Milo had an AI structure that responded to human interactions, such as spoken word, gestures, or predefined actions in dynamic situations. The game relied on a procedural generation system which was constantly updating a built-in "dictionary" that was capable of matching key words in conversations with inherent voice-acting clips to simulate lifelike conversations. Molyneux claimed that the technology for the game was developed while working on Fable and Black & White.
My concern is that when I say "FastPFOR" it'll get transcribed as "fast before" or something like that. Transcription really falls apart in highly technical conversations in my experience. If ChatGPT can use context to understand that I'm saying "FastPFOR" that'll be a game changer for me.
Is anyone doing this? Is there a reason it doesn't work as well as I'm imagining?
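One partial answer: OpenAI's Whisper transcription endpoint accepts a prompt parameter that biases it toward vocabulary you supply, which targets exactly this case. A minimal sketch (the term list and file name are just examples):

```python
# Bias Whisper toward project jargon via the transcription `prompt`.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
with open("standup.wav", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
        prompt="FastPFOR, SIMD, bit-packing, varint decoding",  # domain terms
    )
print(transcript.text)  # far more likely to render "FastPFOR" correctly
```

It's not full conversational context, but it's a real lever available today.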
> Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after.
Text + Vision models will only become exciting once we can conditionally sample images given text and text given images (and all other combinations).
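Spelling out the combinations (my notation, not the parent's): with images x and text y, a fully multimodal model would support

```latex
% Sampling modes a joint image-text model should support:
x \sim p(x \mid y)    % text-to-image
y \sim p(y \mid x)    % image-to-text (what ChatGPT's vision feature does)
(x, y) \sim p(x, y)   % unconditional joint generation
```

Today we only get the second of these.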
Again, the model architecture and information are closed, as expected.
"We will be expanding access Plus and Enterprise users will get to experience voice and images in the next two weeks. We’re excited to roll out these capabilities to other groups of users, including developers, soon after."
BUT: "We’re rolling out voice and images in ChatGPT to Plus and Enterprise"
> We’ve created GPT-4, the latest milestone in OpenAI’s effort in scaling up deep learning. GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.
> March 14, 2023
This is technically solvable with more compute thrown at the problem. Think bigger!
Same as programmers and artists.
It's a tool.
It must be used by humans.
It won't replace them, it will augment them.
ChatGPT seems to be down at the moment (10:55, 25-Sept-2023).
It displays only a blank screen with the falsehood disclaimer.
Originally it immediately spit out a bunch of bullet points about losing weight or something (I didn't read it).
The released version just says "Sorry, I can't help with that."
It's kind of funny but also a little bit telling about the prevalence of prejudice in our society when you look at a few other examples they had to fine-tune. For example, show it some flags and ask it to make predictions about the characteristics of a person from that country; by default it would go into plenty of detail just on the basis of the flag images.
Now it says "Sorry, I can't help with that".
My take is that in those cases it should explain the poor logic of trying to infer substantive information about people based on literally nothing more than the country they are from or a picture of them.
Part of it is that LLMs just have a natural tendency to run in the direction you push them, so they can be amplifiers of anything.
I am also terrified of my job prospects in the near future.
Are we really this emotional and irrational? Folks, let's all take a moment to remember that AI is nowhere near conscious. It's an illusion based on patterns that mimic humans.
The speed of user-visible progress over the last 12 months is astonishing.
I've gone from a firm conviction 18 months ago that this type of stuff was 20+ years away to, these days, wondering whether Vernor Vinge's technological singularity is not only possible but coming shortly. It feels like some aspects of it have already hit the IT world: it's always been an exhausting race to keep up with modern technologies, but now whole paradigms and frameworks are being devised and upturned on such a short timescale that large, slow corporate behemoths can barely devise a strategy around a new technology and put a team together by the time it's passé.
(Yes, yes: I understand generative AI / LLMs aren't conscious; I understand their technological limitations; I understand that ultimately they are just statistically guessing the next word; but in the day-to-day world, they work so darn well for so many use cases!)
Because the pace of development is intense. I would love to be financially independent and watch this with excitement and perhaps take on risky and fun projects.
Now I'm thinking - how do I double or triple my income so that I reach financial independence in 3 years instead of 10 years.