These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.
I'm also incredibly excited about the possibility of this as an always-available coding rubber duck. The multimodal demos really drove this home: collaboration with the model can basically be as seamless as screensharing with another person. Incredible.
I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.
(See also 'The Expanse' for how sci-fi imagined this properly.)
For me, this is seriously impressive, and I already use LLMs every day - but a real "Now we're talkin'" moment would be when I can stand outside a Lowe's and ask my glasses/earbuds, "Hey, I'm in front of Lowe's, where do I get my air filters?"
and it tells me whether they're in stock, plus the aisle and bay number. (If you can't tell, I am tired of fiddling with apps lol)
"Computer, buy some stock"
*** buys 100 lots of Tesla without a prompt ***
This is called an "employee", and all you need to do is pay them. If you don't want to do that, then I have to wonder: is what you want slavery?
Why shouldn't we expect AI to be created using the same type of math?
If there is a surprise, it's only that we can use the same math at a much higher level of abstraction than the quantum level.
If you give it access to the entire codebase at the same time that could work pretty well. Maybe even add an option to disable the sarcasm.
They did fuck all, especially the ginger.
Is that because you're not used to it? Honestly asking.
This is probably the first time it feels natural, whereas all our previous experiences with "chat bots", "automated phone systems", and "automated assistants" were absolutely terrible.
Naturally, we dislike it because "it's not human". That's true of pretty much anything that approaches the uncanny valley. But if the "not human" thing solves your problem 100% better/faster than the human counterpart, we tend to accept it a lot faster.
This is the first real contender. Siri was the "glimpse" and ChatGPT is probably the reality.
[EDIT]
https://vimeo.com/945587328 - the Khan Academy demo is nuts. The inflections are so good. It's pretty much right there in the uncanny valley: it still feels like you're talking to a robot, but you're also directly interacting with it. Crazy stuff.
That wasn't even my impression.
My impression was that it reminds me of the humans that I dislike.
It speaks in customer service voice. That faux friendly tone people use when they're trying to sell you something.
Really? I found this demo painful to watch and literally felt that "cringe" feeling. I showed it to my partner and she couldn't even stand to hear more than a sentence of the conversation before walking away.
It felt both staged and still frustrating to listen to.
And, like far too much in AI right now, a demo that will likely not pan out in practice.
Especially when you consider the bottom line: this tech will ultimately be shoehorned into advertising somehow (read: the field dedicated to manipulating you into buying shit).
This whole fucking thing bothers me.
This is partly right.
Agree. Can't wait to see how it'll be...
(Arguably, all things revolutionary do.)
I'm personally not very happy about this for a variety of reasons; nor am I saying AI is incapable of changing the entire human condition within our lifetimes. I do claim that we have little reason to believe we're headed in a more-utopian direction with AI.
Most people would never accept the same behavior from a being capable of more complex thoughts.
But of course this was the age-old debate with our favorite golden-eyed android; and unsurprisingly, he too received the same sort of animosity:
Bones was deeply skeptical when he first met Data: "I don't see no points on your ears, boy, but you sound like a Vulcan." And we all know how much he loved those green-blooded fools.
Likewise, Dr. Pulaski has since been criticized for her rude and dismissive attitude towards Data, which had flavors of what might even be considered "racism," or so goes the Trekverse discussion on the topic.
And let's of course not forget when he was put on trial, essentially over his "humanity" and whether he was indeed just the property of Starfleet, and nothing more.
The more recent Star Trek: Picard depicted an outright ban on "synthetics" and indeed their effective banishment; non-synthetic life - from human to Romulan - simply wasn't OK with them.
Yes, this is all science fiction silliness - or adoration, depending on your point of view - but I think it very much reflects the myriad directions our real-life world is going to scatter (shatter?) in the years ahead.
Sorry, had to be that trekkie :) and nice job referencing Measure of a Man — such great trek.
We get the upside of conversation, and avoid the downside of falling asleep at the wheel (as Ethan Mollick mentions in "Co-Intelligence".)
I was literally just thinking about this a few days ago... that we need a multi-modal language model with speech training built-in.
As soon as this thing rolls out, we'll be talking to language models like we talk to each other. Previously it was like dictating a letter and waiting for the responding letter to be read to you. Communication is possible, but not really in the way that we do it with humans.
This is MUCH more human-like, with the ability to interrupt each other and glean context clues from the full richness of the audio.
The model's ability to sing is really fascinating. Its ability to change the sound of its voice - its pacing, its pitch, its tonality. I don't know how they're controlling all of that via GPT-4o tokens, but this is much more interesting than what we had before.
I honestly don't fully understand the implications here.
Amazon, Google, and Apple have sunk literally billions of dollars into this idea only to find out that, no, we aren't.
We are with other humans, yes. When socialization is part of the conversation. When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.
But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
And when it comes to inputs/outputs, typing is more precise and more efficient.
Don't get me wrong, this is a revolutionary piece of technology. But I don't think the benefits of talking you're describing (timing, subtext, inexplicit knowledge) are achievable here either (for now), since even humans need hours of interaction over days/weeks/months of shared experience to achieve them with each other.
>>> But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
A lot of people think of their colleagues as part of a professional community as well, though.
For example, I mentioned something to my contractor and the short thing he said back and his tone had me assume he understood.
Oh, he absolutely did not.
And, with him at least, that doesn’t happen when in writing.
Is it so?
Speaking most of the time is for short exchange of information (pleasantries to essential information exchanges).
I prefer writing for long in-depth thought exchanges (whether by emails, blogs etc.)
In many cultures - European or Asian, people are not very loquacious in everyday life.
I'm 100% a text-everything, never-call person, but I can't live without Alexa these days; every time I'm in a hotel or on vacation I catch myself nearly asking a question out loud.
I also hate how much Alexa sucks, so this is a big deal. I spent years learning what it can and can't do, so it will be nice to have one that I don't have to treat like a toddler.
(We mostly use it in car trips -- great for keeping the kids (ages 8, 12) occupied with endless Harry Potter trivia questions, answers to science questions, etc.)
Besides - not sure if I want this level of immersion/fake when talking to a computer...
"Her" comes to mind pretty quickly…
If you don’t complete your thought in one go, you have to insert filler words to keep it listening.
I've long felt that embracing the concept of the 'prompt' was a terrible idea for Siri and all the other crappy voice assistants. They built ecosystems on top of this dumb reduction, which only engineers could have made: that _talking to someone_ is basically taking turns to compose a series of verbal audio snippets in a certain order.
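That "prompt" reduction can be sketched in a few lines. This is a toy simulation (all names and timings are illustrative, not any real assistant's API): the assistant treats any pause longer than a threshold as end-of-turn, so a speaker who pauses mid-thought gets cut off, which is exactly why people resort to filler words.

```python
# Toy model of turn-based voice input: collect words until the first
# pause longer than silence_timeout, then stop listening.
# (Hypothetical sketch; not how any particular assistant is implemented.)

def turn_based_listen(audio_events, silence_timeout=1.0):
    """audio_events: list of (word, seconds_of_silence_after) pairs."""
    heard = []
    for word, pause_after in audio_events:
        heard.append(word)
        if pause_after > silence_timeout:
            break  # end of "prompt" -- even if the speaker wasn't done
    return " ".join(heard)

# The speaker pauses to think after "to"; the assistant truncates there.
speech = [("remind", 0.2), ("me", 0.2), ("to", 2.5), ("call", 0.2), ("mom", 0.2)]
print(turn_based_listen(speech))  # -> "remind me to"
```

A full-duplex model sidesteps this entirely: there is no end-of-turn boundary to guess, because both sides can speak and listen at once.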
The previous ChatGPT app was getting pretty good once you learned the difference between letting sentences run together and breaking them up enough.
The tonality and inflections in the voice are a little too good.
Most people, taken on average across a spectrum, aren't that good at speaking and communicating, and that stands out as uncanny-valley territory. It is mindbogglingly good at it, though.
I don't think that's generally true, other than for socializing with other humans.
Note how people, now having a choice, prefer to text each other most of the time rather than voice call.
I don't think people sitting at work in their cube farm want to be talking to their computer either. The main use for voice would seem to be for occasional use talking to an assistant on a smartphone.
Maybe things will change in the future when we get to full human AGI level, treating the AGI as an equal, more as a person.
More on the IBM Personal Speech Assistant, for which I am on a (since expired) patent, by Liam Comerford: http://liamcomerford.com/alphamodels3.html

"The Personal Speech Assistant was a project aimed at bringing the spoken language user interface into the capabilities of hand held devices. David Nahamoo called a meeting among interested Research professionals, who decided that a PDA was the best existing target. I asked David to give me the Project Leader position, and he did. On this project I designed and wrote the Conversational Interface Manager and the initial set of user interface behaviors. I led the User Interface Design work, set specifications and approved the Industrial Design effort, and managed the team of local and offsite hardware and software contractors.

With the support of David Frank I interfaced it to a PC-based Palm Pilot emulator. David wrote the Palm Pilot applications and the PPOS extensions and tools needed to support input from an external process. Later, I worked with IBM Vimercati (Italy) to build several generations of processor cards for attachment to Palm Pilots. Paul Fernhout translated (and improved) my Python-based interface manager into C and ported it to the Vimercati coprocessor cards. Jan Sedivy's group in the Czech Republic ported the IBM speech recognizer to the coprocessor card. Paul, David and I collaborated on tools and refining the device operation.

I worked with the IBM Design Center (under Bob Steinbugler) to produce an industrial design. I ran acoustic performance tests on the candidate speakers and microphones using the initial plastic models they produced, and then farmed the design out to Insync Designs to reduce it to a manufacturable form. Insync had never made a functioning prototype, so I worked closely with them on Physical UI and assemblability issues. Their work was outstanding. By the end of the project I had assembled and distributed nearly 100 of these devices. These were given to senior management and to sales personnel."
Thanks for the fun/educational/interesting times, Liam!
As a bonus for that work, I had been offered one of the chessboards that been used when IBM Deep Blue defeated Garry Kasparov, but I turned it down as I did not want a symbol around of AI defeating humanity.
Twenty-five years later, how far that aspiration towards conversational speech with computers has come. Some ideas I've put together to help deal with the fallout: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."
Another idea for dealing with the consequences is using AI to facilitate Dialogue Mapping with IBIS for meetings to help small groups of people collaborate better on "wicked problems" like dealing with AI's pros and cons (like in this 2019 talk I gave at IBM's Cognitive Systems Institute Group). https://twitter.com/sumalaika/status/1153279423938007040
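For readers unfamiliar with IBIS: it structures a discussion as a tree of issues (questions), positions (candidate answers), and pro/con arguments. A minimal sketch, with illustrative node kinds and example content (not the schema of any particular Dialogue Mapping tool):

```python
# Minimal IBIS (Issue-Based Information System) tree, the structure
# behind Dialogue Mapping. Node kinds here are the conventional ones:
# "issue", "position", and "pro"/"con" arguments.
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str
    text: str
    children: list = field(default_factory=list)

    def add(self, kind, text):
        """Attach and return a child node."""
        child = Node(kind, text)
        self.children.append(child)
        return child

def outline(node, depth=0):
    """Render the map as an indented outline, one line per node."""
    lines = [f"{'  ' * depth}[{node.kind}] {node.text}"]
    for c in node.children:
        lines.extend(outline(c, depth + 1))
    return lines

# Example map (content is hypothetical):
issue = Node("issue", "How should we deploy conversational AI?")
pos = issue.add("position", "Restrict it to internal tools first")
pos.add("pro", "Limits harm while we learn")
pos.add("con", "Slows public benefit")
print("\n".join(outline(issue)))
```

The point of the structure is that every argument hangs off an explicit question, which is what keeps a group's discussion of a "wicked problem" from going in circles.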
Talk outline here: https://cognitive-science.info/wp-content/uploads/2019/07/CS...
A video of the presentation: https://cognitive-science.info/wp-content/uploads/2019/07/zo...
https://www.theonion.com/brain-dead-teen-only-capable-of-rol...
Yeah, and it's only the beginning.