I assume the goal of Alexa was never to be the top conversational system on the planet, it was to sell more stuff on Amazon. Apple's approach to making a friendly and helpful chat assistant helps keep people inside their ecosystem, but it's not clear how any skill beyond "Alexa, buy more soap" was going to contribute meaningfully to Alexa's success as a product from Amazon's perspective. I saw the part about them having a "how good at conversation is it" metric, but that cannot be the metric that leadership actually cared about, it was always going to be "how much stuff did we sell off Alexa". In other words, Amazon did not ever appear to be in the race to make the best voice assistant, and I'm not sure why they would want to be.
After years of raising 3 kids, you would think if I ask to add diapers to the cart, it would know something. But no, it would just go with whatever is the top recommended, or first in a search, or something like that. Nothing using the brand or most recent sizes we purchased.
There was no serious attempt to drive real commerce. Instead, Alexa became full of recommendation slots that PMs would battle over. "I set that timer for you. Do you want to try the Yoga skill?"
On the other hand, they have taken on messy problems and solved them well, though not with technology, and for no real financial gain. For example, if you ask for the score of the Tigers game, Alexa has to work out which "Tigers" you mean among teams in your own geography and worldwide, at every level from international to local, across every sport, any of which might have had a game of interest. People worked behind the scenes to manage this manually, tracking teams of interest and filling intent slots daily.
I'm actually working on an app that solves this for a specific use case, though it isn't in the retail space.
Voice assistants are particularly egregious - they've done all the work to correctly recognise the words I said - i.e. the hard part - but then the whole interaction breaks because I said "set reminder" instead of "create reminder"??
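A minimal sketch of why this happens (all intent names and phrases hypothetical): if intents are keyed on exact phrases, any unanticipated synonym falls through even when the speech recognition was perfect, and even a tiny verb-normalization pass rescues the obvious cases:

```python
# Hypothetical sketch of brittle intent routing: the recognizer returned
# the words perfectly, but the router only knows exact phrases.
INTENTS = {
    "create reminder": "ReminderIntent",
    "create alarm": "AlarmIntent",
}

def route_exact(utterance):
    return INTENTS.get(utterance.lower())

# One cheap fix: normalize synonymous verbs before the lookup.
VERB_SYNONYMS = {"set": "create", "make": "create", "add": "create"}

def route_normalized(utterance):
    words = utterance.lower().split()
    if words:
        words[0] = VERB_SYNONYMS.get(words[0], words[0])
    return INTENTS.get(" ".join(words))

# route_exact("set reminder") is None even though the intent is obvious;
# route_normalized("set reminder") finds "ReminderIntent".
```

The point isn't that synonym tables scale (they don't, which is the whole thread's theme), just that "understood every word, matched no intent" is a routing-layer failure, not a recognition one.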
You forgot the part about it solving crimes.
https://broward.us/2023/07/18/amazons-alexa-is-surprise-witn...
"Shopping with your voice" never took off despite many attempts. The contribution towards subscription services like Audible and Amazon Music was not substantial enough to warrant the massive R&D investment. The business unit never found any other sources of convincing revenue.
Every other decision is downstream from that unresolved tension.
I've never used our Alexa for shopping. If I said something like "Alexa, buy more filters", even being very clever and looking at my order history, it would still get something wrong. And then I'd need to use another device to actually make the order.
While it seems to work fine on the speech recognition part, in that Alexa understands the words I say, it never seemed good enough to actually navigate a task like ordering the right kind of filter.
I knew there was some behind-the-scenes scripting going on, but I didn't realize just how much...
We mostly use our Alexa for kitchen timers, reminders, and video calls with family. Occasionally for playing music too. No, I don't want to subscribe to Amazon Music Unlimited.
We're seeing this more and more in tech: Company comes out with a feature that few people want. It doesn't gain adoption. They make many attempts to cajole and nudge users to use the feature. Users don't use the feature. They make more buttons and flows trigger the feature. Users ignore them. They start tricking users into using the feature, with dark patterns and misleading buttons. Users deliberately learn and avoid these. Exasperated, they declare "Why, oh why, won't users just use this feature!? They're just uninformed or don't know what's good for them!"
Whatever happened to starting with what the user actually wants and then working backwards from that to the actual feature? More and more, companies are more interested in serving their own metrics than serving their users.
Guess which I picked?
Even when sitting in front of a real computer, it often takes a fair amount of effort to find a product that represents the kind of value I'm interested in at the moment.
Comparative shopping with this mess on the back end doesn't work with the current state of Alexa. There are details that are important to me, as a consumer, that can't be boiled down to a price and an 8-word summary.
If the back-end data weren't broken, buying with Alexa could be made to work if it could get a grasp (using ML or some other buzzword) of how a buyer's proclivities tended to be shaped. For instance, some people want the best per-volume price, and some others want the highest quality at any expense, with a huge range in between. I myself don't have a ton of room for bulk buying, so I often aim for a medium volume of moderate-high quality, tempered by a price that is low today.
But, again: The back-end data is broken, and Alexa is too stupid to make what I think are good decisions. When I can't trust the talking computer on my countertop to make good decisions for me, and if my hands are already full, I don't have time to have a drawn-out conversation with a bot, so I won't ever actually buy stuff that way.
It's not functionally better than Amazon's abortive Dash Buttons[0] from 8-ish years ago, which were also untrustworthy for many of the same (or related) reasons.
---
But if I'm cooking in the kitchen and I notice that I'm low on oregano, I do have time to say "Alexa, add oregano to my cart." And I'll also invariably make time to interrupt its misguided response with a quick "Alexa, shut the fuck up" once it starts prattling on about the useless summary from the bad back-end data (GIGO), so I can get back to doing what I'm doing.
This is important to mention because if I weren't already busy with my hands, I wouldn't bother with using Alexa at all for this task.
Eventually, I'll find myself in front of a real computer again and I'll go through and true up the things I've used Alexa to put in my cart, so they match my actual expectations, and actually buy some things. And while this is useful to me, it's obviously pretty far removed from the target goal of the system.
And it can't ever get better until they fix their data.
It’s painful to see them give up a good brand just at the moment when a change in technology could have given them wheels…
They cornered so many markets and, surprise, used that position to let everything go to shit for a profit. Still, at least Bezos got to wave his wang at the world by going to space.
I expect a similar thing to happen when AMZN announces some AI consumer product. Never mind they were in a Prime (ahahah - get it - "PRIME") position to be the first mover here.
An opportunity well and truly squandered.
As with other projects, Amazon’s plan seems to have been to get big fast and figure out monetization later. I’m sure ZIRP played some role in it, and if not for rate hikes they might have kept it going for a few more years.
But their aim from day 1 was to get millions of devices into customers' homes and then use that to boost e-commerce sales. When the second part didn’t materialize, the initiative suddenly became a white elephant, since it takes nontrivial server capacity to keep the backend infrastructure running.
Is that actually true? I cannot imagine that they are even marginally successful at that. In fact, I can’t identify what exactly Alexa succeeded at, beyond being a voice activated kitchen timer.
> that cannot be the metric that leadership actually cared about
I think the metric was promotions for Alexa employees, sort of like a lot of projects at Google.
Suppose I put a roast in the oven and retire to my office to do something completely unrelated to cooking, where I cannot hear what happens in the kitchen.
One would think that I could set a timer in the kitchen and have it notify me wherever I am -- in the office, in the living room, on my pocket computer, on my desktop PC, or maybe even all of these things.
"Alexa, set a timer for two hours and notify me everywhere" seems like a perfectly cromulent thing to do.
But it isn't that way. Timers follow Vegas rules: timers that start in the kitchen stay in the kitchen -- they cannot be heard anywhere else.
It's not superior in any functional way to the old dumb digital timer on my oven, which has a VFD and a rotary encoder to set a timer.
(Which, by the way, has really marvelous ramps and responsiveness for that encoder -- it's silly-fast and efficient to give that knob a twist and dial in exactly what I want for a timer. Adjusting the clock for DST or whatever is equally fast and straightforward.
Except, fucking perplexingly: Alexa can notify me in the office when my oven timer beeps in the kitchen. This works fine.
All that is clear is that there is nobody steering this fucking ship.)
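For what it's worth, the behavior being asked for is just a broadcast: the timer publishes its expiry to a bus, and every registered endpoint plays the alert. A toy sketch, with all the device names made up:

```python
# Hypothetical sketch: "notify me everywhere" as a broadcast bus.
# A timer publishes one event; every registered device receives it.
class NotificationBus:
    def __init__(self):
        self.devices = []

    def register(self, device):
        self.devices.append(device)

    def publish(self, message):
        # Fan the message out to every endpoint, kitchen or not.
        return [f"{device}: {message}" for device in self.devices]

bus = NotificationBus()
for device in ("kitchen_echo", "office_echo", "phone"):
    bus.register(device)

alerts = bus.publish("roast timer finished")
```

The Vegas-rules behavior is the degenerate case where only the originating device is ever registered.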
I got frustrated with that and tossed all my Alexas.
> "That did introduce tension for our team because we were supposed to be taking experimental bets for the platform’s future. These bets couldn’t be baked into product without hacks or shortcuts in the typical quarter as was the expectation."
If I can pump one learning into engineers' and PMs' heads it's this: intermediate deliverables are not optional no matter how cutting-edge your team is.
You will never succeed if your pitch to leadership is "give us a budget for the next N years and expect no shippable products until the end of N years". Even if you get approved somehow at the beginning, there's a 99.5% chance your team/project will be killed before you get to N years.
Again, once again for the audience in the back: there is no such thing as a multi-year project without convincing, meaningful intermediate deliverables.
To clarify, that doesn't mean "don't have multi-year roadmaps", it means "your multi-year roadmaps must deliver wins at a consistent cadence".
Understanding this will carry you a lot further in the industry.
As a fairly cutting-edge R&D team part of your job is to figure out what slice of this is shippable (and worth shipping). If you're coming up empty you are not ready to pitch this to execs.
If you push back in any way they start to scream "tech debt" and everyone just accepts it. I've been through a migration mandated by an infrastructure team where there were zero improvements for the teams that used the platform, all benefits were for the platform team only, and this was green-lighted and forced upon everyone without a second thought. It's unbelievable.
Just like real-life debt, building a company without it is unrealistic and unwise. You just have to manage it.
What you describe is exactly the opposite of research, which is mostly a matter of collecting never-ending failures.
An environment that lives by such logic cannot really produce major technological breakthroughs. And in fact, Amazon has very few of those to show compared to the rest of SV.
If you look at all the defining products of Apple, they also took years from the “germ of an idea” until they could be launched, and though they might have “shipped” internally, they gained a lot by not having pressure to ship things piecemeal to customers.
But it's not obvious to me that approach was even a net win for Google as a business. Did Google Brain invent the technology that killed Google? TBD I think.
Working on the latest and greatest social media website? Sure, ship early, ship often.
Working on medical devices? You better not ship a prototype.
Working on hardware? Too expensive to pivot from learnings, better get it right the first time.
Working for NASA? You better get it right the first time and predict all future issues that might be possible, and you better document it nine ways to Sunday.
This applies to ML as well - it applies to all tech projects, though yeah, it's harder in ML. But figuring out the intermediate products is not optional: your stuff will get killed prematurely if you don't.
The trick with ML is not to promise "98% precision and 92% recall by Q4", it's to figure out what kind of product is shippable with lower precision and recall. Or perhaps a stepping-stone model that allows some simpler use case, but gives you progress towards the greater goal.
It's always case-specific, but as a ML team you do need to figure out what your intermediate checkpoints are. You need to demonstrate not only progress, but that your progress is contributing to the company's goals.
Very experienced people tend to forget this from time to time too and get excited or convince themselves "big risk big reward"... I've never seen that work out.
Executive patience, focus, and planning horizons cover the immediately next 1-4 quarters, maybe years 2 and 3, perhaps year 5 if you are lucky, and that's it; the executives themselves might not even be around in five years.
In academia, if you are stubborn and tenured and don't care about your short term success (publications, citations, awards) you can actually decide to implement a very long-term vision, depending on how much additional funding you need (if that is a lot, you will need to also convince funding agencies or philanthropists of your vision).
Heck, even Andrew Wiles, someone who only needed pen and paper for research, had to publish papers during his 7 years working on Fermat's Last Theorem.
If your goal is only promotions inside of big tech then definitely throw most innovative ideas out the window. But if you're interested in innovation, then big tech either needs to get a lot more entrepreneurial or less big.
I wonder if these "SmartAssistant" programmers ever actually had a human personal assistant. For most of what you need them to do, you don't even ask them to do it, they just know you and do it. An actually good computerized SmartAssistant would know that it's been a year, so it's time to book my physical with my doctor. It would have contacted the doctor's office for me, checked my calendar, scheduled the appointment, and then proactively reminded me a few days in advance. I shouldn't have to say "Hey, Assistant: Please schedule a physical for Doctor X at Clinic Y on July 1 of this year." (by the way SmartAssistants can't even currently do that).
The voice interaction should only be for exceptional cases: "Hey, Assistant: My trip to the Paris office needs to be delayed by one week." The assistant should then go and re-book flights, hotels, and rental cars, and then when finished, merely say "Done."
Until they can do this, tech companies might as well stop bothering to release incremental crap products that can barely understand a task I'd expect a 4-year-old to be able to do.
Modern ML and embeddings models are the discontinuity that was needed to get from "massively complex hack that can't scale" to "even more complex but principled approach that scales pretty well".
Alexa's main failure was that the tech wasn't ready - it was basically ASR + NLU + a rule engine. If we had had 2023 LLM tech, we might have "won" the assistants market.
Yes, organizational bloat and politics was a problem but OP was hired as a result of the mass hiring spree, so he was a beneficiary of that.
Though I also very much agree with the other point of OP that privacy paranoia also blocked development. The privacy team seemed like they would have been most happy if we couldn't ship.
can confirm that several of my launches got delayed to bolt on GDPR
Mainly because when the org chart grows, more "rules" are added to the rules engine, with each rule managed by yet another service... which all adds to end-to-end user-perceived latency, etc. That's why rule engines don't work.
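A toy illustration of the additive-latency point (service names and numbers entirely made up): if every team's rule set lives behind its own service and they're consulted in sequence, each new service on the path adds its hop to the user-perceived total, whether or not it matches:

```python
# Hypothetical rule-service chain: (service, simulated latency in ms,
# trigger keywords). Each org adds its own service to the path.
RULE_SERVICES = [
    ("shopping_rules", 40, {"buy", "order"}),
    ("music_rules", 25, {"play", "song"}),
    ("smart_home_rules", 30, {"lights", "thermostat"}),
]

def evaluate(utterance):
    words = set(utterance.lower().split())
    total_ms = 0
    for service, latency_ms, keywords in RULE_SERVICES:
        total_ms += latency_ms  # every hop adds to perceived latency
        if words & keywords:
            return service, total_ms
    return None, total_ms  # walked the whole chain and found nothing
```

A request handled by the last service in the chain (or by none at all) pays for every hop before it, which is the latency-vs-functionality tension in miniature.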
In the early years I couldn’t control the Phillips hue lights in my home, and then one year suddenly I could thanks to updates.
Most companies would have abandoned hardware this old.
Also, they are trying to sell it as a subscription, which is interesting since Siri is free.
The Amazon philosophy of constant execution is at odds with large-leap technical innovation. It works very well for ops-heavy AWS orgs and supply-chain optimization problems. The company has a cultural problem.
Regardless of the above, ChatGPT made almost all NLP technologies across all companies obsolete.
If you would like to know more about elements just say "Alexa, tell me about the periodic table of elements".
What kind of assistant says "by the way" apropos of nothing? An annoying one. It's just a thinly disguised ad that was never asked for.
That would be a data point in favor of the amazon strategy, no? Prevented millions upon millions of being invested in developing losing technology.
The window to integrate LLMs (especially from Anthropic, in which Amazon is an investor) is closing but not shut.
If they can pull it off, they have massive distribution power to catch up and drive rapid adoption of an Alexa 2.0.
That's overstated, because you accidentally lumped speech recognition in, and I imagine Nuance (https://nuance.com) and others are like "hold my beer".
> Alexa put a huge emphasis on protecting customer data with guardrails in place to prevent leakage and access. Definitely a crucial practice, but one consequence was that the internal infrastructure for developers was agonizingly painful to work with.
I really don't want this to be a message companies are hearing right now -- that being conscientious about customer data is a lethal barrier to progress, in the "AI" gold rush.
Also, without knowing anything about the organization, I'd expect it to probably have a high level of dysfunction, being at a company known for being excessively metrics-driven from the top, and for ruthless stack-ranking and related HR practices... trying to organize a large coherent cutting-edge R&D effort against that cultural backdrop. Like suggested by this bit elsewhere in the section:
> And most importantly, there was no immediate story for the team’s PM to make a promotion case through fixing this issue other than “it’s scientifically the right thing to do and could lead to better models for some other team.” No incentive meant no action taken.
Companies that are absolutely at the forefront of AI must, by definition, be doing terrible things wrt privacy & security.
They won't be hearing it, they'll be (and are) sending it. "AI would be better for you if it had fewer guardrails around your privacy. Trust us."
Incremental improvement was rewarded through the regular stock and pay process.
Thus no one cared enough to quickly switch to LLMs.
Super interesting how org design, even when brilliant, can be severely lacking.
We use both Azure and AWS at my current org. We recently had an internal 'hackathon' to try LLMs in both clouds (Claude for AWS, GPT-4 for Azure) on our knowledge bases.
Clearly we couldn't differentiate them on response quality, not in 3 days, but on how easy the LLM was to integrate, AWS was superior, even for our mostly-Azure teams, weirdly.
It was great for her to play different radio stations, playlists, news. It did the job.
I did try linking it to a TV but that was terrible. Slow, janky, unreliable.
Since she died - Dec 2018 - "Alexa play LBC" "Alexa stop"
Oh, if you do have an Alexa device "Alexa, what noise does a hamster make?"
I just did this and it said "Here's a hippopotamus' grunt"
So that's how it's going for Alexa in my house
I was in Alexa and this rings painfully true. So many workarounds and endless classification escalations. The customer-data certified compute environments were extremely painful to use (though later improved but still annoying) and getting data in or out, even for anodyne reasons, was nigh impossible. For a long period, even getting access to this system (called Hoverboard) took months. During my internship I spent about half of it waiting for access to be granted and had to spend a big chunk of it testing out my training system on CPU...not fun.
This should be a given. The fact you think otherwise is worrying.
(Yes, data security legislation does introduce barriers. Tough. Get used to it.)
[1] https://www.aboutamazon.com/news/company-news/amazon-anthrop...
It’s obvious that current AI can handle context reasonably well, which is something the previous voice assistants failed badly at. The next thing is to write all the APIs so the AI can reasonably act on that context. It’s such a new way of doing things that it’s probably best to hard-cut development of the previous approach, which was seemingly hard-coded triggers->actions and always failed badly if you wanted to add context. E.g. “open the house blinds when I pick up the phone in the morning or when the alarm goes off, whichever comes first” would never work with the old voice assistants. It might just work with the new AI systems, but it’ll be a completely different system, not even a rewrite at that point.
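The "whichever comes first" part of that example is easy to state as code once you have an event stream: a one-shot trigger that fires on the first of several events and then disarms. A minimal sketch (event and action names hypothetical):

```python
# Hypothetical one-shot automation: fire on the FIRST of a set of events,
# then disarm, so the action runs at most once per arming.
class FirstOfTrigger:
    def __init__(self, events, action):
        self.events = set(events)
        self.action = action
        self.fired = False

    def on_event(self, event):
        if not self.fired and event in self.events:
            self.fired = True
            self.action()

log = []
blinds = FirstOfTrigger(
    {"phone_picked_up", "alarm_went_off"},
    lambda: log.append("open_blinds"),
)

blinds.on_event("alarm_went_off")   # first matching event: blinds open
blinds.on_event("phone_picked_up")  # already fired: ignored
```

The hard part the old assistants never solved wasn't this trigger logic; it was exposing events like "phone picked up" through any API at all.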
I suppose this might explain the Google -> Alphabet thing but they haven't really embraced the new corporate name enough for "Alphabet Intelligence" to make any sense.
The only way to make things better (in my mind) was to use my own time to improve the infra, and because the metrics don't track these infra improvements I don't get rewarded so I just became burned out.
Part of me think this is the reason why you want bloat in orgs, so that motivated people with enough redundancy will actually feel comfortable chasing longer term incentives.
I've stopped using home devices like the Echo (coz of privacy concerns, esp with hotword mistriggers): now use voice only when driving the car. Maybe multimodal LLMs like GPT-4o will spawn new useful use-cases, but I think they're unlikely to be for the same use-cases Alexa the product+brand is known for.
1. Set my alarm/timer.
2. "What's the weather?"
3. "Turn on/off my lights" for those with connected lights.
etc.
We've had voice calling for over a century, yet it feels like the majority of us prefer to text most of the time these days.
It depends what culture you’re from. Many cultures around the world prefer voice. If you live in a fairly large city, just look around for folks on the phone. There are still many people who need a voice plan.
They're annoying to use, because the interface sort of implies affordances (like, you know, just talking to it like a person) that aren't actually available, and really it's just a menu tree that's barely more sophisticated than a customer support call tree.
I unplugged it and am not too sure about plugging it back in.
I probably could have used the app to stop it but I didn't think about that at the time.
One last failure for all of Big Tech, where the desire to maintain control prevented any form of standardization or interoperability, to the point where hobbyist open-source solutions are now leading the way on how to do the smart home right and not abandon the user base 6 months after release.
I really want HomePod to be better at household tasks such as managing shopping lists, timers, and reminders, but it's not there yet. As soon as the HomePod can replace my Alexa devices, I'll be all in. I have a HomePod right next to every Alexa device in my house, and I'm just waiting for Apple to turn on their "Apple intelligence."
I honestly ask this because I never tried though… I use my homepod as a glorified timer, alarm clock, and speaker. I’m just sitting here in the apple ecosystem hoping one day things will actually feel connected.
You live in a bubble
Can we give these things their real name: Smart Microphones
Alexa = Amazon's microphone.
Like most people, I use Alexa for _commands_: home automation, timers, tell me the weather, ask a specific question looking for a specific answer, play this music. That's not "conversational", and I don't want it to be.
I use generative AI for other things, mostly writing code for me, or telling me about code problems in general. It's rare that I want output that I'm _not_ going to copy/paste somewhere.
Alexa isn't a failure, it just didn't sell more stuff for Amazon. And, well, it costs an awful lot for them to keep running. So maybe it is.
It definitely captured the market, but without a top down vision, the whole thing was just a huge letdown.
I was always under the impression that Amazon uploads all our data, because I notice data transfers whenever I use voice commands, which makes me doubt their privacy claims.
It seems like Alexa was designed more to learn from us rather than to genuinely assist us. Its primary goal appears to be gathering data rather than helping users.
As I recall, they said as much. The device uploads a clip of the audio to get processed by the back end, does it not?
For what it’s worth, we were working on a conversational health app and this is why we picked Alexa over alternatives (if you’re big enough to get on GPT enterprise you can probably implement HIPAA safeguards, but we never got replies).
Most of the queries are going to involve setting an alarm or turning a thing on/off.
They didn’t drop the ball; they were very customer-savvy and really knew what they were getting into.
Up until about a year ago you could also do it on the computer but they took it down. https://alexa.amazon.com/
When Siri came out in 2011 -- two years before Alexa -- all my coworkers and I had iPhones. I remember sitting in my office as people yelled at Siri all day trying to get her to be useful. "Hey Siri, what's the weather tomorrow? No... No SIRI, WHAT'S -- THE -- WEATHER -- TOMORROW!"
Even though it sucked, it seemed every hardcore Apple user was ready to jump onboard. Who cares if I'm in a crowded office with people trying to get work done while I spend 10x longer to perform a function in the noisiest possible way? I'm using this thing!!
The voice recognition has improved since then. But the functionality still sucks.
When I'm in private, there are a couple commands I'll use.
- "Hey Siri, call xyz" where xyz is someone in my contact list I have tested with Siri and is known to work. Not recommended to try without testing first.
- While cooking, "Hey Siri, set a timer for 10 minutes." Works great.
- While driving and navigating: "Hey Siri, take me to the nearest gas station." That one is pretty good, except the actual maps are not smart enough so sometimes you'll be turned around in the opposite direction you were going, since technically that's where the nearest gas station is.
I never understood why they couldn't make this tool better, even before LLMs and without any AI at all. Just hard-code a bunch of phrases, and ways to translate those phrases into some action.
"Hey Siri, how close is my UPS delivery?"
"Hey Siri, where can I get the best price on xyz cat food?"
"Hey Siri, what's my bank balance?"
"Hey Siri, how much is a Lyft to xyz?"
I bet if they had a single developer working on adding Siri commands full-time, they could announce something like 20-50 new Siri functions at every WWDC.
But it seems the goal now is just "Make it an LLM," instead of focusing on recognizing the task that the user wants to do, and connecting it to APIs that can do those tasks.
They could've dominated the "conversational system" market 13 years ago.
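The "hard-code a bunch of phrases" approach described above can be sketched as a pattern-to-handler dispatch table; the handlers here are stubs standing in for whatever real API each command would call, and everything in this sketch is hypothetical:

```python
import re

# Stub handlers standing in for real backend integrations.
def ups_delivery_eta():
    return "tracking_lookup"

def price_check(item):
    return f"price_search:{item}"

# The dispatch table: each new voice command is one more (pattern, handler)
# row -- exactly the "one developer adds N commands per year" model.
COMMANDS = [
    (re.compile(r"how close is my ups delivery", re.I),
     lambda m: ups_delivery_eta()),
    (re.compile(r"best price on ([\w ]+)", re.I),
     lambda m: price_check(m.group(1))),
]

def handle(utterance):
    for pattern, handler in COMMANDS:
        m = pattern.search(utterance)
        if m:
            return handler(m)
    return None  # unrecognized: fall back to "Sorry, I didn't get that"
```

This is the approach that works fine for dozens of commands and collapses at the long tail, which is where the LLM-vs-rules argument in the rest of the thread comes in.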
I almost completely agreed with you, but this is not true! Apple is trying to solve the task & API problem with "App Intents", on which they go into more detail outside of the keynote: https://youtu.be/Lb89T7ybCBE
The new Siri models are trained on a large number of schemas. Apps can implement those schemas to say “I provide this action” (aka, the user intends to do this action). Siri can use the more advanced NLP that comes with GenAI to match what you say to a schema, and send that to an app.
These app intents are also available to spotlight and shortcuts, making them more powerful than just being Siri actions
My opinion is that data access restrictions did not cause Alexa to fail. If you think about it, it wasn't lack of machine learning that contributed to its issues. Alexa attempted to solve the long tail of customer requests with the equivalent of spaghetti "if statements" - rule engines. This was never going to scale. Alexa did not have a generic enough approach to cover the long tail of customer requests (e.g. AGI). With rule engines, there was always a tension between latency and functionality. Alexa solved this with bureaucracy - monitor latency, monitor customer request types, and make business decisions about how to evolve the rule engines. But it was never fundamentally able to scale out of the most basic requests or solve chicken-egg problems (customers don't ask complicated requests because Alexa isn't capable, so they don't show up as large enough use cases to optimize for). Top use cases remained playing music and setting timers.
A more fundamental issue was monetizing. Early on Bezos liked the idea of having a small, essentially free, device that would reduce the friction to buying things. If you remember the "easy buttons" Amazon floated there were many ideas like this. In practice, building a robust voice assistant that could purchase items proved challenging for a myriad of reasons. So the business looked for other ways to monetize. Advertising kept coming up but there was rank and file pushback to this because it could break customer expectations and/or privacy concerns. Alexa considered pivoting into various B2B ventures (hospitality, healthcare, business) and other customer scenarios (smarthome, automotive) but took half-measures into each of them rather than committing to an opportunity. It felt like a solution looking for a problem.
Alexa would have (could still?) benefit from modern LLM technology. However to be truly useful it would need to do more than chat. It would need some layer to take actions. This would all have to be carefully considered and designed so that it scales - so that it isn't a bureaucracy trying to measure what people are wanting to do and "if statement"ing a rules engine to enable it. OpenAI and others appear to be poised with the machine learning expertise to do this.
Finally, it's my opinion that Alexa's machine learning scientists were very good; however, as a population they did not appear to me to really care about the business/product use case. Many of them worked on research for publication on problems like distance estimation, etc. The expertise was very heavy on voice transcription and audio processing, with much less expertise in "reasoning". This, I hypothesize, contributed to the approach of iterated rules engines, with the science community focused primarily on improving transcription accuracy by small numbers of basis points.