Kasparov didn't seem to see what I did. Watson seemed very consistent in knowing what it did not know. There were maybe two questions I recall where it actually got the question wrong with 50%+ certainty. I believe it answered "leg" when it should have been "missing a leg". The other it answered the '20s when the answer was the '10s. And I don't think the confidence for either of those was much beyond 50%.
Also, Kasparov seems to miss that Watson in medicine would be used alongside humans. I doubt a doctor will say, "Watson says to cut off his left leg -- I would have just given him aspirin for the headache, oh well. Hopefully cutting off this leg makes his head feel better."
What Watson hopefully will do is help with diagnosis, especially the tricky cases.
There's a great story in a book I read (I wish I could recall the name). It begins with a lady who has had some stomach issue for like 20 years. Everyone thinks it's in her head. She finally happens upon a doctor who happens to have seen something like this before, and she gets diagnosed and healed. But she had to live with it for like 20 years, seeing doctor after doctor. Watson would be able to greatly help in situations like this, I hope.
UPDATE: The book is "How Doctors Think". Here's an excerpt that talks about this case, http://harvardmedicine.hms.harvard.edu/bulletin/winter2007/7... -- just in case anyone cares. :-)
The second reason is that IBM was representing Watson as something of a big push in knowledge representation (I just watched a video where they talk about Watson's "informed judgments" about complicated questions, for instance). It looks instead like Watson just has an improved ability, relative to previous systems, to disambiguate words and to do quick lookups that match those words with nearby key terms.
For example, on the clue "Rembrandt's biblical scene 'Storm on the Sea of' this was stolen from a Boston museum in 1990", Watson correctly answered "Galilee". But its next two answers were "Gardner Museum" and "art theft"; no one who "understood" the question in any conventional sense would even consider these as answers, because they don't make any sense. Clearly, Watson looked for instances of "Rembrandt", "Storm on the Sea of", "stolen", and other phrases from the clue in its text corpus, and found that "Galilee", "Gardner Museum", and "art theft" all frequently occurred together (because the painting was stolen from the Gardner Museum in an instance of art theft) and relatively rarely apart. "Galilee" probably won out of these three because Watson is tuned to Jeopardy clue styles (whenever there is a quoted phrase in a clue followed by the word 'this', it's always asking for the answer that completes the phrase).
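A toy sketch of the kind of co-occurrence lookup I mean (the "corpus", the terms, and the scoring rule here are all hypothetical illustrations, not Watson's actual pipeline):

```python
# Toy co-occurrence scoring. Each "document" is just a set of key terms.
# Purely illustrative -- a real evidence pipeline is far more elaborate.
corpus = [
    {"rembrandt", "storm on the sea of", "galilee", "stolen",
     "gardner museum", "art theft"},
    {"rembrandt", "storm on the sea of", "galilee", "painting"},
    {"gardner museum", "boston"},
    {"art theft", "fbi"},
]

clue_terms = {"rembrandt", "storm on the sea of", "stolen"}
candidates = ["galilee", "gardner museum", "art theft"]

def cooccurrence_score(candidate, clue_terms, corpus):
    """Fraction of documents mentioning the candidate that also mention
    at least one clue term -- a crude 'occurs together' measure."""
    docs = [d for d in corpus if candidate in d]
    if not docs:
        return 0.0
    return sum(1 for d in docs if d & clue_terms) / len(docs)

scores = {c: cooccurrence_score(c, clue_terms, corpus) for c in candidates}
# "galilee" scores highest because it never appears without the clue terms;
# "gardner museum" and "art theft" also show up in unrelated documents.
```

A format-specific tie-breaker (e.g., "a quoted phrase followed by 'this' asks for a phrase completion") would then further promote "Galilee" over the other high-scoring associates.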
Similarly, Watson was far less confident on the clue "You just need a nap! You don't have this sleep disorder that can make sufferers nod off while standing up." It still got the right answer of "narcolepsy", but with a relatively low confidence of 64%. "Insomnia" had a confidence of 32% despite clearly being the opposite sort of sleep disorder, and "deprivation" appeared at 13% despite not being a sleep disorder at all. Here Watson gets confused because the only term in the clue that appears more frequently with "narcolepsy" than with "insomnia" is "standing up"; my guess is that if "standing up" had been replaced by some oddly phrased, uncommonly occurring synonym, Watson wouldn't have been able to come up with an answer, despite the clue conveying exactly the same information.
This kind of cleverness is certainly impressive, but it seems like an advance in tuning existing techniques to the format of Jeopardy, not an advance that will spark other successful projects down the line. IBM's goal of giving us "the computer from Star Trek" doesn't seem any closer; I don't see any evidence that Watson could have answered a question that required more thought or understanding than a simple text search. If there were a question like "How many kings ruled England from Henry the Fourth through Henry the Eighth?" (8), then Ken and Brad would have been able to answer relatively easily, while my guess is that Watson would be stumped.
But I think your peek into Watson's inner mind may give you more insight into the human mind than you realize.
I'm reminded of a story about a girl who told me she was good at "froggy" when it came to basketball. I asked, "What's froggy?" and she said, "When you get the ball after someone shoots it." I said, "I think it's called a rebound." And she said, "That's the word, rebound... but froggy and rebound, they remind me of each other."
And your narcolepsy vs. insomnia example is a mistake I think a lot of humans make. If you ask me which way to turn a lightbulb to remove it, my brain will have both clockwise and counter-clockwise as responses. Clockwise is probably at 80%, but counter-clockwise is probably at 20% -- I have been known to accidentally tighten a bolt rather than loosen it.
My concern about its utility (and I read they would like it to answer medical questions) is that Watson's performance reminded me of chess computers. They play fantastically well in maybe 90% of positions, but there is a selection of positions they do not understand at all. Worse, by definition they do not understand what they do not understand, and so cannot avoid them. A strong human Jeopardy! player, or a human doctor, may get the answer wrong, but he is unlikely to make a huge blunder or category error -- at least not without being aware of his own doubts. We are also good at judging our own level of certainty. A computer can simulate this with an artificial confidence measurement, but I would not like to be the patient who discovers the medical equivalent of answering "Toronto" in the "US Cities" category, as Watson did.
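That kind of confidence gating can be sketched as a simple threshold rule; the numbers and the 50% cutoff below are illustrative guesses, not Watson's actual buzz-in logic:

```python
def should_answer(confidences, threshold=0.50):
    """Pick the top-scoring candidate; answer only if its confidence
    clears the threshold, otherwise decline. A crude stand-in for a
    real system's answer/abstain decision."""
    best = max(confidences, key=confidences.get)
    conf = confidences[best]
    return (best, conf) if conf >= threshold else (None, conf)

# Illustrative numbers loosely echoing the narcolepsy clue discussed above
answer, conf = should_answer({"narcolepsy": 0.64, "insomnia": 0.32,
                              "deprivation": 0.13})

# A well-calibrated system abstains here -- the real danger is the case
# where the model is confidently wrong, which no threshold can catch.
skip, low = should_answer({"toronto": 0.30, "chicago": 0.25})
```

(In Final Jeopardy! a response is required regardless of confidence, which is part of why the "Toronto" blunder surfaced there at all.)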
I would not like to downplay the Watson team's achievement, because clearly they did something most did not yet believe possible, and IBM can be lauded for these experiments. I would only like to wait and see if there is anything for Watson beyond Jeopardy!.
If IBM wants to fix the "Toronto" problem, have at it. But those sorts of "embarrassing" errors could be quite costly in medical situations. During the show they showed Watson's progression from giving really stupid answers very frequently to giving them less frequently, which makes me personally believe their fundamental process is flawed (though not necessarily irreconcilably) and their current algorithms are just a bunch of hacks thrown together on top of Google rather than something more sophisticated like Wolfram Alpha. Surprise: that kind of mistake happens far too frequently in the medical field right now.
Why is Kasparov's commentary on something so far outside his recognized area of expertise relevant anyway? I don't go to Knuth for advice on chess, nor Hawking for snarky banter on economics, etc. (Although if I had access to either of those two, I might try it.)
In the medical case, it's actually better for the answer to be obviously, embarrassingly wrong than slightly wrong. Like the other commenter said, people aren't going to be getting amputations for headaches just because Watson says so. There's much more danger in something like prescribing medications with a fatal interaction, something that a hypothetical "Dr. Watson" would pick up.
This is almost certainly true for humans too, in terms of general problems rather than specifically chess. There are probably concepts we have so little comprehension of that we can't even see our own ignorance: Rumsfeld's unknown unknowns.
EDIT: Wow this comment is way more controversial than I thought when writing it. Down to -1, up to 2, back to 0.
Anyone who finds it so objectionable as to downrate, please explain why that's so? Discussion > downvoting.
Instead, IBM wanted a forum to show off its multi-million-dollar QA technology, and approached Jeopardy. (They may also have offered Jeopardy promotional payments, though I haven't seen definitive information either way.) IBM then spent 3+ years optimizing for the Jeopardy domain. (In the Reddit Q&A, the Watson team answered: "At this point, all Watson can do is play Jeopardy and provide responses in the Jeopardy format.")
And in the matches, Watson dominated on one dimension of Jeopardy play – quickly pressing a button after a light goes off – which is the least interesting technical challenge. (Yes, it's an important part of any champion's skills, but a machine would have won that button-pressing competition 50 years ago, so it obscures rather than highlights any other 'breakthroughs' Watson may represent.)
While impressive in several dimensions, and drawn from much deeper research by IBM, the only thing we can say for sure about Watson is that it was a "Horse for the Course" in Jeopardy. And unfortunately, no other computer horses were invited to play or offered the same prizes (in money and fame).
I suspect, now that the pattern has been set, we'll see leaner teams showing they can do as well or better than Watson with far less funding/hardware, over the next few years. Still, in the popular imagination, these efforts will live in the shadow of Watson, when a fair competitive process might have given them a chance to upstage Watson.
I think everyone was disappointed in the applicability of the Deep Blue accomplishment in other fields. Were any of the special purpose ASICs used to defeat Kasparov used in any other application? As far as I know a significant part of the Deep Blue development team left IBM relatively soon after the accomplishment.
"As with Deep Blue, he had once again let an encounter with a machine play games with his head. He had been obsessed with the idea that Deep Junior would never tire. 'The machine is never distracted by an argument with its mother,' he told me, 'or a lack of sleep.'"
And in the linked piece Kasparov alludes to the reported next approach IBM wants to take with Watson: decision support in medicine.
Kasparov's human reaction to his encounters with Watson's distant cousins brings up one obvious benefit of using technology like Watson to support medical decision-making: simply that such software will be less likely to miss something. Software is less likely to miss considering a diagnosis, ordering a crucial test, or following up on a finding -- unlike the fallible 'I' who may have skipped a class in med school, was up all night on call and just can't think straight, or is just occasionally more stupid than usual.
Diagnosis is the first thing people think of with technology like this, but in my opinion that's not the big problem Watson should tackle. Medical diagnosis in and of itself (dramatizations like the TV show 'House' notwithstanding) is not really that difficult 99% of the time. When you hear hoofbeats, you're very likely going to find horses and not zebras. A future Dr. Watson might occasionally be very helpful in pointing out very obscure (and uncommon) diagnoses. However, in my opinion the most helpful thing a Dr. Watson could provide is collecting, evaluating, and comparing evidence and outcomes as they are developed globally and locally (i.e., across broad swaths of medicine, but also within a single physician's own patient population), continuously educating the physician, and monitoring cases.
There is plenty of untapped medical data and evidence out there, but it's almost all hidden in plain sight: text/natural language. I have to agree with Kasparov here, in that the primary advance Watson represents is in moving farther down the path from syntax to semantics.