it doesn't use natural language processing, it *computes* the answer.
Further, one of the promising approaches in this area involves using generative grammars (or other generative, non-parametric approaches) to approximate natural language representations.
Both these approaches 'compute' answers without the notion of natural language grammars usually associated with natural language processing.
I've definitely read papers about using those methods you describe for processing text. They are all just algorithms attacking a problem, so the distinction between NLP and computing is gibberish.

The demo was very impressive, but in practice they just couldn't match the scope of Wikipedia or the wider Web. It may be the same way with Wolfram|Alpha.
A cite to some Google claims when they came out would be nice. (Academic papers don't count.)
I was in the SF Bay Area at the time and the "publicity" that I remember was friends saying "check this out". I don't recall Google saying anything beyond "here's how many pages we indexed".
I think that's the point shimonamit is making.
Was Twitter hyped up like this before it launched? Facebook? Google? Microsoft? Apple? TechCrunch? Hacker News? Wikipedia? Heck, pretty much ANYTHING that's successful now? (Even small stuff like Balsamiq that's currently very successful in a small way wasn't hyped before it launched).
Now think of stuff that was hyped massively before launch. Cuil. Powerset. Yeah.
Stuff that ultimately becomes super successful becomes successful over quite a long period of time and due to the excitement of users after launch - not the bleatings of gurus before launch.
http://www.trueknowledge.com/technology/
It's an interesting concept, and it has much broader applications through its API.
Products that start modest and then improve and prove themselves rapidly tend to do better than products that are hyped beyond all proportion.
The thesis of A New Kind of Science is something like "systems composed of a small number of simple rules can perform arbitrarily complex computations."
The book proceeds to support the thesis. The content consists of descriptions of such systems, corresponding Mathematica execution trace diagrams, and analysis. These analyses are related to an ambitiously large scope of natural phenomena and scientific knowledge.
I think a couple of things are clear.
(1) We are at the point where something impressive can likely be produced, and Wolfram may very well have the resources to do it.
(2) We are not at the point where the be-all-end-all version of this can be produced.
Compare this with the symbolic computation packages (Mathematica, Maple, etc.). Around 1990, we were at the point where we could produce a very good one. Several were written. They have been improved since, but only marginally. We're still pretty much using 1990 technology.
And that's fine. We knew how to make a really good symbolic computation package. We did. End of story.
But consider the proposed packages (Alpha, etc.). We might produce something impressive. But we are not ready to produce something really good and useful. Our initial efforts will require lots of improving.
And Wolfram is definitely not the one to do that improving. He runs an aggressively closed shop. Always has. I predict, therefore, that the cathedral-bazaar effect is going to mean his product will be difficult to improve, and so will never become truly useful.
If he provides not just the technology and data but also the means to extend them by following his example, he might be contributing something truly revolutionary.
Seems to me that this technology should have been released for some other scientific use first (if it is indeed that powerful). Used this way, it could also be valuable as an engine for other applications.
I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page you are looking for.
> Seems to me that this technology should have been released for some other scientific usage first
Why?
Technologists need to get over the idea that technology is for science/technology.
> I would also argue that one of Google's advantages is that it enables discovery of new information instead of just giving you the one page you are looking for.
Huh? If there isn't a page that states which city is the fourth largest in eastern Montana, how will Google help you answer that question? (No fair going to the "populations of cities in eastern montana" page.)
Google doesn't (yet) do join queries.
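To make the join-query point concrete, here is a minimal sketch in Python using sqlite3. Every name, county assignment, and population below is invented for illustration; the point is the shape of the query. Answering "fourth largest city in eastern Montana" requires joining and ordering structured data, which is a computation over facts, not a lookup of any single existing page.

```python
import sqlite3

# Toy, made-up data: cities with counties, counties with regions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT, county TEXT, population INTEGER)")
conn.execute("CREATE TABLE counties (county TEXT, region TEXT)")
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("Billings", "Yellowstone", 104000),
    ("Miles City", "Custer", 8400),
    ("Glendive", "Dawson", 4900),
    ("Sidney", "Richland", 5100),
    ("Missoula", "Missoula", 66000),
])
conn.executemany("INSERT INTO counties VALUES (?, ?)", [
    ("Yellowstone", "eastern"), ("Custer", "eastern"),
    ("Dawson", "eastern"), ("Richland", "eastern"),
    ("Missoula", "western"),
])

# Join cities to their region, rank by population, take the fourth.
row = conn.execute(
    "SELECT c.name FROM cities c JOIN counties r ON c.county = r.county "
    "WHERE r.region = 'eastern' ORDER BY c.population DESC LIMIT 1 OFFSET 3"
).fetchone()
print(row[0])  # Glendive
```

A keyword index can only return pages that already contain the answer; this result exists nowhere until the join is executed.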
I think it's still way out of reach for non-trivial datasets. Something like this doesn't just show up out of the blue; it's not a problem amenable to some single new algorithm or breakthrough.
I'd love to be proven wrong...
A domain that doesn't have a mathematical model would not be describable by a formal system.
And likewise, any domain describable by a formal system would have a mathematical model.
Which brings us back to the original point: human stuff is not very math-friendly. I want to deal with emotions, politics, etc.
Think Bush and torture: can this system give me any definitive answers?
Too bad, because right away I was thinking "Wow! It's a sentient version of Google, only a bazillion times better!" Then I realized it's just a parser that turns natural language questions into queries against a large dataset, and I became all sad and disappointed.
If I'm asking something like, "What is the capital of Nebraska?" why not just get directed to the Wikipedia entry, where I can learn a lot more than the one fact that answers what I just asked?
If Alpha is actually going to do computation, I'd rather be able to use it for something more complex than a single natural language query.
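The "parser that turns natural language questions into queries" pattern can be sketched in a few lines. Everything here is a hypothetical toy: one hand-written regex pattern and a three-entry fact table, vastly simpler than whatever a real system uses. The hard part such a sketch deliberately skips is covering the endless variety of natural phrasing.

```python
import re

# Hypothetical stand-in for a curated dataset.
CAPITALS = {"nebraska": "Lincoln", "france": "Paris", "japan": "Tokyo"}

def answer(question: str) -> str:
    # One hand-written pattern mapping a question shape to a lookup.
    m = re.match(r"what is the capital of (.+?)\??$",
                 question.strip(), re.IGNORECASE)
    if m:
        return CAPITALS.get(m.group(1).strip().lower(), "unknown")
    return "cannot parse"

print(answer("What is the capital of Nebraska?"))  # Lincoln
```

This is also why such a system feels brittle: rephrase the question slightly and you fall off the pattern entirely.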
I would love to gather good questions and discuss the results when they are available. I think it is important to find questions to which google, yahoo, powerset or wikipedia don't provide a straight answer.
How about:
1) What is the smallest unknown prime number? ;)
2) Where on earth is the rainy days to sunny days ratio the lowest?
3) How many languages does the average person from the Benelux countries speak?
But heck, I wouldn't even be surprised if they push the scale, 7 cuils anyone?
This software should be able to look up rainfall data from Stephen Wolfram's Bumper Book of Trivia, work out average rainfall for each country, work out which country has the 15th-highest rainfall from that result, then look up the capital city for that country.
All determinable facts with a straight answer; you should simply get the name of the city as a result.
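The multi-step pipeline described above is easy to sketch once the data exists. The countries, rainfall figures, and capitals below are all fictional; the real question asks for the 15th-highest, but with four toy countries we take the 3rd, and the pipeline shape is identical.

```python
# Fictional stand-in data: country -> yearly rainfall observations (mm).
rainfall_mm = {
    "Atlantis": [2100, 1900, 2000],
    "Sylvania": [1200, 1100, 1150],
    "Freedonia": [800, 900, 850],
    "Grand Fenwick": [600, 650, 700],
}
capitals = {
    "Atlantis": "Poseidonis",
    "Sylvania": "Strelsau",
    "Freedonia": "Chicolini",
    "Grand Fenwick": "Fenwick City",
}

RANK = 3  # would be 15 with a real dataset

# Step 1: average rainfall per country.
averages = {c: sum(v) / len(v) for c, v in rainfall_mm.items()}
# Step 2: rank countries by average rainfall, highest first.
ranked = sorted(averages, key=averages.get, reverse=True)
# Step 3: take the country at the requested rank and look up its capital.
country = ranked[RANK - 1]
print(capitals[country])  # Chicolini
```

The computation is trivial; as the parent notes, the hard and valuable part is having clean, comprehensive data to run it over.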
The parent posting is correct. Google is particularly poor at complex questions, especially if you aren't sure about the exact phrasing. I have had several queries in the past couple of weeks where I had to spend 30 to 90 minutes trying to get the right set of terms to return the right set of documents to look at. This is especially true if the name of the product is a common English word.
Practically speaking, static content that doesn't change often is better served by models like Wikipedia.
If Wolfram knows all the answers, write them all in static HTML for the world to use, search, browse, replicate and extend, instead of storing them in semantic databases or ethereal brains.
I am not pissing on their parade, I know the scientific work is commendable, but practically speaking it can't compete with more efficient models.
Wikipedia isn't all that useful for storing all of the sums of integers.
In other words, you can't enumerate all of the questions that have one answer.
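The non-enumerability point can be shown with the simplest possible case. Questions of the form "what is a + b?" are infinite in number, so no static encyclopedia can store every answer, yet a few lines can compute any of them on demand. The question-phrasing pattern here is a toy assumption.

```python
import re

def answer_sum(question: str):
    # Match the toy question shape and compute the answer on demand,
    # rather than looking it up in a stored table of all possible sums.
    m = re.match(r"what is (\d+) \+ (\d+)\?$", question.strip(), re.IGNORECASE)
    if m:
        return int(m.group(1)) + int(m.group(2))
    return None

print(answer_sum("What is 12345 + 67890?"))  # 80235
```

That is the dividing line the thread keeps circling: static pages excel at facts someone already wrote down, while computation covers the infinite tail of questions nobody has.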
Not to be a dickhead, I know what you mean.
Questions that involve some kind of processing power can be a good target for Wolfram, but then, how marketable is that outside academia?
The answer to the population of X country/city/town = Wikipedia, plus more facts you may be interested in while doing your research paper.
Maybe I just need 10 different questions/implementations of such service to get it.
Like TrueKnowledge and the Freebase answers in Powerset, this system will likely be good at answering a small subset of very direct questions. Having access to Mathematica's symbolic solver algorithms would definitely help in building this system.
If it's successful it will either be faster than current inference engines, or capable of solving more complex queries. Or perhaps both. We'll see.