Example 1: Some years ago, I had to sit through a meeting where a committee worried about a 2% drop in satisfaction scores on a student questionnaire. No one checked how many replies were involved (around 400, so with ratings at something like 75%, the drop worked out to about six fewer satisfied respondents in the second year than the first).
Example 2: I recently had to add comments in a record system about students whose attendance percentage had dropped below 90%. That was 8 weeks into the course...
I don't know that Newton's laws would simply jump out of the data if you threw a ball along one million different vectors.
We might initially process the large data set using relatively simple techniques, but on the reduced data we can then run more sophisticated methods that actually work, because the underlying data comes from a huge number of samples.
As but one example, in computer vision, the concept of "attributes" -- automatically labeling objects using descriptive words instead of categorical ones, i.e., "this thing is like..." rather than "this thing is..." -- has opened the door to a number of exciting advances. One is the concept of "zero-shot learning": automatically recognizing an object that you've never seen an instance of before simply via a description. For example, one could recognize beavers as "small, four-legged furry rodents with big teeth and a flat tail", without having ever seen a beaver before. The training data for this classifier need not include beavers, but only images which match the individual attributes, not necessarily all in the same image -- small, four-legged, furry, rodent, big teeth, flat tail.
This kind of thing was not really possible before, because there just wasn't enough data to train reliable classifiers for each attribute in any kind of automated way.
Finally, as I alluded to at the beginning, these individual attribute classifiers are often relatively simple algorithms, such as Support Vector Machines (SVMs). Yet, the 2nd-stage algorithms that use the attribute values to do something useful, such as the zero-shot learning application described above, are often much more involved/advanced techniques.
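To make the two-stage idea concrete, here's a toy sketch of it. Everything here is made up for illustration -- synthetic feature vectors, hypothetical attribute names, and a trivial attribute-matching rule standing in for a real zero-shot method -- but the shape is the same: simple per-attribute SVMs in stage one, and a second stage that recognizes a class it never saw in training purely from a textual description.

```python
# Toy sketch of attribute-based zero-shot recognition.
# All data and attribute names here are synthetic/hypothetical.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
ATTRIBUTES = ["small", "four_legged", "furry", "flat_tail"]

# Stand-in for labeled training images: random feature vectors with
# one binary label per attribute (in practice, real image features).
X = rng.normal(size=(200, 16))
W = rng.normal(size=(16, len(ATTRIBUTES)))
Y = (X @ W > 0).astype(int)

# Stage 1: one simple classifier (a linear SVM) per attribute.
attribute_clfs = [LinearSVC().fit(X, Y[:, j]) for j in range(len(ATTRIBUTES))]

# Stage 2: classes described only by their attributes. "beaver" never
# appears as a training label -- only its attributes were trained on.
class_descriptions = {
    "beaver": {"small": 1, "four_legged": 1, "furry": 1, "flat_tail": 1},
    "eagle":  {"small": 1, "four_legged": 0, "furry": 0, "flat_tail": 0},
}

def predict_class(x):
    # Predict each attribute, then pick the described class whose
    # attribute vector agrees with the most predictions.
    attrs = np.array([clf.predict(x[None, :])[0] for clf in attribute_clfs])
    scores = {name: int(np.sum(attrs == np.array([d[a] for a in ATTRIBUTES])))
              for name, d in class_descriptions.items()}
    return max(scores, key=scores.get)

print(predict_class(X[0]))
```

Real zero-shot systems use much more involved second stages than this nearest-description matching, but the division of labor is the point: the attribute classifiers can stay simple because each one is trained on lots of data.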
1. Remoteness of location - few outside influences
2. Relatively few species!
Even though they're on the equator, the islands aren't all jungle and teeming wildlife. The sheer lack of different species made it possible to see every single one of them in a single visit, and allowed Darwin to theorize without worrying that he had missed something.
Sometimes, simplicity helps with focus
A lot of early "progress" in AI turned out not to survive contact with the real world -- most of computer vision, for example. Collecting data was so expensive and difficult that many experiments used only a handful of images, and the methods that came out of them often worked okay on those examples, but on nothing else! So a lot of clever-seeming algorithms ended up being rather useless in the real world, and the progress was illusory.
I find that in computer vision (my area of research), a fundamental component of many disparate problems is that you are trying to interpolate or extrapolate data in a very complicated underlying space, where linear approximations are completely unusable and optimization is too unconstrained. The key is to come up with suitable regularizers that can use prior information to constrain the problem appropriately.
Getting more data thus helps in two ways:
1. It reduces the amount of interpolation you have to do, since you can get a denser sampling of the space.
2. It allows you to build up these priors using real data, making interpolation much better.
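Here's a minimal illustration of the regularization point (my own toy construction, nothing to do with any particular vision problem): fitting a high-degree polynomial to a handful of samples is badly under-constrained, and a simple Tikhonov/ridge penalty -- a prior that prefers small, smooth solutions -- tames it.

```python
# Toy example: ill-posed interpolation constrained by a regularizer.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.sort(rng.uniform(-1, 1, 8))   # sparse sampling of the space
y_train = np.sin(3 * x_train)              # underlying function we sample
x_test = np.linspace(-1, 1, 100)
y_test = np.sin(3 * x_test)

def poly_features(x, degree=12):
    # Degree 12 with only 8 samples: more unknowns than data points.
    return np.vander(x, degree + 1)

def fit(x, y, lam):
    # Tikhonov/ridge: minimize ||A w - y||^2 + lam * ||w||^2
    A = poly_features(x)
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)

# lam ~ 0 is essentially unregularized; lam = 1e-3 encodes the prior.
for lam in (1e-12, 1e-3):
    w = fit(x_train, y_train, lam)
    err = np.mean((poly_features(x_test) @ w - y_test) ** 2)
    print(f"lambda={lam}: coefficient norm={np.linalg.norm(w):.2e}, "
          f"test MSE={err:.4f}")
```

The unregularized fit is free to choose wild coefficients that thread the training points exactly; the penalty shrinks the solution toward something that behaves sensibly between samples. With more training data, point 1 kicks in as well and even the regularizer has less work to do.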
Empirically, this paper (http://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icm...) makes a reasonably compelling case that Naive Bayes is really not very good compared to anything that actually models cross-feature correlations. Theoretically, it's clear that Naive Bayes will fail in unboundedly bad ways given enough strongly correlated features (if I just duplicate a feature N times, I effectively multiply its coefficient by N without adding any new information).
Note: I believe there is a technical weakness in the paper due to how they quantized continuous variables for use in Naive Bayes, but the overall performance trends reported confirm my experience with modeling projects in the wild.
Edit: I realize that one might make the claim just in the context of huge data sets, but again you have to get lucky not to have strong correlation effects that other models would handle better.
Edit 2: Oh, I'm an idiot. You specifically said AI. I'll leave the comment as it was, because I often hear the "with enough data Naive Bayes is as good as anything else" story and hope to influence anyone who might be impressionable :-)