One of his biggest take-aways in building systems like the one in the linked paper was the need for layered understanding. For example, to understand the meaning of a passage of scanned text, you can't just look at each individual letter of the text. There is ambiguity in characters like l and I or 0 and o and O. To remove that ambiguity, you bring in more context by looking at a character in a word. Then a word within a sentence. Then a sentence within a passage. This is how they were able to build highly accurate systems without massive amounts of training data.
Now that I'm on a roll recollecting the teachings of Henry Baird I will add two more:
1) There is a difference between an algorithm and a heuristic that is often overlooked these days: algorithms are provably correct. If you can't prove it's correct, then it's a heuristic. By this measure, there are very few algorithms out there, because there are very few proofs of correctness. Most likely, anything you've ever called an algorithm that you've written yourself is actually a heuristic.
2) Because of the above, Baird was very fond of KNN classifiers, because they are one of the few classifiers that have any sort of provable guarantees -- namely that with infinite data, the error rate is guaranteed to be no worse than 2x the Bayes error.