The downside to any precomputation-based algorithm is that you must rerun the preprocessing code whenever the input changes (in this case, the dictionary), and you often must allocate extra storage to hold the preprocessed input.
This is a really interesting algorithm but, as always, you have to know the pros and cons and apply it where it fits (i.e. not somewhere that needs a compressed dictionary).
[1] http://www.codeproject.com/Articles/31669/Hierarchical-Tree-...
http://java.dzone.com/news/lucenes-fuzzyquery-100-times
Looks like some interesting research and dev going on in this space at the moment.
They don't mention the approach I'd take for a naive spellcheck:
Generate a (giant) trie from the dictionary. Then perform a DFS on the trie, allowing at most <the current closest word's number of edits> edits from the input word and keeping track of the edits made so far - at each node try no change first, then a deletion, then a substitution, then an insertion. This works better with a smaller alphabet.
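A minimal sketch of that idea in Python (not the Go code mentioned below; all names like `fuzzy_search` are mine). It uses a fixed edit budget rather than shrinking the bound as closer words are found, which is the extra pruning described above:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_word = False

def build_trie(words):
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root

def fuzzy_search(root, word, max_edits):
    """Return all dictionary words within max_edits edits of `word`."""
    results = set()

    def dfs(node, i, edits, prefix):
        if edits > max_edits:
            return  # prune: over the edit budget
        if i == len(word) and node.is_word:
            results.add(prefix)
        # 1) no change: match the next character exactly
        if i < len(word) and word[i] in node.children:
            dfs(node.children[word[i]], i + 1, edits, prefix + word[i])
        # 2) deletion: skip a character of the query word
        if i < len(word):
            dfs(node, i + 1, edits + 1, prefix)
        for ch, child in node.children.items():
            # 3) substitution: consume a query char, emit a different one
            if i < len(word) and ch != word[i]:
                dfs(child, i + 1, edits + 1, prefix + ch)
            # 4) insertion: emit a dictionary char, consume nothing
            dfs(child, i, edits + 1, prefix + ch)

    dfs(root, 0, 0, "")
    return results
```

For example, with a trie built from ["cat", "cart", "bat", "dog"], `fuzzy_search(trie, "cat", 1)` returns {"cat", "cart", "bat"}. The branching per node is bounded by the alphabet size, which is why a smaller alphabet helps.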
An optimization: a DAWG is potentially even better space-wise than a hashmap (effectively it ends up compressing the input data), but takes a while to precompute.
BTW - if anyone's interested in the Go implementation let me know, I'll try to find it and post it somewhere.
Am I mistaken?
Actually, even after reading the article I'm not completely certain where that 1000x comes from; I suppose it's in comparison to the naive approach.
> This is three orders of magnitude less expensive (36 terms for n=9 and d=2)
Three orders of magnitude is 1000. Peter Norvig's approach was described above with:
> still expensive at search time (114,324 terms for n=9, a=36, d=2)
So 114,324 / 36 ≈ 3,176 - hence "three orders of magnitude" - and I suppose he went conservative by saying "1000x" rather than "3000x".
Some compounding languages (Dutch, German) have an effectively infinite number of words, as you can take two nouns (e.g. a and b) and concatenate them to form a new one (ab means something different from ba). Some languages also have inflection....