1) Instead of using the full color pixel image, use an "edge image" with some simple additional normalizations. If color is important, do this per color channel.
2) Create a dataset with as many cropped examples of the target object as you can find (Mechanical Turk is useful for annotating large datasets); every other crop of every image is a negative example.
3) Train a classifier (SVM if you want it to work, neural network if you're so inclined) using this dataset.
4) Apply the classifier to all subwindows of a new image to generate hypotheses of the target object location. This can be sped up in various ways, but this is the basic idea.
5) Post-process the hypotheses using context (this can be as simple as keeping the most confident hypothesis within each neighborhood).
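A rough sketch of steps 1, 4, and 5 in plain Python, in case it helps make the pipeline concrete. Everything here is an assumption made for illustration: the function names are made up, the "edge image" is just a gradient magnitude with no extra normalization, and `score_fn` stands in for whatever classifier you trained in step 3.

```python
# Illustrative sketch only; names and details are assumptions, not a reference implementation.

def edge_image(img):
    """Gradient-magnitude edge map of a 2-D list of grayscale values."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image borders.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out

def sliding_windows(h, w, size, stride=1):
    """Yield the top-left (y, x) of every size-by-size subwindow."""
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield y, x

def detect(img, score_fn, size, stride=1):
    """Score every subwindow of the edge image; return (score, y, x) hypotheses."""
    edges = edge_image(img)
    hyps = []
    for y, x in sliding_windows(len(img), len(img[0]), size, stride):
        crop = [row[x:x + size] for row in edges[y:y + size]]
        hyps.append((score_fn(crop), y, x))  # score_fn: stand-in for the trained classifier
    return hyps

def suppress(hyps, radius):
    """Keep only the most confident hypothesis within each neighborhood."""
    kept = []
    for s, y, x in sorted(hyps, reverse=True):
        if all(abs(y - ky) > radius or abs(x - kx) > radius for _, ky, kx in kept):
            kept.append((s, y, x))
    return kept
```

Raising `stride` is the crudest of the speed-ups mentioned in step 4: it scores fewer windows at the cost of coarser localization.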
If you're interested in object detection, an excellent summary of the last decade of research is due to Kristen Grauman and Bastian Leibe: http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y2... (do some googling if you don't have access to this particular PDF).
A cool paper from a few months ago that should be mentioned when commenting on a post called "Where's Waldo?" is http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.ht...
Somehow I'm always surprised when two vision people agree on the right way to approach a problem =)
Waldo always seemed a bit of a strange name, and it still confuses me why it would be changed for the US market. Anyone know why? (Wiki doesn't say.)
I felt dirty with all the exclamation marks.
There's a danger of overfitting, where a technique works for one instance (or a subset of instances), but not in general. Detecting stripes could work in general, but as an SO commenter noted, "Where's Wally" images often include spurious stripes to undermine this detection strategy for humans.
I was impressed until I read that--the guy is basically fitting the model/procedure to the training set (of size 1). I'd wait for a more general approach before accepting the answer.
http://www.npr.org/blogs/waitwait/2011/12/18/143865340/the-w... via http://meta.stackoverflow.com/questions/116401/stack-overflo...
Template matching is your friend in this case, because most Waldos look similar. You already tried this in a basic way by searching for the stripes of a given color. You can make it more powerful by making the template include more properties, and work in more contexts. For instance: what if Waldo's a different size?
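To make the "what if Waldo's a different size?" point concrete, here is a minimal sketch of template matching over several scales using normalized cross-correlation. It is a toy under stated assumptions: images are 2-D lists of grayscale values, the template is only ever enlarged by integer factors with nearest-neighbour scaling, and `ncc`, `resize`, and `match` are names invented for this example.

```python
# Illustrative sketch only; a real matcher would use an image library and finer scales.

def ncc(a, b):
    """Normalized cross-correlation of two equal-length lists of pixel values."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [v - ma for v in a]
    db = [v - mb for v in b]
    denom = (sum(v * v for v in da) * sum(v * v for v in db)) ** 0.5
    # A flat window or template has zero variance; treat it as "no match".
    return sum(p * q for p, q in zip(da, db)) / denom if denom else 0.0

def resize(img, factor):
    """Nearest-neighbour upscaling by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in img for _ in range(factor)]

def match(image, template, factors=(1, 2, 3)):
    """Return (score, y, x, factor) of the best match over all scales."""
    best = (-1.0, 0, 0, factors[0])
    for f in factors:
        t = resize(template, f)
        th, tw = len(t), len(t[0])
        flat_t = [v for row in t for v in row]
        for y in range(len(image) - th + 1):
            for x in range(len(image[0]) - tw + 1):
                flat_w = [v for row in image[y:y + th] for v in row[x:x + tw]]
                best = max(best, (ncc(flat_w, flat_t), y, x, f))
    return best
```

Because the score is normalized, it ignores overall brightness and contrast changes, which is one way a template can "work in more contexts" than a raw color search.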
The other option is to pretend you don't know what Waldo looks like, find him in a bunch of images, label the subimages as "waldo" candidates, measure certain properties of those subimages, and find which coordinates of feature space they have in common. Then use these properties as your template.
Finally, you could train a classifier on subwindows like sergeyk suggested. This has some difficulty because Where's Waldo images are difficult to subdivide into subwindows on the scale of a single person. Do you move pixel by pixel? Do you divide it into a grid? Each grid cell will contain odd fragments of people. Etc. If you do find a way to divide the image into "people" -- perhaps by doing a preliminary "person"-template sweep that identifies locations of people in the image -- then you can use a supervised learning algorithm to say "yes, this person is waldo" or "nope, FRWONG!", based on the image properties in the subwindow around that person.
A good solution to this would get close, then calculate the probabilities of every "maybe-waldo" and then display the one with the highest probability of being Waldo. An augmented reality app that highlighted Waldo on every page would be awesome.
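One way to read "calculate the probabilities of every maybe-waldo" is sketched below. It assumes each candidate already has a raw detector score and that there is exactly one Waldo per page, so a softmax over the candidates is a reasonable normalization; both the softmax choice and the function names are assumptions for the sake of the example.

```python
import math

def softmax(scores):
    """Turn raw detector scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_candidate(candidates):
    """candidates: list of (label, raw_score); return (label, probability) of the winner."""
    probs = softmax([s for _, s in candidates])
    i = max(range(len(probs)), key=probs.__getitem__)
    return candidates[i][0], probs[i]
```

The winning probability doubles as a confidence value, so an app could decline to highlight anything when no candidate clearly stands out.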
I don't know how many variations on the /Where's Wa[a-z]+\?/ theme have actually been produced, though, so maybe it wouldn't be easier.
Then again, if you can upload unknowns, wait until you've got enough samples to generate confidence, and then store the result, it'd scale/perform much better :)
This article is not interesting because it's an amazing new algorithm or something that solves some important world problem. It's interesting because it takes something that the general hacker population doesn't know can do this sort of thing really easily, and gets the job done in a fairly simple way.
Don't be a grump, this is cool. :(
I mostly wanted to see who else remembered that particular Waldo puzzle...it was the final one in one of the books