"Perl is forgiving. Biological data is often incomplete, fields can be missing, a field that is expected to be present once occurs several times (because, for example, an experiment was run in triplicate) or the data gets entered by hand and doesn't quite fit the expected format. Perl doesn't particularly mind if a value is empty or contains odd characters. Regular expressions can be written to detect and correct a variety of common errors in data entry. Of course, this flexibility can also be a curse, as I'll discuss in more detail later."
A few words differ: the article says "triplicate" where p3ll0n says "duplicate", for example. But the two passages are similar enough to use as test input for a diff algorithm.
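As a rough sketch (not anything the commenter actually ran), a word-level diff makes exactly this kind of single-word substitution stand out, while still reporting a high similarity ratio. This uses Python's standard difflib; the two sentences are illustrative stand-ins for the passages above.

```python
import difflib

article = "an experiment was run in triplicate"
comment = "an experiment was run in duplicate"

# Compare word by word rather than character by character,
# so a one-word swap shows up as a single "replace" opcode.
a_words = article.split()
b_words = comment.split()
sm = difflib.SequenceMatcher(None, a_words, b_words)

for op, a1, a2, b1, b2 in sm.get_opcodes():
    if op != "equal":
        print(op, a_words[a1:a2], b_words[b1:b2])
# → replace ['triplicate'] ['duplicate']

# The overall similarity stays high because only one word changed.
print(round(sm.ratio(), 2))
# → 0.83
```

The same SequenceMatcher ratio applied to whole paragraphs is a crude but effective first-pass signal that one text was lightly paraphrased from another.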
EDIT: Also from this guy's comment history:
http://news.ycombinator.com/item?id=1456105
Some of the phrasing looks to have been copied and pasted from this article by Jonathan Ellis:
http://www.rackspacecloud.com/blog/2009/11/09/nosql-ecosyste...
I bet if you could make a bot to do this -- go out and find relevant information, and summarize it -- you could actually provide a serious public service. As long as it cited its sources, so it's not a plagiarism bot.