http://en.wikipedia.org/wiki/Billion_laughs
The same is mostly true of JSON parsers as well, of course.
If you let potentially hostile users feed arbitrary data into any of these, even a totally non-buggy, perfectly conformant parser is wide open to abuse via DoS.
My guess is that distinguishing 'legitimate' cases from 'attacks' is on par with solving the halting problem.
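To put a number on the DoS point, here is a rough Python sketch that computes how large a Billion-laughs-style set of nested entities would get without actually expanding it. The data layout and the names `expanded_size` and `billion_laughs` are made up for illustration and are not any real parser's API; it also assumes the entity definitions are acyclic, which XML requires anyway.

```python
def expanded_size(entities, name, _cache=None):
    """Return how many characters `name` expands to, without building the string.

    `entities` maps an entity name to a list of parts, where each part is
    ("text", literal) or ("ref", other_entity_name). Assumes no cycles.
    """
    if _cache is None:
        _cache = {}
    if name in _cache:
        return _cache[name]
    total = 0
    for kind, value in entities[name]:
        if kind == "text":
            total += len(value)
        else:
            # A reference costs as much as the entity it points to.
            total += expanded_size(entities, value, _cache)
    _cache[name] = total
    return total


def billion_laughs(levels=9, fanout=10):
    """Build hypothetical nested definitions: each level repeats the previous one."""
    entities = {"lol0": [("text", "lol")]}
    for i in range(1, levels + 1):
        entities[f"lol{i}"] = [("ref", f"lol{i - 1}")] * fanout
    return entities


entities = billion_laughs()
size = expanded_size(entities, "lol9")
print(f"{len(entities)} short definitions expand to {size:,} characters")
```

Ten tiny definitions blow up to about three billion characters, and every one of them is well-formed. A parser can refuse anything past some fixed budget, but that is a blunt cap rather than a way of telling an attack from a legitimately huge document.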
I know I shouldn't be. But here I am, surprised.
Someday, I'd like to see a nice library written in something like OCaml or Haskell and exposed through a C interface, one that exists solely to be a library. I know it's technically possible today, but it doesn't seem to have penetrated the public consciousness that even a C library doesn't actually have to be written in C anymore. (It may not be slick yet, but only because people aren't doing it: chicken and egg. There's no fundamental stopper.) I sure as hell wouldn't write a "C" library in C if I had anything remotely resembling a choice.
(Of course, that only solves string bobbles, not the "infinite memory consumption required" problem, but even then those other languages can offer somewhat cleaner, cleverer solutions than C.)
http://www.codenomicon.com/news/press-releases/2009-08-05.sh...
And a CERT-FI advisory:
http://www.cert.fi/en/reports/2009/vulnerability2009085.html
Also, the expat-bugs and expat-discuss mailing lists were very active in January/February with seemingly related issues:
http://mail.libexpat.org/pipermail/expat-bugs/2009-January/t...
http://mail.libexpat.org/pipermail/expat-discuss/2009-Februa...
"Targets: Anything that uses XML"
Wouldn't a better title be "XML Parser Flaws Doom Computing World"?