Poisonous MD5 – Wolves Among the Sheep

(blog.silentsignal.eu)

110 points | 4mnt | 10y ago | 52 comments

52 comments

WalterGR10y ago

The relevance of the article's mention of the "Flame" malware was puzzling, since no context is provided and the linked Wired article doesn't shed any light.

Wikipedia has this to say, which seems to solve that puzzle:

"Flame was signed with a fraudulent certificate purportedly from the Microsoft Enforced Licensing Intermediate PCA certificate authority. The malware authors identified a Microsoft Terminal Server Licensing Service certificate that inadvertently was enabled for code signing and that still used the weak MD5 hashing algorithm, then produced a counterfeit copy of the certificate that they used to sign some components of the malware to make them appear to have originated from Microsoft. A successful collision attack against a certificate was previously demonstrated in 2008, but Flame implemented a new variation of the chosen-prefix collision attack."

http://en.m.wikipedia.org/wiki/Flame_%28malware%29

WalterGR10y ago

Whoops - I didn't remember that Wikipedia uses a separate domain for mobile browsers.

Here's the 'real' link: http://en.wikipedia.org/wiki/Flame_%28malware%29

aylons10y ago

I'm no security expert, but I have a question.

In some systems I've built in the past I employed MD5 as a hashing mechanism to verify firmware integrity after flashing it to memory. I don't use MD5 for anything security related (that is handled in other ways, depending on the system), just to check transmission and memory integrity.

Is MD5 still considered fine for this, or is there a real risk that random or systematic (but unintentional) noise could generate a collision between corrupted and original data? I do believe it should suffice, but hearing all the badmouthing makes me wonder...

andrew-lucker10y ago

MD5 is good enough to prevent most random collisions. The problem is when you need to prevent intentional collisions.

innocenat10y ago

As a note, even CRC32 is enough to catch most random corruption.

cmdrfred10y ago

I'm no expert either but as I recall if you verify the length as well it should be almost impossible.
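
For readers following this sub-thread, here is a minimal sketch of the kind of non-adversarial integrity check being discussed: compare length, CRC32 and MD5 of the data read back from flash against values computed from the original image. The function names and the sample image are hypothetical, and this only guards against accidental corruption, not against someone crafting data on purpose.

```python
import hashlib
import zlib


def fingerprint(image: bytes) -> tuple[int, int, str]:
    """Return (length, CRC32, MD5 hex digest) for a firmware image."""
    # usedforsecurity=False (Python 3.9+) documents that MD5 is used here
    # as a plain checksum, not as a security mechanism.
    md5_hex = hashlib.md5(image, usedforsecurity=False).hexdigest()
    return len(image), zlib.crc32(image), md5_hex


def is_intact(readback: bytes, expected: tuple[int, int, str]) -> bool:
    """True when the data read back matches the expected fingerprint."""
    return fingerprint(readback) == expected


# Hypothetical firmware image and a copy with a single flipped bit.
original = bytes(range(256)) * 64
corrupted = bytearray(original)
corrupted[1000] ^= 0x01

expected = fingerprint(original)
intact_ok = is_intact(original, expected)
corrupted_ok = is_intact(bytes(corrupted), expected)
print(intact_ok, corrupted_ok)  # True False
```

A single flipped bit is always caught by CRC32 alone; the MD5 and length comparisons add margin against larger accidental damage.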

dperfect10y ago

Aren't all hashing algorithms vulnerable to collisions (albeit with different degrees of difficulty)? It sounds like the problem here is more related to the logic that relies on a hash alone to make important decisions.

Not saying that MD5 is a good choice in this case, just that we may be blaming the wrong thing.

mikeash10y ago

Your parenthetical is the key, though. There's a big difference between a hash algorithm where generating a collision requires a few minutes of work on a cheap computer (MD5, now) and a hash algorithm where generating a collision requires a computer the size of the universe operating for a trillion trillion years (any good cryptographically secure hash).

dperfect10y ago

Cool - didn't realize the difference was so great. I've always known that the good algorithms are better because they're more difficult to brute-force, but always wondered if it's just a matter of a few years before the "impossible" becomes possible. Your illustration helps clarify that improbability in my mind - thanks!
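
To put rough numbers on that gap, here is a back-of-the-envelope sketch. It assumes the generic birthday bound (about 2^(n/2) evaluations to find a collision in an ideal n-bit hash) and an arbitrary, made-up rate of 10^12 hashes per second; neither figure comes from the thread. Note that the practical MD5 attacks are far cheaper than MD5's generic bound, because they exploit weaknesses in the algorithm rather than brute force.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def birthday_work(bits: int) -> int:
    """Approximate evaluations for a generic collision on an ideal n-bit hash."""
    return 2 ** (bits // 2)


def years_at_rate(evaluations: int, hashes_per_second: float) -> float:
    """Wall-clock years needed at a fixed (assumed) hashing rate."""
    return evaluations / hashes_per_second / SECONDS_PER_YEAR


ASSUMED_RATE = 1e12  # hypothetical attacker throughput, hashes per second

for name, bits in [("128-bit (MD5-sized)", 128), ("256-bit (SHA-256-sized)", 256)]:
    work = birthday_work(bits)
    print(f"{name}: 2^{bits // 2} evaluations, "
          f"{years_at_rate(work, ASSUMED_RATE):.3g} years")
```

Under these assumptions the generic attack on a 256-bit hash lands on the order of 10^19 years, while a 128-bit output would already fall to brute force in under a year even if the algorithm had no structural flaws.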

gkoz10y ago

Did breaking MD5 require a computer the size of the universe 20 years ago?

malka10y ago

Yeah, but MD5 is broken in the sense that you can generate a collision on purpose.

kedean10y ago

The collision-resistance property that all good hashes should have (and md5 lacks) states that an attacker with an input and its hash cannot arbitrarily produce a second input with the same hash. The possibility of it happening in the wild will always exist with hashes by their finite nature, but the only way an attacker should be able to find collisions is by enumerating the input space (rainbow table generation).

jimrandomh10y ago

No, the property you describe is called "second-preimage resistance". Collision resistance is stronger; it states that an attacker should not be able to create a pair of inputs with the same hash. In the case of md5, creating a pair of inputs with the same hash is easier than creating another input with the same hash as something else which you didn't yourself generate.

The MD5 algorithm is known to lack collision resistance, but whether it has preimage resistance is less certain; mathematical advances have weakened its preimage resistance, but not yet to the point of demonstrating a practical preimage attack.
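
The distinction can be made concrete with a toy experiment. The sketch below uses a deliberately weak stand-in hash (SHA-256 truncated to 16 bits, an assumption chosen so that brute force finishes instantly) and counts how many attempts it takes to find any colliding pair versus a second input matching one fixed target. It illustrates the generic cost difference only and says nothing about the MD5-specific attacks.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of SHA-256 (bits <= 32). Deliberately weak."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - bits)


def find_collision(bits: int = 16) -> tuple[bytes, bytes, int]:
    """Find any two distinct inputs with the same toy hash (attacker picks both)."""
    seen: dict[int, bytes] = {}
    for tries in count(1):
        message = b"msg-%d" % tries
        h = tiny_hash(message, bits)
        if h in seen:
            return seen[h], message, tries
        seen[h] = message


def find_second_preimage(target: bytes, bits: int = 16) -> tuple[bytes, int]:
    """Find a different input matching the toy hash of a fixed, given target."""
    wanted = tiny_hash(target, bits)
    for tries in count(1):
        candidate = b"try-%d" % tries
        if candidate != target and tiny_hash(candidate, bits) == wanted:
            return candidate, tries


a, b, collision_tries = find_collision()
target = b"some file the attacker did not choose"
other, preimage_tries = find_second_preimage(target)
print("collision after", collision_tries, "tries")
print("second preimage after", preimage_tries, "tries")
```

For a 16-bit output one would expect a collision after a few hundred tries (around 2^8) and a second preimage after tens of thousands (around 2^16), which is the same square-root gap that separates the two properties at full size.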

hinkley10y ago

Yes, but the complexity of finding a collision in SHA1 is about 2^14 higher than MD5, and even SHA1 is being sunset by many people.

cm218710y ago

I don't get why it is a security problem that someone can manufacture false positives for an anti-virus. What is the benefit for a virus to have non-malicious code caught by the anti-virus?

False negatives would be more of an issue if the anti-virus has white lists and one can craft malware with the same MD5 signature as Microsoft Excel. But that's not what the article refers to.

MD5 is only broken if you want to use it as a non-reversible hashing algorithm or if you want to use it as an unforgeable signature. But it's perfectly fine for many other uses.

bariumbitmap10y ago

From the article:

  As you can see, binaries submitted for analysis are
  identified by their MD5 sums and no sandboxed execution is
  recorded if there is a duplicate (thus the shorter time
  delay). This means that if I can create two files with the
  same MD5 sum – one that behaves in a malicious way while the
  other doesn’t – I can “poison” the database of the product
  so that it won’t even try to analyze the malicious sample!

So it's a technique to get the scanner to ignore a malicious binary by constructing a non-malicious one with the same MD5 sum. This would be much harder if the scanner used a SHA-1 hash or similar.

cm218710y ago

But that's a white list. I thought anti-virus products rather work by black listing.

sarciszewski10y ago

sha256sum or b2sum (BLAKE2b) would be far better than sha1 :)
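
The poisoning idea quoted from the article can be sketched as a verdict cache keyed by a file hash. Everything here is hypothetical: `VerdictCache`, `analyze` and the `b"EVIL"` marker are illustrative names, and the "weak" key is just the first byte of MD5 so that a matching decoy can be brute-forced in a few hundred tries. A real attack would instead build both files together using an MD5 collision, which this sketch does not attempt.

```python
import hashlib
from itertools import count
from typing import Callable


def weak_key(data: bytes) -> str:
    """Stand-in for a broken hash: only the first byte of MD5 (8 bits)."""
    return hashlib.md5(data).hexdigest()[:2]


def strong_key(data: bytes) -> str:
    """Full SHA-256 digest as the cache key."""
    return hashlib.sha256(data).hexdigest()


def analyze(sample: bytes) -> str:
    """Hypothetical sandbox: flags anything containing the marker b'EVIL'."""
    return "malicious" if b"EVIL" in sample else "clean"


class VerdictCache:
    """Caches analysis verdicts by key; a known key skips analysis entirely."""

    def __init__(self, key_fn: Callable[[bytes], str]) -> None:
        self.key_fn = key_fn
        self.verdicts: dict[str, str] = {}

    def scan(self, sample: bytes) -> str:
        key = self.key_fn(sample)
        if key not in self.verdicts:  # only unseen keys get analyzed
            self.verdicts[key] = analyze(sample)
        return self.verdicts[key]


def craft_decoy(malware: bytes, key_fn: Callable[[bytes], str]) -> bytes:
    """Brute-force a benign file whose key matches the malware's key."""
    for n in count():
        decoy = b"harmless-%d" % n
        if key_fn(decoy) == key_fn(malware):
            return decoy


malware = b"EVIL payload"
decoy = craft_decoy(malware, weak_key)

poisoned = VerdictCache(weak_key)
poisoned.scan(decoy)                    # benign decoy is analyzed and cached
weak_verdict = poisoned.scan(malware)   # cache hit: malware is never analyzed

safe = VerdictCache(strong_key)
safe.scan(decoy)
strong_verdict = safe.scan(malware)     # different key, so it gets analyzed
print(weak_verdict, strong_verdict)     # clean malicious
```

The design point is that the cache key is the only thing standing between a submitted file and a reused verdict, so it needs to be a hash for which colliding files cannot be produced on demand.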

jimrandomh10y ago

You misunderstand. The researchers are presenting a way to manufacture false negatives for an anti-virus. It works by confusing antivirus vendors' infrastructure into thinking it's already analyzed an executable and found it to be innocent when it's really analyzed something else.

snorkel10y ago

The attack vector would be a malware binary crafted to have the same MD5 sig as a popular, already trusted app. But of course once the badware is caught, virus scanners could check other properties aside from the MD5 sig to flag a bad binary. I assume virus scanners use MD5 just as a fast prescreen scan, then do a few deeper checks on potentially bad binaries to make sure.

ryan-c10y ago

What you describe there would be a preimage attack[0], not a collision attack. There is no publicly known practical[1] preimage attack on MD5 at this time.

0. http://en.wikipedia.org/wiki/Preimage_attack

1. 2^123.4 complexity is not practical

makomk10y ago

This is actually one of the older and easier attacks against MD5; we've known this was possible for over a decade. Nowadays it's actually possible to do chosen-prefix attacks: you can literally take two arbitrary, unrelated files and append some data that makes them have the same MD5. So you don't even have to include the malicious code in the decoy file in any form anymore.
