How Not to Encrypt a File – Courtesy of Microsoft

(medium.com)

88 points | rakel_rakel | 8y ago | 61 comments

GreaterFool 8y ago

The author could spend less time bashing the original article and a little bit more explaining how to do things right.

This:

> Suggestion to use the encryption key as the IV

is a second sub-heading while the words "initialization vector" don't appear until much later. Initialization vector is pretty obvious, "IV" isn't.

Also the author spends time complaining that the original article misunderstands the use of an initialization vector while providing no explanation of how it should be used.

After reading the post I haven't learned anything useful other than that the original article was bad.

Bartweiss 8y ago

I... sort of have mixed feelings on this.

I agree that the article could do far more to explain what's good, both in content (talking about why these things are bad) and in style (defining all terms immediately).

But holy shit, the MSDN article is bad. It's so hideously bad that I think there's nontrivial social value in bashing it extensively to discourage people from writing docs like this without getting them sanity-checked.

In short, I think this article is largely useless to people reading guides and trying to avoid the pitfalls of the original source, but is aimed at people writing crypto guides who have no business doing so.

jlebar 8y ago

Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?

"This article on global warming could spend less time bashing governments for inaction and more time talking about how I can reduce my emissions."

"This bad restaurant review could spend less time bashing the chef's food and more time telling me where the good restaurants are."

Similarly maybe the author didn't explain what "IV" means because their audience understands that term.

"This article in CACM uses 'NVRAM' in the heading, while the words "non-volatile" don't appear until much later. Non-volatile is pretty obvious, 'NVRAM' isn't."

Lagged2Death 8y ago

> Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?

I wouldn't expect an explanation; I wouldn't say the author is obliged to that kind of effort.

I did sort of expect a link to an explanation, though.

At this writing, other comments in this very HN thread claim there are many intro-level explanations of IVs out there to choose from. They don't link to them either.

Hypertext is what makes the web special, you know? The article would be more useful with a link. Think of this: even this discussion, here on HN, would have been more fruitful if the author had included a link to some explanation.

> Similarly maybe the author didn't explain what "IV" means because their audience understands that term.

I have actually shipped a couple of products that made use of encryption packages, and I've never heard of an IV. Maybe the encryption advice I followed was terrible; maybe the instructions were terrible; maybe the packages were terrible. Maybe I'm an idiot suited only to the digging of ditches.

recursive 8y ago

I was in the audience, and if I ever knew how one should use an IV, I forgot. The article would have been more valuable to me if it gave a summary of what IVs are instead of what they aren't.

Sophira 8y ago

While I'm sure the article is correct, it doesn't even attempt to link to resources to say how these things are misunderstandings. For example, I myself don't really understand IVs, and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret, or why the IV isn't required to be able to decrypt the file again.

Regardless, it's obvious that bad encryption advice in an MSDN article is horrifying.

fpgaminer 8y ago

> I myself don't really understand IVs

Time to drop some knowledge!

IVs are used in a number of places in cryptography, so I'll just pick one (easy) example.

Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box. You input a key and an IV and out you get a really, really long stream of uniformly random bytes. (This is a simplification but sufficient here). ChaCha20 works in such a way that having any or all of the output stream doesn't help you figure out what the inputs were. It's irreversible. ChaCha20 is also deterministic; the same input will give the same output.

You can then use the output of random bytes to encrypt a message by XORing with your plaintext. To later decrypt, you feed the same key and IV, get the same stream, XOR the ciphertext with it, and by the property of XOR you'll get the plaintext.

Now why is there an IV? Let's consider a ChaCha without an IV. The system works like so:

    R = ChaCha(Key)
    Ciphertext = Message ^ R

So let's encrypt two different messages:

    R = ChaCha(Key)
    Ciphertext1 = Message1 ^ R
    R = ChaCha(Key)
    Ciphertext2 = Message2 ^ R

Notice how R is the same for both messages? Again, ChaCha is deterministic; the output is the same for the same inputs. Since the key is the same, R is the same. Now an attacker, knowing this, can do this:

    Q = Ciphertext1 ^ Ciphertext2

What does Q end up being? Let's look:

    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R ^ Message2 ^ R
    = Message1 ^ Message2

So Q ends up being equal to the XOR of the two messages. That's really bad. The XOR of two messages might be enough to tell the attacker what the messages are, especially if the messages are predictable (like English text). But maybe that's not scary enough. Well, there's another attack. What if you're encrypting a data format with a header? Headers often have the same data in the same places. So the attacker knows part of the message. Uh oh...

    R = Ciphertext1 ^ Message1

If the attacker knows the message (or any parts of it) they can recover the R of those parts. And now, since your key is always the same and your R is always the same, all the other messages you encrypt will have those bytes exposed.
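To make both attacks concrete, here is a minimal Python sketch. It does not use real ChaCha20: `keystream` is a hypothetical stand-in built on SHAKE-256 from the standard library, chosen only because it is deterministic like the black box described above. The key and messages are made up for illustration.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Stand-in for ChaCha(Key): deterministic pseudo-random bytes derived
    # from the key alone. NOT real ChaCha20, illustration only.
    return hashlib.shake_256(key).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    # Byte-wise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))


key = b"illustrative key, not for real use"
message1 = b"HEADER:v1;attack at dawn"
message2 = b"HEADER:v1;retreat at ten"

# Same key, no IV: both messages are XORed with the same R.
ciphertext1 = xor(message1, keystream(key, len(message1)))
ciphertext2 = xor(message2, keystream(key, len(message2)))

# Attack 1: XOR of the ciphertexts equals XOR of the plaintexts.
q = xor(ciphertext1, ciphertext2)
leaks_message_xor = q == xor(message1, message2)

# Attack 2: a known header reveals R for those bytes, which then
# decrypts the same positions of every other message.
known_header = b"HEADER:v1;"
recovered_r = xor(ciphertext1[: len(known_header)], known_header)
recovered_prefix = xor(ciphertext2[: len(known_header)], recovered_r)

print(leaks_message_xor)
print(recovered_prefix)
```

The attacker never touches `key` in either attack; only ciphertexts and the guessed header are needed.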

This is where IVs come in:

    R = ChaCha(Key, IV)

IV should be unique per message. That means that every R is different! None of the above attacks work anymore. XORing two ciphertexts together returns gibberish:

    R1 = ChaCha(Key, IV1)
    Ciphertext1 = Message1 ^ R1
    R2 = ChaCha(Key, IV2)
    Ciphertext2 = Message2 ^ R2

    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R1 ^ Message2 ^ R2

And if the attacker knows the message, all they can recover is R1 or R2 (or any R). But that's useless: since all your IVs are unique, that R will never be seen again.
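A small Python sketch of the same idea with an IV, under stated assumptions: `keystream` is again a hypothetical SHAKE-256 stand-in rather than real ChaCha20, and the IV is a random 16-byte value, which is just one way of getting per-message uniqueness.

```python
import hashlib
import secrets


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for ChaCha(Key, IV). The key length is prefixed so that
    # different (key, iv) pairs can never hash the same input.
    prefix = len(key).to_bytes(2, "big")
    return hashlib.shake_256(prefix + key + iv).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, message: bytes) -> tuple[bytes, bytes]:
    # A fresh IV per message; it is returned alongside the ciphertext
    # because the receiver needs it and it is not secret.
    iv = secrets.token_bytes(16)
    return iv, xor(message, keystream(key, iv, len(message)))


def decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    return xor(ciphertext, keystream(key, iv, len(ciphertext)))


key = secrets.token_bytes(32)
message1 = b"HEADER:v1;attack at dawn"
message2 = b"HEADER:v1;retreat at ten"

iv1, ciphertext1 = encrypt(key, message1)
iv2, ciphertext2 = encrypt(key, message2)

# With distinct IVs the keystreams differ, so the XOR trick yields noise.
xor_leaks = xor(ciphertext1, ciphertext2) == xor(message1, message2)
print(xor_leaks)
print(decrypt(key, iv1, ciphertext1))
```

Decrypting `ciphertext1` with `iv2` instead of `iv1` produces garbage, which is the sense in which the IV is always required.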

That's the point of IVs.

> why the IV isn't required to be able to decrypt the file again.

It is required. Obviously you need all the inputs to ChaCha to get the byte stream again, to decrypt the message.

Now sometimes the IV is known from the protocol. So say you're using ChaCha to encrypt network traffic. You might set the IV equal to the packet number. So both sides already know the packet number.

But you always need the IV to decrypt.
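As a rough sketch of the packet-number trick, assuming an 8-byte little-endian nonce (the original ChaCha20 takes a 64-bit nonce) and a hypothetical `Sender` class that is not taken from any real protocol:

```python
def nonce_from_packet_number(packet_number: int) -> bytes:
    # Encode the packet counter as an 8-byte little-endian nonce.
    # Both sides can compute this without sending the nonce at all.
    if not 0 <= packet_number < 2**64:
        raise ValueError("packet number does not fit in a 64-bit nonce")
    return packet_number.to_bytes(8, "little")


class Sender:
    # Hypothetical helper: hands out each packet number exactly once,
    # so no nonce is ever repeated under the same key.
    def __init__(self) -> None:
        self.next_packet = 0

    def next_nonce(self) -> bytes:
        nonce = nonce_from_packet_number(self.next_packet)
        self.next_packet += 1
        return nonce


sender = Sender()
nonces = [sender.next_nonce() for _ in range(3)]
print(nonces)
```

The counter must never reset while the same key is in use; otherwise a nonce repeats and the earlier attacks come back.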

> and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret,

Consider again ChaCha20 as a black box. Key+IV goes in, stream of bytes comes out. There's no way to reverse that without the key (and IV). Since the attacker doesn't know the Key, they can't reverse it. Knowing the IV doesn't help.

Another way to think about it is that, instead of accepting a 256-bit key and a 64-bit IV, it's really just a 320-bit key. Knowing 64 bits of a 320-bit key doesn't help break a cipher. The cipher is still 256 bits strong. So you can share the IV without affecting security.

BIG NOTE: It's important that an IV is always unique. If an IV is ever re-used, the above attacks become available again because R will be the same for two messages.

Hope that helps. This is only one way that IVs are used. In ChaCha20 it's called a nonce, because ChaCha20 is geared towards usage on network protocols where the above trick of using packet number is applied. For block ciphers there are various cipher modes that get used, and most of them need an IV. The purpose is always the same; to make this "session" of encryption unique.

There's another way to use IVs, and I think it reaffirms the concept of what an IV actually is. Let's say you have a cipher that only accepts a key! No IV (like AES). You still want to make your encryption sessions unique. A way to do that is this:

    TempKey = HMAC (IV, Key)

And then use TempKey. HMAC is a form of hash. In this case it lets us combine a Key and IV in an irreversible way, yielding a new key. TempKey will be the right size key for the cipher (say, 256 bits). What this is doing is giving us a unique key for every encryption session. And that's the heart of IVs. And in many ways, ChaCha20 is doing exactly that. It's hashing together Key and IV and using the output hash to generate a long stream of random data that can't be reversed back to the key+IV.
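A minimal sketch of that derivation with Python's standard `hmac` module. The choice of HMAC-SHA256, and of using the long-term key as the HMAC key with the IV as the message, are assumptions made here; the notation above doesn't pin down either. `derive_temp_key` is a hypothetical helper name.

```python
import hashlib
import hmac
import secrets


def derive_temp_key(master_key: bytes, iv: bytes) -> bytes:
    # HMAC-SHA256 keyed with the long-term key, with the IV as the message.
    # The 32-byte digest is the right size for a 256-bit cipher key.
    return hmac.new(master_key, iv, hashlib.sha256).digest()


master_key = secrets.token_bytes(32)
iv_a = secrets.token_bytes(16)
iv_b = secrets.token_bytes(16)

temp_key_a = derive_temp_key(master_key, iv_a)
temp_key_b = derive_temp_key(master_key, iv_b)

print(len(temp_key_a))
print(temp_key_a != temp_key_b)
```

The receiver recomputes the same TempKey from the shared key and the (public) IV, so only the IV travels with the message.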

(And in case you're wondering, yes, you can use a cryptographically secure hash function alone to build a stream cipher like ChaCha. It'll just be _really_ _really_ slow, because hash functions are really, really slow compared to ChaCha.)

teh_klev 8y ago

Thanks for spending the time explaining this.

UncleMeat 8y ago

Note that AES uses IVs in CBC mode. It is incorrect to say that AES does not use IVs.

jdcarter 8y ago

In addition to fpgaminer's excellent explanation, I highly recommend the book "Cryptography Engineering: Design Principles and Practical Applications" by Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno. It's an excellent overview of how to use crypto primitives and why to use them that way.

rakoo 8y ago

I know you're not just looking for answers but a pointer to some better documentation, and I can't provide you with those, but:

> why IVs shouldn't be considered secret

The less that is considered secret, the less that can be leaked and cause problems.

> why the IV isn't required to be able to decrypt the file again

The IV is required to decrypt the file again. In the linked document's design the IV is actually the encryption key, which means it is known by the receiver, which is why it's not included. But that is just a special case that should never be reproduced.

sixothree 8y ago

Agreed. Where is the pointer to the correct article to use when encrypting a file in C#?

pacaro 8y ago

Note: All my information re: Microsoft is from no later than 2013.

This is indicative of a classic challenge in the industry.

To ship code that uses crypto at Microsoft you have to go through an auditing process. To ship code that uses novel crypto, or works directly with crypto primitives, you have to be reviewed by a specialist crypto review board — that contains security and crypto people from across the company, names that you might know (e.g. Niels Ferguson was there last time I needed a review. Hi Niels!)

Samples and documentation aren't held to the same standard.

nailer 8y ago

Microsoft have already 404d the article: https://support.microsoft.com/en-us/help/307010

casparz 8y ago

Luckily we have a snapshot: https://web.archive.org/web/20170327154501/https://support.m...

bartread 8y ago

Also now dead - I just get a blank page apart from the header and footer.

nailer 8y ago

Here's the original Microsoft article:

https://gist.github.com/mikemaccana/badf6c16f203e05c02b42f93...

(disabled JS in DevTools, caught it from archive.org before JS to wipe it kicked in)

unscaled 8y ago

As someone in charge of reviewing all crypto code for a sizable chunk of my company, I've yet to see a single case of naive developers using encryption primitives correctly. To tell the truth, I don't think I've ever seen a single example of IVs used correctly.

At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC) with binary keys and secret static IV.

I'm still waiting for the developer that will botch AES-GCM with a random nonce so I can have first world problems, but we're not there yet.

I wanted to call Microsoft sneaky for pulling out this article, but considering basically every top-ranked "how do I encrypt with AES" question on StackOverflow is full of bad advice, I'm glad they at least did something.

jwilk 8y ago

The article says that DES "can be brute forced in a single digit number of days by a modern computer".

  2**56 keys / 9 days ≈ 92.7 Gkeys/s

Can modern computers actually compute DES that fast?
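The arithmetic can be checked with a few lines of Python. This is only a sketch; `required_rate` is a helper name made up here, and it assumes the whole 56-bit keyspace is searched (on average a key turns up after about half of it).

```python
DES_KEYSPACE = 2**56        # number of possible 56-bit DES keys
SECONDS_PER_DAY = 86_400


def required_rate(days: float) -> float:
    # Keys per second needed to try every DES key within the given time.
    return DES_KEYSPACE / (days * SECONDS_PER_DAY)


rate_9_days = required_rate(9)
print(f"{rate_9_days / 1e9:.1f} Gkeys/s")
```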

danbruc 8y ago

This benchmark [1] gives 196.2 GH/s for DES using 8 Nvidia GTX 1080 Ti and Hashcat 3.5. So while your average computer is probably not quite sufficient it is certainly in reach.

[1] https://gist.github.com/epixoip/ace60d09981be09544fdd3500505...
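Turning that benchmark figure into a time estimate, as a quick sketch: it assumes one hash in the benchmark corresponds to one DES key trial and that the rate is sustained for the whole search. `days_to_exhaust` is a helper name invented for this example.

```python
DES_KEYSPACE = 2**56
SECONDS_PER_DAY = 86_400
BENCHMARK_RATE = 196.2e9    # keys per second, the 8-GPU figure quoted above


def days_to_exhaust(rate: float) -> float:
    # Days needed to try every DES key at the given rate.
    return DES_KEYSPACE / rate / SECONDS_PER_DAY


worst_case_days = days_to_exhaust(BENCHMARK_RATE)
average_days = worst_case_days / 2   # expected time if the key is random

print(f"worst case: {worst_case_days:.2f} days, average: {average_days:.2f} days")
```

So that rig would need a bit over four days to cover the full keyspace, and about half that on average.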

mikeash 8y ago

Here's a project that did 1.4G/s on a single GPU five years ago:

https://www.reddit.com/r/crypto/comments/162ufx/research_pro...

Stick multiple modern GPUs in a machine and single digit days seems feasible.

CiPHPerCoder 8y ago

Yes: http://www.h-online.com/security/features/A-death-blow-for-P...

natch 8y ago

Another version of essentially the same article is still live here:

https://support.microsoft.com/en-us/help/301070/how-to-encry...

d--b 8y ago

Yep, all over the place:

https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKe...

EDIT: ok maybe not "all over the place", but it's been done.

Strategizer 8y ago

The article author is complaining about an MSDN article not being updated. The content even says it applies to VS 2005 at its highest. That's a hint of how old it is. Is he going to get the print version and complain about that next? If programmers are using this without thought, that is on them, not the example code.

cesarb 8y ago

Raymond Chen wrote some time ago about the variable quality of MS Knowledge Base articles: https://blogs.msdn.microsoft.com/oldnewthing/20060424-21/?p=...

BusinessInsider 8y ago

That's pretty disturbing. Though to be fair, the article in question was written a while ago (since it targets .NET 2005), and to be less fair, MS doesn't really review their documentation very well, at all.

duke360 8y ago

You are probably too young. In the past, when the internet wasn't so ubiquitous, having the MSDN documentation on CD was a lifesaver. The docs that have serious content today descend directly from those days; the rest, as others have already said, are just boilerplate autogenerated docs, which nobody maintains anymore simply because the technology moves too fast. So this doc page about the usage of DES is probably straight from 1990 or so... and in those days it was probably good enough.

TheSpecialist 8y ago

It does seem useless to make the IV the same as the key. But is there a reason making the IV the same as the key is worse than using 0 as an IV?

Just asking.

norcimo5 8y ago

To encrypt: tar cz foo | openssl aes-256-cbc -salt -out foo.enc

To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz

(foo can be a file or directory)

snakeanus 8y ago

This does not contain a MAC though, does it? Also why CBC? Why not CTR/GCM instead? And why AES256 instead of Chacha20-Poly1305 or some other modern AEAD?
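To illustrate why the missing MAC matters, here is a hedged Python sketch. It does not use AES-CBC or the `openssl` tool: the cipher is a hypothetical SHAKE-256 keystream stand-in, where flipping a ciphertext bit flips the same plaintext bit (CBC is malleable in a different, block-wise way). The encrypt-then-MAC layout with HMAC-SHA256 and a separate MAC key is an assumption made for the example.

```python
import hashlib
import hmac
import secrets


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in stream cipher (NOT AES-CBC); the key is a fixed 32 bytes here.
    return hashlib.shake_256(key + iv).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def compute_tag(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: authenticate the IV and the ciphertext.
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()


def verify(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(compute_tag(mac_key, iv, ciphertext), tag)


enc_key = secrets.token_bytes(32)
mac_key = secrets.token_bytes(32)
iv = secrets.token_bytes(16)

plaintext = b"pay alice 100"
ciphertext = xor(plaintext, keystream(enc_key, iv, len(plaintext)))
tag = compute_tag(mac_key, iv, ciphertext)

# The attacker knows neither key, but flips bits to turn "100" into "900".
forged = bytearray(ciphertext)
forged[10] ^= ord("1") ^ ord("9")
forged = bytes(forged)

# Without a MAC the forgery decrypts cleanly to a different message.
decrypted_forgery = xor(forged, keystream(enc_key, iv, len(forged)))
print(decrypted_forgery)

# With a MAC the receiver rejects it before decrypting.
print(verify(mac_key, iv, ciphertext, tag), verify(mac_key, iv, forged, tag))
```

AEAD modes such as AES-GCM or ChaCha20-Poly1305 bundle this integrity check into the cipher itself, which is the reason for suggesting them.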

norcimo5 8y ago

What are the advantages of GCM over CBC? And what's wrong with AES256?

snakeanus 8y ago

I feel disgusted after reading this. I wonder how many people applied the advice given by the original article because they made the bad decision to trust the official documentation by MS.

bartread 8y ago

Oh, come on: whatever Microsoft's faults might be they have a very long track record, stretching back decades, of providing overall high quality documentation for developers.

Yes, there are errors. Yes, sometimes there is deeply misguided advice. But, on the whole, MSDN and its ilk have helped me far more often than they've hurt me.

Key point: compared with much other vendor and OSS documentation, Microsoft are absolutely streets ahead.

setq 8y ago

Most of the documentation is boilerplate. There's very little real content now and most of it is filler.

yebyen 8y ago

The last time I really had to deal with a bad MSDN article was probably 2003 working on an ASP/VBScript application that used MSXML. I was in my last year of High School, working for a local bank doing things that were certainly above my pay grade, with extremely minimal support, in my glory days.

I remember getting 90% of the way through writing my application in VBScript and finding a piece of documentation about some XSD thing that I really needed to do to complete the tool. Lots of people were reporting a similar issue, and the support reply basically said, "get f'ed": this function works in the JScript implementation of MSXML but not in VBScript.

Sorry! Hope you have hundreds of spare hours to learn a new language and port over your entire codebase, because we're not fixing it.

Every time since then that I can remember referring to MSDN, I have found one post with my question, asked in clear terms that I could reach from a Google search... posted four years ago, with one or more replies that are almost always very obviously wrong, from an MS Certified Partner(TM).

Maybe some of their documentation is great! I have not had the fortune to encounter it.

While many open source projects have great documentation, and many others do not, the difference tends to be that if your Open Source project has bad documentation, or features that just plain don't work, you are free to read the source code and fix it yourself!

alistproducer2 8y ago

If I could down vote this twice I would. I can't tell you how much I've been on the phone with MS support to try and do basic things with their software but can't because a.) the only existing documentation is wrong or b.) there's no documentation. My company pays MS a lot of money to not be able to do basic things with its software.

nthcolumn 8y ago

MSDN, really? I never use it. I always end up someplace else. The information is there somewhere, maybe. I don't use other vendors enough to know, but MSDN sucks for me majorly. Not a Microsoft fan generally though, so maybe too much pain these past 30 years for an objective view.

wintorez 8y ago

I always look at Microsoft in order to learn how not to do anything /s

giancarlostoro 8y ago

>It’s a good thing the caesar shift isn’t available in their library or it would probably have ended up in this tutorial.

https://docs.python.org/2/library/codecs.html#python-specifi...

Python does rot13 :)
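For the curious, a short sketch of what that looks like in Python 3, where the codec is reached through `codecs.encode` (the linked page is the Python 2 documentation). Note that there is no key anywhere, and applying it twice gives the original text back.

```python
import codecs

title = "How Not to Encrypt a File"

# rot_13 is a text-to-text codec: each letter is shifted 13 places.
encoded = codecs.encode(title, "rot_13")

# Shifting by 13 twice is a full rotation of the 26-letter alphabet.
decoded = codecs.encode(encoded, "rot_13")

print(encoded)
print(decoded == title)
```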

proaralyst 8y ago

But that's in the codecs library, not a cryptography library.

Sean1708 8y ago

To be fair that's not a tutorial on how to encrypt and decrypt a file, it's a reference on the possible encodings you can use for a string.
