This:
> Suggestion to use the encryption key as the IV
is a second sub-heading while the words "initialization vector" don't appear until much later. Initialization vector is pretty obvious, "IV" isn't.
Also the author spends time complaining that the original article misunderstands the use of initialization vectors while providing no explanation of how they should be used.
After reading the post I haven't learned anything useful other than that the original article was bad.
I agree that the article could do far more to explain what's good, both in content (talking about why these things are bad) and in style (defining all terms immediately).
But holy shit, the MSDN article is bad. It's so hideously bad that I think there's nontrivial social value in bashing it extensively to discourage people from writing docs like this without getting them sanity-checked.
In short, I think this article is largely useless to people reading guides and trying to avoid the pitfalls of the original source, but is aimed at people writing crypto guides who have no business doing so.
"This article on global warming could spend less time bashing governments for inaction and more time talking about how I can reduce my emissions."
"This bad restaurant review could spend less time bashing the chef's food and more time telling me where the good restaurants are."
Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
"This article in CACM uses 'NVRAM' in the heading, while the words "non-volatile" don't appear until much later. Non-volatile is pretty obvious, 'NVRAM' isn't."
I wouldn't expect an explanation; I wouldn't say the author is obliged to make that kind of effort.
I did sort of expect a link to an explanation, though.
At this writing, other comments in this very HN thread claim there are many intro-level explanations of IVs out there to choose from. They don't link to them either.
Hypertext is what makes the web special, you know? The article would be more useful with a link. Think of this: even this discussion, here on HN, would have been more fruitful if the author had included a link to some explanation.
> Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
I have actually shipped a couple of products that made use of encryption packages, and I've never heard of an IV. Maybe the encryption advice I followed was terrible; maybe the instructions were terrible; maybe the packages were terrible. Maybe I'm an idiot suited only to the digging of ditches.
Regardless, it's obvious that bad encryption advice in an MSDN article is horrifying.
Time to drop some knowledge!
IVs are used in a number of places in cryptography, so I'll just pick one (easy) example.
Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box. You input a key and an IV and out you get a really, really long stream of random-looking bytes. (This is a simplification, but sufficient here.) ChaCha20 works in such a way that having any or all of the output stream doesn't help you figure out the key. It's irreversible. ChaCha20 is also deterministic: the same input will give the same output.
You can then use the output of random bytes to encrypt a message by XORing with your plaintext. To later decrypt, you feed the same key and IV, get the same stream, XOR the ciphertext with it, and by the property of XOR you'll get the plaintext.
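To make the "key and IV in, byte stream out, XOR with the plaintext" flow concrete, here's a minimal Python sketch. It assumes a stand-in keystream: `hashlib.shake_256` over key || IV, picked only so the example runs with the standard library. It is not ChaCha20 and not a vetted cipher, and the helper names are hypothetical.

```python
import hashlib
import os

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for ChaCha20 (NOT ChaCha20): SHAKE-256 over key || iv.
    # Deterministic: the same key and IV always give the same bytes.
    return hashlib.shake_256(key + iv).digest(length)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, iv: bytes, message: bytes) -> bytes:
    return xor_bytes(message, keystream(key, iv, len(message)))

def decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # XOR is its own inverse, so decrypting is the same operation
    # with the same key and IV.
    return xor_bytes(ciphertext, keystream(key, iv, len(ciphertext)))

key = os.urandom(32)  # 256-bit secret key
iv = os.urandom(12)   # per-message IV
ct = encrypt(key, iv, b"attack at dawn")
assert decrypt(key, iv, ct) == b"attack at dawn"
```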
Now why is there an IV? Let's consider a ChaCha without an IV. The system works like so:
R = ChaCha(Key)
Ciphertext = Message ^ R
So let's encrypt two different messages:
R = ChaCha(Key)
Ciphertext1 = Message1 ^ R
R = ChaCha(Key)
Ciphertext2 = Message2 ^ R
Notice how R is the same for both messages? Again, ChaCha is deterministic; the output is the same for the same inputs. Since the key is the same, R is the same. Now an attacker, knowing this, can do this:
Q = Ciphertext1 ^ Ciphertext2
What does Q end up being? Let's look:
Ciphertext1 ^ Ciphertext2
= Message1 ^ R ^ Message2 ^ R
= Message1 ^ Message2
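You can check that cancellation directly. This sketch uses the same kind of stand-in as a ChaCha with no IV (SHAKE-256 over the key alone, not real ChaCha), a fixed demo key, and two made-up equal-length messages:

```python
import hashlib

def keystream_no_iv(key: bytes, length: int) -> bytes:
    # Stand-in for ChaCha(Key) with no IV: output depends on the key alone.
    return hashlib.shake_256(key).digest(length)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = bytes(range(32))        # fixed demo key (hypothetical)
m1 = b"PAY ALICE $100.00"
m2 = b"PAY BOB   $999.99"

r = keystream_no_iv(key, len(m1))  # same R for both messages
c1 = xor_bytes(m1, r)
c2 = xor_bytes(m2, r)

# The attacker only needs the two ciphertexts, never the key.
q = xor_bytes(c1, c2)
assert q == xor_bytes(m1, m2)  # R cancels: Q = Message1 ^ Message2
```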
So Q ends up being equal to the XOR of the two messages. That's really bad. The XOR of two messages might be enough to tell the attacker what the messages are, especially if the messages are predictable (like English text). But maybe that's not scary enough. Well, there's another attack. What if you're encrypting a data format with a header? Headers often have the same data in the same places, so the attacker knows part of the message. Uh oh...
R = Ciphertext1 ^ Message1
If the attacker knows the message (or any parts of it), they can recover the R of those parts. And now, since your key is always the same and your R is always the same, all the other messages you encrypt will have those bytes exposed.
This is where IVs come in:
R = ChaCha(Key, IV)
IV should be unique per message. That means that every R is different! None of the above attacks work anymore. XORing two ciphertexts together returns gibberish:
R1 = ChaCha(Key, IV1)
Ciphertext1 = Message1 ^ R1
R2 = ChaCha(Key, IV2)
Ciphertext2 = Message2 ^ R2
Ciphertext1 ^ Ciphertext2
= Message1 ^ R1 ^ Message2 ^ R2
And if the attacker knows the message, all they can recover is R1 or R2 (or any R). But that's useless: since all your IVs are unique, that R will never be used again.
That's the point of IVs.
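Here's a rough sketch of the known-header attack with and without unique IVs. Again the keystream is a SHAKE-256 stand-in for ChaCha(Key, IV), not the real cipher, and the messages are made up:

```python
import hashlib
import os

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for ChaCha(Key, IV); NOT real ChaCha20.
    return hashlib.shake_256(key + iv).digest(length)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = os.urandom(32)
m1 = b"HEADER:v1;body=hello"
m2 = b"HEADER:v1;body=world"

# Unique IV per message, so R1 and R2 differ.
iv1, iv2 = os.urandom(12), os.urandom(12)
c1 = xor_bytes(m1, keystream(key, iv1, len(m1)))
c2 = xor_bytes(m2, keystream(key, iv2, len(m2)))

# Attacker knows m1 (say, a predictable header) and recovers R1...
r1 = xor_bytes(c1, m1)
# ...but R1 is useless against c2, which used a different keystream.
guess_unique = xor_bytes(c2, r1)

# Contrast: if iv1 were reused for m2, the same trick decrypts m2.
c2_reused = xor_bytes(m2, keystream(key, iv1, len(m2)))
guess_reused = xor_bytes(c2_reused, r1)

assert guess_reused == m2
assert guess_unique != m2
```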
> why the IV isn't required to be able to decrypt the file again.
It is required. Obviously you need all the inputs to ChaCha to get the byte stream again, to decrypt the message.
Now sometimes the IV is known from the protocol. So say you're using ChaCha to encrypt network traffic. You might set the IV equal to the packet number. So both sides already know the packet number.
But you always need the IV to decrypt.
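For the packet-number trick, one hypothetical way to turn a counter into a 96-bit nonce (the size used by the IETF ChaCha20 variant in RFC 8439) is to pad a 64-bit little-endian packet number with four zero bytes. The exact layout here is an assumption for illustration; a real protocol defines its own.

```python
import struct

def nonce_from_packet_number(packet_number: int) -> bytes:
    # Hypothetical layout: 4 zero bytes + 64-bit little-endian counter
    # = 12-byte (96-bit) nonce. Both sides know the packet number,
    # so the nonce never has to be sent. It stays unique only if the
    # counter never repeats under the same key.
    if not 0 <= packet_number < 2**64:
        raise ValueError("packet number out of range; rekey before wrapping")
    return b"\x00" * 4 + struct.pack("<Q", packet_number)

n0 = nonce_from_packet_number(0)
n1 = nonce_from_packet_number(1)
assert n0 != n1 and len(n1) == 12
```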
> and from my perspective I'm left with no clearer of an idea about why IVs shouldn't be considered secret,
Consider again ChaCha20 as a black box. Key+IV goes in, a stream of bytes comes out. There's no way to reverse that without the key (and IV). Since the attacker doesn't know the key, they can't reverse it. Knowing the IV doesn't help.
Another way to think about it is that, instead of accepting a 256-bit key and a 64-bit IV, it's really just a 320-bit key. Knowing 64 bits of a 320-bit key doesn't help break a cipher. The cipher is still 256 bits strong. So you can share the IV without affecting security.
BIG NOTE: It's important that an IV is always unique. If an IV is ever re-used, the above attacks become available again because R will be the same for two messages.
Hope that helps. This is only one way that IVs are used. In ChaCha20 it's called a nonce, because ChaCha20 is geared towards use in network protocols where the above trick of using the packet number is applied. For block ciphers there are various cipher modes in use, and most of them need an IV. The purpose is always the same: to make this "session" of encryption unique.
There's another way to use IVs, and I think it reaffirms the concept of what an IV actually is. Let's say you have a cipher that only accepts a key, with no IV (like AES). You still want to make your encryption sessions unique. A way to do that is this:
TempKey = HMAC (IV, Key)
And then use TempKey. HMAC is a keyed hash. In this case it lets us combine a Key and IV in an irreversible way, yielding a new key. TempKey will be the right size key for the cipher (say, 256 bits). What this is doing is giving us a unique key for every encryption session, and that's the heart of IVs. In many ways, ChaCha20 is doing exactly that: it's hashing together Key and IV and using the output to generate a long stream of random-looking data that can't be reversed back to the Key+IV. (And in case you're wondering, yes, you can use a cryptographically secure hash function alone to build a stream cipher like ChaCha. It'll just be _really_ _really_ slow, because hash functions are really, really slow compared to ChaCha.)
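The `TempKey = HMAC(IV, Key)` step might look like this with Python's standard `hmac` module. The notation above doesn't pin down argument order; this sketch assumes the long-term secret is the HMAC key and the IV is the message, which is the usual choice. HMAC-SHA256 yields 32 bytes, i.e. a 256-bit key. The function name is hypothetical, and a standardized take on this idea is HKDF (RFC 5869).

```python
import hashlib
import hmac
import os

def derive_session_key(key: bytes, iv: bytes) -> bytes:
    # TempKey = HMAC(IV, Key): the secret key keys the HMAC, the IV is the
    # message (argument order is an assumption). Output: 32 bytes.
    return hmac.new(key, iv, hashlib.sha256).digest()

key = os.urandom(32)
k1 = derive_session_key(key, os.urandom(16))  # fresh IV -> fresh session key
k2 = derive_session_key(key, os.urandom(16))
assert len(k1) == 32 and k1 != k2
```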
> why IVs shouldn't be considered secret
The less is considered secret, the less can be leaked and cause problems.
> why the IV isn't required to be able to decrypt the file again
The IV is required to decrypt the file again. In the linked document's design the IV is actually the encryption key, which means it is known by the receiver, which is why it's not included. But that is just a special case that should never be reproduced.
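A common pattern for files is to generate a fresh random IV, store it in the clear in front of the ciphertext, and read it back when decrypting. A minimal sketch, assuming a 16-byte IV and the same SHAKE-256 stand-in keystream (not a real cipher). It also has no integrity protection; a real design would add a MAC or use an AEAD mode.

```python
import hashlib
import os

IV_LEN = 16  # assumed IV size for this sketch

def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for a real cipher such as ChaCha20; illustrative only.
    return hashlib.shake_256(key + iv).digest(length)

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal_file_contents(key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(IV_LEN)  # fresh, non-secret IV per file
    ct = xor_bytes(plaintext, keystream(key, iv, len(plaintext)))
    return iv + ct           # IV stored in the clear, ahead of the data

def open_file_contents(key: bytes, blob: bytes) -> bytes:
    iv, ct = blob[:IV_LEN], blob[IV_LEN:]  # receiver reads the IV back
    return xor_bytes(ct, keystream(key, iv, len(ct)))

key = os.urandom(32)
blob = seal_file_contents(key, b"file data")
assert open_file_contents(key, blob) == b"file data"
```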
This is indicative of a classic challenge in the industry.
To ship code that uses crypto at Microsoft you have to go through an auditing process. To ship code that uses novel crypto, or works directly with crypto primitives, you have to be reviewed by a specialist crypto review board, which includes security and crypto people from across the company, names that you might know (e.g. Niels Ferguson was there last time I needed a review. Hi Niels!)
Samples and documentation aren't held to the same standard.
https://gist.github.com/mikemaccana/badf6c16f203e05c02b42f93...
(Disabled JS in DevTools and caught it from archive.org before the JS that wipes it kicked in.)
At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC) with binary keys and secret static IV.
I'm still waiting for the developer that will botch AES-GCM with a random nonce so I can have first world problems, but we're not there yet.
I wanted to call Microsoft sneaky for pulling out this article, but considering basically every top-ranked "how do I encrypt with AES" question on StackOverflow is full of bad advice, I'm glad they at least did something.
2**56 keys / 9 days ≈ 92.7 Gkeys/s
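That rate is just the DES keyspace divided by nine days in seconds; a quick check:

```python
keyspace = 2 ** 56            # DES has a 56-bit effective key
seconds = 9 * 24 * 60 * 60    # 9 days = 777,600 s
rate_gkeys = keyspace / seconds / 1e9
print(f"{rate_gkeys:.1f} Gkeys/s")  # prints 92.7 Gkeys/s
```

Note this is the rate needed to exhaust the whole keyspace in 9 days; on average a search finds the key after covering about half of it.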
Can modern computers actually compute DES that fast?
[1] https://gist.github.com/epixoip/ace60d09981be09544fdd3500505...
https://www.reddit.com/r/crypto/comments/162ufx/research_pro...
Stick multiple modern GPUs in a machine and single digit days seems feasible.
https://support.microsoft.com/en-us/help/301070/how-to-encry...
https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKe...
EDIT: ok maybe not "all over the place", but it's been done.
Just asking.
To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz
(foo can be a file or directory)
Yes, there are errors. Yes, sometimes there is deeply misguided advice. But, on the whole, MSDN and its ilk have helped me far more often than they've hurt me.
Key point: compared with much other vendor and OSS documentation, Microsoft are absolutely streets ahead.
I remember getting 90% of the way through writing my application in VBScript, then finding a piece of documentation about some XSD thing that I really needed in order to complete the tool. Lots of people were reporting the same issue as me, and the support reply basically said "get f'ed": this function works in the JScript implementation of MSXML but not in VBScript.
Sorry! Hope you have hundreds of spare hours to learn a new language and port over your entire codebase, because we're not fixing it.
Every time since then that I can remember referring to MSDN, I have found one post with my question, asked in clear terms that I could reach from a Google search... posted four years ago, with one or more replies that are almost always very obviously wrong, from an MS Certified Partner(TM).
Maybe some of their documentation is great! I have not had the fortune to encounter it.
While many open source projects have great documentation and many others do not, the difference tends to be that if an open source project has bad documentation, or features that just plain don't work, you are free to read the source code and fix it yourself!
https://docs.python.org/2/library/codecs.html#python-specifi...
Python does rot13 :)
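The link points at the Python 2 docs; in Python 3, rot13 still exists as a str-to-str text transform in the `codecs` module, reached through `codecs.encode`/`codecs.decode` rather than `str.encode`:

```python
import codecs

print(codecs.encode("Hello, HN", "rot13"))  # prints Uryyb, UA
```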