Show HN: SHA-256 explained step-by-step visually (opens in new tab)

(sha256algorithm.com)

1241 pointsmanceraio4y ago137 comments

137 comments

Oh this is great. When we taught SHA-256 last semester, we linked to this YouTube video: https://youtu.be/f9EbD6iY9zI. Next time we do it, we'll probably link to both. Having several different ways to visualize the same thing is very helpful, and I like that this one moves quickly.

A couple of details missing from this visualization are how you pad a message to be a multiple of the block size, and how you chain blocks together to form a longer message. In the pseudocode at https://en.wikipedia.org/wiki/SHA-2#Pseudocode, that's the "Pre-processing (Padding)" part and the "for each chunk" loop just below it. I get why you'd want to leave those things out, since they're not really the interesting part, and the screen is already pretty packed as it is.

If anyone's feeling curious about implementing this yourself, take a look at these project notes: https://github.com/oconnor663/applied_crypto_2021_fall/tree/.... At some point I'll clean that up for public consumption, but for now just ignore the parts about grades and cheating :)

manceraioOP4y ago

Thanks for the feedback and I am glad you'll use it for teaching (which was the main goal of this project)! The padding part it's briefly explained on the "dynamic" notes on the left column, but yes, can be improved. Typing on the input gives you some sense of what is doing on the background, specially if it jumps to two blocks.

The "for each chunk" is also implemented (which was one of the most difficult parts to synchronize with the UI), but I agree too, I should come up with some way to represent it better. Thanks again :)

fragmede4y ago

Minor nit: input could also take hex.

miki7254y ago

Was about to reply with the link to the project. If anyone is curious about sha2 highly highly recommend to go thorough the project. Jack did an amazing job explaining everything step by step. Writing the code really helps to understand all the concepts much better.

Drdrdrq4y ago

Thank you for the link to your repo, this it the first time I heard about length extension atracks. TIL, appreciate it! This SO answer explains them nicely, if anyone is curious: https://crypto.stackexchange.com/questions/3978/understandin...

Dowwie4y ago

What course did you teach?! Have you got a syllabus?

oconnor6634y ago

Applied Cryptography (CS-GY 6903) at NYU Tandon. You can find all our programming problem sets in the same repo: https://github.com/oconnor663/applied_crypto_2021_fall.

picture4y ago

So, how do people come up with these things? I assume every aspect of the design is carefully considered to defend it against various attacks. For example, why "right rotate 7 XOR right rotate 18 XOR right shift 3" and not "right rotate 2 XOR right rotate 3 XOR right shift 4"?

floodyberry-4y ago

Generally, come up with a set of operations that you want to use that meet your design criteria (fast in hardware, fast in software, fast in SIMD, whatever), then brute-force the instruction order and/or arbitrary constants (rotations) to minimize known attacks and maximize diffusion

e.g. the Chacha20 design document explains the choices made in a fairly high level and easy to understand way: https://cr.yp.to/chacha/chacha-20080128.pdf

dr_zoidberg4y ago

> (fast in hardware, fast in software, fast in SIMD, whatever)

"Slow to compute in GPUs" has been another design criterion for a while. It helps against brute force attacks. Of course, you might want your hash to be fast on GPUs too, if you're trying to accelerate something, bu cryptographic hashes usually aim for attack resistance.

2 more replies

edflsafoiewq4y ago

>why not "right rotate 2 XOR right rotate 3 XOR right shift 4"

In this case, you generally want shifts that are far apart so bits that far apart have a chance to influence each other. The other operations have only local effects on bits (and, or, xor affect only the same bit position; add affects the same bit position and perhaps a few that follow via carries). Only the shifts/rotates give a chance for "non-local" effects.

metalrain4y ago

Visualized like this it feels like security through obscurity, but there must be reason for this.

I did wonder why initialization is like:

1. Initialize hash value h0 to h7: first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19).

2. Initialize array of K constants: first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311

manceraioOP4y ago

https://news.ycombinator.com/item?id=23166713

It seems like a "nothing up my sleeve" way to initialize the variables.

orangepenguin4y ago

It's not security through obscurity. In fact, it's the very opposite. You can see the process exactly. The reason this is secure is because the process itself doesn't work backwards. You can create a hash using this algorithm, but you'll never reverse that hash back into the original text.

1 more reply

bsedlm4y ago

from what I've seen, there's a lot of "obscurity" to this; there are many seemingly arbitrary choices all over the place.

In the end most encryption algorithms boil down to doing 'random' (arbitrary, hard to justify why) things to data and then undoing them exactly in order to decrypt.

the math is all incredibly abstract but not all that complex, the high level of abstraction does make it quite difficult to grasp.

What's worse is that I fear there are incentives (mostly political/security interests) to keep the field small and to keep many people far away from this very practical use for all these beautiful, elegant, simple (but extremely abstract) mathematics (refering to the entire cryptography field).

7 more replies

tptacek4y ago

It's helpful to understand that the algorithm wasn't designed the way it's presented in this illustration, and consists of somewhat discrete components.

It's an iterated design, like a block cipher, meaning that it's built around a simple round function that's repeated a bunch of times on each input, rather than a super-complicated function run once.

It belongs to a large, important family of cryptographic hashes called Merkle-Damgård, which pads messages to a fixed block size, chunks them into blocks, feeds and feeds each block to an iterated "compression function" (the heart of the hash) that takes a block and the chained n-1th compression value and spits out the nth compression value. So a lot of the mechanism in this illustration is just the vanilla mechanics of an MD hash.

Iterated cryptographic designs generally have "schedule" or "expansion" functions that take the (small) input to the iterated function and "expand" it so that not all the iterations are working on exactly the same data; in the same way, a block cipher will have a "key schedule" that expands your 128-bit key into distinct "round keys" for each round. Message and key schedules, to me, feel like tiny little ciphers embedded in the larger cipher, and they're yet more of the complexity you're looking at, but can be studied independently (the SHA2 message schedule is apparently a big part of what makes it difficult to attack).

When you see magic numbers in a block cryptography design like this, two relatively safe bets you can make is that they're either "nothing up my sleeve" numbers chosen from e.g. mathematical constants, just for the sake of avoiding arguments, or that they're the product of statistical testing to maximize things like diffusion or minimize things like linear or differential characteristics.

With all block cipher cryptography one goal you always have is introducing nonlinearity; you can accidentally design a simple iterated function that is actually linear, and that cipher will be solvable simply with Gaussian elimination. People have shown me CTF levels that, for instance, linearized the AES round function. So when you see particular operations chained together in a cipher/hash design, keep in mind that they're balancing goals of (1) ensuring nonlinearity, (2) maximizing diffusion, the rate at which a single-bit change in the message totally scrambles the output in the shortest number of rounds, (3) optimizing metrics to avoid differential and linear cryptanalysis, (4) maximizing performance on target hardware.

As was suggested downthread: a good way to come at this stuff is to start with MD4, and then read about MD5 (vs. MD4), then SHA1, and then take a closer look at SHA2. This stuff didn't get figured out all at once, and all these hashes are related. You might find MD4 easier to get your head around.

For the linear and differential cryptanalysis stuff, which is surprisingly (given its age) important in cipher/hash design, a great starting point is the Heys tutorial, which is built around worked examples of both attacks:

https://ioactive.com/wp-content/uploads/2015/07/ldc_tutorial...

userbinator4y ago

You might find MD4 easier to get your head around.

You can even generate collisions for it by hand calculation:

https://www.iacr.org/archive/fse2007/45930331/45930331.pdf

sedatk4y ago

What a great question and what an excellent answer. Thank you!

dragontamer4y ago

1. You want an invertible operation. Invertible operations do _NOT_ lose information, and therefore have the maximum amount of entropy per step.

2. You want the invertible operation to pass a statistical test called "differential cryptography analysis". Over multiple rounds, it must be difficult / impossible to "predict" how 1-bit of difference changes the state. (ie: 1-bit of difference should lead to 50.0000000% change in every output bit. If its 50.1% or 49.9% change of bits, you fail because someone running cryptoanalysis will pick that up).

----------

#1 is somewhat easy. It turns out that Data XOR (Rotate-left Data) XOR (Rotate-right Data) is all you need to make an invertible operation. 5-operations, no more (any more is redundant and therefore unnecessary use of compute power), no less (any less is not invertible and therefore loses information / entropy each step).

#2 is complicated: you gotta understand differential cryptoanalysis and run a bunch of tests.

-----------

The discovery that (Data) XOR (Rotate-left Data) XOR (Rotate-right Data) was invertible became extremely popular in the 2010s through 2020s, and has become the cornerstone of XOR / Rotate / Add ciphers (aka: chacha), pseudorandom generators, and hash functions.

I don't know quite when it was originally discovered, but Jenkins was blogging about the importance of invertibility and playing with invertible xor/rotate stuff in (non-crypto) hash functions way back in the 90s.

I know Knuth's "Art of Computer Programming" book 2, Seminumerical Algorithms, discusses the importance of invertibility in random number generators, which is closely related to hashing / cryptographic procedures. So this "understanding" has been around for decades, but has only "become popular" in the SHA256 / Blake3 / pcg-random era.

------

In the 90s, ciphers were mostly using SBoxes for this step ("confusion", to grossly change a value to another value without losing information). But today, modern CPUs are much faster at add/xor/bitshift operations than reading/writing to memory. So SBoxes are no longer a high-speed methodology / primitive for these kinds of operations.

It makes more sense to change our algorithms to use a new "invertible confusion operation" (aka: what SBoxes did before, and what ((Data) XOR (Rotate-left Data) XOR (Rotate-right Data)) does today).

--------

EDIT: Remember: the modern crypto-primitive is just a combination of "confusion" principles and "diffusion" principles.

1. Confusion "lossless transforms" numbers into other numbers. (A "set permutation" of sorts)

2. Diffusion "moves" bits from one number into other numbers.

Iterating over confusion + diffusion many times (IIRC, SHA256 is 64 rounds) is all you need to make a cryptographic cipher. If you "just" need a pseudo-random number generator or hash function, maybe 5 to 10 rounds is all you need.

y424y ago

Not sure if I understand you right, but sha256 is not invertible, there are a couple of steps where you actually just 'cut off' information that is lost afterwards.

1 more reply

manceraioOP4y ago

Do you recommend any resources to learn about this?

1 more reply

kzrdude4y ago

MD5 and SHA-1 are predecessors, and they are simpler, so certainly by starting from the understanding of the simpler ones. Just to say it wasn't all conceived in one go.

less_less4y ago

You may not get an exact answer for SHA-1 or SHA-2, because they were designed by NSA, and I'm not sure whether a complete design doc has been published. SHA-3, by contrast, was designed in an open competition, and is completely different from SHA-1 and SHA-2. But here's a rough go at it.

First of all, the overall form. SHA-2 is a Merkle-Damgård construction, meaning that it has a "compression function" which takes an algorithm state and a message block, and outputs a new algorithm state. The final block gets some padding, so that eg "x" and "x\000" don't have the same hash, and then the final algorithm state is the output. (This is a known weakness: from H(x) you can calculate H(x concat padding(x) concat other stuff). SHA-3 doesn't have this weakness. This weakness doesn't itself break collision resistance, but it can be dangerous for other uses of SHA2 if you aren't careful.)

The compression function is a Davies-Meyer construction, which means that given an input state, it:

* Copies the state.

* Encrypts the state with a "cipher" of sorts (the "cipher" extracted from SHA-2 is known as SHACAL-2), using the message as a "key".

* Adds the copied state to the encrypted state to produce a new state.

Davies and Meyer proved that, for an ideal cipher, this would make a collision-resistant hash function. Note that because the update is a "cipher", it must be internally invertible. This is a good idea anyway, because any non-invertibility is by definition a collision, and you want to push that out as far as possible to make collisions hard to find. The step that isn't invertible is the final "add the copied state to the encrypted state".

Within that design, the SHA-2 cipher is based on a shift register. Basic shift registers have been used in crypto for many decades; traditionally you get almost all of the bits in the register by shifting its current state, and at least one (typically only the one shifting in) by computing a function of the other bits. To extend this to general-purpose computers, you can do the same thing for words (32-bit words for SHA-224 and -256 and 64-bit words for SHA-384 -512). So you have A shifting to B shifting to C etc, with a final value that's computed and shifted back into A. You can see there's a slight twist, which is that instead of E=D you get E=D+stuff. I expect that this mitigates issues where the same stuff is used in the same way throughout the whole round.

The "message schedule" is likewise based on a cipher's key schedule. The idea in a cipher is that each round needs a different version of the key to avoid "slide" attacks, so you mix the key during the round as well as mixing the cipher state. I'm not sure what the corresponding theory is for a hash function, but it's also pretty clear that you want a single byte of the message to be used in many places throughout the operation, so mixing the message is a must.

For the operations, the biggest constraint is that they need to be as nonlinear as cheaply, because doing a bunch of linear functions in a row still gets you a linear function, and those are easy to break. They need to be nonlinear both as bits and as integers, so a popular design is to mix addition (linear over ints) with xor (linear over bits) and rotations (linear over both, but you need it to propagate changes from the top of a number to the bottom). That way the whole operation won't be linear over bits, nor over the integers, but all three operations are cheap on PCs. This is called an "ARX" (Add-Rotate-Xor) design.

Picking the exact operations and rotation constants is something of a black art, and while you can trace descent from MD5-SHA0-SHA1-SHA2, the exact design criteria for the SHAs might not even be public. But you can see how some of the decisions may have been made. Consider the rotations in sigma0 and sigma1. The rotation constants in these designs are typically chosen so that if you look at any slice of the output, it will have many different input bits affecting it, which in turn means that small changes diffuse rapidly thoughout the design. This is not done perfectly in SHA2, in that the same bits are used for the Maj and Ch in successive iterations (i.e. you compute Maj(A,B,C) and then Maj(newA,A,B) next time since they shifted over), so you'd get correlated outputs of those steps from round to round, but apparently it's good enough.

The setup constants are chosen as the square and cube roots of prime numbers, as "nothing-up-my-sleeve" numbers. The idea is that if the initialization were random numbers with no rationale given, then it would be a possible place that NSA could have hidden a backdoor. But it would be difficult to create a backdoor where the initialization is the square roots of the primes.

dragontamer4y ago

About linearity, that just means that there's no "multiply" or "exponent" function equivalent to our primitives.

For example: if "add Foo" were our primitive, then 64-rounds of "add-Foo" could be broken by simply "data + 64 * Foo". IE: We found a "trivial shortcut" and someone else can break our cipher way faster than we can use it.

Addition is linear over numbers, because of multiplication.

XOR is linear over numbers: because XOR undoes itself. (5 x (XOR-Foo) is equivalent to 1x (XOR-Foo)).

It turns out that combining addition and XOR together results in a non-linear function over numbers and non-linear function over bits.

We can't "shortcut numbers", because the XOR-messes up our "number math". We can't "shortcut bits" because Add has this really annoying "carry" that basically has a 50% chance of shuffling bits around.

As such, we "expect" that combinations of Addition, XOR, and Shifts are non-linear. I'm not sure if it's been mathematically proven however, but no one has been able to show a "shortcut" thus far.

taviso4y ago

That's really cool. I made a terrible one for SHA1 years ago, yours is 1000x better.

https://lock.cmpxchg8b.com/sha1/visualize.html

I read a paper at the time where someone described a tool they made to find a near-collision, they explained they were just flipping bits and visually observing the affects. That sounded kinda fun, but they didn't release it, so I tried to replicate it from their description!

dragontamer4y ago

Your tool seems to understand the gist of differential cryptography better though.

You can track a 1-bit change or 3-bit change to "M" and see how it propagates through the SHA256 rounds in your tool.

----------

So your tool is probably better at understanding the underlying design of SHA2. We know that SHA2 was created well into the era of differential-analysis for example, so the designers would have inevitably done analysis similar to how your tool works.

mabbo4y ago

Before watching this: "Why can't cryptographers just figure out some tricks to crack these hash algorithms?"

After watching this: "How can any cryptographer EVER figured out any trick to crack these hash algorithms?!"

mynameismon4y ago

> "How can any cryptographer EVER figured out any trick to crack these hash algorithms?!"

More so, "How can any cryptographer EVER figure out any trick to come up with these hash algorithms?!" Seriously, they are incredibly impressive mathematical algorithms. Even coming up with an algorithm that is able to show the avalanche effect is mind boggling. To make sure that the algorithm is not biased to a set of samples AND shows the avalanche effect is tremendously mind blowing.

sammyo4y ago

They can't. But there are certainly unsavory actors that are able to trick a certain percentage of the unsuspecting to just give them a private key.

selestify4y ago

They actually can, in some situations: https://security.googleblog.com/2017/02/announcing-first-sha...

userbinator4y ago

This reminds me that I've always wanted to make a huge interactive combinatorial circuit that computes SHA-256 and shows all its internal state, then put it on a site with the claim that anyone who can make its output match a certain clearly-constructed value (e.g. 0123456...ABCD...) will win a prize. No mentions of hash algorithms or other such phrasing to deter anyone. I wonder how many people would try such a "logic puzzle", how much time they'd spend on it, and if we might even get the first successful preimage attack from that.

jjeaff4y ago

I think making a problem more accessible like that is the fastest path to a solution.

It reminds me of the Stephen J Gould quote:

"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops."

manceraioOP4y ago

I actually started this because my first idea was: can I implement SHA-256 with just TTL logic gates? which should be possible, but it would take months to do.

for puzzles try 12037465 for some coffee :)

nayuki4y ago

My way of explaining step by step visually is by implementing in Excel: AES https://www.nayuki.io/page/aes-cipher-internals-in-excel ; DES https://www.nayuki.io/page/des-cipher-internals-in-excel .

Also relevant: https://www.righto.com/2014/09/mining-bitcoin-with-pencil-an...

y424y ago

There also exists a written description showing the process in Python, step by step, which I consider more helpful, because you do not need to stop and play the video.

https://nickyreinert.medium.com/wie-funktioniert-der-sha256-...

edwnj4y ago

not everybody speaks barbarian sir

y424y ago

Python that bad?

2 more replies

p1mrx4y ago

Can it be proven whether values of m exist such that SHA256(m) == 0?

If I were omnipotent and wanted people to believe in me, I would write a book that hashes to 0, so that anyone could verify its authenticity.

userbinator4y ago

One way to prove it might be to actually find the "null hash"; turn it into a challenge/puzzle, don't mention hashing or crypto or that it's really hard, and let all the bored people of the world play with it. Perhaps someone might notice something that all the highly-trained cryptographers have missed all along: https://news.ycombinator.com/item?id=30245419

orangepenguin4y ago

Maybe I'm being incredibly naive, but it seems like this would be trivial. Can you just start with the output hash and then essentially run the algorithms backwards? Obviously the resulting "input" would be random-ish garbage, but it seems like if all you care about is the output, you can pretty much just "pick" any data for the last step that produces the output. Then do likewise for the step prior, and so on.

arcastroe4y ago

As a comment above stated, part of the "input" is the initialized values:

> Initialize hash value h0 to h7: first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19).

My guess is h0 to h7 change throughout the algorithm. If you perform each step in "reverse" as you suggest, "picking" any input at each step that produces the required output for that step, then you may not arrive to the correct initial state with the square roots of the first 8 primes.

You'll arrive at "random-ish garbage".

1 more reply

y424y ago

Sure, that would be a pretty high difficult factor, but its possible.

p1mrx4y ago

But is it provably possible for SHA-256?

An n-bit cryptographic hash function should ideally cover every n-bit output value, given slightly more than n bits of input, but I don't know whether this has been proven for any real-world functions.

2 more replies

picture4y ago

You could in theory also get that with a lot of computation of course

p1mrx4y ago

It should be straightforward to design a hash function that can't be preimage attacked using all the computing power in the universe.

Maybe an omnipotent cryptographer would find flaws in SHA-256, but then they could design a better function and include it in the book.

1 more reply

ypcx4y ago

Similar project which visualizes SHA-256 into terminal: https://github.com/in3rsha/sha256-animation

spdebbarma4y ago

This comes to my attention at a really convenient time. As a teenager, I initially got interested in Computer Science due to cryptography. Over a decade later, I've gotten into the subject for the first time since then.

For the last few days, I've been writing my own encryption for fun even though it's 100% not secure enough or powerful. My belief is that even though it's not super useful, the experience of attempting to write one is teaching me a lot more than I would have by simply studying it.

zahllos4y ago

Rather than write crypto, what I'd actually recommend is to break it. Schneier put together a block cipher cryptanalysis course a long time ago and while I don't usually recommend his crypto books these days, the course is good: https://www.schneier.com/academic/archives/2000/01/self-stud... (in this case, his crypto book might actually be useful, because it documents some of these out of date ciphers. There's (was?) a mistake in the DES code though iirc).

It is essentially a tour of all the early cryptanalysis literature, complete with suggestions of ciphers to target (e.g. FEAL). This will give you a sense of how the attacks work. Many of the ciphers are old, but I wouldn't let that put you off.

The study technique for this would be a) implement each cipher with toggles to control rounds and any other features, then implement attacks. Most of the papers should be open access by now since the 'course' was written in the year 2000. You could also 'catch up' on the constructions and attacks that have come out since.

I would caveat this with: what I am advising is purely for potential interest. Bear in mind there is little need to implement new block ciphers these days (what I'm saying is: this is a very specialized skill and most people won't find jobs in it).

brk4y ago

How long before we see this website as the source for some "hacker sequence" in a movie where a person wearing a black hoodie states they are "... working on cracking their SHA-256 encryption, should only take a sec."

can16358p4y ago

If such a movie even mentions SHA-256 I think that's above average on its own.

breakingcups4y ago

That depends..

"Just a second, I need to backdoor the SHA256 rootkit to penetrate the directory. Shit, they have an X62 firewall. Luckily I brought my pentester."

2 more replies

reincarnate0x144y ago

Syntax-highlighted ALGOL-y code blocks seem to be back in style if they're not going with the constant "toss that screen up to a hologram" bit.

Sometime there will be a nice interview of such on the design that goes into that, not necessarily for "hacker sequences" but general imaginary computer interfaces like https://www.hudsandguis.com/home/2021/theexpanse .

DJPocari4y ago

This is fantastic. I once implemented SHA-256 in Google Sheets to visualize it, but it had horrible performance compared to this. This is the best visualization I've seen yet.

westurner4y ago

SHA2: https://en.wikipedia.org/wiki/SHA-2

https://rosettacode.org/wiki/SHA-256

Hashcat's GPU-optimized OpenCL implementation: https://github.com/hashcat/hashcat/blob/master/OpenCL/inc_ha...

Bitcoin's CPU-optimized sha256.cpp, sha256_avx2.cpp, sha256_sse4.cpp, sha256_sse41.cpp: https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sh...

https://github.com/topics/sha256 https://github.com/topics/sha-256

Cryptographic_hash_function#Cryptographic_hash_algorithms: https://en.wikipedia.org/wiki/Cryptographic_hash_function#Cr...

Merkle–Damgård construction: https://en.m.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_...

(... https://rosettacode.org/wiki/SHA-256_Merkle_tree ... Merkleized IAVL+ tree that is balanced with rotations in order to optimize lookup,: https://github.com/cosmos/iavl

Self-balancing binary search tree: https://en.wikipedia.org/wiki/Self-balancing_binary_search_t... )

daenz4y ago

You should be very satisfied with how well you conveyed the algorithm. Well done. What was your approach to arriving at this result?

manceraioOP4y ago

Thanks :) I first implemented sha256 in js to understand its inner workings. Then I started displaying its variables with react and adding this stepped mechanism. Finally, I added some notes on the left to add some context of what is going on.

daenz4y ago

Very nice. Have you built any other algorithm visualizations? I have a very strong interest in how algorithms are visualized so that they are more easily understood.

1 more reply

chris_l4y ago

Someone should make a collection of the various visualization web projects that crop up here from time to time.

jonathanyc4y ago

I love single-purpose websites like this that put a potentially complex implementation behind an elegantly simple interface. This website’s design and styling are pretty too :) Another useful one is https://www.h-schmidt.net/FloatConverter/IEEE754.html . I’d say even https://godbolt.org/ counts!

bmitc4y ago

Does anyone have any good references, preferably a book but a good detailed website is fine, on cryptography, hashing, public/private keys, tokens, encryption, etc. as it relates to a software engineer? I don't necessarily want to know all the nitty gritty details of how these things are implemented. Rather, I think I would prefer simply understanding them and how to use them, piece them together, etc. to build something out of them.

I just have very little knowledge in this area. I'm going through a how to build a blockchain book right now, and I find myself struggling a little bit where I'm just calling some library functions but not necessarily knowing how to compose things properly.

manceraioOP4y ago

For that I like this one: https://cryptobook.nakov.com/

bmitc4y ago

Thanks!

lanecwagner4y ago

I wrote this one: https://app.qvault.io/course/6321ddbf-49eb-4748-9737-6bc12e8...

it's a crypto course where you write the solutions in Go. you might enjoy it :)

tptacek4y ago

JP Aumasson is one of the authors of the BLAKE hashes and wrote "Serious Cryptography":

https://www.amazon.com/Serious-Cryptography-Practical-Introd...

bmitc4y ago

Thanks!

anonymousDan4y ago

I have an odd request regarding e.g. SHA-3. Can anyone tell me if it is implemented in a way that is in a sense 'one-pass' over its input, i.e. each byte of its input in memory is accessed only once, after which all of the algorithm state is held in registers and the original input is never accessed again? My scenario is one where I'm concerned about TOCTOU-like attacks on the memory where the input is stored, but I don't want to pay the overhead of first copying the whole input to a 'safe' memory location, e.g. imagine I have kernel code wanting to compute a hash over data stored in userspace.

oib4y ago

Yes, sha3 reads every input byte only once. It does hold a pretty large internal state that doesn't fit in only registers.

a-dub4y ago

this is funny. when i first learned the algorithm, i found some matlab code that computes it with bit vectors. i added support for displaying them as an image and used the movie feature to generate step by step movies to build intuition.

nice to see someone build something polished that visualizes it in the same way. once you look at the mechanics for each round of the compression function and see the bits get swirled around for yourself, it starts to make intuitive sense.

the other big intuitions are of course, the trapdoor nature of add mod 2^32 (which is implicit in unsigned integer overflow on many machines) and the fact that some operations (like xor) operate in galois field 2, while others (like addition) operate in galois field 32 and the repeated stacking of the operations in different fields gives the function it's nonlinear trapdoor property.

i remember reading a pretty good paper on the arx (add, rotate, xor) family of ciphers back in the day (sort of in the vein of, is that all you need?)...

Darkphibre4y ago

Man, this is amazing. I had to hand-unroll bit packing in a binary encoding scheme we used in a game. Rare enough that making a tool wasn't worth it, but damn I love your visualizations! Doing something like that would have helped others understand how I was "seeing the matrix."

recursive4y ago

On the third step(?) of the second step, it says "Copy 2nd chunk into 1st 16 words", but it's accompanied by a visualization of copying the 1st chunk into the 1st 16 words. Am I just totally misunderstanding something?

seumars4y ago

Fantastic presentation! The utility functions from the source code are just as useful.

ansible4y ago

Is there a library or application that can take an annotated algorithm, and then generate a website like this one? That would be great for beginning CS and the sorting algorithms and other basic data structures too.

abrookewood4y ago

Looks fantastic, but the only thing missing is why each step is done.

fthtls4y ago

great visualization. i've also checked the source code and utility functions. they are very well defined and useful too.

i've coded a sha256 decrypter recently which uses dictionary attack and brute force. I read lots of articles about sha256 while coding this tool. there were still some missing parts on my mind, but your project clarified all.

btw, the decrypter i coded -> https://10015.io/tools/sha256-encrypt-decrypt

based24y ago

https://en.wikipedia.org/wiki/Secure_Hash_Algorithms

anandsuresh4y ago

Pretty cool. Have been looking for something like this for a while. Thanks for building it.

Just sent you a PR for some typos I found while running through an example.

M4tze4y ago

Great visualization. Might become my new screensaver.

colejohnson664y ago

One could even feed the hash back into itself to get a different result each time

em3rgent0rdr4y ago

how long would the cycle last before it starts repeating?

2 more replies

stevofolife4y ago

Well made! Can you share how you made the website?

sylware4y ago

Can we have a video of this on youtube/dailymotion/vimeo/etc which we can download with yt-dlp?

haunter4y ago

You can screen record the full screen browser window with OBS (with the mouse disabled too)

reincarnate0x144y ago

That's super cool. Visualized or illustrated algorithms have always looked so magical, to me at least.

nwatab4y ago

Looks bautiful. I understand it is really complicated (even I know calculation of SHA256)

dicroce4y ago

Please do this for b+trees. :)

hombre_fatal4y ago

Beautiful color palette.

berta4y ago

This is awesome!

iqanq4y ago

Two of the buttons at the top have no "title" attribute and therefore no tooltip.

jerpint4y ago

Very satisfying!

j / k navigate · click thread line to collapse

137 comments

oconnor6634y ago

manceraioOP4y ago

fragmede4y ago

Minor nit: input could also take hex.

miki7254y ago

Drdrdrq4y ago

Dowwie4y ago

What course did you teach?! Have you got a syllabus?

oconnor6634y ago

Applied Cryptography (CS-GY 6903) at NYU Tandon. You can find all our programming problem sets in the same repo: https://github.com/oconnor663/applied_crypto_2021_fall.

picture4y ago

floodyberry-4y ago

e.g. the Chacha20 design document explains the choices made in a fairly high level and easy to understand way: https://cr.yp.to/chacha/chacha-20080128.pdf

dr_zoidberg4y ago

> (fast in hardware, fast in software, fast in SIMD, whatever)

2 more replies

edflsafoiewq4y ago

>why not "right rotate 2 XOR right rotate 3 XOR right shift 4"

metalrain4y ago

Visualized like this it feels like security through obscurity, but there must be reason for this.

I did wonder why initialization is like:

1. Initialize hash value h0 to h7: first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19).

2. Initialize array of K constants: first 32 bits of the fractional parts of the cube roots of the first 64 primes 2..311

manceraioOP4y ago

https://news.ycombinator.com/item?id=23166713

It seems like a "nothing up my sleeve" way to initialize the variables.

orangepenguin4y ago

1 more reply

bsedlm4y ago

from what I've seen, there's a lot of "obscurity" to this; there are many seemingly arbitrary choices all over the place.

In the end most encryption algorithms boil down to doing 'random' (arbitrary, hard to justify why) things to data and then undoing them exactly in order to decrypt.

the math is all incredibly abstract but not all that complex, the high level of abstraction does make it quite difficult to grasp.

7 more replies

tptacek4y ago

It's helpful to understand that the algorithm wasn't designed the way it's presented in this illustration, and consists of somewhat discrete components.

It's an iterated design, like a block cipher, meaning that it's built around a simple round function that's repeated a bunch of times on each input, rather than a super-complicated function run once.

https://ioactive.com/wp-content/uploads/2015/07/ldc_tutorial...

userbinator4y ago

You might find MD4 easier to get your head around.

You can even generate collisions for it by hand calculation:

https://www.iacr.org/archive/fse2007/45930331/45930331.pdf

sedatk4y ago

What a great question and what an excellent answer. Thank you!

dragontamer4y ago

1. You want an invertible operation. Invertible operations do _NOT_ lose information, and therefore have the maximum amount of entropy per step.

----------

#2 is complicated: you gotta understand differential cryptoanalysis and run a bunch of tests.

-----------

------

It makes more sense to change our algorithms to use a new "invertible confusion operation" (aka: what SBoxes did before, and what ((Data) XOR (Rotate-left Data) XOR (Rotate-right Data)) does today).

--------

EDIT: Remember: the modern crypto-primitive is just a combination of "confusion" principles and "diffusion" principles.

1. Confusion "lossless transforms" numbers into other numbers. (A "set permutation" of sorts)

2. Diffusion "moves" bits from one number into other numbers.

y424y ago

Not sure if I understand you right, but sha256 is not invertible, there are a couple of steps where you actually just 'cut off' information that is lost afterwards.

1 more reply

manceraioOP4y ago

Do you recommend any resources to learn about this?

1 more reply

kzrdude4y ago

MD5 and SHA-1 are predecessors, and they are simpler, so certainly by starting from the understanding of the simpler ones. Just to say it wasn't all conceived in one go.

less_less4y ago

The compression function is a Davies-Meyer construction, which means that given an input state, it:

* Copies the state.

* Encrypts the state with a "cipher" of sorts (the "cipher" extracted from SHA-2 is known as SHACAL-2), using the message as a "key".

* Adds the copied state to the encrypted state to produce a new state.

dragontamer4y ago

About linearity, that just means that there's no "multiply" or "exponent" function equivalent to our primitives.

Addition is linear over numbers, because of multiplication.

XOR is linear over numbers: because XOR undoes itself. (5 x (XOR-Foo) is equivalent to 1x (XOR-Foo)).

It turns out that combining addition and XOR together results in a non-linear function over numbers and non-linear function over bits.

As such, we "expect" that combinations of Addition, XOR, and Shifts are non-linear. I'm not sure if it's been mathematically proven however, but no one has been able to show a "shortcut" thus far.

taviso4y ago

That's really cool. I made a terrible one for SHA1 years ago, yours is 1000x better.

https://lock.cmpxchg8b.com/sha1/visualize.html

dragontamer4y ago

Your tool seems to understand the gist of differential cryptography better though.

You can track a 1-bit change or 3-bit change to "M" and see how it propagates through the SHA256 rounds in your tool.

----------

mabbo4y ago

Before watching this: "Why can't cryptographers just figure out some tricks to crack these hash algorithms?"

After watching this: "How can any cryptographer EVER figured out any trick to crack these hash algorithms?!"

mynameismon4y ago

> "How can any cryptographer EVER figured out any trick to crack these hash algorithms?!"

sammyo4y ago

They can't. But there are certainly unsavory actors that are able to trick a certain percentage of the unsuspecting to just give them a private key.

selestify4y ago

They actually can, in some situations: https://security.googleblog.com/2017/02/announcing-first-sha...

userbinator4y ago

jjeaff4y ago

I think making a problem more accessible like that is the fastest path to a solution.

It reminds me of the Stephen J Gould quote:

"I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops."

manceraioOP4y ago

I actually started this because my first idea was: can I implement SHA-256 with just TTL logic gates? which should be possible, but it would take months to do.

for puzzles try 12037465 for some coffee :)

nayuki4y ago

My way of explaining step by step visually is by implementing in Excel: AES https://www.nayuki.io/page/aes-cipher-internals-in-excel ; DES https://www.nayuki.io/page/des-cipher-internals-in-excel .

Also relevant: https://www.righto.com/2014/09/mining-bitcoin-with-pencil-an...

y424y ago

There also exists a written description showing the process in Python, step by step, which I consider more helpful, because you do not need to stop and play the video.

https://nickyreinert.medium.com/wie-funktioniert-der-sha256-...

edwnj4y ago

not everybody speaks barbarian sir

y424y ago

Python that bad?

2 more replies

p1mrx4y ago

Can it be proven whether values of m exist such that SHA256(m) == 0?

If I were omnipotent and wanted people to believe in me, I would write a book that hashes to 0, so that anyone could verify its authenticity.

userbinator4y ago

orangepenguin4y ago

arcastroe4y ago

As a comment above stated, part of the "input" is the initialized values:

> Initialize hash value h0 to h7: first 32 bits of the fractional parts of the square roots of the first 8 primes 2..19).

You'll arrive at "random-ish garbage".

1 more reply

y424y ago

Sure, that would be a pretty high difficult factor, but its possible.

p1mrx4y ago

But is it provably possible for SHA-256?

2 more replies

picture4y ago

You could in theory also get that with a lot of computation of course

p1mrx4y ago

It should be straightforward to design a hash function that can't be preimage attacked using all the computing power in the universe.

Maybe an omnipotent cryptographer would find flaws in SHA-256, but then they could design a better function and include it in the book.

1 more reply

ypcx4y ago

Similar project which visualizes SHA-256 into terminal: https://github.com/in3rsha/sha256-animation

spdebbarma4y ago

zahllos4y ago

brk4y ago

can16358p4y ago

If such a movie even mentions SHA-256 I think that's above average on its own.

breakingcups4y ago

That depends..

"Just a second, I need to backdoor the SHA256 rootkit to penetrate the directory. Shit, they have an X62 firewall. Luckily I brought my pentester."

2 more replies

reincarnate0x144y ago

Syntax-highlighted ALGOL-y code blocks seem to be back in style if they're not going with the constant "toss that screen up to a hologram" bit.

DJPocari4y ago

This is fantastic. I once implemented SHA-256 in Google Sheets to visualize it, but it had horrible performance compared to this. This is the best visualization I've seen yet.

westurner4y ago

SHA2: https://en.wikipedia.org/wiki/SHA-2

https://rosettacode.org/wiki/SHA-256

Hashcat's GPU-optimized OpenCL implementation: https://github.com/hashcat/hashcat/blob/master/OpenCL/inc_ha...

Bitcoin's CPU-optimized sha256.cpp, sha256_avx2.cpp, sha256_sse4.cpp, sha256_sse41.cpp: https://github.com/bitcoin/bitcoin/blob/master/src/crypto/sh...

https://github.com/topics/sha256 https://github.com/topics/sha-256

Cryptographic_hash_function#Cryptographic_hash_algorithms: https://en.wikipedia.org/wiki/Cryptographic_hash_function#Cr...

Merkle–Damgård construction: https://en.m.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_...

(... https://rosettacode.org/wiki/SHA-256_Merkle_tree ... Merkleized IAVL+ tree that is balanced with rotations in order to optimize lookup,: https://github.com/cosmos/iavl

Self-balancing binary search tree: https://en.wikipedia.org/wiki/Self-balancing_binary_search_t... )

daenz4y ago

You should be very satisfied with how well you conveyed the algorithm. Well done. What was your approach to arriving at this result?

manceraioOP4y ago

daenz4y ago

Very nice. Have you built any other algorithm visualizations? I have a very strong interest in how algorithms are visualized so that they are more easily understood.

1 more reply

chris_l4y ago

Someone should make a collection of the various visualization web projects that crop up here from time to time.

jonathanyc4y ago

bmitc4y ago

manceraioOP4y ago

For that I like this one: https://cryptobook.nakov.com/

bmitc4y ago

Thanks!

lanecwagner4y ago

I wrote this one: https://app.qvault.io/course/6321ddbf-49eb-4748-9737-6bc12e8...

it's a crypto course where you write the solutions in Go. you might enjoy it :)

tptacek4y ago

JP Aumasson is one of the authors of the BLAKE hashes and wrote "Serious Cryptography":

https://www.amazon.com/Serious-Cryptography-Practical-Introd...

bmitc4y ago

Thanks!

anonymousDan4y ago

oib4y ago

Yes, sha3 reads every input byte only once. It does hold a pretty large internal state that doesn't fit in only registers.

a-dub4y ago

i remember reading a pretty good paper on the arx (add, rotate, xor) family of ciphers back in the day (sort of in the vein of, is that all you need?)...

Darkphibre4y ago

recursive4y ago

seumars4y ago

Fantastic presentation! The utility functions from the source code are just as useful.

ansible4y ago

abrookewood4y ago

Looks fantastic, but the only thing missing is why each step is done.

fthtls4y ago

great visualization. i've also checked the source code and utility functions. they are very well defined and useful too.

btw, the decrypter i coded -> https://10015.io/tools/sha256-encrypt-decrypt

based24y ago

https://en.wikipedia.org/wiki/Secure_Hash_Algorithms

anandsuresh4y ago

Pretty cool. Have been looking for something like this for a while. Thanks for building it.

Just sent you a PR for some typos I found while running through an example.

M4tze4y ago

Great visualization. Might become my new screensaver.

colejohnson664y ago

One could even feed the hash back into itself to get a different result each time

em3rgent0rdr4y ago

how long would the cycle last before it starts repeating?

2 more replies

stevofolife4y ago

Well made! Can you share how you made the website?

sylware4y ago

Can we have a video of this on youtube/dailymotion/vimeo/etc which we can download with yt-dlp?

haunter4y ago

You can screen record the full screen browser window with OBS (with the mouse disabled too)

reincarnate0x144y ago

That's super cool. Visualized or illustrated algorithms have always looked so magical, to me at least.

nwatab4y ago

Looks bautiful. I understand it is really complicated (even I know calculation of SHA256)

dicroce4y ago

Please do this for b+trees. :)

hombre_fatal4y ago

Beautiful color palette.

berta4y ago

This is awesome!

iqanq4y ago

Two of the buttons at the top have no "title" attribute and therefore no tooltip.

jerpint4y ago

Very satisfying!

j / k navigate · click thread line to collapse