This is actually a pretty big deal. FIPS 140 certifications are required for a lot of US Federal sales. The FIPS 140 standard changed sometime in the past year from major version 2 to major version 3, and lots of changes are required to certify against the v3 standard, even if you had a v2 certification. What's not obvious is that a lot of FIPS 140 certified software libraries are white-labeled OpenSSL. The OpenSSL team's hard work in getting this released really lowers the barrier to entry for companies trying to sell software to the US Federal government.
I wish the certifications were not as onerous as they are, but this is a big step forward for teams that are not staffed to read and implement several hundred pages of ISO standards for how to correctly implement crypto algorithms. Don't even get me started on how the standards you certify against are themselves copyrighted...
With modern compilers, how often (or in what circumstances) is it worth "hand-rolling" assembler code versus just letting the compiler do it? Does one make the assembler 'from scratch', or perhaps let the compiler generate the assembler and have a human look at it to see if there are any places it can be improved?
* There's not that much code involved.
* Many CPUs have instructions specifically made for accelerating cryptographic algorithms.
* Security may impose specific requirements on the code, such as not leaking secrets through timing. Meeting them may mean intentionally writing very specific, suboptimal code.
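To make the timing point concrete, here is a minimal sketch in C of a comparison that does not exit early on the first mismatch. The name `ct_compare` is made up for illustration and is not taken from any library. Note that a compiler is, in principle, free to transform a loop like this, which is one reason real libraries tend to pin such routines down in assembly or with extra barriers.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustrative name, not a library function).
 * Returns 0 if the first len bytes of a and b are equal, nonzero otherwise.
 * Unlike memcmp, it always touches every byte, so the running time does not
 * depend on where the first difference is. That makes it "suboptimal" on
 * purpose.
 */
int ct_compare(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t diff = 0;

    for (size_t i = 0; i < len; i++) {
        /* Accumulate differences instead of branching on them. */
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff;
}
```

You would use it for things like checking a received authentication tag against the computed one. In practice you'd reach for the library's own routine (OpenSSL ships `CRYPTO_memcmp` for this) rather than rolling your own.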
AES-GCM is "AES" and "GCM" running at the same time on the same data. ChaCha20-Poly1305 is "ChaCha20" and "Poly1305" running at the same time on the same data, usually block by block so that you avoid pulling data into cache more than once. You can interleave their imperative operations in C or Rust (or whatever), but the compiler isn't going to intuit how some of the math can be re-used across the algorithms without a lot of hints, or how it can be safely vectorized, and at that point you might as well just write the assembly.
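A rough sketch of the block-by-block interleaving described above, in C. Everything here is a toy: `toy_xor_block` and `toy_mac_update` are made-up stand-ins, not ChaCha20 or Poly1305, and are not secure. The only point is the loop structure: the fused version encrypts and authenticates each block while it is still hot in cache, and gives the same result as doing two full passes.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 16

/* Toy stand-in for a stream cipher block. NOT ChaCha20, not secure. */
static void toy_xor_block(uint8_t *buf, size_t n, uint32_t key, uint32_t ctr)
{
    uint32_t x = key ^ (ctr * 2654435761u);
    for (size_t i = 0; i < n; i++) {
        x = x * 1664525u + 1013904223u;   /* simple LCG step */
        buf[i] ^= (uint8_t)(x >> 24);
    }
}

/* Toy stand-in for a MAC update. NOT Poly1305, not secure. */
static uint32_t toy_mac_update(uint32_t acc, const uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        acc = (acc + buf[i]) * 16777619u;
    return acc;
}

/* Two passes: encrypt everything, then walk the data again for the tag. */
uint32_t seal_two_pass(uint8_t *data, size_t len, uint32_t key)
{
    uint32_t ctr = 0, tag = 0;
    for (size_t off = 0; off < len; off += BLOCK, ctr++) {
        size_t n = (len - off < BLOCK) ? len - off : BLOCK;
        toy_xor_block(data + off, n, key, ctr);
    }
    for (size_t off = 0; off < len; off += BLOCK) {
        size_t n = (len - off < BLOCK) ? len - off : BLOCK;
        tag = toy_mac_update(tag, data + off, n);
    }
    return tag;
}

/* One pass: encrypt a block, then MAC it right away while it is in cache. */
uint32_t seal_one_pass(uint8_t *data, size_t len, uint32_t key)
{
    uint32_t ctr = 0, tag = 0;
    for (size_t off = 0; off < len; off += BLOCK, ctr++) {
        size_t n = (len - off < BLOCK) ? len - off : BLOCK;
        toy_xor_block(data + off, n, key, ctr);
        tag = toy_mac_update(tag, data + off, n);
    }
    return tag;
}
```

Fusing the loops like this is the easy part and any C programmer can do it. What the compiler won't find on its own is the next level down: sharing registers and intermediate math between the two algorithms and scheduling the instructions of both together.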
In fact, you can benchmark openssl's assembly vs openssl's C: https://github.com/openssl/openssl/blob/master/crypto/aes/ae...
Granted, they aren't using intrinsics in that code, but a sufficiently smart compiler shouldn't need intrinsics.