
jerf 2y ago

That's not what crazygringo means. ñ can be represented either as the single Unicode code point U+00F1 https://www.compart.com/en/unicode/U+00F1, or as a plain n followed by a combining tilde (U+0303) https://www.compart.com/en/unicode/U+0303, which looks like this: ñ.

    Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
    >>> "\u00f1".encode("utf-8")   # ñ as one precomposed code point
    b'\xc3\xb1'
    >>> "n\u0303".encode("utf-8")  # n followed by a combining tilde
    b'n\xcc\x83'

A naive hashing algorithm will hash them to different things.
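
To make that concrete, here is a minimal sketch using only the Python standard library. SHA-256 stands in for "a naive hashing algorithm", and NFC is one reasonable choice of normalization form; both are my assumptions for illustration, and the function names are made up.

```python
import hashlib
import unicodedata

composed = "\u00f1"     # ñ as a single code point (U+00F1)
decomposed = "n\u0303"  # n followed by a combining tilde (U+0303)


def naive_hash(text: str) -> str:
    """Hash the UTF-8 bytes exactly as given, with no normalization."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalized_hash(text: str) -> str:
    """Normalize to NFC first, so equivalent spellings share one byte sequence."""
    canonical = unicodedata.normalize("NFC", text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(naive_hash(composed) == naive_hash(decomposed))            # the two spellings differ
    print(normalized_hash(composed) == normalized_hash(decomposed))  # the two spellings agree
```

The two strings render identically, but only the normalized variant treats them as the same input.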

For way too much information on this, see: https://www.unicode.org/reports/tr15/

Even Unicode-aware code, written by a developer who knows at least some of the Unicode pitfalls, often fails to normalize properly, most likely because the developer doesn't know normalization is an issue at all. Passwords are a case where you need to run a Unicode normalization pass before hashing. Unfortunately, if you've already stored hashes of unnormalized passwords, fixing it is rather difficult: you have to wait for the user to enter the exact byte sequence that was originally hashed (the "correctly-incorrect" password), and only then can you normalize it and replace the stored hash. If they only ever enter a different encoding of the same text (an "incorrectly-incorrect" password), it won't match and you can't do anything. I suspect that storing a lot of unnormalized passwords before learning the hard way that this is an issue is the majority case for homegrown password systems. You hear "don't roll your own crypto" and think reaching for a bcrypt or scrypt library solves it, without realizing there's still work that has to happen before the call into that library.


grodriguez100 2y ago

Right. I misunderstood the comment. Thanks for clarifying!
