Python 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0] on linux
>>> "ñ".encode("utf-8")
b'\xc3\xb1'
>>> "ñ".encode("utf-8")
b'n\xcc\x83'
A naive hashing algorithm will hash them to different things.For way too much information on this, see: https://www.unicode.org/reports/tr15/
Even a lot of Unicode-aware code written by a developer aware of at least some Unicode issues often fails to normalize properly, most likely because they're not even aware it's an issue. Passwords are a case where you need to run a Unicode normalization pass on the password before hashing it, but, unfortunately, if you're already stored the wrong password hash fixing it is rather difficult. (You have to wait for the correctly-incorrect password to be input, then you can normalize and fix the password entry. This requires the users to input the correctly-incorrect password; if they only input an incorrectly-incorrect password you can't do anything.) I'd suspect storing a lot of unnormalized passwords before learning the hard way this is an issue is the majority case for homegrown password systems. You hear "don't roll your own crypto" and think reaching for a bcrypt or scrypt library solves it, but don't realize that there's some stuff that needs to be done before the call to those things still.