Some sign-up forms don't even give you feedback on which characters are problematic. The Oracle Cloud one kept erroring with "you need one uppercase, one lowercase, and one number" when what it meant was "remove that tilde"; that took a while to figure out.
I mean, you're not supposed to write down passwords, but with all the various restrictions you can't even use a consistent convention so you can actually remember them all.
You're supposed to use a password manager. Preferably with a passphrase and a second factor like a keyfile or hardware token.
If you do this, you should really save the version of the normalizing table you used, since they change over time.
what are these and why do you need to do it?
This is independent of the Unicode encoding, which turns those codepoints into bytes, for example using UTF-8 this gives C3A9 or 65CC81.
Users don't really have control over what their keyboard/application puts in the text field when they press the button, and obviously the hashes of those differ, so the password wouldn't match. Normalization is the process of turning the characters into their composed form (in my example "\u00E9") or their decomposed form ("\u0065\u0301"), so you can then compare your codepoints/bytes/hashes.
https://en.wikipedia.org/wiki/Unicode_equivalence#Normalizat...
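A minimal sketch of the idea in Python, using the standard `unicodedata` module (the codepoints are the same é example from above):

```python
import unicodedata

composed = "\u00E9"          # é as a single codepoint
decomposed = "\u0065\u0301"  # e + COMBINING ACUTE ACCENT

# The two renderings look identical but compare (and hash) differently.
assert composed != decomposed
assert composed.encode("utf-8") == b"\xc3\xa9"
assert decomposed.encode("utf-8") == b"\x65\xcc\x81"

# Normalizing both sides before comparing/hashing makes them match.
assert unicodedata.normalize("NFC", decomposed) == composed
assert unicodedata.normalize("NFD", composed) == decomposed
```

Either form works as the canonical one; the important part is applying the same normalization at enrollment and at login.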
When they forbid backslashes and quotes, it's even better: someone didn't know how to use query parameters or escape database values. It's a sign that their software is as secure as a "watch out for the dog" sign.
For a specific example, Oracle Database has a very restrictive list of characters allowed in a user password. If you're using Database Users behind the scenes (even if not directly, but via an Oracle integration), you're subject to those same restrictions. Up until Oracle 11g, passwords were also limited to 30 characters, and a few releases before that they were case-insensitive (!).
Is this a good reason? I'd argue, no, but I've worked at tons of organizations where "things that don't make sense" often have an explanation even if it isn't an explanation you're happy with. We should definitely push companies to use cryptographically secure one-way hashing functions with salts, and adjustable difficulty.
I've heard banks and other financial institutions use the "our ancient mainframe only allows 8 characters in account passwords" excuse, or "our ancient mainframe database can only handle 8 characters in the password column", and I find it extremely hard to believe.
First of all, I find it hard to believe that each customer has a user account on the mainframe, and so the mainframe's restrictions on account passwords are irrelevant. Your banking account is going to be entirely something defined by the database.
Second, I find it hard to believe that they are running their web server on their ancient mainframe OS. The web server is going to be running on something more modern. Users have to go through that to do online banking, and the account system on that can be totally separate from whatever account system is running on the backend banking system. Your user name (if their online banking uses something other than your account number) and your password for online banking should be entirely handled on the Unix, Unix-like, or Windows server that is running their web-facing stuff. The ancient mainframe stuff should never see it.
Why? Do you actually have any experience in this area? I do, and I can tell you, they do exactly that. Then multiple systems integrate with that mainframe, often using the user account as the unique identifier for the entire organization. Migrations are an absolute nightmare.
> Users have to go through that to do online banking, and the account system on that can be totally separate from whatever account system is running on the backend banking system.
It can be, but it isn't. Thus, the problem.
Honestly, this type of "hard to believe" take is what every new employee right out of college (or myself 15 years ago) comes up with: ten thousand "simple" ideas for improvement without any organizational, political, or systems understanding. Then they act confused when their ideas aren't instantly implemented, because they don't even understand what it is they're proposing or why it's complicated.
Banks have been trying to get off of mainframes for 30 years or more at this point and have spent tens of millions of dollars, but had someone just told them to "run a web server in front of it", this could all have been avoided.
I am talking about Oracle Database Users and Oracle Database's password limitations therein. The reason for Oracle Database's password restrictions isn't how passwords are stored on disk (which is secure as of 12c[0]); it's how they were implemented originally (i.e. passwords are implemented as database objects, and database objects have max lengths and other naming rules, which apply to passwords).
The keyboards in the lab were heavily used and noisy. The space bar, because of its shape, sounded distinctly different from the other keys. I stayed away from the admins when they entered the password, like a decent citizen, but listened in and found that the password was 7 characters long, and also that the second and sixth characters were spaces (thanks to the different sound of the key). So .˽...˽.
I brute forced this using a shell script (since I had just learned how to write shell scripts), ran it overnight, and got in the next day.
So yes, I think there might, at least in theory, be good reasons to avoid certain characters in a password.
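The leaked structure reduces the search space dramatically; a quick back-of-the-envelope calculation (alphabet sizes here are illustrative assumptions):

```python
# 7 characters total, with positions 2 and 6 known to be spaces,
# leaves only 5 unknown positions to brute-force.
printable = 95                  # printable ASCII alphabet
full_space = printable ** 7     # no leaked info: ~7e13 candidates
reduced = printable ** 5        # with the two known spaces: ~7.7e9

print(f"{full_space:,} -> {reduced:,}")

# If the admins stuck to lowercase letters, an overnight shell-script
# run becomes entirely plausible:
lowercase_only = 26 ** 5
print(f"{lowercase_only:,}")    # 11,881,376 candidates
```

That's a reduction of four orders of magnitude just from listening to the space bar.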
It is thus a security best practice for streamers and the like to mute their microphones while typing passwords.
Really, all senses leak information like this. Wi-Fi signals are enough to see around corners and steal passwords. Even wearing a sleeveless shirt with your upper arms visible to a camera leaks a little information through small arm movements, and theoretically even muscle movements.
Also, since my password manager types letters one by one, I wouldn't use tabs or line feeds.
Maybe don't use grapheme clusters that have multiple valid encodings and make up for it by using a longer password instead?
Because of that, outlawing the likes of line feed, carriage return and backspace (raw input on a tty will store those in passwords, but good luck entering them in a web form) makes sense, as does normalizing Unicode input (typing ‘é’ on their phone may produce a byte sequence that’s different from typing ‘é’ on their PC).
Apart from that, it should not be necessary. If, however, you don’t trust your programmers to do the right thing, you may want to rule out characters that are related to security incidents, such as single quotes, and also may want to prevent users from entering strings that might get decoded into them, such as ‘&quot;’.
That path can be endless, though. If you forbid ‘&’, because your programmers might accidentally html-decode it, should you guard against double html-decoding? URI-decoding and then uudecoding? Getting programmers you can trust to do the right thing and giving them the time to do so is the better option.
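The double-decoding hazard is easy to demonstrate with Python's standard `html` module; a filter that rejects `"` but not its encoded forms only guards against one layer:

```python
import html

stored = "&amp;quot;"           # what a user might literally type

once = html.unescape(stored)    # one decode yields '&quot;'
twice = html.unescape(once)     # a second (buggy) decode yields '"'

assert once == "&quot;"
assert twice == '"'
```

Each extra decoding layer in the stack reopens the hole the character ban was supposed to close, which is why chasing encodings is a losing game compared to fixing the handling.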
But they're probably just storing it in plaintext on some legacy system that can't handle certain characters. Or the plaintext goes through one of those systems on its way to being hashed and salted.
For characters outside that range, there is a good reason: it's hard to type those characters consistently across different platforms/systems, and they don't want you to lock yourself out over that.
> Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length. Verifiers SHOULD permit subscriber-chosen memorized secrets at least 64 characters in length. All printing ASCII [RFC 20] characters as well as the space character SHOULD be acceptable in memorized secrets. Unicode [ISO/ISC 10646] characters SHOULD be accepted as well.
- Requires quoting or escaping in the shell or some other programming environment
- Hard to type on a mobile keyboard.
- Not in a given person's touch-typing repertoire.
The correct way to think about password security is as randomly generating a binary string of the desired security strength/length and then encoding it. If you generate 16 random bytes, that's 128 bits of security whether you encode it with hex, base32 or base64.
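A short Python sketch of that framing, using the standard `secrets` module; the encodings differ only in length and character set, not in strength:

```python
import base64
import secrets

raw = secrets.token_bytes(16)   # 128 bits of entropy, full stop

# Different encodings of the same secret; security is identical,
# only the printed form changes.
as_hex = raw.hex()                                     # 32 chars, [0-9a-f]
as_base32 = base64.b32encode(raw).decode().rstrip("=") # 26 chars, [A-Z2-7]
as_base64 = base64.b64encode(raw).decode().rstrip("=") # 22 chars

assert len(as_hex) == 32
assert len(as_base32) == 26
assert len(as_base64) == 22
```

This is also why character-class rules are beside the point for generated passwords: the entropy was fixed the moment the bytes were drawn.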
Required characters also do little to improve security, since there is usually only 1 of each kind of required character, and it's often at the beginning or end. They don't cause the user to select a random string from a meaningfully larger space.
What I cannot get is sites that make you play 20 questions to figure out their rules instead of just telling you; in my experience, it leads to lousy passwords that meet only the bare minimum. I seem to recall some popular site (I want to say AirBnB) that threw the error "password cannot contain name/username" for basically anything it didn't like, regardless of whether the password actually contained that, and it's very annoying.
It was one of the most welcomed changes to the password system at a former workplace when I convinced the small team behind the authentication to state the requirements plainly, and to flip each one from red to green as people met it. We also added a passphrase helper that could be summoned if they missed requirements a few times, which, going by our metrics, got fair use.
People generally want to do well by security, and it's on their mind, but no one wants to look stupid because they can't think of a password that meets unknown requirements. Make it clear what's expected, add even a nudge towards how to think of good passphrases, and you'll get happy people using your site.
I change my password with something randomly generated by my password manager, and the site accepts it, and as far as I know I'm good to go. Then next time I try to log into the website, it doesn't accept the password it previously (falsely) accepted before, and I have to reset it again and play the guessing game of what special character it didn't like. Madness.
Regarding the null byte: if the backend is C-based, your password theoretically just stops there. All characters after it would be ignored.
Now I wonder, what would other non-C languages do if they see 0x00 in a string?
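For one non-C answer: Python strings carry an explicit length, so NUL is just another character, and the C-style truncation only reappears if the bytes reach something length-unaware. A small sketch:

```python
# Python strings know their length; NUL doesn't terminate anything.
s = "pass\x00word"
assert len(s) == 9
assert s != "pass"

# But C's strlen()/strcmp() stop at the first NUL byte; splitting on
# it shows what a length-unaware consumer would effectively see:
raw = s.encode("utf-8")
assert raw.split(b"\x00")[0] == b"pass"
```

Most managed languages (Java, Go, JavaScript, ...) behave like Python here; the danger is at the boundary where the string is handed to a C library or protocol that treats NUL as a terminator.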
ctrl-H usually works.
Possibly they're preparing for password entry on more ubiquitous devices with limited keyboards? (ATMs, credit card keypads).
Although you should probably not allow "1234" as a password, or anything on the top 100 list for that matter.
That said, I did actually run into an instance where having ";-- in your password would trigger the WAF during login and because we needed to ship ASAP the easiest way to get around that was to ban ; in passwords. I don't think we ever went back to fix that one...
This is a misconception. Password length is far more important than allowing a few "tricky" non-alphanumerics. It aids entropy, but it's not some security silver bullet. Also, if the web service you're using is storing undigested passwords then all bets are off.
https://support.1password.com/pbkdf2/
Clocking in at a cracking cost of 79 million USD, for most intents and purposes, even a rather trivial 56-bit entropy password such as "align-caught-boycott-delete" (or "correct horse battery staple", for that matter) would be prohibitively expensive to break.
What system allows you to try 2⁴³ passwords in half a jiffy?
No provider is going to let anyone try that many combinations against a login API, but let's consider the case where the hashes have been captured. Hashcat on a Radeon RX 6650 can test about 30 billion MD5 hashes per second, about 200,000 sha512crypt hashes per second, about 500,000 MacOS PBKDF2 passwords per second, and about 32,000 bcrypt hashes per second.[2][3]
To brute-force the "four random English words" space for a single password, I therefore calculate:
MD5: 333,333 seconds (a little under 4 days)
sha512crypt: 50,000,000,000 seconds (578,703 days, or 1,585 years)
Mac OS PBKDF2: 20,000,000,000 seconds (231,481 days, or 634 years)
bcrypt: 312,500,000,000 seconds (3,616,898 days, or 9909 years)
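These figures can be reproduced with a quick calculation; a 10,000-word list is the assumption that matches the numbers above (10,000⁴ = 10¹⁶ candidates):

```python
# Reproducing the figures above, assuming a 10,000-word list and the
# quoted Radeon RX 6650 hash rates (hashes per second).
candidates = 10_000 ** 4        # four random English words: 1e16

rates = {
    "MD5": 30e9,
    "sha512crypt": 200e3,
    "macOS PBKDF2": 500e3,
    "bcrypt": 32e3,
}

for name, rate in rates.items():
    seconds = candidates / rate
    years = seconds / 86_400 / 365
    print(f"{name:>12}: {seconds:,.0f} s (~{years:,.0f} years)")
```

Swap in a different wordlist size or hash rate to see how sensitive the result is; the slow hashes stay impractical until the attacker's budget grows by several orders of magnitude.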
No one recommends storing passwords as MD5 hashes anymore, but that's the fastest algorithm Hashcat supports. When using the kind of hash that information security specialists tend to recommend these days, it seems like the XKCD method is still pretty safe. Am I missing something? Did I calculate something incorrectly?
Edit 1: Fixed the figures for sha512crypt.
Edit 2: for the NVIDIA A100 you mentioned in another branch of this thread, it would be about ten times faster per GPU, but it's still an impractically long time for the modern password hashes unless the adversary has millions of dollars to spend on cracking a high-value account's password.
[1] https://wordcounter.io/blog/how-many-words-are-in-the-englis...
[2] https://hashcat.net/forum/thread-10919.html
[3] It would be slower to handle the four English words case, because AFAIK you'd need to use the wordlist mode instead of straight brute force.
[4] https://gist.github.com/Chick3nman/d65bcd5c137626c0fcb05078b...
Some emoji, for example, are combinations of multiple other emoji, and a given combined emoji may not be uniquely represented by a sequence of codepoints. In the pathological case, this could mean that an OS update on the user's system changes the composition of the same emoji, which might make it impossible for them to input their password. It is probably prudent for a system to disallow emoji passwords.
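To make the "combinations of multiple other emoji" point concrete: a single rendered family emoji is several codepoints joined by U+200D (zero-width joiner), and Unicode normalization does not canonicalize such sequences. A small Python illustration:

```python
import unicodedata

# man + ZWJ + woman + ZWJ + girl: one visible glyph, five codepoints.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
assert len(family) == 5

# NFC/NFD leave ZWJ sequences alone, so two vendors' "same" emoji
# can survive normalization as different strings.
assert unicodedata.normalize("NFC", family) == family
```

So unlike the é case, normalization can't rescue you here; if the input method composes the sequence differently after an update, the stored hash simply won't match.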
One step away from Emoji, Unicode also allows for other m̸̱̜̅ͅȋ̴̩̠̀s̸̺͐c̶͈͇͉̐͛̚h̸̤̣̆i̴͍͍͒͌e̴̲̽̓f̸̞̽̊. Chances are, full-on Zalgo passwords can lead to problems. Again, there are probably prudent reasons to restrict some characters. On the other hand, those modifiers exist for a reason, and disallowing phrases in the user's native language doesn't make for great UX.
Towards the more common use of Unicode, there is a pretty good _practical_ reason to restrict the use of some non-ASCII characters: if your system accepts ç, ö and ø as characters in passwords, and non-technical users venture into a part of the world where the keyboard layout doesn't, your helpdesk is going to have to deal with the occasional annoyed customer. From a systems design perspective, those characters seem fine -- operationally, they may cause headaches.
Finally, we've arrived at printable ASCII characters. Restrictions on maximum length (usually 6 or 8 characters), and on certain characters (%, & or :) tend to be based on interactions with legacy systems (e.g. DES crypt() used to have an 8-character limit), or on bad input handling. Either way, it's probably a bad sign.
In addition, not all keyboard environments are capable of inputting the same set of emoji. A coworker once got locked out of his MacBook because the UI for changing the password while already logged in allowed inputting emoji, but the OS login screen did not (I could be misremembering some specifics, but the broad point remains).
Which I suppose is really a subset of the sorts of issues around ç, ö and ø, but it shows how this can happen even on the same system in different contexts.
In general, passwords are not treated as essential for access: there will be recovery techniques, and the number of password resets required because someone can't type such-and-such a character any more, or on a new device, will be a minuscule fraction of the total. Resetting passwords and other forms of lost account access is typically not an exceptional path. From what I’ve seen, for business-to-customer businesses that don’t have some form of self-serve account recovery, “I’ve lost access to my account” will routinely be half your ticket volume.
In those fairly uncommon situations where passwords are essential for access (e.g. where it’s an encryption key), well, it’s still up to the user, and the user is somewhat more likely to be aware of any potential hazards in such fanciness.
Overall, I say: stop trying to be clever; accept what is set before you without asking questions on the grounds of compatibility. Let the user do what they try to do.
Maybe normalise Unicode; it’s a harmless thing to do and has the potential to improve compatibility slightly on very unusual input devices. (I don’t think I’d bother, myself.) But beyond that, I’m not sold on the arguments for restricting possibilities.
However, in my opinion, for real-world systems, you need to strike a balance between technical, operational, and user experience concerns. If restricting your password space to printable ASCII characters can meaningfully decrease the number of tickets that generate half of your ticket volume, you should give it some serious thought.
There are good arguments for both approaches, and the right way also depends on your user base. There was a story about WhatsApp a while ago, criticizing that WhatsApp would only notify users when their contacts' security code had changed, whereas Signal (and other secure messengers) would block and ask for confirmation first. Signal currently sits at 100M+ downloads in Play store, WhatsApp sits at 5B+. The numbers are very vague, but WA has 1-2 orders of magnitude more users than Signal.
In the WhatsApp example, a small change in the process can mean that good security becomes accessible to a pool of billions of users, vs. excellent security to millions. Restricting the password character set (to a sensible set of characters, and with a sensible length limit) comes with no security drawbacks, and good chances of some process/usability improvements. For a real-world deployment, I would argue it's very prudent.
I think it took me about five reboots in single-user mode and password resets before something clicked. I wish Ubuntu would not have allowed special characters. :)
Mobile keyboards often auto-capitalize the first letter, so if your password is "password", it will get entered as "Password" - and the user will get confused why their username/password aren't logging them in.
So a UX pattern is to actually lowercase the first letter on the backend.
While this technically lowers security slightly (they try 4 passwords built from the one you typed in), I don't think that's significant, and I imagine it greatly improves user experience.
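A hedged sketch of what such a backend check might look like. The exact variant set is an illustrative assumption (the comment above only says four candidates are tried); first-letter case flips cover mobile auto-capitalization, and `swapcase` covers an accidental caps lock:

```python
def login_candidates(typed: str) -> list[str]:
    """Password variants to try against the stored hash.
    The specific set of four variants is an assumption for illustration."""
    def flip_first(s: str) -> str:
        if not s:
            return s
        head = s[0].lower() if s[0].isupper() else s[0].upper()
        return head + s[1:]

    seen: set[str] = set()
    out: list[str] = []
    for cand in (typed, flip_first(typed), typed.swapcase(),
                 flip_first(typed.swapcase())):
        if cand not in seen:
            seen.add(cand)
            out.append(cand)
    return out

# "Password" (auto-capitalized on mobile) still matches a stored "password".
assert "password" in login_candidates("Password")
```

Each candidate would be run through the normal hash comparison; the stored hash itself never changes, so this costs at most four hash verifications per login attempt.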
You have to draw the line somewhere, and degrading the majority’s experience for the minority’s benefit is an unusual trade-off.
Whatever happened to, “Design for the expert user”?
I don't understand why this would cause an expert user trouble (it's the loss of a single bit of password security, which shouldn't matter if your password is even reasonably decent).