So while iterative/computational hashing is only secure if it is slow and if the password is strong, Blind Hashing prevents offline attacks even against weak passwords and actually runs faster as you increase the cost factor.
In this case it's more like an actual anchor -- technically we call this the Bounded Retrieval Model -- the idea is that we size the network bandwidth so that it would take 300 days at full line rate to steal the data pool over the network. So it's a physical limitation, rather than trusting a black box to protect 256 bits the way an HSM does.
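As a back-of-the-envelope check on that sizing (assuming the 16TB pool size mentioned elsewhere in the thread):

```python
# What sustained line rate lets an attacker exfiltrate a
# 16 TB data pool in roughly 300 days?
pool_bits = 16e12 * 8              # 16 TB expressed in bits
seconds = 300 * 24 * 60 * 60       # 300 days in seconds
mbps = pool_bits / seconds / 1e6   # megabits per second
print(f"{mbps:.1f} Mbit/s")        # roughly 5 Mbit/s
```

So the "anchor" amounts to keeping the pool's egress link on the order of a few megabits per second.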
If you're interested here's an intro [0], a tech spec [1], and an academic paper [2] by Moses Liskov at MITRE.
Disclaimer: I'm Founder/CTO of BlindHash.com, which is basically Data Pool as a Service -- we provide an API into a geo-replicated 16TB (and growing) data pool.
[0] - https://s3.amazonaws.com/blindhash/BlindHash+Architecture+Gu...
[1] - https://docs.wixstatic.com/ugd/005c1c_5996c661899e4d09a28b9a...
I certainly wouldn't get 16TB of disks just for that if it were ever leaked.
Bummer(not for me :p) that you guys went the route of patenting it and keeping it proprietary & only available through an API.
I think it would be adopted in no time if it were open source, and I'd definitely like to see something like this available as a service on clouds like GCP/AWS/Azure/etc for my day job.
The approach has an economy of scale where a shared pool can secure many sites' hashes at very low cost to individual sites, but where the sum-total can fund a very large data pool. I would love to grow this to 1PB and beyond. The idea behind the patent is to give us a chance to try to grow exactly that service.
Fundamentally the technique is quite simple and easy to copy, yet IMO it is better than computational/iterative hashing in every way -- cost, performance, scalability, and security. It seemed to me a perfect example of something worth patenting. If we're ultimately not successful in commercializing it, I would want to relinquish the patent to the public domain.
The most important part -- and what's kept me working at this for years now -- is that it protects even weak passwords after a company is breached. It takes the onus (and a lot of the blame) off the end user, and solves the usability problem with passwords.
By the way, the same technique works equally well for adding BlindHash to the KDF used to decrypt your SSH key, your laptop, or your TrueCrypt volume. We can also add additional checks when running the BlindHash call for a given AppID, to enforce things like:

1. The caller must first reply to an SMS or enter a TOTP code
2. The request must come from a certain IP range or during certain hours
3. The request is only valid after date X (a time lock)
So this can be used to shore up password-based encryption as well in some very interesting ways.
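A sketch of how such per-AppID checks might look on the server side (the field names and policy shape here are invented for illustration, not BlindHash's actual API):

```python
import ipaddress
from datetime import datetime, timezone

def policy_allows(policy: dict, request_ip: str, now: datetime,
                  second_factor_ok: bool) -> bool:
    """Evaluate per-AppID checks like those listed above."""
    # 1. Require a completed SMS reply / TOTP step
    if policy.get("require_second_factor") and not second_factor_ok:
        return False
    # 2. Restrict requests to an allowed IP range
    allowed = policy.get("allowed_network")
    if allowed and ipaddress.ip_address(request_ip) not in ipaddress.ip_network(allowed):
        return False
    # 3. Time lock: request only valid after date X
    not_before = policy.get("not_before")
    if not_before and now < not_before:
        return False
    return True

policy = {
    "require_second_factor": True,
    "allowed_network": "203.0.113.0/24",
    "not_before": datetime(2018, 1, 1, tzinfo=timezone.utc),
}
now = datetime(2018, 6, 1, tzinfo=timezone.utc)
print(policy_allows(policy, "203.0.113.7", now, second_factor_ok=True))   # True
print(policy_allows(policy, "198.51.100.1", now, second_factor_ok=True))  # False
```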
1. Generate 16TB of random data, backup/replicate many times
2. Think of data as 16 billion 1k pieces
3. Generate 64 random piece addresses using hashA(key) as seed
4. Concatenate the 64 pieces into one 64k chunk, and store hashB(chunk)
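Those four steps can be sketched in miniature (a seeded 1 MiB pool and SHA-256 stand in for the real 16 TB pool and hashA/hashB; all sizes and names here are scaled-down assumptions, not the production scheme):

```python
import hashlib
import random

PIECE_SIZE = 1024                  # 1 KiB pieces, as in step 2
NUM_PIECES = 64                    # 64 pieces per lookup, as in step 3
POOL_SIZE = 1 << 20                # toy 1 MiB pool instead of 16 TB

# Step 1: a large pool of random data (seeded so the demo repeats)
pool = random.Random(42).randbytes(POOL_SIZE)
NUM_ADDRESSABLE = POOL_SIZE // PIECE_SIZE

def piece_addresses(key: bytes) -> list[int]:
    # Step 3: derive 64 pseudorandom piece addresses, seeded by hashA(key)
    seed = hashlib.sha256(b"hashA|" + key).digest()
    addrs, counter = [], 0
    while len(addrs) < NUM_PIECES:
        block = hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        for i in range(0, len(block) - 3, 4):
            addrs.append(int.from_bytes(block[i:i + 4], "big") % NUM_ADDRESSABLE)
            if len(addrs) == NUM_PIECES:
                break
        counter += 1
    return addrs

def blind_hash(key: bytes) -> str:
    # Step 4: concatenate the 64 pieces into one 64 KiB chunk, store hashB(chunk)
    chunk = b"".join(pool[a * PIECE_SIZE:(a + 1) * PIECE_SIZE]
                     for a in piece_addresses(key))
    return hashlib.sha256(b"hashB|" + chunk).hexdigest()

digest = blind_hash(b"already-salted-password-hash")
```

The point of the design shows up in the lookup: an attacker who steals the stored hashB values still needs the pool itself to test a single password guess.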
This technique could have been invented and promoted starting in 1997 (20 years ago) but only through the protectionism of the patent regime do you have this beautiful write-up and promotion of it by researchers pushing it forward: it's the patent regime working in action.
It works EVEN WITH WEAK PASSWORDS. That is pretty amazing if you ask me.
I am glad they patented it and are promoting it.
"But wait, it's so simple".
Let me give you an example of a $684.23B company you've heard of that is making a security mistake even a small child could detect and correct, but where there is no proprietary solution in the space pushing them forward.
The company is Google, and their silly security mistake involves plus-addressing. When my true address is jsmith543@gmail.com and I give out "jsmith543+weeklytechupdate@gmail.com" to sign up for the Weekly Tech Update newsletter -- afraid they might start spamming me, or sell my address to any number of third parties who would -- Gmail tags the incoming mail with "weeklytechupdate". Pretty clever. The only issue is that it's trivial to strip the +____ suffix, and spammers actually do that. Here are examples of HN people saying they do exactly that: https://news.ycombinator.com/item?id=15396446
>I’ve run a fair amount of email campaigns where we strip out the + if gmail is the domain to ensure it doesn’t end up in some weird filter.
The solution is extremely simple. Allow me to specify a key-value pair from the Gmail interface: it generates a high-entropy key, and pairs it with a value I choose. Deliver all mail addressed to that key to my inbox, tagged with the value I chose, until I start marking it as spam. Very easy. Example: I go to Gmail, I click "generate rescindable address", I am given affj3fjd, and I assign it "weeklytechupdate". Mail to affj3fjd@gmail.com gets tagged "weeklytechupdate", and if I need to give my email address to that web site again in the future I can always look it up in some list. Easy. Gmail doesn't do it, and its spam solution is broken.
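That key-value scheme is simple enough to sketch (class and method names are invented for illustration; this is not an actual Gmail feature):

```python
import secrets

class AliasBook:
    """Rescindable, high-entropy inbound addresses mapped to labels."""

    def __init__(self, domain: str = "gmail.com"):
        self.domain = domain
        self.aliases = {}            # key -> {"label": ..., "active": ...}

    def generate(self, label: str) -> str:
        key = secrets.token_hex(4)   # an 'affj3fjd'-style 8-char key
        self.aliases[key] = {"label": label, "active": True}
        return f"{key}@{self.domain}"

    def deliver(self, address: str):
        key = address.split("@", 1)[0]
        entry = self.aliases.get(key)
        if entry is None or not entry["active"]:
            return None              # unknown or rescinded key: drop the mail
        return entry["label"]        # tag the mail with the chosen value

    def rescind(self, address: str) -> None:
        key = address.split("@", 1)[0]
        if key in self.aliases:
            self.aliases[key]["active"] = False

book = AliasBook()
addr = book.generate("weeklytechupdate")
print(book.deliver(addr))            # 'weeklytechupdate'
book.rescind(addr)
print(book.deliver(addr))            # None
```

Unlike the + suffix, the key carries no strippable structure: either the spammer has a live key that reaches me tagged, or they have nothing.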
The only thing is: nobody has come up with something clever enough to patent in this space, and then promote the @#$# out of. If they had, I could give my email addresses out in confidence to whoever I want.
Actually, I made a full gmail address dedicated only to spam. The problem is I never read the stuff that goes there -- I just don't look. I just looked, though: the last piece of spam delivered to it arrived 7 days ago, and there are just 2 pieces of mail in the inbox.
That means Google's spam filter is very, very, very good. Wait, what? So good that it silently filters spam that I expect to get, that I explicitly give out my email address for? (Okay, I just looked, and there are 2 messages from 4 days ago - nothing more recent - in the "promotions" tab).
No, that's not what it means. It means that some of the sites I give my address out to aren't able to email me at all. They're just not getting through, because Gmail's spam filters are too draconian.
When I give out "jsmith543+weeklytechupdate@gmail.com" I expect ALL of the mail sent to there to go through - not to be caught by the spam filter. Instead, presumably what happens is gmail throws away most mail that isn't sent to an individual by an individual.
Sorry to rant on this aside; I just wanted to show, in action, the difference between a patented solution that a company promotes and an EASY solution that would WORK but that Gmail doesn't implement. Gmail actively does something broken. Because nobody has come up with and promoted a fancy solution that works in this space, they don't even use the weak solution that works; they use nothing but a broken, non-working security-through-obscurity scheme that, as you can see, HN'ers actively strip out in order to spam effectively.
And this is Google. So this is a question as clear as day for why I don't mind patented novel algorithms with companies behind them licensing and promoting them. I kind of mind when it's a race to the patent office with new technology, but grandparent poster's technique is one that could have been done in 1997 so I don't really buy that excuse. I like that they're patenting it and promoting it. It's a good way to get companies to use better solutions. Companies just don't do it by themselves, as my Google example shows.
password + salt + password or salt + password + salt are known, trivial patterns in hashing. They're unpatentable, and even if a patent were somehow granted, it would be unenforceable.
If your salt is 5 characters, it can certainly be 500,000,000 characters instead, without the patent overlords having any slimy grounds to come after you.
For password authentication, IMO a much better solution is to generate strong random passwords (21 character base64) for users and tell them to write them down and/or use a password manager (I think web browser based storage of generated passwords can be done without the user needing to see the password at all). You can still memorize a small number of those over a few weeks if necessary and there is no good reason to memorize a bunch of passwords.
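Generating such a password takes one line with Python's standard library (`token_urlsafe(16)` yields 22 base64url characters, 128 bits of entropy -- close to the 21 characters suggested above):

```python
import secrets

# A strong random password suitable for a password manager
password = secrets.token_urlsafe(16)
print(len(password), password)   # 22 characters, e.g. 'Drmhze6EPcv0fN_81Bj-nA'
```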
One nice thing about the design is that since the data pool isn't actually storing hashes, it doesn't change over time (except when you want to grow it), so it's easy to have multiple data centers that operate completely independently.
Different copies of the data pool, different networks, different DNS, etc. The client library will retry/fail-over between data centers. So while yes, you do have to make a successful API call, it's not a SPOF.
It's very easy to replicate / add redundancy when there's no active sync required between sites. The only inter-site communication we currently have is when new accounts are created (to distribute the AppID) and to aggregate usage stats, which is batched and, if it fails, will just pick up where it left off once the network is back up.
A Go, Rust, or (minimal, non-framework) Java authentication server speaking HTTPS to solely that AuthN interface and sharing no database with anything else is extremely unlikely to be part of any realistic "kill chain"; it'll be among the last things on your network compromised.
Meanwhile: you get to stick with technology you fully understand and can manage (simple HTTP application servers and a decent password hash) and monitor.
HSMs have a lot of uses elsewhere in secure architectures, but the password storage use case is overblown.
Basically we 100% agree with you that an authentication service should do this job. The HSM is extra credit. Although it does help in cases where the auth service's DB is leaked through some other means (e.g. backups).
I will say that I'd depart from you on the return value of that service. It shouldn't be a bool. It's better to return a token that downstream services can use to independently verify that the authentication service verified the user. It's better for your infrastructure if you aren't passing around raw user IDs but rather a cryptographically signed, short-lived token that is only valid for the life of a specific request.
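One minimal way to issue such a token is an HMAC-signed, short-lived assertion. This is only a sketch of the idea -- in practice you'd likely reach for an established format like JWT or macaroons, and per-service keys rather than one shared secret:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"shared-with-downstream-verifiers"   # illustrative only

def issue_token(user_id: str, ttl_seconds: int = 60) -> str:
    # Claims: the verified subject plus a short expiry
    claims = {"sub": user_id, "exp": int(time.time()) + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token: str):
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                            # forged or corrupted
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if claims["exp"] < time.time():
        return None                            # expired
    return claims["sub"]

token = issue_token("user-123")
print(verify_token(token))                     # 'user-123'
```

Downstream services only need the verification key, not a round-trip to the auth service, which is exactly what makes this better than passing a bool around.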
I agree that for the majority of use cases an HSM is not necessary, but they do bring security to the table that a simple auth server cannot, at least not without significant engineering effort.
I'm only talking about the AuthN problem, by the way. I'm not making a general argument against circuit breaker architectures.
I think the idea of the author is to protect the operation with hardware.
"Hardware" isn't magic. The magic power of an HSM isn't the hardware; it's the minimalized attack surface of the software.
https://en.wikipedia.org/wiki/Unidirectional_network
On a slight tangent: people with physical access to a server can extract encryption keys from RAM by plugging into a PCI slot:
Publicly accessible servers and such will, of course, still have them, but things like, say, internal database servers or the PC belonging to Debbie in Payroll, won't.
Access to things outside of the "local network" (i.e., a company's entire network, not just the directly-connected subnet) will go through an intermediary (e.g., an HTTP(S) proxy) that performs per-connection authorization with a default deny.
It may end up looking a little different from this -- a default deny on all outgoing IP traffic, for example, with only specific traffic permitted -- but I believe that, eventually, this is how we'll keep random hosts from being used to exfiltrate mass amounts of data.
TL;DR: Companies need to start filtering outgoing traffic and not letting any random host on the internal network connect out to any arbitrary host in the world. This will be inconvenient and expensive (to manage), however, so we'll need a few more Equifaxes before it begins to catch on.
Mutual TLS can be a bit of work to get set up but leads to huge security wins over time as every RPC within your infrastructure is mediated by an authorization layer. We've helped out a bit with the SPIFFE project which is looking to make mutual TLS easy: https://spiffe.io/
Locking your data to your hardware raises the question of what happens if the hardware fails. At first glance it also seems to introduce difficulties with scaling across multiple machines, and it might make it harder to switch between infrastructure providers.
The cost of this approach should be mentioned as a footnote.
Maybe the better solution is for society to support more small tech companies with smaller user bases that have fewer dissatisfied rogue employees to leak hashed passwords in the first place.
The root of the problem is not technical, it's political.
If possible, it's nice to keep an offline copy of your key material too. Maybe locked in a safe, gpg encrypted or something.
To restore the key, you need to bring 3 out of 6 persons to the so-called key ceremony, and each has to bring his smartcard and his PIN.
The same mechanism can be used to provision multiple HSMs with the same key material. But there are other means to do this: as soon as two HSMs share a common secret, also known as a Key Sharing Key, they are able to exchange all the key material they possess in a secure manner.
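The 3-out-of-6 scheme described above is typically Shamir secret sharing. A minimal sketch over a prime field (illustrative only -- real key ceremonies use vendor-certified implementations and smartcards, not Python):

```python
import random

PRIME = 2**127 - 1   # field large enough for a ~126-bit secret

def _eval_poly(coeffs, x):
    # Horner evaluation of the secret-holding polynomial mod PRIME
    y = 0
    for c in reversed(coeffs):
        y = (y * x + c) % PRIME
    return y

def split_secret(secret: int, n: int = 6, k: int = 3):
    # Degree k-1 polynomial with the secret as constant term:
    # any k of the n shares reconstruct it; fewer reveal nothing.
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    return [(x, _eval_poly(coeffs, x)) for x in range(1, n + 1)]

def combine_shares(shares):
    # Lagrange interpolation at x = 0 recovers the constant term
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * -xj % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret

master_key = 0x2b7e151628aed2a6abf7158809cf4f3c
shares = split_secret(master_key)            # one share per person's smartcard
print(combine_shares(shares[:3]) == master_key)   # True: any 3 of 6 suffice
```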
Some HSMs don't even bother to store the keys they generate within the bounds of their hardware. Only the master key is stored in the hardware; every other key is encrypted with the master key and stored on a shared filesystem.
If this sounds artificial to you, let me assure you that such procedures are in place at various companies who deal with raw credit card data, at least in Europe. The EMV committee, the PCI organization, and each credit card issuer mandate such procedures.
And they are very strict. We once had to ship HSMs back to the vendor, because at some point they were not supervised by at least two persons. (At least the documentation thereof was missing.)
So, yeah, the problem is political in a way that everyone is coming with their own agenda into it, which has little grounding in reality, yet affects decisions of many people substantially.
(I think AWS might have launched some sort of HSM service, but I haven't looked at the details, and it's not clear whether it could provide the right sort of guarantee.)
Azure: https://azure.microsoft.com/en-us/pricing/details/key-vault/
AWS: https://aws.amazon.com/cloudhsm/
The only thing you're not able to do in a public cloud is run these in Secure Execution mode—where you get to actually execute arbitrary code inside of the enclave instead of just doing operations with keys that are protected by the HSMs.
Giving it a fancy name makes it look like it's a new idea.
1. Assume that attacker will get data X
2. Make what you keep in data X as useless and uninteresting as possible.
3. Hash data X with the most expensive and safest hash possible.
4. If you really can't do steps #2 and #3, warn your customers about what you are keeping and encrypt the heck out of everything.
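For step 3, a sketch using the memory-hard scrypt KDF from Python's standard library (the cost parameters here are illustrative; tune them to what your hardware can afford):

```python
import hashlib
import hmac
import os

# n=2**14, r=8 makes each hash cost ~16 MiB of memory, slowing
# offline guessing; maxmem raises OpenSSL's default memory cap.
SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32, maxmem=2**26)

def hash_password(password: str, salt: bytes = None):
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    # Constant-time comparison to avoid leaking match length
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
```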
There are cheaper ways of keeping secrets secret. Using a TPM on the server would be one way. SGX would be another.