This is a much harder problem than people realize.
If you have a fixed set of machines that need secrets, then encrypting a bag of secrets to each machine's public key works OK.
But in auto-scaling / automated / ephemeral scenarios, it doesn't work. You need an RBAC scheme for machines that builds layers of trust: each machine is placed into a role by a trusted service, script, or person; communication between the machines and the secrets service happens over verified TLS; and every access to, or modification of, a secret is recorded for audit purposes. People and machines should both be treated as first-class actors.
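As a concrete sketch of the machine-to-service leg: mutual TLS means the client verifies the service and the service verifies the machine's identity before any secret moves. The URL, certificate paths, and API shape below are placeholders, not any real service's interface.

```shell
# Sketch only: secrets.example.com and all paths are illustrative.
# --cacert: the machine verifies it is talking to the real service.
# --cert/--key: the service verifies which machine (and thus which
# role) is asking, so RBAC and audit can key off that identity.
curl --cacert /etc/pki/secrets-ca.pem \
     --cert   /etc/pki/host.pem \
     --key    /etc/pki/host-key.pem \
     https://secrets.example.com/secrets/prod/db/password
```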
Furthermore, secrets should be kept off permanent media; per the 12-factor guidelines, they should come from environment variables.
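A minimal sketch of that pattern, with `fetch_secret` standing in for whatever client your secrets service provides (it's a stub here, not a real command):

```shell
#!/bin/sh
# Stub: in reality this would call the secrets service's client.
fetch_secret() { printf 's3cr3t'; }

# The secret is injected into the child process's environment only;
# it never lands in a file or in shell history.
run_app() {
  DB_PASSWORD="$(fetch_secret prod/db/password)" \
    sh -c 'printf "db password is %s" "$DB_PASSWORD"'
}

run_app
```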
Don't entangle secrets management with other tools like configuration management; otherwise you impede yourself from switching architectures down the road.
Don't create workflows that only ops can control, leaving developers out in the cold, or you are increasing organizational friction.
And if your secrets management processes are opaque to security and compliance people, then they won't have the same level of trust that they would have in a transparent system.
Here's an example of how we approach the problem: http://blog.conjur.net/chef-cookbook-uploads-with-conjur
This makes using ssh-agent with a reasonable timeout incredibly painful.
So you're left with either re-entering your passphrase every 5/10/15 minutes, or basically never. Using smartcards for humans and TPMs for servers is a step in the right direction, but it seems ssh-agent is still missing this basic functionality, or am I missing something?
Organizations that want 2-factor auth are typically setting up bastion / jump hosts that require a second factor like a phone-delivered one-time password. This can be configured through the PAM stack.
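For instance, with an OTP PAM module (pam_google_authenticator is one common choice; the exact module depends on your OTP provider), the bastion's sshd can be made to demand both the key and the one-time password. The fragments below are illustrative, not a complete configuration:

```
# /etc/pam.d/sshd (fragment): require an OTP during authentication
auth required pam_google_authenticator.so

# /etc/ssh/sshd_config (fragment): public key AND the PAM-driven prompt
AuthenticationMethods publickey,keyboard-interactive
ChallengeResponseAuthentication yes
UsePAM yes
```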
Once on the bastion, the user can get to other machines within the accessible network using their passwordless ssh key. In effect, each bastion serves as a mini-perimeter.
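Client-side, the hop can be codified in ~/.ssh/config so users don't have to script it by hand (hostnames and the user are placeholders; ProxyJump needs OpenSSH 7.3+):

```
# ~/.ssh/config
Host bastion
    HostName bastion.example.com
    User alice

Host internal-*
    ProxyJump bastion
    IdentityFile ~/.ssh/id_ed25519
```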
And yes, people spend a lot of time entering their second factor. Dozens of times per day is not unusual.
Re-reading your question, I'm not really answering it. But maybe this anecdote is useful in some way :-)
Wouldn't it be better to generate the key in the same place it will be used? Transferring private keys over the network smells bad to me. Is there some requirement for a user to have only one key pair active at a time? If so that is bad. Each "client" environment you use should be able to upload a public key whenever it's convenient.
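That's the normal OpenSSH workflow: generate the pair on the machine that will use it and ship only the public half. The filename and comment below are just examples.

```shell
#!/bin/sh
# Generate the key where it will be used; only the .pub file ever
# leaves the box. mktemp is used here just to keep the sketch tidy.
keyfile="$(mktemp -d)/id_ed25519_ci"
ssh-keygen -q -t ed25519 -N '' -C 'ci@build-host' -f "$keyfile"
# Register "$keyfile.pub" with the service; the private key in
# "$keyfile" is never transferred anywhere.
```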
It's important to note that the 'user' here in Hosted Chef is not a person; it is an identity in the Chef server that is allowed to upload cookbooks. Its scope is limited to only that.
Rotating the deploy user's key when using Hosted Chef is a one-step process, using knife and Conjur together:
```
knife user reregister "conjurbot" | conjur variable values add hostedchef/conjurbot/private_key
```
The stdout of `knife user reregister` is the private key so you can update the variable in Conjur without even seeing the value. You could run this in a cron job if you wanted. Your CI system responsible for uploading cookbooks will pull the new private key next time it runs.
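The cron variant is just the same pipeline on a schedule; the identity name and variable path here are the ones from the example above:

```
# crontab fragment: rotate the Hosted Chef key nightly at 02:00
0 2 * * * knife user reregister "conjurbot" | conjur variable values add hostedchef/conjurbot/private_key
```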
Again, not ideal that Hosted Chef only allows you one keypair per user but we can minimize the threat by rotating the key frequently.
http://blog.conjur.net/conjur-4-4-released
One of their stipulations for the audit was that we not use it for promotional purposes, so I guess an NDA is required to discuss the details.
The tech we use for encryption of secrets is definitely open source here: https://github.com/conjurinc/slosilo
Conjur isn't built on in-house cryptographic software; it uses trusted open-source tools: OpenSSL, PAM, and so on.
Most of our work is open source: https://github.com/conjurinc and https://github.com/conjur-cookbooks
Now, I'm not going to defend these govt. standards as up-to-date or comprehensive. But they're a good philosophical reference for how to manage keys/secrets. Some COTS technologies (which I won't advertise here) try to automate/enforce strong key management for infra, but are typically only affordable for enterprise deployments.
It's much easier to feel comfortable handing out secrets if each of them has a fixed lifespan. It reduces anxiety greatly.
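A toy sketch of the idea, with the lease tracked client-side purely for illustration (a real secrets service would track `issued_at` and enforce the TTL server-side):

```shell
#!/bin/sh
# Illustrative only: in practice the service owns these values.
issued_at=$(date +%s)
ttl=3600   # the secret is good for one hour, then it is simply dead

check_lease() {
  age=$(( $(date +%s) - issued_at ))
  if [ "$age" -ge "$ttl" ]; then
    echo expired
  else
    echo valid
  fi
}

check_lease   # a holder of a leaked copy can do nothing after the hour
```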
- A MongoHQ support person has access to data in a customer's database.
- CircleCI stores everything in the MongoHQ database, which is used to deploy/control customer servers.
- CircleCI customers' CI-controlled environments are mixed with production environments.
I'm guessing everyone just expects most companies, especially those with maybe just Series A financing or close to it, to employ this level of security paranoia?
We all just pretty much assume that they're doing the right thing(TM) with regard to security even after we've seen, time and again, that this is certainly not the case.
The established enterprise hosting companies have security-infrastructure teams that are larger than the entire staff of most startups. Draw your own conclusions about how thorough those startups are with regard to security.