In my understanding, once you remove all the layers of abstraction, at some point it's a bunch of databases and data stores. Someone has to manage them. Why wouldn't a breach of those users let an attacker do whatever they want?
And at a higher level, someone is writing the code that implements such a stringent access system. Why wouldn't a breach of those users (or a rogue employee) be able to accomplish bad things?
Building a large-scale information system is like building a nuclear power station. There are a million ways to screw it up and only a few recognized right ways. If you ignore the best practices, it will eventually destroy your company and harm your users. Twitter has nuked itself here. How can they come back from this? It sure looks like an insider risk mitigation system would have been money well spent.
We all know access controls and multiple operators are good, yeah. But at the heart of it there is still a bunch of Linux machines that have to be managed and deployed to, and as far as I know Linux has no built-in mechanism for checking with operator X before running a command from operator Y.
- at-rest encryption of the datastores, with the content encryption key protected by an HSM. A KMS (key management system) would be the interface to retrieve the key, with access control enabled. An even better solution would be to have the HSM encrypt/decrypt the data directly, so the encryption key never leaves the HSM (or the encryption key is itself encrypted by the HSM). But performance-wise that is not realistic.
- in-transit encryption from the client to the datastore. More likely there is no end-to-end encryption, so admins with access to encryption-termination hosts (reverse proxy, Twitter backend app, datastore, etc.) can read (and maybe alter) the data by doing memory dumps.
- access control for datastore operations: allowing only the Twitter backend and some privileged users to read/write in the datastores, etc.
Doing end-to-end encryption from the client to the datastore with a key per client is possible, but it would make the solution very complex to operate and would hurt performance.
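To make the first bullet concrete, here is a rough sketch of envelope encryption with a KMS-held master key wrapping per-record data keys. Everything here is a toy stand-in: an in-process class plays the HSM/KMS, and an HMAC-SHA256 counter-mode keystream plays the role of real AES.

```python
import hashlib
import hmac
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Toy stream cipher: HMAC-SHA256 keystream in counter mode (NOT real AES).
    out = bytearray()
    for block in range((len(data) + 31) // 32):
        ks = hmac.new(key, nonce + block.to_bytes(8, "big"), hashlib.sha256).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(b ^ k for b, k in zip(chunk, ks))
    return bytes(out)

class ToyKMS:
    """Stands in for an HSM-backed KMS: the master key never leaves this object."""
    def __init__(self):
        self._master = os.urandom(32)

    def wrap(self, data_key: bytes) -> bytes:
        nonce = os.urandom(16)
        return nonce + keystream_xor(self._master, nonce, data_key)

    def unwrap(self, wrapped: bytes) -> bytes:
        nonce, ct = wrapped[:16], wrapped[16:]
        return keystream_xor(self._master, nonce, ct)

# Envelope encryption: each record gets its own data key; only the
# wrapped (KMS-encrypted) key is stored next to the ciphertext.
kms = ToyKMS()
data_key = os.urandom(32)
nonce = os.urandom(16)
ciphertext = keystream_xor(data_key, nonce, b"a private DM")
stored = (kms.wrap(data_key), nonce, ciphertext)

# An admin dumping the datastore sees only wrapped keys and ciphertext;
# decryption requires an authorized call back to the KMS.
wrapped_key, nonce, ct = stored
plaintext = keystream_xor(kms.unwrap(wrapped_key), nonce, ct)
```

The point of the layering is that datastore access and key access are separate privileges, so a single compromised admin account gets only one of the two.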
The tl;dr is that they use hardware security modules (HSMs) with quorum-based access controls. Any administrative action, such as deploying software or changing the list of authorized operators, requires a quorum of operators to sign a command for that action using their respective private keys.
While this system was designed specifically around protecting customers' private keys, you could imagine a similar system around large databases.
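A quorum check like that could look roughly like this. It's a toy sketch: HMAC stands in for the operators' real asymmetric keys (so signing and verification share a key, which a real HSM would avoid by holding only public keys), and the operator names and quorum size are made up.

```python
import hashlib
import hmac

# Each operator holds a secret key; the verifier (standing in for the HSM)
# holds the matching verification material.
OPERATOR_KEYS = {op: hashlib.sha256(op.encode()).digest()
                 for op in ("alice", "bob", "carol", "dave")}
QUORUM = 3  # signatures required before a command is executed

def sign(operator: str, command: bytes) -> bytes:
    return hmac.new(OPERATOR_KEYS[operator], command, hashlib.sha256).digest()

def quorum_approved(command: bytes, signatures: dict) -> bool:
    # Count distinct operators whose signature verifies over this exact command.
    valid = sum(
        1 for op, sig in signatures.items()
        if op in OPERATOR_KEYS
        and hmac.compare_digest(sig, sign(op, command))
    )
    return valid >= QUORUM

cmd = b"deploy build 1234 to prod"
sigs = {op: sign(op, cmd) for op in ("alice", "bob", "carol")}
approved = quorum_approved(cmd, sigs)                       # 3 of 4 -> approved
solo = quorum_approved(cmd, {"alice": sign("alice", cmd)})  # 1 of 4 -> rejected
```

Because the signature covers the command bytes themselves, a rogue operator can't take signatures gathered for one action and replay them against a different one.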
Not necessary
> or filesystem access
Also no
> or ability to modify the fleet.
Not that either. It feels like the conversation around these things is stuck in the far past. Large-scale organizations can and have driven the number of people with root passwords to zero. "Filesystem access" shouldn't be as easy as you're implying, and it also shouldn't be of any use, since everything in the files ought to be separately encrypted with keys that can only be unwrapped by authorized systems.
As for the last thing you said about Linux systems starting processes: even a modest application of imagination leads you to an init daemon that can enforce the pedigree of every process on the machine.
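A minimal sketch of that idea, assuming the simplest possible pedigree check (a digest allowlist; a real init would verify a signed manifest instead): a launcher that refuses to start any binary whose hash isn't approved.

```python
import hashlib
import os
import subprocess
import sys
import tempfile

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

class ToyInit:
    """Launcher that refuses to start any program not on its digest allowlist."""
    def __init__(self, approved_digests: set):
        self.approved = approved_digests

    def spawn(self, interpreter: str, script: str) -> int:
        if sha256_of(script) not in self.approved:
            raise PermissionError(f"unapproved binary: {script}")
        return subprocess.run([interpreter, script]).returncode

# Demo: approve one script, run it, then try again after tampering with it.
with tempfile.TemporaryDirectory() as d:
    good = os.path.join(d, "good.py")
    with open(good, "w") as f:
        f.write("print('hello')\n")
    init = ToyInit({sha256_of(good)})
    ok = init.spawn(sys.executable, good)  # digest matches, script runs

    with open(good, "a") as f:             # "attacker" modifies the script
        f.write("print('pwned')\n")
    try:
        init.spawn(sys.executable, good)
        tampered_blocked = False
    except PermissionError:
        tampered_blocked = True            # modified binary is refused
```

Real-world equivalents of this exist (signed-binary enforcement, verified boot chains); the sketch just shows the enforcement point is the launcher, not the operator.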
Presumably this database runs on some machine? And this machine was logged into in order to install and set up the database?
Encrypted rows of data are meaningless to an "admin" who can query to their heart's content but will never be able to decrypt the result set. On the other hand, the layers on top (such as the web tier that emits the plaintext) may have the keys to decrypt, but lack the privileges to run around in the database; from that level, they must pass along the user's credentials to obtain user-specific content.
Since people don't search by content on Twitter (afaik) and only metadata indexes are used (such as hashtags, followers, following, date), this is entirely doable for something like Twitter.
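The split described above (queryable plaintext metadata, opaque content blobs) can be sketched with sqlite3. The schema and column names are made up, and an HMAC-SHA256 keystream stands in for real authenticated encryption; the key lives only in the web tier, not with the DB admin.

```python
import hashlib
import hmac
import os
import sqlite3

KEY = os.urandom(32)  # held by the web tier, never by the database admin

def toy_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Symmetric toy cipher: XOR with an HMAC-SHA256 counter-mode keystream.
    ks, counter = b"", 0
    while len(ks) < len(data):
        ks += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, ks))

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE dms (
    sender TEXT, recipient TEXT, sent_at TEXT,  -- plaintext metadata, indexable
    nonce BLOB, body BLOB                       -- content ciphertext
)""")
db.execute("CREATE INDEX idx_recipient ON dms(recipient, sent_at)")

def store_dm(sender: str, recipient: str, sent_at: str, text: str) -> None:
    nonce = os.urandom(16)
    db.execute("INSERT INTO dms VALUES (?, ?, ?, ?, ?)",
               (sender, recipient, sent_at, nonce,
                toy_crypt(KEY, nonce, text.encode())))

store_dm("alice", "bob", "2020-07-15", "the actual secret")

# An admin can run metadata queries freely...
row = db.execute(
    "SELECT sender, nonce, body FROM dms WHERE recipient = 'bob'").fetchone()
admin_view = row[2]  # ...but the body column is just ciphertext bytes

# The web tier, holding KEY, decrypts on behalf of the authenticated user.
plaintext = toy_crypt(KEY, row[1], row[2]).decode()
```

All the indexes the product needs (recipient, date, hashtag) live in plaintext columns, so query performance is unaffected by the content being opaque.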
There is also homomorphic encryption, but I'm not sure the tech there has reached acceptable performance levels.