Shipping Multi-Tenant SaaS Using Postgres Row-Level Security (opens in new tab)

(thenile.dev)

254 pointscapiki3y ago117 comments

117 comments

Be aware when using RLS with views: By default the RLS policy will be executed with the permissions of the owner of the view instead with the permissions of the user executing the current query. This way it can easily happen that the RLS policy will be bypassed because the owner of the view is a admin account or the same account that owns the underlying table (see the the gotchas section of the original post).

However, upcoming PostgreSQL 15 adds support for security invoker views: https://github.com/postgres/postgres/commit/7faa5fc84bf46ea6... That means you can then define the security_invoker attribute when creating a view and this "... causes the underlying base relations to be checked against the privileges of the user of the view rather than the view owner" (see https://www.postgresql.org/docs/15/sql-createview.html) PG15 beta 1 release notes: https://www.postgresql.org/about/news/postgresql-15-beta-1-r...

markhalonen3y ago

This was a shock to us after using RLS for a while. The solution outlined here worked great for us: https://www.benburwell.com/posts/row-level-security-postgres...

infogulch3y ago

So they create table-valued functions which support the "SECURITY INVOKER" security context, and then select from that function to form the view. I suppose there's a feature request somewhere to support the "SECURITY INVOKER" feature for views directly?

infogulch3y ago

Well well lookie here:

commitdiff 2022-03-22: Add support for security invoker views. - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

discussion 2021-12-17: [PATCH] Add reloption for views to enable RLS - https://www.postgresql.org/message-id/b66dd6d6-ad3e-c6f2-8b9...

explanatory blog post: 2022-03-22: Waiting for PostgreSQL 15 – Add support for security invoker views. - https://www.depesz.com/2022/03/22/waiting-for-postgresql-15-...

This seems to be slated for PG15: https://www.postgresql.org/docs/15/release-15.html#id-1.11.6...

> E.1.3.1.6. Privileges: Allow view access to be controlled by privileges of the view user (Christoph Heiss) Previously, view access could only be based on the view owner.

Syntatically it will look like:

    CREATE VIEW vista WITH (security_invoker=true) AS SELECT 'Hello World';

alberth3y ago

That seems like a bug to me, and a significant one as well.

The underlining promise of RLS (sometimes even referred to as “virtual private database”) in an RDBMS, is that data should never leak because it’s handled transparently by the db.

This seems like a significant leakage point that the user has to personally manage.

Spivak3y ago

This pretty much mirrors stored procedures though which have the option of running as the definer or the invoker. Breaking this with “when RLS is enabled stored procedures and views set to run with the permissions of the definer intersect with the RLS policy of the invoker” is crazy weird.

Maybe it would actually be good behavior but it would super super unintuitive.

alberth3y ago

Isn’t it also super unintuitive that developers will have to manage partitioning customer data at the Application Layer even though they think they are using a Data Layer security policy.

When RLS is enabled, their should never be a situation where Customer A might be able to gain access to Customer B data. That’s literally the entire objective of the feature.

This is giving people a false sense of security, and for multi-tenant applications has massive consequences since this will leak data.

mixmastamyk3y ago

Assuming correct, it's an absolutely horrible default that needs to be changed.

Imagine if the default Unix permission was to set files with setuid.

bearjaws3y ago

This is such a killer feature in PG, my new job uses it and it makes audits of our tenancy model dead simple.

Coming from a SaaS company that used MySQL, we would get asked by some customers how we guarantee we segmented their data, and it always ended at the app layer. One customer (A fortune 10 company) asked if we could switch to SQL Server to get this feature...

Our largest customers ask how we do database multi-tenant and we point to our SDLC + PG docs and they go 'K'.

e1g3y ago

Every B2B client who asked us how we handle multi-tenancy also asked how we ensure their data is erased at the end of the contract. Using a shared database with RLS means you have to go through all DB backups, delete individual rows for that tenant, then re-generate the backup. That’s a non-starter, so we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier.

brightball3y ago

I filled out a ton of enterprise questionnaires on this stuff before and we just told people that it would be deleted when the backups expired after X days because we didn't have the capability to delete specific rows from our backups. Nobody ever argued.

There's not a single customer I've ever run across who's going to halt a contract because you can't purge their data from your backups fast enough. They're signing up because of what you offer, not the termination clause.

e1g3y ago

A recent example from the other side: a client contacts me and says they will have to exit from our existing contract unless we can update our (AWS) infrastructure to use their (AWS) encryption keys for servers and databases handling their tenancy. In Enterprise, some tenants are very opinionated about what cloud you use and how their data lives/flows within it. I run all our infosec, including SOC2 & ISO27001 programs, and I know that using their encryption keys is nothing but security theater. But with $500k p.a. on the line, I also know when it's showtime.

3 more replies

voberoi3y ago

This is the way -- also never had an issue across US healthcare and enterprise SaaS.

coenhyde3y ago

i did the same with the same results

2 more replies

bearjaws3y ago

We usually write a "reasonable best effort" clause into our deletion, that it will 100% be deleted from production within 30 days and automatically fall out of backups 60 days from there. This also helps since we can't control our downstream vendors such as Twilio, AWS SES, etc, who all have their own legal obligations and time frames.

Even for large health systems they have been okay with it.

jerryjerryjerry3y ago

I think TTL feature provided by some DB vendors are actually orthogonal to multi tendency, where the former deals with life cycles policy of data but the original problem to delete the data of certain user is more related to privacy policy of data, though overlap may exists.

Dave3of53y ago

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

When you say 1 DB I suspect you mean you have a single DB Server and multiple DB's on that server. Then I don't think this really solves the data-residency problem as the clients data is just in a different DB but still on the same instance. It makes other problems for you as well for example you now have 2 DB's to run maintenance, upgrades, data migrations on. Current company uses a similar model for multiple types of systems and it makes upgrading the software very difficult.

It also makes scaling more difficult as instead of having a single DB cluster that you can tweek for everyone you'll need to tweek each cluster individually depending on the tenants that are on those clusters. You also have a practical limit to how many DB's you can have on any physical instance so your load balancing will become very tricky.

There are other problems it causes like federation which Enterprise Customers often want.

hobs3y ago

Agree - performance on row level security (at least on SQL Server) is terrible, sharing by database is fairly easy.

axlee3y ago

It makes BI work an absolute hellscape as well. Tradeoffs.

mrtesthah3y ago

Just make the application collect the statistics you need. Put them into a separate shared DB or use one of the many existing off the shelf SaaS collection frameworks.

spiffytech3y ago

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

I'm surprised this simplified balancing for you. When I worked somewhere with per-customer DBs, we had constant problems with rebalancing load. Some customers grew too big for where we put them, some nodes usually performed fine until the wrong subset of customers ran batch jobs simultaneously, etc.

jquery_dev3y ago

How do you manage your backend in this case? Do you have an insurance of backend for each customer or do you allow backend to make connections to all the DBs.

I'm interested in doing similar and wondering about the best way to handle the routing between the databases from a single backend.

mason553y ago

It really depends on your requirements, both functional and cost. Having a full stack per customer can be great for a lot of reasons. It's probably the safest because you never have to worry about something getting messed up in the code and creating cross-customer access. Once you're sure the environment is set up correctly you can sleep well at night. You also don't have to worry about one customer doing something to impact the performance of other environments (unless you're using shared infra, like putting all your DBs on a single cluster). And it can make maintenance easier, for example you can data migrations can start with small and/or less important customers for practice. It also can give you more flexibility if you need to make special snowflakes (i.e. some big customer wants weird IP whitelisting rules that can't work with your normal env setup).

Downsides are that it's probably more expensive and more work. Even if your infra spin up is totally automated, you still need to keep track of all the environments, you still need to keep your Infrastructure-as-Code (e.g. your terraform scripts) up to date, more can go wrong when you make changes, there's more chance for environments to drift.

So, in short, separate stacks usually means more safety & simpler application architecture in exchange for more cost and more effort to manage the fleet.

gervwyk3y ago

Also second this, we even split our AWS org into an AWS account per tentant. Although, this will maybe be a problem if we have +100s of clients. But it makes onboarding and off-loading simple.

HatchedLake7213y ago

It depends on an annual contract value (ACV), doesn't it? You can't give an AWS account to every $99 p/m plan, but you can for enterprise $50-100k+ deals.

1 more reply

jamesfinlayson3y ago

Is account creation automated?

I know there are some resources where you can only have one per region (I think you can only have one AWS::EC2::VPCEndpoint per... type and service per region) but I don't know if letting multiple tenants use the same VPC endpoint is a risk or not.

1 more reply

throw031720193y ago

Each client is running on their own instances / load balancers?

BeefWellington3y ago

One DB (or perhaps schema, depending on DBMS) per tenant is the way to go IMO. It simplifies so much of your work, from DR to deletion to scaling.

SahAssar3y ago

I think the way to handle this (based on how many companies handle GDPR compliance) is to not keep backups older than X months (usually 3 months) and have a clause that all data past that time is deleted.

eastbound3y ago

I honestly don’t understand how Oracle is still alive. Postgres has so many of these killer features.

Also, I wonder how others do tenant separation, what other solutions there are.

mritchie7123y ago

Legacy.

If you have thousands of lines of code relying on Oracle the cost to migrate would be enormous.

gav3y ago

Ignoring the cost, there's the risk/reward alignment you see in large enterprises.

Imagine you're a new CIO. You know you're probably looking at a 3-5 year tenure at this new company and you want to lead with some big wins to set the tone and show your value.

You're reviewing proposals from your senior leadership. One of the options is an Oracle migration. It could cost a million dollars to migrate, but you'd save a million dollars a year going forward. Oracle runs your mission-critical internal systems, any issues with the migration and the system you migrate to is going to cause significant financial and reputation damage. You'll have to defend this decision if anything goes wrong, i.e. you've absorbed a lot of risk but a lot less upside to you personally.

What do you do? You put the proposal to the side and look for something that has a lot better upside.

2 more replies

toomuchtodo3y ago

Amazon has a great post on this topic.

https://aws.amazon.com/blogs/aws/migration-complete-amazons-...

I thought it was cool they retrained their Oracle DBAs into other roles as part of the project.

1 more reply

beachy3y ago

Oracle has had the ability to do this for decades ("virtual private database"), so whatever is keeping them alive, it's nothing to do with this particular nifty Postgres feature.

ibejoeb3y ago

PostgreSQL is a great product, but Oracle has so many more features. Even just in the query language. There's so much power embedded in just the model clause that would be all custom software in Postgres.

If you buy Oracle, you should use Oracle. Like really lean into it. If you really need it, it will be worth the money. I don't like dealing with Oracle sales, but the product is killer.

throwaway7875443y ago

They're like a tick. Very good at burrowing in and hard to remove. They have a lot of clients for whom a dozen million dollars is a drop in the bucket, and moving away is a decade-long millions-of-dollars project.

revskill3y ago

Sharding is the only scalable way per my experience. The point about scalability here is, i can control the load as the data gets bigger.

Scarbutt3y ago

oracle has flashback

gz53y ago

And PG supports layer 3 shut-down of link listeners and inbound fw ports. So you can combine the L7 tenancy with a secure networking architecture which eliminates the problems of managing firewalls and ACLs. One of the open source examples: https://youtu.be/s-skpw7bUfI

ksec3y ago

Fortune Top 10

1 Walmart

2 Amazon

3 Apple

4 CVS Health

5 UnitedHealth Group

6 Exxon Mobil

7 Berkshire Hathaway

8 Alphabet

9 McKesson

10 AmerisourceBergen

We can rule out 2,3,7,8 …

simonw3y ago

I don't fully understand the performance implications here.

Say I was using this for a blog engine, and I wanted to run this SQL query:

    select * from entries;

But I actually only want to get back entries that my current user is allowed to view - where author_id = 57 for example.

Would PostgreSQL automatically turn the above query into the equivalent of this:

    select * from entries where author_id = 57;

And hence run quickly (assuming there's an index on that author_id column)?

Or would it need to run an additional SQL query check for every single row returned by my query to check row permissions, adding up to a lot of extra overhead?

ossopite3y ago

yes, postgres will add such a condition to the query and in simple cases like this is able to use a corresponding index

unfortunately this can break down in more complex cases. roughly postgres trusts a limited set of functions and operators not to leak information about rows (e.g. via error messages) that the RLS policy says a query should not be able to see. that set includes basic comparisons but not more esoteric operations like JSON lookups. at some point postgres will insist on checking the RLS policy result for a row before doing any further work, which can preclude the use of indexes

lmeyerov3y ago

We were looking at RLS, various ABAC integrated frameworks (casbin, ..), and zanzibar clones late last year --

* RLS is super appealing. Long-term, the architecture just makes so much more sense than bringing in additional maintenance/security/perf/etc burdens. So over time, I expect it to hollow out how much the others need to do, reducing them just to developer experience & tools (policy analysis, db log auditing, ...). Short-term, I'd only use it for simple internal projects because cross-tenant sharing is so useful in so many domains (esp if growing a business), and for now, RLS seems full of perf/expressivity/etc. footguns. So I wouldn't use for a SaaS unless something severely distinct tenant like payroll, and even then, I'd have a lot of operational questions before jumping in.

* For the needed flexibility and app layer controls, we took the middle of casbin, though others tools emerging to. Unlike the zanzibar style tools that bring another DB + runtime + ..., casbin's system of record is our existing system of record. Using it is more like a regular library call than growing the dumpster fire that is most distributed systems. Database backups, maintenance, migrations, etc are business as usual, no need to introduce more PITAs here, and especially not a vendor-in-the-middle with proprietary API protocols that we're stuck with ~forever as a dependency.

* A separate managed service might make zanzibar-style OK in some cases. One aspect is ensuring the use case won't suffer the view problem. From there, it just comes down to governance & risk. Auth0 being bought by Okta means we kind of know what it'll look like for awhile, and big cloud providers have growing identity services, which may be fine for folks. Startup-of-the-month owning parts of your control plane is scarier to me: if they get hacked, go out of business, get acquired by EvilCorp or raise $100M in VC and jack up prices, etc.

There's a lot of innovation to do here. A super-RLS postgres startup is on my list of easily growable ideas :)

On a related note: We're doing a bunch of analytics work on how to look at internal+customer auth logs -- viz, anomaly detection, and supervised behavioral AI -- so if folks are into things like looking into account take overs & privilege escalations / access abuse / fraud in their own logs, would love to chat!

jzelinskie3y ago

As the developer of an external authorization system (full disclosure)[0], I feel obligated to chime in the critiques of external authorization systems in this article. I don't think they're far off base, as we do recommend RLS for use cases like what the article covers, but anyways, here's my two cents:

1+2: Cost + Unnecessary complexity: this argument can be used against anything that doesn't fit the given use case. There's no silver bullet for any choice of solution. You should only adopt the solution that makes the most sense for you and vendors should be candid about when they wouldn't recommend adopting their solution -- it'd be bad for both the users and reputation of the solution.

3: External dependencies: That depends on the toolchain. Integration testing against SpiceDB is easier than Postgres, IMO [1]. SpiceDB integration tests can run fully parallelized and can also model check your schema so that you're certain there are no flaws in your design. In practice, I haven't seen folks write tests to assert that their assumptions about RLS are maintained over time. The last place you want invariants to drift is authorization code.

4: Multi-tenancy is core to our product: I'm not sure I'm steel-manning this point, but I'll do my best. Most companies do not employ authorization experts and solutions worth their salt should support modeling multi-tenant use cases in a safe way. SpiceDB has a schema language with idioms and recommendations to implement functionality like multi-tenancy, but still leaves it in the hands of developers to construct the abstraction that matches their domain[2].

[0]: https://github.com/authzed/spicedb

[1]: https://github.com/authzed/examples/tree/main/integration-te...

[2]: https://docs.authzed.com/guides/schema

gwen-shapira3y ago

The blog explicitly said that if the requirements involve actual authorization models (beyond simple tenancy) then RLS is not the best fit (see: https://thenile.dev/blog/multi-tenant-rls#if-you-have-sophis...).

I think this covers both the complexity aspect and the difference between what you get from RLS and what external authz brings to the table (schema, for example).

I do think that RLS is a great way for a company without authz experts to built a multi-tenant MVP safely. I've yet to see a single pre-PMF company that worries about authorization beyond that, this is a series-B concern in my experience.

semitones3y ago

Btw that's a localhost link

gwen-shapira3y ago

oops :) Too many tabs. Fixed and thank you.

shaicoleman3y ago

We're currently using the schema-per-tenant, and it's working very well for us:

* No extra operational overhead, it's just one database

* Allows to delete a single schema, useful for GDPR compliance

* Allows to easily backup/restore a single schema

* Easier to view and reason about the data from an admin point of view

* An issue in a single tenant doesn't affect other tenants

* Downtime for maintenance is shorter (e.g. database migration, non-concurrent REINDEX, VACUUM FULL, etc.)

* Less chance of deadlocks, locking for updates, etc.

* Allows easier testing and development by subsetting tenants data

* Smaller indexes, more efficient joins, faster table scans, more optimal query plans, etc. With row level security, every index needs to be a compound index

* Easy path to sharding per tenant if needed. Just move some schemas to a different DB

* Allows to have shared data and per-tenant data on the same database. That doesn't work with the tenant-per-database approach

There are a few cons, but they are pretty minor compared to the alternative approaches:

* A bit more code to deal in the tenancy, migrations, etc. We opted to write our own code rather than use an existing solution

* A bit more hassle when dealing with PostgreSQL extensions . It's best to install extensions into a separate extensions schema

* Possible caching bugs so you need to namespace the cache, and clear the query cache when switching tenant

* The security guarantees of per tenant solution aren't perfect, so you need to ensure you have no SQL injection vulnerabilities

bvirb3y ago

We ran a multi-tenant SaaS product for years w/ a schema-per-tenant approach. For the most part it all worked pretty great.

We ran into issues here and there but always found a way to work around them:

* Incremental backups were a pain because of needing to lock so many objects (# of schemas X # of tables per schema).

* The extra code to deal w/ migrations was kinda messy (as you mentioned).

* Globally unique IDs become the combination of the row ID + the tenant ID, etc...

For us though the real deal-breaker turned out to be that we wanted to have real foreign keys pointing to individual rows in tenant schemas from outside of the tenant schema and we couldn't. No way to fix that one since with multi-schema the "tenant" relies on DB metadata (the schema name).

We ended up migrating the whole app to RLS (which itself was a pretty interesting journey). We were afraid of performance issues since the multi-schema approach kinda gives you partitioning for free, but with the index usage on the RLS constraints we've had great performance (at least for our use case!).

After quite a bit of time working with both multi-schema & RLS I probably wouldn't go back to multi-schema unless I had a real compelling reason to do so due to the added complexity. I really liked the multi-schema approach, and I think most of the critiques of it I found were relatively easy to work around, but RLS has been a lot simpler for us.

POPOSYS3y ago

Would you like to explain the deal-breaker a little bit more? I do not understand the limitation you hit. It seems like it is no problem to access foreign keys in a different schema as long permissions do allow that. Thanks!

uhoh-itsmaciek3y ago

There are some other cons:

Memory usage and I/O can be less efficient. Postgres handles table data in 8kb pages, so even if you're just reading a single row, that reads 8kb from disk and puts 8kb in the Postgres buffer cache, with that row and whatever happens to be next to it in the physical layout of the underlying table. Postgres does this because of locality of reference: it's cheaper to bulk-load data from disk, and, statistically speaking, you may need the adjacent data soon. If each user is touching separate tables, you're loading a page per row for each user, and you're missing out on some of the locality benefits.

Another problem is monitoring (disclosure: I work for pganalyze, which offers a Postgres monitoring service). The pg_stat_statements extension can track execution stats of all normalized queries in your database, and that's a very useful tool to find and address performance problems. But whereas queries like "SELECT * FROM posts WHERE user_id = 123" and "SELECT * FROM posts WHERE user_id = 345" normalize to the same thing, schema-qualified queries like "SELECT * FROM user_123.posts" and "SELECT * FROM user_345.posts" normalize to different things, so you cannot easily consider their performance in aggregate (not to mention bloating pg_stat_statements by tracking so many distinct query stats). This is the case even when you're using search_path so that your schema is not explicitly in your query text.

Also, performance of tools like pg_dump is not great with a ton of database objects (tables and schemas) and, e.g., you can run into max_locks_per_transaction [1] limits, and changing that requires a server restart.

I wouldn't say you should never do schema-based multi-tenancy (you point out some good advantages above), but I'd be extremely skeptical of using it in situations where you expect to have a lot of users.

[1]: https://www.postgresql.org/docs/current/runtime-config-locks...

andy_ppp3y ago

I find adding loads of stuff to Postgres exciting and fun, but I want all of my logic in the code in GitHub, rather that floating around in my global data store. Has anyone thought about a data layer that allows you to define this stuff programmatically rather than in SQL but then it configures your data layer to work like this. Not necessarily an ORM but more a business logic layer that compiles everything down to use features like this. Or maybe even a data layer that is a set of programmatic building blocks that works as described?

hamandcheese3y ago

Flyway is a plain-sql migration tool with support for “repeatable migrations” which somewhat do what you want:

https://flywaydb.org/documentation/tutorials/repeatable

macNchz3y ago

I’ve set something like that up a handful of times in a kind of ad-hoc manner, by subclassing/extending the autogeneration tools from existing db migration frameworks to just detect changes in a directory of .sql files. Has worked pretty well to keep stored procedures/triggers/materialized views up to date with the repo.

lmeyerov3y ago

Yep -- postgres enforces, while config-as-code can inject as usual bc the whole point is it is just SQL, so policy changes are just migrations (and subject to SDLC)

mixmastamyk3y ago

You're describing an ORM, or perhaps SQLAlchemy, which has a lower level interface. .sql files work fine in version control as well. "create or replace …" pattern can make them idempotent.

simlevesque3y ago

Supabase kinda does this.

LAC-Tech3y ago

Does superbase basically give you a pgAdmin accessible instance to play with, then generate a JSON API using postgREST? It's not 100% clear to me from skimming the docs.

keybits3y ago

Yes

uhoh-itsmaciek3y ago

>Another issue we caught during testing was that some requests were being authorized with a previous request’s user id.

This is the terrifying part about RLS to me: having to rely on managing the user id as part of the database connection session seems like an easy way to shoot yourself in the foot (especially when combined with connection pooling). Adding WHERE clauses everywhere isn't great, but at least it's explicit.

That said, I've never used RLS, and I am pretty curious: it does seem like a great solution other than that one gotcha.

koolba3y ago

If you do this right, it ends up in just one place and the code that handles checking in / out connections from your pool will handle wrapping your connection usage with the appropriate session context. And of course subsequently clearing it with a DISCARD ALL.

sgarman3y ago

Am I right in my understanding that EVERY request that comes in to their api creates a new connection to the database? What about reusing connections with connection pools or one level up using pgbouncer or thing. Can you actually use RLS while reusing connections?

lemax3y ago

It's possible to implement this without creating new connections to the database for each request by using SET LOCAL and wrapping every query in a transaction. Instead of applying RLS based on the current user, you apply RLS based on the parameter value you set at the beginning of the transaction. You can set this parameter value based on the user session in your application.

Your RLS policy looks as follows: CREATE POLICY tenant_${tableName}_isolation_policy ON "${tableName}" USING ("tenant_id" = current_setting('app.current_tenant');

Your queries look something like this: BEGIN TRANSACTION SET LOCAL app.current_tenant = '${tenant}'; SELECT * from some_table END TRANSACTION;

You can even initialize your writes with a `tenant_id` column defaulted to your `current_setting('app.current_tenant')`

rst3y ago

Nope. Quoting the article itself:

"In the traditional use case of direct db access, RLS works by defining policies on tables that filter rows based on the current db user. For a SaaS application, however, defining a new db user for each app user is clunky. For an application use case you can dynamically set and retrieve users using Postgres’ current_settings() function ( i.e:

  SET app.current_app_user = ‘usr_123’

and

  SELECT current_settings(‘app.current_app_user)

)."

The policies that they define reference these settings, so they can do a "set" at the start of processing every web request, on a pre-existing db connection.

cstejerean3y ago

You can reuse the connection with a connection pool and use SET ROLE when you check it out.

ishanr3y ago

I use RLS quite heavily for my app Sudopad (https://sudopad.com) and it has been working quite well so far.

One gotcha specific to Supabase (where I run the backend) is because there is no anonymous login in Supabase, turning on RLS and using database functions marked as security definers are the way to go. Otherwise there is no easy way of stopping a 'select * from x' since some rows might not have a user_id if they are anonymous and I still want people to access the row if they know a specific primary key uuid.

mgl3y ago

Row-level security is always a tricky and hard to enforce assumption as this not how we relational databases really.

Much bigger fan of the approach described here:

Scalability, Allocation, and Processing of Data for Multitenancy

https://stratoflow.com/data-scalability-allocation-processin...

fswd3y ago

I use this for a startup in a re-write of their solution. It simplifies my queries and mutations, and security concerns. It also drammatically reduces the complexity of my code. There's also ROLES (Guest/Public user, Authenticated, Admin) and combinding the roles with Row Level Security.

I like it so much I don't want to go back!

ei8ths3y ago

I needed this two years ago, i was looking at this but couldn't figure out how to do it with a existing db connection pool to reuse connections. I might be migrating to this soon so that things will be more isolated from the tenants.

a-dub3y ago

this is cool. next up, row level encryption with private information retrieval methods for enabling queries and searches homomorphically (on data encrypted by the client that the service provider never has a key for).

POPOSYS3y ago

This is interesting - do you have any hits / docs on how to implement that?

a-dub3y ago

it's an open area of research as far as i know. the technical terms are "private information retrieval", "private information query", "homomorphic encryption" and "secure multiparty computation."

it's some of the coolest stuff in cryptography but it also tends to be experimental or very computationally expensive. zero knowledge proofs are the simplest form i know of.

as far as i know, row level end to end encrypted databases with indices that preserve privacy do not exist yet, but i fully expect that they will.

xsreality3y ago

Curious how advanced authorization (like ABAC) can be implemented with RLS. For example returning resources that are accessible to the team I belong to within the tenant.

spacemanmatt3y ago

If I were leaning into RLS today I would do it through PostgREST

pikdum3y ago

PostGraphile is also a really neat tool, using GraphQL instead of REST.

andrewstuart3y ago

I once implemented RLS/Postgres for Django.

It worked pretty well.

The basic mechanism was to intercept all outbound SQL queries and wrap them in postgres environment variables that set up the RLS.

santa_boy3y ago

This is awesome. Are there any similar features that can be implemented with Mariadb? One of my favorite products is integrated with Mariadb.

kache_3y ago

Context aware data access is really cool. And hard :)

jtwebman3y ago

What about stop using ORM abstractions as an option then it is much harder to forget needed filters?

nbevans3y ago

Using RLS to implement multi-tenancy is a terrible idea. Just deploy a database per tenant. It's not hard. Why overcomplicate it?

bicijay3y ago

Deploying a database per tenant is not that easy. You have a lot of new overhead, migrations become a pain in the ass (already are) and a lot of other little problems...

I would say a database per tenant is overcomplicating it.

mbreese3y ago

A database per tenant makes the rest of the workflow significantly easier though. No need to add clauses to SQL WHERE statements for users/groups. Queries are faster (less data). And data can be moved much easier between servers.

Yes, it does add extra overhead at account creation, during DB migrations, and for backups.

But if you don’t need cross-account or public data access, it can make life much easier.

zie3y ago

> No need to add clauses to SQL WHERE statements for users/groups.

This is basically what RLS does for you. You specify the access and you specify the current user(via a connection, SET ROLE, etc). Then it does all that complicated query filtering stuff for you, to ensure you don't screw it up.

> Queries are faster (less data). And data can be moved much easier between servers.

Not really, the overhead is just different(and likely more of it) in your solution. It's not wrong nor is using RLS right.

nickjj3y ago

> No need to add clauses to SQL WHERE statements for users/groups.

ORMs have support to add in these conditions automatically, you only have to define the condition in 1 function. This gives you the best of both worlds. You can use your FK based separation and have application level guarantees that your tables with a tenant id are scoped to a tenant by default and you have the option to query multiple tenants to do super admin level reporting across tenants by using explicit function calls that basically say "yes I know I'm bypassing the auto-enforced tenancy check".

There's no performance issues since it's all 1 schema and there's no nightmare or overhead of having to run a DB migration for each database.

I'm not sure why the author didn't include this in their blog post because it's a popular option in Rails, Flask (SQLAlchemy), etc..

Gigachad3y ago

You end up building just a bit of automation around it and its fine. The migration isn't any harder, you just run an ansible job or something to roll it out to all databases.

pg_bot3y ago

Deploying multiple databases is typically costly in the infrastructure as a service space. Plus you have more operational overhead in ensuring backups work and keeping things secure. It's much easier to use Postgres' schemas to segment the data within one single database. Frankly schemas are much easier to reason about, maintain, scale, and keep compliant than row level security.

icedchai3y ago

I worked for a SaaS that did that. We had 1000's of clients. Migrations would take all afternoon. We had custom connection pools, custom migration runners, and other weird stuff (this was about 10 years ago.) It was way too complicated, especially since most of the tenants had very little data.

jasongi3y ago

Only really worth it if each tenant is creating enough value to justify this. Hard to see how a SaaS product with a cheap/free tier would pull this off.

icedchai3y ago

We did this for the free tier at a previous company. It was terrible in terms of overhead.

paxys3y ago

Is having to write "SELECT [...] WHERE user_id=<123>" really considered a security hole? Isn't that how like every service in existence operates? Coming up with complicated auth systems and patterns just because you are scared you will accidentally skip that WHERE clause seems bizarre to me.

nordsieck3y ago

> Is having to write "SELECT [...] WHERE user_id=<123>" really considered a security hole? Isn't that how like every service in existence operates? Coming up with complicated auth systems and patterns just because you are scared you will accidentally skip that WHERE clause seems bizarre to me.

Is having to avoid use after free really considered a security hole? Isn't that how like every program in existence operates? Coming up with complicated languages and frameworks just because you're scared you will accidentally use a variable after it's been freed seems bizarre to me.

As it turns out, humans are bad at being consistent, whereas computers are much better. Maybe this particularly solution isn't "the right thing", but it's at least an attempt at modifying the environment such that mistakes no longer happen. And at a meta level, that is precisely the right thing to do.

bvirb3y ago

It's pretty nice using RLS that the entire query will follow the rules applied in the database. So for complex queries with say joins and/or subqueries they will all automatically follow the RLS policies as well. In our case we also have some global lookup tables that don't have RLS policies which can also be joined.

We've found it pretty nice to cut out a whole class of possible bugs by being able to defer it to the database level. At the application level we end up with a wrapper that sets (and guarantees unsetting) multi-tenant access to the correct tenant, and then we never have to add "tenant_id = ..." anywhere, regardless of the query. Regardless of whether we forget in some query (which we almost surely would), it cuts out quite a bit of extra code.

You can also do some cool stuff like add RLS policies for read-only multi-tenant access. Then you can query data across multiple tenants while enforcing that nothing accidentally gets written.

mbreese3y ago

From my perspective, it isn’t the security aspects that are limiting, but the usability.

If you want to have any access controls that isn’t a simple user_id==123, SQL WHERE clauses can get complicated.

Users, groups, or any kind of fine grained access control can make simple queries non-trivial. It’s even worse if a user can be authorized to view data across different accounts.

lemax3y ago

In my experience we've looked toward this kind of solution in a large legacy single-tenant application that wants to go multi tenant with more safety guarantees.

thrownaway5613y ago

I think this is mainly an issue when you're using RAW SQL statements. If you're using an ORM, there are many ways to add a where clause to the statements automatically without having to update your code every where.

galaxyLogic3y ago

When you say 'user_id' do you mean each end-user of the system or each customer?

I assume you have a few customers and then very many users belonging to each customer.

j / k navigate · click thread line to collapse

117 comments

mkurz3y ago

markhalonen3y ago

This was a shock to us after using RLS for a while. The solution outlined here worked great for us: https://www.benburwell.com/posts/row-level-security-postgres...

infogulch3y ago

Well well lookie here:

commitdiff 2022-03-22: Add support for security invoker views. - https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit...

discussion 2021-12-17: [PATCH] Add reloption for views to enable RLS - https://www.postgresql.org/message-id/b66dd6d6-ad3e-c6f2-8b9...

explanatory blog post: 2022-03-22: Waiting for PostgreSQL 15 – Add support for security invoker views. - https://www.depesz.com/2022/03/22/waiting-for-postgresql-15-...

This seems to be slated for PG15: https://www.postgresql.org/docs/15/release-15.html#id-1.11.6...

> E.1.3.1.6. Privileges: Allow view access to be controlled by privileges of the view user (Christoph Heiss) Previously, view access could only be based on the view owner.

Syntatically it will look like:

    CREATE VIEW vista WITH (security_invoker=true) AS SELECT 'Hello World';

alberth3y ago

That seems like a bug to me, and a significant one as well.

The underlining promise of RLS (sometimes even referred to as “virtual private database”) in an RDBMS, is that data should never leak because it’s handled transparently by the db.

This seems like a significant leakage point that the user has to personally manage.

Spivak3y ago

Maybe it would actually be good behavior but it would super super unintuitive.

alberth3y ago

Isn’t it also super unintuitive that developers will have to manage partitioning customer data at the Application Layer even though they think they are using a Data Layer security policy.

When RLS is enabled, their should never be a situation where Customer A might be able to gain access to Customer B data. That’s literally the entire objective of the feature.

This is giving people a false sense of security, and for multi-tenant applications has massive consequences since this will leak data.

mixmastamyk3y ago

Assuming correct, it's an absolutely horrible default that needs to be changed.

Imagine if the default Unix permission was to set files with setuid.

bearjaws3y ago

This is such a killer feature in PG, my new job uses it and it makes audits of our tenancy model dead simple.

Our largest customers ask how we do database multi-tenant and we point to our SDLC + PG docs and they go 'K'.

e1g3y ago

brightball3y ago

e1g3y ago

3 more replies

voberoi3y ago

This is the way -- also never had an issue across US healthcare and enterprise SaaS.

coenhyde3y ago

i did the same with the same results

2 more replies

bearjaws3y ago

Even for large health systems they have been okay with it.

jerryjerryjerry3y ago

Dave3of53y ago

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

There are other problems it causes like federation which Enterprise Customers often want.

hobs3y ago

Agree - performance on row level security (at least on SQL Server) is terrible, sharing by database is fairly easy.

axlee3y ago

It makes BI work an absolute hellscape as well. Tradeoffs.

mrtesthah3y ago

Just make the application collect the statistics you need. Put them into a separate shared DB or use one of the many existing off the shelf SaaS collection frameworks.

spiffytech3y ago

> we opted for having one DB per tenant which also makes sharding, scaling, balancing, and handling data-residency challenges easier

jquery_dev3y ago

How do you manage your backend in this case? Do you have an insurance of backend for each customer or do you allow backend to make connections to all the DBs.

I'm interested in doing similar and wondering about the best way to handle the routing between the databases from a single backend.

mason553y ago

So, in short, separate stacks usually means more safety & simpler application architecture in exchange for more cost and more effort to manage the fleet.

gervwyk3y ago

Also second this, we even split our AWS org into an AWS account per tentant. Although, this will maybe be a problem if we have +100s of clients. But it makes onboarding and off-loading simple.

HatchedLake7213y ago

It depends on an annual contract value (ACV), doesn't it? You can't give an AWS account to every $99 p/m plan, but you can for enterprise $50-100k+ deals.

1 more reply

jamesfinlayson3y ago

Is account creation automated?

1 more reply

throw031720193y ago

Each client is running on their own instances / load balancers?

BeefWellington3y ago

One DB (or perhaps schema, depending on DBMS) per tenant is the way to go IMO. It simplifies so much of your work, from DR to deletion to scaling.

SahAssar3y ago

eastbound3y ago

I honestly don’t understand how Oracle is still alive. Postgres has so many of these killer features.

Also, I wonder how others do tenant separation, what other solutions there are.

mritchie7123y ago

Legacy.

If you have thousands of lines of code relying on Oracle the cost to migrate would be enormous.

gav3y ago

Ignoring the cost, there's the risk/reward alignment you see in large enterprises.

Imagine you're a new CIO. You know you're probably looking at a 3-5 year tenure at this new company and you want to lead with some big wins to set the tone and show your value.

What do you do? You put the proposal to the side and look for something that has a lot better upside.

2 more replies

toomuchtodo3y ago

Amazon has a great post on this topic.

https://aws.amazon.com/blogs/aws/migration-complete-amazons-...

I thought it was cool they retrained their Oracle DBAs into other roles as part of the project.

1 more reply

beachy3y ago

Oracle has had the ability to do this for decades ("virtual private database"), so whatever is keeping them alive, it's nothing to do with this particular nifty Postgres feature.

ibejoeb3y ago

If you buy Oracle, you should use Oracle. Like really lean into it. If you really need it, it will be worth the money. I don't like dealing with Oracle sales, but the product is killer.

throwaway7875443y ago

revskill3y ago

Sharding is the only scalable way per my experience. The point about scalability here is, i can control the load as the data gets bigger.

Scarbutt3y ago

oracle has flashback

gz53y ago

ksec3y ago

Fortune Top 10

1 Walmart

2 Amazon

3 Apple

4 CVS Health

5 UnitedHealth Group

6 Exxon Mobil

7 Berkshire Hathaway

8 Alphabet

9 McKesson

10 AmerisourceBergen

We can rule out 2,3,7,8 …

simonw3y ago

I don't fully understand the performance implications here.

Say I was using this for a blog engine, and I wanted to run this SQL query:

    select * from entries;

But I actually only want to get back entries that my current user is allowed to view - where author_id = 57 for example.

Would PostgreSQL automatically turn the above query into the equivalent of this:

    select * from entries where author_id = 57;