However, upcoming PostgreSQL 15 adds support for security invoker views: https://github.com/postgres/postgres/commit/7faa5fc84bf46ea6... That means you can define the security_invoker attribute when creating a view, which "... causes the underlying base relations to be checked against the privileges of the user of the view rather than the view owner" (see https://www.postgresql.org/docs/15/sql-createview.html). PG15 beta 1 release notes: https://www.postgresql.org/about/news/postgresql-15-beta-1-r...
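A minimal sketch of what that looks like on PG15 (the table and view names here are made up):

```sql
-- With security_invoker, permission checks and RLS policies on the
-- base table run as the user querying the view, not the view owner.
CREATE VIEW my_entries WITH (security_invoker = true) AS
  SELECT * FROM entries;
```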
The underlying promise of RLS (sometimes even referred to as “virtual private database”) in an RDBMS is that data should never leak because it’s handled transparently by the db.
This seems like a significant leakage point that the user has to personally manage.
Maybe it would actually be good behavior, but it would be super unintuitive.
Coming from a SaaS company that used MySQL: we would get asked by some customers how we guaranteed we segmented their data, and the answer always ended at the app layer. One customer (a Fortune 10 company) asked if we could switch to SQL Server to get this feature...
Our largest customers ask how we do database multi-tenancy and we point to our SDLC + PG docs and they go 'K'.
There's not a single customer I've ever run across who's going to halt a contract because you can't purge their data from your backups fast enough. They're signing up because of what you offer, not the termination clause.
Even for large health systems they have been okay with it.
When you say 1 DB I suspect you mean you have a single DB server and multiple DBs on that server. Then I don't think this really solves the data-residency problem, as the client's data is just in a different DB but still on the same instance. It creates other problems for you as well: you now have many DBs to run maintenance, upgrades, and data migrations on. My current company uses a similar model for multiple types of systems and it makes upgrading the software very difficult.
It also makes scaling more difficult: instead of having a single DB cluster that you can tweak for everyone, you'll need to tweak each cluster individually depending on the tenants that are on it. You also have a practical limit to how many DBs you can have on any physical instance, so your load balancing will become very tricky.
There are other problems it causes like federation which Enterprise Customers often want.
I'm surprised this simplified balancing for you. When I worked somewhere with per-customer DBs, we had constant problems with rebalancing load. Some customers grew too big for where we put them, some nodes usually performed fine until the wrong subset of customers ran batch jobs simultaneously, etc.
I'm interested in doing similar and wondering about the best way to handle the routing between the databases from a single backend.
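One simple way to handle that routing is a tenant-to-DSN lookup in the backend. A hypothetical sketch in Python (the tenant names, hostnames, and default shard are all made up):

```python
# Hypothetical per-tenant connection routing: map each tenant to the
# DSN of the database it lives on, with a default shard for tenants
# that haven't been moved to their own instance.
TENANT_DSNS = {
    "acme": "postgresql://db-2/acme",
    "globex": "postgresql://db-1/globex",
}
DEFAULT_DSN = "postgresql://db-1/shared"

def dsn_for_tenant(tenant_id: str) -> str:
    """Return the connection string for a tenant's database."""
    return TENANT_DSNS.get(tenant_id, DEFAULT_DSN)
```

Moving a tenant then just means copying its data and updating the map; the lookup itself can live in a config table or service so it can change without a deploy.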
Also, I wonder how others do tenant separation, what other solutions there are.
If you have thousands of lines of code relying on Oracle the cost to migrate would be enormous.
If you buy Oracle, you should use Oracle. Like really lean into it. If you really need it, it will be worth the money. I don't like dealing with Oracle sales, but the product is killer.
1 Walmart
2 Amazon
3 Apple
4 CVS Health
5 UnitedHealth Group
6 Exxon Mobil
7 Berkshire Hathaway
8 Alphabet
9 McKesson
10 AmerisourceBergen
We can rule out 2,3,7,8 …
Say I was using this for a blog engine, and I wanted to run this SQL query:
select * from entries;
But I actually only want to get back entries that my current user is allowed to view - where author_id = 57 for example. Would PostgreSQL automatically turn the above query into the equivalent of this:
select * from entries where author_id = 57;
And hence run quickly (assuming there's an index on that author_id column)? Or would it need to run an additional SQL query check for every single row returned by my query to check row permissions, adding up to a lot of extra overhead?
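For a simple equality policy the answer is yes: the policy predicate gets added to the query's quals at plan time, so the planner can use the index rather than checking permissions row by row. A sketch, assuming the policy reads the author id from a session setting (the setting name is made up):

```sql
-- Enable RLS on entries and add a per-author policy.
ALTER TABLE entries ENABLE ROW LEVEL SECURITY;

CREATE POLICY entries_by_author ON entries
  USING (author_id = current_setting('app.current_author')::int);

-- For a non-owner role, "SELECT * FROM entries" now plans like
-- "SELECT * FROM entries WHERE author_id = 57" and can use an
-- index on author_id:
EXPLAIN SELECT * FROM entries;
```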
unfortunately this can break down in more complex cases. roughly postgres trusts a limited set of functions and operators not to leak information about rows (e.g. via error messages) that the RLS policy says a query should not be able to see. that set includes basic comparisons but not more esoteric operations like JSON lookups. at some point postgres will insist on checking the RLS policy result for a row before doing any further work, which can preclude the use of indexes
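You can see which functions Postgres trusts in pg_proc's proleakproof flag — only quals built from leakproof functions and operators get pushed down below the row-security barrier:

```sql
-- Basic equality functions (integer =, text =) are leakproof;
-- jsonb_object_field (the -> operator on jsonb) is not, so quals
-- using it can't be evaluated before the RLS check.
SELECT proname, proleakproof
FROM pg_proc
WHERE proname IN ('int4eq', 'texteq', 'jsonb_object_field');
```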
* RLS is super appealing. Long-term, the architecture just makes so much more sense than bringing in additional maintenance/security/perf/etc burdens. So over time, I expect it to hollow out how much the others need to do, reducing them to developer experience & tools (policy analysis, db log auditing, ...). Short-term, I'd only use it for simple internal projects, because cross-tenant sharing is so useful in so many domains (esp if growing a business), and for now, RLS seems full of perf/expressivity/etc. footguns. So I wouldn't use it for a SaaS unless it's something with severely distinct tenants, like payroll, and even then, I'd have a lot of operational questions before jumping in.
* For the needed flexibility and app layer controls, we took the middle path with casbin, though other tools are emerging too. Unlike the zanzibar-style tools that bring another DB + runtime + ..., casbin's system of record is our existing system of record. Using it is more like a regular library call than growing the dumpster fire that is most distributed systems. Database backups, maintenance, migrations, etc are business as usual, no need to introduce more PITAs here, and especially not a vendor-in-the-middle with proprietary API protocols that we're stuck with ~forever as a dependency.
* A separate managed service might make zanzibar-style OK in some cases. One aspect is ensuring the use case won't suffer the view problem. From there, it just comes down to governance & risk. Auth0 being bought by Okta means we kind of know what it'll look like for a while, and big cloud providers have growing identity services, which may be fine for folks. Startup-of-the-month owning parts of your control plane is scarier to me: if they get hacked, go out of business, get acquired by EvilCorp, or raise $100M in VC and jack up prices, etc.
There's a lot of innovation to do here. A super-RLS postgres startup is on my list of easily growable ideas :)
On a related note: We're doing a bunch of analytics work on how to look at internal+customer auth logs -- viz, anomaly detection, and supervised behavioral AI -- so if folks are into things like looking into account take overs & privilege escalations / access abuse / fraud in their own logs, would love to chat!
1+2: Cost + Unnecessary complexity: this argument can be used against anything that doesn't fit the given use case. There's no silver bullet for any choice of solution. You should only adopt the solution that makes the most sense for you and vendors should be candid about when they wouldn't recommend adopting their solution -- it'd be bad for both the users and reputation of the solution.
3: External dependencies: That depends on the toolchain. Integration testing against SpiceDB is easier than Postgres, IMO [1]. SpiceDB integration tests can run fully parallelized and can also model check your schema so that you're certain there are no flaws in your design. In practice, I haven't seen folks write tests to assert that their assumptions about RLS are maintained over time. The last place you want invariants to drift is authorization code.
4: Multi-tenancy is core to our product: I'm not sure I'm steel-manning this point, but I'll do my best. Most companies do not employ authorization experts and solutions worth their salt should support modeling multi-tenant use cases in a safe way. SpiceDB has a schema language with idioms and recommendations to implement functionality like multi-tenancy, but still leaves it in the hands of developers to construct the abstraction that matches their domain[2].
[0]: https://github.com/authzed/spicedb
[1]: https://github.com/authzed/examples/tree/main/integration-te...
I think this covers both the complexity aspect and the difference between what you get from RLS and what external authz brings to the table (schema, for example).
I do think that RLS is a great way for a company without authz experts to build a multi-tenant MVP safely. I've yet to see a single pre-PMF company that worries about authorization beyond that; this is a series-B concern in my experience.
* No extra operational overhead, it's just one database
* Allows deleting a single schema, useful for GDPR compliance
* Allows easily backing up/restoring a single schema
* Easier to view and reason about the data from an admin point of view
* An issue in a single tenant doesn't affect other tenants
* Downtime for maintenance is shorter (e.g. database migration, non-concurrent REINDEX, VACUUM FULL, etc.)
* Less chance of deadlocks, locking for updates, etc.
* Allows easier testing and development by subsetting tenant data
* Smaller indexes, more efficient joins, faster table scans, more optimal query plans, etc. With row level security, every index needs to be a compound index
* Easy path to sharding per tenant if needed. Just move some schemas to a different DB
* Allows having shared data and per-tenant data in the same database. That doesn't work with the tenant-per-database approach
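On the compound-index point above: with RLS on a shared table, every tenant-scoped index has to lead with the tenant key, whereas schema-per-tenant makes the tenant implicit. A sketch (table and column names made up):

```sql
-- Schema-per-tenant: the tenant is implicit in the schema name,
-- so a single-column index suffices.
CREATE INDEX ON tenant_42.posts (created_at);

-- RLS on a shared table: tenant_id has to come first in the index
-- for tenant-scoped queries to stay fast.
CREATE INDEX ON posts (tenant_id, created_at);
```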
There are a few cons, but they are pretty minor compared to the alternative approaches:
* A bit more code to deal with tenancy, migrations, etc. We opted to write our own code rather than use an existing solution
* A bit more hassle when dealing with PostgreSQL extensions. It's best to install extensions into a separate extensions schema
* Possible caching bugs so you need to namespace the cache, and clear the query cache when switching tenant
* The security guarantees of per tenant solution aren't perfect, so you need to ensure you have no SQL injection vulnerabilities
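On the caching point: a hypothetical tenant-namespaced key helper (Python, names made up) is enough to keep one tenant's cached query results from being served to another:

```python
# Hypothetical cache-key namespacing: prefix every key with the
# tenant so entries can never collide across tenants, and so a whole
# tenant's cache can be invalidated by prefix.
def cache_key(tenant_id: str, key: str) -> str:
    """Build a cache key scoped to a single tenant."""
    return f"tenant:{tenant_id}:{key}"
```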
We ran into issues here and there but always found a way to work around them:
* Incremental backups were a pain because of needing to lock so many objects (# of schemas X # of tables per schema).
* The extra code to deal w/ migrations was kinda messy (as you mentioned).
* Globally unique IDs become the combination of the row ID + the tenant ID, etc...
For us though the real deal-breaker turned out to be that we wanted to have real foreign keys pointing to individual rows in tenant schemas from outside of the tenant schema and we couldn't. No way to fix that one since with multi-schema the "tenant" relies on DB metadata (the schema name).
We ended up migrating the whole app to RLS (which itself was a pretty interesting journey). We were afraid of performance issues since the multi-schema approach kinda gives you partitioning for free, but with the index usage on the RLS constraints we've had great performance (at least for our use case!).
After quite a bit of time working with both multi-schema & RLS I probably wouldn't go back to multi-schema unless I had a real compelling reason to do so due to the added complexity. I really liked the multi-schema approach, and I think most of the critiques of it I found were relatively easy to work around, but RLS has been a lot simpler for us.
Memory usage and I/O can be less efficient. Postgres handles table data in 8kb pages, so even if you're just reading a single row, that reads 8kb from disk and puts 8kb in the Postgres buffer cache, with that row and whatever happens to be next to it in the physical layout of the underlying table. Postgres does this because of locality of reference: it's cheaper to bulk-load data from disk, and, statistically speaking, you may need the adjacent data soon. If each user is touching separate tables, you're loading a page per row for each user, and you're missing out on some of the locality benefits.
Another problem is monitoring (disclosure: I work for pganalyze, which offers a Postgres monitoring service). The pg_stat_statements extension can track execution stats of all normalized queries in your database, and that's a very useful tool to find and address performance problems. But whereas queries like "SELECT * FROM posts WHERE user_id = 123" and "SELECT * FROM posts WHERE user_id = 345" normalize to the same thing, schema-qualified queries like "SELECT * FROM user_123.posts" and "SELECT * FROM user_345.posts" normalize to different things, so you cannot easily consider their performance in aggregate (not to mention bloating pg_stat_statements by tracking so many distinct query stats). This is the case even when you're using search_path so that your schema is not explicitly in your query text.
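For reference, the aggregate view you lose looks something like this (column names as in pg_stat_statements on PG13+):

```sql
-- With RLS/shared tables, all tenants' variants of a query collapse
-- into one row here; with schema-per-tenant, each tenant's query is
-- tracked as a separate entry.
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```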
Also, performance of tools like pg_dump is not great with a ton of database objects (tables and schemas) and, e.g., you can run into max_locks_per_transaction [1] limits, and changing that requires a server restart.
I wouldn't say you should never do schema-based multi-tenancy (you point out some good advantages above), but I'd be extremely skeptical of using it in situations where you expect to have a lot of users.
[1]: https://www.postgresql.org/docs/current/runtime-config-locks...
This is the terrifying part about RLS to me: having to rely on managing the user id as part of the database connection session seems like an easy way to shoot yourself in the foot (especially when combined with connection pooling). Adding WHERE clauses everywhere isn't great, but at least it's explicit.
That said, I've never used RLS, and I am pretty curious: it does seem like a great solution other than that one gotcha.
Your RLS policy looks as follows: CREATE POLICY tenant_${tableName}_isolation_policy ON "${tableName}" USING ("tenant_id" = current_setting('app.current_tenant'));
Your queries look something like this: BEGIN; SET LOCAL app.current_tenant = '${tenant}'; SELECT * FROM some_table; COMMIT;
You can even initialize your writes with a `tenant_id` column defaulted to your `current_setting('app.current_tenant')`
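Putting the pieces together, that default is just (assuming the same app.current_tenant setting as above):

```sql
-- Any INSERT inside a transaction that has run
--   SET LOCAL app.current_tenant = 'acme';
-- gets tenant_id = 'acme' filled in automatically.
ALTER TABLE some_table
  ALTER COLUMN tenant_id
  SET DEFAULT current_setting('app.current_tenant');
```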
"In the traditional use case of direct db access, RLS works by defining policies on tables that filter rows based on the current db user. For a SaaS application, however, defining a new db user for each app user is clunky. For an application use case you can dynamically set and retrieve users using Postgres' current_setting() function (i.e.:
SET app.current_app_user = 'usr_123'
and SELECT current_setting('app.current_app_user')
)." The policies that they define reference these settings, so they can do a "set" at the start of processing every web request, on a pre-existing db connection.
One gotcha specific to Supabase (where I run the backend): because there is no anonymous login in Supabase, turning on RLS and using database functions marked as security definer is the way to go. Otherwise there is no easy way of stopping a 'select * from x', since some rows might not have a user_id if they are anonymous, and I still want people to access a row if they know its specific primary key uuid.
Much bigger fan of the approach described here:
Scalability, Allocation, and Processing of Data for Multitenancy
https://stratoflow.com/data-scalability-allocation-processin...
I like it so much I don't want to go back!
it's some of the coolest stuff in cryptography but it also tends to be experimental or very computationally expensive. zero knowledge proofs are the simplest form i know of.
as far as i know, row level end to end encrypted databases with indices that preserve privacy do not exist yet, but i fully expect that they will.
It worked pretty well.
The basic mechanism was to intercept all outbound SQL queries and wrap them in postgres environment variables that set up the RLS.
I would say a database per tenant is overcomplicating it.
Yes, it does add extra overhead at account creation, during DB migrations, and for backups.
But if you don’t need cross-account or public data access, it can make life much easier.
Is having to avoid use after free really considered a security hole? Isn't that how like every program in existence operates? Coming up with complicated languages and frameworks just because you're scared you will accidentally use a variable after it's been freed seems bizarre to me.
As it turns out, humans are bad at being consistent, whereas computers are much better. Maybe this particular solution isn't "the right thing", but it's at least an attempt at modifying the environment such that mistakes no longer happen. And at a meta level, that is precisely the right thing to do.
We've found it pretty nice to cut out a whole class of possible bugs by being able to defer it to the database level. At the application level we end up with a wrapper that sets (and guarantees unsetting) multi-tenant access to the correct tenant, and then we never have to add "tenant_id = ..." anywhere, regardless of the query. Regardless of whether we forget in some query (which we almost surely would), it cuts out quite a bit of extra code.
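A hypothetical sketch of such a wrapper in Python (the setting name and FakeConn stand-in are made up; a real implementation would use a driver like psycopg and bind the tenant id as a parameter rather than interpolating it):

```python
from contextlib import contextmanager

@contextmanager
def tenant_scope(conn, tenant_id):
    """Run every query in the block inside a transaction pinned to one
    tenant via a session setting that the RLS policies read. SET LOCAL
    expires at COMMIT/ROLLBACK, so a pooled connection can't leak the
    tenant into the next request."""
    conn.execute("BEGIN")
    try:
        # NOTE: string interpolation here is only for illustration; a
        # real implementation must bind tenant_id as a parameter
        # (e.g. via set_config()) to avoid SQL injection.
        conn.execute(f"SET LOCAL app.current_tenant = '{tenant_id}'")
        yield conn
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise

class FakeConn:
    """Stand-in for a real DB connection; just records statements."""
    def __init__(self):
        self.stmts = []
    def execute(self, sql):
        self.stmts.append(sql)
```

With a wrapper like this in place, application queries never mention tenant_id at all; the RLS policies do the filtering.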
You can also do some cool stuff like add RLS policies for read-only multi-tenant access. Then you can query data across multiple tenants while enforcing that nothing accidentally gets written.
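A sketch of that pattern (policy, role, and table names made up): because RLS denies by default, giving a reporting role only a SELECT policy grants cross-tenant reads without granting any writes:

```sql
-- Normal per-tenant isolation for application roles.
CREATE POLICY tenant_isolation ON posts
  USING (tenant_id = current_setting('app.current_tenant'));

-- Reporting role may read across tenants, but has no policy covering
-- INSERT/UPDATE/DELETE, so writes are rejected outright.
CREATE POLICY reporting_read_all ON posts
  FOR SELECT TO reporting_role
  USING (true);
```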
If you want to have any access control that isn’t a simple user_id == 123, SQL WHERE clauses can get complicated.
Users, groups, or any kind of fine grained access control can make simple queries non-trivial. It’s even worse if a user can be authorized to view data across different accounts.
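For example, a hypothetical group-based policy (all table names made up) already turns into a correlated subquery rather than a simple equality:

```sql
-- A row is visible if the current user belongs to any group that has
-- been granted access to it.
CREATE POLICY doc_group_access ON documents
  USING (EXISTS (
    SELECT 1
    FROM group_members gm
    JOIN document_grants dg ON dg.group_id = gm.group_id
    WHERE gm.user_id = current_setting('app.current_user')::int
      AND dg.document_id = documents.id
  ));
```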
I assume you have a few customers and then very many users belonging to each customer.