The high-level idea is: you enable a setting and every CREATE SCHEMA creates a new shard. All the tables in the schema are co-located, so you can have efficient joins & foreign keys between them.
On top of that, you can also have reference tables that are replicated to all nodes, again for fast joins & foreign keys with all schemas.
Everything else is about making every PostgreSQL feature work as seamlessly as if there was no sharding. You can still do things like transactions across schemas, create and use custom types, access controls, work with other extensions, use procedures, etc.
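A minimal sketch of that setup, assuming Citus 12+ (where schema-based sharding landed); the tenant_a/orders/plans names are made up for illustration:

```sql
-- Turn every subsequent CREATE SCHEMA into its own shard group
-- (requires the Citus extension).
SET citus.enable_schema_based_sharding TO ON;

-- Each tenant schema becomes a co-located shard of tables.
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.orders (
    id      bigserial PRIMARY KEY,
    plan_id int,
    total   numeric
);

-- A reference table in a regular schema, replicated to all nodes,
-- so every tenant schema can join against it cheaply.
CREATE TABLE public.plans (id int PRIMARY KEY, name text);
SELECT create_reference_table('public.plans');
```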
Something like a "min-copies 2" setting, which would ensure the sharded data has at least one viable copy in case of hardware failure (etc).
For instance, the ability to quickly spin up a new replica using a disk snapshot is very useful and only feasible at the server level.
It is still possible to replicate shards (via the citus.shard_replication_factor setting), but it only helps for scaling read throughput, at the cost of lower write throughput.
This approach to DB sharding has great potential. Simultaneously impressive, novel, and badass.
I wish this had been available ten years ago at a few of my startups!
Nevertheless, Citus is now a promising open source alternative.
But you can screw it up - see https://github.com/citusdata/citus/discussions/6934
My definition is "the columns and column types of a table", but that doesn't seem to match what they're talking about here ("large" and "small" schemas probably aren't referring to wide and narrow tables, for example, and I don't see how sharding by my definition of "schema" could even make sense anyway).
The SQL standard defines a two-level namespace hierarchy. A single "instance" contains multiple catalogs, and each catalog contains multiple schemas (and each schema then contains objects like tables, views, types, functions, etc).
Many database products use the term "database" instead of "catalog", e.g. Postgres and SQL Server. But "schema" is used quite uniformly. MySQL's "databases" are in fact "schemas", though.
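To make the hierarchy concrete (illustrative names; note that in PostgreSQL a three-part name is accepted only when the catalog part names the current database):

```sql
-- catalog (database) . schema . object
SELECT * FROM mydb.myschema.mytable;

-- the usual two-part form just omits the catalog
SELECT * FROM myschema.mytable;
```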
By large and small we are referring to the amount of data each schema currently holds. Schemas can grow over time, and some may become very big while others remain small (storage-wise).
> You can have the same table definition, within the same database, defined multiple times (each in a different schema) and each holding different data.
So in this respect, each table within a schema indeed already acts like a "shard" of the overall table
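The pattern described above — same definition, different schemas, different data — is plain PostgreSQL, and nothing forces the definitions to match either. A quick sketch with made-up tables:

```sql
CREATE SCHEMA a;
CREATE SCHEMA b;

-- Same table name, different definitions: perfectly legal.
CREATE TABLE a.messages (id bigint PRIMARY KEY, body text);
CREATE TABLE b.messages (id uuid PRIMARY KEY, body text, sent_at timestamptz);
```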
Is this enforced? Like, if I create a table "messages" in schema A and a table "messages" in schema B, must they have the same columns/column types, or is that just convention?
Schema is the thing you get from `CREATE SCHEMA`...a namespace of tables/functions/views/etc.
Thank you!
The words get really fuzzy here when you try to get a concrete and ubiquitous definition, though.
Best I can do is to say, in these queries:
use foo;
select id, name from foo.people where id < 10;
..."foo" is the schema in MySQL and, apparently, Postgres parlance.We worked upstream to have `search_path` properly handled (tracked per client) by pgbouncer.
https://github.com/pgbouncer/pgbouncer/commit/8c18fc4d213ad4...
Check config.md in that commit for a verbose, humanized description.
It's a good idea to set up archival into blob storage / S3 using a tool like WAL-G for disaster recovery purposes, and streaming replication using a tool like Patroni.
(Or use a managed service like Azure Cosmos DB for PostgreSQL)
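A minimal sketch of the WAL-G side, assuming WAL-G is installed on the server and its storage environment variables (e.g. WALG_S3_PREFIX and credentials) are already configured:

```
# postgresql.conf — continuous WAL archiving to blob storage via WAL-G
archive_mode = on
archive_command = 'wal-g wal-push %p'

# used when restoring from a base backup
restore_command = 'wal-g wal-fetch %f %p'
```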
If a node is down and cannot be quickly recovered then the remaining shards are still available for reads and writes, except for DDLs.
one gotcha: schemas are a weird old thing in SQL that are kinda the worst of all worlds — basically more like prefixing your SQL object names than a real level of indirection. Schema objects can't be manipulated as a batch, they don't provide the isolation / access control of databases, and they can't be manipulated in DML, requiring stored procedures instead.
A big advantage of schemas over databases is that you can access different schemas over the same connection, which allows much higher density, since connections are a scarce resource in PostgreSQL. You can also have shared tables.
A big advantage over row-based sharding is that you can have variations of the table definitions or a different set of tables altogether in each schema. Plus for the purpose of sharding it's easier to use, since normalized data models complicate row-based sharding.
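Both of those schema advantages are plain PostgreSQL; a sketch with hypothetical tenant schemas:

```sql
-- One connection, many tenants: switch via search_path...
SET search_path TO tenant_a;
SELECT count(*) FROM orders;

-- ...or qualify names explicitly, including joins against a shared schema.
SELECT o.id, p.name
FROM tenant_b.orders o
JOIN shared.plans p ON p.id = o.plan_id;
```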
I would make a partitioned table, and put each partition in its own schema.
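That pattern works with stock PostgreSQL declarative partitioning (illustrative names):

```sql
-- Parent table partitioned by tenant.
CREATE TABLE events (tenant text NOT NULL, payload jsonb) PARTITION BY LIST (tenant);

-- Each partition lives in its own schema.
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.events PARTITION OF events FOR VALUES IN ('tenant_a');

CREATE SCHEMA tenant_b;
CREATE TABLE tenant_b.events PARTITION OF events FOR VALUES IN ('tenant_b');
```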
Back when it happened we tried to buy their product and they were not sure if the Citus standalone product was even going to exist, and they even refused to demo it. Odd timing possibly, but it's a data point.
On the positive side: the shard rebalancing was not open source back then, IIRC, which made the open source version pretty useless. Now it seems to be open source: https://www.citusdata.com/product/comparison -- pretty cool.
I'd still be careful to bank on it as a Citus only customer or open source user.
There is no longer an enterprise version; what runs on Azure is the exact same Citus that you can run yourself. We even invested in Patroni to make it easier for the community to self-host Citus with HA setups.
While we obviously want people using Citus on Azure, having Citus as a viable open-source choice is our path to achieving that. I wasn't part of the company when the acquisition happened so can't speak to that, but I can imagine how that could have made sales at the transition time unclear.
Personally, I would also like to add that the team is full of long-term open-source contributors. We contribute both to PostgreSQL and to projects around it (like pgbouncer). I understand and respect your reservation, but wanted to share my perspective on it.
Would've been great if something like Crunchy Bridge, pganalyze, and Citus were merged into one company, so that you'd have managed Postgres across the 3 major providers with the best pg tech out of the box.
Having your DB and apps in different providers is a stuff-up; the latency and data transfer costs killed us, so Citus was a non-starter (we don't want to manage it ourselves on EC2).
They are offering Citus integration
You can still install Postgres + Citus on EC2 or use some 3rd party service to manage it for you.
Seems like the article only offers ease of use. Guess I've never used microservices enough to consider that use case.
Couldn't the microservice case be handled by having distributed tables with no distribution column? ie today I'd create a distribution column & fill it with the same value on every row
Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function
Ease of use is definitely the main one. If you're willing to put in the work required to use tenant ID as a distribution column (add it to tables, primary keys, foreign keys, filters, joins), then it's a more scalable approach.
A challenge with sharding by tenant ID is that many applications use a normalized data model, meaning not all tables obviously have a tenant ID column. When you use a schema per tenant on vanilla PostgreSQL, no additional steps are typically required to enable schema-based sharding.
There are some other benefits of schema-based sharding such as custom table definitions, simpler & more versatile access control, and longer term we expect it will be easier to pin a large tenant to a node using schemas, or distribute the tables of ultra-large tenants (by some other dimension). Of course, row-based sharding has other benefits like parallel cross-tenant queries and global DDL.
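The row-based alternative described here can be sketched with a hypothetical orders table (create_distributed_table is the documented Citus API for this):

```sql
-- Every table carries the tenant ID, and it's part of the primary key.
CREATE TABLE orders (
    tenant_id bigint NOT NULL,
    id        bigserial,
    total     numeric,
    PRIMARY KEY (tenant_id, id)
);
SELECT create_distributed_table('orders', 'tenant_id');

-- Queries then filter (or join) on tenant_id to stay on a single shard.
SELECT sum(total) FROM orders WHERE tenant_id = 42;
```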
> Couldn't the microservice case be handled by having distributed tables with no distribution column?
Absolutely. We first implemented the notion of single shard distributed tables with a NULL distribution column, and then built schema-based sharding on top as a convenience layer.
> Can one have a reference schema which can be efficiently used alongside every other schema? Guess that's public schema with create_reference_table/create_distributed_function
Yes, the public schema (or any other non-distributed schema) can have reference tables & distributed tables as usual, and tables in distributed schemas can have foreign keys / local joins with reference tables.
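A sketch of that layout, assuming Citus 12+'s schema-based sharding (table and schema names made up):

```sql
-- Reference table in the (non-distributed) public schema,
-- replicated to every node.
CREATE TABLE public.countries (code text PRIMARY KEY, name text);
SELECT create_reference_table('public.countries');

-- Tables in a distributed schema can reference it directly.
SET citus.enable_schema_based_sharding TO ON;
CREATE SCHEMA tenant_a;
CREATE TABLE tenant_a.customers (
    id           bigserial PRIMARY KEY,
    country_code text REFERENCES public.countries (code)
);
```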
You can start with all your microservices sharing a single node, add nodes and have the storage layer distributed horizontally in a way transparent to the services themselves.