a. Add CHECK() constraints on columns of
the relevant ENUM type checking that
the value is one of the "live" values.
b. RENAME "dead" ENUM values to indicate
that they're dead.
c. UPDATE ... SET column_of_that_enum_type = ...
WHERE column_of_that_enum_type IN (dead_values_here) ...
(c) can be done slowly, say, with a LIMIT clause to keep load and disruption down. Yeah, you might have "broken" data for a while, but you can make your queries report better values for columns of that ENUM type in the same way that the UPDATE would fix them.

Note one other thing I have found: Postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query Postgres tables over an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." on some table that returns enum columns.
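A sketch of the batched UPDATE in step (c). Table, column, and value names here are invented for illustration; since plain UPDATE has no LIMIT in Postgres, a common workaround is a ctid subquery:

```sql
-- Hypothetical table/values; re-run until it reports 0 rows updated.
UPDATE orders
SET status = 'archived'                           -- a "live" replacement value
WHERE ctid IN (
    SELECT ctid
    FROM orders
    WHERE status IN ('legacy', 'deprecated')      -- the "dead" values
    LIMIT 1000                                    -- small batches keep load down
);
```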
Sounds like a reason to use views to work around the limitations of BigQuery
Yeah, one has to write more adapter queries to add those casts to text (or from text, for DMLs), but it's OK, and ENUMs are just really nice. And as u/faangsticle says, use VIEWs for that.
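A minimal sketch of such an adapter view (table and column names invented): it exposes the enum column pre-cast to text, so a "SELECT * FROM ..." from BigQuery sees only plain types:

```sql
-- Expose the enum column as text for external consumers like BigQuery.
CREATE VIEW cust_for_bq AS
SELECT
    id,
    name,
    type::text AS type   -- cast the enum so BQ sees an ordinary string
FROM cust;
```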
In a way you describe how we can emulate this process. The question is why the heck wouldn't databases do this themselves? Same with adding and dropping columns.
Consider how PostgreSQL encodes nulls, for example: null fields are skipped in the row itself and recorded in a null bitmap at the front of the row. Meaning... rows are not uniformly sized, so there's no offset = row * rowsize + field_offset kind of addressing for reading a field in PG, where re-encoding only some of the rows would break the entire table.
And yet we have all those huge monolithic operations that need to be done atomically. So weird.
I think the "right" choice for enums probably looks a little more like:
CREATE TABLE myEnum(
enumID SERIAL NOT NULL PRIMARY KEY,
enumName TEXT UNIQUE NOT NULL,
-- enum description or other metadata columns here.
);
CREATE TABLE foo(
...
associatedEnumID INTEGER NOT NULL REFERENCES myEnum(enumID),
...
);
I think this has the benefit of being space efficient like the native typed enum, while being relatively flexible (easy to change names, add new enum values, add data about the enum itself, etc.)

CREATE TABLE my_enum ( name TEXT PRIMARY KEY );
CREATE TABLE foo ( my_enum TEXT REFERENCES my_enum (name) );
The reason is that a SELECT * FROM foo showing cryptic enum ordinals is a headache, and having to join the enum table every time is potentially slower than just reading the column directly. An ASCII character only takes 1 byte, so an INT enum is just as space efficient as using 4 characters, which afford way more descriptiveness than a meaningless ordinal number.

Metadata like sort order, what date ranges the code is valid for, and even security, as in who's allowed to use the code in the app.
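A sketch of that idea: short, human-readable codes as the key, plus the kinds of metadata mentioned. All table and column names here are illustrative:

```sql
-- Lookup table keyed by a compact 4-char code instead of an ordinal.
CREATE TABLE cust_type (
    code        CHAR(4) PRIMARY KEY,   -- e.g. 'CMPY', 'PRSN': readable in raw SELECTs
    label       TEXT NOT NULL,         -- full human-friendly name
    sort_order  INTEGER NOT NULL,
    valid_from  DATE,
    valid_to    DATE                   -- NULL while the code is still active
);

CREATE TABLE cust (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    type  CHAR(4) NOT NULL REFERENCES cust_type (code)
);
```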
What they are absolutely not for is replacing the one standard way to define data enums that everybody has used since relational algebra was created. It's such an outrageously naive idea that I'm sure most people here were attracted to the title thinking it's about algebraic types or some other similar misunderstanding.
> It's such an outrageously naive idea that I'm sure most people here were attracted to the title thinking it's about algebraic types or some other similar misunderstanding.
Just to share how we at Close got into this discussion (which I personally don't find as "outrageous" as you), SQLAlchemy – which we use in our Python code – uses `native_enum=True` as the default value for its `Enum` type [0], which then translates to using native ENUM types in PostgreSQL:
> native_enum – Use the database’s native ENUM type when available. Defaults to True. When False, uses VARCHAR + check constraint for all backends.
[0]: https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqla...
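Roughly what the two settings translate to in DDL (a sketch; the actual type, table, and constraint names SQLAlchemy generates will differ):

```sql
-- native_enum=True (the default): a real PostgreSQL ENUM type.
CREATE TYPE lead_status AS ENUM ('active', 'inactive');
CREATE TABLE lead_native ( status lead_status );

-- native_enum=False: portable VARCHAR plus a CHECK constraint.
CREATE TABLE lead_portable (
    status VARCHAR(8),
    CONSTRAINT lead_status_check CHECK (status IN ('active', 'inactive'))
);
```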
I'm sorry, we don't have any examples in the article about adding values to enums, so I don't know to what you are referring here.
> Yes, that's true, but also very rare?
It is rare, indeed, but we just have a strong inclination towards having a clean schema, so we do prefer to clean up things that would otherwise be unused. But I understand that many people would prefer to leave them hanging there, even if they are not going to be used in any way.
select * from cust where type = 'company' -- ok
select * from cust where type = 'c0mpany' -- error
As mentioned, they take less space, which is important if you use these columns in an index and have millions of rows.

That said, they can be really nice for all the same reasons static types are nice. Good article!
Disclosure: I reported this particular bug. It was a fun one to run down.
I usually use C# Enums translated by EF Core now, which works perfectly fine on the C# side. The only missing part would be to give the DB the information about the enum names, so it could show those to me instead of just the raw numbers. But I assume there is no way to do that. Adding entire tables just doesn't seem worth it for this use case alone.
If you peek at a particular table often enough, and it has a crazy enough enum to maybe justify making your life a little easier, what you can do is add a table just for debug purposes and a view that maps the int to it, but otherwise only use the direct int:enum cast in your app/ORM/etc., thereby not paying any runtime cost for foreign keys or integrity checking.
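A sketch of that debug-only setup (all names invented; note there is deliberately no foreign key on the data table, so the app pays no integrity-checking cost):

```sql
-- Label table maintained by hand, purely for humans reading the DB.
CREATE TABLE order_status_label (
    value INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
INSERT INTO order_status_label VALUES (0, 'Pending'), (1, 'Shipped'), (2, 'Cancelled');

-- Debug view; the app keeps reading/writing the raw int on orders directly.
CREATE VIEW orders_debug AS
SELECT o.*, l.name AS status_name
FROM orders o
LEFT JOIN order_status_label l ON l.value = o.status;
```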
I am going down this path. There are a lot of downstream use cases (e.g. support, reporting, etc) that can benefit from having the canonical names & human-friendly labels in the database.
Keeping database in sync with code like this requires some discipline and manual scripting, but I think it's worth it.
Does this approach ensure data consistency? Or could you accidentally insert a number value into your table which is not represented by any of the enum values?
I “manually” add a check constraint (via EF, so it’s not so much manual as it is “remember to copy and paste this in your db.cs”) to assert the value is greater than or equal to zero and less than the number of inhabitants of the enum, but this fails if you manually assign the enum values (e.g. for flags, legacy interop, etc.).
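For a zero-based, contiguous enum with three members, that constraint is a sketch like this (table and column names invented):

```sql
-- Assumes the C# enum members map to 0, 1, 2 with no gaps.
ALTER TABLE orders
ADD CONSTRAINT orders_status_range
CHECK (status >= 0 AND status < 3);  -- 3 = member count; breaks for [Flags] or explicit values
```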
They break FDW unless they are pre-created on the importing side. Super inconvenient.
import foreign schema
but then you would have to qualify the names somehow if you were importing from more than one db. I wrote this as a workaround; it runs every day as part of our data import job.

DROP SCHEMA IF EXISTS fdw_enum CASCADE;
CREATE SCHEMA fdw_enum;
-- Get names of the enums using ::regtype casting and label and sort_order from app_a and app_b.
CREATE VIEW fdw_enum.app_a_enums AS SELECT * FROM dblink('fdw_app_a',
$QUERY$
SELECT
enumtypid::regtype AS name,
enumsortorder sort_order,
enumlabel label
FROM pg_enum;
$QUERY$
) AS t (name text, sort_order integer, label text);
CREATE VIEW fdw_enum.app_b_enums AS SELECT * FROM dblink('fdw_app_b',
$QUERY$
SELECT
enumtypid::regtype AS name,
enumsortorder sort_order,
enumlabel label
FROM pg_enum;
$QUERY$
) AS t (name text, sort_order integer, label text);
-- Ensure enums with the same names aren't defined in both app_a and app_b.
DO
$DO$
DECLARE
matching_count integer;
BEGIN
SELECT COUNT(*) into matching_count
FROM fdw_enum.app_a_enums INNER JOIN fdw_enum.app_b_enums USING (name);
ASSERT
matching_count = 0,
'app_a and app_b have identically named enums. Implement a check that they are identically defined.';
END
$DO$;
CREATE VIEW fdw_enum.upstream_enums AS
SELECT * FROM fdw_enum.app_a_enums
UNION ALL
SELECT * FROM fdw_enum.app_b_enums;
CREATE PROCEDURE fdw_enum.create_type_if_not_exists(name text)
LANGUAGE plpgsql
AS $PROC$
BEGIN
EXECUTE format('CREATE TYPE %s AS ENUM ()', name);
EXCEPTION WHEN duplicate_object THEN RAISE NOTICE 'Enum type % already exists, skipping', name;
END;
$PROC$
;
-- To make this idempotent we create the enums empty then alter them by adding values.
-- So instead of `CREATE TYPE foo AS ENUM ('bar', 'baz');` we do
-- `CREATE TYPE IF NOT EXISTS foo AS ENUM ();` (emulated by create_type_if_not_exists above)
-- `ALTER TYPE foo ADD VALUE IF NOT EXISTS 'bar';`
-- `ALTER TYPE foo ADD VALUE IF NOT EXISTS 'baz';`
-- This ensures the procedure is re-runnable and can add new values to the enum as they are added upstream.
-- Order is ensured by the ORDER BY in the loop query.
CREATE PROCEDURE fdw_enum.refresh_upsteram_enums()
LANGUAGE plpgsql
AS $PROC$
DECLARE
rec record;
ddl text;
BEGIN
FOR rec IN SELECT DISTINCT name FROM fdw_enum.upstream_enums
LOOP
CALL fdw_enum.create_type_if_not_exists(rec.name);
END LOOP;
FOR rec in
SELECT *
FROM fdw_enum.upstream_enums
ORDER BY name, sort_order
LOOP
ddl := FORMAT('ALTER TYPE %s ADD VALUE IF NOT EXISTS %s', rec.name, quote_literal(rec.label));
EXECUTE ddl;
END LOOP;
END
$PROC$
;
CALL fdw_enum.refresh_upsteram_enums();

Right?
In MySQL/MariaDB, enums are just "aliases" for ints, and that works much nicer IMHO. Adding a new value is cheap because it doesn't recheck all the rows (removing values is still expensive, as it needs to check that the value isn't actually used by any row).
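In MySQL/MariaDB that cheap addition is a sketch like this (hypothetical table; appending the new value at the end keeps the existing rows' stored ints valid, so no table rewrite is needed):

```sql
-- 'charity' appended after the existing values; existing rows are untouched.
ALTER TABLE cust
MODIFY COLUMN type ENUM('person', 'company', 'charity');
```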