a. Add CHECK() constraints on columns of
the relevant ENUM type checking that
the value is one of the "live" values.
b. RENAME "dead" ENUM values to indicate
that they're dead.
c. UPDATE ... SET column_of_that_enum_type = ...
WHERE column_of_that_enum_type IN (dead_values_here) ...
(c) can be done slowly, say, with a LIMIT clause to keep load and disruption down. Yeah, you might have "broken" data for a while, but you can make your queries report better values for columns of that ENUM type in the same way that the UPDATE would fix them.

Note one other thing I have found: Postgres enums are less "portable" than plain text columns. For example, if you're using BigQuery to query Postgres tables over an external connection, BQ doesn't support enums, so your external connection query has to cast any enum return values (e.g. some_enum_column::text) before returning them to BQ, which can be a pain if you want to do "SELECT * FROM ..." on some table that returns enum columns.
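A sketch of the batched UPDATE in step (c). Table, column, and value names here are invented for illustration; since plain UPDATE has no LIMIT in Postgres, a common workaround is a ctid subquery:

```sql
-- Hypothetical table/values; re-run until it reports 0 rows updated.
UPDATE orders
SET status = 'archived'                           -- a "live" replacement value
WHERE ctid IN (
    SELECT ctid
    FROM orders
    WHERE status IN ('legacy', 'deprecated')      -- the "dead" values
    LIMIT 1000                                    -- small batches keep load down
);
```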
Sounds like a reason to use views to work around the limitations of BigQuery
Yeah, one has to write more adapter queries to add those casts to text (or from text, for DMLs), but it's OK, and ENUMs are just really nice. And as u/faangsticle says, use VIEWs for that.
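A minimal sketch of such an adapter view (table and column names invented): it exposes the enum column pre-cast to text, so a "SELECT * FROM ..." from BigQuery sees only plain types:

```sql
-- Expose the enum column as text for external consumers like BigQuery.
CREATE VIEW cust_for_bq AS
SELECT
    id,
    name,
    type::text AS type   -- cast the enum so BQ sees an ordinary string
FROM cust;
```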
In a way you describe how we can emulate this process. The question is why the heck wouldn't databases do this themselves? Same with adding and dropping columns.
Consider how PostgreSQL encodes nulls, for example: null fields are skipped in the row itself and recorded in a null bitmap at the front of the row. Meaning... rows are not uniformly sized, so there's no offset = row * rowsize + field_offset kind of addressing for reading a field in PG, where re-encoding only some of the rows would break the entire table.
And yet we have all those huge monolithic operations that need to be done atomically. So weird.
I think the "right" choice for enums probably looks a little more like:
CREATE TABLE myEnum(
enumID SERIAL NOT NULL PRIMARY KEY,
enumName TEXT UNIQUE NOT NULL,
-- enum description or other metadata columns here.
);
CREATE TABLE foo(
...
associatedEnumID INTEGER NOT NULL REFERENCES myEnum(enumID),
...
);
I think this has the benefit of being space efficient like the native typed enum, while being relatively flexible (easy to change names, add new enum values, add data about the enum itself, etc.)

CREATE TABLE my_enum ( name TEXT PRIMARY KEY );
CREATE TABLE foo ( my_enum TEXT REFERENCES my_enum (name) );
The reason is that a SELECT * FROM foo showing cryptic enum ordinals is a headache, and having to join the enum table every time is potentially slower than just reading the column directly. An ASCII character only takes 1 byte, so an INT enum is just as space efficient as using 4 characters, which afford way more descriptiveness than a meaningless ordinal number.

Metadata like sort order, what date ranges the code is valid for, and even security, as in who's allowed to use the code in the app.
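A sketch of that idea: short, human-readable codes as the key, plus the kinds of metadata mentioned. All table and column names here are illustrative:

```sql
-- Lookup table keyed by a compact 4-char code instead of an ordinal.
CREATE TABLE cust_type (
    code        CHAR(4) PRIMARY KEY,   -- e.g. 'CMPY', 'PRSN': readable in raw SELECTs
    label       TEXT NOT NULL,         -- full human-friendly name
    sort_order  INTEGER NOT NULL,
    valid_from  DATE,
    valid_to    DATE                   -- NULL while the code is still active
);

CREATE TABLE cust (
    id    SERIAL PRIMARY KEY,
    name  TEXT NOT NULL,
    type  CHAR(4) NOT NULL REFERENCES cust_type (code)
);
```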
What they are absolutely not for is replacing the one standard way to define data enums that everybody has used since relational algebra was created. It's such an outrageously naive idea that I'm sure most people here were attracted to the title thinking it's about algebraic types or some other similar misunderstanding.
> It's such an outrageously naive idea that I'm sure most people here were attracted to the title thinking it's about algebraic types or some other similar misunderstanding.
Just to share how we at Close got into this discussion (which I personally don't find as "outrageous" as you), SQLAlchemy – which we use in our Python code – uses `native_enum=True` as the default value for its `Enum` type [0], which then translates to using native ENUM types in PostgreSQL:
> native_enum – Use the database’s native ENUM type when available. Defaults to True. When False, uses VARCHAR + check constraint for all backends.
[0]: https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqla...
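Roughly what the two settings translate to in DDL (a sketch; the actual type, table, and constraint names SQLAlchemy generates will differ):

```sql
-- native_enum=True (the default): a real PostgreSQL ENUM type.
CREATE TYPE lead_status AS ENUM ('active', 'inactive');
CREATE TABLE lead_native ( status lead_status );

-- native_enum=False: portable VARCHAR plus a CHECK constraint.
CREATE TABLE lead_portable (
    status VARCHAR(8),
    CONSTRAINT lead_status_check CHECK (status IN ('active', 'inactive'))
);
```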
I'm sorry, we don't have any examples in the article about adding values to enums, so I don't know to what you are referring here.
> Yes, that's true, but also very rare?
It is rare, indeed, but we just have a strong inclination towards having a clean schema, so we do prefer to clean up things that would otherwise be unused. But I understand that many people would prefer to leave them hanging there, even if they are not going to be used in any way.
select * from cust where type = 'company' -- ok
select * from cust where type = 'c0mpany' -- error
As mentioned, they take less space, which is important if you use these columns in an index and have millions of rows.

That said, they can be really nice for all the same reasons static types are nice. Good article!
Disclosure: I reported this particular bug. It was a fun one to run down.
I usually use C# Enums translated by EF Core now, which works perfectly fine on the C# side. The only missing part would be to give the DB the information about the enum names, so it could show those to me instead of just the raw numbers. But I assume there is no way to do that. Adding entire tables just doesn't seem worth it for this use case alone.
If you peek at a particular table often enough, and it has a crazy enough enum to maybe justify making your life a little easier, what you can do is add a table just for debug purposes and a view that maps the int to it, but otherwise only use the direct int:enum cast in your app/ORM/etc., thereby not paying any runtime cost for foreign keys or integrity checking.
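A sketch of that debug-only setup (all names invented; note there is deliberately no foreign key on the data table, so the app pays no integrity-checking cost):

```sql
-- Label table maintained by hand, purely for humans reading the DB.
CREATE TABLE order_status_label (
    value INTEGER PRIMARY KEY,
    name  TEXT NOT NULL
);
INSERT INTO order_status_label VALUES (0, 'Pending'), (1, 'Shipped'), (2, 'Cancelled');

-- Debug view; the app keeps reading/writing the raw int on orders directly.
CREATE VIEW orders_debug AS
SELECT o.*, l.name AS status_name
FROM orders o
LEFT JOIN order_status_label l ON l.value = o.status;
```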
I am going down this path. There are a lot of downstream use cases (e.g. support, reporting, etc) that can benefit from having the canonical names & human-friendly labels in the database.
Keeping database in sync with code like this requires some discipline and manual scripting, but I think it's worth it.
Does this approach ensure data consistency? Or could you accidentally insert a number value into your table which is not represented by any of the enum values?
I “manually” add a check constraint (via EF, so it’s not so much manual as it is “remember to copy and paste this in your db.cs”) to assert the value is greater than or equal to zero and less than the number of inhabitants of the enum, but this fails if you manually assign the enum values (e.g. for flags, legacy interop, etc.).
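For a zero-based, contiguous enum with three members, that constraint is a sketch like this (table and column names invented):

```sql
-- Assumes the C# enum members map to 0, 1, 2 with no gaps.
ALTER TABLE orders
ADD CONSTRAINT orders_status_range
CHECK (status >= 0 AND status < 3);  -- 3 = member count; breaks for [Flags] or explicit values
```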
They break FDW unless they are pre-created on the importing side. Super inconvenient.
import foreign schema
but then you would have to qualify the names somehow if you were importing from more than one db. I wrote this as a workaround; it runs every day as part of our data import job.

DROP SCHEMA IF EXISTS fdw_enum CASCADE;
CREATE SCHEMA fdw_enum;
-- Get names of the enums using ::regtype casting and label and sort_order from app_a and app_b.
CREATE VIEW fdw_enum.app_a_enums AS SELECT * FROM dblink('fdw_app_a',
$QUERY$
SELECT
enumtypid::regtype AS name,
enumsortorder sort_order,
enumlabel label
FROM pg_enum;
$QUERY$
) AS t (name text, sort_order integer, label text);
CREATE VIEW fdw_enum.app_b_enums AS SELECT * FROM dblink('fdw_app_b',
$QUERY$
SELECT
enumtypid::regtype AS name,
enumsortorder sort_order,
enumlabel label
FROM pg_enum;
$QUERY$
) AS t (name text, sort_order integer, label text);
-- Ensure enums with the same names aren't defined in both app_a and app_b.
DO
$DO$
DECLARE
matching_count integer;
BEGIN
SELECT COUNT(*) into matching_count
FROM fdw_enum.app_a_enums INNER JOIN fdw_enum.app_b_enums USING (name);
ASSERT
matching_count = 0,
'app_a and app_b have identically named enums. Implement a check that they are identically defined.';
END
$DO$;
CREATE VIEW fdw_enum.upstream_enums AS
SELECT * FROM fdw_enum.app_a_enums
UNION ALL
SELECT * FROM fdw_enum.app_b_enums;
CREATE PROCEDURE fdw_enum.create_type_if_not_exists(name text)
LANGUAGE plpgsql
AS $PROC$
BEGIN
EXECUTE format('CREATE TYPE %s AS ENUM ()', name);
EXCEPTION WHEN duplicate_object THEN RAISE NOTICE 'Enum type % already exists, skipping', name;
END;
$PROC$
;
-- To make this idempotent we create the enums empty then alter them by adding values.
-- So instead of `CREATE TYPE foo AS ENUM ('bar', 'baz');` we do
-- `CREATE TYPE IF NOT EXISTS foo AS ENUM ();` (emulated by create_type_if_not_exists above)
-- `ALTER TYPE foo ADD VALUE IF NOT EXISTS 'bar';`
-- `ALTER TYPE foo ADD VALUE IF NOT EXISTS 'baz';`
-- This ensures the procedure is re-runnable and can add new values to the enum as they are added upstream.
-- Order is ensured by the ORDER BY in the loop query.
CREATE PROCEDURE fdw_enum.refresh_upsteram_enums()
LANGUAGE plpgsql
AS $PROC$
DECLARE
rec record;
ddl text;
BEGIN
FOR rec IN SELECT DISTINCT name FROM fdw_enum.upstream_enums
LOOP
CALL fdw_enum.create_type_if_not_exists(rec.name);
END LOOP;
FOR rec in
SELECT *
FROM fdw_enum.upstream_enums
ORDER BY name, sort_order
LOOP
ddl := FORMAT('ALTER TYPE %s ADD VALUE IF NOT EXISTS %s', rec.name, quote_literal(rec.label));
EXECUTE ddl;
END LOOP;
END
$PROC$
;
CALL fdw_enum.refresh_upsteram_enums();

Right?
In MySQL/MariaDB, enums are just "aliases" for ints, and that works much nicer IMHO. Adding a new value is cheap because it doesn't recheck all the rows (removing values is still expensive, as it needs to check that the value isn't actually used by any row).
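In MySQL/MariaDB that cheap addition is a sketch like this (hypothetical table; appending the new value at the end keeps the existing rows' stored ints valid, so no table rewrite is needed):

```sql
-- 'charity' appended after the existing values; existing rows are untouched.
ALTER TABLE cust
MODIFY COLUMN type ENUM('person', 'company', 'charity');
```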