> Current CI/CD practices often make it very easy for software developers to commit and roll out database migrations to a production environment, only to find themselves in the middle of a production incident minutes later. While a staging deployment might help, it's not guaranteed to share the same characteristics as production (either due to the level of load or monetary constraints).
(neon.tech employee here)
This is where branching databases with production data helps quite a bit: your CI/CD environment and even staging can exercise the schema changes against production-shaped data. When you build from a seed database, you often miss this kind of issue because the seed lacks the characteristics of your production environment.
But the author rightly calls out how staging isn't even enough in the next paragraph:
>The problem is, therefore (and I will repeat myself), the scale of the amount of data being modified, overall congestion of the system, I/O capacity, and the target table's importance in the application design.
Your staging, even when branched from production, won't have the same load patterns as your production database. And that load, and the locks associated with it, will make the rollout behave differently.
This has me wondering whether you could match production patterns in staging by setting staging up to mirror production's query traffic. Traffic mirroring, like what's available in pgcat, could put your staging under similar pressure.
It also made me think about how we're not capturing the timing of these schema changes. Unless a developer happens to notice that their schema change took 56 seconds to complete in CI, nobody will know that the change might have larger knock-on effects in production.
"Works on my DB" is the new "works on my machine" (and don't trademark it, please :)))
> Let's start with The Bad News - you cannot avoid rewriting the entire table, which will generate a significant amount of WAL files in the process. This is a given, and you must plan how to manage it.
There are some exceptions to this. The blog post focuses on changing a column from int to bigint, which indeed requires a full table rewrite, since each ID value now needs 8 bytes instead of 4. But:
> As an exception, when changing the type of an existing column, if the USING clause does not change the column contents and the old type is either binary coercible to the new type or an unconstrained domain over the new type, a table rewrite is not needed. [0]
So you could change from varchar to text, for example, or increase a length limit from varchar(20) to varchar(40), and not incur a full table rewrite. (Shrinking the limit is not a no-op cast, so that direction still forces a rewrite.)
0: https://www.postgresql.org/docs/current/sql-altertable.html
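As a sketch of what that exception buys you (table and column names here are illustrative), both of these complete as near-instant catalog changes even on a very large table:

```sql
-- No rewrite: varchar is binary-coercible to text
ALTER TABLE users ALTER COLUMN name TYPE text;

-- No rewrite: increasing a varchar length limit only loosens
-- the constraint, so existing on-disk data is already valid
ALTER TABLE users ALTER COLUMN country_code TYPE varchar(40);
```

Both still take a brief ACCESS EXCLUSIVE lock on the table, so they can queue behind long-running queries, but they don't rewrite any heap pages.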
That one has been fixed since Postgres 11: https://brandur.org/postgres-default
Adding the column as not nullable and setting a default is the absolute killer.
A workaround when you still want a default is a three-pass migration: first add the column as nullable with no default, then run an update to backfill the value on all rows, and finally set it to NOT NULL and add the default.
The difference in speed is surprising.
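A sketch of the three passes (table, column, and batch ranges are illustrative; on Postgres 11+ pass 2 is unnecessary for constant defaults, since ADD COLUMN ... DEFAULT is metadata-only there):

```sql
-- Pass 1: add the column nullable, no default (metadata-only, instant)
ALTER TABLE orders ADD COLUMN status text;

-- Pass 2: backfill in batches to avoid one giant, lock-heavy transaction
UPDATE orders SET status = 'new'
 WHERE status IS NULL AND id BETWEEN 1 AND 100000;
-- ...repeat for subsequent id ranges until no NULLs remain...

-- Pass 3: add the default for new rows, then forbid NULLs
ALTER TABLE orders ALTER COLUMN status SET DEFAULT 'new';
ALTER TABLE orders ALTER COLUMN status SET NOT NULL;  -- full scan, but no rewrite
```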
ALTER TABLE table_name ALTER COLUMN column_name [SET DATA] TYPE new_data_type
>you need to make sure the source system has enough disk space to hold the WAL files for a long enough time
If the asynchronous replication process buffers changes externally instead of relying on the source retaining WAL, that addresses this issue.
Probably because every tutorial on the Internet, along with the docs, recommends doing it this way. All the gotchas are buried in the footnotes.
The safe option is four steps minimum.
It's not hard to see why people would be tempted by the unsafe option.
Because you lose a significant amount of performance if you start adding nullable and variable-length columns just because you're afraid of a table rewrite.
Because the resulting table won't be carrying a full table's worth of update-induced bloat at the end of the operation.
Because you can be sure the modification is applied atomically: you as the user know the migration from A to B either goes through as expected or rolls back gracefully to the old data, rather than getting stuck or failing halfway through.
Because toasted data from dropped columns is not removed from storage by the DROP COLUMN statement itself, but only after each row that refers to a toasted value is updated or deleted.
...
Every column you "DROP" remains in the catalogs so that old tuples' data can still be read from disk. That's overhead you'll carry around until the table itself is dropped, and I'm not someone who likes carrying that bloat around.
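You can see those ghost columns in the catalogs yourself; dropped columns stay in pg_attribute with attisdropped set and a mangled name (table name illustrative):

```sql
-- Dropped columns linger as "........pg.dropped.N........" entries
SELECT attnum, attname, attisdropped
  FROM pg_attribute
 WHERE attrelid = 'my_table'::regclass
   AND attnum > 0;
```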
People haven't asked hard enough of the right people, I suppose. PostgreSQL is an open-source project where wanted and needed features are supplied by willing individuals and companies, then vetted by the community so the code base stays high quality.
I just suppose no one has bothered yet to implement ALTER TABLE CONCURRENTLY to the point where it's been accepted by the community. Another reason might be that the induced metadata churn (at most 1600 columns per table, and dropped columns count against that limit) could become problematic with repeated ADD COLUMN ... DROP COLUMN cycles.
For very big tables it's often better to manually add a column, backfill, then drop-swap.
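For the int-to-bigint case from the article, that manual approach looks roughly like this (names illustrative; the sync trigger, index builds, and sequence/PK handling are omitted for brevity):

```sql
-- 1. Add the new column (metadata-only, instant)
ALTER TABLE events ADD COLUMN id_new bigint;

-- 2. Keep new writes in sync via a trigger (not shown),
--    then backfill old rows in small batches
UPDATE events SET id_new = id
 WHERE id BETWEEN 1 AND 100000 AND id_new IS NULL;
-- ...repeat for subsequent ranges...

-- 3. Swap the columns inside one short transaction
BEGIN;
ALTER TABLE events RENAME COLUMN id TO id_old;
ALTER TABLE events RENAME COLUMN id_new TO id;
COMMIT;

-- 4. Drop the old column once nothing references it
ALTER TABLE events DROP COLUMN id_old;
```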
Good old MySQL days...
* pgroll: Written in Golang, first commits June 2023. https://github.com/xataio/pgroll
* pg-osc: Written in Ruby, first commits Dec 2021. https://github.com/shayonj/pg-osc
Meanwhile over in the MySQL and MariaDB ecosystem, external OSC tools have been around for quite some time, starting with oak-online-alter-table over 15 years ago. The most popular options today are pt-online-schema-change or gh-ost, but other battle-tested solutions include fb-osc, LHM, and the latest entry Spirit.
And if we're saying that's a problem, then it sounds like we're heading back into the NoSQL debates from a decade ago.
Hopefully not.
I think it's better to understand your schema as much as possible, and have a sane process for applying changes when needed. Defining a schema forces you to think about what you're building.
OTOH, the idea that developers on a project are just going to throw whatever new attributes they need into a document as they go along is a recipe for bugs and inefficiency. Also, near-instant technical debt, as early work frequently survives longer than anticipated.
You also don't completely escape data changes without pain when using a NoSQL database. If, for instance, you change a string to an int, you'd still need to figure out what to do with existing data, either via conversion or by handling it in code.
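The "handling in code" option typically means every reader tolerates both shapes forever (a "lazy migration"). A minimal sketch, with a hypothetical document field:

```python
def read_age(doc: dict) -> int:
    """Read the 'age' field, coercing legacy string values to int.

    Older documents stored age as a string; newer ones store an int.
    Every code path that reads this field must handle both, forever,
    unless you eventually run a one-off conversion over the collection.
    """
    age = doc["age"]
    if isinstance(age, str):
        age = int(age)  # legacy document: convert on read
    return age
```

So the schema change still exists; it's just smeared across the application instead of expressed in one ALTER statement.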
YES. Thank you. Sit down with pencil and paper, write down a table name, and start putting attributes into it. Then define a PK, and ask yourself if every attribute is directly related to the PK (a user named foo has an id of 1, and lives in country bar). Repeat. Then ask yourself how you’d join the tables. If you find that something _could_ be represented as a join, but isn’t, consider doing so.
There are still risks involved in migrations (mostly from a migration executing too quickly and creating high load in the cluster, which the admission control system should have reduced), and we have extra review steps for them, but it's been very useful to be able to migrate large tables without any extra application-level work.
If you give your schema a good thought (the one place where you shouldn't rush or take shortcuts at the beginning) and, for example, use jsonb columns and later move data out of them once you notice you need to query it more performantly, you can get very far.
The pain of data model migrations is also usually not that big as long as the company isn't very large and hasn't accumulated a lot of legacy yet.
> use jsonb columns
These two statements are mutually exclusive in most cases. If you want JSON, don’t use a relational database.
IME, the “we’ll normalize this later” event never occurs.
I'm extremely curious to hear what you consider a better/simpler choice. At least postgres gives you the tools to do schema migrations, and if you're operating at a scale where such migrations become a problem (i.e. probably not for a while) you really ought to know what you're doing.