"[Three months prior to the incident] We upgraded our databases to a new minor version that introduced a subtle, undetected fault in the database’s failover system."
could have been prevented if you had stopped upgrading minor versions, i.e. froze on one specific version and not even applied security fixes, instead relying on containing it as a "known" vulnerable database?
The reason I ask is that I heard of ATM's still running windows XP or stuff like that. but if it's not networked could it be that that actually has a bigger uptime than anything you can do on windows 7 or 10?
what I mean is even though it is hilariously out of date to be using windows xp, still, by any measure it's had a billion device-days to expose its failure modes.
when you upgrade to the latest minor version of databases, don't you sacrifice the known bad for an unknown good?
excuse my ignorance on this subject.