We should always be doing (the thing we want to do)
Some examples that always get me in trouble (or at least into big heated conversations):
1. Always be building: It does not matter if the code has not changed, or there have been no PRs or whatever, build it. Something in your org or infra has likely changed. My argument is "I would rather have a build failure on software that is already released than on software I need to release".
2. Always be releasing: As before, it does not matter if nothing changed, push out a release. Stress the system and make it go through the motions. I can't tell you how many times I have seen things fail to deploy simply because nobody had attempted a deploy in a long time.
There are more, I just don't have time to go into them. The point is "if you did it, and will ever need to do it again, then you need to do it continuously".
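As a rough illustration of "build even when nothing changed": the trigger looks at the age of the last build as well as at changes. This is only a sketch; `should_build`, its parameters, and the one-day threshold are hypothetical names and values picked for the example, not taken from any particular CI system.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_build(
    has_changes: bool,
    last_build: Optional[datetime],
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # assumed threshold, tune to taste
) -> bool:
    """Decide whether to run a real build (hypothetical helper)."""
    # The usual trigger: something in the repo changed.
    if has_changes:
        return True
    # Never built before: build now.
    if last_build is None:
        return True
    # Nothing changed, but the last build is stale. Build anyway, because
    # the org or infra around the code has probably changed.
    return now - last_build >= max_age
```

The same check works for releases: replace "last build" with "last successful deploy".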
Consider publishing a new version of a library: you'd be bumping the version number all the time and invalidating caches, causing downstream rebuilds, for little reason. Or if clients are lazy about updating, any two clients would be unlikely to have the same version.
Or consider the case when shipping results in a software update: millions of customer client boxes wasting bandwidth downloading new releases and restarting for no reason.
Even for a web app, you are probably invalidating caches, resulting in slow page loads.
With enough work, you could probably minimize these side effects, so that releasing a new version that doesn't actually change anything is a non-event. But if you don't invalidate the caches, you're not really doing a full rebuild.
So it seems like there's a tension between doing more end-to-end testing and performance? Implementing a bunch of cache levels and then not using them seems counterproductive.
1) I want to invalidate caches, I want to know that these systems work. I want to know that my software properly handles this situation.
2) If I have lazy clients, I want to know. And I want to motivate them to update sooner, or figure out how to force-update them. I don't want to hold back updates because some people are slow. I want updating to be the norm, so when there is a reason to update, like a zero-day, I can have some confidence that the updates will work and the lazy clients will not be an issue.
I am not talking about fake or dry runs that go through some portion of the motions. I want every aspect of the process to be real.
Performance means nothing if your stuff is down. And any perceived performance gained by not doing proper hygiene is just tweaking the numbers to look better than they really are.
You can try and predict everything that'll happen in production, but if you have nothing to extrapolate from, e.g. because this is your very first large live event, the chances of getting that right are almost zero.
And you can't easily import that knowledge either, because your system might have very different points of failure than the ones external experts might be used to.
I can't tell you the number of times things worked only because the cache was hot, and a restart or cache invalidation would actually have caused an outage.
Caches must be invalidated at a regular interval. Any system that does not do this is heading for some bad days.
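One way to get regular invalidation is to give every entry a time-to-live, so the cold path is exercised on a schedule instead of only after a restart. A minimal sketch, assuming a hypothetical `ExpiringCache` class and a caller-supplied `loader` function (neither comes from any specific library):

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class ExpiringCache:
    """Cache whose entries expire after ttl_seconds (illustrative only)."""

    def __init__(
        self,
        loader: Callable[[Hashable], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader  # the slow path, e.g. a database query
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        # Serve from cache only while the entry is younger than the TTL.
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        # Missing or expired: take the cold path and store a fresh copy.
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value
```

With a short TTL the backend sees cold-cache load routinely, so you find out whether it can handle it before an outage forces the question.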
While I too am generally a long-term sort of engineer, it's important to understand that this is a valid argument on its own terms, so you don't try to counter it with just "piffle, that's stupid". It's not stupid. It can be shortsighted. It leads to a slippery slope where every day you make that decision it gets harder to release next time, and there are a lot of corpses at the bottom of that slope. But it isn't stupid. Sometimes it is even correct: for instance, if the system's getting deprecated away anyhow, why take any risk?
And there is some opportunity cost, too. No matter how slick the release, it isn't ever free. Even if it's all 100% automated it's still going to barf sometimes and require attention that not making a new release would not have. You could be doing something else with that time.
A great engineering team will identify a tax they dislike and work to remove it. Using the same example, that means improving the success rate of deployments so you have the data (the success record) to take to leadership to change the policy and remove the tax.
It is just a reframing of build vs maintain.
Additionally, refactor circle jerks are terrible for back-porting subsequent bug fixes that need to be cherry picked to stable branches.
A lot of the world isn't CD, and constant releases are super expensive.
"Test what you fly, and fly what you test" (Supposedly from aviation)
"There should be one joint, and it should be greased regularly" (Referring to cryptosystems I think, but it's the same principle. Things like TLS will ossify if they aren't exercised. QUIC has provisions to prevent this.)
> 2. Always be releasing...
A good argument for this is security. Whatever libraries/dependencies you have, unpin the versions, and have good unit tests. Security vulnerabilities that are getting fixed upstream must be released. You cannot fix and remove those vulnerabilities unless you are doing regular releases. This in turn also implies having good unit tests, so you can do these builds and releases with a lower probability of releasing something broken. It also implies strong monitoring and metrics, so you can be the first to know when something breaks.
Nitpick: unit tests by definition should not be exercising dependencies outside the unit boundary. What you want are solid integration and system tests for that.