[1] http://www.infoq.com/presentations/Debugging-Production-Syst...
In the end, it's a wash.
Engineering is about handling what goes wrong, not what goes right. It's about handling the errors, changes, misuse, etc. It isn't about the techniques per se, as much as the mindset of living in an imperfect world.
[Edit: Fixed a typo.]
There is a certain series of things you have to hit in a fairly hyper-dimensional world, dodging constraints, hurdling uncertainty and taking risk in your stride as you struggle to make products that work, delight consumers and make bank.
It's like a complex and exquisite ballet really, with suppliers, manufacturers, producers and designers all coming together to make extraordinary products that astonish the world.
Ah, I love engineering.
[1] https://news.ycombinator.com/item?id=4238984
> designing a rocket engine is a massive game of high dimensional parameter whack-a-mole, it's very difficult to get a passable configuration without a lot of iteration and forwards-backwards passes
[1] http://robertheaton.com/2013/04/01/check-youre-wearing-trous...
[2] http://www.amazon.com/Clean-Coder-Conduct-Professional-Progr...
All software has bugs and these specific problems have been fixed in newer versions, but they are super scary issues to run into with your distributed coordination service.
I see a large amount of legacy maintenance in cost-center-based programming. Revenue-generating industry channels seem to favor the enterprise aspect of software engineering. Startups attempt to just build, and fix as necessary (cowboy). Yet each has its own facet of software engineering.
I am still trying to draw the line between too-enterprisy, too-maintenancy, and too-cowboy. At my current job, we assume everything is certain. The uncertainties are not coded for, because everything is internal. This bothers me a great deal. I love coding for the uncertain. Giving more control to the user and automating a whole department is right up my alley. Sadly, it is hard to convert people. Only the 'RU' in CRUD is in the user's hands most of the time. It is pure legacy fear.
The part about removing cascading failures needs more emphasis. Remove portions from your cycle/automation/jobs. What happens? I also agree with the measure and monitor portion. Waiting until the program starts breaking in production to create analyzers and look at metrics is too late.
Looking forward to the next posts.
However, I do agree that handling the huge and complex range of inputs, not only the expected ones, is a great beginning to the process, one that is often overlooked. The same goes for internal monitoring, to make sure your system is still functioning as designed.