> So, the Minecraft server should work reliably and, if it goes down, I should know well before they do
How are metrics helpful? There is so much fun that could be had in setting up an actually resilient system instead.
Why worry over metrics and alerts when you could orchestrate an infrastructure that grants you the superpower of being able to spin up a server with a copy of the world on a whim instead (or even a system that auto-starts one whenever there is demand)?
As you said "There is so much fun that could be had in setting up an actually resilient system instead.", maybe the author has more fun setting up alerts and metrics instead of a resilient system like you do?
The truth is that in most real-world scenarios getting alerts, metrics is much more important than building a fully resilient system (Expensive, maybe overengieering for early stage etc.).
> However, the author is not really procrastinating—he gets paid for this. As the first sentence in the blog post says "One of the secret pleasures of life is to be paid for things you would do for free.", which I can very much understand as I often work or play with things I could use at work in my free time.
Adding the backup for the world files, already having Systemd bringing back a crashing server, makes the setup rather resilient. Sure, there's infinite more things that can go wrong, but with swiftly decreasing likelihood.
> The truth is that in most real-world scenarios getting alerts, metrics is much more important than building a fully resilient system (Expensive, maybe overengieering for early stage etc.).
This, very much this.
> However, the author is not really procrastinating—he gets paid for this. As the first sentence in the blog post says "One of the secret pleasures of life is to be paid for things you would do for free.", which I can very much understand as I often work or play with things I could use at work in my free time.
Yes :-)
Funny, because I have the opposite opinion. Build for failure first; if it’s critical/production then also monitor, but if an earthquake takes down an EC2 zone and you have no ability to spin it up exactly the way it was then the avalanche of alerts and metrics falling off a cliff[0] isn’t exactly going to help you (or your mental well-being).
Generally speaking, if you build for failure first, then monitoring becomes much more useful and actionable; and simultaneously it becomes much less important for a hobby project.
[0] That assuming you gather them from a different zone that wasn’t affected by the same downtime in the first place; speaking of, how are you monitoring your monitors? and so on.
Metrics are the means to an end of alerting. And with alerting, I mean getting pinged on my phone when something important breaks. Like, you know, the server going down.
> Why worry over metrics and alerts when you could orchestrate an infrastructure that grants you the superpower of being able to spin up a server with a copy of the world on a whim instead (or even a system that auto-starts one whenever there is demand)?
As somebody who has run cloud and enterprise software for almost two decades now, I can be that needs monitoring too. The more moving parts there are, the more things go wrong. The more things go wrong, and the more you care they get fixed, the more monitoring you need :-)
> As somebody who has run cloud and enterprise software for almost two decades now, I can be that needs monitoring too
To be clear, I strongly believe that if you run anything seriously in production, you must monitor it—but first you need to be able to spin it back up with minimal effort. It may take a while to get there if you just inherited a rusty legacy snowflake monolith that no one dares to breathe the wrong way near, but if you are starting anew it is a bad mistake to not have that down first considering how straightforward it is nowadays.
Then, for hobby projects of low criticality (because people in this thread mistakenly assume I mean any personal project, I have to reiterate: nothing controlling points of ingress into your house or the like), you may find that once you have the latter, the former becomes optional and not really that interesting anymore.