And in general, nomad worked pretty well for us but our consul cluster kept mysteriously failing. I think that caused our nomad cluster to fail because it was backed by Consul.
The one complaint I did have about nomad (same as consul) was that the recovery process was manual where you had to manually generate a peers.json.
I was shocked when I saw that. Truly one of the "finding how the sausage is made" moments even though I've managed linux servers for two decades – I always assumed it would use zeroconf/bonjour/multicast DNS (remember cloud auto-join?) or something similarly elegant to auto discover other nodes in the network and just reconnect and rebuild a cluster. I mean what's the point of all this stuff if it can't be used to recover a cluster and just Do The Right Thing™? The shiny new experience is stellar (like sales, or setting up a new cluster), but the flip side (when things go wrong) is a mess. That's why we eventually said "nope!" to all the custom stuff and went with boring, plain vanilla ECS, which is itself too much now that we've started using fly.
Don't ever want to even think about having to hand-write a peers.json file to recover a cluster, boot things up, and pray to the ancient gods that it works.
We don't have time for that nonsense. Please, take my money, Fly/Render/everyone else. Your costs are a margin of error compared to what I had to pay a devops person to build our own stack. (I'm not even exaggerating. It was six figures. DevOps people are worth every penny but they cost many, many pennies.) Ultimately, we never used the infra.
I want to focus on building solutions for my customers and not fiddling with weird server stuff.