Not entirely. Some issues were fixable, like moving our RabbitMQ cluster away from RHEL to AWS. But others weren't. An upstream service we depended on went down and caused a cascading failure. It was the company's core product, a massive Java program running on bare metal that frequently got our service OOM-killed. Even though it was the big money-maker, no team owned it and nobody understood how it worked. I don't remember why our service had to share a host with this monster, but there was a good reason, and it just couldn't be worked around.