A brief period of non-operation (a reboot or service restart) is often better than a prolonged outage, particularly where SLAs are set so that this is expected and accepted, and where redundancy exists.
I also think there's a feedback process at work here, and some sort of damping mechanism would help with that.
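
To make "damping" a bit more concrete, here's a rough sketch of one possible approach: exponential backoff on restarts, so a flapping service can't restart itself in a tight loop. This assumes the feedback in question is restarts triggering further restarts, which may not be the whole picture. The class name `RestartDamper` and the parameters `base_delay`, `max_delay` and `quiet_period` are made up for illustration and aren't taken from any particular tool.

```python
class RestartDamper:
    """Hypothetical helper: spaces out restarts of a flapping service."""

    def __init__(self, base_delay=1.0, max_delay=300.0, quiet_period=600.0):
        self.base_delay = base_delay      # delay before the first restart (seconds)
        self.max_delay = max_delay        # upper bound on any single delay
        self.quiet_period = quiet_period  # calm time after which damping resets
        self.failures = 0                 # restarts since the last quiet period
        self.last_restart = None          # time the last restart was scheduled for

    def next_delay(self, now):
        """Return how long to wait before the restart requested at time `now`."""
        # A long enough stretch without restarts resets the damping.
        if (self.last_restart is not None
                and now - self.last_restart >= self.quiet_period):
            self.failures = 0
        # Double the delay for each recent restart, up to the cap.
        delay = min(self.base_delay * (2 ** self.failures), self.max_delay)
        self.failures += 1
        self.last_restart = now + delay
        return delay


# Assumed usage: timestamps are plain seconds, e.g. from time.monotonic().
damper = RestartDamper(base_delay=1.0, max_delay=8.0, quiet_period=100.0)
for t in (0, 2, 5, 10, 20, 200):
    print(t, damper.next_delay(t))
```

Restarts that come in quick succession get pushed further and further apart, while a single restart after a long calm stretch still happens almost immediately, which keeps the "brief non-operation" case cheap.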