edit: not sure why my question deserved a downvote...
It's a shame Amazon doesn't have thousands of employees to divide these tasks between different people, as it is only these busy operators who could update this status page.
If you're right, why have the status page at all? It's useless by your definition, yes?
It's even more frustrating when you are aware of problems early on and start talking to support, and THEY don't even know about the problems yet.
Maybe the thousands of people are what prevent the status page from being updated; everyone tries to hide their own faults, even internally.
Which is why, during incident response, there have to be people in charge of communication, both internal and external, and some of this can be further delegated.
That's a poor excuse.
> It requires escalation up the management chain and careful wording
Careful wording is more important for external stakeholders who might not have the full context. If one is walking on eggshells with internal management too, that's bad management. Incident communication should be factual and concise.
Could not agree more. It's immensely frustrating working with organisations that spend more time trying to cover up the cause of an outage from external stakeholders than actually fixing the root cause.
The same organisations tend to try to blame individuals for outages.
I think both are symptoms of businesses that embrace a "blame culture".
"A top of rack switch let out the blue smoke and it'll be ~30 before we can re-rack it" would impact what fraction of a fraction of a percent of canaries? Irrelevant to me, unless of course my VM lives on a box backed by that switch. ;)
The status dashboard exists for us to laugh at when things break and to convince C*Os that everything is fine. That's it.
EDIT: 15 minutes later and the board is looking worse again.