This isn't a GitHub-only issue but one that would affect any quick-to-launch startup (which is most of them). What I'm learning from this is that you need to regularly revisit your infrastructure and how it's glued together with the provisioning system.
If it's not broken, break it.
With tools like Chef & AWS CloudFormation, there shouldn't be an excuse.
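For anyone who hasn't worked with these tools: the point is that a DNS record becomes a reviewable, version-controlled artifact instead of a console edit. A minimal CloudFormation sketch (the zone name and address below are hypothetical, just to show the shape):

```yaml
# Sketch only: a Route 53 record managed by CloudFormation.
# Changing it means a diff in source control, not a manual edit.
Resources:
  AppDnsRecord:
    Type: AWS::Route53::RecordSet
    Properties:
      HostedZoneName: example.com.    # hypothetical zone
      Name: app.example.com.
      Type: A
      TTL: '300'
      ResourceRecords:
        - 203.0.113.10                # hypothetical address
```

The same idea applies with Chef or Terraform; the tool matters less than the fact that the change is declared, reviewed, and reapplied from one source of truth.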
I would love more detail on the type II error in this validation step; it's worth exploring further. What was the verification step? Why did it not detect the issue? What review process was used for the verification step?
While the failed verification step is not the root cause, having good safety checks is the most important part of planning good changes, whether they're DNS reconfigurations, network changes, or software deployments.
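A safety check like that can be as simple as polling the answer after the change and refusing to call it done until it matches what you expect. A minimal sketch, assuming a pluggable `resolve` callback (in real use you'd query the authoritative servers directly, e.g. with dnspython, rather than a possibly-stale cache; the names and timings here are hypothetical, not GitHub's actual process):

```python
import time

def verify_dns_change(resolve, name, expected, attempts=10, delay=1.0):
    """Poll resolve(name) until it returns `expected`.

    resolve  -- callable returning the current record value for `name`
    expected -- the value the change was supposed to produce
    Returns True once the answer matches, False if it never does
    within `attempts` tries (which should trigger a rollback,
    not a shrug).
    """
    for _ in range(attempts):
        if resolve(name) == expected:
            return True
        time.sleep(delay)
    return False
```

The important design choice is that the check compares against an explicit expected value instead of just asserting "something resolved", which is exactly the kind of verification that passes while the record is still wrong.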
Maybe I'm just too careful (perhaps because I've seen it happen before) but I prefer to wait a helluva lot longer than that for verification.
And perhaps it's because I dealt with Microsoft Active Directory so much in the past, but I am extremely careful when it comes to DNS. If there's one thing that'll screw up your entire environment (especially in an AD-based network), it's broken DNS.
I doubt there is a time when they wouldn't have disrupted a significant part of their userbase. Even if you assume a specific place has the majority of users (San Francisco, Germany, whatever), developers tend to work odd hours anyway.
This has the great effect of lowering the median travel times and information-transmission latencies between the world's population centers, and it means that, for at least this geological epoch, we're always going to have daily global peak and off-peak times for human-driven activity.