Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.It seems obvious to me that the push system should not proceed without confirmation from the management software, and the management software should not confirm the change is OK if it detects failure.
I see a straightforward defect here, not a confluence of edge cases.