In load testing there's a difference between testing whether something performs as specified and testing where it breaks. I'm pretty sure that GitHub has tests to validate their performance specification. They may even have tests far in excess of that. But they may not have tested where it breaks. They may have had a discussion like: what if someone tries such and such? And their answer may have been: we have good monitoring, we'll catch it before it gets out of hand, and lock out the user in question (which is ultimately what they did). In other words, spending engineering resources to determine the breaking point may not have been a priority. I'm not saying I would agree with that in all circumstances, but it's their determination to make.
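The distinction can be sketched in a few lines. This is a toy illustration with a simulated service and made-up numbers, nothing GitHub-specific: a spec test asserts behavior at a fixed, specified load, while a breaking-point test ramps load until failure and reports the headroom.

```python
def fake_service(concurrent_requests):
    """Toy stand-in for a real service: latency grows with load and
    the service fails past a hidden capacity (hypothetical numbers)."""
    capacity = 80
    base_ms = 50
    if concurrent_requests >= capacity:
        return None  # simulated failure / timeout
    # Latency degrades as load approaches capacity.
    return base_ms / (1 - concurrent_requests / capacity)

def validate_spec(load, max_latency_ms):
    """Spec test: does the service meet its latency target at the
    specified load? Pass/fail; says nothing about remaining headroom."""
    latency = fake_service(load)
    return latency is not None and latency <= max_latency_ms

def find_breaking_point(start=10, step=10):
    """Breaking-point test: ramp load until the service fails and
    report the last load level that still worked."""
    load, last_ok = start, None
    while fake_service(load) is not None:
        last_ok = load
        load += step
    return last_ok

# Hypothetical spec: 100 ms at 40 concurrent requests.
print(validate_spec(40, 100))   # → True (meets spec; headroom unknown)
print(find_breaking_point())    # → 70 (last load level before failure)
```

The first test can pass forever while telling you nothing about how close you are to the cliff; only the second produces that information, which is exactly why it's a separate (and deprioritizable) engineering effort.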
Sure, that all makes sense. It's still true that someone stressing git and GH's services in this particular way produced information that wasn't especially redundant. Monitoring was good at catching it, but probably keyed more on overall service quality than on the specific thing under stress. Now there's some data about that thing, and if nothing else it allows turning a few knobs to calibrate monitoring, which in turn would more readily catch someone doing the same thing with nefarious intent.