I don't think you should be looking at the best of the Microsoft/GitHub corpora to gauge their overall quality. You probably want to be looking at the quality of the median project, which is going to be heavily influenced by the long tail of low-quality projects.
IMO, the long tail of non-code-reviewed, written-by-someone-in-their-first-month-of-coding, barely-even-compiles noob code[0] on GitHub is going to be orders of magnitude larger than the long tail of crap in Microsoft's internal repos.
[0] Hey, everyone has to start somewhere. There's nothing wrong with your first "hello world" program being buggy - that's what being a beginner means. But it's probably not the sort of code you want to train an LLM on.