FWIW, this seems to be the current interpretation of copyright law when it comes to machine learning, at least in the US. The only questions I've really seen about the legality of Copilot are about it reproducing code and whether that reproduction is fair use or not. But few are arguing that training the model itself on any available source falls outside fair use.