To be specific, the FAQ states: "It has been trained on natural language text and source code from publicly available sources, including code in public repositories on GitHub."
Some have raised concerns that Copilot violates at least the spirit of many open source licenses, laundering otherwise unusable code by sprinkling magic AI dust... most likely leaving the Copilot user responsible for copyright infringement.
Been a hell of a decade, hasn't it.
The wisdom of crowds works best when:
1. participants are independent (otherwise you may get failure modes, such as "groupthink" or "information cascades");
2. participants are informed, but in different ways, with different opinions;
3. there is a clear, accepted aggregation mechanism, where individual errors "cancel out" to some degree.
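To make the "cancel out" point concrete, here is a rough simulation sketch, not taken from the book. It assumes a simple averaging aggregator and Gaussian errors; the function name `crowd_error` and the `shared_weight` knob (how much of each guess comes from a common, herd-driven error) are made up for illustration.

```python
import random
import statistics


def crowd_error(n_participants, shared_weight, true_value=100.0,
                noise_sd=20.0, trials=2000, seed=0):
    """Average absolute error of the crowd's mean guess.

    shared_weight = 0.0 means fully independent errors;
    values near 1.0 mean everyone mostly repeats the same shared error
    (a crude stand-in for groupthink or an information cascade).
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(trials):
        shared = rng.gauss(0.0, noise_sd)  # error common to the whole crowd
        guesses = []
        for _ in range(n_participants):
            private = rng.gauss(0.0, noise_sd)  # each participant's own error
            err = shared_weight * shared + (1.0 - shared_weight) * private
            guesses.append(true_value + err)
        # Aggregation mechanism: a plain average of all guesses.
        errors.append(abs(statistics.fmean(guesses) - true_value))
    return statistics.fmean(errors)


single = crowd_error(1, 0.0)         # one person guessing alone
independent = crowd_error(100, 0.0)  # 100 independent guessers
herd = crowd_error(100, 0.8)         # 100 guessers sharing most of their error

print(f"single: {single:.2f}  independent: {independent:.2f}  herd: {herd:.2f}")
```

Under these assumptions the independent crowd's average lands much closer to the truth than a lone guesser, while the herd's shared error survives averaging no matter how many people you add.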
I view the topics in James Surowiecki's book (or the Wikipedia summary of it, at least) as required thinking for everyone, preferably synthesized with a study of statistics and political economy.
In particular, the Wikipedia article's section on "Five elements required to form a wise crowd" is a slightly different slicing of the required elements than the one I offer above.
* If you read that section, trust is listed. I, however, don't see trust as a necessary condition for a "wise crowd". Trust is often useful (or even necessary) when a collective decision is used for governance, decision-making, and policy.
AI is just recomposition of existing snippets of code, art, text, music, etc. Does an AI fall under fair use? What happens when an AI produces something too similar to an existing work or trademark? I know the computer won't get sued; the owner/user will. But still, it's a hard problem.
Even if Copilot was initialized with snippets from Open Source Software (exclusively), it doesn't mean that copyright infringement isn't a concern.
It's not random recomposition, which is worthless. It's useful recomposition, adapted to the request and context. It adds something of its own to the mix.