If these folks win, we throw progress under the bus again.
There is no technical reason why Microsoft can't respect licenses with Copilot. But that would mean more work and less training input, so they launder code and excuse it with comparisons to human learning, because making AI seem more advanced than it is has always worked well in marketing.
Edit: And where do you draw the line between "learning" and copying? I can train a network to exactly reproduce licensed code (or books, or movies), just like a human can memorize it given enough time, and both of those would be considered a copyright violation if used without correct attribution. If you train an AI model on copyrighted data, you will get copyrighted results with random variation, which might be enough to make them unrecognizable if you're lucky.
I'd pose a question to you - would it be okay for me to copy/paste your code verbatim into my paid product in violation of your license and claim that I'm just using it for "learning"?
Yes, but attribution should still be given. Just because you don't copy-paste someone else's creation doesn't mean you're licensed to use it.
What if, instead of a tool, you had a random consultant do some work, and it was found out that he asked a ton of stuff on Stack Overflow and copied the CC-BY-SA 4.0 answers into his work? What if it was then found out that one of those answers was based on copying something from the Linux kernel? Who is responsible for doing the license check on the code before releasing the product?
Do you know whether the code you got from Copilot has an incompatible license? No. So if you plan to use Copilot for serious projects, you need it to include sources and licenses either way. In fact, that would be a very helpful feature, as it would let you filter suggestions by license.
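To make that concrete, here is a rough sketch of what license filtering could look like if suggestions came with provenance attached. Everything in it is hypothetical: the `Suggestion` fields, the `filter_by_license` helper, and the allowlist are made up for illustration, and none of it describes an actual Copilot API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allowlist: SPDX identifiers this project is able to accept.
ALLOWED_LICENSES = frozenset({"MIT", "BSD-3-Clause", "Apache-2.0"})


@dataclass
class Suggestion:
    """Hypothetical shape of a suggestion that carries its provenance."""
    code: str
    source_url: str
    license_id: Optional[str]  # None means the origin is unknown


def filter_by_license(suggestions, allowed=ALLOWED_LICENSES):
    """Keep only suggestions whose license is known and on the allowlist."""
    return [s for s in suggestions if s.license_id in allowed]


if __name__ == "__main__":
    # Made-up sample data, only to show the filter in use.
    candidates = [
        Suggestion("def a(): ...", "https://example.com/a", "MIT"),
        Suggestion("def b(): ...", "https://example.com/b", "GPL-2.0-only"),
        Suggestion("def c(): ...", "https://example.com/c", None),
    ]
    for s in filter_by_license(candidates):
        print(s.license_id, s.source_url)
```

Suggestions with an unknown origin are dropped along with the disallowed ones, which is the conservative choice: if you can't tell where the code came from, you can't tell whether you're allowed to ship it.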
Hard no. Please stop using open source code if this is how you think of it.
Without licenses being respected, we don't get open source communities.
So why can MS screw only with the licenses that you call "open source"? Your example with a human reading a book would also work with source-available licenses or decompiled binaries.
I would have been fine if the open source code had been used to create an open model, or if MS had put its ass on the line and also trained the model on all the GitHub code, since they claim there is no copyright issue.
P.S. I am not a lawyer.