This is a silly opinion to hold, isn't it? I mean, you release projects under a license with the express purpose of freely distributing your code to anyone in the world who may have any interest whatsoever, and even allow them to share it with anyone they see fit. But you are somehow outraged if people actually use said code?
Please make it make sense.
You're making things up: the outrage is not that people used it, it's that the licence requires attribution at a minimum, and opening up the derivative product at most. Token providers that trained on open source did neither.
> Please make it make sense.
I am skeptical that you didn't know the reason for the outrage because it's been repeated in every single thread where this was discussed.
I myself have repeated it multiple times, every time this feigned confusion you display appears.
Like I am doing now, yet again.
So? Just because a piece of output data is encrypted or compressed and does not resemble the input does not mean that the process did not take the input.
We have decades of law that regards zipped files as infringement, lossy compression (MP3s) as infringement, etc.
> guess another way to put it is show your code in the output of an llm that isn't being attributed correctly.
Well, a better way of putting it is answering the question: "Would that model have existed had none of the code used as input existed?"
IOW, can that model be generated or created without first having all that copyrighted code used as input?
National Law Review covered some of those nuances last year: https://natlawreview.com/article/federal-courts-issue-first-...
The US Copyright Office has a substantial document discussing each of the four fair use factors, making it clear that this is an unanswered question and that the details of the particular case will decide which way courts go. It is a prepublication version, and it's over 100 pages, but it covers the issues well, citing arguments on all sides.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
What are you talking about? There is no distribution, only read access.
That said, FOSS licenses are non-exclusive. Regarding the original upthread topic of GitHub's Copilot training, iirc GitHub's terms and conditions involve granting them a license in order to host your code. Depending on what else is in those terms, they may have had the ability to use all hosted code for LLM training through that license, instead of the FOSS licensing on any given open source repo. But that would only apply to GitHub/Microsoft, not third-party scrapers.