Thanks for the links below. Reading these opinions took two more hours of my time but helped me refine some thoughts!
First, a quick answer to your last two questions. Programs and binaries are widely recognized as copyrightable. What I am wondering is whether the act of compiling a program constitutes a contribution worthy of protection and of an additional copyright. To give a concrete example, imagine I am a company that uses gcc and big machines to provide compilation as a service. You feed it BSD-licensed source code. My server returns a binary on which I claim a proprietary copyright. Are you allowed to dismiss it as just the result of a totally deterministic and automated process and reclaim it as BSD? I would argue yes, but it could be a non-obvious court case.
Anyway, I don't think I agree with the comparison between compilation and training.
> Do you have a link to those fair use rulings?
I was thinking about this ruling [1] (Authors Guild, Inc. v. Google, Inc.), in which Google scanned commercial books and used this obviously non-free dataset to provide in-text search. I am pretty bitter about the fact that one of the main reasons for the favorable outcome (Google won) was that the judge considered it to have an "obvious" usefulness by the time the ruling finally happened, some 10 years after the scanning started. Back when it started, that usefulness was certainly not obvious to non-tech people. So Google had to prove a technology while in a legal gray zone, a luxury orgs like Debian may not have.
------------------
Now for the real meat :-)
> The Debian ML policy linked above goes a fair way to making truly open source deep learning models
Actually, I am wondering whether they are a bit blinded by the way the GPL works and whether they constrain themselves somewhat artificially with imaginary legal precedent.
They all seem to assume that a trained model will be recognized as a compiled binary, but I see at least 5 competing comparisons that were proposed and could hold ground legally:
1. Trained model as a compiled binary
2. Compilation of facts, as proposed here [2]. I find it pretty persuasive, even though its author dismisses it with what I think is a weak argument.
3. Rendered 2D image from a 3D model
4. 2D photograph of a real 3D object
5. Training as a copyrightable creative act [3]
It is understandable that Debian maintainers think about everything in terms of programs and source, but I feel they shoehorn that notion a bit into machine learning and may not realize how much more flexible the legal framework actually is.
Admittedly, I am less interested in the consequences of slapping the GPL on a trained model than in finding a way to solve the potential problems caused by bad actors in the field, just like FOSS did for regular software. I strongly suspect we may have to write a viral license adapted to ML.
One of my examples: how would one go about preventing one's work from being used by OpenAI the day they decide to stop releasing their trained models? Or avoid helping Google or Facebook gain an even more dominant position by adding data to an already good model?
We benefit a lot from the fact that, right now, there seems to be genuine goodwill from wealthy actors to contribute to the research community, but it feels to me like a Mexican standoff. What happens when one of them decides to run off with what is published and secretly improve it for commercial gain?
I must say that I have been pleasantly surprised by how much is free to use right now: research, algorithms, frameworks and trained models. We avoided a lot of dystopias, probably thanks to some unsung hero researchers who imposed openness on their employers upon being hired.
The risk still exists though, as all this openness can be reversed on a whim. Basically, I am wondering how we can stack the odds in our favor so that the first AGI benefits humanity instead of its owner.
Sorry for the wall of text. If you are still there and would like to continue this discussion, here is fine, but real-time discussion is also fine: you can shoot me a mail at yves.quemener@gmail.com and we can do Hangouts or Signal from there.
[1] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....
[2] https://lists.debian.org/debian-devel/2018/07/msg00175.html
[3] https://lists.debian.org/debian-devel/2019/05/msg00380.html