Legally, a copyright claim seems weak, but they didn't assert one. Some of their claims look stronger than others; the DMCA claim in particular strikes me as strong-ish at first glance.
Morally I think this class action is dead wrong. This is how innovation dies. Many of the class members likely do not want to kill Copilot and every future service that operates similarly. Beyond that, the class members aren't likely to get much if any money. The only party here who stands to clearly benefit is the attorneys.
I am more hesitant to release code on GitHub under any license now. Even outside of GPL-esque terms, I've considered open-sourcing some of my product's components under a source-available but otherwise proprietary license, but if Microsoft won't adhere to popular licenses like the GPL, why would they adhere to my own licensing terms?
If my licenses mean nothing, why would I release my work in a form that will be ripped off by a trillion dollar company without any attribution, compensation or even a license to do so? The incentives to create and share are diminished by companies that won't respect the terms you've released your creations under.
That's just me as an individual. Thinking in terms of for-profit companies, many of them would choose not to share their source code if they know their competitors can ignore their licenses, slurp it up and regurgitate it at an incomprehensible scale.
I strongly disagree. There would be more innovation if code couldn't be copyrighted or kept secret. See: all of open source.
> I've considered open sourcing some of my product's components under a source available but otherwise proprietary license
What's the point of that? This isn't useful to anyone. The fact you even consider it shows you don't understand open source. I'm sure you happily use open source code yourself though.
I actually agree. However, this is not what's happening. Copilot effectively removes copyright from FLOSS code, but doesn't touch proprietary software. FLOSS loses its teeth against the corporations.
The purpose of releasing source-available but proprietary code is so that users can study it and integrate with it; making the source available lets anyone learn how it works. The only reason I even considered it is the balance between 1) needing to eat and 2) valuing open source enough to risk #1.
Please take your condescension elsewhere.
There is a ton of innovative stuff that is not open source. I don't see what open source has to do with innovation.
Is there a GitHub terms-of-service clause that covers Copilot?
The code being hosted on GitHub has not been brought up as a factor yet (by GitHub/Microsoft). AFAIK, by that logic they could use code from other places too; they just don't need to.
Why do you want to release code on GitHub with an oppressive license? What's the motivation for you, and what's the benefit for anyone else in it being released?
The size of code fragments being generated with these AI tools is, as far as I can tell, extremely small. Do you think you could even notice if your own implementation of sqrt, comments and all, wound up in Excel?
The problem (or a problem) with Copilot is that it tries to sidestep those licenses, purportedly allowing you to build upon the work of others without giving anything back, even when the work you are building on was published on the explicit condition that what you create with it must be shared in the same way. The great AI tumbler may complicate the legal copyright-infringement argument by giving you lots of small bits from lots of different sources, but it really does not change the moral situation: you are explicitly going against the wishes of the people who are enabling you to do what you are doing.
Beyond copyleft, this kind of disregard for other people's wishes also applies to attribution, even with more liberal licenses. Programming is already a field where proper attribution is woefully lacking; we don't need to make it worse by introducing processes where it becomes much harder, if not impossible, to tell who contributed to the creation.
Now, I am all for maximum code sharing. I'm all for abolishing copyright entirely and letting everyone build what they want without being shackled by so-called intellectual property. But that is not what Microsoft is doing with Copilot. What they have created is a one-way funnel from OSS to proprietary software. If Microsoft had initially trained Copilot on their own proprietary sources, this would have been seen very differently. But they did not. Because the way Microsoft "loves open source" is not that of a mutually beneficial symbiotic relationship, but that of an abuser that loves taking advantage of whatever it can while giving as little back as it can get away with.
(And refusing to opt in shouldn't have to mean switching to a new hosting platform.)
> Beyond that, the class members aren't likely to get much if any money. The only party here who stands to clearly benefit is the attorneys.
That's the case in pretty much any class action. I look at class actions as having two purposes: to require that the defendant stops doing something, and to fine the defendant some amount of money. Sure, individual class members will see very little of that money, but I look at it as a way of hurting a company that has done people wrong. Hopefully they won't do that anymore, and other companies will be on notice that they shouldn't do those bad things either. Of course, sometimes monetary damages end up being a slap on the wrist, just something a company considers a cost of doing business.
That's my point. Many of the class members don't want the company to stop doing this.
I have code on GitHub, and Copilot is a useful tool. I don't care if my code was used to train the model. Sure, I personally could opt out of the suit, but that would be utterly meaningless in the grand scheme of things. The bottom line is, if I'm a coder with code on Github and I like Copilot, this suit is a huge net negative.
Even more importantly, I want to see the next version of Copilot that will be created by some other company, and then the next version after that. I want development to continue in this area at a high velocity. This suit does nothing but put giant screeching brakes on that development, and that is just a shame.
I have some code on GitHub as well and would not want it to be used in training, not by Microsoft nor by any other company. It is under the GPL to ensure that any derived use remains public and isn't stripped of copyright and locked into a proprietary codebase, and Copilot is pretty much the exact opposite of that.
that's the idea, yeah, and it would've been great if that's how copilot worked all the time
as for the whataboutism, if developers copied copyrighted code, the rights holder has the right to go after them, too, if they so choose
the rights holder could also choose to go after only big companies that violate licenses egregiously, if they so choose
you know, common sense and nuance
Now, Microsoft is violating other people's software licenses to repackage the work of numerous free and open source software contributors into a proprietary product. There is nothing moral about flouting the same type of contract that you depend on every day, for the sake of generating more money.
Either the entire Copilot dataset needs to be made available under a license that would be compatible with the code it was derived from (most likely AGPLv3), or Windows and Office need to be brought into the commons. Microsoft cannot have it both ways without legal repercussions.
If an AI model is the joint property of all the people who contributed IP to it, it’s a pretty hugely democratic and decentralizing force. It also will incentivise a huge amount of innovation on better, richer data sources for AI.
If an AI model isn’t the joint property of the people whose IP it learned from, then it’s a great way to build extractive business models, because the raw resource is mostly free. This will incentivise larger, more centralised entities.
Much of the most interesting data comes from everyday people. A class action precedent is probably good for society and good for innovation (particularly pushing innovation on the edge/data collection side)
With current technology, the only licensing model we can offer is "give us your training-set examples and we'll chuck a few pennies at you out of credit sales, nothing more". We can't even comply with CC-BY, because the model can't determine whom to attribute.
This legal challenge is coming one way or another. I think it’s better to get it out of the way early. At least then we will know the rules going forward, as opposed to being in some quasi-legal gray area for years.