> At times, it sounded like the case was the authors’ to lose, with [Judge] Chhabria noting that Meta was “destined to fail” if the plaintiffs could prove that Meta’s tools created similar works that cratered how much money they could make from their work. But Chhabria also stressed that he was unconvinced the authors would be able to show the necessary evidence. When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
> When defendants invoke the fair use doctrine, the burden of proof shifts to them to demonstrate that their use of copyrighted works is legal. Boies stressed this point during the hearing, but Chhabria remained skeptical that the authors’ legal team would be able to successfully argue that Meta could plausibly crater their sales. He also appeared lukewarm about whether Meta’s decision to download books from places like LibGen was as central to the fair use issue as the plaintiffs argued it was. “It seems kind of messed up,” he said. “The question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”
Now that big capital wants to steal from individuals, big capital wins again.
(Unrelatedly, has Boies ever won a high profile lawsuit? I remember him from the Bush/Gore recount issue, where he represented the Democrats.)
The argument for 'fair use' in DVD copying/sharing is much weaker since the thing being shared in that case is a verbatim, digital copy of the work. 'Format shifting' is a tenuous argument, and it's pretty easily limited to making (and not distributing) personal copies of media.
For AI training, a central argument is that training is transformative. An LLM isn't intended to produce verbatim copies of trained-upon works, and the problem of hallucination means an LLM would be unreliable at doing so even if instructed to. That transformation could support the idea of fair use, even though copies of the data are made (internally) during the training process and the model's weights are in some sense a work 'derived' from the training data.
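As a toy illustration of that distinction (this is not a real LLM — the bigram counter and every name here are my own invention): a transient copy of the text is read during "training", but the resulting weights are aggregate statistics rather than a verbatim copy of the work.

```python
from collections import Counter

def train_bigram_model(text: str) -> Counter:
    # "Training" here is just counting adjacent character pairs;
    # a transient copy of the text is read during this step.
    return Counter(zip(text, text[1:]))

corpus = "the quick brown fox jumps over the lazy dog"
weights = train_bigram_model(corpus)

# The "weights" are aggregate statistics, not the text itself:
# the pair "th" occurs twice in the corpus, but the corpus is
# not stored anywhere in the model.
print(weights[("t", "h")])      # 2
print(corpus in repr(weights))  # False
```

Whether that kind of statistical derivation counts as "transformative" for fair use purposes is exactly what the court has to decide.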
If you analogize to human learning, then there's clearly no copyright infringement in a human learning from someone's work and creating their own output, even if it "copies" an artist's style or draws inspiration from someone's plot-line. However, it feels unseemly for a computer program to do this kind of thing at scale, and the commercial impact can be significantly greater.
They seek to convert them into more products. The needs of the copyright holders, who are relatively small businesses and individuals, are outweighed by the needs of Meta.
Sarah wanting to watch a movie or listen to music... Too bad she doesn't have an elite team of lawyers to justify whatever she wants.
In practice Meta has the money to stretch this out forever and at most pay inconsequential settlements.
YouTube largely did the same thing, knowingly violate copyright law, stack the deck with lawyers and fix it later.
Here's this:
> Boies also was on the Theranos board of directors, raising questions about conflicts of interest. Boies agreed to be paid for his firm's work in Theranos stock, which he expected to grow dramatically in value.
https://en.wikipedia.org/wiki/David_Boies
That was one of the decisions of all time.
Did any of the defendants raise a fair use defense based on a transformative use that they were making of the downloaded copies? If not, you are in the domain of "unlike legal situations lead to unlike decisions" which is not exactly surprising.
He teamed up with opposing counsel from Bush v. Gore, Ted Olson, and the pair of them represented plaintiffs in Hollingsworth v. Perry, the SCOTUS case which overturned Prop 8, California's gay marriage ban.
[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...
now, when the mega-corporations do it, it is 'just the cost of doing business'.
in both cases, the mega-corporations win because...they have the most money. law, and certainly justice, is not for the poor. at least not in america.
This trial is way beyond the statutes and case law. The judge is doing a job, and it's hard to conceive what the best job would be - I'm not sure Congress even knows what the policy should be, or whether the public has even the faintest whiff of how things should work.
Better to read the submission, rather than only the HN title, before drawing conclusions. In this case the HN title has been editorialised.
The actual title of the article is "A Judge Says Meta's AI Copyright Case Is About 'the Next Taylor Swift'"
"The judge didn't make any sort of ruling, this is just reporting on a pretrial hearing."
The HN title doesn't mention anything about a "ruling". Nor does the title chosen by Wired.
The subheading in the article reads "Meta's contentious AI copyright battle is heating up - and the court may be close to a ruling."
That is accurate. The Court will soon decide the SJ motions.
Reading the article leaves no chance of being misled by any title:
"If Chhabria grants either motion, he'll issue a ruling before the case goes to trial-and likely set an important precedent shaping how courts deal with generative AI copyright cases moving forward."
"LLM, please summarize Sarah Silverman's memoir for me."
edit: Reader's Digest would be very surprised to know that they shouldn't have been paying for books.
1. Training AI on freely available copyrighted material - Ambiguous legality, not really tested in court. AI doesn't directly copy the material it trains on, so it's not easy to make this ruling.
2. Circumventing payment to obtain copyrighted material for training - Unambiguously illegal.
Meta is charged with doing the latter, but it seems the plaintiffs want to also tie in the former.
The judge in this case seems to disagree with you; he isn't accepting the premise that downloading the material from pirate sites for this use inherently exempts the plaintiffs from having to address the fair use defense as to the actual use.
> the plaintiffs want to also tie in the former.
No, the defense wants to and the judge hasn't let the plaintiffs avoid it the way you argue they automatically can.
This is a good point. As a reminder, the Folsom factors (failing or passing any one is not conclusive; they are to be considered holistically) are:
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes (Note also that whether or not the use is transformative is part of this test).
- the nature of the copyrighted work
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole
- the effect of the use upon the potential market for or value of the copyrighted work
https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
What is inspiration? What is imitation? What is plagiarism? The lines aren't clearly drawn for humans... much less for LLMs.
I can absolutely guarantee you that neither DeepSeek nor Alibaba's highly talented Qwen group will care even a little bit, in the long run. Not if there's value to be had in AI. (And I can tell you down to the dollar what LLMs can save in certain business use cases.)
If the US decides to unilaterally shut down LLMs, that just means that the rest of the world will route around us. Whether this is good or bad is another question.
You assume that getting tested means the AI trainers lose, and also that the model architectures that have been developed can't be retrained from scratch with public domain, owned, and purpose-licensed material. (Several AI companies have been actively pursuing deals to license content for AI training for a while now.)
End of the road for major AI companies, and hopefully something better can be created once it's declared illegal without any murky waters.
There are LLMs trained on data that isn't illegally obtained; OLMo by Ai2 is one such model, which is actually open source and uses open data for training. The fact that it's "very difficult" for OpenAI et al. shouldn't be an argument against forcing them to behave ethically anyway. If they cannot survive acting legally, then so be it; sucks for them.
I see no reason why we should even entertain the idea of extending human rights to computer programs, and so far, nobody has been able to give me any good reasons why.
Furthermore, why are we only entertaining the human rights that can be used for profit-driven purposes? Why do LLMs, for example, not have the right to free speech? Or an attorney? It seems highly unethical to grant these computer programs some protections as if they're humans but not grant them personhood. This is akin to slavery, which is something we actually have to consider. Anthropomorphization is a double-edged sword. We cannot simultaneously consider them human when convenient and then consider them programs when it's not. Or, if we want to do that, we need to form a coherent argument as to why, how, and when.
Because the obvious question would be - how can free people compete with that?
https://273ventures.com/kl3m-the-first-legal-large-language-...
So, it's really the majority of companies breaking the law who will be affected. Companies using permissible and licensed works will be fine. The other companies would finally have to buy large collections of content, too. Their billions will have to go to something other than GPUs.
Of course it does. Large models are trained on gigantic clusters. How can you train without copying the material to machines in the cluster?
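A minimal sketch of that point, assuming simple data-parallel training (the function and names are hypothetical): before any gradient step runs, each worker machine in the cluster receives its own copy of a shard of the corpus.

```python
def shard_corpus(docs: list[str], num_workers: int) -> list[list[str]]:
    # Round-robin split: in a real cluster each shard would then be
    # serialized and copied over the network to a worker machine.
    return [docs[i::num_workers] for i in range(num_workers)]

docs = [f"doc-{i}" for i in range(10)]
shards = shard_corpus(docs, num_workers=4)

# Every document lands on some machine, i.e. at least one full copy
# of the corpus exists across the cluster before training starts.
print(sum(len(s) for s in shards))  # 10
```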
I thought the copyright infringement was by the people who provided the copyrighted material when they did not have the rights to do so.
I may be wrong on this, but it would seem a reasonable protection for consumers in general. Meta is hardly an average consumer, but I doubt that matters in the case of the law. Having grounds to suspect that the provider did not have the rights might though.
They also mention Books3, but they don't appear to actually allege anything against Meta in regards to it and are just providing context.
I don't think it actually changes anything material about this complaint if Meta bought all the books at a bookstore since that also doesn't give you the right to copy the works.
The original complaint is 2 years old though, so I don't really know the current state of argumentation.
https://www.courtlistener.com/docket/67569326/1/kadrey-v-met...
Note that incidental copying (i.e. temporary copies made by computers in order to perform otherwise legal actions) is generally legal, so "copying" in the complaint can't refer merely to this and must refer more broadly to the model itself being a copy in order to have standing.
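Concretely, this is the kind of incidental copying at issue (a minimal Python sketch): merely reading a file back performs the otherwise legal act while creating a transient copy in process memory.

```python
import os
import tempfile

# Write a "work" to disk, then perform the otherwise legal act of
# reading it back: a transient, incidental copy appears in memory.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("an excerpt of some copyrighted text")
    path = f.name

with open(path) as f:
    data = f.read()  # the incidental, in-memory copy

os.remove(path)  # the durable file is gone; only the transient copy remains
print(data)
```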
The final say may ultimately come from the Cox v. Record Labels case from 2019 that is still working its way through the appeals courts.
If the record labels win their appeal, anyone who helped facilitate the infringement can be brought into a lawsuit. The record labels sued Cox for infringement by its users. It's not out of the question that any ISP that provides Internet connectivity to Facebook could be pulled in for damages.
For Meta these two cases could result in an existential threat to the company, and rightly so because the record labels do not play games. The blood is already in the water.
IP that had been previously loaded by Blizzard itself
https://en.wikipedia.org/wiki/MDY_Industries,_LLC_v._Blizzar....
IANAL, but it doesn't look that hard. On first glance this is a fair use issue.
What an LLM spits out is pretty clearly transformative use. But the fact that it pulls not only the entirety of the work, but the entirety of MOST works means that the amount is way beyond what could be fair use. Plus it's commercial use. Put it together and all LLMs are way illegal.
What do you mean by "pulls"?
What matters in traditional fair use is how substantially your output copies the work (among other factors). Your input is generally assumed to be reading/watching/listening to the entire work, and there is no problem with that.
As a (creative) friend of mine flatly said, they refuse to use an LLM until it can prove where it learned something from/cite its original source. Artists and creatives can cite their inspirational sources, while LLMs cannot (because their developers don't care about credit, only output) by design. To them, that's the line in the sand, and I think that's a reasonable one given that not a single creative in my circles has been cut payment from these multi-billion-dollar AI companies for the unauthorized use of their works in training these models.
[1] https://arxiv.org/abs/2504.07096
Even humans have a lot of internalized unconscious inspirational sources, but I get your point.
Regardless, deep learning models are valuable because they generalize within the training data to uncover patterns and features and relationships that are implicit, rather than (simply) present in the data. While they can return things that happen to be within the training set, there is no reason to believe that any particular output is literally found there, or is something that could be attributed, or that a human would ever attribute. Human artists also make meaning from the broad texture of their life experiences and the general diffuse unattributable experience of culture.
Sure, this is something a random artist is unlikely to know, but if they simply refuse to pick up useful tools that can't give credit - say, avoiding LLMs for brainstorming, or generative selection tools for visual editing, or whatever - their particular careers will be harmed by their incurious sentimentality, and other human artists will thrive because they know that tools are just tools, and it is the humans using the tools who make the meaning that people care about.
Why? Was it legal for me to download copyrighted songs from Limewire as "fair use"? Because a few people were made examples of.
I'm a musician, so 80% of the music I listen to is for learning so it's fair use, right? ;)
I would be happy with that outcome. I’m a fanfiction writer, and a lot of the stories I read are very much for learning. ;-)
Secondly, there's an argument that the infringement happens only when the LLM produces output based in part or whole on the source material.
In other words, training a model is not infringing in itself. You could "research" with it. But selling the output as "from your model" is highly suspect. Your business is then based on selling something built on other people's work that you do not have rights to.
What fair use? Were the books promised to them by god or something?
We need to frame this case - and ongoing artist-vs-AI stuff - using a pseudoscience headline I saw recently: 'average person reads 60k words/day'.
I won't bother sourcing this, because I don't think it's true, but it illustrates the key point: consumers spend X amount of time/day reading words.
> It seems like the authors are setting up for failure by making the case about whether the AI generation hinders the market for books. AI book writing is such a tiny segment what these models do that if needed Meta would simply introduce guard rails to prevent copying the style of an author and continue to ingest the books.
and from the article:
> When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
The market an author (or any other type of artist) is competing with Meta for is not 'what if an AI wrote celebrity memoirs?'. Meta isn't about to start a print publishing division.
Authors are competing with Meta for 'whose words did you read today?' Were they exclusively Meta's - Instagram comments, Whatsapp group chat messages, Llama-generated slop, whatever - or did an author capture any of that share?
The current framing is obviously ludicrous; it also does the developers of LLMs (the most interesting literary invention since....how long ago?) a huge disservice.
Unfortunately the other way of framing it (the one I'm saying is correct) is (probably) impossible to measure (unless you work for Meta, maybe?) and, also, almost equally ridiculous.
Legal cases are often based on BS, really an open form of extortion.
The plaintiffs might've been hoping for a settlement.
Meta could pay $xM+ to defend itself.
Maybe they thought Meta would be happy to pay them $yM to go away.
The reality is, there's very little Meta couldn't just find a freely available substitute for if it had to; it might just take a little more digging on their end.
The idea that any one individual or small group is so valuable that they can hold back LLMs by themselves is ridiculous.
But you'll find no end to people vain enough to believe themselves that important.
To make fair use of a book's passage, you have to cite it. The excerpt has to be reasonably small.
Without fair use, it would not be possible to write essays and book reviews that give quotes from books. That's what it's for. Not for having a machine read the whole book so it can regurgitate mashups of any part of it without attribution.
Making a parody is a kind of fair use, but parodies are original expression based on a certain structure of the work.
That's not true. That's what's required for something not to be plagiarism, not for something not to be copyright infringement.
Fair use is not at all the same as academic integrity, and while academic use is one of the fair use exceptions, it's only one. The most you would have to do with any of the other fair use exceptions is credit where you got the material (not cite individual passages), because you're not necessarily even using those passages verbatim.
Neither of them died, though, both parties just kept all the books from the public and used them for their own purposes, while normal people had to squirrel them away and trade them illegally. It's the Tech Cartels vs. the Copyright Trolls. It'll end up as a romance.
Letting Meta launder copyrighted works to make billions, while threatening the rest of us over the most trivial derivative work, sounds like the worst outcome to me.
Copyright is a mistake. It demands that we compete instead of collaborate. LLMs don't provide enough utility to deserve special treatment in these circumstances. If anyone can infringe copyright, then everyone should be able to.
Where was this argument when Napster was being sued?
Now, if someone makes an infringing use of the thing I put up on the internet, then I have some kind of recourse, at least through the courts, if I have a lot of money to pay lawyers.
But if someone makes a fair use of the thing I put up on the internet, then I don't have any recourse, because that's the way the law works.
As far as I understand it, using data as input data to a machine learning model that substantially transforms and does not duplicate the input data is currently believed to be fair use.
So, the training use of freely available data seems pretty straightforward that authors can't control when they make it freely available.
It seems like Facebook made use of data that wasn't freely available, though -- ebook rip library type stuff. That's the bit I think they could be in trouble for. But that's just a plain-old "Napster" style copyright question, as far as I understand it.
The lawyer's argument that Llama "obliterates the market" for written works seems weak. I, and anyone I know, put down AI slop fiction before the first paragraph is done, because it's not the same thing as real fiction.
Also, I read that ordinary folks have been arrested for filming in the cinema even when they did not redistribute the video (due to being arrested first). Again, it is unfair that they get arrested and Zuckerberg doesn't.
When the scale is a significant portion of all human text output ever, I don't think we're in the realm of any prior model. This is now something closer to how society attempts to approach natural resources like land, frequency bands, utility right-of-way, etc. I think this is the direction that laws and legislation should look to go. Or maybe not, I don't claim to have the answer, only that existing models are inadequate.
However, claiming llama is not a 'substantial transformation' of the information used to build it seems untenable.
The complaint feels to me more like the paint factory claiming rights to the paintings you created with its paint, rather than a classic pirate DVD copier that just resells copies.
Maybe a middle ground could be some Google Books-like solution where you can still find anything, but where the output is restricted to short fragments and not complete verbatim chapters?
I do not believe people use llama to 'read published books on the cheap'.
Will the neural network (LLM) itself become illegal? Will its outputs be deemed illegal?
If so, do humans who have read an illegally downloaded book become illegal? Do their creative outputs become illegal?
What is the substantive difference between training a model locally using these works that are presumably pulled in from some database somewhere and Napster, for example?
Would a p2p network for sharing of copyrighted works be legal if the result is to train a model? What if I promise the model can't reproduce the works verbatim?
I have this debate with a friend of mine. He's terrified of AI making all of our jobs obsolete. He's a brilliant musician and programmer both, so he's both enthused and scared. So let's go with the Swift example they use.
Performance artists have always tried to cultivate an image, an ideal, a mythos around their character(s). I've observed that as the music biz has gotten tougher, the practice of selling merch at shows has ramped up. Social media is monetized now. There's been a big diversification in the effort to make more money from everything surrounding the music itself. So too will it be with artists.
You're starting to see this already: artists who got big not necessarily because of the music, but because of the weird cult of personality they built. One who comes to mind is Poppy, who ironically enough built a cult of personality around being a fake AI bot...
https://en.wikipedia.org/wiki/Poppy_(singer)
You've definitely got counter-examples like Hatsune Miku - but the novelty of Miku was because of the artificiality (within a culture that, like, really loves robots and shit). AI pop stars will undoubtedly produce listenable records and make some money, but I don't expect that they will be able to replace the experience of fans looking for a connection with an artist. Watch the opening of a Taylor Swift concert, and you'll probably get it.
Has making music for a living ever not been tough?
Fair.
> I think that argument is further hampered (taylor being an exception) by the fact that most pop stars already don't write their own songs.
That accounts for the big artists on the radio (yes some people listen to that). But, what about everyone else? I would posit that most artists (the one-hit wonders, the ones without radio success, etc.) write their own songs. It seems like there's such acts who make a go of it just fine, who write their own songs and really nail the connection with fans. I would point to a regional band near me: Mr. Blotto.
There are some very impressive YouTubers who claim to be generating new music with AI. I very much doubt the one I listen to the most has everything 100% generated - he probably generates a bunch of melodies and other bits of track and stitches the best candidates together. They crank out a new album basically every 2 weeks, though, with just a scant few thousand followers. They are not making money, but the music is pretty on par with bands that sell hundreds of thousands or millions of albums.
This is part of what makes me think it's the people who can cultivate the mythos, the personality, the whole experience, who are going to be the big winners in the AI music economy. Sure, maybe Gorillaz obfuscates the identity of the artists (side note: do they though? it's well known to be a supergroup), but it is still a curated experience, with human creativity leveraged to create the whole thing.
We should be careful not to conflate the effects of copyright with the effects of advertising.
It's going to take centuries to undo the damage wrought by IP-supported private enterprise. And now we also have to put up with fucking chatbots. This is the worst timeline.
You are free to copy bytes as you see fit, and the internet treats them identically whether they are random noise or whether a codec can turn them into music, film, books, or whatever inspires you.
The problem is that some humans, justifying their behavior by claiming it as "official", may act out with violence against you if they (rightly or wrongly, that's important to note) perceive that your actions are causing the internet to copy bytes to which they object.
Enduring nonviolence is likely yet ahead as consensus grows over the end of the legitimacy of these legacy states.
edit: i'm serious. many americans would be much happier taking this option if they knew it existed. i may take it myself