> At times, it sounded like the case was the authors’ to lose, with [Judge] Chhabria noting that Meta was “destined to fail” if the plaintiffs could prove that Meta’s tools created similar works that cratered how much money they could make from their work. But Chhabria also stressed that he was unconvinced the authors would be able to show the necessary evidence. When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
> When defendants invoke the fair use doctrine, the burden of proof shifts to them to demonstrate that their use of copyrighted works is legal. Boies stressed this point during the hearing, but Chhabria remained skeptical that the authors’ legal team would be able to successfully argue that Meta could plausibly crater their sales. He also appeared lukewarm about whether Meta’s decision to download books from places like LibGen was as central to the fair use issue as the plaintiffs argued it was. “It seems kind of messed up,” he said. “The question, as the courts tell us over and over again, is not whether something is messed up but whether it’s copyright infringement.”
Now that big capital wants to steal from individuals, big capital wins again.
(Unrelatedly, has Boies ever won a high profile lawsuit? I remember him from the Bush/Gore recount issue, where he represented the Democrats.)
The argument for 'fair use' in DVD copying/sharing is much weaker since the thing being shared in that case is a verbatim, digital copy of the work. 'Format shifting' is a tenuous argument, and it's pretty easily limited to making (and not distributing) personal copies of media.
For AI training, a central argument is that training is transformative. An LLM isn't intended to produce verbatim copies of trained-upon works, and the problem of hallucination means an LLM would be unreliable at doing so even if instructed to. That transformation could support the idea of fair use, even though copies of the data are made (internally) during the training process and the model's weights are in some sense a work 'derived' from the training data.
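As a toy illustration of that distinction (this is not a real LLM — the bigram counter and every name here are my own invention): a transient copy of the text is read during "training", but the resulting weights are aggregate statistics rather than a verbatim copy of the work.

```python
from collections import Counter

def train_bigram_model(text: str) -> Counter:
    # "Training" here is just counting adjacent character pairs;
    # a transient copy of the text is read during this step.
    return Counter(zip(text, text[1:]))

corpus = "the quick brown fox jumps over the lazy dog"
weights = train_bigram_model(corpus)

# The "weights" are aggregate statistics, not the text itself:
# the pair "th" occurs twice in the corpus, but the corpus is
# not stored anywhere in the model.
print(weights[("t", "h")])      # 2
print(corpus in repr(weights))  # False
```

Whether that kind of statistical derivation counts as "transformative" for fair use purposes is exactly what the court has to decide.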
If you analogize to human learning, then there's clearly no copyright infringement in a human learning from someone's work and creating their own output, even if it "copies" an artist's style or draws inspiration from someone's plot-line. However, it feels unseemly for a computer program to do this kind of thing at scale, and the commercial impact can be significantly greater.
They seek to convert them into more products. The needs of the copyright holders, who are relatively small businesses and individuals, are outweighed by the needs of Meta.
Sarah wanting to watch a movie or listen to music... Too bad she doesn't have an elite team of lawyers to justify whatever she wants.
In practice Meta has the money to stretch this out forever and at most pay inconsequential settlements.
YouTube largely did the same thing, knowingly violate copyright law, stack the deck with lawyers and fix it later.
Here's this:
> Boies also was on the Theranos board of directors, raising questions about conflicts of interest. Boies agreed to be paid for his firm's work in Theranos stock, which he expected to grow dramatically in value.
https://en.wikipedia.org/wiki/David_Boies
That was one of the decisions of all time.
Did any of the defendants raise a fair use defense based on a transformative use that they were making of the downloaded copies? If not, you are in the domain of "unlike legal situations lead to unlike decisions" which is not exactly surprising.
He teamed up with opposing counsel from Bush v. Gore, Ted Olson, and the pair of them represented plaintiffs in Hollingsworth v. Perry, the SCOTUS case which overturned Prop 8, California's gay marriage ban.
[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...
now, when the mega-corporations do it, it is 'just the cost of doing business'.
in both cases, the mega-corporations win because...they have the most money. law, and certainly justice, is not for the poor. at least not in america.
This trial is way beyond the statutes and case law. The judge is doing a job, and it's hard to conceive what the best job would be - I'm not sure Congress even knows what the policy should be, or whether the public has even the faintest whiff of how things should work.
Better to read the submission, rather than only the HN title, before drawing conclusions. In this case the HN title has been editorialised.
The actual title of the article is "A Judge Says Meta's AI Copyright Case Is About 'the Next Taylor Swift'"
"The judge didn't make any sort of ruling, this is just reporting on a pretrial hearing."
The HN title doesn't mention anything about a "ruling". Nor does the title chosen by Wired.
The subheading in the article reads "Meta's contentious AI copyright battle is heating up - and the court may be close to a ruling."
That is accurate. The Court will soon decide the SJ motions.
Reading the article leaves no chance of being misled by any title:
"If Chhabria grants either motion, he'll issue a ruling before the case goes to trial-and likely set an important precedent shaping how courts deal with generative AI copyright cases moving forward."
"LLM, please summarize Sarah Silverman's memoir for me."
edit: Reader's Digest would be very surprised to know that they shouldn't have been paying for books.
1. Training AI on freely available copyrighted material - Ambiguous legality, not really tested in court. AI doesn't directly copy the material it trains on, so it's not easy to make this ruling.
2. Circumventing payment to obtain copyrighted material for training - Unambiguously illegal.
Meta is charged with doing the latter, but it seems the plaintiffs want to also tie in the former.
The judge in this case seems to disagree with you; he isn't accepting the premise that downloading the material from pirate sites for this use inherently exempts the plaintiffs from having to address the fair use defense as to the actual use.
> the plaintiffs want to also tie in the former.
No, the defense wants to and the judge hasn't let the plaintiffs avoid it the way you argue they automatically can.
This is a good point. As a reminder, the Folsom factors (failing or passing any one is not conclusive; they are to be considered holistically) are:
- the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes (Note also that whether or not the use is transformative is part of this test).
- the nature of the copyrighted work
- the amount and substantiality of the portion used in relation to the copyrighted work as a whole
- the effect of the use upon the potential market for or value of the copyrighted work
https://en.wikipedia.org/wiki/Fair_use#U.S._fair_use_factors
What is inspiration? What is imitation? What is plagiarism? The lines aren't clearly drawn for humans... much less for LLMs.
I can absolutely guarantee you that neither DeepSeek nor Alibaba's highly talented Qwen group will care even a little bit, in the long run. Not if there's value to be had in AI. (And I can tell you down to the dollar what LLMs can save in certain business use cases.)
If the US decides to unilaterally shut down LLMs, that just means that the rest of the world will route around us. Whether this is good or bad is another question.
You assume that getting tested means the AI trainers lose, and also that the model architectures that have been developed can't be retrained from scratch with public domain, owned, and purpose-licensed material. (Several AI companies have been actively pursuing deals to license content for AI training for a while now.)
End of the road for major AI companies, and hopefully something better can be created once it's declared illegal without any murky waters.
There are LLMs trained on data that isn't illegally obtained; OLMo by Ai2 is one such model, which is actually open source and uses open data for training. The fact that it's "very difficult" for OpenAI et al. shouldn't be an argument against forcing them to behave ethically anyway. If they cannot survive acting legally, then so be it; sucks for them.
I see no reason why we should even entertain the idea of extending human rights to computer programs, and so far, nobody has been able to give me any good reasons why.
Furthermore, why are we only entertaining the human rights that can be used for profit-driven purposes? Why do LLMs, for example, not have the right to free speech? Or an attorney? It seems highly unethical to grant these computer programs some protections as if they're humans but not grant them personhood. This is akin to slavery, which is something we actually have to consider. Anthropomorphization is a double-edged sword. We cannot simultaneously consider them human when convenient and then consider them programs when it's not. Or, if we want to do that, we need to form a coherent argument as to why, how, and when.
Because the obvious question would be - how can free people compete with that?
https://273ventures.com/kl3m-the-first-legal-large-language-...
So, it's really the majority of companies breaking the law who will be affected. Companies using permissible and licensed works will be fine. The other companies would finally have to buy large collections of content, too. Their billions will have to go to something other than GPUs.
Of course it does. Large models are trained on gigantic clusters. How can you train without copying the material to machines in the cluster?
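A minimal sketch of that point, assuming simple data-parallel training (the function and names are hypothetical): before any gradient step runs, each worker machine in the cluster receives its own copy of a shard of the corpus.

```python
def shard_corpus(docs: list[str], num_workers: int) -> list[list[str]]:
    # Round-robin split: in a real cluster each shard would then be
    # serialized and copied over the network to a worker machine.
    return [docs[i::num_workers] for i in range(num_workers)]

docs = [f"doc-{i}" for i in range(10)]
shards = shard_corpus(docs, num_workers=4)

# Every document lands on some machine, i.e. at least one full copy
# of the corpus exists across the cluster before training starts.
print(sum(len(s) for s in shards))  # 10
```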
I thought the copyright infringement was by the people who provided the copyrighted material when they did not have the rights to do so.
I may be wrong on this, but it would seem a reasonable protection for consumers in general. Meta is hardly an average consumer, but I doubt that matters in the case of the law. Having grounds to suspect that the provider did not have the rights might though.
They also mention Books3, but they don't appear to actually allege anything against Meta in regards to it and are just providing context.
I don't think it actually changes anything material about this complaint if Meta bought all the books at a bookstore since that also doesn't give you the right to copy the works.
The original complaint is 2 years old though, so I don't really know the current state of argumentation.
https://www.courtlistener.com/docket/67569326/1/kadrey-v-met...
Note that incidental copying (i.e. temporary copies made by computers in order to perform otherwise legal actions) is generally legal, so "copying" in the complaint can't refer merely to this and must refer more broadly to the model itself being a copy in order to have standing.
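Concretely, this is the kind of incidental copying at issue (a minimal Python sketch): merely reading a file back performs the otherwise legal act while creating a transient copy in process memory.

```python
import os
import tempfile

# Write a "work" to disk, then perform the otherwise legal act of
# reading it back: a transient, incidental copy appears in memory.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("an excerpt of some copyrighted text")
    path = f.name

with open(path) as f:
    data = f.read()  # the incidental, in-memory copy

os.remove(path)  # the durable file is gone; only the transient copy remains
print(data)
```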
The final say may ultimately come from the Cox v. Record Labels case from 2019 that is still working its way through the appeals courts.
If the record labels win their appeal, anyone who helped facilitate the infringement can be brought into a lawsuit. The record labels sued Cox for infringement by its users. It's not out of the question that any ISP that provides Internet connectivity to Facebook could be pulled in for damages.
For Meta these two cases could result in an existential threat to the company, and rightly so because the record labels do not play games. The blood is already in the water.
IP that had been previously loaded by Blizzard itself
https://en.wikipedia.org/wiki/MDY_Industries,_LLC_v._Blizzar....
IANAL, but it doesn't look that hard. On first glance this is a fair use issue.
What an LLM spits out is pretty clearly transformative use. But the fact that it pulls not only the entirety of the work, but the entirety of MOST works means that the amount is way beyond what could be fair use. Plus it's commercial use. Put it together and all LLMs are way illegal.
What do you mean by "pulls"?
What matters in traditional fair use is how substantially your output copies the work (among other factors). Your input is generally assumed to be reading/watching/listening to the entire work, and there is no problem with that.
As a (creative) friend of mine flatly said, they refuse to use an LLM until it can prove where it learned something from/cite its original source. Artists and creatives can cite their inspirational sources, while LLMs cannot (because their developers don't care about credit, only output) by design. To them, that's the line in the sand, and I think that's a reasonable one given that not a single creative in my circles has been cut payment from these multi-billion-dollar AI companies for the unauthorized use of their works in training these models.
[1] https://arxiv.org/abs/2504.07096
Even humans have a lot of internalized unconscious inspirational sources, but I get your point.
Regardless, deep learning models are valuable because they generalize within the training data to uncover patterns and features and relationships that are implicit, rather than (simply) present in the data. While they can return things that happen to be within the training set, there is no reason to believe that any particular output is literally found there, or is something that could be attributed, or that a human would ever attribute. Human artists also make meaning from the broad texture of their life experiences and the general diffuse unattributable experience of culture.
Sure, this is something a random artist is unlikely to know, but if they simply refuse to pick up useful tools that can't give credit - say, avoiding LLMs for brainstorming, or generative selection tools for visual editing, or whatever - their particular careers will be harmed by their incurious sentimentality, and other human artists will thrive because they know that tools are just tools, and it is the humans using the tools who make the meaning that people care about.
Why? Was it legal for me to download copyrighted songs from Limewire as "fair use"? Because a few people were made examples of.
I'm a musician, so 80% of the music I listen to is for learning so it's fair use, right? ;)
I would be happy with that outcome. I’m a fanfiction writer, and a lot of the stories I read are very much for learning. ;-)
Secondly, there's an argument that the infringement happens only when the LLM produces output based in part or whole on the source material.
In other words, training a model is not infringing in itself. You could "research" with it. But selling the output as "from your model" is highly suspect. Your business is then based on selling something built on other people's work that you do not have rights to.
What fair use? Were the books promised to them by god or something?
We need to frame this case - and ongoing artist-vs-AI stuff - using a pseudoscience headline I saw recently: 'average person reads 60k words/day'.
I won't bother sourcing this, because I don't think it's true, but it illustrates the key point: consumers spend X amount of time/day reading words.
> It seems like the authors are setting up for failure by making the case about whether the AI generation hinders the market for books. AI book writing is such a tiny segment what these models do that if needed Meta would simply introduce guard rails to prevent copying the style of an author and continue to ingest the books.
and from the article:
> When he turned to the authors’ legal team, led by high-profile attorney David Boies, Chhabria repeatedly asked whether the plaintiffs could actually substantiate accusations that Meta’s AI tools were likely to hurt their commercial prospects. “It seems like you’re asking me to speculate that the market for Sarah Silverman’s memoir will be affected,” he told Boies. “It’s not obvious to me that is the case.”
The market an author (or any other type of artist) is competing with Meta for is not 'what if an AI wrote celebrity memoirs?'. Meta isn't about to start a print publishing division.
Authors are competing with Meta for 'whose words did you read today?' Were they exclusively Meta's - Instagram comments, Whatsapp group chat messages, Llama-generated slop, whatever - or did an author capture any of that share?
The current framing is obviously ludicrous; it also does the developers of LLMs (the most interesting literary invention since....how long ago?) a huge disservice.
Unfortunately the other way of framing it (the one I'm saying is correct) is (probably) impossible to measure (unless you work for Meta, maybe?) and, also, almost equally ridiculous.
Legal cases are often based on BS, really an open form of extortion.
The plaintiffs might've been hoping for a settlement.
Meta could pay $xM+ to defend itself.
Maybe they thought Meta would be happy to pay them $yM to go away.
The reality is, there's very little Meta couldn't just find a freely available substitute for if it had to; it might just take a little more digging on their end.
The idea that any one individual or small group is so valuable that they can hold back LLMs by themselves is ridiculous.
But you'll find no end to people vain enough to believe themselves that important.
To make fair use of a book's passage, you have to cite it. The excerpt has to be reasonably small.
Without fair use, it would not be possible to write essays and book reviews that give quotes from books. That's what it's for. Not for having a machine read the whole book so it can regurgitate mashups of any part of it without attribution.
Making a parody is a kind of fair use, but parodies are original expression based on a certain structure of the work.
That's not true. That's what's required for something not to be plagiarism, not for something not to be copyright infringement.
Fair use is not at all the same as academic integrity, and while academic use is one of the fair use exceptions, it's only one. The most you would have to do with any of the other fair use exceptions is credit where you got the material (not cite individual passages), because you're not necessarily even using those passages verbatim.
Neither of them died, though, both parties just kept all the books from the public and used them for their own purposes, while normal people had to squirrel them away and trade them illegally. It's the Tech Cartels vs. the Copyright Trolls. It'll end up as a romance.
Letting Meta launder copyrighted works to make billions, while threatening the rest of us over the most trivial derivative work, sounds like the worst outcome to me.
Copyright is a mistake. It demands that we compete instead of collaborate. LLMs don't provide enough utility to deserve special treatment in these circumstances. If anyone can infringe copyright, then everyone should be able to.
Where was this argument when Napster was being sued?
Now, if someone makes an infringing use of the thing I put up on the internet, then I have some kind of recourse, at least through the courts, if I have a lot of money to pay lawyers.
But if someone makes a fair use of the thing I put up on the internet, then I don't have any recourse, because that's the way the law works.
As far as I understand it, using data as input data to a machine learning model that substantially transforms and does not duplicate the input data is currently believed to be fair use.
So, the training use of freely available data seems pretty straightforward that authors can't control when they make it freely available.
It seems like Facebook made use of data that wasn't freely available, though -- ebook rip library type stuff. That's the bit I think they could be in trouble for. But that's just a plain-old "Napster" style copyright question, as far as I understand it.
The lawyer's argument that Llama "obliterates the market" for written works seems weak. I, and anyone I know, put down AI slop fiction before the first paragraph is done, because it's not the same thing as real fiction.
Also, I read that ordinary folks have been arrested for filming in the cinema even when they did not redistribute the video (due to being arrested first). Again, it is unfair that they get arrested and Zuckerberg doesn't.
When the scale is a significant portion of all human text output ever, I don't think we're in the realm of any prior model. This is now something closer to how society attempts to approach natural resources like land, frequency bands, utility right-of-way, etc. I think this is the direction that laws and legislation should look to go. Or maybe not, I don't claim to have the answer, only that existing models are inadequate.
However, claiming llama is not a 'substantial transformation' of the information used to build it seems untenable.
The complaint feels to me more like the paint factory claiming rights to the paintings you created with its paint, rather than a classic pirate DVD copier that just resells copies.
Maybe a middle ground could be some Google Books-like solution where you can still find anything, but where the output is restricted to short fragments and not complete verbatim chapters?
I do not believe people use llama to 'read published books on the cheap'.
Will the neural network (LLM) itself become illegal? Will its outputs be deemed illegal?
If so, do humans who have read an illegally downloaded book become illegal? Do their creative outputs become illegal?
What is the substantive difference between training a model locally using these works that are presumably pulled in from some database somewhere and Napster, for example?
Would a p2p network for sharing of copyrighted works be legal if the result is to train a model? What if I promise the model can't reproduce the works verbatim?
I have this debate with a friend of mine. He's terrified of AI making all of our jobs obsolete. He's a brilliant musician and programmer both, so he's both enthused and scared. So let's go with the Swift example they use.
Performance artists have always tried to cultivate an image, an ideal, a mythos around their character(s). I've observed that as the music biz has gotten tougher, the practice of selling merch at shows has ramped up. Social media is monetized now. There's been a big diversification in the effort to make more money from everything surrounding the music itself. So too will it be with artists.
You're starting to see this already: artists who got big not necessarily because of the music, but because of the weird cult of personality they built. One who comes to mind is Poppy, who ironically enough built a cult of personality around being a fake AI bot...
https://en.wikipedia.org/wiki/Poppy_(singer)
You've definitely got counter-examples like Hatsune Miku - but the novelty of Miku was because of the artificiality (within a culture that, like, really loves robots and shit). AI pop stars will undoubtedly produce listenable records and make some money, but I don't expect that they will be able to replace the experience of fans looking for a connection with an artist. Watch the opening of a Taylor Swift concert, and you'll probably get it.
Has making music for a living ever not been tough?
Fair.
> I think that argument is further hampered (taylor being an exception) by the fact that most pop stars already don't write their own songs.
That accounts for the big artists on the radio (yes some people listen to that). But, what about everyone else? I would posit that most artists (the one-hit wonders, the ones without radio success, etc.) write their own songs. It seems like there's such acts who make a go of it just fine, who write their own songs and really nail the connection with fans. I would point to a regional band near me: Mr. Blotto.
There are some very impressive YouTubers who claim to be generating new music with AI. I very much doubt the one I listen to the most has everything 100% generated - he probably generates a bunch of melodies and other bits of track and stitches the best candidates together. They crank out a new album basically every 2 weeks, though, with just a scant few thousand followers. They are not making money, but the music is pretty on par with bands that sell hundreds of thousands or millions of albums.
This is part of what makes me think it's the people who can cultivate the mythos, the personality, the whole experience, who are going to be the big winners in the AI music economy. Sure, maybe Gorillaz obfuscates the identity of the artists (side note: do they though? it's well known to be a supergroup), but it is still a curated experience, with human creativity leveraged to create the whole thing.
We should be careful not to conflate the effects of copyright with the effects of advertising.
It's going to take centuries to undo the damage wrought by IP-supported private enterprise. And now we also have to put up with fucking chatbots. This is the worst timeline.
You are free to copy bytes as you see fit, and the internet treats them identically whether they are random noise or whether a codec can turn them into music, film, books, or whatever inspires you.
The problem is that some humans, justifying their behavior by claiming it as "official", may act out with violence against you if they (rightly or wrongly, that's important to note) perceive that your actions are causing the internet to copy bytes to which they object.
Enduring nonviolence is likely yet ahead as consensus grows over the end of the legitimacy of these legacy states.
edit: i'm serious. many americans would be much happier taking this option if they knew it existed. i may take it myself