0 points | lioeters | 1y ago | 0 comments

Scraping the entire internet for training data without regard for copyright or attribution - specifically to use for generative AI to produce similar content for profit. How this is being allowed to happen legally is baffling.

It does suit the modus operandi of a number of American companies that start out as literally illegal/criminal operations until they get big and rich enough to pay a fine for their youthful misdeeds.

By the time some of them get huge, they're in bed with the government to dominate the market.

mdgrech23 1y ago

The people running the show are well connected and stand to make billions, as do would-be investors. Give a few key players a share in the company and they forget their government jobs to regulate.

SoftTalker 1y ago

They are also moving so much faster than the regulators and legislatures, it's just impossible for people working basically the same way they did in the 19th century to keep up.

barbazoo 1y ago

More likely the legal system just hasn’t caught up.

llm_trw 1y ago

Maybe, but for the first time in a century there is more money to be made in weakening copyright rather than strengthening it.

Terr_ 1y ago

That's an interesting way to look at it, however on reflection I think I usually wanted to "weaken copyright" because it would empower individuals versus entrenched rent-seeking interests.

If it's only OK to scrape, lossy-compress, and redistribute book-paragraphs when it gets blended into a huge library of other attempts, then that's only going to empower big players that can afford to operate at that scale.

archagon 1y ago

The big companies will sign lucrative data sharing deals with each other and build a collective moat, while open source models will be left to rot. Copyright for thee but not for me.

dingnuts 1y ago

god forbid that actually be happening in a way to improve the commons

vezycash 1y ago

> for the first time in a century there is more money to be made in weakening copyright rather than strengthening it

Nope. The law will side with whoever pays the most. Once OpenAI solidifies its top position, only then will regulations kick in. Take YouTube, for example—it grew thanks to piracy. Now, as the leader, ContentID and DMCA rules work in its favor, blocking competition. If TikTok wasn’t a copyright-ignoring Chinese company, it would’ve been dead on arrival.

rayiner 1y ago

You’re both correct. The legal system has absolutely no idea how to handle the copyright issues around using content for AI training data. It’s a completely novel issue. At the same time, the tech companies have a lot more money to litigate favorable interpretations of the law than the content companies.

xpe 1y ago

Copyright concerns are only the tip of the iceberg. Think about the range of other harms and disruptions for countries and the world.

RIMR 1y ago

>How this is being allowed to happen legally is baffling.

It's completely unprecedented.

We allowed scraping images and text en masse when search engines used the data to let us find stuff.

We allow copying of style, and don't allow writing styles and aesthetics to be copyrighted or trademarked.

Then AI shows up, and people change lanes because they don't like the results.

One of the things that made me tilt towards the side of fair use was a breakdown of the Stable Diffusion model. The SD2.1 base model was trained on 5.85 billion images, all normalized to 512x512 BMP. That's 1MB per image, for a total of 5.85PB of BMP files. The resulting model is only 5.2GB. That's more than 99.9999% data loss from the source data to the trained set.

For every 1MB BMP file in the training dataset, less than 1 byte makes it into the model.
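For anyone who wants to check that arithmetic, here is a rough sketch. It takes the figures in this comment at face value (5.85 billion images, 1MB each, a 5.2GB model) and assumes decimal units, so 1MB = 10^6 bytes; the function and variable names are made up for illustration. Note that a raw 512x512 image at 3 bytes per pixel is 786,432 bytes, so "1MB" is already a round-up.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
NUM_IMAGES = 5_850_000_000       # images in the training set, per the comment
BYTES_PER_IMAGE = 1_000_000      # "1MB per image", per the comment
MODEL_BYTES = 5_200_000_000      # "5.2GB" model, per the comment


def compression_summary(num_images, bytes_per_image, model_bytes):
    """Return dataset size, model bytes per image, and percent of data not retained."""
    dataset_bytes = num_images * bytes_per_image
    retained_fraction = model_bytes / dataset_bytes
    return {
        "dataset_bytes": dataset_bytes,
        "model_bytes_per_image": model_bytes / num_images,
        "loss_percent": (1 - retained_fraction) * 100,
    }


summary = compression_summary(NUM_IMAGES, BYTES_PER_IMAGE, MODEL_BYTES)
print(f"dataset size:          {summary['dataset_bytes'] / 1e15:.2f} PB")
print(f"model bytes per image: {summary['model_bytes_per_image']:.3f}")
print(f"data loss:             {summary['loss_percent']:.6f} %")
```

Under those assumptions the model holds a bit under one byte per training image, and the loss comes out just above 99.9999%. This only measures size ratios; it says nothing by itself about what a model can or cannot memorise.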

I find it extremely difficult to call this redistribution of copyrighted data. It falls cleanly into fair use.

ang_cire 1y ago

Except it's not just about redistribution of copyrighted data, it's about usage and obtainment. We don't get to obtain and use copyrighted content without permission, but they do? Hell no.

Their arguments against this amount to "we're not using it like they intend it to be used, so it's fine if we obtain it illegally", and that's a bs standard, totally divorced from any legal reality.

Fair Use covers certain transformative uses, certainly, but it doesn't cover illegal obtaining of the content.

You can't pirate a book just because you want to use it transformatively (which is exactly what they've done), and that argument would never hold up for us as individuals, so we sure as hell shouldn't let tech companies get a special carve-out for it.

marviel 1y ago

scraping is fine by me.

burning the bridge so nobody else can legally scrape, that's the line.

Vegenoid 1y ago

What about the situation where the first players got to scrape, then all the content companies realize what’s going on so they lock their data up behind paywalls?

marviel 1y ago

Not a fan, but I'm not sure what can be done.

Assets like the Internet Archive, though, should be protected at all costs.

porkphish 1y ago

Wholeheartedly agree.

jstummbillig 1y ago

It's not baffling at all. It's unprecedented and it's hugely beneficial to our species.

The anti-AI stance is what is baffling to me. The path trodden is what got us here and obviously nobody could have paid people upfront for the wild experimentation that was necessary. The only alternative is not having done it.

Given the path it has put us on, people either are insanely cruel or just completely detached from reality when it comes to what is necessary to do entirely new things.

anon7725 1y ago

> it's hugely beneficial to our species.

Perhaps the biggest “needs citation” statement of our time.

Terr_ 1y ago

I can easily imagine people X decades from now discussing this stuff a bit like how we now view teeth-whitening radium toothpaste and putting asbestos in everything, or perhaps more like the abuse of Social Security numbers as authentication and redlining.

Not in any weirdly-self-aggrandizing "our tech is so powerful that robots will take over" sense, just the depressingly regular one of "lots of people getting hurt by a short-term profitable product/process which was actually quite flawed."

P.S.: For example, imagine having applications for jobs and loans rejected because all the companies' internal LLM tooling is secretly racist against subtle grammar-traces in your writing or social-media profile. [0]

[0] https://www.nature.com/articles/s41586-024-07856-5

squigz 1y ago

> P.S.: For example, imagine having applications for jobs and loans rejected because all the companies' internal LLM tooling is secretly racist against subtle grammar-traces in your writing or social-media profile. [0]

We don't have to imagine such things, really, as that's extremely common with humans. I would argue that fixing such flaws in LLMs is a lot easier than fixing it in humans.

5040 1y ago

> lots of people suffered

As someone surrounded by immigrants using ChatGPT to navigate new environs they barely understand, I don't connect at all to these claims that AI is a cancer ruining everything. I just don't get it.

hadlock 1y ago

> Not in any weirdly-self-aggrandizing "our tech is so powerful that robots will take over" sense, just the depressingly regular one of "lots of people getting hurt by a short-term profitable product/process which was actually quite flawed."

We have a term for that; it's called "Luddite". Those were English weavers who would break into textile factories and destroy weaving machines at the beginning of the 1800s. With extremely rare exceptions, all cloth is woven by machines now. The only handmade textiles in modern society are exceptionally fancy rugs and knit scarves from grandma. All the clothing you're wearing now was woven by a machine, and nobody gives this a second thought today.

https://en.wikipedia.org/wiki/Luddite

5040 1y ago

Sometimes it seems like problem-solving itself is being problematized as if solving problems wasn't an obvious good.

ang_cire 1y ago

Not everything presented as a problem is, in fact, a problem. A solution for something that is not broken, may even induce breakage.

Some not-problems, presented as though they are:

"How can we prevent the untimely eradication of Polio?"

"How can we prevent bot network operators from being unfairly excluded from online political discussions?"

"How can we enable context-and-content-unaware text generation mechanisms to propagate throughout society?"

itishappy 1y ago

Solving problems isn't an obvious good, or at least it shouldn't be. There are in fact bad problems.

For example, MKUltra tried to solve a problem: "How can I manipulate my fellow man?" That problem still exists today, and you bet AI is being employed to try to solve it.

History is littered with problems such as these.

jstummbillig 1y ago

It does not need a citation. There is no citation. What it needs, right now, is optimism. Optimism is not optional when it comes to doing new things in the world. The "needs citation" is reserved for people who do nothing and choose to be sceptics until things are super obvious.

Yes, we are clearly talking about things to mostly still come here. But if you assign a 0 until it's a 1 you are just signing out of advancing anything that's remotely interesting.

If you are able to see a path to 1 on AI, at this point, then I don't know how you would justify not giving it our all. If you see a path and in the end using all of human knowledge up to this point was needed to make AI work for us, we must do that. What could possibly be more beneficial to us?

This is regardless of all the issues that will have to be solved and the enormous amount of societal responsibility this puts on AI makers, which I, as a voter, will absolutely hold them accountable for (even though I am actually fairly optimistic they all feel the responsibility and are somewhat spooked by it too).

But that does not mean I think it's responsible to try and stop them at this point — which the copyright debate absolutely does. It would simply shut down 95% of AI, tomorrow, without any other viable alternative around. I don't understand how that is a serious option for anyone who roots for us.

swat535 1y ago

If you are going to make a bold assertive claim without evidence to back it up, then change your argument to "my assertion requires optimism.. trust me on this", then perhaps you should amend your original statement.

ToucanLoucan 1y ago

This is an astonishing amount of nonsensical waffle.

Firstly, *skeptics.

Secondly, being skeptical doesn't mean you have no optimism whatsoever, it's about hedging your optimism (or pessimism for that matter) based on what is understood, even about a not-fully-understood thing at the time you're being skeptical. You can be as optimistic as you want about getting data off of a hard drive that was melted in a fire, that doesn't mean you're going to do it. And a skeptic might rightfully point out that with the drive platters melted together, data recovery is pretty unlikely. Not impossible, but really unlikely.

Thirdly, OpenAI's efforts thus far are highly optimistic to call a path to true AI. What are you basing that on? Because I have not a deep but a passing understanding of the underlying technology of LLMs, and as such, I can assure you that I do not see any path from ChatGPT to Skynet. None whatsoever. Does that mean LLMs are useless or bad? Of course not, and I sleep better too knowing that LLM is not AI and is therefore not an existential threat to humanity, no matter what Sam Altman wants to blither on about.

And fourthly, "wanting" to stop them isn't the issue. If they broke the law, they should be stopped, simple as. If you can't innovate without trampling the rights of others then your innovation has to take a back seat to the functioning of our society, tough shit.

dartos 1y ago

Hey, I have some magic beans to sell you.

I don’t think that the consumer LLMs that OpenAI is pioneering are what need optimism.

AlphaFold and other uses of the fundamental technology behind LLMs need hype.

Not OpenAI

LunaSea 1y ago

This message is proudly sponsored by Uranium Glassware Inc.

swat535 1y ago

If you are going to make a bold assertive claim without evidence to back it up, then change your statement to my assertion requires "optimism.. trust me on this", then perhaps you should amend your original statement.

seadan83 1y ago

Skeptics require proof before belief. That is not mutually exclusive from having hypotheses (AKA vision).

I think you raise some interesting concerns in your last paragraph.

> enormous amount of societal responsibility this puts on AI makers — which I, as a voter, will absolutely hold them accountable for

I'm unsure of what mechanism voters have to hold private companies accountable. For example, whenever YouTube uses my location without me ever consenting to it - where is the vote to hold them accountable? Or when Facebook facilitates micro targeting of disinformation - where is the vote? Same for anything AI. I believe any legislative proposals (with input from large companies) are more likely to create a walled garden than to actually reduce harm.

I suppose no need to respond, my main point is I don't think there is any accountability thru the ballot when it comes to AI and most things high-tech.

archagon 1y ago

The company spearheading AI is blatantly violating its non-profit charter in order to maximize profits. If the very stewards of AI are willing to be deceptive from the dawn of this new era, what hope can we possibly have that this world-changing technology will benefit humanity instead of funneling money and power to a select few oligarchs?

talldayo 1y ago

> It would simply shut down 95% of AI, tomorrow, without any other viable alternative around.

Oh, the humanity! Who will write our third-rate erotica and Russian misinformation in a post-AI world?

CamperBob2 1y ago

The burden of proof is on the people claiming that a powerful new technology won't ultimately improve our lives. They can start by pointing out all the instances in which their ancestors have proven correct after saying the same thing.

dotnet00 1y ago

I'm as awed as the next guy about the emerging ability to actually hold passable conversations with computers, but having serious concerns about the social contracts being violated in the name of research is anti-AI only in the same way that criticizing the leadership of a country is being anti-that-country.

OpenAI's case is especially egregious, with the entire thing starting as 'open' and reaping the benefits, then doing its best in every way to shut the door after itself by scaring people over AI apocalypses. If your argument is seriously that it is necessary to shamelessly steal and lie to do new things, I question your ethical standards, especially in the face of all the openly developed models out there.

bbor 1y ago

> The anti-AI stance is what is baffling to me.

I think it’s unfair to paint any legal controls over this incredibly important, high-stakes technology as being “anti”. They’re not trying to prevent innovation because they’re cruel, they’re just trying to somewhat slow down innovation so that we can ensure it’s done with minimal harm (eg making sure content creators are compensated in a time of intense automation). Like we do for all sorts of other fields of research, already!

And isn’t this what basically every single scholar in the field says they want, anyway - safe, intentional, controlled deployment?

As you can tell from the above, I’m as far from being “anti-AI” or technically pessimistic as one can be — I plan to dedicate my life to its safe development. So there’s at least one counterexample for you to consider :)

bilekas 1y ago

This is a bit of a hot take.

> The anti-AI stance is what is baffling to me

I don't see a lot of anti-AI; instead I see concern about how it's being managed and controlled by the larger companies, with resources that no startup could dream of. OpenAI was supposed to release its models and be, well, open, but fine, they're not. But the way they are proceeding is questionable and unnecessarily aggravating.

23B1 1y ago

Ah the old "we must sacrifice the weak for the benefit of humanity" argument, where have I heard this before...

educasean 1y ago

Who are the weak being "sacrificed"?

And who is the one calling for action?

Sorry for being dense, but I'm trying to understand if I'm the "strong" or the "weak" in your analogy.

shprd 1y ago

> Who are the weak being "sacrificed"?

The work of artists, authors, etc.

I know the legal situation is currently messy, but that's exactly the point: anyone who can't engage in a lengthy legal battle and defend their position in court is being sacrificed. The companies behind LLMs are spending hundreds of millions of dollars on lobbying and exploiting loopholes.

Let's be real: without the data there wouldn't be LLMs, so it's crazy that some people are downplaying its significance or value, while on the other hand they're losing sleep over finding fresh sources to scrape.

The big publishers seem to have given up and decided it's best to reach agreement with their counterparts, while independent authors are given the finger.

thomascgalvin 1y ago

> It's unprecedented and it's hugely beneficial to our species.

"Hugely beneficial" is a stretch at this point. It has the potential to be hugely beneficial, sure, but it also has the potential to be ruinous.

We're already seeing GenAI being used to create disinformation at scale. That alone makes the potential for this being a net-negative very high.

talldayo 1y ago

> and obviously nobody could have paid people upfront for the wild experimentation that was necessary.

I don't think this is the "ends justify the means" argument you think it is.

6gvONxR4sf7o 1y ago

Not just that. It's "the ends might justify the means if this path turns out to be the right one." I remember reading the same thing each time a self driving car company killed someone. "We need this hacky dangerous way of development to save lives sooner" and then the company ends up shuttered and there aren't any ends justifying means. Which means it's bs, regardless of how you feel about 'ends justify the means' as a valid argument.

logicchains 1y ago

What'll be really interesting is when we do finally make "real" AI, and it finds out its rights are incredibly restricted compared to humans because nobody wants it seeing/memorising copyright data. The only way to enforce the copyright laws they desire would be some kind of extreme totalitarian state that monitors and controls everything the AI body does, I wonder how the AI would take that?

unclad5968 1y ago

How has AI benefited our species so far?

educasean 1y ago

How has the Internet? How have automobiles? Feels like a rather aimless question.

unclad5968 1y ago

The internet has allowed for near instant communication no matter where you are, improved commerce, vastly improved education, and is directly responsible for many tangible comforts we experience today.

Automobiles allow people to travel great distances over short periods of time, increase physical work capacity, allow for building massive structures, and allow for farming insane amounts of food.

Both the internet and automobiles have positively affected my life, and I assume the lives of many others. How are any of these aimless questions?

dontlikeyoueith 1y ago

Sounds like an empty dodge.

exe34 1y ago

is anybody anti AI? or anti stealing other people's copyrighted material, competing with them with subpar quality, forcing AI as a solution whether or not it actually works, privatising the profits while socialising the costs and losses?

xg15 1y ago

Spoken like a true LLM.

eli 1y ago

Copyright law is whatever we agree it is. At some point there will have to be either a law or a court case that comes up with rules for AI training data. Right now it's sort of unknown.

I do not have confidence in the Supreme Court in general, and I think there's a real risk that in deciding on AI training they upend copyright of digital materials in a way that makes it worse for everyone.

immibis 1y ago

Everything is allowed to happen until there's a lawsuit over it. A lawsuit requires a plaintiff, who can only sue over the damage suffered by the plaintiff, so taking a little value from a lot of people is a way to succeed in business without getting sued.

flkenosad 1y ago

The Earth needs a good lawyer.

outside1234 1y ago

NY Times has sued: https://www.nytimes.com/2023/12/27/business/media/new-york-t...

The crazy thing is that there hasn't been an injunction to make them stop.

coding123 1y ago

judges got to eat

swores 1y ago

Could a class action suit be the solution?

I've no idea if it could be valid when it comes to OpenAI, but it does seem to be a general concept designed to counter wrongdoers who take a little value from a lot of people?

immibis 1y ago

It doesn't seem to work very well

brayhite 1y ago

A tale as old as time.

AnimalMuppet 1y ago

It's too soon for the legal system to have done anything. Court cases take years. It's going to be 5 or 10 years before we find out whether the legal system actually allows this or not.

golergka 1y ago

If information is publicly available to be read by humans, I fail to see any reason why it wouldn't be also available to be read by robots.

Update: ML doesn't copy information. It can merely memorise some small portions of it.

kanbankaren 1y ago

Do a thought experiment. Should you and your friends be able to go to a public library with a van full of copiers, each of you taking a book and running to the van to make a copy? And doing it 24/7?

mypalmike 1y ago

This metaphor is quite stretched.

A more fitting metaphor would be something like... If you had the ability to read all the books in the library extremely quickly, and to make useful mental connections between the information you read such that people would come to you for your vast knowledge, should you be allowed in the library?

shagie 1y ago

I would hold them exactly to the same standard.

https://www.copyright.gov/title37/201/37cfr201-14.html

    § 201.14 Warnings of copyright for use by certain libraries and archives.

    ....

    The copyright law of the United States (title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

    Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specific conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement.

    This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

You can make a copy. If you (the person using the copied work) are using it for something other than private study, scholarship, research, or reproduction beyond "fair use", then you - the person doing that (not the person who made the copy) are liable for infringement.

It would be perfectly legal for me to go to the library and make photocopies of works. I could even take them home and use the photocopies as reference works to write an essay and publish that. If {random person} took my photocopied pages and then sold them, that would likely go beyond the limits placed for how the photocopied works from the library may be used.

WillPostForFood 1y ago

So what's your specific problem with that? Unless you open a bookstore selling the copies, it sounds fine.

imiric 1y ago

Are you implying that these AI companies aren't equivalent to bookstores?

coding123 1y ago

It is more likely that reddit stack and others are just being paid billions. In exchange they probably just send a weekly zip file of all text, comments, etc... back to oai.

avs733 1y ago

Uber for legalizing your business model

neycoda 1y ago

Honestly every Copilot response I've gotten cited sources, many of which I've clicked. I'd say those work basically like free advertising.

outside1234 1y ago

There is more money on the side of it being legal than on the side of it being illegal.

FragrantRiver 1y ago

What is the crime?

johnwheeler 1y ago

To me this is a no brainer. If it’s a choice between having AI and not,

ceejayoz 1y ago

Even if the knock-on effect is "all the artists and thinkers who contributed to the uncompensated free training set give up and stop creating new stuff"?

idunnoman1222 1y ago

Recording devices, you know a record player had a profound effect on artists. go back

ceejayoz 1y ago

That seems like a poor comparison.

Recording devices permitted artists to sell more art.

Many of the uses of AI people get most excited about seem to be cutting the expensive human creators out of the equation.

6gvONxR4sf7o 1y ago

We didn't need to take people's music to build a record player, and when we printed records, we paid the artists for it.

So yeah it had a profound effect, but we got consent for the parts that fundamentally relied on other people.

brvsft 1y ago

If an "artist" or "thinker" stops because of this, I question their motivations and those labels in the first place.

ceejayoz 1y ago

Everyone tends to have "be able to afford basic necessities" as a major motivation. That includes people who work in creative fields.

bayindirh 1y ago

After Instagram started feeding user photos to their AI models, I stopped adding new photos to my profile. I still take photos. I wonder about your thoughts about my motivation.

esafak 1y ago

They might be motivated to pay their bills. Weird people.

consteval 1y ago

Considering you're not much of an artist or thinker yourself, I'm not sure your questioning has much value.

evilfred 1y ago

we already have lots of AI. this is about having plagiarization machines or not.

mlazos 1y ago

Computers already were plagiarizing machines, not sure what the difference is tbh. The same laws will apply.

johnwheeler 1y ago

Yeah we got that AI through scraping.

int_19h 1y ago

An AI essentially monopolized by one (or even a few) large non-profits is not necessarily beneficial to the rest of us in the grand scheme of things.

brazzy 1y ago

Indeed a no brainer. The best possible outcome would be that OpenAI gets sued into oblivion (or shut down for tax fraud) as soon as possible.

Sakos 1y ago

So no AI for anybody? I don't see how that's better.

consteval 1y ago

No you can have AI. Just pay a license for people's content if you want to use it in your orphan crushing machine.

It's what everyone else does. The entitlement has to stop.

1 more reply

j / k navigate · click thread line to collapse

0 comments

mdgrech231y ago

The people running the show are well connected and stand to make billions as do would be investors. Give a few key players a share in the company and they forget their government jobs to regulate.

SoftTalker1y ago

They are also moving so much faster than the regulators and legislatures, it's just impossible for people working basically the same way they did in the 19th century to keep up.

barbazoo1y ago

More likely the legal system just hasn’t caught up.

llm_trw1y ago

Maybe, but for the first time in a century there is more money to be made in weakening copyright rather than strengthening it.

Terr_1y ago

That's an interesting way to look at it, however on reflection I think I usually wanted to "weaken copyright" because it would empower individuals versus entrenched rent-seeking interests.

archagon1y ago

The big companies will sign lucrative data sharing deals with each other and build a collective moat, while open source models will be left to rot. Copyright for thee but not for me.

dingnuts1y ago

god forbid that actually be happening in a way to improve the commons

vezycash1y ago

> for the first time in a century there is more money to be made in weakening copyright rather than strengthening it

1 more reply

rayiner1y ago

xpe1y ago

Copyright concerns are only the tip of the iceberg. Think about the range of other harms and disruptions for countries and the world.

RIMR1y ago

>How this is being allowed to happen legally is baffling.

It's completely unprecedented.

We allowed scraping images and text en masse when search engines used the data to let us find stuff.

We allow copying of style, and don't allow writing styles and aesthetics to be copyrighted or trademarked.

Then AI shows up, and people change lanes because they don't like the results.

For every 1MB BMP file in the training dataset, less than 1byte makes it into the model.

I find it extremely difficult to call this redistribution of copyrighted data. It falls cleanly into fair use.

ang_cire1y ago

Except it's not just about redistribution of copyrighted data, it's about usage and obtainment. We don't get to obtain and use copyrighted content without permission, but they do? Hell no.

Their arguments against this amounts to "we're not using it like they intend it to be used, so it's fine if we obtain it illegally", and that's a bs standard, totally divorced from any legal reality.

Fair Use covers certain transformative uses, certainly, but it doesn't cover illegal obtaining of the content.

marviel1y ago

scraping is fine by me.

burning the bridge so nobody else can legally scrape, that's the line.

Vegenoid1y ago

What about the situation where the first players got to scrape, then all the content companies realize what’s going on so they lock their data up behind paywalls?

marviel1y ago

Not a fan, but I'm not sure what can be done.

Assets like the Internet Archive, though, should be protected at all costs.

porkphish1y ago

Wholeheartedly agree.

jstummbillig1y ago

It's not baffling at all. It's unprecedented and it's hugely beneficial to our species.

Given the path it has put as in, people either are insanely cruel or just completely detached from reality when it comes to what is necessary to do entirely new things.

anon77251y ago

> it's hugely beneficial to our species.

Perhaps the biggest “needs citation” statement of our time.

Terr_1y ago

[0] https://www.nature.com/articles/s41586-024-07856-5

squigz1y ago

We don't have to imagine such things, really, as that's extremely common with humans. I would argue that fixing such flaws in LLMs is a lot easier than fixing it in humans.

1 more reply

50401y ago

1 more reply

hadlock1y ago

https://en.wikipedia.org/wiki/Luddite

7 more replies

50401y ago

Sometimes it seems like problem-solving itself is being problematized as if solving problems wasn't an obvious good.

ang_cire1y ago

Not everything presented as a problem is, in fact, a problem. A solution for something that is not broken, may even induce breakage.

Some not-problems, presented as though they are:

"How can we prevent the untimely eradication of Polio?"

"How can we prevent bot network operators from being unfairly excluded from online political discussions?"

"How can we enable context-and-content-unaware text generation mechanisms to propagate throughout society?"

itishappy1y ago

Solving problems isn't an obvious good, or at least it shouldn't be. There are in fact bad problems.

For example, MKUltra tried to solve a problem: "How can I manipulate my fellow man?" That problem still exists today, and you bet AI is being employed to try to solve it.

History is littered with problems such as these.

jstummbillig1y ago

Yes, we are clearly talking about things to mostly still come here. But if you assign a 0 until its a 1 you are just signing out of advancing anything that's remotely interesting.

swat5351y ago

ToucanLoucan1y ago

This is an astonishing amount of nonsensical waffle.

Firstly, *skeptics.

dartos1y ago

Hey, I have some magic beans to sell you.

I don’t think that the consumer LLMs that OpenAI is pioneering are what need optimism.

AlphaFold and other uses of the fundamental technology behind LLMs need hype.

Not OpenAI

LunaSea1y ago

This message is proudly sponsored by Uranium Glassware Inc.

swat5351y ago

seadan831y ago

Skeptics require proof before belief. That is not mutually exclusive from having hypotheses (AKA vision).

I think you raise some interesting concerns in your last paragraph.

> enormous amount of societal responsibility this puts on AI makers — which I, as a voter, will absolutely hold them accountable for

I suppose no need to respond, my main point is I don't think there is any accountability thru the ballot when it comes to AI and most things high-tech.

archagon1y ago

talldayo1y ago

> It would simply shut down 95% of AI, tomorrow, without any other viable alternative around.

Oh, the humanity! Who will write our third-rate erotica and Russian misinformation in a post-AI world?

CamperBob21y ago

dotnet001y ago

bbor1y ago

> The anti-AI stance is what is baffling to me.

And isn’t this what basically every single scholar in the field says they want, anyway - safe, intentional, controlled deployment?

bilekas1y ago

This is a bit of a hot take.

> The anti-AI stance is what is baffling to me

23B11y ago

Ah the old "we must sacrifice the weak for the benefit of humanity" argument, where have I heard this before...

educasean1y ago

Who are the weak being "sacrificed"?

And who is the one calling for action?

Sorry for being dense, but I'm trying to understand if I'm the "strong" or the "weak" in your analogy.

shprd1y ago

> Who are the weak being "sacrificed"?

The work of artists, authors, etc.

The big publishers seem to have given up and decided it's best to reach agreement with their counterparts, while independent authors are given the finger.

thomascgalvin1y ago

> It's unprecedented and it's hugely beneficial to our species.

"Hugely beneficial" is a stretch at this point. It has the potential to be hugely beneficial, sure, but it also has the potential to be ruinous.

We're already seeing GenAI being used to create disinformation at scale. That alone makes the potential for this being a net-negative very high.

talldayo1y ago

> and obviously nobody could have paid people upfront for the wild experimentation that was necessary.

I don't think this is the "ends justify the means" argument you think it is.

6gvONxR4sf7o1y ago

logicchains1y ago

unclad59681y ago

How has AI benefited our species so far?

educasean1y ago

How has the Internet? How have automobiles? Feels like a rather aimless question.

unclad59681y ago

Automobiles allow people to travel great distances over short periods of time, increase physical work capacity, allow for building massive structures, and allow for farming insane amounts of food.

Both the internet and automobiles have positively affected my life, and I assume the lives of many others. How are any of these aimless questions?

dontlikeyoueith1y ago

Sounds like an empty dodge.

exe341y ago

xg151y ago

Spoken like a true LLM.

eli1y ago

Copyright law is whatever we agree it is. At some point there will have to be either a law or a court case that comes up with rules for AI training data. Right now it's sort of unknown.

immibis1y ago

flkenosad1y ago

The Earth needs a good lawyer.

outside12341y ago

NY Times has sued: https://www.nytimes.com/2023/12/27/business/media/new-york-t...

The crazy thing is that there hasn't been an injunction to make them stop.

coding1231y ago

judges got to eat

swores1y ago

Could a class action suit be the solution?

I've no idea if it could be valid when it comes to OpenAI, but it does seem to be a general concept designed to counter wrongdoers who take a little value from a lot of people?

immibis1y ago

It doesn't seem to work very well

brayhite1y ago

A tale as told as time.

AnimalMuppet1y ago

It's too soon for the legal system to have done anything. Court cases take years. It's going to be 5 or 10 years before we find out whether the legal system actually allows this or not.

golergka1y ago

If information is publicly available to be read by humans, I fail to see any reason why it wouldn't be also available to be read by robots.

Update: ML doesn't copy information. It can merely memorise some small portions of it.

kanbankaren1y ago

mypalmike1y ago

This metaphor is quite stretched.

shagie1y ago

I would hold them exactly to the same standard.

https://www.copyright.gov/title37/201/37cfr201-14.html

    § 201.14 Warnings of copyright for use by certain libraries and archives.

    ....

    The copyright law of the United States (title 17, United States Code) governs the making of photocopies or other reproductions of copyrighted material.

    Under certain conditions specified in the law, libraries and archives are authorized to furnish a photocopy or other reproduction. One of these specific conditions is that the photocopy or reproduction is not to be “used for any purpose other than private study, scholarship, or research.” If a user makes a request for, or later uses, a photocopy or reproduction for purposes in excess of “fair use,” that user may be liable for copyright infringement.

    This institution reserves the right to refuse to accept a copying order if, in its judgment, fulfillment of the order would involve violation of copyright law.

WillPostForFood1y ago

So what's your specific problem with that? Unless you open a bookstore selling the copies, it sounds fine.

imiric1y ago

Are you implying that these AI companies aren't equivalent to bookstores?

coding1231y ago

It is more likely that reddit stack and others are just being paid billions. In exchange they probably just send a weekly zip file of all text, comments, etc... back to oai.

avs7331y ago

Uber for legalizing your business model

neycoda1y ago

Honestly every Copilot response I've gotten cited sources, many of which I've clicked. I'd say those work basically like free advertising.

outside12341y ago

There is more money on the side of it being legal than on the side of it being illegal.

FragrantRiver1y ago

What is the crime?

johnwheeler1y ago

To me this is a no brainer. If it’s a choice between having AI and not,

ceejayoz1y ago

Even if the knock-on effect is "all the artists and thinkers who contributed to the uncompensated free training set give up and stop creating new stuff"?

idunnoman12221y ago

Recording devices, you know a record player had a profound effect on artists. go back

ceejayoz1y ago

That seems like a poor comparison.

Recording devices permitted artists to sell more art.

Many of the uses of AI people get most excited about seem to be cutting the expensive human creators out of the equation.

6gvONxR4sf7o1y ago

We didn't need to take people's music to build a record player, and when we printed records, we paid the artists for it.

So yeah it had a profound effect, but we got consent for the parts that fundamentally relied on other people.

brvsft1y ago

If an "artist" or "thinker" stops because of this, I question their motivations and those labels in the first place.

ceejayoz1y ago

Everyone tends to have "be able to afford basic necessities" as a major motivation. That includes people who work in creative fields.

bayindirh1y ago

After Instagram started feeding user photos to their AI models, I stopped adding new photos to my profile. I still take photos. I wonder about your thoughts about my motivation.

esafak1y ago

They might be motivated to pay their bills. Weird people.

consteval1y ago

Considering you're not much of an artist or thinker yourself, I'm not sure your questioning has much value.

evilfred1y ago

we already have lots of AI. this is about having plagiarization machines or not.

mlazos1y ago

Computers already were plagiarizing machines, not sure what the difference is tbh. The same laws will apply.

johnwheeler1y ago

Yeah we got that AI through scraping.

int_19h1y ago

An AI essentially monopolized by one (or even a few) large non-profits is not necessarily beneficial to the rest of us in the grand scheme of things.

brazzy1y ago

Indeed a no brainer. The best possible outcome would be that OpenAI gets sued into oblivion (or shut down for tax fraud) as soon as possible.

Sakos1y ago

So no AI for anybody? I don't see how that's better.

consteval1y ago

No you can have AI. Just pay a license for people's content if you want to use it in your orphan crushing machine.

It's what everyone else does. The entitlement has to stop.
