>Yes, and who is supposed to run that code?
People who are honest and ethical?
And/or groups that don't want to risk getting sued (your [1], [2], [3])?
>> Name two entities that were asked to stop using a given individual's images that failed to stop using them after the stop request was issued.
>GitHub? OpenAI?[1] Stable Diffusion?[2] LAION?[3] Why do you think there are currently multiple high-profile lawsuits ongoing about exactly that topic?
Because:
a) (Some) American lawyers (AKA "Bar Association Members") are sue-happy?
b) Various governments / deep states (foreign and domestic) / dark-money groups / paid (and highly biased) political activists want to see if they can get new draconian laws passed (whilst believing their actions to be super-patriotic to their respective countries!) -- or at least court precedents that move in that direction?
c) There's big money at stake, all the way around? (https://www.biblegateway.com/passage/?search=1%20Timothy%206...)
d) The alleged "victims" are "playing the victim card"?
(https://tvtropes.org/pmwiki/pmwiki.php/Main/PlayingTheVictim...) (Note that as a theory, this pairs well with (a)!)
(How much revenue would they be losing if their net income from the artwork was $0? Also, wouldn't such high-profile cases give the artists a ton of free advertising? The defendant companies should counter-sue for giving the plaintiff artists what amounts to free publicity for their artwork -- publicity so valuable that they couldn't buy it with all of the Google advertising credits in the world!)
>Besides, that's not how things work. Training a foundation model takes months and currently costs a fortune in hardware and power - and once the model is trained, there is, as of now, no way to remove individual images from the model without retraining.
>"without retraining"...
Meditate on that one for a moment...
>So in practical terms it's impossible to remove an image if it has already been trained on.
In practical terms -- just retrain the model -- sans ("without") the contested images!
The models will need to be updated every couple of months anyway to include new public data from the web!
Create a list of images NOT to include in the next run (see above, "no-ai.txt" -- good suggestion, incidentally!) -- and then simply leave them out of that run! (A rough sketch of that filtering step follows below.)
It's not Rocket Science! :-)
(Also, arguably Elon Musk doesn't think that "Rocket Science" is in fact as hard as "Rocket Science" is purported to be -- but that's a separate debate! <g>)
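A minimal sketch of what that filtering step could look like, in Python -- assuming a hypothetical "no-ai.txt" opt-out file with one URL or URL prefix per line (the filename, format, and function names are my assumptions for illustration, not an existing standard):

    # Hypothetical sketch: drop opted-out images from the next training run.
    # Assumed "no-ai.txt" format: one URL or URL prefix per line, "#" starts a comment.

    def load_optout_entries(path="no-ai.txt"):
        """Read the assumed opt-out file into a list of URL prefixes."""
        entries = []
        with open(path, encoding="utf-8") as f:
            for raw in f:
                line = raw.strip()
                if line and not line.startswith("#"):
                    entries.append(line)
        return entries

    def filter_training_urls(candidate_urls, optout_entries):
        """Keep only the candidate image URLs that match no opt-out entry."""
        kept = []
        for url in candidate_urls:
            if any(url == entry or url.startswith(entry) for entry in optout_entries):
                continue  # owner asked not to be included; leave it out of this run
            kept.append(url)
        return kept

    if __name__ == "__main__":
        candidates = [
            "https://example.com/art/landscape-01.png",
            "https://example.com/art/portrait-02.png",
        ]
        optout = ["https://example.com/art/portrait-02.png"]  # would come from no-ai.txt
        print(filter_training_urls(candidates, optout))
        # -> ['https://example.com/art/landscape-01.png']

The filtering itself is trivial; the expensive part is the retraining run, and the genuinely hard part is where that opt-out list comes from -- which is exactly the question below.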
>So the better question would be, name two entities who have ignored an artist's request to not include their image when they encountered it the first time. It's still a trick question though because the point is that scraping happens in private - we can't know which images were scraped without access to the training data. The one indication that it was probably scraped is if a model manages to reproduce it verbatim - which is the basis for some of the above lawsuits.
Explain to me, from the point of view of an AI company, how that company is supposed to know ahead of time NOT to include a given image from the web. (And thus not break the law -- copyright law, at least -- and thus not incur the lawsuits and all the chaos that apparently follows such an act?)
How is the AI company supposed to know, ahead of time, that a given image on the web is not to be included?
How please?
Because you see, that's the root of the problem you are trying to solve.
In fact, let me ask you a better question...
How can an arbitrary Internet user -- not a big, legally powerful AI company, but an arbitrary small-fry Internet user -- know ahead of time that the artist who created a given image exposed to the public via the Web (or the intellectual/artistic property holder) does NOT want that image to be used for specific purposes?
Because, well, I don't know of any easily parsable, easily understandable standard for that on the Web currently...
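(For the sake of argument, here is a rough Python sketch of what a consumer-side check could look like if such signals existed -- assuming a "noai" directive carried in an X-Robots-Tag HTTP header or a robots meta tag. That directive has been floated by some platforms but is not, as far as I know, a ratified standard, so treat every name here as illustrative.)

    import re

    def ai_use_disallowed(headers, html_text):
        """Return True if the page appears to signal "do not use for AI training".

        Assumes two hypothetical (non-standardized) signals:
          - an HTTP header like:  X-Robots-Tag: noai
          - a meta tag like:      <meta name="robots" content="noai">
        """
        # Header names are case-insensitive in HTTP.
        for name, value in headers.items():
            if name.lower() == "x-robots-tag" and "noai" in value.lower():
                return True

        # A crude regex is enough for a sketch; a real crawler would use an HTML parser.
        meta_pattern = re.compile(
            r'<meta\s+[^>]*name=["\']robots["\'][^>]*content=["\']([^"\']*)["\']',
            re.IGNORECASE,
        )
        for content in meta_pattern.findall(html_text):
            if "noai" in content.lower():
                return True

        return False

    if __name__ == "__main__":
        page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
        print(ai_use_disallowed({"X-Robots-Tag": "noindex"}, page))  # True, via the meta tag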
So, to recap, the question is:
How is everybody (humans and machines) to know the unambiguous, easily parsable, easily understandable uses that the artist (or intellectual/artistic property holder) of an image wishes to permit for that image?
And how is everybody to easily know the disallowed uses?
That might be a better definition of the problem that people are actually trying to solve...
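(To make "unambiguous, easily parsable, easily understandable" concrete: here is one purely hypothetical shape such a declaration could take -- a small machine-readable policy published alongside the image, plus the check that anyone, human or machine, could run against it. Every field name and value below is an invention for illustration, not an existing standard.)

    import json

    # Hypothetical per-image usage declaration the rights holder could publish,
    # e.g. next to the image as "portrait-02.png.usage.json". Entirely illustrative.
    EXAMPLE_DECLARATION = """
    {
      "image": "https://example.com/art/portrait-02.png",
      "rights_holder": "Jane Example",
      "permitted_uses": ["personal-viewing", "linking", "search-indexing"],
      "prohibited_uses": ["ai-training", "commercial-redistribution"]
    }
    """

    def is_use_permitted(declaration_json, intended_use):
        """Check an intended use against the declared policy.

        Anything not explicitly permitted is treated as prohibited here; that
        default is itself a policy choice a real standard would have to settle.
        """
        policy = json.loads(declaration_json)
        if intended_use in policy.get("prohibited_uses", []):
            return False
        return intended_use in policy.get("permitted_uses", [])

    if __name__ == "__main__":
        print(is_use_permitted(EXAMPLE_DECLARATION, "ai-training"))      # False
        print(is_use_permitted(EXAMPLE_DECLARATION, "search-indexing"))  # True

Parsing something like that is trivial; getting everyone to publish it -- and getting scrapers to honor it -- is the actual problem.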