Better HN

0 points · lolinder · 1y ago · 0 comments

> A lot of data, mostly produced by humans who are not credited and have no say in how the output weights are licensed

And this is what I think everyone is actually dancing around: I suspect the insistence on publishing the training data has very little to do with a sense of purity around the definition of Open Source and everything to do with frustrations about copyright and intellectual property.

For that same reason, we won't see open source models by this definition any time soon: the legal questions around data usage are profoundly unsettled, and no company can afford to publicize the complete set of data it trained on until those questions are settled.

My personal ethic says that intellectual property is a cancer that sacrifices knowledge and curiosity on the altar of profit, so I'm not overly concerned about forcing companies to reveal where they got the data. If they're releasing the resulting weights under a free license (which, notably, Llama isn't) then that's good enough for me.


smolder · 1y ago

> For that same reason, we won't see open source models by this definition any time soon

It's totally fine if we don't have many (or any) models meeting the definition of open source! How hard is it to use a different term that actually applies?

The people on my side of the argument seem to be saying: "do not misapply these words", not "do not give away your weights".

Insisting on calling a model with undisclosed sources "open source" has what benefit? Marketing? That's really all I can think of... that it's to satisfy the goals of propagandists.

Shamar · 1y ago

It's not just marketing: the European AI Act imposes several compliance obligations on corporations building AI systems, including serious scientific scrutiny of the whole training process.

Such obligations are designed to mitigate the inherent risks that AI can pose to individuals and society.

The AI Act exempts open source from such scientific scrutiny because it's already transparent.

BUT if OSI defines black boxes as "open source", they open a loophole that will be exploited to harm people without being held accountable.

So it's not just marketing, but dangerous corporate capture.

acka · 1y ago

Exactly. Without models being truly open source (training data, training procedures, alignment, etc.), there is no way for auditors to assess, for example, whether a model was trained on data exhibiting certain forms of selection bias (anything from training data or alignment being overly biased towards Western culture, controversial political or moral viewpoints, or particular religions, to gender stereotypes or even racism), which might lead to dangerous outcomes later on, whether through contamination of derived models or during inference.

JumpCrisscross · 1y ago

> if OSI defines black boxes as "open source", they open a loophole that will be exploited to harm people without being held accountable

The OSI’s definition matches the legal definition in the EU and California (and common use). If the OSI says open data only, it will just be ignored. (If people are upset about the current usage, they can draw the same free vs. open distinction we draw in software to keep the pedantic definition contained.)

seba_dos1 · 1y ago

> very little to do with a sense of purity around the definition of Open Source and everything to do with frustrations about copyright and intellectual property

The whole reason FOSS exists is frustration with copyright and intellectual property; everything else derives from that, so I'm not sure what your point is.

zoobab · 1y ago

"frustrations about copyright and intellectual property"

Intellectual property is an undefined term; I would say copyright, although patents can also play a role in some countries.

seba_dos1 · 1y ago

> Intellectual property is an undefined term

That's one of the frustrating things about it ;)
