Better HN

0 points · JumpCrisscross · 1y ago · 0 comments

> the training data is the source

Sure. But that's not going to be released. The term open source AI cannot be expected to cover it because it's not practical.

tintor 1y ago

Meta can call it something else other than open source.

The synthetic part of the training data could be released.

plsbenice34 1y ago

Of course it could be practical - provide the data. The fact that society is a dystopian nightmare controlled by a few megacorporations that don't want free information does not justify outright changing the meaning of the language.

JumpCrisscross (OP) 1y ago

> provide the data

Who? It's not their data.

exe34 1y ago

why are they using it?

diggan 1y ago

So because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

JumpCrisscross (OP) 1y ago

> because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Meta et al. would love for the choice to be between, on one hand, open weights only, and, on the other hand, open training data, because the latter is impractical. That dichotomy guarantees that when someone says open source AI they'll mean open weights. (The way open source software, today, generally means source available, not FOSS.)

unethical_ban 1y ago

> Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Here's the source of the disagreement. You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and the layman's (incorrect) understanding.

The other person is saying that it doesn't matter how convenient it is or how much Meta wants to use it: the term "open source" is misleading for a product where the "source" is the training data and the final product has onerous restrictions on use.

This would be like Adobe giving Photoshop away for free, but for personal use only and not for making ads for Adobe's competitors. Sure, Adobe likes it and most users may be fine with it, but it isn't open source.

> The way open source software, today, generally means source available, not FOSS.

I don't agree with that. When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core".

diggan 1y ago

> Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Right, so the onus is on Facebook/Meta to get that right; then they could call something Open Source. Until then, find another name that doesn't already have a specific meaning.

> (The way open source software, today, generally means source available, not FOSS.)

No, but it's going that way. Open Source, today, still means that the things you need to build a project are publicly available for you to download and run on your own machine, provided you have the means to do so. What you're thinking of is literally called "Source Available", which is very different from "Open Source".

The intent of Open Source is for people to be able to reproduce the work themselves, with modifications if they want to. Is that something you can do today with the various Llama models? No, because one core part of the project's "source code" (what you need to reproduce it from scratch), the training data, is being held back and kept private.

Palomides 1y ago

source available is absolutely not the same as open source

you are playing very loosely with terms that have specific, widely accepted definitions (e.g. https://opensource.org/osd)

I don't get why you think it would be useful to call LLMs with published weights "open source"

elevatedastalt 1y ago

No, we need to adapt an existing term into the new context that it is being deployed in.
