Better HN

JumpCrisscross 1y ago

> because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Meta et al. would love for the choice to be between, on the one hand, open weights only and, on the other, open training data, because the latter is impractical. That dichotomy guarantees that when someone says "open source AI", they'll mean open weights. (The way open source software, today, generally means source available, not FOSS.)

unethical_ban 1y ago

>Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Here's the source of the disagreement. You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding.

The other person is saying that it doesn't matter how convenient it is or how much Meta wants to use it; the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use.

This would be like Adobe giving Photoshop away for free, but for personal use only and not for making ads for Adobe's competitors. Sure, Adobe likes it and most users may be fine with it, but it isn't open source.

>The way open source software, today, generally means source available, not FOSS.

I don't agree with that. When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core".

JumpCrisscross (OP) 1y ago

> You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding

I'm actually not a fan of Meta's definition. I'm arguing specifically against an unrealistic definition, because for practical purposes that cedes the term to Meta.

> the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use

Agree. I think the focus should be on the use restrictions.

> When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core"

This isn't consistently applied. It's why we have the free vs open vs FOSS fracture.

diggan 1y ago

> Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Right, so the onus is on Facebook/Meta to get that right; then they could call something Open Source. Until then, find another name that doesn't already have a specific meaning.

> (The way open source software, today, generally means source available, not FOSS.)

No, but it's heading that way. Open Source, today, still means that the things you need to build a project are publicly available for you to download and run on your own machine, provided you have the means to do so. What you're thinking of is literally called "Source Available", which is very different from "Open Source".

The intent of Open Source is for people to be able to reproduce the work themselves, with modifications if they want to. Is that something you can do today with the various Llama models? No, because one core part of the project's "source code" (what you need to reproduce it from scratch), the training data, is being held back and kept private.

Palomides 1y ago

source available is absolutely not the same as open source

you are playing very loosely with terms that have specific, widely accepted definitions (e.g. https://opensource.org/osd )

I don't get why you think it would be useful to call LLMs with published weights "open source"

JumpCrisscross (OP) 1y ago

> terms that have specific, widely accepted definitions

The OSI's definition is far from the only one [1]. Switzerland is currently implementing CH Open's definition, the EU another one, et cetera.

> I don't get why you think it would be useful to call LLMs with published weights "open source"

I don't. I'm saying that if the choice is between open weights and open weights + open training data, open weights will win, because the useful definition will outcompete the pristine one in a public context.

[1] https://en.wikipedia.org/wiki/Open-source_software#Definitio...

diggan 1y ago

For the EU, I'm guessing you're talking about the EUPL, which is FSF/OSI-approved, GPL-compatible, and generally considered copyleft.

As for CH Open, I'm not finding anything specific, even on Swiss websites. Could you help me understand what you're referring to here?

I'm guessing that all these definitions have at least some points in common, which (another guess) include at least being able to produce the output artifacts/binaries yourself, something you cannot do with Llama, to take one example.

Palomides 1y ago

diluting open source into a marketing term meaning "you can download something" would be a sad result

SquareWheel 1y ago

> specific, widely accepted definitions

Realistically, nobody outside of Hacker News commenters has ever cared about the OSD. It's just not how the term is used colloquially.

Palomides 1y ago

who says open source colloquially? ime anyone who doesn't care about software licenses will just say free (as in free beer)

and (strong personal opinion) any software developer should have a firm grip on the terminology and details for legal reasons