
buyucu · 1y ago · 0 points

Opening data is an invitation to lawsuits. That is why even the most die-hard open source enthusiasts are reluctant to do it. It is also why people train a model and generate data with it, rather than sharing the original datasets.

These datasets are huge, and it's practically impossible to make sure they are clean of illegal or embarrassing stuff.


johnla · 1y ago

Sounds like a job for AI.

sdesol · 1y ago

I understand the reasoning, and I hope there is legislation in the future that basically goes "If you can't produce the data, you can't charge more than this for it". In effect, LLM producers would have to treat their product as a commodity that can only be priced based on the compute resources plus some overhead.
