I don't think this should be a major concern for most people.
i) What assurance is there that they won't do that anyway? You have no legal recourse against them scraping your website (see LinkedIn's failed legal battles).
ii) Most data providers change their data over time; how will ChatGPT know whether the data it has is stale?
iii) RLHF is almost useless for learning new information, and finetuning to learn new data is extremely inefficient. The bigger concern is that the data will end up in the training set for the next model.