0 points · jayd16 · 5d ago · 0 comments

If Google can't filter out the SEO spam from their results, why do you think they did it for the LLM training data?

pjs_ · 5d ago

The training process literally ingests the majority of text on the internet, including a huge volume of SEO garbage, and seeks to create a self-consistent compressed model of it. This is totally imperfect, of course, but it is also likely more truthful than the median Google result, because the training objective, and later RL, create an incentive for self-consistency and coherence.

Imagine that you had 1,000 years to read every Google result on a particular topic, and literally infinite patience. You would read a lot of rubbish, but ultimately you are a smart person: you would figure out the underlying truth and likely produce something more valuable than the average, or even the sum, of the parts.

jayd16 (OP) · 5d ago

Honestly this feels like wishful thinking. If they could do it at all, they could do it to fix search.

hollandheese · 5d ago

Why are you assuming that they want to filter out the SEO spam?

binkHN · 5d ago

It's a new frontier and people have not targeted it yet?
