
0 points · xandrius · 16d ago · 0 comments

I think people are misunderstanding reward functions and LLMs.

LLMs don't actually have a reward system like some other ML models.


storus · 16d ago

They are trained with one, and when you look at DPO (Direct Preference Optimization) you can say they contain an implicit one as well.
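To make the "implicit one" concrete, here is a minimal sketch of the reward that DPO implies, assuming you already have the summed log-probabilities of a response under the trained policy and under a frozen reference model. The function names, the `beta` default, and the example numbers are illustrative assumptions, not taken from any particular library.

```python
import math

def implicit_reward(policy_logprob: float, ref_logprob: float, beta: float = 0.1) -> float:
    """DPO's implicit reward for one response, up to a term that depends only on the prompt:
    beta * (log pi_policy(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logprob - ref_logprob)

def dpo_loss(pol_chosen: float, ref_chosen: float,
             pol_rejected: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    margin = (implicit_reward(pol_chosen, ref_chosen, beta)
              - implicit_reward(pol_rejected, ref_rejected, beta))
    # -log(sigmoid(m)) equals log(1 + exp(-m)); fine for the small margins in this sketch
    return math.log1p(math.exp(-margin))

# Hypothetical log-probabilities: the policy raised the chosen response
# and lowered the rejected one relative to the reference model.
loss = dpo_loss(pol_chosen=-1.0, ref_chosen=-2.0,
                pol_rejected=-3.0, ref_rejected=-2.0)
```

No separate reward model appears anywhere here: the reward is read off the policy's own log-probabilities relative to the reference, which is the sense in which the model "contains" one.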
