Hi, contributor to Entropix here. This is just my opinion, but I don't think it runs counter to the Bitter Lesson at all, because it's meant to leverage computation the model is already doing. Several papers have suggested that models internally compute their own certainty (https://arxiv.org/abs/2406.16254), and in my view our method simply leverages this computation and factors it explicitly into decoding.
This is as opposed to pure sampling + next-token prediction, which basically picks a token at random from the output distribution. So if a model is asked to compute 1274 x 8275 and it's not sure of the answer, it still confidently emits one, even though it's uncertain and needs to do more working.
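To make the idea concrete, here's a minimal sketch of entropy-gated decoding. This is not the actual Entropix implementation (which also looks at varentropy and attention statistics); the thresholds and the "defer" behavior are purely illustrative. The point is just that the entropy of the next-token distribution is cheap to compute and can steer the sampler: act greedily when the model is confident, and do something different (e.g. sample more exploratorily, or inject a "pause and think" token) when it isn't.

```python
import numpy as np

def entropy(logits):
    # Shannon entropy (in nats) of the softmax of the logits.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.sum(p * np.log(p + 1e-12))

def choose_token(logits, low=0.5, high=3.0, rng=None):
    # Illustrative thresholds only -- the real method is more involved.
    rng = rng or np.random.default_rng(0)
    h = entropy(logits)
    if h < low:
        # Model is confident: take the argmax.
        return int(np.argmax(logits))
    if h > high:
        # Model is very uncertain: signal the caller to do something
        # other than emit a token (e.g. insert a reflection token).
        return None
    # Middle ground: ordinary temperature-1 sampling.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))
```

With a sharply peaked distribution this returns the argmax; with a near-uniform distribution over a large vocabulary it defers instead of guessing, which is exactly the "1274 x 8275" situation above.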