undefined | Better HN

0 pointsirthomasthomas1d ago0 comments

It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.

0 comments

dominotw1d ago

do you mean pre training? so 4.8 is just post training of an old pretrained model?

btw where do they tell you how they trained the model.

j / k navigate · click thread line to collapse

0 comments

dominotw1d ago

do you mean pre training? so 4.8 is just post training of an old pretrained model?

btw where do they tell you how they trained the model.

j / k navigate · click thread line to collapse