Skip to content
Better HN
Top
New
Best
Ask
Show
Jobs
Search
⌘K
undefined | Better HN
0 points
irthomasthomas
1d ago
0 comments
Share
It means for 4.7 they trained a new base model with different architecture, different pre-training data (later knowledge cutoff), and a new tokenizer. Vs finetuning an existing model, which was the case for 4.6, and probably for 4.8.
0 comments
default
newest
oldest
dominotw
1d ago
do you mean pre training? so 4.8 is just post training of an old pretrained model?
btw where do they tell you how they trained the model.
j
/
k
navigate · click thread line to collapse