OpenAI has been sitting on GPT-4 for months, and on the base model even longer. I would not be surprised if they used GPT-4 for some or all of the distillation of the model.
Mixtral is nominally 56B combined (8 experts × 7B). If we subtract a little for MoE inefficiencies, we could call it roughly 40B of dense-equivalent capacity. That is a 2x increase over 20B, and we have seen new models beat others twice their size.
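Spelled out as a quick back-of-envelope sketch (the 0.7 discount factor is my own hypothetical stand-in for "subtract a little", not a measured figure, and the 20B target is just the rumored size):

```python
# Rough parameter math behind the comparison above.
experts = 8
params_per_expert_b = 7                          # billions, from the "8x7B" name
naive_total_b = experts * params_per_expert_b    # 56B combined

moe_discount = 0.7                               # hypothetical discount for MoE overhead/shared layers
effective_dense_b = naive_total_b * moe_discount # ~40B dense-equivalent

target_b = 20                                    # the rumored GPT-3.5-turbo size
print(f"naive total: {naive_total_b}B")
print(f"effective dense-equivalent: ~{effective_dense_b:.0f}B")
print(f"ratio vs {target_b}B: ~{effective_dense_b / target_b:.1f}x")
```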
That, plus a massive amount of excellent alignment data, should produce some great results.
I don't think it's out of the realm of possibility that the 20B figure is real.