Update - I'm still cautious about this paper, but I had the table numbers inverted in my head while thinking about it. The paper actually shows better perplexity than competing models at the larger parameter sizes, so I was wrong on that point.
I was pretty unhappy and suspicious for the same reason. Reporting a 70B network's efficiency while withholding its perplexity suggests that someone measured it and the result wasn't good enough to put in the paper.
"Is not fully trained" can also mean "we did not figure out how to reach an acceptable loss" or "training was unstable," both of which are common for ML systems.
One could forgive the lack of quality results for the 70B model, but apparently they also trained 7B and 13B versions, and they don't report perplexity for those either.