And a token is just an index into a restricted dictionary (the model's vocabulary). GPT-2's vocabulary was about 50k tokens, so it's safe to assume even the latest models stay well below 4M entries, meaning a token ID fits comfortably in 4 bytes (22 bits would already cover 4M). 15 trillion tokens * 4 bytes/token gives a training input of at most ~60 TB, which doesn't sound that large.
It's the computation that is costly.
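For concreteness, here's the back-of-the-envelope in Python. The 4M vocabulary cap is my assumption from above, not a published figure:

```python
import math

# Assumed generous upper bound on vocabulary size (GPT-2 was ~50k).
vocab_size = 4_000_000

# Bits needed to index the vocabulary, rounded up to whole bytes.
bits_per_token = math.ceil(math.log2(vocab_size))   # 22 bits
bytes_per_token = math.ceil(bits_per_token / 8)     # 3 bytes; 4 is a safe round-up

# Size of a 15-trillion-token training set at the 4-byte round-up.
num_tokens = 15e12
size_tb = num_tokens * 4 / 1e12

print(f"{bits_per_token} bits -> {bytes_per_token} bytes per token")
print(f"Dataset: ~{size_tb:.0f} TB at 4 bytes/token")  # ~60 TB
```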