> these efforts take serious resources
Meta just published their new optimization results [1]. According to them:
> training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks.
In this context, a GPU means an NVIDIA A100, which sells for about $10,000, if you can get one at all.
And this comes after an explosion of ideas that led to optimizations that were unthinkable just two years ago.
If someone had trained such a model two years ago, it would have cost hundreds of millions. Now it's about $5 million: 512 A100s at $10,000 apiece come to roughly that much in hardware alone. Maybe in two years it will be only $50k. Should you start a startup now and invest $5 million, and risk someone stealing the show for pennies in two years? If you do, I really can't see how you could afford to open-source the results of your training.
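A quick back-of-the-envelope sketch of where the ~$5 million figure comes from, using only the numbers quoted above; the ~$3/GPU-hour cloud rental rate is my own assumption for comparison, not something from Meta's results:

```python
# Back-of-the-envelope training cost from the figures in the post.
# The cloud rate below is an assumed on-demand A100 price, for illustration only.

NUM_GPUS = 512
A100_PRICE_USD = 10_000          # purchase price quoted in the post
TRAINING_DAYS = 14               # "just under two weeks"
CLOUD_RATE_USD_PER_GPU_HOUR = 3  # assumption, not from the post

hardware_cost = NUM_GPUS * A100_PRICE_USD
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24
rental_cost = gpu_hours * CLOUD_RATE_USD_PER_GPU_HOUR

print(f"Buying the GPUs: ${hardware_cost:,}")  # $5,120,000 -- the ~$5M figure
print(f"GPU-hours used:  {gpu_hours:,}")       # 172,032
print(f"Renting instead: ${rental_cost:,}")    # ~$516,096 at the assumed rate
```

If the assumed rental rate is anywhere near right, renting the compute for one run is already an order of magnitude cheaper than buying the hardware, which only strengthens the point about costs collapsing.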
[1] "training a 7B model on 512 GPUs to 2T tokens using this method would take just under two weeks."