Also, I wonder how they decide what code is worth training on. A lot of code is written in poor style or carries technical debt, so in the long run these LLMs might actually increase the amount of technical debt in our society. Plus, eventually (and this might already be happening) LLMs are going to end up training on their own outputs, which could lead to a kind of self-immolation of the model (what researchers call model collapse). I am not certain RLHF completely resolves this issue.