it seems unlikely to me that ChatGPT is trained directly on chat data. if it were, we'd expect it to know things from after its knowledge cutoff, and afaik that hasn't happened.
I assume the chat logs are instead used to train a reward model, which then serves as the reward function during RLHF.
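
to make the reward-model idea concrete, here's a toy sketch of how preference data could train one. this is not a claim about what OpenAI actually does: the linear scorer, the two made-up features ("helpfulness" and "length"), and the names `reward`, `train_reward_model`, and `pairs` are all assumptions for illustration. it uses the standard pairwise (Bradley-Terry) loss, where the model is pushed to score the preferred response above the rejected one.

```python
import math

def reward(w, x):
    """Hypothetical linear reward model: score = w . x for a response's feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, epochs=200):
    """Fit w so that preferred responses score higher than rejected ones.

    pairs: list of (chosen_features, rejected_features).
    Per-pair loss: -log(sigmoid(reward(chosen) - reward(rejected))).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so gradient descent adds lr * (1 - sigmoid(margin)) * (chosen - rejected).
            grad_scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                w[i] += lr * grad_scale * (chosen[i] - rejected[i])
    return w

# Toy preference data (made up): each pair is (preferred response, rejected response),
# with features [helpfulness, length].
pairs = [
    ([1.0, 0.2], [0.0, 0.9]),
    ([0.8, 0.5], [0.1, 0.4]),
    ([0.9, 0.1], [0.3, 0.8]),
]
w = train_reward_model(pairs, dim=2)
```

in an RLHF setup, a scorer like this would then supply the reward signal while the chat model itself is optimized (e.g. with PPO). that step is left out here. the point is just that the chat logs only shape the scorer's preferences, so no new facts need to end up in the chat model.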