I don't think this should be a major concern for most people.
i) What assurance is there that they won't do that anyway? You have no legal recourse against them scraping your website (see LinkedIn's failed legal battles).
ii) Most data providers change their data over time; how will ChatGPT know whether the data it has is stale?
iii) RLHF is almost useless for learning new information, and finetuning to learn new data is extremely inefficient. The bigger concern is that the data will end up in the training set for the next model.