OpenAI has been hiding their datasets, and certainly hasn't credited me for the data they stole from my website and GitHub repositories. If OpenAI doesn't think they should give attribution for the data they used, it seems weird to require that of others.
Edit: Responding to your edit, DeepSeek only claimed that the final training run cost $5m, not that the whole process cost that (they even call this out). I think it's important to acknowledge that, even if they did get some training data from OpenAI, this is a remarkable achievement.