Open source means "the preferred version for modification" and this fits with model weights since you can fine tune them with your own data. Modifying raw training data would be quite unwieldly and pointless.
Isn't this comparison completely backwards? As I understand it, it's useless for a person to own a source dataset for an LLM, because its "compilation" costs $n million.