What you can't easily do is retrain from scratch using a heavily modified architecture or different training data preconditioning. So yes, it is valuable to have dataset access and compute to do this and this is the primary type of value for LLM providers. It would be great if this were more open — it would also be great if everybody had a million dollars.
I think it's pretty misguided to put down the first type of value and openness when honestly they're pretty independent, and the second type of value and openness is hard for anybody without millions of dollars to access.