Still, for the one about organic data (or "pre-war steel") drying out, it's not a threat to model development at all. People repeating this point don't realize that we already have way more data than we need. We got to where we are by brute-forcing the problem - throwing more data at a simple training process. If new "pristine" data were to stop flowing now, we still a) have decent pre-trained base models, and a dataset that's more than sufficient to train more of them, and b) lots of low-hanging fruits to pick in training approaches, architectures and data curation, that will allow to get more performance out of same base data.
That, and the fact that synthetic data turned out to be quite effective after all, especially in the latter phases of training. No surprise there, for many classes of problems this is how we learn as well. Anyone who has experience studying math for maturity exam / university entry exams knows this: the best way to learn is to solve lots of variations of the same set of problems. These variations are all synthetic data, until recently generated by hand, but even their trivial nature doesn't make them less effective at teaching.