There is a lot of discussion about that aspect on the Debian MLs as well. They are debating whether they should have machines that redo the training of models considered open source, i.e. whether that should be part of the "build" process.
I think it is also worth noting that we are heading that way, but right now a handful of actors hold most of the processing power. Notice how, two years after a model breaks records, there are ways to run it with 1000x less power. We are brute-forcing the problem, but I have doubts that raw power is going to matter much in a few years.
Also, a cat detector is pretty usable at 99% accuracy; not everybody needs 99.99%.
More than processing power, the real strength of distributed training lies in the variety of situations it covers. A thousand users may have a hard time matching the computing power of Facebook's TPU farm, but it will be easier for them to assemble a larger and more diverse dataset.
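To make that concrete, here is a minimal sketch of federated averaging, the usual way to pool many small, varied datasets: each participant trains on their own local data and only the model weights are averaged, so no raw data ever leaves a user's machine. Everything here (the toy linear model, `local_step`, the per-user distribution shift) is my own illustration, not any particular project's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "user" holds a small private dataset drawn from a slightly
# different distribution -- the variety a single lab can't easily get.
def make_user_data(shift):
    X = rng.normal(shift, 1.0, size=(50, 3))
    true_w = np.array([1.5, -2.0, 0.5])
    y = X @ true_w + rng.normal(0, 0.1, size=50)
    return X, y

users = [make_user_data(shift) for shift in np.linspace(-2, 2, 10)]

def local_step(w, X, y, lr=0.01, epochs=5):
    # Plain gradient descent on this user's own data only.
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Federated averaging: each round, every user trains locally,
# then only the weights (never the data) are pooled and averaged.
w = np.zeros(3)
for _ in range(20):
    local_weights = [local_step(w, X, y) for X, y in users]
    w = np.mean(local_weights, axis=0)

print("aggregated weights:", w)  # should approach [1.5, -2.0, 0.5]
```

The point of the sketch is that the aggregate model benefits from all ten shifted datasets even though each user's compute budget is tiny.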