You use VC money to run at a loss while focusing on marketing and tech evangelism, getting more and more startups (and, hopefully, established companies) using your software. As the cracks begin to show, those growing organizations have too much tied to your system; they can't afford outages and need to scale. So they pay you for the Enterprise version of your software, where you actually fix all of the flaws present in the community version.
Look at MongoDB if you need a good case study. It was incredibly hyped from about 2009-2015, people would defend it in heated online arguments, and today it's rarely considered for greenfield projects. But they're making about $100M/qtr selling subscriptions to Enterprise & Atlas, servicing the technical debt established during that hype cycle.
It's not just Docker Hub; there are services like the various programming-language package repos (npm, RubyGems, etc.) and the Linux distro package repos.
I would have put GitHub in that category, but now it's owned by MS; presumably they don't have those kinds of funding problems...
2) You can use a different registry
3) Run something like kraken[1] so machines can share already-downloaded images with each other
4) If you need an emergency response, you can docker save[2] an image on a box that has it cached, then manually distribute it and load it into other boxes
0: https://docs.docker.com/registry/recipes/mirror/
1: https://github.com/uber/kraken
2: https://docs.docker.com/engine/reference/commandline/save/
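The pull-through mirror in [0] is mostly one line of daemon config. A minimal sketch of `/etc/docker/daemon.json` (the mirror URL is a placeholder for wherever you host a `registry:2` instance configured with `proxy.remoteurl` pointing at Docker Hub):

```json
{
  "registry-mirrors": ["https://registry-mirror.internal:5000"]
}
```

After restarting the daemon, pulls of Hub images try the mirror first and fall back to Docker Hub if the mirror is unavailable, so you get caching without a new single point of failure.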
I'd also add as an option - https://goharbor.io
If you can't build and deploy a new version of your app, you can probably live with it and grab a cup of coffee.
If your server fails over and the new server can't pull the current image, your app is potentially down, and that's a lot worse.
The math here is the cost of wasted time versus the cost of running your own registry with better uptime.
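That's a back-of-envelope calculation you can actually write down. A sketch, where every number is a made-up placeholder to be replaced with your own:

```python
# Back-of-envelope: cost of registry outages vs. cost of self-hosting.
# All figures below are hypothetical placeholders -- plug in your own.

outages_per_year = 4            # Hub outages that actually block a deploy
engineers_blocked = 3           # people stuck waiting on a pull
hours_lost_per_outage = 2.0
loaded_hourly_rate = 100        # USD per engineer-hour

wasted_time_cost = (outages_per_year * engineers_blocked
                    * hours_lost_per_outage * loaded_hourly_rate)

registry_vm_per_month = 50      # small VM + storage for a private registry
upkeep_hours_per_month = 1      # patching, pruning old images
own_registry_cost = 12 * (registry_vm_per_month
                          + upkeep_hours_per_month * loaded_hourly_rate)

print(f"wasted time:  ${wasted_time_cost:,.0f}/yr")
print(f"own registry: ${own_registry_cost:,.0f}/yr")
```

If a failover can leave production down (the scenario above), add an estimate of downtime cost to the left-hand side; that's usually what tips the math toward self-hosting.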
Seems like the tech giants should load balance these images for the good of the Internet to provide some decent redundancy and for my sanity at 11.30pm.
Partial failure is just a fact of life. If this is a major issue for your process, it might be better to find ways to alter your process so it isn't. Alternatively, mirror locally.
Being honest, no build is worth losing sleep over. We are piggybacking on their service and bandwidth. For us to start building the infrastructure to cache their images doesn't make financial sense; we deploy daily, and their uptime always allows for that.
Rule #1: Host your own stuff, never rely on others.
Rule #2: Automate everything.
Launched a new one... docker pull, bam, error. Customer unsatisfied.
Incident Status: Full Service Disruption