AlphaGo came before AlphaGo Zero; it was trained on human games, then improved further via self-play. The later AlphaGo Zero showed that pre-training on human games was not necessary: the model could learn from scratch (i.e. from zero) via self-play alone.
For DeepSeek-R1, or any reasoning model, training data is necessary but hard to come by. One of the main contributions of the DeepSeek-R1 paper was describing their "bootstrapping" (my term) process: they started with a non-reasoning model, DeepSeek-V3, and used a three-step process to generate more and more reasoning data from it (plus a few other sources) until they had enough to train DeepSeek-R1, which they then further improved with RL.
DeepSeek-R1-Zero isn't a self-play version of DeepSeek-R1 - it was just the result of the first (0th) step of this bootstrapping process, whereby they used RL to finetune DeepSeek-V3 into the R1-Zero model (somewhat of an idiot savant, a one-trick pony), which was then capable of generating training data for the next bootstrapping step.
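The data flow above can be sketched as a toy loop. Everything here is an illustrative stand-in under my reading of the process - the function names, the string-based "models", and the single SFT+RL round are hypothetical, not the paper's actual pipeline or code:

```python
# Toy sketch of the bootstrapping idea: RL on the base model produces
# R1-Zero, whose traces seed supervised fine-tuning of a fresh base
# model, which is then RL-tuned again. All functions are stand-ins.

def rl_finetune(model, reward):
    # Stand-in for RL fine-tuning; just records the step in the name
    # so the data flow is visible.
    return model + "+RL"

def generate_reasoning_data(model, n):
    # Stand-in for sampling (and filtering) reasoning traces.
    return [f"trace from {model} #{i}" for i in range(n)]

def sft(base_model, data):
    # Stand-in for supervised fine-tuning on the collected traces.
    return f"{base_model}+SFT({len(data)} examples)"

# Step 0: pure RL on the base model yields the one-trick-pony R1-Zero,
# which is nonetheless good at emitting reasoning traces.
r1_zero = rl_finetune("DeepSeek-V3", reward="rule-based")

# Later steps: use the previous model's traces (plus other sources) to
# SFT a fresh copy of the base model, then apply RL again.
data = generate_reasoning_data(r1_zero, n=1000)
model = sft("DeepSeek-V3", data)
r1 = rl_finetune(model, reward="rule-based")
print(r1)
```

The point of the sketch is only the shape of the loop: each round's model exists to manufacture better training data for the next round, not to be the final product.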