Automatic verification (an oracle) is already being used to create synthetic data for LLMs, and I don't see a big difference versus AlphaZero. While there's no way to guarantee that any single synthetic reasoning trace is correct, as long as a trace leads to the correct answer according to the verifier, the law of large numbers should take care of it: in aggregate, verified traces will mostly reflect valid reasoning.
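As a minimal sketch of the idea, verifier-filtered synthetic data generation is essentially rejection sampling: sample many traces per problem and keep only those whose final answer passes the oracle. All names and the toy "model" below are illustrative assumptions, not any real system's API.

```python
import random

def generate_trace(problem, rng):
    # Stand-in for an LLM sampler (hypothetical): returns a
    # (reasoning, answer) pair, sometimes reaching a wrong answer.
    answer = rng.choice([problem["answer"], problem["answer"] + 1])
    reasoning = f"step-by-step work leading to {answer}"
    return reasoning, answer

def verifier(problem, answer):
    # Oracle check against known ground truth; this is the only
    # signal we trust, not the reasoning text itself.
    return answer == problem["answer"]

def collect_synthetic_data(problems, samples_per_problem=16, seed=0):
    rng = random.Random(seed)
    dataset = []
    for problem in problems:
        for _ in range(samples_per_problem):
            reasoning, answer = generate_trace(problem, rng)
            if verifier(problem, answer):  # keep verified traces only
                dataset.append({"problem": problem["text"],
                                "trace": reasoning,
                                "answer": answer})
    return dataset

problems = [{"text": "2 + 2 = ?", "answer": 4}]
data = collect_synthetic_data(problems)
```

Note that the filter only checks final answers, so individual kept traces can still contain flawed intermediate steps; the bet is that correct-answer traces are, on average, better training signal.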
The problem is that it's difficult to build verifiers for many things we care about, like architectural taste. So I expect to see superhuman capabilities on the things we can build verifiers for; for everything else it's harder to predict. We may see transfer learning, or we may see collapse. My money is on transfer learning.