The advantage of AlphaGo Zero is that it is constrained to the language of Go. If you made two LLMs train only on each other's output, they would develop their own language. Maybe they'd be great at reasoning, but we wouldn't understand them. Even humans in that situation would develop jargon and, as time goes on, a dialect or language of their own. And humans are a lot more grounded in their language than LLMs are.