I thought synthetic training data was a different concept:
Use a current LLM to generate billions of fake artifacts (Reddit posts, other websites, code, etc.), but change certain properties of our real world. You could make the sky red and let dogs and dolphins talk; you get the idea. Anything goes. Generate not just a crazy world but all the artifacts from that world. Then train a new LLM on this crazy data and ask it questions. And rather than silly changes like a red sky, you could make really fundamental changes to the laws of physics and, from that, learn something new about the real rules.
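
Roughly, the data-generation step could look like the sketch below. This is only an illustration of the idea: the prompt wording, the names `build_prompts` and `generate_corpus`, and the `generate` callable (a stand-in for whatever LLM you would actually call) are all made up here, and training the new model is left out entirely.

```python
from itertools import product

# Counterfactual rules taken from the examples in the comment above.
WORLD_RULES = [
    "The sky is red.",
    "Dogs can talk.",
    "Dolphins can talk.",
]
ARTIFACT_TYPES = ["reddit post", "web page", "code snippet"]


def build_prompts(rules, artifact_types, topics):
    """Return one generation prompt per (artifact type, topic) pair."""
    world = " ".join(rules)
    return [
        f"Write a {kind} about {topic}, set in a world where: {world} "
        "Treat these facts as ordinary and never mention that they are unusual."
        for kind, topic in product(artifact_types, topics)
    ]


def generate_corpus(prompts, generate):
    """Call `generate` (hypothetical stand-in for an LLM completion function) on each prompt."""
    return [generate(p) for p in prompts]


if __name__ == "__main__":
    prompts = build_prompts(WORLD_RULES, ARTIFACT_TYPES, ["weather", "pets"])
    # Placeholder generator so the sketch runs without any model.
    corpus = generate_corpus(prompts, lambda p: f"[generated text for: {p[:40]}...]")
    print(len(corpus), "artifacts")
```

Swapping the rules for changed laws of physics would only mean editing `WORLD_RULES`; the hard part is everything after this step.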