It sounds like they essentially brute-forced the solutions?
Ask the LLM for an answer, ask the LLM to verify the answer. Ask the LLM for an answer, ask the LLM to verify the answer. Add a bit of randomness. Ask the LLM for an answer, ask the LLM to verify the answer. Add a bit of randomness. Repeat 5B times (this is what the paper says).
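
For what it's worth, the loop described above can be sketched as a toy in Python. This is not the paper's code: `propose` and `verify` are hypothetical stand-ins for the two LLM calls, and the numeric target is made up purely so the example runs without a model.

```python
import random


def propose(rng, best):
    """Hypothetical stand-in for 'ask the LLM for an answer'.

    Here it just perturbs the best candidate so far with a bit of randomness.
    """
    return best + rng.uniform(-1.0, 1.0)


def verify(candidate, target=42.0):
    """Hypothetical stand-in for 'ask the LLM to verify the answer'.

    Returns a score; higher is better. The target value is an assumption.
    """
    return -abs(candidate - target)


def search(iterations, seed=0):
    """Propose, verify, keep the best candidate, repeat."""
    rng = random.Random(seed)
    best = 0.0
    best_score = verify(best)
    for _ in range(iterations):
        candidate = propose(rng, best)
        score = verify(candidate)
        if score > best_score:  # selection: keep only improvements
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    print(search(10_000))
```

The only "intelligence" in this sketch is the keep-if-better step; everything else is repetition, which is roughly the point being made here.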
Evolution itself is the ultimate brute-force algorithm—it’s just applied over millennia. Trial and error, coupled with selection and refinement, is the only way to generate novelty when there’s no clear blueprint.