You should already know what to ask to extract the answer OpenAI claims gpt-5.2-pro gave them.
Then you need to be lucky enough to get an answer that makes sense.
Then you should already know how to verify the model's response.
Only after all these steps should you cherry-pick the one-in-a-million successful response to feature on your website.
And finally, you should prove that the answer didn't already exist in the training data. It's highly likely that the problem had been solved before and the model simply picked up the solution. I have yet to see these models produce a genuinely novel discovery.
* I'm an LLM researcher, but that doesn't mean I should close my eyes to the unjustified hype around language models.