A small detail wasn't clear to me: for these incorrectly formalized problems, how do they get the correct answer as ground truth for training? Have a human to manually solve them?
(In contrast to problems actually from "a huge database of IMO-type problems", they do have answers for these already).