This makes sense to me. If you think about the training data, texts that work through problems in formal predicate logic are more likely to be correct than average, and much more likely to be precise about what information is (or isn’t) contained in the propositions. So if you formulate the problem in that language, you’re prompting the model to sample from patterns that are more likely to give you the result you want. If you use regular English instead, it could be sampling from cooking blogs or who knows what.
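To make the contrast concrete, here is a rough sketch of the same toy problem phrased both ways. The puzzle, the predicate names (`Cat`, `Mammal`, `Dog`), and the `build_prompts` helper are all made up for illustration. Nothing here calls a real model API; it only builds the two prompt strings you might compare.

```python
def build_prompts():
    """Return (english_prompt, formal_prompt) for the same toy problem.

    Hypothetical example: the puzzle and predicate names are invented.
    """
    # Plain-English phrasing: what counts as "known" is left implicit.
    english = (
        "All cats are mammals. Tom is a cat. "
        "Is Tom a mammal? Is Tom a dog?"
    )
    # Predicate-logic phrasing: premises and queries are listed explicitly,
    # and the allowed answers include "undetermined".
    formal = "\n".join([
        "Premises:",
        "  1. forall x. Cat(x) -> Mammal(x)",
        "  2. Cat(tom)",
        "Queries:",
        "  a. Mammal(tom)",
        "  b. Dog(tom)",
        "For each query answer: entailed, contradicted, or undetermined.",
    ])
    return english, formal


english_prompt, formal_prompt = build_prompts()
print(formal_prompt)
```

In the formal version it is visible on the page that no premise mentions `Dog`, so "undetermined" is the natural answer to query b. The English version leaves that to whatever assumptions the reader (or the model) brings along. Whether a given model actually answers better with the second prompt is something you would have to test.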