You can see this by giving it broken code and checking whether it can predict the output.
I gave Copilot a number of implementations of factorial with an input of 5. When it recognized a correct implementation, it could combine the ideas of "factorial", "5", and "correct implementation" to output 120. But when I gave it buggy implementations, it could recognize they were wrong, yet the concepts of "factorial", "5", and "incorrect implementation" weren't enough for it to predict the actual wrong result the code would produce. Even when I pointed out that its attempt to calculate the wrong output was itself wrong, it couldn't 'calculate' the right answer.
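To make the experiment concrete, here is a sketch of the kind of pair involved: a correct factorial next to a buggy one. The specific bug is my assumption, since the original post doesn't say which buggy implementations were used; an off-by-one loop bound is a typical example.

```python
# Hypothetical implementations like those shown to Copilot.
# The off-by-one bug below is an assumed example, not the
# actual code from the experiment.

def factorial(n):
    """Correct implementation."""
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def factorial_buggy(n):
    """Buggy: the loop stops before multiplying by n itself."""
    result = 1
    for i in range(2, n):  # bug: should be range(2, n + 1)
        result *= i
    return result

print(factorial(5))        # 120
print(factorial_buggy(5))  # 24 -- the "wrong result" the model would need to predict
```

Predicting the correct version's output only requires associating "factorial of 5" with 120; predicting the buggy version's output requires actually tracing the loop, which is where the model fell down.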