There's a certain amount of variance in the way that people utilize these agents. Put five people in a room and ask them to compose the same prompt and you have five distinct prompts. Couple this with the fact that models respond better/worse to certain prompts depending on the stylistic composition of the prompt itself. And since people tend to write in the same style, you'd get people who have more luck with one model over another, where one model happens to align more readily with their prompt style.
To wit, I have noticed that I tend to prefer Codex's output for planning and review, but Opus for implementation; this is inverted from others at work.