I suspect the reason combinatorics is harder for AI models is the same reason LLMs are not great at reasoning about anything out of distribution. LLMs are impressive pattern recognizers at this point, but even a simple task like counting the intersections in a Venn diagram requires more strategy and less pattern recognition. Purely NN-based models don't seem like they will be enough to solve these problems; AI agents and RL are promising.
I don't know anything about Lean, but I am curious whether proofs of combinatorial problems can be represented as well as number theory or algebra. If the solutions to combinatorial problems are always closer to natural language, the failure of LLMs is expected, or at least we can assume it might take more time to get better at them. My assumption here is that solutions to IMO combinatorics problems are more human-language oriented and rely more on common sense and informal logic compared to geometry or number theory problems.
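For what it's worth, basic combinatorial statements do formalize cleanly in Lean. A minimal sketch (assuming Lean 4 with Mathlib): the fact that a finite set of size n has 2^n subsets, proved by rewriting with the library lemma `Finset.card_powerset`.

```lean
import Mathlib

-- A set of size n has 2^n subsets: the informal counting
-- argument becomes a one-line rewrite with a library lemma.
example (n : ℕ) (s : Finset ℕ) (h : s.card = n) :
    s.powerset.card = 2 ^ n := by
  rw [Finset.card_powerset, h]
```

The harder question is whether olympiad-level combinatorial arguments, which often lean on ad hoc constructions and informal case analysis, compress this well; simple counting identities clearly do.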