The model saw thousands of examples of "how to implement X". When your codebase does X differently, training data wins. You can see it happen: point out the conflict, and a model that's actually reasoning would shift gears - ask questions, acknowledge the tension. But a model in retrieval mode just reiterates: same confidence, same explanation, maybe rephrased.
That's why "I completely understand this time" keeps happening in AI responses. From the model's view, there's nothing to check - the pattern it retrieved already "makes sense."
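A hypothetical illustration of that kind of conflict (the function names and the (value, error) convention are invented for the example, not taken from any real codebase): a project that returns error values instead of raising exceptions, next to the exception-raising idiom that dominates training data and that a model tends to snap back to.

```python
# Hypothetical codebase convention: functions return a (value, error)
# tuple instead of raising exceptions.
def parse_port(raw):
    if not raw.isdigit():
        return None, f"invalid port: {raw!r}"
    port = int(raw)
    if not 0 < port <= 65535:
        return None, f"port out of range: {port}"
    return port, None

# The pattern seen thousands of times in training data, which a model
# in retrieval mode keeps producing even after the convention above
# has been pointed out.
def parse_port_idiomatic(raw):
    port = int(raw)  # raises ValueError on non-numeric input
    if not 0 < port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port
```

Both functions are correct in isolation; the conflict only exists relative to the project's convention, which is exactly the kind of local context that loses to the statistically dominant pattern.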
In short, if you're not doing anything genuinely new - the kind of work AI will almost certainly do better than you - then you're safe. Otherwise, you'll have to put in considerable effort to get the AI to cooperate, or do the hardest tasks yourself, simply because the limitations described above haven't been addressed.
Secondly, they are able to produce intelligence that wasn't represented in their training input. As a simple example, take chess AI. The top chess engines understand the game of chess better than the top humans; they have surpassed human understanding of chess. It's similar with LLMs. They train on synthetic data that other LLMs have produced and find ways to get better and better on their own. Humans learn from the knowledge of other humans and it compounds. The same thing applies to AI: it is able to generate information, try things, and later reference what it tried when doing something else.
That was possible partly because chess is a constrained domain: rigid rules and a well-defined space of board states.
But LLM land is not like that. LLMs were trained on pre-existing text written by humans. They do discover patterns within that data, but the point stands: the data, and the patterns within it, are not actually novel.
Some of the pretraining is. Other pretraining is on text written by AI. Human text is only a subset of what these models train on; there is a ton of synthetic training data now.