I'm solving the "make the solution to their own conclusions" with rigorous "context engineering". Skills, MCPs, and, above all, context window switching.
E.g. with TDD, I find that a model that writes both the tests and the code, will almost always hone in on a solution, then -grudgingly- write a test for that, but quite certain with the final code "in mind" already.
So, I instruct it to use sub-agents; though I find the tooling on figuring out what context is and isn't passed between agents and subagents severely lacking.
Or, also worked pretty well, have one thread write the test. Only that. It cannot read code, it can only read the tests directory or even a subset thereof. Then another thread, entirely new context, must run the test, see it fail, start implementing and stop as soon as the test is green - it obviously cannot edit the test. Yet another new context then is instructed to refactor based on rigorous refactoring skills.
A lot of work - And ironically, skills written by agents are pretty bad, I found, so a lot of manual work. But the rewards are promising.