> Yes, these things do best when they have a (simulated) environment they can make mistakes in and that can give them clear and fast feedback.
This always felt like a reason to throw it at coding: with its rigid syntax, you'll know quickly and cheaply whether what was written clears a minimal bar of quality.
This is anecdotal from my own experience, but someone might have done a study by now: such experiments are much cheaper and quicker to run on AIs than on undergrad students (let alone professional devs).
Also, pre-AI: when I set up good property tests, I could develop much more effectively even while a bit tired.
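For anyone unfamiliar, by "property test" I mean something like this sketch (plain stdlib, random inputs; the sort-checking example is just an illustration, not what I was actually testing):

```python
import random

def is_sorted(xs):
    """True if xs is in non-decreasing order."""
    return all(a <= b for a, b in zip(xs, xs[1:]))

def test_sort_properties(trials=200):
    """Generate random lists and check invariants that sorting must always satisfy."""
    rng = random.Random(0)  # fixed seed so failures are reproducible
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        out = sorted(xs)
        assert is_sorted(out)                    # result is ordered
        assert len(out) == len(xs)               # no elements gained or lost
        assert sorted(xs, reverse=True)[::-1] == out  # consistent with reverse sort

test_sort_properties()
```

The point is that the machine generates the awkward cases (empty lists, duplicates, negatives) so you don't have to think them up one by one, which is exactly the fast, cheap feedback loop that helps when you're tired.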