So, what we do is automate the hand-holding. In your physics simulation example, you can have the system attempt to compile on every change and fix any errors it finds (we use strict linting, type-checking, compile errors, etc.), and you can provide a metric of "good" and have it check against that and revise/iterate as needed. What we've found particularly useful is breaking the problem into smaller pieces ("The Unix Philosophy"), since the system is quite capable of extracting, composing, and defining APIs over small pieces. Make larger things out of reliable smaller things, like any reasonable architecture.
None of this is "creative": it is just piecing together decent infrastructure and giving the "actor" the ability to use it.
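To make the check-and-revise loop concrete, here is a rough sketch in Python. It is not our actual tooling: `run_checks`, `iterate_until_good`, and the `revise` callback (which stands in for whatever "actor" edits the code) are hypothetical names, and the check commands are whatever linter, type-checker, or compiler you already run.

```python
import subprocess
from typing import Callable, List, Sequence, Tuple

# A check is a (name, command) pair, e.g. ("lint", ["your-linter", "src/"]).
Check = Tuple[str, Sequence[str]]


def run_checks(checks: Sequence[Check]) -> List[str]:
    """Run every check command and return the output of the ones that failed."""
    failures = []
    for name, cmd in checks:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"[{name}]\n{result.stdout}{result.stderr}")
    return failures


def collect_feedback(checks, score, threshold) -> List[str]:
    """Hard checks first; only consult the 'good' metric once they all pass."""
    failures = run_checks(checks)
    if failures:
        return failures
    value = score()
    if value < threshold:
        return [f"[metric] score {value} is below threshold {threshold}"]
    return []


def iterate_until_good(
    checks: Sequence[Check],
    score: Callable[[], float],           # hypothetical: your metric of "good"
    revise: Callable[[List[str]], None],  # hypothetical: the actor fixes things
    threshold: float,
    max_rounds: int = 5,
) -> bool:
    """Check, hand the failures to the actor, repeat. True means it converged."""
    for _ in range(max_rounds):
        feedback = collect_feedback(checks, score, threshold)
        if not feedback:
            return True
        revise(feedback)
    # One last check after the final revision.
    return not collect_feedback(checks, score, threshold)
```

A `False` return is the point where a human steps in; everything before that runs without hand-holding.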
Then break planning, design, implementation, testing, etc. apart and do the same for each phase. Reduce "creativity" to process, and the systems can follow the process quite nicely with minimal intervention.
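As a rough illustration of "process over creativity", the phases can be modeled as a list of steps, each with its own acceptance gate. This is only a sketch under assumptions: `Phase`, `produce`, `accept`, and `run_phases` are made-up names, and in practice `produce` would call the actor and `accept` would run that phase's checks.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Phase:
    name: str
    produce: Callable[[str], str]  # hypothetical: previous artifact -> new artifact
    accept: Callable[[str], bool]  # hypothetical: this phase's gate


def run_phases(phases: Sequence[Phase], request: str, max_attempts: int = 3) -> str:
    """Feed each phase the previous phase's accepted artifact, retrying at the gate."""
    artifact = request
    for phase in phases:
        for _ in range(max_attempts):
            candidate = phase.produce(artifact)
            if phase.accept(candidate):
                artifact = candidate
                break
        else:
            # Gate never passed: this is where a human has to intervene.
            raise RuntimeError(f"phase {phase.name!r} needs human intervention")
    return artifact
```

Each gate only has to judge one small artifact (a plan, a design, a patch), which is what keeps the checks simple enough to automate.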
Then, any time you do need to intervene, use the system to help you automate the next thing so you don't have to intervene in the same way again next time.
This is what we've been doing for months and it's working well.