I wrote up some comments while reading this paper:
https://hachyderm.io/@ianbicking/110175433063839064

I found the part on "reflection" particularly interesting.
The cost of the simulation is also notable: up to a thousand dollars for 3 days, running on GPT-3.5. That's very high in my experience with this sort of thing, but not unimaginable when you run things in a loop and issue 25 queries on every clock tick. It should be very optimizable, though.
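A quick back-of-envelope calculation shows how the cost plausibly reaches that ballpark. All the numbers below except the 25 queries per tick are assumptions for illustration (token counts per query, tick rate, and GPT-3.5-era pricing), not figures from the paper:

```python
# Back-of-envelope cost estimate for an LLM-driven agent simulation.
# Only AGENTS comes from the note above; everything else is assumed.

AGENTS = 25                  # queries per clock tick
TOKENS_PER_QUERY = 2_500     # assumed prompt + completion tokens
PRICE_PER_1K_TOKENS = 0.002  # assumed GPT-3.5-era pricing, USD
TICKS_PER_HOUR = 120         # assumed simulation clock rate
HOURS = 72                   # 3 days of wall-clock time

queries = AGENTS * TICKS_PER_HOUR * HOURS
total_tokens = queries * TOKENS_PER_QUERY
cost = total_tokens / 1000 * PRICE_PER_1K_TOKENS
print(f"{queries:,} queries, ~${cost:,.0f}")  # → 216,000 queries, ~$1,080
```

With these assumed parameters the total lands right around a thousand dollars, and it also makes the optimization angle concrete: caching, batching, or skipping idle agents all cut the query count directly.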