> can you deterministically test the thing you are asking it to do?
Of course: have it write tests first, then run them to check its work.
Works well for refactoring, but greenfield implementations still rely on a spec that is guaranteed to be incomplete, overcomplete and wrong in many ways.
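For what it's worth, here is a minimal sketch of that test-first loop. Everything in it is made up for illustration: `slugify` is a hypothetical function standing in for whatever the model is asked to write or refactor, and `SlugifyContract` stands in for the tests you pin down beforehand. It only shows the shape of the workflow and does not address the incomplete-spec problem for greenfield work.

```python
import unittest


def slugify(title: str) -> str:
    """Hypothetical function under test; imagine the model wrote or refactored it."""
    # Replace every non-alphanumeric character with a space, lowercase the rest,
    # then join the remaining words with hyphens.
    words = "".join(c.lower() if c.isalnum() else " " for c in title).split()
    return "-".join(words)


class SlugifyContract(unittest.TestCase):
    """Written before the implementation; these pin down the expected behavior."""

    def test_lowercases_and_joins_words(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_punctuation_becomes_separator(self):
        self.assertEqual(slugify("  What's new?  "), "what-s-new")

    def test_empty_input(self):
        self.assertEqual(slugify(""), "")


def run_contract() -> bool:
    """Run the pinned tests and report whether the candidate code passes them all."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyContract)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Calling `run_contract()` after each change gives a deterministic pass/fail signal, which is the part that works well for refactoring: the tests stay fixed while the implementation moves.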
Weirdly, and I fully think this is just some cognitive bias I don't have the knowledge to name, the AI seems very happy to please me. Like when it gets something done in one shot, it seems very happy to do so.
It's because expressing emotion tests well in RLHF (reinforcement learning from human feedback), which is the layer on top of the next-token-predictor LLM. As a bonus, it helps manipulate operator reactions to incorrect output and improves engagement (aka token use).
The "thought process" of an LLM exists only as the inference output for next-token-prediction prompts. It's the illusion of emotion.
Well, if the spec is incomplete, it sounds like you should lower the scope for the AI and go from there. I wouldn't be too keen to give a junior engineer free rein and expect awesomeness.