For example I'm working on some virtualization things where I want a machine to be provisioned with a few options of linux distros and BSDs. In one prompt I asked for this list to be provisioned so a certain test of ssh would complete, it worked on it for several hours and now we're doing the code review loop. At first it gave up on the BSDs and I had to poke it to actually finish with an idea it had already had, now I'm asking it to find bugs and it's highlighting many mediocre code decisions it has made. I haven't even tested it so I'm not sure if it's lying about anything working yet.