Something I recently started doing that I really like: when building a feature, I write acceptance criteria and then make sure the agents have everything they need to do QA and verify the criteria themselves. The main difference from before is that instead of me doing the QA (which I still do, but by then it's almost always working), the agent can exercise the whole flow and check the acceptance criteria on its own. It takes more work because I often have to set up MCPs, accounts, credentials, etc., but the output is better.
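As a rough illustration of the idea (a hypothetical sketch, not an actual setup or any particular tool's API), acceptance criteria can be written as named checks that an agent runs against whatever state it observed after exercising the flow. The `Criterion` class, the `verify` helper, and the keys `signup_status` and `emails_sent` are all made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One acceptance criterion: a readable name plus a check on observed state."""
    name: str
    check: Callable[[dict], bool]


def verify(criteria: list[Criterion], observed: dict) -> dict[str, bool]:
    """Run every check against the observed state; report pass/fail per criterion."""
    return {c.name: bool(c.check(observed)) for c in criteria}


# Hypothetical criteria for a signup feature.
criteria = [
    Criterion("signup returns 201", lambda s: s.get("signup_status") == 201),
    Criterion("welcome email sent", lambda s: "welcome" in s.get("emails_sent", [])),
]

# Hypothetical state an agent might collect after running the flow end to end.
observed = {"signup_status": 201, "emails_sent": []}

report = verify(criteria, observed)
for name, passed in report.items():
    print(("PASS" if passed else "FAIL"), name)
```

The point of a per-criterion report like this is that the agent gets a specific failing item to fix and re-run, rather than a vague "it doesn't work".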
My question: do people have good approaches to this? Is this a common workflow? I've seen some of this conversation called harness engineering and some called feedback-loop engineering, and I've seen a few startups in this area (Shiplight AI, Autosana, Ranger). I'm trying to find more discussion and best practices, and to see how others are thinking about this.
I have a lot of trouble keeping these separate. I use macOS: you can have many desktops, but they're all in the same workspace. I think what I want is something like tmux for my whole computer, where I can switch away from a project and come back to exactly where I left off, with only the content from that project.
I actually tried to build this myself at the OS level, but macOS seems to lock everything down pretty hard.
Anybody have a good solution?