Are you using Claude Code? Because that might be the secret sauce you're missing. With Claude Code I can instruct it to validate things after it's done with the code, and usually it finds that it goofed. I can also tell it to work on like five different things, and go "hey, spin up some agents to work on this," and it will spawn five agents in parallel to work on them.
I've basically ditched Grok et al. and I refuse to give Sam Altman a penny.