Your initial comment made it sound like you were commenting on genuine apples-to-apples comparisons between humans and LLMs in a controlled setting. That's the place for empiricism, and I think dismissing studies examining such situations is a mistake.
A good warning flag for why that is a mistake is the recent article that showed engineers estimated LLMs sped them up by like 24%, but when measured they were actually slower by 17%. One should always examine whether or not the specifics of a study really apply to them--there is no "end all be all" in empiricism--but when in doubt the scientific method is our primary tool for determining what is actually going on.
But we can just vibe it lol. Fwiw, the parent comment's claims line up more with my experience than yours. Leave an agent running for "hours" (as specified in the comment) coming up with architectural choices, ask it to document all of it, and then come back and see that it is a massive mess. I have yet to have a colleague do that without reaching out and saying "help, I'm out of my depth".