It's not just that, but at its core it is just that, even with reasoning models. A harness can only get you closer to a good result; it can't save you from every pitfall.
As for the PM analogy: don't forget that models don't learn, and they keep doing the same stupid stuff they were doing a month ago.