Let’s say it can simulate theory of computation better than 99% of the population and can very capably synthesize and infer from any text based sources. I think that would shake the world, and it wouldn’t even need to be near AGI.
To achieve the same with an AI that doesn't have a real understanding of the business logic, programmers would still be needed to write the test suite. But unlike most test suites, which are typically underspecified, this one would likely need to be more complicated than the program itself. You could use ChatGPT to expedite writing the tests, but human attention would still be required to actually verify the tests themselves.
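To make the underspecification point concrete, here's a minimal Python sketch (the function and business rule are hypothetical, invented for illustration): a handful of example-based tests pass, yet the real specification goes unchecked, and stating it fully takes about as much logic as the program itself.

```python
# Hypothetical example of an underspecified test suite.
def apply_discount(price, percent):
    # Intended business rule: a discount never produces a negative price.
    return price - price * percent / 100

# Typical example-based tests: all pass, yet the rule above is unverified.
assert apply_discount(100, 10) == 90
assert apply_discount(50, 0) == 50

# Actually specifying the behavior needs its own logic, closer in
# complexity to the program under test:
for price in range(0, 200, 7):
    for percent in range(0, 150, 11):
        result = apply_discount(price, percent)
        assert result <= price       # a discount never raises the price
        # assert result >= 0         # would fail for percent > 100: the bug
```

The commented-out assertion is exactly the kind of check a human still has to think of; nothing in the passing tests forces it to exist.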
Edit: If you showed a programmer from the 1950s Python syntax and told them that all you have to do is write these words to build a program, they'd think it was artificial intelligence.
The total percentage isn't exactly what matters. Emergent properties as a metric is a smokescreen.
If, in that last 1%, the system concludes that A&lt;C&lt;B implies A&lt;B&lt;C, then it is not reliable enough to perform logical computations. You'd need a person to oversee 100% of its output just to catch the 1% of serious but basic errors, and at that point you might as well hire the person for the job directly.
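The faulty inference above is cheap to refute mechanically, which is the point: a sketch in Python, exhaustively searching small integers for a counterexample to "A&lt;C&lt;B implies A&lt;B&lt;C" (the helper name is mine).

```python
# Check the faulty inference: does A < C < B imply A < B < C?
def implies(p, q):
    # Material implication: p -> q is false only when p holds and q fails.
    return (not p) or q

counterexamples = [
    (a, b, c)
    for a in range(3) for b in range(3) for c in range(3)
    if not implies(a < c < b, a < b < c)
]
# For (A, B, C) = (0, 2, 1): the premise holds (0 < 1 < 2),
# but the conclusion (0 < 2 < 1) fails, so the implication is false.
assert (0, 2, 1) in counterexamples
```

A deterministic program gets this right every single time; a system that gets it wrong 1% of the time can't be trusted with any step of a logical chain.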
A computer can sort an array of numbers faster than probably 99% of the population; that doesn't mean it's useful.
https://gkoberger.github.io/stacksort/
ChatGPT’s program output seems to be basically a smarter version of this, but it ain’t gonna scale to anything truly novel.
I mean, it depends on what we expect the AI to do. Maybe it would be revolutionary to just have, like, an average programmer with a ton of free time (so, the AI only has to beat like 99.7% of humanity to do that). On the other hand, if we want it to change the world by being much better than the average person, I guess we’d need a couple more 9’s.