I think this just demonstrates how the goalposts are shifting though.
Until pretty recently most people would probably say “the average human is very flexible at solving reasoning tasks compared to machines, which find reasoning incredibly challenging”.
Now it’s “well of course this AI which wasn’t specifically trained for verbal reasoning can beat an average human at verbal reasoning - humans are useless at almost everything!”
Your goalpost seems to be that GPT needs to be better than experts in their field to be considered “good” at something, but I think it’s interesting to reflect that this is the benchmark we are applying now.