How do you make an objective statement about how well GPT-4 does logical reasoning?
Running benchmarks seems like a reasonable way to do it. The benchmark results are the objective statements, and they're the main result of the paper.
You can make objective statements by benchmarking, but a benchmark score only means something relative to other scores: to conclude that something is performing poorly, you need a reference point to benchmark it against.
Benchmarking is comparative (that’s the whole point), so without something to compare against, the conclusions aren’t actually backed up by the paper.