Yeah, but, better at _what_?
Cars are dramatically faster today than 100 years ago. But they still can't fly.
Similarly, LLMs performing better on synthetic benchmarks does not demonstrate that they will eventually become superintelligent beings that will replace humanity.
If you want to actually measure that, then these benchmarks need to start asking questions that demonstrate superintelligence: "Here is a corpus of all current research on nuclear physics, now engineer a hydrogen bomb." My guess is, we will not see much progress.