While I share a peer comment's view that the Turing Test is meaningless, I would add that even the test itself
has not been meaningfully beaten.
In particular, we redefined the test to make it passable. In Turing's original concept, a competent investigator and the participants were all actively expected to collude against the machine. The entire point was that, even with that collusion, the machine would be able to pass. Instead, modern takes have paired incompetent investigators with participants who collude with the machine, probably in an effort to be part 'of something historic'.
In "both" successes (there are probably more; I'm referencing the two most high-profile ones, Eugene and the large LLMs), the interrogators consistently asked pointless questions that had no meaningful chance of yielding compelling information: 'How's your day?', 'Do you like psychology?', etc. The participants not only made no effort to make their humanity clear, but were often actively adversarial, obviously and intentionally answering such simple questions illogically, inappropriately, or in a 'computery' way. And the tests are typically time constrained, with woefully poor typing skills (is this the new normal in the smartphone gen?) limiting things to the point that you tend to get anywhere from 1-5 interactions of a few words each.
The problem with any metric is that it often ends up being gamed so that it can be beaten, and this is a perfect example of that.