It does not tend towards magic. It does happen and people can replicate it. Melanie Mitchell recently revived Drew McDermott's point that AI researchers tend to use "wishful mnemonics". Words like "understanding" or "generalisation" can easily be just wishful mnemonics. I fully agree with that.
But the fact remains: a model with ~100% training accuracy and ~0% validation accuracy on a simple but non-trivial dataset can, with continued training, reach ~100% accuracy on both.
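For concreteness, the setting where this was reported is small algorithmic datasets such as modular addition: the data is finite and governed by one simple rule. A minimal sketch of how such a dataset could be built and split (the function name, prime, and split ratio are my own choices, not from any particular paper):

```python
import itertools
import random

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """Build the full table of (a, b) -> (a + b) mod p and split it.

    With a small prime p the dataset is finite and fully determined by
    one simple rule -- the regime where grokking has been observed.
    """
    pairs = [((a, b), (a + b) % p)
             for a, b in itertools.product(range(p), repeat=2)]
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(train_frac * len(pairs))
    return pairs[:cut], pairs[cut:]

train, val = modular_addition_dataset()
print(len(train), len(val))  # 4704 4705 for p=97
```

A network memorises the training half quickly; the interesting part is that, long after that, validation accuracy can jump to ~100% as well.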
> This hints to the fact that the effect depends on the structure of the dataset and so it's unlikely to, well, generalise to data that cannot be strictly controlled.
Indeed, but it is still interesting. It may manifest because a very simple rule underlies the dataset and the dataset is finite. But it also seems to survive some degree of noise, and that's encouraging.
For example, it may help study the connection between wide, flat local minima and generalisation.
> You are impressed by the fact that one particular, counter-intuitive result was obtained
I'm impressed by the double descent phenomenon as well. And that one shows up all over the place.
> There is a well-known paper by John Ioannidis on cognitive biases in medical research: Why most published research findings are false
I know about John Ioannidis. I have written and thought a lot about the replication crisis in science in general. BTW, it's quite a pity that Ioannidis himself started selecting data to fit his thesis with regard to COVID-19.
> It's not about machine learning per sé, but its observations can be applied to any field where empirical studies are common, like machine learning.
Unfortunately, it applies to theoretical findings too. For example, the universal approximation theorem, the no-free-lunch theorem, and the incompleteness theorems are widely misunderstood. Countless lesser-known theoretical results are similarly misunderstood.