Bayesian reasoning is a natural mode of inference for people. The whole concept of the black swan ("all swans are white") bears this out.
Frequentist statistics is much less intuitive to people.
My preference is for people to be able to use some statistics, and Bayesian gets them productive faster.
Bayesian statistics gets a big boost because it's usually taught as a system instead of as a recipe book.
The problem is that events we perceive as random and extremely unlikely are in fact much more probable than Gaussian methods would estimate. And the frequentist approach helps create this distortion by ignoring black swans.
Here's a great video demonstrating how people tend to misunderstand randomness: https://youtu.be/tP-Ipsat90c
As for what is more natural… I've seen a (frequentist) introduction to statistics, and it simply did not make sense. Nothing was justified, you just had to learn the stuff by rote and apply it in situations that look like they could use one tool or another.
Probability theory on the other hand is pretty obvious. The axioms required to derive it are ridiculously few and ridiculously intuitive. From there you get the sum and product rules, and all the rest. Always made perfect sense to me.
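For reference, the few rules the comment alludes to can be written out; everything else in probability theory is derived from them:

```latex
% Sum rule: probability of A or B
P(A \lor B) = P(A) + P(B) - P(A \land B)
% Product rule: joint probability via conditioning
P(A \land B) = P(A \mid B)\, P(B)
% Bayes' theorem follows directly from the symmetry of the product rule
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```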
On the subject of statistical education: The point I tried to make is that I think it is much easier to first study the likelihood, the central quantity of frequentist inference. One can then go to the Bayesian world simply by allowing the parameters to be random variables. Furthermore, as other commenters have pointed out, technical difficulties arise in the non-conjugate Bayesian setting when MCMC sampling has to be used. In my opinion, MCMC algorithms, convergence diagnostics, etc. are certainly not topics for an intro stats course.
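That progression can be sketched in a few lines (the data here are made up): maximize the binomial likelihood first, then let the parameter be a random variable with a conjugate Beta prior, and the same likelihood yields a posterior.

```python
from scipy import stats

# Frequentist step: binomial likelihood, maximized by the sample proportion.
k, n = 7, 10              # made-up data: 7 successes in 10 trials
mle = k / n               # argmax of the binomial likelihood

# Bayesian step: treat the parameter as a random variable with a Beta prior.
# Conjugacy: Beta(a, b) prior + binomial likelihood -> Beta(a + k, b + n - k).
a, b = 1, 1               # uniform prior
posterior = stats.beta(a + k, b + n - k)

print(f"MLE: {mle:.3f}")
print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: {posterior.interval(0.95)}")
```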
Having used Bayesian stats heavily, I'd note that the hard parts are not gone, they are just located elsewhere - in how to actually do the computations, rather than in how to set up problems. Each can be taught poorly or well, but given that MCMC is certainly harder than least-squares, it seems difficult to argue that using Bayesian statistics is easier. (Unless you're not just applying the methods by rote, and letting the computer spit out answers - and if you are, I don't know why you are better off with Bayesian methods. In fact, if that's what you're doing, please stop doing statistics and pay an expert instead.)
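To make the "the hard parts moved into the computation" point concrete, here is a minimal random-walk Metropolis sampler for a coin's bias (a sketch with made-up data): even this toy case needs proposals, an accept/reject step, and burn-in, compared with the one-line conjugate answer it should agree with.

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 7, 10  # made-up data: 7 heads in 10 tosses

def log_post(theta):
    """Log posterior under a uniform prior (up to a constant)."""
    if not 0.0 < theta < 1.0:
        return -np.inf
    return k * np.log(theta) + (n - k) * np.log(1.0 - theta)

# Random-walk Metropolis: propose a step, accept with probability min(1, ratio).
samples, theta = [], 0.5
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.1)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    samples.append(theta)
post_mean = np.mean(samples[2000:])  # discard burn-in

# The conjugate answer (mean of Beta(1 + k, 1 + n - k)) for comparison:
exact = (1 + k) / (2 + n)
print(f"MCMC mean: {post_mean:.3f}, exact: {exact:.3f}")
```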
Starting with applied probability and applied statistics (incl. regression, ANOVA, GLMs) allows you to solve problems and feel useful and engaged before being thrown into the mathematical rigor required of Bayesian statistics.
I just want to add a bit more. It's quite easy today to generate and play with random numbers. If you think you understand the process that generated your data, simulate it and run the simulated data through the same analysis. I do this for real -- I don't trust myself to choose the right statistical analysis, so I always test my chosen analysis with simulated data. If I can fool myself with simulated data, then my real data is probably fooling me too.
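That habit is easy to sketch. Assuming the chosen analysis is a two-sample t-test (an illustrative choice), simulating data under the null many times should flag significance at roughly the nominal 5% rate; if it doesn't, the analysis is fooling you.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate the null hypothesis: both groups drawn from the same distribution.
n_sims, alpha, false_positives = 2000, 0.05, 0
for _ in range(n_sims):
    a = rng.normal(0, 1, size=30)
    b = rng.normal(0, 1, size=30)   # identical population: no true effect
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1

rate = false_positives / n_sims
print(f"False-positive rate: {rate:.3f} (nominal: {alpha})")
```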
Could we, for instance, collect enough data on typing discipline to settle the static/dynamic typing debate once and for all? Enough data to overcome the priors of both static typing and dynamic typing proponents?
We could, but that would require pretty big sample sizes. Like 10,000 developers of varying competence, working on 1,000 projects of various domains and difficulties for various amounts of time (from a few days to at least a few months). Who is ever going to fund that?
Until we get such a miracle controlled study, our respective priors will still matter.
Then we promptly switch back to p-values of .05, a lot of the time not even bothering with a statistical power calculation. I've had better success with introducing power, though. I suspect that's because we can fit it into the existing frequentist framework.
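Power fits into the same simulation habit, for what it's worth. A sketch (effect size and sample size are assumed for illustration; the textbook result is that about 64 per group gives roughly 80% power to detect a standardized effect of 0.5 at alpha = .05):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulated_power(effect_size, n_per_group, alpha=0.05, n_sims=2000):
    """Estimate two-sample t-test power by simulating under the alternative."""
    hits = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, size=n_per_group)
        b = rng.normal(effect_size, 1.0, size=n_per_group)  # true effect
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sims

power = simulated_power(effect_size=0.5, n_per_group=64)
print(f"Estimated power: {power:.2f}")
```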
This drives me nuts. If you haven't, check out the paper "Beyond subjective and objective in statistics" by Gelman and Hennig (2017).
Right at the beginning they make the point that any analysis includes external information in many ways, such as adjusting variables for imbalance, how we deal with outliers, regularization, etc.
Especially if you're doing any sort of causal inference, you're usually making strong assumptions before estimating your model, even just in terms of which variables are included and how they're connected. The idea that priors are somehow ruining an "objective" model is just absurd to me. You're already making so many other decisions about your model that will affect estimates and your interpretation of them. Priors seem like another perfectly reasonable decision to have to make as well, with the benefit of producing results that I think are in general much more easily understood by a lay audience. (E.g., I don't think I've ever encountered someone not on my data science team who actually understands what a p-value is. But people are much better at understanding when I say there's an X percent chance that there is a positive effect here.)
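The "X percent chance of a positive effect" statement falls straight out of the posterior. A sketch with a conjugate normal model (the prior and data values are made up for illustration):

```python
from scipy import stats

# Made-up inputs: an observed effect estimate with its standard error,
# plus a weakly informative normal prior centered at zero.
estimate, se = 2.0, 1.0
prior_mean, prior_sd = 0.0, 10.0

# Normal-normal conjugate update: precisions add, means are precision-weighted.
prior_prec, data_prec = 1 / prior_sd**2, 1 / se**2
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * estimate) / post_prec
post_sd = post_prec ** -0.5

# Probability that the effect is positive, read directly off the posterior.
p_positive = stats.norm.sf(0.0, loc=post_mean, scale=post_sd)
print(f"P(effect > 0) = {p_positive:.1%}")
```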
Another issue that I personally have with Bayesianism is that I believe that assigning probabilities to singular events is only meaningful and admissible at all if there is a good analytic explanation for the respective propensity. For example, we may be able to deduce that a die is reasonably fair from the way it is constructed and our knowledge of physics, and later confirm this by frequentist analysis. Merely believing or claiming that the die is fair is not acceptable. Again, the difference is only one of attitude in the end, I suppose.
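The "confirm this by frequentist analysis" step could look like a chi-square goodness-of-fit test on observed roll counts (the counts here are made up for illustration):

```python
from scipy import stats

# Made-up counts for faces 1-6 over 60 rolls of the die.
observed = [10, 9, 11, 10, 12, 8]

# Chi-square goodness-of-fit against a uniform (fair-die) expectation;
# scipy's default expected frequencies are the uniform mean of the counts.
result = stats.chisquare(observed)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```

A large p-value here means the counts are consistent with fairness, not proof of it.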
Maybe philosophers have given Bayesian statistics a bad rap, too, because many of those who call themselves Bayesians are also "probabilists", i.e., they think that rational belief must conform to the probability calculus. There are many arguments against probabilism and the only arguments that speak for it are Dutch book arguments. The view does not have very strong foundations.
I think some caution is justified to a certain extent (not the blind "emotional" objections). When establishing priors in a low-data regime, one must necessarily be careful: the prior is a knob that can shift the conclusions of the inference considerably. That said, if we trust our beliefs about the regions the available data inform poorly, why not use that domain knowledge?
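Prior sensitivity in a low-data regime is easy to demonstrate. With only five observations (made-up data, and priors chosen purely for illustration), two reasonable-looking Beta priors pull the posterior mean noticeably apart:

```python
from scipy import stats

k, n = 4, 5  # made-up data: 4 successes in 5 trials

# Same likelihood, two different priors; conjugate update is
# Beta(a, b) -> Beta(a + k, b + n - k).
for name, (a, b) in {"flat Beta(1, 1)": (1, 1),
                     "skeptical Beta(10, 10)": (10, 10)}.items():
    posterior = stats.beta(a + k, b + n - k)
    print(f"{name}: posterior mean = {posterior.mean():.3f}")
```

With more data the two posteriors would converge, which is exactly why the knob matters most when data are scarce.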
How many coin tosses in a row have to land heads before a frequentist decides that the coin is unfair?
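For what it's worth, under a standard two-sided binomial test at alpha = .05 the answer can be computed directly (a sketch):

```python
from scipy import stats

# Smallest run of all-heads that a two-sided binomial test at
# alpha = .05 calls significant against a fair coin.
n = 1
while stats.binomtest(n, n, p=0.5).pvalue >= 0.05:
    n += 1
print(f"{n} heads in a row")  # for k = n the two-sided p-value is 2 * 0.5**n
```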
I think they are pretty close to the way sane people answer some kinds of questions.
https://onlinelibrary.wiley.com/doi/book/10.1002/97811192863...
Jaynes is certainly very deep and some sections are harder than others. It's interesting regardless of your level (this is a book worth rereading several times).
For a less technical but insight-rich introduction, see Dennis Lindley's Understanding Uncertainty:
https://onlinelibrary.wiley.com/doi/book/10.1002/97811186501...
[1]: https://www.amazon.com/Bayesian-Methods-Hackers-Probabilisti...
But really, the first two chapters aren't that hard.
My engineer friend called my PhD friend a "frequentist", like it was a dirty word, despite only having one, maybe two, classes in college about Bayesian math/statistics/whatever (my ignorance).
This quote jumped out at me in the article:
"I wanted to write a book on Bayesian statistics that really anyone could pick up and use to gain real intuitions for how to think statistically and solve real problems using statistics."
In the context of the statement, it sounds like he is claiming that any non-Bayesian statistics is useless, or at best less valuable/reliable than other forms of statistical analysis?