Even though it is highly economically valuable to be able to tell to what extent you can trust a prediction, the modelling of uncertainty in ML pipelines remains an academic affair in my experience.
When I moved to industry and tried to bring an "uncertainty mindset" with me, I found that: (1) most DS/ML scientists use models that don't provide an easy way to estimate uncertainty intervals; (2) in the industry I was in (media), the people who make decisions, and who use model predictions as one input to their decision-making, are typically not very quantitative, so an uncertainty interval, rather than strengthening their process, would mostly confuse them: they want a "more or less" estimate, not a "more or less, plus something more and something less" estimate; (3) when services are customer-facing (see ride-sharing), providing an uncertainty interval ("your car will arrive in 9 to 15 minutes") anchors the customer to the lower estimate (they do this for the price of rides booked in advance, and they need to, but they are often way off).
So for many ML applications, an uncertainty interval that nobody internally or externally would base their decision upon is just a nuisance.
most DS/ML scientists use ML models that typically don't provide an easy way to estimate uncertainty intervals
Not a DS/ML scientist but a data engineer. The models I've used have been pretty much "slap it into XGBoost with k-fold CV, call it done" — an easy black box. Is there any model or approach you like for estimating uncertainty with similar ease?
I've seen uncertainty interval / quantile regression done using XGBoost, but it isn't out of the box. I've also been trying to learn some Bayesian modeling, but definitely don't feel handy enough to apply it to random problems needing quick answers at work.
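For what it's worth, quantile regression doesn't have to mean hand-rolling an XGBoost objective: scikit-learn's gradient boosting exposes a quantile loss directly. A minimal sketch on synthetic data (the data and the 5%/95% choice are just illustration; recent XGBoost versions have a similar `reg:quantileerror` objective, but the sklearn API is the one I'm sure of):

```python
# Quantile regression: fit one model per quantile to get an interval.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=500)  # noisy synthetic target

# The 5th and 95th percentile models together give a ~90% prediction interval.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)

X_new = np.array([[2.0], [5.0]])
lo, hi = lower.predict(X_new), upper.predict(X_new)
print(lo, hi)  # per-point lower and upper bounds
```

Caveat: the two models are fit independently, so quantiles can occasionally cross; in practice people sort or clip the bounds.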
One is bigger than the other, as far as I remember, which means the standard error of the prediction interval is bigger?
The question is: Why don’t people use these models? While Bayesian Neural Networks might be tricky to deploy & debug for some people, Gaussian Processes etc. are readily available in sklearn and other implementations.
My theory: most people do not learn these methods in their "Introduction to Machine Learning" classes. Or is it a lack of scalability in practice?
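To the "readily available in sklearn" point: a Gaussian Process really is a few lines there, and it gives you a predictive standard deviation for free. A minimal sketch on made-up data (the kernel choice is just a reasonable default, not a recommendation):

```python
# Gaussian Process regression with built-in uncertainty via return_std=True.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.2, size=100)

kernel = RBF() + WhiteKernel()  # WhiteKernel accounts for observation noise
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

X_new = np.array([[3.0], [20.0]])  # 20.0 is far outside the training range
mean, std = gp.predict(X_new, return_std=True)
# std is larger where the model has no data, which is exactly the point
```

The scalability objection is real, though: exact GP inference is O(n³) in the number of training points, which is why it rarely survives contact with industrial-sized datasets without sparse approximations.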
To be fair, I suspect lots of people do this, but for whatever reason nobody talks about it.
Only speaking from my own little perspective in bioinformatics, lack of scalability above all else, both for BNNs and GPs.
Sure, the library support could be better, but that was not the main hurdle, more a source of friction.
In traditional nonparametric statistics, uncertainty estimates are obtained by a process called bootstrapping. But there's a trade-off (there's no free lunch!): if you want to eschew strong distributional hypotheses, you need to pay for it with more data and more compute. The "more compute" typically involves fitting variants of the model in question to many resamples of the original dataset. In deep learning applications, where each fit of the model is extremely expensive, this is impractical.
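The idea above fits in a few lines: resample the training set with replacement, refit, and look at the spread of the predictions. A sketch with a deliberately cheap model (linear regression, synthetic data) to make the "many refits" loop affordable:

```python
# Bootstrap uncertainty: refit the model on many resamples of the data
# and take percentiles of the resulting predictions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(0, 1.0, size=200)

X_new = np.array([[5.0]])
preds = []
for _ in range(200):  # 200 refits: this is the "more compute" cost
    idx = rng.integers(0, len(X), size=len(X))  # resample with replacement
    model = LinearRegression().fit(X[idx], y[idx])
    preds.append(model.predict(X_new)[0])

lo, hi = np.percentile(preds, [2.5, 97.5])  # 95% interval for the prediction
```

Swap `LinearRegression` for a deep net and the loop becomes 200 full training runs, which is exactly why this is impractical there.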