The main reason is that they generally work for software companies, where it's easier, and less susceptible to analyst influence, to implement the suggested change and test it with a randomized controlled trial. I remember running an analysis that found that gender was a significant explanatory factor for behavior on our site; my boss asked (dismissively): what can we do with that information? If there is an assumption of how things work that doesn't translate to a product change, that insight isn't useful; if there is a product intuition, testing the product change itself is key, and there's no reason to delay that.
There are cases where RCTs are hard to organize (for example, multi-sided platform businesses) or changes that can't be tested in isolation (major brand changes). Those tend to benefit from the techniques described there, and they have dedicated teams. But this is a classic case of a complicated tool that doesn't fit most use cases.
Every time we wanted to use this for real data, it was just a little too much effort, and the results were not conclusive because it is hard to verify huge graphs. My colleague, for example, wanted to apply it to explain risk confounders in investment funds.
I personally also do not like the definition of causality they base it on.
E.g.: making UI elements jump around unpredictably after a page load may increase the number of ad clicks simply because users can’t reliably click on what they actually wanted.
I see A/B testing turning into a religion where it can’t be argued with. “The number went up! It must be good!”
The default for all code changes at Netflix is they’re A/B tested.
<ShamelessSelfPromotion> I also have a series of blog posts on the topic: https://github.com/DataForScience/Causality where I work through Pearl's Primer: https://amzn.to/3gsFlkO </ShamelessSelfPromotion>
The only problem is that by the time I graduated I was somewhat disillusioned with most causal inference methods. It takes a perfect-storm natural experiment to get any good results. Plus, every 5 years a paper comes out that refutes all previous papers that used whatever method was in vogue at the time.
This article makes me want to get back into this type of thinking though. It’s refreshing after years of reading hand-wavy deep learning papers where SOTA is king and most theoretical thinking seems to occur post hoc, the day of the submission deadline.
Take, for instance, the running example of Catholic schooling's effect on test scores used by the book Counterfactuals and Causal Inference. Subsequent chapters re-treat this example with increasingly sophisticated techniques and more complex assumptions about causal mechanisms, and each time they uncover a flaw in the analysis done with the previous chapters' techniques.
My lesson from this: the outcomes of causal inference are very dependent on assumptions and methodologies, of which the options are many. This is a great setting for publishing new research, but it's the opposite of what you want in an industry setting, where the bias is (or should be) towards methods that are relatively quick to test, validate, and put into production.
I see researchers in large tech companies pushing for causal methodologies, but I'm not convinced they're doing anything particularly useful, since I have yet to see convincing validation on production data showing their methods beat simpler alternatives, which tend to be more robust.
This seems like a natural feature of any sensitive method, not sure why this is something to complain about. If you want your model to always give the answer you expected you don't actually have to bother collecting data in the first place, just write the analysis the way pundits do.
I am aware of three reputable causal inference frameworks:
1. Judea Pearl's framework, which dominates in CS and AI circles
2. Neyman-Rubin causal model: https://en.wikipedia.org/wiki/Rubin_causal_model
3. Structural equation modelling: https://en.wikipedia.org/wiki/Structural_equation_modeling
None of them would acknowledge each other, but I believe the underlying methodology is the same/similar. :-)
It's good to see that it is becoming more accepted, especially in medicine, as it will provide more, potentially life-saving, information for making decisions.
In the social sciences, on the other hand, causal inference is being completely and willfully ignored. Why? Causal inference is an obstacle to reaching preconceived conclusions based on pure correlations: something correlates with something, therefore ... invest large sums of money, change laws in our favor, etc. This works for both sides. Sadly, I don't think this can be fixed.
This remark is totally ignorant of the reality in the social sciences. Certainly in economics (which I know well) this hasn't described the reality of empirical work for more than 30 years. Political Science and Sociology are increasingly concerned with causal methods as well.
Medicine on the other hand is the opposite. Medical journals generally publish correlations when they aren't publishing experiments.
This conflicts with what the article says:
> Economists and social scientists were among the first to recognize the advantages of these emerging causal inference techniques and incorporated in their research.
The DID literature, for instance, has been expanding at the speed of light -- it has never been so hard to keep up as it is now.
However, experiments are usually expensive: they require investing in building the feature in question and then collecting data for 1-4 weeks before being certain of the effects (plus there are long-term ones to worry about). Some companies report that fewer than 50% of their experiments prove truly impactful (my experience as well). That's why only a small number of business decisions are made using experiments today.
Observational causal inference offers another approach, trading off full confidence in causality for speed and cost. It has been pretty hard to run correctly so far, so it is not widely adopted. We are working on changing that with Motif Analytics and wrote a post with an in-depth exploration of the problem: https://www.motifanalytics.com/blog/bringing-more-causality-... .
https://ftp.cs.ucla.edu/pub/stat_ser/r513.pdf
> Abstract: Personalized decision making targets the behavior of a specific individual, while population-based decision making concerns a sub-population resembling that individual. This paper clarifies the distinction between the two and explains why the former leads to more informed decisions. We further show that by combining experimental and observational studies we can obtain valuable information about individual behavior and, consequently, improve decisions over those obtained from experimental studies alone.
It's an important part of validating that your data-driven output or decision is actually creating the change you hope for. So many fields either do poor experimentation or none at all, others are prevented from doing the usual "full unrestricted RCT": med and fin svcs and other regulated industries have legal constraints on what they can experiment with; in other cases, data privacy restricts the measures one can take.
I've had many data folks throw up their hands if they can't do a full RCT and instead turn to pre-post comparisons with lots of methodological errors. You can guess how those projects end up. (No, not every change needs a full test, and some things are easy to roll back. But think of how many others would have benefited from some uncertainty reduction.)
Sure, "LLM everything" and "just gbm it!" and "ok, just need a new feature table and I'm done!" are all important and fun parts of a data science day. But if I can't show that a data driven decision or output makes things better, then it's just noise.
Causal modeling gets us there. It improves the impact of ml models that recognize the power of causal interventions, and it gives us evidence that we are helping (or harming).
It's (IMO) necessary but, of course, not sufficient. Lots of other great things are done by ML eng and data scientists and data eng and the rest, having nothing to do with causal inference... But I keep thinking how much better things get when we apply a causal lens to our work.
(And next on my list would be having more data folks understand slowly changing dimension tables, but that can wait for another time.)
Biologists, if not data scientists, are used to considering indirect evidence for causality. It's why we sometimes accept studies performed in other organisms as evidence for biology in humans; it's why we sometimes accept research performed on post-mortem human tissue as representative of the biology of living humans; to name but a few examples. A big part of a compelling high-impact biology (or bioinformatics) paper is often the innovative ways one comes up with to show causality when a direct RCT is not feasible, and papers are frequently rejected because they don't do the follow-up experiments required to show causality.
But there are a slew of laws and requirements around _how_ to run an RCT across the world of bio-related work, esp as it becomes a product. From marketing to manufacture to packaging, there are strict limits around where variation is allowed, at least anything involving the FDA in the US. (Some would say too many regs, others say not enough).
And in those cases, having a wider collection of ways to impute cause would be great.
If my understanding is right, this means that each model has to be hand-crafted, adding significant technical debt to complex systems, and we can't get ahead of the assessment. And yet, it's probably the only way forward for viable AI governance.
To be clear, you can overfit even while your validation loss keeps improving. If your train and test data are too similar, then no holdout will help you measure generalization. You have to remember that datasets are proxies for the thing you're actually trying to model; they are not the thing itself. You can usually see this when testing on in-class but out-of-train/test-distribution data (e.g. data from someone else).
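Fwiw, a minimal sketch of that failure mode (simulated data; sklearn assumed, the whole setup invented): near-duplicate rows from the same "subject" leak across a naive train/test split, so the holdout score looks great even though nothing generalizes to genuinely new data.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # 100 "subjects", each contributing 10 near-identical rows.
    # Labels are pure per-subject noise: there is nothing to learn.
    base_X = rng.normal(size=(100, 5))
    base_y = rng.normal(size=100)
    X = np.repeat(base_X, 10, axis=0) + rng.normal(scale=0.01, size=(1000, 5))
    y = np.repeat(base_y, 10)

    # A naive row-level split leaks subjects into both halves.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
    print("leaky holdout R^2:", model.score(X_te, y_te))   # looks impressive

    # Genuinely unseen subjects: the score collapses to ~0 (or below).
    print("new-subject R^2:", model.score(rng.normal(size=(200, 5)),
                                          rng.normal(size=200)))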
You have to be careful, because there are a lot of small and non-obvious things that can fuck up statistics. There are a lot of aggregation "paradoxes" (Simpson's, Berkson's), and all kinds of things that can creep in. This gets more perilous the bigger your model is, too. The story of the Monty Hall problem is a great example of how easy it is to get the wrong answer while it seems like you're doing all the right steps.
For the article, the author is far too handwavy with causal inference. The reason we tend not to do it is that it is fucking hard and it scales poorly. Models like autoregressive models (careful here) and normalizing flows can do causal inference (and causal discovery), fwiw; essentially you need explicit density models with tractable densities (referring to Goodfellow's taxonomy). But things get funky as you get a lot of variables, because there are indistinguishable causal graphs (see Hyvärinen and Pajunen). Then there are the issues with the types of causality (see Pearl's Ladder of Causation), and counterfactual inference is FUCKING HARD, but the author just acts like it's no big deal.

Then he starts conflating it with weaker forms of causal inference. Correlation is the weakest form of causal evidence, despite our often-chanted "correlation does not equate to causation" (which is still true; correlation is just in the class, and the saying is more getting at confounding variables). This very much does not scale. Similarly, discovery won't scale, as you have to permute so many variables in the graph. The curse of dimensionality hits causal analysis HARD.
Insofar as causal inference has no such 'check', it's because there never was any. Causal inference is about dispelling that illusion.
Aye, and that's the issue I'm trying to understand: how to know whether model 1 or model 2 is more "real" or, for lack of a better term, more useful and reflective of reality?
We can focus on a particular philosophical point, like parsimony / Occam's razor, but as far as I can tell that isn't always sufficient.
There should be some way to determine a model's likelihood of structure beyond "trust me, it works!" If there is, I'm trying to understand it!
Most data scientists work for companies that don't really want to pay for controlled experiments outside of maybe letting the UI team do A/B tests. Natural experiments can be hard to come by in a business setting. And all of the wild mathematical gyrations that econometricians and political scientists have developed to try to do causal inference from correlational data have a tendency not to be as popular in business because, outside of some special domains such as politics and consumer finance, it can be rather difficult to get away with dressing your emperor in math that nobody can understand instead of actual clothing.
Does anyone know similar challenges/competitions?
ACIC links to years I could find:
- 2016: https://arxiv.org/abs/1707.02641
- 2017: https://arxiv.org/abs/1905.09515
- 2019: https://sites.google.com/view/acic2019datachallenge/data-cha...
Say I have a simple table of outdoor temperatures and ice cream sales.
What can the machinery of causal inference do for me in this situation?
If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?
If I can use causal inference, what can it tell me? If I think of it as a function CA(data), can it tell me if the relationship is actually causal? Can it tell me the direction of the relationship? If there were more columns, could it return a graph of causal relationships and their strength? Or do I need to provide that graph to this function?
I know a wet pavement can be caused by rain or spilled water or that an alarm can go off due to an earthquake or a burglary. I have common sense. I also understand the basics of graph traversal from comp sci classes.
How do I practically use causal inference?
To the authors of future articles on this (or any technical tutorial), please explain the essence, the easy path, then the caveats and corner cases. Only then will abstract philosophizing make sense.
Not much. Causal inference works over networks of variables, specifically a DAG. But usually you know more than one variable association, so this is more an issue of pedagogy than the tool itself.
Probably the shortest, most persuasive example I can give you is a logical resolution to Simpson's Paradox: when the correlation between two variables can change depending on whether you consider a third variable or not.
The classic example is gender discrimination in college admissions. When looking at admissions rates across the entire university, women are less likely to be accepted than men. But when (in this example) you break that down into departments, every department favors women over men. This is a paradoxical contradiction, and worrying in that your science is only as good as the dimensions your data captures. Worse, the data offers no clean way to say which is the correct answer: the aggregated or the disaggregated view. Statisticians stumbled on this for a long while, and it's kind of wild that we were able to declare that smoking causes cancer without a resolution to it.
Pearl wrote a paper on how Bayesian approaches resolve the paradox[1], but it does presume familiarity with terms like "colliders," "backdoor criterion" and "do-calculus." His main point is that causal inference techniques give us the language and tools to resolve the paradox that frequentist approaches do not.
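For what it's worth, the reversal is easy to reproduce with made-up counts (toy numbers, not the actual Berkeley data):

    # (admitted, applied) per department and gender -- invented numbers
    data = {
        "easy dept": {"women": (9, 10),   "men": (80, 100)},  # 90% vs 80%
        "hard dept": {"women": (10, 100), "men": (1, 20)},    # 10% vs  5%
    }

    # Per department, women are ahead in both:
    for dept, by_gender in data.items():
        for gender, (adm, app) in by_gender.items():
            print(dept, gender, f"{adm / app:.0%}")

    # Pooled across departments, the ordering flips:
    for gender in ("women", "men"):
        adm = sum(d[gender][0] for d in data.values())
        app = sum(d[gender][1] for d in data.values())
        print("overall", gender, f"{adm / app:.0%}")          # women 17%, men 68%

Women lead in both departments yet trail overall, purely because of where the applications went.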
The reversal can only happen because of how applications are distributed across departments. What happened in the Berkeley case is that not every department favored women, and women applied disproportionately to the departments with lower admissions rates (including some that didn't favor them), while men did the opposite.
You need to do that, and the math can help you measure how much each arrow contributes. The idea that you need to provide your model of the world is strangely not a key part of most introductions, but it’s crucial.
> outdoor temperatures and ice cream sales
That’s too simple: a simple regression can handle it. Causal inference starts to matter with three variables, assuming you provide an interaction graph. Say your ice cream truck goes either to a fancy neighborhood or a working-class plaza. You decide where to go after observing the weather, so you know that wealth and weather influence sales, but sales can’t influence the other two. Assuming you have data for all four cases (sunny/poor, sunny/rich, rainy/poor, rainy/rich), you can separate the two effects.
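A minimal sketch of that separation (simulated data, numpy/pandas assumed; graph and effect sizes invented): weather confounds the location/sales relationship, and stratifying on weather, i.e. the backdoor adjustment, recovers the true location effect.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 100_000
    sunny = rng.random(n) < 0.5                       # weather
    rich = rng.random(n) < np.where(sunny, 0.8, 0.2)  # you chase the sun to the fancy spot
    sales = 10 + 5 * sunny + 2 * rich + rng.normal(size=n)  # true location effect = +2
    df = pd.DataFrame({"sunny": sunny, "rich": rich, "sales": sales})

    # Naive comparison mixes the weather effect in:
    naive = df[df.rich].sales.mean() - df[~df.rich].sales.mean()   # ~ +5, too big

    # Backdoor adjustment: compare within each weather stratum,
    # then average over the marginal weather distribution.
    strata = df.groupby(["sunny", "rich"]).sales.mean().unstack()
    p_sunny = df.sunny.mean()
    adjusted = ((strata.loc[True, True] - strata.loc[True, False]) * p_sunny
                + (strata.loc[False, True] - strata.loc[False, False]) * (1 - p_sunny))
    print(round(naive, 2), round(adjusted, 2))   # naive biased, adjusted ~ 2.0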
Not quite. Regression by itself will not answer the causal (or equivalently, the counterfactual) question.
I strongly suspect you already know this and were elaborating on a related point. But just for the sake of exposition, let me add a few words for the HN audience at large.
Let me give an example. In an email corpus, mails that begin with "Honey sweetheart," will likely have a higher than baseline open rate. A regression on word features will latch on to that. However, if your regular employer starts leading with "Honey sweetheart" that will not increase the open rate of corporate communications.
Causal or counterfactual estimation is fundamentally about how a dependent variable responds to interventional changes in a causal variable. Regression and relatedly, conditional probabilities are about 'filtering' the population on some predicate.
An email corpus when filtered upon the opening phrase "Honey sweetheart" may have disproportionately high email open rates, but that does not mean that adding or adopting such a leading phrase will increase the open rate.
Similarly, regressing dark hair as a feature against skin cancer propensity will catch an anti-correlation effect. Dyeing blonde hair dark will not reduce melanoma propensity.
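A minimal simulated sketch of that last point (numpy only; the mechanism and numbers are invented): an unobserved common cause makes hair color predictive of the outcome, but intervening on hair color changes nothing.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000
    sun_sensitivity = rng.random(n)                 # hidden common cause
    dark_hair = rng.random(n) > sun_sensitivity     # sensitive skin -> lighter hair
    cancer = rng.random(n) < 0.1 * sun_sensitivity  # only sensitivity matters

    # Conditioning: dark hair "predicts" lower cancer rates.
    print(cancer[dark_hair].mean(), cancer[~dark_hair].mean())   # ~0.033 vs ~0.067

    # Intervening: do(dark_hair = True) for everyone. Hair has no causal
    # arrow into cancer, so the rate stays at the population baseline.
    dark_hair[:] = True
    cancer_after = rng.random(n) < 0.1 * sun_sensitivity
    print(cancer_after.mean())                                   # ~0.050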
Interestingly, folks are finally doing more realistic experiments in the causal equivalent of architecture search, and genAI is giving these efforts a second wind. It still feels like it's at the toy stage, or for academics and researchers with a lot of time on their hands, rather than relevant for most data scientists.
I'm still on the sidelines, but I keep checking in, in case it's finally practical for our users.
You have more than that! You have knowledge about the world!
> What can the machinery of causal inference do for me in this situation?
Well (I’m being purposefully pedantic here), you haven’t really asked a question yet. The first thing it can do is help you while you’re formulating one. It can answer questions like: how will the things I have and haven’t measured affect the estimates I’m interested in making?
> If it doesn’t apply here, what do I need to add to my dataset to make it appropriate for causal inference? More columns of data? Explicit assumptions?
The first thing you need to do is articulate what you’re actually interested in. Then you need to be explicit about the causal relationships between the things relevant to those questions. The big thing (to me) is that particular causal structures have testable conditional independence structures, and by assessing these you can build evidence for or against particular diagrams of the context.
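A sketch of what "testable conditional independence" means in practice (numpy only, simulated data): a chain X -> Z -> Y and a collider X -> Z <- Y both show pairwise correlations, but they make opposite predictions about X and Y once you condition on Z.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000

    def partial_corr(x, y, z):
        # correlate the residuals of x and y after regressing out z
        rx = x - np.polyval(np.polyfit(z, x, 1), z)
        ry = y - np.polyval(np.polyfit(z, y, 1), z)
        return np.corrcoef(rx, ry)[0, 1]

    # Chain: X -> Z -> Y. X and Y correlate, but are independent given Z.
    x = rng.normal(size=n)
    z = x + rng.normal(size=n)
    y = z + rng.normal(size=n)
    print(np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))   # ~0.58, then ~0

    # Collider: X -> Z <- Y. X and Y are independent, until you condition on Z.
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    z = x + y + rng.normal(size=n)
    print(np.corrcoef(x, y)[0, 1], partial_corr(x, y, z))   # ~0, then ~-0.5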
You need to perform a properly controlled experiment to infer causality. And even then it's hard.
Inferring causality from observational data is cargo cult science.
How's the ice cream example better than the sugary snacks example given in the article?
Here's the part about needing to add more columns to the data:
> When dealing with a causal question, it’s crucial to include variables known as confounders. These are variables that can influence both the treatment and the outcome. By including confounding variables, we can better isolate and estimate the true causal effect of the treatment. Failing to add or account for confounding variables may lead to incorrect estimates.
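For what it's worth, the textbook version of "accounting for confounders" is the backdoor adjustment. With treatment T, outcome Y, and confounders Z:

    P(Y | do(T = t)) = sum over z of P(Y | T = t, Z = z) * P(Z = z)

That is: compare outcomes within each confounder stratum, then average the strata by how common they are overall, not by how common they are among the treated.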
Not the OP, but because that fails to explain how the basic hypothetical example works(!)
You want to know what your sales would be in a parallel world where kids were stuck with bland snacks, compared to your sweet treats. This is where causal inference steps in to provide the solution. (nice graph follows)
So how is that done?
The simple version using graphical models and joint probabilities isn't difficult to explain or teach. The issue is that to do anything useful with it at scale you need either MCMC or variational inference, and that's an entirely different can of worms altogether. For medical datasets you rarely have "scale"; instead you have very few sample cases and a large expert model (the doctor/specialist).
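The wet-pavement example from upthread makes a decent sketch of that simple version (made-up probabilities): factor the joint along the DAG and answer queries by brute-force enumeration -- it's exactly this enumeration that stops scaling, which is where MCMC and variational inference come in.

    from itertools import product

    # DAG: rain -> wet <- sprinkler (parameters invented)
    p_rain, p_sprinkler = 0.2, 0.3

    def p_wet(rain, sprinkler):
        return 0.99 if (rain or sprinkler) else 0.01

    def joint(rain, sprinkler, wet):
        # joint factorizes along the graph: P(r) * P(s) * P(w | r, s)
        p = (p_rain if rain else 1 - p_rain)
        p *= p_sprinkler if sprinkler else 1 - p_sprinkler
        pw = p_wet(rain, sprinkler)
        return p * (pw if wet else 1 - pw)

    # P(rain | pavement is wet), by enumerating the joint
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    print(num / den)   # ~0.45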
And if you like it, the 2nd part is here [2]
Just... don't do this. You're not going to be able to math your way to better conclusions. Make your model, make your plots, and use critical thinking to evaluate your results.
I think it confuses far more than it helps.
https://matheusfacure.github.io/python-causality-handbook/la...
But I agree with other comments here, at the end of the day it seems like causal analysis often boils down to whether you trust the analyst and/or their techniques since it is hard to validate the results.
It's also open source. Feel free to give it a try.
It's an illustrative example, taking it literally is missing the point. The reason you know it doesn't make sense for umbrellas to cause rain is that you already have an applicable causal model. The situations where you need to do causal inference are exactly those where you don't, and can't just rely on "reasonableness" or "plausibility".
It’s mathematical soup for trying to normalize out the effects of other variables to see what remains and calling it “causal”.