Featured image: A comic strip explaining the difference between correlation and causation. Image source: xkcd.com
As noted in this classic science comic, people often think correlation implies causation. And this is probably every scientist’s pet peeve — it is definitely one of mine.
Let’s talk correlation first
Correlation is a relationship in which two things, A and B, vary in a similar pattern, as in the following examples:
- Global temperature correlates with carbon dioxide in the atmosphere
- Health and social problems correlate with income inequality
- Divorce rate in Maine correlates with per capita consumption of margarine
- Per capita consumption of mozzarella cheese correlates with civil engineering doctorates awarded
You probably raised your eyebrows when reading the last two examples. So what does a correlation actually mean?
On its own, it means nothing, because it is quite easy to find two factors that show similar patterns when there are a million factors out there (check this great website for more spurious correlations). Often, we also find two factors that appear linked to each other when, in fact, both are linked to a common third factor. See these news articles below:
- Sincere smiling promotes longevity. Or does reduced stress?
- Watching too much TV can kill you (early). Or does lack of exercise?
- Credit cards can make you fat. Or does increased junk food spending and consumption?
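The common-third-factor effect is easy to reproduce with made-up numbers. The sketch below is a toy simulation, not real data: it assumes a hypothetical world in which exercise alone drives both TV watching (down) and lifespan (up), with all names and effect sizes invented for illustration. TV hours and lifespan end up correlated, yet the correlation disappears once the effect of exercise is subtracted from both.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, zs):
    """What is left of ys after removing the best straight-line fit on zs."""
    n = len(ys)
    mean_y = sum(ys) / n
    mean_z = sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Assumed (hypothetical) model: exercise is the only real cause.
exercise = [rng.gauss(0, 1) for _ in range(n)]
tv_hours = [-e + rng.gauss(0, 1) for e in exercise]  # more exercise, less TV
lifespan = [e + rng.gauss(0, 1) for e in exercise]   # more exercise, longer life

raw = pearson(tv_hours, lifespan)
adjusted = pearson(residuals(tv_hours, exercise), residuals(lifespan, exercise))

print(f"TV vs lifespan, raw correlation: {raw:.2f}")
print(f"TV vs lifespan, exercise accounted for: {adjusted:.2f}")
```

In this toy world TV never touches lifespan directly, so the raw correlation is entirely borrowed from exercise.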
You get the picture. Unfortunately, the examples are not always this ludicrous. For example, a correlation has been claimed between autism and vaccines. Believing such spurious correlations, in spite of the overwhelming evidence against them, can do serious damage.
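How easy is it to stumble on a strong correlation by chance alone, as with margarine and divorce? Here is a minimal sketch, assuming 200 hypothetical "factors" that are nothing but independent random numbers, each observed over ten "years". None of them has anything to do with any other, yet searching all pairs still turns up a striking match.

```python
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable
n_factors = 200         # hypothetical unrelated factors (pure noise)
n_years = 10            # ten yearly observations per factor

factors = [[rng.gauss(0, 1) for _ in range(n_years)] for _ in range(n_factors)]

# Compare every factor with every other one and keep the strongest match.
best_r, best_pair = max(
    (abs(pearson(factors[i], factors[j])), (i, j))
    for i, j in combinations(range(n_factors), 2)
)

print(f"Pairs compared: {n_factors * (n_factors - 1) // 2}")
print(f"Strongest correlation found: |r| = {best_r:.2f} "
      f"(factor {best_pair[0]} vs factor {best_pair[1]})")
```

With nearly twenty thousand pairs to choose from, the best match looks impressive even though every series is noise. The more factors you compare, the more such coincidences you will find.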
Moving on to causation
Causation is what many journalists intend to show when they show correlation. It is the causal relationship between two things. That is, A causes an effect B. Examples include:
- Smoking causes lung cancer
- Human activity is responsible for global warming
- Weight training makes you stronger
Two scientists, Koch and Dale, listed criteria which, if fulfilled, would prove causation. Here is a combined and simplified version of them:
A → M → B
A is the agent which causes the effect B. M is the mediator which is formed by A, and which leads to B. The following must happen for causation to be proved:
- If formation of mediator M is blocked, then there should be no effect B.
- If action of mediator M is blocked, there should be no effect B.
- Mediator M is formed only in response to agent A.
- If mediator M is given, it has the same effect B.
In the above examples, A is smoking, human activity and weight training; B is lung cancer, global warming and increased strength; and M is the presence of harmful chemicals, increased carbon dioxide and increased muscle mass.
However, mediator M need not be formed only in response to agent A. For example, lung cancer occurs in non-smokers too, especially in those who have been continually exposed to radon gas or asbestos.
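The four checks above can be written out as a tiny truth-table model. This is only an illustrative sketch: the function run_chain and its switches (block_formation, block_action, give_mediator, other_source) are hypothetical names invented here, and real biology is never this clean.

```python
def run_chain(agent, block_formation=False, block_action=False,
              give_mediator=False, other_source=False):
    """Toy A -> M -> B chain. Returns (mediator_present, effect_occurs)."""
    formed_by_agent = agent and not block_formation
    mediator = formed_by_agent or give_mediator or other_source
    effect = mediator and not block_action
    return mediator, effect

checks = {
    "A alone leads to M and B":
        run_chain(agent=True) == (True, True),
    "blocking formation of M prevents B":
        run_chain(agent=True, block_formation=True) == (False, False),
    "blocking action of M prevents B":
        run_chain(agent=True, block_action=True) == (True, False),
    "without A there is no M and no B":
        run_chain(agent=False) == (False, False),
    "giving M directly produces B":
        run_chain(agent=False, give_mediator=True) == (True, True),
}

for description, passed in checks.items():
    print(f"{description}: {'yes' if passed else 'no'}")

# The caveat: another source (say, radon) can form M without A, so the
# "M is formed only in response to A" check fails even though A does cause B.
print("M and B without A, via another source:",
      run_chain(agent=False, other_source=True))
```

The last line shows why the third criterion is the fragile one: the chain from A to B is intact, but A is no longer the only route to M.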
When could correlation imply causation?
Another scientist, Hill, listed criteria which, if fulfilled, would imply a causal relationship between two factors that are correlated. This is helpful when there are no experimental data or when effects are caused indirectly, due to the interaction of multiple factors. The criteria are:
- Strong correlation between the agent (A) & effect (B) (though a weak association does not preclude causation)
- Consistent correlation between A & B (same results shown by different people in different places)
- Specific correlation between A & B (only A seems to cause B)
- B occurs only after exposure to A
- Typically, greater exposure to A results in a higher incidence of B (though the opposite could also happen)
- A plausible way (mechanism) through which A causes B (though such data are not always available)
- Similar correlation shown between A & B in the lab and in the population (though lack of lab, i.e., experimental, evidence does not mean the association doesn’t exist in the population)
- If experiments could be done, they should show that A leads to B
In other words, it is not easy to prove causation, especially in our complex world. Think about this the next time you are asked to believe in a causal relationship when all you have been shown is a correlation.