“Correlation Does Not Imply Causation” is not as simple as it looks.

This year I gave lectures on introductory statistics to three different classes. One of my favorite lectures was Correlation and Linear Regression, because it allowed me to talk to students about more practical applications of the usually very mystical concepts of statistics.

In all three lectures I talked about the “correlation vs causation” dilemma, and in all the exams I included the following question: “A group of scientists find out that the rate of use of product X has a strong correlation with the rate of hairlessness in people. Is this result useful to say whether product X causes loss of hair? What arguments or experiments could you use to determine if X and loss of hair have a causal relationship?”

It is a very open-ended question. I wanted to pick my students’ brains and gauge how much of the class they had assimilated, more than I wanted an exact answer with an exact score (there were other questions for that). The results were very interesting.

As for the first part of the question, a very large number of students in all classes misunderstood the meaning of the word “useful”. They would answer that the correlation does not prove the causation, which is correct, but is not what I was asking. Of course correlation does not prove causation; the question itself states that! Still, it is useful information, because if the correlation were 0, we could exclude a direct causation right off the bat.

As for the second part of the question, many, many students suggested tests with control groups to check whether the causation really exists. This shows a fundamental misunderstanding of either what control groups are really about, or what correlation really means. If you have already established a strong correlation between X and hair loss, the logical conclusion is that a control group that is not using X will not show hair loss. The control group will simply confirm the correlation we already know about, and teach us nothing about the causality (or lack thereof) between X and hair loss.

This is not exactly an easy question (even though it uses no numbers and asks for no formulas!), but since I said they could use arguments instead of experiments, they could suggest different candidates for X, with and without a causal relationship with hair loss, and compare them. For instance, they could say that X could be either an anti-hair-loss product (and the causal relation would actually be inverted in this case!) or a shampoo with a bad chemical balance (and in this case there would be a causal relation). The “experiment” in this case would be to actually investigate X and how it relates to hair loss in chemical/biological terms, and not something as simple as “Do a Z-test/double blind test/whatever”.
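To make the two candidate stories concrete, here is a small simulation sketch in Python. Everything in it is made up for illustration: the function names, the 0.8/0.2 weights, and the sample size are assumptions, not data from any study. It also reads the anti-hair-loss case as “people who are already losing hair are the ones who use X”, which is only one way to interpret it. The point is that both stories can produce the same strong correlation.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def bad_shampoo(n, rng):
    """Hypothetical story 1: using X drives hair loss (X -> hair loss)."""
    use = [rng.random() for _ in range(n)]
    loss = [0.8 * u + 0.2 * rng.random() for u in use]
    return use, loss


def anti_hair_loss_product(n, rng):
    """Hypothetical story 2: hair loss drives the use of X (hair loss -> X)."""
    loss = [rng.random() for _ in range(n)]
    use = [0.8 * h + 0.2 * rng.random() for h in loss]
    return use, loss


rng = random.Random(0)  # fixed seed so the run is repeatable
r_shampoo = pearson(*bad_shampoo(1000, rng))
r_remedy = pearson(*anti_hair_loss_product(1000, rng))

print(f"bad shampoo:            r = {r_shampoo:.2f}")
print(f"anti-hair-loss product: r = {r_remedy:.2f}")
```

Both runs print a strong positive correlation, even though the arrow of causality points in opposite directions. The number alone cannot tell the two stories apart; knowing what X actually is can.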

Things to keep in mind for the next time around…
