One of the fundamental things any economist or statistician knows (or should know) is that correlation does not equal causation.¹ The phrase gets thrown around so often it has become a buzzword, but the difference between the two cannot be emphasized enough.
Oftentimes, hot new studies get passed around that seem to vindicate one's political views. It's easy to look at a study or statistic that says "individuals who own dogs tend to live longer" and conclude, "I should get a dog". But what if dog owners are living longer because happier and more active people tend to own dogs? Those individuals would have lived longer anyway. Without a control group and similar starting backgrounds, a study comparing these two groups (dog owners vs. non-dog owners) would likely be inaccurate. Of course, it doesn't really matter if this particular study causes people to buy more dogs, but many policy decisions and political ideologies are based on seemingly sound data.
Noah Smith just discussed the correlation vs. causation debate for Bloomberg View with a great example of a bad study:
"The NMP study finds that people who wait to have sex later tend to have higher marital quality. That’s a correlation. Does it mean that if you choose to wait longer to have sex, you will have higher marital quality? Not necessarily!
For example, suppose that there is a group of people in the NMP study sample who had very neglectful parents, or who came from broken homes. These people might tend to have sex earlier in their relationships, because their parents didn't educate them about the dangers of STDs, pregnancy, etc. And suppose that these people also tend to have bad marriages, because their parents didn’t show them a good example. Even if this only represents a small subset of the people in the study, it could drive the entire result.
In this case, the omitted variable -- bad parents -- isn't something you can control. That is crucial."
Here Noah brings up one of the core issues with correlations. Oftentimes an omitted variable drives the relationship between two things, but people treat the relationship as though one thing causes the other. I notice this a lot when it comes to issues regarding different socioeconomic classes. Two examples of omitted variables:
1) [Insert race here] tends to be [richer/poorer]. This must mean they are [more/less] deserving and [harder/lazier] workers.
In this case, the omitted variables are likely family background, education quality, and social bias towards that race. Without considering the effect omitted variables might have on socioeconomic outcomes, one can end up sounding pretty ignorant!
2) A recent study touts that running just 5 minutes a day can extend one's life:
"Running, even 5 to 10 min/day and at slow speeds <6 miles/h, is associated with markedly reduced risks of death from all causes and cardiovascular disease. This study may motivate healthy but sedentary individuals to begin and continue running for substantial and attainable mortality benefits."
There is no doubt in my mind that exercise has very clear effects on life expectancy, but in this case we must consider: are the people who run living longer because they live a healthier lifestyle, or because they run? My inclination is to say that those who run are much more likely to also eat well and take better care of themselves, which would cause them to live longer/be healthier anyway.
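The running example can be made concrete with a toy simulation. Everything here is made up (the hidden "health-consciousness" trait, the probabilities, the lifespans) and running is given exactly zero causal effect, yet runners still come out ahead in the data because the hidden trait drives both behaviors:

```python
import random

random.seed(0)

# Toy model: a hidden "health-consciousness" trait drives BOTH running
# and lifespan. Running itself has zero causal effect in this simulation,
# yet runners still show longer average lifespans in the resulting data.
n = 10_000
runners, non_runners = [], []
for _ in range(n):
    health_conscious = random.random() < 0.3            # hidden omitted variable
    runs = health_conscious and random.random() < 0.8   # the health-conscious mostly run
    lifespan = 75 + (8 if health_conscious else 0) + random.gauss(0, 5)
    (runners if runs else non_runners).append(lifespan)

avg = lambda xs: sum(xs) / len(xs)
print(f"runners:     {avg(runners):.1f} years")
print(f"non-runners: {avg(non_runners):.1f} years")
```

A naive reading of this output would credit running with several extra years of life, when by construction the entire gap comes from the omitted variable.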
So, as you can see, examining correlation and causation can be as simple as taking a critical look at the facts you hear. The next time you come across a study or surprising fact, think to yourself: how could this relationship be influenced by other variables? Are the two variables even related? (see below for some funny correlations that are decidedly not causations)
A few more problems with studies include reverse causality and selection bias. Reverse causality is when two things are related, but the causal arrow runs in the opposite direction from the one people assume. Say there's a correlation between membership in Krispy Kreme's rewards program and obesity. You might say, "oh, being a member of Krispy Kreme's rewards program makes people obese, we should end this program". In reality, individuals who eat a lot of donuts (and in turn have higher obesity rates) would be more likely to get a card than the casual once-a-year donut eater.
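The Krispy Kreme story can be sketched the same way. All the numbers below are invented: donut eating drives both weight gain and signing up for the card, so the causal arrow runs toward membership, not from it; but a naive comparison of members vs. non-members reads the arrow backwards:

```python
import random

random.seed(1)

# Toy model: heavy donut eating drives BOTH card membership and weight
# gain. Membership has no causal effect on weight here, but members
# still look heavier, inviting the backwards "the card causes obesity"
# conclusion.
n = 10_000
members, non_members = [], []
for _ in range(n):
    donuts_per_week = random.choice([0, 0, 0, 1, 5, 10])  # most people eat few
    joins_program = donuts_per_week >= 5                  # heavy eaters sign up
    weight_gain = donuts_per_week * 0.8 + random.gauss(0, 2)
    (members if joins_program else non_members).append(weight_gain)

avg = lambda xs: sum(xs) / len(xs)
print(f"members gain:     {avg(members):.1f} lb")
print(f"non-members gain: {avg(non_members):.1f} lb")
```

Note that the raw correlation is completely symmetric: the data alone cannot tell you whether membership drives weight or weight (via donut habits) drives membership, which is exactly why the direction of causation has to come from reasoning outside the statistics.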
Reverse causality is a big issue in places with little access to scientific or medical education - take the Ebola outbreak, where some local communities fought doctors because they believed the doctors were causing Ebola and their loved ones' deaths. In their minds, people start dying from Ebola when the doctors arrive, so the doctors must be causing Ebola. It makes sense from their perspective, but it is a very dangerous example of reverse causality.
The final issue one might encounter with studies is selection bias. Again citing Noah Smith,
"this is when your sample is not chosen randomly. In the case of this NMP study, it’s a serious flaw. The NMP study concludes that people who had a child before their current marriage are less likely to have a successful marriage. But those children didn’t arrive randomly out of thin air. They represent the existence of a previous failed relationship!...
The NMP study selects a sample of people who have demonstrated a tendency to break up -- i.e., people who have had a kid in the past with another partner -- and compares them to a random sample of the population. It very well might be the case that having experience with child-rearing, or with breakups, actually helps you form a stable relationship in the future, all else being equal. But because of selection bias, the NMP study would still tell you the opposite."
Unless a study has similar control and test groups and/or diverse respondents representing an accurate cross-section of the group in question, the conclusions cannot be considered valid.
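Noah's critique of the NMP study can also be simulated. In this sketch (again, every number is hypothetical), marriage stability depends only on a latent "breakup-proneness" trait, and having a child from a prior relationship does nothing causally; it merely flags people who already had a relationship fail. The selected group still looks worse:

```python
import random

random.seed(2)

# Toy version of the selection-bias critique: marriage stability depends
# ONLY on a latent "breakup-proneness" trait. A prior child has no causal
# effect; it simply marks people whose high proneness already ended an
# earlier relationship.
n = 10_000
with_prior_kid, without_prior_kid = [], []
for _ in range(n):
    proneness = random.random()                      # latent trait in [0, 1)
    prior_kid = proneness > 0.7 and random.random() < 0.5
    stable = random.random() > proneness             # stability driven by trait alone
    (with_prior_kid if prior_kid else without_prior_kid).append(stable)

rate = lambda xs: sum(xs) / len(xs)
print(f"stable marriages, prior kid:    {rate(with_prior_kid):.0%}")
print(f"stable marriages, no prior kid: {rate(without_prior_kid):.0%}")
```

The study design compares a group selected for past breakups against everyone else, so even a trait with zero causal effect on the current marriage produces a large gap in the output.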
So, next time you hear about an interesting study, examine what else there could be to the conclusions! Before you know it, you (like me) will be unable to hear a fact without thinking about the potential omitted variables, selection bias, and reverse causality.²
¹For comedic relief, imagine a group of angry statisticians chanting this.
²If you want to save mental energy, disregard this article and continue accepting all the facts you hear. You may sound stupid, but critically examining things sure can get exhausting!