## “Women are made to be loved, not understood.”

## Oscar Wilde

Imagine you’re at a party making small talk and some guy suggests it’s a well-known fact that intelligent women marry less intelligent men (on average, this is correct). The genius supports his insight with reasons like intelligent women have low self-esteem, or intelligent women want to dominate their partner, and so on. But what actually accounts for the fact?

Daniel Kahneman, in __Thinking, Fast and Slow__, uses this example to illustrate the concepts of regression to the mean and correlation, and to show why we instinctively look for causal explanations when a simple statistical one will do.

Consider performance in sports. Statistically, a golfer who posts a good score on the first day of competition is likely to post a worse score on the second. This is an empirical fact from a sport that collects a lot of data. Similarly, a basketball player who has a breakout season is unlikely to do as well in the next, a phenomenon also known as the ‘Sports Illustrated jinx’. Remember Jeremy Lin? He went from Linsanity to this. With a breakout season comes increasing attention, scrutiny, pressure, demands on your time and maybe complacency. All of these combine to lower the effectiveness of the same player the following season, hence second season syndrome. Or do they?

Kahneman says we think this way because our brains are wired to reach for causal explanations on the one hand, and because we fail to intuitively grasp statistical concepts on the other. In this case the correct explanation for a bad second season is more likely randomness, correlation and regression to the mean.

The correlation between two things is based on the factors they share. Kahneman uses another example to illustrate. Consider weight and piano playing among children. If we say that, at any age, piano playing depends only on weekly hours of practice, and that weight depends only on consumption of ice cream, then we have the following equations:

- weight = age + ice cream consumption
- piano playing = age + weekly hours of practice

Intuitively we understand that if Tom ranks high in weight (relative to his classmates) he is probably older than the class average and probably eats more ice cream than the other kids. Similarly, if Barbara ranks low in piano playing (relative to her classmates) she is probably younger than average and is likely to practice less. And so weight correlates with piano playing because age is common to both. But the correlation is imperfect because other factors also play a role. Tom’s rank for weight doesn’t perfectly predict his rank for piano playing. And ‘*whenever the correlation…is imperfect, there will be regression to the mean*’.
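To make the shared-factor idea concrete, here is a minimal simulation sketch of the two equations above. Everything numeric in it is an assumption made for illustration: age, ice cream consumption and practice are each drawn as independent standard normal scores, and the names `simulate_children` and `pearson` are hypothetical helpers, not anything from the book.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_children(n=20_000, seed=0):
    """Return (weight, piano) lists built from one shared factor, age."""
    rng = random.Random(seed)
    weight, piano = [], []
    for _ in range(n):
        age = rng.gauss(0, 1)        # shared factor (assumed standard normal)
        ice_cream = rng.gauss(0, 1)  # affects weight only
        practice = rng.gauss(0, 1)   # affects piano playing only
        weight.append(age + ice_cream)
        piano.append(age + practice)
    return weight, piano


weight, piano = simulate_children()
r = pearson(weight, piano)
print(f"correlation between weight and piano playing: {r:.2f}")

# Regression to the mean: take the heaviest 10% of children and compare
# their average weight score with their average piano score.
cutoff = sorted(weight)[int(0.9 * len(weight))]
heavy = [(w, p) for w, p in zip(weight, piano) if w >= cutoff]
avg_weight = statistics.fmean(w for w, _ in heavy)
avg_piano = statistics.fmean(p for _, p in heavy)
print(f"heaviest 10%: weight {avg_weight:.2f}, piano {avg_piano:.2f}")
```

With equal weights on the shared and unshared factors, the correlation should come out close to 0.5, and the heaviest children should still be above average at piano, but noticeably less extreme than they are in weight.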

And this is where randomness or luck comes in. Luck influences almost everything to a greater or lesser degree. And so the correlation between two things is almost always imperfect, and randomness is usually the reason. Consider the (admittedly simplified) Jeremy Lin equation:

- Performance in season 1 = luck in season 1 + weekly hours of practice
- Performance in season 2 = luck in season 2 + weekly hours of practice

It’s reasonable to assume that Jeremy put in more practice hours, not fewer, in his second season. This is mostly what professional sports people do, especially intelligent undrafted players who know that they are under scrutiny from headline-seeking sports journalists who are as eager to write the fallen-from-grace story as they were to write the ascension story in the first place. And so if weekly hours of practice did not fall, it is luck that differed between seasons: imperfect correlation from randomness.
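The two-season equation can be played out as a rough simulation. This is only a sketch under stated assumptions: each hypothetical player gets a fixed practice score that is identical in both seasons, luck is drawn independently each season, and both are standard normal. The function names `simulate_seasons` and `breakout_means` are made up for this example.

```python
import random
import statistics


def simulate_seasons(n=20_000, seed=1):
    """Simulate two seasons for n hypothetical players."""
    rng = random.Random(seed)
    season1, season2 = [], []
    for _ in range(n):
        practice = rng.gauss(0, 1)  # assumed identical in both seasons
        season1.append(practice + rng.gauss(0, 1))  # luck in season 1
        season2.append(practice + rng.gauss(0, 1))  # fresh luck in season 2
    return season1, season2


def breakout_means(season1, season2, top_fraction=0.05):
    """Average season 1 and season 2 performance of the season 1 stars."""
    cutoff = sorted(season1)[int((1 - top_fraction) * len(season1))]
    stars = [(a, b) for a, b in zip(season1, season2) if a >= cutoff]
    first = statistics.fmean(a for a, _ in stars)
    second = statistics.fmean(b for _, b in stars)
    return first, second


season1, season2 = simulate_seasons()
first, second = breakout_means(season1, season2)
print(f"breakout players, season 1 average: {first:.2f}")
print(f"same players, season 2 average:     {second:.2f}")
```

Nobody in this simulation gets complacent or buckles under pressure, yet the breakout group still scores lower in season 2. They remain better than the league average, because practice is real, but the exceptional luck that got them into the top group does not repeat.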

And so back to the discussion on why intelligent women marry down. The correlation between the intelligence scores of spouses is imperfect (we know that spouses are not equally intelligent, although men and women on average are), and so *‘it is a mathematical inevitability that highly intelligent women will be married to husbands who are on average less intelligent than they are’*. This is regression to the mean.
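One way to check that no causal story is needed is to simulate couples whose scores are merely correlated. In the sketch below the spousal correlation of 0.4 is a made-up value chosen for illustration, not a figure from the book, both scores are assumed to be standard normal, and `simulate_couples` and `top_group_means` are hypothetical helper names.

```python
import math
import random
import statistics


def simulate_couples(n=20_000, rho=0.4, seed=2):
    """Return (wife, husband) score pairs with correlation close to rho."""
    rng = random.Random(seed)
    couples = []
    for _ in range(n):
        wife = rng.gauss(0, 1)
        # Husband's score shares a fraction rho of the wife's score; the
        # rest is independent, scaled so husbands also have variance 1.
        husband = rho * wife + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        couples.append((wife, husband))
    return couples


def top_group_means(pairs, top_fraction=0.1):
    """Select pairs in the top fraction by first score; average both scores."""
    firsts = sorted(a for a, _ in pairs)
    cutoff = firsts[int((1 - top_fraction) * len(firsts))]
    top = [(a, b) for a, b in pairs if a >= cutoff]
    return statistics.fmean(a for a, _ in top), statistics.fmean(b for _, b in top)


couples = simulate_couples()

# Select on the wives: their husbands are, on average, less extreme.
wife_avg, their_husband_avg = top_group_means(couples)
print(f"top wives: {wife_avg:.2f}, their husbands: {their_husband_avg:.2f}")

# Select on the husbands instead: the same effect appears in reverse.
flipped = [(h, w) for w, h in couples]
husband_avg, their_wife_avg = top_group_means(flipped)
print(f"top husbands: {husband_avg:.2f}, their wives: {their_wife_avg:.2f}")
```

The second comparison is the telling one. The same simulation also produces highly intelligent men married to wives who are on average less intelligent than they are, which is hard to square with any explanation specific to women.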

Still doesn’t feel right? Is your mind still reaching for causal explanations over statistics? If you want to remove the regression effect and isolate a cause, you need to use a control group. Say you want to confirm your intuition that intelligent women marry less intelligent men because intelligent women have lower self-esteem, or whatever. Then you need to compare a group of women selected randomly (the control group) with a group of women who all have low self-esteem, and compare the intelligence gap between the women and their husbands across the two groups. If the gap is greater for the group of women with low self-esteem, that would support self-esteem as a cause.

In statistics the correlation coefficient is a measurement of the relative weight of shared factors. Kahneman describes it as a value between 0 and 1; in general it ranges from -1 to 1, with negative values meaning that one measure tends to fall as the other rises. Perfect correlation = 1. Understanding the correlation coefficient between measures can help us gauge whether our intuition is correct. The lower the correlation coefficient, the less the two measures have in common, and the less likely it is that one serves as an explanation or cause of the other, i.e. the less likely our intuition is right.

So consider randomness, correlation and regression to the mean the next time you find yourself reaching for an explanation. It may simply be that ‘*it is a mathematically inevitable consequence of the fact that luck played a role… Not a very satisfactory story – we would all prefer a causal account – but that is all there is.*’

### RESOURCES

__Thinking, Fast and Slow__ by Daniel Kahneman