Spurious relationship
In statistics, a spurious relationship or spurious correlation[1][2] is a mathematical relationship in which two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor (referred to as a "common response variable", "confounding factor", or "lurking variable").
Examples
An example of a spurious relationship can be found in the time-series literature, where a spurious regression is one that provides misleading statistical evidence of a linear relationship between independent non-stationary variables. In fact, the non-stationarity may be due to the presence of a unit root in both variables.[3][4] In particular, any two nominal economic variables are likely to be correlated with each other, even when neither has a causal effect on the other, because each equals a real variable times the price level, and the common presence of the price level in the two data series imparts correlation to them. (See also spurious correlation of ratios.)
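The spurious-regression effect described above can be illustrated with a small simulation. The sketch below is not taken from the cited literature: the helper names (`pearson`, `random_walk`, `share_large`), the sample sizes, and the 0.5 threshold are all illustrative assumptions. It generates pairs of independent random walks (series with a unit root) and counts how often their sample correlation is large in levels, compared with the same series after first-differencing.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def random_walk(n, rng):
    """Cumulative sum of independent standard-normal steps (a unit-root series)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def diff(series):
    """First differences, which remove the unit root from a random walk."""
    return [b - a for a, b in zip(series, series[1:])]


def share_large(n_pairs=500, n_obs=200, threshold=0.5, seed=0):
    """Share of independent pairs with |r| above the threshold,
    in levels and in first differences (all parameters are illustrative)."""
    rng = random.Random(seed)
    big_levels = big_diffs = 0
    for _ in range(n_pairs):
        x, y = random_walk(n_obs, rng), random_walk(n_obs, rng)
        if abs(pearson(x, y)) > threshold:
            big_levels += 1
        if abs(pearson(diff(x), diff(y))) > threshold:
            big_diffs += 1
    return big_levels / n_pairs, big_diffs / n_pairs


levels_share, diffs_share = share_large()
print(f"share with |r| > 0.5 in levels:      {levels_share:.2f}")
print(f"share with |r| > 0.5 in differences: {diffs_share:.2f}")
```

Under these assumptions, a sizeable fraction of the independent pairs should look strongly correlated in levels, while almost none do once the series are differenced. The two walks are independent by construction, so any large correlation in levels here is spurious.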
Another example of a spurious relationship can be seen by examining a city's ice cream sales. The sales might be highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice versa, would be to imply a spurious relationship between the two. In reality, a heat wave may have caused both. The heat wave is an example of a hidden or unseen variable, also known as a confounding variable.
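The role of the confounding variable can be sketched numerically. In the simulation below, every coefficient, noise level, and variable name is a made-up assumption rather than real data: temperature drives both a stylized ice cream sales figure and a stylized drowning index, and neither affects the other. The raw correlation between the two is compared with their partial correlation once temperature is held fixed, using the standard first-order partial correlation formula.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


rng = random.Random(1)
n_days = 2000

# Hypothetical data-generating process: temperature is the only common cause.
temperature = [rng.gauss(25.0, 5.0) for _ in range(n_days)]
ice_cream = [10.0 * t + rng.gauss(0.0, 30.0) for t in temperature]
drowning_index = [0.2 * t + rng.gauss(0.0, 1.0) for t in temperature]

raw = pearson(ice_cream, drowning_index)
controlled = partial_corr(ice_cream, drowning_index, temperature)
print(f"raw correlation:               {raw:.2f}")
print(f"partial correlation given temp: {controlled:.2f}")
```

In this setup the raw correlation should be clearly positive, while the partial correlation should be close to zero, which is what one would expect if the association runs entirely through the shared cause.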
Another commonly noted example is a series of Dutch statistics showing a positive correlation between the number of storks nesting in a series of springs and the number of human babies born at that time. Of course there was no causal connection; they were correlated with each other only because they were correlated with the weather nine months before the observations.[5]
In rare cases, a spurious relationship can occur between two completely unrelated variables without any confounding variable, as was the case between the success of the Washington Commanders professional football team in a specific game before each presidential election and the success of the incumbent President's political party in said election. For 16 consecutive elections between 1940 and 2000, the Redskins Rule correctly matched whether the incumbent President's political party would retain or lose the Presidency. The rule eventually failed shortly after Elias Sports Bureau discovered the correlation in 2000; in 2004, 2012 and 2016, the results of the Commanders' game and the election did not match.[6][7][8] In a similar spurious relationship involving the National Football League, in the 1970s, Leonard Koppett noted a correlation between the direction of the stock market and the winning conference of that year's Super Bowl, the Super Bowl indicator; the relationship maintained itself for most of the 20th century before reverting to more random behavior in the 21st.[9]
Hypothesis testing
Often one tests a null hypothesis of no correlation between two variables, and chooses in advance to reject the hypothesis if a correlation at least as large in magnitude as the one computed from the data sample would have occurred in less than (say) 5% of data samples if the null hypothesis were true. A true null hypothesis will then be retained 95% of the time, but in the other 5% of cases a true null of no correlation will be wrongly rejected, leading to acceptance of a correlation that is spurious (an event known as a Type I error). Here the spurious correlation in the sample results from the random selection of a sample that does not reflect the true properties of the underlying population.
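The 5% false-rejection rate can be checked by simulation. The sketch below makes several assumptions that are not in the text above: the data are independent standard-normal draws, the test of zero correlation uses the Fisher z-transformation as a large-sample approximation, and the function names and sample sizes are illustrative.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rejects_zero_correlation(x, y, alpha=0.05):
    """Two-sided test of zero correlation via the Fisher z approximation:
    atanh(r) * sqrt(n - 3) is roughly standard normal under the null."""
    n = len(x)
    z = math.atanh(pearson(x, y)) * math.sqrt(n - 3)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical


def type_i_error_rate(n_samples=2000, n_obs=50, alpha=0.05, seed=42):
    """Fraction of samples of truly uncorrelated data in which the null is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        if rejects_zero_correlation(x, y, alpha):
            rejections += 1
    return rejections / n_samples


rate = type_i_error_rate()
print(f"share of true nulls rejected at the 5% level: {rate:.3f}")
```

Because the two variables are generated independently, every rejection counted here is a Type I error, and the reported share should be close to the chosen significance level.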
There are several other relationships defined in statistical analysis as follows.