Chi-squared distribution
In probability theory and statistics, the chi-squared distribution (also chi-square or $\chi^2$-distribution) with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables.
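A minimal simulation sketch of this definition (using NumPy and SciPy; the degrees of freedom, sample size, and seed below are illustrative assumptions, not part of the article):

```python
import numpy as np
from scipy import stats

# The sum of squares of k independent standard normal draws should
# follow a chi-squared distribution with k degrees of freedom.
k = 5                                   # illustrative degrees of freedom
rng = np.random.default_rng(0)
z = rng.standard_normal((100_000, k))   # 100,000 rows of k standard normals
samples = (z ** 2).sum(axis=1)          # each row: Z_1^2 + ... + Z_k^2

# The theoretical mean is k and the variance is 2k.
print(samples.mean(), samples.var())    # ~5 and ~10 for k = 5

# A Kolmogorov-Smirnov test against chi2(k) should not reject.
print(stats.kstest(samples, stats.chi2(df=k).cdf))
```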
The chi-squared distribution $\chi^2_k$ is a special case of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim \chi^2_k$ then $X \sim \text{Gamma}(\alpha = k/2, \theta = 2)$ (where $\alpha$ is the shape parameter and $\theta$ the scale parameter of the gamma distribution) and $X \sim \text{W}_1(1, k)$.
The scaled chi-squared distribution $s^2 \chi^2_k$ is a reparametrization of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim s^2 \chi^2_k$ then $X \sim \text{Gamma}(\alpha = k/2, \theta = 2 s^2)$ and $X \sim \text{W}_1(s^2, k)$.
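As a numerical sanity check of these identities, a short sketch with SciPy (scipy.stats.gamma uses the shape-scale parametrization via its a and scale arguments; the values of k, s², and the evaluation points are illustrative):

```python
from scipy import stats

k = 7       # degrees of freedom (illustrative)
x = 3.0     # point at which to compare densities

# chi2(k) coincides with Gamma(shape = k/2, scale = 2): the two pdfs agree.
print(stats.chi2.pdf(x, df=k))
print(stats.gamma.pdf(x, a=k / 2, scale=2))

# A scaled chi-squared Y = s2 * X, with X ~ chi2(k), matches
# Gamma(shape = k/2, scale = 2 * s2); its density at y is chi2.pdf(y/s2, k) / s2.
s2, y = 4.0, 10.0
print(stats.chi2.pdf(y / s2, df=k) / s2)
print(stats.gamma.pdf(y, a=k / 2, scale=2 * s2))
```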
The chi-squared distribution is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals.[2][3][4][5] This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution.
The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in finding the confidence interval for estimating the population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.
The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.
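The F-distribution relationship mentioned above can be illustrated by simulation (a sketch; the degrees of freedom, sample size, and seed are arbitrary choices for the example):

```python
import numpy as np
from scipy import stats

d1, d2 = 5, 12                        # degrees of freedom of the two chi-squared variables
rng = np.random.default_rng(1)
u = rng.chisquare(d1, size=200_000)   # U ~ chi2(d1)
v = rng.chisquare(d2, size=200_000)   # V ~ chi2(d2), independent of U

# (U/d1) / (V/d2) should follow an F-distribution with (d1, d2) degrees of freedom.
f = (u / d1) / (v / d2)
print(stats.kstest(f, stats.f(dfn=d1, dfd=d2).cdf))   # large p-value expected
```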
Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed sample:
- If $Z_1, \ldots, Z_k$ are independent standard normal random variables, then the sum of squares $\sum_{i=1}^{k} Z_i^2$ is distributed as $\chi^2_k$.
- If $X_1, \ldots, X_n$ are a sample from a $N(\mu, \sigma^2)$ population and $s^2$ is the sample variance, then $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
The chi-squared distribution is also often encountered in magnetic resonance imaging.[19]
Computational methods
Table of χ² values vs p-values
The p-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, i.e., sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and non-significant results.
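A minimal sketch of this computation with SciPy (the observed statistic and degrees of freedom below are hypothetical):

```python
from scipy import stats

chi2_stat = 11.07   # hypothetical observed test statistic
df = 5              # degrees of freedom of the reference distribution

# p-value = 1 - CDF; the survival function sf computes this directly,
# which is also more accurate for very small tail probabilities.
p_value = stats.chi2.sf(chi2_stat, df)
print(p_value)      # ~0.05, i.e. borderline at the conventional 0.05 level
```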
The table below gives a number of p-values matching to χ² for the first 10 degrees of freedom.
These values can be calculated by evaluating the quantile function (also known as the "inverse CDF" or "ICDF") of the chi-squared distribution;[21] e.g., the χ² ICDF for p = 0.05 and df = 7 yields 2.1673 ≈ 2.17 as in the table above, noticing that 1 − p is the p-value from the table.
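The same table entry can be reproduced with SciPy's quantile function (a sketch; scipy.stats.chi2.ppf is the ICDF referred to above):

```python
from scipy import stats

# ICDF (quantile function) of the chi-squared distribution
# at p = 0.05 with df = 7 degrees of freedom.
x = stats.chi2.ppf(0.05, df=7)
print(round(x, 4))   # 2.1673, the table entry whose p-value is 1 - 0.05 = 0.95
```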
History
This distribution was first described by the German geodesist and statistician Friedrich Robert Helmert in papers of 1875–6,[22][23] where he computed the sampling distribution of the sample variance of a normal population. Thus in German this was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".
The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson's chi-squared test, published in 1900, with a computed table of values published in (Elderton 1902) and collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII).
The name "chi-square" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter Chi, writing −½χ2 for what would appear in modern notation as −½xTΣ−1x (Σ being the covariance matrix).[24] The idea of a family of "chi-squared distributions", however, is not due to Pearson but arose as a further development due to Fisher in the 1920s.[22]