
Chi-squared distribution

In probability theory and statistics, the chi-squared distribution (also chi-square or $\chi^2$-distribution) with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables.
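Equivalently, in symbols (the density shown is the standard form of the chi-squared pdf, stated here for reference): if $Z_1, \ldots, Z_k$ are independent standard normal random variables, then

    Q = \sum_{i=1}^{k} Z_i^2 \sim \chi^2(k),
    \qquad
    f(x; k) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \, \Gamma(k/2)},
    \quad x > 0.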


Notation: $\chi^2(k)$ or $\chi^2_k$

Parameters: $k \in \mathbb{N}_{>0}$ (known as "degrees of freedom")

Support: $x \in (0, +\infty)$ if $k = 1$, otherwise $x \in [0, +\infty)$[1]

The chi-squared distribution is a special case of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim \chi^2(k)$ then $X \sim \text{Gamma}(\alpha = \tfrac{k}{2}, \theta = 2)$ (where $\alpha$ is the shape parameter and $\theta$ the scale parameter of the gamma distribution) and $X \sim \text{W}_1(1, k)$.


The scaled chi-squared distribution $s^2 \chi^2_k$ is a reparametrization of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim s^2 \chi^2_k$ then $X \sim \text{Gamma}(\alpha = \tfrac{k}{2}, \theta = 2 s^2)$ and $X \sim \text{W}_1(s^2, k)$.
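As a quick numerical sanity check of these two reparametrizations (a minimal sketch, not from the original article, assuming NumPy and SciPy are available):

    import numpy as np
    from scipy import stats

    k, s2 = 5, 3.0                     # degrees of freedom, scale factor s^2
    x = np.linspace(0.1, 30.0, 200)

    # chi^2(k) has the same density as Gamma(shape = k/2, scale = 2)
    assert np.allclose(stats.chi2.pdf(x, df=k),
                       stats.gamma.pdf(x, a=k / 2, scale=2))

    # s^2 * chi^2_k has the same density as Gamma(shape = k/2, scale = 2 s^2);
    # the density of X = s^2 * Y is f_Y(x / s^2) / s^2
    assert np.allclose(stats.chi2.pdf(x / s2, df=k) / s2,
                       stats.gamma.pdf(x, a=k / 2, scale=2 * s2))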


The chi-squared distribution is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals.[2][3][4][5] This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution.


The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in finding the confidence interval for estimating the population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.

Tests based on the chi-squared distribution include:

Chi-squared test of independence in contingency tables

Chi-squared test of goodness of fit of observed data to hypothetical distributions

Likelihood-ratio test for nested models

Log-rank test in survival analysis

Cochran–Mantel–Haenszel test for stratified contingency tables

Wald test

Score test
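As a minimal illustration of the first two items in this list (a hedged sketch with made-up counts, using SciPy's scipy.stats.chisquare):

    from scipy import stats

    # Toy counts from 120 hypothetical die rolls; under the null hypothesis
    # of a fair die, each face is expected 120 / 6 = 20 times.
    observed = [22, 17, 19, 25, 16, 21]

    result = stats.chisquare(observed)  # defaults to uniform expected counts
    print(result.statistic)             # 2.8 = sum((O - E)^2 / E) with E = 20
    print(result.pvalue)                # ~0.73 on 5 df: no evidence against fairness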

If $X \sim \chi^2(k)$ then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k - 1}$ and unit variance (1922, by R. A. Fisher, see (18.23), p. 426 of Johnson[4]).[16]
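A quick Monte Carlo check of this approximation (an illustrative sketch, not part of the source, using NumPy's chi-squared sampler):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    k = 50
    y = np.sqrt(2 * rng.chisquare(df=k, size=100_000))

    print(y.mean())   # ~ 9.95, close to sqrt(2k - 1) = sqrt(99)
    print(y.std())    # ~ 1.0, i.e., approximately unit variance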

As $k \to \infty$, $(\chi^2_k - k)/\sqrt{2k} \xrightarrow{d} N(0, 1)$ (normal distribution).

$\chi^2_k \sim {\chi'}^2_k(0)$ (noncentral chi-squared distribution with non-centrality parameter $\lambda = 0$).

If $Y \sim F(\nu_1, \nu_2)$ then $X = \lim_{\nu_2 \to \infty} \nu_1 Y$ has the chi-squared distribution $\chi^2_{\nu_1}$.

If $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ random variables, then $\sum_{i=1}^{n}(X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
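The following sketch (illustrative, assuming NumPy) checks this empirically: the sum of squared deviations from the sample mean averages to $\sigma^2 (n - 1)$, the mean of $\sigma^2 \chi^2_{n-1}$:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n, mu, sigma = 10, 4.0, 2.0
    samples = rng.normal(loc=mu, scale=sigma, size=(200_000, n))

    ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print(ss.mean())  # ~ 36.0 = sigma^2 * (n - 1)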

The box below shows some statistics based on $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, k$, independent random variables that have probability distributions related to the chi-squared distribution:

Chi-squared distribution: $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$

Noncentral chi-squared distribution: $\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2$

Chi distribution: $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$

Noncentral chi distribution: $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2}$

The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.
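Concretely, the two roles mentioned are (standard definitions, stated for reference):

    T = \frac{Z}{\sqrt{X / k}} \sim t_k
    \quad \text{for independent } Z \sim N(0, 1), \; X \sim \chi^2_k,

    F = \frac{X_1 / d_1}{X_2 / d_2} \sim F(d_1, d_2)
    \quad \text{for independent } X_1 \sim \chi^2_{d_1}, \; X_2 \sim \chi^2_{d_2}.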


Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed sample.


The chi-squared distribution is also often encountered in magnetic resonance imaging.[19]

Computational methods

Table of χ² values vs p-values

The $p$-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, i.e., sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and non-significant results.
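For instance, a minimal SciPy sketch of this computation (the statistic value is illustrative):

    from scipy import stats

    stat, df = 11.07, 5
    p_value = stats.chi2.sf(stat, df)  # sf = 1 - CDF, the upper-tail p-value
    print(p_value)                     # ~ 0.050: right at the usual cutoff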


The table below gives a number of p-values matching to $\chi^2$ values for the first 10 degrees of freedom.

These values can be calculated by evaluating the quantile function (also known as the "inverse CDF" or "ICDF") of the chi-squared distribution;[21] e.g., the χ² ICDF for p = 0.05 and df = 7 yields 2.1673 ≈ 2.17 as in the table above, noting that 1 − p is the p-value from the table.
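The quoted value can be reproduced with SciPy's quantile function (a minimal sketch):

    from scipy import stats

    print(stats.chi2.ppf(0.05, df=7))  # 2.16735..., which rounds to 2.17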

History

This distribution was first described by the German geodesist and statistician Friedrich Robert Helmert in papers of 1875–6,[22][23] where he computed the sampling distribution of the sample variance of a normal population. Thus in German this was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".


The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson's chi-squared test, published in 1900, with computed table of values published in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII). The name "chi-square" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter Chi, writing −½χ2 for what would appear in modern notation as −½xTΣ−1x (Σ being the covariance matrix).[24] The idea of a family of "chi-squared distributions", however, is not due to Pearson but arose as a further development due to Fisher in the 1920s.[22]

External links

Earliest Uses of Some of the Words of Mathematics: entry on Chi squared has a brief history

Course notes on Chi-Squared Goodness of Fit Testing from Yale University Stats 101 class

Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a normal population

Simple algorithm for approximating cdf and inverse cdf for the chi-squared distribution with a pocket calculator

Values of the Chi-squared distribution