
Chi-squared distribution

In probability theory and statistics, the chi-squared distribution (also chi-square or $\chi^2$-distribution) with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables.
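Equivalently, in symbols (the density shown is the standard form of the chi-squared pdf, stated here for reference): if $Z_1, \ldots, Z_k$ are independent standard normal random variables, then

    Q = \sum_{i=1}^{k} Z_i^2 \sim \chi^2(k),
    \qquad
    f(x; k) = \frac{x^{k/2 - 1} e^{-x/2}}{2^{k/2} \, \Gamma(k/2)},
    \quad x > 0.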


Notation: $\chi^2(k)$ or $\chi^2_k$

Parameters: $k \in \mathbb{N}_{>0}$ (known as "degrees of freedom")

Support: $x \in (0, +\infty)$ if $k = 1$, otherwise $x \in [0, +\infty)$[1]

The chi-squared distribution is a special case of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim \chi^2(k)$ then $X \sim \text{Gamma}(\alpha = \tfrac{k}{2}, \theta = 2)$ (where $\alpha$ is the shape parameter and $\theta$ the scale parameter of the gamma distribution) and $X \sim \text{W}_1(1, k)$.


The scaled chi-squared distribution $s^2 \chi^2_k$ is a reparametrization of the gamma distribution and the univariate Wishart distribution. Specifically, if $X \sim s^2 \chi^2_k$ then $X \sim \text{Gamma}(\alpha = \tfrac{k}{2}, \theta = 2 s^2)$ and $X \sim \text{W}_1(s^2, k)$.
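As a quick numerical sanity check of these two reparametrizations (a minimal sketch, not from the original article, assuming NumPy and SciPy are available):

    import numpy as np
    from scipy import stats

    k, s2 = 5, 3.0                     # degrees of freedom, scale factor s^2
    x = np.linspace(0.1, 30.0, 200)

    # chi^2(k) has the same density as Gamma(shape = k/2, scale = 2)
    assert np.allclose(stats.chi2.pdf(x, df=k),
                       stats.gamma.pdf(x, a=k / 2, scale=2))

    # s^2 * chi^2_k has the same density as Gamma(shape = k/2, scale = 2 s^2);
    # the density of X = s^2 * Y is f_Y(x / s^2) / s^2
    assert np.allclose(stats.chi2.pdf(x / s2, df=k) / s2,
                       stats.gamma.pdf(x, a=k / 2, scale=2 * s2))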


The chi-squared distribution is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals.[2][3][4][5] This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution.


The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in finding the confidence interval for estimating the population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, such as Friedman's analysis of variance by ranks.

Tests based on the chi-squared distribution include:

Chi-squared test of independence in contingency tables

Chi-squared test of goodness of fit of observed data to hypothetical distributions

Likelihood-ratio test for nested models

Log-rank test in survival analysis

Cochran–Mantel–Haenszel test for stratified contingency tables

Wald test

Score test
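As a minimal illustration of the first two items in this list (a hedged sketch with made-up counts, using SciPy's scipy.stats.chisquare):

    from scipy import stats

    # Toy counts from 120 hypothetical die rolls; under the null hypothesis
    # of a fair die, each face is expected 120 / 6 = 20 times.
    observed = [22, 17, 19, 25, 16, 21]

    result = stats.chisquare(observed)  # defaults to uniform expected counts
    print(result.statistic)             # 2.8 = sum((O - E)^2 / E) with E = 20
    print(result.pvalue)                # ~0.73 on 5 df: no evidence against fairness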

If $X \sim \chi^2(k)$ then $\sqrt{2X}$ is approximately normally distributed with mean $\sqrt{2k - 1}$ and unit variance (1922, by R. A. Fisher, see (18.23), p. 426 of Johnson[4]).[16]
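A quick Monte Carlo check of this approximation (an illustrative sketch, not part of the source, using NumPy's chi-squared sampler):

    import numpy as np

    rng = np.random.default_rng(seed=0)
    k = 50
    y = np.sqrt(2 * rng.chisquare(df=k, size=100_000))

    print(y.mean())   # ~ 9.95, close to sqrt(2k - 1) = sqrt(99)
    print(y.std())    # ~ 1.0, i.e., approximately unit variance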

As $k \to \infty$, $(\chi^2_k - k)/\sqrt{2k} \xrightarrow{d} N(0, 1)$ (normal distribution).

$\chi^2_k \sim {\chi'}^2_k(0)$ (noncentral chi-squared distribution with non-centrality parameter $\lambda = 0$).

If $Y \sim F(\nu_1, \nu_2)$ then $X = \lim_{\nu_2 \to \infty} \nu_1 Y$ has the chi-squared distribution $\chi^2_{\nu_1}$.

If $X_1, \ldots, X_n$ are i.i.d. $N(\mu, \sigma^2)$ random variables, then $\sum_{i=1}^{n}(X_i - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$.
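The following sketch (illustrative, assuming NumPy) checks this empirically: the sum of squared deviations from the sample mean averages to $\sigma^2 (n - 1)$, the mean of $\sigma^2 \chi^2_{n-1}$:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    n, mu, sigma = 10, 4.0, 2.0
    samples = rng.normal(loc=mu, scale=sigma, size=(200_000, n))

    ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    print(ss.mean())  # ~ 36.0 = sigma^2 * (n - 1)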

The box below shows some statistics based on $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, k$, independent random variables that have probability distributions related to the chi-squared distribution:

Chi-squared distribution: $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$

Noncentral chi-squared distribution: $\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2$

Chi distribution: $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$

Noncentral chi distribution: $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i}{\sigma_i}\right)^2}$

The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.
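Concretely, the two roles mentioned are (standard definitions, stated for reference):

    T = \frac{Z}{\sqrt{X / k}} \sim t_k
    \quad \text{for independent } Z \sim N(0, 1), \; X \sim \chi^2_k,

    F = \frac{X_1 / d_1}{X_2 / d_2} \sim F(d_1, d_2)
    \quad \text{for independent } X_1 \sim \chi^2_{d_1}, \; X_2 \sim \chi^2_{d_2}.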


Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed sample.


The chi-squared distribution is also often encountered in magnetic resonance imaging.[19]

Computational methods

Table of χ² values vs p-values

The $p$-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. A low p-value, below the chosen significance level, indicates statistical significance, i.e., sufficient evidence to reject the null hypothesis. A significance level of 0.05 is often used as the cutoff between significant and non-significant results.
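For instance, a minimal SciPy sketch of this computation (the statistic value is illustrative):

    from scipy import stats

    stat, df = 11.07, 5
    p_value = stats.chi2.sf(stat, df)  # sf = 1 - CDF, the upper-tail p-value
    print(p_value)                     # ~ 0.050: right at the usual cutoff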


The table below gives a number of p-values matching to $\chi^2$ values for the first 10 degrees of freedom.

These values can be calculated by evaluating the quantile function (also known as the "inverse CDF" or "ICDF") of the chi-squared distribution;[21] e.g., the χ² ICDF for p = 0.05 and df = 7 yields 2.1673 ≈ 2.17 as in the table above, noting that 1 − p is the p-value from the table.
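The quoted value can be reproduced with SciPy's quantile function (a minimal sketch):

    from scipy import stats

    print(stats.chi2.ppf(0.05, df=7))  # 2.16735..., which rounds to 2.17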

History

This distribution was first described by the German geodesist and statistician Friedrich Robert Helmert in papers of 1875–6,[22][23] where he computed the sampling distribution of the sample variance of a normal population. Thus in German this was traditionally known as the Helmert'sche ("Helmertian") or "Helmert distribution".


The distribution was independently rediscovered by the English mathematician Karl Pearson in the context of goodness of fit, for which he developed his Pearson's chi-squared test, published in 1900, with computed table of values published in (Elderton 1902), collected in (Pearson 1914, pp. xxxi–xxxiii, 26–28, Table XII). The name "chi-square" ultimately derives from Pearson's shorthand for the exponent in a multivariate normal distribution with the Greek letter Chi, writing −½χ2 for what would appear in modern notation as −½xTΣ−1x (Σ being the covariance matrix).[24] The idea of a family of "chi-squared distributions", however, is not due to Pearson but arose as a further development due to Fisher in the 1920s.[22]

External links

Earliest Uses of Some of the Words of Mathematics: entry on Chi squared has a brief history

Course notes on Chi-Squared Goodness of Fit Testing from Yale University Stats 101 class

Mathematica demonstration showing the chi-squared sampling distribution of various statistics, e.g. Σx², for a normal population

Simple algorithm for approximating cdf and inverse cdf for the chi-squared distribution with a pocket calculator

Values of the Chi-squared distribution