Gini coefficient
In economics, the Gini coefficient (/ˈdʒiːni/ JEE-nee), also known as the Gini index or Gini ratio, is a measure of statistical dispersion intended to represent the income inequality, the wealth inequality, or the consumption inequality[3] within a nation or a social group. It was developed by Italian statistician and sociologist Corrado Gini.
Not to be confused with Gini impurity.
The Gini coefficient measures the inequality among the values of a frequency distribution, such as levels of income. A Gini coefficient of 0 reflects perfect equality, where all income or wealth values are the same, while a Gini coefficient of 1 (or 100%) reflects maximal inequality among values, a situation where a single individual has all the income while all others have none.[4][5]
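As a rough illustration of the two extremes described above, the following Python sketch computes the coefficient as half the relative mean absolute difference, which is one of several equivalent formulations. The function name `gini` and the sample data are assumptions made for this example and are not taken from the cited sources. For a finite population of n values the largest attainable value is (n − 1)/n, which approaches 1 as n grows.

```python
def gini(values):
    """Gini coefficient as half the relative mean absolute difference.

    Illustrative helper (name and interface are assumptions of this sketch).
    Expects non-negative values with a positive total.
    """
    xs = [float(v) for v in values]
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0 or min(xs) < 0:
        raise ValueError("values must be non-negative with a positive total")
    # Sum of |x_i - x_j| over all ordered pairs (i, j); O(n^2), fine for small inputs.
    abs_diff_sum = sum(abs(a - b) for a in xs for b in xs)
    # Dividing by 2 * n^2 * mean is the same as dividing by 2 * n * total.
    return abs_diff_sum / (2 * n * total)


equal = gini([5, 5, 5, 5])          # everyone has the same income
concentrated = gini([0, 0, 0, 10])  # one individual holds everything
```

In this example `equal` is 0.0 and `concentrated` is 0.75, that is, (n − 1)/n for n = 4.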
The Gini coefficient was proposed by Corrado Gini as a measure of inequality of income or wealth.[6] For OECD countries in the late 20th century, considering the effect of taxes and transfer payments, the income Gini coefficient ranged between 0.24 and 0.49, with Slovakia being the lowest and Mexico the highest.[7] African countries had the highest pre-tax Gini coefficients in 2008–2009, with South Africa having the world's highest, estimated to be 0.63 to 0.7.[8][9] However, this figure drops to 0.52 after social assistance is taken into account, and drops again to 0.47 after taxation.[10] The country with the lowest Gini coefficient is Slovakia, with a Gini coefficient of 0.232.[11] The Gini coefficient of the global income in 2005 has been estimated to be between 0.61 and 0.68 by various sources.[12][13]
Interpreting a Gini coefficient can be difficult, because the same value may result from many different distribution curves. To mitigate this, the demographic structure should be taken into account. Countries with an aging population, or those with an increased birth rate, experience an increasing pre-tax Gini coefficient even if real income distribution for working adults remains constant. Scholars have devised over a dozen variants of the Gini coefficient.[14][15][16]
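The ambiguity can be seen in a small, made-up example. The Python sketch below uses a hypothetical helper `gini` (based on the mean absolute difference) and invented income lists to evaluate three differently shaped distributions that happen to share the same coefficient.

```python
def gini(values):
    """Half the relative mean absolute difference (illustrative helper)."""
    xs = [float(v) for v in values]
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one value and a positive total")
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * total)


# Invented data: three different shapes of distribution.
distributions = {
    "evenly spread": [1, 2, 3, 4],
    "two equal halves": [1, 1, 3, 3],
    "one richer individual": [1, 1, 1, 3],
}

results = {name: gini(xs) for name, xs in distributions.items()}
for name, g in results.items():
    print(f"{name}: {g:.2f}")
```

All three lists yield a coefficient of 0.25, so the single number alone does not reveal where in the distribution the inequality arises.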
History
The Gini coefficient was developed by the Italian statistician Corrado Gini and published in his 1912 paper Variabilità e mutabilità (English: variability and mutability).[17][18] Building on the work of American economist Max Lorenz, Gini proposed that the difference between the hypothetical straight line depicting perfect equality, and the actual line depicting people's incomes, be used as a measure of inequality.[19]
Alternatives
Given the limitations of the Gini coefficient, other statistical methods are used in combination with it or as alternative measures of population dispersion. For example, entropy measures are frequently used (e.g. the Atkinson index, or the Theil index and mean log deviation as special cases of the generalized entropy index). These measures attempt to compare the distribution of resources by intelligent agents in the market with a maximum-entropy random distribution, which would occur if these agents acted like non-interacting particles in a closed system following the laws of statistical physics.
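A minimal Python sketch of three of these measures follows, assuming strictly positive values. The function names are hypothetical, and the Atkinson index is shown only for an inequality-aversion parameter of 1, where it reduces to 1 − exp(−MLD); other parameter values are not covered here.

```python
import math


def _check_positive(values):
    # The logarithms below are only defined for strictly positive values.
    if not values or min(values) <= 0:
        raise ValueError("values must be non-empty and strictly positive")


def theil_t(values):
    """Theil T index: mean of (x/mu) * ln(x/mu)."""
    _check_positive(values)
    mean = sum(values) / len(values)
    return sum((x / mean) * math.log(x / mean) for x in values) / len(values)


def mean_log_deviation(values):
    """Mean log deviation (Theil L): mean of ln(mu/x)."""
    _check_positive(values)
    mean = sum(values) / len(values)
    return sum(math.log(mean / x) for x in values) / len(values)


def atkinson_eps1(values):
    """Atkinson index for inequality-aversion parameter 1 (assumption of this sketch)."""
    return 1.0 - math.exp(-mean_log_deviation(values))


incomes = [10, 20, 30, 140]  # invented data
measures = (theil_t(incomes), mean_log_deviation(incomes), atkinson_eps1(incomes))
```

All three functions return 0 when every value is equal and grow as the values become more unequal.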
Relation to other statistical measures
There is a summary measure of the diagnostic ability of a binary classifier system that is also called the Gini coefficient, which is defined as twice the area between the receiver operating characteristic (ROC) curve and its diagonal. It is related to the AUC (area under the ROC curve) measure of performance by AUC = (G + 1)/2 [91] and to the Mann–Whitney U statistic. Although both Gini coefficients are defined as areas between certain curves and share certain properties, there is no simple direct relationship between the Gini coefficient of statistical dispersion and the Gini coefficient of a classifier.
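Because the diagonal encloses an area of 1/2, twice the area between the ROC curve and the diagonal equals 2 × AUC − 1. The Python sketch below uses that relation, with the AUC obtained from the Mann–Whitney U statistic by counting correctly ordered positive/negative pairs (ties count one half). The function names, label encoding (1 for positive, 0 for negative) and sample scores are assumptions made for illustration.

```python
def auc_from_scores(labels, scores):
    """AUC as U / (n_pos * n_neg), where U is the Mann-Whitney U statistic."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count pairs where the positive example outranks the negative one;
    # a tie contributes one half.
    u = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return u / (len(pos) * len(neg))


def classifier_gini(labels, scores):
    """Classifier Gini coefficient: twice the area between ROC curve and diagonal."""
    return 2.0 * auc_from_scores(labels, scores) - 1.0


labels = [0, 1, 0, 1]           # invented data
scores = [0.1, 0.3, 0.35, 0.8]
g = classifier_gini(labels, scores)
```

For these invented scores, three of the four positive/negative pairs are ordered correctly, so the AUC is 0.75 and `g` is 0.5. A classifier that ranks at random has an expected value of 0, and a perfect ranking gives 1.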
The Gini index is also related to the Pietra index. Both measure statistical heterogeneity and are derived from the Lorenz curve and the diagonal line.[92][93][28]
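The following Python sketch shows one way both indices can be read off the same empirical Lorenz curve: the Gini coefficient as twice the area between the diagonal and the curve, and the Pietra index as the largest vertical gap between them. The function names and the trapezoidal treatment of the piecewise-linear curve are choices made for this illustration.

```python
def lorenz_points(values):
    """Cumulative income shares L_0 = 0, ..., L_n = 1 of the sorted values."""
    xs = sorted(float(v) for v in values)
    total = sum(xs)
    if not xs or total <= 0 or xs[0] < 0:
        raise ValueError("values must be non-negative with a positive total")
    points, running = [0.0], 0.0
    for x in xs:
        running += x
        points.append(running / total)
    return points


def gini_and_pietra(values):
    """Return (Gini, Pietra) computed from the empirical Lorenz curve."""
    lorenz = lorenz_points(values)
    n = len(lorenz) - 1
    # Trapezoidal area under the piecewise-linear Lorenz curve.
    area = sum(lorenz[k - 1] + lorenz[k] for k in range(1, n + 1)) / (2 * n)
    gini = 1.0 - 2.0 * area
    # The largest gap to the diagonal is attained at a vertex of the curve.
    pietra = max(k / n - lorenz[k] for k in range(n + 1))
    return gini, pietra


g, p = gini_and_pietra([1, 2, 3, 4])  # invented data
```

For the invented list above, `g` is 0.25 and `p` is 0.2, so the two indices need not coincide even though they come from the same curve.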
In certain fields such as ecology, the inverse Simpson index is used to quantify diversity; it should not be confused with the Simpson index itself. These indicators are related to the Gini coefficient. The inverse Simpson index increases with diversity, unlike the Simpson index and Gini coefficient, which decrease with diversity. The Simpson index is in the range [0, 1], where 0 means maximum and 1 means minimum diversity (or heterogeneity). Since diversity indices typically increase with increasing heterogeneity, the Simpson index is often transformed into the inverse Simpson index, or into its complement (one minus the Simpson index), known as the Gini–Simpson index.[94]
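The three quantities can be compared in a short Python sketch, assuming the Simpson index is taken as the sum of squared proportions of each category. The function name `simpson_indices` and the species counts are invented for illustration.

```python
def simpson_indices(counts):
    """Return (Simpson, inverse Simpson, Gini-Simpson) for category counts."""
    total = sum(counts)
    if total <= 0 or min(counts) < 0:
        raise ValueError("counts must be non-negative with a positive total")
    proportions = [c / total for c in counts if c > 0]
    simpson = sum(p * p for p in proportions)  # high value = low diversity
    return simpson, 1.0 / simpson, 1.0 - simpson


even = simpson_indices([10, 10, 10, 10])   # four equally common species
dominated = simpson_indices([40, 0, 0, 0])  # a single species
```

Here `even` is (0.25, 4.0, 0.75) and `dominated` is (1.0, 1.0, 0.0): the Simpson index falls as diversity rises, while the inverse and the complement both rise.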
Gini coefficients for pre-modern societies
In recent decades, researchers have attempted to estimate Gini coefficients for pre-20th century societies. In the absence of household income surveys and income taxes, scholars have relied on proxy variables. These include wealth taxes in medieval European city states, patterns of landownership in Roman Egypt, variation of the size of houses in societies from ancient Greece to Aztec Mexico, and inheritance and dowries in Babylonian society. Other data do not directly document variations in wealth or income but are known to reflect inequality, such as the ratio of rents to wages or of labor to capital.[95]
Other uses
Although the Gini coefficient is most popular in economics, it can, in theory, be applied in any field of science that studies a distribution. For example, in ecology, the Gini coefficient has been used as a measure of biodiversity, where the cumulative proportion of species is plotted against the cumulative proportion of individuals.[96] In health, it has been used as a measure of the inequality of health-related quality of life in a population.[97] In education, it has been used as a measure of the inequality of universities.[98] In chemistry it has been used to express the selectivity of protein kinase inhibitors against a panel of kinases.[99] In engineering, it has been used to evaluate the fairness achieved by Internet routers in scheduling packet transmissions from different flows of traffic.[100]
The Gini coefficient is sometimes used for the measurement of the discriminatory power of rating systems in credit risk management.[101]
A 2005 study drew on US census data to measure home computer ownership and used the Gini coefficient to measure inequalities among whites and African Americans. Results indicated that, although decreasing overall, home computer ownership inequality was substantially smaller among white households.[102]
A 2016 peer-reviewed study titled Employing the Gini coefficient to measure participation inequality in treatment-focused Digital Health Social Networks[103] illustrated that the Gini coefficient was helpful and accurate in measuring shifts in inequality; however, as a standalone metric, it failed to incorporate overall network size.
Discriminatory power refers to a credit risk model's ability to differentiate between defaulting and non-defaulting clients. The formula in the calculation section above may be used for the final model and at the individual model factor level to quantify the discriminatory power of individual factors. It is related to the accuracy ratio in population assessment models.
The Gini coefficient has also been applied to analyze inequality in dating apps.[104][105]
Kaminskiy and Krivtsov[106] extended the concept of the Gini coefficient from economics to reliability theory and proposed a Gini-type coefficient that helps to assess the degree of aging of non-repairable systems or the aging and rejuvenation of repairable systems. The coefficient takes values between −1 and 1 and can be used with both empirical and parametric life distributions. It is negative for the class of decreasing failure rate distributions and point processes with decreasing failure intensity rate, and positive for increasing failure rate distributions and point processes with increasing failure intensity rate. A value of zero corresponds to the exponential life distribution or the homogeneous Poisson process.