
Fairness (machine learning)

Fairness in machine learning refers to the various attempts at correcting algorithmic bias in automated decision processes based on machine learning models. Decisions made by computers after a machine-learning process may be considered unfair if they were based on variables considered sensitive, such as gender, ethnicity, sexual orientation or disability. As is the case with many ethical concepts, definitions of fairness and bias are controversial. In general, fairness and bias are considered relevant when the decision process impacts people's lives. In machine learning, the problem of algorithmic bias is well known and well studied. Outcomes may be skewed by a range of factors and thus might be considered unfair with respect to certain groups or individuals. An example would be the way social media sites deliver personalized news to consumers.

Assuming $Y$ is binary, if $A$ and $Y$ are not statistically independent, and $R$ and $Y$ are not statistically independent either, then independence and separation cannot both hold except for degenerate cases.

If $(A, R, Y)$ as a joint distribution has positive probability for all its possible values and $A$ and $Y$ are not statistically independent, then separation and sufficiency cannot both hold except for degenerate cases.
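
For reference, a standard way to write the three group criteria referenced in these statements is as conditional-independence conditions on the score $R$, the target $Y$ and the sensitive attribute $A$; the block below is a reading aid using that common formulation, not text taken from the cited results.

```latex
% Independence (demographic parity): the score does not depend on the sensitive attribute
R \perp A
% Separation (equalized odds): the score is independent of A given the true target
R \perp A \mid Y
% Sufficiency (calibration by group): the target is independent of A given the score
Y \perp A \mid R
```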

Individual fairness criteria

An important distinction among fairness definitions is the one between group and individual notions.[38][39][34][40] Roughly speaking, while group fairness criteria compare quantities at a group level, typically identified by sensitive attributes (e.g. gender, ethnicity, age, etc.), individual criteria compare individuals. In other words, individual fairness follows the principle that "similar individuals should receive similar treatments".


There is a very intuitive approach to fairness, which usually goes under the name of fairness through unawareness (FTU), or blindness, that prescribes not to explicitly employ sensitive features when making (automated) decisions. This is effectively a notion of individual fairness, since two individuals differing only in the value of their sensitive attributes would receive the same outcome.
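
As a minimal illustration of FTU, the sketch below simply drops a hypothetical sensitive column before training; the dataset, column names and model are invented for the example and are not taken from the cited works.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical loan-application data; "gender" is the sensitive attribute.
data = pd.DataFrame({
    "gender":   [0, 1, 0, 1, 0, 1, 0, 1],
    "income":   [30, 45, 52, 28, 61, 49, 33, 58],
    "tenure":   [2, 5, 7, 1, 9, 4, 3, 8],
    "approved": [0, 1, 1, 0, 1, 1, 0, 1],
})

# Fairness through unawareness: the sensitive column is removed,
# so the model never uses it explicitly.
features = data.drop(columns=["gender", "approved"])
model = LogisticRegression().fit(features, data["approved"])
```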


However, in general, FTU is subject to several drawbacks, the main one being that it does not take into account possible correlations between sensitive attributes and the non-sensitive attributes employed in the decision-making process. For example, an agent with the (malignant) intention to discriminate on the basis of gender could introduce into the model a proxy variable for gender (i.e. a variable highly correlated with gender) and effectively use gender information while at the same time remaining compliant with the FTU prescription.
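
The sketch below illustrates this loophole on synthetic data: a proxy variable is constructed to be highly correlated with the sensitive attribute, the model is FTU-compliant (it never sees the sensitive attribute itself), and yet its decisions track the sensitive attribute through the proxy. All names, coefficients and data are invented for the illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

gender = rng.integers(0, 2, n)                       # sensitive attribute
proxy = gender + rng.normal(0.0, 0.1, n)             # near-copy of gender
skill = rng.normal(0.0, 1.0, n)                      # legitimate feature
label = (skill + 2.0 * gender + rng.normal(0.0, 0.5, n) > 1.0).astype(int)

# FTU-compliant model: gender itself is never passed to the model ...
X = np.column_stack([proxy, skill])
model = LogisticRegression().fit(X, label)

# ... yet the learned weight on the proxy shows that gender information
# is effectively being used, and acceptance rates differ across groups.
pred = model.predict(X)
print("weight on proxy:", model.coef_[0][0])
print("acceptance rate, gender=0:", pred[gender == 0].mean())
print("acceptance rate, gender=1:", pred[gender == 1].mean())
```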


The problem of which variables correlated with sensitive ones may fairly be employed by a model in the decision-making process is a crucial one, and it is relevant for group concepts as well: independence metrics require a complete removal of sensitive information, while separation-based metrics allow for correlations, but only insofar as the labeled target variable "justifies" them.


The most general concept of individual fairness was introduced in the pioneering work of Cynthia Dwork and collaborators in 2012[41] and can be thought of as a mathematical translation of the principle that the decision map taking features as input should "map similar individuals similarly", which is expressed as a Lipschitz condition on the model map. They call this approach fairness through awareness (FTA), precisely as a counterpoint to FTU, since they underline the importance of choosing the appropriate target-related distance metric in order to assess which individuals are similar in specific situations. Again, this problem is closely related to the point raised above about which variables can be seen as "legitimate" in particular contexts.
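
In formula, a sketch of this Lipschitz requirement can be written as follows, where $M$ is the (randomized) decision map, $D$ a distance between output distributions and $d$ the task-specific similarity metric between individuals; the notation is illustrative and paraphrases the formulation of Dwork et al.

```latex
% Fairness through awareness: similar individuals are mapped to
% similar distributions over outcomes (Lipschitz condition on M)
D\bigl(M(x),\, M(y)\bigr) \;\le\; d(x, y) \qquad \text{for all individuals } x, y
```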


Causal fairness measures the frequency with which two nearly identical users or applications that differ only in a set of characteristics with respect to which resource allocation must be fair receive identical treatment.[42]


An entire branch of academic research on fairness metrics is devoted to leveraging causal models to assess bias in machine learning models. This approach is usually justified by the fact that the same observational distribution of data may hide different causal relationships among the variables at play, possibly with different interpretations of whether the outcome is affected by some form of bias or not.[30]


Kusner et al.[43] propose to employ counterfactuals, and define a decision-making process as counterfactually fair if, for any individual, the outcome does not change in the counterfactual scenario where the sensitive attributes are changed. The mathematical formulation reads:

$$P(\hat{Y}_{A \leftarrow a}(U) = y \mid X = x, A = a) = P(\hat{Y}_{A \leftarrow a'}(U) = y \mid X = x, A = a)$$

for every value $y$ and every value $a'$ attainable by $A$; that is: take a random individual with sensitive attribute $A = a$ and other features $X = x$, and the same individual if she had $A = a'$: they should have the same chance of being accepted. The symbol $\hat{Y}_{A \leftarrow a'}(U)$ represents the counterfactual random variable in the scenario where the sensitive attribute $A$ is fixed to $a'$. The conditioning on $A = a, X = x$ means that this requirement is at the individual level, in that we are conditioning on all the variables identifying a single observation.
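
Counterfactuals of this kind are computed from a structural causal model via Pearl's abduction, action and prediction steps. The sketch below checks counterfactual fairness for a toy, hand-specified linear SCM; the SCM, its coefficients and the predictor are illustrative assumptions, not the model used in the cited paper.

```python
import numpy as np

# Toy linear SCM (assumed for illustration):
#   A : sensitive attribute (0/1)
#   U : latent background variable
#   X = 2.0 * A + U            (observed feature)
#   predictor: score = 1.5 * X

W_A_TO_X = 2.0
W_X_TO_SCORE = 1.5

def predict(x):
    """A simple (hypothetical) predictor that only looks at X."""
    return W_X_TO_SCORE * x

def counterfactual_score(x_obs, a_obs, a_cf):
    """Abduction-action-prediction for the toy SCM."""
    # Abduction: recover the latent U consistent with the observation.
    u = x_obs - W_A_TO_X * a_obs
    # Action: set A to its counterfactual value a_cf.
    # Prediction: recompute X and the score under the intervention.
    x_cf = W_A_TO_X * a_cf + u
    return predict(x_cf)

# Check counterfactual fairness for one individual.
x_obs, a_obs = 3.0, 1
factual = predict(x_obs)                               # 4.5
counterfactual = counterfactual_score(x_obs, a_obs, 0) # 1.5
print("factual score:", factual)
print("counterfactual score:", counterfactual)         # differs -> not counterfactually fair
```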


Machine learning models are often trained on data where the outcome depended on the decision made at the time.[44] For example, if a machine learning model has to determine whether an inmate will recidivate, and its prediction determines whether the inmate is released early, the observed outcome may itself depend on whether the inmate was released early or not. Mishler et al.[45] propose a formula for counterfactual equalized odds:

$$P(\hat{Y} = 1 \mid Y^{d} = y, A = a) = P(\hat{Y} = 1 \mid Y^{d} = y, A = b)$$

where $\hat{Y}$ is a random variable (the prediction), $Y^{d}$ denotes the outcome given that the decision $d$ was taken, and $A$ is a sensitive feature.
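
In practice the potential outcome $Y^{d}$ is only partially observed, which makes this criterion nontrivial to estimate from data; in a simulation where it is known by construction, however, the check is straightforward. The sketch below uses entirely synthetic data and an invented predictor.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Synthetic population where the potential outcome under the baseline
# decision (d = 0, e.g. "not released early") is known by construction.
A = rng.integers(0, 2, n)                                 # sensitive attribute
risk = rng.normal(0.0, 1.0, n) + 0.5 * A                  # latent risk
Y0 = (risk + rng.normal(0.0, 0.5, n) > 0.5).astype(int)   # outcome if d = 0
Y_hat = (risk > 0.3).astype(int)                          # some predictor's decisions

# Counterfactual equalized odds asks that, within each stratum of Y^0,
# the prediction rate be the same across groups defined by A.
for y in (0, 1):
    rates = [Y_hat[(Y0 == y) & (A == a)].mean() for a in (0, 1)]
    print(f"Y^0 = {y}: P(Y_hat=1 | A=0) = {rates[0]:.3f}, "
          f"P(Y_hat=1 | A=1) = {rates[1]:.3f}")
```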


Plecko and Bareinboim[46] propose a unified framework to deal with the causal analysis of fairness. They suggest the use of a Standard Fairness Model, consisting of a causal graph with 4 types of variables:

sensitive attributes ($A$),

target variable ($Y$),

mediators ($W$) between $A$ and $Y$, representing possible indirect effects of the sensitive attributes on the outcome,

variables possibly sharing a common cause with $A$ ($Z$), representing possible spurious (i.e., non-causal) effects of the sensitive attributes on the outcome.


Within this framework, Plecko and Bareinboim[46] are therefore able to classify the possible effects that sensitive attributes may have on the outcome. Moreover, the granularity at which these effects are measured—namely, the conditioning variables used to average the effect—is directly connected to the "individual vs. group" aspect of fairness assessment.
