Types of bias[edit]

Sampling bias[edit]

Sampling bias is systematic error due to a non-random sample of a population,[2] causing some members of the population to be less likely to be included than others, resulting in a biased sample, defined as a statistical sample of a population (or non-human factors) in which all participants are not equally balanced or objectively represented.[3] It is mostly classified as a subtype of selection bias,[4] sometimes specifically termed sample selection bias,[5][6][7] but some classify it as a separate type of bias.[8]


A distinction of sampling bias (albeit not a universally accepted one) is that it undermines the external validity of a test (the ability of its results to be generalized to the rest of the population), while selection bias mainly addresses internal validity for differences or similarities found in the sample at hand. In this sense, errors occurring in the process of gathering the sample or cohort cause sampling bias, while errors in any process thereafter cause selection bias.


Examples of sampling bias include self-selection, pre-screening of trial participants, discounting trial subjects/tests that did not run to completion and migration bias by excluding subjects who have recently moved into or out of the study area, length-time bias, where slowly developing disease with better prognosis is detected, and lead time bias, where disease is diagnosed earlier for participants than in comparison populations, although the average course of disease is the same.

Mitigation[edit]

In the general case, selection biases cannot be overcome with statistical analysis of existing data alone, though Heckman correction may be used in special cases. An assessment of the degree of selection bias can be made by examining correlations between exogenous (background) variables and a treatment indicator. However, in regression models, it is correlation between unobserved determinants of the outcome and unobserved determinants of selection into the sample which bias estimates, and this correlation between unobservables cannot be directly assessed by the observed determinants of treatment.[21]


When data are selected for fitting or forecast purposes, a coalitional game can be set up so that a fitting or forecast accuracy function can be defined on all subsets of the data variables.

or reporting bias, the distortion produced in community perception or meta-analyses by not publishing uninteresting (usually negative) results, or results which go against the experimenter's prejudices, a sponsor's interests, or community expectations.

publication bias

the general tendency of humans to give more attention to whatever confirms our pre-existing perspective; or specifically in experimental science, the distortion produced by experiments that are designed to seek confirmatory evidence instead of trying to disprove the hypothesis.

confirmation bias

exclusion bias, results from applying different criteria to cases and controls in regards to participation eligibility for a study/different variables serving as basis for exclusion.

Selection bias is closely related to: