Multiple comparisons problem

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously[1] or estimates a subset of parameters selected based on the observed values.[2]

The larger the number of inferences made, the more likely erroneous inferences become. Several statistical techniques have been developed to address this problem, for example by requiring a stricter significance threshold for individual comparisons to compensate for the number of inferences being made. Methods that control the family-wise error rate (FWER) bound the probability of at least one false positive arising from the multiple comparisons problem.
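Suppose m independent tests are each run at level α and every null hypothesis is true. The probability of at least one false positive is then 1 - (1 - α)^m, which is already about 0.64 for m = 20 at α = 0.05. The Python sketch below computes this quantity. As one concrete instance of a stricter per-comparison threshold, it also computes the Bonferroni correction α/m. Bonferroni is a standard method named here for illustration; the text above does not single it out. The independence assumption and the function names are illustrative assumptions.

```python
def family_wise_error_rate(alpha: float, m: int) -> float:
    """Probability of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true (assumption:
    the tests are independent)."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_threshold(alpha: float, m: int) -> float:
    """Per-comparison threshold alpha / m. By the union bound, the FWER is
    then at most alpha, whether or not the tests are independent."""
    return alpha / m


if __name__ == "__main__":
    for m in (1, 5, 20, 100):
        naive = family_wise_error_rate(0.05, m)
        corrected = family_wise_error_rate(bonferroni_threshold(0.05, m), m)
        print(f"m={m:3d}  uncorrected FWER={naive:.3f}  Bonferroni FWER={corrected:.4f}")
```

The Bonferroni bound does not require independence, but it is conservative. With many comparisons it also lowers the power to detect real effects.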

History

The problem of multiple comparisons received increased attention in the 1950s with the work of statisticians such as Tukey and Scheffé. Over the ensuing decades, many procedures were developed to address the problem. In 1996, the first international conference on multiple comparison procedures took place in Tel Aviv.[3] This is an active research area with work being done by, for example, Emmanuel Candès and Vladimir Vovk.

Suppose the treatment is a new way of teaching writing to students, and the control is the standard way of teaching writing. Students in the two groups can be compared in terms of grammar, spelling, organization, content, and so on. As more attributes are compared, it becomes increasingly likely that the treatment and control groups will appear to differ on at least one attribute due to random sampling error alone.
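A small simulation illustrates the writing-class example. Both groups are drawn from the same distribution, so any "significant" difference is a false positive. Even so, the share of simulated studies showing at least one such difference grows with the number of attributes k, roughly tracking 1 - (1 - α)^k. This sketch relies on simplifying assumptions that are not in the text:

- scores are independent N(0, 1) draws;
- the attributes are independent of one another;
- a two-sample z-test with known σ stands in for the t-test one would normally use, so only Python's standard library is needed.

The function names and parameters are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def two_sided_z_pvalue(a: list[float], b: list[float], sigma: float = 1.0) -> float:
    """Two-sample z-test p-value, assuming both groups share a known sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def chance_of_spurious_difference(k: int, n: int = 30, alpha: float = 0.05,
                                  trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated studies in which treatment and control differ
    'significantly' on at least one of k attributes, although both groups
    are drawn from the same N(0, 1) distribution (no real effect)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(k):  # one test per attribute (grammar, spelling, ...)
            treatment = [rng.gauss(0.0, 1.0) for _ in range(n)]
            control = [rng.gauss(0.0, 1.0) for _ in range(n)]
            if two_sided_z_pvalue(treatment, control) < alpha:
                hits += 1
                break  # one spurious "difference" is enough for this study
    return hits / trials


if __name__ == "__main__":
    for k in (1, 4, 10):
        print(k, chance_of_spurious_difference(k))
```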

Suppose we consider the efficacy of a drug in terms of the reduction of any one of a number of disease symptoms. As more symptoms are considered, it becomes increasingly likely that the drug will appear to be an improvement over existing drugs in terms of at least one symptom.
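For the drug example, the sketch below simulates trials of a drug with no real effect on any symptom and compares two decision rules:

- declare an improvement whenever any symptom's p-value falls below α;
- require p < α/m instead, which is the Bonferroni correction, used here as one concrete instance of a stricter threshold.

The sketch assumes independent symptoms. It also uses the fact that, under a true null hypothesis with a continuous test statistic, a valid test's p-value is uniformly distributed on (0, 1). The function names are illustrative.

```python
import random


def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag each comparison as significant only if p < alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def false_claim_rate(m_symptoms: int, alpha: float = 0.05, corrected: bool = False,
                     trials: int = 20000, seed: int = 1) -> float:
    """Share of simulated trials in which an ineffective drug 'improves' at
    least one symptom. Assumes independent symptoms, so each null p-value is
    drawn independently from Uniform(0, 1)."""
    rng = random.Random(seed)
    claims = 0
    for _ in range(trials):
        p_values = [rng.random() for _ in range(m_symptoms)]
        if corrected:
            flags = bonferroni_significant(p_values, alpha)
        else:
            flags = [p < alpha for p in p_values]
        # Count the trial as a (false) claim if any symptom is flagged.
        claims += any(flags)
    return claims / trials


if __name__ == "__main__":
    print("uncorrected:", false_claim_rate(10))
    print("Bonferroni: ", false_claim_rate(10, corrected=True))
```

With m = 10 symptoms, the expected rate for the uncorrected rule is 1 - 0.95^10 ≈ 0.40. The corrected rule stays just under 0.05. The price is power: the stricter threshold also makes a genuine improvement on a single symptom harder to detect.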

Bretz, F.; Hothorn, T.; Westfall, P. (2010). Multiple Comparisons Using R. CRC Press.

Dudoit, S.; van der Laan, M. J. (2008). Multiple Testing Procedures with Application to Genomics. Springer.

Farcomeni, A. (2008). "A Review of Modern Multiple Hypothesis Testing, with particular attention to the false discovery proportion". Statistical Methods in Medical Research. 17 (4): 347–388. doi:10.1177/0962280206079046. hdl:11573/142139. PMID 17698936. S2CID 12777404.

Phipson, B.; Smyth, G. K. (2010). "Permutation P-values Should Never Be Zero: Calculating Exact P-values when Permutations are Randomly Drawn". Statistical Applications in Genetics and Molecular Biology. 9: Article39. arXiv:1603.05766. doi:10.2202/1544-6115.1585. PMID 21044043. S2CID 10735784.

Westfall, P. H.; Young, S. S. (1993). Resampling-based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley.

Westfall, P.; Tobias, R.; Wolfinger, R. (2011). Multiple comparisons and multiple testing using SAS (2nd ed.). SAS Institute.

A gallery of examples of implausible correlations sourced by data dredging

An xkcd comic about the multiple comparisons problem, using jelly beans and acne as an example

[1]