Replication crisis

The replication crisis^[a] is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,^[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.

The replication crisis is frequently discussed in relation to psychology and medicine, where considerable efforts have been undertaken to reinvestigate classic results, to determine whether they are reliable and, if they are not, to identify the reasons for the failure.^[3]^[4] Data strongly indicate that other natural and social sciences are affected as well.^[5]

The phrase replication crisis was coined in the early 2010s^[6] as part of a growing awareness of the problem. Considerations of causes and remedies have given rise to a new scientific discipline, metascience,^[7] which uses methods of empirical research to examine empirical research practice.

Considerations about reproducibility can be placed into two categories. Reproducibility in the narrow sense refers to re-examining and validating the analysis of a given set of data. Replication refers to repeating the experiment or study to obtain new, independent data with the goal of reaching the same or similar conclusions.

Controversies around social priming research: In the early 2010s, the well-known "elderly-walking" study by social psychologist John Bargh and colleagues failed to replicate in two direct replications.^[25] This experiment was part of a series of three studies that had been widely cited throughout the years, was regularly taught in university courses, and had inspired a large number of conceptual replications. Failures to replicate the study led to much controversy and a heated debate involving the original authors.^[26] Notably, many of the conceptual replications of the original studies also failed to replicate in subsequent direct replications.^[27]^[28]^[29]^[30]

Controversies around experiments on extrasensory perception: Social psychologist Daryl Bem conducted a series of experiments supposedly providing evidence for the controversial phenomenon of extrasensory perception.^[31] Bem was heavily criticized for his studies' methodology, and upon reanalysis of the data, no evidence was found for the existence of extrasensory perception.^[32] The experiments also failed to replicate in subsequent direct replications.^[33] According to Romero,^[24] what the community found particularly upsetting was that many of the flawed procedures and statistical tools used in Bem's studies were part of common research practice in psychology.

Amgen and Bayer reports on lack of replicability in biomedical research: Scientists from biotech companies Amgen and Bayer Healthcare reported alarmingly low replication rates (11–20%) of landmark findings in preclinical oncological research.^[34]^[35]

Publication of studies on p-hacking and questionable research practices: Since the late 2000s, a number of studies in metascience showed how commonly adopted practices in many scientific fields, such as exploiting the flexibility of the process of data collection and reporting, could greatly increase the probability of false positive results.^[36]^[37]^[38] These studies suggested that a significant proportion of the published literature in several scientific fields could be nonreplicable.
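The effect of flexible reporting on false positives can be illustrated with a small simulation. The sketch below is not taken from any of the cited studies; it assumes a hypothetical two-group experiment with no true effect, in which the researcher measures several independent outcomes and reports the study as significant if any one of them reaches p < 0.05. The function names, the sample size of 20 per group, and the use of a z-test with known unit variance are all illustrative assumptions.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_study_p_values(rng, n_per_group, n_outcomes):
    """Simulate one study with no true effect; return one p-value per outcome."""
    p_values = []
    for _ in range(n_outcomes):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = sum(a) / n_per_group - sum(b) / n_per_group
        # z-test assuming a known variance of 1 in both groups (a simplification)
        z = diff / math.sqrt(2.0 / n_per_group)
        p_values.append(two_sided_p(z))
    return p_values


def false_positive_rate(n_outcomes, n_studies=2000, n_per_group=20,
                        alpha=0.05, seed=0):
    """Share of null studies in which at least one outcome has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        if min(null_study_p_values(rng, n_per_group, n_outcomes)) < alpha:
            hits += 1
    return hits / n_studies


if __name__ == "__main__":
    print("one outcome:  ", false_positive_rate(1))
    print("five outcomes:", false_positive_rate(5))
```

With a single pre-specified outcome the simulated rate should stay near the nominal 5%. With five independent outcomes and selective reporting, the expected rate is 1 − 0.95^5, or roughly 23%, even though no real effect exists in any simulated study.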

Prevalence

In psychology

Several factors have combined to put psychology at the center of the conversation.^[54]^[55] Some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.^[56] Much of the focus has been on the area of social psychology,^[57] although other areas of psychology such as clinical psychology,^[58]^[59]^[60] developmental psychology,^[61]^[62]^[63] and educational research have also been implicated.^[64]^[65]^[66]^[67]^[68]

In August 2015, the first open empirical study of reproducibility in psychology, The Reproducibility Project: Psychology, was published. Coordinated by psychologist Brian Nosek, researchers repeated 100 studies in psychological science from three high-ranking psychology journals (Journal of Personality and Social Psychology, Journal of Experimental Psychology: Learning, Memory, and Cognition, and Psychological Science). Of the original studies, 97 had significant effects, but only 36% of the replications of those 97 yielded significant findings (p value below 0.05).^[11] The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies. The same paper examined the reproducibility rates and effect sizes by journal and discipline. Study replication rates were 23% for the Journal of Personality and Social Psychology, 48% for the Journal of Experimental Psychology: Learning, Memory, and Cognition, and 38% for Psychological Science. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).^[69]

A study published in 2018 in Nature Human Behaviour replicated 21 social and behavioral science papers from Nature and Science, finding that only about 62% of the replications successfully reproduced the original results.^[70]^[71]

Similarly, in a study conducted under the auspices of the Center for Open Science, a team of 186 researchers from 60 different laboratories (representing 36 different nationalities from six different continents) conducted replications of 28 classic and contemporary findings in psychology.^[72]^[73] The study's focus was not only whether the original papers' findings replicated but also the extent to which findings varied as a function of variations in samples and contexts. Overall, 50% of the 28 findings failed to replicate despite massive sample sizes. When a finding did replicate, it replicated in most samples; when it did not, it failed to replicate with little variation across samples and contexts. This evidence is inconsistent with a proposed explanation that failures to replicate in psychology are likely due to changes in the sample between the original and replication study.^[73]

Results of a 2022 study suggest that many earlier brain–phenotype studies ("brain-wide association studies", BWAS) produced invalid conclusions, because small effect sizes mean that replicating such studies requires samples of thousands of individuals.^[74]^[75]
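The link between small effect sizes and large required samples can be made concrete with a rough power calculation. The sketch below is not the method used in the 2022 study; it applies the standard Fisher z approximation for detecting a correlation r with a two-sided test, n ≈ ((z_{α/2} + z_{power}) / atanh(r))² + 3. The function name and the example correlations are illustrative assumptions and are not values reported for brain-wide association studies.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect correlation r (two-sided test).

    Uses the Fisher z approximation:
        n = ((z_{alpha/2} + z_{power}) / atanh(r))**2 + 3
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(abs(r))) ** 2 + 3)


if __name__ == "__main__":
    # Illustrative effect sizes only; not taken from the cited study.
    for r in (0.30, 0.10, 0.05):
        print(f"r = {r:.2f}: about {n_for_correlation(r)} participants")
```

Under these assumptions, a correlation of 0.30 needs fewer than a hundred participants for 80% power, a correlation of 0.10 needs several hundred, and a correlation of 0.05 needs more than three thousand, which is consistent with the qualitative point that very small effects call for samples in the thousands.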

Base rate fallacy

Black swan theory

Correlation does not imply causation

Data dredging

Decline effect

Estimation statistics

Exploratory data analysis

Extension neglect

Falsifiability

Invalid science

Misuse of statistics

Naturalism

Observer bias

p-value

Problem of induction

Sampling bias

Selection bias

Statistical hypothesis testing

Uniformitarianism