Replication crisis
The replication crisis[a] is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.
The replication crisis is frequently discussed in relation to psychology and medicine, where considerable efforts have been undertaken to reinvestigate classic results, to determine whether they are reliable and, if they are not, why they failed.[3][4] Data strongly indicate that other natural and social sciences are affected as well.[5]
The phrase replication crisis was coined in the early 2010s[6] as part of a growing awareness of the problem. Considerations of causes and remedies have given rise to a new scientific discipline, metascience,[7] which uses methods of empirical research to examine empirical research practice.
Considerations about reproducibility can be placed into two categories. Reproducibility in the narrow sense refers to re-examining and validating the analysis of a given set of data. Replication refers to repeating the experiment or study to obtain new, independent data with the goal of reaching the same or similar conclusions.
Prevalence
In psychology
Several factors have combined to put psychology at the center of the conversation.[54][55] Some areas of psychology once considered solid, such as social priming, have come under increased scrutiny due to failed replications.[56] Much of the focus has been on the area of social psychology,[57] although other areas of psychology such as clinical psychology,[58][59][60] developmental psychology,[61][62][63] and educational research have also been implicated.[64][65][66][67][68]
In August 2015, the first open empirical study of reproducibility in psychology, The Reproducibility Project: Psychology, was published. Coordinated by psychologist Brian Nosek, researchers replicated 100 studies from three high-ranking psychology journals (Journal of Personality and Social Psychology, Journal of Experimental Psychology: Learning, Memory, and Cognition, and Psychological Science). Of the original studies, 97 reported significant effects, but only 36% of the replications of those 97 yielded significant findings (p value below 0.05).[11] The mean effect size in the replications was approximately half the magnitude of the effects reported in the original studies. The same paper examined the reproducibility rates and effect sizes by journal and discipline. Study replication rates were 23% for the Journal of Personality and Social Psychology, 48% for the Journal of Experimental Psychology: Learning, Memory, and Cognition, and 38% for Psychological Science. Studies in the field of cognitive psychology had a higher replication rate (50%) than studies in the field of social psychology (25%).[69]
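The link between a smaller effect size and a lower rate of significant results can be illustrated with a rough power calculation. The sketch below is not taken from the Reproducibility Project; it assumes a two-group comparison, a normal approximation to the test statistic, and hypothetical values (a standardized effect of 0.5 and 64 participants per group) chosen only for illustration. The function name `approx_power` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Uses a normal approximation, so results are rough for small samples.
    d is the standardized mean difference (Cohen's d).
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    shift = d * sqrt(n_per_group / 2)      # expected value of the test statistic
    # Probability that the statistic lands in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


# Hypothetical numbers, for illustration only
power_original = approx_power(d=0.5, n_per_group=64)   # roughly 0.8
power_halved = approx_power(d=0.25, n_per_group=64)    # roughly 0.3
print(f"power at d=0.50: {power_original:.2f}")
print(f"power at d=0.25: {power_halved:.2f}")
```

Under these assumptions, halving the effect while keeping the sample size fixed cuts the chance of a significant result from about 80% to about 30%, so a low rate of significant replications is what one would expect if true effects are much smaller than originally reported.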
A study published in 2018 in Nature Human Behaviour attempted to replicate 21 social and behavioral science papers from Nature and Science, finding that only about 62% of the original results could be successfully reproduced.[70][71]
Similarly, in a study conducted under the auspices of the Center for Open Science, a team of 186 researchers from 60 different laboratories (representing 36 different nationalities from six different continents) conducted replications of 28 classic and contemporary findings in psychology.[72][73] The study's focus was not only whether the original papers' findings replicated but also the extent to which findings varied as a function of variations in samples and contexts. Overall, 50% of the 28 findings failed to replicate despite massive sample sizes. But if a finding replicated, then it replicated in most samples. If a finding was not replicated, then it failed to replicate with little variation across samples and contexts. This evidence is inconsistent with a proposed explanation that failures to replicate in psychology are likely due to changes in the sample between the original and replication study.[73]
Results of a 2022 study suggest that many earlier brain–phenotype studies, known as "brain-wide association studies" (BWAS), produced invalid conclusions, because the effect sizes involved are small and replicating such studies therefore requires samples from thousands of individuals.[74][75]
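Why small effects demand such large samples can be sketched with a standard approximation for the sample size needed to detect a correlation, based on the Fisher z-transformation. The correlation values used below (0.3 and 0.05) are hypothetical and are not figures from the 2022 study; the function name `n_for_correlation` and the defaults of a two-sided 5% significance level and 80% power are also assumptions.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a correlation r.

    Uses the Fisher z approximation:
        n = ((z_alpha + z_beta) / atanh(r))**2 + 3
    for a two-sided test at level alpha with the given power.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = norm.inv_cdf(power)           # quantile matching the target power
    return ceil(((z_alpha + z_beta) / atanh(r)) ** 2 + 3)


# Hypothetical effect sizes, for illustration only
print(n_for_correlation(0.30))  # a moderate correlation: fewer than 100 participants
print(n_for_correlation(0.05))  # a very small correlation: more than 3,000 participants
```

Because the required sample size grows roughly with the inverse square of the correlation, a study sized for a moderate effect is badly underpowered for a very small one.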