The “reproducibility crisis” in science is erupting again. A research project attempted to replicate 21 social science experiments published from 2010 to 2015 in the prestigious journals Science and Nature. Only 13 replication attempts succeeded. The other eight were duds, with no observed effects consistent with the original findings.
The failures do not necessarily mean the original results were erroneous, as the authors of this latest replication effort note. There could have been gremlins of some type in the second try. But even in the replications that succeeded, the authors noted, the observed effect was on average only about 75 percent as large as the first time around.
The researchers conclude that there is a systematic bias in published findings, “partly due to false positives and partly due to the overestimated effect sizes of true positives.”
The two-year replication project, published Monday in the journal Nature Human Behaviour, is likely to roil research institutions and scientific journals that in recent years have grappled with reproducibility issues. The ability to replicate a finding is fundamental to experimental science. This latest project provides a reminder that the publication of a finding in a peer-reviewed journal does not make it true.
Scientists are under attack from ideologues, special interests and conspiracy theorists who reject the evidence-based consensus in such areas as evolution, climate change, the safety of vaccines and cancer treatment. The replication crisis is different; it is largely an in-house problem with experimental design and statistical analysis.
Refreshingly, other scientists have a pretty good detector for which studies are likely to stand the test of time. In this latest effort, the researchers asked nearly 400 peers to predict which studies would replicate and to what extent the effect sizes would be duplicated. The prediction market got it remarkably right. The study’s authors suggest that scientific journals could tap into the “wisdom of crowds” when deciding how to treat submitted papers with novel results.
“I would have expected results to be more reproducible in these journals,” said John Ioannidis, a professor of medicine at Stanford. He was not involved in this new research, but is closely associated with the issue of reproducibility because of his authorship of an influential and extraordinarily provocative 2005 article with the headline “Why Most Published Research Findings Are False.”
Simine Vazire, a University of California at Davis psychologist who is also active in the reproducibility movement, said the new project’s replication success rate – 10 out of 17 experiments published in Science, and 3 out of 4 published in Nature – “is not OK.” She said, “There’s no reason why the most prestigious journals shouldn’t demand pretty strong evidence,” and added that these experiments would not have been difficult to attempt to replicate before publication.
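For readers who want the counts as percentages, here is a minimal sketch of the arithmetic. The figures are the ones reported in this article; the variable and function names are illustrative only and do not come from the replication project.

```python
# Replication counts as reported in the article: (succeeded, attempted).
# The dictionary layout and names are illustrative, not from the study itself.
results = {"Science": (10, 17), "Nature": (3, 4)}


def rate(succeeded, attempted):
    """Return the share of attempted replications that succeeded."""
    return succeeded / attempted


# Pool both journals to recover the overall figure (13 of 21).
total_succeeded = sum(s for s, _ in results.values())
total_attempted = sum(n for _, n in results.values())

for journal, (s, n) in results.items():
    print(f"{journal}: {s}/{n} = {rate(s, n):.0%}")
print(f"Overall: {total_succeeded}/{total_attempted} = "
      f"{rate(total_succeeded, total_attempted):.0%}")
```

That works out to roughly 59 percent for Science, 75 percent for Nature and 62 percent overall. With only four Nature experiments in the sample, the Nature figure should be read loosely.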
One of the studies that didn’t replicate attempted to study whether self-reported religiosity would change among test subjects who had first been asked to look at an image of the famous Auguste Rodin sculpture “The Thinker.” The study found that people became less religious after exposure to that image.
“Our study in hindsight was outright silly,” said Will Gervais, an associate professor of psychology at the University of Kentucky. Gervais said that his original study oversold a “random flip in the data,” although other parts of his paper did replicate.
The advocates for greater reproducibility believe that publication pressures create an environment ripe for false positives. Scientists need to publish, and journal editors are eager to publish novel, interesting findings.
Brian Nosek, the leader of this latest reproducibility effort, is executive director of the Center for Open Science, a nonprofit that promotes transparency and reproducibility in research. In an interview with The Washington Post, he acknowledged that the focus on false positives comes at a time when science is already under attack from special interests. But he said, “I think the benefits far, far outweigh the risks.”
He went on: “The reason to trust science is because science doesn’t trust itself. We are constantly questioning the basis of our claims and the methods we use to test those claims. That’s why science is so credible.”
Nosek and his allies have drawn heat for their efforts. A major report led by Nosek and published in 2015 in Science found that only about 40 percent of 100 psychology experiments could be replicated (the precise percentage depended on how one defined a successful replication). But that report incited sharp criticism from Harvard psychologist Dan Gilbert and three other researchers, who in a letter to the journal argued that many of the replication experiments didn’t follow the original protocols.
Gilbert and his colleagues argued that, in fact, the results of the Nosek-led project were consistent with psychology experiments being largely replicable.
A statement issued by the journal Science pointed out that all the experiments scrutinized in this latest effort were published before a decision several years ago by Science, Nature and other journals to adopt new guidelines designed to increase reproducibility, in part by greater sharing of data. “Our editorial standards have tightened,” said the statement from Science.
Science deputy editor emeritus Barbara Jasny said in an interview that the failure to replicate studies does not mean the original experiments were faulty, because “there are differences in protocol, there are differences in study samples.” She noted that the journal Science serves an interdisciplinary audience.
“We do judge on more than just technical competence. We look for papers that may have applications in different fields. We look for papers that are important advances in their own field,” she said.
Jasny said it’s important for graduate schools to have uniform methods for teaching students how to design experiments and analyze statistics. She advocates more funding for replication studies.
“You can say, ‘Oh, this is terrible, it didn’t replicate.’ Or you could say, ‘This is the way science works. It evolves. People do more studies,’” Jasny said. “Not every paper is going to be perfect when it comes out.”