2023-07-31 Mon 21:23

> [!names]- [[Provisional-placeholder-stand-in name]](s)
> - Reproducibility crisis
> - Replication crisis
> - Replicability crisis

Replication is the cornerstone of science; replicability indicates that a found effect exists in the world independently of accidental conditions. Stirrings of a problem with the replicability of research in psychology, medicine, and the social sciences have been percolating since the 1950s, but things got explicit in the 2010s.

## Big events

Here are some chronologically ordered studies and reports that have gotten a lot of attention.

- [[Ref. John Ioannidis 2005 - Why Most Published Research Findings Are False]]
    - In 2005 John Ioannidis published this spicy title, based on mathematical modeling showing that a field that fails to follow a bunch of epistemic best practices will wind up with mostly false research findings.
    - "A research finding is less likely to be true when the studies conducted in a field are smaller; when effect sizes are smaller; when there is a greater number and lesser preselection of tested relationships; where there is greater flexibility in designs, definitions, outcomes, and analytical modes; when there is greater financial and other interest and prejudice; and when more teams are involved in a scientific field in chase of statistical significance."
- [[Ref. Stephane Doyen 2012 - Behavioral Priming- It's All in the Mind, but Whose Mind?]]
    - In 2012 the Doyen study failed to replicate the famous 1996 "elderly walking" study ([[Ref. John Bargh 1996 - Automaticity of social behavior- Direct effects of trait construct and stereotype activation on action]]), in which it was found that "participants for whom an elderly stereotype was primed walked more slowly down the hallway when leaving the experiment than did control participants, consistent with the content of that stereotype." (Bargh)
    - In fact, Doyen found the effect in subjects only if the *experimenter* was led to believe in such an effect: "Our second experiment was aimed at manipulating the beliefs of the experimenters: Half were led to think that participants would walk slower when primed congruently, and the other half was led to expect the opposite. Strikingly, we obtained a walking speed effect, but only when experimenters believed participants would indeed walk slower."
    - [[Related notes]]: see https://journals.sagepub.com/doi/10.1177/1745691612465253 for a nice overview of the state of affairs in 2012.
- [[Ref. Open Science Collaboration (Nosek) 2015 - Estimating the reproducibility of psychological science]]
    - By 2015 the crisis of confidence was real, and the Open Science Collaboration, coordinated by Nosek, released the results of a big collaborative effort to explicitly study replication in psychology by replicating 100 studies from three high-ranking journals. Of the 97 original studies that reported a significant effect, only 36% of the replications found a significant effect.
- [[Ref. Gilbert 2016 - Comment on “Estimating the reproducibility of psychological science”]] made a direct response to OSC 2015.
    - Here they're arguing that the OSC study was basically guilty of a lot of what it implied about the original studies: that they had allowed bias and error to creep in, either because of carelessness or [[Non-alethic incentive]]s, either consciously or not, and that as a result the research is bunk.
    - Gilbert 2016 says that:
        - One: OSC "allowed considerable infidelities [between replication study methods and original study methods] that introduced random error and decreased the replication rate but then compared their results to a benchmark that did not take this error into account."
        - Two: the OSC replications were underpowered; each replication was attempted only once. If they had attempted 35 replications of each study, as a previous study by the same coordinating author did, they would have found a replication rate of 85% rather than 47%.
        - Three: the OSC infidelities were biased. One might assume that the infidelities between the OSC replications and the original studies they intended to replicate were random in their direction of effect. Or one might assume that there was [[Motivated reasoning]] involved and the bias had a tendentious effect. OSC asked original authors whether or not they endorsed the replication protocols; for those replication studies that were endorsed by original authors, the replication rate was 60%, compared to 15% for those that were not. Obviously this could be because authors of shoddy studies didn't want their work tested, but Gilbert points out that it could also be a valid signal about the legitimacy of the replication methodology.
    - So, the net here is kinda funny. In effect they're saying, "No, psychological research is fine and isn't rife with methodological errors. Except this one; this one is rife. Forget this one."
- [[Ref. Timothy Errington 2021 - Investigating the replicability of preclinical cancer biology]]
    - Same kind of thing as [[Ref. Open Science Collaboration (Nosek) 2015 - Estimating the reproducibility of psychological science]], but for cancer biology rather than psychology.
    - 50 experiments from 23 papers were repeated, and there were lots of problems. See https://www.cos.io/rpcb
    - But the authors' main message is something like: researchers routinely and in some cases uniformly use practices that diverge from best practices; we should all cut that shit out.
- [[Ref. Alexander Bird 2018 - Understanding the Replication Crisis as a Base Rate Fallacy]]
    - Not sure how much this was a "big event" (the paper has 59 citations according to Google Scholar as of 2023-08-01 Tue 14:42), but it's a good perspective to have in the mix.
    - Bird says, basically: the bucket from which tested hypotheses are drawn contains overwhelmingly false hypotheses, so when an experimental test concludes that a hypothesis is true, one needs to take the base rate into account and recognize that the probability of it actually being true is lower than whatever the test seems to declare.
    - So for Bird, it's a picture kinda like sifting for gold. You start with a bucket of sand that has a few flecks of gold mixed in. You put that mixture through a process that differentially removes more sand than gold. You keep repeating that. The stuff persisting through iterations of this process isn't guaranteed to be gold, but it's stochastically more likely to be than what you started with — and thus is more valuable. Your process is producing value. Keep doing it. (A toy version of this arithmetic is sketched just below.)
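Ioannidis's modeling and Bird's base-rate point come down to the same Bayes-rule arithmetic, so here is a minimal sketch of it. The prior, power, and alpha values are assumptions for illustration (not numbers from either paper), and treating successive significant results as independent, identically powered filters is a deliberate simplification of the sifting picture.

```python
# Toy Ioannidis/Bird arithmetic: what fraction of "significant" findings are true,
# and how repeated sifting (replication) raises that fraction.
# All numbers below (prior, power, alpha) are illustrative assumptions.

def ppv(prior: float, power: float, alpha: float) -> float:
    """P(hypothesis is true | significant result), by Bayes' rule."""
    true_positives = power * prior            # real effects that test significant
    false_positives = alpha * (1 - prior)     # null effects that test significant anyway
    return true_positives / (true_positives + false_positives)

prior, power, alpha = 0.10, 0.80, 0.05        # assumed: 1 in 10 probed hypotheses is true

p = prior
for n_sifts in range(1, 4):                   # each pass keeps only the significant results
    p = ppv(p, power, alpha)
    print(f"after {n_sifts} significant result(s): P(true) ~ {p:.2f}")
# Prints roughly 0.64, then 0.97, then 1.00: each pass removes proportionally
# more sand (false positives) than gold (true effects).
```

Relatedly, Gilbert's power objection above is about the other error rate: a single replication attempt at modest power will miss plenty of real effects by bad luck alone.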
## Suppose it's true. Why does the research suck?

Wikipedia has a good list of [potential causes](https://en.wikipedia.org/wiki/Replication_crisis#Causes). Here are a few:

- Mediatization, commodification, and politicization of science.
- Publication bias — a null result is boring and doesn't get published, leading to a misleading literature base and biased meta-analyses. (A toy simulation of this filter is sketched after this list.)
- Publish or perish culture — scientists are under a strong [[Non-alethic incentive]] while planning and performing their research: produce results that will get published or lose your career. This leads them to do funky stuff, like skipping replication studies and studies likely to produce a null result, because these are boring and won't get published, and designing experiments with the goal of finding an interesting result rather than finding the truth.
- Shitty reporting — scientists almost never publish all of the information that would be necessary to actually reproduce the experiment.
- Questionable research practices ("QRPs") like HARKing (hypothesizing after results are known), insufficient experimenter blinding, failing to report all of the data, and statistical fallacies.
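To make the publication-bias point above concrete, here is a small simulation sketch (not taken from any of the cited papers): a real but small effect, studied repeatedly with small samples, where only positive, significant results make it into the literature. The effect size, sample size, and number of studies are arbitrary assumptions.

```python
# Toy "significance filter" simulation: a small true effect, many underpowered
# studies, and a journal that only prints positive results with p < .05.
# Effect size, sample sizes, and study count are arbitrary assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n_per_group, n_studies = 0.2, 20, 5000

published = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    t, p = stats.ttest_ind(treatment, control)
    if p < 0.05 and t > 0:                    # only significant positive results get published
        published.append(treatment.mean() - control.mean())

print(f"published {len(published)} of {n_studies} studies")
print(f"true effect: {true_effect}, mean published effect: {np.mean(published):.2f}")
# The surviving studies exaggerate the true effect several-fold, so a
# meta-analysis of the published record alone would be badly biased.
```

That's the sense in which the literature base gets misleading even when nobody is cheating: the filter alone exaggerates effects.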
## Suppose it's not true. Why do many studies fail to replicate, if they're so good?

If the crisis is false, and the research has actually been high quality, then we shouldn't panic ourselves into [[False Crisis Syndrome]]. We should look at why we are producing apparent evidence that our research sucks when really our research doesn't suck. Two big possible reasons:

1. Research about research sucks. See [[1-pub/Ref. Gilbert 2016 - Comment on “Estimating the reproducibility of psychological science”|Ref. Gilbert 2016 - Comment on “Estimating the reproducibility of psychological science”]]. ![[spiderman pointing meme.png]]
2. [[Base rate fallacy]]. Actually, science is hard, and though the process isn't perfect or immediate, epistemic value is slowly accreting. See [[Ref. Alexander Bird 2018 - Understanding the Replication Crisis as a Base Rate Fallacy]].

## So what's going on here? (2023-08-01 Tue)

Three thoughts:

1. A little bit of [[Everyone sucks here (ESH)]] — the scientist as disinterested rational agent, [[Homo Economicus]], is a fantasy, and in reality we see humans behaving in the parochial, chauvinistic, self-interested, self- and other-deceiving ways that they do. Yes, human scientists are motivated (coerced?) by the systems in which they operate to publish found effects, and yes, they're doing all kinds of subtle and not-so-subtle non-[[Alethic]] things to get to that outcome. We should adapt our systems, norms, and practices to account for humans as they are rather than as our idealization projects them to be.
2. Maybe the replication rate isn't as low as e.g. [[Open Science Collaboration (Nosek) 2015 - Estimating the reproducibility of psychological science.pdf]] says, maybe for the reasons that [[1-pub/Ref. Gilbert 2016 - Comment on “Estimating the reproducibility of psychological science”|Ref. Gilbert 2016 - Comment on “Estimating the reproducibility of psychological science”]] says. And maybe the replication rate is lower than we'd want, even given all the fixes Gilbert proposes. And maybe that's okay, for the reasons that [[Ref. Alexander Bird 2018 - Understanding the Replication Crisis as a Base Rate Fallacy]] says.
3. [[Ref. Michael Strevens 2020 - The knowledge machine]] gives a good look at the messy reality of science — in which researchers are routinely taking creative liberties in the design of their experiments and in the interpretation (and dismissal) of their data. In the Strevens picture, science is messy, but the big machine corrects itself and wiggles toward truth over time. It's the best we've got, so let's keep going.

## My takeaway (2023-08-01 Tue 15:25)

So the question is: okay, has the research actually been low-quality, or not? My current judgment is: kinda.

- There are lots of improvements to make to research practices and culture, and making them will result in higher-quality research.
- And, despite its flaws, the research enterprise as a whole is sound and we should keep going.

So where do we go from here? Probably not 'cancel all existing psychological research and reboot the field.' Probably something more like: 'yeah, it's messy; keep going, keep doing metascience to make it better, and [[Integrate your learnings]].'