Defining Replication

Generally speaking, we define a replication as "an experiment which is done to test an effect claim made in prior research" (following Nosek & Errington, 2020).

In our replications database we distinguish three types of replications:

direct — This is when a previously published experimental procedure is repeated as closely as possible to see if the same result can be obtained. Of course, there are always some unavoidable differences (different participants, different lab, different time period).

close — An experimental procedure is repeated closely, but with one or more intentional changes. When possible, we also distinguish two subtypes: close experiment and close extension. In a "close experiment", the scientists are testing for the same effect observed in a previous experiment, but they make one or more deliberate changes to the experimental procedure. In a "close extension", the scientists are testing whether an observed effect generalizes to a different setting. Typically, there is a theoretical reason for suspecting the effect will generalize. Some close replications have minor changes to both the experiment and the setting, and therefore can't be neatly categorized as either a close experiment or a close extension.

conceptual — The scientists are testing for the same effect observed in a previous experiment, but using a fundamentally different experimental procedure.

There are no clear boundaries between these categories. Rather, there is a spectrum from direct to conceptual.

Classifying the results of replication experiments

We classify the replication experiment results into four categories:

  • Successful - the new experiment found evidence the effect exists.
  • Inconclusive - the new experiment could not determine one way or another whether the effect exists. We try to avoid this categorization whenever possible.
  • Unsuccessful - the new experiment found evidence that the effect does not exist.
  • Reversal - the new experiment found evidence for the opposite effect.
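The four categories above map naturally onto an enumeration. Below is a minimal sketch in Python. The names `ReplicationOutcome` and `classify` are hypothetical, and the decision rule (reading the outcome off a confidence interval for the replication effect, signed so that the original effect is positive, with an assumed `negligible` band around zero) is an illustrative assumption, not the procedure actually used for the database.

```python
from enum import Enum


class ReplicationOutcome(Enum):
    """The four result categories described above."""
    SUCCESSFUL = "successful"
    INCONCLUSIVE = "inconclusive"
    UNSUCCESSFUL = "unsuccessful"
    REVERSAL = "reversal"


def classify(ci_low: float, ci_high: float, negligible: float = 0.1) -> ReplicationOutcome:
    """Hypothetical rule: classify a replication from a confidence interval.

    Assumes the effect is signed so that the original finding is positive.
    `negligible` is an assumed bound below which an effect is treated as absent.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 0:
        # Whole interval is on the original effect's side of zero.
        return ReplicationOutcome.SUCCESSFUL
    if ci_high < 0:
        # Whole interval is on the opposite side of zero.
        return ReplicationOutcome.REVERSAL
    if -negligible < ci_low and ci_high < negligible:
        # Interval spans zero and is tight enough to rule out a meaningful effect.
        return ReplicationOutcome.UNSUCCESSFUL
    # Interval spans zero but is too wide to rule an effect in or out.
    return ReplicationOutcome.INCONCLUSIVE
```

Under this assumed rule, a wide interval that spans zero is the only case labelled inconclusive, which matches the stated preference for avoiding that category when the data allow a clearer call.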

We extract effect size statistics when possible and try to normalize them to a 0-1 scale. See our page on effect sizes for details.
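This page does not say which normalization is used. One common way to put different effect size statistics on a shared bounded scale is to convert them to a correlation coefficient r, whose absolute value lies between 0 and 1. The sketch below assumes that approach purely for illustration; the function names are hypothetical, and the Cohen's d conversion assumes two groups of equal size.

```python
import math


def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation r (assumes two equal-sized groups)."""
    return d / math.sqrt(d * d + 4.0)


def t_to_r(t: float, df: int) -> float:
    """Convert a t statistic with df degrees of freedom to a correlation r."""
    if df <= 0:
        raise ValueError("df must be positive")
    # The magnitude comes from t^2 / (t^2 + df); the sign follows t.
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


def normalized_magnitude(r: float) -> float:
    """Map a signed correlation onto a 0-1 scale by taking its absolute value."""
    return abs(r)
```

Keeping the signed r alongside its magnitude is useful here, because the sign is what separates a successful replication from a reversal.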

Further Discussion

Figure (replication terms diagram): 49 replication terms used in the literature to distinguish types of replication. Most of these come from a 2010 survey by Gómez et al. The intended meanings of many of these terms overlap.

As we mentioned above, replication is a spectrum, ranging from direct replication to conceptual replication. We consider any experiment on that spectrum to be a replication. People have come up with many different terms:

"Technical" replication (also called "robustness checking") - where a new experiment is not done, but raw data from an existing experiment is reanalyzed using the reported procedures. It may also involve simply running the provided code on the data to reproduce the results (this is called "frictionless reproduction").

"Exact", "direct", or "narrow-sense" replication - where an experimental procedure is repeated as closely as possible, usually following the specifications for the procedure given in the original paper. This is the most common understanding of the term "replication".

"Close" or "systematic" replication - where an experimental procedure is repeated closely, but with one or more intentional changes.

"Conceptual" or "broad-sense" replication - where a finding from a previous experiment is tested in a new experiment using a different experimental procedure.

Re-analyses of previously published data are not replications

A replication must involve a new experiment. Therefore, we do not consider re-analyses of previously published data to be replications. We note that some previous authors have called the reanalysis of data "technical replication". Currently, our database does not contain technical replications, although it may in the future.

The discovery of mistakes in prior analyses is not a "replication failure"

We are interested in the discovery of mistakes, but we consider them "errata", not replication failures.

"Original experiment" vs "replication experiment"

For direct and close replications, the authors almost always reference a specific experiment they were trying to replicate or extend. For conceptual replications, the situation can be murkier, as the experiment may be testing for an effect that was found in several previously published experiments. In such cases, we generally view the earliest published experiment as the original experiment.

We are interested in effects, not papers

We believe the proper level of analysis is effects, not papers. While scientific papers often have one central claim, many papers report multiple effects. A replication paper may replicate some of those effects and fail to replicate others.

Theorizing is the heart and soul of science

The act of taking down experimental measurements by itself does not constitute science any more than collecting stamps or watching birds could be considered doing science. The heart and soul of science is abstracting away from individual observations to develop theories that have predictive power. Science progresses through better theories - more elegant, more general, and more precise. Theorizing is the beating heart of science, and to understand the health of science we must understand how good a job scientists are doing at theorizing. Today, many areas of science are shockingly atheoretical, to the point that we question the degree to which they are science at all. Of course, experimentation helps to inform and inspire theorizing, but all experiments themselves are theory-laden. Whether the observations of a particular experiment can be replicated is interesting, but more interesting is whether theories hold up to repeated experimental assaults.

Therefore, we are most interested in the replicability of the general effects that scientists claim on the basis of their theories, not whether a particular narrow observation can be replicated. Consider a clinical trial on Prozac. In a narrow sense, all the experiment may have shown is that "Prozac helps with depression in people diagnosed with depression by a clinician at a Houston-area hospital system in the years 2010-2015". Based on a theoretical understanding of how the brain works, however, the authors claim their work gives strong evidence for the general effect that "Prozac helps with depression for all humans". Other examples of general claims are "neutrinos can travel faster than light", "the MMR vaccine causes autism", "power posing increases success in job interviews", and "water can exist in a polymerized form" (all of those claims failed replication and are now considered false).