Effect Size Types and Their Normalization
| Effect Size Type | Abbreviation in Database | Convertible to $r$? |
|---|---|---|
| Cohen's d | d | Yes |
| Hedges' g | g | Yes |
| Odds Ratio | OR | Yes |
| Hazard Ratio | HR | Yes |
| Risk Ratio | RR | Yes (approximate) |
| Eta Squared | etasq | Yes |
| Partial Eta Squared | partial etasq | Yes (approximate) |
| Cohen's f | f | Yes |
| Cohen's f² | f² | Yes |
| R Squared | R² | Yes |
| Phi Coefficient | phi | Yes |
| Pearson Correlation | r | Yes (already $r$) |
| t-test | t | Yes |
| F-test | F | Yes |
| z-test | z | Yes |
| Chi-squared | χ² | Yes |
| Incidence Rate Difference | IRD | No |
| Glass' delta | Glass' delta | No |
| Cliff's delta | Cliff's delta | No |
| Cohen's w | w | No |
| Regression coefficient (standardized) | β | No |
| Regression coefficient (unstandardized) | b | No |
| Probability Difference | PD | No |
| Cohen's dz (paired) | dz | No |
| Log Ratio of Means (signed) | log ROM | No |
| Spearman's rank correlation | Spearman's r | No |
The Metascience Observatory's replications database contains a wide variety of reported effect size types. To achieve commensurability between these types, we convert them into an equivalent or approximate Pearson correlation coefficient ($r$) when possible, placing effect magnitudes on a common 0 to 1 scale. Not all effect size types can be converted this way, but many can.
To consistently show reversals in effect magnitude as negatives, we always report the original effect as being positive. The replication effect sizes are then coded with a sign reflecting whether they match the original direction (positive) or reverse it (negative).
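This sign-coding rule can be sketched as a small helper (a hypothetical illustration; the database's actual implementation may differ):

```python
def code_pair(original_r, replication_r):
    """Hypothetical sketch of the sign-coding rule: the original effect is
    recorded as a positive magnitude, and the replication keeps a sign
    reflecting whether it matches the original direction."""
    same_direction = (original_r >= 0) == (replication_r >= 0)
    coded_original = abs(original_r)
    coded_replication = abs(replication_r) if same_direction else -abs(replication_r)
    return coded_original, coded_replication
```

For example, an original $r = -0.30$ replicated at $r = -0.20$ is coded as (0.30, 0.20), while a reversal to $r = +0.20$ would be coded as (0.30, −0.20).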
Cohen's d
Cohen's $d$ gives a standardized measure of the difference between two groups' means (Cohen, 1988). It is defined as:

$$d = \frac{M_1 - M_2}{s_p}$$

Where:
- $M_1, M_2$: The means of the two groups.
- $s_p$: The pooled standard deviation of the two groups.
Normalization to 0–1 Scale (Conversion to $r$)
The standard conversion formula used is (Borenstein et al., 2009, p. 48):

$$r = \frac{d}{\sqrt{d^2 + a}}, \quad a = \frac{(n_1 + n_2)^2}{n_1 n_2}$$

Note: If sample sizes are equal ($n_1 = n_2$), then $a = 4$ and this simplifies to the commonly seen approximation $r = d / \sqrt{d^2 + 4}$ (Cohen, 1988).
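In code, the Borenstein et al. (2009) formula might look like this (a sketch; the function name is illustrative):

```python
import math

def d_to_r(d, n1, n2):
    """Convert Cohen's d to r using the correction term
    a = (n1 + n2)^2 / (n1 * n2) (Borenstein et al., 2009, p. 48)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)
```

With $n_1 = n_2$, the term $a$ equals 4, recovering the familiar $r = d / \sqrt{d^2 + 4}$.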
Hedges' g
Hedges' $g$ is a bias-corrected version of Cohen's $d$ that adjusts for the slight upward bias of $d$ in small samples (Hedges, 1981). It is defined as:

$$g = J \cdot d$$

Where:
- $d$: Cohen's $d$.
- $J$: The correction factor, $J = 1 - \frac{3}{4\,df - 1}$, where $df = n_1 + n_2 - 2$.
Normalization to 0–1 Scale (Conversion to $r$)
Because $g$ is on the same scale as $d$, the same conversion formula is used (Borenstein et al., 2009, p. 48):

$$r = \frac{g}{\sqrt{g^2 + a}}, \quad a = \frac{(n_1 + n_2)^2}{n_1 n_2}$$

Note: If sample sizes are equal ($n_1 = n_2$), this simplifies to $r = g / \sqrt{g^2 + 4}$.
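The correction and conversion together might be sketched as (illustrative function names, not the database's actual code):

```python
import math

def hedges_g(d, n1, n2):
    """Apply Hedges' small-sample correction J = 1 - 3 / (4*df - 1),
    with df = n1 + n2 - 2, to Cohen's d (Hedges, 1981)."""
    df = n1 + n2 - 2
    correction = 1 - 3 / (4 * df - 1)
    return correction * d

def g_to_r(g, n1, n2):
    """Convert g to r with the same formula used for d."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return g / math.sqrt(g ** 2 + a)
```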
Odds Ratio (OR)
The Odds Ratio measures the association between an exposure and an outcome, representing the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring in the absence of that exposure. It is defined as:

$$OR = \frac{p_1 / (1 - p_1)}{p_2 / (1 - p_2)}$$

Where:
- $p_1$: The probability of the event in the first group (e.g., treatment group).
- $p_2$: The probability of the event in the second group (e.g., control group).
Normalization to 0–1 Scale (Conversion to $r$)
This is a two-step process where the Log Odds Ratio is first converted to Cohen's $d$, and then to $r$ (Chinn, 2000):
- Convert $OR$ to $d$: $d = \ln(OR) \cdot \frac{\sqrt{3}}{\pi}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
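The two steps chained together might look like this (a sketch, assuming the equal-n form of the $d$-to-$r$ formula):

```python
import math

def odds_ratio_to_r(odds_ratio):
    """Two-step conversion: log(OR) -> d via Chinn (2000), then
    d -> r with the approximation r = d / sqrt(d^2 + 4)."""
    d = math.log(odds_ratio) * math.sqrt(3) / math.pi
    return d / math.sqrt(d ** 2 + 4)
```

An OR of 1 (no effect) maps to $r = 0$, and ORs below 1 map to negative $r$.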
Hazard Ratio (HR)
The Hazard Ratio is a measure of effect size commonly used in survival analysis (e.g., Cox proportional hazards regression). It represents the ratio of the hazard rates between two groups over time:

$$HR = \frac{h_1(t)}{h_2(t)}$$

Where:
- $h_1(t)$: The hazard rate in the first group (e.g., treatment group) at time $t$.
- $h_2(t)$: The hazard rate in the second group (e.g., control group) at time $t$.
Normalization to 0–1 Scale (Conversion to $r$)
The Hazard Ratio is converted using the same formula as the Odds Ratio. This approximation is most accurate when the event rate is low (< 10–15%) or follow-up time is short, conditions under which HR ≈ OR (Chinn, 2000):
- Convert $HR$ to $d$: $d = \ln(HR) \cdot \frac{\sqrt{3}}{\pi}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
Note: This conversion is an approximation. For common events or long follow-up periods, HR and OR can diverge, making the conversion less precise.
Risk Ratio (RR)
The Risk Ratio (also called Relative Risk) measures the ratio of the probability of an event occurring in an exposed group versus the probability in an unexposed group. It is commonly estimated from cohort studies or count-based regression models (e.g., Poisson or negative binomial regression). It is defined as:

$$RR = \frac{p_1}{p_2}$$

Where:
- $p_1$: The probability (or rate) of the event in the first group (e.g., exposed group).
- $p_2$: The probability (or rate) of the event in the second group (e.g., unexposed group).
Normalization to 0–1 Scale (Conversion to $r$)
The Risk Ratio is converted using the same log-based formula as the Odds Ratio and Hazard Ratio. This approximation is most accurate when event rates are low, a condition under which RR ≈ OR.
- Convert $RR$ to $d$: $d = \ln(RR) \cdot \frac{\sqrt{3}}{\pi}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
Note: When event rates are high, RR and OR diverge (RR is always closer to 1.0 than OR for the same data), making the conversion less precise. For rare events (< 10%), RR ≈ OR and the approximation is good.
Eta Squared ($\eta^2$)
Eta squared is a measure of effect size in analysis of variance (ANOVA) that represents the proportion of total variance in the dependent variable that is associated with the membership of different groups defined by an independent variable (Cohen, 1988). It is defined as:

$$\eta^2 = \frac{SS_{\text{effect}}}{SS_{\text{total}}}$$

Where:
- $SS_{\text{effect}}$: The sum of squares for the effect (between-groups).
- $SS_{\text{total}}$: The total sum of squares.
Normalization to 0–1 Scale (Conversion to $r$)
The conversion is a two-step process, first converting to Cohen's $d$, then to $r$ (Cohen, 1988):
- Convert $\eta^2$ to $d$: $d = 2\sqrt{\frac{\eta^2}{1 - \eta^2}}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
Note: This is algebraically equivalent to $r = \sqrt{\eta^2}$, but the code implements the two-step conversion.
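The two-step chain might be sketched as follows (illustrative only; the algebraic shortcut would give the same result):

```python
import math

def eta_squared_to_r(eta_sq):
    """Two-step conversion: eta^2 -> d -> r; algebraically this
    reduces to sqrt(eta_sq)."""
    d = 2 * math.sqrt(eta_sq / (1 - eta_sq))
    return d / math.sqrt(d ** 2 + 4)
```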
Partial Eta Squared ($\eta_p^2$)
Partial eta squared is a variant of eta squared commonly reported by statistical software (e.g., SPSS) in factorial ANOVA designs. Unlike eta squared, which divides by the total sum of squares, partial eta squared divides only by the sum of squares for the effect plus the error sum of squares, excluding variance attributable to other factors in the design:

$$\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}$$

Where:
- $SS_{\text{effect}}$: The sum of squares for the effect of interest.
- $SS_{\text{error}}$: The sum of squares for the error term.
Normalization to 0–1 Scale (Conversion to $r$)
The same conversion formula used for eta squared is applied:
- Convert $\eta_p^2$ to $d$: $d = 2\sqrt{\frac{\eta_p^2}{1 - \eta_p^2}}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
This is algebraically equivalent to $r = \sqrt{\eta_p^2}$.
Important Caveats: For effects with 1 numerator degree of freedom (i.e., two-group comparisons, which cover most replication studies), partial eta squared equals eta squared and the conversion is exact. In multi-factor ANOVA designs with more than 1 numerator df, partial eta squared removes the variance of other factors from the denominator, so the resulting $r$ can be inflated compared to what a one-way design would yield. However, the Cambridge MRC Cognition and Brain Sciences Unit statistics wiki says that one can "convert a partial eta-squared to a Cohen's d by regarding the partial eta-squared as a squared correlation." At least for direct replication comparisons — where both the original and replication use the same design — this conversion is appropriate because any inflation applies equally to both studies, preserving the relative comparison.
Cohen's f
Cohen's $f$ is an effect size measure commonly used in the context of F-tests (ANOVA) and regression, representing the dispersion of means relative to the standard deviation (Cohen, 1988). It is defined as:

$$f = \sqrt{\frac{\eta^2}{1 - \eta^2}}$$

Where:
- $\eta^2$: Eta squared (the proportion of variance explained).
Normalization to 0–1 Scale (Conversion to $r$)
The conversion is a two-step process (Cohen, 1988):
- Convert $f$ to $d$: $d = 2f$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
Note: This is algebraically equivalent to $r = \frac{f}{\sqrt{f^2 + 1}}$.
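As a sketch of the two-step chain (illustrative function name):

```python
import math

def f_to_r(f):
    """Two-step conversion: f -> d via d = 2f, then d -> r;
    algebraically this reduces to f / sqrt(f^2 + 1)."""
    d = 2 * f
    return d / math.sqrt(d ** 2 + 4)
```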
Cohen's f² ($f^2$)
Cohen's $f^2$ is the squared version of Cohen's $f$, commonly used in regression contexts to measure effect size (Cohen, 1988). It is defined as:

$$f^2 = \frac{R^2}{1 - R^2}$$

Where:
- $R^2$: The coefficient of determination.
Normalization to 0–1 Scale (Conversion to $r$)
The conversion is a two-step process:
- Convert $f^2$ to $d$: $d = 2\sqrt{f^2}$
- Convert $d$ to $r$: $r = \frac{d}{\sqrt{d^2 + 4}}$
R Squared ($R^2$)
$R^2$ (the coefficient of determination) represents the proportion of the variance of a dependent variable that is explained by an independent variable or variables in a regression model. It is defined as:

$$R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}$$

Where:
- $SS_{\text{res}}$: The sum of squares of residuals (unexplained variance).
- $SS_{\text{tot}}$: The total sum of squares (total variance).
Normalization to 0–1 Scale (Conversion to $r$)
The database normalizes this value by simply taking the square root:

$$r = \sqrt{R^2}$$
Phi Coefficient ($\phi$)
The Phi coefficient is a measure of association for two binary variables (Cramér, 1946). For a $2 \times 2$ contingency table with cell frequencies $a$, $b$, $c$, $d$, it is defined as:

$$\phi = \frac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$$

Where:
- $a, b, c, d$: The frequencies in the contingency table.
Normalization to 0–1 Scale (Conversion to $r$)
No conversion is needed for the Phi coefficient, as it is already equivalent to the Pearson correlation coefficient calculated for binary data.
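Computing phi directly from a 2×2 table might look like this (a sketch; the result is used as $r$ with no further conversion):

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi for a 2x2 contingency table with cells [[a, b], [c, d]].
    Equivalent to Pearson's r computed on the binary codings."""
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator
```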
Pearson Correlation ($r$)
The Pearson correlation coefficient measures the linear correlation between two sets of data (Pearson, 1895). It is defined as:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

Where:
- $x_i, y_i$: Individual sample points.
- $\bar{x}, \bar{y}$: The sample means.
Normalization to 0–1 Scale
This metric serves as the target scale for the database, so no conversion is needed. As mentioned above, to maintain the "0 to 1" magnitude scale required by the database's coding scheme, original effect sizes are taken as their absolute value:

$$r_{\text{original}} = |r|$$
Test Statistics
The database can also convert APA-formatted test statistics directly to $r$ (Rosenthal, 1991; Borenstein et al., 2009).
t-test
Format: t(df) = value (e.g., t(10) = 2.5)
Conversion to $r$:

$$r = \frac{t}{\sqrt{t^2 + df}}$$
Sign is preserved (negative t produces negative r).
F-test (df1 = 1 only)
Format: F(df1, df2) = value (e.g., F(1, 20) = 4.5)
Constraint: Only convertible when df1 = 1.
Conversion to $r$:
- Convert $F$ to $t$: $t = \sqrt{F}$
- Convert $t$ to $r$: $r = \frac{t}{\sqrt{t^2 + df_2}}$
Always positive (F-tests are non-directional).
z-test
Format: z = value, N = value (e.g., z = 2.81, N = 34)
Conversion to $r$:

$$r = \frac{z}{\sqrt{N}}$$
Sign is preserved.
Chi-squared (df = 1 only)
Format: χ2(1, N = value) = value or x2(1, N = value) = value (e.g., χ2(1, N = 12) = 5)
Constraint: Only convertible when df = 1.
Conversion to $r$:

$$r = \sqrt{\frac{\chi^2}{N}}$$
Always positive.
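The four test-statistic conversions above can be sketched as a single parser. This is a hypothetical illustration of the parsing and conversion logic; the regexes and function name are assumptions, not the database's actual code:

```python
import math
import re

def apa_to_r(stat_string):
    """Sketch: parse an APA-formatted test statistic and convert it to r.
    Returns None for non-convertible cases (F with df1 != 1,
    chi-squared with df != 1, or unrecognized input)."""
    s = stat_string.replace(" ", "")
    m = re.match(r"t\((\d+)\)=(-?[\d.]+)$", s)
    if m:  # t-test: sign of t is preserved
        df, t = int(m.group(1)), float(m.group(2))
        return t / math.sqrt(t ** 2 + df)
    m = re.match(r"F\((\d+),(\d+)\)=([\d.]+)$", s)
    if m:  # F-test: convert to t = sqrt(F) when df1 == 1; always positive
        df1, df2, f_value = int(m.group(1)), int(m.group(2)), float(m.group(3))
        if df1 != 1:
            return None
        t = math.sqrt(f_value)
        return t / math.sqrt(t ** 2 + df2)
    m = re.match(r"z=(-?[\d.]+),N=(\d+)$", s)
    if m:  # z-test: sign preserved
        z, n = float(m.group(1)), int(m.group(2))
        return z / math.sqrt(n)
    m = re.match(r"[χx]2\((\d+),N=(\d+)\)=([\d.]+)$", s)
    if m:  # chi-squared: only df = 1; always positive
        df, n, chi_sq = int(m.group(1)), int(m.group(2)), float(m.group(3))
        if df != 1:
            return None
        return math.sqrt(chi_sq / n)
    return None
```

For example, `apa_to_r("t(10) = 2.5")` applies $r = t/\sqrt{t^2 + df}$, while `apa_to_r("F(2, 20) = 4.5")` returns `None` because $df_1 \neq 1$.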
Glass' delta
Glass's $\Delta$ (delta) is a standardized mean difference that uses only the control group's standard deviation as the denominator, rather than the pooled SD used by Cohen's $d$ (Glass, 1976). It is defined as:

$$\Delta = \frac{M_1 - M_2}{s_{\text{control}}}$$

Where:
- $M_1, M_2$: The means of the two groups.
- $s_{\text{control}}$: The standard deviation of the control group only.
Why not converted to $r$: The standard $d$-to-$r$ conversion assumes a pooled standard deviation. Using only one group's SD introduces an asymmetry that makes the conversion unreliable without additional information about the ratio of the group variances.
Cliff's delta
Cliff's $\delta$ is a non-parametric effect size that measures the degree of overlap between two distributions (Cliff, 1993). It represents the probability that a randomly selected observation from one group is larger than a randomly selected observation from the other, minus the reverse probability:

$$\delta = \frac{\#(x_i > y_j) - \#(x_i < y_j)}{n_1 n_2}$$

Where:
- $x_i$: Observations from group 1.
- $y_j$: Observations from group 2.
- $n_1, n_2$: The sample sizes of the two groups.
- $\#(x_i > y_j)$: The count of all pairwise comparisons where $x_i$ exceeds $y_j$.
Range: $-1$ to $+1$, where $0$ indicates complete overlap.
Why not converted to $r$: Cliff's delta is a non-parametric, ordinal-level measure with no distributional assumptions. Converting it to Pearson's $r$ (a parametric measure) would require assumptions about the underlying distributions that the statistic was specifically designed to avoid.
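The pairwise definition can be computed directly; a minimal sketch (the statistic is reported in its own metric, not converted to $r$):

```python
def cliffs_delta(xs, ys):
    """Cliff's delta over all pairwise comparisons:
    (#(x > y) - #(x < y)) / (n1 * n2)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```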
Cohen's w
Cohen's $w$ is an effect size measure for chi-squared tests of goodness-of-fit or independence (Cohen, 1988). It quantifies the discrepancy between observed and expected proportions:

$$w = \sqrt{\sum_{i=1}^{k} \frac{(p_{1i} - p_{0i})^2}{p_{0i}}}$$

Where:
- $p_{1i}$: The observed (or alternative hypothesis) proportion in category $i$.
- $p_{0i}$: The expected (or null hypothesis) proportion in category $i$.
- $k$: The number of categories.
Why not converted to $r$: Cohen's $w$ applies to multi-category frequency comparisons and does not map onto the two-variable linear association that Pearson's $r$ measures. While $w = \phi$ in the special case of a $2 \times 2$ table, the general case involves tables of arbitrary size.
Spearman's rank correlation
Spearman's $\rho$ (rho) measures the monotonic relationship between two variables using their ranks rather than raw values (Spearman, 1904). For $n$ paired observations with no tied ranks, it is computed as:

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

Where:
- $d_i$: The difference between the ranks of the $i$-th paired observation.
- $n$: The number of paired observations.
Range: $-1$ to $+1$, identical to Pearson's $r$.
Why not converted to $r$: Although Spearman's $\rho$ is on the same numerical scale as Pearson's $r$, it measures monotonic (not linear) association and is computed on ranks rather than raw values. Treating it as interchangeable with Pearson's $r$ in meta-analytic comparisons would conflate two distinct constructs.
Non-Convertible Effect Sizes
The following effect sizes cannot be reliably converted to $r$ and thus will not have an entry computed for the replication_es_r and original_es_r columns:
- Incidence Rate Difference (IRD) — raw percentage-point differences between groups, on a scale of roughly −100 to +100, incompatible with the standardized 0–1 scale
- Cramér's V
- Cohen's h
- Cohen's $d_z$ (standardized mean difference for paired designs)
- Cliff's delta
- Cohen's w
- Regression coefficients ($\beta$, $b$)
- Semi-partial correlations ($sr$)
- Chi-squared with df > 1
- Percentages
References
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Introduction to meta-analysis. John Wiley & Sons.
Chinn, S. (2000). A simple method for converting an odds ratio to effect size for use in meta-analysis. Statistics in Medicine, 19(22), 3127–3131.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114(3), 494–509.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Cramér, H. (1946). Mathematical methods of statistics. Princeton University Press.
Glass, G. V. (1976). Primary, secondary, and meta-analysis of research. Educational Researcher, 5(10), 3–8.
Hedges, L. V. (1981). Distribution theory for Glass's estimator of effect size and related estimators. Journal of Educational Statistics, 6(2), 107–128.
Pearson, K. (1895). Notes on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58, 240–242.
Rosenthal, R. (1991). Meta-analytic procedures for social research (Rev. ed.). Sage Publications.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101.