Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research

Sebastian Calónico; Sebastian Galiani

doi:10.3386/w34050

Working Paper 34050

Issue Date July 2025

Empirical research in the social and medical sciences frequently involves testing multiple hypotheses simultaneously, increasing the risk of false positives due to chance. Classical multiple testing procedures, such as the Bonferroni correction, control the family-wise error rate (FWER) but tend to be overly conservative, reducing statistical power. Stepwise alternatives like the Holm and Hochberg procedures offer improved power while maintaining error control under certain dependence structures. However, these standard approaches typically ignore hierarchical relationships among hypotheses—structures that are common in settings such as clinical trials and program evaluations, where outcomes are often logically or causally linked. Hierarchical multiple testing procedures—including fixed sequence, fallback, and gatekeeping methods—explicitly incorporate these relationships, providing more powerful and interpretable frameworks for inference. This paper reviews key hierarchical methods, compares their statistical properties and practical trade-offs, and discusses implications for applied empirical research.
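
As a rough illustration of the contrast drawn in the abstract, the following minimal Python sketch compares the Bonferroni correction, the Holm step-down procedure, and a fixed-sequence procedure on a set of p-values. It is not taken from the paper: the function names and the example p-values are hypothetical, and the fixed-sequence function assumes the hypotheses are supplied in a pre-specified priority order.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Reject H_i when p_i <= alpha / m (single-step FWER control)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: compare the k-th smallest p-value (k = 0, 1, ...)
    with alpha / (m - k) and stop at the first non-rejection."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are also retained
    return reject


def fixed_sequence(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Fixed-sequence testing: hypotheses are assumed to be listed in a
    pre-specified order; each is tested at the full alpha, and testing
    stops at the first hypothesis that is not rejected."""
    reject = [False] * len(pvals)
    for i, p in enumerate(pvals):
        if p <= alpha:
            reject[i] = True
        else:
            break  # later hypotheses in the sequence are not tested
    return reject


# Hypothetical p-values, listed in the pre-specified testing order.
example_pvals = [0.012, 0.016, 0.030, 0.20]
print("Bonferroni:    ", bonferroni(example_pvals))
print("Holm:          ", holm(example_pvals))
print("Fixed sequence:", fixed_sequence(example_pvals))
```

With these made-up inputs, Bonferroni rejects one hypothesis, Holm two, and the fixed-sequence procedure three. The extra rejections in the last case depend entirely on the ordering being fixed before the data are seen; a large p-value early in the sequence would block every later test.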

Nothing to disclose. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.