Beyond Bonferroni: Hierarchical Multiple Testing in Empirical Research
Empirical research in the social and medical sciences frequently involves testing multiple hypotheses simultaneously, increasing the risk of false positives due to chance. Classical multiple testing procedures, such as the Bonferroni correction, control the family-wise error rate (FWER) but tend to be overly conservative, reducing statistical power. Stepwise alternatives like the Holm and Hochberg procedures offer improved power while maintaining error control under certain dependence structures. However, these standard approaches typically ignore hierarchical relationships among hypotheses—structures that are common in settings such as clinical trials and program evaluations, where outcomes are often logically or causally linked. Hierarchical multiple testing procedures—including fixed sequence, fallback, and gatekeeping methods—explicitly incorporate these relationships, providing more powerful and interpretable frameworks for inference. This paper reviews key hierarchical methods, compares their statistical properties and practical trade-offs, and discusses implications for applied empirical research.