Beyond Bonferroni: Multiple Testing in Empirical Research
Empirical work in economics routinely tests many hypotheses at once, but applied researchers often lack clear guidance on how to handle the resulting multiplicity. This paper offers a practical guide. We argue that the choice of error criterion (the chance of any false rejection, or the share of false rejections among discoveries) should follow from the structure of the decision the tests inform. Beyond standard corrections like Bonferroni and Holm, we make the case for resampling-based procedures, particularly Romano–Wolf, which exploit dependence among test statistics, the central feature of multiple testing in economic applications. We also recommend hierarchical methods that use causal or logical structure to deliver more powerful inference. By linking these tools to real applications, we provide a clear roadmap, anchored in pre-specification, for credible empirical work.
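As a rough illustration of the standard corrections named above, the following minimal sketch computes Bonferroni- and Holm-adjusted p-values, both of which control the chance of any false rejection. It is not taken from the paper: the function names and the four p-values are hypothetical, and the sketch does not attempt the resampling-based Romano–Wolf procedure.

```python
def bonferroni(pvals):
    """Bonferroni-adjusted p-values: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down adjusted p-values.

    The k-th smallest p-value (k = 1..m) is multiplied by (m - k + 1),
    and a running maximum keeps the adjusted values monotone.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank = 0 for the smallest p-value
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from four tests (illustrative only).
pvals = [0.01, 0.04, 0.03, 0.005]
bonf_adj = bonferroni(pvals)
holm_adj = holm(pvals)

alpha = 0.05
bonf_reject = [p <= alpha for p in bonf_adj]
holm_reject = [p <= alpha for p in holm_adj]
print("Bonferroni rejections:", sum(bonf_reject))
print("Holm rejections:", sum(holm_reject))
```

Holm's adjusted p-values are never larger than Bonferroni's, so Holm rejects at least as many hypotheses at the same level. Neither procedure uses the dependence among test statistics.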
Sebastian Calónico and Sebastian Galiani, "Beyond Bonferroni: Multiple Testing in Empirical Research," NBER Working Paper 34050 (2025), https://doi.org/10.3386/w34050.