Sniff Tests in Economics: Aggregate Distribution of Their Probability Values and Implications for Publication Bias
The increasing demand for rigor in empirical economics has led to the growing use of auxiliary tests (balance, specification, over-identification, placebo, etc.) supporting the credibility of a paper's main results. We dub these "sniff tests" because standards for passing are subjective and rejection is bad news for the author. Sniff tests offer a new window into publication bias since authors prefer them to be insignificant, the reverse of standard statistical tests. Collecting a sample of nearly 30,000 sniff tests across 60 economics journals, we provide the first estimate of their aggregate probability-value (p-value) distribution. For the subsample of balance tests in randomized controlled trials (for which the distribution of p-values is known to be uniform absent publication bias, allowing reduced-form methods to be employed) estimates suggest that 45% of failed tests remain in the "file drawer" rather than being published. For the remaining sample with an unknown distribution of p-values, structural estimates suggest an even larger file-drawer problem, as high as 91%. Fewer significant sniff tests show up in top-tier journals, smaller tables, and more recent articles. We find no evidence of author manipulation other than a tendency to overly attribute significant sniff tests to bad luck.
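The uniformity argument for balance tests can be illustrated with a small back-of-the-envelope calculation. The sketch below is not the paper's estimator. It assumes that a "failed" test means p < alpha, with alpha = 0.05 chosen here purely for illustration, that p-values are exactly uniform absent selection, and that a constant fraction of failed tests goes unpublished while every passing test is published. The function names are hypothetical.

```python
def observed_significant_share(file_drawer, alpha=0.05):
    """Share of published tests with p < alpha when a fraction
    `file_drawer` of failed tests is withheld (assumed selection model)."""
    # Under uniformity, a fraction alpha of all tests fail; only
    # (1 - file_drawer) of those failures reach print.
    kept_significant = alpha * (1.0 - file_drawer)
    # All passing tests (fraction 1 - alpha) are assumed to be published.
    kept_total = kept_significant + (1.0 - alpha)
    return kept_significant / kept_total


def implied_file_drawer(observed_share, alpha=0.05):
    """Invert the relation above: recover the withheld fraction of failed
    tests from the observed share of significant published tests."""
    return 1.0 - observed_share * (1.0 - alpha) / (alpha * (1.0 - observed_share))


if __name__ == "__main__":
    share = observed_significant_share(0.45)
    print(f"observed share of significant tests: {share:.4f}")
    print(f"implied file-drawer fraction: {implied_file_drawer(share):.2f}")
```

Under these assumptions, a file-drawer fraction of 45% would leave roughly 2.8% of published balance tests significant at the 5% level, instead of the 5% expected without selection.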
Digital Object Identifier (DOI): 10.3386/w25058