Sparse Modeling Under Grouped Heterogeneity with an Application to Asset Pricing
Sparse models, though long preferred and pursued by social scientists, can be ineffective or unstable relative to large models, for example, in economic predictions (Giannone et al., 2021). To achieve sparsity for economic interpretation while exploiting big data for superior empirical performance, we introduce a general framework that jointly clusters observations (via new decision trees) and locally selects variables (with Bayesian priors) for modeling panel data with potential grouped heterogeneity. In our Bayesian Clustering Model (BCM), we derive analytical marginal likelihoods as global split criteria to incorporate economic guidance, address parameter and model uncertainties, and prevent overfitting. We apply BCM to asset pricing and estimate uncommon-factor models for data-driven asset clusters and macroeconomic regimes. We find (i) cross-sectional heterogeneity linked to (non-linear interactions of) return volatility, size, and value, (ii) structural changes in factor relevance predicted by market volatility and valuation, and (iii) MKTRF and SMB as common factors and multiple uncommon factors across characteristics-managed-market-timed clusters. BCM helps explain volatility- or size-related anomalies, exploit within-group tests, and mitigate the “factor zoo” problem. Overall, BCM outperforms benchmark common-factor models in pricing and investments in U.S. equities, e.g., attaining out-of-sample cross-sectional R²s exceeding 25% for multiple clusters and tripling the Sharpe ratio of tangency portfolios built from ME-B/M 5 × 5 portfolios.
We thank Rohit Allena (discussant), Doron Avramov, I-Hsuan Ethan Chiang (discussant), Tarun Chordia, John Cochrane, Darrell Duffie, Jianqing Fan, P. Richard Hahn, Cam Harvey, Zhiguo He, Yael Hochberg, Yongmiao Hong, David Hirshleifer, Bob Jarrow, Serhiy Kozak, Sophia Zhengzi Li, David Ng, Markus Pelger, Xiao Qiao, Alberto Rossi, Olivier Scaillet, Michael Sockin, Oleg Sokolinskiy (discussant), Artem Streltsov, Daniel Titman, Fabio Trojani, Junbo Wang (discussant), Dacheng Xiu, Mao Ye, Guofu Zhou, and seminar and conference participants at BlackRock, CityU HK, Columbia, Cornell, CUHK, EasternFA, 3rd Frontiers of Factor Investing Conference, HKU, HUST, KAIST Digital Finance Conference, Macquarie University, MFA, Mid-South DATA Conference, NBER-NSF Seminar on Bayesian Inference in Econometrics and Statistics, NYU Courant, Oxford, Princeton, 2023 Tongji Finance Symposium, Rice, Rutgers Business School, 6th Shanghai Financial Forefront Symposium, Stanford MS&E, SWFA, UCAS, University of Geneva, USC Marshall, UT Austin, Xiamen University, and 2023 XJTLU AI and Big Data in Accounting and Finance Conference for invaluable comments and discussions. This paper subsumes results in manuscripts titled “Uncommon Factors for Bayesian Asset Clusters” and “Uncommon Factors and Asset Heterogeneity in the Cross Section and Time Series.” We thank Ripple’s UBRI for research support as well as Yuanzhi Wang and Qianshu Zhang for outstanding research assistance. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.