Predicting Returns With Text Data
We introduce a new text-mining methodology that extracts sentiment information from news articles to predict asset returns. Unlike more common sentiment scores used for stock return prediction (e.g., those sold by commercial vendors or built with dictionary-based methods), our supervised learning framework constructs a sentiment score that is specifically adapted to the problem of return prediction. Our method proceeds in three steps: 1) isolating a list of sentiment terms via predictive screening, 2) assigning sentiment weights to these words via topic modeling, and 3) aggregating terms into an article-level sentiment score via penalized likelihood. We derive theoretical guarantees on the accuracy of estimates from our model with minimal assumptions. In our empirical analysis, we text-mine one of the most actively monitored streams of news articles in the financial system (the Dow Jones Newswires) and show that our supervised sentiment model excels at extracting return-predictive signals in this context.
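The three steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the estimators, so everything specific here is an assumption made for illustration: the screening rule (keep words whose co-occurrence with positive returns is far from one half), the two-topic mixture fitted by least squares on return ranks, the penalty form, the grid search, and all function and parameter names (`screen_terms`, `estimate_weights`, `score_article`, `alpha`, `kappa`, `lam`). It should be read as one plausible reading of the pipeline on toy data, not as the paper's implementation.

```python
import numpy as np

def screen_terms(counts, returns, alpha=0.1, kappa=2):
    """Step 1 (assumed rule): keep words that co-occur unusually often
    with positive or with negative returns, and that appear in at least
    `kappa` articles."""
    up = returns > 0
    present = counts > 0
    k = present.sum(axis=0)                      # articles containing each word
    f = (present & up[:, None]).sum(axis=0) / np.maximum(k, 1)
    keep = (np.abs(f - 0.5) >= alpha) & (k >= kappa)
    return np.flatnonzero(keep)

def estimate_weights(counts, returns, idx):
    """Step 2 (assumed two-topic model): regress word frequencies on a
    rank-based sentiment proxy p and on 1 - p."""
    d = counts[:, idx].astype(float)
    s = d.sum(axis=1)
    ok = s > 0                                   # drop articles with no screened words
    d, s, r = d[ok], s[ok], returns[ok]
    p = (np.argsort(np.argsort(r)) + 1) / len(r) # return ranks scaled to (0, 1]
    W = np.column_stack([p, 1.0 - p])
    D = d / s[:, None]                           # within-article word frequencies
    O, *_ = np.linalg.lstsq(W, D, rcond=None)    # 2 x m coefficient matrix
    O = np.clip(O, 0.0, None)                    # topics must be non-negative
    O = O / np.maximum(O.sum(axis=1, keepdims=True), 1e-12)
    return O                                     # row 0: positive topic, row 1: negative

def score_article(word_counts, O, lam=1.0, grid=np.linspace(0.01, 0.99, 99)):
    """Step 3 (assumed penalty): maximize the mixture log-likelihood plus
    lam * log(p * (1 - p)) over a grid of sentiment values p."""
    s = word_counts.sum()
    if s == 0:
        return 0.5                               # no sentiment words: neutral score
    mix = np.outer(grid, O[0]) + np.outer(1.0 - grid, O[1])
    loglik = (word_counts * np.log(mix + 1e-12)).sum(axis=1) / s
    penalty = lam * np.log(grid * (1.0 - grid))
    return float(grid[np.argmax(loglik + penalty)])

# Toy data: 6 articles, 4 words (columns: "good", "bad", "the", "rare").
returns = np.array([0.03, 0.02, 0.01, -0.01, -0.02, -0.03])
counts = np.array([[3, 0, 1, 0],
                   [2, 0, 1, 0],
                   [1, 0, 1, 1],
                   [0, 1, 1, 0],
                   [0, 2, 1, 0],
                   [0, 3, 1, 0]])

idx = screen_terms(counts, returns)
O = estimate_weights(counts, returns, idx)
score_pos = score_article(np.array([2, 0]), O)   # article using only "good"
score_neg = score_article(np.array([0, 2]), O)   # article using only "bad"
print(idx, score_pos, score_neg)
```

In this toy example the screening step discards the neutral and the rare word, and the penalty pulls both article scores away from the boundaries toward one half, which is the usual motivation for a shrinkage term when an article contains only a few sentiment words.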
We thank Kangying Zhou and Mingye Yin for excellent research assistance. We benefited from discussions with Torben Andersen, Robert Engle, Timothy Loughran, and Xavier Gabaix, as well as seminar and conference participants at New York University, Yale University, Ohio State University, Hong Kong University of Science and Technology, University of Zurich, AQR, T. Rowe Price, NBER Conference on Big Data: Long-Term Implications for Financial Markets and Firms, the 12th SoFiE Annual Meeting, China International Conference in Finance, Market Microstructure and High Frequency Data Conference at the University of Chicago, the Panel Data Forecasting Conference at USC Dornsife Institute, JHU Carey Finance Conference, SIAM Conference on Financial Mathematics & Engineering, Financial Econometrics and New Finance Conference at Zhejiang University, SAIF International Conference on FinTech, FinTech Symposium in Guanghua School of Management, and the 8th Annual Workshop on Applied Econometrics and Data Science at the University of Liverpool. We gratefully acknowledge the computing support from the Research Computing Center at the University of Chicago. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.
Bryan T. Kelly
The views and opinions are those of the authors and do not necessarily reflect the views of AQR Capital Management, its affiliates, or its employees, do not constitute an offer or solicitation of an offer, or any advice or recommendation, to purchase any securities or other financial instruments, and may not be construed as such.