NATIONAL BUREAU OF ECONOMIC RESEARCH, INC.

SI 2013 Econometrics Lectures: Econometric Methods for High-Dimensional Data

Victor Chernozhukov, Matthew Gentzkow, Christian Hansen, Jesse Shapiro, Matthew Taddy

July 15-16, 2013

 

Hotel Marlowe

25 Edwin H. Land Boulevard

Cambridge, Massachusetts

PROGRAM

 
Monday, July 15:

8:15 am

Coffee and pastries

8:45 am

Matthew Taddy, University of Chicago
Prediction with high-dimensional data (1) (slides)
Metrics: False discovery, predictive performance, model probabilities
Model choice: FDR control, cross-validation, information criteria
Algorithms for finding candidate models, forward stepwise regression

10:15 am

Break


10:45 am


Matthew Taddy, University of Chicago
Prediction with high-dimensional data (2)
Penalized maximum likelihood and sparse regularization
Lasso model fit and penalty selection
Properties and performance of regularization methods

12:15 pm

Lunch


1:30 pm


Matthew Taddy, University of Chicago
Introduction to factor models
Principal components analysis and regression
Selection for factor models


2:30 pm


Break

2:45 pm

Matthew Gentzkow, University of Chicago and NBER
Jesse Shapiro, University of Chicago and NBER
Applications: Using text as data (slides)
* Partisanship in the news media
* Sentiment in financial news
* Topic modeling

4:45 pm

Adjourn

 

 

Tuesday, July 16:

8:15 am

Coffee and pastries

8:45 am

Victor Chernozhukov, Massachusetts Institute of Technology (slides)
Estimating treatment effects with high-dimensional data (1): Selecting instruments


10:15 am


Break

10:30 am

Victor Chernozhukov, Massachusetts Institute of Technology
Estimating treatment effects with high-dimensional data (2): Selecting control variables

12:00 pm

Lunch


1:30 pm


Christian Hansen, University of Chicago
Applications: Estimating treatment effects with high-dimensional data (slides)
* Selecting control variables
* Modeling trends in panel differences-in-differences
* Selecting from a large set of randomized instruments


3:30 pm


Break


3:45 pm


Matthew Gentzkow, University of Chicago and NBER
Jesse Shapiro, University of Chicago and NBER
Nuts and bolts: Computing with large data (slides)
* Collaborative computing
* Databases
* Distributed computing

5:15 pm

Adjourn



REFERENCES

Prediction:

Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition. http://www-stat.stanford.edu/~tibs/ElemStatLearn/

Text Mining:

Taddy, Matthew. 2013. “Multinomial Inverse Regression for Text Analysis.” Journal of the American Statistical Association, forthcoming. http://arxiv.org/abs/1012.2098

Blei, David M., and John D. Lafferty. 2007. “A Correlated Topic Model of Science.” The Annals of Applied Statistics 1 (1) (June 1): 17–35. http://www.jstor.org/stable/4537420

Treatment Effects:

Belloni, Alexandre, Victor Chernozhukov and Christian Hansen. “Inference for High-Dimensional Sparse Econometric Models.” Advances in Economics and Econometrics, 10th World Congress of the Econometric Society, 2010. http://arxiv.org/pdf/1201.0220v1.pdf


Belloni, Alexandre, Daniel Chen, Victor Chernozhukov and Christian Hansen. 2012. “Sparse Models and Methods for Optimal Instruments with an Application to Eminent Domain.” Econometrica 80(6): 2369–2430. http://onlinelibrary.wiley.com/doi/10.3982/ECTA9626/pdf

Belloni, Alexandre, Victor Chernozhukov and Christian Hansen. 2013. “Inference on treatment effects after selection amongst high-dimensional controls.” CEMMAP. http://www.cemmap.ac.uk/wps/cwp261313.pdf



Leeb, Hannes and Benedikt M. Pötscher. 2008. “Can one estimate the unconditional distribution of post-model-selection estimators?” Econometric Theory 24(2): 338–376. http://dx.doi.org/10.1017/S0266466608080158

Code and Data:

Gentzkow, Matthew and Jesse Shapiro. 2013. Code and Data for the Social Sciences: A Practitioner’s Guide. http://faculty.chicagobooth.edu/jesse.shapiro/research/CodeAndData.pdf

LINKS TO CODE FOR EXAMPLES USED IN LECTURES

 

Prediction
 
Semiconductor examples: http://faculty.chicagobooth.edu/matt.taddy/teaching/scripts/semiconductors.R
Class on factor analysis: http://faculty.chicagobooth.edu/matt.taddy/teaching/scripts/07Factors.R
All data and code are on Matthew Taddy's teaching page: http://faculty.chicagobooth.edu/matt.taddy/teaching
Penalized regression software (gamlr): http://www.cran.r-project.org/web/packages/gamlr

 

Treatment Effects


http://www.mit.edu/~vchern/#veryhigh
http://faculty.chicagobooth.edu/christian.hansen/research/#Code