TY - JOUR
AU - Manning,Willard G.
AU - Mullahy,John
TI - Estimating Log Models: To Transform or Not to Transform?
JF - National Bureau of Economic Research Technical Working Paper Series
VL - No. 246
PY - 1999
Y2 - November 1999
DO - 10.3386/t0246
UR - http://www.nber.org/papers/t0246
L1 - http://www.nber.org/papers/t0246.pdf
N1 - Author contact info:
Willard G. Manning
University of Chicago
Harris School of Public Policy Studies
1155 East 60th Street, Room 176
Chicago, IL 60637
E-Mail: w-manning@uchicago.edu
John Mullahy
University of Wisconsin-Madison
Dept. of Population Health Sciences
787 WARF, 610 N. Walnut Street
Madison, WI 53726
Tel: 608/265-5410
Fax: 608/263-2820
E-Mail: jmullahy@facstaff.wisc.edu
AB - Data on health care expenditures, length of stay, utilization of health services, consumption of unhealthy commodities, etc. are typically characterized by: (a) nonnegative outcomes; (b) nontrivial fractions of zero outcomes in the population (and sample); and (c) positively skewed distributions of the nonzero realizations. Similar data structures are encountered in labor economics as well. This paper provides simulation-based evidence on the finite-sample behavior of two sets of estimators designed to look at the effect of a set of covariates x on the expected outcome, E(y|x), under a range of data problems encountered in everyday practice: generalized linear models (GLM), a subset of which can simply be viewed as differentially weighted nonlinear least-squares estimators, and those derived from least-squares estimators for ln(y). We consider the first- and second-order behavior of these candidate estimators under alternative assumptions on the data generating processes. Our results indicate that the choice of estimator for models of ln(E(y|x)) can have major implications for empirical results if the estimator is not designed to deal with the specific data generating mechanism. Garden-variety statistical problems (skewness, kurtosis, and heteroscedasticity) can lead to appreciable bias for some estimators or appreciable losses in precision for others.
ER -