Estimating Log Models: To Transform or Not to Transform?

Willard G. Manning, John Mullahy

NBER Technical Working Paper No. 246
Issued in November 1999
NBER Program(s): Technical Working Papers

Data on health care expenditures, length of stay, utilization of health services, consumption of unhealthy commodities, etc. are typically characterized by: (a) nonnegative outcomes; (b) nontrivial fractions of zero outcomes in the population (and sample); and (c) positively skewed distributions of the nonzero realizations. Similar data structures are encountered in labor economics as well. This paper provides simulation-based evidence on the finite-sample behavior of two sets of estimators designed to look at the effect of a set of covariates x on the expected outcome, E(y|x), under a range of data problems encountered in everyday practice: generalized linear models (GLM), a subset of which can simply be viewed as differentially weighted nonlinear least-squares estimators, and those derived from least-squares estimators for ln(y). We consider the first- and second-order behavior of these candidate estimators under alternative assumptions on the data generating processes. Our results indicate that the choice of estimator for models of ln(E(y|x)) can have major implications for empirical results if the estimator is not designed to deal with the specific data generating mechanism. Garden-variety statistical problems - skewness, kurtosis, and heteroscedasticity - can lead to an appreciable bias for some estimators or appreciable losses in precision for others.
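
The contrast between the two estimator families can be illustrated with a small simulation. The sketch below is not the authors' simulation design; it is a minimal example under assumed settings (a single uniform covariate, lognormal errors whose log-scale variance rises linearly in x, and hypothetical helper names `fit_log_ols` and `fit_log_link_glm`). It compares least squares on ln(y) with one member of the GLM family, a log-link estimator fitted by Poisson pseudo-likelihood. Under these assumptions ln(E(y|x)) is linear in x with slope b1 + 0.3, while E(ln(y)|x) has slope b1.

```python
import numpy as np

# Assumed data generating process (illustrative only, not from the paper):
# ln(y) = b0 + b1*x + eps, eps ~ N(0, sigma2(x)), with sigma2(x) = 0.2 + 0.6*x.
rng = np.random.default_rng(0)
n = 50_000
b0, b1 = 1.0, 0.5

x = rng.uniform(0.0, 1.0, size=n)
sigma2 = 0.2 + 0.6 * x                      # heteroscedastic on the log scale
eps = rng.normal(0.0, np.sqrt(sigma2))
y = np.exp(b0 + b1 * x + eps)

# For lognormal errors, ln E(y|x) = b0 + b1*x + sigma2(x)/2,
# so the slope of ln E(y|x) in x is b1 + 0.5 * 0.6.
true_slope = b1 + 0.5 * 0.6


def fit_log_ols(x, y):
    """Least squares of ln(y) on a constant and x (hypothetical helper)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
    return beta


def fit_log_link_glm(x, y, n_iter=50, tol=1e-10):
    """Log-link GLM fitted by Poisson pseudo-likelihood via Newton steps
    (hypothetical helper; one GLM variance function among several)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.array([np.log(y.mean()), 0.0])     # start at a constant mean
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)                   # first-order condition
        info = X.T @ (X * mu[:, None])           # negative Hessian
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


ols_beta = fit_log_ols(x, y)
glm_beta = fit_log_link_glm(x, y)

print("slope of ln E(y|x) under the assumed DGP:", true_slope)
print("slope from OLS on ln(y):", ols_beta[1])
print("slope from log-link GLM:", glm_beta[1])
```

In this setup the least-squares slope targets the effect of x on E(ln(y)|x), so it would be expected to miss the effect on ln(E(y|x)) whenever the log-scale error variance depends on x, whereas the log-link estimator targets ln(E(y|x)) directly. The precision trade-offs that the abstract mentions are not examined in this sketch.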

Document Object Identifier (DOI): 10.3386/t0246

Published: Manning, Willard G. and John Mullahy. "Estimating Log Models: To Transform Or Not To Transform?," Journal of Health Economics, 2001, v20(4,Jul), 461-494.
