Estimating Log Models: To Transform or Not to Transform?

Willard G. Manning, John Mullahy

NBER Technical Working Paper No. 246
Issued in November 1999
NBER Program(s): TWP

Data on health care expenditures, length of stay, utilization of health services, consumption of unhealthy commodities, etc. are typically characterized by: (a) nonnegative outcomes; (b) nontrivial fractions of zero outcomes in the population (and sample); and (c) positively skewed distributions of the nonzero realizations. Similar data structures are encountered in labor economics as well. This paper provides simulation-based evidence on the finite-sample behavior of two sets of estimators designed to look at the effect of a set of covariates x on the expected outcome, E(y|x), under a range of data problems encountered in everyday practice: generalized linear models (GLM), a subset of which can simply be viewed as differentially weighted nonlinear least-squares estimators, and those derived from least-squares estimators for ln(y). We consider the first- and second-order behavior of these candidate estimators under alternative assumptions on the data generating processes. Our results indicate that the choice of estimator for models of ln(E(y|x)) can have major implications for empirical results if the estimator is not designed to deal with the specific data generating mechanism. Garden-variety statistical problems (skewness, kurtosis, and heteroscedasticity) can lead to an appreciable bias for some estimators or appreciable losses in precision for others.
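
The contrast drawn in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design. It assumes a single covariate x drawn uniformly on [0, 1], strictly positive outcomes (no zeros), a log-normal error whose log-scale variance s0 + s1*x rises with x, and a Poisson-type variance function for the log-link GLM. All function names and parameter values are hypothetical and chosen only for illustration. Under these assumptions ln(E(y|x)) = b0 + b1*x holds exactly, while E(ln(y)|x) has slope b1 - s1/2, so least squares on ln(y) should recover a slope near 0.6 rather than the target 1.0, and the log-link GLM should recover a slope near 1.0.

```python
import math
import random


def simulate(n, b0, b1, s0, s1, seed):
    """Draw (x, y) with ln E(y|x) = b0 + b1*x and log-scale error variance s0 + s1*x."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.random()
        var = s0 + s1 * x
        # A normal error with mean -var/2 gives E[exp(eps) | x] = 1,
        # so E(y|x) = exp(b0 + b1*x) despite the heteroscedasticity.
        eps = rng.gauss(-0.5 * var, math.sqrt(var))
        xs.append(x)
        ys.append(math.exp(b0 + b1 * x + eps))
    return xs, ys


def ols_log(xs, ys):
    """Least squares of ln(y) on x; returns (intercept, slope)."""
    n = len(xs)
    log_y = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(log_y) / n
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, log_y))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_l - slope * mean_x, slope


def glm_log_link(xs, ys, iters=50, tol=1e-10):
    """Log-link GLM with a Poisson-type variance (assumed here), fit by Newton steps.

    Solves sum((y - mu) * (1, x)) = 0 with mu = exp(a + b*x); returns (a, b).
    """
    a = math.log(sum(ys) / len(ys))  # start at the intercept-only fit
    b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(a + b * x)
            r = y - mu
            g0 += r           # score for the intercept
            g1 += r * x       # score for the slope
            h00 += mu         # entries of the 2x2 information matrix
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


# Hypothetical parameter values: true slope of ln E(y|x) is 1.0.
xs, ys = simulate(n=20000, b0=1.0, b1=1.0, s0=0.1, s1=0.8, seed=12345)
_, ols_slope = ols_log(xs, ys)
_, glm_slope = glm_log_link(xs, ys)
print("slope from OLS on ln(y):", round(ols_slope, 3))
print("slope from log-link GLM:", round(glm_slope, 3))
```

The example isolates only the heteroscedasticity channel. It says nothing about precision, and with homoscedastic log-scale errors (s1 = 0) both slopes would target the same value.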

Document Object Identifier (DOI): 10.3386/t0246

Published: Manning, Willard G. and John Mullahy. "Estimating Log Models: To Transform Or Not To Transform?," Journal of Health Economics, 2001, v20(4,Jul), 461-494.

