Hedonic Regressions: A Consumer Theory Approach

Erwin Diewert [1], April 22, 2001
Department of Economics and NBER, University of British Columbia, Vancouver, B.C., Canada, V6T 1Z1.
Email: diewert@econ.ubc.ca

Abstract

A hedonic regression regresses the price of various models of a product (or service) on the characteristics that describe the product. The existing economic theory that justifies a hedonic regression is extremely complex. The present paper takes a very simple consumer theory approach in order to justify a family of functional forms for a hedonic regression. The main simplifying assumption is that every consumer has the same hedonic utility function, which describes how consumers evaluate alternative models with different characteristics. This hedonic utility function is assumed to be separable from other goods, which is the second main simplifying assumption. The paper also examines alternative functional forms for the hedonic utility function from the viewpoint of their flexibility properties; i.e., how well they can approximate arbitrary functional forms. The paper notes that hedonic regressions that regress the model price on a linear function of the characteristics are not consistent with the consumer approach adopted in the paper. Finally, the paper compares traditional statistical agency matched model techniques for dealing with quality change with the hedonic regression approach and indicates under what conditions the two approaches are likely to coincide.

Key Words

Hedonic regression, flexible functional forms, consumer theory, characteristics, quality change, matched models, consumer price index.

Journal of Economic Literature Classification System Numbers

C23, C43, C51, D11, D12, E31

1. Introduction

This paper started out as a comment on Silver and Heravi (2001).
This very useful and interesting paper follows in the tradition started by Silver (1995), who was the first to use scanner data in a systematic way in order to construct index numbers. In the present paper by Silver and Heravi, the authors collect an enormous data set on virtually all sales of washing machines in the U.K. for the 12 months in the year 1998. They use this detailed price and quantity information, along with information on the characteristics of each machine, in order to compute various aggregate monthly price indexes for washing machines, taking into account the problems associated with the changing quality of washing machines. In particular, the authors consider three broad types of approach to the estimation of quality adjusted prices using scanner data:

• the usual time series dummy variable hedonic regression technique that does not make use of quantity data on sales of models;
• matched model techniques where unit values of matched models in each of the two periods being compared are used as the basic prices to go along with the quantities sold in each period (and then ordinary index number theory is used to aggregate up these basic prices and quantities) and
• an exact hedonic approach based on the work of Feenstra (1995).

The authors also used their scanner data base on washing machines in order to replicate statistical agency sampling techniques.

[1] The author is indebted to Paul Armknecht, Bert Balk, Ernst Berndt, Jeff Bernstein, Angus Deaton, Robert Feenstra, Dennis Fixler, Robert Gillingham, Alice Nakamura, Richard Schmalensee, Mick Silver, Yrjö Vartia and Kam Yu for helpful comments and to the Social Sciences and Humanities Research Council of Canada for financial support.

What I found remarkable about the authors’ results is that virtually all [2] of their calculated price indexes showed a very substantial drop in the quality adjusted prices for washing machines of about 6% to 10% over the year.
Most of their indexes showed a drop in the aggregate price of washing machines in the 8% to 10% range.

In the U.K. Retail Price Index, washing machines belong to the electrical appliances section, which includes a wide variety of appliances, including irons, toasters, refrigerators, etc. From January 1998 to December 1998, the electrical appliances RPI component went from 98.6 to 98.0, a drop of 0.6 percentage points. Now it may be that the non washing machine components of the electrical appliances index increased in price enough over this period to cancel out the large apparent drop in the price of washing machines but I think that this is somewhat unlikely. Thus we have a bit of a puzzle: why do scanner data and hedonic regression studies of price change find, on average, much smaller increases in price compared to the corresponding official indexes that include the specific commodity being studied? [3]

[2] The one exception was a unit value index which was the average price over all washing machines with no adjustments for the changing mix of machines. This quality unadjusted index showed a drop of only one percent over the year. It is particularly interesting that Feenstra’s (1995) exact hedonic approach gave much the same answers as the other approaches.

[3] See Diewert (1998) for a review of the scanner data studies up to that point in time.

One explanation for this puzzle (if it is a puzzle) might run as follows. At some point in time, the statistical agency initiates a sample of models whose prices are to be collected until the next sample initiation period. Unless some of these models disappear, no other models will be added to the sample. Thus what may be happening is that the market throws up new models over the period of time between sample initiations. These new models benefit from technical progress and tend to have lower prices (quality adjusted) than the models that the statistical agency is following. In theory, the producers of these outmoded models should drop their prices to match the new competition but perhaps instead they simply stop producing these outmoded models, leaving their prices unchanged (or not dropping them enough). However, until every last model of these outmoded models is sold, the statistical agency continues to follow their price movements, which are no longer representative of the market. [4] If a model disappears, there is the possibility that the replacement model chosen by the statistical agency is not linked in at a low enough quality adjusted price [5], since the use of hedonic regressions is not all that widespread in statistical agencies. These two factors may help to explain why the hedonic regression approach tends to give lower rates of price increase in rapidly changing markets compared to the rates obtained by statistical agencies.

There is another factor which may help to explain why scanner data studies that use matched samples obtain lower rates of price increase (or higher rates of price decrease, as in the case of the washing machines) than those obtained by statistical agencies. Consider the list of models at the sample initiation period. Some of these models will turn out to be “winners” in the marketplace; i.e., they offer the most quality adjusted value. [6] Now, over time, consumers will buy increasing amounts of these winning models but this in turn will allow the producers of these winning models to lower their prices, since their per unit output fixed costs will be lower as their markets expand.
In a scanner data superlative index number computation of the aggregate market price over all models, these “winner” models that have rapid declines in price will get a higher quantity weighting over time, leading to a lower overall measure of price change than that obtained by the statistical agency, since the agency will be aggregating their sample prices using fixed weights. [7]

[4] If this hypothesis is true, older models should have a tendency to have positive residuals in hedonic regressions. Berndt, Griliches and Rappaport (1995; 264), Kokoski, Moulton and Zieschang (1999; 155) and Koskimäki and Vartia (2001; 4) find evidence to support this hypothesis for desktop computers, fresh vegetables and computers respectively.

[5] Also when a model disappears, typically statistical agencies ask their price collectors to look for the model that is the closest substitute to the obsolete model, which means that the closest model is also approaching obsolescence.

[6] These models should have negative residuals at the sample initiation period in a hedonic regression.

[7] This point is made by Berndt and Rappaport (2001). However, it is interesting that both Silver and Heravi and Berndt, Griliches and Rappaport (1995) find that this weighting bias was relatively low in their washing machine and computer studies where they compared matched model superlative indexes with the results of unweighted hedonic regressions. Berndt, Griliches and Rappaport (1995) found this weighting bias for computers to be around 0.7 percentage points per year.

I do not have any substantial criticisms of the Silver and Heravi (2001) paper; I think that they have done a very fine job indeed. Since I do not have any substantial criticisms of the paper, the question is: what should I do in the remainder of this comment? What I will do is discuss various methodological issues that the authors did not have the space to cover. [8]

[8] I should mention that many of the methodology questions are discussed more fully in a companion paper that deals with television sets in the UK rather than washing machines; see Silver (1999b).

Thus in section 2 below, I revisit Sherwin Rosen’s (1974) classic paper on hedonics in an attempt to get a much simpler model than the one that he derived. In particular, I make enough simplifying assumptions so that Rosen’s very general model reduces down to the usual time series dummy variable hedonic regression model used by Silver and Heravi. The assumptions that are required to get this simple model are quite restrictive but hopefully, in the future, other researchers will figure out ways of relaxing some of these assumptions. It should be mentioned that I take a traditional consumer demand approach to the problems involved in setting up an econometric framework for estimating hedonic preferences; i.e., I do not attempt to model the producer supply side of the market. [9] Another major purpose of this section is to indicate why linear hedonic regression models (where the dependent variable is the model price and the time dummy enters in the regression in a linear fashion) are unlikely to be consistent with microeconomic theory.

In section 3, we look at the problems involved in choosing a functional form for the hedonic regression. Some of the issues considered in this section are:

• A comparison between the three most commonly used functional forms for hedonic regressions.
• How hedonic regression techniques can be used in order to model the choice of package size.
• Should we choose flexible functional forms when undertaking hedonic regressions?
• Should we use nonparametric functional forms?
Silver and Heravi (2001) noted that there is a connection between matched model techniques for making quality adjustments and hedonic regression techniques: essentially, the hedonic method allows information on nonmatching observations to be used whereas information on models that suddenly appear or disappear in the marketplace must be discarded using the matched model methodology. Triplett (2001) has also considered the connection between the two approaches in an excellent survey of the hedonic regression literature. One of the most interesting results that Triplett derives is a set of conditions that will cause a hedonic regression model to give the same results as a matched model. In section 4, we generalize this result to cover a more general class of regression models than considered by Triplett and we extend his results from the two period case to the many period case.

One of the features of the Silver and Heravi paper is their use of sales information on models as well as the usual model price and characteristics information that is used in traditional hedonic regression exercises. In section 5 below, we look at some of the issues involved in running hedonic regressions when sales information is available.

Section 6 provides some comments on Feenstra’s (1995) exact hedonic price index approach, which is used by Silver and Heravi. Our tentative conclusion is that it is not really necessary to use Feenstra’s approach if one is willing to make the simplifying assumptions that we make in section 2 below.

Finally, section 7 generalizes our hedonic model presented in section 2 to a more general situation where completely separate hedonic regressions are run in each period as opposed to running one great big hedonic regression over all periods in the sample. Section 8 concludes.

[9] Thus I am following Muellbauer’s (1974; 977) example where he says that his “approach is unashamedly one-sided; only the demand side is treated. ... Its subject matter is therefore rather different from that of the recent paper by Sherwin Rosen. The supply side and the simultaneity problems which may arise are ignored.”

2. The Theory of Hedonic Price Indexes Revisited

Hedonic regression models pragmatically regress the price of one unit of a commodity (a “model” or “box”) on a function of the characteristics of the model and a time dummy variable. It is assumed that a sample of model prices can be collected for two or more time periods along with a vector of the associated model characteristics. An interesting theoretical question is: can we provide a microeconomic interpretation for the function of characteristics on the right hand side of the regression?

Rosen (1974) in his classic paper on hedonics does this. However, his economic model turns out to be extremely complex. In this section, we will rework his model [10], making two significant changes:

• We will assume that every consumer has the same separable subutility function, $f(z_1,\dots,z_N)$, that gives the consumer the subutility $Z = f(z)$ from the purchase of one unit of the complex hedonic commodity that has the vector of characteristics $z \equiv (z_1,\dots,z_N)$. [11]
• The subutility that the consumer gets from consuming Z units of the hedonic commodity is combined with the consumption of X units of a composite “other” commodity to give the consumer an overall utility of $u = U^t(X,Z)$ in period t, where $U^t$ is the period t “macro” utility function. Rosen (1974; 38) normalized the price of X to be unity. We will not do this; instead, we will have an explicit period t price, $p^t$, for one unit of the general consumption commodity X.

We start off by considering the set of X and Z combinations that can yield the consumer’s period t utility level, $u^t$.
This is the set $\{(X,Z) : U^t(X,Z) = u^t\}$, which of course is the consumer’s period t indifference curve over equivalent combinations of the general consumption commodity X and the hedonic commodity Z. Now solve the equation $U^t(X,Z) = u^t$ for X as a function of $u^t$ and Z; i.e., we have [12]

(1) $X = g^t(u^t,Z)$.

[10] We used Rosen’s notation which was somewhat different than that used by Silver and Heravi.

[11] We do not assume that all possible models exist in the marketplace. In fact, we will assume that only a finite set of models exist in each period. However, we do assume that the consumer has preferences over all possible models, where each model is indexed by its vector of characteristics, $z = (z_1,\dots,z_N)$. Thus each consumer will prefer a potential model with characteristics vector $z^1 = (z_1^1,\dots,z_N^1)$ over another potential model with the characteristics vector $z^2 = (z_1^2,\dots,z_N^2)$ if and only if $f(z^1) > f(z^2)$.

[12] If the period t indifference curve intersects both axes, then $g^t(u^t,Z)$ will only be defined for a range of nonnegative Z up to an upper bound.

We will assume that this indifference curve slopes downward and in fact, we will make the stronger assumption that $g^t$ is differentiable with respect to Z and

(2) $\partial g^t(u^t,Z)/\partial Z < 0$.

Let $p^t$ and $P^t$ be the prices for one unit of X and Z respectively in period t. The consumer’s period t expenditure minimization problem may be defined as follows:

(3) $\min_{X,Z} \{p^t X + P^t Z : X = g^t(u^t,Z)\} = \min_{Z} \{p^t g^t(u^t,Z) + P^t Z\}$.

The first order necessary condition for Z to solve (3) is:

(4) $p^t\, \partial g^t(u^t,Z)/\partial Z + P^t = 0$.

Equation (4) can now be rearranged to give the price of the hedonic aggregate $P^t$ as a function of the period t utility level $u^t$ and the price of general consumption $p^t$:

(5) $P^t = -p^t\, \partial g^t(u^t,Z)/\partial Z > 0$

where the inequality follows from assumption (2) above. We now interpret the right hand side of (5) as the consumer’s period t willingness to pay price function $w^t(Z,u^t,p^t)$:

(6) $w^t(Z,u^t,p^t) \equiv -p^t\, \partial g^t(u^t,Z)/\partial Z$.
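As a numerical illustration of (1), (2) and (6), the following minimal sketch assumes a hypothetical macro utility function $U(X,Z) = (XZ)^{1/2}$, which is not taken from the paper. Under that assumption the indifference curve solves to $X = g(u,Z) = u^2/Z$, and the willingness to pay price can be obtained by differentiating g numerically. The function names and the numbers used are illustrative only.

```python
# Hypothetical macro utility U(X, Z) = sqrt(X * Z); this choice is an
# assumption made for illustration and does not come from the paper.

def g(u, Z):
    """Equation (1): the X needed to stay on indifference curve u at Z."""
    return u ** 2 / Z


def willingness_to_pay_price(u, Z, p, h=1e-6):
    """Equation (6): w = -p * dg/dZ, using a central difference for dg/dZ."""
    dg_dZ = (g(u, Z + h) - g(u, Z - h)) / (2.0 * h)
    return -p * dg_dZ


u, Z, p = 4.0, 2.0, 1.5          # illustrative utility level, quantity, price
w = willingness_to_pay_price(u, Z, p)

# Analytically dg/dZ = -u**2 / Z**2 = -4, so assumption (2) holds and
# w should be close to 1.5 * 4 = 6.
w_further_down = willingness_to_pay_price(u, 4.0, p)
```

With this particular g, the willingness to pay price falls as Z rises along the indifference curve, which is the usual diminishing marginal rate of substitution.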
Thus as we travel down the consumer’s period t indifference curve, for each point (indexed by Z) on this curve, (6) gives us the amount of money the consumer would be willing to pay per unit of Z in order to stay on the same indifference curve, which is indexed by the utility level $u^t$.

The period t willingness to pay value function $v^t$ can now be defined as the product of the quantity of Z consumed times the corresponding per unit willingness to pay price, $w^t(Z,u^t,p^t)$:

(7) $v^t(Z,u^t,p^t) \equiv Z\, w^t(Z,u^t,p^t) = -Z\, p^t\, \partial g^t(u^t,Z)/\partial Z$

where the last equality follows using (6). The function $v^t$ is the counterpart to Rosen’s (1974; 38) value or bid function; it gives us the amount of money the consumer is willing to pay in order to consume Z units.

All of the above algebra has an interpretation that is independent of the hedonic model; it is simply an exposition of how to derive a willingness to pay price and value function using a consumer’s preferences defined over two commodities. However, we now assume that the consumer has a separable subutility function, $f(z_1,\dots,z_N)$, that gives the consumer the subutility $Z = f(z)$ from the purchase of one unit of the complex hedonic commodity [13] that has the vector of characteristics $z \equiv (z_1,\dots,z_N)$. Note that we have assumed that the function f is time invariant. [14] We now assume that the consumer’s period t utility function is $U^t(X, f(z))$. The above algebra on willingness to pay is still valid. In particular, our new period t willingness to pay price function, for a particular model with characteristics $z = (z_1,\dots,z_N)$, is:

(8) $w^t(f(z),u^t,p^t) \equiv -p^t\, \partial g^t(u^t,f(z))/\partial Z$.

Our new period t willingness to pay value function (which is the amount of money the consumer is willing to pay to have the services of a model with characteristics vector z) is:

(9) $v^t(f(z),u^t,p^t) \equiv f(z)\, w^t(f(z),u^t,p^t) = -f(z)\, p^t\, \partial g^t(u^t,f(z))/\partial Z$.
Now suppose that there are $K^t$ models available to the consumer in period t, where model k sells at the per unit price of $P_k^t$ and has the vector of characteristics $z_k^t \equiv (z_{1k}^t,\dots,z_{Nk}^t)$ for $k = 1,2,\dots,K^t$. If the consumer purchases a unit of model k in period t, then we can equate the model price $P_k^t$ to the appropriate willingness to pay value defined by (9) where z is replaced by $z_k^t$; i.e., the following equations should hold:

(10) $P_k^t = -f(z_k^t)\, p^t\, \partial g^t(u^t,f(z_k^t))/\partial Z$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.

What is the meaning of the separability assumption? Suppose the hedonic commodity is an automobile and suppose that there are only three characteristics: number of seats in the vehicle, fuel economy and horsepower. The separability assumption means that the consumer can trade off these three characteristics and determine the utility of any auto with any mix of these three characteristics independently of his or her other choices of commodities. In particular, the utility ranking of automobile models is independent of the number of children the consumer might have or what the price of gasoline might be. Obviously, the separability assumption is not likely to be exactly satisfied in the real world but in order to make our model tractable, we are forced to make this somewhat restrictive assumption.

[13] If a consumer purchases say two units of a model at price P that has characteristics $z_1,\dots,z_N$, then we can model this situation by introducing an artificial model that sells at price 2P and has characteristics $2z_1,\dots,2z_N$. Thus the hedonic surface, $Z = f(z)$, consists of only the most efficient models including the artificial models.

[14] We do not assume that f(z) is a quasiconcave or concave function of z. In normal consumer demand theory, f(z) can be assumed to be quasiconcave without loss of generality because linear budget constraints and the assumption of perfect divisibility will imply that “effective” indifference curves enclose convex sets. However, as Rosen (1974; 37-38) points out, in the case of hedonic commodities, the various characteristics cannot be untied. Moreover, perfect divisibility cannot be assumed and not all possible combinations of characteristics will be available on the marketplace. Thus the usual assumptions made in “normal” consumer demand theory are not satisfied in the hedonic context. Note also that while we placed a smoothness assumption on the macro functions $g^t(u,Z)$ (the existence of the partial derivative $\partial g^t(u,Z)/\partial Z$), we do not place any smoothness restrictions on the hedonic subutility function f(z).

Another aspect of our model needs some further explanation. We are explicitly assuming that consumers cannot purchase fractional units of each model; they can purchase only a nonnegative integer amount of each model; i.e., we are explicitly assuming indivisibilities on the supply side of our model. Thus in each period, there are only a finite number of models of the hedonic commodity available so that while the consumer is assumed to have continuous preferences over all possible combinations of characteristics $(z_1,\dots,z_N)$, in each period, there are only a finite number of isolated models that are available on the market.

At this point, we further specialize our model. We assume that every consumer has the same hedonic subutility function [15] f(z) and consumer i has the following linear indifference curve macro utility function in period t:

(11) $g_i^t(u_i^t,Z) \equiv -a^t Z + b_i^t u_i^t$ ; $t = 1,\dots,T$ ; $i = 1,\dots,I$

where $a^t$ and $b_i^t$ are positive constants. Thus for each period t and each consumer i, the period t indifference curve between combinations of X and Z is linear, with the constant slope $-a^t$ being the same for all consumers. [16] However, note that we are allowing this slope to change over time. Now differentiate (11) with respect to Z and substitute this partial derivative into (10). The resulting equations are: [17]

(12) $P_k^t = p^t a^t f(z_k^t)$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.
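The step from (10) to (12) can be written out in one line. The display below only restates the substitution described above, reading the $g^t$ and $u^t$ in (10) as consumer i's $g_i^t$ and $u_i^t$; it adds no new assumption.

```latex
\frac{\partial g_i^t(u_i^t, Z)}{\partial Z} = -a^t
\quad\Longrightarrow\quad
P_k^t = -f(z_k^t)\, p^t \,(-a^t) = p^t a^t f(z_k^t).
```

Because the derivative is the constant $-a^t$, neither the consumer index i nor the utility level $u_i^t$ survives in the model price equation.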
Now define the aggregate price of one unit of Z in period t as [18]:

(13) $\rho_t \equiv p^t a^t$ ; $t = 1,\dots,T$

and substitute (13) into (12) in order to obtain our basic system of hedonic equations: [19]

(14) $P_k^t = \rho_t f(z_k^t)$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.

[15] The sameness assumption is very strong and needs some justification. This assumption is entirely analogous to the assumption that consumers have the same homothetic preferences over say food. Although this assumption is not justified for some purposes, for the purpose of constructing a price index for food, it suffices since we are mostly interested in capturing the substitution effects in the aggregate price of food as the relative prices of food components vary. In a similar fashion, we are interested in determining how the “average” consumer values a faster computer speed against more memory; i.e., we are primarily interested in hedonic substitution effects.

[16] We do not require a linear indifference curve globally but only locally over a certain range of purchases. Alternatively, we can view the linear indifference curve as providing a first order approximation to a nonlinear indifference curve.

[17] Comparing (12) with (10), it can be seen that the simplifying assumptions (11) enabled us to get rid of the terms $\partial g^t(u_i^t,f(z_k^t))/\partial Z$, which depend on individual consumer indifference curves between the hedonic commodity and other commodities. If we had individual household data on the consumption of hedonic and other commodities, then we could use normal consumer demand techniques in order to estimate the parameters that characterized these indifference curves.

[18] We have switched to subscripts from superscripts in keeping with the conventions for parameters in regression models; i.e., the constants $\rho_t$ will be regression parameters in what follows. Note also that $\rho_t$ is the product of the price of the “other” commodity $p^t$ times the period t slope parameter $a^t$. We need to allow this slope parameter to change over time in order to be able to model the demand for high technology hedonic commodities, which have been falling in price relative to “other” commodities; i.e., we think of $a^t$ as decreasing over time for high technology commodities.

[19] Our basic model ends up being very similar to one of Muellbauer’s (1974; 988-989) hedonic models; see in particular his equation (32).

Now all we need to do is postulate a functional form for the hedonic subutility function f and add a stochastic specification to (14) and we have our basic hedonic regression model. The unknown parameters in f along with the period t hedonic price parameters $\rho_t$ can then be estimated. [20]

It is possible to generalize the above model but get the same model (14) if we replace the composite “other” commodity X by h(x), where x is a consumption vector and h is a linearly homogeneous, increasing and concave aggregator function. Instead of equations (12), under these new assumptions, we end up with the following equations:

(15) $P_k^t = c(p^t)\, a^t f(z_k^t)$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$,

where $p^t$ is now the vector of prices for the x commodities in period t and c is the unit cost or expenditure function that is dual to h. [21] Now redefine $\rho_t$ as $c(p^t)\,a^t$ and we still obtain the basic system of hedonic equations (14).

[20] It is possible to rework the above theory and give it a producer theory interpretation. The counterpart to the expenditure minimization problem (3) is now the following profit maximization problem: $\max_{X,Z} \{P^t Z - w^t X : X = g^t(k^t,Z)\}$ where Z is hedonic output and $P^t$ is a period t price for one unit of the hedonic output, $w^t$ is the period t price of a variable input and X is the quantity used of it, $k^t$ is the period t quantity of a fixed factor (capital say) and $g^t$ is the firm’s factor requirements function. Assuming that $Z = f(z)$, we end up with the following producer theory counterpart to (10): $P_k^t = w^t f(z_k^t)\,\partial g^t(k^t,f(z_k^t))/\partial Z$. The counterpart to assumption (11) is, for firm i, $g_i^t(k_i^t,Z) \equiv a^t Z - b_i^t k_i^t$ and the counterpart to (12) becomes $P_k^t = w^t a^t f(z_k^t)$. However, the producer theory model assumptions are not as plausible as the corresponding consumer theory model assumptions. In particular, it is not very likely that each producer will have the same period t aggregate price for a unit of variable input $w^t$ and it is not very likely that each firm producing in the hedonic market will have the same technology parameter $a^t$. But the key assumption that will not generally be satisfied in the producer context is that each producer is able to produce the entire array of hedonic models whereas, in the consumer context, it is quite plausible that each consumer has the possibility of purchasing and consuming each model.

[21] Define c as $c(p^t) \equiv \min_x \{p^t \cdot x : h(x) = 1\}$ where $p^t \cdot x$ denotes the inner product between the vectors $p^t$ and x.

Equations (14) have one property that is likely to be present in more complex and realistic models of consumer choice. This property is that the model prices in period t are homogeneous of degree one in the general price level $p^t$. Thus if $p^t$ is replaced by $\lambda p^t$ for any $\lambda > 0$ (think of a sudden hyperinflation where $\lambda$ is large), then equations (12) and (14) imply that the model prices should become $\lambda P_k^t$. Note that this homogeneity property will not hold for the following additive hedonic model:

(16) $P_k^t = \rho_t + f(z_k^t)$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.

Thus I would lean towards ruling out running hedonic regressions based on the linear model (16) on a priori grounds. Note that hedonic models that take the logarithm of the model price $P_k^t$ as the dependent variable will tend to be consistent with our basic hedonic equations (14) whereas linear models like (16) will not be consistent with the normal linear homogeneity properties implied by microeconomic theory.
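The homogeneity argument can be checked numerically. The sketch below assumes a hypothetical two-characteristic subutility function and illustrative values for $p^t$, $a^t$ and $\lambda$, none of which come from the paper. It scales the general price level by $\lambda$ and compares how the multiplicative model (14) and the additive model (16) respond.

```python
def f(z):
    """Hypothetical hedonic subutility (an assumed log-log form)."""
    speed, memory = z
    return 2.0 * speed ** 0.6 * memory ** 0.4


def price_multiplicative(p, a, z):
    """Equations (12) and (14): P = rho * f(z) with rho = p * a."""
    return p * a * f(z)


def price_additive(p, a, z):
    """Equation (16): P = rho + f(z) with rho = p * a."""
    return p * a + f(z)


p, a, z, lam = 1.0, 0.5, (3.0, 8.0), 10.0   # illustrative values only

# Ratio of the model price after and before the general price level
# is multiplied by lam.
ratio_mult = price_multiplicative(lam * p, a, z) / price_multiplicative(p, a, z)
ratio_add = price_additive(lam * p, a, z) / price_additive(p, a, z)
```

Under these assumptions the multiplicative price scales by the full factor $\lambda$, while the additive price scales by less, because the f(z) term in (16) does not move with the price level.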
We turn now to a discussion of some of the problems involved in choosing a functional form for the hedonic subutility function f(z). [22]

3. Functional Form Issues

3.1 Frequently Used Functional Forms

The three most commonly used functional forms in the hedonic regression literature are the log-log, the semilog and the linear. [23] We consider each in turn.

[22] Our discussion draws heavily on Triplett (2001) and Berndt (1991; Chapter 4).

[23] See Berndt (1991; Chapter 4) for historical references to the early use of these functional forms.

In the log-log model, the hedonic aggregator function f is defined in terms of its logarithm as

(17) $\ln f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n \ln z_n$

where the $\alpha_n$ are the unknown parameters to be estimated. If we take logarithms of both sides of (14), use (17) and add error terms $\varepsilon_k^t$, we obtain the following hedonic regression model:

(18) $\ln P_k^t = \beta_t + \alpha_0 + \sum_{n=1}^{N} \alpha_n \ln z_{nk}^t + \varepsilon_k^t$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$

where $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$. In order to identify all of the parameters, we require a normalization on the $\beta_t$ and $\alpha_0$. Typically, we set $\beta_1 = 0$, which is equivalent to $a^1 p^1 = 1$. If we want to impose linear homogeneity (or constant returns to scale) on the hedonic subutility function f(z), we can do this by setting $\sum_{n=1}^{N} \alpha_n = 1$.

In the semilog model, the logarithm of the hedonic function f(z) is defined as:

(19) $\ln f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n z_n$.

If we take logarithms of both sides of (14), use (19) and add error terms $\varepsilon_k^t$, we obtain the following hedonic regression model:

(20) $\ln P_k^t = \beta_t + \alpha_0 + \sum_{n=1}^{N} \alpha_n z_{nk}^t + \varepsilon_k^t$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$

where $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$. Again, in order to identify all of the parameters, we require a normalization on the $\beta_t$ and $\alpha_0$, such as $\beta_1 = 0$, which is equivalent to $a^1 p^1 = 1$.
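To make the time dummy structure of (18) concrete, here is a minimal sketch that builds the regressors for a log-log hedonic regression and estimates it by ordinary least squares. Everything numerical is hypothetical: the data are generated without error from (14) and (17) with assumed parameter values, and the small `solve` and `ols` helpers are written out only to keep the example self-contained. In practice a statistical package would be used.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ols(X, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


# Assumed "true" parameters: T = 3 periods, N = 2 characteristics.
rho = [1.0, 0.9, 0.8]             # rho_1 = 1 is the normalization beta_1 = 0
alpha0, alpha = 0.5, [0.6, 0.3]
models = [(1.0, 2.0), (2.0, 1.5), (3.0, 4.0), (1.5, 3.0)]  # characteristics z

X, y = [], []
for t in range(3):
    for z in models:
        ln_f = alpha0 + sum(a * math.log(zn) for a, zn in zip(alpha, z))
        y.append(math.log(rho[t]) + ln_f)        # ln P = beta_t + ln f(z)
        dummies = [1.0 if t == s else 0.0 for s in (1, 2)]  # period 1 omitted
        X.append([1.0] + dummies + [math.log(zn) for zn in z])

coef = ols(X, y)                  # [alpha_0, beta_2, beta_3, alpha_1, alpha_2]
rho_hat = [1.0] + [math.exp(b) for b in coef[1:3]]
```

Dropping the period 1 dummy is how the normalization $\beta_1 = 0$ enters the design matrix. Replacing `math.log(zn)` by `zn` in the regressor row (and in the data generation) gives the semilog model (20) instead.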
The semilog model has a disadvantage compared to the log-log model: it is not possible to impose constant returns to scale on the semilog hedonic function f(z). [24] However, the semilog model has an advantage compared to the log-log model: the semilog model can deal with situations where one or more characteristics $z_{nk}^t$ are equal to zero whereas the log-log model cannot. This is an important consideration if new characteristics come on to the market during the sample period.

In the linear model, the hedonic function f(z) is a simple linear function of the characteristics:

(21) $f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n z_n$.

Substituting (21) into (14) and adding the error terms $\varepsilon_k^t$, we obtain the following hedonic regression model:

(22) $P_k^t = \rho_t \left[\alpha_0 + \sum_{n=1}^{N} \alpha_n z_{nk}^t\right] + \varepsilon_k^t$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.

Again, in order to identify all of the parameters, we require a normalization on the $\rho_t$ and $\alpha_n$, such as $\rho_1 = 1$, which is equivalent to $a^1 p^1 = 1$. Unfortunately, (22) is a nonlinear regression model whereas the earlier log-log and semilog models were linear regression models. Constant returns to scale on the linear hedonic function can be imposed by setting $\alpha_0 = 0$. The model (22) can also readily deal with the introduction into the marketplace of new characteristics.

It can be seen that none of the 3 models (18), (20) or (22) totally dominates the other two models; each of the 3 models has at least one advantage over the other two.

Due to the nonlinear form of (22), this model has not been estimated very frequently if at all. However, the following closely related model has been estimated countless times:

(23) $P_k^t = \rho_t + \alpha_0 + \sum_{n=1}^{N} \alpha_n z_{nk}^t + \varepsilon_k^t$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K^t$.

As was indicated in the previous section, the linear model (23) is unlikely to be consistent with microeconomic theory and so we cannot recommend its use.
[24] For some purposes, it is convenient to allow the hedonic utility function to be the type of utility function that is assumed in index number theory, where usually it is assumed that the utility function is homogeneous of degree one, increasing and concave. For example, if we want to use the hedonic framework to model tied purchases (i.e., two commodities are sold together at a single price), then the hedonic utility function becomes an ordinary utility function, $f(z_1,z_2)$, where $z_1$ and $z_2$ are the quantities of the two commodities that are in the tied package. In this situation, it may be reasonable to assume that f is homogeneous of degree one, in which case the price of a package consisting of $z_1$ and $z_2$ units of the two commodities is $c(p_1,p_2)\,f(z_1,z_2)$ where $c(p_1,p_2) \equiv \min_{z_1,z_2} \{p_1 z_1 + p_2 z_2 : f(z_1,z_2) = 1\}$ is the unit cost function which is dual to the utility function f. There are many other applications where it would be useful to allow f to be a linearly homogeneous function.

3.2 Hedonic Regressions and the Problem of Package Size

For many commodities, the price declines as the volume purchased increases. How can this phenomenon be modeled using the hedonic regression framework?

Suppose that the vector of characteristics $z \equiv (z_1,\dots,z_N)$ is a scalar so that N = 1 and the single characteristic quantity $z_1$ is the package size; i.e., it is the quantity of a homogeneous commodity that is contained in the package sold. In this case, it is natural to take the hedonic subutility function $f(z_1)$ to be a continuous monotonically nondecreasing function of one variable with f(0) = 0. We drop the subscript 1 in what follows. A simple specification for f(z) is to let it be a piecewise linear, continuous function or a linear spline.
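A linear spline of this kind can be sketched in a few lines. The function below is a minimal illustration for nonnegative package sizes, with hypothetical slopes and break points that are not estimates from any data set. Each segment contributes its slope times the length of the segment that has been covered, which keeps f continuous and gives f(0) = 0.

```python
def spline_f(z, slopes, breaks):
    """Continuous piecewise linear f(z) with f(0) = 0, for z >= 0.

    slopes[i] is the slope on segment i and breaks lists the increasing
    break points, so len(slopes) == len(breaks) + 1.
    """
    total, lower = 0.0, 0.0
    for slope, upper in zip(slopes, list(breaks) + [float("inf")]):
        if z <= upper:
            return total + slope * (z - lower)
        total += slope * (upper - lower)   # full contribution of this segment
        lower = upper
    return total  # not reached: the last segment is unbounded


slopes = [1.0, 0.8, 0.5]   # hypothetical, nonnegative and declining
breaks = [1.0, 2.0]        # hypothetical break points

small, medium, large = (spline_f(z, slopes, breaks) for z in (0.5, 1.5, 3.0))
unit_value_small = small / 0.5
unit_value_large = large / 3.0
```

With declining slopes, the subutility per unit of package size falls as the package gets larger, so a model price proportional to f(z) implies a lower price per unit of volume for larger packages.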
In the case of 3 linear segments, the system of estimating equations (14) would look like the following system after adding errors to (14): for $t = 1,\dots,T$, we have:

(24) $P_{kt} = \rho_t \alpha_1 z_{kt} + \varepsilon_{kt}$ if $0 \le z_{kt} \le z_1^*$ ;
$P_{kt} = \rho_t [\alpha_1 z_1^* + \alpha_2 \{z_{kt} - z_1^*\}] + \varepsilon_{kt}$ if $z_1^* \le z_{kt} \le z_2^*$ ;
$P_{kt} = \rho_t [\alpha_1 z_1^* + \alpha_2 \{z_2^* - z_1^*\} + \alpha_3 \{z_{kt} - z_2^*\}] + \varepsilon_{kt}$ if $z_2^* \le z_{kt}$.

The predetermined package sizes, $z_1^*$ and $z_2^*$, where we switch from one linear segment to the next, are called break points. The unknown parameters to be estimated are $\rho_1,\dots,\rho_T$, $\alpha_1$, $\alpha_2$ and $\alpha_3$. As usual, not all of these parameters can be identified so it is necessary to impose a normalization such as $\rho_1 = 1$.

There are two difficulties with the system of estimating equations (24):

• The regression is nonlinear in the unknown parameters.
• The estimated coefficients $\alpha_1$, $\alpha_2$ and $\alpha_3$ should be nonnegative.25 If an initial regression yields a negative $\alpha_i$, then the regression can be rerun, replacing $\alpha_i$ by $(\alpha_i)^2$.

We turn now to a discussion of the flexibility properties of an assumed hedonic subutility function $f(z)$.

3.3 Flexibility Issues

In normal consumer demand theory, we usually ask that the functional form for the consumer’s utility function (or any of its dual representations) be flexible; i.e., we ask that our assumed functional form be able to approximate an arbitrary twice continuously differentiable utility function to the second order.26 In the hedonic regression literature, this requirement that the functional form for the utility function be flexible has generally not been imposed.27 For example, the functional forms considered in section 3.1 above

25 Pakes (2001) argues that we should not expect our hedonic regression estimates to satisfy monotonicity restrictions based on the strategic behavior of firms as they introduce new models. However, for credibility reasons, it is likely that statistical agencies will want to impose monotonicity restrictions.
26 See Diewert (1974; 127-133) (1993; 158-164) for examples of flexible functional forms.

27 An exception to this statement is the recent paper by Yu (2001). His discussion is similar to our discussion in many respects and is more general in some respects.

are only capable of providing a linear approximation rather than a quadratic one. The reason why flexible functional forms have not been used in the hedonic literature to a greater extent is probably due to the multicollinearity problem; i.e., if we attempt to estimate a hedonic subutility function $f(z)$ that is capable of providing a second order approximation, then it may have too many unknown parameters to be estimated accurately.28 Nevertheless, it may be useful to consider the costs and benefits of using alternative flexible functional forms in the hedonic context.

For our first flexible functional form for $f(z)$, consider the following translog functional form,29 which generalizes our earlier log-log hedonic aggregator function defined by (17) above:

(25) $\ln f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n \ln z_n + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} \ln z_i \ln z_j$

where the $\alpha_n$ and the $\alpha_{ij}$ are the unknown parameters to be estimated. If we take logarithms of both sides of (14), use (25) and add error terms $\varepsilon_{kt}$, we obtain the following translog hedonic regression model:

(26) $\ln P_{kt} = \beta_t + \alpha_0 + \sum_{n=1}^{N} \alpha_n \ln z_{nkt} + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} \ln z_{ikt} \ln z_{jkt} + \varepsilon_{kt}$ ; $\alpha_{ij} = \alpha_{ji}$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$

where $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$. In order to identify all of the parameters, we require a normalization on the $\beta_t$ and $\alpha_0$. Typically, we set $\beta_1 = 0$, which is equivalent to a1p1 = 1. If we want to impose linear homogeneity (or constant returns to scale) on the hedonic subutility function $f(z)$, we can do this by setting $\sum_{n=1}^{N} \alpha_n = 1$ and imposing the restrictions $\sum_{j=1}^{N} \alpha_{ij} = 0$ for $i = 1,\dots,N$.
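The homogeneity restrictions can be checked numerically. The sketch below evaluates the translog form (25) for made-up parameter values (all numbers are illustrative assumptions) that satisfy $\sum_n \alpha_n = 1$, $\alpha_{ij} = \alpha_{ji}$ and $\sum_j \alpha_{ij} = 0$, and confirms that scaling every characteristic by a factor $\lambda$ scales $f$ by the same factor.

```python
import math


def translog(z, a0, a, A):
    """Translog f(z) from (25): ln f = a0 + sum a_n ln z_n + (1/2) sum a_ij ln z_i ln z_j."""
    lz = [math.log(x) for x in z]
    n = len(z)
    linear = sum(a[i] * lz[i] for i in range(n))
    quad = 0.5 * sum(A[i][j] * lz[i] * lz[j] for i in range(n) for j in range(n))
    return math.exp(a0 + linear + quad)


# Illustrative parameters: a sums to 1, A is symmetric with zero row sums.
a0 = 0.2
a = [0.3, 0.7]
A = [[0.1, -0.1],
     [-0.1, 0.1]]

z = [2.0, 5.0]
lam = 3.0
base = translog(z, a0, a, A)
scaled = translog([lam * x for x in z], a0, a, A)
print(scaled / base)  # should equal lam up to rounding error
```

If the restrictions are relaxed (for example, if the `a` values are changed so that they no longer sum to one), the printed ratio will in general differ from `lam`.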
Obviously, the translog model (26) contains the log-log model (18) as a special case.30

The translog hedonic model (26) has two nice properties:

• The right hand side of (26) is linear in the unknown parameters so that linear regression techniques can be used in order to estimate the unknown parameters.
• Constant returns to scale can readily be imposed on the translog hedonic utility function $f(z)$ without destroying the flexibility of the functional form.

The main disadvantage of the translog hedonic model is that, like the log-log model, it cannot deal with the zero characteristics problem.

28 The situation in normal consumer demand theory can be more favorable to the accurate estimation of flexible functional forms because we will have an entire system of estimating equations in the normal context. Thus if there are $N$ commodities and price and quantity observations for $T$ periods on $H$ households, we will have $H(N-1)T$ degrees of freedom to work with in the usual systems approach to estimating consumer preferences. In the hedonic regression framework, we have $K_1 + K_2 + \dots + K_T$ or roughly $KT$ degrees of freedom, where $K$ is the average number of models in each period.

29 See Christensen, Jorgenson and Lau (1975).

30 In view of our discussion in section 2 above, the translog $f(z)$ does not have to satisfy any curvature conditions.

For our second flexible functional form, consider the following generalization of the semilog hedonic utility function (19):

(27) $\ln f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n z_n + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} z_i z_j$

where the $\alpha_n$ and the $\alpha_{ij}$ are the unknown parameters to be estimated. If we take logarithms of both sides of (14), use (27) and add error terms $\varepsilon_{kt}$, we obtain the following semilog quadratic hedonic regression model:

(28) $\ln P_{kt} = \beta_t + \alpha_0 + \sum_{n=1}^{N} \alpha_n z_{nkt} + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} z_{ikt} z_{jkt} + \varepsilon_{kt}$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$

where $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$.
Again, in order to identify all of the parameters, we require a normalization on the $\beta_t$ and $\alpha_0$, such as $\beta_1 = 0$, which is equivalent to a1p1 = 1. The semilog quadratic model has a disadvantage compared to the translog model: it is not possible to impose constant returns to scale on the semilog quadratic hedonic function $f(z)$. Both models share the advantage of being linear in the unknown parameters. However, the semilog quadratic model has an advantage compared to the translog model: the semilog quadratic model can deal with situations where one or more characteristics $z_{nkt}$ are equal to zero whereas the translog model cannot. This is an important consideration if new characteristics come on to the market during the sample period.

For our third flexible functional form for the hedonic utility function $f(z)$, consider the following generalized linear functional form:31

(29) $f(z_1,\dots,z_N) \equiv \alpha_0 + \sum_{n=1}^{N} \alpha_n (z_n)^{1/2} + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} (z_i)^{1/2} (z_j)^{1/2}$

where the $\alpha_n$ and the $\alpha_{ij}$ are the unknown parameters to be estimated. Note that (29) generalizes our earlier linear functional form (21).32 Substituting (29) into (14) and adding the error terms $\varepsilon_{kt}$, we obtain the following generalized linear hedonic regression model:

(30) $P_{kt} = \rho_t [\alpha_0 + \sum_{n=1}^{N} \alpha_n (z_{nkt})^{1/2} + (1/2) \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{ij} (z_{ikt})^{1/2} (z_{jkt})^{1/2}] + \varepsilon_{kt}$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$.

As usual, in order to identify all of the parameters, we require a normalization on the $\rho_t$, $\alpha_n$ and $\alpha_{ij}$ such as $\rho_1 = 1$, which is equivalent to a1p1 = 1. Unfortunately, (30) is a nonlinear regression model whereas the earlier translog and semilog quadratic models were linear regression models. Constant returns to scale on the generalized linear hedonic function can be imposed by setting $\alpha_n = 0$ for $n = 0,1,\dots,N$. The model (30) can also readily deal with the introduction into the marketplace of new characteristics.

31 See Diewert (1971).

32 Let the $\alpha_n$ and $\alpha_{ij}$ for $i \neq j$ all equal 0 in (29) and we obtain (21).
As was the case in section 3.1 above, none of the three flexible hedonic regression models presented in this section totally dominates the remaining two models. Models (26) and (28) have the advantage of being linear regression models whereas (30) is nonlinear. Model (26) cannot deal very well with the introduction of new characteristics during the sample period whereas (28) and (30) can. Constant returns to scale in characteristics can readily be imposed in models (26) and (30) whereas this is not possible with model (28). Thus each of the three models has two favorable characteristics and one unfavorable characteristic.

3.4 Nonparametric Functional Forms

It is possible to address the functional form problem in a nonparametric manner using generalized dummy variable techniques.33

Suppose that there are only two characteristics that are important for the models on the market during periods $t = 1,\dots,T$. Suppose further that there are only $I$ configurations of the first characteristic and $J$ configurations of the second characteristic during the sample period where $I$ and $J$ are integers greater than one.34 Suppose further that in period $t$, we have $K_{ijt}$ observations that have first characteristic in group $i$ and second characteristic in group $j$. Denote the $k$th observation in period $t$ in this $i,j$ grouping as $z_{ijkt} = (z_{1ijkt}, z_{2ijkt})$. For this configuration of characteristics, we define the corresponding hedonic utility as follows:

(31) $f(z_{ijkt}) \equiv \alpha_{ij}$ ; $t = 1,\dots,T$ ; $i = 1,\dots,I$ ; $j = 1,\dots,J$ ; $k = 1,\dots,K_{ijt}$.

Let $P_{ijkt}$ denote the period $t$ price for observation $k$ that has model characteristics that put it in the $i,j$ grouping of models. Substituting (31) into (14) and adding the error term $\varepsilon_{ijkt}$ leads to the following (nonlinear) generalized dummy variable hedonic regression model:

(32) $P_{ijkt} = \rho_t \alpha_{ij} + \varepsilon_{ijkt}$ ; $t = 1,\dots,T$ ; $i = 1,\dots,I$ ; $j = 1,\dots,J$ ; $k = 1,\dots,K_{ijt}$.
As usual, not all of the parameters $\rho_t$ for $t = 1,\dots,T$ and $\alpha_{ij}$ for $i = 1,\dots,I$ and $j = 1,\dots,J$ can be identified and so it is necessary to impose a normalization on the parameters like $\rho_1 = 1$.

The hedonic regression model (32) is nonlinear. However, in this case, we can reparameterize our theoretical model so that we end up with a linear regression model. Suppose that we take logarithms of both sides of (31). Then defining $\ln \alpha_{ij}$ as $\gamma_{ij}$, we have:

33 The material that we are going to present in this section is essentially equivalent to what statisticians call an analysis of variance model (a two way layout with interaction terms); see Chapter 4 in Scheffé (1959).

34 Alternatively, we group observations so that all models having a quantity $z_1$ of the first characteristic between 0 and $z_1^*$ are in group 1, all models having a quantity $z_1$ of the first characteristic between $z_1^*$ and $z_2^*$ are in group 2,..., and all models having a quantity $z_1$ of the first characteristic between $z_{I-1}^*$ and $z_I^*$ are in group $I$. We do a similar grouping of the models for the second characteristic. Thus any model $k$ in each period falls into one of $IJ$ discrete groupings of models.

(33) $\ln f(z_{ijkt}) \equiv \gamma_{ij}$ ; $t = 1,\dots,T$ ; $i = 1,\dots,I$ ; $j = 1,\dots,J$ ; $k = 1,\dots,K_{ijt}$.

Substituting (33) into (14) after taking logarithms of both sides of (14) and adding the error term $\varepsilon_{ijkt}$ leads to the following linear generalized dummy variable hedonic regression model:

(34) $\ln P_{ijkt} = \beta_t + \gamma_{ij} + \varepsilon_{ijkt}$ ; $t = 1,\dots,T$ ; $i = 1,\dots,I$ ; $j = 1,\dots,J$ ; $k = 1,\dots,K_{ijt}$

where $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$. As usual, not all of the parameters $\beta_t$ for $t = 1,\dots,T$ and $\gamma_{ij}$ for $i = 1,\dots,I$ and $j = 1,\dots,J$ can be identified and so it is necessary to impose a normalization on the parameters like $\beta_1 = 0$, which corresponds to $\rho_1 = 1$.

Which of the two generalized dummy variable hedonic regression models (32) or (34) is “best”?
Obviously, they both have exactly the same economic content but of course, the stochastic specifications for the two models differ. Hence, we would have to look at the statistical properties of the residuals in the two models to determine which is better.35 However, without looking at residuals, the linear regression model (34) will be much easier to implement than the nonlinear model (32), especially for large data sets.

The generalized dummy variable hedonic regression models (32) and (34) have two major advantages over the traditional flexible functional form models listed in section 3.3 above:

• The dummy variable models (32) and (34) are completely nonparametric and hence are much more flexible than traditional flexible functional forms.
• The dummy variable models can easily accommodate discrete characteristic spaces.

However, the dummy variable hedonic regressions also have some disadvantages:

• There can be an enormous number of parameters to estimate, particularly if there are a large number of distinct characteristics.
• If we attempt to reduce the number of parameters by having fewer class intervals for each characteristic, we will introduce more variance into our estimated coefficients.
• Different investigators will choose differing numbers of classification cells; i.e., different hedonic operators will choose differing $I$’s and $J$’s in their dummy variable specifications, leading to a lack of reproducibility in the models.36

35 There is another consideration involved in choosing between (32) and (34). The parameters that we are most interested in are the $\rho_t$, not their logarithms, the $\beta_t$. However, as Berndt (1991; 127) noted, “explaining variations in the natural logarithm of price is not the same as explaining variations in price”.
Thus Silver and Heravi (2001) and Triplett (2001) both note that the antilog of the least squares estimator for $\beta_t$ will not be an unbiased estimator of $\rho_t$ under the usual stochastic specification and they cite Goldberger (1968) for a method of correcting this bias. Koskimäki and Vartia (2001; 15) also deal with this problem. These considerations would lead one to favor estimating (32) rather than (34).

36 The reproducibility issue is very important for statistical agencies.

• If $j$ is held constant, then the $\alpha_{ij}$ and $\gamma_{ij}$ coefficients should increase (or at least not decrease) as $i$ increases from 1 to $I$.37 Similarly, if $i$ is held constant, then the $\alpha_{ij}$ and $\gamma_{ij}$ coefficients should increase (or at least not decrease) as $j$ increases from 1 to $J$. The regression models (32) and (34) ignore these restrictions and it may be difficult to impose them.38

Nevertheless, I believe that these generalized dummy variable hedonic regression techniques look very promising. These models, along with other nonparametric models, deserve a serious look by applied researchers.

4. Hedonic Regressions and Traditional Methods for Quality Adjustment

Silver and Heravi (2001) demonstrated how traditional matched model techniques for making quality adjustments can be reinterpreted in the context of hedonic regression models. Triplett (2001) and Koskimäki and Vartia (2001; 9) also have some results along these lines. In this section, we review two of Triplett’s results.

Suppose that the hedonic regression equations (14) hold in period $t$ and we want to compare the quality of model 1 with that of model 2. Then it can be seen that the first two equations in (14) imply that the utility of variety 2 relative to variety 1 is

(35) $f(z_{2t})/f(z_{1t}) = [P_{2t}/\rho_t]/[P_{1t}/\rho_t] = P_{2t}/P_{1t}$ ;

i.e., the utility or intrinsic value to the consumer of model 2 relative to the utility of model 1 is just the price ratio, $P_{2t}/P_{1t}$.
Thus in this case, a quality adjustment that falls out of a hedonic regression model is equivalent to a “traditional” statistical agency quality adjustment technique, which is to use the observed price ratio of the two commodities in the same period as an indicator of the relative quality of the two commodities.39

In a second example showing how traditional statistical agency quality adjustment techniques can be related to hedonic regressions, Triplett (2001) showed that under certain conditions, the usual matched model method for calculating an overall measure of price change going from one period to the next (using geometric means) was identical to the results obtained using a hedonic regression model.40 We now look at Triplett’s result in a somewhat more general framework.

Recall our standard hedonic regression model equations (14) above. Suppose further that the logarithm of $f(z)$ is a linear function in $J$ unknown parameters, $\alpha_1,\dots,\alpha_J$; i.e., we have:

37 We follow the usual convention that individual characteristics are defined in such a way that a larger quantity of any characteristic yields a larger utility to the consumer.

38 Note that there are comparable monotonicity restrictions that the continuous hedonic models listed in sections 3.1 and 3.3 should also satisfy and it will be difficult to impose these conditions for these models as well.

39 We are ignoring the error terms in the hedonic regressions in making this argument.

40 Koskimäki and Vartia (2001; 9) state a similar more general result, which is very similar to the result that we obtain below.

(36) $\ln f(z_{kt}) \equiv \alpha_1 + \sum_{j=2}^{J} x_j(z_{kt}) \alpha_j$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$

where the functions $x_j(z_{kt})$ are known. Note that we have assumed that $x_1(z_{kt}) \equiv 1$; i.e., we have assumed that the functional form for $\ln f(z)$ has a constant term in it.
Now take logarithms of both sides of equations (14), substitute (36) into these logged equations and add stochastic terms $\varepsilon_{kt}$ to obtain the following system of regression equations:

(37) $\ln P_{kt} = \beta_t + \alpha_1 + \sum_{j=2}^{J} x_j(z_{kt}) \alpha_j + \varepsilon_{kt}$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$

where as usual, we have defined $\beta_t \equiv \ln \rho_t$ for $t = 1,\dots,T$. A normalization is required in order to identify all of the parameters in (37). We choose the normalization $\rho_1 = 1$, which translates into the following normalization:

(38) $\beta_1 = 0$.

Using matrix notation, we can write the period $t$ equations in (37) as

(39) $y_t = 1_t \beta_t + X_t \alpha + \varepsilon_t$ ; $t = 1,\dots,T$

where $y_t \equiv [\ln P_{1t},\dots,\ln P_{K_t t}]'$ is a period $t$ vector of logarithms of model prices (where $'$ denotes the transpose of the preceding vector), $\beta_t$ is the scalar parameter $\ln \rho_t$, $1_t$ is a column vector consisting of $K_t$ ones, $X_t$ is a $K_t$ by $J$ matrix of exogenous variables, $\alpha \equiv [\alpha_1,\dots,\alpha_J]'$ is a column vector of parameters that determine the hedonic subutility function and $\varepsilon_t \equiv [\varepsilon_{1t},\dots,\varepsilon_{K_t t}]'$ is a column vector of period $t$ disturbances. Now rewrite the system of equations (39) in stacked form as

(40) $y = W\gamma + \varepsilon$

where $y' \equiv [y_1',\dots,y_T']$, $\varepsilon' \equiv [\varepsilon_1',\dots,\varepsilon_T']$, $\gamma' \equiv [\beta_2, \beta_3,\dots,\beta_T, \alpha_1,\dots,\alpha_J]$ and the matrix $W$ is a somewhat complicated matrix which is constructed using the column vectors $1_t$ and the $K_t$ by $J$ matrices $X_t$ for $t = 1,\dots,T$.41

The vector of least squares estimators for the components of $\gamma$ is

(41) $\gamma^* \equiv (W'W)^{-1} W'y$.

Define the vector of least squares residuals $e$ by

(42) $e \equiv y - W\gamma^* = y - W(W'W)^{-1}W'y$.

It is well known that the vector of least squares residuals $e$ is orthogonal to the columns of $W$; i.e., we have:

41 Note that we used the normalization (38) in order to eliminate the parameter $\beta_1$ from the parameter vector $\gamma$.

(43) $W'e = W'[y - W(W'W)^{-1}W'y] = W'y - W'y = 0_{T-1+J}$

where $0_{T-1+J}$ is a vector of zeros of dimension $T-1+J$. Now premultiply both sides of $e \equiv y - W\gamma^*$ by the transposes of the first $T-1$ columns of $W$.
Using (43), we obtain the following equations:

(44) $0 = 1_t' y_t - 1_t' 1_t \beta_t^* - 1_t' X_t \alpha^*$ ; $t = 2,3,\dots,T$

where $\beta_t^*$ is the least squares estimator for $\beta_t$ and $\alpha^* \equiv [\alpha_1^*,\dots,\alpha_J^*]'$ is the vector of least squares estimators for $\alpha \equiv [\alpha_1,\dots,\alpha_J]'$. Now column $T$ in $W$ corresponds to the constant term $\alpha_1$ and hence is a vector of ones. Premultiplying both sides of (42) by the transpose of this column and using (43), we obtain the following equation:

(45) $0 = \sum_{t=1}^{T} 1_t' y_t - \sum_{t=2}^{T} 1_t' 1_t \beta_t^* - \sum_{t=1}^{T} 1_t' X_t \alpha^*$.

Substitute equations (44) into (45) in order to obtain the following equation:

(46) $1_1' y_1 = 1_1' X_1 \alpha^*$.

Noting that $1_t' 1_t = K_t$ (the number of model prices collected in period $t$), we can rewrite equations (44) as follows:

(47) $\beta_t^* = (1/K_t) \sum_{k=1}^{K_t} y_{kt} - (1/K_t) 1_t' X_t \alpha^*$ ; $t = 2,3,\dots,T$.

The $\beta_t^*$ defined by the right hand side of (47) can be given an interesting interpretation as an arithmetic average of the vector of quality adjusted period $t$ logarithmic prices $y_t - X_t \alpha^*$. However, a very interesting result emerges from using (46) and (47) if we assume that the sample of model prices is matched for all $T$ periods (so that in each period, exactly the same models are priced). If the sample is matched, then each $X_t$ matrix is exactly the same (and all $K_t$ equal a common sample size $K$). If the common $X_t$ matrix is the $K$ by $J$ matrix $X$, then using (46) and (47) gives us the following formula for $\beta_t^*$:

(48) $\beta_t^* = (1/K) \sum_{k=1}^{K} y_{kt} - (1/K) \sum_{k=1}^{K} y_{k1}$ ; $t = 2,3,\dots,T$.

Thus in the matched sample case, taking the exponential of $\beta_t^*$ as our estimator of $\rho_t$ and recalling that $y_{kt} \equiv \ln P_{kt}$, we have

(49) $\rho_t^* \equiv [\prod_{k=1}^{K} P_{kt}]^{1/K} / [\prod_{k=1}^{K} P_{k1}]^{1/K} = [\prod_{k=1}^{K} (P_{kt}/P_{k1})]^{1/K}$ ; $t = 2,3,\dots,T$;

i.e., the hedonic regression approach in the matched model case gives exactly the same result for the overall measure of price change going from period 1 to $t$ as we would get by taking the geometric mean of the matched model price relatives for the two periods under consideration.
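The matched model result (49) can be verified numerically. The sketch below is illustrative only: the prices and the single characteristic are made-up numbers, the log-log form is chosen arbitrarily for $\ln f$, and the helper names `solve` and `ols` are hypothetical. It stacks the observations as in (40), computes the least squares estimates from the normal equations, and compares $\exp(\beta_t^*)$ with the geometric mean of the matched price relatives.

```python
import math


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def ols(W, y):
    """Least squares coefficients from the normal equations (W'W) g = W'y."""
    p = len(W[0])
    WtW = [[sum(row[i] * row[j] for row in W) for j in range(p)] for i in range(p)]
    Wty = [sum(row[i] * yi for row, yi in zip(W, y)) for i in range(p)]
    return solve(WtW, Wty)


# Made-up matched sample: the same K = 4 models are priced in each of T = 3 periods.
z = [1.0, 2.0, 3.5, 5.0]                    # one characteristic per model
prices = [[100.0, 150.0, 210.0, 300.0],     # period 1
          [95.0, 140.0, 205.0, 280.0],      # period 2
          [90.0, 138.0, 190.0, 270.0]]      # period 3
T, K = len(prices), len(z)

W, y = [], []
for t in range(T):
    for k in range(K):
        dummies = [1.0 if t == s else 0.0 for s in range(1, T)]  # beta_2, ..., beta_T
        W.append(dummies + [1.0, math.log(z[k])])                # constant term, ln z
        y.append(math.log(prices[t][k]))

gamma = ols(W, y)  # ordered as [beta_2, ..., beta_T, alpha_1, alpha_2]
for t in range(1, T):
    hedonic = math.exp(gamma[t - 1])
    geo_mean = math.exp(sum(math.log(prices[t][k] / prices[0][k]) for k in range(K)) / K)
    print(t + 1, hedonic, geo_mean)
```

The two printed columns should agree up to rounding error for any choice of the known regressors, as long as the sample is matched and the regression contains a constant term; the agreement breaks down once models enter or leave the sample.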
Triplett indicated that this result was true for the case $T = 2$ and assuming that $f$ was the log-log hedonic utility function described in section 3.1 above.

I think that the Silver and Heravi (2001) paper and the Triplett (2001) Manual are both very useful in that they indicate very explicitly that traditional matched model techniques for quality adjustment can be quite closely related to the results of a hedonic regression approach. This correspondence between the two methods should help to demystify hedonic methods to some extent. Furthermore, as stressed by Silver and Heravi and Triplett, the statistical advantage in using the hedonic regression approach over the matched model approach increases as the lack of matching increases; i.e., the hedonic technique uses all of the model information between the two periods under consideration whereas the matched model approach can by definition use only the information on models that are present in the marketplace during both periods.

5. Hedonic Regressions and the Use of Quantity Weights

The hedonic regression study by Silver and Heravi (2001) is unusual in that they not only had data on the prices and characteristics of washing machines sold in the UK in 1998, they also had data on the sales of each model. The question that we want to address in this section is: how exactly should quantity data be used in a hedonic regression study?

We start out by considering a very simple model where there is only one variety in the market during period $t$ but we have $K$ price observations, $P_{kt}$, on this model during period $t$, along with the corresponding quantity sold at each of these prices, $q_{kt}$. Under these assumptions, our basic hedonic regression equations (14) for period $t$ become:

(50) $P_{kt} = \rho_t f(z_{kt}) = \rho_t$ ; $k = 1,2,\dots,K$

where we can set $f(z_{kt}) = 1$, since all $K$ transactions are on exactly the same model.
From viewing (50), we see that $\rho_t$ can be interpreted as some sort of average of the $K$ period $t$ observed transaction prices, $P_{kt}$. The relative frequency at which the price $P_{kt}$ is observed in the marketplace during period $t$ can be defined as:

(52) $\theta_{kt} \equiv q_{kt} / \sum_{i=1}^{K} q_{it}$.

The expected value of the discrete distribution of period $t$ prices is

(53) $\rho_t^* \equiv \sum_{k=1}^{K} \theta_{kt} P_{kt} = \sum_{k=1}^{K} q_{kt} P_{kt} / \sum_{i=1}^{K} q_{it}$ using (52).

Note that the far right hand side of (53) is a unit value. Thus quantity data on the sales of a model can be used to form a representative average price for the model in a period and that representative price is an overall sales weighted average price for the model or a unit value.42

42 One could think of other ways of weighting the prices $P_{kt}$. For example, we could use the expenditure share for all models sold at the price $P_{kt}$ during period $t$ equal to $s_{kt} \equiv P_{kt} q_{kt} / \sum_{i=1}^{K} P_{it} q_{it}$ for $k = 1,\dots,K$ as a weighting factor for $P_{kt}$. The representative period $t$ average price using these weights becomes $\rho_t^{**} \equiv \sum_{k=1}^{K}$

How can we derive the unit value estimator for the representative period $t$ price $\rho_t$ using a hedonic regression? There are at least two ways of doing this.

Look at equation $k$ in the system of price equations (50). Since there are $q_{kt}$ sales at this price in period $t$, we could repeat the equation $P_{kt} = \rho_t$ a number of times, $q_{kt}$ times to be exact. Let $1_k$ be a vector of ones of dimension $q_{kt}$. Then using vector notation, we could rewrite the system of equations (50), repeating each price $P_{kt}$ once for each transaction that took place in period $t$ at that price, as follows:

(54) $1_k P_{kt} = 1_k \rho_t$ ; $k = 1,2,\dots,K$.

Now add error terms to each equation in (54) and calculate the least squares estimator for the resulting linear regression. This estimator turns out to be the unit value estimator $\rho_t^*$ defined by (53).
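The replication argument can be checked with a few made-up transactions (all numbers below are illustrative assumptions, and quantities are taken to be integers so that prices can be repeated). The sketch computes the unit value (53) directly, the least squares estimate from the replicated system (54), and, for comparison, the least squares estimate from the square-root-of-quantity weighting that the text takes up next.

```python
prices = [10.0, 12.0, 9.0]   # made-up transaction prices P_kt for one model
quantities = [5, 1, 4]       # made-up units sold at each price, q_kt

# Unit value (53): total value of sales divided by total quantity sold.
unit_value = sum(p * q for p, q in zip(prices, quantities)) / sum(quantities)

# Replication (54): repeat each price q_kt times. Least squares on a constant
# alone is the sample mean of the replicated prices.
replicated = [p for p, q in zip(prices, quantities) for _ in range(q)]
ols_replicated = sum(replicated) / len(replicated)

# Square-root weighting: regress sqrt(q) * P on sqrt(q) with no intercept.
x = [q ** 0.5 for q in quantities]
y = [xi * p for xi, p in zip(x, prices)]
wls = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

print(unit_value, ols_replicated, wls)
```

All three numbers coincide up to rounding error, which is the sense in which a quantity weighted regression reproduces the unit value.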
The second way of deriving the unit value estimator for the representative period $t$ price $\rho_t$ using a hedonic regression is to multiply both sides of equation $k$ in (50) by the square root of the quantity sold of model $k$ in period $t$, $(q_{kt})^{1/2}$, and then add an error term, $\varepsilon_{kt}$. We obtain the following system of equations:

(55) $(q_{kt})^{1/2} P_{kt} = (q_{kt})^{1/2} \rho_t + \varepsilon_{kt}$ ; $k = 1,2,\dots,K$.

Note that the left hand side variables in (55) are known. Now treat (55) as a linear regression with the unknown parameter $\rho_t$ to be estimated. It can be verified that the least squares estimator for $\rho_t$ is the unit value estimator $\rho_t^*$ defined by (53).43 Thus we can use a weighted least squares hedonic regression as a way of obtaining a more representative average model price for period $t$.

The above discussion may help to explain why Silver and Heravi (2001) used sales weighted hedonic regressions in their regression models. The use of quantity weighted regressions will diminish the influence of unrepresentative prices44 and should lead to a better measure of central tendency for the distribution of quality adjusted model prices; i.e., the use of quantity weights should lead to more accurate estimates of the $\rho_t$ parameters in equations (14).

$s_{kt} P_{kt}$. Note that if we divide this price into the value of period $t$ transactions, $\sum_{i=1}^{K} P_{it} q_{it}$, we obtain the corresponding quantity estimator, $[\sum_{i=1}^{K} P_{it} q_{it}]^2 / \sum_{k=1}^{K} [P_{kt}]^2 q_{kt}$, which is not easy to interpret. On the other hand, if we divide the unit value estimator of aggregate period $t$ price, $\rho_t^*$ defined by (53), into the value of period $t$ transactions, $\sum_{i=1}^{K} P_{it} q_{it}$, we obtain the simple sum of quantities transacted during period $t$, $\sum_{k=1}^{K} q_{kt}$, as the corresponding quantity estimator.
The use of unit values to aggregate over transactions pertaining to a homogeneous commodity within a period to obtain a single representative price and quantity for the period under consideration was advocated by Walsh (1901; 96) (1921; 88), Davies (1924; 187) and Diewert (1995; 20-24).

43 Berndt (1991; 127) presents a similar econometric argument justifying the weighted least squares model (55) in terms of a model involving heteroskedastic variances for the untransformed model.

44 Griliches (1961) (1971; 5) made this observation many years ago.

6. Exact Hedonic Indexes

Silver and Heravi (2001) spend a considerable amount of effort in evaluating two of Feenstra’s (1995) bounds to an exact hedonic index. In section 2, we made some rather strong simplifying assumptions on the structure of consumer preferences, assumptions that were rather different than those made by Feenstra. In this section, we look at the implications of our assumptions for constructing exact hedonic indexes.45

Recall our basic hedonic equations (14) again: $P_{kt} = \rho_t f(z_{kt})$ for $t = 1,\dots,T$ and $k = 1,\dots,K_t$. We assume that the price $P_{kt}$ is the average price for all the models of type $k$ sold in period $t$ and we let $q_{kt}$ be the number of units sold of model $k$ in period $t$. Recall that the number of models in the marketplace during period $t$ was $K_t$. In this section, we will assume that there are $K$ models in the marketplace over all $T$ periods in our sample period. If a particular model $k$ is not sold at all during period $t$, then we will assume that $P_{kt}$ and $q_{kt}$ are both zero. With these conventions in mind, the total value of consumer purchases during period $t$ is equal to:

(56) $\sum_{k=1}^{K} P_{kt} q_{kt} = \sum_{k=1}^{K} \rho_t f(z_k) q_{kt}$ ; $t = 1,\dots,T$.

The hedonic subutility function $f$ has done all of the hard work in our model in converting the utility yielded by model $k$ in period $t$ into a “standard” utility $f(z_k)$ that is cardinally comparable across models.
Then for each model type $k$, we just multiply by the total number of units sold in period $t$, $q_{kt}$, and sum over models in order to obtain the total period $t$ market quantity of the hedonic commodity, $Q_t$ say. Thus we have:46

(57) $Q_t \equiv \sum_{k=1}^{K} f(z_k) q_{kt}$ ; $t = 1,\dots,T$.

The corresponding aggregate price for the hedonic commodity is $\rho_t$. Thus in our highly simplified model, the aggregate exact period $t$ price and quantity for the hedonic commodity are $\rho_t$ and $Q_t$ defined by (57), which can readily be calculated, provided we have estimated the parameters in the hedonic regression (14) and provided that we have data on quantities sold during each period, the $q_{kt}$.47 Once $\rho_t$ and $Q_t$ have been determined for $t = 1,\dots,T$, then these aggregate price and quantity estimates for the hedonic commodity can be combined with the aggregate prices and quantities of nonhedonic commodities using normal index number theory.

45 Our assumptions are also quite different from those made by Fixler and Zieschang (1992) who took yet another approach to the construction of exact hedonic indexes.

46 This is a counterpart to the quantity index defined by Muellbauer (1974; 988) in one of his hedonic models; see his equation (30). Of course, treating $\rho_t$ as a price for the hedonic commodity quantity aggregate defined by (57) can be justified by appealing to Hicks’ (1946; 312-313) Aggregation Theorem, since the model prices $P_{kt} = \rho_t f(z_k)$ all have the common factor of proportionality, $\rho_t$.

47 If we have data for the $q_{kt}$, then it is best to run sales weighted regressions as was discussed in the previous section. If we do not have complete market data on individual model sales but we do have total sales in each period, then we can run the hedonic regression model (14) using a sample of model prices and then divide period $t$ sales by our estimated $\rho_t$ parameter in order to obtain an estimator for $Q_t$.
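The aggregation in (56) and (57) is simple to compute once estimates are in hand. The sketch below is illustrative: the log-log form of `f`, its coefficients, the $\rho_t$ values and the quantities are all made-up assumptions standing in for the output of an estimated hedonic regression. It computes $Q_t$ for each period and confirms that $\rho_t Q_t$ reproduces the value of purchases when prices satisfy $P_{kt} = \rho_t f(z_k)$ exactly.

```python
import math


def f(z):
    """Hypothetical estimated hedonic utility: a log-log form with made-up coefficients."""
    return math.exp(0.5 + 0.8 * math.log(z))


z = [1.0, 2.0, 4.0]    # characteristic of each of K = 3 models
rho = [1.0, 0.95]      # made-up estimates of rho_t for T = 2 periods
q = [[10, 6, 0],       # units sold per model in period 1 (0 means not sold)
     [8, 7, 3]]        # units sold per model in period 2

Q = []        # aggregate hedonic quantity per period, equation (57)
value = []    # total value of purchases per period, equation (56)
for t in range(len(rho)):
    Q.append(sum(f(zk) * qk for zk, qk in zip(z, q[t])))
    value.append(sum(rho[t] * f(zk) * qk for zk, qk in zip(z, q[t])))
    print(t + 1, rho[t], Q[t], value[t])
```

With real data the fitted prices will not match observed prices exactly, so `rho[t] * Q[t]` would only approximate observed expenditure.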
We conclude this section by discussing one other aspect of the Silver and Heravi paper: namely, their use of matched model superlative indexes. A matched model price index for the hedonic commodity between periods $t$ and $t+1$ is constructed as follows. Let $I(t, t+1)$ be the set of models $k$ that are sold in both periods $t$ and $t+1$. Then the matched model Laspeyres and Paasche price indexes going from period $t$ to period $t+1$, $P_L^t$ and $P_P^t$ respectively, are:

(58) $P_L^t \equiv [\sum_{k \in I(t,t+1)} P_{k,t+1} q_{kt}] / [\sum_{k \in I(t,t+1)} P_{kt} q_{kt}]$ ;

(59) $P_P^t \equiv [\sum_{k \in I(t,t+1)} P_{k,t+1} q_{k,t+1}] / [\sum_{k \in I(t,t+1)} P_{kt} q_{k,t+1}]$.

In the above matched model indexes, we compare only models that were sold in both periods under consideration. Thus we are throwing away some of our price information (on prices that were present in only one of the two periods). The matched model superlative Fisher Ideal price index going from period $t$ to $t+1$ is $P_F^t \equiv [P_L^t P_P^t]^{1/2}$; i.e., it is the square root of the product of the matched model Laspeyres and Paasche indexes. Now it is possible to compare the matched model Fisher measure of price change going from period $t$ to $t+1$, $P_F^t$, to the corresponding measure of aggregate price change that we could get from our hedonic model, which is $\rho_{t+1}/\rho_t$. We would hope that these measures of price change would be quite similar, particularly if the proportion of matched models is high for each period (as it is for the Silver and Heravi data). Silver and Heravi (2001) make this comparison for their hedonic models and find that the matched Fisher ends up about 2% lower for their UK washing machine data for 1998 compared to the hedonic models.
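The matched model indexes (58) and (59) and the Fisher index built from them can be sketched as follows. The function name `matched_fisher`, the model identifiers and all prices and quantities are made-up assumptions for illustration; models are treated as matched only if they have positive sales in both periods.

```python
def matched_fisher(p0, q0, p1, q1):
    """Matched-model Laspeyres, Paasche and Fisher indexes from dicts keyed by model id."""
    matched = [k for k in p0 if k in p1 and q0.get(k, 0) > 0 and q1.get(k, 0) > 0]
    if not matched:
        raise ValueError("no models sold in both periods")
    laspeyres = sum(p1[k] * q0[k] for k in matched) / sum(p0[k] * q0[k] for k in matched)
    paasche = sum(p1[k] * q1[k] for k in matched) / sum(p0[k] * q1[k] for k in matched)
    return laspeyres, paasche, (laspeyres * paasche) ** 0.5


# Made-up data: model C disappears after period t and model D is new in period t+1.
p_t = {"A": 100.0, "B": 200.0, "C": 150.0}
q_t = {"A": 10, "B": 5, "C": 2}
p_t1 = {"A": 90.0, "B": 210.0, "D": 400.0}
q_t1 = {"A": 14, "B": 4, "D": 1}

laspeyres, paasche, fisher = matched_fisher(p_t, q_t, p_t1, q_t1)
print(laspeyres, paasche, fisher)
```

Only models A and B enter the calculation here; the prices of C and D are discarded, which is the loss of information noted in the text.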
It seems quite possible that this relatively large discrepancy could be due to the fact that the Silver and Heravi hedonic functional forms are only capable of providing a first order approximation to arbitrary hedonic preferences whereas the superlative indexes can provide a second order approximation and thus substitution effects are bigger for the superlative matched model price indexes.48

Thus an important implication of the Silver and Heravi paper emerges: it is not necessary to undertake a hedonic study if

• detailed data on the price and quantity sold of each model are available and
• between consecutive periods, the number of new and disappearing models is small, so that matching is relatively large.

We turn now to our final topic: a discussion of the additional problems that occur if we relax the assumption that the hedonic subutility function $f(z)$ is time invariant.

48 In favor of this interpretation is the fact that the matched model Laspeyres index was roughly the same as the hedonic indexes computed by Silver and Heravi. However, there are other factors at work and this “explanation” may well be incomplete.

7. Changing Tastes and the Hedonic Utility Function

Several economists have suggested that there are good reasons why the hedonic utility function $f(z)$ introduced in section 2 above may depend on time $t$.49 In this section, we consider what changes need to be made to our basic hedonic model outlined in section 2 if we replace our time invariant hedonic utility function $f(z)$ by one that depends on time, say $f_t(z)$.50

If we replace our old $f(z)$ in section 2 by $f_t(z)$ and make the same other assumptions as we made there, we find that instead of our old equations (14), we now end up with the following equations:

(60) $P_{kt} = \rho_t f_t(z_{kt})$ ; $t = 1,\dots,T$ ; $k = 1,\dots,K_t$.
Up to this point, nothing much has changed from our previous section 2 model which assumed a time invariant hedonic subutility function $f(z)$, except that our new subutility function $f^t(z)$ will naturally have some time dependent parameters in it. However, there is another major change that is associated with our new model (60). Recall that in the time invariant models discussed in section 3, we required only one normalization on the parameters, like $\rho_1 = 1$. In our new time dependent framework, we require a normalization on the parameters in (60) for each period; i.e., we now require $T$ normalizations on the parameters instead of one in order to identify the $\rho_t$ and the $\alpha$ parameters which characterize $f^t(z)$. The simplest way to obtain the required normalizations is to make the hypothesis that a reference model with characteristics $z^* \equiv (z_1^*,\dots,z_N^*)$ gives the consumer the same utility across all periods in the sample. If we choose this reference utility level to be unity, then this hypothesis translates into the following restrictions on the parameters of $f^t(z)$:

(61) $f^t(z^*) = 1$ ; $t = 1,\dots,T$.

Equations (60) and (61) now become our basic system of hedonic regression equations that replace our old system (14) plus the normalization $\rho_1 = 1$.51

How should we choose the functional form for $f^t(z)$? Obviously, there are many possibilities. However, the simplest possibility (and it is the one chosen by Silver and Heravi) is to allow the $\alpha_n$ parameters that we defined for various functional forms in section 3 above to depend on $t$; i.e., the $\alpha_n$ defined in section 3 are replaced by $\alpha_n^t$ and each period $t$ parameter set is estimated by a hedonic regression that uses only the price and characteristics data for period $t$.52 We leave to the reader the details involved in reworking our old algebra in section 3, changing the $\alpha_n$ into $\alpha_n^t$ and imposing the normalizations (61) in place of our old normalization, $\rho_1 = 1$.

So far, so good. It seems that we have greatly generalized our old “static” hedonic model at virtually no cost. However, there is a hidden cost. Our new system of regression equations, (60) and (61), is in general not invariant to the choice of the reference model with characteristics vector $z^*$. Thus if we choose a different reference model with characteristics vector $z^{**} \neq z^*$ and replace the normalizations (61) by

(62) $f^t(z^{**}) = 1$ ; $t = 1,\dots,T$,

then in general, the new estimates for the aggregate hedonic commodity prices $\rho_t$ will change. Thus the cost of assuming a time dependent hedonic utility function is a lack of invariance in the relative prices of the aggregate hedonic commodity over time to our utility function normalizations (61) or (62).

This lack of invariance in our estimated $\rho_t$ need not be a problem for statistical agencies, provided that we can agree on a “reasonable” choice for the reference model that is characterized by the characteristics vector $z^*$, since the important factor for the agency is to obtain “reasonable” and reproducible estimates for the aggregate hedonic commodity prices. Based on some discussion of this problem in Silver (1999b; 47), a preliminary suggestion is that we take $z^*$ to be the sales weighted average vector of characteristics of models that appeared during the sample period:

(63) $z^* \equiv \sum_{k=1}^{K} \sum_{t=1}^{T} q_k^t z_k \,/\, \sum_{k=1}^{K} \sum_{t=1}^{T} q_k^t$

where we have reverted to the notation used in section 6; i.e., $K$ is the total number of distinct models that were sold in the market over all $T$ periods in our sample and $q_k^t$ is the number of models that have the vector of characteristics $z_k$ that were sold in period $t$.53

49 More precisely, Silver (1999a) and Pakes (2001) make very strong arguments (based on industrial organization theory) that the hedonic regression coefficients that are estimated using period $t$ data should depend on $t$. Griliches (1961) also argued that the hedonic regression coefficients were unlikely to be constant over periods.

50 Before we proceed to our general discussion of time dependent hedonic aggregator functions $f^t(z)$, we note a simple method originally due to Court (1939) and Griliches (1961) for allowing for time dependence that does not require any new methodology: simply use the previous time independent methodology but restrict the regression to two consecutive periods. This will give us a measure of overall price change for the hedonic commodity going from period $t$ to $t+1$ say. Then run another hedonic regression using only the data for periods $t+1$ and $t+2$, which will give us a measure of price change going from period $t+1$ to $t+2$. And so on.

51 If we define the imputed price of the reference model in period $t$ as $P^{t*}$, it can be seen using (60) and (61) that $P^{t*} = \rho_t$ for $t = 1,\dots,T$. Now in actual practice, when unrestricted period $t$ hedonic regressions are run in isolation, researchers omit the time dummy and just regress say $\ln P_k^t$ on $\ln f^t(z_k^t)$ where the right hand side regression variables have a constant term. Then the researcher estimates the period $t$ aggregate price of the hedonic commodity as $\rho_t^* \equiv f^t(z^*)$ where $z^*$ is a conveniently chosen vector of reference characteristics. This procedure is equivalent to our time dummy procedure using the normalizations (61).

52 If quantity sales data are available, then we recommend the weighted regression approach explained in section 5; recall equations (55). Also, in this case, if models are sold at more than one price in any given period, then we could weight each distinct price by its sales at that price or simply aggregate over sales of the specific model $k$ in period $t$ and let $P_k^t$ be the unit value price over all of these sales. In what follows, we assume that the second alternative is chosen.

53 If quantity information on sales of models, $q_k^t$, is not available, then define $z^*$ as an unweighted arithmetic mean of the $z_k$.
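The reference vector (63) and the lack of invariance to the choice of reference model can be illustrated numerically. The sketch below rests on assumptions that are not in the text: a single characteristic, a semi-log form $\ln P = \beta_0^t + \beta_1^t z$ for each period (only one of many possible functional forms), and made-up coefficients and sales. Under the normalizations (61), $\rho_t$ is then the imputed period $t$ price of the reference model, as noted in footnote 51.

```python
from math import exp

def sales_weighted_reference(z, q):
    """Definition (63): sales weighted mean of characteristics over all periods.

    z: dict model id -> tuple of characteristics z_k.
    q: dict model id -> list of sales q_k^t, one entry per period.
    """
    total_sales = {k: sum(q[k]) for k in z}
    grand_total = sum(total_sales.values())
    n_chars = len(next(iter(z.values())))
    return tuple(
        sum(total_sales[k] * z[k][n] for k in z) / grand_total
        for n in range(n_chars)
    )

def unweighted_reference(z):
    """Unweighted arithmetic mean of the z_k (the fallback in footnote 53)."""
    n_chars = len(next(iter(z.values())))
    return tuple(sum(zk[n] for zk in z.values()) / len(z)
                 for n in range(n_chars))

def reference_price(intercept, slopes, z_ref):
    """Imputed price of the reference model under the assumed semi-log form."""
    return exp(intercept + sum(b * x for b, x in zip(slopes, z_ref)))

# Hypothetical characteristics, sales in two periods, and period coefficients.
z = {"A": (1.0,), "B": (2.0,), "C": (4.0,)}
q = {"A": [10, 10], "B": [5, 5], "C": [1, 4]}
coef = {1: (4.0, (0.30,)), 2: (4.0, (0.20,))}

z_star = sales_weighted_reference(z, q)   # reference model from (63)
z_alt = unweighted_reference(z)           # an alternative reference model

# rho_2 / rho_1 under each choice of reference model
ratio_star = reference_price(*coef[2], z_star) / reference_price(*coef[1], z_star)
ratio_alt = reference_price(*coef[2], z_alt) / reference_price(*coef[1], z_alt)
print(z_star, ratio_star)
print(z_alt, ratio_alt)
```

Because the assumed slope coefficient differs across the two periods, the two reference models give different measures of aggregate price change; if the slopes were equal, the two ratios would coincide.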
Thus once we pick functional forms for the $f^t(z)$ and add stochastic terms to (60), equations (60), (61) and definition (63) completely specify our new hedonic regression framework. Of course, we still recommend that quantity weights (if available) be used in the econometric estimation for reasons explained in section 5 above; recall equations (55) above.

However, if the number of time periods in our sample $T$ is large, then there is a danger that the overall characteristics vector $z^*$ defined by (63) may not be very representative for any one or two consecutive periods. Thus we now suggest a different method of normalizing or making comparable the time dependent hedonic utility functions $f^t(z)$ that will deal with this lack of representativity problem. For each time period $t$, define $z^{t*}$ to be the sales weighted average vector of characteristics of models that appeared during period $t$:

(64) $z^{t*} \equiv \sum_{k=1}^{K} q_k^t z_k \,/\, \sum_{k=1}^{K} q_k^t$ ; $t = 1,\dots,T$.

Recall our basic hedonic regression equations (60), $P_k^t = \rho_t f^t(z_k^t)$. Now make the following normalizations:

(65) $\rho_t = 1$ ; $t = 1,\dots,T$.

Assuming that the parameters of the period $t$ hedonic utility functions $f^t(z)$ have been estimated, we can now define the period $t$ to $t+1$ Laspeyres, Paasche54 and Fisher type hedonic price indexes respectively as follows:

(66) $P_L^{t,t+1} \equiv f^{t+1}(z^{t*}) / f^t(z^{t*})$ ; $t = 1,\dots,T-1$;
(67) $P_P^{t,t+1} \equiv f^{t+1}(z^{t+1*}) / f^t(z^{t+1*})$ ; $t = 1,\dots,T-1$;
(68) $P_F^{t,t+1} \equiv [P_L^{t,t+1} P_P^{t,t+1}]^{1/2}$ ; $t = 1,\dots,T-1$.

The Fisher type hedonic price index is our preferred index. It can be seen that the Laspeyres and Paasche indexes defined by (66) and (67) can be quite closely related to Feenstra’s upper and lower bounding indexes to his true index (and this superlative exact hedonic methodology is used by Silver and Heravi (2001)), depending on what functional form for $f^t$ is chosen.
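Definitions (64) and (66)-(68) translate directly into code once a functional form has been chosen. The following sketch assumes, purely for illustration, a semi-log form $f^t(z) = \exp(\beta_0^t + \beta_1^t z)$ with a single characteristic and the normalizations (65); the coefficients and sales figures are hypothetical.

```python
from math import exp, sqrt

def period_reference(z, q_t):
    """Definition (64): sales weighted mean characteristics for one period.

    z: dict model id -> tuple of characteristics z_k.
    q_t: dict model id -> sales q_k^t in that period.
    """
    total = sum(q_t[k] for k in z)
    n_chars = len(next(iter(z.values())))
    return tuple(sum(q_t[k] * z[k][n] for k in z) / total
                 for n in range(n_chars))

def f(coef, z_vec):
    """Assumed semi-log hedonic function; with rho_t = 1 it is the fitted price."""
    intercept, slopes = coef
    return exp(intercept + sum(b * x for b, x in zip(slopes, z_vec)))

def hedonic_indexes(coef_t, coef_t1, z_t_star, z_t1_star):
    """Laspeyres (66), Paasche (67) and Fisher (68) type hedonic indexes."""
    laspeyres = f(coef_t1, z_t_star) / f(coef_t, z_t_star)
    paasche = f(coef_t1, z_t1_star) / f(coef_t, z_t1_star)
    return laspeyres, paasche, sqrt(laspeyres * paasche)

# Hypothetical data: one characteristic, three models, two periods.
z = {"A": (1.0,), "B": (2.0,), "C": (4.0,)}
q_t = {"A": 10, "B": 5, "C": 1}
q_t1 = {"A": 10, "B": 5, "C": 4}
coef_t = (4.0, (0.30,))    # period t estimates (made up)
coef_t1 = (4.0, (0.20,))   # period t+1 estimates (made up)

z_t_star = period_reference(z, q_t)
z_t1_star = period_reference(z, q_t1)
HL, HP, HF = hedonic_indexes(coef_t, coef_t1, z_t_star, z_t1_star)
print(z_t_star, z_t1_star, HL, HP, HF)
```

The Laspeyres type index evaluates both period functions at the period $t$ reference vector and the Paasche type index at the period $t+1$ reference vector, so the two differ whenever the mix of models sold shifts between periods and the estimated coefficients change.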
54 Berndt, Griliches and Rappaport (1995; 262-263) and Berndt and Rappaport (2001) define the Laspeyres and Paasche type hedonic indexes in this way. However, the basic idea dates back to Griliches (1971; 59) and Dhrymes (1971; 111-112). Note that (66) and (67) break down if the vector of characteristics in period $t$ is totally different from the vector of characteristics in period $t+1$. Similarly, problems can arise if some characteristics are zero in one period and nonzero in another period; recall the log of zero problem discussed in section 3 above.

Once the parameters that characterize the time dependent hedonic utility functions $f^t(z)$ have been estimated along with the associated aggregate period $t$ hedonic commodity prices $\rho_t$,55 then we can define period $t$ aggregate demand for the hedonic commodity by:56

(69) $Q_t \equiv \sum_{k=1}^{K} f^t(z_k) q_k^t$ ; $t = 1,\dots,T$.

The above model is our suggested direct method for forming exact aggregate period $t$ prices and quantities, $\rho_t$ and $Q_t$, for the hedonic commodity.

It is possible to use the outputs of hedonic regressions in another more indirect way, along with normal index number theory, in order to construct aggregate price and quantity indexes for the hedonic commodity.57 Recall equations (58) and (59) in the previous section, which defined the matched model Laspeyres and Paasche price indexes over hedonic models going from period $t$ to $t+1$. The problem with these indexes is that they throw away information on models that are sold in only one of the two periods under consideration. One way of using this discarded information is to use the hedonic regressions in order to impute the missing prices.58 Suppose that model $k$ was either unavailable or not sold in period $t$ (so that $q_k^t = 0$) but that it was sold during period $t+1$ (so that $P_k^{t+1}$ and $q_k^{t+1}$ are positive). The problem is that we have no price $P_k^t$ for this model in period $t$ when it was not sold.
However, for period $t+1$, our hedonic regression equation for this model is the following equation (neglecting the error term):

(70) $P_k^{t+1} = \rho_{t+1} f^{t+1}(z_k)$.

Now we can use the estimated period $t+1$ hedonic utility function $f^{t+1}$ and the estimated period $t$ aggregate price for the hedonic commodity, $\rho_t$, in order to define an imputed price for model $k$ in period $t$ as follows:

(71) $P_k^{t*} \equiv \rho_t f^{t+1}(z_k) = \rho_t [P_k^{t+1}/\rho_{t+1}]$ using (70) $= [\rho_t/\rho_{t+1}] P_k^{t+1}$.

Thus the imputed price for model $k$ in period $t$, $P_k^{t*}$, is equal to the observed model $k$ price in period $t+1$, $P_k^{t+1}$, times the reciprocal of the estimated rate of overall change in the price of the hedonic commodity going from period $t$ to $t+1$, $[\rho_t/\rho_{t+1}]$.

55 In our second method where we set the $\rho_t$ equal to unity, define $\rho_1 = 1$ and $\rho_{t+1} = \rho_t P_F^{t,t+1}$ for $t = 1,2,\dots,T-1$ where the Fisher type hedonic chain index $P_F^{t,t+1}$ is defined by (68). In this second method, once the aggregate prices $\rho_t$ have been determined, we obtain the aggregate quantities $Q_t$ as the deflated values, $\sum_{k=1}^{K} P_k^t q_k^t / \rho_t$, rather than using equations (69).

56 If quantity weights are not available, then we cannot compute $Q_t$.

57 See Moulton (1996; 170) for an exposition of these methods.

58 See Armknecht and Maitland-Smith (1999) for a nice review of imputation methods.

Now suppose that model $k$ sold in period $t$ (so that $P_k^t$ and $q_k^t$ are positive) but that model $k$ either disappeared or was not sold in period $t+1$ (so that $q_k^{t+1}$ is 0). The problem is that we have no price $P_k^{t+1}$ for this model in period $t+1$ when it was not sold. However, for period $t$, our hedonic regression equation for model $k$ is the following equation (neglecting the error term):

(72) $P_k^t = \rho_t f^t(z_k)$.

Now we can use the estimated period $t$ hedonic utility function $f^t$ and the estimated period $t+1$ aggregate price for the hedonic commodity, $\rho_{t+1}$, in order to define an imputed price for model $k$ in period $t+1$ as follows:

(73) $P_k^{t+1*} \equiv \rho_{t+1} f^t(z_k) = \rho_{t+1} [P_k^t/\rho_t]$ using (72) $= [\rho_{t+1}/\rho_t] P_k^t$.
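The imputations (71) and (73) amount to rescaling an observed price by the estimated change in the aggregate hedonic price. A minimal sketch follows; the function name, the data and the values of $\rho_t$ and $\rho_{t+1}$ are hypothetical, and, as in the derivations above, the regression error terms are neglected.

```python
def impute_missing_prices(p_t, p_t1, rho_t, rho_t1):
    """Fill in prices for models sold in only one of periods t and t+1.

    p_t, p_t1: dicts model id -> observed price in period t / period t+1.
    rho_t, rho_t1: estimated aggregate hedonic prices for the two periods.
    Returns two dicts covering every model sold in at least one period.
    """
    full_t = dict(p_t)
    full_t1 = dict(p_t1)
    # (71): model sold only in t+1, so deflate its t+1 price back to period t.
    for k, price in p_t1.items():
        if k not in p_t:
            full_t[k] = (rho_t / rho_t1) * price
    # (73): model sold only in t, so carry its t price forward to period t+1.
    for k, price in p_t.items():
        if k not in p_t1:
            full_t1[k] = (rho_t1 / rho_t) * price
    return full_t, full_t1

# Hypothetical data: "C" is sold only in period t, "D" only in period t+1.
p_t = {"A": 100.0, "B": 200.0, "C": 150.0}
p_t1 = {"A": 90.0, "B": 190.0, "D": 300.0}
full_t, full_t1 = impute_missing_prices(p_t, p_t1, rho_t=1.0, rho_t1=0.9)
print(full_t)
print(full_t1)
```

The two completed price dictionaries cover the same set of models, so they can be combined with the period quantities in any standard index number formula.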
Thus the imputed price for model $k$ in period $t+1$, $P_k^{t+1*}$, is equal to the observed model $k$ price in period $t$, $P_k^t$, times the estimated rate of overall change in the price of the hedonic commodity going from period $t$ to $t+1$, $[\rho_{t+1}/\rho_t]$.59

Now we can use the imputed prices defined by (71) and (73) in order to obtain price and quantity information on all models that were present in one or both of periods $t$ and $t+1$ and hence we can calculate the following completely matched Laspeyres and Paasche price indexes:

(74) $P_L^t \equiv \left[\sum_{k=1}^{K} P_k^{t+1} q_k^t\right] / \left[\sum_{k=1}^{K} P_k^t q_k^t\right]$ ;
(75) $P_P^t \equiv \left[\sum_{k=1}^{K} P_k^{t+1} q_k^{t+1}\right] / \left[\sum_{k=1}^{K} P_k^t q_k^{t+1}\right]$

where we use the imputed price $P_k^{t*}$ defined by (71) in place of the missing $P_k^t$ if $q_k^t = 0$ but $q_k^{t+1}$ is positive and we use the imputed price $P_k^{t+1*}$ defined by (73) in place of the missing $P_k^{t+1}$ if $q_k^{t+1} = 0$ but $q_k^t$ is positive.60 Comparing our new Laspeyres and Paasche price indexes defined by (74) and (75) to our old matched model Laspeyres and Paasche price indexes defined by (58) and (59), it can be seen that our new indexes do not throw away any relevant price and quantity information and hence can be expected to be more “accurate” in some sense.

59 I believe that the approach outlined here is consistent with the approach used by Silver and Heravi to generate imputed prices for missing models. Triplett (2001) outlines other approaches.

60 Obviously, if both $q_k^t$ and $q_k^{t+1}$ are zero, then we do not require estimators for the missing prices $P_k^t$ and $P_k^{t+1}$ in order to compute the Laspeyres and Paasche indexes defined by (74) and (75).

8. Conclusion

A number of tentative conclusions can be drawn from the Silver and Heravi (2001) paper and this discussion of it:

• Traditional superlative index number techniques that aggregate up model data based on matched models can give more or less the same answer as a hedonic approach, provided that the amount of matching is relatively large.
• Linear hedonic regressions are difficult to justify on theoretical grounds (at least based on our highly simplified approach to hedonic regressions) and hence should be avoided if possible.
• If completely unconstrained hedonic regressions are run on the data of each period, then care should be taken in the choice of a reference model that allows us to compare the utility of the hedonic commodity across periods. In particular, the estimates of aggregate price change in the hedonic commodity will in general not be invariant to the choice of the reference model.
• The use of quantity weights in hedonic regression models is strongly recommended if possible.
• Under certain conditions, if models are matched in each period, then the hedonic regression approach will give exactly the same answer as a traditional statistical agency approach to the calculation of an elementary index.
• We have not achieved a consensus on exactly what the “best practice” hedonic regression specification should be but flexible functional form considerations should probably be a factor in the discussion of this problem.

References

Armknecht, P.A. and F. Maitland-Smith (1999), “Price Imputation and Other Techniques for Dealing with Missing Observations, Seasonality and Quality Change in Price Indices”, pp. 25-49 in Proceedings of the Measurement of Inflation Conference, M. Silver and D. Fenwick (eds.), Cardiff University, August 31-September 1, London: Office for National Statistics.

Berndt, E.R. (1991), The Practice of Econometrics: Classic and Contemporary, Reading, MA: Addison-Wesley.

Berndt, E.R. and N.J. Rappaport (2001), “Price and Quality of Desktop and Mobile Personal Computers: A Quarter Century Historical Overview”, The American Economic Review, forthcoming.

Berndt, E.R., Z. Griliches, and N.J. Rappaport (1995), “Econometric Estimates of Price Indexes for Personal Computers in the 1990's”, Journal of Econometrics 68, 243-268.

Christensen, L.R., D.W. Jorgenson and L.J. Lau (1975), “Transcendental Logarithmic Utility Functions”, American Economic Review 65, 367-383.

Court, A.T. (1939), “Hedonic Price Indexes with Automotive Examples”, pp. 99-117 in The Dynamics of Automobile Demand, New York: General Motors Corporation.

Davies, G.R. (1924), “The Problem of a Standard Index Number Formula”, Journal of the American Statistical Association 19, 180-188.

Diewert, W.E. (1971), “An Application of the Shephard Duality Theorem: A Generalized Leontief Production Function”, Journal of Political Economy 79, 481-507.

Diewert, W.E. (1974), “Applications of Duality Theory”, pp. 106-171 in Frontiers of Quantitative Economics, Volume 2, M.D. Intriligator and D.A. Kendrick (eds.), Amsterdam: North-Holland.

Diewert, W.E. (1993), “Duality Approaches to Microeconomic Theory”, pp. 105-175 in Essays in Index Number Theory, Volume 1, W.E. Diewert and A.O. Nakamura (eds.), Amsterdam: North-Holland.

Diewert, W.E. (1995), “Axiomatic and Economic Approaches to Elementary Price Indexes”, Discussion Paper 95-01, Department of Economics, University of British Columbia, Vancouver, Canada, V6T 1Z1. Available on the Web at: http://web.arts.ubc.ca/econ/diewert/hmpgdie.htm

Diewert, W.E. (1998), “Index Number Issues in the Consumer Price Index”, The Journal of Economic Perspectives 12, 47-58.

Dhrymes, P.J. (1971), “Price and Quality Changes in Consumer Capital Goods: An Empirical Study”, pp. 88-149 in Price Indexes and Quality Change, Z. Griliches (ed.), Cambridge, MA: Harvard University Press.

Feenstra, R.C. (1995), “Exact Hedonic Price Indices”, Review of Economics and Statistics 77, 634-654.

Fixler, D. and K.D. Zieschang (1992), “Incorporating Ancillary Measures of Process and Quality Change into a Superlative Productivity Index”, The Journal of Productivity Analysis 2, 245-267.

Goldberger, A.A. (1968), “The Interpretation and Estimation of Cobb-Douglas Functions”, Econometrica 35, 464-472.

Griliches, Z. (1971), “Introduction: Hedonic Price Indexes Revisited”, pp. 3-15 in Price Indexes and Quality Change, Z. Griliches (ed.), Cambridge, MA: Harvard University Press.

Griliches, Z. (1961), “Hedonic Price Indexes for Automobiles: An Econometric Analysis of Quality Change”, in The Price Statistics of the Federal Government, G. Stigler (chairman), Washington D.C.: Government Printing Office. Reprinted as pp. 55-87 in Price Indexes and Quality Change, Z. Griliches (ed.), Cambridge, MA: Harvard University Press.

Hicks, J.R. (1946), Value and Capital, Second Edition, Oxford: Clarendon Press.

Kokoski, M.F., B.R. Moulton and K.D. Zieschang (1999), “Interarea Price Comparisons for Heterogeneous Goods and Several Levels of Commodity Aggregation”, pp. 123-166 in International and Interarea Comparisons of Income, Output and Prices, A. Heston and R.E. Lipsey (eds.), NBER Studies in Income and Wealth 61, Chicago: The University of Chicago Press.

Koskimäki, T. and Y. Vartia (2001), “Beyond Matched Pairs and Griliches-Type Hedonic Methods for Controlling Quality Changes in CPI Sub-Indices”, Paper presented at the 6th Ottawa Group Meeting, Canberra, Australia, April.

Moulton, B. (1996), “Bias in the Consumer Price Index: What is the Evidence?”, Journal of Economic Perspectives 10:4, 139-177.

Muellbauer, J. (1974), “Household Production Theory, Quality, and the ‘Hedonic Technique’”, The American Economic Review 64:6, 977-994.

Pakes, A. (2001), “Some Notes on Hedonic Price Indices, with an Application to PC’s”, paper presented at the NBER Productivity Program Meeting, March 16, Cambridge MA.

Scheffé, H. (1959), The Analysis of Variance, New York: John Wiley and Sons.

Silver, M. (1995), “Elementary Aggregates, Micro-Indices and Scanner Data: Some Issues in the Compilation of Consumer Prices”, Review of Income and Wealth 41, 427-438.

Silver, M. (1999a), “Bias in the Compilation of Consumer Price Indices When Different Models of an Item Coexist”, pp. 21-37 in Proceedings of the Fourth Meeting of the International Working Group on Price Indices, Washington D.C., April 22-24, 1998, W. Lane (ed.), Washington D.C.: U.S. Department of Labor, Bureau of Labor Statistics.

Silver, M. (1999b), “An Evaluation of the Use of Hedonic Regressions for Basic Components of Consumer Price Indices”, The Review of Income and Wealth 45:1, 41-56.

Silver, M. and S. Heravi (2001), “The Measurement of Quality-Adjusted Price Changes”, paper presented at the NBER Conference on Scanner Data and Price Indexes, September 15-16, 2000 at Arlington, Virginia, forthcoming in a CRIW-NBER Volume edited by R. Feenstra and M. Shapiro.

Rosen, S. (1974), “Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition”, Journal of Political Economy 82:1, 34-55.

Triplett, J. (2001), Handbook on Quality Adjustment of Price Indexes for Information and Communication Technology Products, forthcoming, Paris: OECD.

Walsh, C.M. (1901), The Measurement of General Exchange Value, New York: Macmillan and Company.

Walsh, C.M. (1921), The Problem of Estimation, London: P.S. King and Son.

Yu, K. (2001), “Trends in Internet Access Prices in Canada”, Paper presented at the 6th Ottawa Group Meeting, Canberra, Australia, April.