
Econ 551b Econometrics II
Suggested Solution to Problem Set 4

Prof. John Rust, Hiu Man Chan

Spring, 1999

QUESTION 1 The first-order condition satisfied by the MLE is

displaymath1142

Taylor series expansion around tex2html_wrap_inline1218 yields

displaymath1143

where tex2html_wrap_inline1220 is between tex2html_wrap_inline1222 and tex2html_wrap_inline1224 . Rearranging yields

displaymath1144

For the first term, tex2html_wrap_inline1226 implies tex2html_wrap_inline1228 and tex2html_wrap_inline1230 , and by the LLN and the continuous mapping theorem, we have

displaymath1145

For the second term, by the CLT for triangular arrays (see the Note below),

displaymath1146

Finally, by Slutsky's theorem,

displaymath1147

Therefore the MLE is regular.

To establish this result, we need (i) tex2html_wrap_inline1232 int tex2html_wrap_inline1234 so that tex2html_wrap_inline1236 , (ii) tex2html_wrap_inline1238 are independent, and (iii) the Lindeberg condition: for all tex2html_wrap_inline1240 ,

displaymath1148

where

eqnarray131

in order to apply the CLT to the triangular array tex2html_wrap_inline1242 .

Note: The CLT needed above for the second term is not the usual sort of CLT, but a special CLT for triangular arrays.

Definition: A triangular array is a doubly indexed sequence of random vectors tex2html_wrap_inline1244 where the index i runs from tex2html_wrap_inline1248 for each N and the index N runs from tex2html_wrap_inline1254 . Furthermore for each N, the sequence tex2html_wrap_inline1258 is IID.

Note that the random vectors tex2html_wrap_inline1260 are a triangular array, since for each N the random variables tex2html_wrap_inline1238 , tex2html_wrap_inline1266 form an IID sequence from the density tex2html_wrap_inline1268 . It follows that tex2html_wrap_inline1260 , tex2html_wrap_inline1266 is also an IID sequence, and thus the sequence tex2html_wrap_inline1274 is a triangular array of random vectors.

We now sketch the proof of a central limit theorem for triangular arrays of random variables. This proof can then be extended to a proof of the CLT for triangular arrays of random vectors via the Cramer-Wold device, i.e. if tex2html_wrap_inline1244 is a triangular array of random vectors in tex2html_wrap_inline1278 and we can show that for each non-random vector tex2html_wrap_inline1280 that tex2html_wrap_inline1282 is a triangular array of random variables that satisfy the CLT with asymptotic distribution tex2html_wrap_inline1284 where tex2html_wrap_inline1286 is a tex2html_wrap_inline1288 covariance matrix, then we can conclude that the original sequence tex2html_wrap_inline1244 satisfies a (vector) CLT with asymptotic distribution tex2html_wrap_inline1292 .

CLT for Triangular Arrays of Random Variables Consider a triangular array of random variables tex2html_wrap_inline1244 satisfying tex2html_wrap_inline1296 and tex2html_wrap_inline1298 . If tex2html_wrap_inline1300 then we have:

displaymath1149

Proof: (sketch) We show that the moment generating function of the normalized sum tex2html_wrap_inline1302 converges to the moment generating function of a tex2html_wrap_inline1304 random variable, i.e. tex2html_wrap_inline1306 . We have

  equation175

where tex2html_wrap_inline1308 is the moment generating function for tex2html_wrap_inline1310 . We can do a Taylor series approximation of this function about 0:

  equation185

where tex2html_wrap_inline1314 is a point on the line segment joining tex2html_wrap_inline1316 and 0. Note that by the basic property of moment generating functions we have tex2html_wrap_inline1320 , and tex2html_wrap_inline1322 . Furthermore the assumption that tex2html_wrap_inline1324 implies that

displaymath1150

Now substituting equation (2) into equation (1) we obtain

displaymath1151

Now we use a result from calculus (which can be proved by taking logs and appealing to L'Hôpital's rule) that if tex2html_wrap_inline1326 then tex2html_wrap_inline1328 . Using this result we obtain:

  equation210

Equation (3) shows that the limit of the moment generating function of the normalized sum tex2html_wrap_inline1302 equals the moment generating function of a tex2html_wrap_inline1304 random variable. By appealing to Lévy's continuity theorem (which is actually stated for characteristic functions; the characteristic function is simply the moment generating function evaluated at it instead of t, where tex2html_wrap_inline1338 ), it follows that the normalized sum tex2html_wrap_inline1302 converges in distribution to a tex2html_wrap_inline1304 random variable, completing the proof of the CLT for triangular arrays.
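The calculus result used in the last step of the proof can be checked numerically. The following is only an illustrative sketch and is not part of the original solution: the sequence a_N = t^2/2 + 1/sqrt(N) is an assumed stand-in for the Taylor remainder term, chosen simply because it converges to t^2/2.

```python
import math

def power_limit(a_n, n):
    """Return (1 + a_n / n) ** n, the expression whose limit is exp(a)."""
    return (1.0 + a_n / n) ** n

def mgf_limit_check(t, sizes=(10, 1_000, 100_000, 10_000_000)):
    """Compare (1 + a_N / N)^N with exp(t^2 / 2) for growing N.

    a_N = t^2 / 2 + 1 / sqrt(N) is an assumed (hypothetical) sequence that
    converges to t^2 / 2; the 1 / sqrt(N) term mimics a vanishing remainder.
    """
    target = math.exp(t * t / 2.0)
    errors = []
    for n in sizes:
        a_n = t * t / 2.0 + 1.0 / math.sqrt(n)
        errors.append(abs(power_limit(a_n, n) - target))
    return target, errors

if __name__ == "__main__":
    target, errors = mgf_limit_check(1.0)
    print("exp(t^2/2) =", target)
    print("absolute errors for growing N:", errors)
```

The absolute error shrinks as N grows, which is the behavior the proof relies on when passing to the limit in equation (3).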

QUESTION 2 Twice differentiating tex2html_wrap_inline1344 , we get:

displaymath1152

Hence,

displaymath1153

And we know

eqnarray250

where the third equality holds provided the regularity conditions for interchanging integration and differentiation hold.
Therefore,

displaymath1154

QUESTION 3 Let's assume that all six assumptions given in your class notes hold. To find the asymptotic distribution of the NLLS estimator, we first establish its consistency.
Let

displaymath1155

and

displaymath1156

Then tex2html_wrap_inline1346 .
Now since

eqnarray290

and the first term does not depend on tex2html_wrap_inline1348 , it follows that

displaymath1157

Under correct specification, it is clear that tex2html_wrap_inline1350 , so tex2html_wrap_inline1352 . Under misspecification, the NLLS estimator converges to a point tex2html_wrap_inline1354 that gives the best mean squared approximation in the parametric family of tex2html_wrap_inline1356 to tex2html_wrap_inline1358 .
Next, using the fact that tex2html_wrap_inline1360 (more precisely, tex2html_wrap_inline1362 ), perform a Taylor series expansion about tex2html_wrap_inline1364 :

displaymath1158

with tex2html_wrap_inline1366 . Rearranging terms, we get

displaymath1159

Note that the data are IID, and that

eqnarray359

Therefore we can apply the uniform law of large numbers to obtain

displaymath1160

And by the central limit theorem,

displaymath1161

As a result, by Slutsky's theorem,

displaymath1162

Under correct specification, tex2html_wrap_inline1368 can be simplified since with tex2html_wrap_inline1370 ,

displaymath1163

Therefore,

displaymath1164

Furthermore, as tex2html_wrap_inline1372 are iid and independent of tex2html_wrap_inline1374 , we have

eqnarray488

where tex2html_wrap_inline1376 is the variance of tex2html_wrap_inline1372 .
With cancellations we finally get:

displaymath1165
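As a rough numerical companion to this derivation, the sketch below computes an NLLS estimate by Gauss-Newton iterations and the homoscedastic covariance estimate. It assumes the final display has the standard form sigma^2 times the inverse of E[grad f grad f'], which is what the cancellation delivers under correct specification with homoscedastic errors. The regression function exp(theta0 + theta1 * x) and all function names are hypothetical choices for illustration; they do not come from the problem set.

```python
import numpy as np

def f(x, theta):
    """Hypothetical regression function f(x, theta) = exp(theta0 + theta1 * x)."""
    return np.exp(theta[0] + theta[1] * x)

def grad_f(x, theta):
    """N x 2 matrix whose rows are the derivatives of f with respect to theta."""
    fx = f(x, theta)
    return np.column_stack([fx, fx * x])

def nlls(y, x, theta0, iters=50, tol=1e-12):
    """Gauss-Newton iterations for minimizing sum_i (y_i - f(x_i, theta))^2."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        # Regress the current residuals on the current gradient matrix.
        step = np.linalg.lstsq(grad_f(x, theta), y - f(x, theta), rcond=None)[0]
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def nlls_covariance(y, x, theta_hat):
    """Estimate Var(theta_hat) by sigma2_hat * [sum_i grad_i grad_i']^{-1}.

    This is the sample analog of (1/N) * sigma^2 * E[grad f grad f']^{-1},
    valid only under correct specification and homoscedasticity.
    """
    g = grad_f(x, theta_hat)
    sigma2_hat = np.mean((y - f(x, theta_hat)) ** 2)
    return sigma2_hat * np.linalg.inv(g.T @ g)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 50)
    y = f(x, np.array([0.5, -0.3])) + 0.1 * rng.standard_normal(x.size)
    theta_hat = nlls(y, x, [0.4, -0.2])
    print(theta_hat, np.sqrt(np.diag(nlls_covariance(y, x, theta_hat))))
```

Undamped Gauss-Newton is used here only because it is short; it needs a reasonable starting value, and a production routine would add step-size control.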

QUESTION 4

(a)
We are asked to compute maximum likelihood estimates of the parameter vector tex2html_wrap_inline1380 in the model given by:

equation537

where we are instructed to believe that tex2html_wrap_inline1382 are IID draws from tex2html_wrap_inline1384 and tex2html_wrap_inline1386 . Actually you were misled: the errors tex2html_wrap_inline1372 are heteroscedastic, so conditional on tex2html_wrap_inline1390 we have tex2html_wrap_inline1392 where tex2html_wrap_inline1394 . So you will be estimating a misspecified model, and later in Econ 551 we will discuss test statistics which are capable of detecting this misspecification. In the meantime your job is to calculate the MLE's of the parameters, tex2html_wrap_inline1396 . The first step is to write down the likelihood function tex2html_wrap_inline1398 for the data, tex2html_wrap_inline1400 . In general ``brute force'' maximization of tex2html_wrap_inline1398 may not be a good idea: it might be better to try a ``divide and conquer'' strategy. Note that the joint density of (y,x), tex2html_wrap_inline1406 , is the product of the conditional likelihood of y given x, tex2html_wrap_inline1412 , and the marginal density of x, tex2html_wrap_inline1416 . It is easy to see that this factorization or separability in the joint likelihood enables us to compute the MLEs for the tex2html_wrap_inline1418 parameters and the tex2html_wrap_inline1420 parameters independently. It also implies a block diagonality property which enables us to show that the asymptotic distributions of these parameters are independent.

eqnarray565

where tex2html_wrap_inline1422 denotes the conditional likelihood of the tex2html_wrap_inline1424 's given the tex2html_wrap_inline1238 's, tex2html_wrap_inline1428 denotes the marginal likelihood of the tex2html_wrap_inline1238 's, tex2html_wrap_inline1432 and tex2html_wrap_inline1434 . The separability of parameters in the third and fourth equations allows us to break the estimation problem into two subproblems, which ordinarily makes the programming considerably easier and the computations considerably faster:

(1)
calculate the MLE's for tex2html_wrap_inline1436 from the conditional likelihood tex2html_wrap_inline1438 :

equation612

with FOC's

eqnarray625

Note that there is further separability in this first subproblem: the FOC for tex2html_wrap_inline1440 is the same as the FOC for nonlinear least squares (NLS) estimation of tex2html_wrap_inline1440 in equation (2) above, ignoring tex2html_wrap_inline1376 since it doesn't affect the solution for tex2html_wrap_inline1446 . Once we have computed the NLS estimate of tex2html_wrap_inline1446 , we use the second equation to compute tex2html_wrap_inline1450 as the sample variance of the estimated NLS residuals.

(2)
calculate the MLE's for tex2html_wrap_inline1452 from the marginal likelihood

equation645

with FOC's:

eqnarray656

respectively. Using the attached Gauss code nlreg.gpr for computing NLS estimates (the sum of squared errors, derivatives and hessian are coded in the eval.g procedure) we are able to numerically solve for a vector tex2html_wrap_inline1454 that sets the FOC for tex2html_wrap_inline1440 given in equation (2) above to zero. There are closed-form expressions for the MLEs of the remaining parameters:

eqnarray678

displaymath1166
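The closed-form estimates are simple sample moments. The sketch below is written in Python rather than the Gauss used in the course, and the function names are hypothetical. It assumes, as the text describes, that x is modeled as normal and that the error variance estimate is the average squared NLS residual. Note that the MLE divides by N rather than N-1.

```python
import numpy as np

def normal_mle(x):
    """MLE of the mean and variance of an IID normal sample (divisor N)."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    var_hat = np.mean((x - mu_hat) ** 2)
    return mu_hat, var_hat

def error_variance_mle(y, fitted):
    """MLE of the error variance: the average squared residual y - fitted."""
    resid = np.asarray(y, dtype=float) - np.asarray(fitted, dtype=float)
    return np.mean(resid ** 2)

if __name__ == "__main__":
    print(normal_mle([1.0, 2.0, 3.0, 4.0]))
    print(error_variance_mle([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

Here `fitted` stands for the regression function evaluated at the NLS estimate, which has to be computed first by a numerical routine.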

Figures 1 and 2 below plot the true and estimated regression functions for this problem. Figure 1 plots the data points also: we see that both the true and estimated regression functions are generally quite close to each other and both go through the middle of the ``data cloud''. However we see that the MLE gives a substantially downward biased estimate of tex2html_wrap_inline1460 and this causes the estimated regression function to make big divergences from the true regression function at extreme high and low values of x, say |x| > 4. However since there are very few high or low values of x in the sample, the NLS and MLE are not able to ``penalize'' this divergence: the MLE sets tex2html_wrap_inline1466 since it helps fit the data around x=0 where most of the data points are. Figure 2 provides a blow up of Figure 1 without the data points to show you how the estimated regression function diverges from the truth near x=0. Overall, despite the misspecification of the heteroscedasticity, the MLE and NLS estimators seem to do a pretty good job of uncovering the true regression function, at least for those x's where we have sufficient data. Note that the data plot indicates heteroscedasticity, since the variance of the data around the regression function is bigger in the middle of the graph (near x=0) than at large positive or negative values of x.

Since the true model used to generate data1.asc has heteroscedastic and not homoscedastic error terms tex2html_wrap_inline1478 as assumed here, it is easy to show that the MLE tex2html_wrap_inline1450 of the misspecified model converges to the expectation of tex2html_wrap_inline1394 . We can calculate tex2html_wrap_inline1484 as follows:

eqnarray735

We leave it to you to show that even though the model is misspecified, the MLE tex2html_wrap_inline1450 converges to tex2html_wrap_inline1484 as tex2html_wrap_inline1490 . Indeed in this case we find that tex2html_wrap_inline1492 , which happens to be almost exactly equal to the ``true'' value.
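A small simulation can make this convergence claim concrete. The sketch below is purely illustrative: it assumes x is standard normal and uses a hypothetical variance function sigma^2(x) = exp(-x^2/2), neither of which is taken from data1.asc. For that assumed choice, E[sigma^2(x)] = 1/sqrt(2), and the pooled average of squared errors should be close to this value in a large sample.

```python
import numpy as np

def pooled_variance_demo(n=200_000, seed=0):
    """Return (average squared error, E[sigma^2(x)]) for an assumed design.

    Assumptions (hypothetical, for illustration only):
      x ~ N(0, 1) and sigma^2(x) = exp(-x^2 / 2),
    for which E[sigma^2(x)] = 1 / sqrt(2) exactly.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    sigma2_x = np.exp(-0.5 * x ** 2)
    eps = np.sqrt(sigma2_x) * rng.standard_normal(n)
    return np.mean(eps ** 2), 1.0 / np.sqrt(2.0)

if __name__ == "__main__":
    estimate, target = pooled_variance_demo()
    print("average squared error:", estimate, " E[sigma^2(x)]:", target)
```

The simulation uses the true errors rather than estimated residuals, so it illustrates only the law-of-large-numbers step of the argument.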

Recall that the asymptotic distribution of the standardized MLE estimator is given by:

equation748

where tex2html_wrap_inline1494 is the Hessian and tex2html_wrap_inline1496 is the information matrix (both evaluated at tex2html_wrap_inline1498 ). However the likelihood is misspecified in this case (due to heteroscedasticity), and it is easy to verify that the equality tex2html_wrap_inline1500 does not hold, so the correct asymptotic covariance matrix is given by the White ``misspecification consistent'' formula in equation (4) rather than by the inverse of the information matrix tex2html_wrap_inline1502 . For example, consider the tex2html_wrap_inline1504 block of tex2html_wrap_inline1506 , or tex2html_wrap_inline1508 :

displaymath1167

We see that a sufficient condition for the two expressions above to equal each other is tex2html_wrap_inline1510 for all x, i.e. for the model to be homoscedastic as you were asked to assume. The failure of this equality can be a basis for a specification test statistic that can detect model misspecification, which we will discuss in more detail below. Note that despite the misspecification, the separability property implies that the Hessian is a block diagonal matrix:

displaymath1168

displaymath1169

where tex2html_wrap_inline1514 . It is also easy to verify that despite the misspecification of the heteroscedasticity, the information matrix tex2html_wrap_inline1506 is still block diagonal. Block diagonality of tex2html_wrap_inline1506 and tex2html_wrap_inline1520 implies that the covariance matrix of tex2html_wrap_inline1522 is block diagonal. One can see further block diagonality between tex2html_wrap_inline1440 and tex2html_wrap_inline1376 and between tex2html_wrap_inline1528 and tex2html_wrap_inline1530 . This block diagonality is not just a consequence of the separability in both the marginal and conditional likelihood in the parameters describing the mean or conditional mean ( tex2html_wrap_inline1528 and tex2html_wrap_inline1440 , respectively) and the variance parameters ( tex2html_wrap_inline1530 and tex2html_wrap_inline1376 , respectively). You should verify through direct calculation that this block diagonality is a result of the symmetry of the normal distribution, which implies that tex2html_wrap_inline1540 , where the conditional distribution of tex2html_wrap_inline1542 given X is tex2html_wrap_inline1546 , and similarly, the distribution of tex2html_wrap_inline1548 , which implies that tex2html_wrap_inline1550 for any positive odd integer k. If the normal distribution were not symmetric the block diagonality property wouldn't hold.

Using the block diagonality property, it is easy to compute the standard errors of the full parameter vector, tex2html_wrap_inline1554 . The covariance matrix of tex2html_wrap_inline1446 is given by 1/N times the upper tex2html_wrap_inline1504 block of tex2html_wrap_inline1562 and it is easy to verify that this is the same as the covariance matrix for the nonlinear least squares estimator for tex2html_wrap_inline1564 (which is also numerically identical to the MLE) that is output from the nlreg.gpr program. By working with equation (4), you can show that the estimated variance of tex2html_wrap_inline1376 is given by tex2html_wrap_inline1568 where tex2html_wrap_inline1570 is the sample analog of the fourth central moment of tex2html_wrap_inline1572 :

displaymath1170

There is a similar formula for the estimated variance of tex2html_wrap_inline1574 . We have tex2html_wrap_inline1576 , so the estimated standard error of tex2html_wrap_inline1574 is 0.036. The estimated standard error of tex2html_wrap_inline1450 is tex2html_wrap_inline1584 .

(b)
If the model is correctly specified, we know that the information equality will hold which implies that:

equation810

We compare White's misspecification consistent estimate to the inverse of information:

displaymath1171

displaymath1172

where all the estimates are taken from the output of eval.g. Following White (``Maximum Likelihood Estimation of Misspecified Models'' Econometrica 1982) we can construct a formal hypothesis test statistic using the difference between the estimates of the upper diagonal elements of tex2html_wrap_inline1520 and the corresponding elements of tex2html_wrap_inline1506 . This statistic should be small if the null hypothesis of correct specification is true (since tex2html_wrap_inline1590 in that case), and large if the model is misspecified (since it is not necessarily true that tex2html_wrap_inline1590 if the model is misspecified). The large difference between the two estimates of the covariance matrix for tex2html_wrap_inline1446 suggests that tex2html_wrap_inline1596 and tex2html_wrap_inline1506 are different, and hence that the model is misspecified. However we did not compute the test statistic to see at what level of significance the null would be rejected (i.e. to compute the marginal significance level of the test statistic) since we didn't expect you to know about this particular specification test statistic at this stage of the course.
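The comparison between the two covariance estimates for the regression parameters can be sketched in a few lines. This is a hedged illustration, not the eval.g code: `grad` stands for the N x k matrix of derivatives of the regression function at the estimate and `resid` for the residuals, both hypothetical inputs. The "Hessian-based" estimate is the one that would be valid if the information equality held for this block.

```python
import numpy as np

def covariance_estimates(grad, resid):
    """Return (sandwich, hessian_based) covariance estimates.

    sandwich      : (G'G)^{-1} [sum_i e_i^2 g_i g_i'] (G'G)^{-1}, White's form
    hessian_based : mean(e^2) * (G'G)^{-1}, valid under homoscedasticity
    """
    grad = np.asarray(grad, dtype=float)
    resid = np.asarray(resid, dtype=float)
    gtg_inv = np.linalg.inv(grad.T @ grad)
    meat = (grad * resid[:, None] ** 2).T @ grad
    sandwich = gtg_inv @ meat @ gtg_inv
    hessian_based = np.mean(resid ** 2) * gtg_inv
    return sandwich, hessian_based

if __name__ == "__main__":
    grad = np.column_stack([np.ones(4), np.array([-1.0, 0.0, 1.0, 2.0])])
    # Squared residuals all equal: the two estimates coincide.
    print(covariance_estimates(grad, np.array([1.0, -1.0, 1.0, -1.0])))
    # Squared residuals vary with the regressor: the estimates differ.
    print(covariance_estimates(grad, np.array([0.0, 0.0, 0.0, 2.0])))
```

When every squared residual is the same the two matrices agree exactly, which mirrors the sufficient condition for the information equality discussed in part (a); a large gap between them is informal evidence of misspecification.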

(c)
If the true model were log-linear, i.e. if

displaymath1173

where tex2html_wrap_inline1600 , then it is easy to see that the tex2html_wrap_inline1424 are conditionally lognormally distributed so it is valid to take log transformation and estimate tex2html_wrap_inline1418 by OLS:

displaymath1174

It is not difficult to show that the OLS estimates of the log-linearized model are actually the maximum likelihood estimates of the original lognormal specification (make sure you understand this by writing down the lognormal likelihood function and verifying what we just said is true)! However the error term for the specification of Model I in part (a) is additive and not multiplicative, and so the log transformation is generally not valid. Indeed, there is a positive probability of observing negative realizations of tex2html_wrap_inline1606 , something that has zero probability under the lognormal specification. Thus, to even do OLS one must screen out all tex2html_wrap_inline1608 pairs where tex2html_wrap_inline1606 is negative, something that is generally not a good idea. When one does OLS on this nonrandomly selected subsample, it is not surprising that the estimates are highly biased:

displaymath1175

We can show analytically why the OLS estimates for Model II will be inconsistent when the true model is Model I with additive normal errors rather than multiplicative lognormal errors. After screening out negative y's, the OLS estimator solves:

equation864

As tex2html_wrap_inline1490 , we can show that the right hand side of the above equation converges uniformly to

displaymath1176

In general the tex2html_wrap_inline1440 that minimizes the expression above is not the same as the tex2html_wrap_inline1440 that minimizes

displaymath1177

which is the true tex2html_wrap_inline1564 when the conditional mean function tex2html_wrap_inline1624 is correctly specified. Therefore, we conclude that the probability limits of the tex2html_wrap_inline1454 's from the two different models (Model I and Model II) are not the same.

(d)
Figure 3 below plots the squared residuals tex2html_wrap_inline1628 from the MLE/NLS estimation results in part (a). The figure also plots the results of the following nonlinear regression:

equation872

where as before tex2html_wrap_inline1630 and tex2html_wrap_inline1632 is a conformable tex2html_wrap_inline1634 parameter vector. We see substantial evidence of heteroscedasticity, confirming our earlier visual impression from looking at the data in Figure 1. The estimated conditional variance function looks concave and symmetric about the y-axis, almost like a normal density. Table 3 summarizes the estimation results. According to Table 3, the regression coefficient for the quadratic term is significant and seems to dominate the form of the heteroscedasticity plotted in Figure 3. Figure 4 provides a blow-up, plotting the estimated and true conditional variance functions.

displaymath1178

Some students may have used simple OLS, estimating a specification like

equation896

rather than the exponential specification in equation (7). This is also OK since we weren't specific about what type of tool to use to check for heteroscedasticity. The only disadvantage of the OLS specification in (8) relative to the exponential specification in (7) is that the former doesn't guarantee that tex2html_wrap_inline1642 for all x. However we find that even in the linear specification most of the predicted tex2html_wrap_inline1646 values are indeed positive. Figure 4 compares the predicted values of tex2html_wrap_inline1636 using both specifications, and we can see that the negative predicted values of tex2html_wrap_inline1636 occur at the extreme high and low values of the observed x's.
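The positivity issue is easy to see numerically. The sketch below assumes, hypothetically, that both specifications use the regressors (1, x, x^2) and picks illustrative coefficient values with a negative quadratic term; none of these numbers come from Table 3.

```python
import numpy as np

def design(x):
    """Assumed regressor matrix with columns (1, x, x^2)."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x, x ** 2])

def variance_exponential(x, gamma):
    """Exponential specification: exp(X gamma), positive for every x."""
    return np.exp(design(x) @ np.asarray(gamma, dtype=float))

def variance_linear(x, delta):
    """Linear (OLS) specification: X delta, which can turn negative."""
    return design(x) @ np.asarray(delta, dtype=float)

if __name__ == "__main__":
    x = np.linspace(-5.0, 5.0, 11)
    # Illustrative coefficients only.
    print(variance_exponential(x, [0.0, 0.0, -0.2]))
    print(variance_linear(x, [1.0, 0.0, -0.2]))
```

With these assumed coefficients the linear fit is negative only for large |x|, the same pattern the text reports at the extreme values of the observed x's, while the exponential fit stays positive everywhere.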

(e)
Now we consider full information maximum likelihood (FIML) estimation of Model III, which is the same as model I, but with an exponential specification for conditional heteroscedasticity. Thus the joint density for (y,x) is given by:

equation898

where tex2html_wrap_inline1656 , and tex2html_wrap_inline1384 . We want to consider simultaneous or ``full information maximum likelihood'' (FIML) estimation of the parameter vector tex2html_wrap_inline1660 . The log-likelihood function tex2html_wrap_inline1398 is given by:

eqnarray920

where tex2html_wrap_inline1432 and tex2html_wrap_inline1666 The gradients of tex2html_wrap_inline1398 with respect to tex2html_wrap_inline1670 are:

eqnarray958

The gradients for tex2html_wrap_inline1672 with respect to tex2html_wrap_inline1452 are the same as in part (a). The hessian matrix for tex2html_wrap_inline1422 with respect to tex2html_wrap_inline1670 is given by:

eqnarray990

It is easy to verify (using the law of iterated expectations) that when tex2html_wrap_inline1680 , the expectation of tex2html_wrap_inline1682 , i.e. we have block diagonality between the tex2html_wrap_inline1440 and tex2html_wrap_inline1632 parameters (assuming the model is correctly specified). Similarly one can verify that the tex2html_wrap_inline1688 block of the information matrix tex2html_wrap_inline1690 is zero. This implies that the asymptotic covariance between the maximum likelihood estimates tex2html_wrap_inline1446 and tex2html_wrap_inline1694 is zero, so they are asymptotically independently distributed. This independence suggests the following 2-step procedure to obtain initial consistent estimates of tex2html_wrap_inline1696 : 1) estimate tex2html_wrap_inline1440 by NLS (see attached Gauss code eval_nls.g and shell program nlreg.gpr), 2) use the estimated squared residuals tex2html_wrap_inline1700 to estimate the tex2html_wrap_inline1632 parameters by NLS using the exponential specification in equation (7). We did this using the same eval_nls.g procedure we used for step 1, with a slight modification of nlreg.gpr to substitute tex2html_wrap_inline1704 instead of tex2html_wrap_inline1424 as the dependent variable in the regression.

However we can do even better than this. We can do a 3rd step, weighted NLS or feasible generalized least squares (FGLS) estimation of tex2html_wrap_inline1440 using the estimated conditional variance tex2html_wrap_inline1710 from step 2 as weights. The procedure eval_fgls.g provides the code to do the FGLS estimation. Due to the block diagonality property, it is not hard to show that the FGLS estimates of tex2html_wrap_inline1564 have the same asymptotic distribution as the MLE: i.e. FGLS is asymptotically efficient in this case. To see this, note that the gradient and hessian of tex2html_wrap_inline1714 with respect to tex2html_wrap_inline1440 are the same as the gradient and hessian for the following FGLS criterion function:

equation1018

We know that the block diagonality property implies that, as long as tex2html_wrap_inline1694 is any consistent estimator of tex2html_wrap_inline1720 , a solution tex2html_wrap_inline1446 to the FOC tex2html_wrap_inline1724 is asymptotically efficient. But since this is also the FOC for the FGLS estimator (10), it follows that tex2html_wrap_inline1726 is also an asymptotically efficient estimator, i.e. it attains not only the Chamberlain efficiency bound for conditional moment restrictions, but the Cramer-Rao lower bound as well. It is not hard to show that these two bounds coincide in this case: make sure you understand this by verifying the equality yourself.
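The weighted NLS step can be sketched as Gauss-Newton iterations on residuals and gradients scaled by the inverse estimated standard deviation. This is an assumption-laden Python illustration and not a transcription of eval_fgls.g: the regression function exp(theta0 + theta1 * x), the function names, and the variance values passed in are all hypothetical.

```python
import numpy as np

def f(x, theta):
    """Hypothetical regression function f(x, theta) = exp(theta0 + theta1 * x)."""
    return np.exp(theta[0] + theta[1] * x)

def grad_f(x, theta):
    """N x 2 matrix of derivatives of f with respect to theta."""
    fx = f(x, theta)
    return np.column_stack([fx, fx * x])

def fgls(y, x, sigma2_hat, theta0, iters=50, tol=1e-12):
    """Minimize sum_i (y_i - f(x_i, theta))^2 / sigma2_hat_i by Gauss-Newton.

    sigma2_hat holds the estimated conditional variances from the previous
    step; they are treated as fixed weights here.
    """
    w = 1.0 / np.sqrt(np.asarray(sigma2_hat, dtype=float))
    theta = np.asarray(theta0, dtype=float)
    for _ in range(iters):
        g = grad_f(x, theta) * w[:, None]   # weighted gradient rows
        r = (y - f(x, theta)) * w           # weighted residuals
        step = np.linalg.lstsq(g, r, rcond=None)[0]
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 200)
    sigma2 = np.exp(-x ** 2)                # assumed variance function
    y = f(x, np.array([0.5, -0.3])) + np.sqrt(sigma2) * rng.standard_normal(x.size)
    print(fgls(y, x, sigma2, [0.4, -0.2]))
```

Because the weights are held fixed, the first-order condition solved here has the same form as the FOC for the regression parameters in the full likelihood with the variance parameters replaced by their estimates, which is the link the efficiency argument above exploits.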

It is not apparent that the tex2html_wrap_inline1694 obtained from the 4th step of our estimation procedure which regresses the squared residuals from the FGLS estimation in step 3 on tex2html_wrap_inline1730 will be asymptotically efficient since the first order condition for tex2html_wrap_inline1632 from maximizing tex2html_wrap_inline1734 with respect to tex2html_wrap_inline1632 does not appear to be the same as the FOC for tex2html_wrap_inline1632 from the nonlinear regression in step 4 of our suggested estimation procedure. So to get fully efficient estimates, we can use tex2html_wrap_inline1740 as starting values for direct FIML estimation of the full parameter vector tex2html_wrap_inline1696 using tex2html_wrap_inline1744 . The procedure eval_fiml.g and the shell program mle.gpr implement full maximum likelihood estimation of Model III (note we have also provided the procedure hesschk.g to allow you to compare numerical and analytically calculated values of the hessian matrix, verifying that the analytic formulas for the hessian matrix given above are correct). This full model is rather delicate and we were unable to get it to converge starting from tex2html_wrap_inline1746 . However we had no problems with convergence starting from tex2html_wrap_inline1740 . Table 4 below compares the FGLS and MLE estimates.

displaymath1179

We see that the FIML and FGLS estimates of tex2html_wrap_inline1440 are very close to each other and the standard errors are nearly identical, as we would expect from the theoretical result shown above that the FGLS estimator of tex2html_wrap_inline1440 is asymptotically efficient. There are more significant differences in the FIML and FGLS estimates of tex2html_wrap_inline1632 . In particular the standard errors of the FGLS estimates are significantly larger than the FIML estimates of tex2html_wrap_inline1632 , which suggests that the FGLS estimates are not asymptotically efficient. Students should be able to verify that this is the case by deriving analytic formulas for the asymptotic covariance matrix for the MLE and FGLS estimators of tex2html_wrap_inline1632 .





econ551
Wed Apr 7 13:19:23 EDT 1999