next up previous
Next: About this document

Solutions to Problem Set 0

Economics 551, Yale University

Woocheol Kim and John Rust

Question 1 We are asked to compute maximum likelihood estimates of the parameter vector tex2html_wrap_inline671 in the model given by:

equation14

where we are instructed to believe that tex2html_wrap_inline673 are IID draws from tex2html_wrap_inline675 and tex2html_wrap_inline677 . Actually you were mislead: the errors tex2html_wrap_inline679 are heteroscedastic, so conditional on tex2html_wrap_inline681 we have tex2html_wrap_inline683 where tex2html_wrap_inline685 . So you will be estimating a misspecified model, and later in Econ 551 we will discuss test statistics which are capable of detecting this misspecification. In the meantime your job is to calculate the MLE's of the parameters, tex2html_wrap_inline687 . The first step is to write down the likelihood function tex2html_wrap_inline689 for the data, tex2html_wrap_inline691 . In general ``brute force'' maximization of tex2html_wrap_inline689 may not be a good idea: it might be better to try a ``divide and conquer'' strategy. Note that the the joint density of (y,x), tex2html_wrap_inline697 , is a product of a conditional likelihood of y given x, tex2html_wrap_inline703 , times the marginal density of x, tex2html_wrap_inline707 . It is easy to see that this factorization or separability in the joint likelihood enables us to compute the MLEs for the tex2html_wrap_inline709 parameters and the tex2html_wrap_inline711 parameters independently. It also implies a block diagonality property which enable us to show that the asymptotic distributions of these parameters are independent.

eqnarray42

where tex2html_wrap_inline713 denotes the conditional likelihood of the tex2html_wrap_inline715 's given the tex2html_wrap_inline717 's, tex2html_wrap_inline719 denotes the marginal likelihood of the tex2html_wrap_inline717 's, tex2html_wrap_inline723 and tex2html_wrap_inline725 . The separability of parameters in the third and forth equation allows us to break the estimation problem into two subproblems, which ordinarily makes the programming considerably easier and computations considerably faster:

(1)
calculate the MLE's for tex2html_wrap_inline727 from the conditional likelihood tex2html_wrap_inline729 :

equation89

with FOC's

eqnarray102

Note that there is further separability in this first subproblem: the FOC for tex2html_wrap_inline731 is the same as the FOC for nonlinear least squares (NLS) estimation of tex2html_wrap_inline731 in equation (2) above, ignoring tex2html_wrap_inline735 since it doesn't affect the solution for tex2html_wrap_inline737 . Once we have computed the NLS estimate of tex2html_wrap_inline737 , we use the second equation to compute tex2html_wrap_inline741 as the sample variance of the estimated NLS residuals.

(2)
calculate the MLE's for tex2html_wrap_inline743 from the marginal likelihood

equation122

with FOC's:

eqnarray133

respectively. Using attached Gauss code nlreg.gpr for computing NLS estimates (the sum of squared errors, derivatives and hessian are coded in the eval_nls.g procedure) we are able to numerically solve for a vector tex2html_wrap_inline745 that sets the FOC for tex2html_wrap_inline731 given in equation (2) above to zero. There are closed-form expressions for the MLEs for the remaining parameters:

eqnarray155

Figures 1 and 2 below plot the true and estimated regression functions for this problem. Figure 1 plots the data points also: we see that both the true and estimated regression functions are generally quite close to each other and both go through the middle of the ``data cloud''. However we see that the MLE gives a substantially downward biased estimate of tex2html_wrap_inline751 and this causes the estimated regression function to make big divergences from the true regression function at extreme high and low values of x, say |x| > 4. However since there are very few high or low values of x in the sample, the NLS and MLE are not able to ``penalize'' this divergence: the MLE sets tex2html_wrap_inline757 since it helps fit the data around x=0 where most of the data points are. Figure 2 provides a blow up of Figure 1 without the data points to show you how the estimated regression function diverges from the truth near x=0. Overall, despite the misspecification of the heteroscedasticity, the MLE and NLS estimators seem to do a pretty good job of uncovering the true regression function, at least for those x's where we have sufficient data. Note that the data plot indicates heteroscedasticity, since the variance of the data around the regression function is bigger in the middle of the graph (near x=0) than at large positive or negative values of x.

Since the true model used to generate data1.asc has heteroscedastic and not homoscedastic error terms tex2html_wrap_inline769 as assumed here, it is easy to show that the MLE tex2html_wrap_inline741 of the misspecified model converges to the expectation of tex2html_wrap_inline685 . We can calculate tex2html_wrap_inline775 as follows:

eqnarray212

We leave it to you to show that even though the model is misspecified, the MLE tex2html_wrap_inline741 converges to tex2html_wrap_inline775 as tex2html_wrap_inline781 . Indeed in this case we find that tex2html_wrap_inline783 , which happens to be almost exactly equal to the ``true'' value.

Recall that the asymptotic distribution of the standardized MLE estimator is given by:

equation225

where tex2html_wrap_inline785 is the Hessian and tex2html_wrap_inline787 is the information matrix (both evaluated at tex2html_wrap_inline789 ). However the likelihood is misspecified in this case (due to heteroscedasticity), and it is easy to verify that the equality of tex2html_wrap_inline791 does not hold, so the correct asymptotic covariance matrix is given by the White ``misspecification consistent'' formula in equation (4) rather than by the inverse of the information matrix tex2html_wrap_inline793 For example, consider the tex2html_wrap_inline795 block of tex2html_wrap_inline797 , or tex2html_wrap_inline799 :

displaymath644

We see that a sufficient condition for the two expressions above to equal each other is tex2html_wrap_inline801 for all x, i.e. for the model to be homoscedastic as you were asked to assume. The failure of this equality can be a basis for a specification test statistic that can detect model misspecification which we will discuss in more detail below. Note that even despite the misspecification, the separability property implies that the Hessian is a block diagonal matrix:

displaymath645

displaymath646

where tex2html_wrap_inline805 . It is also easy to verify that despite the misspecification of the heteroscedasicity, the information matrix tex2html_wrap_inline797 is still block diagonal. Block diagonality of tex2html_wrap_inline797 and tex2html_wrap_inline811 implies that the covariance matrix of tex2html_wrap_inline813 is block diagonal. One can see further block diagonality between tex2html_wrap_inline731 and tex2html_wrap_inline735 and between tex2html_wrap_inline819 and tex2html_wrap_inline821 . This block diagonality is not just a consequence of the separability in both the marginal and conditional likelihood in the parameters describing the mean or conditional mean ( tex2html_wrap_inline819 and tex2html_wrap_inline731 , respectively) and the variance parameters ( tex2html_wrap_inline821 and tex2html_wrap_inline735 , respectively). You should verify through direct calculation that this block diagonality is a result of the symmetry of the normal distribution, which implies that tex2html_wrap_inline831 , where the conditional distribution of tex2html_wrap_inline833 given X is tex2html_wrap_inline837 , and similarly, the distribution of tex2html_wrap_inline839 , which implies that tex2html_wrap_inline841 for any positive odd integer k. If the normal distribution were not symmetric the block diagonality property wouldn't hold.

Using the block diagonality property, it is easy to compute the standard errors of the full parameter vector, tex2html_wrap_inline845 . The covariance matrix of tex2html_wrap_inline737 is given by 1/N times the upper tex2html_wrap_inline795 block of tex2html_wrap_inline853 and it is easy to verify that this is the same as the covariance matrix for the nonlinear least squares estimator for tex2html_wrap_inline855 (which is also numerically identical to the MLE) that is output from the nlreg.gpr program. By working with equation (4), you can show that the estimated variance of tex2html_wrap_inline735 is given by tex2html_wrap_inline859 where tex2html_wrap_inline861 is the sample analog of the fourth central moment of tex2html_wrap_inline863 :

displaymath647

There is a similar formula for the estimated variance of tex2html_wrap_inline865 . We have tex2html_wrap_inline867 , so the estimated standard error of tex2html_wrap_inline865 is 0.036. The estimated standard error of tex2html_wrap_inline741 is tex2html_wrap_inline875 .

Question 2. If the model is correctly specified, we know that the information equality will hold which implies that:

equation288

We compare White's misspecification consistent estimate to the inverse of information:

displaymath648

displaymath649

where all the estimates are given from the results of eval.g. Following White (``Maximum Likelihood Estimation of Misspecified Models'' Econometrica 1982) we can construct a formal hypothesis test statistic using the difference between the estimates of the upper diagonal elements of tex2html_wrap_inline811 and the corresponding elements of tex2html_wrap_inline797 . This statistic should be small if the null hypothesis of correct specification is true (since tex2html_wrap_inline881 in that case), and large if the model is misspecified (since it is not necessarily true that tex2html_wrap_inline881 is the model is misspecified). The large difference in the two difference estimates of the covariance matrix for tex2html_wrap_inline737 suggests that tex2html_wrap_inline887 and tex2html_wrap_inline797 are different, and hence that the model is misspecified. However we did not actually compute the actual test statistic to see at what level of significance null would actually be rejected (i.e. to compute the marginal signficance level of the test statistic) since we didn't expect you to know about this particular specification test statistic at this stage of the course.

Question 3. If the true model were log-linear, i.e. if

displaymath650

where tex2html_wrap_inline891 , then it is easy to see that the tex2html_wrap_inline715 are conditionally lognormally distributed so it is valid to take log transformation and estimate tex2html_wrap_inline709 by OLS:

displaymath651

. It would not difficult to show that the OLS estimates of the log-linearized model are actually the maximum likelihood estimates of the original lognormal specification (make sure you understand this by writing down the lognormal likelihood function and verifying what we just said is true)! However the error term for the specification of Model I in question 1 is additive and not multiplicative, and so the log transformation is generally not valid. Indeed, there is a positive probability of observing negative realizations of tex2html_wrap_inline897 , something that has zero probability under the lognormal specification. Thus, to even do OLS one must screen out all tex2html_wrap_inline899 pairs where tex2html_wrap_inline897 is negative, something that is generally not a good idea. When one does OLS on this nonrandomly selected subsample, it is not surprising that the estimates are highly biased:

displaymath652

We can show analytically why the OLS estimates for Model II will be inconsistent when the true model is Model I with additive normal errors rather than multiplicative lognormal errors. After screening out negative y's, the OLS estimator solves:

equation343

As tex2html_wrap_inline781 , we can show that the right hand side of the above equation converges uniformly to

displaymath653

In general the tex2html_wrap_inline731 that minimizes the expression above is not the same as the tex2html_wrap_inline731 that minimizes

displaymath654

which is the true tex2html_wrap_inline855 when the conditional mean function tex2html_wrap_inline915 is correctly specified. Therefore, we conclude that the probability limits for tex2html_wrap_inline745 's from the two different models( Model I and Model II) are not the same.\

Question 4 Figure 3 below plots the squared residuals tex2html_wrap_inline919 from the MLE/NLS estimation results in Question 1. The figure also plots the results of the following nonlinear regression:

equation352

where as before tex2html_wrap_inline921 and tex2html_wrap_inline923 is a conformable tex2html_wrap_inline925 parameter vector. We see substantial evidence of heteroscedasticity, confirming our earlier visual impression in looking at the data in figure 1. The estimated conditional variance function looks concave and symmetric w.r.t. y-axis, almost like a normal density. Table 3 summarizes the estimation results. According to table 3, the regression coefficient for the quadratic term is significant and seems to dominate the form of the heteroscedasticity plotted in figure 3. Figure 4 does a blow up, plotting the estimated and true conditional variance functions.

displaymath655

Some students may have used simple OLS, estimating a specification like

equation376

rather than the exponential specification in equation (7). This is also OK since we weren't specific about what type of tool to use to check for heteroscedasticity. The only disadvantage of the OLS specification in (8) over the exponential specification in (7) is that the latter doesn't guarantee that tex2html_wrap_inline933 for all x. However we find that even in the linear specification most of the predicted tex2html_wrap_inline937 values are indeed positive. Figure 4 compares the predicted values of tex2html_wrap_inline927 using both specifications, and we can see that the negative predicted values of tex2html_wrap_inline927 occur at the extreme high and low values of the observed x's.

Question 5. Now we consider full information maximum likelihood (FIML) estimation of Model III, which is the same as model I, but with an exponential specification for conditional heteroscedasticity. Thus the joint density for (y,x) is given by:

equation379

where tex2html_wrap_inline947 , and tex2html_wrap_inline675 . We want to consider simultaneous or ``full information maximum likelihood'' (FIML) estimation of the parameter vector tex2html_wrap_inline951 . The log-likelihood function tex2html_wrap_inline689 is given by:

eqnarray401

where tex2html_wrap_inline723 and tex2html_wrap_inline957 The gradients of tex2html_wrap_inline689 with respect to tex2html_wrap_inline961 are:

eqnarray439

The gradients for tex2html_wrap_inline963 with respect to tex2html_wrap_inline743 are the same as in Question 1. The hessian matrix for tex2html_wrap_inline713 with respect to tex2html_wrap_inline961 is given by:

eqnarray471

It is easy to verify (using the law of iterated expectations) that when tex2html_wrap_inline971 , the expectation of tex2html_wrap_inline973 , i.e. we have block diagonality between the tex2html_wrap_inline731 and tex2html_wrap_inline923 parameters (assuming the model is correctly specified). Similarly one can verify that the tex2html_wrap_inline979 block of the information matrix tex2html_wrap_inline981 is zero. This implies that the asymptotic covariance between the maximum likelihood estimates tex2html_wrap_inline737 and tex2html_wrap_inline985 is zero, so they are asymptotically independently distributed. This independence suggests the following 2-step procedure to obtain initial consistent estimates of tex2html_wrap_inline987 : 1) estimate tex2html_wrap_inline731 by NLS (see attached Gauss code eval_nls.g and shell program nlreg.gpr ), 2) use the estimated squared residuals tex2html_wrap_inline991 to estimate the tex2html_wrap_inline923 parameters by NLS using the exponential specification in equation (7). We did this using the same eval_nls.g procedure we used for step 1, with a slight modification of nlreg.gpr to substitute tex2html_wrap_inline995 instead of tex2html_wrap_inline715 as the dependent variable in the regression.

However we can do even better than this. We can do a 3rd step, weighted NLS or feasible generalized least squares (FGLS) estimation of tex2html_wrap_inline731 using the estimated conditional variance tex2html_wrap_inline1001 from step 2 as weights. The procedure eval_fgls.g provides the code to do the FGLS estimation. Due to the block diagonality property, it is not hard to show that the FGLS estimates of tex2html_wrap_inline855 have the same asymptotic distribution as the MLE: i.e. FGLS is asymptotically efficient in this case. To see this, note that the gradient and hessian of tex2html_wrap_inline1005 with respect to tex2html_wrap_inline731 is the same as the gradient and hessian for the following FGLS criterion function:

equation499

We know that the block diagonality property implies that as long as tex2html_wrap_inline985 is any consistent estimator of tex2html_wrap_inline1011 that a solution tex2html_wrap_inline737 to the FOC tex2html_wrap_inline1015 is asymptotically efficient. But since this is also the FOC for the FGLS estimator (10), it follows that tex2html_wrap_inline1017 is also an asymptotically efficient estimator, i.e. it attains not only the Chamberlain efficiency bound for condition moment restrictions, but the Cramer-Rao lower bound as well. It is not hard to show that these two bounds coincide in this case: make sure you understand this by verifying the equality yourself.

It is not apparent that the tex2html_wrap_inline985 obtained from the 4th step of our estimation procedure which regresses the squared residuals from the FGLS estimation in step 3 on tex2html_wrap_inline1021 will be asymptotically efficient since the first order condition for tex2html_wrap_inline923 from maximizing tex2html_wrap_inline1025 with respect to tex2html_wrap_inline923 does not appear to be the same as the FOC for tex2html_wrap_inline923 from the nonlinear regression in step 4 of our suggested estimation procedure. So to get fully efficient estimates, we can use tex2html_wrap_inline1031 as starting values for direct FIML estimation of the full parameter vector tex2html_wrap_inline987 using tex2html_wrap_inline1035 . The procedure eval_fiml.g and the shell program mle.gpr implement full maximum likelihood estimation of Model III (note we have also provided the procedure hesschk.g to allow you to compare numerical and analytically calculated values of the hessian matrix, verifying that the analytic formulas for the hessian matrix given above are correct). This full model is rather delicate and we were unable to get it to converge starting from tex2html_wrap_inline1037 . However we had no problems with convergence starting from tex2html_wrap_inline1031 . Table 4 below compares the FGLS and MLE estimates.

displaymath656

We see that the FIML and FGLS estimates of tex2html_wrap_inline731 are very close to each other and the standard errors are nearly identical, as we would expect from the theoretical result shown above that the FGLS estimator of tex2html_wrap_inline731 is asymptotically efficient. There are more significant differences in the FIML and FGLS estimates of tex2html_wrap_inline923 . In particular the standard errors of the FGLS estimates are significantly larger than the FIML estimates of tex2html_wrap_inline923 , which suggests that the FGLS estimates are not asymptotically efficient. Students should be able to verify that this is the case by deriving analytic formulas for the asymptotic covariance matrix for the MLE and FGLS estimators of tex2html_wrap_inline923 .

Question 6 The Nadaraya-Watson estimator is used to provide a nonparametric estimation of tex2html_wrap_inline1051

equation525

where tex2html_wrap_inline1053 and tex2html_wrap_inline1055 is defined to be a Gaussian density function. For the choice of a bandwidth parameter, tex2html_wrap_inline1057 a rule of thumb is used: tex2html_wrap_inline1059 with tex2html_wrap_inline1061 The Gauss program that computed these estimates is kernel.gpr. You should experiment with different bandwidths, showing that when h is much less than the automatically chosen value of h=.34 the fitted regression tends to be too wiggly, and too smooth when h is substantially greater than h=0.34, so in this case the automatically chosen bandwidth seems like the best compromise. There are more sophisticated ways to choose bandwidths such as ``least squares cross validation'' described in class. The simple rule given above does almost as well and is much simpler to compute. The kernel.gpr program also includes two types of polynomial series estimation, ordinary polynomials and nonlinear least squares using the exponential specification considered in problem 4. Figure 5 and Figure 6 compare the different estimators for the data in data1.asc and data2.asc, respectively. We can see that all of the estimators give similar answers in the region of the x-space where most of the data points lie, i.e. around x=0.

Questions 7 and 8 To explore the form of heteroscedasticity, we repeat the procedures described above, but using the squared residuals from Question 6 as dependent variable. Figure 7, which plots true and estimated conditional variance functions tex2html_wrap_inline1075 , shows that similar to the results on estimation of the conditional expectation in part 6, all of the different parametric and nonparametric estimation methods give similar answers in the region of the x-space where most of the data points lie, i.e. around x=0. Note particularly that the parametric exponential specification of tex2html_wrap_inline927 and the nonparametric kernel density estimate of tex2html_wrap_inline927 are quite close to each other except at extreme values of x. The same story also emerges in Figure 8, which plots the estimated squared residuals from the first stage NLS estimates of the exponential specification of the conditional expectation function. We didn't plot estimates of tex2html_wrap_inline1075 from the exponential specification, tex2html_wrap_inline1089 since we were unable to get the nlreg.gpr program to converge, even when we started the estimation from the true values of tex2html_wrap_inline1011 . This problem is probably a result of the large values chosen for the tex2html_wrap_inline1011 components, tex2html_wrap_inline1095 , especially under the polynomial specification at extreme values of x where the computer runs into underflow or overflow problems. With some ``playing around'' one might be able to coax nlreg.gpr into convergence, but here is a case where the parameter estimates from the exponential specification seems rather fragile and non-robust.




next up previous
Next: About this document

John Rust
Sun Apr 12 18:46:07 CDT 1998