
Econ 551: Lecture Note 8
Increasing Efficiency of Maximum Likelihood
by Imposing Nonlinear Restrictions

1. Background: This note shows how we can increase the efficiency of maximum likelihood estimation (or any type of linear or nonlinear estimation, for that matter) by imposing extra a priori information about the problem. This prior information comes in two forms:

1.
The prior information embodied in the specification of the likelihood function, i.e. the parametric density $f(x|\theta)$ for the observable $x$'s.

2.
Additional prior information embodied in linear or nonlinear restrictions on the parameter vector $\theta$. We can write these restrictions in the general form:

$$ g(\theta) = 0 \qquad (1) $$

where $\theta$ is a $K \times 1$ vector and $g(\theta)$ is a $J \times 1$ vector: i.e. we assume we have $J$ linear or nonlinear restrictions on the $K$ parameters of the model. In general $J$ is less than or equal to $K$; a concrete example is sketched below.
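
For concreteness, here is a minimal sketch (my own illustration, not part of the original note) of one such restriction for the case $K = 2$, $J = 1$: the hypothetical nonlinear restriction $g(\theta) = \theta_1 \theta_2 - 1 = 0$, together with its $J \times K$ Jacobian, which reappears as the matrix $G(\theta)$ in the Lemma below.

    import numpy as np

    # Hypothetical example: K = 2 parameters, J = 1 nonlinear restriction
    # g(theta) = theta_1 * theta_2 - 1 = 0.

    def g(theta):
        """J x 1 vector of restrictions evaluated at theta."""
        return np.array([theta[0] * theta[1] - 1.0])

    def G(theta):
        """J x K Jacobian of g, i.e. G(theta) = dg/dtheta'."""
        return np.array([[theta[1], theta[0]]])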

2. Asymptotic Considerations: Note that imposing the extra prior information $g(\theta) = 0$ is not necessary to obtain consistency of the maximum likelihood estimator. Assuming that the density $f(x|\theta)$ is correctly specified, the information inequality implies that the expected log-likelihood is maximized at the true parameter value $\theta^*$:

$$ \theta^* = \arg\max_{\theta \in \Theta} E\{\log f(x|\theta)\} \qquad (2) $$

If the additional restrictions $g(\theta) = 0$ are also correctly specified, i.e. $g(\theta^*) = 0$, then it is easy to see that asymptotically the constraint $g(\theta) = 0$ is non-binding:

$$ \theta^* = \arg\max_{\{\theta \in \Theta :\, g(\theta) = 0\}} E\{\log f(x|\theta)\} \qquad (3) $$

However in finite samples the restriction $g(\theta) = 0$ will generally be binding, i.e. one can generally attain a higher value of the log-likelihood at the unrestricted maximum likelihood estimator $\hat{\theta}_u$ defined by:

$$ \hat{\theta}_u = \arg\max_{\theta \in \Theta} \frac{1}{N} \sum_{i=1}^N \log f(x_i|\theta) \qquad (4) $$

than at the restricted maximum likelihood estimator $\hat{\theta}_r$ defined by:

$$ \hat{\theta}_r = \arg\max_{\{\theta \in \Theta :\, g(\theta) = 0\}} \frac{1}{N} \sum_{i=1}^N \log f(x_i|\theta) \qquad (5) $$
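
This finite-sample ranking is easy to see numerically. The sketch below (again my own illustration, assuming the hypothetical model $x_i \sim N(\theta, I_2)$ with true value $\theta^* = (1,1)$, which satisfies the restriction $\theta_1 \theta_2 = 1$) computes both estimators with scipy and confirms that the unrestricted maximized log-likelihood is weakly higher:

    import numpy as np
    from scipy.optimize import minimize

    # Simulate x_i ~ N(theta*, I_2), so log f(x|theta) = -0.5*||x - theta||^2 + const.
    rng = np.random.default_rng(0)
    theta_star = np.array([1.0, 1.0])
    x = rng.normal(theta_star, 1.0, size=(500, 2))

    def neg_loglik(theta):
        # Negative of (1/N) sum_i log f(x_i|theta), dropping constants
        return 0.5 * np.mean(np.sum((x - theta) ** 2, axis=1))

    # Unrestricted MLE (equation (4)); here it is just the sample mean.
    u = minimize(neg_loglik, x0=np.zeros(2))

    # Restricted MLE (equation (5)): impose g(theta) = 0 as an equality constraint.
    cons = {"type": "eq", "fun": lambda th: th[0] * th[1] - 1.0}
    r = minimize(neg_loglik, x0=np.ones(2), constraints=[cons], method="SLSQP")

    # In finite samples the constraint binds: -u.fun >= -r.fun.
    print("unrestricted loglik:", -u.fun, "restricted loglik:", -r.fun)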

Lemma: Under the standard regularity conditions for maximum likelihood (see e.g. White, 1982), and assuming the likelihood $f(x|\theta)$ is correctly specified, we have:

$$ \sqrt{N}(\hat{\theta}_u - \theta^*) \Longrightarrow N\left(0, I(\theta^*)^{-1}\right) \qquad (6) $$

$$ \sqrt{N}(\hat{\theta}_r - \theta^*) \Longrightarrow N\left(0, \Omega\right) \qquad (7) $$

where $I(\theta^*)$ is the $K \times K$ information matrix given by:

$$ I(\theta^*) = E\left\{ \frac{\partial \log f}{\partial \theta}(x|\theta^*) \, \frac{\partial \log f}{\partial \theta}(x|\theta^*)' \right\} \qquad (8) $$

$\Omega$ is the $K \times K$ covariance matrix given by:

$$ \Omega = I(\theta^*)^{-1} - I(\theta^*)^{-1} G(\theta^*)' \left[ G(\theta^*) I(\theta^*)^{-1} G(\theta^*)' \right]^{-1} G(\theta^*) I(\theta^*)^{-1} \qquad (9) $$

and $G(\theta)$ is the $J \times K$ matrix of partial derivatives of $g(\theta)$ with respect to $\theta$:

$$ G(\theta) = \frac{\partial g}{\partial \theta'}(\theta) \qquad (10) $$
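
Before turning to the proof, here is a quick numerical check (with illustrative values of my own for $I(\theta^*)$ and $G(\theta^*)$) that the $\Omega$ of equation (9) is indeed a reduction of $I(\theta^*)^{-1}$, i.e. that $I(\theta^*)^{-1} - \Omega$ is positive semidefinite:

    import numpy as np

    # Illustrative positive definite K x K "information matrix" and J x K Jacobian.
    K, J = 3, 1
    rng = np.random.default_rng(1)
    A = rng.normal(size=(K, K))
    I_mat = A @ A.T + K * np.eye(K)
    G_mat = rng.normal(size=(J, K))

    # Omega from equation (9).
    I_inv = np.linalg.inv(I_mat)
    middle = np.linalg.inv(G_mat @ I_inv @ G_mat.T)
    Omega = I_inv - I_inv @ G_mat.T @ middle @ G_mat @ I_inv

    # I^{-1} - Omega is positive semidefinite, i.e. Omega <= I^{-1}.
    print(np.linalg.eigvalsh(I_inv - Omega))  # all eigenvalues >= 0 (up to rounding)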

Proof: We have already derived the asymptotic distribution of the unrestricted ML estimator in Econ 551, so we restrict attention to the asymptotic distribution of the restricted ML estimator $\hat{\theta}_r$. Note that we can convert the restricted ML problem in equation (5) to an unrestricted optimization problem by introducing a $J \times 1$ vector of Lagrange multipliers $\lambda$ and forming the Lagrangian function:

$$ \mathcal{L}(\theta, \lambda) = \frac{1}{N} \sum_{i=1}^N \log f(x_i|\theta) + \lambda' g(\theta) \qquad (11) $$

The first order or Kuhn-Tucker conditions for this constrained optimization problem are given by:

$$ 0 = \frac{1}{N} \sum_{i=1}^N \frac{\partial \log f}{\partial \theta}(x_i|\hat{\theta}_r) + G(\hat{\theta}_r)' \hat{\lambda} \qquad (12) $$

$$ 0 = g(\hat{\theta}_r) \qquad (13) $$
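
Continuing the numerical sketch following equation (5), condition (12) can be used to back out the multiplier at the restricted optimum: the score at $\hat{\theta}_r$ must equal $-G(\hat{\theta}_r)'\hat{\lambda}$, so $\hat{\lambda}$ solves a small linear system (the variables x and r below refer to that earlier sketch):

    # Score of the sample log-likelihood at the restricted MLE; for the
    # hypothetical normal model, dlog f(x|theta)/dtheta = x - theta.
    theta_r = r.x
    score = np.mean(x - theta_r, axis=0)
    G_hat = np.array([[theta_r[1], theta_r[0]]])

    # Solve G' lambda = -score by least squares (exact here since J <= K).
    lam, *_ = np.linalg.lstsq(G_hat.T, -score, rcond=None)
    print("lambda_hat:", lam)  # close to zero when the restriction is true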

As is standard practice for determining the asymptotic distribution of nonlinear estimators, we ``linearize'' the problem by doing a first-order Taylor-series expansion of the Kuhn-Tucker conditions about the limiting values of $(\hat{\theta}_r, \hat{\lambda})$ as $N \to \infty$. Since we assume the density $f(x|\theta)$ is the true data generating process when $\theta = \theta^*$, then under standard regularity conditions we can establish the uniform convergence of $\frac{1}{N} \sum_{i=1}^N \log f(x_i|\theta)$ to $E\{\log f(x|\theta)\}$, which implies that $\hat{\theta}_r \to \theta^*$ with probability 1. If we let $\lambda^*$ denote the Lagrange multiplier for the limiting version of equation (11) as $N \to \infty$, we can see from the discussion of equations (2) and (3) that the constraint $g(\theta) = 0$ is non-binding asymptotically, which implies that $\lambda^* = 0$. So doing a Taylor expansion of equations (12) and (13) about $(\theta^*, \lambda^*) = (\theta^*, 0)$ we get:

$$ 0 = \frac{1}{N} \sum_{i=1}^N \frac{\partial \log f}{\partial \theta}(x_i|\theta^*) + \left[ \frac{1}{N} \sum_{i=1}^N \frac{\partial^2 \log f}{\partial \theta \, \partial \theta'}(x_i|\tilde{\theta}) \right] (\hat{\theta}_r - \theta^*) + G(\tilde{\theta})' \hat{\lambda} \qquad (14) $$

$$ 0 = g(\theta^*) + G(\bar{\theta})(\hat{\theta}_r - \theta^*) \qquad (15) $$

where $\tilde{\theta}$ and $\bar{\theta}$ are points on the line segment joining $\hat{\theta}_r$ and $\theta^*$. Multiplying through by $\sqrt{N}$ and using $g(\theta^*) = 0$, we can write these equations in matrix form as:

$$ \begin{pmatrix} -\frac{1}{N} \sum_{i=1}^N \frac{\partial^2 \log f}{\partial \theta \, \partial \theta'}(x_i|\tilde{\theta}) & -G(\tilde{\theta})' \\ G(\bar{\theta}) & 0 \end{pmatrix} \begin{pmatrix} \sqrt{N}(\hat{\theta}_r - \theta^*) \\ \sqrt{N}\hat{\lambda} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\partial \log f}{\partial \theta}(x_i|\theta^*) \\ 0 \end{pmatrix} \qquad (16) $$

Assuming the first matrix in equation (16) is invertible, we can solve to get:

$$ \begin{pmatrix} \sqrt{N}(\hat{\theta}_r - \theta^*) \\ \sqrt{N}\hat{\lambda} \end{pmatrix} = \begin{pmatrix} -\frac{1}{N} \sum_{i=1}^N \frac{\partial^2 \log f}{\partial \theta \, \partial \theta'}(x_i|\tilde{\theta}) & -G(\tilde{\theta})' \\ G(\bar{\theta}) & 0 \end{pmatrix}^{-1} \begin{pmatrix} \frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\partial \log f}{\partial \theta}(x_i|\theta^*) \\ 0 \end{pmatrix} \qquad (17) $$

The Central Limit Theorem implies that $\frac{1}{\sqrt{N}} \sum_{i=1}^N \frac{\partial \log f}{\partial \theta}(x_i|\theta^*) \Longrightarrow N(0, I(\theta^*))$. The uniform law of large numbers, together with the information matrix equality $I(\theta^*) = -E\{\partial^2 \log f(x|\theta^*)/\partial \theta \, \partial \theta'\}$, implies that with probability 1 we have:

$$ \lim_{N \to \infty} \begin{pmatrix} -\frac{1}{N} \sum_{i=1}^N \frac{\partial^2 \log f}{\partial \theta \, \partial \theta'}(x_i|\tilde{\theta}) & -G(\tilde{\theta})' \\ G(\bar{\theta}) & 0 \end{pmatrix} = \begin{pmatrix} I(\theta^*) & -G(\theta^*)' \\ G(\theta^*) & 0 \end{pmatrix} \qquad (18) $$

Furthermore it is straightforward to verify that the inverse of the matrix on the right hand side of (18) is given by

$$ \begin{pmatrix} I(\theta^*) & -G(\theta^*)' \\ G(\theta^*) & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \Omega & \Sigma' \\ -\Sigma & \Lambda \end{pmatrix} \qquad (19) $$

where $\Omega$ is given in equation (9), and $\Sigma$ and $\Lambda$ are given by:

$$ \Sigma = \left[ G(\theta^*) I(\theta^*)^{-1} G(\theta^*)' \right]^{-1} G(\theta^*) I(\theta^*)^{-1} \qquad (20) $$

$$ \Lambda = \left[ G(\theta^*) I(\theta^*)^{-1} G(\theta^*)' \right]^{-1} \qquad (21) $$
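
As a sanity check on equations (19)-(21), the following sketch (reusing the illustrative I_mat, G_mat, I_inv, middle and Omega from the covariance sketch above) builds both sides of (19) and verifies the partitioned inverse numerically:

    # Sigma and Lambda from equations (20) and (21).
    Sigma = middle @ G_mat @ I_inv
    Lam = middle

    # Left hand side of (19): the bordered information matrix.
    M = np.vstack([np.hstack([I_mat, -G_mat.T]),
                   np.hstack([G_mat, np.zeros((J, J))])])

    # Right hand side of (19): the claimed partitioned inverse.
    M_inv = np.vstack([np.hstack([Omega, Sigma.T]),
                       np.hstack([-Sigma, Lam])])

    print(np.allclose(M @ M_inv, np.eye(K + J)))  # True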

Using this formula, it is then straightforward to verify that $\sqrt{N}(\hat{\theta}_r - \theta^*) \Longrightarrow N(0, \Omega)$ as claimed in equation (7). It is obvious from equation (9) that $\Omega \le I(\theta^*)^{-1}$, since the matrix subtracted from $I(\theta^*)^{-1}$ in (9) is positive semidefinite: i.e. the addition of extra a priori information reduces the asymptotic covariance matrix of $\hat{\theta}_r$. This is just another way of saying that the restricted maximum likelihood estimator $\hat{\theta}_r$ is more efficient than the unrestricted maximum likelihood estimator $\hat{\theta}_u$.

Exercise: What is the asymptotic distribution of $\sqrt{N}\hat{\lambda}$?





John Rust
Mon Apr 21 15:55:28 CDT 1997