
Econ 551: Lecture Notes
Endogenous Regressors and Instrumental Variables

0. Introduction

These notes introduce students to the problem of endogeneity in linear models and to the method of instrumental variables, which under certain circumstances allows consistent estimation of the structural coefficients of the endogenous regressors in the linear model. Sections 1 and 2 review the linear model and the method of ordinary least squares (OLS) in the abstract ($L^2$) setting and the concrete ($\mathbb{R}^N$) setting. The abstract setting allows us to define the ``theoretical'' regression coefficient to which the sample OLS estimator converges as the sample size $N \to \infty$. Section 3 discusses the issue of non-uniqueness of the OLS coefficients if the regressor matrix does not have full rank, and describes some ways to handle this. Section 4 reviews the two key asymptotic properties of the OLS estimator, consistency and asymptotic normality. It derives a heteroscedasticity-consistent covariance matrix estimator for the limiting normal distribution of the standardized OLS estimator. Section 5 introduces the problem of endogeneity, showing how it can arise in three different contexts. The next three sections demonstrate how the OLS estimator may not converge to the true coefficient values when we assume that the data are generated by some ``true'' underlying structural linear model. Section 6 discusses the problem of omitted variable bias. Section 7 discusses the problem of measurement error. Section 8 discusses the problem of simultaneous equations bias. Section 9 introduces the concept of an instrumental variable and proves the optimality of the two stage least squares (2SLS) estimator.

1. The Linear Model and Ordinary Least Squares (OLS) in $L^2$: We consider regression first in the abstract setting of the Hilbert space $L^2$. It is convenient to start with this infinite-dimensional version of regression, since the least squares estimates can be viewed as the limiting result of doing OLS in $\mathbb{R}^N$ as $N \to \infty$. In $L^2$ it is more transparent that we can do OLS under very general conditions, without assuming non-stochastic regressors, homoscedasticity, normally distributed errors, or that the true regression function is linear. Regression is simply the process of orthogonal projection of a dependent variable $\tilde{y}$ onto the linear subspace spanned by $K$ random variables $\tilde{x}_1,\dots,\tilde{x}_K$. To be concrete, let $\tilde{y}$ be a $1 \times 1$ (scalar) dependent variable and let $\tilde{x}$ be a $K \times 1$ vector of explanatory variables. Then as long as $E[\tilde{x}\tilde{y}]$ and $E[\tilde{x}\tilde{x}']$ exist and are finite, and $E[\tilde{x}\tilde{x}']$ is a nonsingular matrix, we have the identity:

\[
\tilde{y} = \tilde{x}'\beta^* + \tilde{\epsilon} \tag{1}
\]

where $\beta^*$ is the least squares estimate given by:

\[
\beta^* = \left(E[\tilde{x}\tilde{x}']\right)^{-1} E[\tilde{x}\tilde{y}] \tag{2}
\]

Note that by construction, the residual term $\tilde{\epsilon}$ is orthogonal to the regressor vector $\tilde{x}$,

\[
\langle \tilde{x}, \tilde{\epsilon} \rangle \equiv E[\tilde{x}\tilde{\epsilon}] = 0 \tag{3}
\]

where $\langle \tilde{u}, \tilde{v} \rangle \equiv E[\tilde{u}\tilde{v}]$ defines the inner product between two random variables in $L^2$. The orthogonality condition (3) implies the Pythagorean Theorem

\[
\|\tilde{y}\|^2 = \|\tilde{x}'\beta^*\|^2 + \|\tilde{\epsilon}\|^2 \tag{4}
\]

where $\|\tilde{u}\|^2 \equiv \langle \tilde{u}, \tilde{u} \rangle = E[\tilde{u}^2]$. From this we define the $R^2$ as

\[
R^2 = \frac{\|\tilde{x}'\beta^*\|^2}{\|\tilde{y}\|^2} \tag{5}
\]

Conceptually, $R^2$ is the squared cosine of the angle $\theta$ between the vectors $\tilde{y}$ and $\tilde{x}'\beta^*$ in $L^2$. The main point here is that the linear model (1) holds ``by construction'', regardless of whether the true relationship between $\tilde{y}$ and $\tilde{x}$, the conditional expectation $E[\tilde{y}|\tilde{x}]$, is a linear or nonlinear function of $\tilde{x}$. In fact, the latter is simply the result of projecting $\tilde{y}$ onto a larger subspace of $L^2$, the space of all measurable functions of $\tilde{x}$. The second point is that the definition of $\beta^*$ ensures that the vector $\tilde{x}$ is ``exogenous'' in the sense of equation (3), i.e. the error term $\tilde{\epsilon}$ is uncorrelated with the regressors $\tilde{x}$. In effect, we define $\beta^*$ in such a way that the regressors $\tilde{x}$ are exogenous by construction. It is instructive to repeat the simple mathematics leading up to this second conclusion. Using the identity (1) and the definition of $\beta^*$ in (2) we have:

\[
\begin{array}{ccl}
E[\tilde{x}\tilde{\epsilon}] &=& E\left[\tilde{x}\left(\tilde{y} - \tilde{x}'\beta^*\right)\right] \\[0.5ex]
&=& E[\tilde{x}\tilde{y}] - E[\tilde{x}\tilde{x}']\beta^* \\[0.5ex]
&=& E[\tilde{x}\tilde{y}] - E[\tilde{x}\tilde{x}']\left(E[\tilde{x}\tilde{x}']\right)^{-1}E[\tilde{x}\tilde{y}] \\[0.5ex]
&=& 0.
\end{array}
\]

2. The Linear Model and Ordinary Least Squares (OLS) in $\mathbb{R}^N$: Consider regression in the ``concrete'' setting of the Hilbert space $\mathbb{R}^N$. The dimension $N$ is the number of observations, where we assume that these observations are IID realizations of the vector of random variables $(\tilde{y},\tilde{x})$. Define $y = (y_1,\dots,y_N)'$ and $X = (x_1',\dots,x_N')'$, where each $y_i$ is $1 \times 1$ and each $x_i$ is $1 \times K$. Note $y$ is now a vector in $\mathbb{R}^N$. We can represent the $N \times K$ matrix $X$ as $K$ vectors in $\mathbb{R}^N$: $X = (X_1,\dots,X_K)$, where $X_k$ is the $k^{th}$ column of $X$, a vector in $\mathbb{R}^N$. Regression is simply the process of orthogonal projection of the dependent variable $y$ onto the linear subspace spanned by the $K$ columns of $X$. This gives us the identity:

\[
y = X\hat{\beta} + \hat{\epsilon} \tag{11}
\]

where $\hat{\beta}$ is the least squares estimate given by:

\[
\hat{\beta} = (X'X)^{-1}X'y = \left(\frac{1}{N}\sum_{i=1}^N x_i'x_i\right)^{-1}\frac{1}{N}\sum_{i=1}^N x_i'y_i \tag{12}
\]

and by construction, the $N \times 1$ residual vector $\hat{\epsilon}$ is orthogonal to the $N \times K$ matrix of regressors:

\[
\langle X_k, \hat{\epsilon} \rangle = 0, \quad k = 1,\dots,K, \qquad \mbox{i.e.} \quad X'\hat{\epsilon} = 0 \tag{13}
\]

where $\langle u, v \rangle \equiv u'v = \sum_{i=1}^N u_i v_i$ defines the inner product between two vectors in the Hilbert space $\mathbb{R}^N$. The orthogonality condition (13) implies the Pythagorean Theorem

\[
\|y\|^2 = \|X\hat{\beta}\|^2 + \|\hat{\epsilon}\|^2 \tag{14}
\]

where $\|u\|^2 \equiv u'u$. From this we define the (uncentered) $R^2$ as

\[
R^2 = \frac{\|X\hat{\beta}\|^2}{\|y\|^2} \tag{15}
\]

Conceptually, $R^2$ is the squared cosine of the angle $\theta$ between the vectors $y$ and $X\hat{\beta}$ in $\mathbb{R}^N$.
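
To make the projection interpretation concrete, here is a minimal numerical sketch in Python (our own illustration, not part of the original notes; the simulated data, variable names, and the nonlinear ``true'' relationship are assumptions):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
N, K = 1000, 3

# Simulate regressors and a dependent variable; the true relationship
# need not be linear for the projection to be well defined.
X = rng.normal(size=(N, K))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(size=N)

# OLS coefficients: solve the normal equations X'X beta = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# By construction the residual is orthogonal to every column of X.
resid = y - X @ beta_hat
print(X.T @ resid)  # approximately the zero vector

# Uncentered R^2: squared length of the projection over squared length of y.
print((X @ beta_hat) @ (X @ beta_hat) / (y @ y))
\end{verbatim}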

The main point of these first two sections is that the linear model -- viewed either as a linear relationship between a ``dependent'' random variable $\tilde{y}$ and a $K \times 1$ vector of ``independent'' random variables $\tilde{x}$ in $L^2$ as in equation (1), or as a linear relationship between a vector-valued dependent variable $y$ in $\mathbb{R}^N$ and $K$ independent variables making up the columns of the $N \times K$ matrix $X$ in equation (11) -- holds ``by construction'' in both cases. That is, regardless of whether the true relationship between $y$ and $X$ is linear, under very general conditions the Projection Theorem for Hilbert spaces guarantees that there exist $K \times 1$ vectors $\beta^*$ and $\hat{\beta}$ such that $\tilde{x}'\beta^*$ and $X\hat{\beta}$ equal the orthogonal projections of $\tilde{y}$ and $y$ onto the $K$-dimensional subspaces of $L^2$ and $\mathbb{R}^N$ spanned by the $K$ variables in $\tilde{x}$ and $X$, respectively. These coefficient vectors are constructed in such a way as to force the error terms $\tilde{\epsilon}$ and $\hat{\epsilon}$ to be orthogonal to $\tilde{x}$ and $X$, respectively. When we speak about the problem of endogeneity, we mean a situation where we believe there is a ``true linear model'' $\tilde{y} = \tilde{x}'\beta + \epsilon$ relating $\tilde{y}$ to $\tilde{x}$, where the ``true coefficient vector'' $\beta$ is not necessarily equal to the least squares value $\beta^*$, i.e. the error $\epsilon = \tilde{y} - \tilde{x}'\beta$ is not necessarily orthogonal to $\tilde{x}$. We will provide several examples of how endogeneity can arise after reviewing the asymptotic properties of the OLS estimator.

3. Note on the Uniqueness of the Least Squares Coefficients

The Projection Theorem guarantees that in any Hilbert space $H$ (including the two special cases $L^2$ and $\mathbb{R}^N$ discussed above), the projection $P(y|X)$ exists, where $P(y|X)$ is the best linear predictor of an element $y \in H$. More precisely, if $X = (X_1,\dots,X_K)$ where each $X_k \in H$, then $P(y|X)$ is the element of the smallest closed linear subspace $\mathcal{L}(X)$ spanned by the elements of $X$ that is closest to $y$:

\[
P(y|X) = \mathop{\rm argmin}_{z \in \mathcal{L}(X)} \|y - z\| \tag{16}
\]

It is easy to show that $\mathcal{L}(X)$ is a finite-dimensional linear subspace with dimension $J \le K$. The projection theorem tells us that $P(y|X)$ is always uniquely defined, even if it can be represented as different linear combinations of the elements of $X$. However, if $X$ has full rank, the projection $P(y|X)$ will have a unique representation given by

\[
P(y|X) = X\hat{\beta} = \sum_{k=1}^K X_k \hat{\beta}_k, \qquad \hat{\beta} = (X'X)^{-1}X'y \tag{17}
\]

Definition: We say $X$ has full rank if $J = K$, i.e. if the dimension of the linear subspace $\mathcal{L}(X)$ spanned by the elements of $X$ equals the number of elements in $X$.

It is straightforward to show that $X$ has full rank if and only if the $K$ elements of $X$ are linearly independent, which happens if and only if the $K \times K$ matrix $X'X$ is invertible. We use the heuristic notation $X'X$ to denote the matrix whose $(i,j)$ element is $\langle X_i, X_j \rangle$. To see the latter claim, suppose $X'X$ is singular. Then there exists a vector $a \ne 0$ in $\mathbb{R}^K$ such that $X'Xa = 0$, where $0$ is the zero vector in $\mathbb{R}^K$. Then we have $a'X'Xa = 0$, or in inner product notation

\[
\langle Xa, Xa \rangle = \|Xa\|^2 = 0 \tag{18}
\]

However, in a Hilbert space an element has a norm of 0 if and only if it equals the 0 element of $H$. Since $a \ne 0$, we can assume without loss of generality that $a_1 \ne 0$. Then we can rearrange the equation $Xa = 0$ and solve for $X_1$ to obtain:

\[
X_1 = \sum_{k=2}^K c_k X_k \tag{19}
\]

where $c_k = -a_k/a_1$. Thus, if $X'X$ is not invertible then $X$ cannot have full rank, since one or more elements of $X$ are redundant in the sense that they can be exactly predicted by a linear combination of the remaining elements of $X$. Thus, it is just a matter of convention to eliminate the redundant elements of $X$ to guarantee that it has full rank, which ensures that $(X'X)^{-1}$ exists and the least squares coefficient vector $\hat{\beta}$ is uniquely defined by the standard formula

\[
\hat{\beta} = (X'X)^{-1}X'y \tag{20}
\]

Notice that the above equation applies to arbitrary Hilbert spaces $H$ and is shorthand for the $\hat{\beta}$ that solves the following system of linear equations, which constitute the normal equations for least squares:

\[
\begin{array}{ccl}
\langle X_1, y \rangle &=& \langle X_1, X_1 \rangle \hat{\beta}_1 + \cdots + \langle X_1, X_K \rangle \hat{\beta}_K \\
\vdots & & \vdots \\
\langle X_K, y \rangle &=& \langle X_K, X_1 \rangle \hat{\beta}_1 + \cdots + \langle X_K, X_K \rangle \hat{\beta}_K
\end{array} \tag{21}
\]

The normal equations follow from the orthogonality conditions $\langle X_k, y - X\hat{\beta} \rangle = 0$, $k = 1,\dots,K$, and can be written more compactly in matrix notation as

\[
X'X\hat{\beta} = X'y \tag{22}
\]

which is easily seen to be equivalent to the formula in equation (20) when $X$ has full rank and the $K \times K$ matrix $X'X$ is invertible.

When $X$ does not have full rank there are multiple solutions to the normal equations, all of which yield the same best prediction, $P(y|X)$. In this case there are several ways to proceed. The most common way is to eliminate the redundant elements of $X$ until the resulting reduced set of regressors has full rank. Alternatively one can compute $P(y|X)$ via stepwise regression, by sequentially projecting $y$ on $X_1$, then projecting the residual $y - P(y|X_1)$ on $X_2$, and so forth. Finally, one can single out one of the many $\beta$ vectors that solve the normal equations to compute $P(y|X)$. One approach is to use the shortest vector $\beta$ solving the normal equations, which leads to the following formula

\[
\hat{\beta} = (X'X)^{-}X'y \tag{23}
\]

where $(X'X)^{-}$ is the generalized inverse of the square but non-invertible matrix $X'X$. The generalized inverse is computed by calculating the Jordan decomposition of $X'X$ into a product of an orthonormal matrix $W$ (i.e. a matrix satisfying $W'W = WW' = I$) and a diagonal matrix $D$ whose diagonal elements are the eigenvalues of $X'X$,

\[
X'X = WDW' \tag{24}
\]

Then the generalized inverse is defined by

\[
(X'X)^{-} = WD^{-}W' \tag{25}
\]

where $D^{-}$ is the diagonal matrix whose $i^{th}$ diagonal element is $1/\lambda_i$ if the corresponding diagonal element $\lambda_i$ of $D$ is nonzero, and 0 otherwise.

Exercise: Prove that the generalized formula for $\hat{\beta}$ given in equation (23) does in fact solve the normal equations and results in a valid solution for the best linear predictor $P(y|X) = X\hat{\beta}$. Also, verify that among all solutions to the normal equations, $\hat{\beta}$ has the smallest norm.
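
A quick numerical check of this minimum-norm construction (our own sketch; we note, as an aside, that numpy's pinv computes the Moore-Penrose pseudoinverse, whose least squares solution is exactly the minimum-norm one):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
N, K = 200, 4

# Build a rank-deficient regressor matrix: the last column duplicates the first.
X = rng.normal(size=(N, K))
X[:, 3] = X[:, 0]
y = rng.normal(size=N)

# Generalized inverse of X'X via its eigendecomposition X'X = W D W'.
D, W = np.linalg.eigh(X.T @ X)
tol = 1e-10 * D.max()
D_inv = np.array([1.0 / d if abs(d) > tol else 0.0 for d in D])
XtX_ginv = W @ np.diag(D_inv) @ W.T
beta_min_norm = XtX_ginv @ (X.T @ y)

# It solves the normal equations X'X beta = X'y ...
print(np.allclose(X.T @ X @ beta_min_norm, X.T @ y))

# ... and agrees with the minimum-norm least squares solution.
print(np.allclose(beta_min_norm, np.linalg.pinv(X) @ y))
\end{verbatim}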

4. Asymptotics of the OLS estimator. The sample OLS estimator $\hat{\beta}$ can be viewed as the result of applying the ``analogy principle'', i.e. replacing the theoretical expectations in (2) with sample averages in (12). The Strong Law of Large Numbers (SLLN) implies that as $N \to \infty$ we have, with probability 1,

\[
\frac{1}{N}\sum_{i=1}^N \left(y_i - x_i\beta\right)^2 \;\longrightarrow\; E\left[\left(\tilde{y} - \tilde{x}'\beta\right)^2\right] \tag{26}
\]

The convergence above can be proven to hold uniformly for $\beta$ in compact subsets of $\mathbb{R}^K$. This yields a Uniform Strong Law of Large Numbers (USLLN), which in turn implies the consistency of the OLS estimator (see Rust's lecture notes on ``Proof of the Uniform Law of Large Numbers''). Specifically, assuming $\beta^*$ is uniquely identified (i.e. that it is the unique minimizer of $E[(\tilde{y} - \tilde{x}'\beta)^2]$, a result which holds whenever $\tilde{x}$ has full rank, as we saw in section 3), then with probability 1 we have

\[
\lim_{N \to \infty} \hat{\beta} = \beta^* \tag{27}
\]

Given that we have closed-form expressions for $\beta^*$ in equation (2) and $\hat{\beta}$ in equation (12), consistency can be established more directly by observing that the SLLN implies that with probability 1 the sample moments converge:

\[
\frac{1}{N}\sum_{i=1}^N x_i'x_i \;\longrightarrow\; E[\tilde{x}\tilde{x}'], \qquad
\frac{1}{N}\sum_{i=1}^N x_i'y_i \;\longrightarrow\; E[\tilde{x}\tilde{y}] \tag{28}
\]

So a direct appeal to Slutsky's Theorem establishes the consistency of the OLS estimator: $\hat{\beta} \to \beta^*$ with probability 1.

The asymptotic distribution of the normalized OLS estimator, $\sqrt{N}(\hat{\beta} - \beta^*)$, can be derived by appealing to the Lindeberg-Levy Central Limit Theorem (CLT) for IID random vectors. That is, we assume that $\{(y_i,x_i)\}$ are IID draws from some joint distribution $F(y,x)$. Since $\tilde{y} = \tilde{x}'\beta^* + \tilde{\epsilon}$, where $E[\tilde{x}\tilde{\epsilon}] = 0$ and

\[
\Sigma \equiv E\left[\tilde{x}\tilde{x}'\tilde{\epsilon}^2\right] \tag{29}
\]

exists and is finite, the CLT implies that

\[
\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i'\epsilon_i \;\Longrightarrow\; N(0,\Sigma) \tag{30}
\]

Then, substituting $y_i = x_i\beta^* + \epsilon_i$ into the definition of $\hat{\beta}$ in equation (12) and rearranging, we get:

\[
\sqrt{N}\left(\hat{\beta} - \beta^*\right) = \left(\frac{1}{N}\sum_{i=1}^N x_i'x_i\right)^{-1}\frac{1}{\sqrt{N}}\sum_{i=1}^N x_i'\epsilon_i \tag{31}
\]

Appealing to the Slutsky Theorem and the CLT result in equation (30), we have:

\[
\sqrt{N}\left(\hat{\beta} - \beta^*\right) \;\Longrightarrow\; N(0,\Omega) \tag{32}
\]

where the $K \times K$ covariance matrix $\Omega$ is given by:

\[
\Omega = \left(E[\tilde{x}\tilde{x}']\right)^{-1} E\left[\tilde{x}\tilde{x}'\tilde{\epsilon}^2\right] \left(E[\tilde{x}\tilde{x}']\right)^{-1} \tag{33}
\]

In finite samples we can form a consistent estimator of $\Omega$ using the heteroscedasticity-consistent covariance matrix estimator $\hat{\Omega}$ given by:

\[
\hat{\Omega} = \left(\frac{1}{N}\sum_{i=1}^N x_i'x_i\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N x_i'x_i\hat{\epsilon}_i^2\right)\left(\frac{1}{N}\sum_{i=1}^N x_i'x_i\right)^{-1} \tag{34}
\]

where $\hat{\epsilon}_i = y_i - x_i\hat{\beta}$. Actually, there is a somewhat subtle issue in proving that $\hat{\Omega} \to \Omega$ with probability 1. We cannot directly appeal to the SLLN to show that

\[
\frac{1}{N}\sum_{i=1}^N x_i'x_i\hat{\epsilon}_i^2 \;\longrightarrow\; E\left[\tilde{x}\tilde{x}'\tilde{\epsilon}^2\right] \tag{35}
\]

since the estimated residuals $\hat{\epsilon}_i$ are not IID random variables, due to their common dependence on $\hat{\beta}$. To establish the result we must appeal to the Uniform Law of Large Numbers to show that uniformly for $\beta$ in a compact subset of $\mathbb{R}^K$ we have:

\[
\frac{1}{N}\sum_{i=1}^N x_i'x_i\left(y_i - x_i\beta\right)^2 \;\longrightarrow\; E\left[\tilde{x}\tilde{x}'\left(\tilde{y} - \tilde{x}'\beta\right)^2\right] \tag{36}
\]

Furthermore, we must appeal to the following uniform convergence lemma:

Lemma: If $g_N(\beta) \to g(\beta)$ uniformly with probability 1 for $\beta$ in a compact set, and if $\hat{\beta} \to \beta^*$ with probability 1, then with probability 1 we have:

\[
\lim_{N \to \infty} g_N(\hat{\beta}) = g(\beta^*) \tag{37}
\]

These results, together with the identity $\hat{\epsilon}_i = \epsilon_i - x_i(\hat{\beta} - \beta^*)$, enable us to show that

\[
\begin{array}{ccl}
\displaystyle\frac{1}{N}\sum_{i=1}^N x_i'x_i\hat{\epsilon}_i^2
&=& \displaystyle\frac{1}{N}\sum_{i=1}^N x_i'x_i\epsilon_i^2
\;-\; \frac{2}{N}\sum_{i=1}^N x_i'x_i\,\epsilon_i x_i(\hat{\beta} - \beta^*) \\[2ex]
& & \displaystyle\;+\; \frac{1}{N}\sum_{i=1}^N x_i'x_i\left(x_i(\hat{\beta} - \beta^*)\right)^2
\;\longrightarrow\; \Sigma + 0 + 0
\end{array} \tag{38}
\]

where $0$ denotes a $K \times K$ matrix of zeros. Notice that we appealed to the ordinary SLLN to show that the first term on the right hand side of equation (38) converges to $\Sigma = E[\tilde{x}\tilde{x}'\tilde{\epsilon}^2]$, and to the uniform convergence lemma to show that the remaining two terms converge to $0$.

Finally, note that under the assumptions of conditional mean independence, $E[\tilde{\epsilon}|\tilde{x}] = 0$, and homoscedasticity, $E[\tilde{\epsilon}^2|\tilde{x}] = \sigma^2$, the covariance matrix $\Omega$ simplifies to the usual textbook formula:

\[
\Omega = \sigma^2\left(E[\tilde{x}\tilde{x}']\right)^{-1} \tag{39}
\]

However since there is no compelling reason to believe the linear model is homoscedastic, it is in general a better idea to play it safe and use the heteroscedasticity-consistent estimator given in equation (34).
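
The sandwich formula (34) is mechanical to compute. Here is a minimal sketch (ours, not from the notes); it is equivalent to what standard software reports as the HC0 estimator:

\begin{verbatim}
import numpy as np

def ols_with_robust_cov(X, y):
    """OLS coefficients plus the heteroscedasticity-consistent covariance
    matrix (34) for the limiting distribution of sqrt(N)*(beta_hat - beta*)."""
    N = X.shape[0]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat
    A_inv = np.linalg.inv(X.T @ X / N)      # inverse of (1/N) sum x_i'x_i
    B = (X.T * resid**2) @ X / N            # (1/N) sum x_i'x_i e_i^2
    return beta_hat, A_inv @ B @ A_inv      # the sandwich

# Standard errors for beta_hat itself are sqrt(diag(Omega_hat / N)).
\end{verbatim}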

5. Structural Models and Endogeneity. As we noted above, the OLS parameter vector $\beta^*$ exists under very weak conditions, and the OLS estimator $\hat{\beta}$ converges to it. Further, by construction the residuals $\hat{\epsilon} = y - X\hat{\beta}$ are orthogonal to $X$. However, there are a number of cases where we believe there is a linear relationship between $y$ and $X$,

\[
y = X\beta + \epsilon \tag{40}
\]

where $\beta$ is not necessarily equal to the OLS vector $\beta^*$ and the error term $\epsilon$ is not necessarily orthogonal to $X$. This situation can occur for at least three different reasons:

1. Omitted variable bias

2. Errors in variables

3. Simultaneous equations bias

We will consider omitted variable bias and errors in variables first, since they are the easiest cases in which to understand how endogeneity problems arise. Then in the next section we will consider the simultaneous equations problem in more detail.

6. Omitted Variable Bias

Suppose that the true model is linear, but that we don't observe a subset of variables $\tilde{x}_2$ which are known to affect $\tilde{y}$. Thus, the ``true'' regression function can be written as:

\[
\tilde{y} = \tilde{x}_1'\beta_1 + \tilde{x}_2'\beta_2 + \tilde{\epsilon} \tag{41}
\]

where $\tilde{x}_1$ is $K_1 \times 1$ and $\tilde{x}_2$ is $K_2 \times 1$, and $E[\tilde{x}_1\tilde{\epsilon}] = 0$ and $E[\tilde{x}_2\tilde{\epsilon}] = 0$. Now if we don't observe $\tilde{x}_2$, the OLS estimator $\hat{\beta}_1$ based on $N$ observations of the random variables $(\tilde{y},\tilde{x}_1)$ converges to

\[
\beta_1^* = \left(E[\tilde{x}_1\tilde{x}_1']\right)^{-1}E[\tilde{x}_1\tilde{y}] \tag{42}
\]

However we have:

\[
E[\tilde{x}_1\tilde{y}] = E[\tilde{x}_1\tilde{x}_1']\beta_1 + E[\tilde{x}_1\tilde{x}_2']\beta_2 \tag{43}
\]

since $E[\tilde{x}_1\tilde{\epsilon}] = 0$ for the ``true regression model'' when both $\tilde{x}_1$ and $\tilde{x}_2$ are included. Substituting equation (43) into equation (42) we obtain:

\[
\beta_1^* = \beta_1 + \left(E[\tilde{x}_1\tilde{x}_1']\right)^{-1}E[\tilde{x}_1\tilde{x}_2']\beta_2 \tag{44}
\]

We can see from this equation that the OLS estimator will generally not converge to the true parameter vector $\beta_1$ when there are omitted variables, except in the case where either $\beta_2 = 0$ or $E[\tilde{x}_1\tilde{x}_2'] = 0$, i.e. where the omitted variables are orthogonal to the observed included variables $\tilde{x}_1$. Now consider the ``auxiliary regression'' between $\tilde{x}_2$ and $\tilde{x}_1$:

\[
\tilde{x}_2 = \Gamma'\tilde{x}_1 + \tilde{\eta} \tag{45}
\]

where $\Gamma$ is a $K_1 \times K_2$ matrix of regression coefficients, i.e. equation (45) denotes a system of $K_2$ regressions written in compact matrix notation. Note that by construction we have $E[\tilde{x}_1\tilde{\eta}'] = 0$. Substituting equation (45) into equation (44) and simplifying, we obtain:

\[
\beta_1^* = \beta_1 + \Gamma\beta_2 \tag{46}
\]

In the special case where $K_1 = K_2 = 1$, so that $\Gamma$ reduces to a scalar $\gamma$, we can characterize the omitted variable bias $\gamma\beta_2$ as follows:

1. The asymptotic bias is 0 if $\gamma = 0$ or $\beta_2 = 0$, i.e. if $\tilde{x}_2$ doesn't enter the regression equation ($\beta_2 = 0$), or if $\tilde{x}_2$ is orthogonal to $\tilde{x}_1$ ($\gamma = 0$). In either case, the restricted regression $\tilde{y} = \tilde{x}_1'\beta_1^* + \epsilon^*$, where $\epsilon^* = \tilde{y} - \tilde{x}_1'\beta_1^*$, is a valid regression and $\hat{\beta}_1$ is a consistent estimator of $\beta_1$.

2. The asymptotic bias is positive if $\gamma > 0$ and $\beta_2 > 0$, or if $\gamma < 0$ and $\beta_2 < 0$. In this case, OLS converges to a distorted parameter $\beta_1^* = \beta_1 + \gamma\beta_2$ which overestimates $\beta_1$ in order to ``soak up'' the part of the unobserved $\tilde{x}_2$ variable that is correlated with $\tilde{x}_1$.

3. The asymptotic bias is negative if $\gamma > 0$ and $\beta_2 < 0$, or if $\gamma < 0$ and $\beta_2 > 0$. In this case, OLS converges to a distorted parameter $\beta_1^* = \beta_1 + \gamma\beta_2$ which underestimates $\beta_1$ in order to ``soak up'' the part of the unobserved $\tilde{x}_2$ variable that is correlated with $\tilde{x}_1$.

Note that in cases 2. and 3., the OLS estimator $\hat{\beta}_1$ converges to a biased limit $\beta_1^*$ to ensure that the error term $\epsilon^* = \tilde{y} - \tilde{x}_1'\beta_1^*$ is orthogonal to $\tilde{x}_1$.

Exercise: Using the above equations, show that $E[\tilde{x}_1\epsilon^*] = 0$.
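
A small simulation (our illustration, not part of the notes) of the scalar case of formula (46):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)
N = 200_000
beta1, beta2, gamma = 1.0, 2.0, 0.5

# x2 is related to x1 through the auxiliary regression x2 = gamma*x1 + eta.
x1 = rng.normal(size=N)
x2 = gamma * x1 + rng.normal(size=N)
y = beta1 * x1 + beta2 * x2 + rng.normal(size=N)

# OLS of y on x1 alone converges to beta1 + gamma*beta2 = 2.0, not beta1 = 1.0.
print((x1 @ y) / (x1 @ x1))

# Including both regressors recovers the true coefficients [1.0, 2.0].
X = np.column_stack([x1, x2])
print(np.linalg.solve(X.T @ X, X.T @ y))
\end{verbatim}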

Now consider how a regression that includes both $\tilde{x}_1$ and $\tilde{x}_2$ automatically ``adjusts'' to converge to the true parameter vectors $\beta_1$ and $\beta_2$. Note that the normal equations when we have both $\tilde{x}_1$ and $\tilde{x}_2$ are given by:

\[
\begin{array}{ccl}
E[\tilde{x}_1\tilde{y}] &=& E[\tilde{x}_1\tilde{x}_1']\beta_1 + E[\tilde{x}_1\tilde{x}_2']\beta_2 \\[0.5ex]
E[\tilde{x}_2\tilde{y}] &=& E[\tilde{x}_2\tilde{x}_1']\beta_1 + E[\tilde{x}_2\tilde{x}_2']\beta_2
\end{array} \tag{47}
\]

Solving the first normal equation for $\beta_1$ we obtain:

\[
\beta_1 = \left(E[\tilde{x}_1\tilde{x}_1']\right)^{-1}E[\tilde{x}_1\tilde{y}] - \left(E[\tilde{x}_1\tilde{x}_1']\right)^{-1}E[\tilde{x}_1\tilde{x}_2']\beta_2 = \beta_1^* - \Gamma\beta_2 \tag{48}
\]

Thus, the full OLS estimator for $\beta_1$ equals the biased OLS estimator that omits $\tilde{x}_2$, $\beta_1^*$, less a ``correction term'' $\Gamma\beta_2$ that exactly offsets the asymptotic omitted variable bias of OLS derived above.

Now, substituting the equation for $\beta_1$ into the second normal equation and solving for $\beta_2$, we obtain:

\[
\beta_2 = \left(E[\tilde{\eta}\tilde{\eta}']\right)^{-1}E[\tilde{\eta}\tilde{y}] \tag{49}
\]

The above formula has a more intuitive interpretation: $\beta_2$ can be obtained by regressing $\tilde{y}$ on $\tilde{\eta}$, where $\tilde{\eta}$ is the residual from the regression of $\tilde{x}_2$ on $\tilde{x}_1$:

\[
\tilde{\eta} = \tilde{x}_2 - \Gamma'\tilde{x}_1 \tag{50}
\]

This is just the result of the second step of stepwise regression, where the first step regresses $\tilde{y}$ on $\tilde{x}_1$, and the second step regresses the residuals $\tilde{y} - P(\tilde{y}|\tilde{x}_1)$ on $\tilde{x}_2 - P(\tilde{x}_2|\tilde{x}_1)$, where $P(\tilde{x}_2|\tilde{x}_1)$ denotes the projection of $\tilde{x}_2$ on $\tilde{x}_1$, i.e. $P(\tilde{x}_2|\tilde{x}_1) = \Gamma'\tilde{x}_1$ where $\Gamma$ is given in equation (45) above. It is easy to see why this formula is correct. Take the original regression

\[
\tilde{y} = \tilde{x}_1'\beta_1 + \tilde{x}_2'\beta_2 + \tilde{\epsilon} \tag{51}
\]

and project both sides on tex2html_wrap_inline1434 . This gives us

\[
P(\tilde{y}|\tilde{x}_1) = \tilde{x}_1'\beta_1 + P(\tilde{x}_2|\tilde{x}_1)'\beta_2 \tag{52}
\]

since $P(\tilde{\epsilon}|\tilde{x}_1) = 0$ due to the orthogonality condition $E[\tilde{x}_1\tilde{\epsilon}] = 0$. Subtracting equation (52) from the regression equation (51), we get

\[
\tilde{y} - P(\tilde{y}|\tilde{x}_1) = \left(\tilde{x}_2 - P(\tilde{x}_2|\tilde{x}_1)\right)'\beta_2 + \tilde{\epsilon} = \tilde{\eta}'\beta_2 + \tilde{\epsilon} \tag{53}
\]

This is a valid regression since $\tilde{\epsilon}$ is orthogonal to $\tilde{x}_2$ and to $\tilde{x}_1$, and hence it must be orthogonal to the linear combination $\tilde{\eta} = \tilde{x}_2 - \Gamma'\tilde{x}_1$.
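
This two-step logic is easy to verify numerically; a sketch of ours, continuing the simulated example above:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(3)
N = 100_000
x1 = rng.normal(size=N)
x2 = 0.5 * x1 + rng.normal(size=N)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=N)

# Step 1: residualize x2 and y with respect to x1.
eta = x2 - ((x1 @ x2) / (x1 @ x1)) * x1        # x2 - P(x2|x1)
y_resid = y - ((x1 @ y) / (x1 @ x1)) * x1      # y - P(y|x1)

# Step 2: regressing the residualized y on eta recovers beta2 = 2.0;
# regressing y itself on eta gives the same answer, since eta is
# orthogonal to x1 in sample.
print((y_resid @ eta) / (eta @ eta))
print((y @ eta) / (eta @ eta))
\end{verbatim}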

7. Errors in Variables

Endogeneity problems can also arise when there are errors in variables. Consider the regression model

\[
\tilde{y}^* = \beta\tilde{x}^* + \tilde{\epsilon} \tag{54}
\]

where $E[\tilde{x}^*\tilde{\epsilon}] = 0$ and the stars denote the true values of the underlying variables. Suppose that we do not observe $(\tilde{y}^*,\tilde{x}^*)$ but instead observe noisy versions of these variables given by:

\[
\tilde{y} = \tilde{y}^* + \tilde{\eta}_y, \qquad \tilde{x} = \tilde{x}^* + \tilde{\eta}_x \tag{55}
\]

where $E[\tilde{\eta}_y] = E[\tilde{\eta}_x] = 0$, $E[\tilde{\eta}_y\tilde{\epsilon}] = E[\tilde{\eta}_x\tilde{\epsilon}] = 0$, and $E[\tilde{\eta}_x\tilde{\eta}_y] = 0$; we also assume the measurement errors are uncorrelated with the true values, $E[\tilde{\eta}_x\tilde{x}^*] = E[\tilde{\eta}_y\tilde{x}^*] = 0$. That is, we assume that the measurement error is unbiased and uncorrelated with the disturbance $\tilde{\epsilon}$ in the regression equation, and that the measurement errors in $\tilde{y}$ and $\tilde{x}$ are uncorrelated. Now the regression we actually run is based on the noisy observed values $(\tilde{y},\tilde{x})$ instead of the underlying true values $(\tilde{y}^*,\tilde{x}^*)$. Substituting for $\tilde{y}^*$ and $\tilde{x}^*$ in the regression equation (54), we obtain:

\[
\tilde{y} = \beta\tilde{x} + \tilde{\nu}, \qquad \tilde{\nu} \equiv \tilde{\epsilon} + \tilde{\eta}_y - \beta\tilde{\eta}_x \tag{56}
\]

Now observe that the mismeasured regression equation (56) has a composite error term $\tilde{\nu} = \tilde{\epsilon} + \tilde{\eta}_y - \beta\tilde{\eta}_x$ that is not orthogonal to the mismeasured independent variable $\tilde{x}$. To see this, note that the above assumptions imply that

\[
E[\tilde{x}\tilde{\nu}] = E\left[(\tilde{x}^* + \tilde{\eta}_x)(\tilde{\epsilon} + \tilde{\eta}_y - \beta\tilde{\eta}_x)\right] = -\beta E[\tilde{\eta}_x^2] = -\beta\sigma^2_{\eta_x} \tag{57}
\]

This covariance between $\tilde{x}$ and $\tilde{\nu}$, which is negative when $\beta > 0$, implies that the OLS estimator of $\beta$ is asymptotically downward biased (attenuated toward zero) when there are errors in variables in the independent variable $\tilde{x}$. Indeed we have:

\[
\hat{\beta} \;\longrightarrow\; \frac{E[\tilde{x}\tilde{y}]}{E[\tilde{x}^2]} = \beta\,\frac{E[(\tilde{x}^*)^2]}{E[(\tilde{x}^*)^2] + \sigma^2_{\eta_x}} \tag{58}
\]

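A quick simulation (ours, with made-up parameter values) of the attenuation formula (58):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(4)
N, beta = 500_000, 2.0

x_star = rng.normal(size=N)                  # true regressor, variance 1
y = beta * x_star + rng.normal(size=N)       # no measurement error in y here
x = x_star + rng.normal(size=N)              # noisy regressor, noise variance 1

# plim of OLS on noisy x: beta * Var(x*)/(Var(x*) + Var(eta_x)) = 2 * 1/2 = 1.
print((x @ y) / (x @ x))   # approximately 1.0, not 2.0
\end{verbatim}
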
Now consider the possibility of identifying $\beta$ by the method of moments. We can consistently estimate the three moments $E[\tilde{y}^2]$, $E[\tilde{x}\tilde{y}]$ and $E[\tilde{x}^2]$ using the observed noisy measures $(\tilde{y},\tilde{x})$. However we have

\[
\begin{array}{ccl}
E[\tilde{y}^2] &=& \beta^2 E[(\tilde{x}^*)^2] + \sigma^2_{\epsilon} + \sigma^2_{\eta_y} \\[0.5ex]
E[\tilde{x}\tilde{y}] &=& \beta E[(\tilde{x}^*)^2] \\[0.5ex]
E[\tilde{x}^2] &=& E[(\tilde{x}^*)^2] + \sigma^2_{\eta_x}
\end{array} \tag{59}
\]

Unfortunately we have 3 equations in 4 unknowns, $\left(\beta,\ E[(\tilde{x}^*)^2],\ \sigma^2_{\eta_x},\ \sigma^2_{\epsilon} + \sigma^2_{\eta_y}\right)$. If we try to use higher moments of $(\tilde{y},\tilde{x})$ to identify $\beta$, we find that we always have more unknowns than equations.

8. Simultaneous Equations Bias

Consider the simple supply/demand example from chapter 16 of Greene. We have:

\[
\begin{array}{lccl}
\mbox{(demand)} & q &=& \alpha_1 p + \alpha_2 y + \epsilon_d \\[0.5ex]
\mbox{(supply)} & q &=& \beta_1 p + \epsilon_s
\end{array} \tag{60}
\]

where $y$ denotes income, $p$ denotes price, and we assume that $y$ is exogenous, with $E[y\epsilon_d] = E[y\epsilon_s] = 0$ and $E[\epsilon_d\epsilon_s] = 0$. Solving for the equilibrium where demand equals supply, we can write the reduced form, which expresses the endogenous variables $(p,q)$ in terms of the exogenous variable $y$:

\[
\begin{array}{ccl}
p &=& \displaystyle\frac{\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\epsilon_d - \epsilon_s}{\beta_1 - \alpha_1} \\[2ex]
q &=& \displaystyle\frac{\beta_1\alpha_2}{\beta_1 - \alpha_1}\,y + \frac{\beta_1\epsilon_d - \alpha_1\epsilon_s}{\beta_1 - \alpha_1}
\end{array} \tag{61}
\]

By the assumption that $y$ is exogenous in the structural equations (60), it follows that the two linear equations in the reduced form (61) are valid regression equations, i.e. $y$ is orthogonal to both reduced-form error terms. However $p$ is not an exogenous regressor in either the supply or demand equation in (60), since

\[
E[p\,\epsilon_d] = \frac{\sigma^2_d}{\beta_1 - \alpha_1} > 0, \qquad
E[p\,\epsilon_s] = \frac{-\sigma^2_s}{\beta_1 - \alpha_1} < 0 \tag{62}
\]

where $\sigma^2_d \equiv E[\epsilon_d^2]$, $\sigma^2_s \equiv E[\epsilon_s^2]$, and $\beta_1 - \alpha_1 > 0$ since supply slopes up and demand slopes down.

Thus, the endogeneity of $p$ means that OLS estimation of the demand equation (i.e. a regression of $q$ on $p$ and $y$) will result in an overestimated (upward biased) price coefficient. We would expect that OLS estimation of the supply equation (i.e. a regression of $q$ on $p$ only) will result in an underestimated (downward biased) price coefficient; however, it is not possible to sign this bias in general.

Exercise: Show that the OLS estimate of $\alpha_1$ converges to

\[
\hat{\alpha}_1 \;\longrightarrow\; \alpha_1 + \lambda \tag{63}
\]

where

\[
\lambda = \frac{(\beta_1 - \alpha_1)\,\sigma^2_d}{\sigma^2_d + \sigma^2_s} \tag{64}
\]

Since $\lambda > 0$, it follows from the above result that the OLS estimator is upward biased. It is possible, when $\sigma^2_s$ is sufficiently small and $\beta_1$ is sufficiently large, that the OLS estimate will converge to a positive value, i.e. it would lead us to incorrectly infer that the demand curve slopes upward (a Giffen good?) instead of downward.
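
A simulation sketch (ours, with made-up parameter values) of this example; the last line also previews the instrumental variables idea of section 9, using the exogenous income variable as an instrument for price in the supply equation:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(5)
N = 500_000
a1, a2, b1 = -1.0, 1.0, 1.0          # demand slope, income effect, supply slope

y_inc = rng.normal(size=N)           # exogenous income
eps_d = rng.normal(size=N)
eps_s = rng.normal(size=N)

# Reduced form (61): equilibrium price and quantity.
p = (a2 * y_inc + eps_d - eps_s) / (b1 - a1)
q = b1 * p + eps_s

# OLS of the demand equation (q on p and y): with these parameters the
# price coefficient converges to (b1*sd^2 + a1*ss^2)/(sd^2 + ss^2) = 0.
X = np.column_stack([p, y_inc])
print(np.linalg.solve(X.T @ X, X.T @ q))   # price coef near 0.0, not -1.0

# Simple IV for the supply slope, instrumenting p with income.
print((y_inc @ q) / (y_inc @ p))           # approximately b1 = 1.0
\end{verbatim}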

Exercise: Derive the probability limit for the OLS estimator of $\beta_1$ in the supply equation (i.e. a regression of $q$ on $p$ only). Show by example that this probability limit can be either higher or lower than $\beta_1$.

Exercise: Show that we can identify $\beta_1$ from the reduced-form coefficients in equation (61). Which other structural coefficients $(\alpha_1,\alpha_2)$ are identified?

9. Instrumental Variables. We have provided three examples where we are interested in estimating the coefficients of a linear ``structural'' model, but where OLS will produce misleading estimates due to a failure of the orthogonality condition $E[\tilde{x}\epsilon] = 0$ in the linear structural relationship

\[
\tilde{y} = \tilde{x}'\beta + \epsilon \tag{65}
\]

where $\beta$ is the ``true'' vector of structural coefficients. If $\tilde{x}$ is endogenous, then $E[\tilde{x}\epsilon] \ne 0$, so that $\beta^* \ne \beta$ and the OLS estimator of the structural coefficients $\beta$ in equation (65) is inconsistent. Is it possible to consistently estimate $\beta$ when $\tilde{x}$ is endogenous? In this section we will show that the answer is yes, provided we have access to a sufficient number of instrumental variables.

Definition: Given a linear structural relationship (65), we say the $K \times 1$ vector of regressors $\tilde{x}$ is endogenous if $E[\tilde{x}\epsilon] \ne 0$, where $\epsilon = \tilde{y} - \tilde{x}'\beta$ and $\beta$ is the ``true'' structural coefficient vector.

Now suppose we have access to a $J \times 1$ vector of instruments, i.e. a random vector $\tilde{z}$ satisfying:

\[
\mbox{A1)}\quad E[\tilde{z}\tilde{x}'] \mbox{ has full rank } K, \qquad
\mbox{A2)}\quad E[\tilde{z}\epsilon] = 0 \tag{66}
\]

9.1 The exactly identified case and the simple IV estimator. Consider first the exactly identified case where $J = K$, i.e. we have just as many instruments as regressors in the structural equation (65). Multiply both sides of the structural equation (65) by $\tilde{z}$ and take expectations. Using A2) we obtain:

\[
E[\tilde{z}\tilde{y}] = E[\tilde{z}\tilde{x}']\beta + E[\tilde{z}\epsilon] = E[\tilde{z}\tilde{x}']\beta \tag{67}
\]

If we assume that the $K \times K$ matrix $E[\tilde{z}\tilde{x}']$ is invertible, we can solve the above equation for the $K \times 1$ vector $\beta_{IV}$:

\[
\beta_{IV} \equiv \left(E[\tilde{z}\tilde{x}']\right)^{-1}E[\tilde{z}\tilde{y}] \tag{68}
\]

Then, plugging in $E[\tilde{z}\tilde{y}] = E[\tilde{z}\tilde{x}']\beta$ from equation (67), we obtain:

\[
\beta_{IV} = \left(E[\tilde{z}\tilde{x}']\right)^{-1}E[\tilde{z}\tilde{x}']\beta = \beta \tag{69}
\]

The fact that $\beta_{IV} = \beta$ motivates the definition of the simple IV estimator $\hat{\beta}_{IV}$ as the sample analog of $\beta_{IV}$ in equation (68). Thus, suppose we have a random sample consisting of $N$ IID observations of the random vectors $(\tilde{y},\tilde{x},\tilde{z})$, i.e. our data set consists of $\{(y_i,x_i,z_i),\ i = 1,\dots,N\}$, which can be represented in matrix form by the $N \times 1$ vector $y$ and the $N \times K$ matrices $Z$ and $X$.

Definition: Assume that the $K \times K$ matrix $Z'X$ is invertible. Then the simple IV estimator $\hat{\beta}_{IV}$ is the sample analog of $\beta_{IV}$, given by:

\[
\hat{\beta}_{IV} = (Z'X)^{-1}Z'y = \left(\frac{1}{N}\sum_{i=1}^N z_i'x_i\right)^{-1}\frac{1}{N}\sum_{i=1}^N z_i'y_i \tag{70}
\]

Similar to the OLS estimator, we can appeal to the SLLN and Slutsky's Theorem to show that with probability 1 we have:

\[
\hat{\beta}_{IV} \;\longrightarrow\; \beta \tag{71}
\]

We can appeal to the CLT to show that

\[
\sqrt{N}\left(\hat{\beta}_{IV} - \beta\right) \;\Longrightarrow\; N(0,\Omega_{IV}) \tag{72}
\]

where

\[
\Omega_{IV} = \left(E[\tilde{z}\tilde{x}']\right)^{-1}E\left[\tilde{z}\tilde{z}'\epsilon^2\right]\left(E[\tilde{x}\tilde{z}']\right)^{-1} \tag{73}
\]

where we use the result that $(A')^{-1} = (A^{-1})'$ for any invertible matrix $A$. The covariance matrix $\Omega_{IV}$ can be consistently estimated by its sample analog:

\[
\hat{\Omega}_{IV} = \left(\frac{1}{N}Z'X\right)^{-1}\left(\frac{1}{N}\sum_{i=1}^N z_i'z_i\hat{\epsilon}_i^2\right)\left(\frac{1}{N}X'Z\right)^{-1} \tag{74}
\]

where $\hat{\epsilon}_i = y_i - x_i\hat{\beta}_{IV}$. We can show that the estimator (74) is consistent using the same argument we used to establish the consistency of the heteroscedasticity-consistent covariance matrix estimator (34) in the OLS case. Finally, consider the form of $\Omega_{IV}$ in the homoscedastic case.

Definition: We say the error terms $\epsilon_i$ in the structural model in equation (65) are homoscedastic if there exists a nonnegative constant $\sigma^2$ for which:

\[
E\left[\tilde{z}\tilde{z}'\epsilon^2\right] = \sigma^2 E[\tilde{z}\tilde{z}'] \tag{75}
\]

A sufficient condition for homoscedasticity to hold is $E[\epsilon|\tilde{z}] = 0$ and $E[\epsilon^2|\tilde{z}] = \sigma^2$. Under homoscedasticity the asymptotic covariance matrix for the simple IV estimator becomes:

\[
\Omega_{IV} = \sigma^2\left(E[\tilde{z}\tilde{x}']\right)^{-1}E[\tilde{z}\tilde{z}']\left(E[\tilde{x}\tilde{z}']\right)^{-1} \tag{76}
\]

and if the above two sufficient conditions hold, it can be consistently estimated by its sample analog:

\[
\hat{\Omega}_{IV} = \hat{\sigma}^2\left(\frac{1}{N}Z'X\right)^{-1}\left(\frac{1}{N}Z'Z\right)\left(\frac{1}{N}X'Z\right)^{-1} \tag{77}
\]

where $\sigma^2$ is consistently estimated by:

\[
\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N\left(y_i - x_i\hat{\beta}_{IV}\right)^2 \tag{78}
\]

As in the case of OLS, we recommend using the heteroscedasticity-consistent covariance matrix estimator (74), which will be consistent regardless of whether the true model (65) is homoscedastic or heteroscedastic, rather than the estimator (77), which will be inconsistent if the true model is heteroscedastic.
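
For completeness, a compact sketch (ours) of the simple IV estimator (70) with the robust covariance estimator (74):

\begin{verbatim}
import numpy as np

def simple_iv(Z, X, y):
    """Simple IV estimator (70) with the heteroscedasticity-consistent
    covariance estimator (74). Z and X must both be N x K."""
    N = X.shape[0]
    beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
    resid = y - X @ beta_iv
    A_inv = np.linalg.inv(Z.T @ X / N)      # inverse of (1/N) Z'X
    B = (Z.T * resid**2) @ Z / N            # (1/N) sum z_i'z_i e_i^2
    return beta_iv, A_inv @ B @ A_inv.T     # note the transpose on the right
\end{verbatim}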

9.2 The overidentified case and two stage least squares. Now consider the overidentified case where we have more instruments than endogenous regressors, i.e. when $J > K$. Then the matrix $E[\tilde{z}\tilde{x}']$ is not square, and the simple IV estimator $\beta_{IV}$ is not defined. However we can always choose a subset $\tilde{w}$ consisting of a $K \times 1$ subvector of the $J \times 1$ random vector $\tilde{z}$ so that $E[\tilde{w}\tilde{x}']$ is square and invertible. More generally, we could construct instruments $\tilde{w}$ by taking linear combinations of the full list of instrumental variables $\tilde{z}$, where $\Gamma$ is a $J \times K$ matrix:

\[
\tilde{w} = \Gamma'\tilde{z} \tag{79}
\]

Example 1. Suppose we want our instrument vector $\tilde{w}$ to consist of the first $K$ components of $\tilde{z}$. Then we set $\Gamma' = [I\,|\,0]$, where $I$ is a $K \times K$ identity matrix, $0$ is a $K \times (J-K)$ matrix of zeros, and $|$ denotes the horizontal concatenation operator.

Example 2. Consider the instruments given by $\tilde{w}^* = \Gamma^{*\prime}\tilde{z}$ where $\Gamma^* = \left(E[\tilde{z}\tilde{z}']\right)^{-1}E[\tilde{z}\tilde{x}']$. It is straightforward to verify that this is a $J \times K$ matrix. We can interpret $\Gamma^*$ as the matrix of regression coefficients from regressing $\tilde{x}$ on $\tilde{z}$. Thus $\tilde{w}^* = P(\tilde{x}|\tilde{z})$ is the projection of the endogenous variables $\tilde{x}$ onto the instruments $\tilde{z}$. Since $\tilde{x}$ is a vector of random variables, $\Gamma^*$ actually represents the horizontal concatenation of $K$ separate $J \times 1$ regression coefficient vectors. We can write all the regressions compactly in vector form as

\[
\tilde{x} = \Gamma^{*\prime}\tilde{z} + \tilde{\eta} \tag{80}
\]

where $\tilde{\eta}$ is a $K \times 1$ vector of error terms for the $K$ regression equations. Thus, by definition of least squares, each component of $\tilde{\eta}$ must be orthogonal to the regressors $\tilde{z}$, i.e.

\[
E[\tilde{z}\tilde{\eta}'] = 0 \tag{81}
\]

where $0$ is a $J \times K$ matrix of zeros. We will shortly formalize the sense in which $\tilde{w}^* = \Gamma^{*\prime}\tilde{z}$ are the ``optimal instruments'' within the class of instruments formed from linear combinations of $\tilde{z}$ in equation (79). Intuitively, the optimal instruments should be the best linear predictors of the endogenous regressors $\tilde{x}$, and clearly, the instruments $\tilde{w}^*$ from the first stage regression (80) are the best linear predictors of the endogenous $\tilde{x}$ variables.

Definition: Assume that $\left(E[\tilde{w}\tilde{x}']\right)^{-1}$ exists, where $\tilde{w} = \Gamma'\tilde{z}$. Then we define $\beta_{IV}$ by

\[
\beta_{IV} = \left(E[\tilde{w}\tilde{x}']\right)^{-1}E[\tilde{w}\tilde{y}] \tag{82}
\]

Definition: Assume that $\left(E[\tilde{w}^*\tilde{x}']\right)^{-1}$ exists, where $\tilde{w}^* = \Gamma^{*\prime}\tilde{z}$ and $\Gamma^* = \left(E[\tilde{z}\tilde{z}']\right)^{-1}E[\tilde{z}\tilde{x}']$. Then we define $\beta_{2SLS}$ by

\[
\beta_{2SLS} = \left(E[\tilde{w}^*\tilde{x}']\right)^{-1}E[\tilde{w}^*\tilde{y}]
= \left(E[\tilde{w}^*\tilde{w}^{*\prime}]\right)^{-1}E[\tilde{w}^*\tilde{y}] \tag{83}
\]

Clearly $\beta_{2SLS}$ is a special case of $\beta_{IV}$ with $\Gamma = \Gamma^*$. We refer to it as two stage least squares since $\beta_{2SLS}$ can be computed in two stages:

Stage 1: Regress the endogenous variables $\tilde{x}$ on the instruments $\tilde{z}$ to get the linear projections $\tilde{w}^* = P(\tilde{x}|\tilde{z})$ as in equation (80).

Stage 2: Regress $\tilde{y}$ on $\tilde{w}^*$ instead of on $\tilde{x}$, as shown in equation (83). The projections $\tilde{w}^*$ essentially ``strip off'' the endogenous components $\tilde{\eta}$ of $\tilde{x}$, resulting in a valid regression equation for $\beta$.

We can get some more intuition into the latter statement by rewriting the original structural equation (65) as:

\[
\tilde{y} = \tilde{x}'\beta + \epsilon = \tilde{w}^{*\prime}\beta + \tilde{\nu}, \qquad \tilde{\nu} \equiv \epsilon + \tilde{\eta}'\beta \tag{84}
\]

where $\tilde{\nu} = \epsilon + \tilde{\eta}'\beta$. Notice that $E[\tilde{w}^*\tilde{\nu}] = 0$ as a consequence of equations (66) and (81). It follows from the projection theorem that equation (84) is a valid regression, i.e. that $\beta_{2SLS} = \beta$. Alternatively, we can simply use the same straightforward reasoning as we did for $\beta_{IV}$, substituting equation (65) for $\tilde{y}$ and simplifying equations (82) and (83), to see that $\beta_{IV} = \beta_{2SLS} = \beta$. This motivates the definitions of $\hat{\beta}_{IV}$ and $\hat{\beta}_{2SLS}$ as the sample analogs of $\beta_{IV}$ and $\beta_{2SLS}$:

Definition: Assume $W = Z\Gamma$, where $Z$ is $N \times J$ and $\Gamma$ is $J \times K$, and $W'X$ is invertible (this implies $J \ge K$). Then the instrumental variables estimator $\hat{\beta}_{IV}$ is the sample analog of $\beta_{IV}$ defined in equation (82):

\[
\hat{\beta}_{IV} = (W'X)^{-1}W'y = \left(\Gamma'Z'X\right)^{-1}\Gamma'Z'y \tag{85}
\]

Definition: Assume that the $J \times J$ matrix $Z'Z$ and the $K \times K$ matrix $W'W$ are invertible, where $W = Z\hat{\Gamma}$ and $\hat{\Gamma} = (Z'Z)^{-1}Z'X$. The two-stage least squares estimator $\hat{\beta}_{2SLS}$ is the sample analog of $\beta_{2SLS}$ defined in equation (83):

\[
\hat{\beta}_{2SLS} = (W'X)^{-1}W'y = \left(X'P_Z X\right)^{-1}X'P_Z\,y \tag{86}
\]

where $P_Z$ is the $N \times N$ projection matrix

\[
P_Z = Z(Z'Z)^{-1}Z' \tag{87}
\]
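
A direct numerical sketch (ours) of equation (86); in practice one avoids forming the $N \times N$ matrix $P_Z$ explicitly:

\begin{verbatim}
import numpy as np

def two_stage_least_squares(Z, X, y):
    """2SLS estimator (86). Z is N x J, X is N x K, with J >= K."""
    # Stage 1: regress X on Z to get the fitted values W = P_Z X.
    Gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
    X_hat = Z @ Gamma_hat
    # Stage 2: regress y on the fitted values. This equals
    # (X'P_Z X)^{-1} X'P_Z y since X_hat'X = X_hat'X_hat.
    return np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)
\end{verbatim}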

Using exactly the same arguments that we used to prove the consistency and asymptotic normality of the simple IV estimator, it is straightforward to show that $\hat{\beta}_{IV} \to \beta$ with probability 1 and that $\sqrt{N}(\hat{\beta}_{IV} - \beta) \Longrightarrow N(0,\Omega_{IV})$, where $\Omega_{IV}$ is the $K \times K$ matrix given by:

\[
\Omega_{IV} = \left(E[\tilde{w}\tilde{x}']\right)^{-1}E\left[\tilde{w}\tilde{w}'\epsilon^2\right]\left(E[\tilde{x}\tilde{w}']\right)^{-1} \tag{88}
\]

Now we have a whole family of IV estimators, depending on how we choose the $J \times K$ matrix $\Gamma$. What is the optimal choice for $\Gamma$? As we suggested earlier, the optimal choice should be $\Gamma^* = \left(E[\tilde{z}\tilde{z}']\right)^{-1}E[\tilde{z}\tilde{x}']$, since this results in a linear combination of instruments $\tilde{w}^* = \Gamma^{*\prime}\tilde{z}$ that is the best linear predictor of the endogenous regressors $\tilde{x}$.

Theorem: Assume that the error term $\epsilon$ in the structural model (65) is homoscedastic. Then the optimal IV estimator is 2SLS, i.e. it has the smallest asymptotic covariance matrix among all IV estimators.

Proof: Under homoscedasticity, the asymptotic covariance matrix for the IV estimator is equal to

\[
\Omega_{IV} = \sigma^2\left(E[\tilde{w}\tilde{x}']\right)^{-1}E[\tilde{w}\tilde{w}']\left(E[\tilde{x}\tilde{w}']\right)^{-1} \tag{89}
\]

We now show this covariance matrix is minimized when $\Gamma = \Gamma^*$, i.e. we show that

\[
\Omega_{IV} - \Omega_{2SLS} \;\ge\; 0 \tag{90}
\]

where $\Omega_{2SLS}$ is the asymptotic covariance matrix for 2SLS, which is obtained by substituting $\Gamma = \Gamma^*$ into the formula above. Since $\Omega_{IV} - \Omega_{2SLS} \ge 0$ if and only if $\Omega_{2SLS}^{-1} - \Omega_{IV}^{-1} \ge 0$, it is sufficient to show that $\Omega_{2SLS}^{-1} - \Omega_{IV}^{-1} \ge 0$, or

\[
E[\tilde{w}^*\tilde{w}^{*\prime}] \;-\; E[\tilde{x}\tilde{w}']\left(E[\tilde{w}\tilde{w}']\right)^{-1}E[\tilde{w}\tilde{x}'] \;\ge\; 0 \tag{91}
\]

Note that $E[\tilde{w}^*\tilde{w}^{*\prime}] = E\left[P(\tilde{x}|\tilde{z})P(\tilde{x}|\tilde{z})'\right]$ and $E[\tilde{x}\tilde{w}']\left(E[\tilde{w}\tilde{w}']\right)^{-1}E[\tilde{w}\tilde{x}'] = E\left[P(\tilde{x}|\tilde{w})P(\tilde{x}|\tilde{w})'\right]$, so our task reduces to showing that

\[
E\left[P(\tilde{x}|\tilde{z})P(\tilde{x}|\tilde{z})'\right] \;-\; E\left[P(\tilde{x}|\tilde{w})P(\tilde{x}|\tilde{w})'\right] \;\ge\; 0 \tag{92}
\]

However, since $\tilde{w} = \Gamma'\tilde{z}$ for some $J \times K$ matrix $\Gamma$, it follows that the elements of $\tilde{w}$ must span a subspace of the linear subspace spanned by the elements of $\tilde{z}$. Then the Law of Iterated Projections implies that

\[
P(\tilde{x}|\tilde{w}) = P\left(P(\tilde{x}|\tilde{z})\,\big|\,\tilde{w}\right) = P(\tilde{w}^*|\tilde{w}) \tag{93}
\]

This implies that there exists a $K \times 1$ vector of error terms $\tilde{u}$ satisfying

\[
\tilde{w}^* = P(\tilde{x}|\tilde{w}) + \tilde{u} \tag{94}
\]

where $\tilde{u}$ satisfies the orthogonality relation

\[
E\left[P(\tilde{x}|\tilde{w})\,\tilde{u}'\right] = 0 \tag{95}
\]

where $0$ is a $K \times K$ matrix of zeros. Then using the identity (94) we have

\[
E[\tilde{w}^*\tilde{w}^{*\prime}] = E\left[P(\tilde{x}|\tilde{w})P(\tilde{x}|\tilde{w})'\right] + E[\tilde{u}\tilde{u}'] \;\ge\; E\left[P(\tilde{x}|\tilde{w})P(\tilde{x}|\tilde{w})'\right] \tag{96}
\]

We conclude that $\Omega_{2SLS}^{-1} - \Omega_{IV}^{-1} \ge 0$, and hence $\Omega_{IV} - \Omega_{2SLS} \ge 0$, i.e. 2SLS has the smallest asymptotic covariance matrix among all IV estimators. $\Box$

There is an alternative algebraic proof that $\Omega_{IV} - \Omega_{2SLS} \ge 0$. Given a square symmetric positive semidefinite matrix $A$ with Jordan decomposition $A = WDW'$ (where $W$ is an orthonormal matrix and $D$ is a diagonal matrix with diagonal elements equal to the eigenvalues of $A$), we can define its square root $A^{1/2}$ as

\[
A^{1/2} = WD^{1/2}W' \tag{97}
\]

where $D^{1/2}$ is a diagonal matrix whose diagonal elements equal the square roots of the diagonal elements of $D$. It is easy to verify that $A^{1/2}A^{1/2} = A$. Similarly, if $A$ is invertible we define $A^{-1/2}$ as the matrix $WD^{-1/2}W'$, where $D^{-1/2}$ is a diagonal matrix whose diagonal elements are the inverses of the square roots of the diagonal elements of $D$. It is easy to verify that $A^{-1/2}A^{-1/2} = A^{-1}$. Using these facts about matrix square roots, we can write

\[
\sigma^2\left[\Omega_{2SLS}^{-1} - \Omega_{IV}^{-1}\right] = B'MB, \qquad B \equiv \left(E[\tilde{z}\tilde{z}']\right)^{-1/2}E[\tilde{z}\tilde{x}'] \tag{98}
\]

where $M$ is the $J \times J$ matrix given by

\[
M = I - C(C'C)^{-1}C', \qquad C \equiv \left(E[\tilde{z}\tilde{z}']\right)^{1/2}\Gamma \tag{99}
\]

It is straightforward to verify that $M$ is idempotent, which implies that the right hand side of equation (98) is positive semidefinite. $\Box$

It follows that in terms of the asymptotics it is always better to use all available instruments $\tilde{z}$. However, the chapter in Davidson and MacKinnon shows that in terms of the finite sample performance of the IV estimator, using more instruments may not always be a good thing. It is easy to see that when the number of instruments $J$ gets sufficiently large, the IV estimator converges to the OLS estimator.

Exercise: Show that when $J = N$ and the columns of $Z$ are linearly independent, $\hat{\beta}_{2SLS} = \hat{\beta}_{OLS}$.

Exercise: Show that when $J = K$ and the columns of $Z$ are linearly independent, $\hat{\beta}_{2SLS} = \hat{\beta}_{IV} = (Z'X)^{-1}Z'y$.

However, there is a tension here, since using fewer instruments worsens the finite sample properties of the 2SLS estimator. A result due to Kinal (Econometrica, 1980) shows that the $r^{th}$ moment of the 2SLS estimator exists if and only if

\[
r < J - K + 1 \tag{100}
\]

Thus, if $J = K$, 2SLS (which coincides with the simple IV estimator by the exercise above) will not even have a finite mean. If we would like the 2SLS estimator to have a finite mean and variance, we should have at least 2 more instruments than endogenous regressors. See section 7.5 of Davidson and MacKinnon for further discussion and Monte Carlo evidence.

Exercise: Assume that the errors are homoscedastic. Is it the case in finite samples that the 2SLS estimator dominates the IV estimator in terms of the size of its estimated covariance matrix?

Hint: Note that under homoscedasticity, the inverses of the sample analog estimators of the covariance matrices for $\hat{\beta}_{IV}$ and $\hat{\beta}_{2SLS}$ are given by:

\[
\left[\hat{\Omega}_{IV}\right]^{-1} = \frac{1}{\hat{\sigma}^2_{IV}}\,X'P_W X, \qquad
\left[\hat{\Omega}_{2SLS}\right]^{-1} = \frac{1}{\hat{\sigma}^2_{2SLS}}\,X'P_Z X, \qquad
P_W \equiv W(W'W)^{-1}W' \tag{101}
\]

If we assume that $\hat{\sigma}^2_{IV} = \hat{\sigma}^2_{2SLS}$, then the relative finite sample covariance matrices for IV and 2SLS depend on the difference

\[
X'P_Z X - X'P_W X = X'\left(P_Z - P_W\right)X \tag{102}
\]

Show that if $W = Z\Gamma$ for some $J \times K$ matrix $\Gamma$, then $P_Z P_W = P_W$, and that this implies the difference $P_Z - P_W$ is idempotent.

Now consider a structural equation of the form

\[
\tilde{y} = \tilde{x}_1'\beta_1 + \tilde{x}_2'\beta_2 + \epsilon \tag{103}
\]

where the $K_1 \times 1$ random vector $\tilde{x}_1$ is known to be exogenous (i.e. $E[\tilde{x}_1\epsilon] = 0$), but the $K_2 \times 1$ random vector $\tilde{x}_2$ is suspected of being endogenous. It follows that $\tilde{x}_1$ can serve as instrumental variables for the $\tilde{x}_2$ variables.

Exercise: Is it possible to identify the $\beta_2$ coefficients using only $\tilde{x}_1$ as instrumental variables? If not, show why.

The answer to the exercise is clearly no: for example, 2SLS based on $\tilde{x}_1$ alone will result in a first stage projection $P(\tilde{x}_2|\tilde{x}_1)$ that is a linear function of $\tilde{x}_1$, so the second stage of 2SLS would encounter perfect multicollinearity. This shows that in order to identify $\beta_2$ we need additional instruments $\tilde{w}$ that are excluded from the structural equation (103). This results in a full instrument list $\tilde{z} = (\tilde{x}_1',\tilde{w}')'$ of size $J = K_1 + K_w$, where $K_w$ is the number of excluded instruments. The discussion above suggests that in order to identify $\beta_2$ we need $J \ge K_1 + K_2$, i.e. $K_w \ge K_2$; otherwise we have a multicollinearity problem in the second stage. In summary, to do instrumental variables we need instruments $\tilde{z}$ which are:

1. Uncorrelated with the error term $\epsilon$ in the structural equation (103),

2. Correlated with the included endogenous variables $\tilde{x}_2$, and

3. Contain components $\tilde{w}$ which are excluded from the structural equation (103).



