
Economics 551 Professor Rust
Spring 1999

Midterm Exam
(Due at start of class, February 24, 1999)

Part I: Regression Questions (computers not required for this part)

Do Question 1 and any 2 of the 3 remaining Part I questions below.

Question 1 (200 points).

a.

Compute the OLS estimates $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2)'$ for the following 2-variable linear regression problem:

displaymath188

b.

Unfortunately, there was one student who didn't know how to invert a matrix. Thinking that it would be unnecessary to estimate $\beta$ as a whole, he proposed the following estimation formulas:

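$\tilde{\beta}_1 = \sum_i x_{1i} y_i \big/ \sum_i x_{1i}^2$, and

$\tilde{\beta}_2 = \sum_i x_{2i} y_i \big/ \sum_i x_{2i}^2$.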

Calculate these ``naive'' estimators of $\beta_1$ and $\beta_2$ and compare them with those obtained in part a.

c.

What do you think about his approach? Will it work generally? Will the naive estimators be unbiased and consistent? If not, specify the conditions needed to justify his approach. What's the intuition behind those conditions?

d.

How can you generalize your argument in part c to the case of a k-variable linear regression model?
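As a numerical illustration of parts b through d, the following Python sketch compares one-regressor-at-a-time estimators with full OLS. It assumes the naive formulas have the form x_j'y / x_j'x_j; the simulated data, the coefficient values and the helper names `ols` and `naive` are all hypothetical and are not the exam data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

def ols(X, y):
    # Solve the normal equations X'X b = X'y without forming an explicit inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

def naive(X, y):
    # One regressor at a time: b_j = x_j'y / x_j'x_j (assumed form of the naive formulas).
    return (X.T @ y) / np.sum(X * X, axis=0)

# Case 1: exactly orthogonal regressors (x1'x2 = 0 by construction).
x1 = rng.normal(size=n)
z = rng.normal(size=n)
x2_orth = z - x1 * (x1 @ z) / (x1 @ x1)   # remove the component of z along x1
X_orth = np.column_stack([x1, x2_orth])

# Case 2: correlated regressors.
x2_corr = 0.8 * x1 + 0.6 * rng.normal(size=n)
X_corr = np.column_stack([x1, x2_corr])

beta = np.array([1.0, -2.0])              # hypothetical true coefficients
y_orth = X_orth @ beta + rng.normal(size=n)
y_corr = X_corr @ beta + rng.normal(size=n)

gap_orth = np.max(np.abs(naive(X_orth, y_orth) - ols(X_orth, y_orth)))
gap_corr = np.max(np.abs(naive(X_corr, y_corr) - ols(X_corr, y_corr)))
print("orthogonal regressors, max |naive - OLS|:", gap_orth)
print("correlated regressors, max |naive - OLS|:", gap_corr)
```

In this sketch the two estimators agree up to rounding error when X'X is diagonal and differ noticeably when the regressors are correlated.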

e.

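Show that if a regression contains a set of K mutually exclusive dummy variables $D_{i1}, \ldots, D_{iK}$, where the variables are mutually exclusive in the sense that if $D_{ij} = 1$ (observation i is 1 for the $j^{th}$ dummy variable), then $D_{ik} = 0$ for all $k \neq j$ (i.e. all of the other dummy variables take the value 0 for observation i), then the K OLS regression estimates $\hat{\beta}_1, \ldots, \hat{\beta}_K$ in the regression

$$y_i = \beta_1 D_{i1} + \cdots + \beta_K D_{iK} + \epsilon_i$$

are given by

$$\hat{\beta}_j = \frac{\sum_i D_{ij} y_i}{\sum_i D_{ij}}, \quad j = 1, \ldots, K.$$

Show that $\hat{\beta}_j$ is just the mean of y over the subpopulation of individuals i with $D_{ij} = 1$.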
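The claim in part e can be checked numerically before proving it. The Python sketch below uses simulated group labels and outcomes (made up for illustration, with hypothetical variable names) and compares the OLS coefficients on a full set of mutually exclusive dummies with the group means of y.

```python
import numpy as np

rng = np.random.default_rng(1)
n, K = 300, 4
group = rng.integers(0, K, size=n)     # hypothetical group label for each observation
group[:K] = np.arange(K)               # make sure every group is non-empty
D = np.eye(K)[group]                   # n x K matrix of mutually exclusive dummies
y = rng.normal(loc=group.astype(float), size=n)

# D'D is diagonal with the group counts on the diagonal, so OLS decouples by group.
beta_hat = np.linalg.solve(D.T @ D, D.T @ y)
group_means = np.array([y[group == j].mean() for j in range(K)])
print(beta_hat)
print(group_means)
```

The two printed vectors should coincide, which is the pattern the algebraic argument has to explain.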

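Question 2. (100 points) Consider the general multivariate regression model

$$y = X\beta + \epsilon.$$

a.

Suppose you compute the OLS estimate $\hat{\beta}$ of $\beta$, and then compute $\hat{y} = X\hat{\beta}$, the $N \times 1$ vector of predicted values of y, and $\hat{\epsilon} = y - X\hat{\beta}$, the $N \times 1$ vector of estimated error terms (residuals).

1.: What is the value of the inner product of $\hat{y}$ and $\hat{\epsilon}$?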
2.: Justify your answer in part a-1 above. You can either use a geometric argument or an algebraic derivation. Can you give an intuitive explanation for your result?
3.: What are the implications of your answer for regression analysis?

b.

Consider now the quantity

$$c = \frac{\hat{y}'\hat{y}}{y'y}.$$

1.: Show that c has to be between zero and one by using your answer to part a above. (Hint: use the Pythagorean theorem.)
2.: Does c provide any sort of measure of ``goodness of fit'' of the regression model? Explain your answer for full credit. What is the interpretation of the case where c=0? What is the interpretation of the case where c=1?
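A quick numerical check of Question 2 can be sketched in Python. The data below are simulated, and the definition c = (yhat'yhat)/(y'y) is an assumption based on the Pythagorean hint, not something the extracted text confirms.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 100, 3
X = rng.normal(size=(n, K))                   # hypothetical regressors
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat
resid = y - y_hat

inner = y_hat @ resid                         # zero up to rounding error
c = (y_hat @ y_hat) / (y @ y)                 # assumed definition of c
print(inner, c)
# Pythagorean decomposition: y'y = y_hat'y_hat + resid'resid
print(y @ y, y_hat @ y_hat + resid @ resid)
```

Because the fitted values and residuals are orthogonal, the two sums of squares on the last line match, which bounds c between zero and one.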

Question 3 (100 points). This question considers a regression through the origin and the connection with the geometric notion of projection in three dimensional space. You should be able to answer this question using simple matrix algebra, without the use of a computer. Consider the following model:

where

eqnarray74

a.

Write the model in the form $y = X\beta + \epsilon$ using matrices. Specify these matrices and their dimensions.

b.

Calculate (X'X) and its inverse. Verify that $(X'X)^{-1}(X'X) = I$, where I is the $2 \times 2$ identity matrix.

c.

Derive the least squares estimates $\hat{\beta} = (X'X)^{-1}X'y$ and the predicted value $\hat{y} = X\hat{\beta}$.

d.

In a three dimensional diagram, display the following:

1.: the subspace S spanned by the columns of X (shade region).
2.: the vectors y, and and .
3.: the orthogonal projection of y onto the subspace S sketched in part d-1 above.

e.

Derive the vector of residuals $\hat{\epsilon} = y - \hat{y}$ and calculate:

1.: where . Does lie in the subspace spanned by the columns of X? Does your result for the value of shed any light on this?
2.: . Is this equal to zero? Explain your answer in either case.

f.

The orthogonal projection of y into the subspace S spanned by the columns of the matrix X is given by $Py$, where $P = X(X'X)^{-1}X'$. Calculate Py and verify that $Py = \hat{y}$.

g.

A symmetric square matrix A is idempotent if A'A = A. Show that P given above is idempotent. Calculate the rank of P. Is P an invertible matrix?
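The properties asked about in parts f and g can be explored numerically. The Python sketch below uses a made-up 3 x 2 matrix X and vector y (hypothetical values, not the exam data) and assumes P = X(X'X)^{-1}X'.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # hypothetical 3 x 2 regressor matrix, not the exam data
y = np.array([1.0, 2.0, 4.0])     # hypothetical y

XtX_inv = np.linalg.inv(X.T @ X)
P = X @ XtX_inv @ X.T             # projection onto the column space of X
beta_hat = XtX_inv @ X.T @ y
y_hat = X @ beta_hat
resid = y - y_hat

print("P symmetric: ", np.allclose(P, P.T))
print("P idempotent:", np.allclose(P @ P, P))
print("rank of P:   ", np.linalg.matrix_rank(P))
print("Py = y_hat:  ", np.allclose(P @ y, y_hat))
print("X'resid = 0: ", np.allclose(X.T @ resid, 0.0))
```

With two linearly independent columns in three-dimensional space, P projects onto a plane, so its rank is below its dimension.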

Question 4. (100 points) The following questions concern OLS estimation of the general linear model

$$y = X\beta + \epsilon$$

where y is $N \times 1$, X is $N \times K$ and $\epsilon$ is an $N \times 1$ vector of error terms.

a.: What happens when we try to do OLS when the regressor matrix X has rank less than K? Is the X'X matrix invertible in this case? If not, does the OLS estimate exist?
b.: Does the problem of multicollinearity have anything to do with the rank of X?
c.: Show that when X'X is not invertible there are generally infinitely many solutions to the normal equations for the OLS estimator.
d.: Does the best fitting predicted y, $\hat{y}$, exist when X'X is not invertible? If yes, can you provide a formula for $\hat{y}$ or a procedure for computing it?
e.: Define what is meant by the generalized inverse, $A^{-}$, of a square matrix A. Is the vector $\hat{\beta} = (X'X)^{-}X'y$ a solution to the normal equations if X'X is not invertible?
f.: Describe the process of stepwise regression and discuss whether this procedure will allow us to compute the predicted values of the dependent variable, $\hat{y}$. If X'X is invertible, will the coefficients produced by stepwise regression coincide with the coefficients from the standard OLS formula $\hat{\beta} = (X'X)^{-1}X'y$?
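Parts a through e can be illustrated with a small rank-deficient example. The Python sketch below builds a simulated X whose third column is the sum of the first two (a hypothetical design) and uses the Moore-Penrose pseudoinverse as one particular generalized inverse. The tolerance `rcond=1e-10` is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2, x1 + x2])      # third column is redundant: rank(X) = 2 < K = 3
y = rng.normal(size=n)

A, b = X.T @ X, X.T @ y                     # normal equations: A beta = b
beta_pinv = np.linalg.pinv(A, rcond=1e-10) @ b   # Moore-Penrose generalized inverse solution
null_dir = np.array([1.0, 1.0, -1.0])       # X @ null_dir = 0
beta_other = beta_pinv + 5.0 * null_dir     # a different solution to the normal equations

print("rank of X:", np.linalg.matrix_rank(X))
print("both solve the normal equations:",
      np.allclose(A @ beta_pinv, b), np.allclose(A @ beta_other, b))
print("same fitted values:", np.allclose(X @ beta_pinv, X @ beta_other))
```

Adding any multiple of a null-space direction gives another solution, yet the fitted values are unchanged in this example.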

Part II: Applied Regression/Instrumental Variable Questions (computers required for this part)

Do Question 0, either Question 1 or 2, and either Question 3 or 4.

Question 0. (200 points) Given an $N \times J$ matrix of instruments $Z$ and an $N \times K$ matrix of endogenous regressors $X$ we form the instrumental variables estimator.

a.: What is the equation for the instrumental variables estimator? Consider separately the three cases, J=K, J > K and J < K.
b.: In the overidentified case, J > K, we have more instruments than endogenous regressors. Suppose we form a matrix of instruments $W = Z\Gamma$ for some $J \times K$ matrix $\Gamma$. Derive the formula for the class of IV estimators and show how it depends on the choice of $\Gamma$. Is there an ``optimal'' choice for $\Gamma$? If so, describe what the optimal $\Gamma$ is and in what sense this choice is optimal.
c.: Sketch the argument for showing the consistency and asymptotic normality of the IV estimator for two cases: 1) the homoscedastic case, and 2) the heteroscedastic case.
d.: Justify your answer in part b above by showing that in the homoscedastic case your choice of $\Gamma$ results in an IV estimator that has the smallest asymptotic covariance matrix among all IV estimators.
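For the overidentified case, one member of the class of IV estimators is two-stage least squares, which uses the fitted values from regressing X on Z as instruments. The Python sketch below applies it to simulated data; the data-generating process, sample size and coefficient values are all hypothetical, and the sketch is not a substitute for the derivation requested above.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
Z = rng.normal(size=(n, 3))                     # J = 3 hypothetical instruments
u = rng.normal(size=n)                          # structural error
v = 0.8 * u + 0.6 * rng.normal(size=n)          # first-stage error, correlated with u
x = Z @ np.array([1.0, 0.5, -0.5]) + v          # K = 1 endogenous regressor
X = x[:, None]
beta_true = 2.0
y = beta_true * x + u

# OLS ignores the correlation between x and u.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Two-stage least squares: project X on Z, then use the fitted values as instruments.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
print("OLS: ", beta_ols)
print("2SLS:", beta_2sls)
```

In this simulation the OLS estimate is pulled away from the true coefficient, while the 2SLS estimate stays close to it.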

Question 1. (100 points) Consider the vector of observations contained in the file pop (populations in each of the 50 states, which you also find in the directory at the Statlab). Load it and call it X.

a.

Compute (X'R X)/49, where R is given by

$$R = I - \iota(\iota'\iota)^{-1}\iota'$$

where I is the $50 \times 50$ identity matrix and $\iota$ is a $50 \times 1$ vector of ones. Could (X'RX)/49 ever be negative? Why or why not?

b.

Compute the sample standard deviation of the population of the U.S. and the sample variance by using simple Gauss commands.

c.

Compare your result in a. with your result in b. Are the answers to a. and b. the same or different? If they are the same, provide an explanation for why this is the case.

(HINT: you might want to examine X'RX. Recall the properties of the matrix R, namely R= R'R = R*R, and see what R does to X).
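The exam asks for Gauss commands. As a language-neutral illustration, the Python sketch below performs the same comparison on made-up numbers (not the contents of the pop file), assuming R is the usual centering matrix I minus the outer product of the ones vector divided by n.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
X = rng.uniform(0.5, 30.0, size=n)      # made-up "populations", NOT the contents of the pop file
iota = np.ones(n)
R = np.eye(n) - np.outer(iota, iota) / n   # assumed form of R (centering matrix)

quad = X @ R @ X / (n - 1)
print(quad, np.var(X, ddof=1), np.std(X, ddof=1) ** 2)
print("R X equals X minus its mean:", np.allclose(R @ X, X - X.mean()))
print("R symmetric and idempotent: ", np.allclose(R, R.T), np.allclose(R @ R, R))
```

Here `ddof=1` requests the divisor n - 1, matching the 49 in the question.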

Question 2 (100 points). Consider the set of hypothetical data on the regression model below.

where

displaymath194

a.: Can you compute OLS estimates of the three unknown coefficients?
b.: Is there a problem of multicollinearity in this regression? If not, show that the columns of the X matrix are linearly independent. If so, show that the columns of the X matrix are linearly dependent.
c.: Throwing out any redundant columns of the X matrix if necessary, what is the $R^2$ of the regression?
d.: Suppose that there are two students in the Econ 551 class, whose names are Jim and Tom. Suppose further that they estimated the three parameters by trial and error. Tom and Jim, however, got different answers, namely (-6,-10,6) and (-10,-2,2), respectively, and each of them argues that his answer is correct. What do you think about these two answers? Which answer fits the data better (in the sense of having a higher $R^2$)?

Question 3. (200 points) One researcher wants to estimate the money demand equation by the following regression:

where: real money balances (i.e., nominal money balance deflated by the price level), real GNP(i.e., nominal GNP deflated by the price level) and nominal interest rates.

a.

What signs of the coefficients do you expect from economic theory? Explain why.

b.

Run the above regression using data files accessible via anonymous ftp from gemini.econ.yale.edu in the subdirectory pub/John_Rust/courses/econ161/stats/timedat/fmt or in the i: Spring99 econ161 timedat fmt directory at Statlab. You will be using the following variables: GNP=Nominal GNP, CPI=Price level, R_3MO=Interest rates, and M2=Nominal money balances. What are your estimates of the coefficients? On the basis of this evidence, what do you conclude about the validity of the money demand equation given above?

c.

Now another researcher is estimating a somewhat different version of the money demand equation, given by

where nominal money balances, nominal interest rate, nominal GNP, and price level. What signs of the coefficients do you expect? Why?

d.

Run the regression in part c using the data files given in b. What are your estimates of the coefficients? On the basis of this evidence, what do you conclude about the validity of the money demand equation given above?

e.

Compare the two regression models in terms of $R^2$ and/or the plausibility of the estimated coefficients.

f.

Run a simple regression of aggregate consumption, C, on a constant and GNP. What is your estimate of the marginal propensity to consume?

g.

Using the residuals calculated from your regression in part f above, compute the serial correlation coefficient of the regression residuals (i.e. compute the sample correlation between $\hat{\epsilon}_t$ and $\hat{\epsilon}_{t-1}$). Are the residuals serially uncorrelated, or negatively or positively correlated? Does your finding contradict the normal equations that show that the residuals in a regression should be ``unpredictable'' in the sense of being uncorrelated with the independent variables in the regression?
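The calculation in parts f and g can be sketched in Python on simulated series. The "GNP" and "consumption" data below are made up and are not the Statlab files. The ratio used for the serial correlation coefficient is one common version of the estimator, chosen here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200
gnp = np.cumsum(1.0 + rng.normal(size=T)) + 100     # made-up trending "GNP" series
e = np.zeros(T)
for t in range(1, T):                               # AR(1) errors with coefficient 0.8
    e[t] = 0.8 * e[t - 1] + rng.normal()
cons = 5.0 + 0.6 * gnp + e                          # made-up "consumption" series

X = np.column_stack([np.ones(T), gnp])
b = np.linalg.solve(X.T @ X, X.T @ cons)
resid = cons - X @ b

# First-order serial correlation: sum of e_t * e_{t-1} over sum of e_t^2.
rho_hat = (resid[1:] @ resid[:-1]) / (resid @ resid)
print("slope:", b[1], "rho_hat:", rho_hat)
print("X'resid:", X.T @ resid)
```

The residuals are orthogonal to the regressors by construction, while their correlation with their own lag is a separate quantity that the normal equations do not restrict.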

Question 4 (200 points) Due to the fact that a large number of buyers and sellers interact in a market for a nearly homogeneous good, the market for soybeans is nearly perfectly competitive. Contracts for soybeans are traded on the Chicago Board of Trade, and the daily market or equilibrium price of soybeans is known as the spot price. Demand for soybeans is a function of the price of soybeans, p, and personal income, y. Assume that the aggregate demand curve for soybeans is linear:

$$q^d = \alpha_0 + \alpha_1 p + \alpha_2 y + \epsilon^d$$

where $q^d$ is the quantity of soybeans demanded, p is the market price of soybeans, y is per capita income and $\epsilon^d$ represents other unobserved factors affecting the demand for soybeans. Assume the supply of soybeans is also a linear function of price, average rainfall r, and other factors $\epsilon^s$:

$$q^s = \beta_0 + \beta_1 p + \beta_2 r + \epsilon^s$$

a.: What does economic theory (or common sense) tell us about the signs of the coefficients of the demand curve? That is, do we expect the coefficients to be negative, positive or zero? (Explain your reasoning for full credit).
b.: What does economic theory (or common sense) tell us about the signs of the coefficients of the supply curve? That is, do we expect the coefficients to be negative, positive or zero? (Explain your reasoning for full credit).
c.: The file soy.asc is available via anonymous ftp at gemini.econ.yale.edu in the subdirectory pub/John_Rust/courses/econ161/soy.asc (the easiest way to get the data is simply to click on the soy.asc hyperlink on the version of this problem set on the Econ 551 web page). This data set contains 200 monthly observations of soybean market prices, quantities traded, per capita income y, and average rainfall, r. Retrieve these data and estimate the parameters $\alpha$ and $\beta$ by running OLS on the demand and supply equations separately. Report standard errors.
d.: Do the results from OLS confirm or disconfirm the hypotheses you have made in parts a and b? Explain why you are not getting the expected results.
e.: Propose an estimator other than OLS that can improve your results. Explain the theory behind the improvement.
f.: Provide estimates and standard errors of estimates using the method proposed in part e. State clearly how the method proposed in part e is implemented for this particular problem, and with this particular data. Summarize your estimation results. Do the new results confirm the hypotheses?
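The issue behind parts d through f can be seen in a small simulation. The Python sketch below generates equilibrium prices and quantities from a hypothetical linear demand and supply system (all coefficient values and data are made up, and soy.asc is not used). It then estimates the demand equation by OLS and by just-identified IV with rainfall as the instrument for price, which is one possible estimator rather than the required answer.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
inc = rng.normal(10.0, 2.0, size=n)     # per capita income (made up)
rain = rng.normal(5.0, 1.0, size=n)     # rainfall (made up)
u = rng.normal(size=n)                  # demand shock
v = rng.normal(size=n)                  # supply shock

a0, a1, a2 = 20.0, -1.0, 0.5            # hypothetical demand: q = a0 + a1*p + a2*inc + u
b0, b1, b2 = 2.0, 1.0, 1.0              # hypothetical supply: q = b0 + b1*p + b2*rain + v

# Market clearing: set demand equal to supply and solve for the price.
p = (a0 - b0 + a2 * inc - b2 * rain + u - v) / (b1 - a1)
q = a0 + a1 * p + a2 * inc + u

X = np.column_stack([np.ones(n), p, inc])       # demand-equation regressors
Z = np.column_stack([np.ones(n), rain, inc])    # rainfall instruments for price

ols_demand = np.linalg.solve(X.T @ X, X.T @ q)
iv_demand = np.linalg.solve(Z.T @ X, Z.T @ q)   # just-identified IV: (Z'X)^{-1} Z'q
print("OLS:", ols_demand)
print("IV: ", iv_demand)
```

Because price depends on the demand shock through market clearing, the OLS price coefficient is biased in this simulation, while the IV estimate stays near the value used to generate the data.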


econ551
Mon Feb 22 15:32:01 EST 1999