
Economics 551 Professor Rust
Spring 1999

Midterm Exam
(Due at start of class, February 24, 1999)

Part I: Regression Questions (computers not required for this part)

Do Question 1 and any 2 of the 3 remaining Part I questions below.

Question 1 (200 points).

a.
Compute the OLS estimates $\hat{\beta}$ for the following 2-variable linear regression problem:

displaymath188

b.
Unfortunately, there was one student who didn't know how to invert a $2 \times 2$ matrix. Thinking that it would be unnecessary to estimate tex2html_wrap_inline214 as a whole, he proposed the following estimation formulas:

tex2html_wrap_inline216 / tex2html_wrap_inline218 , and

tex2html_wrap_inline220 / tex2html_wrap_inline222 .

Calculate these "naive" estimators of tex2html_wrap_inline224 and tex2html_wrap_inline226 and compare them with those obtained in a.

c.
What do you think about his approach? Will it work generally? Will the naive estimators be unbiased and consistent? If not, specify the conditions needed to justify his approach. What's the intuition behind those conditions?

d.
How can you generalize your argument in c to the case of a k-variable linear regression model?

e.
Show that if a regression contains a set of K mutually exclusive dummy variables tex2html_wrap_inline230 where the variables are mutually exclusive in the sense that if tex2html_wrap_inline232 (observation i is 1 for the tex2html_wrap_inline236 dummy variable), then tex2html_wrap_inline238 for all tex2html_wrap_inline240 (i.e. all of the other dummy variables take the value 0 for observation i), then the K OLS regression estimates tex2html_wrap_inline248 in the regression

displaymath189

are given by

displaymath190

Show that tex2html_wrap_inline250 is just the mean of y over the subpopulation of individuals i with tex2html_wrap_inline232 .
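
The claim in part e can be checked numerically. The following is a minimal sketch in Python with made-up data; the function name dummy_ols and the encoding of the dummies as a group index are illustrative assumptions, not part of the exam. It relies on the fact that with non-overlapping dummies X'X is diagonal, so the normal equations decouple.

```python
# Hypothetical data: y values and, for each observation, the index (0..K-1)
# of the single dummy variable that equals 1.
y = [2.0, 4.0, 6.0, 1.0, 3.0, 10.0]
group = [0, 0, 0, 1, 1, 2]
K = 3

def dummy_ols(y, group, K):
    """OLS coefficients for y regressed on K mutually exclusive dummies.

    With non-overlapping dummies, X'X is diagonal (entry k is the number of
    observations whose k-th dummy is 1) and X'y holds the group sums of y,
    so each coefficient is a group sum divided by a group count.
    """
    counts = [0] * K      # diagonal of X'X
    sums = [0.0] * K      # entries of X'y
    for yi, k in zip(y, group):
        counts[k] += 1
        sums[k] += yi
    return [s / n for s, n in zip(sums, counts)]

beta_hat = dummy_ols(y, group, K)
print(beta_hat)  # each entry is the mean of y within its group
```

Each coefficient comes out as the within-group mean of y, which is what the question asks you to show in general.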

Question 2. (100 points) Consider the general multivariate regression model

displaymath191

a.
Suppose you obtain the OLS estimate of $\beta$, $\hat{\beta}$, and then compute tex2html_wrap_inline262 , the tex2html_wrap_inline264 vector of predicted values of y, and tex2html_wrap_inline268 , the tex2html_wrap_inline264 vector of error terms, tex2html_wrap_inline272 .

1.
What is the value of the inner product of $\hat{y}$ and tex2html_wrap_inline268 ?

2.
Justify your answer in part a-1 above. You can either use a geometric argument or an algebraic derivation. Can you give an intuitive explanation for your result?

3.
What are the implications of your answer for regression analysis?

b.
Consider now the quantity tex2html_wrap_inline278 .

1.
Show that c has to be between zero and one by using your answer to part a above. (Hint: use the Pythagorean theorem.)

2.
Does c provide any sort of measure of "goodness of fit" of the regression model? Explain your answer for full credit. What is the interpretation of the case where c=0? What is the interpretation of the case where c=1?
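
A small numerical illustration of the orthogonality and Pythagorean facts that Question 2 relies on, as a minimal sketch in Python with made-up data. The variable names are illustrative, and the ratio computed at the end is only one quantity that the decomposition bounds between 0 and 1; it is an assumption here, not necessarily the exam's definition of c.

```python
# Hypothetical data for a simple regression y = b0 + b1*x + error.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)

# Closed-form OLS for one regressor plus a constant.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]            # fitted values
resid = [yi - fi for yi, fi in zip(y, y_hat)]  # residuals

inner = sum(f * e for f, e in zip(y_hat, resid))  # inner product of fit and residual
yy = sum(yi ** 2 for yi in y)                     # squared length of y
fit = sum(f ** 2 for f in y_hat)                  # squared length of fitted vector
rss = sum(e ** 2 for e in resid)                  # squared length of residual vector
ratio = fit / yy                                  # assumed stand-in for a fit measure

print(inner, yy, fit + rss, ratio)
```

Up to floating-point rounding, the inner product is zero and the squared length of y splits into the squared lengths of the fitted and residual vectors.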

Question 3 (100 points). This question considers a regression through the origin and the connection with the geometric notion of projection in three dimensional space. You should be able to answer this question using simple matrix algebra, without the use of a computer. Consider the following model:

displaymath192

where

eqnarray74

a.
Write the model in the form tex2html_wrap_inline288 using matrices. Specify these matrices and their dimensions.

b.
Calculate (X'X) and its inverse. Verify that $(X'X)^{-1}(X'X) = I$ where I is the $2 \times 2$ identity matrix.

c.
Derive the least squares estimates $\hat{\beta}$ and the predicted value $\hat{y}$.

d.
In a three dimensional diagram, display the following:

1.
the subspace S spanned by the columns of X (shade this region).

2.
the vectors y, tex2html_wrap_inline306 , tex2html_wrap_inline308 and $\hat{y}$.

3.
the orthogonal projection of y onto the subspace S sketched in part d-1 above.

e.
Derive the vector of residuals tex2html_wrap_inline316 and calculate:

1.
tex2html_wrap_inline318 where tex2html_wrap_inline320 . Does tex2html_wrap_inline322 lie in the subspace spanned by the columns of X? Does your result for the value of tex2html_wrap_inline318 shed any light on this?

2.
tex2html_wrap_inline328 . Is this equal to zero? Explain your answer in either case.

f.
The orthogonal projection of y onto the subspace S spanned by the columns of the matrix X is given by $Py$, where $P = X(X'X)^{-1}X'$. Calculate Py and verify that $Py = \hat{y}$.

g.
A symmetric square matrix A is idempotent if A'A = A. Show that P given above is idempotent. Calculate the rank of P. Is P an invertible matrix?
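
The projection properties in parts f and g can be explored numerically. Below is a minimal sketch in Python; the 3 x 2 matrix X and the vector y are hypothetical (the exam's own data are not reproduced here), and the helper names matmul, transpose and inv2 are illustrative.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(M):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical data: two regressors, three observations, no constant.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [[1.0], [2.0], [6.0]]

Xt = transpose(X)
P = matmul(matmul(X, inv2(matmul(Xt, X))), Xt)   # P = X (X'X)^{-1} X'
PP = matmul(P, P)                                 # should reproduce P
y_hat = matmul(P, y)                              # projection of y onto span(X)
resid = [[yi[0] - fi[0]] for yi, fi in zip(y, y_hat)]
Xt_resid = matmul(Xt, resid)                      # should be a zero vector
trace_P = sum(P[i][i] for i in range(3))          # equals rank for a projection

print(y_hat, Xt_resid, trace_P)
```

For an idempotent matrix the trace equals the rank, so a trace of 2 for a 3 x 3 matrix is one quick way to see whether P can be inverted.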

Question 4. (100 points) The following questions concern OLS estimation of the general linear model

equation89

where y is tex2html_wrap_inline354 , X is tex2html_wrap_inline358 and tex2html_wrap_inline360 is an tex2html_wrap_inline354 vector of error terms.

a.
What happens when we try to do OLS when the tex2html_wrap_inline358 regressor matrix X has rank less than K? Is the X'X matrix invertible in this case? If not, does the OLS estimate $\hat{\beta}$ exist?

b.
Does the problem of multicollinearity have anything to do with the rank of X?

c.
Show that when X'X is not invertible there are generally infinitely many solutions to the normal equations for the OLS estimator.

d.
Does the best fitting predicted y, $\hat{y}$, exist when X'X is not invertible? If yes, can you provide a formula for $\hat{y}$ or a procedure for computing it?

e.
Define what is meant by the generalized inverse, tex2html_wrap_inline386 of a square matrix A. Is the tex2html_wrap_inline390 vector tex2html_wrap_inline392 a solution to the normal equations if X'X is not invertible?

f.
Describe the process of stepwise regression and discuss whether this procedure will allow us to compute the predicted values of the dependent variable, $\hat{y}$. If X'X is invertible, will the coefficients produced by stepwise regression coincide with the coefficients from the standard OLS formula $\hat{\beta} = (X'X)^{-1}X'y$?
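
To make the rank-deficient case in Question 4 concrete, here is a minimal sketch in Python with a hypothetical design matrix whose second column is twice the first. The helper names fitted and normal_eq_gap are illustrative. It checks several coefficient vectors against the normal equations X'(y - Xb) = 0.

```python
# Hypothetical rank-deficient design: the second column is twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
y = [1.0, 2.0, 4.0]

def fitted(beta):
    """Return the vector X*beta as a list."""
    return [sum(xij * bj for xij, bj in zip(row, beta)) for row in X]

def normal_eq_gap(beta):
    """Return X'(y - X*beta); a zero vector means beta solves the normal equations."""
    e = [yi - fi for yi, fi in zip(y, fitted(beta))]
    return [sum(row[j] * ei for row, ei in zip(X, e)) for j in range(2)]

# Only the combination b1 + 2*b2 is pinned down by the data:
# it equals sum(x1*y) / sum(x1^2) = 17/14 for these numbers.
g = 17.0 / 14.0
candidates = [[g - 2.0 * t, t] for t in (0.0, 1.0, -3.5)]

for beta in candidates:
    print(beta, normal_eq_gap(beta), fitted(beta))
```

Every candidate satisfies the normal equations and all of them give the same fitted values, which is the pattern parts c and d ask you to explain.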

Applied Regression/Instrumental Variable Questions (computers required for this part)

Do Question 0 and Question 1 or 2 and Question 3 or 4

Question 0. (200 points) Given an tex2html_wrap_inline402 matrix of instruments and an tex2html_wrap_inline358 matrix of endogenous regressors we form the instrumental variables estimator.

a.
What is the equation for the instrumental variables estimator? Consider separately the three cases, J=K, J > K and J < K.

b.
In the overidentified case, J > K, we have more instruments than endogenous regressors. Suppose we form a tex2html_wrap_inline358 matrix of instruments tex2html_wrap_inline416 for some tex2html_wrap_inline418 matrix tex2html_wrap_inline420 . Derive the formula for the class of IV estimators and show how it depends on the choice of tex2html_wrap_inline420 . Is there an "optimal" choice for tex2html_wrap_inline420 ? If so, describe what the optimal tex2html_wrap_inline420 is and in what sense this choice is optimal.

c.
Sketch the argument for showing the consistency and asymptotic normality of the IV estimator for two cases: 1) the homoscedastic case, and 2) the heteroscedastic case.

d.
Justify your answer in part b above by showing that in the homoscedastic case your choice of tex2html_wrap_inline420 results in an IV estimator that has the smallest asymptotic covariance matrix among all IV estimators.

Question 1. (100 points) Consider the vector of observations contained in the file pop (populations in each of the 50 states, which you can also find in the tex2html_wrap_inline430 directory at the Statlab). Load it and call it X.

a.
Compute (X'R X)/49, where R is given by tex2html_wrap_inline438 where I is the tex2html_wrap_inline442 identity matrix and tex2html_wrap_inline322 is a tex2html_wrap_inline446 vector of ones. Could (X'RX)/49 ever be negative? Why or why not?

b.
Compute the sample standard deviation of the population of the U.S. and the sample variance by using simple Gauss commands.

c.
Compare your result in a. with your result in b. Are the answers to a. and b. the same or different? If they are the same, provide an explanation for why this is the case.

(HINT: you might want to examine X'RX. Recall the properties of the matrix R, namely R= R'R = R*R, and see what R does to X).
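
The hint can be tried out on a small example before turning to the pop file. The sketch below is in Python rather than Gauss and uses five made-up numbers instead of the 50 state populations. It assumes R is the usual demeaning matrix, the identity minus (1/n) times a matrix of ones, which is consistent with the property R = R'R stated in the hint but is an assumption about the formula for R.

```python
import statistics

# Hypothetical data standing in for the population vector X.
x = [4.0, 7.0, 1.0, 10.0, 3.0]
n = len(x)

# Assumed form of R: identity minus (1/n) * (matrix of ones).
R = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]

# R applied to x subtracts the sample mean from every element.
Rx = [sum(R[i][j] * x[j] for j in range(n)) for i in range(n)]

# Quadratic form x'Rx divided by n - 1 (the exam uses 49 because n = 50).
quad = sum(xi * rxi for xi, rxi in zip(x, Rx)) / (n - 1)

print(Rx)
print(quad, statistics.variance(x))
```

Comparing the two printed numbers, and looking at what Rx contains, suggests the explanation asked for in part c.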

Question 2 (100 points). Consider the set of hypothetical data on the regression model below.

displaymath193

where

displaymath194

a.
Can you compute OLS estimates of the three unknowns tex2html_wrap_inline460 ?

b.
Is there a problem of multicollinearity in this regression? If not, show that the columns of the X matrix are linearly independent. If so, show that the columns of the X matrix are linearly dependent.

c.
Throwing out any redundant columns of the X matrix if necessary, what is the $R^2$ of the regression?

d.
Suppose that there are two students in the Econ 551 class, whose names are Jim and Tom. Suppose further that they estimated the parameters tex2html_wrap_inline224 , tex2html_wrap_inline226 and tex2html_wrap_inline474 by trial and error. As a result, however, Tom and Jim got different answers, i.e., (-6,-10,6) and (-10,-2,2), respectively, and each of them argues that his answer is correct. What do you think about these two answers? Which answer fits the data better (in the sense of having a higher $R^2$)?

Question 3. (200 points) One researcher wants to estimate the money demand equation by the following regression:

displaymath195

where: tex2html_wrap_inline482 real money balances (i.e., nominal money balances deflated by the price level), tex2html_wrap_inline484 real GNP (i.e., nominal GNP deflated by the price level) and tex2html_wrap_inline486 nominal interest rates.

a.
What signs of the $\beta$'s do you expect from economic theory? Explain why.

b.
Run the above regression using data files accessible via anonymous ftp from gemini.econ.yale.edu in the subdirectory pub/John_Rust/courses/econ161/stats/timedat/fmt or in the i:\Spring99\econ161\timedat\fmt directory at Statlab. You will be using the following variables: GNP=Nominal GNP, CPI=Price level, R_3MO=Interest rates, and M2=Nominal money balances. What are your estimates of the $\beta$'s? On the basis of this evidence, what do you conclude about the validity of the money demand equation given above?

c.
Now another researcher is estimating a somewhat different version of the money demand equation given by

displaymath196

where tex2html_wrap_inline508 nominal money balances, tex2html_wrap_inline486 nominal interest rate, tex2html_wrap_inline512 nominal GNP, and tex2html_wrap_inline514 price level. What signs of the $\beta$'s do you expect? Why?

d.
Run the regression in part c using the data files given in b. What are your estimates of the $\beta$'s? On the basis of this evidence, what do you conclude about the validity of the money demand equation given above?

e.
Compare the two regression models in terms of $R^2$ and/or the plausibility of the estimated coefficients.

f.
Run a simple regression of aggregate consumption, C, on a constant and GNP. What is your estimate of the marginal propensity to consume?

g.
Using the residuals calculated from your regression in part f above, compute the serial correlation coefficient of the regression residuals (i.e. compute tex2html_wrap_inline524 ). Are the residuals serially uncorrelated, or negatively or positively correlated? Does your finding contradict the normal equations that show that the residuals in a regression should be "unpredictable" in the sense of being uncorrelated with the independent variables in the regression?
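
One common way to compute a first-order serial correlation coefficient is to regress each residual on its own lag without a constant. The Python sketch below does this for made-up residuals. The formula in part g did not survive in this copy of the exam, so both the formula and the function name serial_corr should be read as assumptions.

```python
def serial_corr(e):
    """First-order serial correlation: slope from regressing e[t] on e[t-1].

    Assumed formula: sum over t >= 2 of e[t]*e[t-1], divided by the
    sum over t >= 2 of e[t-1]**2.
    """
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
    return num / den

# Hypothetical residuals that drift slowly (positive serial correlation).
smooth = [1.0, 0.5, -0.5, -1.0, -0.5, 0.5]
# Hypothetical residuals that flip sign every period (negative serial correlation).
alternating = [1.0, -1.0, 1.0, -1.0]

rho_smooth = serial_corr(smooth)
rho_alt = serial_corr(alternating)
print(rho_smooth, rho_alt)
```

Note that this statistic measures how residuals relate to their own past values, which is a different question from whether they are correlated with the regressors.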

Question 4 (200 points) Because a large number of buyers and sellers interact in a market for a nearly homogeneous good, the market for soybeans is nearly perfectly competitive. Contracts for soybeans are traded on the Chicago Board of Trade, and the daily market or equilibrium price of soybeans is known as the spot price. Demand for soybeans is a function of the price of soybeans, p, and personal income, y. Assume that the aggregate demand curve for soybeans is linear:

displaymath197

where tex2html_wrap_inline530 is the quantity of soybeans demanded, p is the market price of soybeans, y is per capita income and tex2html_wrap_inline536 represents other unobserved factors affecting the demand for soybeans. Assume the supply of soybeans tex2html_wrap_inline538 is also a linear function of price, average rainfall r, and other factors tex2html_wrap_inline542 :

displaymath198

a.
What does economic theory (or common sense) tell us about the signs of the coefficients tex2html_wrap_inline544 of the demand curve? That is, do we expect the tex2html_wrap_inline546 coefficients to be negative, positive, or zero? (Explain your reasoning for full credit).

b.
What does economic theory (or common sense) tell us about the signs of the coefficients tex2html_wrap_inline548 of the supply curve? That is, do we expect the tex2html_wrap_inline550 coefficients to be negative, positive, or zero? (Explain your reasoning for full credit).

c.
The file soy.asc is available via anonymous ftp from gemini.econ.yale.edu at pub/John_Rust/courses/econ161/soy.asc (the easiest way to get the data is simply to click on the soy.asc hyperlink on the version of this problem set on the Econ 551 web page). This data set contains 200 monthly observations of soybean market prices, quantities traded, per capita income y, and average rainfall r. Retrieve these data and estimate the parameters tex2html_wrap_inline556 and $\beta$ by running OLS on the demand and supply equations separately. Report standard errors.

d.
Do the results from OLS confirm or disconfirm the hypotheses you have made in parts a and b? Explain why you are not getting the expected results.

e.
Propose an estimator other than OLS that can improve your results. Explain the theory behind the improvement.

f.
Provide estimates and standard errors of estimates using the method proposed in part e. State clearly how the method proposed in part e is implemented for this particular problem, and with this particular data. Summarize your estimation results. Do the new results confirm the hypotheses?
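
As background for parts d through f, the Python sketch below simulates a stripped-down market and compares OLS with a simple instrumental variables estimate of the demand slope. Everything in it is hypothetical: the parameter values, the omission of income from the demand curve, and the use of rainfall as the single instrument are simplifying assumptions, and the simulated data are not the soy.asc data.

```python
import random

random.seed(0)
n = 5000
a0, a1 = 10.0, -1.0          # hypothetical demand: q = a0 + a1*p + u
b0, b1, b2 = 2.0, 1.0, 1.0   # hypothetical supply: q = b0 + b1*p + b2*r + v

p, q, r = [], [], []
for _ in range(n):
    ri = random.gauss(0, 1)   # rainfall, shifts supply only
    u = random.gauss(0, 1)    # unobserved demand shock
    v = random.gauss(0, 1)    # unobserved supply shock
    # Market-clearing price: set demand equal to supply and solve for p.
    pi = (a0 - b0 + u - v - b2 * ri) / (b1 - a1)
    p.append(pi)
    q.append(a0 + a1 * pi + u)
    r.append(ri)

def cov(x, y):
    """Sample covariance of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (len(x) - 1)

ols_slope = cov(p, q) / cov(p, p)   # OLS of q on p: price is correlated with u
iv_slope = cov(r, q) / cov(r, p)    # IV using rainfall as the instrument for price

print(ols_slope, iv_slope)
```

In this simulation the OLS slope is pulled away from the true demand slope of -1 because price responds to the demand shock, while the rainfall-based estimate stays close to it. With the exam's model, which also includes income, the same idea would be implemented with a multi-regressor version of the estimator.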





econ551
Mon Feb 22 15:32:01 EST 1999