This course applies the probabilistic and limit-theoretic tools (WLLN, SLLN, CLT, etc.) presented
in Economics 551-A to conduct inference in a wide class of econometric
models. The course will focus on applications of econometric
methods to substantive problems, although we will discuss a number of
general ``philosophical'' issues at various points in the course.
The first issue is whether one ought to use Bayesian or classical methods of inference. I will briefly cover Bayesian methods, which have been
revitalized by recent developments in Monte Carlo simulation and
numerical integration. Nevertheless, Bayesian methods are still
computationally burdensome
and heavily linked to particular parametric functional forms,
limiting their applicability to semi- and nonparametric problems (discussed
further below). The primary focus of
this course is on classical
statistical inference using large-sample asymptotics to derive approximate
sampling distributions of various estimators.
The second issue is whether one ought to use
parametric, semi-parametric, or non-parametric estimation methods. The
issue is best framed as a trade-off between efficient estimation under strong
a priori assumptions about the underlying probabilistic structure (with
a consequent risk that the estimator will be inconsistent if these assumptions
are violated) versus consistent estimation under weak a priori assumptions (at the cost of slower rates of convergence and/or less efficient estimation of any particular
probabilistic structure). I argue that we do not face
an ``all or nothing'' choice between parametric and nonparametric methods;
rather, the problem is to select an
appropriate method from an ``estimation possibility'' frontier depending
on the strength of the prior assumptions we are willing to impose in
any particular problem. A convenient way to trace out this frontier is
via parametric ``flexible functional
forms'' that are capable of approximating general probabilistic structures
arbitrarily well as the number of parameters increases. Examples of this
approach include nonlinear regression using series approximations, neural
networks with a variable number of ``hidden units'', and ``sieve'' methods
in which the parameter space grows at an appropriate rate with the sample
size, such as maximum likelihood estimation based on Hermite series
expansions about a Gaussian kernel. In fact, we will
show that these ``flexible'' methods are generally the only feasible
way to go about non-parametric estimation since direct estimation by
optimizing an estimation criterion over an
infinite-dimensional space is typically an ``ill-posed'' problem.
Nearly all of the ``well-posed'' methods
rely either on rules fixing the rate at which
the dimension of the parameter space increases with sample size
(or, in the case of kernels and other smoothing methods, the rate
at which ``bandwidths'' and other smoothing parameters tend
to zero with sample size), or on penalty functions and
other data-driven procedures for determining how many parameters to include
in the estimation so as to avoid ``overfitting'' the data.
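To fix ideas, one concrete version of the Hermite-series approach mentioned above (a sketch in the spirit of Gallant and Nychka's seminonparametric density estimator) approximates an unknown density by a squared polynomial of order $K$ multiplied by a Gaussian kernel,
\[
f_K(x) \;=\; \frac{\bigl[\sum_{k=0}^{K} \theta_k x^k\bigr]^2 \phi(x)}
{\int \bigl[\sum_{k=0}^{K} \theta_k u^k\bigr]^2 \phi(u)\,du},
\]
where $\phi$ denotes the standard normal density. For each fixed $K$ the coefficients $(\theta_0,\dots,\theta_K)$ are estimated by maximum likelihood, and letting $K = K_n$ grow with the sample size (either at a prescribed rate or via a data-driven penalty) turns this flexible parametric family into a sieve estimator of the unknown density.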
In some sense these procedures are semi-automated
methods for ``specification searching'', a practice that has
been discredited by Bayesian econometricians such as Edward Leamer.
Paradoxically, it turns out that these sorts of specification-search
procedures can consistently identify the true model, whereas Bayesian
methods run into serious difficulties in semi- and nonparametric contexts.
Specifically, if
the parameter space is infinite-dimensional, the prior can completely
overwhelm the data in the sense that the posterior distribution is not guaranteed to
converge to a point mass at the true parameter value as the number of
observations tends to infinity.
The final issue is whether one ought to be doing structural or
reduced-form estimation of econometric models. I review the
Haavelmo-Koopmans-Marschak-Lucas arguments for the use of structural
econometric models that either have been derived from, or are consistent with,
an underlying economic theory. These arguments show that structural
models can be used to predict the effects of hypothetical policy or
environmental changes, whereas reduced-form models are generally only
capable of summarizing
responses to existing or historical policy or environmental changes.
On the other hand, structural models typically depend on strong, often parametric, a priori
identifying assumptions, whereas reduced-form models can employ semi- and non-parametric estimation methods that require much weaker assumptions
about the underlying structure.
I discuss the identification problem and show that many commonly analyzed
structural models are ``non-parametrically unidentified'', which implies
that one generally cannot estimate structural models using fully non-parametric
methods (although there are many cases where flexible
parametric and semi-parametric methods can be used to estimate the
structure). Where nonparametric methods can be useful is in specification
testing, i.e. comparing the reduced form implied by the structural model with
a nonparametric estimate of the reduced form. One possible way to resolve
the identification problem is by integrating experimental and survey data.
I will discuss this issue in the context of
comparing structural vs. experimental predictions of the impact of
job training programs.
I use Manski's (1988) ``analogy principle'' as an intuitive unifying concept
motivating the main ``classical'' estimation methods. Examples of
the approach include estimation
of the population mean by the sample mean, or estimation of the
population CDF by the
sample CDF. The analogy principle is the best way to understand
the seemingly bewildering array of econometric
estimators, most of which can be classified
as extremum or M-estimators, such as
linear and
nonlinear least squares, maximum likelihood, generalized method of
moments, minimum distance, minimum chi-square, etc.
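In its simplest form, the analogy principle replaces a population characteristic by its sample counterpart:
\[
\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n} X_i
\quad \mbox{estimates} \quad \mu = E[X],
\qquad
\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} 1\{X_i \le x\}
\quad \mbox{estimates} \quad F(x) = \Pr\{X \le x\}.
\]
More generally, a parameter defined as the maximizer of a population criterion, $\theta_0 = \arg\max_{\theta} Q(\theta)$, is estimated by the maximizer of its sample analog, $\hat{\theta}_n = \arg\max_{\theta} Q_n(\theta)$, which is exactly the common structure of the extremum estimators listed above.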
We begin the course by
reviewing the theory of parametric estimation and the
fundamental efficiency bounds for unbiased least squares and maximum
likelihood estimators, namely the Gauss-Markov and Cramér-Rao lower bounds.
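For reference, in its simplest form the Cramér-Rao bound states that any unbiased estimator $\hat{\theta}_n$ of a scalar parameter $\theta$, based on an i.i.d. sample of size $n$ from a density $f(x \mid \theta)$, satisfies
\[
\mathrm{Var}(\hat{\theta}_n) \;\ge\; \frac{1}{n\,I(\theta)},
\qquad
I(\theta) = E\!\left[\left(\frac{\partial \log f(X \mid \theta)}{\partial \theta}\right)^{2}\right],
\]
where $I(\theta)$ is the Fisher information; the Gauss-Markov theorem plays the analogous role for linear unbiased estimators in the linear regression model.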
We also present extensions of the C-R bound to asymptotically-unbiased LAN estimators, Hájek's (1972) asymptotic local minimax bound, and
briefly discuss Bahadur's (1960,1967) ``large deviation'' bounds. I then
review results on the asymptotic equivalence of a number of different
nonlinear estimators
including method of moments, maximum likelihood,
minimum distance, and minimum chi-square in the special case of
multinomial distributions. Since multinomial distributions
are dense in the space of all distributions, these results can be applied
to derive Chamberlain's (1987,1992) efficiency bounds for semi-parametric
estimators based on conditional moment restrictions. I complete the review of
parametric estimation methods with a survey of
model specification tests, including the standard ``Holy Trinity'' (Wald, likelihood ratio, and Lagrange multiplier), chi-square,
information-matrix, Hausman-Wu, and conditional moment tests. I briefly
discuss issues of optimality and power of these tests,
and the more difficult issues of sequential testing, model revision,
and model selection. The literature on ``model selection'' serves as
a bridge in moving from parametric to semi-parametric and non-parametric
estimation methods. We cover several papers showing that
there exist ``automatic'' rules for ``specification searching'' over an
appropriately expanding family
of parametric models that result in a sequence of selected
models converging to the underlying
``true'' data generating process.
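As a minimal sketch of such an automatic rule (my own illustration, not taken from any particular paper covered in the course), consider selecting the order of a polynomial regression by a BIC penalty, which trades off fit against the number of parameters and, when the true model has finite order, picks out the correct order with probability tending to one:
\begin{verbatim}
# Hypothetical illustration: BIC-based selection over an expanding
# family of polynomial regression models.
import numpy as np

def bic_poly_order(x, y, max_order=10):
    n = len(y)
    best_order, best_bic = 0, np.inf
    for k in range(max_order + 1):
        X = np.vander(x, k + 1, increasing=True)      # columns 1, x, ..., x^k
        beta = np.linalg.lstsq(X, y, rcond=None)[0]   # least squares fit
        sigma2 = np.mean((y - X @ beta) ** 2)
        bic = n * np.log(sigma2) + (k + 1) * np.log(n)  # fit + complexity penalty
        if bic < best_bic:
            best_order, best_bic = k, bic
    return best_order

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
y = 1.0 + 2.0 * x - x ** 3 + 0.1 * rng.normal(size=500)  # cubic "truth"
print(bic_poly_order(x, y))   # should settle near 3 as the sample grows
\end{verbatim}
Replacing the BIC penalty with cross-validation or another data-driven criterion gives the same flavor of semi-automated specification search discussed above.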
The second part of the course focuses on nonparametric estimation
of density and regression functions using kernels, nearest neighbor methods,
and various ``sieve'' estimation methods including
splines, series approximations, and neural networks.
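The basic building blocks here take a simple form: for an i.i.d. sample $\{(X_i,Y_i)\}_{i=1}^{n}$, the kernel density estimator and the Nadaraya-Watson kernel regression estimator are
\[
\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right),
\qquad
\hat{m}_h(x) = \frac{\sum_{i=1}^{n} Y_i\, K\!\left(\frac{x - X_i}{h}\right)}
{\sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)},
\]
where $K$ is a kernel function and the bandwidth $h = h_n \to 0$ at an appropriate rate as $n \to \infty$, in line with the general smoothing-parameter rules described earlier.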
I then turn to semiparametric models and
flexible ``seminonparametric'' models that represent the middle ground
between parametric and non-parametric estimation methods. Some of the
semiparametric estimation methods are versions of L-estimates (linear
combinations of order statistics) and R-estimates (estimates
derived from rank tests). In addition, some recent semi-parametric
estimators are functionals of U-statistics, so I briefly
review the relevant LLN's, CLT's, and the concept of projections of a U-statistic. In order to compare the relative efficiency of the various
methods, I present
the Begun-Hwang-Hall-Wellner generalized version of the Cramér-Rao
lower bound for semiparametric models, and applications of this result
to a variety of models. In certain problems
the prior assumptions are so weak (such as the median independence
assumption underlying Manski's maximum score estimator) that the information
for the parametric component of the model is zero. This implies that a
$\sqrt{n}$-consistent estimator does not exist. I review rate-of-convergence
results for other non-parametric and semi-parametric estimators, delineating
problems for which the standard $\sqrt{n}$
rate is achievable versus
problems where convergence occurs at slower rates. In certain
cases one can smooth a discontinuous semi-parametric objective function in
such a way as to guarantee consistency while still retaining $\sqrt{n}$
(or arbitrarily close to $\sqrt{n}$)
convergence rates.
Examples include Powell's (1984, 1986) work on LAD
and quantile estimation of the censored regression model,
and Horowitz's (1992) smoothed maximum score estimator.
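To illustrate the smoothing idea in the last of these examples, in the binary response model $y_i = 1\{x_i'\beta + u_i > 0\}$ with $\mathrm{med}(u_i \mid x_i) = 0$, Manski's maximum score estimator maximizes the step function
\[
S_n(b) = \frac{1}{n}\sum_{i=1}^{n} (2y_i - 1)\,\mathrm{sgn}(x_i'b)
\]
subject to a scale normalization such as $\|b\| = 1$, while Horowitz's smoothed version replaces the sign function with a smooth distribution-function-like kernel, maximizing $\frac{1}{n}\sum_{i=1}^{n}(2y_i - 1)\,K(x_i'b/h_n)$ with $h_n \to 0$, which yields a differentiable objective function and a rate of convergence that can be made arbitrarily close to $\sqrt{n}$ under sufficient smoothness.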
The final methodological module focuses on simulation estimation, which has
proven very useful for avoiding the computational burden of numerical integration
that previously posed insurmountable obstacles to estimation of
a number of econometric models.
I review several different types of simulation estimators for discrete
choice problems (where simulation is used to avoid high-dimensional
numerical integrations required to compute choice probabilities with
many alternatives) and macro/time-series applications.
The simulation estimators include
simulated maximum likelihood (SML), the method of simulated moments (MSM),
the method of simulated scores (MSS), and the semi-parametric minimum
distance estimator of Gallant and Tauchen. We review various types of
simulators
including crude frequency sampling as well as
``smoothed'' probability estimators such as the Geweke-Hajivassiliou-Keane
(GHK) method, which has the advantage of yielding objective functions that
are smooth functions of the model's parameters. We will also discuss the
use of antithetic variates, acceptance/rejection, importance sampling, and
Gibbs sampling methods (borrowed from the literature
on Bayesian pattern recognition) to reduce noise and accelerate
convergence of Monte Carlo methods.
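As a minimal sketch of the simplest of these simulators (my own illustration; the GHK algorithm itself is more involved and is not reproduced here), the crude frequency simulator approximates a multinomial probit choice probability by the fraction of simulated utility draws for which a given alternative is best, and antithetic draws provide a simple variance-reduction device:
\begin{verbatim}
# Hypothetical illustration: crude frequency simulator for a multinomial
# probit choice probability, with optional antithetic draws.
import numpy as np

def choice_prob(V, chol_Sigma, n_draws, rng, antithetic=False):
    """Simulate P(alternative 0 yields the highest utility U_j = V_j + eps_j)."""
    z = rng.normal(size=(n_draws, len(V)))
    if antithetic:
        z = np.vstack([z, -z])        # pair each draw with its mirror image
    eps = z @ chol_Sigma.T            # correlated N(0, Sigma) errors
    U = V + eps
    return np.mean(np.argmax(U, axis=1) == 0)

rng = np.random.default_rng(0)
V = np.array([0.5, 0.0, -0.3])        # illustrative systematic utilities
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 1.0, 0.3],
                  [0.2, 0.3, 1.0]])   # illustrative error covariance
L = np.linalg.cholesky(Sigma)
print(choice_prob(V, L, 5000, rng))
print(choice_prob(V, L, 5000, rng, antithetic=True))
\end{verbatim}
Note that this frequency simulator is a step function of the underlying parameters, which is precisely the problem that smoothed simulators such as GHK are designed to avoid.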
The methodological principles outlined above will be illustrated in a variety of applied contexts.
GRADES:
The goal of this class is to introduce students to
state-of-the-art methods as well as unsolved problems at the frontiers
of current research in econometrics. My philosophy is that the best
way to learn these methods and to appreciate their problems and limitations
is via ``hands-on'' applications.
Thus, grades in this course will be based on: 1) periodic
take-home problems assigned during lectures (20% of grade),
2) a midterm exam (20% of grade), 3) a final exam (20% of grade), and
4) an original research paper
(30 pages maximum) due at the scheduled final exam period for
this course and a 15-30 minute in-class presentation describing
the topic, the data, and the econometric methods
to be used (40% of grade).
Most students will choose applied topics involving actual
estimation of a particular econometric model, although theoretically
oriented papers
are also welcome.
Main Texts (choose at least one for the course)
Advanced/Specialized Texts (worth consulting but not required)
Overview of Methodological Debates in Econometrics