
 Start

2244
 Prefix

The conventional IV estimator (though consistent) is,
however, inefficient in the presence of heteroskedasticity. The usual approach today
when facing heteroskedasticity of unknown form is to use the Generalized Method of
Moments (GMM), introduced by L.
 Exact

Hansen (1982).
 Suffix

GMM makes use of the orthogonality conditions to allow for efficient estimation in the presence of heteroskedasticity
of unknown form.
In the twenty years since it was first introduced, GMM has become a very popular
tool among empirical researchers.
 (check this in PDF content)

 Start

4082
 Prefix

The usual
Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker tests for the presence of
heteroskedasticity in a regression equation can be applied to an IV regression only under restrictive assumptions. In Section 3 we discuss the test of
 Exact

Pagan and Hall (1983)
 Suffix

designed specifically for detecting the presence of heteroskedasticity in IV estimation,
and its relationship to these other heteroskedasticity tests.
Even when IV or GMM is judged to be the appropriate estimation technique, we
may still question its validity in a given application: are our instruments “good instruments”?
 (check this in PDF content)

 Start

5138
 Prefix

We may cast some light on whether the instruments satisfy the orthogonality
conditions in the context of an overidentified model: that is, one in which a surfeit of
instruments are available. In that context we may test the overidentifying restrictions
in order to provide some evidence of the instruments’ validity. We present the variants
of this test due to
 Exact

Sargan (1958), Basmann (1960) and,
 Suffix

in the GMM context, L. Hansen
(1982), and show how the generalization of this test, the C or “difference–in–Sargan”
test, can be used to test the validity of subsets of the instruments.
Although there may well be reason to suspect non–orthogonality between regressors
and errors, the use of IV estimation to address this problem must be balanced against
the inevitable loss of efficiency vis–à–vis OLS.
 (check this in PDF content)

 Start

5196
 Prefix

In that context we may test the overidentifying restrictions
in order to provide some evidence of the instruments’ validity. We present the variants
of this test due to Sargan (1958), Basmann (1960) and, in the GMM context, L.
 Exact

Hansen (1982), and
 Suffix

show how the generalization of this test, the C or “difference–in–Sargan”
test, can be used to test the validity of subsets of the instruments.
Although there may well be reason to suspect non–orthogonality between regressors
and errors, the use of IV estimation to address this problem must be balanced against
the inevitable loss of efficiency vis–à–vis OLS.
 (check this in PDF content)

 Start

6764
 Prefix

The syntax diagrams for these commands are
presented in the last section of the paper, and the electronic supplement presents annotated examples of their use.
2 IV and GMM estimation
The “Generalized Method of Moments” was introduced by L. Hansen in his celebrated
1982 paper. There are a number of good modern texts that cover GMM, and one
recent prominent text,
 Exact

Hayashi (2000),
 Suffix

presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is Hansen
(2000). The exposition below draws on Hansen (2000), Chapter 11; Hayashi (2000),
Chapter 3; Wooldridge (2002), Chapter 8; Davidson and MacKinnon (1993), and Greene
(2000).
 (check this in PDF content)

 Start

6903
 Prefix

There are a number of good modern texts that cover GMM, and one
recent prominent text, Hayashi (2000), presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is
 Exact

Hansen (2000).
 Suffix

The exposition below draws on Hansen (2000), Chapter 11; Hayashi (2000),
Chapter 3; Wooldridge (2002), Chapter 8; Davidson and MacKinnon (1993), and Greene
(2000).
We begin with the standard IV estimator, and then relate it to the GMM framework.
 (check this in PDF content)

 Start

6954
 Prefix

There are a number of good modern texts that cover GMM, and one
recent prominent text, Hayashi (2000), presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is Hansen (2000). The exposition below draws on
 Exact

Hansen (2000),
 Suffix

Chapter 11; Hayashi (2000),
Chapter 3; Wooldridge (2002), Chapter 8; Davidson and MacKinnon (1993), and Greene
(2000).
We begin with the standard IV estimator, and then relate it to the GMM framework.
 (check this in PDF content)

 Start

6985
 Prefix

There are a number of good modern texts that cover GMM, and one
recent prominent text, Hayashi (2000), presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is Hansen (2000). The exposition below draws on Hansen (2000), Chapter 11;
 Exact

Hayashi (2000),
 Suffix

Chapter 3; Wooldridge (2002), Chapter 8; Davidson and MacKinnon (1993), and Greene
(2000).
We begin with the standard IV estimator, and then relate it to the GMM framework.
We then consider the issue of clustered errors, and finally turn to OLS.
2.1 The method of instrumental variables
The equation to be estimated is, in matrix notation,
\[ y = X\beta + u, \qquad E(uu') = \Omega \tag{1} \]
with typical row
\[ y_i = X_i\beta + u_i \tag{2} \]
The matrix
 (check this in PDF content)

 Start

7012
 Prefix

There are a number of good modern texts that cover GMM, and one
recent prominent text, Hayashi (2000), presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is Hansen (2000). The exposition below draws on Hansen (2000), Chapter 11; Hayashi (2000), Chapter 3;
 Exact

Wooldridge (2002),
 Suffix

Chapter 8; Davidson and MacKinnon (1993), and Greene
(2000).
We begin with the standard IV estimator, and then relate it to the GMM framework.
We then consider the issue of clustered errors, and finally turn to OLS.
2.1 The method of instrumental variables
The equation to be estimated is, in matrix notation,
\[ y = X\beta + u, \qquad E(uu') = \Omega \tag{1} \]
with typical row
\[ y_i = X_i\beta + u_i \tag{2} \]
The matrix of regressors X is n×K, where n is
 (check this in PDF content)

 Start

7042
 Prefix

There are a number of good modern texts that cover GMM, and one
recent prominent text, Hayashi (2000), presents virtually all the estimation techniques
discussed in the GMM framework. A concise on–line text that covers GMM is Hansen (2000). The exposition below draws on Hansen (2000), Chapter 11; Hayashi (2000), Chapter 3; Wooldridge (2002), Chapter 8;
 Exact

Davidson and MacKinnon (1993), and Greene (2000).
 Suffix

We begin with the standard IV estimator, and then relate it to the GMM framework.
We then consider the issue of clustered errors, and finally turn to OLS.
2.1 The method of instrumental variables
The equation to be estimated is, in matrix notation,
\[ y = X\beta + u, \qquad E(uu') = \Omega \tag{1} \]
with typical row
\[ y_i = X_i\beta + u_i \tag{2} \]
The matrix of regressors X is n×K, where n is the number of observations.
 (check this in PDF content)

 Start

9348
 Prefix

The instrumental variables estimator of β is
\[ \hat\beta_{IV} = (X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'y = (X'P_ZX)^{-1}X'P_Zy \tag{8} \]
This estimator goes under a variety of names: the instrumental variables (IV) estimator, the generalized instrumental variables estimator (GIVE), or the two–stage least–squares (2SLS) estimator, the last reflecting the fact that the estimator can be calculated
in a two–step procedure. We follow
 Exact

Davidson and MacKinnon (1993),
 Suffix

p. 220 and refer
to it as the IV estimator rather than 2SLS because the basic idea of instrumenting is
central, and because it can be (and in Stata, is more naturally) calculated in one step
as well as in two.
 (check this in PDF content)
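The one-step calculation in Equation (8) can be sketched as follows. This is a minimal NumPy illustration rather than the ivreg2 implementation; the function name `iv_estimator` and the simulated data are assumptions made for the example.

```python
import numpy as np

def iv_estimator(y, X, Z):
    """IV/2SLS coefficients (X'P_Z X)^{-1} X'P_Z y, with P_Z = Z(Z'Z)^{-1}Z'."""
    # Project X onto the column space of Z without forming the n x n matrix P_Z.
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    # P_Z is symmetric and idempotent, so X'P_Z X = X_hat'X and X'P_Z y = X_hat'y.
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

# Hypothetical simulated data: one endogenous regressor, two excluded instruments.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)        # error correlated with x through v
x = z @ np.array([1.0, 0.5]) + v
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])    # regressors (constant is exogenous)
Z = np.column_stack([np.ones(n), z])    # full instrument set
beta_iv = iv_estimator(y, X, Z)
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

In this simulation the OLS slope is biased upward by the correlation between x and u, while the IV slope should be close to the true value of 2.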

 Start

10444
 Prefix

n
(15)
we obtain the estimated asymptotic variance–covariance matrix of the IV estimator:
\[ V(\hat\beta_{IV}) = \hat\sigma^2 (X'Z(Z'Z)^{-1}Z'X)^{-1} = \hat\sigma^2 (X'P_ZX)^{-1} \tag{16} \]
Note that some packages, including Stata’s ivreg, include a degrees–of–freedom
correction to the estimate of $\hat\sigma^2$ by replacing n with n−L. This correction is not
necessary, however, since the estimate of $\hat\sigma^2$ would not be unbiased anyway
 Exact

(Greene (2000),
 Suffix

p. 373). Our ivreg2 routine defaults to the large–sample formulas for the
estimated error variance and covariance matrix; the user can request the small–sample
versions with the option small.
2.2 The Generalized Method of Moments
The standard IV estimator is a special case of a Generalized Method of Moments (GMM)
estimator.
 (check this in PDF content)

 Start

15138
 Prefix

This yields
\[ \hat\beta_{EGMM} = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1}X'Z(Z'\hat\Omega Z)^{-1}Z'y \tag{29} \]
with asymptotic variance
\[ V(\hat\beta_{EGMM}) = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1} \tag{30} \]
1 This estimator goes under various names: “2–stage instrumental variables” (2SIV),
 Exact

White (1982);
 Suffix

“2–step 2–stage least squares”, Cumby et al. (1983); “heteroskedastic 2–stage least squares” (H2SLS);
Davidson and MacKinnon (1993), p. 599.
A variety of other feasible GMM procedures are also possible.
 (check this in PDF content)
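Equations (29) and (30) can be sketched in a few lines of NumPy. The sketch assumes the heteroskedasticity-robust choice in which $\hat\Omega$ is diagonal with squared first-step IV residuals on the diagonal; the function names and simulated data are hypothetical, and this is not the ivreg2 code.

```python
import numpy as np

def iv(y, X, Z):
    """First-step IV estimator (X'P_Z X)^{-1} X'P_Z y."""
    X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

def two_step_gmm(y, X, Z):
    """Feasible efficient two-step GMM under heteroskedasticity of unknown form."""
    u_hat = y - X @ iv(y, X, Z)                   # step 1: consistent residuals
    # Z' Omega_hat Z with Omega_hat = diag(u_hat_i^2) (assumed form of Omega_hat).
    ZOZ = (Z * (u_hat ** 2)[:, None]).T @ Z
    XZ = X.T @ Z                                  # K x L
    A = XZ @ np.linalg.solve(ZOZ, XZ.T)           # X'Z (Z'Omega Z)^{-1} Z'X
    beta = np.linalg.solve(A, XZ @ np.linalg.solve(ZOZ, Z.T @ y))
    return beta, np.linalg.inv(A)                 # Equations (29) and (30)

# Hypothetical simulated data with heteroskedastic errors.
rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=(n, 2))
v = rng.normal(size=n)
u = 0.8 * v + (0.5 + np.abs(z[:, 0])) * rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + v
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

beta_gmm, V_gmm = two_step_gmm(y, X, Z)
se_gmm = np.sqrt(np.diag(V_gmm))
```

When the equation is exactly identified (as many instruments as regressors), the weighting matrix drops out and the two-step estimate coincides with IV, which is a convenient sanity check on any implementation.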

 Start

15183
 Prefix

This yields
\[ \hat\beta_{EGMM} = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1}X'Z(Z'\hat\Omega Z)^{-1}Z'y \tag{29} \]
with asymptotic variance
\[ V(\hat\beta_{EGMM}) = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1} \tag{30} \]
1 This estimator goes under various names: “2–stage instrumental variables” (2SIV), White (1982); “2–step 2–stage least squares”,
 Exact

Cumby et al. (1983);
 Suffix

“heteroskedastic 2–stage least squares” (H2SLS);
Davidson and MacKinnon (1993), p. 599.
A variety of other feasible GMM procedures are also possible. For example, the
procedure above can be iterated by obtaining the residuals from the two–step GMM
estimator, using these to calculate a new $\hat S$, using this in turn to calculate the three–step
feasible efficient GMM estimator, and so forth,
 (check this in PDF content)

 Start

15253
 Prefix

This yields
\[ \hat\beta_{EGMM} = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1}X'Z(Z'\hat\Omega Z)^{-1}Z'y \tag{29} \]
with asymptotic variance
\[ V(\hat\beta_{EGMM}) = (X'Z(Z'\hat\Omega Z)^{-1}Z'X)^{-1} \tag{30} \]
1 This estimator goes under various names: “2–stage instrumental variables” (2SIV), White (1982); “2–step 2–stage least squares”, Cumby et al. (1983); “heteroskedastic 2–stage least squares” (H2SLS);
 Exact

Davidson and MacKinnon (1993),
 Suffix

p. 599.
A variety of other feasible GMM procedures are also possible. For example, the
procedure above can be iterated by obtaining the residuals from the two–step GMM
estimator, using these to calculate a new $\hat S$, using this in turn to calculate the three–step
feasible efficient GMM estimator, and so forth, for as long as the user wishes or until
the estimator converges; this is the “i
 (check this in PDF content)

 Start

17767
 Prefix

Instead of first obtaining an optimal weighting matrix and
then taking it as given when minimizing Equation (20), we can write the optimal weighting matrix
as a function of $\hat\beta$, and choose $\hat\beta$ to minimize $J(\hat\beta) = n\,g_n(\hat\beta)'W(\hat\beta)g_n(\hat\beta)$. This is the “continuously
updated GMM” of
 Exact

Hansen et al. (1996);
 Suffix

it requires numerical optimization methods.
3 It is worth noting that the IV estimator is not the only such efficient GMM estimator under
conditional homoskedasticity. Instead of treating $\hat\sigma^2$ as a parameter to be estimated in a second
stage, what if we return to the GMM criterion function and minimize by simultaneously choosing
What are the implications of
 (check this in PDF content)

 Start

21329
 Prefix

In effect, under conditional homoskedasticity, the continuously
updated GMM estimator is the LIML estimator. Calculating the LIML estimator does not require
numerical optimization methods; it can be calculated as the solution to an eigenvalue problem (see,
e.g.,
 Exact

Davidson and MacKinnon (1993),
 Suffix

pp. 644–51).
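define $\hat\Omega_C$ as the block–diagonal form
\[
\hat\Omega_C =
\begin{pmatrix}
\hat\Sigma_1 & & & & 0 \\
 & \ddots & & & \\
 & & \hat\Sigma_m & & \\
 & & & \ddots & \\
0 & & & & \hat\Sigma_M
\end{pmatrix}
\tag{36}
\]
then an estimator of S that is consistent in the presence of arbitrary intra–cluster
correlation is
\[ \hat S = \frac{1}{n}(Z'\hat\Omega_C Z) \tag{37} \]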
The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is White (1984), pp. 135–6.
 (check this in PDF content)
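Equation (37) can be computed cluster by cluster without ever forming the n×n matrix $\hat\Omega_C$. The sketch below assumes that each block is the outer product of the within-cluster residuals, $\hat\Sigma_m = \hat u_m \hat u_m'$ (the usual choice; the excerpt above does not show the definition). The function name `cluster_S` and the toy data are hypothetical.

```python
import numpy as np

def cluster_S(Z, u_hat, cluster_ids):
    """S_hat = (1/n) Z' Omega_C Z, assuming Sigma_m = u_m u_m' for each cluster m."""
    n, L = Z.shape
    S = np.zeros((L, L))
    for g in np.unique(cluster_ids):
        idx = cluster_ids == g
        s_g = Z[idx].T @ u_hat[idx]     # L-vector Z_m' u_m for cluster m
        S += np.outer(s_g, s_g)         # Z_m' u_m u_m' Z_m
    return S / n

# Toy data: 12 observations in 4 clusters of 3 (illustrative only).
rng = np.random.default_rng(2)
n = 12
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
u_hat = rng.normal(size=n)
ids = np.repeat(np.arange(4), 3)

S_c = cluster_S(Z, u_hat, ids)
# With every observation in its own cluster, the heteroskedasticity-robust S results.
S_het = cluster_S(Z, u_hat, np.arange(n))
```

Summing the per-cluster score vectors $Z_m'\hat u_m$ and taking outer products keeps memory use at L×L, which matters when n is large.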

 Start

21745
 Prefix

Σˆm
..
.
0ˆΣM
(36)
then an estimator of S that is consistent in the presence of arbitrary intra–cluster
correlation is
\[ \hat S = \frac{1}{n}(Z'\hat\Omega_C Z) \tag{37} \]
The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is
 Exact

White (1984),
 Suffix

pp. 135–6. It is commonly employed in
the context of panel data estimation; see Wooldridge (2002), p. 193, Arellano (1987)
and Kézdi (2002). It is the standard Stata approach to clustering, implemented in, e.g.,
robust, regress and ivreg2.⁴
The cluster–robust covariance matrix for IV estimation is obtained exactly as in
the preceding subsection except using $\hat S$ as defined in Equation (
 (check this in PDF content)

 Start

21840
 Prefix

Σˆm
..
.
0ˆΣM
(36)
then an estimator of S that is consistent in the presence of arbitrary intra–cluster
correlation is
\[ \hat S = \frac{1}{n}(Z'\hat\Omega_C Z) \tag{37} \]
The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is White (1984), pp. 135–6. It is commonly employed in
the context of panel data estimation; see
 Exact

Wooldridge (2002),
 Suffix

p. 193, Arellano (1987)
and Kézdi (2002). It is the standard Stata approach to clustering, implemented in, e.g.,
robust, regress and ivreg2.⁴
The cluster–robust covariance matrix for IV estimation is obtained exactly as in
the preceding subsection except using $\hat S$ as defined in Equation (37).
 (check this in PDF content)

 Start

21869
 Prefix

0ˆΣM
(36)
then an estimator of S that is consistent in the presence of arbitrary intra–cluster
correlation is
\[ \hat S = \frac{1}{n}(Z'\hat\Omega_C Z) \tag{37} \]
The earliest reference to this approach to robust estimation in the presence of clustering of which we are aware is White (1984), pp. 135–6. It is commonly employed in
the context of panel data estimation; see Wooldridge (2002), p. 193,
 Exact

Arellano (1987) and
 Suffix

Kézdi (2002). It is the standard Stata approach to clustering, implemented in, e.g.,
robust, regress and ivreg2.⁴
The cluster–robust covariance matrix for IV estimation is obtained exactly as in
the preceding subsection except using $\hat S$ as defined in Equation (37).
 (check this in PDF content)

 Start

23451
 Prefix

But users should take
care that, if the cluster option is used, then it ought to be the case that M >> K.⁵
4 There are other approaches to dealing with clustering that put more structure on the Ω matrix
and hence are more efficient but less robust. For example, the
 Exact

Moulton (1986)
 Suffix

approach to obtaining
consistent standard errors is in effect to specify an “error components” (a.k.a. “random effects”)
structure in Equation (36): $\Sigma_m$ is a matrix with diagonal elements $\sigma_u^2 + \sigma_v^2$ and off–diagonal elements
$\sigma_v^2$.
 (check this in PDF content)

 Start

24741
 Prefix

but correct inference is still possible through the use of the Eicker–Huber–
White “sandwich” robust covariance estimator, and this estimator can also be derived
using the general formula for the asymptotic variance of a GMM estimator with a sub–
optimal weighting matrix, Equation (24).
A natural question is whether a more efficient GMM estimator exists, and the answer
is “yes”
 Exact

(Chamberlain (1982), Cragg (1983)).
 Suffix

If the disturbance is heteroskedastic,
there are no endogenous regressors, and the researcher has available additional moment
conditions, i.e., additional variables that do not appear in the regression but that are
known to be exogenous, then the efficient GMM estimator is that of Cragg (1983),
dubbed “heteroskedastic OLS” (HOLS) by Davidson and MacKinnon (1993), p. 600.
 (check this in PDF content)

 Start

25077
 Prefix

If the disturbance is heteroskedastic,
there are no endogenous regressors, and the researcher has available additional moment
conditions, i.e., additional variables that do not appear in the regression but that are
known to be exogenous, then the efficient GMM estimator is that of
 Exact

Cragg (1983),
 Suffix

dubbed “heteroskedastic OLS” (HOLS) by Davidson and MacKinnon (1993), p. 600. It
can be obtained in precisely the same way as feasible efficient two–step GMM except
now the first–step inefficient but consistent estimator used to generate the residuals is
OLS rather than IV.
 (check this in PDF content)

 Start

25130
 Prefix

If the disturbance is heteroskedastic,
there are no endogenous regressors, and the researcher has available additional moment
conditions, i.e., additional variables that do not appear in the regression but that are
known to be exogenous, then the efficient GMM estimator is that of Cragg (1983), dubbed “heteroskedastic OLS” (HOLS) by
 Exact

Davidson and MacKinnon (1993),
 Suffix

p. 600. It
can be obtained in precisely the same way as feasible efficient two–step GMM except
now the first–step inefficient but consistent estimator used to generate the residuals is
OLS rather than IV.
 (check this in PDF content)

 Start

25966
 Prefix

The advantages of GMM over IV are clear: if heteroskedasticity is present, the GMM
estimator is more efficient than the simple IV estimator, whereas if heteroskedasticity
is not present, the GMM estimator is no worse asymptotically than the IV estimator.
Nevertheless, the use of GMM does come with a price. The problem, as
 Exact

Hayashi (2000)
 Suffix

points out (p. 215), is that the optimal weighting matrix $\hat S$ at the core of efficient
GMM is a function of fourth moments, and obtaining reasonable estimates of fourth
moments may require very large sample sizes.
 (check this in PDF content)

 Start

26749
 Prefix

If in fact the error is homoskedastic,
IV would be preferable to efficient GMM. For this reason a test for the presence of
heteroskedasticity when one or more regressors is endogenous may be useful in deciding
whether IV or GMM is called for. Such a test was proposed by
 Exact

Pagan and Hall (1983), and
 Suffix

we have implemented it in Stata as ivhettest. We describe this test in the next
section.
3 Testing for heteroskedasticity
The Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker statistics are standard tests of the presence of heteroskedasticity in an OLS regression.
 (check this in PDF content)

 Start

27271
 Prefix

We describe this test in the next
section.
3 Testing for heteroskedasticity
The Breusch–Pagan/Godfrey/Cook–Weisberg and White/Koenker statistics are standard tests of the presence of heteroskedasticity in an OLS regression. The principle is
to test for a relationship between the residuals of the regression and p indicator variables that are hypothesized to be related to the heteroskedasticity.
 Exact

Breusch and Pagan (1979), Godfrey (1978), and Cook and Weisberg (1983)
 Suffix

separately derived the same
test statistic. This statistic is distributed as $\chi^2$ with p degrees of freedom under the
null of no heteroskedasticity, and under the maintained hypothesis that the error of the
regression is normally distributed.
 (check this in PDF content)

 Start

27592
 Prefix

Breusch and Pagan (1979), Godfrey (1978), and Cook and Weisberg (1983) separately derived the same
test statistic. This statistic is distributed as $\chi^2$ with p degrees of freedom under the
null of no heteroskedasticity, and under the maintained hypothesis that the error of the
regression is normally distributed.
 Exact

Koenker (1981)
 Suffix

noted that the power of this test
is very sensitive to the normality assumption, and presented a version of the test that
relaxed this assumption. Koenker’s test statistic, also distributed as $\chi^2_p$ under the null,
is easily obtained as $nR_c^2$, where $R_c^2$ is the centered $R^2$ from an auxiliary regression of
the squared residuals from the original regression on the indicator variables.
 (check this in PDF content)
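Koenker's $nR_c^2$ statistic can be sketched directly from the description above: regress the squared residuals on a constant and the indicator variables, then multiply the centered $R^2$ by n. The function name `koenker_test` and the simulated data are assumptions made for this illustration; it is not the ivhettest code.

```python
import numpy as np

def koenker_test(resid, Psi):
    """Return (n * centered R^2, p) from regressing resid^2 on a constant and Psi."""
    n = resid.shape[0]
    e2 = resid ** 2
    W = np.column_stack([np.ones(n), Psi])          # auxiliary regressors
    coef, *_ = np.linalg.lstsq(W, e2, rcond=None)
    fitted = W @ coef
    r2_c = 1.0 - ((e2 - fitted) ** 2).sum() / ((e2 - e2.mean()) ** 2).sum()
    return n * r2_c, Psi.shape[1]                   # statistic and degrees of freedom

def ols_resid(y, X):
    b = np.linalg.solve(X.T @ X, X.T @ y)
    return y - X @ b

# Hypothetical data: one regression with homoskedastic errors, one without.
rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
e = rng.normal(size=n)
y_homo = 1.0 + 2.0 * x + e
y_het = 1.0 + 2.0 * x + e * np.exp(0.5 * x)         # error variance rises with x

stat_homo, p = koenker_test(ols_resid(y_homo, X), x[:, None])
stat_het, _ = koenker_test(ols_resid(y_het, X), x[:, None])
```

The statistic is compared with a $\chi^2_p$ critical value; with p = 1 the 5% critical value is about 3.84, and the heteroskedastic design above should exceed it by a wide margin.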

 Start

28179
 Prefix

, also distributed as $\chi^2_p$ under the null,
is easily obtained as $nR_c^2$, where $R_c^2$ is the centered $R^2$ from an auxiliary regression of
the squared residuals from the original regression on the indicator variables. When the
indicator variables are the regressors of the original equation, their squares and their
cross–products, Koenker’s test is identical to White’s $nR_c^2$ general test for heteroskedasticity
 Exact

(White (1980)).
 Suffix

These tests are available in Stata, following estimation with
regress, using our ivhettest as well as via hettest and whitetst.
As Pagan and Hall (1983) point out, the above tests will be valid tests for heteroskedasticity in an IV regression only if heteroskedasticity is present in that equation
and nowhere else in the system.
 (check this in PDF content)

 Start

28331
 Prefix

When the
indicator variables are the regressors of the original equation, their squares and their
cross–products, Koenker’s test is identical to White’s $nR_c^2$ general test for heteroskedasticity (White (1980)). These tests are available in Stata, following estimation with
regress, using our ivhettest as well as via hettest and whitetst.
As
 Exact

Pagan and Hall (1983)
 Suffix

point out, the above tests will be valid tests for heteroskedasticity in an IV regression only if heteroskedasticity is present in that equation
and nowhere else in the system. The other structural equations in the system (corresponding to the endogenous regressors $X_1$) must also be homoskedastic, even though
they are not being explicitly estimated.⁶ Pagan and Hall derive a test which r
 (check this in PDF content)

 Start

29035
 Prefix

Under the null of homoskedasticity in the IV regression, the Pagan–Hall
statistic is distributed as $\chi^2_p$, irrespective of the presence of heteroskedasticity elsewhere
in the system. A more general form of this test was separately proposed by
 Exact

White (1982).
 Suffix

Our implementation is of the simpler Pagan–Hall statistic, available with the
command ivhettest after estimation by ivreg, ivreg2, or ivgmm0. We present the
Pagan–Hall test here in the format and notation of the original White (1980) and White
(1982) tests, however, to facilitate comparisons with the other tests noted above.⁷
Let Ψ be the n×p matrix of indicator variables hypothesized to be related to
 (check this in PDF content)

 Start

29265
 Prefix

Our implementation is of the simpler Pagan–Hall statistic, available with the
command ivhettest after estimation by ivreg, ivreg2, or ivgmm0. We present the
Pagan–Hall test here in the format and notation of the original
 Exact

White (1980) and White (1982)
 Suffix

tests, however, to facilitate comparisons with the other tests noted above.⁷
Let Ψ be the n×p matrix of indicator variables hypothesized to be related to the
heteroskedasticity in the equation, with typical row Ψi.
 (check this in PDF content)

 Start

29737
 Prefix

These indicator variables must
be exogenous, typically either instruments or functions of the instruments. Common
choices would be:
1. The levels, squares, and cross–products of the instruments Z (excluding the constant), as in the
 Exact

White (1980)
 Suffix

test. This is the default in ivhettest.
2. The levels only of the instruments Z (excluding the constant). This is available
in ivhettest by specifying the ivlev option.
6 For a more detailed discussion, see Pagan and Hall (1983) or Godfrey (1988), pp. 189–90.
7 We note here that the original Pagan–Hall paper has a serious typo in the presentation of their
non–normality–robust statistic.
 (check this in PDF content)

 Start

29949
 Prefix

The levels, squares, and cross–products of the instruments Z (excluding the constant), as in the White (1980) test. This is the default in ivhettest.
2. The levels only of the instruments Z (excluding the constant). This is available
in ivhettest by specifying the ivlev option.
6 For a more detailed discussion, see
 Exact

Pagan and Hall (1983)
 Suffix

or Godfrey (1988), pp. 189–90.
7 We note here that the original Pagan–Hall paper has a serious typo in the presentation of their
non–normality–robust statistic. Their equation (58b), p. 195, is missing the term (in their terminology)
$-2\mu_3\psi(\hat X'\hat X)^{-1}\hat X'D(D'D)^{-1}$.
 (check this in PDF content)

 Start

29974
 Prefix

This is the default in ivhettest.
2. The levels only of the instruments Z (excluding the constant). This is available
in ivhettest by specifying the ivlev option.
6 For a more detailed discussion, see Pagan and Hall (1983) or
 Exact

Godfrey (1988),
 Suffix

pp. 189–90.
7 We note here that the original Pagan–Hall paper has a serious typo in the presentation of their
non–normality–robust statistic. Their equation (58b), p. 195, is missing the term (in their terminology)
$-2\mu_3\psi(\hat X'\hat X)^{-1}\hat X'D(D'D)^{-1}$.
 (check this in PDF content)

 Start

30293
 Prefix

.
6 For a more detailed discussion, see Pagan and Hall (1983) or Godfrey (1988), pp. 189–90.
7 We note here that the original Pagan–Hall paper has a serious typo in the presentation of their
non–normality–robust statistic. Their equation (58b), p. 195, is missing the term (in their terminology)
$-2\mu_3\psi(\hat X'\hat X)^{-1}\hat X'D(D'D)^{-1}$. The typo reappears in the discussion of the test by
 Exact

Godfrey (1988).
 Suffix

The correction published in Pesaran and Taylor (1999) is incomplete, as it applies only to the version
of the Pagan–Hall test with a single indicator variable.
3. The “fitted value” of the dependent variable.
 (check this in PDF content)

 Start

30337
 Prefix

Their equation (58b), p. 195, is missing the term (in their terminology)
$-2\mu_3\psi(\hat X'\hat X)^{-1}\hat X'D(D'D)^{-1}$. The typo reappears in the discussion of the test by Godfrey (1988). The correction published in
 Exact

Pesaran and Taylor (1999)
 Suffix

is incomplete, as it applies only to the version
of the Pagan–Hall test with a single indicator variable.
3. The “fitted value” of the dependent variable. This is not the usual fitted value of
the dependent variable, $X\hat\beta$.
 (check this in PDF content)

 Start

31333
 Prefix

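Let
\[
\begin{aligned}
\bar\Psi &= \frac{1}{n}\sum_{i=1}^{n}\Psi_i && \text{dimension} = n \times p \\
\hat D &\equiv \frac{1}{n}\sum_{i=1}^{n}\Psi_i'(\hat u_i^2 - \hat\sigma^2) && \text{dimension} = n \times 1 \\
\hat\Gamma &= \frac{1}{n}\sum_{i=1}^{n}(\Psi_i - \bar\Psi)'\hat X_i \hat u_i && \text{dimension} = p \times K \\
\hat\mu_3 &= \frac{1}{n}\sum_{i=1}^{n}\hat u_i^3 \\
\hat\mu_4 &= \frac{1}{n}\sum_{i=1}^{n}\hat u_i^4 \\
\hat X &= P_Z X
\end{aligned}
\tag{38}
\]
If $u_i$ is homoskedastic and independent of $Z_i$, then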
 Exact

Pagan and Hall (1983)
 Suffix

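(Theorem 8) show that under the null of no heteroskedasticity,
\[ n\hat D'\hat B^{-1}\hat D \overset{A}{\sim} \chi^2_p \tag{39} \]
where
\[
\begin{aligned}
\hat B &= B_1 + B_2 + B_3 + B_4 \\
B_1 &= (\hat\mu_4 - \hat\sigma^4)\frac{1}{n}(\Psi_i - \bar\Psi)'(\Psi_i - \bar\Psi) \\
B_2 &= -2\hat\mu_3\frac{1}{n}\Psi'\hat X\left(\frac{1}{n}\hat X'\hat X\right)^{-1}\hat\Gamma' \\
B_3 &= B_2' \\
B_4 &= 4\hat\sigma^2\frac{1}{n}\hat\Gamma'\left(\frac{1}{n}\hat X'\hat X\right)^{-1}\hat\Gamma
\end{aligned}
\tag{40}
\]
This is the default statistic produced by ivhettest.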
 (check this in PDF content)

 Start

32804
 Prefix

The Pagan–Hall statistic has not been widely used in practice, perhaps because it
is not a standard feature of most regression packages. For a discussion of the relative
merits of the Pagan–Hall test, including some Monte Carlo results, see
 Exact

Pesaran and Taylor (1999).
 Suffix

Their findings suggest caution in the use of the Pagan–Hall statistic
particularly in small samples; in these circumstances the $nR_c^2$ statistic may be preferred.
4 Testing the relevance and validity of instruments
4.1 Testing the relevance of instruments
An instrumental variable must satisfy two requirements: it must be correlated with
the included endogenous variable(s), and ortho
 (check this in PDF content)

 Start

33642
 Prefix

The
first stage regressions are reduced form regressions of the endogenous variables $X_1$ on
the full set of instruments Z; the relevant test statistics here relate to the explanatory
power of the excluded instruments $Z_1$ in these regressions. A statistic commonly used,
as recommended e.g., by
 Exact

Bound et al. (1995),
 Suffix

is the $R^2$ of the first–stage regression
with the included instruments “partialled–out”.⁸ Alternatively, this may be expressed
as the F–test of the joint significance of the $Z_1$ instruments in the first–stage regression.
 (check this in PDF content)
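For a single endogenous regressor, the partial $R^2$ and the first–stage F–test described above can be sketched from two residual sums of squares: one from regressing the endogenous variable on the included instruments only, and one from regressing it on the full instrument set. The function name `first_stage_stats` and the simulated data are assumptions for illustration; this is not the ivreg2 `first` output.

```python
import numpy as np

def first_stage_stats(x1, X2, Z1):
    """Partial R^2 and F statistic for the excluded instruments Z1.

    x1: endogenous regressor (n,); X2: included instruments, with constant (n, k2);
    Z1: excluded instruments (n, q).
    """
    n = x1.shape[0]

    def rss(W):
        coef, *_ = np.linalg.lstsq(W, x1, rcond=None)
        r = x1 - W @ coef
        return r @ r

    Z = np.column_stack([X2, Z1])        # full instrument set
    rss_r = rss(X2)                      # restricted: included instruments only
    rss_u = rss(Z)                       # unrestricted: all instruments
    q = Z1.shape[1]
    partial_r2 = (rss_r - rss_u) / rss_r
    F = ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))
    return partial_r2, F

# Hypothetical first stage with two reasonably strong excluded instruments.
rng = np.random.default_rng(4)
n = 1000
Z1 = rng.normal(size=(n, 2))
X2 = np.ones((n, 1))
x1 = 1.0 + Z1 @ np.array([0.5, 0.5]) + rng.normal(size=n)

partial_r2, F = first_stage_stats(x1, X2, Z1)
```

The two measures carry the same information: with q excluded instruments and L instruments in total, F equals (partial_r2 / q) / ((1 − partial_r2) / (n − L)).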

 Start

35570
 Prefix

The statistics proposed by
Bound et al. are able to diagnose instrument relevance only in the presence of a single
endogenous regressor. When multiple endogenous regressors are used, other statistics
are required.
One such statistic has been proposed by
 Exact

Shea (1997)
 Suffix

: a “partial $R^2$” measure that
takes the intercorrelations among the instruments into account.⁹ For a model containing
a single endogenous regressor, the two $R^2$ measures are equivalent. The distribution of
Shea’s partial $R^2$ statistic has not been derived, but it may be interpreted like any $R^2$.
 (check this in PDF content)

 Start

36403
 Prefix

The Bound et al. measures and the Shea partial $R^2$ statistic can be obtained via
the first or ffirst options on the ivreg2 command.
The consequence of excluded instruments with little explanatory power is increased
bias in the estimated IV coefficients
 Exact

(Hahn and Hausman (2002b)).
 Suffix

If their explanatory
power in the first stage regression is nil, the model is in effect unidentified with respect to
that endogenous variable; in this case, the bias of the IV estimator is the same as that of
the OLS estimator, IV becomes inconsistent, and nothing is gained from instrumenting
(ibid.
 (check this in PDF content)

 Start

36852
 Prefix

in the first stage regression is nil, the model is in effect unidentified with respect to
that endogenous variable; in this case, the bias of the IV estimator is the same as that of
the OLS estimator, IV becomes inconsistent, and nothing is gained from instrumenting
(ibid.). If the explanatory power is simply “weak”,¹⁰ conventional asymptotics fail.
What is surprising is that, as
 Exact

Staiger and Stock (1997) and
 Suffix

others have shown, the
“weak instrument” problem can arise even when the first stage tests are significant at
conventional levels (5% or 1%) and the researcher is using a large sample. One rule of
thumb is that for a single endogenous regressor, an F–statistic below 10 is cause for
concern (Staiger and Stock (1997) p. 557).
 (check this in PDF content)

 Start

37177
 Prefix

What is surprising is that, as Staiger and Stock (1997) and others have shown, the
“weak instrument” problem can arise even when the first stage tests are significant at
conventional levels (5% or 1%) and the researcher is using a large sample. One rule of
thumb is that for a single endogenous regressor, an F–statistic below 10 is cause for
concern
 Exact

(Staiger and Stock (1997)
 Suffix

p. 557). Since the size of the IV bias is increasing in
the number of instruments (Hahn and Hausman (2002b)), one recommendation when
faced with this problem is to be parsimonious in the choice of instruments.
 (check this in PDF content)

 Start

37283
 Prefix

One rule of
thumb is that for a single endogenous regressor, an F–statistic below 10 is cause for
concern (Staiger and Stock (1997) p. 557). Since the size of the IV bias is increasing in
the number of instruments
 Exact

(Hahn and Hausman (2002b)),
 Suffix

one recommendation when
faced with this problem is to be parsimonious in the choice of instruments. For further
discussion see, e.g., Staiger and Stock (1997), Hahn and Hausman (2002a), Hahn and
Hausman (2002b), and the references cited therein.
9 The Shea partial $R^2$ statistic may be easily computed according to the simplification presented in
Godfrey (1999), who demonstrates that Shea’s statistic f
 (check this in PDF content)

 Start

37444
 Prefix

Since the size of the IV bias is increasing in
the number of instruments (Hahn and Hausman (2002b)), one recommendation when
faced with this problem is to be parsimonious in the choice of instruments. For further
discussion see, e.g.,
 Exact

Staiger and Stock (1997), Hahn and Hausman (2002a), Hahn and Hausman (2002b), and
 Suffix

the references cited therein.
9 The Shea partial $R^2$ statistic may be easily computed according to the simplification presented in
Godfrey (1999), who demonstrates that Shea’s statistic for endogenous regressor i may be expressed as
\[ R_p^2 = \frac{\nu^{OLS}_{i,i}}{\nu^{IV}_{i,i}} \left[ \frac{(1 - R^2_{IV})}{(1 - R^2_{OLS})} \right] \]
where $\nu_{i,i}$ is the estimated asymptotic variance of the coefficient.
10 One approach in the literature, following Staiger and Stock (1
 (check this in PDF content)

 Start

37651
 Prefix

For further
discussion see, e.g., Staiger and Stock (1997), Hahn and Hausman (2002a), Hahn and Hausman (2002b), and the references cited therein.
9 The Shea partial $R^2$ statistic may be easily computed according to the simplification presented in
 Exact

Godfrey (1999),
 Suffix

who demonstrates that Shea’s statistic for endogenous regressor i may be expressed as
\[ R_p^2 = \frac{\nu^{OLS}_{i,i}}{\nu^{IV}_{i,i}} \left[ \frac{(1 - R^2_{IV})}{(1 - R^2_{OLS})} \right] \]
where $\nu_{i,i}$ is the estimated asymptotic variance of the coefficient.
10 One approach in the literature, following Staiger and Stock (1997), is to define “weak” as meaning
that the first stage reduced form coefficients are in a $N^{-1/2}$ neighborhood of zero, or equivalently, holding
 (check this in PDF content)

 Start

37894
 Prefix

(2002b), and the references cited therein.
9 The Shea partial $R^2$ statistic may be easily computed according to the simplification presented in
Godfrey (1999), who demonstrates that Shea’s statistic for endogenous regressor $i$ may be expressed as
$$R^2_p = \frac{\nu^{OLS}_{i,i}}{\nu^{IV}_{i,i}} \left[ \frac{(1 - R^2_{IV})}{(1 - R^2_{OLS})} \right]$$
where $\nu_{i,i}$ is the estimated asymptotic variance of the coefficient.
10 One approach in the literature, following
 Exact

Staiger and Stock (1997),
 Suffix

is to define “weak” as meaning
that the first stage reduced form coefficients are in a $N^{1/2}$ neighborhood of zero, or equivalently, holding
the expectation of the first stage F statistic constant as the sample size increases.
 (check this in PDF content)

 Start

38149
 Prefix

i
νIVi,i
[
(1−R2IV)
(1−R2OLS)
]
where $\nu_{i,i}$ is the estimated asymptotic variance of the coefficient.
10 One approach in the literature, following Staiger and Stock (1997), is to define “weak” as meaning
that the first stage reduced form coefficients are in a $N^{1/2}$ neighborhood of zero, or equivalently, holding
the expectation of the first stage F statistic constant as the sample size increases. See also
 Exact

Hahn and Hausman (2002b).
 Suffix

4.2 Overidentifying restrictions in GMM
We turn now to the second requirement for an instrumental variable. How can the
instrument’s independence from an unobservable error process be ascertained?
 (check this in PDF content)

 Start

39315
 Prefix

as a standard diagnostic in any overidentified instrumental variables estimation.11 These are tests of the joint hypotheses
of correct model specification and the orthogonality conditions, and a rejection may
properly call either or both of those hypotheses into question.
In the context of GMM, the overidentifying restrictions may be tested via the commonly employed J statistic of
 Exact

Hansen (1982).
 Suffix

This statistic is none other than the value
of the GMM objective function (20), evaluated at the efficient GMM estimator $\hat\beta_{EGMM}$.
Under the null,
$$J(\hat\beta_{EGMM}) = n\, g(\hat\beta)' \hat{S}^{-1} g(\hat\beta) \overset{A}{\sim} \chi^2_{L-K} \qquad (41)$$
In the case of heteroskedastic errors, the matrix $\hat{S}$ is estimated using the $\hat\Omega$ matrix (27),
and the J statistic becomes
$$J(\hat\beta_{EGMM}) = \hat{u}' Z (Z' \hat\Omega Z)^{-1} Z' \hat{u} \overset{A}{\sim} \chi^2_{L-K} \qquad (42)$$
With clustered errors, the $\hat\Omega_C$ matrix (37) can be us
 (check this in PDF content)
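As a rough illustration of (42), the sketch below computes a two-step efficient GMM estimate and the corresponding J statistic with NumPy. It is a sketch under stated assumptions rather than the ivreg2 code: the function name `hansen_j`, the simulated heteroskedastic data and the use of 2SLS residuals for the first step are choices made here for illustration.

```python
import numpy as np

def hansen_j(y, X, Z):
    """Two-step efficient GMM estimate and Hansen's J under
    heteroskedasticity of unknown form (hypothetical helper)."""
    # Step 1: 2SLS residuals, used to form Omega-hat = diag(u_i^2).
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
    u_iv = y - X @ b_iv
    ZOZ = (Z * (u_iv ** 2)[:, None]).T @ Z      # Z' Omega-hat Z
    # Step 2: efficient GMM with weighting matrix (Z' Omega-hat Z)^{-1}.
    W = np.linalg.inv(ZOZ)
    XZ = X.T @ Z
    b_gmm = np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))
    u = y - X @ b_gmm
    g = Z.T @ u
    J = float(g @ W @ g)                        # u'Z (Z' Omega-hat Z)^{-1} Z'u
    return b_gmm, J, Z.shape[1] - X.shape[1]    # df = L - K

# Simulated data with heteroskedastic errors, for illustration only.
rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=(n, 3))
e = rng.normal(size=n) * (1.0 + 0.5 * np.abs(z[:, 0]))
x = z.sum(axis=1) + 0.5 * e + rng.normal(size=n)   # endogenous regressor
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])               # L = 4, K = 2

b_gmm, J, df = hansen_j(y, X, Z)
# Exactly identified case: one excluded instrument only, so J collapses to zero.
_, J0, df0 = hansen_j(y, X, Z[:, :2])
print(f"J = {J:.3f} with {df} degrees of freedom")
```

In the exactly identified case the sample moment conditions are satisfied exactly, which is why no test of overidentifying restrictions is available there.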

 Start

40724
 Prefix

The J statistic is calculated and displayed by ivreg2
when the gmm, robust, or cluster options are specified. In the last case, the J statistic
will be consistent in the presence of arbitrary intra–cluster correlation. This can be
quite important in practice:
 Exact

Hoxby and Paserman (1998)
 Suffix

have shown that the presence
of intra–cluster correlation can readily cause a standard overidentification statistic to
over–reject the null.
11 Thus Davidson and MacKinnon (1993), p. 236: “Tests of overidentifying restrictions should be
calculated routinely whenever one computes IV estimates.
 (check this in PDF content)

 Start

40896
 Prefix

This can be
quite important in practice: Hoxby and Paserman (1998) have shown that the presence
of intra–cluster correlation can readily cause a standard overidentification statistic to
over–reject the null.
11 Thus
 Exact

Davidson and MacKinnon (1993),
 Suffix

p. 236: “Tests of overidentifying restrictions should be
calculated routinely whenever one computes IV estimates.” Sargan’s own view, cited in Godfrey (1988),
p. 145, was that regression analysis without testing the orthogonality assumptions is a “pious fraud”.
4.3 Overidentifying restrictions in IV
In the special case of linear instrumental variables under conditional homoskedasticity,
t
 (check this in PDF content)

 Start

41081
 Prefix

can be
quite important in practice: Hoxby and Paserman (1998) have shown that the presence
of intra–cluster correlation can readily cause a standard overidentification statistic to
over–reject the null.
11 Thus Davidson and MacKinnon (1993), p. 236: “Tests of overidentifying restrictions should be
calculated routinely whenever one computes IV estimates.” Sargan’s own view, cited in
 Exact

Godfrey (1988),
 Suffix

p. 145, was that regression analysis without testing the orthogonality assumptions is a “pious fraud”.
4.3 Overidentifying restrictions in IV
In the special case of linear instrumental variables under conditional homoskedasticity,
the concept of the J statistic considerably predates the development of GMM estimation
techniques.
 (check this in PDF content)

 Start

41568
 Prefix

analysis without testing the orthogonality assumptions is a “pious fraud”.
4.3 Overidentifying restrictions in IV
In the special case of linear instrumental variables under conditional homoskedasticity,
the concept of the J statistic considerably predates the development of GMM estimation
techniques. The ivreg2 procedure routinely presents this test, labelled as Sargan’s
statistic
 Exact

(Sargan (1958))
 Suffix

in the estimation output.
Just as IV is a special case of GMM, Sargan’s statistic is a special case of Hansen’s
J under the assumption of conditional homoskedasticity. Thus if we use the IV optimal
weighting matrix (34) together with the expression for J (41), we obtain
$$\text{Sargan's statistic} = \frac{1}{\hat\sigma^2}\, \hat{u}' Z (Z'Z)^{-1} Z' \hat{u} = \frac{\hat{u}' Z (Z'Z)^{-1} Z' \hat{u}}{\hat{u}'\hat{u}/n} = \frac{\hat{u}' P_Z \hat{u}}{\hat{u}'\hat{u}/n} \qquad (43)$$
It is easy to see from (43) that Sargan’
 (check this in PDF content)

 Start

42559
 Prefix

The $nR^2_u$ of this auxiliary regression
will have a $\chi^2_{L-K}$ distribution under the null hypothesis that all instruments are orthogonal to the error. This auxiliary regression test is that performed by overid after
ivreg, and the statistic is also automatically reported by ivreg2.12 A good discussion
of this test is presented in
 Exact

Wooldridge (2002),
 Suffix

p. 123.
The literature contains several variations on this test. The main idea behind these
variations is that there is more than one way to consistently estimate the variance in
the denominator of (43).
 (check this in PDF content)

 Start

42832
 Prefix

The literature contains several variations on this test. The main idea behind these
variations is that there is more than one way to consistently estimate the variance in
the denominator of (43). The most important of these is that of
 Exact

Basmann (1960).
 Suffix

Independently of Sargan, Basmann proposed an $F(L-K,\, n-L)$ test of overidentifying
restrictions:
$$\text{Basmann's } F \text{ statistic} = \frac{\hat{u}' P_Z \hat{u} / (L-K)}{\hat{u}' M_Z \hat{u} / (n-L)} \qquad (44)$$
where $M_Z \equiv I - P_Z$ is the “annihilator” matrix and $L$ is the total number of instruments.
 (check this in PDF content)
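Equations (43) and (44) can be checked numerically with a short sketch. The code below is illustrative only: the function name `sargan_basmann` and the simulated homoskedastic data are assumptions, and the projection $P_Z \hat{u}$ is computed without forming the $n \times n$ matrix $P_Z$.

```python
import numpy as np

def sargan_basmann(y, X, Z):
    """Sargan's statistic (43) and Basmann's F statistic (44) after IV
    estimation under conditional homoskedasticity (hypothetical helper)."""
    n, L = Z.shape
    K = X.shape[1]
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
    u = y - X @ b_iv                                   # IV residuals
    uPu = float(u @ (Z @ np.linalg.solve(Z.T @ Z, Z.T @ u)))   # u' P_Z u
    uMu = float(u @ u) - uPu                           # u' M_Z u
    sargan = uPu / (u @ u / n)
    basmann_f = (uPu / (L - K)) / (uMu / (n - L))
    return sargan, basmann_f, u

# Simulated data, for illustration only.
rng = np.random.default_rng(2)
n = 500
z = rng.normal(size=(n, 3))
e = rng.normal(size=n)
x = z.sum(axis=1) + 0.5 * e + rng.normal(size=n)       # endogenous regressor
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
L, K = Z.shape[1], X.shape[1]

sargan, basmann_f, u = sargan_basmann(y, X, Z)

# Auxiliary-regression route: n times the uncentered R^2 from regressing
# the IV residuals on the full instrument set.
fitted = Z @ np.linalg.lstsq(Z, u, rcond=None)[0]
n_r2_u = n * (fitted @ fitted) / (u @ u)
print(f"Sargan = {sargan:.3f}, n*R2_u = {n_r2_u:.3f}, Basmann F = {basmann_f:.3f}")
```

Because $\hat{u}' M_Z \hat{u} = \hat{u}'\hat{u} - \hat{u}' P_Z \hat{u}$, the two statistics are monotonic transformations of one another; they differ only in how the error variance in the denominator is estimated.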

 Start

44076
 Prefix

Consequently, overid calculates the uncentered $R^2$ itself; the uncentered
total sum of squares of the auxiliary regression needed for the denominator of $R^2_u$ is simply the residual
sum of squares of the original IV regression.
13 See
 Exact

Davidson and MacKinnon (1993),
 Suffix

pp. 235–6. The Basmann statistic uses the error variance
from the estimate of their equation (7.54), and the pseudo–F form of the Basmann statistic is given by
equation (7.55); the Sargan statistic is given by their (7.57).
 (check this in PDF content)

 Start

46592
 Prefix

Another common problem arises when
the researcher has prior suspicions about the validity of a subset of instruments, and
wishes to test them.
In these contexts, a “difference–in–Sargan” statistic may usefully be employed.15
The test is known under other names as well, e.g.,
 Exact

Ruud (2000)
 Suffix

calls it the “distance
difference” statistic, and Hayashi (2000) follows Eichenbaum et al. (1988) and dubs it
the C statistic; we will use the latter term. The C test allows us to test a subset of
the original set of orthogonality conditions.
 (check this in PDF content)

 Start

46653
 Prefix

In these contexts, a “difference–in–Sargan” statistic may usefully be employed.15
The test is known under other names as well, e.g., Ruud (2000) calls it the “distance
difference” statistic, and
 Exact

Hayashi (2000)
 Suffix

follows Eichenbaum et al. (1988) and dubs it
the C statistic; we will use the latter term. The C test allows us to test a subset of
the original set of orthogonality conditions. The statistic is computed as the difference
between two Sargan statistics (or, for efficient GMM, two J statistics): that for the
(restricted, fully efficient) regression using the entire set of overidentifying res
 (check this in PDF content)

 Start

46676
 Prefix

In these contexts, a “difference–in–Sargan” statistic may usefully be employed.15
The test is known under other names as well, e.g., Ruud (2000) calls it the “distance
difference” statistic, and Hayashi (2000) follows
 Exact

Eichenbaum et al. (1988) and
 Suffix

dubs it
the C statistic; we will use the latter term. The C test allows us to test a subset of
the original set of orthogonality conditions. The statistic is computed as the difference
between two Sargan statistics (or, for efficient GMM, two J statistics): that for the
(restricted, fully efficient) regression using the entire set of overidentifying restrictions,
versus that for the (unres
 (check this in PDF content)

 Start

47525
 Prefix

For excluded instruments, this is equivalent to dropping them from the instrument list.
For included instruments, the C test hypothecates placing them in the list of included
endogenous variables: in essence, treating them as endogenous regressors. The C test,
14 See
 Exact

Ahn (1995),
 Suffix

Proposition 1, or, for an alternative formulation, Wooldridge (1995), Procedure
3.2.
15 See Hayashi (2000), pp. 218–21 and pp. 232–34 or Ruud (2000), Chapter 22, for comprehensive
presentations.
distributed $\chi^2$ with degrees of freedom equal to the loss of overidentifying restrictions
(i.e., the number of suspect instruments being tested), has the null hypothesis that the
specified variables are pro
 (check this in PDF content)

 Start

47588
 Prefix

For included instruments, the C test hypothecates placing them in the list of included
endogenous variables: in essence, treating them as endogenous regressors. The C test,
14 See Ahn (1995), Proposition 1, or, for an alternative formulation,
 Exact

Wooldridge (1995),
 Suffix

Procedure
3.2.
15 See Hayashi (2000), pp. 218–21 and pp. 232–34 or Ruud (2000), Chapter 22, for comprehensive
presentations.
distributed $\chi^2$ with degrees of freedom equal to the loss of overidentifying restrictions
(i.e., the number of suspect instruments being tested), has the null hypothesis that the
specified variables are proper instruments.
 (check this in PDF content)

 Start

47626
 Prefix

For included instruments, the C test hypothecates placing them in the list of included
endogenous variables: in essence, treating them as endogenous regressors. The C test,
14 See Ahn (1995), Proposition 1, or, for an alternative formulation, Wooldridge (1995), Procedure
3.2.
15 See
 Exact

Hayashi (2000),
 Suffix

pp. 218–21 and pp. 232–34 or Ruud (2000), Chapter 22, for comprehensive
presentations.
distributed $\chi^2$ with degrees of freedom equal to the loss of overidentifying restrictions
(i.e., the number of suspect instruments being tested), has the null hypothesis that the
specified variables are proper instruments.
 (check this in PDF content)

 Start

47672
 Prefix

For included instruments, the C test hypothecates placing them in the list of included
endogenous variables: in essence, treating them as endogenous regressors. The C test,
14 See Ahn (1995), Proposition 1, or, for an alternative formulation, Wooldridge (1995), Procedure
3.2.
15 See Hayashi (2000), pp. 218–21 and pp. 232–34 or
 Exact

Ruud (2000),
 Suffix

Chapter 22, for comprehensive
presentations.
distributed $\chi^2$ with degrees of freedom equal to the loss of overidentifying restrictions
(i.e., the number of suspect instruments being tested), has the null hypothesis that the
specified variables are proper instruments.
 (check this in PDF content)

 Start

48866
 Prefix

More
precisely, $\hat{S}$ from the restricted estimation is used to form the restricted J statistic, and
the submatrix of $\hat{S}$ with rows/columns corresponding to the unrestricted estimation is
used to form the J statistic for the unrestricted estimation; see
 Exact

Hayashi (2000),
 Suffix

p. 220.
The C test is conducted in ivreg2 by specifying the orthog option, and listing the
instruments (either included or excluded) to be challenged. The equation must still be
identified with these instruments either removed or reconsidered as endogenous if the
C statistic is to be calculated.
 (check this in PDF content)
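The construction just described can be sketched for the simplest case: IV under conditional homoskedasticity, with the suspect instruments drawn from the excluded instruments. The function names and simulated data below are hypothetical and this is not the orthog implementation. In this setting, using the restricted $\hat{S}$ for both statistics amounts to using the restricted error-variance estimate in both Sargan statistics.

```python
import numpy as np

def iv_residuals(y, X, Z):
    """Residuals from IV (2SLS) estimation of y on X with instruments Z."""
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    b_iv = np.linalg.solve(PZX.T @ X, PZX.T @ y)
    return y - X @ b_iv

def proj_quad(u, Z):
    """Quadratic form u' P_Z u, computed without forming P_Z."""
    Zu = Z.T @ u
    return float(Zu @ np.linalg.solve(Z.T @ Z, Zu))

def c_statistic(y, X, Z, suspect):
    """Difference-in-Sargan (C) statistic for the excluded instruments in the
    columns of Z listed in `suspect` (hypothetical helper)."""
    n = y.shape[0]
    keep = [j for j in range(Z.shape[1]) if j not in suspect]
    if len(keep) < X.shape[1]:
        raise ValueError("equation is not identified without the suspect instruments")
    u_r = iv_residuals(y, X, Z)              # restricted: all instruments used
    sigma2 = float(u_r @ u_r) / n            # restricted variance, used twice
    J_r = proj_quad(u_r, Z) / sigma2
    Z_u = Z[:, keep]                         # unrestricted: suspects dropped
    u_u = iv_residuals(y, X, Z_u)
    J_u = proj_quad(u_u, Z_u) / sigma2
    return J_r - J_u, len(suspect)

# Simulated data in which the last instrument is invalid by construction.
rng = np.random.default_rng(3)
n = 2000
z = rng.normal(size=(n, 3))
e = rng.normal(size=n)
u_true = 0.8 * z[:, 2] + e                   # error correlated with z[:, 2]
x = z.sum(axis=1) + 0.5 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x + u_true
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

C, df = c_statistic(y, X, Z, suspect=[3])    # column 3 of Z is z[:, 2]
print(f"C = {C:.2f} with {df} degree(s) of freedom")
```

Using one variance estimate for both statistics keeps the difference non-negative, which mirrors the role of the common $\hat{S}$ in the GMM case.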

 Start

49783
 Prefix

This illustrates how the Hansen–Sargan overidentification test is
an “omnibus” test for the failure of any of the instruments to satisfy the orthogonality
conditions, but at the same time requires that the investigator believe that at least some
of the instruments are valid; see
 Exact

Ruud (2000),
 Suffix

p. 577.
4.5 Tests of overidentifying restrictions as Lagrange multiplier (score) tests
The Sargan test can be viewed as analogous to a Lagrange multiplier (LM) or score
test.16 In the case of OLS, the resemblance becomes exact.
 (check this in PDF content)

 Start

51016
 Prefix

If the gmm option is chosen, HOLS estimates
are reported along with a robust LM statistic. As usual, the cluster option generates
16 For a detailed discussion of the relationship between the different types of tests in a GMM framework, see
 Exact

Ruud (2000),
 Suffix

Chapter 22.
a statistic that is robust to arbitrary intra–cluster correlation.
If the estimation method is OLS but the error is not homoskedastic, then the standard LM test is no longer valid. A heteroskedasticity–robust version is, however, available.17 The robust LM statistic for OLS is numerically equivalent to the J statistic
from feasible efficient two–step GMM, i.e.
 (check this in PDF content)

 Start

52136
 Prefix

As Wooldridge states, “...an important cost of
performing IV estimation when x and u are uncorrelated: the asymptotic variance of the
IV estimator is always larger, and sometimes much larger, than the asymptotic variance
of the OLS estimator.”
 Exact

(Wooldridge (2003),
 Suffix

p. 490) Naturally, this loss of efficiency
is a price worth paying if the OLS estimator is biased and inconsistent; thus a test of
the appropriateness of OLS, and the necessity to resort to instrumental variables or
GMM methods, would be very useful.
 (check this in PDF content)

 Start

53494
 Prefix

Denote by $\hat\beta^c$ the estimator that is consistent under both the null and the alternative
hypotheses, and by $\hat\beta^e$ the estimator that is fully efficient under the null but inconsistent
if the null is not true. The
 Exact

Hausman (1978)
 Suffix

specification test takes the quadratic form
$$H = n(\hat\beta^c - \hat\beta^e)'\, D^{-}\, (\hat\beta^c - \hat\beta^e)$$
where
$$D = \left( V(\hat\beta^c) - V(\hat\beta^e) \right) \qquad (45)$$
and where $V(\hat\beta)$ denotes a consistent estimate of the asymptotic variance of $\beta$, and the
17 See Wooldridge (2002), pp. 58–61, and Wooldridge (1995) for more detailed discussion.
operator $^{-}$ denotes a generalized inverse.
 (check this in PDF content)

 Start

53692
 Prefix

The Hausman (1978) specification test takes the quadratic form
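$$H = n(\hat\beta^c - \hat\beta^e)'\, D^{-}\, (\hat\beta^c - \hat\beta^e)$$
where
$$D = \left( V(\hat\beta^c) - V(\hat\beta^e) \right) \qquad (45)$$
and where $V(\hat\beta)$ denotes a consistent estimate of the asymptotic variance of $\beta$, and the
17 See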
 Exact

Wooldridge (2002),
 Suffix

pp. 58–61, and Wooldridge (1995) for more detailed discussion.
operator $^{-}$ denotes a generalized inverse.
A Hausman statistic for a test of endogeneity in an IV regression is formed by choosing OLS as the efficient estimator $\hat\beta^e$ and IV as the inefficient but consistent estimator
$\hat\beta^c$.
 (check this in PDF content)

 Start

53726
 Prefix

The Hausman (1978) specification test takes the quadratic form
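$$H = n(\hat\beta^c - \hat\beta^e)'\, D^{-}\, (\hat\beta^c - \hat\beta^e)$$
where
$$D = \left( V(\hat\beta^c) - V(\hat\beta^e) \right) \qquad (45)$$
and where $V(\hat\beta)$ denotes a consistent estimate of the asymptotic variance of $\beta$, and the
17 See Wooldridge (2002), pp. 58–61, and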
 Exact

Wooldridge (1995)
 Suffix

for more detailed discussion.
operator $^{-}$ denotes a generalized inverse.
A Hausman statistic for a test of endogeneity in an IV regression is formed by choosing OLS as the efficient estimator $\hat\beta^e$ and IV as the inefficient but consistent estimator
$\hat\beta^c$.
 (check this in PDF content)

 Start

55780
 Prefix

If a common
estimate of $\sigma$ is used, then the generalized inverse of $D$ is guaranteed to exist and a
positive test statistic is guaranteed.19
If the Hausman statistic is formed using the OLS estimate of the error variance,
then the $D$ matrix in Equation (45) becomes
$$D = \hat\sigma^2_{OLS} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (47)$$
This version of the endogeneity test was first proposed by
 Exact

Durbin (1954) and
 Suffix

separately
by Wu (1973) (his $T_4$ statistic) and Hausman (1978). It can be obtained within Stata by
using hausman with the sigmamore option in conjunction with estimation by regress,
ivreg and/or ivreg2.
If the Hausman statistic is formed using the IV estimate of the error variance, then
the $D$ matrix becomes
$$D = \hat\sigma^2_{IV} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (48)$$
18 Readers should also bear in mind here and below that the estimat
 (check this in PDF content)
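A minimal numerical sketch of the Durbin form of the statistic, using the $D$ matrix in (47), is given below. It is not Stata's hausman command: the function name `durbin_hausman`, the simulated data and the eigenvalue cutoff of `1e-8` are assumptions made for illustration. The sketch also assumes that $D$ is built from finite-sample variance estimates, so the factor $n$ in the quadratic form is already absorbed.

```python
import numpy as np

def durbin_hausman(y, X, Z):
    """Hausman endogeneity statistic with the OLS error variance used in both
    covariance matrices, as in (47) (hypothetical helper)."""
    n = y.shape[0]
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
    XPX = X.T @ PZX                                  # X' P_Z X
    b_iv = np.linalg.solve(XPX, PZX.T @ y)
    u_ols = y - X @ b_ols
    sigma2_ols = float(u_ols @ u_ols) / n            # no small-sample correction
    D = sigma2_ols * (np.linalg.inv(XPX) - np.linalg.inv(XtX))
    D = (D + D.T) / 2.0                              # remove rounding asymmetry
    d = b_iv - b_ols
    # D is singular by construction, so a generalized inverse is required.
    H = float(d @ np.linalg.pinv(D, rcond=1e-8, hermitian=True) @ d)
    eig = np.linalg.eigvalsh(D)
    rank = int((eig > 1e-8 * eig.max()).sum())       # degrees of freedom
    return H, rank

# Simulated data with one endogenous regressor (K1 = 1), for illustration only.
rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=(n, 2))
e = rng.normal(size=n)
x = z.sum(axis=1) + 0.8 * e + rng.normal(size=n)     # correlated with the error
y = 1.0 + 2.0 * x + e
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

H, rank = durbin_hausman(y, X, Z)
print(f"Durbin-Hausman statistic = {H:.2f}, rank of D = {rank}")
```

The explicit cutoff matters: with a near-zero cutoff, rounding error in the null directions of $D$ could be inverted into very large values and distort both the statistic and the apparent degrees of freedom.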

 Start

55811
 Prefix

If a common
estimate of $\sigma$ is used, then the generalized inverse of $D$ is guaranteed to exist and a
positive test statistic is guaranteed.19
If the Hausman statistic is formed using the OLS estimate of the error variance,
then the $D$ matrix in Equation (45) becomes
$$D = \hat\sigma^2_{OLS} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (47)$$
This version of the endogeneity test was first proposed by Durbin (1954) and separately
by
 Exact

Wu (1973)
 Suffix

(his $T_4$ statistic) and Hausman (1978). It can be obtained within Stata by
using hausman with the sigmamore option in conjunction with estimation by regress,
ivreg and/or ivreg2.
If the Hausman statistic is formed using the IV estimate of the error variance, then
the $D$ matrix becomes
$$D = \hat\sigma^2_{IV} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (48)$$
18 Readers should also bear in mind here and below that the estimates of the error variance
 (check this in PDF content)

 Start

55842
 Prefix

, then the generalized inverse of $D$ is guaranteed to exist and a
positive test statistic is guaranteed.19
If the Hausman statistic is formed using the OLS estimate of the error variance,
then the $D$ matrix in Equation (45) becomes
$$D = \hat\sigma^2_{OLS} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (47)$$
This version of the endogeneity test was first proposed by Durbin (1954) and separately
by Wu (1973) (his $T_4$ statistic) and
 Exact

Hausman (1978).
 Suffix

It can be obtained within Stata by
using hausman with the sigmamore option in conjunction with estimation by regress,
ivreg and/or ivreg2.
If the Hausman statistic is formed using the IV estimate of the error variance, then
the $D$ matrix becomes
$$D = \hat\sigma^2_{IV} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right) \qquad (48)$$
18 Readers should also bear in mind here and below that the estimates of the error variances may
or may not have small–sample cor
 (check this in PDF content)

 Start

56482
 Prefix

$\hat\sigma^2_{IV} \left( (X' P_Z X)^{-1} - (X'X)^{-1} \right)$ (48)
18 Readers should also bear in mind here and below that the estimates of the error variances may
or may not have small–sample corrections, according to the estimation package used and the options
chosen. If one of the variance–covariance matrices in $D$ uses a small–sample correction, then so should
the other.
19 The matrix difference in (47) and (48) has rank $K_1$; see
 Exact

Greene (2000),
 Suffix

pp. 384–385. Intuitively,
the variables being tested are those not shared by $X$ and $Z$, namely the $K_1$ endogenous regressors
$X_1$. The Hausman statistic for the endogeneity test can also be expressed in terms of a test of the
coefficients of the endogenous regressors alone, with the rest of the $\beta$s removed.
 (check this in PDF content)

 Start

56953
 Prefix

The Hausman statistic for the endogeneity test can also be expressed in terms of a test of the
coefficients of the endogenous regressors alone, with the rest of the $\beta$s removed. In this alternate form,
the matrix difference in the expression equivalent to (47) is positive definite and a generalized inverse
is not required. See
 Exact

Bowden and Turkington (1984),
 Suffix

pp. 50–51.
This version of the statistic was proposed separately by Wu (1973) (his $T_3$ statistic)
and Hausman (1978). It can be obtained within Stata by using hausman with the
(undocumented) sigmaless option.
 (check this in PDF content)

 Start

57109
 Prefix

In this alternate form,
the matrix difference in the expression equivalent to (47) is positive definite and a generalized inverse
is not required. See Bowden and Turkington (1984), pp. 50–51.
This version of the statistic was proposed separately by
 Exact

Wu (1973)
 Suffix

(his $T_3$ statistic)
and Hausman (1978). It can be obtained within Stata by using hausman with the
(undocumented) sigmaless option.
Use of hausman with the sigmamore or sigmaless options avoids the additional
annoyance that because Stata’s hausman tries to deduce the correct degrees of freedom
for the test from the rank of the matrix $D$, it may sometimes come up with the wrong
answer.
 (check this in PDF content)

 Start

57140
 Prefix

In this alternate form,
the matrix difference in the expression equivalent to (47) is positive definite and a generalized inverse
is not required. See Bowden and Turkington (1984), pp. 50–51.
This version of the statistic was proposed separately by Wu (1973) (his $T_3$ statistic)
and
 Exact

Hausman (1978).
 Suffix

It can be obtained within Stata by using hausman with the
(undocumented) sigmaless option.
Use of hausman with the sigmamore or sigmaless options avoids the additional
annoyance that because Stata’s hausman tries to deduce the correct degrees of freedom
for the test from the rank of the matrix $D$, it may sometimes come up with the wrong
answer.
 (check this in PDF content)

 Start

58467
 Prefix

Given the choice between forming the Hausman statistic using either $\hat\sigma^2_{OLS}$ or $\hat\sigma^2_{IV}$,
the standard choice is the former (the Durbin statistic) because under the null both
are consistent but the former is more efficient. The Durbin flavor of the test has the
additional advantage of superior performance when instruments are weak
 Exact

(Staiger and Stock (1997)).
 Suffix

5.2 Extensions: Testing a subset of the regressors for endogeneity, and heteroskedastic–robust testing for IV and GMM estimation
In some contexts, the researcher may be certain that one or more regressors in $X_1$ is
endogenous but may question the endogeneity of the others.
 (check this in PDF content)

 Start

62737
 Prefix

In the conditional heteroskedasticity case, the
degrees of freedom will be $L^e - L^c$ if $L^e - L^c \le K_1^c$ but unknown otherwise (making the
test impractical).22
What, then, is the difference between the GMM C test and the Hausman specification test? In fact, because the two estimators being tested are both GMM estimators,
the Hausman specification test is a test of linear combinations of orthogonality conditions
 Exact

(Ruud (2000),
 Suffix

pp. 578–584). When the particular linear combination of orthogonality conditions being tested is the same for the C test and for the Hausman test,
the two test statistics will be numerically equivalent.
 (check this in PDF content)

 Start

63226
 Prefix

We can state this more precisely
as follows: If $L^e - L^c \le K_1^c$, the C statistic and the Hausman statistic are numerically
21 Users beware: the sigmamore option following a robust estimation will not only fail to accomplish
this, it will generate an invalid test statistic as well.
22 See
 Exact

Hausman and Taylor (1981) and Newey (1985),
 Suffix

summarized by Hayashi (2000), pp. 233–34.
equivalent.23 If $L^e - L^c > K_1^c$, the two statistics will be numerically different, the C
statistic will have $L^e - L^c$ degrees of freedom, and the Hausman statistic will have $K_1^c$
degrees of freedom in the conditional homoskedasticity case (and an unknown number
of degrees of freedom in the conditional heteroskedasticity case).
 (check this in PDF content)

 Start

63284
 Prefix

We can state this more precisely
as follows: If $L^e - L^c \le K_1^c$, the C statistic and the Hausman statistic are numerically
21 Users beware: the sigmamore option following a robust estimation will not only fail to accomplish
this, it will generate an invalid test statistic as well.
22 See Hausman and Taylor (1981) and Newey (1985), summarized by
 Exact

Hayashi (2000),
 Suffix

pp. 233–34.
equivalent.23 If $L^e - L^c > K_1^c$, the two statistics will be numerically different, the C
statistic will have $L^e - L^c$ degrees of freedom, and the Hausman statistic will have $K_1^c$
degrees of freedom in the conditional homoskedasticity case (and an unknown number
of degrees of freedom in the conditional heteroskedasticity case).
 (check this in PDF content)

 Start

65078
 Prefix

faces
a trade–off when deciding which of the two tests to use: when the two tests differ, the
Hausman test is a test of linear combinations of moment conditions, and is more powerful
than the C test at detecting violations on restrictions of these linear combinations, but
the latter test will be able to detect other violations of moment conditions that the
former test cannot. As
 Exact

Ruud (2000),
 Suffix

p. 585, points out, one of the appealing features
of the Hausman test is that its particular linear combination of moment conditions also
determines the consistency of the more efficient GMM estimator.
 (check this in PDF content)

 Start

68308
 Prefix

Yet another asymptotically equivalent flavor of the DWH test is available for standard IV estimation under conditional homoskedasticity, and is included in the output
of ivendog. This is the test statistic introduced by
 Exact

Wu (1973)
 Suffix

(his $T_2$), and separately shown by Hausman (1978) to be calculated straightforwardly through the use of
auxiliary regressions. We will refer to it as the Wu–Hausman statistic.24
Consider a simplified version of our basic model (1) with a single endogenous regressor $x_1$:
$$y = \beta_1 x_1 + X_2 \beta_2 + u, \qquad (49)$$
with $X_2 \equiv Z_2$ assumed exogenous (including the constant, if one is specified) and with
excluded instruments $Z_1$ as usua
 (check this in PDF content)

 Start

68356
 Prefix

Yet another asymptotically equivalent flavor of the DWH test is available for standard IV estimation under conditional homoskedasticity, and is included in the output
of ivendog. This is the test statistic introduced by Wu (1973) (his $T_2$), and separately shown by
 Exact

Hausman (1978) to
 Suffix

be calculated straightforwardly through the use of
auxiliary regressions. We will refer to it as the Wu–Hausman statistic.24
Consider a simplified version of our basic model (1) with a single endogenous regressor $x_1$:
$$y = \beta_1 x_1 + X_2 \beta_2 + u, \qquad (49)$$
with $X_2 \equiv Z_2$ assumed exogenous (including the constant, if one is specified) and with
excluded instruments $Z_1$ as usual.
 (check this in PDF content)

 Start

69463
 Prefix

A t–test of the significance of $\hat{v}$ in
this auxiliary regression is then a direct test of the null hypothesis, which in this context is
that $\theta = 0$:
$$y = \beta_1 x_1 + X_2 \beta_2 + \theta \hat{v} + \epsilon \qquad (51)$$
24 A more detailed presentation of the test can be found in
 Exact

Davidson and MacKinnon (1993),
 Suffix

pp. 237–42.
The Wu–Hausman test may be readily generalized to multiple endogenous variables,
since it merely requires the estimation of the first–stage regression for each of the endogenous variables, and augmentation of the original model with their residual series.
 (check this in PDF content)
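The auxiliary-regression route in (51) can be sketched as follows for a single endogenous regressor. This is an illustration and not the ivendog code: the function name `wu_hausman_t` and the simulated data are assumptions. The 2SLS estimate is computed alongside it only to show that the coefficient on $x_1$ in the augmented regression reproduces the IV estimate.

```python
import numpy as np

def wu_hausman_t(y, x1, X2, Z1):
    """Auxiliary regression (51): add the first-stage residual v-hat to the
    structural equation and t-test its coefficient (hypothetical helper)."""
    Z = np.column_stack([Z1, X2])
    # First stage: regress x1 on all instruments and keep the residuals.
    v_hat = x1 - Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]
    # Augmented regression of y on [x1, X2, v_hat], estimated by OLS.
    W = np.column_stack([x1, X2, v_hat])
    coef = np.linalg.lstsq(W, y, rcond=None)[0]
    resid = y - W @ coef
    n, k = W.shape
    s2 = float(resid @ resid) / (n - k)
    cov = s2 * np.linalg.inv(W.T @ W)
    t_theta = coef[-1] / np.sqrt(cov[-1, -1])        # tests theta = 0
    return coef, t_theta

# Simulated data with an endogenous x1, for illustration only.
rng = np.random.default_rng(5)
n = 2000
Z1 = rng.normal(size=(n, 2))
X2 = np.ones((n, 1))
e = rng.normal(size=n)
x1 = Z1.sum(axis=1) + 0.8 * e + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + e

coef, t_theta = wu_hausman_t(y, x1, X2, Z1)

# 2SLS estimate of the same equation, for comparison.
X = np.column_stack([x1, X2])
Z = np.column_stack([Z1, X2])
PZX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(PZX.T @ X, PZX.T @ y)
print(f"beta_1 (augmented OLS) = {coef[0]:.4f}, beta_1 (2SLS) = {b_2sls[0]:.4f}, t = {t_theta:.2f}")
```

With several endogenous regressors, one residual series per first-stage regression would be added and the t–test replaced by a joint F–test of their coefficients.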

 Start

70148
 Prefix

The test statistic then becomes an F–test, with numerator degrees of freedom equal
to the number of included endogenous variables. One advantage of the Wu–Hausman
F–statistic over the other DWH tests for IV vs. OLS is that with certain normality
assumptions, it is a finite sample test exactly distributed as F (see
 Exact

Wu (1973) and Nakamura and Nakamura (1981)). Wu (1974)
 Suffix

’s Monte Carlo studies also suggest that
this statistic is to be preferred to the statistic using just $\hat\sigma^2_{IV}$.
A version of the Wu–Hausman statistic for testing a subset of regressors is also
available, as Davidson and MacKinnon (1993), pp. 241–242 point out.
 (check this in PDF content)

 Start

70420
 Prefix

Wu (1974)’s Monte Carlo studies also suggest that
this statistic is to be preferred to the statistic using just $\hat\sigma^2_{IV}$.
A version of the Wu–Hausman statistic for testing a subset of regressors is also
available, as
 Exact

Davidson and MacKinnon (1993),
 Suffix

pp. 241–242 point out. The modified
test involves estimating the first–stage regression for each of the $K_{1B}$ variables in $X_{1B}$
in order to generate a residual series. These residual series $\hat{V}_B$ are then used to augment
the original model:
$$y = X_{1A}\delta + X_{1B}\lambda + X_2\beta + \hat{V}_B\Theta + \epsilon \qquad (52)$$
which is then estimated via instrumental variables, with only $X_{1A}$ specified as included
endogenous variables.
 (check this in PDF content)

 Start

71308
 Prefix

An inconvenient complication here is that an ordinary F test for the significance
of $\Theta$ in this auxiliary regression will not be valid, because the unrestricted sum of
squares needed for the denominator is wrong, and obtaining the correct SSR requires
further steps (see
 Exact

Davidson and MacKinnon (1993),
 Suffix

chapter 7). Only in the special
case where the efficient estimator is OLS will an ordinary F–test yield the correct test
statistic. The auxiliary regression approach to obtaining the Wu–Hausman statistic
described above has the further disadvantage of being computationally expensive and
practically cumbersome when there are more than a few endogenous variables to be
test
 (check this in PDF content)

 Start

72416
 Prefix

) = $\dfrac{Q^*}{USSR/n}$ (53)
and the Wu–Hausman F–statistic can be written
$$\text{Wu–Hausman: } F(K_{1B},\, n-K-K_{1B}) = \frac{Q^*/K_{1B}}{(USSR - Q^*)/(n-K-K_{1B})} \qquad (54)$$
where $Q^*$ is the difference between the restricted and unrestricted sums of squares given
by the auxiliary regression (51) or (52), and $USSR$ is the sum of squared residuals from
the efficient estimate of the model.25 From the discussion in the preceding section,
25 See
 Exact

Wu (1973)
 Suffix

or Nakamura and Nakamura (1981). $Q^*$ can also be interpreted as the difference
between the sums of squares of the second–stage estimation of the efficient model with and without
however, we know that for tests of the endogeneity of regressors, the C statistic and the
Hausman form of the DWH test are numerically equal, and when the error variance from
the more efficient estimation is used, the Hausman f
 (check this in PDF content)

 Start

72429
 Prefix

n
(53)
and the Wu–Hausman F–statistic can be written
$$\text{Wu–Hausman: } F(K_{1B},\, n-K-K_{1B}) = \frac{Q^*/K_{1B}}{(USSR - Q^*)/(n-K-K_{1B})} \qquad (54)$$
where $Q^*$ is the difference between the restricted and unrestricted sums of squares given
by the auxiliary regression (51) or (52), and $USSR$ is the sum of squared residuals from
the efficient estimate of the model.25 From the discussion in the preceding section,
25 See Wu (1973) or
 Exact

Nakamura and Nakamura (1981).
 Suffix

$Q^*$ can also be interpreted as the difference
between the sums of squares of the second–stage estimation of the efficient model with and without
however, we know that for tests of the endogeneity of regressors, the C statistic and the
Hausman form of the DWH test are numerically equal, and when the error variance from
the more efficient estimation is used, the Hausman form of the DWH test is the Durbi
 (check this in PDF content)