# Probability Inequalities

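## 12.28 Control Function Regression

In this section we present an alternative way of computing the 2SLS estimator by least squares. It is useful in more complicated nonlinear contexts, and also in the linear model to construct tests for endogeneity.

The structural and reduced form equations for the standard IV model are

$$
\begin{aligned}
y_i &= x_{1i}'\beta_1 + x_{2i}'\beta_2 + e_i \\
x_{2i} &= \Gamma_{12}' z_{1i} + \Gamma_{22}' z_{2i} + u_{2i}.
\end{aligned}
$$

Since the instrumental variable assumption specifies that $\mathbb{E}(z_i e_i) = 0$, $x_{2i}$ is endogenous (correlated with $e_i$) if and only if $u_{2i}$ and $e_i$ are correlated. We can therefore consider the linear projection of $e_i$ on $u_{2i}$

$$
\begin{aligned}
e_i &= u_{2i}'\alpha + \nu_i \\
\alpha &= \left(\mathbb{E}\left(u_{2i}u_{2i}'\right)\right)^{-1}\mathbb{E}\left(u_{2i}e_i\right) \\
\mathbb{E}\left(u_{2i}\nu_i\right) &= 0.
\end{aligned}
$$

Substituting this into the structural form equation we find

$$
y_i = x_{1i}'\beta_1 + x_{2i}'\beta_2 + u_{2i}'\alpha + \nu_i \tag{12.61}
$$

$$
\begin{aligned}
\mathbb{E}\left(x_{1i}\nu_i\right) &= 0 \\
\mathbb{E}\left(x_{2i}\nu_i\right) &= 0 \\
\mathbb{E}\left(u_{2i}\nu_i\right) &= 0.
\end{aligned}
$$

Notice that $x_{2i}$ is uncorrelated with $\nu_i$. This is because $x_{2i}$ is correlated with $e_i$ only through $u_{2i}$, and $\nu_i$ is the error after $e_i$ has been projected orthogonal to $u_{2i}$.

If $u_{2i}$ were observed we could then estimate (12.61) by least-squares. While it is not observed, we can estimate $u_{2i}$ by the reduced-form residual

$$
\hat{u}_{2i} = x_{2i} - \hat{\Gamma}_{12}' z_{1i} - \hat{\Gamma}_{22}' z_{2i}
$$

as defined in (12.20). Then the coefficients $(\beta_1, \beta_2, \alpha)$ can be estimated by least-squares of $y_i$ on $(x_{1i}, x_{2i}, \hat{u}_{2i})$. We can write this as

$$
y_i = x_i'\hat{\beta} + \hat{u}_{2i}'\hat{\alpha} + \hat{\nu}_i \tag{12.62}
$$

or in matrix notation as

$$
y = X\hat{\beta} + \hat{U}_2\hat{\alpha} + \hat{\nu}.
$$

This turns out to be an alternative algebraic expression for the 2SLS estimator.

Indeed, we now show that $\hat{\beta} = \hat{\beta}_{2sls}$. First, note that the reduced form residual can be written as

$$
\hat{U}_2 = (I_n - P_Z) X_2
$$

where $P_Z$ is defined in (12.32). By the FWL representation

$$
\hat{\beta} = \left(\tilde{X}'\tilde{X}\right)^{-1}\left(\tilde{X}'y\right) \tag{12.63}
$$

where $\tilde{X} = [\tilde{X}_1, \tilde{X}_2]$, with

$$
\tilde{X}_1 = X_1 - \hat{U}_2\left(\hat{U}_2'\hat{U}_2\right)^{-1}\hat{U}_2'X_1 = X_1
$$

(since $\hat{U}_2'X_1 = 0$) and

$$
\begin{aligned}
\tilde{X}_2 &= X_2 - \hat{U}_2\left(\hat{U}_2'\hat{U}_2\right)^{-1}\hat{U}_2'X_2 \\
&= X_2 - \hat{U}_2\left(X_2'(I_n - P_Z)X_2\right)^{-1}X_2'(I_n - P_Z)X_2 \\
&= X_2 - \hat{U}_2 \\
&= P_Z X_2.
\end{aligned}
$$

Thus $\tilde{X} = [X_1, P_Z X_2] = P_Z X$. Substituted into (12.63) we find

$$
\hat{\beta} = \left(X'P_Z X\right)^{-1}\left(X'P_Z y\right) = \hat{\beta}_{2sls}
$$

which is (12.33) as claimed.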
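The algebraic equivalence shown above can be checked numerically. The following is a minimal sketch on simulated data, not part of the original text: the sample size, coefficient values, and variable names (`z1`, `z2`, `x2`, `u2_hat`, and the helper `ols`) are all illustrative assumptions. It computes 2SLS from the projected regressors $P_Z X$ and compares it with least squares of $y$ on $(X, \hat{U}_2)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # hypothetical sample size

# Included exogenous regressors x1 = z1 (intercept plus one variable)
z1 = np.column_stack([np.ones(n), rng.normal(size=n)])
# Excluded instruments z2
z2 = rng.normal(size=(n, 2))
Z = np.column_stack([z1, z2])

# The structural error e is correlated with the reduced-form error u2,
# which makes x2 endogenous (all coefficients below are made up).
u2 = rng.normal(size=n)
e = 0.8 * u2 + rng.normal(size=n)
x2 = z1 @ np.array([0.5, -0.3]) + z2 @ np.array([1.0, 0.7]) + u2
X = np.column_stack([z1, x2])
y = X @ np.array([1.0, 2.0, -1.5]) + e


def ols(A, b):
    """Least-squares coefficients from regressing b on the columns of A."""
    return np.linalg.lstsq(A, b, rcond=None)[0]


# 2SLS: regress y on the fitted values P_Z X
X_hat = Z @ ols(Z, X)
beta_2sls = ols(X_hat, y)

# Control function: regress y on (X, u2_hat), where u2_hat is the
# reduced-form residual (I - P_Z) x2
u2_hat = x2 - Z @ ols(Z, x2)
coef_cf = ols(np.column_stack([X, u2_hat]), y)
beta_cf, alpha_hat = coef_cf[:3], coef_cf[3]

# The two estimates of beta should agree up to floating-point error
print(np.allclose(beta_cf, beta_2sls))
```

In this sketch `alpha_hat` is the least-squares estimate of the projection coefficient $\alpha$. Because $\hat{u}_{2i}$ is an estimated regressor, standard errors for this regression need to account for the first-stage estimation rather than being read off a default OLS fit.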
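Again, what we have found is that OLS estimation of equation (12.62) yields algebraically the 2SLS estimator $\hat{\beta}_{2sls}$.

We now consider the distribution of the control function estimates. It is a generated regression model, and in fact is covered by the model examined in Section 12.27 after a slight reparametrization. Let $w_i = \Gamma' z_i$ and $u_i = x_i - \Gamma' z_i = (0', u_{2i}')'$. Then the main equation (12.61) can be written as

$$
y_i = w_i'\beta + u_{2i}'\gamma + \nu_i
$$

where $\gamma = \alpha + \beta_2$. This is the model in Section 12.27.

Set $\hat{\gamma} = \hat{\alpha} + \hat{\beta}_2$. It follows from (12.60) that as $n \to \infty$ we have the joint distribution

$$
\sqrt{n}\begin{pmatrix} \hat{\beta}_2 - \beta_2 \\ \hat{\gamma} - \gamma \end{pmatrix} \xrightarrow{d} N(0, V)
$$

where

$$
V = \begin{pmatrix} V_{22} & V_{2\gamma} \\ V_{\gamma 2} & V_{\gamma\gamma} \end{pmatrix}
$$

$$
V_{22} = \left[\left(\Gamma'\mathbb{E}\left(z_i z_i'\right)\Gamma\right)^{-1}\Gamma'\mathbb{E}\left(z_i z_i' e_i^2\right)\Gamma\left(\Gamma'\mathbb{E}\left(z_i z_i'\right)\Gamma\right)^{-1}\right]_{22}
$$

$$
V_{\gamma 2} = \left[\left(\mathbb{E}\left(u_{2i}u_{2i}'\right)\right)^{-1}\mathbb{E}\left(u_i z_i' e_i \nu_i\right)\Gamma\left(\Gamma'\mathbb{E}\left(z_i z_i'\right)\Gamma\right)^{-1}\right]_{\cdot 2}
$$

$$
V_{\gamma\gamma} = \left(\mathbb{E}\left(u_{2i}u_{2i}'\right)\right)^{-1}\mathbb{E}\left(u_{2i}u_{2i}'\,\nu \cdots\right.
$$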
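The reparametrization $\gamma = \alpha + \beta_2$ used to rewrite (12.61) may be easier to follow written out. The following sketch uses only the definitions of $w_i$ and $u_i$ given above, and assumes the partition $\beta = (\beta_1', \beta_2')'$ conformable with $x_i = (x_{1i}', x_{2i}')'$:

$$
\begin{aligned}
x_i'\beta &= \left(\Gamma' z_i + u_i\right)'\beta = w_i'\beta + u_i'\beta = w_i'\beta + u_{2i}'\beta_2, \\
y_i &= x_i'\beta + u_{2i}'\alpha + \nu_i = w_i'\beta + u_{2i}'\left(\alpha + \beta_2\right) + \nu_i.
\end{aligned}
$$

The same algebra holds in the sample. Since $X = P_Z X + [0, \hat{U}_2]$, the fitted equation (12.62) can be rearranged as $y = P_Z X\hat{\beta} + \hat{U}_2(\hat{\alpha} + \hat{\beta}_2) + \hat{\nu}$, and $\hat{\nu}$ is orthogonal to both $P_Z X$ and $\hat{U}_2$. Least squares of $y$ on $(P_Z X, \hat{U}_2)$ therefore has coefficient $\hat{\alpha} + \hat{\beta}_2$ on $\hat{U}_2$.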