# Asymptotic Theory for Least Squares

## 7.1 Introduction

It turns out that the asymptotic theory of least-squares estimation applies equally to the projection model and the linear CEF model, and therefore the results in this chapter will be stated for the broader projection model described in Section 2.18. Recall that the model is
$$y_i = x_i'\beta + e_i$$
for $i = 1, \ldots, n$, where the linear projection coefficient is
$$\beta = \left(E\left[x_i x_i'\right]\right)^{-1} E\left[x_i y_i\right].$$

Maintained assumptions in this chapter will be random sampling (Assumption 1.2) and finite second moments (Assumption 2.1). We restate these conditions here for clarity.

**Assumption 7.1**

1. The observations $(y_i, x_i)$, $i = 1, \ldots, n$, are independent and identically distributed.
2. $E\left[y^2\right] < \infty$.
3. $E\|x\|^2 < \infty$.
4. $Q_{xx} = E\left[xx'\right]$ is positive definite.

The distributional results will require a strengthening of these assumptions to finite fourth moments. We discuss the specific conditions in Section 7.3.

## 7.2 Consistency of Least-Squares Estimator

In this section we use the weak law of large numbers (WLLN, Theorems 6.2 and 6.6) and the continuous mapping theorem (CMT, Theorem 6.19) to show that the least-squares estimator $\widehat{\beta}$ is consistent for the projection coefficient $\beta$.

This derivation is based on three key components. First, the OLS estimator can be written as a continuous function of a set of sample moments. Second, the WLLN shows that sample moments converge in probability to population moments. And third, the CMT states that continuous functions preserve convergence in probability. We now explain each step in brief and then in greater detail.

First, observe that the OLS estimator
$$\widehat{\beta} = \left(\frac{1}{n}\sum_{i=1}^n x_i x_i'\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^n x_i y_i\right) = \widehat{Q}_{xx}^{-1}\,\widehat{Q}_{xy}$$
is a function of the sample moments $\widehat{Q}_{xx} = \frac{1}{n}\sum_{i=1}^n x_i x_i'$ and $\widehat{Q}_{xy} = \frac{1}{n}\sum_{i=1}^n x_i y_i$. Second, by an application of the WLLN these sample moments converge in probability to the population moments.
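The moment representation above can be checked numerically. The following is a minimal sketch on synthetic data (the data-generating process, seed, and variable names are illustrative assumptions, not from the text): it forms $\widehat{Q}_{xx}$ and $\widehat{Q}_{xy}$ directly and confirms that $\widehat{Q}_{xx}^{-1}\widehat{Q}_{xy}$ coincides with the usual least-squares solution.

```python
import numpy as np

# Illustrative synthetic data: y_i = x_i' beta + e_i (beta chosen arbitrarily).
rng = np.random.default_rng(0)
n, k = 10_000, 3
beta = np.array([1.0, -2.0, 0.5])          # true projection coefficient
x = rng.normal(size=(n, k))                # regressors x_i
e = rng.normal(size=n)                     # projection errors e_i
y = x @ beta + e

Q_xx_hat = x.T @ x / n                     # (1/n) sum_i x_i x_i'
Q_xy_hat = x.T @ y / n                     # (1/n) sum_i x_i y_i
beta_hat = np.linalg.solve(Q_xx_hat, Q_xy_hat)   # Q_xx_hat^{-1} Q_xy_hat

# Identical (up to rounding) to the standard least-squares solution.
beta_ls = np.linalg.lstsq(x, y, rcond=None)[0]
assert np.allclose(beta_hat, beta_ls)
```

Note that dividing both moments by $n$ is numerically irrelevant here, since the factors of $1/n$ cancel in $\widehat{Q}_{xx}^{-1}\widehat{Q}_{xy}$; the normalization matters only for the limit arguments that follow.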
Specifically, the fact that $(y_i, x_i)$ are mutually independent and identically distributed implies that any function of $(y_i, x_i)$ is iid, including $x_i x_i'$ and $x_i y_i$. These variables also have finite expectations under Assumption 7.1. Under these conditions, the WLLN (Theorem 6.6) implies that as $n \to \infty$,
$$\widehat{Q}_{xx} = \frac{1}{n}\sum_{i=1}^n x_i x_i' \overset{p}{\longrightarrow} E\left[x_i x_i'\right] = Q_{xx} \tag{7.1}$$
and
$$\widehat{Q}_{xy} = \frac{1}{n}\sum_{i=1}^n x_i y_i \overset{p}{\longrightarrow} E\left[x_i y_i\right] = Q_{xy}.$$

Third, the CMT (Theorem 6.19) allows us to combine these equations to show that $\widehat{\beta}$ converges in probability to $\beta$. Specifically, as $n \to \infty$,
$$\widehat{\beta} = \widehat{Q}_{xx}^{-1}\,\widehat{Q}_{xy} \overset{p}{\longrightarrow} Q_{xx}^{-1} Q_{xy} = \beta. \tag{7.2}$$
We have shown that $\widehat{\beta} \overset{p}{\longrightarrow} \beta$ as $n \to \infty$. In words, the OLS estimator converges in probability to the projection coefficient vector $\beta$ as the sample size $n$ gets large.

To fully understand the application of the CMT we walk through it in detail. We can write
$$\widehat{\beta} = g\left(\widehat{Q}_{xx}, \widehat{Q}_{xy}\right)$$
where $g(A, b) = A^{-1}b$ is a function of $A$ and $b$. The function $g(A, b)$ is a continuous function of $A$ and $b$ at all values of the arguments such that $A^{-1}$ exists. Assumption 7.1 specifies that $Q_{xx}^{-1}$ exists and thus $g(A, b)$ is continuous at $A = Q_{xx}$. This justifies the application of the CMT in (7.2).

For a slightly different demonstration of (7.2), recall that (4.6) implies that
$$\widehat{\beta} - \beta = \widehat{Q}_{xx}^{-1}\,\widehat{Q}_{xe} \tag{7.3}$$
where $\widehat{Q}_{xe} = \frac{1}{n}\sum_{i=1}^n x_i e_i$. The WLLN and (2.23) imply
$$\widehat{Q}_{xe} \overset{p}{\longrightarrow} E\left[x_i e_i\right] = 0.$$
Therefore
$$\widehat{\beta} - \beta = \widehat{Q}_{xx}^{-1}\,\widehat{Q}_{xe} \overset{p}{\longrightarrow} Q_{xx}^{-1} \cdot 0 = 0,$$
which is the same as $\widehat{\beta} \overset{p}{\longrightarrow} \beta$.

**Theorem 7.1 (Consistency of Least-Squares)** Under Assumption 7.1, $\widehat{Q}_{xx} \overset{p}{\longrightarrow} Q_{xx}$, $\widehat{Q}_{xy} \overset{p}{\longrightarrow} Q_{xy}$, $\widehat{Q}_{xx}^{-1} \overset{p}{\longrightarrow} Q_{xx}^{-1}$, $\widehat{Q}_{xe} \overset{p}{\longrightarrow} 0$, and $\widehat{\beta} \overset{p}{\longrightarrow} \beta$ as $n \to \infty$.

Theorem 7.1 states that the OLS estimator $\widehat{\beta}$ converges in probability to $\beta$ as $n$ increases, and thus $\widehat{\beta}$ is consistent for $\beta$.
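The convergence in Theorem 7.1 can be visualized by simulation. A minimal sketch (synthetic data and sample sizes are illustrative assumptions): draw independent samples of increasing size from the same design and watch the estimation error $\|\widehat{\beta} - \beta\|$ shrink.

```python
import numpy as np

# Sketch of Theorem 7.1 on synthetic data: the estimation error of
# beta_hat shrinks (roughly like 1/sqrt(n)) as the sample size grows.
rng = np.random.default_rng(1)
beta = np.array([2.0, -1.0])               # true projection coefficient

def estimate(n):
    x = rng.normal(size=(n, 2))
    y = x @ beta + rng.normal(size=n)      # y_i = x_i' beta + e_i
    # beta_hat = Q_xx_hat^{-1} Q_xy_hat
    return np.linalg.solve(x.T @ x / n, x.T @ y / n)

errors = [np.linalg.norm(estimate(n) - beta) for n in (100, 10_000, 1_000_000)]
print(errors)   # errors shrink as n grows
```

Consistency is a statement about probability limits, so for any single seed the error path need not be monotone; across the three orders of magnitude used here, though, the shrinkage is unmistakable.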
In the stochastic order notation, Theorem 7.1 can be equivalently written as
$$\widehat{\beta} = \beta + o_p(1). \tag{7.4}$$

To illustrate the effect of sample size on the least-squares estimator, consider the least-squares regression
$$\ln(Wage_i) = \beta_1 Education_i + \beta_2 Experience_i + \beta_3 Experience_i^2 + \beta_4 + e_i.$$
We use the sample of 24,344 white men from the March 2009 CPS. Randomly sorting the observations, and sequentially estimating the model by least-squares, starting with the first 5 observations and continuing until the full sample is used, the sequence of estimates is displayed in Figure 7.1. You can see how the least-squares estimate changes with the sample size, but as the number of observations increases it settles down to the full-sample estimate $\widehat{\beta}_1 = 0.114$.

Figure 7.1: The Least-Squares Estimator $\widehat{\beta}_1$ as a Function of Sample Size
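The sequential-estimation exercise behind Figure 7.1 is easy to replicate. The sketch below uses synthetic data rather than the CPS sample (the design, seed, and intercept value are illustrative assumptions; only the target slope 0.114 is taken from the text): it re-estimates the slope on the first $n$ observations for increasing $n$ and shows the path settling near the full-sample value.

```python
import numpy as np

# Sketch of the Figure 7.1 exercise on synthetic data: sequential
# least-squares estimates settle down as the sample grows.
rng = np.random.default_rng(2)
N = 20_000
x = np.column_stack([rng.normal(size=N), np.ones(N)])  # regressor + intercept
y = x @ np.array([0.114, 1.0]) + rng.normal(size=N)    # slope mimics beta_1_hat

path = []
for n in range(100, N + 1, 100):
    b = np.linalg.lstsq(x[:n], y[:n], rcond=None)[0]
    path.append(b[0])          # slope estimate using the first n observations

print(path[0], path[-1])       # early estimate vs. full-sample estimate
```

As in the figure, the early estimates fluctuate widely while the later ones hug the full-sample value; with real data the path also depends on the random ordering of the observations.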