# Existence and Uniqueness of the Conditional Expectation

That is, the MLE for $\beta$ is algebraically identical to the OLS estimator. Solving the second FOC (5.8) for $\hat\sigma^2_{\mathrm{mle}}$ we find

$$
\hat\sigma^2_{\mathrm{mle}} = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\hat\beta_{\mathrm{mle}}\right)^2 = \frac{1}{n}\sum_{i=1}^n \left(y_i - x_i'\hat\beta_{\mathrm{ols}}\right)^2 = \frac{1}{n}\sum_{i=1}^n \hat{e}_i^2 = \hat\sigma^2_{\mathrm{ols}}.
$$

Thus the MLE for $\sigma^2$ is identical to the OLS/moment estimator from (3.27).

Since the OLS estimator and the MLE under normality are equivalent, $\hat\beta$ is described by some authors as the maximum likelihood estimator and by other authors as the least-squares estimator. It is important to remember, however, that $\hat\beta$ is only the MLE when the error $e$ has a known normal distribution, and not otherwise.

Plugging the estimators into (5.5) we obtain the maximized log-likelihood

$$
\log L\left(\hat\beta_{\mathrm{mle}}, \hat\sigma^2_{\mathrm{mle}}\right) = -\frac{n}{2}\log\left(2\pi\hat\sigma^2_{\mathrm{mle}}\right) - \frac{n}{2}. \tag{5.9}
$$

The log-likelihood is typically reported as a measure of fit.

It may seem surprising that the MLE $\hat\beta_{\mathrm{mle}}$ is numerically equal to the OLS estimator, despite emerging from quite different motivations. It is not completely accidental. The least-squares estimator minimizes a particular sample loss function (the sum of squared errors criterion), and most loss functions are equivalent to the likelihood of a specific parametric distribution, in this case the normal regression model. In this sense it is not surprising that the least-squares estimator can be motivated either as the minimizer of a sample loss function or as the maximizer of a likelihood function.

**Carl Friedrich Gauss**

The mathematician Carl Friedrich Gauss (1777-1855) proposed the normal regression model, and derived the least-squares estimator as the maximum likelihood estimator for this model. He claimed to have discovered the method in 1795 at the age of eighteen, but did not publish the result until 1809. Interest in Gauss's approach was reinforced by Laplace's simultaneous discovery of the central limit theorem, which provided a justification for viewing random disturbances as approximately normal.
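The equivalence above can be checked numerically. The following sketch (the data-generating setup and variable names are illustrative, not from the text) computes the OLS coefficients and the moment estimator $\hat\sigma^2 = n^{-1}\sum \hat e_i^2$, then verifies that the normal log-likelihood evaluated at these values matches the closed form in (5.9) and is not improved by perturbing the coefficients:

```python
# Numerical sketch: under normal errors, OLS = MLE, and the maximized
# log-likelihood equals -(n/2)*log(2*pi*sigma2_hat) - n/2 as in (5.9).
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# OLS coefficients and the moment estimator of sigma^2
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e_hat = y - X @ beta_ols
sigma2_ols = e_hat @ e_hat / n

def loglik(b, s2):
    """Normal regression log-likelihood, evaluated directly."""
    resid = y - X @ b
    return -0.5 * n * np.log(2 * np.pi * s2) - resid @ resid / (2 * s2)

# Maximized log-likelihood via formula (5.9)
loglik_59 = -0.5 * n * np.log(2 * np.pi * sigma2_ols) - n / 2

# Direct evaluation at the OLS estimates reproduces (5.9) ...
assert np.isclose(loglik(beta_ols, sigma2_ols), loglik_59)
# ... and perturbing the coefficients lowers the likelihood,
# consistent with the OLS estimates being the MLE.
assert loglik(beta_ols, sigma2_ols) > loglik(beta_ols + 0.01, sigma2_ols)
```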
## 5.9 Distribution of OLS Coefficient Vector

In the normal linear regression model we can derive exact sampling distributions for the OLS/MLE estimator, residuals, and variance estimator. In this section we derive the distribution of the OLS coefficient estimator.

The normality assumption $e_i \mid x_i \sim \mathrm{N}\left(0, \sigma^2\right)$ combined with independence of the observations has the multivariate implication

$$
e \mid X \sim \mathrm{N}\left(0, I_n \sigma^2\right).
$$

That is, the error vector $e$ is independent of $X$ and is normally distributed. Recall that the OLS estimator satisfies

$$
\hat\beta - \beta = \left(X'X\right)^{-1} X'e,
$$

which is a linear function of $e$. Since linear functions of normals are also normal (Theorem 5.4), this implies that conditional on $X$,

$$
\hat\beta - \beta \mid X \sim \left(X'X\right)^{-1} X' \, \mathrm{N}\left(0, I_n \sigma^2\right) \sim \mathrm{N}\left(0, \sigma^2 \left(X'X\right)^{-1} X'X \left(X'X\right)^{-1}\right) = \mathrm{N}\left(0, \sigma^2 \left(X'X\right)^{-1}\right).
$$

An alternative way of writing this is

$$
\hat\beta \mid X \sim \mathrm{N}\left(\beta, \sigma^2 \left(X'X\right)^{-1}\right).
$$

This shows that under the assumption of normal errors, the OLS estimator has an exact normal distribution.

**Theorem 5.13** In the linear regression model,
$$
\hat\beta \mid X \sim \mathrm{N}\left(\beta, \sigma^2 \left(X'X\right)^{-1}\right).
$$

Theorems 5.4 and 5.13 imply that any affine function of the OLS estimator is also normally distributed, including individual components. Letting $\beta_j$ and $\hat\beta_j$ denote the $j$th elements of $\beta$ and $\hat\beta$, we have

$$
\hat\beta_j \mid X \sim \mathrm{N}\left(\beta_j, \sigma^2 \left[\left(X'X\right)^{-1}\right]_{jj}\right). \tag{5.10}
$$

Theorem 5.13 is a statement about the conditional distribution. What about the unconditional distribution? In Section 4.7 we presented Kinal's theorem about the existence of moments for the joint normal regression model. We re-state the result here.

**Theorem 5.14** (Kinal, 1980) If $y, x$ are jointly normal, then for any $r$, $\mathrm{E}\left\|\hat\beta\right\|^r < \infty$ if and on
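The exact conditional distribution $\hat\beta \mid X \sim \mathrm{N}\left(\beta, \sigma^2 (X'X)^{-1}\right)$ can be illustrated by simulation. The sketch below (the design matrix, parameter values, and draw count are illustrative assumptions) holds $X$ fixed, redraws the normal error vector many times, and compares the empirical mean and covariance of the resulting OLS estimates with $\beta$ and $\sigma^2 (X'X)^{-1}$:

```python
# Monte Carlo sketch of Theorem 5.13: with X held fixed and e | X ~ N(0, I_n sigma^2),
# the OLS estimates should have mean beta and covariance sigma^2 * (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(1)
n, sigma, R = 100, 1.5, 20000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed design
beta = np.array([0.5, -1.0])
XtX_inv = np.linalg.inv(X.T @ X)
target_cov = sigma**2 * XtX_inv                         # exact conditional covariance

# R independent error draws; each row of B is beta + (X'X)^{-1} X' e
E = sigma * rng.normal(size=(R, n))
B = beta + E @ X @ XtX_inv

emp_mean = B.mean(axis=0)                 # close to beta
emp_cov = np.cov(B, rowvar=False)         # close to target_cov

print("max mean error:", np.max(np.abs(emp_mean - beta)))
print("max cov error: ", np.max(np.abs(emp_cov - target_cov)))
```

The simulation agreement reflects only the exactness of the conditional normal distribution; it says nothing about moments of the unconditional distribution, which is the subject of Theorem 5.14.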