
tic.

4.11 Estimation of Error Variance

The error variance $\sigma^2 = E[e_i^2]$ can be a parameter of interest even in a heteroskedastic regression or a projection model. $\sigma^2$ measures the variation in the "unexplained" part of the regression. Its method of moments estimator (MME) is the sample average of the squared residuals:

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{e}_i^2 .$$

In the linear regression model we can calculate the mean of $\hat{\sigma}^2$. From (3.29) and the properties of the trace operator, observe that

$$\hat{\sigma}^2 = \frac{1}{n} e' M e = \frac{1}{n} \operatorname{tr}\left( e' M e \right) = \frac{1}{n} \operatorname{tr}\left( M e e' \right) .$$

Then

$$E\left[ \hat{\sigma}^2 \mid X \right] = \frac{1}{n} \operatorname{tr}\left( E\left[ M e e' \mid X \right] \right) = \frac{1}{n} \operatorname{tr}\left( M \, E\left[ e e' \mid X \right] \right) = \frac{1}{n} \operatorname{tr}(MD) . \qquad (4.25)$$
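The identity $\hat{\sigma}^2 = \frac{1}{n} e' M e = \frac{1}{n} \operatorname{tr}(M e e')$ can be checked numerically. A minimal sketch with simulated data (the design, coefficients, and error draws are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = rng.normal(size=(n, k))
e = rng.normal(size=n)               # true errors
y = X @ np.ones(k) + e               # any coefficient vector works here

# Annihilator matrix M = I - X (X'X)^{-1} X'; residuals satisfy e_hat = M y = M e
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)
e_hat = M @ e

sigma2_hat = np.mean(e_hat ** 2)               # (1/n) sum of squared residuals
via_quadratic = (e @ M @ e) / n                # (1/n) e' M e
via_trace = np.trace(M @ np.outer(e, e)) / n   # (1/n) tr(M e e')

print(np.allclose(sigma2_hat, via_quadratic),
      np.allclose(via_quadratic, via_trace))   # True True
```

The key step is that $M$ is symmetric and idempotent, so $\hat{e}'\hat{e} = e' M' M e = e' M e$.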

Adding the assumption of conditional homoskedasticity $E\left[ e_i^2 \mid x_i \right] = \sigma^2$, so that $D = I_n \sigma^2$, then (4.25) simplifies to

$$E\left[ \hat{\sigma}^2 \mid X \right] = \frac{1}{n} \operatorname{tr}(M) \, \sigma^2 = \sigma^2 \left( \frac{n-k}{n} \right),$$

the final equality by (3.23). This calculation shows that $\hat{\sigma}^2$ is biased towards zero. The order of the bias depends on $k/n$, the ratio of the number of estimated coefficients to the sample size.

CHAPTER 4. LEAST SQUARES REGRESSION 115

Another way to see this is to use (4.22). Note that

$$E\left[ \hat{\sigma}^2 \mid X \right] = \frac{1}{n} \sum_{i=1}^{n} E\left[ \hat{e}_i^2 \mid X \right] = \frac{1}{n} \sum_{i=1}^{n} (1 - h_{ii}) \, \sigma^2 = \frac{n-k}{n} \, \sigma^2 ,$$

the last equality using Theorem 3.6.

Since the bias takes a scale form, a classic method to obtain an unbiased estimator is by rescaling the estimator. Define

$$s^2 = \frac{1}{n-k} \sum_{i=1}^{n} \hat{e}_i^2 . \qquad (4.26)$$

By the above calculation,

$$E\left[ s^2 \mid X \right] = \sigma^2 \quad \text{and} \quad E\left[ s^2 \right] = \sigma^2 .$$

Hence the estimator $s^2$ is unbiased for $\sigma^2$. Consequently, $s^2$ is known as the "bias-corrected estimator" for $\sigma^2$, and in empirical practice $s^2$ is the most widely used estimator for $\sigma^2$.

Interestingly, this is not the only method to construct an unbiased estimator for $\sigma^2$. An estimator constructed with the standardized residuals $\bar{e}_i$ from (4.23) is

$$\bar{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \bar{e}_i^2 = \frac{1}{n} \sum_{i=1}^{n} (1 - h_{ii})^{-1} \hat{e}_i^2 .$$
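Computing $\bar{\sigma}^2$ requires only the leverage values $h_{ii}$, the diagonal of the projection matrix $P = X(X'X)^{-1}X'$. A sketch under the same simulated-data setup as above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 40, 4
X = rng.normal(size=(n, k))
y = X @ rng.normal(size=k) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat

# Leverage values h_ii = x_i' (X'X)^{-1} x_i, i.e. the diagonal of P
h = np.einsum('ij,ij->i', X, X @ np.linalg.inv(X.T @ X))

e_bar = e_hat / np.sqrt(1.0 - h)     # standardized residuals from (4.23)
sigma2_bar = np.mean(e_bar ** 2)     # the unbiased estimator above

# Sanity checks: sum of leverages equals k (Theorem 3.6), and the two
# expressions for sigma2_bar agree
print(np.isclose(h.sum(), k),
      np.isclose(sigma2_bar, np.mean(e_hat**2 / (1 - h))))   # True True
```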

You can show (see Exercise 4.9) that

$$E\left[ \bar{\sigma}^2 \mid X \right] = \sigma^2 , \qquad (4.27)$$

and thus $\bar{\sigma}^2$ is unbiased for $\sigma^2$ (in the homoskedastic linear regression model).
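The downward bias of $\hat{\sigma}^2$ and the centering of $s^2$ and $\bar{\sigma}^2$ at $\sigma^2$ can be seen in a small Monte Carlo under homoskedastic normal errors. A sketch (the design, sample size, and replication count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma2, reps = 20, 5, 1.0, 20000
X = rng.normal(size=(n, k))                      # fixed design across replications
Q = np.linalg.solve(X.T @ X, X.T)                # (X'X)^{-1} X'
h = np.einsum('ij,ij->i', X, X @ np.linalg.inv(X.T @ X))  # leverage values

draws_mme, draws_s2, draws_bar = [], [], []
for _ in range(reps):
    e = rng.normal(scale=sigma2 ** 0.5, size=n)
    e_hat = e - X @ (Q @ e)                      # residuals of y = Xb + e equal M e
    ss = e_hat @ e_hat
    draws_mme.append(ss / n)                     # sigma2_hat: biased MME
    draws_s2.append(ss / (n - k))                # s^2: bias-corrected
    draws_bar.append(np.mean(e_hat**2 / (1 - h)))  # sigma2_bar

# Mean of sigma2_hat should be near sigma2 * (n-k)/n = 0.75 here,
# while s^2 and sigma2_bar should both center near sigma2 = 1
print(np.mean(draws_mme), np.mean(draws_s2), np.mean(draws_bar))
```

With $k/n = 0.25$ the shrinkage of the MME is visible even in a quick simulation, which is the point of the comparison below.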

When $k/n$ is small (typically, this occurs when $n$ is large), the estimators $\hat{\sigma}^2$, $s^2$, and $\bar{\sigma}^2$ are likely to be similar to one another. However, if $k/n$ is large then $s^2$ and $\bar{\sigma}^2$ are generally preferred to $\hat{\sigma}^2$. Consequently it is best to use one of the bias-corrected variance estimators in app