Lecture 9: Nonparametric Regression (1)


Page 1: Lecture 9: Nonparametric Regression (1)

Outline: Introduction · Estimations: local modelling · Cross Validation · Assignments

Applied Statistics 2015

Page 2: An example: Pick-It Lottery

The New Jersey Pick-It Lottery is a daily numbers game run by the state of New Jersey. Buying a ticket entitles a player to pick a number between 0 and 999. Half of the money bet each day goes into the prize pool. (The state takes the other half.) The state picks a winning number at random, and the prize pool is shared equally among all winning tickets.

We analyze the first 254 drawings after the lottery started in 1975. Figure 1 shows a scatterplot of the winning numbers and their payoffs.

Page 3: An example: Pick-It Lottery

[Figure 1: scatterplot of the winning numbers (Number) against their payoffs (Payoff) for the 254 drawings.]

Although all numbers are equally likely to win, numbers chosen by fewer people have bigger payoffs if they win, because the prize is shared among fewer tickets.

Question: can we find some pattern in the data? Are there numbers with larger payoffs?

Page 4: An example: Pick-It Lottery

The question can be answered by regression analysis. Linear regression assumes a linear relation between payoff and winning number. The blue dashed line is the least squares regression line, which shows a general trend of higher payoffs for larger winning numbers.

[Figure: Payoff vs. Number with the least squares regression line (blue dashed).]

Page 5: Nonparametric regression

Nonparametric regression does not assume any parametric structure. It is also known as "learning a function" in the field of machine learning. There are n pairs of observations (x_1, Y_1), …, (x_n, Y_n). The response variable Y is related to the covariate x by the equation

$$Y_i = r(x_i) + \varepsilon_i, \qquad i = 1, \dots, n,$$

where r is the regression function, E(ε_i) = 0 and Var(ε_i) = σ².

Here we want to estimate r under weak assumptions, without assuming a parametric model for r.

We are treating the covariates x_i as fixed (a fixed design). For a random design, the data are (X_i, Y_i), i = 1, …, n, and r(x) is the conditional expectation of Y given X = x: r(x) = E(Y | X = x).

Page 6: A general idea behind the different estimators

Note that Y_i is the sum of r(x_i) and an error whose expected value is zero. This motivates estimating r(x) by the average of those Y_i for which x_i is "close" to x.

Different ways of averaging and different measures of closeness lead to different estimators.

Page 7: An Example

The data are n = 60 pairs of observations from a certain regression model.

How to construct r_n, an estimator of r?

[Figure: scatterplot of the n = 60 observations (x on [0, 1] against Y).]

Page 8: Estimator: Regressogram

A regressogram is constructed in a similar manner as a histogram. Here we consider x_i ∈ [0, 1]. Divide the unit interval into m equally spaced bins denoted by B_1, B_2, …, B_m. Define the regressogram

$$g_n(x) = \frac{1}{k_j} \sum_{i:\, x_i \in B_j} Y_i, \qquad \text{for } x \in B_j,$$

where k_j is the number of points in B_j. Here we use the convention 0/0 = 0.
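The definition translates directly into code. Below is a minimal sketch in Python with NumPy; the function name `regressogram` and its interface are illustrative, not from the lecture, and it assumes x_i ∈ [0, 1] as above.

```python
import numpy as np

def regressogram(x, y, m=10):
    """Average Y over m equally spaced bins of [0, 1]."""
    edges = np.linspace(0.0, 1.0, m + 1)
    means = np.zeros(m)
    for j in range(m):
        # Bin B_j = (edges[j], edges[j+1]]; the first bin also includes 0.
        in_bin = (x > edges[j]) & (x <= edges[j + 1]) if j > 0 else (x <= edges[1])
        # Convention 0/0 = 0 for empty bins.
        means[j] = y[in_bin].mean() if in_bin.any() else 0.0

    def g_n(x0):
        # Find the bin of each query point and return the bin mean.
        j = np.clip(np.searchsorted(edges, x0, side="left") - 1, 0, m - 1)
        return means[j]

    return g_n
```

Evaluating g_n on a grid reproduces the step function shown in the figure below.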

[Figure: Regressogram fit with m = 10.]


Page 10: Estimator: Local average

Fix h > 0,

$$r_n(x) = \frac{\sum_{i=1}^n I(x - h < x_i \le x + h)\, Y_i}{\sum_{i=1}^n I(x - h < x_i \le x + h)}.$$

This is also called the naive kernel estimator:

$$r_n(x) = \frac{\sum_{i=1}^n \frac{1}{2}\, 1_{[-1,1)}\big(\frac{x - x_i}{h}\big)\, Y_i}{\sum_{i=1}^n \frac{1}{2}\, 1_{[-1,1)}\big(\frac{x - x_i}{h}\big)}.$$
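A direct implementation of the local average (again a sketch with NumPy; how to handle an empty window is a design choice the slide leaves open):

```python
import numpy as np

def local_average(x, y, h):
    """Box-kernel (naive) estimator: average the Y_i with x0 - h < x_i <= x0 + h."""
    def r_n(x0):
        w = (x > x0 - h) & (x <= x0 + h)           # I(x0 - h < x_i <= x0 + h)
        return y[w].mean() if w.any() else np.nan  # undefined on an empty window
    return r_n
```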

[Figure: Local average fit with h = 0.2.]


Page 12: Nadaraya-Watson Estimator

Replacing the box kernel by a general kernel in the local average estimator, we obtain the Nadaraya-Watson estimator of r:

$$r_n(x) = \frac{\sum_{i=1}^n K\big(\frac{x - x_i}{h}\big)\, Y_i}{\sum_{i=1}^n K\big(\frac{x - x_i}{h}\big)}.$$
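A sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the kernel need not be normalized, since constants cancel between numerator and denominator; the function name is illustrative):

```python
import numpy as np

def nadaraya_watson(x, y, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Kernel-weighted local average of the Y_i around x0."""
    def r_n(x0):
        w = kernel((x0 - x) / h)          # weight of each observation
        return np.sum(w * y) / np.sum(w)  # weighted average
    return r_n
```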

[Figure: Nadaraya-Watson fit with h = 0.2 and a Gaussian kernel.]


Page 14: The true regression function

The black curve indicates r(x), the real regression function. The underlying model is Y_i = sin(8x_i) − x_i + x_i³ + ε_i, with ε_i ∼ N(0, 0.5).

[Figure: the data with the true regression function r(x) overlaid (black curve).]
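Data from this model can be simulated as follows (a sketch: the seed, the uniform design, and the reading of N(0, 0.5) as variance 0.5 are my assumptions, not stated on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)               # seed fixed only for reproducibility
n = 60
x = np.sort(rng.uniform(0.0, 1.0, n))        # assumed design on [0, 1]
r = lambda t: np.sin(8 * t) - t + t**3       # the true regression function
y = r(x) + rng.normal(0.0, np.sqrt(0.5), n)  # N(0, 0.5) read as variance 0.5
```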

Page 15: Some comments

The three estimators can be written in the same form:

$$r_n(x) = \sum_{i=1}^n l_i(x)\, Y_i.$$

Define the class of piecewise constant functions

$$\mathcal{F}_m = \Big\{ f : f(t) = \sum_{i=1}^m c_i\, I(t \in B_i),\ c_i \in \mathbb{R} \Big\}.$$

Then the regressogram satisfies

$$g_n = \arg\min_{f \in \mathcal{F}_m} \sum_{i=1}^n \big(Y_i - f(x_i)\big)^2.$$

The Nadaraya-Watson estimator can be considered as locally fitting a constant to the data:

$$r_n(x) = \arg\min_{c \in \mathbb{R}} \sum_{i=1}^n K\Big(\frac{x - x_i}{h}\Big) (Y_i - c)^2.$$
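Setting the derivative with respect to c to zero verifies this claim (a one-line check, not spelled out on the slide):

```latex
% d/dc of the criterion: -2 \sum_i K((x - x_i)/h)(Y_i - c) = 0, hence
c \sum_{i=1}^n K\Big(\frac{x - x_i}{h}\Big) = \sum_{i=1}^n K\Big(\frac{x - x_i}{h}\Big) Y_i
\quad\Longrightarrow\quad
c = \frac{\sum_{i=1}^n K\big(\frac{x - x_i}{h}\big) Y_i}{\sum_{i=1}^n K\big(\frac{x - x_i}{h}\big)} = r_n(x).
```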

Page 16: Risk

For fixed x,

$$\mathrm{MSE}(r_n(x)) = E\big((r_n(x) - r(x))^2\big) = \big(E(r_n(x)) - r(x)\big)^2 + \mathrm{Var}(r_n(x)).$$
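The decomposition follows by adding and subtracting E(r_n(x)) inside the square; the cross term vanishes (a standard step, written out here for completeness):

```latex
E\big((r_n(x) - r(x))^2\big)
  = E\Big(\big([r_n(x) - E(r_n(x))] + [E(r_n(x)) - r(x)]\big)^2\Big)
  = \mathrm{Var}(r_n(x)) + \big(E(r_n(x)) - r(x)\big)^2,
% since the cross term 2\,[E(r_n(x)) - r(x)]\,E\big(r_n(x) - E(r_n(x))\big) = 0.
```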

As a global index, we consider

$$\mathrm{MISE}(r_n) = E\Big(\int (r_n(x) - r(x))^2\, dx\Big) = \int \big(E(r_n(x)) - r(x)\big)^2\, dx + \int \mathrm{Var}(r_n(x))\, dx;$$

and the average mean square error

$$\mathrm{AMSE}(r_n) = \frac{1}{n} \sum_{i=1}^n E\big((r_n(x_i) - r(x_i))^2\big).$$

Page 17: Cross Validation: choosing bandwidths

Take the AMSE as the criterion. We would like to choose h to minimize

$$\mathrm{AMSE}(h) = \frac{1}{n} \sum_{i=1}^n E\big((r_{nh}(x_i) - r(x_i))^2\big).$$

Since r is unknown, we need to estimate AMSE(h). As a first guess, one might think of the average residual sum of squares

$$\frac{1}{n} \sum_{i=1}^n \big(Y_i - r_{nh}(x_i)\big)^2.$$

This turns out to be a bad choice: it usually leads to undersmoothing (overfitting). The reason is that it favors estimates that are too well adapted to the data at hand and not reasonable for new observations. Indeed, as h → 0 the fit interpolates the data, so the average residual sum of squares can be driven towards zero no matter how wiggly the estimate becomes.

Page 18: Cross Validation: choosing bandwidths

We estimate the risk using the leave-one-out cross validation score, defined as

$$CV(h) = \frac{1}{n} \sum_{i=1}^n \big(Y_i - r^{(i)}_{nh}(x_i)\big)^2,$$

where r^{(i)}_{nh}(x_i) is the estimator based on {(x_j, Y_j), 1 ≤ j ≤ n, j ≠ i}, i.e. omitting the observation (x_i, Y_i).

Page 19: Cross Validation: choosing bandwidths

In order to compute the CV score, there is no need to fit the curve n times. Let r_{nh}(x) = ∑_{i=1}^n l_i(x) Y_i. Then CV(h) can be written as

$$CV(h) = \frac{1}{n} \sum_{i=1}^n \Big(\frac{Y_i - r_{nh}(x_i)}{1 - l_i(x_i)}\Big)^2.$$

Hence

$$h_{cv} = \arg\min_h CV(h) = \arg\min_h \frac{1}{n} \sum_{i=1}^n \Big(\frac{Y_i - r_{nh}(x_i)}{1 - l_i(x_i)}\Big)^2.$$
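For the Nadaraya-Watson estimator the weights are l_i(x) = K((x − x_i)/h) / ∑_j K((x − x_j)/h), so the shortcut formula is straightforward to code. A sketch in Python/NumPy (function names and the bandwidth grid are illustrative):

```python
import numpy as np

def cv_score(x, y, h, kernel=lambda u: np.exp(-0.5 * u**2)):
    """Leave-one-out CV score for Nadaraya-Watson via the shortcut formula."""
    K = kernel((x[:, None] - x[None, :]) / h)  # K[j, i] = K((x_j - x_i)/h)
    L = K / K.sum(axis=1, keepdims=True)       # L[j, i] = l_i(x_j)
    fitted = L @ y                             # r_nh(x_j) for all j at once
    lii = np.diag(L)                           # the diagonal weights l_i(x_i)
    return np.mean(((y - fitted) / (1.0 - lii)) ** 2)

# Minimize CV(h) over a grid of candidate bandwidths:
# hs = np.linspace(0.01, 0.5, 50)
# h_cv = hs[np.argmin([cv_score(x, y, h) for h in hs])]
```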

Page 20: An example: Pick-It Lottery

h = 1, 10, 20, 50.

[Figure: four panels of Payoff vs. Number, showing the fitted curve for each bandwidth h = 1, 10, 20, 50.]

Page 21: An example: Pick-It Lottery

[Figure: Payoff vs. Number with the fitted regression curve.]

The curve suggests that there were larger payoffs for numbers in the interval [0, 100]. People tended to pick numbers starting with 2 and 3. This pattern disappeared after 1976: people noticed the pattern and changed their choices.

Page 22: Lectures 10 – 12

Page 23: Group Presentation (April 20)

Group 16

The data are the infant-mortality rates (infant deaths per 1000 live births) and GDP per capita (in U.S. dollars) for 193 countries in 2003.

Make a scatter plot of the data.

Estimate the regression function with different approaches. Give your comments.

Page 24: Group Presentation (April 20)

Group 17

Download the dataset CMB from http://www.stat.cmu.edu/~larry/all-of-nonpar/data.html. Consider power as the response variable and Multipole as the covariate. Fit a model based on the first 400 observations:

Make a scatter plot of the data. Consider the Nadaraya-Watson estimator, using the CV(h) score to choose h. Present your estimate.

Repeat the procedure above, but for the whole data set.

Page 25: Group Presentation (April 20)

Group 18

Consider the following model.

$$Y_i = r(x_i) + \varepsilon_i,$$

where r(x) = x² − 2x, x ∈ [0, 2], and the ε_i are iid N(0, 0.5²).

1. Simulate one sample {(x_i, Y_i), i = 1, …, 100}. You can choose x_i = i/50.
2. Fit a Nadaraya-Watson estimator to these data. Choose your kernel and h. Estimate MSE(r_n(1)) by simulation. Hint: simulate many samples from the model and use the sample counterpart as the estimator.
3. Repeat step 2 for a different h. Compare the results and give your comments.
