on nonparametric interval estimation of a regression function based on the resampling

28
ON NONPARAMETRIC INTERVAL ESTIMATION OF A REGRESSION FUNCTION BASED ON THE RESAMPLING ALEXANDER ANDRONOV Riga Technical University Riga, Latvia

Upload: dacia

Post on 10-Feb-2016

50 views

Category:

Documents


0 download

DESCRIPTION

ON NONPARAMETRIC INTERVAL ESTIMATION OF A REGRESSION FUNCTION BASED ON THE RESAMPLING. ALEXANDER ANDRONOV Riga Technical University Riga, Latvia. OUTLINE. 1. Introduction 2. Averaging Method 3. Median Smoothing 4. Case Study References. 1. INTRODUCTION. 2. AVERAGING METHOD. Lemma 1. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

ON NONPARAMETRIC INTERVAL ESTIMATION

OF A REGRESSION

FUNCTION BASED ON THE RESAMPLING

ALEXANDER ANDRONOVRiga Technical University

Riga, Latvia

Page 2: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

OUTLINE 1. Introduction 2. Averaging Method 3. Median Smoothing 4. Case Study References

Page 3: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

1. INTRODUCTIONWe consider nonparametric regression

xmY , (1.1)

where Y is a dependent variable, m is an unknown regression function, x is a d -dimensional vector of independent variables (regressors), is a random term. It is supposed that the random term has zero expectation ( 0E ) and variance )(2 xVar where 2 is an unknown constant and )(x is a known weighted function. Furthermore we have a sequence of independent observations ii xY , ,

ni ..,,2,1 . On that base we need to construct an upper confidence bound xm~ for xm at the point x corresponding to probability :

xmxmP ~ . (1.2)

Page 4: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Usual way [DiCicco and Efron, 1996] consists of using a consistent and asymptotic normal distributed estimate xm of xm . A final expression contains derivatives xm , xm and variance 2 that are replaced by corresponding estimators.

The resampling approach [Wu, 1986, Andronov and Afanasyeva, 2004] gives an alternative way that can be described as follows. For fixed point x we take k nearest neighbors

kxxx ...,,, 21 of x among nxxx ...,,, 21 (in some sense, for example using any kernel function iH xxK , Mahalanobis or other distance):

)(:,...,, 21 xIixxxx cik ,

where

....,,, among of neighborsnearest theof one is : 21 nic xxxxkxixI

Page 5: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Now we have sample kk YxYxYx ,...,,,,, 2211 instead of

nn YxYxYx ,...,,,,, 2211 .

Then we derive sample without replacement { riii ...,,, 21 } of size r ( kr ) from set {1, 2, …, k}, form resample YxYxYx r ,...,,,,, 2211 ,

where jij xx and

jij YY , and calculate estimate )(xm of our function

of interest xm .

Then we return all selected elements into initial samples and repeat this procedure R times. As a result the sequence of estimators xmxmxm R

...,,, 21 takes place. After ordering we have sequence xmxmxm R)()2()1( ...,,, where xmxm ii )1()( .

Let number R is selected so that R is an integer. Then we set xmxm R~ .

Page 6: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Averaging method and Median smoothing method for xm calculation are considered. Our main aim is to elaborate a numerical method for cover probability calculation:

xmxmPx ~)(Pr . (1.3)

It means that we need to known a distribution of the R -th order statistic

xm R )( . That is a main problem that it necessary to solve.

Page 7: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

2. AVERAGING METHOD At first we consider the method of kernel regression estimation [Hardle etc., 2004]. Let HK be any kernel function (Epanechnikov, Quartic and so on). Then Nadaraya-Watson point estimator xm is calculated by the formula

r

iiiHr

iiH

YxxKxxK

xm1

1

1

, (2.1)

where ix and

iY are the vectors of independent variables and dependent variable for the i -th elements of the resample, ri ...,,2,1 .

Page 8: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

The resampling procedure gives us sequence xmxmxm R ...,,, 21 ,

r

iiiHr

iiH

j jYjxxKjxxK

xm1

1

1

(2.2)

where jxi and jYi

are the vectors of independent variables and dependent variable for the i -th elements of the j -th resample,

ri ...,,2,1 , Rj ...,,2,1 .

With respect to (1.1) we have:

jxmjxxKjxxK

jxxmE i

r

iiHr

iiH

j

1

1

1 ,

Page 9: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

jxwjxxK

jxxK

jxxmVar i

r

iiH

r

iiH

j

1

22

1

2 ,

where jxjxjxjx r ...,,, 21 .

Then

z

r

iHr

iH

zjj zmzxK

zxKrk

zxmE

rk

xmE1

1

111 ,

(2.3) where the sums are taken on - a set of all r -samples without replacement from }....,,,{ 21

kxxx

Analogous expression we are able to write down for unconditional variance.

Page 10: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

At first let us calculate the second moment:

zi

r

iiH

r

iiH

z

zjYzxKE

zxKrk

zxmE

rk

xmE

2

12

1

22

)(11

)(1)(

.)()(2

)()(11

1

1 1

22

1

22

1

r

i

r

ijjijHiH

zii

r

iiH

r

iiH

zmzmzxKzxK

zmzwzxK

zxKrk

Page 11: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Now the variance can be calculated by formula

.)()()( 22 xmExmExmVar (2.4)

Now we need to calculate the covariance between two various estimates xm j

and xm j ' . We have for j j’:

.)()()(

)()()()())(),((2

'

''

xmExmxmE

xmxmxmxmExmxmCov

jj

jj

d

jj

(2.5)

Page 12: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING
Page 13: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Therefore

z r

iiH

jjzxKr

kxmxmCov

1

22

)('),(

.)(1 2

1

v

r

vzzmmHr

iiH

m

zwzxKvxK

(2.7)

To avoid computational difficulties, it is possible to consider the following estimate instead of (2.1):

r

iiY

rxm

1

1)( (2.8)

and corresponding those sequence xmxmxm R ...,,, 21 .

Expectations, variances and covariance matrix for this sequence of random variables can be determined using the following lemmas.

Page 14: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Lemma 1 Let kZZZ ...,,, 21 be independent random variables with expectations

k ...,,, 21 and variances 222

21 ...,,, k . Let

rZZZ ...,,, 21 be a random sample of size r from kZZZ ...,,, 21 without replacement and S be their sum:

rZZZS ...21 . Then

kkrSE ...21 , (2.9)

1

1 12

1

22

12

k

j

k

jiji

k

jjj kk

rkrk

rkkrSVar . (2.10)

Page 15: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Lemma 2 For the conditions of the previous Lemma let sample

rZZZ ...,,, 21 be returned into set kZZZ ...,,, 21 and the described procedure be repeated, so that we have new sample

rZZZ ...,,, 21 and a corresponding sum rZZZS ...21 . Then the covariance between S and S is calculated by formula

k

ii

k

iii k

rkkk

rkkrSSCov

1

22

1

2

11, . (2.11)

Page 16: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING
Page 17: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

3. MEDIAN SMOOTHINGAs it is noted by [Hardle etc, 2002] “Median smoothing may be described as the nearest-neighbor technique to solve the problem of estimating the conditional median function, rather than the conditional expectation function … . The conditional median function xYmed is more robust to outliers than the conditional expectation xYExm .”

For the resample rijYjx ii ...,,2,1;, the median smoother is defined as

jYjYjYmedxm rj ...,,, 21 . (3.1)

To evaluate the cover probability (1.3) we apply an approach that has been elaborated by [Andronov and Afanasyeva, 2004]. The idea of this approach is such: any median estimator coincides with some element of the initial sample. Therefore we need to calculate a corresponding probability for this element.

Page 18: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

At first we consider the following problem. Let kYYY ...,,, 21 be the order statistics for initial dependent variables

kYYY ...,,, 21 and dxQ , be the probability of event 1)( dd YxmY :

1)(, dd YxmYPdxQ , nd ...,,1,0 , (3.2)

where 0Y , 1nY .

Formally, this probability can be expressed by formula

zvdxz zv

vmxmFvmxmFdxQ 1,),(

(3.3)

where dx, is a set of all d -samples dxxxz ...,,, 21 without

replacement from kxxx ...,,, 21 , xPxF is a distribution

function of the random term from formula (1.1).

Page 19: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Furthermore for the concrete r -resample jx jxjxjx r

...,,, 21 and corresponding median smoothing (3.1) we are able to calculate a probability that calculated median smoothing

xm j will be greater than xm : if d is fixed then we have a case of

hypergeometrical distribution:

2/)1(

0

1,r

i irdk

id

rk

xmxmPdxq (3.4)

(we remind that r is an odd number).

Page 20: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

The conditional probability of interest (1.3) given fixed d is calculated by such way:

iRiR

Ridxqdxq

iR

dxmxmPdx

,1,~,Pr

. (3.5)

Finally we got the cover probability of interest:

2/1

0,Pr,~Pr

rk

ddxdxQxmxmPx . (3.6)

Page 21: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

4. Case StudyWe consider a numerical example that illustrates the median smoothing. Let regression function be a such:

21.01 xxxm , < x < . (4.1)

Of course one is unknown for us, but we wish to calculate a true cover probability for this regression if we use the above described median smoothing. The following assumptions are supposed: random term has the normal distribution with zero mean and variance 2 Var = 4, a number of observations (a size of the initial sample) n = 22, values of the independent variables are ix i – 4, i = 1, 2, … , 22.

Further we assume that the confidence probability = 0.8, a number of considered nearest neighbors k = 9, a resample size r = 5.

Page 22: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Gotten results are presented in Table 1 that contains the values of regression function m(x) and cover probabilities )(Pr x . The last have been calculated with respect to above described procedure. Table 1

The cover probabilities

x -3 -2 -1 0 1 2 3 m(x) 4.9 3.4 2.1 1 0.1 -0.6 -1.1

)(Pr x 0.932 0.925 0.921 0.920 0.922 0.928 0.934

x 4 5 6 7 8 10

m(x) -1.4 -1.5 -1.4 -1.1 -0.6 1 )(Pr x 0.939 0.941 0.939 0.934 0.928 0.920

The presented results show that the considered median smoothing approach is the conservative estimation method because one sets the cover probabilities too high. We would like to remark too that cover probability )(Pr x is not monotone and convex function of argument x.

Page 23: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Proofs of the LemmasProof of Lemma 1Let 1j if the random variable jZ belongs to the sample

rZZZ ...,,, 21 and 0j otherwise. Of course k ...,,, 21 are

dependent random variables because rk ...21 . We have: krP j /1 , krP j /10 , krPE jj /1 , krkrVar j //1 , 1/11,1 kkrrPE jiji

for ji . Furthermore

i

k

iiZS

0 .

Random variables i and iZ are independent therefore

k

iii

k

ii

k

iii k

rZEEZESE111 ,

Page 24: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

22222iiiiii k

rZEEZE ,

. 122

22222

kr

kr

kr

krZEZEZVar

ii

iiiiiiiii

(A.1)

Random variables iZ , jZ and ji for ji are independent too therefore

1

1

kkrrZEZEEZZE jijijijjii ,

11

1, 2

2

kk

rkrkr

kkrrZZCov jijijijjii . (A.2)

Formulas (A.1) and (A.2) give formula (2.10).

Page 25: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Proof of Lemma 2Let

i

k

iiZS

1 , j

k

jjZS

1 .

Then

.,,

,,),(

11

1 111

k

i ijjjii

k

ijjii

k

i

k

jjjiij

k

jji

k

ii

ZZCovZZCov

ZZCovZZCovSSCov

Page 26: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

For i j random variables jiji ZZ ,,, are independent, therefore

jjii ZZCov , = 0. Further

.

,22

2

2

iiiiii

iiiiiiiiiii

kr

krZEZEE

ZEZEZEZZCov

Therefore

k

iik

rSSCov1

22

, .

Page 27: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

REFERENCES[1] Andronov A., Afanasyeva H. Resampling-based

nonparametric statistical inferences about the distributions of order statistics. In: Transactions of XXIV International Seminar on Stability Problems for Stochastic Models, Transport and Telecommunication Institute, Riga, Latvia, 2004, pp. 300 – 307.

[2] DiCiccoT.J., Efron B. Bootstrap confidence intervals. Statistical Sciences, Vol.11, No.3, 1996, pp. 189 – 228.

[3] Hardle W., Muller M., Sperlich S., Werwatz A. Nonparametric and Semiparametric Models. Springer, Berlin, 2004.

[4] Wu C.F.J. Jackknife, Bootstrap and other resampling methods in regression analysis. The Annals of Statistics, Vol. 14, No. 3, 1986, pp. 1261 – 1295.

Page 28: ON NONPARAMETRIC INTERVAL ESTIMATION OF  A REGRESSION FUNCTION BASED ON THE RESAMPLING

Thank You for your attention