Classical regression review


Page 1: Classical regression review


Classical regression review

• Important equations

– Functional form of the regression: $\hat{y} = \sum_j \hat{b}_j x_j$, or more generally $\hat{y} = f(\mathbf{x})$

– Regression coefficients: $\mathbf{b} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$

– Standard error: $\hat{\sigma}^2 = SS_E/(n-k)$, where $SS_E = \sum_i (y_i - \hat{y}_i)^2$

– Coefficient of determination: $R^2 = 1 - SS_E/SS_T$, where $SS_T = \sum_i (y_i - \bar{y})^2$

– Variance of coefficients: $\operatorname{var}(b_i) = \hat{\sigma}^2\big[(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\big]_{ii}$, $\operatorname{cov}(b_i, b_j) = \hat{\sigma}^2\big[(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\big]_{ij}$

– Variance of regression: $\operatorname{var}(\hat{y}) = \hat{\sigma}^2\,\mathbf{x}^\mathsf{T}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{x}$

– Variance of prediction: $\operatorname{var}(\tilde{y}) = \hat{\sigma}^2 + \operatorname{var}(\hat{y})$
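As a quick numerical check (not part of the original slides), these formulas can be evaluated in a few lines of MATLAB on the practice-example data of the next page; variable names are illustrative.

% Minimal sketch: classical regression quantities for y = b1 + b2*x
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
x = [0 0.2 0.4 0.6 0.8 1.0]';
n = length(y);  k = 2;
X = [ones(n,1) x];               % design matrix
b = (X'*X) \ (X'*y);             % coefficients: [0.9871; 0.4957]
SSE = sum((y - X*b).^2);         % error sum of squares
SST = sum((y - mean(y)).^2);     % total sum of squares
s2  = SSE/(n - k);               % sigma^2 estimate; sqrt(s2) = 0.0627
R2  = 1 - SSE/SST;               % coefficient of determination = 0.9162
C   = s2*inv(X'*X);              % covariance matrix of the coefficients
seb = sqrt(diag(C));             % standard errors: [0.0454; 0.0750]
corrb = C(1,2)/(seb(1)*seb(2));  % correlation between b1 and b2 = -0.8257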

Page 2: Classical regression review


Practice example

• Example problem: fit $\hat{y} = b_1 + b_2 x$ to the data below.

– be = [0.9871 0.4957]

– se = 0.0627

– R² = 0.9162

– se(be) = [0.0454 0.0750], corr = -0.8257

– Confidence interval = red line; prediction interval = magenta line

% Data
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
x = [0 0.2 0.4 0.6 0.8 1.0]';

[Figure: data points and fitted regression line with confidence (red) and prediction (magenta) intervals, x from 0 to 1, y from 0.6 to 1.6.]
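A minimal plotting sketch (assumed code, not the original) that reproduces such a figure from the quantities above, using the variance-of-regression and variance-of-prediction formulas; tinv requires the Statistics Toolbox.

% Sketch: confidence (red) and prediction (magenta) intervals
xg = linspace(0, 1, 50)';
Xg = [ones(size(xg)) xg];
yg = Xg*b;                               % fitted line
vreg  = s2*sum((Xg/(X'*X)).*Xg, 2);      % var of regression at each xg
vpred = s2 + vreg;                       % var of prediction
tc = tinv(0.975, n-k);                   % 95% two-sided t value
plot(x, y, 'ko', xg, yg, 'b-');  hold on;
plot(xg, yg + tc*sqrt(vreg),  'r-', xg, yg - tc*sqrt(vreg),  'r-');
plot(xg, yg + tc*sqrt(vpred), 'm-', xg, yg - tc*sqrt(vpred), 'm-');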

Page 3: Classical regression review


Bayesian analysis of classical regression

• Remark

– Classical regression is turned into a Bayesian problem: the unknown coefficients b are estimated conditional on the observed data set (x, y).

– With a non-informative prior for b, the solution is the same as the classical one. If certain informative priors are used, however, there is no closed-form solution.

– As we did before, we can practice the Bayesian approach and validate the results against the classical solution in the non-informative-prior case.

• Statistical definition of the data

– Assuming the data are normally distributed with their mean at the regression equation, the data distribution is expressed as

$\mathbf{y} \mid \beta, \sigma^2 \sim N(\mathbf{X}\beta,\ \sigma^2\mathbf{I}) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\beta)^\mathsf{T}(\mathbf{y}-\mathbf{X}\beta)\right)$

• Parameters to be estimated

– Regression coefficients β = [β₁, β₂] (something like μ before) and the variance σ².

Page 4: Classical regression review


Joint posterior pdf of β, σ²

• Non-informative prior

$p(\beta, \sigma^2) \propto (\sigma^2)^{-1}$

• Likelihood to observe the data y

$\mathbf{y} \mid \beta, \sigma^2 \sim N(\mathbf{X}\beta,\ \sigma^2\mathbf{I}) \propto (\sigma^2)^{-n/2} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\beta)^\mathsf{T}(\mathbf{y}-\mathbf{X}\beta)\right)$

• Joint posterior pdf of β = [β₁, β₂] and σ² (this is a 3-parameter problem)

$p(\beta, \sigma^2 \mid \mathbf{y}) \propto (\sigma^2)^{-(n/2+1)} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\beta)^\mathsf{T}(\mathbf{y}-\mathbf{X}\beta)\right)$

– Compare with the posterior pdf of the normal-distribution parameters μ, σ²:

$p(\mu, \sigma^2 \mid \mathbf{y}) \propto (\sigma^2)^{-(n/2+1)} \exp\!\left(-\frac{1}{2\sigma^2}\sum_i (y_i-\mu)^2\right)$
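As an illustrative sketch (not from the original slides), the unnormalized log of this joint posterior can be written as a one-line MATLAB function, which is convenient for the grid evaluation and MCMC used later.

% Unnormalized log joint posterior; arguments illustrative
logpost = @(b, sig2, X, y) -(length(y)/2 + 1)*log(sig2) ...
          - sum((y - X*b).^2)/(2*sig2);
% e.g. logpost([1; 0.5], 0.06^2, X, y) with X = [ones(n,1) x]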

Page 5: Classical regression review


Joint posterior pdf of β, σ²

• Analytical procedure

– Factorization: $p(\beta, \sigma^2 \mid \mathbf{y}) = p(\beta \mid \sigma^2, \mathbf{y})\, p(\sigma^2 \mid \mathbf{y})$

– Marginal pdf of σ²: $\sigma^2 \mid \mathbf{y} \sim \text{Inv-}\chi^2(n-k,\ s^2)$, or equivalently $(n-k)\,s^2/\sigma^2 \sim \chi^2(n-k)$, where $s^2 = \frac{1}{n-k}(\mathbf{y}-\mathbf{X}\hat{\beta})^\mathsf{T}(\mathbf{y}-\mathbf{X}\hat{\beta})$

– Conditional pdf of β: $\beta \mid \sigma^2, \mathbf{y} \sim N(\hat{\beta},\ \sigma^2 V_\beta)$, where $\hat{\beta} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{y}$ and $V_\beta = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}$

– Posterior predictive distribution: $\tilde{\mathbf{y}} \mid \mathbf{y} \sim t_{n-k}\!\left(\tilde{\mathbf{X}}\hat{\beta},\ s^2 V_{\tilde{y}}\right)$, where $V_{\tilde{y}} = \mathbf{I} + \tilde{\mathbf{X}} V_\beta \tilde{\mathbf{X}}^\mathsf{T}$

• Sampling method based on the factorization approach (see the sketch below)

1. Draw a random σ² from the scaled inverse-χ² distribution.
2. Draw a random β from the conditional pdf β | σ².
3. Draw a predictive ỹ at a new point using the expression ỹ | y.
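A minimal MATLAB sketch of this three-step direct sampler, assuming the X, y, n, k of the practice example above; chi2rnd requires the Statistics Toolbox, and the prediction point x = 0.5 is illustrative.

% Sketch: factorized sampling of (beta, sigma^2) and a predictive draw
bhat = (X'*X) \ (X'*y);
Vb   = inv(X'*X);
s2   = sum((y - X*bhat).^2)/(n - k);
N    = 1e4;
sig2 = (n-k)*s2 ./ chi2rnd(n-k, N, 1);   % step 1: scaled inverse-chi^2 draws
L    = chol(Vb, 'lower');
B    = zeros(N, k);
for i = 1:N                               % step 2: beta | sigma^2, y
    B(i,:) = (bhat + sqrt(sig2(i))*L*randn(k,1))';
end
xnew  = [1 0.5];                          % step 3: predictive draw at x = 0.5
ypred = B*xnew' + sqrt(sig2).*randn(N,1);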

Page 6: Classical regression review


Practice example

• Joint posterior pdf of β, σ²

$p(\beta, \sigma^2 \mid \mathbf{y}) \propto (\sigma^2)^{-(n/2+1)} \exp\!\left(-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{X}\beta)^\mathsf{T}(\mathbf{y}-\mathbf{X}\beta)\right)$

– Data

y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
x = [0 0.2 0.4 0.6 0.8 1.0]';
where X = [ones(n,1) x];

– This is a function of 3 parameters. In order to draw the shape of the pdf, let's assume s = 0.06. The maximum location of b = [b1 b2] is near [1, 0.5], which agrees with the true values (a grid-evaluation sketch follows).
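A possible grid-evaluation sketch for this plot (illustrative, not the original code). With s fixed, the $(\sigma^2)^{-(n/2+1)}$ factor is a constant on the grid, so only the exponential matters.

% Sketch: evaluate the unnormalized posterior over a (b1, b2) grid at s = 0.06
n = length(y);  X = [ones(n,1) x];         % design matrix from the data above
s = 0.06;                                  % assumed standard deviation
[b1g, b2g] = meshgrid(0.7:0.01:1.4, 0:0.01:1);
post = zeros(size(b1g));
for i = 1:numel(b1g)
    r = y - X*[b1g(i); b2g(i)];            % residual at this (b1, b2)
    post(i) = exp(-(r'*r)/(2*s^2));        % unnormalized posterior value
end
contour(b1g, b2g, post);  xlabel('b_1');  ylabel('b_2');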

[Figure: shape of the joint posterior pdf over (b1, b2) at s = 0.06, with the maximum near (1, 0.5).]

Page 7: Classical regression review


Practice example

• Sampling by MCMC

– Using N = 1e4, starting from b = [0; 0] and s = 1, the chains for b and s converge as the MCMC iterates. The samples at the initial stage, however, should be discarded; this is called burn-in (see the sketch below).

– The maximum-likelihood location of b is found near [1; 0.5], and that of s near 0.06, which agree with the true values.
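One way to implement this is a random-walk Metropolis sampler; the slides do not show the code, so the sketch below assumes a symmetric uniform proposal with width w (the parameter referred to on a later slide), samples σ² directly as the third chain component, and reuses X, y, n from above.

% Sketch: random-walk Metropolis for t = [b1 b2 sigma2]
N = 1e4;  w = 0.1;                       % w: proposal width (illustrative value)
logp = @(t) -(n/2 + 1)*log(t(3)) ...     % unnormalized log joint posterior
       - sum((y - X*t(1:2)').^2)/(2*t(3));
cur = [0 0 1];                           % start from b = [0;0], s = 1
chain = zeros(N, 3);
for i = 1:N
    prop = cur + w*(rand(1,3) - 0.5);    % symmetric uniform random-walk proposal
    if prop(3) > 0 && log(rand) < logp(prop) - logp(cur)
        cur = prop;                      % Metropolis accept
    end
    chain(i,:) = cur;                    % (repeated values on rejection)
end
burned = chain(1001:end,:);              % discard burn-in samples
s_est = sqrt(mean(burned(:,3)));         % expected near 0.06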

[Figures: sample histograms, chain trace plots over the 10,000 MCMC iterations, and the (b1, b2) scatter including the burn-in transient.]

Page 8: Classical regression review


Practice example

• Sampling by MCMC

– Using N = 1e4, the MCMC run is repeated ten times. The variances of the results across runs are favorably small, which shows that the sampled distribution can be accepted as the solution (a repeat-run sketch follows the table).

             be(1)    be(2)    sigma    std(be1)  std(be2)  corr
analytic     0.9871   0.4957   0.0627   0.0454    0.0750    -0.8257
mean_MCMC    0.9865   0.4967   0.0660   0.0515    0.0841    -0.8181
std_MCMC     0.0060   0.0104   0.0022   0.0085    0.0114    0.0357
MCMC runs    0.9845   0.4990   0.0643   0.0464    0.0763    -0.7994
(3 of 10     0.9816   0.5064   0.0644   0.0461    0.0736    -0.7764
 shown)      0.9799   0.5086   0.0681   0.0569    0.0928    -0.8519
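A sketch of this repeated-run check, reusing N, w, and logp from the Metropolis sketch above (illustrative, not the original script).

% Sketch: repeat the sampler ten times and compare run-to-run scatter
runs = 10;  stats = zeros(runs, 3);      % per-run means of b1, b2, s
for r = 1:runs
    cur = [0 0 1];  chain = zeros(N, 3);
    for i = 1:N
        prop = cur + w*(rand(1,3) - 0.5);
        if prop(3) > 0 && log(rand) < logp(prop) - logp(cur)
            cur = prop;
        end
        chain(i,:) = cur;
    end
    kept = chain(1001:end,:);            % discard burn-in
    stats(r,:) = [mean(kept(:,1:2)) mean(sqrt(kept(:,3)))];
end
disp(std(stats))                         % small scatter => acceptable solution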

[Figures: sample histograms and chain trace plots for the repeated MCMC runs.]

Page 9: Classical regression review


Practice example

• Sampling by MCMC

– A poorly chosen value of the proposal-pdf width w leads to convergence failure.

[Figures: sample histograms, (b1, b2) scatter, and chain trace plots for a poorly chosen proposal width w, showing the chains failing to converge.]

Page 10: Classical regression review


Practice example

• Sampling by MCMC

– A different starting point for b may be used to check convergence, i.e., whether we get the same result.

[Figures: sample histograms and chain trace plots for runs started from different initial points, converging to the same posterior.]

Page 11: Classical regression review


Practice example

• Posterior analysis

– Posterior distribution of the regression: using the samples of B1 and B2, samples of ym are generated, where ym = B1 + B2*x.
• The blue curve is the mean of ym.
• The red curves are the confidence bounds of ym (the 2.5% and 97.5% points of the samples).

– Posterior predictive distribution: using the samples of ym and S, samples of the predicted y are generated, i.e., yp ~ N(ym, σ²) (see the sketch below).
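A sketch of this posterior analysis using the burned-in Metropolis samples from the earlier sketch; prctile requires the Statistics Toolbox, and the row-column expansion assumes MATLAB R2016b or later.

% Sketch: posterior regression band and predictive band from MCMC samples
B = burned(:,1:2);  S = sqrt(burned(:,3));  % coefficient and s samples
xg = linspace(0, 1, 50);
Ym = B(:,1) + B(:,2)*xg;                    % samples of the regression line ym
Yp = Ym + S.*randn(size(Ym));               % predictive samples, yp ~ N(ym, s^2)
ci = prctile(Ym, [2.5 97.5]);               % confidence bounds (red)
pb = prctile(Yp, [2.5 97.5]);               % predictive bounds (magenta)
plot(xg, mean(Ym), 'b-', xg, ci, 'r-', xg, pb, 'm-');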

[Figures: posterior distribution of the regression (left) and posterior predictive distribution (right), x from 0 to 1, y from 0.6 to 1.6.]

Page 12: Classical regression review


Confidence vs prediction interval

• Classical regression

– Confidence interval comes from the variance of regression.
– Prediction interval comes from the variance of prediction.

• Bayesian approach to regression

– Confidence interval comes from the posterior distribution of the regression.
– Predictive interval comes from the posterior predictive distribution.

• Bayesian approach to the normal distribution

– Confidence interval comes from the t-distribution with n-1 dof, with mean $\bar{y}$ and variance $s^2/n$.
– Predictive interval comes from the t-distribution with n-1 dof, with mean $\bar{y}$ and variance $s^2/n + s^2$.

$\operatorname{var}(\hat{y}) = \hat{\sigma}^2\,\mathbf{x}^\mathsf{T}(\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{x}$, $\quad \operatorname{var}(\tilde{y}) = \hat{\sigma}^2 + \operatorname{var}(\hat{y})$