- 1 -
Classical regression review
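• Important equations
  – Functional form of the regression: $y_j \approx \hat{y}_j = \hat{y}(x_j)$, where $\hat{y}(x) = b_1 + b_2 x$ for a straight line, or more generally $\hat{y}_j = f(x_j)$
  – Regression coefficients: $\mathbf{b} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$
  – Standard error: $\hat{\sigma}^2 = SS_E/(n-k)$, where $SS_E = \sum_i (y_i - \hat{y}_i)^2$, $n$ is the number of data and $k$ the number of coefficients
  – Coefficient of determination: $R^2 = 1 - SS_E/SS_T$, where $SS_T = \sum_i (y_i - \bar{y})^2$
  – Variance of coefficients: $\mathrm{var}(b_i) = \hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{ii}$, $\mathrm{cov}(b_i, b_j) = \hat{\sigma}^2 [(\mathbf{X}^T\mathbf{X})^{-1}]_{ij}$
  – Variance of regression: $\mathrm{var}[\hat{y}(\mathbf{x})] = \hat{\sigma}^2\, \mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}$, with $\mathbf{x} = [1,\ x]^T$ for a straight line
  – Variance of prediction: $\mathrm{var}[y(\mathbf{x})] = \hat{\sigma}^2 + \mathrm{var}[\hat{y}(\mathbf{x})]$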
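For the straight-line model of the practice example, $k = 2$ and the rows of $\mathbf X$ are $[1,\ x_i]$. In that case the matrix formulas above reduce to scalar closed forms. Here is a short derivation, using $S_{xx} = \sum_i (x_i - \bar x)^2$ and $S_{xy} = \sum_i (x_i - \bar x)(y_i - \bar y)$, which are shorthand introduced only for this step:

```latex
\mathbf X^T\mathbf X =
  \begin{bmatrix} n & \sum_i x_i \\ \sum_i x_i & \sum_i x_i^2 \end{bmatrix},
\qquad
\det(\mathbf X^T\mathbf X) = n\sum_i x_i^2 - \Big(\sum_i x_i\Big)^2 = n S_{xx},
\\[6pt]
(\mathbf X^T\mathbf X)^{-1} = \frac{1}{n S_{xx}}
  \begin{bmatrix} \sum_i x_i^2 & -\sum_i x_i \\ -\sum_i x_i & n \end{bmatrix},
\qquad
b_2 = \frac{S_{xy}}{S_{xx}}, \quad b_1 = \bar y - b_2\bar x,
\\[6pt]
\operatorname{corr}(b_1, b_2) = \frac{-\sum_i x_i}{\sqrt{n\sum_i x_i^2}},
\qquad
\mathbf x^T(\mathbf X^T\mathbf X)^{-1}\mathbf x = \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}
\quad \text{for } \mathbf x = [1,\ x]^T.
```

The example data give $n = 6$, $\sum_i x_i = 3$, $\sum_i x_i^2 = 2.2$, $S_{xx} = 0.7$ and $S_{xy} = 0.347$. From these:

- $b_2 = 0.347/0.7 \approx 0.4957$
- $b_1 = 1.235 - 0.4957 \times 0.5 \approx 0.9871$
- $\operatorname{corr}(b_1, b_2) = -3/\sqrt{13.2} \approx -0.8257$

These match the practice example. Note that the correlation does not depend on $\hat\sigma$.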
- 2 -
Practice example
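• Example problem
  – Data
      % Data
      y=[0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
      x=[0 0.2 0.4 0.6 0.8 1.0]';
  – Results of the classical regression
      be   = [0.9871 0.4957]    (coefficients b1, b2)
      se   = 0.0627             (standard error)
      R2   = 0.9162             (coefficient of determination R²)
      sebe = [0.0454 0.0750]    (standard errors of b1, b2)
      corr = -0.8257            (correlation between b1 and b2)
  [Figure: data with the fitted line over 0 ≤ x ≤ 1; confidence interval = red line, prediction interval = magenta line]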
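The numbers on this slide can be reproduced with a short script. This sketch is in Python (standard library only) rather than the MATLAB used on the slides, and the function names `classical_fit` and `intervals` are hypothetical. It also computes the coefficient of determination $R^2 = 1 - SS_E/SS_T$, which is the 0.9162 reported above.

The slide does not state the confidence level or the quantile behind the red and magenta bands. The sketch assumes they are two-sided 95% intervals built from the $t$ distribution with $n - k = 4$ degrees of freedom.

```python
import math

# Practice-example data
x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]

# Assumption: 97.5% quantile of Student's t with n - k = 4 dof (95% two-sided),
# hard-coded because the standard library has no t distribution.
T_975_DOF4 = 2.7764


def classical_fit(x, y):
    """Least-squares line y = b1 + b2*x and the classical summary statistics."""
    n, k = len(x), 2
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - k)  # sigma_hat^2 = SS_E / (n - k)
    # (X^T X)^{-1} for X = [ones(n,1) x], via the 2x2 inverse formula
    sx, sx2 = sum(x), sum(xi * xi for xi in x)
    det = n * sx2 - sx * sx
    v = [[sx2 / det, -sx / det], [-sx / det, n / det]]
    return {
        "be": (b1, b2),
        "se": math.sqrt(s2),
        "R2": 1.0 - sse / sst,
        "sebe": (math.sqrt(s2 * v[0][0]), math.sqrt(s2 * v[1][1])),
        "corr": v[0][1] / math.sqrt(v[0][0] * v[1][1]),
        "V": v,
    }


def intervals(fit, x0, t=T_975_DOF4):
    """Confidence and prediction intervals of the regression at x0."""
    b1, b2 = fit["be"]
    v = fit["V"]
    s2 = fit["se"] ** 2
    yhat = b1 + b2 * x0
    q = v[0][0] + 2 * x0 * v[0][1] + x0 * x0 * v[1][1]  # x^T (X^T X)^{-1} x, x = [1, x0]
    var_reg = s2 * q          # variance of regression
    var_pred = s2 + var_reg   # variance of prediction
    ci = (yhat - t * math.sqrt(var_reg), yhat + t * math.sqrt(var_reg))
    pi = (yhat - t * math.sqrt(var_pred), yhat + t * math.sqrt(var_pred))
    return yhat, ci, pi


fit = classical_fit(x, y)
print("be =", fit["be"], "se =", fit["se"], "R2 =", fit["R2"])
print("sebe =", fit["sebe"], "corr =", fit["corr"])
for x0 in (0.0, 0.5, 1.0):
    print(x0, intervals(fit, x0))
```

If SciPy is available, `scipy.stats.t.ppf(0.975, 4)` can replace the hard-coded quantile.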
- 3 -
Bayesian analysis of classical regression
• Remark
  – Classical regression can be recast as a Bayesian problem: the unknown coefficients b are estimated conditional on the observed data set (x, y).
  – With a non-informative prior for b, the solution is the same as the classical one. With certain other priors, however, no closed-form solution exists.
  – As before, we can practice the Bayesian approach and validate the results against the classical solution, using the non-informative prior.
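• Statistical definition of the data
  – Assuming the data are normally distributed with the mean given by the regression equation, the data distribution is
      $\mathbf y \mid \boldsymbol\beta, \sigma^2 \sim N(\mathbf X\boldsymbol\beta, \sigma^2\mathbf I)$, i.e. $p(\mathbf y \mid \boldsymbol\beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]$
• Parameters to be estimated
  – Regression coefficients β = [β1, β2] (analogous to the mean μ of the normal distribution) and the variance σ².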
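On a computer this likelihood is usually evaluated on the log scale, because the product of many small densities underflows. Below is a minimal Python sketch for the straight-line model. It assumes $\mathbf X = [\mathbf 1\ \ \mathbf x]$, and the function name `log_likelihood` is hypothetical.

```python
import math

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]


def log_likelihood(beta, sigma2, x, y):
    """log p(y | beta, sigma^2) for y ~ N(X beta, sigma^2 I) with X = [ones(n,1) x]."""
    if sigma2 <= 0:
        return -math.inf  # the variance must be positive
    n = len(y)
    # (y - X beta)^T (y - X beta) for the straight-line model
    rss = sum((yi - beta[0] - beta[1] * xi) ** 2 for xi, yi in zip(x, y))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - rss / (2 * sigma2)


print(log_likelihood((1.0, 0.5), 0.06 ** 2, x, y))
```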
- 4 -
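Joint posterior pdf of β, σ²
• Non-informative prior
    $p(\boldsymbol\beta, \sigma^2) \propto \sigma^{-2}$
• Likelihood to observe the data y
    $p(\mathbf y \mid \boldsymbol\beta, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]$
• Joint posterior pdf of β = [β1, β2] and σ² (a three-parameter problem)
    $p(\boldsymbol\beta, \sigma^2 \mid \mathbf y) \propto (\sigma^2)^{-(n/2+1)} \exp\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]$
  – Compare with the posterior pdf of the normal distribution parameters μ, σ² ($\mathbf 1$: vector of ones)
    $p(\mu, \sigma^2 \mid \mathbf y) \propto (\sigma^2)^{-(n/2+1)} \exp\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf 1\mu)^T(\mathbf y - \mathbf 1\mu)\right]$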
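The step from prior and likelihood to the joint posterior is Bayes' rule. With the non-informative prior, the powers of σ² simply add, and the constant $(2\pi)^{-n/2}$ is absorbed into the proportionality. Setting $\mathbf X = \mathbf 1$ and $\boldsymbol\beta = \mu$ then recovers the normal-distribution posterior used for comparison:

```latex
p(\boldsymbol\beta, \sigma^2 \mid \mathbf y)
  \propto p(\boldsymbol\beta, \sigma^2)\, p(\mathbf y \mid \boldsymbol\beta, \sigma^2)
  \propto \sigma^{-2} (\sigma^2)^{-n/2}
    \exp\!\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]
  = (\sigma^2)^{-(n/2+1)}
    \exp\!\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]
\\[6pt]
\mathbf X = \mathbf 1,\ \boldsymbol\beta = \mu
  \;\Longrightarrow\;
  p(\mu, \sigma^2 \mid \mathbf y) \propto (\sigma^2)^{-(n/2+1)}
    \exp\!\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf 1\mu)^T(\mathbf y - \mathbf 1\mu)\right]
```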
- 5 -
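Joint posterior pdf of β, σ²
• Analytical procedure
  – Factorization
      $p(\boldsymbol\beta, \sigma^2 \mid \mathbf y) = p(\sigma^2 \mid \mathbf y)\, p(\boldsymbol\beta \mid \sigma^2, \mathbf y)$
  – Marginal pdf of σ²
      $\sigma^2 \mid \mathbf y \sim \text{Inv-}\chi^2(n-k, s^2)$, or equivalently $\frac{(n-k)s^2}{\sigma^2} \sim \chi^2_{n-k}$,
      where $s^2 = \frac{1}{n-k}(\mathbf y - \mathbf X\hat{\boldsymbol\beta})^T(\mathbf y - \mathbf X\hat{\boldsymbol\beta})$
  – Conditional pdf of β
      $\boldsymbol\beta \mid \sigma^2, \mathbf y \sim N(\hat{\boldsymbol\beta}, \mathbf V_\beta\sigma^2)$,
      where $\hat{\boldsymbol\beta} = (\mathbf X^T\mathbf X)^{-1}\mathbf X^T\mathbf y$, $\mathbf V_\beta = (\mathbf X^T\mathbf X)^{-1}$
  – Posterior predictive distribution at new points $\tilde{\mathbf X}$
      $\tilde{\mathbf y} \mid \mathbf y \sim t_{n-k}(\tilde{\mathbf X}\hat{\boldsymbol\beta}, s^2\mathbf V_{\tilde y})$,
      where $\mathbf V_{\tilde y} = \mathbf I + \tilde{\mathbf X}\mathbf V_\beta\tilde{\mathbf X}^T$
• Sampling method based on the factorization approach
  1. Draw a random σ² from the inverse-χ² distribution.
  2. Draw a random β from the conditional pdf β | σ².
  3. Draw a predictive ỹ at a new point using the expression for ỹ | y.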
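The three sampling steps can be sketched in Python using the standard library only. This sketch assumes the straight-line model $\mathbf X = [\mathbf 1\ \ \mathbf x]$, and `sample_posterior` is a hypothetical name. It works as follows:

- A $\chi^2_\nu$ variate is drawn as a Gamma variate with shape $\nu/2$ and scale 2.
- β is drawn through a Cholesky factor of $\mathbf V_\beta$.
- Step 3 draws $\tilde y \sim N(\beta_1 + \beta_2\tilde x, \sigma^2)$ for each sampled pair. Marginally, over β and σ², this reproduces the $t_{n-k}$ predictive distribution.

```python
import math
import random

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]


def sample_posterior(x, y, x_new, n_draws=10_000, seed=0):
    """Draw (beta1, beta2, sigma^2, y_new) by the factorization approach."""
    rng = random.Random(seed)
    n, k = len(x), 2
    # V_beta = (X^T X)^{-1} and beta_hat = V_beta X^T y for X = [ones(n,1) x]
    sx, sx2 = sum(x), sum(xi * xi for xi in x)
    sy, sxy = sum(y), sum(xi * yi for xi, yi in zip(x, y))
    det = n * sx2 - sx * sx
    v11, v12, v22 = sx2 / det, -sx / det, n / det
    b1_hat = v11 * sy + v12 * sxy
    b2_hat = v12 * sy + v22 * sxy
    s2 = sum((yi - b1_hat - b2_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - k)
    # Cholesky factor L of V_beta (2x2), so that L L^T = V_beta
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 * l21)
    draws = []
    for _ in range(n_draws):
        # 1. sigma^2 ~ Inv-chi^2(n-k, s^2), i.e. (n-k) s^2 / chi^2_{n-k};
        #    chi^2_nu is Gamma(shape=nu/2, scale=2)
        chi2 = rng.gammavariate((n - k) / 2, 2.0)
        sigma2 = (n - k) * s2 / chi2
        # 2. beta | sigma^2, y ~ N(beta_hat, V_beta sigma^2)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        sd = math.sqrt(sigma2)
        b1 = b1_hat + sd * l11 * z1
        b2 = b2_hat + sd * (l21 * z1 + l22 * z2)
        # 3. y_new | beta, sigma^2 ~ N(b1 + b2 x_new, sigma^2)
        y_new = rng.gauss(b1 + b2 * x_new, sd)
        draws.append((b1, b2, sigma2, y_new))
    return draws


draws = sample_posterior(x, y, x_new=0.5)
```

Every draw here is independent, so no burn-in or convergence check is needed. This makes the direct sampler a convenient reference for the MCMC results that follow.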
- 6 -
Practice example
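• Joint posterior pdf of β, σ²
    $p(\boldsymbol\beta, \sigma^2 \mid \mathbf y) \propto (\sigma^2)^{-(n/2+1)} \exp\left[-\frac{1}{2\sigma^2}(\mathbf y - \mathbf X\boldsymbol\beta)^T(\mathbf y - \mathbf X\boldsymbol\beta)\right]$
  – Data
      y=[0.95, 1.08, 1.28, 1.23, 1.42, 1.45]';
      x=[0 0.2 0.4 0.6 0.8 1.0]';
      n=length(y); X=[ones(n,1) x];
  – This is a function of three parameters. To draw the shape of the pdf, fix σ = 0.06. The maximum over b = [b1 b2] lies near [1 0.5], which agrees with the true values.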
[Figure: surface plot and contour plot of the joint posterior pdf over (b1, b2) with σ = 0.06]
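The surface and contour plots can be reproduced by evaluating the unnormalized log posterior on a grid with σ held fixed. The Python sketch below does this. The grid ranges follow the axes of the figure and are an assumption, as is the name `log_post`.

```python
import math

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]


def log_post(b1, b2, sigma, x, y):
    """Unnormalized log p(beta, sigma^2 | y) under the non-informative prior."""
    n = len(y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return -(n / 2 + 1) * math.log(sigma ** 2) - rss / (2 * sigma ** 2)


sigma = 0.06                                    # held fixed, as on the slide
b1_grid = [0.5 + 0.01 * i for i in range(101)]  # 0.5 ... 1.5
b2_grid = [0.01 * j for j in range(101)]        # 0.0 ... 1.0
best = max((log_post(b1, b2, sigma, x, y), b1, b2)
           for b1 in b1_grid for b2 in b2_grid)
_, b1_max, b2_max = best
print(f"grid maximum near b = [{b1_max:.2f} {b2_max:.2f}]")

# Density relative to the grid maximum (peak = 1), ready for a contour routine
z = [[math.exp(log_post(b1, b2, sigma, x, y) - best[0]) for b1 in b1_grid]
     for b2 in b2_grid]
```

With σ fixed, the log posterior over b equals a constant minus $(\mathbf y - \mathbf X\mathbf b)^T(\mathbf y - \mathbf X\mathbf b)/(2\sigma^2)$. The location of the maximum therefore does not depend on the chosen σ: it is the least-squares estimate, up to the grid spacing.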
- 7 -
Practice example
• Sampling by MCMC
  – Using N = 1e4 iterations, starting from b = [0;0] and σ = 1, the chains of b and σ converge as the iterations proceed. The samples from the initial stage, however, should be discarded; this is called burn-in.
  – The sampled distributions peak near b = [1;0.5] and σ = 0.06, which agree with the true values.
[Figure: MCMC results with N = 1e4: trace plots of the samples over the iterations, histograms of the samples, and the (b1, b2) samples over the contour of the joint posterior]
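Below is a minimal random-walk Metropolis sampler for the same posterior, written in Python. The slides do not show their proposal, so this sketch makes several assumptions:

- The proposal takes independent uniform steps of half-width `w` on each parameter.
- The chain samples σ rather than σ². The target therefore carries the Jacobian factor $2\sigma$, which turns $(\sigma^2)^{-(n/2+1)}$ into $\sigma^{-(n+1)}$.
- The widths `w` and the burn-in length of 3,000 iterations are illustrative choices, not values from the slides.

```python
import math
import random

x = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
y = [0.95, 1.08, 1.28, 1.23, 1.42, 1.45]


def log_target(theta, x, y):
    """Unnormalized log posterior of (b1, b2, sigma) under the non-informative prior.

    In sigma^2 the posterior is (sigma^2)^-(n/2+1) exp(-RSS / (2 sigma^2));
    changing variables to sigma multiplies by 2*sigma, giving sigma^-(n+1).
    """
    b1, b2, s = theta
    if s <= 0:
        return -math.inf
    n = len(y)
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    return -(n + 1) * math.log(s) - rss / (2 * s * s)


def metropolis(x, y, start=(0.0, 0.0, 1.0), w=(0.05, 0.05, 0.02),
               n_iter=10_000, burn_in=3_000, seed=0):
    """Random-walk Metropolis with uniform proposals U(-w, w) on each parameter."""
    rng = random.Random(seed)
    theta = list(start)
    logp = log_target(theta, x, y)
    chain, accepted = [], 0
    for _ in range(n_iter):
        prop = [t + rng.uniform(-wi, wi) for t, wi in zip(theta, w)]
        logp_prop = log_target(prop, x, y)
        # Symmetric proposal: accept with probability min(1, p(prop) / p(theta))
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            theta, logp = prop, logp_prop
            accepted += 1
        chain.append(tuple(theta))
    # Discard the burn-in samples from the initial stage
    return chain[burn_in:], accepted / n_iter


samples, acc = metropolis(x, y)
means = [sum(s[i] for s in samples) / len(samples) for i in range(3)]
print("acceptance rate:", acc, "posterior means (b1, b2, sigma):", means)
```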
- 8 -
Practice example
• Sampling by MCMC
  – Using N = 1e4, MCMC is repeated ten times. The spread of the results across the runs (std_MCMC) is small, which indicates that the sampled distribution can be accepted as the solution.

                        be(1)    be(2)    sigma    std(be1)  std(be2)  corr
  analytic              0.9871   0.4957   0.0627   0.0454    0.0750    -0.8257
  mean_MCMC             0.9865   0.4967   0.0660   0.0515    0.0841    -0.8181
  std_MCMC              0.0060   0.0104   0.0022   0.0085    0.0114     0.0357
  MCMC (individual      0.9845   0.4990   0.0643   0.0464    0.0763    -0.7994
  runs)                 0.9816   0.5064   0.0644   0.0461    0.0736    -0.7764
                        0.9799   0.5086   0.0681   0.0569    0.0928    -0.8519
[Figure: histogram, trace plot, and posterior contour with samples for one of the MCMC runs]
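The rows of the table can be produced from the chains as sketched below in Python. The sketch assumes that each chain is a list of post-burn-in `(b1, b2, sigma)` samples and that the `sigma` column is the sample mean of σ. The names `summarize_chain` and `summarize_runs` are hypothetical. Note that `statistics.stdev` uses the $n-1$ denominator, as MATLAB's `std` does by default.

```python
import statistics


def summarize_chain(samples):
    """One table row from (b1, b2, sigma) samples:
    mean b1, mean b2, mean sigma, std b1, std b2, corr(b1, b2)."""
    b1 = [s[0] for s in samples]
    b2 = [s[1] for s in samples]
    sig = [s[2] for s in samples]
    m1, m2 = statistics.mean(b1), statistics.mean(b2)
    sd1, sd2 = statistics.stdev(b1), statistics.stdev(b2)  # n - 1 denominator
    cov = sum((u - m1) * (v - m2) for u, v in zip(b1, b2)) / (len(b1) - 1)
    return (m1, m2, statistics.mean(sig), sd1, sd2, cov / (sd1 * sd2))


def summarize_runs(rows):
    """mean_MCMC and std_MCMC: column-wise mean and sample std over repeated runs."""
    columns = list(zip(*rows))
    return ([statistics.mean(c) for c in columns],
            [statistics.stdev(c) for c in columns])


# Usage with hypothetical samples (not the slides' chains)
row = summarize_chain([(1.0, 0.5, 0.06), (1.1, 0.4, 0.07), (0.9, 0.6, 0.05)])
print(row)
```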
- 9 -
Practice example
• Sampling by MCMC
  – A poorly chosen value of the proposal width w leads to convergence failure.
[Figure: histograms, trace plots, and sample/contour plots obtained with a poorly chosen proposal width w]
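A one-dimensional toy problem shows why the choice of w matters. The Python sketch below uses a standard normal target as an illustration; it is not the regression posterior. It measures the acceptance rate of a uniform random-walk proposal for three values of w:

- A very small w accepts almost every step, but the chain moves so slowly that it does not cover the distribution within N iterations.
- A very large w rejects nearly every proposal, so the chain stays put for long stretches.

```python
import math
import random


def acceptance_rate(w, n_iter=5_000, seed=0):
    """Random-walk Metropolis on a standard normal target with U(-w, w) steps."""
    rng = random.Random(seed)
    theta, accepted = 0.0, 0
    for _ in range(n_iter):
        prop = theta + rng.uniform(-w, w)
        # log density ratio of N(0, 1): -(prop^2 - theta^2) / 2
        if rng.random() < math.exp(min(0.0, -(prop * prop - theta * theta) / 2)):
            theta, accepted = prop, accepted + 1
    return accepted / n_iter


for w in (0.01, 1.0, 100.0):
    print(w, acceptance_rate(w))
```

A common rule of thumb for tuning w is a moderate acceptance rate, roughly 0.2 to 0.5. Trace plots remain the direct check.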
- 10 -
Practice example
• Sampling by MCMC
  – Starting the chain from a different point of b is a useful check of convergence: the chains should arrive at the same result.
[Figure: histograms, trace plots, and contour plots of the samples for a chain started from a different point]
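Agreement between chains started from different points can be quantified with the Gelman-Rubin potential scale reduction factor $\hat R$. This is a standard diagnostic, but it is not part of the slides. It compares the variance between the chain means with the variance within the chains. Values close to 1 suggest that the chains have reached the same distribution; a commonly cited threshold is 1.1.

The Python sketch below uses synthetic chains as stand-ins for real post-burn-in samples of one parameter.

```python
import random
import statistics


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    means = [statistics.mean(c) for c in chains]
    b = n * statistics.variance(means)                             # between-chain variance
    w = statistics.mean([statistics.variance(c) for c in chains])  # within-chain variance
    var_hat = (n - 1) / n * w + b / n                              # pooled variance estimate
    return (var_hat / w) ** 0.5


# Synthetic stand-ins for post-burn-in chains of b1 from four starting points
rng = random.Random(0)
agree = [[rng.gauss(0.99, 0.05) for _ in range(2_000)] for _ in range(4)]
stuck = [[rng.gauss(mu, 0.05) for _ in range(2_000)] for mu in (0.99, 0.99, 0.6, 1.3)]
print(gelman_rubin(agree), gelman_rubin(stuck))
```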
- 11 -
Practice example
• Posterior analysis
  – Posterior distribution of the regression: using the samples B1 and B2, samples of ym are generated, where ym = B1 + B2*x.
    • Blue curve: mean of ym.
    • Red curves: confidence bounds of ym (2.5% and 97.5% percentiles of the samples).
  – Posterior predictive distribution: using the samples of ym and S (the samples of σ), samples of the predicted y are generated, i.e., yp ~ N(ym, σ²).
[Figure: two panels over 0 ≤ x ≤ 1: posterior distribution of the regression and posterior predictive distribution]
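The two posterior bands can be computed from the samples as sketched below in Python. A few points are assumptions:

- The percentile helper uses linear interpolation, which is one of several conventions.
- The input samples in the usage lines are synthetic stand-ins rather than the slides' chains.
- `posterior_bands` is a hypothetical name.

```python
import random


def percentile(sorted_vals, p):
    """Linearly interpolated p-quantile (0 <= p <= 1) of an ascending list."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def posterior_bands(b1s, b2s, sigmas, xs, seed=0):
    """For each x: mean of ym, 2.5/97.5% bounds of ym, 2.5/97.5% bounds of yp."""
    rng = random.Random(seed)
    rows = []
    for x0 in xs:
        ym = [b1 + b2 * x0 for b1, b2 in zip(b1s, b2s)]     # posterior of regression
        yp = [rng.gauss(m, s) for m, s in zip(ym, sigmas)]  # yp ~ N(ym, sigma^2)
        ym_s, yp_s = sorted(ym), sorted(yp)
        rows.append((x0, sum(ym) / len(ym),
                     (percentile(ym_s, 0.025), percentile(ym_s, 0.975)),
                     (percentile(yp_s, 0.025), percentile(yp_s, 0.975))))
    return rows


# Hypothetical stand-in samples; in practice pass the post-burn-in MCMC samples
rng = random.Random(1)
B1 = [rng.gauss(0.987, 0.045) for _ in range(4_000)]
B2 = [rng.gauss(0.496, 0.075) for _ in range(4_000)]
S = [abs(rng.gauss(0.063, 0.01)) for _ in range(4_000)]
for row in posterior_bands(B1, B2, S, [0.0, 0.5, 1.0]):
    print(row)
```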
- 12 -
Confidence vs prediction interval
• Classical regression
  – Confidence interval comes from the variance of regression: $\mathrm{var}[\hat{y}(\mathbf{x})] = \hat{\sigma}^2\, \mathbf{x}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{x}$
  – Prediction interval comes from the variance of prediction: $\mathrm{var}[y(\mathbf{x})] = \hat{\sigma}^2 + \mathrm{var}[\hat{y}(\mathbf{x})]$
• Bayesian approach of regression
  – Confidence interval comes from the posterior distribution of the regression.
  – Predictive interval comes from the posterior predictive distribution.
• Bayesian approach of normal distribution
  – Confidence interval comes from the t-distribution with n-1 dof, with mean $\bar y$ and variance $s^2/n$.
  – Predictive interval comes from the t-distribution with n-1 dof, with mean $\bar y$ and variance $s^2/n + s^2$.
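Assuming two-sided 95% intervals for the normal-distribution case, both intervals use the same $t_{n-1}$ quantile. The difference between them therefore comes only from the variances:

```latex
\text{CI: } \bar y \pm t_{n-1,\,0.975}\,\sqrt{s^2/n},
\qquad
\text{PI: } \bar y \pm t_{n-1,\,0.975}\,\sqrt{s^2/n + s^2},
\\[6pt]
\frac{\sqrt{s^2/n + s^2}}{\sqrt{s^2/n}} = \sqrt{1 + n}
```

For $n = 6$ the predictive interval is $\sqrt 7 \approx 2.65$ times as wide as the confidence interval. The confidence interval keeps shrinking as $n$ grows, whereas the predictive half-width always exceeds $t_{n-1,\,0.975}\,s$. For the straight-line regression the same ratio appears at $x = \bar x$, where $\mathbf x^T(\mathbf X^T\mathbf X)^{-1}\mathbf x = 1/n$.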