stat 31, section 1, last time 2-way tables –testing for independence –chi-square distance...
![Page 1: Stat 31, Section 1, Last Time 2-way tables –Testing for Independence –Chi-Square distance between data & model –Chi-Square Distribution –Gives P-values](https://reader035.vdocuments.mx/reader035/viewer/2022062802/56649e915503460f94b95b21/html5/thumbnails/1.jpg)
Stat 31, Section 1, Last Time
• 2-way tables
– Testing for Independence
– Chi-Square distance between data & model
– Chi-Square Distribution
– Gives P-values (CHIDIST)
• Simpson’s Paradox:
– Lurking variables can reverse comparisons
• Recall Linear Regression
– Fit a line to a scatterplot
Recall Linear Regression
Idea:
Fit a line to data in a scatterplot
Recall Class Example 14: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg14.xls
• To learn about “basic structure”
• To “model data”
• To provide “prediction of new values”
Inference for Regression
Goal: develop
• Hypothesis Tests and Confidence Int’s
• For slope & intercept parameters, a & b
• Also study prediction
Inference for Regression
Idea: do statistical inference on:
– Slope a
– Intercept b
Model: $Y_i = aX_i + b + e_i$
Assume: $e_i$ are random, independent
and $e_i \sim N(0, \sigma_e)$
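The model can be made concrete by generating data from it. The following is a minimal sketch, assuming pure Python; the function name `simulate_regression_data` and the parameter values are hypothetical, chosen only for illustration.

```python
import random

def simulate_regression_data(xs, a, b, sigma_e, seed=None):
    """Draw Y_i = a*X_i + b + e_i with independent e_i ~ N(0, sigma_e)."""
    rng = random.Random(seed)  # seeded generator so a run can be repeated
    return [a * x + b + rng.gauss(0.0, sigma_e) for x in xs]

# Hypothetical parameter values, not taken from any class example.
xs = [0, 5, 10, 15, 20]
ys = simulate_regression_data(xs, a=0.2, b=1.0, sigma_e=0.5, seed=31)
for x, y in zip(xs, ys):
    print(f"X = {x:2d}  Y = {y:.2f}")
```

Setting `sigma_e=0` puts every point exactly on the line, which shows that all scatter about the line comes from the $e_i$.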
Inference for Regression
Viewpoint: Data generated as:
Line: $y = ax + b$
$Y_i$ chosen from the $N(aX_i + b, \sigma_e)$ distribution, centered on the line above $X_i$
Note: a and b are “parameters”
Inference for Regression
Parameters $a$ and $b$ determine the
underlying model (distribution)
Estimate with the Least Squares Estimates:
$\hat{a}$ and $\hat{b}$
(Using SLOPE and INTERCEPT in Excel,
based on data)
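For readers working outside Excel, the least squares estimates can be sketched in a few lines of Python. This is an illustration under the usual least squares formulas; the function name `least_squares` and the four data points are made up.

```python
def least_squares(xs, ys):
    """Return (a_hat, b_hat): slope and intercept of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx              # the quantity Excel's SLOPE computes
    b_hat = y_bar - a_hat * x_bar  # the quantity Excel's INTERCEPT computes
    return a_hat, b_hat

# Made-up points lying exactly on y = 2x + 1, so the fit recovers a = 2, b = 1.
a_hat, b_hat = least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(a_hat, b_hat)
```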
Inference for Regression
Distributions of $\hat{a}$ and $\hat{b}$?
Under the above assumptions, the sampling
distributions are:
$\hat{a} \sim N(a, \sigma_a)$
$\hat{b} \sim N(b, \sigma_b)$
• Centerpoints are right (unbiased)
• Spreads are more complicated
Inference for Regression
Formula for SD of $\hat{a}$:
$\sigma_a = SD(\hat{a}) = \dfrac{\sigma_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accurate est. of slope
• Small for x’s more spread out
– Data more spread ⇒ More accurate
• Small for more data
– More data ⇒ More accuracy
Inference for Regression
Formula for SD of $\hat{b}$:
$\sigma_b = SD(\hat{b}) = \sigma_e \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Big (small) for $\sigma_e$ big (small, resp.)
– Accurate data ⇒ Accurate est. of intercept
• Smaller for $\bar{x} \approx 0$
– Centered data ⇒ More accurate intercept
• Smaller for more data
– More data ⇒ More accuracy
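The remarks about the two SD formulas can be checked numerically. Below is a small sketch, assuming $\sigma_e$ is known; the function names and the three sets of x values are hypothetical.

```python
import math

def sd_slope(xs, sigma_e):
    """SD(a_hat) = sigma_e / sqrt(sum (x_i - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e / math.sqrt(sxx)

def sd_intercept(xs, sigma_e):
    """SD(b_hat) = sigma_e * sqrt(1/n + x_bar^2 / sum (x_i - x_bar)^2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sigma_e * math.sqrt(1 / n + x_bar ** 2 / sxx)

# Made-up x values, all used with sigma_e = 1.
narrow = [9, 10, 11]
wide = [0, 10, 20]
centered = [-10, 0, 10]
print(sd_slope(narrow, 1.0), sd_slope(wide, 1.0))            # more spread: smaller SD
print(sd_intercept(wide, 1.0), sd_intercept(centered, 1.0))  # x_bar = 0: smaller SD
```

Both functions scale linearly in `sigma_e`, which is the "big for $\sigma_e$ big" remark.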
Inference for Regression
One more detail:
Need to estimate $\sigma_e$ using data
For this use:
$s_e = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \hat{a} x_i - \hat{b})^2}{n - 2}}$
• Similar to earlier sd estimate, $s$
• Except variation is about fit line
• $n - 2$ is similar to $n - 1$ from before
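As a quick illustration of this estimate, here is a sketch in Python. The function name `residual_sd` and the four data points are made up; the points were chosen so that the least squares line is exactly y = 2x + 1.

```python
import math

def residual_sd(xs, ys, a_hat, b_hat):
    """Estimate sigma_e: sqrt( sum (y_i - a_hat*x_i - b_hat)^2 / (n - 2) )."""
    n = len(xs)
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Made-up data whose least squares line is y = 2x + 1; residuals are +1, -1, -1, +1.
xs = [0, 1, 2, 3]
ys = [2, 2, 4, 8]
print(residual_sd(xs, ys, a_hat=2.0, b_hat=1.0))  # sqrt(4 / 2)
```

The divisor is n - 2 rather than n - 1 because two parameters (slope and intercept) were estimated before the residuals were formed.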
Inference for Regression
Now for Probability Distributions,
Since we are estimating $\sigma_e$ by $s_e$,
Use TDIST and TINV
With degrees of freedom = $n - 2$
Inference for Regression
Convenient Packaged Analysis in Excel:
Tools → Data Analysis → Regression
Illustrate application using:
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Inference for Regression
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Utility companies estimate energy used by their customers. Natural gas consumption depends on temperature. A study recorded average daily gas consumption y (100s of cubic feet) for each month. The explanatory variable x is the average number of heating degree days for that month. Data for October through June are:
Inference for Regression
Data for October through June are:
| Month | X = Deg. Days | Y = Gas Cons’n |
| --- | --- | --- |
| Oct | 15.6 | 5.2 |
| Nov | 26.8 | 6.1 |
| Dec | 37.8 | 8.7 |
| Jan | 36.4 | 8.5 |
| Feb | 35.5 | 8.8 |
| Mar | 18.6 | 4.9 |
| Apr | 15.3 | 4.5 |
| May | 7.9 | 2.5 |
| Jun | 0 | 1.1 |
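The main numbers in a regression summary for these data can be sketched in Python. This is an illustration only, not the class spreadsheet: the function name `regression_summary` and the dictionary keys are hypothetical, and the data are the nine months listed above.

```python
import math

# Data from the table above: heating degree days (x) and gas consumption (y).
deg_days = [15.6, 26.8, 37.8, 36.4, 35.5, 18.6, 15.3, 7.9, 0.0]
gas_use = [5.2, 6.1, 8.7, 8.5, 8.8, 4.9, 4.5, 2.5, 1.1]

def regression_summary(xs, ys):
    """Least squares fit, r^2, and standard errors of slope and intercept."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # estimate of sigma_e, n - 2 degrees of freedom
    return {
        "slope": a_hat,
        "intercept": b_hat,
        "r_squared": sxy ** 2 / (sxx * syy),
        "s_e": s_e,
        "se_slope": s_e / math.sqrt(sxx),
        "se_intercept": s_e * math.sqrt(1 / n + x_bar ** 2 / sxx),
    }

summary = regression_summary(deg_days, gas_use)
for name, value in summary.items():
    print(f"{name:>12}: {value:.4f}")
```

The standard errors are the SD formulas for the slope and intercept with the unknown $\sigma_e$ replaced by its data-based estimate.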
Inference for Regression
Class Example 27,
Old Text Problem 8.6 (now 10.12)
Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg27.xls
Good News:
Lots of things done automatically
Bad News:
Different language,
so need careful interpretation
Inference for Regression
Excel Glossary:
| Excel | Stat 31 |
| --- | --- |
| R² | $r^2$ = Prop’n of Sum of Squares Explained by Line |
| Intercept | Intercept $b$ |
| X Variable | Slope $a$ |
| Coefficient | Estimates $\hat{a}$ & $\hat{b}$ |
Inference for Regression
Excel Glossary:
| Excel | Stat 31 |
| --- | --- |
| Standard Errors | Estimates of $\sigma_a$ & $\sigma_b$ (recall from Sampling Dist’ns) |
| T – Stat. | (Est. – mean) / SE, i.e. put on scale of T – distribution |
| P-value | For 2-sided test of $H_0: a = 0$ vs. $H_A: a \neq 0$ (and likewise $H_0: b = 0$ vs. $H_A: b \neq 0$) |
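The T – Stat. and P-value entries can be sketched without Excel. The code below is a rough illustration, assuming pure Python: the t density is integrated numerically with Simpson's rule instead of calling TDIST, and the function names, the estimate 1.5, the standard error 0.5 and the 10 degrees of freedom are all hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_statistic(estimate, se, null_value=0.0):
    """(Est. - mean) / SE, with the mean taken from the null hypothesis."""
    return (estimate - null_value) / se

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|]; steps must be even."""
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3              # P(0 < T < |t_stat|)
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical slope estimate and standard error, testing H0: a = 0.
t_stat = t_statistic(estimate=1.5, se=0.5)
p_value = two_sided_p_value(t_stat, df=10)
print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.4f}")
```

A small P-value is evidence against $H_0$, i.e. evidence that the slope (or intercept) differs from 0.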
Inference for Regression
Excel Glossary:
| Excel | Stat 31 |
| --- | --- |
| Lower 95%, Upper 95% | Ends of 95% Confidence Interval for $a$ and $b$ (since chose 0.95 for Confidence level) |
| Predicted $Y_i$ | Points on line at $X_i$, i.e. $\hat{a} X_i + \hat{b}$ |
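Inference for Regression
Excel Glossary:
| Excel | Stat 31 |
| --- | --- |
| Residual | $Y_i - \hat{a} X_i - \hat{b}$ for $X_i$. Recall: gave useful information about quality of fit |
| Standard Residuals | $(Y_i - \hat{a} X_i - \hat{b}) / \sigma_e$, with $\sigma_e$ estimated from the data, i.e. on standardized scale |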
Inference for Regression
Some useful variations:
Class Example 28,
Old Text Problems 10.8 - 10.10
(now 10.13 – 10.15)
Excel Analysis: https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are:
Inference for Regression
Class Example 28,
(now 10.13 – 10.15)
Old 10.8:
The data are:
Year Lean
75 642
76 644
77 656
78 667
79 673
80 688
81 696
82 698
83 713
84 717
85 725
86 742
87 757
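Parts (b) and (c) of this problem can be sketched in Python as follows. This is an illustration, not the class spreadsheet: the variable names are hypothetical, and the critical value t* = 2.201 is an assumption copied from a t table for n - 2 = 11 degrees of freedom (the value Excel's TINV(0.05, 11) gives, rounded).

```python
import math

years = list(range(75, 88))  # 75, 76, ..., 87
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(years)
x_bar, y_bar = sum(years) / n, sum(lean) / n
sxx = sum((x - x_bar) ** 2 for x in years)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, lean))
a_hat = sxy / sxx                # slope: average change in lean per year
b_hat = y_bar - a_hat * x_bar    # intercept
sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(years, lean))
s_e = math.sqrt(sse / (n - 2))   # estimate of sigma_e
se_slope = s_e / math.sqrt(sxx)  # estimated SD of the slope

# Assumption: two-sided 95% critical value for 11 degrees of freedom, from a t table.
T_STAR = 2.201
ci_low = a_hat - T_STAR * se_slope
ci_high = a_hat + T_STAR * se_slope
print(f"fit line: lean = {b_hat:.2f} + {a_hat:.4f} * year")
print(f"95% CI for slope: ({ci_low:.3f}, {ci_high:.3f})")
```

The slope is the average rate of change of the lean per year, so the interval for the slope is the interval asked for in part (c).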
Inference for Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.8:
(a) Plot the data. Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Inference for Regression
HW:
10.3 b,c
10.5
And Now for Something Completely Different
Etymology of:
“And now for something completely
different”
Anybody heard of this before?
And Now for Something Completely Different
What is “etymology”?
Google responses to:
define: etymology
• The history of words; the study of the history of words. csmp.ucop.edu/crlp/resources/glossary.html
• The history of a word shown by tracing its development from another language. www.animalinfo.org/glosse.htm
And Now for Something Completely Different
What is “etymology”?
• Etymology is derived from the Greek word ἔτυμον (etymon) meaning "a sense" and λόγος (logos) meaning "word." Etymology is the study of the original meaning and development of a word tracing its meaning back as far as possible. www.two-age.org/glossary.htm
And Now for Something Completely Different
Google response to: define: and now for something completely different
And Now For Something Completely Different is a film spinoff from the television comedy series Monty Python's Flying Circus. The title originated as a catchphrase in the TV show. Many Python fans feel that it excellently describes the nonsensical, non sequitur feel of the program. en.wikipedia.org/wiki/And_Now_For_Something_Completely_Different
And Now for Something Completely Different
Google Search for:
“And now for something completely different”
Gives more than 100 results….
A perhaps interesting one:
http://www.mwscomp.com/mpfc/mpfc.html
And Now for Something Completely Different
Google Search for:
“Stat 31 and now for something completely different”
Gives:
[PPT] Slide 1
File Format: Microsoft Powerpoint 97 - View as HTML
... But what is missing? And now for something completely different… Review Ideas on State Lotteries, from our study of Expected Value ...
https://www.unc.edu/~marron/UNCstat31-2005/Stat31-05-03-31.ppt - Similar pages
Prediction in Regression
Idea: Given data $(X_1, Y_1), \ldots, (X_n, Y_n)$
Can find the Least Squares Fit Line, and do
inference for the parameters.
Given a new X value, say $X_0$, what will the
new Y value be?
Prediction in Regression
Dealing with variation in prediction:
Under the model: $Y_i = aX_i + b + e_i$
A sensible guess about $Y_0$,
based on the given $X_0$,
is: $\hat{Y}_0 = \hat{a} X_0 + \hat{b}$
(point on the fit line above $X_0$)
Prediction in Regression
What about variation about this guess?
Natural Approach: present an interval
(as done with Confidence Intervals)
Careful: Two Notions of this:
1. Confidence Interval for mean of $Y_0$
2. Prediction Interval for value of $Y_0$
Prediction in Regression
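1. Confidence Interval for mean of $Y_0$:
Use: $\hat{Y}_0 \pm t^* \, SE_{\hat{Y}}$
where: $SE_{\hat{Y}} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
(here $s$ is the data-based estimate of $\sigma_e$)
and where $t^* = TINV(1 - 0.95, n - 2)$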
Prediction in Regression
Interpretation of: $SE_{\hat{Y}} = s \sqrt{\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Smaller for $x_0$ closer to $\bar{x}$
• But never 0
• Smaller for $x_i$ more spread out
• Larger for larger $s$
Prediction in Regression
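2. Prediction Interval for value of $Y_0$
Use: $\hat{Y}_0 \pm t^* \, SE_{Y_0}$
where: $SE_{Y_0} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
And again $t^* = TINV(1 - 0.95, n - 2)$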
Prediction in Regression
Interpretation of: $SE_{Y_0} = s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$
• Similar remarks to above …
• Additional “1 + ” accounts for added
variation in $Y_0$ compared to $\hat{Y}_0$
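The difference between the two intervals shows up clearly in a small computation. The sketch below uses the Leaning Tower of Pisa data from Class Example 28; the function name `interval_half_widths`, the new X value 88 and the table value t* = 2.201 (11 degrees of freedom) are assumptions made for illustration.

```python
import math

years = list(range(75, 88))  # 75, 76, ..., 87
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

def interval_half_widths(x0, xs, ys, t_star):
    """Return (Y-hat at x0, CI half-width for the mean, PI half-width for a new Y)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a_hat = sxy / sxx
    b_hat = y_bar - a_hat * x_bar
    sse = sum((y - a_hat * x - b_hat) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))                # estimate of sigma_e
    shared = 1 / n + (x0 - x_bar) ** 2 / sxx    # term common to both SE formulas
    se_mean = s * math.sqrt(shared)             # SE for the mean of Y_0
    se_new = s * math.sqrt(1 + shared)          # extra "1 +" for a single new Y_0
    y_hat0 = a_hat * x0 + b_hat
    return y_hat0, t_star * se_mean, t_star * se_new

# Hypothetical new X value (year 88); t* copied from a t table for 11 df.
y_hat0, ci_half, pi_half = interval_half_widths(88, years, lean, t_star=2.201)
print(f"guess for Y_0: {y_hat0:.2f}")
print(f"95% CI for mean of Y_0:   +/- {ci_half:.2f}")
print(f"95% PI for value of Y_0:  +/- {pi_half:.2f}")
```

The prediction interval is always the wider of the two, since a single new value varies about its mean in addition to the uncertainty in the fitted line.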
Prediction in Regression
Revisit Class Example 28,
(now 10.13 – 10.15) Old 10.8:
Engineers made measurements of the Leaning Tower of Pisa over the years 1975 – 1987. “Lean” is the difference between a point’s position if the tower were straight and its actual position, in tenths of a millimeter, in excess of 2.9 meters. The data are listed above…
Prediction in Regression
Class Example 28, (now 10.13 – 10.15)
Old 10.9:
(a) Plot the data. Does the trend in lean over time appear to be linear?
(b) What is the equation of the least squares fit line?
(c) Give a 95% confidence interval for the average rate of change of the lean.
https://www.unc.edu/~marron/UNCstat31-2005/Stat31Eg28.xls
Prediction in Regression
HW:
10.20 and add part:
(f) Calculate a 95% Confidence Interval for
the mean oxygen uptake of individuals
having heart rate 96, and heart rate
115.
Additional Issues in Regression
Robustness
Outliers via Java Applet
HW on outliers