11 regression

Upload: duckie

Post on 04-Mar-2016


DESCRIPTION

Notes on Regression

TRANSCRIPT

Stat 250 Gunderson Lecture Notes 11: Regression Analysis

"The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning."
Stephen Jay Gould, The Mismeasure of Man

Describing and assessing the significance of relationships between variables is very important in research. We will first learn how to do this in the case when the two variables are quantitative. Quantitative variables have numerical values that can be ordered according to those values.

Main idea: We wish to study the relationship between two quantitative variables. Generally one variable is the __RESPONSE__ variable, denoted by \( y \). This variable measures the outcome of the study and is also called the __DEPENDENT__ variable. The other variable is the __EXPLANATORY__ variable, denoted by \( x \). It is the variable that is thought to explain the changes we see in the response variable. The explanatory variable is also called the __INDEPENDENT__ variable.

The first step in examining the relationship is to use a graph, a scatterplot, to display the relationship. We will look for an overall pattern and see if there are any departures from this overall pattern. If a linear relationship appears to be reasonable from the scatterplot, we will take the next step of finding a model (an equation of a line) to summarize the relationship. The resulting equation may be used for predicting the response for various values of the explanatory variable. If certain assumptions hold, we can assess the significance of the linear relationship and construct confidence intervals for our estimates and predictions.

Let's begin with an example that we will carry throughout our discussions.

Graphing the Relationship: Restaurant Bill vs Tip

How well does the size of a restaurant bill predict the tip the server receives? Below are the bills and tips (in dollars) from six different restaurant visits.

    Bill ($):  41  98  25  85  50  73
    Tip ($):    8  17   4  12   5  14

Response (dependent) variable: \( y \) = Tip amount. Explanatory (independent) variable: \( x \) = Amount of the bill.

Step 1: Examine the data graphically with a scatterplot. Add the points to the scatterplot below:

Interpret the scatterplot in terms of...
- overall form (does the average pattern look like a straight line, or is it curved?)
- direction of association (positive or negative)
- strength of association (how much do the points vary around the average pattern?)
- any deviations from the overall form?

Describing a Linear Relationship with a Regression Line

Regression analysis is the area of statistics used to examine the relationship between a quantitative response variable and one or more explanatory variables. A key element is the estimation of an equation that describes how, on average, the response variable is related to the explanatory variables. A regression equation can also be used to make predictions. The simplest kind of relationship between two variables is a straight line; the analysis in this case is called linear regression.

Regression Line for Bill vs. Tip

Remember the equation of a line? \( y = mx + b \). In statistics we denote the regression line for a sample as:

\[ \hat{y} = b_0 + b_1 x \]

where:
- \( \hat{y} \) ("y-hat") = the predicted \( y \) or estimated \( y \) value
- \( b_0 \) = y-intercept = estimated \( y \) when \( x = 0 \) (not always meaningful)
- \( b_1 \) = slope = how much of an increase (or decrease) we expect to see in \( y \), on average, when \( x \) increases by 1 unit

Goal: To find a line that is close to the data points, that is, find the best fitting line. How? What do we mean by best? One measure of how well a line fits is to look at the observed errors in prediction.

Observed errors = __\( y - \hat{y} \)__ are called __residuals__. So we want to choose the line for which the sum of squares of the observed errors (the sum of squared residuals) is the least. The line that does this is called the __Least Squares Regression Line__.

[Figure: a possible line through the scatterplot; the observed error if we used this line to predict is \( y - \hat{y} \).]

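The equations for the estimated slope and intercept are given by:

\[ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \frac{S_{XY}}{S_{XX}} = r\,\frac{s_y}{s_x} \]

\[ b_0 = \bar{y} - b_1 \bar{x} \]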

The least squares regression line (estimated regression function) is: \( \hat{y} = \hat{E}(Y) = b_0 + b_1 x \). More on this distinction later when we talk about prediction intervals vs. CIs for a mean.

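To find this estimated regression line for our bill and tip data by hand, it is easier if we set up a calculation table. By filling in this table and computing the column totals, we will have all of the main summaries needed to perform a complete linear regression analysis. Note that here we have \( n = 6 \) observations. The first five rows have been completed for you. In general, use R or a calculator to help with the graphing and numerical computations!

    x = bill   y = tip   x - x̄            (x - x̄)²         (x - x̄)y           y - ȳ          (y - ȳ)²
    41         8         41 - 62 = -21    (-21)² = 441     (-21)(8) = -168    8 - 10 = -2    (-2)² = 4
    98         17        98 - 62 = 36     (36)² = 1296     (36)(17) = 612     17 - 10 = 7    (7)² = 49
    25         4         25 - 62 = -37    (-37)² = 1369    (-37)(4) = -148    4 - 10 = -6    (-6)² = 36
    85         12        85 - 62 = 23     (23)² = 529      (23)(12) = 276     12 - 10 = 2    (2)² = 4
    50         5         50 - 62 = -12    (-12)² = 144     (-12)(5) = -60     5 - 10 = -5    (-5)² = 25
    73         14        73 - 62 = 11     (11)² = 121      (11)(14) = 154     14 - 10 = 4    (4)² = 16
    Σ = 372    Σ = 60    0                3900             666                0              134

\[ \bar{x} = \frac{372}{6} = 62 \qquad \bar{y} = \frac{60}{6} = 10 \]

Slope estimate: \( b_1 = \dfrac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} = \dfrac{666}{3900} = 0.17077 \)

y-intercept estimate: \( b_0 = \bar{y} - b_1 \bar{x} = 10 - (0.17077)(62) = 10 - 10.5877 = -0.5877 \)

Estimated regression line: \( \hat{y} = b_0 + b_1 x = -0.5877 + 0.17077\,x \)

Here \( \hat{y} \) is used to predict a single \( y \) at a given \( x \), while \( \hat{E}(Y) \) estimates the average \( y \) for all individuals with that \( x \).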
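As a check on the hand calculation, here is a minimal Python sketch (standard library only; Python is our choice for illustration, the notes themselves use R) that recomputes the column totals and the least squares coefficients from the six bill/tip pairs. The variable names `s_xx` and `s_xy` are our own labels for \( \sum (x - \bar{x})^2 \) and \( \sum (x - \bar{x})y \).

```python
# Bill (x) and tip (y) for the six restaurant visits in the notes
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

n = len(bills)
x_bar = sum(bills) / n   # 372 / 6 = 62
y_bar = sum(tips) / n    # 60 / 6 = 10

# Column totals from the calculation table
s_xx = sum((x - x_bar) ** 2 for x in bills)               # sum of (x - x_bar)^2
s_xy = sum((x - x_bar) * y for x, y in zip(bills, tips))  # sum of (x - x_bar) * y

b1 = s_xy / s_xx          # least squares slope
b0 = y_bar - b1 * x_bar   # least squares intercept

print(f"S_XX = {s_xx:.0f}, sum of (x - x_bar) y = {s_xy:.0f}")
print(f"y-hat = {b0:.4f} + {b1:.5f} x")
```

Running it prints `y-hat = -0.5877 + 0.17077 x`, matching the estimated regression line above. In R, the same coefficients come from `lm(Tip ~ Bill, data = Tips)`, as in the output shown later.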

Predict the tip for a dinner guest who had a $50 bill: \( \hat{y} = b_0 + b_1 x = -0.5877 + 0.17077(50) = 7.95 \), that is, about $7.95.

Note: The 5th dinner guest in the sample had a bill of $50 and the observed tip was $5. Find the residual for the 5th observation.

Notation for a residual: \( e_5 = y_5 - \hat{y}_5 = 5 - 7.95 = -2.95 \)

The residuals: You found the residual for one observation. You could compute the residual for each observation. The following table shows each residual.

    x = bill   y = tip   predicted value            residual      squared residual
                         ŷ = -0.5877 + 0.17077x     e = y - ŷ     e² = (y - ŷ)²
    41         8         6.41                       1.59          2.52
    98         17        16.15                      0.85          0.72
    25         4         3.68                       0.32          0.10
    85         12        13.93                      -1.93         3.73
    50         5         7.95                       -2.95         8.70
    73         14        11.88                      2.12          4.49
    Total                                           0             ~20.27

SSE = sum of squared errors (or residuals) = \( \sum (y - \hat{y})^2 \approx 20.27 \)
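A minimal sketch, assuming the rounded coefficients \( b_0 = -0.5877 \) and \( b_1 = 0.17077 \) from the hand calculation, that reproduces the residual table and the SSE. Because the coefficients are rounded, the last digit of a squared residual may differ slightly from the table.

```python
bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]

# Rounded least squares coefficients from the hand calculation
b0, b1 = -0.5877, 0.17077

fitted = [b0 + b1 * x for x in bills]                      # predicted values y-hat
residuals = [y - y_hat for y, y_hat in zip(tips, fitted)]  # e = y - y-hat
sse = sum(e ** 2 for e in residuals)                       # sum of squared residuals

for x, y, y_hat, e in zip(bills, tips, fitted, residuals):
    print(f"{x:>4} {y:>4} {y_hat:7.2f} {e:7.2f} {e * e:6.2f}")
print(f"SSE = {sse:.2f}")
```

The residuals sum to (essentially) zero, as they must for a least squares line with an intercept; any tiny nonzero sum here comes only from rounding the coefficients.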

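Measuring Strength and Direction of a Linear Relationship with Correlation

The correlation coefficient \( r \) is a measure of the strength of the linear relationship between \( y \) and \( x \).

Properties of the Correlation Coefficient \( r \)
1. \( r \) ranges from -1 to +1 (and it is unitless).
2. The sign of \( r \) indicates the direction of the association.
3. The magnitude of \( r \) indicates the strength of the association (\( r = -0.8 \) and \( r = +0.8 \) indicate equally strong linear associations).
   A "strong" \( r \) is discipline-specific: \( r = 0.8 \) might be an important (or strong) correlation in engineering, while \( r = 0.6 \) might be a strong correlation in psychology or medical research.
4. \( r \) ONLY measures the strength of the LINEAR relationship.

Some pictures:

[Figure: three scatterplots of y versus x, with r = +0.7, r = -0.4, and r ≈ 0.]

The formula for the correlation (but we will usually get it from computer output or from \( r^2 \)):

\[ r = \frac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} \]

Tips Example: We will soon see that \( r \) = __0.9213__.
Interpretation: A fairly strong positive linear association between the amount of the bill and the amount of the tip.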
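The correlation formula above can be evaluated directly from the three sums of squares. This is a minimal stdlib sketch for the bill and tip data; `s_yy` is our label for \( \sum (y - \bar{y})^2 \), the last column total of the calculation table.

```python
import math

bills = [41, 98, 25, 85, 50, 73]
tips = [8, 17, 4, 12, 5, 14]
n = len(bills)
x_bar, y_bar = sum(bills) / n, sum(tips) / n

s_xx = sum((x - x_bar) ** 2 for x in bills)
s_yy = sum((y - y_bar) ** 2 for y in tips)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(bills, tips))

# r = S_XY / sqrt(S_XX * S_YY); the units cancel, so r is unitless
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"S_XX = {s_xx:.0f}, S_YY = {s_yy:.0f}, S_XY = {s_xy:.0f}, r = {r:.4f}")
```

The result, 666 / sqrt(3900 × 134), rounds to 0.9213, the same value as R's correlation matrix (0.9212755). On Python 3.10 or later, `statistics.correlation(bills, tips)` computes the same Pearson coefficient.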

The Square of the Correlation, \( r^2 \)

The squared correlation coefficient \( r^2 \) always has a value between __0 and 1__ and is sometimes presented as a percent. It can be shown that the square of the correlation is related to the sums of squares that arise in regression.

The responses (the amount of tip) in the data set are not all the same; they do vary. We would measure the total variation in these responses as \( SSTO = \sum (y - \bar{y})^2 \) (the last column total in the calculation table, which we said we would use later).

Part of the reason why the amount of tip varies is that there is a linear relationship between the amount of tip and the amount of the bill, and the study included different bill amounts.

When we found the least squares regression line, there was still some variation of the responses around the line. This amount of variation that is not accounted for by the linear relationship is called the SSE. The amount of variation that is accounted for by the linear relationship is called the sum of squares due to the model (or regression), denoted by SSM (or sometimes SSR). So we have: SSTO = __SSM + SSE__. It can be shown that

\[ r^2 = \frac{SSM}{SSTO} = \frac{SSTO - SSE}{SSTO} \]

= the proportion of total variability in the responses that can be explained by the linear relationship with the explanatory variable \( x \).

Note: The value of \( r^2 \) and these sums of squares are summarized in an ANOVA table that is standard output from computer packages when doing regression.

[Figure labels: total variation in the y's (SSTO); variation not accounted for (SSE).]

Measuring Strength and Direction for Bill vs Tip

From our first calculation table we have: SSTO = __134__
From our residual calculation table we have: SSE = __20.27__
So the squared correlation coefficient for our bill and tip regression is:

\[ r^2 = \frac{SSTO - SSE}{SSTO} = \frac{134 - 20.27}{134} = \frac{113.73}{134} = 0.84873 \]

Interpretation: We accounted for about 84.9% of the variation in __the amount of tips received__ by the linear regression on the amount of the bill.

The correlation coefficient is \( r = \sqrt{r^2} = \sqrt{0.84873} = 0.9213 \) (positive, because the slope \( b_1 \) is positive).
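The variance decomposition SSTO = SSM + SSE and the step from \( r^2 \) back to \( r \) can be sketched in a few lines. This assumes the rounded SSTO and SSE values from the tables above; the sign of \( r \) is taken from the slope, since \( \sqrt{r^2} \) alone cannot tell us the direction.

```python
import math

ssto = 134.0        # total variation: sum of (y - y_bar)^2
sse = 20.27         # variation not accounted for: sum of squared residuals
ssm = ssto - sse    # variation accounted for by the linear relationship

r_squared = ssm / ssto
b1 = 0.17077        # slope; r takes the same sign as the slope
r = math.copysign(math.sqrt(r_squared), b1)

print(f"SSM = {ssm:.2f}, r^2 = {r_squared:.5f}, r = {r:.4f}")
```

SSM here (113.73) matches the "Sum Sq" for Bill in the R ANOVA table (113.732).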

A few more general notes:
- Nonlinear relationships
- Detecting outliers and their influence on regression results
- Dangers of extrapolation (predicting outside the range of your data)
- Dangers of combining groups inappropriately (Simpson's Paradox)
- Correlation does not prove causation

R Regression Analysis for Bill vs Tips

Let's look at the R output for our Bill and Tip data. We will see that many of the computations are done for us.

    Call:
    lm(formula = Tip ~ Bill, data = Tips)

    Residuals:
          1       2       3       4       5       6
     1.5862  0.8523  0.3185 -1.9277 -2.9508  2.1215

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.58769    2.41633  -0.243  0.81980
    Bill         0.17077    0.03604   4.738  0.00905 **
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 2.251 on 4 degrees of freedom
    Multiple R-squared:  0.8487,  Adjusted R-squared:  0.8109
    F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052

Correlation Matrix

              Bill       Tip
    Bill 1.0000000 0.9212755
    Tip  0.9212755 1.0000000

ANOVA Table

    Response: Tip
              Df  Sum Sq Mean Sq F value   Pr(>F)
    Bill       1 113.732 113.732  22.446 0.009052 **
    Residuals  4  20.268   5.067
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Inference in Linear Regression Analysis

The material covered so far focuses on using the data for a sample to graph and describe the relationship. The slope and intercept values we have computed are statistics; they are estimates of the underlying true relationship for the larger population. Next we turn to making inferences about the relationship for the larger population. Here is a nice summary to help us distinguish between the regression line for the sample and the regression line for the population.

Regression Line for the Sample: \( \hat{y} = b_0 + b_1 x \)

Regression Line for the Population: \( E(Y) = \beta_0 + \beta_1 x \)

Aside: \( E(Y) = \mu_Y(x) \) = mean response at a given \( x \); this is sometimes called the regression function. It can take on many forms; we will consider the simple linear regression function: \( \beta_0 + \beta_1 x \).

To do formal inference, we think of our \( b_0 \) and \( b_1 \) as estimates of the unknown parameters \( \beta_0 \) and \( \beta_1 \). Below we have the somewhat statistical way of expressing the underlying model that produces our data:

Linear Model: the response \( y = [\beta_0 + \beta_1(x)] + \varepsilon \) = [Population relationship] + Randomness

This statistical model for simple linear regression assumes that for each value of \( x \), the observed values of the response (the population of \( y \) values) are normally distributed, varying around some true mean (that may depend on \( x \) in a linear way) with a standard deviation \( \sigma \) that does not depend on \( x \). This true mean is sometimes expressed as \( E(Y) = \beta_0 + \beta_1(x) \). The components and assumptions regarding this statistical model are shown visually below.

The \( \varepsilon \) represents the true error term. These would be the deviations of a particular value of the response \( y \) from the true regression line. As these are the deviations from the mean, these error terms should have a normal distribution with mean 0 and constant standard deviation \( \sigma \).

Now, we cannot observe these \( \varepsilon \)'s. However, we will be able to use the estimated (observable) errors, namely the residuals, to come up with an estimate of the standard deviation and to check the conditions about the true errors.

[Figure: the true regression line plotted against x.]
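To make the model concrete, here is a small simulation sketch. It assumes hypothetical "true" values \( \beta_0 = -0.6 \), \( \beta_1 = 0.17 \), \( \sigma = 2.25 \) (chosen near our estimates purely for illustration; the real parameters are unknown), generates responses at the six observed bill amounts as population line plus normal noise, and refits the least squares slope many times. The simulated \( b_1 \) values should center on \( \beta_1 \), with a spread close to \( \sigma / \sqrt{\sum (x - \bar{x})^2} \), the quantity that the standard error of the slope estimates.

```python
import math
import random
import statistics

# Hypothetical "true" parameters, chosen near the sample estimates for illustration
BETA0, BETA1, SIGMA = -0.6, 0.17, 2.25
BILLS = [41, 98, 25, 85, 50, 73]   # x values held fixed, as in the notes


def least_squares_slope(xs, ys):
    """b1 = sum((x - x_bar) * y) / sum((x - x_bar)^2)."""
    x_bar = sum(xs) / len(xs)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * y for x, y in zip(xs, ys)) / s_xx


rng = random.Random(250)   # fixed seed so the run is reproducible
slopes = []
for _ in range(5000):
    # y = [beta0 + beta1 * x] + epsilon, with epsilon ~ N(0, sigma)
    ys = [BETA0 + BETA1 * x + rng.gauss(0, SIGMA) for x in BILLS]
    slopes.append(least_squares_slope(BILLS, ys))

print(f"mean of b1 = {statistics.mean(slopes):.4f} (true beta1 = {BETA1})")
print(f"sd of b1   = {statistics.stdev(slopes):.4f} "
      f"(sigma / sqrt(S_XX) = {SIGMA / math.sqrt(3900):.4f})")
```

Each pass of the loop plays the role of one possible sample; the collection of `slopes` approximates the sampling distribution of \( b_1 \).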

So what have we done, and where are we going?
1. Estimate the regression line based on some data. DONE!
2. Measure the strength of the linear relationship with the correlation. DONE!
3. Use the estimated equation for predictions. DONE!
4. Assess if the linear relationship is statistically significant.
5. Provide interval estimates (confidence intervals) for our predictions.
6. Understand and check the assumptions of our model.

We have already discussed the descriptive goals of 1, 2, and 3. For the inferential goals of 4 and 5, we will need an estimate of the unknown standard deviation in regression, \( \sigma \).

Estimating the Standard Deviation for Regression

The standard deviation for regression can be thought of as measuring the average size of the residuals. A relatively small standard deviation from the regression line indicates that individual data points generally fall close to the line, so predictions based on the line will be close to the actual values.

It seems reasonable that our estimate of this average size of the residuals be based on the residuals, using the sum of squared residuals and dividing by the appropriate degrees of freedom. Our estimate of \( \sigma \) is given by:

\[ s = \sqrt{MSE} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\text{sum of squared residuals}}{n-2}}, \quad \text{where } SSE = \sum (y_i - \hat{y}_i)^2 = \sum e_i^2 \]

Note: Why \( n - 2 \)? In estimating the mean response we had to estimate 2 quantities, the y-intercept and the slope, so we lose 2 df.

Estimating the Standard Deviation: Bill vs Tip

Below are the portions of the R regression output that we could use to obtain the estimate of \( \sigma \) for our regression analysis.

From Summary:

    Residual standard error: 2.251 on 4 degrees of freedom
    Multiple R-squared:  0.8487,  Adjusted R-squared:  0.8109
    F-statistic: 22.45 on 1 and 4 DF,  p-value: 0.009052

Or from ANOVA:

    Response: Tip
              Df  Sum Sq Mean Sq F value   Pr(>F)
    Bill       1 113.732 113.732  22.446 0.009052 **
    Residuals  4  20.268   5.067

Either way, \( s = 2.251 \): the residual standard error is the square root of the residual mean square, \( \sqrt{5.067} = 2.251 \).
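A minimal sketch of the \( s = \sqrt{SSE/(n-2)} \) calculation, assuming the residual sum of squares 20.268 read from the R ANOVA table:

```python
import math

n = 6
sse = 20.268            # Residuals "Sum Sq" from the R ANOVA table
mse = sse / (n - 2)     # lose 2 df: we estimated both b0 and b1
s = math.sqrt(mse)      # estimate of sigma ("Residual standard error" in R)
print(f"MSE = {mse:.3f}, s = {s:.3f}")
```

This reproduces the "Mean Sq" for Residuals (5.067) and the residual standard error (2.251) reported by R.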

Significant Linear Relationship?

Consider the following hypotheses: \( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \).

What happens if the null hypothesis is true? If \( \beta_1 = 0 \), then \( E(Y) = \beta_0 \), a constant no matter what the value of \( x \) is; that is, knowing \( x \) does not help to predict the response. So these hypotheses are testing whether there is a significant nonzero linear relationship between \( y \) and \( x \).

There are a number of ways to test this hypothesis. One way is through a t test statistic (think about why it is a t and not a z test). The general form for a t test statistic is:

\[ t = \frac{\text{sample statistic} - \text{null value}}{\text{standard error of the sample statistic}} \]

We have our sample estimate for \( \beta_1 \); it is \( b_1 \). And we have the null value of 0. So we need the standard error for \( b_1 \). We could derive it using the idea of sampling distributions (think about the population of all possible \( b_1 \) values if we were to repeat this procedure over and over many times). Here is the result:

t test for the population slope \( \beta_1 \): To test \( H_0: \beta_1 = 0 \) we would use

\[ t = \frac{b_1 - 0}{\text{s.e.}(b_1)}, \quad \text{where } \text{s.e.}(b_1) = \frac{s}{\sqrt{\sum (x - \bar{x})^2}} \]

and the degrees of freedom for the t distribution are \( n - 2 \). This t statistic could be modified to test a variety of hypotheses about the population slope (different null values and various directions of extreme).

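Try It! Significant Relationship between Bill and Tip?

Is there a significant (nonzero) linear relationship between the total cost of a restaurant bill and the tip that is left? (Is the bill a useful linear predictor for the tip?) That is, test \( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \) using a 5% level of significance.

1. \( \text{s.e.}(b_1) = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}} = \dfrac{2.251}{\sqrt{3900}} = 0.036 \)
2. \( t = \dfrac{b_1 - 0}{\text{s.e.}(b_1)} = \dfrac{0.17077 - 0}{0.036} = 4.74 \)
3. Using the t table with df = 6 - 2 = 4, we have p-value between 0.005 and 0.01 (R reports 0.00905). Since the p-value is less than 0.05, we reject \( H_0 \) and conclude that the bill is a significant linear predictor of the tip.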
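Steps 1 and 2 of the Try It! can be sketched as follows, assuming \( s = 2.251 \) and \( \sum (x - \bar{x})^2 = 3900 \) from earlier. Python's standard library has no t distribution, so the p-value step is left to a t table or to R (for example `2 * pt(-abs(t), df = 4)`).

```python
import math

s = 2.251       # estimate of sigma
s_xx = 3900     # sum of (x - x_bar)^2
b1 = 0.17077    # estimated slope
n = 6

se_b1 = s / math.sqrt(s_xx)   # standard error of the sample slope
t = (b1 - 0) / se_b1          # null value for beta1 is 0
df = n - 2
print(f"s.e.(b1) = {se_b1:.5f}, t = {t:.3f} on {df} df")
```

Carrying the unrounded standard error (0.03604) gives \( t \approx 4.738 \), the value in R's coefficient table; the hand calculation's 4.74 differs only by rounding.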

Think about it: Based on the results of the previous t test conducted at the 5% significance level, do you think a 95% confidence interval for the true slope \( \beta_1 \) would contain the value 0?

Confidence Interval for the Population Slope \( \beta_1 \):

\[ b_1 \pm t^* \, \text{s.e.}(b_1), \quad \text{where df} = n - 2 \text{ for the } t^* \text{ value} \]

Compute the interval and check your answer. Could you interpret the 95% confidence level here?

\( 0.17077 \pm (2.78)(0.036) = 0.17077 \pm 0.10008 \Rightarrow (0.07069,\ 0.27085) \)   (\( t^* = 2.78 \) from df = 4 and 95% confidence)

If this study were repeated many times, we would expect about 95% of the resulting confidence intervals to contain the population slope \( \beta_1 \).

Inference about the Population Slope using R

Below are the portions of the R regression output that we could use to perform the t test and obtain the confidence interval for the population slope \( \beta_1 \).

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.58769    2.41633  -0.243  0.81980
    Bill         0.17077    0.03604   4.738  0.00905 **
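A minimal sketch of the slope interval, assuming the rounded standard error 0.036 and the table value \( t^* = 2.78 \) used in the hand calculation. It also checks the "Think about it" question directly.

```python
b1 = 0.17077
se_b1 = 0.036    # rounded standard error of b1
t_star = 2.78    # t* for df = 4 and 95% confidence (t table)

margin = t_star * se_b1
lower, upper = b1 - margin, b1 + margin
print(f"95% CI for beta1: ({lower:.5f}, {upper:.5f}); contains 0? {lower <= 0 <= upper}")
```

The interval excludes 0, consistent with rejecting \( H_0: \beta_1 = 0 \) at the 5% level. In R, `confint()` applied to the fitted `lm` object gives this interval using the unrounded standard error and exact \( t^* \), so its endpoints may differ slightly.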

Note: There is a third way to test \( H_0: \beta_1 = 0 \) versus \( H_a: \beta_1 \neq 0 \). It involves an F test from an ANOVA for regression.

    Response: Tip
              Df  Sum Sq Mean Sq F value   Pr(>F)
    Bill       1 113.732 113.732  22.446 0.009052 **
    Residuals  4  20.268   5.067

*The t test is more flexible than the F test; the F test is only two-sided, with a null value of 0.
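A short sketch showing where the F statistic comes from, assuming the sums of squares and degrees of freedom in the ANOVA table above. With a single explanatory variable, \( F = t^2 \) for the slope, which is why the two tests give the same p-value (0.009052 vs 0.00905).

```python
ss_reg, df_reg = 113.732, 1   # "Bill" row of the ANOVA table
ss_res, df_res = 20.268, 4    # "Residuals" row

ms_reg = ss_reg / df_reg      # mean square for regression
mse = ss_res / df_res         # mean square error
f_stat = ms_reg / mse

t_slope = 4.738               # t value for Bill from the coefficients table
print(f"F = {f_stat:.3f}, t^2 = {t_slope ** 2:.3f}")
```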

Predicting for Individuals versus Estimating the Mean

Consider the relationship between the bill and tip. Least squares regression line (or estimated regression function):

\( \hat{y} = -0.5877 + 0.17077(x) \), also \( \hat{E}(Y) = -0.5877 + 0.17077(x) \)

We also have: \( s = 2.251 \).

How would you predict the tip for Barb, who had a $50 restaurant bill? \( \hat{y} = -0.5877 + 0.17077(50) \) = $7.95

How would you estimate the mean tip for all customers who had a $50 restaurant bill? \( \hat{E}(Y) = -0.5877 + 0.17077(50) \) = $7.95

So our estimates for predicting a future observation and for estimating the mean response are found using the same least squares regression equation. What about their standard errors? (We would need the standard errors to be able to produce an interval estimate.)

Idea: Consider a population of individuals and a population of means:
What is the standard deviation for a population of individuals? \( \sigma \)
What is the standard deviation for a population of means? \( \sigma / \sqrt{n} \)
Which standard deviation is larger? So a prediction interval for an individual response will be ________ (wider or narrower) than a confidence interval for a mean response.

[Figure: a population of individuals with standard deviation \( \sigma \) and a population of means with standard deviation \( \sigma / \sqrt{n} \).]

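Here are the (somewhat messy) formulas (df = \( n - 2 \) for \( t^* \) in both):

Confidence interval for the mean response: \( \hat{y} \pm t^* \, \text{s.e.}(\text{fit}) \), where \( \text{s.e.}(\text{fit}) = s \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \)

Prediction interval for an individual response: \( \hat{y} \pm t^* \, \text{s.e.}(\text{pred}) \), where \( \text{s.e.}(\text{pred}) = \sqrt{s^2 + [\text{s.e.}(\text{fit})]^2} \)

Try It! Bill vs Tip

Construct a 95% confidence interval for the mean tip left by all customers who had a $50 bill (x).
Recall: \( n = 6 \), \( \bar{x} = 62 \), \( S_{XX} = \sum (x - \bar{x})^2 = 3900 \), \( \hat{y} = -0.5877 + 0.17077(x) \), and \( s = 2.251 \).
\( \hat{y} = -0.5877 + 0.17077(50) \) = $7.95, \( t^* = 2.78 \) (with df = 4)

\[ \text{s.e.}(\text{fit}) = s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 2.251 \sqrt{\frac{1}{6} + \frac{(50 - 62)^2}{3900}} = 1.0157 \]

\( \hat{y} \pm t^* \, \text{s.e.}(\text{fit}) = 7.95 \pm (2.78)(1.0157) = 7.95 \pm 2.82 \) => ($5.13, $10.77)

Construct a 95% prediction interval for the tip from an individual customer who had a $50 bill (x).

\[ \text{s.e.}(\text{pred}) = \sqrt{s^2 + [\text{s.e.}(\text{fit})]^2} = \sqrt{(2.251)^2 + (1.0157)^2} = 2.47 \]

\( \hat{y} \pm t^* \, \text{s.e.}(\text{pred}) = 7.95 \pm 2.78(2.47) = 7.95 \pm 6.87 \) => ($1.08, $14.82). It is wider!

Show the prediction interval and confidence interval bands on the scatterplot.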
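Both intervals at a $50 bill can be computed side by side. This is a minimal sketch, assuming the rounded coefficients, \( s = 2.251 \), and the table value \( t^* = 2.78 \) from above; it keeps \( \hat{y} = 7.9508 \) unrounded, so the last digit of an endpoint can differ from the hand calculation (for example, a prediction lower limit of about 1.09 rather than 1.08).

```python
import math

n, x_bar, s_xx = 6, 62, 3900
b0, b1, s = -0.5877, 0.17077, 2.251
t_star = 2.78          # df = n - 2 = 4, 95% confidence (t table)
x_new = 50

y_hat = b0 + b1 * x_new
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)   # mean response
se_pred = math.sqrt(s ** 2 + se_fit ** 2)                      # individual response

ci = (y_hat - t_star * se_fit, y_hat + t_star * se_fit)
pi = (y_hat - t_star * se_pred, y_hat + t_star * se_pred)
print(f"y-hat = {y_hat:.2f}, se(fit) = {se_fit:.4f}, se(pred) = {se_pred:.2f}")
print(f"95% CI for mean tip:       ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for individual tip: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The extra \( s^2 \) inside `se_pred` is what makes the prediction interval wider. In R, `predict()` on the fitted model with `newdata = data.frame(Bill = 50)` and `interval = "confidence"` or `interval = "prediction"` returns these intervals directly.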

Checking Assumptions in Regression

Let's recall the statistical way of expressing the underlying model that produces our data:

Linear Model: the response \( y = [\beta_0 + \beta_1(x)] + \varepsilon \) = [Population relationship] + Randomness

where the \( \varepsilon \)'s, the true error terms, should be normally distributed with mean 0 and constant standard deviation \( \sigma \), and this randomness is independent from one case to another.

Thus there are four essential technical assumptions required for inference in linear regression:
(1) The relationship is in fact linear.
(2) TRUE errors should be normally distributed.
(3) TRUE errors should have constant variance.
(4) TRUE errors should not display obvious patterns.

Now, we cannot observe these \( \varepsilon \)'s. However, we will be able to use the estimated (observable) errors, namely the residuals, to come up with an estimate of the standard deviation and to check the conditions about the true errors. So how can we check these assumptions with our data and estimated model?
(1) The relationship is in fact linear: examine the scatterplot of y versus x.
(2) TRUE errors should be normally distributed: look at a histogram or QQ plot of the residuals.
(3) TRUE errors should have constant variance, and (4) TRUE errors should not display obvious patterns: look at a residual vs fitted (\( \hat{y} \)) plot. If we see random scatter with no pattern in a horizontal band around 0, the assumptions look OK.

Now, if we saw:

[Residual vs Fitted Plot: shows evidence that the true errors do not have constant variance.]

[Residual vs Fitted Plot: shows evidence that the underlying relationship may not be linear (maybe quadratic).]

Let's turn to one last full regression problem that includes checking assumptions.

Relationship between Height and Foot Length for College Men

The heights (in inches) and foot lengths (in centimeters) of 32 college men were used to develop a model for the relationship between height and foot length. The scatterplot and R regression output are provided.

                mean       sd  n
    foot    27.78125 1.549701 32
    height  71.68750 3.057909 32

    Call:
    lm(formula = foot ~ height, data = heightfoot)

    Residuals:
         Min       1Q   Median       3Q      Max
    -1.74925 -0.81825  0.07875  0.58075  2.25075

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.25313    4.33232   0.058    0.954
    height       0.38400    0.06038   6.360 5.12e-07 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 1.028 on 30 degrees of freedom
    Multiple R-squared:  0.5741,  Adjusted R-squared:  0.5599
    F-statistic: 40.45 on 1 and 30 DF,  p-value: 5.124e-07

    Correlation Matrix
                foot    height
    foot   1.0000000 0.7577219
    height 0.7577219 1.0000000

    Analysis of Variance Table
    Response: foot
              Df Sum Sq Mean Sq F value    Pr(>F)
    height     1 42.744  42.744  40.446 5.124e-07 ***
    Residuals 30 31.705   1.057

Also note that: \( S_{XX} = \sum (x - \bar{x})^2 = 289.87 \)
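The summary statistics alone are enough to recover the key numbers in this output. A minimal sketch, using the slope formula \( b_1 = r\,s_y/s_x \) and \( b_0 = \bar{y} - b_1\bar{x} \) from earlier, plus the identity \( S_{XX} = (n-1)s_x^2 \) (which holds because \( s_x^2 = S_{XX}/(n-1) \)):

```python
import math

# Summary statistics from the R output for the 32 college men
n = 32
mean_foot, sd_foot = 27.78125, 1.549701      # y, in centimeters
mean_height, sd_height = 71.68750, 3.057909  # x, in inches
r = 0.7577219

b1 = r * sd_foot / sd_height            # slope: b1 = r * s_y / s_x
b0 = mean_foot - b1 * mean_height       # intercept: b0 = y_bar - b1 * x_bar
s_xx = (n - 1) * sd_height ** 2         # S_XX = (n - 1) * s_x^2
se_b1 = 1.028 / math.sqrt(s_xx)         # s.e.(b1) = s / sqrt(S_XX), with s = 1.028

print(f"b1 = {b1:.5f}, b0 = {b0:.5f}, S_XX = {s_xx:.2f}, s.e.(b1) = {se_b1:.5f}")
```

These agree with R's estimates (0.38400, 0.25313), the stated \( S_{XX} = 289.87 \), and the slope's standard error (0.06038) to rounding.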

a. How much would you expect foot length to increase, on average, for each 1-inch increase in height? Include the units.

This is asking about the slope: 0.384 centimeters.

b. What is the correlation between height and foot length?

r = 0.7577 (would you be able to interpret the value of \( r^2 \)?)

c. Give the equation of the least squares regression line for predicting foot length from height.

predicted y = \( \hat{y} = 0.252 + 0.384(x) \)

d. Suppose Max is 70 inches tall and has a foot length of 28.5 centimeters. Based on the least squares regression line, what is the value of the prediction error (residual) for Max? Show all work.

predicted y = \( \hat{y} = 0.252 + 0.384(70) = 27.13 \) cm
observed y - predicted y = 28.5 - 27.13 = 1.37 cm

e. Use a 1% significance level to assess if there is a significant positive linear relationship between height and foot length. State the hypotheses to be tested, the observed value of the test statistic, the corresponding p-value, and your decision.

Hypotheses: \( H_0: \beta_1 = 0 \)   \( H_a: \beta_1 > 0 \)
Test statistic value: t = 6.36
p-value: 0.0000005124 / 2 = 0.0000002562
Decision: Reject \( H_0 \) (the p-value is far below 0.01)
Conclusion: Thus it appears there is a significant positive linear relationship between height and foot length for the population of college men represented by the sample.
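Parts (d) and (e) can be sketched together. This assumes the R estimates 0.25313 and 0.38400 and the two-sided p-value 5.124e-07 printed by R. Halving R's two-sided p-value is valid for \( H_a: \beta_1 > 0 \) only because the observed t is positive; the sketch includes the other case for completeness.

```python
b0, b1 = 0.25313, 0.38400    # estimates from the R output

height_max, foot_max = 70, 28.5
y_hat = b0 + b1 * height_max          # predicted foot length (cm)
residual = foot_max - y_hat           # observed y - predicted y

# R reports a two-sided p-value; for Ha: beta1 > 0 it is halved when t > 0
p_two_sided = 5.124e-07
t_value = 6.36
p_one_sided = p_two_sided / 2 if t_value > 0 else 1 - p_two_sided / 2
alpha = 0.01
print(f"y-hat = {y_hat:.2f} cm, residual = {residual:.2f} cm")
print(f"one-sided p-value = {p_one_sided:.3e}, reject H0: {p_one_sided < alpha}")
```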

f. Calculate a 95% confidence interval for the average foot length for all college men who are 70 inches tall. (Just clearly plug in all numerical values.)

\[ \hat{y} \pm t^* s \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}} = 27.132 \pm (2.04)(1.028)\sqrt{\frac{1}{32} + \frac{(70 - 71.7)^2}{289.87}} = 27.132 \pm 0.425 \Rightarrow (26.707,\ 27.557) \]

g. Consider the residuals vs fitted plot shown.

Does this plot support the conclusion that the linear regression model is appropriate?  Yes

Explain: The plot shows a random scatter in a horizontal band around 0 with no pattern.

Note: On the exam, students who said NO because the variation appears to be changing were marked as OK too.
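The part (f) interval can be checked with a short sketch. It assumes the R estimates for the line, \( s = 1.028 \), \( S_{XX} = 289.87 \), and the table value \( t^* = 2.04 \) (df = 30); it uses \( \bar{x} = 71.6875 \) rather than 71.7, so the endpoints can differ from the hand calculation in the third decimal.

```python
import math

n, x_bar, s_xx = 32, 71.6875, 289.87
s = 1.028                 # residual standard error
t_star = 2.04             # t* for df = 30 and 95% confidence (t table)
x_new = 70

y_hat = 0.25313 + 0.38400 * x_new
se_fit = s * math.sqrt(1 / n + (x_new - x_bar) ** 2 / s_xx)
margin = t_star * se_fit
lower, upper = y_hat - margin, y_hat + margin
print(f"{y_hat:.3f} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```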

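Regression: Linear Regression Model

Population Version:
    Mean: \( \mu_Y(x) = E(Y) = \beta_0 + \beta_1 x \)
    Individual: \( y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \), where \( \varepsilon_i \) is \( N(0, \sigma) \)

Sample Version:
    Mean: \( \hat{y} = b_0 + b_1 x \)
    Individual: \( y_i = b_0 + b_1 x_i + e_i \)

Parameter Estimators:
    \( b_1 = \dfrac{S_{XY}}{S_{XX}} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{\sum (x - \bar{x})\,y}{\sum (x - \bar{x})^2} \qquad b_0 = \bar{y} - b_1 \bar{x} \)

Residuals:
    \( e = y - \hat{y} \) = observed y - predicted y

Correlation and its square:
    \( r = \dfrac{S_{XY}}{\sqrt{S_{XX}\,S_{YY}}} \qquad r^2 = \dfrac{SSREG}{SSTO} = \dfrac{SSTO - SSE}{SSTO} \), where \( SSTO = S_{YY} = \sum (y - \bar{y})^2 \)

Estimate of \( \sigma \):
    \( s = \sqrt{MSE} = \sqrt{\dfrac{SSE}{n - 2}} \), where \( SSE = \sum (y - \hat{y})^2 = \sum e^2 \)

Standard Error of the Sample Slope:
    \( \text{s.e.}(b_1) = \dfrac{s}{\sqrt{S_{XX}}} = \dfrac{s}{\sqrt{\sum (x - \bar{x})^2}} \)

Confidence Interval for \( \beta_1 \):
    \( b_1 \pm t^* \, \text{s.e.}(b_1) \), df = n - 2

t Test for \( \beta_1 \):
    To test \( H_0: \beta_1 = 0 \): \( t = \dfrac{b_1 - 0}{\text{s.e.}(b_1)} \), df = n - 2
    or \( F = \dfrac{MSREG}{MSE} \), df = 1, n - 2

Standard Error of the Sample Intercept:
    \( \text{s.e.}(b_0) = s \sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{XX}}} \)

Confidence Interval for \( \beta_0 \):
    \( b_0 \pm t^* \, \text{s.e.}(b_0) \), df = n - 2

t Test for \( \beta_0 \):
    To test \( H_0: \beta_0 = 0 \): \( t = \dfrac{b_0 - 0}{\text{s.e.}(b_0)} \), df = n - 2

Confidence Interval for the Mean Response:
    \( \hat{y} \pm t^* \, \text{s.e.}(\text{fit}) \), df = n - 2, where \( \text{s.e.}(\text{fit}) = s \sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{S_{XX}}} \)

Prediction Interval for an Individual Response:
    \( \hat{y} \pm t^* \, \text{s.e.}(\text{pred}) \), df = n - 2, where \( \text{s.e.}(\text{pred}) = \sqrt{s^2 + [\text{s.e.}(\text{fit})]^2} \)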

Additional Notes

A place to jot down questions you may have and ask during office hours, take a few extra notes, write out an extra problem or summary completed in lecture, or create your own summary about these concepts.