GPHY 247/POLS 385
Introduction to Statistics BL W18
Practice Final Exam
__________________________________________________________________
IMPORTANT: READ THE FOLLOWING BEFORE YOU BEGIN.
• This exam is 3 hrs in duration. The exam is 17 pages long.
• Write your student number AND name on the cover sheet for this exam, but not on the remaining pages. The QR code connects each page for you.
• You are only permitted to have pens and/or pencils while writing the exam. Pencil cases are not permitted.
• Answer all questions using an ink pen, and answer all questions directly on this sheet.
• For multiple-choice questions, write the most appropriate answer in the space provided. Only the answer in this space will be graded.
• For short answer questions, answer in the space provided.
• University approved calculators are allowed.
• Read each question carefully before you answer.
• Take care to allocate time among all the questions.
MULTIPLE CHOICE – Write a single answer in the space provided using UPPERCASE letters. Illegible answers will be marked incorrect. You DO NOT need to justify your answer.
1) You want to plant a butterfly garden on the balcony of your apartment, but are not sure what species of milkweed is best suited for pots. You grow 10 plants of the ‘Common Milkweed’ and 10 plants of a related species called the ‘Butterfly Weed’ and measure their height after 2 months. Which of the following statements is about the descriptive statistics for your data? [1 mark]
I. A boxplot showing that the interquartile range for the common milkweed is greater than for the butterfly weed.
II. The difference in median height between the two species is 3.2 cm.
III. The height difference between the two species is unexpected from sampling error.
A. I only.
B. II only.
C. I and II.
D. I and III.
E. II and III.
ANSWER: _____C_____
STUDY A
Use the following study to answer questions 2 & 3. Airbnb is an accommodation-sharing platform that relies heavily on guest ratings. The figure below shows the number of accommodations in each rank category where 5.0 is excellent and 1.0 is poor.
2) If you took repeated samples of potential accommodations for STUDY A, which of the following statements about the distribution of the mean ratings would be INCORRECT? [1 mark]
A. Since the data are not Normally distributed, the distribution of mean ratings cannot be Normally distributed.
B. The mean of the distribution is the same as the mean of the population distribution.
C. The variance of the distribution depends on sample size.
D. Sampling error causes the variation in the distribution.
E. The standard deviation of the distribution can be estimated from a single sample.
ANSWER: ____A_____
3) You collect a sample of 32 accommodations for STUDY A and find the following information about the ratings: Mean = 4.2, Median = 4.5, SD = 0.2, IQR = 1.5. Based on this information, what is the estimated standard deviation of the distribution of mean ratings? [1 mark]
A. 0 ≤ ANSWER < 0.05
B. 0.05 ≤ ANSWER < 0.1
C. 0.1 ≤ ANSWER < 0.2
D. 0.2 ≤ ANSWER < 0.5
E. 0.5 ≤ ANSWER < 1.0
ANSWER: _____A_____
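The arithmetic behind question 3 can be checked with a minimal Python sketch. It assumes the usual estimate of the standard deviation of the sampling distribution of the mean (the standard error), SD divided by the square root of n; the function name `standard_error` is introduced here for illustration only.

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Estimate the SD of the distribution of sample means from one sample."""
    return sample_sd / math.sqrt(n)


# Values given in question 3: SD = 0.2, n = 32 accommodations.
se = standard_error(0.2, 32)
print(round(se, 4))  # 0.0354, which falls in range A (0 <= ANSWER < 0.05)
```

The mean, median and IQR given in the question are not needed for this calculation.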
4) In which of the following ways are case-control and cohort studies similar? [1 mark]
A. Both are retrospective studies
B. Both are used in experimental studies
C. Both start with groups of subjects that differ in some factor suspected of influencing the outcome
D. Both are used in observational studies
E. Both are forms of inferential statistics
ANSWER: _____D_____
STUDY B
Use the following study to answer questions 5 & 6. You are studying whether physical fitness and the propensity for exercise to improve fitness differ between men and women. You sample 200 men and 200 women between 30–40 years old chosen randomly from the Kingston community and determine how many minutes of exercise each person engages in per day.
5) What analysis would you use to test the null hypothesis that minutes of exercise do not differ between men and women in STUDY B? [1 mark]
A. Chi-square test
B. Paired-sample t-test
C. 2-sample t-test
D. Single-factor analysis of variance (ANOVA)
E. Regression
ANSWER: _____C_____
6) What is the statistical population for STUDY B? [1 mark]
A. All residents of Kingston
B. All residents of Kingston between the ages of 30-40
C. All residents of Kingston between the ages of 30-40 who exercise
D. All the residents of Canada
E. None of the above
ANSWER: _____B_____
7) Sara is an anthropologist who has been studying stress hormones (ug/ml) in people over the winter months. She collects blood samples from 10 random people in a city, 10 random people from the suburbs, and 10 random people from rural areas. What statistical test should be used to evaluate whether there are differences in stress hormones among people living in different locations? [1 mark]
A. Chi-square test
B. Paired-sample t-test
C. 2-sample t-test
D. Single-factor analysis of variance (ANOVA)
E. Regression
ANSWER: _____D_____
8) Below is a regression of mushroom density in relation to the fertility (nutrient content) of the soil among 41 woodlots in eastern Ontario. The dashed line is the fit regression line. Do you think that these data meet the assumptions of linear regression? [1 mark]
A. Yes. The residual variation looks reasonable
B. No. The residual variation is clearly not normally distributed
C. No. The intercept is not zero
D. No. The slope is not zero
E. No. The residual variation is clearly not homoscedastic.
ANSWER: _____A_____
9) You have been investigating the human health effects of a factory that has been releasing contaminated water into a river from which all local communities draw their drinking water. The toxin in question is known to cause liver cancer in lab rats. To determine whether pollution in the river is affecting human health, you randomly sample 200 people living in the community upstream from the factory (that don’t drink the polluted water) and 200 individuals that live downstream (hence they do drink the contaminated water). For each person sampled, you perform a liver enzyme test to look for the telltale signs of liver cancer. What type of statistical test would be best suited to analyze these data? [1 mark]
A. Chi-square test
B. Paired-sample t-test
C. 2-sample t-test
D. Single-factor analysis of variance (ANOVA)
E. Regression
ANSWER: _____C_____
[Figure for question 8: scatterplot of mushrooms per hectare (y-axis, 200 to 1400) against soil fertility (x-axis, 5 to 25); image not reproduced.]
10) Consider a dataset that contains grades for students who have been involved in a study looking at the effect of peer-to-peer teaching. One group of students was asked to spend 1 hr each week studying for the course, and a second group was asked to spend 1 hr each week teaching their peers the material that they understood well from the course. The difference is calculated as mean grade for peer-to-peer minus mean grade for studying. Which of the following test distributions is most appropriate to evaluate the hypothesis that peer-to-peer teaching is a better way to study for examinations? [1 mark]
ANSWER: _____A_____
11) Select which of the following null and alternative hypotheses are most appropriate for a two-sample t-test that answers the following question: “Are the means of my samples different?”. [1 mark]
A. H0: μ ≤ 0   HA: μ > 0
B. H0: μA = μB   HA: μA ≠ μB
C. H0: μA ≥ μB   HA: μA < μB
D. H0: μ > 0   HA: μ ≤ 0
E. H0: μA ≠ μB   HA: μA = μB
ANSWER: ____B_______
[Figure for question 10: four panels labelled A, B, C and D, each plotting probability density (y-axis, 0.0 to 0.4) against T-score (x-axis, -4 to 4); image not reproduced.]
12) You conducted a study of the practical skills among surgeons in cardiac versus neurological disciplines. You selected 20 hospitals at random and selected one cardiac and one neurological doctor within each hospital. Each doctor completed a standard test to evaluate her/his surgical skills. Mean performance on the test was compared between disciplines using a 2-sample t-test. Based on what you’ve learned about statistical sleuthing, is there a problem with this analysis? If so, what is that problem? [1 mark]
A. There is no problem. A 2-sample t-test is designed for situations where you have one categorical predictor with 2 levels and one numerical response variable.
B. There is a problem. The units of observation within each group are not statistically independent of one another.
C. There is a problem. These data can only be analyzed with a 1-factor ANOVA.
D. There is a problem. These data can only be analyzed with a Chi-square test.
E. There is a problem. There is no replication in the study.
ANSWER: _____B_____
13) Which of the following statements about statistical errors is TRUE? [1 mark]
A. Type I error, the probability of rejecting a null hypothesis when it is false, is set by the researcher.
B. Type II error, the probability of rejecting a null hypothesis when it is false, is not set by the researcher but depends largely on sample size.
C. Type I error, the probability of rejecting a null hypothesis when it is true, is set by the researcher.
D. Type II error, the probability of rejecting a null hypothesis when it is true, is not set by the researcher but depends largely on sample size.
ANSWER: ____C_______
14) Which of the following statements best describes the difference between observational and experimental studies? [1 mark]
A. Experimental studies suffer from problems with confounding variables, whereas observational studies can be designed to minimize these
B. Observational studies create problems with interpreting the correct statistical population, while experimental studies do not
C. Observational studies are prospective whereas experimental studies are retrospective
D. Observational studies are correlative, whereas experimental studies are causal
ANSWER: _____D_____
15) Much statistical theory is based on the distinction between the sample, the sampling distribution and the population. Which of the following is unaffected by the sample size (n)? [1 mark]
A. The mean of the sample
B. The standard deviation of the sampling distribution
C. The population mean
D. The standard deviation of the sample
ANSWER: ____C_______
16) Which of the following statements about a p-value is correct? [1 mark]
A. It is the probability of observing a Type I error under the null hypothesis.
B. It is under the control of the researcher.
C. It can be used to decide whether to reject a null hypothesis or not without reference to a threshold.
D. It is the probability of observing your test score or something more extreme under the null hypothesis.
E. It is used for t-tests and regression, but not for Chi-square tests or ANOVA.
ANSWER: _____D_____
17) The data below give the wait times in minutes at Starbucks for 6 people ordering coffee. Calculate the lower bound of the 95% confidence interval for this data set. [1 mark]
{5.7, 4.2, 5.0, 3.7, 3.4, 2.8}
A. 1 ≤ ANSWER < 1.5
B. 1.5 ≤ ANSWER < 2.0
C. 2.0 ≤ ANSWER < 2.5
D. 2.5 ≤ ANSWER < 3.0
E. 3.0 ≤ ANSWER < 3.5
ANSWER: _____E_____
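The calculation for question 17 can be sketched in Python as follows. This is one way to do it, assuming a t-based interval for the mean with n - 1 = 5 degrees of freedom; the critical value 2.571 is taken from a standard t table rather than computed, and the variable names are illustrative.

```python
import math
import statistics

# Wait times in minutes from question 17.
wait_times = [5.7, 4.2, 5.0, 3.7, 3.4, 2.8]

n = len(wait_times)
mean = statistics.mean(wait_times)
sd = statistics.stdev(wait_times)   # sample SD (divides by n - 1)
se = sd / math.sqrt(n)              # standard error of the mean

# Assumption: two-tailed 5% critical t-score for 5 degrees of freedom,
# read from a standard t table.
T_CRIT_DF5 = 2.571

lower = mean - T_CRIT_DF5 * se
upper = mean + T_CRIT_DF5 * se
print(round(mean, 2), round(lower, 2), round(upper, 2))
```

The lower bound comes out at roughly 3.01, inside range E. The rougher rule of mean minus 2 standard errors gives about 3.26, which lands in the same range.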
18) Which of the following statements is correct about 95% confidence intervals? [1 mark]
A. The interval includes the true population value 5% of the time.
B. The width of the interval is different for different samples because of sampling error.
C. The interval includes the sample mean 95% of the time.
D. Confidence intervals are an alternative way to draw conclusions in Chi-square tests.
E. The interval is fixed at +/- 2.
ANSWER: _____B_____
19) Which of the following questions should be analyzed using a 1-tailed t-test? [1 mark]
A. Is the fluorine concentration in Kingston’s water above the recommended guide of 0.7 mg/L?
B. Does the mean patient wait time differ between an After-Hours health clinic compared to a regular doctor’s office?
C. Has the first calendar date that stores put out their Christmas ornaments for sale changed over the past 20 years?
D. Is the amount of sea ice in our Arctic waters different now from what it was 10 years ago?
ANSWER: _____A_____
20) The following R output is for a linear regression of percentage voter turnout at 20 electoral districts as a function of mean age. Which of the following values is used to conduct the statistical test that evaluates the hypothesis that mean age of the people in an electoral district can be used to predict voter turnout? [1 mark]
A. The estimate value of 0.41983
B. The t-value value of 1.171
C. The multiple R-squared value of 0.0483
D. The t-value value of 0.530
E. The estimate value of 0.0251
ANSWER: _____B_____
21) In 2003 a group of 300 North Americans with Type 1 diabetes between the ages of 21 and 30 were selected by a medical research group. In 2014 a different group of researchers used both medical histories and questionnaires to evaluate whether differences in eating habits over the past year correlated with changes in their health. This study design is best characterized as: [1 mark]
A. Cluster
B. Blocked
C. Experimental
D. Cohort
E. Case-control
ANSWER: _____D_____
SHORT ANSWER - Write all your answers in the space provided.
1. The following questions look at different aspects of hypothesis testing.
i. The F-test is used for some of the statistical tests covered in the course. Explain in words what an observed F-ratio represents using a single-factor ANOVA as an example. [1 mark]
The F-ratio is the ratio of the variation explained by the groups over the unexplained variation.
ii. Define a p-value using terminology that is general to all tests. [1 mark]
A p-value is the probability of observing your data (or test score) or something more extreme under the null distribution.
iii. A null distribution describes the possible range of outcomes from repeated sampling. Explain in words where this variation comes from. [1 mark]
The variation in a null distribution is caused by sampling error. It is the variation in test scores that would occur if we resampled a population where the null hypothesis is true.
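The F-ratio described in part i can be illustrated with a small Python sketch of a single-factor ANOVA computed by hand. The three groups and their values are hypothetical numbers chosen so the arithmetic is easy to follow; they are not data from the exam.

```python
from statistics import mean

# Hypothetical measurements for three groups (illustration only).
groups = {
    "city": [4, 5, 6],
    "suburb": [6, 7, 8],
    "rural": [8, 9, 10],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = mean(all_values)
k = len(groups)            # number of groups
n_total = len(all_values)  # total number of observations

# Variation explained by the groups (between-group sum of squares).
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
# Unexplained variation (within-group, or residual, sum of squares).
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n_total - k)
f_ratio = ms_between / ms_within
print(f_ratio)  # 12.0: group variation is 12 times the residual variation
```

Here the mean squares are 12 between groups and 1 within groups, so the observed F-ratio of 12 says the variation explained by the groups is twelve times the unexplained variation.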
2. The fourth criterion for an Ideal Sampling Process is that “Each possible sample has equal chance of being selected”. Use an example to explain what this criterion means, and how it is distinct from the first criterion that states “Units have known & non-zero probability of being included in your sample”. [2 marks]
The fourth criterion means that every combination of sampling units that could make up a sample can be achieved by the sampling process. For example, if you wanted a sample of 10 addresses from Kingston, then your sampling unit is an address and any combination of houses can be selected. However, if you decide to first flip a coin and go to just the east or west side, then you will never have a sample composed of houses from both east and west sides. The fourth criterion is distinct from the first because the fourth criterion is about the composition of the sample whereas the first criterion is about the sampling unit.
3. Draw a boxplot directly on the following figure. The boxplot should be drawn to scale using the data points provided. Make sure to include the median, interquartile range, any whiskers and any extreme values. [2 marks]
4. For each of the following studies, identify and rationalize the most appropriate statistical test (e.g., t-test, regression etc.). Include the null and alternative hypotheses (be mindful of direction in the test), as well as the appropriate test statistic (e.g., F-score for an F-test).
STUDY 1
A geneticist was interested in the degree to which behavior is determined by gender. She conducted a survey of 40 random students. The scientist categorized each student as being male or female, and their study behavior as diligent, procrastination or reactive. She wants to know whether study style depends on gender.
i. Identify and rationalize the most appropriate statistical test [1 mark]
This is a Chi-square analysis because both factors are categorical.
ii. State the null and alternative hypothesis [1 mark]
H0: there is no difference between expected and observed frequencies. HA: there is a difference between expected and observed frequencies. (students could also use the independent/non-independent terminology)
iii. Identify the appropriate test statistic [1 mark]
This is a Chi-square test and the test statistic is the Chi-square score.
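As an illustration of the Chi-square score named in Study 1, the sketch below computes it for a 2 x 3 table of gender by study style. The counts are hypothetical (chosen to total 40 students); the exam does not supply the survey results.

```python
# Hypothetical counts: rows are male/female, columns are
# diligent / procrastination / reactive (illustration only).
observed = [
    [8, 7, 5],
    [12, 5, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count if study style is independent of gender.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), degrees_of_freedom)  # 1.633 2
```

The score sums the squared differences between observed and expected frequencies, each scaled by the expected frequency, which is the comparison the null and alternative hypotheses in part ii describe.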
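[Figure for short answer question 3: data points plotted against a Y axis from 0 to 20 (x-axis 0.6 to 1.4); the Y axis appears twice in the extracted text. Image not reproduced.]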
STUDY 2
You are the lead scientist at a startup company proposing to sequester atmospheric carbon dioxide using a genetically modified strain of pine trees. To evaluate the effectiveness of nutrient addition, you run a series of experiments and monitor growth rates of replicate trees under five levels of nutrient addition (0, 1.0, 2.0, 3.0, 4.0 and 5.0 ug N/gm Soil). You are interested in evaluating whether nutrient addition can be used to predict tree growth.
i. Identify and rationalize the most appropriate statistical test [1 mark]
This is a linear regression because the independent factor is numerical, and the question is looking to predict tree growth.
ii. State the null and alternative hypothesis [1 mark]
The null hypothesis is H0: b = 0, HA: b ≠ 0, where b is the slope.
iii. Identify the appropriate test statistic [1 mark]
The test is a t-test, and the test statistic is a t-score.
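To show how the slope b and its t-score in Study 2 relate, here is a minimal least-squares sketch in Python. The growth values are hypothetical and one tree per nutrient level is assumed for brevity; the study itself describes replicate trees.

```python
import math

# Hypothetical data (illustration only): nutrient addition in ug N/gm soil
# and tree growth in arbitrary units.
nutrient = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
growth = [1.0, 2.1, 2.9, 4.2, 4.8, 6.0]

n = len(nutrient)
mean_x = sum(nutrient) / n
mean_y = sum(growth) / n

s_xx = sum((x - mean_x) ** 2 for x in nutrient)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(nutrient, growth))

slope = s_xy / s_xx                     # estimate of b
intercept = mean_y - slope * mean_x

# Residual variation around the fitted line, on n - 2 degrees of freedom.
residual_ss = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(nutrient, growth)
)
se_slope = math.sqrt(residual_ss / (n - 2) / s_xx)

# t-score for H0: b = 0 is the slope divided by its standard error.
t_score = slope / se_slope
print(round(slope, 3), round(t_score, 1))
```

The t-score is then compared against a t-distribution with n - 2 degrees of freedom to test H0: b = 0.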
STUDY 3
Researchers interested in the effect of taking Echinacea on the likelihood of contracting the common cold designed a study that followed 30 subjects during a four month time period. A sample of blood was taken from each subject, which was used to quantify the baseline viral load. Each subject then took Echinacea for four months, after which time a second blood sample was taken to quantify the after-treatment viral load. They are interested in evaluating whether Echinacea had an influence on viral load.
i. Identify and rationalize the most appropriate statistical test [1 mark]
This is a paired-sample t-test because we are interested in whether the viral load changes over the 4 months and the measurements are taken from the same sampling unit.
ii. State the null and alternative hypothesis [1 mark]
The null hypothesis is H0: d = 0, HA: d ≠ 0, where d is the difference between initial and final viral load.
iii. Identify the appropriate test statistic [1 mark]
The test is a t-test, and the test statistic is a t-score.
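The paired-sample t-score in Study 3 is computed on the per-subject differences d, as the sketch below shows. The viral load values are hypothetical and only five subjects are used to keep the arithmetic visible; the study describes 30 subjects.

```python
import math
import statistics

# Hypothetical viral loads for five subjects (illustration only).
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]

# One difference per subject: initial minus final viral load.
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_d = statistics.mean(differences)
sd_d = statistics.stdev(differences)   # sample SD of the differences

# t-score for H0: d = 0, on n - 1 degrees of freedom.
t_score = mean_d / (sd_d / math.sqrt(n))
df = n - 1
print(round(t_score, 2), df)  # 3.14 4
```

Because each subject contributes a single difference, the test reduces to a one-sample t-test of the mean difference against zero.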
5. As the Chief Supply Officer for Starbucks, Jean is in charge of purchasing raw coffee beans from a number of large and small producers. To monitor bean quality, Jean designs an ongoing monitoring program that is run at the peak of the growing season. The monitoring program is to randomly select a bag of raw beans from 20 randomly selected producers within each of the 5 main producing countries (Brazil, Vietnam, Colombia, Indonesia & Ethiopia). Each selected bag is roasted and ground into coffee. The coffee is then brewed from each bag and a panel of experts ranks the flavor on a scale of 1 to 5 where 1 is Poor Taste and 5 is Excellent. Identify the following elements in this study. Be as specific as possible.
i. Sample [20 bags of coffee within each country (or all sampled coffee bags)] [0.5 marks]
ii. Sampling unit [coffee producer] [0.5 marks]
iii. Population [all coffee producers] [0.5 marks]
iv. Observation unit [coffee bag or coffee from coffee bag] [0.5 marks]
v. Data type (e.g., categorical) [categorical] [0.5 marks]
vi. Study design [stratified] [0.5 marks]
6. Define a Sampling Unit and an Observation Unit. Identify the key criterion that is different between them. [The sampling unit is the element that is selected at random. The observation unit is the element that the question is being asked about. The sampling unit is always selected at random, whereas the observation unit does not need to be selected at random.] [2 marks]
7. The figure below shows the probability distribution for the number of Facebook Friends for Canadian Facebook users. The typical user who is active in Facebook has at least 20 friends, but users who have blogs have at least 40 friends. You are an active user, but really don’t have the time to maintain blogs.
i. On the figure above, shade in the probability that you will have at least the same number of followers as someone who is active and maintains blogs. [shaded area will be from 40 to the right] [1 mark]
ii. On the figure above, shade in the probability that you will have fewer followers than a typical Facebook user who is active. Make sure to distinguish this area from the one drawn in part i. [shaded area will be from 20 to the left] [1 mark]
[Figure for question 7: probability density (y-axis, 0.00 to 0.08) against number of Facebook friends (x-axis, 0 to 100); image not reproduced.]
8. As a political strategist, you are interested in the best way to get the message out to a large number of people about the policy platform for your political party. You decide to try four methods: i) public town hall, ii) TV ads, iii) newspaper ads and iv) door-to-door canvassing. Since it’s not an election year, your party agrees to let you try out these different methods for a series of small announcements. You decide to evaluate the effectiveness of each method by monitoring the number of re-tweets each announcement receives.
A. Indicate the most appropriate statistical test (be as specific as possible) [1 mark] [single-factor ANOVA]
B. Explain why you selected this test [1 mark] [There are four categories being compared, which can only be done using ANOVA. There is just one factor.]
C. Indicate the null and alternative hypothesis (be mindful of direction in the test) [1 mark] [The null hypothesis is H0: μ1 = μ2 = μ3 = μ4; the alternative is HA: at least one μi differs from the others, where μi is the mean for the ith method of messaging. *grade part c based on the answer in part a, even if that was incorrect]
D. Name the appropriate test statistic (e.g., Chi-square score) (You do not need to find its value.) [1 mark] [F-score. *grade part d based on the answer in part a, even if that was incorrect]
E. The following R output shows the results of the statistical test. Based on this output, what is the observed test score for this test? [1 mark] [Fobs = 8.401]
F. Explain in words what the VALUE of the observed test score means for this test in terms of a ratio of variances. [1 mark] [The group variation is 8.4 times that of the residual variation]
G. What is the critical test score for this test assuming an alpha value of 0.05? [1 mark] [Fcrit = 2.92 or 2.84]
H. Based on the observed and critical test scores, what is your statistical conclusion? [1 mark] [Since Fobs > Fcrit, we reject the null hypothesis]
9. The figure below shows a boxplot of what your data might look like for the above question. Assume for this question that the overall ANOVA test was significant. Use the output below from a TukeyHSD test to identify what levels are different. Indicate this directly on the boxplot using the lettering scheme shown in lecture and used in tutorial. [2 marks]
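[Figure for question 9: boxplot of number of retweets (y-axis, 0 to 300) for the four methods Door, Newspaper, TownHall and TV; image not reproduced. Lettering shown on the figure, in the same left-to-right order: a b a a]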
FORMULAE