LEARNING FROM DATA: AN INTRODUCTION TO STATISTICAL REASONING
THIRD EDITION
ER9405.indb 1 7/5/07 11:05:50 AM
ARTHUR M. GLENBERG
MATTHEW E. ANDRZEJEWSKI
Lawrence Erlbaum Associates
New York London
Lawrence Erlbaum Associates, Taylor & Francis Group, 270 Madison Avenue, New York, NY 10016
Lawrence Erlbaum Associates, Taylor & Francis Group, 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
© 2008 by Taylor & Francis Group, LLC. Lawrence Erlbaum Associates is an imprint of Taylor & Francis Group, an Informa business.
Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number-13: 978-0-8058-4921-9 (Hardcover)
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Glenberg, Arthur M.
Learning from data : an introduction to statistical reasoning / Arthur M. Glenberg and Matthew E. Andrzejewski. -- 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-8058-4921-9 (alk. paper)
1. Statistics. I. Andrzejewski, Matthew E. II. Title.
HA29.G57 2008
001.4'22--dc22
2007022035
Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com
Contents

Preface

Chapter 1. Why Statistics?
    Variability
    Populations and Samples
    Descriptive and Inferential Statistical Procedures
    Measurement
    Using Computers to Learn From Data
    Summary
    Exercises

Part I. Descriptive Statistics

Chapter 2. Frequency Distributions and Percentiles
    Frequency Distributions
    Grouped Frequency Distributions
    Graphing Frequency Distributions
    Characteristics of Distributions
    Percentiles
    Computations Using Excel
    Summary
    Exercises

Chapter 3. Central Tendency and Variability
    Sigma Notation
    Measures of Central Tendency
    Measures of Variability
    Summary
    Exercises

Chapter 4. z Scores and Normal Distributions
    Standard Scores (z Scores)
    Characteristics of z Scores
    Normal Distributions
    Using the Standard Normal Distribution
    Other Standardized Scores
    Summary
    Exercises

Part II. Introduction to Inferential Statistics

Chapter 5. Overview of Inferential Statistics
    Why Inferential Procedures Are Needed
    Varieties of Inferential Procedures
    Random Sampling
    Biased Sampling
    Overgeneralizing
    Summary
    Exercises

Chapter 6. Probability
    Probabilities of Events
    Probability and Relative Frequency
    Discrete Probability Distributions
    The Or-rule for Mutually Exclusive Events
    Conditional Probabilities
    Probability and Continuous Variables
    Summary
    Exercises

Chapter 7. Sampling Distributions
    Constructing a Sampling Distribution
    Two Sampling Distributions
    Sampling Distributions Used in Statistical Inference
    Sampling Distribution of the Sample Mean
    Review of Symbols and Concepts
    z Scores and the Sampling Distribution of the Sample Mean
    A Preview of Inferential Statistics
    Summary
    Exercises

Chapter 8. Logic of Hypothesis Testing
    Step 1: Check the Assumptions of the Statistical Procedure
    Step 2: Generate the Null and Alternative Hypotheses
    Step 3: Sampling Distribution of the Test Statistic
    Step 4: Set the Significance Level and Formulate the Decision Rule
    Step 5: Randomly Sample From the Population and Compute the Test Statistic
    Step 6: Apply the Decision Rule and Draw Conclusions
    When H0 Is Not Rejected
    Brief Review
    Errors in Hypothesis Testing: Type I Errors
    Type II Errors
    Outcomes of a Statistical Test
    Directional Alternative Hypotheses
    A Second Example
    A Third Example
    Summary
    Exercises

Chapter 9. Power
    Calculating Power Using z Scores
    Factors Affecting Power
    Effect Size
    Computing Procedures for Power and Sample Size Determination
    When to Use Power Analyses
    Summary
    Exercises

Chapter 10. Logic of Parameter Estimation
    Point Estimation
    Interval Estimation
    Constructing Confidence Limits for μ When σ Is Known
    Why the Formula Works
    Factors That Affect the Width of the Confidence Interval
    Comparison of Interval Estimation and Hypothesis Testing
    Summary
    Exercises

Part III. Applications of Inferential Statistics

Chapter 11. Inferences About Population Proportions Using the z Statistic
    The Binomial Experiment
    Testing Hypotheses About π
    Testing a Directional Alternative Hypothesis About π
    Power and Sample Size Analyses
    Estimating π
    Related Statistical Procedures
    Summary
    Exercises

Chapter 12. Inferences About μ When σ Is Unknown: The Single-sample t Test
    Why s Cannot Be Used to Compute z
    The t Statistic
    Using t to Test Hypotheses About μ
    Example Using a Directional Alternative
    Power and Sample Size Analyses
    Estimating μ When σ Is Not Known
    Summary
    Exercises

Chapter 13. Comparing Two Populations: Independent Samples
    Comparing Naturally Occurring and Hypothetical Populations
    Independent and Dependent Sampling From Populations
    Sampling Distribution of the Difference Between Sample Means (Independent Samples)
    The t Distribution for Independent Samples
    Hypothesis Testing
    A Second Example of Hypothesis Testing
    Power and Sample Size Analyses
    Estimating the Difference Between Two Population Means
    The Rank-sum Test for Independent Samples
    Summary
    Exercises

Chapter 14. Random Sampling, Random Assignment, and Causality
    Random Sampling
    Experiments in the Behavioral Sciences
    Random Assignment Can (Sometimes) Be Used Instead of Random Sampling
    Interpreting the Results Based on Random Assignment
    Review
    A Second Example
    Summary
    Exercises

Chapter 15. Comparing Two Populations: Dependent Samples
    Dependent Sampling
    Sampling Distributions of the Dependent-sample t Statistic
    Hypothesis Testing Using the Dependent-sample t Statistic
    A Second Example
    Power and Sample Size Analyses
    Estimating the Difference Between Two Population Means
    The Wilcoxon Tm Test
    Hypothesis Testing Using the Wilcoxon Tm Statistic
    Summary
    Exercises

Chapter 16. Comparing Two Population Variances: The F Statistic
    The F Statistic
    Testing Hypotheses About Population Variances
    A Second Example
    Estimating the Ratio of Two Population Variances
    Summary
    Exercises

Chapter 17. Comparing Multiple Population Means: One-factor ANOVA
    Factors and Treatments
    How the Independent-sample One-factor ANOVA Works
    Testing Hypotheses Using the Independent-sample ANOVA
    Comparisons Between Selected Population Means: The Protected t Test
    A Second Example of the Independent-sample One-factor ANOVA
    One-factor ANOVA for Dependent Samples
    A Second Dependent-sample One-factor ANOVA
    Kruskal-Wallis H Test: Nonparametric Analogue for the Independent-sample One-factor ANOVA
    Friedman Fr Test: Nonparametric Analogue for the Dependent-sample One-factor ANOVA
    Summary
    Exercises

Chapter 18. Introduction to Factorial Designs
    The Two-factor Factorial Experiment: Combining Two Experiments Into One
    Learning From a Factorial Experiment
    A Second Example of a Factorial Experiment
    Graphing the Results of a Factorial Experiment
    Design of Factorial Experiments
    Three-factor Factorial Experiment
    Summary
    Exercises

Chapter 19. Computational Methods for the Factorial ANOVA
    Two-factor Factorial ANOVA
    Comparing Pairs of Means: The Protected t Test
    A Second Example of the Factorial ANOVA
    Summary
    Exercises

Chapter 20. Describing Linear Relationships: Regression
    Dependent Samples
    Mathematics of Straight Lines
    Describing Linear Relationships: The Least-squares Regression Line
    Precautions in Regression (and Correlation) Analysis
    Inferences About the Slope of the Regression Line
    Using the Regression Line for Prediction
    Multiple Regression
    Summary
    Exercises

Chapter 21. Measuring the Strength of Linear Relationships: Correlation
    Correlation: Describing the Strength of a Linear Relationship
    Factors That Affect the Size of r
    Testing Hypotheses About ρ
    Correlation Does Not Prove Causation
    The Spearman Rank-order Correlation
    Other Correlation Coefficients
    Power and Sample Size Analyses
    Summary
    Exercises

Chapter 22. Inferences From Nominal Data: The χ² Statistic
    Nominal, Categorical, Enumerative Data
    χ² Goodness-of-fit Test
    A Second Example of the χ² Goodness-of-fit Test
    Comparison of Multiple Population Distributions
    Second Example of Using χ² to Compare Multiple Distributions
    An Alternative Conceptualization: Analysis of Contingency
    Summary
    Exercises

Glossary of Symbols
Tables
Appendix A. Variables From the Stop Smoking Study
Appendix B. Variables From the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work
Answers to Selected Exercises
Index
Preface
Statistics is a difficult subject. There is a lot to learn, and much of it involves new thinking. As the title implies, Learning From Data: An Introduction to Statistical Reasoning teaches you a new way of thinking about and learning about the world. Our goal is to put readers in a good position to understand psychological data and their limitations. Another, more important goal is to help readers evaluate data that affect all aspects of life (psychological, social, educational, political, and economic), and so to better prepare them to question and to challenge. Yet another goal is to help readers retain the material. Psychologists have developed (from data) techniques that facilitate learning and comprehension, and we have incorporated three of these techniques into the book.
First, we have devoted extra attention to explaining difficult-to-understand concepts in detail. For example, some textbooks attempt to combine important concepts such as sampling distributions, hypothesis testing, power, and parameter estimation in one chapter. In this book, each concept has its own chapter. Yes, this means more reading, but it also means greater understanding.
Second, the book uses repetition extensively to help students learn and retain concepts. There are multiple fully explained examples of each major procedure. Many concepts (for example, power and Type I errors) are repeated from chapter to chapter. The problem sets at the ends of most chapters require students to apply principles introduced in earlier chapters.
The third major learning aid is the use of a consistent schema (the six-step procedure) for describing all statistical tests, from the simplest to the most complex. The schema provides a valuable heuristic for learning from data. Students learn (1) to consider the assumptions of a statistical test, (2) to generate null and alternative hypotheses, (3) to choose an appropriate sampling distribution, (4) to set a significance criterion and generate a decision rule, (5) to compute the statistic of interest, and (6) to draw conclusions. Learning the schema at an early stage (in Chapter 8) will ease the way through Chapters 11 through 22, in which the schema is applied to many different situations. This schema also provides a convenient summary for each hypothesis-testing procedure. A table with a summary schema is included in the last section of each chapter containing a hypothesis-testing procedure. Inside the front cover of the book is a Statistical Selection Guide to further assist students in determining which statistical test is most appropriate for the situation.
ABOUT THE BOOK
Many aspects of Learning From Data differentiate it from other statistics textbooks. In addition to the three teaching/learning methods mentioned earlier, the content and organization of the book may be quite different from what students are used to. First, nonparametric statistical tests are integrated into the chapters in which analogous parametric tests are described. With this organization, students can better appreciate the situations in which particular tests apply. In fact, throughout the book there is an emphasis on practicing how to choose the best statistical procedure. The choice of the procedure is discussed in examples, and students are required to make the correct choice as they solve the problems at the end of the chapter. The endpapers of the book provide guidelines for choosing procedures.
Second, the initial parts of the chapters on regression (Chapter 20) and correlation (Chapter 21) are self-contained sections that include discussions of regression and correlation as descriptive procedures. Instructors may present these topics along with other descriptive statistics or delay their introduction until later in the course.
Third, the book contains two independent treatments of power. The major treatment begins in Chapter 9 with graphical illustrations of how power changes under the influence of such factors as the significance level and sample size. The chapter also introduces formulas for computing power and estimating the sample size needed to obtain a particular level of power. These formulas are repeated and generalized for many of the statistical procedures discussed in later chapters. Often, however, there may not be enough time for an extensive treatment of power. In that case, instructors can choose to treat power less extensively and omit Chapter 9 (and the relevant formulas in the other chapters). This less extensive treatment of power is part of each new inferential procedure. It consists of a nonmathematical discussion of how power can be enhanced for that particular procedure.
Fourth, factorial designs, interactions, and the ANOVA are explained in greater detail than in most introductory textbooks. Our goal is to give students enough information so that they will be able to understand the statistics used in many professional journal articles. Of course, it would be foolish for the authors of any introductory textbook to try to cover the statistical analyses of complex situations. Instead, Chapter 18 discusses how two-factor and three-factor factorial experiments are designed, and how to interpret main effects and two-factor and three-factor interactions. Chapter 19 presents a description of computational procedures for the relatively simple two-factor, independent-sample ANOVA.
Last, but most important to us, is Chapter 14, "Random Sampling, Random Assignment, and Causality." A major reason for writing the first two editions of this book was to address the issues discussed in this chapter. All of us who teach statistics courses and conduct research have been struck by the incongruity between what we practice and what we preach. When we teach a statistics course, we emphasize random sampling from populations. But in most experiments we do no such thing. Instead, we use some form of random assignment to conditions. How can we perform statistical analyses of our experiments when we have ignored the most important assumption of the statistical tests? In Chapter 14, we develop a rationale for this behavior, but the rationale exacts a severe price by placing restrictions on the interpretation of the results when random assignment is used instead of random sampling.
NEW TO THE THIRD EDITION
In addition to the features already described, there are a number of new features. First, the third edition of Learning From Data is designed to be used seamlessly with Excel. Unlike other texts that concentrate on statistical software, we choose to focus on Excel, a spreadsheet program. Recent versions of statistical programs produce output that is far more complicated than needed at the undergraduate level. The output from Excel is straightforward; however, the statistical tools available are not complete. Thus, we have written an Add-in (LFD3 Data Analysis Add-in) for Excel so that all the analyses presented in the book can be conducted in Excel. Excel is widely available and can also be used as a database, data manager, and graphics program; experience with these functions may provide a valuable set of skills for undergraduates in a number of professions, including psychology. Thus, files containing all the data used in the book are provided on a companion CD in Excel format. However, because other programs are still widely used, text-based files are also available for use in other statistical programs, like SPSS, SAS, and Systat.
Second, the book attempts to capture the student's interest by focusing on what can be learned from a statistical analysis, not just on how it is done. This is most apparent in the treatment of hypothesis testing. In the six-step schema, the last step in hypothesis testing is described as deciding whether to reject the null hypothesis and then concluding what that decision implies about the world and what the implications for future action might be. Another way that the book attempts to capture the student's interest is by continually referring back to two real data sets. These data sets are intrinsically interesting and save time because new experimental scenarios do not need to be continually introduced. The first data set, on the effectiveness of Zyban and nicotine-replacement gum on smoking, comes from Dr. Timothy Baker. Data from 608 participants are included on the companion CD. The second data set, on the effects of having a child on marriage, comes from Dr. Janet Hyde and Dr. Marilyn Essex. The data from 244 families are also included on the companion CD. Data from these studies are used throughout the book in illustrating important concepts. The fact that these are real data sets brings home to students that statistics plays an important role in learning from data.
Finally, we have provided instructors with substantial resources. To begin with, we have added approximately 20 new problems to the end-of-chapter exercises and provided many more on the companion CD. Included on the instructor CD are sample test questions, exercises, and sample data sets. We have also generated PowerPoint lectures for each chapter for instructors to use or edit, as they choose. There are a number of very useful graphics and illustrations that mirror the ones in the book. There are also fun, interactive exercises/demonstrations and tools that we have found useful (for example, data generation algorithms and Gaussian random number generators). As additional items become available, our Web site (www.LFD3.net) will provide users of the textbook access to them.
MANY THANKS
Many people have contributed to this book. We thank our students and colleagues at the University of Wisconsin-Madison and those instructors who used the first two editions and provided valuable comments. We also thank Laura D. Goodwin (University of Colorado, Denver), Richard E. Zinbarg (Northwestern University), Daniel S. Levine (University of Texas, Arlington), and Randall De Pry (University of Colorado, Colorado Springs) for their valuable reviews of many of the chapters and of the proposal for a third edition of the book. AMG thanks his instructors at the University of Michigan and Miami University. MEA thanks his instructors at Temple University, especially Ralph Rosnow, Alan Sockloff, and Phil Bersh. Thanks are due to the editorial and production staffs at Lawrence Erlbaum Associates, who tolerated delay after delay. Finally, thanks to Mina and Anna for their love and support.
Arthur M. Glenberg
Matthew E. Andrzejewski
Chapter 1
Why Statistics?

Variability
    Sources of Variability
    Variables and Constants
Populations and Samples
    Statistical Populations
    The Problem of Large Populations
    Samples
Descriptive and Inferential Statistical Procedures
    Descriptive Statistical Procedures
    Inferential Statistical Procedures
Measurement
    Considering Measurement in a Social and Political Context
    Differences Among Measurement Rules
    Properties of Numbers Used as Measurements
    Types of Measurement Scales
    Importance of Scale Types
Using Computers to Learn From Data
    What Statistical Analysis Programs Can Do for You
    What the Programs Cannot Do for You
Summary
Exercises
    Terms
    Questions

There are many ways to learn about the world and the people who populate it. Learning can result from critical thinking, asking an authority, or even from a religious experience. However, collecting data (that is, measuring observations) is the surest way to learn about how the world really is.
Unfortunately, data in the behavioral sciences are messy. Initial examination of data reveals no clear facts about the world. Instead, the data appear to be nothing but an incoherent jumble of numbers. To learn about the world from data, you must first learn how to make sense out of data, and that is what this textbook will teach you. Statistical procedures are tools for learning about the world by learning from data.
To help you understand the power and usefulness of statistical procedures, we will explore two real (and important!) data sets throughout the course of the book. One of the data sets is courtesy of Professor Timothy Baker at the University of Wisconsin Center for Tobacco Research and Intervention (which we will call the Smoking Study). The data were collected to investigate several questions about smoking, addiction, withdrawal, and how best to quit smoking. The data set consists of a sample of 608 people who wanted to quit smoking. These people were randomly assigned (see Chapter 14 for the benefits of random assignment) to three groups. The participants in one group were given the drug bupropion SR (Zyban) along with nicotine replacement gum. In a second group, the participants were given the bupropion along with a placebo gum that did not contain any active ingredients. The final group received both a placebo drug and a placebo gum. The major question of interest is whether people are more successful in quitting smoking when the active gum is added to the bupropion. These data are exciting for a couple of reasons. First, given the tremendous social cost of cigarette smoking, we as a society need to figure out how to help people overcome this addiction, and these data help us do just that. Second, the study included measurements of about 30 other variables to help answer ancillary questions. For example, there are data on how long people have smoked and how much they smoked; data on health factors and drug use; and demographic data such as gender, ethnicity, age, education, and height. These variables are described more fully within the Excel and SPSS data files on the CD that comes with this book and in Appendix A. The statistical tools you will learn about will give you the opportunity to explore these data to the fullest extent possible. You can ask important questions (some that may never have been asked before), such as whether drug use affects people's ability to quit smoking, and you can get the answers. In addition, these data will be used to illustrate various statistical procedures, and they will be used in the end-of-chapter exercises.
The second data set is courtesy of Professors Janet Hyde and Marilyn Essex of the University of Wisconsin-Madison. The data set is a subset of the data from the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work (we will refer to it as the Maternity Study). This project was designed to answer questions about how having a baby affects family dynamics such as marital satisfaction, and how various factors affect child development. The data set consists of measurements of 26 variables for 244 families. Some of these variables are demographic, such as age, education, and family income. Marital satisfaction was measured separately for mothers and fathers both before the child was born (during the 5th month of pregnancy) and at three times after the birth (1, 4, and 12 months postpartum). There are also data on how much the mother worked outside the house and how equally household tasks were divided between the mothers and fathers. Finally, there are eight measures of the quality of mother-child interactions at 12 months after birth, and three measures of child temperament (for example, hyperactivity) measured when the child was 4.5 years old. These variables are described more fully on the CD that comes with this book and in Appendix B. As with the smoking data, you are free to use these data to answer important questions, such as whether the amount of time that a mother works affects child development.
This chapter introduces a number of topics that are basic to statistical analyses. We begin with a discussion of variability, the cause of messy data, and move on to the distinctions between population and sample, descriptive and inferential statistics, and types of measurement found in the behavioral sciences.
VARIABILITY
The first step in learning how to learn from data is to understand why data are messy. A concrete example is useful. Consider the CESD (Center for Epidemiologic Studies Depression) scores from the Smoking Study (see Appendix A). Each participant rated 20 items such as "I felt lonely" using a rating of 0 (rarely or none of the time during the past week) to 3 (most of the time during the past week). The score is the sum of the ratings for the participant. For the 601 participants for whom we have CESD scores, the scores range from 0 to 23. About a quarter of the scores are below 2, but another quarter are above 9. These data are messy in the sense that the scores are very different from one another.
Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.
Chapter 3 presents statistical procedures for precisely measuring the variability in a set of scores. For now, only an intuitive understanding of variability is needed. When the scores differ from one another by quite a lot (such as the depression scores), variability is high. When the scores have similar values, variability is low. When all the scores are the same, there is no variability.
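The intuition about high, low, and no variability can be sketched in a few lines of Python. This is only an illustration: the three sets of scores are invented for the example (they are not taken from the Smoking Study), and the helper `score_range` is a hypothetical name that uses the distance between the largest and smallest score as a crude stand-in for the more precise measures of variability.

```python
def score_range(scores):
    """Return the difference between the largest and smallest score."""
    return max(scores) - min(scores)

# Hypothetical scores, invented for illustration only.
high_variability = [0, 2, 9, 15, 23]   # scores differ by quite a lot
low_variability = [7, 8, 8, 9, 8]      # scores have similar values
no_variability = [8, 8, 8, 8, 8]       # all scores are the same

for label, scores in [
    ("high", high_variability),
    ("low", low_variability),
    ("none", no_variability),
]:
    print(label, score_range(scores))
```

The range is used here only because it is easy to read off; it ignores everything except the two most extreme scores.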
Sources of Variability
It is easy enough to see that the CESD data are variable, but why are they variable? In general, variability arises from several sources. One source of variability is individual differences: Some smokers are more depressed than others; some have difficulty reading and understanding the items on the test; some smokers' answers on the inventory are more honest than the answers of other smokers. There are as many potential sources of variability due to individual differences as there are reasons for why one person differs from another in intelligence, personality, performance, and physical characteristics.
Another source of variability is the procedure used in collecting the data. Perhaps some of the smokers were more rushed than others; perhaps some were tested at the end of the day and were more tired than others. Any change in the procedures used for collecting the data can introduce variability. Finally, some variability may be due to conditions imposed on the participants, such as whether they are taking the placebo gum.
Variables and Constants
Variability does not occur only in textbook examples; it is characteristic of all data in the behavioral sciences. Whenever a behavioral scientist collects data, whether on the incidence of depression, the effectiveness of a psychotherapeutic technique, or the reaction time to respond to a stimulus, the data will be variable; that is, not all the scores collected will be the same. In fact, because data are variable, collecting data is sometimes referred to as measuring a variable (or a random variable).
A variable is a measurement that changes from one observation to the next.
CESD is a variable because it changes from one smoker (observation) to the next. Effectiveness of a psychotherapeutic technique is another example of a variable, because a given technique will be more effective for some people than for others.
Variables should be distinguished from constants.
Constants are measurements that stay the same from one observation to the next.
The boiling point of pure water at sea level is an example of a constant. It is always 100 degrees Centigrade. Whether you use a little water or a lot of water, whether the water is encouraged to boil faster or not, no matter who is making the observation (as long as the observer is careful!), the water always boils at the same temperature. Another constant is Newton's gravitational constant, which relates the gravitational force between two objects to their masses and the distance between them (whether the objects are large or small, solid or liquid, and so on).
Many of the observations made in the physical sciences are observations of constants. Because of this, it is easy for the beginning student in the physical sciences to learn from data. A single careful observation of a constant tells the whole story.
You may be surprised to learn that there is not one constant in all of the behavioral sciences. There is no such thing as the effectiveness of a psychotherapeutic technique, or the depression score, because measurements of these variables change from person to person. In fact, because what is known in the behavioral sciences is always based on measuring variables, even the beginning student must have some familiarity with statistical procedures to appreciate the body of knowledge that comprises the behavioral sciences and the limitations inherent in that body of knowledge. In case you were wondering, this is why you are taking an introductory statistics course, and your friends majoring in the physical sciences are not.
The concept of variability is absolutely basic to statistical reasoning, and it will motivate all discussions of learning from data. In fact, the remainder of this chapter introduces concepts that have been developed to help cope with variability.
POPULATIONS AND SAMPLES
The psychologists studying addiction might be interested in the CESD scores of the specific smokers from whom they collected data. However, it is likely that they are interested in more than just those individuals. For example, they may be interested in the incidence of depression among all smokers in Wisconsin, or all smokers in the United States, or even all smokers in the world. Because depression is a variable that changes from person to person, the specific observations cannot reveal everything the researchers might want to know about all of these depression scores.
Statistical Populations
A statistical population is a collection or set of measurements of a variable that share some common characteristic.
One example of a population is the set of CESD scores of all smokers in Wisconsin. These scores are measurements of a variable (CESD), and they have the common characteristic of being from a particular group of people: smokers in Wisconsin. A different statistical population consists of the CESD scores for smokers in the United States. And a very different population consists of the marital satisfaction scores for new mothers who work full-time outside of the home. The point is that you should not think of statistical populations as groups of people, such as the people in the United States. There is only one population of people for the United States, but there are an infinite number of statistical populations depending on what variables are measured (for example, CESD or marital satisfaction), and how those scores might be grouped (for example, smokers or working mothers).
Thinking of statistical populations as sets of measurements may appear cold and unfeeling. Nonetheless, thinking this way has a tremendous advantage in that it facilitates the application of the same statistical procedure to a variety of populations. Instead of having to learn one technique for analyzing and learning from depression scores, and another technique for analyzing IQ scores, and yet another for analyzing errors rats make in learning mazes, many of the same procedures can be applied in all of these cases. In every case we are dealing (statistically) with the same stuff, a set of measurements.
Unfortunately, thinking of statistical populations as sets of numbers can cause some people to become bored and lose interest in the enterprise. The way to counter this boredom is to remember that the statistical procedures are operating on numbers that have meaning: The numbers are scores that represent something interesting about the world (for example, the incidence of depression in smokers). As you read through this book, think about applying your new knowledge to problems that are of interest to you, and not just as manipulation of numbers.
The Problem of Large Populations
Some statistical populations consist of a manageable number of scores. Usually, however, statistical populations are very large. For example, there are potentially millions of CESD scores of smokers. When dealing with large populations, it is difficult and time consuming to actually collect all of the scores in the population. Sometimes, for ethical reasons, all the scores in the population cannot be obtained. For example, suppose that a medical researcher believes that she has discovered a drug that safely and effectively reduces high blood pressure. One way to determine the drug's effectiveness is to administer it to all people suffering from high blood pressure and then to measure their blood pressures. (The population of interest consists of the blood pressure scores of people suffering from high blood pressure who have taken the new drug.) Clearly, this would be time consuming and expensive. It would also be very unethical. After all, what if the medical researcher were wrong, and the drug did more harm than good? Also, even with a great national effort, not all the scores could be collected, because some of the people would die before they took the drug, others would have their blood pressures lowered by other drugs, and others would develop high blood pressure over the course of data collection.
We appear to have run across a problem. Usually, we are not interested in just a few scores, but in all the scores in a population. Yet, because behavioral scientists are interested in learning about variables (not constants), it is impossible to know for sure about all the scores in a population from measuring just a few of them. On the other hand, it is time consuming and expensive to collect all the scores in a population, and it may be unethical or impossible. What to do?
Samples
The solution to this problem is provided by statistical procedures based on sampling from populations.
A sample is a subset of measurements from a population.
That is, a sample contains some, but usually not all, of the scores in the population. The 601 CESD scores in the Smoking Study are a sample from the population of CESD scores of all smokers.
An important type of sample is a random sample.
A random sample is selected so that every score in the population has an equal chance of being included.
Whether a sample is random or not does not depend on the actual scores included in the sample, but on how the scores in the sample are selected. Only if the scores are selected in such a way that each score in the population has an equal chance of being included in the sample is the sample a random sample. The CESD scores are not a random sample of CESD scores of all smokers. These scores are only from people living in Madison and Milwaukee, Wisconsin, and there was no attempt to ensure that CESD scores of people living elsewhere were included. Procedures for producing random samples are discussed in Chapter 5.
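The idea that randomness is a property of the selection method, not of the scores selected, can be illustrated with a short Python sketch. It assumes a made-up population of scores held in a list; the function name `draw_random_sample` and the population itself are hypothetical and are not the book's Excel-based procedure. The standard library call `random.Random.sample` chooses positions in the list without replacement, so every score in the list has the same chance of being included.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Select n scores so that each score in the population is equally likely to be chosen."""
    rng = random.Random(seed)          # seed is optional; it only makes the draw repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: 1,200 invented scores between 0 and 23.
population = list(range(24)) * 50

sample = draw_random_sample(population, n=10, seed=1)
print(sample)
```

Taking the first 10 scores in the list instead would also produce a sample, but not a random one, because scores later in the list would have no chance of being included.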
As you will see in Chapters 5–22, random samples are used to help solve the problem of large populations. That is, with the data in a random sample, we can learn about the population from which the sample was obtained by using inferential statistical procedures.
DESCRIPTIVE AND INFERENTIAL STATISTICAL PROCEDURES
Descriptive Statistical Procedures
Because of variability, in order to learn anything from data, the data must be organized.
Descriptive statistical procedures are used to organize and summarize the measurements in samples and populations.
In other words, descriptive statistical procedures do what the name implies: they describe the data. These procedures can be applied to samples and to populations. Most often, they are applied to samples, because it is rare to have all the scores in a population.
Descriptive statistical procedures include ways of ordering and grouping data into distributions (discussed in Chapter 2) and ways of calculating single numbers that summarize the whole set of scores in the sample or population (discussed in Chapters 2 and 3). Some descriptive statistical procedures are used to represent data graphically, because as everyone knows, a picture is worth a thousand words.
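As a small preview of "single numbers that summarize the whole set of scores," the sketch below uses Python's standard statistics module on a made-up sample. The scores are invented for illustration, and the particular summaries chosen here (mean, median, and range) are an assumption about what Chapters 2 and 3 will cover in detail.

```python
import statistics

# Made-up sample of eight scores, used only for illustration.
scores = [3, 5, 5, 7, 8, 8, 8, 10]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle of the ordered scores
score_range = max(scores) - min(scores)   # spread from lowest to highest

print("mean:", mean_score)
print("median:", median_score)
print("range:", score_range)
```

Three numbers stand in for eight scores here; with hundreds of scores the savings in effort are much greater.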
Inferential Statistical Procedures
The most powerful tools available to the statistician are inferential statistical procedures.
Inferential statistical procedures are used to make educated guesses (inferences) about populations based on random samples from the populations.
These educated guesses are the best way to learn about a population short of collecting all of the scores in the population.
All of this may sound a bit like magic. How can you possibly learn about a whole population that may contain millions and millions (or, theoretically, an infinity) of scores by examining a small number of scores contained in a random sample from that population? It is not magic, however, and it is even understandable. Part II of this book presents a detailed description of how inferential statistical procedures work.
Inferential statistical procedures are so pervasive in our society that you have undoubtedly read about them and made decisions based on them. For example, think about the last time you heard the results of an opinion poll, such as the percentages of the registered voters who favor Candidates A, B, or C. Supposedly, your opinion is included in those percentages (assuming that you are a registered voter so that your opinion is included in the population). But on what grounds does the pollster presume to know your opinion? It is a safe bet that only rarely, if ever, has a pollster actually contacted you and asked you your opinion. Instead, the percentages reported in the poll are educated guesses based on inferential statistical procedures applied to a random sample.
In recent years, it has become fashionable for the broadcast and print media to acknowledge that conclusions from opinion polls are educated guesses (rather than certainties). This acknowledgment is in the form of a "margin of error." The margin of error is how much the reported percentages may differ from the actual percentages in the population (see Chapter 11 for details).
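To make the margin of error a little more concrete, here is a rough Python sketch. It assumes a simple random sample and uses a common normal-approximation formula for a proportion, with z = 1.96 for roughly 95% confidence. Both the formula choice and the poll numbers are assumptions made for illustration; the method developed in Chapter 11 may differ.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample of size n and the normal approximation;
    z = 1.96 corresponds to roughly 95% confidence.
    """
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: 52% of 1,000 sampled voters favor Candidate A.
moe = margin_of_error(0.52, 1000)
print(f"52% plus or minus {100 * moe:.1f} percentage points")
```

Under these assumptions the margin is about 3 percentage points, which is why a 52% to 48% poll result is often described as too close to call.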
Another example of the impact of inferential statistical procedures on our daily lives is in our choices of foods and medicines. Many new food additives and medicines are tested for safety and approved by government agencies such as the Food and Drug Administration (FDA). But how does the FDA know that the new product is safe for you? In fact, the FDA does not know for sure. The decision that a new drug is safe is based on inferential statistical procedures. The FDA example raises several sobering issues about the data used by government agencies to set standards on which our lives literally depend. It is only recently that government agencies have insisted that data be collected from women, and without such data, it is uncertain if a particular drug is actually safe or effective for women. The terrible birth defects attributed to the drug Thalidomide occurred because no one had bothered to collect the data that would verify the safety of the drug with pregnant women. Similarly, very little data on safe levels of environmental pollutants such as PCBs and pesticides have been collected from children. Consequently, our society may be setting the scene for a disaster by allowing into the environment chemicals that are relatively safe for adults but disastrous for children whose immune systems are immature and whose rapidly developing brains are sensitive to disruption by chemicals.¹
¹ For an excellent discussion of these issues, see C. F. Moore (2003), Silent scourge. New York: Oxford University Press.
The final example of the use of inferential procedures is the behavioral sciences themselves. Most knowledge in the behavioral sciences is derived from data. The data are analyzed using inferential statistical procedures, because interest is not confined to just the sample of scores, but extends to the whole population of scores from which the sample was selected. If you are to understand the data of the behavioral sciences, then you need to understand how statistical procedures work.
MEASUREMENT
Data are collected by measuring a variable. But what does it mean to measure a variable?
Measurement is the use of a rule to assign a number to a specific observation of a variable.
As an example, think about measuring the length of your desk. The rule for measuring length is, "Assign a number equal to the number of lengths of a standard ruler that fit exactly from one end of the desk to the other." In this example, the variable being measured is length. The observation is the length of a specific desk, your desk. The rule is to assign a value (for example, 4 feet) equal to the number of lengths of a standard ruler that fit from one end of the desk to the other.
As another example, consider measuring the weight of a newborn baby. The variable being measured is weight. The specific observation is the weight of the specific baby. The measurement rule is something like, "Put the baby on one side of a balance scale, and assign to that baby a weight equal to the number of pound weights placed on the other side of the scale to get the scale to balance."
Measuring variables in the behavioral sciences also requires that we use a rule to assign numbers to observations of a variable. For example, one way to measure depression is to assign a score equal to the sum of the ratings of the CES-D questions. The variable is depression, the specific observation is the depression of the person being assessed, and the rule is to assign a value equal to the sum of the ratings. Similarly, measuring intelligence means assigning a number based on the number of questions answered correctly on an intelligence test.
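A measurement rule can be thought of as a function that takes an observation and returns a number. The Python sketch below expresses the "sum of the ratings" rule in that form. The function name, the allowed rating range (0 to 3), and the ratings themselves are assumptions made for illustration; they are not taken from the Smoking Study.

```python
def sum_of_ratings(ratings, lowest=0, highest=3):
    """Measurement rule: assign a score equal to the sum of the item ratings.

    The allowed rating range is a hypothetical default, used here only to
    show that a rule can also state which observations it accepts.
    """
    for rating in ratings:
        if not lowest <= rating <= highest:
            raise ValueError(f"rating {rating} is outside {lowest}..{highest}")
    return sum(ratings)


# Made-up ratings from one person on six questions.
answers = [1, 0, 2, 3, 0, 1]
score = sum_of_ratings(answers)
print("assigned score:", score)
```

The variable (depression), the observation (one person's answers), and the rule (the function) are three separate things; only the last one is code.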
Considering Measurement in a Social and Political Context
The choice of what variables to measure in a study is no accident; usually those choices entail a lot of discussion and planning, and are often influenced by social or political motives of the researcher. The measurement rules, as well, usually involve much discussion, but the details are rarely stated in a study's results. At the very least, there's usually some ambiguity. Take, for example, the LONG variable in the Smoking Study, which measures the longest time without smoking. Let's say that a study participant answers "8 months," which would result in a score of 7 (6–12 months). But, if we probe further, we may find that the participant actually answered: "Well, I didn't smoke for 4 months, but then one night I had one cigarette, and then didn't have another for 4 months. I say 8 months because it was just a minor slip-up." Is the longest time without smoking for this individual 8 months or 4 months? Is "smoking" defined as one cigarette or one drag or buying a pack? If the researcher is interested in the effectiveness of a particular antismoking program, she may give this participant a break and count it as 8 months, because clearly, to her, this participant didn't relapse (it was only one cigarette, after all). A different researcher, interested in showing that all addicts wind up using again (relapsing), might say that one cigarette constitutes a relapse, and score this as 4 months. Political motives may enter a study in this way because for some people the only solution for drug addiction may be abstinence (for example, Alcoholics Anonymous), but for others, recreational drug use may be seen as OK in certain situations (for example, harm reduction approaches). In addition, a researcher's grant funding may be dependent on having and solving a social problem, and maybe even a growing problem, even though the problem is not as big as one might think. Therefore, we should remain critical of how psychologists measure and contemplate what might have been included and what might have been left out.
Differences Among Measurement Rules
Not all rules for measuring variables are equally good. They differ in three important ways. First, they differ in validity.
Validity refers to how well the measurement rule actually measures the variable under consideration as opposed to some other variable.
Some intelligence tests are better than others because they measure intelligence rather than (accidentally) being influenced by creativity or memory for trivia. Similarly, some measures of depression are better than others because they measure depression rather than introversion or aggressiveness.
Measurement rules also differ in reliability.
Reliability is an index of how consistently the rule assigns the same number to the same observation.
For example, an intelligence test is reliable if it tends to assign the same number to individuals each time they take the test. Books on psychological testing discuss validity and reliability in detail.²
Finally, a third difference among measurement rules is that the properties of the numbers assigned as measurements depend on the rule. At first blush, this statement may sound like nonsense. After all, numbers are numbers; how can their properties differ?
² A classic text is A. Anastasi (1988), Psychological testing (6th ed.). New York: Macmillan.
Properties of Numbers Used as Measurements
When numbers are measurements, they can have four properties. The first of these is the category property.
The category property is that observations assigned the same number are in the same category, and observations assigned different numbers are in different categories.
For example, suppose that you are collecting data on the types of cars that American citizens drive, and you are most interested in the country in which the cars were manufactured. You could measure the country of manufacture (the variable) by using the following rule to assign numbers to observations: If the car was manufactured in the United States, assign it a 1; if manufactured in Japan, assign it a 2; if in Germany, a 3; if in France, a 4; if in Italy, a 5; and if manufactured anywhere else, a 0. These numbers have the category property because each observation assigned the same number (for example, 2) is in the same category (made in Japan).
These country-of-manufacture numbers are different from the numbers that we usually encounter. Typically, assigning a number to an observation (say, Observation A) means more than just assigning Observation A to a specific category. For example, if Observation A is assigned a value of 1 and Observation B is assigned a value of 2, it usually means that Observation A is shorter, lighter, or less valuable than Observation B. This is not the case for the measurements of country of manufacture. A car manufactured in the United States (and assigned a number 1) is not necessarily shorter, lighter, or less valuable than a car manufactured in Japan (assigned the number 2). The point is, how we interpret the measurements depends on the properties of the numbers, which in turn depend on the rule used in assigning the numbers.
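The country-of-manufacture rule can be written out as a short Python sketch. The codes come from the rule described above; the names COUNTRY_CODES and country_code, and the example cars, are hypothetical.

```python
# Codes from the rule in the text; 0 stands for "anywhere else."
COUNTRY_CODES = {
    "United States": 1,
    "Japan": 2,
    "Germany": 3,
    "France": 4,
    "Italy": 5,
}


def country_code(country):
    """Assign the category number for a car's country of manufacture."""
    return COUNTRY_CODES.get(country, 0)


# Made-up observations.
cars = ["Japan", "United States", "Sweden", "Japan"]
codes = [country_code(c) for c in cars]
print(codes)

# Same code means same category; nothing more can be read from the numbers.
same_category = codes[0] == codes[3]
print("first and last car in same category:", same_category)
```

The computer will happily report that 2 is greater than 1, but for these numbers that comparison says nothing about the cars. Only tests of "same code or different code" are meaningful.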
Measurements have the ordinal property when the numbers can be used to order the observations from those that have the least of the variable being measured to those that have the most.
Consider another example. Suppose that a social psychologist investigating cooperation has a preschool teacher rank the four pupils in the class from least cooperative (first) to most cooperative (fourth). These cooperation scores (ranks) have two properties. First, the scores have the category property, because children assigned different scores are in different categories of cooperation. Second, the scores have the ordinal property because the scores can be used to order the observations from those that have the least to those that have the most cooperation. It is only when measurements have the ordinal property that we know that observations with larger measurements have more of whatever is being measured.
A third property that measurements may have is the equal intervals property.
The equal intervals property means that whenever two observations are assigned measurements that differ by exactly one unit, there is always an equal interval (difference) between the observations in the actual variable being measured.
To understand what is meant by equal intervals, consider again measuring the cooperativeness of the four preschool children. The four children (call them Alana, Bob, Carol, and Dan) have cooperation scores of 1, 2, 3, and 4. The difference between Alana's cooperation score (1) and Bob's cooperation score (2) is 1. Likewise, the difference between Carol's cooperation score (3) and Dan's cooperation score (4) is 1. The important question is whether the actual difference in cooperation (not just the score) between Alana and Bob equals the actual difference in cooperation between Carol and Dan.
It is very unlikely that the difference in cooperation between Alana and Bob equals the difference in cooperation between Carol and Dan. The teacher simply ranked the children from least to most cooperative. The teacher did not take any precautions to ensure equal intervals. Alana and Bob may both be very uncooperative, with Bob being just a bit more cooperative than Alana (the actual difference in cooperation between Alana and Bob is "a bit"). Carol may also be on the uncooperative side, but just a bit more cooperative than Bob (the actual difference between Carol and Bob is "a bit"). Suppose, however, that Dan is the teacher's helper and is very cooperative. In this case, the difference in cooperation between Carol and Dan may be very large, much larger than the difference in cooperation between Alana and Bob. Because the differences in scores are equal (the difference in cooperation scores between Alana and Bob equals the difference in cooperation scores between Carol and Dan), but the differences in amount of cooperation (the variable) are not equal, these cooperation scores do not have the equal intervals property.
Now consider using a ruler to measure the lengths of the four lines in Figure 1.1. The lines A, B, C, and D have lengths of 1, 2, 3, and 6 centimeters, respectively. Using a ruler to measure length generates measurements with the equal intervals property: For each pair of observations for which the measurements differ by exactly one unit, the differences in length are exactly equal. That is, the measurements assigned lines A (1) and B (2) differ by one, as do the measurements assigned lines B (2) and C (3); and, important to note, the actual difference in lengths between lines A and B exactly equals the actual difference in length between lines B and C.
FIGURE 1.1 Length measured using two different measurement rules.
Line    Length measured using a ruler (cm)    Length measured using ranks
A       1                                     1
B       2                                     2
C       3                                     3
D       6                                     4
A difficulty in understanding the equal intervals property is in maintaining the distinction between the variable being measured (length or cooperation) and the number assigned as a measurement of the variable. The numbers represent or stand for certain properties of the variable. The numbers are not the variable itself. The number 1 is no more the cooperation of Alana (it is a measure of her cooperation) than is the number 1 the actual length of line A (it is a measure of its length). Whether or not the measurements have properties such as equal intervals depends on how the numbers are assigned to represent the variable being measured. Using a ruler to measure length of a desk assigns numbers that have the equal intervals property; using rankings to measure cooperation of preschool children assigns numbers that do not have the equal intervals property.
The difference between the length and cooperation examples is not in what is being measured, but in the rule used to do the measuring. A ranking rule can be used to measure the lengths of lines (this is what we do when we need a rough measure of length: compare two lengths to see which is longer). In this case, the measured lengths of lines A, B, C, and D would be 1, 2, 3, and 4, respectively (see Figure 1.1). These measurements of length do not have the equal intervals property, because for each pair of observations for which the measurements differ by exactly one unit, the real differences in length are not exactly equal.
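The contrast between the ruler rule and the ranking rule can be checked with a short Python sketch, using the four line lengths given for Figure 1.1 (1, 2, 3, and 6 centimeters). The variable names are hypothetical.

```python
# Ruler measurements of the four lines in Figure 1.1, in centimeters.
lengths_cm = {"A": 1, "B": 2, "C": 3, "D": 6}

# Ranking rule: order the lines from shortest to longest and assign 1, 2, 3, ...
ordered = sorted(lengths_cm, key=lengths_cm.get)
ranks = {line: position + 1 for position, line in enumerate(ordered)}

# For each pair of neighboring lines, compare the difference in rank
# with the real difference in length.
rank_gaps = []
real_gaps = []
for shorter, longer in zip(ordered, ordered[1:]):
    rank_gaps.append(ranks[longer] - ranks[shorter])
    real_gaps.append(lengths_cm[longer] - lengths_cm[shorter])

print("ranks:", ranks)
print("rank differences:", rank_gaps)
print("real differences (cm):", real_gaps)
```

Every pair of neighboring lines differs by exactly one rank, yet the real differences are 1, 1, and 3 centimeters. Equal differences in the ranks do not correspond to equal differences in length, which is exactly what it means for the ranks to lack the equal intervals property.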
The fourth property that measurements may have is the absolute zero property.
The absolute zero property means that a value of zero is assigned as a measurement only when there is nothing at all of the variable that is being measured.
When length is measured using a ruler (rather than ranks), the score of zero is an absolute zero. That is, the value of zero is assigned only when there is no length. When measuring country of car manufacture, zero is not an absolute zero. In that example, zero does not mean that there is no country of manufacture, only that the country is not the United States, Japan, Germany, France, or Italy.
Another example of a measurement scale that does not have an absolute zero is the Fahrenheit (or Centigrade) scale for measuring temperature. A temperature of 0°F does not mean that there is no heat. In fact, there is still some heat at temperatures of −10°F, −20°F, and so on. Because there is still some heat (the variable being measured) when zero is assigned as the measurement, the zero is not an absolute zero.³
Types of Measurement Scales
In addition to the four properties of measurements (category, ordinal, equal intervals, and absolute zero), there are four types of measurement rules (or scales), determined by the properties of the numbers assigned by the measurement rules.
A nominal scale is formed when the numbers assigned by the measurement rule have only the category property.
³ The Kelvin scale of temperature does have an absolute zero. On this scale, 0 means absolutely no heat. Zero degrees Kelvin equals −459.67°F.
Nominal comes from the word name. The numbers assigned using a nominal scale name the category to which the observation belongs but indicate nothing else. Thus, the measurements of country of manufacture of cars form a nominal scale, because the numbers name the category (country), but have no other properties.
Several of the variables in the Smoking Study are measured using nominal scales. For example, TYPCIG (type of cigarette smoked) is measured using a nominal scale defined as 1 = regular filter; 2 = regular no filter; 3 = light; 4 = ultralight; 5 = other. Another nominally measured variable is SPOUSE, that is, whether the smoker's spouse smokes (1) or does not smoke (0). The GENDER variable in the Maternity Study (is the child male or female) is also measured using a nominal scale.
An ordinal scale is formed when the measurement rule assigns numbers that have the category and the ordinal properties, but no other properties.
Many of the variables in the Smoking and Maternity studies are measured using ordinal scales. The longest time without smoking (LONG) variable is measured as 1 = less than a day; 2 = 1–7 days; 3 = 8–14 days; 4 = 15 days to a month; 5 = 1–3 months; 6 = 3–6 months; 7 = 6–12 months; 8 = more than a year. As the assigned score increases from 1 to 8, the length of time without smoking increases, so the numbers have the ordinal property. However, the difference between a measurement of 1 and 2 (LONG 1 versus LONG 2, about 3 days) is not comparable to a difference between a measurement of, say, 5 and 6 (LONG 5 versus LONG 6, about 3 months); thus the measurements do not have the equal intervals property. The researchers might have attempted to measure LONG using a ratio scale by asking participants to estimate the longest number of days without smoking, from 0 to thousands of days. Unfortunately, people's estimates are often clouded by faulty memory processes and faulty estimates. One person who knows that he quit once for more than a year might estimate LONG as 500 days. Another person who had been abstinent for the same amount of time, but who can't remember whether he quit in the year 2001 or 1999, and who can't quite remember how to translate years into days, might estimate LONG as 10,000 days. Thus, these measurements are not as valid or reliable as the simpler ordinal measurements of the LONG scale.
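The LONG coding rule can be sketched as a Python function that turns a number of days into one of the eight ordinal codes. The categories come from the text, but the exact cutoffs are assumptions: the sketch treats a month as 30 days and a year as 365 days, and it assigns a boundary value (such as exactly 3 months) to the lower category. The function name long_code is hypothetical, and the Smoking Study may have resolved these boundaries differently.

```python
def long_code(days):
    """Convert longest time without smoking (in days) to an ordinal LONG code.

    Cutoffs are assumptions for illustration: month = 30 days, year = 365
    days, and a value on a boundary falls into the lower category.
    """
    if days < 0:
        raise ValueError("days cannot be negative")
    if days < 1:
        return 1  # less than a day
    if days <= 7:
        return 2  # 1-7 days
    if days <= 14:
        return 3  # 8-14 days
    if days <= 30:
        return 4  # 15 days to a month
    if days <= 90:
        return 5  # 1-3 months
    if days <= 180:
        return 6  # 3-6 months
    if days <= 365:
        return 7  # 6-12 months
    return 8      # more than a year


# Two very different estimates of "more than a year" receive the same code.
print(long_code(500), long_code(10_000))
# The ambiguous participant: 8 months (about 240 days) or 4 months (about 120)?
print(long_code(240), long_code(120))
```

The sketch shows both sides of the trade-off. The estimates of 500 days and 10,000 days collapse to the same code, so wildly inaccurate memories do little damage. On the other hand, deciding whether one cigarette ends an abstinent period changes the code from 7 to 6, and no amount of programming settles that choice.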
Many behavioral scientists (and businesses that conduct marketing research) collect data by having people rate observations for specific qualities. For example, a clinical psychologist may be asked to rate the severity of his patients' psychopathologies on a scale from 1 (extremely mild) to 10 (extremely severe). As another example, a consumer may be asked to rate the taste of a new ice cream from 1 (awful) to 100 (sublime). In both cases, the measurements represent ordinal properties. For the clinical psychologist, the larger numbers represent more severe psychopathology than the smaller numbers; for the ice-cream raters, the larger numbers represent better-tasting ice cream than the smaller numbers. In neither example, however, do the measurements have the equal intervals property. As a general rule, ratings and rankings form ordinal scales.
The third type of scale is the interval scale.
Interval scales are formed when the numbers assigned as measurements have the category, ordinal, and equal intervals properties, but not an absolute zero.
Two examples of interval scales are the Fahrenheit and Centigrade scales of temperature. Neither has an absolute zero because 0° (F or C) does not mean absolutely no heat. The measurements do have the category property (all observations assigned the same number of degrees have the same amount of heat), the ordinal property (larger numbers indicate more heat), and the equal intervals property (on a particular scale, a difference of 1° always corresponds to a specific amount of heat).
Many psychological variables are measured using scales that are between ordinal and interval scales. This statement holds for many of the variables included in the Maternity Study, such as marital satisfaction (for example, M1MARSAT), mother's positive affect during free play (MPOS), infant dysregulation during free play (IDYS), and child's internalizing behavior during free play (M7INT). Consider M7INT in a little more detail. To measure the variable, a mother was asked to rate her child's behavior in regard to nine questions such as, "Tends to be fearful or afraid of new things or new situations." The rating scale was 0 = does not apply; 1 = sometimes applies; 2 = frequently applies. Thus, the rating of each question forms an ordinal scale without the equal intervals property. But what happens when we sum the ratings from nine questions to get the M7INT score? It is unlikely that the difference in internalizing behavior between M7INT 10 and M7INT 11 is exactly the same as the difference between, say, M7INT 20 and M7INT 21. Nonetheless, it may well be that these two differences in internalizing behavior are fairly comparable, that is, that the scale is close to having the equal intervals property.
The conservative (and always correct) approach to these "in-between" scales is to treat them as ordinal scales. As we will see in Part II, however, ordinal scales are at a disadvantage compared to interval scales when it comes to the range and power of statistical techniques that can be applied to the data. Recognizing this disadvantage, many psychologists treat the data from these in-between scales as interval data; that is, they treat the data as if the measurements were collected using an interval scale. One rule of thumb is that scores from the middle of an in-between scale are more likely to have the equal intervals property than scores from either end. If the data include scores from the ends of an in-between scale, it is best to treat the data conservatively as ordinal.
Many scales for measuring physical qualities (length, weight, time) are ratio scales.
A ratio scale is formed when the numbers assigned by the measurement rule have all four properties: category, ordinal, equal intervals, and absolute zero.
The reason for the name ratio is that statements about ratios of measurements are meaningful only on a ratio scale. It makes sense to say that a line that is 2.5 centimeters long is half (a ratio) the length of a 5-centimeter line. Similarly, it makes sense to say that 20 seconds is twice (a ratio) the duration of 10 seconds.
On the other hand, it does not make sense to say that 68°F is twice as hot as 34°F. This is easily demonstrated by converting to Centigrade measurements. Suppose that the temperature of Object A is 34°F (corresponding to about 1°C) and that the temperature of Object B is 68°F (corresponding to 20°C). Comparing the amount of heat in the objects using the Fahrenheit measurements seems to indicate that Object B is twice as hot as Object A, because 68 is twice 34. Comparing the measurements on the Centigrade scale (which of course does not change the real amount of heat in the objects), it seems that Object B is 20 times as hot as Object A. Object B cannot be 20 times as hot as Object A and at the same time be twice as hot. The problem is that statements about ratios are not meaningful unless the measurements are made using a ratio scale. Neither ratio (2:1 or 20:1) is right, because neither set of measurements was made using a ratio scale.
This problem does not occur when using a ratio scale. A 5-centimeter (2-inch) line is twice as long as a 2.5-centimeter (1-inch) line, and that is true whether the measurements are made in centimeters, inches, or any other ratio measurement of length.
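The temperature argument can be verified numerically. The Python sketch below converts the two Fahrenheit temperatures to Centigrade and to Kelvin using the standard conversion formulas and then compares the ratios. The function and variable names are hypothetical. Note that the exact conversion of 34°F is about 1.1°C, so the unrounded Centigrade ratio comes out near 18:1 rather than the 20:1 obtained by rounding to 1°C; the point of the argument is unchanged.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion from Fahrenheit to Centigrade (Celsius)."""
    return (deg_f - 32.0) * 5.0 / 9.0


def fahrenheit_to_kelvin(deg_f):
    """Kelvin has an absolute zero, so it is a ratio scale of temperature."""
    return fahrenheit_to_celsius(deg_f) + 273.15


object_a_f = 34.0
object_b_f = 68.0

ratio_f = object_b_f / object_a_f
ratio_c = fahrenheit_to_celsius(object_b_f) / fahrenheit_to_celsius(object_a_f)
ratio_k = fahrenheit_to_kelvin(object_b_f) / fahrenheit_to_kelvin(object_a_f)

print(f"Fahrenheit ratio: {ratio_f:.2f}")
print(f"Centigrade ratio: {ratio_c:.2f}")
print(f"Kelvin ratio:     {ratio_k:.2f}")

# Length is measured on a ratio scale, so the ratio survives a change of units.
cm_per_inch = 2.54
print((5.0 / cm_per_inch) / (2.5 / cm_per_inch))
```

The Fahrenheit and Centigrade ratios disagree with each other because neither scale has an absolute zero. On the Kelvin scale Object B turns out to be only about 7% hotter than Object A. The length ratio, by contrast, is 2 in centimeters and remains 2 after converting to inches.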
Several variables in the Smoking Study are measured using ratio scales. One example is the carbon monoxide level at the end of treatment measured in parts per million (CO_EOT), and another is the number of times the participant has tried to quit smoking (QUIT).
Importance of Scale Types
The question that may be uppermost in your mind is, "So what?" There are three reasons why knowing about scale types is important. First, now that you know about scale types you will be less likely to make unsupportable statements about data. One such statement is the use of ratio comparisons when the data are not measured using a ratio scale. For example, consider a teacher who gives a spelling test and observes that Alice spelled 10 words correctly, whereas Bill spelled only 5 words correctly. Certainly, Alice spelled twice as many words correctly as did Bill. Nevertheless, the number of words correct on a spelling test is not a ratio measurement of spelling ability (zero words correct does not necessarily mean zero spelling ability). So, although it is perfectly correct to say that Alice spelled twice as many words correctly as did Bill, it is silly to say that Alice is twice as good a speller as is Bill. Similarly, it is not legitimate to claim that a child with an internalizing score (M7INT) of 20 internalizes twice as much as a child with a score of 10.
Second, the types of descriptive statistical procedures that can be applied to data depend in part on the scale type. Although some types of descriptions can be applied to data regardless of the scale type, others are appropriate only for interval or ratio scales, and still others are appropriate for ordinal, interval, and ratio scales, but not nominal scales.
Third, the types of inferential statistical procedures that can be applied to data depend in part on the measurement scale.
Given these three reasons, it is clear that if you want to learn from data you must be able to determine what sort of scale was used in collecting the data. The only way to know the scale type is to determine the properties of the numbers assigned using that rule. If the only property of the measurements is the category property, then the data are nominal; if the measurements have both the category and ordinal properties, then the data are ordinal; if, in addition, the data have the equal intervals property, then the data are interval. Only if the data have all four properties are they ratio.
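The decision rule in the preceding paragraph can be summarized as a small lookup in Python. The function name scale_type is hypothetical. The rule in the text covers only four combinations of properties; raising an error for any other combination is an assumption of this sketch, not something the text prescribes.

```python
# The four combinations of properties named in the text, in the order
# (category, ordinal, equal intervals, absolute zero).
SCALE_BY_PROPERTIES = {
    (True, False, False, False): "nominal",
    (True, True, False, False): "ordinal",
    (True, True, True, False): "interval",
    (True, True, True, True): "ratio",
}


def scale_type(category, ordinal, equal_intervals, absolute_zero):
    """Return the scale type implied by the properties of the measurements."""
    properties = (category, ordinal, equal_intervals, absolute_zero)
    if properties not in SCALE_BY_PROPERTIES:
        # Assumption: combinations the text does not discuss are rejected.
        raise ValueError(f"no scale type defined for properties {properties}")
    return SCALE_BY_PROPERTIES[properties]


print(scale_type(True, False, False, False))  # country-of-manufacture codes
print(scale_type(True, True, False, False))   # ranks, ratings, LONG codes
print(scale_type(True, True, True, False))    # Fahrenheit temperature
print(scale_type(True, True, True, True))     # length measured with a ruler
```

The hard part is not the lookup but deciding, for a given measurement rule, which properties the numbers really have. That judgment cannot be delegated to a program.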
Now that you understand the importance of scale types, it may be helpful to read this section again. Your ability to distinguish among scale types will be used throughout this textbook and in all of your dealings with behavioral data.
USING COMPUTERS TO LEARN FROM DATA
Data analysis often involves some pretty tedious computations, such as adding columns of numbers. Much of this drudgery can be eliminated by using a computer program such as Excel, and Learning From Data is written to be used with that program. The CD that comes with this book provides the files that your Excel program requires to mesh with the book. First, open up the ReadMe file and follow the instructions for loading the Excel Add-ins. These Add-ins provide computer routines that exactly match those used in the book. Second, if you are not familiar with basic Excel operations (e.g., for entering data in a spreadsheet or for selecting rows and columns), you should run the Excel tutorial. Third, the CD includes numerous data files. Two large data files provide the data from the Maternity and Smoking studies. Other data files provide the data used in all of the major worked-out examples and the end-of-chapter exercises.
What Statistical Analysis Programs Can Do for You
The programs have two main benefits. First, they eliminate the drudgery of doing lots of calculations. Second, they ensure accuracy of calculation. A benefit that flows from these two is that the programs make it easy to explore data by conducting multiple analyses.
What the Programs Cannot Do for You
Almost everything that is important is not done by the programs. The essence of statistical analysis is choice (choosing the right statistical method and interpretation of the outcome of the chosen method). The programs cannot choose the appropriate methods for you. Similarly, the programs do not know whether a data set is a sample, a random sample, or a population. Consequently, the program cannot adequately interpret the output. Learning From Data teaches you how to make good choices and how to interpret the outcome of the statistical methods; the computer eliminates the drudgery.
Because the computer program does the calculations, you might think that you can ignore the formulas in the text. That would be a big mistake for several reasons. First, for small sets of data it is easier to do calculations by hand (or using a calculator) rather than using a computer. But to do the calculations by hand, you need to know the formulas. Second, following the formulas is often the best way to figure out exactly what the statistical technique is doing and how it works. Working through the formulas can be hard intellectual labor, but that is the only way to understand what they do.
SUMMARY
The behavioral sciences are built on a foundation of data. Unfortunately, because behavioral data consist of measurements of variables, individual measurements will differ from one another so that no clear picture is immediately evident. Fortunately, we can learn from variable data by applying statistical procedures.
Descriptive statistical procedures organize, describe, and summarize data. Descriptive statistical procedures can be applied to samples or to populations, but because we rarely have all the scores in a population, descriptive procedures are generally applied to data from samples. We use inferential statistical procedures to make educated guesses (inferences) about a population of scores based on a random sample of scores from the population. Although these inferences are not error-free, appropriate use of inferential statistical procedures can reduce the chance of error to acceptable levels (for example, the margin of error in a poll).
Appropriateness of a statistical procedure depends in part on the type of measurement scale used in collecting the data. The measurement scale is determined by the properties of the numbers (assigned by the measurement). If the measurements have the category, ordinal, equal intervals, and absolute zero properties, then a ratio scale is formed; if the measurements have all but the absolute zero property, an interval scale is formed. If the measurements have only the category and ordinal properties, they form an ordinal scale. Finally, if the measurements have only the category property, they form a nominal scale.
EXERCISES
Terms
Define these new terms.
variable
constant
sample
random sample
population
descriptive statistical procedure
inferential statistical procedure
validity
reliability
measurement
category property
ordinal property
equal intervals property
absolute zero property
nominal scale
ordinal scale
interval scale
ratio scale
Questions
Answer the following questions. (Answers are given in the back of the book for marked questions.)
1. Why would there be no need for descriptive or inferential statistical procedures if behavioral scientists could measure constants instead of variables?
2. List 10 different variables and 1 constant in the behavioral sciences.
3. Classify each of the following as a population, a sample, or both. When the answer is "both," describe the circumstances under which the data should be considered a population and under which they should be considered a sample.
   a. Family incomes of all families in the United States.
   b. Family incomes of all families in Wisconsin.
   c. The number of words recalled from a list of 50 words by 25 first-year college students who volunteer to take part in an experiment.
   d. The number of days spent in intensive care for all people who have undergone heart transplant surgery.
   e. The number of errors made by rats learning a maze.
4. Describe two examples of each of the four types of measurement scales. Indicate why each is an example of its type.
5. If you had a choice between using nominal, ordinal, interval, or ratio scales to measure a variable, what would be the best choice? Why?
6. A set of scores can be one type of scale or another, depending on what the set of scores represents. Consider the number of errors made by rats in learning a maze. If the data represent simply the number of errors, then the scores form a ratio scale. The numbers have all four properties, and it makes perfectly good sense to say that if Rat A made 30 errors and Rat B made 15 errors, then Rat A made twice as many errors as Rat B. Suppose, however, that the scores are used as a measure of rat intelligence. Are these scores a ratio measure of intelligence? Explain your answer. What are some of the implications of your answer?
7. Determine the type of measurement scale used in each of the following situations:
   a. A supervisor ranks his employees from least to most productive.
   b. Students rate their statistics teacher's teaching ability using a scale of 1 (awful) to 10 (magnificent).
   c. A sociologist classifies sexual preference as 0 (heterosexual), 1 (homosexual), 2 (bisexual), 3 (asexual), 4 (other).
   d. A psychologist measures the time to complete a problem-solving task.
PART I
Descriptive Statistics
2/ Frequency Distributions and Percentiles
3/ Central Tendency and Variability
4/ z Scores and Normal Distributions
The three chapters in Part I provide an introduction to descriptive statistical techniques. All of these techniques are designed to help you organize and summarize your data without introducing distortions. As you will see, once the data have been organized, it is far easier to make sense of them; that is, it is far easier to understand what the data are telling you about the world.
Three general types of descriptive techniques are covered. We begin in Chapter 2 with frequency distributions, a technique for arranging the scores in a sample or a population to reveal general trends. We will also learn how to use graphs to illustrate frequency distributions.
A second descriptive technique is computing statistics that summarize frequency distributions with just a few numbers. In Chapter 3, we will learn how to compute several indices of central tendency, the most typical scores in a distribution. We will also learn how to summarize the variability of the scores in a distribution.
Finally, we will consider two methods for describing the relative location of individual scores within a distribution, that is, where a particular score stands relative to the others. Percentiles are introduced in Chapter 2. They are often used when reporting the results of standardized tests such as the Scholastic Aptitude Test (SAT) and American College Test (ACT). The other measure of relative standing is the standard score (or z score) discussed in Chapter 4. Standard scores are generally more useful than percentiles, but they require some background to understand.
All of these descriptive techniques form the underpinning for the remainder of this book, which deals with inferential statistical techniques. Statistical inference begins with a description of the data in a sample, and it is this description that is used to make inferences about a broader population.
Chapter 2
Frequency Distributions and Percentiles
Frequency Distributions
   Relative Frequency
   Cumulative Frequency
Grouped Frequency Distributions
   Constructing Grouped Distributions
Graphing Frequency Distributions
   Histograms
   Frequency Polygons
   When to Use Histograms and Frequency Polygons
Characteristics of Distributions
   Shape
   Central Tendency
   Variability
   Comparing Distributions
Percentiles
   Percentile Ranks and Percentiles
   Three Precautions
Computations Using Excel
   Constructing Frequency Distributions
   Estimating Percentile Ranks With Excel
   Estimating Percentiles
Summary
Exercises
   Terms
   Questions
Collecting data means measuring observations of a variable. And, of course, these measurements will differ from one another. Given this variability, it is often difficult to make any sense of the data until they are analyzed and described. This chapter examines a basic technique for dealing with variability and describing data: forming a frequency distribution. When formed correctly, frequency distributions achieve the goals of all descriptive statistical techniques: They organize and summarize the data without distorting the information the data provide about the world.
This chapter also introduces two related topics, graphical representation of distributions and percentiles. Graphical representations highlight the major features of distributions to facilitate learning from the data. Percentiles are a technique for determining the relative standing of individual measurements within a distribution.
While reading this chapter, keep in mind that the procedures for constructing frequency distributions can be applied to populations and to samples. Because it is so rare to actually have all the scores in a population, however, frequency distributions are usually constructed from samples. Reflecting this fact, most of the examples in the chapter will involve samples.
FREQUENCY DISTRIBUTIONS
Suppose that you are working on a study of social development. Of particular interest is the age at which aggressive tendencies first appear in children. You begin data collection (measuring the aggressiveness variable) by asking the teacher of a preschool class to rate the aggressiveness of the 20 children in the class using the scale:
Meaning                   Score Value
potential for violence    5
very aggressive           4
somewhat aggressive       3
average                   2
timid                     1
very timid                0
The data are in Table 2.1. As is obvious, the data are variable; that is, the measurements differ from one another. It is also obvious that it is difficult to learn anything from these data as they are presented in Table 2.1. So as a first step in learning from the data, they can be organized and summarized by arranging them in the form of a frequency distribution.
A frequency distribution is a tabulation of the number of occurrences of each score value.
The frequency distribution for the aggressiveness data is given in Table 2.2. The second column lists the score values. The third column in Table 2.2 lists the frequency with which each score value appears in the data. Constructing the frequency distribution involves nothing more than counting the number of occurrences of each score value. There is a simple way to check whether the distribution has been properly constructed: The sum of the frequencies in the distribution should equal the number of observations in the sample (or population). As indicated in Table 2.2, the frequencies sum to 20, the number of observations.
Table 2.1
Aggressiveness Ratings for 20 Preschoolers
Child  Rating    Child  Rating    Child  Rating    Child  Rating
a      4         f      0         k      3         p      2
b      3         g      3         l      0         q      3
c      1         h      3         m      4         r      3
d      1         i      4         n      2         s      1
e      2         j      2         o      3         t      3
It is clear that the frequency distribution has a number of advantages over the listing of the data in Table 2.1. The frequency distribution organizes and summarizes the data, thereby highlighting the major characteristics. For example, it is now easy to see that the measurements in the sample range from a low of 0 to a high of 4. Also, most of the measurements are in the middle range of score values, and there are fewer measurements at the ends of the distribution.
Another benefit provided by the frequency distribution is that the data are now easily communicated. To describe the data, you need to report only five pairs of numbers (score values and their frequencies).
Try not to confuse the numbers representing the score values and the numbers representing the frequencies of the particular score values. For example, in Table 2.2 the number 4 appears in the column labeled "score value" and the column labeled "frequency." The meaning of this number is quite different in the two columns, however. The score value of 4 means a particular level of aggressiveness (very aggressive). The frequency of 4 means the number of times a particular score value was observed in the data. In this case, a score value of 2 (average) was observed four times.
To help overcome any confusion, be sure that you understand the distinctions among the following terms. Score value refers to a possible value on the measurement scale. Not all score values will necessarily appear in the data, however. If a particular score value is never assigned as a measurement (for example, the score value 5, potential for violence), then that score value would have a frequency of zero. Frequency refers to the number of times a particular score value occurs in the data. Finally, the terms measurement, observation, and score are used interchangeably to refer to a particular datum, the number assigned to a particular individual. Thus, in Table 2.2, the score value of 1 (timid) occurs with a frequency of 3. Similarly, there are three scores (measurements, observations) with the score value of 1 (timid).
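For readers who prefer to let a computer do the counting, the tabulation can be sketched in a few lines of Python. This is only an illustration, not part of the study: the variable names are our own, and the ratings are typed in from Table 2.1 in order of child (a through t). Note that the score value 5 is listed even though no child received it, so it appears with a frequency of zero.

```python
from collections import Counter

# Ratings copied from Table 2.1, children a through t.
ratings = [4, 3, 1, 1, 2, 0, 3, 3, 4, 2,
           3, 0, 4, 2, 3, 2, 3, 3, 1, 3]

# Every possible value on the 0-5 scale, including values never observed.
score_values = range(6)

counts = Counter(ratings)
# A Counter returns 0 for a key it has never seen, so score value 5 gets frequency 0.
frequency = {value: counts[value] for value in score_values}

for value, freq in frequency.items():
    print(value, freq)

# The check described in the text: frequencies must sum to the number of observations.
total = sum(frequency.values())
print("Total:", total, "Observations:", len(ratings))
```

The final line performs the check described earlier: the frequencies should sum to the number of observations.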
Relative Frequency
An important type of frequency distribution is the relative frequency distribution.
Table 2.2
Frequency Distributions for the Aggressiveness Data in Table 2.1
                          Score                 Relative    Cumulative   Cumulative Relative
Meaning                   Values    Frequency   Frequency   Frequency    Frequency
Very Timid                0         2           .10         2            .10
Timid                     1         3           .15         5            .25
Average                   2         4           .20         9            .45
Aggressive                3         8           .40         17           .85
Very Aggressive           4         3           .15         20           1.00
Potential for Violence    5         0           .00         20           1.00
Total                               20          1.00
The relative frequency of a score value is the proportion of observations in the distribution at that score value. A relative frequency distribution is a listing of the relative frequencies of each score value.
The relative frequency of a score value is obtained by dividing the score value's frequency by the total number of observations (measurements) in the distribution. For example, the relative frequency of aggressive children (score value of 3) is 8/20 = .40.
Relative frequency is closely related to percentage. Multiplying the relative frequency by 100 gives the percentage of observations at that score value. For these data, the percentage of children rated aggressive is .40 × 100 = 40%.
The fourth column in Table 2.2 is the relative frequency distribution for the aggressiveness data. Note that all of the relative frequencies are between 0.0 and 1.0, as they must be. Also, the sum of the relative frequencies in the distribution will always equal 1.0 (within rounding error). Thus, computing the sum is a quick way to ensure that the relative frequency distribution has been properly constructed.
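The arithmetic behind the relative frequency column can be sketched as follows. The frequencies are taken from Table 2.2; the variable names are our own, and the small tolerance used in the check is an assumption that allows for floating-point rounding.

```python
# Frequencies from Table 2.2, keyed by score value.
frequency = {0: 2, 1: 3, 2: 4, 3: 8, 4: 3, 5: 0}

n = sum(frequency.values())  # total number of observations (20)

# Relative frequency = frequency / total number of observations.
relative = {value: freq / n for value, freq in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {value: 100 * rel for value, rel in relative.items()}

for value in frequency:
    print(f"{value}  {frequency[value]:2d}  {relative[value]:.2f}  {percentage[value]:.0f}%")

# Quick check: the relative frequencies should sum to 1.0.
relative_sum = sum(relative.values())
print("Sum of relative frequencies:", round(relative_sum, 6))
```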
Relative frequency distributions are often preferred over raw frequency distributions because the relative frequencies combine information about frequency with information about the number of measurements. This combination makes it easier to interpret the data. For example, suppose that an advertisement for Nationwide Beer informs you that in a scientifically selected sample, 90 people preferred Nationwide, compared to only 10 who preferred Brand X. You may conclude from these data that most people prefer Nationwide. Suppose, however, that the sample actually included 10,000 people, 90 of whom preferred Nationwide, 10 of whom preferred Brand X, and 9,900 of whom could not tell the difference. In this case, the relative frequencies are much more informative (for the consumer). The relative frequency of preference for Nationwide is only 90/10,000 = .009.
The same argument in favor of relative frequency can also be made (in a more modest way) for the data on aggressiveness. It is more informative to know that the relative frequency of very aggressive children is .15 than to simply know that three children were rated as very aggressive.
When describing data from random samples, relative frequency has another advantage. The relative frequency of a score value in a random sample is a good guess for the relative frequency of that score value in the population from which the random sample was selected. There is no corresponding relation between frequencies in a sample and frequencies in a population.
Cumulative Frequency
Another type of distribution is the cumulative frequency distribution.
A cumulative frequency distribution is a tabulation of the frequency of all measurements at or smaller than a given score value.
The fifth column in Table 2.2 is the cumulative frequency distribution for the aggressiveness scores. The cumulative frequency of a score value is the frequency of that score value plus the frequency of all smaller score values. The cumulative frequency of a score value of zero (very timid) is 2. The cumulative frequency of a score value of 1 (timid) is obtained by adding 3 (the frequency of timid) and 2 (the frequency of very timid) to get 5. Note that the cumulative frequency of the largest score value (5) equals 20, the total number of observations. This must be the case, because cumulative frequency is the frequency of all observations at or smaller than a given score value, and all of the observations must be at or smaller than the largest score value. Also, note that the cumulative frequencies can never decrease when going from the lowest to the highest score value. The reason is that the cumulative frequency of the next higher score value is always obtained by adding a frequency, which cannot be negative, to the lower cumulative frequency.
The notion of "at or smaller" implies that the score values can be ordered so that we can determine what is smaller. Thus, cumulative frequency distributions are usually not appropriate for nominal data.
A cumulative relative frequency distribution is a tabulation of the relative frequencies of all measurements at or below a given score value.
The last column in Table 2.2 lists the cumulative relative frequencies for the aggressiveness data. These numbers are obtained by adding up the relative frequencies of all score values at or smaller than a given score value.
Cumulative frequency distributions are most often used when computing percentiles. We shall postpone further discussion of these distributions until that section of the chapter.
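The two cumulative columns of Table 2.2 are running totals, which can be sketched in Python as shown here. The frequencies come from Table 2.2 and must be listed from the lowest to the highest score value; the variable names are illustrative only.

```python
from itertools import accumulate

# Score values and frequencies from Table 2.2, ordered from lowest to highest.
score_values = [0, 1, 2, 3, 4, 5]
frequency = [2, 3, 4, 8, 3, 0]
n = sum(frequency)

# Running total: each entry is the frequency at or smaller than that score value.
cumulative = list(accumulate(frequency))

# Dividing each running total by n gives the cumulative relative frequency.
cumulative_relative = [c / n for c in cumulative]

for value, cum, cum_rel in zip(score_values, cumulative, cumulative_relative):
    print(f"{value}  {cum:2d}  {cum_rel:.2f}")

# Two properties noted in the text: the last entry equals the number of
# observations, and the sequence never decreases.
never_decreases = all(a <= b for a, b in zip(cumulative, cumulative[1:]))
print(cumulative[-1] == n, never_decreases)
```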
GROUPED FREQUENCY DISTRIBUTIONS
The aggressiveness data were particularly amenable to description by frequency distributions in part because there were only a few score values. Sometimes, however, the data are not so accommodating, and a more sophisticated approach is called for.
Consider, for example, the first 60 measurements on the YRSMK variable in the Smoking Study (Table 2.3). Because the measurements are variable, it is difficult to learn anything from the data as presented in this table.
The frequency distribution is presented in Table 2.4. As you can see, the frequency distribution for these data does not provide a very useful summary of the data. The problem is that there are too many different score values.
Table 2.3
YRSMK: Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
 5   13   17   20   19   35   21   28    3   22
26   13   30   30   30   32   40   27   14    4
27   33   28   45   29   25   38   35   33   39
 5    4   20   24   25   27   16   25   38    9
36   20   18   11   12   23   22   27   32   49
22   30    0   32    4   23    9   29   22   23
The solution is to group the data into clusters called class intervals.
A class interval is a range of score values. A grouped frequency distribution is a tabulation of the number of measurements in each class interval.
The grouped frequency distribution is presented in Table 2.5. The class intervals are listed on the left. The lowest interval, 0-4, contains all of the measurements between (and including) 0 and 4. The next interval, 5-9, contains the measurements between 5 and 9, and so on.
Clearly, the data in the grouped distribution are much more easily interpreted than when the data are ungrouped. We can now see that most of the people in this sample have been smoking for 20-30 years, although there are a few who have been smoking for 45 years or more and a few who have been smoking only a couple of years.
Relative and cumulative frequency distributions can also be formed from grouped data. Relative frequencies are formed by dividing the frequency in each class interval by the total number of measurements. Cumulative distributions are formed by adding up the frequencies (or relative frequencies) of all class intervals at or below a given class interval. These distributions are also given in Table 2.5.
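As a rough sketch of how the grouped tabulation might be automated, the following Python code sorts the YRSMK scores into class intervals of size 5 starting at 0 and then forms the relative and cumulative columns. The function name grouped_frequency is our own invention, and the code assumes whole-number scores that are all at or above the starting value.

```python
from itertools import accumulate

# YRSMK scores copied from Table 2.3, row by row.
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]


def grouped_frequency(scores, start, width):
    """Map each class interval (lower, upper) to the number of scores in it.

    Assumes whole-number scores, none of them below `start`.
    """
    table = {}
    lower = start
    # Create every interval up to the one containing the highest score,
    # so that empty intervals still appear with a frequency of 0.
    while lower <= max(scores):
        table[(lower, lower + width - 1)] = 0
        lower += width
    for score in scores:
        lower = start + ((score - start) // width) * width
        table[(lower, lower + width - 1)] += 1
    return table


grouped = grouped_frequency(yrsmk, start=0, width=5)
n = len(yrsmk)
cumulative = list(accumulate(grouped.values()))

for ((lower, upper), freq), cum in zip(grouped.items(), cumulative):
    print(f"{lower:2d}-{upper:2d}  {freq:2d}  {freq / n:.3f}  {cum:2d}  {cum / n:.3f}")
```

Each printed row gives the class interval, its frequency, relative frequency, cumulative frequency, and cumulative relative frequency, in the same column order as Table 2.5.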
Table 2.4
Frequency Distribution of First 60 YRSMK Scores
Score Value  Frequency    Score Value  Frequency    Score Value  Frequency
0            1            17           1            34           0
1            0            18           1            35           2
2            0            19           1            36           1
3            1            20           3            37           0
4            3            21           1            38           2
5            2            22           4            39           1
6            0            23           3            40           1
7            0            24           1            41           0
8            0            25           3            42           0
9            2            26           1            43           0
10           0            27           4            44           0
11           1            28           2            45           1
12           1            29           2            46           0
13           2            30           4            47           0
14           1            31           0            48           0
15           0            32           3            49           1
16           1            33           2
Constructing Grouped Distributions
A grouped frequency distribution should summarize the data without distorting them. Summarization is accomplished by forming class intervals; if the intervals are inappropriate (for example, too big), however, the data are distorted. As an example of distortion, Table 2.6 summarizes the YRSMK data from Table 2.3 using three large intervals. Indeed, the data are summarized, but important information regarding how the measurements are distributed is lost. The following steps should be used to construct grouped distributions that summarize but do not distort.
Guidelines for grouped frequency intervals:
1. There should be between 8 and 15 intervals.
2. Use convenient class interval sizes, like 2, 3, 5, or multiples of 5.
3. Start the first interval at or below your lowest score.
To construct a good grouped frequency distribution:
1. Compute the range of your scores by subtracting the lowest score from the highest score (Range = High Score - Low Score).
2. Divide the range by 8 and 15. Find a convenient number in between those two values. That will be your class interval. This is also known as your "bin width."
3. Select a starting value. The starting value could be your lowest score, but if your class interval is a multiple of 5, then you may want to select a more convenient, and hence lower, starting point. For example, the intervals 0-9, 10-19, 20-29, etc., work very well if you have determined that a class interval of 10 is appropriate. If your lowest score is 3, the intervals 3-12, 13-22, 23-32, etc., do not seem as intuitive as 0-9, 10-19, 20-29, etc. (or 1-10, 11-20, 21-30, etc.).
4. Beginning with your starting value, construct intervals of increasing value.
5. Count the number (frequency) of scores in each interval.
One other step is needed when the measurements contain decimals instead of whole numbers. In these cases, all of the measurements should be rounded so that they have the same number of decimal places.
Table 2.5
Grouped Frequency Distributions for YRSMK Scores in Table 2.3
Class                    Relative    Cumulative   Cumulative Relative
Interval    Frequency    Frequency   Frequency    Frequency
0-4         5            .083        5            .083
5-9         4            .067        9            .150
10-14       5            .083        14           .233
15-19       4            .067        18           .300
20-24       12           .200        30           .500
25-29       12           .200        42           .700
30-34       9            .150        51           .850
35-39       6            .100        57           .950
40-44       1            .017        58           .967
45-49       2            .033        60           1.00
Total       60           1.000
Table 2.6
Grouped Frequency Distributions for YRSMK Scores in Table 2.3
Interval    f (YRSMK)
0-19        18
20-39       39
40-59       3
Total       60
These steps were used to construct the grouped frequency distribution in Table 2.5. For Step 1, the range was computed as 49 (49 - 0). For Step 2, the range was divided by 8 (49/8 = 6.125) and 15 (49/15 = 3.267), and a convenient number in between those two (5) was selected as the class interval. For Step 3, because the lowest score was 0, the starting value was set at 0. For Step 4, starting with 0, consecutive intervals, of width 5, were constructed: 0-4, 5-9, 10-14, etc.
Note that the interval 5-9 includes the five score values 5, 6, 7, 8, and 9. Thus, the interval size really is 5, even though the difference between 9 and 5 is 4.
Once the lowest interval is specified, the remaining class intervals are easily constructed. Each successive interval is formed by adding the interval size (5) to the bounds of the preceding interval. For example, the interval 10-14 was obtained by adding 5 to both the lower and upper bounds of the interval 5-9. Finally, tabulate the number of measurements within each interval to construct the frequency distribution.
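Steps 1 through 4 can be sketched in Python as shown below. This is only one way to express the procedure: the function names are hypothetical, and the tuple of "convenient" candidate sizes is our own assumption based on the guideline of 2, 3, 5, or multiples of 5.

```python
def width_bounds(low, high):
    """Steps 1 and 2: return the range and the widths that give 15 and 8 intervals."""
    score_range = high - low
    return score_range, score_range / 15, score_range / 8


def convenient_widths(low, high, candidates=(2, 3, 5, 10, 15, 20, 25)):
    """Step 2: keep the candidate sizes that fall between range/15 and range/8.

    The candidate list is an assumption, not a rule from the text.
    """
    _, smallest, largest = width_bounds(low, high)
    return [w for w in candidates if smallest <= w <= largest]


def class_intervals(start, width, high):
    """Steps 3 and 4: build (lower, upper) bounds for whole-number scores.

    Each interval is the previous one with `width` added to both bounds.
    """
    intervals = []
    lower = start
    while lower <= high:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


# The YRSMK scores range from 0 to 49.
print(width_bounds(0, 49))
print(convenient_widths(0, 49))
print(class_intervals(0, 5, 49))
```

For the YRSMK scores, the only candidate size between 49/15 and 49/8 is 5, and starting at 0 yields the ten intervals used in Table 2.5. Note that each upper bound is the lower bound plus the interval size minus 1, which is why the interval 5-9 has a size of 5.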
For a second example of grouping, consider the data in Table 2.7. The 60 measurements in this table are from the Smoking Study. Each measurement is a participant's score on the Wisconsin Inventory of Smoking Dependence Motives (WISDM), which consists of ratings on 65 questions such as "Does smoking make a good mood better?"
Table 2.7
First 60 Scores on the Wisconsin Inventory of Smoking Dependence Motives (WISDM)
52.9952   60.7071   53.2262   82.0333   65.9119   59.7071
44.4405   38.3167   62.2333   39.6786   46.2762   52.1119
68.4571   66.4476   28.2667   60.6667   50.0857   44.5690
33.6786   55.1119   21.8190   27.6929   53.1310   36.5500
57.4857   63.9262   60.0548   50.8071   61.2405   66.5810
55.3071   28.4643   43.9143   67.8524   54.7310   52.9429
60.8405   60.7238   51.0786   35.5071   54.2524   65.5429
60.1310   78.9357   65.1976   32.4833   51.2381   48.5786
62.5905   80.6071   54.0476   68.8190   52.1738   55.4214
61.4619   53.3571   35.8976   59.3190   68.1143   62.9429
Because these measurements contain decimals, we begin by making sure that all have the same number of decimal places, as they do. For Step 1, we compute the range of scores by subtracting the lowest score (21.8190) from the highest score (82.0333) to arrive at 60.2143.
In Step 2, we divide the range by 8 and 15: 60.2143/8 = 7.526788 and 60.2143/15 = 4.014287. We choose a number between these two results, preferably a multiple of 2, 3, or 5. We might choose 5.6901, which is a number between 7.526788 and 4.014287, and is divisible by 3, but 5.6901 will not serve as a convenient class interval. Rather, 5 is between 7.526788 and 4.014287, divisible by 5 (obviously), and convenient.
For the third step, the starting value could be set at the lowest score, 21.8190, but 20.0000 seems more intuitive. The first interval, therefore, will be 20.0000-24.9999, the next 25.0000-29.9999, and so on. The final step is to tabulate the number of measurements in each interval to obtain the frequency distribution, and then divide each frequency by the total number of observations to obtain the relative frequency distribution.
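The same tabulation for decimal scores might be sketched as follows, using the WISDM scores from Table 2.7, a starting value of 20, and a class interval of 5. The variable names are our own. The sketch places each score by dividing its distance from the starting value by the interval size and dropping the remainder, which assumes that no score lies below the starting value.

```python
import math

# WISDM scores copied from Table 2.7, row by row.
wisdm = [
    52.9952, 60.7071, 53.2262, 82.0333, 65.9119, 59.7071,
    44.4405, 38.3167, 62.2333, 39.6786, 46.2762, 52.1119,
    68.4571, 66.4476, 28.2667, 60.6667, 50.0857, 44.5690,
    33.6786, 55.1119, 21.8190, 27.6929, 53.1310, 36.5500,
    57.4857, 63.9262, 60.0548, 50.8071, 61.2405, 66.5810,
    55.3071, 28.4643, 43.9143, 67.8524, 54.7310, 52.9429,
    60.8405, 60.7238, 51.0786, 35.5071, 54.2524, 65.5429,
    60.1310, 78.9357, 65.1976, 32.4833, 51.2381, 48.5786,
    62.5905, 80.6071, 54.0476, 68.8190, 52.1738, 55.4214,
    61.4619, 53.3571, 35.8976, 59.3190, 68.1143, 62.9429,
]

start, width = 20.0, 5.0

# Number of intervals needed to reach the interval containing the highest score.
n_intervals = math.floor((max(wisdm) - start) / width) + 1

counts = [0] * n_intervals
for score in wisdm:
    # Index 0 is 20.0000-24.9999, index 1 is 25.0000-29.9999, and so on.
    counts[int((score - start) // width)] += 1

relative = [c / len(wisdm) for c in counts]

for i, rel in enumerate(relative):
    lower = start + i * width
    print(f"{lower:.4f}-{lower + width - 0.0001:.4f}  {rel:.3f}")
```

With these choices the scores fall into 13 class intervals, which is within the guideline of 8 to 15.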
As you can tell from Table 2.8, these data are very interesting. The distribution appears "top-heavy." In other words, more than half of the scores are greater than 50. This may not be unexpected, though, for it is a measure of smoking motives and smokers (which all the participants in the study are) may have many motives to smoke. Nevertheless, these data may be important to the study's designers because they can show that their participants were highly motivated to smoke, as opposed to participants who weren't motivated to smoke. In the end, the study's authors, if the experiment is successful, can claim that their intervention works for people highly motivated to smoke.
Table 2.8
Relative Frequency Distribution for WISDM Scores (First 60 Subjects)
Class Interval      Relative Frequency
20.0000-24.9999     0.017
25.0000-29.9999     0.050
30.0000-34.9999     0.033
35.0000-39.9999     0.083
40.0000-44.9999     0.050
45.0000-49.9999     0.033
50.0000-54.9999     0.233
55.0000-59.9999     0.100
60.0000-64.9999     0.200
65.0000-69.9999     0.150
70.0000-74.9999     0.000
75.0000-79.9999     0.017
80.0000-84.9999     0.033
Total               1.000
GRAPHING FREQUENCY DISTRIBUTIONS
Displaying a frequency distribution as a graph can highlight important features of the data. Graphs of frequency distributions are always drawn using two axes.
The abscissa or x-axis is the horizontal axis. For frequency and relative frequency distributions, the abscissa is marked in units of the variable being measured, and it is labeled with the variable's name. The ordinate or y-axis is the vertical axis. It is marked in units of frequency or relative frequency, and so labeled.
In Figure 2.1, the abscissa is labeled with values of the aggressiveness variable for the distribution in Table 2.2. The ordinate is marked to represent relative frequency of the measurements. Techniques for graphing frequency and relative frequency distributions are almost exactly the same. The only difference is in how the ordinate is marked. Because relative frequency is generally more useful than raw frequency, the examples that follow are for relative frequency distributions.
Histograms
Figure 2.1 is a relative frequency histogram for the aggressiveness data.
A relative frequency histogram uses the heights of bars to represent relative frequencies of score values (or class intervals).
Figure 2.1
Relative frequency histogram for the aggressiveness scores in Table 2.2. [Figure not reproduced. Abscissa: Aggressiveness score, 0 to 5. Ordinate: Relative frequency, marked from 0.10 to 0.40.]
To construct the histogram, place a bar over each score value. The bar extends up to the appropriate frequency mark on the ordinate. Thus, a bar's height is a visual analogue of the score value's relative frequency: the higher the bar, the greater the relative frequency.
Relative frequency histograms can also be drawn for grouped distributions. For these distributions, a bar is placed over each class interval.
Figure 2.2 is a relative frequency histogram of the YRSMK scores in Table 2.5. Sometimes, only the midpoints of each interval are shown on the abscissa. The midpoint of a class interval is the average of the interval's lower bound and the upper bound. Again, the height of each bar corresponds to its relative frequency.
The relative frequency histogram illustrated in Figure 2.2 makes particularly clear some of the salient characteristics of the distribution. For example, it is easy to see that most of the scores are in the middle of the distribution and that there is a decrease in frequency from the moderate scores to the higher scores.
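A rough text version of such a histogram can be sketched in Python. The class intervals and relative frequencies are those of Table 2.5; the function name and the layout are our own choices. Unlike the figures in this chapter, the sketch draws the bars sideways, using one "#" for each percentage point, so that a longer bar stands for a greater relative frequency.

```python
# Class intervals and relative frequencies from Table 2.5.
intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24),
             (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]
relative = [.083, .067, .083, .067, .200, .200, .150, .100, .017, .033]


def midpoint(lower, upper):
    """Midpoint of a class interval: the average of its lower and upper bounds."""
    return (lower + upper) / 2


# One "#" per percentage point, rounded to the nearest whole point.
bars = ["#" * round(rel * 100) for rel in relative]

for (lower, upper), rel, bar in zip(intervals, relative, bars):
    print(f"{lower:2d}-{upper:2d} (midpoint {midpoint(lower, upper):4.1f})  {bar} {rel:.3f}")
```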
Frequency Polygons
Figure 2.3 is an example of a relative frequency polygon using the WISDM scores in Table 2.8. The axes of a relative frequency polygon are the same as for a histogram. However, instead of placing a bar over each midpoint (or score value), a dot is placed over