learning from data - an introduction to statistical reasoning


Upload: shan4600

Post on 11-Jan-2016


  • LEARNING FROM DATA: AN INTRODUCTION TO STATISTICAL REASONING

    THIRD EDITION

    ER9405.indb 1 7/5/07 11:05:50 AM


  • LEARNING FROM DATA: AN INTRODUCTION TO STATISTICAL REASONING

    THIRD EDITION

    ARTHUR M. GLENBERG

    MATTHEW E. ANDRZEJEWSKI

    Lawrence Erlbaum Associates

    New York London


  • Lawrence Erlbaum Associates, Taylor & Francis Group, 270 Madison Avenue, New York, NY 10016

    Lawrence Erlbaum Associates, Taylor & Francis Group, 2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN

    © 2008 by Taylor & Francis Group, LLC. Lawrence Erlbaum Associates is an imprint of Taylor & Francis Group, an Informa business.

    Printed in the United States of America on acid-free paper. 10 9 8 7 6 5 4 3 2 1

    International Standard Book Number-13: 978-0-8058-4921-9 (Hardcover)

    No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

    Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

    Library of Congress Cataloging-in-Publication Data

    Glenberg, Arthur M.
    Learning from data : an introduction to statistical reasoning / Arthur M. Glenberg and Matthew E. Andrzejewski. 3rd ed.
    p. cm.
    Includes bibliographical references and index.
    ISBN-13: 978-0-8058-4921-9 (alk. paper)
    1. Statistics. I. Andrzejewski, Matthew E. II. Title.

    HA29.G57 2008
    001.4'22--dc22
    2007022035

    Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com

  • Contents

    Preface

    Chapter 1. Why Statistics?
        Variability
        Populations and Samples
        Descriptive and Inferential Statistical Procedures
        Measurement
        Using Computers to Learn From Data
        Summary
        Exercises

    Part I. Descriptive Statistics

    Chapter 2. Frequency Distributions and Percentiles
        Frequency Distributions
        Grouped Frequency Distributions
        Graphing Frequency Distributions
        Characteristics of Distributions
        Percentiles
        Computations Using Excel
        Summary
        Exercises

    Chapter 3. Central Tendency and Variability
        Sigma Notation
        Measures of Central Tendency
        Measures of Variability
        Summary
        Exercises

    Chapter 4. z Scores and Normal Distributions
        Standard Scores (z Scores)
        Characteristics of z Scores
        Normal Distributions
        Using the Standard Normal Distribution
        Other Standardized Scores
        Summary
        Exercises

    Part II. Introduction to Inferential Statistics

    Chapter 5. Overview of Inferential Statistics
        Why Inferential Procedures Are Needed
        Varieties of Inferential Procedures
        Random Sampling
        Biased Sampling
        Overgeneralizing
        Summary
        Exercises

    Chapter 6. Probability
        Probabilities of Events
        Probability and Relative Frequency
        Discrete Probability Distributions

        The Or-rule for Mutually Exclusive Events
        Conditional Probabilities
        Probability and Continuous Variables
        Summary
        Exercises

    Chapter 7. Sampling Distributions
        Constructing a Sampling Distribution
        Two Sampling Distributions
        Sampling Distributions Used in Statistical Inference
        Sampling Distribution of the Sample Mean
        Review of Symbols and Concepts
        z Scores and the Sampling Distribution of the Sample Mean
        A Preview of Inferential Statistics
        Summary
        Exercises

    Chapter 8. Logic of Hypothesis Testing
        Step 1: Check the Assumptions of the Statistical Procedure
        Step 2: Generate the Null and Alternative Hypotheses
        Step 3: Sampling Distribution of the Test Statistic
        Step 4: Set the Significance Level and Formulate the Decision Rule
        Step 5: Randomly Sample From the Population and Compute the Test Statistic
        Step 6: Apply the Decision Rule and Draw Conclusions
        When H0 Is Not Rejected
        Brief Review
        Errors in Hypothesis Testing: Type I Errors
        Type II Errors
        Outcomes of a Statistical Test
        Directional Alternative Hypotheses
        A Second Example
        A Third Example
        Summary
        Exercises

    Chapter 9. Power
        Calculating Power Using z Scores
        Factors Affecting Power

        Effect Size
        Computing Procedures for Power and Sample Size Determination
        When to Use Power Analyses
        Summary
        Exercises

    Chapter 10. Logic of Parameter Estimation
        Point Estimation
        Interval Estimation
        Constructing Confidence Limits for μ When σ Is Known
        Why the Formula Works
        Factors That Affect the Width of the Confidence Interval
        Comparison of Interval Estimation and Hypothesis Testing
        Summary
        Exercises

    Part III. Applications of Inferential Statistics

    Chapter 11. Inferences About Population Proportions Using the z Statistic
        The Binomial Experiment
        Testing Hypotheses About π
        Testing a Directional Alternative Hypothesis About π
        Power and Sample Size Analyses
        Estimating π
        Related Statistical Procedures
        Summary
        Exercises

    Chapter 12. Inferences About μ When σ Is Unknown: The Single-sample t Test
        Why s Cannot Be Used to Compute z
        The t Statistic

        Using t to Test Hypotheses About μ
        Example Using a Directional Alternative
        Power and Sample Size Analyses
        Estimating μ When σ Is Not Known
        Summary
        Exercises

    Chapter 13. Comparing Two Populations: Independent Samples
        Comparing Naturally Occurring and Hypothetical Populations
        Independent and Dependent Sampling From Populations
        Sampling Distribution of the Difference Between Sample Means (Independent Samples)
        The t Distribution for Independent Samples
        Hypothesis Testing
        A Second Example of Hypothesis Testing
        Power and Sample Size Analyses
        Estimating the Difference Between Two Population Means
        The Rank-sum Test for Independent Samples
        Summary
        Exercises

    Chapter 14. Random Sampling, Random Assignment, and Causality
        Random Sampling
        Experiments in the Behavioral Sciences
        Random Assignment Can (Sometimes) Be Used Instead of Random Sampling
        Interpreting the Results Based on Random Assignment
        Review
        A Second Example
        Summary
        Exercises

    Chapter 15. Comparing Two Populations: Dependent Samples
        Dependent Sampling
        Sampling Distributions of the Dependent-sample t Statistic
        Hypothesis Testing Using the Dependent-sample t Statistic

        A Second Example
        Power and Sample Size Analyses
        Estimating the Difference Between Two Population Means
        The Wilcoxon Tm Test
        Hypothesis Testing Using the Wilcoxon Tm Statistic
        Summary
        Exercises

    Chapter 16. Comparing Two Population Variances: The F Statistic
        The F Statistic
        Testing Hypotheses About Population Variances
        A Second Example
        Estimating the Ratio of Two Population Variances
        Summary
        Exercises

    Chapter 17. Comparing Multiple Population Means: One-factor ANOVA
        Factors and Treatments
        How the Independent-sample One-factor ANOVA Works
        Testing Hypotheses Using the Independent-sample ANOVA
        Comparisons Between Selected Population Means: The Protected t Test
        A Second Example of the Independent-sample One-factor ANOVA
        One-factor ANOVA for Dependent Samples
        A Second Dependent-sample One-factor ANOVA
        Kruskal-Wallis H Test: Nonparametric Analogue for the Independent-sample One-factor ANOVA
        Friedman Fr Test: Nonparametric Analogue for the Dependent-sample One-factor ANOVA
        Summary
        Exercises

    Chapter 18. Introduction to Factorial Designs
        The Two-factor Factorial Experiment: Combining Two Experiments Into One
        Learning From a Factorial Experiment
        A Second Example of a Factorial Experiment

        Graphing the Results of a Factorial Experiment
        Design of Factorial Experiments
        Three-factor Factorial Experiment
        Summary
        Exercises

    Chapter 19. Computational Methods for the Factorial ANOVA
        Two-factor Factorial ANOVA
        Comparing Pairs of Means: The Protected t Test
        A Second Example of the Factorial ANOVA
        Summary
        Exercises

    Chapter 20. Describing Linear Relationships: Regression
        Dependent Samples
        Mathematics of Straight Lines
        Describing Linear Relationships: The Least-squares Regression Line
        Precautions in Regression (and Correlation) Analysis
        Inferences About the Slope of the Regression Line
        Using the Regression Line for Prediction
        Multiple Regression
        Summary
        Exercises

    Chapter 21. Measuring the Strength of Linear Relationships: Correlation
        Correlation: Describing the Strength of a Linear Relationship
        Factors That Affect the Size of r
        Testing Hypotheses About ρ
        Correlation Does Not Prove Causation
        The Spearman Rank-order Correlation
        Other Correlation Coefficients
        Power and Sample Size Analyses
        Summary
        Exercises

    Chapter 22. Inferences From Nominal Data: The χ² Statistic
        Nominal, Categorical, Enumerative Data
        χ² Goodness-of-fit Test
        A Second Example of the χ² Goodness-of-fit Test
        Comparison of Multiple Population Distributions
        Second Example of Using χ² to Compare Multiple Distributions
        An Alternative Conceptualization: Analysis of Contingency
        Summary
        Exercises

    Glossary of Symbols

    Tables

    Appendix A. Variables From the Stop Smoking Study

    Appendix B. Variables From the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work

    Answers to Selected Exercises

    Index

  • Preface

    Statistics is a difficult subject. There is a lot to learn, and much of it involves new thinking. As the title implies, Learning From Data: An Introduction to Statistical Reasoning teaches you a new way of thinking about and learning about the world. Our goal is to put readers in a good position to understand psychological data and their limitations. Another, more important goal is to help readers evaluate data that affect all aspects of life (psychological, social, educational, political, and economic) so that they are better prepared to question and to challenge. Yet another goal is to help readers retain the material. Psychologists have developed (from data) techniques that facilitate learning and comprehension, and we have incorporated three of these techniques into the book.

    First, we have devoted extra attention to explaining difficult-to-understand concepts in detail. For example, some textbooks attempt to combine important concepts such as sampling distributions, hypothesis testing, power, and parameter estimation in one chapter. In this book, each concept has its own chapter. Yes, this means more reading, but it also means greater understanding.

    Second, the book uses repetition extensively to help students learn and retain concepts. There are multiple fully explained examples of each major procedure. Many concepts (for example, power, Type I errors) are repeated from chapter to chapter. The problem sets at the ends of most chapters require students to apply principles introduced in earlier chapters.

    The third major learning aid is the use of a consistent schema (the six-step procedure) for describing all statistical tests from the simplest to the most complex. The schema provides a valuable heuristic for learning from data. Students learn (1) to consider the assumptions of a statistical test, (2) to generate null and alternative hypotheses, (3) to choose an appropriate sampling distribution, (4) to set a significance criterion and generate a decision rule, (5) to compute the statistic of interest, and (6) to draw conclusions. Learning the schema at an early stage (in Chapter 8) will ease the way through Chapters 11 through 22, in which the schema is applied to many different situations. This schema also provides a convenient summary for each hypothesis-testing procedure. A table with a summary schema is included in the last section of each chapter containing the hypothesis-testing procedure. Inside the front cover of the book is a Statistical Selection Guide to further assist students in determining which statistical test is most appropriate for the situation.

    About the Book

    There are many aspects to Learning From Data that differentiate it from other statistics textbooks. In addition to the three teaching/learning methods mentioned earlier, the content and organization of the book may be quite different from what students are used to. First, nonparametric statistical tests are integrated into the chapters in which analogous parametric tests are described. With this organization, students can better appreciate the situations in which particular tests apply. In fact, throughout the book there is an emphasis on practicing how to choose the best statistical procedure. The choice of the procedure is discussed in examples, and students are required to make the correct choice as they solve the problems at the end of the chapter. The endpapers of the book provide guidelines for choosing procedures.

    Second, the initial parts of the chapters on regression (Chapter 20) and correlation (Chapter 21) are self-contained sections that include discussions of regression and correlation as descriptive procedures. Instructors may present these topics along with other descriptive statistics or delay their introduction until later in the course.

    Third, the book contains two independent treatments of power. The major treatment begins in Chapter 9 with graphical illustrations of how power changes under the influence of such factors as the significance level and sample size. The chapter also introduces formulas for computing power and estimating sample size needed to obtain a particular level of power. These formulas are repeated and generalized for many of the statistical procedures discussed in later chapters. Often, however, there may not be enough time for an extensive treatment of power. In that case, instructors can choose to treat power less extensively and omit Chapter 9 (and the relevant formulas in the other chapters). This less extensive treatment of power is part of each new inferential procedure. It consists of a nonmathematical discussion of how power can be enhanced for that particular procedure.

    Fourth, factorial designs, interactions, and the ANOVA are explained in greater detail than in most introductory textbooks. Our goal is to give students enough information so that they will be able to understand the statistics used in many professional journal articles. Of course, it would be foolish for the authors of any introductory textbook to try to cover the statistical analyses of complex situations. Instead, Chapter 18 discusses how two-factor and three-factor factorial experiments are designed, and how to interpret main effects and two-factor and three-factor interactions. Chapter 19 presents a description of computational procedures for the relatively simple two-factor, independent-sample ANOVA.

    Last, but most important to us, is Chapter 14, "Random Sampling, Random Assignment, and Causality." A major reason for writing the first two editions of this book was to address the issues discussed in this chapter. All of us who teach statistics courses and conduct research have been struck by the incongruity between what we practice and what we preach. When we teach a statistics course, we emphasize random sampling from populations. But in most experiments we do no such thing. Instead, we use some form of random assignment to conditions. How can we perform statistical analyses of our experiments when we have ignored the most important assumption of the statistical tests? In Chapter 14, we develop a rationale for this behavior, but the rationale exacts severe payment by placing restrictions on the interpretation of the results when random assignment is used instead of random sampling.

    New to the Third Edition

    In addition to the features already described, there are a number of new features. First, the third edition of Learning From Data is designed to be used seamlessly with Excel. Unlike other texts that concentrate on statistical software, we choose to focus on Excel, a spreadsheet program. Recent versions of statistical programs produce output that is far more complicated than needed for the undergraduate level. The output from Excel is straightforward; however, the statistical tools available are not complete. Thus, we have written an Add-in (LFD3 Data Analysis Add-in) for Excel so all the analyses presented in the book can be conducted in Excel. Excel is widely available and can also be used as a database, data manager, and graphics program; experience with these functions may provide a valuable set of skills for undergraduates in a number of professions, including psychology. Thus, files containing all the data used in the book are provided on a companion CD in Excel format. However, because other programs are still widely used, text-based files are also available for use in other statistical programs, like SPSS, SAS, and Systat.

    Second, the book attempts to capture the student's interest by focusing on what can be learned from a statistical analysis, not just on how it is done. This is most apparent in the treatment of hypothesis testing. In the six-step schema, the last step in hypothesis testing is described as deciding whether to reject the null hypothesis and then concluding what that decision implies about the world and what the implications for future action might be. Another way that the book attempts to capture the student's interest is by continually referring back to two real data sets. These data sets are intrinsically interesting and save time because new experimental scenarios do not need to be continually introduced. The first data set, on the effectiveness of Zyban and nicotine-replacement gum on smoking, comes from Dr. Timothy Baker. Data from 608 participants are included on the companion CD. The second data set, on the effects of having a child on marriage, comes from Dr. Janet Hyde and Dr. Marilyn Essex. The data from 244 families are also included on the companion CD. Data from these studies are used throughout the book in illustrating important concepts. The fact that these are real data sets strikes a chord with students and shows that statistics plays an important role in learning from data.

    Finally, we have provided instructors with substantial resources. To begin with, we have added approximately 20 new problems to the end-of-chapter exercises and provided many more on the companion CD. Included on the instructor CD are sample test questions, exercises, and sample data sets. We have also generated PowerPoint lectures for each chapter for instructors to use or edit, as they choose. There are a number of very useful graphics and illustrations that mirror the ones in the book. There are also fun, interactive exercises/demonstrations and tools that we have found useful (for example, data generation algorithms, Gaussian random number generators, etc.). As additional items become available, our Web site (www.LFD3.net) will provide users of the textbook access to them.

    Many Thanks

    Many people have contributed to this book. We thank our students and colleagues at the University of Wisconsin-Madison and those instructors who used the first two editions and provided valuable comments. We also thank Laura D. Goodwin (University of Colorado, Denver), Richard E. Zinbarg (Northwestern University), Daniel S. Levine (University of Texas, Arlington), and Randall De Pry (University of Colorado, Colorado Springs) for their valuable reviews of many of the chapters and of the proposal for a third edition of the book. AMG thanks his instructors at the University of Michigan and Miami University. MEA thanks his instructors at Temple University, especially Ralph Rosnow, Alan Sockloff, and Phil Bersh. Thanks are due to the editorial and production staffs at Lawrence Erlbaum Associates, who tolerated delay after delay. Finally, thanks to Mina and Anna for their love and support.

    Arthur M. Glenberg

    Matthew E. Andrzejewski

  • Chapter 1

    Why Statistics?

    Variability
        Sources of Variability
        Variables and Constants
    Populations and Samples
        Statistical Populations
        The Problem of Large Populations
        Samples
    Descriptive and Inferential Statistical Procedures
        Descriptive Statistical Procedures
        Inferential Statistical Procedures
    Measurement
        Considering Measurement in a Social and Political Context
        Differences Among Measurement Rules
        Properties of Numbers Used as Measurements
        Types of Measurement Scales
        Importance of Scale Types
    Using Computers to Learn From Data
        What Statistical Analysis Programs Can Do for You
        What the Programs Cannot Do for You
    Summary
    Exercises
        Terms
        Questions

    There are many ways to learn about the world and the people who populate it. Learning can result from critical thinking, asking an authority, or even from a religious experience. However, collecting data (that is, measuring observations) is the surest way to learn about how the world really is.

    Unfortunately, data in the behavioral sciences are messy. Initial examination of data reveals no clear facts about the world. Instead, the data appear to be nothing but an incoherent jumble of numbers. To learn about the world from data, you must first learn how to make sense out of data, and that is what this textbook will teach you. Statistical procedures are tools for learning about the world by learning from data.

    To help you to understand the power and usefulness of statistical procedures, we will explore two real (and important!) data sets throughout the course of the book. One of the data sets is courtesy of Professor Timothy Baker at the University of Wisconsin Center for Tobacco Research and Intervention (which we will call the Smoking Study). The data were collected to investigate several questions about smoking, addiction, withdrawal, and how best to quit smoking. The data set consists of a sample of 608 people who wanted to quit smoking. These people were randomly assigned (see Chapter 14 for the benefits of random assignment) to three groups. The participants in one group were given the drug bupropion SR (Zyban) along with nicotine replacement gum. In a second group, the participants were given the bupropion along with a placebo gum that did not contain any active ingredients. The final group received both a placebo drug and a placebo gum. The major question of interest is whether people are more successful in quitting smoking when the active gum is added to the bupropion. These data are exciting for a couple of reasons. First, given the tremendous social cost of cigarette smoking, we as a society need to figure out how to help people overcome this addiction, and these data help do just that. Second, the study included measurements of about 30 other variables to help answer ancillary questions. For example, there are data on how long people have smoked and how much they smoked; data on health factors and drug use; and demographic data such as gender, ethnicity, age, education, and height. These variables are described more fully within the Excel and SPSS data files on the CD that comes with this book and in Appendix A. The statistical tools you will learn about will give you the opportunity to explore these data to the fullest extent possible. You can ask important questions (some that may never have been asked before), such as whether drug use affects people's ability to quit smoking, and you can get the answers. In addition, these data will be used to illustrate various statistical procedures, and they will be used in the end-of-chapter exercises.
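
    The random assignment to three groups described above can be sketched in a few lines of Python. This is only an illustration: the participant IDs, the fixed seed, the function name, and the near-equal group sizes are assumptions made for the example, not details reported for the Smoking Study.

```python
import random

def randomly_assign(participants, conditions, seed=None):
    """Shuffle the participants, then deal them out to the conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups

# Hypothetical participant IDs 1..608; the labels paraphrase the three groups in the text.
conditions = [
    "bupropion + active gum",
    "bupropion + placebo gum",
    "placebo drug + placebo gum",
]
groups = randomly_assign(range(1, 609), conditions, seed=1)
for condition, members in groups.items():
    print(condition, len(members))
```

    Because the order is shuffled before anyone is placed, nothing about a participant (age, health, how much they smoke) can influence which group they end up in.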

    The second data set is courtesy of Professors Janet Hyde and Marilyn Essex of the University of Wisconsin-Madison. The data set is a subset of the data from the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work (we will refer to it as the Maternity Study). This project was designed to answer questions about how having a baby affects family dynamics such as marital satisfaction, and how various factors affect child development. The data set consists of measurements of 26 variables for 244 families. Some of these variables are demographic, such as age, education, and family income. Marital satisfaction was measured separately for mothers and fathers both before the child was born (during the 5th month of pregnancy) and at three times after the birth (1, 4, and 12 months postpartum). There are also data on how much the mother worked outside the house and how equally household tasks were divided between the mothers and fathers. Finally, there are eight measures of the quality of mother-child interactions at 12 months after birth, and three measures of child temperament (for example, hyperactivity) measured when the child was 4.5 years old. These variables are described more fully on the CD that comes with this book and in Appendix B. As with the smoking data, you are free to use these data to answer important questions, such as whether the amount of time that a mother works affects child development.

    This chapter introduces a number of topics that are basic to statistical analyses. We begin with a discussion of variability, the cause of messy data, and move on to the distinctions between population and sample, descriptive and inferential statistics, and types of measurement found in the behavioral sciences.

    Variability

    The first step in learning how to learn from data is to understand why data are messy. A concrete example is useful. Consider the CESD (Center for Epidemiologic Studies Depression) scores from the Smoking Study (see Appendix A). Each participant rated 20 questions such as "I felt lonely" using a rating of 0 (rarely or none of the time during the past week) to 3 (most of the time during the past week). The score is the sum of the ratings for the participant. For the 601 participants for whom we have CESD scores, the scores range from 0 to 23. About a quarter of the scores are below 2, but another quarter are above 9. These data are messy in the sense that the scores are very different from one another.
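
    The scoring rule just described (20 ratings, each from 0 to 3, summed per participant) can be written out as a short Python sketch. The function name and the ratings are invented for illustration, and the sketch follows only the rule stated here.

```python
def cesd_score(ratings):
    """Sum 20 item ratings, each an integer from 0 to 3."""
    if len(ratings) != 20:
        raise ValueError("expected 20 ratings")
    if any(rating not in (0, 1, 2, 3) for rating in ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(ratings)

# Made-up ratings for one hypothetical participant.
ratings = [0, 1, 0, 0, 2, 0, 1, 0, 0, 0,
           3, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(cesd_score(ratings))  # prints 9
```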

    Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.

    Chapter 3 presents statistical procedures for precisely measuring the variability in a set of scores. For now, only an intuitive understanding of variability is needed. When the scores differ from one another by quite a lot (such as the depression scores), variability is high. When the scores have similar values, variability is low. When all the scores are the same, there is no variability.
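
    The three cases (high, low, and no variability) can be made concrete with a small Python sketch. The scores are invented, and the range and standard deviation are used only as familiar stand-ins for "how much the scores differ"; no assumption is made about which measures Chapter 3 adopts.

```python
import statistics

def describe_spread(scores):
    """Return (range, population standard deviation) for a list of scores."""
    return max(scores) - min(scores), statistics.pstdev(scores)

high = [0, 1, 4, 9, 15, 23]   # scores differ a lot
low = [7, 8, 8, 9, 8, 8]      # scores are similar
none = [8, 8, 8, 8, 8, 8]     # all scores identical

for label, scores in [("high", high), ("low", low), ("none", none)]:
    spread, sd = describe_spread(scores)
    print(f"{label}: range={spread}, sd={sd:.2f}")
```

    Both numbers shrink as the scores become more alike and reach zero when every score is the same.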

    Sources of Variability

    It is easy enough to see that the CESD data are variable, but why are they variable? In general, variability arises from several sources. One source of variability is individual differences: Some smokers are more depressed than others; some have difficulty reading and understanding the items on the test; some smokers' answers on the inventory are more honest than the answers of other smokers. There are as many potential sources of variability due to individual differences as there are reasons for why one person differs from another in intelligence, personality, performance, and physical characteristics.

    Another source of variability is the procedure used in collecting the data. Perhaps some of the smokers were more rushed than others; perhaps some were tested at the end of the day and were more tired than others. Any change in the procedures used for collecting the data can introduce variability. Finally, some variability may be due to conditions imposed on the participants, such as whether they are taking the placebo gum.

    Variables and Constants

    Variability does not occur only in textbook examples; it is characteristic of all data in the behavioral sciences. Whenever a behavioral scientist collects data, whether on the incidence of depression, the effectiveness of a psychotherapeutic technique, or the reaction time to respond to a stimulus, the data will be variable; that is, not all the scores collected will be the same. In fact, because data are variable, collecting data is sometimes referred to as measuring a variable (or a random variable).

    A variable is a measurement that changes from one observation to the next.

    CESD is a variable because it changes from one smoker (observation) to the next. Effectiveness of a psychotherapeutic technique is another example of a variable, because a given technique will be more effective for some people than for others.

    Variables should be distinguished from constants.

    Constants are measurements that stay the same from one observation to the next.

    The boiling point of pure water at sea level is an example of a constant. It is always 100 degrees Centigrade. Whether you use a little water or a lot of water, whether the water is encouraged to boil faster or not, no matter who is making the observation (as long as the observer is careful!), the water always boils at the same temperature. Another constant is Newton's gravitational constant, which sets the strength of the gravitational attraction between any two objects (whether the objects are large or small, solid or liquid, and so on).

    Many of the observations made in the physical sciences are observations of constants. Because of this, it is easy for the beginning student in the physical sciences to learn from data. A single careful observation of a constant tells the whole story.

    You may be surprised to learn that there is not one constant in all of the behavioral sciences. There is no such thing as "the" effectiveness of a psychotherapeutic technique, or "the" depression score, because measurements of these variables change from person to person. In fact, because what is known in the behavioral sciences is always based on measuring variables, even the beginning student must have some familiarity with statistical procedures to appreciate the body of knowledge that comprises the behavioral sciences and the limitations inherent in that body of knowledge. In case you were wondering, this is why you are taking an introductory statistics course, and your friends majoring in the physical sciences are not.

    The concept of variability is absolutely basic to statistical reasoning, and it will motivate all discussions of learning from data. In fact, the remainder of this chapter introduces concepts that have been developed to help cope with variability.

    Populations and Samples

    The psychologists studying addiction might be interested in the CESD scores of the specific smokers from whom they collected data. However, it is likely that they are interested in more than just those individuals. For example, they may be interested in the incidence of depression among all smokers in Wisconsin, or all smokers in the United States, or even all smokers in the world. Because depression is a variable that changes from person to person, the specific observations cannot reveal everything the researchers might want to know about all of these depression scores.

    Statistical Populations

    A statistical population is a collection or set of measurements of a variable that share some common characteristic.

    One example of a population is the set of CESD scores of all smokers in Wisconsin. These scores are measurements of a variable (CESD), and they have the common characteristic of being from a particular group of people: smokers in Wisconsin. A different statistical population consists of the CESD scores for smokers in the United States. And, a very different population consists of the marital satisfaction scores for new mothers who work full-time outside of the home. The point is that you should not think of statistical populations as groups of people, such as the people in the United States. There is only one population of people for the United States, but there are an infinite number of statistical populations depending on what variables are measured (for example, CESD or marital satisfaction), and how those scores might be grouped (for example, smokers or working mothers).
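
    A small Python sketch may help separate "group of people" from "statistical population." The records, field names, and helper function below are all invented for illustration; the point is only that one set of people yields many different sets of measurements, depending on the variable chosen and the shared characteristic.

```python
# Invented records: each dict holds several measurements for one person.
people = [
    {"state": "WI", "smoker": True,  "new_mother": False, "cesd": 4,  "marital_satisfaction": 3.2},
    {"state": "WI", "smoker": True,  "new_mother": True,  "cesd": 11, "marital_satisfaction": 2.5},
    {"state": "OH", "smoker": True,  "new_mother": False, "cesd": 0,  "marital_satisfaction": 4.0},
    {"state": "WI", "smoker": False, "new_mother": True,  "cesd": 2,  "marital_satisfaction": 3.8},
]

def statistical_population(records, variable, shares_characteristic):
    """Collect the measurements of one variable for records sharing a characteristic."""
    return [record[variable] for record in records if shares_characteristic(record)]

cesd_wi_smokers = statistical_population(
    people, "cesd", lambda p: p["smoker"] and p["state"] == "WI")
cesd_all_smokers = statistical_population(people, "cesd", lambda p: p["smoker"])
satisfaction_new_mothers = statistical_population(
    people, "marital_satisfaction", lambda p: p["new_mother"])

print(cesd_wi_smokers)           # [4, 11]
print(cesd_all_smokers)          # [4, 11, 0]
print(satisfaction_new_mothers)  # [2.5, 3.8]
```

    Each result is a list of numbers, not a list of people, which is why the same procedures can later be applied to any of them.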

    Thinking of statistical populations as sets of measurements may appear cold and unfeeling. Nonetheless, thinking this way has a tremendous advantage in that it facilitates the application of the same statistical procedure to a variety of populations. Instead of having to learn one technique for analyzing and learning from depression scores, and another technique for analyzing IQ scores, and yet another for analyzing errors rats make in learning mazes, many of the same procedures can be applied in all of these cases. In every case we are dealing (statistically) with the same stuff, a set of measurements.

    Unfortunately, thinking of statistical populations as sets of numbers can cause some people to become bored and lose interest in the enterprise. The way to counter this boredom is to remember that the statistical procedures are operating on numbers that have meaning: The numbers are scores that represent something interesting about the world (for example, the incidence of depression in smokers). As you read through this book, think about applying your new knowledge to problems that are of interest to you, and not just as manipulation of numbers.

    the Problem of Large Populations

    Somestatisticalpopulationsconsistofamanageablenumberofscores.Usually,however,statisticalpopulationsareverylarge.Forexample,therearepotentiallymillionsofCESDscoresofsmokers.Whendealingwithlargepopulations,itisdifficultandtimeconsum-ingtoactuallycollectallofthescoresinthepopulation.Sometimes,forethicalreasons,allthescoresinthepopulationcannotbeobtained.Forexample,supposethatamedicalresearcherbelievesthatshehasdiscoveredadrugthatsafelyandeffectivelyreduceshighbloodpressure.Oneway todetermine thedrugseffectiveness is toadminister it toallpeoplesufferingfromhighbloodpressureandthentomeasuretheirbloodpressures.(Thepopulationofinterestconsistsofthebloodpressurescoresofpeoplesufferingfromhighbloodpressurewhohavetakenthenewdrug.)Clearly,thiswouldbetimeconsumingandexpensive.Itwouldalsobeveryunethical.Afterall,whatifthemedicalresearcherwerewrong,andthedrugdidmoreharmthangood?Also,evenwithagreatnationaleffort,notallthescorescouldbecollected,becausesomeofthepeoplewoulddiebeforetheytookthedrug,otherswouldhavetheirbloodpressuresloweredbyotherdrugs,andotherswoulddevelophighbloodpressureoverthecourseofdatacollection.

    Weappeartohaverunacrossaproblem.Usually,wearenotinterestedinjustafewscores,butinallthescoresinapopulation.Yet,becausebehavioralscientistsareinter-estedinlearningaboutvariables(notconstants),itisimpossibletoknowforsureaboutallthescoresinapopulationfrommeasuringjustafewofthem.Ontheotherhand,itistimeconsumingandexpensivetocollectallthescoresinapopulation,anditmaybeunethicalorimpossible.Whattodo?


    Samples

    The solution to this problem is provided by statistical procedures based on sampling from populations.

    A sample is a subset of measurements from a population.

    That is, a sample contains some, but usually not all, of the scores in the population. The 608 CESD scores are a sample from the population of CESD scores of all smokers.

    An important type of sample is a random sample.

    A random sample is selected so that every score in the population has an equal chance of being included.

    Whether a sample is random or not does not depend on the actual scores included in the sample, but on how the scores in the sample are selected. Only if the scores are selected in such a way that each score in the population has an equal chance of being included in the sample is the sample a random sample. The CESD scores are not a random sample of CESD scores of all smokers. These scores are only from people living in Madison and Milwaukee, Wisconsin, and there was no attempt to ensure that CESD scores of people living elsewhere were included. Procedures for producing random samples are discussed in Chapter 5.
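The distinction between a random sample and other samples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population values are made up (they are not the study's data), and the function name draw_random_sample is hypothetical. The standard-library call random.Random.sample draws without replacement, so every score in the list has the same chance of being included.

```python
import random

def draw_random_sample(population, sample_size, seed=None):
    """Select sample_size scores so that each score in the population
    has an equal chance of being included (sampling without replacement)."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, sample_size)

# Hypothetical population of 1,000 scores (made-up values).
population = [i % 61 for i in range(1000)]

# A random sample: how the scores are selected is what makes it random.
sample = draw_random_sample(population, 25, seed=1)

# A sample that is NOT random: only the first 25 scores can ever be chosen,
# much like collecting scores in only two cities.
convenience_sample = population[:25]

print(len(sample), len(convenience_sample))
```

Both lists are samples in the sense defined above, because each is a subset of the population. Only the first is a random sample, and that is a fact about the selection procedure rather than about the scores that happen to be drawn.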

    As you will see in Chapters 5–22, random samples are used to help solve the problem of large populations. That is, with the data in a random sample, we can learn about the population from which the sample was obtained by using inferential statistical procedures.

    DESCRIPTIVE AND INFERENTIAL STATISTICAL PROCEDURES

    Descriptive Statistical Procedures

    Because of variability, in order to learn anything from data, the data must be organized.

    Descriptive statistical procedures are used to organize and summarize the measurements in samples and populations.

    In other words, descriptive statistical procedures do what the name implies: they describe the data. These procedures can be applied to samples and to populations. Most often, they are applied to samples, because it is rare to have all the scores in a population.

    Descriptive statistical procedures include ways of ordering and grouping data into distributions (discussed in Chapter 2) and ways of calculating single numbers that summarize the whole set of scores in the sample or population (discussed in Chapters 2 and 3). Some descriptive statistical procedures are used to represent data graphically, because as everyone knows, a picture is worth a thousand words.


    Inferential Statistical Procedures

    The most powerful tools available to the statistician are inferential statistical procedures.

    Inferential statistical procedures are used to make educated guesses (inferences) about populations based on random samples from the populations.

    These educated guesses are the best way to learn about a population short of collecting all of the scores in the population.

    All of this may sound a bit like magic. How can you possibly learn about a whole population that may contain millions and millions (or, theoretically, an infinity) of scores by examining a small number of scores contained in a random sample from that population? It is not magic, however, and it is even understandable. Part II of this book presents a detailed description of how inferential statistical procedures work.

    Inferential statistical procedures are so pervasive in our society that you have undoubtedly read about them and made decisions based on them. For example, think about the last time you heard the results of an opinion poll, such as the percentages of the registered voters who favor Candidates A, B, or C. Supposedly, your opinion is included in those percentages (assuming that you are a registered voter so that your opinion is included in the population). But on what grounds does the pollster presume to know your opinion? It is a safe bet that only rarely, if ever, has a pollster actually contacted you and asked you your opinion. Instead, the percentages reported in the poll are educated guesses based on inferential statistical procedures applied to a random sample.

    In recent years, it has become fashionable for the broadcast and print media to acknowledge that conclusions from opinion polls are educated guesses (rather than certainties). This acknowledgment is in the form of a "margin of error." The margin of error is how much the reported percentages may differ from the actual percentages in the population (see Chapter 11 for details).

    Another example of the impact of inferential statistical procedures on our daily lives is in our choices of foods and medicines. Many new food additives and medicines are tested for safety and approved by government agencies such as the Food and Drug Administration (FDA). But how does the FDA know that the new product is safe for you? In fact, the FDA does not know for sure. The decision that a new drug is safe is based on inferential statistical procedures. The FDA example raises several sobering issues about the data used by government agencies to set standards on which our lives literally depend. It is only recently that government agencies have insisted that data be collected from women, and without such data, it is uncertain if a particular drug is actually safe or effective for women. The terrible birth defects attributed to the drug Thalidomide occurred because no one had bothered to collect the data that would verify the safety of the drug with pregnant women. Similarly, very little data on safe levels of environmental pollutants such as PCBs and pesticides have been collected from children. Consequently, our society may be setting the scene for a disaster by allowing into the environment chemicals that are relatively safe for adults but disastrous for children whose immune systems are immature and whose rapidly developing brains are sensitive to disruption by chemicals.1

    1 For an excellent discussion of these issues, see C. F. Moore (2003), Silent scourge. New York: Oxford University Press.


    The final example of the use of inferential procedures is the behavioral sciences themselves. Most knowledge in the behavioral sciences is derived from data. The data are analyzed using inferential statistical procedures, because interest is not confined to just the sample of scores, but extends to the whole population of scores from which the sample was selected. If you are to understand the data of the behavioral sciences, then you need to understand how statistical procedures work.

    MEASUREMENT

    Data are collected by measuring a variable. But what does it mean to measure a variable?

    Measurement is the use of a rule to assign a number to a specific observation of a variable.

    As an example, think about measuring the length of your desk. The rule for measuring length is, "Assign a number equal to the number of lengths of a standard ruler that fit exactly from one end of the desk to the other." In this example, the variable being measured is length. The observation is the length of a specific desk, your desk. The rule is to assign a value (for example, 4 feet) equal to the number of lengths of a standard ruler that fit from one end of the desk to the other.

    As another example, consider measuring the weight of a newborn baby. The variable being measured is weight. The specific observation is the weight of the specific baby. The measurement rule is something like, "Put the baby on one side of a balance scale, and assign to that baby a weight equal to the number of pound weights placed on the other side of the scale to get the scale to balance."

    Measuring variables in the behavioral sciences also requires that we use a rule to assign numbers to observations of a variable. For example, one way to measure depression is to assign a score equal to the sum of the ratings of the CESD questions. The variable is depression, the specific observation is the depression of the person being assessed, and the rule is to assign a value equal to the sum of the ratings. Similarly, measuring intelligence means assigning a number based on the number of questions answered correctly on an intelligence test.
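One way to make the idea of a measurement rule concrete is to write the rule as a function that takes an observation and returns a number. The sketch below is illustrative only: the function names, the ratings, and the answers are hypothetical, and the number of items and the rating range are not taken from the CESD or from any real test.

```python
def measure_by_sum(ratings):
    """Measurement rule: assign a score equal to the sum of the item ratings."""
    return sum(ratings)

def measure_by_count_correct(answers, answer_key):
    """Measurement rule: assign a score equal to the number of questions
    answered correctly."""
    return sum(1 for given, correct in zip(answers, answer_key) if given == correct)

# Hypothetical ratings from one person on five questionnaire items.
hypothetical_ratings = [0, 2, 1, 3, 0]
depression_score = measure_by_sum(hypothetical_ratings)

# Hypothetical answers to a four-question test, compared with its answer key.
test_score = measure_by_count_correct(["a", "c", "b", "d"], ["a", "b", "b", "d"])

print(depression_score, test_score)
```

In both cases the variable (depression, intelligence) is not the number itself. The number is what the rule assigns to one specific observation of that variable.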

    Considering Measurement in a Social and Political Context

    The choice of what variables to measure in a study is no accident; usually those choices entail a lot of discussion and planning, and are often influenced by social or political motives of the researcher. The measurement rules, as well, usually involve much discussion, but the details are rarely stated in a study's results. At the very least, there's usually some ambiguity. Take, for example, the LONG variable in the Smoking Study, which measures the longest time without smoking. Let's say that a study participant answers "8 months," which would result in a score of 7 (6–12 months). But, if we probe further, we may find that the participant actually answered: "Well, I didn't smoke for 4 months, but then one night I had one cigarette, and then didn't have another for 4 months. I say 8 months because it was just a minor slip-up." Is the longest time without smoking for this individual 8 months or 4 months? Is "smoking" defined as one cigarette or one drag or buying a pack? If the researcher is interested in the effectiveness of a particular antismoking program, she may give this participant a break and count it as 8 months, because clearly, to her, this participant didn't relapse (it was only one cigarette, after all). A different researcher, interested in showing that all addicts wind up using again (relapsing), might say that one cigarette constitutes a relapse, and score this as 4 months. Political motives may enter a study in this way because for some people the only solution for drug addiction may be abstinence (for example, Alcoholics Anonymous), but for others, recreational drug use may be seen as OK in certain situations (for example, harm reduction approaches). In addition, a researcher's grant funding may be dependent on having and solving a social problem, and maybe even a growing problem, even though the problem is not as big as one might think. Therefore, we should remain critical of how psychologists measure and contemplate what might have been included and what might have been left out.

    Differences Among Measurement Rules

    Not all rules for measuring variables are equally good. They differ in three important ways. First, they differ in validity.

    Validity refers to how well the measurement rule actually measures the variable under consideration as opposed to some other variable.

    Some intelligence tests are better than others because they measure intelligence rather than (accidentally) being influenced by creativity or memory for trivia. Similarly, some measures of depression are better than others because they measure depression rather than introversion or aggressiveness.

    Measurement rules also differ in reliability.

    Reliability is an index of how consistently the rule assigns the same number to the same observation.

    For example, an intelligence test is reliable if it tends to assign the same number to individuals each time they take the test. Books on psychological testing discuss validity and reliability in detail.2

    Finally, a third difference among measurement rules is that the properties of the numbers assigned as measurements depend on the rule. At first blush, this statement may sound like nonsense. After all, numbers are numbers; how can their properties differ?

    2 A classic text is A. Anastasi (1988), Psychological testing (6th ed.). New York: Macmillan.


    Properties of Numbers Used as Measurements

    When numbers are measurements, they can have four properties. The first of these is the category property.

    The category property is that observations assigned the same number are in the same category, and observations assigned different numbers are in different categories.

    For example, suppose that you are collecting data on the types of cars that American citizens drive, and you are most interested in the country in which the cars were manufactured. You could measure the country of manufacture (the variable) by using the following rule to assign numbers to observations: If the car was manufactured in the United States, assign it a 1; if manufactured in Japan, assign it a 2; if in Germany, a 3; if in France, a 4; if in Italy, a 5; and if manufactured anywhere else, a 0. These numbers have the category property because each observation assigned the same number (for example, 2) is in the same category (made in Japan).
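The country-of-manufacture rule can be written out as a small lookup. This is a sketch of that rule only; the names COUNTRY_CODES and country_code and the list of cars are hypothetical.

```python
# The rule from the example: a code for each named country, 0 for anywhere else.
COUNTRY_CODES = {
    "United States": 1,
    "Japan": 2,
    "Germany": 3,
    "France": 4,
    "Italy": 5,
}

def country_code(country):
    """Assign the number for the country of manufacture (0 = anywhere else)."""
    return COUNTRY_CODES.get(country, 0)

# Hypothetical observations: the country in which each of four cars was made.
cars = ["Japan", "United States", "Sweden", "Japan"]
codes = [country_code(country) for country in cars]

# Category property: equal codes mean the same category, and nothing more.
same_category = codes[0] == codes[3]

print(codes, same_category)
```

Comparing two codes for equality is meaningful here. Subtracting them or asking which is larger is not, because the rule gives the numbers only the category property.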

    These country-of-manufacture numbers are different from the numbers that we usually encounter. Typically, assigning a number to an observation (say, Observation A) means more than just assigning Observation A to a specific category. For example, if Observation A is assigned a value of 1 and Observation B is assigned a value of 2, it usually means that Observation A is shorter, lighter, or less valuable than Observation B. This is not the case for the measurements of country of manufacture. A car manufactured in the United States (and assigned a number 1) is not necessarily shorter, lighter, or less valuable than a car manufactured in Japan (assigned the number 2). The point is, how we interpret the measurements depends on the properties of the numbers, which in turn depend on the rule used in assigning the numbers.

    Measurements have the ordinal property when the numbers can be used to order the observations from those that have the least of the variable being measured to those that have the most.

    Consider another example. Suppose that a social psychologist investigating cooperation has a preschool teacher rank the four pupils in the class from least cooperative (first) to most cooperative (fourth). These cooperation scores (ranks) have two properties. First, the scores have the category property, because children assigned different scores are in different categories of cooperation. Second, the scores have the ordinal property because the scores can be used to order the observations from those that have the least to those that have the most cooperation. It is only when measurements have the ordinal property that we know that observations with larger measurements have more of whatever is being measured.

    A third property that measurements may have is the equal intervals property.

    The equal intervals property means that whenever two observations are assigned measurements that differ by exactly one unit, there is always an equal interval (difference) between the observations in the actual variable being measured.


    To understand what is meant by equal intervals, consider again measuring the cooperativeness of the four preschool children. The four children (call them Alana, Bob, Carol, and Dan) have cooperation scores of 1, 2, 3, and 4. The difference between Alana's cooperation score (1) and Bob's cooperation score (2) is 1. Likewise, the difference between Carol's cooperation score (3) and Dan's cooperation score (4) is 1. The important question is whether the actual difference in cooperation (not just the score) between Alana and Bob equals the actual difference in cooperation between Carol and Dan.

    It is very unlikely that the difference in cooperation between Alana and Bob equals the difference in cooperation between Carol and Dan. The teacher simply ranked the children from least to most cooperative. The teacher did not take any precautions to ensure equal intervals. Alana and Bob may both be very uncooperative, with Bob being just a bit more cooperative than Alana (the actual difference in cooperation between Alana and Bob is a bit). Carol may also be on the uncooperative side, but just a bit more cooperative than Bob (the actual difference between Carol and Bob is a bit). Suppose, however, that Dan is the teacher's helper and is very cooperative. In this case, the difference in cooperation between Carol and Dan may be very large, much larger than the difference in cooperation between Alana and Bob. Because the differences in scores are equal (the difference in cooperation scores between Alana and Bob equals the difference in cooperation scores between Carol and Dan), but the differences in amount of cooperation (the variable) are not equal, these cooperation scores do not have the equal intervals property.

    Now consider using a ruler to measure the lengths of the four lines in Figure 1.1. The lines A, B, C, and D have lengths of 1, 2, 3, and 6 centimeters, respectively. Using a ruler to measure length generates measurements with the equal intervals property: For each pair of observations for which the measurements differ by exactly one unit, the differences in length are exactly equal. That is, the measurements assigned lines A (1) and B (2) differ by one, as do the measurements assigned lines B (2) and C (3); and, important to note, the actual difference in lengths between lines A and B exactly equals the actual difference in length between lines B and C.

    FIGURE 1.1 Length measured using two different measurement rules.

    Line    Length measured using a ruler (cm)    Length measured using ranks
    A       1                                     1
    B       2                                     2
    C       3                                     3
    D       6                                     4


    A difficulty in understanding the equal intervals property is in maintaining the distinction between the variable being measured (length or cooperation) and the number assigned as a measurement of the variable. The numbers represent or stand for certain properties of the variable. The numbers are not the variable itself. The number 1 is no more the cooperation of Alana (it is a measure of her cooperation) than is the number 1 the actual length of line A (it is a measure of its length). Whether or not the measurements have properties such as equal intervals depends on how the numbers are assigned to represent the variable being measured. Using a ruler to measure length of a desk assigns numbers that have the equal intervals property; using rankings to measure cooperation of preschool children assigns numbers that do not have the equal intervals property.

    The difference between the length and cooperation examples is not in what is being measured, but in the rule used to do the measuring. A ranking rule can be used to measure the lengths of lines (this is what we do when we need a rough measure of length: compare two lengths to see which is longer). In this case, the measured lengths of lines A, B, C, and D would be 1, 2, 3, and 4, respectively (see Figure 1.1). These measurements of length do not have the equal intervals property, because for each pair of observations for which the measurements differ by exactly one unit, the real differences in length are not exactly equal.
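The contrast between the ruler rule and the ranking rule can be checked directly with the four line lengths from the example (1, 2, 3, and 6 centimeters). The sketch below is a minimal illustration; the function names are hypothetical, and the ranking function assumes that no two lines are tied in length.

```python
# Actual lengths of lines A to D in centimeters, as given in the example.
lengths_cm = {"A": 1, "B": 2, "C": 3, "D": 6}

def rank_measurements(values):
    """Ranking rule: 1 for the shortest line, 2 for the next, and so on
    (assumes there are no ties)."""
    ordered = sorted(values, key=values.get)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

def real_gaps_for_unit_steps(scores, true_values):
    """For every pair of observations whose scores differ by exactly one unit,
    collect the real difference in the variable being measured."""
    names = sorted(scores, key=scores.get)
    gaps = []
    for low in names:
        for high in names:
            if scores[high] - scores[low] == 1:
                gaps.append(true_values[high] - true_values[low])
    return gaps

ranks = rank_measurements(lengths_cm)

# Ruler rule: the scores are the lengths themselves.
ruler_gaps = real_gaps_for_unit_steps(lengths_cm, lengths_cm)
# Ranking rule: the scores are ranks, but the real variable is still length.
rank_gaps = real_gaps_for_unit_steps(ranks, lengths_cm)

print(ranks, ruler_gaps, rank_gaps)
```

With the ruler, every one-unit step in the score corresponds to the same real difference in length. With ranks, the step from C to D hides a 3-centimeter difference, so the one-unit steps are not equal and the ranks lack the equal intervals property.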

    The fourth property that measurements may have is the absolute zero property.

    The absolute zero property means that a value of zero is assigned as a measurement only when there is nothing at all of the variable that is being measured.

    When length is measured using a ruler (rather than ranks), the score of zero is an absolute zero. That is, the value of zero is assigned only when there is no length. When measuring country of car manufacture, zero is not an absolute zero. In that example, zero does not mean that there is no country of manufacture, only that the country is not the United States, Japan, Germany, France, or Italy.

    Another example of a measurement scale that does not have an absolute zero is the Fahrenheit (or Centigrade) scale for measuring temperature. A temperature of 0°F does not mean that there is no heat. In fact, there is still some heat at temperatures of −10°F, −20°F, and so on. Because there is still some heat (the variable being measured) when zero is assigned as the measurement, the zero is not an absolute zero.3

    Types of Measurement Scales

    In addition to the four properties of measurements (category, ordinal, equal intervals, and absolute zero), there are four types of measurement rules (or scales), determined by the properties of the numbers assigned by the measurement rules.

    A nominal scale is formed when the numbers assigned by the measurement rule have only the category property.

    3 The Kelvin scale of temperature does have an absolute zero. On this scale, 0 means absolutely no heat. Zero degrees Kelvin equals −459.67°F.


    Nominal comes from the word name. The numbers assigned using a nominal scale name the category to which the observation belongs but indicate nothing else. Thus, the measurements of country of manufacture of cars form a nominal scale, because the numbers name the category (country), but have no other properties.

    Several of the variables in the Smoking Study are measured using nominal scales. For example, TYPCIG (type of cigarette smoked) is measured using a nominal scale defined as 1 = regular filter; 2 = regular no filter; 3 = light; 4 = ultralight; 5 = other. Another nominally measured variable is SPOUSE, that is, whether the smoker's spouse smokes (1) or does not smoke (0). The GENDER variable in the Maternity Study (is the child male or female) is also measured using a nominal scale.

    An ordinal scale is formed when the measurement rule assigns numbers that have the category and the ordinal properties, but no other properties.

    Many of the variables in the Smoking and Maternity studies are measured using ordinal scales. The longest time without smoking (LONG) variable is measured as 1 = less than a day; 2 = 1–7 days; 3 = 8–14 days; 4 = 15 days to a month; 5 = 1–3 months; 6 = 3–6 months; 7 = 6–12 months; 8 = more than a year. As the assigned score increases from 1 to 8, the length of time without smoking increases, so the numbers have the ordinal property. However, the difference between a measurement of 1 and 2 (LONG 1 – LONG 2 = about 3 days) is not comparable to a difference between a measurement of, say, 5 and 6 (LONG 5 – LONG 6 = about 3 months), thus the measurements do not have the equal intervals property. The researchers might have attempted to measure LONG using a ratio scale by asking participants to estimate the longest number of days without smoking, from 0 to thousands of days. Unfortunately, people's estimates are often clouded by faulty memory processes and faulty estimates. One person who knows that he quit once for more than a year might estimate LONG as 500 days. Another person who had been abstinent for the same amount of time, but who can't remember whether he quit in the year 2001 or 1999, and who can't quite remember how to translate years into days, might estimate LONG as 10,000 days. Thus, these measurements are not as valid or reliable as the simpler ordinal measurements of the LONG scale.
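The LONG coding rule can be sketched as a function from a number of days to one of the eight codes. Several details below are assumptions rather than facts about the Smoking Study: the category ranges are read as 1 to 7 days, 8 to 14 days, 15 days to a month, 1 to 3 months, 3 to 6 months, and 6 to 12 months; a month is taken as 30 days and a year as 365 days; and a value on a shared boundary (for example, exactly 3 months) is placed in the lower category. The name long_score is hypothetical.

```python
# Assumed upper bounds, in days, for codes 2 through 7 (30-day months, 365-day year).
UPPER_BOUNDS_DAYS = [(2, 7), (3, 14), (4, 30), (5, 90), (6, 180), (7, 365)]

def long_score(days_without_smoking):
    """Ordinal code for the longest time without smoking.
    1 = less than a day ... 8 = more than a year (boundaries are assumptions)."""
    if days_without_smoking < 1:
        return 1
    for code, upper_bound in UPPER_BOUNDS_DAYS:
        if days_without_smoking <= upper_bound:
            return code
    return 8

# The ambiguous participant: about 8 months (240 days) or about 4 months (120 days)?
score_if_8_months = long_score(240)
score_if_4_months = long_score(120)

# Two very different day estimates of "more than a year" receive the same code.
score_500_days = long_score(500)
score_10000_days = long_score(10000)

print(score_if_8_months, score_if_4_months, score_500_days, score_10000_days)
```

Under these assumptions the two wildly different estimates of 500 and 10,000 days both receive a score of 8, which illustrates why the coarse ordinal categories can be less sensitive to errors of memory than a raw count of days.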

    Many behavioral scientists (and businesses that conduct marketing research) collect data by having people rate observations for specific qualities. For example, a clinical psychologist may be asked to rate the severity of his patients' psychopathologies on a scale from 1 (extremely mild) to 10 (extremely severe). As another example, a consumer may be asked to rate the taste of a new ice cream from 1 (awful) to 100 (sublime). In both cases, the measurements represent ordinal properties. For the clinical psychologist, the larger numbers represent more severe psychopathology than the smaller numbers; for the ice-cream raters, the larger numbers represent better-tasting ice cream than the smaller numbers. In neither example, however, do the measurements have the equal intervals property. As a general rule, ratings and rankings form ordinal scales.

    The third type of scale is the interval scale.

    Interval scales are formed when the numbers assigned as measurements have the category, ordinal, and equal intervals properties, but not an absolute zero.


    Two examples of interval scales are the Fahrenheit and Centigrade scales of temperature. Neither has an absolute zero because 0° (F or C) does not mean absolutely no heat. The measurements do have the category property (all observations assigned the same number of degrees have the same amount of heat), the ordinal property (larger numbers indicate more heat), and the equal intervals property (on a particular scale, a difference of 1° always corresponds to a specific amount of heat).

    Many psychological variables are measured using scales that are between ordinal and interval scales. This statement holds for many of the variables included in the Maternity Study, such as marital satisfaction (for example, M1MARSAT), mother's positive affect during free play (MPOS), infant dysregulation during free play (IDYS), and child's internalizing behavior during free play (M7INT). Consider M7INT in a little more detail. To measure the variable, a mother was asked to rate her child's behavior in regard to nine questions such as, "Tends to be fearful or afraid of new things or new situations." The rating scale was 0 = does not apply; 1 = sometimes applies; 2 = frequently applies. Thus, the rating of each question forms an ordinal scale without the equal intervals property. But what happens when we sum the ratings from nine questions to get the M7INT score? It is unlikely that the difference in internalizing behavior between M7INT 10 and M7INT 11 is exactly the same as the difference between, say, M7INT 20 and M7INT 21. Nonetheless, it may well be that these two differences in internalizing behavior are fairly comparable, that is, that the scale is close to having the equal intervals property.

    The conservative (and always correct) approach to these "in-between" scales is to treat them as ordinal scales. As we will see in Part II, however, ordinal scales are at a disadvantage compared to interval scales when it comes to the range and power of statistical techniques that can be applied to the data. Recognizing this disadvantage, many psychologists treat the data from these in-between scales as interval data; that is, they treat the data as if the measurements were collected using an interval scale. One rule of thumb is that scores from the middle of an in-between scale are more likely to have the equal intervals property than scores from either end. If the data include scores from the ends of an in-between scale, it is best to treat the data conservatively as ordinal.

    Many scales for measuring physical qualities (length, weight, time) are ratio scales.

    A ratio scale is formed when the numbers assigned by the measurement rule have all four properties: category, ordinal, equal intervals, and absolute zero.

    The reason for the name ratio is that statements about ratios of measurements are meaningful only on a ratio scale. It makes sense to say that a line that is 2.5 centimeters long is half (a ratio) the length of a 5-centimeter line. Similarly, it makes sense to say that 20 seconds is twice (a ratio) the duration of 10 seconds.

    On the other hand, it does not make sense to say that 68°F is twice as hot as 34°F. This is easily demonstrated by converting to Centigrade measurements. Suppose that the temperature of Object A is 34°F (corresponding to about 1°C) and that the temperature of Object B is 68°F (corresponding to 20°C). Comparing the amount of heat in the objects using the Fahrenheit measurements seems to indicate that Object B is twice as hot as Object A, because 68 is twice 34. Comparing the measurements on the Centigrade scale (which of course does not change the real amount of heat in the objects), it seems that Object B is 20 times as hot as Object A. Object B cannot be 20 times as hot as Object A and at the same time be twice as hot. The problem is that statements about ratios are not meaningful unless the measurements are made using a ratio scale. Neither ratio (2:1 or 20:1) is right, because neither set of measurements was made using a ratio scale.
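The arithmetic behind this example can be verified with the standard temperature conversions. The sketch below is an illustration, not part of the original text; the function names are hypothetical. Note that the text rounds 34°F to 1°C, which gives a ratio of 20:1, whereas the unrounded conversion (about 1.1°C) gives roughly 18:1. The point is the same either way: the ratio changes with the scale. The Kelvin ratio is included because the Kelvin scale, unlike the other two, has an absolute zero.

```python
def fahrenheit_to_celsius(degrees_f):
    """Standard conversion from Fahrenheit to Centigrade (Celsius)."""
    return (degrees_f - 32) * 5 / 9

def fahrenheit_to_kelvin(degrees_f):
    """Standard conversion from Fahrenheit to Kelvin."""
    return fahrenheit_to_celsius(degrees_f) + 273.15

object_a_f = 34.0
object_b_f = 68.0

# The same two objects, three scales, three different "ratios".
ratio_fahrenheit = object_b_f / object_a_f
ratio_celsius = fahrenheit_to_celsius(object_b_f) / fahrenheit_to_celsius(object_a_f)
ratio_kelvin = fahrenheit_to_kelvin(object_b_f) / fahrenheit_to_kelvin(object_a_f)

print(round(ratio_fahrenheit, 2), round(ratio_celsius, 2), round(ratio_kelvin, 2))
```

Only the Kelvin figure, a ratio only slightly above 1, is computed on a scale with an absolute zero. The Fahrenheit and Centigrade ratios disagree with each other because each scale places its zero at an arbitrary point.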

    This problem does not occur when using a ratio scale. A 5-centimeter (2-inch) line is twice as long as a 2.5-centimeter (1-inch) line, and that is true whether the measurements are made in centimeters, inches, or any other ratio measurement of length.

    Several variables in the Smoking Study are measured using ratio scales. One example is the carbon monoxide level at the end of treatment measured in parts per million (CO_EOT), and another is the number of times the participant has tried to quit smoking (QUIT).

    Importance of Scale Types

    The question that may be uppermost in your mind is, "So what?" There are three reasons why knowing about scale types is important. First, now that you know about scale types you will be less likely to make unsupportable statements about data. One such statement is the use of ratio comparisons when the data are not measured using a ratio scale. For example, consider a teacher who gives a spelling test and observes that Alice spelled 10 words correctly, whereas Bill spelled only 5 words correctly. Certainly, Alice spelled twice as many words correctly as did Bill. Nevertheless, the number of words correct on a spelling test is not a ratio measurement of spelling ability (zero words correct does not necessarily mean zero spelling ability). So, although it is perfectly correct to say that Alice spelled twice as many words correctly as did Bill, it is silly to say that Alice is twice as good a speller as is Bill. Similarly, it is not legitimate to claim that a child with an internalizing score (M7INT) of 20 internalizes twice as much as a child with a score of 10.

    Second, the types of descriptive statistical procedures that can be applied to data depend in part on the scale type. Although some types of descriptions can be applied to data regardless of the scale type, others are appropriate only for interval or ratio scales, and still others are appropriate for ordinal, interval, and ratio scales, but not nominal scales.

    Third, the types of inferential statistical procedures that can be applied to data depend in part on the measurement scale.

    Given these three reasons, it is clear that if you want to learn from data you must be able to determine what sort of scale was used in collecting the data. The only way to know the scale type is to determine the properties of the numbers assigned using that rule. If the only property of the measurements is the category property, then the data are nominal; if the measurements have both the category and ordinal properties, then the data are ordinal; if, in addition, the data have the equal intervals property, then the data are interval. Only if the data have all four properties are they ratio.
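The decision rule in the preceding paragraph can be sketched as a short function. This is an illustration under assumptions: the function name and parameter names are hypothetical, and the function assumes the properties are cumulative (a measurement rule is not credited with a later property unless it has all the earlier ones), which is how the four scale types are defined here.

```python
def scale_type(has_category, has_ordinal=False,
               has_equal_intervals=False, has_absolute_zero=False):
    """Name the scale type implied by the properties of the assigned numbers.
    Properties are checked in order; the first missing one decides the type."""
    if not has_category:
        raise ValueError("numbers used as measurements need at least the category property")
    if not has_ordinal:
        return "nominal"
    if not has_equal_intervals:
        return "ordinal"
    if not has_absolute_zero:
        return "interval"
    return "ratio"

# The chapter's running examples, classified by their properties.
country_of_manufacture = scale_type(True)
cooperation_ranks = scale_type(True, True)
fahrenheit_temperature = scale_type(True, True, True)
ruler_length = scale_type(True, True, True, True)

print(country_of_manufacture, cooperation_ranks, fahrenheit_temperature, ruler_length)
```

The function only restates the definitions. Deciding whether a given set of measurements really has a property, such as equal intervals for a summed rating scale, still requires the kind of judgment discussed above.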

    Now that you understand the importance of scale types, it may be helpful to read this section again. Your ability to distinguish among scale types will be used throughout this textbook and in all of your dealings with behavioral data.

    USING COMPUTERS TO LEARN FROM DATA

    Data analysis often involves some pretty tedious computations, such as adding columns of numbers. Much of this drudgery can be eliminated by using a computer program such as Excel, and Learning From Data is written to be used with that program. The CD that comes with this book provides the files that your Excel program requires to mesh with the book. First, open up the "ReadMe" file and follow the instructions for loading the Excel Add-ins. These Add-ins provide computer routines that exactly match those used in the book. Second, if you are not familiar with basic Excel operations (e.g., for entering data in a spreadsheet or for selecting rows and columns), you should run the Excel tutorial. Third, the CD includes numerous data files. Two large data files provide the data from the Maternity and Smoking studies. Other data files provide the data used in all of the major worked-out examples and the end-of-chapter exercises.

    What Statistical Analysis Programs Can Do for You

    The programs have two main benefits. First, they eliminate the drudgery of doing lots of calculations. Second, they ensure accuracy of calculation. A benefit that flows from these two is that the programs make it easy to explore data by conducting multiple analyses.

    What the Programs Cannot Do for You

    Almost everything that is important is not done by the programs. The essence of statistical analysis is choice (choosing the right statistical method and interpretation of the outcome of the chosen method). The programs cannot choose the appropriate methods for you. Similarly, the programs do not know whether a data set is a sample, a random sample, or a population. Consequently, the program cannot adequately interpret the output. Learning From Data teaches you how to make good choices and how to interpret the outcome of the statistical methods; the computer eliminates the drudgery.

    Because the computer program does the calculations, you might think that you can ignore the formulas in the text. That would be a big mistake for several reasons. First, for small sets of data it is easier to do calculations by hand (or using a calculator) rather than using a computer. But to do the calculations by hand, you need to know the formulas. Second, following the formulas is often the best way to figure out exactly what the statistical technique is doing and how it works. Working through the formulas can be hard intellectual labor, but that is the only way to understand what they do.

    SUMMARY

    The behavioral sciences are built on a foundation of data. Unfortunately, because behavioral data consist of measurements of variables, individual measurements will differ from one another so that no clear picture is immediately evident. Fortunately, we can learn from variable data by applying statistical procedures.

    Descriptive statistical procedures organize, describe, and summarize data. Descriptive statistical procedures can be applied to samples or to populations, but because we rarely have all the scores in a population, descriptive procedures are generally applied to data from samples. We use inferential statistical procedures to make educated guesses (inferences) about a population of scores based on a random sample of scores from the population. Although these inferences are not error-free, appropriate use of inferential statistical procedures can reduce the chance of error to acceptable levels (for example, the margin of error in a poll).

    Appropriateness of a statistical procedure depends in part on the type of measurement scale used in collecting the data. The measurement scale is determined by the properties of the numbers (assigned by the measurement). If the measurements have the category, ordinal, equal intervals, and absolute zero properties, then a ratio scale is formed; if the measurements have all but the absolute zero property, an interval scale is formed. If the measurements have only the category and ordinal properties, they form an ordinal scale. Finally, if the measurements have only the category property, they form a nominal scale.

    EXERCISES

    Terms

    Define these new terms.

        variable
        constant
        sample
        random sample
        population
        descriptive statistical procedure
        inferential statistical procedure
        validity
        reliability
        measurement
        category property
        ordinal property
        equal intervals property
        absolute zero property
        nominal scale
        ordinal scale
        interval scale
        ratio scale

    Questions

    Answer the following questions. (Answers to marked questions are given in the back of the book.)

    1. Why would there be no need for descriptive or inferential statistical procedures if behavioral scientists could measure constants instead of variables?

    2. List 10 different variables and 1 constant in the behavioral sciences.

    3. Classify each of the following as a population, a sample, or both. When the answer is both, describe the circumstances under which the data should be considered a population and under which they should be considered a sample.

        a. Family incomes of all families in the United States.
        b. Family incomes of all families in Wisconsin.
        c. The number of words recalled from a list of 50 words by 25 first-year college students who volunteer to take part in an experiment.
        d. The number of days spent in intensive care for all people who have undergone heart transplant surgery.
        e. The number of errors made by rats learning a maze.


    4. Describe two examples of each of the four types of measurement scales. Indicate why each is an example of its type.

    5. If you had a choice between using nominal, ordinal, interval, or ratio scales to measure a variable, what would be the best choice? Why?

    6. A set of scores can be one type of scale or another, depending on what the set of scores represents. Consider the number of errors made by rats in learning a maze. If the data represent simply the number of errors, then the scores form a ratio scale. The numbers have all four properties, and it makes perfectly good sense to say that if Rat A made 30 errors and Rat B made 15 errors, then Rat A made twice as many errors as Rat B. Suppose, however, that the scores are used as a measure of rat intelligence. Are these scores a ratio measure of intelligence? Explain your answer. What are some of the implications of your answer?

    7. Determine the type of measurement scale used in each of the following situations:

        a. A supervisor ranks his employees from least to most productive.
        b. Students rate their statistics teacher's teaching ability using a scale of 1 (awful) to 10 (magnificent).
        c. A sociologist classifies sexual preference as 0 (heterosexual), 1 (homosexual), 2 (bisexual), 3 (asexual), 4 (other).
        d. A psychologist measures the time to complete a problem-solving task.


  • PART I

    Descriptive Statistics

    2/ Frequency Distributions and Percentiles
    3/ Central Tendency and Variability
    4/ z Scores and Normal Distributions


    The three chapters in Part I provide an introduction to descriptive statistical techniques. All of these techniques are designed to help you organize and summarize your data without introducing distortions. As you will see, once the data have been organized, it is far easier to make sense of them; that is, it is far easier to understand what the data are telling you about the world.

    Three general types of descriptive techniques are covered. We begin in Chapter 2 with frequency distributions, a technique for arranging the scores in a sample or a population to reveal general trends. We will also learn how to use graphs to illustrate frequency distributions.

    A second descriptive technique is computing statistics that summarize frequency distributions with just a few numbers. In Chapter 3, we will learn how to compute several indices of central tendency, the most typical scores in a distribution. We will also learn how to summarize the variability of the scores in a distribution.

    Finally, we will consider two methods for describing relative location of individual scores within a distribution, that is, where a particular score stands relative to the others. Percentiles are introduced in Chapter 2. They are often used when reporting the results of standardized tests such as the Scholastic Aptitude Test (SAT) and American College Test (ACT). The other measure of relative standing is the standard score (or z score) discussed in Chapter 4. Standard scores are generally more useful than percentiles, but they require the same background to understand.

    All of these descriptive techniques form the underpinning for the remainder of this book, which deals with inferential statistical techniques. Statistical inference begins with a description of the data in a sample, and it is this description that is used to make inferences about a broader population.


    Chapter 2

    Frequency Distributions and Percentiles

    Frequency Distributions
        Relative Frequency
        Cumulative Frequency
    Grouped Frequency Distributions
        Constructing Grouped Distributions
    Graphing Frequency Distributions
        Histograms
        Frequency Polygons
        When to Use Histograms and Frequency Polygons
    Characteristics of Distributions
        Shape
        Central Tendency
        Variability
        Comparing Distributions
    Percentiles
        Percentile Ranks and Percentiles
        Three Precautions
    Computations Using Excel
        Constructing Frequency Distributions
        Estimating Percentile Ranks With Excel
        Estimating Percentiles
    Summary
    Exercises
        Terms
        Questions

    Collecting data means measuring observations of a variable. And, of course, these measurements will differ from one another. Given this variability, it is often difficult to make any sense of the data until they are analyzed and described. This chapter examines a basic technique for dealing with variability and describing data: forming a frequency distribution. When formed correctly, frequency distributions achieve the goals of all descriptive statistical techniques: They organize and summarize the data without distorting the information the data provide about the world.

    This chapter also introduces two related topics, graphical representation of distributions and percentiles. Graphical representations highlight the major features of distributions to facilitate learning from the data. Percentiles are a technique for determining the relative standing of individual measurements within a distribution.

    While reading this chapter, keep in mind that the procedures for constructing frequency distributions can be applied to populations and to samples. Because it is so rare to actually have all the scores in a population, however, frequency distributions are usually constructed from samples. Reflecting this fact, most of the examples in the chapter will involve samples.

    FREQUENCY DISTRIBUTIONS

    Suppose that you are working on a study of social development. Of particular interest is the age at which aggressive tendencies first appear in children. You begin data collection (measuring the aggressiveness variable) by asking the teacher of a preschool class to rate the aggressiveness of the 20 children in the class using the scale:

    Meaning                   Score Value
    potential for violence    5
    very aggressive           4
    somewhat aggressive       3
    average                   2
    timid                     1
    very timid                0

    The data are in Table 2.1. As is obvious, the data are variable; that is, the measurements differ from one another. It is also obvious that it is difficult to learn anything from these data as they are presented in Table 2.1. So as a first step in learning from the data, they can be organized and summarized by arranging them in the form of a frequency distribution.

    A frequency distribution is a tabulation of the number of occurrences of each score value.

    The frequency distribution for the aggressiveness data is given in Table 2.2. The second column lists the score values. The third column in Table 2.2 lists the frequency with which each score value appears in the data. Constructing the frequency distribution involves nothing more than counting the number of occurrences of each score value. There is a simple way to check whether the distribution has been properly constructed: The sum of the frequencies in the distribution should equal the number of observations in the sample (or population). As indicated in Table 2.2, the frequencies sum to 20, the number of observations.
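
    The counting procedure can be sketched in a few lines of Python. This is only an illustration, not part of the original procedure: the ratings are copied from Table 2.1, and the names ratings, score_values, and frequency are assumptions chosen for the example.

```python
from collections import Counter

# Aggressiveness ratings for children a through t, copied from Table 2.1.
ratings = [4, 3, 1, 1, 2, 0, 3, 3, 4, 2,
           3, 0, 4, 2, 3, 2, 3, 3, 1, 3]

# Possible score values on the rating scale (0 = very timid ... 5 = potential for violence).
score_values = range(6)

counts = Counter(ratings)
# A Counter returns 0 for a score value that never occurs, such as 5.
frequency = {value: counts[value] for value in score_values}

for value, f in frequency.items():
    print(value, f)

# Check described in the text: the frequencies sum to the number of observations.
print(sum(frequency.values()) == len(ratings))
```

    Listing every possible score value, rather than only the observed ones, keeps the zero-frequency row for the score value 5 in the distribution.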

    TABLE 2.1 Aggressiveness Ratings for 20 Preschoolers

    Child  Rating    Child  Rating    Child  Rating    Child  Rating
    a      4         f      0         k      3         p      2
    b      3         g      3         l      0         q      3
    c      1         h      3         m      4         r      3
    d      1         i      4         n      2         s      1
    e      2         j      2         o      3         t      3


    It is clear that the frequency distribution has a number of advantages over the listing of the data in Table 2.1. The frequency distribution organizes and summarizes the data, thereby highlighting the major characteristics. For example, it is now easy to see that the measurements in the sample range from a low of 0 to a high of 4. Also, most of the measurements are in the middle range of score values, and there are fewer measurements in the ends of the distribution.

    Another benefit provided by the frequency distribution is that the data are now easily communicated. To describe the data, you need to report only five pairs of numbers (score values and their frequencies).

    Try not to confuse the numbers representing the score values and the numbers representing the frequencies of the particular score values. For example, in Table 2.2 the number 4 appears in the column labeled score value and the column labeled frequency. The meaning of this number is quite different in the two columns, however. The score value of 4 means a particular level of aggressiveness (very aggressive). The frequency of 4 means the number of times a particular score value was observed in the data. In this case, a score value of 2 (average) was observed four times.

    To help overcome any confusion, be sure that you understand the distinctions among the following terms. Score value refers to a possible value on the measurement scale. Not all score values will necessarily appear in the data, however. If a particular score value is never assigned as a measurement (for example, the score value 5, potential for violence), then that score value would have a frequency of zero. Frequency refers to the number of times a particular score value occurs in the data. Finally, the terms measurement, observation, and score are used interchangeably to refer to a particular datum: the number assigned to a particular individual. Thus, in Table 2.2, the score value of 1 (timid) occurs with a frequency of 3. Similarly, there are three scores (measurements, observations) with the score value of 1 (timid).

    Relative Frequency

    An important type of frequency distribution is the relative frequency distribution.

    TABLE 2.2 Frequency Distributions for the Aggressiveness Data in Table 2.1

                              Score                Relative    Cumulative   Cumulative Relative
    Meaning                   Value    Frequency   Frequency   Frequency    Frequency
    Very Timid                0        2           .10         2            .10
    Timid                     1        3           .15         5            .25
    Average                   2        4           .20         9            .45
    Aggressive                3        8           .40         17           .85
    Very Aggressive           4        3           .15         20           1.00
    Potential for Violence    5        0           .00         20           1.00
    Total                              20          1.00


    The relative frequency of a score value is the proportion of observations in the distribution at that score value. A relative frequency distribution is a listing of the relative frequencies of each score value.

    The relative frequency of a score value is obtained by dividing the score value's frequency by the total number of observations (measurements) in the distribution. For example, the relative frequency of aggressive children (score value of 3) is 8/20 = .40.

    Relative frequency is closely related to percentage. Multiplying the relative frequency by 100 gives the percentage of observations at that score value. For these data, the percentage of children rated aggressive is .40 × 100 = 40%.

    The fourth column in Table 2.2 is the relative frequency distribution for the aggressiveness data. Note that all of the relative frequencies are between 0.0 and 1.0, as they must be. Also, the sum of the relative frequencies in the distribution will always equal 1.0. Thus, computing the sum is a quick way to ensure that the relative frequency distribution has been properly constructed.
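
    As a minimal sketch, the relative frequencies and percentages can be computed from the frequencies in Table 2.2 as follows. The dictionary layout and the names frequency, relative, and percentage are assumptions made for this example.

```python
# Frequencies from Table 2.2, keyed by score value.
frequency = {0: 2, 1: 3, 2: 4, 3: 8, 4: 3, 5: 0}

n = sum(frequency.values())  # total number of observations (20)

# Relative frequency: each frequency divided by the number of observations.
relative = {value: f / n for value, f in frequency.items()}

# Percentage: relative frequency multiplied by 100.
percentage = {value: 100 * p for value, p in relative.items()}

for value in frequency:
    print(f"{value}  {relative[value]:.2f}  {percentage[value]:.0f}%")

# Quick check from the text: the relative frequencies should sum to 1.0
# (compared with a small tolerance because of floating-point arithmetic).
print(abs(sum(relative.values()) - 1.0) < 1e-9)
```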

    Relative frequency distributions are often preferred over raw frequency distributions because the relative frequencies combine information about frequency with information about the number of measurements. This combination makes it easier to interpret the data. For example, suppose that an advertisement for Nationwide Beer informs you that in a scientifically selected sample, 90 people preferred Nationwide, compared to only 10 who preferred Brand X. You may conclude from these data that most people prefer Nationwide. Suppose, however, that the sample actually included 10,000 people, 90 of whom preferred Nationwide, 10 of whom preferred Brand X, and 9,900 of whom could not tell the difference. In this case, the relative frequencies are much more informative (for the consumer). The relative frequency of preference for Nationwide is only .009.
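
    The arithmetic behind the beer example is short enough to sketch directly. The counts come from the hypothetical sample described above; the name preference and its keys are assumptions for illustration.

```python
# Counts from the hypothetical Nationwide Beer sample described in the text.
preference = {"Nationwide": 90, "Brand X": 10, "no difference": 9900}

n = sum(preference.values())  # 10,000 people in the sample

# Relative frequency of each response.
relative = {choice: count / n for choice, count in preference.items()}

for choice, p in relative.items():
    print(f"{choice}: {p:.3f}")
```

    Reporting only the raw counts 90 and 10 hides the fact that 99% of the sample expressed no preference at all.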

    The same argument in favor of relative frequency can also be made (in a more modest way) for the data on aggressiveness. It is more informative to know that the relative frequency of very aggressive children is .15 than to simply know that three children were rated as very aggressive.

    When describing data from random samples, relative frequency has another advantage. The relative frequency of a score value in a random sample is a good guess for the relative frequency of that score value in the population from which the random sample was selected. There is no corresponding relation between frequencies in a sample and frequencies in a population.

    Cumulative Frequency

    Another type of distribution is the cumulative frequency distribution.

    A cumulative frequency distribution is a tabulation of the frequency of all measurements at or smaller than a given score value.

    The fifth column in Table 2.2 is the cumulative frequency distribution for the aggressiveness scores. The cumulative frequency of a score value is the frequency of that score value plus the frequency of all smaller score values. The cumulative frequency of a score value of zero (very timid) is 2. The cumulative frequency of a score value of 1 (timid) is obtained by adding 3 (the frequency of timid) plus 2 (the frequency of very timid) to get 5. Note that the cumulative frequency of the largest score value (5) equals 20, the total number of observations. This must be the case, because cumulative frequency is the frequency of all observations at or smaller than a given score value, and all of the observations must be at or smaller than the largest score value. Also, note that the cumulative frequencies can never decrease when going from the lowest to the highest score value. The reason is that the cumulative frequency of the next higher score value is always obtained by adding to the lower cumulative frequency.
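
    A running sum captures this procedure. The sketch below uses itertools.accumulate from the Python standard library on the frequencies from Table 2.2; the variable names are assumptions for the example.

```python
from itertools import accumulate

# Score values in increasing order, with their frequencies from Table 2.2.
score_values = [0, 1, 2, 3, 4, 5]
frequency = [2, 3, 4, 8, 3, 0]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequency))

for value, c in zip(score_values, cumulative):
    print(value, c)

# The two properties noted in the text:
print(cumulative[-1] == sum(frequency))                       # last entry equals the number of observations
print(all(a <= b for a, b in zip(cumulative, cumulative[1:])))  # the sequence never decreases
```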

    The notion of "at or smaller" implies that the score values can be ordered so that we can determine what is smaller. Thus, cumulative frequency distributions are usually not appropriate for nominal data.

    A cumulative relative frequency distribution is a tabulation of the relative frequencies of all measurements at or below a given score value.

    The last column in Table 2.2 lists the cumulative relative frequencies for the aggressiveness data. These numbers are obtained by adding up the relative frequencies of all score values at or smaller than a given score value.
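
    A minimal sketch of the same calculation follows. Instead of adding up the relative frequencies one by one, it divides each cumulative frequency by the number of observations, which gives the same values while avoiding accumulated rounding error; that shortcut, like the variable names, is a choice made for this example rather than the procedure stated above.

```python
from itertools import accumulate

# Frequencies for score values 0 through 5, from Table 2.2.
frequency = [2, 3, 4, 8, 3, 0]
n = sum(frequency)

# Cumulative frequency divided by n equals the running sum of relative frequencies.
cumulative_relative = [c / n for c in accumulate(frequency)]

for value, p in enumerate(cumulative_relative):
    print(f"{value}  {p:.2f}")
```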

    Cumulative frequency distributions are most often used when computing percentiles. We shall postpone further discussion of these distributions until that section of the chapter.

    GROUPED FREQUENCY DISTRIBUTIONS

    The aggressiveness data were particularly amenable to description by frequency distributions in part because there were only a few score values. Sometimes, however, the data are not so accommodating, and a more sophisticated approach is called for.

    Consider, for example, the first 60 measurements on the YRSMK variable in the Smoking Study (Table 2.3). Because the measurements are variable, it is difficult to learn anything from the data as presented in this table.

    The frequency distribution is presented in Table 2.4. As you can see, the frequency distribution for these data does not provide a very useful summary of the data. The problem is that there are too many different score values.

    TABLE 2.3 YRSMK: Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

     5   13   17   20   19   35   21   28    3   22
    26   13   30   30   30   32   40   27   14    4
    27   33   28   45   29   25   38   35   33   39
     5    4   20   24   25   27   16   25   38    9
    36   20   18   11   12   23   22   27   32   49
    22   30    0   32    4   23    9   29   22   23


    The solution is to group the data into clusters called class intervals.

    A class interval is a range of score values. A grouped frequency distribution is a tabulation of the number of measurements in each class interval.

    The grouped frequency distribution is presented in Table 2.5. The class intervals are listed on the left. The lowest interval, 0-4, contains all of the measurements between (and including) 0 and 4. The next interval, 5-9, contains the measurements between 5 and 9, and so on.

    Clearly, the data in the grouped distribution are much more easily interpreted than when the data are ungrouped. We can now see that most of the people in this sample have been smoking for 20-30 years, although there are a few who have been smoking for more than 45 years and a few who have been smoking only a couple of years.

    Relative and cumulative frequency distributions can also be formed from grouped data. Relative frequencies are formed by dividing the frequency in each class interval by the total number of measurements. Cumulative distributions are formed by adding up the frequencies (or relative frequencies) of all class intervals at or below a given class interval. These distributions are also given in Table 2.5.
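
    The grouped distributions can be reproduced with a short Python sketch. The scores are copied from Table 2.3 and the interval width of 5 starting at 0 follows Table 2.5; the integer-division trick for locating a score's interval and all variable names are assumptions made for this illustration.

```python
from itertools import accumulate

# YRSMK scores (years smoking daily), copied row by row from Table 2.3.
yrsmk = [
     5, 13, 17, 20, 19, 35, 21, 28,  3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14,  4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
     5,  4, 20, 24, 25, 27, 16, 25, 38,  9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30,  0, 32,  4, 23,  9, 29, 22, 23,
]

start, width = 0, 5                               # lowest interval is 0-4
n_intervals = (max(yrsmk) - start) // width + 1   # intervals 0-4 through 45-49

# Tally each score into its class interval using integer division.
frequency = [0] * n_intervals
for score in yrsmk:
    frequency[(score - start) // width] += 1

n = len(yrsmk)
cumulative = list(accumulate(frequency))

# Columns: interval, frequency, relative, cumulative, cumulative relative.
for i, f in enumerate(frequency):
    lower = start + i * width
    upper = lower + width - 1
    print(f"{lower:2d}-{upper:2d}  {f:2d}  {f / n:.3f}  {cumulative[i]:2d}  {cumulative[i] / n:.3f}")
```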

    TABLE 2.4 Frequency Distribution of First 60 YRSMK Scores

    Score Value  Frequency    Score Value  Frequency    Score Value  Frequency
    0            1            17           1            34           0
    1            0            18           1            35           2
    2            0            19           1            36           1
    3            1            20           3            37           0
    4            3            21           1            38           2
    5            2            22           4            39           1
    6            0            23           3            40           1
    7            0            24           1            41           0
    8            0            25           3            42           0
    9            2            26           1            43           0
    10           1            27           4            44           0
    11           1            28           2            45           1
    12           0            29           2            46           0
    13           2            30           4            47           0
    14           1            31           0            48           0
    15           0            32           3            49           1
    16           1            33           2

    TABLE 2.5 Grouped Frequency Distributions for YRSMK Scores in Table 2.3

    Class                    Relative    Cumulative   Cumulative Relative
    Interval    Frequency    Frequency   Frequency    Frequency
    0-4         5            .083        5            .083
    5-9         4            .067        9            .150
    10-14       5            .083        14           .233
    15-19       4            .067        18           .300
    20-24       12           .200        30           .500
    25-29       12           .200        42           .700
    30-34       9            .150        51           .850
    35-39       6            .100        57           .950
    40-44       1            .017        58           .967
    45-49       2            .033        60           1.00
    Total       60           1.000

    TABLE 2.6 Grouped Frequency Distributions for YRSMK Scores in Table 2.3

    Interval    f (YRSMK)
    0-19        18
    20-39       39
    40-59       3
    Total       60

    Constructing Grouped Distributions

    A grouped frequency distribution should summarize the data without distorting them. Summarization is accomplished by forming class intervals; if the intervals are inappropriate (for example, too big), however, the data are distorted. As an example of distortion, Table 2.6 summarizes the YRSMK data from Table 2.3 using three large intervals. Indeed, the data are summarized, but important information regarding how the measurements are distributed is lost. The following steps should be used to construct grouped distributions that summarize but do not distort.

    Guidelines for grouped frequency intervals:

    1. There should be between 8 and 15 intervals.
    2. Use convenient class interval sizes, like 2, 3, 5, or multiples of 5.
    3. Start the first interval at or below your lowest score.

    To construct a good grouped frequency distribution:

    1. Compute the range of your scores by subtracting the lowest score from the highest score (Range = High Score - Low Score).

    2. Divide the range by 8 and 15. Find a convenient number in between those two values. That will be your class interval. This is also known as your bin width.

    3. Select a starting value. The starting value could be your lowest score, but if your class interval is a multiple of 5, then you may want to select a more convenient, and hence lower, starting point. For example, the intervals 0-9, 10-19, 20-29, etc., work very well if you have determined that a class interval of 10 is appropriate. If your lowest score is 3, the intervals 3-12, 13-22, 23-32, etc., do not seem as intuitive as 0-9, 10-19, 20-29, etc. (or 1-10, 11-20, 21-30, etc.).

    4. Beginning with your starting value, construct intervals of increasing value.

    5. Count the number (frequency) of scores in each interval.

    One other step is needed when the measurements contain decimals instead of whole numbers. In these cases, all of the measurements should be rounded so that they have the same number of decimal places.

    These steps were used to construct the grouped frequency distribution in Table 2.5. For Step 1, the range was computed as 49 (49 - 0). For Step 2, the range was divided by 8 (49/8 = 6.125) and 15 (49/15 = 3.267), and a convenient number in between those two (5) was selected as the class interval. For Step 3, because the lowest score was 0, the starting value was set at 0. For Step 4, starting with 0, consecutive intervals of width 5 were constructed: 0-4, 5-9, 10-14, etc.
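
    Steps 1 through 4 can be sketched as two small Python functions. The function names convenient_widths and class_intervals, and the particular list of candidate widths, are hypothetical choices for this illustration; the text only asks for a convenient number between the two quotients.

```python
def convenient_widths(low, high, candidates=(2, 3, 5, 10, 15, 20, 25)):
    """Return the candidate class-interval sizes that give roughly 8 to 15 intervals."""
    score_range = high - low        # Step 1: range = high score - low score
    narrowest = score_range / 15    # Step 2: width that would give about 15 intervals
    widest = score_range / 8        #         width that would give about 8 intervals
    return [w for w in candidates if narrowest <= w <= widest]


def class_intervals(start, high, width):
    """Steps 3 and 4: whole-number intervals of the given width, beginning at start."""
    intervals = []
    lower = start
    while lower <= high:
        intervals.append((lower, lower + width - 1))
        lower += width
    return intervals


# YRSMK example: lowest score 0, highest score 49, so the range is 49,
# 49/15 is about 3.267, and 49/8 is 6.125.
widths = convenient_widths(0, 49)
intervals = class_intervals(0, 49, widths[0])
print(widths)
print(intervals)
```

    Step 5, counting the scores in each interval, then proceeds exactly as for an ungrouped frequency distribution.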

    Note that the interval 5-9 includes the five score values 5, 6, 7, 8, and 9. Thus, the interval size really is 5, even though the difference between 9 and 5 is 4.

    Once the lowest interval is specified, the remaining class intervals are easily constructed. Each successive interval is formed by adding the interval size (5) to the bounds of the preceding interval. For example, the interval 10-14 was obtained by adding 5 to both the lower and upper bounds of the interval 5-9. Finally, tabulate the number of measurements within each interval to construct the frequency distribution.

    For a second example of grouping, consider the data in Table 2.7. The 60 measurements in this table are from the Smoking Study. Each measurement is a participant's score on the Wisconsin Inventory of Smoking Dependence Motives (WISDM), which are ratings on 65 questions such as "Does smoking make a good mood better?"

    TABLE 2.7 First 60 Scores on the Wisconsin Inventory of Smoking Dependence Motives (WISDM)

    52.9952   60.7071   53.2262   82.0333   65.9119   59.7071
    44.4405   38.3167   62.2333   39.6786   46.2762   52.1119
    68.4571   66.4476   28.2667   60.6667   50.0857   44.5690
    33.6786   55.1119   21.8190   27.6929   53.1310   36.5500
    57.4857   63.9262   60.0548   50.8071   61.2405   66.5810
    55.3071   28.4643   43.9143   67.8524   54.7310   52.9429
    60.8405   60.7238   51.0786   35.5071   54.2524   65.5429
    60.1310   78.9357   65.1976   32.4833   51.2381   48.5786
    62.5905   80.6071   54.0476   68.8190   52.1738   55.4214
    61.4619   53.3571   35.8976   59.3190   68.1143   62.9429


    Because these measurements contain decimals, we begin by making sure that all have the same number of decimal places, as they do. For Step 1, we compute the range of scores by subtracting the lowest score (21.8190) from the highest score (82.0333) to arrive at 60.2143.

    In Step 2, we divide the range by 8 and 15: 60.2143/8 = 7.526788 and 60.2143/15 = 4.014287. We choose a number between these two results, preferably a multiple of 2, 3, or 5. We might choose 5.6901, which is a number between 7.526788 and 4.014287, and is divisible by 3, but 5.6901 will not serve as a convenient class interval. Rather, 5 is between 7.526788 and 4.014287, divisible by 5 (obviously), and convenient.

    The third step, selecting a starting value, could be set at the lowest score, 21.8190, but 20.0000 seems more intuitive. The first interval, therefore, will be 20.0000-24.9999, the next 25.0000-29.9999, and so on. The final step is to tabulate the number of measurements in each interval to obtain the frequency distribution, and then divide each frequency by the total number of observations to obtain the relative frequency distribution.
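
    The same steps applied to the WISDM scores can be sketched as follows. The scores are copied from Table 2.7, and the starting value of 20 and width of 5 are the ones chosen above; the variable names and the use of floor division to find each score's interval are assumptions made for this example.

```python
# WISDM scores, copied row by row from Table 2.7.
wisdm = [
    52.9952, 60.7071, 53.2262, 82.0333, 65.9119, 59.7071,
    44.4405, 38.3167, 62.2333, 39.6786, 46.2762, 52.1119,
    68.4571, 66.4476, 28.2667, 60.6667, 50.0857, 44.5690,
    33.6786, 55.1119, 21.8190, 27.6929, 53.1310, 36.5500,
    57.4857, 63.9262, 60.0548, 50.8071, 61.2405, 66.5810,
    55.3071, 28.4643, 43.9143, 67.8524, 54.7310, 52.9429,
    60.8405, 60.7238, 51.0786, 35.5071, 54.2524, 65.5429,
    60.1310, 78.9357, 65.1976, 32.4833, 51.2381, 48.5786,
    62.5905, 80.6071, 54.0476, 68.8190, 52.1738, 55.4214,
    61.4619, 53.3571, 35.8976, 59.3190, 68.1143, 62.9429,
]

score_range = max(wisdm) - min(wisdm)      # Step 1: 82.0333 - 21.8190
lower_limit = score_range / 15             # Step 2: smallest reasonable width
upper_limit = score_range / 8              #         largest reasonable width
start, width = 20.0, 5.0                   # Step 3 and the chosen class interval

# Steps 4 and 5: build the intervals and count the scores in each one.
n_intervals = int((max(wisdm) - start) // width) + 1
frequency = [0] * n_intervals
for score in wisdm:
    frequency[int((score - start) // width)] += 1

# Relative frequency: each count divided by the number of observations.
relative = [f / len(wisdm) for f in frequency]

print(f"range = {score_range:.4f}, width between {lower_limit:.6f} and {upper_limit:.6f}")
for i, p in enumerate(relative):
    lower = start + i * width
    print(f"{lower:.4f}-{lower + width - 0.0001:.4f}  {p:.3f}")
```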

    As you can tell from Table 2.8, these data are very interesting. The distribution appears top-heavy. In other words, more than half of the scores are greater than 50. This may not be unexpected, though, for it is a measure of smoking motives and smokers (which all the participants in the study are) may have many motives to smoke. Nevertheless, these data may be important to the study's designers because they can show that their participants were highly motivated to smoke, as opposed to participants who weren't motivated to smoke. In the end, the study's authors, if the experiment is successful, can claim that their intervention works for people highly motivated to smoke.

    TABLE 2.8 Relative Frequency Distribution for WISDM Scores (First 60 Subjects)

    Class Interval       Relative Frequency
    20.0000-24.9999      0.017
    25.0000-29.9999      0.050
    30.0000-34.9999      0.033
    35.0000-39.9999      0.083
    40.0000-44.9999      0.050
    45.0000-49.9999      0.033
    50.0000-54.9999      0.233
    55.0000-59.9999      0.100
    60.0000-64.9999      0.200
    65.0000-69.9999      0.150
    70.0000-74.9999      0.000
    75.0000-79.9999      0.017
    80.0000-84.9999      0.033
    Total                1.000


    GRAPHING FREQUENCY DISTRIBUTIONS

    Displaying a frequency distribution as a graph can highlight important features of the data. Graphs of frequency distributions are always drawn using two axes.

    The abscissa or x-axis is the horizontal axis. For frequency and relative frequency distributions, the abscissa is marked in units of the variable being measured, and it is labeled with the variable's name. The ordinate or y-axis is marked in units of frequency or relative frequency, and so labeled.

    In Figure 2.1, the abscissa is labeled with values of the aggressiveness variable for the distribution in Table 2.2. The ordinate is marked to represent relative frequency of the measurements. Techniques for graphing frequency and relative frequency distributions are almost exactly the same. The only difference is in how the ordinate is marked. Because relative frequency is generally more useful than raw frequency, the examples that follow are for relative frequency distributions.

    Histograms

    Figure 2.1 is a relative frequency histogram for the aggressiveness data.

    A relative frequency histogram uses the heights of bars to represent relative frequencies of score values (or class intervals).

    FIGURE 2.1 Relative frequency histogram for the aggressiveness scores in Table 2.2. (Abscissa: aggressiveness score, 0 to 5. Ordinate: relative frequency, marked from 0.10 to 0.40.)


    To construct the histogram, place a bar over each score value. The bar extends up to the appropriate frequency mark on the ordinate. Thus, a bar's height is a visual analogue of the score value's relative frequency: the higher the bar, the greater the relative frequency.
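
    As a rough sketch of the idea, the Python code below draws a text version of the histogram for the Table 2.2 data. It is not the figure from the book: the bars run sideways, so bar length stands in for bar height, and the function name text_histogram and the choice of one "#" per .05 of relative frequency are assumptions made for this illustration.

```python
# Frequencies from Table 2.2, keyed by aggressiveness score.
frequency = {0: 2, 1: 3, 2: 4, 3: 8, 4: 3, 5: 0}
n = sum(frequency.values())


def text_histogram(frequency, n, step=0.05):
    """Build one row per score value; each '#' stands for `step` of relative frequency."""
    rows = []
    for value, f in frequency.items():
        relative = f / n
        bar = "#" * round(relative / step)
        rows.append(f"{value} | {bar} {relative:.2f}")
    return rows


rows = text_histogram(frequency, n)
for row in rows:
    print(row)
```

    The longest bar belongs to the score value 3, matching the largest relative frequency (.40) in Table 2.2.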

    Relative frequency histograms can also be drawn for grouped distributions. For these distributions, a bar is placed over each class interval.

    Figure 2.2 is a relative frequency histogram of the YRSMK scores in Table 2.5. Sometimes, only the midpoints of each interval are shown on the abscissa. The midpoint of a class interval is the average of the interval's lower bound and the upper bound. Again, the height of each bar corresponds to its relative frequency.
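
    The midpoint calculation is simple enough to show directly. The intervals below are the class intervals of Table 2.5; the names intervals and midpoints are assumptions for this sketch.

```python
# Class intervals (lower bound, upper bound) from Table 2.5.
intervals = [(0, 4), (5, 9), (10, 14), (15, 19), (20, 24),
             (25, 29), (30, 34), (35, 39), (40, 44), (45, 49)]

# Midpoint: the average of the interval's lower and upper bounds.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

for (lower, upper), mid in zip(intervals, midpoints):
    print(f"{lower}-{upper}: midpoint {mid}")
```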

    The relative frequency histogram illustrated in Figure 2.2 makes particularly clear some of the salient characteristics of the distribution. For example, it is easy to see that most of the scores are in the middle of the distribution and that there is a decrease in frequency from the moderate scores to the higher scores.

    Frequency Polygons

    Figure 2.3 is an example of a relative frequency polygon using the WISDM scores in Table 2.8. The axes of a relative frequency polygon are the same as for a histogram. However, instead of placing a bar over each midpoint (or score value), a dot is placed over