learning from data - an introduction to statistical reasoning

LEARNING FROM DATA: AN INTRODUCTION TO STATISTICAL REASONING

THIRD EDITION

ER9405.indb 1 7/5/07 11:05:50 AM


ARTHUR M. GLENBERG

MATTHEW E. ANDRZEJEWSKI

Lawrence Erlbaum Associates

New York London


Lawrence Erlbaum Associates
Taylor & Francis Group
270 Madison Avenue
New York, NY 10016

Lawrence Erlbaum Associates
Taylor & Francis Group
2 Park Square
Milton Park, Abingdon
Oxon OX14 4RN

© 2008 by Taylor & Francis Group, LLC
Lawrence Erlbaum Associates is an imprint of Taylor & Francis Group, an Informa business

Printed in the United States of America on acid-free paper
10 9 8 7 6 5 4 3 2 1

International Standard Book Number-13: 978-0-8058-4921-9 (Hardcover)

No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Glenberg, Arthur M.
Learning from data : an introduction to statistical reasoning / Arthur M. Glenberg and Matthew E. Andrzejewski. 3rd ed.
p. cm.
Includes bibliographical references and index.
ISBN-13: 978-0-8058-4921-9 (alk. paper)
1. Statistics. I. Andrzejewski, Matthew E. II. Title.

HA29.G57 2008
001.422 dc22
2007022035

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com


Contents

Preface

Chapter 1. Why Statistics?
  Variability
  Populations and Samples
  Descriptive and Inferential Statistical Procedures
  Measurement
  Using Computers to Learn From Data
  Summary
  Exercises

Part I. Descriptive Statistics

Chapter 2. Frequency Distributions and Percentiles
  Frequency Distributions
  Grouped Frequency Distributions
  Graphing Frequency Distributions
  Characteristics of Distributions
  Percentiles
  Computations Using Excel
  Summary
  Exercises

Chapter 3. Central Tendency and Variability
  Sigma Notation
  Measures of Central Tendency
  Measures of Variability
  Summary
  Exercises

Chapter 4. z Scores and Normal Distributions
  Standard Scores (z Scores)
  Characteristics of z Scores
  Normal Distributions
  Using the Standard Normal Distribution
  Other Standardized Scores
  Summary
  Exercises

Part II. Introduction to Inferential Statistics

Chapter 5. Overview of Inferential Statistics
  Why Inferential Procedures Are Needed
  Varieties of Inferential Procedures
  Random Sampling
  Biased Sampling
  Overgeneralizing
  Summary
  Exercises

Chapter 6. Probability
  Probabilities of Events
  Probability and Relative Frequency
  Discrete Probability Distributions
  The Or-rule for Mutually Exclusive Events
  Conditional Probabilities
  Probability and Continuous Variables
  Summary
  Exercises

Chapter 7. Sampling Distributions
  Constructing a Sampling Distribution
  Two Sampling Distributions
  Sampling Distributions Used in Statistical Inference
  Sampling Distribution of the Sample Mean
  Review of Symbols and Concepts
  z Scores and the Sampling Distribution of the Sample Mean
  A Preview of Inferential Statistics
  Summary
  Exercises

Chapter 8. Logic of Hypothesis Testing
  Step 1: Check the Assumptions of the Statistical Procedure
  Step 2: Generate the Null and Alternative Hypotheses
  Step 3: Sampling Distribution of the Test Statistic
  Step 4: Set the Significance Level and Formulate the Decision Rule
  Step 5: Randomly Sample From the Population and Compute the Test Statistic
  Step 6: Apply the Decision Rule and Draw Conclusions
  When H0 Is Not Rejected
  Brief Review
  Errors in Hypothesis Testing: Type I Errors
  Type II Errors
  Outcomes of a Statistical Test
  Directional Alternative Hypotheses
  A Second Example
  A Third Example
  Summary
  Exercises

Chapter 9. Power
  Calculating Power Using z Scores
  Factors Affecting Power
  Effect Size
  Computing Procedures for Power and Sample Size Determination
  When to Use Power Analyses
  Summary
  Exercises

Chapter 10. Logic of Parameter Estimation
  Point Estimation
  Interval Estimation
  Constructing Confidence Limits for μ When σ Is Known
  Why the Formula Works
  Factors That Affect the Width of the Confidence Interval
  Comparison of Interval Estimation and Hypothesis Testing
  Summary
  Exercises

Part III. Applications of Inferential Statistics

Chapter 11. Inferences About Population Proportions Using the z Statistic
  The Binomial Experiment
  Testing Hypotheses About π
  Testing a Directional Alternative Hypothesis About π
  Power and Sample Size Analyses
  Estimating π
  Related Statistical Procedures
  Summary
  Exercises

Chapter 12. Inferences About μ When σ Is Unknown: The Single-sample t Test
  Why s Cannot Be Used to Compute z
  The t Statistic
  Using t to Test Hypotheses About μ
  Example Using a Directional Alternative
  Power and Sample Size Analyses
  Estimating μ When σ Is Not Known
  Summary
  Exercises

Chapter 13. Comparing Two Populations: Independent Samples
  Comparing Naturally Occurring and Hypothetical Populations
  Independent and Dependent Sampling From Populations
  Sampling Distribution of the Difference Between Sample Means (Independent Samples)
  The t Distribution for Independent Samples
  Hypothesis Testing
  A Second Example of Hypothesis Testing
  Power and Sample Size Analyses
  Estimating the Difference Between Two Population Means
  The Rank-sum Test for Independent Samples
  Summary
  Exercises

Chapter 14. Random Sampling, Random Assignment, and Causality
  Random Sampling
  Experiments in the Behavioral Sciences
  Random Assignment Can (Sometimes) Be Used Instead of Random Sampling
  Interpreting the Results Based on Random Assignment
  Review
  A Second Example
  Summary
  Exercises

Chapter 15. Comparing Two Populations: Dependent Samples
  Dependent Sampling
  Sampling Distributions of the Dependent-sample t Statistic
  Hypothesis Testing Using the Dependent-sample t Statistic
  A Second Example
  Power and Sample Size Analyses
  Estimating the Difference Between Two Population Means
  The Wilcoxon Tm Test
  Hypothesis Testing Using the Wilcoxon Tm Statistic
  Summary
  Exercises

Chapter 16. Comparing Two Population Variances: The F Statistic
  The F Statistic
  Testing Hypotheses About Population Variances
  A Second Example
  Estimating the Ratio of Two Population Variances
  Summary
  Exercises

Chapter 17. Comparing Multiple Population Means: One-factor ANOVA
  Factors and Treatments
  How the Independent-sample One-factor ANOVA Works
  Testing Hypotheses Using the Independent-sample ANOVA
  Comparisons Between Selected Population Means: The Protected t Test
  A Second Example of the Independent-sample One-factor ANOVA
  One-factor ANOVA for Dependent Samples
  A Second Dependent-sample One-factor ANOVA
  Kruskal-Wallis H Test: Nonparametric Analogue for the Independent-sample One-factor ANOVA
  Friedman Fr Test: Nonparametric Analogue for the Dependent-sample One-factor ANOVA
  Summary
  Exercises

Chapter 18. Introduction to Factorial Designs
  The Two-factor Factorial Experiment: Combining Two Experiments Into One
  Learning From a Factorial Experiment
  A Second Example of a Factorial Experiment
  Graphing the Results of a Factorial Experiment
  Design of Factorial Experiments
  Three-factor Factorial Experiment
  Summary
  Exercises

Chapter 19. Computational Methods for the Factorial ANOVA
  Two-factor Factorial ANOVA
  Comparing Pairs of Means: The Protected t Test
  A Second Example of the Factorial ANOVA
  Summary
  Exercises

Chapter 20. Describing Linear Relationships: Regression
  Dependent Samples
  Mathematics of Straight Lines
  Describing Linear Relationships: The Least-squares Regression Line
  Precautions in Regression (and Correlation) Analysis
  Inferences About the Slope of the Regression Line
  Using the Regression Line for Prediction
  Multiple Regression
  Summary
  Exercises

Chapter 21. Measuring the Strength of Linear Relationships: Correlation
  Correlation: Describing the Strength of a Linear Relationship
  Factors That Affect the Size of r
  Testing Hypotheses About ρ
  Correlation Does Not Prove Causation
  The Spearman Rank-order Correlation
  Other Correlation Coefficients
  Power and Sample Size Analyses
  Summary
  Exercises

Chapter 22. Inferences From Nominal Data: The χ² Statistic
  Nominal, Categorical, Enumerative Data
  χ² Goodness-of-fit Test
  A Second Example of the χ² Goodness-of-fit Test
  Comparison of Multiple Population Distributions
  Second Example of Using χ² to Compare Multiple Distributions
  An Alternative Conceptualization: Analysis of Contingency
  Summary
  Exercises

Glossary of Symbols

Tables

Appendix A. Variables From the Stop Smoking Study

Appendix B. Variables From the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work

Answers to Selected Exercises

Index


Preface

Statistics is a difficult subject. There is a lot to learn, and much of it involves new thinking. As the title implies, Learning From Data: An Introduction to Statistical Reasoning teaches you a new way of thinking about and learning about the world. Our goal is to put readers in a good position to understand psychological data and their limitations. Another more important goal is to evaluate data that affect all aspects of life (psychological, social, educational, political, and economic) to better prepare readers to question and to challenge. Yet another goal is to help readers retain the material. Psychologists have developed (from data) techniques that facilitate learning and comprehension, and we have incorporated three of these techniques into the book.

First, we have devoted extra attention to explaining difficult-to-understand concepts in detail. For example, some textbooks attempt to combine important concepts such as sampling distributions, hypothesis testing, power, and parameter estimation in one chapter. In this book, each concept has its own chapter. Yes, this means more reading, but it also means greater understanding.

Second, the book uses repetition extensively to help students learn and retain concepts. There are multiple fully explained examples of each major procedure. Many concepts (for example, power, Type I errors) are repeated from chapter to chapter. The problem sets at the ends of most chapters require students to apply principles introduced in earlier chapters.

The third major learning aid is the use of a consistent schema (the six-step procedure) for describing all statistical tests from the simplest to the most complex. The schema provides a valuable heuristic for learning from data. Students learn (1) to consider the assumptions of a statistical test, (2) to generate null and alternative hypotheses, (3) to choose an appropriate sampling distribution, (4) to set a significance criterion and generate a decision rule, (5) to compute the statistic of interest, and (6) to draw conclusions. Learning the schema at an early stage (in Chapter 8) will ease the way through Chapters 11 through 22, in which the schema is applied to many different situations. This schema also provides a convenient summary for each hypothesis-testing procedure. A table with a summary schema is included in the last section of each chapter containing the hypothesis-testing procedure. Inside the front cover of the book is a Statistical Selection Guide to further assist students in determining which statistical test is most appropriate for the situation.


About the Book

There are many aspects to Learning From Data that differentiate it from other statistics textbooks. In addition to the three teaching/learning methods mentioned earlier, the content and organization of the book may be quite different from what students are used to. First, nonparametric statistical tests are integrated into the chapters in which analogous parametric tests are described. With this organization, students can better appreciate the situations in which particular tests apply. In fact, throughout the book there is an emphasis on practicing how to choose the best statistical procedure. The choice of the procedure is discussed in examples, and students are required to make the correct choice as they solve the problems at the end of the chapter. The endpapers of the book provide guidelines for choosing procedures.

Second, the initial parts of the chapters on regression (Chapter 20) and correlation (Chapter 21) are self-contained sections that include discussions of regression and correlation as descriptive procedures. Instructors may present these topics along with other descriptive statistics or delay their introduction until later in the course.

Third, the book contains two independent treatments of power. The major treatment begins in Chapter 9 with graphical illustrations of how power changes under the influence of such factors as the significance level and sample size. The chapter also introduces formulas for computing power and estimating the sample size needed to obtain a particular level of power. These formulas are repeated and generalized for many of the statistical procedures discussed in later chapters. Often, however, there may not be enough time for an extensive treatment of power. In that case, instructors can choose to treat power less extensively and omit Chapter 9 (and the relevant formulas in the other chapters). This less extensive treatment of power is part of each new inferential procedure. It consists of a nonmathematical discussion of how power can be enhanced for that particular procedure.

Fourth, factorial designs, interactions, and the ANOVA are explained in greater detail than in most introductory textbooks. Our goal is to give students enough information so that they will be able to understand the statistics used in many professional journal articles. Of course, it would be foolish for the authors of any introductory textbook to try to cover the statistical analyses of complex situations. Instead, Chapter 18 discusses how two-factor and three-factor factorial experiments are designed, and how to interpret main effects and two-factor and three-factor interactions. Chapter 19 presents a description of computational procedures for the relatively simple two-factor, independent-sample ANOVA.

Last, but most important to us, is Chapter 14, "Random Sampling, Random Assignment, and Causality." A major reason for writing the first two editions of this book was to address the issues discussed in this chapter. All of us who teach statistics courses and conduct research have been struck by the incongruity between what we practice and what we preach. When we teach a statistics course, we emphasize random sampling from populations. But in most experiments we do no such thing. Instead, we use some form of random assignment to conditions. How can we perform statistical analyses of our experiments when we have ignored the most important assumption of the statistical tests? In Chapter 14, we develop a rationale for this behavior, but the rationale exacts a severe payment by placing restrictions on the interpretation of the results when random assignment is used instead of random sampling.


New to the Third Edition

In addition to the features already described, there are a number of new features. First, the third edition of Learning From Data is designed to be used seamlessly with Excel. Unlike other texts that concentrate on statistical software, we choose to focus on Excel, a spreadsheet program. Recent versions of statistical programs produce output that is far more complicated than needed for the undergraduate level. The output from Excel is straightforward; however, the statistical tools available are not complete. Thus, we have written an Add-in (LFD3 Data Analysis Add-in) for Excel so all the analyses presented in the book can be conducted in Excel. Excel is widely available and can also be used as a database, data manager, and graphics program; experience with these functions may provide a valuable set of skills for undergraduates in a number of professions, including psychology. Thus, files containing all the data used in the book are provided on a companion CD in Excel format. However, because other programs are still widely used, text-based files are also available for use in other statistical programs, like SPSS, SAS, and Systat.

Second, the book attempts to capture the students' interest by focusing on what can be learned from a statistical analysis, not just on how it is done. This is most apparent in the treatment of hypothesis testing. Using the six-step schema, the last step in hypothesis testing is described as deciding whether to reject the null hypothesis and then concluding what that decision implies about the world and what the implications for future action might be. Another way that the book attempts to capture the students' interest is by continually referring back to two real data sets. These data sets are intrinsically interesting and save time because new experimental scenarios do not need to be continually introduced. The first data set, on the effectiveness of Zyban and nicotine-replacement gum on smoking, comes from Dr. Timothy Baker. Data from 608 participants are included on the companion CD. The second data set, on the effects of having a child on marriage, comes from Dr. Janet Hyde and Dr. Marilyn Essex. The data from 244 families are also included on the companion CD. Data from these studies are used throughout the book in illustrating important concepts. The fact that these are real data sets strikes a chord with students that statistics plays an important role in Learning From Data.

Finally, we have provided instructors with substantial resources. To begin with, we have added approximately 20 new problems to the end-of-chapter exercises and provided many more on the companion CD. Included on the instructor CD are sample test questions, exercises, and sample data sets. We have also generated PowerPoint lectures for each chapter for instructors to use or edit, as they choose. There are a number of very useful graphics and illustrations that mirror the ones in the book. There are also fun, interactive exercises/demonstrations and tools that we have found useful (for example, data generation algorithms, Gaussian random number generators, etc.). As additional items become available, our Web site (www.LFD3.net) will provide users of the textbook access to them.

Many Thanks

Many people have contributed to this book. We thank our students and colleagues at the University of Wisconsin-Madison and those instructors who used the first two editions and provided valuable comments. We also thank Laura D. Goodwin (University of Colorado, Denver), Richard E. Zinbarg (Northwestern University), Daniel S. Levine (University of Texas, Arlington), and Randall De Pry (University of Colorado, Colorado Springs) for their valuable reviews of many of the chapters and of the proposal for a third edition of the book. AMG thanks his instructors at the University of Michigan and Miami University. MEA thanks his instructors at Temple University, especially Ralph Rosnow, Alan Sockloff, and Phil Bersh. Thanks are due to the editorial and production staffs at Lawrence Erlbaum Associates, who tolerated delay after delay. Finally, thanks to Mina and Anna for their love and support.

Arthur M. Glenberg

Matthew E. Andrzejewski


Chapter 1

Why Statistics?

Variability
  Sources of Variability
  Variables and Constants
Populations and Samples
  Statistical Populations
  The Problem of Large Populations
  Samples
Descriptive and Inferential Statistical Procedures
  Descriptive Statistical Procedures
  Inferential Statistical Procedures
Measurement
  Considering Measurement in a Social and Political Context
  Differences Among Measurement Rules
  Properties of Numbers Used as Measurements
  Types of Measurement Scales
  Importance of Scale Types
Using Computers to Learn From Data
  What Statistical Analysis Programs Can Do for You
  What the Programs Cannot Do for You
Summary
Exercises
  Terms
  Questions

There are many ways to learn about the world and the people who populate it. Learning can result from critical thinking, asking an authority, or even from a religious experience. However, collecting data (that is, measuring observations) is the surest way to learn about how the world really is.

Unfortunately, data in the behavioral sciences are messy. Initial examination of data reveals no clear facts about the world. Instead, the data appear to be nothing but an incoherent jumble of numbers. To learn about the world from data, you must first learn how to make sense out of data, and that is what this textbook will teach you. Statistical procedures are tools for learning about the world by learning from data.

To help you to understand the power and usefulness of statistical procedures, we will explore two real (and important!) data sets throughout the course of the book. One of the data sets is courtesy of Professor Timothy Baker at the University of Wisconsin Center for Tobacco Research and Intervention (which we will call the Smoking Study). The data were collected to investigate several questions about smoking, addiction, withdrawal, and how best to quit smoking. The data set consists of a sample of 608 people who wanted to quit smoking. These people were randomly assigned (see Chapter 14 for the benefits of random assignment) to three groups. The participants in one group were given the drug bupropion SR (Zyban) along with nicotine replacement gum. In a second group, the participants were given the bupropion along with a placebo gum that did not contain any active ingredients. The final group received both a placebo drug and a placebo gum. The major question of interest is whether people are more successful in quitting smoking when the active gum is added to the bupropion. These data are exciting for a couple of reasons. First, given the tremendous social cost of cigarette smoking, we as a society need to figure out how to help people overcome this addiction, and these data do just that. Second, the study included measurements of about 30 other variables to help answer ancillary questions. For example, there are data on how long people have smoked and how much they smoked; data on health factors and drug use; and demographic data such as gender, ethnicity, age, education, and height. These variables are described more fully within the Excel and SPSS data files on the CD that comes with this book and in Appendix A. The statistical tools you will learn about will give you the opportunity to explore these data to the fullest extent possible. You can ask important questions (some that may never have been asked before), such as whether drug use affects people's ability to quit smoking, and you can get the answers. In addition, these data will be used to illustrate various statistical procedures, and they will be used in the end-of-chapter exercises.

The second data set is courtesy of Professors Janet Hyde and Marilyn Essex of the University of Wisconsin-Madison. The data set is a subset of the data from the Wisconsin Maternity Leave and Health Project and the Wisconsin Study of Families and Work (we will refer to it as the Maternity Study). This project was designed to answer questions about how having a baby affects family dynamics such as marital satisfaction, and how various factors affect child development. The data set consists of measurements of 26 variables for 244 families. Some of these variables are demographic, such as age, education, and family income. Marital satisfaction was measured separately for mothers and fathers both before the child was born (during the 5th month of pregnancy) and at three times after the birth (1, 4, and 12 months postpartum). There are also data on how much the mother worked outside the house and how equally household tasks were divided among the mothers and fathers. Finally, there are eight measures of the quality of mother-child interactions at 12 months after birth, and three measures of child temperament (for example, hyperactivity) measured when the child was 4.5 years old. These variables are described more fully on the CD that comes with this book and in Appendix B. As with the smoking data, you are free to use these data to answer important questions, such as whether the amount of time that a mother works affects child development.

This chapter introduces a number of topics that are basic to statistical analyses. We begin with a discussion of variability, the cause of messy data, and move on to the distinctions between population and sample, descriptive and inferential statistics, and types of measurement found in the behavioral sciences.

Variability

The first step in learning how to learn from data is to understand why data are messy. A concrete example is useful. Consider the CESD (Center for Epidemiologic Studies Depression) scores from the Smoking Study (see Appendix A). Each participant rated 20 questions such as "I felt lonely" using a rating of 0 (rarely or none of the time during the past week) to 3 (most of the time during the past week). The score is the sum of the ratings for the participant. For the 601 participants for whom we have CESD scores, the scores range from 0 to 23. About a quarter of the scores are below 2, but another quarter are above 9. These data are messy in the sense that the scores are very different from one another.
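The scoring rule just described (a score is the sum of a participant's 20 ratings, each from 0 to 3) can be sketched in a few lines of Python. This is only an illustration: the ratings are made up, and the function name `cesd_style_score` is hypothetical rather than anything taken from the Smoking Study files.

```python
# Made-up ratings for one hypothetical participant: 20 items, each rated 0 to 3.
ratings = [0, 1, 0, 2, 0, 0, 1, 0, 3, 0, 0, 1, 0, 0, 2, 0, 1, 0, 0, 0]


def cesd_style_score(item_ratings):
    """Return the sum of 20 item ratings, checking each is 0, 1, 2, or 3."""
    if len(item_ratings) != 20:
        raise ValueError("expected exactly 20 item ratings")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each rating must be 0, 1, 2, or 3")
    return sum(item_ratings)


score = cesd_style_score(ratings)
print(score)  # 11 for the made-up ratings above
```

Under this rule the lowest possible score is 0 and the highest is 60, so a single number summarizes all 20 answers for a participant.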

Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.

Chapter 3 presents statistical procedures for precisely measuring the variability in a set of scores. For now, only an intuitive understanding of variability is needed. When the scores differ from one another by quite a lot (such as the depression scores), variability is high. When the scores have similar values, variability is low. When all the scores are the same, there is no variability.
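To make the intuition concrete, here is a minimal Python sketch that compares three small, made-up sets of scores. It uses the spread between the largest and smallest score as a rough stand-in for variability; that choice is an assumption made for illustration only, and the helper name `score_range` is hypothetical.

```python
def score_range(scores):
    """Return the difference between the largest and smallest score."""
    return max(scores) - min(scores)


# Made-up scores chosen only to illustrate the three cases.
high_variability = [0, 2, 5, 9, 14, 23]   # scores differ by quite a lot
low_variability = [6, 7, 7, 8, 7, 6]      # scores have similar values
no_variability = [7, 7, 7, 7, 7, 7]       # all scores are the same

print(score_range(high_variability))  # 23
print(score_range(low_variability))   # 2
print(score_range(no_variability))    # 0
```

A spread of zero occurs only when every score is identical, which matches the "no variability" case.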

Sources of Variability

It is easy enough to see that the CESD data are variable, but why are they variable? In general, variability arises from several sources. One source of variability is individual differences: Some smokers are more depressed than others; some have difficulty reading and understanding the items on the test; some smokers' answers on the inventory are more honest than the answers of other smokers. There are as many potential sources of variability due to individual differences as there are reasons for why one person differs from another in intelligence, personality, performance, and physical characteristics.

Another source of variability is the procedure used in collecting the data. Perhaps some of the smokers were more rushed than others; perhaps some were tested at the end of the day and were more tired than others. Any change in the procedures used for collecting the data can introduce variability. Finally, some variability may be due to conditions imposed on the participants, such as whether they are taking the placebo gum.

Variables and Constants

Variability does not occur only in textbook examples; it is characteristic of all data in the behavioral sciences. Whenever a behavioral scientist collects data, whether on the incidence of depression, the effectiveness of a psychotherapeutic technique, or the reaction time to respond to a stimulus, the data will be variable; that is, not all the scores collected will be the same. In fact, because data are variable, collecting data is sometimes referred to as measuring a variable (or a random variable).

A variable is a measurement that changes from one observation to the next.

CESD is a variable because it changes from one smoker (observation) to the next. Effectiveness of a psychotherapeutic technique is another example of a variable, because a given technique will be more effective for some people than for others.



Variables should be distinguished from constants.

Constants are measurements that stay the same from one observation to the next.

The boiling point of pure water at sea level is an example of a constant. It is always 100 degrees Centigrade. Whether you use a little water or a lot of water, whether the water is encouraged to boil faster or not, no matter who is making the observation (as long as the observer is careful!), the water always boils at the same temperature. Another constant is Newton's gravitational constant, which sets the strength of the gravitational attraction between any two objects (whether the objects are large or small, solid or liquid, and so on).

Many of the observations made in the physical sciences are observations of constants. Because of this, it is easy for the beginning student in the physical sciences to learn from data. A single careful observation of a constant tells the whole story.

You may be surprised to learn that there is not one constant in all of the behavioral sciences. There is no such thing as "the" effectiveness of a psychotherapeutic technique, or "the" depression score, because measurements of these variables change from person to person. In fact, because what is known in the behavioral sciences is always based on measuring variables, even the beginning student must have some familiarity with statistical procedures to appreciate the body of knowledge that comprises the behavioral sciences and the limitations inherent in that body of knowledge. In case you were wondering, this is why you are taking an introductory statistics course, and your friends majoring in the physical sciences are not.

The concept of variability is absolutely basic to statistical reasoning, and it will motivate all discussions of learning from data. In fact, the remainder of this chapter introduces concepts that have been developed to help cope with variability.

Populations and Samples

The psychologists studying addiction might be interested in the CESD scores of the specific smokers from whom they collected data. However, it is likely that they are interested in more than just those individuals. For example, they may be interested in the incidence of depression among all smokers in Wisconsin, or all smokers in the United States, or even all smokers in the world. Because depression is a variable that changes from person to person, the specific observations cannot reveal everything the researchers might want to know about all of these depression scores.

Statistical Populations

A statistical population is a collection or set of measurements of a variable that share some common characteristic.

One example of a population is the set of CESD scores of all smokers in Wisconsin. These scores are measurements of a variable (CESD), and they have the common characteristic of being from a particular group of people: smokers in Wisconsin. A different statistical population consists of the CESD scores for smokers in the United States. And, a very different population consists of the marital satisfaction scores for new mothers who work full-time outside of the home. The point is that you should not think of statistical populations as groups of people, such as the people in the United States. There is only one population of people for the United States, but there are an infinite number of statistical populations depending on what variables are measured (for example, CESD or marital satisfaction), and how those scores might be grouped (for example, smokers or working mothers).

Thinking of statistical populations as sets of measurements may appear cold and unfeeling. Nonetheless, thinking this way has a tremendous advantage in that it facilitates the application of the same statistical procedure to a variety of populations. Instead of having to learn one technique for analyzing and learning from depression scores, and another technique for analyzing IQ scores, and yet another for analyzing errors rats make in learning mazes, many of the same procedures can be applied in all of these cases. In every case we are dealing (statistically) with the same stuff, a set of measurements.

Unfortunately, thinking of statistical populations as sets of numbers can cause some people to become bored and lose interest in the enterprise. The way to counter this boredom is to remember that the statistical procedures are operating on numbers that have meaning: The numbers are scores that represent something interesting about the world (for example, the incidence of depression in smokers). As you read through this book, think about applying your new knowledge to problems that are of interest to you, and not just as manipulation of numbers.

The Problem of Large Populations

Some statistical populations consist of a manageable number of scores. Usually, however, statistical populations are very large. For example, there are potentially millions of CESD scores of smokers. When dealing with large populations, it is difficult and time consuming to actually collect all of the scores in the population. Sometimes, for ethical reasons, all the scores in the population cannot be obtained. For example, suppose that a medical researcher believes that she has discovered a drug that safely and effectively reduces high blood pressure. One way to determine the drug's effectiveness is to administer it to all people suffering from high blood pressure and then to measure their blood pressures. (The population of interest consists of the blood pressure scores of people suffering from high blood pressure who have taken the new drug.) Clearly, this would be time consuming and expensive. It would also be very unethical. After all, what if the medical researcher were wrong, and the drug did more harm than good? Also, even with a great national effort, not all the scores could be collected, because some of the people would die before they took the drug, others would have their blood pressures lowered by other drugs, and others would develop high blood pressure over the course of data collection.

We appear to have run across a problem. Usually, we are not interested in just a few scores, but in all the scores in a population. Yet, because behavioral scientists are interested in learning about variables (not constants), it is impossible to know for sure about all the scores in a population from measuring just a few of them. On the other hand, it is time consuming and expensive to collect all the scores in a population, and it may be unethical or impossible. What to do?



Samples

The solution to this problem is provided by statistical procedures based on sampling from populations.

A sample is a subset of measurements from a population.

That is, a sample contains some, but usually not all, of the scores in the population. The 608 CESD scores are a sample from the population of CESD scores of all smokers.

An important type of sample is a random sample.

A random sample is selected so that every score in the population has an equal chance of being included.

Whether a sample is random or not does not depend on the actual scores included in the sample, but on how the scores in the sample are selected. Only if the scores are selected in such a way that each score in the population has an equal chance of being included in the sample is the sample a random sample. The CESD scores are not a random sample of CESD scores of all smokers. These scores are only from people living in Madison and Milwaukee, Wisconsin, and there was no attempt to ensure that CESD scores of people living elsewhere were included. Procedures for producing random samples are discussed in Chapter 5.
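The point that randomness depends on how scores are selected, not on which scores end up in the sample, can be illustrated with a short Python sketch. Everything here is assumed for illustration: the population is 10,000 made-up scores, and the variable names are hypothetical. The standard library's `random.sample` draws without replacement so that each score has the same chance of being chosen.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up scores between 0 and 23.
population = [rng.randint(0, 23) for _ in range(10_000)]

# Random sample: every score in the population has an equal chance of selection.
random_sample = rng.sample(population, k=50)

# Not a random sample: only the first 50 scores could ever be selected,
# much as scores gathered in only two cities leave out everyone else.
convenience_sample = population[:50]

print(len(random_sample), len(convenience_sample))
```

Both samples contain 50 scores, and either one might happen to look typical of the population. Only the first was produced by a selection procedure that gives every score an equal chance.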

As you will see in Chapters 5 through 22, random samples are used to help solve the problem of large populations. That is, with the data in a random sample, we can learn about the population from which the sample was obtained by using inferential statistical procedures.

DESCRIPTIVE AND INFERENTIAL STATISTICAL PROCEDURES

Descriptive Statistical Procedures

Because of variability, in order to learn anything from data, the data must be organized.

Descriptive statistical procedures are used to organize and summarize the measurements in samples and populations.

In other words, descriptive statistical procedures do what the name implies: they describe the data. These procedures can be applied to samples and to populations. Most often, they are applied to samples, because it is rare to have all the scores in a population.

Descriptive statistical procedures include ways of ordering and grouping data into distributions (discussed in Chapter 2) and ways of calculating single numbers that summarize the whole set of scores in the sample or population (discussed in Chapters 2 and 3). Some descriptive statistical procedures are used to represent data graphically, because as everyone knows, a picture is worth a thousand words.


Inferential Statistical Procedures

The most powerful tools available to the statistician are inferential statistical procedures.

Inferential statistical procedures are used to make educated guesses (inferences) about populations based on random samples from the populations.

These educated guesses are the best way to learn about a population short of collecting all of the scores in the population.

All of this may sound a bit like magic. How can you possibly learn about a whole population that may contain millions and millions (or, theoretically, an infinity) of scores by examining a small number of scores contained in a random sample from that population? It is not magic, however, and it is even understandable. Part II of this book presents a detailed description of how inferential statistical procedures work.

Inferential statistical procedures are so pervasive in our society that you have undoubtedly read about them and made decisions based on them. For example, think about the last time you heard the results of an opinion poll, such as the percentages of the registered voters who favor Candidates A, B, or C. Supposedly, your opinion is included in those percentages (assuming that you are a registered voter so that your opinion is included in the population). But on what grounds does the pollster presume to know your opinion? It is a safe bet that only rarely, if ever, has a pollster actually contacted you and asked you your opinion. Instead, the percentages reported in the poll are educated guesses based on inferential statistical procedures applied to a random sample.

In recent years, it has become fashionable for the broadcast and print media to acknowledge that conclusions from opinion polls are educated guesses (rather than certainties). This acknowledgment is in the form of a margin of error. The margin of error is how much the reported percentages may differ from the actual percentages in the population (see Chapter 11 for details).
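To give a rough feel for the size of a margin of error, the Python sketch below uses one common approximation for a sample proportion, z * sqrt(p(1 - p)/n), with z = 1.96 for roughly 95% confidence. The formula choice, the function name margin_of_error, and the poll of 1,000 voters are assumptions made here for illustration; the book's own treatment is in Chapter 11.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from n respondents.

    Assumes a random sample and the usual normal approximation.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical poll: 1,000 randomly sampled voters, 50% favoring Candidate A.
moe = margin_of_error(0.5, 1000)
print(round(100 * moe, 1))  # about 3.1 percentage points
```

Under these assumptions, quadrupling the sample size halves the margin of error, which is why polls of a thousand or so people can be informative about millions.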

Another example of the impact of inferential statistical procedures on our daily lives is in our choices of foods and medicines. Many new food additives and medicines are tested for safety and approved by government agencies such as the Food and Drug Administration (FDA). But how does the FDA know that the new product is safe for you? In fact, the FDA does not know for sure. The decision that a new drug is safe is based on inferential statistical procedures. The FDA example raises several sobering issues about the data used by government agencies to set standards on which our lives literally depend. It is only recently that government agencies have insisted that data be collected from women, and without such data, it is uncertain if a particular drug is actually safe or effective for women. The terrible birth defects attributed to the drug Thalidomide occurred because no one had bothered to collect the data that would verify the safety of the drug with pregnant women. Similarly, very little data on safe levels of environmental pollutants such as PCBs and pesticides have been collected from children. Consequently, our society may be setting the scene for a disaster by allowing into the environment chemicals that are relatively safe for adults but disastrous for children whose immune systems are immature and whose rapidly developing brains are sensitive to disruption by chemicals.¹

¹ For an excellent discussion of these issues, see C. F. Moore (2003), Silent scourge. New York: Oxford University Press.



The final example of the use of inferential procedures is the behavioral sciences themselves. Most knowledge in the behavioral sciences is derived from data. The data are analyzed using inferential statistical procedures, because interest is not confined to just the sample of scores, but extends to the whole population of scores from which the sample was selected. If you are to understand the data of the behavioral sciences, then you need to understand how statistical procedures work.

MEASUREMENT

Data are collected by measuring a variable. But what does it mean to measure a variable?

Measurement is the use of a rule to assign a number to a specific observation of a variable.

As an example, think about measuring the length of your desk. The rule for measuring length is, "Assign a number equal to the number of lengths of a standard ruler that fit exactly from one end of the desk to the other." In this example, the variable being measured is length. The observation is the length of a specific desk, your desk. The rule is to assign a value (for example, 4 feet) equal to the number of lengths of a standard ruler that fit from one end of the desk to the other.

As another example, consider measuring the weight of a newborn baby. The variable being measured is weight. The specific observation is the weight of the specific baby. The measurement rule is something like, "Put the baby on one side of a balance scale, and assign to that baby a weight equal to the number of pound weights placed on the other side of the scale to get the scale to balance."

Measuring variables in the behavioral sciences also requires that we use a rule to assign numbers to observations of a variable. For example, one way to measure depression is to assign a score equal to the sum of the ratings of the CESD questions. The variable is depression, the specific observation is the depression of the person being assessed, and the rule is to assign a value equal to the sum of the ratings. Similarly, measuring intelligence means assigning a number based on the number of questions answered correctly on an intelligence test.
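A measurement rule of the "sum of the ratings" kind is simple enough to write down as a short Python function. The sketch below is illustrative only: the function name and the six ratings are made up, and the real CESD questionnaire has its own items and rating scale.

```python
def sum_of_ratings_score(ratings):
    """Measurement rule: assign to one person the sum of that person's item ratings."""
    return sum(ratings)

# Hypothetical ratings given by one person to six questionnaire items.
ratings = [0, 1, 3, 2, 0, 1]
score = sum_of_ratings_score(ratings)
print(score)  # 7
```

The variable (depression), the observation (this person's depression), and the rule (sum the ratings) are three separate things; the code captures only the rule.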

Considering Measurement in a Social and Political Context

The choice of what variables to measure in a study is no accident; usually those choices entail a lot of discussion and planning, and are often influenced by social or political motives of the researcher. The measurement rules, as well, usually involve much discussion, but the details are rarely stated in a study's results. At the very least, there's usually some ambiguity. Take, for example, the LONG variable in the Smoking Study, which measures the longest time without smoking. Let's say that a study participant answers "8 months," which would result in a score of 7 (6 to 12 months). But, if we probe further, we may find that the participant actually answered: "Well, I didn't smoke for 4 months, but then one night I had one cigarette, and then didn't have another for 4 months. I say 8 months because it was just a minor slip-up." Is the longest time without smoking for this individual 8 months or 4 months? Is smoking defined as one cigarette or one drag or buying a pack? If the researcher is interested in the effectiveness of a particular antismoking program, she may give this participant a break and count it as 8 months, because clearly, to her, this participant didn't relapse (it was only one cigarette, after all). A different researcher, interested in showing that all addicts wind up using again (relapsing), might say that one cigarette constitutes a relapse, and score this as 4 months. Political motives may enter a study in this way because for some people the only solution for drug addiction may be abstinence (for example, Alcoholics Anonymous), but for others, recreational drug use may be seen as OK in certain situations (for example, harm reduction approaches). In addition, a researcher's grant funding may be dependent on having and solving a social problem, and maybe even a growing problem, even though the problem is not as big as one might think. Therefore, we should remain critical of how psychologists measure and contemplate what might have been included and what might have been left out.

Differences Among Measurement Rules

Not all rules for measuring variables are equally good. They differ in three important ways. First, they differ in validity.

Validity refers to how well the measurement rule actually measures the variable under consideration as opposed to some other variable.

Some intelligence tests are better than others because they measure intelligence rather than (accidentally) being influenced by creativity or memory for trivia. Similarly, some measures of depression are better than others because they measure depression rather than introversion or aggressiveness.

Measurement rules also differ in reliability.

Reliability is an index of how consistently the rule assigns the same number to the same observation.

For example, an intelligence test is reliable if it tends to assign the same number to individuals each time they take the test. Books on psychological testing discuss validity and reliability in detail.²

Finally, a third difference among measurement rules is that the properties of the numbers assigned as measurements depend on the rule. At first blush, this statement may sound like nonsense. After all, numbers are numbers; how can their properties differ?

² A classic text is A. Anastasi (1988), Psychological testing (6th ed.). New York: Macmillan.



Properties of Numbers Used as Measurements

When numbers are measurements, they can have four properties. The first of these is the category property.

The category property is that observations assigned the same number are in the same category, and observations assigned different numbers are in different categories.

For example, suppose that you are collecting data on the types of cars that American citizens drive, and you are most interested in the country in which the cars were manufactured. You could measure the country of manufacture (the variable) by using the following rule to assign numbers to observations: If the car was manufactured in the United States, assign it a 1; if manufactured in Japan, assign it a 2; if in Germany, a 3; if in France, a 4; if in Italy, a 5; and if manufactured anywhere else, a 0. These numbers have the category property because each observation assigned the same number (for example, 2) is in the same category (made in Japan).
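The country-of-manufacture rule can be written as a small lookup table. The Python sketch below follows the codes given in the text; the names COUNTRY_CODES and code_country and the four example cars are hypothetical.

```python
from collections import Counter

# Codes from the measurement rule in the text; any other country is coded 0.
COUNTRY_CODES = {"United States": 1, "Japan": 2, "Germany": 3, "France": 4, "Italy": 5}

def code_country(country):
    """Assign the category number for a car's country of manufacture."""
    return COUNTRY_CODES.get(country, 0)

# Hypothetical observations.
cars = ["Japan", "United States", "Sweden", "Japan"]
codes = [code_country(c) for c in cars]
print(codes)           # [2, 1, 0, 2]
print(Counter(codes))  # how many cars fall in each category
```

Counting how many observations share a code is meaningful for these numbers. Adding or averaging the codes would not be, because the numbers only name categories.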

These country-of-manufacture numbers are different from the numbers that we usually encounter. Typically, assigning a number to an observation (say, Observation A) means more than just assigning Observation A to a specific category. For example, if Observation A is assigned a value of 1 and Observation B is assigned a value of 2, it usually means that Observation A is shorter, lighter, or less valuable than Observation B. This is not the case for the measurements of country of manufacture. A car manufactured in the United States (and assigned a number 1) is not necessarily shorter, lighter, or less valuable than a car manufactured in Japan (assigned the number 2). The point is, how we interpret the measurements depends on the properties of the numbers, which in turn depend on the rule used in assigning the numbers.

Measurements have the ordinal property when the numbers can be used to order the observations from those that have the least of the variable being measured to those that have the most.

Consider another example. Suppose that a social psychologist investigating cooperation has a preschool teacher rank the four pupils in the class from least cooperative (first) to most cooperative (fourth). These cooperation scores (ranks) have two properties. First, the scores have the category property, because children assigned different scores are in different categories of cooperation. Second, the scores have the ordinal property because the scores can be used to order the observations from those that have the least to those that have the most cooperation. It is only when measurements have the ordinal property that we know that observations with larger measurements have more of whatever is being measured.

A third property that measurements may have is the equal intervals property.

The equal intervals property means that whenever two observations are assigned measurements that differ by exactly one unit, there is always an equal interval (difference) between the observations in the actual variable being measured.


To understand what is meant by equal intervals, consider again measuring the cooperativeness of the four preschool children. The four children (call them Alana, Bob, Carol, and Dan) have cooperation scores of 1, 2, 3, and 4. The difference between Alana's cooperation score (1) and Bob's cooperation score (2) is 1. Likewise, the difference between Carol's cooperation score (3) and Dan's cooperation score (4) is 1. The important question is whether the actual difference in cooperation (not just the score) between Alana and Bob equals the actual difference in cooperation between Carol and Dan.

It is very unlikely that the difference in cooperation between Alana and Bob equals the difference in cooperation between Carol and Dan. The teacher simply ranked the children from least to most cooperative. The teacher did not take any precautions to ensure equal intervals. Alana and Bob may both be very uncooperative, with Bob being just a bit more cooperative than Alana (the actual difference in cooperation between Alana and Bob is a bit). Carol may also be on the uncooperative side, but just a bit more cooperative than Bob (the actual difference between Carol and Bob is a bit). Suppose, however, that Dan is the teacher's helper and is very cooperative. In this case, the difference in cooperation between Carol and Dan may be very large, much larger than the difference in cooperation between Alana and Bob. Because the differences in scores are equal (the difference in cooperation scores between Alana and Bob equals the difference in cooperation scores between Carol and Dan), but the differences in amount of cooperation (the variable) are not equal, these cooperation scores do not have the equal intervals property.

Now consider using a ruler to measure the lengths of the four lines in Figure 1.1. The lines A, B, C, and D have lengths of 1, 2, 3, and 6 centimeters, respectively. Using a ruler to measure length generates measurements with the equal intervals property: For each pair of observations for which the measurements differ by exactly one unit, the differences in length are exactly equal. That is, the measurements assigned lines A (1) and B (2) differ by one, as do the measurements assigned lines B (2) and C (3); and, important to note, the actual difference in lengths between lines A and B exactly equals the actual difference in length between lines B and C.

FIGURE 1.1 Length measured using two different measurement rules.

[Figure: four lines, A through D, of increasing length. Length measured using a ruler: A = 1, B = 2, C = 3, D = 6. Length measured using ranks: A = 1, B = 2, C = 3, D = 4.]



A difficulty in understanding the equal intervals property is in maintaining the distinction between the variable being measured (length or cooperation) and the number assigned as a measurement of the variable. The numbers represent or stand for certain properties of the variable. The numbers are not the variable itself. The number 1 is no more the cooperation of Alana (it is a measure of her cooperation) than is the number 1 the actual length of line A (it is a measure of its length). Whether or not the measurements have properties such as equal intervals depends on how the numbers are assigned to represent the variable being measured. Using a ruler to measure length of a desk assigns numbers that have the equal intervals property; using rankings to measure cooperation of preschool children assigns numbers that do not have the equal intervals property.

The difference between the length and cooperation examples is not in what is being measured, but in the rule used to do the measuring. A ranking rule can be used to measure the lengths of lines (this is what we do when we need a rough measure of length: compare two lengths to see which is longer). In this case, the measured lengths of lines A, B, C, and D would be 1, 2, 3, and 4, respectively (see Figure 1.1). These measurements of length do not have the equal intervals property, because for each pair of observations for which the measurements differ by exactly one unit, the real differences in length are not exactly equal.
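The contrast between the two rules in Figure 1.1 can be checked with a short Python sketch. The line lengths (1, 2, 3, and 6 centimeters) come from the text; the helper name adjacent_differences is hypothetical, and the ranking code assumes no two lines are tied in length.

```python
# Ruler measurements of the four lines in Figure 1.1, in centimeters.
lengths_cm = {"A": 1, "B": 2, "C": 3, "D": 6}

# Ranking rule: 1 for the shortest line, 2 for the next, and so on.
ordered = sorted(lengths_cm, key=lengths_cm.get)
ranks = {line: position for position, line in enumerate(ordered, start=1)}

def adjacent_differences(scores):
    """Differences between neighboring observations, ordered from least to most."""
    names = sorted(scores, key=scores.get)
    return [scores[b] - scores[a] for a, b in zip(names, names[1:])]

print(adjacent_differences(ranks))       # [1, 1, 1]
print(adjacent_differences(lengths_cm))  # [1, 1, 3]
```

The ranks always differ by one unit, yet the real length differences behind those one-unit steps are 1, 1, and 3 centimeters. Equal differences in the numbers do not reflect equal differences in the variable, so the ranks lack the equal intervals property.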

The fourth property that measurements may have is the absolute zero property.

The absolute zero property means that a value of zero is assigned as a measurement only when there is nothing at all of the variable that is being measured.

When length is measured using a ruler (rather than ranks), the score of zero is an absolute zero. That is, the value of zero is assigned only when there is no length. When measuring country of car manufacture, zero is not an absolute zero. In that example, zero does not mean that there is no country of manufacture, only that the country is not the United States, Japan, Germany, France, or Italy.

Another example of a measurement scale that does not have an absolute zero is the Fahrenheit (or Centigrade) scale for measuring temperature. A temperature of 0°F does not mean that there is no heat. In fact, there is still some heat at temperatures of −10°F, −20°F, and so on. Because there is still some heat (the variable being measured) when zero is assigned as the measurement, the zero is not an absolute zero.³

Types of Measurement Scales

In addition to the four properties of measurements (category, ordinal, equal intervals, and absolute zero), there are four types of measurement rules (or scales), determined by the properties of the numbers assigned by the measurement rules.

A nominal scale is formed when the numbers assigned by the measurement rule have only the category property.

³ The Kelvin scale of temperature does have an absolute zero. On this scale, 0 means absolutely no heat. Zero degrees Kelvin equals −459.67°F.


Nominal comes from the word name. The numbers assigned using a nominal scale name the category to which the observation belongs but indicate nothing else. Thus, the measurements of country of manufacture of cars form a nominal scale, because the numbers name the category (country), but have no other properties.

Several of the variables in the Smoking Study are measured using nominal scales. For example, TYPCIG (type of cigarette smoked) is measured using a nominal scale defined as 1 = regular filter; 2 = regular no filter; 3 = light; 4 = ultralight; 5 = other. Another nominally measured variable is SPOUSE, that is, whether the smoker's spouse smokes (1) or does not smoke (0). The GENDER variable in the Maternity Study (is the child male or female) is also measured using a nominal scale.

An ordinal scale is formed when the measurement rule assigns numbers that have the category and the ordinal properties, but no other properties.

Many of the variables in the Smoking and Maternity studies are measured using ordinal scales. The longest time without smoking (LONG) variable is measured as 1 = less than a day; 2 = 1 to 7 days; 3 = 8 to 14 days; 4 = 15 days to a month; 5 = 1 to 3 months; 6 = 3 to 6 months; 7 = 6 to 12 months; 8 = more than a year. As the assigned score increases from 1 to 8, the length of time without smoking increases, so the numbers have the ordinal property. However, the difference between a measurement of 1 and 2 (LONG 1 versus LONG 2: about 3 days) is not comparable to a difference between a measurement of, say, 5 and 6 (LONG 5 versus LONG 6: about 3 months); thus, the measurements do not have the equal intervals property. The researchers might have attempted to measure LONG using a ratio scale by asking participants to estimate the longest number of days without smoking, from 0 to thousands of days. Unfortunately, people's estimates are often clouded by faulty memory processes and faulty estimates. One person who knows that he quit once for more than a year might estimate LONG as 500 days. Another person who had been abstinent for the same amount of time, but who can't remember whether he quit in the year 2001 or 1999, and who can't quite remember how to translate years into days, might estimate LONG as 10,000 days. Thus, these measurements are not as valid or reliable as the simpler ordinal measurements of the LONG scale.
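The LONG coding rule can be sketched in Python to show why equal steps in the code do not mean equal steps in time. Several details below are assumptions, not part of the Smoking Study: time is expressed in whole days, a month is treated as 30 days and a year as 365, the overlapping category boundaries (for example, exactly 3 months) are resolved by placing the boundary value in the lower category, and the names UPPER_BOUNDS_DAYS and long_code are hypothetical.

```python
from bisect import bisect_left

# Assumed upper bounds, in whole days, for LONG codes 1 through 7.
# Anything above the last bound (more than a year) is coded 8.
UPPER_BOUNDS_DAYS = [0, 7, 14, 30, 90, 180, 365]

def long_code(days):
    """Assign the ordinal LONG code for a longest abstinence of `days` whole days."""
    if days < 0:
        raise ValueError("days must be non-negative")
    return bisect_left(UPPER_BOUNDS_DAYS, days) + 1

print(long_code(240))                # 7: about 8 months falls in "6 to 12 months"
print(long_code(1), long_code(8))    # 2 3: one code apart, 7 days apart
print(long_code(90), long_code(180)) # 5 6: one code apart, 90 days apart
```

Both pairs of observations differ by exactly one code, but the underlying times differ by a week in one case and by three months in the other, which is the sense in which the LONG codes lack the equal intervals property.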

Many behavioral scientists (and businesses that conduct marketing research) collect data by having people rate observations for specific qualities. For example, a clinical psychologist may be asked to rate the severity of his patients' psychopathologies on a scale from 1 (extremely mild) to 10 (extremely severe). As another example, a consumer may be asked to rate the taste of a new ice cream from 1 (awful) to 100 (sublime). In both cases, the measurements represent ordinal properties. For the clinical psychologist, the larger numbers represent more severe psychopathology than the smaller numbers; for the ice-cream raters, the larger numbers represent better-tasting ice cream than the smaller numbers. In neither example, however, do the measurements have the equal intervals property. As a general rule, ratings and rankings form ordinal scales.

The third type of scale is the interval scale.

Interval scales are formed when the numbers assigned as measurements have the category, ordinal, and equal intervals properties, but not an absolute zero.



Two examples of interval scales are the Fahrenheit and Centigrade scales of temperature. Neither has an absolute zero because 0° (F or C) does not mean absolutely no heat. The measurements do have the category property (all observations assigned the same number of degrees have the same amount of heat), the ordinal property (larger numbers indicate more heat), and the equal intervals property (on a particular scale, a difference of 1° always corresponds to a specific amount of heat).

Many psychological variables are measured using scales that are between ordinal and interval scales. This statement holds for many of the variables included in the Maternity Study, such as marital satisfaction (for example, M1MARSAT), mother's positive affect during free play (MPOS), infant dysregulation during free play (IDYS), and child's internalizing behavior during free play (M7INT). Consider M7INT in a little more detail. To measure the variable, a mother was asked to rate her child's behavior in regard to nine questions such as, "Tends to be fearful or afraid of new things or new situations." The rating scale was 0 = does not apply; 1 = sometimes applies; 2 = frequently applies. Thus, the rating of each question forms an ordinal scale without the equal intervals property. But what happens when we sum the ratings from nine questions to get the M7INT score? It is unlikely that the difference in internalizing behavior between M7INT scores of 10 and 11 is exactly the same as the difference between, say, M7INT scores of 20 and 21. Nonetheless, it may well be that these two differences in internalizing behavior are fairly comparable, that is, that the scale is close to having the equal intervals property.

The conservative (and always correct) approach to these in-between scales is to treat them as ordinal scales. As we will see in Part II, however, ordinal scales are at a disadvantage compared to interval scales when it comes to the range and power of statistical techniques that can be applied to the data. Recognizing this disadvantage, many psychologists treat the data from these in-between scales as interval data; that is, they treat the data as if the measurements were collected using an interval scale. One rule of thumb is that scores from the middle of an in-between scale are more likely to have the equal intervals property than scores from either end. If the data include scores from the ends of an in-between scale, it is best to treat the data conservatively as ordinal.

Many scales for measuring physical qualities (length, weight, time) are ratio scales.

A ratio scale is formed when the numbers assigned by the measurement rule have all four properties: category, ordinal, equal intervals, and absolute zero.

The reason for the name ratio is that statements about ratios of measurements are meaningful only on a ratio scale. It makes sense to say that a line that is 2.5 centimeters long is half (a ratio) the length of a 5-centimeter line. Similarly, it makes sense to say that 20 seconds is twice (a ratio) the duration of 10 seconds.

On the other hand, it does not make sense to say that 68°F is twice as hot as 34°F. This is easily demonstrated by converting to Centigrade measurements. Suppose that the temperature of Object A is 34°F (corresponding to about 1°C) and that the temperature of Object B is 68°F (corresponding to 20°C). Comparing the amount of heat in the objects using the Fahrenheit measurements seems to indicate that Object B is twice as hot as Object A, because 68 is twice 34. Comparing the measurements on the Centigrade scale (which of course does not change the real amount of heat in the objects), it seems that Object B is 20 times as hot as Object A. Object B cannot be 20 times as hot as Object A and at the same time be twice as hot. The problem is that statements about ratios are not meaningful unless the measurements are made using a ratio scale. Neither ratio (2:1 or 20:1) is right, because neither set of measurements was made using a ratio scale.
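The temperature example can be reproduced with the standard conversion formulas. In the Python sketch below the function names are hypothetical. Note that with exact conversion 34°F is about 1.1°C rather than the rounded 1°C, so the Centigrade ratio comes out near 18 rather than 20; the point of the example is unchanged. The Kelvin line is added here for comparison, because Kelvin has an absolute zero.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15

a_f, b_f = 34.0, 68.0  # Objects A and B from the text, in degrees Fahrenheit

ratio_f = b_f / a_f
ratio_c = fahrenheit_to_celsius(b_f) / fahrenheit_to_celsius(a_f)
ratio_k = fahrenheit_to_kelvin(b_f) / fahrenheit_to_kelvin(a_f)

print(round(ratio_f, 2))  # 2.0
print(round(ratio_c, 2))  # 18.0
print(round(ratio_k, 2))  # 1.07
```

The same two objects yield ratios of 2 and about 18 on the two interval scales, so neither ratio says anything about the objects themselves. Only the Kelvin measurements, which have an absolute zero, support a ratio statement, and on that scale Object B is only about 7% hotter than Object A.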

This problem does not occur when using a ratio scale. A 5-centimeter (2-inch) line is twice as long as a 2.5-centimeter (1-inch) line, and that is true whether the measurements are made in centimeters, inches, or any other ratio measurement of length.

Several variables in the Smoking Study are measured using ratio scales. One example is the carbon monoxide level at the end of treatment measured in parts per million (CO_EOT), and another is the number of times the participant has tried to quit smoking (QUIT).

Importance of Scale Types

The question that may be uppermost in your mind is, "So what?" There are three reasons why knowing about scale types is important. First, now that you know about scale types you will be less likely to make unsupportable statements about data. One such statement is the use of ratio comparisons when the data are not measured using a ratio scale. For example, consider a teacher who gives a spelling test and observes that Alice spelled 10 words correctly, whereas Bill spelled only 5 words correctly. Certainly, Alice spelled twice as many words correctly as did Bill. Nevertheless, the number of words correct on a spelling test is not a ratio measurement of spelling ability (zero words correct does not necessarily mean zero spelling ability). So, although it is perfectly correct to say that Alice spelled twice as many words correctly as did Bill, it is silly to say that Alice is twice as good a speller as is Bill. Similarly, it is not legitimate to claim that a child with an internalizing score (M7INT) of 20 internalizes twice as much as a child with a score of 10.

Second, the types of descriptive statistical procedures that can be applied to data depend in part on the scale type. Although some types of descriptions can be applied to data regardless of the scale type, others are appropriate only for interval or ratio scales, and still others are appropriate for ordinal, interval, and ratio scales, but not nominal scales.

Third, the types of inferential statistical procedures that can be applied to data depend in part on the measurement scale.

Given these three reasons, it is clear that if you want to learn from data you must be able to determine what sort of scale was used in collecting the data. The only way to know the scale type is to determine the properties of the numbers assigned using that rule. If the only property of the measurements is the category property, then the data are nominal; if the measurements have both the category and ordinal properties, then the data are ordinal; if, in addition, the data have the equal intervals property, then the data are interval. Only if the data have all four properties are they ratio.
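The decision procedure just described can be written as a short Python function. This is a sketch, not a substitute for judgment: the function name scale_type is hypothetical, and deciding whether a set of measurements actually has each property is the hard part that no program can do for you. The function assumes the properties build on one another in the order given in the text.

```python
def scale_type(category, ordinal=False, equal_intervals=False, absolute_zero=False):
    """Name the measurement scale implied by the properties of the numbers."""
    if not category:
        raise ValueError("measurements must at least have the category property")
    if not ordinal:
        return "nominal"
    if not equal_intervals:
        return "ordinal"
    if not absolute_zero:
        return "interval"
    return "ratio"

print(scale_type(category=True))                              # nominal
print(scale_type(category=True, ordinal=True))                # ordinal
print(scale_type(True, True, equal_intervals=True))           # interval
print(scale_type(True, True, True, absolute_zero=True))       # ratio
```

For example, country-of-manufacture codes have only the category property (nominal), teacher rankings add the ordinal property (ordinal), Fahrenheit temperatures add equal intervals (interval), and ruler lengths have all four properties (ratio).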

Now that you understand the importance of scale types, it may be helpful to read this section again. Your ability to distinguish among scale types will be used throughout this textbook and in all of your dealings with behavioral data.

USING COMPUTERS TO LEARN FROM DATA

Data analysis often involves some pretty tedious computations, such as adding columns of numbers. Much of this drudgery can be eliminated by using a computer program such as Excel, and Learning From Data is written to be used with that program. The CD that comes with this book provides the files that your Excel program requires to mesh with the book. First, open up the "Read Me" file and follow the instructions for loading the Excel Add-ins. These Add-ins provide computer routines that exactly match those used in the book. Second, if you are not familiar with basic Excel operations (e.g., for entering data in a spreadsheet or for selecting rows and columns), you should run the Excel tutorial. Third, the CD includes numerous data files. Two large data files provide the data from the Maternity and Smoking studies. Other data files provide the data used in all of the major worked-out examples and the end-of-chapter exercises.

What Statistical Analysis Programs Can Do for You

The programs have two main benefits. First, they eliminate the drudgery of doing lots of calculations. Second, they ensure accuracy of calculation. A benefit that flows from these two is that the programs make it easy to explore data by conducting multiple analyses.

What the Programs Cannot Do for You

Almost everything that is important is not done by the programs. The essence of statistical analysis is choice (choosing the right statistical method and interpretation of the outcome of the chosen method). The programs cannot choose the appropriate methods for you. Similarly, the programs do not know whether a data set is a sample, a random sample, or a population. Consequently, the program cannot adequately interpret the output. Learning From Data teaches you how to make good choices and how to interpret the outcome of the statistical methods; the computer eliminates the drudgery.

Because the computer program does the calculations, you might think that you can ignore the formulas in the text. That would be a big mistake for several reasons. First, for small sets of data it is easier to do calculations by hand (or using a calculator) rather than using a computer. But to do the calculations by hand, you need to know the formulas. Second, following the formulas is often the best way to figure out exactly what the statistical technique is doing and how it works. Working through the formulas can be hard intellectual labor, but that is the only way to understand what they do.

SUMMARY

The behavioral sciences are built on a foundation of data. Unfortunately, because behavioral data consist of measurements of variables, individual measurements will differ from one another so that no clear picture is immediately evident. Fortunately, we can learn from variable data by applying statistical procedures.

Descriptive statistical procedures organize, describe, and summarize data. Descriptive statistical procedures can be applied to samples or to populations, but because we rarely have all the scores in a population, descriptive procedures are generally applied to data from samples. We use inferential statistical procedures to make educated guesses (inferences) about a population of scores based on a random sample of scores from the population. Although these inferences are not error-free, appropriate use of inferential statistical procedures can reduce the chance of error to acceptable levels (for example, the margin of error in a poll).

Appropriateness of a statistical procedure depends in part on the type of measurement scale used in collecting the data. The measurement scale is determined by the properties of the numbers (assigned by the measurement). If the measurements have the category, ordinal, equal intervals, and absolute zero properties, then a ratio scale is formed; if the measurements have all but the absolute zero property, an interval scale is formed. If the measurements have only the category and ordinal properties, they form an ordinal scale. Finally, if the measurements have only the category property, they form a nominal scale.

EXERCISES

Terms. Define these new terms.

variable
constant
sample
random sample
population
descriptive statistical procedure
inferential statistical procedure
validity
reliability
measurement
category property
ordinal property
equal intervals property
absolute zero property
nominal scale
ordinal scale
interval scale
ratio scale

Questions. Answer the following questions. (Answers to marked questions are given in the back of the book.)

1. Why would there be no need for descriptive or inferential statistical procedures if behavioral scientists could measure constants instead of variables?

2. List 10 different variables and 1 constant in the behavioral sciences.

3. Classify each of the following as a population, a sample, or both. When the answer is both, describe the circumstances under which the data should be considered a population and under which they should be considered a sample.
   a. Family incomes of all families in the United States.
   b. Family incomes of all families in Wisconsin.
   c. The number of words recalled from a list of 50 words by 25 first-year college students who volunteer to take part in an experiment.
   d. The number of days spent in intensive care for all people who have undergone heart transplant surgery.
   e. The number of errors made by rats learning a maze.



4. Describe two examples of each of the four types of measurement scales. Indicate why each is an example of its type.

5. If you had a choice between using nominal, ordinal, interval, or ratio scales to measure a variable, what would be the best choice? Why?

6. A set of scores can be one type of scale or another, depending on what the set of scores represents. Consider the number of errors made by rats in learning a maze. If the data represent simply the number of errors, then the scores form a ratio scale. The numbers have all four properties, and it makes perfectly good sense to say that if Rat A made 30 errors and Rat B made 15 errors, then Rat A made twice as many errors as Rat B. Suppose, however, that the scores are used as a measure of rat intelligence. Are these scores a ratio measure of intelligence? Explain your answer. What are some of the implications of your answer?

7. Determine the type of measurement scale used in each of the following situations:
   a. A supervisor ranks his employees from least to most productive.
   b. Students rate their statistics teacher's teaching ability using a scale of 1 (awful) to 10 (magnificent).
   c. A sociologist classifies sexual preference as 0 (heterosexual), 1 (homosexual), 2 (bisexual), 3 (asexual), 4 (other).
   d. A psychologist measures the time to complete a problem-solving task.


PART I

Descriptive Statistics

2/ Frequency Distributions and Percentiles
3/ Central Tendency and Variability
4/ z Scores and Normal Distributions


ThethreechaptersinPartIprovideanintroductiontodescriptivestatisticaltechniques.Allofthesetechniquesaredesignedtohelpyouorganizeandsummarizeyourdatawithoutintroducingdistortions.Asyouwillsee,oncethedatahavebeenorganized,itisfareasiertomakesenseofthem;thatis,itisfareasiertounderstandwhatthedataaretellingyouabouttheworld.

Threegeneraltypesofdescriptivetechniquesarecovered.WebegininChapter2withfre-quencydistributionsatechniqueforarrangingthescoresinasampleorapopulationtorevealgeneraltrends.Wewillalsolearnhowtousegraphstoillustratefrequencydistributions.

Aseconddescriptivetechniqueiscomputingstatisticsthatsummarizefrequencydis-tributionswithjustafewnumbers.InChapter3,wewill learnhowtocomputeseveralindicesofcentraltendency,themosttypicalscoresinadistribution.Wewillalsolearnhowtosummarizethevariabilityofthescoresinadistribution.

Finally, we will consider two methods for describing the relative location of individual scores within a distribution; that is, where a particular score stands relative to the others. Percentiles are introduced in Chapter 2. They are often used when reporting the results of standardized tests such as the Scholastic Aptitude Test (SAT) and American College Test (ACT). The other measure of relative standing is the standard score (or z score) discussed in Chapter 4. Standard scores are generally more useful than percentiles, but they require some background to understand.

All of these descriptive techniques form the underpinning for the remainder of this book, which deals with inferential statistical techniques. Statistical inference begins with a description of the data in a sample, and it is this description that is used to make inferences about a broader population.

Chapter 2

Frequency Distributions and Percentiles

Frequency Distributions
  Relative Frequency
  Cumulative Frequency
Grouped Frequency Distributions
  Constructing Grouped Distributions
Graphing Frequency Distributions
  Histograms
  Frequency Polygons
  When to Use Histograms and Frequency Polygons
Characteristics of Distributions
  Shape
  Central Tendency
  Variability
  Comparing Distributions
Percentiles
  Percentile Ranks and Percentiles
  Three Precautions
Computations Using Excel
  Constructing Frequency Distributions
  Estimating Percentile Ranks With Excel
  Estimating Percentiles
Summary
Exercises
  Terms
  Questions

Collecting data means measuring observations of a variable. And, of course, these measurements will differ from one another. Given this variability, it is often difficult to make any sense of the data until they are analyzed and described. This chapter examines a basic technique for dealing with variability and describing data: forming a frequency distribution. When formed correctly, frequency distributions achieve the goals of all descriptive statistical techniques: They organize and summarize the data without distorting the information the data provide about the world.

This chapter also introduces two related topics, graphical representation of distributions and percentiles. Graphical representations highlight the major features of distributions to facilitate learning from the data. Percentiles are a technique for determining the relative standing of individual measurements within a distribution.

While reading this chapter, keep in mind that the procedures for constructing frequency distributions can be applied to populations and to samples. Because it is so rare to actually have all the scores in a population, however, frequency distributions are usually constructed from samples. Reflecting this fact, most of the examples in the chapter will involve samples.

FREQUENCY DISTRIBUTIONS

Suppose that you are working on a study of social development. Of particular interest is the age at which aggressive tendencies first appear in children. You begin data collection (measuring the aggressiveness variable) by asking the teacher of a preschool class to rate the aggressiveness of the 20 children in the class using the scale:

Meaning                   Score Value
potential for violence    5
very aggressive           4
somewhat aggressive       3
average                   2
timid                     1
very timid                0

The data are in Table 2.1. As is obvious, the data are variable; that is, the measurements differ from one another. It is also obvious that it is difficult to learn anything from these data as they are presented in Table 2.1. So as a first step in learning from the data, they can be organized and summarized by arranging them in the form of a frequency distribution.

A frequency distribution is a tabulation of the number of occurrences of each score value.

The frequency distribution for the aggressiveness data is given in Table 2.2. The second column lists the score values. The third column in Table 2.2 lists the frequency with which each score value appears in the data. Constructing the frequency distribution involves nothing more than counting the number of occurrences of each score value. There is a simple way to check whether the distribution has been properly constructed: The sum of the frequencies in the distribution should equal the number of observations in the sample (or population). As indicated in Table 2.2, the frequencies sum to 20, the number of observations.

TABLE 2.1
Aggressiveness Ratings for 20 Preschoolers

Child  Rating    Child  Rating    Child  Rating    Child  Rating
a      4         f      0         k      3         p      2
b      3         g      3         l      0         q      3
c      1         h      3         m      4         r      3
d      1         i      4         n      2         s      1
e      2         j      2         o      3         t      3
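The counting procedure for a frequency distribution, together with the sum check, can be sketched in a few lines of Python. This is only an illustration using the ratings in Table 2.1; Python is not the tool used in this book, and the variable names are arbitrary choices.

```python
from collections import Counter

# Ratings from Table 2.1, listed in child order a through t.
ratings = [4, 3, 1, 1, 2, 0, 3, 3, 4, 2, 3, 0, 4, 2, 3, 2, 3, 3, 1, 3]

# Every possible score value on the 0-5 scale, including values never observed.
score_values = range(0, 6)

# Count the occurrences of each score value; unobserved values get a frequency of 0.
counts = Counter(ratings)
frequency = {value: counts[value] for value in score_values}

# Check: the frequencies must sum to the number of observations.
assert sum(frequency.values()) == len(ratings)

for value, f in frequency.items():
    print(value, f)
```

Listing every possible score value explicitly, rather than only the values that appear in the data, is what allows a score value such as 5 to show up with a frequency of zero.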


It is clear that the frequency distribution has a number of advantages over the listing of the data in Table 2.1. The frequency distribution organizes and summarizes the data, thereby highlighting the major characteristics. For example, it is now easy to see that the measurements in the sample range from a low of 0 to a high of 4. Also, most of the measurements are in the middle range of score values, and there are fewer measurements in the ends of the distribution.

Another benefit provided by the frequency distribution is that the data are now easily communicated. To describe the data, you need to report only five pairs of numbers (score values and their frequencies).

Try not to confuse the numbers representing the score values and the numbers representing the frequencies of the particular score values. For example, in Table 2.2 the number 4 appears in the column labeled "score value" and the column labeled "frequency." The meaning of this number is quite different in the two columns, however. The score value of 4 means a particular level of aggressiveness (very aggressive). The frequency of 4 means the number of times a particular score value was observed in the data. In this case, a score value of 2 (average) was observed four times.

To help overcome any confusion, be sure that you understand the distinctions among the following terms. Score value refers to a possible value on the measurement scale. Not all score values will necessarily appear in the data, however. If a particular score value is never assigned as a measurement (for example, the score value 5, potential for violence), then that score value would have a frequency of zero. Frequency refers to the number of times a particular score value occurs in the data. Finally, the terms measurement, observation, and score are used interchangeably to refer to a particular datum, the number assigned to a particular individual. Thus, in Table 2.2, the score value of 1 (timid) occurs with a frequency of 3. Similarly, there are three scores (measurements, observations) with the score value of 1 (timid).

Relative Frequency

An important type of frequency distribution is the relative frequency distribution.

TABLE 2.2
Frequency Distributions for the Aggressiveness Data in Table 2.1

                         Score                 Relative    Cumulative   Cumulative Relative
Meaning                  Value     Frequency   Frequency   Frequency    Frequency
Very Timid               0         2           .10         2            .10
Timid                    1         3           .15         5            .25
Average                  2         4           .20         9            .45
Aggressive               3         8           .40         17           .85
Very Aggressive          4         3           .15         20           1.00
Potential for Violence   5         0           .00         20           1.00
Total                              20          1.00



The relative frequency of a score value is the proportion of observations in the distribution at that score value. A relative frequency distribution is a listing of the relative frequencies of each score value.

The relative frequency of a score value is obtained by dividing the score value's frequency by the total number of observations (measurements) in the distribution. For example, the relative frequency of aggressive children (score value of 3) is 8/20 = .40.

Relative frequency is closely related to percentage. Multiplying the relative frequency by 100 gives the percentage of observations at that score value. For these data, the percentage of children rated aggressive is .40 × 100 = 40%.

The fourth column in Table 2.2 is the relative frequency distribution for the aggressiveness data. Note that all of the relative frequencies are between 0.0 and 1.0, as they must be. Also, the sum of the relative frequencies in the distribution will always equal 1.0. Thus, computing the sum is a quick way to ensure that the relative frequency distribution has been properly constructed.
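As a rough illustration, the relative frequencies and percentages for the aggressiveness data can be computed from the frequencies in Table 2.2 as follows. The Python code and its variable names are assumptions made for this sketch, not part of the book's procedures.

```python
# Frequencies from Table 2.2, keyed by score value.
frequency = {0: 2, 1: 3, 2: 4, 3: 8, 4: 3, 5: 0}

# Total number of observations in the distribution.
n = sum(frequency.values())

# Relative frequency = frequency / total number of observations.
relative = {value: f / n for value, f in frequency.items()}

# Percentage = relative frequency * 100.
percentage = {value: 100 * r for value, r in relative.items()}

# Check: the relative frequencies must sum to 1.0 (allowing for floating-point error).
assert abs(sum(relative.values()) - 1.0) < 1e-9

for value in frequency:
    print(value, f"{relative[value]:.2f}", f"{percentage[value]:.0f}%")
```

The comparison against 1.0 uses a small tolerance because sums of decimal fractions are not always exact in floating-point arithmetic.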

Relative frequency distributions are often preferred over raw frequency distributions because the relative frequencies combine information about frequency with information about the number of measurements. This combination makes it easier to interpret the data. For example, suppose that an advertisement for Nationwide Beer informs you that in a scientifically selected sample, 90 people preferred Nationwide, compared to only 10 who preferred Brand X. You may conclude from these data that most people prefer Nationwide. Suppose, however, that the sample actually included 10,000 people, 90 of whom preferred Nationwide, 10 of whom preferred Brand X, and 9,900 of whom could not tell the difference. In this case, the relative frequencies are much more informative (for the consumer). The relative frequency of preference for Nationwide is only .009.
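The arithmetic behind the beer example can be laid out explicitly. The sketch below (in Python, with illustrative variable names) contrasts the impression given by the raw frequencies with the relative frequency computed over the whole sample.

```python
sample_size = 10_000
prefer_nationwide = 90
prefer_brand_x = 10

# Everyone else in the sample could not tell the difference.
no_preference = sample_size - prefer_nationwide - prefer_brand_x

# Relative frequency over the whole sample: the figure reported in the text.
rel_nationwide = prefer_nationwide / sample_size

# Share among only those who stated a preference: the impression the advertisement creates.
share_among_stated = prefer_nationwide / (prefer_nationwide + prefer_brand_x)

print(no_preference)                 # 9900
print(f"{rel_nationwide:.3f}")       # 0.009
print(f"{share_among_stated:.2f}")   # 0.90
```

The same raw count of 90 corresponds to a proportion of .90 or .009 depending on which total it is divided by, which is why the number of measurements has to be reported alongside the frequency.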

The same argument in favor of relative frequency can also be made (in a more modest way) for the data on aggressiveness. It is more informative to know that the relative frequency of very aggressive children is .15 than to simply know that three children were rated as very aggressive.

When describing data from random samples, relative frequency has another advantage. The relative frequency of a score value in a random sample is a good guess for the relative frequency of that score value in the population from which the random sample was selected. There is no corresponding relation between frequencies in a sample and frequencies in a population.

Cumulative Frequency

Another type of distribution is the cumulative frequency distribution.

A cumulative frequency distribution is a tabulation of the frequency of all measurements at or smaller than a given score value.

The fifth column in Table 2.2 is the cumulative frequency distribution for the aggressiveness scores. The cumulative frequency of a score value is the frequency of that score value plus the frequency of all smaller score values. The cumulative frequency of a score value of zero (very timid) is 2. The cumulative frequency of a score value of 1 (timid) is obtained by adding 3 (the frequency of timid) and 2 (the frequency of very timid) to get 5. Note that the cumulative frequency of the largest score value (5) equals 20, the total number of observations. This must be the case, because cumulative frequency is the frequency of all observations at or smaller than a given score value, and all of the observations must be at or smaller than the largest score value. Also, note that the cumulative frequencies can never decrease when going from the lowest to the highest score value. The reason is that the cumulative frequency of the next higher score value is always obtained by adding to the lower cumulative frequency.
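A running sum captures this idea directly. The following Python sketch (the names are illustrative) accumulates the frequencies from Table 2.2 and checks the two properties just described.

```python
from itertools import accumulate

# Frequencies from Table 2.2 for score values 0, 1, 2, 3, 4, 5 (lowest to highest).
frequency = [2, 3, 4, 8, 3, 0]

# Each cumulative frequency is the frequency at that score value
# plus the frequencies of all smaller score values.
cumulative = list(accumulate(frequency))
print(cumulative)  # [2, 5, 9, 17, 20, 20]

# The cumulative frequency of the largest score value equals the number of observations.
assert cumulative[-1] == sum(frequency)

# Cumulative frequencies never decrease from the lowest to the highest score value.
assert all(a <= b for a, b in zip(cumulative, cumulative[1:]))
```

Because frequencies cannot be negative, each step of the running sum can only stay the same or increase, which is the reason the cumulative column never decreases.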

The notion of "at or smaller" implies that the score values can be ordered so that we can determine what is smaller. Thus, cumulative frequency distributions are usually not appropriate for nominal data.

A cumulative relative frequency distribution is a tabulation of the relative frequencies of all measurements at or below a given score value.

The last column in Table 2.2 lists the cumulative relative frequencies for the aggressiveness data. These numbers are obtained by adding up the relative frequencies of all score values at or smaller than a given score value.
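One way to sketch the cumulative relative frequencies in Python is shown below. Note one deliberate departure from the wording above: instead of adding up the relative frequencies, the sketch divides each cumulative frequency by the number of observations. The two routes give the same values, and dividing last avoids accumulating small floating-point errors. The variable names are illustrative.

```python
from itertools import accumulate

# Frequencies from Table 2.2 for score values 0 through 5.
frequency = [2, 3, 4, 8, 3, 0]
n = sum(frequency)

# Cumulative frequency at each score value.
cumulative = list(accumulate(frequency))

# Cumulative relative frequency = cumulative frequency / number of observations.
cumulative_relative = [c / n for c in cumulative]

print([round(cr, 2) for cr in cumulative_relative])  # [0.1, 0.25, 0.45, 0.85, 1.0, 1.0]
```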

Cumulative frequency distributions are most often used when computing percentiles. We shall postpone further discussion of these distributions until that section of the chapter.

GROUPED FREQUENCY DISTRIBUTIONS

The aggressiveness data were particularly amenable to description by frequency distributions in part because there were only a few score values. Sometimes, however, the data are not so accommodating, and a more sophisticated approach is called for.

Consider, for example, the first 60 measurements on the YRSMK variable in the Smoking Study (Table 2.3). Because the measurements are variable, it is difficult to learn anything from the data as presented in this table.

The frequency distribution is presented in Table 2.4. As you can see, the frequency distribution for these data does not provide a very useful summary of the data. The problem is that there are too many different score values.

TABLE 2.3
YRSMK: Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

 5   13   17   20   19   35   21   28    3   22
26   13   30   30   30   32   40   27   14    4
27   33   28   45   29   25   38   35   33   39
 5    4   20   24   25   27   16   25   38    9
36   20   18   11   12   23   22   27   32   49
22   30    0   32    4   23    9   29   22   23



The solution is to group the data into clusters called class intervals.

A class interval is a range of score values. A grouped frequency distribution is a tabulation of the number of measurements in each class interval.

The grouped frequency distribution is presented in Table 2.5. The class intervals are listed on the left. The lowest interval, 0-4, contains all of the measurements between (and including) 0 and 4. The next interval, 5-9, contains the measurements between 5 and 9, and so on.

Clearly, the data in the grouped distribution are much more easily interpreted than when the data are ungrouped. We can now see that most of the people in this sample have been smoking for 20-30 years, although there are a few who have been smoking for more than 45 years and a few who have been smoking only a couple of years.

Relative and cumulative frequency distributions can also be formed from grouped data. Relative frequencies are formed by dividing the frequency in each class interval by the total number of measurements. Cumulative distributions are formed by adding up the frequencies (or relative frequencies) of all class intervals at or below a given class interval. These distributions are also given in Table 2.5.
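The grouping just described can be sketched in Python using the YRSMK scores from Table 2.3. This is an illustration only: the interval width of 5 and starting value of 0 are taken from Table 2.5, and the variable names are arbitrary.

```python
from itertools import accumulate

# YRSMK scores from Table 2.3 (years smoking daily, first 60 participants).
yrsmk = [
    5, 13, 17, 20, 19, 35, 21, 28, 3, 22,
    26, 13, 30, 30, 30, 32, 40, 27, 14, 4,
    27, 33, 28, 45, 29, 25, 38, 35, 33, 39,
    5, 4, 20, 24, 25, 27, 16, 25, 38, 9,
    36, 20, 18, 11, 12, 23, 22, 27, 32, 49,
    22, 30, 0, 32, 4, 23, 9, 29, 22, 23,
]

width = 5   # class interval size used in Table 2.5
start = 0   # lower bound of the first class interval

# Tabulate the number of measurements in each class interval (bounds inclusive).
grouped = {}
for lower in range(start, max(yrsmk) + 1, width):
    upper = lower + width - 1
    grouped[(lower, upper)] = sum(lower <= x <= upper for x in yrsmk)

n = len(yrsmk)
frequencies = list(grouped.values())
relative = [f / n for f in frequencies]            # frequency / total measurements
cumulative = list(accumulate(frequencies))         # running sum of frequencies
cumulative_relative = [c / n for c in cumulative]  # running sum as a proportion

for (lower, upper), f, r, c, cr in zip(grouped, frequencies, relative,
                                       cumulative, cumulative_relative):
    print(f"{lower:2d}-{upper:2d}  {f:2d}  {r:.3f}  {c:2d}  {cr:.3f}")
```

The printed rows should match the columns of Table 2.5, with the relative frequencies rounded to three decimal places.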

Constructing Grouped Distributions

TABLE 2.4
Frequency Distribution of First 60 YRSMK Scores

Score Value  Frequency    Score Value  Frequency    Score Value  Frequency
0            1            17           1            34           0
1            0            18           1            35           2
2            0            19           1            36           1
3            1            20           3            37           0
4            3            21           1            38           2
5            2            22           4            39           1
6            0            23           3            40           1
7            0            24           1            41           0
8            0            25           3            42           0
9            2            26           1            43           0
10           1            27           4            44           0
11           1            28           2            45           1
12           0            29           2            46           0
13           2            30           4            47           0
14           1            31           0            48           0
15           0            32           3            49           1
16           1            33           2

A grouped frequency distribution should summarize the data without distorting them. Summarization is accomplished by forming class intervals; if the intervals are inappropriate (for example, too big), however, the data are distorted. As an example of distortion, Table 2.6 summarizes the YRSMK data from Table 2.3 using three large intervals. Indeed, the data are summarized, but important information regarding how the measurements are distributed is lost. The following steps should be used to construct grouped distributions that summarize but do not distort.

Guidelines for grouped frequency intervals:

 1. There should be between 8 and 15 intervals.
 2. Use convenient class interval sizes, like 2, 3, 5, or multiples of 5.
 3. Start the first interval at or below your lowest score.

To construct a good grouped frequency distribution:

 1. Compute the range of your scores by subtracting the lowest score from the highest score (Range = High Score - Low Score).
 2. Divide the range by 8 and 15. Find a convenient number in between those two values. That will be your class interval. This is also known as your "bin width."
 3. Select a starting value. The starting value could be your lowest score, but if your class interval is a multiple of 5, then you may want to select a more convenient, and hence lower, starting point. For example, the intervals 0-9, 10-19, 20-29, etc., work very well if you have determined that a class interval of 10 is appropriate. If your lowest score is 3, the intervals 3-12, 13-22, 23-32, etc., do not seem as intuitive as 0-9, 10-19, 20-29, etc. (or 1-10, 11-20, 21-30, etc.).
 4. Beginning with your starting value, construct intervals of increasing value.
 5. Count the number (frequency) of scores in each interval.

TABLE 2.5
Grouped Frequency Distributions for YRSMK Scores in Table 2.3

Class                  Relative    Cumulative   Cumulative Relative
Interval   Frequency   Frequency   Frequency    Frequency
0-4        5           .083        5            .083
5-9        4           .067        9            .150
10-14      5           .083        14           .233
15-19      4           .067        18           .300
20-24      12          .200        30           .500
25-29      12          .200        42           .700
30-34      9           .150        51           .850
35-39      6           .100        57           .950
40-44      1           .017        58           .967
45-49      2           .033        60           1.00
Total      60          1.000

TABLE 2.6
Grouped Frequency Distributions for YRSMK Scores in Table 2.3

Interval   f (YRSMK)
0-19       18
20-39      39
40-59      3
Total      60

One other step is needed when the measurements contain decimals instead of whole numbers. In these cases, all of the measurements should be rounded so that they have the same number of decimal places.

These steps were used to construct the grouped frequency distribution in Table 2.5. For Step 1, the range was computed as 49 (49 - 0). For Step 2, the range was divided by 8 (49/8 = 6.125) and 15 (49/15 = 3.267), and a convenient number in between those two (5) was selected as the class interval. For Step 3, because the lowest score was 0, the starting value was set at 0. For Step 4, starting with 0, consecutive intervals of width 5 were constructed: 0-4, 5-9, 10-14, etc.
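Steps 1 and 2 can be sketched as a small Python helper. The function names and the menu of "convenient" widths below are assumptions made for illustration (the guidelines mention 2, 3, 5, or multiples of 5 but do not give a fixed list); the final choice among candidates is still a judgment call.

```python
def width_limits(lowest, highest, min_intervals=8, max_intervals=15):
    """Return the narrowest and widest class widths allowed by the 8-to-15 guideline."""
    score_range = highest - lowest                                   # Step 1: the range
    return score_range / max_intervals, score_range / min_intervals  # Step 2: divide by 15 and 8


# Hypothetical menu of convenient class interval sizes: 2, 3, 5, and some multiples of 5.
CONVENIENT_WIDTHS = [2, 3, 5, 10, 15, 20, 25]


def convenient_widths(lowest, highest):
    """List the convenient widths that fall between the two limits."""
    narrowest, widest = width_limits(lowest, highest)
    return [w for w in CONVENIENT_WIDTHS if narrowest <= w <= widest]


# YRSMK scores: lowest score 0, highest score 49.
narrowest, widest = width_limits(0, 49)
print(round(narrowest, 3), round(widest, 3))  # 3.267 6.125
print(convenient_widths(0, 49))               # [5]
```

For the YRSMK data the only convenient size between the two limits is 5, which agrees with the class interval used in Table 2.5.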

Note that the interval 5-9 includes the five score values 5, 6, 7, 8, and 9. Thus, the interval size really is 5, even though the difference between 9 and 5 is 4.

Once the lowest interval is specified, the remaining class intervals are easily constructed. Each successive interval is formed by adding the interval size (5) to the bounds of the preceding interval. For example, the interval 10-14 was obtained by adding 5 to both the lower and upper bounds of the interval 5-9. Finally, tabulate the number of measurements within each interval to construct the frequency distribution.
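The construction of successive intervals for whole-number scores can be sketched as follows. The function name and its parameters are illustrative assumptions, not notation from the book.

```python
def class_intervals(start, width, highest):
    """Build (lower, upper) class intervals for whole-number scores.

    Each successive interval is formed by adding the interval size
    to both bounds of the preceding interval.
    """
    intervals = []
    lower, upper = start, start + width - 1   # e.g., 0-4 for start 0 and width 5
    while lower <= highest:
        intervals.append((lower, upper))
        lower, upper = lower + width, upper + width
    return intervals


intervals = class_intervals(start=0, width=5, highest=49)
print(intervals[:3])   # [(0, 4), (5, 9), (10, 14)]
print(intervals[-1])   # (45, 49)

# The interval 5-9 contains five score values even though 9 - 5 is 4.
lower, upper = intervals[1]
values_in_interval = list(range(lower, upper + 1))
print(values_in_interval)  # [5, 6, 7, 8, 9]
```

The upper bound of the first interval is the starting value plus the width minus 1, which is what makes an interval of size 5 run from 5 to 9 rather than from 5 to 10.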

For a second example of grouping, consider the data in Table 2.7. The 60 measurements in this table are from the Smoking Study. Each measurement is a participant's score on the Wisconsin Inventory of Smoking Dependence Motives (WISDM), which consists of ratings on 65 questions such as "Does smoking make a good mood better?"

TABLE 2.7
First 60 Scores on the Wisconsin Inventory of Smoking Dependence Motives (WISDM)

52.9952   60.7071   53.2262   82.0333   65.9119   59.7071
44.4405   38.3167   62.2333   39.6786   46.2762   52.1119
68.4571   66.4476   28.2667   60.6667   50.0857   44.5690
33.6786   55.1119   21.8190   27.6929   53.1310   36.5500
57.4857   63.9262   60.0548   50.8071   61.2405   66.5810
55.3071   28.4643   43.9143   67.8524   54.7310   52.9429
60.8405   60.7238   51.0786   35.5071   54.2524   65.5429
60.1310   78.9357   65.1976   32.4833   51.2381   48.5786
62.5905   80.6071   54.0476   68.8190   52.1738   55.4214
61.4619   53.3571   35.8976   59.3190   68.1143   62.9429



Because these measurements contain decimals, we begin by making sure that all have the same number of decimal places, as they do. For Step 1, we compute the range of scores by subtracting the lowest score (21.8190) from the highest score (82.0333) to arrive at 60.2143.

In Step 2, we divide the range by 8 and 15: 60.2143/8 = 7.526788 and 60.2143/15 = 4.014287. We choose a number between these two results, preferably a multiple of 2, 3, or 5. We might choose 5.6901, which is a number between 7.526788 and 4.014287, and is divisible by 3, but 5.6901 will not serve as a convenient class interval. Rather, 5 is between 7.526788 and 4.014287, divisible by 5 (obviously), and convenient.

For the third step, the starting value could be set at the lowest score, 21.8190, but 20.0000 seems more intuitive. The first interval, therefore, will be 20.0000-24.9999, the next 25.0000-29.9999, and so on. The final step is to tabulate the number of measurements in each interval to obtain the frequency distribution, and then divide each frequency by the total number of observations to obtain the relative frequency distribution.
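The same grouping can be sketched in Python for the WISDM scores in Table 2.7. This is an illustrative sketch: the width of 5 and starting value of 20 come from the steps above, and each score is assigned to an interval by floor division, which treats the interval 20.0000-24.9999 as "at least 20 and less than 25."

```python
# WISDM scores from Table 2.7 (first 60 participants).
wisdm = [
    52.9952, 60.7071, 53.2262, 82.0333, 65.9119, 59.7071,
    44.4405, 38.3167, 62.2333, 39.6786, 46.2762, 52.1119,
    68.4571, 66.4476, 28.2667, 60.6667, 50.0857, 44.5690,
    33.6786, 55.1119, 21.8190, 27.6929, 53.1310, 36.5500,
    57.4857, 63.9262, 60.0548, 50.8071, 61.2405, 66.5810,
    55.3071, 28.4643, 43.9143, 67.8524, 54.7310, 52.9429,
    60.8405, 60.7238, 51.0786, 35.5071, 54.2524, 65.5429,
    60.1310, 78.9357, 65.1976, 32.4833, 51.2381, 48.5786,
    62.5905, 80.6071, 54.0476, 68.8190, 52.1738, 55.4214,
    61.4619, 53.3571, 35.8976, 59.3190, 68.1143, 62.9429,
]

width = 5.0    # class interval chosen in Step 2
start = 20.0   # starting value chosen in Step 3

# Number of intervals needed to reach the highest score.
n_intervals = int((max(wisdm) - start) // width) + 1

# Tabulate: floor division gives the index of the interval containing each score.
counts = [0] * n_intervals
for score in wisdm:
    counts[int((score - start) // width)] += 1

# Relative frequency = frequency / total number of observations.
relative = [c / len(wisdm) for c in counts]

for i, r in enumerate(relative):
    lower = start + i * width
    print(f"{lower:.4f}-{lower + width - 0.0001:.4f}  {r:.3f}")
```

The result is 13 class intervals, which falls within the guideline of 8 to 15, and the rounded relative frequencies should agree with Table 2.8.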

As you can tell from Table 2.8, these data are very interesting. The distribution appears "top heavy." In other words, more than half of the scores are greater than 50. This may not be unexpected, though, for it is a measure of smoking motives, and smokers (which all the participants in the study are) may have many motives to smoke. Nevertheless, these data may be important to the study's designers because they can show that their participants were highly motivated to smoke, as opposed to participants who weren't motivated to smoke. In the end, the study's authors, if the experiment is successful, can claim that their intervention works for people highly motivated to smoke.
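The "top heavy" observation can be checked by adding up the relative frequencies in Table 2.8 for the class intervals that begin at 50 or higher. The Python sketch below keys each relative frequency by the lower bound of its interval; that representation is an assumption made for brevity.

```python
# Relative frequencies from Table 2.8, keyed by the lower bound of each class interval.
table_2_8 = {
    20: 0.017, 25: 0.050, 30: 0.033, 35: 0.083, 40: 0.050, 45: 0.033,
    50: 0.233, 55: 0.100, 60: 0.200, 65: 0.150, 70: 0.000, 75: 0.017, 80: 0.033,
}

# Proportion of scores in intervals starting at 50 or above.
share_50_plus = sum(r for lower, r in table_2_8.items() if lower >= 50)
print(round(share_50_plus, 3))  # 0.733
```

Roughly 73% of the scores fall in the intervals from 50.0000 upward, which supports the statement that more than half of the scores are greater than 50.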

TABLE 2.8
Relative Frequency Distribution for WISDM Scores (First 60 Subjects)

Class Interval      Relative Frequency
20.0000-24.9999     0.017
25.0000-29.9999     0.050
30.0000-34.9999     0.033
35.0000-39.9999     0.083
40.0000-44.9999     0.050
45.0000-49.9999     0.033
50.0000-54.9999     0.233
55.0000-59.9999     0.100
60.0000-64.9999     0.200
65.0000-69.9999     0.150
70.0000-74.9999     0.000
75.0000-79.9999     0.017
80.0000-84.9999     0.033
Total               1.000



GRAPHING FREQUENCY DISTRIBUTIONS

Displaying a frequency distribution as a graph can highlight important features of the data. Graphs of frequency distributions are always drawn using two axes.

The abscissa or x-axis is the horizontal axis. For frequency and relative frequency distributions, the abscissa is marked in units of the variable being measured, and it is labeled with the variable's name. The ordinate or y-axis is the vertical axis. It is marked in units of frequency or relative frequency, and so labeled.

In Figure 2.1, the abscissa is labeled with values of the aggressiveness variable for the distribution in Table 2.2. The ordinate is marked to represent relative frequency of the measurements. Techniques for graphing frequency and relative frequency distributions are almost exactly the same. The only difference is in how the ordinate is marked. Because relative frequency is generally more useful than raw frequency, the examples that follow are for relative frequency distributions.

Histograms

Figure 2.1 is a relative frequency histogram for the aggressiveness data.

A relative frequency histogram uses the heights of bars to represent relative frequencies of score values (or class intervals).

FIGURE 2.1
Relative frequency histogram for the aggressiveness scores in Table 2.2.
[Figure not reproduced. Abscissa: aggressiveness score, 0 to 5. Ordinate: relative frequency, marked at 0.10, 0.20, 0.30, and 0.40.]


To construct the histogram, place a bar over each score value. The bar extends up to the appropriate frequency mark on the ordinate. Thus, a bar's height is a visual analogue of the score value's relative frequency: the higher the bar, the greater the relative frequency.
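As a rough text-only stand-in for a drawn histogram, the Python sketch below prints one bar per score value for the aggressiveness data, with one "#" for every 5 percentage points of relative frequency. Two simplifications are assumed here: the bars run horizontally rather than vertically, and the relative frequencies from Table 2.2 are stored as whole percentages to keep the bar lengths exact.

```python
# Relative frequencies from Table 2.2, expressed as whole percentages.
percent = {0: 10, 1: 15, 2: 20, 3: 40, 4: 15, 5: 0}

UNIT = 5  # percentage points represented by each '#'

# The length of each bar is proportional to the score value's relative frequency.
bars = {value: "#" * (p // UNIT) for value, p in percent.items()}

for value, bar in bars.items():
    print(f"{value} | {bar} {percent[value] / 100:.2f}")
```

The longest bar belongs to the score value 3, and the score value 5 has no bar at all because its relative frequency is zero.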

Relative frequency histograms can also be drawn for grouped distributions. For these distributions, a bar is placed over each class interval.

Figure 2.2 is a relative frequency histogram of the YRSMK scores in Table 2.5. Sometimes, only the midpoints of each interval are shown on the abscissa. The midpoint of a class interval is the average of the interval's lower bound and the upper bound. Again, the height of each bar corresponds to its relative frequency.
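The midpoint calculation can be sketched in Python for the class intervals of Table 2.5. The list of intervals is rebuilt here from the width and starting value used in that table; the variable names are illustrative.

```python
# Class intervals from Table 2.5: 0-4, 5-9, ..., 45-49.
intervals = [(lower, lower + 4) for lower in range(0, 50, 5)]

# Midpoint = average of the interval's lower bound and upper bound.
midpoints = [(lower + upper) / 2 for lower, upper in intervals]

print(midpoints)  # [2.0, 7.0, 12.0, 17.0, 22.0, 27.0, 32.0, 37.0, 42.0, 47.0]
```

Successive midpoints are 5 apart, the same as the class interval size, so labeling the abscissa with midpoints preserves the spacing of the intervals.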

The relative frequency histogram illustrated in Figure 2.2 makes particularly clear some of the salient characteristics of the distribution. For example, it is easy to see that most of the scores are in the middle of the distribution and that there is a decrease in frequency from the moderate scores to the higher scores.

Frequency Polygons

Figure 2.3 is an example of a relative frequency polygon using the WISDM scores in Table 2.8. The axes of a relative frequency polygon are the same as for a histogram. However, instead of placing a bar over each midpoint (or score value), a dot is placed over
