p8130: biostatistical methods i · 2017-11-29 · measures of location: median • compared to the...
TRANSCRIPT
P8130: Biostatistical Methods I
Lecture 2: Descriptive Statistics
Cody Chiuzan, PhD
Department of Biostatistics
Mailman School of Public Health (MSPH)
Lecture 1: Recap
• Intro to Biostatistics
• Types of Data
• Study Designs
Descriptive Statistics
• The collection and presentation of the data through graphical and numerical displays
• Look for patterns in the data and summarize information
• Measures of location
• Measures of dispersion
• Graphical display
Measures of Location
• Measures of location, or central tendency, indicate the center of the data
• Mean (average)
• Median (the 50th percentile)
• Mode
Measures of Location: Mean
Definition: the arithmetic mean is the sum of all observations divided by the number of observations
The sample mean for a sample of n observations is given by:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
The sample mean is used to estimate the population mean μ, which is typically unknown
Measures of Location: Mean
• The most commonly used measure of location
• Overly sensitive to outliers (unusual observations), thus not recommended if the data are skewed
• Not appropriate for nominal or categorical variables
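A minimal sketch of the sample mean and its sensitivity to a single outlier. The data values and the helper name `sample_mean` are invented for illustration and do not come from the lecture.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    if not values:
        raise ValueError("sample_mean requires at least one observation")
    return sum(values) / len(values)

# Hypothetical data, invented for illustration only.
typical = [4, 5, 5, 6, 7]
with_outlier = typical + [60]   # one unusual observation added

mean_typical = sample_mean(typical)             # 27 / 5 = 5.4
mean_with_outlier = sample_mean(with_outlier)   # 87 / 6 = 14.5
```

One extreme value moves the mean from 5.4 to 14.5, which is larger than five of the six observations.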
Measures of Location: Median
Definition: The sample median is computed as:
1. If n is odd, the median is the $\left(\frac{n+1}{2}\right)$th largest item in the sample
2. If n is even, the median is the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2} + 1\right)$th largest items
Example:
Given n = 7 (odd) total sample observations, the median is the $\frac{7+1}{2} = 4$th largest item
Given n = 10 (even) total sample observations, the median is the average of the $\frac{10}{2} = 5$th and $\frac{10}{2} + 1 = 6$th largest items
Measures of Location: Median
• Compared to the mean, the median is not affected by every value in the data set, so it is far less sensitive to outliers
• The median is defined as the middle value, or the 50th percentile
• This means that at least half of the data are less than or equal to it, and at least half are greater than or equal to it
• Median calculation starts by first ordering the data (increasing order)
• Appropriate measure for ordinal data
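The odd/even rule for the median can be sketched as follows. The helper name `sample_median` and both data sets are illustrative assumptions, not part of the lecture.

```python
def sample_median(values):
    """Median by the odd/even rule, after sorting in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample_median requires at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)-th item, which is index mid when counting from 0
        return ordered[mid]
    # n even: average of the (n / 2)-th and (n / 2 + 1)-th items
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical data, invented for illustration only.
odd_sample = [9, 2, 7, 4, 5, 1, 8]               # n = 7, sorted: 1 2 4 5 7 8 9
even_sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 35]    # n = 10, sorted: 1 1 2 3 4 5 5 6 9 35

median_odd = sample_median(odd_sample)     # 4th item = 5
median_even = sample_median(even_sample)   # average of 5th and 6th items = (4 + 5) / 2 = 4.5

# Making the largest value ten times bigger does not move the median.
median_extreme = sample_median(even_sample[:-1] + [350])
```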
Other Measures of Location
Percentiles: the median is the 50th percentile
• In general: the kth percentile is a value such that at most k% of the data are smaller than it and at most (100-k)% are larger
• Deciles: 10th, 20th, 30th, …
• Quartiles: 25th (Q1), 50th, 75th (Q3)
• Question: what does it mean if your GRE score is in the 90th percentile?
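A rough sketch of percentiles in Python. The scores are invented and `percent_at_or_below` is a hypothetical helper. Software packages use several slightly different percentile conventions, so the "inclusive" method of the standard library's `statistics.quantiles` used here is one assumed choice and may not match the textbook definition exactly.

```python
import statistics

def percent_at_or_below(values, x):
    """Percentage of observations that are less than or equal to x."""
    return 100 * sum(v <= x for v in values) / len(values)

# Hypothetical test scores, invented for illustration only.
scores = [148, 150, 151, 153, 155, 157, 158, 160, 163, 168]

# 9 of the 10 scores are at or below 163, so 163 sits at about the 90th percentile.
standing_of_163 = percent_at_or_below(scores, 163)

# Quartiles (Q1, median, Q3) of the values 1..9 under the "inclusive" convention.
q1, q2, q3 = statistics.quantiles(list(range(1, 10)), n=4, method="inclusive")
```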
Measures of Location: Mode
Definition: the most frequently occurring value in the data
• You can have multiple modes or none (really?)
• Problematic if there is a large number of possible values with infrequent occurrence
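A small sketch of how multiple modes, or no mode, can arise. The helper `modes` is hypothetical, and the rule "no mode when every value occurs exactly once" is an assumed convention. Other conventions exist.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when no value repeats.

    Assumed convention: if several distinct values each occur exactly once,
    the data are reported as having no mode.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Hypothetical data, invented for illustration only.
one_mode = modes([5, 5, 1])             # [5]
two_modes = modes([1, 2, 2, 3, 3, 4])   # [2, 3]
no_mode = modes([1, 2, 3])              # []
```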
Measures of Dispersion
Describe the spread of the data:
• Range
• Inter-quartile range (IQR)
• Variance / Standard deviation
• Coefficient of variation (CV)
Measures of Dispersion
Range: Max - Min
Inter-quartile range: IQR = 75th (Q3) - 25th (Q1)
Since the range depends only on the minimum and maximum values, it can be strongly influenced by extreme observations
Solution? Use the IQR
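A brief sketch comparing the range and the IQR when one value becomes extreme. The data are invented, the helper names are hypothetical, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which is an assumption and may differ slightly from the textbook rule.

```python
import statistics

def data_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Inter-quartile range: Q3 minus Q1 (inclusive quartile convention assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data, invented for illustration only.
clean = [2, 4, 5, 7, 8, 10, 11, 13, 14]
extreme = [2, 4, 5, 7, 8, 10, 11, 13, 140]   # largest value replaced by an extreme one

range_clean, range_extreme = data_range(clean), data_range(extreme)   # 12 and 138
iqr_clean, iqr_extreme = iqr(clean), iqr(extreme)                     # 6.0 and 6.0
```

The range jumps from 12 to 138, while the IQR stays at 6 because Q1 and Q3 do not depend on the largest value.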
Measures of Dispersion
Population Variance is the average squared deviation from the mean:
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$
Population Standard Deviation is just the square root of the variance:
$\sigma = \sqrt{\sigma^2}$
These values are often unknown, and then we refer back to the sample…
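Measures of Dispersion
Sample Variance is (approximately) the average squared deviation from the sample mean, with divisor n - 1 instead of n:
$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
Sample Standard Deviation is just the square root of the sample variance:
$s = \sqrt{s^2}$
Lots of changes in notation and also in the formula!!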
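To make the change of divisor concrete, here is a minimal sketch that computes both variances by hand. The data and the function names are illustrative assumptions. The last line cross-checks against the standard library's `statistics.variance`, which uses the n - 1 divisor.

```python
import math
import statistics

def population_variance(values):
    """Average squared deviation from the mean (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

# Hypothetical data, invented for illustration only (mean = 5, squared deviations sum to 32).
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma2 = population_variance(data)   # 32 / 8 = 4.0
s2 = sample_variance(data)           # 32 / 7, about 4.57
sigma = math.sqrt(sigma2)            # 2.0
s = math.sqrt(s2)

agrees_with_stdlib = math.isclose(s2, statistics.variance(data))
```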
Measures of Dispersion
Mean and standard deviation are the most used measures of location and spread. Why? It's all about the…
Property: linear transformations affect these measures in a predictable way
Let $Y = cX + b$ be a linear transformation of a variable X
Mean: $\bar{y} = c\bar{x} + b$
Standard deviation: $s_Y = |c| \, s_X$ (the shift b does not change the spread)
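A quick numerical check of the linear transformation property, sketched with invented temperatures. The Celsius to Fahrenheit conversion (c = 1.8, b = 32) and the helper `linear_transform` are illustrative choices. A second transformation with negative c shows that the standard deviation scales by the absolute value of c.

```python
import math
import statistics

def linear_transform(values, c, b):
    """Apply Y = c * X + b to every observation."""
    return [c * v + b for v in values]

# Hypothetical body temperatures in degrees Celsius, invented for illustration only.
celsius = [36.5, 37.0, 37.2, 38.1, 36.8]

fahrenheit = linear_transform(celsius, c=1.8, b=32.0)
flipped = linear_transform(celsius, c=-2.0, b=0.0)

mean_x = statistics.mean(celsius)
sd_x = statistics.stdev(celsius)   # sample standard deviation (divisor n - 1)

# The mean follows the same linear map; the SD is multiplied by |c| and ignores b.
mean_matches = math.isclose(statistics.mean(fahrenheit), 1.8 * mean_x + 32.0)
sd_matches = math.isclose(statistics.stdev(fahrenheit), 1.8 * sd_x)
sd_uses_abs_c = math.isclose(statistics.stdev(flipped), 2.0 * sd_x)
```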
Measures of Dispersion
Coefficient of Variation (CV) is a measure that relates the standard deviation to the mean.
• Sometimes the variance changes with the mean
• Population: $CV = \frac{\sigma}{\mu} \times 100\%$
• Sample: $CV = \frac{s}{\bar{x}} \times 100\%$
• CV is unitless and can be interpreted as the variability relative to the average
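A short sketch of the sample CV, showing that it does not depend on the measurement unit. The heights and the helper `coefficient_of_variation` are invented for illustration.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample CV in percent: (s / x-bar) * 100."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100

# Hypothetical heights, invented for illustration only.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]   # same data expressed in meters

cv_cm = coefficient_of_variation(heights_cm)   # about 9.3 (percent)
cv_m = coefficient_of_variation(heights_m)     # same value, up to rounding
```

Changing units rescales the mean and the standard deviation by the same factor, so their ratio is unchanged.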
Graphical Display
• A picture is worth a thousand words (sometimes)
• Bar graphs
• Histograms
• Box-plots
• Scatterplots (later in linear regression)
Bar Graph
• Data are divided into groups and frequencies are determined for each group
• Rectangles are constructed with bases of constant width and heights proportional to the frequencies
Histogram
• Numerical values are grouped into measurement classes, defined by equal-length intervals along the numerical scale
• Each value belongs to only one class
• Usually 5-12 classes
• Like the bar graph, this plot has frequencies on the vertical axis
• If the mean > median: right skew
• If the mean < median: left skew
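A minimal sketch of the counting behind a histogram, plus the mean versus median comparison for skew. The data, the choice of 5 classes, and the helpers `class_counts` and `skew_direction` are illustrative assumptions. The skew rule is a rule of thumb and not a formal test.

```python
import statistics

def class_counts(values, num_classes):
    """Count observations in equal-length intervals spanning min..max.

    Each value falls in exactly one class; the maximum is placed in the last class.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no intervals to build")
    width = (hi - lo) / num_classes
    counts = [0] * num_classes
    for v in values:
        index = min(int((v - lo) / width), num_classes - 1)
        counts[index] += 1
    return counts

def skew_direction(values):
    """Rule of thumb: mean > median suggests right skew, mean < median left skew."""
    mean, median = statistics.mean(values), statistics.median(values)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "none indicated"

# Hypothetical data with one large value, invented for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]

counts = class_counts(data, num_classes=5)   # class width (12 - 1) / 5 = 2.2
direction = skew_direction(data)             # mean 3.9 > median 3.0
```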
Box-plot
• The box extends from Q1 (25th percentile) to Q3 (75th percentile)
• The 'whiskers' extend from the smallest to the largest values
• If one of the whiskers is long, it indicates skewness in that direction
• If a data value is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR), then it is considered an outlier and given a separate mark on the boxplot
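The 1.5(IQR) outlier rule can be sketched as below. The data and the helper `boxplot_outliers` are invented for illustration, and the quartiles use the "inclusive" convention of `statistics.quantiles`, which is an assumption. Other quartile conventions can shift the cut-offs slightly.

```python
import statistics

def boxplot_outliers(values):
    """Return (lower cut-off, upper cut-off, flagged values) by the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower or v > upper]
    return lower, upper, flagged

# Hypothetical data, invented for illustration only.
data = [2, 4, 5, 7, 8, 10, 11, 13, 40]

# Q1 = 5, Q3 = 11, IQR = 6, so the cut-offs are 5 - 9 = -4 and 11 + 9 = 20.
lower_cutoff, upper_cutoff, flagged = boxplot_outliers(data)
```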
Readings
Rosner, Fundamentals of Biostatistics, Chapter 2
• Sections: 2.2-2.6
• Sections: 2.9-2.10