Quantifying central tendency and...
TRANSCRIPT
-
Quantifying central tendency and variability
-
Outline for today
Big Data Baseball – chapter 3
Better know a player: Bo Jackson
Review:
• Plotting categorical and quantitative data
More descriptive statistics:
• Measures of central tendency for quantitative data
• Measures of variability
Worksheet 2: due midnight Wednesday Feb 15th
-
DataFest 2017
• March 31st to April 2nd
Amherst is having pre-DataFest workshops
• First one is at 7:30 on February 15th
If you are interested in participating in DataFest, let me know
-
Big Data Baseball Chapter 3
• Thoughts?
-
Better know a player
Bo Jackson
-
Review
-
Categorical and quantitative data
Descriptive statistics describe the sample of data you have
Categorical variables: fall into distinct categories
E.g., team (Red Sox, Yankees, Mets, etc.)
Quantitative variables: numerical data
E.g., number of home runs
-
Categorical variables: proportion
The proportion of a category is found by:
Proportion of category = (Number in that category) / (Total number)
Example: proportion of hits that are home runs
160 hits total
30/160 = .1875
             1B    2B    3B    HR
Count        90    38     2    30
Proportion 0.56  0.24  0.01  0.19
> hit.types <- c(90, 38, 2, 30)
> hit.types/sum(hit.types)
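A minimal sketch of the same calculation with labeled categories. The counts come from the table above; the category names attached to the vector and the use of prop.table() as a shortcut are illustrative choices, not something the slide specifies.

```r
# Hit counts from the table above, labeled by hit type
hit.types <- c("1B" = 90, "2B" = 38, "3B" = 2, "HR" = 30)

# Proportion of each category = count in category / total count
proportions <- hit.types / sum(hit.types)

# prop.table() performs the same division in one call
same.proportions <- prop.table(hit.types)

# Rounded to two decimals, as in the table
round(proportions, 2)
```

Proportions computed this way always sum to 1, which is a quick sanity check on the counts.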
-
Plotting categorical data
R: barplot(x)
R: pie(x)
-
Describing quantitative variables
R: stem(x)
R: hist(x)
-
hist(player.data.2013$HR, n = 30, xlab = "HR", main = "Histogram of HRs for 2013 players with over 300 PA")
Observations about the distribution?
-
Dotplot for individuals’ HR in 2013
When we have discrete data, we can also use dotplots to get a sense of how the data is distributed
-
Dotplot for individuals’ HR in 2013
One way to get a sense of the shape of a distribution is to use a dotplot
R: mosaic::dotPlot(x)
Miguel Cabrera
Chris Davis
-
Common shapes for distributions
-
Statistics measuring the center of a distribution
Graphs are useful for visualizing data to get a sense of what the data look like
We can also summarize data numerically
A numerical summary (function) of a sample is called a statistic
Two important statistics that can be used to describe the center of the data are the mean and the median.
-
The mean
Mean = (Sum of all data values) / (Number of data values)
Mean = (x_1 + x_2 + x_3 + … + x_n) / n = (Σ x_i) / n
R: mean(x)
If your data has missing values use:
mean(x, na.rm = TRUE)
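A short sketch of the formula next to the built-in function. The values in x and y are made up for illustration and are not from the course data.

```r
# Made-up values, for illustration only
x <- c(2, 4, 6, 8)

# Mean by the formula: sum of all data values / number of data values
mean.by.hand <- sum(x) / length(x)

# The built-in function gives the same answer
mean.builtin <- mean(x)

# With a missing value, mean() returns NA unless na.rm = TRUE is set
y <- c(2, 4, NA, 8)
mean.with.na <- mean(y)
mean.na.removed <- mean(y, na.rm = TRUE)  # averages the 3 non-missing values
```

Note that na.rm = TRUE changes the denominator as well: the missing value is dropped, so the sum is divided by 3 rather than 4.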
-
Mean number of games played (G)
Can you calculate the mean number of games played for players who had 300 plate appearances in 2013?
> players.2013 <- ...
> mean(players.2013$G)
Do you think the mean number of games played would be higher if we calculated it from only players who had 500 plate appearances?
-
Sample vs. population mean
The mean for a sample is denoted x̄ (pronounced “x-bar”)
The mean for a population is denoted μ, which is the Greek letter “mu”
μ
x̄
-
Give the proper notation: μ vs. x̄?
We randomly select 50 baseball players and take their mean height?
We look at all professional baseball players and take their mean height?
-
The median
The median of a data set of size n is
• If n is odd: The middle value of the sorted data
• If n is even: The average of the middle two values of the sorted data
The median splits the data in half
R: median(x)
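The odd/even rule above can be sketched as a small function. The name manual.median and the example values are hypothetical, chosen only to show both cases; in practice median(x) is what you would call.

```r
# Hypothetical helper that follows the odd/even rule directly
manual.median <- function(x) {
  s <- sort(x)      # the rule applies to the sorted data
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # n odd: the middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])   # n even: average of the middle two values
  }
}

odd.data  <- c(7, 1, 5)       # sorted: 1 5 7
even.data <- c(7, 1, 5, 3)    # sorted: 1 3 5 7

manual.median(odd.data)
manual.median(even.data)
```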
-
Resistance
We say that a statistic is resistant if it is relatively unaffected by extreme values (outliers).
The median is resistant while the mean is not
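A quick way to see resistance is to add one extreme value and watch how each statistic moves. The numbers below are made up for illustration (think of them as salaries in millions of dollars); they are not real data.

```r
# Made-up salaries in $ millions, for illustration only
salaries <- c(0.5, 0.6, 0.8, 0.9, 1.0)

# The same group plus one extreme value (an outlier)
with.star <- c(salaries, 20)

# How far does each statistic move when the outlier is added?
mean.shift   <- mean(with.star) - mean(salaries)
median.shift <- median(with.star) - median(salaries)

mean.shift     # large: the mean is pulled toward the outlier
median.shift   # small: the median barely moves
```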
-
Football player salary examples
Some NFL football players are paid a lot more than others (star quarterbacks can be paid more than $20 million)
Mean NFL salary = $1.87 million
Median NFL salary = $838,000
Mean and median salary for all US workers?
Distribution of salaries for US workers?
-
Which is the mean A or B?
A B
-
Summary statistics quantifying the spread of quantitative variables
-
Standard deviation
R: sd(x)
-
In class worksheet: computing the standard deviation
Number of home runs David Ortiz had in the last 11 seasons:
54 35 23 28 32 29 23 30 35 37 38
Values  Deviations  Squared Deviations
54      20.91       437.19
35      1.91        3.64
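The worksheet columns can be sketched in R as below. This assumes the usual sample standard deviation, which divides the sum of squared deviations by n − 1; that is the formula R's sd() uses, but the transcript does not show the slide's formula. The variable names are illustrative.

```r
# David Ortiz home runs over the 11 seasons listed above
ortiz.hr <- c(54, 35, 23, 28, 32, 29, 23, 30, 35, 37, 38)

# Deviation of each value from the mean, and its square
deviations <- ortiz.hr - mean(ortiz.hr)
squared.deviations <- deviations^2

# Worksheet-style table, rounded to two decimals
worksheet <- data.frame(
  Values = ortiz.hr,
  Deviations = round(deviations, 2),
  Squared.Deviations = round(squared.deviations, 2)
)
print(worksheet)

# Assumed formula: sqrt(sum of squared deviations / (n - 1))
manual.sd <- sqrt(sum(squared.deviations) / (length(ortiz.hr) - 1))

# Should agree with the built-in function
manual.sd
sd(ortiz.hr)
```

The first two rows of the printed table should match the rows already filled in on the worksheet.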