stat 206 chapter 2 notes
STAT 206: Chapter 2 (Organizing and Visualizing Variables)
• Methods to Organize and Visualize Variables
  • For Categorical Variables:
    • Summary table; contingency table (2.1)
    • Bar chart, pie chart, Pareto chart, side-by-side bar chart (2.2)
  • For Numerical Variables:
    • (Array), ordered array, frequency distribution, relative frequency distribution, percentage distribution, cumulative percentage distribution (2.3)
    • Stem-and-leaf display, histogram, polygon, cumulative percentage polygon (2.4)
  • Other methods later…
• 2.1 Organizing Categorical Variables
  • Must identify variable type to determine the appropriate organization and visualization tools → Recall Variable Types
    • Categorical (Category)
      • Nominal – Name of a category
      • Ordinal – Has a natural ordering
    • Numerical/Quantitative (Quantity)
      • Discrete – distinct cutoffs between values
      • Continuous – on a continuum
  • Definitions:
    • Summary Table
    • Contingency Table
  • Each response counted/tallied into one and only one category/cell
  • Example (Problem 2.2, p. 40): The following data represent the responses to two questions asked in a survey of 40 college students majoring in business:
    • What is your gender? (M = male; F = female)
    • What is your major? (A = Accounting; C = Computer Information; M = Marketing)
Gender: M M M F M F F M F M
Major:  A C C M A C A A C C
Gender: F M M M M F F M F F
Major:  A A A M C M A A A C
Gender: M M M M F M F F M M
Major:  C C A A M M C A A A
Gender: F M M M M F M F M M
Major:  C C A A A A C C A C
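The tallying rule above (each response goes into one and only one cell) can be sketched in a few lines of Python. This is only an illustration using the 40 responses listed in the example; the variable names are made up for the sketch.

```python
from collections import Counter

# Responses transcribed from the four Gender/Major rows in the example.
gender = ("M M M F M F F M F M "
          "F M M M M F F M F F "
          "M M M M F M F F M M "
          "F M M M M F M F M M").split()
major = ("A C C M A C A A C C "
         "A A A M C M A A A C "
         "C C A A M M C A A A "
         "C C A A A A C C A C").split()

# Each (gender, major) pair is tallied into exactly one cell.
cells = Counter(zip(gender, major))

for g in ("M", "F"):
    row = [cells[(g, m)] for m in ("A", "C", "M")]
    print(g, row, "total:", sum(row))
```

The printed rows should agree with the counts in a Gender by Major contingency table for these data.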
• Now to combine the two variables (Gender and Major):

                        MAJOR CATEGORIES
GENDER       A (Accounting)  C (Computer)  M (Marketing)  TOTALS
Male (M)           14              9             2           25
Female (F)          6              6             3           15
TOTALS             20             15             5           40

• Table based on Total percentages:

                        MAJOR CATEGORIES
GENDER       A (Accounting)  C (Computer)  M (Marketing)  TOTALS
Male (M)
Female (F)
TOTALS

• Table based on Row percentages:

                        MAJOR CATEGORIES
GENDER       A (Accounting)  C (Computer)  M (Marketing)  TOTALS
Male (M)
Female (F)
TOTALS

• Table based on Column percentages:

                        MAJOR CATEGORIES
GENDER       A (Accounting)  C (Computer)  M (Marketing)  TOTALS
Male (M)
Female (F)
TOTALS
• Questions:
  • How many of the surveyed students were females majoring in Marketing?
  • What percentage of the surveyed students were females majoring in Marketing?
  • What percentage of the male students surveyed were majoring in Computer?
  • Of the students majoring in Accounting, what percentage was male?
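The three percentage tables differ only in the denominator: the grand total, the row total, or the column total. A minimal sketch, assuming the cell counts from the contingency table above (the helper `pct` and the dictionary layout are illustrative choices, not part of the notes):

```python
# Cell counts from the Gender x Major contingency table in the notes.
counts = {
    "Male":   {"A": 14, "C": 9, "M": 2},
    "Female": {"A": 6,  "C": 6, "M": 3},
}

grand_total = sum(sum(row.values()) for row in counts.values())
row_totals = {g: sum(row.values()) for g, row in counts.items()}
col_totals = {m: sum(counts[g][m] for g in counts) for m in ("A", "C", "M")}

def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Same cells, three different denominators.
total_pct = {g: {m: pct(n, grand_total) for m, n in row.items()}
             for g, row in counts.items()}
row_pct = {g: {m: pct(n, row_totals[g]) for m, n in row.items()}
           for g, row in counts.items()}
col_pct = {g: {m: pct(n, col_totals[m]) for m, n in row.items()}
           for g, row in counts.items()}

print(total_pct["Female"]["M"])  # share of all 40 students
print(row_pct["Male"]["C"])      # share of the 25 male students
print(col_pct["Male"]["A"])      # share of the 20 Accounting majors
```

Which table answers a question depends on the group named in the question: "of the surveyed students" points to total percentages, "of the male students" to row percentages, and "of the students majoring in Accounting" to column percentages.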
Summary Table (Major):
value            frequency  relative frequency  percentage
A (Accounting)       20           0.500            50.0
C (Computer)         15           0.375            37.5
M (Marketing)         5           0.125            12.5
TOTALS               40           1.000           100.0

Summary Table (Gender):
value            frequency  relative frequency  percentage
Male (M)             25           0.625            62.5
Female (F)           15           0.375            37.5
TOTALS               40           1.000           100.0
• 2.2 Visualizing Categorical Variables
  • Pie chart – uses sections of a circle to represent the tallies/frequencies/percentages for each category
  • Bar chart – a series of bars, with each bar representing the tallies/frequencies/percentages for a single category
  • Consider our previous example for Major Category:
    • Pie Chart:
    • Bar Chart:
  • Question: Which major has the lowest concentration of students?
  • Discussion: Preference for type of chart?
[Figure: bar chart, "Percentage by Major Category"; bars for A (Accounting), C (Computer), M (Marketing); percentage axis from 0.0% to 60.0%]
[Figure: pie chart, "Percentage by Major Category"; slices for A (Accounting), C (Computer), M (Marketing)]

Summary Table (Major):
value            frequency  relative frequency  percentage
A (Accounting)       20           0.500            50.0
C (Computer)         15           0.375            37.5
M (Marketing)         5           0.125            12.5
TOTALS               40           1.000           100.0
• Pareto chart – a series of vertical bars showing tallies/frequencies/percentages in descending order
  • Example:
  • Discussion: How or why do you think that a Pareto chart would be useful in the business world?
• Side-by-side bar chart – uses sets of bars to show the joint response from two categorical variables
  • Example:
  • Discussion: What can you determine about product utilization from this side-by-side bar chart that you might not be able to tell otherwise?
[Figure: Pareto chart; frequency axis from 0 to 400]

Summary Table of Causes of Incomplete ATM Transactions
Cause                       Frequency  Percentage
ATM malfunctions                32        4.42%
ATM out of cash                 28        3.87%
Invalid amount requested        23        3.18%
Lack of funds in account        19        2.62%
Card unreadable                234       32.32%
Warped card jammed             365       50.41%
Wrong keystroke                 23        3.18%
TOTAL                          724      100.00%
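The summary table lists the causes in no particular order, while a Pareto chart needs them in descending order of frequency. A rough sketch of that reordering, using the frequencies from the table; the cumulative-percentage column is an addition that Pareto charts often show as a line, which the notes do not mention explicitly.

```python
# Frequencies from the ATM summary table in the notes.
causes = {
    "ATM malfunctions": 32,
    "ATM out of cash": 28,
    "Invalid amount requested": 23,
    "Lack of funds in account": 19,
    "Card unreadable": 234,
    "Warped card jammed": 365,
    "Wrong keystroke": 23,
}

total = sum(causes.values())

# Pareto order: largest frequency first (ties keep their table order).
ordered = sorted(causes.items(), key=lambda item: item[1], reverse=True)

cumulative = 0
pareto_rows = []
for cause, freq in ordered:
    cumulative += freq
    pareto_rows.append((cause, freq,
                        round(100 * freq / total, 2),         # percentage
                        round(100 * cumulative / total, 2)))  # cumulative %

for row in pareto_rows:
    print(row)
```

Reading the first two rows shows how much of the total the top two causes account for, which is the kind of "vital few" comparison the Pareto ordering is meant to make easy.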
[Figure: side-by-side bar chart, "A4256 - Calibration Solution / Chips"; category labels A-Q295 through D-Q296 (groups A, B, C, D, each with Q295, Q395, Q495, Q196, Q296); value axis from 0 to 300,000]
• 2.3 Organizing Numerical Variables
  • Ordered array arranges the values of a numerical variable in rank order (smallest value to largest value): Array → Ordered Array
    • Example (Table 2.8 A & B, p. 42):
  • Frequency Distribution tallies the values of a numerical variable into a set of numerically ordered classes; each class is called a class interval
    • How many classes?
    • Determine the interval width by the following:
    • Using our Meal Cost data, we estimate that we want __________ classes so the interval width is:
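The width formula itself is not reproduced in these notes. A common textbook rule, assumed here, is interval width = (highest value - lowest value) / number of classes, rounded up to a convenient number. The sketch below uses hypothetical values, not the Table 2.8 data, and the function names are illustrative.

```python
import math

def interval_width(lowest, highest, num_classes):
    """Raw class-interval width: range of the data divided by the class count."""
    return (highest - lowest) / num_classes

def round_up_to(value, step):
    """Round `value` up to the next multiple of `step` for a convenient width."""
    return math.ceil(value / step) * step

# Hypothetical values for illustration only; not taken from Table 2.8.
raw = interval_width(lowest=22, highest=85, num_classes=7)
print(raw)                   # 9.0
print(round_up_to(raw, 10))  # 10
```

Rounding up rather than down keeps the chosen number of classes wide enough to cover the whole range of the data.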
• Relative Frequency Distribution presents relative frequency, or proportion of the total, for each group
  • Proportion, or relative frequency, in each group is equal to the number of values in each class divided by the total number of values
  • Example:

                            CITY                                  SUBURBAN
Meal Cost ($)   Frequency  Rel. Frequency  Percentage   Frequency  Rel. Frequency  Percentage
20, but < 30        4                                       4
30, but < 40       10                                      17
40, but < 50       12                                      13
50, but < 60       11                                      10
60, but < 70        7                                       4
70, but < 80        5                                       2
80, but < 90        1                                       0
TOTALS             50            1           100.00%       50            1           100.00%

  • TOTAL of the relative frequency column MUST BE 1.00
  • TOTAL of the percentage column MUST BE 100.00
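The relative frequency and percentage columns follow directly from the frequencies. A minimal sketch, assuming the City and Suburban class frequencies from the table above (the function name is an illustrative choice):

```python
# Class frequencies from the City / Suburban meal cost table in the notes.
classes = ["20 but < 30", "30 but < 40", "40 but < 50", "50 but < 60",
           "60 but < 70", "70 but < 80", "80 but < 90"]
city = [4, 10, 12, 11, 7, 5, 1]
suburban = [4, 17, 13, 10, 4, 2, 0]

def relative_frequencies(freqs):
    """Divide each class frequency by the total number of values."""
    total = sum(freqs)
    return [f / total for f in freqs]

for name, freqs in (("CITY", city), ("SUBURBAN", suburban)):
    rel = relative_frequencies(freqs)
    print(name)
    for label, f, r in zip(classes, freqs, rel):
        print(f"  {label:<12} {f:>3} {r:>6.2f} {100 * r:>6.1f}%")
    # The relative frequencies should total 1.00 (up to floating-point rounding).
    print(f"  total relative frequency: {sum(rel):.2f}")
```

Because both groups are divided by their own totals, the resulting proportions can be compared directly even when the group sizes differ.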
• Cumulative Percentage Distribution provides a way of presenting information about the percentage of values that are less than a specific amount

CITY and SUBURBAN
Meal Cost ($)   Frequency  Relative Frequency  Percentage
20, but < 30        8            0.08              8.0%
30, but < 40       27            0.27             27.0%
40, but < 50       25            0.25             25.0%
50, but < 60       21            0.21             21.0%
60, but < 70       11            0.11             11.0%
70, but < 80        7            0.07              7.0%
80, but < 90        1            0.01              1.0%
TOTALS            100            1.00            100.0%

< lower boundary   Cumulative Percentage < lower boundary
< 20                 0 (no meals cost less than $20)
< 30                 8% = 0 + 8%
< 40                35% = 0 + 8% + 27%
< 50                60% = 0 + 8% + 27% + 25%
< 60                81% = 0 + 8% + 27% + 25% + 21%
< 70                92% = 0 + 8% + 27% + 25% + 21% + 11%
< 80                99% = 0 + 8% + 27% + 25% + 21% + 11% + 7%
< 90               100% = 0 + 8% + 27% + 25% + 21% + 11% + 7% + 1%

• Question: What percentage of meal costs was less than $50?
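The cumulative column is a running total of the class percentages, shifted so that each entry counts only the classes below the stated boundary. A short sketch using the combined City and Suburban percentages from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Combined City + Suburban percentages per class, from the table in the notes.
lower_bounds = [20, 30, 40, 50, 60, 70, 80]
percentages = [8, 27, 25, 21, 11, 7, 1]

# Nothing lies below the first lower boundary, so the running total starts at 0.
running = [0] + list(accumulate(percentages))
boundaries = lower_bounds + [90]   # 90 is the upper boundary of the last class

cumulative = dict(zip(boundaries, running))
for bound, pct in cumulative.items():
    print(f"< {bound}: {pct}%")
```

Looking up a boundary in `cumulative` gives the percentage of meals costing less than that amount, which is how a question such as "less than $50" would be read off the distribution.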
• 2.4 Visualizing Numerical Variables
  • Stem-and-Leaf Display – How to create:
    1. Separate each observation into
       • Stem (all but final digit(s)) and
       • Leaf (final digit(s)).
    2. Write stems in vertical column – smallest on top
    3. Write each leaf, in increasing numerical order, in row next to appropriate stem
  • Example: For each state, percentage (with one decimal place) of residents 65 and older
    • Notice stem of "7" does not have a leaf → we conclude no value of 7.x
    • There should be the same number of leaves as observations!
    • Include ALL stems even if no values/leaves
    • Leave a space holder if no leaf for a stem
    • No punctuation (i.e., no decimal points, no commas)
    • Leaves should be lined on top of one another to determine SHAPE
    • Simple way to deliver a lot of detailed information
  FOR THIS EXAMPLE read data values as:
    6.8
    -
    8.8
    9.8, 9.9
    10.0, 10.8
    11.1, 11.5, 11.5, 11.6, 11.6
    12.0, 12.1, 12.2, 12.2, 12.2, 12.4, 12.4, 12.4, 12.4, 12.4, 12.5, 12.7, 12.8, 12.8, 12.9, 12.9, 12.9
    13.0, 13.1, 12.3, 13.3, 13.3, 13.3, 13.4, 13.4, 13.4, 13.4, 13.8, 13.9, 13.9
    14.0, 14.2, 14.6, 14.6, 14.6
    15.2, 15.3
    16.8
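The three construction steps can be sketched as a small Python function. This is only an illustration for values recorded to one decimal place; the function name and the "stem | leaves" layout are choices made for the sketch, and the sample uses just a few of the values listed above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines for values recorded to one decimal place.

    Stem = all digits except the last; leaf = the final digit.
    """
    # Work in tenths so 6.8 becomes the integer 68 (stem 6, leaf 8).
    tenths = sorted(round(v * 10) for v in values)
    leaves = defaultdict(list)
    for t in tenths:
        leaves[t // 10].append(t % 10)

    lines = []
    # Include every stem from smallest to largest, even those with no leaves.
    for stem in range(tenths[0] // 10, tenths[-1] // 10 + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem:>2} | {row}".rstrip())
    return lines

# A few of the values listed in the notes, deliberately out of order.
sample = [9.9, 6.8, 10.8, 8.8, 9.8, 10.0]
for line in stem_and_leaf(sample):
    print(line)
```

The stem 7 appears in the output with no leaves, matching the rule that every stem is shown even when it has no values.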
• Histogram: Displays a quantitative variable across different groupings of values
  • Careful when choosing how to group together values!
    • Groupings must cover the same range, so they have equal width
    • Height used to compare the frequency of each range of values
  • Steps to create a frequency histogram:
    • Create equal width classes (groupings)
    • Count number of values in each class
    • Draw histogram with a bar for each class
    • Height of a bar represents the count for that bar's class
    • Bars touch since there are NO GAPS between classes
  • Be careful:
    • Number of categories can't be too large or too small
    • Don't skip any categories
    • Be clear about contents of each category
  • Histogram Example: using Age at Time of First Oscar Award:
    • Groupings chosen here are: [20,25) [25,30) [30,35) [35,40) [40,45) [45,50), …
    • Where "[" means the number is INCLUDED in the interval, but ")" means the number is NOT included in the interval
    • Question: If Jack Nicholson won Best Actor at age 70, which category frequency would increase?
      A. [60,65)
      B. [65,70)
      C. [70,75)
      D. [75,80)
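The "[lower, upper)" convention decides where a value sitting exactly on a boundary is counted. A minimal sketch of equal-width binning under that convention; the ages are hypothetical, not the Oscar data set, and the function names and the choice of 12 classes are illustrative.

```python
def class_interval(value, start=20, width=5):
    """Return the half-open class [lower, upper) that contains `value`.

    The lower bound is included and the upper bound is excluded, so a value
    sitting exactly on a boundary belongs to the class that starts there.
    """
    index = (value - start) // width
    lower = start + index * width
    return (lower, lower + width)

def histogram_counts(values, start=20, width=5, num_classes=12):
    """Tally values into equal-width classes, keeping empty classes at 0.

    Raises KeyError if a value falls outside the classes that were set up.
    """
    counts = {(start + i * width, start + (i + 1) * width): 0
              for i in range(num_classes)}
    for v in values:
        counts[class_interval(v, start, width)] += 1
    return counts

# Hypothetical ages for illustration only; not the Oscar data set.
ages = [24, 25, 29, 30, 34, 70]
counts = histogram_counts(ages)
print(counts)
print(class_interval(70))   # (70, 75)
```

Keeping the empty classes in `counts` mirrors the advice not to skip categories: a bar of height 0 still occupies its place on the axis.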
[Figure: histogram of Meal Cost, Location = CITY]
• Percentage polygon – used for visualization when dividing the data of a numerical variable into two or more groups
  • Uses midpoints of each class to represent the data in the class
  • Combines data from two groups to allow easier comparison
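The points of a percentage polygon pair each class midpoint with the percentage of the group falling in that class. A rough sketch, assuming the City and Suburban meal cost frequencies from earlier in these notes and a class width of 10; it only computes the points and leaves the plotting to whatever tool is used in class.

```python
# Class boundaries and frequencies from the City / Suburban meal cost table.
lower_bounds = [20, 30, 40, 50, 60, 70, 80]
width = 10
groups = {
    "CITY": [4, 10, 12, 11, 7, 5, 1],
    "SUBURBAN": [4, 17, 13, 10, 4, 2, 0],
}

# Each class is represented by its midpoint on the X axis.
midpoints = [lb + width / 2 for lb in lower_bounds]

def polygon_points(freqs):
    """Pair each class midpoint with that class's percentage of the group."""
    total = sum(freqs)
    return [(mid, 100 * f / total) for mid, f in zip(midpoints, freqs)]

polygons = {name: polygon_points(freqs) for name, freqs in groups.items()}
for name, points in polygons.items():
    print(name, points)
```

Plotting both lists of points on the same axes gives the two overlaid polygons that make the group comparison easier.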
Let's examine what it means to turn a frequency into a relative frequency by looking at the age at Oscar data
• Relative frequency histogram depicts the relative frequency rather than the raw frequency (count) of categories
• Do shapes of the frequency and relative frequency histograms differ?
TOTAL 76
Conclusions?
• Cumulative Percentage Polygon (Ogive) uses the cumulative percentage distribution (discussed previously) to plot the cumulative percentages along the Y axis
  • LOWER BOUNDS of the class intervals are plotted on the X axis
Conclusions?