3. describing & presenting data


Upload: aiman-arifin

Post on 13-Nov-2015


DESCRIPTION

Community Health

TRANSCRIPT

  • Describing & Presenting Data

    Nooriah Mohamed Salleh MBBS(Malaya), MPH(Tulane), DrPH(Tulane)

    Recap: Variables & Data

    Variables are labels. The value of a variable can vary.

    Example: age, gender, occupation, ethnicity

    Data are the values you get from observation, through measuring, counting, etc.

    Example: number of patients, weight of baby (kg), number of doctors

    Recap: Variables & Data

    Variable                Data
    Age (of mother)         23 years old
    Weight (of baby)        3.0 kg
    Gender                  Male
    Ethnic group            Malay
    Occupation of mother    Housewife

    Types of Data

    Data
      Categorical data: nominal data, ordinal data
      Numerical data: discrete data, continuous data

  • Nominal categorical data

    Data that can be allocated to one of a number of categories.

    The categories have no meaningful order.

    Example: blood type (A, B, AB, O), sex (M, F)

    Ordinal categorical data

    Data that can be allocated to one of a number of categories arranged in a meaningful order.

    Example: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied; Grade I, Grade II, Grade III (tumor grading); moderate, severe, very severe (pain)

    Discrete numerical data

    Countable variables. Values are integers (numbers of things).

    Example: number of pregnancies, number of patients, number of teeth

    Continuous numerical data

    Measurable variables. They can take any value within a range; in practice the recorded value is rounded, for example to the nearest integer.

    Example: weight (kg), height (metre), BP (mmHg), age (years), duration of surgery (hours)

  • Describing data with tables

    1) Frequency table
    2) Relative and cumulative frequency
    3) Grouped frequency
    4) Open-ended groups
    5) Cross-tabulation

    1. Frequency table

    A picture of the frequency distribution. The first column lists the values of the variable; the last column gives the frequency.

    Mortality (%)    Tally        No. of ICU patients
    11.2-15.1        |||||||||    9
    15.2-20.1        ||||||||     8
    20.2-25.1        |||||        5
    25.2-30.1        |||          3
    30.2-35.1        ||           2

    2. Relative frequency, cumulative frequency

    Relative frequency: percentage of the total.

    Cumulative frequency: running total of the relative frequencies, up to and including each value.

    Parity    No. of women    Percentage (relative frequency)    Cumulative percentage
    0         5               12.5                               12.5
    1         6               15                                 27.5
    2         14              35                                 62.5
    3         10              25                                 87.5
    4         3               7.5                                95
    7         1               2.5                                97.5
    8         1               2.5                                100
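As a minimal sketch of how the percentage and cumulative percentage columns can be derived, the Python below starts from the parity counts in the table (40 women in total). The function name `frequency_table` is an illustrative choice, not something from the slides.

```python
# Parity counts taken from the table: parity -> number of women.
parity_counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}


def frequency_table(counts):
    """Return rows of (value, frequency, relative %, cumulative %)."""
    total = sum(counts.values())
    rows = []
    running = 0  # running total of the frequencies seen so far
    for value in sorted(counts):
        freq = counts[value]
        running += freq
        rows.append((value, freq, 100 * freq / total, 100 * running / total))
    return rows


table = frequency_table(parity_counts)
for value, freq, rel, cum in table:
    print(f"{value:>6} {freq:>6} {rel:>8.1f} {cum:>8.1f}")
```

Each relative frequency is the count divided by the total of 40, so a single woman corresponds to 2.5%, and the cumulative column always ends at 100.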

    3. Grouped frequency

    Grouped frequency: for continuous metric data.

    Birthweight (g)    No. of infants
    2700-2999          2
    3000-3299          3
    3300-3599          9
    3600-3899          9
    3900-4199          4
    4200-4499          3

    In each group, the first value is the class lower limit and the second is the class upper limit. The group width is 300 g.
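The grouping step can be sketched in Python as follows. The class limits (first lower limit 2700 g, width 300 g, six classes) come from the birthweight table; the individual birthweights in `sample` are invented for illustration only, and the function names are assumptions.

```python
# Class structure from the birthweight table: six classes of width 300 g.
LOWER_LIMIT = 2700
WIDTH = 300
N_CLASSES = 6


def class_label(weight_g):
    """Return a label such as '3000-3299' for a weight in whole grams,
    or None if the weight falls outside the six classes."""
    index = (weight_g - LOWER_LIMIT) // WIDTH
    if not 0 <= index < N_CLASSES:
        return None
    lower = LOWER_LIMIT + index * WIDTH
    return f"{lower}-{lower + WIDTH - 1}"


def grouped_frequency(weights):
    """Count how many weights fall into each class, keeping empty classes."""
    counts = {}
    for i in range(N_CLASSES):
        lower = LOWER_LIMIT + i * WIDTH
        counts[f"{lower}-{lower + WIDTH - 1}"] = 0
    for weight in weights:
        label = class_label(weight)
        if label is not None:
            counts[label] += 1
    return counts


# Hypothetical birthweights in grams (not the slide's raw data).
sample = [2850, 3010, 3299, 3300, 3450, 3720, 4100, 4499]
grouped = grouped_frequency(sample)
for label, count in grouped.items():
    print(label, count)
```

Values outside 2700-4499 g are ignored here; an open-ended first or last group would be one way to take them in.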

  • Table for display of data

    Type of data                   Table
    Ordinal, discrete numerical    Frequency table
    Continuous numerical           Grouped frequency

    4. Open-ended groups

    One or two values, called outliers, may lie a long way from the general mass of the data.

    Use ≤ or ≥ for the open-ended group.

    5) Cross-tabulation

    Association between breast lump diagnosis and parity:

                           Breast lump diagnosis
    2 or fewer children    Malignant    Benign    Totals
    Yes                    4            21        25
    No                     4            11        15
    Totals                 8            32        40
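A cross-tabulation is built by counting how many subjects fall into each combination of two categorical variables. The sketch below rebuilds one record per woman from the cell counts in the breast lump table, then tallies them again with `collections.Counter`. The record layout and the helper `row_percent` are illustrative assumptions.

```python
from collections import Counter

# One record per woman: (two_or_fewer_children, diagnosis),
# reconstructed from the cell counts in the cross-tabulation.
records = (
    [("Yes", "Malignant")] * 4
    + [("Yes", "Benign")] * 21
    + [("No", "Malignant")] * 4
    + [("No", "Benign")] * 11
)

cells = Counter(records)                          # cell counts
row_totals = Counter(row for row, _ in records)   # totals per parity group
col_totals = Counter(col for _, col in records)   # totals per diagnosis


def row_percent(row, col):
    """Percentage of the given row's total that falls in the given column."""
    return 100 * cells[(row, col)] / row_totals[row]


print(cells[("Yes", "Malignant")], cells[("Yes", "Benign")], row_totals["Yes"])
print(cells[("No", "Malignant")], cells[("No", "Benign")], row_totals["No"])
print(col_totals["Malignant"], col_totals["Benign"], len(records))
print(round(row_percent("Yes", "Malignant"), 1), round(row_percent("No", "Malignant"), 1))
```

Row percentages make the two parity groups comparable even though they differ in size (25 versus 15 women).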

    3. Describing data with charts

    1. Nominal data: (1) the pie chart, (2) the simple bar chart, (3) the clustered bar chart, (4) the stacked bar chart
    2. Ordinal data: (1) the pie chart, (2) the bar chart, (3) the dotplot
    3. Discrete numerical data
    4. Continuous numerical data: histogram
    5. Cumulative ordinal or discrete data: step chart
    6. Cumulative continuous data: cumulative frequency curve or ogive
    7. Time-based data: time series chart

  • 1.1. Pie chart

    [Figure: Pie chart of hair color of children receiving d-phenothrin. Slices: blonde 18 (18%), brown 55 (57%), red 4 (4%), dark 21 (21%).]

    Use for 4-5 categories. Describes one variable. Start at 0, in the same order as the table.

    1.2 Simple bar chart

    [Figure: Bar chart of hair color of the children receiving d-phenothrin. Bars: blonde 18, brown 55, red 4, dark 21. Vertical axis: 0 to 60.]

    Same widths, equal spaces between bars.

    [Figure: Bar chart for number of health professionals. Horizontal axis: Profession (Pharmacists, Nurses, Doctors, Dentists). Vertical axis: Number of workers, 0 to 6000.]

    1.3 Clustered bar chart

    [Figure: Clustered percentage bar chart of hair color for children receiving malathion and d-phenothrin. Malathion: blonde 16, brown 52, red 4, dark 28. d-phenothrin: blonde 18, brown 56, red 4, dark 22. Vertical axis: 0 to 60.]

  • [Figure: Clustered bar chart for number of health professionals. Horizontal axis: Profession (Dentists, Doctors, Nurses, Pharmacists). Vertical axis: Number of workers, 0 to 4000. Legend: Private, Public.]

    [Figure: Clustered bar charts of number of health professionals. Horizontal axis: Sector (Private, Public). Vertical axis: Number of workers, 0 to 4000. Legend: Dentists, Doctors, Nurses, Pharmacists.]

    Plotting by sector rather than by profession looks at the data from a different angle and highlights different aspects of the data.

    1.4 Stacked bar chart

    [Figure: Stacked bar chart. Horizontal axis: Breast-fed, Bottle-fed. Vertical axis: 0% to 100%. Legend: Non-smokers, Former smokers, Smokers.]

    [Figure: Stacked bar chart for number of health professionals. Horizontal axis: Profession (Pharmacists, Nurses, Doctors, Dentists). Vertical axis: Number of workers, 0 to 6000. Legend: Private, Public.]

    A variation of the basic bar chart.

  • [Figure: Stacked bar charts by sector. Horizontal axis: Sector (Public, Private). Vertical axis: Number of workers, 0 to 6000. Legend: Dentists, Doctors, Nurses, Pharmacists.]

    [Figure: Segmented bar charts by profession. Horizontal axis: Profession (Pharmacists, Nurses, Doctors, Dentists). Vertical axis: Percent by sector, 0 to 100. Legend: Private, Public.]

    [Figure: Clustered bar chart for number of health professionals. Horizontal axis: Profession. Vertical axis: Number of workers, 0 to 4000. Legend: Private, Public.]

    [Figure: Stacked bar chart for number of health professionals. Horizontal axis: Profession. Vertical axis: Number of workers, 0 to 6000. Legend: Private, Public.]

    [Figure: Clustered bar chart of number of health professionals. Horizontal axis: Sector (Public, Private). Vertical axis: Number of workers, 0 to 4000. Legend: Dentists, Doctors, Nurses, Pharmacists.]

    [Figure: Percentage bar charts by sector. Horizontal axis: Sector (Public, Private). Vertical axis: Percent within sector, 0 to 100. Legend: Dentists, Doctors, Nurses, Pharmacists.]

    [Figure: Segmented bar charts by sector. Horizontal axis: Sector (Public, Private). Vertical axis: Percent within sector, 0 to 100. Legend: Dentists, Doctors, Nurses, Pharmacists.]

  • Time Trend

    Starting the vertical axis at 8 rather than 0 visually exaggerates the increase in the number of prescriptions written per person.

    [Figure: Stacked bar chart of yearly mortality rate per 1000 births.]

    Pagano & Gauvreau (1999) Principles of Biostatistics, Duxbury.

    Table 1: Response under two treatments

                 Response to treatment
    Treatment    None    Partial    Complete    Total
    A            3       15         9           27
    B            2       22         30          54

    [Figure: Percentage bar chart of response to treatment. Horizontal axis: Treatment (B, A). Vertical axis: Within treatment percentage, 0 to 100. Legend: None, Partial, Complete.]

    The response type percentages can be compared for the two treatments.
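To make that comparison concrete, the following sketch converts the counts in Table 1 into within-treatment percentages, which is what a percentage (segmented) bar chart plots. The function name `within_group_percent` is an illustrative assumption.

```python
# Counts from Table 1: treatment -> response -> number of patients.
responses = {
    "A": {"None": 3, "Partial": 15, "Complete": 9},
    "B": {"None": 2, "Partial": 22, "Complete": 30},
}


def within_group_percent(table):
    """Convert each group's counts into percentages of that group's total."""
    result = {}
    for group, counts in table.items():
        total = sum(counts.values())
        result[group] = {name: 100 * n / total for name, n in counts.items()}
    return result


percent = within_group_percent(responses)
for treatment, row in percent.items():
    print(treatment, {name: round(value, 1) for name, value in row.items()})
```

Because treatment B has twice as many patients as treatment A (54 versus 27), comparing raw counts would be misleading; percentages within each treatment put the two on the same scale.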

  • [Figure: Stacked bar charts for percentage figures. Horizontal axis: Treatment (A, B). Vertical axis: Within treatment percentage, 0 to 100. Legend: Response to treatment (None, Partial, Complete).]

    Histogram

    Divide the range of the data into a suitably chosen number of intervals, all of the same width. The number of observations that fall within each interval is plotted.

    Relative frequency histogram

    Plot the proportions of observations that fall within the class intervals.

    [Figure: Histogram of End-Systolic Volume for 45 Male Heart Attack Patients. Horizontal axis: SysVol, 40 to 220. Vertical axis: Frequency, 0 to 20.]

    [Figure: Relative frequency polygon for SysVol. Horizontal axis: SysVol, 40 to 220. Vertical axis: Percent, 0 to 40.]

  • Histogram

    [Figure: Exercise 3-5, histogram. Percentage age distribution of pregnant women (thrombosis cases). Age groups: 19, 20-24, 25-29, 30-34, 35. Vertical axis: 0 to 40.]

    Step-up chart

    [Figure: Exercise 3.8, cumulative percentage of infants. Plotted values: 6.67, 16.67, 36.67, 60, 90, 100. Horizontal axis: 0 to 10. Vertical axis: 0 to 120.]

    Cumulative frequency curve

    [Figure: Exercise 3.9, ogive. Percentage cumulative frequency curves of age for male suicide attempters and later succeeders. Age groups: 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, > 85. Vertical axis: 0 to 120. Legend: Attempting suicide, Later successful.]

    4. Describing data from its distributional shape

    1. Symmetric mound-shaped distributions

    [Figure: Exercise 3-5, histogram of the percentage age distribution of pregnant women (thrombosis cases).]

  • Non-Symmetrical Histograms

    These histograms are skewed.

    Common Shapes of Histograms: Skewed Histograms

    Skewed left (negative skew)

    Skewed right (positive skew)

    Note: the SKEW follows the TAIL

    Skewed distributions

    [Figure: Exercise 4.2, shape. Age distribution for female suicide attempters and later succeeders. Age groups: 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, >85. Vertical axis: 0 to 160. Legend: Attempting suicide.]

  • Shape of data distributions

    Symmetrical or skewed

    Symmetric: Mean = Median = Mode
    Left-skewed: typically Mean < Median < Mode
    Right-skewed: typically Mode < Median < Mean

    Bimodal distributions

    A bimodal distribution is one with two distinct humps or peaks.

    Scatter Plots

    Scatter plots are similar to line graphs in that each uses the horizontal (x) axis and vertical (y) axis to plot data points.

    Scatter plots are most often used to show correlations or relationships among data.

    Scatter Plots: Positive Correlation

    Study time (hours)    Class grade
    0                     55
    0.5                   61
    1                     67
    1.5                   73
    2                     81
    2.5                   89
    3                     91
    3.5                   93
    4                     95
    4.5                   97

    [Figure: How Study Time Affects Grades. Horizontal axis: Time in hours, 0 to 5. Vertical axis: Overall grade, 0 to 120.]
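The strength of a linear relationship like this one is commonly summarised by a correlation coefficient. As a sketch, assuming Pearson's r is the measure intended, the Python below computes it for the study time and grade values in the table. The function `pearson_r` is written out by hand for illustration.

```python
from math import sqrt

# Study time (hours) and class grade, from the positive-correlation table.
study_hours = [0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]
grades = [55, 61, 67, 73, 81, 89, 91, 93, 95, 97]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(study_hours, grades)
print(round(r, 2))
```

A value of r close to +1 indicates a strong positive linear relationship and a value close to -1 a strong negative one. Neither by itself shows that one variable causes the other.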

  • Scatter Plots: Negative Correlation

    Work out time    Weight
    0                200
    0.5              205
    1                190
    1.5              195
    2                180
    2.5              190
    3                170
    3.5              177
    4                160
    4.5              170
    5                150
    5.5              168
    6                140
    6.5              150
    7                130
    7.5              170
    8                120
    8.5              130
    9                110
    9.5              115
    10               100
    10.5             120
    11               90
    11.5             90
    12               80

    [Figure: Weight Loss Over Time. Horizontal axis: Days worked out per month, 0 to 12. Vertical axis: Weight, 0 to 250.]

    Scatter Plot of the Data

    Sandwich                       Total Fat (g) (x)    Total Calories (y)
    Hamburger                      9                    260
    Cheeseburger                   13                   320
    Quarter Pounder                21                   420
    Quarter Pounder with Cheese    30                   530
    Big Mac                        31                   560
    Arch Sandwich Special          31                   550
    Arch Special with Bacon        34                   590
    Crispy Chicken                 25                   500
    Fish Fillet                    28                   560
    Grilled Chicken                20                   440
    Grilled Chicken Light          5                    300

    [Figure: Fat Grams and Calories in Food. Horizontal axis: Total Fat Grams, 0 to 40. Vertical axis: Total Calories, 0 to 700.]

    Damaged for life by too much TV

    N Z Herald (04/10/2005)

  • Damaged for life by too much TV

    [Figure: Scatter plot. Horizontal axis: TV watching. Vertical axis: Health Score. r = -0.93]

    Causal relationship?

    5. Describing data with numeric summary values

    1. Numbers, percentages and proportions
    2. Summary measures of central location/central tendency
    3. Summary measures of spread/dispersion

    5.1. Numbers, percentages and proportions

    Numbers: the numerical summaries of data. A percentage is a proportion multiplied by 100.

    1) Prevalence: the number of existing cases in some population at a given time.

    2) Incidence (inception): the number of new cases occurring per 100, or per 1000, of the population during some period of time.

    5.2. Summary measures of central location

    1) Mode: the category or value that occurs most often.
       Use: categorical data and discrete numerical data.
    2) Median: the middle value when the data are in ascending order; a measure of central-ness.
       Use: ordinal and numerical data.
    3) Mean (average): the sum of the values divided by the number of values.
    4) Percentile: the percentiles divide the ordered values into 100 equal-sized groups.
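As a small worked example of the first three measures, the sketch below applies Python's standard `statistics` module to the parity data from the earlier relative frequency table (40 women). Expanding the counts into one value per woman is an illustrative choice.

```python
import statistics

# Parity counts from the relative frequency table: parity -> number of women.
parity_counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}

# Expand the counts into one value per woman, in ascending order.
values = [
    parity
    for parity, count in sorted(parity_counts.items())
    for _ in range(count)
]

mode = statistics.mode(values)      # value that occurs most often
median = statistics.median(values)  # middle value of the ordered data
mean = statistics.mean(values)      # sum of values / number of values

print(mode, median, mean)
```

Here the mode and median are both 2, while the mean (91 / 40 = 2.275) is pulled upward by the two women with parity 7 and 8, which is the usual effect of a right skew.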

  • Choosing the most appropriate measure

    Type of data            Mode    Median                     Mean
    Nominal                 yes     no                         no
    Ordinal                 yes     yes                        no
    Numerical discrete      yes     yes, if markedly skewed    yes
    Numerical continuous    yes     yes, if markedly skewed    yes

    5.3. Summary measures of spread/dispersion/variability

    Range: maximum value - minimum value

    IQR (interquartile range) = 75th percentile - 25th percentile = Q3 - Q1

    Box-and-whisker plot: graphical summary of the three quartile values, the minimum and maximum values, and outliers.

    Box-and-Whisker Plot

    Graphical display of data using the 5-number summary.

    [Figure: Box-and-whisker plot on a scale from 4 to 12, marking the minimum value, Q1, the median, Q3 and the maximum value.]

    Standard deviation

    A measure of the spread in a set of data: roughly, the average distance of the data values from the mean value.

    The smaller the average distance, the narrower the spread, and vice versa.

    Use: numerical data only.
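The three measures of spread can be sketched with Python's standard `statistics` module. The data values below are hypothetical, chosen only for illustration. Two assumptions are worth flagging: `statistics.quantiles` uses its default "exclusive" method (other quartile conventions give slightly different Q1 and Q3), and `statistics.stdev` is the sample standard deviation, which divides by n - 1.

```python
import statistics

# Hypothetical data values, for illustration only.
data = [4, 6, 8, 10, 12]

# Range: maximum value minus minimum value.
data_range = max(data) - min(data)

# Quartiles Q1, Q2 (the median) and Q3, using the module's default method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Sample standard deviation (divides by n - 1).
sd = statistics.stdev(data)

print(data_range, q1, q2, q3, iqr, round(sd, 2))
```

The minimum, Q1, the median, Q3 and the maximum computed here are the five numbers a box-and-whisker plot displays.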