introduction to biostatistics dr.s.shaffi ahamed asst. professor dept. of family and comm. medicine...

Post on 20-Dec-2015


INTRODUCTION TO BIOSTATISTICS

Dr. S. Shaffi Ahamed
Asst. Professor
Dept. of Family and Comm. Medicine
KKUH

This session covers:

Origin and development of Biostatistics
Definition of Statistics and Biostatistics
Reasons to know about Biostatistics
Types of data
Graphical representation of data
Frequency distribution of data

“Statistics is the science which deals with collection, classification and tabulation of numerical facts as the basis for explanation, description and comparison of phenomena.”

- Lovitt

Origin and development of statistics in medical research

In 1929, a lengthy paper on the application of statistical methods in physiology was published by Dunn in Physiological Reviews.

In 1937, 15 articles on statistical methods by Austin Bradford Hill were published in book form.

In 1948, a randomized controlled trial (RCT) of streptomycin for pulmonary tuberculosis was published, in which Bradford Hill had a key influence.

The use of statistics in medicine then grew steadily, increasing eightfold between 1952 and 1982.

(Pictured: Douglas Altman, Ronald Fisher, Karl Pearson, C.R. Rao, Gauss)

“BIOSTATISTICS”

(1) Statistics arising out of the biological sciences, particularly from the fields of medicine and public health.

(2) The methods used in dealing with statistics in the fields of medicine, biology and public health for planning, conducting and analyzing data which arise in investigations of these branches.

Reasons to know about biostatistics:

Medicine is becoming increasingly quantitative.

The planning, conduct and interpretation of much of medical research are becoming increasingly reliant on statistical methodology.

Statistics pervades the medical literature.

Example: Evaluation of penicillin (treatment A) vs penicillin & chloramphenicol (treatment B) for treating bacterial pneumonia in children under 2 years of age.

What sample size is needed to demonstrate a significant difference between the two groups?

Is treatment A better than treatment B, or vice versa?
If so, how much better?
What is the normal variation in clinical measurement (mild, moderate and severe)?
How reliable and valid is the measurement (clinical and radiological)?
What is the magnitude and effect of laboratory and technical error?
How does one interpret abnormal values?
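The sample-size question above can be answered with a formula once the expected outcome in each group is stated. The sketch below is one common choice, not the method of this session: the normal-approximation formula for comparing two independent proportions with a two-sided test, n per group = (z(1-α/2)·√(2p̄(1-p̄)) + z(1-β)·√(p1(1-p1) + p2(1-p2)))² / (p1 - p2)², where p̄ is the mean of p1 and p2. The function name and the cure rates (80% vs 90%) are hypothetical illustrations.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1: float, p2: float,
                                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients needed PER GROUP to detect p1 vs p2 (two-sided test).

    Hypothetical helper using the normal-approximation formula for two
    independent proportions; other formulas (e.g. with continuity correction)
    give somewhat larger answers.
    """
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2                          # pooled proportion under H0
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)        # round up to whole patients


# Hypothetical cure rates: 80% with treatment A, 90% with treatment B.
print(sample_size_two_proportions(0.80, 0.90))  # 199 per group
```

The smaller the difference the study must detect, the larger the sample required, which is why the expected size of the treatment effect has to be decided before the study begins.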

CLINICAL MEDICINE

Documentation of the medical history of diseases.

Planning and conduct of clinical studies.
Evaluating the merits of different procedures.
Providing methods for defining “normal” and “abnormal”.

PREVENTIVE MEDICINE

To measure the magnitude of any health problem in the community.

To find out the basic factors underlying ill-health.

To evaluate the health programs introduced in the community (success/failure).

To introduce and promote health legislation.

WHAT DOES STATISTICS COVER?

Planning
Design
Execution (data collection)
Data processing
Data analysis
Presentation
Interpretation
Publication

HOW CAN A “BIOSTATISTICIAN” HELP?

Design of study
Sample size & power calculations
Selection of sample and controls
Designing a questionnaire
Data management
Choice of descriptive statistics & graphs
Application of univariate and multivariate statistical analysis techniques

INVESTIGATION

Data Collection

Data Presentation
- Tabulation
- Diagrams
- Graphs

Descriptive Statistics
- Measures of Location
- Measures of Dispersion
- Measures of Skewness & Kurtosis

Inferential Statistics
- Estimation: point estimate, interval estimate
- Hypothesis testing
- Univariate analysis
- Multivariate analysis

TYPES OF DATA

QUALITATIVE DATA
DISCRETE QUANTITATIVE
CONTINUOUS QUANTITATIVE

QUALITATIVE

Nominal
Example: Sex (M, F)
Exam result (P, F)
Blood group (A, B, O or AB)
Color of eyes (blue, green, brown, black)

ORDINAL
Example: Response to treatment (poor, fair, good)
Severity of disease (mild, moderate, severe)
Income status (low, middle, high)

QUANTITATIVE (DISCRETE)
Example: The no. of family members
The no. of heart beats
The no. of admissions in a day

QUANTITATIVE (CONTINUOUS)
Example: Height, weight, age, BP, serum cholesterol and BMI

DISCRETE DATA: gaps between possible values (e.g. number of children)

CONTINUOUS DATA: theoretically, no gaps between possible values (e.g. Hb)

Continuous data can also be grouped into ordered categories:

Wt. (in kg): underweight, normal & overweight

Ht. (in cm): short, medium & tall

Table 1 Distribution of blunt-injured patients according to hospital length of stay

Hospital length of stay    Number    Percent
1 – 3 days                   5891       43.3
4 – 7 days                   3489       25.6
2 weeks                      2449       18.0
3 weeks                       813        6.0
1 month                       417        3.1
More than 1 month             545        4.0
Total                       13604      100.0

Mean = 7.85, SE = 0.10
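The Percent column of Table 1 is each count divided by the total, times 100. A minimal sketch of that arithmetic follows; the dictionary keys and the helper name are illustrative. Note that the six counts sum to 13,604, and the published percentages are consistent with that total.

```python
# Counts copied from Table 1 (hospital length of stay, blunt-injured patients).
length_of_stay = {
    "1 - 3 days": 5891,
    "4 - 7 days": 3489,
    "2 weeks": 2449,
    "3 weeks": 813,
    "1 month": 417,
    "More than 1 month": 545,
}


def percent_distribution(counts):
    """Return each category's share of the total as a percentage rounded to 0.1."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


total = sum(length_of_stay.values())
for category, pct in percent_distribution(length_of_stay).items():
    print(f"{category:<18} {length_of_stay[category]:>6} {pct:>6.1f}")
print(f"{'Total':<18} {total:>6} {100.0:>6.1f}")
```

Checking that the counts add up to the stated total is a quick way to catch transcription errors in a frequency table.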

Scale of measurement

Qualitative variable: a categorical variable

Nominal (classificatory) scale - gender, marital status, race

Ordinal (ranking) scale - severity scale, good/better/best

Quantitative variable: a numerical variable (discrete or continuous)

Interval scale: Data are placed in meaningful intervals and order. The units of measurement are arbitrary, and there is no true zero.

- Temperature: the difference between 37 ºC and 36 ºC equals that between 38 ºC and 37 ºC, but no ratio is implied (30 ºC is not twice as hot as 15 ºC).

Ratio scale:

Data have a logical order, equal intervals and a true zero, so a meaningful ratio exists.

- Age, weight, height, pulse rate

- A pulse rate of 120 is twice as fast as one of 60.

- A person weighing 80 kg is twice as heavy as one weighing 40 kg.
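The temperature example can be checked with a little arithmetic. The sketch below assumes the kelvin scale (which has a true zero at 0 K) as the ratio-scale reference for temperature; the helper name is illustrative. Differences on the Celsius scale are meaningful, but dividing Celsius values gives a misleading ratio.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert ºC to kelvin, a scale whose zero means 'no thermal energy'."""
    return celsius + 273.15


# Interval property: equal differences are equal on the Celsius scale.
print(37 - 36 == 38 - 37)  # True

# Ratio property fails for Celsius: 30/15 suggests "twice as hot"...
naive_ratio = 30 / 15
# ...but relative to a true zero, 30 ºC is only about 5% hotter than 15 ºC.
true_ratio = celsius_to_kelvin(30) / celsius_to_kelvin(15)
print(naive_ratio, round(true_ratio, 3))  # 2.0 1.052

# Weight is on a ratio scale, so 80 kg really is twice 40 kg.
weight_ratio = 80 / 40
print(weight_ratio)  # 2.0
```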

Scales of Measure

Nominal - qualitative classification of equal value: gender, race, color, city

Ordinal - qualitative classification which can be rank ordered: socioeconomic status of families

Interval - numerical or quantitative data that can be rank ordered and whose sizes can be compared: temperature

Ratio - quantitative interval data along with a meaningful ratio: time, age


Frequency Distributions

Data distribution: the pattern of variability
- the center of a distribution
- the ranges
- the shapes

Simple frequency distributions
Grouped frequency distributions

Midpoint

Tabulate the hemoglobin (Hb) values of the 30 adult male patients listed below.

Patient No.   Hb (g/dl)   Patient No.   Hb (g/dl)   Patient No.   Hb (g/dl)
 1            12.0        11            11.2        21            14.9
 2            11.9        12            13.6        22            12.2
 3            11.5        13            10.8        23            12.2
 4            14.2        14            12.3        24            11.4
 5            12.3        15            12.3        25            10.7
 6            13.0        16            15.7        26            12.5
 7            10.5        17            12.6        27            11.8
 8            12.8        18             9.1        28            15.1
 9            13.2        19            12.9        29            13.4
10            11.2        20            14.6        30            13.1

Steps for making a table

Step 1: Find the minimum (9.1) and the maximum (15.7).

Step 2: Calculate the difference: 15.7 – 9.1 = 6.6.

Step 3: Decide the number and width of the classes (7 class intervals): 9.0 – 9.9, 10.0 – 10.9, ..., 15.0 – 15.9.

Step 4: Prepare a dummy table with the columns Hb (g/dl), Tally marks and No. of patients.

DUMMY TABLE

Hb (g/dl)      Tally marks      No. of patients
9.0 – 9.9
10.0 – 10.9
11.0 – 11.9
12.0 – 12.9
13.0 – 13.9
14.0 – 14.9
15.0 – 15.9
Total

TALLY MARKS TABLE

Hb (g/dl)      Tally marks      No. of patients
9.0 – 9.9      l                 1
10.0 – 10.9    lll               3
11.0 – 11.9    lllll l           6
12.0 – 12.9    lllll lllll      10
13.0 – 13.9    lllll             5
14.0 – 14.9    lll               3
15.0 – 15.9    ll                2
Total                           30

Table Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9       1
10.0 – 10.9     3
11.0 – 11.9     6
12.0 – 12.9    10
13.0 – 13.9     5
14.0 – 14.9     3
15.0 – 15.9     2
Total          30
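The four steps above can be reproduced in a few lines. This is a sketch, assuming classes of width 1 g/dl whose lower limits are whole numbers, so each value's class is found by rounding it down; the helper names `frequency_table` and `tally` are illustrative. The class midpoint is taken as the average of the stated class limits, e.g. (9.0 + 9.9) / 2 = 9.45.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 adult male patients, patient No. 1 to 30.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]


def frequency_table(values, first_lower=9, n_classes=7):
    """Count values in classes of width 1 g/dl: 9.0-9.9, 10.0-10.9, ..."""
    counts = Counter(math.floor(v) for v in values)  # class = whole-number part
    return [(lower, counts.get(lower, 0))
            for lower in range(first_lower, first_lower + n_classes)]


def tally(n):
    """Tally marks in groups of five, e.g. 6 -> 'lllll l'."""
    groups = ["lllll"] * (n // 5)
    if n % 5:
        groups.append("l" * (n % 5))
    return " ".join(groups)


# Steps 1-2: minimum, maximum and their difference.
print(min(hb), max(hb), round(max(hb) - min(hb), 1))  # 9.1 15.7 6.6

# Steps 3-4: fill in the dummy table class by class.
table = frequency_table(hb)
for lower, n in table:
    upper = lower + 0.9
    midpoint = (lower + upper) / 2
    print(f"{lower:.1f} - {upper:.1f}  {tally(n):<12} {n:>3}  midpoint {midpoint:.2f}")
print(f"Total {sum(n for _, n in table)}")
```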

Table Frequency distribution of adult patients by Hb and gender

Hb (g/dl)       Male   Female   Total
< 9.0             0       2        2
9.0 – 9.9         1       3        4
10.0 – 10.9       3       5        8
11.0 – 11.9       6       8       14
12.0 – 12.9      10       6       16
13.0 – 13.9       5       4        9
14.0 – 14.9       3       2        5
15.0 – 15.9       2       0        2
Total            30      30       60

Elements of a Table

An ideal table should have:

Number - table number for identification in a report

Title, place - describes the body of the table, the variables and the time period (what, how classified, where and when)

Column headings - variable name, No., percentages (%), etc.

Foot-note(s) - to describe some column/row headings, special cells, source, etc.

Table II. Distribution of 120 (Madras) Corporation divisions according to annual death rate based on registered deaths in 1975 and 1976

Death rate (/1000 per annum)    No. of divisions
7.0 – 7.9                         4 (3.3)
8.0 – 8.9                        13 (10.8)
9.0 – 9.9                        20 (16.7)
10.0 – 10.9                      27 (22.5)
11.0 – 11.9                      18 (15.0)
12.0 – 12.9                      11 (9.2)
13.0 – 13.9                      11 (9.2)
14.0 – 14.9                       6 (5.0)
15.0 – 15.9                       2 (1.7)
16.0 – 16.9                       4 (3.3)
17.0 – 18.9                       3 (2.5)
19.0 +                            1 (0.8)
Total                           120 (100.0)

Figures in parentheses indicate percentages.

DIAGRAMS/GRAPHS

Discrete data
--- Bar charts (one or two groups)

Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot

Example data

68 63 42 27 30 36 28 32 79 27
22 28 24 25 44 65 43 25 74 51
36 42 28 31 28 25 45 12 57 51
12 32 49 38 42 27 31 50 38 21
16 24 64 47 23 22 43 27 49 28
23 19 11 52 46 31 30 43 49 12

Histogram

Figure 1 Histogram of ages of 60 subjects (x-axis: Age; y-axis: Frequency)

Polygon

(Figure: frequency polygon of age; x-axis: Age; y-axis: Frequency)


Stem and leaf plot

Stem-and-leaf of Age, N = 60
Leaf Unit = 1.0

Count   Stem   Leaves
   6     1     122269
  19     2     1223344555777788888
(11)     3     00111226688
  13     4     2223334567999
   5     5     01127
   4     6     3458
   2     7     49
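A stem-and-leaf plot splits each value into a stem (here the tens digit) and a leaf (the units digit, since the leaf unit is 1.0). The sketch below, assuming whole-number ages, rebuilds the plot above from the 60 example ages; the helper name is illustrative, and it prints plain counts rather than the cumulative depths some statistical packages show.

```python
from collections import defaultdict

# The 60 ages from the example data, in the order listed.
ages = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
        22, 28, 24, 25, 44, 65, 43, 25, 74, 51,
        36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
        12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27, 49, 28,
        23, 19, 11, 52, 46, 31, 30, 43, 49, 12]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digit); leaf unit = 1."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 68 -> stem 6, leaf 8
        plot[stem].append(leaf)
    return dict(plot)


for stem, leaves in sorted(stem_and_leaf(ages).items()):
    # Count of values on the stem, the stem itself, then the leaves as digits.
    print(f"{len(leaves):>3}  {stem}  {''.join(str(leaf) for leaf in leaves)}")
```

Because every original value can be read back from the plot, a stem-and-leaf plot shows the shape of the distribution like a histogram while keeping the raw data.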

Box plot

(Figure: box plot of Age)

Descriptive statistics report: Boxplot

- minimum score
- maximum score
- lower quartile
- upper quartile
- median
- mean

- the skew of the distribution:
  positive skew: mean > median and the high-score whisker is longer
  negative skew: mean < median and the low-score whisker is longer
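The quantities in the box plot report can be computed directly. The sketch below uses Python's `statistics` module on the 60 example ages; the helper name is illustrative. Quartile definitions differ between software packages, and this assumes the module's default ("exclusive") method, which gives a lower quartile of 25 and an upper quartile of 46.75 for these data, so other programs may report slightly different quartiles.

```python
import statistics

# The 60 ages from the example data.
ages = [68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
        22, 28, 24, 25, 44, 65, 43, 25, 74, 51,
        36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
        12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27, 49, 28,
        23, 19, 11, 52, 46, 31, 30, 43, 49, 12]


def box_plot_summary(values):
    """Minimum, quartiles, median, maximum and mean: what a box plot reports."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {
        "minimum": min(values),
        "lower quartile": q1,
        "median": median,
        "upper quartile": q3,
        "maximum": max(values),
        "mean": statistics.mean(values),
    }


summary = box_plot_summary(ages)
for name, value in summary.items():
    print(f"{name:<15} {value:g}")

# Skew rule from the report: compare the mean with the median.
if summary["mean"] > summary["median"]:
    skew = "positive"
elif summary["mean"] < summary["median"]:
    skew = "negative"
else:
    skew = "approximately symmetric"
print("skew:", skew)
```

For these ages the mean (36.45) exceeds the median (31.5), and the upper whisker (up to 79) is longer than the lower one (down to 11), both pointing to positive skew.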

(Figure: pie chart of the prevalence of different degrees of hypertension in the population; segments Mild, Moderate and Severe; values shown 10%, 20% and 70%)

Pie Chart
• Circular diagram: the whole circle represents the total (100%)
• Divided into segments, each representing a category
• Decide the order of adjacent categories
• The size of each slice is proportional to the amount in its category
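Because the whole circle (360 degrees) stands for 100%, each slice's angle is its share of the total multiplied by 360. A minimal sketch of that conversion follows, using the three percentages shown on the hypertension pie chart; the helper name is illustrative.

```python
def pie_angles(values):
    """Convert category amounts to slice angles in degrees (full circle = 360)."""
    total = sum(values)
    return [360 * v / total for v in values]


# The three percentages shown on the hypertension pie chart.
print(pie_angles([10, 20, 70]))  # [36.0, 72.0, 252.0]
```

The same function works on raw counts, since dividing by the total turns them into proportions first.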

Bar Graphs

(Figure: the distribution of risk factors among cases with cardiovascular diseases; x-axis: Risk factor (Smo, Alc, Chol, DM, HTN, NoExer, F-H); y-axis: Number)

The heights of the bars indicate frequency.

Frequency is shown on the Y axis and the categories of the variable on the X axis.

The bars should be of equal width and should not touch one another.

HIV cases enrollment in USA by gender

(Figure: bar chart, 1986 – 1992; x-axis: Year; y-axis: Enrollment (hundreds); series: Men, Women)

HIV cases enrollment in USA by gender

(Figure: stacked bar chart, 1986 – 1992; x-axis: Year; y-axis: Enrollment (thousands); series: Women, Men)

Graphic Presentation of Data

the histogram (quantitative data)

the bar graph (qualitative data)

the frequency polygon (quantitative data)

General rules for designing graphs

A graph should have a self-explanatory legend.

A graph should help the reader to understand the data.

Axes should be labeled, with units of measurement indicated.

Scales are important: start with zero (otherwise show a // break).

Avoid graphs with a three-dimensional impression; it may be misleading (readers visualize it less easily).

Any Questions?
