
Introduction to Biostatistics (ZJU 2008)
Wenjiang Fu, Ph.D, Associate Professor
Division of Biostatistics, Department of Epidemiology
Michigan State University, East Lansing, Michigan 48824, USA
Email: [email protected]
www: http://www.msu.edu/~fuw

Post on 21-Dec-2015


Page 1: Introduction to Biostatistics (ZJU 2008) Wenjiang Fu, Ph.D Associate Professor Division of Biostatistics, Department of Epidemiology Michigan State University


Page 2

Introduction

Biostatistics? Why do we need to study biostatistics? A test for myself!

Statistics: the data science that helps decipher data collected on many kinds of events, using probability theory and statistical principles with the help of computers.

Statistics branches: Theoretical; Applied (Biostats, Economics, Finance, Engineering, Sports, …)

Data:
• Events: party, disease, accident, award, game …
• Subjects: human, animal …
• Characteristics: sex, race, age, weight, height …

Page 3

[Diagram: Statistics. Sampling links the population and the sample; descriptive statistics; inferential statistics (estimation, hypothesis testing, prediction); parameter vs. statistic; frequency; probability]

Most commonly, "statistics" refers to numerical or other data. Statistics may also refer to the process of collecting, organizing, presenting, analyzing, and interpreting data for the purpose of making inferences, decisions, and policy, and of assisting scientific discovery.

Page 4

Grand challenges we are facing …

[Diagram: "Data", Statistics, Knowledge & Information, Decision]

The 21st century will be the golden age of statistics!

Page 5

Grand challenges we are facing …

1. Data collection technology has advanced dramatically, but without sufficient statistical sampling design and experimental design.

2. Technology for discovering and retrieving useful information has lagged behind and has become the bottleneck.

3. More sophisticated approaches are needed for decision making and risk management.

Page 6

Statistical Challenges - Massive Amount of Data

Page 7

Statistical Challenges – Image Data

Page 8

Statistics in Science

[Figure panels: Cosmic microwave background radiation; High energy physics; Tick-by-tick stock data; Genomic/proteomic data]

Page 9

Statistics in Science

[Figure panels: Fingerprints; Microarray]

Page 10

What do we do?
• New ways of thinking about and attacking problems
• Finding sub-optimal but computationally feasible solutions
• New paradigms for new types of data
• Being satisfied with ‘very rough’ approximations
• Turning research results into easy-to-use, publicly available software and programs
• Joining forces with computer scientists

Page 11

Some ‘hot’ research directions
• Dimension reduction
• Visualization
• Dynamic systems
• Simulation and real-time computation
• Uncertainty and risk management
• Interdisciplinary research

Page 12

Reasons to Study Biostatistics I

Biostatistics is everywhere around us:
• Our life: entertainment, sports games, shopping, parties, communication (cell phones), travel …
• Our work: career, business, school …
• Our health: food, weather, disease …
• Our environment: safety, security, chemicals, animals …
• Our well-being: physical examinations, hospitals, being happy, longevity.

Page 13

Reasons to Study Biostatistics I

• Entertainment (party: music / dance / food): alcohol, cigarettes, drugs, etc.
• Sports games: car racing, skiing (time to event: survival analysis).
• Shopping: different tastes / preferences; allergies to certain foods / smells: peanuts, flowers …
• Communication (cell phone use): potential hazard that may lead to health problems (CA …)
• Travel: infectious diseases, safety, accidents
• …

Page 14

Reasons to Study Biostatistics II

We care about our society, our family, our environment, our school, scientific research …

• Major impact on society and communities: disease transmission; healthcare benefits and health economics; quality of life (research, health improvement); safety issues (outbreaks of diseases, etc.)
• The job market is very promising, with applications in a wide range of areas: healthcare, quality of life; careers in scientific, public or private, and industrial settings …

Page 15

Reasons to Study Biostatistics III

Biostatistics research and applications
• Major employers in the US: research universities, hospitals, institutes (NIH), CDC, DoD, NASA, the pharmaceutical industry, the biotech industry, banks and other data warehouses …
• Major US universities with a biostatistics department: Harvard U, U. Michigan, U. Washington (Seattle), UC (Berkeley, LA, SF), JHU, Yale U, Stanford U …

Page 16

Reasons to Study Biostatistics IV

• New biostatistics research areas (still growing): medical research.
• Recent trends in employment:
• Private industry: Google, Microsoft …; Affymetrix, Illumina, Agilent, Golden Helix, 23andMe …
• Investment: stock market, Capital One, Bank of America, Goldman Sachs, etc.
• Nanotech, green energy (alternative energy) …

Page 17

Example 1. Medical study data: Ob/Gyn

Modeling of PlGF: Placental Growth Factor

Page 18

Example 2. Genomics study: Single Nucleotide Polymorphism (SNP)

[Figure: homologous pairs of chromosomes, showing the paternal allele and the maternal allele]

ACGAACAGCTTGCTTGTCGA
ACGAGCAGCTTGCTCGTCGA

SNP A/G
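To make the SNP idea concrete, here is a minimal Python sketch, assuming the two sequences shown on the slide are already aligned base by base. The helper `snp_sites` is hypothetical (not from the slides) and simply reports positions where the two alleles differ; real genotype calling must also handle alignment and sequencing errors. Note that a plain comparison flags the A/G site labeled on the slide and also a T/C difference further along the sequence.

```python
def snp_sites(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for each mismatch.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


allele_1 = "ACGAACAGCTTGCTTGTCGA"
allele_2 = "ACGAGCAGCTTGCTCGTCGA"
print(snp_sites(allele_1, allele_2))  # [(5, 'A', 'G'), (15, 'T', 'C')]
```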

Page 19

Computational Genomics: SNP Genotype

Error rate: around 5%. Genome-wide association studies involve millions of SNPs.

Page 20

Applications

• Genetic counseling: gene expression + family medical history → disease; breast cancer (BRCA) …
• Achieving accurate estimation and prediction: early detection / early treatment (cancer, …); accurate diagnosis (HIV+)
• Helping develop new drugs for treatment.
• Helping to protect the environment, live longer and happier, and improve quality of life.

Page 21

Did I pass my test?

I hope I have convinced you to study biostatistics.

Page 22

Chapter 2. Descriptive Statistics

The first important thing to do is to visualize the data.

• Plot the data
• Scatter plot: pair-wise (var 1 vs. var 2)

Page 23

Scatter plot

Page 24

Descriptive Statistics

Summarize data using statistics:
• Central location (mean, median)
• Range (min, max)
• Variability (variance, standard deviation)
• Mode
• Quantiles (percentiles)

Rank the data, but avoid long listings (use grouping instead).

Page 25

Measure of Location

Mean

The mean is the sum of all the observations divided by the number of observations.

Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

N: the number of observations in the population.

n: the number of observations in the sample.
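As a minimal sketch, the definition translates directly into Python. The population and sample below are hypothetical and only illustrate the two formulas; the computation is identical, and only the notation (μ with N versus x̄ with n) differs.

```python
def mean(values):
    """Sum of all observations divided by the number of observations."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical: treat these eight values as a small "population" (N = 8) ...
population = [12, 24, 36, 25, 17, 19, 24, 11]
# ... and these three as a sample (n = 3) drawn from it.
sample = [12, 24, 36]

mu = mean(population)   # population mean
x_bar = mean(sample)    # sample mean
print(mu, x_bar)        # 21.0 24.0
```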

Page 26

Properties of the Mean

The mean is the most widely used measure of location and has the following properties:

• If $y_i = a x_i + b$, $i = 1, \dots, n$, then $\bar{y} = a\bar{x} + b$.
• $\sum_{i=1}^{N}(x_i - \mu) = \sum_{i=1}^{n}(x_i - \bar{x}) = 0$.

The mean is highly sensitive to extreme values in the sample.
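Both properties can be verified numerically. This Python sketch uses illustrative data (the values from the median/mode example) and arbitrary constants a = 2 and b = 3.

```python
import math
from statistics import fmean


def linear_transform(xs, a, b):
    """Return y_i = a * x_i + b for every observation."""
    return [a * x + b for x in xs]


x = [12, 24, 36, 25, 17, 19, 24, 11]  # illustrative data
a, b = 2.0, 3.0                       # arbitrary constants
y = linear_transform(x, a, b)

x_bar = fmean(x)
y_bar = fmean(y)

# Property 1: the mean of y equals a * (mean of x) + b.
print(math.isclose(y_bar, a * x_bar + b))  # True

# Property 2: deviations from the mean sum to zero (up to rounding error).
print(math.isclose(sum(xi - x_bar for xi in x), 0.0, abs_tol=1e-9))  # True
```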

Page 27

Translation of data

Page 28

Measure of Location

Median and Mode

The median is the value of the "middle" point of the sample when the observations are arranged in ascending order.

Median = the ((n+1)/2)-th observation (in ascending order) if n is odd.

= the average of the (n/2)-th and (n/2+1)-th observations (in ascending order) if n is even.

The mode is the most frequently occurring value among all the observations in a sample. It is the most probable value to be obtained if one data point is selected at random from the population.

Page 29

Example: Median and Mode

Calculate the median and mode of the following data:

12, 24, 36, 25, 17, 19, 24, 11

Sorted data: 11, 12, 17, 19, 24, 24, 25, 36

Median = (19 + 24)/2 = 21.5, Mode = 24
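The median and mode definitions can be checked against this worked example with a short Python sketch. `median` follows the odd/even rule stated above; `modes` is a hypothetical helper that returns every most-frequent value, so a bimodal sample yields two modes. Python's standard library offers `statistics.median` and `statistics.multimode` for the same purpose.

```python
from collections import Counter


def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                 # the ((n+1)/2)-th observation, 1-based
    return (s[mid - 1] + s[mid]) / 2  # average of the (n/2)-th and (n/2+1)-th


def modes(values):
    """All values that occur with the highest frequency, in ascending order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


data = [12, 24, 36, 25, 17, 19, 24, 11]
print(median(data))  # 21.5
print(modes(data))   # [24]
```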

Page 30

[Figure: relative positions of the mean, median, and mode (equal for a symmetric distribution, ordered by ≤ for skewed distributions); a bimodal distribution with two modes]

The mean is influenced by outliers, while the median is not.

The mode is very unstable: minor fluctuations in the data can change it substantially, so it is seldom calculated.
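The claim that the mean is influenced by outliers while the median is not can be illustrated with the data from the median/mode example. In this sketch the value 36 is replaced by a hypothetical data-entry error, 360.

```python
from statistics import fmean, median

data = [12, 24, 36, 25, 17, 19, 24, 11]
with_outlier = [12, 24, 360, 25, 17, 19, 24, 11]  # hypothetical: 36 mistyped as 360

print(fmean(data), median(data))                  # 21.0 21.5
print(fmean(with_outlier), median(with_outlier))  # 61.5 21.5
```

The single extreme value roughly triples the mean, while the median does not move, because it depends only on the middle of the sorted data.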

Page 31

Symmetry and Skewness in Distribution

When the shapes of a distribution to the left and to the right of its center are mirror images of each other, the distribution is symmetric. Examples of symmetric distributions are shown below:

A skewed distribution is one that is not symmetric. Examples of skewed distributions are shown below:

[Figure panels: Positively skewed; Negatively skewed]

Page 32

Measure of Dispersion

Range and Mean Absolute Deviation (MAD)

The range is the simplest measure of dispersion. It is simply the difference between the largest and smallest observations in a sample.

$\text{Range} = x_{\max} - x_{\min}$

The mean absolute deviation is the average of the absolute values of the deviations of individual observations from the mean.

$\text{MAD} = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|$
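A minimal Python sketch of the two formulas, applied for illustration to the data from the median/mode example. "MAD" here means the mean absolute deviation as defined above; some software uses the same abbreviation for the median absolute deviation.

```python
def data_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)


def mean_absolute_deviation(values):
    """Average absolute deviation of the observations from their mean."""
    n = len(values)
    x_bar = sum(values) / n
    return sum(abs(x - x_bar) for x in values) / n


data = [12, 24, 36, 25, 17, 19, 24, 11]
print(data_range(data))               # 25
print(mean_absolute_deviation(data))  # 6.25
```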

Page 33

Measure of Dispersion

Quantiles or Percentiles

Quantile (percentile) is the general term for a value at or below which a stated proportion (p/100) of the data in a distribution lies.

Quartiles correspond to the proportions .25, .50, .75 (the 25th, 50th, and 75th percentiles). A general quantile / percentile can use any probability value.

Page 34

Calculating Quantiles or Percentiles

Let [k] denote the largest integer less than or equal to k. For example, [3] = 3, [4.7] = 4.

The p-th percentile of n observations is defined as follows:

• Find k = np/100.

• If k is an integer, the p-th percentile is the mean of the k-th and (k+1)-th observations (in ascending sorted order).

• If k is NOT an integer, the p-th percentile is the ([k]+1)-th observation.

Page 35

Example

Calculate the 10th percentile and the 75th percentile of the following data:

7, 12, 16, 2, 8, 4, 20, 14, 19, 17

Sorted data: 2, 4, 7, 8, 12, 14, 16, 17, 19, 20 (n = 10)

10th percentile: k = np/100 = 10×10/100 = 1 (an integer), so take the average of the 1st and 2nd observations = (2+4)/2 = 3.

75th percentile: k = np/100 = 10×75/100 = 7.5 (not an integer), so take the ([7.5]+1) = (7+1) = 8th observation = 17.
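The percentile rule can be written directly in Python. This sketch assumes an integer p with 0 < p < 100 and follows the slide's convention exactly; statistical packages often use other conventions (for example, linear interpolation), so their results can differ slightly.

```python
def percentile(values, p):
    """p-th percentile (integer 0 < p < 100) using the k = np/100 rule.

    If k is an integer, average the k-th and (k+1)-th sorted observations;
    otherwise take the ([k] + 1)-th sorted observation.
    """
    if not 0 < p < 100:
        raise ValueError("p must satisfy 0 < p < 100")
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    if (n * p) % 100 == 0:
        k = int(n * p // 100)
        # The 1-based k-th and (k+1)-th observations sit at 0-based k-1 and k.
        return (s[k - 1] + s[k]) / 2
    k_floor = int(n * p // 100)  # [k], the largest integer <= k
    # The 1-based ([k]+1)-th observation sits at 0-based index [k].
    return s[k_floor]


data = [7, 12, 16, 2, 8, 4, 20, 14, 19, 17]
print(percentile(data, 10))  # 3.0
print(percentile(data, 75))  # 17
```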

Page 36

Measure of Dispersion

Variance and Standard Deviation

The variance is a measure of how spread out a distribution is. It is computed from the squared deviations of the observations from their mean; the sample variance divides the sum of the squared deviations by n - 1. The standard deviation is the square root of the variance. It is the most commonly used measure of spread.

Sample variance: $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$

Sample standard deviation: $s_x = \sqrt{s_x^2}$

If $y_i = a x_i + b$, $i = 1, \dots, n$, then $s_y^2 = a^2 s_x^2$ and $s_y = |a|\, s_x$.
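A quick numerical check of the sample variance formula and the linear-transformation property, sketched in Python with illustrative data (the median/mode example values) and arbitrary constants a = -3 and b = 10. The negative a shows why the standard deviation scales by |a| rather than by a.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


x = [12, 24, 36, 25, 17, 19, 24, 11]  # illustrative data
a, b = -3.0, 10.0                     # arbitrary constants
y = [a * xi + b for xi in x]

# The shift b has no effect; the scale a enters as a^2 and |a|.
print(math.isclose(sample_variance(y), a ** 2 * sample_variance(x)))  # True
print(math.isclose(sample_sd(y), abs(a) * sample_sd(x)))              # True
```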

Page 37

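Example

Five people have their body mass index (BMI), calculated as [body weight (kg)] / [height (m)]², recorded as:

18, 20, 22, 25, 24

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{109}{5} = 21.8$

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{32.8}{5-1} = 8.2$

$s = \sqrt{8.2} \approx 2.86$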
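The BMI arithmetic can be reproduced in a few lines of Python. The sketch computes the sum of squared deviations by hand and cross-checks it against `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
from statistics import fmean, stdev, variance

bmi = [18, 20, 22, 25, 24]

n = len(bmi)
x_bar = fmean(bmi)                       # 109 / 5
ss = sum((x - x_bar) ** 2 for x in bmi)  # sum of squared deviations
s2_manual = ss / (n - 1)                 # sample variance, n - 1 divisor
s_manual = math.sqrt(s2_manual)          # sample standard deviation

# The standard library's variance and stdev also use the n - 1 divisor.
assert math.isclose(s2_manual, variance(bmi))
assert math.isclose(s_manual, stdev(bmi))

print(round(x_bar, 2), round(ss, 2), round(s2_manual, 2), round(s_manual, 2))
# 21.8 32.8 8.2 2.86
```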

Page 38

Relative Dispersion – Coefficient of Variation

A direct comparison of two or more measures of dispersion may be difficult because of differences in their means.

A relative dispersion is the amount of variability in a distribution relative to a reference point or benchmark.

A common measure of relative dispersion is the coefficient of variation (CV):

$CV = 100 \times \frac{s_x}{\bar{x}}$

This measure remains the same regardless of the units used, as long as the change of units is a pure rescaling. Very useful!

Good example: weight, kg versus lb (a pure rescaling).

Bad example: temperature, °C versus °F. The conversion F = 1.8C + 32 includes a shift; the +32 changes the mean but not the standard deviation, so the CV changes.
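A minimal sketch of the CV and of the kg/lb versus °C/°F contrast. The weights and temperatures are hypothetical, and 2.20462 is the approximate number of pounds per kilogram.

```python
from statistics import fmean, stdev


def coefficient_of_variation(values):
    """CV = 100 * s / x_bar, using the sample standard deviation (n - 1 divisor)."""
    return 100 * stdev(values) / fmean(values)


weights_kg = [60.0, 70.0, 80.0]                 # hypothetical weights
weights_lb = [w * 2.20462 for w in weights_kg]  # pure rescaling
temps_c = [10.0, 20.0, 30.0]                    # hypothetical temperatures
temps_f = [1.8 * t + 32 for t in temps_c]       # rescaling plus a shift

print(round(coefficient_of_variation(weights_kg), 2),
      round(coefficient_of_variation(weights_lb), 2))  # 14.29 14.29
print(round(coefficient_of_variation(temps_c), 2),
      round(coefficient_of_variation(temps_f), 2))     # 50.0 26.47
```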

Page 39

Frequency Distribution

A long list of data can be confusing; the data should be grouped into intervals of moderate size rather than listed as raw data points.

Hospital Length of Stay (LOS)

81 44 29 23 16 13 12 11 11 64 43 28 22 16 13 12 11 11 12 12 63 43 28 21 16 13 12 11 11 58 42 28 21 15 13 12 11 10 11 10 98 58 42 28 20 15 13 12 11 10 93 56 36 28 20 15 12 12 11 10 86 55 36 27 19 15 12 12 11 10 83 50 32 27 18 14 12 12 11 10 83 50 32 26 27 14 12 12 11 10 81 48 30 23 17 14

Page 40

[Table: LOS by interval, with columns Interval, Frequency, Relative Frequency]

A summary table works better than raw data.
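Grouping data into intervals can be sketched as follows. Because the LOS interval boundaries did not survive in this copy of the table, the sketch uses hypothetical half-open intervals of width 5 and the small data set from the percentile example; `frequency_table` is an illustrative helper, not part of any library.

```python
from collections import Counter


def frequency_table(values, start, width):
    """Rows of (low, high, frequency, relative frequency) for intervals
    [start + j*width, start + (j+1)*width) covering all the values."""
    n = len(values)
    counts = Counter((v - start) // width for v in values)  # interval index j
    rows = []
    for j in range(min(counts), max(counts) + 1):
        low = start + j * width
        high = low + width
        freq = counts.get(j, 0)  # empty intervals get frequency 0
        rows.append((low, high, freq, freq / n))
    return rows


data = [7, 12, 16, 2, 8, 4, 20, 14, 19, 17]
for low, high, freq, rel in frequency_table(data, start=0, width=5):
    print(f"[{low}, {high})  {freq:>2}  {rel:.2f}")
```

Relative frequencies always sum to 1, which makes tables built from samples of different sizes comparable.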

Page 41

Graphic Methods

Bar Graph

A bar graph is simply a bar chart of data that have been classified into a frequency distribution. The attractive feature of a bar graph is that it lets us quickly see where most of the observations are concentrated.

[Figure: bar graph of frequency by LOS interval]

Page 42

Graphic Methods

Histogram

A histogram is a plot of a distribution in which the bars need not have the same width. The area of each bar is proportional to the density of the data, i.e., to the percentage of data points falling within that bar's interval.
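The area rule for histograms with unequal bar widths can be checked numerically. In this sketch (hypothetical bins, data from the percentile example), each bar's height is count / (n × width), so its area equals the proportion of observations in that bin and the areas sum to 1.

```python
def histogram_areas(values, edges):
    """Return (density, area) for each bin defined by consecutive edges.

    Bins are [edges[i], edges[i+1]), except the last bin, which also includes
    its right edge. density = count / (n * width), so area = density * width
    equals the proportion of observations in the bin.
    """
    n = len(values)
    last = len(edges) - 2
    out = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        count = sum(1 for v in values if lo <= v < hi or (i == last and v == hi))
        density = count / (n * (hi - lo))
        out.append((density, density * (hi - lo)))
    return out


data = [7, 12, 16, 2, 8, 4, 20, 14, 19, 17]
edges = [0, 5, 20, 25]  # hypothetical, deliberately unequal bin widths
for (density, area), lo, hi in zip(histogram_areas(data, edges), edges, edges[1:]):
    print(f"[{lo}, {hi}): density={density:.4f}, area={area:.2f}")
# areas: 0.20, 0.70, 0.10 (they sum to 1)
```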

Page 43

Graphic Methods

Box Plot

The box plot is a summary plot based on the median and the interquartile range (IQR), which contains the middle 50% of the values. A line across the box indicates the median. Whiskers extend from the box to the highest and lowest values, excluding outliers.

$IQR = Q_3 - Q_1$, $MIN = Q_1 - 1.5\,IQR$, $MAX = Q_3 + 1.5\,IQR$

[Figure: box plot with MIN and MAX marked]
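The box plot quantities can be computed with a short sketch. It re-implements the percentile rule from the slides for Q1, the median, and Q3, treats MIN and MAX as the fences Q1 - 1.5 IQR and Q3 + 1.5 IQR, and uses hypothetical data containing one extreme value (60). Quartile conventions differ across software, so plotted boxes may not match these numbers exactly.

```python
def percentile(values, p):
    """p-th percentile (integer 0 < p < 100) using the k = np/100 rule."""
    s = sorted(values)
    n = len(s)
    if (n * p) % 100 == 0:
        k = int(n * p // 100)
        return (s[k - 1] + s[k]) / 2  # mean of the k-th and (k+1)-th observations
    return s[int(n * p // 100)]       # the ([k]+1)-th observation


def box_plot_summary(values):
    """Quartiles, IQR, fences (MIN/MAX), whisker ends, and outliers."""
    q1 = percentile(values, 25)
    med = percentile(values, 50)
    q3 = percentile(values, 75)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1, "median": med, "q3": q3, "iqr": iqr,
        "fences": (lower_fence, upper_fence),
        "whiskers": (min(inside), max(inside)),  # most extreme non-outliers
        "outliers": outliers,
    }


data = [2, 4, 7, 8, 12, 14, 16, 17, 19, 60]  # hypothetical, with one extreme value
print(box_plot_summary(data))
```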