chapter 1: examining distributions. 1.1 displaying distributions with graphs

Upload: todd-gibson

Post on 21-Jan-2016

Page 1: Chapter 1: Examining Distributions. 1.1 Displaying Distributions with graphs

Chapter 1: Examining Distributions

1.1 Displaying Distributions with graphs

Many public health efforts are directed toward increasing levels of physical activity. “Physical Activity in Urban White, African American, and Mexican American Women” (Medicine and Science in Sports and Exercise [1997]) reported on physical activity patterns in urban women. The accompanying data set gives the preferred leisure-time physical activity for each of 30 Mexican American women. The following coding is used: W = walking, T = weight training, C = cycling, G = gardening, A = aerobics.

W T A W G T W W C W
T W A T T W G W W C
A W A W W W T W W T

Construct what you think is an appropriate graph to display this information.
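One possible way to start is to tally the codes, since a bar graph of a categorical variable needs a count (or percent) per category. The sketch below is only an illustration: the variable names are made up here, and the text bar graph stands in for whatever graphing tool you prefer.

```python
from collections import Counter

# Activity codes as listed on the slide (30 women)
codes = ("W T A W G T W W C W "
         "T W A T T W G W W C "
         "A W A W W W T W W T").split()

labels = {"W": "walking", "T": "weight training", "C": "cycling",
          "G": "gardening", "A": "aerobics"}

counts = Counter(codes)

# Text bar graph: one '#' per woman, most common activity first
for code, n in counts.most_common():
    print(f"{labels[code]:<16}{'#' * n} {n} ({n / len(codes):.0%})")
```

Because the variable is categorical, the order of the bars is arbitrary; sorting by count is just one readable choice.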

The Chronicle of Higher Education (August 31, 2001) reported graduation rates for NCAA Division I schools. The rates reported are the percentages of full-time freshmen in fall 1993 who had earned a bachelor’s degree by August 1999.

California: 64 41 44 31 37 73 72 68 35 37 81 90 82 74 79 67 66 66 70 63

Texas: 67 21 32 88 35 71 39 35 71 63 12 46 35 39 28 65 25 24 22

Individual
Definition: an object described by a set of data

Variable
Definition: a characteristic of an individual

Categorical variable
Definition: places an individual into a group or category
Examples: gender, race, smoker, marital status
Types of graphs used: bar graph, pie chart

Quantitative variable
Definition: takes numerical values that result from a measurement
Examples: age, blood pressure, salary
Types of graphs used: histogram, stemplot, time plot

Pg. 4-19

Categorical Variable

Bar Graph (pictograph)
• What does the height show? Count or %
• Does the graph need to include all categories? No
• Pg 8 #1.3 & 1.4

Pie Chart
• Shows? Visual for comparison with whole group
• Does the graph need to include all categories? Yes

Histogram

Rating     Frequency
0 - 2      20
3 - 5      14
6 - 8      15
9 - 11     2
12 - 14    1

Histogram

Has a horizontal axis that often represents groups of data rather than individual data values

Method:
• Divide the data into classes of equal width (5 to 15 classes)
• Count the number of observations in each class
• Draw a bar graph with no space between the bars

Example: NCAA
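The method above can be sketched for the NCAA graduation rates as follows. This is a minimal illustration, not the book's solution: the class width of 10 is an assumption chosen here, and the function names are hypothetical.

```python
from collections import Counter

california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

WIDTH = 10  # assumed class width: classes 10-19, 20-29, ..., 90-99


def class_counts(rates, width=WIDTH):
    """Count observations per class, keyed by the lower bound of the class."""
    return Counter((r // width) * width for r in rates)


def print_histogram(name, rates, width=WIDTH):
    """Print a sideways histogram, keeping empty classes so gaps stay visible."""
    counts = class_counts(rates, width)
    print(name)
    for low in range(min(counts), max(counts) + width, width):
        print(f"{low:>3}-{low + width - 1:<3} {'#' * counts[low]}")


print_histogram("California", california)
print_histogram("Texas", texas)
```

Trying a different `width` shows how the choice of classes changes the apparent shape of the distribution.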

Histograms

These six histograms each describe the same set of data from Table 1.2 on page 11 of your book.

Which one is most useful? Least useful? Why?

[Figure: six histograms of the same data, labeled A through F]

Interpreting histograms

Look for the overall pattern and striking deviations

Describe shape, center, and spread
• Symmetric
• Skewed to the right – the right side extends much farther out than the left side

Quantitative variable cont.

Stemplot
• For small data sets
• Quicker to make and presents more detailed info
• Stem consists of all but the final, rightmost digit, and the leaf is the final digit
• Example: NCAA

Time plot
• To show a change over time
• Example: pg 19 #1.10
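As a rough sketch of the stemplot rule (stem = all but the final digit, leaf = the final digit), the Texas graduation rates could be split like this. The function name is hypothetical, and the code assumes non-negative whole-number data.

```python
from collections import defaultdict

texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]


def stemplot(values):
    """Return stemplot rows; stems with no leaves are kept so gaps show."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = tens, leaf = final digit
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves[stem])}"
            for stem in stems]


for row in stemplot(texas):
    print(row)
```

Unlike a histogram, every original value can be read back from the plot, which is why it suits small data sets.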

What kind of graph would be appropriate?
• Whether a spun penny lands “heads” or “tails”
• The number of calories in a fast food sandwich
• The life expectancy of a nation
• The occupational background of a Civil War general
• The weight of an automobile
• For whom an American voted in the 1992 Presidential election
• The age of a bride on her wedding day
• The average low temperature in January for Appleton

Misleading graphs

Replacing the bars of a bar chart with milk buckets to make the graph more visually interesting distorts the areas.

Another common distortion occurs when a third dimension is added to bar charts or pie charts. The 3-D version distorts the areas, and as a consequence, is much more difficult to interpret correctly.

It is common to see scatterplots with broken axes, but be cautious of time plots, bar graphs, or histograms with broken axes. Broken axes in time plots can exaggerate the magnitude of change over time.

In bar graphs and histograms, the vertical axis should never be broken. For example, starting the vertical axis at 50 exaggerates the gain: the rectangle representing 68 has more than three times the area of the rectangle representing 55.
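The "more than three times" claim can be checked with a little arithmetic. The sketch assumes the two bars have equal width, so the ratio of areas equals the ratio of drawn heights; the variable names are illustrative.

```python
BASELINE = 50            # where the broken vertical axis starts
before, after = 55, 68   # the two values being compared

# Heights as drawn are measured from the baseline, not from zero
drawn_ratio = (after - BASELINE) / (before - BASELINE)
true_ratio = after / before

print(f"drawn bars: {drawn_ratio:.1f} times as tall")
print(f"actual values: {true_ratio:.2f} times as large")
```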

Watch out for unequal time spacing in time plots.

Information from research studies is sometimes taken out of context. Think critically!

What might be wrong with the following?
• Only 3% of the men surveyed read Cosmopolitan magazine.
• Since most automobile accidents occur within 15 miles of a person’s residence, it is safer to make long trips.
• A television commercial claims that “our razor blades are manufactured to such high standards that they will give you a shave that is 50% closer”.
• A national health food magazine claims that “95% of its subscribers who follow the magazine’s recommendation and take megadoses of vitamin C are healthy and vigorous”.
• During 1990 there were 234 accidents involving drunken drivers and 15,897 accidents involving drunken pedestrians reported in Danville. Can we conclude that it is safer in Danville to be a drunken driver than a drunken pedestrian?

Review of graphs:
• Pg 14 #1.7 & 1.8
• Pg 20 #1.11, 1.18, 1.19
• SAT scores: Make a histogram to better understand the data given, and interpret the histogram.

1.2 Describing distributions with numbers

Population – the entire group of individuals that we want information about

Sample – part of the population that we actually examine in order to gather information and make conclusions

Mean

Measure of its center or average

µ used for population mean

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \quad \text{or} \quad \bar{x} = \frac{\sum x_i}{n} \]

Median

Midpoint of distribution

To find median:

• Symmetrical distribution – mean and median are close together
• Skewed distribution – the mean is farther out in the long tail than is the median

http://www.rossmanchance.com/applets/DotPlotAppletAug11/DotPlotApplet.html

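Mode

Data value that is repeated most often

[Figure: three curves showing where the mean, median, and mode fall]
• Symmetric: mode = mean = median
• Skewed left (negatively): the mean is pulled farthest toward the long left tail
• Skewed right (positively): the mean is pulled farthest toward the long right tail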

Quartiles

Spread of the middle half of data

To calculate:
• arrange data in ascending order and locate the median
• lower quartile (Q1) is the median of the lower half of the data
• upper quartile (Q3) is the median of the upper half

Q1 is larger than 25% of data
Q2 is larger than 50% of data
Q3 is larger than 75% of data

Find the quartiles for the following data.

12 35 23 9 5 21 45 56 24 6 28 31

Sorted: 5 6 9 12 21 23 24 28 31 35 45 56

Q1 = 10.5
Q2 (median) = 23.5
Q3 = 33
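The quartile steps can be sketched in a few lines. This follows the median-of-halves rule described in these notes; the choice to leave the middle value out of both halves when the count is odd is an assumption, and statistical software may use a slightly different rule. The function name is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]  # skip the middle value if n is odd
    return median(lower), median(values), median(upper)


data = [12, 35, 23, 9, 5, 21, 45, 56, 24, 6, 28, 31]
q1, q2, q3 = quartiles(data)
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}")
```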

5 number summary and boxplot

• 5 number summary – minimum, Q1, Q2, Q3, maximum
• Boxplot – graph of 5 number summary
• Best used for side-by-side comparison of more than one set of data
• Include numerical scale in the graph

Min   Q1     Q2     Q3   Max
5     10.5   23.5   33   56

http://www.rossmanchance.com/applets/DotPlotAppletAug11/DotPlotApplet.html
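For a side-by-side comparison, the same five numbers can be computed for each group. The sketch below applies the median-of-halves rule to the NCAA graduation rates used elsewhere in these notes; the function name is hypothetical, and other quartile conventions would give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    upper = values[half + len(values) % 2:]  # skip the middle value if n is odd
    return values[0], median(lower), median(values), median(upper), values[-1]


california = [64, 41, 44, 31, 37, 73, 72, 68, 35, 37,
              81, 90, 82, 74, 79, 67, 66, 66, 70, 63]
texas = [67, 21, 32, 88, 35, 71, 39, 35, 71, 63,
         12, 46, 35, 39, 28, 65, 25, 24, 22]

print("            Min  Q1    Med   Q3    Max")
for name, rates in [("California", california), ("Texas", texas)]:
    print(f"{name:<11}", *five_number_summary(rates))
```

Each row supplies the five positions needed to draw one box in a side-by-side boxplot on a common scale.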

Outliers

• An unusually small or large data value
• Calculate the interquartile range: IQR = Q3 – Q1
• An observation is an outlier if it falls more than 1.5 times the IQR above Q3 or below Q1
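A small sketch of the 1.5 × IQR rule, applied to the twelve values from the quartile example (Q1 = 10.5, Q3 = 33). The function names are illustrative only.

```python
def fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Return the observations that fall outside the fences."""
    low, high = fences(q1, q3)
    return [x for x in data if x < low or x > high]


data = [12, 35, 23, 9, 5, 21, 45, 56, 24, 6, 28, 31]
print(fences(10.5, 33))          # IQR = 22.5
print(outliers(data, 10.5, 33))  # empty list: no value lies beyond a fence
```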

Standard Deviation

Measures spread by looking at how far the observations are from their mean

Variance formula:
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \]

Standard deviation formula:
\[ s = \sqrt{s^2} \]

s is used for sample data; σ is used for a population (equation is slightly different)

Waiting Times of Bank Customers at Different Banks (in minutes)

Jefferson Valley Bank    Bank of Providence
6.5                      4.2
6.6                      5.4
6.7                      5.8
6.8                      6.2
7.1                      6.7
7.3                      7.7
7.4                      7.7
7.7                      8.5
7.7                      9.3
7.7                      10.0

          Jefferson Valley Bank    Bank of Providence
Mean      7.15                     7.15
Median    7.20                     7.20
Mode      7.7                      7.7

What is the Standard Deviation of the data from JV Bank? from BofP?
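One way to check an answer is with Python's `statistics` module. The split of the twenty waiting times into two columns is an assumption (values alternating between the banks), but it reproduces the mean of 7.15 given for both banks.

```python
from statistics import mean, stdev

jefferson_valley = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7, 7.7, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]

for name, times in [("Jefferson Valley Bank", jefferson_valley),
                    ("Bank of Providence", providence)]:
    # stdev() is the sample standard deviation (divides by n - 1)
    print(f"{name}: mean = {mean(times):.2f}, s = {stdev(times):.2f}")
```

The two banks share the same mean, median, and mode, so only a measure of spread tells them apart.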

Dotplots of Waiting Times

Visually, which one has the greater spread?

Calculate the mean & standard deviation for each set of test scores (each column is one set)

                     95     99     100     92     90
                     92     94     95      87     90
                     90     83     89      90     89
                     85     75     65      90     89
                     87     93     95      89     90

Mean                 89.8   88.8   88.8    89.6   89.6
Standard deviation   3.96   9.65   13.86   1.82   0.55

Choosing a summary

The five number summary is used for describing a skewed distribution or a distribution with outliers

Use mean for reasonably symmetric distributions that are free of outliers

Section 1.2 practice

Pg 41 - 45 #1.35, 1.38, 1.47, 1.48, 1.49

1.3 Normal Distributions

Compact picture of the overall pattern of the data

Density curve

Scores on national tests often have a regular distribution (pg 46 & 47)
• symmetrical
• make total area under curve equal one
• partial area represents % of total “students” (observations)

Normal Distributions

What are they?
• Density curves that are symmetrical, single-peaked, and bell-shaped

Curve is described by its . . .
• mean µ and standard deviation σ

Where is the mean located?
• at the center of the curve

What controls how spread out the curve is?
• Standard deviation controls the spread; the larger the σ, the more spread out the data

Where is the σ on the curve?
• at the points of change of curvature

pg 51-52

Why are normal curves important?

Good descriptions for some distributions of real data (scores on tests, measurements of same quantity, characteristics of biological populations)

Good approximations to the results of many kinds of chance outcomes (tossing coin, rolling die)

68-95-99.7 rule

In a normal distribution:
• 68% of the observations fall within 1σ of the mean
• 95% of the observations fall within 2σ of the mean
• 99.7% of the observations fall within 3σ of the mean

http://www.rossmanchance.com/applets/DotPlotAppletAug11/DotPlotApplet.html

[Figure: normal curve centered at x̄, with marks at x̄ ± s, x̄ ± 2s, and x̄ ± 3s]
• 68% within 1 standard deviation (34% on each side of the mean)
• 95% within 2 standard deviations (a further 13.5% on each side)
• 99.7% of data are within 3 standard deviations of the mean (a further 2.4% on each side, leaving 0.1% in each tail)

Example: Light bulbs: x̄ = 1600 hr, s = 100 hr

68% of light bulbs last:
95% of light bulbs last:
99.7% of light bulbs last:

Practice problems: pg 54 #1.53, 1.54, 1.55
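The light bulb blanks can be filled in by stepping out 1, 2, and 3 standard deviations from the mean. This sketch assumes bulb lifetimes are approximately normal, which is what the 68-95-99.7 rule requires; the variable names are illustrative.

```python
mean_life, sd = 1600, 100   # hours, from the example

# k standard deviations on either side of the mean
intervals = {pct: (mean_life - k * sd, mean_life + k * sd)
             for k, pct in [(1, 68), (2, 95), (3, 99.7)]}

for pct, (low, high) in intervals.items():
    print(f"about {pct}% of light bulbs last between {low} and {high} hr")
```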

Standard normal curve

• Standardizing a normal curve is making all normal distributions the same
• Normal distribution with mean = 0 and standard deviation = 1
• z-score (# of standard deviations a value is away from the mean)

Formula:
\[ z = \frac{x - \mu}{\sigma} \]

Any question about what proportion of observations lie in some range of values can be answered by finding the area under the curve (percentage)

Practice problem pg 56 #1.56

What % of the population has a z-score . . .
• Less than -1.76? Shaded area = .0392 or 3.92%
• Less than 0.58? Shaded area = .7190 or 71.90%
• Greater than 1.96? Lower area = .9750, so shaded area = .0250 or 2.50%
• Between -1.76 and .58? .7190 - .0392 = .6798 or 67.98%
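These table lookups can be cross-checked with `statistics.NormalDist` from the Python standard library, whose `cdf` method gives the area to the left of a value. This is a sketch for checking answers, not a replacement for learning the table; the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

below_low = Z.cdf(-1.76)               # area to the left of -1.76
below_high = Z.cdf(0.58)               # area to the left of 0.58
above = 1 - Z.cdf(1.96)                # area to the right of 1.96
between = Z.cdf(0.58) - Z.cdf(-1.76)   # area between the two z-scores

for label, area in [("z < -1.76", below_low), ("z < 0.58", below_high),
                    ("z > 1.96", above), ("-1.76 < z < 0.58", between)]:
    print(f"{label}: {area:.4f}")
```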

In a standard normal distribution, find the z-score that cuts off the
• bottom 10%: the closest table area, .1003, is z = -1.28
• top 15%: the area to the left is .85; the closest table area, .8508, is z = 1.04

If the probability of getting less than a certain z-value is .1190, what is the z-value?

z = -1.18

If the probability of getting larger than a certain z-value is .0129, what is the z-value?

1 - .0129 = .9871, so z = 2.23
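Working backward from an area to a z-score is an inverse lookup. As a hedged sketch, `NormalDist.inv_cdf` in the standard library takes the area to the left and returns the z-score, so "top" or "larger than" areas must first be subtracted from 1. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

bottom_10 = Z.inv_cdf(0.10)           # cuts off the bottom 10%
top_15 = Z.inv_cdf(1 - 0.15)          # cuts off the top 15%
below_1190 = Z.inv_cdf(0.1190)        # area to the left is .1190
above_0129 = Z.inv_cdf(1 - 0.0129)    # area to the right is .0129

for label, z in [("bottom 10%", bottom_10), ("top 15%", top_15),
                 ("area below = .1190", below_1190),
                 ("area above = .0129", above_0129)]:
    print(f"{label}: z = {z:.2f}")
```

The results agree with the table answers once rounded to two decimal places.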

In a normal distribution µ = 25 and σ = 5. What is the probability of obtaining a value
• greater than 30? z = (30 - 25)/5 = 1; 1 - .8413 = .1587 or 15.87%
• less than 15? z = (15 - 25)/5 = -2; .0228 or 2.28%
• between 20 and 30? z = -1 and z = 1; .8413 - .1587 = .6826 or 68.26%
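The same three answers can be checked without standardizing by hand, since `statistics.NormalDist` accepts the mean and standard deviation directly. A minimal sketch with illustrative variable names:

```python
from statistics import NormalDist

X = NormalDist(mu=25, sigma=5)

greater_than_30 = 1 - X.cdf(30)
less_than_15 = X.cdf(15)
between_20_and_30 = X.cdf(30) - X.cdf(20)

print(f"P(X > 30)      = {greater_than_30:.4f}")
print(f"P(X < 15)      = {less_than_15:.4f}")
print(f"P(20 < X < 30) = {between_20_and_30:.4f}")
```

The last value comes out as .6827 rather than the table's .6826 because the table entries are themselves rounded to four places before subtracting.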

The Flatt Tire Corporation claims that the useful life of its tires is normally distributed with a mean life of 28,000 miles and with a standard deviation of 4000 miles. What percentage of the tires are expected to last more than 35,000 miles?

z = (35000 - 28000) / 4000 = 1.75
1 - .9599 = .0401 or 4.01%

Suppose it takes you an average of 20 minutes to drive to school, with a standard deviation of 2 minutes.

• How often will you arrive at school in less than 22 minutes?
• How often will it take you more than 24 minutes?
• 75% of the time you will arrive in x minutes or less. Solve for x.
• 43% of the time you will arrive in y minutes or more. Solve for y.
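One way to check your work on these four questions, assuming drive times are normally distributed (the problem does not say so explicitly, so this is an assumption). The first two parts are forward lookups with `cdf`; the last two are inverse lookups with `inv_cdf`.

```python
from statistics import NormalDist

drive = NormalDist(mu=20, sigma=2)   # minutes; normality is assumed

under_22 = drive.cdf(22)             # proportion of trips under 22 minutes
over_24 = 1 - drive.cdf(24)          # proportion of trips over 24 minutes
x = drive.inv_cdf(0.75)              # 75% of trips take x minutes or less
y = drive.inv_cdf(1 - 0.43)          # 43% of trips take y minutes or more

print(f"under 22 min: {under_22:.2%}")
print(f"over 24 min:  {over_24:.2%}")
print(f"x = {x:.2f} min, y = {y:.2f} min")
```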

Section 1.3 practice problems

Pg. 61 #1.57, 1.58 Pg. 64 #1.61, 62, 63, 65, 66, 68, 70