chapter 3 displaying and describing categorical data

50
. Chapter 3 Displaying and Describing Categorical Data

Upload: julian-allen

Post on 30-Dec-2015

234 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Chapter 3 Displaying and Describing Categorical Data

.

Chapter 3

Displaying and Describing

Categorical Data

Page 2: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 2

The Three Rules of Data Analysis

The three rules of data analysis won’t be difficult to remember:

Make a picture—things may be revealed that are not obvious in the raw data. These will be things to think about.

Make a picture—important features of and patterns in the data will show up. You may also see things that you did not expect.

Make a picture—the best way to tell others about your data is with a well-chosen picture.

Page 3: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 3

Case Study - Titanic

At 11:40 on the night of April 14, 1912, Frederick Fleet’s cry of “Iceberg, right ahead” signal the beginning of a nightmare that has become legend.

By 2:15 am the Titanic thought by many to be unsinkable, had sunk, leaving more than 1,500 passengers and crew members on board to meet their icy fate.

Page 4: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 4

Data – passengers and crew aboard the Titanic

Survival Age Sex Class

Dead Adult Male Third

Dead Adult Male Crew

Dead Adult Male Second

Dead Adult Male Crew

Dead Adult Male Crew

Dead Adult Female Second

Alive Adult Female First

Dead Child Male First

Page 5: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 5

Titanic (cont.)

Each case (row) of the data table represents a person on board the ship.

The variables are

Person’s Survival

Person’s Age

Sex

Ticket Class.

Page 6: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 6

Titanic – Variables (cont.)

Survival

Age

Sex

Ticket Class

Dead or Alive

Categorical

Adult or Child

Categorical

Male or Female

Categorical

First, second, third, crew

Categorical

Page 7: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 7

Where from here?

We will study categorical variables. Interesting when we look at how categorical

variables work together. Example of questions:

- What percent of people were in first class? What in second class?

- Was the percent of survivors higher in first class than in second class?

Page 8: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 8

What are the Ws?

Who What

When Where hoW

Why

People on the Titanic

Survival status, age, sex, ticket class

April 14, 1912

North Atlantic

A variety of sources and internet sites

Historical Interest – see questions above

Page 9: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 9

Frequency Tables: Making Piles

We can “pile” the data by counting the number of data values in each category of interest.

We can organize these counts into a frequency table, which records the totals and the category names

ClassCount

(Frequency)First 325

Second 285Third 706Crew 885

2201

Page 10: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 10

Frequency Tables: Making Piles (cont.)

A relative frequency table is similar, but gives the percentages (instead of counts) for each category.

Class CountRelative

FrequencyPercentage

First 325 0.1477 14.77%Second 285 0.1295 12.95%

Third 706 0.3208 32.08%Crew 885 0.4021 40.21%

2201 1.0000 100.00%

Page 11: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 11

Answer to first question on slide 3-7

What percent of people were in first class? What in second class?

There were 325 people in first class.

There were 285 people in second class.

There were 2201 people on board.

So, there were 325 / 2201 ≈ 14.77% of the

people in first class and

285 / 2201 ≈ 12.95% in second class.

Page 12: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 12

What’s Wrong With This Picture?

The length of the ship is the count in each class.

When we look at each ship, we see the area taken up by the ship, instead of the length of the ship.

The ship display makes it look like most of the people on the Titanic were crew members, with a few passengers along for the ride.

Page 13: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 13

The Area Principle

The ship display violates the area principle:

The area occupied by the graph should correspond to the magnitude of the value it represents.

Page 14: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 14

Bar Charts

A bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison.

A bar chart stays true to the area principle.

Thus a better display for the ship data is:

0

200

400

600

800

1000

First Second Third Crew

Class

Fre

qu

ency

Page 15: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 15

Bar Charts (cont.)

A relative frequency bar chart displays the relative proportion of counts for each category.

A relative frequency bar chart also stays true to the area principle.

Replacing counts with percentages in the ship data:

0%

10%

20%

30%

40%

50%

First Second Third Crew

Class

Pro

po

rtio

n

Page 16: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 16

Pie Charts

When you are interested in parts of the whole, a pie chart might be your display of choice.

Pie charts show the whole group of cases as a circle.

They slice the circle into pieces whose size is proportional to the fraction of the whole in each category.

Page 17: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 17

Contingency Tables

A contingency table allows us to look at two categorical variables together.

It shows how individuals are distributed along each variable, contingent on the value of the other variable.

Example: we can examine the class of ticket and whether a person survived the Titanic:

Page 18: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 18

Contingency Tables (cont.)

The margins of the table, both on the right and on the bottom, give totals and the frequency distributions for each of the variables.

Each frequency distribution is called a marginal distribution of its respective variable.

Ex. The marginal distribution of Survival is:

Class FrequencyAlive 711Dead 1490Total 2201

Survival

Page 19: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 19

Contingency Tables (cont.)

Each cell of the table gives the count for a combination of values of the two values.

Page 20: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 20

Contingency Table (cont.)

How many crew members died when the Titanic sunk?

673 What percentage of passengers were crew

members and died when the Titanic sunk?

673 / 2201 ≈ 30.58% How many first class passengers survived?

203

Page 21: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 21

Conditional Distributions

A conditional distribution shows the distribution of one variable for just the individuals who satisfy some condition on another variable.

The conditional distribution of ticket Class, conditional on having survived:

Page 22: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 22

Conditional Distributions (cont.)

The following is the conditional distribution of ticket Class, conditional on having perished:

Page 23: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 23

Conditional Distributions (cont.) The conditional distributions tell us that there is a

difference in class for those who survived and those who perished.

This can be shown with pie charts of the two distributions:

Page 24: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 24

Conditional Distributions (cont.)

Better yet – use side bar charts

Conditional Distributions

0

100

200

300

400

500

600

700

800

First Second Third Crew

Class

Fre

qu

en

cie

s

Alive

Dead

Page 25: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 25

Conditional Distributions (cont.)

The variables would be considered independent when the distribution of one variable in a contingency table is the same for all categories of the other variable.

We see that the distribution of Class for the survivors is different from that of the non-survivors.

This leads us to believe that Class and Survival are associated, that they are not independent.

Page 26: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 26

Segmented Bar Charts

A segmented bar chart displays the same information as a pie chart, but in the form of bars instead of circles.

Here is the segmented bar chart for ticket Class by Survival status:

Page 27: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 27

Recall – Conditional distributions

Page 28: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 28

Answer to second question on slide 3-7

Was the percent of survivors higher in first class than in second class?The Who here is restricted to survivors!There were 203 survivors in first class.There were 118 survivors in second class.There were a total of 711 survivors in the ship.So, the percent of survivors in first class is

203 / 711 ≈ 28.6% and 118 / 711 ≈ 16.6% in second class.

Thus, the percent of survivors in first class were higher than in second class.

Page 29: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 29

Recall – Contingency table

Page 30: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 30

Change question:

What percent of passengers in first class survived? What percent of passengers second class survived?

There were 203 survivors in first class.There were 325 first class passengers.So, 203 / 325 ≈ 62.5% of passengers in first class survived.

There were 118 survivors in second class.There were 285 second class passengers.So, 118 / 285 ≈ 41.4% of passengers in second class survived.

Page 31: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 31

What Can Go Wrong?

Don’t violate While some people might like the pie chart on the left

better, it is harder to compare fractions of the whole, which a well-done pie chart does.

Page 32: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 32

What Can Go Wrong? (cont.)

This plot of the percentage of high-school students who engage in specified dangerous behaviors has a problem. Can you see it?

—make sure your display shows what it says it shows.

Page 33: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 33

What Can Go Wrong? (cont.)

Don’t confuse similar sounding percentages. Pay particular attention to the wording of the context.

Don’t forget to look at the variables separately too. Examine the marginal distributions, since it is important to know how many cases are in each category.

Be sure to use enough individuals! Do not make a report like “We found that 66.67% of the rats improved their performance with training. The other rat died.”

Page 34: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 34

What Can Go Wrong? (cont.)

Don’t overstate your case—don’t claim something you can’t.

Page 35: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 35

What have we learned?

We can summarize categorical data by counting the number of cases in each category (expressing these as counts or percents).

We can display the distribution in a bar chart or pie chart.

And, we can examine two-way tables called contingency tables, examining marginal and/or conditional distributions of the variables.

If conditional distributions of one variable are the same for every category of the other, the variables are independent.

Page 36: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 36

Exercise 3.25 - SeniorsPrior to graduation, a high school class was surveyed about its plans. The following table displays the results for white and minority students (included African American, Asian, Hispanic and Native American)

Plans White Minority4-year college 198 442-year college 36 6

Military 4 1Employment 14 3

Other 16 3

Page 37: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 37

Exercise 3.25 (cont.)

What percent of the graduates are white? What percent of the graduates are planning to

attend a 2-year college? What percent of the graduates are white and

planning to attend a 2-year college? What percent of the white graduates are

planning to attend a 2-year college? What percent of the graduates planning to

attend a 2-year college are white?

Page 38: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 38

Answers Exercise 3.25

First, get a table with marginal totals:

Plans White Minority Total4-year college 198 44 2422-year college 36 6 42

Military 4 1 5Employment 14 3 17

Other 16 3 19Total 268 57 325

Page 39: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 39

Answer Exercise 3.25 (cont.)

What percent of the graduates are white?

There are white graduates and total graduates.

≈ 82.5% of the graduates are white.

Page 40: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 40

Answer Exercise 3.25 (cont.)

What percent of the graduates are planning to attend a 2-year college?

There are 42 graduates

42 / 325 ≈ of the graduates are planning to attend a 2-year college.

Page 41: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 41

Answer Exercise 3.25 (cont.)

What percent of the graduates are white and planning to attend a 2-year college?

white graduates are planning to attend 2-year colleges.

36 / 325 ≈ of graduates are white and planning to attend a 2-year college.

Page 42: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 42

Answer Exercise 3.25 (cont.)

What percent of the white graduates are planning to attend a 2-year college?

white graduates are planning to attend 2-year colleges.

There are white graduates.

≈ 13.4% of white graduates are planning to attend a 2-year college.

Page 43: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 43

Answer Exercise 3.25 (cont.)

What percent of the graduates planning to attend a 2-year college are white?

graduates are planning to attend 2-year colleges.

white graduates are planning to attend 2-year colleges.

≈ 85.7% of the graduates planning to attend a 2-year college are white.

Page 44: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 44

Exercise 3.27 – Seniors

Find the conditional distributions (percentages) of plans for the white students.

Find the conditional distributions (percentages) of plans for the minority students.

Create a graph comparing the plans of white and minority students

Do you see any important differences in the post-graduation plans of white and minority students? Write a brief summary of what these data show, including comparisons of conditional distributions

Page 45: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 45

Answer Exercise 3.27 – Seniors

Find the conditional distributions (percentages) of plans for the white students.

Plans White Plans White4-year college 4-year college2-year college 2-year college

Military 4 Military 1.5%Employment 14 Employment 5.2%

Other 16 Other 6.0%Total 268 Total 100.0%

Page 46: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 46

Answer Exercise 3.27 – Seniors (cont.)

Find the conditional distributions (percentages) of plans for the minority students.

Plans Minority Plans Minority4-year college 4-year college2-year college 2-year college

Military 1 Military 1.8%Employment 3 Employment 5.3%

Other 3 Other 5.3%Total 57 Total 100.0%

Page 47: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 47

Answer Exercise 3.27 – Seniors (cont.)

Create a graph comparing the plans of white and minority students.

Plans White Minority4-year college 73.9% 77.2%2-year college 13.4% 10.5%

Military 1.5% 1.8%Employment 5.2% 5.3%

Other 6.0% 5.3%

Page 48: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 48

Answer Exercise 3.27 – Seniors (cont.)

Segmented bar chart:

0%

20%

40%

60%

80%

100%

White Minority

Other

Employment

Military

2-year college

4-year college

Page 49: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 49

Alternatively – side bar charts:

0.0%

20.0%

40.0%

60.0%

80.0%

100.0%

4-year college

2-year college

Military

EmploymentOther

White

Minority

Page 50: Chapter 3 Displaying and Describing Categorical Data

Slide 3- 50

Answer Exercise 3.27 – Seniors (cont.)

Do you see any important differences in the post-graduation plans of white and minority students? Write a brief summary of what these data show, including comparisons of conditional distributions

-

- Caution should be used with the percentages for Minority graduates, because the total is so small – each graduate is almost 2%

- These graphs show that race and plans for graduation are . There is little evidence of an association

between race and plans for graduation.