Chapter 2: Describing Data Using Graphs and Tables, Lecture PowerPoint Slides, Discovering Statistics, 2nd Edition, Daniel T. Larose

Upload: marybeth-kelley

Post on 26-Dec-2015


Chapter 2: Describing Data Using Graphs and Tables

Lecture PowerPoint Slides

Discovering Statistics

2nd Edition Daniel T. Larose


Chapter 2 Overview

2.1 Graphs and Tables for Categorical Data

2.2 Graphs and Tables for Quantitative Data

2.3 Further Graphs and Tables for Quantitative Data

2.4 Graphical Misrepresentations of Data


The Big Picture

Where we are coming from and where we are headed…

In Chapter 1 we learned the basic concepts of statistics, such as population, sample, and types of variables, along with methods of collecting data.

In Chapter 2 we learn about graphs and tables for summarizing qualitative and quantitative data, and we examine how to prevent our graphics from being misleading.

In Chapter 3, we will learn how to describe a data set using numerical measures (statistics) rather than graphs and tables.


2.1: Graphs and Tables for Categorical Data

Objectives:

Construct and interpret a frequency distribution and a relative frequency distribution for qualitative data.

Construct and interpret bar graphs and Pareto charts.

Construct and interpret pie charts.

Construct crosstabulations to describe the relationship between two variables.

Construct a clustered bar graph to describe the relationship between two variables.


Frequency Distributions

Raw data sets are often hard to interpret, so we need ways to summarize the values in a data set.

The frequency, or count, of a category is the number of observations in that category. A frequency distribution for a qualitative variable is a listing of all the values (e.g., categories) that the variable can take, together with the frequency of each value.


Relative Frequency Distributions

A frequency by itself is hard to interpret if you don’t know the size of the sample in the survey. Dividing the frequency by the total sample size gives us the relative frequency.

The relative frequency of a particular category of a qualitative variable is its frequency divided by the sample size. A relative frequency distribution for a qualitative variable is a listing of all values that the variable can take, together with the relative frequencies for each value.
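The two definitions above can be sketched in a few lines of Python. This is only an illustration: the `responses` list is hypothetical data, not taken from the textbook, and the function names are made up for this sketch.

```python
from collections import Counter

# Hypothetical survey responses (illustrative only, not from the slides).
responses = ["Sedan", "SUV", "Truck", "SUV", "Sedan", "SUV", "Coupe", "SUV"]


def frequency_distribution(values):
    """Return {category: count} for a qualitative variable."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Return {category: count / sample size}."""
    n = len(values)
    return {category: count / n for category, count in Counter(values).items()}


freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

# Print the two distributions side by side as a small table.
for category in freq:
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:.3f}")
```

The relative frequencies always sum to 1 (up to rounding), which is a quick check on the arithmetic.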


Bar Graphs (Bar Charts)

Frequency distributions and relative frequency distributions are tabular. The graphical equivalent of these distributions is called a bar graph.

A bar graph (or bar chart) is used to represent the frequencies or relative frequencies for categorical data. It is constructed as follows.

1. On the horizontal axis, provide a label for each category.

2. Draw rectangles (bars) of equal width for each category. The height of each rectangle represents the frequency or relative frequency for that category. Ensure that the bars are not touching each other.


Pareto Charts

The bars in a bar graph may be presented horizontally or vertically.

A Pareto chart is a bar graph in which the rectangles are arranged in decreasing order of frequency (bar height) from left to right.
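As a rough sketch, the Pareto ordering amounts to sorting the categories by decreasing frequency before drawing the bars. The frequencies below are hypothetical, and the text bars are drawn horizontally (so the ordering runs top to bottom) only to avoid depending on a plotting library.

```python
def pareto_order(freq):
    """Return (category, frequency) pairs sorted by decreasing frequency."""
    return sorted(freq.items(), key=lambda item: item[1], reverse=True)


# Hypothetical frequency distribution (illustrative only).
freq = {"Sedan": 2, "SUV": 4, "Truck": 1, "Coupe": 1}

ordered = pareto_order(freq)

# Crude horizontal bar graph: one '#' per observation.
for category, count in ordered:
    print(f"{category:<6} {'#' * count} ({count})")
```

Because Python's sort is stable, categories with equal frequencies keep their original relative order.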


Pie Charts

Pie charts are a common graphical device for displaying the relative frequencies of a categorical variable.

A pie chart is a circle divided into sections, with each section representing a particular category. The size of the section is proportional to the relative frequency of the category.
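Since a full circle is 360 degrees, "proportional to the relative frequency" means each section's central angle is the relative frequency times 360. A minimal sketch, using hypothetical frequencies and a made-up helper name:

```python
def pie_angles(freq):
    """Return {category: central angle in degrees} for a pie chart."""
    total = sum(freq.values())
    return {category: 360 * count / total for category, count in freq.items()}


# Hypothetical frequency distribution (illustrative only).
freq = {"Sedan": 2, "SUV": 4, "Truck": 1, "Coupe": 1}

angles = pie_angles(freq)
for category, angle in angles.items():
    print(f"{category:<6} {angle:6.1f} degrees")
```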


Crosstabulations

Crosstabulation is a tabular method for simultaneously summarizing the data for two categorical variables.

Steps for Constructing a Crosstabulation

1. Put the categories of one variable at the top of each column, and the categories of the other variable at the beginning of each row.

2. For each row and column combination, enter the number of observations that fall in the two categories.

3. The bottom of the table gives the column totals, and the right-hand column gives the row totals.

Crosstabulations are also known as two-way tables or contingency tables.

                         Emotion
Gender  Sadness  Fear  Anger  Disbelief  Vulnerability  Not sure  Total
Female       94    21     87         80             28         4    314
Male         56    16    141         50             36         5    304
Total       150    37    228        130             64         9    618
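Step 3 (the row and column totals) can be checked with a short sketch. The cell counts are the ones in the emotion-by-gender table above; the variable names are just illustrative.

```python
# Cell counts from the crosstabulation, one list per row (gender).
emotions = ["Sadness", "Fear", "Anger", "Disbelief", "Vulnerability", "Not sure"]
counts = {
    "Female": [94, 21, 87, 80, 28, 4],
    "Male": [56, 16, 141, 50, 36, 5],
}

# Row totals go in the right-hand column of the table.
row_totals = {gender: sum(row) for gender, row in counts.items()}

# Column totals go in the bottom row; zip(*rows) walks the table column by column.
column_totals = [sum(column) for column in zip(*counts.values())]

# The grand total can be reached from either margin.
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(emotions, column_totals)))
print(grand_total)
```

Both margins add up to the same grand total, which is a useful consistency check when building a crosstabulation by hand.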


Clustered Bar Graphs

Clustered bar graphs are useful for comparing two categorical variables and are often used in conjunction with crosstabulations.



2.2: Graphs and Tables for Quantitative Data

Objectives:

Construct and interpret a frequency distribution and a relative frequency distribution for discrete and continuous data.

Use histograms and frequency polygons to summarize quantitative data.

Construct and interpret stem-and-leaf displays and dotplots.

Recognize distribution shape, symmetry, and skewness.


Frequency Distributions and Relative Frequency Distributions

Section 2.1 introduced tables and graphs for summarizing qualitative data. Most of the data sets we will encounter are quantitative. We can apply frequency and relative frequency distributions to quantitative data just as we did for qualitative data.

Consider Table 2.13 on page 54.


Classes

We can combine several ages together into “classes,” in order to produce a more concise distribution. Classes represent a range of data values and are used to group the elements in a data set.


Class Limits

We use the following to construct frequency distributions and histograms.

The lower class limit of a class equals the smallest value within that class.

The upper class limit of a class equals the largest value within that class.

The class width equals the difference between the lower class limits of two successive classes.

The class boundary of two successive classes is found by taking the sum of the upper class limit of a class and the lower class limit of the class to its right, and dividing the sum by two.

The lower class boundary of the left-most class equals its upper class boundary minus the class width.

The upper class boundary of the right-most class equals its lower class boundary plus the class width.

To construct a frequency distribution for continuous data:

1. Choose the number of classes.

2. Determine the class width.

3. Find the upper and lower class limits.

4. Calculate the class boundaries.

5. Find the frequencies of each class.
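The class width and class boundary definitions above can be turned into a small calculation. This is a sketch under assumptions: the class limits 75–79, 80–84, and so on are hypothetical, chosen only to illustrate the rules, and the function names are invented for this example.

```python
def class_width(lower_limits):
    """Difference between the lower class limits of two successive classes."""
    return lower_limits[1] - lower_limits[0]


def class_boundaries(lower_limits, upper_limits):
    """Return all class boundaries, from left-most to right-most."""
    width = class_width(lower_limits)
    # Boundary between neighbours: (upper limit + next lower limit) / 2.
    interior = [
        (upper + next_lower) / 2
        for upper, next_lower in zip(upper_limits[:-1], lower_limits[1:])
    ]
    # Left-most lower boundary = its upper boundary minus the class width.
    first = interior[0] - width
    # Right-most upper boundary = its lower boundary plus the class width.
    last = interior[-1] + width
    return [first] + interior + [last]


# Hypothetical class limits (illustrative only).
lower_limits = [75, 80, 85, 90, 95]
upper_limits = [79, 84, 89, 94, 99]

width = class_width(lower_limits)
boundaries = class_boundaries(lower_limits, upper_limits)
print(width)
print(boundaries)
```

With these limits the width is 5 and the boundaries fall halfway between adjacent classes, which is what lets histogram rectangles touch.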


Histograms

One example of a graphical summary for quantitative data is a histogram.

A histogram is constructed using rectangles for each class of data.

The heights of the rectangles represent the frequencies or relative frequencies of the class.

The widths of the rectangles represent the class widths of the corresponding distribution.

The class boundaries are placed on the horizontal axis, so that the rectangles are touching each other.

To construct a histogram:

1. Find the class limits and draw the horizontal axis.

2. Determine the frequencies and draw the vertical axis.

3. Draw the rectangles.


Histograms

Twenty management students, in preparation for graduation, took a course to prepare them for a management aptitude test.

A simulated test provided the following scores:

77 89 84 83 80 80 83 82 85 92
87 88 87 86 99 93 79 83 81 78
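To see how these scores feed a histogram, the sketch below tallies them into classes and prints a rough text version of the rectangles. The five classes (75–79 through 95–99) are an assumption made for illustration; the slide itself does not state which classes were used.

```python
# The 20 simulated test scores from the slide.
scores = [77, 89, 84, 83, 80, 80, 83, 82, 85, 92,
          87, 88, 87, 86, 99, 93, 79, 83, 81, 78]

# Assumed class limits (lower, upper); not given on the slide.
classes = [(75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]

# Frequency of a class = number of scores between its limits, inclusive.
frequencies = [
    sum(lower <= score <= upper for score in scores)
    for lower, upper in classes
]

# Crude sideways histogram: one '*' per observation in the class.
for (lower, upper), frequency in zip(classes, frequencies):
    print(f"{lower}-{upper}  {'*' * frequency}  ({frequency})")
```

A different choice of classes would give different bar heights, so the class limits are part of what the histogram communicates.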


Frequency Polygons

Frequency polygons provide the same information as histograms, but in a slightly different format.

A frequency polygon is constructed as follows:

1. For each class, plot a point at the class midpoint, at a height equal to the frequency for that class.

2. Join each consecutive pair of points with a line segment.
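The two steps can be sketched as follows. The class limits and frequencies are illustrative values assumed for this example, and the midpoint is taken as the average of the lower and upper class limits.

```python
# Assumed classes (lower limit, upper limit) and their frequencies.
classes = [(75, 79), (80, 84), (85, 89), (90, 94), (95, 99)]
frequencies = [3, 8, 6, 2, 1]

# Step 1: one point per class at (class midpoint, frequency).
points = [
    ((lower + upper) / 2, frequency)
    for (lower, upper), frequency in zip(classes, frequencies)
]

# Step 2: join each consecutive pair of points with a line segment.
segments = list(zip(points, points[1:]))

print(points)
for start, end in segments:
    print(f"{start} -> {end}")
```

Five classes give five points and therefore four connecting segments.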


Stem-and-Leaf Displays

Stem-and-leaf displays contain more information than frequency distributions and histograms.

Consider the final-exam scores of 20 psychology students below:

75 81 82 70 60 59 94 77 68 98
86 68 85 72 70 91 78 86 51 67

Find the leading digits of the numbers. Place these five numbers, called the stems, in a column:

Now consider the ones place of each data value. Place this number, called the leaf, next to its stem.

5 | 9 1
6 | 0 8 8 7
7 | 5 0 7 2 0 8
8 | 1 2 6 5 6
9 | 4 8 1
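A stem-and-leaf display for two-digit data can be sketched by splitting each value into its tens digit (stem) and ones digit (leaf). The function below is an illustration, not the textbook's procedure; in particular, it sorts the leaves within each stem, which is a common convention but an assumption here.

```python
from collections import defaultdict

# Final-exam scores of the 20 psychology students from the slide.
scores = [75, 81, 82, 70, 60, 59, 94, 77, 68, 98,
          86, 68, 85, 72, 70, 91, 78, 86, 51, 67]


def stem_and_leaf(values):
    """Return {stem: sorted list of leaves} for two-digit integers."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)  # tens digit, ones digit
        display[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


display = stem_and_leaf(scores)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Unlike a histogram, the display keeps every original data value recoverable from its stem and leaf.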


Dotplots

A simple but effective graphical display is a dotplot. In a dotplot, each data point is represented by a dot above the number line.

Below is a dotplot of the 20 management aptitude test scores.

Dotplots are useful for comparing two variables. Suppose an instructor taught two sections of a management course and gave a simulated MAT exam in each section. The two groups could be compared using dotplots.


Distribution Shape

Frequency distributions are tabular summaries of the set of values that a variable takes.

The distribution of a variable is a table, graph, or formula that identifies the variable values and frequencies for all elements in the data set.

The shape of a distribution is the overall form of a graphical summary, approximated by a smooth curve.

A distribution is symmetric if there is a line (axis of symmetry) that splits the image in half so that one side is the mirror image of the other.

A distribution is skewed if it has a longer “tail” on one side of the image.


Distribution Shape

Symmetric, bell-shaped

Right-skewed

Left-skewed


2.3: Further Graphs and Tables for Quantitative Data

Objectives:

Build cumulative frequency distributions and cumulative relative frequency distributions.

Create frequency ogives and relative frequency ogives.

Construct and interpret time series graphs.


Cumulative Frequency Distributions

Since quantitative data can be put in ascending order, we can keep track of the accumulated counts at or below a certain value using a cumulative frequency distribution or cumulative relative frequency distribution.

For a discrete variable, a cumulative frequency distribution shows the total number of observations less than or equal to the category value.

For a continuous variable, a cumulative frequency distribution shows the total number of observations less than or equal to the upper class limit.

A cumulative relative frequency distribution shows the proportion of observations less than or equal to the category value (for a discrete variable) or the proportion of observations less than or equal to the upper class limit (for a continuous variable).
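Both cumulative distributions are running totals of the ordinary frequencies. A minimal sketch, using hypothetical class limits and frequencies (not data from the textbook):

```python
from itertools import accumulate

# Hypothetical upper class limits and class frequencies (illustrative only).
upper_limits = [79, 84, 89, 94, 99]
frequencies = [3, 8, 6, 2, 1]

# Cumulative frequency: observations less than or equal to each upper class limit.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: the same counts as a proportion of the sample.
n = cumulative[-1]
cumulative_relative = [count / n for count in cumulative]

for limit, count, proportion in zip(upper_limits, cumulative, cumulative_relative):
    print(f"<= {limit}: {count:>2}  {proportion:.2f}")
```

The last cumulative frequency equals the sample size, and the last cumulative relative frequency equals 1.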


Cumulative Frequency Distributions

The frequency distribution below displays the total 2007 attendance for 25 Major League Baseball teams. We can use this to construct a cumulative relative frequency distribution.


Ogives

Histograms and frequency polygons are the graphical equivalent of frequency distributions. Ogives are the graphical equivalent of cumulative frequency distributions.

An ogive (pronounced “oh jive”) is the graphical equivalent of a cumulative frequency distribution or a cumulative relative frequency distribution.

Like a frequency polygon, an ogive consists of a set of plotted points connected by line segments.

The x coordinates of these points are the upper class limits; the y coordinates are the cumulative frequencies or cumulative relative frequencies.
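The points of an ogive can be listed directly from a cumulative frequency distribution. In this sketch the upper class limits and cumulative frequencies are hypothetical, and `ogive_points` is a made-up helper name.

```python
def ogive_points(upper_limits, cumulative_frequencies, relative=False):
    """Return the (x, y) points of a frequency or relative frequency ogive."""
    total = cumulative_frequencies[-1]
    if relative:
        ys = [count / total for count in cumulative_frequencies]
    else:
        ys = list(cumulative_frequencies)
    # x = upper class limit, y = cumulative (relative) frequency.
    return list(zip(upper_limits, ys))


# Hypothetical cumulative frequency distribution (illustrative only).
upper_limits = [79, 84, 89, 94, 99]
cumulative_frequencies = [3, 11, 17, 19, 20]

frequency_ogive = ogive_points(upper_limits, cumulative_frequencies)
relative_ogive = ogive_points(upper_limits, cumulative_frequencies, relative=True)
print(frequency_ogive)
print(relative_ogive)
```

Because cumulative frequencies never decrease, the plotted line never slopes downward from left to right.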


Time Series Graphs

Data analysts are often interested in how the value of a variable changes over time. Data that are analyzed with respect to time are called time series data.

A graph of time series data is called a time series plot.

The horizontal axis of a time series plot represents time (e.g., hours, days, months, years).

The values of the time series data are plotted on the vertical axis, and line segments are drawn to connect the points.

Figure: time series plot of atmospheric CO2 at Mauna Loa.


2.4: Graphical Misrepresentations of Data

Objectives:

Understand what can make a graph misleading, confusing, or deceptive.

In the Information Age, when our world is awash in data, it is important for citizens to understand how graphics may be misleading, confusing, or deceptive. Such an understanding enhances our statistical literacy and makes us less prone to be deceived by misleading graphics.


Making a Graph Misleading

Eight Common Methods for Making a Graph Misleading

1. Graphing/selecting an inappropriate statistic.

2. Omitting the zero on the relevant scale.

3. Manipulating the scale.

4. Using two dimensions (area) to emphasize a one-dimensional difference.

5. Careless combination of categories in a bar graph.

6. Inaccuracy in relative lengths of bars in a bar graph.

7. Biased distortion or embellishment.

8. Unclear labeling.


Making a Graph Misleading

Example 2.19 Inappropriate choice of statistic


Making a Graph Misleading

Example 2.20 Omitting the zero

MediaMatters.com reported that CNN.com used a misleading graph to exaggerate the difference between the percentages of Democrats and Republicans who agreed with the Florida court’s decision to remove the feeding tube from Terri Schiavo in 2005.
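To see why omitting the zero exaggerates a difference, compare the ratio of the drawn bar heights with the ratio of the actual values. The percentages and the baseline of 50 below are hypothetical, chosen for illustration; they are not claimed to be the figures from the graphic described above.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights when the vertical axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)


# Hypothetical percentages (illustrative only).
a, b = 62.0, 54.0

true_ratio = apparent_ratio(a, b)                      # axis starts at zero
truncated_ratio = apparent_ratio(a, b, baseline=50.0)  # axis starts at 50

print(f"true ratio:      {true_ratio:.2f}")
print(f"truncated ratio: {truncated_ratio:.2f}")
```

With the axis starting at 50, one bar is drawn three times as tall as the other even though the underlying values differ by only about 15 percent.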


Making a Graph Misleading

Example 2.21 Manipulating the scale

This figure shows a Minitab relative frequency bar graph of the majors chosen by 25 business school students.

If we wanted to de-emphasize the differences, we could extend the vertical scale up to its maximum, 1.0 = 100%.


Making a Graph Misleading

Example 2.22 Using two dimensions for a one-dimensional difference

This graphic compares the leaders in career points scored in the NBA All-Star Game among players active in 2007.

The height of the players is supposed to represent the total points, but this is not clearly labeled. Points should be indicated using a vertical axis, but there is no vertical axis at all.
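The distortion in method 4 comes from simple geometry: when a picture is scaled uniformly so that its height matches the data, its width grows too, so its area grows with the square of the ratio. A small sketch with hypothetical point totals:

```python
def area_ratio(height_ratio):
    """Area ratio of two uniformly scaled pictures whose heights differ by `height_ratio`."""
    return height_ratio ** 2


# Hypothetical career point totals (illustrative only).
points_a, points_b = 200, 100

height_ratio = points_a / points_b
print(height_ratio)              # what the data say
print(area_ratio(height_ratio))  # what the eye may perceive
```

Here a twofold difference in the data is drawn as a fourfold difference in area.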


Making a Graph Misleading

Example 2.23 Careless combination of categories and biased embellishment

This figure shows a graphic of how often people have observed drivers running red lights.


Making a Graph Misleading

Example 2.24 Inaccuracy in relative lengths of bars and unclear labeling

This figure is a horizontal bar graph of the three teams with the most World Series victories in baseball history.

Note that 127 is more than twice as many as 52, and so the Yankees’ bar should be more than twice as long as the Cardinals’ bar, which it is not.
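The check behind this observation is a single ratio: in an accurate bar graph, bar lengths are proportional to the values they represent. In the sketch below, the reference length of 5 units is an arbitrary assumption.

```python
def expected_length(value, reference_value, reference_length):
    """Length a bar should have, given another bar drawn to the same scale."""
    return reference_length * value / reference_value


# Values quoted on the slide: 127 (Yankees) and 52 (Cardinals).
ratio = 127 / 52
print(round(ratio, 2))

# If the Cardinals' bar were 5 units long (an arbitrary choice),
# the Yankees' bar should be about this long:
yankees_length = expected_length(127, 52, 5.0)
print(round(yankees_length, 2))
```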


Making a Graph Misleading

Example 2.25 Presenting the same data set as symmetric and skewed

The table below displays scores on the TIMSS Science test, administered to eighth-grade students in different countries.
