
Source: rpdp.net/admin/images/uploads/resource_2723.pdf · 2014-08-20

Math 7, Unit 07: Collecting, Displaying and Analyzing Data Holt: Chapter 7 Page 1 of 30 Revised 2013 CCSS

Math 7 Notes – Unit 07 Collecting, Displaying and Analyzing Data

Syllabus Objective: (5.1) The student will formulate questions that guide the collection of data.

In order to achieve this objective, students must first understand what a “statistical question” is and what it is not. Simply put, a statistical question is one that can be answered (in a variety of ways) numerically. A statistical question anticipates an answer that varies from one individual to another; in doing so, the responses to a statistical question result in a set of data. Data are the numbers produced in response to a statistical question. If the answers to a question are not expected to vary, then the question is not statistical. For example, a student asking themselves, “How old am I?” is not asking a statistical question; the answer is predictable.

Example 1: Students should begin by simply “recognizing” statistical questions. Teachers should present a group of both statistical and non-statistical questions to students. Student teams or cooperative groups are allowed time to decide whether each question is statistical or non-statistical. Then team responses should be shared classroom wide.

Questions - Statistical or not?

a. How many pets does each of your teachers own?

b. How old is the oldest member of a household?

c. What is your classmate’s favorite flavor of ice cream?

d. How many times does a sixth grader eat (on average) each day?

e. What lunch item is served every day on “Pizza Day”?

Questions a, b, and d are statistical; they can be answered numerically, and the answers will vary. Questions c and e are non-statistical. Question c is not a numerically statistical question because students cannot respond to it with a number. Question e is not numerically statistical because the answer would be, predictably, pizza; additionally, the answer is not numerical.

Example 2: Teachers should provide students with slides of questions (either via PowerPoint or a SMART Board) to enhance student ability to identify “statistical” questions. Students will respond individually to whether the question is statistical or not with a thumbs up or thumbs down. If the question is statistical, students will be instructed to put their thumb up; if not, then they will put their thumb down. After students have the opportunity to respond, teachers should expand on WHY the questions are either statistical or not.

Possible questions:

At what age did your classmates begin riding a 2 wheel bicycle?


What are the favorite colors of the students in the class?

Whom do you admire most?

What time do you wake up on school days?

What time does your school begin?

How many pairs of shoes do you currently own?

Which pair of shoes is your favorite?

How many pets do your classmates own?

How many siblings do your peers have?

How much do the students in our math class weigh?

Next, teachers should allow students (or student groups) the opportunity to formulate both statistical and non-statistical questions. Teachers should instruct students (or student teams) to divide their notebook paper into two columns (one labeled “statistical”, one labeled “non-statistical”). Individuals (or teams) should be given time to brainstorm examples of statistical and non-statistical questions. Student examples should be shared orally with the class, with explanations provided by the students.

STUDENT REFLECTION: What makes a question numerically “statistical”? Use the word “variability” in your response. Provide an example (with possible responses) to support your view.

Syllabus Objective: (5.2) The student will organize data using a variety of displays with and without technology.

Now that students understand what statistical questions are, they need to be given opportunities to gather data and make graphical representations of their data. It is necessary to show students how to create the various types of graphical representations and to discuss their benefits and limitations.

To begin working with data, we must first understand the two types of data we will encounter in this unit - categorical data and numerical data. Categorical data consists of names, labels or other non-numerical values such as movie preferences, types of animals, types of advertising, colors of medals given to winners of sporting events, etc. Categorical data is usually displayed in circle graphs and bar graphs. Numerical data consists of numbers such as weights of animals, lengths of rivers, heights of waterfalls, rainfall in a particular city, etc. Numerical data is displayed in histograms, line graphs, scatter plots, stem-and-leaf plots, and box-and-whisker plots.


As students begin to work with data displays, it is important that they choose an appropriate display for the data set. Below is a quick summary of each of the displays students will encounter in this unit.

Visual Display: Usage

bar graph: to compare categorical data

box-and-whisker plot: to organize numerical data into four groups of approximately equal size

circle graph: to represent categorical data as parts of a whole

histogram: to compare frequencies of numerical data that fall in equal intervals. Appropriate for displaying large sets of data or data sets with a large range. Strength is that it allows data to be manipulated by varying the intervals; gives an overall picture of the data by intervals without highlighting specific pieces of data within the set.

line graph: to display numerical data that change over time

scatter plot: to see trends in paired numerical data

stem-and-leaf plot: to organize numerical data based on their digits (example key: 4 | 2 = 42)

frequency table: to organize numerical data according to the number of times the item occurs (example: intervals 0-9, 10-19, 20-29, 30-39 with frequencies 15, 7, 1, 6)

dot plot (line plot): to display numerical data. Good to use for small to moderate sets of data with small to moderate range. Strength is that it is easy to create and highlights the distribution, including clusters, gaps, and outliers.

Bar Graph

The purpose of a bar graph is to display and compare categorical data. The bars all have the same width, but the height of each bar is proportional to the size of the data it represents.

Using the data below about the population of California from 1930 to 1970, let’s construct a bar graph. We will begin by selecting a data unit of convenient size to represent the population. Since the numbers are quite large, I choose to use a scale from 0 to 25 million, in increments of five million for my vertical scale. Looking at my data I will have to estimate the length of my bars by rounding/approximating the data.

The years will be written along the horizontal axis, or bottom of our graph, so I will need to determine the width of each bar and how much spacing I will leave before and after each bar. I may decide to make each space one unit wide and each bar two units wide, or the spacing might be equal. You may choose.

I need to label the horizontal axis, vertical axis, and create a title for my graph. Once my basic graph is set up on my paper, I can then fill in the population for each year. Looking at the table, the population for 1930 is 5,677,000 so I will make my bar extend just a bit beyond the 5 million mark. For 1940, I will make my bar extend to about the 7 million mark. Continue filling in the lengths of each bar for the data given in the table. The completed graph is shown on the next page.

Year    Population of California
1930    5,677,000
1940    6,907,000
1950    10,586,000
1960    15,717,000
1970    19,953,000

[Figure: vertical bar graph “California Populations”. Horizontal axis: Year (1930, 1940, 1950, 1960, 1970). Vertical axis: Number of people, from 0 to 25,000,000 in steps of 5,000,000.]
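The rounding step described above can be sketched in a few lines of Python. This is only an illustration, assuming we round each population to the nearest million and draw one "#" per million; the function names are made up for this sketch and are not part of the lesson.

```python
# Population of California, copied from the table in these notes.
populations = {
    1930: 5_677_000,
    1940: 6_907_000,
    1950: 10_586_000,
    1960: 15_717_000,
    1970: 19_953_000,
}


def bar_height_in_millions(population):
    """Round a population to the nearest million (an assumed rounding rule)."""
    return round(population / 1_000_000)


def text_bar_graph(data):
    """Return one text line per year, using one '#' per million people."""
    lines = []
    for year, population in data.items():
        height = bar_height_in_millions(population)
        lines.append(f"{year} | {'#' * height} about {height} million")
    return lines


for line in text_bar_graph(populations):
    print(line)
```

The printed bars are turned sideways compared with the vertical graph, but the idea is the same: every bar has the same thickness and only its length changes with the data.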


The bar graph above is called a vertical bar graph. The graph enables you to make direct comparisons of the data at a glance. It is easier to determine the 10-year period in which the population had the greatest increase from the bar graph than from the table or a paragraph.
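To check the "greatest increase" reading against the table, the differences between consecutive census years can be computed directly. The sketch below assumes the five table values from these notes; the function name is hypothetical.

```python
# Population of California, copied from the table in these notes.
populations = {
    1930: 5_677_000,
    1940: 6_907_000,
    1950: 10_586_000,
    1960: 15_717_000,
    1970: 19_953_000,
}


def greatest_increase(data):
    """Return (start_year, end_year, increase) for the largest jump
    between consecutive years in the table."""
    years = sorted(data)
    best = None
    for start, end in zip(years, years[1:]):
        change = data[end] - data[start]
        if best is None or change > best[2]:
            best = (start, end, change)
    return best


start, end, change = greatest_increase(populations)
print(f"Greatest increase: {start} to {end}, {change:,} people")
```

On the bar graph, this is the pair of neighboring bars with the biggest step up in height.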

Below is an example of a horizontal bar graph. The direction of the bars is your choice.

Notice that from both graphs you can estimate the data, but you cannot read the data exactly.

[Figure: horizontal bar graph “Speed of Sound at Zero Degrees Centigrade”. Vertical axis: Substances (carbon dioxide, air, alcohol, water). Horizontal axis: Feet per Second, from 0 to 6000.]


Box-and-Whisker Plot

Note that box-and-whisker plots are tested in CCSD but not at the state level on the CRT’s for 7th grade.

The purpose of a box-and-whisker plot is to organize numerical data into four groups of approximately equal size. Let’s take a look at what might be a way of giving notes to your students to help them learn the steps for creating a box-and-whisker plot. As you demonstrate the example in the third column, students are working with you taking the notes. Column 1 allows for guided, group or independent practice once they have an idea how to make a box-and-whisker plot.

Making a Box & Whisker Plot

You try: 10, 12, 8, 14, 16, 16, 11, 13, 11, 15, 8

Example: 5, 10, 7, 9, 8, 6, 11

Steps:

1. Arrange data in increasing order (minimum value & maximum value are the endpoints).

5 6 7 8 9 10 11

2. Find median of the entire list (median value).

5 6 7 8 9 10 11

a. If there is a number in the list that is the middle term, circle it and draw a line thru it. (median)

8 is the median

5 6 7 8 9 10 11

b. If there is not a number that is in the middle, draw a line between the two numbers. (Median is the number halfway between the two numbers.)

Does not apply to this problem


3. Look at the bottom half of the numbers. Find the median of the bottom half of numbers (lower quartile, same as #2 above).

5 6 7

4. Look at the top half of the numbers. Find the median of the upper half of numbers (upper quartile, same as #2 above).

9 10 11

5. Draw a number line that will cover the range of data (evenly spaced marks).

6. Slightly above the number line place dots at the following points: minimum, lower quartile, median, upper quartile, and maximum.

7. Draw a box with side borders being the lower and upper quartiles. Draw two lines, one from each side of the box, connecting the minimum point on one side and the maximum point on the other side. Draw a vertical line at the median point from the top to the bottom of the box.

The box spans the interquartile range (IQR); it contains the middle 50% of the data.

[Figure: the box-and-whisker plot built up in steps 5 through 7 on a number line marked 4, 6, 8, 10, 12, 14, 16, with the minimum, lower quartile, median, upper quartile, and maximum labeled.]
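Steps 1 through 4 can be sketched in Python as a check on hand work. This is a minimal sketch that follows the method in these notes (when the list has an odd number of values, the median is left out of both halves); other textbooks and calculators may use slightly different quartile rules. The function names are made up for illustration, and the data set is assumed to have at least two values.

```python
def median(sorted_values):
    """Median of a list that is already in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]  # a middle term exists
    # no middle term: halfway between the two middle numbers
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    values = sorted(data)                       # step 1
    n = len(values)
    half = n // 2
    lower = values[:half]                       # bottom half
    upper = values[half + 1:] if n % 2 == 1 else values[half:]  # top half
    return (values[0], median(lower), median(values), median(upper), values[-1])


print(five_number_summary([5, 10, 7, 9, 8, 6, 11]))
```

For the worked example this gives a minimum of 5, lower quartile 6, median 8, upper quartile 10, and maximum 11, which are the five points plotted in step 6.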


The whisker on the left represents the bottom quartile, the bottom 25%; the whisker on the right represents the top 25%.

The difference between the upper and lower quartiles is called the “interquartile range” (IQR).

A value that is extremely large or small compared with the rest of the data is called an “outlier”. An outlier is commonly defined as any value of the data that lies more than 1.5 IQR units below the lower quartile or more than 1.5 IQR units above the upper quartile.

As we get more in depth with these plots we will find that an extreme value is commonly defined as any value of the data that lies more than 3 IQR units below the lower quartile or more than 3 IQR units above the upper quartile.

In our example the lower quartile was at 6, the upper at 10.

Using that, the IQR = 10 – 6 = 4. Multiplying that by 1.5, we have 1.5(4) = 6.

Therefore, any score below 6 – 6 = 0 is an outlier, as is any score above 10 + 6 = 16. There are no points below 0, so we are OK on the left. There are no points greater than 16, so we are OK on the right.

This would be an ideal place to use technology. Next are the instructions for drawing a box-and-whisker plot on the TI-84. The examples will address outliers.

Entering Data and Drawing a Box-and-Whisker plot on the TI-84

1. STAT
2. EDIT
3. Enter the numbers in List 1 (L1) or List 2 (L2)
4. Return to the home screen
5. STAT PLOT
6. Turn Stat Plot 1 on and select the type of boxplot (modified or regular)
7. ZOOM
8. ZoomStat (9)
9. GRAPH

[Figure: TI-84 screens showing the modified and regular boxplot settings and the resulting modified and regular plots.]


The TI-84 graphing calculator may indicate whether a box-and-whisker plot includes outliers. One setting on the graphing calculator gives the regular box-and-whisker plot which uses all numbers, so the furthest outliers are shown as being the endpoints of the whiskers.

Another calculator setting (modified) gives the box-and-whisker plot with the outliers specially marked (in this case, with a simulation of an open dot), and the whiskers going only as far as the highest and lowest values that aren't outliers.

Find the outliers and extreme values, if any, for the following data set, and draw the box-and-whisker plot. Mark any outliers with an asterisk and any extreme values with an open dot.

20, 21, 21, 23, 23, 24, 25, 25, 26, 27, 29, 33, 40

To find the outliers and extreme values, I first have to find the IQR. Since there are thirteen values in the list, the median is the seventh value, so Q2 = 25. The first half of the list is 20, 21, 21, 23, 23, 24, so Q1 = 22; the second half is 25, 26, 27, 29, 33, 40 so Q3 = 28. Then IQR = 28 – 22 = 6.

The outliers will be any values below 22 – 1.5×6 = 22 – 9 = 13 or above 28 + 1.5×6 = 28 + 9 = 37. The extreme values will be those below 22 – 3×6 = 22 – 18 = 4 or above 28 + 3×6 = 28 + 18 = 46. So 40 is an outlier, and there are no extreme values.

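Another example: L2 = 21, 23, 24, 25, 29, 33, 49

Here the median is 25, Q1 = 23 and Q3 = 33, so IQR = 33 – 23 = 10. The outliers will be any values below 23 – 1.5×10 = 8 or above 33 + 1.5×10 = 48, and the extreme values will be those below 23 – 3×10 = –7 or above 33 + 3×10 = 63.

So I have an outlier at 49 but no extreme values, so I won't have a top whisker because Q3 is also the highest non-outlier, and my plot looks like this:

[Figure: modified box-and-whisker plot with no top whisker and the outlier 49 marked separately.]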
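The outlier and extreme-value checks in the two examples can be sketched in Python. This is an illustration only: it assumes the quartile method used in these notes (median left out of both halves when the count is odd) and the 1.5 IQR and 3 IQR cutoffs, and it reports a value beyond the 3 IQR cutoff as extreme rather than as an ordinary outlier. All function names are hypothetical.

```python
def median(sorted_values):
    """Median of a list that is already in increasing order."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q3) as the medians of the bottom and top halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return median(lower), median(upper)


def outliers_and_extremes(data):
    """Split unusual values into (outliers, extreme values)."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    extremes = [v for v in data if v < q1 - 3 * iqr or v > q3 + 3 * iqr]
    outliers = [
        v for v in data
        if (v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr) and v not in extremes
    ]
    return outliers, extremes


first = [20, 21, 21, 23, 23, 24, 25, 25, 26, 27, 29, 33, 40]
second = [21, 23, 24, 25, 29, 33, 49]
print(outliers_and_extremes(first))
print(outliers_and_extremes(second))
```

Both data sets from the text come back with a single outlier (40 and 49 respectively) and no extreme values, matching the hand calculations.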

An interesting reason to study box-and-whisker plots is when we begin to compare two or more plots and examine the data. Following are several examples to highlight this concept.


Example: Given below are boxplots displaying the annual temperatures for two cities, Seattle and Boston. What information and generalizations can you see in the plots?

Some information you should note during your discussion with students could include, but is not limited to:

• The median temperature for Seattle and Boston is essentially the same.
• Boston’s data is more spread out.
• Boston’s range is greater, so its temperature fluctuates more than Seattle’s.
• Boston’s temperature range is wider than Seattle’s.
• Boston has greater high temperatures and lower low temperatures.
• On more than 25% of the days in Boston, the low temperatures are below Seattle’s lowest temperature.
• On 1/4 of the days, the high temperatures in Boston are higher than Seattle’s highest temperature.
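The comparisons in the list above can be backed up with a little arithmetic on the five-number summaries. The sketch below assumes the values read from the two plots are 29, 36.5, 51.5, 66.5, 74 for Boston and 41, 45, 52, 61, 66 for Seattle; the assignment of each set of numbers to a city is an assumption based on the statement that Boston's data is more spread out. The function name is made up for illustration.

```python
# (minimum, lower quartile, median, upper quartile, maximum)
# Which summary belongs to which city is an assumption (see the lead-in).
summaries = {
    "Boston": (29, 36.5, 51.5, 66.5, 74),
    "Seattle": (41, 45, 52, 61, 66),
}


def spread(summary):
    """Return the range and interquartile range of a five-number summary."""
    minimum, q1, _median, q3, maximum = summary
    return {"range": maximum - minimum, "iqr": q3 - q1}


for city, summary in summaries.items():
    print(city, "median:", summary[2], spread(summary))

# Boston's lower quartile is below Seattle's minimum, so at least a
# quarter of Boston's temperatures are colder than Seattle's coldest.
print(summaries["Boston"][1] < summaries["Seattle"][0])
```

Under these assumptions the medians differ by only half a degree, while Boston's range and IQR are both noticeably larger than Seattle's.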

Example: Look at the three boxplots below. Even without a scale, what can you say about the 3 temperature plots in general?

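[Figure: three boxplots labeled A, B, and C, drawn without a scale.]

[Figure for the Seattle and Boston example: parallel boxplots on a scale from 20 to 80. The five values marked on one plot are 29, 36.5, 51.5, 66.5, 74 and on the other 41, 45, 52, 61, 66; the plots are labeled Seattle and Boston.]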


General discussions might include:

Plot A
• This city has a lot of cold days (in comparison to the other two).
• ¼ of the days the temperatures are very close (in the lower quartile).
• The number of days of lowest temperatures appear to have a great range.
• The number of days of temperatures in the upper quartile have a great range.

Plot B
• This city has a lot of hot days (in comparison to the other two).
• The variability of cold temperatures is great.
• The variability of high temperatures for ½ the year is small.

Plot C
• This city has real extremes - outliers.
• Without the outliers, this city has the smallest range of temperatures.

Looking back at the boxplots, if you were told one of the graphs represents temperatures in Las Vegas and another represents temperatures in Hawaii, which graphs might represent them? Let students explain why they would choose one plot over another.

In several Take It To the Mat articles, the issue of box-and-whiskers is addressed. Below is an excerpt from one of the articles. For the complete articles go to www.rpdp.net > Math > Take It To The Mat > HS Edition > Data Analysis and Probability > October 2003, November 2003 or December 2003.

The impression of Las Vegas residents is that October 2003 was unusually warm. Half of the days had high temperatures of 89º F or more and fully three-fourths of days were at or above 84º. The coolest high temperature was 63º, but after looking at the raw data that seems like an anomaly when compared to the rest. Was it really that warm in October 2003? How did it compare with the year before?

One of the powers of boxplots is to answer questions about comparing distributions. In this case, we want to compare the highs in October 2003 with those from October 2002. The five-number summary for October 2002 is {59, 74, 80, 84, 92}. Parallel boxplots for both Octobers 2002 and 2003 are shown at right.

[Figure: parallel boxplots labeled 2002 and 2003. Axis: High Temperature (deg F), from 50 to 110.]

The boxplots clearly show that October 2003 was warmer, on the whole, than October 2002. The median high temperature in October 2003 is 9 degrees warmer than in October 2002. The coolest three-fourths of high temperatures in 2002 were below the first quartile in 2003. The middle halves of each data set don’t even overlap!

We know that October 2003 was much warmer than October 2002, but how does it compare with what is considered “normal?” (Normal is the average daily high temperature since 1937.) Consider the parallel boxplots at right and draw your own conclusions.

Useful suggestions for ways to connect these vocabulary words and concepts with your students may include:

Box and Whiskers Song By: Karl Spendlove

Sung to the tune of “Oh My Darling, Clementine”

Put in order
Find the median
Find the median of the top
Find the median of the bottom
Then you draw the whisker plot

Chorus:
Box and whiskers
Box and whiskers
Put the data to the test
Boxes show the middle 50
And the whiskers show the rest.

[Figure for the October comparison with “normal”: parallel boxplots labeled Norm, 2002, and 2003. Axis: High Temperature (deg F).]


Circle Graph

A circle graph is used to represent categorical data for comparing parts of a whole. Each circle graph represents the total of the information you have collected or 100% of the data.

Example: The seventh grade class voted to decide where to have their year-end picnic. The results were as follows: Mountain Park, 62 votes; State Beach, 96 votes; City Zoo, 82 votes.

To begin, we need to know the total number of votes, the fractional part of votes for each choice and the number of degrees of the circle for each of the choices. Once this information is computed, we section off the circle using a protractor.

Total votes = 62 + 96 + 82 = 240

Mountain Park = 62/240 × 360° = 93°

State Beach = 96/240 × 360° = 144°

City Zoo = 82/240 × 360° = 123°

Putting the finishing touches on our graph includes labeling each section of the circle graph, coloring the sections and creating a title. The finished product is shown below.

[Figure: circle graph “Student Votes on End-of-Year Trip” with sections labeled Park, Beach, and Zoo.]
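The degree calculation for each section of the circle can be sketched in Python as a way for students to check their protractor work. The vote counts come from the example; the function name is made up for this illustration.

```python
# Votes from the year-end picnic example in these notes.
votes = {"Mountain Park": 62, "State Beach": 96, "City Zoo": 82}


def sector_degrees(counts):
    """Return the central angle, in degrees, for each category."""
    total = sum(counts.values())
    # fraction of the whole times 360 degrees
    return {name: count * 360 / total for name, count in counts.items()}


angles = sector_degrees(votes)
for name, angle in angles.items():
    print(f"{name}: {angle} degrees")
print("Total:", sum(angles.values()), "degrees")
```

A useful habit is to add the angles at the end: they should total 360 degrees, just as the fractions should total 1.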

Bar graphs and circle graphs are both used to display categorical information. The two graphs below display the same information but look entirely different. Students must choose an appropriate data display. If the numerical data (in this case the number of boxes of cookies sold on a particular day of the week) is important, then students should create the bar graph. If the comparison of parts to whole is important, then a circle graph would be the best choice.

[Figures: a bar graph and a circle graph, both titled “Boxes of Cookies Sold”. Bar graph horizontal axis: Day of the Week (Thur, Fri, Sat, Sun); vertical axis: Number of Boxes, from 0 to 600. The circle graph has one section for each day.]

Some test questions relate to converting bar graphs to circle graphs, or vice versa. Be sure to teach this skill to your students.

Dot Plots (Line Plots)

A dot plot (also known as a line plot) is the most basic type of graph for representing data. It gives students a good visual representation of the shape of the data, where the center might be, and the data’s variability or spread. However, it is not very useful when there are a lot of data values because it can be cumbersome to create.
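A dot plot is simple enough to draw with text, which can help students see how each X stands for one data value. The sketch below is only an illustration: the scores are made up (they are not the class data from these notes), the function name is hypothetical, and the layout assumes single-digit whole-number values.

```python
from collections import Counter


def dot_plot(data):
    """Return a text dot plot: one X per data value, stacked above
    a number line running from the smallest to the largest value."""
    counts = Counter(data)
    low, high = min(data), max(data)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):  # build from the top row down
        row = " ".join(
            "X" if counts[value] >= level else " "
            for value in range(low, high + 1)
        )
        rows.append(row.rstrip())
    rows.append(" ".join(str(value) for value in range(low, high + 1)))
    return "\n".join(rows)


sample_scores = [1, 2, 2, 3, 3, 3, 5]  # made-up scores for illustration
print(dot_plot(sample_scores))
```

In the printed plot the tallest stack shows where the data cluster, and the empty column above 4 shows a gap.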

Dot Plot A shows the writing rubric scores of students based upon “organization”. Dot Plot B shows the writing rubric scores of the same group of 30 students based upon “ideas”.


Dot Plot A – Organization

Showing the two dot plots vertically allows students to make direct comparisons in regard to the center of the graph, its spread, and its overall shape.

Even though students have not been formally introduced to the mathematical definitions of mean, median, mode, and range, the center and spread of the data and the overall picture can be discussed informally.

Questions teachers might ask include:

Where is the overall center of the data in graph A versus graph B?

How is the shape of the graph different in graph B? What do you think causes this difference?

Where are the scores “clustered” in graph A? Graph B?

In which graph is data more spread out? Support your answer.

Where is the actual middle of the set of data for A and B?

[Figure: Dot Plot A and Dot Plot B – Ideas, shown one above the other; each is drawn with X marks above a number line from 1 to 5.]


Example 1: Students were asked their shoe size. The data was then collected and a dot plot was produced (with guidance from the teacher). An example dot plot is seen below.

[Figure: dot plot “Shoe sizes for 6th Graders”, drawn with X marks above a number line from 4 to 12.]

Discussion questions:

How many students responded to the question about their shoe size?

How would you describe the “shape” of the graph? What does that shape indicate?

Locate and describe the “center” of the data. That center is like the _______________?

Most students have a shoe size around 6, 7, or 8; the data is “clustered” there. This means that if I were to calculate the measures of central tendency………

The data is not very spread out. Why do you think that is so? What does this tell us about the measures of variability?

STUDENT REFLECTION: What does the shape of the scores have to do with the sameness or differences in their values? Explain. Does the range of the data affect its shape?

Frequency Table and Histogram

The frequency of a data value is the number of times it occurs in a data set. How can you tell the frequency of a data value by looking at a dot plot?

It is sometimes more convenient to show data that has been divided into intervals than to display individual data values. A histogram is a type of bar graph whose bars represent the frequencies of data within intervals.

Example: Make a histogram for the data given: 12, 3, 8, 1, 1, 6, 10, 14, 3, 6, 2, 1, 3, 2, 7


First, make a frequency table:

Interval    Tally       Frequency (How many data values are in this interval?)
1-4         ||||||||    8
5-8         ||||        4
9-12        ||          2
13-16       |           1
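The tallying in the frequency table can be sketched in Python. This is a minimal illustration using the data and intervals from the example; the function name is made up.

```python
# Data and intervals from the example in these notes.
values = [12, 3, 8, 1, 1, 6, 10, 14, 3, 6, 2, 1, 3, 2, 7]
intervals = [(1, 4), (5, 8), (9, 12), (13, 16)]


def frequency_table(data, intervals):
    """Count how many data values fall in each interval (ends included)."""
    return {
        (low, high): sum(1 for value in data if low <= value <= high)
        for low, high in intervals
    }


table = frequency_table(values, intervals)
for (low, high), frequency in table.items():
    print(f"{low}-{high}: {'|' * frequency} {frequency}")
```

The frequencies should add up to the number of data values (15 here), which is a quick check that nothing was skipped or counted twice.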

[Figure: histogram “Age of Children at the Park”. Horizontal axis: Age Groups (1-4, 5-8, 9-12, 13-16). Vertical axis: # of children, from 0 to 9.]

A histogram is made up of adjoining vertical rectangles or bars. In a histogram the horizontal axis typically identifies the topic of the graph and the vertical axis describes the frequency of those observations. If we turned a stem-and-leaf plot 90 degrees counterclockwise and made the rectangles as high as the leaf portion, we would have a histogram. A histogram looks like a bar graph, except the rectangles are connected.

One problem you might encounter on a histogram is when data falls on the line that divides two rectangles. In which rectangle do you count the data? Another problem is the width of the rectangles: how wide do you want them?

Both of these problems are easily overcome and we will address them as we construct the graph. Let’s use the following test scores to construct a histogram.

82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81, 81, 82, 82

To determine the width, first find the range, which is the difference between the largest score and the smallest. Listed in order, the scores are:

70, 72, 75, 76, 76, 80, 81, 81, 82, 82, 82, 83, 84, 88 and 97

Using the data from the example we have: $97 - 70 = 27$

If you wanted three categories, you divide 27 by three; then each width would be about nine. If you wanted four categories, you’d divide 27 by 4; then the width would be a little bigger than 6. It’s your decision.

That takes care of the width problem. Now what about if something falls on a line that separates the rectangles? Do we count it in the left or right rectangle? Well, we just won’t let that happen. We’ll expand the range by one half—then no score can fall on a line. Don’t you just love how easy that was to take care of?

So, I’m deciding to have four groups, the width is a little more than 6—I’ll say seven. And I’m going to begin at 69.5 rather than 70. That should result in all my data falling within a rectangle.

Let’s see what it looks like.

[Figure: histogram "Test Scores"; horizontal axis intervals: 69.5-76.4, 76.5-83.4, 83.5-90.4, 90.5-97.4; vertical axis: frequency, 0 to 8]
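The width and boundary decisions described above can be checked with a short Python sketch. It is an illustration only: the variable names are made up here, and the boundaries are computed as 69.5, 76.5, 83.5, 90.5 and 97.5 (the graph labels the same intervals as 69.5-76.4, 76.5-83.4, and so on).

```python
scores = [82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81, 81, 82, 82]

score_range = max(scores) - min(scores)   # 97 - 70 = 27
groups = 4                                # the choice made in the notes
width = 7                                 # 27 / 4 is a little more than 6, so use 7
start = min(scores) - 0.5                 # begin at 69.5 so no score sits on a boundary

# Boundaries of the four rectangles.
edges = [start + i * width for i in range(groups + 1)]

# Because every boundary ends in .5, a whole-number score is always strictly inside one interval.
counts = [sum(1 for s in scores if low < s < high)
          for low, high in zip(edges, edges[1:])]

for low, high, count in zip(edges, edges[1:], counts):
    print(f"{low}-{high}: {count}")
```

The four counts are the bar heights, and they add up to the 15 scores, so every score falls inside exactly one rectangle.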

Line Graph

[Figure: line graph "Temperatures in Davenport, Iowa"; horizontal axis: Month (Jan. through Dec.); vertical axis: Average High Temperature (in Celsius), −5 to 35]

The purpose of a line graph is to display numerical data that change over time. To create a line graph for the data given in the table below, we evenly space the months along the horizontal axis and place the temperatures along the vertical axis. Since the data range from −1 to 30, I chose increments of five for the vertical axis. Next, plot the data for each month and connect the points in order. As usual, we label both axes and create a title for the graph.

Temperatures in Davenport, Iowa

Month    Average High Temp (°C)
Jan.     −1
Feb.     1
Mar.     9
Apr.     17
May      23
June     28
July     30
Aug.     29
Sept.    25
Oct.     19
Nov.     9
Dec.     1
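One way to choose the vertical-axis marks for this table is sketched below in Python. This is an assumption-laden illustration rather than the method in the notes: the helper `axis_ticks` is hypothetical, and it simply rounds the smallest and largest values outward to multiples of the chosen increment. A printed graph may extend the axis a little further for headroom.

```python
import math

# Average high temperatures (°C) from the table, in month order.
temps = {"Jan.": -1, "Feb.": 1, "Mar.": 9, "Apr.": 17, "May": 23, "June": 28,
         "July": 30, "Aug.": 29, "Sept.": 25, "Oct.": 19, "Nov.": 9, "Dec.": 1}

def axis_ticks(values, step):
    """Tick marks in multiples of `step` that cover every value (hypothetical helper)."""
    values = list(values)
    low = math.floor(min(values) / step) * step    # round the minimum down
    high = math.ceil(max(values) / step) * step    # round the maximum up
    return list(range(low, high + step, step))

ticks = axis_ticks(temps.values(), 5)
print(ticks)

# Points to plot and connect in order: (month, temperature).
points = list(temps.items())
```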

The double line graph (to the right) allows us to quickly make comparisons between the data given for Key West and Juneau. Note that the legend given on the graph helps to identify which information relates to which city.

[Figure: double line graph "Record Highs in Juneau Compared to Record Lows in Key West"; horizontal axis: September (days 1 to 8); vertical axis: Temperature (in Fahrenheit), 66 to 74; legend: Key West, Juneau]

Scatter Plot

The purpose of a scatter plot is to see trends in paired numerical data. As you examine scatter plots, you may notice that the data points show a trend, or correlation. There are three types of relationships that students need to be able to identify.

[Figure: three scatter plots labeled Positive correlation, Negative correlation, and No correlation]

Let’s look at some data relating the age and diameter of some Red Maple trees. Using the data in the list to the right, we begin setting up a coordinate plane.

I let my x-axis represent the age of the tree. I set my intervals in increments of 5, from 0 to 60 years.

The y-axis will represent the diameter of the tree in increments of 2, from 0 to 12 inches.

Next, plot each piece of data as an ordered pair (x, y) or in this case, (age, diameter).

To complete the graph, students must label the axes and create a title for the graph.

The completed graph is displayed below.

Age (years)    Diameter (in.)
20             4.8
20             5.5
23             4.2
23             5.1
25             5.8
25             8
30             6.2
31             5.7
32             7.8
32             8.7
35             5.9
35             7.7
40             8.2
40             8.9
40             9.2
44             7
44             8.7
45             7
50             7.8
50             11

Positive correlation: In general, as the values of one set of data increase, the values of the other set also increase.

Negative correlation: In general, as the values of one set of data increase, the values of the other set decrease.

No correlation: The values of one set of data are not related to the values of the other set.
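For readers who want a numerical check on the Red Maple data listed above, the sketch below computes the Pearson correlation coefficient in Python. This statistic goes beyond what these notes ask of students, and the cutoff of 0.3 used to call a result "no correlation" is an arbitrary assumption for illustration. The function names are hypothetical.

```python
import math

ages = [20, 20, 23, 23, 25, 25, 30, 31, 32, 32,
        35, 35, 40, 40, 40, 44, 44, 45, 50, 50]
diameters = [4.8, 5.5, 4.2, 5.1, 5.8, 8, 6.2, 5.7, 7.8, 8.7,
             5.9, 7.7, 8.2, 8.9, 9.2, 7, 8.7, 7, 7.8, 11]

def correlation(xs, ys):
    """Pearson correlation coefficient, a value between -1 and 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, threshold=0.3):
    """Label the trend; the threshold is an assumed cutoff, not a standard."""
    if r > threshold:
        return "positive correlation"
    if r < -threshold:
        return "negative correlation"
    return "no correlation"

r = correlation(ages, diameters)
print(round(r, 2), describe(r))
```

For this data the coefficient comes out at roughly 0.7, which agrees with the upward trend visible in the scatter plot: older trees tend to have larger diameters.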

[Figure: scatter plot "Age and Diameter of Red Maple Trees"; horizontal axis: Age (Years), 0 to 60; vertical axis: Diameter (in.), 0 to 12]

In addition to creating the graph, students need to analyze it. Discussions should cover the type of correlation shown; why a given age on the graph can have different y-values; extracting data from the graph; estimating data for trees aged 1 to 19 years or older than 50; and approximating the diameter of a tree at ages not given in the data. Generalizations may be made and discussed.

In the next graph for example, is there a correlation between the grade a student receives and the number of hours they study? What kind of conclusions or generalizations can be made about the grades students received and the hours they studied for the class?

[Figure: scatter plot "Grades and Hours of Study"; horizontal axis: Study Hours, 0 to 7; vertical axis: Student Grades, 0 to 100, with letter grades A, B, C, D and F marked]

Stem-and-Leaf Plots

The purpose of a stem-and-leaf plot is to organize numerical data based on their digits.

Let’s use the following test scores to construct a stem-and-leaf plot.

82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81, 81, 82, 82

We first determine how the stems will be defined. In our case, the stem will represent the tens column in the scores, and the leaf will represent the ones column.

When we present our information, it will be in two parts, the stem and the leaf. Let’s say I had this: 5 | 7 4. The way I would read that is by knowing the stem represents fifty, and the leaf has two scores, 7 and 4. Reading that information, I have a 57 and a 54.

Knowing that, let’s arrange our data in a stem-and-leaf plot.

Knowing our lowest score is in the 70’s and the highest is in the 90’s, our stem will consist of 7, 8, and 9. Usually, the smaller stems are placed on top.

You can make the decision for yourself. Another decision you can make is whether or not you put the scores in order in the leaf portion. As you can see, I didn’t.

7 | 0 2 5 6 6

8 | 2 3 4 8 0 1 1 2 2

9 | 7

Notice that the leaf part of the plot did not have to be in any particular order. A person reading this plot would know the scores are 70, 72, 75, 76, 76, 82, 83, 84, 88, 80, 81, 81, 82, 82 and 97.

NOTE: If one plans to use the stem-and-leaf plot to find the median or mode of a data set, it is important to put the information in order from least to greatest.
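The construction can be sketched in Python as follows. This is only an illustration, with a hypothetical function name; it sorts the scores first, so the leaves come out ordered from least to greatest as the note above recommends.

```python
from collections import defaultdict

scores = [82, 97, 70, 72, 83, 75, 76, 84, 76, 88, 80, 81, 81, 82, 82]

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and list the ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(value, 10)    # 82 -> stem 8, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

With ordered leaves, the median can be read by counting to the middle leaf (the 8th of 15 scores), and the mode is the leaf repeated most often within a stem.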


Interpreting Data

Syllabus Objective: (5.3) The student will analyze data represented in a variety of formats with and without technology.

Syllabus Objective: (5.4) The student will make predictions from a set of data using interpolation and extrapolation.

Syllabus Objective: (5.5) The student will explain predictions from a set of data using interpolation and extrapolation.

Syllabus Objective: (5.1) The student will formulate questions that guide the collection of data.

When looking at results, you must consider several questions like: Is the data display misleading? Are the sets of data related, or not? Are there any trends in the data? What could have happened if…? What did happen between the lines or data given? Can you make any generalizations about the data given? Are the conclusions supported by the data?

Example: A survey of 300 randomly selected cat owners finds that 120 cat owners prefer Brand C cat food. Predict how many owners in a town of 2000 cat owners prefer Brand C cat food.

The percent of cat owners in the sample that preferred Brand C is $\frac{120}{300} = 40\%$; $40\%$ of $2000$ cat owners $= 800$ cat owners.
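The same prediction can be written as a small Python sketch. The function name `predict_count` is made up for this illustration; it scales the sample result up to the whole population, assuming the random sample is representative.

```python
def predict_count(sample_yes, sample_size, population):
    """Scale a sample proportion up to a population (assumes a representative sample)."""
    return sample_yes * population / sample_size

percent = 100 * 120 / 300                     # percent of the sample preferring Brand C
predicted = predict_count(120, 300, 2000)     # expected Brand C owners in the town
print(percent, predicted)
```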

Example 1: Which of the following graphs most accurately depicts the hourly wages earned with respect to time worked?

[Figure: four graphs labeled A, B, C and D, each plotting Earnings (vertical axis) against Hours Worked (horizontal axis)]

Discussion: Students should recognize that if one is paid by the hour, then only Graph C could illustrate it. A person would not earn money at zero hours as shown in Graph A. Looking at the other graphs, you could rule out B since a person would not earn the same amount for different numbers of hours worked. Graph D shows a person earning less money as the person works more hours.

Example 2: The graphs below show the numbers of baskets made by Player A and Player B during 5 basketball practices. Each player takes 100 practice shots during each practice. According to the graphs, who was more successful at making baskets?

Discussion: In this problem, some students will think Player B scored the most baskets because of the steepness of the line segments. Other students will think Player A scored the most baskets because his line segments look consistently higher. They need to look carefully at the vertical scaling and note that the graph representing Player A started at 70 and ended at 80, and the graph representing Player B also started at 70 and ended at 80. What differs is that the vertical scale for Player A begins at 0 and the vertical scale for Player B begins at 68. Thus the second graph misleads us at first glance. (Answer is C.)

A. Player A did much better.

B. Player B did much better.

C. Their scores appear to be about the same.

D. More information is needed.

[Figure: two line graphs titled "Player A" and "Player B"; horizontal axis: Practice Session (1 to 5); vertical axis: # of Baskets]

Example 3: Why is this graph misleading?

[Figure: pictograph "Season Attendance (in thousands)"; balls of different sizes show attendance for two sports in 1990 and 2000; vertical axis: # of spectators (thousands), 0 to 50]

The heights of the balls are used to represent the number of spectators. However, the area of the balls distorts the comparison.

The attendance for both sports doubled in the 10 year period; the size of the basketball makes it look like that increase may have been a lot more.

Example 4: Why is this graph misleading?

The two bar graphs shown here picture the advertised gasoline mileage for four cars.

[Figure: two bar graphs of Miles per Gallon by Car Models (Lynx, Cheetah, Sphinx, Panther); the left graph's vertical scale runs from 0 to 20 and the right graph's from 14 to 18]

The graph on the right distorts the differences in mileage because its left-hand scale does not start with 0. In reading a graph, always check to see that numbered scales start with 0.

When you analyze data you are trying to determine:
• if there is a correlation between the two sets of data
• if the correlation is weak or strong
• if the correlation is positive or negative.

Example: I have heard that people who drive red cars get more tickets than others. Is this statement true?

We could analyze two sets of data. We could find the average number of tickets all drivers get and the average number of tickets red-car drivers get. Then we could analyze the data to see if there is a correlation, what kind of correlation there is and how strong the correlation is (or is not). Looking at the data we should see if red-car drivers do or do not get more tickets than anyone else. If red-car drivers get more tickets, do they get many more or only a few more? Plotting data in a scatter plot would be one way to figure out the correlation. The closer the points come to forming a straight, slanted line, the stronger the correlation is. If the line is horizontal or vertical, that means that no matter what happens to one variable, the other never changes. This makes it hard to determine correlation.

Example: The owner of a store wants to cut down on the merchandise he carries in his store. He would begin by determining which products sell best. He could identify whether sales for particular products are increasing, decreasing or staying the same. One way to identify trends like this is to plot data in graphs.

Below is an example comparing sales for hardcover and paperback books. What do the trends show? If the owner needed to reduce his stock, should he discontinue buying hardcover or paperback books? Why or why not? What else might he need to know?

[Figure: double line graph "Books Sold"; horizontal axis: Year (2003 to 2009); vertical axis: Books Sold (in millions), 0 to 1400; legend: Hardcover, Paperback]

[Figure: graph "Rainfall During Shower"; horizontal axis: Length of Rain Shower (in hours), 0 to 7; vertical axis: Total Rainfall (in inches), 0 to 4]

[Figure: graph "Survey Results"; horizontal axis: Price (in dollars), $0.00 to $2.50; vertical axis: Number of Buyers (out of 50), 0 to 50]

Sometimes we need to make predictions by extending a graph beyond the range of the data you already have. This is called extrapolation.

Example: Let’s say you create a new candy bar and need to know how much to charge for a candy bar before they are shipped from your manufacturing company. You might survey people to see how many of them would buy it at various prices. You would then use that data to predict how many people would buy that candy bar at various prices, so you can determine a reasonable price point. Suppose you were to survey people and got the results as shown below.

Price    # of Buyers
$0.75    46
$1.00    43
$1.25    38
$1.50    35
$1.75    30
$2.00    24
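One possible way to extrapolate from the survey table is to fit a straight line to the points and extend it past the last surveyed price. The Python sketch below uses a least-squares line, which is a technique chosen for this illustration and not one named in the notes; the price of $2.25 is a hypothetical value picked because it lies beyond the surveyed range.

```python
prices = [0.75, 1.00, 1.25, 1.50, 1.75, 2.00]
buyers = [46, 43, 38, 35, 30, 24]

def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(prices, buyers)

def predict_buyers(price):
    """Estimated number of buyers (out of 50) at a given price."""
    return slope * price + intercept

estimate = predict_buyers(2.25)    # beyond the highest surveyed price of $2.00
print(round(estimate, 1))
```

The fitted line slopes downward, and at $2.25 it predicts about 21 buyers out of 50. As with any extrapolation, the estimate becomes less trustworthy the further the price moves from the surveyed range.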

Sometimes you use a graph or relation to find an unknown value between data points you already know. This is called interpolation.

Example: Use the data to estimate how many inches of rain had fallen by the end of the first $2\frac{1}{2}$ hours of the shower.

As the graph shows, the points are close to a line that shows the number of inches of rainfall increasing by about $\frac{1}{2}$ an inch per hour. Based on these data, you would expect that about 1.25 inches fell in $2\frac{1}{2}$ hours.
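Interpolation between two known points can be sketched in Python as shown below. The graph's actual data points are not listed in these notes, so the two points used here, 1.0 inch at 2 hours and 1.5 inches at 3 hours, are assumptions chosen to match the stated rate of about half an inch per hour.

```python
def interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Assumed neighboring points: (hours, inches of rain).
rain_at_2_5_hours = interpolate(2, 1.0, 3, 1.5, 2.5)
print(rain_at_2_5_hours)
```

With those assumed points the estimate is 1.25 inches, the same value obtained by multiplying half an inch per hour by 2.5 hours.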

Measures of Central Tendency

Syllabus Objective: (5.6) The student will find measures of variability including range, distribution, and possible outcomes.

Statistical averages used to summarize data and represent the “middle” of a data set are called the measures of central tendency.

3 Measures of Central Tendency include:
1. Mean
2. Median
3. Mode

The mean is the one you are probably most familiar with; it’s the one often used in school for grades. To find the mean, you simply add all the values in a set and divide by the number of values. In other words, if you had a 70, 80, and 90 on three tests, you’d add those scores and divide by three. The mean is 80. Your average is 80.

The median, often used in finance, is the middle score when the data is listed in either ascending or descending order. If there is no middle score, then you take the two middle scores, add them and divide by 2 (find the average of the two scores).

Example 1: Find the median of 72, 65, 93, 85, and 55.

Rewriting in order, I have 55, 65, 72, 85 and 93. The middle score is 72, so the median is 72.

Example 2: Find the median of 11, 15, 8, 17, 9, and 14.

Rewriting in order, I have 8, 9, 11, 14, 15, and 17. Since there are 6 pieces of data, the median is the average of the third and fourth. So,

$\frac{11 + 14}{2} = \frac{25}{2} = 12\frac{1}{2}$

$12\frac{1}{2}$ is the median.
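The mean and median calculations above can be sketched in Python as follows. The functions are written out by hand for illustration so each step matches the description in the notes.

```python
def mean(values):
    """Add all the values and divide by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # odd count: one middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # even count: average the middle pair

print(mean([70, 80, 90]))                 # the three test scores
print(median([72, 65, 93, 85, 55]))       # Example 1
print(median([11, 15, 8, 17, 9, 14]))     # Example 2
```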


The mode is the piece of information that appears most frequently. You’ve used the mode quite often before. If you have ever described the average weight of a particular population, the average height, the average shoe size, the average shirt size, the average number of points scored in a particular type of game—those are all examples of you using the mode.

Note: a distribution may have no mode, one mode or more than one mode.

Example 1: Find the mode of 55, 64, 64, 76, 78, 81, 81, 81, and 92.

What score appears most often? The mode is 81.

Example 2: Find the mode of 8, 9, 11, 14, 15, and 17.

What score appears most often? None of these, so there is no mode.

Example 3: Find the mode of 17, 15, 15, 14, 14, 11.

What score appears most often? Both 15 and 14 appear twice, so the modes are 15 and 14.
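The three mode examples can be checked with a short Python sketch. The function name `modes` is hypothetical, and it follows the convention used in these notes: when no value repeats, the data set has no mode.

```python
from collections import Counter

def modes(values):
    """Return every value that appears most often, or [] when nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                      # every value appears once: no mode
        return []
    return [value for value, count in counts.items() if count == highest]

print(modes([55, 64, 64, 76, 78, 81, 81, 81, 92]))   # Example 1
print(modes([8, 9, 11, 14, 15, 17]))                 # Example 2
print(modes([17, 15, 15, 14, 14, 11]))               # Example 3
```

Returning a list covers all three cases in the note above: no mode, one mode, or more than one mode.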

An outlier – an extreme value that is much smaller than or much larger than the rest of the data in a data set – can greatly affect the mean. It is important to note “which” measure of central tendency is “most” useful to use.

In addition to the measures of central tendency just discussed, sometimes we need to see if the data we have are close or spread out. Sometimes knowing the range can help you decide whether the differences among your data are important.

The range of a set of data is the difference between the greatest and the least values. Using the data 72, 65, 93, 85, and 55, we might first arrange the data in order from least to greatest: 55, 65, 72, 85 and 93. (Note: We could have chosen to arrange the data from greatest to least.) Such an arrangement is called a distribution of data. To find the range, we subtract 55, the least value, from 93, the greatest value: $93 - 55 = 38$. The range is 38.

Measure    Most useful when
mean       the data are spread fairly evenly
median     the data set has an outlier
mode       the data involve a subject in which many data points of one value are important, such as election results

Useful suggestions for ways to connect these vocabulary words and concepts with your students may include:

Mode Song

(to the tune of “Row, Row, Row Your Boat”)

Mode, mode, mode the most

Average is the mean,

Median, median, median, median is

always in between.

Median: It might help students connect to this term if you connect the idea of “middle” to the cement abutments in our roadways and on freeways that separate northbound and southbound traffic. Another approach is to point out the height of the letter “d” in the middle of the word.

Range: It might help students connect to this term if you give examples such as: A singer who can sing very high and very low notes is said to have a wide range. A shortstop who can cover a lot of ground has a large range.