
Welcome Back

Our fourth issue of The Mathematical Bridge focuses on Statistics across Stage 3 and 4 Mathematics. As the Statistics and Probability strand is quite large and contains a number of different and interrelated concepts, we have chosen to separate the two for the purposes of our newsletter so we can focus on each one and the complexities it involves. Please note that as the continuum of learning develops from Stage 2 into Stages 3 and 4, the investigations undertaken in Chance are recorded and analysed as part of Data in table and graph form. As these concepts develop, more links and connections between them can be seen and explored. Our fifth edition, to be released later in Term 4, will focus specifically on Probability. We hope you find these resources useful, and we welcome any feedback and/or suggestions.

Teaching focus for Stage 3: Data collection, interpretation and evaluation

In Stage 3, ‘Students need to be provided with opportunities to discuss what information can be drawn from various data displays. Advantages and disadvantages of different representations of the same data should be explicitly taught.’ (Background Knowledge, Mathematics K–10 Syllabus)

Along with developing their skills in collecting data and drawing graphs, students need to begin to think critically about data. They need to be able to make informed choices about the data they collect, communicate their findings through the way the data is represented, and justify those choices. Our new syllabus has ‘upped the ante’ with regard to the language used to name and classify data; this provides a clear progression to the way data is handled in Stage 4.

Something new for us in primary is the use of dot plots; there are some activities in our newsletter to support the teaching of dot plots. They are not a difficult concept to teach and are an extension of the picture graph and, to some extent, the column graph.

Students will now be learning about the differences between categorical and numerical data and how the type of data influences the choice of data display. Students learn to compare two sets of categorical data from a two-way table or side-by-side column graphs.

Note that column graphs are sometimes referred to as ‘bar’ graphs; these are different from divided bar graphs, which represent percentages.

Syllabus content Pedagogy Teaching ideas

Issue 4, October 2014

PUBLIC SCHOOLS NSW – LEARNING AND LEADERSHIP DIRECTORATE ISSUE OCTOBER 2014

The term ‘variable’ also appears in Stage 3 in the new syllabus and is used to describe data. This term is used extensively in Stage 4 and beyond, and it is useful for us to model it to students. As it is not in the language section of the syllabus, students can continue to refer to the information collected as ‘data’. ‘Variable’ is a very descriptive word and will help students understand that information can change over time, can be different from trial to trial, and can depend on a number of factors. Students can also use side-by-side column graphs to compare primary data (data they have collected themselves) and secondary data (data from another class or group). With the general capabilities embedded in our syllabus content, we can be assured that as we teach Data to our students, they are developing their critical and creative thinking skills. Many of the verbs used to describe how students will learn about Data identify the specific skills we are trying to develop in our students:

- identify the relationship
- tabulate collected data
- recognise the most appropriate type of representation for data
- compare the effectiveness of data
- explain which display they are using
- describe conclusions
- identify sources of bias
- identify misleading representations
- give valid reasons
- use appropriate methods

One area where these critical and creative thinking skills can be applied is data in the media. The following information is an excerpt from AAMT’s Top Drawer Teachers resource: http://topdrawer.aamt.edu.au/Statistics/Misunderstandings/Belief-in-the-media

Belief in the media

Students encounter claims from various media every day, including Facebook, websites, radio, television and newspapers. Some claims are based on a sample of size one, that is, a single instance of some event, or on sensational headlines. One of the general capabilities described in the curriculum is critical and creative thinking. Among the organising elements of this capability are:

- Inquiring – identifying, exploring and clarifying information

- Analysing, synthesising and evaluating information. (Source: ACARA)

These skills are needed for students to engage critically with reports in the media, whether those reports are about current affairs or about commercial products available for purchase.

Related to the general capability for critical thinking is the Framework for critical statistical literacy. This is useful for encountering media claims and includes:

- identifying statistical terminology used
- exploring and understanding the use of the terminology in the context
- evaluating and criticising claims made without proper justification.

There are some fantastic lesson resources that accompany this article, specifically designed for Years 4–10.

Data is also an application of many mathematical skills and understandings; there are links to Whole Numbers, Addition and Subtraction, Patterns and Algebra, Measurement, Position and Chance. It also has links to other key learning areas such as Science and Technology K–6, where results and data are collected and communicated as part of any investigation or design project.

Stage 3 and Stage 4 teachers should note that divided bar graphs are no longer in the Stage 3 Mathematics syllabus and have now moved into the Stage 4 Mathematics syllabus. However, they do appear in the Stage 3 Science and Technology syllabus as an application.

Cars and Jellybeans

Remember, it is wonderful to introduce students to the world of data through collecting information about favourite colours, favourite food and the number of cars that pass your school. However, we want to transform school learning into lifelong learning and to make genuine, authentic connections to the real world for our students. Teaching data through another key learning area of interest, or through data on local school and community issues, can be more purposeful. It also teaches students how and why statisticians collect data, and that collecting the data is not the end of the process: it is only the beginning.


Focus for Stage 4: Data collection and representation

The focus in Stage 4 Statistics is for students to:

• identify variables as categorical or numerical (discrete or continuous) data
• identify and distinguish between a ‘population’ and a ‘sample’
• investigate techniques for collecting data and consider their implications and limitations
• collect and interpret data from primary and secondary sources, including surveys
• construct and interpret frequency tables, histograms and polygons
• construct and interpret dot plots, stem-and-leaf plots, divided bar graphs, sector graphs and line graphs.

What is a variable?

According to our syllabus glossary, a variable is something measurable or observable that is expected to change either over time or between individual observations. Examples of variables are hair colour, height, temperature and country of birth. Variables are classified into two types: categorical and numerical.

A categorical variable is a variable whose values are categories. For example, blood group is a categorical variable with possible values A, B, AB and O, and the construction type of a house is a categorical variable with values such as brick, concrete, timber and steel. Categories may also have numerical labels with no numerical significance, such as postcodes (e.g. 2219, 2010, 2218), or may be responses collected on a Likert-type scale.

Numerical variables are variables whose values are numbers. A discrete numerical variable is one whose possible values are each separated from the next by a defined gap; discrete variables are usually a count or distinct whole values, e.g. number of children, number of runs, shoe size or number of cars. A continuous numerical variable is usually a measurement, e.g. height, weight or temperature.

Numerical variables can be sorted into two further categories:

• Discrete – usually a whole-number count, e.g. school population, shoe size, number of cars, money
• Continuous – usually a measurement, e.g. height (cm), weight (g), temperature (°C), volume (mL)

Categorical variables can be sorted into two further categories:

• Nominal – named data with no natural order, e.g. gender, hair colour, method of travel, ice cream flavours
• Ordinal – categories that describe the position or rank of a subject in an order, e.g. quality (good, average, poor), report grade (A, B, C, D, E), month of birth, Olympic medal colour (gold, silver, bronze)
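To show students why the nominal/ordinal distinction matters, a short Python sketch can sort ordinal categories by their natural order rather than alphabetically. The medal results and the `MEDAL_ORDER` list are hypothetical, made up for illustration; nominal data such as hair colour has no such order, so any consistent order will do for it.

```python
# Ordinal data has a natural order; nominal data does not.
# MEDAL_ORDER is our own (hypothetical) list giving the ranking of the categories.
MEDAL_ORDER = ["Gold", "Silver", "Bronze"]

# Hypothetical medal results for one team.
results = ["Bronze", "Gold", "Silver", "Gold", "Bronze"]

# Sorting the labels alphabetically ignores the ranking...
alphabetical = sorted(results)
# ...so ordinal data should be sorted by each label's position in the category order.
by_rank = sorted(results, key=MEDAL_ORDER.index)

print(alphabetical)  # ['Bronze', 'Bronze', 'Gold', 'Gold', 'Silver']
print(by_rank)       # ['Gold', 'Gold', 'Silver', 'Bronze', 'Bronze']
```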


How is data collected?

Once students have identified variables as numerical or categorical we begin to build an understanding of the various ways we collect data. Three ways of collecting data are identified in the Stage 4 syllabus: collecting data by sample, census or observation.

A sample is part of a population. It is a subset of the population, often randomly selected for the purpose of estimating the value of a characteristic of the population as a whole. A population is the complete set of individuals, objects, places, etc. that we want information about, and a census is an attempt to collect information about the whole population. For example, a randomly selected group of 8-year-old children (the sample) might be used to estimate the incidence of tooth decay in 8-year-old children in Australia (the population).

We collect data by observation, e.g. the direction travelled by vehicles arriving at an intersection or the types of native animals in a local area. We also collect data by census or sample, e.g. a census to collect data about the income or education of Australians, and a sample for TV ratings. Students are encouraged to discuss the practicalities of collecting data through a census compared to a sample, including limitations due to population size; e.g. in countries such as China and India, a census is conducted only once per decade.

Investigating techniques for collecting data and considering their implications and limitations

Next, students explore the practicalities and implications of obtaining data through sampling. Students collect data using a random process, e.g. numbers from a page in a phone book or from a random number generator, and identify issues that may make it difficult to obtain representative data from either primary or secondary sources.

Students are encouraged to discuss the constraints that may limit the collection of data or result in unreliable data, e.g. lack of proximity to the location where data could be collected, lack of access to digital technologies, or cultural sensitivities that may influence the results. Students may also investigate and question the selection of data used to support a particular viewpoint, e.g. the selective use of data in product advertising.

Students identify the difference between data collected from primary and secondary sources, e.g. data collected in the classroom compared to data drawn from a media source. The difference between primary and secondary data sources can be explained to students through this short animation (1:04 min).

Students explore issues involved in constructing and conducting surveys, such as sample size, bias, type of data and ethics, and discuss the effect of different sample sizes. Students describe how a random sample may be selected in order to collect data.

1. Simple random sample – every member of the population has an equal chance of being selected

Example: a lecturer delivers a lecture to 200 students. You need a list of the names of all 200 students, in no particular order, and then select at random the number of names you need for the sample. Alternatively, number the list from 1 to 200, use a random number generator, and select the names whose numbers are generated.
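The lecture example can be sketched in Python with the standard `random` module, assuming a hypothetical list of 200 names. `simple_random_sample` is an illustrative helper, not part of any library; `random.Random.sample` draws without replacement, so every student has an equal chance of selection and nobody is picked twice.

```python
import random

# Hypothetical class list for the 200-person lecture.
students = [f"Student {n}" for n in range(1, 201)]

def simple_random_sample(population, size, seed=None):
    """Select `size` members so that every member has an equal chance of selection.

    Random.sample draws without replacement, so nobody is chosen twice.
    The optional seed only makes a classroom demonstration repeatable.
    """
    rng = random.Random(seed)
    return rng.sample(population, size)

sample = simple_random_sample(students, 20, seed=1)
print(len(sample), len(set(sample)))  # 20 20

# The random-number-generator version: draw 20 different numbers from 1 to 200
# and select the names at those positions on the list.
rng = random.Random(1)
numbers = rng.sample(range(1, 201), 20)
by_number = [students[n - 1] for n in numbers]
```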

2. Stratified random sample – this is used when the population contains groups (strata) with different characteristics.

Example: there are 400 students in a school; 100 are girls and 300 are boys. If you take a simple random sample of 100 students, by chance you may include too many girls and not enough boys, so the sample will not be proportionate to the numbers of girls and boys in the population and the information you collect will not accurately represent it. Instead, divide your sample in the same ratio as the population and randomly select that number of girls and that number of boys. In this example the ratio of girls to boys is 1:3, so for a sample of 100 students you select ¼ girls and ¾ boys: 25 girls and 75 boys, a sample that reflects the make-up of your population.
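A minimal sketch of proportional stratified sampling for the 400-student example, assuming hypothetical name lists for each group. The helper `stratified_sample` is illustrative only: it gives each group a share of the sample in proportion to its size, then takes a simple random sample within each group.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Proportional stratified sampling: each stratum contributes in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # This stratum's share of the sample, e.g. 100 * 100 / 400 = 25 girls.
        share = round(sample_size * len(members) / total)
        # Simple random sample within the stratum.
        sample[name] = rng.sample(members, share)
    return sample

# Hypothetical school of 400 students: 100 girls and 300 boys.
school = {
    "girls": [f"Girl {n}" for n in range(1, 101)],
    "boys": [f"Boy {n}" for n in range(1, 301)],
}
chosen = stratified_sample(school, 100, seed=2)
print({group: len(names) for group, names in chosen.items()})  # {'girls': 25, 'boys': 75}
```

When the shares are not whole numbers, rounding may leave the total one or two away from the intended sample size, so students may need to adjust one group by hand.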

3. Cluster sample – cluster sampling divides your population into groups, and a simple random selection of these groups is made. Everyone within the selected groups is then surveyed.
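Cluster sampling can be sketched the same way, assuming a hypothetical school of 12 classes of 25 students, with each class treated as a cluster. The helper `cluster_sample` is illustrative: it randomly selects whole classes, then includes every student in the chosen classes.

```python
import random

# Hypothetical school: 12 classes (the clusters) of 25 students each.
classes = {
    f"Class {c}": [f"Class {c} student {s}" for s in range(1, 26)]
    for c in range(1, 13)
}

def cluster_sample(clusters, num_clusters, seed=None):
    rng = random.Random(seed)
    # Step 1: a simple random selection of whole groups (sorted only for repeatability).
    chosen = rng.sample(sorted(clusters), num_clusters)
    # Step 2: survey everyone in each selected group.
    surveyed = [person for name in chosen for person in clusters[name]]
    return chosen, surveyed

chosen_classes, surveyed = cluster_sample(classes, 3, seed=3)
print(len(chosen_classes), len(surveyed))  # 3 75
```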

Student activities include collecting and interpreting information from secondary sources presented as tables, graphs, sporting data, information about different countries etc. Provide students with examples for discussion where they can detect and discuss bias. Design activities where students construct survey questions and record both numerical and categorical data, followed by discussions about ethical issues that may arise.


Data representation

Looking closely at the continuum of learning from Stage 3 to Stage 4, you can see that students construct data displays, including tables, column graphs, dot plots and line graphs, appropriate for the data type. Students describe and interpret data presented in tables, column graphs, dot plots and line graphs. They then progress to interpreting two-way tables and side-by-side column graphs, which are new to the Stage 3 syllabus. Students are expected to compare a range of data displays to determine the most appropriate display for particular sets of data, as well as to interpret and critically evaluate data presented in digital media.

Moving into Stage 4, students construct and interpret frequency tables, histograms and polygons, dot plots, stem-and-leaf plots, divided bar graphs, sector graphs and line graphs. These need explicit teaching, and students will benefit greatly from a checklist of steps to follow when creating each type of graph.

Divided bar graphs are used to show how a total is divided into parts. A divided bar graph uses a single bar divided proportionally into sections to represent the parts of a total: the proportion of the bar taken up by each section indicates the part of the ‘whole’ that each category represents. It is important to know the exact length of the whole bar, as each section represents a fraction of that length. To construct a divided bar graph:
• Draw a rectangle and divide it so that the length of each section represents the percentage it shows; the entire bar represents 100%.
• Label each section with its category and percentage.
• Give the divided bar graph an appropriate title.
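The arithmetic behind a divided bar graph can be checked with a short Python sketch. The car counts are hypothetical and `divided_bar` is an illustrative helper: each section's length is the bar length multiplied by that category's fraction of the total.

```python
def divided_bar(counts, bar_length_cm):
    """Return (category, percentage, section length in cm) for each section of the bar."""
    total = sum(counts.values())
    return [
        (category, 100 * count / total, bar_length_cm * count / total)
        for category, count in counts.items()
    ]

# Hypothetical tally: colours of 50 cars passing the school, drawn on a 10 cm bar.
cars = {"Orange": 30, "Green": 15, "Blue": 5}
sections = divided_bar(cars, 10)

for category, percent, length in sections:
    print(f"{category}: {percent:.0f}% -> {length:.1f} cm")

# A rough text version of the bar, at 2 characters per cm.
bar = "".join(category[0] * round(length * 2) for category, _, length in sections)
print(f"|{bar}|")  # |OOOOOOOOOOOOGGGGGGBB|
```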

[Figure: divided bar graph titled ‘Colour of Cars’, with sections for orange, green and blue cars labelled 60%, 30% and 10%]

Single Variable Data Analysis

Taking a closer look at the continuum of learning you will see that averages have been moved out of Stage 3 Mathematics and into Stage 4 Mathematics. However, once again in the Science and Technology Stage 3 Syllabus students are required to calculate averages of small sets of data. In Stage 4 students:

Calculate mean, median, mode and range for sets of data

Investigate the effect of outliers on the mean and median

Describe and interpret a variety of data displays using mean, median and range

Calculate and compare summary statistics of different samples drawn from the same population.

An outlier is a data value that appears to stand out from the data set by being unusually high or low. Students are required to investigate the effect of outliers on the mean, median, mode and range by considering a small set of data and calculating each measure, with and without the inclusion of an outlier.

Remember: Outliers lie outside the other values.

A very effective way of identifying outliers in a data set is to graph the data first and then ask students questions about the data with and without the outlier.

Example: 1. Use the table to create a dot plot



2. What is the mean? What is the mode? What is the median?

3. Now exclude the outlier. What is the mean? What is the mode? What is the median?

4. What effect did the outlier have on the mean, mode and median?
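Teachers who want to check answers quickly could use Python's `statistics` module. The data set below is hypothetical (the table for this example is not reproduced here), with 21 standing out as the outlier; `summary` is an illustrative helper.

```python
from statistics import mean, median, mode

def summary(data):
    """Mean (to 2 decimal places), median, mode and range of a small data set."""
    return {
        "mean": round(mean(data), 2),
        "median": median(data),
        "mode": mode(data),
        "range": max(data) - min(data),
    }

# Hypothetical data set; 21 stands out as an outlier.
with_outlier = [2, 3, 3, 4, 5, 21]
without_outlier = [x for x in with_outlier if x != 21]

print(summary(with_outlier))     # {'mean': 6.33, 'median': 3.5, 'mode': 3, 'range': 19}
print(summary(without_outlier))  # {'mean': 3.4, 'median': 3, 'mode': 3, 'range': 3}
```

In this data set the outlier shifts the mean by almost 3 and the range by 16, moves the median only from 3 to 3.5, and leaves the mode unchanged.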

Think, pair, share

Pose questions to students for discussion in pairs and have them report back to the class, using reasoning to justify their responses.

Discussion 1

When analysing house prices in a particular suburb, which measure would be most useful: the mean, mode or median? Why?

Discussion 2

A salesperson is ordering shoes for a store by analysing data on the shoes purchased. Which measure would be most useful: the mean, mode or median? Why?
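If students want to test their reasoning, a sketch with hypothetical numbers shows how each measure behaves in these two contexts. The prices and shoe sizes are made up for illustration only.

```python
from statistics import mean, median, mode

# Hypothetical sale prices ($) in one suburb, including one very expensive house.
house_prices = [520_000, 560_000, 575_000, 610_000, 640_000, 2_900_000]
print(round(mean(house_prices)))  # 967500
print(median(house_prices))       # 592500.0

# Hypothetical sizes of the last ten pairs of shoes sold.
shoe_sizes_sold = [8, 9, 9, 10, 9, 11, 8, 9, 10, 11]
print(mean(shoe_sizes_sold))  # 9.4
print(mode(shoe_sizes_sold))  # 9
```

Here one expensive house pulls the mean well above most of the sale prices, while the median is barely affected; for the shoes, the mean of 9.4 is not a size that was sold, whereas the mode identifies the size that sells most often.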

Statistics in Sport

While keeping a sustained focus on the outcomes from the syllabus, the choice of data for collection, representation and analysis is up to teachers. This gives teachers the opportunity to encourage students to select data that is relevant, interesting and engaging to study. Organising activities where students look up statistics about their favourite Rugby League, AFL, netball, soccer or other sporting team will provide students with an interesting context in which to apply their mathematics. This gives meaning and purpose to the results generated when collecting, representing and analysing data. Think about the following:

• There are statisticians at every game for both teams; they sit in the coach’s box and collect the data

• The data is interpreted, displayed and analysed on a professional level for many purposes

• The data is used for the half-time talk in the players’ sheds, and decisions are made based on the statistics collected about player and team performance

• The data is presented in many forms on NRL websites and in newspapers

• Each player’s statistics help determine how much that player is worth

• Team statistics reflect whether a team is likely to make it to the Grand Final

• A multimillion dollar industry is based on player and team performance statistics

• Concrete examples are a great platform for students to apply mathematical skills and develop an understanding of Statistics; think of all the data analysed for the English Premier League

[Image: Cristiano Ronaldo ~ Manchester United, by Paolo Camera, CC BY 2.0]

When planning statistical investigations, it is important to develop students’ knowledge and understanding:

• of the ways in which relevant and sufficient data can be collected as well as implications and limitations

• of what constitutes appropriate sources of data, both primary and secondary

• of how data and statistics are used in many aspects of our everyday life

• that data is collected to provide information on many topics of interest and to assist in making decisions regarding important issues, e.g. projects aimed at improving or developing products and services. Users at all levels need to have skills in organising, displaying, collecting, interpreting and analysing data.


Continuum of learning Stages 2, 3 & 4: Statistics and Probability Strand

The continuum of learning shows the progression of concepts and ideas within the Statistics and Probability Strand.

Stage 2: Data

Part 1

• Plan methods for data collection

• Collect data, organise into categories and create displays using lists, tables, picture graphs and simple column graphs (one-to-one correspondence)

• Interpret and compare data displays

Part 2

• Select, trial and refine methods for data collection, including survey questions and recording sheets

• Construct data displays, including tables, column graphs and picture graphs of many-to-one correspondence

• Evaluate the effectiveness of different displays

Stage 3: Data

Part 1

• Collect categorical and numerical data by observation and by survey

• Construct data displays, including tables, column graphs, dot plots and line graphs, appropriate for the data type

• Describe and interpret data presented in tables, column graphs, dot plots and line graphs

Part 2

• Interpret and create two-way tables

• Interpret side-by-side column graphs

• Compare a range of data displays to determine the most appropriate display for particular sets of data

• Interpret and critically evaluate data presented in digital media and elsewhere

Stage 4: Data Collection and Representation

• Identify variables as categorical or numerical (discrete or continuous)

• Identify and distinguish between a ‘population’ and a ‘sample’

• Investigate techniques for collecting data and consider their implications and limitations

• Collect and interpret data from primary and secondary sources, including surveys

• Construct and interpret frequency tables, histograms and polygons

• Construct and interpret dot plots, stem-and-leaf plots, divided bar graphs, sector graphs and line graphs

Stage 4: Single Variable Data Analysis

• Calculate mean, median, mode and range for sets of data

• Investigate the effect of outliers on the mean and median

• Describe and interpret a variety of data displays using mean, median and range

• Calculate and compare summary statistics of different samples drawn from the same population

Stage 2: Chance

Part 1

• Identify and describe possible ‘outcomes’ of chance experiments

• Predict and record all possible combinations in a chance situation

• Conduct chance experiments and compare predicted with actual results

Part 2

• Describe possible everyday events and order their chances of occurring

• Identify everyday events where one cannot happen if the other happens

• Identify events where the chance of one occurring will not be affected by the occurrence of the other

Stage 3: Chance

Part 1

• List outcomes of chance experiments involving equally likely outcomes

• Represent probabilities using fractions

• Recognise that probabilities range from 0 to 1

Part 2

• Compare observed frequencies in chance experiments with expected frequencies

• Represent probabilities using fractions, decimals and percentages

• Conduct chance experiments with both small and large numbers of trials

Stage 4: Probability

Part 1

• Construct sample spaces for single-step experiments with equally likely outcomes

• Find probabilities of events in single-step experiments

• Identify complementary events and use the sum of probabilities to solve problems

Part 2

• Describe events using language of ‘at least’, exclusive ‘or’ (A or B but not both), inclusive ‘or’ (A or B or both) and ‘and’

• Represent events in two-way tables and Venn diagrams and solve related problems
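Several of the Stage 4 Probability ideas above can be illustrated with a single roll of a fair six-sided die, sketched here with Python sets and `fractions.Fraction`. The events chosen are illustrative; the sketch assumes equally likely outcomes, so probability = favourable outcomes ÷ total outcomes.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (a single-step experiment).
sample_space = {1, 2, 3, 4, 5, 6}

def probability(event):
    """Equally likely outcomes: number of favourable outcomes / number of outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

even = {2, 4, 6}
at_least_5 = {5, 6}             # 'at least 5' means 5 or 6
not_even = sample_space - even  # the complementary event

p_even = probability(even)
print(p_even, float(p_even), f"{float(p_even):.0%}")  # 1/2 0.5 50%
print(p_even + probability(not_even))                 # 1 (complementary events)
print(probability(even | at_least_5))  # inclusive 'or' (A or B or both): 2/3
print(probability(even ^ at_least_5))  # exclusive 'or' (A or B but not both): 1/2
print(probability(even & at_least_5))  # 'and': 1/6
```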


Stage 4: Teaching Ideas

• collects, represents and interprets single sets of data, using appropriate statistical displays MA4-19SP
• communicates and connects mathematical ideas using appropriate terminology, diagrams and symbols MA4-1WM
• recognises and explains mathematical relationships using reasoning MA4-3WM

Teaching Strategy: Students identify variables as categorical or numerical (discrete or continuous)

Provide students with a range of scenarios to categorise into types of variables. In pairs, students discuss and sort variables into categories and provide reasons for their decisions.


Stage 4: Teaching Ideas

• collects, represents and interprets single sets of data, using appropriate statistical displays MA4-19SP
• analyses single sets of data using measures of location and range MA4-20SP
• communicates and connects mathematical ideas using appropriate terminology, diagrams and symbols MA4-1WM
• recognises and explains mathematical relationships using reasoning MA4-3WM

Teaching Strategy: Students calculate mean, median, mode and range for sets of data

1. Use the internet to collect data about the players in your favourite Rugby League team. Record each player’s name, age, weight and height in the table below.

Player name | Age | Weight (kg) | Height (cm)


2. Create a dot plot showing players’ heights. Find the median, mean, mode and range for the sets of data collected.

Players’ Heights

mean: __________   mode: __________   median: __________   range: __________
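For checking a dot plot drawn by hand, a rough text version can be produced in Python. The heights are hypothetical whole-centimetre values, and `text_dot_plot` is an illustrative helper that prints the dot plot sideways, one line per value, so gaps in the data remain visible.

```python
from collections import Counter

def text_dot_plot(values):
    """Return one line per value from min to max, with a dot for each occurrence.

    Values that never occur still get a line, so gaps in the data stay visible.
    """
    counts = Counter(values)
    return [
        f"{v:>4} | {'•' * counts[v]}".rstrip()
        for v in range(min(values), max(values) + 1)
    ]

# Hypothetical player heights in whole centimetres.
heights = [180, 182, 182, 183, 185, 185, 185, 188]
for line in text_dot_plot(heights):
    print(line)
```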

3. Create a stem and leaf plot showing players’ weights. Find the median, mean, mode and range for the set of data collected.

Players’ Weights

Stem | Leaf
   8 |
   9 |
  10 |
  11 |

mean: __________   mode: __________   median: __________   range: __________
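A stem-and-leaf plot for the weights can be sketched the same way, assuming hypothetical whole-kilogram weights so that the stem is the tens (and hundreds) digits and the leaf is the units digit, matching the stems 8 to 11 above. `stem_and_leaf` is an illustrative helper.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stems are the tens (and hundreds) digits, leaves the units digit, e.g. 104 -> 10 | 4."""
    plot = defaultdict(list)
    for v in sorted(values):  # sorting first puts the leaves in order
        plot[v // 10].append(v % 10)
    lines = []
    for stem in range(min(plot), max(plot) + 1):  # include any empty stems
        leaves = " ".join(str(leaf) for leaf in plot[stem])
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical player weights in whole kilograms.
weights = [88, 92, 95, 95, 99, 101, 104, 104, 104, 110]
for line in stem_and_leaf(weights):
    print(line)
```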


4. Rugby League Casualty Ward Table

Use the following table to answer the questions below.

Player | Age | Reason | Injured in | Due back

Broncos
Mitchell Dodds |  | Knee | 2013 | Season
Jack Reed |  | Shoulder | Round 26 | Season

Bulldogs
Jacob Loko |  | Knee | 2013 | Season
Chase Stanley |  | Shoulder | Round 21 | Finals
Sam Kasiano |  | Ankle | Round 22 | Season
Pat O’Hanlon |  | Fractured ankle | Final Week 1 | Indefinite

Panthers
George Jennings |  | Dislocated elbow | Pre-Season | Season
Isaac John |  | Achilles | Round 10 | Season
Peter Wallace |  | Knee | Round 19 | Season
Tyrone Peachey |  | Pec | Round 18 | Season
Elijah Taylor |  | Knee | Round 21 | Season
Bryce Cartwright |  | Ankle | Round 21 | Season
Kevin Kingston |  | Leg fracture | Round 24 | Indefinite

Latest data can be found at the following website

http://www.nrl.com/News/WeeklyFeatures/CasualtyWard/tabid/10247/Default.aspx

Types of data are shown in the diagram below:


a. What kind of information does the Casualty Ward table display?

b. There is more than one type of data represented above. What kinds of data are represented in the Casualty Ward table: categorical or numerical, discrete or continuous? Provide reasons for your answer.

c. Explain the difference between categorical data, discrete data and continuous data.

Label each of the following variables as categorical, discrete numerical or continuous numerical data. Provide reasons for your answers.
• Weight of players __________________________________________________
• Players’ shoe size ___________________________________________________
• Players’ heights _____________________________________________________
• Jersey number______________________________________________________
• Number of games played by each player ________________________________
• Number of injuries per team __________________________________________

d. The table above is not complete: use the internet to find the ages of the injured players.

Calculate the mean (average) age, the mode and the range of the ages of the injured players in the three teams listed above.


Stage 4: Teaching Ideas

• collects, represents and interprets single sets of data, using appropriate statistical displays MA4-19SP
• analyses single sets of data using measures of location and range MA4-20SP
• communicates and connects mathematical ideas using appropriate terminology, diagrams and symbols MA4-1WM
• recognises and explains mathematical relationships using reasoning MA4-3WM

Activity: Students work in pairs.
1. Each pair will need a 50 gram packet of Smarties.
2. Empty the packet and identify the most popular colour.
3. Complete the attached frequency table to record your results.
4. Graph your results using ChartGizmo (two types of graph need to be drawn).

Data collection sheet – Frequency table: count the number of Smarties of each colour, complete the following table and graph your results on ChartGizmo.

Web Essentials: the links below are web tools for drawing graphs and creating posters.

ChartGizmo – http://chartgizmo.com
Glogster – http://edu.glogster.com

Syllabus content: Constructs divided bar graphs, sector graphs and line graphs, with the use of digital technologies.

Learning across the curriculum: Personal and Social Capability – students work effectively in teams, making responsible decisions and establishing positive relationships.

Colour | Tally | Frequency

Total:
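The tally-and-frequency process can be mirrored in Python with `collections.Counter`. The packet contents are hypothetical, and `tally` is an illustrative helper that writes each group of five as five strokes, since the crossed ‘gate’ is hard to draw in plain text.

```python
from collections import Counter

def tally(freq):
    """Tally marks in groups of five, e.g. 7 -> '||||| ||'."""
    groups = ["|||||"] * (freq // 5)
    if freq % 5:
        groups.append("|" * (freq % 5))
    return " ".join(groups)

# Hypothetical colours recorded from one packet of Smarties.
packet = (["Red"] * 9 + ["Orange"] * 7 + ["Green"] * 6
          + ["Blue"] * 5 + ["Yellow"] * 4 + ["Purple"] * 3)

counts = Counter(packet)
# most_common() lists colours from highest to lowest frequency.
for colour, freq in counts.most_common():
    print(f"{colour:<7} {tally(freq):<15} {freq}")
print(f"{'Total:':<23} {sum(counts.values())}")
```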


From findings:

1. Compare findings as a class and discuss.
2. Save your graphs from ChartGizmo and embed them in Glogster to create a poster.
3. Present your Glogs to the class.

Further activities and discussion questions

1. Type of data: What is the difference between numerical and categorical data? What kind of data did you collect? Categorical or numerical data?

2. Drawing conclusions from data: Analyse the sample data and write three sentences that draw conclusions from the data displayed. You will need to find the mode (the most common colour) and the mean (the average number of Smarties per colour) of your data, e.g. “The Smartie colour with the lowest frequency is yellow” or “The graph shows that red is the most common colour of Smartie”.

3. Population data: Collate the class data as a representation of the population, where the team data are samples. Find the total number of Smarties in the class and the number of Smarties of each colour. Each team collects the class data for a particular colour, and the data are collated in a table on the board. Teams draw a divided bar graph or a sector graph to represent the whole-class collection of Smarties.

4. Data collected from primary sources: compare the population data with the sample data. a. Compare and contrast to see whether the sample data represent the actual population data. b. Calculate the percentage of the whole represented by each colour of Smartie in the population data and in the sample data. c. Write the percentages on the graph. d. Write three sentences that draw conclusions about the two sets of data.

Data collected from secondary sources: get the results from another Year 7 class and compare your class data with theirs. Write three sentences that draw conclusions about the secondary data source and your primary data.
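Question 4b asks for the percentage of the whole represented by each colour in both the population data and the sample data. The sketch below shows that calculation (count ÷ total × 100) and the gap between each sample percentage and the matching population percentage. It uses made-up counts and a hypothetical `percentages` helper.

```python
def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the whole represented by each colour, rounded to 1 decimal place."""
    total = sum(counts.values())
    return {colour: round(100 * n / total, 1) for colour, n in counts.items()}


# Hypothetical counts, for illustration only: one team's packet (sample)
# and all of the class's packets combined (population).
sample = {"red": 6, "orange": 4, "green": 3, "yellow": 4, "blue": 3}
population = {"red": 48, "orange": 52, "green": 30, "yellow": 40, "blue": 30}

sample_pct = percentages(sample)
population_pct = percentages(population)
for colour in population:
    gap = sample_pct[colour] - population_pct[colour]
    print(f"{colour:<7} sample {sample_pct[colour]:5.1f}%   "
          f"population {population_pct[colour]:5.1f}%   difference {gap:+.1f}")
```

Rounding to one decimal place can make a set of percentages add to slightly more or less than 100%. This is worth pointing out when students check their totals.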

Example: Display population data and sample data in divided bar graphs

Population data: Graph of all the Smartie packets together for the entire class

Sample data: Graph your team's Smartie packet

[Divided bar graphs not reproduced. Segment labels: one bar shows Red 25%, Orange 40%, Green 5%, Yellow 15%, Blue 15%; the other shows Red 15%, Orange 30%, Green 30%, Yellow 10%, Blue 15%.]


Data Representation in Stage 4: Summary of data displays

Students:
• Construct and compare a range of data displays, including stem-and-leaf plots and dot plots
• Compare the strengths and weaknesses of different forms of data display
• Identify and explain which graph types are suitable for the type of data considered, e.g. sector and divided bar graphs are suitable for categorical data, but not for numerical data
• Draw conclusions from data displayed in a graph

Frequency Distribution

Use a tally to organise data into a frequency distribution table

Frequency Histogram

• Construct and interpret
• Select and use appropriate scales and labels on horizontal and vertical axes
• Recognise why a half-column-width space is necessary between the vertical axis and the first column of a histogram

NSW Syllabus Mathematics K-10, BOSTES

Frequency Polygon

• Construct and interpret
• Select and use appropriate scales and labels on horizontal and vertical axes

HistFreqPoly.JPG by WolfVanZandt

Dot Plots

• Construct and interpret
• Explain the importance of aligning data points
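The point about aligning data points can be made concrete with a small text-based dot plot. This is an illustrative sketch only: the data are hypothetical, and `dot_plot` is our own helper, not a function from any graphing tool. Every value on the number line gets a column of the same width, so each dot sits directly above its value. A value with no observations (here, 4) keeps an empty column.

```python
from collections import Counter


def dot_plot(values: list[int]) -> str:
    """Text dot plot of whole-number data: one fixed-width column per value,
    one 'o' per observation, stacked upwards from the number line."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    width = max(len(str(lo)), len(str(hi))) + 1   # equal column widths keep dots aligned
    rows = []
    for level in range(max(counts.values()), 0, -1):  # top row first
        cells = ("o" if counts[v] >= level else "" for v in range(lo, hi + 1))
        rows.append("".join(cell.rjust(width) for cell in cells))
    rows.append("".join(str(v).rjust(width) for v in range(lo, hi + 1)))  # number line
    return "\n".join(row.rstrip() for row in rows)


# Hypothetical data, for illustration only: number of pets owned by 12 students.
print(dot_plot([0, 1, 1, 2, 0, 3, 1, 2, 1, 5, 0, 2]))
```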


Line Graphs

• Construct with and without technology
• Interpret line graphs
• Select and use appropriate scales and labels on horizontal and vertical axes

[Figure: example line graph titled "Distance Travelled" (axis labels: Days, km)]

Sector Graphs

• Construct with and without technology
• Interpret sector graphs
• Calculate the angle at the centre required for each sector of the sector graph

NSW Syllabus Mathematics K-10, BOSTES

Divided Bar Graphs

• Construct with and without technology
• Interpret divided bar graphs
• Calculate the length of bar required for each section of divided bar graphs
• Calculate the percentage of the whole represented by categories in a divided bar graph
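The length of each section of a divided bar is that category's fraction of the total, multiplied by the full length of the bar. Below is a minimal Python sketch. It assumes a hypothetical survey of 40 students drawn on a 10 cm bar; the numbers and the `bar_sections` name are for illustration only.

```python
def bar_sections(counts: dict[str, int], bar_length: float) -> dict[str, float]:
    """Length of each section of a divided bar graph: (count / total) x bar length."""
    total = sum(counts.values())
    return {name: bar_length * n / total for name, n in counts.items()}


# Hypothetical survey of 40 students' travel to school, drawn on a 10 cm bar.
travel = {"car": 16, "walk": 12, "bus": 8, "bike": 4}
sections = bar_sections(travel, 10.0)
total = sum(travel.values())
for name, length in sections.items():
    print(f"{name:<5} {length:.1f} cm  ({100 * travel[name] / total:.0f}% of the whole)")
```

The same fraction gives both the section length and the percentage of the whole, so the two calculations in the syllabus points above can be checked against each other.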

Stem and leaf plot (two digit stem)

• Construct one- and two-digit stem-and-leaf plots
• Interpret stem-and-leaf plots
• Explain the importance of ordering and aligning data values

NSW Syllabus Mathematics K-10, BOSTES
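Building a stem-and-leaf plot step by step shows clearly why data values must be ordered and aligned. The sketch below assumes non-negative whole numbers, uses tens as the stem and units as the leaf, and runs on made-up test scores. `stem_and_leaf` is a hypothetical helper, not a library function.

```python
from collections import defaultdict


def stem_and_leaf(values: list[int]) -> str:
    """Stem-and-leaf plot for non-negative whole numbers: stem = tens, leaf = units.

    Sorting first puts the leaves in order; single-character leaves then line up
    in columns, which is why ordering and aligning matter.
    """
    rows: dict[int, list[str]] = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(str(v % 10))
    first, last = min(rows), max(rows)
    # Print every stem from first to last, including empty ones, so gaps stay visible.
    lines = [f"{stem:>2} | {''.join(rows[stem])}".rstrip() for stem in range(first, last + 1)]
    return "\n".join(lines)


# Hypothetical test scores, for illustration only.
print(stem_and_leaf([23, 45, 31, 27, 38, 45, 52, 29, 34, 41]))
```

Empty stems are kept as a stem with no leaves, so gaps in the data remain visible, as they would on a hand-drawn plot.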


Checklists to assist students with unpacking Tables and Graphs

Include a checklist or steps for interpreting and constructing graphs at the top of student worksheets and tasks. This gives students a strategy for thinking about and unpacking any graph or table they come across. With practice and reference to the checklists, students will become fluent in their approach to interpreting graphs. Listed below are examples of checklists that can be provided for students.

Strategies for reading and interpreting tables 1. What is the title? 2. What are the headings on each row? 3. What are the headings on each column? 4. Are the values increasing or decreasing? 5. What information can I read from the table?

Strategies for reading and interpreting graphs 1. What is the title? What type of graph is it?

2. Look at the horizontal axis

What is the label on the axis? What is the scale being used? What is the unit of measurement?

3. Look at the vertical axis

What is the label on the axis? What is the scale being used? What is the unit of measurement?

4. How are numbers shown? Thousands, hundreds, fractions, decimals or percentages. What do these numbers represent?

Strategies for constructing Line graphs 1. Create an appropriate scale on each axis. 2. Label each axis, including the units of measurement. 3. Choose an appropriate title for the graph. 4. Plot each coordinate carefully on the graph. 5. Join each point to the next with a straight line. 6. Always use a ruler when constructing line graphs, and be precise.

Strategies for constructing Column graphs 1. Columns are equally spaced apart. 2. Columns may be vertical or horizontal. 3. The graph starts one column width into the horizontal axis. 4. Each column has the same width. 5. Label both the vertical and horizontal axes. 6. Place the appropriate scale on both axes. 7. Label the graph with an appropriate title.


Strategies for constructing Sector graphs

A sector graph is a data display that uses a circle divided proportionally into sectors to represent the parts of a total. 1. Use a compass to draw a circle. 2. Draw a radius (a line from the centre of the circle to the circumference) as a reference line for measuring and creating each sector. 3. Determine the size of each sector. If the size is given as a percentage, divide the percentage by 100 and multiply by 360°. This gives the required angle of each sector, which can be measured with a protractor from the radius line. For example: a sector representing 50%: 50/100 × 360° = 180°; a sector representing 25%: 25/100 × 360° = 90°; a sector representing 10%: 10/100 × 360° = 36°.

If the size is given as a number out of a total, convert it to a fraction and multiply by 360° to get the angle of each sector. For example: a sector representing 4/20: 4/20 × 360° = 72°; a sector representing 6/20: 6/20 × 360° = 108°; a sector representing 10/20: 10/20 × 360° = 180°. 4. To make sure you have not made a mistake, check that all the sectors add to 360°. 5. Use a different colour for each sector, and label each sector or create a key. 6. Place a title at the top of the sector graph.

NAPLAN Strategies 2013
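The angle calculation in step 3 and the check in step 4 can be sketched in Python. This minimal sketch uses the 4/20, 6/20 and 10/20 examples from the steps above, with placeholder labels A, B and C; `sector_angles` is a hypothetical helper. Python's standard `fractions.Fraction` keeps the angles exact, so rounding cannot upset the 360° check.

```python
from fractions import Fraction


def sector_angles(counts: dict[str, int]) -> dict[str, Fraction]:
    """Angle at the centre for each sector: (part / total) x 360 degrees, kept exact."""
    total = sum(counts.values())
    return {name: Fraction(n, total) * 360 for name, n in counts.items()}


# The "out of 20" examples from the steps above (labels A, B, C are placeholders).
angles = sector_angles({"A": 4, "B": 6, "C": 10})
for name, angle in angles.items():
    print(f"{name}: {float(angle):g} degrees")   # A: 72, B: 108, C: 180

# Step 4 check: all the sectors must add to 360 degrees.
assert sum(angles.values()) == 360
```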


Strategies for constructing Frequency Histograms and Frequency Polygons

A frequency histogram is a graph of a frequency distribution that uses vertical columns with no gaps between them to represent the frequencies of scores. The line that joins the midpoints of the tops of the columns is called the frequency polygon. 1. Columns are vertical and adjacent to each other, with no gaps between them. 2. Column heights show the frequency of each score from the table. 3. The graph starts half a column width into the horizontal axis. 4. Each column has the same width. Place the appropriate scale on both axes. 5. Label the vertical axis as the frequency and the horizontal axis as the score. 6. Label the graph with an appropriate title.

Score | Frequency
1 | 7
2 | 10
3 | 15
4 | 12
5 | 6

Frequency Histogram
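Using the score/frequency table above, the sketch below lists the vertices of the frequency polygon (the midpoints of the column tops) and the edges of each column. It assumes whole-number scores drawn as columns one unit wide, centred on each score, with the vertical axis at 0. Under that assumption the first column begins half a column width along the horizontal axis, as in step 3. The helper names `polygon_vertices` and `column_edges` are hypothetical.

```python
# Score/frequency table from the worked example above.
table = {1: 7, 2: 10, 3: 15, 4: 12, 5: 6}


def polygon_vertices(freq: dict[int, int]) -> list[tuple[int, int]]:
    """Frequency polygon vertices: the midpoint of the top of each column.

    With unit-width columns centred on each whole-number score, the top
    midpoint of a column is simply (score, frequency).
    """
    return [(score, freq[score]) for score in sorted(freq)]


def column_edges(freq: dict[int, int]) -> list[tuple[float, float]]:
    """Left and right edges of each unit-width column, centred on its score.

    Neighbouring columns share an edge, so there are no gaps between them.
    """
    return [(score - 0.5, score + 0.5) for score in sorted(freq)]


print("Polygon vertices:", polygon_vertices(table))
print("Column edges:", column_edges(table))
print("Total frequency:", sum(table.values()))
```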

Understanding Stem and Leaf Plots

A stem-and-leaf plot is a method of organising and displaying numerical data in which each data value is split into two parts, a 'stem' and a 'leaf'. For example, the stem-and-leaf plot below displays the resting pulse rates of 19 students. (Glossary)

A back-to-back stem-and-leaf plot is a method for comparing two data distributions by attaching two sets of 'leaves' to the same 'stem' in a stem-and-leaf plot. For example, the stem-and-leaf plot below displays the distribution of pulse rates of 19 students before and after gentle exercise. (Glossary)

Frequency Polygon


Data Investigation and Interpretation: a guide for teachers

The Improving Mathematics Education in Schools (TIMES) Project released this guide for teachers about data representation and interpretation in Stage 5. It contains teaching ideas and investigations for data representation and analysis.

This resource, plus many more, can be found on Scootle.


Stage 2 Teaching Ideas - Data

These lesson ideas are specifically for Stage 2 and build knowledge that is required for students in Stage 3. You may like to explore these concepts with your Stage 3 students to gauge their current level of understanding.
Strand: Statistics and Probability
Substrand: Data
Outcomes:
MA2-1WM uses appropriate terminology to describe, and symbols to represent, mathematical ideas
MA2-2WM selects and uses appropriate mental or written strategies, or technology, to solve problems
MA2-3WM checks the accuracy of a statement and explains the reasoning used
MA2-18SP selects appropriate methods to collect data, and constructs, compares, interprets and evaluates data displays, including tables, picture graphs and column graphs
Students:
Collect data, organise it into categories, and create displays using lists, tables, picture graphs and simple column graphs, with and without the use of digital technologies (ACMSP069)

Identify questions or issues for categorical variables; identify data sources and plan methods of data collection and recording

Collect data and create a list or table to organise the data, eg collect data on the number of each colour of lollies in a packet

Construct vertical and horizontal column graphs and picture graphs that represent data using one-to-one correspondence

The following sequence of learning can be taught in conjunction with Personal Development, Health and Physical Education.
Strand: Safe Living
Outcome: SLS2.13 discusses how safe practices promote personal wellbeing

Activity 1: Collecting data
As a group, discuss the various ways students travel to and from school and the safety aspects of each method of travel. Ask students to pose questions about this matter in order to obtain information in relation to the class, e.g.

- What is the most popular method of travel to and from school for students in our class?
- What is the least popular method of travel to and from school for students in our class?
- What method of travel is most suitable for students in our class?
- What method of travel is not utilised at all by students in our class?

Have students predict and create a list of categories to collect data in relation to the method of travel by students in the class. Identify issues for data collection and refine investigations, e.g. ‘What if some members of our class travel by car to school but walk home?’ Students record the various ways students travel to school each day in a table and make a tally to show the number of students and the different ways that they travel to school.


Activity 2: Creating a column graph
Ask students to use this data to construct a column graph, using grid paper. Discuss with students:

- How will grid paper assist you in constructing a column graph?
- What will you name the horizontal and vertical axes?
- What would be an appropriate title for your column graph?

Activity 3: Interpreting and evaluating the graph
Ask students, in small groups, to interpret the information in the graph. Ask students questions such as:

- What is the most popular way of travelling to and from school?
- What is the least popular way of travelling to and from school?
- What is the safest method of travel? Why?
- What safety considerations would you need to be aware of if catching the train, riding a bike or walking to school?

Activity 4: Creating a column graph with Excel
Have students use Excel to convert the survey information to a column graph.


Activity 5: Interpreting data
Have students describe the information presented in the column graph and draw conclusions about the data presented, e.g.

- The car is the most utilised method of travelling to and from school for students in our class.
- The bike is the least utilised method of travelling to and from school for students in our class.


Stage 3 Teaching Ideas - Data

These lesson ideas are specifically for Stage 3 and build knowledge that is required for students in Stage 4.
Strand: Statistics and Probability
Substrand: Data
Outcomes:
MA3-1WM describes and represents mathematical situations in a variety of ways using mathematical terminology and some conventions
MA3-3WM gives a valid reason for supporting one possible solution over another
MA3-18SP uses appropriate methods to collect data and constructs, interprets and evaluates data displays, including dot plots, line graphs and two-way tables
Students:
Collect data, organise it into categories, and create displays using lists, tables, picture graphs and simple column graphs, with and without the use of digital technologies (ACMSP069)

Pose questions and collect categorical or numerical data by observation or survey

Construct displays, including column graphs, dot plots and tables, appropriate for data type, with or without the use of digital technologies

Describe and interpret different data sets in context


Links:
SMART Teaching Strategies: Data, Data Collection and Representation, Single Variable Data Analysis http://www.schools.nsw.edu.au/learning/7-12assessments/naplan/teachstrategies/yr2014/index.php?id=ns_data_s3b_14
NRICH – enriching mathematics http://nrich.maths.org/public/search.php?search=data

Further reading:
Teaching Data – Stage 3: Dot Plots http://www.tale.edu.au/tale/live/teachers/shared/BC/Teaching-data_Stage-3-dot-plots.pdf
The Development of Graph Understanding in Mathematics Curriculum http://www.curriculumsupport.education.nsw.gov.au/primary/mathematics/assets/pdf/dev_graph_undstdmaths.pdf


Subscription link: DEC Mathematics Curriculum network. Subscribe to be added to our network list for all newsletters and professional learning information.

Scootle

MANSW

GeoGebra Institute Applets and teaching ideas

Further information: Learning and Leadership Directorate

Primary Mathematics Advisor [email protected]

Primary Mathematics AC Advisor [email protected]

Secondary Mathematics AC Advisor [email protected]

Secondary Mathematics Advisor [email protected]

Level 3, 1 Oxford Street, Sydney NSW 2000

9266 8091 Nagla Jebeile 9244 5459 Katherin Cartwright

© October 2014 NSW Department of Education and Communities


DATA INVESTIGATION AND INTERPRETATION


The Improving Mathematics Education in Schools (TIMES) Project

A guide for teachers - Year 10 June 2011

STATISTICS AND PROBABILITY Module 8

Data Investigation and Interpretation

(Statistics and Probability : Module 8)

For teachers of Primary and Secondary Mathematics


Cover design, Layout design and Typesetting by Claire Ho

The Improving Mathematics Education in Schools (TIMES) Project 2009-2011 was funded by the Australian Government Department of Education, Employment and Workplace Relations.

The views expressed here are those of the author and do not necessarily represent the views of the Australian Government Department of Education, Employment and Workplace Relations.

© The University of Melbourne on behalf of the International Centre of Excellence for Education in Mathematics (ICE-EM), the education division of the Australian Mathematical Sciences Institute (AMSI), 2010 (except where otherwise indicated). This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. 2011.

http://creativecommons.org/licenses/by-nc-nd/3.0/

Helen MacGillivray


ASSUMED BACKGROUND FROM F-9

It is assumed that in Years F-9, students have had many learning experiences involving choosing and identifying questions or issues from everyday life and familiar situations, planning statistical investigations and collecting or accessing data, and have become familiar with the concepts of statistical variables and of subjects of a data investigation. It is assumed that students are now familiar with categorical, count and continuous data, have had learning experiences in recording, classifying and exploring individual datasets of each type, using tables and column graphs for categorical data and count data with a small number of different counts treated as categories, and dotplots, stem-and-leaf plots and histograms for continuous and count data. It is assumed that students are familiar with the use of frequencies and relative frequencies of categories (for categorical data) or of counts (for count data) or of intervals of values (for continuous data), and that students have used and interpreted averages (that is, sample means), medians and ranges of quantitative (that is, count or continuous) data. Students have used tables and graphs to explore more than one set of categorical data on the same subjects, investigating data on pairs of categorical variables. Students have used stem-and-leaf plots and histograms to explore continuous data (and count data with many different values) and categorical data on the same subjects, comparing features of the continuous data, on the same scale, across categories.

Through learning experiences in many familiar and everyday contexts, students have come to recognise the need for data to be obtained randomly in circumstances that are representative of a more general situation or larger population with respect to the issues of interest. Students have examined the challenges of obtaining randomly representative data, emphasizing the importance of clear reporting of how, when and where data are obtained or collected, and of identifying the issues or questions for which data are desired to be representative. Throughout the years, students have seen a variety of examples of collecting data, with Years 8 and 9 explicitly identifying surveys, observational studies and experimental investigations, and contrasting sampling with taking a census.

In order to understand how to interpret and report information from data, students have developed some understanding of the effects of sampling variability. Consideration of such effects has been implicit throughout data investigations in all years, with a more explicit focus on, and allowance for, sampling variability when commenting on data developing in Years 8 and 9.


MOTIVATION

Statistics and statistical thinking have become increasingly important in a society that relies more and more on information and calls for evidence. Hence the need to develop statistical skills and thinking across all levels of education has grown and is of core importance in a century which will place even greater demands on society for statistical capabilities throughout industry, government and education.

A natural environment for learning statistical thinking is through experiencing the process of carrying out real statistical data investigations from first thoughts, through planning, collecting and exploring data, to reporting on its features. Statistical data investigations also provide ideal conditions for active learning, hands-on experience and problem-solving. No matter how it is described, the elements of the statistical data investigation process are accessible across all educational levels.

Real statistical data investigations involve a number of components: formulating a problem so that it can be tackled statistically; planning, collecting, organising and validating data; exploring and analysing data; and interpreting and presenting information from data in context. No matter how the statistical data investigative process is described, its elements provide a practical framework for demonstrating and learning statistical thinking, as well as experiential learning in which statistical concepts, techniques and tools can be gradually introduced, developed, applied and extended as students move through schooling.

CONTENT

In this module, in the context of statistical data investigations, we build on the content of Years F-9 to extend the focus in Year 9 on comparing quantitative data across the categories of one or more categorical variables, and to extend the exploration in Year 6 of association between categorical variables, to exploration of possible relationships between continuous variables.

Quartiles and boxplots are introduced and used to further develop the learning experiences in comparing quantitative data across categories of one or more categorical variables. Boxplots are compared with histograms, and the relative merits of the four types of plots for quantitative data (dotplots, stem-and-leaf plots, histograms and boxplots) are compared. Comparisons are made with regard to location, spread and shape, with reference to plots and/or the summary statistics of sample means, medians, quartiles and ranges, as appropriate.


Scatterplots are used to investigate and comment on possible relationships between continuous variables. Examples include situations involving time, and examples from digital media illustrate graphical techniques for exploring more complex situations with social, environmental and health ramifications.

Count data with many different (usually large) values may also be explored using the plots and summary statistics that are used for continuous data, because of the many different values. For convenience in this module, we will use the terms continuous data and continuous variable, with the understanding that count data with many different values of counts may also be treated in the same ways. One example of such data, on the number of blinks per minute, is included as an illustration.

Throughout this module, students build on their understanding of the importance of clear reporting of how, when and where data are obtained or collected, and of identifying the issues or questions for which data are desired to be representative. As a direct extension of Year 9 content, this module makes use of the Year 9 examples of data investigations initiated, designed, planned and carried out by students. In exploring relationships amongst quantitative variables, this module uses examples ranging from student data investigations to issues of international concern and importance.

Throughout F-10, the examples and new content of modules are developed within the statistical data investigation process through the following:

• considering initial questions that motivate an investigation;

• identifying issues and planning;

• collecting, handling and checking data;

• exploring and interpreting data in context.

The examples consider situations familiar and accessible to students and build on situations considered in F-9.

SUMMARY OF STUDENT DATA INVESTIGATION EXAMPLES.

The following are brief summaries of some data investigations initiated, designed and undertaken entirely by students, involving a number of variables including one or more quantitative variables and at least one categorical variable. These will be used in the examples of this module. Most are used in the examples in the Year 9 module, and more details are provided there, particularly on the planning, practicalities and collecting of the data. The groups of students involved chose their context and the aspects of it of interest to them, identified the variables and subjects of the investigation, planned the practicalities of the data collection to obtain randomly representative data, carried out appropriate pilot studies and collected their data, then explored and reported on their data.

For each example below, the students were interested in a number of questions and issues, only some of which are explored in this module.


EXAMPLE A: GOGOGO!

The students in this group were interested in investigating whether speed of approaching traffic lights tended to be different for green or amber traffic lights and whether this was affected by driver gender, age or vehicle type, colour or make. They recorded data only for vehicles that had free approach to the lights – that is, not impeded in any way by other vehicles. To collect information on speed, they recorded the time in seconds that vehicles took to pass through a 50 metre section just before the set of lights. They also recorded gender and (broad) age group of driver, and colour, type and make of vehicle.

In this module, we consider only the time to travel the 50 metre section (in seconds) and the colour of the lights.

EXAMPLE B: HOW OFTEN DO PEOPLE BLINK?

This group of students decided to conduct a simple survey on opinions on a topic such as travel, asking questions for one minute. There were four students in the group and they collected their data in pairs. One member of the pair asked the questions while the other unobtrusively counted the number of times each subject blinked. The students used the same questions, and stayed in the same pairs of investigators to collect their data. The investigators recorded the gender and age of the subject, the number of blinks in the minute of the survey, whether the questions were asked inside or outside, in the morning or afternoon, the subject’s eye colour, and whether the subject wore glasses or not. They also recorded the pair who collected the data for each subject. They discovered during their exploration of the data that this last variable was important. It happened more by accident than design that the group of two boys and two girls decided to collect their data in same-gender pairs – that is, the two girls formed one pair of collectors and the two boys formed the other pair. In this module, we consider the number of blinks per minute, the gender of the subject and the gender of the observer pair, but in practice, as with other examples in these modules, all of the variables are likely to be of interest, and it is likely that combinations of variables could affect the number of blinks.

EXAMPLE C: OPTICAL ILLUSIONS

There are pictures that can be looked at in two ways. For example, there is a well-known father and son optical illusion (see, for example, http://www.moillusions.com/2010/07/father-and-son-optical-illusion.html). The group of students who thought of this topic were interested not only in which picture people saw first and how long they took to see it, but also in whether they were interested in seeing the other picture and whether they were right- or left-handed. The investigators also recorded each subject’s gender and age. A brief explanation was given to each subject before showing the picture, namely, “I’m going to show you a picture that could be seen as a picture of an old man or of a young man. Tell me as soon as you’ve seen either the old or the young man, and which one you see.”

In this module, we consider only the variables time to see a picture, which picture was seen and the gender of the subject.


EXAMPLE D: THE FLIGHT OF PAPER PLANES

This student group investigated variables that might affect the distance and the flight time of different designs and materials of paper aeroplanes. The experiment was conducted in an enclosed space to minimise the influence of the weather. Three different plane designs were made using three different types of paper (rice, plain and cartridge), and each combination was thrown four times by each of four different throwers. For each throw, the flight time, distance, type of landing (nosedive/glide), position on landing (upright/not) and whether there had been any obstacles were all recorded. All flights took place on the same day in the same location. The order in which the planes were thrown was randomised.

In this module, the flight times, flight distance, and the design and paper type will be considered.
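The description says the throwing order was randomised, but not how. One way to do it is sketched below in Python, under stated assumptions. The design and thrower labels are hypothetical, since the source names only the three paper types. `random.shuffle` with a fixed seed stands in for whatever method the students actually used.

```python
import itertools
import random

designs = ["design 1", "design 2", "design 3"]                   # hypothetical labels
papers = ["rice", "plain", "cartridge"]                          # paper types from the study
throwers = ["thrower 1", "thrower 2", "thrower 3", "thrower 4"]  # hypothetical labels
THROWS_EACH = 4

# Every design x paper combination, thrown four times by each thrower.
schedule = [(design, paper, thrower, throw)
            for design, paper, thrower in itertools.product(designs, papers, throwers)
            for throw in range(1, THROWS_EACH + 1)]

rng = random.Random(2011)   # fixed seed only so the order can be reproduced
rng.shuffle(schedule)       # randomise the order in which the throws are made

print(len(schedule), "throws; first three:", schedule[:3])
```

Listing every throw before shuffling guarantees the balanced design described above. Each design and paper combination appears 16 times (four throwers, four throws each), and only the order changes.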

EXAMPLE E: BODY STATISTICS

The students conducting this investigation were interested in a variety of body measurement data and the person’s ability to perform unique body-related skills (touching toes, touching nose with tongue, curling tongue). They took nine different body measurements as well as recording gender and age and the three body-related skills. In this module, we will consider head circumference (measured around eyebrows, in cm), age, shoulder width (shoulder tip to shoulder tip, in cm) and gender.

EXAMPLE F: REFLEXES

The group conducted an experiment to investigate human reflexes. A ruler was dropped (from 15.2 cm above the hand and by the same group member) on the count of three, and the aim was to catch the ruler as quickly as possible. The subject’s forearm was positioned perpendicular to the body while the thumb was at right angles to the fingers. A green fluorescent ruler and a clear ruler were used, and each subject was asked to catch each ruler once with each hand (right/left). For each subject, a coin toss randomised both the order in which the different rulers were dropped and which hand the subject would use first. Distances were measured from the bottom of the ruler to the catching position. For each subject, age, gender and dominant hand were recorded, as well as the result for each of their “catches”, including if they missed altogether.

QUARTILES AND BOXPLOTS

Students have become familiar with the concept and use of medians of quantitative data since Year 7. When the data are ordered from smallest to largest, the median of the data is the “middle” observation, with an equal number of the observations less than it and greater than it.

For an odd number of observations, the median is the middle observation. For example, for 51 observations, the median is the 26th observation after the data are ordered from smallest to largest, because the 26th observation has 25 values on each side of it. Thus for an odd number of observations, the median is one of the data values, and it has half of the rest of the observations on each side of it.

{9}The Improving Mathematics Education in Schools (TIMES) Project

For an even number of observations, any value between the middle two has equal

numbers of observations on each side of it, and the convention is that we take the median

as the midpoint of the two central values. Hence for an even number of observations, the median need not be one of the observations (it is one only when the two central values are equal), but it still has half of the observations on each side of it.
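As a quick check of the odd and even cases, here is a minimal Python sketch of the median rule. The function name `median` and the small lists are illustrative only; the last line uses 51 observations, as in the example above.

```python
def median(values):
    """Median of a list of numbers.

    Odd count: the middle observation after sorting.
    Even count: the midpoint of the two central observations.
    """
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("the median of no observations is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # (n - 1) / 2 values lie on each side
    return (s[mid - 1] + s[mid]) / 2    # midpoint of the two central values


print(median([7, 3, 9]))            # 7
print(median([3, 7, 9, 10]))        # 8.0 (not one of the observations)
print(median([3, 7, 7, 10]))        # 7.0 (the two central values are equal)
print(median(list(range(1, 52))))   # 26: the 26th of 51 ordered values
```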

If a stem‑and‑leaf plot is readily available, it is easy to obtain the median from it. Below are

two stem‑and‑leaf plots for the data of Example B, of the number of times per minute a

person blinks, one for the 48 females and the other for the 53 males in the dataset.

EXAMPLE B: MEDIAN NUMBER OF BLINKS PER MINUTE FOR FEMALES AND MALES

Number of blinks per minute for 48 females of Example B
Leaf unit = 1.0

0 | 4
0 | 78889
1 | 111224
1 | 55666667799
2 | 233333444
2 | 5789
3 | 334
3 | 5789
4 | 00
4 | 88
5 | 3

For 48 observations, the median is the midpoint of the 24th and the 25th (ordered) observations. From the stem-and-leaf plot, we see that the 24th is 22 and the 25th is 23, so the median is taken as 22.5. Note that the variable, number of blinks per minute, is a count variable, but we do not round the median to a whole number because it estimates the blink rate that a female is equally likely to be above or below.
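Reading values off a stem-and-leaf plot is mechanical, so it can be sketched in a few lines of Python. The sketch assumes single-digit leaves and the stated leaf unit, so that each value is (10 × stem + leaf) × leaf unit; the function name `expand_stem_and_leaf` is our own. It rebuilds the 48 female observations from the plot above and recovers the median of 22.5.

```python
# Rows of the stem-and-leaf plot for the 48 females in Example B (leaf unit = 1.0).
FEMALE_ROWS = [
    "0 4", "0 78889", "1 111224", "1 55666667799", "2 233333444",
    "2 5789", "3 334", "3 5789", "4 00", "4 88", "5 3",
]


def expand_stem_and_leaf(rows, leaf_unit=1):
    """Return the sorted data values encoded by 'stem leaves' rows.

    Assumes every leaf is a single digit, so value = (10 * stem + leaf) * leaf_unit.
    """
    values = []
    for row in rows:
        stem, leaves = row.split()
        values.extend((10 * int(stem) + int(leaf)) * leaf_unit for leaf in leaves)
    return sorted(values)


females = expand_stem_and_leaf(FEMALE_ROWS)
n = len(females)
middle_pair = (females[n // 2 - 1], females[n // 2])   # 24th and 25th values
print(n, middle_pair, sum(middle_pair) / 2)             # 48 (22, 23) 22.5
```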

Number of blinks per minute for 53 males of Example B
Leaf unit = 1.0

0 | 2
0 | 56678
1 | 1133444
1 | 55566677778899
2 | 12223344
2 | 55668
3 | 012244
3 | 558
4 | 24
4 | 7
5 | 0

{10} A guide for teachers

For the 53 males, the median is the 27th observation as it has 26 observations on either

side of it. From the stem‑and‑leaf plot, we see that the 27th observation is 19. Note that it

is immaterial that there are two values of 19 in the dataset – 19 is still the 27th observation

whether we approach it from the top or the bottom of this dataset.

The quartiles split each half of the (ordered) dataset into two again, so that the quartiles together with the median divide the dataset into four parts with equal numbers of observations in each

“quarter”. Hence, once the median has divided the data into two, with equal numbers of

observations in each “half”, then the lower quartile can be thought of as the median of

the lower half of the data, and the upper quartile can be thought of as the median of the

upper half.

This is illustrated using the above example.

EXAMPLE B: QUARTILES FOR THE NUMBER OF BLINKS PER MINUTE FOR FEMALES AND MALES

For the 48 females, the group below the median has 24 observations. Hence the median

of this group is taken as the midpoint between the 12th and 13th observations from the

smallest. Looking at the stem‑and‑leaf plot, this is the midpoint of 14 and 15, and so

the lower quartile is 14.5. The group above the median also has 24 observations, from

observation number 25 to observation number 48. Hence the median of this group is

taken as the midpoint between the 12th and 13th observations from the largest. Looking

at the stem‑and‑leaf plot, this is the midpoint between 33 and 29, and hence the upper

quartile is 31.

For the 53 males, the group below the median has 26 observations. Hence the median

of this group is the midpoint between the 13th and 14th observations. From the stem‑

and‑leaf plot, we see this is the midpoint between 14 and 15 and hence is 14.5. The group

above the median also has 26 observations, from observation number 28 to observation

number 53, and the upper quartile is therefore the midpoint between the 13th and 14th

observations from the largest. Looking at the stem‑and‑leaf plot, this is the midpoint

between 30 and 28. Hence the upper quartile is 29.
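The "median of each half" method used above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular statistics package; packages use several slightly different quartile conventions, so their values may differ a little from these. For an odd number of observations the median itself is left out of both halves, which matches the 26-observation halves used for the males. The data are rebuilt from the two stem-and-leaf plots, and the helper names are our own.

```python
# Stem-and-leaf rows (leaf unit 1.0) for the females and males of Example B.
FEMALE_ROWS = ["0 4", "0 78889", "1 111224", "1 55666667799", "2 233333444",
               "2 5789", "3 334", "3 5789", "4 00", "4 88", "5 3"]
MALE_ROWS = ["0 2", "0 56678", "1 1133444", "1 55566677778899", "2 12223344",
             "2 55668", "3 012244", "3 558", "4 24", "4 7", "5 0"]


def expand(rows):
    """Values from stem-and-leaf rows with leaf unit 1 (value = 10 * stem + leaf)."""
    return [10 * int(stem) + int(leaf)
            for stem, leaves in (row.split() for row in rows) for leaf in leaves]


def median(s):
    """Median of an already sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def quartiles(values):
    """(lower quartile, median, upper quartile) as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2                 # for odd n the median is in neither half
    return median(s[:half]), median(s), median(s[len(s) - half:])


print(quartiles(expand(FEMALE_ROWS)))  # (14.5, 22.5, 31.0)
print(quartiles(expand(MALE_ROWS)))    # (14.5, 19, 29.0)
```

Both results agree with the hand calculations from the stem-and-leaf plots.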

We have seen that the median provides information on where the data are centred

or located, and that the overall range from minimum to maximum provides some

information on the spread of the data. However, the smallest or largest observation can

sometimes be quite a distance from the bulk of the data, and the overall range could be

misleading with regard to where most of the data are. A measure of spread that is not as

vulnerable to extremes is the inter‑quartile range – the distance between the quartiles. This

gives the range of the middle 50% of the data.

In the above example, for the females, the median number of blinks per min is 22.5 and

the inter‑quartile distance is 16.5. For the males, the median number of blinks is 19 and the

inter‑quartile distance is 14.5. There is little difference between females and males, with the

females having a slightly higher median and being slightly more variable than the males.


The minimum, maximum, median and the two quartiles are sometimes called the five-number summary. The lower quartile is sometimes called the first quartile, because it marks the first quarter of the (ordered) data. The median is then the second quartile, although this term is very seldom used, and the upper quartile is called the third quartile because it marks three-quarters of the way through the data from smallest to largest.
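The five-number summary, together with the overall range and the inter-quartile range, can be computed in the same way. This is a minimal sketch that assumes the median-of-halves convention from the worked example (other conventions give slightly different quartiles) and uses the female and male data read from the stem-and-leaf plots; the function names are our own.

```python
# Stem-and-leaf rows (leaf unit 1.0) for the females and males of Example B.
FEMALE_ROWS = ["0 4", "0 78889", "1 111224", "1 55666667799", "2 233333444",
               "2 5789", "3 334", "3 5789", "4 00", "4 88", "5 3"]
MALE_ROWS = ["0 2", "0 56678", "1 1133444", "1 55566677778899", "2 12223344",
             "2 55668", "3 012244", "3 558", "4 24", "4 7", "5 0"]


def expand(rows):
    """Values from stem-and-leaf rows with leaf unit 1 (value = 10 * stem + leaf)."""
    return [10 * int(stem) + int(leaf)
            for stem, leaves in (row.split() for row in rows) for leaf in leaves]


def median(s):
    """Median of an already sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(minimum, lower quartile, median, upper quartile, maximum); needs at least 2 values."""
    s = sorted(values)
    half = len(s) // 2
    return s[0], median(s[:half]), median(s), median(s[len(s) - half:]), s[-1]


for label, rows in [("females", FEMALE_ROWS), ("males", MALE_ROWS)]:
    lo, q1, med, q3, hi = five_number_summary(expand(rows))
    print(label, (lo, q1, med, q3, hi), "range", hi - lo, "IQR", q3 - q1)
# females (4, 14.5, 22.5, 31.0, 53) range 49 IQR 16.5
# males (2, 14.5, 19, 29.0, 50) range 48 IQR 14.5
```

The overall ranges are almost identical (49 and 48), while the inter-quartile ranges (16.5 and 14.5) show the slightly greater spread of the middle half of the female data.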

These five summary statistics are the key information in a boxplot, which is explained in the diagram below.

[Diagram: a simple boxplot labelled with the minimum, 1st quartile, median, 3rd quartile and maximum; the box between the quartiles contains 50% of the data.]

The above diagram is the simplest form of boxplot, but it has a disadvantage in that there

is no information on how far the minimum and the maximum are from the rest of the

data. A version of the boxplot more often used in statistics draws the “whiskers” from the

box to the data points that are within a certain distance from the edges of the box, and

marks the data points that are outside this distance by *’s. The boxplot below illustrates

this for the overall dataset of Example B.

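[Figure: Boxplot of number of blinks. Vertical axis: no. of blinks (0 to 60). The box runs from the lower quartile to the upper quartile, a distance d; the whiskers extend up to 1.5d from the box, and one point beyond this is marked *.]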

If the inter‑quartile distance is denoted by d, then the whiskers go out to the last data point

inside the distance 1.5d from the edges of the box. Any data points outside this distance

from the box are marked by *’s.
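The 1.5d rule for whiskers and starred points can be sketched as below, applied to all 101 observations of Example B (females and males combined, rebuilt from the two stem-and-leaf plots). This is a minimal sketch assuming the median-of-halves quartiles; software packages may compute quartiles slightly differently, so their boxes can differ a little. The function name `boxplot_parts` and the parameter `k` are our own.

```python
# Stem-and-leaf rows (leaf unit 1.0) for the females and males of Example B.
FEMALE_ROWS = ["0 4", "0 78889", "1 111224", "1 55666667799", "2 233333444",
               "2 5789", "3 334", "3 5789", "4 00", "4 88", "5 3"]
MALE_ROWS = ["0 2", "0 56678", "1 1133444", "1 55566677778899", "2 12223344",
             "2 55668", "3 012244", "3 558", "4 24", "4 7", "5 0"]


def expand(rows):
    """Values from stem-and-leaf rows with leaf unit 1 (value = 10 * stem + leaf)."""
    return [10 * int(stem) + int(leaf)
            for stem, leaves in (row.split() for row in rows) for leaf in leaves]


def median(s):
    """Median of an already sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def boxplot_parts(values, k=1.5):
    """Quartiles, whisker ends and starred points for a boxplot using the k*d rule."""
    s = sorted(values)
    half = len(s) // 2
    q1, q3 = median(s[:half]), median(s[len(s) - half:])
    d = q3 - q1                                   # inter-quartile distance
    low_limit, high_limit = q1 - k * d, q3 + k * d
    inside = [v for v in s if low_limit <= v <= high_limit]
    return {
        "q1": q1, "median": median(s), "q3": q3, "d": d,
        "whiskers": (inside[0], inside[-1]),      # last data points within k*d of the box
        "stars": [v for v in s if v < low_limit or v > high_limit],
    }


parts = boxplot_parts(expand(FEMALE_ROWS) + expand(MALE_ROWS))
print(parts)
# q1 = 14.5, median = 21, q3 = 29.5, d = 15.0, whiskers (2, 50), stars [53]
```

Here the whisker limits are 14.5 - 22.5 = -8 and 29.5 + 22.5 = 52, so only the largest observation (53 blinks per minute) is starred.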


We see in the boxplot of number of blinks that there is only one data point further than 1.5 times the inter-quartile distance from the quartiles, so this version gives little more information than the simpler boxplot showing only the five-number summary. However, for other datasets, the simpler boxplot may hide information that the fuller version reveals.

Note that the axis giving the values of the data is vertical. We will see why in the examples

below – it is for ease of presenting many boxplots on one graph. So in the boxplot above,

the median is approximately 21, and the lower and upper quartiles are approximately 14

and 29.

Note also that the horizontal dimension of the boxplot above has no meaning.

When a number of boxplots are presented on the same graph, this dimension simply

accommodates the number of boxplots.

In the examples in the Year 9 module, it is seen that comparing more than two histograms

on the same scale is not necessarily straightforward, while back-to-back stem-and-leaf plots

can compare only two groups of data at once. Many boxplots can be drawn on the same

graph and hence boxplots provide a convenient way of comparing many groups at once.

Of course there are disadvantages and cautions to go with this quick and easy graphical

comparison of groups of data. Apart from providing only a summary of the data with

much detail omitted, the main caution in using boxplots is that they give no information on the number of observations. An associated caution is that boxplots should not be used

for small sets of data. Guidelines are sometimes given, but from the fact that boxplots

essentially divide the data into 4 groups with roughly equal numbers of observations in

each, we can see that 20 or more observations per group is reasonable, and that boxplots

for fewer than 12 observations per group could be misleading.

How do boxplots compare with the other plots used for continuous data? To illustrate, below are a dotplot and a histogram of the overall number of blinks shown above in a boxplot.

[Figure: Dotplot of no. of blinks (horizontal axis 7 to 49) and histogram of no. of blinks (vertical axis: frequency).]


We see that the data are slightly skew to the right, which is shown in the boxplot by the

upper half of the box being slightly longer than the lower half, and the upper whisker

being longer than the lower whisker. Notice that both the dotplot and the histogram

suggest that there may be two groups in these data, but a boxplot cannot suggest this.
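The visual check for skewness described here (comparing the two halves of the box and the two whiskers) amounts to simple subtraction. In this minimal sketch the five values are those obtained for the combined Example B data with the median-of-halves quartiles (lower whisker end 2, lower quartile 14.5, median 21, upper quartile 29.5, upper whisker end 50). The function names are our own, and the comparison is only a rough visual guide, not a formal measure of skewness.

```python
def box_and_whisker_lengths(whisker_low, q1, med, q3, whisker_high):
    """Lengths of the two whiskers and the two halves of the box."""
    return {
        "lower whisker": q1 - whisker_low,
        "lower half of box": med - q1,
        "upper half of box": q3 - med,
        "upper whisker": whisker_high - q3,
    }


def looks_right_skewed(lengths):
    """Rough visual rule (our own): upper half of box and upper whisker both longer."""
    return (lengths["upper half of box"] > lengths["lower half of box"]
            and lengths["upper whisker"] > lengths["lower whisker"])


blinks = box_and_whisker_lengths(2, 14.5, 21, 29.5, 50)
print(blinks)                       # whiskers 12.5 and 20.5; box halves 6.5 and 8.5
print(looks_right_skewed(blinks))   # True
```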

Despite the disadvantages of boxplots, we see in the examples in the next section just

how useful they are in comparing continuous data across a number of categories.

USING BOXPLOTS TO COMPARE CONTINUOUS DATA ACROSS CATEGORIES; COMPARISONS WITH HISTOGRAMS AND DOTPLOTS

Data on the continuous variable (or count data with many different values) of some of the

above examples are now explored across one or two of the categorical variables, using

boxplots (on the same scale). Some comparisons with histograms and dotplots are also made.
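Because a boxplot does not show how many observations it summarises, it helps to tabulate the group size alongside the five-number summary for each category before comparing boxplots. This minimal sketch groups (category, value) pairs and flags any group below 12 observations, following the guideline given earlier that boxplots of fewer than 12 observations per group could be misleading. The Example B blink data split by subject gender serve as the illustration; the function and constant names are our own.

```python
from collections import defaultdict

# Stem-and-leaf rows (leaf unit 1.0) for the females and males of Example B.
FEMALE_ROWS = ["0 4", "0 78889", "1 111224", "1 55666667799", "2 233333444",
               "2 5789", "3 334", "3 5789", "4 00", "4 88", "5 3"]
MALE_ROWS = ["0 2", "0 56678", "1 1133444", "1 55566677778899", "2 12223344",
             "2 55668", "3 012244", "3 558", "4 24", "4 7", "5 0"]
MIN_GROUP_SIZE = 12   # the text warns that boxplots of fewer than 12 could mislead


def expand(rows):
    """Values from stem-and-leaf rows with leaf unit 1 (value = 10 * stem + leaf)."""
    return [10 * int(stem) + int(leaf)
            for stem, leaves in (row.split() for row in rows) for leaf in leaves]


def median(s):
    """Median of an already sorted, non-empty list."""
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def group_summaries(pairs):
    """Map each category to (n, five-number summary or None, too_small flag)."""
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)
    result = {}
    for category, values in groups.items():
        s = sorted(values)
        half = len(s) // 2
        summary = None
        if half >= 1:   # quartiles need at least one value in each half
            summary = (s[0], median(s[:half]), median(s), median(s[len(s) - half:]), s[-1])
        result[category] = (len(s), summary, len(s) < MIN_GROUP_SIZE)
    return result


pairs = ([("female", v) for v in expand(FEMALE_ROWS)]
         + [("male", v) for v in expand(MALE_ROWS)])
for category, (n, summary, too_small) in group_summaries(pairs).items():
    note = "  (fewer than 12: boxplot could be misleading)" if too_small else ""
    print(f"{category}: n = {n}, five-number summary = {summary}{note}")
```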

EXAMPLE A: GOGOGO!

Below are boxplots and histograms, on the same scale, of the time in seconds to travel the last 50-metre section, split by the colour of the lights.

[Figure: Boxplots of time in secs by lights (green, amber), with two extreme values marked * for green; histograms of time in secs for each colour of lights on the same scale (vertical axis: frequency).]


The boxplots give us an instant comparison, showing that the time of approach to amber

lights is generally less than that of approach to green, with the difference in medians being

about 0.6 sec over 50 metres. The inter‑quartile distances are similar and generally the

variation in times is similar. There are two extreme values for the times to approach green

– one large and one small. These extreme values in the histogram give the impression

of the times to approach green being more variable than the times to approach amber,

but note how the boxplots emphasize that they are extremes and that apart from them,

the variability in times to approach amber and green is not too dissimilar. The times

to approach green are skew to the right; the times to approach amber are slightly

asymmetric but are not particularly skew to right or left.

EXAMPLE C: OPTICAL ILLUSIONS

Below are histograms, dot plots and boxplots of the time (in secs) to see a picture, split by which picture was seen first (old or young man).

[Figure: Histograms and dotplots of time to see (secs), shown separately for subjects who saw the old man first and those who saw the young man first; boxplots of time to see by old or young, with extreme values marked *.]

We see how the boxplots reflect the dot plots and the histograms. The median times to

see a picture are much the same whether a person sees the old or young man first, but

for those who saw the old man first, the times are very skew to the right, more variable

for the central 50% of subjects’ times, and there are a number who took much longer. For

those who saw the picture of the young man first, the times are only slightly skew to the

right, and even the slowest to see the young man was not an extreme time for those who

saw the old man.

Does there tend to be any difference if the subject is a boy or girl? Below are boxplots

of the time to see a picture, across the combination of which picture was seen first and

whether the subject was a boy or girl.

[Figure: Boxplots of time to see by gender (boy, girl) and by old or young, with extreme values marked *.]

We see that although the overall tendencies noted above apply to both boys and girls,

there was a much greater contrast between the times for boys to see the old man and the young man than there was for girls. Although the median time to see each picture was about the

same, for the boys who saw the young man first, the times were much less variable and

much less skew to the right than for the boys who saw the old man first.

Note how being able to look at four boxplots on the same scale in just one graph provides

an excellent overview of the comparisons across datasets. However, we would need to check how many are in each group to ensure we do not have boxplots with greatly uneven numbers of observations. In this example there are at least 40 in each group, out of a total of 203 observations.


EXAMPLE B: HOW OFTEN DO PEOPLE BLINK?

For this example in the Year 9 module, it was noted that there is little difference between

the male and female subjects, but there seems to be quite a difference in the number of blinks per minute of subjects depending on whether they were interviewed by a male or a female. (Recall that each of the two pairs of data collectors consisted of an interviewer and an observer, and that each pair was either two females or two males.) The number of blinks per

minute tended to be generally greater and more variable for the female interviewer than

the male interviewer. Could this be due to the way the interviewer asked the questions or

a difference in response to male and female interviewers?

A question that immediately arises is whether the different combinations of interviewer

and subject genders show any effects. Below are histograms and boxplots of the number

of blinks per minute with the data divided into the 4 groups formed by these different

combinations.

[Figure: Histograms of number of blinks, on the same scale, for the four combinations of subject gender (female, male) and observer gender (female observer, male observer); boxplots of number of blinks by gender and observer, with extreme values marked *.]


The boxplots provide us with an instant overview that emphasizes the differences between female and male observers for female and male subjects. With female observers, the numbers of blinks are more variable for both subject genders, but much more so for female subjects than for male subjects. For the male observers, the median and the spread of the number of blinks are similar for female and male subjects, but the female data are considerably skew to the right, much more so than the male data.

If we are just using the boxplots, we should check the numbers of observations in each

group. There are approximately 25 subjects in each of the four groups, so there are no

problems in using boxplots.

Once again we see the usefulness of boxplots in an overview of comparisons between

more than two groups of continuous data (Note that in this case the data are count data

but with many different values so using plots and graphs designed for continuous data

is appropriate).

EXAMPLE D: THE FLIGHT OF PAPER PLANES

In the Year 9 module, histograms and stem‑and‑leaf plots are considered for the flight

times in seconds for the different designs and for the different paper types. The flight times are considered for the three designs and for the three paper types, and then for the three paper types separately within each of the three designs. However, in the Year 9 module, this involves

looking at 9 stem‑and‑leaf plots or 9 histograms. It can be quite difficult to compare more

than two histograms, and even when the stem‑and‑leaf plots are done in groups of three

on the same scale, it is difficult to gain an overview of the combined effects of design and

paper type.

Below are boxplots of the flight times split by both design and type of paper. There are 16

observations for each combination of design and paper type, so it is appropriate to use

boxplots and we also know that we have equal numbers of observations in each boxplot.

Note that the reason we have equal numbers of observations in each group is that

this was an experimental investigation, with controls over all the experimental variables

of plane design, paper type and thrower (4 throwers), with each combination of design,

paper type and thrower replicated 4 times in a random order. There were two response

variables observed: flight time and distance of landing point of plane. (Also observed were

the landing position, type of landing and whether there was any interference.)


[Figure: Boxplots of flight time by design (generic, Nick's paper aeroplane, stingray glider) and paper (rice, cartridge, plain), with extreme values marked *.]

Again the boxplots give us a quick overall view of the data. We see the comparisons for

the different paper types within and across designs. For the generic design, the rice paper

has more chance of giving a longer flight time but it also tends to be more variable. For

Nick’s design, the plain paper tends to produce longer flight times but they also are more

variable, and if the rice paper gives good flight times, they tend to be as good as or better

than the plain paper. For the stingray glider, the cartridge paper is best but again it is most

variable, except that if the rice paper works well, it gives exceptional flight times.

Overall the stingray glider and Nick’s design seem to be the best choices, but using

different papers for each (plain for Nick’s, cartridge for stingray). Using rice paper seems

capable of producing exceptional flight times but only occasionally.

SCATTERPLOTS AND EXPLORING RELATIONSHIPS BETWEEN QUANTITATIVE VARIABLES

In the section above, and in the Year 9 module, we have been using various plots to

explore and compare datasets of continuous data (and of count data with many different

values) across categories of one or more categorical variables. This can also be viewed

as exploring relationships between a continuous variable and one or more categorical

variables. Such comparisons are of interest in many and varied situations and contexts.

Another type of situation that arises many times in applications across all disciplines is exploring relationships between quantitative variables. This is often in the context

of exploring if and how a continuous variable varies across one or more quantitative

variables. The plot that is used to explore this is the scatterplot. Although it can be used for

only two quantitative variables at a time, it is very useful for data exploration. The examples

below also show how a categorical variable can be included, and the next section shows

how it can be used dynamically and how other variables can be incorporated in it.


To explore possible relationships between two variables, it tends not to matter which one

is assigned to the y-axis and which to the x-axis. However, if interest is in how one variable varies with, or is affected by, another, then the first should be assigned to the y-axis and the

second to the x‑axis so that we can see what tends to happen to the y‑variable as the

x‑variable changes.

The x‑variable can be any quantitative variable (continuous or count) but the y‑variable

should be a continuous variable (or a count variable taking many different values).

For a dataset consisting of pairs of quantitative values – that is, a dataset consisting of pairs

of values with each pair observed on the same subject – a scatterplot plots points on a

plot with two axes, with the horizontal axis corresponding to the first value of each pair,

and the vertical axis corresponding to the second value of each pair.
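To make the construction concrete, here is a rough character-based scatterplot in Python: the first value of each pair sets the horizontal position and the second sets the vertical position. It is a teaching sketch only (real plots would be drawn with graphing software), each scale simply runs from the smallest to the largest observed value, and the four (age, head circumference) pairs are made up for illustration; they are not the Example E data.

```python
def text_scatterplot(pairs, width=40, height=12):
    """Return the lines of a crude character scatterplot of (x, y) pairs.

    x (first value of each pair) runs left to right; y (second value) runs bottom to top.
    Each axis spans the observed range of its variable.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)

    def position(value, lo, hi, cells):
        # Scale a value to a cell index 0 .. cells - 1 (0 if all values are equal).
        return round((value - lo) / (hi - lo) * (cells - 1)) if hi > lo else 0

    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        col = position(x, x_lo, x_hi, width)
        row = position(y, y_lo, y_hi, height)
        grid[height - 1 - row][col] = "*"         # grid row 0 is the top of the plot
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Hypothetical (age in years, head circumference in cm) pairs, for illustration only.
example_pairs = [(10, 50.0), (25, 55.5), (39, 52.5), (60, 57.0)]
print("\n".join(text_scatterplot(example_pairs)))
```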

EXAMPLE E: BODY STATISTICS

Below is a scatterplot of head circumference (measured around eyebrows, in cm), and

age in years.

[Figure: Scatterplot of head vs age. Horizontal axis: age (0 to 70 years); vertical axis: head circumference (45 to 65 cm).]

Each dot represents one observation with a pair of values. The first value of the pair is on

the horizontal axis and the second value of the pair is on the vertical axis. For example, the

point highlighted in the above plot corresponds to a person of age 39 years with a head

circumference of 52.5cm.

Looking at the plot, does there seem to be any relationship between head circumference

and age? There are a few children in this dataset, aged under 12, with smaller heads than most of the older subjects, but not by much, and quite a few older people have head circumferences as small. One person aged about 35 years has a smaller head circumference than the child aged 2 years, which seems highly unlikely. This data point would have to be checked in case it was a mistake, but there is another subject aged 22 years with a head circumference not much larger. Perhaps the reliability of the measurements of head circumference needs to be checked.


In general, what the plot shows is that there is very little relationship between head

circumference and age, but there is a lot of variability! That is, for people of the same

approximate age, there is a lot of variability in their head circumferences.

Below is a scatterplot of head circumference (measured around eyebrows, in cm), and

shoulder width in cm.

[Figure: Scatterplot of head vs shoulders. Horizontal axis: shoulder width (20 to 55 cm); vertical axis: head circumference (45 to 65 cm).]

We see that people with wide shoulders tend to have bigger heads, but that people with

smaller or medium shoulder widths can have a wide range of head circumferences.

Clearly age should be taken into account and so should gender. We can look at part of

this dataset, restricting to adults for example, but how can we allow for gender?

This scatterplot is repeated below, restricted to people at least 18 years old, and with

different symbols for males and females.

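[Figure: Scatterplot of head vs shoulders for people aged 18 and over, with different symbols for females and males. Horizontal axis: shoulder width (35 to 55 cm); vertical axis: head circumference (45 to 65 cm).]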

In this plot different symbols are used for females and males in plotting head

circumference against shoulder width. We see that for males and females, there is very little

relationship between head circumference and shoulder width, and that there appears to be

a group of females with unusually small head circumferences for their shoulder widths.

There is no particular reason why we should plot head circumference on the vertical axis

and shoulder widths on the horizontal axis. Below is the plot of the same data, with the

axes reversed.



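[Figure: Scatterplot of shoulders vs head for people aged 18 and over, with different symbols for females and males. Horizontal axis: head circumference (45 to 65 cm); vertical axis: shoulder width (35 to 55 cm).]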

We see that there is a lot of variation in male shoulder widths for similar head sizes,

although there is a slight tendency for larger shoulders to correspond to larger heads.

This tendency is also present in the females, but again there appear to be two different groups. Perhaps there are different ethnic groups present in the data?

Note that in scatterplots, because we are focussing on possible relationships between the

variables, the horizontal and vertical scales cover the range of the data; they do not need

to start at 0. Forcing the scales of scatterplots to start at 0 would often tend to obscure

information in the data. For example, below is a scatterplot of the head circumferences (in

cm) and the heights (in cm) for the dataset of Example E for ages 11 and above, with the

horizontal and vertical axes starting at 0. This plot is virtually useless for seeing how head

circumference varies with height in this dataset, as all the points are crowded together in

one corner of the plot.
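A little arithmetic shows why forcing the axes to start at 0 wastes space. The sketch below computes the fraction of the plotting area covered by the rectangle spanned by the data. The data ranges (heights 140 to 190 cm, head circumferences 50 to 62 cm) are assumed purely for illustration and are not read from Example E; the axis limits of 0 to 200 cm and 0 to 70 cm follow the axes of the plot described here. The function name is our own.

```python
def data_area_fraction(x_data, y_data, x_axis, y_axis):
    """Fraction of the plotting area covered by the bounding box of the data.

    Each argument is a (low, high) pair: a data range or a pair of axis limits.
    """
    x_share = (x_data[1] - x_data[0]) / (x_axis[1] - x_axis[0])
    y_share = (y_data[1] - y_data[0]) / (y_axis[1] - y_axis[0])
    return x_share * y_share


# Assumed, illustrative data ranges in cm; not taken from the Example E dataset.
heights, heads = (140, 190), (50, 62)

from_zero = data_area_fraction(heights, heads, (0, 200), (0, 70))
fitted = data_area_fraction(heights, heads, heights, heads)
print(f"axes from 0: {from_zero:.1%} of the plot area; axes fitted to the data: {fitted:.0%}")
```

Under these assumptions the points would occupy only about 4.3% of the plot when both axes start at 0, compared with the whole plotting area when the axes cover just the range of the data.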

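[Figure: Scatterplot of head_cm vs height_cm for ages 11 and above, with both axes starting at 0. Horizontal axis: height_cm (0 to 200); vertical axis: head_cm (0 to 70).]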


EXAMPLE F: REFLEXES

A ruler was dropped (from 15.2 cm above the hand and by the same person for each subject) on the count of three, and the aim for each subject was to catch the ruler as quickly as possible. The subject's forearm was positioned perpendicular to the body, with the thumb at right angles to the fingers. A green fluorescent ruler and a clear ruler were used, and each subject was asked to catch each ruler once with each hand (right/left). For each subject, a coin toss randomised both the order in which the different rulers were dropped and which hand the subject would use first. Distances were measured in cm from the bottom of the ruler to the catching position. For each subject, age, gender and dominant hand were recorded, as well as the result of each of their “catches”, including whether they missed altogether.

Below are scatterplots of the reflex distances for the fluorescent and the clear rulers

caught with the right hand, and of the reflex distance for the fluorescent ruler caught with

the right and then the left hand versus age (in years), with different symbols for dominant

hand. Remember that the smaller distances represent faster reflexes.

We see, as expected, that reflexes with the fluorescent and the clear rulers, caught with

the right hand, tend to be related, with people who are slower in catching the clear

ruler also tending to be slower in catching the fluorescent ruler. However, it is possibly

surprising how much variation there is between the two catches, given these are the same

subjects catching with the same hand. Possibly, there is considerable natural variation in

this activity; that is, it is not easy for a person to obtain very similar results each time.

For the catches with the right hand, we see that older people do tend to have slower

reflexes than younger people in general, but that there is enormous variation in the

reflexes of younger people. Not surprisingly, the left‑handed people did not tend to have

the fastest reflexes catching the ruler with the right hand, but what is interesting is that

their reflexes with the right hand did not tend to vary much with age.

[Figure: Scatterplot of right fluorescent vs right clear. Horizontal axis: right clear; vertical axis: right fluorescent (reflex distances in cm).]


[Figure: Scatterplot of right fluorescent vs age, with different symbols for right-handed (R) and left-handed (L) subjects. Horizontal axis: age (10 to 100 years); vertical axis: right fluorescent (reflex distance in cm).]

The most striking aspect of the scatterplot of the reflexes for catching the fluorescent ruler

with the left hand is the increase in variation for both right‑handed and left‑handed subjects.

[Figure: Scatterplot of left fluorescent vs age, with different symbols for right-handed (R) and left-handed (L) subjects. Horizontal axis: age (10 to 100 years); vertical axis: left fluorescent (reflex distance in cm).]


EXAMPLE G: FISHING

The scatterplot below shows the weights (in gm) and the lengths (in mm) of fish caught

during a weekend fishing expedition (on Stradbroke Island in Queensland).

[Figure: Scatterplot of weight (gm) vs length (mm). Horizontal axis: length (100 to 450 mm); vertical axis: weight (0 to 1000 gm).]

We can see that as length increases, weight also tends to increase, and it appears to be

a reasonably linear sort of trend, but what is interesting in this plot is that there appear to

be a number of subgroups of fish. In these sub‑groups, not only do the ranges of weights

and lengths tend to differ, but it appears that the relationship between weight and length

may differ to at least some extent between these groups.

EXAMPLE H: GUESSING TIME PERIODS

The scatterplot below shows guesses of a 10-second period by subjects of a range of ages, with ages recorded in decades. Can age group be treated here as a quantitative variable? Yes, because we can think in terms of measuring ages in decades. Is the scatterplot useful? In some ways it is, because it is like having 6 dotplots on the same scale in one plot, which gives an overview of the variation in guesses across the age groups.

The plot does not show any relationship between size of guess and age, although it does

show that the guesses are much more variable for some age groups than others.

[Figure: Scatterplot of 10 sec guess vs age group. Horizontal axis: age group (1 to 6); vertical axis: 10 sec guess (about 5 to 15).]


Note that categorical variables cannot and should not be used as either of the variables

in a scatterplot. Giving numerical codes to the categories of a categorical variable does

NOT turn it into a quantitative variable. For a quantitative variable, equal intervals between values must represent equal amounts of the quantity being measured.

DYNAMIC AND CLEVER SCATTERPLOTS IN DIGITAL MEDIA

The examples of scatterplots above illustrate their value in exploring data and the

variety and extent of situations in which they are potentially useful. But the examples

also illustrate that often we have more than two variables whose variation and inter‑

relationships we would like to explore. We see above that a categorical variable can

be included in a scatterplot through using different symbols, but that there are often

situations when we would like to explore the variation and inter‑relationships of more than

two quantitative variables.

The excellent Gapminder resources at http://www.gapminder.org/ provide an impressive range of innovative and dynamic plots of data from official international and national sources, with a particular focus on public health issues. These plots cleverly combine three continuous variables and one categorical variable, and time can be included as a further variable dynamically as the viewer follows the development of the plots over time.

Gapminder users can choose or adjust their variables, capture a plot at a timepoint (a

particular year) or capture a “worm trail” plot that follows the development of two of the

continuous variables for selected categories of the categorical variable.

Another feature of Gapminder that firmly places it among top-quality resources is that full details are available on exactly how the data were collected, and on any challenges or problems in collecting such data.

Some examples of captured plots and screen captures are given below, with comments.

EXAMPLE I: LIFE EXPECTANCY, INCOME PER PERSON, POPULATION, WORLD REGION AND TIME

The plots below are two from the sequence of plots over the years of life expectancy and income per person for the countries of the world. The population of the

country (the third continuous variable) is represented by the size of the bubble, and the

region of the world is represented by the colour of the bubble.

We see that there is a relationship, with life expectancy tending to increase as income

per person increases, but it does tend to “plateau” and there is much variation, particularly

amongst the countries with lower incomes per person, and in African countries in 2007.


EXAMPLE J: CO2 EMISSIONS, INCOME, POPULATION, WORLD REGION AND YEAR

Below is a screen capture of one of the plots over time of CO2 emissions (in tonnes per

person) versus income per person, for each country, again with bubble size representing

population size and bubble colour representing the region of the world.

Note that the relationship between CO2 emissions per person and income per person is

curved and appears to have less variation than for life expectancy and income per person,

but the variation increases as the income per person increases.

In the dynamic plots on the Gapminder website, the country’s name appears as the cursor

moves across a bubble. The next example includes the names of some countries.


EXAMPLE K: MATHS RESULTS FOR GRADE 8

The plot below is a screen capture of one of the plots over time of 8th grade Maths results versus income per person, with the size of each bubble representing the relevant population and its colour representing the region of the world. The data are based on an international maths test for children in the 4th and 8th grades, from TIMSS (Trends in International Mathematics and Science Study).

LINKS ACROSS F-10

From F‑9, students have gradually developed understanding of, and familiarity with, the concepts and use of the statistical data investigation process, types of data and variables, types of investigations, and some graphical and summary presentations of data appropriate for the different types of data. Students have planned and carried out data investigations involving different types of variables, and used a variety of graphical and summary presentations of data to explore and comment on features of data in relation to issues of interest. In Year 6 they have considered questions or issues involving two or more categorical variables, exploring how data from one categorical variable may be affected by another. In Year 9, students have extended these concepts and experiences to data investigations involving at least one quantitative (mostly continuous) and at least one categorical variable, and have used histograms and stem‑and‑leaf plots on the same scale, together with the summary statistics of mean, median and range, to explore and comment on features of quantitative data across categories of a categorical variable, including some concepts of the shape of data.
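The Year 9 comparison of a quantitative variable across the categories of a categorical variable, using the mean, median and range, can be sketched with Python's standard `statistics` module. The input format (a list of `(category, value)` pairs) and the function name `summarise_by_category` are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median


def summarise_by_category(pairs):
    """Count, mean, median and range of a quantitative variable per category.

    `pairs` is an iterable of (category, value) tuples, for example a
    measurement recorded for each student together with their group.
    """
    groups = defaultdict(list)
    for category, value in pairs:
        groups[category].append(value)  # collect values category by category
    return {
        category: {
            "n": len(values),
            "mean": mean(values),
            "median": median(values),
            "range": max(values) - min(values),
        }
        for category, values in groups.items()
    }
```

Placing the summaries for each category side by side supports the same kind of comparison as histograms or stem-and-leaf plots drawn on the same scale.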

Year 10 continues this theme, introducing boxplots as another graphical tool for such comparisons. From this focus on relationships between continuous and categorical variables, Year 10 then moves on to using scatterplots to explore relationships between quantitative (usually continuous) variables, including examples that also involve a categorical variable, and examples available in digital media that follow relationships over time in a dynamic way.
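A boxplot is drawn from the five-number summary: minimum, lower quartile (Q1), median, upper quartile (Q3) and maximum. The sketch below computes it for one group of values; comparing boxplots across categories means computing one summary per category. Quartiles can be defined in several slightly different ways, and the convention used here (the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd) is an assumption; software packages may give slightly different quartiles. The function name is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a boxplot.

    Q1 and Q3 are taken as the medians of the lower and upper halves of
    the sorted data, excluding the middle value when n is odd (one common
    school convention; other conventions exist).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + n % 2:]     # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])
```

For values already grouped into a dictionary `groups` that maps each category to a list of values, `{c: five_number_summary(v) for c, v in groups.items()}` gives the numbers for one boxplot per category.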

Throughout Years 1‑10, in considering more and more aspects of data investigations, students have experienced and discussed the challenges of obtaining randomly representative data, with emphasis on the importance of clearly reporting how, when and where data are obtained or collected, and of identifying the issues or questions for which the data need to be randomly representative. In Year 8, students used real data and simulations, including re‑sampling from real data, to illustrate how sample data and data summaries such as sample proportions and averages can vary across samples. The concepts explored in Years 7 and 8, namely the effects of sampling variability and ways of describing and/or allowing for variability within and across datasets, have been an important part of learning to comment on data in Years 9 and 10.
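Re-sampling from real data, as described above, can be sketched as follows. This is an illustration under assumptions rather than a prescribed classroom method: samples are drawn with replacement using `random.choices` (drawing without replacement with `random.sample` is an alternative when each sample is smaller than the dataset), and `resample_statistic` and `proportion_yes` are hypothetical helper names.

```python
import random
from statistics import mean  # a ready-made statistic to pass in, see below


def resample_statistic(data, sample_size, repeats, statistic, seed=None):
    """Apply `statistic` to each of `repeats` random samples taken from `data`.

    Each sample has `sample_size` values drawn with replacement
    (an assumption; random.sample would draw without replacement).
    """
    rng = random.Random(seed)  # a fixed seed makes a run reproducible
    return [statistic(rng.choices(data, k=sample_size)) for _ in range(repeats)]


def proportion_yes(sample):
    """Sample proportion of "yes" answers (hypothetical survey responses)."""
    return sample.count("yes") / len(sample)
```

For example, `resample_statistic(data, 20, 100, mean, seed=1)` gives 100 sample means from samples of size 20, and passing `proportion_yes` instead of `mean` gives 100 sample proportions. Displaying the results in a dot plot or histogram shows how much a sample summary varies from sample to sample.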

In exploring the practicalities and implications of obtaining data that can be used to comment on general situations or populations with respect to issues of interest, students have developed an understanding of the nature of censuses, surveys, observational studies and experimental investigations.

Throughout, concepts are introduced, developed and demonstrated in contexts that continue the ongoing development of experiential learning of the statistical data investigation process. The examples continue to illustrate the extent of statistical thinking involved in all aspects of a statistical data investigation: identifying the questions or issues, planning and carrying out the collection of data, exploring the data, and commenting in context on the information obtained from the data.

www.amsi.org.au

The aim of the International Centre of Excellence for Education in Mathematics (ICE‑EM) is to strengthen education in the mathematical sciences at all levels, from school to advanced research and contemporary applications in industry and commerce.

ICE‑EM is the education division of the Australian Mathematical Sciences Institute, a consortium of 27 university mathematics departments, CSIRO Mathematical and Information Sciences, the Australian Bureau of Statistics, the Australian Mathematical Society and the Australian Mathematics Trust.

The ICE‑EM modules are part of The Improving Mathematics Education in Schools (TIMES) Project.

The modules are organised under the strand titles of the Australian Curriculum:

• Number and Algebra
• Measurement and Geometry
• Statistics and Probability

The modules are written for teachers. Each module contains a discussion of a component of the mathematics curriculum up to the end of Year 10.