statistics 100 - simon fraser university

Statistics 100 Lecture Set 5

Post on 15-Apr-2022

•  Read chapters 8, 9 and 10

•  Suggested problems:
  –  Chapter 8: 8.3, 8.9, 8.11, 8.17, 8.19, 8.23, 8.25
  –  Chapter 9: 9.3, 9.7, 9.15, 9.19, 9.23, 9.27
  –  Chapter 10: 10.3, 10.5, 10.11, 10.23, 10.25

Measurement

•  Important question facing researchers: what data should I collect?

•  The text gives the example of a study addressing the question “Are people with larger brains more intelligent?” What to do?

Measurement

•  Question for your instructors: How do you measure how much a student knows?

•  Hard to do

Measurement

•  Measurement

Measurement

•  So, we would like the measurements we take to inform the problem at hand

•  What good are surveys and experiments when people often can’t measure the property they want to learn about?

How is the variable defined?

•  Example: Citizens for Public Justice released a report titled Bearing the Brunt: How the 2008-2009 Recession Created Poverty for Canadian Families

•  What is poverty and how is it measured?

Valid or invalid measurement?

•  Measurement is valid if it is a relevant representation of the property under study

•  Example: Number of alcohol-related deaths in BC since the proliferation of private liquor stores

•  Example: Using number of deaths to measure severity of a hurricane/typhoon

•  Example: Measuring the amount of snow to describe the severity of the winter

Valid or invalid measurement?

•  Valid or invalid?
  –  Sometimes the problem is a matter of scaling:
•  “Most accidents occur within 10 km of home”
•  Supposing this is true… why?

•  Counts are frequently influenced by how much
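Here is a minimal sketch of the scaling issue. The numbers are invented for illustration and do not come from any study. If most kilometres are driven near home, most accidents will happen near home even when the risk per kilometre is the same everywhere. Dividing each count by the exposure (here, kilometres driven) gives a rate that can be compared fairly.

```python
def accidents_per_million_km(accidents, km_driven):
    """Convert an accident count into a rate per million kilometres driven."""
    return accidents * 1_000_000 / km_driven

# Hypothetical data: (accidents, total km driven) in each zone.
zones = {
    "within 10 km of home": (600, 6_000_000),
    "beyond 10 km of home": (400, 4_000_000),
}

for zone, (accidents, km) in zones.items():
    rate = accidents_per_million_km(accidents, km)
    print(f"{zone}: {accidents} accidents, {rate:.0f} per million km")
```

With these made-up figures, 600 of the 1000 accidents happen near home, yet both zones have the same rate of 100 accidents per million km.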

Accuracy and Precision

•  If you want to know your weight, what do you do?
  –  Does it always give you the right answer?
  –  Different scales
  –  Different clothes
  –  Recent meals
  –  Recent…un-meals
  –  “Random” variation

Accuracy and Precision

•  Measurement = truth + bias + random error

Accuracy and Precision

•  How would you decrease variability?
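Here is a small simulation of Measurement = truth + bias + random error for a bathroom scale. All the numbers are assumptions chosen for illustration. Averaging several readings shrinks the random error, roughly by a factor of √n. However, every average stays centred near 71.5 kg rather than the true 70 kg, because repeating a measurement does not remove bias.

```python
import random
import statistics

# Hypothetical values for illustration only (kg).
TRUE_WEIGHT = 70.0   # the "truth"
BIAS = 1.5           # this scale reads 1.5 kg heavy every time
NOISE_SD = 0.8       # spread of the random error on each reading

def weigh(rng):
    """One reading: truth + bias + random error."""
    return TRUE_WEIGHT + BIAS + rng.gauss(0.0, NOISE_SD)

def average_of(n, rng):
    """Average of n repeated readings on the same scale."""
    return statistics.fmean(weigh(rng) for _ in range(n))

rng = random.Random(1)
for n in (1, 5, 50):
    # Repeat the whole weighing procedure many times to see how it varies.
    results = [average_of(n, rng) for _ in range(2000)]
    print(f"n={n:>2}: mean {statistics.fmean(results):.2f} kg, "
          f"sd {statistics.stdev(results):.2f} kg")
```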

Chapter 9

•  Basically, this chapter deals with thinking about what you are being told

Chapter 9

•  Interpreting numbers properly is hard. Why?

Chapter 9

•  How should you approach reported numbers and summaries?

Chapter 9

•  Consider a Coquitlam Now article on alcohol-related deaths:

•  “A recent study linking the number of alcohol-related deaths and illnesses to the proliferation of privately run liquor stores has those in the labour movement calling on the province for answers.”

•  During the study period, the number of private stores jumped from 727 in 2003 to 977 in 2008, while the number of government stores dropped from 222 in 2003 to 199 in 2008. There was little to no increase in the amount of bars and restaurants during that time, though the number of alcohol-related deaths rose: 1,937 in 2003; 1,983 in 2004; 2,016 in 2005; 2,086 in 2006; 2,074 in 2007; and 2,011 in 2008.
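This quick sketch lines up the figures quoted above, which is worth doing before accepting the claimed link. Between 2003 and 2008, private stores grew by about a third, while alcohol-related deaths grew by under 4%. Deaths also peaked in 2006 and then fell. The excerpt reports counts rather than rates, so population growth over the period is not accounted for.

```python
# Figures as quoted in the Coquitlam Now excerpt.
deaths = {2003: 1937, 2004: 1983, 2005: 2016,
          2006: 2086, 2007: 2074, 2008: 2011}

def pct_change(old, new):
    """Percent change going from old to new."""
    return (new - old) / old * 100

store_growth = pct_change(727, 977)   # private stores, 2003 -> 2008
death_growth = pct_change(deaths[2003], deaths[2008])
peak_year = max(deaths, key=deaths.get)

print(f"Private stores: {store_growth:+.1f}%")          # +34.4%
print(f"Alcohol-related deaths: {death_growth:+.1f}%")  # +3.8%
print(f"Deaths peaked in {peak_year}")                  # 2006

# Year-over-year changes: deaths fell in 2007 and 2008, even though
# private stores rose from 727 to 977 between 2003 and 2008.
years = sorted(deaths)
yoy = {cur: deaths[cur] - deaths[prev] for prev, cur in zip(years, years[1:])}
print(yoy)
```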

Chapter 9

•  Comparisons: look for key phrases
  –  One of the best…
  –  No one has more/better/lower…
  –  Compared to a leading brand…

Example

•  If the presenter stands to gain, expect bias

Chapter 9

•  Other ways to mislead:

Example

•  From a NY Times editorial about immigrant children:
  –  Immigrant children lagged in mastering standard academic English, the passport to college and to brighter futures. Whereas native-born children’s language skills follow a bell curve, immigrants’ children were crowded in the lower ranks: More than three-quarters of the sample scored below the 85th percentile in English proficiency.
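By definition, 85% of a reference group scores below its own 85th percentile. Here is a quick check, assuming (the excerpt does not say) that the percentile comes from the native-born bell curve. The mean of 100 and SD of 15 are placeholders, and any bell curve gives the same 85%.

```python
from statistics import NormalDist

# Hypothetical native-born score distribution; only its shape matters here.
native = NormalDist(mu=100, sigma=15)

p85 = native.inv_cdf(0.85)       # the 85th-percentile score
share_below = native.cdf(p85)    # fraction of native-born children below it

print(f"85th percentile score: {p85:.1f}")
print(f"Native-born below it: {share_below:.0%}")   # 85%
print("Immigrants' children below it: 'more than three-quarters' (> 75%)")
```

Under that assumption, “more than three-quarters below the 85th percentile” is consistent with immigrants’ children doing about as well as native-born children, or even slightly better. The statistic alone does not show that they are crowded in the lower ranks.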

Example

•  10-day weather forecast for Seattle

Day    1   2   3   4   5   6   7   8   9  10
High  44  47  48  47  47  43  45  34  47  46
Low   34  31  33  35  36  34  32  33  39  38

•  “Is this right?”
•  “What do they want me to conclude?”
•  “Are there other explanations for this besides what they want me to think?”
•  “Is the reporting incomplete?”
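Asking “Is this right?” can be partly mechanized by comparing each day with the rest. This sketch uses arbitrary thresholds (assumptions, not a standard rule) and flags day 8. That day’s high of 34 is only one degree above its low of 33 and far below the other highs. It may be a genuine cold day or a typo; the check only says it deserves a second look.

```python
from statistics import median

highs = [44, 47, 48, 47, 47, 43, 45, 34, 47, 46]
lows = [34, 31, 33, 35, 36, 34, 32, 33, 39, 38]

def flag_days(highs, lows, max_drop=8, min_spread=3):
    """Return day numbers (1-based) that look out of line with the rest.

    A day is flagged if its high is more than max_drop degrees below the
    median high, or if its high and low are less than min_spread apart.
    The thresholds are arbitrary choices for illustration.
    """
    typical_high = median(highs)
    flagged = []
    for day, (hi, lo) in enumerate(zip(highs, lows), start=1):
        if typical_high - hi > max_drop or hi - lo < min_spread:
            flagged.append(day)
    return flagged

print(flag_days(highs, lows))  # [8]
```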

Descriptive Statistics

•  Statistics deals with tools for collecting and understanding data

•  Have discussed ways to collect data

•  How do we deal with the data we collect?

•  Begin by summarizing the data

•  Want to describe or summarize data in a clear and concise way

•  Will first focus on descriptive statistics (graphical and numeric) in chapters 10-14

Recall …

•  Interested in something about a population

•  Population is a collection of individuals

•  Describe individuals with data

•  Data sets contain information/facts relating to individuals

•  Variables are attributes of an individual (e.g., hair color, pain severity, ...)

•  Distribution of a variable gives the values the variable can take and how often it takes on each value

Types of Variables

•  Two types of variables:
  –  Categorical variables: each individual falls into a category (ethnicity, machine works or does not, …)
    •  A special type of categorical data is ordered categorical (ordinal)
    •  Categories are ordered in a natural way
    •  Can apply ideas of >, < (ordering)
  –  Quantitative variables take on numeric values for which addition and averaging make sense (height, weight, income, …)

Types of Variables – Which type?

•  Hair color:

•  Color preference (red=1, blue=2, green=3):

•  Length of time slept:

•  Height of an individual:

•  Level of education (Some HS, HS Grad, Some post HS, Associate’s Degree, Bachelor’s Degree, Graduate Degree)
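Numeric codes do not make a variable quantitative. One way to see this is to average the codes: the result depends entirely on the arbitrary labelling. This sketch uses four made-up responses, the slide’s coding, and a second coding that is equally valid.

```python
from statistics import fmean

# Hypothetical color preferences from four people.
prefs = ["red", "green", "green", "blue"]

coding_slide = {"red": 1, "blue": 2, "green": 3}  # coding used on the slide
coding_other = {"red": 3, "blue": 1, "green": 2}  # an equally arbitrary coding

mean_slide = fmean(coding_slide[p] for p in prefs)
mean_other = fmean(coding_other[p] for p in prefs)
print(mean_slide, mean_other)  # 2.25 2.0
```

The “average preference” changes when the labels are reshuffled, and 2.25 does not correspond to any color. Averaging makes no sense here, even though the data are recorded as numbers.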

Chapter 10: Descriptive Statistics

•  Want to describe or summarize data in a clear and concise way

•  Two basic methods: graphical and numerical

Graphical Descriptions of Data

•  Often, a picture tells the entire story of the data

•  Have different plots for the different sorts of variables

•  A graph (or graphic) is any visual display of numbers

•  The goal of a graph is to
  –  Summarize information from a set of data into a picture that is easy to understand
  –  Help to highlight a specific story or point within the data (sometimes)

Graphical Descriptions of Data

•  Many ways to do this:
  –  Tables
  –  Pie charts
  –  Bar charts
  –  Histograms
  –  Time plots
  –  Line graphs
  –  Scatter-plots
  –  Custom-made graphics
  –  …

Graphical Descriptions of Data

•  Recall:
  –  Data are values of variables that we observe in a sample
  –  The sample was drawn from a population
  –  We are trying to learn something about the values of the variable in the population

Graphical Descriptions of Data

•  Distribution of a variable gives the values the variable can take and how often it takes on each value
  –  A population distribution is a distribution for a population of values
    •  Also called a probability distribution
  –  An empirical distribution is a distribution for a sample
    •  We have this information in the data

•  So in a graph, we use summaries of an empirical distribution to learn about a population distribution
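This small simulation draws samples from a known population distribution. The population proportions are invented for illustration. The resulting empirical distributions wander for small samples and settle near the population values as the sample size n grows, which is why a sample summary can tell us about the population.

```python
import random
from collections import Counter

# Hypothetical population distribution of a categorical variable.
population = {"A": 0.5, "B": 0.3, "C": 0.2}

def empirical_distribution(n, seed=0):
    """Draw a random sample of size n and return the observed proportions."""
    rng = random.Random(seed)
    values = list(population)
    sample = rng.choices(values, weights=[population[v] for v in values], k=n)
    counts = Counter(sample)
    return {v: counts[v] / n for v in values}

for n in (20, 200, 20_000):
    print(n, empirical_distribution(n))
```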

Graphical Descriptions for Categorical Data

•  For categorical data of any kind, we can summarize the distribution easily:
  –  Identify all of the values the variable can take
  –  Count the number of times each value is observed
  –  The count is often called a frequency
  –  Often compute percentages from the counts
  –  Can display in a table or a chart
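This sketch follows the steps above on a made-up sample of hair colors (hypothetical data). It counts each observed value, converts the counts to percentages, and prints the result as a frequency table.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in counts.most_common()]

# Hypothetical sample of a categorical variable (hair color).
hair = ["brown", "black", "brown", "blond", "red", "brown", "black", "blond"]

print(f"{'Value':<8}{'Count':>6}{'Percent':>9}")
for value, count, pct in frequency_table(hair):
    print(f"{value:<8}{count:>6}{pct:>8.1f}%")
```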

Graphical Descriptions for Categorical Data

•  A table of the distribution is just a list of values and corresponding counts and/or percentages.

•  Tables are great for detail
  –  Take time to scan and digest

Bar Charts

•  Variable values are the category labels (typically placed along the x-axis)

•  Height of each bar is the count (or percentage) of values falling in that category

•  Note that the bars all have the same width!

•  Usually start the axis at zero. Why?

[Figure: example bar chart with bars Cat. 1, Cat. 2 and Cat. 3; vertical axis “Count or %” from 0 to 100]
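The reason for starting at zero can be shown with arithmetic. A reader judges bars by their drawn heights, and the drawn height is the value minus the axis baseline. This sketch uses two hypothetical bar values on the 0 to 100 scale and compares how many times taller one bar looks with and without a truncated axis.

```python
def apparent_ratio(a, b, baseline=0.0):
    """How many times taller bar a is drawn than bar b when the axis starts at baseline."""
    return (a - baseline) / (b - baseline)

# Hypothetical bar values on a 0-100 scale.
cat1, cat2 = 80, 60
print(f"axis from 0:  {apparent_ratio(cat1, cat2):.2f}x")      # 1.33x, the true ratio
print(f"axis from 50: {apparent_ratio(cat1, cat2, 50):.2f}x")  # 3.00x
```

Starting the axis at 50 makes a one-third difference look like a threefold one.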

Comments

•  Ordering of categories:
  –  Often done alphabetically
    •  Not necessarily the best!
    •  Good when there are many bars: categories are easy to find
  –  Sometimes done in order of heights
    •  Sometimes called a Pareto plot
    •  Good for making comparisons among bars
    •  Individual categories can be hard to locate
  –  Do what makes sense for you and the reader

•  Start axes at reasonable values … do not try to mislead
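This sketch compares the two orderings on hypothetical counts for four categories, printed as crude text bar charts. Alphabetical order makes a given label easy to find. Pareto order (decreasing height) makes the largest categories and the comparisons among bars stand out.

```python
# Hypothetical counts for four categories.
counts = {"apple": 12, "banana": 30, "cherry": 7, "date": 18}

def alphabetical(counts):
    """Categories sorted by label."""
    return sorted(counts.items())

def pareto(counts):
    """Categories in decreasing order of count (ties keep their original order)."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

def text_bars(rows, unit=2):
    """Print a crude horizontal bar chart: one '#' per `unit` observations."""
    for label, count in rows:
        print(f"{label:<7}{'#' * (count // unit)} {count}")

print("Alphabetical:")
text_bars(alphabetical(counts))
print("Pareto (largest first):")
text_bars(pareto(counts))
```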

Example (retirement savings)

•  A USA Today (Jan. 4, 2000) poll asked Americans who earn $35,000 or less how they expected to accumulate a $500,000 retirement nest-egg.

•  The results are summarized in the frequency table below:

Response                    Count
Lottery                      4000
Save and invest              3000
Do not know                  1400
Inherit money                1200
Lawsuit or insurance claim    400
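Converting the counts in the table to percentages (count / total × 100) is what lets the same data be drawn with a Percent axis or as a pie chart. This is a minimal sketch using the counts exactly as listed.

```python
responses = {
    "Lottery": 4000,
    "Save and invest": 3000,
    "Do not know": 1400,
    "Inherit money": 1200,
    "Lawsuit or insurance claim": 400,
}

total = sum(responses.values())
percents = {r: 100 * c / total for r, c in responses.items()}

print(f"Total responses: {total}")
for r, c in responses.items():
    print(f"{r:<28}{c:>6}{percents[r]:>7.1f}%")
```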

Retirement Savings Example

[Figure: bar chart of Counts (0 to 4000) by Response: Lottery, Save, Do not know, Inherit, Lawsuit]

Retirement Savings Example

[Figure: the same bar chart drawn twice, once with Counts (0 to 4000) and once with Percent (0 to 40) on the vertical axis, by Response: Lottery, Save, Do not know, Inherit, Lawsuit]

Pie Charts

•  Variable values are the category labels

•  Each category must appear on the plot

•  Percentage of the pie's area covered by each slice is the relative frequency (or percent) of values falling in that category

•  Can easily see the percentage for each category

•  Note: less flexible than a bar chart

[Figure: example pie chart with slices East 10%, West 25%, North 45%, South 20%]
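In a pie chart, each slice’s share of the area equals its share of the 360° angle at the centre. This sketch computes the slice angles for the example percentages above. It also checks that they fill the whole circle, which holds only when every category appears and the percentages sum to 100.

```python
# Percentages from the example pie chart (East, West, North, South).
regions = {"East": 10, "West": 25, "North": 45, "South": 20}

def wedge_angles(percents):
    """Each slice's angle is its share of the full 360-degree circle."""
    return {label: p * 360 / 100 for label, p in percents.items()}

angles = wedge_angles(regions)
print(angles)  # {'East': 36.0, 'West': 90.0, 'North': 162.0, 'South': 72.0}
print(sum(angles.values()))  # 360.0
```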

Retirement Savings Example

[Figure: pie chart by Response: Lottery 40%, Save 30%, Do not know 14%, Inherit 12%, Lawsuit 4%]

Comments

•  Bar charts are more flexible than pie charts

•  Bar charts make it easier than pie charts to compare the frequencies of categories

•  Comparisons between datasets are easier using a bar chart than a pie chart

•  A pie chart must show the same numbers as the table to allow precise comparisons

Comments

Plots for Quantitative Variables

•  Can summarize quantitative data using plots

•  Most common plots – time-plots, histogram and box-plots

•  Will introduce box-plots later

Time-plots (line graph)

•  If measuring a variable across time, plot against time

•  That is, plot your observations on the y-axis versus time on the x-axis

Apple stock prices in the past year

Which stock would you buy?

More Comments

•  Include in a graph only things that describe the data

•  Beware missing axis labels
•  Beware moving axis labels

•  See Example 6/Figure 10.7 in book for great example of messing with the axes to tell different stories

•  Graph is a compromise between summary and detail

Example http://www.excelcharts.com/blog/minard-tufte-kosslyn-godin-napoleon/
