161.120 Introductory Statistics Week 1 Lecture slides • Introduction CAST: section 1 Text: Chapter 1 Exploring Categorical Data: Frequency tables , Pie charts & Bar charts CAST: section 2.1 Text: section 2.1 to 2.3

Upload: erick-eaton

Post on 12-Jan-2016

161.120 Introductory Statistics Week 1 Lecture slides

• Introduction
– CAST: section 1
– Text: Chapter 1

• Exploring Categorical Data: Frequency tables, Pie charts & Bar charts
– CAST: section 2.1
– Text: section 2.1 to 2.3

1.2 Seven Statistical Stories With Morals

• Case Study 1.1: Who Are Those Speedy Drivers?
• Case Study 1.2: Safety in the Skies
• Case Study 1.3: Did Anyone Ask Whom You’ve Been Dating?
• Case Study 1.4: Who Are Those Angry Women?
• Case Study 1.5: Does Prayer Lower Blood Pressure?
• Case Study 1.6: Does Aspirin Reduce Heart Attack Rates?
• Case Study 1.7: Does the Internet Increase Loneliness and Depression?

Case Study 1.1 Who Are Those Speedy Drivers?

Question: “What’s the fastest you have ever driven a car? ____ mph.”

Data: 87 male and 102 female students from a large statistics class at a university.

Males: 110 109 90 140 105 150 120 110 110 90 115 95 145 140 110 105 85 95 100 115 124 95 100 125 140 85 120 115 105 125 102 85 120 110 120 115 94 125 80 85 140 120 92 130 125 110 90 110 110 95 95 110 105 80 100 110 130 105 105 120 90 100 105 100 120 100 100 80 100 120 105 60 125 120 100 115 95 110 101 80 112 120 110 115 125 55 90

Females: 80 75 83 80 100 100 90 75 95 85 90 85 90 90 120 85 100 120 75 85 80 70 85 110 85 75 105 95 75 70 90 70 82 85 100 90 75 90 110 80 80 110 110 95 75 130 95 110 110 80 90 105 90 110 75 100 90 110 85 90 80 80 85 50 80 100 80 80 80 95 100 90 100 95 80 80 50 88 90 90 85 70 90 30 85 85 87 85 90 85 75 90 102 80 100 95 110 80 95 90 80 90

Which gender has driven faster? How to summarize data?

Case Study 1.1 Who Are Those Speedy Drivers?

Dotplot: each dot represents the response of an individual student.

Case Study 1.1 Who Are Those Speedy Drivers?

Five-number summary: the lowest value, the cutoff points for ¼, ½, and ¾ of the data, and the highest value.

Note: ¾ of men have driven 95 mph or more, only ¼ of women have done so.

Moral: Simple summaries of data can tell an interesting story and are easier to digest than long lists.

Using Minitab

Descriptive Statistics: Males, Females

Variable  Minimum     Q1  Median      Q3  Maximum
Males       55.00  95.00  110.00  120.00   150.00
Females     30.00  80.00   89.00   95.00   130.00

[Figure: dotplots of fastest speed (mph) for Males and Females; axis ticks from 32 to 144]
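The five-number summaries reported by Minitab can be checked with a short script. The following is a minimal Python sketch, not the course's own method: the function name `five_number_summary` is made up for illustration, and it assumes that the "exclusive" quartile rule in Python's `statistics.quantiles` (interpolating at position p(n + 1)) matches the rule Minitab uses. With the data from Case Study 1.1 it gives the same quartiles as the Minitab table.

```python
import statistics

# Fastest speeds (mph) from Case Study 1.1, copied from the slide.
males = [
    110, 109, 90, 140, 105, 150, 120, 110, 110, 90, 115, 95, 145, 140, 110,
    105, 85, 95, 100, 115, 124, 95, 100, 125, 140, 85, 120, 115, 105, 125,
    102, 85, 120, 110, 120, 115, 94, 125, 80, 85, 140, 120, 92, 130, 125,
    110, 90, 110, 110, 95, 95, 110, 105, 80, 100, 110, 130, 105, 105, 120,
    90, 100, 105, 100, 120, 100, 100, 80, 100, 120, 105, 60, 125, 120, 100,
    115, 95, 110, 101, 80, 112, 120, 110, 115, 125, 55, 90,
]
females = [
    80, 75, 83, 80, 100, 100, 90, 75, 95, 85, 90, 85, 90, 90, 120, 85, 100,
    120, 75, 85, 80, 70, 85, 110, 85, 75, 105, 95, 75, 70, 90, 70, 82, 85,
    100, 90, 75, 90, 110, 80, 80, 110, 110, 95, 75, 130, 95, 110, 110, 80,
    90, 105, 90, 110, 75, 100, 90, 110, 85, 90, 80, 80, 85, 50, 80, 100, 80,
    80, 80, 95, 100, 90, 100, 95, 80, 80, 50, 88, 90, 90, 85, 70, 90, 30, 85,
    85, 87, 85, 90, 85, 75, 90, 102, 80, 100, 95, 110, 80, 95, 90, 80, 90,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Assumption: the 'exclusive' method is used for the quartiles because it
    reproduces the Minitab output on this slide; other software may use a
    slightly different quartile rule.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return min(values), q1, median, q3, max(values)


for label, speeds in (("Males", males), ("Females", females)):
    print(label, five_number_summary(speeds))
# Males (55, 95.0, 110.0, 120.0, 150)
# Females (30, 80.0, 89.0, 95.0, 130)
```

Reading the two summaries side by side shows the point made in the note on the previous slide: the lower quartile for men (95 mph) equals the upper quartile for women.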

Importance of Context

• Focus of statistics

– to answer questions that are expressed in the language of some application area

• Data contain information

• Statistical methods are used to extract information from data

• Analysis of data with statistical methods is a core part of statistics, but the context of the data is most important.

Answering a single question in some context

Structure of Data

• All these 3 data sets have the same basic structure

– 12 numerical measurements made from 12 different ‘individuals’

– Individuals have been classified into one of two groups

• Same statistical methods can be applied

Variables and Individuals

Types of variable

Numerical

Consists of numerical values taken on each individual (numbers)

• Discrete
– values are whole numbers (counts)
– e.g. number of siblings

• Continuous
– any values within some range
– e.g. heights

Distinction between discrete and continuous variables is important.

Statistical methods used for continuous variables are not always appropriate for discrete variables.

Categorical

Classifies each individual into one of a small number of categories

• Ordinal
– meaningfully ordered
– e.g. tee shirt size: S, M, L, XL
– e.g. grades: A, B, C, D, E

• Nominal
– order not meaningful
– e.g. eye colour

Most statistical methods can be applied to both types of categorical variables.

Labels & Ordering

• Label variable

– each individual may have a unique 'name' that can be used to identify it

– May help to identify unusual observations in the data set

• Individuals in a data set may be ordered.

– For example, blood pressure may be recorded from a patient at 10-minute intervals between 9am and 9pm. The resulting blood pressures are a continuous numerical variable whose values are time-ordered; the ordering of the values holds useful information that will help us understand the data.

• Unordered data set

– the weights of 20 cows sampled from a herd.

Variation

• Statistics involves measurements (data) in which there is variability

– not all measurements are the same.

• Explained variation

– Occasionally the observed variability in a measurement can be explained deterministically in terms of other variables through a law-like relationship.

– Example: Ohm’s Law

• Unexplained variation

– In most data sets, some or all variation remains unexplained.
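As an illustration of explained variation, Ohm's Law states that voltage equals current times resistance (V = IR). The short Python sketch below uses made-up values (the resistance and the currents are assumptions for illustration, not data from the course) to show that every difference between the voltages is accounted for by the differences in current, so no variation is left unexplained.

```python
# Hypothetical circuit: a fixed resistor measured at several currents.
resistance_ohms = 10.0                # assumed value, for illustration only
currents_amps = [0.5, 1.0, 1.5, 2.0]  # assumed values, for illustration only

# Ohm's Law: V = I * R. The voltages vary, but the variation is fully
# determined by the current.
voltages = [current * resistance_ohms for current in currents_amps]

for current, voltage in zip(currents_amps, voltages):
    print(f"I = {current} A  ->  V = {voltage} V")
```

In real measurements the recorded voltages would scatter slightly around these values; that remaining scatter is the unexplained variation.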

2.1 Raw Data

• Raw data are numbers and category labels that have been collected but have not yet been processed in any way.

• When measurements are taken from a subset of a population, they represent sample data.

• When all individuals in a population are measured, the measurements represent population data.

• Descriptive statistics: summary numbers for either a population or a sample.

Asking the Right Questions

One Categorical Variable

Question 1a: How many and what percentage of individuals fall into each category?

Example: What percentage of college students favor the legalization of marijuana, and what percentage of college students oppose legalization of marijuana?

Question 1b: Are individuals equally divided across categories, or do the percentages across categories follow some other interesting pattern?

Example: When individuals are asked to choose a number from 1 to 10, are all numbers equally likely to be chosen?

Asking the Right Questions

Two Categorical Variables

Question 2a: Is there a relationship between the two variables, so that the category into which individuals fall for one variable seems to depend on which category they are in for the other variable?

Example: In Case Study 1.6, we asked if the risk of having a heart attack was different for the physicians who took aspirin than for those who took a placebo.

Question 2b: Do some combinations of categories stand out because they provide information that is not found by examining the categories separately?

Example: The relationship between smoking and lung cancer was detected, in part, because someone noticed that the combination of being a nonsmoker and having lung cancer is unusual.

Asking the Right Questions

One Quantitative Variable

Question 3a: What are the interesting summary measures, like the average or the range of values, that help us understand the collection of individuals who were measured?

Example: What is the average handspan measurement, and how much variability is there in handspan measurements?

Question 3b: Are there individual data values that provide interesting information because they are unique or stand out in some way?

Example: What is the oldest recorded age of death for a human? Are there many people who have lived nearly that long, or is the oldest recorded age a unique case?

Asking the Right Questions

One Categorical and One Quantitative Variable

Question 4a: Are the measurements similar across categories?

Example: Do men and women drive at the same “fastest speeds” on average?

Question 4b: When the categories have a natural ordering (an ordinal variable), does the measurement variable increase or decrease, on average, in that same order?

Example: Do high school dropouts, high school graduates, college dropouts, and college graduates have increasingly higher average incomes?

Asking the Right Questions

Two Quantitative Variables

Question 5a: If the measurement on one variable is high (or low), does the other one also tend to be high (or low)?

Example: Do taller people also tend to have larger handspans?

Question 5b: Are there individuals whose combination of data values provides interesting information because that combination is unusual?

Example: An individual who has a very low IQ score but can perform complicated arithmetic operations very quickly may shed light on how the brain works. Neither the IQ nor the arithmetic ability may stand out as uniquely low or high, but it is the combination that is interesting.

Explanatory and Response Variables

Many questions are about the relationship between two variables.

It is useful to identify one variable as the explanatory variable and the other variable as the response variable.

In general, the value of the explanatory variable for an individual is thought to partially explain the value of the response variable for that individual.

Summarizing One or Two Categorical Variables

First step - count how many fall into each category

• Frequency Table
– Frequency (count)
– Relative frequency (proportion or percentage)

• proportion = frequency in category / total frequency
• percentage = proportion × 100

Gender  Frequency  Proportion  Percentage
Male           37      0.4625          46
Female         43      0.5375          54
Total          80      1              100
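The proportion and percentage columns follow directly from the counts. A minimal Python sketch of the calculation is given below; the function name `frequency_table` is made up for illustration, and the counts are the ones in the table above.

```python
def frequency_table(counts):
    """Return (category, frequency, proportion, percentage) rows.

    proportion = frequency in category / total frequency
    percentage = proportion x 100
    """
    total = sum(counts.values())
    rows = []
    for category, frequency in counts.items():
        proportion = frequency / total
        rows.append((category, frequency, proportion, proportion * 100))
    return rows


gender_counts = {"Male": 37, "Female": 43}  # counts from the slide's table

for category, frequency, proportion, percentage in frequency_table(gender_counts):
    # Percentages are rounded to whole numbers, as on the slide.
    print(f"{category:<8}{frequency:>5}{proportion:>10.4f}{percentage:>6.0f}")
```

The proportions always sum to 1, and the percentages to 100, apart from rounding.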

Example 2.2 Lighting the Way to Nearsightedness

Survey of n = 479 children.

Those who slept with nightlight or in fully lit room before age 2 had higher incidence of nearsightedness (myopia) later in childhood.

Note: Study does not prove sleeping with light actually caused myopia in more children.

Visual Summaries for Categorical Variables

• Pie Charts: useful for summarizing a single categorical variable if not too many categories.

• Bar Graphs: useful for summarizing one or two categorical variables and particularly useful for making comparisons when there are two categorical variables.

Example 2.3 Humans Are Not Good Randomizers

Survey of n = 190 college students. “Randomly pick a number between 1 and 10.”

Results: Most chose 7, very few chose 1 or 10.

Bar Graphs and Pie Charts

• Chartjunk
– Doesn’t make data easier to understand and can be misleading
– Avoid 3D charts
– Avoid replacing bars with objects
– Better to draw a standard chart smaller than to embellish it with chartjunk

• Bar charts and pie charts highlight different aspects of the data
– Bar charts provide better comparison of the individual proportions
– Pie charts allow us to assess the proportions in two or more adjacent categories

Example 2.4 Revisiting Nightlights and Nearsightedness

Survey of n = 479 children.

Response: Degree of Myopia

Explanatory: Amount of Sleeptime Lighting

Stacked Bar Graph

[Figure: stacked bar graph. Vertical axis: Percentage (0 to 100). Horizontal axis: Lighting Conditions (Dark, Nightlight, Full Light). Legend: degree of myopia (None, Some, High)]