HUDM4122 Probability and Statistical Inference January 26, 2015

Post on 08-Apr-2022


Page 1: HUDM4122 Probability and Statistical Inference

HUDM4122 Probability and Statistical Inference

January 26, 2015

Page 2: HUDM4122 Probability and Statistical Inference

ASSISTments

• Did everyone get an account for the ASSISTments system?

• Did anyone have difficulties setting up an account?

• First homework is due in a week

Page 3: HUDM4122 Probability and Statistical Inference

Today

• Ch. 1 in Mendenhall, Beaver, & Beaver

• Variables and Variable Types
• Graphing Data
• Basic Exploratory Data Analysis

Page 4: HUDM4122 Probability and Statistical Inference

Variables

• What is a variable?

Page 5: HUDM4122 Probability and Statistical Inference

Variables

• What is a variable?

• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” –MBB p. 8

Page 6: HUDM4122 Probability and Statistical Inference

Which of these are examples of variables?

• GPA
• Shoe size
• Age
• Number of correct answers in ASSISTments
• Number of times gamed the system in ASSISTments
• Favorite vegetable
• Favorite type of pie
• Pi

Page 7: HUDM4122 Probability and Statistical Inference

What is a measurement?

Page 8: HUDM4122 Probability and Statistical Inference

What is a measurement?

• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas

Page 9: HUDM4122 Probability and Statistical Inference

A measurement

• Person furthest towards my left in the front row, what is your name?

Page 10: HUDM4122 Probability and Statistical Inference

Now I have a measurement

Page 11: HUDM4122 Probability and Statistical Inference

A measurement

• Person furthest towards my right in the second row, what is your name?

Page 12: HUDM4122 Probability and Statistical Inference

Now I have data

• A set of measurements

Page 13: HUDM4122 Probability and Statistical Inference

Now I have data

• A set of measurements

• Note that in stats class or education journals, the word “data” is plural

Page 14: HUDM4122 Probability and Statistical Inference

Now I have data

• A set of measurements

• Note that in stats class or education journals, the word “data” is plural

• I only know one exception

Page 15: HUDM4122 Probability and Statistical Inference

Now I have data

• A set of measurements

• Note that in stats class or education journals, the word “data” is plural

• I only know one exception

Page 16: HUDM4122 Probability and Statistical Inference

Everyone repeat after me

Page 17: HUDM4122 Probability and Statistical Inference

Everyone repeat after me

• “My data are in this Excel file.”

Page 18: HUDM4122 Probability and Statistical Inference

Everyone repeat after me

• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”

Page 19: HUDM4122 Probability and Statistical Inference

Everyone repeat after me

• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”

• “His data were hard to collect.”

Page 20: HUDM4122 Probability and Statistical Inference

However…

Page 21: HUDM4122 Probability and Statistical Inference

However…

• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner

Page 22: HUDM4122 Probability and Statistical Inference

Any questions or concerns?

Page 23: HUDM4122 Probability and Statistical Inference

Univariate Data

• A single variable is collected

Height
5’11”
5’11”
5’10”
5’6”

Page 24: HUDM4122 Probability and Statistical Inference

Bivariate Data

• Two variables are collected (for the same data point)

Height    Drum‐Playing Skill
5’11”     1
5’11”     2
5’10”     4
5’6”      8

Page 25: HUDM4122 Probability and Statistical Inference

Multivariate Data

• 3+ variables are collected

Name               Height    Drum‐Playing Skill
John Lennon        5’11”     1
Paul McCartney     5’11”     2
George Harrison    5’10”     4
Ringo Starr        5’6”      8
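A minimal Python sketch of the same idea, assuming each row of the table is stored as one record. The variable names (`records`, `univariate`, and so on) are illustrative and not from the slides; the values are the ones in the table above.

```python
# Each dict is one experimental unit (a person); each key is a variable.
records = [
    {"name": "John Lennon", "height": "5'11\"", "drum_skill": 1},
    {"name": "Paul McCartney", "height": "5'11\"", "drum_skill": 2},
    {"name": "George Harrison", "height": "5'10\"", "drum_skill": 4},
    {"name": "Ringo Starr", "height": "5'6\"", "drum_skill": 8},
]

# Univariate: a single variable is collected for each unit.
univariate = [r["height"] for r in records]

# Bivariate: two variables are collected for the same unit.
bivariate = [(r["height"], r["drum_skill"]) for r in records]

# Multivariate: three or more variables are collected for the same unit.
multivariate = [(r["name"], r["height"], r["drum_skill"]) for r in records]

print(univariate)
print(bivariate)
print(multivariate)
```

The point of keeping one list of records is that the data do not change; only the number of variables you look at does.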

Page 26: HUDM4122 Probability and Statistical Inference

Any questions or concerns?

Page 27: HUDM4122 Probability and Statistical Inference

Types of Variables

Page 28: HUDM4122 Probability and Statistical Inference

Quantitative/Numerical Data

• Data that can be expressed as numbers

Page 29: HUDM4122 Probability and Statistical Inference

What are some examples

• Of numerical data?

Page 30: HUDM4122 Probability and Statistical Inference

Ordinal Data

• Refers to data where there is a known order, but either
– The data clearly aren’t numbers
– The space between values is not guaranteed to be equal

Page 31: HUDM4122 Probability and Statistical Inference

Examples of Ordinal Data

• Months of the year: January, February, March, April, …

• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree

• Quality of university: Highly selective, selective, somewhat selective, non‐selective
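One way to see what "known order, but no guaranteed spacing" means is a short Python sketch using the agreement levels above. The `responses` list is made up for illustration, and `AGREEMENT_ORDER` and `rank` are hypothetical names.

```python
# The order of the levels is known, so it can be written down explicitly.
AGREEMENT_ORDER = [
    "Strongly Disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly Agree",
]

# Map each level to its position. The positions only encode order;
# the gap between two positions is not a meaningful distance.
rank = {level: position for position, level in enumerate(AGREEMENT_ORDER)}

# Hypothetical survey responses, invented for this example.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree", "Agree"]

# Ordinal data can be sorted and compared...
sorted_responses = sorted(responses, key=lambda level: rank[level])
print(sorted_responses)
print(rank["Agree"] > rank["Neutral"])  # True: "Agree" ranks above "Neutral"

# ...but arithmetic on the ranks (for example, averaging them) would
# assume equal spacing between levels, which ordinal data do not promise.
```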

Page 32: HUDM4122 Probability and Statistical Inference

Other examples of ordinal data?

Page 33: HUDM4122 Probability and Statistical Inference

Nominal data

• Values have no order or spacing

• Name
• State of Residence

– New Jersey is not greater or less than New York

Page 34: HUDM4122 Probability and Statistical Inference

Nominal data

• Values have no order or spacing

• Name
• State of Residence
– New Jersey is not greater or less than New York
– Although my brother might disagree

Page 35: HUDM4122 Probability and Statistical Inference

Other Examples of Nominal Data?

Page 36: HUDM4122 Probability and Statistical Inference

Another name

• Nominal data are often also called categorical data

Page 37: HUDM4122 Probability and Statistical Inference

Another name

• Nominal data are often also called categorical data

• Technically, ordinal data are also categorical, but no one ever uses the term that way

Page 38: HUDM4122 Probability and Statistical Inference

Any questions or concerns?

Page 39: HUDM4122 Probability and Statistical Inference

Exploratory Data Analysis

• “Analyzing data sets to summarize their main characteristics”

• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”

Page 40: HUDM4122 Probability and Statistical Inference

Goal

• Generate hypotheses
• Understand your data better

Page 41: HUDM4122 Probability and Statistical Inference

Often (but not always) done with graphs

Page 42: HUDM4122 Probability and Statistical Inference

Which of these is your favorite type of graph?

• Pie chart
• Bar graph
• Frequency histogram
• Line graph
• Scatterplot
• Stem‐and‐leaf plot
• Box plot
• Other

Page 43: HUDM4122 Probability and Statistical Inference

Pie Chart

• Take a set of categories that add to 100%
• Show the proportion each category has
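The arithmetic behind a pie chart can be sketched in a few lines of Python: count each category, then divide by the total so the shares add to 100%. The `votes` list is hypothetical data invented for this example, not the class's actual answers.

```python
from collections import Counter

# Hypothetical answers to "What is your favorite pie?" (one per person).
votes = [
    "Apple", "Pumpkin", "Apple", "Cherry",
    "Apple", "Pumpkin", "Rhubarb", "Banana Cream",
]

counts = Counter(votes)       # how many people chose each category
total = sum(counts.values())  # number of measurements

# Each slice is the category's share of the whole, as a percentage.
shares = {pie: 100 * n / total for pie, n in counts.items()}

for pie, pct in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{pie:<13}{pct:5.1f}%")

# Because every vote falls in exactly one category, the slices sum to 100%.
print(sum(shares.values()))
```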

Page 44: HUDM4122 Probability and Statistical Inference

Pie Chart: Example

[Pie chart: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Page 45: HUDM4122 Probability and Statistical Inference

Interpret This Graph Please

[Pie chart: “What is everyone's favorite pie?” Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]

Page 46: HUDM4122 Probability and Statistical Inference

Never Ever Do This: Completely Visually Misleading

Fair use; critique

Page 47: HUDM4122 Probability and Statistical Inference

Let’s make a pie chart

• Using the “your favorite graph” data

Page 48: HUDM4122 Probability and Statistical Inference

Any questions?

Page 49: HUDM4122 Probability and Statistical Inference

Alternative: Bar Graphs

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; Y axis: 0 to 30]

Page 50: HUDM4122 Probability and Statistical Inference

Interpret this graph please

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; Y axis: 0 to 30]

Page 51: HUDM4122 Probability and Statistical Inference

What are the advantages/disadvantages relative to pie chart?

[Bar graph: “What is everyone's favorite pie?” X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; Y axis: 0 to 30]

Page 52: HUDM4122 Probability and Statistical Inference

By the way: X and Y axes

[Bar graph: “What is everyone's favorite pie?” with the axes labeled. X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream; Y axis: 0 to 30]

Page 53: HUDM4122 Probability and Statistical Inference

Strengths of bar graphs

• Categories don’t have to add to 100%
• Easier to see small differences between categories

• You can compare variables too
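As a rough illustration of why a bar graph makes small differences easy to see, here is a text-only sketch in Python. The counts are hypothetical (the values behind the slide's graph are not given), and `text_bar_chart` is an illustrative helper, not a library function.

```python
def text_bar_chart(counts, symbol="#"):
    """Return one text line per category, with bar length equal to the count."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in counts.items()]


# Hypothetical counts. Unlike pie slices, these do not have to add to 100%.
favorite_pie = {
    "Pumpkin": 12,
    "Apple": 25,
    "Cherry": 11,
    "Rhubarb": 3,
    "Banana Cream": 7,
}

for line in text_bar_chart(favorite_pie):
    print(line)
```

Pumpkin (12) and Cherry (11) differ by a single vote, which shows up as one extra character of bar length; the same difference would be hard to judge from two pie slices.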

Page 54: HUDM4122 Probability and Statistical Inference

Two‐group bar graph

[Two‐group bar graph: “School Rankings”. X axis: Football Team, Chess Team, Spiderman Team; Y axis: Quality (Higher is Better), 0 to 60; groups: Midtown High, Harlem Success Academy]

Page 55: HUDM4122 Probability and Statistical Inference

Let’s make a bar graph

• Using the “your favorite graph” data

Page 56: HUDM4122 Probability and Statistical Inference

Any questions?

Page 57: HUDM4122 Probability and Statistical Inference

Some suggest always using bar graphs instead of pie charts

Page 58: HUDM4122 Probability and Statistical Inference

Some suggest always using bar graphs instead of pie charts

• “The only thing worse than a pie chart is several of them.” – Edward Tufte

• “Save the pies for dessert.” – Stephen Few

Page 59: HUDM4122 Probability and Statistical Inference

But they’re wrong

Page 60: HUDM4122 Probability and Statistical Inference

But they’re wrong

• Pie charts are good for representing part‐whole relationships in really easy to see ways

• Pie charts are good at representing overall proportions

Page 61: HUDM4122 Probability and Statistical Inference

Nice example (Gabrielle, 2013)

Page 62: HUDM4122 Probability and Statistical Inference

Any questions?

Page 63: HUDM4122 Probability and Statistical Inference

Frequency Histogram

• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars

• X axis shows values or ranges of a quantitative variable

• Y axis shows how many data points have that value or range for the quantitative variable
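The counting behind a frequency histogram can be sketched in Python as follows. The grades are hypothetical, and the bin layout (width 5, starting at 51) is an assumption chosen to give ranges such as 51-55 and 56-60; `bin_label` is an illustrative helper.

```python
from collections import Counter


def bin_label(grade, width=5, start=51):
    """Return the range a grade falls into, e.g. 73 -> '71-75'."""
    low = start + ((grade - start) // width) * width
    return f"{low}-{low + width - 1}"


# Hypothetical exam grades, invented for this example.
grades = [72, 85, 91, 67, 78, 88, 95, 73, 81, 79, 84, 100]

# Y axis values: how many data points fall in each X axis range.
frequencies = Counter(bin_label(g) for g in grades)

# Print the bins in numeric order, one text bar per range.
for label in sorted(frequencies, key=lambda s: int(s.split("-")[0])):
    print(f"{label:>6} | {'#' * frequencies[label]}")
```

Each grade lands in exactly one range, so the bar heights always sum to the number of data points.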

Page 64: HUDM4122 Probability and Statistical Inference

Example from the book

Visits to Starbucks

Page 65: HUDM4122 Probability and Statistical Inference

Another Example

[Frequency histogram. X axis: Exam Grade; Y axis: Frequency, 0 to 18]

Page 66: HUDM4122 Probability and Statistical Inference

Was this an easy exam or a hard exam?

[Frequency histogram. X axis: Exam Grade; Y axis: Frequency, 0 to 18]

Page 67: HUDM4122 Probability and Statistical Inference

Would you rather be in the blue class or the orange class?

[Two frequency histograms, one per class. X axis: Exam Grade, in ranges 51‐55, 56‐60, 61‐65, 66‐70, 71‐75, 76‐80, 81‐85, 86‐90, 91‐95, 96‐100; Y axis: Frequency, 0 to 18]

Page 68: HUDM4122 Probability and Statistical Inference

By the way: outliers

[Frequency histogram with one bar marked OUTLIER. X axis: Exam Grade; Y axis: Frequency, 0 to 18]

Page 69: HUDM4122 Probability and Statistical Inference

If there’s time, let’s make a frequency histogram

• Everybody: What’s your height in feet‐inches?

• (Example: I’m 5’9”)
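Heights written as feet‐inches need to become a single number before they can be placed on a quantitative X axis. The sketch below is one possible way to do that in Python; `height_to_inches` is a hypothetical helper, and the accepted input format (feet, an apostrophe, inches, an optional closing quote) is an assumption.

```python
import re

# Matches text such as 5'9" or 5’11” (straight or curly quote marks).
HEIGHT_PATTERN = re.compile(r"""\s*(\d+)\s*['’]\s*(\d{1,2})\s*["”]?\s*""")


def height_to_inches(text):
    """Convert a feet-inches string to total inches, e.g. 5'9" -> 69."""
    match = HEIGHT_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a feet-inches height: {text!r}")
    feet, inches = int(match.group(1)), int(match.group(2))
    return feet * 12 + inches


print(height_to_inches("5'9\""))  # 5 feet 9 inches is 69 inches
```

Once every height is in inches, the values can be grouped into ranges and counted like any other quantitative variable.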

Page 70: HUDM4122 Probability and Statistical Inference

Any questions?

Page 71: HUDM4122 Probability and Statistical Inference

Line Graph

• Shows trends from left‐to‐right
• The trend is usually over time
• But it doesn’t have to be…

Page 72: HUDM4122 Probability and Statistical Inference

Example Line Graph

http://www.wilderdom.com/personality/L4‐1IntelligenceNatureVsNurture.html
Used under Creative Commons License

Page 73: HUDM4122 Probability and Statistical Inference

Example Line Graph (VanLehn, 2011)

(This graph shows perceptions, not data on effectiveness.)

Page 74: HUDM4122 Probability and Statistical Inference

Any questions?

Page 75: HUDM4122 Probability and Statistical Inference

Not going to discuss today

• Stem‐and‐leaf plot

• Very, very rare to see in actual use
• Quite poor for any sizable data set

• If you want to learn about them, see the book

Page 76: HUDM4122 Probability and Statistical Inference

Future Classes

• Scatterplot
• Box plot

Page 77: HUDM4122 Probability and Statistical Inference

Upcoming Classes

• 1/28 Describing Data with Numerical Measures
– Ch. 2

• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3

• 2/4 Introduction to Probability
– Ch. 4

Page 78: HUDM4122 Probability and Statistical Inference

Questions? Comments?