HUDM4122 Probability and Statistical Inference
January 26, 2015
ASSISTments
• Did everyone get an account for the ASSISTments system?
• Did anyone have difficulties setting up an account?
• First homework is due in a week
Today
• Ch. 1 in Mendenhall, Beaver, & Beaver
• Variables and Variable Types
• Graphing Data
• Basic Exploratory Data Analysis
Variables
• What is a variable?
• “A variable is a characteristic that changes or varies over time and/or for different individuals or objects under consideration.” –MBB p. 8
Which of these are examples of variables?
• GPA
• Shoe size
• Age
• Number of correct answers in ASSISTments
• Number of times gamed the system in ASSISTments
• Favorite vegetable
• Favorite type of pie
• Pi
What is a measurement?
• A measurement is the result of measuring a variable on a single experimental unit
– A person, if you are studying people
– A class, if you are studying classes
– A pizza, if you are studying pizzas
A measurement
• Person furthest towards my left in the front row, what is your name?
Now I have a measurement
A measurement
• Person furthest towards my right in the second row, what is your name?
Now I have data
• A set of measurements
• Note that in stats class or education journals, the word “data” is plural
• I only know one exception
Everyone repeat after me
• “My data are in this Excel file.”
• “Your data aren’t evidence for that conclusion.”
• “His data were hard to collect.”
However…
• I do not recommend insisting that data is plural in bars, on first dates, or at Thanksgiving dinner
Any questions or concerns?
Univariate Data
• A single variable is collected
Height
5’11”
5’11”
5’10”
5’6”
Bivariate Data
• Two variables are collected (for the same data point)
Height   Drum‐Playing Skill
5’11”    1
5’11”    2
5’10”    4
5’6”     8
Multivariate Data
• 3+ variables are collected
Name             Height   Drum‐Playing Skill
John Lennon      5’11”    1
Paul McCartney   5’11”    2
George Harrison  5’10”    4
Ringo Starr      5’6”     8
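A minimal sketch of the three cases, assuming the table above is stored as a list of Python dicts. The field names and the `select` helper are made up for illustration; they are not from the book.

```python
# Each dict is one experimental unit (a person); each key is a variable.
# Values are copied from the table above.
beatles = [
    {"name": "John Lennon", "height": "5'11\"", "drum_skill": 1},
    {"name": "Paul McCartney", "height": "5'11\"", "drum_skill": 2},
    {"name": "George Harrison", "height": "5'10\"", "drum_skill": 4},
    {"name": "Ringo Starr", "height": "5'6\"", "drum_skill": 8},
]

def select(records, *variables):
    """Keep only the named variables from each record (hypothetical helper)."""
    return [tuple(r[v] for v in variables) for r in records]

univariate = select(beatles, "height")                          # 1 variable
bivariate = select(beatles, "height", "drum_skill")             # 2 variables
multivariate = select(beatles, "name", "height", "drum_skill")  # 3+ variables

for row in multivariate:
    print(row)
```

The number of variables kept per data point is what separates the three cases; the number of people stays the same.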
Any questions or concerns?
Types of Variables
Quantitative/Numerical Data
• Data that can be expressed as numbers
What are some examples of numerical data?
Ordinal Data
• Refers to data where there is a known order, but either
– The data clearly aren’t numbers
– The space between values is not guaranteed to be equal
Examples of Ordinal Data
• Months of the year: January, February, March, April, …
• Agreement level: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree
• Quality of university: Highly selective, selective, somewhat selective, non‐selective
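A small sketch of what "known order, but no guaranteed spacing" can look like in code, using the agreement levels above. The `RANK` table, the `agrees_more` helper, and the sample responses are assumptions for illustration only.

```python
# Agreement levels from the slide, listed from least to most agreement.
LEVELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree", "Strongly Agree"]

# The rank is only a position in the order. The gap between two ranks is
# not a meaningful distance, so we compare ranks but never subtract them.
RANK = {level: i for i, level in enumerate(LEVELS)}

def agrees_more(a, b):
    """True if response a expresses more agreement than response b."""
    return RANK[a] > RANK[b]

# Hypothetical survey responses.
responses = ["Agree", "Neutral", "Strongly Agree", "Disagree"]
ordered = sorted(responses, key=RANK.get)

print(ordered)
print(agrees_more("Agree", "Neutral"))
```

Sorting and comparing are fine for ordinal data; averaging the ranks would quietly assume equal spacing between levels.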
Other examples of ordinal data?
Nominal data
• Values have no order or spacing
• Name
• State of Residence
– New Jersey is not greater or less than New York
– Although my brother might disagree
Other Examples of Nominal Data?
Another name
• Nominal data are often also called categorical data
• Technically ordinal data are also categorical, but no one ever uses the term that way
Any questions or concerns?
Exploratory Data Analysis
• “Analyzing data sets to summarize their main characteristics”
• “Seeing what the data can tell us beyond the formal modeling or hypothesis testing task”
Goal
• Generate hypotheses
• Understand your data better
Often (but not always) done with graphs
Which of these is your favorite type of graph?
• Pie chart
• Bar graph
• Frequency histogram
• Line graph
• Scatterplot
• Stem‐and‐leaf plot
• Box plot
• Other
Pie Chart
• Take a set of categories that add to 100%
• Show the proportion each category has
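A minimal sketch of the arithmetic behind a pie chart: turn counts into proportions that add to 100%. The vote counts are hypothetical; the slides do not give the numbers behind the favorite-pie chart.

```python
# Hypothetical counts for the "favorite pie" question (not from the slides).
votes = {"Pumpkin": 12, "Apple": 9, "Cherry": 5, "Rhubarb": 1, "Banana Cream": 3}

total = sum(votes.values())

# Each slice is the category's share of the whole, as a percentage.
shares = {pie: 100 * n / total for pie, n in votes.items()}

for pie, pct in shares.items():
    print(f"{pie:13s} {pct:5.1f}%")
```

Because every response falls in exactly one category, the shares add to 100%, which is the condition a pie chart needs.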
Pie Chart: Example
[Figure: pie chart, "What is everyone's favorite pie?" Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Interpret This Graph Please
[Figure: pie chart, "What is everyone's favorite pie?" Categories: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream]
Never Ever Do This: Completely Visually Misleading
[Figure: example chart, shown under fair use for critique]
Let’s make a pie chart
• Using the “your favorite graph” data
Any questions?
Alternative: Bar Graphs
[Figure: bar graph, "What is everyone's favorite pie?" X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30]
Interpret this graph please
[Figure: bar graph, "What is everyone's favorite pie?" X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30]
What are the advantages/disadvantages relative to pie chart?
[Figure: bar graph, "What is everyone's favorite pie?" X axis: Pumpkin, Apple, Cherry, Rhubarb, Banana Cream. Y axis: 0 to 30]
By the way: X and Y axes
[Figure: the same bar graph, "What is everyone's favorite pie?", with the horizontal axis (Pumpkin, Apple, Cherry, Rhubarb, Banana Cream) labeled "X axis" and the vertical axis (0 to 30) labeled "Y axis"]
Strengths of bar graphs
• Categories don’t have to add to 100%
• Easier to see small differences between categories
• You can compare variables too
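A rough text-only sketch of a bar graph, to show that bar length tracks the raw value in each category and nothing has to add to 100%. The counts and the `text_bar_graph` helper are hypothetical.

```python
# Hypothetical counts; the slides do not give the numbers behind the graph.
favorite_pie = {"Pumpkin": 12, "Apple": 9, "Cherry": 5, "Rhubarb": 1, "Banana Cream": 3}

def text_bar_graph(counts, symbol="#"):
    """Return one line per category: padded label, bar, and value."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {symbol * n} {n}" for label, n in counts.items()]

for line in text_bar_graph(favorite_pie):
    print(line)
```

With the bars lined up against a common baseline, a difference of one or two responses is easier to spot than it would be between two pie slices.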
Two‐group bar graph
[Figure: two‐group bar graph, "School Rankings". X axis: Football Team, Chess Team, Spiderman Team. Y axis: Quality (Higher is Better), 0 to 60. Groups: Midtown High, Harlem Success Academy]
Let’s make a bar graph
• Using the “your favorite graph” data
Any questions?
Some suggest always using bar graphs instead of pie charts
• “The only thing worse than a pie chart is several of them.” – Edward Tufte
• “Save the pies for dessert.” – Stephen Few
But they’re wrong
• Pie charts are good for representing part‐whole relationships in really easy to see ways
• Pie charts are good at representing overall proportions
Nice example (Gabrielle, 2013)
Any questions?
Frequency Histogram
• A type of bar graph
– But usually when people say “bar graph”, they do not mean “frequency histogram”
– Also: by convention, no space between bars
• X axis shows values or ranges of a quantitative variable
• Y axis shows how many data points have that value or range for the quantitative variable
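A minimal sketch of how the counts behind a frequency histogram can be computed, assuming exam grades between 51 and 100 grouped into ranges five points wide. The grades and the `bin_label` helper are made up for illustration.

```python
from collections import Counter

# Hypothetical exam grades, all between 51 and 100.
grades = [58, 62, 67, 71, 73, 74, 78, 79, 81, 82, 84, 88, 90, 93, 97]

def bin_label(grade, width=5, start=51):
    """Map a grade to its range label, e.g. 73 -> '71-75'."""
    low = start + ((grade - start) // width) * width
    return f"{low}-{low + width - 1}"

# Y axis: how many data points fall in each range.
freq = Counter(bin_label(g) for g in grades)

# X axis: the ranges in order, including empty ones.
for low in range(51, 100, 5):
    label = f"{low}-{low + 4}"
    print(f"{label:>6} {'#' * freq[label]}")
```

Empty ranges are printed too, since a gap in the X axis is part of what a histogram shows.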
Example from the book
Visits to Starbucks
Another Example
[Figure: frequency histogram. X axis: Exam Grade. Y axis: Frequency, 0 to 18]
Was this an easy exam or a hard exam?
[Figure: frequency histogram. X axis: Exam Grade. Y axis: Frequency, 0 to 18]
Would you rather be in the blue class or the orange class?
[Figure: two frequency histograms, one for the blue class and one for the orange class. X axis: Exam Grade, in bins 51‐55 through 96‐100. Y axis: Frequency, 0 to 18]
By the way: outliers
[Figure: frequency histogram. X axis: Exam Grade. Y axis: Frequency, 0 to 18. One value is labeled OUTLIER]
If there’s time, let’s make a frequency histogram
• Everybody: What’s your height in feet‐inches?
• (Example: I’m 5’9”)
Any questions?
Line Graph
• Shows trends from left‐to‐right
• The trend is usually over time
• But it doesn’t have to be…
Example Line Graph
http://www.wilderdom.com/personality/L4‐1IntelligenceNatureVsNurture.html
Used under Creative Commons License
Example Line Graph (VanLehn, 2011)
(This graph shows perceptions, not data on effectiveness.)
Any questions?
Not going to discuss today
• Stem‐and‐leaf plot
• Very, very rare to see in actual use
• Quite poor for any sizable data set
• If you want to learn about them, see the book
Future Classes
• Scatterplot
• Box plot
Upcoming Classes
• 1/28 Describing Data with Numerical Measures
– Ch. 2
• 2/2 Describing Bivariate Data (Asgn. 1 due)
– Ch. 3
• 2/4 Introduction to Probability
– Ch. 4
Questions? Comments?