Analysis of Variance (ANOVA)
Lecture 26
March 30, 2017
Four Stages of Statistics
• Data Collection ✓
• Displaying and Summarizing Data ✓
• Probability ✓
• Inference
  • One Quantitative ✓
  • One Categorical ✓
  • One Categorical and One Quantitative
    • Matched Pairs Test ✓
    • Difference of Two Means Test ✓
    • ANOVA and Multiple Comparisons
  • Two Categorical
  • Two Quantitative
F-Distribution
• F-Distribution: continuous probability distribution with the following properties:
• Unimodal and right-skewed
• Always non-negative
• Two parameters for degrees of freedom: one for the numerator, one for the denominator
• Changes shape depending on degrees of freedom
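A minimal sketch of the properties above, in plain Python with only the standard library. It evaluates the standard F density formula; the degree-of-freedom pairs are illustrative choices, not values from the lecture, and the helper names are hypothetical. The density is zero for negative values, has a single peak, and the peak (mode) sits to the left of the mean, which is consistent with a right-skewed shape. The peak location changes with the df.

```python
import math

def f_pdf(x: float, d1: int, d2: int) -> float:
    """Density of the F-distribution with d1 numerator and d2 denominator df.

    Assumes d1 >= 2, so the density is finite at x = 0.
    """
    if x < 0:
        return 0.0  # no probability on negative values
    if x == 0:
        return 1.0 if d1 == 2 else 0.0  # limiting value at the origin
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_density = (
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )
    return math.exp(log_density)

def f_mode(d1: int, d2: int) -> float:
    """Location of the single peak (valid for d1 > 2)."""
    return (d1 - 2) / d1 * d2 / (d2 + 2)

def f_mean(d2: int) -> float:
    """Mean of the distribution (valid for d2 > 2)."""
    return d2 / (d2 - 2)

# Illustrative (numerator, denominator) df pairs: the shape shifts with the df
for d1, d2 in [(4, 10), (6, 11), (10, 30)]:
    grid = [i / 100 for i in range(501)]  # x from 0 to 5
    densities = [f_pdf(x, d1, d2) for x in grid]
    peak_x = grid[densities.index(max(densities))]
    print(f"df=({d1}, {d2}): peak near {peak_x:.2f}, "
          f"mode={f_mode(d1, d2):.3f}, mean={f_mean(d2):.3f}")
```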
Examples of F-Distribution
Example of F-Table
[Figure: excerpt of an F-table]
Example #1: F-Table
• Question: What is the critical value of the F-distribution with 6 df in the numerator, 11 df in the denominator, and 5% of the area in the upper tail?
• Answer: ____________________
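Instead of reading the F-table, the same upper-tail value can be found numerically. The sketch below is one possible approach, not the method used to build printed tables: it integrates the F density with Simpson's rule and bisects for the point that leaves 5% of the area above it. The helper names are hypothetical, and the density code assumes at least 2 numerator df. For 6 and 11 df the result should agree with the F-table entry to two decimals.

```python
import math

def f_pdf(x, d1, d2):
    """F density with d1 numerator and d2 denominator df (assumes d1 >= 2)."""
    if x <= 0:
        # Density is 0 below 0; at 0 it equals 1 for d1 == 2 and 0 for d1 > 2
        return 1.0 if (x == 0 and d1 == 2) else 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    return math.exp(
        (d1 / 2) * math.log(d1 / d2)
        + (d1 / 2 - 1) * math.log(x)
        - ((d1 + d2) / 2) * math.log1p(d1 * x / d2)
        - log_beta
    )

def f_cdf(x, d1, d2, n=2000):
    """P(F <= x) by composite Simpson's rule on [0, x]; n must be even."""
    if x <= 0:
        return 0.0
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        total += (4 if i % 2 == 1 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_critical(alpha, d1, d2, hi=100.0, tol=1e-6):
    """Value with upper-tail area alpha, found by bisection on the CDF."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid  # too little area to the left: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2

print(round(f_critical(0.05, 6, 11), 2))
```

Simpson's rule with a fixed number of panels is chosen here for brevity; a library routine for the F quantile function would be the more robust choice in practice.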
Motivation: ANOVA
• Scenario: Compare average math SAT scores for 5 students in each of three majors: computer science, economics, and history.
• Question: Does it appear all the means are equal or is there some difference?
• Answer:
  • Means appear ________________
  • But… ____________ exists within each group
  • Need inferential technique that can handle _______________
Analysis of Variance (ANOVA)
• Analysis of Variance (ANOVA): statistical technique used to compare the means of three or more populations
• Differs from the difference of two means test, which compares only two populations
• Uses two sources of variability to compare means:
  • Between group
  • Within group
Types of Variation
• Between Group Variation: measures the amount of variation between the means of the individual groups
• “How different are the sample means from one another?”
• Within Group Variation: measures the amount of variation that exists in the samples
• “How different are the observations from each other within the individual samples?”
Comparing Types of Variation
Small Between Group
• Means ____________________
Large Within Group
• Observations within groups ____________________

Large Between Group
• Means _____________________
Small Within Group
• Observations within groups ____________________
ANOVA: Hypotheses and Conditions
• Used For: comparing whether the means of three or more independent groups are all equal or whether at least two of the means differ
• Hypotheses:
  • $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
  • $H_a$: At least two means are not equal.
• Conditions:
  • Data come from normally distributed populations
  • Variances of populations approximately equal
  • Sampled observations independent
• $k$: number of groups being compared
Shortcut: Assume all of the conditions for ANOVA hold.
Example #2: ANOVA
• Scenario: Compare average math SAT scores for 5 students in each of three majors: computer science, economics, and history.
• Question: Are the mean math SAT scores the same in all three majors, using $\alpha = 0.05$?
• Hypothesis Test:
  1. Hypotheses:
    • $H_0$: _____________________________________________________
    • $H_a$: _____________________________________________________
  2. Conditions: _____________________________________________
ANOVA: Types of Variation
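• Between Group Variation:
$$SST = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$
where $n_i$ is the sample size and $\bar{x}_i$ the sample mean of group $i$
• Grand Mean: mean of all observations
$$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$
• Within Group Variation:
$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$
where $s_i^2$ is the sample variance of group $i$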
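These formulas work from group summaries (sizes, means, standard deviations) rather than from every observation. A minimal sketch with made-up data (the three groups below are hypothetical) computes the grand mean, the between group variation (SST), and the within group variation (SSE) from the summaries. It then cross-checks each one against the same quantity written directly in terms of the individual observations.

```python
from statistics import mean, stdev

def grand_mean(ns, means):
    """Weighted average of the group means = mean of all observations."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)

def sst(ns, means):
    """Between group variation: sum of n_i * (xbar_i - grand mean)^2."""
    g = grand_mean(ns, means)
    return sum(n * (m - g) ** 2 for n, m in zip(ns, means))

def sse(ns, sds):
    """Within group variation: sum of (n_i - 1) * s_i^2."""
    return sum((n - 1) * s ** 2 for n, s in zip(ns, sds))

# Hypothetical raw data for three groups (illustration only)
groups = [[4, 6, 5], [7, 9, 8, 8], [2, 3, 1]]
ns = [len(g) for g in groups]
means = [mean(g) for g in groups]
sds = [stdev(g) for g in groups]  # sample sd, n - 1 in the denominator

# The same quantities written directly in terms of the observations
all_obs = [x for g in groups for x in g]
overall = mean(all_obs)
sst_direct = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
sse_direct = sum((x - mean(g)) ** 2 for g in groups for x in g)

print(grand_mean(ns, means), overall)
print(sst(ns, means), sst_direct)
print(sse(ns, sds), sse_direct)
```

Each pair of printed values should agree up to floating-point rounding, which is why the summary-statistic versions are enough for hand calculation.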
Example #2: ANOVA
• Data:
  Comp. Sci.   Economics   History
  680          600         410
  800          680         480
  660          550         650
  750          730         540
  710          640         570
Example #2: ANOVA
• Statistics:
  Statistic     Comp. Sci.   Economics   History
  Mean          720          640         530
  Std. Dev.     56.12        69.64       90.83
  Sample Size   5            5           5
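The statistics table can be reproduced from the raw scores with Python's `statistics` module, where `stdev` uses the sample (n − 1) denominator. One assumption is labeled in the code: the first History score is taken as 410, the only single-score value consistent with the reported History mean of 530 and standard deviation of 90.83.

```python
from statistics import mean, stdev

# Raw scores from the data slide. Assumption: the first History score is 410,
# the value consistent with the reported History mean (530) and sd (90.83).
scores = {
    "Comp. Sci.": [680, 800, 660, 750, 710],
    "Economics": [600, 680, 550, 730, 640],
    "History": [410, 480, 650, 540, 570],
}

for major, xs in scores.items():
    # Sample mean, sample standard deviation, and sample size per major
    print(f"{major:<11} mean={mean(xs):.0f}  sd={stdev(xs):.2f}  n={len(xs)}")
```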
Example #2: ANOVA
• Grand Mean:
$\bar{\bar{x}}$ = ___________________________________________________
• Between Group Variation:
$SST$ = ____________________________________________
= ____________________________________________
= ____________
• Within Group Variation:
$SSE$ = ___________________________________________
= ___________________________________________
= ____________
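The worksheet quantities can be checked by plugging the Example #2 summary statistics into the grand mean, SST, and SSE formulas. A short sketch follows. Because the table's standard deviations are rounded to two decimals, SSE comes out slightly below 65000.

```python
# Summary statistics from the Example #2 table
ns = [5, 5, 5]
means = [720, 640, 530]
sds = [56.12, 69.64, 90.83]

grand = sum(n * m for n, m in zip(ns, means)) / sum(ns)       # grand mean
sst = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))    # between group
sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))          # within group

print(f"grand mean = {grand:.1f}")  # 630.0
print(f"SST = {sst:.1f}")          # 91000.0
print(f"SSE = {sse:.1f}")          # 64997.1 with the rounded sds
```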
ANOVA: Test Statistic
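• Mean Squared Treatment
$$MST = \frac{SST}{k - 1}$$
• Mean Squared Error
$$MSE = \frac{SSE}{N - k}$$
• Test Statistic:
$$F = \frac{MST}{MSE}$$
• Follows F-distribution with $k - 1$ df in numerator and $N - k$ df in denominator, where $N = n_1 + n_2 + \cdots + n_k$ is the total number of observations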
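As a sketch, the three formulas translate into one small function. The function and argument names are hypothetical, and the sums of squares in the usage line are made-up numbers, not lecture data.

```python
def anova_f(sst: float, sse: float, k: int, n_total: int):
    """Return (F, numerator df, denominator df) from the two sums of squares."""
    df_num = k - 1            # between group df
    df_den = n_total - k      # within group df
    mst = sst / df_num        # mean squared treatment
    mse = sse / df_den        # mean squared error
    return mst / mse, df_num, df_den

# Hypothetical inputs: 4 groups, 20 observations in total
f_stat, df1, df2 = anova_f(sst=120.0, sse=160.0, k=4, n_total=20)
print(f_stat, df1, df2)  # 4.0 3 16
```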
Example #2: ANOVA
• Hypothesis Test:
  3. Test Statistic: Need:
    • Degrees of freedom
      • Numerator: ______________________________
      • Denominator: ______________________________
    • Between group variation: ____________________
    • Within group variation: ____________________
Example #2: ANOVA
• Hypothesis Test:
  3. Test Statistic:
     $F$ = _________________________________________________
  4. Critical Value: ____________________
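For Example #2, the pieces needed in steps 3 and 4 can be computed directly. This sketch takes SST = 91000 and SSE ≈ 65000, the values the group summary statistics give. For the critical value it uses a closed form that holds only because the numerator has exactly 2 df (k = 3 groups). With other df, an F-table or a numerical routine is needed instead.

```python
# Example #2 values: k = 3 majors, N = 15 students
k, n_total = 3, 15
sst, sse = 91000.0, 65000.0   # SSE rounded; the rounded sds give about 64997
alpha = 0.05

df_num, df_den = k - 1, n_total - k          # 2 and 12
f_stat = (sst / df_num) / (sse / df_den)      # MST / MSE

# With exactly 2 numerator df the upper tail has a closed form,
# P(F > x) = (1 + 2x/df_den) ** (-df_den / 2), which can be inverted directly.
critical = (df_den / 2) * (alpha ** (-2 / df_den) - 1)

print(f"df = ({df_num}, {df_den}), F = {f_stat:.2f}, critical value = {critical:.2f}")
```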
ANOVA: Decision and Conclusion
• Decision: Reject $H_0$ for large values of the test statistic
  • A large value means the between group variation (difference between means) is large relative to the within group variation.
• Conclusion: Either…
  • Not enough evidence that any means differ (fail to reject $H_0$)
  • At least two means differ (reject $H_0$)
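The decision rule only looks at the upper tail. A minimal sketch with hypothetical function names turns it into code. It also reports the p-value (the upper-tail area beyond the observed F), again using the closed form that is valid only for 2 numerator df. The inputs are the values Example #2's summary statistics produce: F = 8.4 with (2, 12) df and a 5% critical value of about 3.89.

```python
def decide(f_stat: float, critical: float) -> str:
    """ANOVA rejects H0 only for large F (upper-tail rejection region)."""
    if f_stat > critical:
        return "Reject H0: at least two means differ"
    return "Fail to reject H0: not enough evidence that any means differ"

def upper_tail_2df(x: float, df_den: int) -> float:
    """P(F > x) for an F-distribution with 2 numerator df (closed form)."""
    return (1 + 2 * x / df_den) ** (-df_den / 2)

# Example #2: F = 8.4 with (2, 12) df, critical value about 3.89 at alpha = 0.05
print(decide(8.4, 3.89))
print(f"p-value = {upper_tail_2df(8.4, 12):.4f}")
```

Comparing F with the critical value and comparing the p-value with α lead to the same decision.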
Example #2: ANOVA
• Hypothesis Test:
  5. Decision: _____________________ (_______________________)
  6. Conclusion: ____________________________________________ __________________________________________________________
[Figure: F-distribution with the rejection region shaded in the upper tail; the test statistic is marked as _______________]
Drawback of ANOVA
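• When we reject $H_0$, we only conclude "at least two means are not equal"
• Problem: Many different ways of rejecting $H_0$:
  • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 = \mu_3$
  • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$
  • $\mu_1 = \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$
  • $\mu_1 \neq \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$
  • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$
  • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 \neq \mu_3$
  • $\mu_1 \neq \mu_2 \neq \mu_3$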
Looking Ahead: ________________________ (next class) will tell us which of these scenarios is true.
• _______________________ not equal
• _______________________ not equal
• ________________ are equal
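To see why a rejection leaves many possibilities, a short sketch lists every pair of means that could differ. With k groups there are k(k − 1)/2 pairs, and each pair may or may not be equal. The labels are hypothetical placeholders. The procedure that actually tests each pair is the topic of the next class and is not shown here.

```python
from itertools import combinations

def pairwise_comparisons(labels):
    """All pairs of group means that could differ."""
    return list(combinations(labels, 2))

pairs = pairwise_comparisons(["mu1", "mu2", "mu3"])
print(pairs)  # [('mu1', 'mu2'), ('mu1', 'mu3'), ('mu2', 'mu3')]

# Number of pairs grows as k(k - 1)/2
for k in (3, 4, 5):
    print(k, len(pairwise_comparisons(range(k))))
```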
Example #3: Role of Mean
• Scenario: Two sets of boxplots have the same spreads but different sample means.
• Question: What impact does having more spread-out means have on rejecting $H_0$?
• Answer: _____________________________________________
  • Between group variation ______________________
  • Within group variation ______________________
  • Test statistic ______________
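A numeric illustration of this example, as a sketch with hypothetical groups: sizes and standard deviations stay fixed, and only the group means move farther apart. The helper computes the one-way ANOVA F statistic from group sizes, means, and standard deviations. Wider means raise SST while SSE is unchanged, so F grows.

```python
def f_from_summary(ns, means, sds):
    """One-way ANOVA F statistic computed from group sizes, means, and sds."""
    k, n_total = len(ns), sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    sst = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))   # between group
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))         # within group
    return (sst / (k - 1)) / (sse / (n_total - k))

# Hypothetical groups: same sizes and spreads, only the means move apart
ns, sds = [5, 5, 5], [10, 10, 10]
close_means = f_from_summary(ns, [30, 40, 50], sds)
spread_means = f_from_summary(ns, [20, 40, 60], sds)
print(close_means, spread_means)  # 5.0 20.0
```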
Example #4: Role of Standard Deviation
• Scenario: Two sets of boxplots, one on the left and one on the right, both have means of 30, 40, and 50 but different standard deviations.
• Question: What impact does having smaller standard deviations have on rejecting $H_0$?
• Answer: _____________________________________________
  • Between group variation ______________________
  • Within group variation ______________________
  • Test statistic ______________
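The same kind of sketch for this example keeps the means at 30, 40, and 50 and shrinks the standard deviations. The group sizes and standard deviations are hypothetical. Smaller spreads lower SSE while SST is unchanged, so F grows.

```python
def f_from_summary(ns, means, sds):
    """One-way ANOVA F statistic computed from group sizes, means, and sds."""
    k, n_total = len(ns), sum(ns)
    grand = sum(n * m for n, m in zip(ns, means)) / n_total
    sst = sum(n * (m - grand) ** 2 for n, m in zip(ns, means))   # between group
    sse = sum((n - 1) * s ** 2 for n, s in zip(ns, sds))         # within group
    return (sst / (k - 1)) / (sse / (n_total - k))

# Means 30, 40, 50 as in the example; the standard deviations are hypothetical
ns, means = [5, 5, 5], [30, 40, 50]
large_sd = f_from_summary(ns, means, [10, 10, 10])
small_sd = f_from_summary(ns, means, [5, 5, 5])
print(large_sd, small_sd)  # 5.0 20.0
```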
Summary
• ANOVA: used to compare means of three or more independent populations
• Compare between group and within group variation to calculate the test statistic
• Test Statistic: $F = \frac{MST}{MSE}$ with $k - 1$ and $N - k$ df
• Conclusion: either there is not enough evidence that any means differ, or at least two means differ
• Drawback: Cannot tell which means differ (yet)