
  • Analysis of Variance (ANOVA)

    Lecture 26

    March 30, 2017

    Four Stages of Statistics

    • Data Collection

    • Displaying and Summarizing Data

    • Probability

    • Inference

      • One Quantitative

      • One Categorical

      • One Categorical and One Quantitative

        • Matched Pairs Test

        • Difference of Two Means Test

        • ANOVA and Multiple Comparisons

      • Two Categorical

      • Two Quantitative

    F-Distribution

    • F-Distribution: continuous probability distribution with the following properties:

    • Unimodal and right-skewed

    • Always non-negative

    • Two parameters for degrees of freedom

      • One for numerator, one for denominator

    • Changes shape depending on degrees of freedom

    Examples of F-Distribution
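The figure that accompanied this slide did not survive in the transcript. As a rough stand-in, the sketch below evaluates the textbook F density at a few points for several degrees-of-freedom pairs, so the non-negative, right-skewed shape and its dependence on the df can be seen in numbers. The function name `f_pdf` and the df pairs are illustrative choices, not taken from the lecture.

```python
import math

def f_pdf(x, df_num, df_den):
    """Density of the F-distribution at x (textbook formula, computed via log-gamma)."""
    if x <= 0:
        return 0.0  # simplification for this sketch: report 0 at and below zero
    a, b = df_num / 2.0, df_den / 2.0
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    log_density = (a * math.log(df_num / df_den)
                   + (a - 1.0) * math.log(x)
                   - (a + b) * math.log(1.0 + df_num * x / df_den)
                   - log_beta)
    return math.exp(log_density)

# Hypothetical degrees-of-freedom pairs, chosen only to show the change in shape
for df_num, df_den in [(2, 12), (6, 11), (10, 30)]:
    values = [round(f_pdf(x, df_num, df_den), 3) for x in (0.5, 1.0, 2.0, 4.0)]
    print(f"df = ({df_num}, {df_den}): density at x = 0.5, 1, 2, 4 -> {values}")
```

In each row the density falls off slowly to the right, which is the long right tail the properties above describe.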

  • Example of F-Table

    Example #1: F-Table

    • Question: What is the F-statistic with 6 df in the numerator, 11 df in the denominator, and 5% of the area in the upper tail?

    • Answer: ____________________

    _______
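An F-table lookup can be cross-checked numerically. The sketch below is one possible way to do it with only the standard library: it integrates the F density with Simpson's rule and then uses bisection to find the value with the requested upper-tail area. The function names, the step count, and the restriction to a numerator df of at least 2 are assumptions of this sketch, not part of the lecture.

```python
import math

def f_density(x, df_num, df_den):
    """F-distribution density at x >= 0 (this sketch assumes df_num >= 2)."""
    a, b = df_num / 2.0, df_den / 2.0
    log_const = (a * math.log(df_num / df_den)
                 - (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)))
    return math.exp(log_const) * x ** (a - 1.0) * (1.0 + df_num * x / df_den) ** (-(a + b))

def upper_tail_area(x, df_num, df_den, steps=2000):
    """Approximate P(F > x) as 1 minus a Simpson's-rule integral of the density on [0, x]."""
    if x <= 0:
        return 1.0
    h = x / steps  # steps must be even for Simpson's rule
    total = f_density(0.0, df_num, df_den) + f_density(x, df_num, df_den)
    for i in range(1, steps):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * f_density(i * h, df_num, df_den)
    return 1.0 - total * h / 3.0

def f_critical_value(alpha, df_num, df_den):
    """Find the x with P(F > x) = alpha by bisection."""
    if df_num < 2:
        raise ValueError("this sketch only handles df_num >= 2")
    low, high = 0.0, 1.0
    while upper_tail_area(high, df_num, df_den) > alpha:
        high *= 2.0  # widen the bracket until the tail area drops below alpha
    for _ in range(50):
        mid = (low + high) / 2.0
        if upper_tail_area(mid, df_num, df_den) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0

# Example #1: 6 df in the numerator, 11 df in the denominator, 5% in the upper tail
print(round(f_critical_value(0.05, 6, 11), 2))
```

The printed value should agree with the F-table entry to the table's two decimal places.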

    Motivation: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Does it appear all the means are equal or is there some difference?

    • Answer:

      • Means appear _____________________________________

      • But… ____________ exists within each group

      • Need inferential technique that can handle _______________

    Analysis of Variance (ANOVA)

    • Analysis of Variance (ANOVA): statistical technique used to compare the means of three or more populations

    • Different from difference of two means test

    • Use two sources of variability to compare means

      • Between group

      • Within group

  • Types of Variation

    • Between Group Variation: measures the amount of variation between the means of the individual groups

    • “How different are the sample means from one another?”

    • Within Group Variation: measures the amount of variation that exists in the samples

    • “How different are the observations from each other within the individual samples?”

    Comparing Types of Variation

    Small Between Group

      • Means ____________________

    Large Within Group

      • Observations within groups ____________________

    Large Between Group

      • Means _____________________

    Small Within Group

      • Observations within groups ____________________
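To make the two situations concrete, the sketch below computes both kinds of variation for two small data sets. The data are invented purely for illustration, and the function names are this sketch's own. Between group variation is taken as the size-weighted squared distance of each group mean from the mean of all observations. Within group variation is taken as the squared distance of each observation from its own group mean.

```python
from statistics import mean

def between_group_variation(groups):
    """Sum over groups of n_i * (group mean - grand mean)^2."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    return sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

def within_group_variation(groups):
    """Sum of squared deviations of each observation from its own group mean."""
    return sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)

# Hypothetical data, invented for illustration only
similar_means_wide_spread = [[10, 30, 50], [12, 32, 52], [8, 28, 48]]
different_means_tight_spread = [[9, 10, 11], [29, 30, 31], [49, 50, 51]]

for label, groups in [("small between, large within", similar_means_wide_spread),
                      ("large between, small within", different_means_tight_spread)]:
    print(label, between_group_variation(groups), within_group_variation(groups))
```

In the first data set the group means nearly coincide while the observations are spread out. In the second the roles are reversed.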

    ANOVA: Hypotheses and Conditions

    • Used For: Comparing if the means of three or more independent groups are equal or if some difference exists between a pair of means

    • Hypotheses:

      • $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, where $k$ is the number of groups being compared

      • $H_A$: At least two means are not equal.

    • Conditions:

      • Data comes from normally distributed populations

      • Variances of populations approximately equal

      • Sampled observations independent

    Shortcut: Assume all of the conditions for ANOVA hold.

    Example #2: ANOVA

    • Scenario: Compare average math SAT scores for 5 students in three different majors: computer science, economics, and history.

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

      1. Hypotheses:

        • $H_0$: _____________________________________________________

        • $H_A$: _____________________________________________________

      2. Conditions: _____________________________________________

  • ANOVA: Types of Variation

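    • Between Group Variation:

      $$SST = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

      where $n_i$ is the sample size and $\bar{x}_i$ is the sample mean of group $i$

    • Grand Mean: mean of all observations

      $$\bar{\bar{x}} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \cdots + n_k\bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

    • Within Group Variation:

      $$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2$$

      where $n_i$ is the sample size and $s_i^2$ is the sample variance of group $i$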

    Example #2: ANOVA

    • Data:

      Comp. Sci.   Economics   History
      680          600         410
      800          680         480
      660          550         650
      750          730         540
      710          640         570

    Example #2: ANOVA

    • Statistics:

    Statistic     Comp. Sci.   Economics   History
    Mean          720          640         530
    Std. Dev.     56.12        69.64       90.83
    Sample Size   5            5           5

    Example #2: ANOVA

    • Grand Mean:

      $\bar{\bar{x}}$ = ___________________________________________________

    • Between Group Variation:

      $SST$ = ____________________________________________

      = ____________________________________________

      = ____________

    • Within Group Variation:

      $SSE$ = ___________________________________________

      = ___________________________________________

      = ____________
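The three quantities on this slide can be checked with a few lines of code. The sketch below uses only the summary statistics given for Example #2 (means, standard deviations, sample sizes). The variable names `sst` and `sse` are this sketch's labels for between group and within group variation. Because the standard deviations are rounded to two decimals, the within group figure is approximate.

```python
# Summary statistics for Example #2 (Comp. Sci., Economics, History)
means = [720, 640, 530]
std_devs = [56.12, 69.64, 90.83]
sizes = [5, 5, 5]

# Grand mean: sample means weighted by sample size
grand_mean = sum(n * m for n, m in zip(sizes, means)) / sum(sizes)

# Between group variation: n_i * (mean_i - grand mean)^2, summed over groups
sst = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))

# Within group variation: (n_i - 1) * s_i^2, summed over groups
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, std_devs))

print(grand_mean, sst, round(sse, 2))
```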

  • ANOVA: Test Statistic

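    • Mean Squared Treatment: $MST = \frac{SST}{k - 1}$

    • Mean Squared Error: $MSE = \frac{SSE}{n - k}$

    • Test Statistic: $F = \frac{MST}{MSE}$

      • Follows F-distribution with $k - 1$ df in numerator and $n - k$ df in denominator, where $k$ is the number of groups and $n$ is the total number of observations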

    Example #2: ANOVA

    • Hypothesis Test:

      3. Test Statistic: Need:

        • Degrees of freedom

          • Numerator: ______________________________

          • Denominator: ______________________________

        • Between group variation: ____________________

        • Within group variation: ____________________

    Example #2: ANOVA

    • Hypothesis Test:

      3. Test Statistic:

         $F$ = _________________________________________________

      4. Critical Value: ____________________
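As a check on step 3, the sketch below goes from per-group summary statistics to the degrees of freedom and the test statistic in one function. The function name `anova_f_statistic` is hypothetical. It assumes the inputs are sample means, sample standard deviations, and sample sizes, and that between group variation is divided by (number of groups − 1) and within group variation by (total observations − number of groups).

```python
def anova_f_statistic(means, std_devs, sizes):
    """Return (numerator df, denominator df, F) from per-group summary statistics."""
    k = len(means)   # number of groups
    n = sum(sizes)   # total number of observations
    grand_mean = sum(ni * m for ni, m in zip(sizes, means)) / n
    between = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
    within = sum((ni - 1) * s ** 2 for ni, s in zip(sizes, std_devs))
    mean_sq_treatment = between / (k - 1)
    mean_sq_error = within / (n - k)
    return k - 1, n - k, mean_sq_treatment / mean_sq_error

# Example #2 summary statistics: Comp. Sci., Economics, History
df_num, df_den, f_stat = anova_f_statistic([720, 640, 530],
                                           [56.12, 69.64, 90.83],
                                           [5, 5, 5])
print(df_num, df_den, round(f_stat, 2))
```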

    ANOVA: Decision and Conclusion

    • Decision: Reject $H_0$ for large values of the test statistic

      • Implies between group variation (difference between means) is large relative to within group variation.

    • Conclusion: Either…

      • No difference between any means

      • At least two means differ

  • Example #2: ANOVA

    • Question: Are the mean math SAT scores the same in all three majors using $\alpha = .05$?

    • Hypothesis Test:

      5. Decision: _____________________ (_______________________)

      6. Conclusion: ______________________________________________________________________________________________________

    [Figure: F-distribution curve with the test statistic (_______________) and the rejection region marked]
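With three groups the numerator has 2 df. In that special case the upper-tail area of the F-distribution has a simple closed form, P(F > x) = (1 + 2x/d)^(−d/2) for d denominator df. The sketch below uses that fact to compare a test statistic with the critical value at the 5% level. The function names are hypothetical, and the test statistic value 8.40 is an assumption of this sketch: it is MST/MSE computed from the Example #2 summary statistics and rounded.

```python
def upper_tail_area_df2(x, df_den):
    """P(F > x) when the numerator has exactly 2 df (closed form for this special case)."""
    return (1.0 + 2.0 * x / df_den) ** (-df_den / 2.0)

def critical_value_df2(alpha, df_den):
    """Value whose upper-tail area is alpha, again only for 2 numerator df."""
    return (df_den / 2.0) * (alpha ** (-2.0 / df_den) - 1.0)

alpha = 0.05
df_den = 15 - 3   # total observations minus number of groups
f_stat = 8.40     # assumption: MST / MSE from the summary statistics, rounded

critical = critical_value_df2(alpha, df_den)
p_value = upper_tail_area_df2(f_stat, df_den)
decision = "reject H0" if f_stat > critical else "fail to reject H0"
print(round(critical, 2), round(p_value, 4), decision)
```

The closed form applies only when the numerator df is 2. For other df, an F-table or statistical software is needed.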

    Drawback of ANOVA

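    • When we reject $H_0$, we only conclude “at least two means are not equal”

    • Problem: Many different ways of rejecting $H_0$

      • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 = \mu_3$

      • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$

      • $\mu_1 = \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$

      • $\mu_1 \neq \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 = \mu_3$

      • $\mu_1 \neq \mu_2$, $\mu_1 = \mu_3$, $\mu_2 \neq \mu_3$

      • $\mu_1 = \mu_2$, $\mu_1 \neq \mu_3$, $\mu_2 \neq \mu_3$

      • $\mu_1 \neq \mu_2 \neq \mu_3$

    Looking Ahead: ________________________ (next class) will tell us which of these scenarios is true.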

    _______________________ not equal

    _______________________ not equal

    ________________ are equal
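The count of seven scenarios for three groups can be reproduced by enumeration. The sketch below lists every pair of means, marks each pair as equal or not equal, and drops the one pattern in which all pairs are equal. The labels `mu1`, `mu2`, `mu3` are placeholders for the three population means. The last pattern printed, with all three pairs unequal, corresponds to the scenario in which all three means differ.

```python
from itertools import combinations, product

groups = ["mu1", "mu2", "mu3"]
pairs = list(combinations(groups, 2))  # (mu1, mu2), (mu1, mu3), (mu2, mu3)

# Each pair is either "=" or "!="; drop the single pattern where every pair is equal
patterns = [p for p in product(["=", "!="], repeat=len(pairs)) if "!=" in p]

for pattern in patterns:
    print(", ".join(f"{a} {rel} {b}" for (a, b), rel in zip(pairs, pattern)))
print(len(pairs), "pairs,", len(patterns), "ways for at least one pair to differ")
```

Changing `groups` to a longer list shows how quickly the number of pairs grows as more groups are compared.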

    Example #3: Role of Mean

    • Scenario: Boxplots could have same spreads but different sample means.

    • Question: What impact does having more spread out means have on rejecting $H_0$?

    • Answer: _____________________________________________

      • Between group variation ______________________

      • Within group variation ______________________

      • Test statistic ______________

    Example #4: Role of Standard Deviation

    • Scenario: Boxplots could have means of 30, 40, and 50 on left or right depending on std. devs.

    • Question: What impact does having smaller standard deviations have on rejecting $H_0$?

    • Answer: _____________________________________________

      • Between group variation ______________________

      • Within group variation ______________________

      • Test statistic ______________
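The two questions above can be explored numerically. The sketch below starts from the means 30, 40, and 50 mentioned in Example #4 and recomputes the test statistic after shrinking the standard deviations and, separately, after spreading the means further apart. The sample sizes, the standard deviations, and the wider set of means are hypothetical values chosen for illustration, and the helper name `f_statistic` is this sketch's own.

```python
def f_statistic(means, std_devs, sizes):
    """Ratio of mean squared treatment to mean squared error from summary statistics."""
    k, n = len(means), sum(sizes)
    grand_mean = sum(ni * m for ni, m in zip(sizes, means)) / n
    between = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(sizes, means))
    within = sum((ni - 1) * s ** 2 for ni, s in zip(sizes, std_devs))
    return (between / (k - 1)) / (within / (n - k))

sizes = [10, 10, 10]  # hypothetical sample sizes

baseline = f_statistic([30, 40, 50], [10, 10, 10], sizes)    # hypothetical std. devs.
smaller_sd = f_statistic([30, 40, 50], [5, 5, 5], sizes)     # same means, tighter groups
wider_means = f_statistic([20, 40, 60], [10, 10, 10], sizes) # same spread, means further apart

print(baseline, smaller_sd, wider_means)
```

Comparing the three printed values shows which way the test statistic moves in each case.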

  • Summary

    • ANOVA: used to compare means of three or more independent populations

    • Compare between group and within group variation to calculate test statistic

    • Test Statistic: $F = \frac{MST}{MSE}$ with $k - 1$ and $n - k$ df

    • Conclusion: either all means are equal or at least two means differ

    • Drawback: Cannot tell which means differ (yet)