Swiss Fertility and Socioeconomic Indicators (1888) Data
Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.
A data frame with 47 observations on 6 variables, each of which is in percent:
• Fertility: Ig, ‘common standardized fertility measure’
• Agriculture: % of males involved in agriculture as occupation
• Examination: % draftees receiving highest mark on army examination
• Education: % education beyond primary school for draftees
• Catholic: % ‘catholic’ (as opposed to ‘protestant’)
• Infant.Mortality: live births who live less than 1 year
Swiss Fertility and Socioeconomic Indicators (1888) Data
•Let's look at the relationship:
• between Fertility and Education
• between Fertility and Examination
• between Fertility and Agriculture
•Which one would you use to predict Fertility? Why?
Inference for Two-way tables
Statistics 111 - Lecture 18
Example: Fright symptoms and gender
•There is a growing body of literature demonstrating that early exposure to frightening movies is associated with lingering fright symptoms.
•117 college students were asked to write narrative accounts of their exposure to frightening movies before the age of 13.
•Here is a table summarizing the results by gender:
                          Male   Female
Ongoing fright symptoms   7      29
No fright symptoms        31     50
Example: Fright symptoms and gender
•Here is the same table, but presented in percents conditional on gender:
                          Male   Female
Ongoing fright symptoms   18%    37%
No fright symptoms        82%    63%
Is there an association between gender and having fright symptoms?
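The conditional percents can be reproduced from the observed counts by dividing each cell by its column (gender) total. A minimal Python sketch follows; the function name `column_percents` and the dictionary layout are illustrative choices, not part of the lecture.

```python
# Observed counts from the lecture table: rows are symptom status, columns are gender.
observed = {
    "Ongoing fright symptoms": {"Male": 7, "Female": 29},
    "No fright symptoms": {"Male": 31, "Female": 50},
}


def column_percents(table):
    """Return each cell as a percent of its column total."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        label: {c: 100 * row[c] / col_totals[c] for c in columns}
        for label, row in table.items()
    }


percents = column_percents(observed)
for label, row in percents.items():
    # Round to whole percents, as on the slide.
    print(label, {c: round(p) for c, p in row.items()})
```

Conditioning on gender makes the two columns directly comparable even though the sample has about twice as many women as men.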
Example: Fright symptoms and gender
•The null hypothesis for two-way table: there is no association between the row variable and the column variable.
•The alternative hypothesis: there is an association between the two variables.
•The null hypothesis: gender and having ongoing fright symptoms are independent!
•The alternative: gender and ongoing fright symptoms are dependent (related).
Example: Fright symptoms and gender
Calculating the test statistic
• We are going to compare the observed cell counts with the expected counts under the assumption that the null hypothesis is true.
                          Male   Female   total
Ongoing fright symptoms   7      29       36
No fright symptoms        31     50       81
total                     38     79       117
Example: Fright symptoms and gender
•What are the expected counts in each cell under the null?
•Under the assumption that gender and having fright symptoms are independent:
•P(no fright symptoms and female)=?
•So the expected number of counts in that cell is:
                          Male     Female   total   proportion
Ongoing fright symptoms   7        29       36      36/117
No fright symptoms        31       50       81      81/117
total                     38       79       117
proportion                38/117   79/117
Example: Fright symptoms and gender
•Expected number of counts in each cell:
                          Male                            Female                          total   proportion
Ongoing fright symptoms   (38/117)*(36/117)*117 = 11.69   (36/117)*(79/117)*117 = 24.31   36      36/117
No fright symptoms        (38/117)*(81/117)*117 = 26.31   (81/117)*(79/117)*117 = 54.69   81      81/117
total                     38                              79                              117
proportion                38/117                          79/117
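Each expected count simplifies to (row total × column total) / grand total, since (38/117)*(36/117)*117 = 38*36/117. A short Python sketch of that calculation; the helper name `expected_counts` is an illustrative choice.

```python
# Observed counts: rows = (ongoing symptoms, no symptoms), columns = (male, female).
observed = [[7, 29], [31, 50]]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])
```

The expected counts keep the same row and column totals as the observed table (36, 81, 38, 79), which is a quick sanity check on the arithmetic.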
Example: Fright symptoms and gender
•The test statistic compares the observed to the expected counts
Expected                  Male    Female   total
Ongoing fright symptoms   11.69   24.31    36
No fright symptoms        26.31   54.69    81
total                     38      79       117
Observed                  Male    Female   total
Ongoing fright symptoms   7       29       36
No fright symptoms        31      50       81
total                     38      79       117
Example: Fright symptoms and gender
•The test statistic $X^2$ compares the observed to the expected counts:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(7 - 11.69)^2}{11.69} + \dots + \frac{(50 - 54.69)^2}{54.69} = 4.03$$
•This statistic follows a distribution called chi-square.
•The chi-square distribution has one parameter, called the “degrees of freedom”.
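The sum runs over all four cells of the table. A minimal Python sketch of the calculation for the fright-symptoms data; the function names are illustrative, and the expected counts are recomputed from the totals rather than taken from the rounded values on the slide.

```python
# Observed counts: rows = (ongoing symptoms, no symptoms), columns = (male, female).
observed = [[7, 29], [31, 50]]


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )


x2 = chi_square_statistic(observed, expected_counts(observed))
print(round(x2, 2))
```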
Example: Fright symptoms and gender
•Chi-square distribution
•It’s not a symmetric distribution !
[Figure: chi-square density curves for $\chi^2(2)$ and $\chi^2(4)$]
Example: Fright symptoms and gender
Calculating the p-value
•To calculate the p-value we compare the test statistic with a chi-square distribution with (# of rows - 1)*(# of columns - 1) degrees of freedom.
Example: Fright symptoms and gender
• In our example the degrees of freedom is (2-1)*(2-1)=1!
$$\text{P-value} = P(\chi^2(1) > 4.03) = 0.045$$
•Since the p-value < 0.05, we reject the null at the 5% level. This means that there is enough evidence to suggest an association between gender and fright symptoms.
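The p-value for this example can be checked with the standard library alone. The sketch below relies on a fact that holds only for 1 degree of freedom: a chi-square(1) variable is the square of a standard normal Z, so P(Z^2 > x) = erfc(sqrt(x/2)). The function name is an illustrative choice; for other degrees of freedom a statistics package or a chi-square table would be needed.

```python
import math


def chi_square_p_value_df1(x):
    """Upper-tail probability P(chi-square(1) > x).

    Valid for 1 degree of freedom only: chi-square(1) is Z^2 for a standard
    normal Z, and P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


p_value = chi_square_p_value_df1(4.03)
print(round(p_value, 3))
print("reject the null at the 5% level" if p_value < 0.05 else "do not reject the null")
```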
Example: Fright symptoms and gender
•This method works for any two categorical variables, no matter how many levels they have.
•One catch! Make sure that the expected count in each of the cells is bigger than 5 (this is needed in order for the statistic to follow a chi-square distribution).
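The expected-count condition can be checked before running the test. A hedged Python sketch; the helper names and the small second table are hypothetical, made up only to show a case where the check fails.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_counts_large_enough(table, threshold=5):
    """True if every expected count is bigger than the threshold."""
    return all(e > threshold for row in expected_counts(table) for e in row)


fright_table = [[7, 29], [31, 50]]  # observed counts from the lecture
tiny_table = [[1, 2], [3, 4]]       # hypothetical counts, not from the lecture

print(expected_counts_large_enough(fright_table))
print(expected_counts_large_enough(tiny_table))
```

The check works on the expected counts, not the observed ones, and it applies unchanged to tables with more than two rows or columns.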
Goodness of fit test
Example: Vehicle collisions and cell phones during weekdays
•Are you more likely to have a motor vehicle collision when using a cell phone?
•A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question.
Goodness of fit test
The data is classified by the day of the week
• Are the accidents equally likely to occur on any day of the week?
Sun   Mon   Tue   Wed   Thu   Fri   Sat   Total
20    133   126   159   136   113   12    699
Goodness of fit test
•$H_0$: $p_{sun} = p_{mon} = p_{tue} = p_{wed} = p_{thu} = p_{fri} = p_{sat}$
•$H_a$: otherwise
•What is the alternative saying?
•What are the expected counts for each day of the week under the null?
Goodness of fit test
Expected counts are (699/7 = 99.86 for each day):
Sun     Mon     Tue     Wed     Thu     Fri     Sat     Total
99.86   99.86   99.86   99.86   99.86   99.86   99.86   699
Observed counts are:
Sun     Mon     Tue     Wed     Thu     Fri     Sat     Total
20      133     126     159     136     113     12      699
Goodness of fit
Calculate the test statistic
•The test statistic $X^2$ compares the observed to the expected counts:
$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(20 - 99.86)^2}{99.86} + \dots + \frac{(12 - 99.86)^2}{99.86} = 208.85$$
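Under the null every day has probability 1/7, so each expected count is 699/7. A minimal Python sketch of the goodness-of-fit statistic for the collision data; the variable and function names are illustrative.

```python
# Observed collisions by day of the week, from the lecture table.
observed = {"Sun": 20, "Mon": 133, "Tue": 126, "Wed": 159,
            "Thu": 136, "Fri": 113, "Sat": 12}


def goodness_of_fit_statistic(counts):
    """Chi-square statistic against the null that all categories are equally likely."""
    n = sum(counts)
    expected = n / len(counts)  # same expected count in every category
    return sum((o - expected) ** 2 / expected for o in counts)


expected_per_day = sum(observed.values()) / len(observed)
x2 = goodness_of_fit_statistic(list(observed.values()))
print(round(expected_per_day, 2), round(x2, 2))
```

Sunday and Saturday contribute most of the total, because their observed counts (20 and 12) are far below the roughly 100 expected under the null.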
Calculate the p-value
This statistic follows a chi-square distribution with (k-1) degrees of freedom, where k is the number of categories, which in our case is 7.
$$\text{P-value} = P(\chi^2(6) > 208.85) < 0.0005$$
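The tail probability can be checked without a statistics package, because for an even number of degrees of freedom the chi-square upper tail has a closed form: P(chi-square(2m) > x) = exp(-x/2) * sum over j = 0..m-1 of (x/2)^j / j!. The sketch below uses that identity; the function name is an illustrative choice, and the formula does not apply to odd degrees of freedom.

```python
import math


def chi_square_p_value_even_df(x, df):
    """Upper-tail probability P(chi-square(df) > x) for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2
    term = 1.0   # (x/2)^j / j!, starting at j = 0
    total = 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return math.exp(-half) * total


p_value = chi_square_p_value_even_df(208.85, 6)
print(p_value < 0.0005)
```

The computed value is many orders of magnitude below 0.0005, so the bound on the slide is a conservative way of reporting it.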
Conclusion
These types of accidents are not equally likely to occur on each of the seven days of the week