
Swiss Fertility and Socioeconomic Indicators (1888) Data

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.

A data frame with 47 observations on 6 variables, each of which is in percent:

• Fertility: Ig, ‘common standardized fertility measure’

• Agriculture: % of males involved in agriculture as occupation

• Examination: % draftees receiving highest mark on army examination

• Education: % education beyond primary school for draftees

• Catholic: % ‘catholic’ (as opposed to ‘protestant’)

• Infant.Mortality: live births who live less than 1 year

Swiss Fertility and Socioeconomic Indicators (1888) Data

• Let's look at the relationship:

• between Fertility and Education

• between Fertility and Examination

• between Fertility and Agriculture

•Which one would you use to predict Fertility? Why?


Inference for Two-way tables

Statistics 111 - Lecture 18

Example: Fright symptoms and gender

• There is a growing body of literature demonstrating that early exposure to frightening movies is associated with lingering fright symptoms.

• 117 college students were asked to write narrative accounts of their exposure to frightening movies before the age of 13.

• Here is a table summarizing the results by gender:

                          Male   Female
Ongoing fright symptoms      7       29
No fright symptoms          31       50



• Here is the same table, presented as percentages conditional on gender:

                          Male   Female
Ongoing fright symptoms    18%      37%
No fright symptoms         82%      63%

Is there an association between gender and having fright symptoms?
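The conditional percentages can be reproduced by dividing each count by its column (gender) total. The following is a minimal sketch; the dictionary layout and the helper name `column_percents` are illustrative choices, not part of the lecture.

```python
# Observed counts from the fright-symptoms table, keyed by gender (column).
observed = {
    "Male": {"Ongoing fright symptoms": 7, "No fright symptoms": 31},
    "Female": {"Ongoing fright symptoms": 29, "No fright symptoms": 50},
}


def column_percents(table):
    """Convert each column of counts into percentages of that column's total."""
    result = {}
    for column, cells in table.items():
        total = sum(cells.values())
        result[column] = {row: 100 * count / total for row, count in cells.items()}
    return result


percents = column_percents(observed)
for gender, cells in percents.items():
    for symptom, pct in cells.items():
        print(f"{gender:6s} {symptom:25s} {pct:5.1f}%")
```

Rounded to whole percents, the values match the table: a smaller share of men than women report ongoing symptoms.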


•The null hypothesis for two-way table: there is no association between the row variable and the column variable.

•The alternative hypothesis: there is an association between the two variables.

•The null hypothesis: gender and having ongoing fright symptoms are independent!

•The alternative: gender and ongoing fright symptoms are dependent (related).



Calculating the test statistic

• We are going to compare the observed cell counts with the expected cell counts under the assumption that the null hypothesis is true.

                          Male   Female   total
Ongoing fright symptoms      7       29      36
No fright symptoms          31       50      81
total                       38       79     117


•What are the expected counts in each cell under the null?

• Under the assumption that gender and having fright symptoms are independent:

• P(no fright symptoms and female) = ?

• So the expected count in that cell is:

                          Male     Female   total   proportion
Ongoing fright symptoms   7        29       36      36/117
No fright symptoms        31       50       81      81/117
total                     38       79       117
proportion                38/117   79/117



• Expected count in each cell

                          Male                          Female                        total   proportion
Ongoing fright symptoms   (38/117)*(36/117)*117=11.69   (36/117)*(79/117)*117=24.31   36      36/117
No fright symptoms        (38/117)*(81/117)*117=26.31   (81/117)*(79/117)*117=54.69   81      81/117
total                     38                            79                            117
proportion                38/117                        79/117
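Under independence, each expected count is (row proportion) × (column proportion) × n, which simplifies to (row total × column total) / n. A minimal sketch of that calculation follows; the function name `expected_counts` and the list-of-lists layout are assumptions made for illustration.

```python
# Observed counts: rows are (ongoing symptoms, no symptoms),
# columns are (male, female).
observed = [[7, 29], [31, 50]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(x, 2) for x in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on the arithmetic.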


• The test statistic compares the observed counts to the expected counts

Expected                  Male    Female   total
Ongoing fright symptoms   11.69   24.31    36
No fright symptoms        26.31   54.69    81
total                     38      79       117

Observed                  Male    Female   total
Ongoing fright symptoms   7       29       36
No fright symptoms        31      50       81
total                     38      79       117



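• The test statistic $X^2$ compares the observed to the expected counts:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(7 - 11.69)^2}{11.69} + \dots + \frac{(50 - 54.69)^2}{54.69} = 4.03$$

• When the null hypothesis is true, this statistic approximately follows a distribution called chi-square.

• The chi-square distribution has one parameter, called the “degrees of freedom”.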
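The sum runs over all four cells of the table. A minimal sketch of the full calculation is given below; the function name `chi_square_statistic` is an illustrative choice, and the expected counts are recomputed from the margins rather than taken from the rounded values on the slide.

```python
# Observed counts: rows are (ongoing symptoms, no symptoms),
# columns are (male, female).
observed = [[7, 29], [31, 50]]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            total += (obs - exp) ** 2 / exp
    return total


x2 = chi_square_statistic(observed)
print(round(x2, 2))
```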


•Chi-square distribution

•It’s not a symmetric distribution !

[Figure: chi-square density curves for 2 and 4 degrees of freedom, labeled $\chi^2(2)$ and $\chi^2(4)$]



Calculating the test statistic

• To calculate the p-value we compare the test statistic with a chi-square distribution with (# of rows - 1)*(# of columns - 1) degrees of freedom.


• In our example the degrees of freedom is (2-1)*(2-1)=1!


$$\text{P-value} = P(\chi^2(1) \ge 4.03) = 0.045$$

• Since the p-value < 0.05, we reject the null at the 5% level. This means that there is enough evidence to suggest there is an association between gender and fright symptoms.
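For one degree of freedom the upper-tail probability has a closed form, because a chi-square variable with 1 degree of freedom is the square of a standard normal: P(χ²(1) ≥ x) = erfc(√(x/2)). The sketch below uses only the standard library; the function name is hypothetical, and a statistics package would normally be used instead.

```python
import math


def chi_square_upper_tail_df1(x):
    """P(chi-square(1) >= x), using chi-square(1) = Z^2 for standard normal Z."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return math.erfc(math.sqrt(x / 2))


p_value = chi_square_upper_tail_df1(4.03)
print(round(p_value, 3))
```

This shortcut applies only to 1 degree of freedom, which is the case for any 2×2 table.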


• This method works for any two categorical variables, no matter how many levels they have.

• One catch! Make sure that the expected count in each of the cells is bigger than 5 (this is needed in order for the statistic to follow a chi-square distribution).
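Both the degrees of freedom and the expected-count condition can be checked for a table of any size. The following sketch applies them to the fright-symptoms table; the helper names `degrees_of_freedom` and `smallest_expected_count` are assumptions made for illustration.

```python
# Observed counts: rows are (ongoing symptoms, no symptoms),
# columns are (male, female). Any r x c table of counts works here.
observed = [[7, 29], [31, 50]]


def degrees_of_freedom(table):
    """(# of rows - 1) * (# of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


def smallest_expected_count(table):
    """Smallest expected count under independence across all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


df = degrees_of_freedom(observed)
smallest = smallest_expected_count(observed)
print(df, round(smallest, 2), smallest > 5)
```

Only the smallest expected count needs checking: if it is bigger than 5, every cell is.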


Goodness of fit test

Example: Vehicle collisions and cell phones during weekdays

•Are you more likely to have a motor vehicle collision when using a cell phone?

•A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question.


The data is classified by the day of the week

• Are the accidents equally likely to occur on any day of the week?

Sun   Mon   Tue   Wed   Thu   Fri   Sat   Total
20    133   126   159   136   113   12    699



• $H_0$: $p_{sun} = p_{mon} = p_{tue} = p_{wed} = p_{thu} = p_{fri} = p_{sat}$

• $H_a$: otherwise

• What is the alternative saying?

• What are the expected counts for each day of the week under the null?


Expected counts (699/7 ≈ 99.86 for each day) and observed counts are:

           Sun     Mon     Tue     Wed     Thu     Fri     Sat     Total
Expected   99.86   99.86   99.86   99.86   99.86   99.86   99.86   699
Observed   20      133     126     159     136     113     12      699
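Under the null, every day has probability 1/7, so each expected count is the total number of collisions divided by 7. A short sketch of that arithmetic (the variable names are illustrative):

```python
# Observed collisions by day of the week, from the table above.
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
observed = [20, 133, 126, 159, 136, 113, 12]

n = sum(observed)
# Equal probabilities under the null: each day gets n / 7.
expected = [n / len(days)] * len(days)

for day, obs, exp in zip(days, observed, expected):
    print(f"{day}: observed {obs:3d}, expected {exp:.2f}")
```

Each expected count is 699/7, about 99.86, well above the threshold of 5.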


Goodness of fit

Calculate the test statistic

• The test statistic $X^2$ compares the observed to the expected counts:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(20 - 99.86)^2}{99.86} + \dots + \frac{(12 - 99.86)^2}{99.86} = 208.85$$

Calculate the p-value

This statistic follows a chi-square distribution with (k-1) degrees of freedom, where k is the number of categories, which in our case is 7.

$$\text{P-value} = P(\chi^2(6) \ge 208.85) < 0.0005$$
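The goodness-of-fit statistic and its p-value can be checked with the standard library alone. The sketch below relies on the closed-form upper tail of the chi-square distribution for an even number of degrees of freedom, P(χ²(2m) ≥ x) = e^(−x/2) Σ_{k=0}^{m−1} (x/2)^k / k!; the function names are hypothetical, and a statistics package would usually replace the hand-written tail function.

```python
import math

# Observed collisions, Sunday through Saturday.
observed = [20, 133, 126, 159, 136, 113, 12]


def goodness_of_fit_statistic(counts):
    """Chi-square statistic against equal expected counts in every category."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_upper_tail_even_df(x, df):
    """P(chi-square(df) >= x) for even df, using the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


x2 = goodness_of_fit_statistic(observed)
df = len(observed) - 1
p_value = chi_square_upper_tail_even_df(x2, df)
print(round(x2, 2), df, p_value < 0.0005)
```

The p-value is far below 0.0005, so the printed comparison is `True`.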


Conclusion

These types of accidents are not equally likely to occur on each of the seven days of the week
