Swiss Fertility and Socioeconomic Indicators (1888) Data · sivana/courses/lecture18 - summer.pdf ·...



Post on 26-Mar-2020



Swiss Fertility and Socioeconomic Indicators (1888) Data

Standardized fertility measure and socio-economic indicators for each of 47 French-speaking provinces of Switzerland at about 1888.

A data frame with 47 observations on 6 variables, each of which is in percent:

• Fertility: Ig, ‘common standardized fertility measure’

• Agriculture: % of males involved in agriculture as occupation

• Examination: % draftees receiving highest mark on army examination

• Education: % education beyond primary school for draftees

• Catholic: % ‘catholic’ (as opposed to ‘protestant’)

• Infant.Mortality: live births who live less than 1 year

Swiss Fertility and Socioeconomic Indicators (1888) Data

•Let's look at the relationship:

• between Fertility and Education

• between Fertility and Examination

• between Fertility and Agriculture

•Which one would you use to predict Fertility? Why?


Inference for Two-way tables

Statistics 111 - Lecture 18

Example: Fright symptoms and gender

•There is a growing body of literature demonstrating that early exposure to frightening movies is associated with lingering fright symptoms.

•117 college students were asked to write narrative accounts of their exposure to frightening movies before the age of 13.

•Here is a table summarizing the results by gender:

                          Male   Female
Ongoing fright symptoms      7       29
No fright symptoms          31       50


Example: Fright symptoms and gender

•Here is the same table, but presented in percents conditional on gender:

                          Male   Female
Ongoing fright symptoms    18%      37%
No fright symptoms         82%      63%

Is there an association between gender and having fright symptoms?

Example: Fright symptoms and gender

•The null hypothesis for a two-way table: there is no association between the row variable and the column variable.

•The alternative hypothesis: there is an association between the two variables.

•The null hypothesis: gender and having ongoing fright symptoms are independent!

•The alternative: gender and ongoing fright symptoms are dependent (related).


Example: Fright symptoms and gender

Calculating the test statistic

• We are going to compare the observed cell counts with the expected cell counts under the assumption that the null hypothesis is true.

                          Male   Female   total
Ongoing fright symptoms      7       29      36
No fright symptoms          31       50      81
total                       38       79     117

Example: Fright symptoms and gender

•What are the expected counts in each cell under the null?

•Under the assumption that gender and having fright symptoms are independent, the probability of a cell is the product of its row and column proportions.

•P(no fright symptoms and female) = P(no fright symptoms) * P(female) = (81/117)*(79/117)

•So the expected number of counts in that cell is (81/117)*(79/117)*117 = 54.69.

                          Male     Female   total   percent
Ongoing fright symptoms   7        29       36      36/117
No fright symptoms        31       50       81      81/117
total                     38       79       117
percent                   38/117   79/117


Example: Fright symptoms and gender

•Expected number of counts in each cell

                          Male                            Female                          total   percent
Ongoing fright symptoms   (38/117)*(36/117)*117 = 11.69   (36/117)*(79/117)*117 = 24.31   36      36/117
No fright symptoms        (38/117)*(81/117)*117 = 26.31   (81/117)*(79/117)*117 = 54.69   81      81/117
total                     38                              79                              117
percent                   38/117                          79/117
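The expected counts under independence can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides; the variable names are chosen here for illustration, and the only inputs are the observed counts from the table.

```python
# Observed counts (rows: ongoing / no fright symptoms; columns: male / female).
observed = [[7, 29],
            [31, 50]]

row_totals = [sum(row) for row in observed]        # totals per symptom group
col_totals = [sum(col) for col in zip(*observed)]  # totals per gender
n = sum(row_totals)                                # total number of students

# Under independence: expected count = (row total * column total) / n,
# which equals (row proportion) * (column proportion) * n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
# prints [11.69, 24.31] and then [26.31, 54.69]
```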

Example: Fright symptoms and gender

•The test statistic compares the observed to the expected counts

Expected                  Male    Female   total
Ongoing fright symptoms   11.69    24.31      36
No fright symptoms        26.31    54.69      81
total                     38       79        117

Observed                  Male    Female   total
Ongoing fright symptoms       7       29      36
No fright symptoms           31       50      81
total                        38       79     117


Example: Fright symptoms and gender

•The test statistic X^2 compares the observed to the expected counts:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(7 - 11.69)^2}{11.69} + \dots + \frac{(50 - 54.69)^2}{54.69} = 4.03$$

•This statistic follows a distribution called chi-square.

•The chi-square distribution has one parameter, called the “degrees of freedom”.
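As a check on the arithmetic, the statistic can be computed directly from the observed table. The sketch below is an illustration rather than the lecture's own code; it uses unrounded expected counts, so intermediate terms differ slightly from the two-decimal values shown on the slide.

```python
# Observed counts (rows: ongoing / no fright symptoms; columns: male / female).
observed = [[7, 29],
            [31, 50]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all four cells.
chi_sq = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        expected = r * c / n
        chi_sq += (observed[i][j] - expected) ** 2 / expected

print(round(chi_sq, 2))  # prints 4.03
```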

Example: Fright symptoms and gender

•Chi-square distribution

•It’s not a symmetric distribution!

[Figure: chi-square density curves for 2 and 4 degrees of freedom, $\chi^2(2)$ and $\chi^2(4)$]


Example: Fright symptoms and gender

Calculating the test statistic

•To calculate the p-value we compare the test statistic with a chi-square distribution with (# of rows - 1)*(# of columns - 1) degrees of freedom.

Example: Fright symptoms and gender

• In our example the degrees of freedom are (2-1)*(2-1) = 1.


$$\text{P-value} = P(\chi^2(1) > 4.03) = 0.045$$

•Since the p-value < 0.05, we reject the null at the 5% level. This means that there is enough evidence to suggest there is an association between gender and fright symptoms.
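For one degree of freedom the p-value can be computed without statistical tables, because a chi-square variable with 1 degree of freedom is the square of a standard normal. The sketch below relies on that identity and on Python's standard `math.erfc`; the function name `chi_sq_p_value_df1` is made up for this illustration.

```python
import math

def chi_sq_p_value_df1(x):
    """Upper-tail probability P(chi-square(1) > x).

    Uses P(chi-square(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)),
    where Z is a standard normal variable.
    """
    return math.erfc(math.sqrt(x / 2))

p_value = chi_sq_p_value_df1(4.03)
print(round(p_value, 3))  # prints 0.045
```

This shortcut only applies when the degrees of freedom equal 1; larger tables need the general chi-square distribution.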

Example: Fright symptoms and gender

• This method works for any two categorical variables, no matter how many levels they have.

•One catch! Make sure that the expected value in each of the cells is bigger than 5 (this is needed in order for the statistic to follow a chi-square distribution).
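Because the method carries over to tables of any size, it can be wrapped in a small helper that also reports the smallest expected count, so the "bigger than 5" condition can be checked. This is a sketch under the assumptions stated in the comments; `chi_square_table` is a hypothetical name, and the example reuses the fright-symptoms counts.

```python
def chi_square_table(observed):
    """Return (statistic, degrees of freedom, smallest expected count)
    for a two-way table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected counts under independence of the row and column variables.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    smallest_expected = min(min(row) for row in expected)
    return statistic, df, smallest_expected

statistic, df, smallest_expected = chi_square_table([[7, 29], [31, 50]])
print(round(statistic, 2), df, round(smallest_expected, 2))  # prints 4.03 1 11.69

if smallest_expected <= 5:
    print("Warning: an expected count is not bigger than 5")
```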


Goodness of fit test

Example: Vehicle collisions and cell phones during weekdays

•Are you more likely to have a motor vehicle collision when using a cell phone?

•A study of 699 drivers who were using a cell phone when they were involved in a collision examined this question.

Goodness of fit test

The data are classified by the day of the week:

Sun   Mon   Tue   Wed   Thu   Fri   Sat   Total
 20   133   126   159   136   113    12     699

• Are the accidents equally likely to occur on any day of the week?


Goodness of fit test

•H0: p_sun = p_mon = p_tue = p_wed = p_thu = p_fri = p_sat (each equal to 1/7)

•Ha: otherwise

• What is the alternative saying?

•What are the expected counts for each day of the week under the null?

Goodness of fit test

Expected counts are (699/7 = 99.86 for each day):

Sun     Mon     Tue     Wed     Thu     Fri     Sat     Total
99.86   99.86   99.86   99.86   99.86   99.86   99.86   699

Observed counts are:

Sun     Mon     Tue     Wed     Thu     Fri     Sat     Total
20      133     126     159     136     113     12      699


Goodness of fit

Calculate the test statistic

•The test statistic X^2 compares the observed to the expected counts:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}} = \frac{(20 - 99.86)^2}{99.86} + \dots + \frac{(12 - 99.86)^2}{99.86} = 208.85$$
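The goodness-of-fit statistic can be checked the same way as the two-way table statistic. In this sketch (an illustration, with variable names chosen here) every day gets the same expected count, 699/7, which is about 99.86.

```python
# Observed collisions by day, Sunday through Saturday.
observed = [20, 133, 126, 159, 136, 113, 12]

n = sum(observed)   # total number of drivers
k = len(observed)   # number of categories (days)

# Under H0 every day is equally likely, so each expected count is n / k.
expected = n / k

chi_sq = sum((o - expected) ** 2 / expected for o in observed)

print(round(expected, 2), round(chi_sq, 2))  # prints 99.86 208.85
```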

Goodness of fit

Calculate the p-value

This statistic follows a chi-square distribution with (k-1) degrees of freedom, where k is the number of categories, which in our case is 7.

$$\text{P-value} = P(\chi^2(6) > 208.85) < 0.0005$$
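For an even number of degrees of freedom the chi-square upper tail has a closed form, so the p-value for 6 degrees of freedom can be sketched with the standard library alone. The function name `chi_sq_p_value_even_df` is hypothetical; the formula used is P(chi-square(2m) > x) = exp(-x/2) * sum over i < m of (x/2)^i / i!.

```python
import math

def chi_sq_p_value_even_df(x, df):
    """Upper-tail probability P(chi-square(df) > x) for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    m = df // 2
    # Partial sum of the exponential series, scaled by exp(-x/2).
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(m))

k = 7                                   # number of categories (days)
p_value = chi_sq_p_value_even_df(208.85, k - 1)
print(p_value < 0.0005)  # prints True
```

The computed value is many orders of magnitude below 0.0005, so the conclusion does not depend on rounding in the statistic.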


Conclusion

Since the p-value is far below 0.05, we reject the null: these types of accidents are not equally likely to occur on each of the seven days of the week.
