4 normal probability plots at once

Page 1: 4 normal probability plots at once

4 normal probability plots at once

par(mfrow=c(2,2))
for(i in 1:4) {
  qqnorm(dataframe[,1][dataframe[,2]==i], ylab="Data quantiles")
  title(paste("yourchoice", i, sep=""))
}

To produce these plots, go to “File”, then “New”, then “Script File”. Paste the commands into the script file window and press F10; the four plots are produced automatically.

4 histograms all at once

Same as above, but use hist instead of qqnorm. You only need one column rather than dataframe columns 1 and 2. Also, don’t forget to change your label.
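A minimal sketch of one reading of the histogram note, written in R. The data frame and its four columns are made up for illustration, and the sketch assumes one column is plotted per panel. If your data are laid out as a value column plus a group column, as in the qqnorm example, keep that subsetting instead.

```r
# Hypothetical data: four numeric columns, one per histogram.
set.seed(1)
dataframe <- data.frame(a = rnorm(50), b = rnorm(50),
                        c = rnorm(50), d = rnorm(50))

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid of panels
for (i in 1:4) {
  # hist() takes a single numeric vector, so one column is enough
  hist(dataframe[, i],
       main = paste("yourchoice", i, sep = ""),
       xlab = "Data values")      # changed label: these are data values, not quantiles
}
par(old_par)                      # restore the previous plot layout
```

Passing the title through main avoids a separate title() call, since hist() draws its own default title.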

Page 2: 4 normal probability plots at once

Lab: Chi-Squared Test (X2) Lack of Fit

November 10, 2000

Page 3: 4 normal probability plots at once

History

Invented in 1900
Oldest inference procedure still used in its original form
English statistician Karl Pearson

Page 4: 4 normal probability plots at once

The X2 Test

When you have data values for two categorical variables
Also called a two-way table
For example: men/women and NSOE track; regenerated seaweed (yes/no) and access level (limpet only/limpet and fish/etc).

Page 5: 4 normal probability plots at once

Example: Why do Men and Women Participate in Sports?

Desire to win or do better than others
– called social comparison
Desire to improve one’s skills or to do one’s best
– called mastery

Page 6: 4 normal probability plots at once

Data

Collected from 67 male and 67 female undergraduate students at a large university
Survey given asking about students’ sports goals.
Students were all categorized as either high or low with regard to both of the questions:
– high or low social comparison
– high or low mastery

Duda, Joan L., Leisure Sciences, 10 (1988), pp. 95-106

Page 7: 4 normal probability plots at once

Groups

This leads to four groups:
– High social comparison, high mastery (HSC-HM)
– High social comparison, low mastery (HSC-LM)
– Low social comparison, high mastery (LSC-HM)
– Low social comparison, low mastery (LSC-LM)

We want to compare this for men and women.

Page 8: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male
HSC-HM       14    31
HSC-LM        7    18
LSC-HM       21     5
LSC-LM       25    13
Total        67    67

Page 9: 4 normal probability plots at once

1. Add Totals

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Column: In this case, what population the observation comes from (female or male).
Row: Categorical response variable (sports goal).
Grand total: The bottom-right entry (134).

Page 10: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

A Cell

A table with r rows and c columns contains r x c cells

Page 11: 4 normal probability plots at once

X2 is really an analysis of 5 things in this table:

– Frequency (actual count)
– Percent of overall total
– Percent of row
– Percent of column
– Expected count

Page 12: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Frequency: Just the cell count

Page 13: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Overall Percent: Cell count divided by grand total

14/134 = 0.104. That is, 10.4% of all those studied were HSC-HM and female.

Page 14: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Row Percent: Cell count divided by row total

14/45 = 0.311. That is, of all those students reporting HSC-HM, 31% were female.

Page 15: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Column Percent: Cell count divided by column total

14/67 = 0.209. That is, of all female student participants, 21% were HSC-HM.

Page 16: 4 normal probability plots at once

Expected Count

Coming later to a slide near you...

Page 17: 4 normal probability plots at once

These percents are useful in graphical analysis.
Overall, row, and column percents can be calculated for each cell
Then questions of interest can be asked
We are interested in the effect of sex on sports goals. In this case, we would examine the column percents
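The three kinds of percent can be sketched in R with prop.table. The matrix name counts and the variable names are illustrative choices; the numbers are the observed counts from the sports goals table.

```r
# Observed counts from the sports goals table (rows = goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

overall_pct <- 100 * prop.table(counts)              # cell count / grand total
row_pct     <- 100 * prop.table(counts, margin = 1)  # cell count / row total
col_pct     <- 100 * prop.table(counts, margin = 2)  # cell count / column total

# Column percents answer "how are goals distributed within each sex?"
print(round(col_pct))
```

Because each column of col_pct sums to 100, the female and male distributions of goals can be compared directly even if the group sizes had differed.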

Page 18: 4 normal probability plots at once

Column percents for sports goals

Goal     Female  Male
HSC-HM       21    46
HSC-LM       10    27
LSC-HM       31     7
LSC-LM       37    19
Total       100   100

Page 19: 4 normal probability plots at once

[Figure: bar chart of column percents (axis from 0 to 50) for Female and Male, with one bar each for HSC-HM, HSC-LM, LSC-HM, and LSC-LM]

Page 20: 4 normal probability plots at once

Surprise, surprise - we want to ask whether these apparently obvious differences are significant.
Can these differences be attributed to chance?
Calculate the chi-square and compare to a chi-square distribution
Determine the p-value
A low p-value means we reject our null hypothesis (sound familiar?)

Page 21: 4 normal probability plots at once

The hypotheses: Null

No association exists between our row and our column variables
– No association exists between sex and sports goals
– The distributions of sports goals in the male and female populations are the same.

Page 22: 4 normal probability plots at once

The hypotheses: Alternative

An association exists between the row and column variables
– No particular direction (not one- or two-sided)
– The distributions of sports goals in the male and female populations are not all the same.
– Includes many kinds of possible associations
– For example: “Men rate social comparison higher as a goal than do women”

Page 23: 4 normal probability plots at once

OK: Now back to the Expected Count

If the null hypothesis were true, what would the count in each cell be?

For women in the HSC-HM cell, it would work like this:
– 33.6% of all respondents are HSC-HM
– We have 67 women
– So, if no sex difference exists (our null), we would expect that 33.6% of our 67 women would be HSC-HM --> 22.5 women.

Page 24: 4 normal probability plots at once

Observed Counts for Sports Goals

Goal     Female  Male  Total
HSC-HM       14    31     45
HSC-LM        7    18     25
LSC-HM       21     5     26
LSC-LM       25    13     38
Total        67    67    134

Expected Count

1. 45/134=33.6% of all respondents are HSC-HM.

2. 33.6% of 67 women is 22.5.
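The same two-step calculation can be applied to every cell at once: expected count = row total x column total / grand total. A minimal R sketch, with counts and expected as illustrative names:

```r
# Observed counts from the sports goals table (rows = goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

row_totals  <- rowSums(counts)   # 45, 25, 26, 38
col_totals  <- colSums(counts)   # 67, 67
grand_total <- sum(counts)       # 134

# Each cell: (row total / grand total) * column total
expected <- outer(row_totals, col_totals) / grand_total
print(expected)
```

For the HSC-HM, Female cell this gives 45 x 67 / 134 = 22.5, matching the hand calculation.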

Page 25: 4 normal probability plots at once

Finally: The Chi-Squared Statistic Itself

Compare the entire set of observed counts with the set of expected counts.
Take the difference in each cell between observed and expected
Square each difference
Normalize these (divide by the expected count)
Sum over all cells.
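These steps can be followed literally in R. This is a sketch for the sports goals data; the names counts, expected, contributions, and X2 are chosen here for illustration.

```r
# Observed counts from the sports goals table (rows = goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

# Expected counts under the null: row total * column total / grand total
expected <- outer(rowSums(counts), colSums(counts)) / sum(counts)

# Difference, squared, divided by the expected count, cell by cell
contributions <- (counts - expected)^2 / expected

# Sum over all cells
X2 <- sum(contributions)

print(round(contributions, 3))
print(X2)
```

Printing the per-cell contributions shows which cells drive the statistic; here the LSC-HM row contributes the most.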

Page 26: 4 normal probability plots at once

The Formula:

$$X^2 = \sum \frac{(\text{observed count} - \text{expected count})^2}{\text{expected count}}$$

The sum runs over all cells of the table.

Large values of X2 provide evidence against the null hypothesis

A chi-square distribution is used to obtain the p-value

Degrees of freedom are (r-1)(c-1)

Page 27: 4 normal probability plots at once

In this case...

Chi-squared = 24.898 on 3 df.
The p-value is less than 0.0005.
The chance of obtaining a chi-squared value greater than or equal to this when the null hypothesis is true is very small
Clear evidence against the null hypothesis
Strong evidence that female and male students have different distributions of sports goals.
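These numbers can be checked with R's built-in chisq.test, shown here as a sketch. The names counts and result are illustrative. For a table larger than 2 x 2, chisq.test applies no continuity correction, so it should agree with the hand calculation.

```r
# Observed counts from the sports goals table (rows = goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

result <- chisq.test(counts)

print(result$statistic)   # the chi-squared statistic
print(result$parameter)   # degrees of freedom: (4 - 1) * (2 - 1) = 3
print(result$p.value)     # compare with the 0.0005 bound quoted above
```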

Page 28: 4 normal probability plots at once

Is that all you can say?

No, you can and should combine the test with a description that shows the relationship.
– Percents in our earlier table and our graph
– Summary comments: the percent of males in each of the HSC goal classes is more than twice the percent of females.
– The HSC-HM group contains 46% of the males, but only 21% of the females
– The HSC-LM group contains 27% of the males and only 10% of the females
– We conclude that males are more likely to be motivated by social comparison goals and females are more likely to be motivated by mastery goals.

Page 29: 4 normal probability plots at once

Important to remember:

The p-value comes from a chi-square distribution, which only approximates the true distribution of our X2 statistic. The approximation becomes more accurate as the cell counts increase.

For 2 x 2 tables, the expected count in each of the 4 cells must be five or higher.

For tables larger than 2 x 2, the average of the expected counts must be 5 or higher, and the smallest expected count must be 1 or more.
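A rough sketch of how these rules of thumb could be checked in R for the sports goals table. The names counts, expected, and ok are illustrative, and the thresholds are the ones stated on this slide.

```r
# Observed counts from the sports goals table (rows = goal, columns = sex).
counts <- matrix(c(14, 31,
                    7, 18,
                   21,  5,
                   25, 13),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(Goal = c("HSC-HM", "HSC-LM", "LSC-HM", "LSC-LM"),
                                 Sex  = c("Female", "Male")))

# chisq.test returns the expected counts alongside the test result
expected <- chisq.test(counts)$expected

if (all(dim(counts) == c(2, 2))) {
  # 2 x 2 table: every expected count must be 5 or higher
  ok <- all(expected >= 5)
} else {
  # Larger table: average expected count 5 or higher, smallest 1 or more
  ok <- mean(expected) >= 5 && min(expected) >= 1
}
print(ok)
```

Here the average expected count is 134 / 8 = 16.75 and the smallest is 12.5, so the conditions are comfortably met.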

Page 30: 4 normal probability plots at once

Important to remember:

This is sometimes called the chi-squared test for homogeneity or the chi-squared test of independence.

Although this is one of the most widely used statistical tools, it is also one of the least informative.
– The only thing you produce is a p-value, and there is no associated parameter to describe the degree of dependence
– The alternative hypothesis is very general (that rows and columns are not independent)