Contingency Tables

Upload: willa-lynch

Post on 18-Jan-2016


Page 1: Structure of contingency tables. Symmetry. Rules. Populations and sub-populations. Grand, Column, and Row percentiles. Simpson’s Paradox. Using tables

Contingency Tables

• Structure of contingency tables.
• Symmetry.
• Rules.
• Populations and sub-populations.
• Grand, column, and row percentages.
• Simpson’s Paradox.
• Using tables for inference testing: chi-square (χ²).

Contingency Tables as Descriptive Statistical Tools

One of the most common types of statistical tools, so called because the two variables in the table are contingent (or dependent) upon one another.

Also called crosstabs because the two variables cross-tabulate (or refer to) a single subject.

They can be descriptive, or inferential when used with the chi-square (χ²) statistic (pronounced “k-eye” square).

A deceptively simple tool: deceptive because the tables look easy to interpret but are not!

They show a wealth of information about a population and subsets of that population.

Contingency Tables as Descriptive Statistical Tools

Structure of Contingency Tables

The variables are arranged in an r X c (rows by columns) structure, not just in columns.

Groups of rows and columns are defined by one variable each, such as eye and hair colour.

Individual rows and columns represent the sub-categories within each variable, e.g. blue, brown, black etc.

The number of sub-categories defines the size of the table: e.g. a 4X4 table would have two variables, each with four sub-categories, as follows…

Hair and Eye Colour Among Non-Indigenous North American Population

                        HAIR
EYES     BLACK   BROWN   RED   BLOND   TOTAL
Brown       68     119    26       7     220
Blue        20      84    17      94     215
Hazel       15      54    14      10      93
Green        5      29    14      16      64
TOTAL      108     286    71     127     592

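Anatomy of a 4X4 table

• Variable #1 (HAIR) runs across the columns; Variable #2 (EYES) runs down the rows.
• Each column or row is one sub-category of its variable (Subcategory #1, Subcategory #2, …).
• The cells hold the data values: the number of subjects in each combination.
• Row totals are for VAR #2.
• Column totals are for VAR #1.
• Row and column totals together are the marginal totals.
• The bottom-right cell is the grand total.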
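As a minimal sketch of the anatomy above, the marginal totals and grand total can be recomputed from the cell counts alone. The variable names (`hair`, `cells`, `row_totals`, `col_totals`) are illustrative choices, not part of the original slides.

```python
# Cell counts from the hair/eye table: one row per eye colour,
# columns ordered as in `hair`.
hair = ["Black", "Brown", "Red", "Blond"]
cells = {
    "Brown": [68, 119, 26, 7],
    "Blue":  [20, 84, 17, 94],
    "Hazel": [15, 54, 14, 10],
    "Green": [5, 29, 14, 16],
}

# Row totals: one per eye colour (Variable #2).
row_totals = {eye: sum(counts) for eye, counts in cells.items()}

# Column totals: one per hair colour (Variable #1).
col_totals = {
    h: sum(counts[j] for counts in cells.values())
    for j, h in enumerate(hair)
}

# The grand total can be reached from either margin.
grand_total = sum(row_totals.values())
assert grand_total == sum(col_totals.values())

print("Row totals:   ", row_totals)
print("Column totals:", col_totals)
print("Grand total:  ", grand_total)
```

The final assertion is the same check the table rules ask for: the sum of the row totals must equal the sum of the column totals.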

Differences Between Contingency Tables and Spreadsheets

Columns
  Spreadsheets: columns are variables.
  Contingency tables: columns are a subset of a variable.

Rows
  Spreadsheets: rows are cases.
  Contingency tables: rows are a subset of a variable.

Cells
  Spreadsheets: cells are individual case values (e.g. a person) within a single variable.
  Contingency tables: cells are total case values (e.g. # of people) with both variables.

Column totals
  Spreadsheets: the sum of all cases within that variable.
  Contingency tables: the sum of all cases within that subset of a variable.

Row totals
  Spreadsheets: the sum of a case across all variables.
  Contingency tables: the sum of all cases within that subset of a variable.

Grand totals
  Spreadsheets: the total of all cases in the dataset across all variables.
  Contingency tables: the total of all cases in the dataset across all variables (the sum of the row totals, which equals the sum of the column totals).

Contingency Table Symmetry

Tables are symmetrical when there are the same number of sub-category rows and columns, e.g. a 4X4 table.

Tables are asymmetric when the numbers of sub-category rows and columns differ, e.g. a 4X6 table.

Asymmetric tables can be harder to work with in inferential testing, so avoid them where you can. (The chi-square calculation itself applies to a table of any r X c size.)

Higher-order tables with more than two variables also exist but are complex to analyse.

Contingency Table Symmetry

[Figure: two schematic tables, each with one variable across the columns and one down the rows. A 4X4 Table has four sub-categories in each variable; a 4X6 Table has four row sub-categories and six column sub-categories.]

Table rules for mix of variables

1. If the variables have a potential relationship, then the dependent variable goes on the ‘Y’ axis (the rows). Example: Cholesterol (High, Low) in the rows against Fat Intake (High, Low) in the columns.
2. If one variable is a population, then that variable goes on the ‘Y’ axis. Example: Gender (Male, Female) in the rows against Eye Colour (Brown, Blue) in the columns.
3. If both variables are categorical, then it does not matter where each goes. Example: Hair Colour in the rows against Eye Colour (Brown, Blue) in the columns.

Table Rules Cont…

Independence: every subject must have the same chance of being selected.

Exclusivity: subjects can fall into only one cell; e.g. you cannot use data drawn from multiple responses to a single question (no one with one blue eye and one brown eye, and no one with no eye colour!).

Exhaustive: sub-categories should include all responses received (i.e. the sum of the row totals must equal the sum of the column totals: no one can have a hair colour without an eye colour, or vice versa).

Raw Data

                        HAIR
EYES     BLACK   BROWN   RED   BLOND   TOTAL
Brown       68     119    26       7     220
Blue        20      84    17      94     215
Hazel       15      54    14      10      93
Green        5      29    14      16      64
TOTAL      108     286    71     127     592

Grand Total Percentages

                        HAIR
EYES      Black    Brown      Red    Blond     TOTAL
Brown    11.49%   20.10%    4.39%    1.18%    37.16%
Blue      3.38%   14.19%    2.87%   15.88%    36.32%
Hazel     2.53%    9.12%    2.36%    1.69%    15.71%
Green     0.84%    4.90%    2.36%    2.70%    10.81%
TOTAL    18.24%   48.31%   11.99%   21.45%   100.00%

And for the curious…

There are fewer black-haired than blonde-haired people.

Rarest combinations:
1. Black hair/green eyes = fewer than 1 person in 100
2. Blonde hair/brown eyes = about 1.2 people in 100

Commonest combinations:
1. Brown hair/brown eyes = about 20% of population
2. Blonde hair/blue eyes = about 16% of population
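The grand-total percentages, and the rarest and commonest combinations, can be reproduced with a short sketch like the one below. It assumes nothing beyond the raw counts in the table; the names `pct`, `rarest` and `commonest` are illustrative.

```python
# Raw counts: one row per eye colour, columns ordered as in `hair`.
hair = ["Black", "Brown", "Red", "Blond"]
cells = {
    "Brown": [68, 119, 26, 7],
    "Blue":  [20, 84, 17, 94],
    "Hazel": [15, 54, 14, 10],
    "Green": [5, 29, 14, 16],
}

grand_total = sum(sum(counts) for counts in cells.values())

# Each cell as a percentage of the grand total, keyed by (eye, hair).
pct = {
    (eye, h): 100 * n / grand_total
    for eye, counts in cells.items()
    for h, n in zip(hair, counts)
}

# Sort the cells from smallest to largest share of the population.
ranked = sorted(pct.items(), key=lambda item: item[1])
rarest = ranked[:2]
commonest = ranked[-2:][::-1]

for (eye, h), share in rarest:
    print(f"Rare:   {h} hair / {eye} eyes = {share:.2f}%")
for (eye, h), share in commonest:
    print(f"Common: {h} hair / {eye} eyes = {share:.2f}%")
```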

Deriving Useful Information

Tables depend on proportional calculations using marginal totals (row and column) to be useful.

In reality you are dealing with sub-groups of the population.

Each row total and column total represents a sub-population.

Three proportional (percentage) tables are derived and analysed:

Populations and Sub-Populations

Raw Data
                        HAIR
EYES     BLACK   BROWN   RED   BLOND   TOTAL
Brown       68     119    26       7     220
Blue        20      84    17      94     215
Hazel       15      54    14      10      93
Green        5      29    14      16      64
TOTAL      108     286    71     127     592

• Population as a whole (the grand total).
• Variable #1 sub-populations: hair across all eye categories (the column totals).
• Variable #2 sub-populations: eyes across all hair categories (the row totals).

[Figure: a crowd of people, each labelled EYE colour / Hair colour:]

BROWN/Black, GREEN/Brown, HAZEL/Brown, BLUE/Blonde, BROWN/Brown, BROWN/Brown,
BROWN/Brown, BROWN/Brown, BROWN/Brown, BROWN/Black, BROWN/Black, BROWN/Brown,
HAZEL/Black, HAZEL/Red, HAZEL/Black, HAZEL/Black, HAZEL/Red, HAZEL/Red,
HAZEL/Brown, HAZEL/Blonde, HAZEL/Red, GREEN/Brown, GREEN/Brown, GREEN/Black,
GREEN/Black, BLUE/Blonde, BLUE/Blonde, BLUE/Brown, BLUE/Black, BLUE/Brown,
BLUE/Blonde, BLUE/Blonde, BLUE/Blonde, BLUE/Black

This is a population of 34 people. Each person has an EYE colour and a Hair colour.
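To make the link between individual subjects and table cells concrete, here is a minimal sketch that cross-tabulates the 34 people. The pair counts are tallied from the figure's labels; the names `pair_counts`, `people`, `by_eye` and `by_hair` are illustrative.

```python
from collections import Counter

# Number of people with each (EYE colour, hair colour) pair,
# tallied from the labels in the figure.
pair_counts = {
    ("BROWN", "Black"): 3, ("BROWN", "Brown"): 6,
    ("GREEN", "Black"): 2, ("GREEN", "Brown"): 3,
    ("HAZEL", "Black"): 3, ("HAZEL", "Brown"): 2,
    ("HAZEL", "Red"): 4, ("HAZEL", "Blonde"): 1,
    ("BLUE", "Black"): 2, ("BLUE", "Brown"): 2, ("BLUE", "Blonde"): 6,
}

# Expand to one record per person, as in a spreadsheet of cases.
people = [pair for pair, n in pair_counts.items() for _ in range(n)]

# Each person lands in exactly one cell of the contingency table.
cell_counts = Counter(people)

# Grouping by one variable at a time gives the sub-populations
# (the marginal totals of the table).
by_eye = Counter(eye for eye, _ in people)
by_hair = Counter(hair_colour for _, hair_colour in people)

print("Population size:", len(people))
print("By eye colour:  ", dict(by_eye))
print("By hair colour: ", dict(by_hair))
```

Grouping the same records by eye colour or by hair colour changes only which margin is read, not the population itself.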

This is the same population of 34. They have been told to group themselves into four sub-populations (Subpop #1 to #4) according to EYE colour.

GREEN eyes (n=5): 3 Brown hair, 2 Black hair
BROWN eyes (n=9): 6 Brown hair, 3 Black hair
HAZEL eyes (n=10): 4 Red hair, 3 Black hair, 2 Brown hair, 1 Blonde hair
BLUE eyes (n=10): 6 Blonde hair, 2 Brown hair, 2 Black hair

This is the same population of 34. Now they have been told to group themselves into four sub-populations (Subpop #1 to #4) according to Hair colour.

Red hair (n=4): 4 HAZEL eyes
Blonde hair (n=7): 6 BLUE eyes, 1 HAZEL eyes
Brown hair (n=13): 6 BROWN eyes, 3 GREEN eyes, 2 HAZEL eyes, 2 BLUE eyes
Black hair (n=10): 3 BROWN eyes, 3 HAZEL eyes, 2 GREEN eyes, 2 BLUE eyes

TABLE 1: RAW DATA (ALL SUBJECTS’ RAW DATA)
200 subjects divided among four categories: yes smoke, no smoke, yes disease, no disease.

                 Smoke
Disease      Yes      No    Total
Yes           13       6       19
No            37     144      181
Total         50     150      200

TABLE 2: NO SUBSETS, GRAND TOTAL %ages (ALL SUBJECTS’ OVERALL STATUS)
200 total subjects proportionalised to grand total.

                 Smoke
Disease      Yes      No    Total
Yes         6.5%      3%     9.5%
No         18.5%     72%    90.5%
Total        25%     75%     100%

TABLE 3: SUBSET DISEASED, ROW %ages (DISEASED’ SMOKING STATUS)
200 total subjects proportionalised to row (diseased) total.

                 Smoke
Disease      Yes      No    Total
Yes        68.4%   31.6%     100%
No         20.4%   79.6%     100%
Total      25.0%   75.0%     100%

TABLE 4: SUBSET SMOKERS, COLUMN %ages (SMOKERS’ DISEASE STATUS)
200 total subjects proportionalised to column (smoke) total.

                 Smoke
Disease      Yes      No    Total
Yes        26.0%    4.0%     9.5%
No         74.0%   96.0%    90.5%
Total       100%    100%     100%
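The three percentage tables all come from the same raw counts, divided by a different total each time. A minimal sketch, assuming only the Table 1 counts (the variable names are illustrative):

```python
# Raw counts from TABLE 1: rows are disease status, columns are smoking status.
row_labels = ["Disease: Yes", "Disease: No"]
col_labels = ["Smoke: Yes", "Smoke: No"]
counts = [
    [13, 6],
    [37, 144],
]

row_totals = [sum(row) for row in counts]         # diseased, not diseased
col_totals = [sum(col) for col in zip(*counts)]   # smokers, non-smokers
grand_total = sum(row_totals)

# TABLE 2: every cell divided by the grand total.
grand_pct = [[100 * n / grand_total for n in row] for row in counts]

# TABLE 3: every cell divided by its own row (disease status) total.
row_pct = [[100 * n / row_totals[i] for n in row] for i, row in enumerate(counts)]

# TABLE 4: every cell divided by its own column (smoking status) total.
col_pct = [[100 * n / col_totals[j] for j, n in enumerate(row)] for row in counts]

for title, table in [("Grand total %", grand_pct),
                     ("Row %", row_pct),
                     ("Column %", col_pct)]:
    print(title)
    for label, row in zip(row_labels, table):
        print(f"  {label:13s}", "  ".join(f"{value:5.1f}%" for value in row))
```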


Interpreting Raw Data

TABLE 1: RAW DATA

                 Smoke
Disease      Yes      No    Total
Yes           13       6       19
No            37     144      181
Total         50     150      200

All 200 subjects are divided up among the four categories:
• Smoker with disease (n=13)
• Smoker with no disease (n=37)
• Non-smoker with disease (n=6)
• Non-smoker with no disease (n=144)

And there are four sub-totals:
• Diseased (n=19)
• Not diseased (n=181)
• Smokers (n=50)
• Non-smokers (n=150)

How do we draw conclusions about the risks of smoking from these data?

Interpreting Grand Total Percentages

All 200 subjects are proportionalised to the grand total. Now:
• Population who smoked and had heart disease (6.5%)
• Population who smoked and had no heart disease (18.5%)
• Population who didn’t smoke and had heart disease (3.0%)
• Population who didn’t smoke and had no heart disease (72%)

Are smoking and disease related?
• Only 26% of smokers were diseased (6.5% / 25% * 100).
• Yet 68% of diseased people were smokers (6.5% / 9.5% * 100).

TABLE 1: RAW DATA

                 Smoke
Disease      Yes      No    Total
Yes           13       6       19
No            37     144      181
Total         50     150      200

TABLE 2: NO SUBSETS, GRAND TOTAL PERCENTAGES

                 Smoke
Disease      Yes      No    Total
Yes         6.5%      3%     9.5%
No         18.5%     72%    90.5%
Total        25%     75%     100%
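The two figures quoted above (26% and 68%) use the same numerator with different marginal percentages as the denominator. A small sketch of that arithmetic, using the Table 2 values and illustrative variable names:

```python
# Grand-total percentages taken from TABLE 2.
smoke_and_disease = 6.5   # % of all subjects who smoke and are diseased
all_smokers = 25.0        # % of all subjects who smoke (column total)
all_diseased = 9.5        # % of all subjects who are diseased (row total)

# Share of smokers who are diseased: divide by the smokers' margin.
pct_of_smokers_diseased = 100 * smoke_and_disease / all_smokers

# Share of diseased people who smoke: divide by the diseased margin.
pct_of_diseased_smoking = 100 * smoke_and_disease / all_diseased

print(f"{pct_of_smokers_diseased:.1f}% of smokers were diseased")
print(f"{pct_of_diseased_smoking:.1f}% of diseased people were smokers")
```

Which margin goes in the denominator decides which sub-population the percentage describes.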

Interpreting Column Percentages

All 200 subjects are proportionalised to the column total. Now we are interpreting the data from the perspective of a subset of the sample: a person’s smoking status. Now:
• Smoker with disease (26%)
• Smoker with no disease (74%)
• Non-smoker with disease (4%)
• Non-smoker with no disease (96%)

Now what do we say? About three quarters of smokers don’t get sick!

TABLE 4: SUBSET SMOKERS, COLUMN PERCENTAGES

                 Smoke
Disease      Yes      No    Total
Yes        26.0%    4.0%     9.5%
No         74.0%   96.0%    90.5%
Total       100%    100%     100%

That’s where you would stop the analysis if you worked for the tobacco companies.

Interpreting Row Percentages

All 200 subjects are proportionalised to the row total. Now we are interpreting the data from the perspective of the other subset of the sample: a person’s disease status. Now:
• Diseased and smoker (68.4%)
• Diseased and non-smoker (31.6%)
• Not diseased and smoker (20.4%)
• Not diseased and non-smoker (79.6%)

Now what do we say? Sixty-eight percent of people with heart disease also smoke, while only about 20% of the sample who were free of heart disease were smokers.

TABLE 3: SUBSET DISEASED, ROW PERCENTAGES

                 Smoke
Disease      Yes      No    Total
Yes        68.4%   31.6%     100%
No         20.4%   79.6%     100%
Total      25.0%   75.0%     100%

Summary

Two main points:

1. The different tables give different perspectives, so you have to be careful to use the correct subset interpretation. For example, the row-based percentages in our analysis were about disease status and not smoking status: 68% of people with heart disease smoke, which is not the same as 68% of smokers having heart disease!

Summary

2. Watch proportions and size of sample subsets:

Only 50 of 200 smoked, only 19 of 200 had heart disease, and only 13 of 200 had heart disease and smoked.

The effect of so many not being diseased and not smoking can overwhelm the other effects, either masking them or exaggerating them.

Simpson’s Paradox

Crops up often when using contingency tables in the social sciences.

Refers to the apparent reversal of a relationship seen in disaggregated data when the data are combined.

It is a product of disproportionality among subsets and of lurking variables (note the previous smoker/disease data).

An example:

Example of Simpson’s Paradox

Results of two surveys done 20 years apart.

Age 55-64
               Dead         Alive        Total
Smokers        51 = 44%     64 = 56%     115 = 100%
Non-smokers    40 = 33%     81 = 67%     121 = 100%
Total          91 = 39%     145 = 61%    236 = 100%

Age 65-74
               Dead         Alive        Total
Smokers        29 = 81%     7 = 19%      36 = 100%
Non-smokers    101 = 78%    28 = 22%     129 = 100%
Total          130 = 79%    35 = 21%     165 = 100%

Age 55-74 Combined
               Dead         Alive        Total
Smokers        80 = 53%     71 = 47%     151 = 100%
Non-smokers    141 = 56%    109 = 44%    250 = 100%
Total          221 = 55%    180 = 45%    401 = 100%

In both surveys smokers’ die-off rates are higher than non-smokers’.

But when the tables are combined, smokers’ die-off rates for the whole period are lower. Why?

Because dead smokers tell no tales! Smokers die off considerably faster in the earlier period, so there are fewer of them around to be counted in the later one. As well, older people’s mortality is obviously higher.
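The reversal can be checked directly from the counts. The sketch below recomputes the death rate for each group within each age band and then for the combined table; the data layout and the helper name `death_rate` are illustrative choices.

```python
# (dead, total) counts for each group in each age band, from the tables above.
surveys = {
    "55-64": {"Smokers": (51, 115), "Non-smokers": (40, 121)},
    "65-74": {"Smokers": (29, 36), "Non-smokers": (101, 129)},
}

def death_rate(dead, total):
    """Percentage of a group that died."""
    return 100 * dead / total

# Within each age band, compare smokers with non-smokers.
for age, groups in surveys.items():
    for group, (dead, total) in groups.items():
        print(f"Age {age}  {group:12s} {death_rate(dead, total):5.1f}% dead")

# Combine the two age bands by adding the counts, not the percentages.
combined = {}
for group in ("Smokers", "Non-smokers"):
    dead = sum(surveys[age][group][0] for age in surveys)
    total = sum(surveys[age][group][1] for age in surveys)
    combined[group] = (dead, total)
    print(f"Combined   {group:12s} {death_rate(dead, total):5.1f}% dead")
```

Only 36 of the 151 smokers fall in the high-mortality older band, against 129 of the 250 non-smokers, which is why the combined non-smoker rate is pulled upward.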

Using Tables for Inference Testing

Test for significant differences or relationships rather than just describing the data.

The test compares the observed cell values to those that could be expected using probability theory, assuming there are no significant differences or relationships.

Stated: under that assumption, the probability of falling into a particular cell is the product of the probability of being in a particular row and the probability of being in a particular column.
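Written out, this product rule leads directly to an expected count for each cell. The symbols R_i (row total), C_j (column total) and N (grand total) are notation introduced here for illustration; the worked number uses the earlier smoking table (diseased row total 19, smoker column total 50, grand total 200).

```latex
\[
P(\text{row } i \text{ and column } j)
  = P(\text{row } i)\, P(\text{column } j)
  = \frac{R_i}{N} \cdot \frac{C_j}{N}
\]
\[
e_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i \, C_j}{N},
\qquad
\text{e.g. } e_{\text{diseased, smoker}} = \frac{19 \times 50}{200} = 4.75
\]
```

If smoking and disease were unrelated, about 4.75 diseased smokers would be expected among the 200 subjects, against the 13 actually observed.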

Calculating Chi Square

The statistic most frequently used for inference with contingency tables is the chi-square statistic, written χ² (χ is the Greek letter chi).

It is based on an expected versus actual values methodology and its formula is:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - e_{ij})^2}{e_{ij}}$$

where x_ij is the observed count and e_ij the expected count in row i and column j of an r X c table.

Calculating Chi Square

Translated this says:

$$\chi^2 = \text{the sum of } \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

where the expected cell counts are given by:

$$\text{expected} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

An Example

Are E. coli counts different between two lakes in Muskoka, one with cottages and one without?

1. Collect 200 samples of water from each.
2. Measure E. coli concentrations.
3. Is the sample above or below the acceptable background limit?

How to test this?

Collect Observed Values

Four hundred samples, 200 from each lake.

                       Lakes
          No Cottage Lake   Cottage Lake   Total
Above            43              81          124
Below           157             119          276
Total           200             200          400

Calculate Expected Values

$$\text{expected} = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

E.G. 124*200/400 = 62

                            Lake
                   No Cottages   Cottages   Total
Above (observed)        43           81       124
Above (expected)        62           62
Below (observed)       157          119       276
Below (expected)       138          138
Total                  200          200       400
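The expected-value step can be sketched as a small function that works for a table of any size. The function name `expected_counts` is a hypothetical helper, not something from the slides.

```python
def expected_counts(observed):
    """Expected count for each cell: (row total) * (column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [
        [row_total * col_total / grand_total for col_total in col_totals]
        for row_total in row_totals
    ]

# Observed lake samples: rows are Above/Below the background limit,
# columns are the no-cottage lake and the cottage lake.
lakes = [
    [43, 81],
    [157, 119],
]

expected = expected_counts(lakes)
for label, row in zip(["Above", "Below"], expected):
    print(label, row)
```

Because both lakes contributed 200 samples, each row's expected count is simply split evenly between the two columns.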

Calculate Deviation Error Squared (O-E)² Values for Cells

$$\chi^2 = \text{the sum of } \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

E.G. (43-62)² = 361

                            Lake
                   No Cottages   Cottages   Total
Above (observed)        43           81       124
Above (expected)        62           62
(O-E)²                 361          361
Below (observed)       157          119       276
Below (expected)       138          138
(O-E)²                 361          361
Total                  200          200       400

Divide (O-E)² Values by Expected Values

$$\chi^2 = \text{the sum of } \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

E.G. 361/62 = 5.82

                            Lake
                   No Cottages   Cottages   Total
Above (observed)        43           81       124
Above (expected)        62           62
(O-E)²                 361          361
(O-E)²/E              5.82         5.82
Below (observed)       157          119       276
Below (expected)       138          138
(O-E)²                 361          361
(O-E)²/E              2.62         2.62
Total                  200          200       400

Sum the (O-E)²/Expected Values

$$\chi^2 = \text{the sum of } \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

                            Lakes
                   No Cottages   Cottages   Total
Above (observed)        43           81       124
Above (expected)        62           62
(O-E)²                 361          361
(O-E)²/E              5.82         5.82
Below (observed)       157          119       276
Below (expected)       138          138
(O-E)²                 361          361
(O-E)²/E              2.62         2.62
Total                  200          200       400

χ² = 5.82 + 5.82 + 2.62 + 2.62 = 16.88

Compare 16.88 to the ‘book’ value (the critical value tabulated for the table’s degrees of freedom and the chosen significance level). If it is greater than the book value, there are significant differences in the table.
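The whole calculation can be sketched end to end as below. Two things here are assumptions rather than part of the slides: a significance level of 0.05, and the corresponding tabulated critical value of 3.841 for one degree of freedom. Summing the unrounded cell values gives about 16.88.

```python
# Observed lake samples: rows are Above/Below the background limit,
# columns are the no-cottage lake and the cottage lake.
observed = [
    [43, 81],
    [157, 119],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count per cell: (row total) * (column total) / grand total.
expected = [
    [rt * ct / grand_total for ct in col_totals]
    for rt in row_totals
]

# Each cell's contribution: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

# Degrees of freedom for an r x c table: (r - 1) * (c - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# ASSUMPTION: 5% significance level; 3.841 is the tabulated
# chi-square critical value for 1 degree of freedom at that level.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841
significant = chi_square > CRITICAL_VALUE_DF1_ALPHA_05

print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}")
print("Significant difference between lakes:", significant)
```

For a table of a different size, the critical value would have to be looked up again for the new degrees of freedom.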

Interpreting the Example

• We observed 43 samples from no cottage lakes above background but expected 62.
• We observed 81 samples from cottage lakes above background but expected 62.
• We observed 157 samples from no cottage lakes below background but expected 138.
• We observed 119 samples from cottage lakes below background but expected 138.

                            Lakes
                   No Cottages   Cottages   Total
Above (observed)        43           81       124
Above (expected)        62           62
(O-E)²                 361          361
(O-E)²/E              5.82         5.82
Below (observed)       157          119       276
Below (expected)       138          138
(O-E)²                 361          361
(O-E)²/E              2.62         2.62
Total                  200          200       400

Remember: watch your table manners.