chapter 18 cross-tabulated counts

Apr 21, 2023

Chapter 18: Cross-Tabulated Counts

In Chapter 18:

• 18.1 Types of Samples

• 18.2 Naturalistic and Cohort Samples

• 18.3 Chi-Square Test of Association

• 18.4 Test for Trend

• 18.5 Case-Control Samples

• 18.6 Matched Pairs

§18.1 Types of Samples

• The prior chapter considered categorical response variables with two possible outcomes

• This chapter considers categorical variables with any number of possible outcomes

Types of Samples, cont.

Data may be generated by:

I. Naturalistic Samples. An SRS with data then cross-classified according to the explanatory variable and response variable.

II. Purposive Cohort Samples. Fixed numbers of individuals selected according to the explanatory factor.

III. Case-Control Samples. Fixed numbers of individuals selected according to the outcome variable.

Naturalistic Samples: Take an SRS from the population; then cross-classify individuals with respect to explanatory and response variables.

Purposive Cohort Samples: Select predetermined numbers of exposed and nonexposed individuals; then ascertain outcomes in individuals.

Case-Control Samples: Identify individuals who are positive for the outcome (cases); then sample the population for individuals who are negative for the outcome (controls).

§18.2 Naturalistic and Cohort Samples

• Data from a naturalistic sample are shown in this 5-by-2 table

• For uniformity, we always put the explanatory variable in the rows of such tables

• Totals are tallied in the table margins

               Smoke+   Smoke−   Total
High school        12       38      50
Assoc. degree      18       67      85
Some college       27       95     122
UG degree          32      239     271
Grad degree         5       52      57
Total              94      491     585

Marginal Distributions

• For naturalistic samples (only), describe marginal distributions

• These may be reported graphically or in terms of percentages

• Top figure: column marginal distribution

• Bottom figure: row marginal distribution

Conditional Percents

• The relationship between the row variable and column variable is explored with conditional percents. There are two types of conditional percents:

• Row percents are used in cohort and naturalistic samples (to describe prevalence and incidence)

• Column percents are used in case-control samples

$$\text{row percent} = \frac{\text{cell count}}{\text{row total}} \times 100\%$$

$$\text{column percent} = \frac{\text{cell count}}{\text{column total}} \times 100\%$$
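A minimal sketch of the two conditional-percent formulas, applied to the smoking-by-education counts from the 5-by-2 table above. Because those data come from a naturalistic sample, the row percents are the ones to report; the column percents are computed here only to show the contrast. The function names are illustrative, not from any library.

```python
# Smoking-by-education counts from the 5-by-2 table above:
# each row holds (Smoke+, Smoke-) for one education level.
table = {
    "High school": (12, 38),
    "Assoc. degree": (18, 67),
    "Some college": (27, 95),
    "UG degree": (32, 239),
    "Grad degree": (5, 52),
}


def row_percents(tab):
    """Row percent = cell count / row total x 100%."""
    return {name: tuple(100 * c / sum(counts) for c in counts)
            for name, counts in tab.items()}


def column_percents(tab):
    """Column percent = cell count / column total x 100%."""
    col_totals = [sum(col) for col in zip(*tab.values())]
    return {name: tuple(100 * c / t for c, t in zip(counts, col_totals))
            for name, counts in tab.items()}


rows = row_percents(table)
cols = column_percents(table)
for name in table:
    print(f"{name:14s} row %: {rows[name][0]:5.1f} {rows[name][1]:5.1f}"
          f"   column %: {cols[name][0]:5.1f} {cols[name][1]:5.1f}")
```

Each row's row percents sum to 100%, while each column's column percents sum to 100%.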

Incidence and Prevalence (Naturalistic and Cohort Samples only)

• The top table demonstrates R-by-2 table notation (R rows and 2 columns)

• For naturalistic and cohort samples, row percents in column 1 represent group incidences or prevalences

           Smoke+   Smoke−   Total
Group 1      a1       b1       n1
Group 2      a2       b2       n2
  ⋮          ⋮        ⋮        ⋮
Group R      aR       bR       nR
Total        m1       m2       N

Incidence or prevalence proportion, group i:

$$\hat{p}_i = \frac{a_i}{n_i}$$

Prevalences - Example

This table shows prevalence by education level

Example of calculation, prevalence group 1:

$$\hat{p}_1 = \frac{a_1}{n_1} = \frac{12}{50} = 0.24$$

Relative Risks, R-by-2 Tables

Let group 1 represent the least exposed group

Relative risks are calculated as follows:

$$\widehat{RR}_i = \frac{\hat{p}_i}{\hat{p}_1}$$

RRs, R-by-2 Tables, Example

This table lists RRs for the illustrative data

Example of calculation:

$$\widehat{RR}_2 = \frac{\hat{p}_2}{\hat{p}_1} = \frac{0.2118}{0.2400} = 0.88$$

Notice the generally downward dose-response in the RRs
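The prevalence and relative-risk formulas can be sketched together as below, using the Smoke+ counts and group sizes from the education table. As on the slide, group 1 (high school) is assumed to be the reference group; the helper names are hypothetical.

```python
# (education level, a_i = Smoke+ count, n_i = group size), in table order;
# group 1 (high school) serves as the reference group.
groups = [
    ("High school", 12, 50),
    ("Assoc. degree", 18, 85),
    ("Some college", 27, 122),
    ("UG degree", 32, 271),
    ("Grad degree", 5, 57),
]


def prevalences(data):
    """p_hat_i = a_i / n_i for each group."""
    return [a / n for _, a, n in data]


def relative_risks(data):
    """RR_hat_i = p_hat_i / p_hat_1, with group 1 as the reference."""
    p = prevalences(data)
    return [p_i / p[0] for p_i in p]


p_hat = prevalences(groups)
rr_hat = relative_risks(groups)
for (name, _, _), p, rr in zip(groups, p_hat, rr_hat):
    print(f"{name:14s} p_hat = {p:.4f}  RR_hat = {rr:.2f}")
```

The reference group's RR is 1 by construction; the other RRs reproduce the hand calculation above (for example, 0.2118 / 0.2400 = 0.88 for the associate-degree group).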

Responses with More than Two Levels of Outcome

Efficacy of Echinacea. A randomized controlled clinical trial pitted echinacea vs. placebo in the treatment of upper respiratory symptoms in children. The response variable was severity of illness, classified as mild, moderate, or severe.

Source: JAMA 2003, 290(21), 2824-30

http://jama.ama-assn.org/cgi/content/full/290/21/2824



Echinacea, Conditional Distributions

• Row percents are calculated to determine the incidence of each outcome.

• Example of calculation, top right table cell (data prior slide)

% severe w/echinacea = 48 / 329 × 100% = 14.6%

• Conclusion: the treatment group fared slightly worse than the control group: 14.6% of treatment group experienced severe symptoms compared to 10.9% of the control group.

§18.3 Chi-Square Test of Association

A. Hypotheses. H0: no association in population versus Ha: association in population

B. Test statistic.

$$\chi^2_{stat} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ = observed count in cell $i$ and $E_i$ = expected count in cell $i$ = (row total × column total) / table total, with $df = (R - 1)(C - 1)$.

C. P-value. Convert the $\chi^2_{stat}$ to a P-value with Table E or a software program.

Chi-Square Test - Example

Data below reveal a negative association between smoking and education level. Let us test H0: no association in the population vs. Ha: association in the population.

χ2, Expected Frequencies

$$E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$$

Chi-Square Statistic - Example
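A minimal sketch of the expected-frequency and Pearson chi-square formulas for an R-by-C table, applied to the 5-by-2 smoking-by-education counts. The function names are illustrative only.

```python
def expected_counts(table):
    """E_i = (row total x column total) / table total, for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def pearson_chi_square(table):
    """Return (X2stat, df) for an R-by-C table of observed counts."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Rows: high school ... grad degree; columns: Smoke+, Smoke-.
smoking = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]
expected = expected_counts(smoking)
stat, df = pearson_chi_square(smoking)
print(f"E (high school, Smoke+) = {expected[0][0]:.2f}")
print(f"X2stat = {stat:.2f} with {df} df")
```

The statistic comes out at about 13.20 with (5 − 1)(2 − 1) = 4 df, the value used in the P-value step.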

Chi-Square Test, P-value

• $\chi^2_{stat}$ = 13.20 with 4 df

• Using Table E, find the row for 4 df

• Find the chi-square values in this row that bracket 13.20

• Bracketing values are 11.14 (P = .025) and 13.28 (P = .01)

• Thus, .01 < P < .025 (closer to .01)

Probability in right tail

df   0.975  0.25  0.20  0.15  0.10  0.05  0.025  0.01   0.005
4    0.48   5.39  5.99  6.74  7.78  9.49  11.14  13.28  14.86

Illustrative example: $\chi^2_{stat}$ = 13.20 with 4 df. The P-value = AUC in the tail beyond $\chi^2_{stat}$.
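To make the right-tail area concrete without Table E, here is a small sketch that is valid only for an even number of degrees of freedom, where the chi-square tail area has the closed form $e^{-x/2}\sum_{k=0}^{df/2-1} (x/2)^k / k!$. For odd df, general-purpose software (for example SciPy's `scipy.stats.chi2.sf`) would be needed instead; the helper name below is hypothetical.

```python
import math


def chi_square_upper_tail_even_df(x, df):
    """Right-tail area P(X2 > x) for a chi-square variable with EVEN df.

    Uses exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!,
    a closed form that holds only when df is even.
    """
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p = chi_square_upper_tail_even_df(13.20, 4)
print(f"P = {p:.4f}")
```

The result lands just above .01, inside the .01 to .025 bracket read from Table E. As a sanity check, the 4-df table value 9.49 returns a tail area of about .05.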

Chi-Square By Computer

Here are results for the illustrative data from WinPepi > Compare2.exe > Program F Categorical Data

Yates’ Continuity-Corrected Chi-Square Statistic

• Two different chi-square statistics are used in practice

• Pearson’s chi-square statistic (covered) is:

$$\chi^2_{stat} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$$

• Yates’ continuity-corrected chi-square statistic is:

$$\chi^2_{stat,c} = \sum_{\text{all cells}} \frac{\left(|O_i - E_i| - \tfrac{1}{2}\right)^2}{E_i}$$

• The continuity-corrected method produces smaller chi-square statistics and larger P-values.
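A sketch computing both statistics side by side. Yates’ correction is usually applied to 2-by-2 tables, so the input here is the 2-by-2 alcohol/esophageal cancer table from the case-control section, used purely as an illustration; the function name is hypothetical.

```python
def chi_square_pearson_and_yates(table):
    """Return (Pearson X2, Yates continuity-corrected X2) for a table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    pearson = yates = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            pearson += diff ** 2 / expected           # (O - E)^2 / E
            yates += (diff - 0.5) ** 2 / expected     # (|O - E| - 1/2)^2 / E
    return pearson, yates


# Illustration only: rows E+, E-; columns cases, controls.
x2, x2_c = chi_square_pearson_and_yates([[96, 109], [104, 666]])
print(f"Pearson X2 = {x2:.2f}, Yates X2 = {x2_c:.2f}")
```

With these counts the corrected statistic is a little smaller than Pearson’s, as the slide states.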

Chi-Square, cont.

1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the observed minus expected values get larger, the chi-square statistic gets larger and evidence against H0 mounts.

2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.
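The small-sample rule of thumb can be checked mechanically. This sketch reuses the expected-count formula and flags a table when more than 20% of its cells have expected counts below 5. The tiny `toy` table is made up purely for contrast, and the function names are hypothetical.

```python
def expected_counts(table):
    """E = row total x column total / table total, for every cell (flattened)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    return [rt * ct / n for rt in row_totals for ct in col_totals]


def chi_square_ok(table, min_expected=5, max_fraction=0.20):
    """Rule of thumb: avoid the chi-square test when more than 20% of the
    cells have expected counts below 5."""
    expected = expected_counts(table)
    fraction_small = sum(e < min_expected for e in expected) / len(expected)
    return fraction_small <= max_fraction


smoking = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]
toy = [[1, 2], [3, 4]]   # hypothetical tiny table, for contrast only
print(chi_square_ok(smoking), chi_square_ok(toy))   # True False
```

Every expected count in the smoking table is at least 8, so the rule is satisfied there; in the toy table every expected count is below 5.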

Chi-Square, cont.

3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association. Use descriptive statistics or RRs to quantify “strength”.

4. Chi-square and z tests (Ch 17) produce identical P-values. The relationship between the statistics is:

$$\chi^2_{stat} = z_{stat}^2 \quad \text{(with 1 df)}$$
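A numeric check of point 4, assuming the Ch 17 z test is the pooled two-proportion z test. Any 2-by-2 table will do; here the alcohol data from the case-control section are arranged as cases vs. controls (rows) by E+ vs. E− (columns) purely to exercise the identity. The function names are hypothetical.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (two-proportion z test)."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def pearson_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2-by-2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: cases (96 of 200 exposed), controls (109 of 775 exposed).
z = two_proportion_z(96, 200, 109, 775)
x2 = pearson_2x2(96, 104, 109, 666)
p_from_z = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal P-value
p_from_x2 = math.erfc(math.sqrt(x2 / 2))      # right tail of chi-square, 1 df
print(f"z^2 = {z ** 2:.4f}, X2 = {x2:.4f}")
print(f"P from z = {p_from_z:.3g}, P from X2 = {p_from_x2:.3g}")
```

The two P-values agree because, with 1 df, the chi-square right tail beyond $z^2$ equals the two-sided normal tail beyond $|z|$.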

§18.4 Test for Trend

See pp. 431–436

§18.5 Case-Control Samples

Case-control sampling method

• Identify all cases in the population

• From the same source population, randomly select a series of non-cases (controls)

• Ascertain the exposure status of cases and controls

• Cross-tabulate the exposure status of cases and controls

This provides an efficient way to study rare outcomes

Incidence Density Sampling

This advanced concept allows students to see that case-control studies are a type of longitudinal “time-failure” design.

As cases are identified in the population, select at random one or more noncases (controls) for each case at the time of occurrence.
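A rough sketch of incidence density sampling, assuming each person's follow-up is summarized by the time they became a case (or `None` if they never did). For each case, controls are drawn at random from the people still free of the outcome at that moment. The cohort ids and times below are made up for illustration, and the function name is hypothetical. Note that someone who becomes a case later can still serve as a control earlier, which is a feature of this design.

```python
import random


def incidence_density_sample(event_times, controls_per_case=1, seed=0):
    """Sketch of incidence density (risk-set) sampling.

    event_times maps a person id to the time that person became a case,
    or None if the person never became a case during follow-up.
    For each case, controls are drawn at random from people who are
    still noncases at that moment (the risk set).
    """
    rng = random.Random(seed)
    cases = sorted((t, pid) for pid, t in event_times.items() if t is not None)
    sampled = []
    for t, case_id in cases:
        risk_set = [pid for pid, s in event_times.items()
                    if pid != case_id and (s is None or s > t)]
        k = min(controls_per_case, len(risk_set))
        sampled.append((case_id, t, rng.sample(risk_set, k)))
    return sampled


# Hypothetical mini-cohort (ids and event times are made up).
cohort = {"A": 2.0, "B": 5.0, "C": None, "D": None, "E": 3.5}
matched = incidence_density_sample(cohort, controls_per_case=2)
for case_id, t, controls in matched:
    print(f"case {case_id} at t={t}: controls {controls}")
```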

Case-Control Illustrative Example

• Cases: men diagnosed with esophageal cancer

• Controls: noncases selected at random from electoral lists in the same region

• Exposure = alcohol consumption dichotomized at 80 g/day

$$\widehat{OR} = \frac{a_1 b_2}{b_1 a_2} = \frac{(96)(666)}{(109)(104)} = 5.64$$

(a = cases, b = controls; subscript 1 = exposed, 2 = nonexposed)

Interpretation: The rate ratio associated with high alcohol consumption is about 5.6

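(1 – α)100% CI for the OR

$$e^{\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$$

where

$$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}}$$

Note use of the natural logarithmic scale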

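90% CI for the OR – Example

       Cases   Cntls
E+        96     109
E−       104     666

$$\ln \widehat{OR} = \ln(5.640) = 1.7299$$

$$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{96} + \frac{1}{109} + \frac{1}{104} + \frac{1}{666}} = 0.1752$$

For 90% confidence, use $z_{1-\alpha/2} = 1.645$:

$$1.7299 \pm (1.645)(0.1752) = 1.7299 \pm 0.2882 = (1.4417,\ 2.0181)$$

$$\left(e^{1.4417},\ e^{2.0181}\right) = (4.23,\ 7.52)$$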
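The odds ratio and its log-scale confidence interval can be sketched as follows. The argument order (a1, b1, a2, b2) follows the notation above, and the z value is taken from Python's `statistics.NormalDist` rather than a table. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a1, b1, a2, b2, conf=0.95):
    """OR_hat = a1*b2 / (b1*a2) with a CI computed on the natural-log scale.

    a1, a2 = exposed / nonexposed cases; b1, b2 = exposed / nonexposed controls.
    """
    or_hat = (a1 * b2) / (b1 * a2)
    se_ln = math.sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.645 for 90%
    lo = math.exp(math.log(or_hat) - z * se_ln)
    hi = math.exp(math.log(or_hat) + z * se_ln)
    return or_hat, lo, hi


or_hat, lo, hi = odds_ratio_ci(96, 109, 104, 666, conf=0.90)
print(f"OR = {or_hat:.2f}, 90% CI ({lo:.2f}, {hi:.2f})")  # OR = 5.64, 90% CI (4.23, 7.52)
```

Passing `conf=0.95` gives the more common 95% interval from the same formula.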

Case-Control - Example

Results from WinPepi > Compare2.exe > A.

WinPepi uses a slightly different formula than ours; the Mid-P results are similar to ours.

Case-Control Studies with Multiple Levels of Exposure

With an ordinal exposure, compare each exposure level to the non-exposed group (next slide):

Case-Control, Ordinal Levels of Exposure

Note the dose-response relationship
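The ordinal-exposure data are not reproduced in this text, so this sketch only shows the mechanics: each exposure level's (cases, controls) counts are compared with the non-exposed reference level, which is assumed to be listed first. It is exercised with the two-level alcohol data (non-exposed first), which reproduces the OR of 5.64. The function name is hypothetical.

```python
def odds_ratios_vs_reference(levels):
    """levels: list of (cases, controls) per exposure level, with levels[0]
    the non-exposed reference.

    OR_hat for level k = (cases_k * controls_0) / (cases_0 * controls_k).
    """
    cases0, controls0 = levels[0]
    return [(cases * controls0) / (cases0 * controls) for cases, controls in levels]


ors = odds_ratios_vs_reference([(104, 666), (96, 109)])
print([round(x, 2) for x in ors])   # [1.0, 5.64]
```

With more than two levels, a dose-response pattern would show up as ORs that climb steadily from one exposure level to the next.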

§18.6 Matched Pairs

• With matched-pair samples, each participant is carefully matched to a unique individual as part of the selection process

• This technique is used to mitigate confounding by the matching factor

• Both cohort and case-control samples may use matching

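Here’s the notation for matched-pair case-control data:

              Case E+   Case E−
Control E+       a         b
Control E−       c         d

The odds ratio associated with exposure is:

$$\widehat{OR} = \frac{c}{b}$$

The confidence interval is:

$$e^{\ln \widehat{OR} \pm z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}} \quad \text{where} \quad SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{b} + \frac{1}{c}}$$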

Matched Pairs - Example

A matched case-control study found 45 pairs in which the case but not the control had a low fruit/veg diet; it found 24 pairs in which the control but not the case had a low fruit/veg diet.

           Case E+   Case E−
Cntl E+    unknown      24
Cntl E−       45     unknown

$$\widehat{OR} = \frac{c}{b} = \frac{45}{24} = 1.88$$

The odds ratio suggests 88% higher risk in low fruit/veg consumers.

Matched Pair Example, cont.

Data are compatible with ORs between 1.14 and 3.07

WinPepi’s PairEtc.exe program A calculates exact confidence intervals for ORs from matched-pair data. Hand calculated limits will be similar except in small samples.
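A hand-calculation sketch of the matched-pair OR and its log-scale interval, using only the discordant pairs (b = 24, c = 45). The slide does not state the confidence level; 95% is assumed here. The printed limits come out near 1.14 and 3.08, close to the range quoted above. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def matched_pair_or_ci(b, c, conf=0.95):
    """Matched-pair OR_hat = c / b with SE(ln OR_hat) = sqrt(1/b + 1/c).

    c = pairs where the case is exposed and the control is not;
    b = pairs where the control is exposed and the case is not.
    """
    or_hat = c / b
    se_ln = math.sqrt(1 / b + 1 / c)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    lo = math.exp(math.log(or_hat) - z * se_ln)
    hi = math.exp(math.log(or_hat) + z * se_ln)
    return or_hat, lo, hi


or_hat, lo, hi = matched_pair_or_ci(b=24, c=45)
print(f"OR = {or_hat:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

Concordant pairs (the "unknown" cells) play no role in either the estimate or its standard error.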

Hypothesis Test, Matched Pairs

A. H0: OR = 1

B. McNemar’s test statistic:

$$z_{stat} = \frac{c - b}{\sqrt{c + b}}$$

C. P-values. Convert $z_{stat}$ to a P-value with Table B or Table F

If fewer than 5 discordancies are expected, use an exact binomial procedure (see text).

Hypothesis Test, Example

              Case E+   Case E−
Control E+    unknown      24
Control E−       45     unknown
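A sketch of McNemar’s z test for these data. The check on expected discordancies interprets the slide’s rule as (b + c) / 2 expected pairs in each discordant cell under H0; that reading is an assumption. The two-sided P-value uses the normal tail via `math.erfc`, and the function name is hypothetical.

```python
import math


def mcnemar_z(b, c):
    """McNemar z_stat = (c - b) / sqrt(c + b), using only the discordant pairs."""
    expected_discordant = (b + c) / 2   # expected count per discordant cell under H0
    if expected_discordant < 5:
        raise ValueError("too few discordant pairs; use an exact binomial procedure")
    z = (c - b) / math.sqrt(c + b)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


z, p = mcnemar_z(b=24, c=45)
print(f"z = {z:.2f}, two-sided P = {p:.4f}")
```

Here z is about 2.53 and the two-sided P-value is a little above .01, consistent with the confidence interval excluding OR = 1.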
