Chapter 18: Cross-Tabulated Counts
Apr 21, 2023
In Chapter 18:
• 18.1 Types of Samples
• 18.2 Naturalistic and Cohort Samples
• 18.3 Chi-Square Test of Association
• 18.4 Test for Trend
• 18.5 Case-Control Samples
• 18.6 Matched Pairs
§18.1 Types of Samples
• The prior chapter considered categorical response variables with two possible outcomes
• This chapter considers categorical variables with any number of possible outcomes
Types of Samples, cont.
Data may be generated by:
I. Naturalistic Samples. An SRS with data then cross-classified according to the explanatory variable and response variable.
II. Purposive Cohort Samples. Fixed numbers of individuals selected according to the explanatory factor.
III. Case-Control Samples. Fixed numbers of individuals selected according to the outcome variable.
Naturalistic Samples: Take an SRS from the population; then cross-classify individuals with respect to explanatory and response variables.
Purposive Cohort Samples: Select predetermined numbers of exposed and nonexposed individuals; then ascertain outcomes in individuals.
Case-Control Samples: Identify individuals who are positive for the outcome (cases); then sample the population for negatives (controls).
§18.2 Naturalistic and Cohort Samples
• Data from a naturalistic sample are shown in this 5-by-2 table
• For uniformity, always put the explanatory variable in the rows of such tables
• Totals are tallied in table margins
                Smoke+   Smoke−   Total
High school         12       38      50
Assoc. degree       18       67      85
Some college        27       95     122
UG degree           32      239     271
Grad degree          5       52      57
Total               94      491     585
Marginal Distributions
• For naturalistic samples (only), describe the marginal distributions
• These may be reported graphically or in terms of percentages
• Top figure: column marginal distribution
• Bottom figure: row marginal distribution
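As a rough illustration of the marginal distributions described above, the sketch below tallies row and column totals for the education-by-smoking counts and converts them to percentages. The variable names are illustrative and not part of the original slides.

```python
# Counts from the education-by-smoking table: (Smoke+, Smoke-) for each row.
table = {
    "High school":   (12, 38),
    "Assoc. degree": (18, 67),
    "Some college":  (27, 95),
    "UG degree":     (32, 239),
    "Grad degree":   (5, 52),
}

# Margins: one total per row, one total per column, and the table total.
row_totals = {group: sum(counts) for group, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
n_total = sum(row_totals.values())

# Marginal distributions expressed as percentages of the table total.
row_marginal = {g: 100 * t / n_total for g, t in row_totals.items()}
col_marginal = [100 * t / n_total for t in col_totals]

for group, pct in row_marginal.items():
    print(f"{group:14s} {pct:5.1f}%")
print(f"Smoke+ {col_marginal[0]:.1f}%   Smoke- {col_marginal[1]:.1f}%")
```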
Conditional Percents
• The relationship between the row variable and column variable is explored with conditional percents. There are two types of conditional percents:
• Row percents: use in cohort and naturalistic samples (they describe prevalence and incidence)
• Column percents: use in case-control samples
$\text{row percent} = \frac{\text{cell count}}{\text{row total}} \times 100\%$
$\text{column percent} = \frac{\text{cell count}}{\text{column total}} \times 100\%$
Incidence and Prevalence (Naturalistic and Cohort Samples only)
• The top table demonstrates R-by-C table notation (R rows and C columns)
• For naturalistic and cohort samples, row percents in column 1 represent group incidences or prevalences
           Smoke+   Smoke-   Total
Group 1    a1       b1       n1
Group 2    a2       b2       n2
⋮          ⋮        ⋮        ⋮
Group R    aR       bR       nR
Total      m1       m2       N
Incidence or prevalence proportion, group $i$:
$\hat{p}_i = \frac{a_i}{n_i}$
Prevalences - Example
This table shows prevalence by education level
Example of calculation, prevalence group 1:
$\hat{p}_1 = \frac{a_1}{n_1} = \frac{12}{50} = 0.24$
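The same calculation can be repeated for every education group. The sketch below is one way to do it; the dictionary layout and names are assumptions made for illustration, while the counts come from the 5-by-2 table.

```python
# (a_i, n_i): number of smokers and row total for each education group.
groups = {
    "High school":   (12, 50),
    "Assoc. degree": (18, 85),
    "Some college":  (27, 122),
    "UG degree":     (32, 271),
    "Grad degree":   (5, 57),
}

# Prevalence proportion in each group: p_i = a_i / n_i.
prevalence = {g: a / n for g, (a, n) in groups.items()}

for g, p in prevalence.items():
    print(f"{g:14s} {p:.4f}")
```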
Relative Risks, R-by-2 Tables
Let group 1 represent the least exposed group
Relative risks are calculated as follows:
$\widehat{RR}_i = \frac{\hat{p}_i}{\hat{p}_1}$
RRs, R-by-2 Tables, Example
This table lists RR for the illustrative data
Example of calculation:
$\widehat{RR}_2 = \frac{\hat{p}_2}{\hat{p}_1} = \frac{0.2118}{0.2400} = 0.88$
Notice the downward dose-response in RRs
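A short sketch of the relative-risk calculation for all five groups follows, taking the first group (high school) as the reference as in the example. The names are illustrative; the counts are those of the education-by-smoking table.

```python
# (a_i, n_i): smokers and row total per group; the first entry is the reference group.
groups = [
    ("High school",   12, 50),
    ("Assoc. degree", 18, 85),
    ("Some college",  27, 122),
    ("UG degree",     32, 271),
    ("Grad degree",    5, 57),
]

# Reference prevalence p_1, then RR_i = p_i / p_1 for every group.
p_ref = groups[0][1] / groups[0][2]
relative_risk = {name: (a / n) / p_ref for name, a, n in groups}

for name, rr in relative_risk.items():
    print(f"{name:14s} RR = {rr:.2f}")
```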
Responses with More than Two Levels of Outcome
Efficacy of Echinacea. A randomized controlled clinical trial pitted echinacea vs. placebo in the treatment of upper respiratory symptoms in children. The response variable was severity of illness classified as: mild, moderate or severe.
Source: JAMA 2003, 290(21), 2824-30
Echinacea, Conditional Distributions
• Row percents are calculated to determine the incidence of each outcome.
• Example of calculation, top right table cell (data on prior slide)
% severe w/echinacea = 48 / 329 × 100% = 14.6%
• Conclusion: the treatment group fared slightly worse than the control group: 14.6% of treatment group experienced severe symptoms compared to 10.9% of the control group.
§18.3 Chi-Square Test of Association
A. Hypotheses. H0: no association in population versus Ha: association in population
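B. Test statistic.
$X^2_{\text{stat}} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$
where $O_i$ = observed count in cell $i$, and $E_i$ = expected count in cell $i$, calculated as $E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
$df = (R - 1)(C - 1)$
C. P-value. Convert the $X^2_{\text{stat}}$ to a P-value with Table E or a software program.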
Chi-Square Test - Example
Data below reveal a negative association between smoking and education level. Let us test H0: no association in the population vs. Ha: association in the population.
χ2, Expected Frequencies
Expected frequencies: $E_i = \frac{\text{row total} \times \text{column total}}{\text{table total}}$
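To make the expected-frequency formula concrete, the sketch below applies it to every cell of the education-by-smoking table. The list layout and names are illustrative choices, not taken from the slides.

```python
# Observed counts: rows are education groups, columns are (Smoke+, Smoke-).
observed = [
    [12, 38],
    [18, 67],
    [27, 95],
    [32, 239],
    [5, 52],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Expected count in each cell = row total * column total / table total.
expected = [[r * c / n_total for c in col_totals] for r in row_totals]

for row in expected:
    print("  ".join(f"{e:8.3f}" for e in row))
```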
Chi-Square Statistic - Example
Chi-Square Test, P-value
• $X^2_{\text{stat}}$ = 13.20 with 4 df
• Using Table E, find the row for 4 df
• Find the chi-square values in this row that bracket 13.20
• Bracketing values are 11.14 (P = .025) and 13.28 (P = .01)
• Thus, .01 < P < .025 (closer to .01)
Probability in right tail
df   0.975   0.25   0.20   0.15   0.10   0.05   0.025   0.01    0.005
4    0.48    5.39   5.99   6.74   7.78   9.49   11.14   13.28   14.86
Illustrative example: $X^2_{\text{stat}}$ = 13.20 with 4 df
The P-value = AUC in the tail beyond $X^2_{\text{stat}}$
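The statistic and its tail area can be checked with a few lines of code. The sketch below computes Pearson's chi-square for the education-by-smoking table and then the right-tail probability. To stay dependency-free it uses a closed-form tail formula that holds only for even degrees of freedom (an assumption that fits df = 4 here); the helper name is hypothetical, and a statistics package would normally be used instead.

```python
from math import exp, factorial

observed = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Pearson chi-square: sum over all cells of (O - E)^2 / E.
x2_stat = 0.0
for i, r in enumerate(row_totals):
    for j, c in enumerate(col_totals):
        e = r * c / n_total
        x2_stat += (observed[i][j] - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

def chi2_right_tail_even_df(x, df):
    """Right-tail area of a chi-square distribution; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))

p_value = chi2_right_tail_even_df(x2_stat, df)
print(f"X2 = {x2_stat:.2f}, df = {df}, P = {p_value:.4f}")
```

The computed P-value should fall between the bracketing probabilities read from Table E.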
Chi-Square By Computer
Here are results for the illustrative data from WinPepi > Compare2.exe > Program F Categorical Data
Yates’ Continuity Corrected Chi-Square Statistic
• Two different chi-square statistics are used in practice
• Pearson’s chi-square statistic (covered) is:
$X^2_{\text{stat}} = \sum_{\text{all cells}} \frac{(O_i - E_i)^2}{E_i}$
• Yates’ continuity-corrected chi-square statistic is:
$X^2_{\text{stat},c} = \sum_{\text{all cells}} \frac{\left(|O_i - E_i| - \frac{1}{2}\right)^2}{E_i}$
• The continuity-corrected method produces smaller chi-square statistics and larger P-values.
• Both chi-square statistics are used in practice.
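The two statistics can be compared side by side. The sketch below applies both formulas, exactly as written on the slide, to the education-by-smoking counts; the function name and its `corrected` flag are illustrative, and the choice of this table for the comparison is an assumption rather than something the slides specify.

```python
observed = [[12, 38], [18, 67], [27, 95], [32, 239], [5, 52]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

def chi_square(observed, corrected=False):
    """Pearson chi-square; with corrected=True, subtract 1/2 from each |O - E| (Yates)."""
    total = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n_total
            diff = abs(observed[i][j] - e)
            if corrected:
                diff -= 0.5
            total += diff ** 2 / e
    return total

pearson = chi_square(observed)
yates = chi_square(observed, corrected=True)
print(f"Pearson X2 = {pearson:.2f}, Yates X2 = {yates:.2f}")
```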
Chi-Square, cont.
1. How the chi-square works. When observed values = expected values, the chi-square statistic is 0. As the differences between observed and expected values get large, the statistic grows and evidence against H0 mounts.
2. Avoid chi-square tests in small samples. Do not use a chi-square test when more than 20% of the cells have expected values that are less than 5.
Chi-Square, cont.
3. Supplement chi-squares with measures of association. Chi-square statistics do not measure the strength of association. Use descriptive statistics or RRs to quantify “strength”.
4. Chi-square and z tests (Ch 17) produce identical P-values for 2-by-2 tables. The relationship between the statistics is:
$z^2_{\text{stat}} = X^2_{\text{stat}}$ with 1 df
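The relationship between the two statistics can be verified numerically. The sketch below uses hypothetical counts (they are not from the chapter) and assumes the z test in question is the pooled two-proportion z test; under that assumption, squaring z reproduces Pearson's chi-square for the corresponding 2-by-2 table.

```python
from math import sqrt

# Hypothetical data, not from the chapter: successes and sample size in two groups.
a1, n1 = 30, 100
a2, n2 = 45, 100

# Pooled two-proportion z statistic.
p1, p2 = a1 / n1, a2 / n2
p_pooled = (a1 + a2) / (n1 + n2)
z_stat = (p1 - p2) / sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))

# Pearson chi-square for the same data arranged as a 2-by-2 table.
observed = [[a1, n1 - a1], [a2, n2 - a2]]
row_totals = [n1, n2]
col_totals = [a1 + a2, (n1 - a1) + (n2 - a2)]
n_total = n1 + n2

x2_stat = 0.0
for i in range(2):
    for j in range(2):
        e = row_totals[i] * col_totals[j] / n_total
        x2_stat += (observed[i][j] - e) ** 2 / e

print(f"z^2 = {z_stat ** 2:.4f}, X2 = {x2_stat:.4f}")
```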
§18.4 Test for Trend
See pp. 431 – 436
§18.5 Case-Control Samples
Case-control sampling method
• Identify all cases in the population
• From the same source population, randomly select a series of non-cases (controls)
• Ascertain the exposure status of cases and controls
• Cross-tabulate the exposure status of cases and controls
This provides an efficient way to study rare outcomes
Incidence Density Sampling
This advanced concept allows students to see that case-control studies are a type of longitudinal “time-failure” design.
As cases are identified in the population, select at random one or more noncases (controls) for each case at the time of occurrence.
Case-Control Illustrative Example
• Cases: men diagnosed with esophageal cancer
• Controls: noncases selected at random from electoral lists in same region
• Exposure = alcohol consumption dichotomized at 80 g/day
With a1 and a2 the exposed and nonexposed cases, and b1 and b2 the exposed and nonexposed controls:
$\widehat{OR} = \frac{a_1 b_2}{a_2 b_1} = \frac{96 \times 666}{104 \times 109} = 5.64$
Interpretation: The rate ratio associated with high-alcohol consumption is about 5.6
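The cross-product calculation is easy to reproduce. In the sketch below the variable names follow the a/b notation used for the table cells (a for cases, b for controls, subscript 1 for exposed and 2 for nonexposed); the layout is otherwise an illustrative choice.

```python
# Esophageal cancer example: exposure is high alcohol consumption.
a1, b1 = 96, 109    # exposed:    cases, controls
a2, b2 = 104, 666   # nonexposed: cases, controls

# Odds ratio as the cross-product ratio of the 2-by-2 table.
odds_ratio = (a1 * b2) / (a2 * b1)
print(f"OR = {odds_ratio:.2f}")
```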
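(1 − α)100% CI for the OR
$e^{\ln \widehat{OR} \,\pm\, z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$
where $SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}}$
Note use of the natural logarithmic scale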
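90% CI for the OR – Example
       Cases   Cntls
E+        96     109
E−       104     666
$\ln \widehat{OR} = \ln(5.640) = 1.7299$
$SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{96} + \frac{1}{104} + \frac{1}{109} + \frac{1}{666}} = 0.1752$
For 90% confidence, use $z = 1.645$
90% CI $= e^{1.7299 \pm (1.645)(0.1752)} = e^{1.7299 \pm 0.2882} = e^{(1.4417,\ 2.0181)} = (4.23,\ 7.52)$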
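The interval can be reproduced step by step on the log scale. The sketch below follows the formula above; it obtains the z quantile from Python's standard library rather than a table, which gives a value of about 1.645 for 90% confidence. The variable names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

a1, b1 = 96, 109    # exposed:    cases, controls
a2, b2 = 104, 666   # nonexposed: cases, controls
confidence = 0.90

# Point estimate and its standard error on the natural-log scale.
ln_or = log((a1 * b2) / (a2 * b1))
se_ln_or = sqrt(1 / a1 + 1 / b1 + 1 / a2 + 1 / b2)

# z quantile for the chosen confidence level (about 1.645 for 90%).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Build the interval on the log scale, then exponentiate back to the OR scale.
lower = exp(ln_or - z * se_ln_or)
upper = exp(ln_or + z * se_ln_or)
print(f"90% CI for OR: ({lower:.2f}, {upper:.2f})")
```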
Case-Control - Example
Results from WinPepi > Compare2.exe > A.
WinPepi uses a slightly different formula than ours; the Mid-P results are similar to ours.
Case-Control Studies with Multiple Levels of Exposure
With an ordinal exposure, compare each exposure level to the non-exposed group (next slide):
Case-Control, Ordinal Levels of Exposure
Note dose-response relationship
§18.6 Matched Pairs
• With matched-pair samples, each participant is carefully matched to a unique individual as part of the selection process
• This technique is used to mitigate confounding by the matching factor
• Both cohort and case-control samples may avail themselves of matching
Here’s the notation for matched-pair case-control data:
             Case E+   Case E−
Control E+   a         b
Control E−   c         d
The odds ratio associated with exposure is:
$\widehat{OR} = \frac{c}{b}$
The confidence interval is:
$e^{\ln \widehat{OR} \,\pm\, z_{1-\alpha/2} \cdot SE_{\ln \widehat{OR}}}$ where $SE_{\ln \widehat{OR}} = \sqrt{\frac{1}{c} + \frac{1}{b}}$
Matched Pairs - Example
A matched case-control study found 45 pairs in which the case but not the control had a low fruit/veg diet; it found 24 pairs in which the control but not the case had a low fruit/veg diet.
          Case E+   Case E−
Cntl E+   unknown   24
Cntl E−   45        unknown
$\widehat{OR} = \frac{c}{b} = \frac{45}{24} = 1.88$
The odds ratio suggests 88% higher risk in low fruit/veg consumers.
Matched Pair Example, cont.
Data are compatible with ORs between 1.14 and 3.07
WinPepi’s PairEtc.exe program A calculates exact confidence intervals for ORs from matched-pair data. Hand calculated limits will be similar except in small samples.
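A hand calculation of the matched-pair odds ratio and its interval might look like the sketch below. It uses only the discordant pairs and the log-scale standard error given earlier; the 95% confidence level is an assumption, since the transcript does not state which level the reported limits use. Small differences from the reported limits are possible because of rounding or the exact method used by the software.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# Discordant pairs from the fruit/veg example.
c = 45   # case exposed, control not exposed
b = 24   # control exposed, case not exposed
confidence = 0.95   # assumed level; not stated in the transcript

# Matched-pair odds ratio and its standard error on the log scale.
odds_ratio = c / b
se_ln_or = sqrt(1 / c + 1 / b)

z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
lower = exp(log(odds_ratio) - z * se_ln_or)
upper = exp(log(odds_ratio) + z * se_ln_or)
print(f"OR = {odds_ratio:.2f}, CI = ({lower:.2f}, {upper:.2f})")
```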
Hypothesis Test, Matched Pairs
A. H0: OR = 1
B. McNemar’s test statistic.
C. P-values. Convert zstat to P-value with Table B or Table F
If fewer than 5 discordancies are expected, use an exact binomial procedure (see text).
Hypothesis Test, Example
             Case E+   Case E−
Control E+   unknown   24
Control E−   45        unknown
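The test statistic itself did not survive in this transcript. As a sketch only, the code below assumes the common uncorrected z form of McNemar's statistic, z = (c − b)/√(c + b), which depends only on the discordant pairs, and converts it to a two-sided P-value with the standard Normal distribution. The text's own formula may differ, for example by including a continuity correction.

```python
from math import erfc, sqrt

# Discordant pairs from the fruit/veg example.
c = 45   # case exposed, control not exposed
b = 24   # control exposed, case not exposed

# Assumed form of McNemar's statistic (no continuity correction).
z_stat = (c - b) / sqrt(c + b)

# Two-sided P-value from the standard Normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_value = erfc(abs(z_stat) / sqrt(2))
print(f"z = {z_stat:.2f}, two-sided P = {p_value:.4f}")
```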