Power calculation for QTL association (discrete and quantitative traits)
Shaun Purcell & Pak Sham
Advanced Workshop
Boulder, CO, 2003
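Designs by trait type
Quantitative: variance components
Threshold: case-control (High vs Low); TDT
Discrete: case-control (Affected vs Unaffected); TDT
Each case-control or TDT design reduces to a 2 × 2 table of allele counts:
       High  Low          Aff  UnAff          Tr  UnTr
  A    n1    n2      A    n1   n2        A    n1  n2
  a    n3    n4      a    n3   n4        a    n3  n4
(Tr/UnTr: transmitted/untransmitted; the same TDT layout applies to threshold and discrete traits)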
Discrete trait calculation
p Frequency of high-risk allele
K Prevalence of disease
RAA Genotypic relative risk for AA genotype
RAa Genotypic relative risk for Aa genotype
N, α, β   Sample size, type I and type II error rates
Risk is P(D|G)
$g_{AA} = R_{AA}\,g_{aa}, \quad g_{Aa} = R_{Aa}\,g_{aa}$
$K = p^2 g_{AA} + 2pq\,g_{Aa} + q^2 g_{aa}$
$g_{aa} = K / (p^2 R_{AA} + 2pq\,R_{Aa} + q^2)$
Odds ratio (e.g. for the AA genotype): $\mathrm{OR}_{AA} = \dfrac{g_{AA}/(1-g_{AA})}{g_{aa}/(1-g_{aa})}$
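The penetrance calculation above can be sketched in Python as follows. This is a minimal illustration, not GPC's own code; the function names `penetrances` and `odds_ratio` are hypothetical, and the example values (p = 0.2, K = 0.02, R_AA = 4, R_Aa = 2) are borrowed from the planning exercise later in the session.

```python
def penetrances(p, K, R_AA, R_Aa):
    """Return (g_AA, g_Aa, g_aa) = P(D|G) from the frequency p of allele A,
    the prevalence K and the genotypic relative risks (baseline genotype aa)."""
    q = 1 - p
    g_aa = K / (p**2 * R_AA + 2 * p * q * R_Aa + q**2)
    return R_AA * g_aa, R_Aa * g_aa, g_aa


def odds_ratio(g, g_ref):
    """Odds ratio of disease for a genotype with risk g versus risk g_ref."""
    return (g / (1 - g)) / (g_ref / (1 - g_ref))


# Prevalence 2%, multiplicative model (R_Aa = 2, R_AA = 4), p = 0.2
g_AA, g_Aa, g_aa = penetrances(0.2, 0.02, 4, 2)
print(g_AA, g_Aa, g_aa)
print(odds_ratio(g_AA, g_aa))  # slightly larger than the relative risk R_AA = 4
```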
Need to calculate P(G|D)
Expected proportion d of genotypes in cases
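$d_{AA} = \dfrac{g_{AA}\,p^2}{g_{AA}p^2 + g_{Aa}\,2pq + g_{aa}q^2}$
$d_{Aa} = \dfrac{g_{Aa}\,2pq}{g_{AA}p^2 + g_{Aa}\,2pq + g_{aa}q^2}$
$d_{aa} = \dfrac{g_{aa}\,q^2}{g_{AA}p^2 + g_{Aa}\,2pq + g_{aa}q^2}$
Expected number of A alleles for cases
$2N_{\text{Case}}\,(d_{AA} + d_{Aa}/2)$
Expected proportion c of genotypes in controls
$c_{AA} = \dfrac{(1-g_{AA})\,p^2}{(1-g_{AA})p^2 + (1-g_{Aa})\,2pq + (1-g_{aa})q^2}$, with $c_{Aa}$ and $c_{aa}$ defined analogously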
General form (Bayes' theorem):
$P(G \mid D) = \dfrac{P(D \mid G)\,P(G)}{\sum_G P(D \mid G)\,P(G)}$
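A sketch of the Bayes step that turns penetrances into genotype distributions among cases and controls, assuming Hardy-Weinberg proportions in the population. The function name is hypothetical; the example penetrances are those implied by K = 0.02, p = 0.2, R_AA = 4, R_Aa = 2.

```python
def genotype_freqs_given_status(p, g_AA, g_Aa, g_aa):
    """Bayes' theorem: P(G|D) for cases and P(G|not D) for controls, assuming
    Hardy-Weinberg genotype frequencies. Tuples are ordered (AA, Aa, aa)."""
    q = 1 - p
    prior = (p * p, 2 * p * q, q * q)
    risk = (g_AA, g_Aa, g_aa)
    case_w = [f * g for f, g in zip(prior, risk)]        # P(D|G) P(G)
    ctrl_w = [f * (1 - g) for f, g in zip(prior, risk)]  # P(not D|G) P(G)
    d = tuple(w / sum(case_w) for w in case_w)
    c = tuple(w / sum(ctrl_w) for w in ctrl_w)
    return d, c


d, c = genotype_freqs_given_status(0.2, 0.02 / 1.44 * 4, 0.02 / 1.44 * 2, 0.02 / 1.44)
print(d)  # cases are enriched for A relative to the HWE frequencies (0.04, 0.32, 0.64)
print(c)
```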
Full contingency table
“A” allele “a” allele
Case 2NCase ( dAA + dAa / 2 ) 2NCase ( daa + dAa / 2 )
Control 2NControl ( cAA + cAa / 2 ) 2NControl ( caa + cAa / 2 )
Test statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E}$
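Filling the table with expected counts and applying the chi-squared formula gives the expected test statistic, which serves as the non-centrality parameter (NCP) in the power calculation. A minimal sketch of this 1-df allelic test; the function names are hypothetical.

```python
def expected_allele_table(n_case, n_control, d, c):
    """2x2 table of expected allele counts (rows: case, control; columns: A, a).
    d and c are (AA, Aa, aa) genotype proportions in cases and controls."""
    return [
        [2 * n_case * (d[0] + d[1] / 2), 2 * n_case * (d[2] + d[1] / 2)],
        [2 * n_control * (c[0] + c[1] / 2), 2 * n_control * (c[2] + c[1] / 2)],
    ]


def pearson_chi2(table):
    """Sum of (O - E)^2 / E over a 2x2 table, with E from the margins."""
    rows = [sum(r) for r in table]
    cols = [sum(col) for col in zip(*table)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n
            chi2 += (table[i][j] - e) ** 2 / e
    return chi2


print(pearson_chi2([[20, 10], [10, 20]]))  # about 6.667
```

Passing `expected_allele_table(...)` for a given design to `pearson_chi2` gives its expected NCP.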
Threshold selection
Genotype         AA     Aa     aa
Frequency        q²     2pq    p²
Trait mean       −a     d      a
Trait variance   σ²     σ²     σ²
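$P(X) = \sum_G P(X \mid G)\,P(G)$
[Figure: trait distribution P(X) as a mixture of the AA, Aa and aa genotype distributions]
$P(G \mid X < T) = \dfrac{P(X < T \mid G)\,P(G)}{P(X < T)}$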
[Figure: the same mixture with a threshold T marked on the X axis]
N.B. The cumulative standard normal distribution gives the area under the curve, P(X < T).
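The threshold calculation can be sketched with Python's standard-library `statistics.NormalDist`, which supplies the cumulative normal P(X < T | G). The function name and the example values (equal allele frequencies, a = 0.5, d = 0, σ = 1, T = 0) are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist


def genotype_freqs_selected(T, freqs, means, sd, above=False):
    """P(G | X < T), or P(G | X > T) if above=True, for a trait that is a
    mixture of normal distributions (one per genotype, common sd)."""
    tail = []
    for f, mu in zip(freqs, means):
        below = NormalDist(mu, sd).cdf(T)  # P(X < T | G)
        tail.append(f * ((1 - below) if above else below))
    total = sum(tail)  # P(X < T), or P(X > T)
    return [t / total for t in tail]


freqs = [0.25, 0.5, 0.25]  # AA, Aa, aa (equal allele frequencies)
means = [-0.5, 0.0, 0.5]   # -a, d, a with a = 0.5, d = 0
low = genotype_freqs_selected(0.0, freqs, means, 1.0)               # X < T
high = genotype_freqs_selected(0.0, freqs, means, 1.0, above=True)  # X > T
print(low, high)  # AA is enriched below the threshold, aa above it
```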
Incomplete LD
Effect of incomplete LD between QTL and marker
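         A              a
M    p·m1 + δ       q·m1 − δ
m    p·m2 − δ       q·m2 + δ
(haplotype frequencies; p, q: QTL allele frequencies; m1, m2: marker allele frequencies)
$\delta = D' \times D_{\max}, \quad D_{\max} = \min\{p\,m_2,\ q\,m_1\}$
Note that linkage disequilibrium depends on both D′ and the QTL and marker allele frequencies.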
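A sketch of the haplotype frequency table under this parameterisation (δ ≥ 0, so A is coupled with M). The function name is hypothetical; the example uses p = 0.05 and m1 = 0.05, as for marker 1 in the exercise later in the session.

```python
def haplotype_freqs(p, m1, d_prime):
    """Haplotype frequencies for QTL alleles A/a (freq p, q) and marker alleles
    M/m (freq m1, m2), with delta = D' * D_max and D_max = min(p*m2, q*m1)."""
    q, m2 = 1 - p, 1 - m1
    delta = d_prime * min(p * m2, q * m1)
    return {
        "AM": p * m1 + delta,
        "aM": q * m1 - delta,
        "Am": p * m2 - delta,
        "am": q * m2 + delta,
    }


print(haplotype_freqs(0.05, 0.05, 0.6))
```

With D′ = 1 and m1 = p, the aM haplotype vanishes, so M tags A perfectly; with D′ = 0 every haplotype frequency is the product of its allele frequencies.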
Incomplete LD
Consider genotypic risks at marker:
$P(D \mid MM) = \left[(pm_1+\delta)^2 P(D \mid AA) + 2(pm_1+\delta)(qm_1-\delta)\,P(D \mid Aa) + (qm_1-\delta)^2 P(D \mid aa)\right] / m_1^2$
Calculation proceeds as before, but at the marker
Haplotypes          Genotype
AM/AM               AA MM
AM/aM or aM/AM      Aa MM
aM/aM               aa MM
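P(D|MM) can be computed directly from the marker-risk formula. In this minimal sketch (hypothetical function name), setting δ = 0 recovers the population prevalence K, and complete LD with m1 = p recovers P(D|AA), which makes a quick sanity check. The example uses the exercise's QTL settings (K = 0.1, p = 0.05, R_AA = R_Aa = 2).

```python
def marker_risk_MM(p, m1, delta, g_AA, g_Aa, g_aa):
    """P(D | MM): QTL-genotype risks weighted by the two M-bearing haplotypes,
    AM (freq p*m1 + delta) and aM (freq q*m1 - delta)."""
    q = 1 - p
    f_AM = p * m1 + delta
    f_aM = q * m1 - delta
    return (f_AM**2 * g_AA + 2 * f_AM * f_aM * g_Aa + f_aM**2 * g_aa) / m1**2


p, K, R = 0.05, 0.1, 2.0
g_aa = K / (p**2 * R + 2 * p * (1 - p) * R + (1 - p) ** 2)
g_Aa = g_AA = R * g_aa
print(marker_risk_MM(p, 0.05, 0.0, g_AA, g_Aa, g_aa))     # no LD: approximately K
print(marker_risk_MM(p, 0.05, 0.0475, g_AA, g_Aa, g_aa))  # complete LD: approximately g_AA
```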
Discrete TDT calculation
1. Calculate probability of parental mating type
given affected offspring
2. Calculate probability of offspring genotype given
parental mating type and affected
3. Calculate overall probability of heterozygous
parents transmitting allele A as opposed to a
4. Calculate TDT test statistic, power
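The four steps can be sketched by brute-force enumeration of parental genotypes and transmitted alleles. This is a minimal illustration that assumes Hardy-Weinberg parents, one affected offspring per trio and complete LD between marker and disease locus; the NCP is the TDT statistic (b − c)²/(b + c) evaluated at the expected transmission counts, which may differ in detail from GPC's implementation. Function names are hypothetical.

```python
from itertools import product
from statistics import NormalDist


def expected_transmissions(p, g_AA, g_Aa, g_aa):
    """Expected A and a transmissions from heterozygous parents per trio with
    one affected offspring (Hardy-Weinberg parents assumed)."""
    freq = {"A": p, "a": 1 - p}
    risk = {2: g_AA, 1: g_Aa, 0: g_aa}  # P(D | number of A alleles)
    total = trans_A = trans_a = 0.0
    # Steps 1-2: enumerate ordered parental genotypes and transmitted alleles
    for par1 in product("Aa", repeat=2):
        for par2 in product("Aa", repeat=2):
            w_mating = 1.0
            for allele in par1 + par2:
                w_mating *= freq[allele]
            for t1 in par1:  # each parental allele is transmitted w.p. 1/2
                for t2 in par2:
                    w = w_mating * 0.25 * risk[(t1 + t2).count("A")]
                    total += w  # sums to P(D) = K over all configurations
                    # Step 3: tally transmissions by heterozygous parents
                    for par, t in ((par1, t1), (par2, t2)):
                        if par[0] != par[1]:
                            if t == "A":
                                trans_A += w
                            else:
                                trans_a += w
    return trans_A / total, trans_a / total


def tdt_power(n_trios, p, g_AA, g_Aa, g_aa, alpha=0.05):
    """Step 4: NCP of the 1-df TDT statistic and its power."""
    e_A, e_a = expected_transmissions(p, g_AA, g_Aa, g_aa)
    ncp = n_trios * (e_A - e_a) ** 2 / (e_A + e_a)
    z = NormalDist()
    crit, mu = z.inv_cdf(1 - alpha / 2), ncp ** 0.5
    return ncp, (1 - z.cdf(crit - mu)) + z.cdf(-crit - mu)


print(tdt_power(100, 0.1, 0.04, 0.02, 0.01))
```

Under a multiplicative model (R_AA = R_Aa²), heterozygous parents transmit A with probability R_Aa/(1 + R_Aa), e.g. 2/3 for R_Aa = 2, which the enumeration reproduces.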
Fulker association model
The genotypic score A_i (1, 0, −1) for sibling i in a sibship of size s is decomposed into between and within components:
$A_i = B + W_i$
$B = \frac{1}{s}\sum_{j=1}^{s} A_j$   (sibship genotypic mean)
$W_i = A_i - \frac{1}{s}\sum_{j=1}^{s} A_j$   (deviation from sibship genotypic mean)
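For a single sibship the decomposition is just a mean and a set of deviations. A minimal sketch (hypothetical function name):

```python
def between_within(scores):
    """Split the genotypic scores (1, 0, -1) of one sibship into the between
    component B (sibship mean) and the within components W_i = A_i - B."""
    s = len(scores)
    b = sum(scores) / s
    return b, [a - b for a in scores]


b, w = between_within([1, 0, -1, 1])
print(b, w)  # 0.25 [0.75, -0.25, -1.25, 0.75]
```

The within components always sum to zero, so a singleton (s = 1) carries no within-sibship information.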
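NCPs of B and W tests
Approximation for between test:
$\lambda_B \approx \dfrac{N\left(\frac{s+1}{2}V_A + \frac{s+3}{4}V_D\right)}{sV_S + V_N}$
Approximation for within test:
$\lambda_W \approx \dfrac{N(s-1)\left(\frac{1}{2}V_A + \frac{3}{4}V_D\right)}{V_N}$
where N is the number of sibships, s the sibship size, V_A and V_D the additive and dominance QTL variances, and V_S and V_N the residual variances shared and not shared by siblings.
Sham et al. (2000) AJHG 66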
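Assuming the approximations λ_B ≈ N[(s + 1)V_A/2 + (s + 3)V_D/4]/(sV_S + V_N) and λ_W ≈ N(s − 1)(V_A/2 + 3V_D/4)/V_N for N sibships of size s, the trade-off between sibship designs can be explored numerically. The variance values below (V_A = 0.1, V_D = 0, V_S = 0.4, V_N = 0.5) are hypothetical; the sibship types match the 1200-individual comparison later in the session.

```python
def vc_association_ncps(n_sibships, s, V_A, V_D, V_S, V_N):
    """Approximate NCPs of the between (B) and within (W) association tests
    for n_sibships sibships of size s."""
    lam_b = n_sibships * ((s + 1) / 2 * V_A + (s + 3) / 4 * V_D) / (s * V_S + V_N)
    lam_w = n_sibships * (s - 1) * (V_A / 2 + 3 * V_D / 4) / V_N
    return lam_b, lam_w


# 1200 individuals as singletons, pairs, trios or quads (hypothetical variances)
for s, n in ((1, 1200), (2, 600), (3, 400), (4, 300)):
    lam_b, lam_w = vc_association_ncps(n, s, V_A=0.1, V_D=0.0, V_S=0.4, V_N=0.5)
    print(s, round(lam_b, 1), round(lam_w, 1))
```

With these values the between NCP falls and the within NCP rises as the same individuals are grouped into larger sibships.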
Practical Exercise
Calculation of power for a simple case-control study.
DATA: frequency of risk factor in 30 cases and 30 controls
TEST: 2-by-2 contingency table, chi-squared (1 degree of freedom)
Step 1 : determine expected chi-squared
Hypothetical risk factor frequencies
Case Control
A allele present 20 10
A allele absent 10 20
Chi-squared statistic: $\chi^2 = \sum \dfrac{(O - E)^2}{E} = 6.667$
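Worked by hand: every cell has expected count 30 × 30 / 60 = 15 (row total × column total / grand total), and every observed count differs from it by 5, so

```latex
\chi^2 = \sum \frac{(O - E)^2}{E} = 4 \times \frac{(20 - 15)^2}{15} = \frac{100}{15} \approx 6.667
```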
[Figure: central χ² distribution P(T) under the null, with the critical value marked]
Step 2: determine the critical value for a given type I error rate (inverse central chi-squared distribution)
[Figure: non-central χ² distribution P(T), with the critical value marked]
Step 3: determine the power for a given critical value and non-centrality parameter (non-central chi-squared distribution)
Calculating Power
1. Calculate the critical value (inverse central χ²): inputs α and NCP = 0 (under the null)
2. Calculate power (non-central χ²): inputs critical value and expected NCP
http://workshop.colorado.edu/~pshaun/gpc/pdf.html
df = 1, NCP = 0
α        Critical value
0.05     3.84146
0.01     6.63489
0.001    10.82754
Determining power
df = 1, NCP = 6.667
α        Critical value   Power
0.05     3.84146          0.73
0.01     6.6349           0.50
0.001    10.827           0.24
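For 1 df both steps reduce to the normal distribution, because a 1-df non-central chi-squared variable is (Z + √NCP)² with Z standard normal. The sketch below uses only Python's standard library and agrees closely with the tables above; for other degrees of freedom a general routine such as SciPy's `scipy.stats.chi2.ppf` and `scipy.stats.ncx2.sf` would be needed. Function names are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def critical_value_1df(alpha):
    """Step 1: inverse central chi-squared (1 df) = squared two-sided normal quantile."""
    return Z.inv_cdf(1 - alpha / 2) ** 2


def power_1df(crit, ncp):
    """Step 2: P(T > crit) for T ~ non-central chi-squared(1 df, ncp),
    using T = (Z + sqrt(ncp))^2."""
    z, mu = crit ** 0.5, ncp ** 0.5
    return (1 - Z.cdf(z - mu)) + Z.cdf(-z - mu)


ncp = 100 / 15  # expected chi-squared from the hypothetical table
for alpha in (0.05, 0.01, 0.001):
    crit = critical_value_1df(alpha)
    print(f"{alpha:<6} {crit:.5f} {power_1df(crit, ncp):.2f}")
```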
1. Planning a study
Candidate gene study
A disease occurs in 2% of the population
Assume multiplicative model
genotype risk ratio Aa = 2
genotype risk ratio AA = 4
100 cases, 100 controls
What if the risk allele is rare vs common?
2. Interpreting a negative result
Negative candidate gene TDT study,
82 affected offspring trios
“affection” = scoring >2 SD above mean
candidate gene SNP allele frequency 0.25
Desired 80% power, 5% type I error rate
What is the minimum detectable QTL variance
(assume additivity)?
Planning a study
p N cases (=N controls)
0.01 1144
0.05 247
0.2 83
0.5 66
0.8 126
0.95 465
0.99 2286
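The slide does not state the power or type I error rate behind this table. Assuming 80% power, α = 0.05 and a 1-df allelic test with equal numbers of cases and controls, the sketch below gives about 83 cases at p = 0.2 and 66 at p = 0.5, in line with the table. It uses the normal approximation NCP ≈ (z_{1−α/2} + z_{power})², the expected NCP N(p_case − p_ctrl)²/[p̄(1 − p̄)] for N cases and N controls, and the hypothetical function name `cases_for_power`.

```python
from statistics import NormalDist


def cases_for_power(p, K, R_AA, R_Aa, alpha=0.05, power=0.80):
    """Approximate number of cases (= number of controls) for a 1-df allelic
    case-control test to reach the given power."""
    q = 1 - p
    g_aa = K / (p * p * R_AA + 2 * p * q * R_Aa + q * q)
    risk = (R_AA * g_aa, R_Aa * g_aa, g_aa)
    hwe = (p * p, 2 * p * q, q * q)
    case = [f * g for f, g in zip(hwe, risk)]
    ctrl = [f * (1 - g) for f, g in zip(hwe, risk)]
    # A-allele frequency among case and control chromosomes
    p_case = (case[0] + case[1] / 2) / sum(case)
    p_ctrl = (ctrl[0] + ctrl[1] / 2) / sum(ctrl)
    p_bar = (p_case + p_ctrl) / 2  # pooled frequency (equal group sizes)
    ncp_per_case = (p_case - p_ctrl) ** 2 / (p_bar * (1 - p_bar))
    z = NormalDist()
    ncp_needed = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return ncp_needed / ncp_per_case


for p in (0.01, 0.05, 0.2, 0.5, 0.8, 0.95, 0.99):
    print(p, round(cases_for_power(p, 0.02, 4, 2)))
```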
Interpreting a negative result
QTL variance   Power
0.00 0.05
0.01 0.34
0.02 0.60
0.03 0.78
0.04 0.88
0.05 0.94
Exploring power of association using GPC
Linkage versus association
difference in required sample sizes for specific QTL size
TDT versus case-control
difference in efficiency?
Quantitative versus binary traits
loss of power from artificial dichotomisation?
Linkage versus association
[Figure: three panels against QTL effect (0% to 25% of trait variance), each comparing linkage and association: log(N for 90% power), expected LRT, and power]
QTL linkage: 500 sib pairs, r = 0.5; QTL association: 1000 individuals
Case-control versus TDT
[Figure: two panels against allele frequency (0 to 0.25), comparing case-control (K = 0.1), case-control (K = 0.01) and TDT: N units for 90% power, and N individuals for 90% power]
p = 0.1; RAA = RAa = 2
Quantitative versus discrete
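[Figure: a quantitative trait dichotomised at thresholds giving K = 0.5, 0.2 and 0.05]
To investigate: use threshold-based association
Fixed QTL effect (additive, 5%, p = 0.5), 500 individuals
For prevalence K, group 1 (above the threshold) has N = 500K and group 2 (below it) has N = 500(1 − K), both with threshold $T = \Phi^{-1}(1 - K)$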
Quantitative versus discrete
K T (SD)
0.01 2.326
0.05 1.645
0.10 1.282
0.20 0.842
0.25 0.674
0.50 0.000
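Each threshold in the table is the standard normal quantile T = Φ⁻¹(1 − K). A minimal sketch (hypothetical function name) that also returns the group sizes for the 500-individual design; its T column matches the table above to three decimals.

```python
from statistics import NormalDist


def dichotomise(K, n_total=500):
    """Threshold T (SD units) with a proportion K of a standard normal trait
    above it, and the resulting sizes of the high and low groups."""
    T = NormalDist().inv_cdf(1 - K)
    return T, n_total * K, n_total * (1 - K)


for K in (0.01, 0.05, 0.10, 0.20, 0.25, 0.50):
    T, n_high, n_low = dichotomise(K)
    print(f"{K:.2f}  {T:.3f}  {n_high:.0f}  {n_low:.0f}")
```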
[Figure: allele frequency P(A|case) and P(A|control) against K (0.01 to 0.5)]
Quantitative versus discrete
[Figure: expected LRT against K (0 to 0.5) for the variance-components (VC) and case-control (CC) analyses]
Incomplete LD
what is the impact of D’ values less than 1?
does allele frequency affect the power of the test?
(using discrete case-control calculator)
Family-based VC association: between and within tests
what is the impact of sibship size? sibling correlation?
(using QTL VC association calculator)
Incomplete LD
Case-control for discrete traits
Disease: K = 0.1
QTL: RAA = RAa = 2, p = 0.05
Marker 1: m = 0.05, D′ = {1, 0.8, 0.6, 0.4, 0.2, 0}
Marker 2: m = 0.25, D′ = {1, 0.8, 0.6, 0.4, 0.2, 0}
Sample: 250 cases, 250 controls
Incomplete LD
Genotypic risk at marker 1 (left) and marker 2 (right) as a function of D′
[Figure: genotypic risks gAA, gAa and gaa (0.06 to 0.20) against D′ (0 to 1), one panel per marker]
Incomplete LD
Expected likelihood ratio test as a function of D′
[Figure: expected LRT (0 to 10) against D′ (0 to 1) for marker 1 and marker 2]
Family-based association
Sibship type
1200 individuals, 600 pairs, 400 trios, 300 quads
Sibling correlation
r = 0.2, 0.5, 0.8
QTL (diallelic, equal allele frequency)
2%, 10% of trait variance
Family-based association
NCP proportional to variance explained
Between test
↓ with ↑ sibship size and ↑ sibling correlation
Within test
0 for s=1, ↑ with ↑ sibship size and ↑ sibling correlation
[Figure: between-sibship, within-sibship and total association]
GPC
Usual URL for GPC
http://statgen.iop.kcl.ac.uk/gpc/
Purcell S, Cherny SS, Sham PC. (2003) Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics, 19(1):149-50