EPI293: Design and analysis of gene association studies
Winter Term 2008
Lecture 7: Genome-wide association scans
Peter Kraft
pkraft@hsph.harvard.edu, Bldg 2 Rm 206
2-4271
(Figure: timeline with axis marks at 1900, 1920, 1940, 1960, 1980, 1990, 2000, 2005, 2006, 2007.) Events shown:
• Rediscovery of Mendel’s laws
• Principles of linkage analysis discovered
• Association between blood groups and malignant disease published
• Association between blood groups and malignant disease fails to replicate
• RFLPs available for linkage analysis
• Microsatellite maps for genome-wide linkage analysis developed
• Human Genome Project launched
• Risch and Merikangas paper
• Human Genome Project working draft completed; beginnings of SNP map
• HapMap launched
• First genome-wide association study
• HapMap Phase I completed (draft Phase II available)
• Genome-wide SNP panels developed
5 December 2007 pkraft@hsph.harvard.edu
(Figure: sixteen subjects labeled by genotype, eight cases and eight controls.) Genotype counts:

| | GG | Gg | gg |
|---|---|---|---|
| Case | 4 | 3 | 1 |
| Control | 1 | 3 | 4 |
Linkage vs. Association
• Linkage studies
– Pros: can scan the genome with fewer markers
– Cons: can only detect alleles with large effect; limited resolution (identifies a broad region, not individual genes); requires data on multiple family members
• Association studies
– Pros: can detect subtle effects; very fine resolution
– Cons: requires 0.5 to 1 million markers to cover the whole genome; requires a large sample size
Risch and Merikangas (1996) Science 273:1516-7
Schlötterer C. Nat Rev Genet. 2004;5:63-9.
Published Genome-Wide Association Scans
• Ozaki K. Myocardial infarction. Nat Genet 2002;32:650–4.
• Klein RJ. Age-related macular degeneration. Science 2005;308:385–9.
• Maraganore DM. Parkinson disease. Am J Hum Genet 2005;77:685–93.
• Shiffman D. Myocardial infarction. Am J Hum Genet 2005;77:596–605.
• Cheung VG. Gene expression. Nature 2005;437:1365-9.
• Stranger BE. Gene expression. PLoS Genet 2005;1:695-704.
• Mah S. Schizophrenia. Mol Psychiatry 2006;11:471-8.
• Herbert A. Obesity. Science 2006;312:279-83.
Reviews
• Hirschhorn J. Nat Reviews Genet 2005;6:95-108.
• Wang WY. Nat Reviews Genet 2005;6:109-18.
• Thomas DC. Am J Hum Genet 2005;77:337-45.
• Thomas DC. Cancer Epidemiol Biomarkers Prev 2006;15:595-8.
• Evans DM. Trends in Genetics 2006 (epub)
OLD SLIDE!!!!
Klein RJ. Science 2005;308:385–9
• 96 cases, 50 controls
• 103,611 SNPs
• rs380390: recessive OR 7.4 (2.9-19)
• PAR (70%)
• Genotyping errors
• Functionality
• Replication: Science 2005;308:421–4; Science 2005;308:419–21
| | Tier 1 | Tier 2 |
|---|---|---|
| Subjects | 443 sib pairs | 332 matched unrelated case-control pairs |
| Markers | 198,000 SNPs | 3,148 SNPs |

No SNPs pass the Bonferroni-corrected significance threshold (2.5×10⁻⁷).
Maraganore. Am J Hum Genet 2005;77:685-93
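The threshold quoted for this scan is consistent with a Bonferroni correction over the Tier 1 markers. A minimal sketch of the arithmetic, assuming a family-wise α of 0.05 (the slide does not state it) and one test per SNP; the function name is illustrative only:

```python
def bonferroni_threshold(family_wise_alpha: float, n_tests: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below family_wise_alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_wise_alpha / n_tests


# 198,000 Tier 1 SNPs; alpha = 0.05 is an assumption, not taken from the slide.
tier1_threshold = bonferroni_threshold(0.05, 198_000)
print(f"Per-SNP threshold: {tier1_threshold:.2e}")
```

The same function gives the per-test level for any other panel size, which is why denser panels need smaller p-values to reach genome-wide significance.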
Known Breast Cancer Genes, November 2006
Known Prostate Cancer Genes, November 2006
Known Breast Cancer Genes, Fall 2007
Known Prostate Cancer Genes, Fall 2007
Kraft and Cox 2008 in: Rao and Gu, eds.
Outline
• Power issues
– Tagging efficiency of genome-wide panels
– Multi-stage design and analysis
• Design issues
• Analytic issues
– Imputation
• CGEMS examples
(Figure: known and unknown variants, with the r² between them.)
Barrett JC. Nat Genet 2006;38:659-62. Pe’er I. Nat Genet 2006;38:663-7.
International HapMap Consortium. Nature. 2007 Oct 18;449(7164):851-61
Distribution of max r² with tag panel as a function of MAF
Tags chosen from a “pseudo Phase II HapMap” and evaluated against ENCODE SNPs
(Figure: vertical axis 0.00 to 1.00; MAF categories < 5%, 5-12.5%, 12.5-25%, 25-37.5%, 37.5-50%; max r² bins 0, .01-.30, .32-.60, .61-.80, .81-.90, .90-1.00.)
The fundamental theorem of the HapMap
A study that genotypes N cases and N controls at a marker that has a correlation of r² with a disease susceptibility locus has the same power as a study that genotypes N* = r²N cases and N* controls at the disease susceptibility locus itself.
Power adjusting for tagging efficiency
$\mathrm{Power} = \sum_{r^2} \mathrm{Pow}(r^2 N)\, f(r^2)$, where $f(r^2)$ is the distribution of the max r² with the tag panel
Pritchard JK. Am J Hum Genet 2001;69:1-14. Jorgenson. Am J Hum Genet 2006;78:884-8.
Terwilliger JD. Eur J Hum Genet 2006;14:426-37.
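The effective-sample-size rule and the tagging-adjusted power can be sketched numerically. This is an illustration under stated assumptions, not the calculation behind the lecture's figures: power is approximated with a two-sided normal test comparing allele frequencies, the odds ratio is treated as an allelic odds ratio, and the function names and the example r² distribution are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_power(n_cases, maf, odds_ratio, alpha):
    """Approximate power of a two-sided allele-frequency test with
    n_cases cases and n_cases controls (normal approximation)."""
    if n_cases <= 0:
        return alpha  # no data: rejection probability equals the test level
    p_ctrl = maf
    odds_case = odds_ratio * p_ctrl / (1 - p_ctrl)
    p_case = odds_case / (1 + odds_case)
    # Each subject contributes two alleles to its group.
    se = ((p_case * (1 - p_case) + p_ctrl * (1 - p_ctrl)) / (2 * n_cases)) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(p_case - p_ctrl) / se
    # Sum both rejection tails of the shifted test statistic.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def indirect_power(n_cases, r2, maf, odds_ratio, alpha):
    """Power at a marker with correlation r2 to the causal locus,
    treated as a direct study of r2 * n_cases cases and controls."""
    return allelic_power(r2 * n_cases, maf, odds_ratio, alpha)


def averaged_power(n_cases, r2_distribution, maf, odds_ratio, alpha):
    """Average indirect power over a discrete distribution {r2: probability}."""
    return sum(prob * indirect_power(n_cases, r2, maf, odds_ratio, alpha)
               for r2, prob in r2_distribution.items())


# Hypothetical max-r2 distribution, for illustration only (not from the slides).
example_r2 = {1.0: 0.50, 0.8: 0.30, 0.5: 0.15, 0.0: 0.05}

direct = allelic_power(5000, 0.10, 1.3, 1e-7)
fixed = indirect_power(5000, 0.8, 0.10, 1.3, 1e-7)
averaged = averaged_power(5000, example_r2, 0.10, 1.3, 1e-7)
print(f"direct={direct:.3f}  r2=0.8: {fixed:.3f}  averaged: {averaged:.3f}")
```

Averaging over the whole r² distribution matters because a panel's untagged SNPs (r² near 0) contribute almost no power, which a single fixed r² of 80% hides.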
(Figure: power versus sample size (cases) in a grid of nine panels, columns OR=1.3, OR=1.5, OR=1.8 and rows MAF=.01, MAF=.05, MAF=.10; curves for direct, indirect (averaged over r²), and indirect (r² fixed at 80%).)
(Figure: multi-stage design schematic with axes SNPs and subjects; stage test statistics T1, T2, T3.)
Power of multi-stage designs
Replication analysis
$\mathrm{Power} = \Pr(T_1 > k_1, \ldots, T_S > k_S) = \Pr(T_1 > k_1) \cdots \Pr(T_S > k_S)$
$k_s = \mathrm{Quantile}(1 - m_{s+1}/m_s)$
Joint analysis
$\mathrm{Power} = \Pr(T_1^* > k_1^*, \ldots, T_S^* > k_S^*)$
$T_s^* = \sum_{i=1}^{s} T_i$
$k_s^*$ chosen s.t. the expected number of markers (under the null) taken to the $(s+1)$st stage is $m_{s+1}$
$m_{S+1}$ is the number of expected false leads (under the null) at the end of the $S$th stage (e.g. $m_{S+1} = .05$ is strong control of FWER at α=.05)
Skol. Nat Genet 2006;38:209-13; Wang. Genet Epidemiol 2006;30:356-68; Kraft (in prep)
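A minimal sketch of the replication-analysis calculation, under assumptions that are not on the slide: each stage statistic is treated as an independent one-sided z statistic that is standard normal under the null and has mean `stage_means[s]` under the alternative. The function names and the example means are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def stage_critical_values(marker_counts):
    """Return k_1..k_S for marker_counts = [m_1, ..., m_S, m_{S+1}].

    k_s is the standard normal quantile at 1 - m_{s+1}/m_s, so that on
    average m_{s+1} null markers pass stage s.
    """
    return [STD_NORMAL.inv_cdf(1 - m_next / m_now)
            for m_now, m_next in zip(marker_counts, marker_counts[1:])]


def replication_power(marker_counts, stage_means):
    """Product over stages of Pr(T_s > k_s), with T_s ~ Normal(mean_s, 1)."""
    power = 1.0
    for k_s, mean_s in zip(stage_critical_values(marker_counts), stage_means):
        power *= 1 - STD_NORMAL.cdf(k_s - mean_s)
    return power


# Two stages: 500,000 markers, then 1,500, with 0.05 expected false leads at the end.
marker_counts = [500_000, 1_500, 0.05]
critical_values = stage_critical_values(marker_counts)
power = replication_power(marker_counts, [4.0, 6.0])      # hypothetical stage means
null_rate = replication_power(marker_counts, [0.0, 0.0])  # per-marker false-lead rate
print(critical_values, power, null_rate)
```

With all means set to zero the product collapses to m_{S+1}/m_1, the per-marker false-lead rate, which is one way to see that the staged thresholds jointly act like a single Bonferroni-type level.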
Multistage Design and Analysis
• It is (or should be) well known that “replication analysis” is statistically inefficient [cf. Thomas DC et al (1985) AJE; Skol (2006) Nat Genet]
• Usually you can find a multistage design that has almost the same power as a single-stage design but is much cheaper
• Multi-stage design is NOT a way of finessing the multiple testing issue. If genotypes were free, you would genotype everybody for every SNP and test all SNPs at a very, very small alpha level.
• Multi-stage design IS a way of saving big $s, ₤s, €s, etc.
Amount of savings and cheapest design depend on prices—which are very fluid!
Calculating power for “replication analysis”

| Stage | Number of markers | Number of subjects | Effective level | Power |
|---|---|---|---|---|
| 1 | $M_1$ | $N_1$ | $\alpha_1 = M_2/M_1$ | $P_1 = 1-\beta(q, \mathrm{OR}, r, N_1, \alpha_1)$ |
| 2 | $M_2$ | $N_2$ | $\alpha_2 = M_3/M_2$ | $P_2 = 1-\beta(q, \mathrm{OR}, r, N_2, \alpha_2)$ |
| … | | | | |
| k | $M_k$ | $N_k$ | $\alpha_k = M_{k+1}/M_k$ | $P_k = 1-\beta(q, \mathrm{OR}, r, N_k, \alpha_k)$ |
| Overall | | | | $\prod_{i=1}^{k} P_i$ |

$M_{k+1}$ is the “number of significant tests expected under the null”
E.g. $M_{k+1} = .05$ is the Bonferroni-corrected threshold for $M_1$ tests
Calculating power for “replication analysis”
q=10%; dominant OR=1.4; M4=5

| Stage | Number of markers | Number of subjects | Effective level | Power |
|---|---|---|---|---|
| 1 | 500,000 | 2,400 (1:1 case:control) | α1=.003 | .883 |
| 2 | 1,500 | 6,000 | α2=.003 | .999 |
| Overall | | | | .882 |

Cost: ca. USD 700 × 2,400 + USD 60 × 6,000 = USD 2.04 million
Calculating power for “replication analysis”
q=10%; dominant OR=1.4; M4=5

| Stage | Number of markers | Number of subjects | Effective level | Power |
|---|---|---|---|---|
| 1 | 500,000 | 2,400 (1:1 case:control) | α1=.04 | .999 |
| 2 | 20,000 | 3,000 | α2=.075 | .998 |
| 3 | 1,500 | 3,000 | α3=.003 | .950 |
| Overall | | | | .946 |

Cost: ca. USD 700 × 2,400 + USD 200 × 3,000 + USD 60 × 3,000 = USD 2.46 million
Two-stage study with equivalent power costs > 2.8 million
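The arithmetic in the two worked designs can be checked with a short script. This is a sketch that takes the stage powers as given rather than recomputing them. It reads the cost lines as per-subject genotyping costs of USD 700, 200 and 60 for the 500,000-, 20,000- and 1,500-marker stages, and it assumes about 5 markers are expected to pass the final stage under the null. The helper names are illustrative.

```python
def effective_levels(marker_counts):
    """alpha_k = M_{k+1} / M_k for marker_counts = [M_1, ..., M_{k+1}]."""
    return [nxt / cur for cur, nxt in zip(marker_counts, marker_counts[1:])]


def overall_power(stage_powers):
    """Replication analysis: overall power is the product of stage powers."""
    product = 1.0
    for p in stage_powers:
        product *= p
    return product


def total_cost(cost_per_subject, subjects_per_stage):
    """Sum over stages of (per-subject cost) x (subjects genotyped)."""
    return sum(c * n for c, n in zip(cost_per_subject, subjects_per_stage))


# Two-stage design
two_stage_levels = effective_levels([500_000, 1_500, 5])
two_stage_power = overall_power([0.883, 0.999])
two_stage_cost = total_cost([700, 60], [2_400, 6_000])

# Three-stage design
three_stage_levels = effective_levels([500_000, 20_000, 1_500, 5])
three_stage_power = overall_power([0.999, 0.998, 0.950])
three_stage_cost = total_cost([700, 200, 60], [2_400, 3_000, 3_000])

print(two_stage_levels, round(two_stage_power, 3), two_stage_cost)
print(three_stage_levels, round(three_stage_power, 3), three_stage_cost)
```

The product of the rounded three-stage powers comes out slightly above the overall value shown on the slide, which is what one would expect if the slide multiplied unrounded stage powers.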
Three different per-SNP pricing scenarios considered
Prices relative to per-SNP costs for whole-genome platform

| Nsnp | A | B | C |
|---|---|---|---|
| 20000 | 5.9 | 17.7 | 1 |
| 1500 | 35.8 | 107.4 | 1 |
| 20 | 91.7 | 275.1 | 1 |
Pricing scheme A; cost relative to single stage study using 7,000 subjects
Relative costs for studies with 65% power
Power for single stage studies, accounting for tagging efficiency
Power for three stage studies, accounting for tagging efficiency
(Figure: power versus relative cost, one panel each for Illumina 550, Affy 500, Affy 1,000.)
(Simulated) tagging properties of three panels
(Figure: one panel each for Illumina 550, Affy 500, Affy 1,000.)
How to select SNPs for 2nd Stage?
• Rank by increasing p-value
– But recall, the probability of being a false positive depends not only on the p-value, but also on power and prior
• Hence Bayesian alternatives [WTCCC, Wakefield 2007 Am J Hum Genet]
• Quasi-Bayesian FPRP [Wacholder et al 2004; Samani 2007 NEJM]
• Prior-weighted analyses [Roeder 2007 Genet Epidemiol, Lewinger 2007 Genet Epidemiol]
• Pragmatist: meh, no big difference in practice
• What about multiple SNPs in high LD?
– Cull so as to interrogate as many regions as possible (“broad” follow-up), or retain to try to distinguish the causal variant (“deep” follow-up)?
• Can I improve coverage by genotyping more SNPs around “hits”?
– Again: “deep” coverage
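The point that a false positive depends on power and prior as well as the p-value can be made concrete with the false positive report probability (FPRP), FPRP = α(1−π) / (α(1−π) + (1−β)π), where α is the significance level reached, 1−β the power, and π the prior probability of a true association. The numbers below are hypothetical and only illustrate the behaviour.

```python
def fprp(alpha: float, power: float, prior: float) -> float:
    """False positive report probability for a finding significant at level alpha."""
    false_positive_mass = alpha * (1 - prior)  # null markers that reach alpha
    true_positive_mass = power * prior         # true associations that reach alpha
    return false_positive_mass / (false_positive_mass + true_positive_mass)


# Same p-value threshold and power, different priors (hypothetical values).
genome_wide_prior = fprp(alpha=1e-4, power=0.5, prior=1e-4)
candidate_prior = fprp(alpha=1e-4, power=0.5, prior=1e-2)
print(f"{genome_wide_prior:.3f} vs {candidate_prior:.3f}")
```

Ranking by FPRP instead of p-value changes the order only when power or prior differ between SNPs, which fits the pragmatist's remark that the choice often makes little difference in practice.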
“broad” / “deep” defined
(Figure contrasting “broad” follow-up with “deep” follow-up.)
Thought Experiment
• Two kinds of GWAS products
– Tagging: captures HapMap II at r²>80%
– Random: has density of Affy 500k
• Choose additional SNPs in 2nd stage so that you tag the region spanning the “hit” in HapMap II at >95%
• Does this increase your power over simply genotyping the top hit?
(Figure: for the Tagging Panel and the Random Panel, scatter plots of R² in the denser map against R² in the initial map, both 0.0 to 1.0, colored by MAF category, and distributions of max r² by MAF category (MAF < 5%, 5-12.5%, 12.5-25%, 25-37.5%, 37.5-50%); # markers per region annotated as 3.22 X markers and 1.46 X markers.)
(Figure: maximum power for budget cost versus cost, both 0.0 to 1.0, for the Tagging Panel and the Random Panel; two-stage designs with Broad and Deep follow-up compared with the power of a one-stage design; OR=1.3, MAF=.10, 7,000 cases/controls.)
“deep” follow-up “broad” follow-up
Am J Hum Genet 2007
Very small gain in power from fine mapping (“deep” follow-up). Is it worth the opportunity cost? Genotyping a lot of extra markers to “fine map” null loci means you will miss the chance to replicate the true signals that happened to be lower on your list.
Power calculations
http://www.sph.umich.edu/csg/abecasis/CaTS/
http://www.hsph.harvard.edu/faculty/kraft/soft.htm
• Subject selection
• Flexible but simple analysis
– (Multistage design may limit analysis options)
• Sample heterogeneity across stages
• Data QC
• Population stratification
• Bioinformatics
• Data sharing, scientific replication, and validation
The design of genome-wide association studies is an art of the possible.
Analytic issues
• Multiple comparisons
• Phenotypic / Genetic heterogeneity
• Epistasis
• Incorporating external information
• Imputation
(Imputation illustration, panel 1: study samples at the typed SNPs only)
BPC3-1 A T A A C
BPC3-2 A T G A T
BPC3-3 C T A T
BPC3-4 C C C
BPC3-5 A T G A C
BPC3-6 C G A T
BPC3-7 A T A T
BPC3-8 C C A C
BPC3-9 A T A A T
BPC3-10 A T G C
BPC3-11 T A A
BPC3-12 A C G C T
BPC3-13 A T A A C
(Panel 2: HapMap reference haplotypes and study samples)
HapM-1 AACGTTTGAACT CCATTGCAC
HapM-2 AAGGTTTGAACT CTATTGCAT
HapM-3 CAGGTTTGAACT CTATTGCAT
BPC3-1 AACGTTTGAACTACTATTGCAC
BPC3-2 AACGTTTGAACTGCTATTGCAT
BPC3-3 CAGGTTTGAACT CTATTGCAT
BPC3-4 CAGGTTCGAACT CTCTTGCAC
BPC3-5 AATGTTTGAACTGCTATTGCAC
BPC3-6 CATGTTCGAACTGCTATTGCAT
BPC3-7 AATGTTTGAACT CTATTGCAT
BPC3-8 CAGGTTCGAACTACTCTTGCAC
BPC3-9 AATGTTTGAACTACTATTGCAT
BPC3-10 AATGTTTGAACTGCT TTGCAC
BPC3-11 AATGTTTGAACTACTATTGCAC
BPC3-12 AATGTTCGAACTGCTCTTGCAT
BPC3-13 AATGTTTGAACTACTATTGCAC
(Panel 3: study samples after imputation)
BPC3-1 AACGTTTGAACTACTATTGCAC
BPC3-2 AACGTTTGAACTGCTATTGCAT
BPC3-3 CAGGTTTGAACTACTATTGCAT
BPC3-4 CAGGTTCGAACTACTCTTGCAC
BPC3-5 AATGTTTGAACTGCTATTGCAC
BPC3-6 CATGTTCGAACTGCTATTGCAT
BPC3-7 AATGTTTGAACTGCTATTGCAT
BPC3-8 CAGGTTCGAACTACTCTTGCAC
BPC3-9 AATGTTTGAACTACTATTGCAT
BPC3-10 AATGTTTGAACTGCT TTGCAC
BPC3-11 AATGTTTGAACTACTATTGCAC
BPC3-12 AATGTTCGAACTGCTCTTGCAT
BPC3-13 AATGTTTGAACTACTATTGCAC
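The idea behind the illustration, filling in untyped positions by borrowing from a reference panel, can be shown with a deliberately naive toy. This is not how the imputation programs named in the lecture work (those use probabilistic haplotype models); it simply copies missing alleles from the reference haplotype that agrees best at the typed positions. The function name, the `?` missing code, and the short haplotypes are hypothetical.

```python
def impute_from_reference(observed, reference_haplotypes, missing="?"):
    """Fill missing positions in `observed` from the best-matching reference.

    The best match is the reference haplotype that agrees with `observed`
    at the most typed positions (ties go to the first one listed).
    """
    def n_matches(reference):
        return sum(obs == ref
                   for obs, ref in zip(observed, reference)
                   if obs != missing)

    best = max(reference_haplotypes, key=n_matches)
    return "".join(ref if obs == missing else obs
                   for obs, ref in zip(observed, best))


# Hypothetical five-SNP example: positions 2 and 4 were not genotyped.
reference_panel = ["AACGT", "AAGGT", "CAGGC"]
imputed = impute_from_reference("A?C?T", reference_panel)
print(imputed)
```

A realistic method would also report uncertainty for each imputed allele, which is what the accuracy comparisons in the lecture evaluate.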
Accuracy?
Marchini et al. (2007)
Accuracy?
Li et al.
Power Gains?
Marchini et al. (2007)
Implementation
• MACH 1.0 (Li Y et al. submitted)
• IMPUTE (Marchini et al. Nat Genet 2007)
• Bim-Bam (Servin and Stephens, PLoS Genet 2007)

| Reference panel | (Sub) cohorts |
|---|---|
| CEU | ACS, ATBC, EPIC, HPFS, MEC-W, PHS, PLCO-W |
| CHB+JPT | MEC-J |
| Cosmopolitan | MEC-B, MEC-H, MEC-L, PLCO-B |
de Bakker et al. Nat Genet 2007
NCI’s CGEMS project
Parallel GWA scans for breast and prostate cancer susceptibility loci
For prostate: PLCO
• Initial Study: 1200 cases / 1200 controls; ~500,000 Tag SNPs
• Replication Study #1: 3000 cases / 3000 controls; ~15,000 SNPs
• Replication Study #2: 3000 cases / 3000 controls; ~1,500 SNPs
• Replication Study #3: 1200 cases / 1200 controls; Ca. 200 + New ht-SNPs
• Ca. 15-20 Loci
Control Type I error at 5×10⁻⁵
Yeager et al. 2007 Nat Genet
“Fast Track” Partial Replication
Not shown: ca. 100 other “top SNPs” that did not replicate convincingly.
Characterization
Multi-locus modeling provides evidence for independent effects!

| Model | Name | Nparms | -2 log L | p-value | AIC | BIC | BIC Weight |
|---|---|---|---|---|---|---|---|
| 0 | NULL: Intercept only | 1 | 11691.71 | ref | 11693.71 | 11700.75 | 0.000 |
| 1 | SNP1 - Dominant Model | 2 | 11636.95 | 1.36E-13 | 11640.95 | 11655.03 | 0.000 |
| 2 | SNP1 - Recessive Model | 2 | 11653.34 | 5.86E-10 | 11657.34 | 11671.42 | 0.000 |
| 3 | SNP1 - Additive (log odds) Model | 2 | 11622.22 | 7.68E-17 | 11626.22 | 11640.30 | 0.000 |
| 4 | SNP1 - Codominant Model | 3 | 11621.62 | 6.02E-16 | 11627.62 | 11648.74 | 0.000 |
| 5 | SNP2 - Dominant Model | 2 | 11614.80 | 1.79E-18 | 11618.80 | 11632.88 | 0.000 |
| 6 | SNP2 - Recessive Model | 2 | 11674.28 | 2.98E-05 | 11678.28 | 11692.36 | 0.000 |
| 7 | SNP2 - Additive (log odds) Model | 2 | 11610.08 | 1.64E-19 | 11614.08 | 11628.16 | 0.000 |
| 8 | Two additive (log odds) SNPs, additive (log odds) interaction | 3 | 11548.83 | 9.43E-32 | 11554.83 | 11575.95 | 0.747 |
| 9 | Two additive (log odds) SNPs, additive (risk scale) interaction | 3 | 11551.00 | 2.79E-31 | 11557.00 | 11578.12 | 0.253 |
| 10 | Two codominant SNPs, general interaction | 9 | 11541.41 | 1.70E-28 | 11559.41 | 11622.77 | 0.000 |

Say we know two SNPs are associated with risk. The next step is to ask: How? Does each contribute to disease risk (i.e. conditional on the other SNP, does adding a SNP improve model fit)? How do they “interact”?
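The AIC, BIC and BIC-weight columns of the model comparison can be recomputed from the −2 log L and parameter counts. This sketch uses the standard definitions AIC = −2 log L + 2k and BIC = −2 log L + k·ln(n), with weights proportional to exp(−ΔBIC/2). The value ln(n) = 9.04 is not stated on the slide; it is inferred from the null model's BIC minus its −2 log L, so treat it as an assumption.

```python
import math


def aic(neg2_log_lik, n_params):
    """Akaike information criterion."""
    return neg2_log_lik + 2 * n_params


def bic(neg2_log_lik, n_params, log_n):
    """Bayesian information criterion, with log_n = ln(sample size)."""
    return neg2_log_lik + n_params * log_n


def bic_weights(bic_values):
    """Normalized exp(-delta_BIC / 2), with delta taken from the smallest BIC."""
    best = min(bic_values)
    relative = [math.exp(-(value - best) / 2) for value in bic_values]
    total = sum(relative)
    return [r / total for r in relative]


# (-2 log L, number of parameters) for models 0 to 10, as listed on the slide.
models = [
    (11691.71, 1), (11636.95, 2), (11653.34, 2), (11622.22, 2),
    (11621.62, 3), (11614.80, 2), (11674.28, 2), (11610.08, 2),
    (11548.83, 3), (11551.00, 3), (11541.41, 9),
]
LOG_N = 9.04  # inferred from the table (assumption), not given on the slide

aic_values = [aic(dev, k) for dev, k in models]
bic_values = [bic(dev, k, LOG_N) for dev, k in models]
weights = bic_weights(bic_values)
for index, weight in enumerate(weights):
    print(index, round(bic_values[index], 2), round(weight, 3))
```

The nine-parameter model has the smallest −2 log L but pays 9 × ln(n) in penalty, so nearly all of the weight goes to the two three-parameter models.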
![Page 66: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/66.jpg)
[Figure: two panels showing odds ratios (relative to '00'; axis from 0.00 to 2.50) by genotype at locus A (aa, Aa, AA) and locus B (bb, Bb, BB). Panel 1: Additive (log odds) SNPs, additive (log odds) interaction, a.k.a. “Main effects only”. Panel 2: Unrestricted model.]
Although the saturated model (with 8 unrestricted log odds ratio parameters) is “closest to the data,” the BIC suggests it is “too close.” The exceptional pattern for odds across the A locus in the BB stratum is probably just noise (small cells), not “gene-gene interaction.”
![Page 67: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/67.jpg)
Pooled Phase I and II Results
| Region | Pooled Phase I+II p-value | Initial Scan rank | Initial Scan p-value |
|---|---|---|---|
| 8q24 | 3.07E-19 | 116 | 1.12E-04 |
| 8q24 | 6.58E-12 | 300 | 3.92E-04 |
| HNF1B | 9.58E-10 | 384 | 5.21E-04 |
| MSMB | 7.31E-13 | 24,223 | 0.042 |
| 11q13 | 1.76E-09 | 2,439 | 0.004 |
| CTBP2 | 1.70E-07 | 319 | 4.09E-04 |
| JAZF1 | 2.14E-06 | 24,407 | 0.042 |
Thomas et al, in press
![Page 68: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/68.jpg)
Population Attributable Risk (PAR)
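| Locus | Freq. | ORmul | PAR |
|---|---|---|---|
| 8q24-a | 0.10 | 1.43 | 8% |
| 8q24-c | 0.50 | 1.26 | 22% |
| HNF1B | 0.49 | 1.22 | 19% |
| MSMB | 0.40 | 1.22 | 16% |
| 11q13 | 0.48 | 1.23 | 20% |
| CTBP2 | 0.27 | 1.17 | 9% |
| JAZF1 | 0.23 | 1.10 | 14% |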
Joint PAR ~ 60%
Thomas et al, in press
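To show how a per-locus PAR relates to allele frequency and odds ratio, here is a minimal sketch of one common formula. It assumes that Freq. is the risk-allele frequency, that ORmul is a per-allele (multiplicative) odds ratio approximating the relative risk, and that genotypes are in Hardy-Weinberg equilibrium. None of this is confirmed by the slide, and the function name `par_multiplicative` is illustrative. The formula gives values close to several of the tabulated PARs but not all of them, so the slide's figures may rest on different inputs or methods.

```python
def par_multiplicative(freq, or_per_allele):
    """PAR for one biallelic locus under assumed HWE and a per-allele effect.

    Genotype frequencies (1-p)^2, 2p(1-p), p^2 with relative risks 1, r, r^2
    give a mean relative risk of (1 - p + p*r)^2, and PAR = 1 - 1/mean.
    """
    mean_rr = (1 - freq + freq * or_per_allele) ** 2
    return 1 - 1 / mean_rr


# (Freq., ORmul) as listed on the slide; their interpretation is assumed.
loci = {
    "8q24-a": (0.10, 1.43),
    "8q24-c": (0.50, 1.26),
    "HNF1B": (0.49, 1.22),
    "MSMB": (0.40, 1.22),
    "11q13": (0.48, 1.23),
    "CTBP2": (0.27, 1.17),
    "JAZF1": (0.23, 1.10),
}

for name, (freq, or_mul) in loci.items():
    print(f"{name}: PAR ~ {par_multiplicative(freq, or_mul):.0%}")
```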
![Page 69: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/69.jpg)
PARs do not add!
[Figure: diagram of All Cases with regions labeled E, G1, and G2.]
Marginal PAR for exposure E is 100%
Marginal PAR for gene G1 is 100%
Marginal PAR for gene G2 is 20%
A joint PAR of 60% for the top seven loci does not mean there are no other risk loci, nor does it mean modifiable environmental factors do not influence prostate cancer risk.
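A toy calculation makes the point concrete. The sketch below uses an entirely hypothetical population in which disease requires both E and G1, plus either G2 or an invented extra factor U. The counts and the disease rule are made up for illustration and are not from the lecture. PAR is computed as the fraction of cases that would disappear if one factor were removed from everyone.

```python
# Hypothetical toy population: each person is a tuple (E, G1, G2, U).
# U is an unnamed extra factor invented for this illustration.
population = (
    [(1, 1, 1, 0)] * 20     # cases that depend on G2
    + [(1, 1, 0, 1)] * 80   # cases that arise without G2
    + [(1, 0, 1, 1)] * 100  # exposed but lacking G1: no disease
    + [(0, 1, 1, 1)] * 100  # G1 carriers never exposed: no disease
)


def diseased(e, g1, g2, u):
    """Toy rule: disease needs E and G1, plus either G2 or U."""
    return bool(e and g1 and (g2 or u))


def count_cases(people):
    return sum(diseased(*person) for person in people)


def par(factor_index):
    """Fraction of cases prevented if one factor were removed from everyone."""
    baseline = count_cases(population)
    removed = [
        tuple(0 if i == factor_index else value for i, value in enumerate(person))
        for person in population
    ]
    return (baseline - count_cases(removed)) / baseline


par_e, par_g1, par_g2 = par(0), par(1), par(2)
print(f"PAR(E)={par_e:.0%}  PAR(G1)={par_g1:.0%}  PAR(G2)={par_g2:.0%}")
print(f"Sum of marginal PARs = {par_e + par_g1 + par_g2:.0%}")
```

Because every case needs both E and G1, removing either one prevents all cases, so the marginal PARs sum to well over 100%.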
![Page 70: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/70.jpg)
Individual Risk Prediction
Odds ratio comparing 90th percentile to 10th percentile ~ 2.5
Thomas et al, submitted
Based on allele frequencies in controls and a multi-locus model assuming codominant effects at each locus and multiplicative effects across loci
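A rough version of this calculation can be sketched by enumerating all multi-locus genotypes. The slide's model uses codominant effects at each locus, and those estimates are not in these notes. As a stand-in, the sketch below substitutes the per-allele odds ratios and frequencies listed on the PAR slide, and assumes Hardy-Weinberg proportions and independence across loci. These are all assumptions, so the result should only be expected to land in the neighborhood of the quoted value. Function names are illustrative.

```python
import math
from itertools import product

# (risk-allele frequency, per-allele OR) from the PAR slide; using them here
# in place of codominant estimates is an assumption of this sketch.
loci = [(0.10, 1.43), (0.50, 1.26), (0.49, 1.22), (0.40, 1.22),
        (0.48, 1.23), (0.27, 1.17), (0.23, 1.10)]


def genotype_probs(p):
    """Hardy-Weinberg probabilities of carrying 0, 1, or 2 risk alleles."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def score_distribution(loci):
    """All multi-locus genotypes as sorted (log odds score, probability) pairs."""
    dist = []
    for counts in product(range(3), repeat=len(loci)):
        score = sum(c * math.log(or_) for c, (_, or_) in zip(counts, loci))
        prob = math.prod(genotype_probs(p)[c] for c, (p, _) in zip(counts, loci))
        dist.append((score, prob))
    return sorted(dist)


def quantile(dist, q):
    """Smallest score whose cumulative probability reaches q."""
    cum = 0.0
    for score, prob in dist:
        cum += prob
        if cum >= q:
            return score
    return dist[-1][0]


dist = score_distribution(loci)
or_90_vs_10 = math.exp(quantile(dist, 0.90) - quantile(dist, 0.10))
print(f"OR, 90th vs 10th percentile: {or_90_vs_10:.2f}")
```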
![Page 71: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/71.jpg)
Probability that a man in the top decile of risk according to the seven-SNP model develops prostate cancer: 45%
Positive predictive value (PPV) for a screening test that predicts prostate cancer for men whose genetic risk profile exceeds a given threshold. Recall that PPV depends on test sensitivity and specificity AS WELL AS incidence rates (here: age-specific rates from the ACS website).
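The dependence of PPV on incidence follows directly from Bayes' rule. The sketch below is a minimal illustration with hypothetical sensitivity, specificity, and risk values. None of these numbers come from the lecture, and the function name `ppv` is illustrative.

```python
def ppv(sensitivity, specificity, risk):
    """Positive predictive value via Bayes' rule.

    risk is the probability of disease in the screened population
    (e.g. a cumulative incidence over some age range).
    """
    true_pos = sensitivity * risk
    false_pos = (1 - specificity) * (1 - risk)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, held fixed while the baseline risk varies.
for risk in (0.01, 0.05, 0.15):
    print(f"risk={risk:.2f}  PPV={ppv(0.25, 0.90, risk):.3f}")
```

With sensitivity and specificity held fixed, the PPV rises with baseline risk, which is why the same genetic profile is more informative in age groups with higher incidence.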
![Page 72: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/72.jpg)
Novel Risk Loci
• 8q24
– Three independent loci with no known function, associated with risks of prostate and colorectal cancer
• HNF1B (TCF2)
– Prostate cancer risk alleles associated with decreased risk of T2D
• MSMB
– Encodes beta-microseminoprotein, a proposed prostate-cancer biomarker
• CTBP2
– Has anti-apoptotic activity
• JAZF1
– Fused by translocation with SUZ12 in endometrial cancer
![Page 73: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/73.jpg)
Where to from Here?
These results open up new and often unexpected avenues for research (cf. the 8q24 region). They may also point to etiologic pathways as targets for treatment.
Despite large PARs, individually these variants are not good predictors of an individual's risk. But taken together they may (MAY) be useful for prediction, either for screening or prognosis. The performance of any screening panel will need to be evaluated in independent studies, and its ultimate efficacy will depend on its discriminative power and the availability of an intervention proven to reduce risk.
In the next 3-5 years we'll see many more discoveries using the simple, brute-force approach illustrated here. The new challenge will be making sense of it all: characterizing effects in different populations, looking for gene-environment interactions, and developing new treatments and sound, ethical prevention strategies to reduce cancer morbidity and mortality.
![Page 74: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/74.jpg)
Acknowledgements
NCI Core Genotyping Facility
NCI Division of Cancer Epidemiology and Genetics
Harvard School of Public Health
Stephen Chanock, Gilles Thomas
Meredith Yeager, Kevin Jacobs
Bob Hoover, Richard Hayes
Sholom Wacholder, Nilanjan Chatterjee
Kai Yu
David Hunter, Jiali Han
Connie Chen
And all the subjects and support staff from the participating studies!
![Page 75: EPI293 Design and analysis of gene association studies Winter Term 2008 Lecture 7: Genome-wide association scans Peter Kraft pkraft@hsph.harvard.edu Bldg](https://reader036.vdocuments.mx/reader036/viewer/2022062516/56649d545503460f94a305b7/html5/thumbnails/75.jpg)
Further Reading
New England Journal of Medicine, 2 August 2007