Design Considerations in Large-Scale Genetic Association Studies
Michael Boehnke, Andrew Skol, Laura Scott, Cristen Willer, Gonçalo Abecasis, Anne Jackson, and the FUSION Study Investigators
Department of Biostatistics, Center for Statistical Genetics
University of Michigan
Outline
• Assess the utility of HapMap samples for tagSNP selection in a study of type 2 diabetes in Finnish subjects
• Discuss the impact of several design factors on cost and efficiency of genome-wide association (GWA) studies
FUSION Study: Finland-United States Investigation of NIDDM Genetics
National Public Health Institute, Helsinki
USC Keck School of Medicine, Los Angeles
National Human Genome Research Institute, Bethesda
University of Michigan School of Public Health, Ann Arbor
University of North Carolina School of Medicine, Chapel Hill
Chromosome 14 SNP Selection
• Used early HapMap (May 2004) to select tagSNPs in the 18 Mb linkage interval on chr 14
• MAF > .05, Illumina design score > .40
• Each unselected SNP had r² > .8 with ≥ 1 tagSNP
• Added annotation-based SNPs
• Double-tagged large bins, filled large gaps
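The r² > .8 tagging rule above can be illustrated with a greedy selection loop. This is only a sketch under stated assumptions: the slides do not say which tagging algorithm was used, the function name `greedy_tag_snps` is hypothetical, and the pairwise r² matrix is made-up toy data.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Greedily pick tagSNPs so that every SNP is either a tag or has
    r^2 > threshold with at least one tag.

    r2 is a symmetric list of lists of pairwise r^2 values (toy input).
    """
    untagged = set(range(len(r2)))
    tags = []

    def covered_by(i):
        # SNPs still untagged that SNP i would capture, including itself.
        return {j for j in untagged if j == i or r2[i][j] > threshold}

    while untagged:
        # Pick the untagged SNP that captures the most untagged SNPs;
        # ties go to the lowest index.
        best = max(sorted(untagged), key=lambda i: len(covered_by(i)))
        tags.append(best)
        untagged -= covered_by(best)
    return tags


# Hypothetical 4-SNP region: SNP 0 is in strong LD with SNPs 1 and 2,
# and SNP 3 is in weak LD with everything else.
toy_r2 = [
    [1.00, 0.90, 0.85, 0.10],
    [0.90, 1.00, 0.50, 0.10],
    [0.85, 0.50, 1.00, 0.10],
    [0.10, 0.10, 0.10, 1.00],
]
tags = greedy_tag_snps(toy_r2)
print(tags)  # prints [0, 3]
```

In this toy region, two tags capture all four SNPs. Annotation-based additions, double tagging of large bins, and gap filling would be separate steps layered on top of a selection like this.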
Chromosome 14 SNP Selection
| SNP category | Count |
|---|---|
| HapMap SNPs in region (MAF > .05) | 2276 |
| HapMap tagSNPs (r² > .8) | 1132 |
| Annotation-based SNPs | 28 |
| Double-tag SNPs (large bins) | 11 |
| Gap-filling SNPs | 211 |
| Total SNPs attempted | 1382 |
| Total SNPs genotyped | 1230 |
| Total SNPs polymorphic and in HWE | 1192 |
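As a quick arithmetic check on the counts above, the four selection categories should add up to the number of SNPs attempted, and the later rows give the genotyping and quality-control yields. The variable names are illustrative only.

```python
# Counts copied from the slide.
tag_snps = 1132
annotation_snps = 28
double_tag_snps = 11
gap_filling_snps = 211
attempted = 1382
genotyped = 1230
passed_qc = 1192  # polymorphic and in HWE

# The four selection categories account for every attempted SNP.
selected_total = tag_snps + annotation_snps + double_tag_snps + gap_filling_snps
assert selected_total == attempted

genotyping_yield = genotyped / attempted  # fraction of attempted SNPs genotyped
qc_yield = passed_qc / genotyped          # fraction of genotyped SNPs kept
print(f"genotyped: {genotyping_yield:.1%}, passed QC: {qc_yield:.1%}")
```

Roughly 89% of attempted SNPs were genotyped, and roughly 97% of those were polymorphic and in HWE.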
Utility of HapMap for tagSNP Selection for Finnish Subjects
• Question: How comparable are allele frequencies, haplotype frequencies, and r² between HapMap and Finnish data?
• Compared HapMap data and 1448 Finnish samples from FUSION and Finrisk 2002 studies
• Poster 1621, Willer et al., Friday 1:30-3:30 pm
Allele Frequencies: FUSION vs. HapMap
[Figure panels: CEU, YRI, CHB, JPT]
Allele Frequencies: FUSION vs. CEU
7.5% of SNP allele frequencies differ at p < .01
r = .98
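One way to flag a SNP whose allele frequency differs between two samples is a 1-df Pearson chi-square test on allele counts. The slides do not state which test produced the 7.5% figure, so this is a sketch of one plausible approach; the function name and the counts are hypothetical.

```python
import math


def allele_freq_chi2(count_1, chromosomes_1, count_2, chromosomes_2):
    """Pearson chi-square (1 df) comparing one allele's frequency in two samples.

    count_k is the number of copies of the allele among chromosomes_k
    chromosomes in sample k. Assumes the SNP is polymorphic in the pooled data.
    """
    observed = [
        [count_1, chromosomes_1 - count_1],
        [count_2, chromosomes_2 - count_2],
    ]
    row_totals = [chromosomes_1, chromosomes_2]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    total = chromosomes_1 + chromosomes_2

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up counts: allele frequency .10 in one sample versus .30 in the other.
chi2_stat, p_value = allele_freq_chi2(10, 100, 60, 200)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.2g}")
```

With thousands of Finnish chromosomes but a much smaller reference panel, a test like this detects even modest differences, which fits the observation that many significant differences can still be small in magnitude.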
LD r²: FUSION vs. CEU
r = .91
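For reference, the pairwise LD measure r² compared on this slide can be computed from one haplotype frequency and two allele frequencies. The sketch below uses the standard definition; the function name and the example frequencies are illustrative.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a, p_b: frequencies of allele A and allele B (both strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Perfect LD: allele A always travels with allele B.
r2_perfect = ld_r2(p_ab=0.3, p_a=0.3, p_b=0.3)
# Linkage equilibrium: the haplotype frequency is the product of allele frequencies.
r2_independent = ld_r2(p_ab=0.3 * 0.4, p_a=0.3, p_b=0.4)
print(r2_perfect, r2_independent)
```

Computing this for every SNP pair in each sample gives two lists of r² values, and the r reported on the slide is the correlation between those lists.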
Haplotype Frequencies: FUSION vs. CEU
r = .97
Summary: Chromosome 14 SNP Selection
• CEU excellent basis for tagSNP selection in Finns
• Strong correlation between allele frequencies, haplotype frequencies, LD in two samples
• Excess of significant allele and haplotype frequency differences (7% at .01 level), but mostly small
• Nearly all common haplotypes (frequency > .05) in one sample present in both samples
– 579/583 from CEU in FUSION
– 557/563 from FUSION in CEU
Design of Genome-wide Association Studies
• GWA provides unprecedented opportunity to identify genetic variants predisposing to disease
• Enabled by HapMap and falling genotyping costs
• Since we may type 100s-1000s of samples on 100Ks of SNPs, efficient study design critical
• Examine two-stage designs for large-scale genetic studies (see Satagopan, Elston, Thomas)
One- and Two-Stage GWA Designs
[Figure: two panels, each a grid of samples (1, 2, 3, …, N) by SNPs (1, 2, 3, …, M). Panel titles: One-Stage Design; Two-Stage Design. The two-stage panel marks Stage 1 and Stage 2 regions, with labels for the fraction of samples and the fraction of markers.]
[Figure: two-stage design shown as samples-by-SNPs grids with Stage 1 and Stage 2 regions, under two analysis strategies: replication-based analysis and joint analysis. Panel labels: One-Stage Design, Two-Stage Design.]
Joint Analysis is More Powerful than Replication-Based Analysis
Skol et al., Friday 8:45, 180, Hall 3
300,000 markers genotyped on 1000 cases, 1000 controls
Multiplicative model, prevalence 10%, GRR = 1.4
[Figure label: One-stage power]
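The one-stage reference power for a scenario like this can be approximated with a two-sided allelic z-test. This is a sketch under several assumptions that the slide does not state: a risk-allele frequency of 0.3 (hypothetical), Hardy-Weinberg equilibrium in the population, a normal approximation, and a Bonferroni threshold of .05/300,000. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def case_control_allele_freqs(p, grr, prevalence):
    """Risk-allele frequency in cases and in controls under a multiplicative model.

    With penetrances f0, f0*grr, f0*grr**2 and HWE, cases are in HWE with
    allele frequency p*grr / (1 - p + p*grr). Controls are unaffected people,
    so the population frequency p is a prevalence-weighted mix of the two.
    """
    p_case = p * grr / (1.0 - p + p * grr)
    p_control = (p - prevalence * p_case) / (1.0 - prevalence)
    return p_case, p_control


def one_stage_power(p, grr, prevalence, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allelic z-test at significance alpha."""
    p_case, p_control = case_control_allele_freqs(p, grr, prevalence)
    # Each person contributes two chromosomes.
    se = sqrt(p_case * (1.0 - p_case) / (2 * n_cases)
              + p_control * (1.0 - p_control) / (2 * n_controls))
    shift = (p_case - p_control) / se
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    return (1.0 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


# Slide scenario, with an assumed (hypothetical) risk-allele frequency of 0.3.
power = one_stage_power(p=0.3, grr=1.4, prevalence=0.10,
                        n_cases=1000, n_controls=1000,
                        alpha=0.05 / 300_000)
print(f"approximate one-stage power: {power:.2f}")
```

Power in this approximation depends strongly on the assumed allele frequency, so the printed value should be read as one point on a curve, not as the figure shown in the talk.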
Factors that Influence Cost and Efficiency of GWAs
• Fraction of samples typed in Stage 1 (π_samples)
• Fraction of SNPs typed in Stage 2 (π_markers)
• Stage 2 to Stage 1 per-genotype cost ratio (R)
For a two-stage GWA study, what is the optimal fraction of samples genotyped in Stage 1 (π_samples)?
$$R = \frac{\text{Stage 2 per-genotype cost}}{\text{Stage 1 per-genotype cost}}$$
Case 1: R = 1
Case 2: R = 1, 2, 5, 10
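To see how the cost ratio R enters, the genotyping cost of a two-stage design can be written relative to typing every sample on every SNP at the Stage 1 price. The sketch below assumes Stage 1 types all SNPs on a fraction of the samples and Stage 2 types a fraction of the SNPs on the remaining samples. The function and parameter names are hypothetical, and the sketch treats the fraction of markers followed up as a free input, without modeling how it must change to hold power constant.

```python
def two_stage_relative_cost(frac_samples, frac_markers, cost_ratio):
    """Genotyping cost of a two-stage design relative to a one-stage design.

    frac_samples: fraction of samples typed on all SNPs in Stage 1
    frac_markers: fraction of SNPs typed on the remaining samples in Stage 2
    cost_ratio:   R = Stage 2 per-genotype cost / Stage 1 per-genotype cost
    """
    stage1 = frac_samples
    stage2 = cost_ratio * (1.0 - frac_samples) * frac_markers
    return stage1 + stage2


# Illustrative design: 30% of samples in Stage 1, 1% of SNPs followed up.
for ratio in (1, 2, 5, 10):
    cost = two_stage_relative_cost(0.30, 0.01, ratio)
    print(f"R = {ratio:>2}: relative cost = {cost:.3f}")
```

Because Stage 2 cost scales with R, a larger R penalizes designs that push many samples and markers into Stage 2, which is consistent with a shift toward typing more samples in Stage 1 as R grows.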
Cost as a Function of Samples Typed in Stage 1
Per-Genotype Cost Ratio R = 1
Fraction of markers followed up varies to ensure constant power
Cost as a Function of Samples Typed in Stage 1
Per-Genotype Cost Ratio R = 1, 2, 5, 10
Fraction of markers followed up varies to ensure constant power
[Figure: one cost curve for each of R = 1, 2, 5, 10]
Summary: Two-Stage GWA Designs
• Two-stage GWA designs are efficient and cost-effective; joint analysis is more powerful than replication-based analysis
• For equal Stage 1 and Stage 2 per-genotype costs (R = 1), 250K SNPs, genomewide significance α = .05: genotype 20-30% of samples in Stage 1
• For R > 1, less stringent significance, fewer SNPs: genotype 30-40% of samples in Stage 1
Acknowledgements
• Chromosome 14: Cristen Willer, Anne Jackson; FUSION, CIDR, and HapMap investigators
• Two-stage designs: Andrew Skol, Laura Scott, Gonçalo Abecasis
• Thanks!
Excluded slides follow
FUSION Chromosome 14 T2D Linkage
[Figure: MLS (y-axis, 0 to 3) plotted against position in cM (x-axis, 0 to 120) for three data sets: FUSION 1 (495 ASP families), FUSION 2 (242 ASP families), and FUSION 1+2]
Power of One- and Two-Stage Designs
How does a change in significance level change the optimal proportion of samples in Stage 1 (π_samples)?
Case 1: α = .05/250,000 (genomewide significance)
Case 2: α = 10/250,000 (less stringent significance)
Case 2′: α = .05/1,250 (candidate gene significance)
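Working out the three per-marker significance levels makes the relationship between the cases explicit. The dictionary keys below simply mirror the case labels on the slide.

```python
import math

# Per-marker significance levels from the three cases on the slide.
thresholds = {
    "Case 1": 0.05 / 250_000,   # genomewide significance
    "Case 2": 10 / 250_000,     # less stringent significance
    "Case 2'": 0.05 / 1_250,    # candidate gene significance
}

for label, alpha in thresholds.items():
    print(f"{label}: alpha = {alpha:.1e}")

# Cases 2 and 2' give the same per-marker threshold.
same_threshold = math.isclose(thresholds["Case 2"], thresholds["Case 2'"])
print(same_threshold)
```

Case 1 works out to 2 × 10⁻⁷, while Cases 2 and 2′ both work out to 4 × 10⁻⁵. The same per-marker threshold therefore arises either from tolerating about 10 expected false positives across 250,000 SNPs or from a .05 experiment-wide level across 1,250 SNPs.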
Impact of Significance Level on Optimal Proportion of Samples in Stage 1