Analysis update for GENEVA meeting 2011
![Page 1: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/1.jpg)
Novel Statistical Methods
Gary K. Chen
University of Southern California
May 17, 2011
![Page 2: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/2.jpg)
An outline
Association testing in admixed populations
Gene-gene interactions
Copy number inferences
![Page 3: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/3.jpg)
Local ancestry inference
- Assumption: two or more homogeneous populations gave rise to today's admixed population, e.g. Hispanics, African Americans
- Software:
  - LAMP
  - HAPAA
  - HAPMIX
- Relevance:
  - Not taking ancestry into account can cause serious confounding
  - However, understanding local ancestry can enhance inference in gene mapping
![Page 4: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/4.jpg)
Hidden Markov Model of the HAPMIX program
![Page 5: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/5.jpg)
Combining evidence from both local ancestry and association
![Page 6: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/6.jpg)
Novel MIX score statistic
- A $\chi^2_1$ test combining association and admixture association
- Likelihood:
  - $L_{\text{combined}}(p_A, p_E, R) = L_{AA,AE,AA}(p_A, p_E, R)\, L_{\text{admix}}(\Omega(R))$
- Assumption: the SNP odds ratio $R$ is re-used in the ancestry odds ratio $\Omega(R)$
- $\mathrm{MIX} = 2\left[\max_{p_{A,0},\, p_{E,0},\, R} \log L_{\text{combined}}(p_{A,0}, p_{E,0}, R) - \max_{p_{A,0},\, p_{E,0}} \log L_{\text{combined}}(p_{A,0}, p_{E,0}, 1)\right]$
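
A likelihood-ratio statistic of this form (twice the gain in maximized log-likelihood when the odds ratio is freed from 1) can be referred to a chi-square distribution with 1 degree of freedom. The sketch below is illustrative only: the function names `lrt_statistic` and `chi2_1df_pvalue` are hypothetical, and the two maximized log-likelihoods are made-up inputs assumed to have been computed elsewhere.

```python
import math


def lrt_statistic(loglik_alt: float, loglik_null: float) -> float:
    """Twice the gain in maximized log-likelihood (alternative minus null)."""
    return 2.0 * (loglik_alt - loglik_null)


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    """
    if stat <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical maximized log-likelihoods, for illustration only.
stat = lrt_statistic(loglik_alt=-1000.0, loglik_null=-1005.0)
p_value = chi2_1df_pvalue(stat)
```

The closed form via `math.erfc` holds only for 1 degree of freedom; a test with more degrees of freedom would need a general chi-square tail function.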
![Page 7: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/7.jpg)
Afr-Am Prostate Cancer GWAS
![Page 8: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/8.jpg)
Afr-Am Prostate Cancer Admixture scan
![Page 9: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/9.jpg)
Top results from scan of MIX statistic

| chr | position | rs | adm | mix | snp | beta | se | pvalue |
|---|---|---|---|---|---|---|---|---|
| 8 | 128187997 | rs7844219 | 49.9614 | 83.6454 | 54.5324 | 0.279106 | 0.0366375 | 1.96509e-14 |
| 8 | 128193308 | rs1551512 | 50.4199 | 81.3104 | 51.5792 | 0.266105 | 0.0364108 | 2.15938e-13 |
| 8 | 128198554 | rs6989838 | 52.7453 | 80.7646 | 51.0694 | 0.266931 | 0.036335 | 1.61315e-13 |
| 8 | 128199669 | rs7013255 | 50.4199 | 80.7332 | 51.7533 | 0.266914 | 0.0363362 | 1.62204e-13 |
| 8 | 128194098 | rs16901979 | 49.9614 | 79.6505 | 51.0524 | 0.265175 | 0.0363043 | 2.22267e-13 |
| 8 | 128176062 | rs6983561 | 49.9614 | 79.386 | 49.888 | 0.276348 | 0.037292 | 9.90319e-14 |
| 8 | 128194377 | rs10505483 | 51.8086 | 78.1544 | 49.3351 | 0.260651 | 0.0363121 | 5.71765e-13 |
| 8 | 128174913 | rs7012442 | 49.5051 | 77.371 | 48.4881 | 0.278969 | 0.0376443 | 9.78106e-14 |
| 8 | 128219343 | rs6987409 | 49.5051 | 62.8232 | 46.5145 | 0.36881 | 0.0502564 | 1.41553e-13 |
| 8 | 128202258 | rs7000307 | 49.9614 | 57.1498 | 37.9689 | 0.254356 | 0.0408703 | 4.13043e-10 |
| 8 | 128225845 | rs7822987 | 54.6449 | 56.9699 | 40.3666 | 0.349385 | 0.0501898 | 2.4144e-12 |
| 8 | 128204516 | rs7840773 | 52.2758 | 56.5461 | 36.8886 | 0.2491 | 0.0405111 | 6.68617e-10 |
| 8 | 128223073 | rs7018243 | 49.051 | 56.4211 | 40.683 | 0.345422 | 0.0498431 | 3.02292e-12 |
| 8 | 128225870 | rs7822995 | 49.9614 | 56.1409 | 40.2347 | 0.349646 | 0.0502108 | 2.37455e-12 |
| 8 | 128173525 | rs13254738 | 52.7453 | 55.9177 | 37.4838 | -0.255658 | 0.0386944 | 3.37064e-11 |
| 8 | 128204547 | rs7824364 | 47.2561 | 55.7612 | 37.2464 | 0.2476 | 0.0404363 | 7.88135e-10 |
| 8 | 128173119 | rs1456315 | 49.9614 | 54.9189 | 41.9607 | -0.231627 | 0.0357647 | 8.22321e-11 |
| 8 | 128482487 | rs6983267 | 55.6079 | 49.2272 | 19.0306 | -0.280904 | 0.0593348 | 2.00619e-06 |
| 8 | 128168637 | rs1840709 | 50.8806 | 44.0089 | 22.5331 | -0.204043 | 0.0410319 | 6.27666e-07 |
| 8 | 128257237 | rs16902003 | 50.4199 | 38.9456 | 29.1061 | 0.340009 | 0.0612992 | 2.29044e-08 |

![Page 10: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/10.jpg)
An outline
Association testing in admixed populations
Gene-gene interactions
Copy number inferences
![Page 11: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/11.jpg)
Detecting higher order interactions
- Statistical epistasis may account for some hidden heritability
- Statistical and computational challenges are obvious
- Possible approaches for variable selection
  - Constrain search to only variables with strong marginal effects
  - Place priors on the effect sizes, informed through biology (e.g. Chen and Thomas, Genetic Epi 2010)
- Search space can still be huge
  - Implement massively parallel optimization algorithms
  - These are a good fit for the hardware architecture of Graphics Processing Units
![Page 12: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/12.jpg)
Organization of gridblock of threadblocks on GPU
![Page 13: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/13.jpg)
Overview of algorithm
- Newton-Raphson kernel
  - Each threadblock maps to a block of 512 subjects (threads) for 1 variable
  - Each thread calculates its subject's contribution to the gradient and Hessian
  - Sum (reduction) across 512 subjects
  - Sum (reduction) across subject blocks in a new kernel
- Compute log-likelihood change for each variable (like above).
- Apply a max operator (log2 reduction) to select the variable with the greatest contribution to the likelihood.
- Iterate repeatedly until the likelihood increase is less than epsilon
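
The steps above can be sketched serially in plain Python. This is a minimal illustration under stated assumptions, not the GPU implementation from the talk: it fits an unpenalized logistic model with no intercept, the names `log_lik`, `newton_delta` and `greedy_fit` are hypothetical, and ordinary loops and sums stand in for the per-thread work and the reductions.

```python
import math


def log_lik(y, eta):
    """Logistic log-likelihood; log(1 + e^t) is computed stably."""
    total = 0.0
    for yi, t in zip(y, eta):
        softplus = max(t, 0.0) + math.log1p(math.exp(-abs(t)))
        total += yi * t - softplus
    return total


def newton_delta(col, y, eta):
    """One Newton-Raphson step for a single variable.

    Each loop iteration is one subject's contribution to the gradient and
    Hessian; the running sums play the role of the GPU reductions.
    """
    grad = hess = 0.0
    for x, yi, t in zip(col, y, eta):
        p = 1.0 / (1.0 + math.exp(-t))
        grad += x * (yi - p)
        hess += x * x * p * (1.0 - p)
    return grad / hess if hess > 0.0 else 0.0


def greedy_fit(columns, y, eps=1e-6, max_iter=100):
    """Repeatedly update the variable with the largest log-likelihood gain."""
    beta = [0.0] * len(columns)
    eta = [0.0] * len(y)  # linear predictor per subject
    ll = log_lik(y, eta)
    order = []  # variables in the order they were updated
    for _ in range(max_iter):
        best_gain, best_j, best_delta = 0.0, None, 0.0
        for j, col in enumerate(columns):
            delta = newton_delta(col, y, eta)
            trial = [t + delta * x for t, x in zip(eta, col)]
            gain = log_lik(y, trial) - ll
            if gain > best_gain:  # the "max operator" over variables
                best_gain, best_j, best_delta = gain, j, delta
        if best_j is None or best_gain < eps:
            break  # likelihood increase is below epsilon
        beta[best_j] += best_delta
        eta = [t + best_delta * x for t, x in zip(eta, columns[best_j])]
        ll += best_gain
        order.append(best_j)
    return beta, ll, order


# Toy data (made up): variable 0 tracks the outcome, variable 1 starts with no signal.
columns = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
beta, ll, order = greedy_fit(columns, y)
```

On a GPU, the inner loop over subjects and the outer loop over variables are the two levels that map to threads and threadblocks; the serial version only shows the arithmetic each level performs.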
![Page 14: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/14.jpg)
![Page 15: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/15.jpg)
Evaluation on large dataset
- GWAS data
  - 6,806 African American subjects in a case-control study of prostate cancer
  - 1,047,986 SNPs typed
- Elapsed walltime for 1 LASSO iteration (sweep across all variables)
  - 15 minutes on optimized serial implementation across 2 slave CPUs
  - 5.8 seconds on parallel implementation across 2 nVidia Tesla C2050 GPU devices
  - 155x speed up
![Page 16: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/16.jpg)
Application
- Defined 28 risk regions (Haiman et al., PLoS Genet, in press)
- 6,256 SNPs typed
- Fit a model with 19,571,896 variables using LASSO penalized multivariate logistic regression
- Avg run time per variable: 1 min 40 seconds
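
One reading of the variable count, which the slide does not spell out, is that the model contains every SNP as a main effect plus every pairwise SNP-SNP interaction. The arithmetic below checks that reading against the figures on the slide; the variable names are illustrative.

```python
import math

n_snps = 6256

main_effects = n_snps                         # one term per SNP
pairwise_interactions = math.comb(n_snps, 2)  # one term per unordered SNP pair

n_variables = main_effects + pairwise_interactions
```

Under this assumption the total matches the 19,571,896 variables reported.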
![Page 17: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/17.jpg)
Results
Table: first 10 variables to enter the model

| SNP 1 | SNP 2 | Interaction β (multivariate) | Interaction β (univariate) | 1-df χ² (interaction) | 1-df χ² (SNP 1) | 1-df χ² (SNP 2) |
|---|---|---|---|---|---|---|
| rs10050937 | rs17794619 | -0.472152 | -0.512223 | 35.0549 | 6.6248 | 15.6842 |
| rs12484747 | rs5759052 | -0.382707 | -0.34638 | 27.221 | 3.89604 | 18.1621 |
| rs12943477 | rs7130881 | 0.243003 | 0.267117 | 42.1636 | 5.71494 | 31.5322 |
| rs13417654 | rs5759256 | 0.216708 | 0.240687 | 32.3129 | 16.0361 | 0.0104941 |
| rs2625403 | rs4872172 | -0.12534 | -0.148221 | 30.0041 | 11.1439 | 14.8556 |
| rs266880 | rs7949453 | 0.136513 | 0.152762 | 28.983 | 12.0237 | 10.7806 |
| rs2963275 | rs360802 | -0.53471 | -0.583309 | 29.8975 | 1.63451 | 7.26303 |
| rs339319 | rs7075009 | -0.225684 | -0.263573 | 31.4443 | 22.2988 | 6.06348 |
| rs4129455 | rs9333335 | -1.33385 | -1.78312 | 33.2588 | 6.37851 | 1.19051 |
| rs6798749 | rs8079894 | 0.179629 | 0.201867 | 29.0323 | 12.0371 | 9.97029 |

![Page 18: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/18.jpg)
An outline
Association testing in admixed populations
Gene-gene interactions
Copy number inferences
![Page 19: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/19.jpg)
Application to cancer tumor data
- Copy number inference in tumors is more challenging
  - Tissues can be contaminated with normal cells
  - Furthermore, intra-tumor heterogeneity can lead to sub-clones with distinct CN profiles
- A large state space HMM
  - Consider differing normal-tumor copy number and genotype combinations
  - For each combination, a possible contamination proportion
  - Copy Num: $z = (1-\alpha)\, z_{\text{normal}} + \alpha\, z_{\text{tumor}}$
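
The mixture formula can be evaluated directly. In this small sketch the function name `mixed_copy_number` is hypothetical, and `alpha` is taken to be the tumor proportion of the sample, as the formula implies.

```python
def mixed_copy_number(alpha: float, cn_normal: float, cn_tumor: float) -> float:
    """Expected copy number of a sample mixing normal and tumor cells.

    alpha is the tumor proportion, so (1 - alpha) is the normal proportion.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * cn_normal + alpha * cn_tumor


# A diploid normal genome (2 copies) mixed 50/50 with a tumor carrying 3 copies.
half_tumor = mixed_copy_number(0.5, cn_normal=2, cn_tumor=3)
```

Fractional values such as 2.5 arise this way: a single-copy gain in the tumor shows up as half a copy in a sample that is half normal tissue.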
![Page 20: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/20.jpg)
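Simplified Example of a State Space

| state | CNfrac | BACnormal | CNtumor | BACtumor |
|---|---|---|---|---|
| 0 | 2 | 0 | 2 | 0 |
| 1 | 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 2 | 2 |
| 3 | 0 | 0 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 |
| 5 | 0 | 2 | 0 | 0 |
| 6 | 0.5 | 0 | 0 | 0 |
| 7 | 0.5 | 1 | 0 | 0 |
| 8 | 0.5 | 2 | 0 | 0 |
| 9 | 1 | 0 | 1 | 0 |
| 10 | 1 | 1 | 1 | 0 |
| 11 | 1 | 1 | 1 | 1 |
| 12 | 1 | 2 | 1 | 1 |
| 13 | 1.5 | 0 | 1 | 0 |
| 14 | 1.5 | 1 | 1 | 0 |
| 15 | 1.5 | 1 | 1 | 1 |
| 16 | 1.5 | 2 | 1 | 1 |
| 17 | 2.5 | 0 | 3 | 0 |
| 18 | 2.5 | 1 | 3 | 1 |
| 19 | 2.5 | 1 | 3 | 2 |
| 20 | 2.5 | 2 | 3 | 3 |
| 21 | 3 | 0 | 4 | 0 |
| 22 | 3 | 1 | 4 | 1 |
| 23 | 3 | 1 | 4 | 2 |
| 24 | 3 | 1 | 4 | 3 |
| 25 | 3 | 2 | 4 | 4 |
| 26 | 3.5 | 0 | 4 | 0 |
| 27 | 3.5 | 1 | 4 | 1 |
| 28 | 3.5 | 1 | 4 | 2 |
| 29 | 3.5 | 1 | 4 | 3 |
| 30 | 3.5 | 2 | 4 | 4 |
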
![Page 21: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/21.jpg)
Comparison of algorithms
- We implement 8 kernels. Examples:
  - Re-scaling transition matrix (for SNP spacing)
    - Serial: $O(2nm^2)$; Parallel: $O(n)$
  - Forward backward
    - Serial: $O(2nm^2)$; Parallel: $O(n \log_2(m))$
  - Normalizing constant (Baum-Welch)
    - Serial: $O(nm)$; Parallel: $O(\log_2(n))$
  - MLE of transition matrix (Baum-Welch)
    - Serial: $O(nm^2)$; Parallel: $O(n)$
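
To make the serial cost concrete, here is a minimal scaled forward pass for a generic HMM. It is a sketch, not one of the eight kernels: the function name `forward_loglik` and its argument layout are hypothetical, and it assumes that n counts SNPs and m counts hidden states, which the slide does not define.

```python
import math


def forward_loglik(init, trans, emis):
    """Log-likelihood of an observation sequence via the scaled forward pass.

    init[i]     : initial probability of state i
    trans[i][j] : probability of moving from state i to state j
    emis[t][j]  : probability of the observation at SNP t given state j
    """
    m = len(init)
    alpha = [init[j] * emis[0][j] for j in range(m)]
    scale = sum(alpha)
    alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    loglik = math.log(scale)
    for t in range(1, len(emis)):
        # m * m multiply-adds per SNP: the source of the O(n m^2) serial cost.
        alpha = [
            emis[t][j] * sum(alpha[i] * trans[i][j] for i in range(m))
            for j in range(m)
        ]
        scale = sum(alpha)
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik


# Two-state toy model with uninformative emissions (made-up numbers).
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.5, 0.5]] * 4
toy_loglik = forward_loglik(init, trans, emis)
```

The inner sum over source states is the step a parallel version can replace with a tree reduction, which is where a $\log_2(m)$ factor would come from.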
![Page 22: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/22.jpg)
Speedups
Table: 1 iteration of HMM training on Chr 1 (41,263 SNPs)

| states | CPU | GPU | fold-speedup |
|---|---|---|---|
| 128 | 9.5m | 37s | 15x |
| 512 | 2h 35m | 1m 44s | 108x |

![Page 23: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/23.jpg)
Chr 21 0 percent tumor
![Page 24: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/24.jpg)
Chr 21 100 percent tumor
![Page 25: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/25.jpg)
Chr 21 50 percent tumor
![Page 26: Analysis update for GENEVA meeting 2011](https://reader033.vdocuments.mx/reader033/viewer/2022060202/559cd85f1a28ab885f8b47b7/html5/thumbnails/26.jpg)
Thanks to
- Admixture scoring: Bogdan Pasaniuc
- CNV work: Kai Wang, Christina Curtis
- Access to GPU server: Tim Triche, Zach Ramjan
- (Chris's Acknowledgement slide)