spatial mapping of malaria parasite genetics · 2019-04-29 · spatial mapping of malaria parasite...
TRANSCRIPT
![Page 1: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/1.jpg)
Spatial Mapping of Malaria Parasite GeneticsChallenges and Opportunities of High Diversity Genetic Loci
Maxwell Murphy – UC Berkeley/UCSF4/17/2019
1
![Page 2: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/2.jpg)
Outline
• Applying gaussian process regression to estimate allelefrequency surfaces and classify origin of clinical infections
• Challenges of utilizing polygenomic data• Approaches to utilize polygenomic data with high diversity
genetic loci
2
![Page 3: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/3.jpg)
Estimating Allele Frequencies UsingGaussian Processes
![Page 4: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/4.jpg)
Motivation
• Classifying malaria cases as local vs imported and determiningorigin of infection is of great interest
• Spatial models of parasite diversity and distributions wouldmake for interesting inputs into other spatial models
• Integrating spatial information with genetic data remainschallenging
3
![Page 5: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/5.jpg)
Application to VivaxGEN Data
• Publicly availablemicrosatellite datafrom VivaxGEN
• 617 Samplesgenotyped at 9microsatellites
• Polyclonal sampleswere restricted todominant alleles
4
![Page 6: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/6.jpg)
Application to Regional Clinical Data
• 2585 samplescollected from thenorthern region ofNamibia (Tessema etal., eLife 2019)
• Genotyped at 26microsatellite markers
• Polyclonal sampleswere restricted todominant alleles
5
![Page 7: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/7.jpg)
Takeaways
• Using vanilla GP regression with an exponential kernel anddeep enough sampling, spatial signal can be extracted that isuseful for origin classification
• More nuanced approaches to modeling spatial covariance wouldlikely be very fruitful in exposing spatial connectivity of parasitepopulations
• Publicly available regional databases of genetic data will becritical in developing allele frequency maps
6
![Page 8: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/8.jpg)
The Dirty Little Secret of MalariaGenomics
![Page 9: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/9.jpg)
Challenges of Complex Infections and High Diversity GeneticLoci
• “Polyclonal samples were restricted to dominant alleles”• “Analysis was restricted to monoclonal infections”• Primarily motivated by using statistics that were not designed
to be used in the context of mixed DNA populations• Also a consequence of noisiness of genotyping method
• e.g. microsatellite data• Statistical convenience is being prioritized at the consequence
of bias, either due to sampling or because of properties ofestimator
7
![Page 10: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/10.jpg)
Example: Estimating Complexity of Infection
●
●●
●
●●●
●
●●●●
●
●●
●
●
●●●●●●
●
●
●●
●
●
●
●●●
●●
●
●●
●●
●
●●
●
●●
●
●●●
●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●●
●
●●●●
●
●●●●●
●●
●
●
●●●●●●●●●
●●
●
●
●●
●
●●●●
●
●
●
●
●
●
●
●
●●●
●●●
●
●
●●
●●●●
●
●
●
●●●
●
●
●●
●●
●●
●
●●
●●●
●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●●
●
●●
●
●
●●
●●●
●
●
●
●
●
●●
●●
●
●●
●
●●
●
●
●●
●
●●
●
●
●●●
●
●
●●
●●
●
●
●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●●
●
●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●
●●●●●●
●●
●●●●
●
5
10
0 100 200 300Sample
Est
imat
e Estimate_Type●
●
naive_coitrue_coi
• Simulated data using12 Loci ranging from5 to 25 alleles
• FP Rate: .03• FN Rate: .1• Complexity of
infection estimated bytaking maximumobserved number ofalleles
8
![Page 11: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/11.jpg)
Challenges of Complex Infections and High Diversity GeneticLoci
• Some tools exist to estimate parameters such as complexity ofinfection (COI), allele frequencies, and population structure
• COIL, THE REAL McCOIL• Restricted to SNP data, incorporates genotyping error model
• MALECOT• Supports multi-allelic data, does not incorporate genotyping
error model
9
![Page 12: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/12.jpg)
Constructing Models to Account for Complex Infections UsingMulti-allelic Data
gi,j
g∗i,j
µi
ε+
ε−
πj
j locus
i sample
j locus
10
![Page 13: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/13.jpg)
Constructing Models to Account for Complex Infections UsingMulti-allelic Data
gi,j
g∗i,j
µi
ε+
ε−
πj
j locus
i sample
j locus
L(π, ε−, ε+, µ|G) =n∏
i=1
k∏
j=1
∑
g∗∈G∗
P (gi,j |g∗i,j , ε+, ε−)P (g∗i,j |µi, πj)
P (gi,j |g∗i,j , ε+, ε−) =a∏
k=1
(1− ε−)g∗i,j,k if gi,j,k = 1 and g∗i,j,k > 0
(ε−)g∗i,j,k if gi,j,k = 0 and g∗i,j,k > 0
(ε+) if gi,j,k = 1 and g∗i,j,k = 0
(1− ε+) if gi,j,k = 0 and g∗i,j,k = 0
P (g∗i,j |µi, πj) =µi!
g∗i,j,1! · · · g∗i,j,k!πg∗i,j,1
j,1 · · ·πg∗i,j,k
j,k
11
![Page 14: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/14.jpg)
Computational Complexity
L(π, ε−, ε+, µ|G) =n∏
i=1
k∏j=1
∑g∗∈G∗
P(gi ,j |g∗i ,j , ε+, ε−)P(g∗i ,j |µi , πj)
12
![Page 15: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/15.jpg)
Computational Complexity
L(π, ε−, ε+, µ|g) =n∏
i=1
k∏j=1
∑g∗∈G∗
P(gi ,j |g∗i ,j , ε+, ε−)P(g∗i ,j |µi , πj)
1 2 3 4 5 62 2 3 4 5 6 74 4 10 20 35 56 848 8 36 120 330 792 171616 16 136 816 3876 15504 5426432 32 528 5984 52360 376992 232478464 64 2080 45760 766480 10424128 119877472128 128 8256 357760 11716640 309319296 6856577728
13
![Page 16: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/16.jpg)
Tackling Complex Infections withHigh Diversity Genetic Loci
![Page 17: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/17.jpg)
Estimating Marginal Likelihoods
• We do not need to calculate the exact marginal likelihood foreach sample
• An unbiased estimate of the probability density results in aMarkov Chain with the exact target as its stationarydistribution (Andrieu and Roberts, 2009)
Computationally Intractable
r := P(x ′)g(x |x ′)P(x)g(x ′|x)
Tractabler̂ := P̂(x ′)g(x |x ′)
P̂(x)g(x ′|x)
14
![Page 18: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/18.jpg)
Estimating Marginal Likelihoods
• For each sample at each locus, we can generate an unbiasedestimate of the likelihood instead of calculating an exactlikelihood
P(g |π, ε−, ε+, µ) =∑
g∗∈G∗P(g |g∗, ε+, ε−)P(g∗|µi , πj)
15
![Page 19: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/19.jpg)
Importance Sampling
• Importance Sampling provides a computationally tractablemethod to generate unbiased estimates of the marginallikelihood of a given data point
• Intuitively makes sense, as the vast majority of potential truegenotypes contribute very little to the likelihood
P̂(g |π, ε−, ε+, µ) = 1K
K∑k=1
P(g |g∗k , ε+, ε−)P(g∗k |µi , πj)q(g∗)
g∗ ∼ q
16
![Page 20: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/20.jpg)
Choosing an Importance Sampling Distribution
• Choice of Importance Sampling Distribution has significantimpact on variance of estimate, and subsequently the efficiencyof sampling
• A good heuristic for choosing a sampling distribution is toreweight the estimated allele frequency distribution based onthe observed genotype and false negative rate
• Ex:• π = [.1, .2, .3, .4]• ε− = .1• g = [1, 0, 0, 0]• q = [.9, .02, .03, .04]
17
![Page 21: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/21.jpg)
Demonstration: Low Diversity
5
10
0 5 10True COI
Est
imat
ed C
OI
• Simulated data using12 Loci with 5 alleles
• FP Rate: .03• FN Rate: .1
18
![Page 22: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/22.jpg)
Demonstration: High Diversity
2.5
5.0
7.5
10.0
12.5
0 5 10True COI
Est
imat
ed C
OI • Simulated data using
12 Loci with between5 and 25 alleles
• FP Rate: .03• FN Rate: .1
19
![Page 23: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/23.jpg)
Future Directions
![Page 24: Spatial Mapping of Malaria Parasite Genetics · 2019-04-29 · Spatial Mapping of Malaria Parasite Genetics Challenges and Opportunities of High Diversity Genetic Loci MaxwellMurphy–UCBerkeley/UCSF](https://reader034.vdocuments.mx/reader034/viewer/2022042803/5f4fa080d563fb38505a2cc6/html5/thumbnails/24.jpg)
Future Directions
• With this computational framework, we will extend our spatialmodeling of allele frequencies to incorporate all observedgenetic data
• Further develop this framework for estimating other parametersof interest
20