A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space Presented by Chunping Wang Machine Learning Group, Duke University February 26, 2007 Paper by E. P. Xing and K-A. Sohn

A New Nonparametric Bayesian Model for Genetic Recombination in Open Ancestral Space

Presented by Chunping Wang

Machine Learning Group, Duke University

February 26, 2007

Paper by E. P. Xing and K-A. Sohn

Outline

• Terminology and Introduction

• DP Mixtures for Non-recombination Inheritance

• HMDP for Recombination

• Results

• Conclusions

Terminology and Introduction (1)

• Allele: a viable DNA coding on a chromosome – observation

• Locus: the location of an allele – index of an observation

• Haplotype: a sequence of alleles – data sequence

• Recombination: exchange of pieces between paired chromosomes – state transition

• Mutation: any change to a haplotype during inheritance – emission

Terminology and Introduction (2)

[Figure: ancestors and their descendants]

Terminology and Introduction (3)

Problems:

1. Ancestral inference: recovering ancestral haplotypes;

2. Recombination analysis: inferring the recombination hotspots;

3. Ancestral mapping: inferring the ancestral origin of each allele in each modern haplotype.

DP Mixtures for Non-recombination Inheritance (1)

Non-recombination:

• Only mutation may occur during inheritance;

• Each modern haplotype originates from a single ancestor.

This assumption holds only for haplotypes spanning a short region of a chromosome.

DP Mixtures for Non-recombination Inheritance (2)

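[Figure: graphical model with nodes $Q_0$, $Q$, $\phi_i$, and $h_i$, with a plate over the $n$ haplotypes]

$$Q \mid \alpha, Q_0 \sim DP(\alpha, Q_0)$$

$$\phi_i \mid Q \sim Q$$

$$h_i \mid \phi_i \sim P_h(\cdot \mid \phi_i)$$

where $\phi_k^* = (a_k, \theta_k)$, $k = 1, \ldots, K$, the distinct values of $\{\phi_i\}_{i=1}^{n}$, denote the joint of the kth ancestor and the mutation parameter corresponding to the kth ancestor.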
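The clustering behaviour implied by a DP mixture can be illustrated with the Chinese restaurant process, which is the marginal distribution over assignments once Q is integrated out. The sketch below is only an illustration of that standard construction, not the authors' sampler; the function name `crp_assignments` and the concentration parameter name `alpha` are assumptions introduced here.

```python
import random

def crp_assignments(n, alpha, rng):
    """Assign n haplotypes to ancestors via a Chinese restaurant process.

    alpha is the (assumed) DP concentration parameter; larger values
    favour opening new ancestors.
    """
    labels = []   # labels[i] = ancestor index assigned to haplotype i
    counts = []   # counts[k] = number of haplotypes assigned to ancestor k
    for _ in range(n):
        # An existing ancestor k is chosen with weight counts[k];
        # a brand-new ancestor is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new ancestor
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

labels, counts = crp_assignments(200, 1.0, random.Random(0))
print(len(counts), "ancestors used for", len(labels), "haplotypes")
```

The number of ancestors is not fixed in advance: it grows with the data, which is the sense in which the ancestral space is "open".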

DP Mixtures for Non-recombination Inheritance (3)

HMDP for Recombination (1)

For long haplotypes that may descend from multiple ancestors, we model recombinations as state transitions across discrete spatial intervals.

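[Figure: graphical model of the HMDP. Recoverable node labels: base measure $F$, master DP $Q_0$, lower-level DPs $Q_1, Q_2, \ldots, Q_j$, each with its own $\phi$, $h$, and $m$ nodes]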

HMDP for Recombination (2)

Each row of the transition matrix in the HMM is a DP. These DPs are linked by the top-level master DP and therefore share the same set of target states.

The mixing proportions of the jth lower-level DP are denoted $\pi_j = [\pi_{j,1}, \pi_{j,2}, \ldots]$; the jth row of the transition matrix is then $\pi_j$.
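One way to see how rows of the transition matrix can share a common set of target states is the finite (truncated) approximation commonly used for hierarchical DPs: draw top-level weights by stick-breaking, then draw each row from a Dirichlet centred on those weights. This is a sketch under stated assumptions, not the authors' construction; the truncation level `K`, the concentration names `gamma` and `alpha`, and all function names are hypothetical.

```python
import random

def stick_breaking(gamma, K, rng):
    """Truncated stick-breaking weights; the last stick takes the remainder."""
    beta, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)
        beta.append(remaining * v)
        remaining *= 1.0 - v
    beta.append(remaining)
    return beta

def dirichlet(params, rng):
    """Dirichlet draw via normalised gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [g / total for g in draws]

def hdp_transition_matrix(gamma, alpha, K, rng):
    """Rows pi_j ~ Dirichlet(alpha * beta): every row shares the master weights."""
    beta = stick_breaking(gamma, K, rng)
    # A tiny floor keeps the gamma shape parameter strictly positive.
    params = [max(alpha * b, 1e-12) for b in beta]
    rows = [dirichlet(params, rng) for _ in range(K)]
    return beta, rows

beta, rows = hdp_transition_matrix(1.0, 5.0, 5, random.Random(1))
for j, row in enumerate(rows):
    print(j, [round(p, 3) for p in row])
```

Because every row is drawn around the same `beta`, ancestors that are popular at the top level tend to be popular targets in every row.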

HMDP for Recombination (3)

[Figure: modern haplotype and ancestor haplotype]

The indicators of the ith modern haplotype for all the loci, $C_i = [C_{i,1}, \ldots, C_{i,T}]$, specify the corresponding ancestral haplotype:

• when no recombination takes place during the inheritance process producing haplotype $H_i$, $C_{i,t} = k$ for all $t$;

• when a recombination occurs between loci t and t+1, $C_{i,t} \neq C_{i,t+1}$.

HMDP for Recombination (4)

Introduce a Poisson point process to control the duration of non-recombinant inheritance (space-inhomogeneous)

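$$p(x \mid \lambda) = \frac{1}{x!} \lambda^{x} e^{-\lambda}$$

x: the number of recombinations.

Denote

d: the physical distance between loci t and t+1;

r: recombination rate per unit distance.

Then, with $\lambda = dr$,

$$p(x = 0 \mid dr) = e^{-dr}$$

$$p(x > 0 \mid dr) = 1 - e^{-dr}$$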
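As a quick numerical check of the Poisson model, with rate equal to the distance d times the per-unit recombination rate r, the snippet below computes the probability of no recombination and of at least one recombination between adjacent loci. The function names are illustrative only.

```python
import math

def poisson_pmf(x, rate):
    """Poisson probability of observing exactly x events at the given rate."""
    return rate ** x * math.exp(-rate) / math.factorial(x)

def recombination_probability(d, r):
    """P(at least one recombination across distance d) = 1 - exp(-d * r)."""
    # expm1 keeps precision when d * r is very small.
    return -math.expm1(-d * r)

d, r = 2.0, 0.05   # example values, not taken from the paper
print("P(no recombination) =", poisson_pmf(0, d * r))
print("P(recombination)    =", recombination_probability(d, r))
```

Loci that are far apart, or regions with a high rate r, make a switch of ancestor more likely, which is what makes the model space-inhomogeneous.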

HMDP for Recombination (5)

Combining this with the standard stationary HMDP gives the non-stationary state transition probability:

$$p(C_{i,t+1} = k' \mid C_{i,t} = k) = (1 - e^{-dr})\,\pi_{k,k'} + e^{-dr}\,\delta(k, k')$$

As d or r goes to infinity, $e^{-dr} \to 0$ and $1 - e^{-dr} \to 1$, so the inhomogeneous HMDP model reduces to a standard HMDP.
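The non-stationary transition can be sketched as a mixture: with probability exp(-d r) no recombination occurs and the ancestor is kept, otherwise the next ancestor is drawn from the stationary row for the current ancestor. The helper below assumes a finite row `pi_k` (a truncation of the infinite row) and uses a hypothetical function name.

```python
import math

def nonstationary_row(k, pi_k, d, r):
    """Transition probabilities out of ancestor k across a gap of length d.

    pi_k is the stationary transition row for state k (assumed finite here).
    """
    stay = math.exp(-d * r)                  # probability of no recombination
    row = [(1.0 - stay) * p for p in pi_k]   # recombination: follow pi_k
    row[k] += stay                           # no recombination: remain in k
    return row

pi_0 = [0.2, 0.3, 0.5]
print(nonstationary_row(0, pi_0, d=0.0, r=1.0))   # adjacent loci coincide
print(nonstationary_row(0, pi_0, d=1.0, r=1.0))   # intermediate distance
print(nonstationary_row(0, pi_0, d=1e6, r=1.0))   # very distant loci
```

At d = 0 the chain cannot leave its current ancestor, and for very large d the row coincides with the stationary row, matching the limiting behaviour described above.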

HMDP for Recombination (6)

Inference:

The emission function: $p(h \mid a, \theta)$

The prior base: $F(A, \theta) = p(A)\,p(\theta)$

where $p(A)$ is uniform and $\theta \sim \mathrm{Beta}(\alpha_h, \beta_h)$.

Integrating over $\theta$ gives the marginal likelihood.
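The transcript does not preserve the emission formula, so the sketch below assumes a common single-locus mutation model: an observed allele equals its ancestral allele with probability θ and otherwise takes one of the remaining alleles uniformly. Under that assumption and a Beta prior on θ, integrating out θ gives a closed form that depends only on the number of matches and mismatches. The function names, the `n_alleles` parameter, and the emission model itself are assumptions, not taken from the slides.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_likelihood(n_match, n_mismatch, alpha_h, beta_h, n_alleles=2):
    """log of the integral of theta^n_match * ((1 - theta) / (n_alleles - 1))^n_mismatch
    against a Beta(alpha_h, beta_h) prior on theta (assumed emission model)."""
    return (log_beta(alpha_h + n_match, beta_h + n_mismatch)
            - log_beta(alpha_h, beta_h)
            - n_mismatch * math.log(n_alleles - 1))

# Example: 18 alleles agree with the ancestor, 2 differ, under a prior that
# favours low mutation rates (values chosen for illustration only).
print(math.exp(log_marginal_likelihood(18, 2, alpha_h=9.0, beta_h=1.0)))
```

Working in log space with `math.lgamma` avoids overflow when the counts are large.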

HMDP for Recombination (7)

Inference:

Two sampling stages:

1. Sample $\{C_{i,t}\}$ given all haplotypes h and the most recently sampled ancestor pool a;

2. Sample every ancestor $A_k$ given all haplotypes h and the current $\{C_{i,t}\}$.

Combining the HDP prior and the marginal likelihood, we can infer the posterior for $\{C_{i,t}\}$ and $\{A_{k,t}\}$, which are the variables of interest.

Results (1)

Simulated data:

30 populations, each including 200 haplotypes derived from K=5 ancestral haplotypes, with T=100 loci.

Compare: HMDP, HMMs with K=3, 5, and 10

[Figure: the average ancestor reconstruction errors for the five ancestors]

Even the HMM with the true number of ancestors (K=5) does not outperform the HMDP.

Results (2)

[Figure: box plot of the empirical recombination rates. The vertical gray lines mark the pre-specified recombination hotspots; two levels are marked as Threshold 1 and Threshold 2.]

Results (3)

Population maps: 1. true map; 2. HMDP; 3-5. HMMs with K=3,5,10

Each vertical thin line – one modern haplotype;

Each color – one ancestral haplotype.

Measure for accuracy: the mean squared distance to the true map

Results (4)

Real haplotype data set 1: Daly data – single population

512 haplotypes. T=103

Bottom: empirical recombination rates

Upper vertical lines: recombination hotspots.

Red dotted lines: HMM; blue dashed lines: MDL; black solid lines: HMDP

Results (5)

[Figure: a Gaussian mixture fit to the empirical recombination rates, used to choose the threshold]

Results (6)

Estimated population map

Each vertical thin line – one modern haplotype;

Each color – one ancestral haplotype.

Conclusions

• The HMDP model is an application and extension of the HDP to population genetics;

• The HDP allows the state space of the HMM to be infinite, which makes it suitable for inferring an unknown number of ancestral haplotypes;

• The HMDP model also allows the recombination rates to be non-stationary;

• The HMDP model can jointly infer a number of important genetic variables.