Identifying conserved segments in rearranged and divergent genomes

Bob Mau, Aaron Darling, Nicole T. Perna. Presented by Aaron Darling.


Post on 06-Jan-2016

Page 1: Identifying conserved segments in rearranged and divergent genomes

Identifying conserved segments in rearranged and divergent genomes

Bob Mau, Aaron Darling, Nicole T. Perna

Presented by Aaron Darling

Page 2: Identifying conserved segments in rearranged and divergent genomes

Comparing genomic architectures

Genome sequence and architecture comparison can lead to insight about organismal

• Evolutionary forces
• Gene functions
• Phenotypes

Rearrangement, gene gain, loss, and duplication obfuscate homology

Page 3: Identifying conserved segments in rearranged and divergent genomes

Structure of the bacterial chromosome

Origin of replication

Terminus

Replication proceeds simultaneously on each “replichore”

Breakpoints of inversions occur at an equal distance from the origin, which maintains replichore balance (Tillier and Collins 2000, Ajana et al. 2002).

We call such rearrangements “symmetric inversions”.

A replichore size difference > 20% is selected against (Guijo et al. 2001).
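
As a rough illustration of replichore balance, the sketch below computes the two replichore lengths on a circular chromosome and checks whether a pair of inversion breakpoints is symmetric about the origin. The function names, the toy coordinates, the tolerance, and the choice to normalize the size difference by the longer replichore are all assumptions; the slide does not say how the 20% figure is measured.

```python
def replichore_lengths(genome_len, origin, terminus):
    """Return (clockwise, counterclockwise) arc lengths from origin to terminus."""
    clockwise = (terminus - origin) % genome_len
    return clockwise, genome_len - clockwise


def size_difference(genome_len, origin, terminus):
    """Relative size difference, normalized by the longer replichore (an assumption)."""
    a, b = replichore_lengths(genome_len, origin, terminus)
    return abs(a - b) / max(a, b)


def is_symmetric_inversion(genome_len, origin, left, right, tolerance):
    """True if the breakpoints lie at roughly equal distances on either side of the origin."""
    d_right = (right - origin) % genome_len   # distance going clockwise
    d_left = (origin - left) % genome_len     # distance going counterclockwise
    return abs(d_right - d_left) <= tolerance


# Toy circular genome of 1000 units with the origin at coordinate 0.
balanced = size_difference(1000, 0, 500)
skewed = size_difference(1000, 0, 400)
symmetric = is_symmetric_inversion(1000, 0, 900, 100, tolerance=10)
asymmetric = is_symmetric_inversion(1000, 0, 900, 300, tolerance=10)
```

An inversion whose breakpoints sit at equal distances on either side of the origin swaps equal amounts of sequence between the replichores, so the two arc lengths are unchanged.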

Page 4: Identifying conserved segments in rearranged and divergent genomes

A dot plot: Each dot is a pairwise (or n-way) local alignment

Page 5: Identifying conserved segments in rearranged and divergent genomes

Goal: Identify local homologous (orthologous) segments

Blue: same strand

Red: opposite strand

Page 6: Identifying conserved segments in rearranged and divergent genomes

Tools for segmental homology detection

GRIMM-Synteny (Pevzner et al. 2003, Bourque et al. 2004)
- cluster markers within a fixed distance

FISH (Vision et al. 2003)
- find statistically over-represented clusters of markers within a fixed distance

LineUp (Hampson et al. 2003)
- find collinear runs of markers among pairs of genomes, allowing degeneracy

Some alignment tools: Shuffle-LAGAN (Brudno et al. 2003), Mauve (Darling et al. 2004)

Page 7: Identifying conserved segments in rearranged and divergent genomes
Page 8: Identifying conserved segments in rearranged and divergent genomes
Page 9: Identifying conserved segments in rearranged and divergent genomes

Small segments separated by lineage-specific regions may not be detected by methods based strictly on distance.

Key idea: use a combination of conserved marker order (collinearity) and alignment score

Page 10: Identifying conserved segments in rearranged and divergent genomes

Finding conserved regions: A pseudo-Gibbs sampler method

Given: A set $\mathbf{M}$ of M monotypic markers

Do: Assign a posterior probability that any marker $m \in \mathbf{M}$ is part of a conserved region

Use MCMC methodology to sample the frequency of each marker’s inclusion in high-scoring configurations.

Use frequency as an estimate of “posterior probability”

Page 11: Identifying conserved segments in rearranged and divergent genomes

Finding conserved regions: A pseudo-Gibbs sampler method

Define a configuration X as a vector of length M of binary random variables:

e.g. $X = (X_1, X_2, \ldots, X_M)$

A configuration value $x_j$ maps marker $m_j$ to either signal (1) or noise (0)

e.g. $x = (0,1,0,0,1,1,\ldots,1,0)$

There are $2^M$ possible configurations

Run a Markov chain of length N over configuration space: $(X^1, X^2, \ldots, X^N)$

Page 12: Identifying conserved segments in rearranged and divergent genomes

Sample possible marker configurations

Start with a random initial configuration, THEN:

Select a marker, sample whether it should be a 0 or 1 based on the current configuration

$$\mathrm{Score}(m_j \mid \mathbf{x}) = \sum_{v=L}^{j-1} w_v x_v \;+\; w_j \;+\; \sum_{v=j+1}^{R} w_v x_v$$

First term: sum of scores for all collinear markers to the left

Middle term: score of marker j

Last term: sum of scores for all collinear markers to the right

$w_v$ is the score of marker v, $x_v$ is the configuration value (0 or 1)
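
A minimal sketch of this score in Python, assuming the markers collinear with $m_j$ are supplied as two index lists. How collinearity is decided is not stated on the slide, so `left` and `right` are hypothetical inputs, and the weights are toy values.

```python
def marker_score(j, x, w, left, right):
    """Score(m_j | x): w_j plus the scores of collinear markers currently set to 1.

    x     -- current configuration, a list of 0/1 values
    w     -- marker scores
    left  -- indices of markers collinear with j on its left (assumed given)
    right -- indices of markers collinear with j on its right (assumed given)
    """
    left_sum = sum(w[v] * x[v] for v in left)
    right_sum = sum(w[v] * x[v] for v in right)
    return left_sum + w[j] + right_sum


w = [2.0, 1.5, 3.0, 0.5, 4.0]   # toy marker scores
x = [1, 0, 1, 1, 0]             # toy configuration
score = marker_score(2, x, w, left=[0, 1], right=[3, 4])
```

Markers currently assigned to noise ($x_v = 0$) contribute nothing, so in this toy case only markers 0 and 3 add to the score of marker 2.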

Page 13: Identifying conserved segments in rearranged and divergent genomes

Transform LCB score to probability

The scale parameter c is used in tandem with the sigmoid to map a marker’s score to a probability:

$$P(X_j^{n+1} = 1 \mid \mathbf{x}^n) = \frac{e^{\mathrm{Score}(m_j)/c}}{1 + e^{\mathrm{Score}(m_j)/c}}$$
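
To see what the scale parameter does, the sketch below applies the sigmoid to one hypothetical score under the c values quoted later in the talk (20, 30, 75). The function name and the score of 40 are assumptions made for illustration; the branch on the sign of the argument is only there to avoid floating-point overflow.

```python
import math


def inclusion_probability(score, c):
    """Sigmoid of score / c, written to stay stable for large |score|."""
    z = score / c
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# One hypothetical score mapped under three scale values.
for c in (20, 30, 75):
    print(c, round(inclusion_probability(40.0, c), 3))
```

A larger c flattens the curve, pulling every probability toward 0.5, so the sampler discriminates less sharply between high-scoring and low-scoring markers.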

Page 14: Identifying conserved segments in rearranged and divergent genomes

Sample a new value for $x_j$

Set $x_j$ to 1 with probability given by the marker’s score transformation

First allow the chain a “burn-in” period, then continue for many iterations.

The frequency, or “posterior probability”, of $m_j$ is:

$$\frac{\#\text{ of samples with } x_j = 1}{\#\text{ of samples}}$$
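
The whole procedure can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: the `neighbors` lists standing in for collinearity, the fixed-order sweep over markers, the toy scores (including a negative one so that an isolated marker can fall below 0.5), and all parameter values are assumptions.

```python
import math
import random


def pseudo_gibbs(w, neighbors, c, n_iter, burn_in, seed=0):
    """Estimate how often each marker is assigned to signal (1).

    w         -- marker scores
    neighbors -- neighbors[j] lists the markers assumed collinear with j
    c         -- scale parameter of the sigmoid
    """
    if burn_in >= n_iter:
        raise ValueError("burn_in must be smaller than n_iter")
    rng = random.Random(seed)
    m = len(w)
    x = [rng.randint(0, 1) for _ in range(m)]   # random initial configuration
    ones = [0] * m
    kept = 0
    for it in range(n_iter):
        for j in range(m):                      # resample each marker in turn
            score = w[j] + sum(w[v] * x[v] for v in neighbors[j])
            p = 1.0 / (1.0 + math.exp(-score / c))
            x[j] = 1 if rng.random() < p else 0
        if it >= burn_in:                       # only count post-burn-in samples
            kept += 1
            for j in range(m):
                ones[j] += x[j]
    return [count / kept for count in ones]


# Three mutually collinear markers and one isolated, low-scoring marker.
w = [3.0, 3.0, 3.0, -2.0]
neighbors = [[1, 2], [0, 2], [0, 1], []]
freq = pseudo_gibbs(w, neighbors, c=1.0, n_iter=2000, burn_in=500)
```

In this toy run the three clustered markers reinforce one another and are sampled as signal far more often than the isolated one.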

Page 15: Identifying conserved segments in rearranged and divergent genomes

Our method assigns each marker a p.p.

Threshold γ separates signal from noise
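
Applying the threshold is a one-line filter. In the sketch below the posterior values are made up, and treating a value exactly equal to γ as signal is an assumption.

```python
def signal_markers(posterior, gamma):
    """Indices of markers whose posterior probability is at least gamma."""
    return [j for j, p in enumerate(posterior) if p >= gamma]


posterior = [0.92, 0.48, 0.35, 0.10, 0.77]   # made-up values for illustration
strict = signal_markers(posterior, 0.50)
loose = signal_markers(posterior, 0.30)
```

Lowering γ admits more markers as signal, which is one of the two knobs (along with c) that controls the resolution of the resulting segments.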

Page 16: Identifying conserved segments in rearranged and divergent genomes

Our method assigns each marker a p.p.

Using γ = .5, the X pattern appears

Page 17: Identifying conserved segments in rearranged and divergent genomes

Our method assigns each marker a p.p.


Page 18: Identifying conserved segments in rearranged and divergent genomes

Application to 4 divergent Streptococcus

Markers are reciprocal best blastp hits of ORFs among:

S. agalactiae

S. pyogenes

S. pneumoniae

S. mutans

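
For a single pair of genomes, reciprocal best hits can be picked out as sketched below. The input format, a list of (query, subject, bit score) tuples already parsed from blastp output, and all ORF names and scores are hypothetical. How the pairwise results were combined across the four genomes is not stated on the slide.

```python
def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a_orf, b_orf) pairs that are each other's top-scoring hit.

    a_vs_b -- (query, subject, bitscore) tuples, genome A ORFs searched against genome B
    b_vs_a -- the same for genome B ORFs searched against genome A
    """
    def best(hits):
        top = {}
        for query, subject, score in hits:
            if query not in top or score > top[query][1]:
                top[query] = (subject, score)
        return {query: subject for query, (subject, _) in top.items()}

    best_ab = best(a_vs_b)
    best_ba = best(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy hit tables with hypothetical ORF names and scores.
a_vs_b = [("a1", "b1", 200), ("a1", "b2", 150), ("a2", "b2", 90), ("a3", "b1", 120)]
b_vs_a = [("b1", "a1", 210), ("b2", "a1", 140), ("b2", "a2", 95)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here only a1 and b1 choose each other. The best hit of a2 is b2, but b2 prefers a1, so that pair is not a marker.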

Page 19: Identifying conserved segments in rearranged and divergent genomes

What is the distribution of segment sizes in Streptococci?

As resolution increases, large segments are broken up by smaller segments

[Figure: four histograms, one per parameter setting. x-axis: segment sizes (markers per segment), 2 to 24. y-axis: number of LCBs, 0 to 35. Bar values are not recoverable from the transcript.]

Panels:

c = 75, γ = .45: “Low resolution”

c = 30, γ = .45: “Medium resolution”

c = 20, γ = .50: “High-1 resolution”

c = 20, γ = .30: “High-2 resolution”

Total segments: 26, 32, 57, 72

Page 20: Identifying conserved segments in rearranged and divergent genomes

What was the ancestral genome organization?

Try building inversion phylogeny by applying GRIMM and MGR to the 57 high resolution segments

Page 21: Identifying conserved segments in rearranged and divergent genomes

What was the ancestral genome organization?


Failed: The suggested rearrangements do not maintain replichore balance

Page 22: Identifying conserved segments in rearranged and divergent genomes

What was the ancestral genome organization?


Try using the 26 larger, low resolution segments

Surprise! A success:

Page 23: Identifying conserved segments in rearranged and divergent genomes

Transforming S. agalactiae into S. pyogenes

Page 24: Identifying conserved segments in rearranged and divergent genomes

Conclusions

- The pseudo-Gibbs sampler method detects collinear segments at a variety of scales

- It would be nice to have an inversion phylogeny inference tool that accounts for replichore balance!

- Large segments in Streptococci appear to rearrange by symmetric inversions

- Small segments? An open problem.

Page 25: Identifying conserved segments in rearranged and divergent genomes

Future directions

Can a biologically relevant full joint probability distribution be expressed over configurations?

- If so, then a true Gibbs sampler could be employed

Problems:

- Some rearrangements occur with different frequency (e.g. symmetric inversions about the terminus vs. IS-mediated translocation)

- Distinguish rearrangement by horizontal transfer (H.T.), gene duplication and subsequent loss, symmetric inversion, etc.

Page 26: Identifying conserved segments in rearranged and divergent genomes

Acknowledgements

Bob Mau: did most of this work

My Ph.D. advisers: Nicole T. Perna and Mark Craven

Others who have contributed insight: Jeremy Glasner, Fred Blattner, Eric Cabot

GEL@UW-Madison

Grant money: NIH Grant GM62994-02. NLM Training Grant 5T15M007359-03 to A.E.D.