A Temporal Profile for Animal Transmembrane Gene Duplication (Insights into the Coupling of Duplication and Macroevolution). Presented by Guohui Ding, R&D, SIBS, CAS


Post on 13-Feb-2016


Page 1: Presented by Guohui Ding R&D, SIBS, CAS

A Temporal Profile for Animal Transmembrane Gene Duplication (Insights into the Coupling of Duplication and Macroevolution)

Presented by Guohui Ding

R&D, SIBS, CAS

Page 2

Background

• Gene Duplication[1,2]
– The predominant mechanism by which genes with new functions and associated phenotypic novelties arise
– Several models try to explain the process of gene duplication
– Positive selection plays a key role in neo/subfunctionalization (?). It gives the chance to study the interplay of physical and biotic factors.
• Macroevolution[3]
– The dynamics of evolution above the species level
– Biogeographic/geochemical/palaeontological/ecological data (e.g. fossil data, ocean chemistry data)
• TM proteins[4]
– At least one transmembrane helix
– Functions such as active transport, ion flows, energy transduction, and signal transduction
– Information exchange between the cell and the environment

Page 3

Gene Duplication

Accumulation of mutations

Environment/Genetic background selection

Genetics. 1999 Apr;151(4):1531-45.

Page 4

Macroevolution: Evidence/Data

Science. 2002 Aug 16;297(5584):1137-42. Review.
Nature. 2005 Mar 10;434(7030):208-10.
Nature. 2000 Mar 9;404(6774):177-80.
Science. 2000 Dec 1;290(5497):1758-61.

Page 5

TM Proteins

• ……
• Old, dating from long, long ago
• Not a good choice for studying evolutionary theory, but perhaps a suitable model for illustrating the interaction between environment and life

Nature. 2004 Oct 21;431(7011):913.

Page 6

The Question and The Logic

• What will the temporal profile of animal TM gene duplications look like? Is it a uniform distribution? If not, what scenarios can explain the distribution? (Null hypothesis: neutral theory)
• Do the large-scale cycles and patterns found in the Phanerozoic fossil record leave some imprint on the temporal profile of TM gene duplication, if God adjusts macroevolution through microevolution or genes? How important is gene duplication to speciation?[1,5] (Just a little extrapolation.) Do duplication events synchronize with speciation or with origination/extinction? When they are asynchronous, what does that tell us?
• Can this logic/method be applied to understanding macroevolution? In general, sequence data are far more readily attainable than fossil data. It also offers a second way.

The logic: duplicates are selected by the environment, so more duplicates at a given time imply more “diversity” in the environment at that time.

Page 7

Methods

• TM protein prediction
• Family construction
• Estimation of molecular time scale
• Duplication events detection
• Data processing

Page 8

TM protein prediction

• Data
– NCBI Reference Sequence (RefSeq) Database (Release 7, September 12, 2004)
– 13 eukaryotic genomes (61 bacterial genomes, 11 archaebacterial genomes)
• Transmembrane topology model prediction
– ConPred II
• Identification of TM proteins
– At least one transmembrane helix

Nucleic Acids Res. 2004 Jul 1;32:W390-3.

Page 9

Family construction

• Detection and masking of widespread, typically repetitive domains.
• Filtering by SEG and all-against-all comparison of protein sequences using the gapped BLAST program with default settings.
• E value determination based on the overall distribution of E values over the entire protein space.
• Detection of transitive best hits.
• Single-linkage clustering on the best hits to obtain the symmetrical best hits.
• Removal of fragment sequences.
• Single-linkage clustering again.
• Detection of clusters that have no cut-edges (bridges).
• Detection of clusters with at least one triangle of mutually consistent, genome-specific best hits (BeTs).
• Iterative multiple alignment.
• Detection of triangles of mutually consistent, genome-specific best hits.
• Case-by-case analysis of each candidate family.
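The single-linkage clustering step in the list above can be sketched as follows. This is a minimal illustration, not the pipeline used in the talk: the hit tuples, the function name `single_linkage_clusters`, and the union-find implementation are assumptions, and only the 1e-5 E value threshold comes from the slides.

```python
from typing import Dict, Iterable, List, Tuple


def single_linkage_clusters(
    hits: Iterable[Tuple[str, str, float]], e_threshold: float = 1e-5
) -> List[List[str]]:
    """Group proteins so that any hit at or below the threshold links two proteins."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b, evalue in hits:
        root_a, root_b = find(a), find(b)  # registers both proteins, linked or not
        if evalue <= e_threshold and root_a != root_b:
            parent[root_a] = root_b

    families: Dict[str, List[str]] = {}
    for protein in list(parent):
        families.setdefault(find(protein), []).append(protein)
    return sorted(sorted(members) for members in families.values())


# Hypothetical BLAST hits: (query, subject, E value).
example_hits = [("p1", "p2", 1e-30), ("p2", "p3", 1e-8), ("p3", "p4", 0.5), ("p5", "p5", 0.0)]
families = single_linkage_clusters(example_hits)
```

Single linkage merges two groups as soon as one pair of members is similar, which is why the later steps (cut-edge detection, BeT triangles, bootstrap-based splitting) are needed to break up families that are chained together by a few weak links.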

Page 10

About E value

• Based on the overall distribution of expectation values over the entire protein space.
• The distribution shown may be thought of as the average distribution of E values for a ‘typical’ protein sequence used as a query.
• The steep slope at high E values indicates rapid growth in the number of sequences that are unrelated to the query sequence.
• Every sequence has its own distribution; only the threshold derived from the averaged distribution is reliable.
• The deviation from a straight line starts around 1e-5 in my work.

Proteins. 1999 Nov 15;37(3):360-78.

Page 11

About transitive/symmetrical best hit

• A threshold of 1e-5 for the E value of HSPs.
• HSPs that are not compatible with a global alignment are removed.
• The remaining HSPs cover at least 80% of the proteins' length.
• Their similarity is greater than or equal to 50%.
• Both sequences are complete.

Genome Res. 2000 Mar;10(3):379-85.
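The criteria on this slide amount to a simple predicate on each protein pair. The sketch below assumes the HSPs have already been filtered and summarized per pair; the `PairSummary` record, its field names, and both function names are hypothetical, while the thresholds (1e-5, 80%, 50%) are the ones listed above.

```python
from dataclasses import dataclass


@dataclass
class PairSummary:
    """Hypothetical summary of the HSPs retained for one protein pair."""
    coverage: float       # fraction of protein length covered by retained HSPs (0..1)
    similarity: float     # fraction of similar residues over the retained HSPs (0..1)
    both_complete: bool   # True if neither sequence is a fragment


def keep_hsp(evalue: float, max_evalue: float = 1e-5) -> bool:
    """An HSP is retained only if its E value is at or below the threshold."""
    return evalue <= max_evalue


def same_family(pair: PairSummary, min_coverage: float = 0.80,
                min_similarity: float = 0.50) -> bool:
    """Apply the coverage, similarity, and completeness criteria to one pair."""
    return (pair.coverage >= min_coverage
            and pair.similarity >= min_similarity
            and pair.both_complete)
```

Checking HSP compatibility with a global alignment is not shown here; it would have to run before the coverage and similarity are computed.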

Page 12

About clusters that have no cut-edges

• Detects densely connected regions in large protein-protein similarity networks.
• Used to split large families.

Cut edges
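A cut edge (bridge) is an edge whose removal disconnects the graph, so removing all bridges leaves the densely connected regions. The following is a sketch of the standard depth-first bridge search, assuming a simple undirected similarity graph given as protein pairs; the function name and the example graph are illustrative and not taken from the talk.

```python
from typing import Dict, List, Optional, Set, Tuple


def find_cut_edges(edges: List[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the bridges of an undirected simple graph (no parallel edges)."""
    graph: Dict[str, List[str]] = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    order: Dict[str, int] = {}  # DFS discovery index of each node
    low: Dict[str, int] = {}    # lowest discovery index reachable from the subtree
    bridges: Set[Tuple[str, str]] = set()

    def visit(node: str, parent: Optional[str]) -> None:
        order[node] = low[node] = len(order)
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in order:
                low[node] = min(low[node], order[nxt])  # back edge
            else:
                visit(nxt, node)
                low[node] = min(low[node], low[nxt])
                if low[nxt] > order[node]:  # subtree cannot reach above this edge
                    bridges.add(tuple(sorted((node, nxt))))

    for start in graph:
        if start not in order:
            visit(start, None)
    return bridges


# Two triangles joined by a single edge: only that edge is a bridge.
example_edges = [("a", "b"), ("b", "c"), ("c", "a"),
                 ("c", "d"), ("d", "e"), ("e", "f"), ("f", "d")]
cut_edges = find_cut_edges(example_edges)
```

The recursion is fine for small examples; a network with many thousands of proteins would need an iterative version or a raised recursion limit.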

Page 13

About triangles of mutually consistent, genome-specific best hits (BeTs)

• Triangle
• Mutually consistent
• Genome-specific

Science. 1997 Oct 24;278(5338):631-7. Review.

Page 14

Iterative multiple alignment

• Multiple sequence alignment with CLUSTAL W (1.83) using default values
• Bootstrapping with 500 replicates
• If a tree branch's bootstrap value is less than 50%, break the branch to get two subfamilies.
• Repeat the multiple sequence alignment with each subfamily's members until no branch in the family has a bootstrap value below 50% (11 rounds in total).

Page 15

Estimation of molecular time scale

• Inference of phylogenetic tree
• Calibration time
• Maximum likelihood estimation of protein divergence times

Page 16

Inference of phylogenetic tree

• Neighbor-Joining method with Poisson distance
• Prokaryotic or other non-animal sequences serve as the outgroup to find the root. In the absence of an outgroup sequence, the root is placed at the midpoint of the longest route connecting two proteins (midpoint rooting).
• Software: LINTREE by N. Takezaki

Mol Biol Evol. 1987 Jul;4(4):406-25.
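The Poisson distance used for the Neighbor-Joining tree is the Poisson-corrected proportion of differing sites, d = -ln(1 - p). A minimal sketch follows, assuming two already aligned sequences with "-" as the gap character; skipping gapped columns is an assumption of this example, not something the slides state.

```python
import math


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p) between two aligned protein sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    p = sum(x != y for x, y in pairs) / len(pairs)
    if p >= 1.0:
        raise ValueError("distance is undefined when every site differs")
    return -math.log(1.0 - p)
```

For example, one difference in four aligned sites gives p = 0.25 and d = -ln(0.75), which is slightly larger than p because the correction allows for multiple substitutions at the same site.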

Page 17

Calibration time

• Several calibration points
– Mouse-rat: 41 Mya
– Primate-rodent: 91 Mya
– Mammal-bird: 310 Mya
– Vertebrate-Drosophila: 993 Mya
– Vertebrate-nematodes: 1177 Mya
– Animal-plant-fungi: 1576 Mya
• Mapping them to the phylogenetic tree manually
• Mark each orthologous group with an evolutionary rate group

1399_1.trees

To: ppt 21

Nat Rev Genet. 2002 Nov;3(11):838-49. Review.
Trends Genet. 2003 Apr;19(4):200-6.

Page 18

Maximum likelihood estimation of protein divergence times

• Specify an empirical model: mtREV24.dat.
• The gamma shape parameter is estimated by the software itself.
• Both the global clock and the local clock are used (as a robustness test).
• Software: PAML 3.14 by Ziheng Yang

Syst. Biol. 52(5):705-716, 2003

Page 19

Global clock vs. Local clock

[Figure: two panels, “Global clock” and “Local clock”]

Page 20

Global clock vs. Local clock

The Pearson correlation coefficient is 0.7439441 (p < 2.2e-16).

Legend: y = x; regression lines for local clock vs. global clock.

Page 21

Duplication events detection

• Outparalogs[6]: paralogs in the given lineage that evolved by gene duplications that happened before the radiation (speciation) event.
• Orthologous groups associated with a duplication event have at least two paralogs from different species.
• Exclude gene families that are sharply in conflict with the uncontested animal phylogeny.
• We identified 1651 duplication events in the final data set of 786 gene families. Every duplication event was annotated with the time point at which it happened.
• As 31 duplication events are dated to more than 4.5 Gya, we keep only the time points of the remaining 1620 duplication events.

See: ppt 17

Page 22

Distribution of the Taxonomy

• 100% mouse
• 97% rat
• 92% human
• 60% chicken
• 27% fly
• 23% worm
• 10% cress
• 6% fission yeast
• 6% baker yeast

Page 23

Data processing

• Overall distribution
• Duplication and the extinction/origination
• Periodogram analysis (FFT)

Page 24

Overall distribution

Control

Kernel density estimates with the Gaussian method.
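A Gaussian kernel density estimate places a normal curve on every dated duplication event and averages them. The sketch below is a plain-Python illustration; the bandwidth is passed in explicitly because the slides do not say how it was chosen, and the function name and grid are assumptions.

```python
import math
from typing import List, Sequence


def gaussian_kde(samples: Sequence[float], grid: Sequence[float],
                 bandwidth: float) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    if bandwidth <= 0 or not samples:
        raise ValueError("need a positive bandwidth and at least one sample")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Hypothetical duplication times in Gya, evaluated on a regular grid.
example_times = [0.5, 1.0, 2.5]
example_grid = [i * 0.01 - 10.0 for i in range(2301)]
density = gaussian_kde(example_times, example_grid, bandwidth=0.3)
```

The estimated curve integrates to one, so profiles built from different numbers of events (for instance the observed profile and a resampled control) can be compared on the same scale.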

Page 25

Result …

• About the control
– Randomly sample 1620 time points, without replacement, from all the nodes marked with a time point to generate a distribution.
– Repeat 10,000 times to get 10,000 randomly generated profiles.
– Compute an average distribution from the generated distributions by taking the mean of every bin. (The red line in the graph is this average random distribution.)
– The distance/correlation between each randomly generated or observed distribution and the average distribution is calculated. From the distribution of the distance/correlation, p << 0.00001.
– By ~2.75 Gya, the observed distribution deviates from the control. We use the data after 2.75 Gya in what follows.
• Strikingly, the overall distribution of duplications after 2.75 Gya is not uniform. (D = 0.5318, p < 2.2e-16, Kolmogorov-Smirnov test)
• The distribution of the data conforms to a random walk.
– A random walk is a model of the form $y_{i+1} = y_i + \varepsilon_{i+1}$.
– The sequence of ε is obtained and a KS uniformity test is applied to it. As D = 0.9982, p = 0.2730, we can't reject the null hypothesis. (Note: the statistics here are wrong. What was actually run at the time was ks.test(x, max(x), min(x)). The proper test would be ks.test(x, "punif", min(x), max(x)), but the data do not pass it statistically. Alternatively, Box.test() could be used to test for white noise.)
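The uniformity check on the increments ε can be sketched in plain Python as below. This is not the R code from the talk: the function names are hypothetical, the statistic is the usual one-sample Kolmogorov-Smirnov D against Uniform(low, high), and the critical value uses the common large-sample approximation 1.358/sqrt(n) for α = 0.05, which is an assumption of this example.

```python
import math
from typing import List, Sequence


def increments(series: Sequence[float]) -> List[float]:
    """Steps of the series: eps[i] = y[i+1] - y[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def ks_uniform_statistic(values: Sequence[float], low: float, high: float) -> float:
    """One-sample Kolmogorov-Smirnov D of `values` against Uniform(low, high)."""
    if not values or high <= low:
        raise ValueError("need data and low < high")
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = min(1.0, max(0.0, (x - low) / (high - low)))
        # Compare the uniform CDF with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def ks_critical_value(n: int, c_alpha: float = 1.358) -> float:
    """Approximate large-sample critical value of D (default c_alpha is for alpha = 0.05)."""
    return c_alpha / math.sqrt(n)
```

Uniformity would be rejected at the 5% level when D exceeds the critical value. Note that estimating `low` and `high` from the same data, as min(x) and max(x), makes the test only approximate.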

Page 26

Discussion …

• ~2.75 Gya is a very important time point in the rise of atmospheric oxygen. There are two scenarios surrounding this question[7]. Our data show that something changed at ~2.75 Gya, consistent with the evolution of oxygenic photosynthesis by 2.7 Gya, which is supported by organic biomarker and carbon stable isotope evidence. In this scenario, TM gene duplication increased when the oxygen content of the air changed (e.g., flowering plants (~0.146 Gya), plastids (~1.58 Gya), mitochondria (~1.8 Gya), et al.).
– Two Great Oxidation Events[8]: 2.0~2.4 Gya; 0.55~0.8 Gya
– Snowball Earth[9]: 0.58~0.75 Gya
• The emergence of plastids/mitochondria may have played an important role in TM protein evolution. Organelles have more membrane structure. The rise of complex multicellular life (1~1.5 Gya) is also a cause[10].
• The rate of TM protein duplication is non-uniform. This agrees with the finding that both large- and small-scale duplications occur in evolution.
• The random-walk model of the distribution suggests that either these variables were correlated with environmental variables that follow a random walk, or so many mechanisms were affecting these variables, in different ways, that the resultant trends appear random.[11]

Page 27

Duplication and the extinction/origination

[Figure with two “?” marks]

Page 28

Result …

• About the extinctions
– Early Cambrian (512 Mya)
– End-Ordovician (439 Mya)
– Frasnian-Famennian (376 Mya)
– End-Permian (251 Mya)
– End-Triassic (206 Mya)
– Cretaceous-Tertiary (65 Mya)
• Almost all the major mass extinctions correspond to a duplication peak, but two peaks have no corresponding extinction record.
• Based on the fossil data of marine animals, origination/extinction rates were computed by linear interpolation for the appropriate times. The correlations between origination/extinction rates and duplication number were calculated.
– Extinction rates display a positive correlation with the duplication profile, but it is not significant. (r = 0.0259369, p = 0.5483) (r = 0.07933089, p = 0.4144)
– Origination rates show a significant negative correlation with the duplication profile. (r = -0.1546602, p = 0.0003174) (r = -0.1230396, p = 0.2046)
– For diversity (r = -0.3018349, p = 0.0015)
• Kernel density estimates. (Genetics 147:1965-1975)
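The two numerical steps named on this slide, linear interpolation of the fossil rates onto the duplication time points and a Pearson correlation, can be sketched as follows. The function names and the toy numbers in the usage note are illustrative assumptions; no fossil or duplication data from the talk are reproduced here.

```python
import math
from bisect import bisect_left
from typing import Sequence


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linearly interpolate y at x; xs must be strictly increasing and bracket x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the tabulated range")
    j = bisect_left(xs, x)  # first index with xs[j] >= x
    if xs[j] == x:
        return ys[j]
    x0, x1, y0, y1 = xs[j - 1], xs[j], ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)
```

A typical use would interpolate a tabulated rate at each time where the duplication density is evaluated, for example `rates = [interpolate(t, stage_times, stage_rates) for t in grid_times]`, and then call `pearson_r(rates, duplication_density)`; all four names in that line are placeholders.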

Page 29

Discussion …

• A funny and plausible model (creation by extinction) (divergent resolution?)
– When the environment changes dramatically, the populations of most species become smaller or even go extinct (extinction). In the gene duplication model, sudden and varied positive selection will fix more new duplicates through neo/subfunctionalization. On the other hand, a change that is deleterious to a gene's function readily escapes purifying selection in a small population[12]. In the population, both redundancy and robustness increase[13]. So the genome structure isn't an optimized one, but it is good for survival (note: TM proteins mostly belong to dosage-sensitive genes). When the environment levels off, the population will increase and migrate. A redundant genome will subfunctionalize some duplicates. At this time, most new species will emerge. (Is this one possible logic linking duplication, extinction, and origination?)
– The correlation analysis between origination/extinction rates and the duplication profile may need more data. But it can already say something.
• Life is not only a passive process, especially the ecosystem. (About the two conflicts in the figure.) (Consistent with the evolution of oxygenic photosynthesis.)
– ~0.3 Gya, gymnosperms begin to diversify widely.
– ~0.13 Gya, angiosperms evolve flowers, structures that attract insects and other animals to spread pollen. The evolution of the angiosperms causes a major burst of animal evolution.

Nature, vol 400, 58~61 (for flowering plants)

Page 30

Periodogram analysis (FFT)

$fit = a \cdot time^{3} + b \cdot time^{2} + c \cdot time + d$

a = 56.72948; b = -51.95965; c = 12.79374; d = 0.07294
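The cubic trend is what gets subtracted before the periodogram is computed. The sketch below evaluates a cubic with the four coefficients from this slide and removes it from a series; it assumes that time is measured in Gya (the slides give the period in Gya but do not state the unit of the trend variable), and the function names are hypothetical.

```python
from typing import List, Sequence


def cubic_trend(time_gya: float, a: float = 56.72948, b: float = -51.95965,
                c: float = 12.79374, d: float = 0.07294) -> float:
    """Evaluate a*t^3 + b*t^2 + c*t + d with Horner's scheme."""
    return ((a * time_gya + b) * time_gya + c) * time_gya + d


def detrend(times: Sequence[float], values: Sequence[float]) -> List[float]:
    """Subtract the cubic trend from each observation."""
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    return [v - cubic_trend(t) for t, v in zip(times, values)]
```

The residuals returned by `detrend` are the series in which a periodic component would then be searched.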

Page 31

FFT …

$fit = Amp \cdot \sin\left(\frac{2\pi}{Period} \cdot time + Phase\right)$

Amp = 0.15357741
Period = 0.06230346 Gya
Phase = 1.09601472 (radians)

Accounts for 8.5% of the variance (> 5%).
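Finding the dominant period of a detrended, evenly sampled series can be sketched with a direct discrete Fourier transform. This is a naive O(n²) illustration rather than an FFT, the function names are assumptions, and the synthetic series in the example only borrows the order of magnitude of the amplitude and period reported above.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def power_spectrum(values: Sequence[float], step: float) -> List[Tuple[float, float]]:
    """Return (period, power) pairs for DFT harmonics 1..n//2 of the mean-removed series."""
    n = len(values)
    mean = sum(values) / n
    centred = [v - mean for v in values]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        spectrum.append((n * step / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(values: Sequence[float], step: float) -> float:
    """Period of the harmonic that carries the most power."""
    return max(power_spectrum(values, step), key=lambda pair: pair[1])[0]


# Synthetic check: 124 samples every 0.004 Gya holding exactly 8 cycles of 0.062 Gya.
synthetic = [0.15 * math.sin(2.0 * math.pi * (t * 0.004) / 0.062 + 1.1)
             for t in range(124)]
best_period = dominant_period(synthetic, step=0.004)
```

Only periods of the form (n · step) / k can be resolved this way, so a reported period such as 0.06230346 Gya would come from fitting the sinusoid after the spectral peak has been located.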

Page 32

FFT … Model R/W Monte Carlo simulation

[Figure: Monte Carlo simulations under the R model (P = 0.1294) and the W model (P = 0.0138); α = 0.05 marked in each panel; curves labelled “control” and “observation”.]
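A Monte Carlo significance test of a spectral peak compares the observed power with the powers obtained from many simulated null series. The sketch below is a generic illustration under stated assumptions: the null series are random walks built by shuffling the observed steps, the p-value uses the common (k + 1)/(n + 1) convention, and all function names are hypothetical. The slides do not define the R and W models, so this should not be read as either of them.

```python
import cmath
import math
import random
from typing import Callable, List, Sequence


def harmonic_power(values: Sequence[float], k: int) -> float:
    """Power of the k-th DFT harmonic of the mean-removed series."""
    n = len(values)
    mean = sum(values) / n
    coeff = sum((v - mean) * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
    return abs(coeff) ** 2 / n


def shuffled_walk(series: Sequence[float], rng: random.Random) -> List[float]:
    """Null series: shuffle the observed steps and accumulate them again."""
    steps = [b - a for a, b in zip(series, series[1:])]
    rng.shuffle(steps)
    walk = [series[0]]
    for step in steps:
        walk.append(walk[-1] + step)
    return walk


def monte_carlo_p_value(observed: float, simulate_stat: Callable[[], float],
                        n_sim: int) -> float:
    """Share of simulated statistics at least as large as the observed one."""
    exceed = sum(simulate_stat() >= observed for _ in range(n_sim))
    return (exceed + 1) / (n_sim + 1)


# Toy series (made-up numbers): a cycle of 8 samples on top of a linear drift.
rng = random.Random(0)
toy_series = [math.sin(2.0 * math.pi * t / 8.0) + 0.05 * t for t in range(64)]
observed_power = harmonic_power(toy_series, 8)
p_value = monte_carlo_p_value(
    observed_power,
    lambda: harmonic_power(shuffled_walk(toy_series, rng), 8),
    n_sim=200,
)
```

With ten thousand simulations, as reported in the talk, the smallest p-value this convention can return is 1/10001.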

Page 33

Result …

• A 0.062 Gya cycle is evident in the Phanerozoic in the Fourier spectrum, but we can't reject the random walk null hypothesis. (R: p = 0.1294; W: p = 0.0138; V: 8.52%)
– Several others: 0.0912 Gya (0.2956/0.0039, 10.63%); 0.0275 Gya (0.0071/0.1037, 4.22%); 0.0162 Gya (1e-4/0.1047).
– Ten thousand Monte Carlo simulations were done.
• Overall periodogram after ~2.75 Gya
– Not a good question. It is difficult to choose an appropriate trend function to detrend the data.
• The phase differs between the 62-Myr cycles of fossil diversity and of duplication.
– 5.21 (radians) − 1.1 (radians) ≈ 4.1 (radians) ≈ 1.305π

Nature. 2005 Mar 10;434(7030):208-10
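A phase difference between two waves of the same period can be converted into a time offset as Δφ / (2π) × period. The sketch below uses the rounded phases from this slide (5.21 and 1.1 radians) and a 62-Myr period; the function name is an assumption, and no claim is made here about which series leads the other.

```python
import math
from typing import Tuple


def phase_lag(phase_a: float, phase_b: float, period: float) -> Tuple[float, float]:
    """Return the phase difference in [0, 2*pi) and the matching time offset."""
    delta = (phase_a - phase_b) % (2.0 * math.pi)
    return delta, delta / (2.0 * math.pi) * period


delta_rad, lag_myr = phase_lag(5.21, 1.1, 62.0)
delta_in_pi = delta_rad / math.pi
```

With these inputs the difference is about 4.1 radians, roughly 1.3π, which corresponds to an offset of about 40 Myr within the 62-Myr cycle.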

Page 34

Discussion …

• The 62-million-year wave is surprisingly strong and, so far, there is no good explanation for it (the wave from the GOD ^_^). We have detected it in an independent data set by applying the same trend functions. Is it a chicken-and-egg question?
– It raises some essential questions about life and the environment. What causes it?
– We offer a second way to discuss this question.
• About the phase shift
– 1.305π ≠ π. In my story it should be 1.5π, but that is not the case.
– The phase shift indicates asynchrony between the duplication profile and genus diversity.

Nature. 2005 Mar 10;434(7030):208-10

Page 35

Some references

• [1] Jianzhi Zhang. Evolution by gene duplication: an update. Trends in Ecology and Evolution 18, 292-298 (2003).
• [2] Michael Lynch & Vaishali Katju. The altered evolutionary trajectories of gene duplicates. Trends in Genetics 20, 544-549 (2004).
• [3] David Jablonski. The interplay of physical and biotic factors in macroevolution. Evolution on Planet Earth (book).
• [4] U Lehnert, Y Xia et al. Computational analysis of membrane proteins: genomic occurrence, structure prediction and helix interactions. Quarterly Reviews of Biophysics (in press).
• [5] Lynch M & Conery JS. The evolutionary fate and consequences of duplicate genes. Science 290(5494), 1151-5 (2000).
• [6] Sonnhammer EL, Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet 18(12), 619-20 (2002).
• [7] Canfield DE, Habicht KS, Thamdrup B. The Archean sulfur cycle and the early history of atmospheric oxygen. Science. 2000 Apr 28;288(5466):658-61.
• [8] Hayes JM. Biogeochemistry: a lowdown on oxygen. Nature. 2002 May 9;417(6885):127-8.
• [9] Hoffman PF, Kaufman AJ, Halverson GP, Schrag DP. A Neoproterozoic snowball Earth. Science. 1998 Aug 28;281(5381):1342-6.
• [10] Hedges SB, Blair JE, Venturi ML, Shoe JL. A molecular timescale of eukaryote evolution and the rise of complex multicellular life. BMC Evol Biol. 2004 Jan 28;4(1):2.
• [11] Cornette JL, Lieberman BS. Random walks in the history of life. Proc Natl Acad Sci U S A. 2004 Jan 6;101(1):187-91.
• [12] Sidow A. Gen(om)e duplications in the evolution of early vertebrates. Curr Opin Genet Dev. 1996 Dec;6(6):715-22.
• [13] Gu Z, Steinmetz LM, Gu X, Scharfe C, Davis RW, Li WH. Role of duplicate genes in genetic robustness against null mutations. Nature. 2003 Jan 2;421(6918):63-6.

Page 36

……

• Function clustering
• The methodology discussion
• ……

Page 37

Acknowledgements

• Dr Qi Wang, Prof Yixue Li
• Dr Qi Liu, Prof Gang Pei
• Ziliang Qian, Prof Tieliu Shi
• Yongzhang Zhu
• Guang Li
• PeiLin Jia
• Changzheng Dong
• Fudong Yu
• ……