PhyloPat phylogenetic pattern analysis of eukaryotic genes Tim Hulsen 2006-11-22


Page 1

PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Tim Hulsen

2006-11-22

Page 2

Goal

• Create a database of sequence relationships across several species (in this case: genes; in Protein World: proteins)

• Can be used for transferring information from model organisms to humans (-> drug testing)

• The database can be used for many other things too, such as analysis using phylogenetic patterns

Page 3

Introduction (1)

• Phylogenetic patterns show presence/absence of genes over a certain set of species, e.g. for 10 species: 0011101011

• Very useful for all kinds of evolutionary analyses:
  – Origin of certain genes
  – Deletion of certain genes
  – Clustering of genes with similar patterns: likely to have similar function / be in same pathway
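As an illustration of the pattern notation above, the following minimal Python sketch builds a presence/absence string from a list of species in which a gene occurs. The species names (`sp1` to `sp10`) and the function name are hypothetical placeholders, not part of PhyloPat.

```python
def phylogenetic_pattern(species_order, species_with_gene):
    """Return a string with '1' where the gene is present and '0' where it is absent.

    species_order: all species, in the fixed order used for every pattern.
    species_with_gene: the species in which the gene (or an ortholog) was found.
    """
    present = set(species_with_gene)
    return "".join("1" if sp in present else "0" for sp in species_order)


# Hypothetical set of 10 species, named sp1..sp10 for illustration only.
species_order = [f"sp{i}" for i in range(1, 11)]
pattern = phylogenetic_pattern(
    species_order, ["sp3", "sp4", "sp5", "sp7", "sp9", "sp10"]
)
print(pattern)  # 0011101011
```

Keeping the species order fixed is what makes two patterns directly comparable position by position.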

Page 4

Introduction (2)

• Earlier phylogenetic pattern initiatives:
  – Phylogenetic Pattern Search (PPS), incorporated into COG (Natale et al., 2000)
  – Extended Phylogenetic Patterns Search (EPPS) (Reichard & Kaufmann, 2003)
  – Incorporated into OrthoMCL-DB (Chen et al., 2006)

• All applied on proteins, not on genes! -> PhyloPat: phylogenetic pattern analysis of eukaryotic genes

Page 5

Method

• Genes: easier to check for lineage-specific expansions (no alternative transcripts or splice forms); less redundant

• Basis: Ensembl (EnsMart) database: 21 fully available genomes (i.e. no Pre! versions or low coverage genomes): S. cer. to H. sap.

• Make use of the accurate Ensembl orthology pipeline (combination of BLAST, Smith-Waterman (SW), MUSCLE and PHYML)

• Single linkage cluster algorithm: create orthologous groups containing ALL genes in Ensembl
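The single linkage step can be sketched with a union-find structure: any two genes connected by a chain of orthology pairs end up in the same group, and genes without orthologs remain as singleton groups. This is a minimal sketch under those assumptions, not the PhyloPat implementation; the gene identifiers below are invented.

```python
from collections import defaultdict


def single_linkage_groups(genes, orthologies):
    """Cluster genes into groups by single linkage over pairwise orthologies.

    genes: iterable of all gene IDs (genes without orthologs become singletons).
    orthologies: iterable of (gene_a, gene_b) pairs.
    """
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the group representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in orthologies:
        # Genes that only appear in the pair list are added on the fly.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for g in parent:
        groups[find(g)].add(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical gene IDs: h = human, m = mouse, r = rat, y = yeast.
genes = ["h1", "m1", "r1", "h2", "y1"]
orthologies = [("h1", "m1"), ("m1", "r1")]
print(single_linkage_groups(genes, orthologies))
# [['h1', 'm1', 'r1'], ['h2'], ['y1']]
```

Note that h1 and r1 share a group although no direct h1-r1 pair was given; that transitivity is the defining property of single linkage, and it can also merge groups through a single spurious pair.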

Page 6

Results

• 446,825 genes were clustered into 147,922 groups, using 3,164,088 orthologies from 21 species

• Species are ordered from ‘low’ to ‘high’, i.e. by approximate distance to human

• Can be queried in several ways

• Output in HTML, Excel or plain text format

Page 7

Web interface

http://www.cmbi.ru.nl/phylopat

Page 8

Pattern/ID Search

• Binary string: 0=absent, 1=present, *=absent/present, e.g. ‘00000********11111111’: must be absent in non-chordata, must be present in all mammals

• MySQL regular expression, e.g. ‘^0*1{10}0*$’ gives all genes that occur only in ten consecutive species

• Input list of Ensembl/EMBL IDs (PhyloPat contains EMBL to Ensembl mapping)
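The two query styles above can be mimicked in a few lines of Python. This is only a sketch of the matching logic, assuming that `*` stands for "either 0 or 1"; the function names are hypothetical, and the three group patterns are taken from the Hox case study later in this talk. Python's `re` module is used here in place of MySQL's regular expression matching.

```python
import re


def wildcard_to_regex(query):
    """Translate a 0/1/* query string into an anchored regular expression."""
    if not re.fullmatch(r"[01*]+", query):
        raise ValueError("query may only contain 0, 1 and *")
    return "^" + query.replace("*", "[01]") + "$"


def search_patterns(patterns_by_group, regex):
    """Return the IDs of all groups whose pattern matches the regex."""
    compiled = re.compile(regex)
    return sorted(g for g, p in patterns_by_group.items() if compiled.search(p))


# Patterns of three groups from the Hox case study.
patterns_by_group = {
    "PP022041": "011111111111111111111",
    "PP053836": "000000011111111111111",
    "PP075622": "000000000010001111111",
}

# Absent in the first five species, present in the last eight.
regex = wildcard_to_regex("00000********11111111")
print(search_patterns(patterns_by_group, regex))  # ['PP053836']

# Present in exactly ten consecutive species and nowhere else.
ten_in_a_row = r"^0*1{10}0*$"
print(bool(re.search(ten_in_a_row, "000001111111111000000")))  # True
print(bool(re.search(ten_in_a_row, "000000011111111111111")))  # False
```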

Page 9

Output

Page 10

Phylogenetic Tree

Page 11

Oligo-/Polypresent Genes

• Oligopresent: present in only one/two species (oligo=few), e.g. ‘000000010000000000100’

• These two species should be highly related

     species pair       count   divergence   reference
  1. C. sav   C. int    1737    100 Mya      (Boffelli et al., 2004)
  2. T. nig   T. rub    1572    85 Mya       (Yamanoue et al., 2006)
  3. A. gam   A. aeg    1058    140 Mya      (Service, 1993)
  4. P. tro   H. sap    887     6 Mya        (Glazko & Nei, 2003)
  5. R. nor   M. mus    713     20 Mya       (Springer et al., 2003)

• Polypresent: present in all species, except for one/two (poly=many), e.g. ‘111110111110111111111’

• These two species should be related too; similar analysis possible
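The definitions on this slide (and the omnipresent case on the next one) translate into a simple classifier on the number of 0s and 1s in a pattern. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the thresholds (one or two species) are taken directly from the definitions above.

```python
def classify_pattern(pattern):
    """Classify a presence/absence string by how many species carry the gene."""
    n_present = pattern.count("1")
    n_absent = pattern.count("0")
    if n_absent == 0:
        return "omnipresent"   # present in all species
    if 1 <= n_present <= 2:
        return "oligopresent"  # present in only one or two species
    if 1 <= n_absent <= 2:
        return "polypresent"   # absent from only one or two species
    return "other"


def present_positions(pattern):
    """Return the 1-based positions of the species in which the gene is present."""
    return tuple(i for i, c in enumerate(pattern, start=1) if c == "1")


print(classify_pattern("000000010000000000100"))   # oligopresent
print(present_positions("000000010000000000100"))  # (8, 19)
print(classify_pattern("111110111110111111111"))   # polypresent
print(classify_pattern("1" * 21))                  # omnipresent
```

Counting how often each pair returned by `present_positions` occurs among the oligopresent groups would give a ranking of species pairs like the one in the table above.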

Page 12

Omnipresent genes

• Omnipresent: present in all 21 species (omni=all): ‘111111111111111111111’

• Currently 1001 omnipresent groups

• Tend to have very general/important functions, mostly involved in transcription/translation

Page 13

FatiGO analysis

• FatiGO: connection with GO terms, KEGG pathways, InterPro domains, etc. (Al-Shahrour et al., 2004)

• Analysis of all human genes in output by just one mouse click

• e.g. omnipresent genes:

Page 14

Other possibilities

• Anti-correlating patterns, e.g. ‘001111100011000000000’ and ‘110000011100111111111’: the genes could be completely different, or very similar (analogous)!

• Easy homology-inferred functional annotation (using information from other genes in the same lineage)
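Anti-correlating patterns are each other's bitwise complement, which is easy to check programmatically. A minimal sketch, with hypothetical function names, using the two example patterns from this slide:

```python
def complement(pattern):
    """Swap every 0 and 1 in a presence/absence string."""
    return pattern.translate(str.maketrans("01", "10"))


def are_anticorrelated(pattern_a, pattern_b):
    """True if one pattern is present exactly where the other is absent."""
    return len(pattern_a) == len(pattern_b) and complement(pattern_a) == pattern_b


a = "001111100011000000000"
b = "110000011100111111111"
print(complement(a))             # 110000011100111111111
print(are_anticorrelated(a, b))  # True
```

To find anti-correlating groups for a given query, one could compute its complement once and look it up as an ordinary pattern search.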

Page 15

Case study: Hox genes (1)

• Hox genes determine where limbs and other body segments will grow in a developing embryo

• Should exist mostly in vertebrates

• Expansion in teleost fish species (species 8-11); seven Hox clusters instead of the mammalian four

• Search Ensembl database for human genes with term ‘hox’ in annotation

• 44 genes found -> enter in PhyloPat -> 32 groups found (PP######)

Page 16

Case study: Hox genes (2)

PPID      # genes per species    phylogenetic pattern   gene name(s)
PP022041  011111136562233233222  011111111111111111111  MSX1, MSX2
PP024984  001000011111001111111  001000011111001111111  HOXC4
PP027791  001110023343233333333  001110011111111111111  TLX1, TLX2, TLX3
PP049478  000000221153112322223  000000111111111111111  HOXB8, HOXC8, HOXD8
PP053824  000000011120010101011  000000011110010101011  HOXD11
PP053827  000000022211111111111  000000011111111111111  HOXA10
PP053828  000000021111212122222  000000011111111111111  HOXC13, HOXD13
PP053829  000000063341122222222  000000011111111111111  HOXA1, HOXB1
PP053830  000000011110010111111  000000011110010111111  HOXB4
PP053832  000000021111011111111  000000011111011111111  HOXA5
PP053833  000000021110111111011  000000011110111111011  HOXB2
PP053834  000000031101011111111  000000011101011111111  HOXD3
PP053835  000000021110111111101  000000011110111111101  HOXA9
PP053836  000000021111111111111  000000011111111111111  HOXA3
PP053838  000000021110101111111  000000011110101111111  HOXC12
PP053839  000000011111111110111  000000011111111110111  HOXD4
PP053840  000000021111201011101  000000011111101011101  HOXC11
PP053842  000000043221111111111  000000011111111111111  HOXA13
PP053844  000000032231011111111  000000011111011111111  HOXB5
PP053845  000000021111111111011  000000011111111111011  HOXB3
PP053846  000000021121111111111  000000011111111111111  HOXD10
PP053847  000000022211111111111  000000011111111111111  HOXA2
PP053849  000000034151132333323  000000011111111111111  HOXA6, HOXB6, HOXC6
PP053853  000000011101111111011  000000011101111111011  HOXA4
PP053854  000000032252223133213  000000011111111111111  HOXB9, HOXC9, HOXD9
PP053858  000000011120011111111  000000011110011111111  HOXA11
PP070659  000000000121212222222  000000000111111111111  HOXA7, HOXB7
PP075622  000000000010001111111  000000000010001111111  HOXC5
PP084287  000000000001101111111  000000000001101111111  HOXC10
PP085049  000000000001011011111  000000000001011011111  HOXD1
PP087941  000000000000111011111  000000000000111011111  HOXD12
PP089685  000000000000111111111  000000000000111111111  HOXB13
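The two middle columns of this table are related in a simple way: the phylogenetic pattern has a 1 wherever the gene count for that species is non-zero. The sketch below shows that conversion. It assumes, as in the table, that every count is a single digit; the function name is hypothetical.

```python
def counts_to_pattern(counts):
    """Turn a string of per-species gene counts into a presence/absence pattern.

    Assumes one digit per species, as in the table above.
    """
    if not counts.isdigit():
        raise ValueError("counts must consist of digits only")
    return "".join("0" if c == "0" else "1" for c in counts)


# MSX1/MSX2 group (PP022041) and HOXC11 group (PP053840) from the table.
print(counts_to_pattern("011111136562233233222"))  # 011111111111111111111
print(counts_to_pattern("000000021111201011101"))  # 000000011111101011101
```

The count string carries more information than the pattern: positions with values above 1 point to possible lineage-specific expansions.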

Page 17

Case study: Hox genes (3)

PPID(s)                          name   cl.A    cl.B    cl.C    cl.D    first sp.   position   first sp. type
PP053829,085049                  HOX1   HOXA1   HOXB1   -       HOXD1   T. nigrov.  anterior   ‘first’ vertebrate
PP053847,053833                  HOX2   HOXA2   HOXB2   -       -       T. nigrov.  anterior   ‘first’ vertebrate
PP053836,053845,053834           HOX3   HOXA3   HOXB3   -       HOXD3   T. nigrov.  PG3        ‘first’ vertebrate
PP053832,053844,075622           HOX5   HOXA5   HOXB5   HOXC5   -       T. nigrov.  central    ‘first’ vertebrate
PP053849                         HOX6   HOXA6   HOXB6   HOXC6   -       T. nigrov.  central    ‘first’ vertebrate
PP053835,053854                  HOX9   HOXA9   HOXB9   HOXC9   HOXD9   T. nigrov.  posterior  ‘first’ vertebrate
PP053827,084287,053846           HOX10  HOXA10  -       HOXC10  HOXD10  T. nigrov.  posterior  ‘first’ vertebrate
PP053858,053840,053824           HOX11  HOXA11  -       HOXC11  HOXD11  T. nigrov.  posterior  ‘first’ vertebrate
PP053838,087941                  HOX12  -       -       HOXC12  HOXD12  T. nigrov.  posterior  ‘first’ vertebrate
PP053842,089685,053828           HOX13  HOXA13  HOXB13  HOXC13  HOXD13  T. nigrov.  posterior  ‘first’ vertebrate
PP053853,053830,024984,053839    HOX4   HOXA4   HOXB4   HOXC4   HOXD4   A. gamb.    central    non-vertebrate
PP027791                         TLX    TLX1    TLX2    TLX3    -       A. gamb.    -          non-vertebrate
PP070659                         HOX7   HOXA7   HOXB7   -       -       G. acul.    central    vertebrate
PP049478                         HOX8   -       HOXB8   HOXC8   HOXD8   C. intest.  central    non-vertebrate
PP022041                         MSX    MSX1    MSX2    -       -       C. eleg.    -          non-vertebrate
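The "first sp." column corresponds to the position of the first 1 in a group's phylogenetic pattern, read from ‘low’ to ‘high’ species. A minimal sketch of that lookup follows; the function name is hypothetical, the patterns come from the Hox table on the previous slide, and the result is a 1-based position because the full 21-species list is not reproduced in this transcript.

```python
def first_present_position(pattern):
    """Return the 1-based position of the first species carrying the gene.

    Returns None if the gene is absent from all species.
    """
    index = pattern.find("1")
    return index + 1 if index >= 0 else None


# Patterns of five groups from the Hox table.
hox_patterns = {
    "PP022041": "011111111111111111111",  # MSX
    "PP027791": "001110011111111111111",  # TLX
    "PP049478": "000000111111111111111",  # HOX8
    "PP053829": "000000011111111111111",  # HOXA1, HOXB1
    "PP070659": "000000000111111111111",  # HOX7
}

for group_id, pattern in hox_patterns.items():
    print(group_id, first_present_position(pattern))
# PP022041 2
# PP027791 3
# PP049478 7
# PP053829 8
# PP070659 10
```

With the species order at hand, these positions map onto the species in the table: position 2 is C. eleg., 3 is A. gamb., 7 is C. intest., 8 is T. nigrov. and 10 is G. acul.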

Page 18

Conclusions

• PhyloPat: quick and easy tool for phylogenetic pattern search on complete Ensembl database

• Also usable for study of lineage-specific expansions of genes

• Just updated to Ensembl v41 (released last Thursday); 5 new species: D. nov, E. tel, L. afr, O. cun, O. lat

• Extra option: gene neighborhood

Page 19

Gene neighborhood

Equal color = belonging to same orthologous group

Conservation of gene order = functionally related

Page 20

Future directions

• Map (drug discovery) pathways in model organisms and man to each other, to understand differences between species

• Now being applied in immunogenomics study within Organon: how does immune system evolve from model organisms to man?

Page 21

Acknowledgements

Supervision:
• Peter Groenen (supervisor)
• Jacob de Vlieg (head of group)

Fruitful discussions (suggestions):
• Wilco Fleuren
• Erik Franck
• Nanning de Jong
• Arnold Kuzniar

Page 22

Where to find

• Web interface:

http://www.cmbi.ru.nl/phylopat

(accessible through www.cmbi.ru.nl and www.nbic.nl)

• Publication:

Hulsen T., Groenen P.M.A., de Vlieg J.

BMC Bioinformatics 2006, 7: 398

http://www.biomedcentral.com/1471-2105/7/398

• Powered by Ensembl:

http://www.ensembl.org/info/about/ensembl_powered.html