High-throughput comparative genomics

High-throughput comparative genomics. 24th October 2013. Joe Parker, Queen Mary University of London





Topics

1. Introduction
2. Background: why phylogenomics?
3. Examples
4. Practice
5. Case study
6. On the horizon
7. Over the horizon


Aims

• Context of phylogenomics: Next-generation sequencing (NGS)

• Why phylogenomics?
• Practical analyses
• Future developments


1. Our Research


Lab Interests

• Ecology and evolution of traits
• Echolocation, sociality
• NGS data for population genetics and phylogenomics


Activities

• Phylogeny estimation/comparison
• Molecular correlates of evolution:
– site substitutions, dN/dS, composition
• Simulation
• Dataset limitations

(R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey


2. Background


Next-generation sequencing


Why phylogenomics, not -genetics?

• Causes of discordant signal
– Incomplete lineage sorting
– Lateral transfer
– Recombination
– Introgression
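
One way to put a number on discordant signal is to compare each gene tree with a reference topology. The sketch below is illustrative only: it assumes rooted trees written as nested tuples and counts the clades present in one tree but not the other (a rooted Robinson-Foulds distance). The taxon labels and tree shapes are invented for the example.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree given as nested tuples.

    Leaves are strings; each clade is a frozenset of the taxa below a node.
    """
    found = set()

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        taxa = frozenset().union(*(collect(child) for child in node))
        found.add(taxa)
        return taxa

    root = collect(tree)
    found.discard(root)  # the clade of all taxa is shared by every tree
    return found


def rooted_rf(tree_a, tree_b):
    """Number of clades found in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical reference topology and per-locus gene trees.
species_tree = ((("A", "B"), "C"), "D")
gene_trees = {
    "locus_1": ((("A", "B"), "C"), "D"),   # agrees with the reference
    "locus_2": ((("A", "C"), "B"), "D"),   # A groups with C instead of B
    "locus_3": (("A", "B"), ("C", "D")),   # C groups with D
}

for locus, tree in gene_trees.items():
    print(locus, rooted_rf(species_tree, tree))
```

A distance of zero means the gene tree matches the reference; the measure says nothing about which of the listed causes produced a mismatch.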


Quantitative biology

• Multiple configurations

• Hyperparameters empirically investigated

• Determine sensitivity of results
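
A minimal sketch of what investigating multiple configurations can look like in code: run the same analysis over every combination of settings and report how far the result moves. Everything here is hypothetical, including the scores, the toy trimmed-mean "analysis" and the parameter names `min_score` and `trim`.

```python
from itertools import product
from statistics import mean

# Hypothetical per-locus scores; real values would come from the analysis itself.
SCORES = [0.05, 0.10, 0.35, 0.40, 0.60, 0.80]


def trimmed_mean(min_score, trim):
    """Toy analysis: drop scores below min_score, trim `trim` values
    from each end of the sorted remainder, and average what is left."""
    kept = sorted(s for s in SCORES if s >= min_score)
    if trim:
        kept = kept[trim:-trim]
    return mean(kept)


def sweep(analysis, grid):
    """Run analysis(**settings) for every combination of values in grid."""
    names = sorted(grid)
    results = {}
    for values in product(*(grid[name] for name in names)):
        results[values] = analysis(**dict(zip(names, values)))
    return names, results


def result_range(results):
    """Spread of the estimate across all configurations."""
    return max(results.values()) - min(results.values())


grid = {"min_score": [0.0, 0.10], "trim": [0, 1]}
names, results = sweep(trimmed_mean, grid)
for values, estimate in sorted(results.items()):
    print(dict(zip(names, values)), round(estimate, 4))
print("range across configurations:", round(result_range(results), 4))
```

A small range suggests the conclusion is robust to these settings; a large one flags a setting that needs justification.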


Distributions

• Genome-scale data provides context

• Identify outliers: genes / taxa / trees

• Compare values across biological systems
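
As an illustration of using the genome-wide distribution as context, the sketch below flags genes whose statistic lies far from the median. It uses the modified z-score (median and median absolute deviation) with the conventional cut-off of 3.5. The gene names and values are invented, and this is one possible outlier rule, not necessarily the one used in the studies discussed here.

```python
from statistics import median


def flag_outliers(values_by_gene, threshold=3.5):
    """Return gene names whose modified z-score exceeds the threshold.

    Modified z-score: 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    values = list(values_by_gene.values())
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    return sorted(
        gene
        for gene, value in values_by_gene.items()
        if 0.6745 * abs(value - centre) / mad > threshold
    )


# Hypothetical per-gene dN/dS estimates.
dnds = {"gene1": 0.10, "gene2": 0.12, "gene3": 0.09, "gene4": 0.11, "gene5": 0.95}
print(flag_outliers(dnds))
```

The same function works for per-taxon or per-tree statistics, since it only needs a mapping from names to numbers.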


Integration with ‘Omics

• Multiple databases

• Functional data

• Bibliographic information


3. Example studies


Tsagkogeorga et al. (in press)


Salichos & Rokas (2013)


Backström et al. (2013)


Lindblad-Toh et al. (2011)


4. Practice


Source material

• Samples
• Storage
• Purification
• Library prep


Sequencing

• Genome
– Sanger
– Illumina
– Pyro / 454
– SOLiD
– PacBio

• Transcriptome / RNA-seq
– MyBAITS

• HiSeq / MiSeq
• IonTorrent


Infrastructure

• Desktop machines
• Computing clusters
• Grid systems
• Cloud-based computation


Assembly, Annotation

• Assembly
– To reference (mapping)
– De novo

• Annotation
– By homology
– De novo

• SOAPdenovo
• MAKER
• Velvet
• Bowtie / Cufflinks / TopHat
• Trinity


Alignment

• PRANK
• MUSCLE
• MAFFT
• Clustal


Phylogeny inference

• MrBayes
• RAxML
• BEAST
• MP-EST
• STAR


Phylogenetic analysis

• BEAST
• HYPHY
• PAML
• Pipelines
• LRT
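
The likelihood-ratio test (LRT) in the list above can be sketched in a few lines. For nested models, twice the difference in log-likelihood is compared with a chi-squared distribution whose degrees of freedom equal the number of extra free parameters. The function below is a minimal illustration that handles only 1 or 2 degrees of freedom, where the tail probability has a closed form; the log-likelihood values are invented, and in practice they would be read from the output of a program such as those listed.

```python
import math


def lrt(lnl_null, lnl_alt, df=1):
    """Return (test statistic, p-value) for a likelihood-ratio test.

    Uses closed-form chi-squared tail probabilities, so only df = 1 or 2
    is supported in this sketch.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return stat, 1.0  # the alternative fits no better than the null
    if df == 1:
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return stat, p_value


# Hypothetical log-likelihoods for a null and an alternative model.
stat, p_value = lrt(lnl_null=-10234.6, lnl_alt=-10229.1, df=2)
print(f"2*deltaLnL = {stat:.2f}, p = {p_value:.4f}")
```

For other degrees of freedom a statistics library would be the natural choice; the closed forms are used here only to keep the example dependency-free.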


5. Case study


Parker et al. (2013)

• De novo genomes:
– four taxa
– 2,321 protein-coding loci
– 801,301 codons

• Published:
– 18 genomes

• ~69,000 simulated datasets

• ~3,500 cluster cores


Our pipeline for detecting genome-wide convergence


mean = 0.05


mean = 0.05; mean = -0.01; mean = -0.08


Development cycle

• Design
• Wireframe & specify tests
• Implement
– Alignment: loadSequences(), getSubstitutions()
– Phylogeny: trimTaxa(), getMRCA()
– DataSeries: calculateECDF(), randomise()
– Regression: getResiduals(), predictInterval()
• Review, refine & refactor
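
To make the cycle concrete, here is a small Python sketch of one of the classes named on the slide, with its test specified alongside it. The method names mirror the slide's `calculateECDF()` and `randomise()`, but the bodies, signatures and the `seed` parameter are assumptions made for illustration; the pipeline's actual implementation is not shown in the talk.

```python
import random
from bisect import bisect_right


class DataSeries:
    """A series of observations with an empirical CDF and a shuffle."""

    def __init__(self, values):
        self.values = sorted(values)

    def calculate_ecdf(self, x):
        """Fraction of observations less than or equal to x."""
        if not self.values:
            raise ValueError("empty series has no ECDF")
        return bisect_right(self.values, x) / len(self.values)

    def randomise(self, seed=None):
        """Return a shuffled copy of the observations, e.g. for permutation tests."""
        shuffled = list(self.values)
        random.Random(seed).shuffle(shuffled)
        return shuffled


def test_data_series():
    """Specification written before (or with) the implementation."""
    series = DataSeries([3, 1, 2, 2])
    assert series.calculate_ecdf(0) == 0.0      # below every observation
    assert series.calculate_ecdf(2) == 0.75     # three of four values <= 2
    assert series.calculate_ecdf(3) == 1.0      # at the maximum
    assert sorted(series.randomise(seed=1)) == [1, 2, 2, 3]  # same values, new order


test_data_series()
print("DataSeries tests passed")
```

Writing the test as an executable specification means the refactoring step of the cycle can be checked automatically.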


Parker et al. (2013)



6. On the horizon


Environmental metagenomics


Models of computation

• Cloud resources: Unlimited flexibility, finite time

• Development trade-off
– Off-the-shelf
– Bespoke

• Exploratory work
– Real time genomic transects?

• Essential fundamental data missing from nearly every system:
– Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer


Serialisation

• Process data remotely

• Freeze-dry objects, download to desktop

• Implement new methods directly on previously-analysed data
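
The freeze-dry idea can be sketched with Python's standard `pickle` module: an analysed object is written to disk where it was computed, copied elsewhere, restored, and then passed to a method that did not exist when the data were first processed. This is an illustration under assumptions: the talk does not say which serialisation mechanism the pipeline uses, and the result dictionary, file name and `fraction_positive` function are hypothetical. Pickle files should only be loaded from trusted sources.

```python
import pickle
import tempfile
from pathlib import Path


def freeze(obj, path):
    """Serialise an object to a file."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Restore a previously serialised object."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def fraction_positive(result):
    """A 'new method' written after the original analysis was run."""
    scores = result["site_scores"]
    return sum(score > 0 for score in scores) / len(scores)


# Hypothetical analysed data for one locus.
analysed = {"locus": "locus_0001", "site_scores": [0.2, -0.1, 0.4, 0.0]}

with tempfile.TemporaryDirectory() as workdir:
    archive = Path(workdir) / "locus_0001.pkl"
    freeze(analysed, archive)      # on the cluster
    restored = thaw(archive)       # later, on the desktop

print(restored["locus"], fraction_positive(restored))
```

Because the expensive step is stored rather than repeated, new summaries can be tried on the desktop without rerunning the remote analysis.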


7. Over the horizon

• Real-time phylogenetics
• Field phylogenetics
• Alignment-free analyses
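
As a flavour of what an alignment-free analysis can look like, the sketch below compares two sequences by their k-mer frequency profiles rather than by aligning them. It is a toy example under assumptions: the choice of k = 3, the Euclidean distance and the sequences themselves are illustrative, and published alignment-free methods use a range of more refined statistics.

```python
from collections import Counter
from math import sqrt


def kmer_profile(sequence, k=3):
    """Relative frequency of each overlapping k-mer in a sequence."""
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is shorter than k")
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_distance(seq_a, seq_b, k=3):
    """Euclidean distance between two k-mer frequency profiles."""
    profile_a = kmer_profile(seq_a, k)
    profile_b = kmer_profile(seq_b, k)
    kmers = set(profile_a) | set(profile_b)
    return sqrt(sum((profile_a.get(m, 0.0) - profile_b.get(m, 0.0)) ** 2 for m in kmers))


# Hypothetical sequences; no alignment step is needed.
seq_1 = "ATGGCGTACGTTAGC"
seq_2 = "ATGGCGTACGATAGC"
seq_3 = "TTTTTTCCCCCCGGG"

print(kmer_distance(seq_1, seq_2))
print(kmer_distance(seq_1, seq_3))
```

A matrix of such pairwise distances could then be passed to a distance-based tree-building method.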


Conclusions

• Why phylogenomics?
• Practice
• Comparative approach
• Statistical context


Thanks

Steve Rossiter¹, James Cotton², Elia Stupka³ & Georgia Tsagkogeorga¹

¹ School of Biological and Chemical Sciences, Queen Mary, University of London
² Wellcome Trust Sanger Institute
³ Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan

Chris Walker & Dan Traynor, Queen Mary GridPP High-throughput Cluster

Chaz Mein & Anna Terry, Barts and The London Genome Centre

Mahesh Pancholi, School of Biological and Chemical Sciences

BBSRC (UK); Queen Mary, University of London


Resources

• My email: Joe Parker (Queen Mary University of London): [email protected]

• Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231 doi:10.1038/nature12511.

• Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A., & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera) Curr. Biol. in the press.

• Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331 doi:10.1038/nature12130

• Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033

• Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476–482 doi:10.1038/nature10530

• Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340 doi:10.1016/j.tree.2009.01.009

• The Tree Of Life: http://phylogenomics.blogspot.co.uk/

• RNA-seq For Everyone: http://rnaseq.uoregon.edu/index.html

• Evo-Phylo: http://www.davelunt.net/evophylo/tag/phylogenomics/

• OpenHelix: http://blog.openhelix.eu/

• Our blogs: http://evolve.sbcs.qmul.ac.uk/rossiter/ (lab) and http://www.lonelyjoeparker.com/?cat=11 (Joe)