“biomedical computing is entering an age where creative exploration

Upload: bridie

Post on 10-Jan-2016




“Biomedical computing is entering an age where creative exploration of huge amounts of data will lay the foundation of hypotheses. Much work must still be done to collect data and create the tools to analyse it. Bioinformatics, which provides the tools to extract and combine knowledge from isolated data, gives us ways to think about the vast amounts of information now available. It is changing the way biologists do science.”

From a report to Harold Varmus, June 3, 1999.

Data scale (bytes, as powers of ten):

10^3 Kilobytes

10^6 Megabytes

10^9 Gigabytes

10^12 Terabytes

10^15 Petabytes

10^18 Exabytes

10^21 Zettabytes

10^24 Yottabytes
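Each step in the scale above is a factor of 10^3 (SI decimal prefixes). A minimal Python sketch, assuming decimal rather than binary (1024-based) multiples, that expresses a raw byte count in the largest unit that fits; `human_bytes` and `SI_UNITS` are hypothetical names chosen for illustration.

```python
# Hypothetical helper: SI decimal prefixes, one per factor of 10**3 bytes.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_bytes(n: int) -> str:
    """Format a byte count in the largest SI unit whose value is at least 1."""
    if n < 0:
        raise ValueError("byte count must be non-negative")
    i = 0
    # Step up to the next prefix while the count still reaches 1000 of it.
    while i < len(SI_UNITS) - 1 and n >= 1000 ** (i + 1):
        i += 1
    # Dividing by an exact integer power avoids drift from repeated float division.
    return f"{n / 1000 ** i:g} {SI_UNITS[i]}"


print(human_bytes(3_000_000_000))  # 3 GB
print(human_bytes(10**24))         # 1 YB
```

Counts beyond the largest prefix simply stay in yottabytes (for example, 10^27 bytes prints as "1000 YB").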


GAATTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGAGCCCAAATGAGCGATAGATAGATAGATCGTGCGGCGATCTCGTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGGTTCTGGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGCTTACGATCGGGTTTTGGGCTTTGGTTGTGGCCTCCAGTTCTCTGGCTCGTTGCCTGTGCCAATTCAAGTGCGCATCCGGCCGTGTGTGTGGGCGCAATTATGTTTATTTACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCC
TTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGACTGGTAACTGGTAATTTGATCGATTCAAACGATTCTGGGTCTCCCCGGTTTTCTGTCCCGGTTCAATCTCGTAGAACTTGCCCTTGGTGGACAGTGGGACGTACAACACCTGCCGGTTTTCATTAAGCAGCTGGGCATACTTCTTTTCCTTCTCCCTTCCCATGTACCCACTGCCATGGGACCTGGTCGCATTGCCGTTGCCATGTTGCGACATATTGACCTGATCCTGTTTGCCATCCTCGAAGACGGCCAACAGACGGAATACCTGCCCGCCCCTTGCCGTCGTTTTCACGTACTGTGGTCGTCCCTTGTTTATGGGCAGGCATCCCTCGTGCGTTGGACTGCTCGTACTGTTGGGCGAGGATTCCGTAAACGCCGGCATGTTGTCCACTGAGACAAACTTGTAAACCCGTTCCCGAACCAGCTGTATCAGAGATCCGTATTGTGTGGCCGTGGGGAGACCCTTCTCGCTTAGCATCGAAAAGTAACCTGCGGGAATTCCACGGAAATGTCAGGAGATAGGAGAAGAAAACAGAACAACAGCAAATACTGTGCGGCGATCTCGTACTGGACGGAAATGTCAGGAGATAGGAGAAGAAAA

Figure: Nucleotide sequence database size in megabases (0 to 600) by year, 1982 to 1996.

The Human Proteome

• ~30,000 protein-coding genes

• Expansion of the number of different protein molecules due to:
  – (a) alternative splicing (30 to 50% increase);
  – (b) post-translational modifications (5- to 10-fold increase)

• There could well be about 1 million different protein molecules in the human body
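A back-of-the-envelope Python sketch that multiplies the slide's own factors at face value: ~30,000 genes, a 1.3x to 1.5x splicing factor and a 5x to 10x modification factor. It ignores every other source of protein diversity, and `protein_range` is a hypothetical name used only for illustration.

```python
# Factors taken from the slide; nothing else about the proteome is modelled.
GENES = 30_000
SPLICING = (1.3, 1.5)      # alternative splicing: 30 to 50% increase
MODIFICATION = (5, 10)     # post-translational modifications: 5- to 10-fold


def protein_range(genes, splicing, modification):
    """Return (low, high) counts from multiplying the low and high factors."""
    low = genes * splicing[0] * modification[0]
    high = genes * splicing[1] * modification[1]
    return round(low), round(high)


low, high = protein_range(GENES, SPLICING, MODIFICATION)
print(f"{low:,} to {high:,} distinct protein molecules")
# 195,000 to 450,000 distinct protein molecules
```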


Annotated genome

Figure: annotation plotted as depth of knowledge against breadth of knowledge. Detailed analysis (typically biological) of single genes adds depth; large-scale analysis (typically computational) of the entire genome adds breadth.


The two major methods of gene prediction

• sequence comparison

• ab initio


Approaches to gene finding: Generalized hidden Markov models
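Generalized HMMs, as used by gene finders such as Genscan, treat whole exons and introns as states with explicit length distributions. As a much simpler stand-in, this Python sketch uses an ordinary two-state HMM (non-coding vs coding) with Viterbi decoding to label each base. It is not a GHMM, and all probabilities are made-up illustrative values rather than trained parameters.

```python
import math

STATES = ("N", "C")  # hypothetical labels: N = non-coding, C = coding

# Illustrative parameters only (not trained): "coding" is GC-richer here.
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.3, "T": 0.3, "G": 0.2, "C": 0.2},
    "C": {"A": 0.2, "T": 0.2, "G": 0.3, "C": 0.3},
}


def viterbi(seq):
    """Return the most probable state path for seq (log-space Viterbi)."""
    # score[s] = best log-probability of any path ending in state s so far
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        prev = score
        score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            pointers[s] = best
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


print(viterbi("ATATATGCGCGCGCATATAT"))
```

A real gene finder would add states for start and stop codons, splice sites and reading frames, which is where the generalized (length-aware) model earns its keep.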


Limitations of Gene Prediction Programs

• Good at predicting ORF-containing sequence

• Prediction of exact exon-intron boundaries difficult

• Fuse adjacent genes or split single genes

• Cannot predict UTRs

• Cannot predict nested genes
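Detecting ORFs is the easy part. A minimal Python sketch, assuming the standard genetic code (ATG start; TAA, TAG, TGA stops), the forward strand only and a hypothetical minimum length; `find_orfs` is an illustrative name. Nothing in it can see exon-intron boundaries, UTRs or nested genes.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard-code stop codons


def find_orfs(seq, min_codons=30):
    """Return sorted (start, end) ATG...stop ORFs on the forward strand.

    Coordinates are 0-based with `end` exclusive; length counts include the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGCCCTGACC", min_codons=1))  # [(2, 11)]
```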


Computational Analysis

Fly Alignments

• Known genes/cDNAs

• ESTs

• Transposons

Cross-species Sequence Similarities

Proteins & ESTs:

• Fly

• Primate

• Rodent

• Worm

• Yeast

• Plant

• Other Insects

• Other Vertebrates

• Other Invertebrates

Gene Predictions

• Genie

• Genscan

• tRNAscan-SE


Drosophila Gene Collection 1 Pavel Tomancak


• Embryonic expression of wild-type eve (rust) and a transgene containing the stripe 3 + 7 tertiary element (blue)

• Alignment of eve 5’ regulatory region

• D. melanogaster vs (A) D. erecta, (B) D. pseudoobscura, (C) D. willistoni and (D) D. littoralis

Figure labels: stripe 3 + 7; eve.


Gene_Ontology

FlyBase - Drosophila - Cambridge & EBI, Harvard, Berkeley & Bloomington.

Saccharomyces Genome Database - Stanford.

Mouse Genome Informatics - Jackson Labs.

The Arabidopsis Information Resource - Stanford.

WormBase - Caltech & CSHL.

DictyBase - Chicago.

SwissProt - Hinxton & Geneva.

The Institute for Genomic Research - MD.

With support from NIH (NHGRI) & AstraZeneca.


The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing.


What is an Ontology?

An ontology is a specification of a conceptualization that is designed for reuse across multiple applications and implementations. …a specification of a conceptualization is a written, formal description of a set of concepts and relationships in a domain of interest.

Peter Karp (2000) Bioinformatics 16:269


• The Gene Ontology Consortium subscribes to the Manifesto of Liberation Bioinformatics:

• Open source

• Open standards

• Open annotation

• Open data

• Thanks to Tim Hubbard, liberationist extraordinaire of ’Inxton


Introduction to GO

GO: A Gene Ontology

GO Objectives:

Provide a controlled vocabulary for the description of the molecular function and cellular location of gene products, as well as the role of the gene products in basic biological processes

Use these terms as attributes of gene products in the collaborating databases

Allow queries across databases using GO terms, providing the linking of biological information across species
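The third objective, querying across databases by a shared GO term, can be pictured with a toy annotation table. In this Python sketch the database names echo the consortium members, but the gene product identifiers are hypothetical placeholders, GO terms are shown by name rather than by real identifier, and `query_by_term` is an illustrative function.

```python
from collections import defaultdict
from typing import NamedTuple


class Annotation(NamedTuple):
    database: str      # contributing database
    gene_product: str  # hypothetical identifier, for illustration only
    go_term: str       # GO term name used as a shared attribute


ANNOTATIONS = [
    Annotation("FlyBase", "fly_gene_1", "nucleus"),
    Annotation("SGD", "yeast_gene_1", "nucleus"),
    Annotation("MGI", "mouse_gene_1", "nucleus"),
    Annotation("MGI", "mouse_gene_2", "cytoplasm"),
]


def query_by_term(annotations, term):
    """Group gene products annotated with `term` by their source database."""
    hits = defaultdict(list)
    for ann in annotations:
        if ann.go_term == term:
            hits[ann.database].append(ann.gene_product)
    return dict(hits)


print(query_by_term(ANNOTATIONS, "nucleus"))
# {'FlyBase': ['fly_gene_1'], 'SGD': ['yeast_gene_1'], 'MGI': ['mouse_gene_1']}
```

Because every database uses the same vocabulary, one term links fly, yeast and mouse gene products without any species-specific mapping.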


GO = Three Ontologies

• Biological Process = goal or objective within cell

• Molecular Function = elemental activity or task

• Cellular Component = location or complex


Parent-Child Relationships

• Hierarchy: one-to-many parental relationship. Each child has only one parent.

• Directed acyclic graph (DAG): many-to-many parental relationship. Each child may have one or more parents.
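The difference between the two structures can be checked mechanically. A minimal Python sketch, assuming a term graph stored as a dictionary from each child to the list of its direct parents; the function `classify` and the single-letter terms are hypothetical.

```python
def classify(parents):
    """Classify a child -> parents mapping as 'hierarchy', 'dag' or 'cyclic'.

    `parents` maps each term to the list of its direct parents (roots map to []).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for parent in parents.get(node, []):
            state = colour.get(parent, WHITE)
            if state == GREY:
                return True  # reached a node on the current path: a cycle
            if state == WHITE and visit(parent):
                return True
        colour[node] = BLACK
        return False

    for node in parents:
        if colour.get(node, WHITE) == WHITE and visit(node):
            return "cyclic"
    # Acyclic: a hierarchy if no child has more than one parent.
    if all(len(ps) <= 1 for ps in parents.values()):
        return "hierarchy"
    return "dag"


tree = {"root": [], "a": ["root"], "b": ["root"], "c": ["a"]}
dag = {"root": [], "a": ["root"], "b": ["root"], "c": ["a", "b"]}
print(classify(tree), classify(dag), classify({"a": ["b"], "b": ["a"]}))
# hierarchy dag cyclic
```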


Classes of parent-child relationship:

• ISA (hyponymy) - as in: an elephant is a mammal.

• PARTOF (meronymy) - as in: a trunk is part of an elephant.

Structure of the Ontologies

cellular_component

%membrane %vacuolar membrane %nuclear membrane %intracellular %cell <cytoplasm <vacuole <vacuolar membrane <vacuolar lumen <nucleus <nuclear membrane

Figure: the same cellular_component terms (membrane, vacuolar membrane, nuclear membrane, intracellular, cell, cytoplasm, vacuole, vacuolar lumen, nucleus) drawn as a graph.

Key: instance of (%), part of (<).
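In the excerpt, vacuolar membrane appears twice: under membrane with % and under vacuole with <. This is the multiple-parent case a DAG allows. A minimal Python sketch that stores typed (child, relation, parent) edges and collects ancestors along chosen relation types. It assumes % corresponds to ISA and < to PARTOF. The edges are an illustrative subset, because the exact nesting of the excerpt was lost in extraction. `ancestors` and `EDGES` are hypothetical names.

```python
from collections import deque

ISA, PARTOF = "isa", "partof"

# (child, relation, parent) triples: an illustrative subset, not the full GO graph.
EDGES = [
    ("membrane", ISA, "cellular_component"),
    ("vacuolar membrane", ISA, "membrane"),
    ("nuclear membrane", ISA, "membrane"),
    ("vacuolar membrane", PARTOF, "vacuole"),
    ("vacuolar lumen", PARTOF, "vacuole"),
    ("nuclear membrane", PARTOF, "nucleus"),
]


def ancestors(term, edges, relations=(ISA, PARTOF)):
    """Return every term reachable upward from `term` via the given relations."""
    parents = {}
    for child, rel, parent in edges:
        if rel in relations:
            parents.setdefault(child, []).append(parent)
    # Breadth-first walk up the graph; `seen` stops revisiting shared ancestors.
    seen, queue = set(), deque([term])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors("vacuolar membrane", EDGES)))
# ['cellular_component', 'membrane', 'vacuole']
```

Restricting `relations` to `(ISA,)` follows only the % links, leaving membrane and cellular_component.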

Content of GO

• molecular function 5232 terms

• biological process 6416 terms

• cellular component 1111 terms

• all 12,759 terms

• definitions 7735 (61%)

Counts as of September 13, 2002.


Thank yous

• Genome annotation: Colleagues in the European and Berkeley Drosophila Genome Projects.

• FlyBase: Colleagues in Harvard, Berkeley, Bloomington & Cambridge.

• Gene Ontology: Colleagues in Berkeley, Jackson Labs, Stanford and EBI.
