STRING - Predicting novel metabolic pathways through the integration of diverse genome-scale data

DESCRIPTION

12th International Conference on Intelligent Systems for Molecular Biology, BioPathways SIG, Scottish Exhibition & Conference Center, Glasgow, Scotland, July 29-August 4, 2004

TRANSCRIPT

Page 1: STRING - Predicting novel metabolic pathways through the integration of diverse genome-scale data

STRING
Predicting novel metabolic pathways through the integration of diverse genome-scale data

Lars Juhl Jensen, EMBL Heidelberg


Too much information – too little knowledge

• Biology is now in the age of large-scale data collection
  – Explosive increase in data from genome sequencing, microarray expression studies, screening for protein interactions, etc.
  – The data types are highly heterogeneous
  – Much data is not being deposited in standardized repositories
  – Most data sets are error-prone and suffer from systematic biases

• STRING is a web resource that integrates many different types of information across 100+ species
  – Objective definition of metabolic pathways / functional modules
  – Prediction of additional pathway members / novel pathways

• We do not intend STRING to be
  – a primary repository for experimental data
  – a curated database of complexes or pathways
  – a substitute for expert annotation


STRING provides a network of functional interactions between proteins

• Genomic neighborhood
• Species co-occurrence
• Gene fusions
• Database imports
• Experimental interaction data
• Microarray expression data
• Literature co-mentioning
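Each evidence type above yields its own score for a protein pair, and these scores must be merged into a single link in the network. A minimal sketch, assuming the per-type scores are already calibrated to [0, 1] and treated as independent, so that a pair is missed only if every evidence type misses it: S = 1 - (1 - S_1)(1 - S_2)...(1 - S_n). This noisy-OR rule matches how later STRING publications describe the combination, but the talk does not state it here, and the channel names and values below are hypothetical.

```python
from math import prod


def combine_scores(channel_scores: dict[str, float]) -> float:
    """Combine calibrated per-channel scores (each in [0, 1]) for one protein pair.

    Assumes the channels are independent: the pair is missed only if every
    channel misses it, so the combined score is 1 - prod(1 - s_i).
    """
    for name, s in channel_scores.items():
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"score for {name!r} must be in [0, 1], got {s}")
    return 1.0 - prod(1.0 - s for s in channel_scores.values())


# Hypothetical calibrated scores for a single protein pair.
pair_evidence = {
    "genomic_neighborhood": 0.6,
    "species_cooccurrence": 0.5,
    "coexpression": 0.3,
}
print(round(combine_scores(pair_evidence), 2))  # 0.86
```

Several individually weak lines of evidence can add up to a confident link, which is the main payoff of integrating heterogeneous data; the independence assumption is what makes the arithmetic this simple, and it will overstate confidence when two evidence types are correlated.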


Inferring functional modules from gene presence/absence patterns

[Figure: The “Cellulosome”. Labels: cell, cell wall, anchoring proteins, cellulosomes, cellulose, resting protuberances, protracted protuberance. © Trends Microbiol, 1999]
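Inferring modules from presence/absence patterns (phylogenetic profiles) rests on the idea that genes which are consistently gained or lost together across genomes, such as the parts of a cellulosome, probably work together. A minimal sketch, assuming binary profiles over a handful of genomes and a plain mismatch count (Hamming distance) as the similarity measure; the gene names, genomes, and threshold are hypothetical, and the talk does not say which measure STRING uses.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) of each gene across six genomes.
profiles = {
    "geneA": (1, 1, 0, 1, 0, 0),
    "geneB": (1, 1, 0, 1, 0, 0),
    "geneC": (1, 1, 0, 1, 1, 0),
    "geneD": (0, 0, 1, 0, 1, 1),
}


def hamming(p: tuple[int, ...], q: tuple[int, ...]) -> int:
    """Number of genomes in which exactly one of the two genes is present."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same genomes")
    return sum(a != b for a, b in zip(p, q))


def linked_pairs(
    profiles: dict[str, tuple[int, ...]], max_mismatch: int = 1
) -> list[tuple[str, str]]:
    """Gene pairs whose profiles differ in at most `max_mismatch` genomes."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(profiles), 2)
        if hamming(profiles[g1], profiles[g2]) <= max_mismatch
    ]


print(linked_pairs(profiles))
# [('geneA', 'geneB'), ('geneA', 'geneC'), ('geneB', 'geneC')]
```

A plain mismatch count treats every genome as an independent observation, which overweights groups of closely related species; a more careful score would take the species phylogeny into account.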


Score calibration against a common reference

• Many diverse types of evidence
  – The quality of each is judged by very different raw scores
  – These are all calibrated against the same reference set

• Requirements for a reference
  – Must represent a compromise of all the types of evidence
  – Broad species coverage

• Both a strength and a weakness
  – Scores for all evidence types are directly comparable
  – The type of interaction is currently not predicted
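Calibrating very different raw scores against one reference set can be sketched as binning: for each range of raw scores, the calibrated score is the fraction of benchmark protein pairs in that range that the reference also links (for example, pairs assigned to the same curated pathway). A minimal sketch, assuming fixed bin edges and a toy reference; the reference, bin edges, and data below are hypothetical, and the talk does not describe how STRING bins or smooths its calibration.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def calibrate(
    raw_scores: Sequence[float],
    in_reference: Sequence[bool],
    bin_edges: Sequence[float],
) -> list[Optional[float]]:
    """Map raw scores of one evidence type onto a common [0, 1] scale.

    raw_scores:   raw score for each benchmark protein pair
    in_reference: True if the reference set links that pair
    bin_edges:    sorted inner bin boundaries on the raw-score axis

    Returns, per bin, the fraction of pairs that the reference links
    (None for bins that contain no pairs).
    """
    n_bins = len(bin_edges) + 1
    hits = [0] * n_bins
    totals = [0] * n_bins
    for score, linked in zip(raw_scores, in_reference):
        b = bisect_right(bin_edges, score)  # index of the bin holding this score
        totals[b] += 1
        hits[b] += int(linked)
    return [h / t if t else None for h, t in zip(hits, totals)]


# Hypothetical co-expression correlations for six benchmark pairs.
coexpr_raw = [0.1, 0.2, 0.5, 0.6, 0.8, 0.9]
coexpr_ref = [False, False, True, False, True, True]
print(calibrate(coexpr_raw, coexpr_ref, bin_edges=[0.4, 0.7]))  # [0.0, 0.5, 1.0]
```

Whatever the raw scale (correlations, distances, counts), every evidence type ends up as an estimated agreement with the same reference, which is why the scores become directly comparable. The same choice is the weakness noted above: the reference only records whether two proteins are functionally linked, so the calibrated score carries no information about the type of interaction.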


Multiple evidence types from several species


Objective definition of metabolic pathways

• Defined manually: cutting metabolic maps into pathways
• Defined objectively: standard clustering of genome-scale data

[Figure: Metabolism overview; labels: purine biosynthesis, histidine biosynthesis. Image: Molecular Biology of the Cell, 3rd edition]
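The contrast between cutting metabolic maps by hand and defining pathways objectively can be made concrete with the simplest clustering of a scored network: keep only links whose combined score reaches a cutoff and treat each connected group of proteins as a candidate module. A minimal sketch, assuming single-linkage clustering (connected components) via union-find; the talk only says "standard clustering", other standard methods would draw different boundaries, and the cutoff and scores are hypothetical (the gene names are borrowed from histidine and purine biosynthesis purely for illustration).

```python
def modules(links: list[tuple[str, str, float]], cutoff: float) -> list[list[str]]:
    """Single-linkage modules: connected components of links scoring >= cutoff."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in links:
        ra, rb = find(a), find(b)  # registers every protein, even if unlinked
        if score >= cutoff and ra != rb:
            parent[ra] = rb  # merge the two components

    groups: dict[str, list[str]] = {}
    for protein in list(parent):
        groups.setdefault(find(protein), []).append(protein)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical combined scores between proteins.
links = [
    ("hisA", "hisF", 0.95),
    ("hisF", "hisH", 0.90),
    ("purD", "purH", 0.92),
    ("hisH", "purH", 0.40),
    ("purH", "purN", 0.85),
]
print(modules(links, cutoff=0.7))
# [['hisA', 'hisF', 'hisH'], ['purD', 'purH', 'purN']]
```

Lowering the cutoff to 0.3 in this sketch merges both groups into a single module, so the threshold, rather than a curator, decides where one pathway ends and the next begins.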


Getting more specific – generally speaking


Acknowledgments

• The STRING team
  – Christian von Mering
  – Berend Snel
  – Martijn Huynen
  – Daniel Jaeggi
  – Steffen Schmidt
  – Mathilde Foglierini
  – Peer Bork

• ArrayProspector web service
  – Julien Lagarde
  – Chris Workman

• NetView visualization tool
  – Sean Hooper

• Analysis of yeast cell cycle
  – Ulrik de Lichtenberg
  – Thomas Skøt
  – Anders Fausbøll
  – Søren Brunak

• Web resources
  – string.embl.de
  – www.bork.embl.de/ArrayProspector
  – www.bork.embl.de/synonyms


Thank you!