large-scale metabolic network alignment: metacyc and kegg

13
1 SRI International Bioinformatics Large-Scale Metabolic Network Alignment: MetaCyc and KEGG Tomer Altman Bioinformatics Research Group SRI International [email protected]

Upload: golda

Post on 09-Jan-2016

33 views

Category:

Documents


6 download

DESCRIPTION

Large-Scale Metabolic Network Alignment: MetaCyc and KEGG. Tomer Altman Bioinformatics Research Group SRI International [email protected]. Problem Motivation. There are an increasing number of ‘encyclopedic’ metabolic networks, or reaction databases - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

1 SRI International Bioinformatics

Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

Tomer AltmanBioinformatics Research Group

SRI International

[email protected]

Page 2: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

2 SRI International Bioinformatics

Problem Motivation

There are an increasing number of ‘encyclopedic’ metabolic networks, or reaction databases

KEGG and MetaCyc, plus Rhea, BRENDA, and GO A natural question to ask is, “what is similar /

different between them?”There has been some linking of MetaCyc

compounds to KEGG, but none for reactions

Page 3: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

3 SRI International Bioinformatics

Challenges with Mapping Objects

Multiple aspects to compare (name, chemical structure, reaction substrates, external identifiers)

Inexact namingInexact structures (different specificity of

stereocenters)Inexact description of reactions (classes vs.

instances, proton-balancing)How to combine the evidence in a logical fashion

Page 4: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

4 SRI International Bioinformatics

Evidence / Prediction Data Structure

Prediction typePredictor nameMetaCyc Object Frame IDKEGG Object identifierIteration numberParameters used

Page 5: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

5 SRI International Bioinformatics

Compound Evidence

Curated MetaCyc links to KEGGName matchingPubChem identifier mapping (used for ChEBI as

well)Molecular Fingerprint Tanimoto Similarity

CoefficientInChI string comparisonExact Sub-Structure Match (no stereochemistry)‘All-but-one’ inference

Page 6: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

6 SRI International Bioinformatics

Compound Prediction Detail: ‘All-but-one’

Most of the compounds between these two reactions are the sameClass vs. instance, and naming issues lead to unknown match between “acceptor” and “oxidized electron acceptor”

Page 7: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

7 SRI International Bioinformatics

Reaction Evidence

EC NumbersUniProt Accession NumbersName matches (gleaned from associated objects)Exact equation matchInexact equation match (cosine similarity)

Page 8: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

8 SRI International Bioinformatics

Reaction Prediction Detail: UniProt Mapping

Use UniProt Accession numbers to map the enzymes in MetaCyc and KEGG to one anotherUse UniRef 90 or 100 to map “the same protein” when not exact same Accession Number

Page 9: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

9 SRI International Bioinformatics

From Evidence to Prediction

Rule #1: Evidence is partitioned into exact and inexact quality types

Rule #2: Evidence is partitioned (orthogonally) into qualitative and quantitative types

Rule #3: A high-quality prediction will consist of qualitative and quantitative evidence in favor

Rule #4: We try exact quality types first, then inexact Rule #5: No contradictory predictions Rule #6: Iterate until no new predictions This is the general idea, but implementation has some

domain-based heuristics thrown in, such as with EC numbers

Page 10: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

10 SRI International Bioinformatics

Statistics

3269 MetaCyc reactions with links to KEGG (~75%)

4004 MetaCyc compounds with links to KEGG (>50%)

Page 11: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

11 SRI International Bioinformatics

Future Work

Greatest Common Subgraph Compound Matching Sensitivity / Specificity characterization of all evidence types Analysis of unmatched content of KEGG and MetaCyc for

algorithm improvement and focused curation (inexact reaction matching and novel reaction import)

Hierarchical clustering of compounds and reactions A general tool that can be used for any two given metabolic

networks Recasting algorithm in a machine learning framework

Page 12: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

12 SRI International Bioinformatics

Acknowledgements

•Peter Karp•Douglas Brutlag•Dan Davison•Anamika Kothari•Joseph Dale

MetaCyc.org

Page 13: Large-Scale Metabolic Network Alignment: MetaCyc and KEGG

13 SRI International Bioinformatics

… One more thing!

Pathway Tools API (beta) http://www.ai.sri.com/~taltman/ptools-api/ptools_api.html