BioPAX: The Birth of a Data Exchange Language for Biological Pathways
Joanne Luciano
BioPAX Core Group
www.biopax.org
7th International Annual Bio-Ontologies Meeting
30 July 2004, Glasgow, Scotland, United Kingdom
30 July 2004 7th BioOntologies Workshop 2
Introduction
BioPAX = Biopathway Exchange Language
Emerged at ISMB
• conceived at ISMB ’01
• born at ISMB ’02
• crawling at ISMB ’03 (Level 0.5)
• walking at ISMB ’04 (Level 1.0)
• now approaching the “terrible twos”
What is a pathway?
Depends on who you ask
Research Community Need
Integrated Pathway Database
Pathway Databases
• Metabolic
• Protein Interaction
• Signal Transduction
• Gene Regulatory
Figure labels: WIT, BioCyc, Reactome, aMAZE, KEGG, BIND, DIP, HPRD, MINT, IntAct, PSI format, CSNDB, TRANSPATH, TRANSFAC, PubGene, GeneWays
Design Goals
• Encapsulation: An entire pathway in one record
• Compatible: Use existing standards wherever possible
• Computable: From file reading to logical inference
• Successful: Buy-in from the research community
Technical Logistics & Goals
Interoperability
– Integration and exchange of pathway data
– Interchange through a common (standard) representation
– Accommodate existing database representations
– Provide a basis for future databases
– Enable development of tools for searching and reasoning over the database
Technical Logistics (cont’d)
Why OWL? Why OWL DL?
• Expressivity (biology = “complex relationships”)
• W3C Standard (use existing standards), “Semantic Web enabled”
• XML based (the exchange language in computing)
• Machine Computable
– Facilitate integration of knowledge, data, tool development
– Uncover inconsistencies and new knowledge
– OWL DL
• Enable full reasoning capability for users, from file reading to logical inference
• Complete: all conclusions are guaranteed to be computed
• Decidable: all computations will finish in finite time (with OWL Lite, short amount of time)
Social Logistics
Get organized
• Make the decision & commitment
• 2 or 3 dedicated individuals
Small core group
– Bi-weekly conference calls, bi-monthly F2F
– Commitment & resources
• Participants willing and able to cover their costs
• Outside funding (DOE)
Special interests and needs form subgroup task forces
• Core group member(s)
• Outside experts
International representation & participation (Outreach & Community Building)
• conferences and mailing lists
• follow-up and individual
Collaborate with complementary/competing representations
Social Logistics
How we engendered buy-in from the field, which made life much easier
Take things in steps:
• Pathway Database vision -> Data Exchange Format as 1st step
• Data Exchange Format -> Release in Levels of increasing complexity: Level 1 supports Metabolic pathways, Level 2
Early success leads to early adoption, which leads to increased probability of overall project success.
Get “buy in” and get involvement: leads to acceptance later
• Support the existing databases (BioCyc, WIT, BIND, etc.)
– Got database sources to agree to participate in the development to assure that their DBs will be properly represented
• Got database sources to agree to export in the new format once it is defined
Social Logistics (cont’d)
Get “buy in” (continued)
• Community Involvement and Support
– Core group (represents voice of community, small, committed)
– Mailing List
– User community
– Subgroups
• International Meetings and Presentations
– Tool developers
– Modelers
– Users (researchers)
– Ontology developers
– Database providers
– Complementary representations (SBML, CellML)
– Like minds
– General Community
Implementation of BioPAX
Designed using GKB Editor and Protégé
BioPAX uses OWL to define the Schema
BioPAX Instances to store the data
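The split between a schema and the instances that conform to it can be sketched in a few lines of Python. This is only an illustration of the idea, not the BioPAX OWL definitions: the class names, property names, and the `SCHEMA`, `allowed_properties`, and `validate` helpers below are hypothetical stand-ins chosen for the example.

```python
# Toy schema: each class has a parent and the properties it introduces.
# All names here are illustrative assumptions, not the real BioPAX ontology.
SCHEMA = {
    "Entity": {"parent": None, "properties": {"name"}},
    "PhysicalEntity": {"parent": "Entity", "properties": set()},
    "SmallMolecule": {"parent": "PhysicalEntity", "properties": {"formula"}},
    "Interaction": {"parent": "Entity", "properties": {"participants"}},
    "BiochemicalReaction": {"parent": "Interaction", "properties": {"left", "right"}},
}


def allowed_properties(cls_name):
    """Collect properties declared on a class and inherited from its ancestors."""
    props = set()
    while cls_name is not None:
        entry = SCHEMA[cls_name]
        props |= entry["properties"]
        cls_name = entry["parent"]
    return props


def validate(instance):
    """Return a list of problems found when checking an instance against SCHEMA."""
    cls_name = instance["type"]
    if cls_name not in SCHEMA:
        return [f"unknown class: {cls_name}"]
    allowed = allowed_properties(cls_name)
    return [
        f"property not allowed on {cls_name}: {prop}"
        for prop in sorted(instance["properties"])
        if prop not in allowed
    ]


# Instances carry the data; the schema only says what shape the data may take.
glucose = {
    "id": "mol1",
    "type": "SmallMolecule",
    "properties": {"name": "glucose", "formula": "C6H12O6"},
}
bad_molecule = {
    "id": "mol2",
    "type": "SmallMolecule",
    "properties": {"name": "compound X", "left": ["mol1"]},
}

print(validate(glucose))
print(validate(bad_molecule))
```

In OWL the same separation appears as class and property definitions on one side and individuals on the other; a dictionary is used here only to keep the sketch self-contained.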
BioPAX – Ontology
Figure labels: OWL (schema); Instances (Individuals); data
Complex Relationships Captured
Ontology Slot Definitions
Integration -> Knowledge
Knowledge is Power
Data in the same format: Metabolic, Protein-Protein Interaction, Signal Transduction, Gene Regulation
Facilitates
– Centralized public pathway DB
– Share data between existing DBs
– Distribute public and proprietary data
– Knowledge Assembly
– Reasoning
A Common Exchange Language
Without BioPAX: >100 DBs and tools
BioPAX
Promotes collaboration (big science), accessibility
Figure labels: Database, Application, User
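The slide's figure is not preserved in this transcript. One common way to motivate a shared exchange language, assumed here rather than taken from the slide, is to count translators: if every database or tool must convert directly to every other, the number of converters grows quadratically, whereas a shared format needs only one reader and one writer per participant. The function names below are hypothetical.

```python
def point_to_point_translators(n: int) -> int:
    """Directed converters needed when each of n formats maps to every other."""
    return n * (n - 1)


def hub_translators(n: int) -> int:
    """Converters needed when each of n formats reads and writes one shared format."""
    return 2 * n


for n in (3, 10, 100):
    print(n, point_to_point_translators(n), hub_translators(n))
```

Under this assumption, 100 databases and tools would need 9,900 direct converters but only 200 with a shared format.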
Consistency Checking: Nutrient-related analysis of a BioPAX knowledge base
Figure labels: Known Nutrient set; Fired Reaction; Unfired Reaction; Essential compounds; Missing essential compound; Biomass
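The labels on this slide suggest a forward-propagation check: start from a known nutrient set, fire every reaction whose substrates are available, and report essential compounds that are never produced. The transcript does not describe the actual procedure, so the following is a minimal sketch under that assumption; the `Reaction` type, the function names, and the toy compounds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    name: str
    substrates: frozenset
    products: frozenset


def forward_propagate(nutrients, reactions):
    """Fire reactions until no new one can fire; return (available, fired, unfired)."""
    available = set(nutrients)
    fired = []
    pending = list(reactions)
    changed = True
    while changed:
        changed = False
        still_pending = []
        for rxn in pending:
            if rxn.substrates <= available:
                # All substrates are present, so the reaction fires
                # and its products become available.
                available |= rxn.products
                fired.append(rxn.name)
                changed = True
            else:
                still_pending.append(rxn)
        pending = still_pending
    unfired = [rxn.name for rxn in pending]
    return available, fired, unfired


def missing_essentials(available, essentials):
    """Essential compounds that were never produced or supplied."""
    return sorted(set(essentials) - set(available))


# Toy data: compound and reaction names are placeholders.
reactions = [
    Reaction("r2", frozenset({"B"}), frozenset({"C"})),
    Reaction("r1", frozenset({"A"}), frozenset({"B"})),
    Reaction("r3", frozenset({"X"}), frozenset({"D"})),
]
available, fired, unfired = forward_propagate({"A"}, reactions)
print("fired:", fired)
print("unfired:", unfired)
print("missing essential compounds:", missing_essentials(available, {"C", "D"}))
```

In this toy run, `r3` never fires because `X` is not a nutrient, so the essential compound `D` is reported as missing. A gap like this points either to an incomplete nutrient set or to a missing reaction in the knowledge base.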
What Next?
• BioPAX future development
– Level 2, 3, future levels
– BOF (check schedule)
– Talk later today by Gary Bader at BioPathways SIG
– Poster in Main Conference (check program)
• Development of tools and API
– libBioPAX
• Semantic Web Life Science Initiatives
– BOF Sunday
The BioPAX Community
BioPAX Supporting Groups
Groups
• Memorial Sloan-Kettering Cancer Center: G. Bader, M. Cary, J. Luciano, C. Sander
• SRI Bioinformatics Research Group: P. Karp, S. Paley, J. Pick
• University of Colorado Health Sciences Center: I. Shah
• BioPathways Consortium: J. Luciano, E. Neumann, A. Regev, V. Schachter
• Argonne National Laboratory: N. Maltsev, E. Marland
• Samuel Lunenfeld Research Institute: C. Hogue
• Harvard Medical School: E. Brauner, D. Marks, J. Luciano, A. Regev
• NIST: R. Goldberg
• Stanford: T. Klein
• Columbia: A. Rzhetsky
• Dana Farber Cancer Institute: J. Zucker
Collaborating Organizations:
• Proteomics Standards Initiative (PSI)
• Systems Biology Markup Language (SBML)
• CellML
• Chemical Markup Language (CML)
Databases
• BioCyc (www.biocyc.org)
• BIND (www.bind.ca)
• WIT (wit.mcs.anl.gov/WIT2)
• PharmGKB (www.pharmgkb.org)
Grants
• Department of Energy (Workshop)
Exchange Formats in the Pathway Data Space
Figure labels: PSI; SBML, CellML; Biochemical Reactions; Regulatory Pathways (Low Detail, High Detail); Protein Interaction Networks; Metabolic Pathways (Low Detail, High Detail); Database Exchange Formats; Simulation Model Exchange Formats; Rate Formulas
Level 1 BioPAX: Released July 2004
Figure labels: BioPAX Level 1; PSI; SBML, CellML; Genetic Interactions; Molecular Interactions (Pro:Pro, All:All); Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic); Regulatory Pathways (Low Detail, High Detail); Database Exchange Formats; Simulation Model Exchange Formats; Rate Formulas; Metabolic Pathways (Low Detail, High Detail); Biochemical Reactions; Small Molecules (Low Detail, High Detail)
Exchange Formats in the Pathway Data Space
Figure labels: BioPAX; PSI; SBML, CellML; Genetic Interactions; Molecular Interactions (Pro:Pro, All:All); Interaction Networks (Molecular, Non-molecular; Pro:Pro, TF:Gene, Genetic); Regulatory Pathways (Low Detail, High Detail); Database Exchange Formats; Simulation Model Exchange Formats; Rate Formulas; Metabolic Pathways (Low Detail, High Detail); Biochemical Reactions; Small Molecules (Low Detail, High Detail)