Photo taken by http://flickr.com/people/mfsarwar/
Interoperability With BioMoby 1.0
It’s Better Than Sharing Your Toothbrush!
A brief history of BioMoby
• Sept, 2001 – Model Organism Bring Your own Database Interface Conference (MOBY-DIC)
• May 21, 2002 – Genome Canada Platform Award
• May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML
• July 18, 2002 – First Moby Client (Gbrowse Moby)
• June 9, 2003 – API Version 0.5 deployed
• 2006 – Genome Canada Platform Award
• 2007 - Version 1.0 API submitted for publication
MOBY-DIC Chapter VII
7th Model Organism Bring Your-own Database Interface Conference
Vancouver, BC, June 2007.
The Core Ahab’s
Wendy
Richard
Mylah
Martin
Eddie
Andreas
Paul
Ivan
Mark’s Screen…
The BioMoby Plan
• Create an ontology of bioinformatics data-types
• Define a serialization of this ontology (data syntax)
• Create an open API over this ontology
• Define Web Service inputs and outputs v.v. Ontology
• Register Services in an ontology-aware Registry
• Machines can find an appropriate service
• Machines can execute that service unattended
• Ontology is community-extensible
Overview of BioMoby Transactions
[Diagram labels: Gene names; MOBY Central; MOBY hosts & services; data types Sequence, Alignment, Express., Protein, Alleles…; services Align, Phylogeny, Primers]
Object ontology
What is a sequence? A sequence is a ___ that has these features ___
Discovery of services that consume things LIKE sequences!
This is SCUFL: Simple Conceptual Unified Flow Language
It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…
Pipeline discovery “on the fly”
• No explicit coordination between providers
• Dynamic discovery of ~appropriate Services
• Automated execution of services
Some BioMoby statistics
Moby: Breadth
• Namespaces (data types): 418
• Objects (data syntaxes): >561
• Service Types (analytical categories): 112
• Providers: ~50 active
• Service Instances: ~1200 currently “alive”
– In main Moby Central server in Canada
– Others in “boutique” Moby registries serving specialized communities worldwide
Moby: Clients
• Gbrowse_moby (M Wilkinson)
• PlaNet Locus_View (H Schoof, R Ernst)
• Blue-Jay (P Gordon)
• Taverna (T Oinn, M Senger, E Kawas)
• MOWserv (INB, Spain)
• Remora (S Carrere, J Gouzy, INRA)
• MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)
• SeaHawk (P Gordon)
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
Moby Namespaces
• A “Namespace” is a category of identifiers
– NCBI has gi numbers (gi Namespace)
– GO Terms have accession numbers (GO Namespace)
• Namespaces indicate data’s semantic type.
– GO:0003476 is a Gene Ontology Term
– gi|163483 is a GenBank record
• Though we are using the word “Namespace” correctly, it causes confusion!
– “Namespace” in XML is tightly associated with an XML document and/or its syntax
– In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
The MOBY Object Ontology
• Syntactic types are defined by a GO-like ontology
– Class name at each node
– Edges define the relationships between Classes
– GO used as a model because of its familiarity in the community
• Edges define one of three relationships
– ISA
  • Inheritance relationship
  • All properties of the parent are present in the child
– HASA
  • Container relationship of ‘exactly 1’
– HAS
  • Container relationship with ‘1 or more’
The Simplest Moby Data-Type
<Object namespace='NCBI_gi' id='111076'/>
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s) or its representation.
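As a minimal sketch of the idea above, the base Object can be built with Python's standard `xml.etree.ElementTree`. The helper name `moby_object` is hypothetical and is not part of any BioMoby library; only the element name and the two attributes come from the slide.

```python
import xml.etree.ElementTree as ET

def moby_object(namespace: str, identifier: str) -> ET.Element:
    """Build a base Moby Object: a namespace plus an id, with no content."""
    # The element carries identity only; it says nothing about where the
    # record lives or how it is represented.
    return ET.Element("Object", {"namespace": namespace, "id": identifier})

gi_record = moby_object("NCBI_gi", "111076")
xml_text = ET.tostring(gi_record, encoding="unicode")
print(xml_text)
```

The printed element is equivalent to the one on the slide, although the serializer may choose different quoting and spacing.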
Moby Primitives
Integer ISA Object
String ISA Object
Float ISA Object
DateTime ISA Object
<Integer namespace='' id=''>38</Integer>
A Derived Data-Type
Integer ISA Object
String ISA Object
VirtualSequence ISA Object
VirtualSequence HASA Integer
<VirtualSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
</VirtualSequence>
The articleName describes the semantic relationship between the Integer and the VirtualSequence.
A Derived Data-Type
GenericSequence ISA VirtualSequence
GenericSequence HASA String
<GenericSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC</String>
</GenericSequence>
A Derived Data-Type
DNASequence ISA GenericSequence
<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName='length'>38</Integer>
  <String namespace='' id='' articleName='SequenceString'>ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC</String>
</DNASequence>
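The inheritance chain on these slides can be sketched as two small lookup tables. This is an illustrative model only, not how Moby Central stores the ontology; the class names and articleNames come from the slides, and the function names are hypothetical.

```python
# ISA edges: child -> parent (single inheritance, as drawn on the slides).
ISA = {
    "Integer": "Object",
    "String": "Object",
    "VirtualSequence": "Object",
    "GenericSequence": "VirtualSequence",
    "DNASequence": "GenericSequence",
}

# HASA edges: class -> {articleName: member class}, each present exactly once.
HASA = {
    "VirtualSequence": {"length": "Integer"},
    "GenericSequence": {"SequenceString": "String"},
}

def lineage(cls):
    """Return the class followed by all of its ancestors, up to Object."""
    chain = [cls]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain

def is_a(cls, ancestor):
    """True if cls is the ancestor itself or inherits from it."""
    return ancestor in lineage(cls)

def members(cls):
    """Collect HASA members, including those inherited through ISA."""
    collected = {}
    for ancestor in reversed(lineage(cls)):
        collected.update(HASA.get(ancestor, {}))
    return collected

print(lineage("DNASequence"))
print(members("DNASequence"))
```

Because all properties of the parent are present in the child, `members("DNASequence")` reports both `length` and `SequenceString` even though DNASequence declares no HASA edges of its own.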
Legacy file formats
<NCBI_Blast_Report namespace='NCBI_gi' id='115325'>
<String namespace='' id='' articleName='content'>
TBLASTN 2.0.4 [Feb-24-1998]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|1401126 (504 letters)
Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters
Searching...done
                                                                     Score   E
Sequences producing significant alignments:                          (bits)  Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009    0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...  58      4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                 53      1e-05
</String>
</NCBI_Blast_Report>
• Containing “String” allows ontological classes to represent legacy data types
Binaries – pictures, movies
<base64_encoded_jpeg namespace='TAIR_image' id='3343532'>
<String namespace='' id='' articleName='content'>
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVMIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUxHTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVlbWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEfMB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
</String>
</base64_encoded_jpeg>
• Text-base64 is a Class that contains String
• Binaries are base64 encoded and passed in classes that inherit from text-base64
• base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
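A rough sketch of how a client might wrap binary data this way, using only the Python standard library. The functions `wrap_binary` and `unwrap_binary` are hypothetical helpers, and the payload bytes are a placeholder, not a real image.

```python
import base64
import xml.etree.ElementTree as ET

def wrap_binary(class_name, namespace, identifier, payload):
    """Base64-encode raw bytes into a String member named 'content'."""
    obj = ET.Element(class_name, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        obj, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(payload).decode("ascii")
    return obj

def unwrap_binary(obj):
    """Recover the original bytes from the 'content' String member."""
    content = obj.find("String[@articleName='content']")
    return base64.b64decode(content.text)

fake_image = b"\xff\xd8\xff\xe0 placeholder bytes, not a real JPEG"
wrapped = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", fake_image)
print(ET.tostring(wrapped, encoding="unicode"))
```

Because the binary travels as ordinary text inside a String, any client that understands String can at least carry the object, even if it cannot render the image.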
Extending legacy datatypes
• With legacy data-types defined, we can extend them as we see fit
• annotated_jpeg ISA base64_encoded_jpeg
• annotated_jpeg HASA 2D_Coordinate_set
• annotated_jpeg HASA Description
<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
    <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
    <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName='Description'>This is the phenotype of a ufo-1 mutant under long daylength, 16°C</String>
  <String namespace='' id='' articleName='content'>
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
The same object…
<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName='pixelCoordinates'>
    <Integer namespace='' id='' articleName='x_coordinate'>3554</Integer>
    <Integer namespace='' id='' articleName='y_coordinate'>663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName='Description'>This is the phenotype of a ufo-1 mutant under long daylength, 16°C</String>
  <String namespace='' id='' articleName='content'>
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1U
  </String>
</annotated_jpeg>
<CrossReference>
  <Object namespace='TAIR_Allele' id='ufo-1'/>
</CrossReference>
<CrossReference>
  <Object namespace='TAIR_Tissue' id='122'/>
</CrossReference>
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
Cross reference types
• Simple
– A MOBY Object
<Object namespace='foo' id='12345'/>
• Rich
– Takes the form:
<Xref namespace='' id='' authURI='' serviceName='' evidenceCode='' xrefType=''> ... Textual Description ... </Xref>
– …Incidentally, this avoids the problem of reification that is experienced in RDF
XML Schema?
The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema
Minimize future heterogeneity
Improve interoperability without requiring schema-to-schema mapping
• Object Ontology terms have semantically rich names, but this is primarily for human intuition
– DNA Sequence
– Annotated_GIF
• Object Ontology does not define the meaning of an object to the machine
– No machine-readable semantics
• It does define the representation
– SYNTAX
A portion of the MOBY-S Object Ontology
…community-built!
BioMoby in detail
• MOBY Data typing system: Semantic Type
• MOBY Data typing system: Syntactic Type
• Moby Registry Queries
A Moby Central Query
• Give me:
– Services that consume THIS data-type in THIS syntax…
– …do SOMETHING LIKE THIS to it…
– …and provide me THAT data-type in response
Example
• Find me services that
– consume FASTA sequence data,
– do a BLAST with it,
– and provide me lists of GenBank GI numbers in return.
• Query can use any or all of the above criteria
– Also limit by service provider and service description keyword
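The three-part query above can be sketched as a filter over a toy registry. Everything here is an assumption made for illustration: the service names are hypothetical, the ontology slice is invented for the example, and the matching rules (a query's input may be a subtype of what a service consumes; a service's type may be a subtype of what the query asks for) are one plausible reading, not the documented Moby Central behaviour.

```python
from dataclasses import dataclass

# child -> parent; a toy slice of the object and service ontologies (assumed).
ISA = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
}

def is_a(term, ancestor):
    """Walk ISA edges upward from term, looking for ancestor."""
    while term is not None:
        if term == ancestor:
            return True
        term = ISA.get(term)
    return False

@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    service_type: str
    output_type: str

REGISTRY = [  # hypothetical registrations
    Service("hypotheticalBlastN", "GenericSequence", "NCBI_Blast", "NCBI_Blast_Report"),
    Service("hypotheticalTranslate", "DNASequence", "Conversion", "GenericSequence"),
    Service("hypotheticalParseBlast", "NCBI_Blast_Report", "Parsing", "Object"),
]

def find_services(input_type=None, service_type=None, output_type=None):
    """Return names of services matching whichever criteria were supplied."""
    hits = []
    for svc in REGISTRY:
        if input_type is not None and not is_a(input_type, svc.input_type):
            continue
        if service_type is not None and not is_a(svc.service_type, service_type):
            continue
        if output_type is not None and svc.output_type != output_type:
            continue
        hits.append(svc.name)
    return hits

print(find_services(input_type="DNASequence", service_type="Blast"))
```

Here a DNASequence in hand still finds the service registered for GenericSequence, which is the payoff of making the registry ontology-aware.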
Remember!!
Moby Registry Query
INPUT TYPE||
TRANSFORMATION TYPE||
OUTPUT TYPE
A weakness of MOBY
Service discovery is horribly flawed due to insufficiently rich semantics…
The problem with Moby
“Chickens go in; pies come out!”
“What sort o’ pies?”
“Apple!”
The MOBY-S Service Ontology
• A simple ISA hierarchy…
– too simple!
• Primitive types include:
– Analysis
– Parsing
– Registration
– Retrieval
– Resolution
– Conversion
– Rendering
A slice of the Service Ontology
[Diagram nodes: Service; Analysis; Alignment; Blast; NCBI_Blast; WU_Blast; Parsing; Parse_NCBI_Blast; Parse_WU_Blast]
“The Exploding Bicycle”- A. Rector, U Manchester
Summary so far
• BioMoby uses ontologies to describe both data types and data syntaxes
– This is where the interoperability comes from
– These are used to match consumers with providers during service discovery
• BioMoby uses a simple ontology to describe bioinformatics operations
– This ontology is only marginally useful
Seahawk
• Highlight data in your browser and drag/drop it into Moby
• What could be easier than that?!
Paul MK Gordon and Christoph W Sensen BMC Bioinformatics 2007, 8:208
BMC Bioinformatics, in press
Seahawk: A New Moby Client for Biologists
Drag ‘n’ drop, highlight existing data for use with MOBY Services
Paul Gordon & Christoph Sensen
Seahawk looks like a browser
How do I load data?
• Use the “open” button:
– Text file (e.g. FASTA sequences)
– HTML page (e.g. NCBI Entrez Web page)
– RTF document (e.g. conference abstract)
– MOBY XML document
• Drag ‘n’ Drop
– Web links and desktop files
– Highlighted text from open documents or Web pages
Under the Hood (Beneath the Bonnet?)
• Data has to be converted into Moby XML format to be used by Moby
• Moby data has to be converted back to human-readable text for presentation to the biologist
Again: How do I load data?
How do I Find Services?
• Right-click: MOB rules are invoked
• Resulting Moby XML is used for service search
How do I run a service?
• Click it!
• If necessary, a service’s extra parameters can be set
• Control+click submits using default params
How do I run a service?
• If required inputs are missing, the missing ones must be dragged into place.
• Unrecognized data will be rejected
How do I collate data?
• Seahawk clipboard lets you build collections of objects
• Seahawk “knows” the type of collection and will suggest appropriate Moby services
Seahawk Summary
• Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity
• It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
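To make the regular-expression step concrete, here is a small sketch that recognises two identifier styles mentioned earlier in the talk (gi|163483 and GO:0003476) in free text and turns them into base Moby Objects. The patterns, the rule table, and the namespace label "GO" are assumptions made for this example; they are not Seahawk's actual MOB rules.

```python
import re
import xml.etree.ElementTree as ET

# (pattern, Moby namespace) pairs; group 1 captures the identifier.
RULES = [
    (re.compile(r"\bgi\|(\d+)"), "NCBI_gi"),
    (re.compile(r"\bGO:(\d{7})\b"), "GO"),
]

def text_to_moby(text):
    """Scan free text and build one base Object per recognised identifier."""
    objects = []
    for pattern, namespace in RULES:
        for match in pattern.finditer(text):
            objects.append(
                ET.Element("Object", {"namespace": namespace, "id": match.group(1)})
            )
    return objects

highlighted = "The hit gi|163483 is annotated with GO:0003476."
for obj in text_to_moby(highlighted):
    print(ET.tostring(obj, encoding="unicode"))
```

Text that matches no rule yields no objects, which mirrors the slide's point that unrecognized data will be rejected.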
Why doesn’t Moby Use RDF/OWL?
Timeline of Moby/W3C Activities
[Timeline figure, 2000 to 2006. Events shown: RDF Candidate Spec; RDF Schema Candidate Spec; W3C launches Semantic Web (SW) Activity Group; BioMoby Project established; BioMoby XML finalized; BioMoby stable 0.85 API published (>400 services); RDF/OWL formal W3C Recommendations; BioMoby stable 1.0 API published; extensive SW tool building…]
Moby 2.0
Getting it right, the second time!
What BioMoby Already Does
[Diagram: Sequence Data goes to a BLAST SERVER, which returns a Blast Hit]
[Diagram: Sequence Data givesBlastResult Blast Hit: not “biologically” meaningful]
[Diagram: Sequence Data hasHomologyTo Blast Hit]
…looks a lot like…
[Diagram: URI hasHomologyTo URI]
Which is effectively just an RDF triple.
Now think in reverse…
(in case you forgot…)
Moby Registry Query
INPUT TYPE||
TRANSFORMATION TYPE||
OUTPUT TYPE
Moby 2.0
[Diagram: the question “What does Sequence Data have homology to?” uses the predicate hasHomologyTo, which maps to a BLAST SERVICE; data is sent to the service and a Blast Hit comes back]
Query
FIND SERVICES THAT
Consume Sequence Data||
Provide hasHomologyTo Property||
Attached to other Sequence Data
SPARQL
• A Semantic Web query language
• Queries “look like” graphs
Find “X” with predicate “Y” attached to “Z”
Moby 2.0 extends the SPARQL query language
• SPARQL queries contain concepts and the relationships between them (subject, predicate, object)
• We simply map RDF predicates onto Moby services capable of generating that relationship
• Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
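The predicate-to-service mapping described above can be sketched in a few lines. This is a toy model under stated assumptions: `fake_blast` is a stand-in that returns canned identifiers rather than calling a real service, the `seq:` identifiers are invented, and the lookup table plays the role of the registry query. It is not the Moby 2.0 implementation.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

def fake_blast(sequence_id: str) -> List[str]:
    """Stand-in for a remote BLAST service; returns canned hit identifiers."""
    canned = {"seq:query1": ["seq:hitA", "seq:hitB"]}
    return canned.get(sequence_id, [])

# Registry lookup: (subject type, predicate) -> service able to generate it.
SERVICE_FOR: Dict[Tuple[str, str], Callable[[str], List[str]]] = {
    ("SequenceData", "hasHomologyTo"): fake_blast,
}

def resolve(subject: str, subject_type: str, predicate: str) -> List[Triple]:
    """Answer the pattern (subject, predicate, ?object) by running a service."""
    service = SERVICE_FOR.get((subject_type, predicate))
    if service is None:
        return []  # no registered service generates this relationship
    return [(subject, predicate, obj) for obj in service(subject)]

for triple in resolve("seq:query1", "SequenceData", "hasHomologyTo"):
    print(triple)
```

The triples did not exist before the query was posed; they are generated on demand by whichever service the registry maps to the predicate.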
But wait, there’s more!
Exploit knowledge in OWL ontologies to enhance query
• Subject + Predicate: look up and execute a Moby service that consumes proteins and generates functional annotation info
• Subject + Predicate: look up and execute a Moby service that consumes STK or proteins and looks up inhibitor molecules
• Evaluate Query Expression
This SPARQL query could be posed on a database of RAW, UNANNOTATED protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)
Credits
• Genome Canada/Genome Alberta
• myGrid – Carole Goble in particular
• Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España
• Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)
• Heart and Stroke Foundation of BC and Yukon (CardioSHARE)
• Microsoft Research (CardioSHARE)