CSHALS 2013
Expert Bioinformatics from Bioinformatics Experts
Linking Linked Data
Linked Data to Integrated Data
Put your data on the web; make a pretty web site later.
Now we can ask questions like this...
What members of a target pathway are already targeted in other diseases?
[Figure: Compound, Target (Protein), Pathway and Disease, drawn from ChEMBL, UniProt, Reactome and OMIM]
Because we have lots of data exposed as RDF
[Figure: example classes from separate RDF sources: Mim:Phenotype, Uniprot:Protein, BioPAX:Protein]
What do you do when you have to add data...
Or connect SPARQL endpoints?
RDF != Linked Data
Is your data 5-star?
Linking is what actually connects the Semantic Web. It is easy to do with a little thought and soon becomes second nature; common-sense considerations determine when to make a link and when not to.
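The fifth star asks that your data link to other people's data. One crude way to audit this is to count triples whose object is an IRI on a different host from the subject. The sketch below is a heuristic under stated assumptions, not a standard test: triples are Python tuples of strings, anything starting with "http" is treated as an IRI, and the identifiers.org triple is a hypothetical link added for illustration.

```python
from urllib.parse import urlparse


def outbound_links(triples):
    """Return triples whose object is an IRI on a different host than the subject."""
    links = []
    for s, p, o in triples:
        # Crude IRI test (assumption): literals in this sketch never start with "http".
        if not o.startswith("http"):
            continue
        if urlparse(s).netloc != urlparse(o).netloc:
            links.append((s, p, o))
    return links


triples = [
    # FlyAtlas triple from the slides: subject and object share a host.
    ("http://openflydata.org/id/flyatlas/affyid/1616608_a_at",
     "http://purl.org/NET/flyatlas/schema#gene",
     "http://openflydata.org/id/flybase/feature/FBgn0001128"),
    # Hypothetical outbound link to a shared identifier.
    ("http://openflydata.org/id/flybase/feature/FBgn0001128",
     "http://www.w3.org/2000/01/rdf-schema#seeAlso",
     "http://identifiers.org/flybase/FBgn0001128"),
]
print(len(outbound_links(triples)))  # 1
```

Only the second triple counts: RDF that points solely back into its own host is exposed, but not yet linked.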
Example: openflydata to BioCyc
What genes are differentially expressed in the hindgut, and are there any pathways associated with those genes?
● Use FlyAtlas at openflydata.org for tissue-specific expression profiles.
● Use FlyCyc from BioCyc.
● Then SPARQL.
Problem: node URIs
FlyAtlas (openflydata.org) links a probe to a FlyBase gene URI:
<http://openflydata.org/id/flyatlas/affyid/1616608_a_at> <http://purl.org/NET/flyatlas/schema#gene> <http://openflydata.org/id/flybase/feature/FBgn0001128> .
BioCyc (FlyCyc) records the same gene only as a literal on a unification xref:
<http://biocyc.org/biopax/biopax-level3#Protein202210> <http://www.biopax.org/release/biopax-level3.owl#xref> <http://biocyc.org/biopax/biopax-level3#UnificationXref202209> .
<http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#db> "FlyCyc" .
<http://biocyc.org/biopax/biopax-level3#UnificationXref202209> <http://www.biopax.org/release/biopax-level3.owl#id> "FBGN0001128" .
The two datasets share no node for gene FBgn0001128 (one uses a URI, the other an upper-case string), so SPARQL has nothing to join on.
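Integration Level 1: use Identifiers.org
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX BP: <http://www.biopax.org/release/biopax-level3.owl#>
# Virtuoso extension: a backquoted bif:sprintf_iri expression builds the IRI in the template
CONSTRUCT { ?x rdfs:seeAlso `bif:sprintf_iri("http://identifiers.org/flybase/%s", ?id)` }
WHERE { ?x BP:xref ?xref . ?xref BP:id ?id ; BP:db "FlyCyc"^^xsd:string }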
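The Level 1 CONSTRUCT can be mimicked in plain Python to show why it works: once both datasets point at the same identifiers.org IRI, there is a shared node to join on. This is a minimal sketch, not the presenters' pipeline. It assumes triples are string tuples, that the BioCyc protein points to its xref (the usual BioPAX direction), that BioCyc's upper-case "FBGN" prefix should be rewritten to FlyBase's "FBgn", and it uses a hypothetical rule for the FlyAtlas side (take the last path segment of the gene URI).

```python
# Sketch of the Level 1 CONSTRUCT in plain Python; triples are (s, p, o) string tuples.
BP = "http://www.biopax.org/release/biopax-level3.owl#"
BIOCYC = "http://biocyc.org/biopax/biopax-level3#"
SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
IDORG_FLYBASE = "http://identifiers.org/flybase/"


def flybase_id(raw):
    # Assumption: BioCyc upper-cases FlyBase ids ("FBGN..."); FlyBase writes "FBgn...".
    return "FBgn" + raw[4:] if raw.upper().startswith("FBGN") else raw


def level1_links(biocyc, flyatlas):
    """Return rdfs:seeAlso triples pointing both datasets at identifiers.org IRIs."""
    ids = {s: o for s, p, o in biocyc if p == BP + "id"}
    dbs = {s: o for s, p, o in biocyc if p == BP + "db"}
    out = set()
    # BioCyc: ?x BP:xref ?xref . ?xref BP:id ?id ; BP:db "FlyCyc"
    for s, p, o in biocyc:
        if p == BP + "xref" and dbs.get(o) == "FlyCyc" and o in ids:
            out.add((s, SEE_ALSO, IDORG_FLYBASE + flybase_id(ids[o])))
    # FlyAtlas (hypothetical rule): reuse the last path segment of the gene URI.
    for s, p, o in flyatlas:
        if p == "http://purl.org/NET/flyatlas/schema#gene":
            out.add((o, SEE_ALSO, IDORG_FLYBASE + o.rsplit("/", 1)[-1]))
    return out


biocyc = [
    (BIOCYC + "Protein202210", BP + "xref", BIOCYC + "UnificationXref202209"),
    (BIOCYC + "UnificationXref202209", BP + "db", "FlyCyc"),
    (BIOCYC + "UnificationXref202209", BP + "id", "FBGN0001128"),
]
flyatlas = [
    ("http://openflydata.org/id/flyatlas/affyid/1616608_a_at",
     "http://purl.org/NET/flyatlas/schema#gene",
     "http://openflydata.org/id/flybase/feature/FBgn0001128"),
]

links = level1_links(biocyc, flyatlas)
# IRIs reached from more than one subject are the new join points.
targets = [o for _, _, o in links]
shared = {o for o in targets if targets.count(o) > 1}
print(shared)  # {'http://identifiers.org/flybase/FBgn0001128'}
```

Without the case normalisation, the BioCyc side would produce .../FBGN0001128 and the two datasets would still not meet.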
Integration Level 2: adding property characteristics
BP = <http://www.biopax.org/release/biopax-level3.owl#>
BP:Protein BP:controls BP:Catalysis
BP:Catalysis BP:controls BP:BiochemicalReaction
(inferred) BP:Protein BP:controls BP:BiochemicalReaction
CONSTRUCT { ?y GB:controlledBy ?x }
WHERE { ?x BP:controls ?catalysis . ?catalysis BP:controls ?y }
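The Level 2 property chain is just a two-hop join. The sketch below restates that CONSTRUCT over string-tuple triples; the `ex:` names are hypothetical, and `BP:controls` and `GB:controlledBy` are used exactly as the slide writes them.

```python
def controlled_by(triples):
    """Two-hop chain: ?x controls ?c . ?c controls ?y  =>  ?y GB:controlledBy ?x."""
    succ = {}
    for s, p, o in triples:
        if p == "BP:controls":
            succ.setdefault(s, set()).add(o)
    out = set()
    for x, mids in succ.items():
        for c in mids:
            # Follow one more BP:controls hop from the intermediate node.
            for y in succ.get(c, ()):
                out.add((y, "GB:controlledBy", x))
    return out


triples = [
    ("ex:protein1", "BP:controls", "ex:catalysis1"),
    ("ex:catalysis1", "BP:controls", "ex:reaction1"),
]
print(controlled_by(triples))  # {('ex:reaction1', 'GB:controlledBy', 'ex:protein1')}
```

Repeating such a step until no new triples appear would give a full transitive closure for longer chains; the single CONSTRUCT covers only the Protein, Catalysis, Reaction hop.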
Integration Level 3: class subsumption
flyatlas = <http://purl.org/NET/flyatlas/schema#>
<http://openflydata.org/id/flyatlas/affyid/1616608_a_at> a flyatlas:ProbeData
BP = <http://www.biopax.org/release/biopax-level3.owl#>
flyatlas:ProbeData rdfs:subClassOf BP:DNARegion
CONSTRUCT { ?x a BP:DNARegion }
WHERE { ?x a flyatlas:ProbeData }
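Level 3 materialises class membership instead of asking a reasoner for it. A minimal sketch, assuming a hand-written `subclass_of` dictionary and string-tuple triples; unlike the single CONSTRUCT, it also follows chains of rdfs:subClassOf.

```python
def materialise_types(triples, subclass_of):
    """Add (x, "a", Super) for every (x, "a", Sub), following subClassOf chains."""
    def supers(cls):
        # Depth-first walk up the hierarchy; `seen` also guards against cycles.
        seen, stack = set(), [cls]
        while stack:
            for sup in subclass_of.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        return seen

    out = set(triples)
    for s, p, o in triples:
        if p == "a":
            for sup in supers(o):
                out.add((s, "a", sup))
    return out


PROBE = "http://openflydata.org/id/flyatlas/affyid/1616608_a_at"
triples = [(PROBE, "a", "flyatlas:ProbeData")]
subclass_of = {"flyatlas:ProbeData": ["BP:DNARegion"]}
view = materialise_types(triples, subclass_of)
print(sorted(view))
```

The probe now carries both its FlyAtlas type and BP:DNARegion, so BioPAX-shaped queries can find it.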
Connect BiochemicalReactions to Expression Values
SELECT ?name ?id ?mean
WHERE { ?reaction a BP:BiochemicalReaction ; BP:standardName ?name ; GB:controlledBy ?protein .
  ?protein a BP:Protein ; BP:xref ?id .
  ?probe a BP:DNARegion ; BP:xref ?id ; flyatlas:l_fatbody ?blank .
  ?blank flyatlas:mean ?mean }
LIMIT 5
No Reasoner – just a few SPARQL CONSTRUCTs
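Once Levels 1 to 3 have been materialised, the query above is an ordinary basic graph pattern match with no inference step. The toy matcher below sketches that idea; it is not a SPARQL engine. Terms starting with "?" are variables, triples are string tuples, and every `ex:`/`idorg:` resource and the "placeholder-mean" value are made up for illustration.

```python
def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all (s, p, o) patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Variable: bind it, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Hypothetical data, as it might look after the three CONSTRUCTs.
triples = [
    ("ex:rxn1", "a", "BP:BiochemicalReaction"),
    ("ex:rxn1", "BP:standardName", "example reaction"),
    ("ex:rxn1", "GB:controlledBy", "ex:prot1"),
    ("ex:prot1", "a", "BP:Protein"),
    ("ex:prot1", "BP:xref", "idorg:FBgn0001128"),
    ("ex:probe1", "a", "BP:DNARegion"),
    ("ex:probe1", "BP:xref", "idorg:FBgn0001128"),
    ("ex:probe1", "flyatlas:l_fatbody", "ex:expr1"),
    ("ex:expr1", "flyatlas:mean", "placeholder-mean"),
]
query = [
    ("?reaction", "a", "BP:BiochemicalReaction"),
    ("?reaction", "BP:standardName", "?name"),
    ("?reaction", "GB:controlledBy", "?protein"),
    ("?protein", "a", "BP:Protein"),
    ("?protein", "BP:xref", "?id"),
    ("?probe", "a", "BP:DNARegion"),
    ("?probe", "BP:xref", "?id"),
    ("?probe", "flyatlas:l_fatbody", "?blank"),
    ("?blank", "flyatlas:mean", "?mean"),
]
rows = [(b["?name"], b["?id"], b["?mean"]) for b in match(query, triples)]
print(rows)  # [('example reaction', 'idorg:FBgn0001128', 'placeholder-mean')]
```

The shared `?id` variable does all the integration work; it only succeeds because the earlier steps gave both sides the same node.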
Client Architecture
Vocabularies in Linked Data
What does the linked data cloud know about drugs?
SELECT DISTINCT ?class WHERE { ?s a ?class }
Result: more than 100 classes, including:
chembl:Activity, chembl:Assay, chembl:AssayCategory, chembl:AssayTargetLink, chembl:ChemicalCompound, chembl:DrugTarget, chembl:LiteratureCitation, dailymed:drugs, drugbank:Drug, drugbank:DrugInteraction, drugbank:EnzymeLink, drugbank:ExternalIdentifier, drugbank:ExternalLink, drugbank:LiteratureCitation, drugbank:Molecule, drugbank:OrganismSpecies, drugbank:Patent, drugbank:ProteinSequence, drugbank:TargetLink, entrez:EnsemblReference, entrez:Gene, pdb:Molecule, pdb:Structure, pubmed:Chemical, pubmed:Citation, pubmed:DatabankReference
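The class-discovery query amounts to collecting the distinct rdf:type objects. Grouping them by prefix, as in the sketch below, makes the point of the slide visible: drug-related classes are scattered across many vocabularies. The instance names are hypothetical; the class names are taken from the list above.

```python
from collections import defaultdict


def classes_by_vocabulary(triples):
    """Distinct rdf:type objects (SELECT DISTINCT ?class), grouped by CURIE prefix."""
    groups = defaultdict(set)
    for _, p, o in triples:
        if p == "a":
            prefix, _, local = o.partition(":")
            groups[prefix].add(local)
    return {prefix: sorted(names) for prefix, names in groups.items()}


triples = [
    ("ex:d1", "a", "drugbank:Drug"),
    ("ex:d2", "a", "drugbank:Drug"),
    ("ex:c1", "a", "chembl:ChemicalCompound"),
    ("ex:m1", "a", "pdb:Molecule"),
]
print(classes_by_vocabulary(triples))
# {'drugbank': ['Drug'], 'chembl': ['ChemicalCompound'], 'pdb': ['Molecule']}
```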
Create a tighter, more unified “view” under one schema
Unified vocabulary: what does the linked data cloud know about drugs?
Map Classes and Properties into a single instantiated view
Before Query
SELECT *
WHERE { ?s a drugb:Drug ; drugb:calculatedInChIKey ?inchiD .
  ?c a chembl:ChemicalCompound ; chembl:standardInChIKey ?inchiC .
  FILTER regex(?inchiD, ?inchiC) }
After Query
SELECT *
WHERE { ?s a GB:Drug ; GB:inchiKey ?inchi }
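The unified view can be thought of as a pair of mapping tables applied once, after which the short query runs against a single schema. A minimal sketch, assuming hypothetical `CLASS_MAP` and `PROP_MAP` tables, string-tuple triples and placeholder InChIKey values; `GB:` is the presenters' unifying namespace, whose full IRI is not given here.

```python
# Hypothetical mapping tables from source vocabularies into the GB: view.
CLASS_MAP = {"drugb:Drug": "GB:Drug", "chembl:ChemicalCompound": "GB:Drug"}
PROP_MAP = {
    "drugb:calculatedInChIKey": "GB:inchiKey",
    "chembl:standardInChIKey": "GB:inchiKey",
}


def build_view(triples):
    """Rewrite mapped classes and properties into the single GB: schema; drop the rest."""
    view = set()
    for s, p, o in triples:
        if p == "a" and o in CLASS_MAP:
            view.add((s, "a", CLASS_MAP[o]))
        elif p in PROP_MAP:
            view.add((s, PROP_MAP[p], o))
    return view


def drugs_with_inchikey(view):
    """The 'after' query: SELECT * WHERE { ?s a GB:Drug ; GB:inchiKey ?inchi }."""
    drugs = {s for s, p, o in view if p == "a" and o == "GB:Drug"}
    return sorted((s, o) for s, p, o in view if p == "GB:inchiKey" and s in drugs)


source = [
    ("drugbank:DB-example", "a", "drugb:Drug"),
    ("drugbank:DB-example", "drugb:calculatedInChIKey", "KEY-A"),
    ("chembl:CHEMBL-example", "a", "chembl:ChemicalCompound"),
    ("chembl:CHEMBL-example", "chembl:standardInChIKey", "KEY-A"),
    ("chembl:CHEMBL-example", "chembl:other", "ignored"),
]
view = build_view(source)
print(drugs_with_inchikey(view))
# [('chembl:CHEMBL-example', 'KEY-A'), ('drugbank:DB-example', 'KEY-A')]
```

Keeping the mappings as data means that when a source changes, only the tables need editing; the API's query stays the same.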
Linked Data Architecture
Creating fixed “views” of Linked Data
• When the use of integrated data is fixed, e.g. by an API or application, querying Linked Data directly can be expensive:
– Changes to the data require significant recoding
– Multiple schemas make queries long and inefficient
• A view, or middle layer of data, is used by the API; changes to the data are managed in the view, so the API is minimally disturbed
– Views are easier to query
– Views are faster to query
• The client gets the best of both worlds: a tight view of data for API queries, while still keeping all the advantages of a linked data strategy.
Summary
● Exposing data as RDF does not equal Linked Data
● Making data linked is not hard
– Node IRIs
– Unifying classes
– Transitive closure of properties
● A little semantics goes a long way (no reasoner required)
● Creating “views” from one schema to another is not hard
– But it should be easier
www.generalbioinformatics.com/science.html