Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets (Paea LePendu)

  • Slide 1
  • Ontology Databases: Detecting Inconsistencies in the Gene Ontology using Not-gadgets. Paea LePendu, University of Oregon. Talk: National Center for Biomedical Ontology, Stanford University, September 2009.
  • Slide 2
  • General Interests: Programming Languages, Automated Reasoning, Databases, Logic
  • Slide 3
  • Outline. Ontology-based Data Management: Background, Motivation; Theory; Benchmarking; Application Domain, Query Answering. Inconsistency Detection: Theory; The serotonin example; GO plus ZFIN, MGI annotations.
  • Slide 4
  • Ontology-based Database Integration: reducing database integration to ontology translation
  • Slide 5
  • Slide 6
  • Ontology-based Data Management
  • Slide 7
  • Figure (labels only): Ontology, User, Data Annotation, Data Management, Data Access Layer
  • Slide 8
  • Example: sisters-siblings. This is what we know: All sisters are siblings. Hilary and Lynn are sisters. This is what we want to know: Who are siblings? Obviously, the answer should be: Hilary and Lynn are siblings. { <x,y> | siblingOf(x,y) } { }
  • Slide 9
  • Example: sisters-siblings
  • Slide 10
  • Slide 11
  • Example: The Gene Ontology (figure labels, term: instances). GO_0003674: z01, z02, z03; GO_0005488: e01, e02, e03; GO_0030528: y01, y02, y03; GO_0003677: x01, x02, x03; GO_0003700: w01, w02, w03; GO_0003676: c01, c02, c03; GO_0003723: d01, d02, d03; GO_0008135: a01, a02, a03; GO_0045182: b01, b02, b03
  • Slide 13
  • Ontology Databases: General Models for Database Designs. Generality is important: avoid rewriting. Scalability of KB is important: persistence, caching and indexing. Major generic models: Horizontal Models, Vertical Models, Decomposition Storage Models.
  • Slide 14
  • Ontology Databases: View-based Approach. Class hierarchy: Person, with subclasses Female and Male. [Pan & Heflin. DLDB: Extending Relational Databases to Support Semantic Web Queries. ISWC, 2003.]
    CREATE VIEW v_Person(id) AS
      SELECT id FROM Person
      UNION SELECT id FROM v_Male
      UNION SELECT id FROM v_Female
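
A minimal sketch of the view-based approach on this slide, assuming SQLite driven from Python's standard sqlite3 module (the slide does not name a database engine). The leaf views v_Male and v_Female and the sample ids are assumptions added so that the slide's v_Person view has something to select from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id TEXT PRIMARY KEY);
    CREATE TABLE Male   (id TEXT PRIMARY KEY);
    CREATE TABLE Female (id TEXT PRIMARY KEY);

    -- Assumed leaf views: Male and Female have no subclasses here,
    -- so each view simply exposes its own table.
    CREATE VIEW v_Male(id)   AS SELECT id FROM Male;
    CREATE VIEW v_Female(id) AS SELECT id FROM Female;

    -- The view from the slide: a Person is anything asserted directly
    -- as a Person, or anything in a subclass view.
    CREATE VIEW v_Person(id) AS
        SELECT id FROM Person
        UNION SELECT id FROM v_Male
        UNION SELECT id FROM v_Female;

    -- Hypothetical sample data.
    INSERT INTO Person VALUES ('pat');
    INSERT INTO Male   VALUES ('john');
    INSERT INTO Female VALUES ('hilary');
""")

# Subclass membership is computed at query time, when the view is read.
persons = sorted(row[0] for row in conn.execute("SELECT id FROM v_Person"))
print(persons)  # ['hilary', 'john', 'pat']
```

In this sketch the inference cost is paid on every read of v_Person, since the unions are evaluated when the view is queried.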
  • Slide 15
  • Ontology Databases: Active Database Approach. Class hierarchy: Person, with subclasses Female and Male. [LePendu, et al. Ontology Database: a New Method for Semantic Modeling and an Application to Brainwave Data. SSDBM, 2008.]
    ON INSERT into Male: INSERT into Person
    ON INSERT into Female: INSERT into Person
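
A minimal sketch of the trigger rules on this slide, again assuming SQLite through Python's sqlite3 module. The trigger names and sample ids are hypothetical, and the slide's "ON INSERT" pseudocode is rendered here as SQLite AFTER INSERT triggers, which is one possible reading rather than the author's actual implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (id TEXT PRIMARY KEY);
    CREATE TABLE Male   (id TEXT PRIMARY KEY);
    CREATE TABLE Female (id TEXT PRIMARY KEY);

    -- ON INSERT into Male: INSERT into Person
    CREATE TRIGGER male_is_person AFTER INSERT ON Male
    BEGIN
        INSERT OR IGNORE INTO Person (id) VALUES (NEW.id);
    END;

    -- ON INSERT into Female: INSERT into Person
    CREATE TRIGGER female_is_person AFTER INSERT ON Female
    BEGIN
        INSERT OR IGNORE INTO Person (id) VALUES (NEW.id);
    END;

    -- Hypothetical sample data: nothing is inserted into Person directly.
    INSERT INTO Male   VALUES ('john');
    INSERT INTO Female VALUES ('hilary');
""")

# The subclass facts were pushed into Person at load time by the triggers.
persons = sorted(row[0] for row in conn.execute("SELECT id FROM Person"))
print(persons)  # ['hilary', 'john']
```

Compared with a view, the work here happens once at insert time, so a later query on Person is a plain table read.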
  • Slide 28
  • Example: sisters-siblings (revisited). This is what we know: All sisters are siblings. Hilary and Lynn are sisters. This is what we want to know: Who are siblings? Obviously, the answer should be: Hilary and Lynn are siblings. { <x,y> | siblingOf(x,y) } Just look it up! { }
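
The sisters-siblings example can be sketched the same way, assuming SQLite through Python's sqlite3 module. The table layout (one two-column table per relation) and the trigger name are assumptions for illustration. The rule "all sisters are siblings" is applied when the fact is loaded, so answering "who are siblings?" is a plain lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sisterOf  (x TEXT, y TEXT, PRIMARY KEY (x, y));
    CREATE TABLE siblingOf (x TEXT, y TEXT, PRIMARY KEY (x, y));

    -- "All sisters are siblings."
    CREATE TRIGGER sisters_are_siblings AFTER INSERT ON sisterOf
    BEGIN
        INSERT OR IGNORE INTO siblingOf (x, y) VALUES (NEW.x, NEW.y);
    END;

    -- "Hilary and Lynn are sisters."
    INSERT INTO sisterOf VALUES ('Hilary', 'Lynn');
""")

# "Who are siblings?" needs no reasoning at query time: just look it up.
siblings = conn.execute("SELECT x, y FROM siblingOf").fetchall()
print(siblings)  # [('Hilary', 'Lynn')]
```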
  • Slide 29
  • Lehigh University Benchmark (LUBM): Load Time and Query Time (1.5 million facts; 10 universities, 20 departments) [Guo, et al. LUBM: A Benchmark for OWL Knowledge Base Systems. J Web Semantics, 2005.]
  • Slide 30
  • Ontology-based Data Management [Frishkoff, et al. Development of Neural Electromagnetic Ontologies (NEMO): Ontology-based Tools for Representation and Integration of Event-related Brain Potentials. ICBO, 2009]
  • Slide 31
  • Ontology-based Query Answering: Return all data instances that belong to ERP pattern classes which have a surface positivity over frontal regions of interest and are earlier than the N400. Which patterns have a region of interest that is left-occipital and manifests between 220 and 300ms? What is the range of intensity mean for the region of interest for N100? Show the region of interest for all ERP patterns that occur between 0 and 300ms. Which PCA factor do P100 patterns most often appear in? What is the range of intensity mean for the region of interest for N100 patterns? Show the patterns whose region of interest is left occipital and occurs between 220 and 300ms.
  • Slide 32
  • Inconsistency Detection: Background and Motivation; Expressiveness (from disjunctions to negations); Theory (not-gadgets); Motivation (serotonin example, ATP-gated cation channel activity); Results from ZFIN and MGI annotations
  • Slide 33
  • Not-gadgets
  • Slide 34
  • Example: inconsistency detection. "Annotations in this way sometimes point to errors in the type-type relationships described in the ontology. An example is the recent removal of the type serotonin secretion as an is_a child of neurotransmitter secretion from the GO Biological Process ontology. This modification was made as a result of an annotation from a paper showing that serotonin can be secreted by cells of the immune system where it does not act as a neurotransmitter." [Hill, et al. Gene Ontology annotations: what they mean and where they come from. BMC Bioinformatics, 2008]
  • Slide 35
  • Example: serotonin secretion (figure labels: gene-x; not-gadget; fail!)
  • Slide 36
  • Example: GO:0004931 ATP-gated cation channel activity (as of 3/09):
    [Term]
    id: GO:0004931
    name: ATP-gated cation channel activity
    namespace: molecular_function
    def: "Catalysis of the transmembrane transfer of an ion by a channel that opens when extracellular ATP has been bound by the channel complex or one of its constituent parts." [GOC:mah, PMID:9755289]
    comment: Note that this term refers to an activity and not a gene product. Consider also annotating to the molecular function term 'purinergic nucleotide receptor activity ; GO:0001614'.
    synonym: "P2X activity" RELATED []
    synonym: "purinoceptor" BROAD []
    synonym: "purinoreceptor" BROAD []
    is_a: GO:0005231 ! excitatory extracellular ligand-gated ion channel activity
    is_a: GO:0005261 ! cation channel activity
  • Slide 37
  • Example: GO:0004931 sub-graph (using Jambalaya)
  • Slide 38
  • Example: GO:0004931. What is so interesting about GO:0004931? Two ZFIN annotations for the same gene (fields separated by "; "). Source: [1/13/2009] http://www.geneontology.org/gene-associations/
    ZFIN; ZDB-GENE-030319-2; p2rx2; NOT; GO:0004931; ZFIN:ZDB-PUB-031031-8|PMID:14580944; IDA; F; purinergic receptor P2X, ligand-gated ion channel, 2; gene; taxon:7955; 20071005; ZFIN
    ZFIN; ZDB-GENE-030319-2; p2rx2; GO:0004931; ZFIN:ZDB-PUB-031031-8|PMID:14580944; IGI; ZFIN:ZDB-GENE-000427-3; F; purinergic receptor P2X, ligand-gated ion channel, 2; gene; taxon:7955; 20071005; ZFIN
  • Slide 39
  • Example: GO:0004931. The not-gadget will raise a logical inconsistency. Annotations: p2rx2 NOT GO:0004931; p2rx2 GO:0004931. Tables: GO_0004931; _GO_0004931 (row: p2rx2). * Tables starting with an '_' are negations. not-gadget: fail!
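
One way to picture a not-gadget for this example, sketched with SQLite triggers through Python's sqlite3 module. The slides do not show how not-gadgets are implemented, so the trigger-based mechanism, the trigger names and the error messages below are assumptions; only the table names GO_0004931 and _GO_0004931 and the gene symbol p2rx2 come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE GO_0004931  (id TEXT PRIMARY KEY);  -- positive annotations
    CREATE TABLE _GO_0004931 (id TEXT PRIMARY KEY);  -- negated (NOT) annotations

    -- Assumed not-gadget: reject a positive fact if its negation is present.
    CREATE TRIGGER not_gadget_pos BEFORE INSERT ON GO_0004931
    BEGIN
        SELECT RAISE(ABORT, 'inconsistent: NOT GO:0004931 already asserted')
        WHERE EXISTS (SELECT 1 FROM _GO_0004931 WHERE id = NEW.id);
    END;

    -- And the mirror image: reject a negated fact if the positive is present.
    CREATE TRIGGER not_gadget_neg BEFORE INSERT ON _GO_0004931
    BEGIN
        SELECT RAISE(ABORT, 'inconsistent: GO:0004931 already asserted')
        WHERE EXISTS (SELECT 1 FROM GO_0004931 WHERE id = NEW.id);
    END;
""")

# Load the two ZFIN annotations for p2rx2: first the NOT, then the positive.
conn.execute("INSERT INTO _GO_0004931 VALUES ('p2rx2')")
try:
    conn.execute("INSERT INTO GO_0004931 VALUES ('p2rx2')")
    inconsistent = False
except sqlite3.DatabaseError as err:
    inconsistent = True
    print("not-gadget fail!", err)
```

In this sketch the second insert is rejected, so the contradiction surfaces at load time instead of waiting for a query to expose it.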
  • Slide 43
  • ZFIN
  • Slide 44
  • Slide 45
  • MGI
  • Slide 46
  • Slide 47
  • Slide 48
  • ZFIN - MGI
  • Slide 49
  • ZFIN
  • Slide 50
  • Outcome: suspect IEA annotations
  • Slide 51
  • GO Online SQL Environment (GOOSE). Source: [1/13/2009] http://www.geneontology.org/GO.database.shtml#diagram. Query: pos, IEA (graph_path x association) x neg (graph_path x association)
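
A rough sketch of the kind of join named on this slide, assuming SQLite through Python's sqlite3 module. The two tables are hypothetical, simplified stand-ins: the real GO database schema uses numeric term ids and stores more columns, and the column names, the gene ids and the evidence codes attached to them are invented for illustration. The is_a link is the historical serotonin secretion relationship quoted earlier in the talk. The query uses a single graph_path join where the slide joins both sides, which gives the same matches when graph_path holds a reflexive closure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified schema (not the real GO database layout).
    CREATE TABLE graph_path  (ancestor TEXT, descendant TEXT);
    CREATE TABLE association (gene TEXT, term TEXT, is_not INTEGER, evidence TEXT);

    -- Reflexive, transitive closure of is_a for two terms.
    INSERT INTO graph_path VALUES
        ('neurotransmitter secretion', 'neurotransmitter secretion'),
        ('serotonin secretion',        'serotonin secretion'),
        ('neurotransmitter secretion', 'serotonin secretion');

    -- Invented annotations: gene-x is contradictory, gene-y is not.
    INSERT INTO association VALUES
        ('gene-x', 'serotonin secretion',        0, 'IEA'),
        ('gene-x', 'neurotransmitter secretion', 1, 'IDA'),
        ('gene-y', 'serotonin secretion',        0, 'IEA');
""")

# A positive IEA annotation to a term conflicts with a NOT annotation
# on the same gene at that term or at any of its ancestors.
conflicts = conn.execute("""
    SELECT pos.gene, pos.term, neg.term
    FROM association AS pos
    JOIN graph_path  AS gp  ON gp.descendant = pos.term
    JOIN association AS neg ON neg.term = gp.ancestor AND neg.gene = pos.gene
    WHERE pos.is_not = 0 AND pos.evidence = 'IEA' AND neg.is_not = 1
""").fetchall()
print(conflicts)  # [('gene-x', 'serotonin secretion', 'neurotransmitter secretion')]
```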
  • Slide 52
  • What do logical inconsistencies mean? Several possibilities: Incorrect annotation (e.g., suspect IEA annotations); Incorrect relationship (e.g., serotonin secretion); Incomplete model. Recall the annotation below (fields separated by "; "). Perfectly admissible!
    ZFIN; ZDB-GENE-030319-2; p2rx2; GO:0004931; ZFIN:ZDB-PUB-031031-8|PMID:14580944; IGI; ZFIN:ZDB-GENE-000427-3; F; purinergic receptor P2X, ligand-gated ion channel, 2; gene; taxon:7955; 20071005; ZFIN
  • Slide 53
  • Next Directions: Explanation and proof-reconstruction; Deep (data) annotation tools; Distributed network of Ontology Databases
  • Slide 54
  • Data Annotation: Neural ElectroMagnetic Ontologies (figure labels: LFRON, RFRON, frontocentral) [Frishkoff, et al. ERP measures of partial semantic knowledge: Left temporal indices of skill differences and lexical quality. Biological Psychology, 2009.]
  • Slide 55
  • Network of Ontology Databases [Thorisson, Muilu and Brookes. Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nature Reviews, 2009.]
  • Slide 56
  • Thank you. Questions?
  • Slide 57
  • Slide 58
  • Andrea's Example: Is John supervised by a TopManager who is a friend of an AreaManager? [Franconi. Ontologies and databases: myths and challenges. VLDB, 2008.]
  • Slide 59
  • Raymond Reiter [Reiter. Deductive Question-Answering on Relational Data Bases. Logic and Data Bases, 1977]
  • Slide 60
  • Raymond Reiter
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Benchmarking Suite
  • Slide 65
  • Origins
  • Slide 66
  • CIS @ UO
  • Slide 67
  • Research Areas in Computer Science: software engineering, programming languages, human-computer interaction, parallel and distributed computing, networking and graph theory, scientific computation/visualization, information integration and mining. Affiliates: Neurosciences Institute, Computational Science Institute, Zebrafish Information Network.
  • Slide 68
  • Ontology-based Data Access [Rodriguez-Muro, et al. Realizing Ontology Based Data Access: A plug-in for Protégé. ICDEW, 2008.]