Semantic Web & Semantic Web Processes

Post on 25-Feb-2016

DESCRIPTION

Semantic Web & Semantic Web Processes. A course at Universidade da Madeira, Funchal, Portugal, June 16-18, 2005. Dr. Amit P. Sheth: Professor, Computer Sc., Univ. of Georgia; Director, LSDIS lab; CTO/Co-founder, Semagix, Inc. Special Thanks: Cartic Ramakrishnan, Karthik Gomadam.

TRANSCRIPT

  • Semantic Web & Semantic Web Processes
    A course at Universidade da Madeira, Funchal, Portugal
    June 16-18, 2005

    Dr. Amit P. Sheth
    Professor, Computer Sc., Univ. of Georgia
    Director, LSDIS lab
    CTO/Co-founder, Semagix, Inc.
    Special Thanks: Cartic Ramakrishnan, Karthik Gomadam

  • Agenda 1
    Part I
    What is the Semantic Web?
    What makes the Semantic Web
      Ontologies: importance of relationships and knowledge
    Representation and languages
      Why XML is not enough
      Describing Semantic Web resources: RDF and RDFS
      OWL
    Query processing and storage
    Part II
    Metadata, enabling techniques and technologies
    Ontology and knowledge engineering: ontology design, ontology population, maintaining ontology freshness
    Automated metadata extraction and annotation
    Computation and reasoning with focus on relationships
    Example commercial Semantic Web platform

  • Agenda 2
    Part III
    Semantic Web applications: search, integration, analysis
      Pan-Web and consumer-centric
      Enterprise
    Part IV
    Semantic Web Services and Processes
      What are Web Services?
      What are Web processes?
      Creating Web processes: annotation, discovery, composition, etc.
      Semantic Web Service/Process tools

  • Part I
    What is the Semantic Web?
    What makes the Semantic Web
      Ontologies: importance of relationships and knowledge
      Types and examples of ontologies
      Metadata and semantic annotation: metadata classifications
    Representation and languages
      Why XML is not enough
      RDF and RDFS: describing Semantic Web resources; RDF as a triple, RDF as a graph (show example RDF/S)
      OWL
    RDF query processing and storage

  • Three generations of information systems: where we have come from, where we are going

  • Broad Scope of Semantic (Web) Technology
    Other dimensions: how agreements are reached, ...
    Lots of useful semantic technology (interoperability, integration)
    Cf. Guarino, Gruber

  • What is the Semantic Web?
    "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
    -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001
    Ontologies
    RDF/RDFS or OWL
    Syntax machine processable
    Semantic metadata annotation of web resources

  • An ontology is a specification of a conceptualization (T. Gruber)
    A conceptualization is the way we think about a domain
    A specification provides a formal way of writing it down

    Building Ontologies from the Ground Up: When users set out to model their professional activity. Mark Musen

  • Conceptualization and Ontology
    http://www.w3c.it/events/minerva20040706/guarino.pdf

  • Central Role of Ontology
    Ontology represents agreement, represents common terminology/nomenclature
    Ontology is populated with extensive domain knowledge or known facts/assertions
    Key enabler of semantic metadata extraction from all forms of content:
      unstructured text (and 150 file formats)
      semi-structured (HTML, XML) and structured data
    Ontology is in turn the centerpiece that enables:
      resolution of semantic heterogeneity
      semantic integration
      semantically correlating/associating objects and documents

  • Types of Ontologies (or things close to ontology)
    Upper ontologies: modeling of time, space, process, etc.
    Broad-based or general-purpose ontologies/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet
    Domain-specific or industry-specific ontologies
      News: politics, sports, business, entertainment
      Financial market
      Terrorism
      Pharma
      GlycO, ProPreO
      (GO (a nomenclature), UMLS inspired ontology, ), MGED
    Application-specific and task-specific ontologies
      Anti-money laundering
      Equity research
      Repertoire management
      Financial irregularity

    Fundamentally different approaches are used in developing ontologies at the two ends of the above spectrum

  • Building ontology
    Three broad approaches:
    1. Social process/manual: many years, committees
       Can be based on metadata standard
    2. Automatic taxonomy generation (statistical clustering/NLP): limitations/problems with quality, dependence on corpus, naming
    3. Descriptional component (schema) designed by domain experts; description base (assertional component, extension) populated using automated processes from trusted knowledge sources
    Option 2 is being investigated in several research projects; Option 3 is currently supported by Semagix Freedom
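The third approach can be illustrated with a minimal sketch: a small expert-designed schema, and a population step that accepts only those extracted records that conform to it. This is an illustration under stated assumptions, not a description of how Semagix Freedom works; the schema, the record format, the `populate` function, and all entity names are hypothetical.

```python
# Descriptional component (schema): classes and typed relationships,
# as a domain expert might define them. All names are hypothetical.
SCHEMA = {
    "classes": {"Company", "Person"},
    "relationships": {"ceo_of": ("Person", "Company")},  # (domain, range)
}

# Records as an extractor might deliver them from a trusted source.
# This sketch assumes entity records arrive before relationship records.
SOURCE_RECORDS = [
    {"entity": "person_1", "class": "Person"},
    {"entity": "company_1", "class": "Company"},
    {"subject": "person_1", "relationship": "ceo_of", "object": "company_1"},
    {"subject": "company_1", "relationship": "ceo_of", "object": "person_1"},  # wrong domain/range
    {"entity": "thing_1", "class": "Gadget"},  # class not in the schema
]


def populate(schema, records):
    """Build the assertional component, rejecting records that violate the schema."""
    entities = {}    # entity name -> class
    assertions = []  # (subject, relationship, object)
    rejected = []
    for rec in records:
        if "entity" in rec:
            if rec["class"] in schema["classes"]:
                entities[rec["entity"]] = rec["class"]
            else:
                rejected.append(rec)
            continue
        signature = schema["relationships"].get(rec["relationship"])
        if (
            signature is not None
            and entities.get(rec["subject"]) == signature[0]
            and entities.get(rec["object"]) == signature[1]
        ):
            assertions.append((rec["subject"], rec["relationship"], rec["object"]))
        else:
            rejected.append(rec)
    return entities, assertions, rejected


entities, assertions, rejected = populate(SCHEMA, SOURCE_RECORDS)
print(assertions)
print(len(rejected), "records rejected")
```

The point of the split is that experts only maintain the small schema, while the much larger set of instances and relationships is filled in automatically and checked against it.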

  • SUMO -- http://ontology.teknowledge.com/

  • Part of the CYC Upper Ontology

    http://www.cyc.com/cyc/technology/whatiscyc_dir/whatdoescycknow

  • SWETO (Semantic Web Testbed Ontology): Current Status
    Developed using Semagix technology for free non-commercial usage by the SW community; some initial users
    V1.4 population includes over 800,000 entities and over 1,500,000 explicit relationships among them
    Continue to populate the ontology with diverse sources, thereby extending it in multiple domains; new smaller and larger releases due soon; RDF and OWL versions
    Significant information for provenance/trust support [UMBC partnership]
    97% of disambiguation performed automatically, 2% manually; not quite high-quality as an evaluation test set (e.g., low connectivity)
    Working on test harness, quality measures, and benchmarks

  • Expressiveness Range: Knowledge Representation and Ontologies
    [Figure: spectrum running from simple taxonomies to expressive ontologies]
    Levels shown: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value restriction; Disjointness, inverse, part of; General logical constraints
    Examples placed along the spectrum: WordNet, CYC, RDF, DAML, OO, DB Schema, RDFS, IEEE SUO, OWL, UMLS, GO, KEGG, GlycO, SWETO, Pharma
    Ontology dimensions after McGuinness and Finin

  • Gene Ontology (GO)
    Comprises three independent ontologies:
      molecular function of gene products
      cellular component of gene products
      biological process representing the gene product's higher-order role
    Uses these terms as attributes of gene products in the collaborating databases (gene product associations)
    Allows queries across databases using GO terms, providing linkage of biological information across species
    http://www.geneontology.org/

  • GO = Three Ontologies
    Molecular Function: elemental activity or task (example: DNA binding)
    Cellular Component: location or complex (example: cell nucleus)
    Biological Process: goal or objective within cell (example: secretion)

    http://www.geneontology.org/
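As a rough sketch of what querying across databases using GO terms can look like, the snippet below annotates gene products in two databases with shared GO identifiers and then retrieves everything tagged with one term. The two databases, the product names, and the species labels are placeholders invented for illustration, and `find_by_go_term` is a hypothetical helper. The three GO identifiers are meant to match the examples on the slide and should be checked against the GO site before reuse.

```python
# One term per GO ontology, matching the examples on the slide.
GO_TERMS = {
    "GO:0003677": ("molecular_function", "DNA binding"),
    "GO:0005634": ("cellular_component", "nucleus"),
    "GO:0046903": ("biological_process", "secretion"),
}

# Hypothetical annotation tables from two collaborating databases.
# Product and species names are placeholders, not real records.
DB_A = [
    {"product": "productA1", "species": "species_1", "go": {"GO:0003677", "GO:0005634"}},
    {"product": "productA2", "species": "species_1", "go": {"GO:0046903"}},
]
DB_B = [
    {"product": "productB1", "species": "species_2", "go": {"GO:0003677"}},
    {"product": "productB2", "species": "species_2", "go": {"GO:0005634", "GO:0046903"}},
]


def find_by_go_term(term_id, *databases):
    """Return sorted (species, product) pairs annotated with term_id in any database."""
    if term_id not in GO_TERMS:
        raise KeyError(f"unknown GO term: {term_id}")
    hits = []
    for db in databases:
        for record in db:
            if term_id in record["go"]:
                hits.append((record["species"], record["product"]))
    return sorted(hits)


dna_binding = find_by_go_term("GO:0003677", DB_A, DB_B)
print(dna_binding)
```

Because both databases use the same term identifiers as attributes, a single lookup links gene products from different species without any database-specific mapping.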

  • GlycO
    GlycO: a domain ontology embodying knowledge of the structure and metabolisms of glycans
    Contains 770 classes that describe structural features of glycans
    URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco
    GlycO:
      is a focused ontology for the description of glycomics
      models the biosynthesis, metabolism, and biological relevance of complex glycans
      models complex carbohydrates as sets of simpler structures that are connected with rich relationships

  • GlycO statistics: ontology schema can be large and complex
    770 classes
    142 slots
    Instances extracted with Semagix Freedom:
      69,516 genes (from PharmGKB and KEGG)
      92,800 proteins (from SwissProt)
      18,343 publications (from CarbBank and MedLine)
      12,308 chemical compounds (from KEGG)
      3,193 enzymes (from KEGG)
      5,872 chemical reactions (from KEGG)
      2,210 N-glycans (from KEGG)

  • GlycO taxonomy
    The first levels of the GlycO taxonomy
    Most relationships and attributes in GlycO
    GlycO exploits the expressiveness of OWL-DL.
    Cardinality constraints, value constraints, and existential and universal restrictions on the range and domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships.
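To give a feel for how such restrictions can classify an unknown entity, here is a toy sketch in which a class is defined by a cardinality constraint, a universal restriction, and an existential restriction, and an individual is assigned to the class when it satisfies all three. It is a closed-world simplification (a real OWL-DL reasoner works under the open-world assumption and also handles distinctness of individuals), and the class, property, and individual names are hypothetical, not taken from GlycO.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    name: str
    types: set = field(default_factory=set)
    props: dict = field(default_factory=dict)  # property name -> list of Individual


def some_values_from(prop, cls):
    """Existential restriction: at least one value of prop has type cls."""
    return lambda ind: any(cls in v.types for v in ind.props.get(prop, []))


def all_values_from(prop, cls):
    """Universal restriction: every value of prop has type cls (true if there are none)."""
    return lambda ind: all(cls in v.types for v in ind.props.get(prop, []))


def min_cardinality(prop, n):
    """Cardinality constraint: prop has at least n listed values."""
    return lambda ind: len(ind.props.get(prop, [])) >= n


# Hypothetical defined class: conditions treated as necessary and sufficient.
DEFINED_CLASSES = {
    "BranchedChain": [
        min_cardinality("has_residue", 3),
        all_values_from("has_residue", "Residue"),
        some_values_from("has_residue", "BranchPoint"),
    ],
}


def classify(ind):
    """Add every defined class whose conditions the individual satisfies."""
    for cls, conditions in DEFINED_CLASSES.items():
        if all(cond(ind) for cond in conditions):
            ind.types.add(cls)
    return ind.types


r1 = Individual("r1", {"Residue"})
r2 = Individual("r2", {"Residue"})
r3 = Individual("r3", {"Residue", "BranchPoint"})
unknown = Individual("unknown", props={"has_residue": [r1, r2, r3]})
linear = Individual("linear", props={"has_residue": [r1, r2]})

print(classify(unknown))
print(classify(linear))
```

The entity `unknown` starts with no asserted type and ends up in `BranchedChain` purely because of its property values, which is the kind of inference the slide attributes to the restrictions.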

  • Query and visualization

  • A biosynthetic pathway
    GNT-I attaches GlcNAc at position 2

  • The impact of GlycO
    GlycO models classes of glycans with unprecedented accuracy
    Implicit knowledge about glycans can be deductively derived
    Experimental results can be validated according to the model

  • N-Glycosylation Process (NGP)
    [Figure: workflow diagram of the N-glycosylation process]
    Labels shown: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction; Separation technique I; Glycopeptides Fraction; PNGase; Peptide Fraction; Separation technique II; Peptide Fraction; Mass spectrometry; ms data; ms/ms data; Data reduction; ms peaklist; ms/ms peaklist; binning; Peptide identification; Peptide list; N-dimensional array; Data correlation; Signal integration; Glycopeptide identification and quantification (multiplicity labels: 1, n, n*m)
    By N-glycosylation Process, we mean the identification and quantification of glycopeptides

  • ProPreO - Experimental Proteomics Process Ontology
    ProPreO models the phases of a proteomics experiment using five fundamental concepts:

    Data: (Example: a peaklist file from ms/ms raw data)

    Data_processing_applications: (Example: MASCOT* search engine)

    Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)

    Parameter_list: describes the different types of parameter lists associated with experimental phases

    Task: (Example: component separation, used in chromatography)

    * http://www.matrixscience.com/

  • Semantic Annotation of Scientific Data
    830.9570 194.9604 2
    580.2985 0.3592
    688.3214 0.2526
    779.4759 38.4939
    784.3607 21.7736
    1543.7476 1.3822
    1544.7595 2.9977
    1562.8113 37.4790
    1660.7776 476.5043

    ms/ms peaklist data

    830.9570 194.9604 2

    Annotated ms/ms peaklist data

  • Semantic annotation of Scientific DataAnnotated ms/ms peaklist data

    830.9570 194.9604 2
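The annotation markup itself did not survive in this transcript, so the following is only a sketch of the general idea: wrap each raw number in an element whose name says what the number means. It assumes the peaklist above splits into a first line of three values (read here as parent ion m/z, abundance, and charge) followed by two-value lines (read as product ion m/z and abundance). The element and attribute names are hypothetical stand-ins for ontology concepts, not the vocabulary used in the original slides.

```python
import xml.etree.ElementTree as ET

# Values from the ms/ms peaklist slide, one peak per line (assumed split).
RAW_PEAKLIST = """\
830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043
"""


def annotate_peaklist(text):
    """Turn a whitespace-separated peaklist into an XML tree with named fields."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    root = ET.Element("ms_ms_peak_list")  # hypothetical element name
    # Assumption: the first line describes the parent ion.
    mz, abundance, charge = lines[0]
    ET.SubElement(root, "parent_ion", {"m_z": mz, "abundance": abundance, "z": charge})
    # Assumption: every remaining line is a product ion.
    for mz, abundance in lines[1:]:
        ET.SubElement(root, "product_ion", {"m_z": mz, "abundance": abundance})
    return root


annotated = annotate_peaklist(RAW_PEAKLIST)
print(ET.tostring(annotated, encoding="unicode"))
```

Once the values carry names tied to ontology concepts, a program no longer has to know the column order of a particular file format to find, say, the charge of the parent ion.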

  • Syntax for Ontologies and Metadata
    Why not use XML?
    Why use OWL?
    Or for that matter why RDF?
    So many questions
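One way to make the first question concrete is a small sketch in which the same fact is encoded in two different XML layouts. The two documents parse into different trees, and only a hand-written mapping per layout recovers the shared statement as a subject-predicate-object triple. The XML layouts, the predicate name `directs`, and the helper functions are hypothetical; the fact itself (Amit Sheth directs the LSDIS lab) comes from the title slide.

```python
import xml.etree.ElementTree as ET

# Two hypothetical XML encodings of the same fact.
DOC_1 = '<professor name="Amit Sheth"><directs>LSDIS lab</directs></professor>'
DOC_2 = '<lab name="LSDIS lab"><director>Amit Sheth</director></lab>'


def structure(xml_text):
    """Return the nesting of tag names, ignoring text and attributes."""
    def walk(elem):
        return (elem.tag, tuple(walk(child) for child in elem))
    return walk(ET.fromstring(xml_text))


def triples_from_doc_1(xml_text):
    """Layout-specific mapping: the root is the subject, the child is the object."""
    root = ET.fromstring(xml_text)
    return {(root.get("name"), "directs", root.findtext("directs"))}


def triples_from_doc_2(xml_text):
    """Layout-specific mapping: the child is the subject, the root is the object."""
    root = ET.fromstring(xml_text)
    return {(root.findtext("director"), "directs", root.get("name"))}


same_structure = structure(DOC_1) == structure(DOC_2)
same_triples = triples_from_doc_1(DOC_1) == triples_from_doc_2(DOC_2)
print(same_structure, same_triples)
```

The knowledge that both layouts mean the same thing lives in the two mapping functions, not in the XML; a shared data model of triples is one way to state that meaning once instead of per document format.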

  • From XML to OWL
    XML: surface syntax for structured documents; imposes no semantic constraints on the meaning of these documents.
    XML Schema is a language for restricting the structure of XML documents.
    RDF is a datamodel for objects ("resources") and relations between them; it provides a simple semantics for this datamodel, and these datamodels can be represented in an XML syntax.
    RDF Schema is a vocabulary for describing properties and classes of RDF resources with a semanti