semantic web & semantic web processes

Download Semantic Web  &  Semantic Web Processes

Post on 11-Feb-2016

39 views

Category:

Documents

3 download

Embed Size (px)

DESCRIPTION

Semantic Web & Semantic Web Processes. A course at Universidade da Madeira, Funchal, Portugal June 16-18, 2005 Dr. Amit P. Sheth Professor, Computer Sc., Univ. of Georgia Director, LSDIS lab CTO/Co-founder, Semagix , Inc. Special Thanks: Cartic Ramakrishnan , Karthik Gomadam. - PowerPoint PPT Presentation

TRANSCRIPT

  • Semantic Web & Semantic Web ProcessesA course at Universidade da Madeira, Funchal, PortugalJune 16-18, 2005

    Dr. Amit P. ShethProfessor, Computer Sc., Univ. of GeorgiaDirector, LSDIS labCTO/Co-founder, Semagix, IncSpecial Thanks: Cartic Ramakrishnan, Karthik Gomadam

  • Semantic (Web) Technology Applications

    Part III (b) Representative Enterprise Applications

  • Visualizer Content: BSBQ Application

  • -- Breaking News --Gore Demands That Recount RestartGore Says Fla. Can't Name Electors Bush Meets Colin Powell at RanchMarket Tumbles on Earnings WarningBarak Outlines His Peace Plan TALLAHASSEE, Florida (CNN) Though the two presidential candidates have until noon Wednesday to file briefs in Al Gore's appeal to the Florida Supreme Court, the outcome of two trials set on the same day in Leon County, Florida, may offer Gore his best hope for the presidency. Democrats in Seminole County are seeking to have 15,000 absentee ballots thrown out in that heavily Republican jurisdiction -- a move that would give Gore a lead of up to 5,000 votes statewide. Lawyers for the plaintiff, Harry Jacobs, claim the ballots should be rejected because they say County Elections Supervisor Sandra Goard allowed Republican workers to fill out voter identification numbers on 2,126 incomplete absentee ballot applications sent in by GOP voters, while refusing to allow Democratic workers to do the same thing for Democratic voters.

    The GOP says that suit, and one similar to it from Martin County, demonstrates Democratic Party politics at its most desperate. Gore is not a party to either of those lawsuits. On Tuesday, the judge in the(1.33) - 12/06/00 - ABC(2.33) - 12/06/00 - CBS(3.12) - 12/06/00 - NNS(0.32) - 12/06/00 - CBS(1.33) - 12/06/00 - CBSDescriptionProduced by : CNN Posted Date : 12/07/2000 Reporter : David Lewis Event : Election 2000 Location : Tallahassee, Florida, USAPeople : Al Gore

  • Equity Research Dashboard with Blended Semantic Querying and BrowsingSheth et al, 2002 Managing Semantic Content for the Web

  • Global BankAim Legislation (PATRIOT ACT) requires banks to identify who they are doing business with

    Problem Volume of internal and external data needed to be accessed Complex name matching and disambiguation criteria Requirement to risk score certain attributes of this data

    Approach Creation of a risk ontology populated from trusted sources (OFAC etc); Sophisticated entity disambiguation Semantic querying, Rules specification & processing

    Solution Rapid and accurate KYC checks Risk scoring of relationships allowing for prioritisation of results Full visibility of sources and trustworthinessAmit Sheth, From Semantic Search & Integration to Analytics, Proceedings of the KMWorld, October 26, 2004, Santa Clara, CA.

  • Ahmed Yaseer appears on Watchlistmember of organizationworks for CompanyAhmed Yaseer: Appears on Watchlist FBI Works for Company WorldCom Member of organization HamasThe Process

  • Global Investment BankExample of Fraud Prevention application used in financial servicesWorld Wide Web contentPublic RecordsBLOGS, RSSUn-structure text, Semi-structured Data Watch ListsLaw EnforcementRegulatorsSemi-structured Government Data

  • Law Enforcement AgencyAim Provision of an overarching intelligence system that provides a unified view of people and related information

    Problem Need to create unique entities from across multiple disparate, non-standardised databases; Requirement to disambiguate dirty data Need to extract insight from unstructured text

    Approach Multiple database extractors to disambiguate data and form relevant relationships Modelling of behaviours/patterns within very large ontology (6Mn+ entities)

    Solution Merged and linked case data from multiple sources using effective identification, disambiguation, and link analysis Dynamic annotation of documents Single query across multiple datasets 360 view of an individual and relevant associations

  • Application of bespoke and pre-configured profiles for detailed investigationProfile CreationComplex QueryingSummary of ResultsInvestigation

    Profile CreationComplex QueryingSummary of ResultsInvestigation

  • User configurable scoring profiles Profile based on direct matching with case characteristics Profiling based on link analysis through indirect relationships with other cases and informationProfile CreationComplex QueryingSummary of ResultsInvestigation

  • Free text searching across aggregated information sourcesGisondi, white ford expedition, main street, assault, traffic offencesProfile CreationComplex QueryingSummary of ResultsInvestigation

  • Unified view of direct and indirect results that best match the complex query and the profileProfile CreationComplex QueryingSummary of ResultsInvestigation

  • Knowledge Annotation of known entities from within free textDirect and indirect relationship scoring driven by risk weightingsAggregated knowledge from disparate sourcesProfile CreationComplex QueryingSummary of ResultsInvestigation

  • Scoring of key characteristics to drive relevance to original profile and query Identification of investigation pathVisualisation of resultsProfile CreationComplex QueryingSummary of ResultsInvestigation

  • Schematic of Ontological Approach to the Legitimate Access Problem

  • Legitimate Document Access Control Applicationdemo

  • Ontology QualityMany real-world ontologies may be described as semi-formal ontologies populated with partial or incomplete knowledgemay contain occasional inconsistencies, or occasionally violate constraints (e.g. all schema level constraints may not be observed in the knowledgebase that instantiates the ontology schema) often ontology is populated by many persons or by extracting and integrating knowledge from multiple sources analogy is dirty data which is usually a fact of life in most enterprise databases.From Semantic Search & Integration to Analytics

  • Ontology Representation ExpressivenessApplications vary in terms of expressiveness of representation needed.Trade-off between expressive power and computational complexity applies both to knowledge creation/maintenance and to inference mechanisms for such languages. It is often very difficult to capture the knowledge that instantiates the more expressive constructs/constraints. Many business applications end up using models/languages that lie closer to less expressive languages. On the other hand, we have seen a few applications, especially in scientific domains such as biology, where more expressive languages are needed.

  • Ontology Size / Population / FreshnessOntology population is critical. Among the ontologies developed by Semagix or using its technology, a median size of ontology is over 1 million instances/facts and relationship instances each (at least two have exceeded 10 million instances). This level of knowledge makes the system very powerful (as it is applied . Furthermore, in many cases, it is necessary to keep these ontologies current or updated with facts and knowledge on a daily or more frequent basis. Both the scale and freshness requirements dictate that populating ontologies with instance data needs to be automated.

  • Metadata Extraction Large scale metadata extraction and semantic annotation is possible. IBM WebFountain [Dill et al 2003] demonstrates the ability to annotate on a Web scale (i.e., over 4 billion pages), while Semagix Freedom related technology [Hammond et al 2002] demonstrates capabilities that work for a few million documents per day per server. However, the general trade-off of depth versus scale applies. Storage and manipulation of metadata for millions to hundreds of millions of content items requires database techniques with the challenge of improving performance and scale in presence of more complex structures

  • Semantic Technology Building BlocksA vast majority of the Semantic (Web) Technology Applications that have been developed or envisioned rely on three crucial capabilities: ontology creation, semantic annotation (metadata extraction) and querying/inferencing. Enterprise-scale applications share many requirements in these three respects with pan Web applications. All these capabilities must scale to many millions of documents and concepts (rather than hundreds to thousands) for current applications, and applications requiring billions of documents and concepts have also been discussed (esp. in intelligence and government space) but not yet deployed.

  • Primary Technical Capabilities/Key Research ChallengesTwo of the most basic semantic techniques are named entity identification, and semantic ambiguity resolution. [It would be nice to have relationship extraction too.] A tool for annotation is of little value if it does not support ambiguity resolution. Both require highly multidisciplinary approaches, borrowing for NLP/lexical analysis, statistical and IR techniques and possibly machine learning techniques. A high degree of automation is possible in meeting many real-world semantic disambiguation requirements, although pathological cases will always exist and complete automation is unlikely.

  • Content HeterogeneitySupport for heterogeneous content is key it is too hard to deploy separate products within a single enterprise to deal with structured, semi-structured and unstructured data/content management. New applications involve extensive types of heterogeneity in format, media and access/delivery mechanisms (e.g., news feed in RSS, NewsML news, Web posted article in HTML or served up dynamically through database query and XSLT transformation, analyst report in PDF or WORD, subscription se