Steffen Staab (1)ISWeb – Informationssysteme & Semantic Web
Ontologies and the Semantic Web
SMBM-2006Jena, April 9, 2006
Steffen Staabhttp://isweb.uni-koblenz.de
Steffen Staab (2)ISWeb – Informationssysteme & Semantic Web
Agenda
1. Ontologies2. Semantic Web3. Semantic Web Languages4. Some Applications (Ontoprise)5. Ontologies & Text
Steffen Staab (3)ISWeb – Informationssysteme & Semantic Web
Part I
Introduction to Ontologies
Steffen Staab (4)ISWeb – Informationssysteme & Semantic Web
Origin and History
• Ontology in Philosophy• a philosophical discipline, branch of philosophy
that deals with the nature and the organization of reality
• Science of Being (Aristotle, Metaphysics, IV, 1)
• Tries to answer the questions:• What characterizes being?• Eventually, what is being?
Steffen Staab (5)ISWeb – Informationssysteme & Semantic Web
Aristotle - Ontology• Before: study of the nature of being
• Since Aristotle: study of knowledge representation and reasoning
• Terminology:– Genus: (Classes)– Species: (Subclasses)– Differentiae: (Characteristics which allow to group or
distinguish objects from each other)• Syllogisms (Inference Rules)
Steffen Staab (6)ISWeb – Informationssysteme & Semantic Web
Example for differentiae (adapted from Uta Priss, in preparation)
XXXOsmond
XXXCopito
XXNemo
XXXBugs Bunny
XXXSnoopy
XXXGarfield
mammalkoalagorillafishrabbitdogcatcartoonreal
Steffen Staab (7)ISWeb – Informationssysteme & Semantic Web
Organizing the Objects as a Lattice
Steffen Staab (8)ISWeb – Informationssysteme & Semantic Web
What is an Ontology?Gruber 93:
An Ontology is aformal specificationof a sharedconceptualizationof a domain of interest
⇒ Executable, Discussable⇒ Group of persons⇒ About concepts⇒ Between application
and „unique truth“
Steffen Staab (9)ISWeb – Informationssysteme & Semantic Web
Why Develop an Ontology?• To make domain assumptions explicit
– Easier to change domain assumptions– Easier to understand and update legacy data
• To separate domain knowledge from operational knowledge– Re-use domain and operational knowledge
separately
• A community reference for applications
• To share a consistent understanding of what information means
Steffen Staab (10)ISWeb – Informationssysteme & Semantic Web
Taxonomy
Object
Person Topic Document
ResearcherStudent Semantics
OntologyDoctoral Student
Taxonomy := Segmentation, classification and ordering of elements into a classification system according to theirrelationships between each other
PhD Student F-Logic
Menu
Steffen Staab (11)ISWeb – Informationssysteme & Semantic Web
Thesaurus
Object
Person Topic Document
ResearcherStudent Semantics
PhD StudentDoktoral Student
• Terminology for specific domain• Taxonomy plus fixed relationships (similar, synonym, related to) • originate from bibliography
similarsynonym
OntologyF-Logic
Menu
Steffen Staab (12)ISWeb – Informationssysteme & Semantic Web
Topic Map
Object
Person Topic Document
ResearcherStudent Semantics
PhD StudentDoktoral Student
knows described_in
writes
AffiliationTel
• Topics (nodes), relationships and occurences (to documents)• ISO-Standard• typically for navigation- and visualisation
OntologyF-Logic
similarsynonym
Menu
Steffen Staab (13)ISWeb – Informationssysteme & Semantic Web
OntologyF-Logic
similar
OntologyF-Logic
similarPhD StudentDoktoral Student
Ontology (in our sense)
Object
Person Topic Document
Tel
PhD StudentPhD Student
Semantics
knows described_in
writes
Affiliationdescribed_in is_about
knowsP writes D is_about T P T
DT T D
Rules
subTopicOf
• Representation Language: Predicate Logic (F-Logic)• Standards: RDF(S); OWL
ResearcherStudent
instance_of-1
is_a-1
is_a-1
is_a-1
Affiliation
Affiliation
York Sure
AIFB+49 721 608 6592
Steffen Staab (14)ISWeb – Informationssysteme & Semantic Web
Ontologies - Some Examples• General purpose ontologies:
– DOLCE, http://www.loa-cnr.it/DOLCE.html– The Upper Cyc Ontology, http://www.cyc.com/cyc-2-1/index.html– IEEE Standard Upper Ontology, http://suo.ieee.org/
• Domain and application-specific ontologies:– GALEN, http://www.openclinical.org/prj_galen.html– Foundational Model of Anatomy, http://sig.biostr.washington.edu/projects/fm/AboutFM.html– RETSINA Calendering Agent, http://ilrt.org/discovery/2001/06/schemas/ical-full/hybrid.rdf– Dublin Core, http://dublincore.org/
• Semantic Desktop Ontologies– Semantics-Aware instant Messaging: SAM Ontology,
http://www.uni-koblenz.de/FB4/Institutes/IFI/AGStaab/Research/sam– Haystack, http://haystack.lcs.mit.edu/– Gnowsis, http://www.gnowsis.org/– Piggybank, http://simile.mit.edu/piggy-bank/
• Web Services Ontologies– Core ontology of services http://cos.ontoware.org– Web Service Modeling ontology http://www.wsmo.org– OWL-S, http://www.daml.org/services/owl-s/1.0/
• Ontologies in a wider sense– GO - Gene Ontology, http://www.geneontology.org/– UMLS, http://www.nlm.nih.gov/research/umls/– Agrovoc, http://www.fao.org/agrovoc/– Art and Architecture, http://www.getty.edu/research/tools/vocabulary/aat/– DTD standardizations, e.g. HR-XML, http://www.hr-xml.org/– WordNet / EuroWordNet, http://www.cogsci.princeton.edu/~wn
Steffen Staab (15)ISWeb – Informationssysteme & Semantic Web
Ontologies and Their Relatives
Catalog / ID
Terms/Glossary
Thesauri
InformalIs-a
FormalIs-a
FormalInstance
Frames
ValueRestric-tions
Generallogical
constraints
AxiomsDisjointInverseRelations,...
Steffen Staab (16)ISWeb – Informationssysteme & Semantic Web
Ontologies and Their Relatives (cont´d)Front-End
Back-End
Topic Maps
Extended ER-Models
Thesauri
Predicate Logic
Semantic Networks
Taxonomies
Ontologies
Navigation
Queries
Sharing of Knowledge
Information Retrieval
Query Expansion
Mediation Reasoning
Consistency CheckingEAI
Steffen Staab (17)ISWeb – Informationssysteme & Semantic Web
Applications of Ontologies• Natural Language Processing and Machine Translation, e.g. Nirenburg et al.
2004, Maedche et al. 2001, Agirre et al. 1996, Beale et al. 1995• Semantic Web, see http://www.w3.org/2001/sw/ and
http://www.w3.org/2001/sw/WebOnt/• Knowledge Engineering & Management, e.g. Fensel 2001, Mullholland et al.
2000; Staab & Schnurr, 2000; Sure et al., 2000, Abecker et al. 1997• Electronic Commerce, e.g. RosettaNet3 and Ontology.org4• Information Retrieval and Information Integration, e.g. Kashyap, 1999; Mena
et al., 1998; Wiederhold, 1992• Intelligent Search Engines, e.g. WebKB (Martin et al. 2000), SHOE (Heflin &
Hendler, 2000), OntoSeek (Guarino et al., 1999), Ontobroker (Decker et al., 1999)
• Digital Libraries, e.g. Amann & Fundulaki, 1999• Enhanced User Interfaces, e.g. (Kesseler, 1996), Inxight5• Software Agents, e.g. OnTo-agents, FIPA, (Gluschko et al., 1999; Smith &
Poulter, 1999)• Business Process Modeling, e.g. Decker et al., 1997; TOVE, 1995; Uschold et
al., 1998
Steffen Staab (18)ISWeb – Informationssysteme & Semantic Web
Overview Literature
S. Staab, R. Studer. Handbook on Ontologies. Springer, 2004.
Steffen Staab (1)ISWeb – Informationssysteme & Semantic Web
Part II
Introduction to Semantic Web
Steffen Staab (2)ISWeb – Informationssysteme & Semantic Web
Syntax is not sufficient
Andreas
• Tel
Steffen Staab (3)ISWeb – Informationssysteme & Semantic Web
Information Convergence• Convergence not just in devices, also in “information”
– Your personal information (phone, PDA,…)Calendar, photo, home page, files…
– Your “professional” life (laptop, desktop, … Grid)Web site, publications, files, databases, …
– Your “community” contexts (Web)Hobbies, blogs, fanfic, social networks…
• The Web teaches us that people will work to share– How do we CREATE, SEARCH, and BROWSE in
the non-text based parts of our lives?
Steffen Staab (4)ISWeb – Informationssysteme & Semantic Web
WWW vs. Semantic Web WWW :=
Hypertext &
Internet &
Social Phenomenon
Semantic Web :=
Semantic Web Language/Data &
Ontologies &
Internet &
Social Phenomenon
Steffen Staab (5)ISWeb – Informationssysteme & Semantic Web
XML is unspecific:No predetermined vocabularyNo semantics for relationships
& must be specified upfront
Only possible in close cooperations– Small, reasonably stable group– Common interests or authorities
Not possible in the Web or on a broad scale in general !
Let’s try XML
Steffen Staab (6)ISWeb – Informationssysteme & Semantic Web
CV
name
education
work
private
Meaning of Informationen:(or: what it means to be a computer)
Steffen Staab (7)ISWeb – Informationssysteme & Semantic Web
CV
name
education
work
private
< >
< >
< >
< >
< >
< Χς >
< ναµε >
<εδυχατιον>
<ωορκ>
<πριϖατε>
XML ≠ Meaning, XML = Structure
Steffen Staab (8)ISWeb – Informationssysteme & Semantic Web
Some Principal Ideas
• URI – uniform resource identifiers• XML – common syntax• Interlinked• Layers of semantics –
from database to knowledge base to proofs
Design principles of WWW applied to Semantics!!
Tim Berners-Lee, Weaving
the Web
Steffen Staab (9)ISWeb – Informationssysteme & Semantic Web
The Semantic Web on one Slide
EmployeeEmployee
PostDocPostDoc ProfessorProfessor
PersonPerson
rdfs:subClass rdfs:subClass
rdfs:subClass
cooperatesWithcooperatesWith
rdfs:Rangerdfs:DomainOntology
<swrc:Professorrdf:ID="person_sst">
<swrc:name>Steffen Staab</swrc:name>
...</swrc:Professor>
http://www.uni-koblenz.de/~staab
rdf:typerdf:type
Meta-data
<swrc:PostDoc rdf:ID="person_sha"><swrc:name>Siegfried Handschuh</swrc:name>
...</swrc:PostDoc>
Webpage
http://www.deri.ie/~shaURL
<swrc:cooperatesWith rdf:resource = "http://www.uni-koblenz.de/~staab/
#person_sst"/>
swrc:cooperatesWith
Steffen Staab (10)ISWeb – Informationssysteme & Semantic Web
The Semantic Web - Inference
swrc:name
swrc:member
swrc:homepage
swrc:cooperatesWith
swrc:member
swrc:project
swrc:project
DERI
Handschuh
swrc:affiliation
Visualization of a Logic Representation:
OWL, F-Logic, etc.
Nepomuk
Steffen Staab (11)ISWeb – Informationssysteme & Semantic Web
Proof
The new Semantic Web Stack
URI Unicode
RDF Core
Spa
rQL
RDF Schema
DLP bit of OWL/Rule Enc
rypt
ion
Sig
natu
re
OWL Rules
Trust
Logic framework
XML Namespaces
Tim Berners-Lee, ISWC November 2005, http://www.w3.org/2005/Talks/1110-iswc-tbl/#(12)
Steffen Staab (12)ISWeb – Informationssysteme & Semantic Web
Knowledge Provisioning
Steffen Staab (13)ISWeb – Informationssysteme & Semantic Web
Tools for markup...
PhotoStuff Demo
Steffen Staab (14)ISWeb – Informationssysteme & Semantic Web
Semi-automatic
Steffen Staab (15)ISWeb – Informationssysteme & Semantic Web
Not tied to specific domains
M-OntoMat is publicly available http://acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html
Shape erasure
Shape Color selection
Visual Descriptor selection
Draw panel
Descriptor extraction
Shape selection
VDE plug-in launch
Domain Ontology Browser Selected
region
Save Prototype Instances
Steffen Staab (16)ISWeb – Informationssysteme & Semantic Web
Shared Workspace(Xarop + Screenshot)
Steffen Staab (17)ISWeb – Informationssysteme & Semantic Web
Social networks:e.g. Friend of a Friend (FOAF)
• Say stuff about yourself (or others) in OWL files, link to who you “know”
Estimates of the number of Foaf users range from 2M-5M
Steffen Staab (18)ISWeb – Informationssysteme & Semantic Web
Using FOAF in other contexts
http://trust.mindswap.orgJennifer Golbeck
Steffen Staab (19)ISWeb – Informationssysteme & Semantic Web
Get a B&N price (In Euros)
Steffen Staab (20)ISWeb – Informationssysteme & Semantic Web
Of a particular book
Steffen Staab (21)ISWeb – Informationssysteme & Semantic Web
In its German edition?
Steffen Staab (22)ISWeb – Informationssysteme & Semantic Web
Steffen Staab (23)ISWeb – Informationssysteme & Semantic Web
Now.• RDF, RDFS and OWL are ready for prime time
– Designs are stable, implementations maturing
• Major Research investment translating into application development and commercial spinoffs
– Adobe 6.0 embraces RDF
– IBM releases tools, data and partnering
– HP extending Jena to OWL
– OWL Engines by Ontoprise GmbH, Network Inference, Racer GmbH
– Ontoprise is a strategic partner for Oracle and Software AG
– Proprietary OWL ontologies for vertical markets• c.f. pharmacology, HMO/health care, ... Soft drinks
Steffen Staab (24)ISWeb – Informationssysteme & Semantic Web
Now: Plenty of annotations –unfortunately, not in the open
• Taggings are daily practice:– Flickr, http://www.flickr.com/– Delicious, http://del.icio.us/– Cite-u-like, http://www.citeulike.org/– Bibsonomy,…
• Plenty of annotations– Dooyoo, E-pinions– Quipe, http://www.quipe.com/ – Froogle, http://froogle.google.com/– Google Base, http://base.google.com/– RSS– E-Science data curation,
http://www.jisc.ac.uk/index.cfm?name=pub_escience– Semantic Wikis
• Web 2.0– would be easier with Semantic Web!
Steffen Staab (25)ISWeb – Informationssysteme & Semantic Web
The Semantic Wave
(Berners-Lee, 03)
YOUARE
HERE2006
YOUARE
HERE2003
Steffen Staab (26)ISWeb – Informationssysteme & Semantic Web
Semantic Technologies vs. Semantic Web
Semantic Technologies• Used by „Early Adopters“
• Mature
– Deductive Databases(Research since early 80ies)
– Description logics(Research since late 70ies)
- Ontobroker (Research prototypesince 1990; commercial since 1999)
• A lot of knowledge about integration withexisting technology (databases, modelling, …)
Semantic Web• Still „research-oriented“
• Currently: Used in Intranets
• Currently: Used for internetapplications with simple ontologies (Dublin Core, RSS, PICS, FOAF,…)
• Quite some way to go for fullfledged success, initial take-upnow by some focus groups
Steffen Staab (27)ISWeb – Informationssysteme & Semantic Web
Application areas for SemanticTechnologies
• Software engineering: conceptual approachesneed semantic interchange language
• Data description:– Databases in bioinformatics– Multimedia data (complementary to MPEG 7/21)
• Data integration: data exchange benefits fromsemantic interchange language
• „Plug n‘play“ for dynamic(not necessarily „automatic“!!!) business process configuration: needs rich semantic descriptions
Steffen Staab (28)ISWeb – Informationssysteme & Semantic Web
Prospectives of Semantic Web or
WWW vs. Semantic Web revisited
WWW :=
Hypertext &
Internet &
Social Phenomenon
Semantic Web :=
Semantic Web Language/Data &
Ontologies &
Internet &
Social Phenomenon
WithoutSocial Phenomenon
= Intranet
WithoutSocial Phenomenon
= Semantic DataIntegration
New and important
paradigms at their time, but
„less“ outreach
Steffen Staab (29)ISWeb – Informationssysteme & Semantic Web
„Less“ vs „More“ Outreach
„Less“ equals a multi-billion dollar market
„More“ equals a change as radical as triggered by the WWW
Steffen Staab (30)ISWeb – Informationssysteme & Semantic Web
Overview Literature
Frank van Harmelen, Grigoris Antinou. Semantic Web Primer, MIT Press 2005.
Steffen Staab (1)ISWeb – Informationssysteme & Semantic Web
Part III
Semantic Web Languages
Steffen Staab (2)ISWeb – Informationssysteme & Semantic Web
RDF
Steffen Staab (3)ISWeb – Informationssysteme & Semantic Web
RDF Data Model• Resources
– A resource is a thing you talk about (can reference)– Resources have URI’s– RDF definitions are itself Resources (linkage)
• Properties – slots, defines relationship to other resources or
atomic values• Statements
– “Resource has Property with Value”– (Values can be resources or atomic XML data)
• Similar to Frame Systems
Steffen Staab (4)ISWeb – Informationssysteme & Semantic Web
A simple Example
• Statement– “Ora Lassila is the creator of the resource
http://www.w3.org/Home/Lassila”
• Structure– Resource (subject)
http://www.w3.org/Home/Lassila– Property (predicate) http://www.schema.org/#Creator– Value (object) "Ora Lassila”
• Directed graphhttp://www.w3.org/Home/Lassila s:Creator Ora Lassila
Steffen Staab (5)ISWeb – Informationssysteme & Semantic Web
Another Example
• To add properties to Creator, point through a intermediate Resource.
http://www.w3.org/Home/Lassila
s:Creator
Person://fi/654645635
Name
Ora Lassila [email protected]
Steffen Staab (6)ISWeb – Informationssysteme & Semantic Web
Collection Containers
• Multiple occurrences of the same PropertyTypedoesn’t establish a relation between the values– The Millers own a boat, a bike, and a TV set– The Millers need (a car or a truck)– (Sarah and Bob) bought a new car
• RDF defines three special Resources:– Bag unordered values rdf:Bag– Sequence ordered values rdf:Seq– Alternative single value rdf:Alt
• Core RDF does not enforce ‘set’ semantics amongst values
Steffen Staab (7)ISWeb – Informationssysteme & Semantic Web
Example: BagThe students in
course 6.001are Amy, Tim,John, Mary,and Sue
Rdf:Bag
/Students/Amy
/Students/Tim
/Students/John
/Students/Mary
/Students/Sue
bagid1
/courses/6.001
students
rdf:type
rdf:_1
rdf:_2
rdf:_3
rdf:_4
rdf:_5
Steffen Staab (8)ISWeb – Informationssysteme & Semantic Web
Example: Alternative• The source code for X11 may be found at
ftp.x.org, ftp.cs.purdue.edu, or ftp.eu.net
http://x.org/package/X11rdf:Alt
ftp.x.org
ftp.cs.purdue.edu
ftp.eu.net
altid
rdf:type
rdf:_1
rdf:_2
rdf:_3
Steffen Staab (9)ISWeb – Informationssysteme & Semantic Web
Statements about Statements (Requirement 2: Dispute Statements)
• Making statements about statementsrequires a process for transforming them into Resources– subject the original referent– predicate the original property type– object the original value– type rdf:Statement
Steffen Staab (10)ISWeb – Informationssysteme & Semantic Web
Example: Reification
• Ralph Swick believes that – the creator of the resource
http://www.w3.org/Home/Lassila is OraLassila
rdf:Statement
rdf:type
genid1
Ralph Swick
b:believedBy
http://www.w3.org/Home/Lassila
rdf:subject
Ora Lassila
rdf:object
s:Creatorrdf:predicate
s:Creator
Steffen Staab (11)ISWeb – Informationssysteme & Semantic Web
RDF Syntax I• Datamodel does not enforce particular syntax• Specification suggests many different syntaxes
based on XML• General form:
<rdf:RDF><rdf:Description about="http://www.w3.org/Home/Lassila">
<s:Creator>Ora Lassila</s:Creator><s:createdWith rdf:resource=“http://www.w3c.org/amaya”/>
</rdf:Description></rdf:RDF>
Starts an RDF-Description
Properties
Subject (OID)
Literal
Resource (possibly another RDF-description)
Steffen Staab (12)ISWeb – Informationssysteme & Semantic Web
Resulting Graph
<rdf:RDF><rdf:Description about="http://www.w3.org/Home/Lassila">
<s:Creator>Ora Lassila</s:Creator><s:createdWith rdf:resource=“http://www.w3c.org/amaya”/>
</rdf:Description></rdf:RDF>
http://www.w3c.org/amaya
http://www.w3.org/Home/Lassila
Ora Lassila
s:createdWiths:Creator
Steffen Staab (13)ISWeb – Informationssysteme & Semantic Web
RDF Syntax II: Syntactic Varieties
<s:Homepage rdf:about="http://www.w3.org/Home/Lassila”s:Creator=“Ora Lassila”/>
<s:Title>Ora's Home Page</s:Title><s:createdWith><s:HTMLEditor rdf:about=“http://www.w3c.org/amaya”/>
</s:createdWith> </s:Homepage>
Typing Information In-Element Property
Property
Subject (OID)
http://www.w3c.org/amaya
http://www.w3.org/Home/Lassila
Ora Lassila
s:createdWiths:Creator
HTMLEditor
s:Homepagerdf:type
rdf:type
Steffen Staab (14)ISWeb – Informationssysteme & Semantic Web
RDF Schema (RDFS)
• RDF just defines the datamodel• Need for definition of vocabularies for the
datamodel - an Ontology Language!• RDF schemas are Web resources (and
have URIs) and can be described using RDF
Steffen Staab (15)ISWeb – Informationssysteme & Semantic Web
RDF-Schema: Example
rdfs:Resource
xyz:MotorVehicle rdfs:Class
s s t
t
xyz:Truck
s
t
xyz:PassengerVehicle
s = rdfs:subClassOf t = rdf:type
xyz:Van s
s
xyz:MiniVan s
s
t t
t
Steffen Staab (16)ISWeb – Informationssysteme & Semantic Web
Rdfs:subclassOf<rdfs:description about=„Xyz:Minivan“>
<rdfs:subclassOf about=„xyz:Van“/></rdfs:description><rdfs:description about=„myvan“>
<rdf:type about=„xyz:MiniVan“/></rdfs:description>
Predicate Logic Consequences:
Forall X: type(X,MiniVan) -> type(X,Van). Forall X: subclassOf(X,MiniVan) -> subclassOf(X,Van).
Steffen Staab (17)ISWeb – Informationssysteme & Semantic Web
Rdf:property<rdf:description about=„possesses“>
<rdf:type about=„....property“/><rdfs:domain about=„person“/><rdfs:range about=„vehicle“/>
</rdf:description><rdf:description about=„peter“>
<possesses>petersminivan</possesses></rdf:description>
Predicate Logic Consequences:Forall X,Y: possesses (X,Y) -> (type(X,person) &
type(Y,vehicle)).
Steffen Staab (18)ISWeb – Informationssysteme & Semantic Web
OWL
Steffen Staab (19)ISWeb – Informationssysteme & Semantic Web
OWL
• OWL1.0 (acronym for Web OntologyLanguage) is a W3C recommendation
• OWL stems from a family of logics, called„description logics“
Steffen Staab (20)ISWeb – Informationssysteme & Semantic Web
Description Logics(Terminological Logics, DLs)
• Fragments of FOL• Most often decidable• Moderately expressive• Stem from semantic networks• W3C Standard OWL DL corresponds to
SHOIN(D)
Steffen Staab (21)ISWeb – Informationssysteme & Semantic Web
DLs – general structure• DLs are a Family of logic-based formalism for
knowledge representation • Special language characterized by:
– Constructors to define complex concepts and roles based on simpler ones.
– Set of axiom to express facts using concepts, roles and individuals.
• ALC is the smallest DL, which is propositionally closed: – ∧, ∨, ¬ are constructors, noted by u, t, ¬.– Quantors define how roles are to be interpreted:
Man u ∃hasChild.Female u ∃hasChild.Maleu ∀hasChild.(Rich t Happy)
Steffen Staab (22)ISWeb – Informationssysteme & Semantic Web
Further DL concepts and role constructors
• Number restrictions (cardinality constraints) for roles:≥3 hasChild, ·1hasMother
• Qualified number restrictions:≥2 hasChild.Female, ·1 hasParent.Male
• Nominals (definition by extension): {Italy, France, Spain}
• Concrete domains (datatypes): hasAge.(≥21)
• Inverse roles: hasChild– ≡ hasParent• Transitive roles: hasAncestor* (descendant)• Role composition: hasParent.hasBrother (uncle)
Steffen Staab (23)ISWeb – Informationssysteme & Semantic Web
DL Knowledge Bases• DL Knowledge Bases consist of two parts (in general):
– TBox: Axioms, describing the structure of a modelled domain (conceptual schema):
• HappyFather ≡ Man u ∃hasChild.Female u …• Elephant v Animal u Large u Grey• transitive(hasAncestor)
– Abox: Axiome describing concrete situations (data, facts):• HappyFather(John)• hasChild(John, Mary)
• The distinction between TBox/ABox does not have a deep logical distinction… but it is common useful modelling practice.
Steffen Staab (24)ISWeb – Informationssysteme & Semantic Web
General DL Architecture
Knowledge Base
Tbox (schema)
Abox (data)
Man ≡ Human u Male
Happy-Father ≡ Man u ∃ has-child.Female u …
Happy-Father(John)
has-child(John, Mary) Infe
ren
ce S
yst
em
Inte
rface
Steffen Staab (25)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling in OWLExample ontology and conclusion from
http://owl.man.ac.uk/2003/why/latest/#2• Also an example for OWL Abstract Syntax.
Namespace(a = <http://cohse.semanticweb.org/ontologies/people#>)Ontology(
ObjectProperty(a:drives) ObjectProperty(a:eaten_by) ObjectProperty(a:eats inverseOf(a:eaten_by) domain(a:animal)) …Class(a:adult partial annotation(rdfs:comment "Things that are adult.") Class(a:animal partial restriction(a:eats someValuesFrom (owl:Thing))) Class(a:animal_lover complete intersectionOf(restriction(a:has_pet
minCardinality(3)) a:person)) …)
Steffen Staab (26)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling: examplesClass(a:bus_driver complete intersectionOf(a:person
restriction(a:drives someValuesFrom (a:bus))))
Class(a:driver complete intersectionOf(a:personrestriction(a:drives someValuesFrom (a:vehicle))))
Class(a:bus partial a:vehicle)
• A bus driver is a person that drives a bus. • A bus is a vehicle. • A bus driver drives a vehicle, so must be a driver. The subclass is inferred due to subclasses being used
in existential quantification.
bus_driver ≡ person u ∃drives.bus
bus v vehicle
driver ≡ person u ∃drives.vehicle
Steffen Staab (27)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling: examplesClass(a:driver complete intersectionOf(a:person restriction(a:drives
someValuesFrom (a:vehicle))))
Class(a:driver partial a:adult)
Class(a:grownup complete intersectionOf(a:adult a:person))
• Drivers are defined as persons that drive cars (complete definition) • We also know that drivers are adults (partial definition) • So all drivers must be adult persons (e.g. grownups)
An example of axioms being used to assert additional necessaryinformation about a class. We do not need to know that a driver isan adult in order to recognize one, but once we have recognized a driver, we know that they must be adult.
driver ≡ person u ∃drives.vehicle
driver v adult
grownup ≡ adult u person
Steffen Staab (28)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling: ExamplesClass(a:cow partial a:vegetarian)DisjointClasses(unionOf(restriction(a:part_of someValuesFrom
(a:animal)) a:animal) unionOf(a:plant restriction(a:part_ofsomeValuesFrom (a:plant))))
Class(a:vegetarian complete intersectionOf( restriction(a:eatsallValuesFrom (complementOf(restriction(a:part_ofsomeValuesFrom (a:animal))))) restriction(a:eats allValuesFrom(complementOf(a:animal))) a:animal))
Class(a:mad_cow complete intersectionOf(a:cow restriction(a:eatssomeValuesFrom (intersectionOf(restriction(a:part_ofsomeValuesFrom (a:sheep)) a:brain)))))
Class(a:sheep partial a:animal restriction(a:eats allValuesFrom(a:grass)))
• Cows are naturally vegetarians• A mad cow is one that has been eating sheeps brains• Sheep are animalsThus a mad cow has been eating part of an animal, which is
inconsistent with the definition of a vegetarian
∃partof.animal t animal ≡ plant t ∃partof.plant/
Steffen Staab (29)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling: ExampleIndividual(a:Walt type(a:person) value(a:has_pet a:Huey)
value(a:has_pet a:Louie) value(a:has_pet a:Dewey)) Individual(a:Huey type(a:duck)) Individual(a:Dewey type(a:duck)) Individual(a:Louie type(a:duck)) DifferentIndividuals(a:Huey a:Dewey a:Louie) Class(a:animal_lover complete intersectionOf(a:person
restriction(a:has_pet minCardinality(3))))ObjectProperty(a:has_pet domain(a:person) range(a:animal))
• Walt has pets Huey, Dewey and Louie. • Huey, Dewey and Louie are all distinct individuals. • Walt has at least three pets and is thus an animal lover.
Note that in this case, we don’t actually need to include person in the definition of animal lover (as the domain restriction will allowus to draw this inference).
Steffen Staab (30)ISWeb – Informationssysteme & Semantic Web
Knowledge modelling: SomeResearch Challenges
• Concluding with– uncertainty (fuzzy, probabilistic)– Inkonsistencies (paraconsistent)– Rules– Further AI-Paradigms (nonmonotonic reasoning,
preferences …)• Maintenance (updates, infrastructure, etc)• Scalability of reasoning• …
www.ontoprise.de
© 2006 ontoprise GmbH - 1 -Home | Menu | Technology | References | End
Application Scenario: Semantic Inference
www.ontoprise.de
© 2006 ontoprise GmbH - 2 -Home | Menu | Technology | References | End
Background
Complex dependencies decrease the speed of development
Knowledge is distributed over different departments
Goal
Design of a Semantic Guide for capturing the dependenciesConfiguration of components
Integration into existing order systemEngineers can concentrate on creative
efforts
Audi: Semantic Testcar Configuration
www.ontoprise.de
© 2006 ontoprise GmbH - 3 -Home | Menu | Technology | References | End
Inference: to conclude implicit facts
Testengine has been tested and released
Chassis 17 is suited for 110 KW
Testengine has 104 KW
Rule 1: A Chassis has to be suited for the power of the engine
Rule 2: All parts have to be tested and released
Testengine is ready to test in a car
Fit of Testengine and Chassis 17
www.ontoprise.de
© 2006 ontoprise GmbH - 4 -Home | Menu | Technology | References | End
Application Scenario:Semantic Data Integration and Search
www.ontoprise.de
© 2006 ontoprise GmbH - 5 -Home | Menu | Technology | References | End
Knowledge Management for your Projects
Users keep their establishedsoftware tools
A knowledge model (ontology) both integrates and structuresthe information
The ontology is enriched withspecific expertise
The ontology empowers a context-aware and easy-to-usesearch and navigationsystem
All information to stay in theiroriginal place
? ??
Object
Person Topic Document
TechnicianDecisionMaker Content ApplicationMetho-
dology
www.ontoprise.de
© 2006 ontoprise GmbH - 6 -Home | Menu | Technology | References | End
GoalsImprove the effectivity and quality of work
Integrated serach over multiple sources
Usage of an ontology to improve results
Simple interface
Proof of Concept for SemanticWebtechnology for whole group
FactsUsers: 1000 peopleproject duration: 2 months
Deutsche Post IT Solutions (DHL group)
A Companywide Search- & Integration-Project
www.ontoprise.de
© 2006 ontoprise GmbH - 7 -Home | Menu | Technology | References | End
Editorial Process for Ontology Evolution
www.ontoprise.de
© 2006 ontoprise GmbH - 8 -Home | Menu | Technology | References | End
Application Scenario:Semantic Data Integration - II
www.ontoprise.de
© 2006 ontoprise GmbH - 9 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 10 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 11 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 12 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 13 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 14 -Home | Menu | Technology | References | End
Integration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 15 -Home | Menu | Technology | References | End
Vielfältige IntegrationsproblemeIntegration Problems
Value Conflict
Structure Conflict
Name Conflict
Languages
Duplicates
Missing Information
Multiple Interfaces
www.ontoprise.de
© 2006 ontoprise GmbH - 16 -Home | Menu | Technology | References | End
Import and Mapping of DB-Structures
www.ontoprise.de
© 2006 ontoprise GmbH - 17 -Home | Menu | Technology | References | End
Application Scenario:Intelligent Question Answering
www.ontoprise.de
© 2006 ontoprise GmbH - 18 -Home | Menu | Technology | References | End
Background
• Development of a Digital Aristotle• Phase 1 successfully closed in 2003• Phase 2 since January 2004
Functions
• Capturing of extensive set of chemical knowledge
• System passed the „Advanced Placement Test“
• Query is answered and answer is explained
Vulcan Inc: OntoBroker passes Advanced Placement Test
www.ontoprise.de
© 2006 ontoprise GmbH - 19 -Home | Menu | Technology | References | End
Ontobroker™ passed the Advanced Placement Test!Correct AnswersCorrect Explanations
PerformanceCYCORP 1650 MinutesStudent 240 MinutesStanford Research 38 MinutesOntoprise 9 Minutes
Steffen Staab (1)ISWeb – Informationssysteme & Semantic Web
Semantic Web Applications –on the Internet
Steffen Staab (2)ISWeb – Informationssysteme & Semantic Web
Conceptual architecture for semantic portal
Sources
Commondata model
CommonSemantics
Presentation& Use
Selection
P2PRelationalDatabase
...
K-EdutellaWrapper
Rel-DBWrapper
FileS
X(HT)ML Wrapper
Integration
OntologIE
Presen-tationView
Presen-tationView
View
HTML Page
RDF Output
HTML Form
...
...
QueryAPI
RDF API
RDF
InputView
Datenbank
Navi-Gation(HTML)
Navi-gationView
[IEEE Data Engineering, 2002]
Steffen Staab (3)ISWeb – Informationssysteme & Semantic Web
OntoWeb Community
AnnotatedWeb PagesGenerated
Content Objects
Participating Site2
{ }
Participating Siten
{ }
Participating Site1
{ }
...
OntologyBrowse & Query
Front End
ContentSyndication
Service
http://www.ontoweb.org
OntoWeb-Portal[CRIS 2002]
EU IST Projekt OntoWeb
Steffen Staab (4)ISWeb – Informationssysteme & Semantic Web
P2P Application: Bibster
Steffen Staab (5)ISWeb – Informationssysteme & Semantic Web
Ontologies & Text
Part V
Steffen Staab (6)ISWeb – Informationssysteme & Semantic Web
OL from Text as Reverse Engineering
Reverse Engineering
Write
Shared World Model
Steffen Staab (7)ISWeb – Informationssysteme & Semantic Web
Some pre-History of Ontology Learning
• AI: Knowledge Acquisition– Since 60s/70s: Semantic Network Extraction and similar for Story Understanding
• Systems: e.g. MARGIE (Schank et al., 1973), LUNAR (Woods, 1973)
• NLP: Lexical Knowledge Extraction– 70s/80s: Extraction of Lexical Semantic Representations from Machine
Readable Dictionaries• Systems: e.g. ACQUILEX LKB (Copestake et al.)
– 80s/90s: Extraction of Semantic Lexicons from Corpora for Information Extraction Systems
• Systems: e.g. AutoSlog (Riloff, 1993), CRYSTAL (Soderland et al., 1995)
• IR: Thesaurus Extraction– Since 60s: Extraction of Keywords, Thesauri and Controlled Vocabularies
• Based on construction and use of thesauri in IR (Sparck-Jones, 1966/1986, 1971)• Systems: e.g. Sextant (Grefenstette, 1992), DR-Link (Liddy, 1994)
Steffen Staab (8)ISWeb – Informationssysteme & Semantic Web
Some Current Work on Ontology Learning from TextTerm Extraction
• Statistical Analysis• Patterns• (Shallow) Linguistic Parsing• Term Disambiguation & Compositional Interpretation• Combinations
Taxonomy Extraction• Statistical Analysis & Clustering (e.g. FCA)• Patterns• (Shallow) Linguistic Parsing• WordNet• Combinations
Relation Extraction• Anonymous Relations (e.g. with Association Rules)• Named Relations (Linguistic Parsing)• (Linguistic) Compound Analysis• Web Mining, Social Network Analysis• Combinations
Relation Label Extraction• Extension of Association Rules Algorithm
Definition Extraction• (Linguistic) Compound Analysis (incl. WordNet)
Steffen Staab (9)ISWeb – Informationssysteme & Semantic Web
Some Current Work on Ontology Learning from Text
AIFB – TextToOnto (Maedche and Staab, 2000; Cimiano et al., 2005)– Term Extraction and Taxonomy Extraction
• Statistical Analysis• Conceptual Clustering (FCA), Patterns, WordNet (+ Combination)
– Relation Extraction• Anonymous Relations (Association Rules)• Named Relations (Subcategorization Frames)
CNTS Univ. Antwerpen, VUB (Reinberger et al., 2004)– Concept Formation + Relation Extraction
• Shallow Linguistic Parsing• Clustering
DFKI – OntoLT (Buitelaar et al., 2004), RelExt (Schutz and Buitelaar, 2005)– Term Extraction
• Shallow Linguistic Parsing & Statistical Analysis– Taxonomy and Relation Extraction
• Shallow Linguistic Parsing & manually defined mapping rules• Named Relations (Subcategorization Frames)
Steffen Staab (10)ISWeb – Informationssysteme & Semantic Web
Some Current Work on Ontology Learning from Text
Economic Univ., Prague (Kavalec and Svatek, 2005)– Relation Label Extraction
• Extension of Association Rules Algorithm
Free Univ. Amsterdam (Sabou, 2005)– Term and Taxonomy Extraction (for Web Service Ontologies)
• Shallow Linguistic Analysis & Patterns
Jozef Stefan Inst., Ljubljana -- OntoGen (Fortuna et al., 2005)– Term and Taxonomy Extraction
• Statistical Analysis & Clustering– Relations
• Web Mining, Social Network Analysis
Univ. Paris -- ASIUM (Faure and Nedellec, 1998)– Taxonomy Extraction (& Subcategorization Frames)
• Shallow Linguistic Parsing• Clustering
Steffen Staab (11)ISWeb – Informationssysteme & Semantic Web
Univ. Rome – OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005)– Term Extraction and Interpretation
• Shallow Linguistic Parsing &Term Disambiguation & Compositional Interpretation
– Relations• Classification of the relation between terms in a compound into predefined
set of (thematic) relations– Definitions
• Rules for Gloss Generation
Univ. of Zürich (Rinaldi et al., 2005)– Term and Taxonomy Extraction
• Shallow Linguistic Analysis & Patterns
Some Current Work on Ontology Learning from Text
Steffen Staab (12)ISWeb – Informationssysteme & Semantic Web
Terms
Concepts
Taxonomy
Relations
Rules & Axioms
disease, illness, hospital
{disease, illness, Krankheit}
DISEASE:=<Int,Ext,Lex>
is_a(DOCTOR,PERSON)
cure(dom:DOCTOR,range:DISEASE)
(Multilingual) Synonyms
))(),((, xillyxsufferFromyx →∀
Introduced in: Philipp Cimiano, PhD Thesis University of Karlsruhe, forthcoming / also available as Springer book, end of 2006
Ontology Learning Layer Cake
Steffen Staab (13)ISWeb – Informationssysteme & Semantic Web
Terms
Concepts
Taxonomy
Relations
Rules & Axioms
disease, illness, hospital
{disease, illness, Krankheit}
DISEASE:=<Int,Ext,Lex>
is_a(DOCTOR,PERSON)
cure(dom:DOCTOR,range:DISEASE)
(Multilingual) Synonyms
))(),((, xillyxsufferFromyx →∀
Ontology Learning Layer Cake
Steffen Staab (14)ISWeb – Informationssysteme & Semantic Web
TermsTerms are at the basis of the ontology learning process
– Terms express more or less complex semantic units– But what is a term?
Huge Selection of Top Brand Computer Terminals Available for Immediate DeliveryBecause Vecmar carries such a large inventory of high-quality computer terminals, including: ADDS terminals, Boundless terminals, DEC terminals,HP terminals, IBM terminals, LINK terminals, NCR terminals and Wyse terminals, your order can often ship same day. Every computer terminal shipped to you is protected with careful packing, including thick boxes. All of our shipping options - including international - are available through major carriers.
– Extracted term candidates (phrases)
- computer- terminal- computer terminal- ? high-quality computer terminal- ? top brand computer terminal- ? HP terminal, DEC terminal, …
Steffen Staab (15)ISWeb – Informationssysteme & Semantic Web
Term ExtractionDetermine most relevant phrases as terms
– Linguistic Methods• Rules over linguistically analyzed text
– Linguistic analysis – Part-of-Speech Tagging, Morphological Analysis, …– Extract patterns – Adjective-Noun, Noun-Noun, Adj-Noun-Noun, …– Ignore Names (DEC, HP, …), Certain Adjectives (quality, top, …), etc.
– Statistical Methods• Co-occurrence (collocation) analysis for term extraction within the
corpus• Comparison of frequencies between domain and general corpora
– Computer Terminal will be specific to the Computer domain– Dining Table will be less specific to the Computer domain
– Hybrid Methods• Linguistic rules to extract term candidates• Statistical (pre- or post-) filtering
Steffen Staab (16)ISWeb – Informationssysteme & Semantic Web
Terms
Concepts
Taxonomy
Relations
Rules & Axioms
disease, illness, hospital
{disease, illness, Krankheit}
DISEASE:=<Int,Ext,Lex>
is_a(DOCTOR,PERSON)
cure(dom:DOCTOR,range:DISEASE)
(Multilingual) Synonyms
))(),((, xillyxsufferFromyx →∀
Ontology Learning Layer Cake
Steffen Staab (17)ISWeb – Informationssysteme & Semantic Web
Extraction of Synonyms
Term Classification and Clustering
– Classification• Classifying terms to existing class systems, e.g., by
extending WordNet (with SynSets corresponding to classes)
– Clustering• Clusters according to similar distributions, e.g., by
measuring co-occurrence between terms
Steffen Staab (18)ISWeb – Informationssysteme & Semantic Web
Terms
Concepts
Taxonomy
Relations
Rules & Axioms
disease, illness, hospital
{disease, illness, Krankheit}
DISEASE:=<Int,Ext,Lex>
is_a(DOCTOR,PERSON)
cure(dom:DOCTOR,range:DISEASE)
(Multilingual) Synonyms
))(),((, xillyxsufferFromyx →∀
Ontology Learning Layer Cake
Steffen Staab (19)ISWeb – Informationssysteme & Semantic Web
The Semiotic TriangleOgden & Richards, 1923
• based on Structural Linguistics studies (de Saussure, 1916)
• adopted in Knowledge Representation (e.g. Sowa, 1984)
Steffen Staab (20)ISWeb – Informationssysteme & Semantic Web
Concepts: Intension, Extension, LexiconA term may indicate a concept, if we can define its
– Intension• (in)formal definition of the set of objects that this concept describes
– a disease is an impairment of health or a condition of abnormal functioning
– Extension• a set of objects (instances) that the definition of this concept describes
– influenza, cancer, heart disease, …
Discussion: what is an instance? - ‘heart disease’ or ‘my uncle’s heart disease’
– Lexical Realizations• the term itself and its multilingual synonyms
– disease, illness, Krankheit, maladie, …
Discussion: synonyms vs. instances – ‘disease’, ‘heart disease’, ‘cancer’, …
Steffen Staab (21)ISWeb – Informationssysteme & Semantic Web
Concepts: IntensionExtraction of a Definition for a Concept from Text
– Informal Definition• e.g., a gloss for the concept as used in WordNet• OntoLearn (Navigli and Velardi, 2004; Velardi et al., 2005) uses natural
language generation to compositionally build up a WordNet gloss for automatically extracted concepts
– ‘Integration Strategy’ : “strategy for the integration of …”
– Formal Definition• e.g., a logical form that defines all formal constraints on class
membership• Inductive Logic Programming, Formal Concept Analysis, …
Steffen Staab (22)ISWeb – Informationssysteme & Semantic Web
Concepts: ExtensionExtraction of Instances for a Concept from Text
– Commonly referred to as Ontology Population– Relates to Knowledge Markup (Semantic Metadata)– Uses Named-Entity Recognition and Information
Extraction
– Instances can be:
• Names for objects, e.g.– Person, Organization, Country, City, …
• Event instances (with participant and property instances), e.g.– Football Match (with Teams, Players, Officials, ...)– Disease (with Patient-Name, Symptoms, Date, …)
Steffen Staab (23)ISWeb – Informationssysteme & Semantic Web
Concepts: LexiconExtraction of Synonyms and Translations for a Concept from Text
– (Multilingual) Term Extraction – see previous slides– Representation of Lexical Information in Ontologies
rdfs:subClassOf
rdfs:subClassOfmeta-
classes
classes
instances
rdfs:Class
feat:ClassWithFeats
o:StorageProductfeat:ClassWithFeats
o:Refrigeratorfeat:ClassWithFeats
feat:imgFeatfeat:lingFeat
if:ImgFeat
lf:LingFeat
rdfs:Class
rdfs:Class
lf:lang “de”lf:term “Kühlschrank”lf:morphlf:context
lf:LingFeat
lf:head “Schrank”lf:pos “noun”
lf:Morph
...
if:color “#111111”if:shape “cuboid”lf:texture “&keypatchSet_223”
if:ImgFeat
URIrdf:type
property ...Lege
nd
o:Cupboardfeat:ClassWithFeats
feat:lingFeat
lf:lang “de”lf:term “Schrank”lf:morphlf:context
lf:LingFeat
...
...
...
Steffen Staab (24)ISWeb – Informationssysteme & Semantic Web
The Mathematical Definition of an Ontology [Stumme et al.; abbrev. from Cimiano-06]
• Structure:
– C: set of concept identifiers– R: set of relation identifiers– <C partial order on C (concept hierarchy) – <R: partial order on R (relation hierarchy)– Signature:
– Mathematical definition of extension of concepts [c] and relations [r]
– L-Axiom System: Arbitrary Axioms (may include patterns)
+→CR:σ
),,,,(: σRC RCC <<=
Steffen Staab (25)ISWeb – Informationssysteme & Semantic Web
LexiconDef: A Lexicon for an ontology is a structure
Lex:={SC,SR,RefC,RefR}
SC,SR are called signs for concepts and relations, respectively.
RefC,RefR, are binary relations denoting lexical referencesfor concepts and relations, respectively.
Example: RefC(„car“)={car-concept1,car-concept2}RefC(„automobile“)={car-concept1}RefC-1(car-concept1)={„car“, „automobile“}
Steffen Staab (26)ISWeb – Informationssysteme & Semantic Web
Terms
Concepts
Taxonomy
Relations
Rules & Axioms
disease, illness, hospital
{disease, illness, Krankheit}
DISEASE:=<Int,Ext,Lex>
is_a(DOCTOR,PERSON)
cure(dom:DOCTOR,range:DISEASE)
(Multilingual) Synonyms
))(),((, xillyxsufferFromyx →∀
Ontology Learning Layer Cake
Steffen Staab (27)ISWeb – Informationssysteme & Semantic Web
Distributional Hypothesis & Vector Space Model
• Harris, 1986– „Words are (semantically) similar to the extent to which they share
similar words“• Firth, 1957
– „You shall know a word by the company it keeps“
• Idea: collect context information and represent it as a vector:
• compute similarity among vectors wrt. a measure
XXexcursionXXtrip
XXXXmotor-bikeXXXcar
XXapartment
join_objride_objdrive_objrent_objbook_obj
Steffen Staab (28)ISWeb – Informationssysteme & Semantic Web
Context Features• Four-grams [Schuetze 93]
• Word-windows [Grefenstette 92]
• Predicate-Argument relations (every man loves a woman)Modifier Relations (fast car, the hood of the car)– [Grefenstette 92, Cimiano 04b, Gasperin et al. 03]
• Appositions (Ferrari, the fastest car in the world)– [Hahn & Schnattinger 98, Caraballo 99]
• Coordination (ladies and gentlemen)– [Caraballo 99, Dorow and Widdows 03]
Steffen Staab (29)ISWeb – Informationssysteme & Semantic Web
Overall Process
Or other clustering mechanism
Steffen Staab (30)ISWeb – Informationssysteme & Semantic Web
Using Syntactic Surface Dependencies
Mopti is the biggest city along the Niger with one of the most vibrant ports and a large bustling market. Mopti has a traditional ambience that other towns seem to have lost. It is also the center ofthe local tourist industry and suffers from hard-sell overload. The nearby junction towns of Gao and San offer nice views over the Niger’s delta.
city: biggest(1)ambience: traditional(1)center: of_tourist_industry(1)junction town: nearby(1)market: bustling(1)port: vibrant(1)overload:suffer_from(1)tourist industry: center_of(1), local(1)town: seem_subj(1)view: nice(1), offer_obj(1)
Steffen Staab (31)ISWeb – Informationssysteme & Semantic Web
Context Extraction Process• extract syntactic dependencies from text
⇒ verb/object, verb/subject, verb/PP relations⇒ car: drive_obj, crash_subj, sit_in, …
s
LoPar
vpdp
v dp
tgrep
crashed_subj(cars)sat_in(car)drove_obj(car)
sit_in(car)crash_subj(car)drive_obj(car)
lemmatization
Steffen Staab (32)ISWeb – Informationssysteme & Semantic Web
Weighting
• Observation:– output of the parser can be erroneous– not all attribute/object pairs are significant
• Conditional Probability:
• Consider attribute/object pairs with weightover threshold t
)|( argvnP
Steffen Staab (33)ISWeb – Informationssysteme & Semantic Web
Set Theoretical & Probabilistic Clustering
XXexcursion
XXtrip
XXXXmotor-bike
XXXcar
XXapartment
joinableridabledrivablerentablebookable
• Set theoretical– Formal Concept Analysis
[Ganter and Wille 1999]
Steffen Staab (34)ISWeb – Informationssysteme & Semantic Web
Tourism Formal Context
XXexcursion
XXtrip
XXXXmotor-bike
XXXcar
XXappartment
joinablerideabledriveablerentablebookable
Steffen Staab (35)ISWeb – Informationssysteme & Semantic Web
Tourism Lattice
Steffen Staab (36)ISWeb – Informationssysteme & Semantic Web
Concept Hierarchybookable
rentable joinable
driveable appartment
car
motor-bike
tripexcursion
rideable
Steffen Staab (37)ISWeb – Informationssysteme & Semantic Web
Compacting the hierarchybookable
rentable joinable
driveable appartment
carmotor-bike
tripexcursion
Steffen Staab (38)ISWeb – Informationssysteme & Semantic Web
Evaluation - Data Sets
• Tourism (118 Mio. tokens):– http://www.all-in-all.de/english– http://www.lonelyplanet.com– British National Corpus (BNC)– handcrafted tourism ontology (289 concepts)
• Finance (185 Mio. tokens):– Reuters news from 1987– GETESS finance ontology (1178 concepts)
Steffen Staab (39)ISWeb – Informationssysteme & Semantic Web
Precision/Recall/F-MeasureFCA (Tourism)
0
0,2
0,4
0,6
0,8
1
1,2
0 0,2 0,4 0,6 0,8 1
threshold t
PrecRecallF
Steffen Staab (40)ISWeb – Informationssysteme & Semantic Web
Lexical Recall, F‘
FCA (Tourism)
00,050,1
0,150,2
0,250,3
0,350,4
0,450,5
0 0,2 0,4 0,6 0,8 1
threshold t
FLRF'
Steffen Staab (41)ISWeb – Informationssysteme & Semantic Web
Comparison (Tourism, F‘)
Comparison (Tourism)
00,05
0,10,15
0,20,25
0,30,35
0,40,45
0,5
0 0,2 0,4 0,6 0,8 1
threshold t
FCAComplete LinkageAverage LinkageSingle LinkageBi-Section-Kmeans
Steffen Staab (42)ISWeb – Informationssysteme & Semantic Web
Comparison (Finance, F‘)
Comparison (Finance)
00,05
0,10,15
0,20,25
0,30,35
0,40,45
0 0,2 0,4 0,6 0,8 1
threshold t
FCAComplete-LinkageAverage LinkageSingle LinkageBi-Section-Kmeans
Steffen Staab (43)ISWeb – Informationssysteme & Semantic Web
Clustering – Comparison
Weak-FairO(n2)36.42/32.77%DivisiveClustering
FairO(n2 log(n))O(n2)O(n2)
36.78/33.35%36.55/32.92%38.57/32.15%
AgglomerativeClustering
GoodO(2n)(pract. better!)
43.81/41.02%FCA
UnderstandabilityWorst Case Time Complexity
F-Measure
Steffen Staab (44)ISWeb – Informationssysteme & Semantic Web
Problem 1: Labeling of Clusters• Caraballo’s Method [1999]:
– Agglomerative Clustering– Labeling Clusters with hypernyms derived from
Hearst patterns– Removing unlabeled concepts thus compacting the
hierarchy
• Evaluation: select 20 nouns with at least 20 hypernyms and present them to human judges with the 3 best hypernyms for each
• Results: – Best Hypernym (33% (Majority) / 39% (Any)– Any Hypernym (47.5% (Majority) / 60.5% (Any))
Steffen Staab (45)ISWeb – Informationssysteme & Semantic Web
Problem 2: Spurious Similarities• Guided Clustering [Cimiano 2005c]:
– Integrate a externally derived hypernym oracle into the agglomerative clustering algorithm
– Two terms are only clustered if they have a common hypernym according to the oracle
– Label the cluster with the common hypernym⇒Demonstrably better hierarchies⇒Labels for the cluster
⇒Reuse techniques from Clustering with constraints!
Steffen Staab (46)ISWeb – Informationssysteme & Semantic Web
Conclusion about Comparison
• FCA is an interesting alternative to similarity-based clustering approaches– high traceability due to intensional description
of clusters– Problem: worst case exponential in the size of
the formal context– But: Zipfian distribution of attributes
Steffen Staab (47)ISWeb – Informationssysteme & Semantic Web
Using Ontologies withText Retrieval
Steffen Staab (48)ISWeb – Informationssysteme & Semantic Web
Using Ontologies
Ontologies as:
• background knowledge for text clustering and classification
• basis for recommender systems• background knowledge in ILP• knowledge for models in Statistical
Relational Learning
Steffen Staab (49)ISWeb – Informationssysteme & Semantic Web
Text Clustering & Classification Approaches
clustering/classification
algorithm
DocumentsBag of Words
backgroundknowledge
oman has granded …Obj1 2 2Obj2 1 1Obj3 2 …Obj4 2 …
1 …0 …
0 00 0
Steffen Staab (50)ISWeb – Informationssysteme & Semantic Web
Omanhas grantedtermcrudeoilcustomersretroactivediscounts...
211112111...
Bag of WordsDok 17892 crude ============= Oman has granted term crude oil customers retroactive discounts from official prices of 30 to 38 cents per barrel on liftings made during February, March and April, the weekly newsletter Middle East Economic Survey (MEES) said. MEES said the price adjustments, arrived at through negotiations between the Omani oil ministry and companies concerned, are designed to compensate for the difference between market-related prices and the official price of 17.63 dlrs per barrel adopted by non-OPEC Oman since February. REUTER
Documents
Further preprocessing steps-Stopwords-Stemming
Text Clustering & Classification Approaches
Steffen Staab (51)ISWeb – Informationssysteme & Semantic Web
109377 Concepts(synsets)
WordNet as an example and ontology
144684 lexicalentries
Rootentity
something
physical object
artifact
substance
chemicalcompound
organiccompound
lipid
oil
EN:oil
covering
coating
paint
oil paint
cover
cover with oil
bless
oil, anoint
EN:anoint EN:inunct
oil colorcrude oil
144684 lexicalentries
Use of superconcepts(Hypernyms in Wordnet)
• Exploit more generalized concepts• e.g.: chemical compound is the 3rd superconcept of oil
Strategies:all, first, context
Steffen Staab (52)ISWeb – Informationssysteme & Semantic Web
Ontology-based representation
strategy: add
Omanhas grantedtermcrudeoilcustomersretroactivediscounts...
111111111...
1
1111111111...
Omangrantedterm(C) termcrude(C) crudeoil(C) oilcustomer(C) customer...
2
1111111111...
3
Omangrantedterm(C) termcrude(C) crudeoil(C) oil(C) lipid(C) compound...
Steffen Staab (53)ISWeb – Informationssysteme & Semantic Web
Evaluation of Text Clustering0,6160,618
0,570
0,300
0,350
0,400
0,450
0,500
0,550
0,600
0,650
add repl add only repl add only repl add only repl add only repl add only repl add only
context context first all context first all
0 0 5
false true
tfidf - 30without - 30
CLUSTERCOUNT 60 EXAMPLE 100 MINCOUNT 15
Mittelwert - PURITY
ONTO HYPDEPTH HYPDIS HYPINT
WEIGHT
PRUNE
backgro..depthdisambig.integrat.
Evaluation parameter• min 15, max 100, 2619 documents
of the reuters corpus• cluster k = 60, with BiSec-KMeans
avg - purity
Steffen Staab (54)ISWeb – Informationssysteme & Semantic Web
Evaluation: OHSUMED Classification Results
Top 50 classes with WordNet and AdaBoost
Steffen Staab (55)ISWeb – Informationssysteme & Semantic Web
Combine FCA & Text-clustering
1. preprocess Reuters documents and enrich them with background knowledge (Wordnet)
2. calculate a reasonable number k (100) of clusterswith BiSec-k-Means using cosine similarity
3. extract a description for all clusters4. relate clusters (objects) with FCA5. use the visualization of the concept lattice for
better understanding
Steffen Staab (56)ISWeb – Informationssysteme & Semantic Web
Explaining Clustering Results with FCA
compound, chemical compound
oil
refiner
chain of concepts with increasing specificity
Steffen Staab (57)ISWeb – Informationssysteme & Semantic Web
Explaining Clustering Results with FCA
Crude oilbarrel
Steffen Staab (58)ISWeb – Informationssysteme & Semantic Web
Explaining Clustering Results with FCA
resin palm
• Resulting concept latticecan also be interpretedas a concept hierarchydirectly on thedocuments
• all documents in onecluster obtain exactlythe same description
Steffen Staab (59)ISWeb – Informationssysteme & Semantic Web
Conclusion: Ontologies + Text
• Ontologies may be discovered as regularities unterlying some text
• Ontologies improve access to text– By annotation (cf part 2)– By retrieval (this part)