Post on 20-Jan-2016
Web3.0 and Language Resources
Marta Sabou
Knowledge Media Institute (KMi)
The Open University
Exploiting Semantic Web Ontologies:
An Experimental Report
Outline
• The Semantic Web
  – Online ontologies
  – Gateways to the Semantic Web
• Exploiting the Semantic Web
  – Relation discovery
  – Open Domain Question Answering
  – Folksonomy Enrichment
• Outlook for Language Technology
Scientific American, May 2001:
The Semantic Web
Tim Berners-Lee:
– “an extension of the current web (1) in which information is given well-defined meaning (2), better enabling computers and people to work in cooperation (3).”
1. The SW will gradually evolve out of the existing Web; it is not a competitor to the current WWW
2. Represent Web content in a form that is more easily machine-processable
3. An open platform allowing information to be shared and processed
[Figure: Ontology and Metadata layers over a UoD (universe of discourse)]
<rdf:RDF><channel rdf:about="http://watson.kmi.open.ac.uk/blog"><title>Elementaries - The Watson Blog</title><link>http://watson.kmi.open.ac.uk:8080/blog/</link><description>"Oh dear! Where the Semantic Web is going to go now?" -- imaginary user 23</description><language>en</language><copyright>Watson team</copyright><lastBuildDate>Thu, 01 Mar 2007 13:49:52 GMT</lastBuildDate><generator>Pebble (http://pebble.sourceforge.net)</generator><docs>http://backend.userland.com/rss</docs>…
<rdf:RDF> <foaf:Image rdf:about='http://static.flickr.com/132/400582453_e1e1f8602c.jpg'> <dc:title>Zen wisteria</dc:title> <dc:description></dc:description> <foaf:page rdf:resource='http://www.flickr.com/photos/xcv/400582453/'/> <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/vittelgarden/'/> <foaf:topic rdf:resource='http://www.flickr.com/photos/tags/wisteria/'/> <dc:creator> <foaf:Person><foaf:name>Mathieu d'Aquin</foaf:name> …
<rdf:RDF> <owl:Ontology rdf:about=""> <owl:imports rdf:resource="http://usefulinc.com/ns/doap#"/> </owl:Ontology> <j.1:Organization rdf:ID="KMi"> <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string" >The Knoledge Media Institute of the Open University, Milton Keynes UK</rdfs:comment> </j.1:Organization> <j.1:Document rdf:ID="KMiWebSite"> …
[Figure: online ontologies and vocabularies, including DOAP, FOAF, DC, RSS, TAP, WordNet, NCI, Galen, Music, …]
SW = A Conceptual Layer over the web
SW is Heterogeneous!
Interlinked, Semantic Data on the Web
[Figure: snapshots for 2007, 2008 and 2009]
Semantic Web Gateways
Search engines for the semantic data: collect, index and provide access to online semantic data.
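The collect, index and access steps can be pictured with a toy example. This is only a sketch under stated assumptions: the document URLs and labels are hypothetical, the "crawl" is a hard-coded dictionary, and real gateways index far richer information than entity labels.

```python
from collections import defaultdict

# Hypothetical crawl result: document URL -> entity labels found in it.
CRAWLED = {
    "http://example.org/animals.owl": ["Animal", "Mammal", "Dolphin"],
    "http://example.org/places.rdf": ["Building", "Supermarket"],
    "http://example.org/zoo.rdf": ["Animal", "Zoo"],
}

def build_index(documents):
    """Map each lower-cased label to the sorted list of documents that use it."""
    index = defaultdict(set)
    for url, labels in documents.items():
        for label in labels:
            index[label.lower()].add(url)
    return {label: sorted(urls) for label, urls in index.items()}

def lookup(index, keyword):
    """Return the documents that mention the keyword (case-insensitive)."""
    return index.get(keyword.lower(), [])

INDEX = build_index(CRAWLED)
print(lookup(INDEX, "animal"))
```

An application would call something like `lookup` at run time to find candidate ontologies for a term, instead of relying on a single pre-selected ontology.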
• 10K ontologies
• 50 million semantic documents
• 250K ontologies and metadata
Semantic Web Status
Online semantic data now constitutes the largest and most heterogeneous knowledge resource known in AI/KR.
Semantic Web Gateways offer a way to access this data easily.
So, the question is: how to use it? How to make the best of it?
Next Generation Semantic Web Applications
Dynamically retrieving, exploiting and combining relevant semantic resources from the SW, at large
Gateway to the Semantic Web
IEEE Intelligent Systems 23(3), pp. 20-28, May/June 2008
• Key aspects of the paradigm
• Tech. Infrastructure
• Concrete Applications
[Figure: Scarlet takes Concept_A (e.g., Supermarket) and Concept_B (e.g., Building), accesses the Semantic Web, and deduces the semantic relation between them (e.g., ⊆)]
- SCARLET - relation discovery on the SW
- http://scarlet.open.ac.uk/
- Automatically selects and combines multiple online ontologies to derive a relation
Relation Discovery
M. Sabou, M. d’Aquin, E. Motta, “Using the Semantic Web as Background Knowledge in Ontology Mapping”, Ontology Mapping Workshop, ISWC’06.
Two strategies
Deriving relations from (A) one ontology and (B) across ontologies.
[Figure: (A) Strategy 1: Supermarket vs. Building, with the chain Supermarket ⊆ Shop ⊆ PublicBuilding ⊆ Building found in a single Semantic Web ontology. (B) Strategy 2: Cholesterol vs. OrganicChemical, with Cholesterol ⊆ Steroid found in one ontology and Steroid ⊆ Lipid ⊆ OrganicChemical in another, joined through the shared concept Steroid.]
Matching two large-scale agricultural thesauri:
• AGROVOC
  – UN’s Food and Agriculture Organisation (FAO) thesaurus
  – 28,174 descriptor terms
  – 10,028 non-descriptor terms
• NALT
  – US National Agricultural Library Thesaurus
  – 41,577 descriptor terms
  – 24,525 non-descriptor terms
Experiment
M. Sabou, M. d’Aquin, E. Motta, “Exploring the Semantic Web as Background Knowledge in Ontology Matching”, Journal of Data Semantics, 2008.
Results - S1
226 Used Ontologies - S1
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://reliant.teknowledge.com/DAML/Economy.daml
http://gate.ac.uk/projects/htechsight/Technologies.daml
Results - S2
306 Used Ontologies - S2
http://139.91.183.30:9090/RDF/VRP/Examples/tap.rdf
http://reliant.teknowledge.com/DAML/SUMO.daml
http://a.com/ontology
http://reliant.teknowledge.com/DAML/Mid-level-ontology.daml
http://www.dannyayers.com/2003/08/udef.rdfs
http://gate.ac.uk/projects/htechsight/Technologies.daml
http://reliant.teknowledge.com/DAML/Economy.daml
Evaluation
• Manual assessment of 1000 mappings (15%)
• Performed for both strategies
• Evaluators:
  – Researchers in the area of the Semantic Web
  – 10 people split in two groups
Evaluation - Precision
• S1
• S2
Indicative Comparison with Other Techniques
• Traditional Matching (only eq.): 54% - 83%
• Using a single, pre-selected domain ontology: 76%
• Using the entire Web (via Google): 38% - 50%
• Using pre-selected, domain texts: 53% - 75%
• Using dynamically selected ontologies: 70%
The Semantic Web offers high quality data that can be used to improve ontology matching.
Evaluation - Error Analysis S1
Error Analysis S2 old
Subsumption as generic relation.
Subsumption as part-whole.
Subsumption as role.
Findings(1)
• Online ontologies are good enough to provide performance values comparable with other methods
• All relations have a formal “explanation”
BUT:
• Sparseness in domain coverage
• Several modeling errors, most often the misuse of subsumption
PowerAqua
Natural language question
Answers from online semantic data
Open domain QA by exploring online available semantic data.
Findings (2)
• Online ontologies allowed answering 69% of our question set
BUT:
• Weakly populated
  – Most ontologies do not have enough instances
• Sparseness in domain coverage
  – Only 20% of the IR TREC topics covered
• Limited amount of non-taxonomic relations
• Low quality:
  – Several modeling errors, most often the misuse of subsumption
  – Unclear labels
  – Missing domain and range information
Search in Tag Spaces
Let’s find photos of “animals which live in the water”
Query: Animal Water
[Figure: grid of 24 result photos; the photo labels include Dog (×4), Bird (×7), Tiger (×4), Cat and Landscape (×3)]
5/24 ≈ 21% relevant
Bring in the SW…
[Figure: ontology fragment with the concepts Animal, Mammal, Marine Mammal, Dolphin, Seal, Whale, Sea Elephant, Fish, FreshwaterFish, SaltwaterFish, Body of Water, Sea and Ocean, connected by subclass links and livesIn relations]
Animal Water
<Animal livesIn Water>
<Dolphin> or <Seal> or <“Sea Elephant”> or <Whale>
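The expansion from the two keywords to a disjunction of concrete tags can be sketched as follows. This is a minimal illustration, not the actual system: the ontology fragment is hand-written and hypothetical, `Water` is assumed to sit above `BodyOfWater`, and the function names are invented for the example.

```python
# Hypothetical ontology fragment, loosely following the slide's example.
SUBCLASS_OF = {
    "Dolphin": "MarineMammal",
    "Seal": "MarineMammal",
    "Whale": "MarineMammal",
    "Sea Elephant": "MarineMammal",
    "MarineMammal": "Mammal",
    "Tiger": "Mammal",
    "Mammal": "Animal",
    "Sea": "BodyOfWater",
    "Ocean": "BodyOfWater",
    "BodyOfWater": "Water",  # assumption: Water generalises BodyOfWater
}
LIVES_IN = {"MarineMammal": "Sea"}

def ancestors(concept):
    """Return the concept followed by all of its superclasses."""
    chain = [concept]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain

def expand(subject, relation, obj):
    """Leaf concepts under `subject` that inherit `relation` to some kind of `obj`."""
    parents = set(SUBCLASS_OF.values())
    matches = []
    for concept in SUBCLASS_OF:
        if concept in parents:
            continue  # keep only leaves, which are the likely photo tags
        line = ancestors(concept)
        if subject not in line:
            continue
        if any(obj in ancestors(relation[c]) for c in line if c in relation):
            matches.append(concept)
    return sorted(matches)

print(" or ".join(expand("Animal", LIVES_IN, "Water")))
```

The expanded tags, rather than the literal keywords "animal" and "water", are then what gets sent to the tag-based search.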
Results
[Figure: result photos for the tags dolphin, seal, whale and sea elephant]
18/24 = 75% relevant
FLOR - Folksonomy enrichment
[Figure: a flat tag cloud (kitten, furry, pets, cow, whiskers, whale, eye, cat, cute, feline, water, deer, primate, bear, lion, rodent, elephant, fur, ocean, rabbit, sea, grass, tree, goat, seal, gorilla, brown, marine, wild, white, cats, eyes, park, animals, otter, mammal, animal, zoo, nature, dolphin, farm) is linked to an ontology fragment with the concepts Animal, Mammal, Marine Mammal, Terrestrial Mammal, Dolphin, Seal, Whale, Sea Elephant, Tiger, Lion, Body of Water, Sea and Ocean, and the relation hasHabitat]
FLOR - Experiment
[Figure: the tag cloud is enriched once with WordNet (Structure_WN, Interface_WN) and once with the Semantic Web (Structure_SW, Interface_SW); the two are compared on richness of structure and increase in search results]
Findings (3)
• SW covers (some) multilingual tags
• SW covers novel tags
BUT:
• on average, SW leads to fewer senses than WordNet per tag
• on average, SW leads to a weaker structure than obtained from WordNet
YET:
• Better results obtained when Structure_SW is used for querying
  – Better alignment between tags and online concepts
  – Less fine-grained structure
Findings
• Good results obtained for relation discovery, open domain QA, improvement of search in folksonomies
• Large scale
  – More than 10K ontologies and growing!!!
  – Larger than any knowledge source in KR/AI
• Heterogeneous
  – W.r.t. size, quality of conceptualization, etc.
• Constantly evolving
  – Covers new terms that don’t (yet) appear in WordNet
• Multi-domain
• Multilingual
• Tools and APIs exist to allow its exploration
However…
• Domain coverage is still rather limited
• Ontology quality affects some applications:
  – Modeling errors
  – Few non-taxonomic relations
  – Unclear labels for ontology entities
  – Weakly populated
  – Fewer senses than in WordNet
  – Lack of domain and range information
The Web as a LR
Web 1.0
• Web-based relatedness
  – Cilibrasi & Vitanyi, 2007
• Verifying semantic relations
  – Cimiano et al., 2004
Web 2.0
[Figure: tag cloud]
• Wikipedia-based relatedness
  – Strube et al., 2006
• Folksonomy-based relatedness
  – Stumme et al., 2008
Web 3.0
[Figure: ontology fragment]
Besides deepening research on the frontier of Web 2.0 and LRs, the next important wave is in exploring Web 3.0 resources.
LT <---> SW
• LT <--- SW:
  – Complementary to existing LRs
    • Additional senses, novel terms and relations
  – Combine with other LRs
  – How to explore redundancy of knowledge?
  – How to explore heterogeneity?
• LT ---> SW: Can LT methods help to:
  – Increase domain coverage?
  – Detect modeling errors?
    • E.g., by checking evidence from Web, Wikipedia
  – Improve anchoring?
    • E.g., WSD methods
Thank you!
Strategy 2 - Definition
Principle: If no ontologies are found that contain the two terms, then combine information from multiple ontologies to find a mapping.
[Figure: A and B are anchored in the Semantic Web; A' is related to C in one ontology and C' is related to B' in another, which yields the relation between A and B]
Details:
(1) Select all ontologies containing A' equiv. with A
(2) For each ontology containing A':
  (a) if A' ⊆ C, find relation between C and B.
  (b) if A' ⊇ C, find relation between C and B.
Rules:
(r1) A' ⊆ C ∧ C ⊆ B ⇒ A ⊆ B
(r2) A' ⊆ C ∧ C ≡ B ⇒ A ⊆ B
(r3) A' ⊆ C ∧ C ⊥ B ⇒ A ⊥ B
(r4) A' ⊇ C ∧ C ⊇ B ⇒ A ⊇ B
(r5) A' ⊇ C ∧ C ≡ B ⇒ A ⊇ B
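The cross-ontology step can be sketched as a lookup in a small composition table. This is an illustration under assumptions, not the Scarlet implementation: ontologies are reduced to hypothetical lists of triples, anchoring is plain label equality, and the table reflects the five rules as read here, with r2 yielding ⊆.

```python
# Composition table for the five rules: (A' rel C, C rel B) -> A rel B.
RULES = {
    ("⊆", "⊆"): "⊆",  # r1
    ("⊆", "≡"): "⊆",  # r2
    ("⊆", "⊥"): "⊥",  # r3
    ("⊇", "⊇"): "⊇",  # r4
    ("⊇", "≡"): "⊇",  # r5
}

# Hypothetical ontologies as lists of (source, relation, target) statements.
ONTOLOGIES = {
    "pizza-to-go": [("Ham", "⊆", "Meat")],
    "SUMO": [("Meat", "⊆", "Food")],
    "wine.owl": [("Meat", "⊥", "Seafood")],
}

def derive(a, b, ontologies):
    """Return (relation, ontology_for_A, ontology_for_B) triples linking a to b via some C."""
    found = []
    for name_a, stmts_a in ontologies.items():
        for src, rel_ac, c in stmts_a:
            if src != a:
                continue
            for name_b, stmts_b in ontologies.items():
                for src_c, rel_cb, tgt in stmts_b:
                    if src_c == c and tgt == b and (rel_ac, rel_cb) in RULES:
                        found.append((RULES[(rel_ac, rel_cb)], name_a, name_b))
    return found

print(derive("Ham", "Food", ONTOLOGIES))
print(derive("Ham", "Seafood", ONTOLOGIES))
```

Returning the contributing ontologies alongside the relation keeps the formal "explanation" of each derived mapping available for inspection.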
Strategy 2 - Examples
Ex1: Chicken vs. Food
  Chicken ⊆ Poultry (midlevel-onto)
  Poultry ⊆ Food (Tap)
  ⇒ Chicken ⊆ Food (r1)
Ex2: Ham vs. Food
  Ham ⊆ Meat (pizza-to-go)
  Meat ⊆ Food (SUMO)
  ⇒ Ham ⊆ Food (r1)
  (Same results for Duck, Goose, Turkey)
Ex3: Ham vs. Seafood
  Ham ⊆ Meat (pizza-to-go)
  Meat ⊥ Seafood (wine.owl)
  ⇒ Ham ⊥ Seafood (r3)
– Label similarity methods
  • e.g., Full_Professor = FullProfessor
– Structure similarity methods
  • Using taxonomic/property related information
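A label similarity check of the Full_Professor = FullProfessor kind can be approximated by normalising both labels before comparing them. The sketch below is one simple way to do that, assuming exact equality after normalisation; real matchers typically also use string distances, and the function names are hypothetical.

```python
import re

def normalize(label):
    """Split on underscores, hyphens, spaces and camelCase, then lower-case and join."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)
    tokens = re.split(r"[\s_\-]+", spaced.strip())
    return " ".join(t.lower() for t in tokens if t)

def same_label(a, b):
    """True when two labels are identical after normalisation."""
    return normalize(a) == normalize(b)

print(same_label("Full_Professor", "FullProfessor"))
```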
Context: Ontology Matching
New paradigm: use of background knowledge
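[Figure: terms A and B are anchored to A' and B' in the background knowledge (an external source); the relation R found between A' and B' is taken as the relation R between A and B]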
External Source = One Ontology
Aleksovski et al. EKAW’06
• Map (anchor) terms into concepts from a richly axiomatized domain ontology
• Derive a mapping based on the relation of the anchor terms
Assumes that a suitable (rich, large) domain ontology (DO) is available.
Strategy 1 - Definition
Find ontologies that contain equivalent classes for A and B and use their relationship in the ontologies to derive the mapping.
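[Figure: A and B are anchored to classes A1'/B1', A2'/B2', …, An'/Bn' in Semantic Web ontologies O1, O2, …, On; the relations found there give the relation between A and B]
For each ontology use these rules:
A' ≡ B' ⇒ A ≡ B
A' ⊆ B' ⇒ A ⊆ B
A' ⊇ B' ⇒ A ⊇ B
A' ⊥ B' ⇒ A ⊥ B
…
These rules can be extended to take into account indirect relations between A' and B', e.g., between parents of A' and B':
A' ⊆ C' ∧ C' ⊥ B' ⇒ A ⊥ B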
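The per-ontology rules, together with the indirect disjointness rule, can be sketched as below. The ontologies, their contents and the function names are hypothetical, anchoring is again reduced to label equality, and only one level of parents is consulted, so this is a simplification rather than the method as implemented.

```python
# Hypothetical ontologies: direct statements (A', relation, B') plus subclass links.
ONTOLOGIES = {
    "O1": {"relations": [("Supermarket", "⊆", "Building")], "parents": {}},
    "O2": {
        "relations": [("Meat", "⊥", "Seafood")],
        "parents": {"Ham": "Meat"},
    },
}

INVERSE = {"⊆": "⊇", "⊇": "⊆", "≡": "≡", "⊥": "⊥"}

def direct(onto, a, b):
    """Relation stated between a and b in one ontology, in either direction."""
    for src, rel, tgt in onto["relations"]:
        if (src, tgt) == (a, b):
            return rel
        if (src, tgt) == (b, a):
            return INVERSE[rel]
    return None

def strategy1(a, b, ontologies):
    """Collect (ontology, relation) pairs; fall back to disjointness inherited from a parent of a."""
    results = []
    for name, onto in ontologies.items():
        rel = direct(onto, a, b)
        parent = onto["parents"].get(a)
        if rel is None and parent is not None and direct(onto, parent, b) == "⊥":
            rel = "⊥"  # A' ⊆ C' and C' ⊥ B' implies A ⊥ B
        if rel is not None:
            results.append((name, rel))
    return results

print(strategy1("Supermarket", "Building", ONTOLOGIES))
print(strategy1("Ham", "Seafood", ONTOLOGIES))
```

Because every ontology is consulted independently, the result is a list of candidate relations; how to reconcile conflicting candidates is left open here.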
External Source = Web
van Hage et al. ISWC’05
• Rely on Google and an online dictionary in the food domain to extract semantic relations between candidate terms using IR techniques
[Figure: the relation between A and B is extracted with IR methods from the Web plus an online dictionary]
Precision increases significantly if domain-specific sources are used: 50% - Web; 75% - domain texts.
Does not rely on a rich DO