Evaluating Semantic Metadata without the Presence of a Gold Standard
Yuangui Lei, Andriy Nikolov, Victoria Uren, Enrico Motta
Knowledge Media Institute, The Open University
{y.lei,a.nikolov,v.s.uren,e.motta}@open.ac.uk
Focuses
• A quality model which characterizes quality problems in semantic metadata
• An automatic detection algorithm
• Experiments
[Figure: layers labelled Ontology, Metadata, and Data, with the metadata drawn as a block of RDF triples]
Semantic Metadata Generation
Semantic Metadata Acquisition
Semantic Metadata Repositories
A number of problems can arise that reduce the quality of the metadata
Quality Evaluation
• Metadata providers: ensure high quality
• Users: assess the trustworthiness of the metadata
• Applications: filter out poor-quality data
Our Quality Evaluation Framework
• A quality model
• Assessment metrics
• An automatic evaluation algorithm
The Quality Model
[Figure: Real World, Ontologies, Data Sources, and Semantic Metadata, connected by relations labelled Modelling, Representing, Describing, Instantiating, and Annotating]
Quality Problems
(a) Incomplete Annotation
(b) Duplicate Annotation
(c) Ambiguous Annotation
(d) Spurious Annotation
(e) Inaccurate Annotation
(f) Inconsistent Annotation
[Figure: panels (a) to (e) relate data objects to semantic entities; panel (f) shows semantic metadata instances I1 to I4, relations R1 and R2, and classes C1 to C3]
Current Support for Evaluation
• Gold standard based:
  – Examples: GATE [1], LA [2], BDM [3]
• Feature: assesses the performance of the information extraction techniques used.
• Not suitable for evaluating semantic metadata:
  – Gold standard annotations are often not available
The Semantic Metadata Acquisition Scenario
[Figure: acquisition pipeline with the labels KMi News Stories, Information Extraction Engine (ESpotter), Departmental Databases, Semantic Data Transformation Engine, Raw Metadata, Evaluation, and High Quality Metadata]
• Evaluation needs to take place dynamically whenever a new entry is generated.
• In such a context, a gold standard is NOT available.
Our Approach
• Using available knowledge instead of asking for gold standard annotations
  – Knowledge sources specific to the domain:
    • Domain ontologies, data repositories, domain-specific lexicons
  – Knowledge available in the background:
    • Semantic Web, Web, and general lexical resources
• Advantages:
  – Makes automatic operation possible
  – Makes large-scale data evaluation possible
Using Domain Knowledge
1. Domain Ontologies
   Constraints and restrictions → Inconsistent Annotations
   Example: one person classified as both KMi-Member and None-KMi-Member when they are disjoint classes.
2. Domain Lexicons
   Lexicon–instance mappings → Duplicate Annotations
   Example: when OU and Open-University both appear as values of the same property of the same instance
3. Domain Data Repositories → Ambiguous Annotations, Inaccurate Annotations
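The first two checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the slides name Pellet for ontology reasoning, whereas the sketch uses plain dictionaries and sets. The instance name `john`, the property `works-for`, and the contents of `DISJOINT_CLASSES` and `LEXICON` are hypothetical.

```python
from itertools import combinations

# Hypothetical toy domain knowledge (assumption, not the real KMi ontology).
DISJOINT_CLASSES = {frozenset({"KMi-Member", "None-KMi-Member"})}
# Hypothetical lexicon: surface form -> canonical instance identifier.
LEXICON = {"OU": "open-university", "Open-University": "open-university"}


def find_inconsistent(instance_classes, disjoint=DISJOINT_CLASSES):
    """Return (instance, class_a, class_b) for each pair of disjoint
    classes asserted for the same instance."""
    problems = []
    for instance, classes in instance_classes.items():
        for a, b in combinations(sorted(classes), 2):
            if frozenset({a, b}) in disjoint:
                problems.append((instance, a, b))
    return problems


def find_duplicates(property_values, lexicon=LEXICON):
    """Return (instance, property, values) where several values of one
    property map to the same lexicon entry."""
    problems = []
    for (instance, prop), values in property_values.items():
        by_entity = {}
        for value in values:
            # Values missing from the lexicon are treated as their own entity.
            by_entity.setdefault(lexicon.get(value, value), []).append(value)
        for same in by_entity.values():
            if len(same) > 1:
                problems.append((instance, prop, sorted(same)))
    return problems


if __name__ == "__main__":
    print(find_inconsistent({"john": {"KMi-Member", "None-KMi-Member"}}))
    print(find_duplicates({("john", "works-for"): ["OU", "Open-University"]}))
```

A real system would delegate the disjointness test to an OWL reasoner so that inferred class memberships are covered as well; the pairwise set lookup here only catches explicitly asserted classes.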
• When nothing can be found in the domain knowledge, the data can be:
  – Correct but outside the domain (e.g., IBM in the KMi domain)
  – An inaccurate annotation: misclassification (e.g., Sun Microsystems as a person)
  – Spurious (e.g., workshop chair as an organization)
• Background knowledge is then used to further investigate the problems
Investigating the Semantic Web
[Flowchart: the Semantic Web is searched with Watson. Found matches? No: examine the Web. Yes: are the classes similar (WordNet)? No: Inaccurate Annotation. Yes: add the data to the repositories.]
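The Semantic Web step can be read as a small decision procedure. The sketch below is an assumption-laden illustration: `lookup` stands in for a Watson query and `similar` for a WordNet-based class comparison, and both are hypothetical callables supplied by the caller rather than real APIs of those tools. The returned labels are also made up for the example.

```python
def check_on_semantic_web(entity, annotated_class, lookup, similar):
    """Decide what to do with an annotation the domain knowledge cannot judge.

    lookup(entity)  -> list of class names found on the Semantic Web
                       (hypothetical stand-in for Watson)
    similar(a, b)   -> True when two class names are considered similar
                       (hypothetical stand-in for a WordNet comparison)
    """
    found_classes = lookup(entity)
    if not found_classes:
        # No matches on the Semantic Web: fall through to the Web check.
        return "examine-web"
    if any(similar(annotated_class, c) for c in found_classes):
        # Classification confirmed: the entity can extend the repositories.
        return "add-to-repositories"
    return "inaccurate-annotation"


if __name__ == "__main__":
    toy_index = {"IBM": ["Company"], "Sun Microsystems": ["Company"]}
    synonyms = {("Organization", "Company")}

    def lookup(name):
        return toy_index.get(name, [])

    def similar(a, b):
        return a == b or (a, b) in synonyms or (b, a) in synonyms

    print(check_on_semantic_web("IBM", "Organization", lookup, similar))
    print(check_on_semantic_web("Sun Microsystems", "Person", lookup, similar))
```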
Examining the Web
[Flowchart: the Web is examined with PANKOW. Has classification? No: Spurious Annotation. Yes: is it similar to the annotated class (WordNet)? No: Inaccurate Annotation.]
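The Web step follows the same pattern. In this sketch `classify` is a hypothetical stand-in for PANKOW that returns a class name, or `None` when the Web yields no classification, and `similar` again stands in for a WordNet comparison. The outcome of the remaining branch (classification found and similar) is not stated in the transcript, so the label `no-problem-detected` is an assumption.

```python
def check_on_web(entity, annotated_class, classify, similar):
    """Classify an annotation that was not found on the Semantic Web.

    classify(entity) -> class name suggested by the Web, or None
                        (hypothetical stand-in for PANKOW)
    similar(a, b)    -> True when two class names are considered similar
                        (hypothetical stand-in for a WordNet comparison)
    """
    web_class = classify(entity)
    if web_class is None:
        # The Web offers no classification at all for this term.
        return "spurious-annotation"
    if not similar(annotated_class, web_class):
        return "inaccurate-annotation"
    # Assumed outcome; the slide does not label this branch.
    return "no-problem-detected"


if __name__ == "__main__":
    toy_web = {"Sun Microsystems": "Company"}

    def same_name(a, b):
        return a == b

    print(check_on_web("workshop chair", "Organization", toy_web.get, same_name))
    print(check_on_web("Sun Microsystems", "Person", toy_web.get, same_name))
```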
The Overall Picture
[Figure: an Evaluation Engine takes Metadata and produces Evaluation Results, drawing on Domain Knowledge (Ontologies, Lexical Resources) and Background Knowledge (Semantic Web via WATSON, Web via PANKOW, WordNet); other labels: Pellet + Reiter, SemSearch]
Step 1: Using domain knowledge
Step 2: Using background knowledge
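The two-step ordering can be expressed as a thin driver. This is a minimal sketch, assuming each step is a caller-supplied function: `domain_check` returns a verdict or `None` when the domain knowledge has nothing to say, and `background_check` always returns a verdict. Both names, and the verdict strings, are hypothetical.

```python
def evaluate(annotation, domain_check, background_check):
    """Run Step 1 (domain knowledge) and fall back to Step 2
    (background knowledge) only when Step 1 gives no verdict."""
    verdict = domain_check(annotation)
    if verdict is not None:
        return ("domain", verdict)
    return ("background", background_check(annotation))


if __name__ == "__main__":
    known = {("john", "KMi-Member"): "ok"}
    result = evaluate(
        ("IBM", "Organization"),
        domain_check=known.get,                    # None for unknown entries
        background_check=lambda a: "needs-review"  # hypothetical verdict
    )
    print(result)
```

Keeping the steps as injected functions means the costly background lookups run only for annotations the domain knowledge cannot settle.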
Addressed Quality Problems
[Figure: the Quality Problems diagram, panels (a) Incomplete, (b) Duplicate, (c) Ambiguous, (d) Spurious, (e) Inaccurate, and (f) Inconsistent Annotation]
Experiments
• Data settings: gathered in our previous work [4] in the KMi semantic web portal
  – Randomly chose 36 news stories from the KMi news archive
  – Collected a metadata set by using ASDI
  – Constructed a gold standard annotation
• Method:
  – A gold standard based evaluation as a comparison baseline
  – Evaluating the data set using domain knowledge only
  – Evaluating the data set using both domain knowledge and background knowledge
A number of entities are not contained in the problem domain
Background knowledge is useful in data evaluation
Discussion
• The performance of such an approach largely depends on:
  – A good domain-specific knowledge source
  – Good public visibility of the entities contained in the data set; otherwise there would be many false alarms.
References
1. H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL02), 2002.
2. P. Cimiano, S. Staab, and J. Tane. Acquisition of Taxonomies from Text: FCA meets NLP. In Proceedings of the ECML/PKDD Workshop on Adaptive Text Extraction and Mining, pages 10 – 17, 2003.
3. D. Maynard, W. Peters, and Y. Li. Metrics for Evaluation of Ontology-based Information Extraction. In Proceedings of the 4th International Workshop on Evaluation of Ontologies on the Web, Edinburgh, UK, May 2006.
4. Y. Lei, M. Sabou, V. Lopez, J. Zhu, V. S. Uren, and E. Motta. An Infrastructure for Acquiring High Quality Semantic Metadata. In Proceedings of the 3rd European Semantic Web Conference, 2006.