Ontology based Information Extraction
Jin Mao, Postdoc, School of Information, University of Arizona
Oct. 9th, 2015
Outline
Information Extraction
The process of obtaining pertinent information (facts) from documents.
Example: "The forest area in India extended to about 75 million hectares, which in terms of geographical area is approximately 22 percent of the total land."
What is the relationship between forest area and geographical area?
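As a rough illustration of what "obtaining facts" can mean for the example sentence, the sketch below pulls the two quantities out with a regular expression. The pattern and the function name `extract_quantities` are assumptions made for this illustration only; they cover just the units that appear in this one sentence.

```python
import re

SENTENCE = (
    "The forest area in India extended to about 75 million hectares, "
    "which in terms of geographical area is approximately 22 percent "
    "of the total land."
)

# Hypothetical pattern: a number followed by one of two known units.
QUANTITY = re.compile(r"(\d+(?:\.\d+)?)\s*(million\s+hectares|percent)")


def extract_quantities(text):
    """Return (value, unit) pairs found in the text, in order of appearance."""
    return [(float(value), " ".join(unit.split()))
            for value, unit in QUANTITY.findall(text)]


facts = extract_quantities(SENTENCE)
print(facts)
```

Linking each quantity to "forest area" or "geographical area" is the harder part of the task, and it is where the knowledge in an ontology becomes useful.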
Ontology Based Information Extraction (OBIE): Terminology
Ontology Based Information Extraction (Wimalasuriya and Dou, 2010)
Ontology-driven Information Extraction (Yildiz and Miksch, 2007)
The same as Ontology Based Information Extraction
The difference is whether the ontology part is within the system (Yildiz and Miksch, 2007)
Ontology Based Information Extraction (OBIE): Key Characteristics
Process unstructured or semi-structured natural language text
Present the output using ontologies
Ontology as input (Li and Bontcheva, 2007), released
Use an IE process guided by an ontology
No new IE method: an existing one is oriented to identify the components of an ontology (classes, properties and instances)
Extractors belong to an ontology? (e.g., linguistic rules)
Ontology Based Information Extraction (OBIE): Why
An ontology helps to clarify a domain's semantics, e.g., concepts and their relationships
It helps to alleviate a wide variety of natural language ambiguities
Ontology Based Information Extraction (OBIE): Applications
Business Intelligence (BI) in e-business
Social media, e.g., Twitter
Metadata generation for digital resources
……
Common Architectures: Major Challenges
Information Extraction: identify instances from the ontology in the text.
Classes, Instances, Mentions, Properties, Property Values
Free text in natural language.
Example 1: Classical fried egg Mycoplasma-type colonies were not observed on 1% agar medium.
Example 2: The cells are not motile, are not lysed in 1% SDS (wt/vol), and stain Gram positively.
Common Architectures: Major Challenges
Ontology Enhancement / Updating
Upgrade the ontology with new instances to better cover the knowledge in a domain
Not in the common architecture.
Common Architectures: General Architecture
Common Architectures: First Step
Define the semantic elements to be extracted
An example (Muller et al., 2004)
Concept (C): named entities for the parts of the human body, such as heart, lung, kidney…
Name of Disease (N): words or phrases of disease names.
Description (D): any words or phrases that describe Concepts. "Description" refers to any kind of words or phrases that relate semantically to Concepts.
Pair of Concept and Description (P): all possible combinations of Concepts and Descriptions. The combinations carry the full meaning of the relationships between C and D.
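The "all possible combinations" definition of P can be sketched as a Cartesian product of the C and D sets. The concept names come from the slide. The description words and disease names are hypothetical placeholders added for illustration.

```python
from itertools import product

# Concepts (C): taken from the slide's examples.
concepts = ["heart", "lung", "kidney"]

# Names of Disease (N) and Descriptions (D): hypothetical sample values.
disease_names = ["pneumonia", "nephritis"]
descriptions = ["enlarged", "inflamed"]

# Pairs (P): every combination of a Concept with a Description.
pairs = list(product(concepts, descriptions))

for concept, description in pairs:
    print(f"{description} {concept}")
```

In practice only some of these pairs will occur in the text, so the full product is better read as the candidate space that extraction then narrows down.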
Information Extraction Methods: Linguistic rules
Using regular expressions/patterns
(watched|seen) <NP>, where <NP> is a noun phrase identified through Part-of-Speech tags
Implemented using finite-state transducers, which consist of a series of finite-state automata
Automatically generate regular rules: "[Ii]nteract(s|ed|ing)?" matches "interact," "interacts," "interacted," "interacting," "Interact," "Interacts," "Interacted," and "Interacting."
Simple, surprisingly good results
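The two patterns on this slide can be tried directly with Python's `re` module. This is a minimal sketch: the `<NP>` slot is approximated here by a run of capitalized words, which is an assumption made for illustration and a crude stand-in for a real noun-phrase chunker.

```python
import re

# The rule from the slide, matched against whole words.
INTERACT = re.compile(r"[Ii]nteract(s|ed|ing)?")

forms = ["interact", "interacts", "interacted", "interacting",
         "Interact", "Interacts", "Interacted", "Interacting"]
all_forms_match = all(INTERACT.fullmatch(form) for form in forms)

# (watched|seen) <NP>, with <NP> approximated as capitalized words (assumption).
WATCHED_NP = re.compile(
    r"\b(?:watched|seen)\s+((?:[A-Z][\w']*)(?:\s+[A-Z][\w']*)*)"
)


def extract_watched(text):
    """Return the capitalized phrases that follow 'watched' or 'seen'."""
    return WATCHED_NP.findall(text)


print(all_forms_match)
print(extract_watched("I watched Blade Runner last night and have seen Alien twice."))
```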
Information Extraction Methods: Linguistic rules
Automatically mine extraction rules from text
A dictionary inductive learning algorithm (Vargas-Vera et al., 2001)
Finding the longest common subsequence (Romano et al., 2006)
Relational learning (Califf and Mooney, 1999), a bottom-up learning approach
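To make the longest-common-subsequence idea concrete, the sketch below finds the tokens shared, in order, by two example sentences and turns them into a loose regular expression. It only illustrates the general idea and does not reproduce the algorithm of Romano et al.; the sentences and the gap-filling scheme (`.*?` between shared tokens) are assumptions.

```python
import re


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = 1 + dp[i + 1][j + 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table to recover one subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def to_pattern(tokens):
    """Join shared tokens with lazy gaps so other words may appear between them."""
    return re.compile(r"\b" + r"\b.*?\b".join(re.escape(t) for t in tokens) + r"\b")


s1 = "the patient has no sign of fever"
s2 = "the child has no clear sign of infection"
shared = lcs(s1.split(), s2.split())
rule = to_pattern(shared)
print(shared)
print(rule.pattern)
```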
Information Extraction Methods: Gazetteer Lists
To recognize individual words or phrases
Widely used in named-entity recognition
E.g., to recognize states of the US or countries of the world
Conditions:
Specify exactly what is being identified by the gazetteer.
Specify where the information for the gazetteer lists was obtained from.
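A gazetteer lookup can be sketched as a longest-match-first search over a fixed list. The list below is a small, hand-picked subset of US state names used only for illustration; a real gazetteer would document its source, as the conditions above require.

```python
import re

# Illustrative subset of US states (assumption: not a complete list).
US_STATES = {"Arizona", "New Mexico", "New York", "Texas"}


def build_matcher(entries):
    """Compile one regex for all entries, trying longer names first."""
    ordered = sorted(entries, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(re.escape(e) for e in ordered) + r")\b")


def find_mentions(text, matcher):
    """Return (start, end, surface form) for each gazetteer hit."""
    return [(m.start(), m.end(), m.group(0)) for m in matcher.finditer(text)]


matcher = build_matcher(US_STATES)
mentions = find_mentions("She moved from New York to Arizona.", matcher)
print(mentions)
```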
Information Extraction Methods: Classification Techniques
Linguistic features such as POS tags, capitalization information and individual words
Parts of IE are treated as classification problems:
Whether a word token is the start/end of an entity (Li et al., 2004)
Identify different components of an ontology such as instances (Li and Bontcheva, 2007) and property values (Wu and Weld, 2007)
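The start/end formulation can be sketched as data preparation for two binary classifiers, one per boundary. The feature set below is a hypothetical minimal one: POS tags are left out because they would require a tagger, and the classifier itself is not shown.

```python
def token_features(tokens, i):
    """Features for the token at position i (illustrative feature set)."""
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }


def boundary_labels(tokens, spans):
    """For each token, return (is_entity_start, is_entity_end).

    spans holds (start, end) token indices of annotated entities, inclusive.
    """
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return [(i in starts, i in ends) for i in range(len(tokens))]


tokens = ["Jin", "Mao", "works", "in", "Arizona"]
spans = [(0, 1), (4, 4)]  # "Jin Mao" and "Arizona"
X = [token_features(tokens, i) for i in range(len(tokens))]
y = boundary_labels(tokens, spans)
print(X[0])
print(y)
```

Any standard classifier could then be trained on `X` against the first and second elements of `y` separately.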
Information Extraction Methods: Syntax/Shallow NLP
A semantically annotated parse tree for the text as a part of the IE process
Linguistic extraction rules with partial parse trees (Todirascu et al., 2002).
Ontology Construction
Either consider the ontology as an input to the system,
or construct an ontology as a part of the OBIE process
Ontology Enhancement
Update the ontology by adding new classes and properties through the IE process, NOT instances and their property values
Such systems include the implementations by Maedche et al. (2003) and Dung and Kameyama (2007).
Fuzzy Relationship Rule: define rules according to the relationships among semantic elements.
o Generate a suggestion list for the domain experts to extract real semantic elements.
Performance Evaluation
Measure the accuracy of identifying instances and property values.
Most IE systems face a trade-off between improving precision and recall.
When β² < 1 in the F-measure, precision (P) is given more importance than recall (R).
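Assuming the slide refers to the standard weighted F-measure, F = (β² + 1)·P·R / (β²·P + R), the effect of β can be checked with a few lines of Python. The function names and the example counts are illustrative assumptions.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision and recall from raw counts (0.0 when undefined)."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted F-measure; beta < 1 favors precision, beta > 1 favors recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


p, r = precision_recall(true_positives=5, false_positives=5, false_negatives=0)
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_measure(p, r, beta), 3))
```

With P = 0.5 and R = 1.0, the score moves toward P as β shrinks and toward R as β grows.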
Performance Evaluation
Evaluation in different scales (Maynard et al., 2004)
Each answer is categorized as correct or incorrect; however, different degrees of correctness should be allowed.
Learning Accuracy (LA): measures the closeness of the assigned class label to the correct class label based on the hierarchy of the ontology (Cimiano et al., 2005).
Multi-dimensional evaluation beyond Precision and Recall
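The idea of "closeness in the hierarchy" can be illustrated with a simple depth-based score. Note that this sketch uses a Wu-Palmer-style similarity as a stand-in and is not the published definition of Learning Accuracy. The toy class hierarchy is also an assumption.

```python
# Toy ontology: each class maps to its parent (None marks the root).
PARENT = {
    "Entity": None,
    "Location": "Entity",
    "Person": "Entity",
    "City": "Location",
    "Country": "Location",
}


def path_to_root(cls):
    """Return [cls, parent, ..., root]."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path


def depth(cls):
    """Depth of a class, counting the root as 1."""
    return len(path_to_root(cls))


def closeness(predicted, correct):
    """2 * depth(lowest common ancestor) / (depth(predicted) + depth(correct))."""
    correct_ancestors = set(path_to_root(correct))
    common = next(c for c in path_to_root(predicted) if c in correct_ancestors)
    return 2 * depth(common) / (depth(predicted) + depth(correct))


print(closeness("City", "City"))
print(closeness("City", "Country"))
print(closeness("City", "Person"))
```

A sibling class ("Country" for "City") scores higher than an unrelated one ("Person"), which is the graded notion of correctness the slide asks for.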
Performance Evaluation
Cost-based metrics (Maynard et al., 2004)
A cost would typically be associated with a miss and a false alarm (spurious answer)
augmented precision (AP)
augmented recall (AR)
Potentials
Automatically processing the information contained in natural language text
Creating semantic contents for the Semantic Web
automatic metadata generation
semantic annotation
Improving the quality of ontologies
ACKNOWLEDGEMENT
Most of the materials are adapted from:
Wimalasuriya, D. C., & Dou, D. (2010). Ontology-based information extraction: An introduction and a survey of current approaches. Journal of Information Science.
Other References (part):
• Muhammad, A., & Dey, L. (2005). Biological Ontology enhancement with Fuzzy Relation: A Text Mining Framework. In International Conference on Web Intelligence WI (Vol. 5).
• Romano, R., Rokach, L., & Maimon, O. (2006). Automatic discovery of regular expression patterns representing negated findings in medical narrative reports. In Proceedings of the 6th International Workshop on Next Generation Information Technologies and Systems. Springer, Berlin.
• Muller, H. M., Kenny, E. E., & Sternberg, P. W. (2004). Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol, 2(11), e309.
• Dung, T. Q., & Kameyama, W. (2007). Ontology-based information extraction and information retrieval in health care domain. In Data Warehousing and Knowledge Discovery (pp. 323-333). Springer Berlin Heidelberg.
Thank you!