knowledge-driven implicit information extraction
![Page 1: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/1.jpg)
1
Knowledge-driven Implicit Information Extraction
Sujan Perera
Dissertation Committee: Drs. Amit P. Sheth (advisor), Krishnaprasad Thirunarayan, Michael Raymer, Pablo N. Mendes (IBM Research)
Ph.D. Dissertation Defense
![Page 2: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/2.jpg)
2
Information Extraction
• More than 70% of data in organizations exists in unstructured form [1]
• Extraction of structured information from unstructured data is a fundamental task
“All home medications although his insulin dose (nph 20 qPM) was halved (--> NPH 10 qPM) on the floor, and his sugars were running in the 150s-250s range.”
[Figure: knowledge graph around Insulin. Edge labels: may_treat (Diabetes Mellitus, Hyperglycemia), contradicting drug (Cisapride), and is a (Proinsulin, Porcine Insulin, Insulin Glulisine).]
[1] https://en.wikipedia.org/wiki/Unstructured_data
![Page 3: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/3.jpg)
3
Information Extraction
• Almost exclusively focused on explicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
![Page 4: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/4.jpg)
4
Information Extraction
• Almost exclusively focused on explicit information
Named Entity Recognition, Relationship Extraction, Entity Linking
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
Annotations: Person, Person, C0018795, C0015672, C0008031
![Page 5: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/5.jpg)
5
Information Extraction
• Misses the implicit information
“Bob Smith is a 61-year-old man referred by Dr. Davis for outpatient cardiac catheterization because of a positive exercise tolerance test. Recently, he started to have left shoulder twinges and tingling in his hands. A stress test done on 2013-06-02 revealed that the patient exercised for 6 1/2 minutes, stopped due to fatigue. However, Mr. Smith is comfortably breathing in room air. He also showed accumulation of fluid in his extremities. He does not have any chest pain.”
Explicit annotations: Person, Person, C0018795, C0015672, C0008031
Implicit information: no shortness of breath, edema
Named Entity Recognition, Relationship Extraction, Entity Linking, Implicit Information Extraction
![Page 6: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/6.jpg)
6
Thesis Statement
Implicit factual information in unstructured text can be efficiently extracted by bridging syntactic and semantic gaps in natural language usage and augmenting information extraction techniques with relevant domain knowledge.
![Page 7: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/7.jpg)
7
• Express sarcasm/sentiment
  • “I'm striving to be positive in what I say on Twitter. So I'll refrain from making a comment about the latest Michael Bay movie.”
• Provide descriptive information
  • “small fluid adjacent to the gallbladder with gallstones which may represent inflammation”
• Emphasize features of the entity
  • “Mason Evans 12 year long shoot won big in golden globe”
• Communicate the common understanding
  • “He is suffering from nausea and severe headaches. Dolasteron was prescribed.”
• Stylistic preferences
  • “Democratic candidate Bernie Sanders … The Vermont senator …”
Credit: http://bit.ly/2b9Bnjk
![Page 8: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/8.jpg)
8
Significance
• Volume
  • 20% of movie references and 40% of book references in tweets are implicit
  • 35% of edema references and 40% of shortness of breath references in clinical narratives are implicit
• Value
[Figure: Explicit Information feeding Structured Information for Computer Assisted Coding, 30-day Readmission Prediction, and Sentiment Analysis]
![Page 9: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/9.jpg)
9
Significance
• Volume
  • 20% of movie references and 40% of book references in tweets are implicit
  • 35% of edema references and 40% of shortness of breath references in clinical narratives are implicit
• Value
  • Ignoring implicit information in text would adversely affect downstream applications
[Figure: Explicit Information and Implicit Information feeding Structured Information for Computer Assisted Coding, 30-day Readmission Prediction, and Sentiment Analysis]
![Page 10: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/10.jpg)
10
Role of Knowledge
“New Sandra Bullock astronaut lost in space movie looks absolutely terrifying”
• Knowledge Bases: Sandra Bullock, Gravity
“The patient showed accumulation of fluid in his extremities, but respirations were unlabored and there were no use of accessory muscles.”
• UMLS, WordNet:
  • Edema: Accumulation of an excessive amount of watery fluid in cells or intercellular tissues
  • Shortness of breath: Labored or difficult breathing associated with a variety of disorders
Image credits: http://bit.ly/2b5HPDQ and Icon made by Freepik from www.flaticon.com
Credit: http://bit.ly/2bi34FG
Credit: http://bit.ly/1x3sack
Credit: http://bit.ly/2b9CejW
Credit: http://bit.ly/2aXM97v
![Page 11: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/11.jpg)
11
Knowledge Acquisition
Knowledge Modeling
Detecting Implicit Information
Information Extraction
Implicit Information Extraction
![Page 12: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/12.jpg)
12
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
![Page 14: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/14.jpg)
14
Implicit Entities in Clinical Documents

| Sentence | Entity |
|---|---|
| “small fluid adjacent to the gallbladder with gallstones which may represent inflammation.” | Cholecystitis |
| “His tip of the appendix is inflamed.” | Appendicitis |
| “The respirations were unlabored and there were no use of accessory muscles.” | Shortness of breath (NEG) |

• One should know the physiological observations that characterize a particular entity
• Negations are embedded in the phrases indicating entities
  • “Patient denies shortness of breath”
  • “The respirations were unlabored”
![Page 15: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/15.jpg)
15
Knowledge Acquisition
• Unified Medical Language System (UMLS) – integrates many health and biomedical vocabularies
• Linguistic Knowledge – WordNet
  • Synonyms/antonyms
  • Syntactic variations of the same term
[Figure: UMLS tables with columns (CUI, AUI, STR), (CUI, TUI), and (CUI, STR, DEF, SAB)]
Definitions for shortness of breath:
• A disorder characterized by an uncomfortable sensation of difficulty breathing
• Difficult or labored breathing
• Labored or difficult breathing associated with a variety of disorders, indicating inadequate ventilation or low blood oxygen or a subjective experience of breathing discomfort
![Page 16: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/16.jpg)
16
Knowledge Modeling
• Each entity has multiple definitions
• Each definition is processed to create an entity indicator
• The representative power of each term (r1, r2, …) is calculated with a measure inspired by TF-IDF
• A collection of entity indicators constitutes the entity model
[Figure: definition1, definition2, and definition3 are processed into Entity Indicator1, Entity Indicator2, and Entity Indicator3, which together form the Entity Model]

| Definition | Entity Indicator |
|---|---|
| A disorder characterized by an uncomfortable sensation of difficulty breathing | (uncomfortable, r1), (sensation, r2), (difficulty, r3), (breathing, r4) |
| Difficult or labored breathing | (difficult, r5), (labored, r6), (breathing, r4) |
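The modeling step above can be sketched in a few lines of Python. The slides only say the weighting is "inspired by TF-IDF", so the formula below, the stop-word list, and all function names are assumptions made for illustration, not the dissertation's exact method.

```python
import math
import re
from collections import Counter

# Assumed pruning list; the slides do not specify how definitions are pruned.
STOP_WORDS = {"a", "an", "the", "by", "of", "or", "in", "with", "and",
              "disorder", "characterized"}


def tokenize(definition):
    """Lowercase a definition and drop the assumed stop words."""
    return [t for t in re.findall(r"[a-z]+", definition.lower())
            if t not in STOP_WORDS]


def build_entity_models(definitions_by_entity):
    """Return {entity: [indicator, ...]}; each indicator maps term -> weight."""
    n_entities = len(definitions_by_entity)
    # Number of entities whose definitions mention each term.
    entity_freq = Counter()
    for definitions in definitions_by_entity.values():
        entity_freq.update({t for d in definitions for t in tokenize(d)})

    models = {}
    for entity, definitions in definitions_by_entity.items():
        # Term frequency over all definitions of this entity.
        term_freq = Counter(t for d in definitions for t in tokenize(d))
        models[entity] = [
            {t: term_freq[t] * math.log(1 + n_entities / entity_freq[t])
             for t in tokenize(d)}
            for d in definitions
        ]
    return models


DEFINITIONS = {
    "shortness of breath": [
        "A disorder characterized by an uncomfortable sensation of difficulty breathing",
        "Difficult or labored breathing",
    ],
    "edema": [
        "Accumulation of an excessive amount of watery fluid in cells or intercellular tissues",
    ],
}
MODELS = build_entity_models(DEFINITIONS)
```

Because the weight is computed per entity rather than per definition, a term such as "breathing" carries the same weight in every indicator of the same entity, matching the shared r4 in the table.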
![Page 17: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/17.jpg)
17
Detecting Sentences with Implicit Entities
• Sentences that contain an entity's representative terms but not the entity name may contain an implicit mention of the entity.
“However, Mr. Smith is comfortably breathing in room air.”
Candidate sentence for shortness of breath
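A minimal sketch of this candidate filter, assuming the representative terms are already available as a set of lowercase words. The function name and the simple substring check for the entity name are illustrative assumptions.

```python
import re


def is_candidate(sentence, entity_name, representative_terms):
    """True if the sentence may mention the entity implicitly."""
    text = sentence.lower()
    if entity_name.lower() in text:
        return False  # the entity is named, so the mention is explicit
    tokens = set(re.findall(r"[a-z]+", text))
    # At least one representative term must appear in the sentence.
    return bool(tokens & set(representative_terms))


TERMS = {"breathing", "labored", "difficult"}
print(is_candidate("However, Mr. Smith is comfortably breathing in room air.",
                   "shortness of breath", TERMS))
```

Sentences that pass this filter are only candidates; deciding whether the mention is positive or negated is left to the similarity step.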
![Page 18: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/18.jpg)
18
Information Extraction – Entity Linking
• The similarity between the entity model and the pruned candidate sentence is measured to annotate the sentence with a positive or negative label
• We developed a semantic similarity measure that accounts for synonyms and antonyms
[Figure: the candidate sentence is compared with Indicator1, Indicator2, and Indicator3 of the entity model, giving sim1, sim2, and sim3]
![Page 19: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/19.jpg)
19
Information Extraction – Entity Linking
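[Figure: candidate sentence terms (ct1–ct4) are compared with entity indicator terms (et5–et7) using WordNet: if the two terms are antonyms the similarity is -1, otherwise the maximum similarity is used.]

$$\mathit{similarity} = \frac{\sum_{et} \mathit{sim}_{et} \cdot rp_{et}}{\sum_{et} rp_{et}}$$

Here $\mathit{sim}_{et}$ is the similarity obtained for entity indicator term $et$ and $rp_{et}$ is its representative power.
• similarity > t1: Positive Annotation
• similarity < t2: Negative Annotation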
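The weighted similarity above can be sketched as follows. This is a minimal illustration under stated assumptions: a tiny hand-made synonym/antonym table stands in for WordNet, the 0.9 synonym score and the thresholds are arbitrary, and only one indicator is scored (how sim1, sim2, and sim3 are combined is not stated on the slides).

```python
# Toy stand-ins for WordNet lookups (assumed, for illustration only).
SYNONYMS = {frozenset(("labored", "difficult"))}
ANTONYMS = {frozenset(("labored", "unlabored"))}


def term_similarity(a, b):
    """Return 1 for identical terms, -1 for antonyms, 0.9 for synonyms, else 0."""
    if a == b:
        return 1.0
    pair = frozenset((a, b))
    if pair in ANTONYMS:
        return -1.0
    if pair in SYNONYMS:
        return 0.9
    return 0.0


def indicator_similarity(sentence_terms, indicator):
    """Weighted average of per-term similarities; indicator maps term -> rp."""
    weighted, total = 0.0, 0.0
    for et, rp in indicator.items():
        sims = [term_similarity(ct, et) for ct in sentence_terms]
        # "If antonym then -1, else max similarity."
        sim = -1.0 if -1.0 in sims else max(sims, default=0.0)
        weighted += sim * rp
        total += rp
    return weighted / total if total else 0.0


def annotate(score, t1, t2):
    """Positive above t1, negative below t2, otherwise no annotation."""
    if score > t1:
        return "positive"
    if score < t2:
        return "negative"
    return None


INDICATOR = {"difficult": 2.0, "labored": 2.0, "breathing": 1.0}
score = indicator_similarity(["respirations", "unlabored"], INDICATOR)
print(score, annotate(score, t1=0.4, t2=-0.2))
```

Letting an antonym override every other match is what allows a phrase such as "respirations were unlabored" to be scored as a negated mention rather than an unrelated sentence.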
![Page 20: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/20.jpg)
20
Evaluation
• Re-annotated the SemEval-2014 task 7 dataset for implicit entities
• Entities are selected considering the frequency of appearance and with expert feedback
• 857 sentences selected for 8 entities
• Annotated by three domain experts
• Annotation agreement 0.58
| Entity | Positive Annotations | Negative Annotations |
|---|---|---|
| Shortness of Breath | 93 | 94 |
| Edema | 115 | 35 |
| Syncope | 96 | 92 |
| Cholecystitis | 78 | 36 |
| Gastrointestinal Gas | 18 | 14 |
| Colitis | 12 | 11 |
| Cellulitis | 8 | 2 |
| Fasciitis | 7 | 3 |
![Page 21: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/21.jpg)
21
Annotation Performance

| Algorithm | Positive Precision | Positive Recall | Positive F1 | Negative Precision | Negative Recall | Negative F1 |
|---|---|---|---|---|---|---|
| Our | 0.66 | 0.87 | 0.75 | 0.73 | 0.73 | 0.73 |
| MCS | 0.50 | 0.93 | 0.65 | 0.31 | 0.76 | 0.44 |
| SVM | 0.73 | 0.82 | 0.77 | 0.66 | 0.67 | 0.67 |
| SVM+MCS | 0.73 | 0.82 | 0.77 | 0.66 | 0.66 | 0.66 |
| SVM+Our | 0.77 | 0.85 | 0.81 | 0.72 | 0.75 | 0.73 |

SVM+MCS and SVM+Our: adding the similarity value as a feature for the supervised algorithm.
• Baselines
  • MCS algorithm (Mihalcea 2006)
  • SVM (trained on n-grams)
• Our algorithm outperforms the selected baselines in the negative category.
• SVM is able to leverage the supervision to beat our algorithm in the positive category.
![Page 22: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/22.jpg)
22
Similarity as a Feature to Supervised Algorithm
• Added the similarity value of the unsupervised algorithms as a feature to the SVM.
[Figure: two charts, Positive Annotations and Negative Annotations]
![Page 23: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/23.jpg)
23
Annotation Performance – A Study with the Confidence
• Each annotation has a confidence score ranging from 1 to 5
• Low confidence reflects incomplete or ambiguous information
• Annotation performance increases as the confidence increases
• The negative class shows significant increment
![Page 24: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/24.jpg)
24
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
![Page 25: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/25.jpg)
25
Tweets with Implicit Entities
• Use diverse characteristics of the entity
  – “New Sandra Bullock astronaut lost in space movie looks absolutely terrifying”
  – “ISRO sends probe to Mars for less money than it takes Hollywood to send a woman to space.”
  – “oh yeah there is that new space movie coming out that looks terrifying i am going to go see it”
• Use time-sensitive phrases
[Figure: timeline marked Fall 2013, April 2014, and Fall 2015, with the movies Furious 7, Gravity, and The Martian and the phrases “space movie”, “fastest movie to earn $1 billion”, and “Paul Walker's last movie”]
Credit: http://bit.ly/2bkePJ6
![Page 26: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/26.jpg)
26
Tweets with Implicit Entities
• Use diverse characteristics of the entity
  – “… Richard Linklater movie …”
  – “… Ellar Coltrane on his 12-year movie …”
  – “… 12-year long movie shoot …”
  – “… Mason Evan's childhood movie …”
• Use time-sensitive phrases
[Figure: timeline marked Fall 2013, April 2014, and Fall 2015, with the movies Furious 7, Gravity, and The Martian and the phrases “space movie”, “fastest movie to earn $1 billion”, and “Paul Walker's last movie”]
Credit: http://bit.ly/2bk8xdp
![Page 27: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/27.jpg)
27
Knowledge Acquisition
• Acquiring factual knowledge
  • Source – DBpedia
  • Not all factual knowledge is important – a movie has ‘starring’ and ‘director’ as well as ‘billed’ and ‘license’
  • Rank the relationships based on joint probability with the entity type
  • Values of top-k relationships and the value of rdfs:comment are obtained
• Acquiring contextual knowledge
  • Source – contemporary tweets
  • We collect 1000 tweets with explicit mentions of the entity
• Number of views for the entity’s Wikipedia page within last t days
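One way to read "rank the relationships based on joint probability with the entity type" is sketched below. The estimator (count of triples whose subject has the given type and whose predicate is the relationship, divided by all triples), the function name, and the toy triples are assumptions for illustration; the slides do not give the exact computation.

```python
from collections import Counter


def rank_relationships(triples, entity_type, types_by_entity, k):
    """Return the top-k (predicate, estimated joint probability) pairs."""
    total = 0
    joint = Counter()
    for subject, predicate, _obj in triples:
        total += 1
        if entity_type in types_by_entity.get(subject, ()):
            joint[predicate] += 1
    # Sort by descending count, then by name for a deterministic order.
    ranked = sorted(joint, key=lambda p: (-joint[p], p))
    return [(p, joint[p] / total) for p in ranked[:k]]


# Toy triples in the spirit of DBpedia (made up for this example).
TRIPLES = [
    ("Gravity", "starring", "Sandra Bullock"),
    ("Gravity", "director", "Alfonso Cuarón"),
    ("Gravity", "license", "placeholder"),
    ("The Martian", "starring", "Matt Damon"),
    ("The Martian", "director", "Ridley Scott"),
    ("Some Novel", "author", "Some Author"),
]
TYPES = {"Gravity": {"Film"}, "The Martian": {"Film"}, "Some Novel": {"Book"}}
print(rank_relationships(TRIPLES, "Film", TYPES, k=2))
```

With such a ranking, frequent and type-specific relationships like ‘starring’ and ‘director’ rise to the top, while rare ones like ‘license’ fall outside the top-k.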
![Page 28: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/28.jpg)
28
Knowledge Acquisition
[Figure: pipeline with the stages Factual knowledge, Contemporary tweets, Clean tweets, Generate n-grams, Wikipedia page titles and anchor texts, and Generate semantic cues]
• Need to extract meaningful phrases from acquired knowledge
• Meaningful phrases = Wikipedia titles + anchor texts
  • Matching n-grams are added to semantic cues
  • Non-matching n-grams are added to semantic cues after removing stop words
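The cue-generation rule above can be sketched as follows. The stop-word list, the maximum n-gram length, and the whitespace tokenizer are assumptions for illustration; `meaningful_phrases` stands in for the set of Wikipedia titles and anchor texts.

```python
# Assumed stop-word list for the example.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "for", "on"}


def ngrams(tokens, max_n):
    """All contiguous n-grams of length 1..max_n, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


def semantic_cues(text, meaningful_phrases, max_n=3):
    """Keep matching n-grams as-is; strip stop words from the others."""
    cues = set()
    for gram in ngrams(text.lower().split(), max_n):
        if gram in meaningful_phrases:
            cues.add(gram)
        else:
            kept = [w for w in gram.split() if w not in STOP_WORDS]
            if kept:
                cues.add(" ".join(kept))
    return cues


print(sorted(semantic_cues("woman in space", {"woman in space"})))
```

Matching against known phrases first keeps multi-word cues such as "woman in space" intact instead of reducing them to their content words.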
![Page 29: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/29.jpg)
29
Knowledge Modeling – Entity Model Network
• A property graph reflecting the topical relationships between entities
[Figure: entity model network. Nodes shown: Gravity, Interstellar, The Martian, Sandra Bullock, Alfonso Cuarón, Christopher Nolan, Matt Damon, Mars orbiter mission, Woman in space, astronaut. Legend: Factual Knowledge, Contextual Knowledge, Entity.]

$$\text{specificity}(c_j) = \frac{|N|}{|N_{c_j}|}$$

$$\text{frequency} = \text{total number of times the phrase appears in tweets}$$

$N$ is the total number of entities and $N_{c_j}$ is the number of entities adjacent to $c_j$.
number of Wikipedia views
![Page 30: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/30.jpg)
30
Detecting Tweets with Implicit Entities
• Tweets are filtered with keywords – movie, film, book, novel
• Applied a simple annotation technique – dictionary matching
• Tweets that are not annotated with an entity of the types we are looking for are considered to have implicit entity mentions
[Figure: Keywords and Entity Dictionary feed the Annotating Tweets step]
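The detection rule above can be sketched as a keyword filter followed by a dictionary lookup. The substring-based dictionary match and the function name are simplifying assumptions; a real entity dictionary would need better matching than this.

```python
import re

KEYWORDS = {"movie", "film", "book", "novel"}


def may_have_implicit_mention(tweet, entity_dictionary):
    """True if the tweet has a keyword but no dictionary entity name."""
    text = tweet.lower()
    tokens = set(re.findall(r"[a-z0-9']+", text))
    if not tokens & KEYWORDS:
        return False  # filtered out: not about a movie or a book
    # Dictionary matching: any explicit entity name rules the tweet out.
    return not any(name.lower() in text for name in entity_dictionary)


DICTIONARY = {"Gravity", "The Martian"}
print(may_have_implicit_mention(
    "New Sandra Bullock astronaut lost in space movie looks absolutely terrifying",
    DICTIONARY))
```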
![Page 31: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/31.jpg)
31
Information Extraction – Entity Linking
• Two-step process
• Step 1: Candidate selection and filtering
  • Objective - prune the search space to reduce the number of entities from the EMN to be considered in the disambiguation step
• Step 2: Disambiguation
  • Objective - sort the selected candidate entities to place the implicitly mentioned entity in the top position
![Page 32: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/32.jpg)
32
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: entity model network with entity nodes m1–m8 and cue nodes c1–c9. Legend: Entity, Factual Knowledge, Contextual Knowledge.]
![Page 33: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/33.jpg)
33
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: entity model network with entity nodes m1–m8 and cue nodes c1–c9; the tweet matches cue nodes c2, c5, c7, and c8. Legend: Factual Knowledge, Contextual Knowledge, Entity.]
![Page 34: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/34.jpg)
34
Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”
[Figure: entity model network with entity nodes m1–m8 and cue nodes c1–c9; the matched cue nodes c2, c5, c7, and c8 are shown with their adjacent entity nodes. Legend: Factual Knowledge, Contextual Knowledge, Entity.]
![Page 35: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/35.jpg)
35
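Entity Linking - Candidate selection and filtering
“ISRO sends probe to Mars for less money than it takes Hollywood movie to send a woman to space”

$$\mathit{score}_{m_i} = \sum_{c_j \in \mathbb{C}} \text{specificity}(c_j) \cdot \text{frequency}(c_j, m_i)$$

$\mathbb{C}$ is the set of matching cues.
[Figure: entity model network with entity nodes m1–m8 and cue nodes c1–c9; the matched cue nodes c2, c5, c7, and c8 score their adjacent entities, and the selected candidates are listed as m2, m4, m6, m7, m3. Legend: Factual Knowledge, Contextual Knowledge, Entity.]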
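The candidate score can be sketched as below. It assumes the entity model network is stored as a mapping from each cue to its adjacent entities with frequencies, and that specificity is the ratio of the total number of entities to the number of entities adjacent to the cue. The data layout, names, and toy numbers are all illustrative assumptions.

```python
def specificity(cue, graph, n_entities):
    """Cues adjacent to fewer entities are more specific."""
    return n_entities / len(graph[cue])


def score_candidates(matched_cues, graph, n_entities, top_k):
    """Sum specificity * frequency over matched cues; return the top_k entities."""
    scores = {}
    for cue in matched_cues:
        spec = specificity(cue, graph, n_entities)
        for entity, freq in graph[cue].items():
            scores[entity] = scores.get(entity, 0.0) + spec * freq
    # Highest score first; ties broken by name for a deterministic order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy network: cue -> {adjacent entity: frequency}. Numbers are made up.
GRAPH = {
    "space": {"Gravity": 5, "The Martian": 3, "Interstellar": 4},
    "woman in space": {"Gravity": 2},
    "mars": {"The Martian": 6},
}
print(score_candidates(["space", "woman in space"], GRAPH, n_entities=4, top_k=2))
```

A generic cue such as "space" touches many entities and contributes little per entity, while a rare cue such as "woman in space" dominates the score of the one entity it points to.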
![Page 36: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/36.jpg)
36
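Entity Linking - Disambiguation
• Formulated as a ranking problem
• SVMrank to rank candidates
• Similarity between the candidate entity and the tweet
• Temporal salience of the candidate entity

$$\frac{\text{temporal salience}_{e_i}}{\sum_{e \in E_c} \text{temporal salience}_{e}}$$

$E_c$ is the selected candidate set.
[Figure: feature vector x1, x2, x3, …, xn with feature xj; the candidates are ranked m2, m6, m4, m3, m7, and the top-ranked candidate is marked Winner]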
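The normalized temporal salience feature can be sketched as follows, assuming temporal salience is the Wikipedia page-view count of each candidate. The function name and the view counts are made up for illustration; the ranking model itself (SVMrank) is not reproduced here.

```python
def normalized_temporal_salience(page_views):
    """Divide each candidate's salience by the total over the candidate set."""
    total = sum(page_views.values())
    if total == 0:
        return {entity: 0.0 for entity in page_views}
    return {entity: views / total for entity, views in page_views.items()}


# Hypothetical page-view counts for three candidates.
VIEWS = {"Gravity": 300, "Interstellar": 100, "The Martian": 600}
print(normalized_temporal_salience(VIEWS))
```

Normalizing within the candidate set makes the feature comparable across tweets, since raw view counts differ widely between popular and obscure entities.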
![Page 37: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/37.jpg)
37
Evaluation Dataset
| Entity Type | Annotation | Tweets | Entities |
|---|---|---|---|
| Movie | Explicit | 391 | 107 |
| Movie | Implicit | 207 | 54 |
| Movie | NULL | 117 | 0 |
| Book | Explicit | 200 | 24 |
| Book | Implicit | 190 | 53 |
| Book | NULL | 70 | 0 |

• Tweets were collected in August 2014 using keywords
• Tweets were manually annotated with the DBpedia URLs of entities
• The tweets annotated with NULL have neither an explicit nor an implicit mention of an entity
![Page 38: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/38.jpg)
38
Entity Model Network Creation
• 15,000 tweets for movies and books in July 2014
• 617 movies and 102 books
• Recent 1000 tweets per entity to build its contextual knowledge
• May 2014 version of DBpedia used to extract factual knowledge
• Temporal salience is obtained for July 2014
[Figure: entity model network with entity nodes m1–m7 and cue nodes c1–c9. Legend: Factual Knowledge, Contextual Knowledge, Entity.]
![Page 39: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/39.jpg)
39
Evaluation - Implicit Entity Linking
• How many tweets had the correct entity within the selected candidate set (top-25)?
• How many entities were correctly linked by our disambiguation approach?

| Entity Type | Candidate Selection Recall | Disambiguation Accuracy |
|---|---|---|
| Movie | 90.33% | 60.97% |
| Book | 94.73% | 61.05% |

• Importance of contextual knowledge

| Step | Entity Type | Without Contextual Knowledge | With Contextual Knowledge |
|---|---|---|---|
| Candidate Selection Recall | Movie | 77.29% | 90.33% |
| Candidate Selection Recall | Book | 76.84% | 94.73% |
| Disambiguation Accuracy | Movie | 51.7% | 60.97% |
| Disambiguation Accuracy | Book | 50.0% | 61.05% |
![Page 40: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/40.jpg)
40
Qualitative Error Analysis
| Error | Tweet | Entity |
|---|---|---|
| Lack of contextual knowledge | ‘That Movie Where Shailene Woodley Has Her First Nude Scene? The Trailer Is RIGHT HERE!: No one can say Shailene Woodley isn't brave!’ | White Bird in a Blizzard |
| Novel entities | ‘"hey, what's wrawng widdis goose?" RT @TIME: Mark Wahlberg could be starring in a movie about the BP oil spill http://ti.me/1oZh55V’ | Deepwater Horizon |
| Cold start of entities | ‘Video: George R.R. Martin's Children's Book Gets Re-release http://bit.ly/1qNNH5r’ | The Ice Dragon |
| Multiple implicit entity mentions | ‘That moment when you realize that hazel grace and Augustus are brother and sister in one movie and in love battling cancer in another’ | Divergent, The Fault in Our Stars |
![Page 41: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/41.jpg)
41
Dissertation Focus
Implicit Information Extraction
Entities Relationships
Organized Text Unorganized Text
Clinical Narratives Tweets
Disorders Symptoms Movies Books
Clinical Narratives
Disorders and Symptoms
![Page 42: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/42.jpg)
42
Implicit Relationships in Clinical Narratives
[Figure: graph linking diseases (atrial fibrillation, hypertension, diabetes), symptoms (chest pain, weight gain, headache), and medications (lisinopril, warfarin, insulin, atenolol) through is_treated_with and has_symptom relationships]
![Page 43: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/43.jpg)
43
Implicit Relationships in Clinical Narratives
• Implicit relationships:
  • Exist between symptoms, disorders, medications, and procedures
  • Can be established by leveraging domain knowledge
• The existing knowledge bases fall short in eliciting relationships
• Data + Knowledge can help to elicit such implicit relationships efficiently
![Page 44: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/44.jpg)
44
A Scenario
Disorders: Atrial fibrillation, Hypertension, Diabetes
Symptoms: Fatigue, Syncope, Weight loss, Chest pain, Discomfort in chest, Dizzy, Shortness of Breath, Nausea, Vomiting, Headache, Cough, Weight gain
![Page 45: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/45.jpg)
45
A Scenario
Disorders: Atrial fibrillation, Hypertension, Diabetes
Symptoms: Fatigue, Syncope, Weight loss, Chest pain, Discomfort in chest, Dizzy, Shortness of Breath, Nausea, Vomiting, Headache, Cough, Weight gain
Observed Disorders: Atrial fibrillation, Hypertension, Diabetes
Observed Symptoms: Chest pain, Weight gain, Discomfort in chest, Cough, Headache, Edema, Shortness of Breath
The knowledge base does not know about edema, so edema could be a symptom of any disorder in the document.
![Page 46: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/46.jpg)
46
Knowledge Acquisition
• Hierarchical knowledge and non-hierarchical knowledge

Hierarchical Knowledge: retrieved from UMLS

MRHIER

| CUI | AUI | PAUI | PTR |
|---|---|---|---|
| C0013404 | A0052186 | A0111363 | A0434168.A2367943. … |
| C0013604 | A0052723 | A0135504 | A0434168.A2367943 |

MRCONSO

| CUI | AUI | SAB | STR |
|---|---|---|---|
| C0013404 | A0052186 | MSH | Shortness of breath |
| C0013604 | A0052723 | MSH | Edema |

Non-hierarchical Knowledge: extracted from Web resources + feedback from domain expert
• www.nlm.nih.gov
• www.en.wikipedia.org
• www.webmd.com
• www.mayoclinic.com
• www.clevelandclinic.org
• ww.healthline.org
![Page 47: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/47.jpg)
47
Knowledge Modeling
[Figure: ontology of disorders and symptoms. Class nodes (classes of disorders and classes of symptoms): Hypertension, Diastolic Hypertension, Pulmonary Hypertension, Renal Hypertension, Episodic Pulmonary Hypertension, Solitary Pulmonary Hypertension, Breathing Problems, Shortness of Breath, Asthma. Instance nodes (instances of symptoms and instances of disorders): Shortness of Breath, Hypertension. Edge labels: is_symptom_of, rdfs:subClassOf, rdf:type.]
![Page 48: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/48.jpg)
48
Detecting Unexplained Symptoms
• Clinical documents were semantically annotated for entities using cTAKES
• Known relationships are populated
• Unexplained symptoms were detected
[Figure: Modeled Knowledge]
Credit: http://bit.ly/2aMWVAd
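Detecting unexplained symptoms can be sketched as a set operation: a symptom is unexplained when none of the disorders it is known to indicate appears in the same document. The flat `symptom_of` mapping, the function name, and the toy knowledge are assumptions for illustration; the sketch ignores the class hierarchy of the modeled knowledge.

```python
def unexplained_symptoms(doc_disorders, doc_symptoms, symptom_of):
    """Return the symptoms that no disorder in the document explains.

    symptom_of maps a symptom to the set of disorders it is known to indicate.
    """
    return {
        symptom for symptom in doc_symptoms
        if not (symptom_of.get(symptom, set()) & doc_disorders)
    }


# Toy knowledge base, made up for this example.
KNOWN = {
    "chest pain": {"atrial fibrillation", "hypertension"},
    "headache": {"hypertension"},
}
print(unexplained_symptoms(
    {"atrial fibrillation", "hypertension", "diabetes"},
    {"chest pain", "headache", "edema"},
    KNOWN,
))
```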
![Page 49: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/49.jpg)
49
Information Extraction – Unknown Relationships
• A naïve method would assume a relationship between the unexplained symptom and all disorders in the clinical narrative
• Can we leverage the knowledge we have about the symptom to find the most plausible disorders?
• Intuition: a symptom is most likely to be shared by similar disorders
![Page 50: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/50.jpg)
50
Information Extraction – Unknown Relationships
1. All co-occurring disorders are candidates
[Figure: unexplained symptom S with candidate disorders D1–D5]
![Page 51: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/51.jpg)
51
Information Extraction – Unknown Relationships
2. Find known disorders of the symptom
[Figure: symptom S with candidate disorders D1–D5 and its known disorders D6 and D7]
![Page 52: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/52.jpg)
52
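Information Extraction – Unknown Relationships
3. Collect more knowledge about known relationships
[Figure: symptom S with candidate disorders D1–D5 and known disorders D6 and D7; further disorder nodes D7, D8, D2, D10, D11, D12, D4, and D14 are attached to the known disorders]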
![Page 53: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/53.jpg)
53
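Information Extraction – Unknown Relationships
4. Compare co-occurring disorders with collected knowledge
[Figure: symptom S with candidate disorders D1–D5 and known disorders D6 and D7; further disorder nodes D7, D8, D2, D10, D11, D12, D4, and D14 are attached to the known disorders]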
![Page 54: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/54.jpg)
54
Information Extraction – Unknown Relationships
5. Eliminate non-matching candidate disorders
[Figure: symptom S linked to D2 and D4]
We are left with the most plausible disorders for the unexplained symptom. If this scenario occurs frequently, confidence in the relationship increases.
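The five steps above can be sketched as a set intersection. The `related` mapping stands in for the "collected knowledge" about each known disorder (for example, similar disorders); what exactly is collected is not spelled out on the slides, so that mapping, the function name, and the node labels reused from the figures are assumptions.

```python
def plausible_disorders(symptom, cooccurring, symptom_of, related):
    """Keep co-occurring disorders that match knowledge about known disorders."""
    known = symptom_of.get(symptom, set())       # step 2
    collected = set()
    for disorder in known:                       # step 3
        collected |= related.get(disorder, set())
    return cooccurring & collected               # steps 4 and 5


COOCCURRING = {"D1", "D2", "D3", "D4", "D5"}     # step 1: all are candidates
SYMPTOM_OF = {"S": {"D6", "D7"}}
RELATED = {
    "D6": {"D7", "D8", "D2", "D10"},
    "D7": {"D11", "D12", "D4", "D14"},
}
print(sorted(plausible_disorders("S", COOCCURRING, SYMPTOM_OF, RELATED)))
```

Counting how often the same symptom and disorder pair survives this pruning across documents would give the frequency signal mentioned above.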
![Page 55: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/55.jpg)
55
Evaluation
• A corpus of 1,500 electronic medical records was used
• The records were annotated with cTAKES, and the most frequent entities were selected
• UMLS semantic types were used to categorize disorders and symptoms
• Initial knowledge base - 86 disorders, 42 symptoms, 255 disorder-symptom relationships
![Page 56: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/56.jpg)
56
Evaluation – Relationship Prediction
• There were 29 distinct unexplained symptoms
• Precision of the questions generated
  • 1st iteration - 105 correct from 142 (73.94%)
  • 2nd iteration - 20 correct from 29 (68.96%)
  • 3rd iteration - 4 correct from 9 (44.44%)

Top 5 unexplained symptoms

| Symptom | Number of unexplained instances |
|---|---|
| Edema | 910 |
| Syncope | 336 |
| Systolic Murmur | 168 |
| Tachycardia | 143 |
| Angina | 136 |

Top 5 co-occurring disorders with edema

| Disorder | Number of co-occurrences |
|---|---|
| Hypertension | 647 |
| Hyperlipidemia | 641 |
| Claudication | 454 |
| Coronary atherosclerosis | 395 |
| Coronary artery disease | 242 |
![Page 57: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/57.jpg)
57
Evaluation – Increment in Explainability
| Knowledge base | Number of unexplained relationships | Increment in explainability |
|---|---|---|
| Initial knowledge base | 2251 | 0% |
| After 1st iteration | 878 | 60.99% |
| After 2nd iteration | 806 | 64.19% |
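The increments in the table appear to be the relative reduction in unexplained relationships with respect to the initial knowledge base. That reading is inferred from the numbers rather than stated on the slide, and the function name is illustrative; a short sketch reproduces the table up to rounding.

```python
def explainability_increment(initial_unexplained, remaining_unexplained):
    """Percentage of initially unexplained relationships now explained."""
    explained = initial_unexplained - remaining_unexplained
    return 100.0 * explained / initial_unexplained


for label, remaining in [("After 1st iteration", 878), ("After 2nd iteration", 806)]:
    print(label, round(explainability_increment(2251, remaining), 2))
```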
![Page 58: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/58.jpg)
58
Summary
• Implicit information occurs frequently in text, and ignoring it would adversely affect downstream applications.
• Linguistic and world knowledge play an important role in decoding implicit information.
• This dissertation demonstrated the characteristics of implicit information and developed solutions to capture factual implicit constructs.
[Figure: framework stages Knowledge Acquisition, Knowledge Modeling, Detecting Implicit Information, and Information Extraction, annotated with the labels UMLS; Taxonomical, Definitional; Non-taxonomical, Associational; Representative terms; Domain Semantics; Semi-supervised; Supervised; Unsupervised]
![Page 59: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/59.jpg)
59
Contributions
• Identified and demonstrated the value of implicit information.
• Studied the characteristics of the implicit information manifestation.
• Demonstrated the value of knowledge in extracting factual implicit information.
  • Linguistic
  • Domain
  • Contextual
• Developed a framework for factual implicit information extraction.
• Demonstrated the usage of the framework to solve three implicit information extraction problems.
![Page 60: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/60.jpg)
60
Graduate [email protected]
Journal Publications:
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, Semantics Driven Approach for Knowledge Acquisition from EMRs, IEEE Journal of Biomedical and Health Informatics.
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu Chen and Amit Sheth. 'I just wanted to tell you that loperamide WILL WORK': A Web-Based Study of Extra-Medical Use of Loperamide.
Conference Publications:
• Sujan Perera, Pablo Mendes, Adarsh Alex, Amit Sheth, Krishnaprasad Thirunarayan, Implicit Entity Linking in Tweets, ESWC 2016
• Sujan Perera, Pablo Mendes, Amit Sheth, Krishnaprasad Thirunarayan, Adarsh Alex, Christopher Heid, Greg Mott, Implicit Entity Recognition in Clinical Documents, *SEM 2015
• Sujan Perera, Cory Henson, Krishnaprasad Thirunarayan, Amit Sheth, Suhas Nair, Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in the Healthcare, BIBM 2012
• Menasha Thilakaratne, Ruvan Weerasinghe, Sujan Perera, Knowledge-driven Approach to Predict Personality Traits by Leveraging Social Media Data, WI 2016
Workshop and Posters:
• Sujan Perera, Amit Sheth, Krishnaprasad Thirunarayan, Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help, DARE 2013
• Raminta Daniulaityte, Robert Carlson, Russel Falck, Delroy Cameron, Sujan Perera, Lu Chen, Amit Sheth. A Web-Based Study of Self-Treatment of Opioid Withdrawal Symptoms with Loperamide, CPDD 2012
Internships:
• ezDI Summer 2012
• IBM Watson Summer 2014 and 2015
Awards and grants:
• George Thomas Graduate Fellowship
• NSF travel grants: BIBM and ICHI
PC Committee:
• DARE (2013), EKAW (2014, 2016), ISWC 2015, IJCAI 2016
External Reviewer:
• ISWC, ESWC, IJSWIS, IEEE Intelligent Systems, Applied Ontology, ODBASE
Proposal Contributions:
• eDrugTrends (NIH R01)
• Healthcare Outcome Prediction (NSF-SCH)
Mentoring:
• Adarsh Alex (MSc)
• Menasha Thilakaratne (BSc)
![Page 61: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/61.jpg)
61
Thank You
Mentors Collaborators
![Page 62: Knowledge-driven Implicit Information Extraction](https://reader035.vdocuments.mx/reader035/viewer/2022062900/58e5db281a28ab1d608b5ee3/html5/thumbnails/62.jpg)
62
Coffee Mates and Colleagues
Thank You
Funding
• ezDI
• George Thomas Fellowship
• NSF: CNS 1513721 Context-Aware Harassment Detection on Social Media