Processing Semantic Relations Across Textual Genres
Bryan Rink
University of Texas at Dallas
December 13, 2013
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Motivation
• We think about our world in terms of:
  – Concepts (e.g., bank, afternoon, decision, nose)
  – Relations (e.g., IS-A, PART-WHOLE, CAUSE-EFFECT)
• Powerful mental constructions for:
  – Representing knowledge about the world
  – Reasoning over that knowledge:
    • From PART-WHOLE(brain, Human) and IS-A(Socrates, Human) we can reason that PART-WHOLE(brain, Socrates), as sketched below.
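As a toy illustration (not from the slides), the inference step above can be expressed as a single rule over a small Python knowledge base rather than an RDF/OWL reasoner:

```python
# A minimal sketch of the PART-WHOLE inference, using plain Python tuples
# as a toy knowledge base (an RDF/OWL reasoner would do this at scale).
kb = {
    ("PART-WHOLE", "brain", "Human"),   # the brain is part of a Human
    ("IS-A", "Socrates", "Human"),      # Socrates is a Human
}

def infer_part_whole(kb):
    """If PART-WHOLE(x, C) and IS-A(i, C), conclude PART-WHOLE(x, i)."""
    inferred = set()
    for rel1, part, whole_class in kb:
        if rel1 != "PART-WHOLE":
            continue
        for rel2, instance, cls in kb:
            if rel2 == "IS-A" and cls == whole_class:
                inferred.add(("PART-WHOLE", part, instance))
    return inferred

print(infer_part_whole(kb))  # {('PART-WHOLE', 'brain', 'Socrates')}
```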
Representation and Reasoning
• Large general knowledge bases exist:
  – WordNet, Wikipedia/DBpedia/Yago, ConceptNet, OpenCyc
• Some domain-specific knowledge bases exist:
  – Biomedical (UMLS)
  – Music (MusicBrainz)
  – Books (RAMEAU)
• All of these are available in the standard RDF/OWL data model
• Powerful reasoners exist for making inferences over data stored in RDF/OWL
• Knowledge acquisition remains the most time-consuming and difficult of these steps
Relation Extraction from Text
• Relations between concepts are encoded explicitly or implicitly in many textual resources:
  – Encyclopedias, news articles, emails, medical records, academic articles, web pages
• For example:
  – "The report found Firestone made mistakes in the production of the tires."
    PRODUCT-PRODUCER(tires, Firestone)
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Supervised Relation Identification
• SemEval-2010 Task 8 – "Multi-Way Classification of Semantic Relations Between Pairs of Nominals"
  – Given a sentence and two marked nominals
  – Determine the semantic relation and the directionality of that relation between the nominals
• Example: A small piece of rock landed into the trunk
• This contains an ENTITY-DESTINATION(piece, trunk) relation:
  – The situation described in the sentence entails that trunk is the destination of piece, in the sense of piece moving (physically or abstractly) toward trunk.
Semantic Relations
Relation              Definition
CAUSE-EFFECT          X causes Y
INSTRUMENT-AGENCY     Y uses X; X is the instrument of Y
PRODUCT-PRODUCER      Y produces X; X is the product of Y
CONTENT-CONTAINER     X is or was stored or carried inside Y
ENTITY-ORIGIN         Y is the origin of an entity X; X comes from or is derived from Y
ENTITY-DESTINATION    X moves toward Y
COMPONENT-WHOLE       X is a component of Y and has a functional relation
MEMBER-COLLECTION     X is a member of Y
MESSAGE-TOPIC         X is a message containing information about Y
OTHER                 none of the nine relations above is suitable
Observations
• Three types of evidence are useful for classifying relations:
  1. Lexical/contextual cues
     – "The seniors poured flour into wax paper and threw the items as projectiles on freshmen during a morning pep rally"
  2. Knowledge of the typical role of one nominal
     – "The rootball was in a crate the size of a refrigerator, and some of the arms were over 12 feet tall."
  3. Knowledge of a pre-existing relation between the nominals
     – "The Ca content in the corn flour has also a strong dependence on the pericarp thickness."
Approach
• Use an SVM classifier to first determine the relation type
  – Each relation type then has its own SVM classifier to determine the direction of the relation (see the sketch below)
• All SVMs share the same set of 45 feature types, which fall into the following 8 categories:
  – Lexical/Contextual
  – Hypernyms from WordNet
  – Dependency parse
  – PropBank parse
  – FrameNet parse
  – Nominalization
  – Nominal similarity derived from Google N-Grams
  – TextRunner predicates
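A minimal sketch of this two-stage setup, using scikit-learn's LinearSVC and DictVectorizer with toy feature dictionaries; the actual system's 45 feature types and training code are not shown in the slides:

```python
# Two-stage classification sketch: one SVM picks the relation type,
# then a per-type SVM picks the direction (toy data, assumed interfaces).
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# toy instances: (feature dict, relation type, direction)
train = [
    ({"between=caused": 1, "posE1": "NN"}, "Cause-Effect", "e1->e2"),
    ({"between=from": 1, "posE1": "NN"},   "Entity-Origin", "e2->e1"),
    ({"between=into": 1, "posE1": "NN"},   "Entity-Destination", "e1->e2"),
    ({"between=caused": 1, "posE2": "NN"}, "Cause-Effect", "e1->e2"),
]

vec = DictVectorizer()
X = vec.fit_transform([f for f, _, _ in train])
type_clf = LinearSVC().fit(X, [t for _, t, _ in train])

# one direction classifier per relation type (real data has both directions per type)
dir_clf = {}
for rel in set(t for _, t, _ in train):
    rows = [i for i, (_, t, _) in enumerate(train) if t == rel]
    dirs = [train[i][2] for i in rows]
    if len(set(dirs)) > 1:
        dir_clf[rel] = LinearSVC().fit(X[rows], dirs)

def classify(features):
    x = vec.transform([features])
    rel = type_clf.predict(x)[0]
    direction = dir_clf[rel].predict(x)[0] if rel in dir_clf else "e1->e2"
    return rel, direction

print(classify({"between=caused": 1, "posE1": "NN"}))
```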
System
Lexical/Contextual Features
• Words between the nominals are very important (see the trigger words below)
• The number of tokens between the nominals is also helpful:
  – Product-Producer and Entity-Origin often have zero: "organ builder", "coconut oil"
• Additional features for:
  – E1/E2 words, E1/E2 part of speech, words before/after the nominals, prefixes of words between
  – Sequence of word classes between the nominals:
    • Verb_Determiner, Preposition_Determiner, Preposition_Adjective_Adjective, etc.
Trigger word    Relation
cause           Cause-Effect
used            Instrument-Agency
makes           Product-Producer
contained       Content-Container
from            Entity-Origin
into            Entity-Destination
on              Component-Whole
of              Member-Collection
about           Message-Topic
Example Feature Values
• Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
• Feature values (reconstructed in the sketch below):
  – e1Word=motion, e2Word=suction
  – e1OrE2Word={motion, suction}
  – between={of, the, vehicle, through, the, air, caused, a}
  – posE1=NN, posE2=NN
  – posE1orE2=NN
  – posBetween=I_D_N_I_D_N_V_D
  – distance=8
  – wordsOutside={Forward, on}
  – prefix5Between={air, cause, a, of, the, vehic, throu, the}
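A small sketch that reproduces several of the feature values above from a pre-tokenized sentence; the function and its names are my own reconstruction, not the system's code:

```python
# Lexical/contextual feature extraction sketch for a sentence with two
# marked nominals, given tokens, POS tags, and the nominal indices.
def lexical_features(tokens, pos, e1_idx, e2_idx):
    between = tokens[e1_idx + 1 : e2_idx]
    return {
        "e1Word": tokens[e1_idx],
        "e2Word": tokens[e2_idx],
        "between": set(w.lower() for w in between),
        "posE1": pos[e1_idx],
        "posE2": pos[e2_idx],
        # first letter of each POS tag between the nominals, e.g. I_D_N_...
        "posBetween": "_".join(p[0] for p in pos[e1_idx + 1 : e2_idx]),
        "distance": len(between),
        "wordsOutside": {tokens[e1_idx - 1] if e1_idx > 0 else None,
                         tokens[e2_idx + 1] if e2_idx + 1 < len(tokens) else None},
        "prefix5Between": set(w[:5].lower() for w in between),
    }

tokens = ["Forward", "motion", "of", "the", "vehicle", "through", "the",
          "air", "caused", "a", "suction", "on", "the", "road", "draft", "tube", "."]
pos = ["RB", "NN", "IN", "DT", "NN", "IN", "DT", "NN", "VBD", "DT",
       "NN", "IN", "DT", "NN", "NN", "NN", "."]
print(lexical_features(tokens, pos, 1, 10))
```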
Parsing Features
• Dependency parse (Stanford parser)
  – Paths of length 1 from each nominal
  – Paths of length 2 between E1 and E2 (see the sketch below)
• PropBank SRL parse (ASSERT)
  – Predicate associated with both nominals
    • Number of tokens in the predicate
    • Hypernyms of the predicate
  – Argument types of the nominals
• FrameNet SRL parse (LTH)
  – Lemmas of frame trigger words, with and without part of speech
• Also make use of VerbNet to generalize verbs from the dependency and PropBank parses
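An illustrative sketch of the dependency-path features, assuming the parse is already available as (head, relation, dependent) triples rather than invoking the Stanford parser directly:

```python
# Collect length-1 dependency paths touching either nominal and length-2
# paths where E1 and E2 share a head (a simplification of the described features).
def dep_path_features(edges, e1, e2):
    """edges: (head, relation, dependent) triples from any dependency parser."""
    feats = set()
    for head, rel, dep in edges:
        # length-1 paths from each nominal
        if e1 in (head, dep) or e2 in (head, dep):
            feats.add(f"{head}-{rel}-{dep}")
        # length-2 path through a shared head: E1 <-rel- head -rel2-> E2
        for head2, rel2, dep2 in edges:
            if head == head2 and dep == e1 and dep2 == e2:
                feats.add(f"<E1> {rel} {head} {rel2} <E2>")
    return feats

edges = [("caused", "nsubj", "motion"), ("caused", "dobj", "suction"),
         ("motion", "nmod", "vehicle")]
print(dep_path_features(edges, "motion", "suction"))
```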
Example Feature Values
• Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
• Dependency– <E1>nsubjcauseddobj<E2>
– <E1>nsubjvn:27dobj<E2>• VerbNet/Levin class 27 is the class of engender verbs such as: cause,
spawn, generate, etc.• This feature value indicates that E1 is the subject of an engender verb,
and the direct object is E2
• PropBank– Hypernyms of the predicate: cause#v#1, create#v#1
Nominal Role Affiliation Features
• Sometimes context is not enough and we must use background knowledge about the nominals
• Consider the nominal: writer
  – Knowing that a writer is a person increases the likelihood that the nominal will act as a Producer or an Agency
• Use WordNet hypernyms for the nominal's sense, determined by SenseLearner
  – Additionally, writer nominalizes the verb write, which is classified by Levin as a "Creation and Transformation" verb
    • Most likely to act as a Producer
• Use NomLex-Plus to determine the verb being nominalized and retrieve its Levin class from VerbNet
Google N-Grams for Nominal Role Affiliation
• Semantically similar nominals should participate in the same roles
  – They should also occur in similar contexts in a large corpus
• Using Google 5-grams, the 1,000 most frequent words appearing in the context of a nominal are collected
• Using Jaccard similarity on those context words, the 4 nearest neighbor nominals are determined and used as a feature (see the sketch below)
  – Also determine the role most frequently associated with those neighbors
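A rough sketch of the Jaccard nearest-neighbor step, with toy context sets standing in for the Google 5-gram statistics:

```python
# Each nominal is represented by a set of frequent context words; neighbors
# are the nominals whose context sets have the highest Jaccard overlap.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def nearest_neighbors(target, contexts, k=4):
    """contexts: nominal -> set of frequent context words."""
    others = [(n, jaccard(contexts[target], c))
              for n, c in contexts.items() if n != target]
    return [n for n, _ in sorted(others, key=lambda x: -x[1])[:k]]

contexts = {
    "legion":       {"of", "the", "foreign", "honor", "roman"},
    "army":         {"of", "the", "roman", "soldiers", "officers"},
    "heroes":       {"of", "the", "war", "unsung"},
    "refrigerator": {"in", "the", "door", "cold"},
}
print(nearest_neighbors("legion", contexts))
```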
Example Values for Google N-Grams Feature
• Sentence 4739: As part of his wicked plan, Pete promotes Mickey and his pals into the [legion]E1 of [musketeers]E2 and assigns them to guard Minnie.
  – MEMBER-COLLECTION(E2, E1)
• E1 nearest neighbors: legion, army, heroes, soldiers, world
  – Most frequent role: COLLECTION
• E2 nearest neighbors: musketeers, admirals, sentries, swordsmen, larks
  – Most frequent role: MEMBER
Pre-existing Relation Features
• Sometimes the context gives few clues about the relation
  – Can use knowledge about a context-independent relation between the nominals
• TextRunner
  – A queryable database of NOUN-VERB-NOUN triples from a large corpus of web text
  – Plug in E1 and E2 as the nouns and query for predicates that occur between them (a toy sketch follows the example below)
Example Feature Values for TextRunner Features
• Sentence: Forward [motion]E1 of the vehicle through the air caused a [suction]E2 on the road draft tube.
• E1 ____ E2 : may result from, to contact, created, moves, applies, causes, fall below, corresponds to which
• E2 ____ E1 : including, are moved under, will cause, according to, are effected by, repeats, can match
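A purely hypothetical sketch of this lookup: TextRunner itself is queried as a service, so a local dictionary of (noun, noun) → predicates stands in for it here:

```python
# Stand-in for a TextRunner-style triple store: map ordered noun pairs to
# the predicates observed between them, and turn matches into features.
triples = {
    ("motion", "suction"): ["may result from", "created", "causes"],
    ("suction", "motion"): ["are effected by", "will cause"],
}

def predicate_features(e1, e2):
    feats = set()
    for pred in triples.get((e1, e2), []):
        feats.add(f"e1_{pred}_e2")
    for pred in triples.get((e2, e1), []):
        feats.add(f"e2_{pred}_e1")
    return feats

print(predicate_features("motion", "suction"))
```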
Results

Relation           Precision  Recall  F1
Cause-Effect 89.63 89.63 89.63
Component-Whole 74.34 81.73 77.86
Content-Container 84.62 85.94 85.27
Entity-Destination 88.22 89.73 88.96
Entity-Origin 83.87 80.62 82.21
Instrument-Agency 71.83 65.38 68.46
Member-Collection 84.30 87.55 85.89
Message-Topic 81.02 85.06 82.99
Product-Producer 82.38 74.89 78.46
Other 52.97 51.10 52.02
Overall 82.25 82.28 82.19
Learning Curve
Figure: F1 vs. training size
Training size    F1
1000             73.08
2000             77.02
4000             79.93
8000             82.19
Ablation Tests
• All 255 (= 2^8 − 1) combinations of the 8 feature sets were evaluated by 10-fold cross validation (a sketch of the loop follows the table)

# of feature sets   Optimal feature sets              F1
1 Lexical 73.8
2 +Hypernym 77.8
3 +FrameNet 78.9
4 +Ngrams 79.7
5 -FrameNet +PropBank +TextRunner 80.5
6 +FrameNet 81.1
7 +Dependency 81.3
8 +NomLex-Plus 81.3
Lexical is the single best feature set, Lexical+Hypernym is the best 2-feature set combination, etc.
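A sketch of the ablation loop over all 2^8 − 1 feature-set combinations; the cross-validation scorer is left abstract, and the dummy scorer below is only for illustration:

```python
# Enumerate every non-empty combination of the 8 feature sets and keep the
# combination with the best cross-validated F1.
from itertools import combinations

FEATURE_SETS = ["Lexical", "Hypernym", "Dependency", "PropBank",
                "FrameNet", "NomLex-Plus", "Ngrams", "TextRunner"]

def ablation(cross_val_f1):
    """cross_val_f1: callable taking a tuple of feature-set names, returning F1."""
    results = []
    for k in range(1, len(FEATURE_SETS) + 1):
        for combo in combinations(FEATURE_SETS, k):   # 2^8 - 1 = 255 combos total
            results.append((cross_val_f1(combo), combo))
    return max(results)   # best (F1, feature sets) overall

# usage with a dummy scorer that simply prefers larger combinations
print(ablation(lambda combo: len(combo) / 10.0))
```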
Other Supervised Tasks
• Causal relations between events – FLAIRS 2010
Causal Relations Between Events
• Discovered graph patterns that were then used as features in a supervised classifier
• Example pattern: – “Under the agreement”, “In the affidavits”, etc.
Detecting Indications of Appendicitis in Radiology Reports
• Submitted to AMIA TBI 2013
Resolving Coreference in Medical Records
• i2b2 2011 and JAMIA 2012
• Approach
  – Based on the Stanford Multi-Pass Sieve method
  – Added supervised learning by introducing features to each pass
  – Showed that a first pass which identifies all mentions of the patient provides a competitive baseline
Extracting Relations Between Concepts in Medical Records
• i2b2 2010 Shared Task and JAMIA 2011
Supervised Relations Conclusion
• Identifying semantic relations requires going beyond contextual and lexical features
• Use the fact that arguments sometimes have a high affinity for one of the semantic roles
• Knowledge of pre-existing relations can aid classification when context is not enough
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Relations in Electronic Medical Records
• Medical records contain natural language narrative with very valuable information
  – Often in the form of a relation between medical treatments, tests, and problems
• Example:
  – … with the [transfusion] and [IV Lasix] she did not go into [flash pulmonary edema]
  – TREATMENT-IMPROVES-PROBLEM relations:
    • (transfusion, flash pulmonary edema)
    • (IV Lasix, flash pulmonary edema)
Relations in Electronic Medical Records
• Additional examples:
  – [Anemia] secondary to [blood loss].
    • A causal relationship between problems
  – On [exam], the patient looks well and lying down flat in her bed with no [acute distress].
    • A relationship between a medical test ("exam") and what it revealed ("acute distress")
    • We consider both positive and negative findings.
Relations in Electronic Medical Records
• Utility
  – Detected relations can aid information retrieval
  – Automated systems which review patient records for unusual circumstances
    • Drugs prescribed despite a previous allergy
    • Tests and treatments never performed despite a recommendation
Relations in Electronic Medical Records
• Unsupervised detection of relations
  – No need for large annotation efforts
  – Easily adaptable to new hospitals, doctors, and medical domains
  – Does not require a pre-defined set of relation types
    • Discover relations actually present in the data, not what the annotator thinks is present
  – Relations can be informed by very large corpora
Unsupervised Relation Discovery
• Assumptions:
  – Relations exist between entities in text
  – Those relations are often triggered by contextual words: trigger words
    • secondary to, improved, revealed, caused
  – Entities in relations belong to a small set of semantic classes
    • anemia, heart failure, edema: problems
    • exam, CT scan, blood pressure: tests
  – Entities near each other in text are more likely to have a relation
Unsupervised Relation Discovery
• Latent Dirichlet Allocation baseline
  – Assume entities have already been identified
  – Form pseudo-documents for every consecutive pair of entities (see the sketch below):
    • Words from the first entity
    • Words between the entities
    • Words from the second entity
• Example:
  – If she has evidence of [neuropathy] then we would consider a [nerve biopsy]
  – Pseudo-document: {neuropathy, then, we, would, consider, a, nerve, biopsy}
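A short sketch of the pseudo-document construction, assuming tokenization and entity spans are already given:

```python
# For every consecutive pair of entities, collect the words of the first
# entity, the words between the entities, and the words of the second entity.
def pseudo_documents(tokens, entity_spans):
    """entity_spans: list of (start, end) token indices, in textual order."""
    docs = []
    for (s1, e1), (s2, e2) in zip(entity_spans, entity_spans[1:]):
        doc = tokens[s1:e1] + tokens[e1:s2] + tokens[s2:e2]
        docs.append([w.lower() for w in doc])
    return docs

tokens = ("If she has evidence of neuropathy then we would consider "
          "a nerve biopsy").split()
spans = [(5, 6), (11, 13)]   # [neuropathy], [nerve biopsy]
print(pseudo_documents(tokens, spans))
```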
Unsupervised Relation Discovery
• These pseudo-documents lead LDA to form clusters such as:
"causal"     "stopwords"   "reveal problem"   "prescription"
to           and           was                (
due          ,             on                 mg
secondary    is            and                )
was          she           ,                  needed
be           had           which              as
,            has           showed             PO
likely       this          he                 PRN
have         are           done               :
found        that          showing            for
thought      after         demonstrated       every
Unsupervised Relation Discovery
• Clusters formed by LDA
  – Some good trigger words
  – Many stop words as well
  – No differentiation between:
    • Words in the first argument
    • Words between the arguments
    • Words in the second argument
• We can do a better job by better modeling the linguistic phenomenon
Relation Discovery Model (RDM)
• Three observable variables:
  – w1: tokens from the first argument
  – wc: context words (between the arguments)
  – w2: tokens from the second argument
• Example:
  – Recent [chest x-ray] shows [resolving right lower lobe pneumonia].
  – w1: {chest, x-ray}
  – wc: {shows}
  – w2: {resolving, right, lower, lobe, pneumonia}
Relation Discovery Model (RDM)
• In the RDM (a schematic sketch follows):
  – A relation type (tr) is generated
  – Context words (wc) are generated from either:
    • A relation-type-specific word distribution (showed, secondary, etc.); or
    • A general word distribution (she, patient, hospital)
  – Relation-type-specific semantic classes for the arguments are generated
    • e.g., a problem-causes-problem relation would be unlikely to generate a test or a treatment class
  – Argument words (w1, w2) are generated from argument-class-specific word distributions
    • "pneumonia", "anemia", "neuropathy" from a problem class
Relation Discovery Model (RDM)
• Graphical model:
Experimental Setup
• Dataset
  – 349 medical records from 4 hospitals
  – Annotated with:
    • Entities: problems, treatments, tests
    • Relations (used to evaluate our unsupervised approach):
      – Treatment-Addresses-Problem
      – Treatment-Causes-Problem
      – Treatment-Improves-Problem
      – Treatment-Worsens-Problem
      – Treatment-Not-Administered-Due-To-Problem
      – Test-Reveals-Problem
      – Test-Conducted-For-Problem
      – Problem-Indicates-Problem
Results
• Trigger word clusters formed by the RDM:

"connected problems"   "test showed"   "prescription"   "prescription 2"
due                    showed          mg               (
consistent             no              p.r.n.           )
not                    revealed        p.o.             Working
likely                 evidence        hours            ICD9
secondary              done            pm               Problem
patient                2007            q                Diagnosis
(                      performed       needed           30
started                demonstrated    day              cont
most                   without         q.               ):
s/p                    normal          4                closed
Results
• Instances of "connected problems"

First Argument          Context               Second Argument
ESRD                    secondary to her      DM
slightly lightheaded    and with increased    HR
Echogenic kidneys       consistent with       renal parenchymal disease
A 40% RCA               , which was           Hazy
Librium                 for                   Alcohol withdrawal

The last example is actually a Treatment-Administered-For-Problem relation.
Results
• Instances of "test showed"

First Argument                           Context                                                               Second Argument
V-P lung scan                            was performed on May 24 2007, showed                                  low probability of PE
A bedside transthoracic echocardiogram   done in the Cardiac Catheterization laboratory without evidence of   an effusion
Exploration of the abdomen               revealed significant                                                  nodularity of the liver
echocardiogram                           showed moderate                                                       dilated left atrium
An MRI of the right leg                  was done which was equivocal for                                      osteomyelitis
Results
• Instances of "prescription"

First Argument               Context                                    Second Argument
Haldol                       0.5-1 milligrams p.o. q.6-8h. p.r.n.       agitation
Plavix                       every day to prevent                       failure of these stents
KBL mouthwash                , 15 cc p.o. q.d. prn                      mouth discomfort
Miconazole nitrate powder    tid prn for                                groin rash
AmBisome                     300 mg IV q.d. for treatment of her        hepatic candidiasis
Results
• Instances of "prescription 2"

First Argument                    Context                                                    Second Argument
MAGNESIUM HYDROXIDE SUSP 30 ML    ) , 30 mL , Susp , By Mouth , At Bedtime , PRN , For       constipation
Depression, major                 ( ICD9 296.00 , Working , Problem ) cont                   NOS home meds
Diabetes mellitus type II         ( ICD9 250.00 , Working , Problem ) cont                   home meds
ASCITES                           ( ICD9 789.6 , Working , Diagnosis ) on                    spironalactone
Dilutional hyponatremia           ( SNMCT **ID-NUM , Working , Diagnosis ) improved with     fluid restriction
Results
• Discovered Argument Classes

"problems"   "treatments/tests"   "tests"
pain         Percocet             CT
disease      Hgb                  scan
right        Hct                  chest
left         Anion                x-ray
renal        Gap                  examination
patient      Vicodin              Chest
artery       RDW                  EKG
-            Bili                 MRI
symptoms     RBC                  culture
mild         Ca                   head
Evaluation
• Two versions of the data:
  – DS1: consecutive pairs of entities which have a manually identified relation between them
  – DS2: all consecutive pairs of entities
• Train/Test sets:
  – Train: 349 records, with 5,264 manually annotated relations
  – Test: 477 records, with 9,069 manually annotated relations
Evaluation
• Evaluation metrics (see the sketch below)
  – NMI: Normalized Mutual Information
    • An information-theoretic measure of how well two clusterings match
  – F measure:
    • Computed from cluster precision and cluster recall
    • Each cluster is paired with the cluster which maximizes the score
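A sketch of both metrics, using scikit-learn for NMI and a simple best-match pairing for the cluster F measure; this is a reconstruction of the described scoring, not the evaluation script used:

```python
# NMI via scikit-learn; cluster F by pairing each gold cluster with the
# predicted cluster that maximizes its F1, weighted by gold cluster size.
from sklearn.metrics import normalized_mutual_info_score

def cluster_f(gold, pred):
    """gold, pred: one cluster label per instance."""
    total = 0.0
    for g in set(gold):
        g_idx = {i for i, x in enumerate(gold) if x == g}
        best = 0.0
        for p in set(pred):
            p_idx = {i for i, x in enumerate(pred) if x == p}
            overlap = len(g_idx & p_idx)
            if overlap:
                prec, rec = overlap / len(p_idx), overlap / len(g_idx)
                best = max(best, 2 * prec * rec / (prec + rec))
        total += best * len(g_idx)
    return total / len(gold)

gold = [0, 0, 1, 1, 2, 2]
pred = [0, 0, 0, 1, 2, 2]
print(normalized_mutual_info_score(gold, pred), cluster_f(gold, pred))
```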
Evaluation
                  DS1            DS2
Method            NMI     F      NMI     F
Train Set
Complete-link     4.2     37.8   N/A     N/A
K-means           8.25    38.0   5.4     38.1
LDA baseline      12.8    23.1   15.6    26.2
RDM               18.2    39.1   18.1    37.4
Test Set
LDA baseline      10.0    26.1   11.5    26.3
RDM               11.8    37.7   14.0    36.4

Results with 9 relation types, 15 general word classes, and 15 argument classes for the RDM.
Unsupervised Relations Conclusion
• Trigger words and argument classes are jointly modeled
• The RDM uses only entities and tokens
• Relations are local to the context, rather than global
• The RDM outperforms several baselines
• Discovered relations match well with manually chosen relations
• Presented at EMNLP 2011
Additional Relation Tasks
• Relational Similarity – SemEval 2012 Task 2
  – Define a relation through prototypes:
    • water:drop, time:moment, pie:slice
  – Decide which is most similar:
    • feet:inches or country:city
• Used a probabilistic approach to detect high precision patterns for the relations
• Pattern precision was then used to rank word pairs occurring with that pattern
Relational Selectional Preferences
• Submitted to IWCS 2013
• Use LDA to induce latent semantic classes
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Proposed work
• Supervised vector representations
  – Initially: word representations
• Most existing approaches create unsupervised word representations
  – Latent Semantic Analysis (Deerwester et al., 1990)
  – Latent Dirichlet Allocation (Blei et al., 2003)
  – Integrated Components Analysis (Scholkopf, 1998)
• More recent approaches allow for supervision
Existing supervised approaches
• HDLR
  – "Structured Metric Learning for High Dimensional Problems"
  – Davis and Dhillon (KDD 2008)
• S2Net
  – "Learning Discriminative Projections for Text Similarity Measures"
  – Yih, Toutanova, Platt, and Meek (CoNLL 2011)
  – Learns lower-dimensional representations of documents
  – Optimizes a cosine similarity metric in the lower-dimensional space for similar-document retrieval
Supervised word representations
• Relational selectional preferences:
  – Classify words according to their admissibility for filling a role of a relation
  – report, article, thesis, poem are admissible for the MESSAGE role of a MESSAGE-TOPIC relation
  – Assume a (possibly very small) training set
Supervised word representations
– Each word is represented by a high-dimensional context vector v over a large corpus
  • e.g., documents the word occurs in, other words it co-occurs with, or grammatical links
– Learn a transformation matrix T which transforms v into a much lower-dimensional vector w
  • subject to an objective which is maximized when words from the target set have high cosine similarity
  • Learning can be performed using L-BFGS optimization on this objective because the cosine similarity function is twice differentiable
– A simplified sketch follows
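A heavily simplified sketch of the idea: learn a projection T so that words from a target role end up close in cosine space, optimized with SciPy's L-BFGS. The loss below is an assumption for illustration, not the proposed objective:

```python
# Learn a low-dimensional projection T of high-dimensional context vectors
# so that a set of target-role words gets high pairwise cosine similarity.
import numpy as np
from scipy.optimize import minimize

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

def loss(T_flat, V, target_pairs, dim):
    T = T_flat.reshape(dim, V.shape[1])
    W = V @ T.T                                   # low-dimensional word vectors
    # negative mean cosine similarity over pairs of target-role words
    return -np.mean([cosine(W[i], W[j]) for i, j in target_pairs])

rng = np.random.default_rng(0)
V = rng.normal(size=(6, 50))                      # 6 words, 50-dim context vectors
target_pairs = [(0, 1), (0, 2), (1, 2)]           # e.g. report, article, thesis
dim = 5
T0 = rng.normal(size=dim * V.shape[1])
res = minimize(loss, T0, args=(V, target_pairs, dim), method="L-BFGS-B")
print(res.fun)
```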
Proposed application
• Supervised word representations can be used for many supervised tasks which use words as features
  – Relation arguments
  – Contextual words
• Not limited to words
  – Arbitrary n-grams
  – Syntactic features
• We believe this approach could be useful for any high-dimensionality (sparse) linguistic features
  – The benefit comes from both a larger corpus and the supervised learning of the representation
Additional evaluations
• ACE 2004/2005 relation data
  – Relations between entities in newswire
    • e.g., MEMBER-OF-GROUP – "an activist for Peace Now"
• BioInfer 2007
  – Relations between biomedical concepts
    • e.g., locations, causality, part-whole, regulation
• SemEval 2013 Task 4 and SemEval 2010 Task 9
  – Paraphrases for noun compounds
  – e.g., "flu virus": "cause", "spread", "give"
Outline
• Introduction
• Supervised relation identification
• Unsupervised relation discovery
• Proposed work
• Conclusions
Conclusions
• State-of-the-art supervised relation extraction methods in both general-domain and medical texts
• Identifying relations in text relies on more than just context
  – Semantic and background knowledge of the arguments
  – Background knowledge about the relations themselves
• An unsupervised relation discovery model
Thank you! Questions?