
1

CERATOPS: Center for Extraction and Summarization of Events and Opinions in Text

Janyce Wiebe, U. Pittsburgh
Claire Cardie, Cornell U.
Ellen Riloff, U. Utah

2

Outline

• Sample of research results since March 2008
  – Coreference resolution
  – Topics and discourse
    • Topic identification for fine-grained opinion analysis
    • Discourse-level opinion graphs for polarity classification
• Resources and systems
  – Upcoming releases
  – Jodange system for extracting and aggregating opinions from on-line content
• Interactions
• Research plans for the remainder of the grant

3

NP Coreference Resolution

Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty.

4

NP Coreference Resolution

Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given.

5

Coreference and Opinion Holders

Stoyanov & Cardie [2006]

Australian press has launched a bitter attack on Italy after seeing their beloved Socceroos eliminated on a controversial late penalty. Italian coach Lippi has also been blasted for his comments after the game. In the opposite camp Lippi is preparing his side for the upcoming game with Ukraine. He hailed 10-man Italy's determination to beat Australia and said the penalty was rightly given.

6

Reconcile

• NP coreference resolution toolbox/testbed
  – Public domain
• Joint effort of Cornell, LLNL, and the Univ. of Utah
• Algorithms for
  – Training NP coref classifiers
  – Generating standard feature vectors
  – Testing/training on standard data sets (MUC, ACE)
  – Evaluating w.r.t. multiple metrics (MUC, B3, CEAF, etc.; a B3 sketch follows below)
• Trained versions of state-of-the-art systems
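Of the evaluation metrics above, B3 is simple to state: each mention is scored by the overlap between the cluster it belongs to in the gold standard (key) and in the system output (response). A minimal Python sketch, assuming both clusterings cover the same set of mentions:

    # B-cubed (B3) precision/recall/F1 for coreference evaluation.
    # key_clusters / response_clusters: lists of sets of mention ids;
    # assumes both clusterings cover the same mentions.
    def b_cubed(key_clusters, response_clusters):
        def cluster_of(clusters):
            return {m: c for c in clusters for m in c}

        key_of = cluster_of(key_clusters)
        resp_of = cluster_of(response_clusters)
        mentions = list(key_of)

        # per-mention overlap between key and response clusters
        p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions) / len(mentions)
        r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions) / len(mentions)
        return p, r, 2 * p * r / (p + r)

    # gold: {a,b,c},{d}; system: {a,b},{c,d}
    print(b_cubed([{"a", "b", "c"}, {"d"}], [{"a", "b"}, {"c", "d"}]))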

7

Outline

• Sample of research results since March 2008
  – Coreference resolution
  – Topics and discourse
    • Topic identification for fine-grained opinion analysis
    • Discourse-level opinion graphs for polarity classification
• Resources and systems
  – Upcoming releases
  – Jodange system for extracting and aggregating opinions from on-line content
• Interactions
• Research plans for the remainder of the grant

8

Topics in Fine-grained Opinion Analysis

• Opinion expression
• Polarity
  – positive
  – negative
  – neutral
• Strength/intensity
  – low .. extreme
• Source (opinion holder)
• Target (topic)

“The Australian Press launched a bitter attack on Italy”

Opinion Frame
  Polarity: negative
  Intensity: high
  Source: “The Australian Press”
  Target: “Italy”
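As a data structure, the opinion frame above is just a small record; here is a minimal sketch in Python, with field names chosen for illustration rather than taken from the MPQA annotation scheme:

    from dataclasses import dataclass

    # Illustrative container for a fine-grained opinion frame; the field
    # names are ours, not the MPQA scheme's exact attribute names.
    @dataclass
    class OpinionFrame:
        expression: str   # the opinion expression span
        polarity: str     # positive | negative | neutral
        intensity: str    # low .. extreme
        source: str       # opinion holder
        target: str       # topic/target span

    frame = OpinionFrame(
        expression="launched a bitter attack",
        polarity="negative",
        intensity="high",
        source="The Australian Press",
        target="Italy",
    )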

9

Opinion Summaries

[Diagram: opinion summary graph linking sources (Australian Press) to targets (Italy, Marcello Lippi, the penalty, the Socceroos)]

10

Definitions

• Topic - the real-world object, event or abstract entity that is the subject of the opinion as intended by the opinion holder

• Topic span - the closest, minimal span of text that mentions the topic

• Target span - the span of text that covers the syntactic surface form comprising the contents of the opinion

Stoyanov & Cardie [LREC 2008, Coling 2008]

11

Definitions

(1) [OH John] likes Marseille for its weather and cultural diversity.

(1) [OH John] likes [TARGET+TOPIC SPAN Marseille] for its weather and cultural diversity.

12

Definitions

(2) [OH Al] thinks that the government should tax gas more in order to curb CO2 emissions.

(2) [OH Al] thinks that [TARGET SPAN the government should tax gas more in order to curb CO2 emissions].

(2) [OH Al] thinks that [TARGET SPAN [TOPIC SPAN? the government] should [TOPIC SPAN? tax gas] more in order to [TOPIC SPAN? curb [TOPIC SPAN? CO2 emissions]]].

(3) Although he doesn’t like government-imposed taxes, he thinks that a fuel tax is the only effective solution.

13

Issues in Opinion Topic Identification

• Multiple potential topics
• Opinion topics are not always explicitly mentioned

(3) [OH John] believes the violation of Palestinian human rights is one of the main factors.

Topic: ISRAELI-PALESTINIAN CONFLICT

(4) [OH I] disagree entirely!

14

A Coreference Approach

• Hypothesize that the notion of topic coreference will facilitate identification of opinion topics

• Easier than specifying the topic of each opinion in isolation

Two opinions are topic-coreferent if they share the same opinion topic.

15

A Coreference Approach to Opinion Topic Identification

• Find the clusters of topic-coreferent opinions
  – the critical step for topic identification
• Label the clusters with the name of the topic
  – we currently ignore this step
  – address in future work through frequency analysis of the terms in each of the clusters

16

The Topic Coreference Algorithm

• Adapt a standard machine learning-based approach to NP coreference resolution (Soon et al., 2001; Ng & Cardie, 2002):
  1. identify topic spans
  2. perform pairwise classification of the associated opinions and topic spans to determine whether or not they are topic-coreferent
  3. cluster the opinions according to the results of (2) (a sketch follows below)
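A minimal sketch of steps 2 and 3, with the trained pairwise classifier and the feature extractor left as placeholders; single-link clustering via union-find stands in for whatever clustering step the system actually uses:

    from itertools import combinations

    def cluster_opinions(opinions, classifier, features):
        """opinions: list of (opinion, topic_span) pairs.
        classifier.predict(x) -> True if a pair is topic-coreferent
        (placeholder for the trained pairwise model)."""
        parent = list(range(len(opinions)))  # union-find forest

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]  # path halving
                i = parent[i]
            return i

        # step 2: pairwise topic-coreference decisions
        for i, j in combinations(range(len(opinions)), 2):
            if classifier.predict(features(opinions[i], opinions[j])):
                parent[find(i)] = find(j)  # step 3: single-link merge

        clusters = {}
        for i in range(len(opinions)):
            clusters.setdefault(find(i), []).append(opinions[i])
        return list(clusters.values())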

17

Results

• Annotated a subset of the MPQA corpus for topic-coreferent opinions
  – Acceptable inter-annotator agreement
• 10-fold cross-validation
• Baselines:

                         B3           CEAF
  all-in-one            .37   -.10    .30
  1 opinion per cluster .29    .22    .27
  same paragraph        .55    .31    .50
  Choi 2000             .54    .37    .54

18

Results

                               B3           CEAF
  sentence spans              .57    .40    .54
  automatic (heuristics)      .57    .41    .54
  modified manual topic spans .64    .51    .61
  manual topic spans          .71    .66    .62

  Baselines:
  all-in-one                  .37   -.10    .30
  1 opinion per cluster       .29    .22    .27
  same paragraph              .55    .31    .50
  Choi 2000                   .54    .37    .54


20

Outline

• Sample of research results since March 2008
  – Coreference resolution
  – Topics and discourse
    • Topic identification for fine-grained opinion analysis
    • Discourse-level opinion graphs for polarity classification
• Resources and systems
  – Upcoming releases
  – Jodange system for extracting and aggregating opinions from on-line content
• Interactions
• Research plans for the remainder of the grant

21

Discourse level opinion graphs for polarity classification

• Somasundaran et al. Discourse level Opinion Graphs for Polarity Classification. Submitted.

• Somasundaran et al. (2008). Discourse Level Opinion Interpretation. COLING-2008.

• Somasundaran et al. (2008). Discourse Level Opinion Relations: An Annotation Study. SIGdial Workshop on Discourse and Dialogue-2008.

22

Motivation

• The interpretation of an opinion often depends on the other opinions in the discourse

23

Motivation

• I like the LCD feature
• We must implement the LCD
• I think the LCD is hot

Interdependent interpretation of opinions in the discourse

25

Motivation

• Shapes should be curved, so round shapes. Nothing square-like.
• ... So we shouldn’t have too square corners and that kind of thing.

Opinion labels: Arguing Negative, Arguing Positive, Arguing Negative, Sentiment Negative

What were the opinions regarding the curved shape? Will the curved shape be accepted?

Alongside the direct opinion about the curved shape, there are opinions towards the mutually exclusive option (the alternative): the opinions towards the square shapes reveal more about the speaker’s opinion regarding the curved shape, i.e. more opinion information.

32

Opinion Frames

• This blue remote is cool. [Sentiment Pos]
• What’s more, the rubbery material is ergonomic. [Sentiment Pos]
• We must really go for the blue remote. [Arguing Pos]
• I feel the red remote is a better choice. [Sentiment Pos]
• The blue remote will be too expensive. [Sentiment Neg]

Target relations: the opinions on the blue remote (and its rubbery material) share the same target; the red remote is an alternative target, a mutually exclusive option.

An opinion frame combines two opinions with the relation between their targets, e.g. SPSPsame (Sentiment Pos / Sentiment Pos, same target) and SPSNalt (Sentiment Pos / Sentiment Neg, alternative targets).

37

Annotation tool


42

Data

• AMI meeting corpus (Carletta et al., 2005)
  – Rich in opinions
  – Rich interplay between opinion types, targets, and polarities
  – Same and alternative target relations are prominent
  – Targets are relatively concrete

43

The Project Roadmap

Accomplished
– Developed a multi-stage annotation scheme
– Tested human reliability in recognizing opinion frames
– Showed that frame presence between two opinion-bearing sentences can be automatically detected
– Showed that interdependent interpretation improves automatic detection of opinion polarity

To Do
– Fully automate effective discourse-level opinion interpretation
– Text data
  • Do the opinion frames behave the same way? What knowledge sources can we use for our task in text?

44

Discourse Level Opinion Graphs (DLOGs)

• Pairwise relations from opinion frames are used to build Discourse Level Opinion Graphs (DLOGs), as sketched below.
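A minimal sketch of the graph assembly, using networkx purely for illustration; the node and edge attribute names are our own:

    import networkx as nx

    def build_dlog(opinions, frame_relations):
        """opinions: (id, polarity) pairs, polarity may be None if unknown.
        frame_relations: (id1, id2, target_rel, frame_rel) tuples, e.g.
        ("o1", "o2", "same", "reinforcing")."""
        g = nx.Graph()
        for oid, polarity in opinions:
            g.add_node(oid, polarity=polarity)
        for u, v, target_rel, frame_rel in frame_relations:
            g.add_edge(u, v, target=target_rel, frame=frame_rel)
        return g

    dlog = build_dlog(
        [("o1", "pos"), ("o2", "pos"), ("o3", None)],
        [("o1", "o2", "same", "reinforcing"),
         ("o2", "o3", "same", "reinforcing")],
    )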

45

[Graph figures] Legend: alternative target relation, same target relation, reinforcing frame relation, non-reinforcing frame relation (each drawn with a distinct edge style).

47

Improving Polarity Classification using DLOG

• Computationally model discourse level relations between opinions

• Leverage these relations to improve polarity classification

48

Improving Polarity Classification using DLOG

• Initial step:
  – Individual classifiers for recognizing opinions, polarities, and the 4 types of links
• Iterative steps:
  – Relational classifiers that use information from the neighboring nodes
    • the classes assigned to neighbors thus far (see the sketch below)
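A minimal sketch of this two-stage scheme as iterative collective classification over a DLOG built as above; the local and relational classifiers are placeholders, and the fixed iteration cap is our assumption:

    def classify_polarities(dlog, local_clf, relational_clf, n_iters=10):
        # initial step: independent per-node predictions
        labels = {n: local_clf.predict(dlog.nodes[n]) for n in dlog.nodes}

        # iterative steps: re-predict from neighbors' current labels
        for _ in range(n_iters):
            new_labels = {}
            for n in dlog.nodes:
                neighbor_info = [(labels[m], dlog.edges[n, m])
                                 for m in dlog.neighbors(n)]
                new_labels[n] = relational_clf.predict(dlog.nodes[n], neighbor_info)
            if new_labels == labels:  # converged
                break
            labels = new_labels
        return labels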

49

Illustration

• I like the LCD feature
• We must implement the LCD
• I think the LCD is hot (polarity initially unknown: ?)

In the DLOG, the three opinions are connected by same-target links, and the links are reinforcing; the known polarities of the first two opinions can then be propagated to resolve the third.

57

Findings

• Link prediction is hard due to data skewness; the fully automatic system does not significantly improve polarity classification

58

Findings

• Can discourse-level opinion graphs be exploited to improve the performance of opinion polarity classification?

• Yes!

• 8-16 percentage point improvement in F-measure when both the graph structure and the neighbors’ polarities are known

• 3-5 percentage point improvement in F-measure when only the graph structure is known

59

Upcoming releases

• Reconcile NP coreference system
• OpinionFinder version 2
  – Improved performance on opinion and polarity recognition
• MPQA opinion annotated corpus version 3
  – Attitude and target span annotations added
  – Opinion topic coreference annotations added
• New lexicon representation and toolkit
  – Uniform representation for different types of subjectivity clues
  – Attributes, such as polarity, intensity
  – Toolkit for adding entries to the dictionary and finding instances of them in a corpus

60

Jodange

• Demo: www.jodange.com
• Commercialization of methods to identify and aggregate opinions and their attributes

Opinion Frame
  Expression: “blasted”
  Polarity: negative
  Source: “The Australian Press”
  Topic: “Italy”

NP Coref Resolution
  Source: “The Australian Press” = “their” = “the reporters”

61

Interactions

• UACs
  – Riloff and Hovy have continued their collaboration on Information Extraction
  – Riloff was on sabbatical at ISI through the summer
• Other
  – Riloff and Cardie are engaged in coreference resolution work with researchers at LLNL
  – Wiebe is in the planning stages of work with researchers at LLNL on a news search project
  – Riloff is collaborating with the Veterinary Information Network on research in pet health surveillance
  – Wiebe is collaborating with researchers in the U. Pittsburgh Medical School on an NIH syndromic surveillance project
  – Cardie is collaborating with researchers in the Cornell U. Information Science program on a project on the detection of deception in on-line communication

62

Research Plans

Information extraction for pet health surveillance
– collaborating with the Veterinary Information Network to study message board discussions between veterinary professionals
– goal is information extraction of entities and relations associated with pet health events, to discover:
  • associations between symptoms and diseases
  • spikes or trends in diseases (e.g., kidney failure cases during the pet food recall)
  • unusual or unexplained clusters of symptoms or diseases

63

Research Plans

Coreference resolution
– Developing methods to incorporate knowledge from the web as corroborating evidence for coreference resolution.
  • Ex: Orrin Hatch and Ralph Becker are possible antecedents for “the mayor” … issue web queries to decide! (see the sketch after this list)
– Experimenting with methods to automatically generate training data for domain-specific coreference resolution.
– Developing a state-of-the-art coreference resolver called Reconcile that will be made publicly available.
  • testbed for research and analysis
  • easily configurable to use different subcomponents
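A minimal sketch of the web-corroboration idea; web_hit_count is a hypothetical stand-in for whichever search API is used, and the normalized co-occurrence score is one plausible choice, not necessarily the project's:

    def web_hit_count(query: str) -> int:
        # hypothetical: plug in a real search API here
        raise NotImplementedError

    def pick_antecedent(candidates, role_noun):
        """candidates: e.g. ["Orrin Hatch", "Ralph Becker"];
        role_noun: e.g. "mayor"."""
        def score(name):
            # how often the candidate co-occurs with the role noun,
            # normalized by the candidate's overall frequency
            joint = web_hit_count(f'"{role_noun} {name}"')
            alone = web_hit_count(f'"{name}"')
            return joint / alone if alone else 0.0
        return max(candidates, key=score)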

64

Research Plans

Selective information extraction with relevant regions
– developing weakly supervised learners to create relevant region classifiers
– learning several different types of clues with varying levels of reliability
  • primary indicators are reliable enough to apply everywhere
  • secondary indicators are only applied within relevant regions of text

Domains: terrorist events & disease outbreak reports

65

Architecture for Selective IE

[Architecture diagram] Unstructured text passes through a relevant sentence classifier; the resulting relevant sentences go to the IE system, which applies learned IE patterns and clues to produce extractions. On the training side, seed IE patterns and relevant/irrelevant documents feed a pattern & clue learner (relevant region training), which supplies the IE patterns and clues.
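A minimal sketch of the extraction-time control flow implied by the diagram; the relevance classifier and the pattern matchers are placeholders for the learned components:

    def selective_extract(sentences, relevance_clf, primary, secondary):
        """primary / secondary: lists of functions mapping a sentence
        to a list of extractions."""
        extractions = []
        for sent in sentences:
            # primary indicators are reliable enough to apply everywhere
            for pattern in primary:
                extractions.extend(pattern(sent))
            # secondary indicators fire only within relevant regions
            if relevance_clf.predict(sent):
                for pattern in secondary:
                    extractions.extend(pattern(sent))
        return extractions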

66

Research Plans

• Complete the releases of the MPQA corpus, OpinionFinder, & lexicon and toolkit

67

Discourse level opinion interpretation

• Move from task-oriented meetings to blogs and other texts
• Challenges
  – Other types of relations between targets are more dominant than in task-oriented meetings
  – Targets are often more abstract
• Opportunities
  – Richer lexically
  – Can exploit parse trees
  – Explicit discourse cues are more common
  – Sentences are longer and more self-contained

68

Lexicon

• Explore different uses of words, to zero in on the subjective ones

• Example: benefit

69

Lexicon

• Example: benefit
• Very often objective as a verb: Children with ADHD benefited from a 15-week course of fish oil

70

Lexicon

• Noun uses look more promising: The innovative economic program has shown benefits to humanity

71

Lexicon

• However, there are objective noun uses too:
  …tax benefits.
  …employee benefits.
  …tax benefits to provide a stable economy.
  …health benefits to cut costs.

72

Lexicon

• Pattern: benefits as the head of a noun phrase containing a prepositional phrase
• Matches this:
  The innovative economic program has shown proven benefits to humanity
• But none of these:
  …tax benefits.
  …employee benefits.
  …tax benefits to provide a stable economy.
  …health benefits to cut costs.
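A rough approximation of this pattern with spaCy's dependency parse (our choice of tooling, not necessarily the project's): keep benefit as a noun only when it has a prepositional-phrase child:

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def subjective_benefit_mentions(text):
        doc = nlp(text)
        hits = []
        for tok in doc:
            # noun "benefit(s)" heading a phrase that contains a PP
            if tok.lemma_ == "benefit" and tok.pos_ == "NOUN":
                if any(child.dep_ == "prep" for child in tok.children):
                    hits.append(tok.sent.text)
        return hits

    print(subjective_benefit_mentions(
        "The innovative economic program has shown proven benefits to humanity."))
    print(subjective_benefit_mentions("He receives employee benefits."))  # []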

73

Entry representation: be soft on crime

<item index="1">
  <itemMorphoSyntax>
    <lemma>be</lemma>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>2</distance>
    <landmark>2</landmark>
  </itemRelation>
</item>
<item index="2">
  <itemMorphoSyntax>
    <word>soft</word>
    <majorClass>J</majorClass>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>1</distance>
    <landmark>3</landmark>
  </itemRelation>
</item>
<item index="3">
  <itemMorphoSyntax>
    <word>on</word>
  </itemMorphoSyntax>
  <itemRelation xsi:type="ngramPattern">
    <distance>1</distance>
    <landmark>4</landmark>
  </itemRelation>
</item>
<item index="4">
  <itemMorphoSyntax>
    <word>crime</word>
    <majorClass>N</majorClass>
  </itemMorphoSyntax>
</item>

74

Attributive information

<entryAttributes origin="j"><name>be soft on crime</name><subjective>true</subjective><reliability>h</reliability><confidence>h</confidence><subType>sen</subType><example>The Obama campaign rejected the notion that the senator

might be vulnerable to accusations that he is soft on crime.</example><morphosyn>vp</morphosyn><target>s</sp_target><polarity>n</polarity><intensity>m</intensity><confidence>h</confidence><regex>1:[morph:[lemma="be"] order:[distance="2" landmark="2"]] 2:

[morph:[word="soft" majorClass="J"] order:[distance="1" landmark="3"]] 3:[morph:[word="on"] order:[distance="1" landmark="4"]] 4:[morph:[word="crime" majorClass="N"]]</regex>

<patterntype>ngramPattern</patterntype>

75

Research Plans

• Learn subjective uses from corpora (bodies of texts)
• Capture longer subjective constructions
• Incorporate word senses
  – OntoNotes (Hovy)
• Add relevant knowledge about expressions
• Exploit the richer knowledge resource to improve opinion extraction
