identifying constant and unique relations by using …ynaga/papers/takaku-emnlp2012...to plot...
TRANSCRIPT
Identifying Constant and Unique Relations by using Time-Series Text
Yohei Takaku Nobuhiro Kaji Naoki Yoshinaga Masashi ToyodaToyo Keizai Inc. Institute of Industrial Science, University of Tokyo
2009 2010 2011 2012
Apple’s CEO isSteve Jobs
Relation Extraction from the Evolving Web
Apple’s CEO isTim Cook
• Web (text) as a growing goldmine for extracting
relations between real-world entities
[Pantel+ 06; Banko+ 07; Suchanek+ 07; Wu+ 08,10; Zhu+ 09; Mintz+ 09]
– Processing more text leads to more relations, but– Relations in text could be obsolete / will become outdated
obsolete
• <arg1, flows through, arg2>• <arg1, ‘s CEO is, arg2>• <arg1, sells, arg2>
How to Consistently Compile Extracted Relations?
• <arg1, flows through, arg2>• <arg1, ‘s CEO is, arg2>• <arg1, sells, arg2>
2009 2010 2011 2012
Moselle river flows through Germany
Moselle river flows through France
How to Consistently Compile Extracted Relations?
Accumulate all the relations, because the relation does not evolve over time
• <arg1, flows through, arg2>• <arg1, ‘s CEO is, arg2>• <arg1, sells, arg2>
2009 2010 2011 2012
Apple’s CEO isSteve Jobs
Apple sellsMacBook Air
Apple’s CEO isTim Cook
Apple sellsiPhone 4S
How to Consistently Compile Extracted Relations?
overwritt
en
Overwrite with new one, because the relation can take one value of arg2
Constant and Unique Relations
<arg1, sells, arg2><arg1, flows through, arg2>
<arg1, ‘s father of, arg2> <arg1, belongs to, arg2>
<arg1, was born in, arg2> <arg1, ’s CEO is, arg2>
constant
non-constant
non-constant
• Given value of arg1,– Constant rel.: value of arg2 is independent of time– Unique rel.: value of arg2 is one at any point in time
, non-unique , non-unique
, uniqueconstant, unique
Overview
• Constancy and Uniqueness of Relations• Our Approach• Features for Constancy Classification• Features for Uniqueness Classification• Experiments• Conclusion
Two Binary Classification Tasks
<arg1, ’s father is, arg2>
<arg1, ’s CEO is, arg2>
<arg1, sells, arg2><arg1, borders on, arg2>
<arg1, was born in, arg2>
<arg1, belongs to, arg2>
Two Binary Classification TasksTask 1: constancy classification
constant non-constant
<arg1, ’s father is, arg2>
<arg1, ’s CEO is, arg2>
<arg1, sells, arg2><arg1, borders on, arg2>
<arg1, was born in, arg2>
<arg1, belongs to, arg2>
Two Binary Classification TasksTask 2: uniquness classification
unique
non-unique
<arg1, ’s father is, arg2>
<arg1, ’s CEO is, arg2>
<arg1, sells, arg2><arg1, borders on, arg2>
<arg1, was born in, arg2>
<arg1, belongs to, arg2>
Two Kinds of Features forTraining Supervised Classifiers
• Frequency obtained from time-series text– Detailed later
– Based on blog posts crawled from 2006 to 2011
• Linguistic cuese.g., <arg1, ’s president is, arg2>non-const.
e.g., <arg1, borders on, arg2>non-uniq.
Prefix George Bush is ex-president of USA
Coordination France borders on Italy as well as Spain
Overview
• Constancy and Uniqueness of Relations• Our Approach• Features for Constancy Classification• Features for Uniqueness Classification• Experiments• Conclusion
<Keisuke Honda(=arg1), belongs to, arg2>non-const.
Using Time-series Textfor Constancy Classification
2010 – now, CSK Moskow2008 – 2010, VVV-Venlo
<Keisuke Honda(=arg1), belongs to, arg2>non-const.
Using Time-series Textfor Constancy Classification
0
5
10
15
20
25
30
CSKA M
oscow
VVV Venlo VVVIta
ly
Dutch le
ague
Luzh
niki St
adium
Chairm
an Haib
eruden
Chelse
aPA
O0
5
10
15
20
25
30
CSKA M
oscow
VVV Venlo VVVIta
ly
Dutch le
ague
Luzh
niki St
adium
Chairm
an…
Chelse
aPA
O
2008 – 2009 2010 – 2011
arg2 fillers arg2 fillers
freq
uenc
y
freq
uenc
yVVV Venlo CSKA Moscow
Cosine Similarity as a Feature Value
0
5
10
15
20
25
30
CSK
A M
osco
w
VVV V
enl
oVVV
Ital
y
Dut
ch le
ague
Luzh
niki Sta
dium
Cha
irm
an H
aibe
rude
n
Che
lsea
PAO
0
5
10
15
20
25
30
CSK
A…
VVV V
enl
oVVV
Ital
y
Dut
ch…
Luzh
niki…
Cha
irm
an…
Che
lsea
PAO
<Keisuke Honda, belongs to, arg2>non-const.
(19, 0, 0, 0, 0, 0, 0, 0, 1)( 0, 27, 7, 0, 0, 0, 1, 0, 0)
cosine = 0.0
0
5
10
15
20
25
Tam
a rive
r
Sum
ida rive
r
Meg
uro
river
Asa
riv
er
Kann
da riv
er
Edo
rive
r
Saka
i river
Min
amiasa
river
Onda
river
<Tokyo, has river, arg2>const.
0
5
10
15
20
25
Tam
a rive
r
Sum
ida rive
r
Meg
uro
river
Asa
riv
er
Kann
da riv
er
Edo
rive
r
Saka
i river
Min
amiasa
river
Onda
river
( 4, 2, 1, 1, 1, 1, 0, 0, 0)( 4, 0, 1, 3, 1, 21, 3, 1, 1)
cosine = 0.39
Importance of Choosing Time Windows
0
5
10
15
20
25
30
CSKA M
osco
w
VVV Venl
oVVV
Italy
Dut
ch le
ague
Luzh
niki Sta
dium
Chairm
an H
aibe
rude
n
Chelsea
PAO
0
5
10
15
20
25
30
CSKA M
osco
w
VVV Venl
oVVV
Italy
Dut
ch le
ague
Luzh
niki Sta
dium
Chairm
an H
aibe
rude
n
Chelsea
PAO
<Keisuke Honda, belongs to, arg2>non-const.
2009 2010 2011 2012
VVV Venlo CSKA Moscow
Team
changed
X
Importance of Choosing Time Windows
05
1015202530
CSKA M
oscow
VVV Venlo VVVIta
ly
Dutch le
ague
Luzh
niki St
adium
Chairm
an Haib
eruden
Chelse
aPA
O
2009 2010 2011 2012
CSKA Moscow
Team changed
X
05
1015202530
CSKA M
oscow
VVV Venlo VVVIta
ly
Dutch le
ague
Luzh
niki St
adium
Chairm
an Haib
eruden
Chelse
aPA
O
CSKA Moscow
<Keisuke Honda, belongs to, arg2>non-const.
-10
10
30
CSKA…
VVVDut
ch…
Chair…
PAO-10
10
30
CSKA…
VVVDut
ch…
Chair…
PAO
0
20
40
CSKA…
VVVDut
c…
Chai…
PAO
Using Multiple Time Windows
T months
2009 2010 2011 2012time0.0
0.4 0.3
............
Window size T = { 1, 3, 6, 12 (months) }
Integration method = { ave., min., max. }12 (= 4 x 3) features
0.4 + 0.3 + 0.03
= 0.233
Overview
• Constancy and Uniqueness of Relations• Our Approach• Features for Constancy Classification• Features for Uniqueness Classification• Experiments• Conclusion
0
5
10
15
20
25
30
CSKA M
oscow
VVV V
enlo
VVV
Italy
Dutch league
Luzhnik
i Stadiu
m
Chairm
an…
Chelsea
PAO
Using Time-series Text
for Uniqueness Classification
<Keisuke Honda, belongs to, arg2>uniq.
<Google, acquires, arg2>non-uniq.
single entity12 entities
arg2 fillers arg2 fillers
……
0
5
10
15
20
25
30
YouTube
Gro
upon
Slide
Dem
and
SayNow
Tw
itte
r
DSP
FeedBurner
W
idevin
e
Fflick
Jaik
u
PushLife
number of entity types
0
5
10
15
20
25
30
CSKA M
osco
w
VVV Venl
oVVV
Italy
Dut
ch le
ague
Luzh
niki Sta
dium
Chairm
an…
Chelsea
PAO
Using Time-series Text
for Uniqueness Classification (Cont.)
<Keisuke Honda, belongs to, arg2>uniq. <Google, acquires, arg2>non-uniq.
values of arg2values of arg2data can be noisy
0
5
10
15
20
25
30
YouT
ube
Gro
upon
Slid
e
Dem
and
SayN
ow
Twitt
erDSP
Feed
Burne
r
Wid
evine
Fflic
k
Jaik
u
Push
Life
19
1
97
1/19 ≒ 0.05 7/9 = 0.77
Frequency ratio between the 1st and 2nd most frequent entities
Setting Appropriate Time Windowsis also Important
<Keisuke Honda, belongs to, arg2>uniq.
2009 2010 2011 2012
Team changed
X
CSKA MoscowVVV Venlo
05
1015202530
CSKA M
osco
w
VVV Venlo VVVIta
ly
Dutch
leag
ue
Luzh
niki S
tadiu
m
Chairm
an H
aiberu
den
Chelse
aPA
O
Overview
• Constancy and Uniqueness of Relations• Our Approach• Features for Constancy Classification• Features for Uniqueness Classification• Experiments• Conclusion
Experiments
• Evaluate our method with relations extracted from time-series text– Classifier: Passive-aggressive algorithm w/ proposed features
– Time-series text: 6-year’s worth of Japanese blog posts (2.3-billion sentences)
• Conduct experiments to:– Evaluate the constancy classification
– Evaluate the uniqueness classification
– Investigate the impact of multiple window sizes
Data and Settings• Parse time-series text to extract dependency paths
connecting two named entities as relation instances
• Annotate 1000 relations (majority vote, 3 humans)
– Major reason for disagreement: type ambiguityEx. <arg1human, is seen in, arg2>non-const.
<arg1mountain, is seen in, arg2>const.
• Use the labeled relations for training & testing– Evaluation metric: Precision & Recall (5-fold cross validation)
Constancy UniquenessKappa [Fleiss 1971] 0.346 (fair) 0.428 (moderate)
• Varying the threshold to classifier’s output (margin) to plot recall-precision curve – Baseline: cosine similarity between distributions over arg2
in the first and last month
Classification Result: Constancy
Recovered most of the constant relations while keeping precision
Suffered from data sparsenessBaseline
Proposed
Classification Result: Uniqueness
• Varying the threshold to classifier’s output (margin) to plot recall-precision curve – Baseline: re-implementation of [Lin+ 10]
(based on gross distributions over arg2)
Baseline
Proposed Time-series distributions over arg2 greatly improved the precision
Confused non-constant, unique rels. with non-unique rels.
Impact of using Multiple Time Windows, T
• Compare our method (multiple time windows) with methods using a single time window
– Combining features computed from multiple time windows greatly improved the precision of uniqueness classification
Constancy Uniqueness
ex. <Japan, ‘s capital city is, {Tokyo, Kyoto}>uniq. (outdated fact)
<Sun, is bought out by, {Oracle, IBM}>const. (speculation)
Error Type Const. Uniq. Total
Paraphrases 16 52 68
Topical bias 0 56 56
Short/long-term evolution 20 27 47
Obsolete/Speculative facts 10 19 29ex. <Real Madrid, beats, Valencia>uniq. Apr. 7th, 2012
<Real Madrid, beats, Barcelona>uniq. Apr. 21st, 2012
ex. <Will Smith, ’s son is, {Jaden Smith/Trey Smith}>non-uniq.
co-starred in a film
ex. <Obama, is president of, {USA, the United States}>uniq.
• Investigate 200 misclassified relations
Error Analysis
Related Work
• TempEval temporal relation identification[Verhagen+ 2007, 2010]
– Associate event (relation instance) with timeex. <Charles Chaplin, was born in, London> OVERLAP 1889
– Do not address constancy of relation
• Functional relation identification [Ritter+ 10]
– Use distributions over arg2 to identify uniqueness [Lin+ 10]
– Time dimension should be considered to identify uniquenessex. <Naoki Yoshinaga, lives in, {Kyoto (-’96),
Tokyo (‘96-’05,’08-)}>uniq.
Overview
• Constancy and Uniqueness of Relations• Our Approach• Features for Constancy Classification• Features for Uniqueness Classification• Experiments• Conclusion
2019/11/7 31EMNLP-CoNLL2012
Conclusion
• A novel notion of constancy of relations• A method of identifying constant and unique relations
– Use massive time-series text to induce features – Timer-series distributions over arg2 computed with multiple
time windows were quite effective
• Future work
– Apply our method to relations with typed arguments [Lin+ 10]
Ex. <arg1mountain, is seen in, arg2>const.
– Use constancy/uniqueness to compile extracted relations