identifying relevant temporal expressions for real-world events

Identifying Relevant Temporal Expressions for Real-World Events. Nattiya Kanhabua 1, Sara Romano 2, Avaré Stewart 1. 1 L3S Research Center, Leibniz Universität Hannover, Germany. 2 Dipartimento di Informatica e Sistemistica, University Federico II, Naples, Italy.

Upload: nattiya-kanhabua

Post on 16-Aug-2015


TRANSCRIPT

Page 1: Identifying Relevant Temporal Expressions for Real-world Events

Identifying Relevant Temporal Expressions for Real-World Events

Nattiya Kanhabua1, Sara Romano2, Avaré Stewart1

1 L3S Research Center, Leibniz Universität Hannover, Germany

2 Dipartimento di Informatica e Sistemistica, University Federico II, Naples, Italy

Page 2: Identifying Relevant Temporal Expressions for Real-world Events

Motivation

• Numerous works have shown the potential of using Twitter to infer the existence and magnitude of real-world events in real time
– Earthquakes [Sakaki et al., 2010]
– Influenza epidemics [Culotta, 2010; Lampos et al., 2011; Paul et al., 2011]

• In the medical domain, there has been a surge in detecting health-related tweets for early warning
– Allows a rapid response from authorities

Page 3: Identifying Relevant Temporal Expressions for Real-world Events

Health-related tweets

• User status updates or news related to public health are common on Twitter
– I have the mumps...am I alone?
– my baby girl has a Gastroenteritis so great!! Please do not give it to meee
– #Cholera breaks out in #Dadaab refugee camp in #Kenya http://t.co/....
– As many as 16 people have been found infected with Anthrax in Shahjadpur upazila of the Sirajganj district in Bangladesh.

Page 4: Identifying Relevant Temporal Expressions for Real-world Events
Page 5: Identifying Relevant Temporal Expressions for Real-world Events

Extracting outbreak events

• Support a comparative, temporal analysis between Twitter and official sources [Kanhabua et al., 2012]
– World Health Organization1
– ProMED-mail2

1 http://www.who.int
2 http://www.promedmail.org/

Page 6: Identifying Relevant Temporal Expressions for Real-world Events

Problem statement

• How to extract real-world events from unstructured text documents?
– Previous work finds interesting times for an event, but does not determine the relevance of each time

• How to determine the relevance of temporal expressions for extracted events?
– Not all temporal expressions associated with an event are equally relevant

Page 7: Identifying Relevant Temporal Expressions for Real-world Events

Related work

• Extract temporal expressions from unstructured text using time and event recognition algorithms
– [Verhagen et al., 2005; Strötgen et al., 2012]

• Harvest temporal knowledge from semi-structured contents like Wikipedia infoboxes
– [Hoffart et al., 2012]

Page 8: Identifying Relevant Temporal Expressions for Real-world Events

Contributions

• An approach to extracting real-world events automatically from unstructured texts

• A machine learning approach to identifying relevant temporal expressions
– Three classes of features for learning relevance

• Experiments on real-world data and 3,500 manually judged relevance pairs

Page 9: Identifying Relevant Temporal Expressions for Real-world Events

System architecture

• Extract events in a pipeline fashion

• Annotated documents
– named entities (diseases, victims and locations)
– temporal expressions
– a set of sentences

• Event e: (v, m, l, te)
– who (victim v) was infected
– what (disease m) causes it
– where (location l)
– when (time te)

[Figure: pipeline diagram. An unstructured text collection passes through Text Annotation (Sentence Extraction, Tokenization, Part-of-speech Tagging, Temporal Expression Extraction, Named Entity Recognition) to produce Annotated Documents; further stages are Event Extraction, Identifying Relevant Time and Event Aggregation, yielding Event Profiles for user browsing/retrieving.]
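The event tuple e = (v, m, l, te) can be pictured as a small record type. The following is only a sketch: the class name, the field names, the optional victim and time fields, and the use of a normalized string such as "2012-07" for the time are assumptions made for illustration, not the authors' data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    """Illustrative record for an event e = (v, m, l, te)."""
    victim: Optional[str]   # v: who was infected (assumed optional)
    disease: str            # m: the medical condition
    location: str           # l: where the event takes place
    time: Optional[str]     # te: when, as a normalized string (assumed format)


# Toy instance loosely based on the Ebola example in this deck.
example = Event(victim=None, disease="Ebola", location="Uganda", time="2012-07")
print(example)
```

Making the record immutable (`frozen=True`) is a design choice of this sketch; it lets events be used as dictionary keys when aggregating them into profiles.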

Page 10: Identifying Relevant Temporal Expressions for Real-world Events

Two time aspects

1. Publication time
2. Content or event time

Page 11: Identifying Relevant Temporal Expressions for Real-world Events

Two time aspects

content time

publication time

Page 12: Identifying Relevant Temporal Expressions for Real-world Events

Event extraction

• An event is a sentence containing two entities
– (1) medical condition and (2) geographic expression
– A minimum requirement by domain experts

• A victim and the time of an event can be identified from the sentence itself, or its surrounding context

• Output: a set of event candidates

Reported by World Health Organization (WHO) on 29 July 2012 about an ongoing Ebola outbreak in Uganda since the beginning of July 2012
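The minimum requirement above (a sentence must mention both a medical condition and a geographic expression) can be sketched as a simple filter. This is a toy illustration only: the `DISEASES` and `LOCATIONS` word lists and both function names are hypothetical stand-ins for the named entity recognition step, and real entity matching would be far less naive.

```python
import re

# Hypothetical toy gazetteers standing in for a named entity recogniser.
DISEASES = {"ebola", "cholera", "anthrax", "mumps"}
LOCATIONS = {"uganda", "kenya", "bangladesh"}


def find_terms(sentence, vocabulary):
    """Return the lower-cased alphabetic tokens of `sentence` found in `vocabulary`."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t in vocabulary]


def extract_event_candidates(sentences):
    """Keep only sentences mentioning at least one disease and one location."""
    candidates = []
    for s in sentences:
        diseases = find_terms(s, DISEASES)
        locations = find_terms(s, LOCATIONS)
        if diseases and locations:
            candidates.append(
                {"sentence": s, "diseases": diseases, "locations": locations}
            )
    return candidates


sentences = [
    "An ongoing Ebola outbreak in Uganda since the beginning of July 2012.",
    "I have the mumps...am I alone?",
]
candidates = extract_event_candidates(sentences)
print(len(candidates))
```

Only the first sentence qualifies here, because the second mentions a condition but no location.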

Page 13: Identifying Relevant Temporal Expressions for Real-world Events

Identifying relevant time

• The task of identifying relevant time is regarded as a classification problem
– Two classes: (1) relevant and (2) irrelevant

• Definition: a temporal expression is relevant if it refers to the starting, ending or ongoing time of the event

• Learn relevance using three classes of features
– Sentence-based features
– Document-based features
– Corpus-specific features

Page 14: Identifying Relevant Temporal Expressions for Real-world Events

Features

• Sentence-based
– senLen, senPos, isContext, cntEntityInS, cntTExpInS, cntTPointInS, cntTPeriodInS, entityPos, entityPosDist, TExpPos, TExpPosDist, timeDist, entityTExpPosDist

• Document-based
– cntEntityInD, cntEntitySen, cntTExpInD, cntTPointInD, cntTPeriodInD

• Domain-specific
– isNeg, isHistory
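The slides list feature names without defining them, so the following sketch guesses at plausible meanings for a handful of them. All definitions here are assumptions: senLen as the token count, senPos as the sentence index in the document, cntEntityInS and cntTExpInS as counts of already-annotated entities and temporal expressions, and isNeg as the presence of a negation cue from a hypothetical word list.

```python
# Hypothetical negation cue list; the actual cues are not given in the slides.
NEGATION_CUES = {"no", "not", "never", "without"}


def sentence_features(sentences, index, entities, texps):
    """Compute a few assumed sentence-level features for sentences[index].

    `entities` and `texps` are the entity and temporal-expression strings
    already annotated in that sentence by upstream tools.
    """
    tokens = sentences[index].lower().split()
    return {
        "senLen": len(tokens),           # assumed: number of whitespace tokens
        "senPos": index,                 # assumed: position of the sentence
        "cntEntityInS": len(entities),   # assumed: entities in the sentence
        "cntTExpInS": len(texps),        # assumed: temporal expressions in it
        "isNeg": any(t.strip(".,;!?") in NEGATION_CUES for t in tokens),
    }


doc = [
    "No new cases were reported in Uganda on 29 July 2012.",
    "The outbreak began in early July.",
]
features = sentence_features(doc, 0, ["Uganda"], ["29 July 2012"])
print(features)
```

Each (event, temporal expression) pair would get one such feature dictionary, which is then passed to the classifier together with its relevant/irrelevant label.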

Page 15: Identifying Relevant Temporal Expressions for Real-world Events

Experiments

• Settings
– Official outbreak reports posted during the year 2011
– The number of documents and sentences
  • ProMED-mail: 2,977 documents and 95,465 sentences
  • WHO: 59 documents and 761 sentences
– Series of NLP tools including
  • OpenNLP (tokenization, sentence splitting, POS tagging)
  • OpenCalais (named entity recognition)
  • HeidelTime (temporal expression extraction)
– Our dataset: 25 infectious diseases (medical conditions) manually selected by medical professionals

Page 16: Identifying Relevant Temporal Expressions for Real-world Events

Results

• Baseline: majority class with accuracy of 0.58

• Decision tree (J48) performs best among the classification algorithms compared

• Sentence-based features improved the accuracy of the baseline significantly

• senLen and entityPosDist perform best (accuracy = 0.65)

• The combination of different features gained high accuracy
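For reference, a majority-class baseline always predicts the most frequent label, so its accuracy equals that label's share of the data. The sketch below shows the computation on toy labels; the 58/42 split is invented to mirror the reported 0.58, and which class was actually the majority is not stated in the slides.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    _, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Toy labels only; the split is an assumption chosen for illustration.
toy_labels = ["relevant"] * 58 + ["irrelevant"] * 42
baseline = majority_baseline_accuracy(toy_labels)
print(round(baseline, 2))
```

Any learned classifier, such as the decision tree mentioned above, has to exceed this number to show that its features carry information.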

Page 17: Identifying Relevant Temporal Expressions for Real-world Events

Summary

• An approach to extracting real-world events automatically from unstructured texts

• A machine learning approach to identifying the relevant temporal expressions
– Three classes of features for learning relevance

• Experiments on real-world data and 3,500 manually judged relevance pairs

Page 18: Identifying Relevant Temporal Expressions for Real-world Events

References

• [Culotta, 2010] A. Culotta. Towards detecting influenza epidemics by analyzing twitter messages. In Proceedings of the First Workshop on Social Media Analytics (SOMA’2010), 2010.

• [Hoffart et al., 2012] J. Hoffart, F. Suchanek, K. Berberich, and G. Weikum. Yago2: A spatially and temporally enhanced knowledge base from wikipedia. Artificial Intelligence Journal, Special Issue on Wikipedia and Semi-Structured Resources, 2012.

• [Kanhabua et al., 2012] N. Kanhabua, S. Romano, A. Stewart, and W. Nejdl. Supporting Temporal Analytics for Health Related Events in Microblogs. In Proceedings of CIKM'2012, 2012.

• [Lampos et al., 2011] V. Lampos and N. Cristianini. Nowcasting events from the social web with statistical learning. ACM TIST, 3, 2011.

• [Paul et al., 2011] M. J. Paul and M. Dredze. You are what you tweet: Analyzing twitter for public health. In Proceedings of ICWSM’2011, 2011.

• [Sakaki et al., 2010] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes twitter users: real-time event detection by social sensors. In Proceedings of WWW’2010, 2010.

• [Strötgen et al., 2012] J. Strötgen, O. Alonso, and M. Gertz. Identification of top relevant temporal expressions in documents. In Proceedings of the 2nd Temporal Web Analytics Workshop (TempWeb02), 2012.

• [Strötgen et al., 2010] J. Strötgen and M. Gertz. Heideltime: High quality rule-based extraction and normalization of temporal expressions. In Proceedings of the 5th International Workshop on Semantic Evaluation, 2010.

• [Verhagen et al., 2005] M. Verhagen, I. Mani, R. Sauri, J. Littman, R. Knippen, S. B. Jang, A. Rumshisky, J. Phillips, and J. Pustejovsky. Automating temporal annotation with TARSQI. In Proceedings of ACL’2005, 2005.