yuheng hu (@hyheng) arizona state univ. ajita john avaya labs fei wang ibm t.j watson research...

1

Yuheng Hu (@hyheng), Arizona State Univ.
Ajita John, Avaya Labs
Fei Wang, IBM T.J. Watson Research
Subbarao Kambhampati, Arizona State Univ.

ET-LDA: Joint Topic Modeling for Aligning Events and their Twitter Feedback

3

Motivation

Republican Primary Debate, 09/07/2011

Tweets tagged with #ReaganDebate


Which part of the event did a tweet refer to?

What were the topics of the event and tweets?

Applications: event playback/analysis, sentiment analysis, advertisement, etc.

Yuheng

The GOP primary debate on Sept 07, 2011 covered various issues such as job creation, healthcare, and foreign policy. We crawled tweets using the official hashtag posted by NBC News. Given the vast number of tweets, we want to know two things: (1) which part of the event a tweet referred to, and (2) what the topics were. Answering these two questions has useful applications, e.g., event analysis, playback, and situation awareness.

4

Event-Tweet Alignment: The Problem

• Given an event’s transcript S and its associated tweets T
  – Find the segment s (s ∈ S) which is topically referred to by tweet t (t ∈ T) [Could be a general tweet]

• Alignment requires:
  1. Extracting topics in the tweets and event
  2. Segmenting the event into topically coherent chunks
  3. Classifying the tweets: General vs. Specific

Yuheng

Here is a conceptual model of tweets and an event. An event has segments, and tweets can be either general or specific. A general tweet can refer to the whole event; a specific tweet refers to a certain segment. In either case, the reference is based on topics.

5

Event-Tweet Alignment: A Model

6

Event-Tweet Alignment: Challenges

• Both topics and segments are latent

• Tweets are topically influenced by the content of the event. A tweet’s words’ topics can be
  – general (high-level and constant across the entire event), or
  – specific (concrete and related to specific segments of the event)

• General tweet = weakly influenced by the event

• Specific tweet = strongly influenced by the event

• An event is formed by discrete, sequentially ordered segments, each of which discusses a particular set of topics

7

Event-Tweet Alignment: Approaches

• Prior work
  – Event segmentation
    • HMM-based, etc.
  – Topic modeling
    • LDA, PLSI

• Possible solution
  – Apply LDA to the event and tweets separately
  – Measure the closeness by JS-divergence of their topic distributions
  – Problem: an event and its Twitter feeds are modeled largely independently

• Our solution: joint modeling
  – ET-LDA (event-tweets LDA) considers an event and its Twitter feeds jointly and characterizes the topic influences between them in a fully Bayesian model

• Potential advantages
  – Tweets provide a richer context about the topic evolution in the event
  – Can measure the influence of the event on the twitterati
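The separate-LDA baseline listed under "Possible solution" can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes the tweet and every segment have already been mapped to distributions over the same K topics, and the names `js_divergence` and `closest_segment` are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p[i] == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence of two discrete distributions (0 to 1 in bits)."""
    # The midpoint m is positive wherever p or q is positive, so KL is finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def closest_segment(tweet_topics, segment_topics):
    """Index of the segment whose topic distribution is closest to the tweet's."""
    return min(
        range(len(segment_topics)),
        key=lambda i: js_divergence(tweet_topics, segment_topics[i]),
    )
```

Because each topic distribution here is estimated on its own, nothing in this pipeline lets the tweets inform the segmentation or the event inform the tweet topics, which is the independence problem the slide points out.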

Yuheng

Why not model them independently? In reality, an event and its tweets are clearly interdependent. For example, when tweets are generated by the crowd to express interest in the event, their content is influenced in some way by the topics covered in the event. Considering them jointly has two benefits: (1) we can measure the event's influence on the tweets, and (2) from the tweets, we obtain a richer context about the evolution of topics and the topical boundaries in the event, which is critical to event segmentation (i.e., segmenting the event based on the tweets' response). A joint model can therefore do both things together, and the two depend on each other.

Subbarao

You need to say, orally, why independent modeling is not going to be as good and why joint modeling is needed.

8

ET-LDA

9

ET-LDA Model

Event part: determine event segmentation; determine each word’s topic in the event.

Tweets part: determine tweet type; determine which segment a tweet (word) refers to; determine each word’s topic in the tweets.

Yuheng

In the event part, we assume that an event is formed by discrete, sequentially ordered segments, each of which discusses a particular set of topics. To segment the event and model how its topics evolve, we apply a Markov assumption on \theta(s): with some probability it is the same as the topic distribution of the previous paragraph s-1; otherwise, a new topic distribution \theta(s) is sampled from a Dirichlet. This dependency is produced by associating a binary variable c(s) with each paragraph, indicating whether its topic is the same as that of the previous paragraph or different. If the topic remains the same, the paragraphs are merged to form one segment.
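The Markov assumption on \theta(s) described in this note can be illustrated with a small forward simulation. It is only a sketch of the generative story, not the inference procedure: the parameter names `alpha` and `p_new`, the convention that a "new topic" draw starts a new segment, and the function names are all assumptions made for the example.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_event_topics(num_paragraphs, alpha, p_new, seed=0):
    """Simulate theta(s) for each paragraph and the resulting segment ids.

    p_new is the assumed probability that c(s) signals a topic change.
    """
    rng = random.Random(seed)
    thetas, segment_ids = [], []
    for s in range(num_paragraphs):
        # The first paragraph always opens a segment; later ones follow c(s).
        starts_new = s == 0 or rng.random() < p_new
        if starts_new:
            thetas.append(sample_dirichlet(alpha, rng))
            segment_ids.append(segment_ids[-1] + 1 if segment_ids else 0)
        else:
            # Same topics as paragraph s-1, so the paragraphs merge.
            thetas.append(thetas[-1])
            segment_ids.append(segment_ids[-1])
    return thetas, segment_ids
```

Consecutive paragraphs that share a segment id share one topic distribution, which mirrors the merging of paragraphs into a segment described above.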

Yuheng

We assume that a tweet consists of words that can belong to two distinct types of topics: general topics, which are high-level and constant across the entire event, and specific topics, which are detailed and relate to the segments of the event. As a result, the distribution of general topics is fixed for a tweet, whereas the distribution of specific topics keeps varying as the event develops. Each word in a tweet is associated with a distribution of topics. It is sampled either from a mixture of specific topics \theta(s) or from a mixture of general topics \psi(t) over K topics, depending on a binary variable c(t) sampled from a binomial distribution. In the first case, \theta(s) comes from a referring segment s of the event, where s is chosen according to a categorical distribution s(t). An important property of the categorical distribution s(t) is that it allows choosing any segment in the event. This reflects the fact that a person may compose a tweet on topics discussed in a segment that (1) was in the past, (2) is currently occurring, or (3) will occur after the tweet is posted (usually when she expects certain topics to be discussed in the event).
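The per-word choice between specific and general topics can be sketched in the same forward-simulation style. The parameter names (`p_specific` for the probability behind c(t), `segment_probs` for s(t)) and the tuple layout of the result are assumptions for illustration; the actual model also draws the word itself from the chosen topic, which is omitted here.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_tweet_topics(num_words, p_specific, segment_probs, thetas, psi, seed=0):
    """Simulate the topic assignment of each word in one tweet.

    thetas[s] is the topic distribution of segment s; psi is the tweet's
    distribution over general topics.
    """
    rng = random.Random(seed)
    assignments = []
    for _ in range(num_words):
        if rng.random() < p_specific:
            # Specific word: pick any segment (past, current, or future),
            # then draw the topic from that segment's theta(s).
            s = sample_categorical(segment_probs, rng)
            z = sample_categorical(thetas[s], rng)
            assignments.append(("specific", s, z))
        else:
            # General word: draw the topic from the tweet's own psi(t).
            z = sample_categorical(psi, rng)
            assignments.append(("general", None, z))
    return assignments
```

Since `segment_probs` ranges over every segment, a word may point at a segment that has not occurred yet, matching the three cases listed in the note.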

10

ET-LDA Model

For more details of the inference, please refer to our paper: http://bit.ly/MBHjyZ


11

Learning ET-LDA: Gibbs sampling

For more details of the inference, please refer to our paper: http://bit.ly/MBHjyZ

Coupling between a and b makes the posterior computation of the latent variables intractable




12

Experimental Evaluation

Evaluation Plan for ET-LDA
• Performance of topic extraction
• Performance of topic influence prediction
• Performance of event segmentation

Experimental Setup
• Tweets for President Obama’s speech on the Middle East (#MESpeech) & Republican Primary debate in the US (#ReaganDebate)
• Event transcripts from New York Times
• Tweets expanded with search snippets for context

Yuheng

Tweets were collected with the Twitter Streaming API; the hashtags were known in advance thanks to the White House and NBC News. We expand each tweet with search engine snippets to augment its context. Specifically, we treat each tweet as a query to a search engine.

13

Topics Extraction (#MESpeech)

MESpeech: specific topics are sensitive to the event’s context and keep evolving as the event progresses

Yuheng

ET-LDA segments the whole speech into 7 chunks. The leftmost column is the segment id and the rightmost column lists the top words of the segment. We manually labeled the topics (e.g., “Arab Spring”) to reflect our interpretation of their meaning from their top words. As we can see from a reading of the transcript, the specific topics correlate with the event quite well. Also, the specific tweets are sensitive to the event’s content and keep evolving as the event progresses.

14

Examples of segments of (#MESpeech)

• 1st segment: Introduction

Thank you. Thank you. (Applause.) Thank you very much. Thank you. Please, have a seat. Thank you very much. I want to begin by thanking Hillary Clinton, who has traveled so much these last six months that she is approaching a new landmark – one million frequent flyer miles. I count on Hillary every single day, and I believe that she will go down as one of the finest Secretaries of State in our nation's history.

• 2nd segment: Overview of US foreign policy

The State Department is a fitting venue to mark a new chapter in American diplomacy. For six months, we have witnessed an extraordinary change taking place in the Middle East and North Africa. Square by square, town by town, country by country, the people have risen up to demand their basic human rights. Two leaders have stepped aside. More may follow. And though these countries may be a great distance from our shores, we know that our own future is bound to this region by the forces of economics and security, by history and by faith. Today, I want to talk about this change -- the forces that are driving it and how we can respond in a way that advances our values and strengthens our security. Now, already, we've done much to shift our foreign policy following a decade defined by two costly conflicts. After years of war in Iraq, we've removed 100,000 American troops and ended our combat mission there. In Afghanistan, we've broken the Taliban's momentum, …

7 segments

15

Event Segmentation (#MESpeech)

• Participants asked to assess quality of segmentation by ET-LDA and LCSeg (an HMM-based event segmentation tool, trained on a 15-state HMM)
  – Participants: 5 graduate students
  – Method: questionnaire

• ET-LDA performed consistently better than baselines (lower Pk values)

Pk: probability that a random pair of words is incorrectly separated by a segment boundary

Yuheng

P_k value: this measure is the probability that a randomly chosen pair of words from the event will be incorrectly separated by a hypothesized segment boundary. A lower Pk therefore indicates better agreement with the human-annotated segmentation, i.e., better performance. In practice, we first ask four graduate students in our department to annotate the segments of the events based on their transcripts (two for each event) and later ask another graduate student to judge, for one event, which human annotation is better. We pick the better one for each event and treat it as the hypothesized segmentation. Then, we compute the Pk value.
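A windowed form of the P_k measure can be computed as in the sketch below. The slides do not say how pairs are chosen, so this follows the common convention of comparing positions k apart, with k defaulting to half the mean reference segment length; that default, the per-position label encoding, and the function names are assumptions of the example.

```python
def boundaries_to_labels(segment_lengths):
    """Turn segment lengths, e.g. [4, 4], into per-position segment ids."""
    labels = []
    for seg_id, length in enumerate(segment_lengths):
        labels.extend([seg_id] * length)
    return labels


def pk(reference, hypothesis, k=None):
    """Fraction of position pairs (i, i + k) on which the two segmentations
    disagree about whether both positions lie in the same segment."""
    if len(reference) != len(hypothesis):
        raise ValueError("segmentations must cover the same positions")
    n = len(reference)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        k = max(1, round(n / (2 * len(set(reference)))))
    if k >= n:
        raise ValueError("window k must be smaller than the sequence length")
    errors = 0
    for i in range(n - k):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        errors += same_ref != same_hyp
    return errors / (n - k)
```

An identical segmentation scores 0, and the score grows as boundaries are missed or misplaced, which is why lower values indicate better agreement.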

16

Examples of Specific/General tweets

• ReaganDebate
  – Specific
  – General

Yes, we need to talk about jobs and teachers needing jobs! #Reagandebate

Something the #GOP candidates won't mention about Reagan - Reagan grew the size of the federal government tremendously. #reagandebate

Huntsman said Ronnie!! Take a shot! #GOPDebate #tcot #ReaganDebate

Wow, Ron Paul. Really, you think airlines would give a rip about security? Free market nonsense. #reagandebate

17

Topic Influence Prediction (#MESpeech)

• Prediction of topical influences from the event (whether tweets are strongly/weakly influenced by the event) on the unseen tweets in our test set (20% of total tweets)

• Baseline: LDA on event and tweets, then measure by JS-divergence, deeming the top ones strongly influenced tweets

• Human study to evaluate the “goodness” of prediction results
  – (e.g., do you think this tweet is strongly correlated to this segment of the event?)

The improvements are statistically significant

18

Conclusion

• Motivated joint modeling for event-tweet alignment
• Developed ET-LDA model
• Provided evaluations on two tweet datasets
  – Demonstrated that ET-LDA significantly outperformed the traditional models

Thank you!

For details: http://bit.ly/Mkie7l

Twitter: @hyheng

