Emerging Topic Detection on Twitter based on Temporal and Social Terms Evaluation
KDD 2010 Workshop on Multimedia Data Mining
Chin Hui Chen (陳晉暉)
Authors
• Mario Cataldi, Università di Torino, Torino, Italy
• Luigi Di Caro, Università di Torino, Torino, Italy
• Claudio Schifanella, Università di Torino, Torino, Italy
Agenda
• Introduction
• The Main Steps
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics
• Experiments and Evaluation
Introduction
• Twitter.com
  • 75 million users as of December 2009.
  • 6.2 million new accounts per month (2-3 per second)
• People post tweets for …
  • Daily chatter
  • Conversations
  • Sharing information/URLs
  • Reporting news
Introduction (cont’d)
• One of the founders of Twitter.com …
• A low-level information news flashes portal.
Introduction (cont’d)
• Target: extract the emerging topics.
• Process:
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics
Step 1: Content Extraction
• Target: tweets => vectors
• t-th considered interval:
• Each tweet => tweet vector
Content Extraction (cont’d)
• The length of each tweet vector equals the vocabulary size.
• Each weight is computed from the term frequency value of the x-th vocabulary term in the j-th tweet and the highest term frequency value of the j-th tweet.
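The weighting formula itself did not survive in this transcript. A minimal sketch of the tweet-to-vector step, assuming an augmented normalized term frequency (0.5 + 0.5 · tf / max tf) for terms that occur in the tweet and 0 for absent terms; the function name, the tokenized input, and the treatment of absent terms are illustrative assumptions, not the authors' code:

```python
from collections import Counter


def tweet_vector(tokens, vocabulary):
    """Return one weight per vocabulary term for a single tokenized tweet."""
    tf = Counter(tokens)
    max_tf = max(tf.values()) if tf else 0
    vector = []
    for term in vocabulary:
        if max_tf == 0 or tf[term] == 0:
            # Assumption: terms absent from the tweet get weight 0.
            vector.append(0.0)
        else:
            # Assumed augmented normalization: 0.5 + 0.5 * tf / max tf.
            vector.append(0.5 + 0.5 * tf[term] / max_tf)
    return vector


vocabulary = ["obama", "victory", "football"]
print(tweet_vector(["obama", "victory", "obama"], vocabulary))  # [1.0, 0.75, 0.0]
```

Dividing by the highest term frequency in the tweet keeps every weight in a fixed range, so a long tweet does not dominate a short one.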
Step 2: User Authority
• Target: which users are important?
• Define an author-based graph G(U,F), where U is the set of users and F is the set of directed edges, each representing a follower relationship.
User Authority (cont’d)
• Compute authority => PageRank
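A minimal sketch of computing authority with PageRank by power iteration. It assumes an edge u -> v means "u follows v", so authority flows from followers to the users they follow; the damping factor 0.85, the iteration count, and the `follows` dictionary layout are illustrative assumptions:

```python
def pagerank(follows, damping=0.85, iterations=50):
    """follows maps each user to the list of users they follow."""
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in users}
        for u in users:
            targets = follows.get(u, [])
            if targets:
                # Split this user's rank evenly among the users they follow.
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # Dangling user (follows nobody): spread rank uniformly.
                for v in users:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


follows = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
authority = pagerank(follows)
print(sorted(authority, key=authority.get, reverse=True))  # ['carol', 'alice', 'bob']
```

In this toy graph carol has two followers and ends up with the highest authority, while bob, whom nobody follows, keeps only the baseline share.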
Step 3: Content Aging Theory
• Target: find emerging terms.
• An emerging keyword can be viewed as a semantic unit that links to a very recent news event.
• Chien Chin Chen, Yao-Tsung Chen, Yeali S. Sun, Meng Chang Chen: Life Cycle Modeling of News Events Using Aging Theory. ECML 2003
• See each term as a living organism:
  • With nourishment => life cycle is prolonged => high energy
  • Without nourishment => dies => low energy
Content Aging Theory (cont’d)
• Term with high energy => currently important
• Term with low energy => out of favor
• So we need to know how to compute nutrition and energy:
  • Content Nutrition
  • Content Energy
Content Aging Theory (cont’d) – Content Nutrition
• Each food brings a different calorie contribution depending on its ingredients.
• Likewise, different tweets containing the same keyword generate different amounts of nutrition.
• Define the amount of nutrition:
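The nutrition formula did not survive in this transcript. A minimal sketch, assuming that the nutrition of a keyword is the sum, over the tweets containing it, of the keyword's weight in the tweet multiplied by the authority of the tweet's author; the data layout and names are hypothetical:

```python
def nutrition(keyword, tweets, authority):
    """tweets: list of (user, weights) pairs, where weights maps term -> weight."""
    total = 0.0
    for user, weights in tweets:
        if keyword in weights:
            # Assumed contribution: term weight scaled by the author's authority.
            total += weights[keyword] * authority.get(user, 0.0)
    return total


tweets = [
    ("alice", {"victory": 1.0, "obama": 0.75}),
    ("bob", {"victory": 0.75}),
]
authority = {"alice": 0.6, "bob": 0.2}
print(nutrition("victory", tweets, authority))
```

Under this assumption the same keyword yields more nutrition when it is posted by a high-authority user, which matches the food analogy on the slide.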
Content Aging Theory (cont’d) – Content Energy
• Having obtained the nutrition of a semantic unit, we map it into energy, its effective contribution (how emergent it is).
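• Hot terms: used extensively within the considered time interval.
• Emergent terms: hot in the considered time interval but not in the previous ones.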
Content Aging Theory (cont’d) – Content Energy
• Define s = number of previous time slots considered.
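The energy formula is missing from this transcript. One plausible formulation, sketched here as an assumption, compares the squared nutrition of the current slot with that of each of the s previous slots and weights each difference by the inverse of its temporal distance:

```python
def energy(nutrition_history, s):
    """nutrition_history: nutrition values of one keyword, oldest first.

    The last entry is the current time slot; up to s earlier slots are used.
    """
    current = nutrition_history[-1]
    previous = nutrition_history[-(s + 1):-1]
    total = 0.0
    # distance 1 is the slot immediately before the current one.
    for distance, past in enumerate(reversed(previous), start=1):
        # Assumed form: squared-nutrition difference, damped by distance.
        total += (current ** 2 - past ** 2) / distance
    return total


print(energy([1.0, 1.0, 3.0], s=2))  # 12.0: a sudden rise gives high energy
print(energy([3.0, 3.0, 3.0], s=2))  # 0.0: constant usage gives no energy
```

With this form a keyword that has always been popular scores close to zero, while one that was rare in the past and is frequent now scores high.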
Step 4: Selection of Emerging Terms
• Target: how to select emerging keywords.
• 1. Supervised
• 2. Unsupervised
  • Dynamically sets the critical drop
  • CoSeNa: a Context-based Search and Navigation System
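The two selection strategies can be sketched as follows. Both are assumptions about details the slide does not preserve: the supervised variant takes a user-chosen factor `delta` and keeps keywords whose energy exceeds `delta` times the mean energy; the unsupervised variant ranks keywords by energy, finds the largest drop between consecutive entries, averages the drops before that point, and cuts at the first drop larger than that average.

```python
def select_supervised(energies, delta=1.5):
    """Keep keywords whose energy exceeds delta times the mean energy."""
    mean = sum(energies.values()) / len(energies)
    return {k for k, e in energies.items() if e > delta * mean}


def select_unsupervised(energies):
    """Cut the energy ranking at a dynamically computed critical drop."""
    ranked = sorted(energies.items(), key=lambda kv: kv[1], reverse=True)
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    if not drops:
        return {k for k, _ in ranked}
    max_point = drops.index(max(drops))
    before = drops[:max_point]
    if before:
        average = sum(before) / len(before)
        # First drop larger than the average drop seen before the maximum.
        cut = next(i for i, d in enumerate(drops) if d > average)
    else:
        cut = max_point
    return {k for k, _ in ranked[:cut + 1]}


energies = {"earthquake": 90.0, "volcano": 80.0, "morning": 20.0,
            "lunch": 15.0, "hello": 10.0}
print(sorted(select_supervised(energies)))    # ['earthquake', 'volcano']
print(sorted(select_unsupervised(energies)))  # ['earthquake', 'volcano']
```

The unsupervised variant needs no tuning parameter, at the cost of depending on the shape of the energy ranking.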
Step 5: From Emerging Terms to Emerging Topics
• Target: find emerging topics!
• Define a topic as a minimal set of terms semantically related to an emerging keyword.
• Example: “victory”
  • Nov 2008: “elections”, “Obama”, “USA”
  • Feb 2010: “football”, “superbowl”, “New Orleans Saints”
• Method: co-occurrences
From Emerging Terms to Emerging Topics
• 1. Generate Correlation Vector
  • a. Use the keyword k as the query.
  • b. Use the set of tweets containing k as relevance feedback.
  • c. Rely on a probabilistic feedback mechanism.
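The exact correlation formula is not preserved on the slide. A minimal sketch of steps a to c, assuming a smoothed Robertson-Sparck Jones style relevance weight as the probabilistic feedback mechanism; the 0.5 smoothing constants and the set-of-terms tweet representation are assumptions made for illustration:

```python
import math


def correlation_vector(keyword, tweets):
    """tweets: list of sets of terms. Returns {term: weight} for keyword."""
    total = len(tweets)
    # Tweets containing the keyword act as the relevance feedback set.
    relevant = [t for t in tweets if keyword in t]
    n_relevant = len(relevant)
    candidates = set().union(*relevant) - {keyword}
    vector = {}
    for z in candidates:
        r = sum(1 for t in relevant if z in t)  # tweets with both k and z
        n = sum(1 for t in tweets if z in t)    # tweets with z
        odds_relevant = (r + 0.5) / (n_relevant - r + 0.5)
        odds_other = (n - r + 0.5) / (total - n - n_relevant + r + 0.5)
        vector[z] = math.log(odds_relevant / odds_other)
    return vector


tweets = [
    {"victory", "obama", "elections"},
    {"victory", "obama"},
    {"victory", "lunch"},
    {"lunch", "coffee"},
    {"coffee"},
]
weights = correlation_vector("victory", tweets)
print(max(weights, key=weights.get))  # obama
```

Terms that appear mostly alongside the keyword get a large positive weight, while terms that are equally common elsewhere get a weight near or below zero.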
From Emerging Terms to Emerging Topics
• 2. Construct Topic Graph
  • Keyword-based topic graph:
  • Thinning.
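A minimal sketch of building the keyword-based topic graph and thinning it. It assumes a directed edge k -> z for each entry of k's correlation vector and a fixed weight threshold for thinning; the threshold rule is an illustrative stand-in, since the slide does not state the actual criterion:

```python
def build_topic_graph(correlations, threshold):
    """correlations: {keyword: {term: weight}}. Returns {node: set of nodes}."""
    graph = {}
    for k, vector in correlations.items():
        graph.setdefault(k, set())
        for z, weight in vector.items():
            # Thinning (assumed rule): drop edges below the threshold.
            if weight >= threshold:
                graph[k].add(z)
                graph.setdefault(z, set())
    return graph


correlations = {
    "victory": {"obama": 2.1, "lunch": -0.5},
    "obama": {"victory": 1.8},
}
print(build_topic_graph(correlations, threshold=1.0))
```

Here the weak edge from "victory" to "lunch" is removed, leaving only the mutually correlated pair.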
From Emerging Terms to Emerging Topics
• 3. Topic Detection and Ranking
From Emerging Terms to Emerging Topics
• Find the SCCs (strongly connected components):
  • An emerging topic is a subgraph representing a set of keywords semantically related to term z within the time interval.
  • Use DFS.
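A minimal sketch of finding the strongly connected component that contains an emerging term z. It runs one depth-first traversal on the graph and one on the reversed graph and intersects the two reachable sets, which by definition yields z's SCC; the adjacency-dictionary representation and function names are assumptions:

```python
def reachable(graph, start):
    """Iterative depth-first traversal; returns all nodes reachable from start."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def reverse(graph):
    """Return the same graph with every edge direction flipped."""
    rev = {}
    for src, targets in graph.items():
        rev.setdefault(src, set())
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def topic_of(graph, z):
    """Nodes that z can reach and that can reach z: the SCC containing z."""
    return reachable(graph, z) & reachable(reverse(graph), z)


graph = {
    "victory": {"obama", "football"},
    "obama": {"elections"},
    "elections": {"victory"},
    "football": set(),
}
print(sorted(topic_of(graph, "victory")))  # ['elections', 'obama', 'victory']
```

"football" is reachable from "victory" but does not link back, so it stays outside the topic.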
From Emerging Terms to Emerging Topics
• Ranking
Experiments and Evaluation
• Dataset:
  • 15 days (between the 13th and 28th of April 2010)
  • More than 3 million tweets (about 10k/hr)
  • More than 300k different keywords
Real Case Study
• Set r = 15 mins and s = 200 time slots (about 2 solar days).
• Result:
History Worthiness
• Compare two different numbers of considered slots: s = 100 and s = 200.
History Worthiness (cont’d)
• “morning” => periodic events
History Worthiness (cont’d)
• The life status of a keyword depends on the number of time intervals considered.
• Temporal relevance of the retrieved topics (relevance is tied to time).
Conclusion
• 1. Formalized the keyword life cycle.
  • (now => frequent, past => rare)
• 2. Studied the social relationships.
• 3. Formalized the keyword-based topic graph.
Appendix
• Twitter Search
• Google Real Time Search