Emerging Topic Detection on Twitter based on Temporal and Social Terms Evaluation
KDD 2010 Workshop on Multimedia Data Mining
Chin Hui Chen (陳晉暉)
Authors
• Mario Cataldi, Università di Torino, Torino, Italy
• Luigi Di Caro, Università di Torino, Torino, Italy
• Claudio Schifanella, Università di Torino, Torino, Italy
Agenda
• Introduction
• The Main Steps
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics
• Experiments and Evaluation
Introduction
• Twitter.com
  • 75 million users in December 2009.
  • 6.2 million new accounts per month (2-3 per second)
• People post tweets for …
  • Daily chatter
  • Conversations
  • Sharing information/URLs
  • Reporting news
Introduction (cont’d)
• One of the founders of Twitter.com …
• A low level information news flashes portal.
Introduction (cont’d)
• Target: Extract the emerging topics.
• Process:
  • Content Extraction
  • User Authority
  • Content Aging Theory
  • Selection of Emerging Terms
  • From Emerging Terms to Emerging Topics
Step 1: Content Extraction
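• Target: Tweets => Vector
• Consider the set of tweets posted in the t-th considered interval.
• Each tweet => tweet vector
Content Extraction (cont’d)
• Each tweet vector holds one weight per vocabulary term, so its length equals the vocabulary size.
• Each weight is derived from the term frequency value of the x-th vocabulary term in the j-th tweet, relative to the highest term frequency value in the j-th tweet.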
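A minimal sketch of the tweet-to-vector step, assuming an augmented normalized term frequency weighting (0.5 + 0.5 · tf / max tf) and a weight of 0 for terms absent from the tweet. The function name `tweet_vector` and the pre-tokenized input are illustrative assumptions, not taken from the slides.

```python
from collections import Counter


def tweet_vector(tokens, vocabulary):
    """Return one weight per vocabulary term for a single tokenized tweet."""
    counts = Counter(tokens)
    max_tf = max(counts.values(), default=0)
    vector = []
    for term in vocabulary:
        tf = counts[term]
        if tf == 0 or max_tf == 0:
            # Assumption: terms that do not occur in the tweet get weight 0.
            vector.append(0.0)
        else:
            # Assumed weighting: augmented normalized term frequency.
            vector.append(0.5 + 0.5 * tf / max_tf)
    return vector


vocabulary = ["obama", "wins", "vote"]
print(tweet_vector(["obama", "wins", "obama"], vocabulary))
```

Normalizing by the highest term frequency in the tweet keeps every non-zero weight between 0.5 and 1, so one repeated word cannot dominate the vector.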
Step 2: User Authority
• Target: Which users are important?
• Define an author-based graph G(U,F), where U is the set of users and F is the set of directed edges.
• A directed edge links a follower to the user being followed.
User Authority (cont’d)
• Compute authority => PageRank
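The authority computation can be sketched as a plain power-iteration PageRank over the follower graph. This is an illustrative sketch only: the damping factor 0.85, the iteration count, the even redistribution of rank from users who follow nobody, and the `follows` dictionary layout (user -> users they follow) are all assumptions, not values from the slides.

```python
def pagerank(follows, damping=0.85, iterations=50):
    """Compute authority scores; `follows` maps a user to the users they follow."""
    users = set(follows)
    for followed in follows.values():
        users.update(followed)
    n = len(users)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in users}
    for _ in range(iterations):
        # Rank held by users who follow nobody is spread evenly over all users.
        dangling = sum(rank[u] for u in users if not follows.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in users}
        for follower, followed in follows.items():
            for target in followed:
                # Each follower passes an equal share of its rank to everyone it follows.
                new_rank[target] += damping * rank[follower] / len(followed)
        rank = new_rank
    return rank


follows = {"a": ["c"], "b": ["c"], "c": ["a"]}
authority = pagerank(follows)
print(max(authority, key=authority.get))
```

With edges pointing from follower to followed user, a user followed by many (or by authoritative) users accumulates a higher score.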
Step 3: Content Aging Theory
• Target: Find emerging terms.
• An emerging keyword can be viewed as a semantic unit which links to a very recent news event.
• Chien Chin Chen, Yao-Tsung Chen, Yeali S. Sun, Meng Chang Chen: Life Cycle Modeling of News Events Using Aging Theory. ECML 2003
• See each term as a living organism:
  • With nourishment => life cycle is prolonged => high energy
  • Without nourishment => dies => low energy
Content Aging Theory (cont’d)
• Term with high energy => currently important
• Term with low energy => out of favor
• So, we need to know how to compute nutrition and energy.
  • Content Nutrition
  • Content Energy
Content Aging Theory (cont’d) – Content Nutrition
• Each food brings a different calorie contribution depending on its ingredients.
• Different tweets containing the same keyword generate different amounts of nutrition.
• Define the amount of nutrition a keyword receives in a time interval:
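A hedged sketch of the nutrition idea, assuming that each tweet containing the keyword contributes the keyword's weight in that tweet multiplied by the authority of the tweet's author, and that contributions are summed over the interval. The combination rule, the function name `nutrition`, and the `(author, weights)` tweet layout are assumptions made for illustration.

```python
def nutrition(keyword, tweets, authority):
    """Sum, over tweets containing `keyword`, of term weight times author authority.

    tweets: list of (author, weights) pairs, where weights maps term -> weight.
    authority: maps author -> authority score (unknown authors count as 0).
    """
    total = 0.0
    for author, weights in tweets:
        if keyword in weights:
            total += weights[keyword] * authority.get(author, 0.0)
    return total


tweets = [
    ("alice", {"victory": 1.0}),
    ("bob", {"victory": 0.75, "game": 1.0}),
    ("carol", {"game": 1.0}),
]
authority = {"alice": 0.5, "bob": 0.2, "carol": 0.3}
print(nutrition("victory", tweets, authority))
```

Under this assumption, a tweet from an authoritative user "feeds" the keyword more than the same tweet from a low-authority user.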
Content Aging Theory (cont’d) – Content Energy
• Now that we have obtained the nutrition of a semantic unit => map it into energy => effective contribution (how emergent it is).
• Hot terms: used frequently in the current time interval.
• Emergent terms: used frequently now but rarely in the past.
• Define s = number of previous time slots.
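One way to turn nutrition into energy, sketched under a stated assumption: the energy of a keyword at the current interval sums, over the s previous slots, the difference between the squared current nutrition and the squared past nutrition, divided by the temporal distance. Treat this formula as an assumption; the function name `energy` and the oldest-first history list are illustrative.

```python
def energy(nutrition_history, s):
    """Energy of a keyword at the current interval.

    nutrition_history: nutrition values per interval, oldest first;
    the last entry is the current interval.
    s: number of previous time slots to compare against.
    """
    current = nutrition_history[-1]
    previous = nutrition_history[:-1][-s:] if s > 0 else []
    total = 0.0
    # distance = 1 for the slot just before the current one, 2 for the one before, ...
    for distance, past in enumerate(reversed(previous), start=1):
        total += (current ** 2 - past ** 2) / distance
    return total


print(energy([1.0, 1.0, 3.0], s=2))  # rising keyword: positive energy
print(energy([2.0, 2.0, 2.0], s=2))  # steady keyword: zero energy
```

Dividing by the distance makes recent slots count more, and a keyword that was already popular in the past gets little or no energy even if it is frequent now.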
Step 4: Selection of Emerging Terms
• Target: How to select emerging keywords.
• 1. Supervised
• 2. Unsupervised
  • Dynamically sets the critical drop
  • CoSeNa: a Context-based Search and Navigation System
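The two selection modes can be sketched as follows. Both rules are simplified assumptions for illustration: the supervised variant treats the critical drop as a user-chosen multiple `delta` of the mean energy, and the unsupervised variant cuts the energy ranking at its largest gap between consecutive terms, which is a simpler stand-in for the dynamic critical drop mentioned above.

```python
def select_supervised(energies, delta=1.5):
    """Keep terms whose energy exceeds delta times the mean energy (assumed rule)."""
    if not energies:
        return set()
    critical_drop = delta * sum(energies.values()) / len(energies)
    return {term for term, value in energies.items() if value > critical_drop}


def select_unsupervised(energies):
    """Keep terms ranked above the largest gap in the sorted energies (simplified)."""
    ranked = sorted(energies.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) < 2:
        return {term for term, _ in ranked}
    drops = [ranked[i][1] - ranked[i + 1][1] for i in range(len(ranked) - 1)]
    cut = drops.index(max(drops))  # position of the largest drop
    return {term for term, _ in ranked[:cut + 1]}


energies = {"volcano": 9.0, "ash": 8.0, "morning": 1.0, "lol": 0.5}
print(select_supervised(energies))
print(select_unsupervised(energies))
```

The supervised mode needs a hand-tuned `delta`, while the unsupervised mode adapts the cut to the shape of the energy distribution in each interval.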
Step 5: From Emerging Terms to Emerging Topics
• Target : Find Emerging Topics!
• Define a topic as a minimal set of terms semantically related to an emerging keyword.
• Example: “victory”
  • Nov 2008: “elections”, “Obama”, “USA”
  • Feb 2010: “football”, “superbowl”, “New Orleans Saints”
• Method : co-occurrences
From Emerging Terms to Emerging Topics
• 1. Generate Correlation Vector
  • a. The keyword k as the query.
  • b. The set of tweets containing k as relevance feedback.
  • c. Relying on a probabilistic feedback mechanism.
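To make the co-occurrence idea concrete, here is a simplified stand-in for the correlation vector. It does not implement the probabilistic feedback weighting; it only measures, for each other term, the share of tweets containing k that also contain that term. The function name `cooccurrence_vector` and the token-list input are assumptions.

```python
from collections import Counter


def cooccurrence_vector(keyword, tweets):
    """Map each co-occurring term to the share of the keyword's tweets that contain it.

    tweets: iterable of token lists. Simplified co-occurrence ratio,
    not the probabilistic feedback weighting.
    """
    relevant = [set(tokens) for tokens in tweets if keyword in tokens]
    counts = Counter(term for tokens in relevant for term in tokens if term != keyword)
    return {term: count / len(relevant) for term, count in counts.items()}


tweets = [
    ["victory", "obama", "usa"],
    ["victory", "obama"],
    ["football", "saints"],
]
print(cooccurrence_vector("victory", tweets))
```

Here the tweets containing the keyword play the role of the relevance feedback set: terms that appear in most of them are the strongest candidates for the keyword's topic.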
From Emerging Terms to Emerging Topics
• 2. Construct Topic Graph
  • Keyword-based topic graph
  • Thinning
From Emerging Terms to Emerging Topics
• 3. Topic Detection and Ranking
• Find the SCC (Strongly Connected Component):
  • An emerging topic is a subgraph representing a set of keywords semantically related to term z within the time interval.
  • Use DFS.
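A small sketch of finding the strongly connected component that contains an emerging term z with two depth-first searches: one on the topic graph and one on its reverse. The SCC of z is the set of keywords that z can reach and that can reach z. The adjacency-dictionary layout and the names `reachable` and `emerging_topic` are assumptions for illustration.

```python
def reachable(graph, start):
    """Iterative DFS returning every node reachable from `start`."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def emerging_topic(graph, z):
    """Return the strongly connected component of the directed graph containing z."""
    reverse = {}
    for source, targets in graph.items():
        for target in targets:
            reverse.setdefault(target, set()).add(source)
    # Nodes reachable from z that can also reach z.
    return reachable(graph, z) & reachable(reverse, z)


graph = {
    "victory": {"superbowl"},
    "superbowl": {"saints"},
    "saints": {"victory", "parade"},
    "parade": set(),
}
print(emerging_topic(graph, "victory"))
```

In this example "parade" is reachable from "victory" but does not lead back to it, so it stays outside the topic.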
From Emerging Terms to Emerging Topics
• Ranking
Experiments and Evaluation
• Dataset:
  • 15 days (between 13th and 28th of April 2010)
  • More than 3 million tweets (10k/hr)
  • More than 300k different keywords
Real Case Study
• Set r = 15 mins, number of time slots s = 200 (about 2 solar days).
• Result:
History Worthiness
• Analyze two different numbers of considered slots, s = 100 and s = 200.
History Worthiness (cont’d)
• “morning” => periodic events
• The life status of a keyword depends on the number of time intervals considered.
• Temporal relevance of the retrieved topics (relevance is time-dependent).
Conclusion
• 1. Formalized the keyword life cycle.
  • (now => frequent, past => rare)
• 2. Studied the social relationships.
• 3. Formalized the keyword-based topic graph.
Appendix
• Twitter Search
• Google Real Time Search