Microblogs: Information and Social Network
Huang Yuxin
Millions of Users on Microblogs
• By July 2009, Twitter had attracted 41 million users.
• By March 2011, Twitter had grown to 175 million users.
• Registered IDs on Sina Microblog had reached 100 million by March 2011.
People can publish posts and share information on Microblogs
Social network in Microblogs
What information can we extract from Microblogs?
• Plain text
• User reference (in 1/2 of posts)
• Hashtag (in 1/9 of posts)
• Retweet
• Emoticons
• Shortened URL (resource) (in 1/2 of posts)
• Time
• Users' geolocation info
Basic features from the text of a tweet:
• Tiny URL
• User
• Post time
• Emoticons
• Hashtag
• Mention (user reference)
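A minimal Python sketch of pulling the text-level features above out of a raw tweet. The regular expressions, the small emoticon list, and the names `extract_features` and `TweetFeatures` are illustrative assumptions, not Twitter's own tokenization rules. Post time and geolocation come from tweet metadata rather than the text, so they are left out.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified, hypothetical patterns; real tweet tokenizers handle many more cases.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^RT @\w+")
# A tiny hand-picked emoticon list; a real system would use a larger lexicon.
EMOTICONS = [":-)", ":)", ":-(", ":(", ":D", ";)"]


@dataclass
class TweetFeatures:
    mentions: List[str]
    hashtags: List[str]
    urls: List[str]
    emoticons: List[str]
    is_retweet: bool


def extract_features(text: str) -> TweetFeatures:
    """Extract mentions, hashtags, URLs, emoticons and the retweet marker from tweet text."""
    return TweetFeatures(
        mentions=MENTION_RE.findall(text),
        hashtags=HASHTAG_RE.findall(text),
        urls=URL_RE.findall(text),
        emoticons=[e for e in EMOTICONS if e in text],
        is_retweet=bool(RETWEET_RE.match(text)),
    )
```

For example, `extract_features("RT @alice: Earthquake felt in Tokyo #earthquake http://bit.ly/abc :(")` yields the mention `alice`, the hashtag `earthquake`, one URL, the emoticon `:(`, and `is_retweet=True`.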
What is Twitter? (WWW 2010)
• People who are more active tend to have more followers.
• The case is different for people with very high popularity (because they are celebrities).
Small World
• Average path length on Twitter: 4.12
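The path-length figure is an average of shortest-path hop counts between user pairs. The sketch below computes that quantity exactly with breadth-first search on a toy follow graph (a dict mapping each user to the users they follow; the names are hypothetical). Running BFS from every node does not scale to tens of millions of users, so a measurement at Twitter scale would typically sample source users; this code is not a reproduction of how the WWW 2010 number was obtained.

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from `source` along directed 'follows' edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:  # first visit is the shortest path in an unweighted graph
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_path_length(graph):
    """Mean shortest-path length over ordered pairs (s, t), s != t, with t reachable from s."""
    total, pairs = 0, 0
    for s in graph:
        for t, d in bfs_distances(graph, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0
```

On the three-user cycle `{"a": ["b"], "b": ["c"], "c": ["a"]}`, every user reaches one other user in 1 hop and one in 2 hops, so the average is 1.5.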
Reciprocity?
(Whole dataset)
• 77.9% of user pairs with any link between them are connected one-way.
• 67.6% of users are not followed back by any of the users they follow.
• The rate of reciprocity is higher in Asian countries than in America.
• (WWW 2010)
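The two whole-dataset statistics can be computed directly from a list of (follower, followee) edges. This is a sketch under stated assumptions: the second ratio is taken over users who follow at least one account, which may differ from the paper's exact denominator, and the function names are hypothetical.

```python
from collections import defaultdict


def one_way_fraction(edges):
    """Fraction of linked user pairs whose link goes in only one direction."""
    edge_set = {(u, v) for u, v in edges if u != v}
    pairs = {frozenset(e) for e in edge_set}  # unordered pairs with at least one link
    one_way = 0
    for pair in pairs:
        u, v = tuple(pair)
        if (u, v) not in edge_set or (v, u) not in edge_set:
            one_way += 1
    return one_way / len(pairs) if pairs else 0.0


def not_followed_back_fraction(edges):
    """Among users who follow someone, the fraction followed back by none of them."""
    edge_set = {(u, v) for u, v in edges if u != v}
    following = defaultdict(set)  # user -> users they follow
    for u, v in edge_set:
        following[u].add(v)
    lonely = sum(
        1 for u, fs in following.items()
        if not any((f, u) in edge_set for f in fs)
    )
    return lonely / len(following) if following else 0.0
```

With edges `[("a", "b"), ("b", "a"), ("a", "c"), ("d", "a")]`, two of the three linked pairs are one-way (2/3), and only `d` is followed back by nobody it follows (1/3).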
(Subset of active users)
• 72.4% of Twitter users follow more than 80% of their followers.
• For 80.5% of users, 80% of the users they follow follow them back.
• (WSDM 2010)
The two papers reach different conclusions because they extracted their data differently.
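The active-user statistics are per-user follow-back ratios compared against an 80% threshold. A hedged sketch, again over hypothetical (follower, followee) edges; the strict comparison for the first ratio and the non-strict one for the second mirror the wording on the slide and are assumptions, as is the function name.

```python
from collections import defaultdict


def follow_back_shares(edges, threshold=0.8):
    """Return (share_a, share_b) for a list of (follower, followee) edges.

    share_a: among users with at least one follower, the fraction who follow
             back more than `threshold` of their followers.
    share_b: among users who follow at least one account, the fraction for whom
             at least `threshold` of those accounts follow them back.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    followers = defaultdict(set)  # user -> users who follow them
    following = defaultdict(set)  # user -> users they follow
    for u, v in edge_set:
        following[u].add(v)
        followers[v].add(u)

    # .get() avoids inserting empty entries into the defaultdicts while reading.
    a = sum(
        1 for u, fs in followers.items()
        if len(fs & following.get(u, set())) / len(fs) > threshold
    )
    b = sum(
        1 for u, fs in following.items()
        if len(fs & followers.get(u, set())) / len(fs) >= threshold
    )
    share_a = a / len(followers) if followers else 0.0
    share_b = b / len(following) if following else 0.0
    return share_a, share_b
```

Which users count as "active" is a filtering step applied to `edges` before calling the function; that choice is exactly where the two papers' data extraction differs.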
Celebrities and Popular Topics
Users’ participation in topics
• A topic attracts only a certain group of users
Content Types on Twitter
• Daily chatter
• Conversations
• Sharing information
• Reporting and spreading news
Understanding Following Behavior (statistics from a paper)
• Why we follow: professional interest, technology, tone of presentation, keeping up with friends
• Why we unfollow: too many posts in general, too much status/personal info, spam, duplicative posts
Interesting Research Topics on Twitter
• Vertical search on Twitter (partial indexing + time-sensitive information retrieval)
• Static topic detection (topic model)
• Burst event detection (topic-specific)
• Topic-biased expert recommendation (graph features + activeness + textual features)
• Cascading feature analysis (network structure + topic-spreading behavior on different topics)
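As a rough illustration of time-sensitive retrieval, the sketch below ranks matching posts by a text-match score multiplied by an exponential recency decay. The term-overlap relevance, the 6-hour half-life, and the names `rank_posts` and `Post` are assumptions for illustration; partial indexing is not modeled.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    timestamp: float  # seconds since the epoch


def term_overlap(query: str, text: str) -> float:
    """Fraction of query terms that appear in the post (a crude relevance score)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rank_posts(query, posts, now, half_life_hours=6.0):
    """Rank posts by relevance * exp(-decay * age); the score halves every half-life."""
    decay = math.log(2) / (half_life_hours * 3600)
    scored = []
    for p in posts:
        rel = term_overlap(query, p.text)
        if rel == 0:
            continue  # skip posts that match no query term
        age = max(0.0, now - p.timestamp)
        scored.append((rel * math.exp(-decay * age), p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored]
```

The multiplicative decay means an equally relevant post loses half its score every `half_life_hours`, so a one-hour-old match outranks a two-day-old one.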
Related Works
People I need to follow vs. Content I need to know
• An active publisher may be interested in many topics
• My page is always filled with low-value recent chatter
• I may only need to subscribe to certain topics of an author
• Can we automatically classify a user's posts and filter out the irrelevant ones?
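One very simple way to approach the filtering question is to score each post against per-topic keyword profiles and keep only posts whose best-matching topic is one the reader subscribed to. The keyword sets and function names here are hypothetical stand-ins; a real system would more likely use a trained classifier or a topic model.

```python
import re

# Hypothetical topic profiles for one author.
TOPIC_KEYWORDS = {
    "tech": {"python", "gpu", "release", "code"},
    "sports": {"match", "goal", "league"},
    "food": {"lunch", "dinner", "recipe"},
}


def classify(text, topic_keywords):
    """Return the topic sharing the most keywords with the post, or None if none match."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    best, best_hits = None, 0
    for topic, keywords in topic_keywords.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = topic, hits
    return best


def filter_timeline(posts, subscribed, topic_keywords):
    """Keep only posts whose predicted topic is in the reader's subscribed set."""
    return [p for p in posts if classify(p, topic_keywords) in subscribed]
```

Subscribing only to `"tech"` keeps "New Python release is out" and drops posts about a match, lunch, or nothing recognizable.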
Topics spread through the network
(Figure: the keyword "EARTHQUAKE" spreading from user to user through the network)
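The slides do not name a spreading model. To make "topics spread through the network" concrete, the sketch below swaps in the independent cascade model, a standard diffusion model: each newly active user gets one chance to pass the topic to each of their followers with probability `p`. The graph, `p`, and the function name are assumptions.

```python
import random


def independent_cascade(followers, seeds, p, rng):
    """Simulate one run of the independent cascade model.

    followers[u] lists the users who see u's posts. Each user activated in one
    round tries once to activate each follower in the next round.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in followers.get(u, ()):
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active
```

Averaging the size of `active` over many runs with different random seeds estimates how far a topic is expected to spread from a given starting user; comparing that across topics is one way to study topic-dependent cascades.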
Detecting Hot Topics with Communities
• Temporal features of keywords
• Hot topics are biased toward a group of users or a certain time period
• Retweet trees and social networks, together with users' expertise, can all be used in model training
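A minimal sketch of using a keyword's temporal feature for burst detection: count the keyword per time window and flag windows whose count sits well above the recent average (a z-score test). The history length, the threshold of 2.0, and the floor on the standard deviation are illustrative assumptions, and the community bias mentioned above is not modeled here.

```python
from statistics import mean, pstdev


def bursty_windows(counts, history=3, z_threshold=2.0):
    """Indices of windows whose count is z_threshold std devs above the previous `history` windows."""
    bursts = []
    for i in range(history, len(counts)):
        past = counts[i - history:i]
        mu = mean(past)
        # Floor the spread at 1.0 so a perfectly flat history does not divide by zero.
        sigma = max(pstdev(past), 1.0)
        if (counts[i] - mu) / sigma >= z_threshold:
            bursts.append(i)
    return bursts
```

For hourly counts `[2, 3, 2, 3, 20, 4]`, only window 4 is flagged; the jump to 20 is far above the preceding average of about 2.7.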
Topic Model with Network Regularization (WWW 08)
(Figure: document d in a network, e.g. a coauthor network; k topics, each shown as a keyword list)
O(C,G) = L(C) + R(G,C)
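The slide writes the objective schematically. One way to spell out the two terms, assuming PLSA as the base topic model as in the WWW 2008 network-regularization work, with a trade-off weight λ ∈ [0, 1] folded into each term, c(w,d) the count of word w in document d, E the edge set of G, and w(u,v) the weight of the edge between documents u and v:

```latex
\begin{aligned}
O(C,G) &= L(C) + R(G,C) \\
L(C)   &= -(1-\lambda)\sum_{d \in C}\sum_{w} c(w,d)\,\log\sum_{j=1}^{k} p(\theta_j \mid d)\,p(w \mid \theta_j) \\
R(G,C) &= \frac{\lambda}{2}\sum_{(u,v)\in E} w(u,v)\sum_{j=1}^{k}\bigl(p(\theta_j \mid u) - p(\theta_j \mid v)\bigr)^2
\end{aligned}
```

Minimizing O trades off fitting the text (the negated log-likelihood L) against keeping the topic distributions of linked documents close (the smoothness penalty R); setting λ = 0 recovers plain PLSA.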
Rumors have attracted much attention
Intuitions
• Rumors spread rapidly and provoke heated discussion
• Rumors tend to be controversial (some people spread them, others argue against them)
• The source of a rumor (celebrities? nobodies?)
• A study of how a particular rumor spreads may be interesting
• Will celebrities clarify the truth?
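The controversy intuition can be made measurable if each post about a candidate rumor is already labeled as supporting or denying it (obtaining those labels is itself difficult). The balance score `1 - |s - d| / (s + d)` and the function names below are hypothetical choices for illustration: the score is 1.0 when both sides are equally loud and 0.0 when one side is silent.

```python
def controversy_score(supporting: int, denying: int) -> float:
    """Balance between supporting and denying posts: 1.0 = evenly split, 0.0 = one-sided."""
    total = supporting + denying
    if total == 0:
        return 0.0
    return 1.0 - abs(supporting - denying) / total


def flag_controversial(stance_counts, threshold=0.5):
    """Topics whose (supporting, denying) counts are balanced enough to look controversial."""
    return sorted(
        topic for topic, (s, d) in stance_counts.items()
        if controversy_score(s, d) >= threshold
    )
```

For example, a topic with 40 supporting and 35 denying posts scores about 0.93 and is flagged, while one with 90 versus 2 scores about 0.04 and is not.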
Challenges
• How to differentiate rumors from personal views
• Most of the comments are subjective (expressions of feelings)
Rumors vs. Meaningless Topics
Suggestions and ideas are very welcome