The Web Goes Social: Blogosphere and Twittersphere March 24, 2011 A Look into the Science of Web Retrieval Multimedia University, Invited Talk Presenter: Younus, Arjumand


Page 1: 02 The Web Goes Social

The Web Goes Social:

Blogosphere and Twittersphere

March 24, 2011

A Look into the Science of Web Retrieval

Multimedia University, Invited Talk

Presenter: Younus, Arjumand

Page 2: 02 The Web Goes Social

Contents

Introduction

The Changing Role of Today’s Web

Web as Media: Social Media

Role of Search Engines in the Social Web

A Blogosphere Case-Study

A Twittersphere Case-Study

Page 3: 02 The Web Goes Social

Quick Survey

Do you have a Facebook, MySpace, Twitter, or LinkedIn account?

Do you own a blog?

Do you read blogs?

Have you ever searched for something on Wikipedia?

Have you ever submitted content to a social network?

Page 4: 02 The Web Goes Social

Web 1.0 vs. Web 2.0

Borrowed from SIGKDD 2008 tutorial slides of Professor Huan Liu and Professor Nitin Agarwal with permission

Page 5: 02 The Web Goes Social

What is so Different about Web 2.0?

User Generated Content

Collaborative Environment: Participatory Web, Citizen Journalism

User is the Driving Factor

A Paradigm Shift rather than a Technology Shift

Page 6: 02 The Web Goes Social

Top 20 Most Visited Web Sites

Internet traffic report by Alexa on July 29th 2008

Rank  Site
1     Yahoo!
2     Google
3     YouTube
4     Windows Live
5     Microsoft Network
6     Myspace
7     Wikipedia
8     Facebook
9     Blogger
10    Yahoo! Japan
11    Orkut
12    RapidShare
13    Baidu.com
14    Microsoft Corporation
15    Google India
16    Google Germany
17    QQ.Com
18    EBay
19    Hi5
20    Google France

Borrowed from SIGKDD 2008 tutorial slides of Professor Huan Liu and Professor Nitin Agarwal with permission

Page 7: 02 The Web Goes Social

Role of Today’s Web

Marketing Tool

Information Finding Tool

Media Tool

Page 8: 02 The Web Goes Social

New Dimensions in Search with The Social Web

Information Overload: search engines don’t always hold answers that users are looking for

Smart Search (CNN Money): “The Web, they say, is leaving the era of search and entering one of discovery. What’s the difference? Search is what you do when you’re looking for something. Discovery is when something wonderful that you didn’t know existed, or didn’t know how to ask for, finds you.”

What does that mean for search engines? Will they be left behind?

Page 9: 02 The Web Goes Social

Role of Today’s Web

Marketing Tool

Information Finding Tool

Media Tool

Page 10: 02 The Web Goes Social

Research Issues in the Blogosphere

Understanding the structure and properties of the blogosphere [GLM+09] [CZS+07]

Community extraction from the blogosphere through an understanding of relationships between bloggers, readers, blog posts, comments, and different sites in the blogosphere [CZS+07] [YSK+09]

Blog clustering (particularly relevant for blog search engines) [QYS+10] [AGL+10]

Trend analysis through event detection in the blogosphere [LJS+10]

Blog mining for influence analysis and opinion mining [MGL09]

Page 11: 02 The Web Goes Social

Research Issues in the Twittersphere

Study of information diffusion [RMK11]

Influence analysis [KLP+10]

Sentiment analysis and opinion mining [OBR+10]

Event detection through identification of breaking news [SOM10]

Study of unfollow phenomenon [KGN11]

Page 12: 02 The Web Goes Social

Characteristics of Blog Search and Microblog Search [MR06] [TRM11]

Blog Search [MR06]

Tracking references to named entities

Locating blogs by theme

Searchers engaged in technology, entertainment, and politics, with a particular interest in current events

Microblog Search [TRM11]

Temporal nature

Locating people using specialized syntax

Repetitive queries which change very little

Page 13: 02 The Web Goes Social

Social Search [HK10]

From the “library” paradigm of search to the “village” paradigm of search

Trust in Web search is based on “authority”

Trust in Social search is based on “intimacy”

Key Characteristics

Communities of users actively participating in the search process

Users interact with the system

Users interact with other users either implicitly or explicitly

Page 14: 02 The Web Goes Social

Enhancing Search using Social Network Features

Recency Crawling and Ranking

Identification of Hot Topics on Social Web [YQG+11]

News in the Making

Real-Time Search

Wael Ghonim’s tweets shown on Google during Egypt uprising.

Page 15: 02 The Web Goes Social

Blogosphere Case-Study

Page 16: 02 The Web Goes Social

Blogosphere Clustering: Problem Definition

Given the blogosphere, with blogs containing diverse information on a broad range of topics:

Which cluster of blogs should be read by someone interested in a particular topic?

Which blog holds the greatest influence for the particular topic?

Page 17: 02 The Web Goes Social

Blog Clustering Approach

Blog considered along three dimensions:

Part of speech

Occurrence

Blog post number

Page 18: 02 The Web Goes Social

Topic Discussion Isolation Rank

Metric used to discover the topic clusters

Based on a set of given topic words and some linguistic rules

We define the TDIR score of a blog as follows:

n_noun, n_adjective, and n_adverb are, respectively, the number of times a noun, adjective, or adverb for a specific topic is found across all the blog posts

w_n, w_adj, and w_adv are the respective weights assigned to the noun, adjective, and adverb for a specific topic
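
The TDIR formula itself does not survive in this transcript, so the following is only a minimal sketch of how the counts and weights defined above could be combined. It assumes a plain weighted sum with no normalization, which may differ from the actual definition. The function name `tdir_score` and the example weights are hypothetical.

```python
def tdir_score(counts, weights):
    """Weighted sum of topic-word counts per part of speech.

    ASSUMPTION: the real TDIR formula is not shown in the slides; a plain
    weighted sum is used here purely for illustration.
    """
    return sum(weights[pos] * counts.get(pos, 0) for pos in weights)


# Hypothetical counts of topic words found across all posts of one blog.
counts = {"noun": 4, "adjective": 2, "adverb": 1}
# Hypothetical weights (w_n, w_adj, w_adv); the talk does not state values.
weights = {"noun": 3.0, "adjective": 2.0, "adverb": 1.0}

score = tdir_score(counts, weights)
print(score)
```

Under this reading, a blog that uses topic nouns often scores higher than one that only uses topic adverbs, as long as w_n is the largest weight.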

Page 19: 02 The Web Goes Social

Topic Discussion Rank

Metric used to rank the blogs within a topic cluster

Based on the hyperlinked social network of blogs and blog post contents

We define the TDR score of a blog as follows:

Matching_Outlinks represents the blogs that are part of the topic cluster

o : (o,b) – outlinks from blog b

damp is the damping factor

Page 20: 02 The Web Goes Social

Role of Damping Factor

Assume the TDIR of blog A is 2 and the TDIR of blog B is 1

TDR without damping factor

A: 2 + (1/1 x 1) = 3

B: 1 + (1/1 x 2) = 3

TDR with damping factor

A: 2 + (1/1 x 1 x 0.9) = 2.9

B: 1 + (1/1 x 2 x 0.9) = 2.8

Without damping the two blogs tie; with damping the blog with the higher TDIR ranks first
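
The arithmetic on this slide can be reproduced with a short sketch. It assumes a single update step in which a blog's TDR is its own TDIR plus the other blog's TDIR, scaled by the 1/1 factor and by the damping factor. The slide does not say what the 1/1 factor stands for, so it appears below as a generic `link_fraction`; that name and `tdr_one_step` are hypothetical.

```python
def tdr_one_step(tdir_self, tdir_linked, link_fraction=1.0, damp=1.0):
    """One TDR update for a blog linked with a single other blog.

    ASSUMPTION: mirrors only the worked example on the slide;
    link_fraction is the unexplained "1/1" factor.
    """
    return tdir_self + link_fraction * tdir_linked * damp


tdir_a, tdir_b = 2, 1

# Without damping (damp = 1.0) both blogs end up with the same score.
no_damp = (tdr_one_step(tdir_a, tdir_b), tdr_one_step(tdir_b, tdir_a))

# With damp = 0.9 the blog with the higher TDIR stays ahead.
with_damp = (tdr_one_step(tdir_a, tdir_b, damp=0.9),
             tdr_one_step(tdir_b, tdir_a, damp=0.9))

print(no_damp)
print([round(x, 2) for x in with_damp])
```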

Page 21: 02 The Web Goes Social

Performance Evaluation

Experimental data

Real blog data collected by crawling the blogspot domain

102 blog sites comprising 50,471 blog posts

Experimental topics

“compute”, “democracy”, “secularism”, “bioinformatics”, “Haiti”, “Obama”

Experimental measures

Precision

Recall
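
As a reminder of how the two measures are computed for one topic cluster, here is a minimal sketch using the standard set-based definitions. The blog names are invented for illustration and are not taken from the experimental data.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: blogs placed in a topic cluster vs. blogs that a
# human judge marked as truly discussing the topic.
cluster = {"blog1", "blog2", "blog3", "blog4"}
judged_relevant = {"blog1", "blog2", "blog3", "blog5"}

p, r = precision_recall(cluster, judged_relevant)
print(p, r)
```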

Page 22: 02 The Web Goes Social

Experimental Results - Precision

Average precision found to be 0.87

Page 23: 02 The Web Goes Social

Experimental Results - Recall

Average recall found to be 0.971

Page 24: 02 The Web Goes Social

Twittersphere Case-Study

Page 25: 02 The Web Goes Social

Studying Ins and Outs of News

Using Twitter to study hot news items people are heavily tweeting about

Page 26: 02 The Web Goes Social

Algorithm for Identification of Popular News

1. Crawl daily news data and send the unranked news articles list to the UI module of the system.

2. Extract news title, news summary and news text for each news article per day.

3. From the news summary, extract named entities per news article through a named entity recognition approach.

4. Match each named entity across entities in the common entity corpus and tag each named entity per news article as common or uncommon.

5. Use Boolean query model to compose the query per news article:

   a. Use AND predicate with each common entity and OR predicate with each uncommon entity. If all entities for a particular news article are common, use news title for the query construction.

6. For all articles per day:

   a. Send a request to Twitter Search API and extract result tweets per news article per day.

   b. For each t in result tweets:

      i. Use t’s metadata to find following and follower statistics for each unique Twitterer who has tweeted about the news article.

      ii. Calculate rank of each article using the ranking function.

7. Send the ranked news articles list to the UI module of the system.
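
Steps 4 and 5 of the algorithm can be sketched as follows. The slide does not spell out how the AND and OR groups are combined, so this sketch assumes one plausible reading: uncommon entities are OR-ed together and the result is AND-ed with every common entity. The function name `build_query`, the sample entities and the tiny common-entity corpus are all hypothetical, and the output is a generic Boolean string rather than the syntax of any particular search API.

```python
def build_query(title, entities, common_corpus):
    """Compose a Boolean query string for one news article.

    ASSUMPTION: "(uncommon1 OR uncommon2) AND common1 AND common2" is one
    possible reading of step 5; the slides do not give the exact grouping.
    """
    common = [e for e in entities if e.lower() in common_corpus]
    uncommon = [e for e in entities if e.lower() not in common_corpus]

    if not uncommon:
        # All entities are common: fall back to the news title.
        # (This sketch also falls back when no entity was found at all.)
        return f'"{title}"'

    uncommon_part = " OR ".join(f'"{e}"' for e in uncommon)
    if common and len(uncommon) > 1:
        uncommon_part = f"({uncommon_part})"

    return " AND ".join([uncommon_part] + [f'"{e}"' for e in common])


# Hypothetical corpus of entities that are too frequent to discriminate.
common_corpus = {"russia", "china"}

q1 = build_query("Rebels raid parliament in Grozny; 7 dead",
                 ["Grozny", "Russia", "Chechnya"], common_corpus)
q2 = build_query("Hopes fade for trapped Chinese miners",
                 ["China"], common_corpus)
print(q1)
print(q2)
```

The ranking function in step 6 is not specified in the slides, so it is left out of this sketch.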

Page 27: 02 The Web Goes Social

Application Prototype

Page 28: 02 The Web Goes Social

Observations (1/3)

The percentage of news appearing in tweets per day is greater than 50% for all days except one

Page 29: 02 The Web Goes Social

Observations (2/3)

Highest Number of Recorded Tweets per Day

Page 30: 02 The Web Goes Social

Observations (3/3)

DATE       EVENT
17th Oct.  Karachi violence (local)
18th Oct.  Hopes fade for trapped Chinese miners (international)
19th Oct.  Lakki Marwat suicide attack attempt foiled (local)
20th Oct.  Rebels raid parliament in Grozny; 7 dead (international)
21st Oct.  Obama to visit next year (local)
22nd Oct.  Nuclear plant completes decade of good performance (local)
23rd Oct.  Ghazi shrine suicide bomber back home (local)
24th Oct.  WikiLeaks makes fresh claim about Iraq deaths (international)
25th Oct.  Court orders Iraqi parliament back to work (international)
26th Oct.  SCBA presidential election (local)

Page 31: 02 The Web Goes Social

References

[AGL+10] Agarwal, N., Galan, M., Liu, H., and Subramanya, S., “WisColl: Collective Wisdom based Blog Clustering.” In Journal Information Sciences, Special Issue on Collective Intelligence, Vol. 180, Issue 1, Jan. 2010.

[GLM+09] Gotz, M., Leskovec, J., McGlohon, M., and Faloutsos, C., “Modeling Blog Dynamics.” In Proc. 3rd International Conference on Weblogs and Social Media (ICWSM 2009), San Jose, California, United States, May 2009.

[CZS+07] Chi, Y., Zhu, S., Song, X., Tatemura, J., and Tseng, B.L., “Structural and Temporal Analysis of the Blogosphere through Community Factorization.” In Proc. 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), San Jose, California, United States, Aug. 2007.

[MR06] Mishne, G., and de Rijke, M., “A Study of Blog Search.” In Proc. 28th European Conf. on Information Retrieval (ECIR 2006), London, United Kingdom, Apr. 2006.

[TRM11] Teevan, J., Ramage, D., and Morris, M.R., “#TwitterSearch: A Comparison of Microblog Search and Web Search.” In Proc. 4th Int’l Conf. on Web Search and Data Mining (WSDM 2011), Hong Kong, China, Feb. 2011.

[HK10] Horowitz, D., and Kamvar, S.D., “The Anatomy of a Large-Scale Social Search Engine.” In Proc. 19th Int’l Conf. on World Wide Web (WWW 2010), Raleigh, USA, Apr. 2010.

[KGN11] Kivran-Swaine, F., Govindan, P., and Naaman, M., “The Impact of Network Structure on Breaking Ties in Online Social Networks: Unfollowing on Twitter.” In Proc. ACM SIGCHI Conf. on Human Factors in Computing Systems (SIGCHI ’11), Vancouver, Canada, May 2011.

[KLP+10] Kwak, H., Lee, C., Park, H., and Moon, S., “What is Twitter, a Social Network or a News Media?” In Proc. 19th Int’l Conf. on World Wide Web (WWW 2010), Raleigh, USA, Apr. 2010, pp. 591-600.

[LJS+10] Lee, Y., Jung, H., Song, W., and Lee, J.H., “Mining the Blogosphere for Top News Stories Identification.” In Proc. 33rd ACM SIGIR Conf. on Research and Development in Information Retrieval (SIGIR ’10), Geneva, Switzerland, July 2010.

[MGL09] Melville, P., Gryc, W., and Lawrence, R.D., “Sentiment Analysis of Blogs by Combining Lexical Knowledge with Text Classification.” In Proc. 15th ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (KDD ’09), Paris, France, June 2009.

[OBR+10] O’Connor, B., Balasubramanyan, R., Routledge, B.R., and Smith, N.A., “From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series.” In Proc. Int’l AAAI Conf. on Weblogs and Social Media (ICWSM 2010), Washington DC, USA, May 2010.

Page 32: 02 The Web Goes Social

[email protected]

THANK YOU VERY MUCH!

Page 33: 02 The Web Goes Social

References

[QYS+10] Qureshi, M.A., Younus, A., Saeed, M., and Touheed, N.T., “Identifying and Ranking Topic Clusters in the Blogosphere.” In Proc. 20th Int’l Conf. on World Wide Web (WWW 2011), Hyderabad, India, Mar. 2011.

[RMK11] Romero, D., Meeder, B., and Kleinberg, J., “Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags and Complex Contagion on Twitter.” In Proc. 20th Int’l Conf. on World Wide Web (WWW 2011), Hyderabad, India, Mar. 2011.

[SOM10] Sakaki, T., Okazaki, M., and Matsuo, Y., “Earthquake Shakes Twitter Users: Real-Time Event Detection by Social Sensors.” In Proc. 19th Int’l Conf. on World Wide Web (WWW 2010), Raleigh, USA, Apr. 2010, pp. 851-860.

[YQG+11] Younus, A., Qureshi, M.A., Ghazi, A.N., Mumtaz, S., Saeed, M., Touheed, N.T., and Qureshi, M.S., “Ins and Outs of News: Twitter as a Real-Time News Analysis Service.” In Proc. IUI Workshop on Visual Interfaces to the Social and Semantic Web (VISSW ’11), Stanford University, California, USA, Feb. 2011.

[YSK+09] Yoon, S.H., Shin, J.H., Kim, S.W., and Park, S., “Extraction of a Latent Blog Community based on Subject.” In Proc. 18th ACM Conference on Information and Knowledge Management (CIKM ’09), Hong Kong, China, Nov. 2009.
