Characterizing the Life Cycle of Online News Stories Using Social Media Reactions
Carlos Castillo, Mohammed El-Haddad, Matt Stempeck, Jürgen Pfeffer
Twitter: @ChaToX

Post on 16-Jul-2015



Carlos Castillo – @chatox – http://www.chato.cl/research/

Outline

• Determining classes of news articles
• Predicting traffic using social media


Usage analysis in online news

• Aikat (1998)
  – Short dwell times, more traffic on weekdays than on weekends, bursty traffic
• Crane and Sornette (2008), Yang and Leskovec (2011), Lehmann et al. (2012)
  – Behavioral classes of attention online


Analysis of social media responses

• SocialFlow whitepaper (Lotan, Gaffney, and Meyer 2011)
  – Al Jazeera, BBC News, CNN, The Economist, Fox News and The New York Times
• Hu et al. (2011)
  – Tweets during a speech by the US president


Predictive Web Analytics (references)


Data collection

• Three weeks in October 2012
• “Beacon” embedded in Al Jazeera pages
  – Real-time data processing
  – Apache S4 application for online processing
  – Cassandra (NoSQL database) for storage
• ≈ 3M visits
• ≈ 200K social media reactions


Summary of dataset


News
Examples:
• US state of Maryland abolishes death penalty (May 2nd, 2013)
• Hundreds arrested in China over 'fake' meat (May 3rd, 2013)

In-Depth
Examples:
• Spirits of Japan shrine haunt Asian relations (May 2nd, 2013)
• Interactive: Powering the Gulf (May 2nd, 2013)


Tag clouds extracted from titles of articles: News (322), In-Depth (139)

Average News profile

Average In-Depth profile

In-Depth items have a slower growth

In-Depth items have a longer shelf-life

In-Depth items are shared on Facebook

News items are shared on Twitter


Typical visitation profiles (12 hours)

Decreasing (78%)

Steady (9%)

Increasing (3%)

Rebounding (10%)

Examples

Decreasing (78%):

● Almost all breaking news

● Sometimes delayed due to timezone differences, e.g. Hurricane Sandy

Steady or Increasing (12%):

● Ongoing news: Obama/Romney, Worker strikes in SA, Syrian unrest

● Articles updated with supporting content

Rebounding (10%):

● Articles picked up by external sources or social media (typically single source of traffic)

● Background articles to new developments
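
The four visitation profiles could be told apart with a simple rule of thumb. The following is a minimal sketch, not the method used in the study: it assumes a 12-hour series of visit counts, compares the average of the first, middle and last third, and uses a hypothetical tolerance `tol` to decide what counts as a clear change. The function name and the threshold are illustrative assumptions.

```python
from typing import Sequence


def classify_profile(visits: Sequence[float], tol: float = 0.25) -> str:
    """Label a visit series as decreasing, steady, increasing or rebounding.

    Heuristic only (an assumption, not the study's procedure): compare the
    mean traffic in the first, middle and last third of the series.
    """
    n = len(visits)
    if n < 3:
        raise ValueError("need at least three observations")
    third = n // 3
    first = sum(visits[:third]) / third
    middle = sum(visits[third:n - third]) / (n - 2 * third)
    last = sum(visits[n - third:]) / third

    # Rebound: traffic dips clearly in the middle, then recovers clearly.
    if middle < first * (1 - tol) and last > middle * (1 + tol):
        return "rebounding"
    if last > first * (1 + tol):
        return "increasing"
    if last < first * (1 - tol):
        return "decreasing"
    return "steady"


# Hypothetical hourly visit counts for one article over 12 hours.
breaking_news = [120, 100, 80, 60, 50, 40, 30, 25, 20, 15, 10, 8]
print(classify_profile(breaking_news))
```

A fixed tolerance is the simplest choice; in practice the threshold would need tuning against labeled examples.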


Prediction of visits

• Short-term traffic is to a large extent correlated with long-term traffic

• Social media signals are correlated with traffic and shelf-life

More reactions → more traffic
More discussion → longer shelf-life

• Can we predict traffic after 7 days from the first 30 minutes?
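
One common way to turn the short-term/long-term correlation into a prediction is a linear fit in log space. The sketch below is an assumption about how such a baseline could look, not the model from the talk: it fits log(visits after 7 days) against log(visits in the first 30 minutes) by least squares. The function names and the training numbers are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(early: Sequence[float], final: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(final) = slope * log(early) + intercept.

    All counts must be strictly positive.
    """
    xs = [math.log(v) for v in early]
    ys = [math.log(v) for v in final]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_final(early_visits: float, slope: float, intercept: float) -> float:
    """Extrapolate long-term visits from an early visit count."""
    return math.exp(slope * math.log(early_visits) + intercept)


# Hypothetical history: (visits in first 30 minutes, visits after 7 days).
history = [(40, 900), (120, 2500), (15, 400), (300, 5200), (75, 1600)]
slope, intercept = fit_log_linear([e for e, _ in history], [f for _, f in history])
print(round(predict_final(100, slope, intercept)))
```

Working in log space keeps a few very popular articles from dominating the fit.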


Predicting traffic and shelf-life online has a long history

• Predicting long-term behavior and half-life from short-term observations
  – Observations = comments, visits, votes, …
  – Behavior = total comments, total visits, …
  – 10+ papers specifically on web traffic
• Bit.ly (2011, 2012)
  – Studies half-life per topic and platform

Results (traffic predictions)

Extrapolate visits

News are more predictable than In-Depth

Results (traffic predictions)

Improved predictions using social media variables
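
To illustrate how social media variables can be added to a visits-only model, here is a minimal sketch of a multi-feature least-squares regression. It is not the model or the variable set from the talk: the features (early tweets and early Facebook shares), the function names and the sample numbers are all hypothetical, and the fit uses the plain normal equations for brevity.

```python
import math
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs: Sequence[float], row: Sequence[float]) -> float:
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def log_features(visits_30m: float, tweets_30m: float, fb_shares_30m: float) -> List[float]:
    # log1p keeps articles with zero reactions finite.
    return [math.log1p(visits_30m), math.log1p(tweets_30m), math.log1p(fb_shares_30m)]


# Hypothetical articles: (early visits, early tweets, early shares, visits after 7 days).
articles = [(40, 5, 2, 900), (120, 30, 4, 2500), (15, 0, 1, 400),
            (300, 45, 60, 5200), (75, 10, 25, 1600), (200, 80, 10, 4100)]
coefs = fit_ols([log_features(v, t, f) for v, t, f, _ in articles],
                [math.log(final) for *_, final in articles])
print(round(math.exp(predict(coefs, log_features(100, 20, 5)))))
```

Whether the extra variables help would be judged by comparing held-out prediction error with and without them.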


Selected variables, traffic prediction

Results (shelf-life prediction)

Larger improvements for In-Depth articles

Still, this is a 12-hour error in predicting something that averages 48-72 hours
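
To make the shelf-life figures concrete, the sketch below computes a shelf-life from an hourly visit series and expresses a prediction error relative to a typical shelf-life. The definition used here is an assumption, since the slides do not state one: shelf-life is taken as the number of hours needed to accumulate a given fraction (hypothetically 90%) of the visits in the observed series.

```python
from typing import Sequence


def shelf_life_hours(hourly_visits: Sequence[float], fraction: float = 0.9) -> int:
    """Hours needed to accumulate `fraction` of all visits in the series.

    Assumed definition for illustration; the 0.9 default is hypothetical.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = sum(hourly_visits)
    if total <= 0:
        raise ValueError("series has no visits")
    running = 0.0
    for hour, count in enumerate(hourly_visits, start=1):
        running += count
        if running >= fraction * total:
            return hour
    return len(hourly_visits)


def relative_error(abs_error_hours: float, typical_hours: float) -> float:
    """Absolute error as a share of the quantity being predicted."""
    return abs_error_hours / typical_hours


# The 12-hour error against the 48-72 hour average quoted on the slide.
print(relative_error(12, 48), relative_error(12, 72))
```

Against a 48-72 hour average, a 12-hour error corresponds to a relative error of roughly 17% to 25%.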


http://fast.qcri.org/


What did we learn?

• Decrease; Stay or Increase; Rebound
  – Roughly 80:10:10 ratio
• News vs In-Depth: different behavior
• Social media signals are useful to understand and predict visits


Invitation: ECML/PKDD Discovery Challenge 2014

• Open competition on predictive Web Analytics

• Data provided by Chartbeat Inc.

Thank you!

Carlos Castillo · [email protected]

http://www.chato.cl/research/