social filtering. computational journalism week 5

Frontiers of Computational Journalism Columbia Journalism School Week 5: Social Filtering October 9, 2015

Upload: jonathan-stray

Post on 05-Dec-2015

DESCRIPTION

Jonathan Stray, Columbia University, Fall 2015. Syllabus at http://www.compjournalism.com/?p=133

TRANSCRIPT

Page 1: Social Filtering. Computational Journalism week 5

Frontiers of Computational Journalism

Columbia Journalism School

Week 5: Social Filtering

October 9, 2015

Page 2: Social Filtering. Computational Journalism week 5

User

Page 3: Social Filtering. Computational Journalism week 5

[Diagram labels: User; filtering; x = stories not covered]

Page 4: Social Filtering. Computational Journalism week 5
Page 5: Social Filtering. Computational Journalism week 5
Page 6: Social Filtering. Computational Journalism week 5

[Diagram: stories marked x, as on the earlier slide]

Who the user chooses to follow = social filtering

Page 7: Social Filtering. Computational Journalism week 5

Twitter follower network

“We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks”

- Kwak et al., What is Twitter, a Social Network or a News Media?

Page 8: Social Filtering. Computational Journalism week 5

More “followings” than followers

Page 9: Social Filtering. Computational Journalism week 5

Small avg distance between nodes

Page 10: Social Filtering. Computational Journalism week 5

It’s a news network - hubs

Page 11: Social Filtering. Computational Journalism week 5

It’s a news network

Small number of high-degree hubs

Different network structure than e.g. Facebook.

Different uses.

why?

Page 12: Social Filtering. Computational Journalism week 5
Page 13: Social Filtering. Computational Journalism week 5
Page 14: Social Filtering. Computational Journalism week 5
Page 15: Social Filtering. Computational Journalism week 5

- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson: Net Neutrality, Algorithmic Filtering and Ferguson

Page 16: Social Filtering. Computational Journalism week 5

- John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson

Data from SocialReach, which works with many publishers

Page 17: Social Filtering. Computational Journalism week 5

- Sunita, Why #Ferguson broke out on Twitter, not Facebook

Page 18: Social Filtering. Computational Journalism week 5

Information flow on Facebook

Page 19: Social Filtering. Computational Journalism week 5

Finding sources on social media

Page 20: Social Filtering. Computational Journalism week 5
Page 21: Social Filtering. Computational Journalism week 5

Classify Users

Classic machine learning problem. Classify each user as one of:
•  journalist/blogger
•  organization
•  ordinary individual

First, need to encode as a vector / select features...

Page 22: Social Filtering. Computational Journalism week 5

Features for user classifier

•  # of followers / following
•  # of posts, favorites
•  percentage of posts that are RTs, @replies, links
•  presence/absence of named entities
•  topic distribution of tweets (IPTC top level topics)
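The encoding step can be sketched as follows. This is a minimal illustration, not the system described on the slides: the `Tweet` class, the `user_features` function and the string tests for @replies and links are all assumptions made for the example, and the named-entity and topic-distribution features are left out because they need separate models.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal tweet record; real tweet metadata has many more fields.
    text: str
    is_retweet: bool = False


def fraction(tweets, predicate):
    """Share of tweets that satisfy predicate; 0.0 for a user with no tweets."""
    if not tweets:
        return 0.0
    return sum(1 for t in tweets if predicate(t)) / len(tweets)


def user_features(followers, following, favorites, tweets):
    """Encode one user as a fixed-length numeric vector."""
    return [
        float(followers),
        float(following),
        float(len(tweets)),                                   # number of posts
        float(favorites),
        fraction(tweets, lambda t: t.is_retweet),             # share of RTs
        fraction(tweets, lambda t: t.text.startswith("@")),   # share of @replies
        fraction(tweets, lambda t: "http://" in t.text or "https://" in t.text),
    ]
```

Every user maps to a vector of the same length, which is what a distance-based classifier needs.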

Page 23: Social Filtering. Computational Journalism week 5

Digression: IPTC Media Topic Codes

International standard hierarchical taxonomy, part of the NewsML markup system. Defined by Reuters, AP, NYTimes...

Page 24: Social Filtering. Computational Journalism week 5

K-nearest neighbor classifier

Take K closest training points (in high dimensional feature space), choose majority label.
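A minimal sketch of that rule, assuming Euclidean distance (the slides do not say which metric was used) and a hypothetical `knn_classify` function:

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """train is a list of (feature_vector, label) pairs.

    Returns the majority label among the k training points closest to query.
    Euclidean distance is an assumption made for this sketch.
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


train = [
    ([0.0, 0.0], "individual"),
    ([0.1, 0.2], "individual"),
    ([1.0, 1.0], "journalist"),
    ([0.9, 1.1], "journalist"),
    ([5.0, 5.0], "organization"),
]
label = knn_classify(train, [1.0, 0.9], k=3)
```

In practice, raw counts such as number of followers would swamp the percentage features in any distance computation, so features are usually rescaled to comparable ranges first.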

Page 25: Social Filtering. Computational Journalism week 5

Creating the training data

1,850 random users
1,532 known organizations
1,490 known journalists and bloggers

Hired Mechanical Turk workers to apply labels. Each user labeled by two workers, discarded if disagreement.
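The agreement filter is simple enough to sketch directly. The data layout (a dict from user name to a pair of labels) and the function name `agreed_labels` are assumptions made for the example:

```python
def agreed_labels(annotations):
    """annotations maps each user to the two labels given by the two workers.

    Keeps only users on whom both workers agree, and returns user -> label.
    """
    return {user: a for user, (a, b) in annotations.items() if a == b}


raw = {
    "user_a": ("journalist", "journalist"),
    "user_b": ("organization", "individual"),  # disagreement: discarded
    "user_c": ("individual", "individual"),
}
training_labels = agreed_labels(raw)
```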

Page 26: Social Filtering. Computational Journalism week 5

Classifier Accuracy

Page 27: Social Filtering. Computational Journalism week 5

“Eyewitness” classifier

Goal is to find individual tweets that are eyewitness reports.

Started with LIWC (“linguistic inquiry and word count”) dictionary that classifies English words along 70 different dimensions, including emotion, cognition, time, health...

Page 28: Social Filtering. Computational Journalism week 5

Word Aspects

Used “perception” category words plus “insight” and “certainty” words

Page 29: Social Filtering. Computational Journalism week 5

Eyewitness tweet classifier

It’s an eyewitness tweet if it contains any of these special words! (or their stems)

High precision! Low recall.
•  89% of tweets classified as eyewitness actually were.
•  But only 32% of eyewitness tweets detected.
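A sketch of a keyword rule of this kind, together with the precision and recall calculation. The stem list below is hypothetical and is not the LIWC word list; prefix matching stands in for stemming, which is a simplification.

```python
import re

# Hypothetical stems chosen for illustration only; not the LIWC categories.
EYEWITNESS_STEMS = ("see", "saw", "hear", "watch", "felt")


def is_eyewitness(text):
    """True if any word in the tweet starts with one of the stems."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok.startswith(EYEWITNESS_STEMS) for tok in tokens)


def precision_recall(predicted, actual):
    """predicted and actual are parallel lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)       # flagged, truly eyewitness
    fp = sum(1 for p, a in pairs if p and not a)   # flagged, not eyewitness
    fn = sum(1 for p, a in pairs if not p and a)   # eyewitness, missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision is the share of flagged tweets that really are eyewitness reports (the 89% figure); recall is the share of all eyewitness reports that get flagged (the 32% figure). A short word list tends to give exactly this pattern: few false alarms, many misses.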

Page 30: Social Filtering. Computational Journalism week 5

Other dimensions

•  Tweet contains URL to photo or video (used table of domain names, e.g. flickr.com = photo)
•  Posted from mobile device (from tweet metadata naming posting app)
•  Geocode user’s stated location (this is painful and unreliable)
•  Distribution of friends’ locations. (Friend = mutual following)
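The domain-table lookup for photo and video links might look like the sketch below. Only the flickr.com entry comes from the slide; the youtube.com entry and the function name `media_type` are assumptions for illustration.

```python
from urllib.parse import urlparse

# flickr.com = photo is from the slide; the second entry is an assumed example.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}


def media_type(url):
    """Return "photo" or "video" if the URL's domain is in the table, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return MEDIA_DOMAINS.get(host)
```

A real table would also need to resolve shortened links before the lookup, since tweets rarely contain the final URL.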

Page 31: Social Filtering. Computational Journalism week 5
Page 32: Social Filtering. Computational Journalism week 5

Test user reactions

“This gives you context… you have the context for whether or not you think they’re reputable or whether or not they’re worth reaching out to.”

“It’s giving me a lot of context which is really useful when you’re trying to verify if someone is reputable or not.”

“I would tend to focus on the eyewitnesses and journalists/bloggers. Eventually I’d look at everyone else but I’d want to start my search with those two groups because they would normally provide me with the most information.”

Page 33: Social Filtering. Computational Journalism week 5

Test user reactions

Popular features:

Eyewitness filtering, user location, image/video filter

Unpopular features:

Entity extraction not helpful, no ability to filter by location and eyewitness status, focus on users instead of content

Page 34: Social Filtering. Computational Journalism week 5

Social Software

Basic assumption: structure of software influences how groups use it.

or: architecture influences behavior

Page 35: Social Filtering. Computational Journalism week 5

Three ways to influence behavior

•  Norms: culture, habits, etiquette, the user’s sense of what is “right” or “appropriate”
•  Laws: rules enforced by the administrator
•  Code: what it is actually possible to do

Page 36: Social Filtering. Computational Journalism week 5

Design problem...

What do we want the users to accomplish together?
How do we encourage this?

We can write the code, but the culture is a separate issue.