social filtering. computational journalism week 5

Frontiers of Computational Journalism

Columbia Journalism School

Week 5: Social Filtering

October 9, 2015

[Figure: diagram with labels “User”, “stories not covered”, and “filtering”]

who user chooses to follow = social filtering

Twitter follower network

“We have crawled the entire Twitter site and obtained 41.7 million user profiles, 1.47 billion social relations, 4,262 trending topics, and 106 million tweets. In its follower-following topology analysis we have found a non-power-law follower distribution, a short effective diameter, and low reciprocity, which all mark a deviation from known characteristics of human social networks”

- Kwak et al., What is Twitter, a Social Network or a News Media?

More “followings” than followers

Small avg distance between nodes

It’s a news network - hubs

It’s a news network

Small number of high-degree hubs

Different network structure than e.g. Facebook.

Different uses.

why?

- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson: Net Neutrality, Algorithmic Filtering and Ferguson

- John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson

data from SocialReach, who works with many publishers

- Sunita, Why #Ferguson broke out on Twitter, not Facebook

Information flow on Facebook

Finding sources on social media

Classify Users

Classic machine learning problem. Classify each user as one of:

•  journalist/blogger
•  organization
•  ordinary individual

First, need to encode as a vector / select features...

Features for user classifier

•  # of followers / following
•  # of posts, favorites
•  percentage of posts that are RTs, @replies, links
•  presence/absence of named entities
•  topic distribution of tweets (IPTC top level topics)
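A minimal sketch of how the features listed above could be encoded as a numeric vector. The `UserStats` fields, the `to_feature_vector` helper, and the three-topic `TOPICS` list are all hypothetical names chosen for illustration; the real system used the IPTC top-level topics and its exact encoding is not given in these slides.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the IPTC top-level topics (the real list is longer).
TOPICS = ["politics", "sport", "economy"]


@dataclass
class UserStats:
    """Raw per-user counts; field names are assumptions for this sketch."""
    followers: int
    following: int
    posts: int
    favorites: int
    retweets: int
    replies: int
    posts_with_links: int
    has_named_entities: bool
    topic_counts: dict  # topic name -> number of tweets on that topic


def to_feature_vector(u):
    """Encode one user as a flat list of floats."""
    n = max(u.posts, 1)  # avoid division by zero for users with no posts
    topic_total = sum(u.topic_counts.get(t, 0) for t in TOPICS) or 1
    counts = [float(u.followers), float(u.following),
              float(u.posts), float(u.favorites)]
    fractions = [u.retweets / n, u.replies / n, u.posts_with_links / n]
    entities = [1.0 if u.has_named_entities else 0.0]
    topics = [u.topic_counts.get(t, 0) / topic_total for t in TOPICS]
    return counts + fractions + entities + topics


example = UserStats(followers=100, following=50, posts=200, favorites=10,
                    retweets=50, replies=20, posts_with_links=100,
                    has_named_entities=True,
                    topic_counts={"politics": 3, "sport": 1})
print(to_feature_vector(example))
```

In practice the raw counts would probably need rescaling (for example a log transform) before being compared with the fractions, since follower counts can be orders of magnitude larger than the other features.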

Digression: IPTC Media Topic Codes

International standard hierarchical taxonomy, part of the NewsML markup system. Defined by Reuters, AP, NYTimes...

K-nearest neighbor classifier

Take K closest training points (in high dimensional feature space), choose majority label.
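The rule above can be sketched in a few lines. This is an illustrative version only: it assumes Euclidean distance and toy two-dimensional data, and the `knn_classify` name and labels are made up for the example. The slides do not say which distance metric or value of K was used.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs.
    """
    # Sort all training points by Euclidean distance to the query, keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated clusters.
train = [
    ([0.0, 0.0], "individual"),
    ([0.0, 1.0], "individual"),
    ([1.0, 0.0], "individual"),
    ([5.0, 5.0], "journalist"),
    ([5.0, 6.0], "journalist"),
    ([6.0, 5.0], "journalist"),
]

print(knn_classify(train, [0.5, 0.5]))
print(knn_classify(train, [5.5, 5.5]))
```

Because the vote depends on raw distances, features on very different scales (follower counts versus percentages) would dominate each other unless they are normalized first.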

Creating the training data

•  1,850 random users
•  1,532 known organizations
•  1,490 known journalists and bloggers

Hired Mechanical Turk workers to apply labels. Each user labeled by two workers, discarded if disagreement.
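A small sketch of the agreement rule described above: keep a user only when both workers gave the same label. The `keep_agreed` name, the account names, and the labels are invented for illustration.

```python
def keep_agreed(labels):
    """Keep users whose two worker labels match; drop the rest.

    labels maps a user name to a (label_from_worker_1, label_from_worker_2) pair.
    """
    return {user: a for user, (a, b) in labels.items() if a == b}


# Hypothetical worker output for three accounts.
raw_labels = {
    "user_a": ("journalist", "journalist"),
    "user_b": ("organization", "individual"),  # disagreement, discarded
    "user_c": ("individual", "individual"),
}

print(keep_agreed(raw_labels))
```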

Classifier Accuracy

“Eyewitness” classifier

Goal is to find individual tweets that are eyewitness reports. Started with LIWC (“linguistic inquiry and word count”) dictionary that classifies English words along 70 different dimensions, including emotion, cognition, time, health...

Word Aspects

Used “perception” category words plus “insight” and “certainty” words

Eyewitness tweet classifier

It’s an eyewitness tweet if it contains any of these special words! (or their stems)

High precision! Low recall.

•  89% of tweets classified as eyewitness actually were.
•  But only 32% of eyewitness tweets detected.
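To make the precision/recall trade-off concrete, here is a rough sketch of a word-list classifier and the two metrics. The word set below is a small hypothetical sample, not the actual LIWC categories, and the sketch matches whole words only (no stemming). The tweets and their labels are invented.

```python
import re

# Hypothetical sample of perception words; NOT the real LIWC list.
EYEWITNESS_WORDS = {"see", "saw", "seen", "hear", "heard", "feel", "felt"}


def is_eyewitness(text):
    """Flag a tweet if it contains any word from the list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return any(tok in EYEWITNESS_WORDS for tok in tokens)


def precision_recall(predicted, actual):
    """Compute (precision, recall) from parallel lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented tweets with hand-assigned "is an eyewitness report" labels.
tweets = [
    ("I just saw the explosion from my window", True),
    ("Police say the road is closed", False),
    ("Smoke everywhere on my street, we are evacuating", True),
    ("The building is on fire right in front of me", True),
]

predicted = [is_eyewitness(text) for text, _ in tweets]
actual = [label for _, label in tweets]
print(precision_recall(predicted, actual))
```

On this toy data every flagged tweet really is an eyewitness report, but two of the three real reports are missed because they use none of the listed words. That is the same pattern as the 89% precision and 32% recall reported above.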

Other dimensions

•  Tweet contains URL to photo or video (used table of domain names, e.g. flickr.com = photo)
•  Posted from mobile device (from tweet metadata naming posting app)
•  Geocode user’s stated location (this is painful and unreliable)
•  Distribution of friends’ locations (friend = mutual following)
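The domain-name table for detecting photo and video links might look something like the sketch below. Only the flickr.com entry comes from the slide; the other entry and the `media_type` helper are assumptions added for illustration.

```python
from urllib.parse import urlparse

# flickr.com = photo is from the slide; youtube.com is an assumed extra entry.
MEDIA_DOMAINS = {
    "flickr.com": "photo",
    "youtube.com": "video",
}


def media_type(url):
    """Return "photo" or "video" if the URL's domain is in the table, else None."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return MEDIA_DOMAINS.get(host)


print(media_type("https://www.flickr.com/photos/example"))
print(media_type("https://example.com/article"))
```

A real version would also have to expand shortened links before the lookup, since tweets usually carry shortened URLs rather than the final domain.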

Test user reactions

“This gives you context… you have the context for whether or not you think they’re reputable or whether or not they’re worth reaching out to.”

“It’s giving me a lot of context which is really useful when you’re trying to verify if someone is reputable or not.”

“I would tend to focus on the eyewitnesses and journalists/bloggers. Eventually I’d look at everyone else but I’d want to start my search with those two groups because they would normally provide me with the most information.”

Test user reactions

Popular features:

Eyewitness filtering, user location, image/video filter

Unpopular features:

Entity extraction not helpful, no ability to filter by location and eyewitness status, focus on users instead of content

Social Software

Basic assumption: structure of software influences how groups use it.

or: architecture influences behavior

Three ways to influence behavior

Norms: culture, habits, etiquette, the user’s sense of what is “right” or “appropriate”

Laws: rules enforced by the administrator

Code: what it is actually possible to do

Design problem...

What do we want the users to accomplish together?

How do we encourage this?

We can write the code, but the culture is a separate issue.
