Inferring User Political Preferences from Streaming Communications Svitlana Volkova 1 , Glen Coppersmith 2 and Benjamin Van Durme 1,2 1 Center for Language and Speech Processing 2 Human Language Technology Center of Excellence ACL 2014, Baltimore

Upload: truman

Post on 24-Feb-2016


TRANSCRIPT

Page 1: Inferring User Political Preferences from Streaming Communications

Inferring User Political Preferences from Streaming Communications

Svitlana Volkova(1), Glen Coppersmith(2) and Benjamin Van Durme(1,2)

(1) Center for Language and Speech Processing
(2) Human Language Technology Center of Excellence

ACL 2014, Baltimore

Page 2

Motivation
• Personalized, diverse and timely data
• Can reveal user interests, preferences and opinions

DemographicsPro – http://www.demographicspro.com/
WolframAlpha Analytics – http://www.wolframalpha.com/facebook/

Page 3

Applications
• Large-scale passive polling and real-time live polling
• Online advertising
• Healthcare analytics
• Personalized recommendation systems and search

Page 4

User Attribute Prediction

Political Preference: Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013

Gender: Garera and Yarowsky, 2009; Rao et al., 2010; Burger et al., 2011; Van Durme, 2012; Zamal et al., 2012; Bergsma and Van Durme, 2013

Age: Rao et al., 2010; Zamal et al., 2012; Cohen and Ruths, 2013; Nguyen et al., 2011, 2013

[Figure: user attributes (political preference, gender, age, ...) predicted from user communications]

Page 5

Existing Approaches: ~1K tweets per user*, with the tweets treated as a document

Does an average Twitter user produce thousands of tweets?

*Rao et al., 2010; Conover et al., 2011; Pennacchiotti and Popescu, 2011a; Burger et al., 2011; Zamal et al., 2012; Nguyen et al., 2013

Page 6

How Active are Twitter Users?

http://www.digitalbuzzblog.com/visualizing-twitter-statistics-x100/

Page 7

Real-World Predictions

• Not active users: no or limited content
• Average Twitter users: median = 10 tweets per day
• Active users: 1,000+ tweets
• Private users: no content

[Figure: pie chart of these user groups; segment labels 10%, 50%, 20%, 20%]

Page 8

Our Approach

1. Take advantage of user local neighborhoods

2. Incremental dynamic real-time predictions

Real-world batch predictions

Streaming predictions

Page 9

Our Approach

1. Take advantage of user local neighborhoods

2. Incremental dynamic real-time predictions

Real-world batch predictions

Page 10

Attributed Social Network

User Local Neighborhoods a.k.a. Social Circles

Page 11

Twitter Network Data

Code, data and trained models for gender, age, political preference prediction

http://www.cs.jhu.edu/~svitlana/

Page 12

Twitter Social Graph
I. Candidate-Centric: 1,031 users of interest
II. Geo-Centric: 270 users
III. Politically Active*: 371 users

10–20 neighbors of each type per user; ~50K nodes, ~60K edges

What types of neighbors lead to the best attribute prediction for a given user?

*Pennacchiotti and Popescu, 2011; Zamal et al., 2012; Cohen and Ruths, 2013
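To make "neighbors of each type per user" concrete, here is a minimal sketch of building typed local neighborhoods from an interaction edge list. The function `build_neighborhoods`, the edge-tuple format and the ranking rule (most frequent interaction partners first, ties broken alphabetically) are hypothetical illustrations, not the authors' sampling procedure.

```python
from collections import Counter, defaultdict

# Neighbor types mentioned on the slides (friend, follower, @mention, retweet).
NEIGHBOR_TYPES = ("friend", "follower", "mention", "retweet")


def build_neighborhoods(edges, n):
    """Return {user: {neighbor_type: [neighbor, ...]}} with at most n neighbors per type.

    edges: iterable of (user, neighbor, neighbor_type) tuples, one per observed
    interaction. ASSUMPTION: neighbors are ranked by how often the user interacts
    with them (ties broken alphabetically); the original sampling rule may differ.
    """
    counts = defaultdict(Counter)
    for user, neighbor, neighbor_type in edges:
        if neighbor_type not in NEIGHBOR_TYPES:
            raise ValueError(f"unknown neighbor type: {neighbor_type}")
        if neighbor != user:  # ignore self-mentions and self-retweets
            counts[(user, neighbor_type)][neighbor] += 1

    neighborhoods = defaultdict(dict)
    for (user, neighbor_type), counter in counts.items():
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        neighborhoods[user][neighbor_type] = [nb for nb, _ in ranked[:n]]
    return dict(neighborhoods)


edges = [
    ("u1", "a", "mention"), ("u1", "b", "mention"), ("u1", "a", "mention"),
    ("u1", "c", "retweet"), ("u1", "d", "friend"), ("u1", "e", "friend"),
]
print(build_neighborhoods(edges, n=1))
# {'u1': {'mention': ['a'], 'retweet': ['c'], 'friend': ['d']}}
```

Keeping each neighbor type in its own list makes it easy to compare the types against each other, which is the question the slide poses.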


Page 13

Experiments
• Log-linear binary unigram models: (I) Users vs. (II) Neighbors and (III) Both
• Evaluate the relative utility of different neighborhood types:
  – varying neighborhood size n = [1, 2, 5, 10] and content amount t = [5, 10, 15, 25, 50, 100, 200]
  – 10-fold cross-validation with 100 random restarts for every (n, t) parameter combination
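A log-linear binary model over unigram presence features amounts to logistic regression on a bag of words. Below is a minimal stdlib-only sketch of that idea and of the three feature setups (user, neighbors, both). Whitespace tokenization, plain SGD, the learning rate and the L2 penalty are all assumptions for illustration, not the configuration used in the talk.

```python
import math
import random


def binary_unigrams(tweets):
    """Binary unigram bag: the set of lowercased whitespace tokens in the tweets."""
    return {token for tweet in tweets for token in tweet.lower().split()}


def features(user_tweets, neighbor_tweets, mode):
    """Features for one user under setup (I) "user", (II) "neighbors" or (III) "both"."""
    if mode == "user":
        return binary_unigrams(user_tweets)
    if mode == "neighbors":
        return binary_unigrams(neighbor_tweets)
    if mode == "both":
        return binary_unigrams(user_tweets) | binary_unigrams(neighbor_tweets)
    raise ValueError(f"unknown mode: {mode}")


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(examples, epochs=50, lr=0.1, l2=1e-3, seed=0):
    """Fit binary logistic regression (a log-linear model) with plain SGD.

    examples: list of (feature_set, label) pairs, label in {0, 1}.
    Returns (weights, bias); features never seen have weight 0.
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    data = list(examples)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, label in data:
            p = sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))
            grad = p - label  # derivative of the log loss w.r.t. the logit
            bias -= lr * grad
            for f in feats:
                w = weights.get(f, 0.0)
                weights[f] = w - lr * (grad + l2 * w)
    return weights, bias


def predict_proba(model, feats):
    """P(label = 1 | features) under the trained model."""
    weights, bias = model
    return sigmoid(bias + sum(weights.get(f, 0.0) for f in feats))


# Toy usage: labels 1 and 0 stand for the two political preferences.
train = [
    (features(["Lower taxes now"], ["GOP rally tonight"], "both"), 1),
    (features(["Cut taxes"], ["GOP wins"], "both"), 1),
    (features(["Healthcare for all"], ["DNC rally tonight"], "both"), 0),
    (features(["Protect healthcare"], ["DNC wins"], "both"), 0),
]
model = train_logreg(train)
print(predict_proba(model, {"gop", "taxes"}) > 0.5)  # True
```

Binary features record only whether a word occurs, not how often, so one very prolific neighbor cannot dominate the feature vector just by repeating words.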

Page 14

Neighborhood Comparison

[Figure: accuracy vs. tweets per neighbor; panels: 1 neighbor, 10 neighbors]
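Accuracy curves like these come from sweeping the n × t grid listed under Experiments. Below is a sketch of that sweep with 10-fold cross-validation and repeated random restarts. Two interpretations here are assumptions: that each restart redraws both the neighbor sample and the fold split, and that the first t tweets of each neighbor are kept. `evaluate_grid` and `majority_baseline` are hypothetical helpers; a real classifier (for example, the logistic-regression sketch above) would replace the baseline scorer.

```python
import random
from statistics import mean

# Parameter grid from the Experiments slide.
NEIGHBORHOOD_SIZES = [1, 2, 5, 10]
TWEETS_PER_NEIGHBOR = [5, 10, 15, 25, 50, 100, 200]


def k_folds(num_items, k, rng):
    """Randomly partition indices 0..num_items-1 into k folds."""
    idx = list(range(num_items))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def subsample(neighborhood, n, t, rng):
    """Keep n random neighbors and the first t tweets of each (assumed sampling rule)."""
    chosen = rng.sample(sorted(neighborhood), min(n, len(neighborhood)))
    return {nb: neighborhood[nb][:t] for nb in chosen}


def evaluate_grid(users, train_and_score, restarts=100, k=10, seed=0):
    """Mean held-out score for every (n, t) pair.

    users: list of (neighborhood, label); a neighborhood maps neighbor id -> tweets.
    train_and_score(train, test) -> float wraps any classifier.
    """
    if len(users) < k:
        raise ValueError("need at least k users for k-fold cross-validation")
    rng = random.Random(seed)
    results = {}
    for n in NEIGHBORHOOD_SIZES:
        for t in TWEETS_PER_NEIGHBOR:
            scores = []
            for _ in range(restarts):
                sampled = [(subsample(nbhd, n, t, rng), y) for nbhd, y in users]
                for fold in k_folds(len(sampled), k, rng):
                    held_out = set(fold)
                    train = [ex for i, ex in enumerate(sampled) if i not in held_out]
                    test = [sampled[i] for i in fold]
                    scores.append(train_and_score(train, test))
            results[(n, t)] = mean(scores)
    return results


def majority_baseline(train, test):
    """Placeholder scorer: accuracy of always predicting the training majority label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return sum(y == guess for _, y in test) / len(test)


users = [({"a": ["t1", "t2"], "b": ["t3"]}, i % 2) for i in range(20)]
results = evaluate_grid(users, majority_baseline, restarts=2)
print(len(results))  # 28 (n, t) combinations
```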

Page 15

Optimizing Twitter API Calls
Cand-Centric Graph: Friend Circle


Page 19

Summary: Batch Real-World Predictions with Limited User Data

More data is better. How to get it?
• More neighbors per user > additional content from the existing neighbors

What kind of data?
• Follower, friend, @mention, retweet

Real-world predictions: no or very limited content!
• Users who recently joined Twitter
• No or limited access to user tweets

Page 20

Our Approach

1. Take advantage of user local neighborhoods

2. Incremental dynamic real-time predictions

Streaming predictions

Page 21

Iterative Bayesian Predictions

[Figure: a user's unknown (?) preference predicted iteratively as tweets arrive over time]
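The slides do not spell out the update rule. A standard recursive Bayes update is one natural reading of "iterative Bayesian predictions". It assumes a binary preference a written here as D or R (for example, Democrat vs. Republican) and assumes that the tweets t_1, ..., t_k are conditionally independent given a:

```latex
P(a = R \mid t_{1:k})
  = \frac{P(t_k \mid a = R)\, P(a = R \mid t_{1:k-1})}
         {\sum_{a' \in \{D, R\}} P(t_k \mid a = a')\, P(a = a' \mid t_{1:k-1})}

\log \frac{P(a = R \mid t_{1:k})}{P(a = D \mid t_{1:k})}
  = \log \frac{P(a = R)}{P(a = D)}
  + \sum_{i=1}^{k} \log \frac{P(t_i \mid a = R)}{P(t_i \mid a = D)}
```

In the log-odds form, each new tweet, whether from the user or from a neighbor, adds a single log-likelihood-ratio term. The belief can therefore be updated in constant time per tweet as the stream arrives.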

Page 22

Cand-Centric Graph: Belief Updates

[Figure: belief about a user's unknown (?) preference updated over time]

Page 23

Cand-Centric Graph: Prediction Time

[Figure: prediction time in weeks (log scale) for 100 users to reach 75% and 95% confidence on the Cand, Geo and Active graphs, User Stream vs. User-Neighbor stream]

Page 24

Batch vs. Online Performance

[Figure: accuracy (0 to 1) on the Cand, Geo and Active graphs for the User Batch, Neighbor Batch, User Stream and User-Neighbor Stream models; bar labels include 0.72, 0.57 and 0.75, with gains ranging from +0.03 to +0.31]

Page 25

Summary

• Neighborhood content is useful*
• Neighborhoods constructed from friends, user mentions and retweets are most effective
• Signal is distributed in the neighborhood
• Streaming models > batch models

*Pennacchiotti and Popescu, 2011a, 2011b; Conover et al., 2011a, 2011b; Golbeck et al., 2011; Zamal et al., 2012

Page 26

Thank you!

Labeled Twitter network data for gender, age, political preference prediction: http://www.cs.jhu.edu/~svitlana/

Code and pre-trained models available upon request: [email protected]