collecting twitter data w/social feed manager

collecting twitter data w/social feed manager - Daniel Chudnov - @dchud - dchud at gwu edu - ELAG 2013 - 2013-05-30 - Ghent, Belgium - tinyurl.com/dchud-elag-2013

Uploaded by dan-chudnov on 16-Jan-2015


DESCRIPTION

a talk given at ELAG 2013 in Ghent, Belgium, on May 30, 2013.

TRANSCRIPT

Page 1: collecting twitter data w/social feed manager

collecting twitter data

w/social feed manager

Daniel Chudnov - @dchud - dchud at gwu edu

ELAG 2013 - 2013-05-30 - Ghent, Belgium

tinyurl.com/dchud-elag-2013

Page 2

social-feed-manager

• python / django

• user timelines, filter, sample, search

• simple display / export for user timelines

• free software, on github

Page 3

social feed manager

github.com/gwu-libraries/social-feed-manager

Page 4

github.com/gwu-libraries/social-feed-manager

Page 5

a traditional project

Page 6

1. expand scope of collection development

Page 7

2. at-risk e-resource licensing story

Page 8

3. save the time of the researcher

Page 9

let’s start with the researcher

Page 10

“How Mainstream News Outlets Use Twitter” (2011)

• GWU’s Kimberly Gross (SMPA) + students

• Pew Research Center’s Project for Excellence in Journalism

• “news agenda these organizations promoted on Twitter closely matches that of their legacy platforms”

http://www.journalism.org/analysis_report/how_mainstream_media_outlets_use_twitter

Page 11

how do researchers study social media?

Page 12

by hand.

Page 13

• google reader

• copy and paste

• fold, spindle, mutilate

• excel

• ...eventually, SPSS and similar tools

Page 14

whatever help they can get

Page 15

it’s a lot of work for not a lot of data

Page 16

(1000s of tweets)

Page 17

copy and paste to excel

doesn’t scale

just ask any student assigned to do this!

Page 18

first tweet, in native JSON

Page 19

a strategic disadvantage

Page 20

5,000+ theses/dissertations since 2010

(not all CS grad students)

Page 21

see Leetaru et al., First Monday, May 2013

Page 22

librarians can help here

Page 23

what researchers ask for

• specific users, keywords

• historic time periods

• basic values: user, date, text, counts

• 10000s, not 10000000s

• delimited files to import
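The basic values above can be pulled straight out of the JSON each tweet arrives in and written to a delimited file. The sketch below is a minimal illustration, assuming newline-delimited tweets shaped like Twitter API v1.1 output (`id_str`, `created_at`, `text`, `user.screen_name`, `retweet_count`, `favorite_count`); the names `tweet_to_row` and `json_lines_to_csv` are hypothetical and are not part of social-feed-manager.

```python
import csv
import io
import json
from datetime import datetime

# Basic values researchers ask for: user, date, text, counts.
# Field names assume Twitter API v1.1-style tweet JSON; this is an
# illustrative sketch, not social-feed-manager's own export code.
FIELDS = ["id", "screen_name", "created_at", "text",
          "retweet_count", "favorite_count", "followers_count"]


def tweet_to_row(tweet):
    """Flatten one decoded tweet dict into the basic values."""
    # v1.1 timestamps look like "Thu May 30 09:00:00 +0000 2013".
    created = datetime.strptime(tweet["created_at"],
                                "%a %b %d %H:%M:%S %z %Y")
    return {
        "id": tweet["id_str"],
        "screen_name": tweet["user"]["screen_name"],
        "created_at": created.isoformat(),
        "text": tweet["text"],
        "retweet_count": tweet.get("retweet_count", 0),
        "favorite_count": tweet.get("favorite_count", 0),
        "followers_count": tweet["user"].get("followers_count", 0),
    }


def json_lines_to_csv(lines):
    """Convert newline-delimited tweet JSON into CSV text for Excel/SPSS."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            writer.writerow(tweet_to_row(json.loads(line)))
    return out.getvalue()


if __name__ == "__main__":
    sample = json.dumps({
        "id_str": "1", "created_at": "Thu May 30 09:00:00 +0000 2013",
        "text": "hello, ELAG", "retweet_count": 2, "favorite_count": 1,
        "user": {"screen_name": "example_user", "followers_count": 10},
    })
    print(json_lines_to_csv([sample]))
```

CSV is used here because the slides ask for "delimited files to import"; the `csv` module handles quoting, so tweet text containing commas or quotes survives the round trip.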

Page 24

options for historical data?

Page 25

Twitter-licensed data providers:

DataSift, Gnip, Topsy

Page 26

data providers

• friendly

• not cheap

• more than we need

• expensive

• still need tools to collect, process, etc.

Page 27

what can we do ourselves?

Page 28
Page 29

social feed manager

github.com/gwu-libraries/social-feed-manager

Page 30

what researchers ask for

• specific users, keywords

• historic time periods

• basic values: user, date, text, counts

• 10000s, not 10000000s

• delimited files to import

Page 31

can do this free w/public API

Page 32

twitter api

• user timelines

• filter streams

• spritzer

• search

Page 33

up to 3,200 most recent tweets

for any public user, 200 at a time

and go back again for more later

Page 34

dev.twitter.com/docs/working-with-timelines
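The "200 at a time, and go back again for more later" pattern can be sketched as follows, assuming the `max_id` / `since_id` paging approach described in Twitter's working-with-timelines documentation. `fetch_page` is a hypothetical callable standing in for an authenticated timeline request (no real HTTP client is used), and `make_fake_timeline` is an in-memory stand-in so the logic can run offline.

```python
from typing import Callable, Dict, List, Optional

Tweet = Dict[str, int]
# Hypothetical stand-in for one statuses/user_timeline request:
# fetch_page(count, max_id, since_id) returns up to `count` tweets,
# newest first, with since_id < id <= max_id (either bound may be None).
FetchPage = Callable[[int, Optional[int], Optional[int]], List[Tweet]]

PAGE_SIZE = 200      # tweets per request, per the slides
TIMELINE_CAP = 3200  # only the most recent ~3,200 tweets are reachable


def harvest_timeline(fetch_page: FetchPage,
                     since_id: Optional[int] = None) -> List[Tweet]:
    """Page backwards through one user's timeline using max_id."""
    collected: List[Tweet] = []
    max_id: Optional[int] = None
    while len(collected) < TIMELINE_CAP:
        page = fetch_page(PAGE_SIZE, max_id, since_id)
        if not page:
            break  # nothing older (or newer than since_id) is left
        collected.extend(page)
        # max_id is inclusive, so step one below the oldest id seen
        # to avoid fetching that tweet twice.
        max_id = min(t["id"] for t in page) - 1
    return collected[:TIMELINE_CAP]


def newest_id(tweets: List[Tweet]) -> Optional[int]:
    """Save this as since_id for the next run ("go back again for more later")."""
    return max((t["id"] for t in tweets), default=None)


def make_fake_timeline(n: int) -> FetchPage:
    """In-memory test double: a user who has posted tweets 1..n."""
    ids = list(range(n, 0, -1))  # newest first, like the API

    def fetch_page(count, max_id, since_id):
        eligible = [i for i in ids
                    if (max_id is None or i <= max_id)
                    and (since_id is None or i > since_id)]
        return [{"id": i} for i in eligible[:count]]

    return fetch_page


if __name__ == "__main__":
    first = harvest_timeline(make_fake_timeline(5000))
    print(len(first), newest_id(first))  # 3200 5000
    later = harvest_timeline(make_fake_timeline(5100), since_id=newest_id(first))
    print(len(later))  # 100
```

Paging by id rather than by page number keeps the walk stable even while the user keeps tweeting, and storing the newest id per user is what makes repeat harvests cheap.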

Page 35
Page 36
Page 37

1,969,760 tweets from 1,228 users

Page 38

group users in sets

export by user / set

all at once or in time slices
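Grouping users into sets and exporting either everything at once or in time slices is essentially a filter plus a bucketing step. A minimal sketch, assuming each harvested tweet has already been reduced to a hypothetical `(screen_name, created_at, text)` tuple; the set names echo groups mentioned in the talk (elected officials, media outlets), but the screen names and helper functions are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical reduced record: (screen_name, created_at, text).
Row = Tuple[str, datetime, str]


def rows_for_set(rows: Iterable[Row], members: Iterable[str]) -> List[Row]:
    """Export 'all at once': every row posted by a member of the set."""
    wanted = {m.lower() for m in members}  # compare screen names case-insensitively
    return [r for r in rows if r[0].lower() in wanted]


def time_slices(rows: Iterable[Row], start: datetime,
                width: timedelta) -> Dict[datetime, List[Row]]:
    """Export in time slices: bucket rows into windows of `width` from `start`."""
    buckets: Dict[datetime, List[Row]] = defaultdict(list)
    for row in rows:
        if row[1] < start:
            continue  # before the first slice
        index = (row[1] - start) // width  # timedelta // timedelta -> int
        buckets[start + index * width].append(row)
    return dict(sorted(buckets.items()))


if __name__ == "__main__":
    t0 = datetime(2013, 1, 1)
    rows = [("SenA", t0 + timedelta(days=1), "a"),
            ("repB", t0 + timedelta(days=8), "b"),
            ("outlet", t0 + timedelta(days=2), "c")]
    sets = {"elected officials": ["sena", "RepB"], "media outlets": ["outlet"]}
    officials = rows_for_set(rows, sets["elected officials"])
    for week, chunk in time_slices(officials, t0, timedelta(days=7)).items():
        print(week.date(), [r[2] for r in chunk])
```

Keying each slice by its start time makes the output easy to name as one file per window, e.g. one CSV per week per set.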

Page 39

40+ media outlets

400+ elected officials

300+ journalists

300+ GWU groups

Page 40

filter streams

Page 41

millions of tweets as they occur around an event

Page 42
Page 43

filter streams

• filter by users, keywords, geo

• about 3,000 tweets / min *

• 10,000,000s of tweets

• political debates, news events

* a little more complicated than that
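For scale, about 3,000 tweets per minute is 3,000 × 60 × 24 = 4,320,000 tweets per day, so an event followed for a few days lands in the 10,000,000s. The matching side of a filter stream can be sketched locally as below, assuming v1.1-shaped tweet dicts (`user.id_str`, `text`, and `coordinates` as a GeoJSON point). The OR combination of follow, track, and locations follows Twitter's v1.1 filter documentation, but the keyword matching is a deliberate simplification of Twitter's server-side rules, and `StreamFilter` is a hypothetical class that does not connect to any stream.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

# Bounding box as (west lon, south lat, east lon, north lat).
BBox = Tuple[float, float, float, float]


@dataclass
class StreamFilter:
    """Simplified local stand-in for filter-stream criteria.

    A tweet matches if it satisfies ANY criterion (follow OR track OR
    locations). Keyword matching here is a plain per-word comparison,
    not Twitter's exact server-side rules.
    """
    follow: Set[str] = field(default_factory=set)  # user ids as strings
    track: Set[str] = field(default_factory=set)   # lowercase keywords
    locations: Tuple[BBox, ...] = ()

    def matches(self, tweet: Dict) -> bool:
        if tweet["user"]["id_str"] in self.follow:
            return True
        # Strip #, @ and common punctuation from both ends of each word.
        words = {w.strip("#@.,!?:;\"'").lower() for w in tweet["text"].split()}
        if self.track & words:
            return True
        point = tweet.get("coordinates") or {}
        coords = point.get("coordinates")
        if coords:
            lon, lat = coords  # GeoJSON order: longitude first
            return any(w <= lon <= e and s <= lat <= n
                       for w, s, e, n in self.locations)
        return False


if __name__ == "__main__":
    # Hypothetical criteria; the box roughly covers Washington, DC.
    f = StreamFilter(follow={"42"}, track={"debate"},
                     locations=((-77.12, 38.80, -76.91, 38.99),))
    print(f.matches({"user": {"id_str": "7"},
                     "text": "Watching the #Debate tonight!"}))  # True
```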

Page 44
Page 45

spritzer feed

• ~0.5% of all public tweets

• ~3,000,000 tweets / day (growing)

• a useful random sampling

Page 46

search

• after an event

• find users, keywords

• limited - better than nothing

Page 47

we can do all this

at no marginal cost for data*

* not really “big data” - GBs, not TBs

Page 48

this much alone

meets several needs

Page 49

this much alone

shows at-risk nature

Page 50

when the Pope resigned

Page 51

when Congress turned over

• 16+ accounts deleted / hidden

• combined 105,993 followers

• 14,479 tweets saved in SFM no longer public

Page 52

if a researcher needs more

• support selection, acquisition, accession, storage, transformation

• collect what’s free around it to minimize cost

• plan purchase via grant

• collect prospectively

Page 53

next steps

Page 54

improving sfm

• support concurrent per-user filters / streams

• add Sina Weibo, YouTube, others as asked

Page 55

drive selective, automated web archiving

Page 56

ensure you can use sfm

you can have it! it’s free to use, copy, modify, redistribute

Page 57

discovery?

Page 58

the obvious solution

Page 59

653 - subject added entry, uncontrolled for hashtags

Page 60

700 - name added entries for mentions

Page 61

856 42 - URL of related resource for included links

Page 62

500 - note for retweet count

Page 63

336, 337, 338 - RDA ready!
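Taken at face value, the field mapping on these slides is simple enough to sketch. The version below assumes v1.1-style tweet `entities` (`hashtags`, `user_mentions`, `urls` with `expanded_url`) and emits plain `(tag, indicators, subfields)` tuples rather than calling any MARC library; the function name and the indicator and RDA term choices are illustrative assumptions, not a cataloging recommendation.

```python
from typing import Dict, List, Tuple

# (tag, two-character indicators, [(subfield code, value), ...]).
# Plain tuples, deliberately not tied to pymarc or any other MARC library.
MarcField = Tuple[str, str, List[Tuple[str, str]]]

# RDA content/media/carrier terms for a born-digital text resource.
RDA_FIELDS: List[MarcField] = [
    ("336", "  ", [("a", "text"), ("b", "txt"), ("2", "rdacontent")]),
    ("337", "  ", [("a", "computer"), ("b", "c"), ("2", "rdamedia")]),
    ("338", "  ", [("a", "online resource"), ("b", "cr"), ("2", "rdacarrier")]),
]


def tweet_to_marc_fields(tweet: Dict) -> List[MarcField]:
    """Map tweet entities onto the MARC fields named in the slides."""
    entities = tweet.get("entities", {})
    fields: List[MarcField] = []
    for tag in entities.get("hashtags", []):
        # 653: uncontrolled index term, one per hashtag
        fields.append(("653", "  ", [("a", tag["text"])]))
    for mention in entities.get("user_mentions", []):
        # 700: name added entry; first indicator 0 = name in direct order
        fields.append(("700", "0 ", [("a", mention["screen_name"])]))
    for link in entities.get("urls", []):
        # 856 42: access via HTTP (4), related resource (2)
        fields.append(("856", "42", [("u", link["expanded_url"])]))
    # 500: general note carrying the retweet count
    count = tweet.get("retweet_count", 0)
    fields.append(("500", "  ", [("a", f"Retweet count: {count}.")]))
    return fields + RDA_FIELDS


if __name__ == "__main__":
    example = {"retweet_count": 3,
               "entities": {"hashtags": [{"text": "elag2013"}],
                            "user_mentions": [{"screen_name": "dchud"}],
                            "urls": [{"expanded_url": "http://example.org/"}]}}
    for tag, ind, subfields in tweet_to_marc_fields(example):
        print(tag, ind, " ".join(f"${c} {v}" for c, v in subfields))
```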

Page 64

w/ catmandu, slinging data around

is fun and easy!

already indexed piles of tweets in ElasticSearch*

* really!

Page 65

we will add 2 - 4 million catalog records per month

Page 66

WorldCat can handle this

it’s web scale!

Page 67

augmenting / creating authority records w/ twitter screen names

already cleared it with a PCC / NACO rep!

Page 68

Summon can handle this

Andrew is very familiar with growing consortial catalogs!

Page 69

github.com/gwu-libraries/social-feed-manager

@dchud

dchud @ gwu edu