research using twitter social media...

11
1 Research Using Twitter Social Media Data Twitter is an extensive source of social media data, and unusual in that it has both an extensive data sharing infrastructure and an explicitly permissive Terms of Service. This tutorial document shows how to social scientists can use Twitter as a source of research data, without any technical background or programming ability. It shows how to use commonly available data capture and analysis tools, and the kinds of research methods and investigations that they can support. Note on Software Required for this Tutorial To follow this tutorial, you will need to use the Chrome browser to access Twitter’s web site. The data capture tool (Web Data RA) is a Chrome browser extension that interprets the Twitter Web pages and captures them as spreadsheet data that you can paste into Microsoft Excel (Windows or Mac). You will also use Gephi, an open source network analysis and visualisation application which you can download and install from gephi.org. This expects a recent version of the Java Runtime Environment (JRE) to be installed on your machine. Installing and Using Web Data RA The Web Data RA will capture Twitter, Facebook and Google data from a browser and allow you to paste a table of information directly into a spreadsheet. This document focuses on its use with Twitter. 1) Install the Web Data RA browser extension into Chrome by visiting bit.ly/WebDataRA in Chrome, and clicking on the blue “+ Add to Chrome” button. The small green icon will appear in the top right of the browser window, next to the URL bar. 2) Go to Twitter.com and create a Twitter search or display a timeline 3) Click on the WebDataRA icon to start collecting tweets. Every five seconds the browser will automatically scroll to the bottom of the page to make Twitter load the next batch of results and copy the data automatically to the clipboard. 4) When you have collected enough results, paste the data into an Excel spreadsheet. 5) Use Excel to analyse data, or export to other programs such as Gephi or Voyant for other kinds of analysis.

Upload: others

Post on 01-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

1

ResearchUsingTwitterSocialMediaDataTwitterisanextensivesourceofsocialmediadata,andunusualinthatithasbothanextensivedatasharinginfrastructureandanexplicitlypermissiveTermsofService.ThistutorialdocumentshowshowtosocialscientistscanuseTwitterasasourceofresearchdata,withoutanytechnicalbackgroundorprogrammingability.Itshowshowtousecommonlyavailabledatacaptureandanalysistools,andthekindsofresearchmethodsandinvestigationsthattheycansupport.NoteonSoftwareRequiredforthisTutorialTofollowthistutorial,youwillneedtousetheChromebrowsertoaccessTwitter’swebsite.Thedatacapturetool(WebDataRA)isaChromebrowserextensionthatinterpretstheTwitterWebpagesandcapturesthemasspreadsheetdatathatyoucanpasteintoMicrosoftExcel(WindowsorMac).YouwillalsouseGephi,anopensourcenetworkanalysisandvisualisationapplicationwhichyoucandownloadandinstallfromgephi.org.ThisexpectsarecentversionoftheJavaRuntimeEnvironment(JRE)tobeinstalledonyourmachine.

InstallingandUsingWebDataRATheWebDataRAwillcaptureTwitter,FacebookandGoogledatafromabrowserandallowyoutopasteatableofinformationdirectlyintoaspreadsheet.ThisdocumentfocusesonitsusewithTwitter.1) InstalltheWebDataRAbrowserextensionintoChromebyvisiting

bit.ly/WebDataRAinChrome,andclickingontheblue“+AddtoChrome”button.Thesmallgreenicon willappearinthetoprightofthebrowserwindow,nexttotheURLbar.

2) GotoTwitter.comandcreateaTwittersearchordisplayatimeline3) ClickontheWebDataRAicon tostartcollectingtweets.Everyfive

secondsthebrowserwillautomaticallyscrolltothebottomofthepagetomakeTwitterloadthenextbatchofresultsandcopythedataautomaticallytotheclipboard.

4) Whenyouhavecollectedenoughresults,pastethedataintoanExcelspreadsheet.5) UseExceltoanalysedata,orexporttootherprogramssuchasGephiorVoyantforother

kindsofanalysis.

Page 2: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

2

OverviewWhenyoupastedatafromWebDataRAyouwillcreatefourtablesinyourspreadsheet,appearingbeloweachother.

Thetweetdata,withauthor,mentions,hashtags,textandcountsofretweets,repliesandlikesbrokenoutinseparatecolumns.

Accountoccurrencesummary,acountofthenumberoftimesthateachTwitteraccountappearsinthedatasetasauthororamention(includingthenumberofretweets).

Countsoftheappearancesofeachhashtag.

Atableofedgesoftheconversationalnetwork,i.e.thenumberoftimeseachpairofaccountscommunicatewitheachother.

Youcanusethisdatainvariousways:(a) Thetweetdata(gray)containsthebasicdataabouteachtweet:whatwassaid,

when,bywhoandtowhom.Youcanusethisdatatoformageneraloverviewofthecommunicationovertimeandidentifythemostsignificanttweets.YoucanalsoexaminespecifictweetsandtheircontextbyreferringbacktotheTwittersiteusingeachtweet’sURL.

(b) Theaccounttable(green)showsyouthemostactivetweeters,themostfrequentrepliers,andthemostretweetedusers.Thiswillhelpyouseethekeyactorsinaconversation,andthemainrolesthattheytake.Youcanfollowupbyclickingontheaccountnamestolookattheaccountbiosandtherelevanttimelinesoftheseactorstounderstandwhethertheyarecorporateaccounts,privateindividuals,botsortrolls.

(c) Thehashtagtable(blue)showsyouthemostfrequentlyusedhashtags.Thiscanhelpyouextendyourdatagatheringtolookformoretweetsrelevanttoyourresearchquestion.

(d) Theedgetablewillhelpyoutoseetheinteractionsbetweenactors,andhelpyoutounderstandgroupingsofactors,andthepatternoftheirinteraction.Isakeyaccountdominatingaconversationandtalkingtomanyothers?Aretheyrespondingorjustbeingpassiverecipientsofmarketingmessages?Isthereagroupofequalshavingabalancedconversationwithequalparticipation?

Page 3: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

3

UsingExcelforSimpleQuantitativeAnalysesThefollowingsetoftweets(graytable)comesfromaTwittersearchforthephrase“DigitalDetox”.

TheeasiestwaytoseeanoverviewofaTwittertimelineistocreateaPivotTable.Clickonanygraycell,andchoose“PivotTable”fromtheInsertribbon.InthePivotTablebuilder,drag“Author”fromtheFieldNamepanelintothe“Rows”panel,drag“Timestamp”intothe“Columns”panel,anddrag“Author”(again)intothe“Values”panel(itwillautomaticallyturninto“CountofAuthor”).

Thescreenshotaboveshowsthedatesincludedinthetwittersampleasgreencolumnheadings,andtheaccountsthatauthoredtweetsasrowlabelsalongthelefthandside,orderedbymostprolifictweeters.Thevaluesineachcellarethenumberoftweetsauthoredbyaspecificaccountonaspecificday.Youcanadjusttheformattingforconvenience(Inarrowedthecolumnsandslantedthecolumnheadingsandchangedtheangleofthetextto60degreestofit),usethe“Row

Page 4: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

4

Labels”controltosortbytheauthorcount(i.e.thenumberoftweetsanauthorcreated)andshowonlytherowswherethetotalauthorcountisgreaterthanachosenthreshold.Youcanalsouseconditionalformattingtocolourthecellstohighlightthemostextremevalues.AllkindsofsummariesandanalysesarepossibleusingExcelonthisdata,including:

• Showingthedistributionofthetweetsamplethroughtime• Identifyingthemostprolificand/orpopularactors,andshowingtheiractivity

throughtime• Showingtheuseofindividualhashtags(thismightbeusefulinabigconversation,or

onethatevolvesoveralongerperiod)• Comparingtherelativeproportionofcontributionsfromdifferentactors/hashtags

AlloftheseanalyseswillleadontootherquestionsthatcanbeaskedbygoingbacktoTwitter.Theaccountnamesintheaccount“authorandmentions”(green)tableareclickable,andopenthepageoftheaccountprofileinyourdefaultwebbrower.Alternatively,tomakeasetofaccountnamesintoclickablehyperlinksgivingbrowseraccesstotheuser’sTwittertimelineandbio,usethefollowingfunction:=HYPERLINK(CONCATENATE("http://twitter.com/",A9),CONCATENATE("@",A9)) whereA9isanexampleofacelladdressthatcontainsaTwitteraccountnamee.g.lescarr.Followingtheaccounthyperlinksforthemostprolificauthorsinthegreentable,weseethattheyareallcommercialactorstooneextentoranother.Account # BioItsTimeToLogOff 30 TimeToLogOffisthehomeofdigitaldetox.We’respearheadingthe

movementtodisconnectregularlyfromdigitaldevicesandreconnectwiththeworldoffline.Wedothisthroughcollectingfactsontheneedfordigitaldetox,runningcampaignstogeteveryoneofftheirscreensandhostingretreats,eventsandworkshops.

DinnerTableMBA 9 Acommercialorganisationworkingtogethertohelpfamiliesbecomemoreconfident,successful,andself-empowered

SpareFoot 8 Astoragecompany.Wemakeiteasytomoveandstoreyourstuff.Reservestorageforfreeandgetyourmindoutoftheclutter.

CultureEffect 5 AuthorofDigitox:HowtoFindaHealthyBalanceforyourFamily’sDigitalDiet

UsingGephiforNetworkAnalysesToseehowthevariousaccountsinteractwitheachotherasanetwork,copyandpastetheyellowtableintoaseparatespreadsheetandsaveitasaCSVfile(callitedgetable.csvor

3.OneclicktoopenTwitteraccountinbrowser

2.Functionturnsaccountnameintolink

1.Functionusesaccountname

Page 5: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

5

similar).Loadupthenetworkvisualisationprogram“Gephi”,andstartanewproject.Inthe“DataLaboratory”,choose“ImportSpreadsheet”andloaduptheCSVdataasanedgetable.Youcanthenapplyavarietyofnetworklayoutalgorithmsinthe“Overview”pane.

Thenetworkvisualisationshowsmanyisolatednodes(accounts)inanouterringandacentralcoremadeupofdifferentgroupsofaccounts.Manyofthesearelooselyconnected“chains”of2-6accountswhereoneaccounthasmentionedanother,whichhasmentionedanotherandsoon.Therearemorecomplicatedsubnetworkcomponentsthatdemonstratemoreactivity,asseenbelow.

Thegreencomponentisdominatedbyasinglecorporateaccount(themostprolificaccountinthissample)whoseroleistopromotetheideaofadigitaldetoxandthat“tweetsat”manyotheraccounts,initiatingcommunicationwiththem.Bycontrast,therednetworkconsistsofalargergroupofteachersandeducationprofessionalswhoalreadyparticipateinalargerprofessionalnetworkwithinTwitter,andwhoarediscussingthetopicofdigitaldetoxwithinthatcontext.ManysummariesandanalysesarepossibleusingGephi’snetworkvisualisationtool:

• Showingtheinteractionofthenetworkactors• Identifyingthecommunitiesandactiveparticipantsubgroupswithinthelarger

sample• Identifyingtherolesofdifferentactorsinthecommunicationsnetwork

Page 6: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

6

Pleasenote,thatmanynetworkanalysesintheliteratureoftenfocusontheretweetnetwork,butthatisnotpossiblewithWebDataRA,becausetheidentityoftheretweetingaccountsisnotavailabletotheWebbrowser(theWebUserinterfaceorWebapp).Youhavedataonthenumberofretweets(i.e.thepopularityofthetweet),butnottheaccountsthatretweetedtheoriginalmessage.Formoreinformationonasupplementaryservicebeingdevelopedtofillthisgap,seetheendofthisdocument.

UsingVoyantforTextualAnalysesInthegraytable,copythe“SanitisedText”column.Thiscontainsthetextofallthetexts,butwithalltheTwitterfeatures(@names,#hashtags,URLs)removedtoleaveonlytheEnglishtext.GototheVoyant-Tools.orgwebsite,pastethetextintothetextboxandpressthe“Reveal”button.Youwillseeascreenwithseveralpanelsthathelpyouexplorethetextofthetweetsindifferentways.VoyantToolsisatextualcorpusanalyser.Itconsidersthedatathatyouhaveenteredasasingledocumentwheretheindividualtweetsarelikeindividualsentencesorparagraph.TheWordcloud,Reader,TrendandConcordanceallanalysethetextfromthecollectionoftweets.ClickonawordintheWordcloud,andallitsoccurrenceswillbehighlightedintheReaderpanel,it’sfrequencythroughoutthewholedocument(setoftweets)willbedisplayedintheTrendgraph,anditscontextwillbedisplayedintheConcordance.Thishelpsyoutoinvestigatetheuseoflanguageinthecollectionoftweets,andquicklyunderstandwhatisbeingtalkedaboutandhow.Italsohelpsyoutoseehowthelanguagechangesovertime.Thefirstthingthatthisdisplayshowsisthatthemostcommontermsaredigital,detox,anddigitaldetoxbecausetheywerethesearchterms!Toignorethem,addthemtothestopwordslistbyclickingonthe“Options”iconintheWordcloudpanel’sgreyiconbar.

Whenthemouseentersthebar,youwillseethefollowingiconsappear: .The“Options”iconistheslidingbutton,nexttothe“Help”questionmark.

ToedittheStopwordslist,clickonthe“EditList”buttonandaddthethreeextratermstobeignored,oneperline.

Page 7: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

7

Youwillseethatthepanelshavere-analysedthetext,ignoringthetermsthatyouhaveaddedtothestopwordlist.

Othermorecomplexvisualisationsandanalysesareavailabletobeused,includingadvancedMachineLearningalgorithmstoclusterkeywordsandsimplegraphvisualisationstoshowkeywordco-occurrence.

StreamGraph DimensionalReductionsofKeywordAppearance

NetworkofKeywordCo-occurrences

Page 8: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

8

AnalysisofTwitterLanguageThewordcloudshowsinvisualformthemostcommonlyusedtermsintheTwittersample.Thisisgreatforanimpressionofthetopics,butamoreusefulsummaryisthe“Terms”panel.Using15ofthetop20wordsinthissample,wemighthypothesisethatadigitaldetoxisaboutspendinglesstimeusingaphoneandtheneedtounplugyourselffromsmartphonesocialmediatechnologyapps–it’sahealthchallengeyoutryforaday,aweekorayear.

Aneasywaytoinvestigatetheuseofeachofthesewordsistoselecttheminthetermpanel,andscrollthroughthein-contextuseinthe“Context”panel.Themostcommonlyusedwordis“time”butit’smainuseisnotinthesenseof“spendingtoomuchtime”or“savingtime”butinthesenseofanopportunity(it’stimeto…orit’stimefor…)andfrequentlyasarhetoricalquestion(Isn’tittimeforadigitaldetox?)FindingtweetsthatspeakfrompersonalexperienceManyofthetweetsseemtobepromotionallifestyletweetsinheadlineform(e.g.“WhyYouShouldDoaDigitalDetox”or“DigitalDetoxBenefits:Areyouaddictedtotechnology?”)Toidentifypersonaltweetsfrompeoplewhohavetriedorarethinkingoftryingadigitaldetox,searchfortweetscontainingthepersonalpronoun“I”.Unfortunately,“I”isoneoftheVoyantToolsstoplistwordsandisignored(astoplistconsistsofthecommonbutlow-informationwordsinatextincludingpronouns,prepositionsandconjunctions).AlthoughitispossibletoeditthestoplistinVoyant,IwillsearchforthestringI<space>orI<apostrophe>inthetextofthetweet(tomatchI,Iam,Ihave,Iwill,I’m,I’ve,I’ll).Usethefollowingformula =OR(NOT(ISERROR(FIND("I ",A1))),NOT(ISERROR(FIND("I'",A1)))) todefineanextracolumninthegraytable,andsortthesetbytheresults(TRUEorFALSE).Only17%ofthetweetsinthiscollectionwereidentifiedaspersonalbythissimplerule.(Note,noteverytweetwiththewordIistalkingfromafirstpersonperspective,andmanytweeterswouldmiss“Ihave”or“Iam”fromthebeginningofatweetortext.)Ofthose17%,justoverhalfwereactivelyplanningadigitaldetox,weredoingadigitaldetoxorhaddoneone.(Theironyoftweetingaboutdoingadigitaldetoxisnotlostonsomecommentators!)

Page 9: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

9

Itisthenpossibletofollowupwitheachoftheaccountsthathastweetedaboutanactualperiodofdigitaldetoxthattheyhaveundertaken,toexaminetheirTwitteractivitybefore,duringandafterthedetoxandtoidentifyanyquantitativedifferenceintheiruseoftheplatform.Asanexample,Ichoseasingleaccountthathadannouncedthattheywerestartinga“DigitalDetox”todeprivethemselvesofsocialmediaforaday.Iaccessedallthetweetsofthatuserovertheperiod(usingWebDataRA),andcreateda1-authorpivottable(asabove)toaggregatethetweetstoacountperday,andthenplottedthatdataasascattergraphwith7-daymovingaverage.Thegraphshowsthatalthoughtheaccountdidnottweetonthechosendate(markedwitharedcross)thereisnonoticeablechangeintweetingbehaviouraftertheannounceddetoxday.

AdditionalMethods:SentimentAnalysisSentimentanalysiscanhelpyouidentifypositiveornegativecommentsinyoursample.Thisisapopularmethodinindustry,especiallywithbrandmanagementcompanies.Howeveritisacademicallycontested,anddoesnothaveahighdegreeoftransparencyinthelexicalprocessing. Nevertheless,totryitout,pastethe“SantitisedText”columnintoanonlineservicesuchassentigem.com.Considertowhatextenttheresultsseemaccuratetoyou–howwelldoesitidentifypositiveandnegative‘sentiment’inasentence?Whatkindsofinaccuraciescanyousee?And,mostimportantly,despiteanyshortcomings,doesithelpyoutoidentifyanypointsofinterestinyourdataformorethoroughinvestigation?

7% 11%

22%

23%

25%

12%

PersonalTweetsaboutDigitalDetox

WantToDo

PlanToDo

Doing

Did

Suggestions

Opinions

Page 10: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

10

AdditionalMethods:RetweetNetworkAsupportingserviceisbeingdevelopedforWebDataRAtoallowyoutorecovertheretweetnetworkaroundthetweetsthatyouhavecollected.Aspreviouslyexplained,theTwitterwebappdoesnotshowtheretweetsofeachtweet,onlythecountofretweets.However,itispossibletorequestthatdatafromtheTwitterAPI,althoughitislimitedto100retweetsofeachtweet.(Ifatweethasbeenretweetedmorethan100times,thoseextratweetswillnotbeaccessible,assuchthisshouldbeconsideredtobeanapproximationoftherealretweetnetwork.)Toobtaintheretweetdataforyourtweets,selectthetwitteridsforwhichyouwishtofindtheretweets,andpastetheseintotheformathttp://pretend.webdataRA/retweets/.PleasenotethattheTwitterAPIlimitsrequestssuchthateachtwitterIDthatyouprovidewilltake12secondstoprocess,andthefinishedresultmaytakeanhourormore.YoucanpastetheresultingdataintoanExcelspreadsheet,saveitasCSVandimportitintoGephiaspreviouslyexplained.IntheGephinetworkvisualisationbelow,nodesarelargerandcolouredmoreintenselyaccordingtohowmuchothernodesretweetthem(theirroleasauthoritiesinthenetwork),butthenodelabelsaresizedaccordingtowhethertheyretweetorareretweeted.Soyoucanseethatsomeaccountsareveryactivebutnotasoriginatorsofinformation,whereasotheraccountsmaybelessactivebutaremoreinfluential.

Page 11: Research Using Twitter Social Media Datasouthampton.ac.uk/~lac/WebDataResearchAssistant/HowTo.pdf · 2018-08-03 · use with Twitter. 1) Install the Web Data RA browser extension

11