

Page 1: Predicting Swedish Elections with Twitter: A Case for Stochastic Link Structure Analysisnimadokoohaki.com/papers/snaa2015.pdf · 2017-04-17 · that Twitter can complement traditional

Predicting Swedish Elections with Twitter: A Case for Stochastic Link Structure Analysis

Nima Dokoohaki
Swedish Institute of Computer Science (SICS)
SE-16429 Kista, Sweden
Email: [email protected]

Filippia Zikou
Royal Institute of Technology (KTH)
SE-16440 Kista, Sweden
Email: [email protected]

Daniel Gillblad
Swedish Institute of Computer Science (SICS)
SE-16429 Kista, Sweden
Email: [email protected]

Mihhail Matskin
Royal Institute of Technology (KTH)
SE-16440 Kista, Sweden
Email: [email protected]

Abstract—Whether Twitter data can be leveraged to forecast the outcome of elections has long been a question of great interest in the research community. Existing research focuses on content analysis, classifying the sentiment of expressed opinions as positive or negative. Meanwhile, the link-structure features of the social networks underlying conversations that involve politicians have received less attention. The intuition behind such a study is that the density of conversations about parties and their respective members, whether explicit or implicit, should reflect their popularity. Furthermore, the dynamism of interactions can capture the inherent shift in popularity of politicians' accounts. In this manuscript we present evidence that a well-known link prediction algorithm can reveal an authoritative structural link formation within which the popularity of political accounts, along with their neighbourhoods, shows strong correlation with the standing of electoral outcomes. As evidence, we study the public time-lines of two electoral events on Twitter from the 2014 elections in Sweden. By distinguishing between member and official party accounts, we report that even with a focus-crawled public dataset, structural link popularities bear strong statistical similarity to vote outcomes. In addition, we report strong ranked dependence between the standings of selected politicians and the general election outcome, as well as between official party accounts and the European election outcome.

I. INTRODUCTION

Twitter, as a micro-blogging platform, has become widely adopted within the political domain. This interest is seen worldwide from both politicians and voters, as the real-time and democratic nature of information dissemination on Twitter allows the masses to express and follow opinions freely. This phenomenon is visible in Sweden, where an increasing number of politicians leverage social media to communicate their daily activities to the masses. Twitter became the central medium for observing the course of public political discussions during and in the aftermath of the Iranian presidential elections back in 2009. Ever since, public and private bodies have been eager to search, browse and track politicians and their respective parties, and to gain an understanding of their reputation on Twitter. Fascination with aggregating and mining tweets in order to predict the possible outcome of real-world events has led to a large number of case studies [1].

The political domain is no exception. What is striking is the mix of positive [2] and negative [3] attitudes toward mining political data from Twitter to forecast vote outcomes. An overview of the literature also shows that existing methodologies for mining democratic data on Twitter mostly focus on analyzing the content of tweets. For instance, a number of researchers have proposed common practices for classifying the polarity of tweets [4] and for sentiment analysis aimed at outcome prediction [2]. Comparatively speaking, studies that mine topological features of the underlying tweet networks are less common. This is visible in the several works that try to combine social network and content analysis techniques to shed light on both the interaction and polarization characteristics of the networks involved [5], [6], [7]. Link prediction [8] techniques have gained strong momentum in recent years, owing to their adoption and to their probabilistic and scalable capabilities for analyzing and predicting the dynamism of link structures in social networks [9]. We highlight two advantages of link prediction techniques. First, link prediction can capture the density of conversations about parties and their respective members, which in turn should reflect their popularity within and across the network. Second, the dynamism of interactions, which reflects the inherent shift in popularity of politicians' accounts, can in turn be captured by a stochastic link mining technique. In addition, link prediction algorithms are topic-sensitive and thus valuable for political opinion mining.

Thus, in this manuscript we make a case for studying the evolution of link structures surrounding political tweets to capture the popularity of parties and their respective members. Our implementation entails an exploratory analysis comprising basic and advanced social network mining studies, aimed mainly at understanding the dynamics of party interactions within and outside their neighborhood. We present an in-depth study of the dynamics of the link-structure-based popularities of parties and individual members along the time-lines of two elections, European and general, held during 2014 in Sweden. We validate the resulting popularity estimates using the official statistics of voting outcomes. We consider a combined selection of mentions and retweets as the source of interactions. This choice, combined with the link mining algorithm, centralizes the clusters of network vertices around the main topics of discussion rather than around individuals [10]. It also helps with the selection of relevant topics and thus with avoiding noisy content, which is vital to the soundness of social media studies. The data presented throughout the paper is the result of a continuous crawl of public Twitter streams over a period of eight months, amounting to 7 million tweets from Sweden.

Specific contributions of this paper are as follows:

• Social network analysis, including fitness studies, for the conversation graphs of the European and general election time-lines,

• Popularity modeling of political parties and their respective members with personalized PageRank, using a stochastic link-structure mining approach,

• Statistical correlation modeling between the vote outcome and the inferred top followers (out-degree) of party and individual accounts.
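To make the notion of ranked dependence in the third contribution concrete, the following is a minimal sketch of a Spearman rank correlation in plain Python. The choice of Spearman's coefficient is an assumption (the text above does not name the statistic), and the mention counts from Table I are used only as a stand-in for the link-structure popularity scores, which are not reproduced here.

```python
def ranks(values):
    """Return average ranks (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


# General election vote shares (%) and mention counts per party, from Table I,
# in the order M, C, FP, KD, S, V, MP, SD, FI, P.
vote_share = [23.33, 6.11, 5.42, 4.57, 31.01, 5.72, 6.89, 12.86, 3.12, 0.43]
mentions = [12124, 6742, 7049, 29, 15383, 9799, 20811, 3551, 32585, 6953]
rho = spearman(vote_share, mentions)
```

Working on ranks rather than raw values makes the measure insensitive to the very different scales of vote percentages and interaction counts.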

II. RELATED WORK

A. Political Conversations on Twitter

Twitter, as a conversation medium [11], provides several means for individuals to communicate with one another. Mentions, or simply replies, are the most common way to interact with another user, by including their id (prefixed with the @ sign) in the content. Honeycutt and Herring observed that tweets with mentions showed higher variance in content [12]. Retweets are another conversational means, used for sharing another user's tweet, usually as a whole. Hashtags seem to be the main method used to report an event or communication, while at the same time making similarly themed conversations findable. As a result, hashtags can add semantics to the stream of tweets by giving them semantic structure [13]. A number of researchers study the specific characteristics of political interactions on Twitter. Conover et al. [5] analyze the retweet and mention networks in the month and a half leading up to the 2010 US midterm elections, and differentiate between the retweet and mention networks based on the limited correlation between right- and left-oriented users in the retweet network. Following Conover et al., we have also chosen to take into account both mentions and retweets: the tweets chosen for our study are a combination of the two.

B. Predicting Elections on Twitter

With the increasing popularity of social media, analysts became interested in understanding the role of Twitter in elections. The study published by Livne et al. [6] focuses on the use of Twitter during the 2010 midterm elections in the US; by correlating network structure, content and election results, the authors are able to accurately predict the results of the election. Tumasjan et al. [2] analyze political sentiment in the text of tweets about Germany's multi-party election in 2009 and show that Twitter can complement traditional methods of political forecasting. On the other hand, Jungherr [14] focuses on reported studies of hashtag usage and states that such political predictions are not possible. Gayo-Avello [15] summarized existing work on predicting elections and proposed a scheme to characterize Twitter prediction methods. The scheme covers every aspect from data collection to performance evaluation, through data processing and vote inference. We will highlight in the next section that advanced link analysis techniques are a recent development.

C. Political Link Mining and Recommendations on Twitter

Works that focus on analyzing the characteristics of interactions with politicians or their parties are recent [7]. The same holds for evidence on the structure of interactions with politicians on Twitter, as highlighted in a recent survey of election prediction on Twitter [16]. In 2012, during a primary in California, Nielsen [17] reported that in three out of four races the most frequently mentioned candidate won. Gaurav et al. [18] propose a prediction model based on the number of times the name of a candidate is mentioned in tweets. They develop several methods to augment the counts by counting not only the presence of candidates' official names but also their aliases and commonly appearing names. They report success in predicting the winner of three presidential elections in Latin America in 2013. Cameron et al. [19] study data from the 2011 New Zealand general election and candidates' social networks on Facebook and Twitter. They report a statistically significant relationship between the structural size of social networks and election results. Makazhanov and Rafiei [20] focus on the Alberta 2012 general election and show that the political preference of users can be predicted from their interaction with political parties. They propose prediction models using contextual and behavioral features, especially interaction factors (retweets, following size, etc.).

Given the growing literature on interaction analytics, in this manuscript we make a case for link-structure mining and analysis. Getoor and Diehl [21] generalize the notion of link mining as data mining techniques that explicitly consider links when building predictive or descriptive models of linked data. One of the most studied link mining approaches is the link prediction problem, which was first introduced by Liben-Nowell and Kleinberg [8]. Tied directly to network evolution, link prediction asks how much of the evolution of a social network can be modeled using features local and salient to the network itself. There has been increasing attention to link prediction on Twitter that uses network proximity for network popularity estimation. Zou and Fekri [22] propose two approaches to exploiting both popularity and similarity for link recommendation. The first approach employs rank aggregation to combine the rankings generated by popularity-based and similarity-based recommendation algorithms. Yuan et al. [23] note that sentiment analysis studies miss out on the value of incorporating social relationships. They therefore exploit sentiment proximity for link prediction on a Twitter dataset collected over one month of the 2012 U.S. political campaign, along the follows relationship between users. Similar to Yuan et al. [23], we focus on studying how the existing link structure of the network can be used to build a party-centric popularity measure, although we do not leverage sentiment features in this work. As a result, the novelty of this work lies in showing how popularity estimates from authoritative and stochastic link-structure mining can reveal popularity rankings that closely correlate with the voting outcome, without taking sentiment features into account.

III. SWEDISH ELECTIONS ON TWITTER

A. Previous Swedish Election Analysis on Twitter

Swedish elections have been subject to data analysis in the past. Larsson and Moe [24] focused on the 2010 Swedish election campaign and the related discussions on Twitter in that period. They aimed at identifying the various types of users based on their usage of Twitter. They describe the political network and how users relate to each other: by assigning users to nodes and modeling their interconnections, they measured the density and centralization of the network and the users' positions in the social graph. Their work crawled all tweets with the #val2010 hashtag, which was the prominent hashtag for the 2010 elections. Election day showed the most traffic of political tweets, and the most active users were party members or journalists. We observed the same impact of party members, as presented later on. Our advanced link mining experiments help exploit the inherent value of such interactions.

B. 2014 Swedish Elections

We outline two time-lines in 2014 from which we crawled tweets: the European parliamentary elections (EU) and the Swedish general elections (General). Table I lists the parties by acronym and English title, along with their results as reported by the Swedish election authority.¹

The European parliamentary elections took place on 25 May 2014. In this election, twenty members of the European Parliament (MEPs) were elected from the Swedish political landscape. The outcome of the vote saw a rise in the popularity of left-wing parties, especially the Social Democrats, who took the highest proportion of votes, while the right-leaning parties took a lower share of the vote, especially the Moderate Party, which was in power at the time. General elections were held in Sweden on 14 September 2014. In these elections, the center-right Alliance coalition (comprising the Moderate Party, Liberal People's Party, Center Party and Christian Democrats) competed for a third term in office. In contrast to the 2010 election, the three dominant left-leaning parties (Social Democrats, Green Party and Left Party) took the lead and won, while the Alliance came second and the Sweden Democrats, a nationalist party, took third place. Finally, the Feminist Initiative, a left-leaning party, did not secure the 4% threshold needed to enter parliament.

IV. DATA ANALYSIS FRAMEWORK

In this section we present the data analysis framework along with the experiments that we devised to analyze the Twitter data at hand. We begin by presenting the characteristics of the data itself. In the subsequent sections we present community modeling and analysis of interactions, focusing on basic and advanced features of the network. In the later sections we leverage these networks for popularity analysis.

¹ http://www.val.se/

Fig. 1. Frequency of tweets during the crawl timeline.

A. Data Aggregation

We focus-crawled public Twitter streams² from the beginning of February 2014 until the end of September 2014, a while after the general election. We gathered a total of 7,000,000 tweets from within Sweden, more than 70% of which are written in Swedish. Of these tweets, almost 2,000,000 had political content, referring to the elections, a debate, a specific party or a politician. From these data we identified 21,000 users, from which we were able to relate to 130,000 profiles associated with a specific party. The crawler used three kinds of filters: one for the coordinates of the country, one for political and election-related hashtags (e.g. #svpol, #val2014) and one for political profile accounts (e.g. @Folkpartiet, @Feministerna). A selection of the most frequently used hashtags for each party is presented in Table I.
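The three crawler filters can be illustrated with a small offline sketch. It is not the crawler used for the study: the bounding box is a rough, hypothetical approximation of Sweden, the `Tweet` type and function names are invented for illustration, and combining the filters with a logical OR is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical bounding box (lon_min, lat_min, lon_max, lat_max); a rough
# approximation of Sweden chosen for illustration only.
SWEDEN_BBOX = (10.5, 55.0, 24.5, 69.5)
# Example hashtags and accounts taken from the text; the real lists are longer.
POLITICAL_HASHTAGS = {"#svpol", "#val2014"}
PARTY_ACCOUNTS = {"@folkpartiet", "@feministerna"}


@dataclass
class Tweet:
    text: str
    coordinates: Optional[Tuple[float, float]] = None  # (longitude, latitude)


def in_bbox(coords, bbox=SWEDEN_BBOX):
    """Filter 1: the tweet is geotagged inside the bounding box."""
    if coords is None:
        return False
    lon, lat = coords
    return bbox[0] <= lon <= bbox[2] and bbox[1] <= lat <= bbox[3]


def tokens(text):
    """Lower-cased whitespace tokens with trailing punctuation removed."""
    return {t.strip(".,!?:;").lower() for t in text.split()}


def is_political(tweet):
    """Filters 2 and 3: a tracked hashtag or a tracked party account appears."""
    toks = tokens(tweet.text)
    return bool(toks & POLITICAL_HASHTAGS) or bool(toks & PARTY_ACCOUNTS)


def keep(tweet):
    """Keep a tweet when any filter matches (the OR semantics is assumed)."""
    return in_bbox(tweet.coordinates) or is_political(tweet)
```

Under the OR assumption the location filter alone admits non-political tweets, which would be consistent with only about 2 of the 7 million collected tweets having political content.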

Figure 1 visualizes the frequency of tweets over the time-line of the crawl. Two bursts are visible in the plot, attributable to the European parliamentary elections and then the Swedish general elections. It is interesting to observe a higher density of tweets during the European elections than during the national elections. This is further motivation to distinguish between the two election time-lines, and especially to study the European parliamentary election.

B. European Parliamentary Election Time-Line

In this section we focus mainly on the time-line of the European elections. The tweets analyzed for this time-line correspond to the month of May. We chose to study this time-line because we observed a high density of tweets compared to the general election. We focus on the community of political parties on Twitter. To model this community formation we study the graph formed by the interactions taking place between political parties. While researchers differentiate between analyzing retweet and mention graphs, for our study we combine the two. Such community analysis helps us understand how coherent the parties are in terms of their internal formation, meaning how much they interact with their own members as well as how much other parties interact with them.

There are practical issues with producing such a plot. For instance, we have realized that members usually prefer to tag the alliance instead of the individual parties. Figure 2 shows the force-directed visualization of the graph. Since the political scene is multi-partisan, nodes have been colored by party. Such community formation depicts how users associated with a political party communicate with each other. Each node in the graph represents a political user, and each edge indicates that the two users mentioned each other in at least one of their tweets. The graph contains 20,000 nodes and almost 40,000 edges. We kept only discussions involving the politicians. To produce such a dense population we put a threshold on the number of tweets, selecting only very active users. Each party comprises its official accounts and their respective members.

² https://dev.twitter.com/streaming/public

TABLE I. SWEDISH POLITICAL PARTIES: TABLE SUMMARIZES THE PARTY NAMES, THEIR RESPECTIVE ACRONYMS, COUNT OF POSTS, COUNT OF MENTIONS, COUNT OF USERS, AND MOST FREQUENTLY USED HASHTAGS.

| Party Acronym | Title | European Election | General Election | #Posts | #Mentions | #Accounts | Hashtag Valence |
| M | Moderates | 13,65% | 23,33% | 558 | 12124 | 2459 | #jobbvalet #sverigemotet |
| C | Center | 6,49% | 6,11% | 889 | 6742 | 1459 | #gbgftw #klimat |
| FP | Liberals | 9,91% | 5,42% | 624 | 7049 | 1400 | #Almedalen #kfgbg |
| KD | Christian Democrats | 5,93% | 4,57% | 373 | 29 | 691 | #alliansen #svpol |
| S | Socials | 24,19% | 31,01% | 839 | 15383 | 2459 | #svpol #pidebatt #svtagenda |
| V | Left | 6,3% | 5,72% | 267 | 9799 | 1425 | #08pol #intetillsalu #kollektivavtal |
| MP | Greens | 15,41% | 6,89% | 915 | 20811 | 3853 | #mpkongress #redofor #miljo |
| SD | Sweden Democrats | 9,67% | 12,86% | 203 | 3551 | 697 | #sd #Linkoping |
| FI | Feminists | 5,49% | 3,12% | 2503 | 32585 | 4951 | #feminism #Ukraina #taplats |
| P | Pirates | 2,23% | 0,43% | 450 | 6953 | 1215 | #sd #val2014 dinrost |

Fig. 2. Labeled visualization of the party interactions during the European parliamentary time-line. The visualization is force-directed and uses the Kamada-Kawai layout. Nodes have been labeled with respect to their parties.
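The interaction graph described above, with one node per user and an edge whenever one user mentions another, can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the `(author, text)` input format, the function name and the `min_tweets` activity threshold are assumptions, and retweets are assumed to appear in the classic "RT @user: ..." form so that a single mention pattern covers both interaction types.

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")


def interaction_edges(tweets, min_tweets=1):
    """Build a weighted, directed edge list from (author, text) pairs.

    Authors with fewer than `min_tweets` tweets are dropped, mirroring the
    activity threshold mentioned in the text (the actual value is not given).
    """
    tweets = list(tweets)  # the input is traversed twice
    activity = Counter(author.lower() for author, _ in tweets)
    edges = Counter()
    for author, text in tweets:
        src = author.lower()
        if activity[src] < min_tweets:
            continue
        for name in MENTION_RE.findall(text):
            dst = name.lower()
            if dst != src:  # ignore self-mentions
                edges[(src, dst)] += 1
    return edges


sample = [
    ("anna", "RT @S_party: vote on Sunday"),
    ("anna", "@bo do you agree? cc @bo"),
    ("bo", "@anna not really"),
    ("cy", "no mention here"),
]
edges = interaction_edges(sample, min_tweets=2)
```

In this sample only `anna` passes the activity threshold, so the result holds her two outgoing edges, with the repeated mention of `bo` counted as an edge weight of 2.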

The community formation depicts cohesion in the interaction between members of each party, shown by how densely nodes are clustered in the graph. As observed, the Liberals (FP), the Center Party (C), the Sweden Democrats (SD) and, in turn, the Feminists (FI) show high cohesion by keeping their words within their own inner circle. The visualization also shows how dispersed each party's interaction has been through engaging in discussion with members of other parties, depicted by how scattered the nodes of a party are with respect to other parties' nodes. For instance, we can observe high dispersion in discussion between the Moderates (M) and Social Democrats (S) and their neighboring parties.

Fig. 3. Continuous power-law distribution fitting for the European conversation network, using a log-likelihood estimator with α = 14.13428 and p = 0.9999887.

While the visualization reveals many interesting facts about the nodes of the graph, it tells us little about the formation of the edges of the network. Figure 3 shows the cumulative distribution of total node degrees in the European election network. The plot shows this distribution for the interactions taking place during the month leading up to the European election and also afterwards. As seen, the graph appears to be a proper fit for a power-law distribution. We compare this degree distribution to the degree distribution of the general election network in the next section.
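As a rough illustration of what a continuous power-law fit by maximum likelihood involves, the sketch below estimates the exponent with the standard closed-form estimator α = 1 + n / Σ ln(x_i / x_min) and computes the empirical cumulative distribution. The lower cutoff `x_min` and the function names are assumptions; the fitting tool and cutoff behind the reported figures are not stated, and the goodness-of-fit p-value is not reproduced.

```python
import math


def fit_power_law_alpha(degrees, x_min=1.0):
    """Maximum-likelihood exponent of a continuous power law for x >= x_min."""
    tail = [d for d in degrees if d >= x_min]
    log_sum = sum(math.log(d / x_min) for d in tail)
    if log_sum == 0:
        raise ValueError("need at least one observation above x_min")
    return 1.0 + len(tail) / log_sum


def empirical_ccdf(degrees):
    """Return (x, P(X >= x)) pairs, the cumulative view plotted on log axes."""
    n = len(degrees)
    return [(x, sum(1 for d in degrees if d >= x) / n)
            for x in sorted(set(degrees))]


# Toy degree sequence, for illustration only.
toy_degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 21]
alpha = fit_power_law_alpha(toy_degrees, x_min=1.0)
ccdf = empirical_ccdf(toy_degrees)
```

On log-log axes a power law appears as a straight line in the cumulative plot, which is the visual check the text refers to.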

C. Swedish General Election Time-Line

In this section we focus mainly on the time-line of the general elections, which corresponds to the month of September. According to the data gathered, the volume of discussion peaks during the weeks leading up to the election and continues for a couple of weeks afterwards. The graph of the general elections contains over 50,000 nodes and 440,000 edges. This graph contains the discussion between the public and politicians, which explains the larger data. This means we have to make up for the impact of having a larger number of tweets. The visualization of the graph in Figure 4 reveals the same overall community shape as that of Section IV-B. The visualization is not labeled due to the size of the graph, although the sources and destinations of edges are colored differently to reveal the direction of interactions. Even though the graph is not labeled, almost the same community formation is revealed.

Fig. 4. Unlabeled visualization of all conversations during the general elections time-line. The identifiers shown mark the most active users, with the largest out-degrees in the graph.

Fig. 5. Continuous power-law distribution fitting for the general election network, using a log-likelihood estimator with α = 12.74022 and p = 0.9995703.

Figure 5 shows the cumulative distribution of total node degrees in the general election network. The plot shows this distribution for the interactions taking place during the month of the general election. The degree distribution for the general elections also seems to be a proper fit for a power-law distribution. Visually speaking, the degree distribution of the latter network seems to be a better representation of a scale-free network [25], as one might expect in a social network setting.

D. Link Prediction with Electoral Networks

Following the introductory analysis of the characteristics of the networks at hand, we explain how we used link prediction to analyze the evolution of these networks and derive a model of popularity for the parties. Given that a bipartite graph can be extracted from the networks introduced, one can estimate a notion of popularity based on the density of interactions that accounts receive or provide. Link prediction is of interest for two reasons. First, link-structure semantics can explain the density of conversations about parties and their respective members, which in turn should reflect their popularity within the network at hand. Second, the dynamics of interactions with the accounts of politicians reflect any shift in popularity, which a stochastic link mining approach can capture.

1) Stochastic Link-Structure Mining: We leverage a variant of SALSA (Stochastic Approach for Link-Structure Analysis) [26] for estimating the scores of nodes in link prediction. Similar to HITS [27], SALSA uses random walks to construct a bipartite graph from a collection of nodes, which eventually converge onto two sets of nodes, collectively referred to as Hubs and Authorities. The random walks are bi-directional owing to the bi-directional edges of the graph, which also guarantees convergence of the algorithm. This guarantees that we can always derive a bipartite graph from the source edge list. Each walk leverages a set of Markov chains, and each chain depends on the stochastic properties of the random walks performed on the graph.
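The two alternating Markov chains behind SALSA can be sketched with a short fixed-point iteration. This is textbook SALSA on a hub-to-authority edge list, not the specific variant used in the study; the function name, the uniform initialisation and the fixed iteration count are assumptions.

```python
from collections import defaultdict


def salsa(edges, iterations=50):
    """Iterate SALSA on a bipartite edge list of (hub, authority) pairs.

    Each round is one backward step (authority -> hub) followed by one
    forward step (hub -> authority) of the random walk. Returns the pair
    (hub_scores, authority_scores); each score vector sums to 1.
    """
    out_nbrs = defaultdict(set)  # hub -> authorities it points to
    in_nbrs = defaultdict(set)   # authority -> hubs pointing to it
    for hub, authority in edges:
        out_nbrs[hub].add(authority)
        in_nbrs[authority].add(hub)

    auth = {a: 1.0 / len(in_nbrs) for a in in_nbrs}
    hub = {h: 1.0 / len(out_nbrs) for h in out_nbrs}
    for _ in range(iterations):
        # Backward step: every authority splits its score among its hubs.
        hub = {h: sum(auth[a] / len(in_nbrs[a]) for a in out_nbrs[h])
               for h in out_nbrs}
        # Forward step: every hub splits its score among its authorities.
        auth = {a: sum(hub[h] / len(out_nbrs[h]) for h in in_nbrs[a])
                for a in in_nbrs}
    return hub, auth


hub_scores, auth_scores = salsa([("h1", "a1"), ("h1", "a2"), ("h2", "a1")])
```

On a connected bipartite graph such as this one the authority scores settle at values proportional to in-degree, so `a1` ends up with twice the score of `a2`.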

2) Personalized PageRank: Having the bipartite graph at hand, we would like to rank its nodes. To address the problem of ranking the vertices of a bipartite graph, we leverage an estimate of personalized PageRank (PPR). One of the most fundamental graph computation problems, PPR has proven effective for link prediction and friend suggestion in online social networks [28], [29]. PageRank [30] is computed by executing a set of random walks that, at each step, with probability P jump to a random node and with probability 1 − P follow a random outgoing link from the node at hand. Personalized PageRank is the same as PageRank, except that every random jump returns to the source node, for which we are personalizing the PageRank. Given the graph G(V, E), the personalized PageRank of node v from the source node u, referred to as π_u(v), is computed as follows [31]:

$$\pi_u(v) = \epsilon\,\delta_u(v) + (1-\epsilon) \sum_{w \mid (w,v) \in E} \pi_u(w)\,\alpha_{w,v}$$

Leveraging personalized PageRank in turn generates a collection of egocentric (personalized) random walks. The egocentricity of the walks is implemented by not visiting adjacent nodes during each active walk. One might consider visiting adjacent nodes if interested in modeling opinion influence [32].
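The personalized PageRank recurrence can be solved by simple fixed-point iteration, as in the sketch below. Several points are assumptions rather than details from the text: δ_u(v) is read as the indicator that v equals the source u, α_{w,v} as a uniform transition probability 1/outdeg(w), nodes without outgoing links return their mass to the source, and the function name and default ε are illustrative. The study itself estimates PPR with simulated random walks rather than with this iteration.

```python
def personalized_pagerank(edges, source, epsilon=0.15, iterations=100):
    """Iterate pi_u(v) = eps*delta_u(v) + (1-eps)*sum_w pi_u(w)*alpha_{w,v}.

    edges: iterable of directed (w, v) pairs; alpha_{w,v} = 1/outdeg(w).
    Returns a dict mapping every node to its score for the given source.
    """
    out_nbrs = {}
    nodes = {source}
    for w, v in edges:
        out_nbrs.setdefault(w, []).append(v)
        nodes.update((w, v))

    pi = {n: 0.0 for n in nodes}
    pi[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        nxt[source] += epsilon  # teleport term: eps * delta_u(v)
        for w in nodes:
            mass = (1.0 - epsilon) * pi[w]
            targets = out_nbrs.get(w)
            if not targets:
                nxt[source] += mass  # dangling node: restart at the source
                continue
            share = mass / len(targets)
            for v in targets:
                nxt[v] += share
        pi = nxt
    return pi


scores = personalized_pagerank([("a", "b"), ("b", "a")], source="a",
                               iterations=200)
```

For this two-node cycle the recurrence can be solved by hand: π_a(a) = ε / (1 − (1 − ε)²) and π_a(b) = (1 − ε) π_a(a), which the iteration approaches as the number of iterations grows.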


3) Scaling Massive Random Walks: As graphs grow, a single random walk is not enough to meet the time and space constraints of computation in social networks. For the overall computations, we chose GraphChi [33], a resource-efficient framework for large-scale graph processing. To simulate the random walks we worked with the DrunkardMob [34] component of GraphChi, which allows large numbers of random walks to be executed in parallel. We chose this component to address the increasingly important issue of executing large quantities of random walks in a resource-efficient manner. To summarize, each iteration of our link analysis approach repeats the following steps:

1) Simulate an egocentric random walk and pick the f most frequently visited vertices.

2) Build a bipartite graph by placing the f vertices on the Hub matrix, and a subset of their respective followers on the Authority matrix.

3) Execute SALSA on the graph and take the top k-scored vertices on the Authority matrix as the Top-followers of the node.

As in most applications, we work with the top-k values (and their corresponding nodes) in each PageRank vector, given a suitable k.
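The three steps can be sketched end to end on a toy graph. This is a single-machine illustration under stated assumptions, not the GraphChi/DrunkardMob setup used in the study. The `followers` adjacency dictionary, the function names, the walk length, the restart probability, and the seed are all hypothetical. In addition, egocentricity is approximated here by restarting walks at the source node, which is a simplification and not the adjacency rule described earlier.

```python
import random
from collections import Counter


def egocentric_walks(followers, source, walks=2000, steps=5, restart=0.15, seed=0):
    """Count visits of short random walks that keep restarting at `source`."""
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(walks):
        node = source
        for _ in range(steps):
            nbrs = followers.get(node)
            if not nbrs or rng.random() < restart:
                node = source  # restart keeps the walk centred on the source
                continue
            node = rng.choice(nbrs)
            visits[node] += 1
    visits.pop(source, None)  # the source itself is not a candidate
    return visits


def salsa_authorities(hub_edges, iterations=30):
    """Authority scores of SALSA on a bipartite (hub, authority) edge list."""
    out_nbrs, in_nbrs = {}, {}
    for h, a in hub_edges:
        out_nbrs.setdefault(h, set()).add(a)
        in_nbrs.setdefault(a, set()).add(h)
    hub = {h: 1.0 / len(out_nbrs) for h in out_nbrs}
    auth = {}
    for _ in range(iterations):
        auth = {a: 0.0 for a in in_nbrs}
        for h, targets in out_nbrs.items():
            for a in targets:
                auth[a] += hub[h] / len(targets)
        hub = {h: 0.0 for h in out_nbrs}
        for a, sources in in_nbrs.items():
            for h in sources:
                hub[h] += auth[a] / len(sources)
    return auth


def top_followers(followers, source, f=3, k=2):
    # Step 1: the f most frequently visited vertices become the hubs.
    visits = egocentric_walks(followers, source)
    hubs = [v for v, _ in visits.most_common(f)]
    # Step 2: bipartite graph with hubs on one side, their followers on the other.
    edges = [(h, a) for h in hubs for a in followers.get(h, [])]
    # Step 3: run SALSA and keep the k best-scored authorities.
    auth = salsa_authorities(edges)
    return sorted(auth, key=auth.get, reverse=True)[:k]


if __name__ == "__main__":
    toy = {"s": ["a", "b"], "a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": []}
    print(top_followers(toy, "s", f=3, k=2))
```

The parameters f and k trade coverage against cost: a larger f widens the neighbourhood fed into SALSA, while k only truncates the final ranking.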

E. Popularity Estimates: Parties, Members and Coalitions

We experimented with the proposed link analysis approach using our two sets of graphs. In two separate experiments, we took in around 50000 source nodes for the European election network and 100000 source nodes for the general election network. For each iteration of bi-directional SALSA execution, we executed at least 50 million walks. To analyze the results, we devised one analysis focused only on party accounts, and another focused on a selection of member (individual) accounts. The identifiers were chosen from a popular portal that lists Swedish political presence on Twitter 3. This decision is due to the fact that a political party has its own account, while politicians often have separate personal accounts. The accounts chosen for the study have been taken from this listing for the sake of validation.

Figure 6 shows the estimated size of top-k (out-degree) scores for each party according to personalized PageRank values, computed for the official party accounts on Twitter. In each experiment, we took the identifier of the source node of the egocentric walk and estimated the personalized follower score accordingly. Compared to the election results, the popularity rankings of the parties for the European elections match the outcome to a large extent, as the Social Democrats and their coalition stand above the Moderates and their coalition. This is even more visible for the Sweden Democrats, as the European elections reflected their rise in popularity. This is by far the best popularity ranking estimated. Looking comparatively at the popularity standings for the general elections, we see similar results, which differ from the outcome results by a small margin. The differences between the scores are interesting to observe.

3 Svensk politik på Twitter. http://www.inetmedia.nu/twitter/politik.shtml

Fig. 6. Estimated popularities during the time-lines of European and general elections: The official accounts of each party.

Fig. 7. Estimated popularities of coalitions: aggregated scores for party accounts (top), aggregated scores for member accounts (bottom).

Figure 7 visualises the comparison of the dynamics of group rankings. The aim is to see whether, group-wise, we can estimate the right standing for the two popular coalitions (the Alliance and the Red-Greens), followed by the third-placed Sweden Democrats. In the comparison, both European and general election aggregates seem to reflect the outcome rankings by clearly distinguishing between the places of all three groups.

We computed a median of scores for each group of individual members. Since member scores are combined in the previous figure, we have plotted the individual standings in Figure 8. While the Moderates and the Social Democrats comprise the largest parties, the small gain in the popularity estimates for the Moderates compared to the very large gain for the Social Democrats reflects the standings of first and second places in both elections. Even more interesting is the large gain in popularity of the leader of the third party, the Sweden Democrats, that is visible in the plot. The result for the general election time-line is similar to the general election vote outcome, which makes a case for studying interactions with member accounts, not only party accounts. Given the observed relationship between the measured popularity rankings and the election outcomes, in the next section we study the statistical correlation between our results and the official outcomes corresponding to the respective electoral time-lines.

Fig. 8. Dynamics of individual politician neighborhoods during the time-lines of European and general elections.

F. Correlating Network Dynamism to Vote Outcomes

We observed that the results presented reveal interesting commonalities between link-structure driven popularity standings for member and party accounts on Twitter. But the question remains how strong these commonalities are.

To measure any possible dependence between the two results, we take the popularity standings as one statistical variable and the vote outcome as another. We devised two sets of tests to evaluate both unranked and, most favorably, ranked dependence. Ranked correlation is most favorable as we are dealing with an election standing list. First, we used Pearson’s coefficient (denoted as Pearson’s r), which is the covariance of the variables divided by the product of their standard deviations, estimated as follows:

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$$

Where x_i and y_i are the observed values of the two statistical variables, and x̄ and ȳ are their means. Pearson’s coefficient is most effective at capturing a linear dependence between values. Most favorable for ranked correlation, Spearman’s coefficient (denoted as Spearman’s ρ) is a more general metric of statistical correlation between two variables. Spearman’s coefficient shows how well the dependence between two variables can be explained using a monotonic function, and is estimated as follows:

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

Where n is the size of the population and d_i is the difference between the ranks of the i-th pair. Table II shows how positive the results of the statistical hypothesis checking are, with a positive coefficient for most of the comparisons.
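Both coefficients are straightforward to compute directly from their definitions. The sketch below is a plain-Python illustration and not the code used to produce Table II. The function names are hypothetical, the input lists in the usage example are made-up placeholder numbers and not election data, and the rank-difference form of Spearman’s ρ assumes that there are no tied values.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 is the largest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rho(x, y):
    """1 - 6*sum(d_i^2) / (n*(n^2 - 1)), with d_i the rank differences."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Placeholder values for illustration only (not taken from the study).
    popularity = [10, 20, 30, 40]
    votes = [1, 3, 2, 4]
    print(pearson_r(popularity, votes), spearman_rho(popularity, votes))
```

When ties are present, the usual alternative is to assign average ranks and compute Pearson’s r on the ranks instead of using the rank-difference shortcut.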

Pearson coefficients show that there is a strong linear correlation between the popularity rankings of both party and member accounts and the vote outcome for the European election time-line. The linear dependence for the general election time-line is not as high as for the European election. What is interesting for us is the result of the Spearman ranks, which show quite strong dependence for both European and general election time-lines. The most interesting result is how the ranking of member accounts shows strong dependence with the outcome of the general election, while the ranking of party accounts shows strong dependence with the outcome of the European elections.

To justify the latter results, we explain possible behavioural traits that might contribute to these observations. First of all, the density of interactions with party accounts during the European parliamentary time-line, as we saw in Figure 1, is rather high. Meanwhile, the election authority reported low voter turnout (51.07% of the eligible population)4, which would explain how a proportion of voters followed the election event on-line instead of going to the ballots. Comparatively, the increased interaction with member accounts during the general elections could explain the increased participation of voters (reported 85.81% of the eligible population)5. Second, the results show how the activity of member accounts could have a significant impact on explaining the result of elections, as compared to solely focusing on the party accounts. This is also supported by the fact that the number of Swedish politicians using Twitter is increasing, thus attracting public attention and interaction.

V. CONCLUSION AND FUTURE WORK

This paper aimed at adding further evidence that Twitter data can actually be used to explain and perhaps predict the outcome of elections. We studied a link mining approach that leverages the structural features of the interaction network underlying the conversation with politicians during the time-line of two elections. We presented evidence of how our approach reveals an authoritative structural link formation within which the popularity of the political accounts, along with their neighbourhoods, shows strong correlation with the vote outcome. The public time-lines of two electoral events from the 2014 elections of Sweden on Twitter were studied. By distinguishing between individual and official party accounts, we report that estimated popularities reveal strong statistical similarities with vote outcomes. We also revealed strong ranked dependence between standings of selected politicians and the general election outcome, along with official party accounts and the European election outcome. For future work, we first want to analyze other elections using link prediction to see if we can generalize this concept. We will also focus on the textual content of tweets using sentiment analysis and topic modelling.

4 http://www.val.se/val/ep2014/slutresultat/E/rike/index.html
5 http://www.val.se/val/val2014/slutresultat/R/rike/index.html


TABLE II
CORRELATION COEFFICIENTS FOR MEASURED LINK-STRUCTURE POPULARITIES AND OFFICIAL VOTE STATISTICS.

| Timeline | Member accounts: Pearson’s r | Member accounts: Spearman’s ρ | Party accounts: Pearson’s r | Party accounts: Spearman’s ρ |
|---|---|---|---|---|
| European Election | 0.792188682 | 0.375183 | 0.820957353 | 0.862921 |
| General Election | 0.649179744 | 0.761513 | 0.54171718 | 0.394972 |

We are also planning to study the diversification and language evolution in the linguistic features of political tweets [35].

VI. ACKNOWLEDGEMENT

This is to acknowledge that the work was carried out during the tenure of an ERCIM “Alain Bensoussan” Fellowship.

REFERENCES

[1] S. Asur and B. Huberman, “Predicting the future with social media,” in Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, vol. 1, Aug 2010, pp. 492–499.

[2] A. Tumasjan, T. Sprenger, P. Sandner, and I. Welpe, “Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment.” ICWSM, pp. 178–185, 2010.

[3] P. Metaxas, E. Mustafaraj, and D. Gayo-Avello, “How (not) to predict elections,” in 2011 IEEE Third International Conference on Social Computing (SocialCom), Oct 2011, pp. 165–171.

[4] D. Garcia, F. Mendez, U. Serdult, and F. Schweitzer, “Political polarization and popularity in online participatory media,” in Proceedings of the first edition workshop on Politics, elections and data - PLEAD ’12. New York, New York, USA: ACM Press, Nov. 2012, p. 3.

[5] M. Conover, J. Ratkiewicz, M. Francisco, B. Goncalves, A. Flammini, and F. Menczer, “Political polarization on twitter,” in Proc. 5th International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.

[6] A. Livne, M. Simmons, E. Adar, and L. Adamic, “The Party Is Over Here: Structure and Content in the 2010 Election.” ICWSM, pp. 201–208, 2011.

[7] H. Lietz, C. W. A. Bleier, and M. Strohmaier, “When politicians talk: Assessing online conversational practices of political parties on twitter,” in International AAAI Conference on Weblogs and Social Media (ICWSM), 2014.

[8] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[9] D. Yin, L. Hong, and B. D. Davison, “Structural link analysis and prediction in microblogs,” Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM ’11, p. 1163, 2011.

[10] N. Dokoohaki and M. Matskin, “Mining divergent opinion trust networks through latent dirichlet allocation,” in Proceedings of the 2012 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2012), ser. ASONAM ’12. Washington, DC, USA: IEEE Computer Society, 2012, pp. 879–886.

[11] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” in Proceedings of the 19th International Conference on World Wide Web, ser. WWW ’10. New York, NY, USA: ACM, 2010, pp. 591–600.

[12] C. Honey and S. Herring, “Beyond microblogging: Conversation and collaboration via twitter,” in System Sciences, 2009. HICSS ’09. 42nd Hawaii International Conference on, Jan 2009, pp. 1–10.

[13] C. Wagner, P. Singer, L. Posch, and M. Strohmaier, “The wisdom of the audience: An empirical study of social semantics in twitter streams,” in The Semantic Web: Semantics and Big Data, ser. Lecture Notes in Computer Science, P. Cimiano, O. Corcho, V. Presutti, L. Hollink, and S. Rudolph, Eds. Springer Berlin Heidelberg, 2013, vol. 7882, pp. 502–516.

[14] A. Jungherr, “Tweets and votes, a special relationship: The 2009 federal election in germany,” in Proceedings of the 2nd Workshop on Politics, Elections and Data, ser. PLEAD ’13. New York, NY, USA: ACM, 2013, pp. 5–14.

[15] D. Gayo-Avello, “A meta-analysis of state-of-the-art electoral prediction from twitter data,” Social Science Computer Review, p. 0894439313493979, 2013.

[16] N. D. Prasetyo, “Tweet-based election prediction,” Ph.D. dissertation, TU Delft, Delft University of Technology, 2014.

[17] Nielsen, “Buzz and the ballot box: what is social media’s relationship with politics?” Apr. 2012. [Online]. Available: http://www.nielsen.com/us/en/insights/news/2012/social-media-buzz-and-us-politics.html

[18] M. Gaurav, A. Srivastava, A. Kumar, and S. Miller, “Leveraging candidate popularity on twitter to predict election outcome,” in Proceedings of the 7th Workshop on Social Network Mining and Analysis, ser. SNAKDD ’13. New York, NY, USA: ACM, 2013, pp. 7:1–7:8.

[19] M. P. Cameron, P. Barrett, and B. Stewardson, “Can social media predict election results? evidence from new zealand,” Journal of Political Marketing, vol. 0, 2014.

[20] A. Makazhanov and D. Rafiei, “Predicting political preference of twitter users,” in Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ser. ASONAM ’13. New York, NY, USA: ACM, 2013, pp. 298–305.

[21] L. Getoor and C. P. Diehl, “Link mining: A survey,” SIGKDD Explor. Newsl., vol. 7, no. 2, pp. 3–12, Dec. 2005.

[22] J. Zou and F. Fekri, “Exploiting popularity and similarity for link recommendation in twitter networks,” in Proceedings of the 6th Workshop on Recommender Systems and the Social Web (RSWeb 2014), ser. RSWeb ’14. CEUR-WS, 2014.

[23] G. Yuan, P. K. Murukannaiah, Z. Zhang, and M. P. Singh, “Exploiting sentiment homophily for link prediction,” in Proceedings of the 8th ACM Conference on Recommender Systems, ser. RecSys ’14. New York, NY, USA: ACM, 2014, pp. 17–24.

[24] A. O. Larsson and H. Moe, “Studying political microblogging: Twitter users in the 2010 Swedish election campaign,” pp. 729–747, 2012.

[25] A.-L. Barabasi, “Scale-free networks: a decade and beyond.” Science (New York, N.Y.), vol. 325, pp. 412–413, 2009.

[26] R. Lempel and S. Moran, “Salsa: The stochastic approach for link-structure analysis,” ACM Trans. Inf. Syst., vol. 19, no. 2, pp. 131–160, Apr. 2001.

[27] J. M. Kleinberg, “Authoritative sources in a hyperlinked environment,” J. ACM, vol. 46, no. 5, pp. 604–632, Sep. 1999.

[28] P. Gupta, A. Goel, J. Lin, A. Sharma, D. Wang, and R. Zadeh, “Wtf: The who to follow service at twitter,” in Proceedings of the 22nd International Conference on World Wide Web, ser. WWW ’13, 2013, pp. 505–514.

[29] L. Backstrom and J. Leskovec, “Supervised random walks: Predicting and recommending links in social networks,” in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, ser. WSDM ’11. New York, NY, USA: ACM, 2011, pp. 635–644.

[30] L. Page, S. Brin, R. Motwani, and T. Winograd, “The pagerank citation ranking: Bringing order to the web,” Stanford University, Tech. Rep., 1999.

[31] B. Bahmani, A. Chowdhury, and A. Goel, “Fast incremental and personalized pagerank,” Proc. VLDB Endow., vol. 4, no. 3, pp. 173–184, Dec. 2010.

[32] Y. Eom and D. Shepelyansky, “Opinion formation driven by pagerank node influence on directed networks,” CoRR, vol. abs/1502.02567, 2015.

[33] A. Kyrola, G. E. Blelloch, and C. Guestrin, “Graphchi: Large-scale graph computation on just a pc.” in OSDI, vol. 12, 2012, pp. 31–46.

[34] A. Kyrola, “Drunkardmob: Billions of random walks on just a pc,” in Proceedings of the 7th ACM Conference on Recommender Systems, ser. RecSys ’13. New York, NY, USA: ACM, 2013, pp. 257–264.

[35] H. Holzmann, N. Tahmasebi, and T. Risse, “Named entity evolution recognition on the blogosphere,” International Journal on Digital Libraries, vol. 15, no. 2-4, pp. 209–235, 2015.