

Comparative Analysis of Stack Exchange Forums: Effects of the No-Discussion Policy

Jungsun Kim, Sudarshan Srinivasan, Ivan Gozali

Stanford University

Email: {jungsun, sudarsh2, igozali}@stanford.edu

Abstract: Many studies have used graph analysis to examine differences in graph structures between fact-based and discussion-based topics, to study the activities of users in traditional Q&A forums, and to identify expertise in the network. In this paper, we study whether these differences are prevalent on Stack Exchange, where a no-discussion policy is enforced by the community, which aggressively closes, locks and/or deletes questions that would provoke discussions and subjective answers. We find that Stack Exchange forums on different topics display significant differences in graph structure between topics that are inherently discussion-based and those that are inherently fact-based, despite the “no-discussion” policy. Due to such structural differences, the reputation system is not a robust way to capture underlying network dynamics across all the forums, and therefore may not be the best method for ranking users.

I. INTRODUCTION

Stack Overflow [1] was introduced in 2008 as a new forum for users to ask and answer programming-related questions. It differentiated itself from existing forums through its novel policies: users were expected to ask questions with simple factual answers, and discussion was discouraged. Users asking questions were allowed to “accept” the answers that they believed were most relevant to their question(s). The website hoped to reduce the number of low-quality questions by allowing “expert” users to lock and/or delete them. The “reputation” mechanism used to identify experts is based solely on upvotes and does not consider any graph metrics, despite previous research [2] suggesting that graph metrics could serve as a useful predictor of expertise. The reputation system has generated much debate, with many users suggesting it can be gamed fairly easily [3], [4].

Despite these concerns, Stack Overflow enjoyed a meteoric rise in popularity in the years that followed. The website currently has over 4 million registered users [5] and over 10 million questions. The success of Stack Overflow has led to the creation of a large set of websites that follow the same rules but focus on many different areas, ranging from history and movies to cooking. These websites are collectively referred to as “Stack Exchange”.

Much research [2], [6] has been done on traditional discussion forums such as Yahoo Answers, where people are allowed to ask questions with subjective answers and debate over their answers. However, researchers have only recently begun exploring Stack Exchange datasets. Traditional research has concluded that forums on topics such as movies tend to be discussion-oriented rather than Q&A-based. However, Stack Exchange enforces a no-discussion policy where moderators are encouraged to close questions with subjective answers.

In this project, we perform a detailed graph analysis on Stack Exchange and examine whether it has managed to influence the graph structure across different forums so that all of them conform to the graph structure of a traditional Q&A forum. To achieve this objective, we construct a graph that consists of accepted answers. We also compare the correlation between reputation and various graph metrics across all forums.

The rest of this paper is structured as follows. Section 2 provides a brief overview of existing literature. Section 3 discusses the objective of our study. Section 4 describes the dataset and the method of constructing our graph. Section 5 discusses the results of our analysis. Section 6 concludes our study and discusses directions for future work.

II. LITERATURE REVIEW

A. Knowledge Sharing and Yahoo Answers: Everyone Knows Something [6]

This paper examined the structure of the question-answer graph across various topics in Yahoo Answers. The authors concluded that topic sub-forums could be classified into three types based on the nature of activity observed: (1) “discussion” forums where the same people tend to ask and answer several questions, (2) opinion-based forums where there are many legitimate answers to each question, and (3) “fact”-based forums. The difference was illustrated using programming as an example of a fact-based forum, and wrestling and marriage as examples of discussion-based forums.

On the programming forum, there was a clear separation between “askers” and “answerers”, and the largest SCC in the graph was small. Additionally, analysis of the motif profile showed a clear absence of reciprocal links, indicating that most users specialized either in asking or answering, but not both.

On the other hand, discussion topics, such as marriage and wrestling, had interaction motifs that were significantly different, with a relatively higher number of motifs containing reciprocal links. Many users served as both askers and answerers. These topics had significantly larger SCCs as well.

Stack Exchange includes websites on several topics, ranging from “fact-based” topics such as programming and science to “discussion”-based topics such as movies and politics. However, all Stack Exchange forums seek to build a Q&A database with little to no discussion and/or spam. Given the difference in graph structure observed between Q&A-focused forums and discussion forums, it would be interesting to examine whether Stack Exchange forums on topics that traditionally tend to become discussion-based still follow the network structure of fact-based forums.

B. Questions in, Knowledge-iN? A Study of Naver’s Question Answering Community [7]

This paper analyzed people’s motivations, roles, usage and expertise in Knowledge-iN, a general-purpose question-answering community site in South Korea. The authors gathered 2.6 million question-answer pairs across 15 categories from 2002 to 2007.

They first found that the users could be separated into groups of askers and answerers, with only 5.4% both asking and answering in the same category. However, this asker-answerer division tended to vary by category. The percentage of users engaging in both asking and answering was high in subjects requiring low-level expertise, such as Singers (10%). Furthermore, whereas most categories had more askers than answerers, low-level expertise categories such as Fashion, Movie, Naming and Singer had more answerers than askers. The authors also found that answerers tend to compete less for posting the best answer within a question in topics that require hierarchical expertise (i.e. there are clear tiers in the level of expertise required to answer difficult questions), such as C or C++, compared to flat-knowledge categories (i.e. no domain knowledge is needed to answer most questions) such as Movies and Singers.

Although this paper provided interesting insights into user dynamics across different categories, the authors did not attempt to study the subject from a network-analysis perspective by constructing an asker-answerer graph and analyzing graph metrics such as degree distribution and modularity.

C. Expertise Networks in Online Communities: Structure and Algorithms [2]

This paper analyzed the graph dynamics of community Q&A websites using the Java forums as an example. The authors constructed an asker-replier graph from forum thread data based on the assumption that messages on a thread constituted “replies” to the question asked by the original poster. They found a significant difference in the forum structure from that of the web, with most users lying in the “in” component of the bow tie. Also, experts tended to reply to vastly more users than novices, and to reply to questions by users with all levels of knowledge. The paper also attempted to examine whether user expertise could be predicted from graph metrics and noted that simple centrality measures (such as degree centrality) performed just as well as significantly more complex metrics such as ExpertiseRank (a variant of PageRank specifically optimized for the ranking task), due to the unique structure of the network that thwarted propagation of expertise values (since the experts tended to answer questions from novices as well).

Although this paper performed significant analysis on the Java Forum network, it made a few assumptions that may have biased the results. First, the assumption that all posts on a thread constituted replies implies including spammers as experts. This point was brought up in the paper, but the authors dismissed concerns by stating that they did not observe spamming behavior. Second, users who replied with incorrect and/or incomplete answers would get the same ratings as those who provided comprehensive and detailed solutions. We attempt to address both issues by including only “accepted” answers in our graph. Additionally, the Stack Exchange network has a zero-tolerance policy towards discussion and spamming, with the focus on collecting good answers to questions. Furthermore, Stack Overflow’s ranking policy tends to favor answering difficult questions, since the replies to these tend to collect more upvotes over time. We seek to examine whether the network dynamics are affected by these changes.

D. Discovering Value from Community Activity on Focused Question Answering Sites: A Case Study of Stack Overflow [8]

This paper sought to analyze and characterize the Stack Overflow network, focusing on the community dynamics. The authors noted that Stack Overflow was worth studying due to the “rate and scale of adoption” of the website. It provided insights into the similarities and differences between the community processes of traditional forums and Stack Overflow. The observed behavior paralleled the Java Forum in some respects: for instance, high-reputation users tended to answer questions from everyone (rather than only answering questions from other highly ranked users). However, several differences exist between the Stack Overflow forums and the Java Forums. The content in Stack Overflow tends to be of high quality, due to a complex self-moderation system that is typically absent in traditional forums.

This paper made us realize that there are several avenues for further study on the Stack Exchange data set. In particular, since the time the paper was written, numerous forums have emerged within the Stack Exchange collection of forums. These newer forums focus on diverse topics, some of which are not inherently factual. Analysis that considers graph metrics could help quantify the extent to which Stack Exchange has succeeded in maintaining a simple Q&A forum without spam and endless discussions and debates.

III. OBJECTIVE

There is a lot of existing literature on traditional Q&A forums. However, there is a shortage of literature that examines the impact of user-regulation policies (such as Stack Exchange’s no-discussion policy) on the structure of Q&A graphs. Our goal is to fill this gap in the existing literature, in particular by examining whether Stack Exchange’s no-discussion policy affects the underlying graph structure in different forums, some of which are on inherently factual topics and some of which are on topics that tend to provoke discussion.

It is to be emphasized that all forums within the Stack Exchange framework indeed enforce a no-discussion policy, with the community aggressively closing and/or deleting questions that do not elicit objective responses. This project seeks to analyze whether the differences observed by [6], [7] still remain despite the no-discussion policy.

IV. DATA AND METHODOLOGY

A. Dataset

For this project, we utilized the Stack Exchange data dump [9], which is publicly available through the Internet Archive. The data originated from a relational database, and an XML file is provided for each table. Information about badges, comments, post history, post links, posts, tags, users, and votes is available, and these tables can be parsed and joined to construct graphs using Python.
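As a rough illustration of the parse-and-join step, the sketch below reads a tiny inline stand-in for a posts table and pairs each question with the author of its accepted answer. The attribute names (`Id`, `PostTypeId`, `OwnerUserId`, `AcceptedAnswerId`, `ParentId`) are assumed to follow the public dump's posts table; the sample rows and the helper names `load_posts` and `accepted_pairs` are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Tiny hypothetical stand-in for a posts XML file: one <row> per post.
SAMPLE_POSTS_XML = """<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="10" AcceptedAnswerId="2" />
  <row Id="2" PostTypeId="2" OwnerUserId="20" ParentId="1" />
  <row Id="3" PostTypeId="2" OwnerUserId="30" ParentId="1" />
  <row Id="4" PostTypeId="1" OwnerUserId="20" />
</posts>"""


def load_posts(xml_text):
    """Return {post_id: attribute dict} for every <row> element."""
    root = ET.fromstring(xml_text)
    return {row.attrib["Id"]: dict(row.attrib) for row in root.iter("row")}


def accepted_pairs(posts):
    """Join each question to its accepted answer: list of (asker, answerer)."""
    pairs = []
    for post in posts.values():
        answer = posts.get(post.get("AcceptedAnswerId"))
        # Skip posts without an accepted answer or without a known owner.
        if answer is None:
            continue
        if "OwnerUserId" not in post or "OwnerUserId" not in answer:
            continue
        pairs.append((post["OwnerUserId"], answer["OwnerUserId"]))
    return pairs


posts = load_posts(SAMPLE_POSTS_XML)
print(accepted_pairs(posts))
```

For a full-size forum, a streaming parser such as `xml.etree.ElementTree.iterparse` would likely be preferable to loading the whole file into memory.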

The most recent data was posted in August 2015. The Stack Exchange dataset in its entirety is 29 GB even in compressed form. Thus, we used the data from a few Stack Exchange forums of manageable size, on topics with varying levels of subjectivity as intuitively determined. We believe it is beneficial to analyze the data within different forums in their entirety, as opposed to taking a random sample from a huge forum, which could bias the results. We used data from the AskUbuntu forum to represent fact-based topics, and the Movies, UX and History forums to represent discussion-based topics. It is to be noted that

• we evaluated the subjectivity of the topics by examining a sample set of random questions from each forum

• three forums were used for discussion-based topics because each of them was smaller in size than AskUbuntu

Fig. 1. From left to right: Gephi visualization of the accepted answers graph for the AskUbuntu, UX, History and Movies forums. The nodes in the graph were arranged circularly, ordered by decreasing outdegree from the top in the clockwise direction. Nodes with low indegree were colored red and those with high indegree were colored blue.

| Forum | Nodes | Edges | Self Loops | Bidir. Edges | Max SCC | Max WCC |
|---|---|---|---|---|---|---|
| AskUbuntu | 35127 | 59023 | 11.6073% | 0.4140% | 2.8070% | 79.1300% |
| UX | 5972 | 8692 | 1.4151% | 0.8169% | 4.0188% | 86.4032% |
| History | 1030 | 1868 | 1.6059% | 2.7203% | 9.2233% | 91.7476% |
| Movies | 2010 | 3629 | 3.3342% | 3.7628% | 8.1095% | 89.5522% |

Fig. 2. Graph Metrics

Fig. 3. Accepted Answers Graph. Note that there are nodes for all the users and that there are no edges for non-accepted answers. (Diagram: a question posted by user u1 with an accepted answer from u2 and other answers from u3 and u4, shown alongside the resulting graph on nodes u1, u2, u3 and u4.)

B. Construction of the Graph

The graph we constructed can be succinctly described as an accepted answers graph. Figure 3 explains how questions and answers in the forum translate to the graph in our analysis. A node exists for each user in the forum. For each accepted answer, there is an edge from the asker to the answerer.

We insert an edge into the graph only if the answer is accepted, unlike prior analyses of Q&A forums, where there are edges for both accepted and non-accepted answers. We believe that this dramatically reduces the noise in our graph, as there is a very low chance that accepted answers are spam in the highly moderated environment of Stack Exchange.
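The construction rule can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `build_accepted_answers_graph` is hypothetical, and collapsing repeated asker-answerer pairs into one edge with a count is an assumption made here, not something the text above specifies.

```python
from collections import Counter


def build_accepted_answers_graph(user_ids, accepted_pairs):
    """Return (nodes, edges) for the accepted answers graph.

    nodes: one node per user, whether or not the user has any edge.
    edges: {(asker, answerer): number of accepted answers} (assumed layout).
    """
    nodes = set(user_ids)
    edges = Counter(accepted_pairs)
    # Make sure every endpoint is a node even if it was missing from user_ids.
    for asker, answerer in edges:
        nodes.update((asker, answerer))
    return nodes, dict(edges)


# Toy forum in the spirit of Fig. 3: u1 asks, u2's answer is accepted,
# u3 and u4 also answer but contribute no edge.
users = ["u1", "u2", "u3", "u4"]
pairs = [("u1", "u2")]
nodes, edges = build_accepted_answers_graph(users, pairs)
print(sorted(nodes), edges)
```

Keeping the per-pair count makes it possible to reuse the same structure later for weighted variants of the ranking metrics.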

V. RESULTS

A. Graph Metrics

Preliminary analysis indicated that there was indeed a significant difference between different forums, visible even using simple metrics, as seen in figure 2.

In the AskUbuntu forum, we observe that the fraction of bidirectional edges is much smaller than those of the other forums. Similarly, the size of the largest strongly connected component (max SCC) as a fraction of the total number of nodes is much smaller in AskUbuntu. Both these results are similar to the observations on the fact-based Java forum in [2]: in fact-based forums, people tend to assume the role of either answerer or asker, but not both, which in turn leads to a large WCC but a very small SCC. In the discussion-based forums, there are users who serve both as askers and answerers, thus the size of the SCC is significantly larger.

We see a much higher percentage of bidirectional edges in the History, Movies and UX forums, which again suggests that there are more pairs of users who are both answering and asking each other’s questions. This aligns very well with our expectation given that those three topics are discussion-based. This is similar to what was observed in [7], where it was seen that the percentage of users who tended to serve as both askers and answerers was higher in discussion-based categories than in fact-based categories.

The number of self-loops in these forums leads to significant insights as well. Originally, we examined the self-loops in each graph to exclude them from the count of bidirectional edges. However, the percentage of self-loops actually provides interesting insights into the dynamics of the community. In fact-based forums, people seek solutions for specific problems they have. Thus, there is a significant probability of their finding answers elsewhere and answering their own question. In contrast, in discussion-based forums, people seek insight into what other people have to say about the topic, and thus are less likely to look for answers elsewhere and answer their own questions.

Note that all of these basic metrics suggest that even when the no-discussion policy is enforced in all these Stack Exchange forums, patterns typically seen in discussion forums, as described by [6], [7], still emerge.
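The four structural metrics discussed in this subsection can be computed with a short standard-library sketch such as the one below. The denominators are assumptions (edge-based fractions relative to the number of distinct edges, component sizes relative to the number of nodes), and the function name `graph_metrics` is hypothetical. The component search is quadratic in the worst case, which is adequate for illustration but not tuned for large graphs.

```python
from collections import defaultdict


def graph_metrics(nodes, edges):
    """Return self-loop, bidirectional-edge, max-SCC and max-WCC fractions."""
    edges = set(edges)
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    self_loops = sum(1 for u, v in edges if u == v)
    # Counts every edge whose reverse edge also exists, self-loops excluded.
    bidirectional = sum(1 for u, v in edges if u != v and (v, u) in edges)

    def reachable(start, neighbours, allowed):
        """Iterative DFS from start, restricted to the allowed node set."""
        seen, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours(node):
                if nxt in allowed and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    def largest_component(strong):
        remaining, best = set(nodes), 0
        while remaining:
            start = next(iter(remaining))
            if strong:
                # SCC of start: forward-reachable set intersected with
                # backward-reachable set.
                comp = (reachable(start, lambda n: succ[n], remaining)
                        & reachable(start, lambda n: pred[n], remaining))
            else:
                # WCC of start: reachability when edge direction is ignored.
                comp = reachable(start, lambda n: succ[n] | pred[n], remaining)
            best = max(best, len(comp))
            remaining -= comp
        return best

    n, m = len(set(nodes)), len(edges)
    return {
        "self_loops": self_loops / m,
        "bidirectional": bidirectional / m,
        "max_scc": largest_component(strong=True) / n,
        "max_wcc": largest_component(strong=False) / n,
    }


# Hypothetical toy graph: a and b answer each other, b asks c, c self-answers,
# d is isolated.
toy_nodes = {"a", "b", "c", "d"}
toy_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "c")]
print(graph_metrics(toy_nodes, toy_edges))
```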


Fig. 4. Ego Network Analysis

Fig. 5. Normalized degree distributions. The x-axis represents the degree and the y-axis represents the fraction of nodes at that degree.

B. Degree Distributions

We produced a log-log plot of the indegree and outdegree distributions of the users in all the forums, shown in figure 5. Firstly, looking at the indegree distribution plot, we found a power law distribution similar to the degree distributions found in [2], [6]. A power law distribution suggests a rich-get-richer phenomenon similar to the one described in [10]. Nodes having high indegree correspond to users that have more of their answers accepted. If a user has had more of his/her answers accepted, he/she will be more likely to have more of his/her answers accepted in the future, since it is more likely that he/she is an “answerer” rather than an asker. This leads to a rich-get-richer phenomenon, and as time progresses, we see that most accepted answers are produced by a minority of the people.

We do not immediately observe similarities between the outdegree distribution plot of Stack Exchange forums and that of Yahoo Answers [6]. This can be explained by the differences in the way the two graphs were constructed. The asker-answerer graph in [6] has one edge for each reply to a question, which causes a broader distribution of outdegree in the users. In contrast, only one edge is created in our graph for a particular question, even when there are multiple answers for that question, as we only take the accepted answer into account. Thus, in this work the outdegree of nodes is proportional to the number of questions people ask. The power law observed in the outdegree distribution can be explained by the fact that users who ask a lot of questions will be more likely to ask questions in the future, since they are more likely to be “askers” rather than answerers.
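A minimal sketch of how the normalized distributions plotted in figure 5 might be computed is given below. It assumes the edge list carries one (asker, answerer) entry per accepted answer; the function name `normalized_degree_distributions` and the toy data are hypothetical.

```python
from collections import Counter


def normalized_degree_distributions(nodes, edges):
    """Return (indegree_dist, outdegree_dist), each {degree: fraction of nodes}."""
    indeg, outdeg = Counter(), Counter()
    for asker, answerer in edges:
        outdeg[asker] += 1     # questions by the asker that got an accepted answer
        indeg[answerer] += 1   # accepted answers written by the answerer
    n = len(nodes)
    # A Counter returns 0 for users that never appear on that side of an edge.
    in_hist = Counter(indeg[u] for u in nodes)
    out_hist = Counter(outdeg[u] for u in nodes)
    return ({k: c / n for k, c in in_hist.items()},
            {k: c / n for k, c in out_hist.items()})


toy_users = ["u1", "u2", "u3", "u4"]
toy_answers = [("u1", "u2"), ("u3", "u2"), ("u1", "u4")]
in_dist, out_dist = normalized_degree_distributions(toy_users, toy_answers)
print(in_dist, out_dist)
```

Degree 0 cannot be shown on a log-log plot, so that entry would need to be dropped or handled separately before plotting.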

Fig. 6. Motif Analysis (x-axis: motif; y-axis: normalized z-score).

C. Ego Network Analysis

We constructed the ego networks for each forum by randomly sampling 100 nodes, excluding self-looped nodes. The most prominent ego network patterns are presented in figure 4. The ego networks of AskUbuntu consist mostly of simple patterns with one central node having only outgoing edges, whereas those of the other forums have many more patterns with intertwined edges. For example, in Movies and UX, we see that the neighbors of one node also point to each other, forming a hairball-like structure. The patterns observed for AskUbuntu are similar to those for programming in Yahoo Answers [6], and those observed in the other forums are similar to those for marriage and wrestling. Thus, the typical differences observed between fact-based and discussion-based forums still emerge despite the “no-discussion” policy.
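The sampling step for the ego networks could look roughly like the sketch below. The definition of an ego network used here (the ego, its in- and out-neighbors, and every edge among them) is an assumption, as are the function name `sample_ego_networks` and the fixed random seed.

```python
import random


def sample_ego_networks(nodes, edges, k, seed=0):
    """Sample up to k nodes without self-loops; return {ego: set of edges}."""
    edges = set(edges)
    self_looped = {u for u, v in edges if u == v}
    candidates = sorted(set(nodes) - self_looped)
    rng = random.Random(seed)
    egos = rng.sample(candidates, min(k, len(candidates)))

    networks = {}
    for ego in egos:
        # The ego plus everyone it points to or is pointed to by.
        members = {ego}
        for u, v in edges:
            if u == ego:
                members.add(v)
            elif v == ego:
                members.add(u)
        # Keep every edge whose endpoints both lie inside the ego network.
        networks[ego] = {(u, v) for u, v in edges
                         if u in members and v in members}
    return networks


toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("c", "b"), ("a", "c"), ("d", "d")]
ego_nets = sample_ego_networks(toy_nodes, toy_edges, k=100)
print(sorted(ego_nets))
```

In the toy graph, `d` has a self-loop and is therefore never chosen as an ego.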

D. Motif Analysis

We imported our graph into the tool mentioned in [11] and used it to perform motif analysis on the graphs for the different forums. Using the tool, we enumerated all triads, and then for each motif type, the z-score was calculated by constructing several random graphs of the same size and comparing the frequency of occurrence of that type to its frequency in the random graphs.

The formula for the z-score is the following:

$$Z_i = \frac{x_i - \mu}{\sigma}$$

where $x_i$ is the fraction of motif type $i$ in the forum graph, $\mu$ is the average fraction of motif type $i$ in the random graphs, and $\sigma$ is the standard deviation across the randomized graphs. We then normalized all z-scores by dividing them by the maximum $Z_i$ observed across all forum graphs, similar to [6].

The resultant motif profile is provided in figure 6. It can be seen that the profiles for Movies, UX and History (which are discussion-based topics) are similar to those for wrestling and marriage, with almost all the motifs having normalized z-scores that are close to zero.
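To make the z-score and normalization steps concrete, here is a small sketch with made-up numbers. The motif fractions and forum labels are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def motif_z_scores(observed, random_samples):
    """Z_i = (x_i - mu) / sigma for each motif type i."""
    z = {}
    for motif, x in observed.items():
        samples = random_samples[motif]
        mu, sigma = mean(samples), pstdev(samples)
        z[motif] = (x - mu) / sigma
    return z


def normalize(z_by_forum):
    """Divide every z-score by the largest z-score seen in any forum."""
    z_max = max(z for scores in z_by_forum.values() for z in scores.values())
    return {forum: {m: z / z_max for m, z in scores.items()}
            for forum, scores in z_by_forum.items()}


# Hypothetical fractions of motif type 36 in one forum graph and in four
# randomized graphs of the same size.
z = motif_z_scores({36: 0.30}, {36: [0.10, 0.20, 0.10, 0.20]})
print(z)

# Hypothetical z-scores for two forums, normalized jointly.
print(normalize({"A": {36: 3.0, 6: -1.5}, "B": {36: 1.5}}))
```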

However, the motif profile for the AskUbuntu forum is different. In particular, we observe that types 36 (one user answers two users’ questions, where the two questioners don’t interact with each other) and 38 (one user answers two users’ questions, where one of those questioners answers the other user’s question) are significantly over-represented, whereas types 6, 12, 78, and 164 are under-represented. In these ways, the profile for AskUbuntu is similar to that of the Yahoo Answers programming forum, which was fact-based.

As suggested by existing literature [6], [7], in factual topics such as programming, knowledge is hierarchical, which means that there are top experts, intermediates, and beginners. In most cases, experts can answer both the intermediate and novice users, and intermediate users can answer novice users at times (type 38), and only expert users can answer difficult questions (type 36). The cases where (1) experts can only answer one of the lower-expertise users but not the other (types 6 and 12), (2) pairs of users ask and answer each other’s questions (type 78), and (3) experts are answered by a lower-expertise user (type 164) are rare due to the variance in users’ knowledge levels.

To further study the reason behind the difference in motifs across forums, we plotted the distribution of users’ reputations in figure 7. We see that the proportion of intermediate users is higher in discussion-based forums than in fact-based forums, which explains the higher representation of flat-hierarchy interactions (e.g. users of similar expertise level help each other out) in the motifs of Movies, UX, and History.

E. Correlation Analysis: Reputation and Graph Metrics

We measured the correlation between various graph metrics and reputation across different forums. We took the top 100 users ordered by each metric and measured the correlation between the ranking produced by that metric and the ranking produced by the users’ reputations, using Spearman’s rho (ρ) and Kendall’s tau (τ) rank correlation coefficients. The formulas for these coefficients are reproduced below.

Fig. 7. Reputation Histogram. Panels: AskUbuntu, History, UX and Movies; x-axis: reputation (log scale); y-axis: number of users.

$$\tau = \frac{(\#\,\text{concordant pairs}) - (\#\,\text{discordant pairs})}{\frac{1}{2} n (n - 1)}$$

where, given $n$ pairs $(x_1, y_1)$ to $(x_n, y_n)$, two pairs $(x_i, y_i)$ and $(x_j, y_j)$ are concordant if their ranks agree in order (for example, $x_i > x_j$ and $y_i > y_j$) and discordant otherwise, and

$$\rho = 1 - \frac{6 \sum d_i^2}{n (n^2 - 1)}$$

where $d_i = x_i - y_i$ is the difference between ranks.
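The two coefficients above translate directly into code. The sketch below assumes both inputs are tie-free rank lists of equal length; the function names and the example rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(x, y):
    """tau = (concordant - discordant) / (n(n - 1) / 2), assuming no ties."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        # Both differences share a sign: concordant; otherwise discordant.
        score += 1 if (x[i] - x[j]) * (y[i] - y[j]) > 0 else -1
    return score / (0.5 * n * (n - 1))


def spearman_rho(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)) for tie-free rank lists."""
    n = len(x)
    d_squared = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical ranks of five users by reputation and by some graph metric.
by_reputation = [1, 2, 3, 4, 5]
by_metric = [2, 1, 3, 5, 4]
print(kendall_tau(by_reputation, by_metric))
print(spearman_rho(by_reputation, by_metric))
```

In this example two of the ten pairs are discordant, so tau is (8 − 2) / 10 = 0.6, and the squared rank differences sum to 4, so rho is 1 − 24 / 120 = 0.8.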

ExpertiseRank (PageRank): The ExpertiseRank [2] of a user $A$, $ER(A)$, is calculated using the equation below:

$$ER(A) = (1 - d) + d \left( \frac{ER(U_1)}{C(U_1)} + \dots + \frac{ER(U_n)}{C(U_n)} \right)$$

where $U_1, \dots, U_n$ are the users whose questions $A$ answered, $d$ (with $0 \le d \le 1$) is the damping factor, and $C(U_i)$ is the total number of users helping $U_i$.

In our accepted answers graph, $C(U_i)$ corresponds to the outdegree of node $i$, so ExpertiseRank is equivalent to PageRank on our graph. We observed (Figure 8) that ExpertiseRank has a higher correlation to reputation in discussion-based forums than in AskUbuntu.
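The ExpertiseRank recurrence can be evaluated by simple fixed-point iteration, as in the sketch below. The damping factor of 0.85, the iteration count, and the function name `expertise_rank` are assumptions for illustration; the text does not state the values actually used.

```python
def expertise_rank(nodes, edges, d=0.85, iterations=50):
    """Iterate ER(A) = (1 - d) + d * sum(ER(U) / C(U)) over askers U of A.

    edges are (asker, answerer) pairs; C(U) is the number of distinct
    users who helped U, i.e. the outdegree of U.
    """
    edges = set(edges)
    helpers = {u: 0 for u in nodes}       # C(U)
    askers_of = {a: [] for a in nodes}    # users whose questions A answered
    for asker, answerer in edges:
        helpers[asker] += 1
        askers_of[answerer].append(asker)

    rank = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        rank = {a: (1 - d) + d * sum(rank[u] / helpers[u] for u in askers_of[a])
                for a in nodes}
    return rank


# Hypothetical toy graph: two novices whose questions one expert answered.
toy_rank = expertise_rank(["n1", "n2", "e"], [("n1", "e"), ("n2", "e")])
print(toy_rank)
```

In the toy graph the novices settle at 1 − d = 0.15 and the expert at 0.15 + 0.85 × (0.15 + 0.15) = 0.405.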

As discussed previously, in discussion-based forums, we expect that users of all expertise levels can have high participation and get their answers accepted. In contrast, in fact-based forums, it is difficult to get low-quality answers accepted, and reputation is decided by the quality of answers rather than the quantity. This finding is interesting because it suggests that PageRank, being more strongly correlated with reputation in discussion forums, might not be a good measure of expertise.

Weighted PageRank: In order to study the impact of incorporating the number of questions posed and answered between pairs of users, we computed weighted PageRank, where the edge weight is the number of answers accepted between the two interacting users. As shown in Figure 8, weighted PageRank showed only a marginal increase in correlation compared to ExpertiseRank. Therefore, we conclude that weighted PageRank is not meaningfully better at indicating reputation than ExpertiseRank.

HITS Scores and Degree Centrality: From Figure 8, we can see that HITS authority scores and degree centrality have the highest correlation to reputation, and HITS hub scores have poor correlation overall. For all three of these metrics, however, we did not detect a distinction between the correlations for AskUbuntu and those for the discussion-based forums. That is, there was no clear positive relation between the correlation coefficients and the subjectivity of a forum.
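For reference, the HITS and indegree computations can be sketched in a few lines of pure Python. This is an illustrative sketch on a hypothetical toy graph, assuming edges point from the asker to the accepted answerer, so that authorities correspond to answerers and hubs to askers.

```python
import math
from collections import Counter


def _normalize(scores):
    """Scale scores to unit Euclidean norm (no-op if all scores are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) scores for a list of (source, target) edges."""
    nodes = sorted({user for edge in edges for user in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at this node.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        auth = _normalize(auth)
        # Hub: sum of authority scores of the nodes this node points at.
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        hub = _normalize(hub)
    return hub, auth


# Hypothetical toy graph: carol answers alice, bob and dave; dave answers bob.
toy_edges = [("alice", "carol"), ("bob", "carol"), ("bob", "dave"), ("dave", "carol")]
hub, auth = hits(toy_edges)
indegree = Counter(dst for _, dst in toy_edges)  # number of users helped
```

Here `carol` has both the highest authority score and the highest indegree, while `bob`, who asked questions answered by two different users, has the highest hub score.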

We expected to see high correlation between reputation and degree centrality, simply because the number of accepted


[Figure panels: "Spearman's rho for various metrics" and "Kendall tau for various metrics"; bars for ExpertiseRank, Weighted PageRank, HITS (Authority), HITS (Hubs), and Degree Centrality, grouped by forum: AskUbuntu, Movies, History, UX]

Fig. 8. On the left and right: Spearman's Rho for graph metrics across forums; Kendall Tau for graph metrics across forums

[Figure panels: "AskUbuntu Answer Upvotes Distribution", "Movies Answer Upvotes Distribution", and "UX Answer Upvotes Distribution"; x-axis: Upvotes (0 to 350), y-axis: Answers count (log scaled)]

Fig. 9. From left to right: Answer upvotes histogram on AskUbuntu, Movies and UX forums.

           ExpertiseRank     Weighted Pagerank HITS (Authority)  HITS (Hubs)       Degree Centrality
Forum      SRho     KTau     SRho     KTau     SRho     KTau     SRho     KTau     SRho     KTau
AskUbuntu  0.5613   0.4004   0.5858   0.4194   0.8524   0.6739   0.2434   0.1669   0.8089   0.6202
Movies     0.7053   0.5285   0.7138   0.5354   0.8756   0.6962   0.6897   0.5091   0.8829   0.7022
History    0.7268   0.5386   0.7336   0.5467   0.7826   0.5891   0.3965   0.2820   0.8532   0.6642
UX         0.6505   0.4869   0.6269   0.4626   0.7637   0.5931   0.3994   0.2857   0.7746   0.5887

Fig. 10. Correlation of reputation with various metrics in the accepted answers graph for various forums (SRho: Spearman's rho; KTau: Kendall tau)

answers is expected to correlate with the number of users helped, which translates directly to indegree in our graph. However, as is the case with PageRank, we see a higher correlation with degree centrality in discussion-based forums than in fact-based forums.

In order to analyze these differences, we plotted histograms of upvotes on the accepted answers, as upvotes constitute a significant portion of most users' reputations. The plots are shown in Figure 9. We scaled the x-axis in all three histograms so that they have the same limits. We observed that a significant proportion of answers had a high upvote count in the AskUbuntu forum. On the other hand, in the Movies forum, we observed that most of the answers have low and almost equal upvote counts. The UX forum exhibits anomalous behavior, because its distribution is similar to that of AskUbuntu although motif analysis seems to suggest it is discussion-based.

VI. CONCLUSION AND FUTURE WORK

A. Conclusion

Despite the no-discussion policy, it is clear that there are significant differences in structure between forums on topics that provoke discussion and those on topics that are inherently factual. Therefore, the no-discussion policy does not seem to affect the underlying structure of the graphs.

We also observed that, due to the structural differences in the graphs, some graph metrics correlate better with reputation in certain forums than in others, but there was not always a clear distinction between the fact-based and discussion-based forums. The inconsistency was explained by the difference in the distribution of upvotes, a crucial component of reputation, which did not depend on whether a forum is discussion-based. Since reputation is susceptible to factors other than the underlying graph structure, we suggest that the reputation system may not be a robust way to capture underlying network dynamics across all the forums.

B. Future Work

We note a couple of interesting avenues of exploration to further this study. In particular, we could consider collecting ground truth data on expertise by manual evaluation, and examine how good reputation is as a measure of expertise.

Given the ground truth data, we could also explore an alternative to reputation that will better correlate to true expertise. For example, we could construct the user-to-question bipartite graph, where there exists (1) an edge from a user to a question node if the user asks that question, and (2) an edge from a question node to the user if the user answers that question. In addition, the edge weights are the upvotes on the particular question or answer. By doing so, PageRank and/or HITS on the graph can take advantage of nonlocal information, and may have higher correlation to expertise.
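The proposed bipartite graph could be assembled along the following lines. This is only a sketch of the construction: the record layout (`id`, `asker`, `question_id`, `answerer`, `upvotes`) and the toy records are hypothetical and do not reflect the actual Stack Exchange data dump schema.

```python
def build_bipartite_edges(questions, answers):
    """Return {(source, target): weight} for the user/question bipartite graph.

    Nodes are tagged tuples such as ("user", name) or ("question", qid) so that
    user and question identifiers cannot collide.
    """
    edges = {}
    for q in questions:
        # (1) user -> question if the user asked it, weighted by question upvotes.
        key = (("user", q["asker"]), ("question", q["id"]))
        edges[key] = edges.get(key, 0) + q["upvotes"]
    for a in answers:
        # (2) question -> user if the user answered it, weighted by answer upvotes.
        key = (("question", a["question_id"]), ("user", a["answerer"]))
        edges[key] = edges.get(key, 0) + a["upvotes"]
    return edges


# Hypothetical toy records.
toy_questions = [{"id": 1, "asker": "alice", "upvotes": 4}]
toy_answers = [
    {"question_id": 1, "answerer": "carol", "upvotes": 10},
    {"question_id": 1, "answerer": "dave", "upvotes": 2},
]
bipartite = build_bipartite_edges(toy_questions, toy_answers)
```

The resulting weighted edge map could then be passed to a weighted PageRank or HITS routine, so that a path from `alice` through question 1 to `carol` carries both the question's and the answer's upvotes.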

VII. INDIVIDUAL CONTRIBUTIONS

Please assign equal scores to all our team members.

REFERENCES

[1] “Stack Overflow,” http://stackoverflow.com, Accessed: 2015-10-14.

[2] J. Zhang, M. S. Ackerman, and L. Adamic, “Expertise networks in online communities: Structure and algorithms,” in Proceedings of the 16th International Conference on World Wide Web, ser. WWW ’07. New York, NY, USA: ACM, 2007, pp. 221–230. [Online]. Available: http://doi.acm.org/10.1145/1242572.1242603

[3] “Six simple tips to get reputation fast on any Stack Exchange site,” http://meta.stackexchange.com/questions/17204/, Accessed: 2015-10-14.

[4] “The surest way to gain lots of reputation on Stack Overflow - ask questions,” http://meta.stackexchange.com/questions/33398, Accessed: 2015-10-14.

[5] “Stack Exchange Data Explorer,” http://data.stackexchange.com/stackoverflow/, Accessed: 2015-10-14.

[6] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman, “Knowledge sharing and yahoo answers: Everyone knows something,” in Proceedings of the 17th International Conference on World Wide Web, ser. WWW ’08. New York, NY, USA: ACM, 2008, pp. 665–674. [Online]. Available: http://doi.acm.org/10.1145/1367497.1367587

[7] K. K. Nam, M. S. Ackerman, and L. A. Adamic, “Questions in, knowledge in?: A study of naver’s question answering community,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ser. CHI ’09. New York, NY, USA: ACM, 2009, pp. 779–788. [Online]. Available: http://doi.acm.org/10.1145/1518701.1518821

[8] A. Anderson, D. Huttenlocher, J. Kleinberg, and J. Leskovec, “Discovering value from community activity on focused question answering sites: A case study of stack overflow,” in Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’12. New York, NY, USA: ACM, 2012, pp. 850–858. [Online]. Available: http://doi.acm.org/10.1145/2339530.2339665

[9] “Internet Archive: Stack Exchange Data Dump,” https://archive.org/details/stackexchange/, Accessed: 2015-10-14.

[10] A.-L. Barabasi and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999. [Online]. Available: http://www.sciencemag.org/cgi/content/abstract/286/5439/509

[11] S. Wernicke and F. Rasche, “Fanmod: a tool for fast network motif detection,” Bioinformatics, vol. 22, no. 9, pp. 1152–1153, 2006. [Online]. Available: http://bioinformatics.oxfordjournals.org/content/22/9/1152.abstract