[ieee 2010 international conference on advances in social networks analysis and mining (asonam 2010)...


Page 1: [IEEE 2010 International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2010) - Odense, Denmark (2010.08.9-2010.08.11)] 2010 International Conference on Advances

An Analysis of Communities in Different Types of Online Forums

Mikołaj Morzy
Institute of Computing Science
Poznan University of Technology
Piotrowo 2, 60–965 Poznan, Poland

Email: [email protected]

Abstract: The most important feature of Internet forums is their social aspect. Many forums are active for a long period of time and attract a group of dedicated users, who build a tight social community around a forum. With the great abundance of forums devoted to every possible aspect of human activity, such as politics, religion, sports, technology, entertainment, economy, fashion, and many more, users are able to find a forum that perfectly suits their needs and interests. In this paper we introduce a micro-community-based model for descriptive characterization of Internet forums. We show how the simple concept of a micro-community can be used to quantitatively assess the openness and durability of an Internet forum. We also show that our model is capable of producing a taxonomy of Internet forums using an unsupervised clustering method. We present the micro-community model and the set of basic statistics, and we apply the model to several real-world online forums to experimentally verify the correctness and robustness of the model.

I. INTRODUCTION

An Internet forum is a Web application for publishing user-generated content in the form of a discussion. Usually, the term forum refers to the entire community of users. Discussions concerning particular subjects are called topics or threads. Messages posted to a forum can be displayed either chronologically or using threads of discussion. Most forums are limited to textual messages with some multimedia content embedded (such as images or Flash objects). Internet forum systems also provide sophisticated search tools that allow users to search for messages matching given search criteria or to limit the search to particular threads or subforums.

The type of Internet forum under consideration influences the characteristics of the forum. Participation in an Internet forum is tantamount to participation in an established social community defined by the subject of the forum. The degree of coherence of the community may vary from very strict (a closed group of experts who know each other), through moderate (a semi-open group consisting of a core of experts and a cloud of visitors), to loose (a fully open group of casual contributors who participate sporadically in selected topics). The degree of coherence indicates the information value of the forum. Open forums are the least likely to contain interesting and valuable knowledge content. These forums are dominated by random visitors, and sometimes attract a small group of habitual guests who tend to come back to the forum on a regular basis. Discussions on open forums are often shallow, emotional, and inconsistent, lacking discipline and manners. Open forums rarely contain useful practical knowledge or specialized information. On the other hand, open forums are the best place to analyze controversy, emotionality, and social interactions between participants of the discussion. Their spontaneous and impulsive character encourages users to state their opinions openly, so open forums may be perceived as the main source of information about the attitudes and beliefs of John Q. Public. On the opposite side lie closed specialized forums. These forums provide high quality knowledge on a selected subject and are characterized by discipline, consistency, and credibility. Users are almost always well known to the community, random guests are very rare, and users take care to maintain their status within the community by providing reliable answers to submitted questions. Closed forums account for a small fraction of the available Internet forums. The majority of forums are semi-open forums that allow both registered and anonymous submissions. Such forums may be devoted to a narrow subject, but may also cover a broad range of topics. Usually, such a forum attracts a group of dedicated users, who form the core of the community, but casual users are also welcome. These forums are a compromise between the strictly closed specialized forums and the totally open forums.

Research supported by the Polish Ministry of Science grant N N516 371236

This paper focuses on means to quantitatively measure Internet forum communities. We employ a simple model of micro-communities to observe how groups of users gradually change over time, with new users joining micro-communities while other users leave them. These small changes in the contents and structures of micro-communities allow us to infer various macro-measures of Internet forums. We identify key statistics that can be used to characterize different types of Internet forums, and we verify our model by using an unsupervised data mining algorithm (a hierarchical clustering algorithm) to recreate the taxonomy of a set of Internet forums based on the contents of our model.

The main contributions of the paper are the following:
∙ the introduction of the micro-community model,
∙ the identification of key features that can be used to characterize an Internet forum community,
∙ the validation of the presented model using a hierarchical clustering algorithm on a set of Internet forums.

2010 International Conference on Advances in Social Networks Analysis and Mining

978-0-7695-4138-9/10 $26.00 © 2010 IEEE

DOI 10.1109/ASONAM.2010.88

341



A. Organization of the paper

The paper is organized as follows. Section II contains an overview of the related work. In Section III we introduce the micro-community model for quantifying the evolution of social networks. We present the notion of a micro-community and we list basic properties of micro-communities measured under the model. Section IV presents statistics gathered during Internet forum crawling that are used to measure and compare Internet forums. These statistics may be further used as input features for clustering algorithms. We report on the initial findings in clustering of Internet forums in Section V. The paper concludes in Section VI with a summary of findings and a future work agenda.

II. RELATED WORK

Much research has been conducted on text mining and knowledge discovery from unstructured data. An interested reader may find a detailed summary of recent findings in this domain in [6] and [19]. In addition, much work has been done on statistical natural language processing. Statistical methods for text mining are described and discussed in detail in [11].

Analysis of threaded conversations, which are the predominant pattern of communication in the contemporary Web, is an actively researched domain [9], [15]. In particular, many proposals have been submitted to derive social roles solely from the structural patterns of conversations. Examples of earlier proposals include [23], [18], [7]. A thorough overview of structural patterns associated with particular social roles, which can be used as structural signatures, can be found in [16]. Most of the recent work has been performed on the basis of social network analysis methods [10], [2], [5], but the investigation of role typology has also been an important challenge in sociology [14], [13]. Recently, more attention has been given to the identification of social roles that are not general, but specific to online communities. The existence of local experts, trolls, answer people, fans, conversationalists, etc. has been verified [4], [8], [17], [12]. Moreover, the benefits of being able to deduce the social role of an individual without having to analyze the contents generated by that individual are becoming apparent [20], [21], [22].

III. MICRO-COMMUNITY MODEL

In this section we introduce the micro-community model for the analysis of social network evolution over time. Each Internet forum induces a community of authors who contribute to the forum. The authors are attracted by the main subject of the forum and we can reasonably expect that the authors share an interest in the topics discussed on the forum. The entire community of users of a particular forum is difficult to analyze, because even for narrowly defined subjects these communities tend to be large and diverse. Trends and regularities discovered at the level of the entire community may be too general to apply to individual authors. Hence, we have decided to discover smaller building blocks of communities, namely, micro-communities of Internet forum users. The rationale behind the division of the community into micro-communities is that smaller micro-communities are more consistent and cohesive.

The authors contributing to a forum create particular bonds when they write in the same topics. Co-authorship of a topic indicates shared interests, friendship or enmity, all of which express a sense of micro-community. When two authors consistently co-author topics, we treat this fact as a strong confirmation that these persons belong to a micro-community. The process of micro-community discovery consists of three stages. In the first stage we discover micro-groups of authors who co-author common topics within a particular period of analysis. A micro-community should be durable, i.e., it should span many periods of analysis. Therefore, in the second stage we try to identify, for each micro-group, its predecessor micro-groups discovered in previous periods of analysis. Finally, in the third stage similar micro-groups from subsequent periods of analysis are aggregated to form a micro-community.

A micro-group is a group of authors who participated in at least minsup common threads, where the threshold minsup is defined as $\mathit{minsup} = \lfloor \log_{10}(\mathit{tcnt}) \rfloor$ and tcnt denotes the thread count, i.e., the count of currently active threads on the forum. We have chosen to scale the minsup threshold logarithmically with respect to the number of threads, because the human capability of participating in different topics of discussion is almost constant and does not scale with the size of the forum. On the other hand, we observe that authors are slightly more active on large forums. This is why, instead of choosing a global constant, we modify the threshold marginally to improve its filtering power while not compromising its main goal.
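As a small illustration, the threshold can be computed as follows. This is only a sketch: the function name `minsup` and the sample thread counts are chosen here for illustration and are not taken from the paper's implementation.

```python
def minsup(thread_count: int) -> int:
    """Return floor(log10(thread_count)) for a positive thread count.

    The number of decimal digits minus one equals floor(log10(n)) for any
    positive integer n, which avoids floating-point rounding near powers of ten.
    """
    if thread_count < 1:
        raise ValueError("thread_count must be a positive integer")
    return len(str(thread_count)) - 1


# Illustrative thread counts (assumed values, not measurements of active threads).
for tcnt in (86, 1078, 17910):
    print(tcnt, minsup(tcnt))
```

Under this definition a forum needs ten times as many active threads to raise the threshold by one, and a forum with fewer than ten active threads gets a threshold of zero.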

The micro-groups are discovered using the Apriori algorithm [1]. Each topic is transformed into the set of users participating in the topic. These sets form the transaction database for the Apriori algorithm. The result of the Apriori algorithm, i.e., the set of frequent itemsets, is the set of micro-groups discovered in the forum. Let $M_i$ denote the $i$-th micro-group discovered by Apriori. Let $T_F$ denote the set of all topics on the forum $F$, and let $\mathrm{support}(M_i)$ denote the fraction of topics in which the members of the micro-group $M_i$ jointly participated:

$$\mathrm{support}(M_i) = \frac{|\{t \in T_F : M_i \subseteq t\}|}{|T_F|}$$

The Apriori algorithm works by joining frequent itemsets found in the $(i-1)$-th iteration to form candidate itemsets for the $i$-th iteration. We modify the Apriori algorithm to perform additional filtering of frequent itemsets using the following rule: if a micro-group $M_{ij} = \{u_a, u_b, u_c\}$ consisting of users $u_a, u_b, u_c$ has been created by merging micro-groups $M_i = \{u_a, u_b\}$ and $M_j = \{u_b, u_c\}$, then micro-groups $M_i$ and $M_j$ are removed from the result set if their occurrences are not independent, i.e., if

$$\mathrm{support}(M_i) \cdot \mathrm{support}(M_{ij} - M_i) < \mathrm{support}(M_{ij})$$
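The discovery step and the filtering rule can be sketched in plain Python. This is a minimal level-wise search written for illustration, not the authors' modified Apriori. The function names, the toy topics, and the choice to apply the rule to every subgroup obtained by dropping one member are assumptions. The threshold is applied to absolute topic counts, while the stored support is the fraction defined above.

```python
from itertools import combinations


def frequent_groups(topics, min_count):
    """Map each author set found in >= min_count topics to its relative support."""
    topics = [frozenset(t) for t in topics]

    def count(group):
        return sum(1 for t in topics if group <= t)

    authors = sorted({a for t in topics for a in t})
    current = [frozenset([a]) for a in authors if count(frozenset([a])) >= min_count]
    support, size = {}, 1
    while current:
        for group in current:
            support[group] = count(group) / len(topics)
        size += 1
        known = set(current)
        # Join step: unite frequent sets of the previous level into candidates.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == size}
        # Prune step: every (size-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in known for s in combinations(c, size - 1))
                   and count(c) >= min_count]
    return support


def filter_dependent(support):
    """Drop subgroups whose occurrence is not independent of the merged group."""
    removed = set()
    for group, s_group in support.items():
        if len(group) < 3:
            continue
        for member in group:
            sub, rest = group - {member}, frozenset([member])
            if sub in support and support[sub] * support[rest] < s_group:
                removed.add(sub)
    # Single authors are kept only as helpers for the test, not as micro-groups.
    return {g: s for g, s in support.items() if len(g) >= 2 and g not in removed}


# Toy data: each topic is the set of authors who posted in it.
topics = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"c", "d"}, {"d", "e"}]
all_groups = frequent_groups(topics, min_count=2)
micro_groups = filter_dependent(all_groups)
print(sorted(sorted(g) for g in micro_groups))
```

In the toy data the pairs are absorbed by the triple because each pair co-occurs with the third author more often than independence would predict.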

Each micro-group represents a set of cooperating authors during a single period of analysis. In other words, a micro-group is a static snapshot of inter-author cooperation at a given point in time. In order to extend this notion to a sequence of periods, we must identify predecessors and successors of each micro-group. To enhance the continuity of micro-groups we do not restrict ourselves to the immediately preceding period; instead, for each micro-group we look for continuity in the two preceding periods (i.e., a micro-group $M_i$ discovered at period $p_k$ might have been inactive during period $p_{k-1}$, but had been active during period $p_{k-2}$). We define the micro-group precedence relation $\prec$ in the following way:

$$M'_i \prec_{\alpha,\beta} M_i \iff \frac{|M'_i \cap M_i|}{|M'_i|} \geq \alpha \;\wedge\; \frac{|M_i - M'_i|}{|M_i|} \leq \beta$$

In other words, for a micro-group $M_i$ to be identified as the successor of the micro-group $M'_i$, the following must be true:
∙ $M_i$ must contain at least a fraction $\alpha$ of the members of the preceding micro-group $M'_i$, and
∙ new members must account for no more than a fraction $\beta$ of $M_i$.

The set of all predecessors of a micro-group $M_i$ until period $p_k$ is defined as $\mathrm{predecessors}(M_i, p_k) = \{M'_i \in p_{k-1} \cup p_{k-2} : M'_i \prec M_i\}$.
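The precedence test and the predecessor lookup translate directly into set operations. The sketch below is illustrative only: the function names, the list-of-periods data layout, and the values chosen for alpha and beta are assumptions, since the text does not state the thresholds that were used.

```python
def precedes(prev_group, group, alpha, beta):
    """Return True if prev_group precedes group under thresholds alpha and beta."""
    prev_group, group = set(prev_group), set(group)
    if not prev_group or not group:
        return False
    retained = len(prev_group & group) / len(prev_group)  # share of old members kept
    new_share = len(group - prev_group) / len(group)      # share of new members
    return retained >= alpha and new_share <= beta


def predecessors(group, periods, k, alpha, beta):
    """Collect predecessors of a group found in period k from periods k-1 and k-2.

    periods is a list in which periods[i] holds the micro-groups of period i.
    """
    candidates = []
    for idx in (k - 1, k - 2):
        if idx >= 0:
            candidates.extend(periods[idx])
    return [g for g in candidates if precedes(g, group, alpha, beta)]


# Hypothetical micro-groups over three periods; alpha = beta = 0.5 is an assumed setting.
periods = [
    [{"a", "b", "c"}],         # period 0
    [{"x", "y"}],              # period 1: the a-b group was inactive here
    [{"a", "b", "d"}],         # period 2
]
found = predecessors({"a", "b", "d"}, periods, k=2, alpha=0.5, beta=0.5)
print(found)
```

The lookup finds the group from period 0 even though nothing matching it was active in period 1, which is the two-period continuity described above.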

Fig. 1. Micro-groups and micro-communities

Micro-groups form a forest of multitrees with respect to the precedence relation $\prec$. Edges in the multitrees represent the direction of the precedence relation and each level represents a single period of analysis. An example of such a forest is depicted in Figure 1. Following the example, there are three levels in the forest representing three consecutive periods of analysis. All authors who belong to micro-groups constituting a single multitree are aggregated to a single micro-community. Let $T$ denote a single multitree present in the forest $F$ resulting from the precedence relation $\prec$. A micro-community $MC_i$ is defined as

$$MC_i = \{u_j : u_j \in \bigcup_{M_i \in T} M_i\}$$

Again, using the example forest depicted in Figure 1, we can observe two micro-communities. The first micro-community consists of users $\{u_w, u_x, u_y, u_z\}$ and the second micro-community consists of users $\{u_a, u_b, u_c, u_d, u_e, u_f\}$.
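Aggregating micro-groups into micro-communities amounts to finding the connected pieces of the precedence forest and taking the union of their members. The sketch below uses a small union-find structure. The group identifiers, the edges, and the memberships are hypothetical and do not reproduce the layout of Figure 1.

```python
def micro_communities(groups, edges):
    """Union the authors of all micro-groups linked by precedence edges.

    groups maps a micro-group id to its set of authors; edges is an iterable
    of (predecessor_id, successor_id) pairs.
    """
    parent = {gid: gid for gid in groups}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pred, succ in edges:
        parent[find(pred)] = find(succ)

    communities = {}
    for gid, authors in groups.items():
        communities.setdefault(find(gid), set()).update(authors)
    return sorted(communities.values(), key=sorted)


# Hypothetical forest with two multitrees.
groups = {
    "g1": {"u_w", "u_x"},
    "g2": {"u_w", "u_x", "u_y"},
    "g3": {"u_x", "u_y", "u_z"},
    "g4": {"u_a", "u_b"},
    "g5": {"u_b", "u_c"},
}
edges = [("g1", "g2"), ("g2", "g3"), ("g4", "g5")]
result = micro_communities(groups, edges)
print(result)
```

Note that `u_w` and `u_z` end up in the same micro-community although they never share a micro-group, because the chain g1, g2, g3 links them.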

Micro-communities discovered using the above described procedure serve as the community model for the entire forum. They group authors who share common interests, discuss similar subjects and (probably) know each other, at least virtually. Due to the transitivity of the precedence relation $\prec$ a situation may occur when two authors $u_q$ and $u_r$ are aggregated to a micro-community despite never having participated in a common topic. This may happen if a micro-community is very durable and lasts during several subsequent periods, undergoing a gradual exchange of members at the same time. Nevertheless, we still believe that it is legitimate to put such users into a single micro-community, because the transitivity of the precedence relation $\prec$ implies the similarity of interests of such users. In the next section we show how micro-communities discovered in Internet forum data can be used to extract statistics that are further used to quantitatively describe the evolution of online communities.

IV. STATISTICS

The data have been collected using a specialized crawler created on top of the popular WebSphinx library. We have chosen 34 different Internet forums as targets for the analysis. The set of selected forums is very diverse, ranging from general purpose discussion boards (student news, celebrity gossip), through more focused platforms (conservative Christians, photography), up to highly specialized and narrowly defined ones (mushroom pickers). In order to measure the dynamics of Internet forum change over time we have selected a set of statistics. The values of the statistics describing the chosen Internet forums are presented in Table I. Some statistics utilize the concept of a period, which is a time interval between subsequent forum snapshots, i.e., between subsequent samplings of the forum. The length of the interval depends on the popularity and characteristics of the forum. Very large and dynamic forums should be sampled frequently, whereas small community-driven forums may be sampled less frequently. We have decided to compute the length of the period independently for each forum, using the following assumptions: a period cannot be shorter than 3 days and nights and each period must contain at least 20 posts (as computed from the average daily activity of the forum). Each period is rounded to a multiple of 24 hours. However, periods must not be disjoint, because disjoint periods would not permit the continuity of events at both ends of a period (e.g., the continuity of a micro-community's existence). Therefore, after computing the initial length of the period, this length is extended by 40% at each end of the period to form the final period that overlaps with its predecessor and successor. In this way, each event during the lifetime of a forum takes place within at most two periods, each period is at least 6 days and nights long, and each pair of consecutive periods shares 40% of common events.
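The period computation can be sketched as below. The function name and parameters are illustrative, and two details are assumptions rather than statements from the text: the base length is rounded up to whole days, and the extended length is reported without further rounding. The stated minimum of 6 days would be consistent with rounding the extended length of the shortest period (5.4 days) up to a whole day, but the text does not spell this out.

```python
import math


def period_length_days(avg_posts_per_day, min_days=3, min_posts=20, extension=0.4):
    """Return (base_days, extended_days) for a forum's sampling period.

    base_days is the shortest whole number of days that is at least min_days
    and is expected to contain at least min_posts posts. extended_days adds
    the given extension (as a fraction of base_days) at each end.
    """
    if avg_posts_per_day <= 0:
        raise ValueError("avg_posts_per_day must be positive")
    days_for_posts = math.ceil(min_posts / avg_posts_per_day)  # assumed: round up
    base_days = max(min_days, days_for_posts)
    extended_days = base_days * (1 + 2 * extension)
    return base_days, extended_days


# Daily activity of a very busy and a very quiet forum (column E of Table I).
busy = period_length_days(273.43)
quiet = period_length_days(0.37)
print(busy, quiet)
```

A busy forum hits the 3-day floor, while a forum with 0.37 posts per day needs 55 days to collect 20 posts before the overlap is added.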

The statistics displayed in the columns of Table I are as follows:
(A) total number of posts submitted to the forum,
(B) total number of threads of discussion on the forum,
(C) average number of posts per thread of discussion,
(D) age of the forum in days since forum creation,
(E) average number of posts per day,
(F) average openness of the forum, where openness is defined as the number of new authors joining a micro-community and staying within the micro-community for more than 3 periods,
(G) average durability of the forum, where durability is defined as the number of micro-communities that last more than 3 periods,
(H) average number of micro-groups per author,
(I) average density of the forum, where density is defined as the percentage of popular micro-communities to which more than 1000 authors belong,
(J) activity of the forum, defined as the percentage of periods during which at least a single post has been submitted to the forum,
(K) percentage of active periods, defined as the ratio of the number of periods during which at least one micro-community has been started to the number of periods during which at least a single post has been submitted to the forum,
(L) average percentage of new authors in a micro-community,
(M) average percentage of authors who quit a micro-community.

TABLE I. INTERNET FORUM STATISTICS

| Forum name | A | B | C | D | E | F | G | H | I | J | K | L | M |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| iPhone | 255111 | 17910 | 14.24 | 933 | 273.43 | 11.99 | 0.36 | 0 | 0 | 0.98 | 0.9 | 0.27 | 0.21 |
| Heroes | 238441 | 1078 | 221.19 | 1074 | 222.01 | 4.57 | 5.15 | 0.01 | 0.01 | 0.99 | 0.97 | 0.13 | 0.1 |
| General Discussion and... | 212936 | 317 | 616.65 | 1874 | 104.31 | 16.76 | 1.31 | 0.01 | 0.02 | 0.97 | 0.46 | 0.76 | 0.67 |
| Nature & Landscapes | 195478 | 25557 | 8.33 | 2099 | 101.45 | 12.2 | 2.01 | 0 | 0.01 | 0.84 | 0.96 | 0.22 | 0.14 |
| Conservative Christians | 137060 | 1373 | 43.03 | 742 | 79.61 | 12.99 | 4.56 | 0.01 | 0.09 | 0.98 | 0.92 | 0.3 | 0.19 |
| Societies | 119968 | 86 | 1394.98 | 1789 | 67.06 | 6.1 | 1.54 | 0.01 | 0 | 0.94 | 0.38 | 0.18 | 0.1 |
| Chat | 95238 | 1407 | 67.69 | 1490 | 63.92 | 12.06 | 1.52 | 0.01 | 0 | 0.84 | 0.51 | 0.36 | 0.31 |
| Gilmore Girls | 86176 | 677 | 202.45 | 2321 | 59.05 | 5.52 | 2.06 | 0.01 | 0 | 0.88 | 0.74 | 0.13 | 0.11 |
| Sydney | 21076 | 114 | 119.9 | 1454 | 9.4 | 19 | 1.42 | 0.02 | 0.02 | 0.46 | 0.19 | 0.47 | 0.37 |
| Brisbane and Gold Coast | 13669 | 108 | 132.69 | 2062 | 6.95 | 13.14 | 0.7 | 0.02 | 0 | 0.62 | 0.29 | 0.33 | 0.22 |
| Honda Civic/Del Sol (1992 -... | 9487 | 763 | 19.28 | 2413 | 6.1 | 0 | 0 | 0.04 | 0 | 0.84 | 0.03 | 0.08 | 0 |
| Photography | 45515 | 38 | 622.13 | 1215 | 19.46 | 12.83 | 3.1 | 0.01 | 0.02 | 1 | 0.45 | 0.31 | 0.21 |
| iPhone Programming | 7399 | 441 | 5.84 | 504 | 5.11 | 0 | 2.13 | 0.05 | 0 | 0.68 | 0.07 | 0 | 0.04 |
| Mac Applications | 6214 | 1350 | 7.03 | 2300 | 4.12 | 0 | 0 | 0.04 | 0 | 0.56 | 0.03 | 0.06 | 0 |
| MacBytes.com News Discussion | 1146 | 134 | 8.55 | 1515 | 0.76 | 0 | 0 | 0.5 | 0 | 0.31 | 0.05 | 0 | 0 |
| Craig Anderton’s Sound, Studi... | 6889 | 219 | 31.46 | 1460 | 4.72 | 6 | 0.73 | 0.03 | 0 | 0.75 | 0.04 | 0.08 | 0.1 |
| Tunes & Tracks | 1621 | 55 | 34.07 | 1858 | 1.01 | 0 | 0 | 0.11 | 0 | 0.99 | 0.08 | 0.08 | 0 |
| Health and Fitness | 4682 | 26 | 180.08 | 1276 | 3.67 | 0 | 0 | 0.12 | 0 | 0.96 | 0.04 | 0 | 0 |
| Computing and Technology | 2574 | 33 | 49.12 | 927 | 1.75 | 0 | 0 | 0.19 | 0 | 0.63 | 0.14 | 0.13 | 0 |
| Missions & Evangelism | 3863 | 1037 | 5.99 | 2752 | 2.26 | 3 | 1.27 | 0.01 | 0 | 0.94 | 0.35 | 0.06 | 0.03 |
| Songwriting | 1874 | 138 | 27.99 | 2608 | 1.48 | 0 | 0 | 0.08 | 0 | 0.96 | 0.03 | 0.21 | 0.13 |
| Mushroom Cultivation | 861 | 40 | 21.53 | 2305 | 0.37 | 0 | 0 | 0 | 0 | 0.2 | 0 | 0 | 0 |
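The two derived activity statistics follow directly from the raw counts: C is A divided by B, and E is A divided by D. The short sketch below recomputes them for two forums using the values printed in Table I. The function names and the dictionary layout are illustrative.

```python
def posts_per_thread(total_posts, total_threads):
    """Statistic C: average number of posts per thread (A / B)."""
    return total_posts / total_threads


def posts_per_day(total_posts, age_days):
    """Statistic E: average number of posts per day (A / D)."""
    return total_posts / age_days


# Raw counts and reported averages for two forums, copied from Table I.
ROWS = {
    "iPhone": {"A": 255111, "B": 17910, "D": 933, "C": 14.24, "E": 273.43},
    "Heroes": {"A": 238441, "B": 1078, "D": 1074, "C": 221.19, "E": 222.01},
}

recomputed = {}
for name, row in ROWS.items():
    c = round(posts_per_thread(row["A"], row["B"]), 2)
    e = round(posts_per_day(row["A"], row["D"]), 2)
    recomputed[name] = (c, e)
    print(name, c, e, c == row["C"], e == row["E"])
```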

The last two features, average percentage of authors joining a micro-community and average percentage of authors quitting a micro-community, are highly correlated, so in subsequent experiments only the former has been kept. In the next section we show how these statistics can be used to quantitatively measure the characteristics of various online forums.
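The correlation between columns L and M can be checked with a plain Pearson coefficient. The sketch below uses only the 22 rows printed in Table I (the full data set of 34 forums is not available here), so it is a rough check on the printed subset rather than a reproduction of the authors' computation. The helper `pearson` is written out by hand to keep the example free of dependencies.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Columns L and M of Table I, in row order.
L_COL = [0.27, 0.13, 0.76, 0.22, 0.30, 0.18, 0.36, 0.13, 0.47, 0.33, 0.08,
         0.31, 0.00, 0.06, 0.00, 0.08, 0.08, 0.00, 0.13, 0.06, 0.21, 0.00]
M_COL = [0.21, 0.10, 0.67, 0.14, 0.19, 0.10, 0.31, 0.11, 0.37, 0.22, 0.00,
         0.21, 0.04, 0.00, 0.00, 0.10, 0.00, 0.00, 0.00, 0.03, 0.13, 0.00]

r = pearson(L_COL, M_COL)
print(round(r, 3))
```

A coefficient close to 1 on these rows supports dropping one of the two features before clustering.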

V. CLUSTERING OF INTERNET FORUMS

The statistics presented in Section IV have been fed to the hierarchical clustering algorithm in order to find subsets of similar forums. The analysis has been twofold. Initially, we have verified how basic static features related to forum activity can be used to group similar forums. Next, we have enhanced the model with parameters drawn from the discovered micro-communities. We have experimented with various clustering algorithms (K-Means, Hierarchical Clustering) and distance measures (Euclidean distance, Manhattan distance, power distance). Below we report on the results obtained from the most promising experiment configurations.

The hierarchical clustering model is created based on the mixture of static Internet forum features and dynamic features of micro-communities. Our clustering algorithm is presented with the following features: average openness of the forum, average durability of the forum, average number of micro-communities per user, average density of the forum, and the percentage of active periods. For this set of features the Manhattan distance measure with Ward's cutoff criterion produced too many isolated clusters. Therefore, we used the power distance measure $d(\bar{x}, \bar{y}) = \sqrt[3]{\sum_i (x_i - y_i)^2}$, which resulted in 7 clusters with the cutoff at 0.3 (see Figure 2). Let us now scrutinize a few discovered clusters.
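The power distance and a cutoff-based agglomerative grouping can be sketched as follows. Several things here are assumptions made for illustration: average linkage is used because the text does not name the linkage rule, the features are the raw F, G, H, I, K values of four rows of Table I without any normalization, and the cutoff of 5.0 is chosen for these raw values and is unrelated to the cutoff of 0.3 reported above.

```python
def power_distance(x, y):
    """Cube root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** (1.0 / 3.0)


def agglomerate(points, cutoff):
    """Merge the closest clusters (average linkage) until none is within cutoff."""
    clusters = [[name] for name in points]

    def linkage(c1, c2):
        total = sum(power_distance(points[a], points[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                         for i in range(len(clusters))
                         for j in range(i + 1, len(clusters)))
        if dist > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)


# Features (F, G, H, I, K) for four forums, copied from Table I.
FORUMS = {
    "Tunes & Tracks": (0, 0, 0.11, 0, 0.08),
    "Health and Fitness": (0, 0, 0.12, 0, 0.04),
    "Sydney": (19, 1.42, 0.02, 0.02, 0.19),
    "Brisbane and Gold Coast": (13.14, 0.7, 0.02, 0, 0.29),
}

clusters = agglomerate(FORUMS, cutoff=5.0)
print(clusters)
```

With these four rows the two low-openness forums and the two Australian city forums fall into separate groups, in line with the first and second clusters discussed in this section.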

The first cluster, containing forums such as Tunes & Tracks, Health and Fitness, or Computing and Technology, has the following characteristics: very low openness, very low durability, very low density, and a relatively large average number of micro-communities per user. Micro-communities either do not form within these forums or they exist for a very short period of time. The forums are active, but do not induce community bonds. We may characterize these forums as either expert forums, or classified forums.

Fig. 2. Hierarchical tree of Internet forums - with micro-communities

Let us now examine the second cluster, containing forums such as Sydney, Brisbane and Gold Coast, Honda Civic, or Photography. All these forums share the following characteristics: high openness, low durability, a small average number of micro-communities per user, a large number of new users joining existing micro-communities, and very diverse activity periods. These are vibrant and active forums, with many micro-communities forming ad hoc, but only a few surviving for a longer period of time. Users join these forums eagerly, but do not contribute or become attached to them (except for a small core set of animators). We refer to these forums as socially coherent forums consisting of members of a large, heterogeneous community (students, music lovers, photographers).

VI. CONCLUSIONS

In this paper we have presented our initial findings in the domain of Internet forum characterization. We have introduced a new micro-community model for quantitatively measuring Internet forum characteristics. We have verified our model by means of an unsupervised data mining algorithm, dividing a set of Internet forums into clusters based solely on statistics found in our model and manually verifying the consistency and cohesion of the resulting clusters. We believe that our model correctly captures the most important features of an Internet forum community and allows for automatic tagging of Internet forums based on the observed characteristics of the underlying communities.

The abundance of Internet forums, ranging from specialized to popular, makes the subject of mining Internet forums both interesting and very desirable. Internet forums hide enormous amounts of high quality knowledge generated by immense communities of users. Unfortunately, despite initial attempts to create a standard for forum data structure [3], these standards are not widely accepted, which makes the acquisition of forum data very difficult and laborious. The research presented in this paper is a step towards automatic knowledge extraction from these open repositories of knowledge. The statistics are fairly simple, but work surprisingly well in the real world. Rather than being a conclusive report, this paper serves as the starting ground for further research into the evolution of online communities. In particular, we intend to focus our attention on the evolution of individual users and to discover how microevolution on the level of individual users informs the macroevolution of the entire community of users.

REFERENCES

[1] Agrawal, R., and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. 20th International Conference on Very Large Data Bases, Santiago de Chile, Chile (pp. 487-499). Morgan Kaufmann.

[2] Brandes, U., and Erlebach, T. (2005). Network Analysis, Methodological Foundations. Springer.

[3] Breslin, J.G., Kass, R., and Bojars, U. (2007). The Boardscape: Creating a Super Social Network of Message Boards. International Conference on Weblogs and Social Media ICWSM’2007, Boulder, Colorado, USA.

[4] Burkharter, B., and Smith, M. (2004). Inhabitant’s Uses and Reactions to Usenet Social Accounting Data. In D. Snowdon, E. F. Churchill, & E. Frecon, Inhabited Information Spaces (pp. 291-305). Springer.

[5] Carrington, P. J., Scott, J., and Wasserman, S. (2005). Models and Methods in Social Network Analysis. Cambridge University Press.

[6] Feldman, R., and Sanger, J. (2006). The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press.

[7] Fisher, D., Smith, M., and Welser, H. T. (2006). You Are Who You Talk To: Detecting Roles in Usenet Newsgroups. 39th Annual Hawaii International Conference on System Sciences. Kauai: IEEE Computer Society.

[8] Golder, S. A. (2003). A Typology of Social Roles in Usenet. A thesis submitted to the Department of Linguistics. Harvard University.

[9] Gomez, V., Kaltenbrunner, A., and Lopez, V. (2008). Statistical analysis of the social network and discussion threads in Slashdot. 17th International Conference on World Wide Web (WWW’08). Beijing, China. ACM Press.

[10] Hanneman, R., and Riddle, M. (2005). Introduction to social network methods. University of California, Riverside.

[11] Manning, Ch. D., and Schuetze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

[12] Marcoccia, M. (2004). On-line Polylogues: Conversation Structure and Participation Framework in Internet Newsgroups. Journal of Pragmatics, 36 (1), pp. 115-145.

[13] Merton, R. K. (1968). Social Theory and Social Structure. New York: Free Press.

[14] Parsons, T. (1951). The Social System. Routledge & Kegan Paul Ltd.

[15] Shi, X., Zhu, J., Cai, R., and Zhang, L. (2009). User grouping behavior in online forums. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09). Paris, France. ACM Press.

[16] Skvoretz, J., and Faust, K. (2002). Relations, Species, and Network Structure. Journal of Social Structure, 3 (3).

[17] Turner, T. C., Smith, M., Fisher, D., and Welser, H. T. (2005). Picturing Usenet: Mapping Computer-Mediated Collective Action. Journal of Computer-Mediated Communication, 10 (4).

[18] Viegas, F. B., and Smith, M. (2004). Newsgroup Crowds and AuthorLines: Visualizing the Activity of Individuals in Conversational Cyberspaces. 37th Annual Hawaii International Conference on System Sciences (HICSS’04) - Track 4. Kauai: IEEE Computer Society.

[19] Weiss, S., Indurkhya, N., Zhang, T., and Damerau, F. (2004). Text Mining: Predictive Methods for Analyzing Unstructured Information. Springer.

[20] Wenger, E. (1999). Communities of Practice: Learning, Meaning, and Identity. Cambridge University Press.

[21] Wenger, E., and Snyder, M. (2000). Communities of Practice: The Organizational Frontier. Harvard Business Review, 139-145.

[22] Wenger, E., McDermott, R., and Snyder, W. S. (2002). Cultivating Communities of Practice: A Guide to Managing Knowledge. Boston: Harvard Business School Press.

[23] White, H. C., Boorman, S. A., and Breiger, R. L. (1976). Social Structure from Multiple Networks: 1. Blockmodels of Roles and Positions. American Journal of Sociology, 81 (4), pp. 730-780.
