Lehrstuhl Informatik 5 (Information Systems)
Prof. Dr. M. Jarke
LearningLayers
Contextualized versus Structural Overlapping Community Structures in Social Media
Mohsen Shahriari, Ying Li, Ralf Klamma
This slide deck is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Contextualized versus Structural Overlapping Communities in Social Media
Mohsen Shahriari, Sabrina Haefele, Ralf Klamma
Advanced Community Information Systems (ACIS)
RWTH Aachen University, Germany
{shahriari, haefele, klamma}@dbis.rwth-aachen.de
Chair of Computer Science 5, RWTH Aachen University
Outline
Research background
– Necessity of community analysis
– Community detection
Literature & Challenges
Research questions
Baselines & Proposed Methods
Dataset & Metrics
Results
Conclusion & Future Work
Research Background: How to Characterize Networks
Power law
– Relevant for social network analysis
– Presence of hubs
Small-world-ness
Motifs
– Synchronizability, cooperativity, stability and robustness may depend on motif structures
Community structure
– Overlapping community structure
– Also supports other applications
– Scales up information
Figure: Degree distribution of the CiteULike user-tag network (source: networkscience.wordpress.com)
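To make the power-law property concrete, the sketch below tabulates a degree distribution P(k) from an undirected edge list and estimates the exponent with the discrete maximum-likelihood approximation α ≈ 1 + n [Σ ln(k_i / (k_min − 1/2))]⁻¹ (Clauset, Shalizi and Newman). The function names and the choice of estimator are assumptions for illustration; the slides do not say how the CiteULike distribution was fitted.

```python
import math
from collections import Counter

def degree_sequence(edges):
    """Return the degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return list(deg.values())

def degree_distribution(degrees):
    """Map each degree k to the fraction of nodes with that degree, P(k)."""
    n = len(degrees)
    return {k: c / n for k, c in sorted(Counter(degrees).items())}

def powerlaw_alpha(degrees, k_min=1):
    """Discrete MLE approximation of the power-law exponent for k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)
```

Hubs show up as the rare large-k entries of `degree_distribution`; a heavy tail yields a small α (typically between 2 and 3 for many real networks).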
Source: Milgram experiment “The small world problem”
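A small-world network typically combines short shortest-path distances with high local clustering. The following sketch measures both on a graph stored as a dict of neighbor sets (a representation chosen here for illustration): the mean hop distance over reachable node pairs via breadth-first search, and the mean local clustering coefficient.

```python
from collections import deque
from itertools import combinations

def bfs_lengths(adj, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total, pairs = 0, 0
    for s in adj:
        for t, d in bfs_lengths(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs

def average_clustering(adj):
    """Mean local clustering coefficient (0 for nodes with degree < 2)."""
    coeffs = []
    for v, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2 * links / (k * (k - 1)))
    return sum(coeffs) / len(coeffs)
```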
Source: Taken from networkscience.wordpress.com
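The simplest non-trivial motif in an undirected network is the triangle. As a minimal illustration (not a general motif-counting tool), the sketch below counts triangles by visiting each one only from its smallest node, assuming node labels are comparable and the graph is a dict of neighbor sets.

```python
from itertools import combinations

def count_triangles(adj):
    """Count triangles (the complete 3-node motif) in an undirected graph."""
    count = 0
    for v, nbrs in adj.items():
        # Consider only neighbors larger than v so each triangle is counted once.
        higher = sorted(u for u in nbrs if u > v)
        for a, b in combinations(higher, 2):
            if b in adj[a]:
                count += 1
    return count
```

For example, the complete graph on four nodes contains four triangles.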
Research Background: What Is an (Overlapping) Community?
Nodes are densely connected inside communities and sparsely connected between them (Girvan & Newman, 2002)
People with similar interests or needs (Preece, 2000)
Recent research: the overlaps between communities are dense (Yang & Leskovec, 2012)
Research Background: What Is an (Overlapping) Community?
In some networks, other definitions apply
– Signed social networks: density and structural balance theory (Doreian, 2004)
Communities are interpreted and defined differently
Figure: signed network with positive (+) and negative (−) edges
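In structural balance theory, a signed triangle is balanced when the product of its three edge signs is positive ("the enemy of my enemy is my friend"). The sketch below applies only this basic triangle rule; it is not Doreian's blockmodeling approach. Edges are keyed by `frozenset({u, v})`, a representation assumed here for illustration.

```python
from itertools import combinations

def triangle_balance(signed_edges):
    """Map each triangle (a, b, c) of a signed graph to True if balanced.

    signed_edges: dict frozenset({u, v}) -> +1 or -1.
    """
    nodes = sorted({v for e in signed_edges for v in e})
    result = {}
    for a, b, c in combinations(nodes, 3):
        sides = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(s in signed_edges for s in sides):
            product = 1
            for s in sides:
                product *= signed_edges[s]
            result[(a, b, c)] = product > 0
    return result
```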
Research Background: What Is an (Overlapping) Community?
Communities may form when people have ideas, innovations and thoughts to discuss
– Even when they do not know each other
Literature
Challenges regarding Content-Based Overlapping Community Detection (OCD)
Little is known about the significance of content
– Community events, e.g., releases in an open source developer network
– Correlation of content and structural properties of social media
Few methods detect overlapping community structures
– Most detect only disjoint community structures
Most methods are not suitable for thread-based data structures
– They need extensive tuning
Most approaches do not work on actual posts/contents
– They mainly use attributes/tags
Research Questions
How are structural properties such as the number of overlapping nodes, modularity and average community size affected by contextualized similarities among users in question & answer social platforms?
Can adding content improve the performance of structure-based algorithms?
Structural/Content-Based OCD Approaches
First, we introduce the baselines used in this work
– Disassortative Degree Mixing and Information Diffusion (DMID)
– Speaker-Listener Label Propagation Algorithm (SLPA)
– Stanoev, Smilkov and Kocarev (SSK)
– Algorithm by Li, Zhang, Liu, Chen and Zhang (CLIZZ)
Then, we introduce the proposed content-based methods
– Cost Function Optimization Clustering Algorithm (CFOCA)
– Term Community Merging Algorithm (TCMA)
– Combining content and structural values
Baseline Methods: Disassortative Degree Mixing and Information Diffusion (DMID)
Detecting the most influential nodes (leaders)
– Using the disassortative degree mixing property
– Row-normalizing the disassortativity matrix
– Performing a random walk
– Computing the local leadership value
– Combining degree and disassortativity values
Cascading behavior modeled as a network coordination game
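The leader-detection steps and the cascade listed above can be sketched in NumPy, but several details are assumptions the slide does not fix: the disassortativity weight of edge (i, j) is taken as |deg(i) − deg(j)|, a lazy random walk is used to avoid oscillation, local leadership is degree times the walk's value, a leader must beat all of its neighbors, and the cascade uses the textbook threshold rule of a network coordination game with a hypothetical threshold `q`. None of the names come from the DMID implementation.

```python
import numpy as np

def dmid_leaders(A, walk_steps=100):
    """Hypothetical sketch of DMID-style leader detection.

    A: symmetric 0/1 adjacency matrix without isolated nodes.
    Returns (leadership values, list of leader indices).
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    # Assumed disassortativity weight of edge (i, j): |deg(i) - deg(j)|.
    D = A * np.abs(deg[:, None] - deg[None, :])
    # Row-normalize; rows that are all zero stay zero.
    row_sums = D.sum(axis=1, keepdims=True)
    P = np.divide(D, row_sums, out=np.zeros_like(D), where=row_sums > 0)
    # Lazy random walk from the uniform distribution (laziness avoids oscillation).
    x = np.full(len(A), 1.0 / len(A))
    for _ in range(walk_steps):
        x = 0.5 * x + 0.5 * (x @ P)
    # Assumed local leadership: degree times disassortativity value.
    leadership = deg * x
    leaders = [i for i in range(len(A))
               if all(leadership[i] > leadership[j] for j in np.flatnonzero(A[i]))]
    return leadership, leaders

def coordination_cascade(A, leader, q=0.5):
    """Threshold cascade of a network coordination game seeded at one leader.

    A node adopts the leader's behavior once at least a fraction q of its
    neighbors have adopted it; the adopters form the leader's community.
    """
    A = np.asarray(A)
    adopted = {leader}
    changed = True
    while changed:
        changed = False
        for i in range(len(A)):
            if i in adopted:
                continue
            neighbors = np.flatnonzero(A[i])
            share = sum(int(j) in adopted for j in neighbors) / len(neighbors)
            if share >= q:
                adopted.add(i)
                changed = True
    return adopted
```

On a star graph the hub is the only leader and its cascade reaches every leaf. Overlaps arise when cascades seeded at different leaders reach the same node.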
Baseline Methods: Speaker-Listener Label Propagation Algorithm (SLPA)
Extension of the label propagation algorithm
– Nodes can take multiple labels
Idea: speaker-listener information propagation process (mimics human communication)
Nodes store the labels they receive in a memory
Steps:
1. Each node's memory is initialized with a unique label
2. Repeat until a user-defined number of iterations is reached:
   1. Select one node as listener
   2. Each neighbor (speaker) randomly selects a label from its memory and sends it
   3. The listener accepts one of the propagated labels according to a rule (e.g., the most popular label)
3. Post-processing phase to identify the communities
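The steps above can be sketched as follows. This version follows the common SLPA formulation: in each iteration every node acts as listener once in random order, a speaker sends a label drawn with probability proportional to its frequency in the speaker's memory, the listener keeps the most popular received label, and post-processing keeps labels whose relative frequency in a node's memory is at least `r`. The parameter names are illustrative, not taken from a specific implementation.

```python
import random
from collections import Counter, defaultdict

def slpa(adj, iterations=20, r=0.1, seed=0):
    """Speaker-listener label propagation sketch.

    adj: dict node -> list of neighbors (undirected graph).
    Returns dict label -> set of nodes (communities may overlap).
    """
    rng = random.Random(seed)
    # Step 1: each memory starts with a unique label (the node itself).
    memory = {v: [v] for v in adj}
    nodes = list(adj)
    # Step 2: propagation rounds; every node listens once per round.
    for _ in range(iterations):
        rng.shuffle(nodes)
        for listener in nodes:
            speakers = adj[listener]
            if not speakers:
                continue
            # Uniform choice over memory entries = frequency-proportional label.
            received = [rng.choice(memory[s]) for s in speakers]
            counts = Counter(received)
            top = max(counts.values())
            # Listening rule: most popular received label, ties broken randomly.
            best = [lab for lab, c in counts.items() if c == top]
            memory[listener].append(rng.choice(best))
    # Step 3: keep labels whose relative frequency is at least r.
    communities = defaultdict(set)
    for v, mem in memory.items():
        for lab, c in Counter(mem).items():
            if c / len(mem) >= r:
                communities[lab].add(v)
    return dict(communities)
```

Lowering `r` lets more labels survive post-processing, which produces more overlapping memberships.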
Baseline Methods: Stanoev, Smilkov and Kocarev (SSK)
An algorithm based on influence dynamics and membership computation
– Relationships between nodes and their influence matter more than direct connections
– Proxies among nodes are better established when triangles exist among them
Computing a transitive link matrix from both the adjacency matrix and triangle occurrences
Computing the membership of nodes to leaders
– Weighted average membership of neighbors
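A minimal sketch of the two SSK steps above, assuming the leaders are already known. The transitive link weighting w_ij = A_ij (1 + t_ij), where t_ij = (A²)_ij counts the triangles through edge (i, j), is a hypothetical choice that rewards triangles; the published SSK formula may differ. Memberships are then iterated as the weighted average of neighbors' memberships while leaders keep full membership in their own community.

```python
import numpy as np

def transitive_link_matrix(A):
    """Row-stochastic weights that reward edges closing triangles.

    Hypothetical weighting: w_ij = A_ij * (1 + (A @ A)_ij).
    Assumes no isolated nodes.
    """
    A = np.asarray(A, dtype=float)
    T = A * (1.0 + A @ A)
    return T / T.sum(axis=1, keepdims=True)

def ssk_memberships(A, leaders, iterations=100):
    """Membership of every node to each leader's community (rows sum to 1)."""
    W = transitive_link_matrix(A)
    k = len(leaders)
    M = np.full((len(W), k), 1.0 / k)
    M[leaders] = np.eye(k)
    for _ in range(iterations):
        M = W @ M                   # weighted average of neighbors' memberships
        M[leaders] = np.eye(k)      # leaders stay fully in their own community
    return M
```

For two triangles joined by one bridge, with one leader per triangle, a node adjacent only to its leader's triangle ends up with membership 0.9 in that leader's community, and the bridge node with 0.8.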
Baseline Methods: CLIZZ
Two-phase algorithm
– Identifying influential nodes based on their influence range
– Influence ranges are computed from shortest-path distances
– Computing membership values of nodes using an updating rule
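The two CLIZZ phases can be sketched as below under stated assumptions: leadership is a Gaussian-decayed sum exp(−(d/σ)²) over nodes within a hop range of ⌊3σ/√2⌋, a leader is at least as influential as every node in its range, and the updating rule averages memberships over each node's closed neighborhood while leaders stay fixed. These formulas are assumptions for illustration, not a transcription of the CLIZZ paper.

```python
import math
from collections import deque

def hop_distances(adj, source):
    """Breadth-first shortest-path (hop) distances from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def clizz_leaders(adj, sigma=1.0):
    """Phase 1: leadership from Gaussian-decayed influence within a range."""
    reach = math.floor(3 * sigma / math.sqrt(2))   # assumed influence range
    dist = {v: hop_distances(adj, v) for v in adj}
    in_range = {v: [u for u, d in dist[v].items() if 0 < d <= reach] for v in adj}
    leadership = {v: sum(math.exp(-(dist[v][u] / sigma) ** 2) for u in in_range[v])
                  for v in adj}
    # A leader is at least as influential as every node within its range.
    leaders = [v for v in adj
               if all(leadership[v] >= leadership[u] for u in in_range[v])]
    return leadership, leaders

def clizz_memberships(adj, leaders, iterations=50):
    """Phase 2: iteratively average memberships over closed neighborhoods."""
    k = len(leaders)
    M = {v: [0.0] * k for v in adj}
    for c, leader in enumerate(leaders):
        M[leader][c] = 1.0
    for _ in range(iterations):
        updated = {}
        for v in adj:
            if v in leaders:
                updated[v] = M[v]
                continue
            sums = [sum(M[u][c] for u in [v, *adj[v]]) for c in range(k)]
            total = sum(sums)
            updated[v] = [s / total for s in sums] if total > 0 else sums
        M = updated
    return M
```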
Proposed Content-Based Methods: Feature Creation Phase
Term matrix
– Constructed from the threads of each user
– Converted into tf-idf weights
Figure: threads are converted via tf-idf into a term matrix (illustrative values)
      w1     w2     w3     …
      0.23   0.5    0
      0.8    0      1
      0      1.2    0.59
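As a sketch of the feature creation phase, the code below builds a tf-idf term matrix, assuming each user's threads are concatenated into one token list and using one common tf-idf variant: length-normalized term frequency times ln(N / df). The slides do not specify tokenization or the exact weighting, so these are assumptions.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """docs: dict user -> list of tokens from that user's threads.

    Returns (sorted vocabulary, dict user -> list of tf-idf weights).
    """
    vocab = sorted({t for toks in docs.values() for t in toks})
    n_docs = len(docs)
    # Document frequency: in how many users' documents each term appears.
    df = Counter(t for toks in docs.values() for t in set(toks))
    matrix = {}
    for user, toks in docs.items():
        tf = Counter(toks)
        total = len(toks)
        matrix[user] = [
            (tf[t] / total) * math.log(n_docs / df[t]) if total else 0.0
            for t in vocab
        ]
    return vocab, matrix
```

A term used by every user gets weight 0, so the matrix emphasizes terms that distinguish users.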
Cost Function Optimization Clustering Algorithm (CFOCA)
Minimization of a cost function J based on cosine similarity
Centroids are updated using gradient descent
Modification for overlapping communities: threshold on the distance to other centroids
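A hedged sketch of the CFOCA idea, assuming the cost J = Σ_i (1 − cos(x_i, c_k(i))) over each row's most similar centroid. For unit-length x, the gradient of cos(x, c) with respect to c is x/|c| − (x·c) c/|c|³. The overlap rule (join every centroid within `overlap_threshold` of the closest one), the random initialization and all parameter names are assumptions, not the authors' code.

```python
import numpy as np

def cfoca_sketch(X, k, lr=0.1, iterations=100, overlap_threshold=0.1, seed=0):
    """Cluster rows of a term matrix by minimizing J = sum_i (1 - cos(x_i, c_k(i))).

    Returns a list of k (possibly overlapping) sets of row indices.
    Assumes every row of X has at least one nonzero entry.
    """
    rng = np.random.default_rng(seed)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)     # unit-length rows
    C = Xn[rng.choice(len(Xn), size=k, replace=False)].copy()
    for _ in range(iterations):
        Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
        assign = (Xn @ Cn.T).argmax(axis=1)               # most similar centroid
        for c in range(k):
            members = Xn[assign == c]
            if len(members) == 0:
                continue
            norm_c = np.linalg.norm(C[c])
            # dJ/dc = -(sum_i x_i / |c| - (sum_i x_i . c) c / |c|^3)
            grad = -(members.sum(axis=0) / norm_c
                     - (members @ C[c]).sum() * C[c] / norm_c ** 3)
            C[c] -= lr * grad                              # gradient descent step
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    dist = 1.0 - Xn @ Cn.T                                 # cosine distance
    best = dist.min(axis=1)
    # Overlap rule: join every community whose centroid is within
    # overlap_threshold of the closest centroid.
    return [set(np.flatnonzero(dist[:, c] <= best + overlap_threshold).tolist())
            for c in range(k)]
```

Setting `overlap_threshold=0` reduces the output to a (near) disjoint clustering; larger values enlarge the overlaps.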
Term Community Merging Algorithm (TCMA)
Two phases
– Compute one community per word
– Refine the communities using the overlap coefficient
Figure: example term matrix
      w1     w2     w3     …
      0.23   0.5    0
      0.8    0.76   1
      0      1.2    0.59
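The two TCMA phases can be sketched as follows, assuming a term community contains every user with a nonzero weight for that term and that the pair of communities with the highest overlap coefficient |A ∩ B| / min(|A|, |B|) is merged while it reaches a hypothetical `threshold`. Treating the rows of the example term matrix as users u1, u2 and u3 is also an assumption.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap coefficient |A & B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b))

def tcma_sketch(term_matrix, vocab, threshold=0.5):
    """term_matrix: dict user -> list of tf-idf weights aligned with vocab."""
    # Phase 1: one community per term, holding every user who uses that term.
    communities = []
    for j in range(len(vocab)):
        members = {u for u, row in term_matrix.items() if row[j] > 0}
        if members:
            communities.append(members)
    # Phase 2: repeatedly merge the most overlapping pair above the threshold.
    while True:
        best = None
        for i in range(len(communities)):
            for j in range(i + 1, len(communities)):
                oc = overlap_coefficient(communities[i], communities[j])
                if oc >= threshold and (best is None or oc > best[0]):
                    best = (oc, i, j)
        if best is None:
            return communities
        _, i, j = best
        absorbed = communities.pop(j)          # j > i, so index i stays valid
        communities[i] = communities[i] | absorbed
```

With the example rows, all three term communities merge into one. With `{"a": [1, 0], "b": [1, 1], "c": [0, 1]}` and `threshold=0.6`, the communities `{a, b}` and `{b, c}` stay separate and share the overlapping node `b`.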
Content-Based Weighting Method
Generate two weights from content
Use OCD algorithms such as DMID, SSK and CLiZZ to compute communities
Figure: threads, the resulting term matrix and a weighted user pair (r, s)
      w1     w2     w3     …
      0.23   0.5    0      …
      0.8    0      1      …
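The slide does not say how the content weights are computed. One plausible, hypothetical choice is to weight each structural edge (r, s) by the cosine similarity of the two users' tf-idf rows and then pass the weighted graph to a structural OCD algorithm. The sketch below shows only that weighting step.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def content_weighted_edges(edges, term_matrix):
    """Attach a content weight to each structural edge (r, s)."""
    return {(r, s): cosine_similarity(term_matrix[r], term_matrix[s])
            for r, s in edges}
```

For the two example rows, the pair (r, s) receives a weight of roughly 0.26, reflecting that the users share only term w1.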
Datasets and Metrics
Jmol dataset
– Forum discussions about a Java tool for molecular modeling of chemical structures
– Open source development
– 2002–2012
– Publicly available at https://github.com/rwth-acis/REST-OCDServices/wiki/Jmol-Dataset
Combined modularity
– Considers both content and density
Number of overlapping nodes and average community size to extract useful information
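The formula for the combined modularity is not given here, so the sketch below covers only what is well defined: the number of overlapping nodes and the average community size of a cover, plus the standard structural Newman modularity Q = Σ_c [L_c/m − (d_c/2m)²] of a disjoint partition, which a combined measure would extend with a content term.

```python
from collections import Counter

def overlapping_node_count(communities):
    """Number of nodes that belong to more than one community."""
    counts = Counter(v for c in communities for v in c)
    return sum(1 for n in counts.values() if n > 1)

def average_community_size(communities):
    """Mean number of members per community."""
    return sum(len(c) for c in communities) / len(communities)

def modularity(adj, partition):
    """Newman modularity of a disjoint partition of an undirected graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q = 0.0
    for c in partition:
        # Internal edges are seen from both endpoints, i.e. 2 * L_c.
        internal = sum(1 for v in c for u in adj[v] if u in c)
        degree_sum = sum(len(adj[v]) for v in c)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q
```

For two triangles joined by one bridge, splitting at the bridge gives Q = 5/14 ≈ 0.357.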
Similarity Costs versus Average Community Size
Releases 1, 10 and 11 have low content similarity
Release 6 has the highest content similarity
– Its average community size is also the highest
Similarity Costs versus Number of Overlapping Nodes
Releases 2, 3, 4 and 5 have high similarity and few overlapping nodes
Similarity costs are global measures
Similarity Costs versus Modularity
Inverse relation between content similarity and modularity
Average Community Size versus Releases
Content-based algorithms are useful when the structure of the network is missing
Content-based algorithms detect larger communities
Number of Overlapping Nodes versus Releases
Content-based methods may reflect the actual changes
Content-based methods detect more overlaps than structure-based methods
Conclusion & Future Work
Conclusion & message:
Content has a significant effect on structure-based techniques
– Changes in community sizes, number of overlapping nodes and modularity
– Content-based methods detect larger communities with more overlap
Future work:
Investigate local similarity costs
Improve time complexity
References
Ahn, Y.-Y., Bagrow, J. P., & Lehmann, S. (2010). Link communities reveal multiscale complexity in networks. Nature, 466(7307), 761–764. doi:10.1038/nature09182
Derényi, I., Palla, G., & Vicsek, T. (2005). Clique percolation in random networks. Physical Review Letters, 94(16), 160202. doi:10.1103/PhysRevLett.94.160202
Ding, Z., Zhang, X., Sun, D., & Luo, B. (2016). Overlapping community detection based on network decomposition. Scientific Reports, 6, 24115. doi:10.1038/srep24115
Doreian, P. (2004). Evolution of human signed networks, 1(2), 277–293. Retrieved from http://snap.stanford.edu/class/cs224w-readings/dorean04evolution.pdf
Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826. doi:10.1073/pnas.122653799
Günnemann, S., Boden, B., Färber, I., & Seidl, T. (2013). Efficient mining of combined subspace and subgraph clusters in graphs with feature vectors. In Advances in Knowledge Discovery and Data Mining (pp. 261–275). Springer Berlin Heidelberg.
Günnemann, S., Färber, I., Boden, B., & Seidl, T. (2010). Subspace clustering meets dense subgraph mining: A synthesis of two paradigms. In The 10th International Conference on Data Mining.
Havemann, F., Heinz, M., Struck, A., & Gläser, J. (2011). Identification of overlapping communities and their hierarchy by locally calculating community-changing resolution levels. Journal of Statistical Mechanics: Theory and Experiment. doi:10.1088/1742-5468/2011/01/P01023
Preece, J. (2002). Supporting community and building social capital: Guest editorial. Communications of the ACM, 45(4), 37–39.
Shahriari, M., Parekodi, S., & Klamma, R. (2015). Community-aware ranking algorithms for expert identification in question-answer forums. In Proceedings of the 15th International Conference on Knowledge Technologies and Data-driven Business (I-KNOW) (pp. 1–8). ACM. Retrieved from http://doi.acm.org/10.1145/2809563.2809592
Shen, H., Cheng, X., Cai, K., & Hu, M.-B. (2009). Detect overlapping and hierarchical community structure in networks. Physica A: Statistical Mechanics and its Applications, 388(8), 1706–1712. doi:10.1016/j.physa.2008.12.021
Yang, J., & Leskovec, J. (2012). Structure and overlaps of communities in networks. CoRR, abs/1205.6228.