Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena


Posted on 11-May-2015






Paper presentation at PCI 2013.


#1 PCI'13, Thessaloniki, 19 Sep 2013
Community Structure, Interaction and Evolution Analysis of Online Social Networks around Real-World Social Phenomena
Konstantinos Konstantinidis, Symeon Papadopoulos, Yiannis Kompatsiaris

#2 Problem
Online Social Networks (OSNs) are immense!

#3 Motivation
Social networks:
  • Used to be small (e.g., the Grevy's zebra dataset)
  • Were easy to organize
Online Social Networks (e.g., Twitter):
  • Contain an immense amount of data
  • Make it incredibly difficult to organize the data and extract useful information
Ways to monitor activity in OSNs:
  • Keywords (produce too much information and do not work when lexical variations are used)
  • Newshounds and persons of interest (may result in loss of information)
Proposal: leverage
  • Time
  • Communities formed by users interested in a specific topic
  • The behavior of these communities over time
Provide the user with information regarding:
  • Temporal user activity per topic
  • Influential, stable and persistent communities
  • Users worth following (possible new newshounds)
  • Content worth monitoring

#4 Framework overview
[Diagram: framework components]
  • Input: Twitter data (mentions and hashtags over time)
  • Evolution detection process: pre-processing (information extraction), interaction data discretization, temporal adjacency matrix creation, community detection (Louvain), community evolution detection
  • Ranking process: persistence, stability, centrality* (PageRank), community size evolution heatmap, feature fusion
  • Output: most influential users and communities, plus popular hashtags
*Ongoing work

#5 Interaction data discretization
  • Studying community evolution requires a timeslot-based analysis
  • Tweeting activity shows whether users are active and whether something interesting is happening (or has happened)
  • In this framework, timeslots are delimited by the local minima of the overall activity
  • Peaks and positive slopes indicate that users are interested in some phenomenon or are involved in a conversation
  • Minima and negative slopes indicate that users' interest is diminishing

#6 Interaction data discretization example
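The discretization on slide 5 and the temporal adjacency matrix creation on slide 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation.

The sketch makes these assumptions:
  • Overall activity has already been binned into a list of tweet counts (e.g., per hour).
  • A new timeslot starts at every strict local minimum, with no smoothing, so flat valleys are not detected.
  • Each timeslot's "adjacency matrix" is a weighted, undirected mention graph stored as a dictionary of user pairs.
  • Self-mentions are dropped.

The function names `local_minima`, `timeslot_boundaries` and `adjacency_per_slot` are hypothetical.

```python
from collections import Counter


def local_minima(counts):
    """Indices i where counts[i] is strictly lower than both neighbours.

    Assumption: no smoothing, so flat valleys (plateaus) are not detected.
    """
    return [i for i in range(1, len(counts) - 1)
            if counts[i - 1] > counts[i] < counts[i + 1]]


def timeslot_boundaries(counts):
    """Cut the binned activity series into timeslots at local minima.

    Returns half-open (start, end) bin ranges; each minimum bin opens a new
    timeslot (an arbitrary, hypothetical choice).
    """
    cuts = [0] + local_minima(counts) + [len(counts)]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if a < b]


def adjacency_per_slot(mentions, slots):
    """Build one weighted, undirected mention graph per timeslot.

    mentions: iterable of (bin_index, user_a, user_b) mention events.
    slots: output of timeslot_boundaries().
    Returns a list of Counters mapping a sorted user pair to its mention count.
    """
    graphs = [Counter() for _ in slots]
    for b, u, v in mentions:
        if u == v:
            continue  # hypothetical choice: ignore self-mentions
        for k, (start, end) in enumerate(slots):
            if start <= b < end:
                graphs[k][tuple(sorted((u, v)))] += 1
                break
    return graphs


activity = [5, 9, 4, 8, 12, 3, 7]        # made-up tweet counts per bin
slots = timeslot_boundaries(activity)     # [(0, 2), (2, 5), (5, 7)]
graphs = adjacency_per_slot(
    [(0, "a", "b"), (1, "b", "a"), (3, "a", "c"), (6, "c", "c")], slots)
```

In this toy series, the minima at bins 2 and 5 split the data into three timeslots. The two a/b mentions in the first timeslot collapse into a single edge of weight 2.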
#7 Community detection & evolution
[Figure: sequential adjacency matrices for timeslots n-2, n-1, n and n+1 are processed with the Louvain community detection method; communities found in consecutive timeslots (e.g., C1(n-1), C1n, C1(n+1)) are linked into time-evolving communities T1 to T5]
  • Timeslots: [1, ..., n-1, n, n+1, ...]
  • Communities in timeslot n: C = {C1n, C2n, ..., Ckn}
  • Time-evolving communities: Ti
Louvain community detection method: V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, 2008(10):P10008 (12pp), 2008.

#8 Louvain Community Detection
A popular greedy modularity optimization approach. The following two steps are repeated iteratively until modularity reaches a maximum, producing a hierarchy of communities:
  a) Detection of small communities by local modularity optimization
  b) Aggregation of the nodes belonging to the same community and creation of a new network whose nodes are the communities
It was selected for its:
  • Speed
  • Accuracy when dealing with ad-hoc networks
Thanks to its hierarchical structure, it also allows communities to be examined at different resolutions.

#9 Community evolution detection
[Figure: grid of communities C11 to C95, linked across rows into time-evolving communities labeled T11, T21, T41, T52, T61, T74, T81, T91]
  • The communities of each row are compared with the communities of past rows using the Jaccard index
  • Community similarity is determined by the Jaccard index and an adaptive threshold
  • The adaptive threshold is relative to community size, with range [0.7, 0.1]

#10 Single timeslot graph example
Searching through a single timeslot (i.e., approximately 24 hours) can be time-consuming. Imagine browsing through months of data! Indexing is clearly a necessity.

#11 Evolution features, fusion & ranking
[Diagram: community evolution provides the centrality, persistence and stability features for dynamic community ranking, which produces ranked communities (all users), users ranked within communities based on centrality, and content (text) from timeslots of interest, all presented in a user interface]
  • Persistence: overall appearances / total number of timeslots
  • Stability: overall consecutive appearances / total number of timeslots
  • PageRank centrality: a rough estimate of how important a node is, based on the number and quality of its links

#12 Pros and Cons: Dynamic Community and User Ranking
Advantages:
  • Saves user time (manually searching for news is extremely time-consuming)
  • Enables browsing through the most important information
  • Provides a sense of user importance over time (users worth following in future investigations)
Disadvantages:
  • Community detection and community evolution detection are slow processes
  • The absence of semantic ranking (content is not taken into account) makes the framework susceptible to error

#13 Framework application example
Application to a dataset extracted from the Twitter OSN.
Dataset characteristics:
  • Period: 32 days
  • Keywords: 40 (English and Greek)
  • Unique users: 857K
  • Messages: 880K
  • Edges: 1.07M
[Table: Greek and global hashtags and keywords; entries as extracted: Michaloliakos, nazi, #Xryshaygh, Kasidiaris, #nazi, far right, #GoldenDawn, golden dawn, #extremeright, extreme right, #Kasidiaris, xrysh aygh, #farright, Hitler, illegal immigrants, Swastica]

#14 Framework application example: Results
  • Total number of communities: 232K
  • Final number of communities (excluding self loops & communities
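Returning to the Louvain method on slide 8, the Python sketch below does two things:
  • It computes Newman modularity, the quantity Louvain greedily maximizes: Q = sum over communities c of [L_c / m - (d_c / 2m)^2]. Here m is the total edge weight, L_c the weight inside c, and d_c the total degree of c.
  • It runs a naive version of the local-moving step (a).

This is an illustration under assumptions, not the Blondel et al. implementation. Modularity is recomputed from scratch for every candidate move, which is far slower than Louvain's incremental gain formula. The aggregation step (b) is omitted. The names `modularity` and `local_moving` are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q = sum_c [L_c / m - (d_c / (2m))**2].

    edges: dict {(u, v): weight} for an undirected graph, each edge once.
    community: dict {node: community label}.
    L_c is the edge weight inside community c, d_c its total weighted degree.
    """
    m = sum(edges.values())
    if m == 0:
        return 0.0
    inside = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def local_moving(edges, community):
    """Naive sketch of Louvain step (a): repeatedly move each node to the
    neighbouring community that most increases Q, until no move helps.
    Q is recomputed from scratch (slow); step (b), aggregation, is omitted."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    community = dict(community)
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbours):
            best_label = community[node]
            best_q = modularity(edges, community)
            for label in {community[n] for n in neighbours[node]}:
                trial = dict(community)
                trial[node] = label
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # strict gain avoids endless swaps
                    best_label, best_q = label, q
            if best_label != community[node]:
                community[node] = best_label
                improved = True
    return community


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1,
         (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
partition = local_moving(edges, {n: n for n in range(6)})
```

Starting from singleton communities, the local moves end with the two triangles {0, 1, 2} and {3, 4, 5} as communities, with Q = 5/14 (about 0.357).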

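The community evolution detection on slide 9 can be sketched as follows. The Jaccard index of two user sets A and B is |A ∩ B| / |A ∪ B|. The slide only states that the threshold is adaptive, relative to size, and ranges over [0.7, 0.1].

This sketch therefore assumes, hypothetically, that:
  • The threshold decreases linearly from 0.7 for communities of 2 users to 0.1 for communities of 100 users or more.
  • Size is measured on the larger of the two communities being compared.
  • Each community is linked to its best match in the most recent past row that contains any match.

`jaccard`, `adaptive_threshold` and `match_to_past` are illustrative names.

```python
def jaccard(a, b):
    """Jaccard index len(A & B) / len(A | B) of two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adaptive_threshold(size, small=2, large=100, hi=0.7, lo=0.1):
    """HYPOTHETICAL size-dependent threshold: falls linearly from `hi` for
    communities of `small` users to `lo` for `large` users or more."""
    if size <= small:
        return hi
    if size >= large:
        return lo
    frac = (size - small) / (large - small)
    return hi - frac * (hi - lo)


def match_to_past(current, past_rows):
    """Link each current community to its most similar past community.

    current: list of sets (communities of the newest row/timeslot).
    past_rows: list of lists of sets, oldest row first.
    Returns {i: (rows_back, j)} for matched communities, {i: None} otherwise.
    Assumption: search stops at the most recent row that has a match.
    """
    links = {}
    for i, comm in enumerate(current):
        links[i] = None
        for back, row in enumerate(reversed(past_rows), start=1):
            best = None
            for j, old in enumerate(row):
                sim = jaccard(comm, old)
                threshold = adaptive_threshold(max(len(comm), len(old)))
                if sim >= threshold and (best is None or sim > best[0]):
                    best = (sim, j)
            if best is not None:
                links[i] = (back, best[1])
                break
    return links


past = [[{"a", "b", "c"}, {"x", "y"}],   # older timeslot
        [{"a", "b", "d"}]]              # previous timeslot
now = [{"a", "b", "d", "e"}, {"x", "y"}, {"q", "r"}]
links = match_to_past(now, past)        # {0: (1, 0), 1: (2, 1), 2: None}
```

In this toy example, {x, y} skips a timeslot and is reconnected to its counterpart two rows back. {q, r} has no match and would start a new time-evolving community.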

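The ranking features listed on slide 11 can be computed roughly as below:
  • Persistence follows the slide's definition directly.
  • For stability, "consecutive appearances" is interpreted (an assumption) as the number of timeslots in which a community is present and was also present in the previous timeslot.
  • PageRank uses the standard power iteration with a damping factor of 0.85 and spreads the rank of nodes without outgoing links uniformly.

The slides mark PageRank centrality as ongoing work and do not give their settings. All function names here are hypothetical.

```python
def persistence(presence):
    """Slide 11: appearances / total number of timeslots.
    presence: one boolean per timeslot (True = community detected)."""
    return sum(presence) / len(presence)


def stability(presence):
    """Slide 11: consecutive appearances / total number of timeslots.
    ASSUMPTION: a consecutive appearance is a timeslot in which the community
    is present and was also present in the previous timeslot."""
    consecutive = sum(1 for prev, cur in zip(presence, presence[1:])
                      if prev and cur)
    return consecutive / len(presence)


def pagerank(out_links, damping=0.85, max_iter=100, tol=1e-10):
    """Standard power-iteration PageRank (settings are assumptions).
    out_links: dict {node: list of nodes it links to, e.g. mentions}."""
    nodes = set(out_links) | {v for vs in out_links.values() for v in vs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes with no outgoing links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, targets in out_links.items():
            for v in targets:
                new[v] += damping * rank[u] / len(targets)
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


presence = [True, True, False, True, True, True]   # made-up timeline
p, s = persistence(presence), stability(presence)  # 5/6 and 3/6
ranks = pagerank({"a": ["c"], "b": ["c"], "c": []})
```

In the small mention graph, the user mentioned by both others ("c") receives the highest rank. The ranks always sum to 1.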