Structural trend analysis for online social networks

Post on 26-Jun-2015




Summary of Ceren Budak et al.'s paper "Structural Trend Analysis for Online Social Networks"


  • 1. Structural Trend Analysis for Online Social Networks
      paper authors: Ceren Budak, Divyakant Agrawal, Amr El Abbadi
      Markus Fensterer, 11/23/11

  • 2. Outline
      1. Introduction
      2. Problem definition
      3. Trend definitions
      4. Validating significance
        4.1. Model based validation
        4.2. Analysis based validation
      5. Detecting coordinated trends
      6. Detecting uncoordinated trends
      7. Sybil attacks
      8. Summary
      9. Critique

  • 3. 1. Introduction
      trend detection is of significant interest
      trends can be seen as a reflection of societal concerns
      collective decision making
      existing research covers the temporal and geographical dimensions but ignores the structure behind the network
      Goals of the paper:
        o introduce network structure into trend analysis
        o overcome vulnerabilities
        o detect interesting activities in different communities
        o online algorithms

  • 4. 2. Problem Definition
      graph G = (N, E)
      eij: node nj is a neighbor of ni in the network
      topics T: can be shared between direct neighbor nodes
      mention: node ni mentions topic Tx
      stream: history of mentions

  • 5. 3. Trend definitions
      traditional trend
        o trendiness: number of mentions
      structural trend
        o popular topic within a structural subgroup of a network
      types of structural trends
        o coordinated trends
        o uncoordinated trends

  • 6. Coordinated trends
      trendiness: number of users discussing it
      favors topics discussed in clustered nodes
      formal: [formula not preserved in the transcript]
      favors a uniform distribution of mentions per node
      [Figure: sample networks]

  • 7. Uncoordinated trends
      trendiness: number of unrelated persons mentioning it
      favors topics with a large number of mentions
      no bias towards clustered nodes
      notion of trustworthiness of a trend
      formal: [formula not preserved in the transcript]
      [Figure: sample networks]

  • 8. 4. Validating significance
      difference to traditional trends
      nature of detected topics
      methods:
        o Model based validation: using an information diffusion model
        o Analysis based validation: analyzing a Twitter data set

  • 9. 4.1. Model based validation
      Independent Trend Formation Model
      based on the Independent Cascade Model
      assumptions:
        o independent topic diffusion
        o diffusion in discrete time steps
      external influence pix: will ni mention Tx independently of its neighbors?
      peer influence qijx: will ni mention Tx given that neighbor nj mentioned it?
      generate a synthetic graph with the Nearest Neighbor Model
        o Facebook Monterey Bay Network
        o u = 0.8, k = 1
        o 500 nodes, 50 topics

  • 10. Difference to traditional trends
      SRCC (Spearman rank correlation coefficient): $\rho = 1 - \frac{6 \sum_x d_x^2}{n (n^2 - 1)}$
        o d_x: rank difference
        o n: number of topics
        o all topics considered
      AP (Average Precision): [formula not preserved in the transcript]
        o D: relevant documents
        o R: ranked documents
        o top-k topics considered

  • 11. Difference to traditional trends
      [Figures: similarity measures for varying q values; similarity measures for varying p values]
      structural trends diverge from traditional trends

  • 12. Nature of detected topics
      divide T into two halves (the labels of the halves are lost in the transcript)
      experiments within one half
        o an increase in p can be balanced by q
        o the average traditional ranking of both halves should be equal
        o experiment 1: p < 0.1 and q > 0.1
        o experiment 2: p > 0.1 and q < 0.1
      [Figure: average ranking of topics]
      experiment 1: the top-25 coordinated trends come from one half
      experiment 2: the top-25 coordinated trends come from one half (which half is not recoverable for either experiment)

  • 13. 4.2. Analysis based validation
      Twitter data set
      hashtags are used as topics
      after filtering and categorization
        o 2.7 million users
        o 230 million edges
        o 2.9 million topics

  • 14. Difference to traditional trends
      [Figure: SRCC and AP of top-k traditional trend topics in predicting top-k structural trend topics]
      traditional trendiness: bad predictor for coordinated trends
      uncoordinated trends: more similar to traditional trends

  • 15. Nature of detected topics
      [Figure: ranking of three topics within the data set]
      coordinated trend #hhrs: Hugh Hewitt Radio Show
        o effect of homophily
      uncoordinated trend #twitterafterdark
        o idiom: usage depends more on personal experience
      insignificant as structural trend: #apple

  • 16. Nature of detected topics
      [Figure: visualization for #pawpawty and #mafiawars]
      #pawpawty: high coordinated importance
        o suggestion: social motivation, homophily effect
      #mafiawars: low coordinated importance

  • 17. Effect of categorical characteristics on trendiness
      categorize 500 hashtags in 7 categories
        o politics, technology, celebrity, games, idioms, movies, music and none
      [Figure: CDFs for politics and idioms]
      political hashtags: trendiness is improved by coordinated trends
      idioms: trendiness is improved by uncoordinated trends

  • 18. 5. Detecting coordinated trends
      naive: compute g for each topic
      incremental counting algorithm:
        o on receiving a mention of Tx by node nl
        o increment Cl,x by 1
        o update score of Tx: [update rule not preserved in the transcript]
        o requires O(n) reads
        o two adjacency lists per node (incoming/outgoing edges)
        o hashtable per topic: maps users to counts
        o sorted representation of top-k topics
        o delivers exact values => computationally expensive

  • 19. Reduction to counting local triangles
      multigraph G' = (N', E')
      N' = N ∪ T
      E': the edges of E together with the mention edges from S
      any three pairwise connected nodes u, v, w form a triangle in G'
      g(Tx) = number of triangles incident to topic node Tx in G'
      [Figure: building G' out of G and S]

  • 20. Upper error bound for sampling
      Chebyshev's inequality + case distinction for Var(Xx)
      Xx: estimated number of triangles incident to Tx
      true count: real number of triangles incident to Tx (symbol lost in the transcript)
      ps: sampling rate
      overlap count: number of pairs of triangles that involve Tx and are not edge disjoint (symbol lost in the transcript)
      the number of multiedges has a big influence on the quality of the estimate
      if the overlap count gets larger: the error becomes linearly worse
      but for a larger true count: the estimate becomes quadratically better
      => the estimate is still better for trendy topics

  • 21. Average Precision of sampling
      [Figures: AP of sampling for coordinated trends; AP of sampling for uncoordinated trends]
      even for smaller ps, Average Precision is still high
      sampling works better for uncoordinated trends

  • 22. 6. Detecting uncoordinated trends
      incremental counting algorithm
        o on receiving a mention of Tx by node nl
        o increment Cl,x by 1
        o update score of Tx: [update rule not preserved in the transcript]
        o requires O(n) reads
        o could be optimized by keeping track of the traditional trendiness score
      Reduction to counting local triangles
        o multigraph G' = (N', E')
        o N' = N ∪ T
        o E': built from E and S (the membership conditions are garbled in the transcript)

  • 23. 7. Sybil attacks
      many virtual identities
      highly connected
      small number of connections to real users
      Findings:
        o coordinated rank > traditional rank > uncoordinated rank
        o breakpoint of popularity occurs earlier than for a normal coordinated trend
        o breakpoint is seen with a smaller set of nodes than normal

  • 24. 8. Summary
      new trend definitions
      significantly different from traditional trends
      focus on coordinated trends
      characteristics of topics
      online algorithms to detect trends
      future research:
        o spam not limited to Sybil attacks
        o evolution of trends throughout time
        o study of the range between coordinated and uncoordinated trends

  • 25. 9. Critique
      good:
        o real world example
        o implementation guideline for algorithms
        o two measures for similarity: SRCC and AP
      bad:
        o almost no divergence between traditional and uncoordinated trends with SRCC
        o no explanation why #apple is not a structural trend
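
The incremental counting steps on slides 18 and 22 and the triangle reduction on slide 19 can be illustrated with a short Python sketch. It rests on assumptions the transcript does not confirm: edges are treated as undirected and free of self-loops, every mention adds one multiedge between the user and the topic node, the coordinated score is taken to be the number of triangles at the topic node (the sum of Cu,x · Cv,x over connected pairs), and the uncoordinated score is taken to be the same sum over unconnected pairs. The class name `StructuralTrendCounter` and its methods are hypothetical and are not the authors' implementation.

```python
from collections import defaultdict


class StructuralTrendCounter:
    """Illustrative incremental scorer; not the authors' implementation."""

    def __init__(self, edges):
        # Assumption: undirected edges without self-loops.
        self.neighbors = defaultdict(set)
        for u, v in edges:
            self.neighbors[u].add(v)
            self.neighbors[v].add(u)
        # counts[topic][node] plays the role of C_{l,x} on the slides.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.traditional = defaultdict(int)    # total mentions per topic
        self.coordinated = defaultdict(int)    # assumed: sum over connected pairs
        self.uncoordinated = defaultdict(int)  # assumed: sum over unconnected pairs

    def mention(self, node, topic):
        """Process one stream element: `node` mentions `topic`."""
        per_node = self.counts[topic]
        # Each earlier mention by a neighbor closes one new triangle with the
        # topic node. Reading all neighbors is the O(n) cost noted on the slides.
        near = sum(per_node.get(j, 0) for j in self.neighbors.get(node, ()))
        # Mentions by all other users, derived from the traditional score.
        others = self.traditional[topic] - per_node.get(node, 0)
        self.coordinated[topic] += near
        self.uncoordinated[topic] += others - near
        per_node[node] += 1
        self.traditional[topic] += 1

    @staticmethod
    def top_k(scores, k):
        """Return the k highest-scoring topics, ties broken by name."""
        return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# A path graph a - b - c: a and c are not connected.
counter = StructuralTrendCounter([("a", "b"), ("b", "c")])
for node in ["a", "b", "c", "a"]:
    counter.mention(node, "#x")
for node in ["a", "c", "c"]:
    counter.mention(node, "#y")

print(dict(counter.coordinated))    # {'#x': 3, '#y': 0}
print(dict(counter.uncoordinated))  # {'#x': 2, '#y': 2}
```

In this toy run #y collects three mentions but no coordinated score, because its two authors are not connected. Subtracting the neighbor sum from the running traditional count is one way to read the slide's remark that the uncoordinated update can reuse the traditional trendiness score.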