Transcript
Page 1: Structural trend analysis for online social networks

Structural Trend Analysis for Online Social Networks
Paper authors: Ceren Budak, Divyakant Agrawal, Amr El Abbadi

Markus Fensterer 11/23/11

Page 2: Structural trend analysis for online social networks

Outline

1. Introduction
2. Problem definition
3. Trend definitions
4. Validating significance
   4.1. Model based validation
   4.2. Analysis based validation
5. Detecting coordinated trends
6. Detecting uncoordinated trends
7. Sybil attacks
8. Summary
9. Critique

Page 3: Structural trend analysis for online social networks

1. Introduction

•  trend detection is of significant interest
•  trends can be seen as
   o  a reflection of societal concerns
   o  collective decision making
•  existing research covers the temporal and geographical dimensions
   o  ignoring the structure behind the network
•  goals of the paper:
   o  introduce network structure into trend analysis
   o  overcome vulnerabilities
   o  detect interesting activities in different communities
   o  online algorithms

Page 4: Structural trend analysis for online social networks

2. Problem definition

•  graph G = (N, E)
•  eij: node nj is a neighbor of ni in the network
•  topics T: can be shared with direct neighbor nodes
•  mention: <ni, Tx> means node ni mentions topic Tx
•  stream: history of mentions
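The definitions above map onto a few simple data structures. The following is a minimal Python sketch, not the authors' implementation: the names `Network`, `add_edge`, `has_edge` and `count_mentions` are hypothetical, and edges are assumed to be directed.

```python
from collections import defaultdict


class Network:
    """Directed graph G = (N, E); edge (i, j) means n_j is a neighbor of n_i."""

    def __init__(self):
        self.out_nbrs = defaultdict(set)  # i -> {j : e_ij in E}
        self.in_nbrs = defaultdict(set)   # j -> {i : e_ij in E}

    def add_edge(self, i, j):
        self.out_nbrs[i].add(j)
        self.in_nbrs[j].add(i)

    def has_edge(self, i, j):
        return j in self.out_nbrs.get(i, ())


def count_mentions(stream):
    """Turn a stream of mentions <n_i, T_x> into counts[topic][node]."""
    counts = defaultdict(lambda: defaultdict(int))
    for node, topic in stream:
        counts[topic][node] += 1
    return counts


# Toy example (made-up data): three users, one topic.
g = Network()
g.add_edge("a", "b")
g.add_edge("b", "c")
stream = [("a", "#t"), ("b", "#t"), ("a", "#t")]
counts = count_mentions(stream)
```

Keeping both adjacency directions per node is a design choice that pays off later, when a new mention has to be paired with the counts of incoming and outgoing neighbors.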

Page 5: Structural trend analysis for online social networks

3. Trend definitions

•  traditional trend
   o  trendiness: number of mentions
•  structural trend
   o  "popular topic within a structural subgroup of a network"
•  types of structural trends
   o  coordinated trends
   o  uncoordinated trends

Page 6: Structural trend analysis for online social networks

Coordinated trends

•  trendiness: number of users discussing it
•  favors topics discussed in clustered nodes
•  formal:
•  favors uniform distribution of mentions per node

[Figure: sample networks]
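The formula on this slide did not survive in the transcript. The sketch below therefore assumes the definition implied by the triangle reduction later in the talk: g(Tx) sums, over every edge (ni, nj) in E, the product of the mention counts Ci,x and Cj,x. The function name `coordinated_score` and the toy data are illustrative only.

```python
def coordinated_score(edges, counts):
    """Assumed g(T_x): sum of C_i * C_j over all directed edges (i, j) in E.

    edges  -- iterable of (i, j) pairs
    counts -- dict node -> number of mentions of the topic
    """
    return sum(counts.get(i, 0) * counts.get(j, 0) for i, j in edges)


# Toy network (made-up data); both topics have four mentions in total.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]

clustered = {"a": 1, "b": 1, "c": 1, "d": 1}  # spread over connected users
single = {"a": 4}                              # one user repeating the topic

print(coordinated_score(edges, clustered))  # 4
print(coordinated_score(edges, single))     # 0
```

Under this assumed definition a topic repeated by a single user scores zero, whereas the traditional count (four mentions) cannot tell the two cases apart.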

Page 7: Structural trend analysis for online social networks

Uncoordinated trends

•  trendiness: number of unrelated persons mentioning it
•  favors topics with a large number of mentions
•  no bias towards clustered nodes
•  notion of trustworthiness of a trend
•  formal:

[Figure: sample networks]
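The formal definition is missing from the transcript here as well. As a hedged sketch, assume h(Tx) sums Ci,x times Cj,x over all ordered pairs of distinct users (ni, nj) that are not connected by an edge in E. The name `uncoordinated_score` and the example data are hypothetical.

```python
from itertools import permutations


def uncoordinated_score(edges, counts):
    """Assumed h(T_x): sum of C_i * C_j over ordered pairs (i, j),
    i != j, for which (i, j) is NOT an edge in E."""
    edge_set = set(edges)
    return sum(
        counts[i] * counts[j]
        for i, j in permutations(counts, 2)
        if (i, j) not in edge_set
    )


# Toy network (made-up data): a and b follow each other, c is unrelated.
edges = [("a", "b"), ("b", "a")]
connected = {"a": 2, "b": 2}   # mentions only among connected users
unrelated = {"a": 2, "c": 2}   # same volume, but users are unrelated

print(uncoordinated_score(edges, connected))  # 0
print(uncoordinated_score(edges, unrelated))  # 8
```

Mentions by users with no link between them raise the score, which matches the slide's notion of a trend that is harder to fake by a tightly connected group.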

Page 8: Structural trend analysis for online social networks

4. Validating significance

•  difference to traditional trends
•  nature of detected topics
•  methods:
   o  model based validation: using an information diffusion model
   o  analysis based validation: analyzing a Twitter data set

Page 9: Structural trend analysis for online social networks

4.1. Model based validation

•  Independent Trend Formation Model
•  based on the Independent Cascade Model
•  assumptions:
   o  independent topic diffusion
   o  diffusion in discrete time steps
•  external influence pix: will ni mention Tx independently of its neighbors?
•  peer influence qijx: will ni mention Tx given that neighbor nj mentioned it?
•  generate a synthetic graph with the Nearest Neighbor Model
   o  Facebook Monterey Bay Network
   o  u = 0.8, k = 1
   o  500 nodes, 50 topics
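The slide only names the ingredients of the diffusion model, so the following is a rough sketch under stated assumptions rather than the model from the paper: one topic is simulated in discrete steps, every node mentions it on its own with probability p, and additionally once with probability q for each neighbor that mentioned it in the previous step. A single p and q are used for all nodes and edges, whereas the slide defines them per node, edge and topic. The function name `simulate_topic` is hypothetical.

```python
import random


def simulate_topic(in_nbrs, p, q, steps, rng):
    """Simulate one topic under an assumed independent-cascade-style process.

    in_nbrs -- dict node -> list of neighbors whose mentions the node sees
    p       -- external influence: chance per step to mention on one's own
    q       -- peer influence: chance to mention per neighbor mention seen
    Returns dict node -> number of mentions.
    """
    counts = {n: 0 for n in in_nbrs}
    last = set()  # nodes that mentioned the topic in the previous step
    for _ in range(steps):
        current = set()
        for n, nbrs in in_nbrs.items():
            mentions = 1 if rng.random() < p else 0
            mentions += sum(1 for m in nbrs if m in last and rng.random() < q)
            if mentions:
                counts[n] += mentions
                current.add(n)
        last = current
    return counts


# Toy ring of three users (made-up data), fixed seed for repeatability.
ring = {0: [1], 1: [2], 2: [0]}
counts = simulate_topic(ring, p=0.1, q=0.5, steps=10, rng=random.Random(0))
```

Running one such simulation per topic, each with its own p and q, reflects the assumption of independent topic diffusion.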

Page 10: Structural trend analysis for online social networks

Difference to traditional trends

•  SRCC (Spearman rank correlation coefficient): ρ = 1 - 6 Σ dx² / (n (n² - 1))
   o  dx: rank difference
   o  n: number of topics
   o  all topics considered
•  AP (Average Precision):
   o  D: relevant documents
   o  R: ranked documents
   o  top-k topics considered
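Both similarity measures are easy to compute for two topic rankings. The sketch below uses the textbook Spearman formula for rankings without ties and the standard information-retrieval definition of Average Precision; that the slides use exactly this AP variant is an assumption. The function names and the toy rankings are illustrative.

```python
def srcc(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same topics.

    rank_a, rank_b -- dict topic -> rank (1 = most trendy); needs n >= 2.
    """
    n = len(rank_a)
    d2 = sum((rank_a[t] - rank_b[t]) ** 2 for t in rank_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def average_precision(ranked, relevant):
    """AP of a ranked list R against a set D of relevant items."""
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k  # precision at each relevant position
    return total / len(relevant) if relevant else 0.0


# Made-up rankings: traditional vs. structural order of four topics.
traditional = ["a", "b", "c", "d"]
structural = ["a", "c", "b", "d"]

rank_trad = {t: r for r, t in enumerate(traditional, start=1)}
rank_struct = {t: r for r, t in enumerate(structural, start=1)}
rho = srcc(rank_trad, rank_struct)

# How well do the top-2 traditional topics predict the top-2 structural ones?
ap = average_precision(traditional[:2], set(structural[:2]))
```

SRCC compares the full rankings, while AP only looks at the top-k lists, which is why the two measures can disagree.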

Page 11: Structural trend analysis for online social networks

Difference to traditional trends

[Figure: similarity measures for varying q's]

[Figure: similarity measures for varying p's]

•  structural trends diverge from traditional trends

Page 12: Structural trend analysis for online social networks

Nature of detected topics

•  divide T into two halves T' and T''
•  experiments within T''
   o  an increase in p'' can be balanced by q''
   o  average traditional ranking of T' and T'' should be equal
   o  experiment 1: p'' < 0.1 and q'' > 0.1
   o  experiment 2: p'' > 0.1 and q'' < 0.1

[Figure: average ranking of topics]

•  experiment 1: top-25 coordinated trends come from T''
•  experiment 2: top-25 coordinated trends come from T'

Page 13: Structural trend analysis for online social networks

4.2. Analysis based validation

•  Twitter data set
•  hashtags are used as topics
•  after filtering and categorization
   o  2.7 million users
   o  230 million edges
   o  2.9 million topics

Page 14: Structural trend analysis for online social networks

Difference to traditional trends

[Figure: SRCC and AP of top-k traditional trend topics in predicting top-k structural trend topics]

•  traditional trendiness: bad predictor for coordinated trends
•  uncoordinated trends: more similar to traditional trends

Page 15: Structural trend analysis for online social networks

Nature of detected topics

[Figure: ranking of three topics within the data set]

•  coordinated trend #hhrs: "Hugh Hewitt Radio Show"
   o  effect of homophily
•  uncoordinated trend #twitterafterdark
   o  idiom: usage depends more on personal experience
•  insignificant as structural trend: #apple

Page 16: Structural trend analysis for online social networks

Nature of detected topics

[Figure: visualization for #pawpawty and #mafiawars]

•  #pawpawty: high coordinated importance
   o  suggestion: social motivation, homophily effect
•  #mafiawars: low coordinated importance

Page 17: Structural trend analysis for online social networks

Effect of categorical characteristics on trendiness

•  categorize 500 hashtags into 7 categories
   o  politics, technology, celebrity, games, idioms, movies, music (and "none" for the rest)

[Figure: CDFs for politics and idioms]

•  political hashtags rank higher as coordinated trends
•  idioms rank higher as uncoordinated trends

Page 18: Structural trend analysis for online social networks

5. Detecting coordinated trends

•  naive: compute g for each topic
•  incremental counting algorithm:
   o  receiving <nl, Tx>
   o  increment Cl,x by 1
   o  update score of Tx:
   o  requires O(n) reads
   o  two adjacency lists per node (incoming/outgoing edges)
   o  hashtable per topic: maps users to counts
   o  sorted representation of top-k topics
   o  delivers exact values => computationally expensive
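The update rule itself is missing from the transcript. The incremental steps listed above can be sketched as follows, assuming g(Tx) is the sum of Ci,x times Cj,x over all directed edges (ni, nj) in E and that the graph has no self-loops. The class `CoordinatedTrendCounter` and its method names are hypothetical, and the top-k list is simply re-sorted on demand instead of being kept in a sorted structure.

```python
from collections import defaultdict


class CoordinatedTrendCounter:
    """Exact incremental maintenance of an assumed coordinated score g."""

    def __init__(self, edges):
        # Two adjacency lists per node: outgoing and incoming edges.
        self.out_nbrs = defaultdict(set)
        self.in_nbrs = defaultdict(set)
        for i, j in edges:
            self.out_nbrs[i].add(j)
            self.in_nbrs[j].add(i)
        # One hashtable per topic, mapping users to mention counts.
        self.counts = defaultdict(lambda: defaultdict(int))
        self.score = defaultdict(int)  # topic -> g(T_x)

    def mention(self, node, topic):
        """Process one mention <node, topic> and return the new score."""
        c = self.counts[topic]
        c[node] += 1
        # The extra mention pairs once with every mention of each neighbor.
        delta = sum(c.get(j, 0) for j in self.out_nbrs[node])
        delta += sum(c.get(i, 0) for i in self.in_nbrs[node])
        self.score[topic] += delta
        return self.score[topic]

    def top_k(self, k):
        """Topics with the k highest scores (re-sorted on every call)."""
        return sorted(self.score, key=self.score.get, reverse=True)[:k]


# Toy run (made-up data).
counter = CoordinatedTrendCounter([("a", "b"), ("b", "c")])
for user in ["a", "b", "b", "c"]:
    counter.mention(user, "#t")
counter.mention("a", "#other")
print(counter.score["#t"])  # 4
```

Each mention reads the counts of all neighbors of the mentioning node, which is where the O(n) reads in the worst case come from.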

Page 19: Structural trend analysis for online social networks

Reduction to counting local triangles

•  multigraph G' = (N', E')
•  N' = N ∪ T
•  E' = {(u,v) | (u,v) ∈ E ∨ <u,v> ∈ S ∨ <v,u> ∈ S}
•  any three nodes u, v, w that are pairwise connected by edges in E' form a triangle in G'
•  g(Tx) = "number of triangles incident to topic node Tx in G'"

[Figure: building G' out of G and S]
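To make the reduction concrete, the sketch below builds G' as a multigraph and counts the triangles at one topic node. It is an illustration under assumptions: edges are treated as undirected once they are in G', every mention adds one parallel edge between user and topic, there are no self-loops, and topic nodes are tagged as `("topic", name)` to keep N and T disjoint. All function names are hypothetical.

```python
from collections import Counter


def build_multigraph(edges, stream):
    """E' as a Counter: unordered node pair -> number of parallel edges."""
    multi = Counter()
    for u, v in edges:                      # (u, v) in E
        multi[frozenset((u, v))] += 1
    for node, topic in stream:              # <node, topic> in S
        multi[frozenset((node, ("topic", topic)))] += 1
    return multi


def triangles_at_topic(multi, topic):
    """Number of triangles in G' that are incident to the topic node."""
    t = ("topic", topic)
    # Users adjacent to the topic node, with the multiplicity of that edge.
    users = {next(iter(pair - {t})): m for pair, m in multi.items() if t in pair}
    total = 0
    for pair, m in multi.items():
        if len(pair) == 2 and all(n in users for n in pair):
            u, v = tuple(pair)
            # Every combination of parallel edges closes a distinct triangle.
            total += m * users[u] * users[v]
    return total


# Toy data (made-up): a-b and b-c are connected; b mentions the topic twice.
edges = [("a", "b"), ("b", "c")]
stream = [("a", "#t"), ("b", "#t"), ("b", "#t"), ("c", "#t")]
multi = build_multigraph(edges, stream)
print(triangles_at_topic(multi, "#t"))  # 4
```

The result equals the sum of count products over connected user pairs, so counting triangles at Tx and computing g(Tx) directly give the same number in this example.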

Page 20: Structural trend analysis for online social networks

Upper error bound for sampling

•  Chebyshev's inequality + case distinction for Var(Xx)
•  Xx: estimated number of triangles incident to Tx
•  ∆x: real number of triangles incident to Tx
•  ps: sampling rate
•  αx: number of pairs of triangles that involve Tx and are not edge disjoint
•  the number of multiedges has a big influence on the quality of the estimate
•  if αx gets larger: error becomes linearly worse
•  but for larger ∆x: estimate becomes quadratically better
   => estimate still better for trendy topics
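The slide does not say how the sample is drawn. A common scheme for triangle estimation, assumed here and not taken from the slides, keeps every edge of G' independently with probability ps, counts triangles in the sample, and scales the count by 1/ps³, since a triangle survives only if all three of its edges are kept. The function names below are hypothetical.

```python
import random


def sample_count(m, p, rng):
    """Keep each of m parallel edges independently with probability p."""
    return sum(1 for _ in range(m) if rng.random() < p)


def estimate_triangles(edges, counts, p, rng):
    """Estimate X_x, the triangle count at one topic node, by edge sampling.

    edges  -- list of user-user edges (u, v) in E
    counts -- dict user -> number of mention edges to the topic node
    p      -- sampling rate p_s, with 0 < p <= 1
    """
    kept_counts = {u: sample_count(c, p, rng) for u, c in counts.items()}
    kept_edges = [e for e in edges if rng.random() < p]
    sampled = sum(
        kept_counts.get(u, 0) * kept_counts.get(v, 0) for u, v in kept_edges
    )
    return sampled / p ** 3  # each triangle survives with probability p^3


# Toy data (made-up): the exact triangle count for this topic is 4.
edges = [("a", "b"), ("b", "c")]
counts = {"a": 1, "b": 2, "c": 1}
estimate = estimate_triangles(edges, counts, p=0.5, rng=random.Random(42))
exact = estimate_triangles(edges, counts, p=1.0, rng=random.Random(42))
```

Triangles that share a mention edge are kept or dropped together, which is why pairs of triangles that are not edge disjoint (αx) increase the variance of the estimate.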

Page 21: Structural trend analysis for online social networks

Average Precision of sampling

[Figure: AP of sampling for coordinated trends]
[Figure: AP of sampling for uncoordinated trends]

•  even for smaller p's, Average Precision is still high
•  sampling works better for uncoordinated trends

Page 22: Structural trend analysis for online social networks

6. Detecting uncoordinated trends

•  incremental counting algorithm
   o  receiving <nl, Tx>
   o  increment Cl,x by 1
   o  update score of Tx:
   o  requires O(n) reads
   o  could be optimized by keeping track of the traditional trendiness score
•  reduction to counting local triangles
   o  multigraph G' = (N', E')
   o  N' = N ∪ T
   o  E' = {(u,v) | (u,v) ∉ E ∨ <u,v> ∈ S}
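The optimization mentioned above can be sketched as follows. The update rule is not in the transcript, so the code assumes h(Tx) sums Ci,x times Cj,x over ordered pairs of distinct users without an edge (ni, nj) in E, and that the graph has no self-loops. Keeping the traditional score (total mentions) lets the update read only the neighbors of the mentioning node instead of all non-neighbors. The class name `UncoordinatedTrendCounter` is hypothetical.

```python
from collections import defaultdict


class UncoordinatedTrendCounter:
    """Incremental maintenance of an assumed uncoordinated score h."""

    def __init__(self, edges):
        self.out_nbrs = defaultdict(set)
        self.in_nbrs = defaultdict(set)
        for i, j in edges:
            self.out_nbrs[i].add(j)
            self.in_nbrs[j].add(i)
        self.counts = defaultdict(lambda: defaultdict(int))
        self.total = defaultdict(int)  # traditional trendiness per topic
        self.score = defaultdict(int)  # assumed uncoordinated score per topic

    def mention(self, node, topic):
        """Process one mention <node, topic> and return the new score."""
        c = self.counts[topic]
        others = self.total[topic] - c[node]  # mentions by all other users
        linked = sum(c.get(j, 0) for j in self.out_nbrs[node])
        linked += sum(c.get(i, 0) for i in self.in_nbrs[node])
        # New pairs (node, j) and (i, node), minus those joined by an edge.
        self.score[topic] += 2 * others - linked
        c[node] += 1
        self.total[topic] += 1
        return self.score[topic]


# Toy run (made-up data): a -> b is the only edge, c is unrelated to both.
counter = UncoordinatedTrendCounter([("a", "b")])
for user in ["a", "b", "c", "c"]:
    counter.mention(user, "#t")
print(counter.score["#t"])  # 9
```

The subtraction trick works because the non-neighbor mentions are exactly all other mentions minus the neighbor mentions, in each direction.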

Page 23: Structural trend analysis for online social networks

7. Sybil attacks

•  many virtual identities
   o  highly connected
   o  small number of connections to real users
•  findings:
   o  coordinated rank > traditional rank > uncoordinated rank
   o  breakpoint of popularity is reached earlier than for a normal coordinated trend
   o  breakpoint is seen with a smaller set of nodes than normal

Page 24: Structural trend analysis for online social networks

8. Summary

•  new trend definitions
•  significantly different from traditional trends
•  focus on coordinated trends
•  characteristics of topics
•  online algorithms to detect trends
•  future research:
   o  spam not limited to Sybil attacks
   o  evolution of trends over time
   o  study of trends in between coordinated and uncoordinated

Page 25: Structural trend analysis for online social networks

9. Critique

•  good:
   o  real world example
   o  implementation guideline for algorithms
   o  two measures for similarity: SRCC and AP
•  bad:
   o  almost no divergence between traditional and uncoordinated trends with SRCC
   o  no explanation why #apple is not a structural trend
