Networks (source file: ugrelvex.ugr.es/idbis/dm/slides/50 graph mining - networks.pdf)


Networks
© Fernando Berzal, [email protected]

Networks

• Network Analysis
  • Applications
  • Network Properties
• Network Models
  • Random-Graph Models
  • Growing Random Models
  • Strategic Network Formation
• Network Structure & Dynamics
  • Network Centrality
  • Community Detection
  • Diffusion through Networks
  • Search on Networks
• Bibliography

Network Analysis

Networks permeate our lives.

Networks play a central role in determining

• the transmission of information about job opportunities,
• how diseases spread,
• which products we buy,
• our likelihood of succeeding professionally,
• …

Network Analysis

As a field of study…

• How relationships between parts give rise to the collective behaviors of a system, and how the system interacts and forms relationships with its environment (complex systems).
• Common principles, algorithms and tools that govern network behavior (network science).

Network Analysis

Origins: Graph Theory

The Seven Bridges of Königsberg (Leonhard Euler, 1736)

Network Analysis

Networks as graphs “on steroids”…

• Objects: Graph vertices.
  • Objects can be of different kinds.
  • Objects can be labeled.
  • Objects can have attributes.
• Links between objects: Graph edges.
  • Links can be of different kinds.
  • Links can be directed (arcs) or undirected (edges).
  • Links can have attributes.

Network Analysis

A formal definition of network
[Ted G. Lewis: “Network Science,” 2009]

G(t) = { N(t), L(t), f(t) : J(t) }

where
t = time (simulated or real)
N = nodes (a.k.a. vertices or “actors”)
L = links (a.k.a. edges)
f = topology (connections through links)
J = behavior of nodes and links (algorithm)

Network Analysis

An interdisciplinary field: Complex systems
(“networks of heterogeneous components that interact”)

• Physics: Nonlinear dynamics & chaos.
  Dynamical systems that are highly sensitive to initial conditions (a.k.a. butterfly effect).
• Economics: Markets.
  Spontaneous (or emergent) order as the result of human action, but not the execution of any human design [Austrian perspective].
• Information theory: Complex adaptive systems
  (focus on the ability to change and learn from experience).

Applications

• “Cheminformatics”: Chemical compounds.
• “Bioinformatics”: Protein networks & bio-pathways.
• Software Engineering: Program analysis…
• Network flow analysis (transport, workflows…)
• Semi-structured databases, e.g. XML
• Knowledge management: Ontologies & semantic nets
• Computer-aided design (CAD): IC design…
• Geographic information systems (GIS) & cartography
• Social networks, e.g. Web
• Economic networks, e.g. markets

Applications

“Life complexity pyramid”
[Figure from Z.N. Oltvai and A.-L. Barabási, Science, 2002]

Applications

Biological networks
[Figure: three levels of biological networks. GENOME: gene-protein interactions. PROTEOME: protein-protein interactions. METABOLISM: biochemical reactions (citrate cycle).]

Applications

Yeast protein interaction network
[Figure from H. Jeong et al., Nature 411, 41 (2001)]

Applications

Ecological network: Trophic relationships in a food web.

Applications

Telecommunication network

Applications

Internet

Applications

World Wide Web

Applications

Social network: Bibliographic network (coauthors)

Applications

Social network: FOAF (“friend of a friend”)

Applications

Social network: Organization

Applications

Social network: E-mail spectroscopy

Applications

Social network: US Biotech Industry

Network Properties

Common network features:

• Large scale.
• Continuous evolution.
• Distribution (nodes decide their connections).
• Interactions only through existing links.

Network Properties

Some interesting structural properties:

• Connected components: How many? Of what size?
• Network diameter: Average distance, worst case…
• Node degree distribution & existence of “hubs” (heavily-connected nodes).
• Groupings (balance between local and large-distance connections, as well as their roles).

Network Properties

Network Connectivity
[Figure: WWW]

Network Properties

Network Diameter
“small worlds”
[Figure: network with nodes labeled Sarah, Tim, Jane and Ralph]

Network Properties

Clustering coefficient

nbr(u)   Neighbors of the node u in the network.
k        Number of neighbors of u, i.e. |nbr(u)|.
max(u)   Maximum number of links among the neighbors of u, i.e. k*(k-1)/2 in an undirected network.

Clustering coefficient for the node u:
c(u) = (#links among neighbors of u) / max(u)

Clustering coefficient for the graph G:
C = average of c(u) for every node in G

Network Properties

Clustering coefficient

[Figure: node u and its neighbors]
k = 4
m = max(u) = 6
c(u) = 4/6 ≈ 0.67

0 <= c(u) <= 1

Similarity of the neighbors of u to a clique (complete graph).

Informal interpretation:
“My friends tend to be friends with one another.”
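The definition above can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the adjacency-set representation, the helper names and the example edges are assumptions, and nodes with fewer than two neighbors are given c(u) = 0 by convention.

```python
from itertools import combinations

def make_graph(edges):
    """Build an undirected graph as a dict: node -> set of neighbors."""
    adj = {}
    for v, w in edges:
        adj.setdefault(v, set()).add(w)
        adj.setdefault(w, set()).add(v)
    return adj

def clustering_coefficient(adj, u):
    """c(u) = (#links among neighbors of u) / (k*(k-1)/2)."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no pair of neighbors, so c(u) = 0
    links = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """C = average of c(u) over every node in the graph."""
    return sum(clustering_coefficient(adj, u) for u in adj) / len(adj)

# Hypothetical example: u has k = 4 neighbors and 4 of the 6 possible
# links among those neighbors are present.
edges = [("u", "a"), ("u", "b"), ("u", "c"), ("u", "d"),
         ("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
adj = make_graph(edges)
print(round(clustering_coefficient(adj, "u"), 2))  # 0.67
```

In this example every node happens to have c = 2/3, so the graph-level coefficient C is also 2/3.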

Network Properties

Clustering coefficient for some real networks

Network        N           C           Crand     L
WWW            153127      0.1078      0.00023   3.1
Internet       3015-6209   0.18-0.30   0.001     3.7-3.76
Actor          225226      0.79        0.00027   3.65
Coauthorship   52909       0.43        0.00018   5.9
Metabolic      282         0.32        0.026     2.9
Foodweb        134         0.22        0.06      2.43
C. elegans     282         0.28        0.05      2.65

(N: number of nodes; C: clustering coefficient; Crand: clustering coefficient of a comparable random network; L: average path length)

Clustering coefficient (C): C > Crand
Path length (L): L < Lrand

Network Properties

Node degree distribution

Normal distribution
Parameters: Average & deviation

Network Properties

Node degree distribution

Poisson distribution
Single parameter: λ (both mean & variance)

Network Properties

Node degree distribution

Pareto distribution (a.k.a. “power law”)
Single parameter: α

P(x) ~ x^(-α)

The Pareto principle (the “80-20 rule”):
20% of the population controls 80% of the wealth.

Network Properties

Node degree distribution

Hubs
Small number of nodes with a very high degree.

• Hubs appear with power laws (P(x) ~ x^(-α)), but not with normal/binomial/Poisson distributions.

Network Properties

Node degree distribution

Log-log plot

• Pareto distribution
  log(Pr[X = x]) = log(1/x^α) = -α log(x)
  Linear, slope -α.
• Normal distribution
  log(Pr[X = x]) = log(a exp(-x^2/b)) = log(a) - x^2/b
  Nonlinear, concave around the average.
• Poisson distribution
  log(Pr[X = x]) = log(exp(-λ) λ^x / x!)
  Nonlinear.

Network Properties

Node degree distribution

Log-log plot

a  WWW: power law
b  Coauthorship networks: power law with exponential cutoff
c  Power grid: exponential
d  Social network: Gaussian
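The log-log argument can be checked numerically. The sketch below is only an illustration under assumed values (α = 2.5 and λ = 10 are not taken from the slides): it fits a least-squares line to log(P(x)) against log(x). For an exact power law the slope equals -α, whereas for a Poisson distribution the local slope changes sign around the mean, so no single straight line fits.

```python
import math

def loglog_slope(xs, ps):
    """Least-squares slope of log(p) plotted against log(x)."""
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    n = len(lx)
    mx, mp = sum(lx) / n, sum(lp) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxp = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    return sxp / sxx

xs = list(range(1, 101))

# Power law P(x) ~ x^(-alpha); alpha is an illustrative value.
alpha = 2.5
z = sum(x ** -alpha for x in xs)
power_law = [x ** -alpha / z for x in xs]
print(round(loglog_slope(xs, power_law), 3))  # -2.5

# Poisson with an illustrative mean lam: the slope is positive below
# the mean and negative above it, so the log-log plot is not linear.
lam = 10.0
poisson = [math.exp(-lam) * lam ** x / math.factorial(x) for x in xs[:40]]
left_slope = loglog_slope(xs[:5], poisson[:5])
right_slope = loglog_slope(xs[20:40], poisson[20:40])
print(left_slope > 0, right_slope < 0)  # True True
```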

Network Models

“Natural” networks tend to have…

• One (or a few) connected components.
  • Independent of network size.
• A small diameter (“six degrees of separation”).
  • Constant, logarithmically increasing, or even decreasing with network size.
• High clustering (“communities”).
  • Much larger than expected from a random network (and, even so, with a small diameter!).
• A mixture of connections.
  • Local vs. “long-distance” connections.

Do they share some “universal” features?

Network Models

• Random networks.
• Random-biased networks.
• Small-world networks.
• Scale-free networks.
• Hierarchical & modular networks.
• Affiliation networks.

Network Models

Random Networks

Erdős–Rényi model

• Small number of connected components (typically one).
• Low clustering coefficient.
• Poisson degree distribution.

Network Models

Random Networks

Erdős–Rényi model
[Figure: size of the principal component as a function of the number of links]
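How the principal (largest) component grows with the number of links can be explored with a small simulation. This is a rough sketch, not the experiment behind the figure: it assumes the G(n, p) variant of the Erdős–Rényi model, where every pair of nodes is linked independently with probability p, and the network size, seed and mean degrees are arbitrary choices.

```python
import random

def erdos_renyi(n, p, rng):
    """G(n, p): link each pair of nodes independently with probability p."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for w in range(v + 1, n):
            if rng.random() < p:
                adj[v].add(w)
                adj[w].add(v)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component (depth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

rng = random.Random(42)  # fixed seed so that runs are repeatable
n = 500
for mean_degree in (0.5, 1.0, 2.0, 4.0):
    g = erdos_renyi(n, mean_degree / (n - 1), rng)
    print(mean_degree, largest_component_size(g) / n)
```

With few links the largest component covers only a small fraction of the nodes; once the mean degree is well above 1, most nodes end up in a single component.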

Network Models

Random Networks

Example: Romantic relationships in the Add Health data set.

Peter S. Bearman, James Moody & Katherine Stovel:
“Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks”
American Journal of Sociology, 110(1):44–91, July 2004

Network Models

Small-World Networks

Watts & Strogatz model

• Small number of connected components (typically one).
• Small diameter.
• Poisson degree distribution.
• High clustering coefficient.

Network Models

Small-World Networks

Watts & Strogatz model

[Figure: Average path length, normalized by system size, plotted as a function of the average number of shortcuts.]
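The effect of shortcuts on path length can be reproduced with a simplified rewiring procedure in the spirit of the Watts & Strogatz model. The sketch below is an assumption-laden illustration (ring of 200 nodes, 4 neighbors each, arbitrary seed and rewiring probabilities), not the exact procedure used for the figure: it starts from a ring lattice and redirects each link to a random node with probability p.

```python
import random
from collections import deque
from itertools import combinations

def ring_lattice(n, k):
    """Ring where each node is linked to its k nearest neighbors (k even)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for j in range(1, k // 2 + 1):
            w = (v + j) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def rewire(adj, k, p, rng):
    """Redirect each lattice link (v, v+j) to a random node with probability p,
    avoiding self-loops and duplicate links. Modifies and returns adj."""
    n = len(adj)
    for j in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + j) % n
            if w in adj[v] and rng.random() < p:
                choices = [x for x in range(n) if x != v and x not in adj[v]]
                if choices:
                    new = rng.choice(choices)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def average_clustering(adj):
    """Average of c(u) over all nodes; c(u) = 0 when u has < 2 neighbors."""
    total = 0.0
    for u in adj:
        k = len(adj[u])
        if k >= 2:
            links = sum(1 for v, w in combinations(adj[u], 2) if w in adj[v])
            total += links / (k * (k - 1) / 2)
    return total / len(adj)

rng = random.Random(1)  # fixed seed so that runs are repeatable
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 4), 4, p, rng)
    print(p, round(average_path_length(g), 2), round(average_clustering(g), 3))
```

A few shortcuts are enough to shorten the average path considerably, while clustering only degrades as the rewiring probability grows further.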

Network Models

Scale-Free Networks

Barabási & Albert model

• Small number of connected components (typically one).
• Small diameter.
• Pareto degree distribution.
• Small clustering coefficient.
• Hubs.

Network Models

Scale-Free Networks

Barabási & Albert model

“Natural” interpretation of the model:

• Variable number of nodes: Network grows as new nodes are added.
• Preferential attachment: The more connected a node is, the more likely it is to receive new links (“rich get richer” or Matthew effect).
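Growth plus preferential attachment can be sketched as follows. This is an illustrative simplification rather than the reference formulation of the model: the seed network (a small complete graph), the number of links per new node (m = 2), the network size and the random seed are all assumptions.

```python
import random

def barabasi_albert(n, m, rng):
    """Grow a network to n nodes; each new node attaches m links to existing
    nodes chosen with probability proportional to their current degree."""
    # Assumed seed network: a complete graph on m + 1 nodes.
    adj = {v: set(range(m + 1)) - {v} for v in range(m + 1)}
    # Each node appears in `ends` once per incident link, so picking
    # uniformly from this list picks nodes proportionally to degree.
    ends = [v for v in adj for _ in adj[v]]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj

g = barabasi_albert(2000, 2, random.Random(3))
degrees = [len(nbrs) for nbrs in g.values()]
print("mean degree:", sum(degrees) / len(degrees))
print("max degree:", max(degrees))
```

The mean degree stays close to 2m, while a handful of early nodes accumulate a degree far above the mean: these are the hubs.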

Network Models

Scale-Free Networks

Barabási & Albert model

[Figure: exponential model, without hubs, vs. scale-free model, with hubs]

Network Models

[Figure: Poisson vs. Pareto (power law)]

Network Models

Scale-Free Networks

Features

• Self-organization traits: Links are not random (a feature found in many complex systems).
• Tolerance to random attacks, which easily disrupt random networks but not scale-free networks.
• Vulnerability to targeted attacks: “Hubs” are essential to maintain connectedness.

Network Models

Hierarchical/Modular Networks

• Hierarchical organization.
• Hubs.
• Cliques.

Network Models

Affiliation Networks

Bipartite graph to model social interactions.

Network Structure & Dynamics

The countless ways in which network structures affect our lives make it critical to understand:

1. How network structure affects behavior.
2. Which network structure is likely to emerge.

Network Structure & Dynamics

A complex system is a system composed of interconnected parts that, as a whole, exhibit one or more properties (behavior) not obvious from the properties of the individual parts (i.e. emergence).

Network Structure & Dynamics

Research problems

• Search on networks (with partial local information)
• Diffusion problems: epidemics, social contagion (ideas, fads, products…)
• Analysis of network properties, e.g. robustness/vulnerability

Network Structure & Dynamics

From an algorithmic point of view…

• Objects:
  • Ranking (HITS, PageRank…).
  • Classification & anomaly detection.
  • Clustering & community detection.
  • Object identification (e.g. “entity resolution”).
• Links:
  • Link prediction.
• Graphs:
  • Subgraph detection.
  • Graph classification.
  • Graph generation models.
• …

Network Structure: Centrality

Different notions of centrality

In each of the following networks, X has higher centrality than Y according to a particular measure:

in-degree, out-degree, betweenness, closeness

[Figure: four example networks, one per measure, each with nodes X and Y marked]

[Lada Adamic, “Social Network Analysis”, https://www.coursera.org/course/sna]

Network Structure: Centrality

Degree

Nodes with more connections are more central…

[Lada Adamic, “Social Network Analysis”, https://www.coursera.org/course/sna]

Network Structure: Centrality

Degree

Normalization
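The slide does not spell out the normalization. A common choice, assumed here, is to divide the degree of a node by the maximum possible degree, n - 1, so that values from networks of different sizes are comparable. The star network below is a hypothetical example.

```python
def degree_centrality(adj):
    """Degree of each node divided by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

# Hypothetical star network: one hub linked to four leaves.
star = {"hub": {"a", "b", "c", "d"},
        "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"}}
centrality = degree_centrality(star)
print(centrality["hub"], centrality["a"])  # 1.0 0.25
```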

Network Structure: Centrality

Betweenness

Brokerage (not captured by degree)

IDEA: How many pairs of individuals would have to go through you in order to reach one another in the minimum number of hops?

Network Structure: Centrality

Betweenness

Brokerage (not captured by degree)

Partial credit for lying on one of several shortest paths…
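One way to compute betweenness with partial credit is a Brandes-style accumulation, sketched below for undirected, unweighted networks. The slides do not name an algorithm, so this choice and the two example networks are assumptions made for illustration.

```python
from collections import deque

def betweenness(adj):
    """Betweenness of every node in an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s: count shortest paths (sigma) and record predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes. A node lying on only some of
        # the shortest paths to w receives the matching fraction of credit.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}

# Hypothetical examples: a path a-b-c and a 4-cycle a-b-c-d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(betweenness(path)["b"])    # 1.0
print(betweenness(square)["a"])  # 0.5
```

In the path, b is the only broker between a and c and gets full credit. In the 4-cycle, each pair of opposite nodes has two shortest paths, so each intermediary gets half.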

Network Structure: Centrality

Closeness

When it is not so important to have many connections, nor be between others, but be in the middle of things... not too far from the center.

[Figure: example networks with nodes X and Y marked]

[Lada Adamic, “Social Network Analysis”, https://www.coursera.org/course/sna]

Network Structure: Centrality

Closeness

Based on the length of the average shortest path between a node and all other nodes in the network.
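A minimal sketch of closeness follows, assuming the common definition as the inverse of the average shortest-path distance from a node to the nodes it can reach (the slides give only the verbal description). The five-node path is a hypothetical example.

```python
from collections import deque

def closeness(adj, source):
    """Inverse of the average shortest-path distance from source
    to every node reachable from it (0.0 for an isolated node)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical path network a-b-c-d-e: c sits in the middle.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"},
        "d": {"c", "e"}, "e": {"d"}}
print(round(closeness(path, "c"), 3), round(closeness(path, "a"), 3))  # 0.667 0.4
```

The middle node c is at distance 1, 1, 2 and 2 from the others (average 1.5), whereas the end node a is at distance 1, 2, 3 and 4 (average 2.5).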

Network Structure: Centrality

PageRank

A random walker following links in a network for a very long time will spend a fraction of time at each node that can be used as a measure of its importance.

[Lada Adamic, “Social Network Analysis”, https://www.coursera.org/course/sna]

Network Structure: Centrality

PageRank

Problem: Stuck in the network
Solution: Teleportation
A random jump to anywhere else with a given probability.

[Figure: example network with nodes 1 to 8, and bar charts of the PageRank of each node (vertical axis from 0 to 1) at t=0, t=1 and t=10]
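The random walk with teleportation can be approximated by repeatedly redistributing rank, as in the sketch below. It is a simplified illustration, not the computation behind the figure: the damping value 0.85 (the probability of following a link instead of jumping), the uniform treatment of nodes without outgoing links, and the four-node example are assumptions.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Iterative PageRank. out_links maps each node to the list of nodes
    it links to; every node must appear as a key."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # t = 0: uniform distribution
    for _ in range(iterations):
        # Teleportation: with probability 1 - damping, jump anywhere.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new[w] += share
            else:
                # Walker stuck at a node without out-links: jump anywhere.
                for w in nodes:
                    new[w] += damping * rank[v] / n
        rank = new
    return rank

# Hypothetical directed network: a cycle 1 -> 2 -> 3 -> 1, plus node 4,
# which links to 3 but receives no links.
web = {1: [2], 2: [3], 3: [1], 4: [3]}
ranks = pagerank(web)
print({v: round(r, 3) for v, r in ranks.items()})
```

Node 4 can only be reached by teleportation, so it keeps the minimum rank (1 - damping) / n, while node 3, which receives two links, ends up with the highest rank.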

Network Structure: Communities

Community detection (i.e. clustering)

Identification of groups of nodes within a network…

David Easley & Jon Kleinberg: “Networks, Crowds & Markets: Reasoning About a Highly Connected World”, http://www.cs.cornell.edu/home/kleinber/networks-book/

Network Structure: Communities

Heuristics

• Mutual ties
• Frequency of ties within a community (cliques & k-cores)
• Closeness/reachability of community members (n-cliques)
• Relative frequency of ties within a community (ties among members compared to ties to non-members)

Network Structure: Communities

Cliques & k-cores

• Cliques (complete subgraphs)
  • A single missing link disqualifies the clique.
  • Overlapping cliques.
• k-cores (every node connected to at least k other nodes within the core)
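A k-core can be found by repeatedly pruning nodes with fewer than k remaining neighbors. The following sketch shows that idea on a hypothetical network (a triangle with a two-node tail); the function name and data layout are illustrative assumptions.

```python
def k_core(adj, k):
    """Nodes of the k-core: repeatedly remove nodes that have fewer
    than k neighbors among the nodes that are still present."""
    core = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in [v for v, nbrs in core.items() if len(nbrs) < k]:
            for w in core.pop(v):
                if w in core:
                    core[w].discard(v)
            changed = True
    return set(core)

# Hypothetical network: triangle a-b-c, with a tail a-d-e.
graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
         "d": {"a", "e"}, "e": {"d"}}
print(sorted(k_core(graph, 2)))  # ['a', 'b', 'c']
print(sorted(k_core(graph, 3)))  # []
```

Unlike a clique, the 2-core tolerates missing links: removing e leaves d with a single neighbor, so d is pruned as well and only the triangle survives.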

Network Structure: Communities

n-cliques

Maximal distance between any two nodes is n.

IDEA: Information flows through intermediaries.

Problems:
• Diameter > n
• Disconnected n-cliques

[Figure: a 2-clique with diameter = 3; path outside the 2-clique]

Solution: n-clubs (maximal subgraphs of diameter n)

Network Structure: Communities

Example: Political blogs

A) All citations between blogs.
B) Blogs with at least 5 citations in both directions.
C) Edges further limited to those exceeding 25 combined citations.

Only 15% of the citations bridge communities.

[Figure: panels (A), (B) and (C); blogs are numbered as follows]

1 DigbysBlog
2 JamesWalcott
3 Pandagon
4 blog.johnkerry.com
5 OliverWillis
6 AmericaBlog
7 Crooked Timber
8 Daily Kos
9 American Prospect
10 Eschaton
11 Wonkette
12 TalkLeft
13 Political Wire
14 Talking Points Memo
15 Matthew Yglesias
16 Washington Monthly
17 MyDD
18 Juan Cole
19 Left Coaster
20 Bradford DeLong
21 JawaReport
22 VokaPundit
23 Roger L Simon
24 Tim Blair
25 Andrew Sullivan
26 Instapundit
27 Blogs for Bush
28 LittleGreenFootballs
29 Belmont Club
30 Captain’s Quarters
31 Powerline
32 HughHewitt
33 INDCJournal
34 RealClearPolitics
35 Winds of Change
36 Allahpundit
37 MichelleMalkin
38 WizBang
39 Dean’s World
40 Volokh

[Adamic & Glance, LinkKDD 2005]

Network Structure: Communities

Community detection algorithms

Hierarchical clustering

Michelle Girvan & Mark E.J. Newman: “Community structure in social and biological networks”, PNAS 99(12):7821–7826, 2002, doi:10.1073/pnas.122653799

Network Structure: Communities

Betweenness clustering

Hierarchical clustering using edge betweenness

compute the betweenness of all edges
while (betweenness of any edge > threshold)
    remove edge with highest betweenness
    recalculate betweenness

• Betweenness clustering is inefficient due to the need to recompute edge betweenness in every iteration.

Network Structure: Communities

Modularity clustering

• Consider links that fall within a community (vs. links between a community and the rest of the network)

Modularity Q

Q = \frac{1}{2m} \sum_{vw} \left[ A_{vw} - \frac{k_v k_w}{2m} \right] \delta(c_v, c_w)

where
A_{vw} is the adjacency matrix,
k_v k_w / 2m is the probability of an edge between two vertices, proportional to their degrees,
\delta(c_v, c_w) = 1 if vertices v and w are in the same community (0 otherwise),
m is the number of edges.

NOTE: For a random network, Q=0
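The modularity formula translates almost term by term into code. The sketch below is a direct, unoptimized evaluation of Q for an undirected network; the adjacency-set representation, the `community` dictionary and the two-triangle example are assumptions made for illustration.

```python
def modularity(adj, community):
    """Q = (1/2m) * sum over v, w of [A_vw - k_v*k_w/(2m)] * delta(c_v, c_w)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees = 2m
    q = 0.0
    for v in adj:
        for w in adj:
            if community[v] == community[w]:        # delta(c_v, c_w) = 1
                a_vw = 1.0 if w in adj[v] else 0.0  # adjacency matrix entry
                q += a_vw - len(adj[v]) * len(adj[w]) / two_m
    return q / two_m

# Hypothetical network: two triangles (0-1-2 and 3-4-5) joined by link 2-3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {v: "A" for v in graph}
print(round(modularity(graph, two_groups), 4))  # 0.3571
print(round(modularity(graph, one_group), 4))   # 0.0
```

Splitting the network into its two triangles gives Q = 5/14, whereas putting every node in a single community gives Q = 0, since that partition says nothing about structure.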

Network Structure: Communities

Modularity clustering

Algorithm

start with all vertices as isolates
do
    join clusters with the greatest increase in modularity (∆Q)
while (∆Q > 0)

Aaron Clauset, Mark E. J. Newman & Cristopher Moore: “Finding community structure in very large networks”, Physical Review E 70(6):066111, 2004, doi:10.1103/physreve.70.066111

Network Structure: Communities

Modularity clustering

An application: Visualization of large networks (Gephi)

Network Structure: Communities

Limitations of current community detection methods

• Scalability: Identification of large communities.
• Existence of overlapping communities in large networks.
• Unrealistic models (algorithms make oversimplified assumptions about the networks or community structures, and perform poorly against real-world data sets).
• Heuristics without performance guarantees (for those heuristic algorithms that work well in practice, there is no performance guarantee over the quality of their output).

Bibliography

Networks: Origins & Applications (social networks, Web…)

• Stanley Milgram: The small world problem. Psychology Today, 2:60-67 (1967)

• Philip W. Anderson: More is different. Science, 177:393-396 (1972)

• Mark S. Granovetter: The strength of weak ties. American Journal of Sociology, 78:1360-1380 (1973)

• Stanley Wasserman & Katherine Faust: Social Network Analysis: Methods and Applications. Cambridge University Press, 1994

• John P. Scott: Social Network Analysis, 2nd edition. Sage Publications Ltd., 2000.

• Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins & Janet Wiener: Graph structure in the Web. Computer Networks 33:309–320 (2000)

• Steven H. Strogatz: Exploring Complex Networks. Nature, 410:268-275 (2001)

• Albert-Laszlo Barabasi: Linked: How Everything Is Connected to Everything Else and What It Means. Plume, 2003. ISBN 0452284392

• Duncan J. Watts: Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2004. ISBN 0393325423

• Jure Leskovec, Jon M. Kleinberg & Christos Faloutsos: Graphs over time: densification laws, shrinking diameters and possible explanations. KDD'2005

Bibliography

Network Models

• Paul Erdös & Alfred Rényi: On the evolution of random graphs. Mathematical Institute of the Hungarian Academy of Sciences, 5:17-61 (1960). Reprinted in Newman, Barabasi & Watts (eds.): “The Structure and Dynamics of Networks”

• Ray Solomonoff & Anatol Rapoport: Connectivity of random nets. Bulletin of Mathematical Biophysics, 13:107-117 (1951). Reprinted in Newman, Barabasi & Watts (eds.): “The Structure and Dynamics of Networks”

• Duncan J. Watts & Steven H. Strogatz: Collective dynamics of ‘small-world’ networks. Nature, 393:440-442 (1998)

• Albert-László Barabási & Réka Albert: Emergence of scaling in random networks. Science, 286:509-512 (1999)

• Réka Albert, Hawoong Jeong & Albert-László Barabási: Error and attack tolerance of complex networks. Nature 406:378-382 (2000)

• M.E.J. Newman, S.H. Strogatz & D.J. Watts: Random graphs with arbitrary degree distributions and their applications. Physical Review E, 64:026118 (2001)

• M.E.J. Newman, S.H. Strogatz & D.J. Watts: Random graph models of social networks. PNAS 99:2566-2572 (2002)

• Erzsébet Ravasz & Albert-László Barabási: Hierarchical organization in complex networks. Physical Review E, 67:026112 (2003)

• Mark Newman: The structure and function of complex networks. SIAM Review 45:167-256 (2003)

Bibliography

Community Detection

• Michelle Girvan & Mark E.J. Newman: Community structure in social and biological networks, PNAS 99(12):7821–7826 (2002)

• Aaron Clauset, Mark E. J. Newman & Cristopher Moore: Finding community structure in very large networks, Physical Review E 70(6):066111 (2004)

• Gergely Palla, Imre Derényi, Illés Farkas & Tamás Vicsek: Uncovering the overlapping community structure of complex networks in nature and society, Nature 435:814-818 (2005)

• Jure Leskovec, Kevin J. Lang, Anirban Dasgupta & Michael W. Mahoney: Statistical Properties of Community Structure in Large Social and Information Networks, International World Wide Web Conference, WWW’08 (2008). Extended version: Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters, arxiv:0810.1355 (2008)

• Martin Rosvall & Carl T. Bergstrom: Maps of random walks on complex networks reveal community structure, PNAS 105(4):1118-1123 (2008)

• Jure Leskovec, Kevin J. Lang & Michael W. Mahoney: Empirical Comparison of Algorithms for Network Community Detection, WWW 2010 (2010)

• Santo Fortunato: Community detection in graphs. Physics Reports, 486(3-5):75-174 (2010)

• S. Arora, R. Ge, S. Sachdeva & G. Schoenebeck: Finding overlapping communities in social networks: toward a rigorous approach. Proceedings of the 13th ACM Conference on Electronic Commerce, EC ’12, pp. 37-54 (2012)

Bibliography

Search on Networks

• Sergey Brin & Lawrence Page: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, April 1998

• David Gibson, Jon M. Kleinberg & Prabhakar Raghavan: Inferring Web Communities from Link Topology. ACM Conference on Hypertext and Hypermedia, June 1998

• Jon M. Kleinberg: Authoritative sources in a hyperlinked environment. Journal of the ACM, September 1999

• Toby Walsh: Search in a Small World. IJCAI’1999

• Jon M. Kleinberg: Navigation in a Small World. Nature, August 2000

• Jon M. Kleinberg: The small-world phenomenon: An algorithmic perspective. STOC’2000

• Scott White & Padhraic Smyth: Algorithms for Estimating Relative Importance in Networks. KDD'2003

• Hanghang Tong & Christos Faloutsos: Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD'2006

• Alekh Agarwal, Soumen Chakrabarti & Sunny Aggarwal: Learning to Rank Networked Entities. KDD'2006

• Jeffrey Davitz, Jiye Yu, Sugato Basu, David Gutelius & Alexandra Harris: iLink: Search and Routing in Social Networks. KDD'2007

Bibliography

• David Easley & Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010. ISBN 0521195330. http://www.cs.cornell.edu/home/kleinber/networks-book/

• Mark Newman: Networks: An Introduction. Oxford University Press, 2010. ISBN 0-19-920665-1

• Matthew O. Jackson: Social and Economic Networks, Princeton University Press, 2008. ISBN 0-691-13440-5

Bibliography

• Jiawei Han & Micheline Kamber: Data Mining: Concepts and Techniques [2nd edition], section 9.2. Addison-Wesley, 2006. ISBN 1-55860-901-3

• Mark Newman, Albert-Laszlo Barabasi & Duncan J. Watts (editors): The Structure and Dynamics of Networks. Princeton University Press, 2006. ISBN 0-691-11357-2

• Ted G. Lewis: Network Science: Theory and Applications. Wiley, 2009. ISBN 0-470-33188-7

Bibliography

• Albert-Laszlo Barabási: Linked: How Everything Is Connected to Everything Else and What It Means. Plume, 2003. ISBN 0452284392

• Duncan J. Watts: Six Degrees: The Science of a Connected Age. W. W. Norton & Company, 2004. ISBN 0393325423

• Albert-Laszlo Barabási: Bursts: The Hidden Pattern Behind Everything We Do. Dutton, 2010. ISBN 0525951601