networks - ugrelvex.ugr.es/idbis/dm/slides/50 graph mining - networks.pdf · nbr(u) nbr(u)...
TRANSCRIPT
NetworksNetworks© Fernando Berzal© Fernando Berzal [email protected]@acm.org
NetworksNetworks
�� Network AnalysisNetwork Analysis�� ApplicationsApplications
�� Network PropertiesNetwork Properties
�� Network ModelsNetwork Models�� RandomRandom--Graph ModelsGraph Models
�� Growing Random ModelsGrowing Random Models
�� Strategic Network FormationStrategic Network Formation
�� Network Structure & DynamicsNetwork Structure & Dynamics�� Network CentralityNetwork Centrality
�� Community DetectionCommunity Detection
�� Diffusion through NetworksDiffusion through Networks
�� Search on NetworksSearch on Networks
�� BibliographyBibliography 11
Network Network AnalysisAnalysis
Networks permeate our lives.Networks permeate our lives.
Networks play a central role in determiningNetworks play a central role in determiningNetworks play a central role in determiningNetworks play a central role in determining
�� the transmission of information about job opportunities,the transmission of information about job opportunities,
�� how diseases spread,how diseases spread,
�� which products we buy,which products we buy,
�� our likelihood of succeeding professionally,our likelihood of succeeding professionally,
�� ……
22
Network Network AnalysisAnalysis
As a field of study…As a field of study…
�� How relationships between parts give rise to the How relationships between parts give rise to the collective behaviors of a system and how the system collective behaviors of a system and how the system collective behaviors of a system and how the system collective behaviors of a system and how the system interacts and forms relationships with its environment interacts and forms relationships with its environment (complex systems).(complex systems).
�� Common principles, algorithms and tools that govern Common principles, algorithms and tools that govern network behavior (network science). network behavior (network science).
33
Origins: Graph TheoryOrigins: Graph Theory
Network Network AnalysisAnalysis
The Seven Bridges of KönisbergThe Seven Bridges of Könisberg
(Leonhard Euler, 1736) (Leonhard Euler, 1736) 44
Network Network AnalysisAnalysis
Networks as graphs “on steroids”…Networks as graphs “on steroids”…
�� ObjectsObjects: Graph vertices.: Graph vertices.
�� Objects can be of different kinds.Objects can be of different kinds.
�� Objects can be labeled.Objects can be labeled.�� Objects can be labeled.Objects can be labeled.
�� Objects can have attributesObjects can have attributes
�� LinksLinks between objects: Graph edges. between objects: Graph edges.
�� Links can be of different kinds.Links can be of different kinds.
�� Links can be directed (arcs) or undirected (edges).Links can be directed (arcs) or undirected (edges).
�� Links can have attributes.Links can have attributes.55
A formal definition of network A formal definition of network
[Ted G. Lewis: “Network Science,” 2009][Ted G. Lewis: “Network Science,” 2009]
G(t) = { N(t), L(t), f(t) : J(t) }G(t) = { N(t), L(t), f(t) : J(t) }
Network Network AnalysisAnalysis
wherewhere t = time (simulated or real)t = time (simulated or real)
N = nodes (a.k.a. vertices or “actors”)N = nodes (a.k.a. vertices or “actors”)
L = links (a.k.a. edges)L = links (a.k.a. edges)
f = topology (connections through links)f = topology (connections through links)
J = behavior of nodes and links (algorithm)J = behavior of nodes and links (algorithm)
66
Network Network AnalysisAnalysis
An interdisciplinary field: Complex systems An interdisciplinary field: Complex systems
(“networks of heterogeneous components that interact”)(“networks of heterogeneous components that interact”)
�� Physics: Physics: Nonlinear dynamics & chaosNonlinear dynamics & chaos..Dynamical systems that are highly sensitive to initial conditions Dynamical systems that are highly sensitive to initial conditions Dynamical systems that are highly sensitive to initial conditions Dynamical systems that are highly sensitive to initial conditions (a.k.a. butterfly effect).(a.k.a. butterfly effect).
�� Economics: Economics: MarketsMarkets..Spontaneous (or emergent) order as the result of human action, Spontaneous (or emergent) order as the result of human action, but not the execution of any human design [Austrian perspective].but not the execution of any human design [Austrian perspective].
�� Information theory: Information theory: Complex adaptive systemsComplex adaptive systems..(focus on the ability to change and learn from experience).(focus on the ability to change and learn from experience). 77
ApplicationsApplications
�� “Cheminformatics”: Chemical compounds.“Cheminformatics”: Chemical compounds.
�� “Bioinformatics”: Protein networks & bio“Bioinformatics”: Protein networks & bio--pathwayspathways
�� Software Engineering: Program analysis…Software Engineering: Program analysis…
�� Network flow analysis (transport, workflows…)Network flow analysis (transport, workflows…)�� Network flow analysis (transport, workflows…)Network flow analysis (transport, workflows…)
�� SemiSemi--structured databases, e.g. XMLstructured databases, e.g. XML
�� Knowledge management: Ontologies & semantic netsKnowledge management: Ontologies & semantic nets
�� ComputerComputer--aided design (CAD): IC design…aided design (CAD): IC design…
�� Geographic information systems (GIS) & cartographyGeographic information systems (GIS) & cartography
�� Social networks, e.g. WebSocial networks, e.g. Web
�� Economic networks, e.g. marketsEconomic networks, e.g. markets88
ApplicationsApplications
“Life complexity pyramid”“Life complexity pyramid”
99from
Z.N
. O
ltva
i and A
.-L.
Bara
basi
. S
cience
, 2002
ApplicationsApplications
Biological networksBiological networks
Gene-protein
GENOME
1010
Gene-protein interactions
Protein-protein interactions
PROTEOME
Citrate Cycle
METABOLISM
Biochemical reactions
ApplicationsApplications
Yeast protein interaction networkYeast protein interaction network
1111from
H.
Jeong e
t al N
atu
re 4
11, 41 (
2001)
ApplicationsApplications
Ecological networkEcological network: Trophic relationships in a food web.: Trophic relationships in a food web.
1212
ApplicationsApplications
Telecommunication networkTelecommunication network
1313
ApplicationsApplications
InternetInternet
1414
ApplicationsApplications
World Wide WebWorld Wide Web
1515
ApplicationsApplications
Social networkSocial network: Bibliographic network (coauthors): Bibliographic network (coauthors)
1616
ApplicationsApplications
Social networkSocial network: Bibliographic network (coauthors): Bibliographic network (coauthors)
1717
ApplicationsApplications
Social networkSocial network: FOAF (“friend of a friend”): FOAF (“friend of a friend”)
1818
ApplicationsApplications
Social networkSocial network: Organization: Organization
1919
ApplicationsApplications
Social networkSocial network: E: E--mail spectroscopymail spectroscopy
2020
ApplicationsApplications
Social networkSocial network: US Biotech Industry: US Biotech Industry
2121
Network PropertiesNetwork Properties
Common network features:Common network features:
�� Large scale.Large scale.
�� Continuous evolution.Continuous evolution.
�� Distribution (nodes decide their connections).Distribution (nodes decide their connections).
�� Interactions only through existing links.Interactions only through existing links.
2222
Network PropertiesNetwork Properties
Some interesting structural properties:Some interesting structural properties:
�� Connected components: How many? Of what size?.Connected components: How many? Of what size?.
�� Network diameter: Average distance, worst case…Network diameter: Average distance, worst case…
�� Node degree distribution Node degree distribution & existence of “hubs” (heavily& existence of “hubs” (heavily--connected nodes).connected nodes).
�� Groupings (balance between local and largeGroupings (balance between local and large--distance distance connections, as well as their roles).connections, as well as their roles). 2323
Network PropertiesNetwork Properties
Network ConnectivityNetwork Connectivity
WWWWWW
2424
Network PropertiesNetwork Properties
Network DiameterNetwork Diameter
“small worlds”“small worlds”
Sarah
2525
Tim
Jane
Sarah
Ralph
Network PropertiesNetwork Properties
Clustering coefficientClustering coefficient
nbr(u) nbr(u) Neighbors of the node u in the network.Neighbors of the node u in the network.
kk Number of neighbors of u, i.e. |nbr(u)|.Number of neighbors of u, i.e. |nbr(u)|.
max(u)max(u) Maximum number of links among the Maximum number of links among the max(u)max(u) Maximum number of links among the Maximum number of links among the neighbors of u, e.g. k*(kneighbors of u, e.g. k*(k--1)/2.1)/2.
Clustering coefficient for the node u:Clustering coefficient for the node u:c(u) = (#links among neighbors of u) / max(u)c(u) = (#links among neighbors of u) / max(u)
Clustering coefficient for the graph G:Clustering coefficient for the graph G:C = average of c(u) for every node in GC = average of c(u) for every node in G 2626
Network PropertiesNetwork Properties
Clustering coefficientClustering coefficient
k = 4k = 4
m = 6m = 6
uc(u) = 4/6 = 0.66c(u) = 4/6 = 0.66
0 <= c(u) <= 10 <= c(u) <= 1
Similarity of u neighbors to a clique (complete graph).Similarity of u neighbors to a clique (complete graph).
Informal interpretation:Informal interpretation:
“My friends tend to be friends among them.”“My friends tend to be friends among them.”2727
Network PropertiesNetwork Properties
Clustering coefficient Clustering coefficient for some real networksfor some real networks
Network N C Crand L
WWW 153127 0.1078 0.00023 3.1
Internet 3015-6209 0.18-0.30 0.001 3.7-3.76
Actor 225226 0.79 0.00027 3.65
Clustering coefficient (C):Clustering coefficient (C): C>CC>Crandrand
Path length (L): Path length (L): L<LL<Lrandrand2828
Actor 225226 0.79 0.00027 3.65
Coauthorship 52909 0.43 0.00018 5.9
Metabolic 282 0.32 0.026 2.9
Foodweb 134 0.22 0.06 2.43
C. elegance 282 0.28 0.05 2.65
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
Normal distributionNormal distribution
Parameters: Average & deviationParameters: Average & deviation
2929
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
Poisson distributionPoisson distribution
Single parameter: Single parameter: λλ (mean (mean & deviation)& deviation)
3030
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
Pareto distribution (a.k.a. “power law”)Pareto distribution (a.k.a. “power law”)
Single parameter: Single parameter: αααααααα
P(x) P(x) ~ x~ x~ x~ x~ x~ x~ x~ x--------ααααααααP(x) P(x) ~ x~ x~ x~ x~ x~ x~ x~ x--------αααααααα
The Pareto principle (the “80The Pareto principle (the “80--20 rule”):20 rule”):
20% of the population controls 80% of the wealth.20% of the population controls 80% of the wealth.3131
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
HubsHubs
Small number of nodes with a very high degree.Small number of nodes with a very high degree.
�� Hubs appear with power laws (Hubs appear with power laws (P(x) P(x) ~ x~ x~ x~ x~ x~ x~ x~ x--------αααααααα),),but not with normal/binomial/Poisson distributions.but not with normal/binomial/Poisson distributions. 3232
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
LogLog--log plotlog plot
�� Pareto distribution Pareto distribution �� log(Pr[X = x]) = log(1/xlog(Pr[X = x]) = log(1/xαα) = ) = --αα log(x) log(x) log(Pr[X = x]) = log(1/xlog(Pr[X = x]) = log(1/x ) = ) = --αα log(x) log(x)
�� Linear, Linear, ––α α slope.slope.
�� Normal distributionNormal distribution�� log(Pr[X = x]) = log(a exp(log(Pr[X = x]) = log(a exp(--xx22/b)) = log(a) /b)) = log(a) –– xx22/b/b
�� Nonlinear, concave around the average.Nonlinear, concave around the average.
�� Poisson distributionPoisson distribution�� log(Pr[X = x]) = log(exp(log(Pr[X = x]) = log(exp(--λλ) ) λλxx/x!) /x!)
�� Nonlinear.Nonlinear. 3333
Network PropertiesNetwork Properties
Node degree distributionNode degree distribution
LogLog--log plotlog plot
aa WWWWWWpower lawpower law
bb Coauthorship networksCoauthorship networkspower law with exponential cutoffpower law with exponential cutoff
cc Power gridPower gridexponentialexponential
dd Social networkSocial networkGaussianGaussian
3434
Network ModelsNetwork Models
“Natural” networks tend to have…“Natural” networks tend to have…
�� One (or a few) connected components.One (or a few) connected components.�� Independent of network size.Independent of network size.
�� A small diameter (“six degrees of separation”).A small diameter (“six degrees of separation”).�� A small diameter (“six degrees of separation”).A small diameter (“six degrees of separation”).�� Constant, logarithmically increasing, Constant, logarithmically increasing,
or even decreasing with network size.or even decreasing with network size.
�� High clustering (“communities”).High clustering (“communities”).�� Much larger than expected from a random networkMuch larger than expected from a random network
(and, even so, with a small diameter!).(and, even so, with a small diameter!).
�� A mixture of connections.A mixture of connections.�� Local vs. “longLocal vs. “long--distance” connectionsdistance” connections
Do they share some “universal” features?Do they share some “universal” features? 3535
Network ModelsNetwork Models
�� Random networks.Random networks.
�� RandomRandom--biased networks.biased networks.
�� SmallSmall--world networks.world networks.
�� ScaleScale--free networks.free networks.
�� Hierarchical & modular networks.Hierarchical & modular networks.
�� Affiliation networks.Affiliation networks.3636
Network ModelsNetwork Models
Random NetworksRandom Networks
ErdösErdös--Rényi modelRényi model
�� Small number of connected components (typically one).Small number of connected components (typically one).
�� Low clustering coefficient.Low clustering coefficient.
�� Poisson distribution.Poisson distribution.�� Poisson distribution.Poisson distribution.
3737
Network ModelsNetwork Models
Random NetworksRandom Networks
ErdösErdös--Renyi modelRenyi model
Princi
pal co
mponent
3838
Number of links
Princi
pal co
mponent
Network ModelsNetwork Models
Random NetworksRandom Networks
Example: Romantic relationships in the Add Health data set.Example: Romantic relationships in the Add Health data set.
Peter S. Bearman, James Moody & Katherine Stovel:Peter S. Bearman, James Moody & Katherine Stovel:
“Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks”“Chains of Affection: The Structure of Adolescent Romantic and Sexual Networks”
American Journal of Sociology, 110(1):44American Journal of Sociology, 110(1):44––91, July 200491, July 2004
3939
Network ModelsNetwork Models
SmallSmall--World NetworksWorld Networks
Watts & Strogatz modelWatts & Strogatz model
�� Small number of connected components (typically one).Small number of connected components (typically one).
�� Small diameter.Small diameter.
�� Poisson distribution.Poisson distribution.�� Poisson distribution.Poisson distribution.
�� High clustering coefficient.High clustering coefficient.
4040
Network ModelsNetwork Models
SmallSmall--World NetworksWorld Networks
Watts & Strogatz modelWatts & Strogatz model
Average path length, normalized by system size, Average path length, normalized by system size, plotted as a function of the average number of shortcuts.plotted as a function of the average number of shortcuts. 4141
Network ModelsNetwork Models
ScaleScale--Free NetworksFree Networks
Barabási & Albert modelBarabási & Albert model
�� Small number of connected components (typically one).Small number of connected components (typically one).
�� Small diameter.Small diameter.
�� Pareto distribution.Pareto distribution.�� Pareto distribution.Pareto distribution.
�� Small clustering coefficient.Small clustering coefficient.
�� Hubs.Hubs.
4242
Network ModelsNetwork Models
ScaleScale--Free NetworksFree Networks
Barabási & Albert modelBarabási & Albert model
“Natural” interpretation of the model:
�� Variable number of nodes: Variable number of nodes: Network grows as new nodes are added.Network grows as new nodes are added.
�� Preferential attachmentPreferential attachment::The more connected a node is, The more connected a node is, the more likely it is to receive new linksthe more likely it is to receive new links(“rich get richer” or Matthew effect).(“rich get richer” or Matthew effect).
4343
Network ModelsNetwork Models
ScaleScale--Free NetworksFree Networks
Barabási & Albert modelBarabási & Albert model
Exponential model… Scale-free model…
… without hubs. … with hubs. 4444
Network ModelsNetwork Models
PoissonPoisson Pareto (power law)Pareto (power law)
4545
Network ModelsNetwork Models
ScaleScale--Free NetworksFree Networks
FeaturesFeatures
�� SelfSelf--organization traits: Links are not randomorganization traits: Links are not random(a feature found in many complex systems).(a feature found in many complex systems).(a feature found in many complex systems).(a feature found in many complex systems).
�� Tolerance to random attacks, which easily disrupt Tolerance to random attacks, which easily disrupt random networks but not scalerandom networks but not scale--free networks.free networks.
�� Vulnerability to targeted attacks: Vulnerability to targeted attacks: “Hubs” are essential to maintain connectedness.“Hubs” are essential to maintain connectedness.
4646
Network ModelsNetwork Models
4747
Network ModelsNetwork Models
Hierarchical/Modular NetworksHierarchical/Modular Networks
�� Hierarchical organization.Hierarchical organization.
�� Hubs.Hubs.
�� CliquesCliques..
4848
Network ModelsNetwork Models
Hierarchical/Modular NetworksHierarchical/Modular Networks
4949
Network ModelsNetwork Models
Affiliation NetworksAffiliation Networks
Bipartite graph to model social interactionsBipartite graph to model social interactions::
5050
Network ModelsNetwork Models
Affiliation NetworksAffiliation Networks
5151
Network ModelsNetwork Models
5252
Network Structure & DynamicsNetwork Structure & Dynamics
The countless ways in which network structuresThe countless ways in which network structuresaffect our lives make it critical to understand:affect our lives make it critical to understand:
1.1. How network structure affect behavior.How network structure affect behavior.
2.2. Which network structure is likely to emerge.Which network structure is likely to emerge.
5353
Network Structure & DynamicsNetwork Structure & Dynamics
A complex system is a system composed of A complex system is a system composed of interconnected parts that, as a whole, exhibit one or interconnected parts that, as a whole, exhibit one or interconnected parts that, as a whole, exhibit one or interconnected parts that, as a whole, exhibit one or more properties (behavior) not obvious from the more properties (behavior) not obvious from the properties of the individual parts (i.e. emergence).properties of the individual parts (i.e. emergence).
5454
Network Structure & DynamicsNetwork Structure & Dynamics
Research problemsResearch problems
�� Search on networks Search on networks (with partial local information)(with partial local information)
�� Diffusion problems:Diffusion problems:epidemics, social contagion (ideas, fads, products…)epidemics, social contagion (ideas, fads, products…)
�� Analysis of network propertiesAnalysis of network propertiese.g. robustness/vulnerabilitye.g. robustness/vulnerability
5555
Network Structure & DynamicsNetwork Structure & Dynamics
From an algorithmic point of view…From an algorithmic point of view…
�� Objects: Objects: �� Ranking (HITS, PageRank…).Ranking (HITS, PageRank…).
�� Classification & anomaly detection.Classification & anomaly detection.
Clustering & community detection.Clustering & community detection.�� Clustering & community detection.Clustering & community detection.
�� Object identification (e.g. “entity resolution”).Object identification (e.g. “entity resolution”).
�� Links:Links:�� Link prediction.Link prediction.
�� Graphs:Graphs:�� Subgraph detection.Subgraph detection.
�� Graph classification.Graph classification.
�� Graph generation models.Graph generation models.
�� …… 5656
Network Structure: CentralityNetwork Structure: Centrality
Different notions of centralityDifferent notions of centrality
In each of the following networks, X has higher In each of the following networks, X has higher centrality than Y according to a particular measurecentrality than Y according to a particular measure
inin--degreedegree outout--degreedegree betwennessbetwenness closenesscloseness
[[LadaLada AdamicAdamic, “Social Network Analysis”, , “Social Network Analysis”, https://www.coursera.org/course/snahttps://www.coursera.org/course/sna]]5757
Y
X
Y
X
Y X
Y
X
Network Structure: CentralityNetwork Structure: Centrality
DegreeDegree
Nodes with more connections are more central…Nodes with more connections are more central…
[[LadaLada AdamicAdamic, “Social Network Analysis”, , “Social Network Analysis”, https://www.coursera.org/course/snahttps://www.coursera.org/course/sna]] 5858
Network Structure: CentralityNetwork Structure: Centrality
DegreeDegree
Nodes with more connections are more central…Nodes with more connections are more central…
[[LadaLada AdamicAdamic, “Social Network Analysis”, , “Social Network Analysis”, https://www.coursera.org/course/snahttps://www.coursera.org/course/sna]] 5959
Normalization
BetweennessBetweenness
Brokerage (not captured by degree)Brokerage (not captured by degree)
Network Structure: CentralityNetwork Structure: Centrality
IDEA: IDEA: How many pairs of individuals would haveHow many pairs of individuals would have
to go through you in order to reach one anotherto go through you in order to reach one another
in the minimum number of hops?in the minimum number of hops? 6060
BetweennessBetweenness
Brokerage (not captured by degree)Brokerage (not captured by degree)
Network Structure: CentralityNetwork Structure: Centrality
Partial credit for lying in one of several shortest paths…Partial credit for lying in one of several shortest paths… 6161
Network Structure: CentralityNetwork Structure: Centrality
ClosenessCloseness
When it is not so important to have many connections, When it is not so important to have many connections,
nor be between others, but be in the middle of things... nor be between others, but be in the middle of things...
not too far from the center.not too far from the center.
Y
[[LadaLada AdamicAdamic, “Social Network Analysis”, , “Social Network Analysis”, https://www.coursera.org/course/snahttps://www.coursera.org/course/sna]]6262
Y
X
Y
X
Y
X
Network Structure: CentralityNetwork Structure: Centrality
ClosenessCloseness
based on the length of the average shortest path based on the length of the average shortest path between a node and all other nodes in the network.between a node and all other nodes in the network.
6363
Network Structure: CentralityNetwork Structure: Centrality
PageRankPageRank
A random walkerA random walker following links in a network for a very following links in a network for a very
long time will spend a fraction of time at each node long time will spend a fraction of time at each node
that can be used as a measure of its importancethat can be used as a measure of its importance..
[[LadaLada AdamicAdamic, “Social Network Analysis”, , “Social Network Analysis”, https://www.coursera.org/course/snahttps://www.coursera.org/course/sna]]6464
Network Network StructureStructure: : CentralityCentrality
PageRankPageRank
ProblemProblem: : StuckStuck in in thethe networknetwork
SolutionSolution: : TeleportationTeleportationSolutionSolution: : TeleportationTeleportation
A A randomrandom jumpjump toto anywhereanywhere elseelse withwith a a givengiven probabilityprobability..
6565
1
2
3 4
5
7
6 8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8
Pag
eR
an
k
t=0
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8
Pag
eR
an
k
t=1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1 2 3 4 5 6 7 8
Pag
eR
an
k
t=10
Network Structure: CommunitiesNetwork Structure: Communities
Community detection (i.e. clustering)Community detection (i.e. clustering)
Identification of groups of nodes within a network…Identification of groups of nodes within a network…
David Easley & Jon Kleinberg: David Easley & Jon Kleinberg: “Networks, Crowds & Markets: Reasoning About a Highly Connected World”, “Networks, Crowds & Markets: Reasoning About a Highly Connected World”, http://www.cs.cornell.edu/home/kleinber/networkshttp://www.cs.cornell.edu/home/kleinber/networks--book/book/ 6666
Network Structure: CommunitiesNetwork Structure: Communities
HeuristicsHeuristics
� Mutual ties
� Frequency of ties within a community (cliques & k-cores)� Frequency of ties within a community (cliques & k-cores)
� Closeness/reachability of community members (n-cliques)
� Relative frequency of ties within a community (ties among members compared to ties to non-members)
6767
Network Structure: CommunitiesNetwork Structure: Communities
Cliques & kCliques & k--corescores
� Cliques (complete subgraphs)
� A single missing links disqualifies the clique
� Overlapping cliques� Overlapping cliques
� K-cores(every node connected to at least k other nodes)
6868
Network Structure: CommunitiesNetwork Structure: Communities
nn--cliques cliques
Maximal distance between any two nodes is n
IDEA: Information flow throw intermediaries.
Problems: 2 – cliqueProblems:
� Diameter > n
� Disconnected n-cliques
Solution: n-clubs (maximal subgraphs of diameter n)
6969
2 – cliquediameter = 3
path outside the 2-clique
Network Structure: CommunitiesNetwork Structure: Communities
Example: Political blogsExample: Political blogs
A)A) All citations between blogs.All citations between blogs.
B)B) Blogs with at least 5 citationsBlogs with at least 5 citationsin both directions.in both directions.
1 21
1 DigbysBlog
2 JamesWalcott
3 Pandagon
4 blog.johnkerry.com
5 OliverWillis
6 AmericaBlog
7 Crooked Timber
8 Daily Kos
9 American Prospect
10Eschaton
11Wonkette
(A)
C)C) Edges further limited to thoseEdges further limited to thoseexceeding 25 combined citations.exceeding 25 combined citations.
[Adamic & Glance, LinkKDD2005]7070
1 2
3
456
78
910
11
1213
1415
16
1718
19
20
21
22 2324
2526
27
2829 30
3132
3334 35 36
3738 39
40
11Wonkette
12TalkLeft
13Political Wire
14Talking Points Memo
15Matthew Yglesias
16Washington Monthly
17MyDD
18Juan Cole
19Left Coaster
20Bradford DeLong
21 JawaReport
22VokaPundit
23Roger LSimon
24Tim Blair
25Andrew Sullivan
26 Instapundit
27Blogsfor Bush
28 LittleGreenFootballs
29Belmont Club
30Captain’sQuarters
31Powerline
32 HughHewitt
33 INDCJournal
34RealClearPolitics
35Winds ofChange
36Allahpundit
37MichelleMalkin
38WizBang
39Dean’sWorld
40Volokh(C)
(B)
only 15% of the citations
bridge communities
Network Structure: CommunitiesNetwork Structure: Communities
Community detection algorithmsCommunity detection algorithms
Hierarchical clustering
Michelle Girvan & Mark E.J. Newman:Michelle Girvan & Mark E.J. Newman:“Community structure in social and biological networks” “Community structure in social and biological networks” PNASPNAS 9999(12):7821(12):7821––7826, 2002 7826, 2002 doi:10.1073/pnas.122653799doi:10.1073/pnas.122653799 7171
Network Structure: CommunitiesNetwork Structure: Communities
BetweennessBetweenness clusteringclustering
Hierarchical clustering using edge betweenness
compute the compute the betweennessbetweenness of all edgesof all edges
while (while (betweennessbetweenness of any edge > threshold)of any edge > threshold)
remove edge with highest remove edge with highest betweennessbetweenness
recalculate recalculate betweennessbetweenness
� Betweenness clustering is inefficient due to the needto recompute edge betweenness in every iteration.
7272
Network Structure: CommunitiesNetwork Structure: Communities
Modularity clusteringModularity clustering
�� Consider links that fall within a community (vs. links Consider links that fall within a community (vs. links between a community and the rest of the network)between a community and the rest of the network)
Modularity QModularity Q if vertices are in the same community
Modularity QModularity Q
NOTE: For a random network, Q=0NOTE: For a random network, Q=0 7373
),(22
1wv
vw
wvvw cc
m
kkA
mQ δ∑
−=
probability of an edge betweentwo vertices is proportional to their degrees
adjacency matrix
same community
Network Structure: CommunitiesNetwork Structure: Communities
Modularity clusteringModularity clustering
AlgorithmAlgorithm
start with all vertices as isolatesstart with all vertices as isolates
dodo
join clusters with the greatest increase in modularity (join clusters with the greatest increase in modularity (∆∆Q)Q)
while (while (∆∆Q > 0)Q > 0)
Aaron Aaron ClausetClauset, Mark E. J. Newman, , Mark E. J. Newman, CristopherCristopher Moore:Moore:“Finding community structure in very large networks”“Finding community structure in very large networks”PhysicalPhysical ReviewReview E 70(6):066111, 2004 E 70(6):066111, 2004 doi:10.1103/physreve.70.066111doi:10.1103/physreve.70.066111 7474
Network Structure: CommunitiesNetwork Structure: Communities
Modularity clusteringModularity clustering
An application: Visualization of large networks (An application: Visualization of large networks (GephiGephi))
7575
Network Structure: CommunitiesNetwork Structure: Communities
Limitations of current community detection methodsLimitations of current community detection methods
�� Scalability: Identification of large communities.Scalability: Identification of large communities.
�� Existence of overlapping communities in large networks.Existence of overlapping communities in large networks.
�� Unrealistic models (algorithms make oversimplified assumptionsUnrealistic models (algorithms make oversimplified assumptionsover the networks or community structures, but perform poorly over the networks or community structures, but perform poorly against real world data sets).against real world data sets).
�� Heuristics without performance guarantees (for those heuristic Heuristics without performance guarantees (for those heuristic algorithms that work well in practice, algorithms that work well in practice, therethere isis no performance no performance guarantee over the quality of their output).guarantee over the quality of their output).
7676
BibliographyBibliography
Networks: Origins & Applications (social networks, Web…)Networks: Origins & Applications (social networks, Web…)�� Stanley Milgram: Stanley Milgram: The small world problemThe small world problem. .
Psychology Today, 2:60Psychology Today, 2:60--67 (1967)67 (1967)
�� Phillip W. Anderson: Phillip W. Anderson: More is differentMore is different..Science, 177:393Science, 177:393--396 (1972)396 (1972)
�� Mark S. Granovetter: Mark S. Granovetter: The strength of weak tiesThe strength of weak ties..American Journal of Sociology, 78:1360American Journal of Sociology, 78:1360--1380 (1973)1380 (1973)
�� Stanley Wasserman & Katherine Faust: Stanley Wasserman & Katherine Faust: Social Network Analysis: Methods and Social Network Analysis: Methods and �� Stanley Wasserman & Katherine Faust: Stanley Wasserman & Katherine Faust: Social Network Analysis: Methods and Social Network Analysis: Methods and ApplicationsApplications. Cambridge University Press, 1994. Cambridge University Press, 1994
�� John P. Scott: John P. Scott: Social Network AnalysisSocial Network Analysis, 2nd edition., 2nd edition.Sage Publications Ltd., 2000. Sage Publications Ltd., 2000.
�� Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins & Janet Wiener: Raymie Stata, Andrew Tomkins & Janet Wiener: Graph structure in the WebGraph structure in the Web. Computer . Computer Networks 33:309Networks 33:309––320 (2000)320 (2000)
�� Steven H. Strogatz: Steven H. Strogatz: Exploring Complex NetworksExploring Complex Networks..Nature, 410:268Nature, 410:268--275 (2001)275 (2001)
�� AlbertAlbert--Laszlo Barabasi: Laszlo Barabasi: Linked: How Everything Is Connected to Everything Else and Linked: How Everything Is Connected to Everything Else and What It MeansWhat It Means. Plume, 2003. ISBN 0452284392. Plume, 2003. ISBN 0452284392
�� Duncan J. Watts: Duncan J. Watts: Six Degrees: The Science of a Connected AgeSix Degrees: The Science of a Connected Age. W. W. Norton & . W. W. Norton & Company, 2004. ISBN 0393325423Company, 2004. ISBN 0393325423
�� Jure Leskovec, Jon M. Kleinberg & Christos Faloutsos: Jure Leskovec, Jon M. Kleinberg & Christos Faloutsos: Graphs over time: densification Graphs over time: densification laws, shrinking diameters and possible explanationslaws, shrinking diameters and possible explanations. KDD'2005. KDD'2005
7777
BibliographyBibliography
Network ModelsNetwork Models�� Paul ErdPaul Erdöös & Alfred Rs & Alfred Réényi: nyi: On the evolution of random graphsOn the evolution of random graphs..
Mathematical Institute of the Hungarian Academy of Sciences, 5:17Mathematical Institute of the Hungarian Academy of Sciences, 5:17--61 (1960) 61 (1960) reprinted in Duncan, Barabasi & Watts (eds.): reprinted in Duncan, Barabasi & Watts (eds.): ““The Structure and Dynamics of NetworksThe Structure and Dynamics of Networks””
�� Ray Solomonoff & Anatol Rapoport: Ray Solomonoff & Anatol Rapoport: Connectivity of random netsConnectivity of random nets..Bulletin of Mathematical Biophysics, 13:107Bulletin of Mathematical Biophysics, 13:107--117 (1951)117 (1951)reprinted in Duncan, Barabasi & Watts (eds.): reprinted in Duncan, Barabasi & Watts (eds.): ““The Structure and Dynamics of NetworksThe Structure and Dynamics of Networks””
�� Duncan J. Watts & Steven H. Strogatz: Duncan J. Watts & Steven H. Strogatz: Collective dynamics of Collective dynamics of ‘‘smallsmall--worldworld’’ networksnetworks..�� Duncan J. Watts & Steven H. Strogatz: Duncan J. Watts & Steven H. Strogatz: Collective dynamics of Collective dynamics of ‘‘smallsmall--worldworld’’ networksnetworks..Nature, 393:440Nature, 393:440--442 (1998)442 (1998)
�� AlbertAlbert--LLáászlszlóó BarabBarabáási & Rsi & Rééka Albert: ka Albert: Emergence of scaling in random networksEmergence of scaling in random networks..Science, 286:509Science, 286:509--512 (1999)512 (1999)
�� RRééka Albert, Hawoong Jeong & Albertka Albert, Hawoong Jeong & Albert--LLáászlszlóó BarabBarabáási: si: Error and attack tolerance of Error and attack tolerance of
complex networkscomplex networks. Nature 406:378. Nature 406:378--382 (2000)382 (2000)
�� M.E.J. Newman, S.H. Strogatz & D.J. Watts: M.E.J. Newman, S.H. Strogatz & D.J. Watts: Random graphs with arbitrary degree Random graphs with arbitrary degree distributions and their applicationsdistributions and their applications. Physical Review E, 64:026118 (2001). Physical Review E, 64:026118 (2001)
�� M.E.J. Newman, S.H. Strogatz & D.J. Watts: M.E.J. Newman, S.H. Strogatz & D.J. Watts: Random graphs models of social networksRandom graphs models of social networks. . PNAS 99:2566PNAS 99:2566--2572 (2002)2572 (2002)
�� ErzsErzséébet Ravasz & Albertbet Ravasz & Albert--LLáászlszlóó BarabBarabáási: si: Hierarchical organization in complex Hierarchical organization in complex
networksnetworks. Physical Review E, 67:026112 (2003). Physical Review E, 67:026112 (2003)
�� Mark Newman: Mark Newman: The structure and function of complex networksThe structure and function of complex networks. SIAM Review . SIAM Review 45:16745:167--256 (2003)256 (2003) 7878
BibliographyBibliography
Community DetectionCommunity Detection�� Michelle Girvan & Mark E.J. Newman: Michelle Girvan & Mark E.J. Newman: Community structure in social and biological Community structure in social and biological
networksnetworks, PNAS, PNAS 99(12):782199(12):7821––7826 (2002)7826 (2002)
�� Aaron Aaron ClausetClauset, Mark E. J. Newman & , Mark E. J. Newman & CristopherCristopher Moore: Moore: Finding community structure in Finding community structure in very large networksvery large networks, , PhysicalPhysical ReviewReview E 70(6):066111 (2004)E 70(6):066111 (2004)
�� GergelyGergely PallaPalla, , ImreImre DerényiDerényi, , IllésIllés FarkasFarkas & & TamásTamás VicsekVicsek: : Uncovering the overlapping Uncovering the overlapping community structure of complex networks in nature and societycommunity structure of complex networks in nature and society, , Nature 435:814Nature 435:814--818 (2005)818 (2005)Nature 435:814Nature 435:814--818 (2005)818 (2005)
�� Jure Jure LeskovecLeskovec, Kevin J. Lang, , Kevin J. Lang, AnirbanAnirban DasguptaDasgupta & Michael W. Mahoney: & Michael W. Mahoney: Statistical Statistical Properties of Community Structure in Large Social and Information NetworksProperties of Community Structure in Large Social and Information Networks, , International World Wide Web Conference, WWW’08 (2008International World Wide Web Conference, WWW’08 (2008). Extended version:). Extended version: Community Community structure in large networks: Natural cluster sizes and the absence of large wellstructure in large networks: Natural cluster sizes and the absence of large well--defined clustersdefined clusters, arxiv:0810.1355 (2008), arxiv:0810.1355 (2008)
�� Martin Martin RosvallRosvall & Carl T. Bergstrom: & Carl T. Bergstrom: Maps of random walks on complex networks Maps of random walks on complex networks reveal community structurereveal community structure, PNAS 105(4):1118, PNAS 105(4):1118--1123 (20081123 (2008))
�� Jure Jure LeskovecLeskovec, Kevin J. , Kevin J. Lang & Lang & Michael W. Mahoney:Michael W. Mahoney: Empirical Empirical Comparison of Comparison of Algorithms Algorithms for for Network Network CommunityCommunity DetectionDetection, , WWW 2010 WWW 2010 ((2010)2010)
�� Santo Santo FortunatoFortunato: : Community detection in graphsCommunity detection in graphs. . Physics Reports, 486(3Physics Reports, 486(3--5):755):75--174, (2010).174, (2010).
�� S. S. AroraArora, R. , R. GeGe, S. , S. SachdevaSachdeva & G. & G. SchoenebeckSchoenebeck: : Finding overlapping communities in Finding overlapping communities in social networks: toward a rigorous approachsocial networks: toward a rigorous approach. Proceedings of the 13th ACM Conference . Proceedings of the 13th ACM Conference on Electronic Commerce, EC ’12, pp. 37on Electronic Commerce, EC ’12, pp. 37--54 (2012)54 (2012)
7979
BibliographyBibliography
Search on NetworksSearch on Networks�� Sergey Brin & Lawrence Page: Sergey Brin & Lawrence Page: The anatomy of a largeThe anatomy of a large--scale hypertextual Web search scale hypertextual Web search
engineengine. Computer Networks and ISDN Systems, April 1998. Computer Networks and ISDN Systems, April 1998
�� David Gibson, Jon M. Kleinberg & Prabhakar Raghavan: David Gibson, Jon M. Kleinberg & Prabhakar Raghavan: Inferring Web Communities from Inferring Web Communities from Link TopologyLink Topology. ACM Conference on Hypertext and Hypermedia, June 1998. ACM Conference on Hypertext and Hypermedia, June 1998
�� Jon M. Kleinberg: Jon M. Kleinberg: Authoritative sources in a hyperlinked environmentAuthoritative sources in a hyperlinked environment. . Journal of the ACM, September 1999Journal of the ACM, September 1999
�� Toby Walsh: Toby Walsh: Search in a Small WorldSearch in a Small World. IJCAI. IJCAI’’19991999�� Toby Walsh: Toby Walsh: Search in a Small WorldSearch in a Small World. IJCAI. IJCAI’’19991999
�� Jon M. Kleinberg. Jon M. Kleinberg. Navigation in a Small WorldNavigation in a Small World. Nature, August 2000.. Nature, August 2000.
�� Jon M. Kleinberg: Jon M. Kleinberg: The smallThe small--world phenomenon: An algorithm perspectiveworld phenomenon: An algorithm perspective. . STOCSTOC’’2000 2000
�� Scott White & Scott White & PadhraicPadhraic Smyth: Smyth: Algorithms for Estimating Relative Importance in Algorithms for Estimating Relative Importance in NetworksNetworks. KDD'2003. KDD'2003
�� HanghangHanghang Tong & Christos Tong & Christos FaloutsosFaloutsos: : CenterCenter--Piece Piece SubgraphsSubgraphs: Problem Definition and : Problem Definition and Fast SolutionsFast Solutions. KDD'2006. KDD'2006
�� AlekhAlekh AgarwalAgarwal, , SoumenSoumen ChakrabartiChakrabarti & Sunny & Sunny AggarwalAggarwal: : Learning to Rank Networked Learning to Rank Networked EntitiesEntities. KDD'2006. KDD'2006
�� Jeffrey Jeffrey DavitzDavitz, , JiyeJiye Yu, Yu, SugatoSugato BasuBasu, David , David GuteliusGutelius & Alexandra Harris: & Alexandra Harris: iLinkiLink: Search and : Search and Routing in Social NetworksRouting in Social Networks. KDD'2007. . KDD'2007.
8080
BibliographyBibliography
�� David Easley & Jon Kleinberg: David Easley & Jon Kleinberg: Networks, Crowds, and Networks, Crowds, and Markets: Reasoning About a Highly Connected WorldMarkets: Reasoning About a Highly Connected World. . Cambridge University Press, 2010. ISBN Cambridge University Press, 2010. ISBN 05211953300521195330http://www.cs.cornell.edu/home/kleinber/networkshttp://www.cs.cornell.edu/home/kleinber/networks--book/book/
�� Mark Newman: Mark Newman: Networks: An IntroductionNetworks: An Introduction. . Oxford University Press, 2010. ISBN Oxford University Press, 2010. ISBN 00--1919--920665920665--11
�� Matthew O. Jackson: Matthew O. Jackson: Social and Economic NetworksSocial and Economic Networks, , Princeton University Press, 2008. ISBN Princeton University Press, 2008. ISBN 00--691691--1344013440--55
8181
BibliographyBibliography
�� JiaweiJiawei Han & Han & MichelineMicheline KamberKamber: : Data Mining: Concepts and TechniquesData Mining: Concepts and Techniques [2[2ndnd edition],edition],section 9.2. Addisonsection 9.2. Addison--Wesley, 2006. ISBN 1Wesley, 2006. ISBN 1--5586055860--901901--33
�� Mark Newman, AlbertMark Newman, Albert--Laszlo Laszlo BarabasiBarabasi & Duncan J. Watts & Duncan J. Watts (editors): (editors): The Structure and Dynamics of NetworksThe Structure and Dynamics of Networks. . Princeton University Press, 2006. ISBN 0Princeton University Press, 2006. ISBN 0--691691--1135711357--22
�� Ted G. Lewis: Ted G. Lewis: Network Science: Theory and ApplicationsNetwork Science: Theory and Applications..Wiley, 2009. ISBN 0Wiley, 2009. ISBN 0--470470--3318833188--7 7
8282
BibliographyBibliography
�� AlbertAlbert--Laszlo Barabási: Laszlo Barabási: Linked: How Everything Is Linked: How Everything Is Connected to Everything Else and What It MeansConnected to Everything Else and What It Means. . Plume, 2003. ISBN 0452284392Plume, 2003. ISBN 0452284392
�� Duncan J. Watts: Duncan J. Watts: Six Degrees: The Science of a Connected Six Degrees: The Science of a Connected AgeAge. W. W. Norton & Company, 2004. ISBN 0393325423. W. W. Norton & Company, 2004. ISBN 0393325423
�� AlbertAlbert--Laszlo Barabási: Laszlo Barabási: Bursts: The Hidden Pattern Behind Bursts: The Hidden Pattern Behind �� AlbertAlbert--Laszlo Barabási: Laszlo Barabási: Bursts: The Hidden Pattern Behind Bursts: The Hidden Pattern Behind Everything We DoEverything We Do. Dutton, 2010. ISBN . Dutton, 2010. ISBN 05259516010525951601
8383