Identifying and Characterizing Nodes
Important to Community Structure Using the
Spectrum of the Graph
=> Published in volume 6 of the journal PLoS ONE’s
November 2011 edition
=> Authors: Yang Wang, Zengru Di, Ying Fan
all from the department of Systems Science, Beijing
Normal University, China
Citation
Overview
• Networks represent the interaction structure
among components in a wide range of real complex
systems
• Exploring network communities
• reveals the structure of the network
• provides a new perspective on dynamic processes
• uncovers the relationships among the nodes
• This paper devises a new approach to identify the
important nodes without knowing the exact partition
of the network
Construction
• Based on the observation that the spectrum of the
adjacency matrix gives an indication of the community
structure in a network
• Distinguishes the critical nodes as
• community core - eigenvalues
• bridge – graph Laplacian
• Experiments on synthetic and real networks
Definitions
• Eigenvector: A non-zero column vector v is an
eigenvector of a matrix A iff there exists a number λ
such that Av = λv.
• Eigenvalue: The number λ is called the eigenvalue
corresponding to that eigenvector v.
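The definition can be checked numerically. The sketch below is only an illustration: the 3-node path graph and the variable names are assumptions chosen for the example, not taken from the paper.

```python
import numpy as np

# Adjacency matrix of a 3-node path graph 0 - 1 - 2 (illustrative example).
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

# eigh is intended for symmetric matrices: eigenvalues are returned in
# ascending order and the eigenvectors are the columns of `vecs`.
vals, vecs = np.linalg.eigh(A)

lam = vals[-1]      # largest eigenvalue
v = vecs[:, -1]     # corresponding unit-length eigenvector

# A v and lam * v should agree up to floating-point error.
residual = np.linalg.norm(A @ v - lam * v)
print(lam, residual)
```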
Identifying important nodes
• Proposed Method: A Centrality Metric based on
the spectrum of Adjacency Matrix
• Definitions: Binary network G=(V,E)
• |V| = n, |E| = m
• Eigenvectors are orthogonal and normalized
• Objective function:
• Maximize eigenvalues (λ) using perturbation
theory
• P_k is the relative change in the c largest
eigenvalues when node k is removed
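The formula for P_k did not survive in this transcript. The following is a hedged reconstruction, not a quotation from the paper: it assumes standard first-order perturbation theory, unit-norm eigenvectors v_i of A with eigenvalues λ_i, no self-loops (A_kk = 0), and that the constant factor 2 is dropped in the definition of P_k.

```latex
% Removing node k zeroes row k and column k of A:
% \Delta A_{kj} = \Delta A_{jk} = -A_{kj}.
\Delta\lambda_i \approx v_i^{T}\,\Delta A\,v_i
  = -2\,v_{ik}\sum_{j} A_{kj}\,v_{ij}
  = -2\,\lambda_i\,v_{ik}^{2}
\quad\Longrightarrow\quad
\left|\frac{\Delta\lambda_i}{\lambda_i}\right| \approx 2\,v_{ik}^{2},
\qquad
P_k = \sum_{i=1}^{c} v_{ik}^{2}.
```

Under these assumptions P_k is a partial sum of the squared entries in row k of an orthogonal matrix, which is consistent with P_k lying in [0,1].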
Centrality Metric
• v_ik is the kth element of eigenvector v_i,
and P_k lies in the interval [0,1]. If a node k is
important to the community structure, P_k will be
large
• Consider a network with n nodes and c communities
• To scale the index to 1, define I_k = P_k / c
• If the index I_k is larger than 1/n, node k is an
important node
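A minimal sketch of the index I, assuming P_k is the sum of v_ik^2 over the eigenvectors of the c largest adjacency eigenvalues (a form consistent with the properties listed above, but not shown explicitly in this transcript). The function name `community_centrality` and the toy graph of two stars joined by one edge are hypothetical.

```python
import numpy as np

def community_centrality(A, c):
    """I_k = (1/c) * sum of v_ik^2 over the c largest eigenvalues of A (assumed form)."""
    A = np.asarray(A, dtype=float)
    _, vecs = np.linalg.eigh(A)        # ascending eigenvalues, orthonormal columns
    top = vecs[:, -c:]                 # eigenvectors of the c largest eigenvalues
    P = np.sum(top ** 2, axis=1)       # P_k, one value per node
    return P / c

# Toy graph: hub 0 with leaves 1-4, hub 5 with leaves 6-9, joined by edge (4, 6).
n = 10
edges = [(0, 1), (0, 2), (0, 3), (0, 4),
         (5, 6), (5, 7), (5, 8), (5, 9),
         (4, 6)]
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

I = community_centrality(A, c=2)
important = [k for k in range(n) if I[k] > 1.0 / n]   # nodes above the 1/n threshold
print(np.round(I, 3), important)
```

Because the eigenvectors are orthonormal, the values of I sum to 1 over all nodes, so 1/n is the average score.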
Distinguish two kinds of important nodes
• RatioCut technique:
|C_i| is the size of community C_i. The RatioCut
problem reduces to the Mincut problem when the
communities are almost the same size.
• Case 1: c = 2
Index vector s with n elements
Continued
• RatioCut function becomes:
L is the graph Laplacian, defined as L_ij = -A_ij for i ≠ j
and L_ii = k_i, where k_i is the degree of node i.
There are also two constraints on s
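The equations for this case are missing from the transcript. As a hedged reconstruction, the block below uses the standard textbook normalization of the index vector for two communities; the slide's own definition may differ by a constant factor. Here n is the number of nodes and cut(C_1, C_2) is the number of edges between the two communities.

```latex
% Assumed (textbook) normalization of the index vector for c = 2
s_i =
\begin{cases}
 +\sqrt{|C_2| / |C_1|} & \text{if } i \in C_1,\\[2pt]
 -\sqrt{|C_1| / |C_2|} & \text{if } i \in C_2,
\end{cases}
\qquad
\mathrm{cut}(C_1, C_2) = \sum_{i \in C_1,\; j \in C_2} A_{ij}.

s^{T} L s = \tfrac{1}{2} \sum_{i,j} A_{ij}\,(s_i - s_j)^{2}
          = n \,\mathrm{cut}(C_1, C_2)\left(\frac{1}{|C_1|} + \frac{1}{|C_2|}\right)
          = n \cdot \mathrm{RatioCut}(C_1, C_2).

% The two constraints on s
\sum_{i=1}^{n} s_i = 0,
\qquad
s^{T} s = \sum_{i=1}^{n} s_i^{2} = n.
```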
Continued
• The partition problem can be formulated as the
following minimization problem
• The solution to this problem is found to be the
eigenvector corresponding to the second-smallest
eigenvalue of L, denoted by u_2
• Community core nodes: |u_i2| is relatively large
• Bridge nodes: |u_i2| is near zero
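The core/bridge reading of u_2 can be sketched on a toy graph. The example is an assumption made for illustration: two 4-node cliques connected only through node 4. The helper name `fiedler_vector` is hypothetical.

```python
import numpy as np

def fiedler_vector(A):
    """Eigenvector of L = D - A for the second-smallest eigenvalue."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A     # L_ii = k_i, L_ij = -A_ij
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    return vecs[:, 1]

# Toy graph: cliques {0,1,2,3} and {5,6,7,8}; node 4 links node 3 and node 5.
n = 9
A = np.zeros((n, n))
for group in ([0, 1, 2, 3], [5, 6, 7, 8]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
for i, j in [(3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

u2 = fiedler_vector(A)
bridge = int(np.argmin(np.abs(u2)))    # node with |u_i2| closest to zero
print(np.round(u2, 3), bridge)
```

In this symmetric example the entry for node 4 vanishes up to rounding, while nodes deep inside each clique carry the largest magnitudes with opposite signs.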
Continued
• Case 2: c > 2
A new n × c index matrix S is defined as
s_ij = 1/√|C_j| if vertex i ∈ C_j, else 0
RatioCut = Tr(S^T L S). L is a symmetric matrix which
can be written as L = U D U^T, where U is the
matrix of eigenvectors of L and D is the diagonal matrix of
eigenvalues, D_ii = β_i
RatioCut can be written as
Continued
• Defining the vertex vector of i as r_i, with [r_i]_j = U_ij,
the expression can be written in terms of r_i,
given that the network has communities of almost
equal size. [G_k: set of vertices in community k]
Minimizing the RatioCut is equivalent to a
maximization problem,
where p is a parameter. For a clear community
structure, p = c can be chosen.
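The expressions for this case are missing from the transcript. The block below is a hedged reconstruction from the definitions of S, U, D and r_i given above; the truncation to the first p terms is an approximation stated as an assumption, and the paper's exact form may differ.

```latex
% Expand the trace using L = U D U^T and S_{ik} = 1/\sqrt{|G_k|} for i \in G_k
\mathrm{RatioCut} = \mathrm{Tr}\!\left(S^{T} U D U^{T} S\right)
  = \sum_{k=1}^{c} \sum_{j=1}^{n} \beta_j \Big(\sum_{i=1}^{n} U_{ij} S_{ik}\Big)^{2}
  = \sum_{k=1}^{c} \frac{1}{|G_k|} \sum_{j=1}^{n} \beta_j
    \Big[\sum_{i \in G_k} r_i\Big]_{j}^{2}.

% Rows of U are orthonormal, so the unweighted total is fixed:
\sum_{j=1}^{n} \Big[\sum_{i \in G_k} r_i\Big]_{j}^{2} = |G_k|.

% With |G_k| \approx n/c and \beta_1 \le \beta_2 \le \dots \le \beta_n, a small
% RatioCut requires most of this weight on the smallest eigenvalues, i.e.
\max_{\{G_k\}} \; \sum_{k=1}^{c} \sum_{j=1}^{p}
    \Big[\sum_{i \in G_k} r_i\Big]_{j}^{2}.
```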
Continued
• If the community structure is quite clear, the
magnitude |r_i| of the vertex vector over its first p
terms identifies bridge nodes; this index is denoted by b.
If the index b of a given vertex is near zero, the
presence of that node results in a large RatioCut,
and hence it is a bridge node.
Continued
• In order to scale the index to 1, a new term w_k is
defined as w_k = b_k / c
• Considering an ER random network with n nodes
as a null model, the index of each node would be 1/n
• If the w-score of a node is smaller than 1/n, this
vertex has nearly equal membership in more than
one community and hence it is a bridge node.
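A minimal sketch of the w-score, assuming b_k is the squared magnitude of the vertex vector r_k over its first p = c coordinates, that is, over the eigenvectors of L with the c smallest eigenvalues. That explicit form is an assumption, since the formula is not preserved in this transcript. The function name `w_score` and the toy graph (two cliques connected only through node 4) are hypothetical.

```python
import numpy as np

def w_score(A, c):
    """w_k = b_k / c, with b_k = sum_{j < c} U_kj^2 (assumed form of the index b)."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A     # graph Laplacian
    _, U = np.linalg.eigh(L)           # columns ordered by ascending eigenvalue
    b = np.sum(U[:, :c] ** 2, axis=1)  # first p = c coordinates of each r_k
    return b / c

# Toy graph: cliques {0,1,2,3} and {5,6,7,8}; node 4 links node 3 and node 5.
n = 9
A = np.zeros((n, n))
for group in ([0, 1, 2, 3], [5, 6, 7, 8]):
    for i in group:
        for j in group:
            if i != j:
                A[i, j] = 1.0
for i, j in [(3, 4), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

w = w_score(A, c=2)
lowest = int(np.argmin(w))             # strongest bridge candidate
print(np.round(w, 3), lowest, w[lowest] < 1.0 / n)
```

With this definition the scores sum to 1 over all nodes, so 1/n is the average value, matching the null-model threshold stated above.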
Pros of this approach
• Lower computational cost: O(mn)
Experimental Results
• Synthetic Network
The centrality metric I predicts nodes 1, 8 and 15 as important
nodes. The w-score identifies node 15 as the bridge node.
The ΔH index also gives the correct prediction, but requires
significant computational cost.
M can identify cores only.
Experimental Results (contd.)
Real World Network
Zachary’s karate club (social network) with c = 2
The centrality metric I identifies the community cores: node 1 and node 34
(the instructor and the administrator, respectively).
The w-score identifies node 3 as the overlapping node, i.e. the bridge between
these two communities
Zachary’s karate club visualization
The diameter of each vertex is proportional to I
Large diameter indicates important vertex
Color of each vertex is related to the index w-score
Red vertices behave like “overlapping” nodes or bridges
Yellow vertices lie inside their own communities
Word Association Network
Four communities: Intelligence, Astronomy, Light, Colors
The word Bright is related to all of them, as is Sun
Community critical nodes: Bright, Sun, Moon, Smart
Community cores: Moon and Smart
Bridges: Bright and Sun
Scientist Collaboration Network
The network represents scientists whose research centers on the properties of
networks of one kind or another
Edges are placed between scientists who have published at least one paper together
The centrality metric I identifies the group leaders: Newman, Boccaletti, Barabasi
Their w-scores are not large, as they collaborate with scientists outside
their own communities
C. Elegans neural network
The network is divided into 3 communities (sensory, interneuron, motor neuron)
Each node represents a neuron and each edge represents a synaptic
connection between neurons
High centrality metric I: important interneurons (AVA, AVB, … )
Their w-score is very small because most of the important nodes act as bridges,
since the connections between communities are more necessary
Applications in weighted networks
Artificial Network
Adjacency matrix for undirected network is real and symmetric
Works well in small artificial network
10 nodes with two communities
Higher weight means closer relationship between vertices
Nodes 4 and 9 are the cores of the communities
Node 11 is the bridge between communities
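Since a weighted undirected network still has a real, symmetric adjacency matrix, the same spectral index can be computed directly on the weights. The sketch below is illustrative only: it assumes I_k is the sum of v_ik^2 over the c largest eigenvalues divided by c, and the 8-node weighted graph is made up for the example; it is not the artificial network from the slides.

```python
import numpy as np

def community_centrality(A, c):
    """I_k = (1/c) * sum of v_ik^2 over the c largest eigenvalues of A (assumed form)."""
    _, vecs = np.linalg.eigh(np.asarray(A, dtype=float))
    return np.sum(vecs[:, -c:] ** 2, axis=1) / c

# Hypothetical weighted graph: strong ties (weight 3) inside two star-shaped
# groups around hubs 0 and 4, and one weak tie (weight 0.5) between them.
n = 8
weighted_edges = [(0, 1, 3.0), (0, 2, 3.0), (0, 3, 3.0),
                  (4, 5, 3.0), (4, 6, 3.0), (4, 7, 3.0),
                  (3, 5, 0.5)]
W = np.zeros((n, n))
for i, j, weight in weighted_edges:
    W[i, j] = W[j, i] = weight

B = (W > 0).astype(float)              # same edges with the weights discarded

I_weighted = community_centrality(W, c=2)
I_binary = community_centrality(B, c=2)
print(np.round(I_weighted, 3))
print(np.round(I_binary, 3))
```

Comparing the two outputs shows how discarding the weights shifts the scores even when the edge set is unchanged.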
Applications in weighted networks (Contd.)
Real Network: SFI (Santa Fe Collaboration)
SFI collaboration network
Vertices 2, 12 and 24 are group leaders (community cores)
Vertices 1, 9 and 11 are bridges
The result is different from the corresponding unweighted network
Edge weights might affect the results
Limitations
When cluster sizes are highly heterogeneous, community identification fails
This limitation is a result of a property of the adjacency matrix
If N_small^2 < N_large, small communities cannot be detected
δ = N_large / N_small
The index I cannot identify the important nodes in the small communities when the
communities are of very different sizes
Conclusion/Observation
The proposed method works well in many cases without knowing the exact
community structure
However, the number of communities must be known
This paper does not say anything about the effect of removing/adding any
node
Changes in the underlying community structure are not taken into consideration
The directed case is not considered and is left for future research
The identification of such key nodes is important and could potentially be
used
to identify the organizer of the community in social networks,
to develop an immunization strategy in an epidemic process,
to identify key nodes in biological networks