© 2013 Columbia University
E6885 Network Science Lecture 6: Network Topology Inference
E 6885 Topics in Signal Processing -- Network Science
Ching-Yung Lin, Dept. of Electrical Engineering, Columbia University
October 14th, 2013
Course Structure
Class Date Lecture Topics Covered
09/09/13 1 Overview of Network Science
09/16/13 2 Network Representation and Feature Extraction
09/23/13 3 Network Partitioning, Clustering and Visualization
09/30/13 4 Network Analysis Use Case
10/07/13 5 Network Sampling, Estimation, and Modeling
10/14/13 6 Network Topology Inference
10/21/13 7 Network Information Flow
10/28/13 8 Graph Database
11/11/13 9 Final Project Proposal Presentation
11/18/13 10 Dynamic and Probabilistic Networks
11/25/13 11 Information Diffusion in Networks
12/02/13 12 Impact of Network Analysis
12/09/13 13 Large-Scale Network Processing System
12/16/13 14 Final Project Presentation
Correlation Networks
Pearson product-moment correlation between two nodes:

$$\rho_{ij} = \mathrm{corr}(X_i, X_j) = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}$$

Empirical correlations:

$$\hat{\rho}_{ij} = \frac{\hat{\sigma}_{ij}}{\sqrt{\hat{\sigma}_{ii}\,\hat{\sigma}_{jj}}}$$

If the pair of variables (X_i, X_j) has a bivariate Gaussian distribution, the density of $\hat{\rho}_{ij}$ under H0: $\rho_{ij} = 0$ has a concise closed-form expression, but it is somewhat complicated and requires numerical integration or tables to produce p-values.
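As a concrete illustration (not from the slides), a minimal Python sketch that builds a correlation network by testing H0: ρ_ij = 0; it sidesteps the exact null density by using Fisher's z-transform, a standard approximation, and the alpha threshold is a hypothetical choice:

```python
import numpy as np
from scipy import stats

def correlation_network(X, alpha=0.05):
    """Adjacency matrix from testing H0: rho_ij = 0 for every vertex pair.

    X: n x Nv array of n i.i.d. observations of Nv vertex attributes.
    Uses Fisher's z-transform for approximate p-values instead of the
    exact (but complicated) null density mentioned above.
    """
    n, _ = X.shape
    R = np.corrcoef(X, rowvar=False)                   # empirical rho_hat
    z = np.arctanh(np.clip(R, -0.999999, 0.999999))    # Fisher transform
    pvals = 2 * stats.norm.sf(np.abs(z) * np.sqrt(n - 3))
    A = (pvals < alpha).astype(int)                    # edge iff H0 rejected
    np.fill_diagonal(A, 0)
    return A
```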
Partial Correlation Networks
Important -- ‘Correlation does not imply causation’
For instance:
–Two vertices may have highly correlated attributes because the vertices somehow strongly ‘influence’ each other in a direct fashion.
–Alternatively, their correlation may be high primarily because they each are strongly influenced by a third vertex.
Need more considerations!!
Partial Correlation Networks (cont’d)
If it is felt desirable to construct a graph G where the inferred edges are more reflective of direct influence among vertices, rather than indirect influence, the notion of partial correlation becomes relevant.
The partial correlation of attributes X_i and X_j of vertices i, j, defined with respect to the attributes $X_{k_1}, \ldots, X_{k_m}$ of vertices $k_1, \ldots, k_m \in V \setminus \{i, j\}$, is the correlation between X_i and X_j left over after adjusting for those effects of $X_{k_1}, \ldots, X_{k_m}$ common to both.
Let $S_m = \{k_1, \ldots, k_m\}$ and $X^{S_m} = (X_{k_1}, \ldots, X_{k_m})^T$. We define the partial correlation of X_i and X_j, adjusting for $X^{S_m}$, as

$$\rho_{ij|S_m} = \frac{\sigma_{ij|S_m}}{\sqrt{\sigma_{ii|S_m}\,\sigma_{jj|S_m}}}$$
Partial Correlation Networks (cont’d)
Here

$$\mathrm{Cov}\begin{pmatrix} W_1 \\ W_2 \end{pmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

for $W_1 = (X_i, X_j)^T$ and $W_2 = X^{S_m}$.
The partial covariance matrix of $W_1$ given $W_2$ is

$$\Sigma_{11|2} = \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}$$

and then $\sigma_{ii|S_m}$, $\sigma_{jj|S_m}$, and $\sigma_{ij|S_m} = \sigma_{ji|S_m}$ are the diagonal and off-diagonal elements of this 2x2 partial covariance matrix.
For example, for a given choice of m, we may dictate that an edge be present only when there is correlation between X_i and X_j regardless of which m other vertices are conditioned upon:

$$E = \left\{ \{i,j\} \in V^{(2)} : \rho_{ij|S_m} \neq 0,\ \forall\, S_m \in V^{(m)}_{\setminus\{i,j\}} \right\}$$
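A minimal sketch (not from the slides) of how $\rho_{ij|S_m}$ can be computed directly from the partitioned covariance, following the $\Sigma_{11|2}$ formula above; the function and argument names are mine:

```python
import numpy as np

def partial_corr(X, i, j, S):
    """Partial correlation of columns i and j of X, adjusting for columns S.

    Implements Sigma_{11|2} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
    on the empirical covariance, then normalizes the off-diagonal entry.
    """
    idx = [i, j] + list(S)
    Sig = np.cov(X[:, idx], rowvar=False)
    S11, S12 = Sig[:2, :2], Sig[:2, 2:]
    S21, S22 = Sig[2:, :2], Sig[2:, 2:]
    P = S11 - S12 @ np.linalg.solve(S22, S21)   # 2x2 partial covariance
    return P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])
```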
Gaussian Graphical Model Networks
A special and popular case of the use of partial correlation coefficients is when m=Nv-2 and the attributes are assumed to have a multivariate Gaussian joint distribution.
Here the partial correlation between attributes of two vertices is defined conditional upon the attribute information at all other vertices.
The graph with edge set

$$E = \left\{ \{i,j\} \in V^{(2)} : \rho_{ij|V\setminus\{i,j\}} \neq 0 \right\}$$

is called a conditional independence graph. The overall model, combining the multivariate Gaussian distribution with the graph G, is called a Gaussian graphical model.
The partial correlation coefficients may be expressed in the form

$$\rho_{ij|V\setminus\{i,j\}} = \frac{-\,\omega_{ij}}{\sqrt{\omega_{ii}\,\omega_{jj}}}$$

where $\omega_{ij}$ is the (i,j)-th entry of $\Omega = \Sigma^{-1}$, the inverse of the covariance matrix of the vertex attributes.
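A sketch of the recipe above, assuming a simple hard threshold stands in for a formal test and that the empirical covariance is invertible; in practice one might instead use a sparse estimator such as the graphical lasso:

```python
import numpy as np

def gaussian_graphical_model(X, threshold=0.1):
    """Edges from partial correlations computed via the precision matrix.

    rho_{ij|rest} = -omega_ij / sqrt(omega_ii * omega_jj), with
    Omega = Sigma^{-1}; the 0.1 threshold is a hypothetical choice.
    """
    Omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(Omega))
    rho = -Omega / np.outer(d, d)          # partial correlations
    np.fill_diagonal(rho, 0.0)
    return (np.abs(rho) > threshold).astype(int)
```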
Inferring Interests from Social Graph
On the quality of inferring interests from social neighbors (Wen and Lin, KDD 2010)
Modeling user interests enables personalized services
– More relevant search/recommendation results
– More targeted advertising
Data about users are sparse
– Many user profiles are static, incomplete, and/or outdated
– <10% of employees actively participate in social software [Brzozowski2009]
Inferring user interests from neighbors can be a solution
– But it also raises a concern of exposing users' private information
How true are "You are who you know" and "Birds of a feather flock together"?
Challenges in Observing Users
Diverse types of media
Public social media (friending, blogs, etc.)
• Data are public but limited (esp. in enterprises)
Private communication media (email, instant messaging, face-to-face meetings, etc.)
• Much more data
• Privacy is a major issue
Example of Diverse Types of Media
Number of people who participated in the top 3 media in an enterprise with 400K employees
Number of entries:
• Social bookmarking: 400K
• Electronic communication: 20M
• File sharing: 140K
Our Goals
How well can a user's interests be inferred from his/her social neighbors?
Can the diverse types of media be combined to improve inferring user interests from social neighbors?
Can the quality of the inference be predicted based on features of social neighbors?
Only sufficiently accurate inference may help personalized services
Our Approach
Infer user interests from social neighbors
Model user interests based on multiple types of information they accessed
Construct employee social network from communication data
Infer using social influence model
Study the relationship between inference quality and network characteristics
Identify effective factors to ensure high quality results for applications
Dataset
25315 users’ contributed content
– 20M email/chats
– 400K social bookmarks
– 20K shared public files
– Profile information
• Job role, division, news categories of interest, etc.
Infer social network based on email/chats
User Interests Model – Implicit Interests
Model users’ interests implicitly indicated by their contributed content
Extract latent topics from the multiple types of content using LDA
Select top-N distinct topics as the implicit interests model of a user, based on:
– The degree to which the user is interested in each topic
– The similarity of topics
User Interests Model – Explicit Interests
29% of users manually specify interests in their profile
A list of selected terms
• From a static 1120-term taxonomy related to work
Compare implicit and explicit interests
Explicit interests models are more limited
• Implicit interests cover 60.4% of explicit interests
• Explicit interests cover 2.2% of implicit interests
Infer Interests Based on Social Influence
Social influence model
Network autocorrelation model [Leenders02]
• Social influence represented as a weighted combination of neighbors’ attributes
The weight is an exponential function of the social distance
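The slides do not give the model's exact form, but a minimal sketch of the idea (weights decaying exponentially with social distance; the decay parameter and all names here are hypothetical) might look like:

```python
import numpy as np

def infer_interests(dist, interests, decay=1.0):
    """Network-autocorrelation-style influence sketch.

    dist: Nv x Nv matrix of social distances between users.
    interests: Nv x T matrix of users' topic weights.
    Each user's inferred interests are a weighted combination of the
    other users' interests, with weights exp(-decay * distance).
    """
    W = np.exp(-decay * dist)
    np.fill_diagonal(W, 0.0)               # exclude the user themselves
    W /= W.sum(axis=1, keepdims=True)      # row-normalize the weights
    return W @ interests
```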
Inference Quality
Condition Max Mean St. Deviation
Using social bookmark data only 59.4% 19.2% 10.7%
Using file sharing data only 44.9% 12.7% 7.2%
Using email/IM data only 62.1% 29.6% 14.1%
Using all three data sources 100% 45.1% 21.7%
Implicit interests: how close the inferred top-20 topics are to the ground truth
– Significant advantage in combining multiple sources
– Large variance can affect practical application; thus we need to predict when to infer interests
– Much better recall than precision
Explicit interests: precision and recall of inferred terms
Measure Mean St. Deviation
Precision 30.1% 26.9%
Recall 61.5% 27.6%
Can Inference Quality be Predicted?
Hypothesis: inference quality can be predicted from social network properties
–User activeness: the amount of contribution
–In-degree
–Out-degree
–Betweenness
–User management role
Use Support Vector Regression to perform prediction
Evaluate prediction
–Precision/recall of the prediction (10-fold cross validation)
–Use prediction to improve inference (see the sketch below)
• Only infer when we predict it's high quality
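A minimal sketch of this pipeline on random placeholder data; the feature ordering, threshold, and names are my assumptions, not the paper's:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict

# Placeholder features per user: activeness, in-degree, out-degree,
# betweenness, management role (the paper's feature list, fake values).
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = rng.random(500)          # placeholder inference-quality scores

# 10-fold cross-validated quality predictions
y_pred = cross_val_predict(SVR(), X, y, cv=10)

# Only infer interests for users predicted to yield high quality
high_quality_users = np.where(y_pred > 0.5)[0]   # hypothetical threshold
```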
Quality Prediction Results
Precision/recall of prediction
(Figures: prediction precision/recall for implicit interests and explicit interests.)
Quality Prediction Results
Improve inference
Measure Improved to Improvement (%)
Precision 60.5% 101%
Recall 85.7% 39.3%
Implicit Interests
Explicit Interests
Feature Comparison
"Leave-one-feature-out" comparisons of prediction results
Most social influences are from 1- and 2-degree neighbors
Your neighbors decide how well you can be inferred
Feature Comparison (cont'd)
"Leave-one-feature-out" comparisons of prediction results
Your neighbors' network positions may be even more important than how active they are
– Formal organizational properties
• Manager neighbors are more important in inference, i.e., more social influence (about 5% more)
Conclusion of the Wen-Lin 2010 KDD paper
There’s large variance in the quality of inferring user interests from social neighbors
The “recall” of the inference is much better than “precision”
The inference quality may be predicted from social network properties
Tomographical Inference
Tomographic Network Topology Inference
Inference of ‘interior’ components of a network – both vertices and edges – from data obtained at some subset of ‘exterior’ vertices.
E.g., in computer networks, desktop and laptop computers are typical instances of 'exterior' vertices, while Internet routers to which we do not have access are effectively 'interior' vertices.
Network tomography: describes problems in the context of computer network monitoring, in which aspects of the 'internal' workings of the network, such as the intensity of traffic flowing between vertices, are inferred from 'external' measurements.
This is an 'ill-posed inverse problem' in mathematics: the mapping is many-to-one.
For the tomographic inference of network topologies, a key structural simplification has been the restriction to inference of networks in the form of trees.
Tomographic Inference of Tree Topologies
(Figure: a tree structure.)
Tomographic network inference problem
Given a set of N_l vertices, we have n i.i.d. observations of some random variables {X_1, …, X_{N_l}}. We aim to find the tree, among all binary trees with N_l labeled leaves, that best explains the data.
Example: Multicast probes
Two most popular classes of methods
Hierarchical clustering and related ideas
Likelihood-based methods
Hierarchical Clustering-based Methods
We treat the N_l leaves as the 'objects' to be clustered and the tree corresponding to the resulting clustering as our inferred tree.
The tree corresponding to the entire set of partitions is our focus.
Example: Ratnasamy and McCanne (1999): the observed rate of shared losses of packets should be fairly indicative of how close two leaf vertices (i.e., destination addresses) are on a multicast tree T.
– Two different types of shared loss between a pair of leaf vertices – 'true' and 'false' shared loss.
– The true shared losses are due to loss of packets on the path common to the vertices i and j.
– The false shared losses refer to cases where packets were lost separately on the two paths from the internal vertex i1 to the leaf vertices 1 and 3 (in the example figure).
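To make the clustering idea concrete, here is a sketch (my construction, not Ratnasamy and McCanne's exact algorithm) that turns shared-loss rates into a distance and feeds it to average-linkage hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def tree_from_shared_loss(loss):
    """Infer a tree over N_l leaves from multicast probe losses.

    loss: n x N_l binary array; entry (t, i) = 1 if probe t was lost
    at leaf i. Pairs with high shared-loss rates are assumed close on
    the tree, so shared loss acts as a similarity.
    """
    n, Nl = loss.shape
    shared = loss.T @ loss / n          # shared-loss rate per leaf pair
    dist = shared.max() - shared        # convert similarity to distance
    iu = np.triu_indices(Nl, k=1)       # condensed form for linkage
    return linkage(dist[iu], method='average')
```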
Likelihood-based Methods
If we are willing to specify probability models, then we have the potential for likelihood-based methods of inference.
In general, maximum likelihood inference of tree topologies may also be pursued through the use of MCMC. MCMC is critical to the use of Bayesian methods for tomographic inference of trees.
Bayesian Inference for Content Classification
Application – Content Clustering
Goal – categorize the documents into topics
Each document is a probability distribution over topics; each topic is a probability distribution over words.
– Mixture components: the topic distributions over words
– Mixture weights: the per-document distributions over topics
(Figure: TOPIC 1 with words "money", "loan", "bank"; TOPIC 2 with words "river", "stream", "bank"; two documents shown as mixtures of the two topics with weights 0.8/0.2 and 0.3/0.7, each word token tagged by its generating topic.)
Content Analysis Prior Art I -- Latent Semantic Analysis
Latent Semantic Analysis (LSA) [Landauer, Dumais 1997]
– Descriptions:
• Capture the semantic concepts of documents by mapping words into the latent semantic space, which captures the possible synonymy and polysemy of words
• Trained on different levels of documents. Experiments show the synergy between the number of training documents and psychological studies of students at the 4th-grade, 10th-grade, and college levels. LSA has been used as an alternative to the TOEFL test.
– Based on the truncated SVD of the document-term matrix: an optimal least-squares projection to reduce dimensionality
– Capture the concepts instead of words
• Synonym
• Polysemy
$$X \;\approx\; T_0\, S_0\, D_0$$

where X is the N x M term-document matrix (terms as rows, documents as columns), $T_0$ is N x K, $S_0$ is K x K, and $D_0$ is K x M.
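A minimal numpy sketch of the truncated SVD above (the names T0, S0, D0 follow the slide's notation; k is the chosen latent dimension):

```python
import numpy as np

def lsa(X, k):
    """Latent Semantic Analysis: rank-k truncated SVD of the
    N x M term-document matrix X (terms as rows).
    """
    T, s, Dt = np.linalg.svd(X, full_matrices=False)
    T0, S0, D0 = T[:, :k], np.diag(s[:k]), Dt[:k, :]
    X_hat = T0 @ S0 @ D0    # optimal least-squares rank-k approximation
    return T0 @ S0, S0 @ D0, X_hat   # term vectors, document vectors
```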
Traditional Content Clustering
(Figure: documents as points in a feature space with axes $f_{w_1}$, $f_{w_2}$, $f_{w_3}$, where $f_{w_j}$ is the frequency of word $w_j$ in a document.)
Clustering: partition the feature space into segments based on training documents. Each segment represents a topic/category (topic detection).
Hard clustering, e.g., K-means clustering:

$$d = \{f_{w_1}, f_{w_2}, \ldots, f_{w_N}\} \;\rightarrow\; z$$

Soft clustering, e.g., Fuzzy C-means clustering:

$$P(Z \mid \mathbf{W} = \mathbf{f}_w)$$

Another representation of clustering: a bipartite graph of topic nodes $z_1, \ldots, z_5$ over observed word nodes $w_1, \ldots, w_6$, plus document nodes $d_1, \ldots, d_6$ (see the hard-clustering sketch below).
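For concreteness, a tiny hard-clustering sketch (the frequency matrix and cluster count are placeholders, not from the slides):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder document-word frequency matrix: one row per document,
# one column per word frequency f_{w_j}.
rng = np.random.default_rng(0)
F = rng.random((50, 3))

# Hard clustering: each document d is mapped to exactly one topic z.
z = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(F)
```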
Content Clustering based on Bayesian Network
Bayesian Network:
• Causality network – models the causal relationships of attributes/nodes
• Allows hidden/latent nodes
(Figure: topic nodes $z_1, \ldots, z_5$ over observed word nodes $w_1, \ldots, w_6$, with document nodes $d_1$, $d_2$, $d_3$.)
Soft clustering, $P(Z \mid D)$, via Bayes' theorem:

$$P(W \mid Z) = \frac{P(Z \mid W)\, P(W)}{P(Z)}$$

Hard clustering, via MLE:

$$h(D = d) = \arg\max_z P(Z = z \mid \mathbf{W} = \mathbf{f}_w)$$
Content Clustering based on Bayesian Network – Hard Clustering
(Figure: the topic-word Bayesian network, redrawn as a plate model with topic z, word w, and plate size N, the number of words; the number of topics M is pre-determined.)

$$P(W \mid Z) = \frac{P(Z \mid W)\, P(W)}{P(Z)} = \frac{P(Z \mid W)\, P(W)}{\int P(Z \mid W)\, P(W)\, dW}$$

Major Solution 1 – Dirichlet Process:
• Models P(W|Z) as mixtures of Dirichlet probabilities
• Before training, the prior of P(W|Z) can be an easy Dirichlet (uniform distribution). After training, P(W|Z) will still be Dirichlet. (This conjugacy is the reason for using the Dirichlet.)
Major Solution 2 – Gibbs Sampling:
• A Markov chain Monte Carlo (MCMC) method that integrates over large samples to calculate P(Z)
(Figure: the plate model with topic-word distributions φ and Dirichlet parameter β over the M topics, shown as Latent Dirichlet Allocation (LDA) (Blei 2003).)
Latent Dirichlet Allocation (LDA) [Blei et al. 2003]
(Figure: the same two example topics and documents as in the earlier Content Clustering slide, with topic mixtures 0.8/0.2 and 0.3/0.7.)
Goal – categorize the documents into topics
Each document is a probability distribution over topics; each topic is a probability distribution over words.
The probability of the i-th word in a given document:

$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

where the $P(w_i \mid z_i = j)$ are the mixture components and the $P(z_i = j)$ are the mixture weights.
LDA (cont.)
INPUT: document-word counts
• D documents, W words
OUTPUT: mixture components and mixture weights
(Figure: the LDA plate diagram with hyperparameters α and β, per-document topic weights θ, topic assignments z, observed words w, and topic-word distributions φ; T is the number of topics, and the plates repeat over the D documents and the W words in each.)
Bayesian approach: use priors
• Mixture weights ~ Dirichlet(α)
• Mixture components ~ Dirichlet(β)

$$P(w_i) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j)$$

• Mixture components: $\phi^{(j)}_w = P(w_i \mid z_i = j)$
• Mixture weights: $\theta^{(d)}_j = P(z_i = j)$ for document d
Parameters can be estimated by Gibbs sampling (see the sketch below).
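A short sketch of fitting LDA on toy documents; note that scikit-learn's implementation uses variational Bayes rather than the Gibbs sampling named on the slide, and the documents here are invented:

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["money bank loan bank money loan",     # toy documents
        "river stream bank river stream"]

X = CountVectorizer().fit_transform(docs)      # document-word counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

theta = lda.transform(X)   # mixture weights theta^(d): document-topic
phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
# phi^(j): mixture components, the topic-word distributions
```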
Comparison of Dirichlet Distribution with Gaussian Mixture Models
Dirichlet distribution:

$$\rho(f_1, f_2, \ldots, f_r;\ a_1, a_2, \ldots, a_r) = \frac{\Gamma(N)}{\prod_{k=1}^{r} \Gamma(a_k)}\, f_1^{a_1 - 1} f_2^{a_2 - 1} \cdots f_r^{a_r - 1}$$

with $0 \le f_k \le 1$, $\sum_{k=1}^{r} f_k = 1$, and $N = a_1 + \cdots + a_r$.

Multivariate Gaussian:

$$\rho(f_1, \ldots, f_{r-1};\ \mu_1, \sigma_1, \ldots, \mu_{r-1}, \sigma_{r-1}) = \prod_{k=1}^{r-1} \frac{1}{\sqrt{2\pi}\,\sigma_k}\, e^{-\frac{(f_k - \mu_k)^2}{2\sigma_k^2}}$$
Beyond Gaussian: Examples of Dirichlet Distribution
Importance of Dirichlet Distribution
In 1982, Sandy Zabell proved that, if we make certain assumptions about an individual’s beliefs, then that individual must use the Dirichlet density function to quantify any prior beliefs about a relative frequency.
Use Dirichlet Distribution to model prior and posterior beliefs (I)
Prior beliefs:

$$\rho(f) = \mathrm{beta}(f; a, b)$$

E.g.: is this a *fair* coin?
– Flipping a coin, what's the probability of getting 'heads'?
beta(1,1): never tossed that coin; no prior knowledge.
Use Dirichlet Distribution to model prior and posterior beliefs (II)
Prior beliefs:

$$\rho(f) = \mathrm{beta}(f; a, b)$$

Posterior beliefs:

$$\rho(f \mid d) = \mathrm{beta}(f; a + s, b + t)$$

d: 2 heads, 2 tails
beta(1,1): no prior knowledge; beta(3,3): posterior knowledge after 2 head trials (s) and 2 tail trials (t) – this coin may be fair.
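The beta-posterior update is one line with scipy; a sketch of the slide's numbers:

```python
from scipy.stats import beta

a, b = 1, 1                      # beta(1,1): no prior knowledge
s, t = 2, 2                      # data d: 2 heads, 2 tails
posterior = beta(a + s, b + t)   # beta(3,3), by conjugacy

print(posterior.mean())          # 0.5 -> this coin may be fair
```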
Use Dirichlet Distribution to model prior and posterior beliefs (III)
Prior beliefs:

$$\rho(f) = \mathrm{beta}(f; a, b)$$

Posterior beliefs:

$$\rho(f \mid d) = \mathrm{beta}(f; a + s, b + t)$$

d: 8 heads, 2 tails
beta(3,3): prior knowledge – this coin may be fair
beta(11,5): posterior belief – the coin may not be fair after tossing 8 heads and 2 tails.
Use Dirichlet Distribution to model prior and posterior beliefs (IV)
Another example:
– Our belief that an event, which in general has a 5% occurrence probability, may be true in our case.
beta(1/360, 19/360): prior knowledge that a 5%-chance event may be true
d: 3 occurrences, 0 non-occurrences
beta(1/360 + 3, 19/360): posterior belief that a 5%-chance event may be true
Gibbs Sampling (I)
Given a PDF $\rho(X)$, it is usually difficult to compute an integration

$$\int f(X)\, \rho(X)\, dX.$$

We can use Gibbs sampling to simulate *samples* $X_s = \{X_{s,1}, X_{s,2}, \ldots, X_{s,T}\}$ that follow the PDF. Thus, we can approximate the integration by summing $f(X_s)$ over those samples.
Gibbs Sampling (II)
Gibbs sampling: find a Markov chain $\rho_s(X^i \mid X^{i-1})$ such that the outcome samples follow the PDF $\rho(X)$.
Example: a 2-dimensional vector $X = (x_1, x_2)$. Then $\rho_s(X^i \mid X^{i-1})$ is determined by

$$\rho_s(x_{1,i} \mid x_{2,i-1}) \quad \text{and} \quad \rho_s(x_{2,i} \mid x_{1,i}),$$

both derived from the PDF.
(Figure: successive samples $X^1, X^2, X^3, X^4$ moving through the $(x_1, x_2)$ plane along alternating coordinate directions.)
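A standard toy Gibbs sampler (not from the slides): sampling a correlated bivariate normal by alternating the two full conditionals, then averaging over the samples as in the integration idea above. The target distribution and parameter values are my choices:

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.8, T=10000):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Alternates the full conditionals
      x1 | x2 ~ N(rho * x2, 1 - rho^2),
      x2 | x1 ~ N(rho * x1, 1 - rho^2).
    """
    rng = np.random.default_rng(0)
    x1 = x2 = 0.0
    samples = np.empty((T, 2))
    sd = np.sqrt(1.0 - rho**2)
    for i in range(T):
        x1 = rng.normal(rho * x2, sd)   # draw x1 given current x2
        x2 = rng.normal(rho * x1, sd)   # draw x2 given new x1
        samples[i] = (x1, x2)
    return samples

samples = gibbs_bivariate_normal()
print(samples.mean(axis=0))             # approximates the mean integral
print(np.corrcoef(samples.T)[0, 1])     # approximately rho
```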
Some Insight on BN-based Content Clustering
Content Clustering:
• Because documents and words are dependent, only close documents in the feature space can be clustered together as one topic.
(Figure: documents in a feature space with axes $f_{w_1}$, $f_{w_2}$, $f_{w_3}$, where $f_{w_j}$ is the frequency of word $w_j$ in a document.)
Incorporating human factors can possibly *link* multiple clusters together.
Bayesian Network:
• Models the *practical* causal relationships.
Why We Need Simultaneous Multimodality Clustering?
Multiple-Step Clustering:
• e.g., a naïve way to combine content filtering and collaborative filtering: independently cluster first, combine later.
(Figures: two feature-space clusterings over $f_{w_1}$, $f_{w_2}$, $f_{w_3}$; one combination is OK, the other is not OK.)
Simultaneous multimodality clustering is important.