Corruption and Anomaly Detection in Networks
Elchanan Mossel (MIT)
Supported by:
1. NSF CCF-1665252
2. ONR N00014-17-1-2598
3. NSF DMS-1737944 (this new one; thank you!)
Intrusion Detection in Networks
Figure: "Cisco Network-Based Intrusion Detection–Functionalities and Configuration" from cisco.com
Institutional Corruption Networks
Figure: From drawingbynumbers.org. Based on the report "The Effect of Drug Trafficking and Corruption on Democratic Institutions in Mexico"
The PMC Model
- Directed graph G = (V, E) of agents.
- V = T ∪ B = Truthful ∪ Corrupt / Bad.
- Truthful nodes report status (T/B) of neighbors accurately.
- Corrupt nodes report status (T/B) of neighbors adversarially.
- Model of:
  - Diagnosable digital systems (Preparata, Metze, Chien 1967 ...).
  - Byzantine Computing (e.g. Lamport et al. 1982).
  - Intrusion Detection (e.g. Mukherjee et al. 1994).
  - Corruption in Social Networks.
- Thousands of papers!
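A minimal Python sketch of the reporting rule in the PMC model, for illustration only. The function name `pmc_reports`, the toy graph, and the use of random answers for corrupt nodes are assumptions made here; in the model an adversary may choose the corrupt reports in any way it likes.

```python
import random

def pmc_reports(edges, truthful, rng=None):
    """Return {(u, v): 'T' or 'B'}: what node u says about its out-neighbor v."""
    rng = rng or random.Random(0)
    reports = {}
    for u, v in edges:
        actual = "T" if v in truthful else "B"
        if u in truthful:
            reports[(u, v)] = actual            # truthful nodes report accurately
        else:
            reports[(u, v)] = rng.choice("TB")  # stand-in for an adversarial choice
    return reports

# Toy instance (hypothetical): nodes 0, 1, 2 are truthful, node 3 is corrupt.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
truthful = {0, 1, 2}
reports = pmc_reports(edges, truthful)
```

Only the reports are visible to the observer; the set `truthful` is what has to be recovered.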
Previous Results
- How to identify all corrupt nodes?
- PMC67: If |B| = t, need min_v in-deg(v) ≥ t.
- No results on bounded degree graphs!
- Corruption Detection in Bounded Degree Graphs?
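The degree condition can be checked directly. A small sketch (the helper names are hypothetical): on a bounded-degree graph such as a directed cycle, the condition fails as soon as t exceeds the in-degree, so it says nothing when a constant fraction of the nodes is corrupt.

```python
def min_in_degree(n, edges):
    """Smallest in-degree over nodes 0..n-1 of a directed graph."""
    indeg = [0] * n
    for _, v in edges:
        indeg[v] += 1
    return min(indeg)

def pmc67_degree_condition(n, edges, t):
    """Necessary condition quoted above: every node has at least t in-neighbors."""
    return min_in_degree(n, edges) >= t

# Directed cycle on 5 nodes: every in-degree is 1.
cycle = [(i, (i + 1) % 5) for i in range(5)]
ok_for_one = pmc67_degree_condition(5, cycle, t=1)
ok_for_two = pmc67_degree_condition(5, cycle, t=2)
```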
Our Results
Theorem (Alon-Mossel-Pemantle-15)
In a δ-good expander with |T| > |B|, one can find T′ ⊂ T and B′ ⊂ B with |T′ ∪ B′| > (1 − δ)n.
- |T| > |B| is necessary.
- If all neighbors of v are in B, v cannot be diagnosed.
- Running time exponential.
- Running time linear if |T| > (0.5 + 2δ)n.
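To make "identify" concrete, here is a brute-force sketch. It is not the algorithm behind the theorem; it simply enumerates every candidate truthful set with |T| > n/2 whose members' reports are all accurate, and labels a node only if all candidates agree on it. The function names and the toy instance are assumptions, and the search is exponential in n.

```python
from itertools import combinations

def consistent_truthful_sets(n, reports):
    """All sets T with |T| > n/2 such that every report made by a member of T is accurate."""
    result = []
    for size in range(n // 2 + 1, n + 1):
        for cand in combinations(range(n), size):
            T = set(cand)
            if all((rep == "T") == (v in T)
                   for (u, v), rep in reports.items() if u in T):
                result.append(T)
    return result

def certain_labels(n, reports):
    """Label only the nodes on which every consistent candidate set agrees."""
    cands = consistent_truthful_sets(n, reports)
    labels = {}
    for v in range(n):
        statuses = {v in T for T in cands}
        if len(statuses) == 1:
            labels[v] = "T" if statuses.pop() else "B"
    return labels

# Toy instance (hypothetical): complete directed graph on 5 nodes,
# nodes 0, 1, 2 truthful; corrupt nodes 3, 4 lie about everyone.
n = 5
truth = {0, 1, 2}
reports = {}
for u in range(n):
    for v in range(n):
        if u == v:
            continue
        actual = "T" if v in truth else "B"
        lie = "B" if v in truth else "T"
        reports[(u, v)] = actual if u in truth else lie

labels = certain_labels(n, reports)
```

In this toy case the majority assumption leaves a single consistent candidate, so every node is labelled. On sparse graphs many candidates can survive, which is where expansion enters.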
On Expansion
- Def: G is δ-good if
  - |N⁺(U) \ U| > |U|, ∀ |U| ≤ 4δn.
  - (A × B) ∩ E ≠ ∅, ∀ |A| ≥ δn, |B| ≥ n/4 with A ∩ B = ∅.
- Expansion is needed: If ∃ U with |U| ≤ εn such that all connected components of V \ U are of size ≤ εn, then it is impossible to find even one element of T, even if |T| = (1 − 2ε)n.
- Weak expansion suffices if T is large enough: Suppose that for every pair of disjoint sets with |A1| ≥ δn and |A2| ≥ (1 − 3δ)n, there is an edge between A1 and A2. Then if |T| ≥ (1 − δ)n, one can find T′ ⊂ T with |T′| ≥ (1 − 2δ)n.
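For very small graphs the δ-good definition can be checked by brute force. The sketch below is only an illustration: the name `is_delta_good` is hypothetical, N⁺(U) is read as the set of out-neighbors of U, U is taken non-empty, and (A × B) ∩ E ≠ ∅ is read as "some edge goes from A to B". The second condition is tested only for sets A of the smallest allowed size, which is enough because shrinking A cannot create edges.

```python
from itertools import combinations
from math import ceil, floor

def is_delta_good(n, edges, delta):
    """Brute-force check of both conditions; exponential in n, toy sizes only."""
    out = {u: set() for u in range(n)}
    for u, v in edges:
        out[u].add(v)

    # Condition 1: every non-empty U with |U| <= 4*delta*n has more than |U|
    # out-neighbors outside U.
    for size in range(1, floor(4 * delta * n) + 1):
        for U in combinations(range(n), size):
            nbrs = set().union(*(out[u] for u in U)) - set(U)
            if len(nbrs) <= size:
                return False

    # Condition 2: for disjoint A, B with |A| >= delta*n and |B| >= n/4,
    # some edge goes from A to B. A violating B exists exactly when at least
    # ceil(n/4) vertices outside A receive no edge from A.
    a, b = max(1, ceil(delta * n)), ceil(n / 4)
    for A in combinations(range(n), a):
        reach = set().union(*(out[u] for u in A))
        unreached = sum(1 for v in range(n) if v not in A and v not in reach)
        if unreached >= b:
            return False
    return True

n = 10
complete = [(u, v) for u in range(n) for v in range(n) if u != v]
cycle = [(i, (i + 1) % n) for i in range(n)]
complete_ok = is_delta_good(n, complete, delta=0.05)
cycle_ok = is_delta_good(n, cycle, delta=0.05)
```

The directed cycle already fails the first condition for single vertices, since each vertex has exactly one out-neighbor.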
Future Work on the PMC Model
- Suppose the graph G is given: What is the largest set of corrupt guys we can identify? Identify efficiently?
- Related to expansion / small set expansion.
- Suppose a constraint graph G on n vertices is given. What is the best G′ ⊂ G with m edges in terms of corruption detection?
- What if truthful nodes make (1-sided / 2-sided) mistakes?
- Applications to real data?
- Example: Applications to real news / fake news sites?
Future Work: Non-backtracking walks for anomaly detection and clustering
- From past experience: for sparse matrices, the non-backtracking spectrum is better than the simple spectrum.
- Talk to me!
Thank you!
BP and a New Type of Random Matrix
- Thm: If d(1 − 2ε)² > 1 then it is possible to detect.
- Conj (Krzakala, Moore, M, Neeman, Sly, Zdeborová, Zhang 13): If A is the adjacency matrix, then w.h.p. the second eigenvector of

  $$N = \begin{pmatrix} 0 & D - I \\ -I & A \end{pmatrix}, \qquad D = \mathrm{diag}(d_{v_1}, \dots, d_{v_n}),$$

  is correlated with the partition and the second eigenvalue is d(1 − 2ε) + o_n(1).
- No orthogonal structure! N is neither symmetric nor normal. Singular vectors of N are useless.
- KMMNSZZ derived N by linearizing Belief Propagation and applying a number-theory identity by Hashimoto (89).
- Note: the conjectured linear algebra algorithm is deterministic.
- Conjecture established by Bordenave-Lelarge-Massoulié 15.
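A hedged NumPy sketch of the matrix N on a sampled two-block graph. The parametrization is an assumption made here, not taken from the slides: within-block edge probability 2d(1 − ε)/n and across-block probability 2dε/n, so that the average degree is about d. The function names, the seed, and reading the partition from the signs of the second half of the eigenvector are also illustrative choices.

```python
import numpy as np

def sample_two_block_graph(n, d, eps, seed=0):
    """Symmetric 0/1 adjacency matrix with two equal blocks (assumed parametrization)."""
    rng = np.random.default_rng(seed)
    labels = np.repeat([1, -1], n // 2)
    same = labels[:, None] == labels[None, :]
    p = np.where(same, 2 * d * (1 - eps) / n, 2 * d * eps / n)
    upper = np.triu((rng.random((n, n)) < p).astype(int), k=1)
    return (upper + upper.T).astype(float), labels

def build_N(A):
    """The 2n x 2n block matrix [[0, D - I], [-I, A]]."""
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))
    I = np.eye(n)
    return np.block([[np.zeros((n, n)), D - I], [-I, A]])

n = 200
A, labels = sample_two_block_graph(n, d=3, eps=1 / 6)
N = build_N(A)

# N is not symmetric, so use the general eigensolver; eigenvalues may be complex.
eigvals, eigvecs = np.linalg.eig(N)
order = np.argsort(-eigvals.real)
mu2 = eigvals[order[1]]           # second-largest eigenvalue by real part
second = eigvecs[:, order[1]]

# Guess the partition from the signs of the node coordinates (second half).
guess = np.sign(second[n:].real)
agreement = max(np.mean(guess == labels), np.mean(guess == -labels))
```

At this small n the agreement is not guaranteed to be good; the statement on the slide is asymptotic. Writing an eigenvector as (x, y), the eigen-equation gives (μ²I − μA + D − I)y = 0, which is a convenient sanity check on the construction.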
The Eigenvalues of N
d = 3, d(1 − 2ε) = 2, √d = 1.732...
[Figure: spectrum of N, with the second eigenvalue λ2 marked.]
The spectrum on real networks
[Figure: spectra on six real networks, one panel each.]
- Football: q=10
- Polblogs: q=2, overlap 0.8533
- Adjnoun: q=2, overlap 0.6250
- Dolphins: q=2, overlap 0.7419
- Polbooks: q=2, overlap 0.7571
- Karate: q=2, overlap 1
Thank you!