elchanan mossel (mit) - university of california, santa cruzabel/atd2017/session5/mossel.pdfelchanan...

Corruption and Anomaly Detection in Networks

Elchanan Mossel (MIT)

Supported by:
1. NSF CCF-1665252
2. ONR N00014-17-1-2598
3. NSF DMS-1737944 (this new one; thank you!)

Intrusion Detection in Networks

Figure: "Cisco Network-Based Intrusion Detection–Functionalities and Configuration" from cisco.com

Institutional Corruption Networks

Figure: From drawingbynumbers.org. Based on the report "The Effect of Drug Trafficking and Corruption on Democratic Institutions in Mexico".

The PMC Model

I Directed graph G = (V, E) of agents.

I V = T ∪ B = Truthful ∪ Corrupt (Bad).

I Truthful nodes report status (T/B) of neighbors accurately.

I Corrupt nodes report status (T/B) of neighbors adversarially.

I Model of:
  I Diagnosable digital systems (Preparata, Metze, Chien 1967, ...).
  I Byzantine Computing (e.g. Lamport et al. 1982).
  I Intrusion Detection (e.g. Mukherjee et al. 1994).
  I Corruption in Social Networks.

I Thousands of papers!
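The reporting rule of the model can be sketched in a few lines. This is a minimal illustration, not code from the talk: it assumes an edge (u, v) means "u reports on v", and it stands in for the adversary with a random answer, which is only one of many possible corrupt strategies. The function name `pmc_reports` and the 5-node example graph are hypothetical.

```python
import random

def pmc_reports(edges, corrupt, seed=0):
    """Return {(u, v): 'T' or 'B'} for each directed edge u -> v.

    Truthful u reports the true status of v. A corrupt u may report
    anything; a random answer is used here as a stand-in (assumption).
    """
    rng = random.Random(seed)
    reports = {}
    for u, v in edges:
        truth = "B" if v in corrupt else "T"
        if u in corrupt:
            reports[(u, v)] = rng.choice("TB")  # arbitrary, possibly false
        else:
            reports[(u, v)] = truth             # always accurate
    return reports

# Hypothetical example: directed 5-cycle plus chords, node 3 corrupt.
edges = [(i, (i + 1) % 5) for i in range(5)] + [(i, (i + 2) % 5) for i in range(5)]
corrupt = {3}
reports = pmc_reports(edges, corrupt)
```

Only the reports made by truthful nodes are guaranteed to match the ground truth; an observer sees `reports` but not `corrupt`.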

Previous Results

I How to identify all corrupt nodes?

I PMC67: If |B| = t, need min_v in-deg(v) ≥ t.

I No results on bounded degree graphs!

I Corruption Detection in Bounded Degree Graphs?
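The PMC67 degree condition is easy to check on a concrete graph. The sketch below assumes the graph is given as a list of directed edges (u, v) and only tests the in-degree requirement as stated on the slide; the function names are hypothetical.

```python
from collections import Counter

def min_in_degree(nodes, edges):
    """Smallest number of incoming edges over all nodes (0 if a node has none)."""
    indeg = Counter(v for _, v in edges)
    return min(indeg[v] for v in nodes)

def meets_degree_condition(nodes, edges, t):
    """True if every node has in-degree at least t."""
    return min_in_degree(nodes, edges) >= t

# Directed 5-cycle: every node has in-degree exactly 1.
nodes = range(5)
cycle = [(i, (i + 1) % 5) for i in nodes]
ok_for_one = meets_degree_condition(nodes, cycle, 1)
ok_for_two = meets_degree_condition(nodes, cycle, 2)
```

On a graph of bounded degree the condition can hold only for bounded t, which is why it says nothing when a constant fraction of the nodes is corrupt.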

Our Results

Theorem (Alon-Mossel-Pemantle-15)

In a δ-good expander with |T| > |B|, one can find T′ ⊂ T and B′ ⊂ B with |T′ ∪ B′| > (1 − δ)n.

I |T| > |B| is necessary.

I If all neighbors of v are in B, we cannot diagnose v.

I Running time is exponential.

I Running time is linear if |T| > (0.5 + 2δ)n.
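To make the exponential-time flavor concrete, here is a brute-force sketch. It is not the algorithm from the paper: it simply enumerates every candidate truthful set with |T| > |B| that is consistent with the reports and keeps the labels on which all candidates agree. The names `consistent` and `certain_labels` and the example data are assumptions for illustration.

```python
from itertools import combinations

def consistent(truthful, reports):
    """True if every report made by a node in `truthful` matches the assignment."""
    for (u, v), said in reports.items():
        if u in truthful and (said == "T") != (v in truthful):
            return False
    return True

def certain_labels(nodes, reports):
    """Labels shared by every consistent assignment with |T| > |B|."""
    nodes = list(nodes)
    n = len(nodes)
    candidates = []
    for k in range(n // 2 + 1, n + 1):          # strict majority truthful
        for subset in combinations(nodes, k):   # exponentially many subsets
            t = set(subset)
            if consistent(t, reports):
                candidates.append(t)
    if not candidates:
        raise ValueError("no consistent assignment with a truthful majority")
    sure_t = {v for v in nodes if all(v in t for t in candidates)}
    sure_b = {v for v in nodes if all(v not in t for t in candidates)}
    return sure_t, sure_b

# Complete digraph on 5 nodes; node 4 is corrupt and calls everyone "B".
nodes = range(5)
corrupt = {4}
reports = {
    (u, v): ("B" if (u in corrupt or v in corrupt) else "T")
    for u in nodes for v in nodes if u != v
}
sure_t, sure_b = certain_labels(nodes, reports)
```

In this small example the only consistent majority assignment is T = {0, 1, 2, 3}, so every node is diagnosed; on sparse graphs some nodes typically remain undetermined.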

On Expansion

I Def: G is δ-good if
  I |N+(U) \ U| > |U| for all |U| ≤ 4δn.
  I (A × B) ∩ E ≠ ∅ for all |A| ≥ δn, |B| ≥ n/4 with A ∩ B = ∅.

I Expansion is needed: if there exists U with |U| ≤ εn such that all connected components of V \ U are of size ≤ εn, then it is impossible to find even one element of T, even if |T| = (1 − 2ε)n.

I Weak expansion suffices if T is large enough: suppose that for every pair of disjoint sets A1, A2 with |A1| ≥ δn and |A2| ≥ (1 − 3δ)n there is an edge between A1 and A2. Then if |T| ≥ (1 − δ)n, one can find T′ ⊂ T with |T′| ≥ (1 − 2δ)n.
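For very small graphs the two conditions in the definition of δ-good can be checked by brute force. The sketch below makes several assumptions that are not stated on the slide: N+(U) is read as the set of out-neighbors of U, the empty set is excluded from the first condition, and an edge in A × B is read as directed from A to B. The function name is hypothetical and the running time is exponential.

```python
from itertools import combinations
from math import ceil, floor

def is_delta_good(n, edges, delta):
    """Brute-force check of both conditions; only feasible for small n."""
    nodes = range(n)
    out = {u: set() for u in nodes}
    for u, v in edges:
        out[u].add(v)

    # Condition 1: every nonempty U with |U| <= 4*delta*n has more than
    # |U| out-neighbors outside U.
    for k in range(1, floor(4 * delta * n) + 1):
        for U in combinations(nodes, k):
            boundary = set().union(*(out[u] for u in U)) - set(U)
            if len(boundary) <= k:
                return False

    # Condition 2: every disjoint pair A, B of the smallest allowed sizes
    # has an edge from A to B. Larger sets contain such a pair, so checking
    # the minimal sizes is enough.
    a, b = max(1, ceil(delta * n)), max(1, ceil(n / 4))
    for A in combinations(nodes, a):
        rest = [v for v in nodes if v not in A]
        for B in combinations(rest, b):
            if not any(v in out[u] for u in A for v in B):
                return False
    return True

n = 8
complete = [(u, v) for u in range(n) for v in range(n) if u != v]
cycle = [(i, (i + 1) % n) for i in range(n)]
complete_is_good = is_delta_good(n, complete, 0.1)
cycle_is_good = is_delta_good(n, cycle, 0.1)
```

The directed cycle fails the first condition already for single vertices, since each vertex has exactly one out-neighbor.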

Future Work on the PMC Model

I Suppose the graph G is given: what is the largest set of corrupt nodes we can identify? Identify efficiently?

I Related to expansion / small set expansion.

I Suppose a constraint graph G on n vertices is given. What is the best G′ ⊂ G with m edges in terms of corruption detection?

I What if truthful nodes make (1 sided / 2 sided) mistakes?

I Applications to real data?

I Example: Applications to real news / fake news sites?

Future Work: Non-backtracking walks for anomaly and clustering

I From past experience: for sparse matrices, the non-backtracking spectrum is better than the simple spectrum.

I Talk to me!
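For readers unfamiliar with the object, the following sketch builds the standard non-backtracking (Hashimoto) matrix of an undirected graph: rows and columns are indexed by directed edges, and the entry for (u, v) followed by (v, w) is 1 exactly when w ≠ u. The function name and the triangle example are illustrative choices, not taken from the talk.

```python
def non_backtracking_matrix(undirected_edges):
    """Return (darts, B): darts lists both orientations of every edge, and
    B[i][j] = 1 iff dart i = (u, v) can be followed by dart j = (v, w), w != u."""
    darts = []
    for u, v in undirected_edges:
        darts.append((u, v))
        darts.append((v, u))
    B = [[0] * len(darts) for _ in darts]
    for i, (u, v) in enumerate(darts):
        for j, (x, w) in enumerate(darts):
            if x == v and w != u:   # continue from v, but do not return to u
                B[i][j] = 1
    return darts, B

# Triangle: 3 edges give 6 directed edges, each with exactly one continuation.
darts, B = non_backtracking_matrix([(0, 1), (1, 2), (2, 0)])
row_sums = [sum(row) for row in B]
```

Each row sum equals deg(v) − 1 for the head v of the directed edge, so in the triangle every row sums to 1.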

Thank you!

BP and a New Type of Random Matrix

I Thm: If d(1 − 2ε)² > 1 then it is possible to detect.

I Conj (Krzakala, Moore, M, Neeman, Sly, Zdeborová, Zhang 13): If A is the adjacency matrix, then w.h.p. the second eigenvector of

\[
\begin{pmatrix} 0 & D - I \\ -I & A \end{pmatrix}, \qquad D = \mathrm{diag}(d_{v_1}, \dots, d_{v_n}),
\]

is correlated with the partition and the second eigenvalue is d(1 − 2ε) + o_n(1).

I No orthogonal structure! N is neither symmetric nor normal. Singular vectors of N are useless.

I KMMNSZZ derived N by linearizing Belief Propagation and applying a number-theory identity by Hashimoto (89).

I Note: conjectured linear algebra algorithm is deterministic.

I Conjecture established by Bordenave-Lelarge-Massoulie 15.
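The 2n × 2n block matrix in the conjecture is straightforward to build numerically. The sketch below uses NumPy and is only an illustration: the function names are hypothetical, and reading community labels from the signs of the last n coordinates of the second real eigenvector is an assumption made here, not a procedure stated in the talk.

```python
import numpy as np

def block_matrix(A):
    """Build the 2n x 2n matrix [[0, D - I], [-I, A]] from adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    I = np.eye(n)
    D = np.diag(A.sum(axis=1))            # diagonal matrix of degrees
    return np.block([[np.zeros((n, n)), D - I], [-I, A]])

def second_eigenvector_signs(A):
    """Second-largest real eigenvalue and the sign pattern of its eigenvector
    on the last n coordinates (assumed here to be the vertex coordinates)."""
    M = block_matrix(A)
    n = M.shape[0] // 2
    vals, vecs = np.linalg.eig(M)          # general, non-symmetric solver
    real = np.where(np.abs(vals.imag) < 1e-9)[0]
    order = real[np.argsort(-vals[real].real)]
    v = vecs[:, order[1]].real
    return vals[order[1]].real, np.sign(v[n:])

# Sanity check on the complete graph K4 (3-regular).
K4 = np.ones((4, 4)) - np.eye(4)
vals = np.linalg.eigvals(block_matrix(K4))
largest = vals.real.max()
```

For a d-regular graph each adjacency eigenvalue μ contributes the roots of λ² − μλ + (d − 1) = 0, so for K4 the largest eigenvalue is d − 1 = 2 and the complex eigenvalues have modulus √2.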

The Eigenvalues of N

Figure: Eigenvalues of N for d = 3, d(1 − 2ε) = 2, √d = 1.732... (plot not reproduced)
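As a quick check that the parameters of this example lie above the detection threshold d(1 − 2ε)² > 1, the arithmetic can be written out. The value of ε is derived here from the two stated quantities; it is not given explicitly in the talk.

\[
d = 3,\quad d(1 - 2\varepsilon) = 2
\;\Longrightarrow\; 1 - 2\varepsilon = \tfrac{2}{3},\quad \varepsilon = \tfrac{1}{6},
\qquad
d(1 - 2\varepsilon)^2 = 3 \cdot \tfrac{4}{9} = \tfrac{4}{3} > 1 .
\]

Equivalently, d(1 − 2ε) = 2 exceeds √d ≈ 1.732, since for positive quantities d(1 − 2ε)² > 1 holds exactly when d(1 − 2ε) > √d.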

The spectrum on real networks

Figure: Spectra on six real networks (plots not reproduced). Panels: Football (q = 10); Polblogs (q = 2, overlap 0.8533); Adjnoun (q = 2, overlap 0.6250); Dolphins (q = 2, overlap 0.7419); Polbooks (q = 2, overlap 0.7571); Karate (q = 2, overlap 1).

Thank you!
