Robert Savell, Ph.D. SBP ‘08 April 1,2008 Mining for Social Processes in Intelligence Data Streams 04/01/08

Robert Savell, Ph.D. SBP ’08

April 1, 2008

Mining for Social Processes in Intelligence Data Streams

04/01/08

Overview:

1. Introduction: Process Based SNA.

2. Process Detection and the Process Query System (PQS).

3. Experiment: The Alibaba Data Set.

4. Results.

5. Conclusion.

Traditional SNA and DSNA are reductionist:

Project Rich Data Sets onto Graph Representations: directed and undirected; reachability and connectedness.

Define Structure and Properties: centrality and prestige; subgroups (clustering).

Analyze Role and Position: structural equivalence; block models; network-level equivalence.

Real World DSNA: Complex Systems on Networks.

Real World social networks are composed of dynamic multimodal systems whose attendant processes and interactions both determine and are determined by the network topology.

Process Based Social Network Analysis:

Problem:

1. Identify and track active process threads in transactional datasets.

2. Identify supporting control and communication processes in the social network.

3. Establish structural roles of agents.

4. Define active individual or group processes and track the state of these processes.

Note: Paradoxically, added complexity can sometimes simplify the analysis.

Methodology: Process Detection and Tracking

Underlying (hidden) state spaces: Process 1 ... Process n

[Figure: state diagrams for Process 1 through Process n. State labels: {a, b}, {a, c}, {c, d}, {e}, {f, c}, {c, d}, {h}, {f, g}.]

Observations related to state sequences:

a b c d a b b a d a c f h c c g d g

Observations are interleaved:

a b c c f h d c c a b g d b a g d a

Observations missed, noise added, unlabelled (this is what we see):

a b a c f k h d c b g d b k h a g d a

Note: Complexity from Entanglement of Distributed Simple Processes
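
The observation model on this slide (per-process sequences that are interleaved, then degraded by misses and noise) can be simulated in a few lines. This is a minimal sketch, not the generator behind the slide: the split of the 18-symbol sequence into two processes, the miss and noise probabilities, the noise symbol `k`, and all function names are assumptions made for illustration.

```python
import random

def interleave(sequences, rng):
    """Merge symbol sequences into one stream, preserving each sequence's internal order."""
    cursors = [0] * len(sequences)
    stream = []
    while True:
        live = [i for i, seq in enumerate(sequences) if cursors[i] < len(seq)]
        if not live:
            return stream
        i = rng.choice(live)  # pick which process emits next
        stream.append(sequences[i][cursors[i]])
        cursors[i] += 1

def degrade(stream, rng, p_miss=0.1, p_noise=0.1, noise_symbols="k"):
    """Drop some observations and insert spurious ones; process labels are never emitted."""
    out = []
    for sym in stream:
        if rng.random() < p_noise:
            out.append(rng.choice(noise_symbols))
        if rng.random() >= p_miss:
            out.append(sym)
    return out

def is_subsequence(small, big):
    """True if `small` can be read off `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(sym in it for sym in small)

rng = random.Random(0)
process_1 = list("abcdabbada")  # assumed split of the slide's sequence
process_n = list("cfhccgdg")
clean = interleave([process_1, process_n], rng)
observed = degrade(clean, rng)
print("".join(clean))
print("".join(observed))
```

Each source sequence remains a subsequence of the clean stream, which is the property a tracker relies on when it tries to undo the entanglement.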

Discrete Source Separation:

The Process Query System (PQS):

Sensor: Upon query produces a constrained set of recent email events from stream.

Subscriber: Queries Sensors. Preprocesses streams. Produces attribute rich encapsulated observations.

Trafen Engine: Partitions observation set into tracks (evidence of underlying social processes). Produces maximum likelihood hypothesis (collections of tracks and inferred process descriptions). [current implementation is based on the MHT algorithm].

Publisher: Formats Output of Trafen Engine.

Please refer to www.pqsnet.net
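
To make the engine's task concrete, the sketch below partitions an unlabelled stream into tracks by keeping a pruned set of competing hypotheses. It is not the PQS code and not a full MHT implementation: it is a simple beam search, and the first-order symbol models `P1` and `P2`, the noise likelihood, and the function names are all hypothetical.

```python
import math

def step_logp(model, prev, sym, eps=1e-3):
    """Log-probability that `sym` follows `prev` within one track (prev is None at track start)."""
    return math.log(model.get((prev, sym), eps))

def separate(stream, models, noise_logp=math.log(0.01), beam=50):
    """Return (score, labels) for the best assignment of observations to tracks or noise."""
    names = list(models)
    # Each hypothesis: (log-likelihood, last symbol seen on each track, label per observation).
    hyps = [(0.0, (None,) * len(names), ())]
    for sym in stream:
        extended = []
        for score, last, labels in hyps:
            for i, name in enumerate(names):
                new_last = last[:i] + (sym,) + last[i + 1:]
                extended.append((score + step_logp(models[name], last[i], sym),
                                 new_last, labels + (name,)))
            # Alternatively, explain the observation as noise and leave all tracks unchanged.
            extended.append((score + noise_logp, last, labels + ("noise",)))
        extended.sort(key=lambda h: h[0], reverse=True)
        hyps = extended[:beam]  # prune to the most likely hypotheses
    best = hyps[0]
    return best[0], best[2]

# Hypothetical process models: allowed symbol-to-symbol transitions and their probabilities.
models = {
    "P1": {(None, "a"): 1.0, ("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0},
    "P2": {(None, "f"): 1.0, ("f", "g"): 1.0, ("g", "h"): 1.0, ("h", "f"): 1.0},
}
score, labels = separate("afbkgcha", models)
print(labels)
```

The beam width trades accuracy for cost: a wider beam keeps more alternative partitions alive when early observations are ambiguous.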

Weak Process Detection:

Task: The Alibaba Dataset (Scenario 1)

A Simulated SigInt and HumInt collection. Approximately 800 reports. 8 month plot window. 409 named entities. 98 locations.

Ground Truth: A 12-member terrorist cell connected with the Ali Baba Network plans to “bake a cake” (build a bomb), targeted at a water treatment facility near London. The plot takes place from April to September of 2003.

A close-knit association of terrorists and sympathizers from other organizations will fill the air w/ fake chatter and decoy plots.

Alibaba Scn 1: discover the plot.

Alibaba Scenario 1 Ground Truth

Scenario 1:

820 reports. 409 named entities. 98 locations.

Approx. 8 months.

(Lethal characters in green w/ their connected component in cyan).

Alibaba Scenario 1 Ground Truth

Alibaba terrorist network in green. Background connected component in cyan.

Ground Truth:

Leader: Imad Abdul.
Planner: Tarik Mashal.
Hacker: Ali Hakem.
Financier: Salam Seeweed.
Recruiter: Yakib Abbaz.
Security: Ramad Raed.
Demolitions: Quazi Aziz.
Demolitions: Hosni Abdel.
Associate: Phil Salwah.
Associate: Lu’ay.

Alibaba Scenario 1: SNA (cluster analysis)

1-Phil Salwah

2-Abdul

3-Yakib Abbaz

4-Tarik Mashal

5-Qazi

6-Fawzan

7-Alvaka

8-Afia

9-Mazhar

10-Salam

11-Ahlima Amit

12-Wazir Bengazi

13-Raed

14-Saud Uvmyuzik

15-Mahira

Stationary clustering finds some key suspects:

Algorithm: Extract triads. Collect common neighbors. Threat score for node is proportional to number of triads containing node. Top 15 suspects shown at right.
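
The scoring rule above can be sketched as follows. This assumes that "triad" means a closed triad (three mutually linked entities) and that links come from co-occurrence in reports; the edge list uses placeholder node names rather than entities from the data set.

```python
from collections import defaultdict

def triad_scores(edges):
    """Count, for every node, the closed triads (triangles) that contain it."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    scores = {node: 0 for node in adj}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Common neighbors of u and v close a triad; w > v counts each triad once.
                for w in adj[u] & adj[v]:
                    if v < w:
                        for node in (u, v, w):
                            scores[node] += 1
    return scores

edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
scores = triad_scores(edges)
ranked = sorted(scores, key=lambda n: (-scores[n], n))  # highest threat score first
print(ranked)
```

Because the score only counts static triads, it ignores when and in what order the links were observed, which is the limitation the process view is meant to address.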

Alibaba S1: SNA Results

Results vs. Ground Truth

Stationary clustering:

1-Phil Salwah
2-Abdul
3-Yakib Abbaz
4-Tarik Mashal
5-Qazi
6-Fawzan
7-Alvaka
8-Afia
9-Mazhar
10-Salam
11-Ahlima Amit
12-Wazir Bengazi
13-Raed
14-Saud Uvmyuzik
15-Mahira

Ground Truth:

Leader: Imad Abdul.
Planner: Tarik Mashal.
Hacker: Ali Hakem.
Financier: Salam Seeweed.
Recruiter: Yakib Abbaz.
Security: Ramad Raed.
Demolitions: Quazi Aziz.
Demolitions: Hosni Abdel.
Associate: Phil Salwah.
Associate: Lu’ay.

----> Significant Deviations from Ground Truth

DSNA: A Process View --- What we’d like: Full Transactional Data

Ex. A Complete Meeting FSM

A Process View I --- What we have:

Colocation Information:

Event(d) = {date, location, named entities x 3}.

A Process View II --- Process Fragments:

A Process View III --- and singleton evidence of local state:

Some Example Target/Event Strings from Alibaba Scenario 1:

• ‘Abdul tasked Yakib to recruit’.
• ‘Declining invitation to meet Phil Salwah’.
• ‘Discussed planning schedule’.
• ‘Arranged for meeting next week’.
• ‘Charity fundraiser’.
• ‘Discussed payment for assisting in baking of cakes’.
• ‘Informed that deception is in effect’.
• ‘Discussed training arrangements for baking cake’.
• ‘Attempted theft of chemicals’.
• ‘Casing Portsmouth Facility’.

Stages of Process Detection (1): Track Individual Entities.

A. Remove Broadcasts. (Minimal information content).

1. Infer a home location for entities, and track individual trajectories.
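
A minimal sketch of this stage, using the colocation tuple form Event = {date, location, named entities} from the earlier slide. The slides do not state how a home location is inferred, so the rule used here (the most frequently observed location) is an assumption, and the entity and location names are placeholders.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Event:
    day: date
    location: str
    entities: tuple  # up to three named entities per report

def trajectories(events):
    """Map each entity to its date-ordered list of (day, location) sightings."""
    tracks = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.day):
        for name in ev.entities:
            tracks[name].append((ev.day, ev.location))
    return dict(tracks)

def home_locations(tracks):
    """Assumed rule: an entity's home is the location where it is seen most often."""
    return {name: Counter(loc for _, loc in sightings).most_common(1)[0][0]
            for name, sightings in tracks.items()}

events = [
    Event(date(2003, 4, 5), "X", ("E1", "E3")),
    Event(date(2003, 4, 1), "X", ("E1", "E2")),
    Event(date(2003, 4, 3), "Y", ("E1",)),
]
tracks = trajectories(events)
homes = home_locations(tracks)
print(homes)
```

With a home location in hand, each sighting can be labelled as a home or travel event, which the later role-differentiation slides rely on.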

Stages of Process Detection (2): Track Group Coordination Processes.

2. Aggregate individual trajectories according to group synchronization FSM.

Given: A Constrained Alibaba corpus--- colocation event tuples:

The Problem: Make Group and Subgroup coordination and broadcast process assignments (partition the event space):

Define a quality measure for the partition:

Weak Process Methodology (Stage 1):

Results: Alibaba Network Discovery.

1 Leader: Imad Abdul.
2 Planner: Tarik Mashal.
3 Hacker: Ali Hakem.
4 Financier: Salam Seeweed.
5 Recruiter: Yakib Abbaz.
6 Security: Ramad Raed.
7 Associate: Phil Salwah.
8 Demolitions: Quazi Aziz.
9 Demolitions: Hosni Abdel.
9 ******: Omar.
10 Recruitee: Fawzan.
11 Decoy: Ahmet, Ali,…
12 Associate: Lu’ay.
12 ******: Sinan.

Ground Truth:

Ali Baba Network Discovery 2:

Result: The technique successfully assigns significant hierarchical relationships across the net.

Ali Baba Cell Process Signature:

Initiation of plot

Coordination w/ Top 4 Suspects

Downstream Control

Peak planning period

Peak Logistical Preparation

Ali Baba Subgroup Signatures:

Ali Baba Top 4 Suspects: Early and persistent meeting events. Downstream Control. High Level Coordination.

Decoy Plot (Ship or Port): Sparse early structure. No meetings. Also lacks upstream coordination.

Ali Baba Role Differentiation I:

Leader: Imad Abdul

Downstream Control

Predominance of Home Events

Ali Baba Role Differentiation II:

Planner: Tarik Mashal

Downstream Control

Operational Independence (from Abdul)

Balanced Travel and Home Events

Ali Baba Role Differentiation III:

Financier: Salam Seeweed

Few Downstream Events (not a subgroup leader)

Close Interaction w/ Imad Abdul

Predominantly Home Events w/ Abdul

Stages of Process Detection (4): Track an Evolving Threat.

Requisite processes:

1-personnel 2-skill 3-leadership 4-train 5-finance 6-material 7-transport 8-house 9-stealth 10-recon 11-action

Some potential Keyword Mappings:

Arrange, Assign, Reprimand, Deception, Disengage, Finance, Housing, Exit, Return, Meeting, Payment, Plan, Invitation, Recruit, Report, Skill, Train, Trip, Case, Attack, Activate, Material, Smuggle, Price, Target, Propose, Request, Recommend, Break and Enter, Sleep, Terminate, Assassinate, Assign Task.

Note: too few training examples to do this systematically. But...

Detecting an aliased plot: Cake Plot vs. Water Plot Sync Events

Cake Plot

Water Plot

Activity profiles of suspects associated with each plot.

Similar but not obviously related.

Aliased plot detection: Cake vs Water Threat Spectra

1-personnel 2-skill 3-leadership 4-train 5-finance 6-material 7-transport 8-house 9-stealth 10-recon 11-action

AND THE WINNER IS…

Legend: Cake: blue. Water: red.
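
One way to compare two threat spectra is sketched below: keyword events are binned into the 11 requisite-process categories and the resulting count vectors are compared. The keyword-to-category mapping is hypothetical (the mapping used in the talk is not recoverable from the slides), and cosine similarity is a stand-in comparison chosen for illustration, not necessarily the measure behind the plotted result.

```python
import math

CATEGORIES = ["personnel", "skill", "leadership", "train", "finance", "material",
              "transport", "house", "stealth", "recon", "action"]

# Hypothetical keyword-to-category mapping, for illustration only.
KEYWORD_TO_CATEGORY = {
    "Recruit": "personnel", "Skill": "skill", "Assign Task": "leadership",
    "Train": "train", "Payment": "finance", "Finance": "finance",
    "Material": "material", "Trip": "transport", "Housing": "house",
    "Deception": "stealth", "Case": "recon", "Attack": "action",
}

def spectrum(keywords):
    """Count keyword events per category; unmapped keywords are ignored."""
    counts = [0] * len(CATEGORIES)
    for kw in keywords:
        cat = KEYWORD_TO_CATEGORY.get(kw)
        if cat is not None:
            counts[CATEGORIES.index(cat)] += 1
    return counts

def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Placeholder keyword streams, not taken from the data set.
plot_a = spectrum(["Recruit", "Train", "Train", "Payment", "Case"])
plot_b = spectrum(["Recruit", "Train", "Finance", "Case", "Case"])
print(cosine(plot_a, plot_b))
```

Two plots whose event strings use different cover vocabulary can still produce similar spectra, which is the sense in which one plot may be an alias of the other.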

Conclusion:

Process analysis provides a generic framework for identification, tracking, and categorization of social organisms.

Excellent results so far from process-based techniques, even on restricted attribute sets.

Just the beginning of exploration of this methodology within the complex systems framework.

Questions?

Thanks To:

George Cybenko: Postdoctoral advisor.
Gary Kuhn: IC advisor.
and the Process Query Systems Group.

For further examples of PQS applications please visit: www.pqsnet.net.

Extra Slides:

A hostile network as an autopoietic system (collection of processes):

Sustaining Processes (a partial list):

Structural coherence: planning. leadership. synchronization.

Differentiation (from environment): deception. active defense. obsolescence and termination.

Metabolism: financial and material support. transportation and housing.

Sustainability/Reproduction: recruitment. reward. indoctrination. fission. merger.

Responsiveness (environmental interaction): plot generation. planning. execution. adaptive strategies. replanning.

DSSP Step I --- Partition the event space:

1) Entity Tracking and Stream Aggregation:

i. Isolate low entropy events such as broadcasts via process signature (broadcast FSM).

ii. Track spatio- and socio-temporal trajectories of individuals.

iii. Identify trajectory collisions (co-occurrences).
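
The collision step can be sketched as a grouping of sightings by day and location. This is a minimal illustration under stated assumptions: trajectories are given as lists of (day, location) pairs with ISO date strings, a collision means the same day and the same location, and the entity and location names are placeholders. Broadcast isolation is not shown.

```python
from collections import defaultdict
from itertools import combinations

def collisions(tracks):
    """Return {(entity_a, entity_b): [(day, location), ...]} for coinciding sightings."""
    seen_at = defaultdict(set)
    for name, sightings in tracks.items():
        for day, loc in sightings:
            seen_at[(day, loc)].add(name)
    hits = defaultdict(list)
    for (day, loc), names in sorted(seen_at.items()):
        # Every pair of entities present at the same place and day is one collision.
        for a, b in combinations(sorted(names), 2):
            hits[(a, b)].append((day, loc))
    return dict(hits)

tracks = {
    "E1": [("2003-04-01", "X"), ("2003-04-03", "Y")],
    "E2": [("2003-04-01", "X")],
    "E3": [("2003-04-01", "Z"), ("2003-04-03", "Y")],
}
pairs = collisions(tracks)
print(pairs)
```

The resulting pair lists are the raw material for the next step, where repeated collisions are aggregated into candidate subgroups.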

DSSP Step Ia --- Derive network hierarchy from partition structure:

1) Identify coordinated entities (subgroups):

i. Aggregate entities into structurally coherent units.

ii. Establish hierarchical relationship of units.

iii. Identify primary communication channels between units.

DSSP Step II --- State Assignments:

1) Identify Process Signatures:

i. Assign event states in Coordination FSM using hierarchical context defined in Step I.

ii. Distribution of event states defines the weak process signature for the individual and subgroup.

iii. Qualitatively (for now) assess individual roles via analysis of synch process event distributions.
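
Item ii can be read as a normalized histogram over event states. The sketch below computes such a signature for one entity; the state labels ("home", "travel", "upstream", "downstream") are assumptions borrowed from the callouts on the role-differentiation slides, not the actual states of the Coordination FSM.

```python
from collections import Counter

def signature(states):
    """Normalized distribution of event states; empty input yields an empty signature."""
    counts = Counter(states)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {state: counts[state] / total for state in sorted(counts)}

# Placeholder event states for a single entity.
entity_states = ["home", "home", "downstream", "travel"]
sig = signature(entity_states)
print(sig)
```

Comparing such distributions across entities is what the role-differentiation slides do by eye, for example contrasting a predominance of home events with balanced travel and home events.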