CrimeLink Explorer: Using Domain Knowledge to Facilitate Automated Crime Link Analysis

Lt. Jennifer Schroeder, Tucson Police Department
Jie Xu, University of Arizona

June 2, 2003



Agenda

• Review of Problem: Link Analysis in Law Enforcement

• Literature Review

• System & Heuristic Design

• User Study Design

• Demo

• User Study Results & Conclusions

• Q & A


Link Analysis in Law Enforcement

• Extremely valuable, but extremely time-consuming for investigators (sometimes months are spent constructing a large network)

• Can uncover valuable investigative leads

• Usually only conducted in high profile cases that justify the resource expenditure

• Can be very complex (high branching factors, especially among repeat offenders)

Data Sources for Link Analysis

• Police incident reports (Police RMS)
– Often largest source of data for analysis
– Link based on co-occurrence in an incident
– Analysts must examine each report to determine the strength of the link
– Must be searched across multiple jurisdictions

• Field interviews

• Phone records

• Financial information

• Intelligence information (sometimes stored in databases)

• Interviews with witnesses, suspects, confidential informants

An example

• Eddie “Smith” is in 18 incident reports

• These incidents contain a total of 152 entities that are potential branches:
– 31 People
– 11 Vehicles
– 57 Locations
– 2 Organizations
– 1 Property item
– 1 Weapon

• This complexity is at a depth of one! Imagine the task for crime analysts to search each of these possible branches to create a large, multi-level link chart

Obstacles to LA Automation

• Lack of Integration/Data Consolidation

• High branching factors cause information overload

• Investigators must manually analyze every link to determine relevance

• No domain specific way to automate analysis for relevance of links

Proposed Approach

• Use concept space to extract associations from incident records

• Focus on domain specific heuristic to provide accurate link assessment

• Use shortest-path algorithm to find best path between individuals of interest

• Incorporate the approach into a prototype system with visualization of resulting paths

• Conduct a user study to evaluate the system

Literature Review

• Link Analysis
– Anacapa Charting
– Free Text Association Searches (NLP)
– Watson
– COPLINK Detect

• Domain Knowledge Incorporation
– Expert Systems
– Bayesian Networks

• Shortest-Path Algorithms

Domain Knowledge Incorporation

• Expert Systems

• Bayesian Networks

• Law Enforcement Specific Research


System Design

[Architecture diagram; components shown:]

• Incident Reports

• Concept Space: co-occurrence weights

• Heuristics (crime types, shared address, shared phone): heuristic weights

• Association Path Search (shortest-path algorithm)

• Graphical User Interface

Experimental Database

• Dataset must contain real data so that crime investigators will be engaged and interested in the results

• The dataset must contain sufficient amounts of data for association paths between a reasonable number of subjects to exist

• Approximately 20 months of incident reports were extracted

• Age, gender, race, addresses, and phone numbers of persons involved in the incidents were also extracted

• Simple data consolidation on name for prototype

Heuristic Design Goals

• Provide weighting scheme for links that more accurately reflects judgment of human analysts

• Weights should be understandable to law enforcement users

• Improved weights should be used for shortest-path calculations

Heuristic Design

• Incorporated most important information considered by human analysts:
– Relationship between crime type and person roles
– Shared addresses or telephone numbers
– Repeated co-occurrence in incident reports

• Employed a 1-100 scale, familiar to users (used in RMS queries)

• Logarithmic transformation of link weight used to compute shortest path during searches

Crime Type and Person Role

• We constructed a matrix and assigned scores to role combinations in each of the crime types

• To construct the crime type/role matrix we interviewed sergeants from Homicide, Aggravated Assault, Robbery, Fraud, Auto Theft, Sexual Assault, Child Sexual Abuse, Domestic Violence

• Crime type/role combinations were assigned weights based on estimation by experts of likelihood of association for that combination

• Person roles used in the TPD dataset include: Victim, Witness, Suspect, Arrestee, and Other.

Co-occurrence

• Goal was to capture judgments of analysts when looking at repeated co-occurrences of entities

• Analyzed a random sample of 40 incident reports counting the number of times each pair of persons co-occurred

• Read supporting narrative reports for each incident to determine whether an association was important
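The pair counting described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `incidents` mapping and the `count_cooccurrences` name are hypothetical, and each person is assumed to be already consolidated to a single identifier.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(incidents):
    """Count how many incident reports each pair of persons shares.

    `incidents` is a hypothetical mapping: report id -> list of person ids.
    Returns a Counter keyed by (person_a, person_b) with person_a < person_b.
    """
    counts = Counter()
    for persons in incidents.values():
        # Deduplicate within a report so a person listed twice counts once,
        # and sort so each pair always appears in the same order.
        for a, b in combinations(sorted(set(persons)), 2):
            counts[(a, b)] += 1
    return counts


# Illustrative data only; not taken from the study.
example_incidents = {
    "report-1": ["A", "B", "C"],
    "report-2": ["B", "A"],
    "report-3": ["C"],
}
pair_counts = count_cooccurrences(example_incidents)
```

The resulting counts are the input an analyst (or a lookup table) would map to an association probability.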

Co-occurrence probability distribution

Co-occurrence count    Association probability (%)
1                      1
2                      45
3                      98
4                      100

Heuristic Function

• Investigators may rely more on crime type/role and shared associations, but a high co-occurrence weight can outweigh a low association weight

• Value calculated based on summed crime-type/person-role relationship, shared address, shared phone values

• Second value based on association probability of co-occurrence counts

• Link weight = Maximum of:
– 0.85 × (crime-type/person-role score) + 0.05 × (shared phone score) + 0.10 × (shared address score)
– 100 × (association probability based on co-occurrence counts)
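A hedged sketch of how the two values might be combined in code follows. The coefficients and the probability table come from the slides; everything else is an assumption for illustration: the function names are hypothetical, the three component scores are assumed to lie on a 0-100 scale, a count of 0 is assumed to map to probability 0, and counts above 4 are assumed to stay at probability 1.

```python
# Association probability by co-occurrence count, from the distribution
# table in this talk (percentages expressed as fractions).
ASSOCIATION_PROBABILITY = {1: 0.01, 2: 0.45, 3: 0.98, 4: 1.00}


def cooccurrence_probability(count):
    """Look up the association probability for a co-occurrence count."""
    if count <= 0:
        return 0.0  # assumption: no shared reports means no co-occurrence evidence
    # Assumption: counts above 4 saturate at 1.0 (the table stops at 4).
    return ASSOCIATION_PROBABILITY.get(count, 1.0)


def link_weight(role_score, phone_score, address_score, cooccurrence_count):
    """Return the larger of the knowledge-based and co-occurrence-based values."""
    knowledge_value = (0.85 * role_score
                       + 0.05 * phone_score
                       + 0.10 * address_score)
    cooccurrence_value = 100.0 * cooccurrence_probability(cooccurrence_count)
    return max(knowledge_value, cooccurrence_value)


# Illustrative inputs only: a weak role score is outweighed by three
# co-occurrences, while a strong role score dominates a single co-occurrence.
weak_role_many_reports = link_weight(20, 0, 0, 3)
strong_role_one_report = link_weight(80, 100, 100, 1)
```

Taking the maximum rather than a sum reflects the first bullet on this slide: a high co-occurrence weight can override a low knowledge-based weight without the two being averaged together.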

Association Path Search

• Used Dijkstra’s shortest-path algorithm (1959) to address the search complexity problem

• Conventional shortest-path algorithms could not be used directly to solve the problem of identifying the strongest association between a pair of persons (Xu & Chen 2000)

• A logarithmic transformation was made on association weights
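One common way to make a shortest-path algorithm find the strongest path is sketched below. It assumes, and the slides do not spell this out, that the strength of a path is the product of its link weights scaled to (0, 1]; taking the negative logarithm of each weight then turns the largest product into the smallest sum, which Dijkstra's algorithm can minimize. The graph layout and the `strongest_path` name are hypothetical.

```python
import heapq
import math


def strongest_path(graph, source, target):
    """Find the path with the largest product of link weights.

    `graph` is a hypothetical dict: node -> {neighbor: weight}, with weights
    on the 1-100 scale. Returns (path, strength on the 0-100 scale), or
    (None, 0.0) if the target cannot be reached.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            if weight <= 0:
                continue  # a zero-weight link carries no association
            # -log maps weights in (0, 1] to non-negative costs, so a
            # larger product of weights becomes a smaller sum of costs.
            cost = -math.log(weight / 100.0)
            candidate = d + cost
            if candidate < dist.get(neighbor, math.inf):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, 100.0 * math.exp(-dist[target])


# Illustrative network only: the two-step route A-B-C (0.9 * 0.9 = 0.81)
# is stronger than the direct but weak link A-C (0.5).
example_graph = {
    "A": {"B": 90, "C": 50},
    "B": {"A": 90, "C": 90},
    "C": {"A": 50, "B": 90},
}
path, strength = strongest_path(example_graph, "A", "C")
```

Without the transformation, summing raw weights and minimizing would prefer the weakest links, which is why a conventional shortest-path search cannot be applied to the weights directly.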


User Study Questions

• Can the automated link analysis approaches proposed (concept space approach, heuristic approach, and the shortest-path algorithm) help address the information overload and search complexity problem?

• Can incorporated domain knowledge help identify associations between crime entities more accurately than the concept space approach?

• Will domain experts perceive the automated link analysis approaches to be useful for crime investigation?

Hypotheses

• H1: Subjects will achieve higher efficiency conducting an association path search with the prototype system than with the “single-level” link analysis tool

• H2: Association paths found using heuristics will be more accurate than paths found using simple co-occurrence weight

• H3: Subjects will perceive the heuristic approach to be more useful than the concept space approach for investigative work.

Efficiency and Accuracy H1 and H2

• Efficiency = the time a subject spends completing a given task

• Accuracy = the average agreement scale a subject indicates on the weights of associations on a path

• Usefulness = the average agreement scale is > 4, indicating positive assessment of usefulness


User Study Tasks

• Task 1: Use COPLINK Detect to find the strongest association paths between those criminals.

• Task 2: Use the concept space approach provided by the prototype system to find the strongest association paths, evaluate each association on the path, and indicate scales of agreement on the association weights.

• Task 3: Given the same set of criminal names used in Task 2, use the heuristic approach to do the same

Two or more names were entered to search for association paths

Returns are displayed in a network

Weak links can be removed to focus investigation where more information is needed

Clicking on a link displays information about origin and strength of link

Concept Space and heuristic values were compared by the users to assess comparative accuracy
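The weak-link removal shown in the demo could amount to a simple threshold filter on link weights. The sketch below is only an assumption about how such a filter might look: the `prune_weak_links` name, the dict-of-dicts graph layout, and the threshold value are all hypothetical.

```python
def prune_weak_links(graph, threshold):
    """Return a copy of the network keeping only links at or above threshold.

    `graph` is a hypothetical dict: node -> {neighbor: weight on a 1-100 scale}.
    Nodes are kept even if all of their links are removed.
    """
    return {
        node: {nbr: w for nbr, w in neighbors.items() if w >= threshold}
        for node, neighbors in graph.items()
    }


# Illustrative network and threshold only.
demo_network = {
    "A": {"B": 90, "C": 12},
    "B": {"A": 90},
    "C": {"A": 12},
}
focused_network = prune_weak_links(demo_network, threshold=30)
```

The input network is left unchanged, so a user could lower the threshold again and recover the removed links.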

H1, H2, H3 Two-tailed t-tests

• H1 was supported (t = 11.47, p < 0.001)

• H2 was supported (t = 2.04, p < 0.001)

• H3 was supported (t = 2.35, p < 0.05)

Weighting Agreement Scale

[Chart: Agreement Scale (summed per path), y-axis 0 to 30, plotted for paths 1 to 29; series: Concept Space and Heuristic]


Conclusions

• The system evaluation focused on the approaches’ efficiency, accuracy, and usefulness

• The three characteristics are desirable features of a sophisticated link analysis system

• The experiment results demonstrated the potential of our approach to achieve these features using domain-specific heuristics

Future Work

• Apply a statistical analysis on NIBRS (National Incident-Based Reporting System) data for more accurate crime type/relationship weights

• Extend heuristics to include common vehicles and common organization associations

• Encode expert knowledge in Bayesian networks and incrementally learn new knowledge from crime data

• Interface improvements suggested by users

• Improve data consolidation rules
