tajinder presentation4

crimeX Real time crime analysis and alert system Tajinder Singh

Upload: tajinder-singh

Post on 16-Apr-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: Tajinder presentation4

crimeX: Real-time crime analysis and alert system

Tajinder Singh

Page 2: Tajinder presentation4

Motivation

Page 3: Tajinder presentation4

Motivation

• How criminals operate

• Dynamics between criminals and anti-crime squads

Page 4: Tajinder presentation4

Demo

www.crimefighter.ninja

Page 5: Tajinder presentation4

Pipeline

[Pipeline diagram]

Sources: Crime data (real time), User data (real time), Crime data (batch)

Stages: Ingestion, Batch Layer, Real Time, Serving Layer

Page 6: Tajinder presentation4

Data flow

Data sources

• Seed: http://us-city.census.okfn.org/dataset/crime-stats

• Engineered Data (600 GB)

Page 7: Tajinder presentation4

Data flow

{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", ... }

Crime data (batch)

Batch Processing

Page 9: Tajinder presentation4

Data flow

Crime data (batch)

Batch Processing

{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", ... }

+ Python Script (Refining)

Page 10: Tajinder presentation4

Data flow

Crime data (batch)

Batch Processing

{ "crime_id": "C786", "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "crime_occ_ts": "2015-03-05 15:24:49", "lat": "34.5462", "lon": "-118.453", "zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }

Index Type: crimes
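
The refining step shown on these slides adds location fields (zip, city, state, country) to each raw crime record before it is indexed. A minimal sketch of that enrichment follows. The slides do not say how the Python script resolves coordinates, so `reverse_geocode` and its in-memory lookup table are hypothetical stand-ins for whatever service the script actually used.

```python
import json

# Hypothetical lookup table standing in for a real reverse-geocoding service;
# the slides do not describe how coordinates are resolved.
KNOWN_LOCATIONS = {
    ("34.5462", "-118.453"): {
        "zip": "90007",
        "city": "los angeles",
        "state": "california",
        "country": "usa",
    },
}


def reverse_geocode(lat, lon):
    """Return location fields for a coordinate pair, or an empty dict if unknown."""
    return KNOWN_LOCATIONS.get((lat, lon), {})


def refine(record):
    """Return a copy of the raw crime record with location fields added."""
    refined = dict(record)  # copy so the raw record is left untouched
    refined.update(reverse_geocode(record["lat"], record["lon"]))
    return refined


# Raw record taken from the slide (the elided fields are left out).
raw = {
    "crime_id": "C786",
    "crimetype": "robbery",
    "crime_rptd_ts": "2015-05-06 11:34:43",
    "crime_occ_ts": "2015-03-05 15:24:49",
    "lat": "34.5462",
    "lon": "-118.453",
}
refined = refine(raw)
print(json.dumps(refined, indent=2))
```

Records whose coordinates are not in the table pass through unchanged, which keeps the sketch safe to run on partial data.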

Page 11: Tajinder presentation4

Data flow

Real Time Processing

Crime data: { "crimetype": "robbery", "lat": "34.5462", "lon": "-118.453" }

User data: { "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243" }

Page 12: Tajinder presentation4

Data flow

Real Time Processing

[ Processing ]

Crime data: { "crimetype": "robbery", "lat": "34.5462", "lon": "-118.453" }

User data: { "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243" }

Page 13: Tajinder presentation4

Data flow

Real Time Processing

Crime data: { "crimetype": "robbery", "crime_rptd_ts": "2015-05-06 11:34:43", "lat": "34.5462", "lon": "-118.453", "zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }

User data: { "user_id": "user453", "username": "Tajinder", "lat": "34.653356", "lon": "-118.53243", "zip": "90007", "city": "los angeles", "state": "california", "country": "usa" }

Index Types: crimes_realtime and user-subscribe-crime

Page 14: Tajinder presentation4

Data flow use case 1 (batch)

Input: { "location": "2611 portland street, los angeles" }

Page 15: Tajinder presentation4

Data flow use case 1 (batch)

Output Fields

Distance Covered (radius)

Total crimes analyzed

Average latency*

Crime Types

Average latency*: the average difference between a crime's occurrence timestamp and its reporting timestamp
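
The average latency defined above can be sketched in a few lines of Python. This is an illustration only, not the presenter's implementation: the function names are made up, the first record uses the timestamps from the sample crime on the earlier slides, and the second record is invented for the example.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the sample records on the slides.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def reporting_delay(crime):
    """Time between when a crime occurred and when it was reported."""
    occurred = datetime.strptime(crime["crime_occ_ts"], TS_FORMAT)
    reported = datetime.strptime(crime["crime_rptd_ts"], TS_FORMAT)
    return reported - occurred


def average_latency(crimes):
    """Mean reporting delay over a list of crime records (zero if the list is empty)."""
    delays = [reporting_delay(c) for c in crimes]
    if not delays:
        return timedelta(0)
    return sum(delays, timedelta(0)) / len(delays)


slide_crime = {
    "crime_occ_ts": "2015-03-05 15:24:49",
    "crime_rptd_ts": "2015-05-06 11:34:43",
}
# Invented second record, reported two hours after it occurred.
other_crime = {
    "crime_occ_ts": "2015-03-05 10:00:00",
    "crime_rptd_ts": "2015-03-05 12:00:00",
}
crimes = [slide_crime, other_crime]
print(average_latency(crimes))
```

In the system described here the same average would be computed over the crimes found within the search radius, rather than over a hand-built list.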

Page 16: Tajinder presentation4

Data flow use case 1 (batch)

[output]

Page 17: Tajinder presentation4

Data flow use case 2 (real)

Real Time: { "crimetype": "robbery", "lat": "34.2353", "lon": "-113.42534" }

Page 18: Tajinder presentation4

Data flow use case 2 (real)

Output Fields

Distance Covered (radius)

Total crimes analyzed

Average latency*

Crime Types

Alert nearby users

User Phone number

User Name

User latitude

User longitude

[output]
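
One way to pick the nearby users to alert is a great-circle distance check between the incoming crime and each subscribed user. The sketch below uses the haversine formula, which is an assumption: the slides do not say how distance is computed. The 20 km radius and the second user are also invented for illustration; the crime and `user453` coordinates come from the earlier sample records.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def users_to_alert(crime, users, radius_km):
    """Return the users located within radius_km of the crime."""
    clat, clon = float(crime["lat"]), float(crime["lon"])
    return [
        u for u in users
        if haversine_km(clat, clon, float(u["lat"]), float(u["lon"])) <= radius_km
    ]


crime = {"crimetype": "robbery", "lat": "34.5462", "lon": "-118.453"}
users = [
    {"user_id": "user453", "username": "Tajinder",
     "lat": "34.653356", "lon": "-118.53243"},
    # Invented user far from the crime, to show the filter excluding someone.
    {"user_id": "user999", "username": "Example",
     "lat": "40.7128", "lon": "-74.0060"},
]
nearby = users_to_alert(crime, users, radius_km=20)
print([u["user_id"] for u in nearby])
```

At scale this check would more likely run inside the search cluster than in application code, which is the direction the performance slide describes.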

Page 19: Tajinder presentation4

Performance Optimization

Challenges

Challenge: Front-end display took 5 seconds per request

Reason:

• Heavy I/O (all crime documents were fetched to the UI)

• Business logic and query execution ran on the front end (Flask)

Solution:

• Query execution on the Elasticsearch cluster

• No document I/O to the front end

• Dynamic scripting enabled on the ES cluster

• Used Groovy scripts as opposed to JavaScript, Python, MVEL (built-in), expression (built-in), etc.

Challenge: Network latency

Solution: Co-locate Storm and Elasticsearch cluster nodes to reduce network latency
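
To illustrate what moving query execution onto the cluster can look like, the sketch below builds a request body that filters crimes by distance and counts them per crime type on the server, so no crime documents travel to the front end. It deliberately swaps in a `geo_distance` filter and a `terms` aggregation instead of the Groovy scripts the slides mention. It also assumes a `location` field mapped as `geo_point` and an aggregatable `crimetype` field, neither of which appears in the slides, and the body follows the query DSL of recent Elasticsearch versions, so older clusters may need a different shape.

```python
import json


def build_crime_summary_query(lat, lon, radius_km):
    """Build a search body that aggregates crime types within a radius."""
    return {
        "size": 0,  # return aggregation results only, no documents
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": f"{radius_km}km",
                        # "location" is an assumed geo_point field name.
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        "aggs": {
            # "crimetype" is assumed to be mapped so it can be aggregated.
            "crime_types": {"terms": {"field": "crimetype"}},
        },
    }


body = build_crime_summary_query(34.5462, -118.453, 5)
print(json.dumps(body, indent=2))
```

The body would be sent to the search endpoint of the crimes index; only the per-type counts come back, which is the I/O reduction the slide describes.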

Page 20: Tajinder presentation4

Performance Optimization

Challenges

Caveat: Vulnerable to outside attacks (security vulnerability)

Reason:

• Dynamic scripting is enabled

Solution:

• Don’t run Elasticsearch as root

• Provide read-only access to requisite directories

Page 21: Tajinder presentation4

about me

Tajinder Singh [University of Southern California]

5 years of experience in web development