Spatial Analysis of Crime Data: A Case Study
Mike Tischler Presented by Arnold Boedihardjo
Outline
• Motivation
• Spatial autocorrelation
• Approach
• Issues
• Data sets
Motivation
• Goal: reduce crime activity
• Develop a tool to extract crime patterns
  – Allow visualization of patterns
• Ultimately, predict crime occurrences
Spatial Autocorrelation
• Tobler’s first law of geography: “everything is related to everything else, but near things are more related than distant things”
• Possible causes of spatial dependency
  – Spatial causality: an object (event) is a direct cause of nearby objects (events)
  – Spatial correlation: nearby objects (events) behave similarly
  – Spatial interaction: movements of objects induce a relationship between objects in different locations
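Tobler's law can be checked on data with a spatial autocorrelation statistic. The slides do not name one; as an illustration, here is global Moran's I, a standard choice, with binary distance-band weights (two sites are neighbors if they lie within `bandwidth` of each other). The weighting scheme, the function name, and the example counts are our assumptions:

```python
import math

def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary distance-band weights.

    w_ij = 1 if i != j and dist(i, j) <= bandwidth, else 0.
    I = (n / S0) * sum_ij w_ij * z_i * z_j / sum_i z_i**2, with z_i = x_i - mean.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0  # sum_ij w_ij * z_i * z_j
    s0 = 0.0     # S0: sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= bandwidth:
                cross += z[i] * z[j]
                s0 += 1.0
    denom = sum(zi * zi for zi in z)
    if s0 == 0 or denom == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")
    return (n / s0) * (cross / denom)

# Four cells along a line, 1 unit apart (hypothetical counts).
cells = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(morans_i([10, 9, 1, 0], cells, bandwidth=1.0))   # ≈ 0.39: high counts cluster
print(morans_i([10, 0, 10, 0], cells, bandwidth=1.0))  # ≈ -1.0: neighbors alternate
```

Under no spatial autocorrelation, I has expected value -1/(n-1); clearly larger values mean nearby sites have similar values, and clearly smaller ones mean neighbors tend to differ.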
Approach
• Provide a spatial-based model to describe the density of incident objects (e.g., crime locations) within a given set of spatial objects
• The density values are (up to normalization) probability densities, so they can serve as a predictive metric for future occurrences of incident objects
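Over a finite set of candidate sites, density scores become a predictive ranking once they are normalized. A minimal sketch, assuming the scores have already been produced by some density estimate; `rank_sites` and the example scores are hypothetical:

```python
def rank_sites(density_by_site):
    """Normalize density scores over candidate sites and rank the sites.

    A density value is not itself a probability; dividing by the total over a
    finite candidate set yields a discrete distribution that sums to 1.
    """
    total = sum(density_by_site.values())
    if total <= 0:
        raise ValueError("at least one site needs a positive density score")
    probs = {site: score / total for site, score in density_by_site.items()}
    # Highest probability first; ties keep their input order (sorted is stable).
    ranking = sorted(probs, key=probs.get, reverse=True)
    return probs, ranking

# Hypothetical scores for five candidate sites.
probs, ranking = rank_sites(
    {"Bank A": 4.0, "Bank B": 4.0, "Bank C": 4.0, "Store 1": 1.0, "Store 2": 1.0}
)
print(ranking)
```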
Example: Where will the next crime happen?
[Figure: map of crime incidents (C) around Bank A, Bank B, Bank C, and two stores]
How to formalize our intuition in a probabilistic framework?
• The probability of a crime occurring at bank C is higher than at the stores
  – Furthermore, this probability is equal to that at bank A and bank B
• How to define the probabilities?
  – Kernel density estimation (KDE)
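The slide names kernel density estimation without fixing a kernel. As a one-dimensional sketch, assuming a Gaussian kernel and a hand-picked bandwidth `h` (both our choices), the estimate places one kernel on each sample and averages them:

```python
import math

def gaussian_kernel(u, h):
    """K_h(u) = exp(-u^2 / (2 h^2)) / (h * sqrt(2*pi)); integrates to 1."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def kde(x, samples, h):
    """f_hat(x) = (1/n) * sum_i K_h(x - s_i): one kernel centered on each sample."""
    return sum(gaussian_kernel(x - s, h) for s in samples) / len(samples)

# Hypothetical 1-D samples: three close together, one far away.
samples = [1.0, 1.2, 0.9, 5.0]
print(kde(1.0, samples, h=0.5))  # high: inside the cluster
print(kde(3.0, samples, h=0.5))  # low: between the cluster and the outlier
```

The bandwidth controls smoothness: a small `h` follows individual samples closely, while a large `h` blurs the cluster and the outlier together.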
Applying the KDE
• Suppose that our sample set, S, consists not of the incident points but of the distances from each incident to its nearest-neighbor (NN) non-incident object of each type (e.g., the nearest bank and the nearest store)
• If we apply the KDE to S, the kernel functions will be centered at these distances, and each query point is likewise transformed into its distances to the NN non-incident spatial objects
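• Formally, we have the following multivariate KDE:
$$D(\vec{x}) = \prod_{d=1}^{|\mathit{dim}|} \sum_{i=1}^{|\mathit{incidents}|} K_{H_d}\big(\mathrm{NN}(\vec{x}, d) - s_{i,d}\big)$$
where $\mathrm{NN}(\vec{x}, d)$ is the distance from the query point $\vec{x}$ to its nearest non-incident object of feature type $d$, $s_{i,d}$ is the same distance for incident $i$, and $K_{H_d}$ is a kernel with bandwidth $H_d$.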
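The NN-distance transformation and the multivariate KDE on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distances, a brute-force NN search, Gaussian kernels, and hand-picked bandwidths, and the toy layout (`features`, `incidents`, all coordinates) is hypothetical, loosely modeled on the bank/store example. Each location is mapped to its NN distance for every feature type; D(x) then multiplies, over feature types, the sum of kernels centered on the incidents' distances:

```python
import math

def gaussian_kernel(u, h):
    """K_h(u): Gaussian kernel with bandwidth h (our choice of kernel)."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def nn_distance(point, objects):
    """Brute-force distance from point to its nearest object."""
    return min(math.dist(point, o) for o in objects)

def to_feature_space(point, features):
    """(NN(point, 1), ..., NN(point, |dim|)); feature types in sorted order."""
    return tuple(nn_distance(point, features[t]) for t in sorted(features))

def density(x, sample_set, features, bandwidths):
    """D(x) = prod_d sum_i K_{H_d}(NN(x, d) - s_{i,d})."""
    q = to_feature_space(x, features)  # query point mapped to NN distances
    value = 1.0
    for d, t in enumerate(sorted(features)):  # same dimension order as sample_set
        value *= sum(gaussian_kernel(q[d] - s[d], bandwidths[t]) for s in sample_set)
    return value

# Hypothetical toy layout: three banks on a street, two stores set back from it,
# and two past crimes, each 1 unit from a bank (Bank A and Bank B).
features = {
    "bank": [(0, 0), (10, 0), (20, 0)],
    "store": [(5, 8), (15, 8)],
}
incidents = [(1, 0), (11, 0)]
S = [to_feature_space(p, features) for p in incidents]  # sample set S
H = {"bank": 1.0, "store": 1.0}

scores = {
    "Bank A": density((0, 0), S, features, H),
    "Bank B": density((10, 0), S, features, H),
    "Bank C": density((20, 0), S, features, H),
    "Store 1": density((5, 8), S, features, H),
    "Store 2": density((15, 8), S, features, H),
}
for site, score in scores.items():
    print(f"{site}: {score:.3g}")
```

With these toy inputs the three banks receive identical scores, because each is itself a bank (NN bank distance 0) and lies the same distance from its nearest store, while both stores score near zero. Dividing each inner sum by the number of incidents would make every factor a proper one-dimensional density; that only rescales D by a constant and leaves the ranking unchanged.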
After applying the KDE, we have the following:
[Figure: map of Bank A, Bank B, Bank C, and two stores after applying the KDE]
Research Issues
• How to select the features (e.g., banks, stores)? Employ notions of density attractors and repellers.
• If the above is solved, how to improve the quality of the density estimates? Currently, an adaptive KDE approach is being tested.
• How to incorporate temporal correlation?
• Producing this model is computationally intensive: feature selection, an NN search for every feature, and multiple queries on the KDE
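The slide does not say which adaptive KDE is under test. One common variant is sample-point adaptive smoothing with Abramson's square-root law, shown here in one dimension as an illustration only; the function names, the pilot bandwidth `h`, `alpha = 0.5`, and the samples are our assumptions:

```python
import math

def gaussian_kernel(u, h):
    """Gaussian kernel with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2 * math.pi))

def fixed_kde(x, samples, h):
    """Ordinary KDE with one global bandwidth h (also used as the pilot)."""
    return sum(gaussian_kernel(x - s, h) for s in samples) / len(samples)

def adaptive_kde(x, samples, h, alpha=0.5):
    """Sample-point adaptive KDE (Abramson's square-root law when alpha = 0.5).

    1. Pilot density f~ at each sample, using the fixed bandwidth h.
    2. Local factor lam_i = (f~(s_i) / g) ** -alpha, g = geometric mean of the pilots.
    3. The kernel on sample i uses bandwidth h * lam_i: narrower where the pilot
       density is high, wider where it is low.
    """
    pilot = [fixed_kde(s, samples, h) for s in samples]  # all > 0 for a Gaussian kernel
    g = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    lams = [(p / g) ** -alpha for p in pilot]
    return sum(gaussian_kernel(x - s, h * lam)
               for s, lam in zip(samples, lams)) / len(samples)

# Hypothetical samples: a tight cluster near 0 and one isolated point at 5.
samples = [0.0, 0.1, 0.2, 0.3, 5.0]
print(fixed_kde(0.15, samples, 0.5), adaptive_kde(0.15, samples, 0.5))
```

For brevity the pilot values are recomputed on every call; since each query already costs a pass over all samples, a practical version would compute the local bandwidths once and reuse them, which matters given the computational cost noted above.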
Data Set
• Washington, DC crime data
• Crime incident reports in parseable formats:
  – XML, Text/CSV, KML, or ESRI
• Geographic feature layers are also available for download (could not verify, but was told by a very reliable source)
• Other regional information is available (e.g., census tracts)
• http://data.octo.dc.gov
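For the Text/CSV format, a loader might look like the sketch below. The column names `LATITUDE` and `LONGITUDE` are hypothetical defaults, not verified against the DC files; check the header of the downloaded file and pass the real names. Coordinates in degrees should be projected to a planar system before computing Euclidean distances for the NN search.

```python
import csv
import io

def load_incidents(text_stream, lat_col="LATITUDE", lon_col="LONGITUDE"):
    """Return (longitude, latitude) pairs from a CSV crime export.

    lat_col / lon_col are hypothetical defaults: pass the names used in the
    header of the file you actually download. Rows with blank or non-numeric
    coordinates are skipped.
    """
    reader = csv.DictReader(text_stream)
    missing = {lat_col, lon_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV header lacks columns: {sorted(missing)}")
    points = []
    for row in reader:
        try:
            points.append((float(row[lon_col]), float(row[lat_col])))
        except (TypeError, ValueError):
            continue  # blank, missing, or malformed coordinates
    return points

# Usage with an in-memory sample (hypothetical rows); for a real download,
# use open(path, newline="") instead of io.StringIO.
sample = io.StringIO("OFFENSE,LATITUDE,LONGITUDE\nTHEFT,38.9,-77.03\nROBBERY,,\n")
print(load_incidents(sample))
```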