Data Mining University Crime
Presenters: Stuart King
Konrad Kuczynski
Ying Liu
Data Sources
a) FBI Uniform Crime Reporting campus crime statistics for 669 schools from 1995 through 2007 (except 2004):
   1995-2005: http://www.securityoncampus.org/crimestats/index.html
   2006-2007: http://www.fbi.gov/ucr/cius2006/offenses/standard_links/universities_colleges.html
              http://www.fbi.gov/ucr/cius2007/offenses/standard_links/universities_colleges.html
b) City Crime Data: 1999-2007: http://www.fbi.gov/ucr/ucr.htm
c) State Poverty: 1999-2007: http://www.census.gov/hhes/www/saipe/county.html
Data Mining Tasks
• Cluster/Classify
  1: Prediction of missing population values
     Used for improving both charting and anomaly detection of Per Capita crime data
  2: Prediction of crime trends
     Can be used by universities to allocate resources for the coming year
• Outlier Detection
  3: Detect anomalies in crime
     Can be used by universities to target “root cause and prevention” projects
Data Preprocessing
• University crime data cleanup
  669 universities appeared under 971 different names
  136 universities were removed because they reported fewer than 5 years of data
  If a year of crime data was missing, a fabricated record of averages was added
• Matching City, State, and Zip code information for each University
• Cleaned and merged data sets:
  5,869 records of crime data for 533 universities with City/State/Zip code added
  1,712 records of crime data for 249 universities with the following added: City/State/Zip, City Crime Data, State Poverty Data
• Additionally, Per Capita and Difference values were calculated
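The cleanup rules above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which averages filled a missing year, so the sketch assumes the university's own mean over its reported years. The function name `preprocess`, the input layout, and the sample universities are hypothetical.

```python
from statistics import mean

MIN_YEARS = 5  # universities reporting fewer years than this are removed


def preprocess(records, years):
    """Drop sparse universities and fill missing years.

    records: {university: {year: crime_count}} (hypothetical layout)
    years:   every year expected in the data set
    """
    cleaned = {}
    for name, by_year in records.items():
        if len(by_year) < MIN_YEARS:
            continue  # too little history to keep
        # Assumption: a missing year is filled with this university's own average.
        average = mean(by_year.values())
        cleaned[name] = {year: by_year.get(year, average) for year in years}
    return cleaned


# 1995 through 2007, except 2004, as in the data sources
years = [y for y in range(1995, 2008) if y != 2004]
records = {
    "Univ A": {1995: 10, 1996: 14, 1997: 12, 1998: 16, 1999: 8},  # kept
    "Univ B": {1995: 3, 1996: 5},                                 # removed
}
cleaned = preprocess(records, years)
print(sorted(cleaned))         # only "Univ A" survives
print(cleaned["Univ A"][2000])  # a filled-in year
```

Averaging across all universities for the missing year would be an equally plausible reading of the slide; only the fill rule inside the loop would change.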
Data Mining
• Data used for each task
Columns: St., StPov%, Pop., Vio., Murd., Rape, Rob, Aslt, Prpty, Burg, Theft, Car, Arson
Task 1, G: | | in one column
Task 1, P: | | in one column, % in ten columns
Task 2, G: U∆ in nine columns
Task 2, P: S∆ in one column, plus eleven U∆/C∆ pairs
Task 3: %U∆ in nine columns
• 1: Population cluster/classify
• 2: Crime Trend cluster/classify
• 3: Anomaly Detection
• G: Grouping = Clustering
• P: Prediction = Classifying
• %: Per capita values
• ∆: Difference values
• | |: Absolute values
• U: University
• C: City
• S: State
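As a rough illustration of the % and ∆ values in the legend, the sketch below computes a per capita rate and a series of differences. Two things are assumptions, not taken from the slides: rates are scaled per 1,000 people, and ∆ is the change between consecutive years. The function names and sample numbers are hypothetical.

```python
def per_capita(count, population, per=1000):
    """Crime count scaled to a rate per `per` people (assumed scale)."""
    return count * per / population


def differences(series):
    """Change between consecutive values (assumed meaning of the ∆ values)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical burglary counts and enrollment for three consecutive years
burglaries = [20, 30, 12]
population = [10000, 12000, 8000]

rates = [per_capita(c, p) for c, p in zip(burglaries, population)]
print(rates)               # [2.0, 2.5, 1.5]
print(differences(rates))  # [0.5, -1.0]
```

Under these assumptions, a %U∆ value would be the difference of a university's per capita rates, which is what the last line prints.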
Data Mining
• Algorithm used for each task

Task                        Mining             Algorithm
1: Population Prediction    Clustering         EM
                            Classification     Decorate using J48
2: Crime Trend Prediction   Clustering         EM
                            Classification     J48
3: Crime Anomalies          Outlier Detection  DBSCAN with minPoints=1
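The slides do not explain how outliers were read off the DBSCAN result. One plausible reading, sketched below, follows from the parameter choice: with minPoints=1 every point counts as a core point, so DBSCAN never labels anything as noise, and an isolated record instead ends up in a cluster by itself. The sketch is a plain textbook DBSCAN written for illustration, not the tool the presenters used. The `eps` value, the two-dimensional sample points, and the helper `small_cluster_outliers` are assumptions.

```python
from collections import Counter
from math import dist


def dbscan(points, eps, min_points=1):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Every point within eps of point i, including i itself
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1  # never reached when min_points=1
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_points:
                queue.extend(reachable)  # j is a core point, keep expanding
        cluster += 1
    return labels


def small_cluster_outliers(labels, max_size=1):
    """Assumed rule: noise and clusters of at most max_size points are outliers."""
    sizes = Counter(labels)
    return [i for i, lab in enumerate(labels) if lab == -1 or sizes[lab] <= max_size]


# Hypothetical difference values for two crime categories
points = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (0.1, 0.2)]
labels = dbscan(points, eps=0.5, min_points=1)
print(small_cluster_outliers(labels))  # [3]
```

With minPoints=1 the clusters are simply groups of points chained together by distances of at most `eps`, so the choice of `eps` alone decides how many records are flagged.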
Visualizations
• Summary charts and graphs for Per Capita and Clustering data
• Interactive Map showing cluster changes for 533 Universities
• Interactive Map showing predicted clusters for 249 Universities
• Interactive Map showing where 355 outliers occurred
• Interactive Charts showing values for outliers
http://www.cse.msu.edu/~kingstua/Team3