Crime Analytics: Analysis of Crimes Through Newspaper Articles
Indika Perera
Isuru Jayaweera
Chamath Sajeewa
Sampath Liyanage
Tharindu Wijewardane
Department of Computer Science and Engineering
Faculty of Engineering
University of Moratuwa
Sri Lanka
Adeesha Wijayasiri
Department of Computer and Information Science and Engineering
University of Florida
United States
Introduction
Why Crime Analysis?
• To identify general and specific crime trends, patterns and
series in an ongoing, timely manner.
• To make effective use of limited law enforcement resources.
• To be proactive in detecting and preventing crimes.
• To meet the law enforcement needs of a changing society.
A major challenge faced by most law enforcement and intelligence
organizations is analyzing the growing volume of crime-related data efficiently
and accurately. The wide geographical spread and the complexity of crime patterns
make analyzing and recording crime data more difficult. Data mining is a
powerful tool that can be used effectively to analyze large databases and derive
important analytical results.
Crime Analysis Approaches Used in
Sri Lanka
• The police department uses a manual crime recording and
analysis system.
• There is no free, openly accessible crime analysis
system in Sri Lanka.
Grave Crime Records
Minor Crime Records
Usefulness of Crime Analysis System
● The police department can use the system when creating
security plans.
● The police department can evaluate its existing plans.
● Investors can use the system to find suitable
areas for investment.
● Tourists and tour agents can use the system when
planning tours.
Related Work
Crime Data Mining Techniques
• Entity extraction
• Association rule mining
• Deviation detection
• Classification
Existing systems that use data mining techniques for
crime investigation
• Regional crime analysis program
• Link analysis concepts
• Data mining framework for crime pattern identification
• Narcotics network analysis in the Tucson Police Department
• Clustering techniques
• Sequential pattern mining
• String comparison
A framework has been proposed that relates crime data mining techniques to the
characteristics of crime types. It was developed using the Tucson Police Department
crime classification database. Using this framework, an investigator can determine
the most suitable data mining technique for a given task. According to the
framework, neural network techniques suit crime entity extraction and prediction,
clustering techniques are effective for crime association and prediction, and social
network analysis can facilitate crime association and pattern visualization.
Crime Analysis Steps
• Hotspot Detection
• Crime Pattern Visualization
• Nearest Police Station Detection
• Crime Comparison
• Crime Clock
• Outbreak Detection
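As a rough illustration of the hotspot detection step listed above, the sketch below counts incidents per grid cell and reports the busiest cells. This is only one simple way to find hotspots and is not taken from the slides; the function names `grid_cell` and `hotspots`, the 0.1 degree cell size, and the coordinate-pair input format are all assumptions made for the example.

```python
import math
from collections import Counter

def grid_cell(lat, lon, cell_size=0.1):
    """Map a coordinate to the integer index of the grid cell containing it.

    cell_size is in degrees; 0.1 is an arbitrary illustrative value.
    """
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))

def hotspots(incidents, cell_size=0.1, top_n=3):
    """Return the top_n grid cells by incident count as (cell, count) pairs.

    incidents is an iterable of (latitude, longitude) pairs.
    """
    counts = Counter(grid_cell(lat, lon, cell_size) for lat, lon in incidents)
    return counts.most_common(top_n)
```

A grid count keeps the idea easy to follow. Density-based approaches such as clustering would avoid the sensitivity to where cell boundaries happen to fall.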
Preliminaries
• Web Crawling – multi-threaded crawlers, preferential crawlers, focused crawlers
• Document Classification – stop word removal, lemmatization/stemming, tf-idf, syntactic and semantic arrangement of words, support vector machines, WordNet/ontology, different error costs, sampling techniques, cross-validation
• Entity Extraction – sentence splitting, tokenizing, POS tagging, supervised/semi-supervised/unsupervised entity extraction
• Duplicate Detection – near-duplicate detection, fingerprints (shingles, simhash), Hamming distance
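To make the document classification preliminaries listed above more concrete, here is a minimal sketch of stop word removal followed by tf-idf weighting. It uses one common tf-idf variant (relative term frequency times the log of inverse document frequency) and may differ from the weighting used in the actual system. The stop word list and the function names `tokenize` and `tf_idf` are hypothetical.

```python
import math
from collections import Counter

# Tiny illustrative stop word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "of", "and", "was", "on", "at", "to"}

def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per input document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

Terms that occur in every document get weight zero, so words common to all articles contribute nothing to the feature vector passed to a classifier such as a support vector machine.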
The Proposed System
This paper presents a web-based crime analysis system.
Sri Lankan English newspapers (Daily Mirror, The Island,
and Ceylon Today) are used as the source for details of
crime incidents.
Newspaper articles are crawled using a focused crawler
and then classified.
The required entities are extracted from the classified crime
articles, and duplicate detection is performed.
Crime analysis operations are performed on this preprocessed
data, and the results are displayed
in a web-based GUI.
Unlike most systems, this system is open to anyone who
is interested in crime analysis.
The Proposed System (cont.)
Pipeline stages (diagram): Web Crawling, Document Classification, Entity Extraction, Duplicate Detection, Crime Analysis and Prediction
Our Solution
• Crawler – crawler4j, Jsoup, cookie handler
• Document Classifier – Weka, LibSVM, SMOTE, Different
Error Costs
• Entity Extractor – GATE, ANNIE, Stanford NLP, Google
Maps API
• Duplicate Detector – 64-bit simhash calculator,
MurmurHash calculator, Hamming distance
• Web Interface – Highcharts, JavaScript, AJAX
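The duplicate detector named in the list above combines a 64-bit simhash with Hamming distance. The sketch below shows how those pieces usually fit together; it is not the system's actual code. It makes several assumptions: tokens are whitespace-separated words with equal weight, truncated MD5 stands in for MurmurHash (which is not in the Python standard library), and the threshold of 3 differing bits is only an illustrative value.

```python
import hashlib

BITS = 64

def hash64(token):
    """Stable 64-bit hash of a token (truncated MD5, a stand-in for MurmurHash)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def simhash(text):
    """Compute a 64-bit simhash fingerprint from whitespace-separated tokens."""
    weights = [0] * BITS
    for token in text.lower().split():
        h = hash64(token)
        for i in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def is_near_duplicate(text_a, text_b, threshold=3):
    """Treat two texts as near duplicates if their fingerprints differ in few bits."""
    return hamming_distance(simhash(text_a), simhash(text_b)) <= threshold
```

Because similar texts share most tokens, most bit votes agree and the fingerprints end up close in Hamming distance, which lets articles reporting the same incident be compared without matching full text.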
Implementation Details
Results
Crime Hot Spot Analysis
Crime Comparison
Crime Pattern Visualization
Conclusion
• The proposed system performs crime analysis
operations such as hotspot detection, crime comparison
and crime pattern visualization.
• The system's graphical user interface displays results as
graphs and diagrams, which simplifies crime
analysis.
• Law enforcement officers and other
interested users can therefore use the system
effectively and efficiently for crime analysis.
• The system is also publicly accessible, so anyone
interested in this area can use it
freely.
Future Work
• Crime prediction is expected to be implemented in the future
to extend the functionality of the system.
• The news article collection can be made more
comprehensive by extending the crawler to
cover more news websites.
• Linguistic knowledge (WordNet, ontologies, etc.) can be
incorporated into the document classifier module
to improve classification
accuracy.
• The entity extraction module can be improved by
adding more rules, which would improve the accuracy
and comprehensiveness of the entity extraction process.
Questions?
Thank You!