mobile device forensics using nlp


Upload: ankita-jadhao

Post on 07-Jan-2017


05/02/2023 Department of Computer Science and Engineering


MOBILE DEVICE FORENSICS USING NLP

Presented By: Ankita Jadhao

Roll no. CSE15S2002

Supervised By: Dr. A. J. Agrawal

Department of Computer Science & Engineering

Shri Ramdeobaba College of Engineering and Management

Nagpur

Contents

• Introduction
• Motivation
• Review of Literature
• Issues
• Methodology
• Existing System
• Advantages
• Bibliography


Introduction

• Text mining is also known as Text Data Mining (TDM) and Knowledge Discovery in Textual Databases (KDT).
• Text mining tasks:
 1. Exploratory data analysis
 2. Information extraction
 3. Text classification

Fig. 1 Overview of Process


Introduction

Where Text Mining Is Used

• Biomedical applications - Identification of biological entities such as protein and gene names, as well as chemical compounds and drugs.

• Software applications - Firms develop text mining software to automate the mining and analysis processes and to improve their results.
 - Software for tracking and monitoring terrorist activities.

• Online media applications - Provide readers with better search experiences, which in turn increases site “stickiness” and revenue.

• Marketing applications - Analytical customer relationship management.


Introduction

• Sentiment analysis
 - Analysis of movie reviews to estimate how favorable a review is for a movie.
 - Text has been used to detect emotions in the related area of affective computing.

• Security applications
 - Monitoring and analysis of online plain-text sources, such as Internet news and blogs, for national security purposes.
 - Detection of criminal activity.


Introduction


• Mobile phones are used to store and transmit personal and corporate information.

• Mobile phone devices are used by criminals and examined by law enforcement.

• Only limited corpora are available.

• A simple methodology is proposed for feature extraction.

• What is a corpus? A text corpus is a large and structured set of texts (plural: corpora).

Motivation

• Mobile devices are growing rapidly in number.
• The average cell phone user sends or receives over 15,000 texts annually.
• The average 18-24 year old sends or receives almost 40,000 text messages every year.
• Most tools and methodologies merely acquire all supported data and dump the output to a spreadsheet or HTML report.
• Search hits must be manually examined and noted in a report.
• Problems:
 1. Simple keyword searches
 2. Limited corpora


Literature Review

• Corpora: A corpus linguistics study of SMS text messaging [3]
 - Tagg developed a text message corpus in British English, but an American English corpus focused on forensic applications is desirable.
 - This applies even to neutral (non-drug-related) text messages, because the language differs significantly and would skew results.

• Integrating Machine Learning into the Forensic Process
 Approaches:
 1. The digital forensic process can be summarized as preservation, isolation, correlation, and logging [4].
 2. The process begins with acquisition, proceeds to analysis, and concludes with presentation [5].
 3. The process consists of preservation, extraction, and then interpretation [1].


Literature Review

• Natural Language Processing: Dela Rosa and Ellen
 - Detecting linguistic patterns is an invaluable tool when applied to text messaging data.
 - NLTK machine learning algorithms can be trained on a training set and assessed on a test set to build an experimental model.
 - They applied k-nearest neighbor (kNN) and support vector machine (SVM) algorithms to micro-text classification.
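A minimal sketch of kNN micro-text classification, assuming bag-of-words count vectors compared by cosine similarity and a majority vote among the k most similar training messages. The toy messages and the names `vectorize`, `cosine`, and `knn_classify` are hypothetical and are not taken from Dela Rosa and Ellen's work; SVM is not shown.

```python
import math
from collections import Counter

# Hypothetical toy training data: (message, label), 1 = drug-related, 0 = neutral.
TRAINING = [
    ("want to smoke weed tonight", 1),
    ("got some weed for the party", 1),
    ("can you hook me up with some pills", 1),
    ("see you at school tomorrow", 0),
    ("pull the weeds in the garden", 0),
    ("dinner is at six tonight", 0),
]


def vectorize(text: str) -> Counter:
    """Bag-of-words count vector built from lowercased whitespace tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(text: str, training, k: int = 3) -> int:
    """Label a message by majority vote among its k most similar training messages."""
    query = vectorize(text)
    ranked = sorted(training, key=lambda pair: cosine(query, vectorize(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


print(knn_classify("lets smoke weed later", TRAINING))           # 1
print(knn_classify("see you at the garden tomorrow", TRAINING))  # 0
```

Because micro-texts share very few words, many similarities are zero; a real system would need a much larger corpus than this toy list.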


Issues in Mobile Forensics

• Lack of available corpora
• Accuracy problem
• Feature extraction
• Micro-text problem


Overview


• Mobile Device Forensic Extraction

• Text Message Corpus

• Feature Extraction

• Supervised Machine Learning

Mobile Device Forensic Extraction

• Text messages were extracted from a mobile device.

• Administrative access to the device was gained by using the redsn0w software to “jailbreak” the device.

• The text message database was accessed on the device by navigating to its default location.

• An MD5 hash value was computed for the text message database file to verify mathematically that the file had not been altered during the execution of the methodology.
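The hash check described above can be sketched with Python's standard `hashlib` module: record the MD5 digest once at acquisition and compare it again after analysis. The helper names and the temporary stand-in file are hypothetical; the device's actual database path is not specified here.

```python
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unaltered(path: str, baseline: str) -> bool:
    """True if the file's current MD5 matches the hash recorded at acquisition."""
    return md5_of_file(path) == baseline


if __name__ == "__main__":
    # Stand-in for an extracted text message database (hypothetical file).
    with tempfile.NamedTemporaryFile(delete=False, suffix=".db") as tmp:
        tmp.write(b"example database bytes")
        db_path = tmp.name
    baseline = md5_of_file(db_path)           # recorded before analysis
    print(is_unaltered(db_path, baseline))    # True: file unchanged
    with open(db_path, "ab") as handle:       # simulate an accidental modification
        handle.write(b"!")
    print(is_unaltered(db_path, baseline))    # False: hash no longer matches
    os.remove(db_path)
```

MD5 matches what the slide describes; many examiners also record a SHA-256 value, which `hashlib.sha256` provides in the same way.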


Corpora


1. Collect the corpus data.
2. Save the text in plain-text format.
3. Provide an identification of the text at its beginning.
4. Carry out any pre-processing of the text.
5. Save the corpus in Extensible Markup Language (XML) format.

Fig 3 Common Structures for Text Corpora

Corpora


<?xml version="1.0" encoding="UTF-8"?>
<corpus_data>
  <text_message>
    <class>0</class>
    <subscriber>1</subscriber>
    <message_body>Text Message</message_body>
    <timestamp>9/4/2012 2:40 PM</timestamp>
    <type>Incoming</type>
  </text_message>
</corpus_data>

• Class indicates whether each individual text message is drug-related (1) or neutral (0).

• The data were modified, and additional text messages were developed.
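A corpus in the XML layout shown above can be loaded with the standard library's `xml.etree.ElementTree`. This is a minimal sketch: the first record copies the sample record, while the second record (including the `Outgoing` type value) and the function name `load_corpus` are hypothetical.

```python
import xml.etree.ElementTree as ET

# First record copied from the sample above; the second is a hypothetical class-1 entry.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<corpus_data>
  <text_message>
    <class>0</class>
    <subscriber>1</subscriber>
    <message_body>Text Message</message_body>
    <timestamp>9/4/2012 2:40 PM</timestamp>
    <type>Incoming</type>
  </text_message>
  <text_message>
    <class>1</class>
    <subscriber>2</subscriber>
    <message_body>Another text message</message_body>
    <timestamp>9/4/2012 2:45 PM</timestamp>
    <type>Outgoing</type>
  </text_message>
</corpus_data>"""


def load_corpus(xml_bytes: bytes) -> list:
    """Parse <text_message> records into dicts with an integer class label."""
    root = ET.fromstring(xml_bytes)
    records = []
    for message in root.iter("text_message"):
        records.append({
            "label": int(message.findtext("class")),  # 1 = drug-related, 0 = neutral
            "subscriber": message.findtext("subscriber"),
            "body": message.findtext("message_body"),
            "timestamp": message.findtext("timestamp"),
            "type": message.findtext("type"),
        })
    return records


corpus = load_corpus(SAMPLE.encode("utf-8"))
labeled = [(record["body"], record["label"]) for record in corpus]
print(labeled)  # [('Text Message', 0), ('Another text message', 1)]
```

The resulting `(text, label)` pairs are the shape most supervised learners, including NLTK's classifiers after feature extraction, expect as training input.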

Information Extraction System


Fig 1 Simple Pipeline Architecture for an Information Extraction System

We first convert the unstructured data of natural language sentences into structured data.

This process of getting meaning from text is called Information Extraction.

Information Extraction System

Example:

String: We saw the yellow dog


Fig 2 Segmentation and Labeling at both the Token and Chunk Levels
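The chunk-level labeling for “We saw the yellow dog” can be sketched as a small noun-phrase chunker over part-of-speech tags. This sketch assumes hand-assigned Penn Treebank tags and the pattern DT? JJ* NN, plus a rule (an assumption added here) that a lone pronoun (PRP) also forms a noun phrase. In NLTK the same grammar would typically be written as `nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")` applied to the output of `nltk.pos_tag`; the pure-Python version avoids needing NLTK's tagger models.

```python
# Penn Treebank tags for "We saw the yellow dog", assigned by hand for this sketch.
TAGGED = [("We", "PRP"), ("saw", "VBD"), ("the", "DT"), ("yellow", "JJ"), ("dog", "NN")]


def chunk_noun_phrases(tagged):
    """Group tokens into NP chunks using the pattern DT? JJ* NN, or a lone PRP.

    Tokens outside any chunk are labeled "O".
    """
    chunks = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] == "PRP":                         # a pronoun forms its own NP
            chunks.append(("NP", [tagged[i][0]]))
            i += 1
            continue
        j = i
        if tagged[j][1] == "DT":                          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # any number of adjectives
            j += 1
        if j < len(tagged) and tagged[j][1] == "NN":      # required head noun
            chunks.append(("NP", [word for word, _ in tagged[i:j + 1]]))
            i = j + 1
        else:                                             # no NP starts here
            chunks.append(("O", [tagged[i][0]]))
            i += 1
    return chunks


print(chunk_noun_phrases(TAGGED))
# [('NP', ['We']), ('O', ['saw']), ('NP', ['the', 'yellow', 'dog'])]
```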

Feature Extraction


Data Representation
- “Bag of words” is the most commonly used representation, with either word counts or binary (present/absent) values.
- “Phrases” can also be used for commonly occurring combinations of words.

There are three aspects of feature extraction:
• Feature construction
• Feature subset generation (or search strategy)
• Evaluation criterion estimation
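A minimal sketch of the first two aspects, assuming a simple regex tokenizer: `bag_of_words` constructs count or binary features, and `top_k_vocabulary` stands in for subset generation by keeping the words that occur in the most messages. Both names and the document-frequency criterion are illustrative choices, not the study's method; evaluation (for example, accuracy on held-out messages) is not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(text: str, binary: bool = False) -> dict:
    """Feature construction: word counts, or 1 for presence when binary=True."""
    counts = Counter(tokenize(text))
    return {word: 1 for word in counts} if binary else dict(counts)


def top_k_vocabulary(texts, k: int) -> list:
    """A very simple feature subset: the k words that appear in the most messages."""
    document_frequency = Counter(word for text in texts for word in set(tokenize(text)))
    return [word for word, _ in document_frequency.most_common(k)]


print(bag_of_words("smoke weed, smoke"))               # {'smoke': 2, 'weed': 1}
print(bag_of_words("smoke weed, smoke", binary=True))  # {'smoke': 1, 'weed': 1}
```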

Approach for Feature Extraction


• A count of known drug-related unigrams was used as a feature.
• NLTK was used to identify bigrams of interest.
• The alternative approach was to use two-word pairs (bigrams) as features and allow the algorithm to determine which bigrams were most effective in classifying text messages as drug-related or neutral.

Example:
1. “After school today let’s go smoke some weed at my house.”

2. “Hey pull that weed in my flower garden when you get home.”
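The unigram-count feature can be sketched as follows. `DRUG_TERMS` is a hypothetical lexicon, since the study's actual list of known drug-related unigrams is not given; the two messages are the examples above.

```python
import re

# Hypothetical lexicon of drug-related unigrams (the study's actual list is not given).
DRUG_TERMS = {"weed", "acid", "pills"}

MESSAGES = [
    "After school today let's go smoke some weed at my house.",  # drug-related
    "Hey pull that weed in my flower garden when you get home.",  # neutral
]


def drug_term_count(message: str) -> int:
    """Feature: how many tokens in the message appear in DRUG_TERMS."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return sum(1 for token in tokens if token in DRUG_TERMS)


for message in MESSAGES:
    print(drug_term_count(message), message)
# Both messages score 1, so this feature alone cannot tell them apart.
```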

Approach for Feature Extraction


• The first text message is drug-related, while the second is neutral; a feature based only on the unigram “weed” would therefore flag the second message as a false positive.

• The hypothesis was that drug-related terms would occur in frequent bigrams, such as “smoke weed,” “mary jane,” “hit acid,” and “pop pilz,” and that these bigrams would increase classification accuracy.
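A minimal sketch of bigram features in the style of an NLTK feature dictionary, applied to the two example messages. The `contains(...)` key format and the regex tokenizer are assumptions; `nltk.bigrams` over the same tokens would yield the same word pairs.

```python
import re


def bigrams(message: str) -> list:
    """Adjacent word pairs over lowercased tokens."""
    tokens = re.findall(r"[a-z']+", message.lower())
    return list(zip(tokens, tokens[1:]))


def bigram_features(message: str) -> dict:
    """Feature dictionary with one boolean entry per bigram present."""
    return {f"contains({a} {b})": True for a, b in bigrams(message)}


drug = bigram_features("After school today let's go smoke some weed at my house.")
neutral = bigram_features("Hey pull that weed in my flower garden when you get home.")
print(sorted(k for k in drug if "weed" in k))     # ['contains(some weed)', 'contains(weed at)']
print(sorted(k for k in neutral if "weed" in k))  # ['contains(that weed)', 'contains(weed in)']
print(set(drug) & set(neutral))                   # set()
```

Although both messages contain “weed,” they share no bigram, so a classifier can weight “some weed” and “that weed” differently. Note that “smoke weed” is not a contiguous bigram in the first message because “some” intervenes.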

Algorithm


Supervised Machine Learning


• Input: the text message corpus.

• Bigrams were selected as features.

• The system was trained using NLTK’s implementation of the Naïve Bayes classifier.

• It was hypothesized that a smaller training set might increase accuracy.
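A from-scratch sketch of Naïve Bayes over bigram features, assuming a multinomial model with add-one (Laplace) smoothing and a hypothetical toy training set. This is not NLTK's implementation: `nltk.NaiveBayesClassifier.train` takes a list of (feature dictionary, label) pairs and uses its own smoothing estimator, so its probabilities will differ.

```python
import math
import re
from collections import Counter, defaultdict


def bigrams(text: str) -> list:
    """Adjacent lowercase word pairs joined as strings, e.g. 'smoke some'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


class BigramNaiveBayes:
    """Multinomial Naive Bayes over bigram counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def train(self, labeled):
        self.class_counts = Counter(label for _, label in labeled)
        self.bigram_counts = defaultdict(Counter)
        for text, label in labeled:
            self.bigram_counts[label].update(bigrams(text))
        self.vocab = {bg for counts in self.bigram_counts.values() for bg in counts}
        return self

    def log_score(self, text: str, label) -> float:
        """log P(label) plus the sum of log P(bigram | label) over known bigrams."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        counts = self.bigram_counts[label]
        denominator = sum(counts.values()) + self.alpha * len(self.vocab)
        for bg in bigrams(text):
            if bg in self.vocab:  # bigrams never seen in training are skipped
                score += math.log((counts[bg] + self.alpha) / denominator)
        return score

    def classify(self, text: str):
        return max(self.class_counts, key=lambda label: self.log_score(text, label))


# Hypothetical toy training set: 1 = drug-related, 0 = neutral.
TRAIN = [
    ("lets go smoke some weed tonight", 1),
    ("got some weed if you want to smoke", 1),
    ("wanna pop some pills at the party", 1),
    ("pull that weed in the garden", 0),
    ("see you after school today", 0),
    ("dinner at my house tonight", 0),
]

model = BigramNaiveBayes().train(TRAIN)
print(model.classify("smoke some weed after school"))        # 1
print(model.classify("pull that weed in the garden today"))  # 0
```

Working in log space avoids underflow when many small probabilities are multiplied; the training-set-size hypothesis could be explored by training on subsets of the corpus and comparing accuracy on the same held-out messages.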

Applications of Mobile Forensics

• SMS analysis techniques are highly applicable to Twitter “tweet” analysis, since tweets are also short micro-texts.

• It is useful for corporate investigations and for criminal and civil defense.

• Law enforcement investigators can use it to analyze social media profiles for evidence of criminal activity.


Conclusion


• Natural language processing and machine classification have been applied to mobile device forensic analysis in a novel way.

• Researchers working on text message classification are free to develop a better methodology using the text message corpus.

• To support the development of more efficient corpora, the corpus has been made available to the research community.

Future Work


• The “micro-text” problem could be overcome by using more efficient feature extraction techniques.

• Future research recommendations include determining the frequency of text messaging between criminal suspects.

• Another recommendation is calculating the average time span between sent and received messages in text message conversation threads.
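Both future-work measurements can be sketched with the standard `datetime` module, assuming the corpus timestamp format shown earlier (“9/4/2012 2:40 PM”) and a thread given as (timestamp, type) pairs. The `Outgoing` type value, the sample thread, and the definition of a reply gap (time between consecutive messages whose direction changes) are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

# Timestamp format of the corpus sample ("9/4/2012 2:40 PM").
TIME_FORMAT = "%m/%d/%Y %I:%M %p"


def parse_timestamp(ts: str) -> datetime:
    return datetime.strptime(ts, TIME_FORMAT)


def messages_per_day(thread) -> Counter:
    """Messaging frequency: number of messages on each calendar day."""
    return Counter(parse_timestamp(ts).date() for ts, _ in thread)


def average_reply_gap_minutes(thread):
    """Mean minutes between consecutive messages whose direction changes
    (Incoming -> Outgoing or Outgoing -> Incoming); None if there are no replies."""
    events = sorted((parse_timestamp(ts), kind) for ts, kind in thread)
    gaps = [
        (later - earlier).total_seconds() / 60
        for (earlier, kind_a), (later, kind_b) in zip(events, events[1:])
        if kind_a != kind_b
    ]
    return sum(gaps) / len(gaps) if gaps else None


# Hypothetical thread between two subscribers: (timestamp, type).
THREAD = [
    ("9/4/2012 2:40 PM", "Incoming"),
    ("9/4/2012 2:45 PM", "Outgoing"),
    ("9/4/2012 2:47 PM", "Outgoing"),
    ("9/4/2012 3:07 PM", "Incoming"),
]

print(messages_per_day(THREAD))           # Counter({datetime.date(2012, 9, 4): 4})
print(average_reply_gap_minutes(THREAD))  # 12.5
```

Here the two direction changes are 5 and 20 minutes apart, so the average is 12.5 minutes; consecutive messages in the same direction are not counted as replies.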

References


1. D. R. O’Day and R. A. Calix, “Text message corpus: Applying natural language processing to mobile device forensics,” Purdue University Calumet, 2200 169th Street, Hammond, IN 46323, USA.

2. D. Phuc and N. T. K. Phung, “Using Naïve Bayes model and natural language processing for classifying messages on online forum,” in 2007 IEEE International Conference on Research, Innovation and Vision for the Future, March 2007, pp. 247-252.

3. A. Smith, “Americans and Text Messaging,” 2011. [Online]. Available: http://pewinternet.org media/Files/Reports/2011/Americans %20and%20Text%20Messaging.pdf

4. B. Carrier, File System Forensic Analysis. Boston, MA: Addison-Wesley, 2005, p. 8.

5. C. Altheide and H. Carvey, Digital Forensics with Open Source Tools. Waltham, MA: Syngress, 2011.

6. S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. Sebastopol, CA: O’Reilly Media, 2009, pp. 221-255.


Thank you!