Upload: muhd-muizuddin

Post on 14-Apr-2017


VISUALIZATION OF COMPUTER FORENSICS ANALYSIS ON DIGITAL EVIDENCE

Muhd Muizuddin b. Hj.Muhsinon, Nazri b. Ahmad Zamani

University Tenaga Nasional, CyberSecurity Malaysia


Abstract

This project explores the use of data science methodology to further analyze computer forensics analysis results. In computer forensics, analysis is carried out with forensic tools such as EnCase, FTK, and XRY. These tools have powerful engines for zooming in on digital evidence and finding information pertinent to an investigation. What they lack are statistical, machine learning and visualization functions that may be crucial for looking at the evidence in its entirety. The project explores methods to profile and visualize computer forensics analysis findings using Python and Jupyter Notebook. The EnCase CSV files from a real-life case analysis are loaded and analyzed using Python's scikit-learn (SKLearn) statistical and pattern recognition engine. The results are plotted using Python visualization tools such as Matplotlib, Seaborn, and Pandas.

Introduction

Computer technology is an integral part of everyday human life, and it is growing rapidly, as are computer crimes such as financial fraud, unauthorized intrusion, identity theft and intellectual property theft. Computer forensics plays a very important role in counteracting these computer-related crimes. Computer forensics involves obtaining and analysing digital information for use as evidence in civil, criminal or administrative cases [1].

A computer forensic investigation generally examines data taken from computer hard disks or other storage devices, in adherence to standard operating policies and procedures, to determine whether those devices have been compromised by unauthorized access [2]. Computer forensics investigators work as a team to investigate the incident and conduct the forensic analysis using various methodologies (e.g. static and dynamic) and tools (e.g. EnCase, whose CSV files from a real-life case are used here), in order to ensure that the computer network system in an organization is secure. A successful computer forensic investigator must be familiar with the laws and regulations related to computer crimes in their country (e.g. the Malaysian Computer Crimes Act, CCA 1997), with various computer operating systems (e.g. Windows, Linux) and with network operating systems (e.g. Win NT). This report analyzes methods for visualizing computer forensics analysis results using Python and Jupyter Notebook. The results are plotted so that they are easier to refer to and to improve upon [2].

Digital investigations are constantly changing as new technologies are utilized to create, store or transfer vital data [3]. Augmenting existing forensic platforms with innovative methods of acquiring, processing, reasoning about and providing actionable evidence is vital. Integrating open-source Python scripts with leading-edge forensic platforms like EnCase provides great versatility and can speed new investigative methods and processing algorithms to address these emerging technologies.

In Malaysia, law enforcement agencies are now faced with the task of enforcing the law in a cyberspace that transcends borders and raises issues of jurisdiction. Cybercrime has surpassed drug trafficking as the most lucrative crime. Almost anybody who is an active computer or online user would have been a cybercrime victim and, in most cases, a perpetrator as well. Cybercriminals usually use computers to cheat, harass and disseminate false information for their own benefit. This project aims to improve on investigation results by visualizing computer forensics analysis results using Python and Jupyter Notebook, turning raw data into something that is more easily understood as a whole. In this way, people can see an overview of the results and interpret them more accurately.

II.Problem Statement

During the analysis phase of a computer forensics crime scene investigation, the analyst may face numerous difficulties in obtaining a highly accurate result. Analysts only get raw information, which is less clear and harder to understand than a visualization. One of the problems is:

Computer forensics system lacks statistical and visualization tools.

There are key points that need to be considered in the investigation period of the digital evidence:

Evidence profiling is crucial in understanding relationships of the digital evidence activities timeline to the case investigation timeline.
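The idea of relating evidence activity to the case timeline can be illustrated with a minimal sketch. Everything in it is assumed for illustration: the file names, the timestamps, the timestamp format and the case window are invented, and the sketch is written for Python 3 using only the standard library.

```python
from datetime import datetime

# Hypothetical evidence records: (file name, last-written timestamp).
# These values are invented for illustration only.
records = [
    ("report.xls", "2013-04-02 09:15:00"),
    ("holiday.jpg", "2012-12-24 18:30:00"),
    ("invoice.pdf", "2013-04-20 11:05:00"),
    ("notes.doc", "2013-07-01 08:00:00"),
]

# Hypothetical case investigation window.
case_start = datetime(2013, 4, 1)
case_end = datetime(2013, 4, 30, 23, 59, 59)


def in_case_window(timestamp_text):
    """Return True if the timestamp falls inside the case window."""
    ts = datetime.strptime(timestamp_text, "%Y-%m-%d %H:%M:%S")
    return case_start <= ts <= case_end


# Keep only files whose activity overlaps the case timeline, oldest first.
relevant = sorted(
    (stamp, name) for name, stamp in records if in_case_window(stamp)
)
for stamp, name in relevant:
    print(stamp, name)
```

Filtering by a time window like this is only one possible way to profile evidence; a real investigation would choose the timestamp field and the window to suit the case.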

III.Workflow

Figure 3.1: Flowchart

Figure 3.2: Current Situation

Figure 3.3: Overview

Security analysts have problems obtaining accurate results because of the lack of statistical and visualization tools. They need to compare all the raw information from the digital evidence manually instead of visualizing it. A CSV file may contain a large amount of data from various sources. The system is needed so that analysts avoid spending excessive time and getting an unclear view of the CSV data. Jupyter Notebook with Python may assist analysts in gathering information and speed up the collection of evidence. Visualizations help to provide a clearer and more understandable view of the data in the CSV file.

To address these problems, the system provides a way to produce the visualizations. The .csv data is loaded into Jupyter Notebook with Python 2.7, and the user then chooses the type of analysis to be included. This part decides on the building blocks of the language, such as variables, datatypes, functions, conditionals and loops. A question that may be asked in this phase is what type of analysis has been chosen. In the models & algorithms part, the user may choose what kind of models to produce based on the coding part. After that, the visualization is generated on request. In the reporting part, the user may choose whether to export the data science results and code base to PDF, Microsoft Word or the web (HTML).

IV.Data Specimen

In computing, a comma-separated values (CSV) file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format. The CSV file format is not fully standardized. The basic idea of separating fields with a comma is clear, but that idea gets complicated when the field data may also contain commas or even embedded line breaks. CSV implementations may not handle such field data, or they may use quotation marks to surround the field. CSV data contains many datatypes and fields, so it needs to be cleaned in order to get a better view of the data. The csvkit library, which can be used alongside Python and Jupyter Notebook, helps to clean the data. This can be set up during the coding part of the system.
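As a minimal sketch of how quoted fields and basic cleaning might be handled, the following uses Python's standard csv module rather than csvkit. The column names (Name, File Ext, Last Written) and the sample rows are assumptions made up for illustration; a real EnCase export will have its own layout. The sketch assumes Python 3.

```python
import csv
import io

# Invented sample in an assumed layout; a real EnCase export will have
# different columns. Note the quoted field that contains a comma.
raw = (
    "Name,File Ext,Last Written\n"
    '"budget, final.xls", XLS ,2013-04-02\n'
    "photo1.JPG,jpg,2006-05-11\n"
    "broken_row_without_enough_fields\n"
)


def clean_rows(text):
    """Parse CSV text and return normalised rows, skipping malformed ones."""
    reader = csv.DictReader(io.StringIO(text))
    cleaned = []
    for row in reader:
        # DictReader fills missing columns with None; skip such rows.
        if any(value is None for value in row.values()):
            continue
        cleaned.append({
            "name": row["Name"].strip(),
            "ext": row["File Ext"].strip().lower(),
            "last_written": row["Last Written"].strip(),
        })
    return cleaned


rows = clean_rows(raw)
for row in rows:
    print(row)
```

The quoted first field keeps its embedded comma, the extension is trimmed and lower-cased so that "XLS" and "xls" are counted together, and the short row is dropped.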

Figure 1.4: Data .csv

V.Methodology

The methodology used by this project is the Security Data Visualization Process.

Figure 5.1: Security Data Visualization Process

Visualization Goals

In this step, an overview of the current situation is obtained. It is followed by gathering requirements from the security analyst, who is the main user. The requirements consist of determining the visualization goals for the specific case. To fulfil the requirements, the system is developed to achieve the visualization goals determined by the security analyst. The visualization goals may consist of the kind of information and the questions required by the security analyst.

Data Preparation

This step begins with finding the data and preparing it for analysis. The next step is to explore the data with the right questions, then visualize the data to generate insights and act on them. The most essential step before starting visualization is data cleansing, that is, making the data available in a usable format. An example is EnCase data from a CSV file: the system searches for the different kinds of files found on an external hard drive and represents them visually.

Explore

Asking the right questions prompts further exploration and visualization using statistical or probabilistic models and algorithms, and leads to useful insights and decisions. The statistical methods that are suitable are decided in this step. The explore phase looks at analytical activities that enable security teams to ask the right questions and to examine the data to see how they can achieve their goals.

Visualize

There are two aspects to visualization theory; one of them is style. There is literature on how to use color, hue, density and other aspects to create visually pleasing images for the target audience. There are many design guidelines in the book [4]. There is also a dedicated chapter in the book [5]. Samples of the visualizations that could be made, with some explanation, are given in the results.

Feedback

This step involves continuous improvement based on feedback from the stakeholders and the availability of new data. In the reporting part, data science results can be represented in many ways.

VI.Results

The data for this visualization was provided by CyberSecurity Malaysia. It consists of metadata from EnCase results in real forensic cases, in .csv format. The metadata is real data supplied by the Digital Forensics Department at CyberSecurity Malaysia, and it came from an exhibit, an external hard drive. Compared with a first impression from looking at the raw data alone, visualization turns the data into something that is more easily understood as a whole. People can then see an overview of the results and interpret them more accurately. In this way, analysts can make deductions from the material evidence that make it easier to identify the suspect.

Figure 6.1: Overall Data Pie Chart

This pie chart shows the percentage of each file type in the metadata file. According to the chart, .jpg is the file type most produced or kept by the suspect, followed by .xls, .pdf, .doc and lastly .pptx. The suspect showed a strong interest in .jpg files, to the extent that more than 50% of the data is of that type. The number of .xls files is nonetheless notable, representing 22% of the total data. The suspect likely has an overpowering interest in collecting images, but the suspect was also diligent in collecting data for calculation or analysis.
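The percentages behind a pie chart of this kind can be computed with a short sketch such as the one below. The list of extensions is invented for illustration and does not reproduce the case data; the sketch assumes Python 3 and uses only the standard library.

```python
from collections import Counter

# Invented list of file extensions standing in for the metadata column.
extensions = [
    "jpg", "jpg", "xls", "jpg", "pdf",
    "jpg", "doc", "xls", "jpg", "pptx",
]


def extension_shares(exts):
    """Return (extension, percentage) pairs, largest share first."""
    counts = Counter(exts)
    total = sum(counts.values())
    return [(ext, 100.0 * n / total) for ext, n in counts.most_common()]


shares = extension_shares(extensions)
for ext, pct in shares:
    print(f"{ext}: {pct:.1f}%")
```

The resulting labels and percentages could then be passed to a plotting function such as Matplotlib's `pie` to draw the chart itself.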

Figure 6.2: Data Compared by Month

The graph of the total number of metadata records shows that April is the most active period for the suspect to produce or keep data. It may therefore be inferred that April is the suspect's busiest month each year. April is followed by May, July and November, each of which also shows a relatively high count. This suggests that the suspect is most active in the middle of the year. January has the lowest count of data produced or kept by the suspect each year. As the graph shows, the counts for January and December are very different from those of the other months. One possible assumption is that in these two months the suspect took time off and was less interested in generating data.
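A per-month tally of this kind, pooled across all years, might be computed as in the following sketch. The dates and the date format are assumptions invented for illustration, not values from the case; the sketch assumes Python 3.

```python
from collections import Counter
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Invented timestamps standing in for a "last written" column.
timestamps = [
    "2006-04-03", "2007-04-19", "2010-05-02", "2011-04-27",
    "2012-11-08", "2013-01-15", "2013-07-30",
]


def counts_by_month(date_texts):
    """Count records per calendar month, pooling all years together."""
    months = Counter(
        datetime.strptime(text, "%Y-%m-%d").month for text in date_texts
    )
    # Report every month, including those with no records.
    return [(MONTHS[m - 1], months.get(m, 0)) for m in range(1, 13)]


monthly = counts_by_month(timestamps)
busiest = max(monthly, key=lambda pair: pair[1])
for name, count in monthly:
    print(name, count)
print("Busiest month:", busiest[0])
```

Months with no records are reported with a count of zero so that quiet periods remain visible in a bar chart.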

Figure 6.3: Data Compared by Years

This graph compares the file types of the data for each year. It shows that .jpg was the file type most produced or kept by the suspect. Based on the visualization, the most data was produced or kept in 2006 and 2013.
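A year-by-file-type comparison amounts to a cross-tabulation. The sketch below shows one way to build it with the standard library, assuming Python 3; the (year, extension) pairs are invented for illustration and are not the case data.

```python
from collections import Counter, defaultdict

# Invented (year, extension) pairs standing in for the metadata records.
records = [
    (2006, "jpg"), (2006, "jpg"), (2006, "xls"),
    (2011, "jpg"),
    (2013, "xls"), (2013, "xls"), (2013, "jpg"),
]


def crosstab(pairs):
    """Build a year -> Counter(extension -> count) table."""
    table = defaultdict(Counter)
    for year, ext in pairs:
        table[year][ext] += 1
    return table


table = crosstab(records)
for year in sorted(table):
    print(year, dict(table[year]))
```

Each year's counter can be plotted as one group of bars, which is what makes a shift between file types from one year to the next easy to see.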

From that, we can infer the suspect's behavior. The .jpg format is used for digital photos and other digital graphics, so we can conclude that the suspect likes pictures. From 1998 until 2011, the suspect kept producing or keeping this kind of data.

The suspect is likely to take great pictures, using the advantages of and an interest in images to get what he wants. One conclusion that can be drawn is that the suspect is a photographer. In 2013, however, the .xls format was no less prominent. It is a file extension for a spreadsheet file format created by Microsoft for use with Microsoft Excel. Microsoft Excel is a well-organized platform that gives users the freedom to write data in grids and worksheets, organized and formatted as they prefer. It is also widely used in business and finance.

The suspect may be someone who likes to write and to do things in his own way. Another conclusion that can be drawn is that the suspect is an analyst.

VII.Conclusion and Way Forward

There are suggestions that can be made for future work. Visualization results improve with the addition of information and the right technique. Numerical data can improve the quality of the visualization, making the graphs more attractive and easier to understand. If Jupyter Notebook can import more data libraries, more attractive forms of graphs can be produced.

In conclusion, the phases involved throughout the development of this system, from the idea, requirement gathering, analysis, design, coding and testing to the final presentation, were a very valuable journey of learning, failures, successes and persistence. This journey has changed how I view programming and has built my interest in it. Even though there are still many enhancements to be made in future, the current system fulfils the minimum requirements and solves the problems stated.

VIII.References

[1] Nelson, B., et al., Computer Forensics Investigation, 2008.

[2] Case studies, http://resources.infosecinstitute.com/, 2016.

[3] Noblett, M. G., Pollitt, M. M., Presley, L. A., Computer Forensics, https://en.wikipedia.org/wiki/Computer_forensics, October 2000.

[4] Tufte, E., The Visual Display of Quantitative Information, Cheshire, Conn.: Graphics Press, 1983.

[5] Marty, R., Applied Security Visualization, Upper Saddle River, NJ: Addison-Wesley, 2009.