Transcript
Page 1: Anomaly Detection Via PCA

Anomaly Detection via Online Over-Sampling Principal Component Analysis

Page 2: Anomaly Detection Via PCA

Guide: Ms. Nirmala, Senior Lecturer, Dept. of CSE

NAME            USN
Kumara BG       1NT11CS408
Mahesha GR      1NT11CS409
Mallikarjun S   1NT11CS410
Deepak Kumar    1NT10CS129

Page 3: Anomaly Detection Via PCA

Problem Statement

We propose an online over-sampling principal component analysis (osPCA) algorithm for detecting the presence of outliers in a large amount of data. Unlike prior PCA-based approaches, we do not store the entire data matrix or covariance matrix, so our approach is of particular interest for online or large-scale problems.

Page 4: Anomaly Detection Via PCA

Introduction

We are drowning in the deluge of data being collected worldwide, while starving for knowledge at the same time.

Anomalous events occur relatively infrequently.

Page 5: Anomaly Detection Via PCA

What are Anomalies?

An anomaly is a pattern in the data that does not conform to the expected behaviour.

Also referred to as outliers, exceptions, peculiarities, surprises, etc.

Anomalies translate to significant (often critical) real-life entities
◦ Credit card fraud
◦ An abnormally high purchase made on a credit card

Page 6: Anomaly Detection Via PCA

Motivation

National / International Journals

Page 7: Anomaly Detection Via PCA

Objectives

The aim of this project is to detect the presence of outliers in a very large sampled data set by finding the:
◦ Covariance matrix
◦ Eigenvalues
◦ Eigenvectors, which give the directions of the principal components
◦ Coordinates of each point along the direction of the principal component
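The quantities listed above can be illustrated with a minimal Java sketch. It is not the project's code: the class and method names are hypothetical, the covariance divides by n (an assumption; n - 1 is also common), and power iteration is used here as one simple way to approximate the dominant eigenvector.

```java
// Illustrative sketch only: class and method names are hypothetical.
class PcaSketch {

    /** Column-wise mean of the data (each row is one point). */
    static double[] mean(double[][] x) {
        int d = x[0].length;
        double[] m = new double[d];
        for (double[] row : x)
            for (int j = 0; j < d; j++) m[j] += row[j] / x.length;
        return m;
    }

    /** Population covariance matrix (divides by n; this is an assumption). */
    static double[][] covariance(double[][] x, double[] m) {
        int d = m.length;
        double[][] c = new double[d][d];
        for (double[] row : x)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    c[i][j] += (row[i] - m[i]) * (row[j] - m[j]) / x.length;
        return c;
    }

    /** Power iteration: approximates the unit eigenvector with the largest eigenvalue. */
    static double[] dominantEigenvector(double[][] c, int iterations) {
        int d = c.length;
        double[] v = new double[d];
        double start = 0;
        // Arbitrary start vector; it must not be orthogonal to the true direction.
        for (int i = 0; i < d; i++) { v[i] = i + 1; start += v[i] * v[i]; }
        for (int i = 0; i < d; i++) v[i] /= Math.sqrt(start);
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) w[i] += c[i][j] * v[j];
            double norm = 0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            if (norm == 0) break; // zero covariance: no direction to find
            for (int i = 0; i < d; i++) v[i] = w[i] / norm;
        }
        return v;
    }

    /** Eigenvalue estimate for a unit vector v: the Rayleigh quotient v^T C v. */
    static double eigenvalue(double[][] c, double[] v) {
        double s = 0;
        for (int i = 0; i < v.length; i++)
            for (int j = 0; j < v.length; j++) s += v[i] * c[i][j] * v[j];
        return s;
    }

    /** Coordinate of a point along the principal direction v, after centering. */
    static double project(double[] point, double[] m, double[] v) {
        double s = 0;
        for (int j = 0; j < v.length; j++) s += (point[j] - m[j]) * v[j];
        return s;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 1}, {2, 2}, {3, 3}, {4, 4}};
        double[] m = mean(x);
        double[][] c = covariance(x, m);
        double[] v = dominantEigenvector(c, 100);
        System.out.println("direction  = " + java.util.Arrays.toString(v));
        System.out.println("eigenvalue = " + eigenvalue(c, v));
        System.out.println("coordinate = " + project(x[3], m, v));
    }
}
```

Only the first principal direction is computed here; further directions would need deflation or a full eigen-decomposition from a linear algebra library.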

Page 8: Anomaly Detection Via PCA

Hardware Specification:
Processor - Pentium IV
RAM - 256 MB (min)
Hard Disk - 20 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse

Page 9: Anomaly Detection Via PCA

Software Specification
Operating System : Windows XP
Programming Language : Java
Java Version : JDK 1.6 & above
IDE tool : Eclipse

Page 10: Anomaly Detection Via PCA

Literature Survey:

Research Paper Referred :

Anomaly Detection via Online Oversampling Principal Component Analysis, by Yuh-Jye Lee, Yi-Ren Yeh and Yu-Chiang Frank Wang

Other References:

A Survey on Intrusion Detection Using Outlier Detection Techniques, by V. Gunamani and M. Abarna

Page 11: Anomaly Detection Via PCA

Design Of the Project :

Page 12: Anomaly Detection Via PCA

Algorithm - Principal Component Analysis :

PCA is a dimension reduction method.

PCA is sensitive to outliers, and only a few principal components are needed to represent the main data structure.

An outlier or a deviated instance will therefore cause a larger effect on these principal directions.

With PCA, outliers are detected by means of a “Leave One Out” (LOO) procedure.

Page 13: Anomaly Detection Via PCA

We explore how the principal directions vary when a data point is removed or added, and use this information to identify outliers and to detect newly arriving deviated data.

The effect of LOO on a particular data instance may be diminished when the size of the data is large.

To amplify the effect of an outlier under the LOO strategy, we duplicate the target instance instead of removing it.

Finally, we duplicate the target instance many times (10% of the whole data in our experiments) and observe how much the principal directions vary.
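The over-sampling idea can be sketched in Java as follows. This is an illustration under stated assumptions, not the project's code: the class and method names are hypothetical, duplication is modelled by giving the target row a weight of (1 + copies), the score is taken as one minus the absolute cosine between the original and the over-sampled principal direction, and the covariance is recomputed from scratch for clarity rather than updated online.

```java
// Illustrative sketch only: names and the exact score formula are assumptions.
class OversamplingScoreSketch {

    /**
     * Dominant principal direction when row `target` is counted (1 + copies) times.
     * Pass target = -1 to use the data unchanged.
     */
    static double[] direction(double[][] x, int target, int copies) {
        int n = x.length, d = x[0].length;
        double total = n + (target >= 0 ? copies : 0);
        double[] mean = new double[d];
        for (int r = 0; r < n; r++) {
            double w = (r == target) ? 1 + copies : 1;
            for (int j = 0; j < d; j++) mean[j] += w * x[r][j] / total;
        }
        double[][] cov = new double[d][d];
        for (int r = 0; r < n; r++) {
            double w = (r == target) ? 1 + copies : 1;
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i][j] += w * (x[r][i] - mean[i]) * (x[r][j] - mean[j]) / total;
        }
        // Power iteration for the dominant eigenvector of the covariance matrix.
        double[] v = new double[d];
        for (int i = 0; i < d; i++) v[i] = i + 1; // arbitrary non-zero start
        for (int it = 0; it < 500; it++) {
            double[] w = new double[d];
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++) w[i] += cov[i][j] * v[j];
            double norm = 0;
            for (double wi : w) norm += wi * wi;
            norm = Math.sqrt(norm);
            if (norm == 0) break; // degenerate data: keep the current vector
            for (int i = 0; i < d; i++) v[i] = w[i] / norm;
        }
        return v;
    }

    /** Score in [0, 1]: larger means duplicating the target rotates the direction more. */
    static double score(double[][] x, int target, int copies) {
        double[] u = direction(x, -1, 0);
        double[] ut = direction(x, target, copies);
        double dot = 0, nu = 0, nt = 0;
        for (int j = 0; j < u.length; j++) {
            dot += u[j] * ut[j];
            nu += u[j] * u[j];
            nt += ut[j] * ut[j];
        }
        return 1 - Math.abs(dot) / Math.sqrt(nu * nt);
    }

    public static void main(String[] args) {
        // Nine points on the line y = x plus one deviated point (last row).
        double[][] x = {{1, 1}, {2, 2}, {3, 3}, {4, 4}, {5, 5},
                        {6, 6}, {7, 7}, {8, 8}, {9, 9}, {3, 9}};
        int copies = Math.max(1, x.length / 10); // about 10% of the data
        for (int t = 0; t < x.length; t++)
            System.out.println("instance " + t + " score = " + score(x, t, copies));
    }
}
```

On this toy data the deviated last row should receive the largest score, since duplicating it rotates the principal direction more than duplicating any point on the line.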

Page 14: Anomaly Detection Via PCA

Implementation:

It includes two steps :
◦ Data Cleaning Phase
◦ On-line Anomaly Detection Phase

Data Cleaning Phase : osPCA is applied to the data set to find the principal direction. In this method the target instance is duplicated multiple times; the idea is to amplify the effect of an outlier rather than that of normal data. Then, following the Leave One Out (LOO) strategy, the angle difference between the principal directions is measured: adding or removing one data instance changes the direction.

Page 15: Anomaly Detection Via PCA

On-line Anomaly Detection Phase : In the on-line anomaly detection phase, the goal is to identify newly arriving abnormal instances. The quick updating of the principal directions provided by this approach satisfies the demands of on-line detection. A new arriving instance will be marked .

Page 16: Anomaly Detection Via PCA

Snapshots :

Page 17: Anomaly Detection Via PCA
Page 18: Anomaly Detection Via PCA
Page 19: Anomaly Detection Via PCA

Outcomes

We have explored the variation of principal directions in the leave one out scenario.

We demonstrated that the variation of principal directions caused by outliers indeed can help us to detect the anomaly.

We used over-sampling PCA to enlarge the outlierness of an outlier.

Page 20: Anomaly Detection Via PCA

Conclusion :

This project has attempted to establish the significance of anomaly detection using the osPCA technique.

Our method does not need to keep the entire covariance or data matrices during the online detection process.

Compared with other anomaly detection methods, our approach is able to achieve satisfactory results while significantly reducing computational costs and memory requirements.

Page 21: Anomaly Detection Via PCA

Future Enhancement :

In this project we worked on one particular data set obtained from an online website; in future we will extend the work to detect anomalies in any data set.

Page 22: Anomaly Detection Via PCA

Thank You

