Discovering Outlier Filtering Rules from Unlabeled Data
Authors: Kenji Yamanishi & Jun-ichi Takeuchi
Advisor: Dr. Hsu
Graduate: Chia-Hsien Wu
Outline
Motivation
Objective
Introduction
Main Framework
Outlier Detector - SmartSifter
Rule Generator – DL-ESC/DL-SC
Experimentation - Network intrusion detection
Experimental Results
Conclusion
Opinion
Motivation
SmartSifter's detection accuracy leaves room for improvement
SmartSifter cannot find a general pattern among the outliers it identifies
Objective
Improving the accuracy of SmartSifter
Discovering a new pattern that outliers in a specific group may commonly have
Introduction
Developing SmartSifter: an on-line outlier detection algorithm
Improving the power of SmartSifter by combining it with a supervised learning method
Main Framework
[Figure: framework diagram; recoverable labels: "Classifier L", "A New Rule"]
Outlier Detector - SmartSifter ->SS
Using a probabilistic (Gaussian mixture) model: p(x, y) = p(x) p(y|x)
Employing on-line discounting learning algorithms (SDLE and SDEM) to update the model
Giving a score to each datum
Outlier Detector - SmartSifter ->SS(cont.)
SDLE algorithm: An on-line discounting variant of the Laplace law based estimation algorithm
SDEM algorithm: An on-line discounting variant of the incremental EM (Expectation Maximization) algorithm
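The idea of an on-line discounting Laplace estimator can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the discounting rate `r`, and the smoothing constant `beta` are assumptions introduced here, and the slides do not give the exact update formula.

```python
class DiscountedLaplaceEstimator:
    """Sketch of a discounted Laplace-style estimator over categorical values.

    Assumed parameters (not from the slides):
      r    -- discounting rate; larger r forgets old data faster
      beta -- Laplace smoothing constant added to every category
    """

    def __init__(self, categories, r=0.05, beta=0.5):
        self.categories = list(categories)
        self.r = r
        self.beta = beta
        self.counts = {c: 0.0 for c in self.categories}
        self.total = 0.0  # discounted number of observations seen so far

    def update(self, x):
        # Shrink all past counts, then add the new observation.
        for c in self.counts:
            self.counts[c] *= (1.0 - self.r)
        self.counts[x] += 1.0
        self.total = (1.0 - self.r) * self.total + 1.0

    def prob(self, x):
        # Smoothed relative frequency; sums to 1 over all categories.
        k = len(self.categories)
        return (self.counts[x] + self.beta) / (self.total + k * self.beta)


est = DiscountedLaplaceEstimator(["http", "smtp", "ftp"])
for _ in range(50):
    est.update("http")
p_http_early = est.prob("http")
for _ in range(100):
    est.update("smtp")
p_http_late = est.prob("http")
```

Because old counts decay geometrically, the estimate follows the recent data: after the stream switches from "http" to "smtp", the probability of "http" drops well below its earlier value.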
Outlier Detector - SmartSifter ->SS(cont.)
Outputting a sorted dataset
A highly scored datum is highly likely to be an outlier
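A rough sketch of the score-then-sort step is given below. The slides do not state the scoring function, so negative log-likelihood is assumed here, and a single fixed one-dimensional Gaussian stands in for the on-line Gaussian mixture; the function names and sample values are hypothetical.

```python
import math


def gaussian_log_pdf(y, mean, var):
    # Log density of a one-dimensional Gaussian.
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def score_and_sort(data, mean, var):
    # Assumed score: negative log-likelihood, so unlikely data score high.
    scored = [(-gaussian_log_pdf(y, mean, var), y) for y in data]
    # Highest score first: the head of the list holds the outlier candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


data = [0.1, -0.3, 0.2, 5.0, 0.0]  # hypothetical sample
ranked = score_and_sort(data, mean=0.0, var=1.0)
top_score, top_value = ranked[0]
```

In this sample the value 5.0 lies far from the assumed mean, so it receives the highest score and appears first in the sorted output.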
Rule Generator – DL-ESC/DL-SC
Using a stochastic decision list
Employing the principle of minimizing extended stochastic complexity or stochastic complexity
Rule Generator – DL-ESC/DL-SC (cont.)
If ξ makes t1 true, then μ = v1 with probability p1
else if ξ makes t2 true, then μ = v2 with probability p2
………………………
else μ = vs with probability ps
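The if/else-if structure of a stochastic decision list can be illustrated with a short sketch. The conditions, labels, and probabilities below are hypothetical examples, not rules from the paper; only the evaluation order (first matching condition wins, otherwise the default) follows the slide.

```python
from typing import Callable, Dict, List, Tuple

# One rule: (condition t_i, label v_i, probability p_i)
Rule = Tuple[Callable[[Dict], bool], int, float]


def apply_decision_list(rules: List[Rule], default: Tuple[int, float], xi: Dict):
    # Return (v_i, p_i) for the first condition that xi makes true.
    for condition, label, prob in rules:
        if condition(xi):
            return label, prob
    # No condition fired: fall through to the final "else" branch.
    return default


# Hypothetical rules; label 1 = outlier, 0 = normal.
rules: List[Rule] = [
    (lambda r: r["service"] == "http" and r["src_bytes"] > 10000, 1, 0.9),
    (lambda r: r["duration"] > 300, 1, 0.7),
]
default = (0, 0.95)

first = apply_decision_list(rules, default, {"service": "http", "src_bytes": 20000, "duration": 1})
second = apply_decision_list(rules, default, {"service": "smtp", "src_bytes": 10, "duration": 500})
fallback = apply_decision_list(rules, default, {"service": "smtp", "src_bytes": 10, "duration": 1})
```

Order matters: a record that satisfies several conditions is assigned the label and probability of the earliest rule it matches.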
Experimentation - Network intrusion detection
The purpose of our experiment is to detect network intrusions without making use of the labels concerning intrusions
Experimentation – Dataset (cont.)
Using the dataset KDD Cup 1999 prepared for network intrusion detection
Using 13 attributes for DL-ESC
Using four attributes for SmartSifter (service, duration, src_bytes, dst_bytes)
Only "service" is categorical
y = log(x + 0.1), where the base of the logarithm is e
Generating five datasets S0, S1, S2, S3, S4
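The transform y = log(x + 0.1) can be sketched as below. It is assumed here that the transform is applied to the three continuous attributes and that "service" is left unchanged; the record values and function names are hypothetical.

```python
import math

CONTINUOUS = ("duration", "src_bytes", "dst_bytes")


def log_transform(x):
    # Natural logarithm; the 0.1 offset keeps x = 0 finite.
    return math.log(x + 0.1)


def preprocess(record):
    # Transform the continuous attributes, keep the categorical one as is.
    return {
        key: log_transform(value) if key in CONTINUOUS else value
        for key, value in record.items()
    }


record = {"service": "http", "duration": 0, "src_bytes": 215, "dst_bytes": 45076}
transformed = preprocess(record)
```

The offset matters because attributes such as duration are often exactly zero, where a plain logarithm would be undefined.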
Experimentation – Illustration by an Example (cont.)
Update Rule – S1
First Rule – S1
Update Rule – S2
Experimental Results
SS: SmartSifter
R&S: Rule and SmartSifter (this framework)
S0 is used as a training set to construct a filtering rule; each of S1, S2, S3, and S4 is used for testing
Conclusion
This new framework has two features
Improving the power of SmartSifter
Helping the user discover a general pattern
Opinion
The framework makes the detection process more effective and more understandable
This framework can be applied to other fields