data mining technique for classification and feature evaluation using stream mining(ranjit banshpal)
Post on 05-Dec-2014
Data Mining Technique For Classification and Feature Evaluation
Using Stream Mining
Ranjit R. Banshpal
OUTLINE
• Introduction
• Data stream classification
• Decision Tree
• VFDT
• Challenges
• Applications
• Conclusion
• References
Introduction
• What is data mining?
• Extracting knowledge from historical data.
• What is data stream mining?
• Extracting knowledge from real-time, high-speed data streams.
• Why do we use data stream mining?
Introduction (Cont…)
Examples of continuous-flow data:
• Network traffic data
• Sensor data
• Call center data
Data Stream Classification
• Uses past labeled data to build a classification model.
• Predicts the labels of future instances using the model.
• Helps decision making.
Fig: Data stream classification of network traffic. A classification model at the firewall separates attack traffic (blocked and quarantined) from benign traffic (passed to the server); expert analysis and labeling feeds the model update.
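The predict-then-update cycle described above can be sketched as a small loop. This is only an illustration, not the method used in the slides: the running-centroid model, the two numeric features, and the labels are all made up for the example.

```python
from collections import defaultdict


class RunningCentroidClassifier:
    """Toy incremental model (an assumption): one running mean vector per class."""

    def __init__(self):
        self.sums = {}                    # label -> per-feature running sums
        self.counts = defaultdict(int)    # label -> number of instances seen

    def update(self, x, label):
        """Fold one labeled instance into the model (the 'model update' step)."""
        if label not in self.sums:
            self.sums[label] = [0.0] * len(x)
        self.sums[label] = [s + v for s, v in zip(self.sums[label], x)]
        self.counts[label] += 1

    def predict(self, x):
        """Return the label of the nearest class centroid, or None if untrained."""
        if not self.sums:
            return None

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


def run_stream(model, stream):
    """Predict each instance first, then learn from its label once it is known."""
    correct = total = 0
    for x, label in stream:
        guess = model.predict(x)
        if guess is not None:
            total += 1
            correct += (guess == label)
        model.update(x, label)
    return correct, total


# Hypothetical features: (packets per second, mean gap between packets in ms).
stream = [
    ((900.0, 1.0), "attack"),
    ((10.0, 40.0), "benign"),
    ((950.0, 2.0), "attack"),
    ((12.0, 35.0), "benign"),
    ((880.0, 1.5), "attack"),
]
model = RunningCentroidClassifier()
correct, total = run_stream(model, stream)
```

Each instance is seen once and then discarded, so memory use does not grow with the length of the stream.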
Decision Trees
• A decision tree is a classification model. Its structure resembles a general tree or a flow chart.
– Internal node: tests the value of an attribute.
– Leaf node: holds a class label.
Fig: Decision Tree of Weather
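A minimal sketch of the internal-node / leaf-node structure. The weather tree here is the common textbook example and is an assumption, since the figure itself is not reproduced in this transcript; the class and attribute names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds a class label."""
    label: str


@dataclass
class Node:
    """Internal node: tests one attribute and has one branch per attribute value."""
    attribute: str
    children: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.children[instance[tree.attribute]]
    return tree.label


# Assumed example tree (not taken from the slide's figure).
weather_tree = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

decision = classify(weather_tree, {"outlook": "sunny", "humidity": "high"})
```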
Decision Tree (cont...)
• Limitations
– Classic decision tree learners assume that all training data can be stored in main memory at the same time.
– Disk-based decision tree learners repeatedly read the training data from disk sequentially.
VFDT
• VFDT (Very Fast Decision Tree) takes less time than a classic decision tree learner.
• To find the best attribute at a node, it considers only a small subset of the training examples that pass through that node.
– Given a stream of examples, use the first ones to choose the root attribute.
– Once the root attribute is chosen, the successive examples are passed down to the corresponding leaves and used to choose the attribute there, and so on recursively.
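How many examples count as "the first ones" is not stated on the slide. In the original VFDT formulation (Domingos and Hulten) this is decided with the Hoeffding bound; the sketch below assumes that rule. The gain values, delta, and function names are illustrative only.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with the given range lies within epsilon of the mean observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Assumed numbers: two classes, so information gain has range log2(2) = 1.
early = should_split(0.30, 0.22, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.22, value_range=1.0, delta=1e-7, n=2000)
```

With these numbers the leaf waits at n = 100 (epsilon is about 0.28) and splits at n = 2000 (epsilon is about 0.06), because the bound shrinks as more examples reach the leaf.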
VFDT (cont...)
Fig: Example VFDT grown from a data stream with the attributes Gender, Age, and Car Type. Internal nodes test "Age<30?" and "Car Type=Sports Car?" (also "Car Type=normal") with Yes/No branches.
Challenges
• Infinite length
• Concept-drift
• Concept-evolution
• Feature-evolution
Fig: Workflow for identifying concept evolution. Boxes in the figure, in order:
• Input: the data stream is divided into equal-sized chunks.
• Classifier ensemble M.
• Each cluster is transformed into a pseudopoint data structure (centroid, weight, radius); together these form the set of pseudopoints H.
• Outlier detection module: buffers outlier instances, then clusters the instances in the buffer.
• Calculate the q-NSC value assigned to every instance in a pseudopoint.
• If tp is greater than the threshold, the corresponding classifier votes in favor of another class.
• Another-instance algorithm.
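The pseudopoint summary and the outlier test can be sketched as follows. This is a simplified illustration under stated assumptions: weight is taken as the number of points in the cluster, radius as the distance from the centroid to the farthest point, and an instance counts as an outlier when it falls outside every pseudopoint. The q-NSC computation and the ensemble vote are not shown, and the function names are hypothetical.

```python
import math


def make_pseudopoint(cluster):
    """Summarize a cluster of points as (centroid, weight, radius)."""
    n = len(cluster)
    dim = len(cluster[0])
    centroid = tuple(sum(p[i] for p in cluster) / n for i in range(dim))
    # Assumption: radius = distance from the centroid to the farthest member.
    radius = max(math.dist(p, centroid) for p in cluster)
    return {"centroid": centroid, "weight": n, "radius": radius}


def is_outlier(x, pseudopoints):
    """True when x lies outside the radius of every pseudopoint."""
    return all(math.dist(x, pp["centroid"]) > pp["radius"] for pp in pseudopoints)


# Two made-up clusters standing in for the clustered training data of one chunk.
H = [
    make_pseudopoint([(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]),
    make_pseudopoint([(10.0, 10.0), (12.0, 10.0)]),
]

inside = is_outlier((1.5, 1.0), H)   # falls within the first pseudopoint
far = is_outlier((6.0, 6.0), H)      # outside both; would go to the buffer
```

Keeping only the three summary values per cluster lets the raw instances be discarded, which matters when the stream has unbounded length.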
Feature-Evolution
Applications
• Applicable to many domains, such as:
• Intrusion detection systems.
• Share market data.
• Security monitoring.
• Network monitoring and traffic engineering.
• Business: credit card transaction flows.
• Telecommunication calling records.
• Web logs and web page click streams.
Conclusion
• In data stream classification, the VFDT algorithm classifies high-dimensional data efficiently, including instances that belong to another (novel) class.
• VFDT then relies on two key mechanisms of the another-class detection technique: outlier detection and multiple class detection.
Any Questions?
THANK YOU