data mining project poster
TRANSCRIPT
MISSION
Build a spam filter using a semi-supervised learning method.
WHY?
Labelled data is usually hard to obtain, so we use the unlabelled data as much as possible.
HOW?
Use a semi-supervised learning method to implement the spam filter.
Figure labels: Spam, Ham, Labelled, Unlabelled
01 We compared the performance of three semi-supervised learning methods (Self-training, EM-based and Graph-based) and chose the best.
02 Since Self-training performed best of the three, we compared the performance of its different base learners (Bayesian, Decision Tree and AdaBoost).
03 We combined the semi-supervised learning method and the supervised learning method into an ensemble with Bagging.
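The Bagging step in 03 can be illustrated with a minimal Python sketch. The poster does not say how the semi-supervised and supervised models were combined, so this shows only the generic mechanism: each model is trained on a bootstrap resample and predictions are decided by majority vote. `WordVoteClassifier`, `bagging_fit`, `bagging_predict` and the example messages are hypothetical names and data introduced for illustration, not the project's code.

```python
import random
from collections import Counter


class WordVoteClassifier:
    """Hypothetical toy learner: every known word votes for the classes it was seen with."""

    def fit(self, texts, labels):
        self.word_counts = {}
        # Fallback label when no word of a message is known.
        self.default = Counter(labels).most_common(1)[0][0]
        for text, label in zip(texts, labels):
            for word in text.lower().split():
                self.word_counts.setdefault(word, Counter())[label] += 1
        return self

    def predict(self, text):
        votes = Counter()
        for word in text.lower().split():
            if word in self.word_counts:
                votes.update(self.word_counts[word])
        return votes.most_common(1)[0][0] if votes else self.default


def bagging_fit(make_model, texts, labels, n_estimators=25, seed=0):
    """Train n_estimators models, each on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(texts)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_texts = [texts[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        models.append(make_model().fit(sample_texts, sample_labels))
    return models


def bagging_predict(models, text):
    """Majority vote over the predictions of all bagged models."""
    return Counter(m.predict(text) for m in models).most_common(1)[0][0]


# Made-up example data, for illustration only.
bag_texts = [
    "free money now", "win a free prize", "free cash offer", "claim your free gift",
    "project meeting at noon", "meeting notes attached",
    "team meeting tomorrow", "lunch after the meeting",
]
bag_labels = ["spam"] * 4 + ["ham"] * 4

bag_models = bagging_fit(WordVoteClassifier, bag_texts, bag_labels)
print(bagging_predict(bag_models, "free gift offer"))
print(bagging_predict(bag_models, "meeting tomorrow"))
```

Any learner exposing `fit` and `predict` could be passed as `make_model`, including a self-trained model, which is one plausible way to bag the two kinds of method together.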
References
1. Mark Culp. spa: An R package for semi-supervised semi-parametric graph-based estimation. Journal of Statistical Software.
2. Niamh Russell, Laura Cribbin, and Thomas Brendan Murphy. upclass: An R package for updating model-based classification rules.
3. Xiaojin Zhu. Semi-supervised learning tutorial, 2007.
4. Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning.
Labelled Data
Train Classifier
Apply on unlabelled data
Enhance Classifier
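The self-training loop in the diagram (train on labelled data, apply the classifier to unlabelled data, enhance the classifier with its confident predictions) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the tiny `NaiveBayes` class, the `self_train` function, the confidence `threshold` of 0.8, the `max_rounds` cap and the example messages are all hypothetical choices made here.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens, with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict_proba(self, text):
        total = sum(self.prior.values())
        log_scores = {}
        for c in self.classes:
            n_words = sum(self.counts[c].values())
            score = math.log(self.prior[c] / total)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.counts[c][word] + 1) / (n_words + len(self.vocab)))
            log_scores[c] = score
        top = max(log_scores.values())
        exp = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

    def predict(self, text):
        proba = self.predict_proba(text)
        return max(proba, key=proba.get)


def self_train(texts, labels, unlabelled, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label confident unlabelled messages and retrain."""
    texts, labels, pool = list(texts), list(labels), list(unlabelled)
    model = NaiveBayes().fit(texts, labels)          # Train Classifier
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:                            # Apply on unlabelled data
            proba = model.predict_proba(text)
            label = max(proba, key=proba.get)
            if proba[label] >= threshold:
                texts.append(text)
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        if added == 0:                               # nothing confident left
            break
        pool = remaining
        model = NaiveBayes().fit(texts, labels)      # Enhance Classifier
    return model


# Made-up example data, for illustration only.
st_texts = ["win free money now", "free prize claim now",
            "meeting at noon tomorrow", "lunch with the project team"]
st_labels = ["spam", "spam", "ham", "ham"]
st_unlabelled = ["claim free money", "project meeting tomorrow",
                 "free cash prize", "team lunch at noon"]

st_model = self_train(st_texts, st_labels, st_unlabelled)
print(st_model.predict("free money prize"))
print(st_model.predict("team meeting at noon"))
```

The threshold controls the trade-off in self-training: a higher value adds fewer pseudo-labelled messages per round but lowers the risk of reinforcing early mistakes.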