data mining project poster


Upload: jack-zhang

Post on 21-Jul-2015


Category:

Engineering




MISSION

Build a spam filter using a semi-supervised learning method.

WHY?

Labelled data is usually hard to obtain, so we want to use the unlabelled data as much as possible.

HOW?

Use a semi-supervised learning method to implement the spam filter.

[Figure legend: Spam, Ham; Labelled, Unlabelled]

01 We compared the performance of three semi-supervised learning methods (Self-training, EM-based and Graph-based) and chose the best.

02 Since Self-training performed best of the three, we compared the performance of its different base learners (Bayesian, Decision Tree and AdaBoost).

03 We ensembled the semi-supervised learning method and the supervised learning method with Bagging.
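Step 03 relies on Bagging: train several models on bootstrap resamples of the training data and combine them by majority vote. The poster does not say exactly how the semi-supervised and supervised learners were combined, so the following is only a minimal sketch of Bagging itself. The keyword-score base learner, the function names and the toy messages are all assumptions made for illustration; any learner with the same `fit(docs, labels) -> predict` shape could be plugged in.

```python
import random
from collections import Counter


def fit_keyword_model(docs, labels):
    """Hypothetical base learner: each word scores +1 per spam doc, -1 per ham doc."""
    scores = Counter()
    for doc, label in zip(docs, labels):
        for word in set(doc.split()):
            scores[word] += 1 if label == "spam" else -1

    def predict(doc):
        total = sum(scores[word] for word in doc.split())
        return "spam" if total > 0 else "ham"

    return predict


def bagging_fit(docs, labels, fit, n_models=15, seed=0):
    """Train n_models base learners on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw n training examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([docs[i] for i in idx], [labels[i] for i in idx]))

    def predict(doc):
        votes = Counter(model(doc) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (invented for this sketch).
train_docs = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the team monday",
]
train_labels = ["spam", "spam", "ham", "ham"]

ensemble = bagging_fit(train_docs, train_labels, fit_keyword_model)
```

Calling `ensemble("team meeting monday")` returns the majority label over the 15 bootstrap models. An odd number of models avoids ties in a two-class vote.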

References

1. Mark Culp. spa: A semi-supervised R package for semi-parametric graph-based estimation. Journal of Statistical Software.

2. Niamh Russell, Laura Cribbin, and Thomas Brendan Murphy. upclass: An R package for updating model-based classification rules.

3. Xiaojin Zhu. Semi-supervised learning tutorial, 2007.

4. Xiaojin Zhu and Andrew B. Goldberg. Introduction to semi-supervised learning.

Labelled data -> Train classifier -> Apply on unlabelled data -> Enhance classifier
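The loop in this diagram (train on labelled data, apply to unlabelled data, enhance the classifier) can be sketched as follows. This is a minimal illustration and not the poster's implementation: it assumes a small hand-written multinomial Naive Bayes learner with Laplace smoothing, a confidence threshold of 0.9 for accepting pseudo-labels, and invented toy messages. All function and parameter names are hypothetical.

```python
import math
from collections import Counter

CLASSES = ("spam", "ham")


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (word counts per class)."""
    word_counts = {c: Counter() for c in CLASSES}
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocab


def predict_proba(model, doc):
    """Return P(spam | doc) using Laplace-smoothed estimates."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_post = {}
    for c in CLASSES:
        n_words = sum(word_counts[c].values())
        lp = math.log((class_counts[c] + 1) / (total_docs + 2))
        for word in doc.split():
            # The extra +1 in the denominator reserves mass for unseen words.
            lp += math.log((word_counts[c][word] + 1) / (n_words + len(vocab) + 1))
        log_post[c] = lp
    top = max(log_post.values())
    weights = {c: math.exp(lp - top) for c, lp in log_post.items()}
    return weights["spam"] / (weights["spam"] + weights["ham"])


def self_train(docs, labels, unlabelled, threshold=0.9, max_rounds=10):
    """Self-training: repeatedly add confident pseudo-labels and retrain."""
    docs, labels, pool = list(docs), list(labels), list(unlabelled)
    model = train_nb(docs, labels)              # train classifier
    for _ in range(max_rounds):
        remaining, added = [], False
        for doc in pool:                        # apply on unlabelled data
            p_spam = predict_proba(model, doc)
            if p_spam >= threshold:
                docs.append(doc)
                labels.append("spam")
                added = True
            elif p_spam <= 1 - threshold:
                docs.append(doc)
                labels.append("ham")
                added = True
            else:
                remaining.append(doc)           # not confident yet, keep for later
        pool = remaining
        if not added:
            break
        model = train_nb(docs, labels)          # enhance classifier
    return model


# Toy data (invented for this sketch).
labelled = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch with the team monday",
]
labels = ["spam", "spam", "ham", "ham"]
unlabelled = [
    "claim free money",
    "team meeting agenda",
    "free prize money win",
    "monday lunch agenda",
]

model = self_train(labelled, labels, unlabelled)
```

The threshold controls the trade-off: a high value adds fewer pseudo-labels per round but reduces the risk of reinforcing early mistakes.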