A generalized cluster centroid based classifier for text categorization

Post on 24-Feb-2016

DESCRIPTION

A generalized cluster centroid based classifier for text categorization. Presenter: Bei-Yi Jiang. Authors: Guansong Pang, Shengyi Jiang. 2013, Information Processing and Management. Outline: Motivation, Objectives, Methodology, Experiments, Conclusions, Comments.

TRANSCRIPT

Intelligent Database Systems Lab

Presenter : BEI-YI JIANG

Authors : GUANSONG PANG, SHENGYI JIANG

2013. INFORMATION PROCESSING AND MANAGEMENT

A generalized cluster centroid based classifier for text categorization

Outlines

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• KNN: With the exponential growth of online textual information, how to organize text data effectively and efficiently has become an important and demanding issue.

• Rocchio: Fails to obtain an expressive categorization model due to its inherent linear separability assumption.

Objectives

• To strengthen the expressiveness of the Rocchio model.

• To employ the improved Rocchio model to speed up the categorization process of KNN.

Methodology

• KNN

• Rocchio
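The slide names the two baseline methods without reproducing their details. As a rough illustration only, the following sketch shows how a KNN vote and a Rocchio (one centroid per class) classifier are commonly written for text. The sparse dictionary vectors, the cosine similarity measure, the function names, and the toy data are all assumptions for this example and are not taken from the slides.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(train, query, k=3):
    """KNN: compare the query with every training document, then vote."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def rocchio_fit(train):
    """Rocchio: average the document vectors of each class into one centroid."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = Counter()
    for vec, label in train:
        counts[label] += 1
        for term, w in vec.items():
            sums[label][term] += w
    return {c: {t: w / counts[c] for t, w in terms.items()}
            for c, terms in sums.items()}


def rocchio_predict(centroids, query):
    """Assign the class whose centroid is most similar to the query."""
    return max(centroids, key=lambda c: cosine(centroids[c], query))


# Hypothetical toy corpus: (term weights, label).
train = [
    ({"goal": 2.0, "match": 1.0}, "sports"),
    ({"match": 2.0, "team": 1.0}, "sports"),
    ({"stock": 2.0, "market": 1.0}, "finance"),
    ({"market": 2.0, "bank": 1.0}, "finance"),
]
query = {"team": 1.0, "goal": 1.0}

print(knn_predict(train, query, k=1))
print(rocchio_predict(rocchio_fit(train), query))
```

In this sketch KNN scans all training documents for every query, while Rocchio compares the query with one vector per class. That is the cost versus expressiveness trade-off the Motivation and Objectives slides refer to.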

Methodology

• GCCC

• Determine the threshold
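The slide content for this step did not survive extraction, so the following is only a loose sketch of the general idea of a cluster centroid based classifier, not the authors' algorithm. It assumes a simple one-pass, per-class clustering controlled by a similarity threshold, followed by nearest-centroid classification. The function names, the cosine measure, the toy data, and the hand-picked threshold values are all hypothetical, and the threshold-determination procedure named on the slide is not reproduced here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def add_into(total, vec):
    """Accumulate a document vector into a cluster's summed vector."""
    for term, w in vec.items():
        total[term] = total.get(term, 0.0) + w


def fit_cluster_centroids(train, threshold):
    """One pass: join the most similar same-class cluster, or open a new one.

    Summed vectors stand in for centroids because cosine similarity
    ignores vector length.
    """
    clusters = []  # each entry: [summed_vector, label]
    for vec, label in train:
        best, best_sim = None, threshold
        for cluster in clusters:
            if cluster[1] != label:
                continue
            sim = cosine(cluster[0], vec)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([dict(vec), label])
        else:
            add_into(best[0], vec)
    return clusters


def predict(clusters, query):
    """Label the query with the class of its most similar cluster."""
    return max(clusters, key=lambda c: cosine(c[0], query))[1]


# Hypothetical toy corpus: the "sports" class has two unrelated sub-topics.
train = [
    ({"goal": 2, "match": 1}, "sports"),
    ({"goal": 1, "team": 2}, "sports"),
    ({"racket": 2, "serve": 1}, "sports"),
    ({"stock": 2, "market": 1}, "finance"),
    ({"market": 2, "bank": 1}, "finance"),
]

model = fit_cluster_centroids(train, threshold=0.3)
print(len(model))
print(predict(model, {"serve": 1, "racket": 1}))
```

In this sketch the threshold controls the granularity of the model: a value of 0 merges each class into a single centroid, which behaves like Rocchio, while a value above 1 leaves every document in its own cluster, which behaves like a 1-nearest-neighbor search.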

Experiments

Conclusions

• Strengthens the expressiveness of the Rocchio model

• GCCC and its variants achieve impressive performance

• Obtains near linear time complexity in modeling

• GCCC's modeling stage is more time-consuming

Comments

• Advantages
- relatively stable
- favorable performance

• Applications
- online categorization
