Post on 12-Jan-2016

LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora

Chien-Chung Huang, Shui-Lung Chuang, Lee-Feng Chien

Presented by: Vu LONG

Outline

1. Introduction

2. LiveClassifier

3. Evaluation

4. Contribution

5. Future work

Introduction

http://140.109.19.252:8080/charles/index.jsp

• Uses Web search-result pages as the corpus source

• Exploits the structure information in the topic hierarchy to train the classifier

• Creates key terms to amend the insufficiency of the topic hierarchy

LiveClassifier (Demo version)

Classify documents

Computer Science Classifier is chosen

• Three classifiers (topics) have been created from the Yahoo! directory: Computer Science, Europe, and Scientists

LiveClassifier

Classify documents

Pseudo class

LiveClassifier

• Users can create their own classifiers

LiveClassifier

• Feature Extractor

- Interacts with a search engine and extracts highly ranked search-result snippets as an effective feature source

- Outputs feature vectors to describe both topic classes and text objects
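
The feature-extraction step above can be sketched roughly as follows. This is a minimal illustration, not the system's actual implementation: the slides do not give the weighting scheme, so plain term frequency with L2 normalisation is assumed, and `tokenize`, `snippets_to_vector`, and the example snippets are hypothetical. No real search engine is called; the snippets stand in for the top-ranked results of one query.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def snippets_to_vector(snippets):
    """Build an L2-normalised term-frequency vector from search snippets.

    Assumption: raw term frequency is used as the weight; the original
    system may weight terms differently.
    """
    counts = Counter()
    for snippet in snippets:
        counts.update(tokenize(snippet))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


# Hypothetical snippets standing in for the top-ranked results of one query.
example_snippets = [
    "Machine learning algorithms learn from data",
    "Learning algorithms for text classification",
]
vector = snippets_to_vector(example_snippets)
```

The same function can describe a topic class (snippets returned for the class query) or a text object (snippets returned when the object itself is the query), which is what lets the two be compared later.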

LiveClassifier

• Hier-Concept-Query-Formulation

- Formulates queries from the topic hierarchy
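
One possible way to formulate a query from a position in the topic hierarchy is sketched below. The slides do not specify how the query is built, so the strategy here (quote the class name and add its ancestors as context terms) is an assumption, and `formulate_query` and the "Artificial Intelligence" subclass are hypothetical names used only for illustration.

```python
def formulate_query(path):
    """Build a query string for the class at the end of `path`.

    `path` lists class names from the root of the hierarchy down to the
    target class. The target name is quoted first and its ancestors are
    appended as quoted context terms (an assumed strategy).
    """
    if not path:
        raise ValueError("path must contain at least one class name")
    *ancestors, target = path
    terms = ['"{}"'.format(target)] + ['"{}"'.format(a) for a in ancestors]
    return " ".join(terms)


# Hypothetical two-level path under the Computer Science topic.
query = formulate_query(["Computer Science", "Artificial Intelligence"])
```

Adding ancestor names is one simple way to disambiguate class names that would be too general on their own.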

LiveClassifier

• Text Classifier
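
The slides do not state the classification rule. As a rough sketch, assuming that a text object is assigned to the class whose feature vector is most similar under cosine similarity, the step could look like this. The names `cosine` and `classify` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(object_vector, class_vectors):
    """Return (class_name, score) for the most similar class vector."""
    return max(
        ((name, cosine(object_vector, vec)) for name, vec in class_vectors.items()),
        key=lambda pair: pair[1],
    )


# Toy class vectors; real ones would come from the feature extractor.
class_vectors = {
    "Computer Science": {"algorithm": 2.0, "software": 1.0},
    "Europe": {"france": 1.0, "germany": 1.0},
}
label, score = classify({"algorithm": 1.0, "compiler": 1.0}, class_vectors)
```

Because classes and text objects share one vector space, no pre-labeled training documents are needed for this comparison.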

Evaluation

• Overall performance evaluation

Evaluation

• Granularity & Diversity

- Classifying text objects into different levels of the topic hierarchy yielded roughly the same results.

Evaluation

• Thematic Metadata for Textual Data

Evaluation

• Paper Title Classification

- Collect data from 4 CS conferences in 2002

- Classify them into 36 second-level CS classes

Contribution

• Finds ways to collect and organize corpora effectively

• Creates key terms to amend the insufficiency of the topic hierarchy

• Classifies text objects automatically without a pre-labeled training set

• Cooperates with Web information services and other systems easily

• Helps to create more refined data (thematic metadata) for textual data

Future work

• Optimize the classifier by focusing on the training stage rather than only on organizing corpora

• Improve response time

• Find appropriate pseudo classes
