the large scale hierarchical text classification pascal...

OutlineIntroduction

Presentation of the ChallengeResults

Conclusion and Perspectives

TheLarge Scale Hierarchical Text Classification

PASCAL Challenge

A. Kosmopoulos†,�,E. Gaussier∗, G. Paliouras†, S.Aseervatham∗

∗ Lab. d’Informatique de Grenoble & Grenoble University, France† National Center for Scientific Research ”Demokritos”, Greece

� Athens University of Economics and Business

March 28, 2010

The Large Scale Hierarchical Text Classification PASCAL Challenge 1 / 26

OutlineIntroduction



Outline

Introduction

Presentation of the ChallengeDatasets and Data PreparationTasksEvaluation Measures

ResultsQuick Overview of ApproachesResults per TasksScalability Tests



OutlineIntroduction



Large scale hierarchical text classification (1)

Situation

I Problem has been addressed in the pastI Seminal work of Yang et. al. [5] (ca. 14,000 categories) in

2003I Followed by extensions in 2005 ([2, 3]) to more than 100,000

categories

I Comparison of different classifiers in different settings: flat vshierarchical

I Continuous interest in the problem (under different forms)and continuous flow of new ideas and approaches - work byXue et al. in 2008 [4]


OutlineIntroduction



Large scale hierarchical text classification (2)

Comments

I The dichotomy introduced in early works between flat andhierarchical approaches blurred in more recent works

I Different classifiers can be used differently (e.g. featureselection or document smapling/filtering can be used in flatapproaches to speed up the process); experimental space isindeed large

I Recent challenges on large scale classification on largenumbers of high-dimensional exmaples, but very fewcategories

⇒ All these elements led us to propose a challenge on large scalehierarchical classification


OutlineIntroduction



Description of the Problem

I Hierarchy of categories is provided (flat vshierarchical)

I Reasonable, large numbers of categories(12,000) and examples (160,000) - don’tscare potential participants with largenumbers!

I Simple hierarchical problem: documentsat the leaves only

I Simple multiclass, single label problem:each document belongs to only onecategory

Arts

MoviesMusic

Pop Rock


OutlineIntroduction



Datasets and Data PreparationTasksEvaluation Measures

Data preparation

I Pre-processing:I stemming/lemmatizationI stop-word removal

I Two type of vectors:

Content data Description dataThe Movies Net Top 20 brings you the pick of the most popular and highest-‐ra=ng sites for movies on the Net today. It gives you access to the world's best movies sites, all from a single page.

Movies.com features the latest movie news, reviews, trailers and a wide variety of general movie informa=on.

Movies Top 20 -‐ Lists links to several film sites, with brief descrip=ons of

each.


OutlineIntroduction




Datasets

I Large dataset (12294 categories)Used for system evaluation.

I Small dataset (1139 categories)Used for system tuning.

I Each dataset is split into:I a training set (93805/4463 vectors)I a validation set (34905/1860 vectors)I a test set (34880/1858 vectors)


OutlineIntroduction




Tasks of the Challenge

Task NameContent Description

Train Test Train Test

Task 1: Basic X X - -Task 2: Cheap - X X -Task 3: Expensive X X X -Task 4: Full X X X X


OutlineIntroduction




Tasks of the Challenge

Task Name Train Validation Test

Task 1: Basic 347255 191224 194024Task 2: Cheap 71322 39070 194024Task 3: Expensive 368113 201487 194024Task 4: Full 368113 201487 204288

Task Vocabulary


OutlineIntroduction




Evaluation Measures (1)

Accuracy =Correct Classifications

D

Tree-induced error [1] =

∑Dd=1 Path-length(cd , td )

D

I D is the number of testing documents

I cd the category in which the document was classified

I td the true category of the document


OutlineIntroduction




Evaluation Measures (2)

Macro-average precision, recall and F1-measure

Macro precision =

∑Mi=1 precisioni

M, precision =

TP

TP + FP

Macro recall =

∑Mi=1 recalli

M, recall =

TP

TP + FN

Macro F1 =2 ·Macro precision ·Macro recall

Macro precision + Macro recall

M is the number of categories.


OutlineIntroduction




Significance Tests (1)

Comparing proportions(p-test)[6] for Accuracy

p =pa + pb

2

Z =pa − pb√

2p(1− p)/n

I pa accuracy of the first system

I pb accuracy of the second system

I n the number of trials (number of testing vectors)

I Significant different if P-value < 0.05


OutlineIntroduction




Significance Tests (2)

Macro sign test (S-test)[6] for Macro-average F1-measure

Z =k − 0.5n

0.5√

n, since n > 12

I n is the number of times that ai and bi differ

I k is he number of times that ai is larger than bi

I ai ∈ [0, 1] is the F1 score of system A on the ith category (i=1, 2, ..., M)

I bi ∈ [0, 1] is the F1 score of system B on the ith category (i=1, 2, ..., M)

I M is the number of categories

I Significant different if P-value < 0.05


OutlineIntroduction



Quick Overview of ApproachesResults per TasksScalability Tests

Standard Approaches

Two main approaches[4]:

Big-bang Directly categorize documents to the leaves.

Top-down Hierarchy is exploited in order to divide the probleminto smaller ones.

Big-bang approaches are usually more accurate while Top-downapproaches are usually faster.


OutlineIntroduction




Approaches used in the Challenge (1)

I Most participants either used a big-bang approach orexploited only a small part of the hierarchy.

I Feature selection was tried but did not always help.I Regarding the classifiers:

I Lazy learners were used, which are very fast at training butslower at classification. (i.e. kNN)

I Eager learners were also used and were faster at classification.(i.e. Naıve Bayes, SVMs)


OutlineIntroduction




Approaches used in the Challenge (2)

I arthur general- two-level hierarchy, multi-class SVM.

I logicators - subset of the hierarchy, hierarchical SVMs.

I Turing - knn for a subset and then Naıve Bayes classifier.

I XipengQiu - centroid for each class and IDF of terms.

I jhuang - flat hierarchy, two online algorithms (OOZ and PA).

I NakaCristo - flat hierarchy, kNN with three variants.

I Brouard - IDF feature selection, linking terms to categories.

I alpaca - classifier combination, 2-degree polynomial SVMs.


OutlineIntroduction




Highest Scores per Task

Task 1 Task 2 Task 3 Task 4

Accuracy 0.467632 0.380619 0.467861 0.504759

Macro F-measure 0.35494 0.317133 0.359557 0.386195

Tree Induced Error 3.07858 3.80803 3.2621 2.82101


OutlineIntroduction




Variation of Highest Scores

0.20

0.25

0.30

0.35

0.40

0.45

0.50

0.5530

-7

15-8

30-8

15-9

30-9

15-1

0

30-1

0

7-11

14-1

1

21-1

1

30-1

1

Accuracy

Task1Task 2Task 3Task 4

0.15

0.20

0.25

0.30

0.35

0.40

30-7

15-8

30-8

15-9

30-9

15-10

30-10

7-11

14-11

21-11

30-11

Μa

cro

F-m

ea

su

re

Task1Task2Task3Task4

2

3

4

5

6

7

8

9

30-7

15-8

30-8

15-9

30-9

15-1

0

30-1

0

7-11

14-1

1

21-1

1

30-1

1

Tre

e in

du

ce

d e

rro

r

Task 1Task 2Task 3Task 4


OutlineIntroduction




Task 1: Basic

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

0.50

alpaca

jhuang

arthur_general

XipengQiu

Turing

Dya

konov

logicators

ysyk

y

zwner

wdd

NakaCristo

shawndr

alfonso

eromero

bbutc

illes.so

lt

leokury

semelak1

brouardc

controlledchaos

0.152

0.2920.3010.3110.315

0.3600.361

0.4020.4020.4030.4050.4080.4150.4270.4320.4430.4430.4630.468

Task1 - Accuracy


OutlineIntroduction




Task 1: Basic

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

jhuang

alpaca

XipengQiu

logicators

Dya

konov

Turing

arthur_general

shawndr

ysyk

y

wdd

NakaCristo

alfonso

eromero

zwner

semelak1

bbutc

illes.so

lt

brouardc

leokury

controlledchaos

0.136

0.1700.186

0.199

0.2260.248

0.2820.2850.2870.2940.2970.3160.3200.3230.3280.3290.3370.341

0.355

Task1 - Macro F-measure


OutlineIntroduction




Task 1: Basic

0

1

2

3

4

5

6

7alpaca

jhuang

arthur_general

XipengQiu

Turing

Dya

konov

ysyk

y

zwner

wdd

logicators

NakaCristo

shawndr

alfonso

eromero

leokury

bbutc

illes.so

lt

brouardc

semelak1

controlledchaos

6.383

4.6954.4884.4304.403

4.2504.089

3.7153.6503.5803.5783.5543.4953.4333.3933.3633.2933.2813.079

Task1 - Cumulative tree-induced error


OutlineIntroduction




Task 2: Cheap

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

Turing

arthur_general

jhuang

XipengQiu

logicators

NakaCristo

brouardc

0.249

0.2990.309

0.3470.3550.3660.381

Task2 - Accuracy

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Turing

XipengQiu

jhuang

logicators

arthur_general

NakaCristo

brouardc

0.161

0.209

0.2500.2550.2650.285

0.317


0

0.75

1.50

2.25

3.00

3.75

4.50

5.25

6.00

Turing

arthur_general

jhuang

XipengQiu

logicators

NakaCristo

brouardc

4.9914.6674.520

4.3214.2723.967

3.808



OutlineIntroduction




Task 3: Expensive

0

0.055

0.110

0.165

0.220

0.275

0.330

0.385

0.440

0.495

0.550

jhuang

XipengQiu

Turing

logicators

NakaCristo

0.4040.422

0.4400.4520.468

Task3 - Accuracy

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

jhuang

XipengQiu

logicators

Turing

NakaCristo

0.301

0.3380.3380.3580.360


0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

jhuang

XipengQiu

Turing

logicators

NakaCristo

3.7113.474

3.3333.3243.262



OutlineIntroduction




Task 4: Full

0

0.055

0.110

0.165

0.220

0.275

0.330

0.385

0.440

0.495

0.550

alpaca

jhuang

XipengQiu

Turing

zwner

logicators

NakaCristo

0.4310.4610.4630.4640.480

0.5020.505

Task4 - Accuracy

0

0.05

0.10

0.15

0.20

0.25

0.30

0.35

0.40

0.45

jhuang

XipengQiu

alpaca

logicators

Turing

zwner

NakaCristo

0.3230.3600.3610.3710.3770.3840.386


0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

alpaca

jhuang

XipengQiu

zwner

Turing

logicators

NakaCristo

3.4673.1683.1313.1283.0952.977

2.821



OutlineIntroduction




F1-measure per Category Size

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50 60 70 80

Task 1

jhuang

alpaca

XipengQiu

logicators

Dyakonov

Turing

arthur_general

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50 60 70 80

Task 2

Turing

XipengQiu

jhuang

logicators

arthur_general

NakaCristo

brouardc

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50 60 70 80

Task 3

jhuang

XipengQiu

logicators

Turing

NakaCristo

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

0 10 20 30 40 50 60 70 80

Task 4

jhuang

XipengQiu

alpaca

logicators

Turing

zwner

NakaCristo


OutlineIntroduction




Scalability Tests - Time

Categories XipengQiu logicators Turing NakaCristo

122941m 12.5s 67m 1.4s 0m 13s 18.8s94m 8.2s 296m 45s 1258m 2s 42m 5s

100000m 58s 41m 20.7s 0m 8s 4m 11.5s60m 3s 153m 20s 779m 35s 27m 6s

10000m 7.6s 0m 35s 0m 0.7s 0m 24.2s0m 42s 1m 28s 9m 36s 0m 30.2s

1000m 4.7s 0m 0.2s 0m 13s 0m 2.5s0m 1.2s 0m 3.7s 0m 22.6s 0m 1.9s

First line = trainSecond line = classify


OutlineIntroduction




Scalability Tests - Memory

Categories XipengQiu logicators Turing NakaCristo

122942920 Mb 5700 Mb 200 Mb 921 Mb1382 Mb 3900 Mb 6000 Mb 996 Mb

100002400 Mb 4600 Mb 76 Mb 828 Mb1050 Mb 2950 Mb 5800 Mb 762 Mb

1000170 Mb 444 Mb < 50 Mb 110Mb149 Mb 434 Mb 1320 Mb 79 Mb

First line = trainSecond line = classify


OutlineIntroduction




I All the approaches we are aware of on large scale classificationtried by participants; we thus believe the results obtainedrepresent the state-of-the-art on this collection

I No complete hierarchical approaches, a la pachinko; ratherapproaches with a limited use of the hierarchy

I Best results are provided by both hierarchical and flatmethods; hierarchical methods seem faster

I Useful benchmark for future use; oracle will be available for awhile at the challenge site

I LSHTC-2 - What should be different? Multilabel? Originaltext? Other data?


OutlineIntroduction



Ofer Dekel, Joseph Keshet, and Yoram Singer.Large margin hierarchical classification.In ICML ’04: Proceedings of the twenty-first internationalconference on Machine learning, page 27, New York, NY,USA, 2004. ACM.

Tie-Yan Liu, Yiming Yang, Hao Wan, Hua-Jun Zeng, ZhengChen, and Wei-Ying Ma.Support vector machines classification with a very large-scaletaxonomy.SIGKDD Explorations, 7(1):36–43, 2005.

Tie-Yan Liu, Yiming Yang, Hao Wan, Qian Zhou, Bin Gao,Hua-Jun Zeng, Zheng Chen, and Wei-Ying Ma.An experimental study on large-scale web categorization.In Allan Ellis and Tatsuya Hagino, editors, WWW (Specialinterest tracks and posters), pages 1106–1107. ACM, 2005.


OutlineIntroduction



Gui-Rong Xue, Dikan Xing, Qiang Yang, and Yong Yu.Deep classification in large-scale text hierarchies.In SIGIR ’08: Proceedings of the 31st annual internationalACM SIGIR conference on Research and development ininformation retrieval, pages 619–626, New York, NY, USA,2008. ACM.

B. Kisiel Y. Yang, J. Zhang.A scalability analysis of classifiers in text.In ACM SIGIR Conference. ACM, 2003.

Yiming Yang and Xin Liu.A re-examination of text categorization methods.pages 42–49. ACM Press, 1999.


the large scale hierarchical text classification pascal...

Documents