Transductive Inference for Text Classification using Support Vector Machines



Page 1: Transductive Inference for Text Classification using Support Vector Machines

Transductive Inference for Text Classification using Support Vector Machines

Thorsten Joachims, Universität Dortmund, LS VIII, 44221 Dortmund, Germany

[email protected]

Presented by Yueng-Tien Lo

Page 2: Transductive Inference for Text Classification using Support Vector Machines

Outline

• Introduction
• Text Classification
• Transductive Support Vector Machines
• What Makes TSVMs Especially Well Suited for Text Classification
• Conclusions and Outlook

Page 3: Transductive Inference for Text Classification using Support Vector Machines

Introduction

• Over recent years, text classification has become one of the key techniques for organizing online information, e.g. filtering spam from people's email or learning users' news-reading preferences.

• The work presented here tackles the problem of learning from small training samples by taking a transductive, instead of an inductive approach.

Page 4: Transductive Inference for Text Classification using Support Vector Machines

Introduction

• In the inductive setting the learner tries to induce a decision function which has a low error rate on the whole distribution of examples for the particular learning task.

• In many situations, however, we do not care about the particular decision function, but rather about classifying a given set of examples (i.e. a test set) with as few errors as possible.

Page 5: Transductive Inference for Text Classification using Support Vector Machines

Introduction

• Relevance Feedback: The user is interested in a good classification of the test set into those documents relevant or irrelevant to the query.
• Netnews Filtering: Given the few training examples the user labeled on previous days, he or she wants today's most interesting articles.
• Reorganizing a document collection.

Page 6: Transductive Inference for Text Classification using Support Vector Machines

Text Classification

• The goal of text classification is the automatic assignment of documents to a fixed number of semantic categories.

• Each category is treated as a separate binary classification problem; each such problem answers the question of whether or not a document should be assigned to that category.

Page 7: Transductive Inference for Text Classification using Support Vector Machines

• Documents, which typically are strings of characters, have to be transformed into a representation suitable for the learning algorithm and the classification task.
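
The slides do not show code for this step; as an illustration only, here is a minimal Python sketch of one standard choice, a TF-IDF weighted bag-of-words representation, using scikit-learn's TfidfVectorizer as a stand-in (the toy documents and all names are hypothetical):

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy documents (hypothetical); each (stemmed) word becomes one feature.
docs = [
    "pepper and salt",
    "pepper and physics",
    "salt and pepper",
]

vectorizer = TfidfVectorizer()        # tokenizes, counts words, applies TF-IDF weighting
X = vectorizer.fit_transform(docs)    # sparse matrix: one row per document

print(vectorizer.get_feature_names_out())  # the learned vocabulary (features)
print(X.shape)                             # (number of documents, vocabulary size)
print(X.nnz)                               # total non-zero entries: document vectors are sparse

The resulting high-dimensional, sparse vectors are the kind of input the following sections assume.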

Pages 8-11: Transductive Support Vector Machines

Page 12: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• For a learning task P(x, y) = P(y|x) P(x), the learner L is given a hypothesis space H of functions h: X → {-1, +1} and an i.i.d. sample S_train of n training examples

  (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)   (2)

• Each training example consists of a document vector x and a binary label y ∈ {-1, +1}. In contrast to the inductive setting, the learner is also given an i.i.d. sample S_test of k test examples

  x*_1, x*_2, ..., x*_k   (3)

  from the same distribution.

Page 13: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• The transductive learner L aims to select a function h_L = L(S_train, S_test) from H, using S_train and S_test, so that the expected number of erroneous predictions

  R(L) = ∫ (1/k) Σ_{i=1}^{k} Θ(h_L(x*_i), y*_i) dP(x_1, y_1) ··· dP(x*_k, y*_k)

  on the test examples is minimized.

• Θ(a, b) is zero if a = b; otherwise it is one.

Page 14: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• Bounds on the relative uniform deviation of training error

  R_train(h) = (1/n) Σ_{i=1}^{n} Θ(h(x_i), y_i)   (4)

  and test error (computed with the true labels y*_i of the test examples)

  R_test(h) = (1/k) Σ_{i=1}^{k} Θ(h(x*_i), y*_i)   (5)

• With probability 1 - η:

  R_test(h) ≤ R_train(h) + Ω(n, k, d, η)   (6)

• where the confidence interval Ω depends on the number of training examples n, the number of test examples k, and the VC-Dimension d of H.

Page 15: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• What information do we get from studying the test sample (3), and how can we use it?

• The training and the test sample split the hypothesis space H into a finite number of equivalence classes H'.

• This reduces the learning problem from finding a function in the possibly infinite set H to finding one of the finitely many equivalence classes in H'.

• We can use these equivalence classes to build a structure of increasing VC-Dimension

  H'_1 ⊂ H'_2 ⊂ ... ⊂ H'

  for structural risk minimization.

Page 16: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• Vapnik shows that with the size of the margin we can control the maximum number of equivalence classes (i.e. the VC-Dimension).

Page 17: Transductive Inference for Text Classification using Support Vector Machines

Theorem 1

• Consider hyperplanes h(x) = sign{w·x + b} as hypothesis space H. If the attribute vectors of a training sample (2) and a test sample (3) are contained in a ball of diameter D, then there are at most

  N ≤ exp( d [ ln((n + k)/d) + 1 ] ),   d = min( D²/a², n + k ) + 1

  equivalence classes which contain a separating hyperplane with

  ∀ i = 1..n: |w·x_i + b| ≥ 1   and   ∀ j = 1..k: |w·x*_j + b| ≥ 1,

  where a = 1/||w|| is the margin.

Page 18: Transductive Inference for Text Classification using Support Vector Machines

Transductive Support Vector Machines

• Note that the VC-Dimension does not necessarily depend on the number of features, but can be much lower than the dimensionality of the space.

• Structural risk minimization tells us that we get the smallest bound on the test error if we select the equivalence class from the structure element H'_i which minimizes (6).

Page 19: Transductive Inference for Text Classification using Support Vector Machines

Transductive SVM (lin. sep. case) Optimization Problem 1

• Minimize over (y*_1, ..., y*_k, w, b):

  (1/2) ||w||²

  subject to:
  ∀ i = 1..n:  y_i [w·x_i + b] ≥ 1
  ∀ j = 1..k:  y*_j [w·x*_j + b] ≥ 1

Page 20: Transductive Inference for Text Classification using Support Vector Machines

Transductive SVM (non-sep. case) Optimization Problem 2

• Minimize over (y*_1, ..., y*_k, w, b, ξ_1, ..., ξ_n, ξ*_1, ..., ξ*_k):

  (1/2) ||w||² + C Σ_{i=1}^{n} ξ_i + C* Σ_{j=1}^{k} ξ*_j

  subject to:
  ∀ i = 1..n:  y_i [w·x_i + b] ≥ 1 - ξ_i
  ∀ j = 1..k:  y*_j [w·x*_j + b] ≥ 1 - ξ*_j
  ∀ i = 1..n:  ξ_i ≥ 0
  ∀ j = 1..k:  ξ*_j ≥ 0

• C and C* are parameters set by the user.
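
To make the objective concrete, the following small Python sketch (added for illustration; not from the slides) evaluates the OP2 objective for a fixed hyperplane (w, b) and a fixed labeling of the test examples, using the fact that for fixed (w, b) the optimal slacks are the hinge losses. All function and variable names are hypothetical.

import numpy as np

def tsvm_objective(w, b, X_train, y_train, X_test, y_test_assigned, C, C_star):
    # For a fixed hyperplane, the smallest feasible slacks are the hinge losses
    # xi = max(0, 1 - y * (w . x + b)).
    xi = np.maximum(0.0, 1.0 - y_train * (X_train @ w + b))              # training slacks
    xi_star = np.maximum(0.0, 1.0 - y_test_assigned * (X_test @ w + b))  # test slacks
    return 0.5 * np.dot(w, w) + C * xi.sum() + C_star * xi_star.sum()

This is the quantity that the label-switching algorithm described later tries to drive down.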

Page 21: Transductive Inference for Text Classification using Support Vector Machines

What Makes TSVMs Especially Well Suited for Text Classification

• The text classification task is characterized by a special set of properties:
  – High-dimensional input space: each (stemmed) word is a feature.
  – Document vectors are sparse: for each document, the corresponding document vector contains few entries that are not zero.
  – Few irrelevant features: experiments in [Joachims, 1998] suggest that most words are relevant.

Page 22: Transductive Inference for Text Classification using Support Vector Machines

What Makes TSVMs Especially Well Suited for Text Classification

• But how can TSVMs be any better?

• When asking the search engine Altavista about all documents containing the words pepper and salt, it returns 327,180 web pages.

• When asking for the documents with the words pepper and physics, we get only 4,220 hits, although physics is a more popular word on the web than salt.

• And it is this co-occurrence information that TSVMs exploit as prior knowledge about the learning task.

Page 23: Transductive Inference for Text Classification using Support Vector Machines

What Makes TSVMs Especially Well Suited for Text Classification

• Imagine document D1 was given as a training example for class A and document D6 was given as a training example for class B.

• How should we classify documents D2 to D4 (the test set)?

Page 24: Transductive Inference for Text Classification using Support Vector Machines

What Makes TSVMs Especially Well Suited for Text Classification

• The reason we choose this classification of the test data over the others stems from our prior knowledge about the properties of text and common text classification tasks.

Page 25: Transductive Inference for Text Classification using Support Vector Machines

What Makes TSVMs Especially Well Suited for Text Classification

• Note again that we got to this classification by studying the location of the test examples, which is not possible for an inductive learner.

• We see that the maximum margin bias reflects our prior knowledge about text classification well.

• By analyzing the test set, we can exploit this prior knowledge for learning.

Page 26: Transductive Inference for Text Classification using Support Vector Machines

Solving the Optimization Problem

• Training a transductive SVM means solving the (partly) combinatorial optimization problem OP2.

• The key idea of the algorithm is that it begins with a labeling of the test data based on the classification of an inductive SVM.

• Then it improves the solution by switching the labels of test examples so that the objective function decreases.

Page 27: Transductive Inference for Text Classification using Support Vector Machines

Solving the Optimization Problem

• It starts with training an inductive SVM on the training data and classifying the test data accordingly.

• Then it uniformly increases the influence of the test examples by incrementing the cost-factors C*_- and C*_+ up to the user-defined value of C* (loop 1).

• While the criterion in the condition of loop 2 identifies two examples for which changing the class labels leads to a decrease in the current objective function, these examples are switched.

Page 28: Transductive Inference for Text Classification using Support Vector Machines

Algorithm for training Transductive Support Vector Machines

• Input:
  – training examples (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
  – test examples x*_1, x*_2, ..., x*_k
• Parameters:
  – C, C*: parameters from OP(2)
  – num_+: number of test examples to be assigned to class +
• Output: predicted labels of the test examples y*_1, y*_2, ..., y*_k

(w, b, ξ, _) := solve_svm_qp([(x_1, y_1) ... (x_n, y_n)], [], C, 0, 0);
Classify the test examples using <w, b>. The num_+ test examples with the highest value of w·x*_j + b are assigned to the class +1 (y*_j := 1); the remaining test examples are assigned to class -1 (y*_j := -1).
C*_- := 10^-5;
C*_+ := 10^-5 · num_+ / (k - num_+);

Page 29: Transductive Inference for Text Classification using Support Vector Machines

Algorithm for training Transductive Support Vector Machines

while ((C*_- < C*) || (C*_+ < C*)) {                                  // loop 1
    (w, b, ξ, ξ*) := solve_svm_qp([(x_1, y_1) ... (x_n, y_n)], [(x*_1, y*_1) ... (x*_k, y*_k)], C, C*_-, C*_+);
    while (∃ m, l: (y*_m · y*_l < 0) & (ξ*_m > 0) & (ξ*_l > 0) & (ξ*_m + ξ*_l > 2)) {   // loop 2
        y*_m := -y*_m;                      // switch the labels of a positive and a
        y*_l := -y*_l;                      // negative test example and retrain
        (w, b, ξ, ξ*) := solve_svm_qp([(x_1, y_1) ... (x_n, y_n)], [(x*_1, y*_1) ... (x*_k, y*_k)], C, C*_-, C*_+);
    }
    C*_- := min(2 · C*_-, C*);
    C*_+ := min(2 · C*_+, C*);
}
return (y*_1, ..., y*_k);
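
As an illustration of this loop, here is a minimal Python sketch (not the author's implementation) that mirrors the structure of Algorithm 1 under the assumption that a solve_svm_qp-style helper is available; one possible approximation of that helper is sketched after Optimization Problem 3 below. All names are hypothetical, and dense NumPy feature matrices are assumed.

import numpy as np

def train_tsvm(X, y, X_test, C, C_star, num_pos, solve_svm_qp):
    # X, y: labeled training data (y in {-1, +1}); X_test: unlabeled test data.
    # solve_svm_qp(X, y, X_test, y_star, C, C_minus, C_plus) -> (w, b, xi, xi_star)
    k = len(X_test)

    # Step 1: inductive SVM on the labeled data only.
    w, b, _, _ = solve_svm_qp(X, y, None, None, C, 0.0, 0.0)

    # Step 2: assign the num_pos highest-scoring test examples to class +1.
    y_star = -np.ones(k)
    y_star[np.argsort(-(X_test @ w + b))[:num_pos]] = 1.0

    # Step 3: loop 1 - start with small test costs and double them up to C_star.
    C_minus = 1e-5
    C_plus = 1e-5 * num_pos / max(k - num_pos, 1)   # guard against num_pos == k
    while C_minus < C_star or C_plus < C_star:
        w, b, _, xi_star = solve_svm_qp(X, y, X_test, y_star, C, C_minus, C_plus)
        while True:  # loop 2 - switch label pairs that reduce the objective
            pos = [j for j in range(k) if y_star[j] > 0 and xi_star[j] > 0]
            neg = [j for j in range(k) if y_star[j] < 0 and xi_star[j] > 0]
            pair = next(((m, l) for m in pos for l in neg
                         if xi_star[m] + xi_star[l] > 2), None)
            if pair is None:
                break
            m, l = pair
            y_star[m], y_star[l] = -y_star[m], -y_star[l]
            w, b, _, xi_star = solve_svm_qp(X, y, X_test, y_star, C, C_minus, C_plus)
        C_minus = min(2 * C_minus, C_star)
        C_plus = min(2 * C_plus, C_star)
    return y_star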

Page 30: Transductive Inference for Text Classification using Support Vector Machines

Inductive SVM (primal) Optimization Problem 3

• The function solve_svm_qp refers to quadratic programs of the following type.

• Minimize over (w, b, ξ, ξ*):

  (1/2) ||w||² + C Σ_{i=1}^{n} ξ_i + C*_- Σ_{j: y*_j = -1} ξ*_j + C*_+ Σ_{j: y*_j = +1} ξ*_j

  subject to:
  ∀ i = 1..n:  y_i [w·x_i + b] ≥ 1 - ξ_i
  ∀ j = 1..k:  y*_j [w·x*_j + b] ≥ 1 - ξ*_j
  ∀ i = 1..n:  ξ_i ≥ 0
  ∀ j = 1..k:  ξ*_j ≥ 0
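
The slides leave solve_svm_qp abstract. As an assumption-laden sketch (not the author's code), OP3 can be approximated with an off-the-shelf weighted SVM: scikit-learn's SVC rescales the cost C per example via sample_weight, so the factors C, C*_- and C*_+ can be encoded as per-sample weights, and the slacks can be recovered afterwards as hinge losses of the fitted hyperplane. Dense NumPy matrices and the signature used in the sketch above are assumed.

import numpy as np
from sklearn.svm import SVC

def solve_svm_qp(X, y, X_test, y_star, C, C_minus, C_plus):
    if X_test is None:
        # Inductive case: only the labeled training examples take part.
        X_all, y_all = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
        weights = np.full(len(y_all), C)
    else:
        # Stack training and (currently labeled) test examples; each example
        # gets its own cost factor: C, or C*_- / C*_+ depending on its label.
        X_all = np.vstack([X, X_test])
        y_all = np.concatenate([y, y_star])
        weights = np.concatenate([np.full(len(y), C),
                                  np.where(np.asarray(y_star) > 0, C_plus, C_minus)])

    clf = SVC(kernel="linear", C=1.0)        # effective per-sample cost = 1.0 * weight
    clf.fit(X_all, y_all, sample_weight=weights)

    w, b = clf.coef_.ravel(), float(clf.intercept_[0])
    slack = np.maximum(0.0, 1.0 - y_all * (X_all @ w + b))   # hinge losses as slacks
    return w, b, slack[:len(y)], slack[len(y):]

Together with the loop sketched above, this is enough to run the whole procedure end to end on a small, dense feature matrix (e.g. a densified TF-IDF matrix).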

Page 31: Transductive Inference for Text Classification using Support Vector Machines

Theorem 2

• Algorithm 1 converges in a finite number of steps.

• The condition y*_m · y*_l < 0 in loop 2 requires that the examples to be switched have different class labels.

Page 32: Transductive Inference for Text Classification using Support Vector Machines

Theorem 2

• Let y*_m = 1 (and hence y*_l = -1), so that we can write the value of the objective function before the switch as

  J_1 = (1/2) ||w||² + C Σ_{i=1}^{n} ξ_i + C*_- Σ_{j: y*_j = -1} ξ*_j + C*_+ Σ_{j: y*_j = +1} ξ*_j

• After switching y*_m and y*_l, the old hyperplane (w, b) together with the slacks ξ'_m = max(2 - ξ*_m, 0) for example m and ξ'_l = max(2 - ξ*_l, 0) for example l (all other variables unchanged) is a feasible point of the new optimization problem. The optimal value J_2 after the switch therefore satisfies

  J_2 ≤ J_1 - C*_+ ξ*_m - C*_- ξ*_l + C*_+ ξ'_l + C*_- ξ'_m < J_1

Page 33: Transductive Inference for Text Classification using Support Vector Machines

Theorem 2

• The inequality holds due to the selection criterion in loop 2, since ξ*_m > ξ'_l = max(2 - ξ*_l, 0) and ξ*_l > ξ'_m = max(2 - ξ*_m, 0).

• This means that loop 2 is exited after a finite number of iterations, since there is only a finite number of permutations of the test examples.

• Loop 1 also terminates after a finite number of iterations, since C*_- and C*_+ are bounded by C*.

Page 34: Transductive Inference for Text Classification using Support Vector Machines

Conclusions and Outlook

• By taking a transductive instead of an inductive approach, the test set can be used as an additional source of information about margins.

• On all data sets the transductive approach showed improvements over the currently best performing method, most substantially for small training samples and large test sets.