Variations of Minimax Probability Machine

Huang, Kaizhu
2003-09-16



Page 1: Variations of  Minimax Probability Machine

Variations of Minimax Probability Machine

Huang, Kaizhu

2003-09-16

Page 2: Variations of  Minimax Probability Machine

Overview

• Classification
  – Types, problems
• Minimax Probability Machine
• Main work
  – Biased Minimax Probability Machine
  – Minimum Error Minimax Probability Machine
• Experiments
• Future work

Page 3: Variations of  Minimax Probability Machine

Classification

[Figure: two classes of sample points, x and y, separated by a decision hyperplane a^T z = b; points of class x satisfy a^T x ≥ b and points of class y satisfy a^T y ≤ b.]

Page 4: Variations of  Minimax Probability Machine

Types of Classifiers

• Generative Classifiers

• Discriminative Classifiers

Page 5: Variations of  Minimax Probability Machine

Classification—Generative Classifier

[Figure: the same two-class data, with class-conditional densities p1 and p2 fitted to the two classes and the boundary a^T z = b constructed from these densities.]

A generative model assumes specific distributions for the two classes of data and uses these distributions to construct the classification boundary.

Page 6: Variations of  Minimax Probability Machine

Problems of Generative Model

• "All models are wrong, but some are useful." (Box)

• The distributional assumptions lack generality and are often invalid in real cases

It seems that a generative model should not assume a specific model for the data

Page 7: Variations of  Minimax Probability Machine

Classification—Discriminative Classifier:SVM

[Figure: the same two-class data with the SVM decision hyperplane a^T z = b, which is determined by the support vectors lying on the margin.]

Page 8: Variations of  Minimax Probability Machine

Problems of SVM

[Figure: the SVM boundary a^T z = b depends only on the support vectors near the margin; how the rest of each class is distributed is ignored.]

It seems that the SVM should consider the distribution of the data

Page 9: Variations of  Minimax Probability Machine

GM:  it seems that a generative model should not assume specific models for the data.
SVM: it seems that the SVM should consider the distribution of the data.

Page 10: Variations of  Minimax Probability Machine

Minimax Probability Machine (MPM)

• Features:
  – With distribution considerations
  – With no specific distribution assumption

Page 11: Variations of  Minimax Probability Machine

Minimax Probability Machine

• With distribution considerations
  – Assume that the mean and covariance directly estimated from the data reliably represent the real mean and covariance
• Without specific distribution assumption
  – Directly construct classifiers from data

Page 12: Variations of  Minimax Probability Machine

Minimax Probability Machine (Formulation)

\max_{\alpha,\, a \neq 0,\, b} \; \alpha
\quad \text{s.t.} \quad
\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha,
\qquad
\inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{a^T y \leq b\} \geq \alpha

Objective: \alpha, the worst-case accuracy of classifying future data over all distributions with the given means and covariances.
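The step this slide summarises, via the worst-case Chebyshev bound (the Popescu and Bertsimas inequality cited on the reference slide), can be sketched as follows; this is the standard reduction used in the MPM papers rather than anything new:

\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha
\;\Longleftrightarrow\;
a^T \bar{x} - b \geq \kappa(\alpha)\,\sqrt{a^T \Sigma_x a},
\qquad
\kappa(\alpha) = \sqrt{\frac{\alpha}{1-\alpha}}.

Applying the same bound to class y and adding the two constraints eliminates b:

a^T(\bar{x} - \bar{y}) \geq \kappa(\alpha)\left(\sqrt{a^T \Sigma_x a} + \sqrt{a^T \Sigma_y a}\right),

so maximising \alpha is the same as maximising \kappa(\alpha), which leads to the second-order cone program on the next slide.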

Page 13: Variations of  Minimax Probability Machine

Minimax Probability Machine (Cont’d)

• MPM problem leads to Second Order Cone Programming

• Dual Problem

• Geometric interpretation

\min_{a \neq 0} \; \sqrt{a^T \Sigma_x a} + \sqrt{a^T \Sigma_y a}
\quad \text{s.t.} \quad a^T(\bar{x} - \bar{y}) = 1

The optimal value equals 1/\kappa_*, and the worst-case accuracy is \alpha_* = \kappa_*^2 / (1 + \kappa_*^2).

[Figure: geometric interpretation -- two ellipsoids centred at \bar{x} and \bar{y}, with shapes \Sigma_x and \Sigma_y, grown at the same rate until they touch; the optimal hyperplane is tangent to both at the contact point, where the vectors u_{opt} and v_{opt} meet.]
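As an illustration only, here is a minimal numerical sketch of this reduced problem, assuming NumPy and cvxpy are available; `mpm_fit` is a hypothetical helper name, the regularisation term is an implementation convenience not discussed in the slides, and the formulas for the threshold b and the bound \alpha follow the MPM papers cited on the reference slide.

```python
import numpy as np
import cvxpy as cp

def mpm_fit(X, Y, reg=1e-6):
    """Worst-case (distribution-free) linear separator from class samples X and Y."""
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    d = X.shape[1]
    Sx = np.cov(X, rowvar=False) + reg * np.eye(d)   # regularised covariance estimates
    Sy = np.cov(Y, rowvar=False) + reg * np.eye(d)
    Lx, Ly = np.linalg.cholesky(Sx), np.linalg.cholesky(Sy)   # Sx = Lx @ Lx.T

    a = cp.Variable(d)
    # min ||Sx^{1/2} a|| + ||Sy^{1/2} a||  s.t.  a^T (x_bar - y_bar) = 1
    obj = cp.Minimize(cp.norm(Lx.T @ a, 2) + cp.norm(Ly.T @ a, 2))
    prob = cp.Problem(obj, [a @ (x_bar - y_bar) == 1])
    inv_kappa = prob.solve()

    a_opt = a.value
    kappa = 1.0 / inv_kappa
    alpha = kappa**2 / (1.0 + kappa**2)               # worst-case accuracy bound
    b = a_opt @ x_bar - kappa * np.sqrt(a_opt @ Sx @ a_opt)   # decision threshold
    return a_opt, b, alpha
```

The returned rule classifies a new point z as class x when a_opt @ z >= b; alpha is the worst-case accuracy guarantee shared by both classes.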

Page 14: Variations of  Minimax Probability Machine

Minimax Probability Machine (Cont’d)

• Summary
  – Distribution-free
  – In the general case, the accuracy of classifying future data is bounded below by \alpha
  – Demonstrated to achieve performance comparable with the SVM

Page 15: Variations of  Minimax Probability Machine

Problems of MPM

\max_{\alpha,\, a \neq 0,\, b} \; \alpha
\quad \text{s.t.} \quad
\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha,
\qquad
\inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{a^T y \leq b\} \geq \alpha

1. In real cases, the two classes are not always equally important, which implies that the lower bound \alpha need not be the same for both classes.
   – Motivates the Biased Minimax Probability Machine

2. Even when the classes are equally important, there is no reason to require the two bounds to be equal; forcing them to be equal makes the derived model sub-optimal in this sense.
   – Motivates the Minimum Error Minimax Probability Machine

Page 16: Variations of  Minimax Probability Machine

Biased Minimax Probability Machine

• Observation: in diagnosing a severe epidemic disease, misclassifying the positive class causes more serious consequences than misclassifying the negative class.

• A typical setting: as long as the classification accuracy of the less important class is maintained at an acceptable level (specified by the practitioners), the classification accuracy of the important class should be as high as possible.

Page 17: Variations of  Minimax Probability Machine

Biased Minimax Probability Machine (BMPM)

• Objective

  \max_{\alpha,\, \beta,\, a \neq 0,\, b} \; \alpha
  \quad \text{s.t.} \quad
  \inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha,
  \qquad
  \inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{a^T y \leq b\} \geq \beta,
  \qquad
  \beta \geq \beta_0

  – \alpha has the same meaning as previously (the worst-case accuracy for the important class x)
  – \beta_0 is an acceptable accuracy level for the less important class y

• Equivalently,

  \max_{\alpha,\, a \neq 0,\, b} \; \kappa(\alpha)
  \quad \text{s.t.} \quad
  a^T \bar{x} - b \geq \kappa(\alpha)\sqrt{a^T \Sigma_x a},
  \qquad
  b - a^T \bar{y} \geq \kappa(\beta_0)\sqrt{a^T \Sigma_y a},

  where \kappa(\alpha) = \sqrt{\alpha / (1-\alpha)} and \kappa(\beta_0) = \sqrt{\beta_0 / (1-\beta_0)}.

Page 18: Variations of  Minimax Probability Machine

BMPM (Cont'd)

• Objective

  \max_{\alpha,\, a \neq 0,\, b} \; \kappa(\alpha)
  \quad \text{s.t.} \quad
  a^T \bar{x} - b \geq \kappa(\alpha)\sqrt{a^T \Sigma_x a},
  \qquad
  b - a^T \bar{y} \geq \kappa(\beta_0)\sqrt{a^T \Sigma_y a}

• Equivalently (eliminating b),

  \max_{a \neq 0} \; \kappa(\alpha)
  \quad \text{s.t.} \quad
  a^T(\bar{x} - \bar{y}) \geq \kappa(\alpha)\sqrt{a^T \Sigma_x a} + \kappa(\beta_0)\sqrt{a^T \Sigma_y a}

• Equivalently (normalising a^T(\bar{x} - \bar{y}) = 1),

  \max_{a \neq 0} \; \frac{1 - \kappa(\beta_0)\sqrt{a^T \Sigma_y a}}{\sqrt{a^T \Sigma_x a}}
  \quad \text{s.t.} \quad
  a^T(\bar{x} - \bar{y}) = 1

Page 19: Variations of  Minimax Probability Machine

BMPM (Cont’d)

• Parametric method for the fractional program above (a sketch of the iteration follows below):

  1. With the current value \lambda_n, find a by solving

     \max_{a \neq 0} \; 1 - \kappa(\beta_0)\sqrt{a^T \Sigma_y a} - \lambda_n \sqrt{a^T \Sigma_x a}
     \quad \text{s.t.} \quad a^T(\bar{x} - \bar{y}) = 1,

     or equivalently

     \min_{a \neq 0} \; \kappa(\beta_0)\sqrt{a^T \Sigma_y a} + \lambda_n \sqrt{a^T \Sigma_x a}
     \quad \text{s.t.} \quad a^T(\bar{x} - \bar{y}) = 1.

  2. Update

     \lambda_{n+1} = \frac{1 - \kappa(\beta_0)\sqrt{a^T \Sigma_y a}}{\sqrt{a^T \Sigma_x a}}.

• Each subproblem has the same form as the MPM problem and can be solved by the same iterative least-squares approach.
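A minimal sketch of the parametric iteration above, treating it as a Dinkelbach-style update and handing each subproblem to a generic conic solver (cvxpy) instead of the least-squares routine the slide mentions; `bmpm_fit`, `beta0`, and the convergence tolerance are illustrative choices, not the authors' implementation.

```python
import numpy as np
import cvxpy as cp

def bmpm_fit(X, Y, beta0, n_iter=20, reg=1e-6):
    """Biased MPM: maximise worst-case accuracy on class X while keeping
    the worst-case accuracy on class Y at least beta0."""
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    d = X.shape[1]
    Sx = np.cov(X, rowvar=False) + reg * np.eye(d)
    Sy = np.cov(Y, rowvar=False) + reg * np.eye(d)
    Lx, Ly = np.linalg.cholesky(Sx), np.linalg.cholesky(Sy)
    k_beta0 = np.sqrt(beta0 / (1.0 - beta0))           # kappa(beta0)

    lam = 1.0                                          # initial lambda
    a_opt = None
    for _ in range(n_iter):
        a = cp.Variable(d)
        # min kappa(beta0)*||Sy^{1/2} a|| + lambda*||Sx^{1/2} a||  s.t.  a^T(x_bar - y_bar) = 1
        obj = cp.Minimize(k_beta0 * cp.norm(Ly.T @ a, 2) + lam * cp.norm(Lx.T @ a, 2))
        cp.Problem(obj, [a @ (x_bar - y_bar) == 1]).solve()
        a_opt = a.value
        new_lam = (1.0 - k_beta0 * np.sqrt(a_opt @ Sy @ a_opt)) / np.sqrt(a_opt @ Sx @ a_opt)
        if abs(new_lam - lam) < 1e-6:                  # fixed point reached
            lam = new_lam
            break
        lam = new_lam

    kappa_alpha = lam                                  # at the fixed point, lambda = kappa(alpha)
    # note: if beta0 is too demanding the numerator can go negative and the bound becomes vacuous
    alpha = kappa_alpha**2 / (1.0 + kappa_alpha**2)    # worst-case accuracy of the important class
    b = a_opt @ x_bar - kappa_alpha * np.sqrt(a_opt @ Sx @ a_opt)
    return a_opt, b, alpha
```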

Page 20: Variations of  Minimax Probability Machine

Biased Minimax Probability Machine

[Figure: compared with the MPM hyperplane a^T z = b, the BMPM hyperplane a_{bmpm}^T z = b_{bmpm} is shifted towards the less important class y: the important class x is classified with higher accuracy while class y is kept at an acceptable accuracy level.]

Page 21: Variations of  Minimax Probability Machine

Minimum Error Minimax Probability Machine

[Figure: two class-conditional densities p1 and p2 along the projection direction. Left: the decision plane obtained when the two worst-case bounds are forced to be equal (\alpha = \beta), as in the MPM. Right: the optimal decision plane, which allows \alpha \neq \beta; the tails 1-\alpha and 1-\beta are the worst-case error rates of the two classes.]

MPM:

\max_{\alpha,\, a \neq 0,\, b} \; \alpha
\quad \text{s.t.} \quad
\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha,
\qquad
\inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{a^T y \leq b\} \geq \alpha

MEMPM:

\max_{\alpha,\, \beta,\, a \neq 0,\, b} \; \theta\alpha + (1-\theta)\beta
\quad \text{s.t.} \quad
\inf_{x \sim (\bar{x}, \Sigma_x)} \Pr\{a^T x \geq b\} \geq \alpha,
\qquad
\inf_{y \sim (\bar{y}, \Sigma_y)} \Pr\{a^T y \leq b\} \geq \beta

where \theta is the prior probability of class x.

The MEMPM achieves the distribution-free Bayes optimal hyperplane in the worst-case setting.

Page 22: Variations of  Minimax Probability Machine

Minimum Error Minimax Probability Machine

• The MEMPM achieves the Bayes optimal hyperplane when we assume a specific distribution, e.g. a Gaussian distribution, on the data.

Lemma: if the distribution of the normalized random variable is independent of a, the classifier derived by the MEMPM will exactly represent the real Bayes optimal hyperplane.

Page 23: Variations of  Minimax Probability Machine

MEMPM (Cont'd)

• Objective

  \max_{\alpha,\, \beta,\, a \neq 0} \; \theta\alpha + (1-\theta)\beta
  \quad \text{s.t.} \quad
  \kappa(\alpha)\sqrt{a^T \Sigma_x a} + \kappa(\beta)\sqrt{a^T \Sigma_y a} \leq 1,
  \qquad
  a^T(\bar{x} - \bar{y}) = 1

• Equivalently (using \alpha = \kappa(\alpha)^2 / (1 + \kappa(\alpha)^2), i.e. 1 - \alpha = 1 / (1 + \kappa(\alpha)^2)),

  \min_{\kappa(\alpha),\, \kappa(\beta),\, a \neq 0} \; \frac{\theta}{1 + \kappa(\alpha)^2} + \frac{1-\theta}{1 + \kappa(\beta)^2}
  \quad \text{s.t.} \quad
  \kappa(\alpha)\sqrt{a^T \Sigma_x a} + \kappa(\beta)\sqrt{a^T \Sigma_y a} \leq 1,
  \qquad
  a^T(\bar{x} - \bar{y}) = 1

Page 24: Variations of  Minimax Probability Machine

MEMPM (Cont'd)

• Objective: with \kappa(\beta) fixed, the inner problem over a is exactly a BMPM, so the full problem is

  \max_{\beta,\, a \neq 0} \; \theta\,\frac{\kappa(\alpha)^2}{1 + \kappa(\alpha)^2} + (1-\theta)\,\beta
  \quad \text{s.t.} \quad
  a^T(\bar{x} - \bar{y}) = 1,
  \qquad
  \text{where } \kappa(\alpha) = \frac{1 - \kappa(\beta)\sqrt{a^T \Sigma_y a}}{\sqrt{a^T \Sigma_x a}}

• Line search over \beta + sequential BMPM method (a sketch of this loop follows below)
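A rough sketch of the outer loop, with a simple grid over \beta standing in for the line search and the `bmpm_fit` sketch above used as the sequential BMPM step; `mempm_fit`, the default grid, and the prior estimate are illustrative, and a careful implementation would also guard against values of \beta that make the inner BMPM bound vacuous.

```python
import numpy as np

def mempm_fit(X, Y, theta=None, betas=np.linspace(0.05, 0.95, 19)):
    """MEMPM by a crude line search over beta; relies on bmpm_fit from the earlier sketch."""
    if theta is None:
        theta = len(X) / (len(X) + len(Y))      # estimate the class-x prior from sample sizes
    best = None
    for beta in betas:
        a, b, alpha = bmpm_fit(X, Y, beta0=beta)    # worst-case accuracy for class x given beta
        score = theta * alpha + (1.0 - theta) * beta
        if best is None or score > best[0]:
            best = (score, a, b, alpha, beta)
    score, a, b, alpha, beta = best
    return a, b, alpha, beta
```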

Page 25: Variations of  Minimax Probability Machine

Kernelized Version

• Kernelized BMPM: map the data into feature space via \varphi : R^n \to R^f and apply the BMPM there,

  \max_{a \neq 0} \; \frac{1 - \kappa(\beta_0)\sqrt{a^T \Sigma_{\varphi(y)} a}}{\sqrt{a^T \Sigma_{\varphi(x)} a}}
  \quad \text{s.t.} \quad
  a^T\big(\overline{\varphi(x)} - \overline{\varphi(y)}\big) = 1,

  with \varphi(x) \sim (\overline{\varphi(x)}, \Sigma_{\varphi(x)}) and \varphi(y) \sim (\overline{\varphi(y)}, \Sigma_{\varphi(y)}).

• where a lies in the span of the mapped training points,

  a = \sum_{i=1}^{N_x} w_i\, \varphi(x_i) + \sum_{j=1}^{N_y} w_{N_x + j}\, \varphi(y_j),

  and the feature-space means and covariances are estimated empirically:

  \overline{\varphi(x)} = \frac{1}{N_x}\sum_{i=1}^{N_x} \varphi(x_i),
  \qquad
  \overline{\varphi(y)} = \frac{1}{N_y}\sum_{j=1}^{N_y} \varphi(y_j),

  \Sigma_{\varphi(x)} = \frac{1}{N_x}\sum_{i=1}^{N_x} \big(\varphi(x_i) - \overline{\varphi(x)}\big)\big(\varphi(x_i) - \overline{\varphi(x)}\big)^T,
  \qquad
  \Sigma_{\varphi(y)} = \frac{1}{N_y}\sum_{j=1}^{N_y} \big(\varphi(y_j) - \overline{\varphi(y)}\big)\big(\varphi(y_j) - \overline{\varphi(y)}\big)^T.

Page 26: Variations of  Minimax Probability Machine

Kernelized Version (Cont'd)

• Kernelized BMPM, written entirely in terms of the Gram matrix:

  \max_{w \neq 0} \; \frac{1 - \kappa(\beta_0)\sqrt{\tfrac{1}{N_y}\, w^T \tilde{K}_y^T \tilde{K}_y\, w}}{\sqrt{\tfrac{1}{N_x}\, w^T \tilde{K}_x^T \tilde{K}_x\, w}}
  \quad \text{s.t.} \quad
  w^T(\tilde{k}_x - \tilde{k}_y) = 1

• where z = [x_1, \ldots, x_{N_x}, y_1, \ldots, y_{N_y}] collects all training points, w = [w_1, \ldots, w_{N_x + N_y}]^T,
  K_{ij} = \varphi(z_i)^T \varphi(z_j) is the Gram matrix with row blocks K_x (rows for the x points) and K_y (rows for the y points),

  [\tilde{k}_x]_i = \frac{1}{N_x}\sum_{j=1}^{N_x} K(z_i, x_j),
  \qquad
  [\tilde{k}_y]_i = \frac{1}{N_y}\sum_{j=1}^{N_y} K(z_i, y_j),

  and \tilde{K}_x = K_x - \mathbf{1}_{N_x}\tilde{k}_x^T, \; \tilde{K}_y = K_y - \mathbf{1}_{N_y}\tilde{k}_y^T are the centred blocks (assembled numerically in the sketch below)

• and the resulting decision function is

  f(z) = \sum_{i=1}^{N_x} w_i^{*} K(x_i, z) + \sum_{j=1}^{N_y} w_{N_x + j}^{*} K(y_j, z) - b^{*},

  with a new point z assigned to class x when f(z) \geq 0.
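A small sketch of assembling these Gram-matrix quantities for a Gaussian kernel; the function names and the kernel width `sigma` are illustrative, and solving the resulting fractional program in w is assumed to be done by the same parametric routine sketched earlier (with \tilde{K}_x^T \tilde{K}_x / N_x and \tilde{K}_y^T \tilde{K}_y / N_y playing the roles of \Sigma_x and \Sigma_y).

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    """K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma**2))

def gram_quantities(X, Y, sigma=1.0):
    """Kernelized means and centred Gram blocks used by the kernelized BMPM."""
    Z = np.vstack([X, Y])                        # z = [x_1..x_Nx, y_1..y_Ny]
    Nx = len(X)
    K = gaussian_kernel(Z, Z, sigma)             # full (Nx+Ny) x (Nx+Ny) Gram matrix
    k_x = K[:, :Nx].mean(axis=1)                 # k~_x[i] = (1/Nx) sum_j K(z_i, x_j)
    k_y = K[:, Nx:].mean(axis=1)                 # k~_y[i] = (1/Ny) sum_j K(z_i, y_j)
    Kx = K[:Nx, :] - np.outer(np.ones(Nx), k_x)  # centred block for the x rows
    Ky = K[Nx:, :] - np.outer(np.ones(len(Y)), k_y)  # centred block for the y rows
    return K, k_x, k_y, Kx, Ky

def decision_function(z_new, X, Y, w, b, sigma=1.0):
    """f(z) = sum_i w_i K(z_i, z) - b, evaluated on new points z_new."""
    Z = np.vstack([X, Y])
    return gaussian_kernel(z_new, Z, sigma) @ w - b
```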

Page 27: Variations of  Minimax Probability Machine

Illustration of kernel methods

[Figure: the same data classified with a linear decision boundary (left) and with a kernelized, nonlinear decision boundary (right).]

Page 28: Variations of  Minimax Probability Machine

Experimental results (BMPM)

• Five benchmark datasets
  – Twonorm, Breast, Ionosphere, Pima, Sonar
• Procedure (an evaluation sketch follows below)
  – 5-fold cross validation
  – Linear
  – Gaussian kernel

• Parameter setting
  – Pima: 20.0%
  – Others: 60.0%
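A sketch of the kind of evaluation loop described above, assuming the linear `bmpm_fit` helper sketched earlier; the fold construction and the seed are illustrative, not the authors' exact protocol.

```python
import numpy as np

def cross_validate_bmpm(X, Y, beta0, n_folds=5, seed=0):
    """k-fold CV accuracies of the linear BMPM sketched earlier (relies on bmpm_fit)."""
    rng = np.random.default_rng(seed)
    x_folds = np.array_split(rng.permutation(len(X)), n_folds)
    y_folds = np.array_split(rng.permutation(len(Y)), n_folds)
    accs = []
    for k in range(n_folds):
        x_te, y_te = x_folds[k], y_folds[k]
        x_tr = np.setdiff1d(np.arange(len(X)), x_te)
        y_tr = np.setdiff1d(np.arange(len(Y)), y_te)
        a, b, _ = bmpm_fit(X[x_tr], Y[y_tr], beta0)
        acc_x = np.mean(X[x_te] @ a >= b)        # accuracy on the important class x
        acc_y = np.mean(Y[y_te] @ a < b)         # accuracy on the less important class y
        accs.append((acc_x, acc_y))
    return np.mean(accs, axis=0)                 # mean accuracies over the folds
```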

Page 29: Variations of  Minimax Probability Machine

Experimental results

Page 30: Variations of  Minimax Probability Machine

Experiments for MEMPM

• Six benchmark datasets
  – Twonorm, Breast, Ionosphere, Pima, Heart, Vote
• Procedure
  – 10-fold cross validation
  – Linear
  – Gaussian kernel

Page 31: Variations of  Minimax Probability Machine

Results for MEMPM

Page 32: Variations of  Minimax Probability Machine

Experiments for MEMPM

• Six benchmark datasets
  – Twonorm, Breast, Ionosphere, Pima, Heart, Vote
• Procedure
  – 10-fold cross validation
  – Linear
  – Gaussian kernel

Page 33: Variations of  Minimax Probability Machine

Results for MEMPM

Page 34: Variations of  Minimax Probability Machine

Conclusions and Future Work

• Conclusions
  – First quantitative method to analyze the biased classification task
  – Minimizes the classification error rate in the worst case

• Future work
  – Improve the efficiency of the algorithm, especially in the kernelized version
    • Any decomposition method?
  – Robust estimation
  – Relation between the VC bound in the Support Vector Machine and the bound in the MEMPM
  – Regression model?

Page 35: Variations of  Minimax Probability Machine

Reference

• Popescu, I. and Bertsimas, D. (2001). Optimal inequalities in probability theory: A convex optimization approach. Technical Report TM62, INSEAD.

• Lanckriet, G. R. G., El Ghaoui, L., and Jordan, M. I. (2002). Minimax probability machine. In Advances in Neural Information Processing Systems (NIPS) 14, Cambridge, MA. MIT Press.

• Huang, K., Yang, H., King, I., Lyu, M. R., and Chan, L. (2003). Biased minimax probability machine.

• Huang, K., Yang, H., King, I., Lyu, M. R., and Chan, L. (2003). Minimum error minimax probability machine.