Classification Theory
Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

DESCRIPTION

AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber. "Modern Operational Research and Its Mathematical Methods" course.

TRANSCRIPT

Page 1: Classification Theory

Classification Theory

Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National University of Technology of the Ukraine, Kiev, Ukraine, August 5-16, 2009


Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*

Institute of Applied Mathematics, METU, Ankara, Turkey

* Faculty of Economics, Management Science and Law, University of Siegen, Germany
Center for Research on Optimization and Control, University of Aveiro, Portugal

Page 2: Classification Theory

Motivation: Prediction of Cleavage Sites

[Figure: a protein sequence divided into its signal part and mature part; γ marks the cleavage site.]

Page 3: Classification Theory

Logistic Regression

$$\log\frac{P(Y=1 \mid X=x^l)}{P(Y=0 \mid X=x^l)} = \beta_0 + \beta_1 x_1^l + \beta_2 x_2^l + \dots + \beta_p x_p^l \qquad (l = 1,2,\dots,N).$$
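A minimal sketch of fitting this model, assuming scikit-learn and synthetic feature vectors in place of the encoded sequence windows (the slide does not specify the encoding):

```python
# A minimal sketch (not the authors' code): logistic regression as a linear
# log-odds model; synthetic data stands in for encoded sequence windows.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                        # 200 samples x^l, p = 5 features
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
p1 = 1.0 / (1.0 + np.exp(-(0.3 + X @ beta_true)))    # P(Y=1 | X = x^l)
y = (rng.random(200) < p1).astype(int)

model = LogisticRegression().fit(X, y)
print("beta_0 ~", model.intercept_[0])
print("beta   ~", model.coef_[0])                    # estimates of beta_1..beta_p
```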

Page 4: Classification Theory

Linear Classifiers

Maximum margin classifier:

$$\gamma_i := y_i\,(\langle w, x_i \rangle + b).$$

Note: $\gamma_i > 0$ implies correct classification.

[Figure: separating hyperplane with margin γ; the closest points of each class satisfy $y_j(\langle w, x_j\rangle + b) = 1$ and $y_k(\langle w, x_k\rangle + b) = 1$.]

Page 5: Classification Theory

Linear Classifiers

• The geometric margin:

$$\gamma = \frac{2}{\|w\|_2}.$$

Maximizing $\frac{2}{\|w\|_2}$ is equivalent to minimizing $\|w\|_2^2$, which yields a convex problem:

$$\min_{w,b}\ \|w\|_2^2 \quad \text{subject to} \quad y_i\,(\langle w, x_i\rangle + b) \ge 1 \qquad (i = 1,2,\dots,\ell).$$
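As an illustration (the lecture does not prescribe a solver), this convex problem can be handed to a generic QP solver; here a minimal cvxpy sketch on toy separable data:

```python
# A minimal sketch of the hard-margin primal QP via cvxpy (an assumption,
# not the lecture's tool), on toy linearly separable data.
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = cp.Variable(2), cp.Variable()
constraints = [cp.multiply(y, X @ w + b) >= 1]       # y_i(<w, x_i> + b) >= 1
cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints).solve()
print("w =", w.value, " b =", b.value)
print("geometric margin 2/||w|| =", 2 / np.linalg.norm(w.value))
```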

Page 6: Classification Theory

Linear Classifiers

Dual Problem:

$$\max_\alpha\ \sum_{i=1}^{\ell}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle$$

subject to

$$\sum_{i=1}^{\ell}\alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\dots,\ell).$$

Page 7: Classification Theory

Linear Classifiers

Dual Problem (kernelized):

$$\max_\alpha\ \sum_{i=1}^{\ell}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j y_i y_j\, \kappa(x_i, x_j)$$

subject to

$$\sum_{i=1}^{\ell}\alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\dots,\ell),$$

where $\kappa(\cdot,\cdot)$ is the kernel function.
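A minimal cvxpy sketch of this kernelized dual on toy data, with a degree-2 polynomial kernel as an illustrative choice:

```python
# A minimal sketch of the kernelized dual QP via cvxpy (an assumption,
# as before); the polynomial kernel below is an illustrative choice.
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = (1.0 + X @ X.T) ** 2                     # kappa(x, z) = (1 + x^T z)^2
Q = np.outer(y, y) * K                       # Q_ij = y_i y_j kappa(x_i, x_j)

ell = len(y)
alpha = cp.Variable(ell)
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.quad_form(alpha, cp.psd_wrap(Q)))
cp.Problem(objective, [alpha >= 0, y @ alpha == 0]).solve()
print("alpha =", np.round(alpha.value, 4))   # nonzero entries: support vectors
```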

Page 8: Classification Theory

Linear Classifiers

Soft Margin Classifier:

• Introduce slack variables $\xi_i$ to allow the margin constraints to be violated:

$$\min_{w,b,\xi}\ \|w\|_2^2 + C\sum_{i=1}^{\ell}\xi_i$$

$$\text{subject to} \quad y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \qquad (i = 1,2,\dots,\ell).$$
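A brief illustration, assuming scikit-learn: the parameter C weights the slack penalty, so smaller C tolerates more margin violations:

```python
# A minimal sketch (scikit-learn assumed): C trades the margin width
# against the slack penalty on overlapping toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.2, (50, 2)), rng.normal(-1.0, 1.2, (50, 2))])
y = np.array([1] * 50 + [-1] * 50)

for C in (0.1, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C = {C}: {clf.n_support_.sum()} support vectors")
```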

Page 9: Classification Theory

Linear Classifiers

• Projection of the data into a higher-dimensional feature space.

• Mapping the input space X into a new space F:

$$\phi: X \to F, \qquad x = (x_1,\dots,x_n) \mapsto \phi(x) = (\phi_1(x),\dots,\phi_N(x)).$$

[Figure: inputs of two classes (x and o) mapped by φ into the feature space, where they become linearly separable.]
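A standard worked example (not from the slides): for n = 2, the explicit map $\phi(x) = (x_1^2, \sqrt{2}\,x_1x_2, x_2^2)$ realizes the kernel $(x^Tz)^2$ as an inner product in F:

```python
# Worked check: the inner product in feature space F equals the kernel in X.
import numpy as np

def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z))        # 1.0
print((x @ z) ** 2)           # 1.0  -- same value, computed without phi
```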

Page 10: Classification Theory

Nonlinear Classifiers

set of hypotheses:

$$f(x) = \langle w, \phi(x)\rangle + b = \sum_{i=1}^{N} w_i\,\phi_i(x) + b,$$

dual representation:

$$f(x) = \sum_{i=1}^{\ell} \alpha_i y_i \langle \phi(x_i), \phi(x)\rangle + b,$$

where $\langle \phi(x_i), \phi(x)\rangle$ is the kernel function.

Ex.: polynomial kernels $\kappa(x, z) = (1 + x^T z)^k$,
sigmoid kernel $\kappa(x, z) = \tanh(a\,x^T z + b)$,
Gaussian (RBF) kernel $\kappa(x, z) = \exp(-\|x - z\|_2^2/\sigma^2)$.
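Minimal numpy sketches of the three example kernels:

```python
# Minimal sketches of the kernels listed above (numpy only).
import numpy as np

def poly_kernel(x, z, k=2):
    """(1 + x^T z)^k"""
    return (1.0 + x @ z) ** k

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """tanh(a x^T z + b)"""
    return np.tanh(a * (x @ z) + b)

def rbf_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / sigma^2)"""
    return np.exp(-np.sum((x - z) ** 2) / sigma**2)
```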

Page 11: Classification Theory

(In-)Finite Kernel Learning

• Based on the motivation of multiple kernel learning (MKL): combine K kernel functions,

$$\kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k\, \kappa_k(x_i, x_j), \qquad \beta_k \ge 0 \ (k = 1,\dots,K), \quad \sum_{k=1}^{K}\beta_k = 1.$$

• Semi-infinite LP formulation:

(SILP MKL)

$$\max_{\theta,\beta}\ \theta, \qquad \theta \in \mathbb{R},\ \beta \in \mathbb{R}^K,$$

$$\text{such that } 0 \le \beta,\ \sum_{k=1}^{K}\beta_k = 1,\ \sum_{k=1}^{K}\beta_k S_k(\alpha) \ge \theta \quad \forall\alpha \in \mathbb{R}^{\ell} \text{ with } 0 \le \alpha \le C\mathbf{1} \text{ and } \sum_{i=1}^{\ell}\alpha_i y_i = 0,$$

where

$$S_k(\alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j y_i y_j\,\kappa_k(x_i, x_j) - \sum_{i=1}^{\ell}\alpha_i.$$
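One common way to solve (SILP MKL) is an exchange (cutting-plane) scheme that alternates between the restricted master LP in (θ, β) and the lower-level SVM dual for fixed β. The following hedged cvxpy sketch on toy data illustrates the idea; it is not the lecture's implementation:

```python
# Hedged sketch of the SILP-MKL exchange loop; cvxpy and the two toy
# kernels are assumptions made for illustration.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1, 1, (20, 2)), rng.normal(-1, 1, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)
ell, C = len(y), 1.0
kernels = [X @ X.T, (1.0 + X @ X.T) ** 2]        # kappa_1 (linear), kappa_2 (poly)
Yo = np.outer(y, y)

def S(k, a):                                      # S_k(alpha) from the slide
    return 0.5 * a @ (Yo * kernels[k]) @ a - a.sum()

beta_val = np.full(len(kernels), 1.0 / len(kernels))
cuts = []
for _ in range(10):
    # lower level: for fixed beta, minimize sum_k beta_k S_k(alpha) over alpha
    a = cp.Variable(ell)
    Q = sum(b * Yo * Kk for b, Kk in zip(beta_val, kernels))
    cp.Problem(cp.Minimize(0.5 * cp.quad_form(a, cp.psd_wrap(Q)) - cp.sum(a)),
               [a >= 0, a <= C, y @ a == 0]).solve()
    cuts.append(a.value)
    # restricted master LP: max theta s.t. sum_k beta_k S_k(alpha^t) >= theta
    beta, theta = cp.Variable(len(kernels), nonneg=True), cp.Variable()
    cons = [cp.sum(beta) == 1]
    cons += [sum(beta[k] * S(k, at) for k in range(len(kernels))) >= theta
             for at in cuts]
    cp.Problem(cp.Maximize(theta), cons).solve()
    beta_val = beta.value
print("beta =", np.round(beta_val, 3))
```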

Page 12: Classification Theory

Infinite Kernel Learning: Infinite Programming

ex.: homotopy between a polynomial and a Gaussian kernel,

$$\kappa(x_i, x_j, \omega) := (1-\omega)\,(1 + x_i^T x_j)^d + \omega\, \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\right),$$

$$H(\omega) := \kappa(x_i, x_j, \omega), \qquad H(0) = (1 + x_i^T x_j)^d, \qquad H(1) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\right).$$

infinite programming:

$$\kappa_\beta(x_i, x_j) := \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega).$$
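A minimal numpy sketch of this homotopy family; d and σ* are illustrative choices:

```python
# Minimal sketch of the homotopy kernel kappa(x_i, x_j, omega), omega in [0, 1];
# d = 2 and sigma_star = 1.0 are illustrative, not prescribed by the slide.
import numpy as np

def homotopy_kernel(x, z, omega, d=2, sigma_star=1.0):
    """(1 - omega) * polynomial kernel + omega * Gaussian kernel."""
    poly = (1.0 + x @ z) ** d
    gauss = np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma_star**2))
    return (1.0 - omega) * poly + omega * gauss

x, z = np.array([1.0, 0.5]), np.array([0.0, 1.0])
print([homotopy_kernel(x, z, w) for w in (0.0, 0.5, 1.0)])  # H(0), H(1/2), H(1)
```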

Page 13: Classification Theory

Infinite Kernel Learning: Infinite Programming

$$\Omega = [0,1], \qquad \kappa_\beta(x_i, x_j) = \int_\Omega \kappa(x_i, x_j, \omega)\, d\beta(\omega).$$

• Introducing Riemann-Stieltjes integrals into the problem (SILP MKL), we get the following general problem formulation:

Page 14: Classification Theory

Infinite Kernel Learning: Infinite Programming

• Introducing Riemann-Stieltjes integrals into the problem (SILP MKL), we get the following general problem formulation:

(IP)

$$\max_{\theta,\beta}\ \theta, \qquad \theta \in \mathbb{R}, \quad \beta: [0,1] \to \mathbb{R} \ \text{monotonically increasing},$$

$$\text{subject to} \quad \int_0^1 S(\omega, \alpha)\, d\beta(\omega) - \sum_{i=1}^{\ell}\alpha_i \ge \theta \ \ \forall\alpha \in A, \qquad \int_0^1 d\beta(\omega) = 1,$$

where

$$S(\omega, \alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j y_i y_j\, \kappa(x_i, x_j, \omega), \qquad A := \Big\{\alpha \in \mathbb{R}^{\ell} \ \Big|\ 0 \le \alpha \le C\mathbf{1},\ \sum_{i=1}^{\ell}\alpha_i y_i = 0 \Big\},$$

$$T(\omega, \alpha) := S(\omega, \alpha) - \sum_{i=1}^{\ell}\alpha_i.$$
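A hedged illustration of the connection back to (SILP MKL): when β is a step function with jumps β_k at points ω_k, the Riemann-Stieltjes integral collapses to a finite kernel combination:

```python
# Hedged illustration: for a step function beta with jumps beta_k > 0 at
# points omega_k (jumps summing to 1), the integral defining kappa_beta
# reduces to the finite MKL combination of the earlier slide.
import numpy as np

def homotopy_kernel(x, z, omega, d=2, sigma_star=1.0):
    poly = (1.0 + x @ z) ** d
    gauss = np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma_star**2))
    return (1.0 - omega) * poly + omega * gauss

x, z = np.array([1.0, 0.5]), np.array([0.0, 1.0])
omegas, jumps = [0.2, 0.7], [0.4, 0.6]           # jump points and sizes
kappa_beta = sum(b * homotopy_kernel(x, z, w) for w, b in zip(omegas, jumps))
print("kappa_beta(x, z) =", kappa_beta)
```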

Page 15: Classification Theory

Infinite Kernel Learning: Infinite Programming

(IP) is an infinite programming problem. The dual of (IP):

(DIP)

$$\min_{\sigma,\rho}\ \sigma, \qquad \sigma \in \mathbb{R}, \quad \rho \ \text{a positive measure on } A,$$

$$\text{such that} \quad \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha) \ge 0 \ \ \forall\omega \in \Omega, \qquad \int_A d\rho(\alpha) = 1.$$

• Duality Conditions: Let $(\theta,\beta)$ and $(\sigma,\rho)$ be feasible for their respective problems and complementary slack, so that
$\beta$ has measure only where $\sigma = \int_A T(\omega,\alpha)\, d\rho(\alpha)$, and
$\rho$ has measure only where $\theta = \int_\Omega T(\omega,\alpha)\, d\beta(\omega)$.
Then, both solutions are optimal for their respective problems.

Page 16: Classification Theory

Infinite Kernel Learning: Infinite Programming

• The interesting theoretical problem here is to find conditions which ensure that solutions are point masses (i.e., the original monotonic β is a step function).

• Because of this, and in view of the compactness of the feasible (index) sets A and Ω at the lower levels, we are interested in the nondegeneracy of the local minima of the lower level problem, so as to get finitely many local minimizers.

• Lower Level Problem: For a given parameter $(\sigma, \rho)$, we consider

(LLP)

$$\min_{\omega}\ g((\sigma,\rho), \omega) \quad \text{subject to} \quad \omega \in \Omega, \qquad \text{where } g((\sigma,\rho), \omega) := \sigma - \int_A T(\omega, \alpha)\, d\rho(\alpha).$$

Page 17: Classification Theory

Infinite Kernel Learning: Infinite Programming

• "reduction ansatz" and "finite optimization"
• Implicit Function Theorem
• parametrical measures

Page 18: Classification Theory

Infinite Kernel Learning: Infinite Programming

• "reduction ansatz" and "finite optimization"
• Implicit Function Theorem
• parametrical measures

e.g., Gaussian density:

$$f(\omega; (\mu,\sigma)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\omega-\mu)^2}{2\sigma^2}\right),$$

exponential density:

$$f(\omega; \lambda) = \begin{cases} \lambda \exp(-\lambda\omega), & \omega \ge 0,\\ 0, & \omega < 0,\end{cases}$$

Beta density:

$$f(\omega; (\alpha,\beta)) = \frac{\omega^{\alpha-1}(1-\omega)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du},$$

uniform density (H: Heaviside function):

$$f(\omega; (a,b)) = \frac{H(\omega-a) - H(\omega-b)}{b-a}.$$
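Minimal numpy/scipy sketches of these parametric densities (the names Gaussian, exponential, Beta, uniform identify the standard families above):

```python
# Minimal sketches of the parametric densities above; scipy.special.beta
# evaluates the Beta normalizing integral in closed form.
import numpy as np
from scipy.special import beta as beta_fn

def gaussian_pdf(w, mu, sigma):
    return np.exp(-(w - mu) ** 2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)

def exponential_pdf(w, lam):
    return np.where(w >= 0, lam * np.exp(-lam * w), 0.0)

def beta_pdf(w, a, b):
    return w ** (a - 1) * (1 - w) ** (b - 1) / beta_fn(a, b)

def uniform_pdf(w, a, b):
    return (np.heaviside(w - a, 1.0) - np.heaviside(w - b, 1.0)) / (b - a)
```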

Page 19: Classification Theory

Infinite Kernel Learning: Reduction Ansatz

• "reduction ansatz" and "finite optimization"
• Implicit Function Theorem
• parametrical measures

$$g(x, y) \ge 0 \ \ \forall y \in I \quad \Longleftrightarrow \quad \min_{y \in I} g(x, y) \ge 0.$$

[Figure: graphs of $g(x,\cdot)$ and a perturbed $\tilde g(x,\cdot)$ over Ω; the finitely many local minimizers $y^j, \dots, y^p$ move to nearby $\tilde y^j$, defining implicit functions $x \mapsto y^j(x)$.]

Page 20: Classification Theory

Infinite Kernel Learning: Reduction Ansatz

based on the reduction ansatz:

$$\min\ f(x) \quad \text{subject to} \quad g_j(x) := g(x, y^j(x)) \ge 0 \qquad (j \in J := \{1,2,\dots,p\}).$$

Applied to $g((\sigma,\rho), \cdot)$: the local minimizers $\tilde\omega = \omega(\sigma,\rho)$ depend on the parameter $(\sigma,\rho)$, which calls for a topology on the space of measures.
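A hedged numerical sketch of the reduction idea, with an illustrative toy g: for fixed x, locate the finitely many local minimizers y^j(x) on a grid; the semi-infinite constraint then only needs checking at those points:

```python
# Hedged sketch of the reduction idea on a grid over Omega = [0, 1];
# g below is an illustrative toy function, not the lecture's g((sigma,rho), .).
import numpy as np

def g(x, y):
    return 0.2 + x * np.cos(6 * np.pi * y)       # several local minima in y

x = 0.15
ys = np.linspace(0.0, 1.0, 2001)
vals = g(x, ys)
interior = (vals[1:-1] < vals[:-2]) & (vals[1:-1] < vals[2:])
minimizers = ys[1:-1][interior]                  # approximate interior y^j(x)
print("local minimizers:", np.round(minimizers, 3))
print("SIP constraint holds:", np.all(g(x, minimizers) >= 0))
```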

Page 21: Classification Theory

Infinite Kernel Learning: Regularization

discretization: $0 = t_0 < t_1 < \dots < t_\iota = 1$,

regularization:

$$\min_{\theta,\beta}\ -\theta + \mu \sup_{t \in [0,1]} \left(\frac{d^2 \beta(\omega(t))}{dt^2}\right)^2 \quad \text{subject to the constraints,}$$

with the difference quotients

$$\frac{d\beta(\omega)}{dt}\bigg|_{t_\nu} \approx \frac{\int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{\int_{t_\nu}^{t_{\nu+1}} dt} = \frac{\beta(\omega(t_{\nu+1})) - \beta(\omega(t_\nu))}{t_{\nu+1} - t_\nu},$$

$$\frac{d^2\beta(\omega)}{dt^2}\bigg|_{t_\nu} \approx \frac{\dfrac{\beta(\omega(t_{\nu+2})) - \beta(\omega(t_{\nu+1}))}{t_{\nu+2} - t_{\nu+1}} - \dfrac{\beta(\omega(t_{\nu+1})) - \beta(\omega(t_\nu))}{t_{\nu+1} - t_\nu}}{t_{\nu+1} - t_\nu}.$$
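A minimal numpy sketch of these difference quotients on an illustrative grid:

```python
# Minimal sketch of the difference quotients above; the grid t and the
# values beta(omega(t_nu)) are illustrative assumptions.
import numpy as np

t = np.array([0.0, 0.2, 0.5, 0.7, 1.0])            # 0 = t_0 < ... < t_iota = 1
beta_vals = np.array([0.0, 0.1, 0.45, 0.8, 1.0])   # beta(omega(t_nu)), increasing

first = np.diff(beta_vals) / np.diff(t)            # first difference quotient
second = np.diff(first) / np.diff(t)[:-1]          # second difference quotient
print("d beta/dt  ~", np.round(first, 3))
print("d2beta/dt2 ~", np.round(second, 3))
```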

Page 22: Classification Theory

Infinite Kernel Learning: Topology

(E, d): metric space.

Radon measure: a measure on the σ-algebra of Borel sets of E that is locally finite and inner regular (inner regularity: the measure of a Borel set is approximated from within by the measures $\mu(K_\nu)$ of compact sets $K_\nu \subset E$).

H(E): set of Radon measures on E; it can be viewed inside the dual space of the continuous bounded functions.

neighbourhood of a measure ρ:

$$B_f(\rho, \varepsilon) := \Big\{\mu \in H(E)\ \Big|\ \Big|\int f\, d\mu - \int f\, d\rho\Big| < \varepsilon\Big\}.$$

Page 23: Classification Theory

Infinite Kernel Learning: Topology

Def.: Basis of neighbourhoods of a measure $\rho \in H(E)$:

$$\Big\{\mu \in H(E)\ \Big|\ \Big|\int_E f_i\, d\mu - \int_E f_i\, d\rho\Big| < \varepsilon \ \ (i = 1,2,\dots,n)\Big\} \qquad (f_1,\dots,f_n \ \text{continuous bounded functions},\ \varepsilon > 0).$$

Def.: Prokhorov metric:

$$d_0(\mu, \rho) := \inf\big\{\varepsilon \ge 0 \ \big|\ \mu(A) \le \rho(A^\varepsilon) + \varepsilon \ \text{and} \ \rho(A) \le \mu(A^\varepsilon) + \varepsilon \ \ \forall A \ \text{closed}\big\},$$

where

$$A^\varepsilon := \{x \in E \mid d(x, A) < \varepsilon\}.$$

Open δ-neighbourhood of a measure ρ:

$$B_\delta(\rho) := \{\mu \in H(E) \mid d_0(\mu, \rho) < \delta\}.$$
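A hedged illustration of the neighbourhood definition for two discrete (empirical) measures on E = ℝ, with bounded continuous test functions f_i:

```python
# Hedged illustration: two empirical measures (uniform point masses) are
# compared through integrals of bounded continuous test functions, as in
# the basis-of-neighbourhoods definition above.
import numpy as np

rng = np.random.default_rng(0)
mu_pts = rng.normal(0.0, 1.0, 500)       # empirical measure mu
rho_pts = rng.normal(0.05, 1.0, 500)     # empirical measure rho

tests = [np.sin, np.cos, np.tanh]        # bounded continuous f_1, f_2, f_3
gaps = [abs(f(mu_pts).mean() - f(rho_pts).mean()) for f in tests]
print("max |int f dmu - int f drho| =", max(gaps))  # small => mu near rho
```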

Page 24: Classification Theory

Infinite Kernel Learning: Numerical Results

Page 25: Classification Theory

References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics, ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of OMS (Optimization Methods and Software) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to JOGO (Journal of Global Optimization).