Classification Theory


DESCRIPTION

AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber. "Modern Operational Research and Its Mathematical Methods" course.

TRANSCRIPT

Classification Theory

Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National Technical University of Ukraine, Kiev, Ukraine, August 5-16, 2009


Infinite and Semi-Infinite Programming

Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*

Institute of Applied Mathematics, METU, Ankara, Turkey

* Faculty of Economics, Management Science and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal

Motivation: Prediction of Cleavage Sites

[Figure: a protein sequence divided into its signal part and mature part, with the cleavage site between them]

Logistic Regression

$$\log\frac{P(Y=1 \mid X = x^l)}{P(Y=0 \mid X = x^l)} \;=\; \beta_0 + \beta_1 x_1^l + \beta_2 x_2^l + \dots + \beta_p x_p^l \qquad (l = 1,2,\ldots,N).$$
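A minimal numerical sketch of this model (assuming scikit-learn and a small synthetic data set in place of the lecture's cleavage-site data):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # N = 100 samples, p = 3 features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)   # labels from a linear rule

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)                       # estimates of beta_0 and (beta_1, ..., beta_p)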


Linear Classifiers

Maximum margin classifier:

$$\gamma_i := y_i\,(\langle w, x_i\rangle + b).$$

Note: $\gamma_i > 0$ implies correct classification.

On the margin, the supporting hyperplanes satisfy

$$y_j\,(\langle w, x_j\rangle + b) = 1, \qquad y_k\,(\langle w, x_k\rangle + b) = 1.$$

• The geometric margin:

$$\gamma = \frac{2}{\|w\|_2}\,,$$

so that maximizing $2/\|w\|_2$ is equivalent to minimizing $\|w\|_2^2$.

Convex problem:

$$\min_{w,b}\ \|w\|_2^2 \quad \text{subject to}\quad y_i\,(\langle w, x_i\rangle + b) \ge 1 \quad (i = 1,2,\ldots,\ell).$$

Dual problem:

$$\max_{\alpha}\ \sum_{i=1}^{\ell}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_i y_j\,\langle x_i, x_j\rangle$$

$$\text{subject to}\quad \sum_{i=1}^{\ell} y_i\alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$
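A hedged numerical sketch (synthetic separable data and scipy's general-purpose SLSQP solver rather than a dedicated QP code): solve this dual directly and recover $w = \sum_i \alpha_i y_i x_i$.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
X += 0.3 * y[:, None]                                  # shift the classes apart: separable data

G = (y[:, None] * X) @ (y[:, None] * X).T              # G_ij = y_i y_j <x_i, x_j>

def neg_dual(alpha):                                   # negative of the dual objective
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(len(y)), method='SLSQP',
               bounds=[(0.0, None)] * len(y),
               constraints={'type': 'eq', 'fun': lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X                                    # primal weights from the dual solution
print(w)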

Replacing the inner product by a kernel function $\kappa$, the dual problem becomes:

$$\max_{\alpha}\ \sum_{i=1}^{\ell}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_i y_j\,\kappa(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{\ell} y_i\alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$

Soft Margin Classifier:

• Introduce slack variables $\xi_i$ to allow the margin constraints to be violated:

$$y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$

$$\min_{w,b,\xi}\ \|w\|_2^2 + C\sum_{i=1}^{\ell}\xi_i$$

$$\text{subject to}\quad y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$
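A minimal sketch with scikit-learn on assumed toy data with overlapping classes; note that sklearn's SVC solves the equivalent problem with objective $(1/2)\|w\|^2 + C\sum_i \xi_i$, i.e., the same model up to a constant factor:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X[y == 1] += 0.5                                       # shift one class; some overlap remains

clf = SVC(kernel='linear', C=1.0).fit(X, y)            # soft margin: slack is allowed
print(clf.coef_, clf.intercept_)                       # w and b
print(clf.support_)                                    # indices of the support vectors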

Nonlinear Classifiers

• Projection of the data into a higher-dimensional feature space.

• Mapping the input space $X$ into a new space $F$:

$$\phi: X \to F, \qquad x = (x_1,\ldots,x_n) \mapsto \phi(x) = (\phi_1(x),\ldots,\phi_N(x)).$$

[Figure: input points $x$ mapped by $\phi$ from the input space into the feature space]

Set of hypotheses:

$$f(x) = \sum_{i=1}^{N} w_i\,\phi_i(x) + b.$$

Dual representation:

$$f(x) = \sum_{i=1}^{\ell} \alpha_i\,y_i\,\langle \phi(x_i), \phi(x)\rangle + b,$$

where $\langle \phi(x_i), \phi(x)\rangle$ is the kernel function.


Ex.: polynomial kernels

$$\kappa(x,z) = (1 + x^T z)^k,$$

sigmoid kernel

$$\kappa(x,z) = \tanh(a\,x^T z + b),$$

Gaussian (RBF) kernel

$$\kappa(x,z) = \exp\!\big(-\|x - z\|_2^2 / \sigma^2\big).$$
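These three kernels written out in numpy (the parameter values k, a, b, sigma below are illustrative only):

import numpy as np

def poly_kernel(x, z, k=3):
    return (1.0 + x @ z) ** k

def sigmoid_kernel(x, z, a=0.5, b=-1.0):
    return np.tanh(a * (x @ z) + b)

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

x = np.array([1.0, 2.0]); z = np.array([0.5, -1.0])
print(poly_kernel(x, z), sigmoid_kernel(x, z), gaussian_kernel(x, z))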

(In-) Finite Kernel Learning

• Based on the motivation of multiple kernel learning (MKL): combine kernel functions $\kappa_k(\cdot,\cdot)$,

$$\kappa(x_i, x_j) = \sum_{k=1}^{K}\beta_k\,\kappa_k(x_i, x_j), \qquad \beta_k \ge 0 \ (k = 1,\ldots,K), \quad \sum_{k=1}^{K}\beta_k = 1.$$
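As a quick sketch of why such a combination is again a kernel: a convex combination of Gram matrices stays symmetric positive semidefinite (toy data; beta is chosen by hand here, whereas MKL learns it):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K1 = (1.0 + X @ X.T) ** 2                      # polynomial Gram matrix
K2 = np.exp(-sq)                               # Gaussian Gram matrix (sigma = 1)

beta = np.array([0.3, 0.7])                    # beta_k >= 0, sum_k beta_k = 1
K = beta[0] * K1 + beta[1] * K2                # combined kernel
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # numerically PSD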


• Semi-infinite LP formulation:

(SILP MKL)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \beta\in\mathbb{R}^{K},$$

$$\text{such that}\quad 0 \le \beta, \qquad \sum_{k=1}^{K}\beta_k = 1,$$

$$\sum_{k=1}^{K}\beta_k\,S_k(\alpha) \;\ge\; \theta \qquad \forall\,\alpha\in\mathbb{R}^{\ell}\ \text{with}\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0,$$

where

$$S_k(\alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_iy_j\,\kappa_k(x_i,x_j) \;-\; \sum_{i=1}^{\ell}\alpha_i.$$
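A hedged sketch of the exchange (cutting-plane) method commonly used for such SILPs: for fixed beta, a single-kernel SVM delivers the most violated constraint alpha; an LP in (theta, beta) is then solved over all constraints collected so far. Data, base kernels, and the stopping rule below are illustrative assumptions, not the lecture's setup.

import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Ks = [(1.0 + X @ X.T) ** 2, np.exp(-sq)]       # base Gram matrices kappa_k
K_num, C, ell = len(Ks), 1.0, len(y)

def S_k(alpha, K):                             # S_k(alpha) as defined above
    ya = alpha * y
    return 0.5 * ya @ K @ ya - alpha.sum()

beta = np.full(K_num, 1.0 / K_num)
rows = []                                      # one row of S_k values per generated alpha
for it in range(25):
    K_beta = sum(b * K for b, K in zip(beta, Ks))
    svm = SVC(kernel='precomputed', C=C).fit(K_beta, y)
    alpha = np.zeros(ell)
    alpha[svm.support_] = np.abs(svm.dual_coef_[0])   # alpha_i >= 0 at the support vectors
    rows.append([S_k(alpha, K) for K in Ks])
    # LP: max theta s.t. sum_k beta_k S_k(alpha_r) >= theta for every stored alpha_r
    A_ub = np.hstack([np.ones((len(rows), 1)), -np.array(rows)])
    res = linprog(c=[-1.0] + [0.0] * K_num,
                  A_ub=A_ub, b_ub=np.zeros(len(rows)),
                  A_eq=[[0.0] + [1.0] * K_num], b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * K_num)
    beta_new = res.x[1:]
    if np.allclose(beta_new, beta, atol=1e-4):
        break
    beta = beta_new
print(beta)                                    # learned kernel weights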

Infinite Kernel Learning: Infinite Programming

Ex.: homotopy

$$H(\omega) := \kappa(x_i, x_j, \omega), \qquad \kappa(x_i, x_j, \omega) := (1-\omega)\,(1 + x_i^T x_j)^d \;+\; \omega\,\exp\!\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\Big).$$

$H$ interpolates between the polynomial and the Gaussian kernel:

$$H(0) = (1 + x_i^T x_j)^d, \qquad H(1) = \exp\!\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\Big).$$

Infinite programming:

$$\kappa_\beta(x_i, x_j) := \int_{\Omega} \kappa(x_i, x_j, \omega)\,d\beta(\omega), \qquad \Omega = [0,1].$$
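A small sketch of this homotopy kernel and of a discretized version of the integral, where beta places weights w_m (w_m >= 0, sum w_m = 1) on grid points omega_m; the values of d and sigma are illustrative:

import numpy as np

def kappa_omega(x, z, omega, d=2, sigma=1.0):
    # blend between the polynomial (omega = 0) and the Gaussian (omega = 1) kernel
    poly = (1.0 + x @ z) ** d
    rbf = np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
    return (1.0 - omega) * poly + omega * rbf

def kappa_beta(x, z, omegas, weights):
    # Riemann-Stieltjes integral discretized as a finite weighted sum
    return sum(w * kappa_omega(x, z, om) for om, w in zip(omegas, weights))

x = np.array([1.0, 0.0]); z = np.array([0.5, 0.5])
omegas = np.linspace(0.0, 1.0, 5)
weights = np.full(5, 0.2)
print(kappa_beta(x, z, omegas, weights))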

• Introducing Riemann-Stieltjes integrals into the problem (SILP MKL), we get the following general problem formulation:

(IP)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \ \beta:[0,1]\to\mathbb{R}\ \text{monotonically increasing},$$

$$\text{subject to}\quad \int_0^1 d\beta(\omega) = 1,$$

$$\int_{\Omega} S(\omega,\alpha)\,d\beta(\omega) \;-\; \sum_{i=1}^{\ell}\alpha_i \;\ge\; \theta \qquad \forall\,\alpha\in\mathbb{R}^{\ell}\ \text{with}\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0,$$

where

$$S(\omega,\alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_iy_j\,\kappa(x_i,x_j,\omega).$$

With

$$A := \Big\{\alpha\in\mathbb{R}^{\ell}\ \Big|\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0\Big\}$$

and

$$T(\omega,\alpha) := S(\omega,\alpha) - \sum_{i=1}^{\ell}\alpha_i,$$

(IP) can be written compactly as an infinite programming problem:

(IP)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \ \beta\ \text{a positive measure on}\ \Omega,$$

$$\text{such that}\quad \theta - \int_{\Omega} T(\omega,\alpha)\,d\beta(\omega) \le 0 \ \ \forall\,\alpha\in A, \qquad \int_{\Omega} d\beta(\omega) = 1.$$

dual of (IP):

(DIP)

$$\min_{\sigma,\rho}\ \sigma, \qquad \sigma\in\mathbb{R},\ \ \rho\ \text{a positive measure on}\ A,$$

$$\text{such that}\quad \sigma - \int_{A} T(\omega,\alpha)\,d\rho(\alpha) \ge 0 \ \ \forall\,\omega\in\Omega, \qquad \int_{A} d\rho(\alpha) = 1.$$

• Duality Conditions: Let $(\theta,\beta)$ and $(\sigma,\rho)$ be feasible for their respective problems and complementarily slack, i.e.,

$\beta$ has measure only where $\sigma = \int_A T(\omega,\alpha)\,d\rho(\alpha)$, and

$\rho$ has measure only where $\theta = \int_\Omega T(\omega,\alpha)\,d\beta(\omega)$.

Then both solutions are optimal for their respective problems.

• The interesting theoretical problem here is to find conditions which ensure that the solutions are point masses (i.e., that the original monotonic $\beta$ is a step function).

• Because of this, and in view of the compactness of the feasible (index) sets $\Omega$ and $A$ at the lower levels, we are interested in the nondegeneracy of the local minima of the lower-level problem, so as to obtain finitely many local minimizers.

• Lower Level Problem: For a given parameter $(\sigma,\rho)$, we consider

(LLP)

$$\min_{\omega}\ g((\sigma,\rho),\omega) \quad \text{subject to}\quad \omega\in\Omega,$$

where

$$g((\sigma,\rho),\omega) := \sigma - \int_{A} T(\omega,\alpha)\,d\rho(\alpha).$$
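When rho is a finite sum of point masses, the integral in g becomes a finite sum and (LLP) can at least be explored on a grid of omega. A toy sketch under that assumption (the alpha's and the scalar sigma below are illustrative, and the feasibility conditions on alpha are not enforced):

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(15, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

def T(omega, alpha, d=2, sigma=1.0):
    # T(omega, alpha) = S(omega, alpha) - sum_i alpha_i with the homotopy kernel
    K = (1 - omega) * (1.0 + X @ X.T) ** d + omega * np.exp(-sq / (2 * sigma ** 2))
    ya = y * alpha
    return 0.5 * ya @ K @ ya - alpha.sum()

alphas = [np.full(15, 0.10), np.full(15, 0.05)]   # point masses of rho
rho = np.array([0.6, 0.4])                        # their weights
sigma_var = 1.0                                   # the scalar variable of (DIP)

def g(omega):
    return sigma_var - sum(r * T(omega, a) for r, a in zip(rho, alphas))

grid = np.linspace(0.0, 1.0, 101)
values = np.array([g(om) for om in grid])
print(grid[values.argmin()])                      # approximate global minimizer of (LLP)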

• "reduction ansatz" and "finite optimization"

• Implicit Function Theorem

• parametrical measures

e.g., Gaussian density:

$$f(\omega;(\mu,\sigma)) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\Big(-\frac{(\omega-\mu)^2}{2\sigma^2}\Big),$$


exponential density:

$$f(\omega;\lambda) = \begin{cases} \lambda\,\exp(-\lambda\omega), & \omega \ge 0, \\ 0, & \omega < 0, \end{cases}$$

Beta density:

$$f(\omega;(\alpha,\beta)) = \frac{\omega^{\alpha-1}\,(1-\omega)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du},$$

uniform density ($H$: Heaviside function):

$$f(\omega;(a,b)) = \frac{H(\omega-a) - H(\omega-b)}{b-a}.$$
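These parametrical densities sketched in Python (scipy's Euler Beta function supplies the normalization integral of the Beta density):

import numpy as np
from scipy.special import beta as beta_fn

def gaussian_density(w, mu, s):
    return np.exp(-(w - mu) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

def exponential_density(w, lam):
    return np.where(w >= 0, lam * np.exp(-lam * w), 0.0)

def beta_density(w, a, b):
    return w ** (a - 1) * (1 - w) ** (b - 1) / beta_fn(a, b)

def uniform_density(w, a, b):
    # (H(w - a) - H(w - b)) / (b - a) with the Heaviside function H
    return ((w >= a).astype(float) - (w >= b).astype(float)) / (b - a)

w = np.linspace(0.01, 0.99, 5)
print(beta_density(w, 2.0, 3.0), uniform_density(w, 0.2, 0.8))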

Infinite Kernel Learning: Reduction Ansatz

For the lower-level function $g(x,\cdot)$, feasibility can be rewritten via its global minimum:

$$g(x,y) \ge 0 \ \ \forall\,y\in I \quad\Longleftrightarrow\quad \min_{y\in I}\, g(x,y) \ge 0.$$

[Figure: lower-level functions $g(x,\cdot)$ and a perturbation $\tilde g(x,\cdot)$ over $\Omega$, with finitely many local minimizers $y_1,\ldots,y_p$]

Under the reduction ansatz, the local minimizers are implicit functions $y_j(x)$ of $x$, and the semi-infinite problem locally becomes a finite optimization problem:

$$\min\ f(x) \quad \text{subject to}\quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j = 1,2,\ldots,p),$$

here based on the reduction ansatz applied to $g((\sigma,\rho),\cdot)$.

By the Implicit Function Theorem, the lower-level minimizer then depends on the parameter, $\omega = \tilde\omega(\sigma,\rho)$; this presupposes a suitable topology on the parameter space of $(\sigma,\rho)$ (see the topology on measures below). For the discretization of $\Omega = [0,1]$, grid points $0 = t_0 < t_1 < \ldots < t_\kappa = 1$ are chosen.

Infinite Kernel Learning: Regularization

regularization (a penalty on the curvature of $\beta$):

$$\min_{\theta,\beta}\ -\theta \;+\; \mu\,\sup_{t\in[0,1]}\left(\frac{d^2}{dt^2}\int_0^t d\beta(\omega)\right)^{2}$$

subject to the constraints.

The derivatives of $\beta$ are approximated by difference quotients on the grid:

$$\frac{d}{dt}\int_0^t d\beta(\omega)\,\bigg|_{t_\nu} \;\approx\; \frac{\int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1}-t_\nu} \;=\; \frac{\beta(t_{\nu+1})-\beta(t_\nu)}{t_{\nu+1}-t_\nu},$$

$$\frac{d^2}{dt^2}\int_0^t d\beta(\omega)\,\bigg|_{t_\nu} \;\approx\; \frac{1}{t_{\nu+2}-t_\nu}\left(\frac{\beta(t_{\nu+2})-\beta(t_{\nu+1})}{t_{\nu+2}-t_{\nu+1}} \;-\; \frac{\beta(t_{\nu+1})-\beta(t_\nu)}{t_{\nu+1}-t_\nu}\right).$$
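The same difference quotients in numpy, for a beta given at the grid points (a toy monotonically increasing beta is assumed):

import numpy as np

t = np.linspace(0.0, 1.0, 11)            # grid 0 = t_0 < ... < t_kappa = 1
beta = t ** 2                            # toy monotonically increasing beta

d1 = np.diff(beta) / np.diff(t)          # (beta(t_{v+1}) - beta(t_v)) / (t_{v+1} - t_v)
d2 = np.diff(d1) / (t[2:] - t[:-2])      # second difference quotient
print((d2 ** 2).max())                   # discretized sup_t penalty term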

Infinite Kernel Learning: Topology

$(E,d)$: metric space.

Radon measure: a measure on the $\sigma$-algebra of Borel sets of $E$ that is locally finite and inner regular.

$H(E)$: set of Radon measures on $E$.

Inner regularity: the measure of a Borel set is approximated from within by compact sets $K_\nu \subset E$.

Def.: Basis of neighbourhoods of a measure $\rho$:

$$B^{\varepsilon}_{f_1,\ldots,f_n}(\rho) := \Big\{\mu\in H(E)\ :\ \Big|\int_E f_i\,d\mu - \int_E f_i\,d\rho\Big| < \varepsilon \ \ (i = 1,2,\ldots,n)\Big\}$$

$(f_1,\ldots,f_n$: continuous bounded functions on $E$, $\varepsilon > 0$); in this sense $H(E)$ is topologized via its dual pairing with the continuous bounded functions.

Def.: Prokhorov metric:

$$d_0(\mu,\rho) := \inf\Big\{\varepsilon \ge 0 \ \Big|\ \mu(A) \le \rho(A^{\varepsilon}) + \varepsilon \ \text{and}\ \rho(A) \le \mu(A^{\varepsilon}) + \varepsilon \ \ (A\subseteq E\ \text{closed})\Big\},$$

where

$$A^{\varepsilon} := \{x\in E \ |\ d(x,A) < \varepsilon\}.$$

Open $\delta$-neighbourhood of a measure $\rho$:

$$B_\delta(\rho) := \{\mu\in H(E)\ |\ d_0(\mu,\rho) < \delta\}.$$

Infinite Kernel Learning: Numerical Results

References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics, ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of OMS (Optimization Methods and Software) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to JOGO (Journal of Global Optimization).
