Classification Theory


DESCRIPTION

AACIMP 2009 Summer School lecture by Gerhard Wilhelm Weber. "Modern Operational Research and Its Mathematical Methods" course.

TRANSCRIPT

Classification Theory

Modelling of Kernel Machine by Infinite and Semi-Infinite Programming

4th International Summer School "Achievements and Applications of Contemporary Informatics, Mathematics and Physics", National Technical University of Ukraine, Kiev, Ukraine, August 5-16, 2009


Infinite and Semi-Infinite Programming

Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*

Institute of Applied Mathematics, METU, Ankara, Turkey

* Faculty of Economics, Management Science and Law, University of Siegen, Germany; Center for Research on Optimization and Control, University of Aveiro, Portugal

Motivation: Prediction of Cleavage Sites

[Figure: a protein sequence divided into its signal part and mature part, with the cleavage site between them]

Logistic Regression

$$\log\frac{P(Y=1 \mid X = x^l)}{P(Y=0 \mid X = x^l)} \;=\; \beta_0 + \beta_1 x_1^l + \beta_2 x_2^l + \dots + \beta_p x_p^l \qquad (l = 1,2,\ldots,N).$$
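A minimal numerical sketch of this model (assuming scikit-learn and a small synthetic data set in place of the lecture's cleavage-site data):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # N = 100 samples, p = 3 features
y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(int)   # labels from a linear rule

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)                       # estimates of beta_0 and (beta_1, ..., beta_p)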


Linear Classifiers

Maximum margin classifier:

$$\gamma_i := y_i\,(\langle w, x_i\rangle + b).$$

Note: $\gamma_i > 0$ implies correct classification.

On the margin, the supporting hyperplanes satisfy

$$y_j\,(\langle w, x_j\rangle + b) = 1, \qquad y_k\,(\langle w, x_k\rangle + b) = 1.$$

• The geometric margin:

$$\gamma = \frac{2}{\|w\|_2}\,,$$

so that maximizing $2/\|w\|_2$ is equivalent to minimizing $\|w\|_2^2$.

Convex problem:

$$\min_{w,b}\ \|w\|_2^2 \quad \text{subject to}\quad y_i\,(\langle w, x_i\rangle + b) \ge 1 \quad (i = 1,2,\ldots,\ell).$$

Dual problem:

$$\max_{\alpha}\ \sum_{i=1}^{\ell}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_i y_j\,\langle x_i, x_j\rangle$$

$$\text{subject to}\quad \sum_{i=1}^{\ell} y_i\alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$
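A hedged numerical sketch (synthetic separable data and scipy's general-purpose SLSQP solver rather than a dedicated QP code): solve this dual directly and recover $w = \sum_i \alpha_i y_i x_i$.

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
X += 0.3 * y[:, None]                                  # shift the classes apart: separable data

G = (y[:, None] * X) @ (y[:, None] * X).T              # G_ij = y_i y_j <x_i, x_j>

def neg_dual(alpha):                                   # negative of the dual objective
    return 0.5 * alpha @ G @ alpha - alpha.sum()

res = minimize(neg_dual, np.zeros(len(y)), method='SLSQP',
               bounds=[(0.0, None)] * len(y),
               constraints={'type': 'eq', 'fun': lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X                                    # primal weights from the dual solution
print(w)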

Replacing the inner product by a kernel function $\kappa$, the dual problem becomes:

$$\max_{\alpha}\ \sum_{i=1}^{\ell}\alpha_i \;-\; \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_i y_j\,\kappa(x_i, x_j)$$

$$\text{subject to}\quad \sum_{i=1}^{\ell} y_i\alpha_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$

Soft Margin Classifier:

• Introduce slack variables $\xi_i$ to allow the margin constraints to be violated:

$$y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$

$$\min_{w,b,\xi}\ \|w\|_2^2 + C\sum_{i=1}^{\ell}\xi_i$$

$$\text{subject to}\quad y_i\,(\langle w, x_i\rangle + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1,2,\ldots,\ell).$$
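A minimal sketch with scikit-learn on assumed toy data with overlapping classes; note that sklearn's SVC solves the equivalent problem with objective $(1/2)\|w\|^2 + C\sum_i \xi_i$, i.e., the same model up to a constant factor:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
X[y == 1] += 0.5                                       # shift one class; some overlap remains

clf = SVC(kernel='linear', C=1.0).fit(X, y)            # soft margin: slack is allowed
print(clf.coef_, clf.intercept_)                       # w and b
print(clf.support_)                                    # indices of the support vectors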

Nonlinear Classifiers

• Projection of the data into a higher-dimensional feature space.

• Mapping the input space $X$ into a new space $F$:

$$\phi: X \to F, \qquad x = (x_1,\ldots,x_n) \mapsto \phi(x) = (\phi_1(x),\ldots,\phi_N(x)).$$

[Figure: input points $x$ mapped by $\phi$ from the input space into the feature space]

Set of hypotheses:

$$f(x) = \sum_{i=1}^{N} w_i\,\phi_i(x) + b.$$

Dual representation:

$$f(x) = \sum_{i=1}^{\ell} \alpha_i\,y_i\,\langle \phi(x_i), \phi(x)\rangle + b,$$

where $\langle \phi(x_i), \phi(x)\rangle$ is the kernel function.


Ex.: polynomial kernels

$$\kappa(x,z) = (1 + x^T z)^k,$$

sigmoid kernel

$$\kappa(x,z) = \tanh(a\,x^T z + b),$$

Gaussian (RBF) kernel

$$\kappa(x,z) = \exp\!\big(-\|x - z\|_2^2 / \sigma^2\big).$$
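These three kernels written out in numpy (the parameter values k, a, b, sigma below are illustrative only):

import numpy as np

def poly_kernel(x, z, k=3):
    return (1.0 + x @ z) ** k

def sigmoid_kernel(x, z, a=0.5, b=-1.0):
    return np.tanh(a * (x @ z) + b)

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / sigma ** 2)

x = np.array([1.0, 2.0]); z = np.array([0.5, -1.0])
print(poly_kernel(x, z), sigmoid_kernel(x, z), gaussian_kernel(x, z))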

(In-) Finite Kernel Learning

• Based on the motivation of multiple kernel learning (MKL): combine kernel functions $\kappa_k(\cdot,\cdot)$,

$$\kappa(x_i, x_j) = \sum_{k=1}^{K}\beta_k\,\kappa_k(x_i, x_j), \qquad \beta_k \ge 0 \ (k = 1,\ldots,K), \quad \sum_{k=1}^{K}\beta_k = 1.$$
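As a quick sketch of why such a combination is again a kernel: a convex combination of Gram matrices stays symmetric positive semidefinite (toy data; beta is chosen by hand here, whereas MKL learns it):

import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(20, 2))
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

K1 = (1.0 + X @ X.T) ** 2                      # polynomial Gram matrix
K2 = np.exp(-sq)                               # Gaussian Gram matrix (sigma = 1)

beta = np.array([0.3, 0.7])                    # beta_k >= 0, sum_k beta_k = 1
K = beta[0] * K1 + beta[1] * K2                # combined kernel
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # numerically PSD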


• Semi-infinite LP formulation:

(SILP MKL)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \beta\in\mathbb{R}^{K},$$

$$\text{such that}\quad 0 \le \beta, \qquad \sum_{k=1}^{K}\beta_k = 1,$$

$$\sum_{k=1}^{K}\beta_k\,S_k(\alpha) \;\ge\; \theta \qquad \forall\,\alpha\in\mathbb{R}^{\ell}\ \text{with}\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0,$$

where

$$S_k(\alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_iy_j\,\kappa_k(x_i,x_j) \;-\; \sum_{i=1}^{\ell}\alpha_i.$$
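A hedged sketch of the exchange (cutting-plane) method commonly used for such SILPs: for fixed beta, a single-kernel SVM delivers the most violated constraint alpha; an LP in (theta, beta) is then solved over all constraints collected so far. Data, base kernels, and the stopping rule below are illustrative assumptions, not the lecture's setup.

import numpy as np
from scipy.optimize import linprog
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 2))
y = np.where(X[:, 0] - X[:, 1] > 0, 1, -1)

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
Ks = [(1.0 + X @ X.T) ** 2, np.exp(-sq)]       # base Gram matrices kappa_k
K_num, C, ell = len(Ks), 1.0, len(y)

def S_k(alpha, K):                             # S_k(alpha) as defined above
    ya = alpha * y
    return 0.5 * ya @ K @ ya - alpha.sum()

beta = np.full(K_num, 1.0 / K_num)
rows = []                                      # one row of S_k values per generated alpha
for it in range(25):
    K_beta = sum(b * K for b, K in zip(beta, Ks))
    svm = SVC(kernel='precomputed', C=C).fit(K_beta, y)
    alpha = np.zeros(ell)
    alpha[svm.support_] = np.abs(svm.dual_coef_[0])   # alpha_i >= 0 at the support vectors
    rows.append([S_k(alpha, K) for K in Ks])
    # LP: max theta s.t. sum_k beta_k S_k(alpha_r) >= theta for every stored alpha_r
    A_ub = np.hstack([np.ones((len(rows), 1)), -np.array(rows)])
    res = linprog(c=[-1.0] + [0.0] * K_num,
                  A_ub=A_ub, b_ub=np.zeros(len(rows)),
                  A_eq=[[0.0] + [1.0] * K_num], b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, None)] * K_num)
    beta_new = res.x[1:]
    if np.allclose(beta_new, beta, atol=1e-4):
        break
    beta = beta_new
print(beta)                                    # learned kernel weights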

Infinite Kernel Learning: Infinite Programming

Ex.: homotopy

$$H(\omega) := \kappa(x_i, x_j, \omega), \qquad \kappa(x_i, x_j, \omega) := (1-\omega)\,(1 + x_i^T x_j)^d \;+\; \omega\,\exp\!\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\Big).$$

$H$ interpolates between the polynomial and the Gaussian kernel:

$$H(0) = (1 + x_i^T x_j)^d, \qquad H(1) = \exp\!\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^{*2}}\Big).$$

Infinite programming:

$$\kappa_\beta(x_i, x_j) := \int_{\Omega} \kappa(x_i, x_j, \omega)\,d\beta(\omega), \qquad \Omega = [0,1].$$
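A small sketch of this homotopy kernel and of a discretized version of the integral, where beta places weights w_m (w_m >= 0, sum w_m = 1) on grid points omega_m; the values of d and sigma are illustrative:

import numpy as np

def kappa_omega(x, z, omega, d=2, sigma=1.0):
    # blend between the polynomial (omega = 0) and the Gaussian (omega = 1) kernel
    poly = (1.0 + x @ z) ** d
    rbf = np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
    return (1.0 - omega) * poly + omega * rbf

def kappa_beta(x, z, omegas, weights):
    # Riemann-Stieltjes integral discretized as a finite weighted sum
    return sum(w * kappa_omega(x, z, om) for om, w in zip(omegas, weights))

x = np.array([1.0, 0.0]); z = np.array([0.5, 0.5])
omegas = np.linspace(0.0, 1.0, 5)
weights = np.full(5, 0.2)
print(kappa_beta(x, z, omegas, weights))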

• Introducing Riemann-Stieltjes integrals into the problem (SILP MKL), we get the following general problem formulation:

(IP)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \ \beta:[0,1]\to\mathbb{R}\ \text{monotonically increasing},$$

$$\text{subject to}\quad \int_0^1 d\beta(\omega) = 1,$$

$$\int_{\Omega} S(\omega,\alpha)\,d\beta(\omega) \;-\; \sum_{i=1}^{\ell}\alpha_i \;\ge\; \theta \qquad \forall\,\alpha\in\mathbb{R}^{\ell}\ \text{with}\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0,$$

where

$$S(\omega,\alpha) := \frac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,y_iy_j\,\kappa(x_i,x_j,\omega).$$

With

$$A := \Big\{\alpha\in\mathbb{R}^{\ell}\ \Big|\ 0\le\alpha\le C\mathbf{1}\ \text{and}\ \sum_{i=1}^{\ell}y_i\alpha_i = 0\Big\}$$

and

$$T(\omega,\alpha) := S(\omega,\alpha) - \sum_{i=1}^{\ell}\alpha_i,$$

(IP) can be written compactly as an infinite programming problem:

(IP)

$$\max_{\theta,\beta}\ \theta, \qquad \theta\in\mathbb{R},\ \ \beta\ \text{a positive measure on}\ \Omega,$$

$$\text{such that}\quad \theta - \int_{\Omega} T(\omega,\alpha)\,d\beta(\omega) \le 0 \ \ \forall\,\alpha\in A, \qquad \int_{\Omega} d\beta(\omega) = 1.$$

dual of (IP):

(DIP)

$$\min_{\sigma,\rho}\ \sigma, \qquad \sigma\in\mathbb{R},\ \ \rho\ \text{a positive measure on}\ A,$$

$$\text{such that}\quad \sigma - \int_{A} T(\omega,\alpha)\,d\rho(\alpha) \ge 0 \ \ \forall\,\omega\in\Omega, \qquad \int_{A} d\rho(\alpha) = 1.$$

• Duality Conditions: Let $(\theta,\beta)$ and $(\sigma,\rho)$ be feasible for their respective problems and complementarily slack, i.e.,

$\beta$ has measure only where $\sigma = \int_A T(\omega,\alpha)\,d\rho(\alpha)$, and

$\rho$ has measure only where $\theta = \int_\Omega T(\omega,\alpha)\,d\beta(\omega)$.

Then both solutions are optimal for their respective problems.

• The interesting theoretical problem here is to find conditions which ensure that the solutions are point masses (i.e., that the original monotonic $\beta$ is a step function).

• Because of this, and in view of the compactness of the feasible (index) sets $\Omega$ and $A$ at the lower levels, we are interested in the nondegeneracy of the local minima of the lower-level problem, so as to obtain finitely many local minimizers.

• Lower Level Problem: For a given parameter $(\sigma,\rho)$, we consider

(LLP)

$$\min_{\omega}\ g((\sigma,\rho),\omega) \quad \text{subject to}\quad \omega\in\Omega,$$

where

$$g((\sigma,\rho),\omega) := \sigma - \int_{A} T(\omega,\alpha)\,d\rho(\alpha).$$
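When rho is a finite sum of point masses, the integral in g becomes a finite sum and (LLP) can at least be explored on a grid of omega. A toy sketch under that assumption (the alpha's and the scalar sigma below are illustrative, and the feasibility conditions on alpha are not enforced):

import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(15, 2))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

def T(omega, alpha, d=2, sigma=1.0):
    # T(omega, alpha) = S(omega, alpha) - sum_i alpha_i with the homotopy kernel
    K = (1 - omega) * (1.0 + X @ X.T) ** d + omega * np.exp(-sq / (2 * sigma ** 2))
    ya = y * alpha
    return 0.5 * ya @ K @ ya - alpha.sum()

alphas = [np.full(15, 0.10), np.full(15, 0.05)]   # point masses of rho
rho = np.array([0.6, 0.4])                        # their weights
sigma_var = 1.0                                   # the scalar variable of (DIP)

def g(omega):
    return sigma_var - sum(r * T(omega, a) for r, a in zip(rho, alphas))

grid = np.linspace(0.0, 1.0, 101)
values = np.array([g(om) for om in grid])
print(grid[values.argmin()])                      # approximate global minimizer of (LLP)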

• "reduction ansatz" and "finite optimization"

• Implicit Function Theorem

• parametrical measures

e.g., Gaussian density:

$$f(\omega;(\mu,\sigma)) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\Big(-\frac{(\omega-\mu)^2}{2\sigma^2}\Big),$$


exponential density:

$$f(\omega;\lambda) = \begin{cases} \lambda\,\exp(-\lambda\omega), & \omega \ge 0, \\ 0, & \omega < 0, \end{cases}$$

Beta density:

$$f(\omega;(\alpha,\beta)) = \frac{\omega^{\alpha-1}\,(1-\omega)^{\beta-1}}{\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\,du},$$

uniform density ($H$: Heaviside function):

$$f(\omega;(a,b)) = \frac{H(\omega-a) - H(\omega-b)}{b-a}.$$
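These parametrical densities sketched in Python (scipy's Euler Beta function supplies the normalization integral of the Beta density):

import numpy as np
from scipy.special import beta as beta_fn

def gaussian_density(w, mu, s):
    return np.exp(-(w - mu) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)

def exponential_density(w, lam):
    return np.where(w >= 0, lam * np.exp(-lam * w), 0.0)

def beta_density(w, a, b):
    return w ** (a - 1) * (1 - w) ** (b - 1) / beta_fn(a, b)

def uniform_density(w, a, b):
    # (H(w - a) - H(w - b)) / (b - a) with the Heaviside function H
    return ((w >= a).astype(float) - (w >= b).astype(float)) / (b - a)

w = np.linspace(0.01, 0.99, 5)
print(beta_density(w, 2.0, 3.0), uniform_density(w, 0.2, 0.8))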

Infinite Kernel Learning: Reduction Ansatz

For the lower-level function $g(x,\cdot)$, feasibility can be rewritten via its global minimum:

$$g(x,y) \ge 0 \ \ \forall\,y\in I \quad\Longleftrightarrow\quad \min_{y\in I}\, g(x,y) \ge 0.$$

[Figure: lower-level functions $g(x,\cdot)$ and a perturbation $\tilde g(x,\cdot)$ over $\Omega$, with finitely many local minimizers $y_1,\ldots,y_p$]

Under the reduction ansatz, the local minimizers are implicit functions $y_j(x)$ of $x$, and the semi-infinite problem locally becomes a finite optimization problem:

$$\min\ f(x) \quad \text{subject to}\quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j = 1,2,\ldots,p),$$

here based on the reduction ansatz applied to $g((\sigma,\rho),\cdot)$.

By the Implicit Function Theorem, the lower-level minimizer then depends on the parameter, $\omega = \tilde\omega(\sigma,\rho)$; this presupposes a suitable topology on the parameter space of $(\sigma,\rho)$ (see the topology on measures below). For the discretization of $\Omega = [0,1]$, grid points $0 = t_0 < t_1 < \ldots < t_\kappa = 1$ are chosen.

Infinite Kernel Learning: Regularization

regularization (a penalty on the curvature of $\beta$):

$$\min_{\theta,\beta}\ -\theta \;+\; \mu\,\sup_{t\in[0,1]}\left(\frac{d^2}{dt^2}\int_0^t d\beta(\omega)\right)^{2}$$

subject to the constraints.

The derivatives of $\beta$ are approximated by difference quotients on the grid:

$$\frac{d}{dt}\int_0^t d\beta(\omega)\,\bigg|_{t_\nu} \;\approx\; \frac{\int_{t_\nu}^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+1}-t_\nu} \;=\; \frac{\beta(t_{\nu+1})-\beta(t_\nu)}{t_{\nu+1}-t_\nu},$$

$$\frac{d^2}{dt^2}\int_0^t d\beta(\omega)\,\bigg|_{t_\nu} \;\approx\; \frac{1}{t_{\nu+2}-t_\nu}\left(\frac{\beta(t_{\nu+2})-\beta(t_{\nu+1})}{t_{\nu+2}-t_{\nu+1}} \;-\; \frac{\beta(t_{\nu+1})-\beta(t_\nu)}{t_{\nu+1}-t_\nu}\right).$$
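The same difference quotients in numpy, for a beta given at the grid points (a toy monotonically increasing beta is assumed):

import numpy as np

t = np.linspace(0.0, 1.0, 11)            # grid 0 = t_0 < ... < t_kappa = 1
beta = t ** 2                            # toy monotonically increasing beta

d1 = np.diff(beta) / np.diff(t)          # (beta(t_{v+1}) - beta(t_v)) / (t_{v+1} - t_v)
d2 = np.diff(d1) / (t[2:] - t[:-2])      # second difference quotient
print((d2 ** 2).max())                   # discretized sup_t penalty term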

Infinite Kernel Learning: Topology

$(E,d)$: metric space.

Radon measure: a measure on the $\sigma$-algebra of Borel sets of $E$ that is locally finite and inner regular.

$H(E)$: set of Radon measures on $E$.

Inner regularity: the measure of a Borel set is approximated from within by compact sets $K_\nu \subset E$.

Def.: Basis of neighbourhoods of a measure $\rho$:

$$B^{\varepsilon}_{f_1,\ldots,f_n}(\rho) := \Big\{\mu\in H(E)\ :\ \Big|\int_E f_i\,d\mu - \int_E f_i\,d\rho\Big| < \varepsilon \ \ (i = 1,2,\ldots,n)\Big\}$$

$(f_1,\ldots,f_n$: continuous bounded functions on $E$, $\varepsilon > 0$); in this sense $H(E)$ is topologized via its dual pairing with the continuous bounded functions.

Def.: Prokhorov metric:

$$d_0(\mu,\rho) := \inf\Big\{\varepsilon \ge 0 \ \Big|\ \mu(A) \le \rho(A^{\varepsilon}) + \varepsilon \ \text{and}\ \rho(A) \le \mu(A^{\varepsilon}) + \varepsilon \ \ (A\subseteq E\ \text{closed})\Big\},$$

where

$$A^{\varepsilon} := \{x\in E \ |\ d(x,A) < \varepsilon\}.$$

Open $\delta$-neighbourhood of a measure $\rho$:

$$B_\delta(\rho) := \{\mu\in H(E)\ |\ d_0(\mu,\rho) < \delta\}.$$

Infinite Kernel Learning: Numerical Results

References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics, ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of OMS (Optimization Methods and Software) on the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to JOGO (Journal of Global Optimization).
