Classification Theory
Modelling of Kernel Machine by Infinite and Semi-Infinite Programming
4th International Summer School
Achievements and Applications of Contemporary Informatics, Mathematics and Physics
National University of Technology of the Ukraine
Kiev, Ukraine, August 5-16, 2009
August 7, 2009
Infinite and Semi-Infinite Programming
Süreyya Özöğür-Akyüz, Gerhard-Wilhelm Weber*
Institute of Applied Mathematics, METU, Ankara, Turkey
* Faculty of Economics, Management Science and Law, University of Siegen, Germany
Center for Research on Optimization and Control, University of Aveiro, Portugal
Motivation: Prediction of Cleavage Sites

[Figure: a protein sequence split at the cleavage site into a signal part and a mature part.]

Logistic Regression:

\log \frac{P(Y = 1 \mid X = x^{l})}{P(Y = 0 \mid X = x^{l})}
  = \beta_0 + \beta_1 x_1^{l} + \beta_2 x_2^{l} + \cdots + \beta_p x_p^{l}
  \qquad (l = 1, 2, \ldots, N)
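As a quick numerical illustration (not part of the original slides; the function names and toy coefficients are assumptions), the logistic-regression model above can be sketched in Python:

```python
import numpy as np

def log_odds(beta0, beta, x):
    """Left-hand side of the slide's model: log( P(Y=1|X=x) / P(Y=0|X=x) )."""
    return beta0 + float(np.dot(beta, x))

def prob_class1(beta0, beta, x):
    """Invert the log-odds with the sigmoid to recover P(Y=1|X=x)."""
    return 1.0 / (1.0 + np.exp(-log_odds(beta0, beta, x)))
```

Since the log-odds are linear in x, a zero coefficient vector gives log-odds 0 and hence probability 1/2.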
Linear Classifiers

Maximum margin classifier:

\gamma_i := y_i \left( \langle w, x_i \rangle + b \right)

Note: \gamma_i > 0 implies correct classification.
Linear Classifiers

Support vectors lie on the margin hyperplanes:

y_j \left( \langle w, x_j \rangle + b \right) = 1, \qquad y_k \left( \langle w, x_k \rangle + b \right) = 1.

• The geometric margin:

\gamma = \frac{2}{\|w\|_2}, \qquad \max \frac{2}{\|w\|_2} \;\Longleftrightarrow\; \min \frac{1}{2}\|w\|_2^2.
Linear Classifiers

Convex problem:

\min_{w, b} \; \frac{1}{2} \|w\|_2^2
\quad \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1 \qquad (i = 1, 2, \ldots, \ell).
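A small sketch (toy data and helper names are assumptions, not from the slides): checking the hard-margin constraints y_i(⟨w, x_i⟩ + b) ≥ 1 for a candidate hyperplane and reporting its geometric margin 2/‖w‖₂:

```python
import numpy as np

def is_feasible(w, b, X, y, tol=1e-12):
    """True if every training point satisfies y_i (<w, x_i> + b) >= 1."""
    return bool(np.all(y * (X @ w + b) >= 1.0 - tol))

def geometric_margin(w):
    """Distance between the two margin hyperplanes: 2 / ||w||_2."""
    return 2.0 / np.linalg.norm(w)

# toy linearly separable data: w = (1, 0), b = 0 separates the two classes
X = np.array([[1.5, 0.3], [2.0, -1.0], [-1.0, 0.0], [-2.5, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), 0.0
```

Shrinking w widens the reported margin but breaks the constraints, which is exactly the trade-off the convex problem resolves.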
Linear Classifiers

Dual problem:

\max_{\alpha} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{\ell} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{subject to} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, \ell).
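A minimal sketch (names and toy values are illustrative) of evaluating the dual objective W(α) = Σᵢ αᵢ − ½ Σᵢⱼ αᵢαⱼ yᵢyⱼ ⟨xᵢ, xⱼ⟩ and checking the dual constraints:

```python
import numpy as np

def dual_objective(alpha, X, y):
    Yx = y[:, None] * X           # rows y_i x_i
    G = Yx @ Yx.T                 # G_ij = y_i y_j <x_i, x_j>
    return alpha.sum() - 0.5 * alpha @ G @ alpha

def dual_feasible(alpha, y, tol=1e-12):
    """alpha_i >= 0 and sum_i alpha_i y_i = 0."""
    return bool(np.all(alpha >= -tol) and abs(alpha @ y) < tol)

# two points at +1 and -1: the dual optimum is alpha = (1/2, 1/2)
X = np.array([[1.0], [-1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
```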
Linear Classifiers

Dual problem with a kernel function \kappa:

\max_{\alpha} \; \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{\ell} \alpha_i \alpha_j y_i y_j \, \kappa(x_i, x_j)
\quad \text{subject to} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \qquad \alpha_i \ge 0 \quad (i = 1, 2, \ldots, \ell).
Linear Classifiers

Soft margin classifier:

• Introduce slack variables \xi_i to allow the margin constraints to be violated:

\min_{w, b, \xi} \; \frac{1}{2} \|w\|_2^2 + C \sum_{i=1}^{\ell} \xi_i
\quad \text{subject to} \quad y_i \left( \langle w, x_i \rangle + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 \quad (i = 1, 2, \ldots, \ell).
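A sketch of the slack variables in code (names and toy values are assumptions): ξᵢ = max(0, 1 − yᵢ(⟨w, xᵢ⟩ + b)) measures how far point i violates the margin, and the soft-margin objective trades ½‖w‖² against C Σᵢ ξᵢ:

```python
import numpy as np

def slacks(w, b, X, y):
    """xi_i = max(0, 1 - y_i (<w, x_i> + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    return 0.5 * float(np.dot(w, w)) + C * slacks(w, b, X, y).sum()

# one point outside the margin (slack 0) and one inside it (slack 0.5)
w, b = np.array([1.0]), 0.0
X = np.array([[2.0], [0.5]])
y = np.array([1.0, 1.0])
```

Larger C penalizes violations more heavily, pushing the solution toward the hard-margin classifier.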
Nonlinear Classifiers

• Projection of the data into a higher-dimensional feature space.
• Mapping the input space X into a new feature space F:

\phi : X \to F, \qquad x = (x_1, \ldots, x_n) \mapsto \phi(x) = (\phi_1(x), \ldots, \phi_N(x)).

[Figure: input points x mapped by \phi into the feature space.]

Set of hypotheses:

f(x) = \sum_{i=1}^{N} w_i \phi_i(x) + b.

Dual representation, with the kernel function given by \langle \phi(x_i), \phi(x) \rangle:

f(x) = \sum_{i=1}^{\ell} \alpha_i y_i \langle \phi(x_i), \phi(x) \rangle + b.
Nonlinear Classifiers

Examples of kernel functions:

polynomial kernels:     \kappa(x, z) = (1 + x^T z)^k
sigmoid kernel:         \kappa(x, z) = \tanh(a x^T z + b)
Gaussian (RBF) kernel:  \kappa(x, z) = \exp\left( -\|x - z\|_2^2 / \sigma^2 \right)
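The three example kernels, written out directly (parameter defaults are illustrative, not from the slides):

```python
import numpy as np

def poly_kernel(x, z, k=2):
    """kappa(x, z) = (1 + x^T z)^k"""
    return (1.0 + float(x @ z)) ** k

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """kappa(x, z) = tanh(a x^T z + b)"""
    return float(np.tanh(a * float(x @ z) + b))

def rbf_kernel(x, z, sigma=1.0):
    """kappa(x, z) = exp(-||x - z||_2^2 / sigma^2)"""
    d = x - z
    return float(np.exp(-float(d @ d) / sigma**2))
```

Note that the RBF kernel always equals 1 at x = z, while the polynomial kernel grows with the inner product.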
• Based on the motivation of multiple kernel learning (MKL), combine kernel functions \kappa_k(\cdot, \cdot):

\kappa(x_i, x_j) = \sum_{k=1}^{K} \beta_k \, \kappa_k(x_i, x_j),
\qquad \beta_k \ge 0 \quad (k = 1, \ldots, K), \qquad \sum_{k=1}^{K} \beta_k = 1.
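A sketch of the MKL combination at the Gram-matrix level (helper name and toy matrices are assumptions): a convex combination of Gram matrices with βₖ ≥ 0 and Σₖ βₖ = 1 is again a positive semidefinite Gram matrix:

```python
import numpy as np

def combined_gram(grams, beta, tol=1e-12):
    """Convex combination sum_k beta_k G_k of Gram matrices."""
    beta = np.asarray(beta, dtype=float)
    assert np.all(beta >= -tol) and abs(beta.sum() - 1.0) < tol
    return sum(b * G for b, G in zip(beta, grams))

# two small PSD Gram matrices and an equal-weight combination
G1 = np.array([[1.0, 0.0], [0.0, 1.0]])
G2 = np.array([[2.0, 1.0], [1.0, 2.0]])
G = combined_gram([G1, G2], [0.5, 0.5])
```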
(In-)Finite Kernel Learning

• Semi-infinite LP formulation:

(SILP MKL)
\max_{\theta, \beta} \; \theta, \qquad \theta \in \mathbb{R}, \; \beta \in \mathbb{R}^K,
\text{such that} \quad 0 \le \beta, \quad \sum_{k=1}^{K} \beta_k = 1,
\quad \sum_{k=1}^{K} \beta_k S_k(\alpha) \ge \theta \quad \forall \alpha \in \mathbb{R}^{\ell} \text{ with } 0 \le \alpha \le C \mathbf{1} \text{ and } \sum_{i=1}^{\ell} \alpha_i y_i = 0,

where

S_k(\alpha) := \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i, j = 1}^{\ell} \alpha_i \alpha_j y_i y_j \, \kappa_k(x_i, x_j).
Infinite Kernel Learning: Infinite Programming

Ex. (homotopy between the polynomial and the Gaussian kernel):

H(\omega) := \kappa(x_i, x_j, \omega)
  := (1 - \omega) \left( 1 + x_i^T x_j \right)^d
   + \omega \exp\left( -\frac{\|x_i - x_j\|_2^2}{2 \sigma^{*2}} \right),

so that

H(0) = \left( 1 + x_i^T x_j \right)^d, \qquad H(1) = \exp\left( -\frac{\|x_i - x_j\|_2^2}{2 \sigma^{*2}} \right).

Infinite programming: with \Omega = [0, 1],

\kappa_\beta(x_i, x_j) := \int_{\Omega} \kappa(x_i, x_j, \omega) \, d\beta(\omega).
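The homotopy kernel can be sketched directly (parameter defaults d = 2, σ = 1 are assumptions): a pointwise blend of the polynomial kernel at ω = 0 and the Gaussian kernel at ω = 1:

```python
import numpy as np

def homotopy_kernel(x, z, omega, d=2, sigma=1.0):
    """(1 - omega) * polynomial kernel + omega * Gaussian kernel, omega in [0, 1]."""
    poly = (1.0 + float(x @ z)) ** d
    diff = x - z
    gauss = float(np.exp(-float(diff @ diff) / (2.0 * sigma**2)))
    return (1.0 - omega) * poly + omega * gauss
```

Each fixed ω gives one member of the continuously parameterized kernel family that the measure β integrates over.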
Infinite Kernel Learning: Infinite Programming

• Introducing Riemann-Stieltjes integrals into the problem (SILP MKL), we get the following general problem formulation:

(IP)
\max_{\theta, \beta} \; \theta, \qquad \theta \in \mathbb{R}, \; \beta : [0, 1] \to \mathbb{R} \text{ monotonically increasing},
\text{subject to} \quad \int_0^1 d\beta(\omega) = 1,
\quad \sum_{i=1}^{\ell} \alpha_i - \int_{\Omega} S(\omega, \alpha) \, d\beta(\omega) \ge \theta
\quad \forall \alpha \in \mathbb{R}^{\ell} \text{ with } 0 \le \alpha \le C \mathbf{1}, \; \sum_{i=1}^{\ell} \alpha_i y_i = 0,

where

S(\omega, \alpha) := \frac{1}{2} \sum_{i, j = 1}^{\ell} \alpha_i \alpha_j y_i y_j \, \kappa(x_i, x_j, \omega),

A := \left\{ \alpha \in \mathbb{R}^{\ell} \;\middle|\; 0 \le \alpha \le C \mathbf{1}, \; \sum_{i=1}^{\ell} \alpha_i y_i = 0 \right\},

T(\omega, \alpha) := \sum_{i=1}^{\ell} \alpha_i - S(\omega, \alpha).

(IP) is an infinite programming problem.
Infinite Kernel Learning: Infinite Programming

In measure form, (IP) reads:

(IP)
\max_{\theta, \beta} \; \theta, \qquad \beta \text{ a positive measure on } \Omega,
\text{such that} \quad \theta - \int_{\Omega} T(\omega, \alpha) \, d\beta(\omega) \le 0 \quad \forall \alpha \in A, \qquad \int_{\Omega} d\beta(\omega) = 1.

Dual of (IP):

(DIP)
\min_{\sigma, \rho} \; \sigma, \qquad \sigma \in \mathbb{R}, \; \rho \text{ a positive measure on } A,
\text{such that} \quad \sigma - \int_{A} T(\omega, \alpha) \, d\rho(\alpha) \ge 0 \quad \forall \omega \in \Omega, \qquad \int_{A} d\rho(\alpha) = 1.

• Duality conditions: Let (\theta, \beta) and (\sigma, \rho) be feasible for their respective problems and complementary slack, i.e.,
\beta has measure only where \sigma = \int_{A} T(\omega, \alpha) \, d\rho(\alpha), and
\rho has measure only where \theta = \int_{\Omega} T(\omega, \alpha) \, d\beta(\omega).
Then, both solutions are optimal for their respective problems.
• The interesting theoretical problem here is to find conditions which ensure that the solutions are point masses (i.e., the original monotonically increasing \beta is a step function).

• Because of this, and in view of the compactness of the feasible (index) sets A and \Omega at the lower levels, we are interested in the nondegeneracy of the local minima of the lower level problem, so as to get finitely many local minimizers.

Infinite Kernel Learning: Infinite Programming

• Lower level problem: For a given parameter (\sigma, \rho), we consider

(LLP)
\min_{\omega} \; g\left( (\sigma, \rho), \omega \right) \quad \text{subject to} \quad \omega \in \Omega,
\quad \text{where} \quad g\left( (\sigma, \rho), \omega \right) := \sigma - \int_{A} T(\omega, \alpha) \, d\rho(\alpha).
Infinite Kernel Learning: Reduction Ansatz

• “reduction ansatz” and “finite optimization”
• Implicit Function Theorem
• parametrical measures, e.g., densities:

Gaussian:
f(\omega; (\mu, \sigma)) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(\omega - \mu)^2}{2 \sigma^2} \right),

exponential:
f(\omega; \lambda) = \begin{cases} \lambda \exp(-\lambda \omega), & \omega \ge 0, \\ 0, & \omega < 0, \end{cases}

Beta:
f(\omega; (\alpha, \beta)) = \frac{\omega^{\alpha - 1} (1 - \omega)^{\beta - 1}}{\int_0^1 u^{\alpha - 1} (1 - u)^{\beta - 1} \, du},

uniform (H the Heaviside function):
f(\omega; (a, b)) = \frac{H(\omega - a) - H(\omega - b)}{b - a}.
Infinite Kernel Learning: Reduction Ansatz

• “reduction ansatz” and “finite optimization”
• Implicit Function Theorem
• parametrical measures

g(x, y) \ge 0 \quad \forall y \in I \qquad \Longleftrightarrow \qquad \min_{y \in I} g(x, y) \ge 0.

[Figure: the finitely many local minimizers y_1, \ldots, y_p of g(x, \cdot) over the index set.]

With the implicit functions x \mapsto y_j(x) given by the local minimizers of g(x, \cdot), the semi-infinite problem reduces to finite optimization:

\min f(x) \quad \text{subject to} \quad g_j(x) := g(x, y_j(x)) \ge 0 \quad (j = 1, 2, \ldots, p).

Based on the reduction ansatz applied to g\left( (\sigma, \rho), \cdot \right): the local minimizers \omega depend on the parameter (\sigma, \rho), i.e., \tilde{\omega} = \omega(\sigma, \rho), which requires a suitable topology on the measures.
Infinite Kernel Learning: Regularization

Discretize \Omega = [0, 1] by a grid 0 = t_0 < t_1 < \cdots < t_\iota = 1 and regularize:

\min_{\theta, \beta} \; -\theta + \mu \sup_{t \in [0, 1]} \left( \frac{d^2}{dt^2} \int_0^t d\beta(\omega) \right)^2
\quad \text{subject to the constraints,}

with the derivatives approximated by divided differences:

\frac{d}{dt} \int_0^t d\beta(\omega) \approx \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu},

\frac{d^2}{dt^2} \int_0^t d\beta(\omega) \approx \frac{1}{t_{\nu+2} - t_{\nu+1}} \left( \frac{\int_0^{t_{\nu+2}} d\beta(\omega) - \int_0^{t_{\nu+1}} d\beta(\omega)}{t_{\nu+2} - t_{\nu+1}} - \frac{\int_0^{t_{\nu+1}} d\beta(\omega) - \int_0^{t_\nu} d\beta(\omega)}{t_{\nu+1} - t_\nu} \right).
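The divided differences can be sketched as follows (a sketch under assumptions: F[nu] stands for the grid value of the cumulative integral of the measure at t_nu, and the names are illustrative):

```python
def first_divided_diff(F, t, nu):
    """(F(t_{nu+1}) - F(t_nu)) / (t_{nu+1} - t_nu)."""
    return (F[nu + 1] - F[nu]) / (t[nu + 1] - t[nu])

def second_divided_diff(F, t, nu):
    """Difference of consecutive first divided differences, scaled by the step."""
    return (first_divided_diff(F, t, nu + 1) - first_divided_diff(F, t, nu)) \
        / (t[nu + 2] - t[nu + 1])

# a linear cumulative function has constant slope and zero second difference
t = [0.0, 0.25, 0.5, 0.75, 1.0]
F = [2.0 * ti for ti in t]
```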
Infinite Kernel Learning: Topology

(E, d): metric space.

Radon measure: a measure on the \sigma-algebra of Borel sets of E that is locally finite and inner regular (inner regularity: approximation from inside by compact sets K_\nu \subset E).

\mathcal{H}(E): set of Radon measures on E.

Def.: Basis of neighbourhoods of a measure \rho \in \mathcal{H}(E), indexed by continuous bounded functions f_1, \ldots, f_n on E and \varepsilon > 0:

B_{f_1, \ldots, f_n; \varepsilon}(\rho) := \left\{ \mu \in \mathcal{H}(E) \;\middle|\; \left| \int_E f_i \, d\mu - \int_E f_i \, d\rho \right| < \varepsilon \quad (i = 1, 2, \ldots, n) \right\}.

Def.: Prokhorov metric:

d_0(\mu, \rho) := \inf \left\{ \varepsilon \ge 0 \;\middle|\; \mu(A) \le \rho(A^\varepsilon) + \varepsilon \text{ and } \rho(A) \le \mu(A^\varepsilon) + \varepsilon \quad \forall A \subseteq E \text{ closed} \right\},

where A^\varepsilon := \{ x \in E \mid d(x, A) < \varepsilon \}.

Open \delta-neighbourhood of a measure \rho:

B_\delta(\rho) := \{ \mu \in \mathcal{H}(E) \mid d_0(\rho, \mu) < \delta \}.
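A brute-force sketch of the Prokhorov metric for two discrete measures on the real line (all names and the grid resolution are assumptions, and for discrete measures it suffices to test the sets A over subsets of the combined support): scan a grid of candidate epsilons and return the first one for which both defining inequalities hold:

```python
from itertools import chain, combinations

def _subsets(points):
    pts = list(points)
    return chain.from_iterable(combinations(pts, r) for r in range(1, len(pts) + 1))

def prokhorov_discrete(pts1, w1, pts2, w2, eps_grid):
    support = sorted(set(pts1) | set(pts2))

    def mass(pts, w, A):                  # mu(A) for a plain finite set A
        return sum(wi for p, wi in zip(pts, w) if p in A)

    def fattened_mass(pts, w, A, eps):    # mu(A^eps), A^eps = {x : d(x, A) < eps}
        return sum(wi for p, wi in zip(pts, w)
                   if min(abs(p - a) for a in A) < eps)

    for eps in sorted(eps_grid):
        if all(mass(pts1, w1, A) <= fattened_mass(pts2, w2, A, eps) + eps and
               mass(pts2, w2, A) <= fattened_mass(pts1, w1, A, eps) + eps
               for A in _subsets(support)):
            return eps
    return float("inf")
```

For two point masses at distance h < 1 this returns the first grid value above h, matching the intuition that the Prokhorov distance of δ₀ and δ_h is h.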
Infinite Kernel Learning: Numerical Results
References

Özöğür, S., Shawe-Taylor, J., Weber, G.-W., and Ögel, Z.B., Pattern analysis for the prediction of eukaryotic pro-peptide cleavage sites, in the special issue Networks in Computational Biology of Discrete Applied Mathematics 157, 10 (May 2009) 2388-2394.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning by infinite and semi-infinite programming, Proceedings of the Second Global Conference on Power Control and Optimization, AIP Conference Proceedings 1159, Bali, Indonesia, 1-3 June 2009, Subseries: Mathematical and Statistical Physics, ISBN 978-0-7354-0696-4 (August 2009) 306-313; Hakim, A.H., Vasant, P., and Barsoum, N., guest eds.

Özöğür-Akyüz, S., and Weber, G.-W., Infinite kernel learning via infinite and semi-infinite programming, to appear in the special issue of OMS (Optimization Methods and Software) at the occasion of the International Conference on Engineering Optimization (EngOpt 2008; Rio de Janeiro, Brazil, June 1-5, 2008), Schittkowski, K. (guest ed.).

Özöğür-Akyüz, S., and Weber, G.-W., On numerical optimization theory of infinite kernel learning, preprint at IAM, METU, submitted to JOGO (Journal of Global Optimization).