TRANSCRIPT

Slide 1: Continuous Optimization: Problems and Successes
Tijl De Bie
Intelligent Systems Laboratory
MVSE, University of Bristol, United Kingdom
Slide 2: Motivation
• Back-propagation algorithm for training neural networks (gradient descent)
• Support vector machines
• Convex optimization 'boom' (NIPS, also ICML, KDD, ...)
• What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)
Slide 3: (Convex) continuous optimization
• Consider x ∈ R^d, and real-valued functions f_i, g_i defined over R^d
• Continuous optimization:
      min_x  f_0(x)
      subject to  f_i(x) ≤ 0,  i = 1, ..., k
                  g_i(x) = 0,  i = 1, ..., l
• Convex optimization: the f_i are convex functions and the g_i are affine functions
Slide 4: Convex optimization
[figure slide; no text content recovered]
Slide 5: Convex optimization
• General convex optimization approach:
  – Start with a guess, iteratively improve until the optimum is found
  – E.g. gradient descent, conjugate gradient, Newton's method, etc.
• For constrained convex optimization: interior point methods
  – Provably efficient (worst case; typical case even better)
  – Iteration complexity: O(√k · log(1/ε))
  – Complexity per iteration: polynomial
• Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK, ...)
• Purely declarative
• Book: Convex Optimization (Boyd & Vandenberghe)
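The iterative-improvement idea above can be sketched with plain gradient descent on a small unconstrained quadratic. The matrix, step size, and iteration count below are illustrative choices, not from the slides:

```python
import numpy as np

# Minimize f(x) = 0.5 * x'Ax - b'x, a smooth convex function whose
# unique optimum solves the linear system Ax = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite (illustrative)
b = np.array([1.0, 1.0])

x = np.zeros(2)              # initial guess
step = 0.1                   # fixed step size, small enough to converge here
for _ in range(500):
    grad = A @ x - b         # gradient of f at the current iterate
    x = x - step * grad      # move downhill along the negative gradient

# The optimum is A^{-1} b = (0.2, 0.4).
```

Newton's method would replace the fixed step by the inverse-Hessian direction and converge in one step on this quadratic; interior point methods wrap a Newton-like iteration around a barrier for the constraints.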
Slide 6: Convex optimization
The standard model classes form a nested hierarchy:
• LP ⊂ QP ⊂ SOCP ⊂ SDP ⊂ Cone programming ⊂ Convex optimization
• Convex optimization also includes Logdet and Geometric programming
Slide 7: Linear Programming (LP)
• Linear objective, linear inequality constraints, affine equality constraints:
      min_x  c'x
      subject to  a_i'x ≤ b_i
                  g_i'x = h_i
• Applications:
  – Relaxations of integer LPs
  – Classification: linear support vector machines (SVM), forms of boosting
  – (Lots outside DM/ML)
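A minimal worked LP, using SciPy's `linprog` as one off-the-shelf solver; the problem data is illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 4, x >= 0, y >= 0.
# linprog minimizes, so we negate the objective: min -x - 2y.
res = linprog(c=[-1.0, -2.0],
              A_ub=[[1.0, 1.0]], b_ub=[4.0],
              bounds=[(0, None), (0, None)])

# Optimal solution: x = 0, y = 4, with objective value 8 (res.fun = -8).
```

This is the declarative style the slides emphasize: state c, A, b and hand the problem to the solver.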
Slide 8: Convex Quadratic Programming (QP)
• Convex quadratic constraints:  x'B'Bx + a'x + b ≤ 0
• LP is a special case where B = 0
• Applications:
  – Classification/regression: SVM
  – Novelty detection: minimum volume enclosing hypersphere
  – Regression + feature selection: lasso
  – Structured prediction problems
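A small convex QP can be solved with a simple projected-gradient loop; this numpy-only sketch (problem data, step size, and iteration count are illustrative assumptions) handles nonnegativity constraints by clipping after each gradient step:

```python
import numpy as np

# Minimize 0.5 * x'Qx - c'x subject to x >= 0.
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # positive definite => convex objective
c = np.array([2.0, -2.0])

x = np.zeros(2)
step = 0.1
for _ in range(500):
    x = x - step * (Q @ x - c)   # gradient step on the quadratic
    x = np.maximum(x, 0.0)       # project back onto the feasible set x >= 0

# Unconstrained optimum is Q^{-1}c = (1, -1); with x >= 0 the solution is (1, 0).
```

Production SVM or lasso solvers use far more specialized algorithms, but the convexity guarantee is the same: any method that keeps decreasing the objective inside the feasible set finds the global optimum.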
Slide 9: Second-Order Cone Programming (SOCP)
• Second-order cone constraints:  ||Ax + b||_2 ≤ c'x + d
• QCQP is a special case where c = 0
• Applications:
  – Metric learning
  – Fermat-Weber problem: find a point in the plane with minimal sum of distances to a set of points
  – Robust linear programming
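The Fermat-Weber problem mentioned above also has a classic direct iterative solver, the Weiszfeld algorithm (the algorithm and data here are not from the slides): each step re-weights the points by inverse distance to the current iterate.

```python
import numpy as np

def weiszfeld(points, x0, iters=200):
    """Geometric median: the point minimizing the sum of Euclidean
    distances to the given points, via the Weiszfeld iteration.
    Assumes the iterate never lands exactly on a data point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)  # distances to each point
        w = 1.0 / d                             # inverse-distance weights
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# Four corners of a square: by symmetry the geometric median is the center (1, 1).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
```

Casting the same problem as an SOCP (minimize Σ t_i subject to ||x − p_i||_2 ≤ t_i) gives the solver-based, declarative alternative.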
Slide 10: Semi-Definite Programming (SDP)
• Constraints requiring a matrix to be positive semi-definite:
      F_0 + Σ_k x_k F_k ⪰ 0
• SOCP is a special case:
      [ (c'x + d)·I    Ax + b  ]
      [ (Ax + b)'      c'x + d ]  ⪰ 0
• Applications:
  – Metric learning
  – Low-rank matrix approximations (dimensionality reduction)
  – Very tight relaxations of graph labeling problems (e.g. Max-cut)
  – Semi-supervised learning
  – Approximate inference in difficult graphical models
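A basic SDP-flavoured primitive is checking or restoring positive semi-definiteness. This numpy sketch (not from the slides) projects a symmetric matrix onto the PSD cone by clipping negative eigenvalues:

```python
import numpy as np

def nearest_psd(A):
    """Project a symmetric matrix onto the PSD cone (Frobenius norm)
    by zeroing out its negative eigenvalues."""
    A = (A + A.T) / 2.0                 # symmetrize defensively
    w, V = np.linalg.eigh(A)            # eigendecomposition, w ascending
    w = np.clip(w, 0.0, None)           # clip negative eigenvalues to 0
    return V @ np.diag(w) @ V.T

A = np.array([[1.0, -2.0],
              [-2.0, 1.0]])             # eigenvalues 3 and -1: not PSD
```

Full SDP solvers (SeDuMi, SDPT3, MOSEK) instead follow an interior-point path that keeps all iterates strictly inside the PSD cone.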
Slide 11: Geometric programming
• Objective and constraints of the form:  log Σ_k exp(a_k'x + b_k)
• Applications:
  – Maximum entropy modeling with moment constraints
  – Maximum likelihood fitting of exponential family distributions
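The log-sum-exp form above is the workhorse of geometric programming; evaluating it naively overflows, so implementations shift by the maximum first (standard trick; the data below is an illustrative assumption, not from the slides):

```python
import numpy as np

def logsumexp(z):
    """Stable log(sum(exp(z))): shift by the max so no exp overflows."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

# Evaluate an objective term log sum_k exp(a_k'x + b_k) at a point x:
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # rows a_k (illustrative)
b = np.array([0.0, 0.0])
x = np.array([0.5, 0.5])
val = logsumexp(a @ x + b)   # = 0.5 + log(2) for this data
```

Because log-sum-exp is convex in x, compositions of such terms stay inside the convex optimization framework.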
Slide 12: Log Determinant Optimization (Logdet)
• Objective is the log-determinant of a matrix:  log det X
• |det X| is the volume of the parallelepiped spanned by the columns of X, so minimizing log det X minimizes this volume
• Applications:
  – Novelty detection: minimum volume enclosing ellipsoid
  – Experimental design / active learning (which labels for which data points are likely to be most informative)
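The log-determinant can be evaluated stably with numpy's `slogdet` rather than `log(det(X))`, which under/overflows for larger matrices. A small illustration (data not from the slides):

```python
import numpy as np

X = np.diag([2.0, 3.0])              # columns span a 2-by-3 rectangle
sign, logdet = np.linalg.slogdet(X)  # sign of det, and log|det|

# |det X| = 6 is the area (2-D volume) of the parallelepiped spanned
# by the columns of X, so logdet = log(6).
```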
Slide 13: Eigenvalue problems
• Eigenvalue problems are not convex optimization problems
• Still, they are relatively efficient, globally convergent, and a useful primitive:
  – Dimensionality reduction (PCA)
  – Finding relations between datasets (CCA)
  – Spectral clustering
  – Metric learning
  – Relaxations of combinatorial problems
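As a taste of the eigenvalue primitive, PCA reduces to an eigendecomposition of the covariance matrix; the toy data below is an illustrative assumption, not from the slides:

```python
import numpy as np

# Zero-mean toy data spread mainly along the direction (1, 1).
X = np.array([[ 3.0,  3.0], [-3.0, -3.0],
              [ 1.0,  1.0], [-1.0, -1.0],
              [ 0.5, -0.5], [-0.5,  0.5]])

C = X.T @ X / len(X)      # sample covariance (data is already centered)
w, V = np.linalg.eigh(C)  # eigenvalues ascending, eigenvectors in columns
top = V[:, -1]            # principal direction = top eigenvector

# top is, up to sign, (1, 1)/sqrt(2): the dominant axis of the data.
```

Despite the non-convexity of the underlying Rayleigh-quotient problem, the eigendecomposition reliably finds the global optimum, which is what makes it such a useful building block.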
Slide 14: The hype
• Very popular in conferences like NIPS, ICML, KDD
• These model classes are sufficiently rich to do sophisticated things:
  – Sparsity: L1-norm/linear constraints for feature selection
  – Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems, ...)
• Declarative nature, little expertise needed
• Computational complexity is easy to understand
Slide 15: After the hype
• But:
  – Polynomial time, often with a high exponent; e.g. SDP: O(d^2 q^2.5), and sometimes O(d q^2)
  – Convex constraints can be too limiting
• Tendency toward other paradigms:
  – Convex-concave programming (few guarantees, but works well in practice)
  – Submodular optimization (approximation guarantees, works well in practice)
Slide 16: CP vs Convex Optimization
• "CP: Choosing the best model is an art" (Helmut); "CP requires skill and ingenuity" (Barry)
• I understand that in CP there is a hierarchy of propagation methods, but...
• Is there a hierarchy of problem complexities?
  – How hard is it to see whether a constraint will propagate well?
  – Does it depend on the implementation?
  – ...