TRANSCRIPT

Slide 1: Continuous Optimization: Problems and Successes
Tijl De Bie
Intelligent Systems Laboratory
MVSE, University of Bristol, United Kingdom
Slide 2: Motivation
• Back-propagation algorithm for training neural networks (gradient descent)
• Support vector machines
• Convex optimization 'boom' (NIPS, also ICML, KDD, ...)
• What explains this success? (Is it really a success?)
(Mainly for CP-ers not familiar with continuous optimization)
Slide 3: (Convex) continuous optimization
• Consider x ∈ R^d, and real-valued functions f_i, g_i defined over R^d
• Continuous optimization:
      min_x  f_0(x)
      subject to  f_i(x) ≤ 0,  i = 1, ..., k
                  g_i(x) = 0,  i = 1, ..., l
• Convex optimization: the f_i are convex functions and the g_i are affine functions
Slide 4: Convex optimization
[figure slide; no text content recovered]
Slide 5: Convex optimization
• General convex optimization approach:
  – Start with a guess, iteratively improve until the optimum is found
  – E.g. gradient descent, conjugate gradient, Newton's method, etc.
• For constrained convex optimization: interior point methods
  – Provably efficient (worst case; typical case even better)
  – Iteration complexity: O(√k · log(1/ε))
  – Complexity per iteration: polynomial
• Out-of-the-box tools exist (SeDuMi, SDPT3, MOSEK, ...)
• Purely declarative
• Book: Convex Optimization (Boyd & Vandenberghe)
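The iterative-improvement idea above can be sketched with plain gradient descent on a small unconstrained quadratic. The matrix, step size, and iteration count below are illustrative choices, not from the slides:

```python
import numpy as np

# Minimize f(x) = 0.5 * x'Ax - b'x, a smooth convex function whose
# unique optimum solves the linear system Ax = b.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])   # symmetric positive definite (illustrative)
b = np.array([1.0, 1.0])

x = np.zeros(2)              # initial guess
step = 0.1                   # fixed step size, small enough to converge here
for _ in range(500):
    grad = A @ x - b         # gradient of f at the current iterate
    x = x - step * grad      # move downhill along the negative gradient

# The optimum is A^{-1} b = (0.2, 0.4).
```

Newton's method would replace the fixed step by the inverse-Hessian direction and converge in one step on this quadratic; interior point methods wrap a Newton-like iteration around a barrier for the constraints.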
Slide 6: Convex optimization
The standard model classes form a nested hierarchy:
• LP ⊂ QP ⊂ SOCP ⊂ SDP ⊂ Cone programming ⊂ Convex optimization
• Convex optimization also includes Logdet and Geometric programming
Slide 7: Linear Programming (LP)
• Linear objective, linear inequality constraints, affine equality constraints:
      min_x  c'x
      subject to  a_i'x ≤ b_i
                  g_i'x = h_i
• Applications:
  – Relaxations of integer LPs
  – Classification: linear support vector machines (SVM), forms of boosting
  – (Lots outside DM/ML)
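A minimal worked LP, using SciPy's `linprog` as one off-the-shelf solver; the problem data is illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import linprog

# Maximize x + 2y subject to x + y <= 4, x >= 0, y >= 0.
# linprog minimizes, so we negate the objective: min -x - 2y.
res = linprog(c=[-1.0, -2.0],
              A_ub=[[1.0, 1.0]], b_ub=[4.0],
              bounds=[(0, None), (0, None)])

# Optimal solution: x = 0, y = 4, with objective value 8 (res.fun = -8).
```

This is the declarative style the slides emphasize: state c, A, b and hand the problem to the solver.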
Slide 8: Convex Quadratic Programming (QP)
• Convex quadratic constraints:  x'B'Bx + a'x + b ≤ 0
• LP is a special case where B = 0
• Applications:
  – Classification/regression: SVM
  – Novelty detection: minimum volume enclosing hypersphere
  – Regression + feature selection: lasso
  – Structured prediction problems
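A small convex QP can be solved with a simple projected-gradient loop; this numpy-only sketch (problem data, step size, and iteration count are illustrative assumptions) handles nonnegativity constraints by clipping after each gradient step:

```python
import numpy as np

# Minimize 0.5 * x'Qx - c'x subject to x >= 0.
Q = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # positive definite => convex objective
c = np.array([2.0, -2.0])

x = np.zeros(2)
step = 0.1
for _ in range(500):
    x = x - step * (Q @ x - c)   # gradient step on the quadratic
    x = np.maximum(x, 0.0)       # project back onto the feasible set x >= 0

# Unconstrained optimum is Q^{-1}c = (1, -1); with x >= 0 the solution is (1, 0).
```

Production SVM or lasso solvers use far more specialized algorithms, but the convexity guarantee is the same: any method that keeps decreasing the objective inside the feasible set finds the global optimum.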
Slide 9: Second-Order Cone Programming (SOCP)
• Second-order cone constraints:  ||Ax + b||_2 ≤ c'x + d
• QCQP is a special case where c = 0
• Applications:
  – Metric learning
  – Fermat-Weber problem: find a point in the plane with minimal sum of distances to a set of points
  – Robust linear programming
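The Fermat-Weber problem mentioned above also has a classic direct iterative solver, the Weiszfeld algorithm (the algorithm and data here are not from the slides): each step re-weights the points by inverse distance to the current iterate.

```python
import numpy as np

def weiszfeld(points, x0, iters=200):
    """Geometric median: the point minimizing the sum of Euclidean
    distances to the given points, via the Weiszfeld iteration.
    Assumes the iterate never lands exactly on a data point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)  # distances to each point
        w = 1.0 / d                             # inverse-distance weights
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# Four corners of a square: by symmetry the geometric median is the center (1, 1).
pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
```

Casting the same problem as an SOCP (minimize Σ t_i subject to ||x − p_i||_2 ≤ t_i) gives the solver-based, declarative alternative.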
Slide 10: Semi-Definite Programming (SDP)
• Constraints requiring a matrix to be positive semi-definite:
      F_0 + Σ_k x_k F_k ⪰ 0
• SOCP is a special case:
      [ (c'x + d)·I    Ax + b  ]
      [ (Ax + b)'      c'x + d ]  ⪰ 0
• Applications:
  – Metric learning
  – Low-rank matrix approximations (dimensionality reduction)
  – Very tight relaxations of graph labeling problems (e.g. Max-cut)
  – Semi-supervised learning
  – Approximate inference in difficult graphical models
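A basic SDP-flavoured primitive is checking or restoring positive semi-definiteness. This numpy sketch (not from the slides) projects a symmetric matrix onto the PSD cone by clipping negative eigenvalues:

```python
import numpy as np

def nearest_psd(A):
    """Project a symmetric matrix onto the PSD cone (Frobenius norm)
    by zeroing out its negative eigenvalues."""
    A = (A + A.T) / 2.0                 # symmetrize defensively
    w, V = np.linalg.eigh(A)            # eigendecomposition, w ascending
    w = np.clip(w, 0.0, None)           # clip negative eigenvalues to 0
    return V @ np.diag(w) @ V.T

A = np.array([[1.0, -2.0],
              [-2.0, 1.0]])             # eigenvalues 3 and -1: not PSD
```

Full SDP solvers (SeDuMi, SDPT3, MOSEK) instead follow an interior-point path that keeps all iterates strictly inside the PSD cone.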
Slide 11: Geometric programming
• Objective and constraints of the form:  log Σ_k exp(a_k'x + b_k)
• Applications:
  – Maximum entropy modeling with moment constraints
  – Maximum likelihood fitting of exponential family distributions
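The log-sum-exp form above is the workhorse of geometric programming; evaluating it naively overflows, so implementations shift by the maximum first (standard trick; the data below is an illustrative assumption, not from the slides):

```python
import numpy as np

def logsumexp(z):
    """Stable log(sum(exp(z))): shift by the max so no exp overflows."""
    m = np.max(z)
    return m + np.log(np.sum(np.exp(z - m)))

# Evaluate an objective term log sum_k exp(a_k'x + b_k) at a point x:
a = np.array([[1.0, 0.0],
              [0.0, 1.0]])   # rows a_k (illustrative)
b = np.array([0.0, 0.0])
x = np.array([0.5, 0.5])
val = logsumexp(a @ x + b)   # = 0.5 + log(2) for this data
```

Because log-sum-exp is convex in x, compositions of such terms stay inside the convex optimization framework.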
Slide 12: Log Determinant Optimization (Logdet)
• Objective is the log-determinant of a matrix:  log det X
• |det X| is the volume of the parallelepiped spanned by the columns of X, so minimizing log det X minimizes this volume
• Applications:
  – Novelty detection: minimum volume enclosing ellipsoid
  – Experimental design / active learning (which labels for which data points are likely to be most informative)
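The log-determinant can be evaluated stably with numpy's `slogdet` rather than `log(det(X))`, which under/overflows for larger matrices. A small illustration (data not from the slides):

```python
import numpy as np

X = np.diag([2.0, 3.0])              # columns span a 2-by-3 rectangle
sign, logdet = np.linalg.slogdet(X)  # sign of det, and log|det|

# |det X| = 6 is the area (2-D volume) of the parallelepiped spanned
# by the columns of X, so logdet = log(6).
```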
Slide 13: Eigenvalue problems
• Eigenvalue problems are not convex optimization problems
• Still, they are relatively efficient, globally convergent, and a useful primitive:
  – Dimensionality reduction (PCA)
  – Finding relations between datasets (CCA)
  – Spectral clustering
  – Metric learning
  – Relaxations of combinatorial problems
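As a taste of the eigenvalue primitive, PCA reduces to an eigendecomposition of the covariance matrix; the toy data below is an illustrative assumption, not from the slides:

```python
import numpy as np

# Zero-mean toy data spread mainly along the direction (1, 1).
X = np.array([[ 3.0,  3.0], [-3.0, -3.0],
              [ 1.0,  1.0], [-1.0, -1.0],
              [ 0.5, -0.5], [-0.5,  0.5]])

C = X.T @ X / len(X)      # sample covariance (data is already centered)
w, V = np.linalg.eigh(C)  # eigenvalues ascending, eigenvectors in columns
top = V[:, -1]            # principal direction = top eigenvector

# top is, up to sign, (1, 1)/sqrt(2): the dominant axis of the data.
```

Despite the non-convexity of the underlying Rayleigh-quotient problem, the eigendecomposition reliably finds the global optimum, which is what makes it such a useful building block.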
Slide 14: The hype
• Very popular in conferences like NIPS, ICML, KDD
• These model classes are sufficiently rich to do sophisticated things:
  – Sparsity: L1-norm/linear constraints for feature selection
  – Low rank of matrices: SDP constraint and trace norm (sparse PCA, labeling problems, ...)
• Declarative nature, little expertise needed
• Computational complexity is easy to understand
Slide 15: After the hype
• But:
  – Polynomial time, often with a high exponent; e.g. SDP: O(d^2 q^2.5), and sometimes O(d q^2)
  – Convex constraints can be too limiting
• Tendency toward other paradigms:
  – Convex-concave programming (few guarantees, but works well in practice)
  – Submodular optimization (approximation guarantees, works well in practice)
Slide 16: CP vs Convex Optimization
• "CP: Choosing the best model is an art" (Helmut); "CP requires skill and ingenuity" (Barry)
• I understand that in CP there is a hierarchy of propagation methods, but...
• Is there a hierarchy of problem complexities?
  – How hard is it to see whether a constraint will propagate well?
  – Does it depend on the implementation?
  – ...