
  • Slide 1
  • Ziming Zhang*, Ze-Nian Li, Mark Drew School of Computing Science Simon Fraser University Vancouver, Canada {zza27, li, mark}@cs.sfu.ca AdaMKL: A Novel Biconvex Multiple Kernel Learning Approach 1 * This work was done when the author was at SFU.
  • Slide 2
  • Outline Introduction Adaptive Multiple Kernel Learning Experiments Conclusion 2
  • Slide 3
  • Introduction. Kernel: given a set of data and a feature mapping function, a kernel matrix can be defined whose entries are the inner products of each pair of mapped feature vectors. Multiple Kernel Learning: aims to learn an optimal kernel, as well as the support vectors, by combining a set of base kernels linearly with non-negative kernel coefficients.
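The linear combination above can be sketched in a few lines of numpy. This is not from the slides; it is a minimal illustration in which the RBF base kernels, the data, and the coefficient values are all invented for the example:

```python
import numpy as np

def combined_kernel(kernels, coeffs):
    """Linearly combine base kernel matrices with non-negative coefficients."""
    coeffs = np.asarray(coeffs, dtype=float)
    assert np.all(coeffs >= 0), "kernel coefficients must be non-negative"
    # K = sum_m coeffs[m] * K_m
    return sum(c * K for c, K in zip(coeffs, kernels))

def rbf(X, gamma):
    """Gaussian (RBF) kernel matrix: exp(-gamma * ||x_i - x_j||^2)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# Toy data and two base kernels with different bandwidths
X = np.array([[0.0], [1.0], [2.0]])
K = combined_kernel([rbf(X, 0.5), rbf(X, 2.0)], [0.7, 0.3])
```

A conic (non-negative) combination of valid kernel matrices is itself a valid kernel matrix, which is why MKL can search over the coefficients freely.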
  • Slide 4
  • Introduction Multiple Kernel Learning 4
  • Slide 5
  • Introduction. Example: Lp-norm Multiple Kernel Learning [1]. Learning alternates two steps: (1) train a traditional SVM with the kernel coefficients fixed; (2) learn the kernel coefficients with w fixed. Constraints: the traditional SVM constraints plus the kernel coefficient constraints; the objective is a convex function. [1] M. Kloft, et al. Efficient and accurate Lp-norm multiple kernel learning. In NIPS, 2009.
  • Slide 6
  • Introduction. Motivation: the Lp-norm kernel coefficient constraint makes learning the kernel coefficients difficult, especially when p > 1. Intuition: solve the MKL problem without considering the kernel coefficient constraints explicitly. Contributions: propose a family of biconvex optimization formulations for MKL; can handle arbitrary norms of the kernel coefficients; easy and fast to optimize.
  • Slide 7
  • Adaptive Multiple Kernel Learning (slide figure: objective function with per-kernel weighting; a biconvex function)
  • Slide 8
  • Adaptive Multiple Kernel Learning. Biconvex functions: f(x, y) is biconvex if it is convex in x for each fixed y and convex in y for each fixed x. Example: f(x, y) = x^2 + y^2 - 3xy. Biconvex optimization: at least one of the objective functions and constraints is biconvex, and the others are convex; in general only local optima are guaranteed.
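The slide's example can be checked directly from second derivatives (this check is mine, not from the slides): f(x, y) = x^2 + y^2 - 3xy is convex along each coordinate, yet its Hessian has a negative eigenvalue, so it is biconvex but not jointly convex.

```python
import numpy as np

# Hessian of f(x, y) = x^2 + y^2 - 3xy
H = np.array([[2.0, -3.0],
              [-3.0, 2.0]])

eigvals = np.linalg.eigvalsh(H)
# Convex in each variable separately: both diagonal second derivatives positive.
partial_convex = H[0, 0] > 0 and H[1, 1] > 0
# Jointly convex only if the full Hessian is positive semidefinite.
jointly_convex = np.all(eigvals >= 0)
print(partial_convex, jointly_convex)  # True False
```

The eigenvalues are -1 and 5, which is exactly why biconvex problems admit only local-optimum guarantees.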
  • Slide 9
  • Adaptive Multiple Kernel Learning (AdaMKL): aims to simplify the MKL learning process while keeping discriminative power similar to MKL, using biconvex optimization. Binary classification setting.
  • Slide 10
  • Adaptive Multiple Kernel Learning 10 Objective function:
  • Slide 11
  • Adaptive Multiple Kernel Learning. Optimization: (1) learn w with the kernel coefficients fixed, using an Np norm; (2) learn the kernel coefficients with w fixed, using their L1 or L2 norm; repeat the two steps until convergence.
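The alternation above is the standard pattern for biconvex problems: each step is a convex subproblem, and the objective decreases monotonically. A generic skeleton (my sketch, not the paper's code; the two step solvers and the toy objective f(w, t) = w^2 + t^2 - wt are placeholders) might look like:

```python
import numpy as np

def alternate_minimize(step_w, step_theta, theta0, tol=1e-6, max_iter=100):
    """Biconvex alternation: each step solves one convex subproblem.

    step_w(theta)  -> w          : optimal w for fixed kernel coefficients
    step_theta(w)  -> (theta, f) : optimal coefficients for fixed w, and objective
    Stops when the objective decrease falls below tol.
    """
    theta, prev = theta0, np.inf
    for _ in range(max_iter):
        w = step_w(theta)
        theta, obj = step_theta(w)
        if prev - obj < tol:  # monotone decrease => converged to a local minimum
            break
        prev = obj
    return w, theta

# Toy stand-in objective with closed-form per-block minimizers:
# argmin_w (w^2 + t^2 - w*t) = t/2, and symmetrically for t.
step_w = lambda t: t / 2.0
def step_theta(w):
    t = w / 2.0
    return t, w * w + t * t - w * t

w, t = alternate_minimize(step_w, step_theta, theta0=1.0)
```

In AdaMKL each `step_w` call is a quadratic program (the SVM dual), while `step_theta` has a simple form thanks to the L1/L2 norm choice.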
  • Slide 12
  • Adaptive Multiple Kernel Learning 12 Learning w (Dual)
  • Slide 13
  • Adaptive Multiple Kernel Learning 13 Learning the kernel coefficients
  • Slide 14
  • Adaptive Multiple Kernel Learning. Computational complexity: same as quadratic programming. Convergence: if the hard-margin case (C = +∞) can be solved at the initialization stage, then AdaMKL will converge to a local minimum; if the objective function converges at either step, then AdaMKL has converged to a local minimum.
  • Slide 15
  • Adaptive Multiple Kernel Learning: comparison. Lp-norm MKL: convex; explicit kernel coefficient norm condition; solved by gradient search, semi-infinite programming (SIP), etc. AdaMKL: biconvex; kernel coefficient conditions hidden in the dual; solved by quadratic programming.
  • Slide 16
  • Experiments. 4 specific AdaMKL variants: N0L1, N1L1, N1L2, N2L2, where N and L denote the types of norm used for learning w and the kernel coefficients, respectively. 2 experiments. 1. Toy example: C = 10^5 without tuning, 10 Gaussian kernels, data randomly sampled from 2-D Gaussian distributions. Positive samples: mean [0 0], covariance [0.3 0; 0 0.3], 100 samples. Negative samples: means [-1 -1] and [1 1], covariances [0.1 0; 0 0.1] and [0.2 0; 0 0.2], 100 samples, respectively.
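The toy data described above can be regenerated with numpy. This sketch is mine, not the authors' script; the slide leaves the split of the negative samples between the two modes ambiguous, so the 50/50 split (and the random seed) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Positive class: a single 2-D Gaussian, mean [0 0], covariance 0.3*I.
pos = rng.multivariate_normal([0, 0], [[0.3, 0], [0, 0.3]], size=100)

# Negative class: two Gaussian modes; the 50/50 split is an assumption.
neg = np.vstack([
    rng.multivariate_normal([-1, -1], [[0.1, 0], [0, 0.1]], size=50),
    rng.multivariate_normal([1, 1], [[0.2, 0], [0, 0.2]], size=50),
])

X = np.vstack([pos, neg])
y = np.concatenate([np.ones(100), -np.ones(100)])
```

The positive cloud sits between the two negative modes, so no single Gaussian kernel separates the classes well, which is what makes the example a reasonable test for learned kernel combinations.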
  • Slide 17
  • Experiments (2). 2. 4 benchmark datasets: breast-cancer, heart, thyroid, and titanic (downloaded from http://ida.first.fraunhofer.de/projects/bench/). Gaussian kernels + polynomial kernels; 100, 140, 60, 40 kernels for the corresponding datasets, respectively. Compared with convex optimization based MKL: GMKL [2] and SMKL [3]. [2] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML, 2007. [3] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. JMLR, 9:2491-2521, 2008.
  • Slide 18
  • Experiments - Toy example 18
  • Slide 19
  • Experiments - Toy example 19
  • Slide 20
  • Experiments - Benchmark datasets. (a) Breast-Cancer: [69.64 ~ 75.23]; (b) Heart: [79.71 ~ 84.05]; (c) Thyroid: [95.20 ~ 95.80]; (d) Titanic: [76.02 ~ 77.58]
  • Slide 21
  • Experiments - Benchmark datasets. (a) Breast-Cancer; (b) Heart; (c) Thyroid; (d) Titanic
  • Slide 22
  • Conclusion. Biconvex optimization for MKL: hides the kernel coefficient constraints (non-negativity and the Lp (p ≥ 1) norm) in the dual, without considering them explicitly. Easy to optimize, fast to converge, lower computational time, yet performance similar to traditional convex optimization based MKL.
  • Slide 23
  • 23 Thank you !!!