Pattern Analysis using Convex Optimization: Part 2 of Chapter 7 Discussion

Presenter: Brian Quanz

A KTEC Center of Excellence

Upload: katelynn-baskett

Post on 15-Dec-2015


About today’s discussion…

• Last time: discussed convex opt.

• Today: Will apply what we learned to 4 pattern analysis problems given in book:

  • (1) Smallest enclosing hypersphere (one-class SVM)

  • (2) SVM classification

  • (3) Support vector regression (SVR)

  • (4) On-line classification and regression

About today’s discussion…

• This time for the most part:

  • Describe problems

  • Derive solutions ourselves on the board!

  • Apply convex opt. knowledge to solve

• Mostly board work today

Recall: KKT Conditions

• What we will use:

• Key to remember ch. 7:

  • Complementary slackness -> sparse dual rep.

  • Convexity -> efficient global solution
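
The formula on this slide did not survive in the transcript. As a reminder, and assuming the standard form of a convex problem (minimise f(w) subject to g_i(w) <= 0 and h_j(w) = 0; this notation is chosen here for illustration and is not taken from the slide), the KKT conditions can be written as:

```latex
\begin{aligned}
L(w,\alpha,\beta) &= f(w) + \sum_i \alpha_i\, g_i(w) + \sum_j \beta_j\, h_j(w) \\
\nabla_w L(w^*,\alpha^*,\beta^*) &= 0 && \text{(stationarity)} \\
g_i(w^*) \le 0,\quad h_j(w^*) &= 0 && \text{(primal feasibility)} \\
\alpha_i^* &\ge 0 && \text{(dual feasibility)} \\
\alpha_i^*\, g_i(w^*) &= 0 && \text{(complementary slackness)}
\end{aligned}
```

Complementary slackness forces alpha_i* = 0 for every constraint that is not active at the solution, which is where the sparse dual representation comes from.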

Novelty Detection: Hypersphere

• Train data – learn support

• Capture with hypersphere

• Outside – ‘novel’ or ‘abnormal’ or ‘anomaly’

• Smaller sphere = more fine-tuned novelty detection

1st: Smallest Enclosing Hypersphere

• Given:

• Find center, c, of smallest hypersphere containing S

S.E.H. Optimization Problem

• O.P.:

• Let’s solve using Lagrangian and KKT and discuss
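
The optimization problem itself was an image on the slide and is missing here. A reconstruction in the usual notation for this problem (training set S = {x_1, ..., x_l}, feature map phi, kernel kappa; these symbols are assumed, not taken from the transcript) would look roughly like this:

```latex
\begin{aligned}
&\min_{c,\,r}\; r^2 \quad \text{s.t.}\quad \|\phi(x_i)-c\|^2 \le r^2,\quad i=1,\dots,\ell \\[4pt]
&L(c,r,\alpha) = r^2 + \sum_{i=1}^{\ell}\alpha_i\left(\|\phi(x_i)-c\|^2 - r^2\right),\quad \alpha_i\ge 0 \\[4pt]
&\frac{\partial L}{\partial r}=0 \;\Rightarrow\; \sum_i \alpha_i = 1,\qquad
 \frac{\partial L}{\partial c}=0 \;\Rightarrow\; c=\sum_i \alpha_i\,\phi(x_i) \\[4pt]
&\max_{\alpha}\; W(\alpha)=\sum_i \alpha_i\,\kappa(x_i,x_i)-\sum_{i,j}\alpha_i\alpha_j\,\kappa(x_i,x_j)
 \quad\text{s.t.}\quad \sum_i\alpha_i=1,\;\alpha_i\ge 0
\end{aligned}
```

By complementary slackness, alpha_i can be non-zero only for points lying exactly on the surface of the sphere, so the center is a combination of a few boundary points.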

Cheat

S.E.H.: Solution

• H(x) = 1 if x >= 0, 0 o.w.

Dual=primal @
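
To make the solution concrete, here is a minimal sketch that approximately solves the hypersphere dual from a kernel matrix and then applies the novelty indicator H. It assumes the standard dual (maximise sum_i alpha_i k(x_i,x_i) - alpha^T K alpha over the simplex) and uses Frank-Wolfe steps with exact line search instead of a QP solver; that solver choice, and the function names `smallest_enclosing_hypersphere` and `is_novel`, are illustrative and not from the slides.

```python
import numpy as np

def smallest_enclosing_hypersphere(K, n_iter=1000, tol=1e-12):
    """Approximately maximise W(alpha) = sum_i alpha_i K_ii - alpha^T K alpha
    subject to alpha >= 0, sum(alpha) = 1, using Frank-Wolfe steps."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    alpha = np.zeros(K.shape[0])
    alpha[0] = 1.0                          # start with the center on the first point
    for _ in range(n_iter):
        grad = diag - 2.0 * (K @ alpha)     # dW/dalpha; largest entry = farthest point
        i = int(np.argmax(grad))
        d = -alpha.copy()
        d[i] += 1.0                         # search direction e_i - alpha
        gain = grad @ d                     # first-order improvement along d
        curv = d @ K @ d                    # squared distance from center to point i
        if gain <= tol or curv <= tol:
            break                           # no remaining ascent direction
        eta = min(1.0, gain / (2.0 * curv)) # exact line search, clipped to [0, 1]
        alpha += eta * d
    r2 = float(alpha @ diag - alpha @ K @ alpha)   # dual value = squared radius at optimum
    return alpha, r2

def is_novel(k_xx, k_x, alpha, K, r2):
    """H(||phi(x) - c||^2 - r^2): 1 if x lies on or outside the sphere, else 0.
    k_xx = k(x, x); k_x[i] = k(x_i, x) for the training points."""
    dist2 = k_xx - 2.0 * (alpha @ k_x) + alpha @ K @ alpha
    return int(dist2 - r2 >= 0)
```

With a linear kernel on the points (0,0), (2,0), (1,0.5), the sketch puts the center at (1,0) with squared radius 1, supported only by the first two points. For harder data the loop is only approximate, so the returned radius can slightly underestimate the true one.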

Theorem on bound of false positive

Hypersphere that only contains some data – soft hypersphere

• Balance missing some points and reducing radius

  • Robustness – single point could throw off

• Introduce slack variables (repeated approach)

  • 0 within sphere, squared distance outside

Hypersphere optimization problem

• Now with trade off between radius and training point error:

• Let’s derive solution again
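
The trade-off problem was shown as an image. Assuming the usual 1-norm slack formulation with a trade-off parameter C (symbols chosen here for illustration), the primal and the resulting dual can be sketched as:

```latex
\begin{aligned}
&\min_{c,\,r,\,\xi}\; r^2 + C\sum_{i=1}^{\ell}\xi_i
 \quad\text{s.t.}\quad \|\phi(x_i)-c\|^2 \le r^2+\xi_i,\;\; \xi_i\ge 0 \\[4pt]
&\max_{\alpha}\; W(\alpha)=\sum_i \alpha_i\,\kappa(x_i,x_i)-\sum_{i,j}\alpha_i\alpha_j\,\kappa(x_i,x_j)
 \quad\text{s.t.}\quad \sum_i\alpha_i=1,\;\; 0\le\alpha_i\le C
\end{aligned}
```

The only change from the hard version is the upper bound C on each alpha_i, which comes from the stationarity condition in xi_i. Points strictly inside the sphere get alpha_i = 0 and points outside get alpha_i = C.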

Cheat

Soft hypersphere solution

Linear Kernel Example

Similar theorem

Remarks

• If data lies in subspace of feature space:

  • Hypersphere overestimates support in perpendicular dir.

  • Can use kernel PCA (next week discussion)

• If normalized data (k(x,x)=1)

  • Corresponds to a hyperplane separating the data from the origin

Maximal Margin Classifier

• Data and linear classifier

• Hinge loss, gamma margin

• Linearly separable if
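
The formulas on this slide are missing. As an illustration, the sketch below assumes the usual definitions: a linear function g(x) = <w, x> + b and the margin slack (hinge loss at margin gamma) (gamma - y g(x))_+. Under that assumption, the data is separated with margin gamma when every slack is zero. The function name `margin_slack` is hypothetical.

```python
import numpy as np

def margin_slack(w, b, X, y, gamma):
    """Hinge loss at margin gamma: max(0, gamma - y_i * g(x_i)), g(x) = <w, x> + b."""
    g = X @ w + b
    return np.maximum(0.0, gamma - y * g)

# Small usage example with made-up points
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0])
slack = margin_slack(np.array([1.0, 0.0]), 0.0, X, y, gamma=1.0)
separated_with_margin = bool(np.all(slack == 0.0))  # False: the third point falls short
```

Here the third point has functional margin 0.5, so it incurs a slack of 0.5 at gamma = 1 even though it is classified correctly.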

Margin Example

Typical formulation

• Typical formulation fixes gamma (functional margin) to 1 and allows w to vary, since scaling doesn’t affect the decision; the margin is then proportional to 1/norm(w).

• Here we fix the norm of w, and vary the functional margin gamma

Hard Margin SVM

• Arrive at optimization problem

• Let’s solve
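
The optimization problem on this slide was an image. Assuming the fixed-norm formulation (norm of w fixed to 1, margin gamma maximised) with bias b, kernel kappa and multipliers alpha_i (notation chosen here), the primal and the dual obtained from the Lagrangian and KKT conditions can be sketched as follows:

```latex
\begin{aligned}
&\max_{w,\,b,\,\gamma}\; \gamma
 \quad\text{s.t.}\quad y_i\left(\langle w,\phi(x_i)\rangle+b\right)\ge\gamma,\;\; i=1,\dots,\ell,\qquad \|w\|^2=1 \\[4pt]
&\max_{\alpha}\; W(\alpha)=-\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,\kappa(x_i,x_j)
 \quad\text{s.t.}\quad \sum_i y_i\alpha_i=0,\;\; \sum_i\alpha_i=1,\;\; \alpha_i\ge 0 \\[4pt]
&\gamma^* = \sqrt{-W(\alpha^*)},\qquad
 f(x)=\operatorname{sgn}\Big(\sum_i \alpha_i^*\,y_i\,\kappa(x_i,x)+b^*\Big)
\end{aligned}
```

For comparison, the more common formulation minimises (1/2)||w||^2 subject to y_i(<w, phi(x_i)> + b) >= 1, giving a geometric margin of 1/||w||. Both describe the same separating hyperplane.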

Cheat

Solution

• Recall:

Example with Gaussian kernel

Soft Margin Classifier

• Non-separable – introduce slack variables as before

  • Trade off with 1-norm of error vector

Solve Soft Margin SVM

• Let’s solve it!
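
As a guide to the board derivation, and assuming the same fixed-norm setup as the hard margin case with 1-norm slack variables xi_i and trade-off parameter C (notation chosen here), the problem and its dual would read roughly:

```latex
\begin{aligned}
&\min_{w,\,b,\,\gamma,\,\xi}\; -\gamma + C\sum_{i=1}^{\ell}\xi_i
 \quad\text{s.t.}\quad y_i\left(\langle w,\phi(x_i)\rangle+b\right)\ge\gamma-\xi_i,\;\; \xi_i\ge 0,\;\; \|w\|^2=1 \\[4pt]
&\max_{\alpha}\; W(\alpha)=-\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,\kappa(x_i,x_j)
 \quad\text{s.t.}\quad \sum_i y_i\alpha_i=0,\;\; \sum_i\alpha_i=1,\;\; 0\le\alpha_i\le C
\end{aligned}
```

As with the soft hypersphere, the slack variables only add the box constraint alpha_i <= C to the hard margin dual.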

Soft Margin Solution

Soft Margin Example

Support Vector Regression

• Similar idea to classification, except turned inside-out

• Epsilon-insensitive loss instead of hinge

• Ridge Regression: Squared-error loss

Support Vector Regression

• But, encourage sparseness

• Need inequalities

  • epsilon-insensitive loss

Epsilon-insensitive

• Defines band around function for 0-loss
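
A minimal sketch of the loss, assuming the standard definition max(0, |y - g(x)| - eps) for the linear version and its square for the quadratic version. The function name `eps_insensitive_loss` and the `squared` flag are illustrative.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps, squared=False):
    """Zero inside the band |y - g(x)| <= eps, growing with the excess outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    excess = np.maximum(0.0, residual - eps)
    return excess ** 2 if squared else excess

# Predictions within 0.1 of the target cost nothing
losses = eps_insensitive_loss([1.0, 1.0, 1.0], [1.05, 1.5, 0.0], eps=0.1)
```

Points inside the band contribute zero loss, which is what lets their dual variables vanish and produces a sparse solution, unlike the squared-error loss of ridge regression.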

SVR (linear epsilon)

• Opt. problem:

• Let’s solve again

SVR Dual and Solution

• Dual problem
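
Neither the primal nor the dual survived in the transcript. Assuming the standard linear epsilon-insensitive formulation with slack variables xi_i and xi-hat_i for the two sides of the band, and writing beta_i for the difference of the two multipliers of point i (all notation chosen here), a sketch is:

```latex
\begin{aligned}
&\min_{w,\,b,\,\xi,\,\hat\xi}\; \tfrac12\|w\|^2 + C\sum_{i=1}^{\ell}\left(\xi_i+\hat\xi_i\right) \\
&\quad\text{s.t.}\quad
 \left(\langle w,\phi(x_i)\rangle+b\right)-y_i \le \varepsilon+\xi_i,\qquad
 y_i-\left(\langle w,\phi(x_i)\rangle+b\right) \le \varepsilon+\hat\xi_i,\qquad
 \xi_i,\hat\xi_i\ge 0 \\[4pt]
&\max_{\beta}\; \sum_i y_i\beta_i - \varepsilon\sum_i|\beta_i| - \tfrac12\sum_{i,j}\beta_i\beta_j\,\kappa(x_i,x_j)
 \quad\text{s.t.}\quad \sum_i\beta_i=0,\;\; -C\le\beta_i\le C \\[4pt]
&f(x)=\sum_i \beta_i^*\,\kappa(x_i,x)+b^*
\end{aligned}
```

Points strictly inside the epsilon band have beta_i = 0, so only points on or outside the band appear in f.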

Online

• So far batch: processed all at once

• Many tasks require data processed one at a time from start

• Learner:

  • Makes prediction

  • Gets feedback (correct value)

  • Updates

• Conservative: only updates if non-zero loss

Simple On-line Alg.: Perceptron

• Threshold linear function

• At t+1 weight updated if error

• Dual update rule:

• If
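
The update rule itself is missing from the transcript. The sketch below assumes the standard dual (kernel) perceptron with no bias: predict with sign(sum_j alpha_j y_j k(x_j, x)) and, whenever y_i times that sum is not positive, increase alpha_i by 1. The function name `kernel_perceptron` and the `max_epochs` cap are illustrative choices.

```python
import numpy as np

def kernel_perceptron(K, y, max_epochs=100):
    """Dual perceptron: alpha[i] counts the mistakes made on example i."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha = np.zeros(len(y))
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(len(y)):
            # Conservative: update only when the current prediction is wrong
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1.0
                mistakes += 1
        if mistakes == 0:          # a full pass with no errors: converged
            break
    return alpha

# Linearly separable toy data, linear kernel
X = np.array([[3.0, 0.0], [1.0, 1.0], [-3.0, 0.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = kernel_perceptron(K, y)
predictions = np.sign(K @ (alpha * y))
```

On this toy set a single update on the first point is enough to classify all four points correctly, so alpha stays sparse.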

Algorithm Pseudocode

Novikoff Theorem

• Convergence bound for hard-margin case

• If training points contained in ball of radius R around origin

• w* hard margin SVM with no bias and geometric margin gamma

• Initial weight:

• Number of updates bounded by:

Proof

• From 2 inequalities:

• Putting these together we have:

• Which leads to bound:
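
The formulas for the theorem and its proof are missing from the transcript. The standard argument, assuming a zero initial weight vector, a unit-norm w*, and the update w_{t+1} = w_t + y_i phi(x_i) on each mistake (these conventions are assumed here), runs as follows:

```latex
\begin{aligned}
&w_0 = 0,\qquad \|w^*\| = 1,\qquad y_i\langle w^*,\phi(x_i)\rangle \ge \gamma,\qquad \|\phi(x_i)\|\le R \\[4pt]
&\langle w_{t+1}, w^*\rangle = \langle w_t, w^*\rangle + y_i\langle w^*,\phi(x_i)\rangle \;\ge\; \langle w_t, w^*\rangle + \gamma \\[4pt]
&\|w_{t+1}\|^2 = \|w_t\|^2 + 2\,y_i\langle w_t,\phi(x_i)\rangle + \|\phi(x_i)\|^2 \;\le\; \|w_t\|^2 + R^2 \\[4pt]
&t\gamma \;\le\; \langle w_t, w^*\rangle \;\le\; \|w_t\| \;\le\; \sqrt{t}\,R
 \quad\Longrightarrow\quad t \le \left(\frac{R}{\gamma}\right)^{2}
\end{aligned}
```

The second inequality uses the fact that an update happens only on a mistake, so y_i <w_t, phi(x_i)> <= 0.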

Kernel Adatron

• Simple modification to perceptron, models hard margin SVM with 0 threshold

• When alpha stops changing, either alpha is positive and the right term is 0, or the right term is negative

Kernel Adatron – Soft Margin

• 1-norm soft margin version

  • Add upper bound to the values of alpha (C)

• 2-norm soft margin version

  • Add constant to diagonal of kernel matrix

• SMO

  • To allow a variable threshold, updates must be made on pair of examples at once

  • Results in SMO

• Rate of convergence of both algs. is sensitive to order

  • Good heuristics, e.g. choose the points that most violate the conditions first
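
A minimal sketch of the kernel Adatron update, assuming the usual form alpha_i <- alpha_i + eta * (1 - y_i sum_j alpha_j y_j k(x_j, x_i)) followed by clipping at 0, with an optional upper bound C for the 1-norm soft margin version. The learning rate `eta`, the fixed sweep order and the function name `kernel_adatron` are assumptions made for illustration, not details from the slides.

```python
import numpy as np

def kernel_adatron(K, y, eta=0.1, C=None, n_epochs=200):
    """Coordinate-wise gradient ascent on the zero-threshold hard margin dual.
    Pass C to cap alpha for the 1-norm soft margin variant."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha = np.zeros(len(y))
    for _ in range(n_epochs):
        for i in range(len(y)):
            margin = y[i] * np.sum(alpha * y * K[:, i])
            alpha[i] += eta * (1.0 - margin)   # the "right term" is (1 - margin)
            alpha[i] = max(0.0, alpha[i])      # keep alpha non-negative
            if C is not None:
                alpha[i] = min(C, alpha[i])    # soft margin: upper bound on alpha
    return alpha

# Toy data, linear kernel; eta is kept below 2 / max(K_ii) for stability
X = np.array([[3.0, 0.0], [1.0, 1.0], [-3.0, 0.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T
alpha = kernel_adatron(K, y)
w = (alpha * y) @ X          # explicit weight vector, available for a linear kernel
```

On this data the points (1,1) and (-1,-1) end with positive alpha and margin 1, while the other two end with alpha = 0 and margin above 1, matching the two stopping cases on the Kernel Adatron slide. For the 2-norm soft margin version, one would instead add a constant to the diagonal of `K` before calling the function.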

On-line regression

• Also works for regression case

• Basic gradient ascent with additional constraints

Online SVR

Questions

• Questions, Comments?