Discriminative, Unsupervised, Convex Learning

Dale Schuurmans
Department of Computing Science
University of Alberta

MITACS Workshop, August 26, 2005



2

Current Research Group

PhD Tao Wang reinforcement learning

PhD Ali Ghodsi dimensionality reduction

PhD Dana Wilkinson action-based embedding

PhD Yuhong Guo ensemble learning

PhD Feng Jiao bioinformatics

PhD Jiayuan Huang transduction on graphs

PhD Qin Wang statistical natural language

PhD Adam Milstein robotics, particle filtering

PhD Dan Lizotte optimization, everything

PhD Linli Xu unsupervised SVMs

PDF Li Cheng computer vision

3

Current Research Group

PhD Tao Wang reinforcement learning

PhD Dana Wilkinson action-based embedding

PhD Feng Jiao bioinformatics

PhD Qin Wang statistical natural language

PhD Dan Lizotte optimization, everything

PDF Li Cheng computer vision

4

Today I will talk about: One Current Research Direction

Learning Sequence Classifiers (HMMs)

Discriminative Unsupervised Convex

EM?

5

Outline

Unsupervised SVMs

Discriminative, unsupervised, convex HMMs

Tao, Dana, Feng, Qin, Dan, Li

6

Unsupervised Support Vector Machines

Joint work with

Linli Xu

8

Main Idea

Unsupervised SVMs (and semi-supervised SVMs)

Harder computational problem than SVMs

Convex relaxation – Semidefinite program (Polynomial time)

9

Background: Two-class SVM

Supervised classification learning

Labeled data → linear discriminant $w^\top x + b = 0$

Classification rule: $y = \mathrm{sgn}(w^\top x + b)$

Some better than others?

10

Maximum Margin Linear Discriminant

Choose a linear discriminant to maximize

$$\min_{x_i, y_i} \; \mathrm{dist}(x_i, y_i, \text{Plane } w^\top x + b = 0)$$
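The classification rule and the margin being maximized can be made concrete with a small sketch. It assumes the distance is the signed distance $y_i (w^\top x_i + b) / \|w\|$; the function names and the toy data are hypothetical, not from the talk.

```python
import math

def classify(w, b, x):
    """Classification rule y = sgn(w.x + b); a score of exactly 0 is mapped to +1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin(w, b, points, labels):
    """Smallest signed distance y_i * (w.x_i + b) / ||w|| over the data set."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: two points per class along the first axis.
points = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [-1, -1, 1, 1]

wide = margin((1.0, 0.0), 0.0, points, labels)    # plane x1 = 0
narrow = margin((1.0, 0.0), 0.5, points, labels)  # plane x1 = -0.5
```

Both planes classify the toy data correctly, but the first leaves a larger minimum distance, which is the sense in which some discriminants are better than others.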

11

Unsupervised Learning

Given unlabeled data, how to infer classifications?

Organize objects into groups: clustering

12

Idea: Maximum Margin Clustering

Given unlabeled data, find maximum margin separating hyperplane

Clusters the data

Constraint (class balance): bound the difference in sizes between classes

13

Challenge

Find label assignment that results in a large margin

Hard

Convex relaxation – based on semidefinite programming

14

How to Derive Unsupervised SVM?

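Two-class case:

1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\gamma^{*-2} = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top, \, y y^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)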

15

How to Derive Unsupervised SVM?

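2. Think of $\gamma^{*-2}$ as a function of y

If given y, would then solve

$$\gamma^{*-2}(y) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top, \, y y^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$

Goal: Choose y to minimize inverse squared margin $\gamma^{*-2}(y)$

Problem: not a convex function of y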
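To see why choosing y is hard, the steps so far can be sketched by brute force: evaluate the box-constrained dual $\max_{0 \le \lambda \le 1} \lambda^\top e - \tfrac{1}{2} \lambda^\top (K \circ y y^\top) \lambda$ for every balanced labeling and keep the smallest value. This is only an illustration under stated assumptions: projected gradient ascent stands in for a real QP solver, the kernel is linear, the toy points are hypothetical, and enumeration is exponential in n.

```python
from itertools import product

def linear_kernel(points):
    """Gram matrix K[i][j] = <x_i, x_j>."""
    return [[sum(a * b for a, b in zip(p, q)) for q in points] for p in points]

def dual_value(K, y, iters=2000):
    """Approximate max over 0 <= lam <= 1 of lam.e - 0.5 * lam^T (K o y y^T) lam
    using projected gradient ascent (an assumed stand-in for a QP solver)."""
    n = len(y)
    G = [[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]
    # 1 / trace(G) is a safe step size because trace(G) >= largest eigenvalue of G.
    step = 1.0 / max(sum(G[i][i] for i in range(n)), 1e-12)
    lam = [0.0] * n
    for _ in range(iters):
        grad = [1.0 - sum(G[i][j] * lam[j] for j in range(n)) for i in range(n)]
        lam = [min(1.0, max(0.0, lam[i] + step * grad[i])) for i in range(n)]
    quad = sum(lam[i] * G[i][j] * lam[j] for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quad

# Hypothetical toy data: two well-separated groups, centred at the origin.
points = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
K = linear_kernel(points)

max_imbalance = 0  # class balance: |sum(y)| <= max_imbalance
balanced = [y for y in product((-1, 1), repeat=len(points))
            if abs(sum(y)) <= max_imbalance]
best = min(balanced, key=lambda y: dual_value(K, y))
```

On this data the minimizing labeling groups the two left points together and the two right points together. A labeling and its negation give the same value, so the result is determined only up to a global sign flip.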

16

How to Derive Unsupervised SVM?

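3. Re-express problem with indicators comparing y labels

If given y, would then solve

$$\gamma^{*-2}(y) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top, \, y y^\top \rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)

New variables: An equivalence relation matrix

$$M = y y^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$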

17

How to Derive Unsupervised SVM?

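3. Re-express problem with indicators comparing y labels

If given M, would then solve

$$\gamma^{*-2}(M) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \langle K \circ \lambda\lambda^\top, \, M \rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)

New variables: An equivalence relation matrix

$$M = y y^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$

Maximum of linear functions is convex

Note: convex function of M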
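The convexity claim rests on the objective being linear in M for each fixed $\lambda$; a pointwise maximum of such functions is convex. A minimal numerical check of that linearity, with a made-up kernel and dual point (all values hypothetical):

```python
def objective(lam, K, M):
    """lam.e - 0.5 * <K o lam lam^T, M> for a fixed dual vector lam."""
    n = len(lam)
    quad = sum(K[i][j] * lam[i] * lam[j] * M[i][j]
               for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quad

def outer(y):
    """M = y y^T for a label vector y in {-1, +1}^n."""
    return [[a * b for b in y] for a in y]

# Hypothetical linear kernel for the 1-D points x = (2, 1, -1).
K = [[4.0, 2.0, -2.0], [2.0, 1.0, -1.0], [-2.0, -1.0, 1.0]]
lam = [0.3, 0.7, 0.5]  # any fixed point in the box 0 <= lam <= 1

M1, M2 = outer([1, 1, -1]), outer([1, -1, -1])
t = 0.25
mix = [[t * a + (1 - t) * b for a, b in zip(r1, r2)] for r1, r2 in zip(M1, M2)]

# Linearity in M: the objective at a mixture equals the mixture of objectives.
lhs = objective(lam, K, mix)
rhs = t * objective(lam, K, M1) + (1 - t) * objective(lam, K, M2)

# M forgets the global sign of y: y and -y give the same matrix.
same_for_flipped = outer([1, 1, -1]) == outer([-1, -1, 1])
```

The last line also shows why M is a natural variable for clustering: it records only which pairs of points share a label, not which label they share.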

18

How to Derive Unsupervised SVM?

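4. Get constrained optimization problem

Solve for M

$$\min_{M} \; \gamma^{*-2}(M) \quad \text{subject to} \quad M \in \{-1, +1\}^{n \times n}, \quad M = y y^\top, \quad -\epsilon e \le M e \le \epsilon e$$

(with $0 \le \lambda \le 1$ in the inner maximization that defines $\gamma^{*-2}(M)$)

$M \in \{-1, +1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\mathrm{diag}(M) = e$

Not convex!

Class balance: $-\epsilon e \le M e \le \epsilon e$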

19

How to Derive Unsupervised SVM?

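4. Get constrained optimization problem

Solve for M

$$\min_{M} \; \gamma^{*-2}(M) \quad \text{subject to} \quad M \in \{-1, +1\}^{n \times n}, \quad M \succeq 0, \quad \mathrm{diag}(M) = e, \quad -\epsilon e \le M e \le \epsilon e$$

(with $0 \le \lambda \le 1$ in the inner maximization that defines $\gamma^{*-2}(M)$)

$M \in \{-1, +1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\mathrm{diag}(M) = e$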
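The link between equivalence relations and positive semidefiniteness can be checked on a tiny case. The sketch below does not test semidefiniteness in general; it evaluates the quadratic form $x^\top M x$ at one hand-picked witness vector x, which is enough to show a non-transitive ±1 matrix failing the condition. The matrices and the vector are hypothetical.

```python
def outer(y):
    """M = y y^T for a label vector y in {-1, +1}^n."""
    return [[a * b for b in y] for a in y]

def quad_form(M, x):
    """x^T M x."""
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

y = [1, 1, -1]
M_good = outer(y)            # a valid two-class equivalence relation
x = [1.0, -1.0, 1.0]         # hand-picked witness vector

# For M = y y^T the quadratic form is a square, (y . x)^2, so it is never negative.
good_value = quad_form(M_good, x)

# Points 1~2 and 2~3 are grouped together, yet 1 and 3 are separated:
# symmetric with unit diagonal, but not transitive.
M_bad = [[1, 1, -1],
         [1, 1, 1],
         [-1, 1, 1]]
bad_value = quad_form(M_bad, x)   # negative, so M_bad is not positive semidefinite

unit_diagonal = all(M_good[i][i] == 1 for i in range(len(y)))
```

The negative value for `M_bad` shows how the semidefinite constraint rules out inconsistent pairwise assignments without ever referring to y.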

21

How to Derive Unsupervised SVM?

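5. Relax indicator variables to obtain a convex optimization problem

Solve for M

$$\min_{M} \; \gamma^{*-2}(M) \quad \text{subject to} \quad M \succeq 0, \quad \mathrm{diag}(M) = e, \quad -\epsilon e \le M e \le \epsilon e$$

with the indicator constraint $M \in \{-1, +1\}^{n \times n}$ relaxed to $M \in [-1, +1]^{n \times n}$ (and $0 \le \lambda \le 1$ in the inner maximization)

Semidefinite program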

22

Multi-class Unsupervised SVM?

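1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\max_{\Lambda} \; n - \sum_{i,r} 1_{(y_i = r)} \lambda_{ir} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left[ 1_{(y_i = y_j)} - 2 \lambda_{i y_j} + \lambda_i^\top \lambda_j \right] \quad \text{subject to } \lambda_i \ge 0, \; \lambda_i^\top e = 1 \;\; \forall i$$

($\lambda_i$: row $i$ of $\Lambda$)

(Crammer & Singer 01)

Margin loss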

23

Multi-class Unsupervised SVM?

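2. Think of the margin loss $\omega$ as a function of y

If given y, would then solve

$$\omega(y) = \max_{\Lambda} \; n - \sum_{i,r} 1_{(y_i = r)} \lambda_{ir} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left[ 1_{(y_i = y_j)} - 2 \lambda_{i y_j} + \lambda_i^\top \lambda_j \right] \quad \text{subject to } \lambda_i \ge 0, \; \lambda_i^\top e = 1 \;\; \forall i$$

(Crammer & Singer 01)

Goal: Choose y to minimize margin loss

Problem: not a convex function of y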

24

Multi-class Unsupervised SVM?

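3. Re-express problem with indicators comparing y labels

If given y, would then solve

$$\omega(y) = \max_{\Lambda} \; n - \sum_{i,r} 1_{(y_i = r)} \lambda_{ir} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left[ 1_{(y_i = y_j)} - 2 \lambda_{i y_j} + \lambda_i^\top \lambda_j \right] \quad \text{subject to } \lambda_i \ge 0, \; \lambda_i^\top e = 1 \;\; \forall i$$

(Crammer & Singer 01)

Margin loss

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = D D^\top$$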

25

Multi-class Unsupervised SVM?

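3. Re-express problem with indicators comparing y labels

If given M and D, would then solve

$$\omega(M, D) = \max_{\Lambda} \; Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0, \; \Lambda e = e$$

$$\text{where } Q(\Lambda, M, D) = n - \langle D, \Lambda \rangle - \frac{1}{2\beta} \langle K, M \rangle + \frac{1}{\beta} \langle K D, \Lambda \rangle - \frac{1}{2\beta} \langle \Lambda \Lambda^\top, K \rangle$$

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = D D^\top$$

Margin loss: convex function of M & D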

26

Multi-class Unsupervised SVM?

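4. Get constrained optimization problem

Solve for M and D

$$\min_{M, D} \; \omega(M, D) \quad \text{subject to} \quad M = D D^\top, \quad \mathrm{diag}(M) = e, \quad M \in \{0, 1\}^{n \times n}, \quad D \in \{0, 1\}^{n \times k}$$

Class balance: $\left(\tfrac{1}{k} - \epsilon\right) n e \le M e \le \left(\tfrac{1}{k} + \epsilon\right) n e$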
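The multi-class indicator variables are easy to build from a labeling, which also makes the class balance constraint concrete: row i of M sums to the size of the class containing point i. The sketch below reads the constraint as $(1/k - \epsilon) n \le (Me)_i \le (1/k + \epsilon) n$; the labels, `eps`, and the helper names are hypothetical.

```python
def indicators(y, k):
    """D[i][r] = 1 if y_i == r else 0, with classes numbered 0..k-1."""
    return [[1 if yi == r else 0 for r in range(k)] for yi in y]

def gram(D):
    """M = D D^T, so M[i][j] = 1 exactly when points i and j share a class."""
    return [[sum(a * b for a, b in zip(di, dj)) for dj in D] for di in D]

y = [0, 2, 1, 0, 1, 2]   # hypothetical labeling with n = 6 points
k = 3
D = indicators(y, k)
M = gram(D)

# (M e)_i is the number of points in the same class as point i.
row_sums = [sum(row) for row in M]

# Class balance check with an assumed tolerance eps.
n, eps = len(y), 0.1
lower, upper = (1 / k - eps) * n, (1 / k + eps) * n
balanced = all(lower <= s <= upper for s in row_sums)
```

With two points in each of the three classes every row sum equals 2, which lies inside the assumed balance interval.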

28

Multi-class Unsupervised SVM?

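5. Relax indicator variables to obtain a convex optimization problem

Solve for M and D

$$\min_{M, D} \; \omega(M, D) \quad \text{subject to} \quad M \succeq D D^\top, \quad \mathrm{diag}(M) = e, \quad 0 \le M \le 1, \quad 0 \le D \le 1$$

Class balance: $\left(\tfrac{1}{k} - \epsilon\right) n e \le M e \le \left(\tfrac{1}{k} + \epsilon\right) n e$

Semidefinite program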

29

Experimental Results

[Figure panels: SemiDef, Spectral Clustering, Kmeans]

30

Experimental Results

31

Experimental Results

Percentage of misclassification errors

Digit dataset

32

Extension to Semi-Supervised Algorithm

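Matrix M: indices $1, \ldots, t$ are labeled (clamped); indices $t+1, \ldots, n$ are unlabeled

Labeled block: $M_{ij} = y_i y_j$

[A further constraint on $\sum_j M_{ij}$ for $i \in \{t+1, \ldots, n\}$ appears on the slide but is not recoverable from the extracted text]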

33

Experimental Results

Percentage of misclassification errors

Face dataset

34

Experimental Results

35

Discriminative, Unsupervised, Convex HMMs

Joint work with Linli Xu

With help from Li Cheng and Tao Wang

37

Hidden Markov Model

Joint probability model $P(x, y)$

Viterbi classifier $\arg\max_{y} P(y \mid x)$

[Figure: chain of "hidden" states $y_1, y_2, y_3$ with observations $x_1, x_2, x_3$]

Must coordinate local classifiers $y_i = f(x_i)$
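A toy example may help fix the two objects on this slide: the joint model $P(x, y)$ and the classifier $\arg\max_y P(y \mid x)$. The sketch below uses a hypothetical two-state HMM with made-up probabilities and finds the best state path by brute-force enumeration rather than the Viterbi dynamic program, which returns the same path more efficiently.

```python
from itertools import product

# Hypothetical 2-state, 2-symbol HMM; every number here is made up.
start = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"u": 0.9, "v": 0.1}, "B": {"u": 0.2, "v": 0.8}}

def joint(x, y):
    """P(x, y) = P(y_1) P(x_1 | y_1) * prod_t P(y_t | y_{t-1}) P(x_t | y_t)."""
    p = start[y[0]] * emit[y[0]][x[0]]
    for t in range(1, len(x)):
        p *= trans[y[t - 1]][y[t]] * emit[y[t]][x[t]]
    return p

def best_path_by_enumeration(x):
    """argmax_y P(y | x) = argmax_y P(x, y), searched over all state paths."""
    return max(product(start, repeat=len(x)), key=lambda y: joint(x, y))

x = ("u", "v", "v")
best = best_path_by_enumeration(x)

# Marginal P(x) and posterior P(best | x), also by enumeration.
marginal = sum(joint(x, y) for y in product(start, repeat=len(x)))
posterior = joint(x, best) / marginal
```

The best path changes state once, after the first observation, because the transition probabilities tie neighbouring labels together. This is the coordination between local classifiers that the slide refers to.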

38

HMM Training: Supervised

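Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Maximum likelihood: $\max \prod_{i=1}^{n} P(x_i, y_i) = \max \prod_{i=1}^{n} P(y_i \mid x_i) \, P(x_i)$, where the factor $P(x_i)$ models the input distribution

Conditional likelihood: $\max \prod_{i=1}^{n} P(y_i \mid x_i)$, discriminative (CRFs)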
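The relation between the two training objectives can be written out by taking logs of the joint likelihood. This is the standard chain-rule identity, stated here in the slide's notation rather than taken from the talk:

```latex
\log \prod_{i=1}^{n} P(x_i, y_i)
  = \underbrace{\sum_{i=1}^{n} \log P(y_i \mid x_i)}_{\text{conditional likelihood (discriminative)}}
  + \underbrace{\sum_{i=1}^{n} \log P(x_i)}_{\text{models the input distribution}}
```

Maximum likelihood therefore spends modelling effort on the second sum, while conditional likelihood keeps only the first.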

39

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

Now what?

EM!

Marginal likelihood: $\max \prod_{i=1}^{n} P(x_i)$

Exactly the part we don't care about

40

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

The problem with EM:

Not convex

Wrong objective

Too popular

Doesn't work

41

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

The dream:

Convex training

Discriminative training $P(y \mid x)$

When will someone invent unsupervised CRFs?

42

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

The question: How to learn $P(y \mid x)$ effectively without seeing any y's?

43

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

The question: How to learn $P(y \mid x)$ effectively without seeing any y's?

The answer: That's what we already did!

Unsupervised SVMs

44

HMM Training: Unsupervised

Given only $x_1, x_2, \ldots, x_n$

The plan:

                single y     sequence y
supervised      SVM          M3N
unsupervised    unsup SVM    ?

45

M3N: Max Margin Markov Nets

Relational SVMs

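Supervised training: Given $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, solve factored QP

[Figure: chain of labels $y_1, y_2, y_3$ over observations $x_1, x_2, x_3$, with local function $f(x, y_1 y_2)$]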

46

Unsupervised M3Ns

Strategy:

Start with supervised M3N QP

y-labels: re-express in local M, D equivalence relations

Impose class-balance

Relax non-convex constraints

Then solve a really big SDP

But still polynomial size

47

Unsupervised M3Ns

SDP

48

Some Initial Results

Synthetic HMM

Protein secondary structure prediction

51

Brief Research Background

Sequential PAC Learning

Linear Classifiers: Boosting, SVMs

Metric-Based Model Selection

Greedy Importance Sampling

Adversarial Optimization & Search

Large Markov Decision Processes