Discriminative Training and Machine Learning Approaches
Machine Learning Lab, Dept. of CSIE, NCKU
Chih-Pin Liao

TRANSCRIPT

Page 1:

Discriminative Training and Machine Learning Approaches
Machine Learning Lab, Dept. of CSIE, NCKU
Chih-Pin Liao

Page 2:

Discriminative Training

Page 3:

Our Concerns

- Feature extraction and HMM modeling should be jointly performed.
- A common objective function should be considered.
- To alleviate model confusion and improve recognition performance, we should estimate the HMM using a discriminative criterion built from statistical theory.
- Model parameters should be calculated rapidly, without applying a descent algorithm.

Page 4:

Minimum Classification Error (MCE)

- MCE is a popular discriminative training algorithm, developed for speech recognition and extended to other pattern recognition (PR) applications.
- Rather than maximizing the likelihood of the observed data, MCE aims to directly minimize classification errors.
- A gradient descent algorithm is used to estimate the HMM parameters.

Page 5:

MCE Training Procedure

Procedure for training discriminative models using observations X:

- Discriminant function:
  g_j(X, \Lambda) = \log P(X \mid \lambda_j)
- Anti-discriminant function:
  G_j(X, \Lambda) = \frac{1}{\eta} \log \Big[ \frac{1}{C-1} \sum_{c \neq j} \exp\big( \eta \log P(X \mid \lambda_c) \big) \Big]
- Misclassification measure:
  d_j(X, \Lambda) = -g_j(X, \Lambda) + G_j(X, \Lambda)

where \Lambda = \{\lambda_j\} denotes the set of class models.
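As an illustration, the three quantities above can be sketched as follows (a minimal example assuming the per-class log-likelihoods log P(X | λ_c) have already been computed; the class scores and the value of η are illustrative):

```python
import math

def misclassification_measure(log_liks, j, eta=2.0):
    """MCE measure d_j = -g_j + G_j for class j.

    g_j is the class-conditional log-likelihood of the target class;
    G_j is the eta-smoothed soft-max over the competing classes.
    """
    g_j = log_liks[j]
    competitors = [ll for c, ll in enumerate(log_liks) if c != j]
    # G_j = (1/eta) * log( (1/(C-1)) * sum_{c != j} exp(eta * log P(X|lambda_c)) )
    m = max(competitors)  # subtract the max for numerical stability
    G_j = m + (1.0 / eta) * math.log(
        sum(math.exp(eta * (ll - m)) for ll in competitors) / len(competitors)
    )
    return -g_j + G_j
```

A negative d_j means the target class scores above its competitors, i.e. a correct decision; a positive d_j signals a misclassification.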

Page 6:

Expected Loss

- The loss function is calculated by mapping the misclassification measure d_j(X, \Lambda) into the range between zero and one through a sigmoid function:

  l(X, \Lambda) = \frac{1}{1 + \exp(-\gamma d_j(X, \Lambda))}

- Minimize the expected loss, i.e. the expected classification error, to find the discriminative model:

  \hat{\Lambda} = \arg\min_{\Lambda} E_X[\, l(X, \Lambda) \,] = \arg\min_{\Lambda} E_X\Big[ \sum_{j=1}^{C} l(X, \Lambda)\, 1(X \in C_j) \Big]
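A sketch of the smoothed loss, assuming the misclassification measure d has already been computed as on the previous slide (the slope parameter gamma is illustrative):

```python
import math

def smoothed_loss(d, gamma=1.0):
    """Sigmoid-smoothed 0-1 loss: l = 1 / (1 + exp(-gamma * d))."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

def expected_loss(measures, gamma=1.0):
    """Empirical estimate of E_X[l(X, Lambda)] over a sample of measures."""
    return sum(smoothed_loss(d, gamma) for d in measures) / len(measures)
```

Strongly correct decisions (d much less than 0) contribute a loss near zero, strong errors a loss near one, so minimizing the average approximates minimizing the error count while staying differentiable.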

Page 7:

Hypothesis Test

Page 8:

Likelihood Ratio Test

- A new training criterion is derived from hypothesis testing theory.
- We test a null hypothesis against an alternative hypothesis.
- The optimal solution is obtained by a likelihood ratio test, according to the Neyman-Pearson lemma:

  \mathrm{LR} = \frac{P(X \mid H_0)}{P(X \mid H_1)}

- A higher likelihood ratio implies stronger confidence toward accepting the null hypothesis.
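For intuition, a toy likelihood ratio test with two univariate Gaussian hypotheses (the means and variances are illustrative):

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(x, h0, h1):
    """log LR = log P(x | H0) - log P(x | H1); larger values favor H0."""
    return gaussian_logpdf(x, *h0) - gaussian_logpdf(x, *h1)
```

A sample near the H0 mean yields a positive log-ratio (accept H0); a sample near the H1 mean yields a negative one.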

Page 9:

Hypotheses in HMM Training

- Null and alternative hypotheses:
  H_0: observations X are from the target HMM state j
  H_1: observations X are not from the target HMM state j
- We develop discriminative HMM parameters for the target state against the non-target states.
- The problem turns into verifying the goodness of the data alignment to the corresponding HMM states.

Page 10:

Maximum Confidence Hidden Markov Model

Page 11:

Maximum Confidence HMM

- The MCHMM is estimated by maximizing the log-likelihood ratio, or confidence measure:

  \Lambda_{\mathrm{MC}} = \arg\max_{\Lambda} \mathrm{LLR}(X \mid \Lambda) = \arg\max_{\Lambda} \big[ \log P(X \mid \Lambda) - \log P(X \mid \bar{\Lambda}) \big]

- The parameter set consists of the HMM parameters and the transformation matrix:

  \Lambda = \{\omega_{jk}, \mu_{jk}, \Sigma_{jk}, W\}

Page 12:

Hybrid Parameter Estimation

- The expectation-maximization (EM) algorithm is applied to tackle the missing-data problem in maximum confidence estimation.
- E-step:

  Q(\Lambda, \Lambda') = E[\, \mathrm{LLR}(X, S \mid \Lambda) \mid X, \Lambda' \,] = \sum_{S} \mathrm{LLR}(X, S \mid \Lambda)\, P(S \mid X, \Lambda')

  = \sum_{t=1}^{T} \sum_{j} P(s_t = j \mid X, \Lambda') \Big[ \log p(x_t \mid \lambda_j) - \log \frac{1}{C-1} \sum_{c \neq j} p(x_t \mid \lambda_c) \Big]

Page 13:

Expectation Function

Q(\Lambda, \Lambda') = Q_g(\{\omega_{jk}, \mu_{jk}, \Sigma_{jk}\}, \{W\})

= \sum_{t=1}^{T} \sum_{j=1}^{C} \sum_{k=1}^{K} \gamma_t(j,k) \Big[ \log \omega_{jk} - \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_{jk}| - \frac{1}{2}(W x_t - \mu_{jk})^T \Sigma_{jk}^{-1} (W x_t - \mu_{jk}) \Big]

- \frac{1}{C-1} \sum_{t=1}^{T} \sum_{j=1}^{C} \sum_{c \neq j} \sum_{k=1}^{K} \gamma_t(c,k) \Big[ \log \omega_{ck} - \frac{d}{2}\log 2\pi - \frac{1}{2}\log|\Sigma_{ck}| - \frac{1}{2}(W x_t - \mu_{ck})^T \Sigma_{ck}^{-1} (W x_t - \mu_{ck}) \Big]

Page 14:

MC Estimates of HMM Parameters

\hat{\omega}_{jk} = \frac{ \sum_{t=1}^{T} \gamma_t(j,k) - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{c \neq j} \gamma_t(c,k) }{ \sum_{t=1}^{T} \sum_{k=1}^{K} \gamma_t(j,k) - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{c \neq j} \gamma_t(c,k) }

\hat{\mu}_{jk} = \frac{ \sum_{t=1}^{T} \gamma_t(j,k)\, W x_t - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{c \neq j} \gamma_t(c,k)\, W x_t }{ \sum_{t=1}^{T} \gamma_t(j,k) - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{c \neq j} \gamma_t(c,k) }
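The reweighting in these estimates can be seen in a one-dimensional toy example (the frame values, posteriors, and class count below are made up, and W is taken as the identity):

```python
def mc_mean(frames, gamma_target, gamma_competing, C):
    """Maximum-confidence style mean for one state/mixture:
    target-posterior-weighted sum minus the (1/(C-1))-scaled
    competing-posterior-weighted sum, over matching count terms."""
    scale = 1.0 / (C - 1)
    num = sum(g * x for g, x in zip(gamma_target, frames))
    num -= scale * sum(g * x for g, x in zip(gamma_competing, frames))
    den = sum(gamma_target) - scale * sum(gamma_competing)
    return num / den
```

Compared with the plain maximum-likelihood mean (target-posterior weighting only), the estimate is pulled away from frames that competing states also claim, which is the discriminative effect of the subtraction.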

Page 15:

MC Estimates of HMM Parameters

\hat{\Sigma}_{jk} = \frac{ \sum_{t=1}^{T} \gamma_t(j,k) (W x_t - \mu_{jk})(W x_t - \mu_{jk})^T - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{c \neq j} \gamma_t(c,k) (W x_t - \mu_{ck})(W x_t - \mu_{ck})^T }{ \sum_{t=1}^{T} \gamma_t(j,k) - \frac{1}{C-1} \sum_{t=1}^{T} \sum_{c \neq j} \gamma_t(c,k) }

Page 16:

MC Estimate of Transformation Matrix

- The transformation matrix is estimated by gradient ascent on the expectation function:

  W^{(i+1)} = W^{(i)} + \epsilon\, \frac{\partial Q_g(W)}{\partial W} \Big|_{W = W^{(i)}}

- The gradient \partial Q_g(W)/\partial W accumulates, over all states j = 1, \ldots, C and mixture components k = 1, \ldots, K, terms involving \big((W^{(i)})^T\big)^{-1} and \Sigma_{jk}^{-1} W^{(i)}.

Page 17:

Training flow (flowchart):

1. Extract training features from face images.
2. Apply uniform segmentation and estimate initial HMM parameters.
3. Initialize W.
4. Extract features from the observations with the estimated W.
5. Perform Viterbi decoding.
6. Transform the HMM parameters with W.
7. Estimate the transformation matrix W with the GPD algorithm:
   W^{(t+1)} = W^{(t)} + \epsilon\, \partial Q(W \mid \Lambda) / \partial W
8. If W has not converged, return to step 7.
9. If the overall procedure has not converged, return to step 4; otherwise output the MC-based HMM parameters.

Page 18:

MC Classification Rule

Let Y denote an input test image. We apply the same criterion to identify the most likely category corresponding to Y:

\hat{c}_{\mathrm{MC}} = \arg\max_{c} \mathrm{LLR}(Y \mid \Lambda_c)
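The decision rule itself is a plain arg-max over per-category confidence scores (the category names and scores below are illustrative):

```python
def classify_mc(llr_scores):
    """Return the category c maximizing LLR(Y | Lambda_c).

    llr_scores maps each category name to its log-likelihood ratio.
    """
    return max(llr_scores, key=llr_scores.get)
```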

Page 19:

Summary

- A new maximum confidence HMM framework was proposed.
- The hypothesis testing principle was used to build the training criterion.
- Discriminative feature extraction and HMM modeling were performed under the same criterion.

Reference:
Jen-Tzung Chien and Chih-Pin Liao, "Maximum Confidence Hidden Markov Modeling for Face Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 4, pp. 606-616, April 2008.

Page 20:

Machine Learning Approaches

Page 21:

Introduction

- Conditional Random Fields (CRFs):
  - relax the usual conditional independence assumption of the likelihood model
  - enforce the homogeneity of the labeling variables conditioned on the observation
- Due to the weak assumptions of the CRF model and its discriminative nature, it:
  - allows arbitrary relationships among the data
  - may require fewer resources to train its parameters

Page 22:

CRF models have shown better performance than hidden Markov models (HMMs) and maximum entropy Markov models (MEMMs) on:

- language and text processing problems
- object recognition problems
- image and video segmentation
- tracking problems in video sequences

Page 23:

Generative & Discriminative Model

Page 24:

Two Classes of Models

- Generative model (HMM): models the distribution of the observations given the states, P(X \mid S).
- Direct model (MEMM and CRF): models the posterior probability P(S \mid X) directly.
- Decoding in both cases:

  \hat{S} = \arg\max_{S} p(S \mid X)

(The slide's diagrams show the graphical structures of the HMM, MEMM, and CRF over states s_{t-1}, s_t, s_{t+1} and observations x_{t-1}, x_t, x_{t+1}.)

Page 25:

Comparisons of the Two Kinds of Model

Generative model (HMM):
- Uses the Bayes rule approximation.
- Assumes that the observations are independent.
- Multiple overlapping features are not modeled.
- Decoding is carried out through the recursive Viterbi algorithm:

  \delta_t(s) = \max_{s' \in S} \delta_{t-1}(s')\, P(s \mid s')\, P(x_t \mid s)
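The Viterbi recursion above can be sketched as follows (a self-contained toy two-state HMM; all probabilities in the usage below are illustrative):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """delta_t(s) = max_{s'} delta_{t-1}(s') P(s|s') P(x_t|s),
    with backpointers to recover the best state path."""
    delta = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = []
    for x in obs[1:]:
        prev = delta[-1]
        cur, ptr = {}, {}
        for s in states:
            # best predecessor for state s at this time step
            best = max(states, key=lambda sp: prev[sp] * trans_p[sp][s])
            ptr[s] = best
            cur[s] = prev[best] * trans_p[best][s] * emit_p[s][x]
        delta.append(cur)
        back.append(ptr)
    # trace back the most likely path from the best final state
    last = max(states, key=lambda s: delta[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Replacing the max with a sum turns the same recursion into the forward algorithm, which computes P(X) instead of the best path.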

Page 26:

Direct model (MEMM and CRF):
- Direct modeling of the posterior probability.
- Dependencies among the observations are flexibly modeled.
- Decoding is carried out through the recursive Viterbi algorithm:

  \delta_t(s) = \max_{s' \in S} \delta_{t-1}(s')\, P(s \mid s', x_t)

Page 27:

Hidden Markov Model & Maximum Entropy Markov Model

Page 28:

HMM for Human Motion Recognition

An HMM is defined by:
- Transition probability: p(s_t \mid s_{t-1})
- Observation probability: p(x_t \mid s_t)

Page 29:

Maximum Entropy Markov Model

An MEMM is defined by p(s_t \mid s_{t-1}, x_t), which replaces the transition and observation probabilities of the HMM.

Page 30:

Maximum Entropy Criterion

- Definition of the feature functions:

  f_{\langle b, s \rangle}(c_t, s_t) = 1 if b(c_t) is true and s_t = s, and 0 otherwise

  where the context is c_t = \{x_t, s_{t-1}\}.

- Constrained optimization problem:

  \forall f_i: \; E_{f_i} = \tilde{E}_{f_i}

  where the model expectation is

  E_{f_i} = \sum_{c \in C,\, s \in V} \tilde{p}(c)\, p(s \mid c)\, f_i(c, s)

  and the empirical expectation is

  \tilde{E}_{f_i} = \frac{1}{N} \sum_{j=1}^{N} f_i(c_j, s_j)
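The two expectations can be computed directly on a toy sample (the contexts, labels, and binary feature below are illustrative):

```python
def empirical_expectation(data, f):
    """E~[f] = (1/N) * sum_j f(c_j, s_j) over observed (context, label) pairs."""
    return sum(f(c, s) for c, s in data) / len(data)

def model_expectation(data, labels, p, f):
    """E[f] = sum_{c,s} p~(c) p(s|c) f(c,s), with p~(c) taken from the sample:
    each occurrence of a context contributes weight 1/N."""
    total = 0.0
    for c, _ in data:
        total += sum(p(s, c) * f(c, s) for s in labels)
    return total / len(data)
```

When the two expectations disagree, the model's weights must be adjusted until the constraint E[f] = E~[f] holds, which is exactly what iterative scaling does.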

Page 31:

Solution of MEMM

- Lagrange multipliers are used for the constrained optimization:

  \Lambda(p, \lambda) = H(p(s \mid c)) + \sum_i \lambda_i \big( E_{f_i} - \tilde{E}_{f_i} \big)

  where \{\lambda_i\} are the model parameters and

  H(p(s \mid c)) = -\sum_{c \in C,\, s \in V} \tilde{p}(c)\, p(s \mid c) \log p(s \mid c)

- The solution is obtained as

  p(s \mid c) = \frac{1}{Z(c)} \exp\Big( \sum_i \lambda_i f_i(c, s) \Big), \quad Z(c) = \sum_{s' \in S} \exp\Big( \sum_i \lambda_i f_i(c, s') \Big)

Page 32:

GIS Algorithm

Optimizes the maximum mutual information (MMI) criterion.

- Step 1: Calculate the empirical expectation:

  \tilde{E}_{f_i} = \frac{1}{N} \sum_{j=1}^{N} f_i(c_j, s_j)

- Step 2: Start from an initial value \lambda_i^{(0)} = 1.
- Step 3: Calculate the model expectation:

  E_{f_i} = \frac{1}{N} \sum_{c \in C,\, s \in V} p(s \mid c)\, f_i(c, s)

- Step 4: Update the model parameters:

  \lambda_i^{(\mathrm{new})} = \lambda_i^{(\mathrm{current})} + \log\big( \tilde{E}_{f_i} / E_{f_i}^{(\mathrm{current})} \big)

- Repeat steps 3 and 4 until convergence.
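A compact sketch of the four steps for a toy conditional model (the data and feature set in the usage below are illustrative; the sketch assumes the features sum to one for every (c, s) pair, so no correction feature is needed):

```python
import math

def gis(data, labels, feats, iters=50):
    """Generalized Iterative Scaling for p(s|c) ∝ exp(sum_i w_i f_i(c, s)).

    data: list of observed (context, label) pairs.
    Assumes sum_i f_i(c, s) = 1 for all (c, s) and that every empirical
    feature expectation is strictly positive.
    """
    w = [0.0] * len(feats)  # Step 2: initial weights

    def p(s, c):
        z = sum(math.exp(sum(wi * f(c, t) for wi, f in zip(w, feats)))
                for t in labels)
        return math.exp(sum(wi * f(c, s) for wi, f in zip(w, feats))) / z

    n = len(data)
    emp = [sum(f(c, s) for c, s in data) / n for f in feats]  # Step 1
    for _ in range(iters):
        # Step 3: model expectations under the current weights
        mod = [sum(p(s, c) * f(c, s) for c, _ in data for s in labels) / n
               for f in feats]
        # Step 4: multiplicative update, written in the log domain
        w = [wi + math.log(e / m) for wi, e, m in zip(w, emp, mod)]
    return w, p
```

On a sample where label 'X' appears three times out of four, the fitted model assigns p('X' | c) ≈ 0.75, matching the empirical expectation as the constraint requires.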

Page 33:

Conditional Random Field

Page 34:

Conditional Random Field

Definition:

Let G = (V, E) be a graph such that S = (S_v)_{v \in V}. When conditioned on X, the variables S_v obey the Markov property

p(S_v \mid X, S_w, w \neq v) = p(S_v \mid X, S_w, w \sim v)

where w \sim v means that w and v are neighbors in G. Then (X, S) is a conditional random field.

Page 35:

CRF Model Parameters

- The undirected graphical structure can be used to factorize p(S \mid X) into a normalized product of potential functions:

  p(S \mid X) \propto \exp\Big( \sum_{e \in E,\, i} \lambda_i f_i(e, S, X) + \sum_{v \in V,\, j} \mu_j g_j(v, S, X) \Big)

- Consider the graph as a linear-chain structure.
- Model parameter set: \{\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots\}
- Feature function set: \{f_1, f_2, \ldots; g_1, g_2, \ldots\}

Page 36:

CRF Parameter Estimation

- We can rewrite and maximize the posterior probability

  p(S \mid X) = \frac{1}{Z(X)} \exp\Big( \sum_k \lambda_k F_k(S, X) \Big)

  where \{\lambda_k\} = \{\lambda_1, \lambda_2, \ldots; \mu_1, \mu_2, \ldots\} and \{F_k\} = \{f_1, f_2, \ldots; g_1, g_2, \ldots\}.

- The log posterior probability is given by

  L(\lambda) = \sum_k \lambda_k F_k(S^{(i)}, X^{(i)}) - \log Z(X^{(i)})
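For a linear chain, the normalization term Z(X) in the posterior above is tractable with a forward-style dynamic program. A sketch (toy_score is an illustrative local log-potential with the weights already folded in):

```python
import math
from itertools import product

def crf_log_z(xs, states, score):
    """log Z(X) for a linear-chain CRF by forward DP over exp-scores.
    score(prev_s, s, x) is the local log-potential; prev_s is None at t=0."""
    alpha = {s: math.exp(score(None, s, xs[0])) for s in states}
    for x in xs[1:]:
        alpha = {s: sum(alpha[sp] * math.exp(score(sp, s, x)) for sp in states)
                 for s in states}
    return math.log(sum(alpha.values()))

def toy_score(prev_s, s, x):
    """Illustrative log-potential: favors state 'A' and the A->A transition."""
    v = 0.5 if s == 'A' else 0.2
    if prev_s == 'A' and s == 'A':
        v += 0.3
    return v
```

Brute-force enumeration over all |S|^T state paths gives the same value, while the DP costs only O(|S|^2 T); working in the log domain throughout would add numerical robustness for long chains.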

Page 37:

Parameter Updating by GIS Algorithm

- Differentiating the log posterior probability with respect to the parameter \lambda_j:

  \frac{\partial L(\lambda)}{\partial \lambda_j} = \tilde{E}_{\tilde{p}(S, X)}[F_j(S, X)] - E_{p(S \mid X^{(k)})}[F_j(S, X^{(k)})]

- Setting this derivative to zero yields the same constraint as in the maximum entropy model.
- This estimation has no closed-form solution, so we can use the GIS algorithm.

Page 38:

CRF vs. MEMM

Differences:
- Objective function: the CRF maximizes the posterior probability p(S \mid X) with a Gibbs distribution; the MEMM maximizes entropy under constraints for p(s_t \mid s_{t-1}, x_t).
- Complexity of calculating the normalization term: over the full label sequence it is O(|S|^N); with dynamic programming, O(|S|^2 N); with N-best inference, O(k); with top-one inference, O(1).
- Inference in the model.

Similarities:
- Feature functions defined on state-observation and state-state pairs.
- Parameters are the weights of the feature functions.
- Both use a Gibbs distribution.

Page 39:

Summary and Future Works

- We constructed a complex CRF with cycles for better modeling of contextual dependency; a graphical model algorithm was applied.
- In the future, a variational inference algorithm will be developed to improve the calculation of the conditional probability.
- The posterior probability can then be calculated directly by an approximating approach.

Reference:
Chih-Pin Liao and Jen-Tzung Chien, "Graphical Modeling of Conditional Random Fields for Human Motion Recognition," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2008, March 31 - April 4, 2008, pp. 1969-1972.

Page 40:

Thanks for your attention and discussion.