COMP90051 Statistical Machine Learning
Semester 2, 2017
Lecturer: Trevor Cohn

24. Hidden Markov Models & message passing


Looking back…

• Representation of joint distributions
• Conditional/marginal independence
  * directed vs undirected
• Probabilistic inference
  * computing other distributions from the joint
• Statistical inference
  * learning parameters from (missing) data
• Today: putting these all into practice…


Hidden Markov Models

The model of choice for sequential data. A form of clustering (or dimensionality reduction) for discrete time series.


The HMM (and Kalman Filter)

• Sequential observed outputs from a hidden state
  * states take discrete values (i.e., clusters)
  * assumes discrete time steps $1, 2, \ldots, T$
• The Kalman filter: the same, with continuous Gaussian r.v.'s
  * i.e., dimensionality reduction, but with temporal dynamics


HMM Applications

• NLP – part-of-speech tagging: given words in a sentence, infer the hidden parts of speech
  “I love Machine Learning” → noun, verb, noun, noun
• Speech recognition: given a waveform, determine the phonemes
• Biological sequences: classification, search, alignment
• Computer vision: identify who's walking in a video, tracking


Formulation

• Formulated as a directed PGM
  * therefore the joint is expressed as

$$P(\mathbf{o}, \mathbf{q}) = P(q_1)\,P(o_1 \mid q_1) \prod_{i=2}^{T} P(q_i \mid q_{i-1})\,P(o_i \mid q_i)$$

  * bold variables are shorthand for vectors of $T$ values
• Parameters (for a homogeneous HMM): the initial state distribution $\Pi$, the transition matrix $A$, and the emission matrix $B$
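To make the factorisation concrete, a minimal sketch in Python; the parameter names pi, A, B anticipate the conventions used later in the lecture, and the toy values are invented for illustration:

```python
import numpy as np

# Toy homogeneous HMM with 2 hidden states and 3 observation symbols:
#   pi[s]   = P(q_1 = s)                  (initial distribution)
#   A[s, t] = P(q_i = t | q_{i-1} = s)    (transition matrix)
#   B[s, o] = P(o_i = o | q_i = s)        (emission matrix)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

def joint_prob(q, o):
    """P(o, q) = P(q1) P(o1|q1) * prod_{i=2}^T P(qi|q_{i-1}) P(oi|qi)."""
    p = pi[q[0]] * B[q[0], o[0]]
    for i in range(1, len(q)):
        p *= A[q[i - 1], q[i]] * B[q[i], o[i]]
    return p

print(joint_prob(q=[0, 0, 1], o=[0, 1, 2]))
```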

Independence

• The graph encodes independence between RVs
  * conditional independence: $o_i \perp \mathbf{o}_{\setminus i} \mid q_i$, where $\mathbf{o}_{\setminus i}$ denotes all other $o$'s excluding $o_i$
  * state $q_i$ must encode all sequential context
• The Markov blanket is local
  * for $o_i$ the blanket is $\{q_i\}$
  * for $q_i$ the blanket is $\{o_i, q_{i-1}, q_{i+1}\}$


Fundamental HMM Tasks

• Evaluation. Given an HMM $\mu$ and an observation sequence $\mathbf{o}$, determine the likelihood $\Pr(\mathbf{o} \mid \mu)$. (PGM task: probabilistic inference)
• Decoding. Given an HMM $\mu$ and an observation sequence $\mathbf{o}$, determine the most probable hidden state sequence $\mathbf{q}$. (PGM task: MAP point estimate)
• Learning. Given an observation sequence $\mathbf{o}$ and a set of states, learn the parameters $A, B, \Pi$. (PGM task: statistical inference)


“Evaluation” a.k.a. marginalisation

• Compute the probability of the observations $\mathbf{o}$ by summing out $\mathbf{q}$:

$$P(\mathbf{o} \mid \mu) = \sum_{\mathbf{q}} P(\mathbf{o}, \mathbf{q} \mid \mu) = \sum_{q_1} \sum_{q_2} \cdots \sum_{q_T} P(q_1) P(o_1 \mid q_1) P(q_2 \mid q_1) P(o_2 \mid q_2) \cdots P(q_T \mid q_{T-1}) P(o_T \mid q_T)$$

• Make this more efficient by moving the sums inward:

$$P(\mathbf{o} \mid \mu) = \sum_{q_1} P(q_1) P(o_1 \mid q_1) \sum_{q_2} P(q_2 \mid q_1) P(o_2 \mid q_2) \cdots \sum_{q_T} P(q_T \mid q_{T-1}) P(o_T \mid q_T)$$

• Déjà vu? Maybe we could do variable elimination…
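For contrast, a brute-force sketch of the first identity over the toy parameters from earlier: it enumerates all $l^T$ state sequences, exactly the exponential cost that moving the sums inward avoids.

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def likelihood_bruteforce(o, n_states=2):
    """P(o) = sum over all state sequences q of P(o, q): O(l^T) time."""
    total = 0.0
    for q in itertools.product(range(n_states), repeat=len(o)):
        p = pi[q[0]] * B[q[0], o[0]]
        for i in range(1, len(o)):
            p *= A[q[i - 1], q[i]] * B[q[i], o[i]]
        total += p
    return total

print(likelihood_bruteforce([0, 1, 2]))
```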


Elimination = Backward Algorithm

Eliminate $q_T$ first, then $q_{T-1}$, …, then $q_2$; finally “eliminate” $q_1$:

$$P(\mathbf{o} \mid \mu) = \sum_{q_1} P(q_1) P(o_1 \mid q_1) \underbrace{\sum_{q_2} P(q_2 \mid q_1) P(o_2 \mid q_2) \cdots \underbrace{\sum_{q_T} P(q_T \mid q_{T-1}) P(o_T \mid q_T)}_{m_{T \to T-1}(q_{T-1})}}_{m_{2 \to 1}(q_1)}$$

$$P(\mathbf{o} \mid \mu) = \sum_{q_1} P(q_1) P(o_1 \mid q_1)\, m_{2 \to 1}(q_1)$$

[Figure: chain $q_1 - q_2 - \cdots - q_T$, eliminated right to left.]
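A minimal sketch of this backward pass in Python over the toy parameters from earlier; m[i] stores the message $m_{i+1 \to i}(q_i)$ as a vector over states:

```python
import numpy as np

# Toy parameters from the earlier example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def backward_messages(o):
    """m[i] is the message into the state at position i from the right:
    the slides' m_{i+1 -> i}(q_i), with positions 0-indexed here."""
    T, S = len(o), len(pi)
    m = np.ones((T, S))                 # base case: nothing to the right of q_T
    for i in range(T - 2, -1, -1):      # eliminate q_T first, then q_{T-1}, ...
        # m[i](s) = sum_t P(q_{i+1}=t | q_i=s) P(o_{i+1} | q_{i+1}=t) m[i+1](t)
        m[i] = A @ (B[:, o[i + 1]] * m[i + 1])
    return m

def likelihood_backward(o):
    m = backward_messages(o)
    # P(o) = sum_{q_1} P(q_1) P(o_1|q_1) m_{2 -> 1}(q_1)
    return float(np.sum(pi * B[:, o[0]] * m[0]))

print(likelihood_backward([0, 1, 2]))
```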


Elimination = Forward Algorithm

Eliminate $q_1$ first, then $q_2$, …, then $q_{T-1}$; finally “eliminate” $q_T$:

$$P(\mathbf{o} \mid \mu) = \sum_{q_T} P(o_T \mid q_T) \underbrace{\sum_{q_{T-1}} P(q_T \mid q_{T-1}) P(o_{T-1} \mid q_{T-1}) \cdots \underbrace{\sum_{q_1} P(q_2 \mid q_1) P(q_1) P(o_1 \mid q_1)}_{m_{1 \to 2}(q_2)}}_{m_{T-1 \to T}(q_T)}$$

$$P(\mathbf{o} \mid \mu) = \sum_{q_T} P(o_T \mid q_T)\, m_{T-1 \to T}(q_T)$$

[Figure: chain $q_1 - q_2 - \cdots - q_T$, eliminated left to right.]
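A matching sketch of the forward pass over the same toy parameters; note that, following the slides' message convention, the current emission $P(o_i \mid q_i)$ is not folded into the message:

```python
import numpy as np

# Toy parameters from the earlier example
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def forward_messages(o):
    """fwd[i] is the message into the state at position i from the left:
    the slides' m_{i-1 -> i}(q_i), with m_{0 -> 1}(q_1) = P(q_1)."""
    T, S = len(o), len(pi)
    fwd = np.zeros((T, S))
    fwd[0] = pi
    for i in range(1, T):
        # fwd[i](t) = sum_s P(q_i=t | q_{i-1}=s) P(o_{i-1} | q_{i-1}=s) fwd[i-1](s)
        fwd[i] = (fwd[i - 1] * B[:, o[i - 1]]) @ A
    return fwd

def likelihood_forward(o):
    fwd = forward_messages(o)
    # P(o) = sum_{q_T} P(o_T|q_T) m_{T-1 -> T}(q_T)
    return float(np.sum(B[:, o[-1]] * fwd[-1]))

# Agrees with the backward pass and the brute-force sum
print(likelihood_forward([0, 1, 2]))
```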


Forward-Backward

• Both algorithms are just variable elimination using different orderings
  * $q_T, \ldots, q_1$ → backward algorithm
  * $q_1, \ldots, q_T$ → forward algorithm
  * both have time complexity $O(Tl^2)$, where $l$ is the number of labels
• Can use either to compute $P(\mathbf{o})$
  * but even better, can use the $m$ values to compute marginals (and pairwise marginals over $q_i, q_{i+1}$):

$$P(q_i \mid \mathbf{o}) = \frac{1}{P(\mathbf{o})}\, \underbrace{m_{i-1 \to i}(q_i)}_{\text{forward}}\; P(o_i \mid q_i)\; \underbrace{m_{i+1 \to i}(q_i)}_{\text{backward}}$$
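Combining the two passes gives the state marginals; a self-contained sketch over the same toy parameters:

```python
import numpy as np

# Toy parameters from the earlier examples
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def posterior_marginals(o):
    """P(q_i | o) for every position i, via forward and backward messages."""
    T, S = len(o), len(pi)
    fwd = np.zeros((T, S)); fwd[0] = pi          # m_{i-1 -> i}(q_i)
    for i in range(1, T):
        fwd[i] = (fwd[i - 1] * B[:, o[i - 1]]) @ A
    bwd = np.ones((T, S))                        # m_{i+1 -> i}(q_i)
    for i in range(T - 2, -1, -1):
        bwd[i] = A @ (B[:, o[i + 1]] * bwd[i + 1])
    unnorm = fwd * B[:, np.asarray(o)].T * bwd   # forward * emission * backward
    # each row sums to P(o); normalising yields the marginals P(q_i | o)
    return unnorm / unnorm.sum(axis=1, keepdims=True)

print(posterior_marginals([0, 1, 2]))            # each row sums to 1
```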


Statistical Inference (Learning)

• Learn parameters $\mu = (A, B, \Pi)$, given an observation sequence $\mathbf{o}$
• The “Baum-Welch” algorithm uses EM to approximate the MLE, $\mathrm{argmax}_\mu P(\mathbf{o} \mid \mu)$:
  1. initialise $\mu^1$, let $i = 1$
  2. compute the expected marginal distributions $P(q_t \mid \mathbf{o}, \mu^i)$ for all $t$, and $P(q_{t-1}, q_t \mid \mathbf{o}, \mu^i)$ for $t = 2, \ldots, T$
  3. fit model $\mu^{i+1}$ based on these expectations
  4. repeat from step 2, with $i = i + 1$
• Expectations are computed using forward-backward (see the sketch below)
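A sketch of a single Baum-Welch iteration under the above conventions; it assumes integer observation symbols and omits smoothing, log-space arithmetic and a convergence test, all of which a real implementation would want:

```python
import numpy as np

def baum_welch_step(o, pi, A, B):
    """One EM iteration: E-step via forward-backward, M-step from expected counts."""
    T, S = len(o), len(pi)
    # forward and backward messages, as in the earlier sketches
    fwd = np.zeros((T, S)); fwd[0] = pi
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] * B[:, o[t - 1]]) @ A
    bwd = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (B[:, o[t + 1]] * bwd[t + 1])
    like = np.sum(B[:, o[-1]] * fwd[-1])                 # P(o | mu_i)
    gamma = fwd * B[:, np.asarray(o)].T * bwd / like     # P(q_t | o, mu_i)
    xi = np.zeros((T - 1, S, S))                         # P(q_t, q_{t+1} | o, mu_i)
    for t in range(T - 1):
        xi[t] = (fwd[t] * B[:, o[t]])[:, None] * A \
                * (B[:, o[t + 1]] * bwd[t + 1])[None, :] / like
    # M-step: fit mu_{i+1} to the expectations
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for t in range(T):
        B_new[:, o[t]] += gamma[t]
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, like

# Toy parameters from the earlier examples; EM never decreases the likelihood
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
for i in range(5):
    pi, A, B, like = baum_welch_step([0, 1, 2, 1, 0], pi, A, B)
    print(like)
```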


Message Passing

The sum-product algorithm for efficiently computing marginal distributions over trees. An extension of the variable elimination algorithm.

Inference as message passing

• Each $m$ can be considered as a message which summarises the effect of the rest of the graph on the current node's marginal
  * Inference = passing messages between all nodes

[Figure: chain $q_1 - q_2 - q_3 - q_4$ with messages $m_{1 \to 2}$ and $m_{3 \to 2}$.]

Inference as message passing

• Messages are vector valued, i.e., a function of the target's label
• Messages are defined recursively: left to right, or right to left

[Figure: chain $q_1 - q_2 - q_3 - q_4$ with messages $m_{1 \to 2}$, $m_{3 \to 2}$ and $m_{4 \to 3}$.]

Sum-product algorithm

• Application of message passing to more general graphs
  * applies to chains, trees and poly-trees (directed PGMs with >1 parent)
  * ‘sum-product’ derives from:
    • product = product of incoming messages
    • sum = summing out the effect of RV(s), aka elimination
• The algorithm supports other operations (semi-rings)
  * e.g., max-product, swapping sum for max
  * the Viterbi algorithm is the max-product variant of the forward algorithm for HMMs; it solves $\mathrm{argmax}_{\mathbf{q}} P(\mathbf{q} \mid \mathbf{o})$ (see the sketch after this list)

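To make the max-product swap concrete, a minimal Viterbi sketch over the toy HMM parameters used earlier: each sum in the forward recursion becomes a max, and back-pointers recover the best state sequence.

```python
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def viterbi(o):
    """Max-product variant of the forward algorithm: argmax_q P(q, o),
    which equals argmax_q P(q | o) since P(o) is constant in q."""
    T, S = len(o), len(pi)
    delta = np.zeros((T, S))            # best score of any path ending in each state
    back = np.zeros((T, S), dtype=int)  # back-pointers to the best predecessor
    delta[0] = pi * B[:, o[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A     # scores[s, u]: from state s to state u
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, o[t]]
    q = [int(delta[-1].argmax())]              # best final state ...
    for t in range(T - 1, 0, -1):              # ... then follow back-pointers
        q.append(int(back[t, q[-1]]))
    return q[::-1]

print(viterbi([0, 1, 2]))
```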


Application to Directed PGMs

[Figure: the same model over RVs CTL, FG, GRL, FA, AS shown three ways: as a directed PGM, as an undirected “moralised” PGM, and as a factor graph.]

Factor graphs

• A factor graph is a bipartite graph, with factors (functions) and RVs
• Directed PGMs result in a tree-structured factor graph
• E.g., for the network above, with factors $f_1, \ldots, f_5$:

$$f_1(CTL) = P(CTL)$$
$$f_2(CTL, GRL, FG) = P(GRL \mid CTL, FG)$$

[Figure: factor graph over CTL, FG, GRL, FA, AS with factor nodes $f_1, \ldots, f_5$.]
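One way to realise factors in code is as arrays indexed by the values of their argument RVs. A minimal sketch, assuming all RVs here are binary; the probability values are invented placeholders:

```python
import numpy as np

# f1(CTL) = P(CTL): a unary factor, one entry per value of CTL
f1 = np.array([0.9, 0.1])

# f2(CTL, GRL, FG) = P(GRL | CTL, FG), stored as f2[ctl, grl, fg];
# for each (ctl, fg) setting the entries over grl sum to one
f2 = np.array([[[0.8, 0.5],    # P(GRL=0 | CTL=0, FG=0), P(GRL=0 | CTL=0, FG=1)
                [0.2, 0.5]],   # P(GRL=1 | CTL=0, FG=0), P(GRL=1 | CTL=0, FG=1)
               [[0.3, 0.1],    # P(GRL=0 | CTL=1, FG=0), P(GRL=0 | CTL=1, FG=1)
                [0.7, 0.9]]])  # P(GRL=1 | CTL=1, FG=0), P(GRL=1 | CTL=1, FG=1)

# sanity check: the conditional distributions over GRL normalise
assert np.allclose(f2.sum(axis=1), 1.0)
```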

Factor graph for the HMM

[Figure: factor graph over $q_1, \ldots, q_4$ with unary factors $P(q_1)P(o_1 \mid q_1)$, $P(o_2 \mid q_2)$, $P(o_3 \mid q_3)$, $P(o_4 \mid q_4)$ and pairwise factors $P(q_2 \mid q_1)$, $P(q_3 \mid q_2)$, $P(q_4 \mid q_3)$.]

The effect of observed nodes is incorporated into the unary factors.

Sum-Product over Factor Graphs

• Two types of messages:
  * between factors and RVs, and between RVs and factors
  * each summarises a complete sub-graph
• E.g.,

$$m_{f_2 \to GRL}(GRL) = \sum_{CTL} \sum_{FG} f_2(GRL, CTL, FG)\, m_{CTL \to f_2}(CTL)\, m_{FG \to f_2}(FG)$$

• Structure inference as “gather-and-distribute”
  * gather messages from the leaves of the tree towards the root
  * then propagate messages back down from the root to the leaves
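A sketch of this particular message, using the table representation from the factor-graph example above; the incoming variable-to-factor messages are placeholder values for illustration:

```python
import numpy as np

# f2[ctl, grl, fg] = P(GRL | CTL, FG), as in the factor-graph example
f2 = np.array([[[0.8, 0.5], [0.2, 0.5]],
               [[0.3, 0.1], [0.7, 0.9]]])

# incoming variable-to-factor messages (placeholder values)
m_CTL_to_f2 = np.array([0.9, 0.1])
m_FG_to_f2 = np.array([0.4, 0.6])

# m_{f2 -> GRL}(grl) = sum_{ctl, fg} f2[ctl, grl, fg]
#                        * m_{CTL -> f2}(ctl) * m_{FG -> f2}(fg)
m_f2_to_GRL = np.einsum('cgf,c,f->g', f2, m_CTL_to_f2, m_FG_to_f2)
print(m_f2_to_GRL)
```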

Undirected PGM analogue: CRFs

• Conditional Random Field: the same model applied to sequences
  * observed outputs are words, speech, amino acids, etc.
  * states are tags: part-of-speech, phone, alignment, …
  * shared inference algorithms, i.e., sum-product / max-product
• CRFs are discriminative, modelling $P(\mathbf{q} \mid \mathbf{o})$
  * versus HMMs, which are generative, modelling $P(\mathbf{q}, \mathbf{o})$
  * undirected PGMs are more general and expressive

[Figure: undirected chain over $q_1, \ldots, q_4$ with observations $o_1, \ldots, o_4$.]

Summary

• HMMs as example PGMs
  * formulation as a PGM
  * independence assumptions
  * probabilistic inference using forward-backward
  * statistical inference using expectation maximisation
• Message passing: a general inference method for undirected PGMs
  * sum-product & max-product
  * factor graphs