TRANSCRIPT
COMP90051 Statistical Machine Learning, Semester 2, 2017
Lecturer: Trevor Cohn
24. Hidden Markov Models & message passing
Looking back…
• Representation of joint distributions
• Conditional/marginal independence
  * directed vs undirected
• Probabilistic inference
  * computing other distributions from the joint
• Statistical inference
  * learn parameters from (missing) data
• Today: putting these all into practice…
Hidden Markov Models
Model of choice for sequential data. A form of clustering (or dimensionality reduction) for discrete time series.
The HMM (and Kalman Filter)
• Sequential observed outputs from a hidden state
  * states take discrete values (i.e., clusters)
  * assumes discrete time steps 1, 2, …, T
• The Kalman filter is the same with continuous Gaussian r.v.’s
  * i.e., dimensionality reduction, but with temporal dynamics
HMM Applications
• NLP – part of speech tagging: given the words in a sentence, infer the hidden parts of speech
  “I love Machine Learning” → noun, verb, noun, noun
• Speech recognition: given a waveform, determine the phonemes
• Biological sequences: classification, search, alignment
• Computer vision: identify who’s walking in a video, tracking
Formulation
• Formulated as a directed PGM
  * therefore the joint is expressed as

  P(\mathbf{o}, \mathbf{q}) = P(q_1) P(o_1|q_1) \prod_{i=2}^{T} P(q_i|q_{i-1}) P(o_i|q_i)

  * bold variables are shorthand for vectors of T values
• Parameters (for a homogeneous HMM): transition probabilities A, emission probabilities B, initial state distribution Π
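The factorisation can be evaluated directly by walking along the chain. A minimal sketch, where the toy parameters `pi`, `A`, `B` (2 hidden states, 3 observation symbols) are hypothetical, not from the lecture:

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])               # pi[i]   = P(q_1 = i)
A = np.array([[0.7, 0.3],               # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],          # B[i, k] = P(o_t = k | q_t = i)
              [0.1, 0.3, 0.6]])

def joint_prob(q, o):
    """P(o, q) = P(q1) P(o1|q1) * prod_{i=2}^T P(qi|q_{i-1}) P(oi|qi)."""
    p = pi[q[0]] * B[q[0], o[0]]
    for i in range(1, len(q)):
        p *= A[q[i - 1], q[i]] * B[q[i], o[i]]
    return p

joint_prob([0, 1, 1], [2, 0, 1])   # ≈ 3.24e-4 for these toy parameters
```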
Independence
• Graph encodes independence between RVs
  * conditional independence: o_i ⟂ o_{\i} | q_i (where o_{\i} denotes all other o’s excluding i)
  * state q_i must encode all sequential context
• Markov blanket is local
  * for o_i the blanket is q_i
  * for q_i the blanket is {o_i, q_{i-1}, q_{i+1}}
Fundamental HMM Tasks
• Evaluation. Given an HMM μ and observation sequence o, determine the likelihood Pr(o|μ).
  → PGM task: probabilistic inference
• Decoding. Given an HMM μ and observation sequence o, determine the most probable hidden state sequence q.
  → PGM task: MAP point estimate
• Learning. Given an observation sequence o and a set of states, learn the parameters A, B, Π.
  → PGM task: statistical inference
“Evaluation” a.k.a. marginalisation
• Compute prob. of observations o by summing out q

  P(\mathbf{o} | \mu) = \sum_{\mathbf{q}} P(\mathbf{o}, \mathbf{q} | \mu)
    = \sum_{q_1} \sum_{q_2} \cdots \sum_{q_T} P(q_1) P(o_1|q_1) P(q_2|q_1) P(o_2|q_2) \cdots P(q_T|q_{T-1}) P(o_T|q_T)

• Make this more efficient by moving the sums inwards

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \sum_{q_2} P(q_2|q_1) P(o_2|q_2) \cdots \sum_{q_T} P(q_T|q_{T-1}) P(o_T|q_T)

• Deja vu? Maybe we could do variable elimination…
Elimination = Backward Algorithm
Eliminate q_T, …, eliminate q_2, then “eliminate” q_1 (working right to left along the chain q_1, q_2, …, q_T):

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \sum_{q_2} P(q_2|q_1) P(o_2|q_2) \cdots \underbrace{\sum_{q_T} P(q_T|q_{T-1}) P(o_T|q_T)}_{m_{T \to T-1}(q_{T-1})}

Each eliminated sum defines a message; the bracketed terms collapse down to m_{2 \to 1}(q_1), leaving

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \, m_{2 \to 1}(q_1)
Elimination = Forward Algorithm
Eliminate q_1, …, eliminate q_{T-1}, then “eliminate” q_T (working left to right along the chain q_1, q_2, …, q_T):

  P(\mathbf{o} | \mu) = \sum_{q_T} P(o_T|q_T) \sum_{q_{T-1}} P(q_T|q_{T-1}) P(o_{T-1}|q_{T-1}) \cdots \underbrace{\sum_{q_1} P(q_2|q_1) P(q_1) P(o_1|q_1)}_{m_{1 \to 2}(q_2)}

The bracketed terms collapse down to m_{T-1 \to T}(q_T), leaving

  P(\mathbf{o} | \mu) = \sum_{q_T} P(o_T|q_T) \, m_{T-1 \to T}(q_T)
Forward-Backward
• Both algorithms are just variable elimination using different orderings
  * q_T … q_1 → backward algorithm
  * q_1 … q_T → forward algorithm
  * both have time complexity O(T l^2), where l is the size of the label set
• Can use either to compute P(o)
  * but even better, can use the m values to compute marginals (and pairwise marginals over q_i, q_{i+1})

  P(q_i | \mathbf{o}) = \frac{1}{P(\mathbf{o})} \, \underbrace{m_{i-1 \to i}(q_i)}_{\text{forward}} \, P(o_i|q_i) \, \underbrace{m_{i+1 \to i}(q_i)}_{\text{backward}}
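The two sweeps can be sketched in a few lines of NumPy. The toy parameters `pi`, `A`, `B` are hypothetical; note that `alpha[t]` here folds the emission term P(o_t|q_t) into the forward message, so `alpha[t] * beta[t] / Z` matches the marginal formula above:

```python
import numpy as np

# Forward-backward as two message-passing sweeps.
# Hypothetical toy HMM: 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def forward_backward(o):
    T, S = len(o), len(pi)
    alpha = np.zeros((T, S))        # forward: alpha[t, j] = P(o_1..o_t, q_t = j)
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
    beta = np.ones((T, S))          # backward: beta[t, i] = P(o_{t+1}..o_T | q_t = i)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    Z = alpha[-1].sum()             # evidence P(o | mu), from either direction
    gamma = alpha * beta / Z        # marginals P(q_t | o)
    return alpha, beta, Z, gamma

o = [2, 0, 1]
alpha, beta, Z, gamma = forward_backward(o)
```

Both orderings give the same evidence: summing `alpha[0] * beta[0]` at the first position recovers the same Z as summing `alpha[-1]` at the last.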
Statistical Inference (Learning)
• Learn parameters μ = (A, B, π), given observation sequence o
• Called the “Baum-Welch” algorithm, which uses EM to approximate the MLE, argmax_μ P(o|μ):
  1. initialise μ_1, let i = 1
  2. compute expected marginal distributions P(q_t|o, μ_i) for all t; and P(q_{t-1}, q_t|o, μ_i) for t = 2..T
  3. fit model μ_{i+1} based on the expectations
  4. repeat from step 2, with i = i + 1
• Expectations computed using forward-backward
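One EM iteration of the above can be sketched as follows; the E-step expectations come from forward-backward, and the M-step re-fits (π, A, B) from normalised expected counts. The toy parameters are hypothetical:

```python
import numpy as np

# One Baum-Welch (EM) iteration. Hypothetical toy parameters.
pi0 = np.array([0.6, 0.4])                         # initial state distribution
A0 = np.array([[0.7, 0.3], [0.4, 0.6]])            # transitions
B0 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emissions

def baum_welch_step(o, pi, A, B):
    """E-step: expected (pairwise) marginals via forward-backward.
       M-step: re-fit (pi, A, B) from the expected counts."""
    T, S = len(o), len(pi)
    alpha = np.zeros((T, S))                       # forward messages
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
    beta = np.ones((T, S))                         # backward messages
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    Z = alpha[-1].sum()                            # likelihood P(o | mu_i)
    gamma = alpha * beta / Z                       # P(q_t | o, mu_i)
    xi = np.zeros((T - 1, S, S))                   # P(q_t, q_{t+1} | o, mu_i)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, o[t + 1]] * beta[t + 1])[None, :] / Z
    # M-step: normalised expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(o) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, Z

o = [2, 0, 1, 1, 0]
pi1, A1, B1, Z1 = baum_welch_step(o, pi0, A0, B0)
```

Repeating the step never decreases the likelihood P(o|μ), as EM guarantees.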
Message Passing
Sum-product algorithm for efficiently computing marginal distributions over trees. An extension of the variable elimination algorithm.
Inference as message passing
• Each m can be considered a message which summarises the effect of the rest of the graph on the current node’s marginal.
  * Inference = passing messages between all nodes

[Figure: chain q_1 – q_2 – q_3 – q_4, with messages m_{1→2} and m_{3→2} arriving at q_2]
Inference as message passing
• Messages are vector valued, i.e., a function of the target label
• Messages are defined recursively: left to right, or right to left

[Figure: chain q_1 – q_2 – q_3 – q_4, with messages m_{1→2}, m_{4→3} and m_{3→2}]
Sum-product algorithm
• Application of message passing to more general graphs
  * applies to chains, trees and poly-trees (directed PGMs with >1 parent)
  * ‘sum-product’ derives from:
    • product = the product of incoming messages
    • sum = summing out the effect of RV(s), a.k.a. elimination
• The algorithm supports other operations (semi-rings)
  * e.g., max-product, swapping sum for max
  * the Viterbi algorithm is the max-product variant of the forward algorithm for HMMs; it solves argmax_q P(q|o)
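Swapping sum for max (and keeping backpointers) turns the forward recursion into Viterbi decoding. A sketch under the same hypothetical toy parameters as before:

```python
import numpy as np

# Viterbi: the max-product variant of the forward algorithm.
# Hypothetical toy parameters.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def viterbi(o):
    """Return argmax_q P(q | o) by dynamic programming with backpointers."""
    T, S = len(o), len(pi)
    delta = np.zeros((T, S))           # best score of a path ending in state j at time t
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = pi * B[:, o[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: come from i, land in j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, o[t]]
    q = [int(delta[-1].argmax())]            # backtrace the best path
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t][q[-1]]))
    return q[::-1]
```

Since P(q|o) ∝ P(q, o), maximising the max-product scores over paths is equivalent to maximising the posterior.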
Application to Directed PGMs

[Figure: the same model over the RVs CTL, FG, GRL, FA, AS shown three ways: as a directed PGM, as an undirected “moralised” PGM, and as a factor graph]
Factor graphs
• FG = a bipartite graph, with factors (functions) and RVs
• Directed PGMs result in tree-structured FGs
• E.g., factors for the graph above include

  f_1(CTL) = P(CTL)
  f_2(CTL, GRL, FG) = P(GRL | CTL, FG)

[Figure: factor graph over CTL, FG, GRL, FA, AS with factors f_1, f_2, f_3, f_4, f_5]
Factor graph for the HMM

[Figure: chain of RVs q_1 – q_2 – q_3 – q_4 with unary factors P(q_1)P(o_1|q_1), P(o_2|q_2), P(o_3|q_3), P(o_4|q_4), and pairwise factors P(q_2|q_1), P(q_3|q_2), P(q_4|q_3)]

The effect of observed nodes is incorporated into the unary factors.
Sum-Product over Factor Graphs
• Two types of messages:
  * between factors and RVs, and between RVs and factors
  * each summarises a complete sub-graph
• E.g.,

  m_{f_2 \to GRL}(GRL) = \sum_{CTL} \sum_{FG} f_2(GRL, CTL, FG) \, m_{CTL \to f_2}(CTL) \, m_{FG \to f_2}(FG)

• Structure inference as “gather-and-distribute”
  * gather messages from the leaves of the tree towards the root
  * then propagate messages back down from the root to the leaves
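The factor-to-variable message above is just a weighted sum over the factor table. A sketch with a hypothetical table for f_2 and hypothetical incoming messages, all RVs binary:

```python
import numpy as np

# Factor-to-variable message m_{f2 -> GRL}. The factor table and incoming
# messages below are hypothetical, with all RVs binary.
# f2[g, c, f] = P(GRL = g | CTL = c, FG = f); normalised over axis 0.
f2 = np.array([[[0.9, 0.2], [0.4, 0.5]],
               [[0.1, 0.8], [0.6, 0.5]]])
m_CTL_to_f2 = np.array([0.7, 0.3])   # incoming variable-to-factor messages
m_FG_to_f2 = np.array([0.5, 0.5])

# m_{f2 -> GRL}(g) = sum_c sum_f f2[g, c, f] * m_{CTL->f2}(c) * m_{FG->f2}(f)
m_f2_to_GRL = np.einsum('gcf,c,f->g', f2, m_CTL_to_f2, m_FG_to_f2)
```

Because f_2 is a conditional distribution over GRL and the incoming messages here each sum to one, the outgoing message also sums to one; in general messages are only normalised for convenience.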
Undirected PGM analogue: CRFs
• Conditional Random Field: the same model applied to sequences
  * observed outputs are words, speech, amino acids etc.
  * states are tags: part-of-speech, phone, alignment…
  * shared inference algorithms, i.e., sum-product/max-product
• CRFs are discriminative, modelling P(q|o)
  * versus HMMs, which are generative, modelling P(q, o)
  * the undirected PGM is more general and expressive

[Figure: undirected chain q_1 – q_2 – q_3 – q_4 with observations o_1, o_2, o_3, o_4]
Summary
• HMMs as example PGMs
  * formulation as a PGM
  * independence assumptions
  * probabilistic inference using forward-backward
  * statistical inference using expectation maximisation
• Message passing: a general inference method for U-PGMs
  * sum-product & max-product
  * factor graphs