TRANSCRIPT
COMP90051 Statistical Machine Learning, Semester 2, 2017
Lecturer: Trevor Cohn
24. Hidden Markov Models & message passing
Looking back…
• Representation of joint distributions
• Conditional/marginal independence
  * directed vs undirected
• Probabilistic inference
  * computing other distributions from the joint
• Statistical inference
  * learn parameters from (missing) data
• Today: putting these all into practice…
Hidden Markov Models
Model of choice for sequential data. A form of clustering (or dimensionality reduction) for discrete time series.
The HMM (and Kalman Filter)
• Sequential observed outputs from a hidden state
  * states take discrete values (i.e., clusters)
  * assumes discrete time steps 1, 2, …, T
• The Kalman filter is the same with continuous Gaussian r.v.’s
  * i.e., dimensionality reduction, but with temporal dynamics
HMM Applications
• NLP – part of speech tagging: given the words in a sentence, infer the hidden parts of speech
  “I love Machine Learning” → noun, verb, noun, noun
• Speech recognition: given a waveform, determine the phonemes
• Biological sequences: classification, search, alignment
• Computer vision: identify who’s walking in a video, tracking
Formulation
• Formulated as a directed PGM
  * therefore the joint is expressed as

  P(\mathbf{o}, \mathbf{q}) = P(q_1) P(o_1|q_1) \prod_{i=2}^{T} P(q_i|q_{i-1}) P(o_i|q_i)

  * bold variables are shorthand for vectors of T values
• Parameters (for a homogeneous HMM): transition probabilities A, emission probabilities B, initial state distribution Π
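The factorisation can be evaluated directly by walking along the chain. A minimal sketch, where the toy parameters `pi`, `A`, `B` (2 hidden states, 3 observation symbols) are hypothetical, not from the lecture:

```python
import numpy as np

# Hypothetical toy HMM: 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])               # pi[i]   = P(q_1 = i)
A = np.array([[0.7, 0.3],               # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],          # B[i, k] = P(o_t = k | q_t = i)
              [0.1, 0.3, 0.6]])

def joint_prob(q, o):
    """P(o, q) = P(q1) P(o1|q1) * prod_{i=2}^T P(qi|q_{i-1}) P(oi|qi)."""
    p = pi[q[0]] * B[q[0], o[0]]
    for i in range(1, len(q)):
        p *= A[q[i - 1], q[i]] * B[q[i], o[i]]
    return p

joint_prob([0, 1, 1], [2, 0, 1])   # ≈ 3.24e-4 for these toy parameters
```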
Independence
• Graph encodes independence between RVs
  * conditional independence: o_i ⟂ o_{\i} | q_i (where o_{\i} denotes all other o’s excluding i)
  * state q_i must encode all sequential context
• Markov blanket is local
  * for o_i the blanket is q_i
  * for q_i the blanket is {o_i, q_{i-1}, q_{i+1}}
Fundamental HMM Tasks
• Evaluation. Given an HMM μ and observation sequence o, determine the likelihood Pr(o|μ).
  → PGM task: probabilistic inference
• Decoding. Given an HMM μ and observation sequence o, determine the most probable hidden state sequence q.
  → PGM task: MAP point estimate
• Learning. Given an observation sequence o and a set of states, learn the parameters A, B, Π.
  → PGM task: statistical inference
“Evaluation” a.k.a. marginalisation
• Compute prob. of observations o by summing out q

  P(\mathbf{o} | \mu) = \sum_{\mathbf{q}} P(\mathbf{o}, \mathbf{q} | \mu)
    = \sum_{q_1} \sum_{q_2} \cdots \sum_{q_T} P(q_1) P(o_1|q_1) P(q_2|q_1) P(o_2|q_2) \cdots P(q_T|q_{T-1}) P(o_T|q_T)

• Make this more efficient by moving the sums inwards

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \sum_{q_2} P(q_2|q_1) P(o_2|q_2) \cdots \sum_{q_T} P(q_T|q_{T-1}) P(o_T|q_T)

• Deja vu? Maybe we could do variable elimination…
Elimination = Backward Algorithm
Eliminate q_T, …, eliminate q_2, then “eliminate” q_1 (working right to left along the chain q_1, q_2, …, q_T):

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \sum_{q_2} P(q_2|q_1) P(o_2|q_2) \cdots \underbrace{\sum_{q_T} P(q_T|q_{T-1}) P(o_T|q_T)}_{m_{T \to T-1}(q_{T-1})}

Each eliminated sum defines a message; the bracketed terms collapse down to m_{2 \to 1}(q_1), leaving

  P(\mathbf{o} | \mu) = \sum_{q_1} P(q_1) P(o_1|q_1) \, m_{2 \to 1}(q_1)
Elimination = Forward Algorithm
Eliminate q_1, …, eliminate q_{T-1}, then “eliminate” q_T (working left to right along the chain q_1, q_2, …, q_T):

  P(\mathbf{o} | \mu) = \sum_{q_T} P(o_T|q_T) \sum_{q_{T-1}} P(q_T|q_{T-1}) P(o_{T-1}|q_{T-1}) \cdots \underbrace{\sum_{q_1} P(q_2|q_1) P(q_1) P(o_1|q_1)}_{m_{1 \to 2}(q_2)}

The bracketed terms collapse down to m_{T-1 \to T}(q_T), leaving

  P(\mathbf{o} | \mu) = \sum_{q_T} P(o_T|q_T) \, m_{T-1 \to T}(q_T)
Forward-Backward
• Both algorithms are just variable elimination using different orderings
  * q_T … q_1 → backward algorithm
  * q_1 … q_T → forward algorithm
  * both have time complexity O(T l^2), where l is the size of the label set
• Can use either to compute P(o)
  * but even better, can use the m values to compute marginals (and pairwise marginals over q_i, q_{i+1})

  P(q_i | \mathbf{o}) = \frac{1}{P(\mathbf{o})} \, \underbrace{m_{i-1 \to i}(q_i)}_{\text{forward}} \, P(o_i|q_i) \, \underbrace{m_{i+1 \to i}(q_i)}_{\text{backward}}
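The two sweeps can be sketched in a few lines of NumPy. The toy parameters `pi`, `A`, `B` are hypothetical; note that `alpha[t]` here folds the emission term P(o_t|q_t) into the forward message, so `alpha[t] * beta[t] / Z` matches the marginal formula above:

```python
import numpy as np

# Forward-backward as two message-passing sweeps.
# Hypothetical toy HMM: 2 hidden states, 3 observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def forward_backward(o):
    T, S = len(o), len(pi)
    alpha = np.zeros((T, S))        # forward: alpha[t, j] = P(o_1..o_t, q_t = j)
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
    beta = np.ones((T, S))          # backward: beta[t, i] = P(o_{t+1}..o_T | q_t = i)
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    Z = alpha[-1].sum()             # evidence P(o | mu), from either direction
    gamma = alpha * beta / Z        # marginals P(q_t | o)
    return alpha, beta, Z, gamma

o = [2, 0, 1]
alpha, beta, Z, gamma = forward_backward(o)
```

Both orderings give the same evidence: summing `alpha[0] * beta[0]` at the first position recovers the same Z as summing `alpha[-1]` at the last.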
Statistical Inference (Learning)
• Learn parameters μ = (A, B, π), given observation sequence o
• Called the “Baum-Welch” algorithm, which uses EM to approximate the MLE, argmax_μ P(o|μ):
  1. initialise μ_1, let i = 1
  2. compute expected marginal distributions P(q_t|o, μ_i) for all t; and P(q_{t-1}, q_t|o, μ_i) for t = 2..T
  3. fit model μ_{i+1} based on the expectations
  4. repeat from step 2, with i = i + 1
• Expectations computed using forward-backward
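One EM iteration of the above can be sketched as follows; the E-step expectations come from forward-backward, and the M-step re-fits (π, A, B) from normalised expected counts. The toy parameters are hypothetical:

```python
import numpy as np

# One Baum-Welch (EM) iteration. Hypothetical toy parameters.
pi0 = np.array([0.6, 0.4])                         # initial state distribution
A0 = np.array([[0.7, 0.3], [0.4, 0.6]])            # transitions
B0 = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emissions

def baum_welch_step(o, pi, A, B):
    """E-step: expected (pairwise) marginals via forward-backward.
       M-step: re-fit (pi, A, B) from the expected counts."""
    T, S = len(o), len(pi)
    alpha = np.zeros((T, S))                       # forward messages
    alpha[0] = pi * B[:, o[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, o[t]]
    beta = np.ones((T, S))                         # backward messages
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, o[t + 1]] * beta[t + 1])
    Z = alpha[-1].sum()                            # likelihood P(o | mu_i)
    gamma = alpha * beta / Z                       # P(q_t | o, mu_i)
    xi = np.zeros((T - 1, S, S))                   # P(q_t, q_{t+1} | o, mu_i)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, o[t + 1]] * beta[t + 1])[None, :] / Z
    # M-step: normalised expected counts
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[np.array(o) == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, Z

o = [2, 0, 1, 1, 0]
pi1, A1, B1, Z1 = baum_welch_step(o, pi0, A0, B0)
```

Repeating the step never decreases the likelihood P(o|μ), as EM guarantees.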
Message Passing
Sum-product algorithm for efficiently computing marginal distributions over trees. An extension of the variable elimination algorithm.
Inference as message passing
• Each m can be considered a message which summarises the effect of the rest of the graph on the current node’s marginal.
  * Inference = passing messages between all nodes

[Figure: chain q_1 – q_2 – q_3 – q_4, with messages m_{1→2} and m_{3→2} arriving at q_2]
Inference as message passing
• Messages are vector valued, i.e., a function of the target label
• Messages are defined recursively: left to right, or right to left

[Figure: chain q_1 – q_2 – q_3 – q_4, with messages m_{1→2}, m_{4→3} and m_{3→2}]
Sum-product algorithm
• Application of message passing to more general graphs
  * applies to chains, trees and poly-trees (directed PGMs with >1 parent)
  * ‘sum-product’ derives from:
    • product = the product of incoming messages
    • sum = summing out the effect of RV(s), a.k.a. elimination
• The algorithm supports other operations (semi-rings)
  * e.g., max-product, swapping sum for max
  * the Viterbi algorithm is the max-product variant of the forward algorithm for HMMs; it solves argmax_q P(q|o)
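Swapping sum for max (and keeping backpointers) turns the forward recursion into Viterbi decoding. A sketch under the same hypothetical toy parameters as before:

```python
import numpy as np

# Viterbi: the max-product variant of the forward algorithm.
# Hypothetical toy parameters.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])

def viterbi(o):
    """Return argmax_q P(q | o) by dynamic programming with backpointers."""
    T, S = len(o), len(pi)
    delta = np.zeros((T, S))           # best score of a path ending in state j at time t
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = pi * B[:, o[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j]: come from i, land in j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, o[t]]
    q = [int(delta[-1].argmax())]            # backtrace the best path
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t][q[-1]]))
    return q[::-1]
```

Since P(q|o) ∝ P(q, o), maximising the max-product scores over paths is equivalent to maximising the posterior.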
Application to Directed PGMs

[Figure: the same model over the RVs CTL, FG, GRL, FA, AS shown three ways: as a directed PGM, as an undirected “moralised” PGM, and as a factor graph]
Factor graphs
• FG = a bipartite graph, with factors (functions) and RVs
• Directed PGMs result in tree-structured FGs
• E.g., factors for the graph above include

  f_1(CTL) = P(CTL)
  f_2(CTL, GRL, FG) = P(GRL | CTL, FG)

[Figure: factor graph over CTL, FG, GRL, FA, AS with factors f_1, f_2, f_3, f_4, f_5]
Factor graph for the HMM

[Figure: chain of RVs q_1 – q_2 – q_3 – q_4 with unary factors P(q_1)P(o_1|q_1), P(o_2|q_2), P(o_3|q_3), P(o_4|q_4), and pairwise factors P(q_2|q_1), P(q_3|q_2), P(q_4|q_3)]

The effect of observed nodes is incorporated into the unary factors.
Sum-Product over Factor Graphs
• Two types of messages:
  * between factors and RVs, and between RVs and factors
  * each summarises a complete sub-graph
• E.g.,

  m_{f_2 \to GRL}(GRL) = \sum_{CTL} \sum_{FG} f_2(GRL, CTL, FG) \, m_{CTL \to f_2}(CTL) \, m_{FG \to f_2}(FG)

• Structure inference as “gather-and-distribute”
  * gather messages from the leaves of the tree towards the root
  * then propagate messages back down from the root to the leaves
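The factor-to-variable message above is just a weighted sum over the factor table. A sketch with a hypothetical table for f_2 and hypothetical incoming messages, all RVs binary:

```python
import numpy as np

# Factor-to-variable message m_{f2 -> GRL}. The factor table and incoming
# messages below are hypothetical, with all RVs binary.
# f2[g, c, f] = P(GRL = g | CTL = c, FG = f); normalised over axis 0.
f2 = np.array([[[0.9, 0.2], [0.4, 0.5]],
               [[0.1, 0.8], [0.6, 0.5]]])
m_CTL_to_f2 = np.array([0.7, 0.3])   # incoming variable-to-factor messages
m_FG_to_f2 = np.array([0.5, 0.5])

# m_{f2 -> GRL}(g) = sum_c sum_f f2[g, c, f] * m_{CTL->f2}(c) * m_{FG->f2}(f)
m_f2_to_GRL = np.einsum('gcf,c,f->g', f2, m_CTL_to_f2, m_FG_to_f2)
```

Because f_2 is a conditional distribution over GRL and the incoming messages here each sum to one, the outgoing message also sums to one; in general messages are only normalised for convenience.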
Undirected PGM analogue: CRFs
• Conditional Random Field: the same model applied to sequences
  * observed outputs are words, speech, amino acids etc.
  * states are tags: part-of-speech, phone, alignment…
  * shared inference algorithms, i.e., sum-product/max-product
• CRFs are discriminative, modelling P(q|o)
  * versus HMMs, which are generative, modelling P(q, o)
  * the undirected PGM is more general and expressive

[Figure: undirected chain q_1 – q_2 – q_3 – q_4 with observations o_1, o_2, o_3, o_4]
Summary
• HMMs as example PGMs
  * formulation as a PGM
  * independence assumptions
  * probabilistic inference using forward-backward
  * statistical inference using expectation maximisation
• Message passing: a general inference method for U-PGMs
  * sum-product & max-product
  * factor graphs