Lecture 5: Hidden Variables (cs.cmu.edunasmith/psnlp/lecture5.pdf)
TRANSCRIPT
Lecture 5: Hidden Variables
Random Variables in Decoding
[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w)]
Random Variables in Supervised Learning
[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w)]
Hidden Variables are Different
• We use the term “hidden variable” (or “latent variable”) to refer to something we never see.
– Not even in training.
– Sometimes we believe they are real.
– Sometimes we believe they only approximate reality.
Random Variables in Decoding
[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z)]
Random Variables in Supervised Learning
[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z)]
Latent Variables and Inference
• Both learning and decoding can still be understood as inference problems.
• Usually “mixed”:
– some variables are getting maximized
– some variables are getting summed
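A toy illustration of this mixing (all numbers hypothetical): to decode the output y we sum out the latent z before maximizing, which can give a different answer than jointly maximizing over both variables.

```python
# Toy joint distribution p(y, z) over a binary output y and latent z
# (hypothetical numbers, just to illustrate "mixed" inference).
p = {
    (0, 0): 0.30, (0, 1): 0.25,
    (1, 0): 0.10, (1, 1): 0.35,
}

# Mixed inference: sum over z, then max over y.
marginal = {y: sum(p[(y, z)] for z in (0, 1)) for y in (0, 1)}
best_y = max(marginal, key=marginal.get)   # y = 0 wins: 0.55 vs. 0.45

# Contrast: jointly maximizing over (y, z) picks (1, 1), i.e., y = 1.
joint_best_y = max(p, key=p.get)[0]
```

Here the two inference regimes disagree, which is why it matters which variables get summed and which get maximized.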
Word Alignments
• Since IBM Model 1, word alignments have been the prototypical hidden variable.
• Ultimately, in translation, we do not care what they are.
• Current approach: learn the word alignments unsupervised, then fix them to their most likely values.
– Then construct models for translation.
• Alignment on its own: unsupervised problem.
• MT on its own: supervised problem.
• MT + alignment: supervised problem with latent variables.
Alignments in Text-to-Text Problems
• Wang et al. (2007): “Jeopardy” model for answer ranking in QA.
– Align questions to answers.
– Similar model for paraphrase detection (Das and Smith, 2009).
Latent Annotations in Parsing
• Treebank categories (N, NN, NP, etc.) are too coarse-grained.
– Lexicalization (Collins, Eisner)
– Johnson’s (1998) parent annotation
– Klein and Manning (2003) parser
• Treat the true, fine-grained category as hidden, and infer it from data.
– Matsuzaki, Petrov, Dreyer, many others.
Richer Formalisms
• Cohn et al. (2009): tree substitution grammar.
– Derived tree is observed (output variable).
– Derivation tree (segmentation into elementary trees) is hidden.
• Zettlemoyer and Collins (2005 and later): infer CCG syntax from first-order logical expressions and sentences.
• Liang et al. (2011): infer semantic representation from text and database.
Topic Models
• Infer topics (or topic blends) in documents.
• Latent Dirichlet allocation (Blei et al., 2003) is a great example.
– Sometimes augmented with an output variable (Blei and McAuliffe, 2007): “supervised” LDA.
– Many extensions!
Unsupervised NLP
• Clustering (Brown et al., 1992; many more)
• POS tagging (Merialdo, 1994; many more)
• Parsing (Pereira and Schabes; Klein and Manning; …)
• Segmentation (words: Goldwater; discourse: Eisenstein)
• Morphology
• Lexical semantics
• Syntax–semantics correspondences
• Sentiment analysis
• Coreference resolution
• Word, phrase, and tree alignment
Supervised or Unsupervised?
• Depends on the task, not the model.
– I say “unsupervised” when the output variables are hidden at training time.
Random Variables in Unsupervised Learning
[Figure: graphical model with nodes inputs (X), outputs (Y), parameters (w), latent (Z)]
Probabilistic View
• The usual starting point for hidden variables is maximum likelihood.
– “Input” and “output” do not matter; only observed vs. latent.
Random Variables in Probabilistic Learning
[Figure: graphical model with nodes visible (V), latent (L), parameters (w)]
Empirical Risk View
• Log-loss:

  loss(v; h_w) = -\log p_w(v) = -\log \sum_{\ell} p_w(v, \ell)

  \min_{w \in \mathbb{R}^d} \frac{1}{N} \sum_i loss(v_i; h_w) + R(w)

– Equates to maximum marginal likelihood (or MAP if R(w) is a negated log prior).
– Unlike the loss functions in Lecture 4, this is not convex!
– EM seeks to solve this problem (but it is not the only way).
– Regularization decisions are orthogonal.
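The marginal log-loss can be computed directly for any model where the latent variable can be enumerated. A minimal sketch for a hypothetical 2-component mixture over a 3-symbol vocabulary (all numbers made up):

```python
import math

# Hypothetical toy model: p_w(v, l) = p(l) * p(v | l),
# where the latent l is the mixture component.
p_latent = [0.6, 0.4]
p_v_given_l = [
    [0.7, 0.2, 0.1],   # emission distribution for l = 0
    [0.1, 0.3, 0.6],   # emission distribution for l = 1
]

def marginal_log_loss(v):
    """loss(v; h_w) = -log sum_l p_w(v, l)."""
    marginal = sum(p_latent[l] * p_v_given_l[l][v] for l in range(2))
    return -math.log(marginal)

data = [0, 2, 1, 0]
# Empirical risk with R(w) = 0: the quantity being minimized above.
risk = sum(marginal_log_loss(v) for v in data) / len(data)
```

The sum inside the log over the latent variable is exactly what destroys convexity: each loss term is a negated log-sum-exp of concave pieces.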
Optimizing the Marginal Log-Loss
• EM as inference
• EM as optimization
• Direct optimization
Generic EM Algorithm
• Input: w^{(0)} and observations v_1, v_2, …, v_N
• Output: learned w
• t = 0
• Repeat until w^{(t)} ≈ w^{(t-1)}:
– E step: \forall i, \forall \ell: \; q_i^{(t)}(\ell) \leftarrow p_{w^{(t)}}(\ell \mid v_i)
– M step: w^{(t+1)} \leftarrow \arg\max_w \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(v_i, \ell)
– ++t
• Return w^{(t)}
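The loop above can be instantiated concretely for a 2-component mixture of categorical distributions (a hypothetical toy model; all numbers and names are made up for this sketch):

```python
import math

N_COMPONENTS, V = 2, 3   # two latent components, three-symbol vocabulary

def log_likelihood(obs, mix, emit):
    """Phi(w) = sum_i log sum_l p_w(v_i, l)."""
    return sum(math.log(sum(mix[l] * emit[l][v] for l in range(N_COMPONENTS)))
               for v in obs)

def em(obs, mix, emit, iters=30):
    for _ in range(iters):
        # E step: q_i(l) <- p_w(l | v_i), the posterior under the current w.
        q = []
        for v in obs:
            joint = [mix[l] * emit[l][v] for l in range(N_COMPONENTS)]
            z = sum(joint)
            q.append([j / z for j in joint])
        # M step: argmax_w sum_i sum_l q_i(l) log p_w(v_i, l); for
        # multinomials this is relative frequency on the soft counts.
        for l in range(N_COMPONENTS):
            soft = sum(qi[l] for qi in q)
            mix[l] = soft / len(obs)
            for v in range(V):
                emit[l][v] = sum(qi[l] for qi, vv in zip(q, obs) if vv == v) / soft
    return mix, emit
```

Each iteration cannot decrease the log-likelihood, which is the guarantee discussed later in the lecture.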
MAP Learning as a Graphical Model
[Figure: graphical model with nodes w, L, V and prior factor R; exp −R(w) = p(w), with factors p_w(L) and p_w(V | L)]
• Combined inference (max over w, sum over L) is very hard.
– If w were fixed, getting the posterior over L wouldn’t be so bad.
– If L were fixed, maximizing over w wouldn’t be so bad.
MAP Learning as a Graphical Model
[Figure: the same graphical model (w, L, V, prior factor R; exp −R(w) = p(w), factors p_w(L), p_w(V | L)) shown twice, once labeled “E step” and once labeled “M step”]
Baum-Welch (EM for HMMs) as an Example
• E step: forward-backward algorithm (on each example).
– This is exact marginal inference by variable elimination.
– The structure of the graphical model lets us do this by dynamic programming.
– The marginals are probabilities of transition and emission events at each position.
• M step: MLE based on soft event counts.
– Relative frequency estimation accomplishes MLE for multinomials.
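A sketch of that E step for a toy 2-state HMM (parameters are hypothetical; a full Baum-Welch implementation would also accumulate soft transition counts from the pairwise marginals):

```python
# Toy 2-state HMM over a binary emission alphabet (hypothetical numbers).
pi = [0.5, 0.5]                       # initial state probabilities
A = [[0.8, 0.2], [0.3, 0.7]]          # transition probabilities A[s][s']
B = [[0.9, 0.1], [0.2, 0.8]]          # emission probabilities B[s][x]

def forward_backward(obs):
    """Exact marginal inference by dynamic programming (variable elimination)."""
    n, S = len(obs), 2
    alpha = [[0.0] * S for _ in range(n)]   # forward probabilities
    beta = [[1.0] * S for _ in range(n)]    # backward probabilities
    for s in range(S):
        alpha[0][s] = pi[s] * B[s][obs[0]]
    for t in range(1, n):
        for s in range(S):
            alpha[t][s] = B[s][obs[t]] * sum(alpha[t-1][r] * A[r][s] for r in range(S))
    for t in range(n - 2, -1, -1):
        for s in range(S):
            beta[t][s] = sum(A[s][r] * B[r][obs[t+1]] * beta[t+1][r] for r in range(S))
    Z = sum(alpha[-1])                      # p(obs), the marginal likelihood
    # State posteriors p(Y_t = s | obs): the soft counts the M step consumes.
    gamma = [[alpha[t][s] * beta[t][s] / Z for s in range(S)] for t in range(n)]
    return gamma, Z
```

The dynamic program computes in O(nS^2) time what brute-force enumeration of all S^n state sequences would compute exponentially slowly.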
Baum-Welch as a Graphical Model
[Figure: HMM as a graphical model: latent states Y_1, Y_2, Y_3, …, Y_n (latent, L); emissions X_1, X_2, X_3, …, X_n (visible, V); shared “emit” and “transit” parameter nodes (w); prior factor R]
Active Trail!
[Figure: the same HMM graphical model; with the Y_t latent (L) and only the X_t visible (V), an active trail runs through the hidden states, connecting the “emit” and “transit” parameters (w)]
No Active Trail in All-Visible Case
[Figure: the same HMM graphical model with both the X_t and the Y_t visible (V); no active trail connects the “emit” and “transit” parameters (w)]
Why Latent Variables Make Learning Hard
• New intuition: parameters that were not interdependent in the fully visible case are now interdependent.
• It all goes back to active trails.
“Viterbi” Learning is “Okay”!
[Figure: the same MAP-learning graphical model: w, L, V, with prior factor R; exp −R(w) = p(w), factors p_w(L), p_w(V | L)]
• Approximate joint MAP inference over w and L (most probable explanation inference).
• Loss function:

  loss(v; h_w) = -\max_{\ell} \log p_w(v, \ell)
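Under that loss the training loop becomes “hard EM”: commit to the single most probable ℓ, then re-estimate. A minimal sketch for a 2-component categorical mixture (hypothetical toy model; the add-one smoothing is an assumption of this sketch, added so no probability collapses to zero):

```python
def hard_em(obs, mix, emit, iters=20):
    """Viterbi (hard) EM for a 2-component categorical mixture."""
    V = len(emit[0])
    for _ in range(iters):
        # "E" step: most probable explanation, argmax_l p_w(v_i, l).
        labels = [max(range(2), key=lambda l: mix[l] * emit[l][v]) for v in obs]
        # "M" step: MLE from the hard assignments (add-one smoothed,
        # a choice of this sketch, so every probability stays positive).
        for l in range(2):
            n_l = sum(1 for lab in labels if lab == l)
            mix[l] = (n_l + 1) / (len(obs) + 2)
            for v in range(V):
                n_lv = sum(1 for lab, vv in zip(labels, obs)
                           if lab == l and vv == v)
                emit[l][v] = (n_lv + 1) / (n_l + V)
    return mix, emit
```

Compared with soft EM, each example contributes a count of exactly 1 to its argmax component instead of fractional posterior mass.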
Conditional Models
• EM is usually closely associated with fully generative approaches.
• You can do the same things with log-linear models and with conditional models.
– Locally normalized models give flexibility without requiring global inference (Berg-Kirkpatrick et al., 2010).
– Hidden variable CRFs (Quattoni et al., 2007) are very powerful.
Learning Conditional Hidden Variable Models
[Figure: left, a hidden variable conditional model with nodes w, L, V_out, prior factor R, conditioned on V_in; right, a standard conditional model (e.g., CRF) with nodes w, V_out, R, V_in. In both, the distribution over V_in is not modeled.]
Optimization for Hidden Variables
• We’ve described hidden variable learning as inference problems.
• It is more practical, of course, to think about this as optimization.
• EM can be understood from an optimization framework as well.
EM and Likelihood
• The connection between the goal above and the EM procedure is not immediately clear.

  \Phi(w) = \sum_i \log \sum_{\ell} p_w(v_i, \ell)
Optimization View of EM
• A function of w and the collection of q_i:

  \sum_i \left[ -\sum_{\ell} q_i(\ell) \log q_i(\ell) + \sum_{\ell} q_i(\ell) \log p_w(\ell \mid v_i) + \log p_w(v_i) \right]

• Claim: EM performs coordinate ascent on this function.
Optimization View of EM
• The third term is our actual goal, Φ. It only depends on w (not the q_i).

  \sum_i \left[ -\sum_{\ell} q_i(\ell) \log q_i(\ell) + \sum_{\ell} q_i(\ell) \log p_w(\ell \mid v_i) + \underbrace{\log p_w(v_i)}_{\text{sums to } \Phi(w)} \right]
Optimization View of EM
• The latter two terms together are precisely what we maximize on the M step, given the current q_i:

  \sum_{\ell} q_i(\ell) \log p_w(\ell \mid v_i) + \log p_w(v_i) = \sum_{\ell} q_i(\ell) \log p_w(v_i, \ell)

– This is a concave problem and we solve it exactly.
Optimization View of EM
• Concern: is the M step improving the second term at the expense of Φ?
– No. (Same function as on the previous slides.)
The M Step
• The second part is also not getting any worse from iteration to iteration. Writing Φ in terms of the current q_i^{(t)}:

  \Phi(w) = \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(v_i, \ell) - \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_w(\ell \mid v_i)

The change in the second part from w^{(t)} to w^{(t+1)} is

  -\sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t+1)}}(\ell \mid v_i) + \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t)}}(\ell \mid v_i)
  = -\sum_i \sum_{\ell} q_i^{(t)}(\ell) \log p_{w^{(t+1)}}(\ell \mid v_i) + \sum_i \sum_{\ell} q_i^{(t)}(\ell) \log q_i^{(t)}(\ell)
  = \sum_i D\left( q_i^{(t)}(\cdot) \,\|\, p_{w^{(t+1)}}(\cdot \mid v_i) \right) \ge 0

(the second equality uses q_i^{(t)} = p_{w^{(t)}}(\cdot \mid v_i) from the E step).
The M Step
• Each M step, once the q_i are fixed, maximizes a bound on the log-likelihood Φ.
– For fixed q_i, this is a concave problem we can solve in closed form in many cases.
• What about the E step?
Optimization View of EM
• The E step considers the first two terms.
• It sets each q_i equal to the posterior under the current model:

  \sum_i \left[ \underbrace{-\sum_{\ell} q_i(\ell) \log q_i(\ell) + \sum_{\ell} q_i(\ell) \log p_w(\ell \mid v_i)}_{-D(q_i(\cdot) \,\|\, p_w(\cdot \mid v_i))} + \log p_w(v_i) \right]
Coordinate Ascent
• E step fixes w and solves for the q_i (the first two terms equal -D(q_i(\cdot) \,\|\, p_w(\cdot \mid v_i)), maximized by the posterior).
• M step fixes all q_i and solves for w (the latter two terms equal \sum_{\ell} q_i(\ell) \log p_w(v_i, \ell)).

  \sum_i \left[ -\sum_{\ell} q_i(\ell) \log q_i(\ell) + \sum_{\ell} q_i(\ell) \log p_w(\ell \mid v_i) + \log p_w(v_i) \right]
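A numeric check of this picture on a toy mixture (hypothetical numbers): the objective, call it F(q, w) for this sketch, equals Φ(w) minus a sum of KL divergences, so plugging in the posteriors makes the bound tight, and any other q gives a lower value.

```python
import math

# Hypothetical 2-component categorical mixture and three observations.
mix = [0.6, 0.4]
emit = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
data = [0, 2, 1]

def posterior(v):
    """p_w(l | v) for the toy mixture."""
    z = sum(mix[l] * emit[l][v] for l in range(2))
    return [mix[l] * emit[l][v] / z for l in range(2)]

def F(q):
    """sum_i [ -sum_l q_i(l) log q_i(l)
               + sum_l q_i(l) log p_w(l | v_i) + log p_w(v_i) ]"""
    total = 0.0
    for qi, v in zip(q, data):
        post = posterior(v)
        p_v = sum(mix[l] * emit[l][v] for l in range(2))
        total += sum(-qi[l] * math.log(qi[l]) + qi[l] * math.log(post[l])
                     for l in range(2))
        total += math.log(p_v)
    return total

def phi():
    """Phi(w) = sum_i log p_w(v_i)."""
    return sum(math.log(sum(mix[l] * emit[l][v] for l in range(2)))
               for v in data)
```

With q_i set to the posteriors, F equals Φ exactly (the KL terms vanish); a uniform q gives a strictly smaller value for these numbers.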
Things People Forget About EM
• Multiple random starts (or non-random starts); select using likelihood on development data.
• Variants may help avoid local optima…
Variants of EM
• “Online” variants, where we do an E step on one example or a mini-batch of examples, are still coordinate ascent (Neal and Hinton, 1998).
• Deterministic annealing: flatten out the q_i, making the function closer to concave.
• Stochastic variant: use randomized approximate inference for the E step.
• “Generalized” EM: improve w but don’t bother optimizing completely.
Direct Optimization
• An alternative to EM: apply stochastic gradient ascent or quasi-Newton methods directly to Φ.
• Typically done for MN (Markov network)-like models with features, e.g., latent-variable CRFs.
– The gradient is a difference of feature expectations.
– Requires marginal inference.
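A sketch of that gradient for a tiny, fully enumerable log-linear model with one latent variable (features and spaces are hypothetical): the gradient of the marginal log-likelihood is the “clamped” feature expectation (latent summed out, observation fixed) minus the “free” expectation (everything summed out).

```python
import math

VS, LS, D = (0, 1), (0, 1), 2      # tiny output/latent spaces, 2 features

def f(v, l):
    """Hypothetical feature vector for the pair (v, l)."""
    return [1.0 if v == l else 0.0, float(v + l)]

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

def phi(w, v):
    """Marginal log-likelihood of v under p_w(v, l) proportional to exp(w.f)."""
    num = math.log(sum(math.exp(dot(w, f(v, l))) for l in LS))
    den = math.log(sum(math.exp(dot(w, f(vv, l))) for vv in VS for l in LS))
    return num - den

def grad_phi(w, v):
    """d phi / d w = E_{p_w(l|v)}[f(v,l)] - E_{p_w(v',l)}[f(v',l)]."""
    scores = [math.exp(dot(w, f(v, l))) for l in LS]
    zc = sum(scores)
    clamped = [sum(scores[l] * f(v, l)[d] for l in LS) / zc for d in range(D)]
    all_s = {(vv, l): math.exp(dot(w, f(vv, l))) for vv in VS for l in LS}
    zf = sum(all_s.values())
    free = [sum(s * f(vv, l)[d] for (vv, l), s in all_s.items()) / zf
            for d in range(D)]
    return [c - fr for c, fr in zip(clamped, free)]
```

The two expectations are exactly the two marginal-inference problems the slide mentions; in real latent-variable CRFs each is computed by dynamic programming rather than enumeration.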
Summary
• EM: many ways to understand it.
– The guarantee: each round will improve the likelihood.
– That’s about as much as we can say.
• Sometimes it works.
– Smart initializers
– Lots of bias inherent in the model structure/assumptions