CS839: Probabilistic Graphical Models
Lecture 8: Learning Fully Observed Undirected Graphical Models
Theo Rekatsinas

Post on 04-Jun-2020


Recall: Undirected Graphical Models

• Pairwise (non-causal) relationships
• We can write down the model and score specific configurations of the RVs, but not generate samples
• Contingency constraints on node configurations

Recall: MLE for BNs

• If we assume the parameters for each CPD are globally independent, and all nodes are fully observed, then the log-likelihood function decomposes into a sum of local terms, one per node
• MLE-based parameter estimation of a GM reduces to local estimation of each GLIM.
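The decomposition can be written out as a short sketch. The symbols here are assumptions rather than the lecture's notation: $x_{n,i}$ is the value of node $i$ in sample $n$, $\pi_i$ denotes the parents of node $i$, and $\theta_i$ the parameters of its CPD.

```latex
\ell(\theta; D) = \log p(D \mid \theta)
  = \sum_{n=1}^{N} \sum_{i} \log p\big(x_{n,i} \mid x_{n,\pi_i}, \theta_i\big)
  = \sum_{i} \Big( \sum_{n=1}^{N} \log p\big(x_{n,i} \mid x_{n,\pi_i}, \theta_i\big) \Big)
```

Each inner sum involves only $\theta_i$, so each CPD can be fit on its own.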

MLE for Undirected Graphical Models

• For directed models, the log-likelihood decomposes into a sum of terms, one per family (node plus parents).
• For undirected models, the log-likelihood does not decompose, because the normalization constant Z is a function of all parameters.
• In general, we need to do inference to learn parameters for undirected models, even in the fully observed case.
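The coupling through Z can be seen on a toy example. The sketch below is illustrative only: the binary chain X1 - X2 - X3, the potential tables `psi12` and `psi23`, and the helper `partition_function` are all made up for this purpose. Changing one entry of one potential changes Z, and therefore the probability of configurations that never touch that entry.

```python
import itertools
import numpy as np

def partition_function(psi12, psi23):
    """Brute-force Z for a binary chain X1 - X2 - X3 with two pairwise potentials."""
    return sum(
        psi12[x1, x2] * psi23[x2, x3]
        for x1, x2, x3 in itertools.product(range(2), repeat=3)
    )

# Hypothetical potential tables (any positive numbers would do).
psi12 = np.array([[2.0, 1.0], [1.0, 3.0]])
psi23 = np.array([[1.0, 4.0], [2.0, 1.0]])
z_before = partition_function(psi12, psi23)

# Change a single entry of psi12 and nothing else.
psi12_changed = psi12.copy()
psi12_changed[0, 0] = 5.0
z_after = partition_function(psi12_changed, psi23)

# Configuration (x1, x2, x3) = (1, 1, 0) does not use the changed entry,
# yet its probability changes because Z changed.
p_before = psi12[1, 1] * psi23[1, 0] / z_before
p_after = psi12_changed[1, 1] * psi23[1, 0] / z_after
print(z_before, z_after, p_before, p_after)
```

Because every parameter enters Z, the log-likelihood cannot be split into independent per-clique problems the way it splits per node in a Bayesian network.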

Log-likelihood for Undirected Graphical Models with tabular clique potentials

• Sufficient statistics: for an MRF (V, E) the number of times that a configuration x is observed in a dataset D can be represented as follows.
• The log-likelihood is:
• In terms of the counts, the log-likelihood is:
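One standard way to write these quantities is sketched below. The notation is an assumption: $\delta$ is an indicator, $m(x)$ the total count of configuration $x$, $m(x_c)$ the count of clique configuration $x_c$, $\psi_c$ the tabular clique potentials, and $N$ the number of samples.

```latex
m(x) = \sum_{n=1}^{N} \delta(x, x_n), \qquad
m(x_c) = \sum_{x_{V \setminus c}} m(x)

\log p(D \mid \theta) = \sum_{n=1}^{N} \log p(x_n \mid \theta)
  = \sum_{x} m(x) \log p(x \mid \theta)

\ell = \sum_{x} m(x) \log \Big( \frac{1}{Z} \prod_{c} \psi_c(x_c) \Big)
  = \sum_{c} \sum_{x_c} m(x_c) \log \psi_c(x_c) - N \log Z
```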

Taking the derivative

• Log-likelihood
• First term:
• Second term:
• Derivative of log-likelihood
• Hence we need that: $p_{\text{MLE}}(x_c) = \frac{m(x_c)}{N} = \tilde{p}(x_c)$
• This says that:
  • For the maximum likelihood estimates of the parameters, for each clique, the model marginals must be equal to the observed marginals (empirical counts)
• This is only a condition that the parameters should satisfy!
  • It does not tell us how to get the maximum likelihood estimates.
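A sketch of the derivative computation, assuming tabular potentials $\psi_c$, clique counts $m(x_c)$, and $N$ samples, with the log-likelihood written as $\ell = \sum_c \sum_{x_c} m(x_c) \log \psi_c(x_c) - N \log Z$:

```latex
\frac{\partial}{\partial \psi_c(x_c)} \sum_{c'} \sum_{x_{c'}} m(x_{c'}) \log \psi_{c'}(x_{c'})
  = \frac{m(x_c)}{\psi_c(x_c)}

\frac{\partial \log Z}{\partial \psi_c(x_c)}
  = \frac{1}{Z} \sum_{x'} \delta(x'_c, x_c) \prod_{d \neq c} \psi_d(x'_d)
  = \frac{1}{\psi_c(x_c)} \sum_{x'} \delta(x'_c, x_c) \, \frac{1}{Z} \prod_{d} \psi_d(x'_d)
  = \frac{p(x_c)}{\psi_c(x_c)}

\frac{\partial \ell}{\partial \psi_c(x_c)}
  = \frac{m(x_c)}{\psi_c(x_c)} - N \, \frac{p(x_c)}{\psi_c(x_c)} = 0
  \;\Longrightarrow\; p(x_c) = \frac{m(x_c)}{N}
```

The model marginal $p(x_c)$ on the right depends on all potentials through $Z$, which is why the condition does not directly give the estimates.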

MLE for Undirected Graphical Models

• Case 1: The model is decomposable (triangulated graph) and all the clique potentials are defined on maximal cliques.
  • The MLEs of the clique potentials are equal to the empirical marginals (or conditionals) of the corresponding cliques.
  • Solve MLE by inspection
• Decomposable models
  • The following are equivalent: G is decomposable, G is triangulated, G has a junction tree
• Ex.: Chain X1 – X2 – X3
  $p_{\text{MLE}}(X_1, X_2, X_3) = \frac{\tilde{p}(X_1, X_2)\,\tilde{p}(X_2, X_3)}{\tilde{p}(X_2)}$
  $p_{\text{MLE}}(X_1, X_2) = \sum_{X_3} p_{\text{MLE}}(X_1, X_2, X_3) = \tilde{p}(X_1 \mid X_2) \sum_{X_3} \tilde{p}(X_2, X_3) = \tilde{p}(X_1, X_2)$
  $p_{\text{MLE}}(X_2, X_3) = \tilde{p}(X_2, X_3)$
• To compute the clique potentials we just use the empirical marginals (or conditionals), i.e., the separator must be divided into one of its neighbors. Then Z = 1
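The chain example can be checked numerically. This is a minimal sketch on a synthetic dataset of three binary variables. The data, the variable names, and the choice to divide the separator into the (X2, X3) potential are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical fully observed dataset: 500 samples of (X1, X2, X3), all binary.
data = rng.integers(0, 2, size=(500, 3))

# Empirical joint table p~(x1, x2, x3) from counts.
counts = np.zeros((2, 2, 2))
for x1, x2, x3 in data:
    counts[x1, x2, x3] += 1
p_emp = counts / counts.sum()

p12 = p_emp.sum(axis=2)        # p~(x1, x2)
p23 = p_emp.sum(axis=0)        # p~(x2, x3)
p2 = p_emp.sum(axis=(0, 2))    # p~(x2), the separator marginal

# Potentials read off by inspection; the separator is divided into one neighbor.
psi12 = p12                    # psi_12(x1, x2) = p~(x1, x2)
psi23 = p23 / p2[:, None]      # psi_23(x2, x3) = p~(x3 | x2)

# Product of potentials, with no normalization applied.
p_mle = psi12[:, :, None] * psi23[None, :, :]
z = p_mle.sum()

print("Z =", z)
print("clique (1,2) matches:", np.allclose(p_mle.sum(axis=2), p12))
print("clique (2,3) matches:", np.allclose(p_mle.sum(axis=0), p23))
```

The product of potentials already sums to one, and both clique marginals of the fitted model agree with the empirical ones, so no iteration is needed.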

MLE for Undirected Graphical Models

• Case 2: The model is non-decomposable and/or the potentials are defined on non-maximal cliques. We cannot equate the MLE of the clique potentials to empirical marginals (or conditionals)
  • Iterative Proportional Fitting (IPF)
  • Generalized Iterative Scaling (GIS)

Iterative Proportional Fitting (IPF)

• From the log-likelihood: $\frac{m(x_c)}{N\,\psi_c(x_c)} = \frac{p(x_c)}{\psi_c(x_c)}$
• Let's rewrite in a different way: $\frac{\tilde{p}(x_c)}{\psi_c(x_c)} = \frac{p(x_c)}{\psi_c(x_c)}$
• The clique potentials implicitly appear in the model marginal: $p(x_c) = f(\psi_c(x_c))$
• Let's forget a closed-form solution and focus on a fixed-point iteration method:
  $\frac{\tilde{p}(x_c)}{\psi_c^{(t+1)}(x_c)} = \frac{p^{(t)}(x_c)}{\psi_c^{(t)}(x_c)}$, i.e., $\psi_c^{(t+1)}(x_c) = \psi_c^{(t)}(x_c)\,\frac{\tilde{p}(x_c)}{p^{(t)}(x_c)}$
• Need to run inference for $p^{(t)}(x_c)$
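A minimal sketch of the IPF update on a small non-decomposable model, under several assumptions: a 4-cycle of binary variables with one tabular potential per edge, a randomly drawn strictly positive empirical distribution `p_emp`, and brute-force enumeration standing in for the inference step. The helper names `joint` and `edge_marginal` are made up for this example.

```python
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # 4-cycle: not triangulated
states = np.array(list(itertools.product(range(2), repeat=4)))  # all 16 configurations

def joint(psi):
    """Normalized joint over all configurations, by brute-force enumeration."""
    unnorm = np.ones(len(states))
    for (i, j) in edges:
        unnorm *= psi[(i, j)][states[:, i], states[:, j]]
    return unnorm / unnorm.sum()

def edge_marginal(p, i, j):
    """2x2 marginal table over (x_i, x_j) of a distribution p over `states`."""
    m = np.zeros((2, 2))
    np.add.at(m, (states[:, i], states[:, j]), p)
    return m

# Hypothetical empirical distribution p~; strictly positive, so no division by zero.
rng = np.random.default_rng(0)
p_emp = rng.dirichlet(np.ones(len(states)))

psi = {e: np.ones((2, 2)) for e in edges}
for sweep in range(200):
    for (i, j) in edges:
        p_model = edge_marginal(joint(psi), i, j)   # inference for p^(t)(x_c)
        psi[(i, j)] = psi[(i, j)] * edge_marginal(p_emp, i, j) / p_model

# Largest mismatch between model and empirical edge marginals after fitting.
max_gap = max(
    np.abs(edge_marginal(joint(psi), i, j) - edge_marginal(p_emp, i, j)).max()
    for (i, j) in edges
)
print("largest marginal mismatch:", max_gap)
```

Each update makes one edge marginal match exactly and may disturb the others, so the sweep is repeated. On larger models the enumeration in `joint` would be replaced by a proper inference routine.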

Properties of IPF Updates

• Set of fixed-point equations: $\psi_c^{(t+1)}(x_c) = \psi_c^{(t)}(x_c)\,\frac{\tilde{p}(x_c)}{p^{(t)}(x_c)}$
• We can show that it is also a coordinate ascent algorithm (coordinates = parameters of clique potentials)
• At each step, it will increase the log-likelihood, and it will converge to a global maximum.
• Maximizing the log-likelihood is equivalent to minimizing the KL divergence (cross entropy)
  • The max-entropy principle for parameterization offers a dual perspective to the MLE.
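The link between the log-likelihood and the KL divergence can be spelled out in one line. This sketch assumes $\tilde{p}(x) = m(x)/N$ is the empirical distribution and $H(\tilde{p})$ its entropy.

```latex
\mathrm{KL}\big(\tilde{p} \,\|\, p_\theta\big)
  = \sum_{x} \tilde{p}(x) \log \frac{\tilde{p}(x)}{p(x \mid \theta)}
  = -H(\tilde{p}) - \sum_{x} \tilde{p}(x) \log p(x \mid \theta)
  = -H(\tilde{p}) - \frac{1}{N}\, \ell(\theta; D)
```

Since $H(\tilde{p})$ does not depend on $\theta$, increasing $\ell(\theta; D)$ is the same as decreasing the KL divergence from the empirical distribution to the model.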

MLE for undirected graphical models

• What have we seen so far?
  • Decomposable graphs
    • Clique potentials correspond to marginals or conditionals
  • Clique potentials that correspond to full tables
    • Iterative Proportional Fitting: $\psi_c^{(t+1)}(x_c) = \psi_c^{(t)}(x_c)\,\frac{\tilde{p}(x_c)}{p^{(t)}(x_c)}$
• What about models that are parameterized more compactly?

Feature-parameterized clique potentials

• So far we saw the most general form of an undirected graphical model: cliques are parameterized by general tabular potential functions
• For large cliques these potentials are exponentially costly for inference. Also, we have exponentially many parameters to learn from limited data.
• Solution 1: Change the graphical model to make cliques smaller.
  • This changes the dependencies and may force us to make more independence assumptions than we had
• Solution 2: Keep the same graphical model but use fewer parameters to define the clique potentials
  • Recall parameter sharing for BNs
  • This is the idea behind feature-based models.

Features

• Let a clique correspond to three consecutive characters
  • How would you define p(c1, c2, c3)?
  • For all possible character combinations you need $26^3 - 1$ parameters.
  • But there are sequences that are unlikely: kfd
• A "feature" is a function that is non-zero for a few particular inputs. Think of Boolean features.
  • Is "ing" the input sequence? Then 1, otherwise 0.
• We can define features for continuous inputs as well.
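The character-trigram example can be sketched in a few lines. The function `f_ing` follows the "ing" feature described above. `f_ends_ed` is a hypothetical second feature added only to show that a feature may fire on more than one input.

```python
import string

def f_ing(c1: str, c2: str, c3: str) -> float:
    """Boolean feature: 1 if the three characters spell 'ing', else 0."""
    return 1.0 if (c1, c2, c3) == ("i", "n", "g") else 0.0

def f_ends_ed(c1: str, c2: str, c3: str) -> float:
    """Hypothetical feature: 1 if the trigram ends in 'ed', else 0."""
    return 1.0 if (c2, c3) == ("e", "d") else 0.0

alphabet = string.ascii_lowercase
trigrams = [(a, b, c) for a in alphabet for b in alphabet for c in alphabet]

# Free parameters of a full table over three lowercase characters.
n_table_params = len(alphabet) ** 3 - 1

# Number of trigrams on which each feature is non-zero.
n_active_ing = sum(f_ing(*t) for t in trigrams)
n_active_ed = sum(f_ends_ed(*t) for t in trigrams)

print(n_table_params, n_active_ing, n_active_ed)
```

A full table spends one parameter on every trigram, including implausible ones such as "kfd", while each feature spends a single weight on the handful of inputs it fires on.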

Features as potentials

• Each feature function can be converted to a potential by exponentiating it. We can multiply these potentials together to get a clique potential.
• Example:
• There is still an exponential number of settings, but we only use K parameters corresponding to the K features.
  • Can we recover the tabular representation?
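For the character-trigram clique, one way to write this is sketched below. The weights $\theta_k$ and the second feature $f_{k'}$ are assumed notation for illustration.

```latex
\psi_c(c_1, c_2, c_3)
  = \exp\{\theta_{\text{ing}} f_{\text{ing}}(c_1, c_2, c_3)\}
    \times \exp\{\theta_{k'} f_{k'}(c_1, c_2, c_3)\} \times \cdots
  = \exp\Big\{ \sum_{k=1}^{K} \theta_k f_k(c_1, c_2, c_3) \Big\}
```

The tabular representation is recovered by using one indicator feature per clique configuration, $f_{x'_c}(x_c) = \delta(x_c, x'_c)$, so that $\psi_c(x_c) = \exp\{\theta_{x_c}\}$ can take any positive value in each cell.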

Combining Features

• Each feature has a weight θk, which represents the numerical strength of the feature and whether it increases or decreases the probability of a clique.
• The marginal over the clique is a generalized exponential family distribution (a generalized linear model)
• The features may be overlapping across cliques

Feature-based model

• Joint distribution:
• We can use the simplified form
• The features correspond to the sufficient statistics of our model.
• We need to learn the parameters θk
• What about IPF?
  • It is not clear how to use this rule to update the parameters and potentials: $\psi_c^{(t+1)}(x_c) = \psi_c^{(t)}(x_c)\,\frac{\tilde{p}(x_c)}{p^{(t)}(x_c)}$
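A sketch of the joint distribution of a feature-based model and of its simplified form. It assumes each clique potential is $\exp\{\sum_k \theta_k f_k(x_c)\}$, and that in the simplified form a single index $i$ runs over all (clique, feature) pairs, with $c_i$ the clique that feature $i$ is defined on.

```latex
p(x) = \frac{1}{Z(\theta)} \exp\Big\{ \sum_{c} \sum_{k} \theta_k f_k(x_c) \Big\}

p(x) = \frac{1}{Z(\theta)} \exp\Big\{ \sum_{i} \theta_i f_i(x_{c_i}) \Big\}
```

In this form the potentials are no longer free tables, so the IPF ratio $\tilde{p}(x_c) / p^{(t)}(x_c)$ has no direct counterpart as an update on the weights $\theta_i$.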

MLE of Feature-based Undirected Graphical Models

• Objective: scaled likelihood function
• Main difficulty: the partition function is a complex function of the parameters. Taking a derivative does not help, because Z then appears in the denominator. We want to avoid computing Z.
• Approximation time…
• We replace log Z by its upper bound $\log Z(\theta) \le \mu Z(\theta) - \log\mu - 1$, where $\mu = Z^{-1}(\theta^{(t)})$
• Thus we have
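A sketch of the objective and of what the bound gives, assuming the simplified model $p(x \mid \theta) = \frac{1}{Z(\theta)} \exp\{\sum_i \theta_i f_i(x)\}$ and the empirical distribution $\tilde{p}(x) = m(x)/N$:

```latex
\tilde{\ell}(\theta; D) = \frac{1}{N} \sum_{n=1}^{N} \log p(x_n \mid \theta)
  = \sum_{x} \tilde{p}(x) \sum_{i} \theta_i f_i(x) - \log Z(\theta)

\tilde{\ell}(\theta; D) \ge
  \sum_{x} \tilde{p}(x) \sum_{i} \theta_i f_i(x)
  - \frac{Z(\theta)}{Z(\theta^{(t)})} - \log Z(\theta^{(t)}) + 1
```

The second line substitutes $\mu = 1 / Z(\theta^{(t)})$ into $-\log Z(\theta) \ge -\mu Z(\theta) + \log \mu + 1$. The bound is tight at $\theta = \theta^{(t)}$.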

MLE of Feature-based Undirected Graphical Models

• We have
• We define
• We assume $f_i(x) \ge 0$ and $\sum_i f_i(x) = 1$. Also, by convexity of exp: $\exp\left(\sum_i \pi_i x_i\right) \le \sum_i \pi_i \exp(x_i)$
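One way these steps can fit together is sketched below. It assumes the increment is defined as $\Delta\theta_i^{(t)} = \theta_i - \theta_i^{(t)}$, and the name $\Lambda$ for the resulting lower bound is made up for this sketch.

```latex
\frac{Z(\theta)}{Z(\theta^{(t)})}
  = \sum_{x} p(x \mid \theta^{(t)}) \exp\Big\{ \sum_{i} \Delta\theta_i^{(t)} f_i(x) \Big\}
  \le \sum_{x} p(x \mid \theta^{(t)}) \sum_{i} f_i(x) \exp\{\Delta\theta_i^{(t)}\}

\tilde{\ell}(\theta; D) \ge
  \sum_{i} \theta_i \sum_{x} \tilde{p}(x) f_i(x)
  - \sum_{x} p(x \mid \theta^{(t)}) \sum_{i} f_i(x) \exp\{\Delta\theta_i^{(t)}\}
  - \log Z(\theta^{(t)}) + 1 \;=:\; \Lambda(\theta)
```

The inequality in the first line is the convexity bound with weights $\pi_i = f_i(x)$, which is where the assumptions $f_i(x) \ge 0$ and $\sum_i f_i(x) = 1$ are used. In $\Lambda(\theta)$ each $\theta_i$ appears in its own term, so the bound can be maximized one coordinate at a time.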

MLE of Feature-based Undirected Graphical Models

• We have
• We take the derivative
  • $p^{(t)}(x)$ is the unnormalized version of $p(x \mid \theta^{(t)})$
• Our updates are:
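A minimal sketch of a GIS-style iteration under stated assumptions. It uses the standard update for features that are non-negative and sum to one, $\theta_i \leftarrow \theta_i + \log\big(E_{\tilde{p}}[f_i] / E_{p(\cdot \mid \theta^{(t)})}[f_i]\big)$, written with the normalized model distribution rather than the unnormalized $p^{(t)}(x)$. The three binary variables, the agreement features, the slack feature, and the empirical distribution are all invented for illustration, and expectations are computed by enumeration.

```python
import itertools
import numpy as np

# All 8 configurations of three binary variables.
states = np.array(list(itertools.product(range(2), repeat=3)))

# Hypothetical raw features: agreement of neighboring variables.
g1 = (states[:, 0] == states[:, 1]).astype(float)
g2 = (states[:, 1] == states[:, 2]).astype(float)

# Rescale and add a slack feature so that f_i(x) >= 0 and sum_i f_i(x) = 1.
F = np.column_stack([g1 / 2, g2 / 2, 1.0 - (g1 + g2) / 2])

def model(theta):
    """Normalized p(x | theta) over all states, by enumeration."""
    logits = F @ theta
    w = np.exp(logits - logits.max())
    return w / w.sum()

# Hypothetical empirical distribution p~ and its feature expectations.
rng = np.random.default_rng(1)
p_emp = rng.dirichlet(np.ones(len(states)))
target = p_emp @ F

theta = np.zeros(F.shape[1])
for t in range(500):
    expected = model(theta) @ F          # inference: model feature expectations
    theta += np.log(target / expected)   # GIS update for every feature at once

gap = np.abs(model(theta) @ F - target).max()
print("largest expectation mismatch:", gap)
```

At convergence the model's feature expectations match the empirical ones, which is the feature-based analogue of matching clique marginals. As with IPF, every iteration needs model expectations, so inference is still required.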

Summary

• Iterative Proportional Fitting (IPF) is a general algorithm for MLE of UGMs
  • A fixed-point equation for potentials over single cliques; it performs coordinate ascent
  • Requires the potential to be fully parameterized
  • The clique described by the potentials does not have to be a max-clique
  • For a fully decomposable model, reduces to a single-step iteration
• Generalized Iterative Scaling (GIS)
  • Iterative scaling on general UGMs with feature-based potentials
  • IPF is a special case of GIS where the clique potential is built on features defined as indicator functions of the clique configurations.