
CS839: Probabilistic Graphical Models

Lecture 1: Introduction to Graphical Models

Theo Rekatsinas


Acknowledgement: adapted slides by Eric Xing

1. Introduction, admin & setup


Who am I…

Instructor (me): Theo Rekatsinas
• Faculty in Computer Sciences and part of the UW Database Group
• Research: data integration and cleaning, statistical analytics, and machine learning
• thodrek@cs.wisc.edu
• Office hours: by appointment @ CS 4361


Course Webpage:

https://thodrek.github.io/CS839_fall18/


Logistics

• Textbooks:
  • Probabilistic Graphical Models, by Daphne Koller and Nir Friedman
  • Introduction to Statistical Relational Learning, by Lise Getoor and Ben Taskar

• Office hours: by appointment. Just send me an email.

• Homework submission: we will use Canvas.


Assignments and Grading Logistics

• 3 homework assignments: 20% of grade
  • Theory exercises, implementation exercises

• Midterm: 30% of grade
  • In-class exam
  • ~week #9

• Final project: 50% of grade
  • Project proposal: 10% of grade (~week #9)
  • Proposal presentation: 10% of grade
  • Final report: 30% of grade (due on Dec 20th)
  • In groups of up to 3, ideally exactly three. Groups should be formed in the first two weeks.


Project examples

• Applying PGMs to the development of a real, substantial ML system
  • Build a web-scale fake news detector.
  • Build a storyline tracking system for news media.
  • Design and implement state-of-the-art knowledge base embeddings.

• Theory and/or algorithmic projects
  • A more efficient approximate inference algorithm.
  • When is inference in the presence of noisy observations hard?
  • When can we approximate PGMs with feed-forward networks?

• Systems projects
  • Implement Markov logic on top of Pyro


2. Class overview


What are graphical models?


Graphical model: $\mathcal{M}$
Data: $\mathcal{D} \equiv \{X_1^{(i)}, X_2^{(i)}, \ldots, X_m^{(i)}\}_{i=1}^{N}$

PGMs allow us to reason about uncertainty


[Figure labels: Information Extraction, Data Cleaning, Weak Supervision]

Fundamental Questions

• Representation
  • How to capture/model uncertainties in possible worlds?
  • How to encode our domain knowledge/assumptions/constraints?
• Example: Is your Grade independent of the Difficulty of the class?


[Figure: a three-node network over Difficulty, Intelligence, and Grade]

Fundamental Questions

• Inference
  • How do we answer questions/queries according to the model at hand and the available data, i.e., compute $P(X \mid \text{Data})$?
• Example: What will your Grade be if Difficulty is “high”? (A small enumeration sketch follows below the figure.)


[Figure: a three-node network over Difficulty, Intelligence, and Grade]
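To make the inference question concrete, here is a minimal inference-by-enumeration sketch for a toy version of this three-node network. The CPD numbers are made up for illustration; they are not from the lecture.

```python
# Inference by enumeration on the toy Difficulty/Intelligence/Grade
# network. All probability values below are hypothetical.
P_D = {"low": 0.6, "high": 0.4}                      # P(Difficulty)
P_I = {"low": 0.7, "high": 0.3}                      # P(Intelligence)
P_G = {                                              # P(Grade | D, I)
    ("low", "low"):   {"low": 0.3, "high": 0.7},
    ("low", "high"):  {"low": 0.1, "high": 0.9},
    ("high", "low"):  {"low": 0.8, "high": 0.2},
    ("high", "high"): {"low": 0.4, "high": 0.6},
}

def joint(d, i, g):
    """P(D=d, I=i, G=g) via the network's factorization."""
    return P_D[d] * P_I[i] * P_G[(d, i)][g]

def grade_given_difficulty(d):
    """P(Grade | Difficulty=d), summing Intelligence out."""
    unnorm = {g: sum(joint(d, i, g) for i in P_I) for g in ("low", "high")}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

print(grade_given_difficulty("high"))  # e.g., {'low': 0.68, 'high': 0.32}
```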

Fundamental Questions

• Learning
  • What model is “right” for the data?
• Example: What if we have (Difficulty = “Low”, Intelligence = “High”, Grade = “High”) for person 1, (Difficulty = “High”, Intelligence = “High”, Grade = “Low”) for person 2, etc.? (A counting sketch follows the formula below.)


[Figure: a three-node network over Difficulty, Intelligence, and Grade]

$\mathcal{M}^{*} = \arg\max_{\mathcal{M} \in \mathbb{M}} F(\mathcal{D}; \mathcal{M})$
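As a minimal illustration of the learning question, the sketch below fits $P(\text{Grade} \mid \text{Difficulty}, \text{Intelligence})$ by maximum-likelihood counting from fully observed records; here the scoring function $F(\mathcal{D}; \mathcal{M})$ is the log-likelihood, which for a fixed DAG is maximized by normalized counts. The records beyond persons 1 and 2 are invented for illustration.

```python
# Maximum-likelihood learning of a CPD by counting, assuming fully
# observed (Difficulty, Intelligence, Grade) records.
from collections import Counter

data = [
    ("low", "high", "high"),   # person 1
    ("high", "high", "low"),   # person 2
    ("high", "low", "low"),    # hypothetical extra records
    ("low", "low", "high"),
]

# P(Grade | Difficulty, Intelligence) = count(d, i, g) / count(d, i)
pair_counts = Counter((d, i) for d, i, _ in data)
triple_counts = Counter(data)
P_G = {(d, i, g): n / pair_counts[(d, i)]
       for (d, i, g), n in triple_counts.items()}
print(P_G)
```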

Basic Probability Concepts

• Representation: What is the joint probability distribution over multiple variables?
  • How many state configurations are there in total? (For 8 binary variables, $2^8 = 256$.)
  • Do they all need to be represented explicitly?
  • What insights do we get from this model?


$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8)$

Basic Probability Concepts

• Learning: Where do we get all these probabilities?
  • Maximum-likelihood estimation? How much data do we need?
  • Are there other estimation principles?
  • Where do we put domain knowledge, in terms of plausible relationships between variables and plausible values of the probabilities?


Basic Probability Concepts

• Inference: If not all variables are observable, how do we compute the conditional distribution of latent variables given evidence?
  • Assume $P(X_1, \ldots, X_8)$ is given. Computing $P(X_2 \mid X_1)$ requires summing over all $2^6$ configurations of the unobserved variables $X_3, \ldots, X_8$.
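A brute-force sketch of that computation, assuming binary variables and a placeholder joint (the uniform distribution) just to make the $2^6$-term summation explicit:

```python
# Computing P(X2 | X1) from a full joint table by summing out the six
# unobserved variables X3..X8.
from itertools import product

def joint(x):
    """Placeholder P(X1,...,X8): uniform over 2^8 binary assignments.
    Replace with a real model to get non-trivial answers."""
    return 1.0 / 2**8

def cond_x2_given_x1(x1):
    unnorm = {}
    for x2 in (0, 1):
        # 2^6 = 64 terms: one per configuration of X3..X8.
        unnorm[x2] = sum(joint((x1, x2) + rest)
                         for rest in product((0, 1), repeat=6))
    z = sum(unnorm.values())
    return {x2: p / z for x2, p in unnorm.items()}

print(cond_x2_given_x1(1))  # {0: 0.5, 1: 0.5} for the uniform placeholder
```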

What is a graphical model?


A Multivariate Distribution in High-D Space

Example: A possible world for cellular signal transduction


Signal transduction is the process by which a chemical or physical signal is transmitted through a cell as a series of molecular events.

Example: A possible world for cellular signal transduction


Structure Simplifies Representation


Arrows indicate dependencies amongst variables.

Probabilistic Graphical Models


• If the $X_i$'s are conditionally independent (as described by a PGM), the joint can be factored into a product of simpler terms, e.g.,

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$

• So, why a PGM? We can incorporate domain knowledge: $1+1+2+2+2+4+2+4 = 18$, a 16-fold reduction from $2^8$ in representation cost! (A parameter-count sketch follows below.)
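The parameter-count arithmetic can be checked mechanically. The sketch below assumes binary variables and counts $2^{\#\text{parents}}$ free parameters per CPD, matching the $1+1+2+2+2+4+2+4 = 18$ tally above:

```python
# Representation cost of the factored joint vs. the full table,
# assuming binary variables; the DAG is the 8-node example above.
parents = {1: [], 2: [], 3: [1], 4: [2], 5: [2],
           6: [3, 4], 7: [6], 8: [5, 6]}

# Each CPD over a binary child needs 2**(#parents) free parameters.
factored = sum(2 ** len(ps) for ps in parents.values())
full_joint = 2 ** 8  # table size for the unfactored joint

print(factored, full_joint)  # 18 vs 256
```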

Other desired properties of PGMs


• Modularity – allows us to integrate heterogeneous data

Other desired properties of PGMs


• Prior knowledge – Bayesian learning
• Captures uncertainty in a more principled way – introduce priors

What is a graphical model?


Multivariate statistics + structure

What is a graphical model?


• Informal: It is a smart way to specify/compose/design exponentially-large probability distributions without paying an exponential cost, and at the same time endow the distributions with structured semantics.

What is a graphical model?


• More formal: It refers to a family of distributions on a set of random variables that are compatible with all the probabilistic independence propositions encoded by a graph that connects these variables.

Types of PGMs


• Directed: Bayesian networks
  • Directed edges give causality relationships

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = P(X_1)\,P(X_2)\,P(X_3 \mid X_1)\,P(X_4 \mid X_2)\,P(X_5 \mid X_2)\,P(X_6 \mid X_3, X_4)\,P(X_7 \mid X_6)\,P(X_8 \mid X_5, X_6)$

Types of PGMs


• Undirected: Markov random fields
  • Undirected edges simply give correlations between variables

$P(X_1, X_2, X_3, X_4, X_5, X_6, X_7, X_8) = \frac{1}{Z} \exp\big(E(X_1) + E(X_2) + E(X_1, X_3) + E(X_2, X_4) + E(X_2, X_5) + E(X_6, X_3, X_4) + E(X_7, X_6) + E(X_8, X_5, X_6)\big)$
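As a minimal sketch of what the normalizer $Z$ involves, the code below enumerates all $2^8$ binary assignments with made-up agreement-rewarding potentials; real MRF inference avoids this brute force, which is only feasible at toy scale:

```python
# Brute-force partition function Z for the undirected factorization
# above. The potential E below is a hypothetical placeholder.
from itertools import product
from math import exp

def E(*xs):
    """Placeholder potential: rewards agreement among its arguments."""
    return 1.0 if len(set(xs)) == 1 else 0.0

cliques = [(1,), (2,), (1, 3), (2, 4), (2, 5), (6, 3, 4), (7, 6), (8, 5, 6)]

def unnormalized(x):
    # x maps variable index -> value
    return exp(sum(E(*(x[i] for i in c)) for c in cliques))

Z = sum(unnormalized(dict(zip(range(1, 9), assignment)))
        for assignment in product((0, 1), repeat=8))
print(Z)
```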

Bayesian Networks


• Structure: DAGs
• Meaning: a node is conditionally independent of every other node in the network outside its Markov blanket

The Markov blanket of a node includes its parents, its children, and the other parents of all of its children.

Bayesian Networks


• Structure: DAGs
• Meaning: a node is conditionally independent of every other node in the network outside its Markov blanket
• Local conditional probability distributions (CPDs) and the DAG completely determine the joint distribution.
• Edges represent causality relationships and facilitate a generative process.
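A small sketch of extracting a node's Markov blanket (parents, children, and children's other parents) from a DAG stored as a child-to-parents map; the graph is the 8-node example used earlier:

```python
# Markov blanket of a node in a DAG given as child -> parents.
parents = {1: [], 2: [], 3: [1], 4: [2], 5: [2],
           6: [3, 4], 7: [6], 8: [5, 6]}

def markov_blanket(node):
    ps = set(parents[node])                                   # parents
    children = {c for c, pset in parents.items() if node in pset}
    co_parents = {p for c in children for p in parents[c]} - {node}
    return ps | children | co_parents

print(markov_blanket(6))  # {3, 4, 5, 7, 8}: parents, children, co-parents
```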

Markov Random Fields


• Structure: undirected graph
• Meaning: a node is conditionally independent of every other node in the network given its direct neighbors
• Local contingency functions (potentials) and the cliques in the graph completely determine the joint distribution.
• Edges represent correlations between variables, but give no explicit way to generate samples.

Well-known models as PGMs


• Density estimation
  • Parametric and non-parametric methods
• Regression
  • Linear, conditional mixture, non-parametric
• Classification
  • Generative and discriminative approaches
• Clustering

More complex models


• Partially observed Markov decision processes

More complex models


• Information Extraction

[OpenTag, Zheng et al., KDD 2018]

More complex models


• Solid state physics

Applications of Graphical Models


• Machine learning
• Computational statistics
• Computer vision and graphics
• NLP
• Information extraction
• Robotic control
• Decision making under uncertainty
• Computational biology
• Medical diagnosis/prognosis
• Finance and economics
• Etc.

Why PGMs?


• Language for communication
• Language for computation
• Language for development

• Does it remind you of something?

Why PGMs?


• Probability theory: Formal framework to combine heterogeneous parts and ensure consistency.
• Graph structure: Appealing interface for modeling highly-interacting sets of variables; interpretability and domain knowledge.
• Generalization: Many classical probabilistic systems are special cases of PGMs.

PGMs in the Deep Learning era


• Probabilistic Models: The goal is to capture the joint distribution of input variables, output variables, latent variables, parameters, and hyper-parameters. Everything is a random variable.
• Deep (Learning) Models: Hierarchical model structure where the output of one model becomes the input of the next higher-level model. Targeted towards feature learning.

PGMs in the Deep Learning era


Deep Learning vs. PGMs:
• Empirical goal: e.g., classification, feature learning (Deep Learning) vs. e.g., transfer learning, latent variable inference (PGMs)
• Structure: graphical (both)
• Objective: aggregated from local functions (both)
• Vocabulary: neurons, activation/gate functions (Deep Learning) vs. variables, potential functions (PGMs)
• Algorithm: a single inference algorithm, backpropagation (Deep Learning) vs. many algorithms, a major focus of open research, approximate inference (PGMs)
• Evaluation: on end-performance (Deep Learning) vs. on almost every intermediate quantity, e.g., calibrated probabilities (PGMs)
• Implementation: many tricks (Deep Learning) vs. quite standardized (PGMs)

PGMs in the Deep Learning era


• Why Probabilistic Models? Predictions from a probabilistic model capture a principled notion of uncertainty, which supports decision making.
• Why Deep (Learning) Models? Feature learning; no hand-crafted assumptions for complex domains such as images and speech.

Combining PGMs and Deep Learning


• Deep Boltzmann Machines

Using PGMs to generate training data for DL


• Weak supervision / data programming

Class Overview


• Fundamentals of PGMs:
  • Bayesian Networks and Markov Random Fields
  • Discrete, continuous, and hybrid models; exponential family
  • Basic representation, inference, and learning
  • Focus on specific networks: Multivariate Gaussian Models, Hidden Markov Models

• Advanced Topics:
  • Approximate inference
  • Bounded treewidth
  • Spectral methods for graphical models
  • Structure learning
  • Relational representation learning and connections to deep learning

• Applications
