Graphical Models and Applications (CNS/EE148). Instructors: M. Polito, P. Perona, R. McEliece. TA: C. Fanti
TRANSCRIPT
[Slide 1]
Graphical Models and Applications
CNS/EE148
Instructors: M.Polito, P.Perona, R.McEliece
TA: C. Fanti
[Slide 2]
Example from Medical Diagnostics

[Diagram: the classic “Asia” network. Nodes: Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer, X-Ray Result, Dyspnea.]

The nodes group naturally into patient information (Visit to Asia, Smoking), medical difficulties (Tuberculosis, Lung Cancer, Bronchitis, Tuberculosis or Cancer), and diagnostic tests (X-Ray Result, Dyspnea).
[Slide 3]
What is a graphical model?

A graphical model is a way of representing probabilistic relationships between random variables.
Variables are represented by nodes.
Conditional (in)dependencies are represented by (missing) edges.
Undirected edges simply give correlations between variables (Markov random field, or undirected graphical model).
Directed edges give causality relationships (Bayesian network, or directed graphical model).
[Slide 4]
“Graphical models are a marriage between probability theory and graph theory.
They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering – uncertainty and complexity –
and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms.
Fundamental to the idea of a graphical model is the notion of modularity – a complex system is built by combining simpler parts.
[Slide 5]
Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data.
The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms.
Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism -- examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models.
[Slide 6]
The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism.
This view has many advantages -- in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely.
Moreover, the graphical model formalism provides a natural framework for the design of new systems.”
--- Michael Jordan, 1998.
[Slide 7]
(Picture by Zoubin Ghahramani and Sam Roweis)
We already know many graphical models:
[Slide 8]
Plan for the class

• Introduction to Graphical Models (Polito)
  – Basics on graphical models and statistics
  – Learning from data
  – Exact inference
  – Approximate inference
• Applications to Vision (Perona)
• Applications to Coding Theory (McEliece)
• Belief Propagation and Spin Glasses
[Slide 9: plan repeated]
[Slide 10]
Basics on graphical models and statistics

• Basics of graph theory.
• Families of probability distributions associated to directed and undirected graphs.
• Markov properties and conditional independence.
• Statistical concepts as building blocks for graphical models.
• Density estimation, classification and regression.
[Slide 11]
Basics on graphical models and statistics

Graphs and Families of Probability Distributions

[Diagram: a directed graph on nodes x1, …, x7]

There is a family of probability distributions that can be represented with this graph:

1) Every probability distribution presenting (at least) the conditional independencies that can be derived from the graph belongs to that family.
2) Every probability distribution that can be factorized as

   p(x1,…,x7) = p(x1) p(x2) p(x4|x1,x2) p(x7|x4) p(x5|x4) p(x6|x5,x2) p(x3|x2)

   belongs to that family.
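As a minimal numerical sketch of point 2), the factorization can be checked for binary variables. The parent sets are read off the slide's factorization (with root factors p(x1), p(x2) included so the product is a proper joint); the conditional-probability function below is invented purely for illustration.

```python
import itertools

# Parent sets from the factorization:
# p(x1,...,x7) = p(x1) p(x2) p(x4|x1,x2) p(x7|x4) p(x5|x4) p(x6|x5,x2) p(x3|x2)
parents = {1: [], 2: [], 3: [2], 4: [1, 2], 5: [4], 6: [5, 2], 7: [4]}

def p1(i, pa_values):
    # Invented conditional probability p(x_i = 1 | parents); any value in (0, 1) works.
    return (1 + sum(pa_values)) / (2 + len(pa_values))

def joint(x):
    # x maps node index -> 0/1; the joint is the product of the local conditionals.
    prob = 1.0
    for i, pa in parents.items():
        q = p1(i, [x[j] for j in pa])
        prob *= q if x[i] == 1 else 1 - q
    return prob

# A product of proper local conditionals always sums to 1 over all 2^7 states.
total = sum(joint(dict(zip(parents, bits)))
            for bits in itertools.product([0, 1], repeat=7))
print(round(total, 10))  # 1.0
```

Any choice of proper local conditionals yields a normalized joint; that is exactly what membership in the family via the factorization means.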
[Slide 12]
Basics on graphical models and statistics

Building blocks for graphical models

How should we model p(X) and p(Y|X)? Bayesian approach: every unknown quantity (including parameters) is treated as a random variable.

• Density estimation: parametric and nonparametric methods
• Regression: linear, conditional mixture, nonparametric
• Classification: generative and discriminative approach

[Diagrams: small graphical models for each building block]
[Slide 13: plan repeated]
[Slide 14]
Learning from data

• Model structure and parameter estimation.
• Complete observations and latent variables.
• MAP and ML estimation.
• The EM algorithm.
• Model selection.
[Slide 15]
| Structure | Observability | Method |
|-----------|---------------|--------|
| Known | Full | ML or MAP estimation |
| Known | Partial | EM algorithm |
| Unknown | Full | Model selection or model averaging |
| Unknown | Partial | EM + model selection or model averaging |
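For the known-structure, fully observed row of the table, ML estimation of a discrete conditional table reduces to counting relative frequencies. A toy sketch, with data invented for illustration (one binary edge parent → child):

```python
from collections import Counter

# Fully observed (parent, child) samples for one binary edge parent -> child.
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0)]

# The ML estimate of p(child = 1 | parent = v) is just a relative frequency.
pair_counts = Counter(data)
parent_counts = Counter(p for p, _ in data)
p_child1 = {v: pair_counts[(v, 1)] / parent_counts[v] for v in (0, 1)}
print(p_child1)  # {0: 0.25, 1: 0.75}
```

With partial observability the counts themselves are unknown, which is where the EM algorithm of the second row comes in.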
[Slide 16: plan repeated]
[Slide 17]
Exact inference

• The junction tree and related algorithms.
• Belief propagation and belief revision.
• The generalized distributive law.
• Hidden Markov models and Kalman filtering with graphical models.
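As a concrete instance of exact inference on a chain-structured graphical model, the HMM forward recursion can be sketched as follows; all model numbers below are invented for illustration.

```python
import itertools

# Toy 2-state HMM with binary observations (all numbers invented).
pi = [0.6, 0.4]                    # initial state distribution
A = [[0.7, 0.3], [0.2, 0.8]]       # A[i][j] = p(z_{t+1} = j | z_t = i)
B = [[0.9, 0.1], [0.3, 0.7]]       # B[i][k] = p(observation k | state i)

def forward(obs):
    """Forward recursion: alpha_t(i) = p(o_1..o_t, z_t = i); returns p(o_1..o_T)."""
    alpha = [pi[i] * B[i][obs[0]] for i in range(2)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)

# Sanity check against brute-force enumeration of all hidden paths.
obs = [0, 1, 0]
brute = sum(
    pi[z[0]] * B[z[0]][obs[0]]
    * A[z[0]][z[1]] * B[z[1]][obs[1]]
    * A[z[1]][z[2]] * B[z[2]][obs[2]]
    for z in itertools.product(range(2), repeat=3)
)
print(abs(forward(obs) - brute) < 1e-12)  # True: both compute the same likelihood
```

The recursion costs O(T·K²) instead of the O(Kᵀ) brute-force sum, which is the chain-graph version of the distributive-law idea discussed below.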
[Slide 18]
Exact inference: conditional independencies

Given a probability distribution p(X,Y,Z,W), how can we decide whether the groups of variables X and Y are “conditionally independent” of each other once the value of the variables Z is assigned?

With graphical models, we can implement an algorithm reducing this global problem to a series of local problems (see the Matlab demo of the Bayes-Ball algorithm).
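Bayes-Ball is what the Matlab demo animates; an equivalent check, sketched below, tests d-separation by moralizing the ancestral graph, deleting the conditioning nodes, and looking for a remaining path. The graph is the seven-node example used in these slides; the helper itself is a generic sketch, not the course's code.

```python
from collections import deque

# Directed edges (node -> its parents) of the seven-node example graph.
parents = {1: [], 2: [], 3: [2], 4: [1, 2], 5: [4], 6: [5, 2], 7: [4]}

def d_separated(X, Y, Z):
    """X independent of Y given Z iff no path links X and Y in the
    moralized ancestral graph after deleting the conditioning set Z."""
    # 1. Ancestral subgraph: keep X, Y, Z and all of their ancestors.
    keep, stack = set(X) | set(Y) | set(Z), list(X) + list(Y) + list(Z)
    while stack:
        for p in parents[stack.pop()]:
            if p not in keep:
                keep.add(p)
                stack.append(p)
    # 2. Moralize: undirected child-parent edges, plus "marry" co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        pa = [p for p in parents[v] if p in keep]
        for p in pa:
            adj[v].add(p)
            adj[p].add(v)
        for i in range(len(pa)):
            for j in range(i + 1, len(pa)):
                adj[pa[i]].add(pa[j])
                adj[pa[j]].add(pa[i])
    # 3. Delete Z, then breadth-first search for a path from X to Y.
    seen, queue = set(X), deque(X)
    while queue:
        for w in adj[queue.popleft()]:
            if w not in seen and w not in Z:
                seen.add(w)
                queue.append(w)
    return not (seen & set(Y))

print(d_separated({1}, {2}, set()))  # True: x1, x2 marginally independent
print(d_separated({1}, {2}, {4}))    # False: conditioning on the common child x4 couples them
```

The second query illustrates the “explaining away” effect that makes a purely local edge-following rule (and hence Bayes-Ball's bouncing rules) necessary.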
[Slide 19]
Exact inference: variable elimination and the distributive law

[Diagram: the graph on x1, …, x7, before and after eliminating x7]

p(x1,…,x7) = p(x1) p(x2) p(x4|x1,x2) p(x6|x4) p(x5|x4) p(x3|x2) p(x7|x2,x5)

Marginalize over x7:

p(x1,…,x6) = Σ_{x7} [ p(x1) p(x2) p(x4|x1,x2) p(x6|x4) p(x5|x4) p(x3|x2) p(x7|x2,x5) ]

Applying a “distributive law”, the factors not involving x7 are pulled out of the sum:

p(x1,…,x6) = p(x1) p(x2) p(x4|x1,x2) p(x6|x4) p(x5|x4) p(x3|x2) Σ_{x7} [ p(x7|x2,x5) ]

The language of graphical models allows a general formalization of this method.
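The distributive-law step can be checked numerically: pulling the factors that do not involve x7 out of the sum never changes the marginal. In the sketch below, `g` stands in for the product of the factors not involving x7, and the conditional table for x7 is invented.

```python
import itertools
import random

random.seed(0)

# An invented conditional table p(x7 = v | x2, x5) for binary variables.
t7 = {}
for pa in itertools.product([0, 1], repeat=2):
    q = random.uniform(0.1, 0.9)
    t7[pa] = {1: q, 0: 1 - q}

def g(x):
    # Stand-in for the product of the factors that do not involve x7;
    # any positive function of (x1, ..., x6) works for this check.
    return 0.1 + sum(x) / 10

ok = True
for x in itertools.product([0, 1], repeat=6):                # x = (x1, ..., x6)
    x2, x5 = x[1], x[4]
    inside = sum(g(x) * t7[(x2, x5)][v] for v in (0, 1))     # sum over x7 of full product
    outside = g(x) * sum(t7[(x2, x5)][v] for v in (0, 1))    # factors pulled out of sum
    ok &= abs(inside - outside) < 1e-12
print(ok)  # True: the distributive law leaves the marginal unchanged
```

The payoff is computational: the pulled-out sum Σ_{x7} p(x7|x2,x5) is computed once per (x2, x5) instead of once per full assignment.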
[Slide 20]
Exact inference: junction graph and message passing

Group random variables which are “fully connected”.
Connect group-nodes with common members: this gives the “junction graph”.
Every group-node only needs to “communicate” with its neighbors.
If the junction graph is a tree, there is a “message passing” protocol which allows exact inference.
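A minimal sketch of this protocol on the smallest possible junction tree: two cliques sharing a one-variable separator, with invented pairwise potentials.

```python
# Chain x1 - x2 - x3 with pairwise potentials.  The cliques {x1,x2} and {x2,x3}
# overlap in the separator {x2}, so the junction graph is a tree and a single
# message gives exact (unnormalized) beliefs about (x2, x3).
psi12 = {(a, b): [[0.5, 0.2], [0.1, 0.9]][a][b] for a in (0, 1) for b in (0, 1)}
psi23 = {(b, c): [[0.3, 0.7], [0.6, 0.4]][b][c] for b in (0, 1) for c in (0, 1)}

# Message from clique {x1,x2} to its neighbor: marginalize out everything
# not in the separator (here, x1).
m12 = {b: sum(psi12[(a, b)] for a in (0, 1)) for b in (0, 1)}

# The receiving clique multiplies the message into its own potential,
# yielding the unnormalized marginal over (x2, x3).
belief23 = {(b, c): m12[b] * psi23[(b, c)] for b in (0, 1) for c in (0, 1)}

# Brute-force check: marginalize the full product of potentials directly.
brute = {(b, c): sum(psi12[(a, b)] * psi23[(b, c)] for a in (0, 1))
         for b in (0, 1) for c in (0, 1)}
ok = all(abs(belief23[k] - brute[k]) < 1e-12 for k in brute)
print(ok)  # True: message passing reproduces the brute-force marginal
```

On a junction tree, repeating this locally in both directions along every separator yields all clique marginals exactly; that locality is why each node only needs to talk to its neighbors.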
[Slide 21: plan repeated]
[Slide 22]
Approximate inference

• Kullback-Leibler divergence and entropy.
• Variational methods.
• Monte Carlo methods.
• Loopy junction graphs and loopy belief propagation.
• Performance of loopy belief propagation.
• Bethe approximation of free energy and belief propagation.
[Slide 23]
Approximate inference: Kullback-Leibler divergence and graphical models

The graphical model associated to a probability distribution p(x) is too complicated. AND NOW?!?

We choose to approximate p(x) with q(x), obtained by making assumptions on the junction graph corresponding to the graphical model. Example: eliminate loops, or bound the number of nodes linked to each node.

A good criterion for choosing q: minimize the relative entropy, or Kullback-Leibler divergence:

D(p || q) = ∫ p(x) log( p(x) / q(x) ) dx
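For discrete distributions the integral becomes a sum, D(p || q) = Σ_x p(x) log(p(x)/q(x)); a minimal sketch with invented distributions:

```python
import math

def kl(p, q):
    # D(p || q) = sum over x of p(x) * log(p(x) / q(x)); 0 * log 0 is taken as 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl(p, p))              # 0.0: zero divergence from itself
print(kl(p, q) >= 0.0)       # True: the divergence is never negative
print(kl(p, q) == kl(q, p))  # False: KL is not symmetric, so not a distance
```

The asymmetry matters for variational methods: minimizing D(q || p) and minimizing D(p || q) generally pick different approximations q.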
[Slide 24]
Approximate inference: variational methods

Example: the QMR-DT database

Diseases: d1, d2, d3. Symptoms: f1, f2, f3, f4. The joint distribution factorizes as

p(f, d) = Π_i p(f_i | d) Π_j p(d_j)

with noisy-OR conditionals:

p(f_i = 0 | d) = exp( −a_{i0} − Σ_{j ∈ pa(i)} a_{ij} d_j )
p(f_i = 1 | d) = 1 − exp( −a_{i0} − Σ_{j ∈ pa(i)} a_{ij} d_j )

By using the inequality

1 − e^{−x} ≤ e^{λx − H(λ)}

we get the approximation

p(f_i = 1 | d) ≤ exp( λ_i a_{i0} − H(λ_i) ) Π_{j ∈ pa(i)} exp( λ_i a_{ij} d_j ).

The bound factorizes over the diseases d_j: the node f_i is “unlinked”.
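The noisy-OR variational inequality 1 − e^{−x} ≤ exp(λx − H(λ)) can be checked numerically. Here H(λ) = (1 + λ) log(1 + λ) − λ log λ, the conjugate of log(1 − e^{−x}), which makes the bound tight at x = log((1 + λ)/λ):

```python
import math

def H(lam):
    # Conjugate of f(x) = log(1 - exp(-x)); makes lam*x - H(lam) a tangent line of f.
    return (1 + lam) * math.log(1 + lam) - lam * math.log(lam)

def bound(x, lam):
    # The variational upper bound exp(lam*x - H(lam)) on 1 - exp(-x).
    return math.exp(lam * x - H(lam))

ok = True
for lam in (0.1, 0.5, 1.0, 3.0):
    for x in (0.1, 0.5, 1.0, 2.0, 5.0):
        ok &= 1 - math.exp(-x) <= bound(x, lam) + 1e-12   # upper bound everywhere
    x_star = math.log((1 + lam) / lam)                    # the touching point
    ok &= abs((1 - math.exp(-x_star)) - bound(x_star, lam)) < 1e-9
print(ok)  # True: the bound holds for all x, with equality at x_star
```

Because the exponent is linear in x = a_{i0} + Σ_j a_{ij} d_j, the bound factorizes over the d_j, which is exactly what “unlinks” the symptom node f_i.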
[Slide 25]
Approximate inference: loopy belief propagation

When the junction graph is not a tree, inference is not exact: roughly speaking, a message might pass through a node more than once, causing trouble.

However, in certain cases, an iterated application of a message passing algorithm converges to a good candidate for the exact solution.