Lecture 17 Slides, May 30th, 2006
Lec 17: May 30th, 2006 EE512 - Graphical Models - J. Bilmes Page 1
University of Washington, Department of Electrical Engineering
EE512 Spring, 2006 Graphical Models
Jeff A. Bilmes <[email protected]>
Lecture 17 Slides
May 30th, 2006
Announcements
• READING:
  – M. Jordan: Chapters 13, 14, 15 (on Gaussians and Kalman)
• Reminder: TA discussions and office hours:
  – Office hours: Thursdays 3:30-4:30, Sieg Ground Floor Tutorial Center
  – Discussion Sections: Fridays 9:30-10:30, Sieg Ground Floor Tutorial Center Lecture Room
• No more homework this quarter; concentrate on final projects!
• Makeup class tomorrow, Wednesday, 5-7pm, room TBA (watch email).
Class Road Map
• L1: Tues, 3/28: Overview, GMs, Intro BNs.
• L2: Thur, 3/30: semantics of BNs + UGMs
• L3: Tues, 4/4: elimination, probs, chordal I
• L4: Thur, 4/6: chordal, sep, decomp, elim
• L5: Tue, 4/11: chdl/elim, mcs, triang, ci props.
• L6: Thur, 4/13: MST, CI axioms, Markov prps.
• L7: Tues, 4/18: Mobius, HC-thm, (F)=(G)
• L8: Thur, 4/20: phylogenetic trees, HMMs
• L9: Tue, 4/25: HMMs, inference on trees
• L10: Thur, 4/27: Inference on trees, start poly
• L11: Tues, 5/2: polytrees, start JT inference
• L12: Thur, 5/4: Inference in JTs
• Tues, 5/9: away
• Thur, 5/11: away
• L13: Tue, 5/16: JT, GDL, Shenoy-Shafer
• L14: Thur, 5/18: GDL, Search, Gaussians I
• L15: Mon, 5/22: laptop crash
• L16: Tues, 5/23: search, Gaussians I
• L17: Thur, 5/25: Gaussians
• Mon, 5/29: Holiday
• L18: Tue, 5/30
• L19: Thur, 6/1: final presentations
Final Project Milestone Due Dates
• L1: Tues, 3/28:
• L2: Thur, 3/30:
• L3: Tues, 4/4:
• L4: Thur, 4/6:
• L5: Tue, 4/11:
• L6: Thur, 4/13:
• L7: Tues, 4/18:
• L8: Thur, 4/20: Team Lists, short abstracts I
• L9: Tue, 4/25:
• L10: Thur, 4/27: short abstracts II
• L11: Tues, 5/2:
• L12: Thur, 5/4: abstract II + progress
• L--: Tues, 5/9
• L--: Thur, 5/11: 1 page progress report
• L13: Tue, 5/16:
• L14: Thur, 5/18: 1 page progress report
• L15: Tues, 5/23
• L16: Thur, 5/25: 1 page progress report
• L17: Tue, 5/30: Today
• L18: Wed, 5/31:
• L19: Thur, 6/1: final presentations
• L20: Tue, 6/6: 4-page papers due (like a conference paper). Only .pdf versions accepted.
• Team lists, abstracts, and progress reports must be turned in during class, on paper (dead tree versions only).
• Final reports must be turned in electronically in PDF (no other formats accepted).
• No need to repeat what was in previous progress reports/abstracts; I have those available to refer to.
• Progress reports must report who did what so far!
Summary of Last Time
• Gaussian Graphical Models
Outline of Today’s Lecture
• Other forms of inference.
• Structure learning in graphical models
Books and Sources for Today
• Jordan chapters 13-15
• Other references contained in presentation …
Graphical Models
1. We start with some probability distribution P
   1. It could be specified as a given, or more likely we have training data consisting of some number of samples. The goal is to learn P or some approximation to it (training) and then use P in some way (inference for making decisions, such as most probable assignment, max-product semi-ring, etc.)
2. The graph G=(V,E) represents “structure” in P
3. The graph can provide an efficient representation of, and efficient computational inference for, P
4. There can be multiple graphs that represent a given P (e.g., the complete graph represents all P).
5. Goal: find a computationally cheap exact or approximate graph cover for P
6. Once we do this, we just compute probabilities using the junction tree algorithm, a search algorithm, etc.
Graphical Models & Tree-width
1. Tree-width is the complexity parameter for G=(V,E)
2. Def: k-tree: with k nodes, a k-tree is a clique of size k. With n>k nodes, connect the nth node to k fully connected nodes of the k-tree on the previous n-1 nodes
3. Example: 4-tree
   note: all separators are of size 4
[Figure: 4-tree with 4 nodes; 4-tree with 5 nodes; 4-tree with 6 nodes]
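The recursive k-tree definition can be sketched in code. This is a minimal illustration, not part of the original slides: the function name `build_k_tree`, the integer node labels, and the random choice of which k-clique to attach to are all assumptions. It follows the slide's convention of starting from a clique of size k.

```python
import itertools
import random


def build_k_tree(n, k, seed=0):
    """Return the edge set of one k-tree on nodes 0..n-1.

    Hypothetical helper: starts from a clique on nodes 0..k-1, then
    attaches each new node to all members of some existing k-clique.
    """
    if k < 1 or n < k:
        raise ValueError("need n >= k >= 1")
    rng = random.Random(seed)
    # Base case: the first k nodes form a clique.
    edges = set(itertools.combinations(range(k), 2))
    cliques = [tuple(range(k))]  # k-cliques a new node may attach to
    for v in range(k, n):
        base = rng.choice(cliques)
        for u in base:
            edges.add((u, v))  # u < v always holds here
        # The new node creates new k-cliques: v plus any k-1 nodes of base.
        for sub in itertools.combinations(base, k - 1):
            cliques.append(tuple(sorted(sub + (v,))))
    return edges


edges = build_k_tree(n=6, k=4)
print(len(edges))  # C(4,2) + (6-4)*4 edges
```

With k=1 the same routine produces an ordinary tree on n nodes, which is the case the later slides on 1-trees focus on.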
Graphical Models & Tree-width
1. Def: partial k-tree: any sub-graph of a k-tree
2. Def: tree-width of a graph G is smallest k such that G is a partial k-tree.
3. Thm: The tree-width decision problem is NP-complete
   1. We mentioned this before, proven by Arnborg,
4. Thm: exact probabilistic inference (computing probabilities, etc.) is exponential in the tree-width
   1. Time-space tradeoffs can help here, but what if all of the points in the achievable region are intolerably computationally expensive?
5. The big question: what if exact inference is too expensive?
When exact inference is too expensive
1. Two general approaches: either an exact solution to an approximate problem, or an approximate solution to an exact problem.
2. Exact solution to an approximate problem
   1. Structure learning: find a low tree-width (or “cheap” in some way) graphical model that is still “high-quality” in some way, and then perform exact inference on the approximate model.
   2. This can be easy or hard depending on the tree-width, on the measure of “high-quality”, and on the learning paradigm.
3. Approximate solution to an exact problem
   1. Approximate inference tries to approximate in some way what must be computed: loopy belief propagation, sampling/pruning, variational/mean-field, and hybrids of the above
Finding k-trees
1. How do we score a k-tree?
   1. Maximum likelihood, or conditional score
2. May we assume that truth itself is a k-tree?
   1. Sometimes simplifications can be made if we assume that truth is part of a known model class, such as a k-tree for some fixed constant k independent of n=|V|, the number of nodes.
3. How to find the best 1-tree?
Finding 1-trees
1. Given P, goal is to find best 1-tree approximation of P in a maximum likelihood sense.
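The derivation on the following slides did not survive in this transcript. As a hedged sketch, the classic Chow-Liu procedure solves this problem for discrete data: estimate pairwise mutual information from samples and take a maximum-weight spanning tree. The function names and the tuple-per-sample data layout below are assumptions for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # sum over observed pairs of p(x,y) * log( p(x,y) / (p(x) p(y)) )
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def chow_liu_tree(data):
    """Return edges (i, j) of a maximum-MI spanning tree (Kruskal's algorithm).

    data: list of equal-length tuples, one tuple per sample.
    """
    d = len(data[0])
    cols = list(zip(*data))
    # All candidate edges, heaviest mutual information first.
    weighted = sorted(((mutual_information(cols[i], cols[j]), i, j)
                       for i in range(d) for j in range(i + 1, d)),
                      reverse=True)
    parent = list(range(d))  # union-find forest for cycle detection

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy data: variable 1 copies variable 0; variable 2 is independent of both.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(chow_liu_tree(samples))
```

The edge between the two perfectly dependent variables is always selected first; the remaining edge is a tie among zero-weight candidates.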
Plethora of negative results
• Chickering1996, Chickering/Meek/Heckerman2003: learning Bayesian networks in ML sense is NP-hard (“is there a BN with fixed upper bound on in-degree that achieves a given ML score?”)
• Dasgupta1999: learning polytrees in ML sense is NP-hard (“is there a polytree with fixed upper bound on in-degree with a given ML score?”) and, worse, there is a constant c such that it is NP-complete to decide whether there is a polytree with score <= c*OPT_score.
• Meek2001: learning even a path (sub-class of trees) in ML sense is NP-hard.
• Srebro/Karger2001: learning k-trees in ML sense is hard.
• So, generative model structure learning is likely to be a difficult problem (unless k=1, or P=NP).
• We next spend a bit of time talking about the Srebro/Karger result.
Optimal ML k-trees is NP-complete
Some good news …
• PAC framework: key difference, assume graph is in concept class (learn the class of k-trees). This means that if we have sampled data, we assume that the sampled data is from truth which itself is a k-tree.
• Hoeffgen’93: Can robustly (number of samples polynomial in n, 1/ε, 1/δ) PAC learn bounded tree-width graphical models, and can robustly and efficiently (algorithm polynomial in the same quantities) PAC learn 1-trees.
• Narasimhan&Bilmes2004: Can robustly and efficiently PAC learn bounded tree-width graphical models.
More good news …
• Abbeel,Koller,Ng2005: Can robustly and efficiently PAC learn bounded-degree factor graphs
– note: this does not come with a complexity guarantee. E.g., n × n grids have bounded degree but not bounded tree-width. A star has unbounded degree but bounded tree-width. Tree-width is crucial for computation in general.
How to PAC-learn such graphs …
• Mutual information is symmetric submodular
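For reference, the property can be written out as follows. The set-function notation f and the index-set notation X_A are assumptions introduced here; the slide's own equations are not in the transcript. Submodularity of f follows from submodularity of the entropy function, since f(A) = H(X_A) + H(X_{V \setminus A}) - H(X_V).

```latex
\begin{align*}
  f(A) &= I\!\left(X_A ; X_{V \setminus A}\right), \qquad A \subseteq V, \\
  f(A) &= f(V \setminus A)
    && \text{(symmetric)}, \\
  f(A) + f(B) &\ge f(A \cup B) + f(A \cap B)
    \quad \forall A, B \subseteq V
    && \text{(submodular)}.
\end{align*}
```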
• Submodularity and Optimization
(Narasimhan & Bilmes, 2004)
Another positive result
• Since mutual information is symmetric-submodular, we can find optimal partitions.
• This has implications for clustering (Narasimhan, Jojic, Bilmes ’05) and also for structure learning (can find the optimal 1-step graph decomposition by finding the optimal k-separator).
Finding ML decompositions …
• Optimal to one level
Discriminative structure
• Goal might be classification using a generative model.
• Distinction between parameters & structure
• Two possible goals:
  – 1) find one global structure that classifies well
  – 2) find class-specific structure (one per class)
• In either case, finding a good discriminative structure may render discriminative parameter learning less necessary.
Optimal discriminative structure procedure …
• Choose k (for now, let's just assume k=1)
• Find tree that best satisfies:
Properties
• Options:
  – Can fix structure and train parameters using either maximum likelihood (generative) or maximum conditional likelihood (discriminative)
  – Can learn discriminative structure, and can train either generatively or discriminatively
– In all cases, assume appropriate regularization.
• Bad news: KL-divergence not decomposable w.r.t. tree in the discriminative case.
• Goal: identify a local discriminative measure on edges in a graph (analogous to mutual information for generative case).
EAR measure
• EAR (explaining away residual) measure (Bilmes, ’98).
• Goal is to maximize EAR:
– Intuition: dependence class-conditionally, but otherwise independent
• EAR is approximation to expected log conditional posterior. Exact for independent “auxiliary” variables.
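The EAR equation itself is missing from this transcript. The sketch below assumes the definition EAR(X,Y) = I(X;Y|C) - I(X;Y), which matches the stated intuition (dependent given the class, otherwise independent). The function names and the dictionary layout `{(x, y, c): probability}` for the joint distribution are illustrative assumptions.

```python
import math
from collections import defaultdict


def marginal(joint, axes):
    """Sum a joint table {(x, y, c): p} down to the given key positions."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[a] for a in axes)] += p
    return out


def conditional_mi(joint):
    """I(X;Y|C) in nats for a joint table over (x, y, c)."""
    pxc = marginal(joint, (0, 2))
    pyc = marginal(joint, (1, 2))
    pc = marginal(joint, (2,))
    return sum(p * math.log(p * pc[(c,)] / (pxc[(x, c)] * pyc[(y, c)]))
               for (x, y, c), p in joint.items() if p > 0)


def unconditional_mi(joint):
    """I(X;Y) in nats, after marginalizing out the class C."""
    pxy = marginal(joint, (0, 1))
    px = marginal(joint, (0,))
    py = marginal(joint, (1,))
    return sum(p * math.log(p / (px[(x,)] * py[(y,)]))
               for (x, y), p in pxy.items() if p > 0)


def ear(joint):
    """Assumed definition: EAR(X,Y) = I(X;Y|C) - I(X;Y)."""
    return conditional_mi(joint) - unconditional_mi(joint)


# X and Y are independent fair bits and the class is C = X xor Y:
# the pair is informative only once the class is known.
xor_joint = {(x, y, x ^ y): 0.25 for x in (0, 1) for y in (0, 1)}
print(ear(xor_joint))
```

In the XOR example the unconditional mutual information is zero while the class-conditional one is log 2, so the edge scores highly. A pair that is equally dependent with or without the class scores zero.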
Conditional mutual information?
• Conditional mutual information is not guaranteed to discriminate well.
• Building an MST using the class-conditional mutual information I(X_i; X_j | C) as edge weights will not necessarily produce a tree with good classification properties. EAR fixes this in certain cases.
• Example: 3 features (X_1, X_2, X_3) and a class C
Generative training/structure
General Structure Learning