02: Probabilistic Inference in Graphical Models


Slide 1: Part 3: Probabilistic Inference in Graphical Models
Sebastian Nowozin and Christoph H. Lampert
Colorado Springs, 25th June 2011
Outline: Belief Propagation, Variational Inference, Sampling, Break

Slide 2: Belief Propagation
- Message-passing algorithm
- Exact and optimal for tree-structured graphs
- Approximate for cyclic graphs

Slides 3-5: Example: Inference on Chains
Chain factor graph: variables $Y_i, Y_j, Y_k, Y_l$ linked by pairwise factors $F$, $G$, $H$.

$Z = \sum_{y \in \mathcal{Y}} \exp(-E(y))$
$\;\;= \sum_{y_i \in \mathcal{Y}_i} \sum_{y_j \in \mathcal{Y}_j} \sum_{y_k \in \mathcal{Y}_k} \sum_{y_l \in \mathcal{Y}_l} \exp(-E(y_i, y_j, y_k, y_l))$
$\;\;= \sum_{y_i} \sum_{y_j} \sum_{y_k} \sum_{y_l} \exp\big(-(E_F(y_i, y_j) + E_G(y_j, y_k) + E_H(y_k, y_l))\big)$

Slides 6-8: Example: Inference on Chains (cont)
The exponential of the sum factorizes, and each sum can be pushed past the factors that do not depend on its variable:

$Z = \sum_{y_i} \sum_{y_j} \sum_{y_k} \sum_{y_l} \exp(-E_F(y_i, y_j)) \exp(-E_G(y_j, y_k)) \exp(-E_H(y_k, y_l))$
$\;\;= \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \sum_{y_l} \exp(-E_H(y_k, y_l))$

Slides 9-10: Example: Inference on Chains (cont)
The innermost sum is a message $r_{H \to Y_k} \in \mathbb{R}^{\mathcal{Y}_k}$, a vector with one entry per state $y_k$:

$r_{H \to Y_k}(y_k) = \sum_{y_l \in \mathcal{Y}_l} \exp(-E_H(y_k, y_l))$
$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k \in \mathcal{Y}_k} \exp(-E_G(y_j, y_k)) \, r_{H \to Y_k}(y_k)$
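The distributive-law computation above is easy to check numerically. Below is a minimal NumPy sketch (mine, not from the slides; the energy tables are random placeholders) that computes $Z$ for the four-variable chain both by brute-force enumeration and by passing the sums inward as messages:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 4  # states per variable

# Random placeholder energies for the chain Y_i - F - Y_j - G - Y_k - H - Y_l
E_F, E_G, E_H = (rng.normal(size=(K, K)) for _ in range(3))

# Brute force: enumerate all K**4 joint states
Z_brute = sum(
    np.exp(-(E_F[yi, yj] + E_G[yj, yk] + E_H[yk, yl]))
    for yi in range(K) for yj in range(K)
    for yk in range(K) for yl in range(K))

# Distributive law: push the sums inward, innermost first
r_H = np.exp(-E_H).sum(axis=1)   # r_{H->Y_k}(y_k) = sum_{y_l} exp(-E_H(y_k, y_l))
r_G = np.exp(-E_G) @ r_H         # r_{G->Y_j}(y_j) = sum_{y_k} exp(-E_G(y_j, y_k)) r_H(y_k)
r_F = np.exp(-E_F) @ r_G
Z_msgs = r_F.sum()

print(Z_brute, Z_msgs)           # agree up to floating-point error
```

The message version costs $O(K^2)$ per factor instead of $O(K^4)$ in total.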
Slides 11-12: Example: Inference on Chains (cont)
Repeating the step for factor $G$ yields a message $r_{G \to Y_j} \in \mathbb{R}^{\mathcal{Y}_j}$:

$r_{G \to Y_j}(y_j) = \sum_{y_k \in \mathcal{Y}_k} \exp(-E_G(y_j, y_k)) \, r_{H \to Y_k}(y_k)$
$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \, r_{G \to Y_j}(y_j)$

Slides 13-14: Example: Inference on Trees
Tree factor graph: the chain $Y_i - F - Y_j - G - Y_k$ with two further factors at $Y_k$: $H$ connecting to $Y_l$ and $I$ connecting to $Y_m$.

$Z = \sum_{y \in \mathcal{Y}} \exp(-E(y)) = \sum_{y_i} \sum_{y_j} \sum_{y_k} \sum_{y_l} \sum_{y_m} \exp\big(-(E_F(y_i, y_j) + \dots + E_I(y_k, y_m))\big)$
$\;\;= \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \underbrace{\sum_{y_l} \exp(-E_H(y_k, y_l))}_{r_{H \to Y_k}(y_k)} \underbrace{\sum_{y_m} \exp(-E_I(y_k, y_m))}_{r_{I \to Y_k}(y_k)}$

Slides 15-17: Example: Inference on Trees (cont)
At $Y_k$, the incoming factor messages are multiplied into a variable-to-factor message $q_{Y_k \to G}$:

$q_{Y_k \to G}(y_k) = r_{H \to Y_k}(y_k) \cdot r_{I \to Y_k}(y_k)$
$Z = \sum_{y_i} \sum_{y_j} \exp(-E_F(y_i, y_j)) \sum_{y_k} \exp(-E_G(y_j, y_k)) \, q_{Y_k \to G}(y_k)$

Slides 18-19: Factor Graph Sum-Product Algorithm
- Messages: a pair of vectors on each factor graph edge $(i, F) \in \mathcal{E}$:
  1. $r_{F \to Y_i} \in \mathbb{R}^{\mathcal{Y}_i}$: factor-to-variable message
  2. $q_{Y_i \to F} \in \mathbb{R}^{\mathcal{Y}_i}$: variable-to-factor message
- The algorithm iteratively updates the messages
- After convergence, $Z$ and the factor marginals $\mu_F$ can be obtained from the messages

Slide 20: Sum-Product: Variable-to-Factor Message
With $M(i) = \{F \in \mathcal{F} : (i, F) \in \mathcal{E}\}$ the set of factors adjacent to variable $i$, the variable-to-factor message is

$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i).$

Slide 21: Sum-Product: Factor-to-Variable Message
The factor-to-variable message sums over all factor states consistent with $y_i$:

$r_{F \to Y_i}(y_i) = \sum_{y_F \in \mathcal{Y}_F,\, [y_F]_i = y_i} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j).$
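For pairwise factors, the two updates are a few lines of NumPy. This is a hedged sketch under assumed data structures (none of the names come from the slides): `r` and `q` are dicts of message vectors keyed by directed edges, `M[i]` lists the factors adjacent to variable `i`, and `N[F]` is the ordered pair of variables of factor `F`, whose energy table `E_F` is indexed in that order:

```python
import numpy as np

def update_q(i, F, r, M):
    """Variable-to-factor: product of all other incoming factor messages.
    An empty product yields the all-ones vector (cf. slide 22 below)."""
    msgs = [r[(Fp, i)] for Fp in M[i] if Fp != F]
    return np.prod(msgs, axis=0) if msgs else np.ones_like(r[(F, i)])

def update_r(F, i, E_F, q, N):
    """Factor-to-variable, pairwise case: sum over the other variable's
    states of exp(-E_F) times its variable-to-factor message."""
    j = next(v for v in N[F] if v != i)
    table = np.exp(-E_F) if N[F][0] == i else np.exp(-E_F).T  # index [y_i, y_j]
    return table @ q[(j, F)]
```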
Slides 22-24: Message Scheduling

$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$
$r_{F \to Y_i}(y_i) = \sum_{y_F: [y_F]_i = y_i} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)$

- Problem: the message updates depend on each other
- No dependencies if the product is empty (an empty product equals 1)
- For tree-structured graphs we can resolve all dependencies

Slide 25: Message Scheduling: Trees
1. Select one variable node as the tree root
2. Compute leaf-to-root messages
3. Compute root-to-leaf messages
(Figure: a tree factor graph with messages numbered 1-10, giving one valid ordering in which each message is computed exactly once.)

Slide 26: Inference Results: Z and Marginals
- Partition function, evaluated at the root $r$:

$Z = \sum_{y_r \in \mathcal{Y}_r} \prod_{F \in M(r)} r_{F \to Y_r}(y_r)$

- Marginal distributions, for each factor:

$\mu_F(y_F) = p(Y_F = y_F) = \frac{1}{Z} \exp(-E_F(y_F)) \prod_{i \in N(F)} q_{Y_i \to F}(y_i)$

Slide 27: Max-Product/Max-Sum Algorithm
- Belief propagation for MAP inference: $y^* = \operatorname{argmax}_{y \in \mathcal{Y}} p(Y = y | x, w)$
- Exact for trees
- For cyclic graphs: not as well understood as the sum-product algorithm

$q_{Y_i \to F}(y_i) = \sum_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$
$r_{F \to Y_i}(y_i) = \max_{y_F: [y_F]_i = y_i} \Big( -E_F(y_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j) \Big)$

Slide 28: Sum-Product/Max-Sum Comparison
Sum-Product:
$q_{Y_i \to F}(y_i) = \prod_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$
$r_{F \to Y_i}(y_i) = \sum_{y_F: [y_F]_i = y_i} \exp(-E_F(y_F)) \prod_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j)$
Max-Sum:
$q_{Y_i \to F}(y_i) = \sum_{F' \in M(i) \setminus \{F\}} r_{F' \to Y_i}(y_i)$
$r_{F \to Y_i}(y_i) = \max_{y_F: [y_F]_i = y_i} \Big( -E_F(y_F) + \sum_{j \in N(F) \setminus \{i\}} q_{Y_j \to F}(y_j) \Big)$
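As a concrete instance of the max-sum updates, here is a minimal sketch (mine, with random placeholder energies) that computes the MAP state of the four-variable chain from the earlier example by a leaf-to-root max-sum pass with backpointers:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
# Placeholder pairwise energies for the chain Y_i - F - Y_j - G - Y_k - H - Y_l
energies = [rng.normal(size=(K, K)) for _ in range(3)]  # E_F, E_G, E_H

# Leaf-to-root max-sum messages (root = Y_i), with argmax backpointers
r_msg = np.zeros(K)                    # message arriving at the current variable
backptr = []
for E in reversed(energies):           # process H, then G, then F
    scores = -E + r_msg[None, :]       # scores[a, b] = -E(a, b) + r(b)
    backptr.append(scores.argmax(axis=1))
    r_msg = scores.max(axis=1)

# Decode: pick the root state, then follow backpointers to the leaves
y = [int(r_msg.argmax())]
for bp in reversed(backptr):
    y.append(int(bp[y[-1]]))
print("MAP state (y_i, y_j, y_k, y_l):", y)
```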
Slides 29-30: Example: Pictorial Structures
- Tree-structured model for articulated pose (Felzenszwalb and Huttenlocher, 2000), (Fischler and Elschlager, 1973)
- (Figure: body-part variables $Y_{\text{top}}, Y_{\text{head}}, Y_{\text{torso}}$, arms, hands, legs, and feet, connected in a tree, with image evidence $X$ attached through factors $F^{(1)}_{\text{top}}, F^{(2)}_{\text{top,head}}, \ldots$)
- Body-part variables with states: a discretized tuple $(x, y, s, \theta)$ of position $(x, y)$, scale $s$, and rotation $\theta$

Slide 31: Example: Pictorial Structures (cont)

$r_{F \to Y_i}(y_i) = \max_{y_j \in \mathcal{Y}_j} \big( -E_F(y_i, y_j) + q_{Y_j \to F}(y_j) \big) \quad (1)$

- Because $\mathcal{Y}_i$ is large ($\approx$ 500k states), enumerating $\mathcal{Y}_i \times \mathcal{Y}_j$ is too expensive
- (Felzenszwalb and Huttenlocher, 2000) use a special form for $E_F(y_i, y_j)$ so that (1) is computable in $O(|\mathcal{Y}_i|)$

Slides 32-33: Loopy Belief Propagation
- Key difference: there is no schedule that removes the dependencies
- But: the message computations are still well defined
- Therefore, classic loopy belief propagation (Pearl, 1988):
  1. Initialize message vectors to 1 (sum-product) or 0 (max-sum)
  2. Update messages, hoping for convergence
  3. Upon convergence, treat the beliefs $\mu_F$ as approximate marginals
- Different messaging schedules (synchronous/asynchronous, static/dynamic)
- Improvements: generalized BP (Yedidia et al., 2001), convergent BP (Heskes, 2006)

Slide 34: Synchronous Iteration
(Figure: a 3x3 grid of variables $Y_i, \ldots, Y_q$ with pairwise factors $A$-$L$; one synchronous iteration recomputes all messages in parallel from the previous iteration's values.)
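A compact synchronous loopy BP sketch for pairwise MRFs, written with the equivalent variable-to-variable messages (each pairwise factor contributes one message per direction). The API and all names are mine; messages are normalized each iteration for numerical stability, which does not change the resulting beliefs:

```python
import numpy as np

def loopy_sum_product(unary, pairwise, edges, n_iters=50):
    """Synchronous loopy sum-product for a pairwise MRF.
    unary[i]: (K_i,) energies; pairwise[(i, j)]: (K_i, K_j) energies per
    undirected edge. Returns approximate variable marginals (beliefs).
    Assumes every variable touches at least one edge."""
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for (a, b) in edges:
        nbrs[a].append(b); nbrs[b].append(a)
    phi = [np.exp(-u) for u in unary]
    # m[(i, j)]: message from variable i to variable j, initialized to 1
    m = {(i, j): np.ones(len(unary[j]))
         for (a, b) in edges for (i, j) in ((a, b), (b, a))}

    def table(i, j):  # edge potential exp(-E), indexed [y_i, y_j]
        return (np.exp(-pairwise[(i, j)]) if (i, j) in pairwise
                else np.exp(-pairwise[(j, i)]).T)

    for _ in range(n_iters):
        new = {}
        for (i, j) in m:
            inc = [m[(k, i)] for k in nbrs[i] if k != j]
            b = phi[i] * (np.prod(inc, axis=0) if inc else 1.0)
            msg = table(i, j).T @ b            # sum over y_i
            new[(i, j)] = msg / msg.sum()      # normalize for stability
        m = new                                # synchronous update

    beliefs = []
    for i in range(n):
        b = phi[i] * np.prod([m[(k, i)] for k in nbrs[i]], axis=0)
        beliefs.append(b / b.sum())
    return beliefs
```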
Slide 35: Variational Inference: Mean Field Methods
- Mean field methods (Jordan et al., 1999), (Xing et al., 2003)
- Distribution $p(y|x,w)$: inference is intractable
- Approximate it by a distribution $q(y)$ from a tractable family $\mathcal{Q}$

Slides 36-37: Mean Field Methods (cont)

$q^* = \operatorname{argmin}_{q \in \mathcal{Q}} D_{\mathrm{KL}}(q(y) \,\|\, p(y|x,w))$

$D_{\mathrm{KL}}(q(y) \,\|\, p(y|x,w)) = \sum_{y \in \mathcal{Y}} q(y) \log \frac{q(y)}{p(y|x,w)}$
$\;\;= \sum_{y \in \mathcal{Y}} q(y) \log q(y) - \sum_{y \in \mathcal{Y}} q(y) \log p(y|x,w)$
$\;\;= -H(q) + \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q) \, E_F(y_F; x_F, w) + \log Z(x,w),$

where $H(q)$ is the entropy of $q$ and $\mu_{F,y_F}$ are the marginals of $q$. (The form of $\mu$ depends on $\mathcal{Q}$.)

Slides 38-39: Mean Field Methods (cont)
- When $\mathcal{Q}$ is rich: $q^*$ is close to $p$
- Marginals of $q^*$ approximate marginals of $p$
- Gibbs inequality: $D_{\mathrm{KL}}(q(y) \,\|\, p(y|x,w)) \ge 0$
- Therefore, we have a lower bound:

$\log Z(x,w) \ge H(q) - \sum_{F \in \mathcal{F}} \sum_{y_F \in \mathcal{Y}_F} \mu_{F,y_F}(q) \, E_F(y_F; x_F, w).$

Slides 40-41: Naive Mean Field
- Set $\mathcal{Q}$: all distributions of the form $q(y) = \prod_{i \in V} q_i(y_i)$.
- The marginals $\mu_{F,y_F}$ take the form $\mu_{F,y_F}(q) = \prod_{i \in N(F)} q_i(y_i)$.
- The entropy $H(q)$ decomposes: $H(q) = \sum_{i \in V} H_i(q_i) = -\sum_{i \in V} \sum_{y_i \in \mathcal{Y}_i} q_i(y_i) \log q_i(y_i)$.

Slide 42: Naive Mean Field (cont)
Putting it together:

$\operatorname{argmin}_{q \in \mathcal{Q}} D_{\mathrm{KL}}(q(y) \,\|\, p(y|x,w))$
$\;\;= \operatorname{argmax}_{q \in \mathcal{Q}} \Big( H(q) - \sum_{F} \sum_{y_F} \mu_{F,y_F}(q) E_F(y_F; x_F, w) - \log Z(x,w) \Big)$
$\;\;= \operatorname{argmax}_{q \in \mathcal{Q}} \Big( H(q) - \sum_{F} \sum_{y_F} \mu_{F,y_F}(q) E_F(y_F; x_F, w) \Big)$
$\;\;= \operatorname{argmax}_{q \in \mathcal{Q}} \Big( -\sum_{i \in V} \sum_{y_i} q_i(y_i) \log q_i(y_i) - \sum_{F} \sum_{y_F} \Big( \prod_{i \in N(F)} q_i(y_i) \Big) E_F(y_F; x_F, w) \Big).$

Optimizing over $\mathcal{Q}$ is optimizing over $(q_i)_{i \in V}$, each $q_i$ ranging over the probability simplex $\Delta_i$.

Slide 43: Naive Mean Field (cont)
- Non-concave maximization problem, hard in general (for general $E_F$ and pairwise or higher-order factors)
- Block coordinate ascent: a closed-form update exists for each $q_i$

Slide 44: Naive Mean Field (cont)
Closed-form update for $q_i$:

$q_i(y_i) = \exp\Big( -\gamma_i - \sum_{F \in M(i)} \; \sum_{y_F \in \mathcal{Y}_F:\, [y_F]_i = y_i} \Big( \prod_{j \in N(F) \setminus \{i\}} q_j(y_j) \Big) E_F(y_F; x_F, w) \Big),$

with the log-normalizer

$\gamma_i = \log \sum_{y_i \in \mathcal{Y}_i} \exp\Big( - \sum_{F \in M(i)} \; \sum_{y_F:\, [y_F]_i = y_i} \Big( \prod_{j \in N(F) \setminus \{i\}} q_j(y_j) \Big) E_F(y_F; x_F, w) \Big).$

Looks scary, but it is very easy to implement.
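Specialized to a model with unary and pairwise energy tables, the closed-form update becomes a short coordinate-ascent loop. A minimal sketch under assumed data structures (the names are mine, not the slides'):

```python
import numpy as np

def naive_mean_field(unary, pairwise, edges, n_sweeps=20):
    """Block coordinate ascent for naive mean field on a pairwise MRF.
    unary[i]: (K_i,) energies; pairwise[(i, j)]: (K_i, K_j) energies."""
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for (a, b) in edges:
        nbrs[a].append(b); nbrs[b].append(a)
    q = [np.full(len(u), 1.0 / len(u)) for u in unary]  # uniform start

    def table(i, j):  # energy table indexed [y_i, y_j]
        return pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T

    for _ in range(n_sweeps):
        for i in range(n):
            # expected energy of each state y_i under the neighbors' q_j
            s = unary[i].copy()
            for j in nbrs[i]:
                s += table(i, j) @ q[j]   # sum_{y_j} E(y_i, y_j) q_j(y_j)
            logq = -s - (-s).max()        # subtract max for stability
            q[i] = np.exp(logq) / np.exp(logq).sum()
    return q
```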
Slide 45: Structured Mean Field
- The naive mean field approximation can be poor
- Structured mean field (Saul and Jordan, 1995) uses factorial approximations with larger tractable subgraphs
- Block coordinate ascent: optimize an entire subgraph at once, using exact probabilistic inference on trees

Slides 46-48: Sampling
Probabilistic inference and related tasks require the computation of $\mathbb{E}_{y \sim p(y|x,w)}[h(x, y)]$, where $h: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is an arbitrary but well-behaved function.
- Inference: $h_{F,z_F}(x, y) = I(y_F = z_F)$, so that $\mathbb{E}_{y \sim p(y|x,w)}[h_{F,z_F}(x, y)] = p(y_F = z_F | x, w)$, the marginal probability of factor $F$ taking state $z_F$.
- Parameter estimation: a feature map $h(x, y) = \phi(x, y)$ with $\phi: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^d$ gives $\mathbb{E}_{y \sim p(y|x,w)}[\phi(x, y)]$, the expected sufficient statistics under the model distribution.

Slides 49-50: Monte Carlo
Sample approximation from samples $y^{(1)}, y^{(2)}, \ldots$:

$\mathbb{E}_{y \sim p(y|x,w)}[h(x, y)] \approx \frac{1}{S} \sum_{s=1}^{S} h(x, y^{(s)}).$

- When the expectation exists, the law of large numbers guarantees convergence for $S \to \infty$
- For $S$ independent samples, the approximation error is $O(1/\sqrt{S})$, independent of the dimension $d$
- Problem: producing exact samples $y^{(s)}$ from $p(y|x)$ is hard
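A tiny illustration of the $O(1/\sqrt{S})$ behaviour, assuming (unlike the graphical-model case) that exact samples are available; the three-state distribution reused here is the stationary distribution from the Markov chain slide below:

```python
import numpy as np

rng = np.random.default_rng(2)
p = np.array([0.1905, 0.3571, 0.4524])  # known toy distribution over {A, B, C}
h = np.array([1.0, 0.0, 0.0])           # indicator of state A, so E[h] = p(A)

for S in [100, 10_000, 1_000_000]:
    samples = rng.choice(3, size=S, p=p)
    est = h[samples].mean()
    print(S, est, abs(est - 0.1905))    # error shrinks roughly like 1/sqrt(S)
```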
Slide 51: Markov Chain Monte Carlo (MCMC)
- Construct a Markov chain with $p(y|x)$ as its stationary distribution
- Here: $\mathcal{Y} = \{A, B, C\}$
- Here: $p(y) = (0.1905, 0.3571, 0.4524)$
(Figure: a three-state transition diagram over $A$, $B$, $C$ with edge probabilities 0.5, 0.1, 0.25, 0.4, 0.5, 0.5, 0.25, 0.5.)

Slide 52: Markov Chains
Definition (Finite Markov chain). Given a finite set $\mathcal{Y}$ and a matrix $P \in \mathbb{R}^{\mathcal{Y} \times \mathcal{Y}}$, a random process $(Z_1, Z_2, \ldots)$ with $Z_t$ taking values in $\mathcal{Y}$ is a Markov chain with transition matrix $P$ if

$p(Z_{t+1} = y^{(j)} | Z_1 = y^{(1)}, Z_2 = y^{(2)}, \ldots, Z_t = y^{(t)}) = p(Z_{t+1} = y^{(j)} | Z_t = y^{(t)}) = P_{y^{(t)}, y^{(j)}}.$

Slides 53-56: MCMC Simulation
Steps:
1. Construct a Markov chain with stationary distribution $p(y|x,w)$
2. Start at some $y^{(0)}$
3. Perform a random walk according to the Markov chain
4. After a sufficient number $S$ of steps, stop and treat $y^{(S)}$ as a sample from $p(y|x,w)$; in the refined scheme, output $y^{(i)}$ after each step as a (dependent) sample
- Justified by the ergodic theorem
- In practice: discard a fixed number of initial samples (the burn-in phase) to forget the starting point
- In practice: afterwards, use between 100 and 100,000 samples
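The transition diagram itself does not survive extraction; the matrix below is one reconstruction consistent with the edge labels and the stated stationary distribution $(0.1905, 0.3571, 0.4524) = (8, 15, 19)/42$, so treat its entries as an assumption. The sketch simulates the random walk and checks that the empirical state frequencies approach the stationary distribution:

```python
import numpy as np

rng = np.random.default_rng(3)
states = ["A", "B", "C"]
# Assumed reconstruction of the slide's diagram, chosen so that
# pi = (8, 15, 19)/42 = (0.1905, 0.3571, 0.4524) is stationary
P = np.array([[0.25, 0.50, 0.25],
              [0.40, 0.10, 0.50],
              [0.00, 0.50, 0.50]])
pi = np.array([8, 15, 19]) / 42
assert np.allclose(pi @ P, pi)   # stationarity check

# Random walk; empirical state frequencies approach pi
z, counts, T = 0, np.zeros(3), 200_000
for _ in range(T):
    z = rng.choice(3, p=P[z])
    counts[z] += 1
print(dict(zip(states, counts / T)))  # close to (0.1905, 0.3571, 0.4524)
```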
Slides 57-58: Gibbs Sampler
- How to construct a suitable Markov chain?
- Metropolis-Hastings chains: almost always possible
- Special case: the Gibbs sampler (Geman and Geman, 1984). One step:
  1. Select a variable $y_i$
  2. Sample $y_i \sim p(y_i | y_{V \setminus \{i\}}, x)$

Slide 59: Gibbs Sampler (cont)
The conditional needed in step 2 is tractable, because the partition function cancels:

$p(y_i | y^{(t)}_{V \setminus \{i\}}, x, w) = \frac{p(y_i, y^{(t)}_{V \setminus \{i\}} | x, w)}{\sum_{y'_i \in \mathcal{Y}_i} p(y'_i, y^{(t)}_{V \setminus \{i\}} | x, w)} = \frac{\tilde p(y_i, y^{(t)}_{V \setminus \{i\}} | x, w)}{\sum_{y'_i \in \mathcal{Y}_i} \tilde p(y'_i, y^{(t)}_{V \setminus \{i\}} | x, w)},$

where $\tilde p$ denotes the unnormalized distribution.

Slide 60: Gibbs Sampler (cont)
All factors not containing $y_i$ cancel as well, leaving only the adjacent factors:

$p(y_i | y^{(t)}_{V \setminus \{i\}}, x, w) = \frac{\prod_{F \in M(i)} \exp(-E_F(y_i, y^{(t)}_{F \setminus \{i\}}, x_F, w))}{\sum_{y'_i \in \mathcal{Y}_i} \prod_{F \in M(i)} \exp(-E_F(y'_i, y^{(t)}_{F \setminus \{i\}}, x_F, w))}$

Slides 61-64: Example: Gibbs Sampler
(Figures: a foreground/background segmentation example.)
- Cumulative mean of the estimated $p(\text{foreground})$ over 3000 Gibbs sweeps
- Gibbs sample means and their average over 50 repetitions: $p(y_i = \text{foreground}) \approx 0.770 \pm 0.011$
- Estimated one-unit standard deviation over 3000 Gibbs sweeps: the standard deviation behaves as $O(1/\sqrt{S})$

Slides 65-66: Gibbs Sampler Transitions
(Figures only: successive states visited by the Gibbs sampler.)

Slide 67: Block Gibbs Sampler
Extension to larger groups of variables:
1. Select a block $y_I$
2. Sample $y_I \sim p(y_I | y_{V \setminus I}, x)$
Tractable if sampling from the blocks is tractable.

Slide 68: Summary: Sampling
Two families of Monte Carlo methods:
1. Markov Chain Monte Carlo (MCMC)
2. Importance Sampling
Properties:
- Often simple to implement, general, parallelizable
- (Cannot compute the partition function $Z$)
- Can fail without any warning signs
References:
- (Häggström, 2000), introduction to Markov chains
- (Liu, 2001), excellent Monte Carlo introduction
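To connect the conditional formula of slide 60 to the estimates in the example, here is a minimal Gibbs sampler sketch for a binary pairwise MRF that estimates per-variable marginals by averaging samples after burn-in. The model and all names are illustrative, not the slides' segmentation code:

```python
import numpy as np

rng = np.random.default_rng(4)

def gibbs_marginals(unary, pairwise, edges, n_sweeps=3000, burn_in=500):
    """Gibbs sampler for a binary pairwise MRF; unary[i] is a (2,) energy
    vector, pairwise[(i, j)] a (2, 2) energy table per undirected edge.
    Returns the estimated p(y_i = 1) for every variable."""
    n = len(unary)
    nbrs = {i: [] for i in range(n)}
    for (a, b) in edges:
        nbrs[a].append(b); nbrs[b].append(a)
    y = rng.integers(0, 2, size=n)       # random initial state
    counts, kept = np.zeros(n), 0
    for sweep in range(n_sweeps):
        for i in range(n):
            # energies of y_i = 0 and y_i = 1 with all other variables fixed
            e = unary[i].copy()
            for j in nbrs[i]:
                tab = pairwise[(i, j)] if (i, j) in pairwise else pairwise[(j, i)].T
                e += tab[:, y[j]]
            p1 = 1.0 / (1.0 + np.exp(e[1] - e[0]))   # p(y_i = 1 | rest)
            y[i] = rng.random() < p1
        if sweep >= burn_in:             # discard burn-in samples
            counts += y
            kept += 1
    return counts / kept
```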
Slide 69: Coffee Break
Continuing at 10:30am. Slides available at http://www.nowozin.net/sebastian/cvpr2011tutorial/