Expectation Propagation


DESCRIPTION

This is the deck for a Hulu internal machine learning workshop; it introduces the background, theory, and applications of the expectation propagation method.

TRANSCRIPT

Page 1: Expectation propagation

Expectation Propagation: Theory and Application

Dong Guo, Research Workshop 2013, Hulu Internal

See more details in:
http://dongguo.me/blog/2014/01/01/expectation-propagation/
http://dongguo.me/blog/2013/12/01/bayesian-ctr-prediction-for-bing/

Page 2: Expectation propagation

Outline

• Overview
• Background
• Theory
• Applications

Page 3: Expectation propagation

OVERVIEW  

Page 4: Expectation propagation

Bayesian Paradigm

• Infer the posterior distribution: Prior + Data -> Posterior -> Make decision

Note: the LDA figure is from Wikipedia, and the right figure is from the paper 'Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine'.

Page 5: Expectation propagation

Bayesian inference methods

• Exact inference
  – Belief propagation (exact on tree-structured graphs)
• Approximate inference
  – Stochastic (sampling)
  – Deterministic
    • Assumed density filtering
    • Expectation propagation
    • Variational Bayes

Page 6: Expectation propagation

Message passing

• A form of communication used in multiple domains of computer science
  – Parallel computing (MPI)
  – Object-oriented programming
  – Inter-process communication
  – Bayesian inference

• A family of methods for inferring posterior distributions

Page 7: Expectation propagation

Expectation Propagation

• Belongs to the message passing family

• Approximate method (iteration is needed)

• Very popular in Bayesian inference, especially for graphical models

Page 8: Expectation propagation

Researchers

• Thomas Minka
  – EP was proposed in his PhD thesis

• Kevin P. Murphy
  – Machine Learning: A Probabilistic Perspective

Page 9: Expectation propagation

BACKGROUND  

Page 10: Expectation propagation

Background

• (Truncated) Gaussian
• Exponential family
• Graphical model
• Factor graph
• Belief propagation
• Moment matching

Page 11: Expectation propagation

Gaussian and Truncated Gaussian

• Gaussian operations are the basis of EP inference (a sketch follows below)
  – Gaussian addition / multiplication / division
  – Gaussian integrals

• Truncated Gaussians are used in many EP applications

• See details here
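Since every EP update with Gaussian approximating factors reduces to multiplying and dividing Gaussians, here is a minimal sketch of those two operations in natural-parameter form (the function names and worked numbers are my own, not from the deck):

```python
def gauss_multiply(m1, v1, m2, v2):
    """Product of two Gaussian densities is an (unnormalized) Gaussian:
    precisions (1/v) add, and precision-weighted means (m/v) add."""
    tau = 1.0 / v1 + 1.0 / v2
    rho = m1 / v1 + m2 / v2
    return rho / tau, 1.0 / tau  # (mean, variance)

def gauss_divide(m1, v1, m2, v2):
    """Quotient of two Gaussians, used in EP to remove one factor from the
    posterior; precisions and precision-weighted means subtract."""
    tau = 1.0 / v1 - 1.0 / v2
    rho = m1 / v1 - m2 / v2
    return rho / tau, 1.0 / tau

# N(0,1) * N(1,4) gives mean 0.2, variance 0.8 ...
m, v = gauss_multiply(0.0, 1.0, 1.0, 4.0)
# ... and dividing N(1,4) back out recovers N(0,1).
print(gauss_divide(m, v, 1.0, 4.0))  # (0.0, 1.0)
```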

Page 12: Expectation propagation

Exponential family distribution

• Very good summary in Wikipedia
• Sufficient statistics of the Gaussian distribution: (x, x²)
• Typical distributions:

$$q(z) = h(z)\,g(\eta)\exp\{\eta^{T}u(z)\}$$

Note: the above 4 figures are from Wikipedia.
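As a concrete instance of the formula above, a small sketch (my own illustration) mapping a Gaussian between its (mean, variance) form and the natural parameters $\eta$ that pair with the sufficient statistics $u(x) = (x, x^2)$:

```python
def gaussian_to_natural(m, v):
    """N(x | m, v) = exp{eta1*x + eta2*x^2 - A(eta)}: eta1 = m/v, eta2 = -1/(2v)."""
    return m / v, -0.5 / v

def natural_to_gaussian(eta1, eta2):
    """Invert the mapping back to moment parameters."""
    v = -0.5 / eta2
    return eta1 * v, v  # (mean, variance)

# Round trip: (2.0, 3.0) -> natural parameters -> (2.0, 3.0)
print(natural_to_gaussian(*gaussian_to_natural(2.0, 3.0)))
```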

Page 13: Expectation propagation

Graphical Models

• Directed graph (Bayesian Network)
• Undirected graph (Conditional Random Field)

$$p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$$

[Figure: an example directed graph and an undirected graph over variables x1, x2, x3, x4]
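To make the factorization concrete, a toy sketch (the chain structure and CPT numbers are hypothetical, my own) that evaluates $p(\mathbf{x}) = \prod_k p(x_k \mid \mathrm{pa}_k)$ on a three-node Bayesian network:

```python
# Hypothetical chain x1 -> x2 -> x3 over binary variables, with made-up CPTs.
p_x1 = {0: 0.6, 1: 0.4}
p_x2_given_x1 = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}
p_x3_given_x2 = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}

def joint(x1, x2, x3):
    # p(x) = p(x1) * p(x2 | x1) * p(x3 | x2): one local factor per node.
    return p_x1[x1] * p_x2_given_x1[(x2, x1)] * p_x3_given_x2[(x3, x2)]

# The product of local CPTs is a properly normalized joint distribution.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
assert abs(total - 1.0) < 1e-12
print(joint(1, 1, 0))  # probability of one configuration
```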

Page 14: Expectation propagation

Factor graph

• Expresses the relations between variable nodes explicitly
  – A relation on an edge becomes a factor node
• Hides the difference between BN and CRF during inference
• Makes inference more intuitive

[Figure: the example graphs above redrawn as factor graphs over x1, x2, x3, x4, with factor nodes such as f_a and f_c]

Page 15: Expectation propagation

BELIEF  PROPAGATION  

Page 16: Expectation propagation

Belief Propagation Overview

• Exact Bayesian method to infer marginal distributions
  – 'sum-product' message passing

• Key components
  – Calculating the posterior distribution of a variable node
  – Two kinds of messages

Page 17: Expectation propagation

Posterior distribution of a variable node

• Factor graph

$$p(\mathbf{X}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s), \text{ for any variable } x \text{ in the graph}$$

$$p(x) = \sum_{\mathbf{X} \setminus x} p(\mathbf{X}) = \sum_{\mathbf{X} \setminus x} \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) = \prod_{s \in \mathrm{ne}(x)} \Big[\sum_{X_s} F_s(x, X_s)\Big] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x)$$

in which $\mu_{f_s \to x}(x) = \sum_{X_s} F_s(x, X_s)$

Note: the figure is from the book 'Pattern Recognition and Machine Learning'.

Page 18: Expectation propagation

Message: factor -> variable node

• Factor graph

$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{x_m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m)$$

in which $\{x_1, \ldots, x_M\}$ is the set of variables on which the factor $f_s$ depends

Note: the figure is from the book 'Pattern Recognition and Machine Learning'.

Page 19: Expectation propagation

Message: variable -> factor node

• Factor graph

$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m)$$

Summary: the posterior distribution is determined only by the factors!

Note: the figure is from the book 'Pattern Recognition and Machine Learning'.

Page 20: Expectation propagation

The complete steps of BP

• Steps to calculate the posterior distribution of a given variable node (a runnable sketch follows below)
  – Step 1: construct the factor graph
  – Step 2: treat the variable node as the root, and initialize the messages sent from the leaf nodes
  – Step 3: apply the message passing steps recursively until the root node has received messages from all of its neighbors
  – Step 4: get the marginal distribution by multiplying all of the incoming messages

Note: the figures are from the book 'Pattern Recognition and Machine Learning'.
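A compact sketch of these four steps on the simplest non-trivial case, a chain factor graph over three binary variables (the structure and the potential tables are my own illustration, not from the slides):

```python
import numpy as np

# Chain x1 -- f12 -- x2 -- f23 -- x3; the factors are made-up tables.
f12 = np.array([[1.0, 0.5],
                [0.5, 2.0]])   # f12[x1, x2]
f23 = np.array([[2.0, 1.0],
                [1.0, 0.5]])   # f23[x2, x3]

# Step 2: treat x3 as the root; the leaf variable x1 sends a unit message.
mu_x1_to_f12 = np.ones(2)
# Step 3: pass messages toward the root.
mu_f12_to_x2 = f12.T @ mu_x1_to_f12   # sum over x1 of f12 * incoming message
mu_x2_to_f23 = mu_f12_to_x2           # x2 has no other incoming factors
mu_f23_to_x3 = f23.T @ mu_x2_to_f23   # sum over x2
# Step 4: marginal at the root = product of incoming messages, normalized.
p_x3 = mu_f23_to_x3 / mu_f23_to_x3.sum()

# Check against brute-force enumeration of the (unnormalized) joint.
joint = np.einsum('ab,bc->abc', f12, f23)
assert np.allclose(p_x3, joint.sum(axis=(0, 1)) / joint.sum())
print(p_x3)  # [2/3, 1/3] for these tables
```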

Page 21: Expectation propagation

BP: example

• Infer the marginal distribution of x_3
• Infer the marginal distributions of all variables

Note: the figures are from the book 'Pattern Recognition and Machine Learning'.

Page 22: Expectation propagation

The posterior is sometimes intractable

• Examples
  – Infer the mean of a Gaussian distribution
  – Ad predictor

$$p(x \mid \theta) = (1 - w)\,\mathcal{N}(x \mid \theta, I) + w\,\mathcal{N}(x \mid 0, aI)$$

$$p(\theta) = \mathcal{N}(\theta \mid 0, bI)$$

Note: the figure is from the book 'Pattern Recognition and Machine Learning'.

Page 23: Expectation propagation

Distribution Approximation

Approximate $p(x)$ with $q(x)$, which belongs to the exponential family, such that $q(x) = h(x)\,g(\eta)\exp\{\eta^{T}u(x)\}$:

$$\mathrm{KL}(p \,\|\, q) = -\int p(x)\ln\frac{q(x)}{p(x)}\,dx = -\int p(x)\ln q(x)\,dx + \int p(x)\ln p(x)\,dx$$

$$= -\ln g(\eta) - \eta^{T}\,\mathbb{E}_{p(x)}[u(x)] + \text{const},$$

where the const terms are independent of the natural parameter $\eta$.

Minimize $\mathrm{KL}(p \,\|\, q)$ by setting the gradient with respect to $\eta$ to zero:

$$-\nabla \ln g(\eta) = \mathbb{E}_{p(x)}[u(x)]$$

By leveraging formula (2.226) in PRML:

$$\mathbb{E}_{q(x)}[u(x)] = -\nabla \ln g(\eta) = \mathbb{E}_{p(x)}[u(x)]$$

Page 24: Expectation propagation

Moment matching

• Moments of a distribution: the $k$'th moment is $M_k = \int_a^b x^k f(x)\,dx$

It is called moment matching when $q(x)$ is a Gaussian distribution; then $u(x) = (x, x^2)^T$:

$$\int q(x)\,x\,dx = \int p(x)\,x\,dx, \qquad \int q(x)\,x^2\,dx = \int p(x)\,x^2\,dx$$

$$\Rightarrow \mathrm{mean}_{q(x)} = \mathrm{mean}_{p(x)}, \qquad \mathrm{variance}_{q(x)} = \int q(x)\,x^2\,dx - (\mathrm{mean}_{q(x)})^2 = \mathrm{variance}_{p(x)}$$
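A numeric sketch of moment matching (my own example): project a two-component Gaussian mixture $p(x)$ onto a single Gaussian $q(x)$ by matching the expectations of $u(x) = (x, x^2)$, which is exactly the $\mathrm{KL}(p \,\|\, q)$ minimizer derived on the previous slide:

```python
import numpy as np

# p(x) = 0.7 * N(-1, 0.5) + 0.3 * N(2, 1.0), with made-up weights/moments.
ws = np.array([0.7, 0.3])   # mixture weights
ms = np.array([-1.0, 2.0])  # component means
vs = np.array([0.5, 1.0])   # component variances

# Moments of a mixture are the weight-averaged component moments.
mean_p = np.sum(ws * ms)              # E_p[x]
ex2_p = np.sum(ws * (vs + ms ** 2))   # E_p[x^2] = sum_i w_i * (v_i + m_i^2)
var_p = ex2_p - mean_p ** 2

# q(x) = N(mean_p, var_p) matches both expectations of u(x) = (x, x^2).
print("matched Gaussian: mean %.2f, variance %.2f" % (mean_p, var_p))
```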

Page 25: Expectation propagation

EXPECTATION PROPAGATION = Belief Propagation + Moment Matching?

Page 26: Expectation propagation

Key Idea

• Approximate each factor with a Gaussian distribution

• Approximate corresponding factor pairs one by one?

• Approximate each factor in turn in the context of all of the remaining factors (proposed by Minka)

Refine the factor $\tilde{f}_j(\theta)$ by ensuring that $q^{\mathrm{new}}(\theta) \propto \tilde{f}_j(\theta)\,q^{\backslash j}(\theta)$ is close to $f_j(\theta)\,q^{\backslash j}(\theta)$, in which $q^{\backslash j}(\theta) = \frac{q(\theta)}{\tilde{f}_j(\theta)}$

Page 27: Expectation propagation

EP: the detailed steps

1. Initialize all of the approximating factors $\tilde{f}_i(\theta)$.

2. Initialize the posterior approximation by setting $q(\theta) \propto \prod_i \tilde{f}_i(\theta)$.

3. Until convergence:

(a) Choose a factor $\tilde{f}_j(\theta)$ to refine.

(b) Remove $\tilde{f}_j(\theta)$ from the posterior by division: $q^{\backslash j}(\theta) = \frac{q(\theta)}{\tilde{f}_j(\theta)}$.

(c) Get the new posterior by setting the sufficient statistics of $q^{\mathrm{new}}(\theta)$ equal to those of $\frac{f_j(\theta)\,q^{\backslash j}(\theta)}{Z_j}$ (i.e., minimize $\mathrm{KL}\big(\frac{f_j(\theta)\,q^{\backslash j}(\theta)}{Z_j} \,\big\|\, q^{\mathrm{new}}(\theta)\big)$), in which $Z_j = \int f_j(\theta)\,q^{\backslash j}(\theta)\,d\theta$.

(d) Get the refined factor: $\tilde{f}_j(\theta) = Z_j\,\frac{q^{\mathrm{new}}(\theta)}{q^{\backslash j}(\theta)}$.

Page 28: Expectation propagation

Example: the clutter problem

• Infer the mean of a Gaussian distribution
• Want to try MLE, but the exact posterior is a mixture of 2^N Gaussians, which is intractable

• Approximate with
  – a single Gaussian approximating the mixture of Gaussians

$$p(x \mid \theta) = (1 - w)\,\mathcal{N}(x \mid \theta, I) + w\,\mathcal{N}(x \mid 0, aI)$$

$$p(\theta) = \mathcal{N}(\theta \mid 0, bI)$$

$$q(\theta) = \mathcal{N}(\theta \mid m, vI), \text{ and each factor } \tilde{f}_n(\theta) = \mathcal{N}(\theta \mid m_n, v_n I)$$

Note: the figure is from the book 'Pattern Recognition and Machine Learning'.
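Putting the Page 27 loop together with this model, here is a sketch of 1-D EP for the clutter problem. To keep it short and transparent, the tilted moments in step (c) are computed by numerical quadrature rather than the closed forms in PRML, there is no damping or guard against negative site precisions (which robust EP code would add), and the constants, grid, and synthetic data are my own choices:

```python
import numpy as np

# 1-D clutter model: x_n ~ (1 - w) N(theta, 1) + w N(0, a), theta ~ N(0, b).
w, a, b = 0.2, 10.0, 100.0
rng = np.random.default_rng(0)
n, theta_true = 50, 2.0
is_clutter = rng.random(n) < w
x = np.where(is_clutter, rng.normal(0.0, np.sqrt(a), n),
             rng.normal(theta_true, 1.0, n))

def npdf(y, m, v):
    return np.exp(-0.5 * (y - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

# Sites (approximating factors) in natural form; zero = initially flat.
site_tau = np.zeros(n)   # site precisions
site_rho = np.zeros(n)   # site precision-times-mean
grid = np.linspace(-10.0, 10.0, 4001)

for sweep in range(10):
    for i in range(n):
        # Global q(theta): prior plus all sites (natural parameters add).
        tau_q = 1.0 / b + site_tau.sum()
        rho_q = site_rho.sum()
        # (b) cavity: divide out site i by subtracting its parameters.
        tau_c, rho_c = tau_q - site_tau[i], rho_q - site_rho[i]
        m_c, v_c = rho_c / tau_c, 1.0 / tau_c
        # (c) tilted distribution f_i(theta) * cavity; match its moments.
        lik = (1 - w) * npdf(x[i], grid, 1.0) + w * npdf(x[i], 0.0, a)
        tilted = lik * npdf(grid, m_c, v_c)
        Z = np.trapz(tilted, grid)
        m_new = np.trapz(grid * tilted, grid) / Z
        v_new = np.trapz(grid ** 2 * tilted, grid) / Z - m_new ** 2
        # (d) refined site = moment-matched posterior / cavity.
        site_tau[i] = 1.0 / v_new - tau_c
        site_rho[i] = m_new / v_new - rho_c

tau_q = 1.0 / b + site_tau.sum()
print("q(theta): mean %.3f, variance %.4f" % (site_rho.sum() / tau_q, 1.0 / tau_q))
```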

Page 29: Expectation propagation

Example: the clutter problem (2)

• Approximate a complex factor (e.g., a mixture of Gaussians) with a Gaussian

$f_n(\theta)$ in blue, $\tilde{f}_n(\theta)$ in red, and $q^{\backslash n}(\theta)$ in green. Remember that the variance of $q^{\backslash n}(\theta)$ is usually very small, so $\tilde{f}_n(\theta)$ only needs to approximate $f_n(\theta)$ over a small range.

Note: the above 2 figures are from the book 'Pattern Recognition and Machine Learning'.

Page 30: Expectation propagation

Application: Bayesian CTR predictor for Bing

• See the details here
  – Inference step by step
  – Making predictions

• Some insights (a sketch of the update follows below)
  – The variance of each feature weight decreases after every exposure
  – A sample with more features will have a bigger variance
• Independence assumption for the features
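A sketch in the style of the paper's online update (one ADF/EP step per impression, with a factorized Gaussian over feature weights and a probit link). The feature names and the prior are hypothetical; v and w are the truncated-Gaussian correction functions from the Graepel et al. paper:

```python
from math import erf, exp, pi, sqrt

def pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def v(z):            # mean correction: N(z) / Phi(z)
    return pdf(z) / cdf(z)

def w(z):            # variance correction: v(z) * (v(z) + z), always >= 0
    return v(z) * (v(z) + z)

def update(weights, active, y, beta=0.5):
    """One online update for label y in {+1, -1} (click / no click).
    weights maps feature -> (mean, variance); 'active' features fired."""
    total_var = beta ** 2 + sum(weights[f][1] for f in active)
    t = sum(weights[f][0] for f in active)
    z = y * t / sqrt(total_var)
    for f in active:
        m, s2 = weights[f]
        m += y * (s2 / sqrt(total_var)) * v(z)
        s2 *= 1.0 - (s2 / total_var) * w(z)   # variance shrinks each exposure
        weights[f] = (m, s2)

# Hypothetical impression with three active features, each with prior N(0, 1).
weights = {f: (0.0, 1.0) for f in ("ad=42", "query=shoes", "position=top")}
update(weights, list(weights), y=+1)
print(weights)
```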

Page 31: Expectation propagation

Experimentation

• The dataset is very inhomogeneous

• Performance (AUC in the table below)
  – Other metrics

• Pros: speed, low parameter-tuning cost, online learning support, interpretability, easy to add more factors

• Cons: sparsity
• Code

Model          FTRL     OWLQN    Ad predictor
AUC            0.638    0.641    0.639

Page 32: Expectation propagation

Application: XBOX skill rating system

See details on pages 793-798 of Machine Learning: A Probabilistic Perspective.

Note: the figure is from the paper 'TrueSkill: A Bayesian Skill Rating System'.
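For flavor, a sketch of the two-player, no-draw update from the TrueSkill paper (the draw margin is taken as zero for brevity, and the prior constants are the paper's defaults; treat this as an illustration rather than the full factor-graph inference):

```python
from math import erf, exp, pi, sqrt

def v(t):  # additive mean correction, N(t) / Phi(t)
    return (exp(-0.5 * t * t) / sqrt(2.0 * pi)) / (0.5 * (1.0 + erf(t / sqrt(2.0))))

def w(t):  # multiplicative variance correction
    return v(t) * (v(t) + t)

def trueskill_update(winner, loser, beta=25.0 / 6.0):
    """Each player is (mu, sigma^2); beta is the performance noise."""
    (mu_w, s2_w), (mu_l, s2_l) = winner, loser
    c = sqrt(2.0 * beta ** 2 + s2_w + s2_l)
    t = (mu_w - mu_l) / c
    mu_w += s2_w / c * v(t)              # winner's skill estimate goes up
    mu_l -= s2_l / c * v(t)              # loser's goes down
    s2_w *= 1.0 - s2_w / c ** 2 * w(t)   # both become more certain
    s2_l *= 1.0 - s2_l / c ** 2 * w(t)
    return (mu_w, s2_w), (mu_l, s2_l)

# Two fresh players with the paper's default prior N(25, (25/3)^2).
p1 = p2 = (25.0, (25.0 / 3.0) ** 2)
print(trueskill_update(p1, p2))
```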

Page 33: Expectation propagation

Apply to all Bayesian models

• Infer.NET (Microsoft / Bishop)
  – A framework for running Bayesian inference in graphical models
  – Model-based machine learning

Page 34: Expectation propagation

References

• Books
  – Chapters 2, 8, and 10 of Pattern Recognition and Machine Learning
  – Chapter 22 of Machine Learning: A Probabilistic Perspective

• Papers
  – A Family of Algorithms for Approximate Bayesian Inference
  – From Belief Propagation to Expectation Propagation
  – TrueSkill: A Bayesian Skill Rating System
  – Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine

• Roadmap for EP