TRANSCRIPT
Source slides: baydin-2019-neurips-slides.pdf (NeurIPS 2019)
Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model
Atılım Güneş Baydin, Lukas Heinrich, Wahid Bhimji, Lei Shao, Saeid Naderiparizi, Andreas Munk, Jialin Liu, Bradley Gram-Hansen, Gilles Louppe, Lawrence Meadows, Philip Torr, Victor Lee, Prabhat, Kyle Cranmer, Frank Wood
Probabilistic programming
Deep learning
4
Neural network (differentiable program)
Model is learned from data as a differentiable transformation
Inputs Outputs
Difficult to interpret the actual learned model
Deep learning vs probabilistic programming
5
Neural network (differentiable program): the model is learned from data as a differentiable transformation from inputs to outputs
Model / probabilistic program / simulator: the model is defined as a structured generative program from inputs to outputs
Probabilistic programming
6
Model / probabilistic program / simulator
Probabilistic model: a joint distribution of random variables
● Latent (hidden, unobserved) variables
● Observed variables (data)
Inputs → Outputs
Probabilistic programming
7
Model / probabilistic program / simulator
Probabilistic graphical models use graphs to express conditional dependence
● Bayesian networks (directed)
● Markov random fields (undirected)
Probabilistic programming
8
Model / probabilistic program / simulator
Probabilistic programming extends this to “ordinary programming with two added constructs”
● Sampling from distributions
● Conditioning by specifying observed values
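In code, the two constructs can be sketched like this (a minimal hand-rolled illustration, not any particular PPL's API): `sample` draws latents from priors, and `observe` conditions the program by weighting the execution with the likelihood of the observed value.

```python
import math
import random

random.seed(0)

# sample: draw a latent value from its prior distribution
def sample(dist):
    return dist["draw"]()

# observe: condition by scoring an observed value under its
# distribution, accumulated as a log-likelihood weight on the trace
def observe(dist, value, trace):
    trace["log_weight"] += dist["log_pdf"](value)

def normal(mu, sigma):
    return {
        "draw": lambda: random.gauss(mu, sigma),
        "log_pdf": lambda v: (-0.5 * ((v - mu) / sigma) ** 2
                              - math.log(sigma * math.sqrt(2.0 * math.pi))),
    }

def model(observed_y, trace):
    x = sample(normal(0.0, 1.0))                 # latent variable
    observe(normal(x, 0.5), observed_y, trace)   # condition on data
    trace["x"] = x

trace = {"log_weight": 0.0}
model(1.2, trace)   # one weighted execution of the probabilistic program
```

Everything else is ordinary Python; the inference engine's job is to use many such weighted executions.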
Inference
9
Model / probabilistic program / simulator
Use your model to analyze (explain) some given data as the posterior distribution of latents conditioned on observations
Inputs → Outputs
See the Edward tutorials for a good intro: http://edwardlib.org/tutorials/
● Posterior: the distribution of latents describing the given data
● Prior: describes the latents
● Likelihood: how do the data depend on the latents?
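The three labels are the pieces of Bayes' rule; writing latents as $x$ and observations as $y$:

```latex
\underbrace{p(x \mid y)}_{\text{posterior}}
  \;=\;
  \frac{\overbrace{p(y \mid x)}^{\text{likelihood}}\;\overbrace{p(x)}^{\text{prior}}}{p(y)}
```

Inference means computing (or approximating) the left-hand side.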
Inference
11
Model / probabilistic program / simulator: run with inputs to produce simulated data, compared against observed data
● Run many times
● Record execution traces and their weights
● Approximate the posterior
This is importance sampling, other inference engines run differently
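A minimal sketch of that loop in code (a toy one-latent "simulator" with an assumed Gaussian model, pure Python): run from the prior many times, record each trace together with a likelihood weight, and form a self-normalized posterior estimate.

```python
import math
import random

random.seed(0)

def run_simulator():
    # one execution trace of the model: latent x drawn from its prior
    return random.gauss(0.0, 1.0)

def log_likelihood(x, y, sigma=1.0):
    return (-0.5 * ((y - x) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

observed_y = 1.0
traces, log_weights = [], []
for _ in range(20000):                                  # run many times
    x = run_simulator()                                 # record the trace ...
    traces.append(x)
    log_weights.append(log_likelihood(x, observed_y))   # ... and its weight

# self-normalized importance weights approximate the posterior
m = max(log_weights)
weights = [math.exp(lw - m) for lw in log_weights]
total = sum(weights)
posterior_mean = sum(w * x for w, x in zip(weights, traces)) / total
# for this conjugate toy model the exact posterior mean is y / 2
```

Each run is independent, which is why this style of inference parallelizes so well.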
Inference reverses the generative process
12
Real world system: inputs → observed data (detector response)
Generative model / simulator (e.g., Sherpa, Geant4): inputs → simulated data (detector response)
Live demo
Inference
13
● Markov chain Monte Carlo
○ Probprog-specific:
■ Lightweight Metropolis–Hastings
■ Random-walk Metropolis–Hastings
○ Sequential
○ Autocorrelation in samples
○ “Burn-in” period
● Importance sampling
○ Propose from prior
○ Use learned proposal parameterized by observations
○ No autocorrelation or burn-in
○ Each sample is independent (parallelizable)
● Others: variational inference, Hamiltonian Monte Carlo, etc.
Inference engines
14
We sample in trace space:each sample (trace) is one full execution of the model/simulator!
[figure: prior, proposal, and posterior distributions]
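For contrast with importance sampling, here is a toy random-walk Metropolis–Hastings chain on the same kind of one-latent Gaussian model (a much-simplified stand-in for trace-space RMH, where a step would re-propose part of a full execution trace): note the burn-in period and the sequential, correlated samples.

```python
import math
import random

random.seed(0)

def log_joint(x, y):
    # log p(x) + log p(y | x) for x ~ Normal(0, 1), y ~ Normal(x, 1)
    return -0.5 * x * x - 0.5 * (y - x) ** 2

y = 1.0
x = random.gauss(0.0, 1.0)                 # initialize the chain from the prior
chain = []
for _ in range(20000):
    x_prop = x + random.gauss(0.0, 0.5)    # random-walk proposal
    log_alpha = log_joint(x_prop, y) - log_joint(x, y)
    if math.log(random.random()) < log_alpha:
        x = x_prop                          # accept; else keep current state
    chain.append(x)

burn_in = 2000                              # discard the burn-in period
kept = chain[burn_in:]
posterior_mean = sum(kept) / len(kept)      # exact answer here is y / 2
```

Unlike the importance-sampling loop, consecutive states depend on each other, so the chain must be run sequentially.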
● Anglican (Clojure)
● Church (Scheme)
● Edward, TensorFlow Probability (Python, TensorFlow)
● Pyro (Python, PyTorch)
● Figaro (Scala)
● Infer.NET (C#)
● LibBi (C++ template library)
● PyMC3 (Python)
● Stan (C++)
● WebPPL (JavaScript)
For more, see http://probabilistic-programming.org
Probabilistic programming languages (PPLs)
18
Existing simulators as probabilistic programs
A stochastic simulator implicitly defines a probability distribution by sampling (pseudo-)random numbers → already satisfying one requirement for probprog
Key idea:
● Interpret all RNG calls as sampling from a prior distribution
● Introduce conditioning functionality to the simulator
● Execute under the control of general-purpose inference engines
● Get posterior distributions over all simulator latents conditioned on observations
Execute existing simulators as probprog
20
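In miniature, the RNG-interception idea looks like this: hand the unmodified simulator a drop-in random source that records every draw as an addressed sample (addresses here are just call counters; a real system derives them from the call site).

```python
import random

class RecordingRandom(random.Random):
    """Drop-in RNG that records every draw as an addressed sample."""

    def __init__(self, seed=None):
        super().__init__(seed)
        self.trace = []        # list of (address, value) pairs
        self._count = 0

    def gauss(self, mu, sigma):
        value = super().gauss(mu, sigma)
        # in a real system the address comes from the call site
        self.trace.append((f"gauss_{self._count}", value))
        self._count += 1
        return value

def existing_simulator(rng):
    # unmodified simulator code: it simply calls the RNG as usual
    energy = rng.gauss(10.0, 2.0)
    measured = rng.gauss(energy, 0.1)
    return measured

rng = RecordingRandom(0)
out = existing_simulator(rng)
# rng.trace now exposes every latent draw to an inference engine
```

With the draws exposed, an inference engine can replay, re-propose, or weight them without touching the simulator's own code.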
Advantages: a vast body of existing scientific simulators (accurate generative models) with years of development: MadGraph, Sherpa, Geant4
● Enable model-based (Bayesian) machine learning in these simulators
● Explainable predictions directly reaching into the simulator (the simulator is not used as a black box)
● Results still come from the simulator and are meaningful
Execute existing simulators as probprog
21
Several things are needed:
● A PPL with simulator control incorporated into its design
● A language-agnostic interface for connecting PPLs to simulators
● Front ends in languages commonly used for coding simulators
Coupling probprog and simulators
22
Several things are needed:
● A PPL with simulator control incorporated into its design → pyprob
● A language-agnostic interface for connecting PPLs to simulators → PPX, the Probabilistic Programming eXecution protocol
● Front ends in languages commonly used for coding simulators → pyprob_cpp
Coupling probprog and simulators
23
https://github.com/probprog/pyprob
A PyTorch-based PPL
Inference engines:
● Markov chain Monte Carlo
○ Lightweight Metropolis–Hastings (LMH)
○ Random-walk Metropolis–Hastings (RMH)
● Importance sampling
○ Regular (proposals from prior)
○ Inference compilation (IC)
● Hamiltonian Monte Carlo (in progress)
pyprob
24
Le, Baydin and Wood. Inference Compilation and Universal Probabilistic Programming. AISTATS 2017
https://github.com/probprog/ppx
Probabilistic Programming eXecution protocol
● Cross-platform, via flatbuffers: http://google.github.io/flatbuffers/
● Supported languages: C++, C#, Go, Java, JavaScript, PHP, Python, TypeScript, Rust, Lua
● Similar to the Open Neural Network Exchange (ONNX) for deep learning
Enables inference engines and simulators to be
● implemented in different programming languages
● executed in separate processes, on separate machines across networks
27
PPX
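The execution model behind such a protocol can be mimicked with plain message passing (in-process queues stand in for flatbuffers messages over a socket, and the message shapes are invented for illustration, not the actual PPX schema): the simulator blocks on sample/observe requests and the inference engine decides the values.

```python
import queue
import random
import threading

to_engine, to_simulator = queue.Queue(), queue.Queue()

def simulator():
    # asks the controlling inference engine for a sample value
    to_engine.put({"type": "sample", "address": "x",
                   "dist": ("normal", 0.0, 1.0)})
    x = to_simulator.get()
    # reports an observation for the engine to score
    to_engine.put({"type": "observe", "address": "y",
                   "dist": ("normal", x, 1.0), "value": 1.0})
    to_simulator.get()                          # acknowledgement
    to_engine.put({"type": "result", "value": x})

def inference_engine():
    rng = random.Random(0)
    while True:
        msg = to_engine.get()
        if msg["type"] == "sample":
            _, mu, sigma = msg["dist"]
            to_simulator.put(rng.gauss(mu, sigma))  # engine picks the value
        elif msg["type"] == "observe":
            to_simulator.put("ok")  # a real engine scores the likelihood here
        else:                       # "result": one trace is complete
            return msg["value"]

thread = threading.Thread(target=simulator)
thread.start()
trace_value = inference_engine()
thread.join()
```

Because the two sides only exchange messages, nothing forces them to share a language or even a machine, which is the point of the protocol.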
28
E.g., SHERPA, GEANT
29
PPX
https://github.com/probprog/pyprob_cpp
A lightweight C++ front end for PPX
pyprob_cpp
30
Probprog and high-energy physics
“etalumis” = “simulate” reversed
32
etalumis | simulate
Team: Atılım Güneş Baydin, Bradley Gram-Hansen, Lukas Heinrich, Kyle Cranmer, Andreas Munk, Saeid Naderiparizi, Frank Wood, Wahid Bhimji, Jialin Liu, Prabhat, Gilles Louppe, Lei Shao, Larry Meadows, Victor Lee, Phil Torr
Cori supercomputer, Lawrence Berkeley Lab
2,388 Haswell nodes (32 cores per node)
9,688 KNL nodes (68 cores per node)
pyprob_cpp and Sherpa
33
Main challenges
Working with large-scale HEP simulators requires several innovations
● Wide range of prior probabilities; some events are highly unlikely and not learned by the IC neural network
● Solution: “prior inflation”
○ Training: modify prior distributions to be uninformative (HEP: sample according to phase space)
○ Inference: use the unmodified (real) prior for weighting proposals (HEP: differential cross-section = phase space × matrix element)
34
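In symbols (our notation, not the slides'): if the proposal $q$ is trained on traces drawn under an inflated prior $\tilde p$, inference still weights each trace $x$ for observation $y$ with the usual importance weight against the true model,

```latex
w \;=\; \frac{p(y \mid x)\, p(x)}{q(x \mid y;\, \phi)}
```

so the inflation only changes which $(x, y)$ pairs the network sees during training, not the target posterior.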
Main challenges
Working with large-scale HEP simulators requires several innovations
● Potentially very long execution traces due to rejection-sampling loops
● Solution: “replace” (or “rejection-sampling”) mode
○ Training: only consider the last (accepted) values within loops
○ Inference: use the same proposal distribution for these samples
35
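A sketch of the idea with a made-up rejection loop: in "replace" mode only the accepted value enters the trace, so the trace length no longer depends on how many attempts were rejected.

```python
import random

random.seed(0)

def rejection_loop(trace):
    """Rejection-sample a non-negative value from a wider proposal."""
    attempts = 0
    while True:
        v = random.uniform(-1.0, 1.0)
        attempts += 1
        if v >= 0.0:
            # "replace" mode: only the last (accepted) value is recorded;
            # the rejected draws never become addressed latents
            trace.append(("accepted_value", v))
            return v, attempts

trace = []
value, attempts = rejection_loop(trace)
# exactly one trace entry, regardless of how many attempts the loop needed
```

Keeping rejected draws out of the trace is what keeps trace types manageable when a simulator loops thousands of times.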
Experiments
Tau decay in Sherpa, 38 decay channels, coupled with an approximate calorimeter simulation in C++
Tau lepton decay
37
Probabilistic addresses in Sherpa
Approximately 25,000 addresses encountered
38
Common trace types in Sherpa
Approximately 450 trace types encountered
Trace type: a unique sequencing of addresses (with different sampled values)
39
Inference results with MCMC engine
Prior
MCMC posterior conditioned on calorimeter
7,700,000 samples; slow, and has to run on a single node
Convergence to true posterior
We establish that two independent RMH MCMC chains converge to the same posterior for all addresses in Sherpa
● One chain initialized with a random trace from the prior
● One chain initialized with a known ground-truth trace
Diagnostics: Gelman-Rubin convergence diagnostic, autocorrelation, trace log-probability
Convergence to true posterior
Important:
● We get posteriors over the whole Sherpa address space, 1000s of addresses
● Trace complexity varies depending on the observed event
This is just a selected subset:
Inference results with IC engine
IC proposal from the trained NN; IC posterior after importance weighting (320,000 samples)
Fast, “embarrassingly parallel”, multi-node
Compared with the MCMC true posterior (7.7M samples, single node)
Interpretability
Latent probabilistic structure of the 10 most frequent trace types
47
Interpretability
Latent probabilistic structure of the 10, 25, 100, and 250 most frequent trace types
[figures: trace graphs whose nodes correspond to px, py, pz, the decay channel, rejection-sampling loops, and the calorimeter; the graph grows as more trace types are included]
53
Interpretability
What’s next?
● Autodiff through the PPX protocol
● Learning simulator surrogates (approximate forward simulators)
● Rejection-sampling loops (weighting schemes)
● Rare-event simulation for compilation (“prior inflation”)
● Batching of open-ended traces for NN training
● Distributed training of dynamic networks
○ Recently ran on 32k CPU cores on Cori (largest-scale PyTorch MPI run)
● User features: posterior code highlighting, etc.
● Other simulators: astrophysics, epidemiology, computer vision
Current and upcoming work
55
Rejection sampling loops in Sherpa (tau decay)
Workshop at the Neural Information Processing Systems (NeurIPS) conference
December 14, 2019, Vancouver, Canada
● Machine learning for physical sciences
● Physics for machine learning
https://ml4physicalsciences.github.io/
57
Invited talks: Alan Aspuru-Guzik, Yasaman Bahri, Katie Bouman, Bernhard Schölkopf, Maria Schuld, Lenka Zdeborova
Contributed talks: Miles Cranmer, Eric Metodiev, Danilo Jimenez Rezende, Alvaro Sanchez-Gonzalez, Samuel Schoenholz, Rose Yu
Thank you for listening
Extra slides
Calorimeter
For each particle in the final state coming from Sherpa:
1. Determine whether it interacts with the calorimeter at all (muons and neutrinos don't)
2. Calculate the total mean number and spatial distribution of energy depositions from the calorimeter shower (simulating the combined effect of secondary particles)
3. Draw the actual number of depositions from the total mean, then draw that many energy depositions according to the spatial distribution
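The three steps can be sketched as code (the shower parameterization and constants are invented placeholders, not the real calorimeter model):

```python
import math
import random

rng = random.Random(0)
NON_INTERACTING = {"muon", "neutrino"}

def sample_poisson(lam):
    # Knuth's inversion method; fine for modest mean counts
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= rng.random()
    return k - 1

def calorimeter_depositions(particle, energy):
    # 1. does this particle interact with the calorimeter at all?
    if particle in NON_INTERACTING:
        return []
    # 2. total mean number of depositions and a Gaussian-blob spatial
    #    spread stand in for the combined secondary-particle shower
    mean_count = 2.0 + 0.3 * energy        # placeholder parameterization
    width = 0.1 * math.sqrt(energy)
    # 3. draw the actual number of depositions, then that many positions
    n = sample_poisson(mean_count)
    return [(rng.gauss(0.0, width), rng.gauss(0.0, width)) for _ in range(n)]

hits = calorimeter_depositions("pion", energy=20.0)
```

Every `rng` call above is a latent draw that an inference engine could condition on.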
• Minimize the expected negative log-probability of the latents under the proposal,
$\mathcal{L}(\phi) = \mathbb{E}_{p(\mathbf{x},\mathbf{y})}\left[ -\log q(\mathbf{x} \mid \mathbf{y};\, \phi) \right]$
• Using stochastic gradient descent with Adam
• Infinite stream of minibatches $(\mathbf{x}, \mathbf{y})$ sampled from the model
Training objective and data for IC
61
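An end-to-end toy of this recipe in pure Python (hand-derived gradients and plain SGD instead of Adam, and a one-latent Gaussian model whose posterior is known exactly): latents x ~ N(0, 1), data y ~ N(x, 1), and a proposal q(x | y) = N(a·y + b, s) trained on an endless stream of (x, y) pairs sampled from the model.

```python
import math
import random

random.seed(0)

# proposal parameters: q(x | y) = Normal(a * y + b, exp(log_s))
a, b, log_s = 0.0, 0.0, 0.0
lr = 0.005

for _ in range(60000):
    # infinite stream of (x, y) pairs sampled from the generative model
    x = random.gauss(0.0, 1.0)
    y = random.gauss(x, 1.0)
    s = math.exp(log_s)
    mu = a * y + b
    # gradients of -log q(x | y) = log s + (x - mu)^2 / (2 s^2) + const
    d_mu = -(x - mu) / (s * s)
    a -= lr * d_mu * y
    b -= lr * d_mu
    log_s -= lr * (1.0 - ((x - mu) / s) ** 2)

# exact posterior is Normal(y / 2, 1 / 2): expect a ≈ 0.5, b ≈ 0, s ≈ 0.71
s = math.exp(log_s)
```

The real system replaces the hand-rolled Gaussian with an LSTM-parameterized proposal, but the training signal is the same.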
Gelman-Rubin and autocorrelation formulae
62From Eric B. Ford (Penn State): Bayesian Computing for Astronomical Data Analysishttp://astrostatistics.psu.edu/RLectures/diagnosticsMCMC.pdf
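For reference, the standard forms (as in common MCMC references; our notation): with $m$ chains of length $n$, within-chain variance $W$ and between-chain variance $B$,

```latex
\hat{V} = \frac{n-1}{n}\, W + \frac{1}{n}\, B,
\qquad
\hat{R} = \sqrt{\hat{V} / W},
\qquad
\rho_k = \frac{\operatorname{Cov}(\theta_t,\, \theta_{t+k})}{\operatorname{Var}(\theta_t)}
```

with $\hat{R} \to 1$ indicating convergence and $\rho_k$ the lag-$k$ autocorrelation.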
Model writing is decoupled from running inference
● Exact (limited applicability)
○ Belief propagation
○ Junction tree algorithm
● Approximate (very common)
○ Deterministic
■ Variational methods
○ Stochastic (sampling-based)
■ Monte Carlo methods
● Markov chain Monte Carlo (MCMC)
● Sequential Monte Carlo (SMC)
■ Importance sampling (IS)
● Inference compilation (IC)
Inference engines
64
Transform a generative model implemented as a probabilistic program into a trained neural network artifact for performing inference
Inference compilation
65
● A stacked LSTM core
● Observation embeddings, sample embeddings, and proposal layers specified by the probabilistic program
[figure: at each step the LSTM receives the current sample value, sample address, sample instance, and trace length, and outputs proposal distribution parameters]
Inference compilation
66
Tau decay in Sherpa, 38 decay channels, coupled with an approximate calorimeter simulation in C++
Tau lepton decay
67
Observation: 3D calorimeter depositions (Poisson)
○ Particle showers modeled as Gaussian blobs; deposited energy parameterizes a multivariate Poisson
○ Shower shape variables and sampling fraction based on the final-state particle
Monte Carlo truth (latent variables) of interest:
● Decay channel (categorical)
● px, py, pz momenta of the tau particle (continuous uniform)
● Final-state momenta and IDs
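The latent structure listed above can be sketched as a generative program (channel names, momentum ranges, and the observation summary are invented placeholders, not Sherpa's actual values):

```python
import random

rng = random.Random(0)

# placeholder subset of the 38 decay channels
CHANNELS = ["tau -> e nu nu", "tau -> mu nu nu", "tau -> pi nu"]

def generative_model():
    trace = {}
    # latent: decay channel (categorical)
    trace["channel"] = rng.choice(CHANNELS)
    # latents: tau momentum components (continuous uniform, made-up ranges)
    trace["px"] = rng.uniform(-10.0, 10.0)
    trace["py"] = rng.uniform(-10.0, 10.0)
    trace["pz"] = rng.uniform(20.0, 45.0)
    # observation: a crude 1D stand-in for the 3D calorimeter depositions
    energy = (trace["px"] ** 2 + trace["py"] ** 2 + trace["pz"] ** 2) ** 0.5
    trace["observed_count"] = max(0, round(rng.gauss(energy, energy ** 0.5)))
    return trace

trace = generative_model()
```

Conditioning on `observed_count` and asking for the posterior over `channel` and the momenta is exactly the inference problem the slides describe, just at toy scale.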