Transcript
Page 1: Probabilistic Programming with Infer.NET (cbl.eng.cam.ac.uk/pub/Intranet/MLG/ReadingGroup/Bronskill_infer_N…)

Probabilistic Programming with Infer.NET

John Bronskill

University of Cambridge

19th October 2017

Page 2:

The Promise of Probabilistic Programming

“We want to do for machine learning what the advent of high-level programming languages 50 years ago did for the software development community as a whole.”

Kathleen Fisher, DARPA program manager.

Page 3:

Goals of Probabilistic Programming

• Reduce the level of expertise necessary to build machine learning applications

• Compact model code
  • faster to write and easier to understand

• Reduce development time & cost
  • encourage experimentation

Page 4:

Probabilistic Programming Languages

http://probabilistic-programming.org

Church

BUGS

R2

Stan

Anglican

BLOG

Dimple

FACTORIE

Figaro

Hansei

HBC

PRISM

ProbLog

PyMC

Venture

Infer.NET

Turing

Page 5:

A Great Resource

http://www.mbmlbook.com/

Page 6:

Infer.NET Probabilistic Programming Framework

Page 7:

Background

• Primary authors: Tom Minka and John Winn
  • They were tired of writing inference code by hand

• Under development for 12+ years

• Design Goals:
  • Fast inference
  • Emphasis on approximate inference (Expectation Propagation)
  • Industrial strength for production environments

• Write the forward model, everything else automatic
  • Inference code generated by the compiler

• Not a solution to all ML problems
  • e.g. focused on fully factored models

Page 8:

How Infer.NET Works

Page 9:

Two Coins Example

The Model

Variable<bool> firstCoin = Variable.Bernoulli(0.5).Named("firstCoin");

Variable<bool> secondCoin = Variable.Bernoulli(0.5).Named("secondCoin");

Variable<bool> bothHeads = (firstCoin & secondCoin).Named("bothHeads");

Page 10:

Two Coins Example (cont’d)

Inference Query 1 – What is the probability that both coins are heads?

InferenceEngine ie = new InferenceEngine();

Bernoulli query1 = (Bernoulli)ie.Infer(bothHeads);

Probability both coins are heads: Bernoulli(0.25)

Page 11:

Two Coins Example (cont’d)

Inference Query 2

What is the probability distribution over the first coin given that we observe that bothHeads = false?

bothHeads.ObservedValue = false;

Bernoulli query2 = (Bernoulli)ie.Infer(firstCoin);

Probability distribution over firstCoin: Bernoulli(0.3333)
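Both query results can be sanity-checked without Infer.NET by enumerating the four equally likely coin outcomes; a minimal Python check, independent of the C# model above:

```python
from itertools import product

# The four equally likely outcomes of two fair coins.
outcomes = list(product([True, False], repeat=2))

# Query 1: P(bothHeads) = P(firstCoin AND secondCoin).
p_both = sum(1 for a, b in outcomes if a and b) / len(outcomes)

# Query 2: P(firstCoin | bothHeads == false) -- condition by discarding
# the outcomes inconsistent with the observation.
consistent = [(a, b) for (a, b) in outcomes if not (a and b)]
p_first = sum(1 for a, b in consistent if a) / len(consistent)

print(p_both)   # 0.25
print(p_first)  # 0.3333...
```

Conditioning removes the (heads, heads) outcome, leaving 1 of the 3 remaining outcomes with firstCoin = heads, hence 1/3.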

Page 12:

Two Coins Demo

Page 13:

// Generated inference code
public interface IGeneratedAlgorithm
{
    // Get/set observed values
    void SetObservedValue(string variableName, object value);
    object GetObservedValue(string variableName);

    // Do work
    void Execute(int numberOfIterations);
    void Update(int additionalIterations);
    void Reset();

    // Get computed marginals/posteriors/messages
    object Marginal(string variableName);
    T Marginal<T>(string variableName);
    object Marginal(string variableName, string query);
    T Marginal<T>(string variableName, string query);
    object GetOutputMessage(string variableName);

    // Monitor progress
    int NumberOfIterationsDone { get; }
    event EventHandler<ProgressChangedEventArgs> ProgressChanged;
}

Page 14:

Generated Code Demo

Page 15:

Basic Steps for Using Infer.NET

• Create Model

• Set priors and observe random variables

• Infer posterior(s)

• Use the posterior(s) to make predictions, etc.

[Diagram: Model + Priors + Observations → Posteriors]

Page 16:

Dataset

This data set gives average weights for humans as a function of their height in the population of American women of age 30–39. (Wikipedia)

Page 17:

Linear Regression Model

Y[n] = M * X[n] + B + N[n]

M = slope
B = y-intercept
N = noise

M, B, N: random variables whose uncertainty is represented with a Gaussian distribution.

[Factor graph: Gaussian priors on M and B; in plate n, M is multiplied by X[n], then B and N[n] ~ Gaussian(0, 1) are added to give Y[n].]

Page 18:

Model Code

M = Variable<double>.Random(MPrior);
B = Variable<double>.Random(BPrior);
DataLength = Variable.New<int>();
n = new Range(DataLength);
N = Variable.Array<double>(n);
X = Variable.Array<double>(n);
Y = Variable.Array<double>(n);

using (Variable.ForEach(n))
{
    N[n] = Variable.GaussianFromMeanAndVariance(0, 1);
    Y[n] = M * X[n] + B + N[n];
}


Page 19:

Linear Regression Training

// set observed values
DataLength.ObservedValue = instanceCount;
X.ObservedValue = xTrainingData;
Y.ObservedValue = yTrainingData;
MPrior.ObservedValue = Gaussian.FromMeanAndVariance(0, 1000);
BPrior.ObservedValue = Gaussian.FromMeanAndVariance(0, 1000);

// infer slope M and intercept B
var Engine = new InferenceEngine();
MPosterior = Engine.Infer<Gaussian>(M);
BPosterior = Engine.Infer<Gaussian>(B);
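Because the model is linear-Gaussian with known unit noise variance, the posterior over (M, B) has a closed form that can be checked independently of Infer.NET. A Python sketch on synthetic data (the heights/weights below are made up, not the Wikipedia values):

```python
import random

# Synthetic data from a known line, standing in for the height/weight set.
random.seed(0)
xs = [1.4 + 0.03 * i for i in range(15)]                 # heights
ys = [62.0 * x - 39.0 + random.gauss(0, 1) for x in xs]  # noisy weights

# Model on the slide: Y = M*X + B + N, N ~ Gaussian(0, 1), and
# M, B ~ Gaussian(0, 1000). With known unit noise variance, the posterior
# over (M, B) is Gaussian:
#   precision_post = prior_precision + A^T A, where A has rows [x, 1]
#   mean_post      = precision_post^{-1} A^T y
sxx = sum(x * x for x in xs)
sx = sum(xs)
sxy = sum(x * y for x, y in zip(xs, ys))
sy = sum(ys)
n = len(xs)

p = 1.0 / 1000.0                   # prior precision on M and B
a, b, d = sxx + p, sx, n + p       # posterior precision matrix [[a, b], [b, d]]
det = a * d - b * b

M_mean = (d * sxy - b * sy) / det  # 2x2 solve for the posterior mean
B_mean = (a * sy - b * sxy) / det
M_var = d / det                    # posterior variances from the inverse
B_var = a / det
```

With this much data the weak Gaussian(0, 1000) priors barely matter, so the posterior means land close to the least-squares fit, which is what the Infer.NET query returns as Gaussian marginals.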

[Factor graph from page 17, with the posteriors over M and B (marked "?") as the quantities to infer.]

Page 20:

Linear Regression Prediction

// set observed values
DataLength.ObservedValue = instanceCount;
X.ObservedValue = xTestData;
MPrior.ObservedValue = MPosterior;
BPrior.ObservedValue = BPosterior;

// infer Y
YPosterior = Engine.Infer<Gaussian[]>(Y);
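The predictive distribution for a test point follows from propagating the Gaussian uncertainty through Y = M*X + B + N. A Python sketch with hypothetical posterior moments for M and B (the fully factored model treats them as independent Gaussians):

```python
# Hypothetical marginal posteriors from training (not real outputs):
# M ~ Gaussian(mM, vM), B ~ Gaussian(mB, vB).
mM, vM = 61.3, 4.0
mB, vB = -38.1, 10.0

def predict(x):
    """Predictive Gaussian for Y = M*x + B + N with N ~ Gaussian(0, 1),
    treating M and B as independent (fully factored) Gaussians."""
    mean = mM * x + mB
    var = vM * x * x + vB + 1.0   # variances of independent terms add
    return mean, var

mean, var = predict(1.6)
```

This mirrors what the Gaussian[] returned by Engine.Infer contains: one predictive Gaussian per test point, with variance growing in the parameter uncertainty plus the unit observation noise.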

[Factor graph from page 17, with the posterior over Y (marked "?") as the quantity to infer.]

Page 21:

Infer.NET Regression Demo

Page 22:

TrueSkill Model

[Factor graph: in plate player, Skill ~ Gaussian(Mean, Variance); in plate game, WinnerPerformance ~ Gaussian(WinnerSkill, 17) and LoserPerformance ~ Gaussian(LoserSkill, 17) (Variance = 17), with the constraint WinnerPerformance > LoserPerformance.]

TrueSkill™: A Bayesian Skill Rating System. Herbrich, Minka and Graepel, NIPS 19, 2007.

Page 23:

TrueSkill Model Code

GameCount = Variable.Observed(default(int));
var game = new Range(this.GameCount);
PlayerCount = Variable.Observed(default(int));
var player = new Range(this.PlayerCount);

Skills = Variable.Array<double>(player);
using (Variable.ForEach(player))
{
    Skills[player] = Variable.GaussianFromMeanAndVariance(25.0, 70.0);
}

Winner = Variable.Observed(default(int[]), game);
Loser = Variable.Observed(default(int[]), game);

using (Variable.ForEach(game))
{
    // Define the player performance as a noisy version of their skill
    var winnerPerformance = Variable.GaussianFromMeanAndVariance(Skills[Winner[game]], 17.0);
    var loserPerformance = Variable.GaussianFromMeanAndVariance(Skills[Loser[game]], 17.0);

    // The winner's performance must be higher than the loser's in a given game
    Variable.ConstrainTrue(winnerPerformance > loserPerformance);
}
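For a single game, the moment-matching update that EP performs on this model has a well-known closed form (the standard no-draw TrueSkill update). A Python sketch, independent of Infer.NET, using the slide's Gaussian(25, 70) prior and performance variance 17:

```python
import math

def pdf(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cdf(x):
    # Standard normal cumulative distribution
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def trueskill_update(mu_w, var_w, mu_l, var_l, perf_var=17.0):
    """One-game, no-draw TrueSkill update: moment matching on the
    truncated Gaussian produced by the constraint
    winnerPerformance > loserPerformance."""
    c2 = var_w + var_l + 2 * perf_var   # variance of the performance difference
    c = math.sqrt(c2)
    t = (mu_w - mu_l) / c
    v = pdf(t) / cdf(t)                 # mean shift of the truncated Gaussian
    w = v * (v + t)                     # variance reduction factor
    return (mu_w + (var_w / c) * v,     # winner mean goes up
            var_w * (1 - (var_w / c2) * w),
            mu_l - (var_l / c) * v,     # loser mean goes down
            var_l * (1 - (var_l / c2) * w))

# Two new players with the slide's Gaussian(25, 70) prior; player 1 wins.
mu_w, var_w, mu_l, var_l = trueskill_update(25.0, 70.0, 25.0, 70.0)
```

With symmetric priors the winner's mean rises by exactly as much as the loser's falls, and both variances shrink; over many games Infer.NET iterates updates like this across the whole factor graph.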

Page 24:

TrueSkill Inference

public static Gaussian[] InferPlayerSkills(int playerCount, int[] winners,
    int[] losers, double performanceNoiseVariance)
{
    // Instantiate a model
    TrueSkillModel model = new TrueSkillModel(performanceNoiseVariance);

    // Set observed data
    model.GameCount.ObservedValue = winners.Length;
    model.PlayerCount.ObservedValue = playerCount;
    model.Winner.ObservedValue = winners;
    model.Loser.ObservedValue = losers;

    // Infer results
    var engine = new InferenceEngine();
    return engine.Infer<Gaussian[]>(model.Skills);
}

Page 25:

TrueSkill Demo

Page 26:

Infer.NET Architecture

Infer.NET components

Factors/Constraints (55)
Represent relationships between variables
- e.g. Boolean, comparison, arithmetic, linear algebra, etc.

Inference Algorithms (3)

- Expectation Propagation (EP)

- Variational Message Passing (VMP)

- Block Gibbs sampling

Distributions (11)
Used for messages and marginals

- Continuous: Beta, Gaussian, Gamma

- Discrete: Bernoulli, Binomial, Poisson

- Multivariate: Dirichlet, Wishart

Message operators (100s)
The atomic operations of an inference algorithm.

Page 27:

Modeling Features

• A model is essentially a factor graph represented in code using Infer.NET APIs

• Random Variables are represented by .NET variables
• Factors are represented as .NET methods/functions

• Sparse and dense arrays are supported

• Probabilistic control flow (loop, conditional, switch) to facilitate plates, gates, and mixtures

• Chain and grid models (i.e. Markov chain) are supported

• Inference on strings

• Extensible: Can write your own factors and distributions

• List of Factors: http://infernet.azurewebsites.net/docs/list%20of%20factors%20and%20constraints.aspx

Page 28:

Inference on Strings

Page 29:

Hello uncertain world in Infer.NET

// Create two uncertain strings
var a = Variable.UniformOver(Strings.All);
var b = Variable.UniformOver(Strings.All);

// Format the two strings together
var c = Variable.Format("{0} {1}!!", a, b);

// Observe the result
c.ObservedValue = "Hello Uncertain World!!";

What is the distribution over a?

Uniform("Hello Uncertain", "Hello")

And b?

Uniform("Uncertain World", "World")
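Why the posterior is uniform over exactly those two splits can be checked by brute force: the format string "{0} {1}!!" pins down the literal space and the "!!" suffix, so the only freedom left is where to cut the remaining text. A Python check:

```python
observed = "Hello Uncertain World!!"
fmt_suffix = "!!"

# The format "{0} {1}!!" says: a, then a literal space, then b, then "!!".
assert observed.endswith(fmt_suffix)
body = observed[:-len(fmt_suffix)]   # what a + " " + b must equal

# Every space in the body is a candidate boundary between a and b,
# so the posterior over (a, b) is uniform over these splits.
splits = [(body[:i], body[i + 1:]) for i, ch in enumerate(body) if ch == " "]

print(splits)  # [('Hello', 'Uncertain World'), ('Hello Uncertain', 'World')]
```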

Page 30:

Learning templates

var a = Variable.UniformOver(WordStrings.All);
var b = Variable.UniformOver(WordStrings.All);

// Uncertain template
var template = Variable.UniformOver(Strings.All);

// Format the two strings together
var c = Variable.Format(template, a, b);

// Observe the inputs and outputs
a.ObservedValue = "Hello";
c.ObservedValue = "Hello Uncertain World!!";

What is the distribution over template?

Uniform("{0} Uncertain {1}!!", "{0} {1}!!", "{0} {1} World!!")

Page 31:

Fact Finding

Seed Fact: Date-Of-Birth [Barack Obama, August 4, 1961]

1. Search corpus for seed text.
   Observed Text: "Barack Obama was born on August 4, 1961."
2. Infer template.
   Learned Template: "{Entity Name} was born on {Entity Birthdate}"
3. Search corpus for text that matches template.
   Observed Text: "George Bush was born on July 6, 1946"
4. Infer fact.
   New Fact: Date-Of-Birth [George Bush, July 6, 1946]
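The fact-finding flow can be sketched deterministically, substituting a plain regex template for the probabilistic string inference (the corpus, function name, and group names here are hypothetical):

```python
import re

# Hypothetical corpus and seed fact, mirroring the slide's flow.
seed_entity, seed_value = "Barack Obama", "August 4, 1961"
corpus = [
    "Barack Obama was born on August 4, 1961.",
    "George Bush was born on July 6, 1946",
]

def learn_template(text, entity, value):
    """Generalise a sentence containing the seed fact into a regex template
    (a deterministic stand-in for probabilistic template inference)."""
    pattern = re.escape(text.rstrip("."))
    pattern = pattern.replace(re.escape(entity), r"(?P<name>.+?)", 1)
    pattern = pattern.replace(re.escape(value), r"(?P<dob>.+)", 1)
    return pattern

# Steps 1-2: search the corpus for the seed text and learn a template.
template = next(learn_template(t, seed_entity, seed_value)
                for t in corpus if seed_entity in t and seed_value in t)

# Steps 3-4: apply the template across the corpus to mine new facts.
facts = []
for text in corpus:
    m = re.fullmatch(template, text.rstrip("."))
    if m:
        facts.append((m.group("name"), m.group("dob")))
```

The real system replaces the regex with distributions over strings, so templates and facts come back with posterior probabilities rather than hard matches.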

Page 32:

Examples Browser Demo

Page 33:

High Scale Applications

Page 34:

Office 365 E-Mail Clutter

[Pipeline: e-mail → Extract Features → Classifier → Clutter / Not Clutter]

Features:
- Sender
- Recipients
- To Distribution List?
- Is Meeting Request?
- Has attachment?
…

Page 35:

[Factor graph (plates over Community Feature Weights, User Mailbox): Gaussian and Gamma priors (wm prior, wp prior) define community distributions over weight mean wm and weight precision wp; personal feature weights w ~ Gaussian(wm, wp); the score t is the sum of products w · x of weights and e-mail feature data, plus Gaussian noise; the action label y is given by t > 0.]

wm : weight mean
wp : weight precision
w : weight
t : score
x : feature data
y : action label

Community Training:
1. Compute e-mail features x for many pieces of e-mail for many users.
2. Observe y.
3. Infer wm and wp for each feature. These are the priors for the weights w.

Classification:
1. Compute features x for each e-mail.
2. Use weights w learned for each user to date.
3. Infer y for each e-mail.

Daily Online Training:
1. Train on new user e-mails older than a fixed period of time (e.g. 1 week).
2. Compute e-mail features x.
3. Observe y by looking at the action the user actually took.
4. Infer new weights w for each feature using current weights as priors.
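The online training step, where yesterday's posterior becomes today's prior, can be illustrated with a simplified stand-in for the classifier above: one assumed-density-filtering update of independent Gaussian weights against a probit-style likelihood sign(w · x + noise) = y. The two-feature setup and all numbers are hypothetical:

```python
import math

def pdf(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def cdf(x):
    # Standard normal cumulative distribution
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def adf_update(means, variances, x, y, noise_var=1.0):
    """One assumed-density-filtering step for a linear classifier with
    independent Gaussian weights: score t = w . x + noise, and the observed
    label y in {+1, -1} constrains sign(t) = y."""
    s = noise_var + sum(v * xi * xi for v, xi in zip(variances, x))
    z = y * sum(m * xi for m, xi in zip(means, x)) / math.sqrt(s)
    alpha = pdf(z) / cdf(z)            # moment-matching correction
    new_means, new_vars = [], []
    for m, v, xi in zip(means, variances, x):
        k = v * xi / math.sqrt(s)
        new_means.append(m + y * k * alpha)
        new_vars.append(v - k * k * alpha * (alpha + z))
    return new_means, new_vars

# Hypothetical community priors for two features; one e-mail marked clutter.
means, variances = [0.0, 0.0], [1.0, 1.0]
means, variances = adf_update(means, variances, x=[1.0, 0.5], y=+1)
# Daily online training chains the same call: the posterior above simply
# becomes the prior for the next day's e-mails.
```

Each update nudges the weight means toward explaining the observed label and shrinks their variances; the production model additionally learns the community-level wm and wp that seed these priors.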

Page 36:

Recommendation

Page 37:

[Worked example: user traits (3.0, 1.0) and movie traits (-1.5, 2.5) over two latent dimensions (labelled "escapist/serious" and "serious drama/action oriented") are multiplied elementwise to give (-4.5, 2.5), which sum to the user-movie affinity -2.0.]

Recommender Generative Model
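The arithmetic in the generative model's worked example is just an inner product of trait vectors; in Python, with the slide's numbers:

```python
# The two latent trait dimensions from the slide.
user_traits = [3.0, 1.0]
movie_traits = [-1.5, 2.5]

# Elementwise products, then sum: the user-movie affinity.
products = [u * m for u, m in zip(user_traits, movie_traits)]
affinity = sum(products)   # -4.5 + 2.5 = -2.0
```

In the full model the trait vectors are Gaussian random variables rather than fixed numbers, so the affinity is itself uncertain.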

Page 38:

Recommender Model

[Factor graph (plates over user, item, instance, trait): Gaussian priors over user/item trait feature weights, bias feature weights, and user/item feature values combine into user traits, item traits, user bias and item bias; elementwise products of user and item traits are summed with the biases into an affinity; Gaussian affinity noise yields a noisy affinity, which is compared (>) against Gaussian user thresholds to produce the rating.]

Matchbox: Large Scale Online Bayesian Recommendations. Stern, Herbrich and Graepel, WWW '09, 2009.

Page 39:

Why .NET/C#?

• Pros
  • Industrial strength, high-scale, production ready
  • Easily connect to large databases
  • Threading, parallel, asynchronous processing
  • High-quality profilers and debuggers

• Cons
  • Not as nice for rapid prototyping as Python or MATLAB
  • Weaker on visualisation/plotting

Page 40:

Issues

• Not really a language
  • Set of APIs for doing Bayesian inference on graphical models

• Inference
  • Inference (e.g. EP) not always possible for all models.
  • Can fail in mysterious ways (model symmetry, lack of convergence, etc.).
  • Hard to know whether inference actually worked.
  • Tricky to debug (no inference debugger).

• Performance
  • Compiler only generates single-threaded code.
  • Can manually "parallelise", but this can be difficult.

Page 41:

Current Status

• Actively worked on at Microsoft Research Cambridge

• New features aimed at supporting specific research interests or products

• License
  • Currently: academic use only

• No commercial license

Page 42:

Thanks and Questions

• Infer.NET http://research.microsoft.com/infernet

