
Page 1: Simple   Bayesian Supervised Models

Simple Bayesian Supervised Models

Saskia Klein & Steffen Bollmann


Page 2: Simple   Bayesian Supervised Models

Content


Recap from last week

Bayesian Linear Regression
• What is linear regression?
• Application of Bayesian theory to linear regression
• Example
• Comparison to conventional linear regression

Bayesian Logistic Regression

Naive Bayes classifier

Source: Bishop (ch. 3, 4); Barber (ch. 10)

Page 3: Simple   Bayesian Supervised Models

Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution.
• It allows prior information to be taken into account:

posterior = likelihood × prior / evidence
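As a concrete illustration (not from the slides), consider MAP estimation of a Gaussian mean under a Gaussian prior with known noise variance. The sketch below uses the standard closed form; all names and numbers are our own toy example:

```python
# MAP estimate of a Gaussian mean under a Gaussian prior N(mu0, sigma0_sq),
# with known noise variance sigma_sq. The closed form is a precision-weighted
# compromise between the prior mean and the sample mean.

def map_gaussian_mean(data, mu0, sigma0_sq, sigma_sq):
    n = len(data)
    x_bar = sum(data) / n
    return (sigma_sq * mu0 + n * sigma0_sq * x_bar) / (n * sigma0_sq + sigma_sq)

# With a broad prior the estimate stays close to the sample mean (2.075);
# with a narrow prior it is pulled towards mu0.
w_map = map_gaussian_mean([2.1, 1.9, 2.3, 2.0], mu0=0.0, sigma0_sq=10.0, sigma_sq=1.0)
```

With more data (larger n), the likelihood term dominates and the prior's influence fades, which is the usual Bayesian behaviour.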

Page 4: Simple   Bayesian Supervised Models

Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in the form p(η | χ, ν) ∝ g(η)^ν exp(ν ηᵀχ).
• Important conjugate pairs include:
  Binomial – Beta
  Multinomial – Dirichlet
  Gaussian – Gaussian (for the mean)
  Gaussian – Gamma (for the precision)
  Exponential – Gamma
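The Binomial–Beta pair makes the convenience of conjugacy concrete: the posterior is obtained by simply adding observed counts to the prior's pseudo-counts. A minimal sketch (toy numbers are our own):

```python
# Beta prior + binomial likelihood: the posterior is again a Beta distribution,
# obtained by adding the observed counts to the prior pseudo-counts a and b.

def beta_binomial_update(a, b, heads, tails):
    return a + heads, b + tails

a_post, b_post = beta_binomial_update(2, 2, heads=7, tails=3)   # Beta(9, 5)
posterior_mean = a_post / (a_post + b_post)                     # a / (a + b)
```

No integration is needed: because prior and posterior have the same functional form, the update is pure bookkeeping.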

Page 5: Simple   Bayesian Supervised Models

Linear Regression


goal: predict the value of a target variable t given the value of a D-dimensional vector x of input variables

linear regression models: linear functions of the adjustable parameters w

for example: y(x, w) = w_0 + w_1 x_1 + … + w_D x_D

Page 6: Simple   Bayesian Supervised Models

Linear Regression


Training
training data set comprising N observations {x_n}, with corresponding target values {t_n}
compute the weights w

Prediction
goal: predict the value of t for a new value of x
= model the predictive distribution p(t|x) and make predictions of t in such a way as to minimize the expected value of a loss function

Page 7: Simple   Bayesian Supervised Models

Examples of linear regression models


simplest linear regression model: a linear function of the weights/parameters and the data:
y(x, w) = w_0 + w_1 x_1 + … + w_D x_D

linear regression models using basis functions φ_j(x):
y(x, w) = Σ_j w_j φ_j(x) = wᵀ φ(x)
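Stacking the basis functions evaluated at every input gives the design matrix Φ used in everything that follows. A small NumPy sketch with a polynomial basis (one common choice; Gaussian or sigmoidal bases plug in the same way):

```python
import numpy as np

# Design matrix for polynomial basis functions phi_j(x) = x**j,
# one row per data point, one column per basis function.

def design_matrix(x, degree):
    return np.stack([x ** j for j in range(degree + 1)], axis=1)

x = np.array([0.0, 0.5, 1.0])
Phi = design_matrix(x, degree=2)   # columns: 1, x, x**2
```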

Page 8: Simple   Bayesian Supervised Models

Bayesian Linear Regression


model: t = y(x, w) + ε

t … target variable
y(x, w) … model
x … data
w … weights/parameters
ε … additive Gaussian noise with zero mean and precision (inverse variance) β

Page 9: Simple   Bayesian Supervised Models

Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution.
• It allows prior information to be taken into account:

posterior = likelihood × prior / evidence

Page 10: Simple   Bayesian Supervised Models

Bayesian Linear Regression - Likelihood


likelihood function:
p(t | X, w, β) = ∏_{n=1}^{N} N(t_n | wᵀ φ(x_n), β⁻¹)

observation of N training inputs X = {x_1, …, x_N} with target values t = (t_1, …, t_N)ᵀ, drawn independently from the distribution

Page 11: Simple   Bayesian Supervised Models

Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution.
• It allows prior information to be taken into account:

posterior = likelihood × prior / evidence

Page 12: Simple   Bayesian Supervised Models

Conjugate prior
• In general, for a given probability distribution p(x|η), we can seek a prior p(η) that is conjugate to the likelihood function, so that the posterior distribution has the same functional form as the prior.
• For any member of the exponential family, there exists a conjugate prior that can be written in the form p(η | χ, ν) ∝ g(η)^ν exp(ν ηᵀχ).
• Important conjugate pairs include:
  Binomial – Beta
  Multinomial – Dirichlet
  Gaussian – Gaussian (for the mean)
  Gaussian – Gamma (for the precision)
  Exponential – Gamma

Page 13: Simple   Bayesian Supervised Models

Bayesian Linear Regression - Prior


prior probability distribution over the model parameters w

conjugate prior: Gaussian distribution
p(w) = N(w | m_0, S_0)

with mean m_0 and covariance S_0

Page 14: Simple   Bayesian Supervised Models

Maximum a posteriori estimation
• The Bayesian approach to estimating the parameters of a distribution, given a set of observations, is to maximize the posterior distribution.
• It allows prior information to be taken into account:

posterior = likelihood × prior / evidence

Page 15: Simple   Bayesian Supervised Models

Bayesian Linear Regression – Posterior Distribution


due to the conjugate prior, the posterior will also be Gaussian:
p(w | t) = N(w | m_N, S_N)
with m_N = S_N (S_0⁻¹ m_0 + β Φᵀ t) and S_N⁻¹ = S_0⁻¹ + β Φᵀ Φ

(derivation: Bishop p.112)
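The Gaussian posterior update from Bishop (ch. 3) is a few lines of NumPy. A minimal sketch, assuming a Gaussian prior N(w | m0, S0) and noise precision beta; the toy data are our own:

```python
import numpy as np

# Gaussian posterior over the weights for Bayesian linear regression:
#   S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
#   m_N      = S_N (S_0^{-1} m_0 + beta * Phi^T t)

def posterior(Phi, t, m0, S0, beta):
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
    mN = SN @ (S0_inv @ m0 + beta * Phi.T @ t)
    return mN, SN

# Toy data on the line t = x: with a very broad prior, the posterior mean
# approaches the least-squares solution [0, 1].
Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
mN, SN = posterior(Phi, t, m0=np.zeros(2), S0=1e6 * np.eye(2), beta=1.0)
```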

Page 16: Simple   Bayesian Supervised Models

Example Linear Regression


MATLAB demo

Page 17: Simple   Bayesian Supervised Models

Predictive Distribution


making predictions of t for new values of x; predictive distribution:
p(t | x, t, α, β) = N(t | m_Nᵀ φ(x), σ_N²(x))

variance of the distribution:
σ_N²(x) = 1/β + φ(x)ᵀ S_N φ(x)

the first term represents the noise in the data; the second term reflects the uncertainty associated with the parameters w

the optimal prediction, for a new value of x, would be the conditional mean of the target variable: E[t | x] = m_Nᵀ φ(x)
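The predictive mean and variance follow directly from the posterior over the weights. A minimal sketch with made-up posterior values, just to show the two variance terms separately:

```python
import numpy as np

# Predictive distribution for a new input x:
#   mean     = m_N^T phi(x)
#   variance = 1/beta  +  phi(x)^T S_N phi(x)
# The first variance term is the data noise; the second is parameter uncertainty.

def predictive(phi_x, mN, SN, beta):
    mean = mN @ phi_x
    var = 1.0 / beta + phi_x @ SN @ phi_x
    return mean, var

mean, var = predictive(np.array([1.0, 2.0]),
                       mN=np.array([0.0, 1.0]),
                       SN=0.1 * np.eye(2),
                       beta=2.0)
```

As the posterior covariance S_N shrinks with more data, the second term vanishes and only the irreducible noise 1/β remains.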

Page 18: Simple   Bayesian Supervised Models

Common Problem in Linear Regression: Overfitting / Model Complexity


Least-squares approach (maximizing the likelihood): only a point estimate of the weights
Regularization: a regularization term and its weighting value need to be chosen
Cross-validation: requires large datasets and high computational power
Bayesian approach: distribution over the weights; requires a good prior; model comparison is computationally demanding, but validation data is not required
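One way to see the link between regularization and the Bayesian approach: with a zero-mean isotropic Gaussian prior, the MAP weights coincide with the ridge (regularized least-squares) solution. A sketch in our own notation:

```python
import numpy as np

# MAP weights under a zero-mean isotropic Gaussian prior equal the ridge solution:
#   w = (lam * I + Phi^T Phi)^{-1} Phi^T t,   with lam = alpha / beta

def ridge_weights(Phi, t, lam):
    d = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(d) + Phi.T @ Phi, Phi.T @ t)

Phi = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 1.0, 2.0])
w_small = ridge_weights(Phi, t, lam=1e-9)   # ~ ordinary least squares
w_large = ridge_weights(Phi, t, lam=1e6)    # strongly shrunk towards zero
```

The regularization strength that "needs to be chosen" in the classical view is, in the Bayesian view, just the ratio of prior to noise precision.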

Page 19: Simple   Bayesian Supervised Models

From Regression to Classification


for regression problems: the target variable t was a vector of real numbers whose values we wish to predict

in the case of classification: target values represent class labels
two-class problem: t ∈ {0, 1}
K > 2 classes: 1-of-K coding, e.g. t = (0, 1, 0, …, 0)ᵀ for class 2

Page 20: Simple   Bayesian Supervised Models

Classification


goal: take an input vector x and assign it to one of K discrete classes

decision boundary

Page 21: Simple   Bayesian Supervised Models

Bayesian Logistic Regression


model the class-conditional densities p(x | C_k) and the prior class probabilities p(C_k), and apply Bayes' theorem:
p(C_k | x) = p(x | C_k) p(C_k) / p(x)

Page 22: Simple   Bayesian Supervised Models

Bayesian Logistic Regression


exact Bayesian inference for logistic regression is intractable

Laplace approximation aims to find a Gaussian approximation to a

probability density defined over a set of continuous variables

the posterior distribution is approximated by a Gaussian centred at its mode (the MAP solution)
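The Laplace approximation can be sketched in a few lines: Newton/IRLS iterations find the MAP weights, and the inverse Hessian of the negative log-posterior at that mode supplies the Gaussian covariance. This is an illustrative sketch, not Barber's demo; the prior N(0, α⁻¹I) and the toy data are our own:

```python
import numpy as np

# Laplace approximation for Bayesian logistic regression (sketch):
# 1) find the MAP weights by Newton/IRLS iterations,
# 2) approximate the posterior by a Gaussian centred at the mode, with
#    covariance equal to the inverse Hessian of the negative log-posterior.

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_logistic(Phi, t, alpha=1.0, iters=25):
    d = Phi.shape[1]
    w = np.zeros(d)
    H = alpha * np.eye(d)
    for _ in range(iters):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t) + alpha * w                    # gradient of -log posterior
        H = Phi.T @ np.diag(y * (1 - y)) @ Phi + alpha * np.eye(d)  # Hessian
        w -= np.linalg.solve(H, grad)                         # Newton step
    return w, np.linalg.inv(H)   # q(w) = N(w | w_map, H^{-1})

Phi = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])
w_map, Sigma = laplace_logistic(Phi, t)
```

The Gaussian q(w) can then be used to approximate the predictive distribution by averaging the sigmoid over weight uncertainty.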

Page 23: Simple   Bayesian Supervised Models

Example


Barber: DemosExercises\demoBayesLogRegression.m

Page 24: Simple   Bayesian Supervised Models

Example


Barber: DemosExercises\demoBayesLogRegression.m

Page 25: Simple   Bayesian Supervised Models

Naive Bayes classifier


Why naive? strong independence assumptions: it assumes that the presence/absence of a feature of a class is unrelated to the presence/absence of any other feature, given the class variable

it ignores relations between features and assumes that all features contribute independently to a class

[http://en.wikipedia.org/wiki/Naive_Bayes_classifier]
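The independence assumption makes the classifier almost trivially simple: each feature's log-likelihood is just added to the class score. A minimal multinomial naive Bayes with Laplace (add-one) smoothing; the toy spam/ham data are our own:

```python
import math
from collections import Counter

# Minimal multinomial naive Bayes with Laplace (add-one) smoothing.
# Summing per-word log-likelihoods is exactly the "features contribute
# independently, given the class" assumption.

class NaiveBayes:
    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        self.vocab = set()
        for doc, c in zip(docs, labels):
            self.counts[c].update(doc)
            self.vocab.update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        V = len(self.vocab)
        def log_post(c):
            lp = self.log_prior[c]
            for w in doc:
                lp += math.log((self.counts[c][w] + 1) / (self.totals[c] + V))
            return lp
        return max(self.classes, key=log_post)

docs = [["win", "money"], ["win", "prize"], ["meeting", "today"], ["project", "meeting"]]
labels = ["spam", "spam", "ham", "ham"]
nb = NaiveBayes().fit(docs, labels)
```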

Page 26: Simple   Bayesian Supervised Models


Thank you for your attention
