-
An Introduction to Bayesian Machine Learning for Multimedia Information Processing
Part I - Introduction
A. Taylan Cemgil
Signal Processing and Communications Lab.
2008 IEEE ICME Tutorial, 23 June, Hannover
Cemgil IEEE ICME 2008 - Introduction to Bayesian Machine Learning for Multimedia. June 23, 2008, Hannover
-
Goals of this Tutorial
• Provide a basic understanding of the underlying principles of probabilistic modeling and Bayesian inference
• Orientation in the broad literature of Bayesian machine learning and statistical signal processing
• Focus on fundamental concepts rather than technical details,
. . . we avoid heavy use of algebra by a graphical notation
. . . but there will be some maths
-
Goals of this Tutorial
• Model based approach
. . . rather than description of algorithms for solving specific problems
• Illustrate with examples how certain problems in multimedia signal analysis can be approached using generic tools
• Motivate participants to investigate further
. . . provide alternative perspective to existing solutions
. . . and hopefully provide new inspiration
-
Part I, Introduction
• Introduction
– Bayes’ Theorem
– Trivial toy example to clarify notation
• Graphical Models
– Bayesian Networks
– Undirected Graphical models, Markov Random Fields
– Factor graphs
• Maximum Likelihood, Penalised Likelihood, Bayesian Learning
-
Part II, Basic Modelling and Inference Strategies
• Basic Building Blocks in model construction
– Probability distributions, Exponential family
• Approximate Inference
– Stochastic
∗ Markov Chain Monte Carlo (MCMC), Gibbs sampler
∗ Simulated Annealing, Iterative Improvement (SA - II)
– Deterministic Inference
∗ Variational Bayes (VB)
∗ Expectation-Maximisation (EM)
∗ Iterative conditional modes (ICM)
-
Part III, Models and Applications
• Hidden Markov Models
– Tempo tracking, Score-performance matching
– Inference in Hidden Markov Models
∗ Forward Backward Algorithm
∗ Viterbi
∗ Exact inference by message passing: Belief Propagation
• Linear Dynamical systems, Kalman Filter Models
– Tracking
– Computer Accompaniment
– Kalman Filtering and Smoothing
– Audio Restoration and Interpolation⋆
-
• Nonlinear Dynamical Systems
– Object tracking in video
– Importance sampling, Particle Filtering, Sequential Monte Carlo
– Switching State Space models, Changepoint Models
– Pitch tracking
• Markov Random Fields
– Denoising, Source Separation
• Topic-Term-Document Models
– Latent Semantic indexing
– Generative aspect model
– Non Negative Matrix Factorisation
-
• Factorial Models, Model selection
– Audio Source Separation
– Polyphonic Pitch Tracking
• Final Remarks and Bibliography
-
Bayes’ Theorem [1, 3]
Thomas Bayes (1702-1761)
What you know about a parameter λ after the data D arrive is what you knew before about λ and what the data D told you.
$$ p(\lambda \mid D) = \frac{p(D \mid \lambda)\, p(\lambda)}{p(D)} $$
$$ \text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}} $$
-
An application of Bayes’ Theorem: “Source Separation”
Given two fair dice with outcomes λ and y,
D = λ + y
What is λ when D = 9 ?
-
An application of Bayes’ Theorem: “Source Separation”
D = λ + y = 9
D = λ + y   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1         2       3       4       5       6       7
λ = 2         3       4       5       6       7       8
λ = 3         4       5       6       7       8       9
λ = 4         5       6       7       8       9      10
λ = 5         6       7       8       9      10      11
λ = 6         7       8       9      10      11      12
Bayes’ theorem “upgrades” p(λ) into p(λ|D).
But you have to provide an observation model: p(D|λ)
-
“Bureaucratical” derivation
Formally we write
$$ p(\lambda) = \mathcal{C}(\lambda; [\,1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6\,]) $$
$$ p(y) = \mathcal{C}(y; [\,1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6 \;\; 1/6\,]) $$
$$ p(D \mid \lambda, y) = \delta(D - (\lambda + y)) $$
$$ p(\lambda, y \mid D) = \frac{1}{p(D)} \times p(D \mid \lambda, y) \times p(y)\, p(\lambda) $$
$$ \text{Posterior} = \frac{1}{\text{Evidence}} \times \text{Likelihood} \times \text{Prior} $$
Kronecker delta function denoting a degenerate (deterministic) distribution:
$$ \delta(x) = \begin{cases} 1 & x = 0 \\ 0 & x \neq 0 \end{cases} $$
-
Prior
p(y)p(λ)
p(y) × p(λ)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1          1/36    1/36    1/36    1/36    1/36    1/36
λ = 2          1/36    1/36    1/36    1/36    1/36    1/36
λ = 3          1/36    1/36    1/36    1/36    1/36    1/36
λ = 4          1/36    1/36    1/36    1/36    1/36    1/36
λ = 5          1/36    1/36    1/36    1/36    1/36    1/36
λ = 6          1/36    1/36    1/36    1/36    1/36    1/36
• A table with indices λ and y
• Each cell denotes the probability p(λ, y)
-
Likelihood
p(D = 9|λ, y)
p(D = 9|λ, y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1             0       0       0       0       0       0
λ = 2             0       0       0       0       0       0
λ = 3             0       0       0       0       0       1
λ = 4             0       0       0       0       1       0
λ = 5             0       0       0       1       0       0
λ = 6             0       0       1       0       0       0
• A table with indices λ and y
• The likelihood is not a probability distribution, but a positive function.
-
Likelihood × Prior
$$ \phi_D(\lambda, y) = p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) $$
φD(λ, y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1        0       0       0       0       0       0
λ = 2        0       0       0       0       0       0
λ = 3        0       0       0       0       0      1/36
λ = 4        0       0       0       0      1/36     0
λ = 5        0       0       0      1/36     0       0
λ = 6        0       0      1/36     0       0       0
-
Evidence (= Marginal Likelihood)
$$ p(D = 9) = \sum_{\lambda, y} p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) $$
$$ = 0 + 0 + \cdots + 1/36 + 1/36 + 1/36 + 1/36 + 0 + \cdots + 0 $$
$$ = 1/9 $$
p(D = 9|λ, y)p(λ)p(y)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1                     0       0       0       0       0       0
λ = 2                     0       0       0       0       0       0
λ = 3                     0       0       0       0       0      1/36
λ = 4                     0       0       0       0      1/36     0
λ = 5                     0       0       0      1/36     0       0
λ = 6                     0       0      1/36     0       0       0
-
Posterior
$$ p(\lambda, y \mid D = 9) = \frac{1}{p(D = 9)}\, p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) $$
p(λ, y|D = 9)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1             0       0       0       0       0       0
λ = 2             0       0       0       0       0       0
λ = 3             0       0       0       0       0      1/4
λ = 4             0       0       0       0      1/4      0
λ = 5             0       0       0      1/4      0       0
λ = 6             0       0      1/4      0       0       0
1/4 = (1/36)/(1/9)
-
Marginal Posterior
$$ p(\lambda \mid D) = \sum_{y} \frac{1}{p(D)}\, p(D \mid \lambda, y)\, p(\lambda)\, p(y) $$
λ       p(λ|D = 9)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1       0          0       0       0       0       0       0
λ = 2       0          0       0       0       0       0       0
λ = 3      1/4         0       0       0       0       0      1/4
λ = 4      1/4         0       0       0       0      1/4      0
λ = 5      1/4         0       0       0      1/4      0       0
λ = 6      1/4         0       0      1/4      0       0       0
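The table manipulations of the two-dice example (prior, likelihood, evidence, posterior, marginal posterior) can be reproduced in a few lines of Matlab/Octave. This is only a sketch; the variable names are chosen here for illustration and are not part of the tutorial.

```octave
% Two fair dice; rows index lambda, columns index y (as in the tables above).
p_lambda = ones(6, 1) / 6;          % prior p(lambda)
p_y      = ones(1, 6) / 6;          % prior p(y)
prior    = p_lambda * p_y;          % 6x6 table p(lambda) p(y), every cell 1/36

[Y, L] = meshgrid(1:6, 1:6);        % L(i,j) = i (lambda), Y(i,j) = j (y)
D   = 9;                            % observed sum
lik = double(L + Y == D);           % p(D = 9 | lambda, y): Kronecker delta table

phi         = lik .* prior;         % likelihood x prior
evidence    = sum(phi(:));          % p(D = 9)
posterior   = phi / evidence;       % p(lambda, y | D = 9)
post_lambda = sum(posterior, 2);    % marginal posterior p(lambda | D = 9)
```

Changing `D` to another value between 2 and 12 gives the corresponding posterior without any further changes.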
-
The “proportional to” notation
$$ p(\lambda \mid D = 9) \propto p(\lambda, D = 9) = \sum_{y} p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) $$
λ       p(λ, D = 9)   y = 1   y = 2   y = 3   y = 4   y = 5   y = 6
λ = 1       0           0       0       0       0       0       0
λ = 2       0           0       0       0       0       0       0
λ = 3      1/36         0       0       0       0       0      1/36
λ = 4      1/36         0       0       0       0      1/36     0
λ = 5      1/36         0       0       0      1/36     0       0
λ = 6      1/36         0       0      1/36     0       0       0
-
Another application of Bayes’ Theorem: “Model Selection”
Given an unknown number of fair dice with outcomes λ1, λ2, . . . , λn,
$$ D = \sum_{i=1}^{n} \lambda_i $$
How many dice are there when D = 9?
Assume that any number n is equally likely
-
Another application of Bayes’ Theorem: “Model Selection”
Given all n are equally likely (i.e., p(n) is flat), we calculate (formally)
$$ p(n \mid D = 9) = \frac{p(D = 9 \mid n)\, p(n)}{p(D)} \propto p(D = 9 \mid n) $$
$$ p(D \mid n = 1) = \sum_{\lambda_1} p(D \mid \lambda_1)\, p(\lambda_1) $$
$$ p(D \mid n = 2) = \sum_{\lambda_1} \sum_{\lambda_2} p(D \mid \lambda_1, \lambda_2)\, p(\lambda_1)\, p(\lambda_2) $$
. . .
$$ p(D \mid n = n') = \sum_{\lambda_1, \dots, \lambda_{n'}} p(D \mid \lambda_1, \dots, \lambda_{n'}) \prod_{i=1}^{n'} p(\lambda_i) $$
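The sums over λ1, . . . , λn need not be written out by hand: the distribution of a sum of independent dice is the convolution of the single-die distributions. The Matlab/Octave sketch below uses this to evaluate p(D = 9|n) and the posterior over n. Truncating the flat prior at n = 9 is an assumption made here (more than nine dice cannot sum to 9), and the variable names are illustrative.

```octave
% p(D | n) for n fair dice by repeated convolution of the single-die pmf.
die  = ones(1, 6) / 6;      % p(lambda_i = 1..6)
Nmax = 9;                   % with D = 9, more than 9 dice are impossible
D    = 9;

lik = zeros(1, Nmax);       % lik(n) = p(D = 9 | n)
pmf = 1;                    % pmf of the empty sum (zero dice)
for n = 1:Nmax
  pmf = conv(pmf, die);     % pmf(k) = p(sum = k + n - 1 | n), sums n..6n
  idx = D - n + 1;          % position of the sum D inside pmf
  if idx >= 1 && idx <= numel(pmf)
    lik(n) = pmf(idx);
  end
end

post = lik / sum(lik);      % p(n | D = 9) under a flat prior on n = 1..Nmax
[~, n_map] = max(post);     % most probable number of dice
```

With these numbers, p(D = 9|n = 2) = 4/36 and p(D = 9|n = 3) = 25/216, so three dice come out slightly ahead of two, while one die is impossible.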
-
$$ p(D \mid n) = \sum_{\lambda} p(D \mid \lambda, n)\, p(\lambda \mid n) $$
[Figure: p(D|n) plotted against D = 1, . . . , 20, one panel each for n = 1, 2, 3, 4, 5.]
-
Another application of Bayes’ Theorem: “Model Selection”
[Figure: posterior p(n|D = 9) plotted against n = Number of Dice, for n = 1, . . . , 9.]
• Complex models are more flexible but they spread their probability mass
• Bayesian inference inherently prefers “simpler models” – Occam’s razor
• Computational burden: We need to sum over all parameters λ
-
Probabilistic Inference
A huge spectrum of applications – all boil down to computation of
• expectations of functions under probability distributions: Integration
$$ \langle f(x) \rangle = \int_{\mathcal{X}} dx\, p(x) f(x) \qquad \langle f(x) \rangle = \sum_{x \in \mathcal{X}} p(x) f(x) $$
• modes of functions under probability distributions: Optimization
$$ x^* = \arg\max_{x \in \mathcal{X}} p(x) f(x) $$
• any “mix” of the above: e.g.,
$$ x^* = \arg\max_{x \in \mathcal{X}} p(x) = \arg\max_{x \in \mathcal{X}} \int_{\mathcal{Z}} dz\, p(z)\, p(x \mid z) $$
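For a discrete variable both operations are one-liners. The Matlab/Octave sketch below takes x to be the sum of two fair dice and f(x) = x; both are choices made here purely for illustration.

```octave
% Discrete example: x = sum of two fair dice, values 2..12.
x = 2:12;
p = conv(ones(1, 6) / 6, ones(1, 6) / 6);   % p(x), length 11

f = @(x) x;                 % test function f(x) = x (an illustrative choice)
mean_x = sum(p .* f(x));    % <f(x)> = sum_x p(x) f(x)   (integration)

[p_max, i] = max(p);        % mode of p(x)                (optimization)
x_star = x(i);
```

Here the expectation and the mode coincide at 7; for skewed or multimodal distributions they generally differ.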
-
Divide and Conquer
Probabilistic modelling provides a methodology that puts a clear division between
• What to solve: Model Construction
– Both an Art and Science
– Highly domain specific
• How to solve: Inference Algorithm
– Mechanical (In theory! not in practice)
– Generic
-
Exercise
p(x1, x2)   x2 = 1   x2 = 2
x1 = 1        0.3      0.3
x1 = 2        0.1      0.3
1. Find the following quantities
• Marginals: p(x1), p(x2)
• Conditionals: p(x1|x2), p(x2|x1)
• Posterior: p(x1, x2 = 2), p(x1|x2 = 2)
• Evidence: p(x2 = 2)
• p({})
• Max: $p(x_1^*) = \max_{x_1} p(x_1 \mid x_2 = 1)$
• Mode: $x_1^* = \arg\max_{x_1} p(x_1 \mid x_2 = 1)$
• Max-marginal: $\max_{x_1} p(x_1, x_2)$
2. Are x1 and x2 independent? (i.e., is p(x1, x2) = p(x1)p(x2)?)
-
Answers
p(x1, x2)   x2 = 1   x2 = 2
x1 = 1        0.3      0.3
x1 = 2        0.1      0.3
• Marginals:
p(x1)
x1 = 1   0.6
x1 = 2   0.4
p(x2)   x2 = 1   x2 = 2
          0.4      0.6
• Conditionals:
p(x1|x2)   x2 = 1   x2 = 2
x1 = 1      0.75     0.5
x1 = 2      0.25     0.5
p(x2|x1)   x2 = 1   x2 = 2
x1 = 1      0.5      0.5
x1 = 2      0.25     0.75
-
Answers
p(x1, x2)   x2 = 1   x2 = 2
x1 = 1        0.3      0.3
x1 = 2        0.1      0.3
• Posterior:
p(x1, x2 = 2)   x2 = 2
x1 = 1           0.3
x1 = 2           0.3
p(x1|x2 = 2)   x2 = 2
x1 = 1          0.5
x1 = 2          0.5
• Evidence:
$$ p(x_2 = 2) = \sum_{x_1} p(x_1, x_2 = 2) = 0.6 $$
• Normalisation constant:
$$ p(\{\}) = \sum_{x_1} \sum_{x_2} p(x_1, x_2) = 1 $$
-
Answers
p(x1, x2)   x2 = 1   x2 = 2
x1 = 1        0.3      0.3
x1 = 2        0.1      0.3
• Max: (get the value)
$$ \max_{x_1} p(x_1 \mid x_2 = 1) = 0.75 $$
• Mode: (get the index)
$$ \arg\max_{x_1} p(x_1 \mid x_2 = 1) = 1 $$
• Max-marginal: (get the “skyline”) $\max_{x_1} p(x_1, x_2)$
max_{x1} p(x1, x2)   x2 = 1   x2 = 2
                       0.3      0.3
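The answers above can be checked numerically. The following Matlab/Octave sketch stores the joint table as a matrix and derives each quantity from it; the variable names are illustrative, and `repmat` is used instead of implicit broadcasting so that the code also runs on older Matlab versions.

```octave
% Joint table p(x1, x2): rows index x1, columns index x2.
P = [0.3 0.3;
     0.1 0.3];

p_x1 = sum(P, 2);                          % marginal p(x1), column vector
p_x2 = sum(P, 1);                          % marginal p(x2), row vector
p_x1_given_x2 = P ./ repmat(p_x2, 2, 1);   % each column sums to one
p_x2_given_x1 = P ./ repmat(p_x1, 1, 2);   % each row sums to one

evidence = p_x2(2);                        % p(x2 = 2)
[val, mode_x1] = max(p_x1_given_x2(:, 1)); % max and argmax of p(x1 | x2 = 1)
max_marginal = max(P, [], 1);              % max over x1 for each x2

% Independence check: is p(x1, x2) equal to the outer product p(x1) p(x2)?
independent = max(max(abs(P - p_x1 * p_x2))) < 1e-12;
```

The last line answers the second question of the exercise: the outer product of the marginals differs from the joint table, so x1 and x2 are not independent.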
-
Exercise: Continuous Random variables
[Figure: joint densities p(x, c = 1) and p(x, c = 2) plotted against x, for x between −0.5 and 1.]
• Evaluate
– p(c), p(x = 0) and p(c|x = 10)
-
Exercise: Continuous Random variables
• x is a conditionally Gaussian random variable and c is discrete
$$ p(x \mid c) = \mathcal{N}(x; \mu(c), v(c)) \equiv \frac{1}{\sqrt{2\pi v(c)}} \exp\left( -\frac{1}{2} \frac{(x - \mu(c))^2}{v(c)} \right) $$
• In this example we take
$$ p(x, c = 1) = 0.6\, \mathcal{N}(x; 0, 0.01) \qquad p(x, c = 2) = 0.4\, \mathcal{N}(x; 0.2, 0.03) $$
-
Solutions
• p(x = 0) is a number that we calculate here with Matlab (or Octave)
>> x = 0; mu = [0, 0.2]; v = [0.01 0.03]; pc = [0.6 0.4];
>> pc .* (2*pi*v).^(-1/2) .* exp(-0.5*(x-mu).^2./v)
ans =
    2.3937    0.4730
>> sum(ans)
ans =
    2.8667
• Note: This works here, but adding exp’s can be numerically unstable
• How come the “probability” is larger than one?
– p(x) is a density. The probability is p(x)dx
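That a density value above one is harmless can be checked numerically: the area under p(x) is still one, and the probability of a small interval around x = 0 is tiny. The Matlab/Octave sketch below does this with a simple trapezoidal rule; the grid limits, the grid size and the interval width `dx` are arbitrary choices made here.

```octave
% Mixture density p(x) = sum_c p(c) N(x; mu(c), v(c)) from the example above.
mu = [0, 0.2]; v = [0.01 0.03]; pc = [0.6 0.4];
npdf = @(x, m, s2) exp(-0.5 * (x - m).^2 ./ s2) ./ sqrt(2 * pi * s2);
px   = @(x) pc(1) * npdf(x, mu(1), v(1)) + pc(2) * npdf(x, mu(2), v(2));

xs   = linspace(-2, 2, 40001);   % fine grid covering essentially all the mass
area = trapz(xs, px(xs));        % approximates the integral of p(x) dx

dx = 1e-3;
prob_near_zero = px(0) * dx;     % approximate P(0 <= x <= dx), far below one
```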
-
Solution
[Figure: p(x, c = 1), p(x, c = 2) and their sum p(x) plotted against x, for x between −0.5 and 1.]
-
Solutions
• p(c|x = 10) is a distribution,
p(c|x = 10) = p(c, x = 10)/p(x = 10)
• We calculate here with Matlab (or Octave)
>> x = 10; mu = [0, 0.2]; v = [0.01 0.03]; pc = [0.6 0.4];
>> pcx = pc .* (2*pi*v).^(-1/2) .* exp(-0.5*(x-mu).^2./v)
pcx =
     0     0
>> pcx/sum(pcx)
Warning: Divide by zero.
ans =
   NaN   NaN
• Problem: Underflow. We ALWAYS work with log-densities in practice
-
Solutions
[Figure: log p(x, c = 1) and log p(x, c = 2) plotted against x, for x between −0.5 and 1; logarithmic vertical axis from 10^−20 to 10^0.]
-
Solutions
• The log-density is
$$ \log p(x \mid c) = \log \mathcal{N}(x; \mu(c), v(c)) \equiv -\frac{1}{2} \log(2\pi v(c)) - \frac{1}{2} \frac{(x - \mu(c))^2}{v(c)} $$
>> lpc = log([0.6 0.4]);
>> lpcx = lpc - 1/2*log(2*pi*v) - 0.5*(x-mu).^2./v
   1.0e+003 *
   -4.9991   -1.6007
>> lpcx - log(sum(exp(lpcx)))
Warning: Log of zero.
ans =
   Inf   Inf
• The problem still persists: we exponentiate very negative numbers, which underflow to zero
-
Numerically Stable computation of $\log(\sum_i \exp(l_i))$
• Derivation
$$ L = \log\Big( \sum_i \exp(l_i) \Big) $$
$$ = \log\Big( \sum_i \exp(l_i) \frac{\exp(l^*)}{\exp(l^*)} \Big) $$
$$ = \log\Big( \exp(l^*) \sum_i \exp(l_i - l^*) \Big) $$
$$ = l^* + \log\Big( \sum_i \exp(l_i - l^*) \Big) $$
• We take $l^*$ as the maximum: $l^* = \max_i l_i$
• Exercise: Implement above as a function logsumexp(l)
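One possible solution to the exercise, as a hedged Matlab/Octave sketch: the last line of the derivation translates directly into code. It is written as an anonymous function so the snippet runs as a plain script; it does not handle the corner case where every entry is -Inf (the subtraction then gives NaN).

```octave
% logsumexp(l) = l* + log(sum_i exp(l_i - l*)) with l* = max_i l_i.
logsumexp = @(l) max(l) + log(sum(exp(l - max(l))));

% Reuse the numbers from the p(c | x = 10) example.
x = 10; mu = [0, 0.2]; v = [0.01 0.03];
lpc  = log([0.6 0.4]);
lpcx = lpc - 0.5 * log(2 * pi * v) - 0.5 * (x - mu).^2 ./ v;   % log p(c, x = 10)

lpost = lpcx - logsumexp(lpcx);   % log p(c | x = 10), finite for both classes
post  = exp(lpost);               % p(c | x = 10); the first entry underflows to 0
```

Because the largest term is shifted to exp(0) = 1 before summing, the sum inside the logarithm is always at least one and the logarithm never sees zero.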
-
Solutions
>> lpc = log([0.6 0.4]);
>> lpcx = lpc - 1/2*log(2*pi*v) - 0.5*(x-mu).^2./v
   1.0e+003 *
   -4.9991   -1.6007
>> lpcx - logsumexp(lpcx)
ans =
   1.0e+003 *
   -3.3984         0
>> exp(ans)
ans =
     0     1
-
Probability Models
+
Inference Algorithms
=
Bayesian Numerical Methods
-
Applications of Probability Models
• Classification
• Optimal Decision, given a loss function
• Finding interesting (hidden) structure
– Clustering, Segmentation
– Dimensionality Reduction
– Outlier Detection
• Finding a compact representation = Data Compression
• Prediction
-
Graphical Models
• formal languages for specification of probability models and associated inference algorithms
• historically, introduced in probabilistic expert systems (Pearl 1988) as a visual guide for representing expert knowledge
• today, a standard tool in machine learning, statistics and signal processing
-
Graphical Models
• provide graph based algorithms for derivations and computation
• pedagogical insight/motivation for model/algorithm construction
– Statistics: “Kalman filter models and hidden Markov models (HMM) are equivalent up to parametrisation”
– Signal processing: “Fast Fourier transform is an instance of sum-product algorithm on a factor graph”
– Computer Science: “Backtracking in Prolog is equivalent to inference in Bayesian networks with deterministic tables”
• Automated tools for code generation start to emerge, making the design/implement/test cycle shorter
-
Important types of Graphical Models
• Useful for Model Construction
– Directed Acyclic Graphs (DAG), Bayesian Networks
– Undirected Graphs, Markov Networks, Random Fields
– Influence diagrams
– ...
• Useful for Inference
– Factor Graphs
– Junction/Clique graphs
– Region graphs
– ...
-
Directed Graphical models (DAG)
-
DAG Example: Two dice
[Graph: nodes λ and y, with priors p(λ) and p(y), both point to node D, which carries p(D|λ, y).]
$$ p(D, \lambda, y) = p(D \mid \lambda, y)\, p(\lambda)\, p(y) $$
-
DAG with observations
[Graph: the same DAG with D observed; node D carries p(D = 9|λ, y).]
$$ \phi_D(\lambda, y) = p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) $$
-
Directed Graphical models
• Each random variable is associated with a node in the graph
• We draw an arrow A → B if B is conditioned on A, i.e., the factor is p(B| . . . , A, . . . ) and A ∈ parent(B)
• The edges tell us qualitatively about the factorization of the joint probability
• For N random variables x1, . . . , xN, the distribution admits
$$ p(x_1, \dots, x_N) = \prod_{i=1}^{N} p(x_i \mid \text{parent}(x_i)) $$
• Describes in a compact way an algorithm to “generate” the data – “Generative models”
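The “generative” reading of a DAG can be made concrete with ancestral sampling: draw every node after its parents, using its conditional distribution. The Matlab/Octave sketch below does this for the two-dice DAG and then approximates p(λ|D = 9) by keeping only the samples with D = 9. The sample size and the rejection step are illustrative choices made here, not part of the tutorial.

```octave
% Ancestral sampling from the two-dice DAG: parents first, then children.
N = 20000;                          % number of joint samples (arbitrary)
lambda = randi(6, N, 1);            % lambda ~ p(lambda), a root of the DAG
y      = randi(6, N, 1);            % y ~ p(y), the other root
D      = lambda + y;                % D ~ p(D | lambda, y), deterministic here

% Conditioning on D = 9 by keeping only matching samples (rejection).
keep = (D == 9);
post_lambda = accumarray(lambda(keep), 1, [6 1]) / sum(keep);  % approx p(lambda | D = 9)
```

The estimate fluctuates from run to run, but it should be close to the exact marginal posterior (0, 0, 1/4, 1/4, 1/4, 1/4) obtained earlier.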
-
Examples
Model         Factorization
Full          p(x1)p(x2|x1)p(x3|x1, x2)p(x4|x1, x2, x3)
Markov(2)     p(x1)p(x2|x1)p(x3|x1, x2)p(x4|x2, x3)
Markov(1)     p(x1)p(x2|x1)p(x3|x2)p(x4|x3)
(no name)     p(x1)p(x2|x1)p(x3|x1)p(x4)
Factorized    p(x1)p(x2)p(x3)p(x4)
[Figure: the “Structure” column shows, for each row, the corresponding DAG over the nodes x1, x2, x3, x4.]
Removing an edge eliminates a conditioning variable from the corresponding conditional probability factor.
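To see a factorization at work, the Markov(1) row can be turned into an explicit joint table. The Matlab/Octave sketch below assumes binary variables, a shared transition table `A` and made-up numbers; none of these come from the tutorial. It checks that the product of factors is a valid joint distribution and that the marginal p(x4) obtained by brute force agrees with propagating p(x1) through the chain.

```octave
% Markov(1) chain over four binary variables (all numbers are made up).
p1 = [0.7; 0.3];                 % p(x1)
A  = [0.9 0.1;                   % A(i,j) = p(x_{t+1} = j | x_t = i)
      0.2 0.8];

joint = zeros(2, 2, 2, 2);       % joint(x1, x2, x3, x4)
for a = 1:2
  for b = 1:2
    for c = 1:2
      for d = 1:2
        % p(x1) p(x2|x1) p(x3|x2) p(x4|x3)
        joint(a, b, c, d) = p1(a) * A(a, b) * A(b, c) * A(c, d);
      end
    end
  end
end

total    = sum(joint(:));                               % a valid joint sums to one
p4       = (p1' * A * A * A)';                          % p(x4) by chain propagation
p4_brute = reshape(sum(sum(sum(joint, 1), 2), 3), [], 1);  % p(x4) by brute force
```

The brute-force sum touches all 2^4 cells, while the propagation uses three small matrix products; this gap is what the inference algorithms of the later parts exploit.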
-
Undirected Graphical Models
-
Undirected Graphical Models
• Define a distribution by non-negative local compatibility functions φ(xα)
$$ p(x) = \frac{1}{Z} \prod_{\alpha} \phi(x_\alpha) $$
where α runs over cliques: fully connected subsets
• Examples
[Figure: two undirected graphs over the nodes x1, x2, x3, x4, one for each factorization below.]
$$ p(x) = \frac{1}{Z}\, \phi(x_1, x_2)\, \phi(x_1, x_3)\, \phi(x_2, x_4)\, \phi(x_3, x_4) $$
$$ p(x) = \frac{1}{Z}\, \phi(x_1, x_2, x_3)\, \phi(x_2, x_3, x_4) $$
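For a small model the normalisation constant Z can be computed by brute force. The Matlab/Octave sketch below does this for the first example (pairwise potentials on the edges x1–x2, x1–x3, x2–x4, x3–x4), assuming binary variables and a made-up potential table `phi` that is shared by all edges.

```octave
% Pairwise model on the edges x1-x2, x1-x3, x2-x4, x3-x4, binary variables.
% The potential table is made up: it favours equal neighbours.
phi = [2 1;
       1 2];                     % phi(a, b) >= 0, same table on every edge

p = zeros(2, 2, 2, 2);           % unnormalised product of potentials
for x1 = 1:2
  for x2 = 1:2
    for x3 = 1:2
      for x4 = 1:2
        p(x1, x2, x3, x4) = phi(x1, x2) * phi(x1, x3) * phi(x2, x4) * phi(x3, x4);
      end
    end
  end
end
Z = sum(p(:));                   % normalisation constant
p = p / Z;                       % p(x) = (1/Z) prod_alpha phi(x_alpha)
```

Unlike the conditional tables of a DAG, the potentials are not normalised individually; only the product, divided by Z, is a probability distribution.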
-
Possible Model Topologies
-
Factor graphs
-
Factor graphs [2]
• A bipartite graph. A powerful graphical representation of the inference problem
– Factor nodes: Black squares. Factor potentials (local functions) defining the posterior.
– Variable nodes: White nodes. Define collections of random variables
– Edges: denote membership. A variable node is connected to a factor node if a member variable is an argument of the local function.
[Factor graph: variable nodes λ and y; factor nodes p(λ), p(y) and p(D = 9|λ, y).]
$$ \phi_D(\lambda, y) = p(D = 9 \mid \lambda, y)\, p(\lambda)\, p(y) = \phi_1(\lambda, y)\, \phi_2(\lambda)\, \phi_3(y) $$
-
Exercise
• For the following graphical models, write down the factors of the joint distribution and plot an equivalent factor graph and an undirected graph.
[Figure: six directed graphs: Full (x1, . . . , x4), Markov(1) (x1, . . . , x4), HMM (h1, . . . , h4 and x1, . . . , x4), MIX (h and x1, . . . , x4), IFA (h1, h2 and x1, . . . , x4), Factorized (x1, . . . , x4).]
-
Answer (Markov(1))
[Figure: the directed chain over x1, x2, x3, x4; its factor graph with factors p(x1), p(x2|x1), p(x3|x2), p(x4|x3); and the undirected chain over x1, x2, x3, x4.]
$$ \underbrace{p(x_1)\, p(x_2 \mid x_1)}_{\phi(x_1, x_2)}\; \underbrace{p(x_3 \mid x_2)}_{\phi(x_2, x_3)}\; \underbrace{p(x_4 \mid x_3)}_{\phi(x_3, x_4)} $$
-
Answer (IFA – Factorial)
$$ p(h_1)\, p(h_2) \prod_{i=1}^{4} p(x_i \mid h_1, h_2) $$
[Figure: graphs over h1, h2 and x1, x2, x3, x4.]
-
Answer (IFA – Factorial)
[Figure: graph over h1, h2 and x1, x2, x3, x4.]
• We can also cluster nodes together
[Figure: graph with a single clustered node (h1, h2) and the nodes x1, x2, x3, x4.]
-
Inference and Learning
• Data set: $D = \{x_1, \dots, x_N\}$
• Model with parameter λ: $p(D \mid \lambda)$
• Maximum Likelihood (ML)
$$ \lambda_{ML} = \arg\max_{\lambda} \log p(D \mid \lambda) $$
• Predictive distribution
$$ p(x_{N+1} \mid D) \approx p(x_{N+1} \mid \lambda_{ML}) $$
-
Regularisation
• Prior: $p(\lambda)$
• Maximum a-posteriori (MAP): Regularised Maximum Likelihood
$$ \lambda_{MAP} = \arg\max_{\lambda} \log p(D \mid \lambda)\, p(\lambda) $$
• Predictive distribution
$$ p(x_{N+1} \mid D) \approx p(x_{N+1} \mid \lambda_{MAP}) $$
-
Bayesian Learning
• We treat parameters on the same footing as all other variables
• We integrate over unknown parameters rather than using point estimates (remember the many-dice example)
– Self-regularisation, avoids overfitting
– Natural setup for online adaptation
– Model selection
-
Bayesian Learning
• Predictive distribution
$$ p(x_{N+1} \mid D) = \int d\lambda\; p(x_{N+1} \mid \lambda)\, p(\lambda \mid D) $$
[Graph: λ is the common parent of x1, x2, . . . , xN, xN+1.]
• Bayesian learning is just inference ...
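The three predictive distributions (ML plug-in, MAP plug-in, full Bayesian integral) can be compared on a toy problem. The Matlab/Octave sketch below assumes coin-flip data with λ = p(x = 1) and a Beta(a, b) prior; the data, the prior and its parameters are illustrative assumptions made here and do not come from the tutorial.

```octave
% Coin-flip data (made up): x_i in {0, 1}, parameter lambda = p(x = 1).
x = [1 1 1 0 1];
N = numel(x);  k = sum(x);                   % N flips, k ones

% Assumed prior: Beta(a, b) on lambda (not from the slides).
a = 2;  b = 2;

lambda_ml  = k / N;                          % argmax of log p(D | lambda)
lambda_map = (k + a - 1) / (N + a + b - 2);  % argmax of log p(D | lambda) p(lambda)

% Bayesian predictive p(x_{N+1} = 1 | D) = integral of lambda * p(lambda | D),
% which for the Beta posterior Beta(a + k, b + N - k) is its mean.
pred_bayes = (k + a) / (N + a + b);

% The same integral done numerically on a grid, as a sanity check.
lam  = linspace(0, 1, 100001);
post = lam.^(k + a - 1) .* (1 - lam).^(N - k + b - 1);   % unnormalised p(lambda | D)
pred_grid = trapz(lam, lam .* post) / trapz(lam, post);
```

With only five observations the three answers differ noticeably (0.8, 5/7 and 2/3); as N grows they move closer together.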
-
Example Applications and Models
-
Audio Restoration
• During download or transmission, some samples of audio are lost
• Estimate missing samples given clean ones
[Figure: audio signal plotted over sample indices 0 to 500.]
-
Examples: Audio Restoration
$$ p(x_{\neg\kappa} \mid x_{\kappa}) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_{\kappa} \mid H)\, p(H) $$
H ≡ (parameters, hidden states)
[Graph: H is the parent of x¬κ (Missing) and xκ (Observed).]
[Figure: audio signal plotted over sample indices 0 to 500.]
-
Restoration
(Cemgil and Godsill 2005)
• Piano
– Signal with missing samples (37%)
– Reconstruction, 7.68 dB improvement
– Original
• Trumpet
– Signal with missing samples (37%)
– Reconstruction, 7.10 dB improvement
– Original
-
Interpolation of Images
[Graph: H is the parent of x¬κ and xκ.]
-
Interpolation of Images
[Figure: image panels labelled Data (25% Missing), Variational Bayes+ICM, NMF ML, NMF2.]
-
Medical Expert Systems
[Graph: nodes A, S (top row); T, L, B; E; X, D (bottom row). Layer labels: Causes, Diseases, Symptoms.]
-
Medical Expert Systems
Visit to Asia? Smoking?
Tuberculosis? Lung Cancer? Bronchitis?
Either T or L?
Positive X Ray? Dyspnoea?
-
Medical Expert Systems
Node               0 (%)   1 (%)
Visit to Asia?      99       1
Smoking?            50      50
Tuberculosis?       99       1
Lung Cancer?        94.5     5.5
Bronchitis?         55      45
Either T or L?      93.5     6.5
Positive X Ray?     89      11
Dyspnoea?           56.4    43.6
-
Medical Expert Systems
Node               0 (%)   1 (%)
Visit to Asia?      98.7     1.3
Smoking?            31.2    68.8
Tuberculosis?       90.8     9.2
Lung Cancer?        51.1    48.9
Bronchitis?         49.4    50.6
Either T or L?      42.4    57.6
Positive X Ray?      0     100
Dyspnoea?           35.9    64.1
-
Medical Expert Systems
Node               0 (%)   1 (%)
Visit to Asia?      98.5     1.5
Smoking?           100       0
Tuberculosis?       85.2    14.8
Lung Cancer?        85.8    14.2
Bronchitis?         70      30
Either T or L?      71.1    28.9
Positive X Ray?      0     100
Dyspnoea?           56      44
-
Model Selection: Variable selection in Polynomial Regression
• Given $D = \{t_j, x(t_j)\}_{j = 1 \dots J}$, what is the order N of the polynomial?
$$ x(t) = s_1 + s_2 t + s_3 t^2 + s_4 t^3 + \cdots + \epsilon(t) $$
[Figure: data points plotted for t between −1 and 1, with values between −0.2 and 1.2.]
-
Bayesian Variable Selection
[Graph: for i = 1, . . . , W, node ri is the parent of si, and s1, . . . , sW are the parents of x.]
$$ r_i \sim \mathcal{C}(r_i; \pi), \qquad s_i \mid r_i \sim \mathcal{N}(s_i; \mu(r_i), \Sigma(r_i)), \qquad i = 1, \dots, W $$
$$ x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R) $$
• Generalized Linear Model – Columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable.
-
Regression
[Figure: p(x, r1:W) plotted over the configurations of r1:W, ordered from “All on” to “All off”.]
-
Regression
[Figure: regression result for t between −1 and 1; legend: data, true, approx.]
-
Clustering
-
Clustering
π: Label probability
c1, c2, . . . , cN: Labels ∈ {a, b}
x1, x2, . . . , xN: Data Points
µa, µb: Cluster Centers
$$ (\mu_a^*, \mu_b^*, \pi^*) = \arg\max_{\mu_a, \mu_b, \pi} \sum_{c_{1:N}} \prod_{i=1}^{N} p(x_i \mid \mu_a, \mu_b, c_i)\, p(c_i \mid \pi) $$
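The sum over all labelings c1:N has 2^N terms, but because each factor depends on a single ci it equals a product of N two-term sums. The Matlab/Octave sketch below checks this identity on a tiny made-up data set; the unit-variance Gaussian observation model, the data and the parameter values are assumptions made here for illustration.

```octave
% Tiny made-up data set and parameters; unit-variance Gaussians are an assumption.
x     = [-1.2 0.1 2.3 1.9];           % data points x_1..x_N
mu    = [0 2];                        % cluster centres mu_a, mu_b
pi_ab = [0.5 0.5];                    % label probabilities p(c = a), p(c = b)
N     = numel(x);

npdf = @(x, m) exp(-0.5 * (x - m).^2) / sqrt(2 * pi);

% Per-point terms: w(c, i) = p(x_i | mu, c) p(c | pi).
w = [pi_ab(1) * npdf(x, mu(1));
     pi_ab(2) * npdf(x, mu(2))];

lik_fast = prod(sum(w, 1));           % product over i of the sum over c_i: O(N)

lik_brute = 0;                        % explicit sum over all 2^N labelings
for code = 0:(2^N - 1)
  c = mod(floor(code ./ 2.^(0:N-1)), 2) + 1;   % labels in {1, 2} for each point
  lik_brute = lik_brute + prod(w(sub2ind(size(w), c, 1:N)));
end
```

The factorized form is what makes the objective computable for realistic N; the maximisation over µa, µb and π still needs an iterative method such as EM.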
-
Computer vision / Cognitive Science
How many rectangles are there in this image?
[Figure: image with horizontal axis from 0 to 60 and vertical axis from 0 to 40.]
-
Computer Vision
How many people are there in these images?
-
Visual Tracking
[Figure: four video frames, each with horizontal axis from 20 to 60 and vertical axis from 20 to 140.]
-
Navigation, Robotics : Sensor Fusion
[Figure: 2-D and 3-D plots; recoverable axis labels: f, Lx, Ly.]
-
Navigation, Robotics : Sensor Fusion
GPS?t GPS status
Gt GPS reading
... Other sensors (magnetic, pressure, etc.)
lt Linear accelerator sensor
ωt Gyroscope
Et−1 Et Attitude Variables
Xt−1 Xt Linear Kinematic Variables
{ξ1:Nt}t Set of feature points (Camera Frame)
{x1:Mt}t Set of feature points (World Coordinates)
ρ(x) Global Static Map (Intensity function)
-
Computer Accompaniment
(Music Plus One, Raphael 2000, Dannenberg and Raphael 2006)
[Graph: rows of nodes c0, . . . , cK; s0, . . . , sK; y1, . . . , yK; $y^a_1, \dots, y^a_K$; a0, . . . , aK.]
-
References
[1] E. T. Jaynes. Probability Theory, The Logic of Science. Cambridge University Press, edited by G. L. Bretthorst, 2003.
[2] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory, 47(2):498–519, February 2001.
[3] D. J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.