Hierarchical Bayesian Models for Audio and Music Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. 8 December 2007 NIPS 07 Workshop on Music Cemgil Hierarchical Bayesian Models for Music Signal Analysis. Nips 2007 Workshops, Whistler, Canada, 1 December 2007


Hierarchical Bayesian Models for Audio and Music Signal Processing

A. Taylan Cemgil

Signal Processing and Communications Lab.

8 December 2007, NIPS 07 Workshop on Music

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007

Collaborators

• Onur Dikmen, Bogazici, Istanbul
• Paul Peeling, Cambridge
• Nick Whiteley, Cambridge
• Simon Godsill, Cambridge
• Cedric Fevotte, ENST Paris Telecom
• David Barber, UCL, London
• Bert Kappen, Nijmegen, The Netherlands


Statistical Approaches

• Probabilistic
• Hierarchical signal models to incorporate prior knowledge/inspiration from various sources
  - Physics (acoustics, physical models, ...)
  - Studies of human cognition and perception (masking, psychoacoustics, ...)
  - Musicology (musical constructs, harmony, tempo, form, ...)
• Consistent framework for developing inference algorithms
• Contrast to traditional/procedural approaches, where there is no clear distinction between "what" and "how"
• Need to overcome computational obstacles (time, memory)


Generative Models for audition

• Computer audition ⇔ inverse synthesis via Bayesian inference

$$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$$

Goal: Developing flexible prior structures for modelling nonstationary sources

  ∗ source separation, transcription
  ∗ restoration, interpolation, localisation, identification
  ∗ coding, compression, resynthesis, cross synthesis


Bayesian Source Separation

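• Joint estimation of sources given observations

Source model: v, parameters of the source prior

Observation model: λ, channel noise and mixing system

[Graphical model: sources s_{k,1} ... s_{k,n} ... s_{k,N} with prior parameters v; observations x_{k,1} ... x_{k,M} with parameters λ; plate over k = 1 ... K]

$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$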


Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal plot, horizontal axis 0 to 500]


Polyphonic Music Transcription

• from sound

[Figure: spectrogram, axes t/sec (0 to 8) and f/Hz (0 to 5000)]

• to score


Modelling and Computational issues

• Hierarchical
  - Signal level: pitch, onsets, timbre
  - Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  - Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  - Parameter learning: Which pitch, rhythm, tempo, meter, time signature?
  - Model selection: How many notes, harmonics, onsets, sections?


Generative Models for Music


[Diagram labels: Score, Expression, Piano-Roll, Signal]


Hierarchical Modeling of Music

[Figure: dynamic graphical model unrolled over time steps 1, 2, ..., t, with node chains v, k, h, m, g_j, r_j, n_j, x_j, y_j and observations y; plate size M]


Modelling levels

• Physical - acoustical
• Time domain - state space dynamical models
• Transform domain - Fourier representations, Generalised Linear model
• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?


Signal Models for Audio

• Time domain - state space dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision; computationally quite heavy
• Transform domain - Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)


Sinusoidal Modeling

• Sound is primarily about oscillations and resonance
• Cascade of second order systems
• Audio signals can often be compactly represented by sinusoidals

(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$

(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$

$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
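The two forms of the sinusoidal model can be checked against each other numerically. The sketch below is only an illustration: the function names and parameter values are made up, and it assumes the complex amplitudes are chosen as c_k = α_k exp(jφ_k), so that the real part of the complex form reproduces the real form.

```python
import cmath
import math


def damped_sinusoids_real(alpha, gamma, omega, phi, n_samples):
    """Real form: y_n = sum_k alpha_k * exp(-gamma_k * n) * cos(omega_k * n + phi_k)."""
    return [
        sum(a * math.exp(-g * n) * math.cos(w * n + p)
            for a, g, w, p in zip(alpha, gamma, omega, phi))
        for n in range(n_samples)
    ]


def damped_sinusoids_complex(c, gamma, omega, n_samples):
    """Complex form: y_n = sum_k c_k * (exp(-gamma_k + j * omega_k)) ** n."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]
    return [sum(ck * pk ** n for ck, pk in zip(c, poles))
            for n in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical parameters for two damped partials.
    alpha, gamma = [1.0, 0.5], [0.01, 0.02]
    omega, phi = [0.3, 0.9], [0.0, 1.0]
    c = [a * cmath.exp(1j * p) for a, p in zip(alpha, phi)]
    y_real = damped_sinusoids_real(alpha, gamma, omega, phi, 5)
    y_complex = damped_sinusoids_complex(c, gamma, omega, 5)
    for yr, yc in zip(y_real, y_complex):
        print(round(yr, 6), round(yc.real, 6))
```

Because the model is linear in c once the dampings and frequencies are fixed, the last line of the slide, y = F c, is an ordinary linear regression problem in the amplitudes.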


State space Parametrisation

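$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$

[Graphical model: state chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]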
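As a small sanity check of the state space parametrisation, the sketch below runs the recursion x_{n+1} = A x_n, y_n = C x_n with a diagonal A and compares it with the closed-form complex sinusoidal sum. The function names and the numerical values are illustrative assumptions, not taken from the talk.

```python
import cmath


def simulate_state_space(c, gamma, omega, n_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with diagonal A and C = (1, ..., 1)."""
    a_diag = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]
    x = list(c)  # x_0 = (c_1, ..., c_p)
    y = []
    for _ in range(n_samples):
        y.append(sum(x))                          # y_n = C x_n
        x = [a * xi for a, xi in zip(a_diag, x)]  # x_{n+1} = A x_n
    return y


def closed_form(c, gamma, omega, n_samples):
    """Direct evaluation of y_n = sum_k c_k * (exp(-gamma_k + j * omega_k)) ** n."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]
    return [sum(ck * pk ** n for ck, pk in zip(c, poles))
            for n in range(n_samples)]


if __name__ == "__main__":
    c = [1.0 + 0.5j, 0.3 - 0.2j]          # hypothetical complex amplitudes
    gamma, omega = [0.01, 0.05], [0.4, 1.1]
    for a, b in zip(simulate_state_space(c, gamma, omega, 4),
                    closed_form(c, gamma, omega, 4)):
        print(a, b)
```

The recursion needs only one complex multiplication per partial and per sample, which is what makes Kalman-style filtering applicable once noise terms are added.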


Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal plot, horizontal axis 0 to 500]


Audio Interpolation

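$$p(x_{\neg\kappa} \mid x_{\kappa}) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_{\kappa} \mid H)\, p(H)$$

$H \equiv$ (parameters, hidden states)

[Graphical model: H is the common parent of x_{¬κ} (missing) and x_κ (observed)]

[Figure: signal plot, horizontal axis 0 to 500]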


Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

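[Graphical model: for each frequency band ν = 0 ... W − 1, parameters A^ν, Q^ν and a state chain s^ν_0 ... s^ν_k ... s^ν_{K−1}; observations x_0 ... x_k ... x_{K−1}]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A^\nu s^\nu_{k-1},\, Q^\nu), \qquad A^\nu \sim \mathcal{N}\!\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$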


Inference: Structured Variational Bayes

[Graphical model of the structured approximation: factors q(A^α) and q(Q^α), a chain $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each α ∈ C, and q(x_k)]

• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α
  - Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
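For readers unfamiliar with Gauss-Seidel, the following minimal sketch shows the same "subtract the contribution of all other components, then refit this one" pattern on a plain linear system A x = b. It is a generic textbook iteration on a made-up 3 x 3 system, not the structured variational update itself.

```python
def gauss_seidel(a, b, n_sweeps=50):
    """Solve a x = b by sweeping over the unknowns one at a time.

    Each inner step removes the current prediction of all other components
    from b[i] and then fits x[i] to the remaining residual.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(n):
            residual = b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = residual / a[i][i]
    return x


if __name__ == "__main__":
    # Hypothetical diagonally dominant system whose solution is (1, 2, 3).
    a = [[4.0, 1.0, 0.0],
         [1.0, 3.0, 1.0],
         [0.0, 1.0, 2.0]]
    b = [6.0, 10.0, 8.0]
    print(gauss_seidel(a, b))
```

Convergence is guaranteed here because the example matrix is strictly diagonally dominant; in general the iteration can fail to converge.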


Restoration

• Piano
  - Signal with missing samples (37)
  - Reconstruction, 768 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37)
  - Reconstruction, 710 dB improvement
  - Original


Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Graphical model: for each ν = 1 ... W, an indicator chain r^ν_0 ... r^ν_k ... r^ν_K and a state chain θ^ν_0 ... θ^ν_k ... θ^ν_K; observations y_k ... y_K]

• Generalises Source-filter models


Harmonic model with changepoints

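$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor $0 < \rho_k < 1$, framelength $N$ and damped sinusoidal basis matrix $C$ of size $N \times 2H$.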
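To make the "reg" and "new" regimes concrete, here is a heavily simplified sampling sketch of the changepoint model. Everything beyond the three sampling statements above is an assumption made for illustration: a single harmonic (H = 1), one sample per step (N = 1), independent changepoint indicators instead of a Markov chain, isotropic noise, and a scalar observation that reads out the first state component.

```python
import math
import random


def rotation_block(omega, rho):
    """Damped rotation G = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = math.cos(omega), math.sin(omega)
    return [[rho * c, -rho * s], [rho * s, rho * c]]


def sample_changepoint_oscillator(n_steps, omega, rho, p_change,
                                  q_std, s_std, r_std, seed=0):
    """Draw indicators r_k and observations y_k from the simplified model."""
    rng = random.Random(seed)
    g = rotation_block(omega, rho)
    theta = [rng.gauss(0.0, s_std), rng.gauss(0.0, s_std)]
    r_seq, y_seq = [], []
    for _ in range(n_steps):
        r = 1 if rng.random() < p_change else 0
        if r == 1:
            # "new" regime: the state is reinitialised from N(0, S)
            theta = [rng.gauss(0.0, s_std), rng.gauss(0.0, s_std)]
        else:
            # "reg" regime: damped rotation plus process noise, N(A theta, Q)
            theta = [
                g[0][0] * theta[0] + g[0][1] * theta[1] + rng.gauss(0.0, q_std),
                g[1][0] * theta[0] + g[1][1] * theta[1] + rng.gauss(0.0, q_std),
            ]
        r_seq.append(r)
        y_seq.append(theta[0] + rng.gauss(0.0, r_std))  # y_k = C theta_k + noise
    return r_seq, y_seq


if __name__ == "__main__":
    r_seq, y_seq = sample_changepoint_oscillator(
        n_steps=200, omega=0.2, rho=0.99, p_change=0.02,
        q_std=0.01, s_std=1.0, r_std=0.05)
    print(sum(r_seq), "changepoints;", "first samples:", y_seq[:3])
```

Each changepoint restarts a decaying oscillation, which is the mechanism the model uses to represent note onsets.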


Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ Number of Gaussian kernels to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially


Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  - Intuition: When a changepoint occurs, the state vector θ is reinitialized
  ⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0 ... θ_5; observations y_1 ... y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
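The intuition behind the polynomial growth can be illustrated by brute force. The sketch below assumes the reset behaviour described above: once a changepoint occurs, everything before it is forgotten, so the Gaussian over θ_k given an indicator sequence depends only on the position of the most recent changepoint. The helper names are made up for this example.

```python
from itertools import product


def last_changepoint(r):
    """1-based index of the most recent changepoint in r, or 0 if there is none."""
    for k in range(len(r), 0, -1):
        if r[k - 1] == 1:
            return k
    return 0


def count_components(k):
    """Return (number of indicator sequences, number of distinct Gaussians).

    Sequences that share the same last changepoint lead to the same
    conditional density over theta_k under the reset assumption.
    """
    n_sequences = 0
    distinct = set()
    for r in product((0, 1), repeat=k):
        n_sequences += 1
        distinct.add(last_changepoint(r))
    return n_sequences, len(distinct)


if __name__ == "__main__":
    for k in (2, 5, 10):
        print(k, count_components(k))
```

The first count doubles with every time step while the second grows by one, which is why the mixture can be represented exactly in this special case.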


Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m
• At each time k, the process can be in one of {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]


Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels over sample index 100 to 1000]

• If pitch is constant, exact inference is possible


Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result, horizontal axis 500 to 3500]


Tracking Pitch Variations

• Allow m to change with k

[Figure: horizontal axis 50 to 500]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)


Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent processes ν (frequency) against frame index k, with the observed signal x_k]

• Each latent changepoint process ν = 1 ... W corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll".


Single time slice - Bayesian Variable Selection

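$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R)$$

$$C \equiv [\, C_1 \ \dots \ C_i \ \dots \ C_W \,]$$

[Graphical model: indicators r_1 ... r_W, coefficients s_1 ... s_W, observation x]

• Generalized Linear Model - columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)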
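For a very small W the posterior over the indicators can be enumerated directly, which shows where the 2^W cost comes from. The sketch below makes simplifying assumptions that are not in the slides: the observation x is a scalar, each active coefficient has the same prior variance, and the basis "vectors" are therefore plain numbers c_i. With the coefficients integrated out, each configuration r contributes a zero-mean Gaussian marginal likelihood with variance R + σ² Σ_{i on} c_i².

```python
import math
from itertools import product


def log_gauss(x, var):
    """Log density of a zero-mean Gaussian with variance var, evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)


def indicator_posterior(x, c, pi_on, sigma2, noise_var):
    """Posterior p(r | x) over all 2^W on/off configurations (scalar x)."""
    log_w = {}
    for r in product((0, 1), repeat=len(c)):
        # Marginal variance of x once the active coefficients are integrated out.
        var = noise_var + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        n_on = sum(r)
        log_prior = (n_on * math.log(pi_on)
                     + (len(c) - n_on) * math.log(1.0 - pi_on))
        log_w[r] = log_prior + log_gauss(x, var)
    m = max(log_w.values())
    z = sum(math.exp(v - m) for v in log_w.values())
    return {r: math.exp(v - m) / z for r, v in log_w.items()}


if __name__ == "__main__":
    post = indicator_posterior(x=1.5, c=[1.0, 0.5, 2.0],
                               pi_on=0.3, sigma2=1.0, noise_var=0.1)
    for r, p in sorted(post.items(), key=lambda kv: -kv[1])[:3]:
        print(r, round(p, 3))
```

The loop visits every configuration, so the running time doubles with each additional basis function; this is the intractability the slide refers to.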


Factorial Switching State space model

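$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{(Changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{(Latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\theta_{k,1:W},\, R) \qquad \text{(Observation)}$$

[Graphical model: for each ν = 1 ... W, an indicator chain r^ν_0 ... r^ν_k ... r^ν_K and a state chain s^ν_0 ... s^ν_k ... s^ν_K; observations y_k ... y_K]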


Synthetic Data

[Figure: synthetic data example; panels over frame index k showing x, frequency and ν]


Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable - computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary


Modelling levels

• Physical - acoustical
• Time domain - state space dynamical models
• Transform domain - Fourier representations, Generalised Linear model
• Feature based


Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: two spectrograms, (Speech) and (Piano); axes t/sec and f/Hz]

• Spectrogram displays $\log|s_k|$ or $|s_k|^2$ (of STFT)


Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
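A compact way to see the factorisation at work is the classic multiplicative update scheme of Lee and Seung for the generalised Kullback-Leibler divergence. The sketch below is a generic textbook version on a tiny made-up matrix, written with plain lists; it is not claimed to be the variant used in the cited works, and the small constant eps is an added safeguard against division by zero.

```python
import math
import random


def matmul(w, s):
    """Product of a (F x J) template matrix and a (J x T) excitation matrix."""
    return [[sum(w[i][j] * s[j][t] for j in range(len(s)))
             for t in range(len(s[0]))] for i in range(len(w))]


def kl_divergence(x, xh):
    """Generalised KL divergence sum(x log(x / xh) - x + xh)."""
    total = 0.0
    for x_row, h_row in zip(x, xh):
        for a, b in zip(x_row, h_row):
            total += (a * math.log(a / b) if a > 0 else 0.0) - a + b
    return total


def nmf_kl(x, rank, n_iter=200, seed=0):
    """Factorise x ~ W S with multiplicative updates (all entries stay positive)."""
    rng = random.Random(seed)
    n_freq, n_time = len(x), len(x[0])
    eps = 1e-12
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_freq)]
    s = [[rng.uniform(0.1, 1.0) for _ in range(n_time)] for _ in range(rank)]
    for _ in range(n_iter):
        xh = matmul(w, s)
        s = [[s[j][t]
              * sum(w[i][j] * x[i][t] / (xh[i][t] + eps) for i in range(n_freq))
              / (sum(w[i][j] for i in range(n_freq)) + eps)
              for t in range(n_time)] for j in range(rank)]
        xh = matmul(w, s)
        w = [[w[i][j]
              * sum(s[j][t] * x[i][t] / (xh[i][t] + eps) for t in range(n_time))
              / (sum(s[j][t] for t in range(n_time)) + eps)
              for j in range(rank)] for i in range(n_freq)]
    return w, s


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 1.0, 1.0]]  # toy "spectrogram"
    w, s = nmf_kl(x, rank=2)
    print("KL after fitting:", kl_divergence(x, matmul(w, s)))
```

The rows of x play the role of frequency bins and the columns the role of time frames, so the columns of W act as spectral templates and the rows of S as their excitations.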


Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency


Prior structures on time-frequency Energy distributions

• Main idea: Spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$


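One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N}, observation x_k; plate over k = 1 ... K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$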
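The responsibility formula is simple enough to evaluate by hand, but a few lines of code make the bookkeeping explicit. The sketch below applies the posterior above to a single time-frequency atom with known variances; the function name and the numbers are illustrative.

```python
def posterior_sources(x, variances):
    """Posterior of each source coefficient given x = sum_n s_n.

    Returns the responsibilities kappa_n, the posterior means kappa_n * x
    and the posterior variances v_n * (1 - kappa_n).
    """
    total = sum(variances)
    kappa = [v / total for v in variances]
    means = [k * x for k in kappa]
    post_vars = [v * (1.0 - k) for v, k in zip(variances, kappa)]
    return kappa, means, post_vars


if __name__ == "__main__":
    kappa, means, post_vars = posterior_sources(x=2.0, variances=[1.0, 3.0])
    print(kappa, means, post_vars)
```

The posterior means always add up to the observation, so the separation never loses or invents energy at an atom; all the modelling effort goes into estimating the variances.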


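One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N}, observation x_k; plate over k = 1 ... K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$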


Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: two panels of densities p(x) on 0 ≤ x ≤ 5; legend of the first panel: (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); legend of the second panel: (a, b) = (1, 1), (1, 0.5), (2, 1)]

$$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
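The two definitions can be transcribed almost literally. The sketch below implements the log densities in the (a, z) parametrisation used above and checks the change-of-variables relation between them: if X follows the Gamma density, then 1/X follows the inverse Gamma density with the same parameters. The function names are chosen for this example.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return (-(a + 1.0) * math.log(x) - 1.0 / (z * x)
            - a * math.log(z) - math.lgamma(a))


if __name__ == "__main__":
    x, a, z = 0.7, 2.5, 1.3
    # Density of 1/X obtained from the Gamma density: G(1/x) / x^2.
    via_gamma = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
    print(log_inv_gamma_pdf(x, a, z), via_gamma)
```

This reciprocal relation also gives a simple way to draw inverse Gamma samples from any Gamma sampler.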


Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k / a_z)$$

[Chain diagram: z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with edges labelled a_z, a, a_z]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
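Drawing from the chain only needs a Gamma sampler, because a draw from $\mathcal{IG}(x; a, z)$ is the reciprocal of a Gamma draw with shape a and scale z. The sketch below uses Python's `random.gammavariate(alpha, beta)`, whose second argument is a scale; the starting value z1 and the function names are assumptions made for illustration.

```python
import random


def sample_inverse_gamma(rng, a, z):
    """Draw from IG(x; a, z) as the reciprocal of a Gamma(shape a, scale z) draw."""
    return 1.0 / rng.gammavariate(a, z)


def sample_igmc(n_steps, a, a_z, z1=1.0, seed=0):
    """Draw v_1, ..., v_K from the inverse Gamma-Markov chain."""
    rng = random.Random(seed)
    z = z1
    v_seq = []
    for _ in range(n_steps):
        v = sample_inverse_gamma(rng, a, z / a)      # v_k | z_k
        v_seq.append(v)
        z = sample_inverse_gamma(rng, a_z, v / a_z)  # z_{k+1} | v_k
    return v_seq


if __name__ == "__main__":
    draws = sample_igmc(n_steps=10, a=10.0, a_z=10.0)
    print([round(v, 3) for v in draws])
```

Larger shape parameters make consecutive variances more similar, so the sampled log-variance path looks like a slowly drifting random walk.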


Gamma Chains: typical draws

[Figure: typical draws of log v_k against k (1 to 1000) for a = 10 with a_z = 10, a_z = 4 and a_z = 40]


Gamma Chains with changepoints: typical draws

[Figure: typical draws of log v_k against k (1 to 1000) for a = 10 with a_z = 10, a_z = 4 and a_z = 40]


Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1}) \qquad \text{(Singletons)}$$

[Chain diagram: z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with edges labelled a_z, a, a_z]


Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\Big(\sum_j a_{ij} + 1\Big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$


Possible Model Topologies


Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering
• Deterministic
  - Variational Bayes

In all these, conjugacy helps.


VB or Gibbs

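[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$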


VB or Gibbs

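[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with likelihood factors p(y_1 | v_1), p(y_2 | v_2)]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$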
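A Gibbs sampler for the chain can be sketched in a few lines once a likelihood is fixed. The code below assumes zero-mean Gaussian observations y_k ~ N(0, v_k) and the inverse Gamma-Markov chain prior v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z). Under these assumptions both full conditionals are inverse Gamma, with densities proportional to x^{-(shape+1)} exp(-rate/x): for z_k the shape is a_z + a and the rate a_z/v_{k-1} + a/v_k; for v_k the shape is a + a_z + 1/2 and the rate y_k²/2 + a/z_k + a_z/z_{k+1}. These conditionals were worked out for this example and are not quoted from the slides; the fixed boundary value v0, the sweep counts and the function names are likewise assumptions.

```python
import math
import random


def draw_inverse_gamma(rng, shape, rate):
    """Draw x with density proportional to x**-(shape + 1) * exp(-rate / x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


def gibbs_variances(y, a, a_z, n_sweeps=300, burn_in=100, v0=1.0, seed=0):
    """Estimate the variances v_k; returns the geometric posterior mean per k."""
    rng = random.Random(seed)
    n = len(y)
    v = [1.0] * n
    z = [1.0] * n
    log_v_sum = [0.0] * n
    for sweep in range(n_sweeps):
        # All z_k are conditionally independent given v.
        for k in range(n):
            v_prev = v0 if k == 0 else v[k - 1]
            z[k] = draw_inverse_gamma(rng, a_z + a, a_z / v_prev + a / v[k])
        # All v_k are conditionally independent given z and y.
        for k in range(n):
            shape = a + 0.5
            rate = 0.5 * y[k] * y[k] + a / z[k]
            if k + 1 < n:  # the last v has no right-hand neighbour z_{k+1}
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = draw_inverse_gamma(rng, shape, rate)
        if sweep >= burn_in:
            for k in range(n):
                log_v_sum[k] += math.log(v[k])
    return [math.exp(s / (n_sweeps - burn_in)) for s in log_v_sum]


if __name__ == "__main__":
    # Hypothetical signal: quiet first half, loud second half.
    y = [0.1 * (-1) ** k for k in range(20)] + [5.0 * (-1) ** k for k in range(20)]
    estimate = gibbs_variances(y, a=4.0, a_z=4.0)
    print(round(estimate[0], 3), round(estimate[-1], 3))
```

Because the z and v variables alternate along the chain, each half-sweep updates a set of mutually independent variables, which keeps the sampler simple to implement.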


Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: six spectrogram panels titled "X" (Noisy), "Xorg" (Original), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997"]


Denoising - Music

[Figure: five spectrogram panels titled "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208"]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise


Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity
• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: spectrogram panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]


Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                 s1                      s2
             SDR    SIR    SAR     SDR    SIR    SAR
VB           -474   -328   567     -158   1546   -137
Gibbs        -45    -262   457     105    1246   161
GibbsEM      -423   -242   482     134    1313   185
Pre-trained  -404   -315   813     356    1144   464
Oracle       614    1716   658     1266   1995   136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters $a_z$ and $a$ trained on isolated sources


Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                 s1                      s2
             SDR    SIR    SAR     SDR    SIR    SAR
VB           -78    -622   453     -235   184    -225
Gibbs        -846   -753   693     -404   1459   -383
GibbsEM      -774   -619   462     -114   1662   -097
Pre-trained  -64    -539   695     38     1639   414
Oracle       121    329    1214    2113   3389   2137


Harmonic-Transient Decomposition

[Figure: spectrogram panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν); two sets of audio examples labelled (Original), (Hor), (Vert)]


Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν/Hz, 0 to 4000 Hz]


Chord Detection

[Figure, top panel: MDCT of piano chord 41, 48, 51, 56; Frequency/Hz (0 to 4000) against Time τ/s]

[Figure, bottom panel: $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j (40 to 65) against Time τ/s]


Multichannel Source Separation

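• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1 \dots N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \qquad m = 1 \dots M, \quad k = 1 \dots K$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$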


Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source n


Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the mixture (Mix); axes t/sec and f/Hz]


Reconstructions

[Figure: spectrograms of the reconstructed sources (Speech), (Piano), (Guitar); axes t/sec and f/Hz]


Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior


Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: sampler traces of a, λ and r over 2000 epochs]


Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized/human readable score
  - ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in a HMM


Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with score position m_k = 1 ... 8 on the horizontal axis and tempo level n_k = 1 ... 3 on the vertical axis; bar lines mark 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)
• Directed arcs denote state transitions with positive probability
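Filtering in such a lattice is the standard HMM forward recursion: predict with the transition model, reweight by the observation likelihood, normalise. The sketch below is a toy version under explicit assumptions that are not stated on the slide: the position advances by the current tempo level and wraps around at the end of the bar, the tempo level changes by one step with a small probability, and onset counts are Poisson with a hypothetical intensity that is high on two beat positions per bar.

```python
import math


def poisson_pmf(y, mu):
    """Probability of observing count y under a Poisson with intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def beat_intensity(m):
    """Hypothetical intensity: onsets are likely at positions 0 and 4 of the bar."""
    return 2.0 if m % 4 == 0 else 0.1


def transition(state, n_positions, n_tempi, p_change):
    """Successor distribution of a state (m, n) under the assumed dynamics."""
    m, n = state
    m_next = (m + n) % n_positions
    out = {}
    for n_next, p in ((n, 1.0 - p_change),
                      (n - 1, p_change / 2.0),
                      (n + 1, p_change / 2.0)):
        if not 1 <= n_next <= n_tempi:
            n_next = n  # at the boundary the tempo level stays put
        out[(m_next, n_next)] = out.get((m_next, n_next), 0.0) + p
    return out


def forward_filter(observations, n_positions, n_tempi, p_change, intensity):
    """Return the filtering distributions p(x_k | y_{1:k}) for k = 1, 2, ..."""
    states = [(m, n) for m in range(n_positions) for n in range(1, n_tempi + 1)]
    belief = {s: 1.0 / len(states) for s in states}  # uniform initial belief
    history = []
    for y in observations:
        predicted = dict.fromkeys(states, 0.0)
        for s, p in belief.items():
            for s_next, q in transition(s, n_positions, n_tempi, p_change).items():
                predicted[s_next] += p * q
        updated = {s: predicted[s] * poisson_pmf(y, intensity(s[0])) for s in states}
        norm = sum(updated.values())
        belief = {s: w / norm for s, w in updated.items()}
        history.append(belief)
    return history


if __name__ == "__main__":
    onsets = [1 if k % 4 == 0 else 0 for k in range(1, 25)]
    final = forward_filter(onsets, 8, 3, 0.1, beat_intensity)[-1]
    print(max(final, key=final.get))
```

The cost per time step is proportional to the number of states times the number of successors per state, which stays small because each state has at most three successors.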


Bar position Pointer - transition model

[Figure: three panels showing the transition density p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]


Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence: the state distribution over Bar Position and Tempo Level at successive steps: p(x_1), p(x_2), p(x_3), p(x_4), then p(x_5 | y_{1:5}) with y_5 = 0, and p(x_10)]


Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity μ_k against m_k (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]


Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3, r_0 ... r_3; intensities λ_1 ... λ_3; observations y_1 ... y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator


Filtering

[Figure: four panels against Frame Index k: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

[Figure: four panels against frame index k (0 to 450). "Observed Data": y_k. $\log p(m_k \mid y_{1:K})$ over bar position m_k. $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min (60 to 180). $p(r_k \mid y_{1:K})$ for the rhythmic patterns Triplets and Duplets.]

Time Signature

[Figure: four panels. "Observed Data": sample value against time (s, 0 to 12). $\log p(m_k \mid z_{1:K})$ over bar position m_k. $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min. $p(\theta_k \mid z_{1:K})$ for the time signatures 4/4 and 3/4. The last three panels are plotted against frame index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal x_t against t.]

Score-Performance matching - Graphical Model

[Graphical model: nodes t_1, t_2, ..., t_K; r_1, r_2, ..., r_K; λ_1, λ_2, ..., λ_K; and, for each ν = 1, ..., W, nodes v_{ν,1}, ..., v_{ν,K} and s_{ν,1}, ..., s_{ν,K}. An inset labels r_k with the numbers 1 to 8.]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (-12 to 0) against frequency ν (Hz, 0 to 4000).]

Score-Performance matching

[Figure: two panels. "Spectrogram Data": frequency (Hz, 0 to 4000) against time (s, 0 to 14). "MIDI Data": MIDI note (55 to 85) against score position.]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against time (s, 1 to 4). $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80). $\log \lambda$ for $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$. "MDCT of audio (source: Daniel-Ben Pienaar)": frequency (Hz, 0 to 4000).]

Summary

• "Time Domain" - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields
  - Models on (orthogonal) transform coefficients, Energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  - Prior structures for other observation models, NMF


Generative Models for audition

• Computer audition ⇔ inverse synthesis via Bayesian inference

$$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$$

Goal: Developing flexible prior structures for modelling nonstationary sources

  ∗ source separation, transcription
  ∗ restoration, interpolation, localisation, identification
  ∗ coding, compression, resynthesis, cross synthesis

Bayesian Source Separation

• Joint estimation of Sources given Observations

Source Model: v = Parameters of Source prior

[Graphical model: v and sources s_{k,1}, ..., s_{k,n}, ..., s_{k,N}; observations x_{k,1}, ..., x_{k,M}; plate over k = 1, ..., K; λ attached to the observations.]

Observation Model: λ = Channel noise, mixing system

$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$

Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...

[Figure: waveform with missing segments, sample index 0 to 500.]

Polyphonic Music Transcription

• from sound

[Figure: spectrogram, f/Hz (0 to 5000) against t/sec (0 to 8). (S)]

• to score

[Audio: Decimated_chopin.wav]

Modelling and Computational issues

• Hierarchical
  - Signal level: pitch, onsets, timbre
  - Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  - Cognitive level: expression, genre, form, style, mood, emotion

• Uncertainty
  - Parameter Learning: Which pitch, rhythm, tempo, meter, time signature?
  - Model Selection: How many notes, harmonics, onsets, sections?

Generative Models for Music

[Figure: generative hierarchy with levels Score, Expression, Piano-Roll, Signal.]

Hierarchical Modeling of Music

[Graphical model: chains over time 1, 2, ..., t for v, k, h, m, for the indexed variables g_j, r_j, n_j, x_j, y_j, and for the observations y; a plate is labelled M.]

Modelling levels

• Physical - acoustical
• Time domain - state space, dynamical models
• Transform domain - Fourier representations, Generalised Linear model
• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain - state space, dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, Physically realistic
  - Analysis down to sample precision, Computationally quite heavy

• Transform domain - Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients, Energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance
• Cascade of second order systems
• Audio signals can often be compactly represented by sinusoids

(real)
$$y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

(complex)
$$y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$

State space Parametrisation

$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \dots & 1 \end{pmatrix}}_{C} x_n$$

[Graphical model: state chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K.]
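As a quick check of the state space parametrisation, the sketch below synthesises the same damped sinusoids twice: once from the real-valued sum, and once by iterating the diagonal recursion $x_{n+1} = A x_n$, $y_n = C x_n$. It assumes the complex amplitudes are $c_k = \alpha_k e^{j\phi_k}$ and that the real signal is the real part of the complex output; the function names and parameter values are illustrative only.

```python
import cmath
import math

def synth_real(alpha, gamma, omega, phi, n_samples):
    """Direct evaluation of sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return [sum(a * math.exp(-g * n) * math.cos(w * n + p)
                for a, g, w, p in zip(alpha, gamma, omega, phi))
            for n in range(n_samples)]

def synth_state_space(alpha, gamma, omega, phi, n_samples):
    """Same signal from the recursion x_{n+1} = A x_n, y_n = C x_n (A diagonal)."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]  # diagonal of A
    x = [a * cmath.exp(1j * p) for a, p in zip(alpha, phi)]            # x_0 = c
    y = []
    for _ in range(n_samples):
        y.append(sum(x).real)                  # C = (1 1 ... 1); keep the real part
        x = [pole * xi for pole, xi in zip(poles, x)]
    return y

alpha, gamma = [1.0, 0.5], [0.01, 0.02]
omega, phi = [0.3, 0.7], [0.0, 1.0]
y_direct = synth_real(alpha, gamma, omega, phi, 50)
y_recursive = synth_state_space(alpha, gamma, omega, phi, 50)
print(max(abs(a - b) for a, b in zip(y_direct, y_recursive)))
```

The two outputs agree up to rounding error, which is why the sinusoidal model can be handled with standard state space machinery.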


Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

$H \equiv$ (parameters, hidden states)

[Graphical model: H with children $x_{\neg\kappa}$ (Missing) and $x_\kappa$ (Observed).]

[Figure: waveform, sample index 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each ν = 0, ..., W-1, parameters A_ν, Q_ν and a state chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$; observations x_0, ..., x_k, ..., x_{K-1}.]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\ A_\nu s^\nu_{k-1},\ Q_\nu), \qquad A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\ \Psi \right)$$

Inference: Structured Variational Bayes

[Graphical model of the structured approximation: factors $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each $\alpha \in C$, together with $q(x_k)$.]

• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α
  - Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
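The Gauss-Seidel analogy can be made concrete with a generic sketch. This is the textbook iteration for a linear system $Ax = b$, not the variational updates from the talk: each coordinate is refitted to the residual left by all the other coordinates, which mirrors "subtract the prediction of the other bands, fit this band, iterate". The matrix and right-hand side are made-up illustrative values.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by cycling through coordinates and refitting each to the residual."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Residual after removing the contribution of every other coordinate,
            # using the most recent values (this is what distinguishes it from Jacobi).
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = residual / A[i][i]
    return x

# Diagonally dominant example, for which the iteration is known to converge.
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x = gauss_seidel(A, b)
print(x)
```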

Restoration

• Piano
  - Signal with missing samples (37)
  - Reconstruction, 768 dB improvement
  - Original

• Trumpet
  - Signal with missing samples (37)
  - Reconstruction, 710 dB improvement
  - Original

[Audio: piano_missing.wav, piano_kalman.wav, piano_clean.wav, trumpet_missing.wav, trumpet_kalman.wav, trumpet_clean.wav]

Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Graphical model: for each ν = 1, ..., W, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$; observations y_k, ..., y_K.]

• Generalises Source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\,\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\,\mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ Number of Gaussian kernels to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  - Intuition: When a changepoint occurs, the state vector θ is reinitialized
  ⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: changepoint indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
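A minimal sketch of why reinitialisation keeps exact filtering cheap, under simplifying assumptions that are not from the talk: a scalar state with $A = 1$ and $Q = 0$, so $\theta_k = \theta_{k-1}$ when $r_k = 0$ and $\theta_k \sim \mathcal{N}(0, S)$ when $r_k = 1$, with $y_k \sim \mathcal{N}(\theta_k, R)$ and a constant changepoint probability. All "reset" branches share the same state distribution, so they merge into a single component and the mixture gains only one Gaussian per step.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def changepoint_filter(ys, p_change=0.1, S=4.0, R=0.25):
    """Exact filtering; components are (weight, mean, variance), one per last changepoint time."""
    comps = [(1.0, 0.0, S)]          # state distribution before any data
    sizes = []
    for y in ys:
        total = sum(w for w, _, _ in comps)
        # r_k = 0: every existing component carries on unchanged.
        predicted = [(w * (1.0 - p_change), m, v) for w, m, v in comps]
        # r_k = 1: the state is redrawn from N(0, S) whatever the past was,
        # so all reset branches collapse into one new component.
        predicted.append((total * p_change, 0.0, S))
        updated = []
        for w, m, v in predicted:
            gain = v / (v + R)       # scalar Kalman update
            updated.append((w * normal_pdf(y, m, v + R),
                            m + gain * (y - m),
                            (1.0 - gain) * v))
        norm = sum(w for w, _, _ in updated)
        comps = [(w / norm, m, v) for w, m, v in updated]
        sizes.append(len(comps))
    return comps, sizes

ys = [0.1, -0.2, 0.0, 3.1, 2.9, 3.0]
comps, sizes = changepoint_filter(ys)
posterior_mean = sum(w * m for w, m, _ in comps)
print(sizes, posterior_mean)
```

The component count grows linearly with the number of frames, whereas a switching model without reinitialisation would need a separate Gaussian for every indicator sequence.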

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels against time index (100 to 1000).]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result over samples 500 to 3500. (S)]

[Audio: d1.wav]

Tracking Pitch Variations

• Allow m to change with k

[Figure: plot over frame index 50 to 500.]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: panel over frequency index ν against frame k, with the signal x_k below.]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll" (S1) (S2) (S3)

[Audio: montuno1.wav, montuno2.wav, montuno3.wav]

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\ \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i;\ 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\ C s_{1:W}, R)$$

$$C \equiv [\, C_1 \ \dots \ C_i \ \dots \ C_W \,]$$

[Graphical model: indicators r_1, ..., r_W; coefficients s_1, ..., s_W; observation x.]

• Generalized Linear Model - Columns of C are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)
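To see where the $2^W$ mixture components come from, the sketch below enumerates every on/off configuration for a toy case. It assumes a scalar observation x, scalar basis weights $c_i$, and a common prior variance for active coefficients, so that $x \mid r \sim \mathcal{N}(0, R + \sum_i r_i c_i^2 \sigma^2)$; these simplifications and all numerical values are illustrative assumptions.

```python
import itertools
import math

def log_normal(x, var):
    """log N(x; 0, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def indicator_posterior(x, c, pi_on=0.2, sigma2=1.0, R=0.1):
    """Posterior over all 2^W indicator configurations by brute-force enumeration."""
    W = len(c)
    log_post = {}
    for r in itertools.product((0, 1), repeat=W):
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        # Marginalising the active coefficients leaves a zero-mean Gaussian in x.
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, c))
        log_post[r] = log_prior + log_normal(x, var)
    top = max(log_post.values())
    norm = sum(math.exp(v - top) for v in log_post.values())
    return {r: math.exp(v - top) / norm for r, v in log_post.items()}

post = indicator_posterior(x=2.0, c=[1.0, 0.5, 2.0])
best = max(post, key=post.get)
print(len(post), best)
```

Enumeration is only feasible for small W; with one indicator per piano key the number of configurations is far beyond reach, which motivates the approximate inference methods discussed in the talk.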

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\ \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\ \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\ \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\ A_\nu(r_k)\theta_{k-1,\nu},\ Q_\nu(r_k)) \qquad \text{Latent state}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\ C_k\theta_{k,1:W},\ R) \qquad \text{Observation}$$

[Graphical model: for each ν = 1, ..., W, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations y_k, ..., y_K.]

Synthetic Data

[Figure: panels over frequency index ν against frame k, with the signal x. (S)]

[Audio: audio_example.wav]

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable - computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary


Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: two spectrograms, f/Hz (0 to 10000) against t/sec, labelled (Speech) and (Piano).]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

[Audio: s1.wav, s2.wav]

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 - Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big( \prod_k p(s_k \mid v_k) \Big)\, p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1, ..., K.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\ 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\ \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
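The responsibilities formula is easy to check numerically. The sketch below applies it to one time-frequency atom with two sources; the function name and the numbers are illustrative assumptions.

```python
def separate(x, variances):
    """Posterior mean and variance of each source coefficient given their sum x."""
    total = sum(variances)
    kappa = [v / total for v in variances]                 # responsibilities
    means = [k * x for k in kappa]                         # kappa_n * x
    post_vars = [v * (1.0 - k) for v, k in zip(variances, kappa)]
    return kappa, means, post_vars

# One observed coefficient x = 2, two sources with prior variances 3 and 1.
kappa, means, post_vars = separate(x=2.0, variances=[3.0, 1.0])
print(kappa, means, post_vars)
```

The posterior means always add up to the observation, so the louder source (larger prior variance) simply claims the larger share of x.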

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1, ..., K.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\ v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
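Maximising the Poisson likelihood of this model over templates and excitations is equivalent to minimising the generalised Kullback-Leibler divergence between X and the product of the two factors. The sketch below uses the standard multiplicative updates of Lee and Seung for that objective; this is a swapped-in textbook algorithm rather than the inference scheme of the talk, and the toy matrix, function names and random initialisation are assumptions.

```python
import math
import random

def matmul(T, E):
    return [[sum(T[f][n] * E[n][k] for n in range(len(E)))
             for k in range(len(E[0]))] for f in range(len(T))]

def kl_divergence(X, V):
    """Generalised KL divergence sum(x log(x/v) - x + v)."""
    return sum((x * math.log(x / v) - x + v) if x > 0 else v
               for row_x, row_v in zip(X, V) for x, v in zip(row_x, row_v))

def nmf_kl(X, n_components, iters=50, seed=0):
    rng = random.Random(seed)
    F, K = len(X), len(X[0])
    T = [[rng.uniform(0.5, 1.5) for _ in range(n_components)] for _ in range(F)]
    E = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in range(n_components)]
    history = []
    for _ in range(iters):
        V = matmul(T, E)
        for f in range(F):                       # template update
            for n in range(n_components):
                num = sum(E[n][k] * X[f][k] / V[f][k] for k in range(K))
                T[f][n] *= num / sum(E[n])
        V = matmul(T, E)
        for n in range(n_components):            # excitation update
            for k in range(K):
                num = sum(T[f][n] * X[f][k] / V[f][k] for f in range(F))
                E[n][k] *= num / sum(T[f][n] for f in range(F))
        history.append(kl_divergence(X, matmul(T, E)))
    return T, E, history

# Strictly positive toy "spectrogram" of rank 2 (third column = first + second).
X = [[5.0, 1.0, 6.0],
     [3.0, 1.0, 4.0],
     [1.0, 4.0, 5.0]]
T, E, history = nmf_kl(X, n_components=2)
print(history[0], history[-1])
```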

Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: two panels of densities p(x) for x from 0 to 5. First panel: a = 0.9, 1, 1.3, 2, each with b = 1. Second panel: (a, b) = (1, 1), (1, 0.5), (2, 1).]

$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
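The two definitions translate directly into log-density functions. The sketch below follows the parametrisation written on the slide (shape a, and z entering as $z^{-1}$ in the exponent); the function names are illustrative. It also checks the change-of-variables relation between the two families: if $u = 1/x$ has density $\mathcal{G}(u; a, z)$, then x has density $\mathcal{IG}(x; a, z)$.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Change of variables u = 1/x contributes the Jacobian 1/x^2.
x, a, z = 0.7, 2.5, 1.8
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
print(lhs, rhs)
```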

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\ a,\ z_k/a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\ a_z,\ v_k/a_z)$$

[Graphical model: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ..., with edges labelled alternately a_z, a, a_z.]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe coupling strength and drift of the chain
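A short sampling sketch may help build intuition for the chain. It assumes the slide's parametrisation, under which a draw from $\mathcal{IG}(x; a, z)$ is the reciprocal of a Gamma variate with shape a and scale z, and it fixes the first auxiliary variable to a hypothetical value `z1 = 1.0` because the slide does not give its prior. Python's `random.gammavariate(alpha, beta)` takes a shape and a scale.

```python
import math
import random

def sample_inv_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z) in the slide's parametrisation: 1/x ~ Gamma(shape a, scale z)."""
    return 1.0 / rng.gammavariate(a, z)

def sample_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1, ..., v_K from the inverse Gamma-Markov chain."""
    rng = random.Random(seed)
    z, vs = z1, []
    for _ in range(K):
        v = sample_inv_gamma(rng, a, z / a)        # v_k | z_k
        vs.append(v)
        z = sample_inv_gamma(rng, a_z, v / a_z)    # z_{k+1} | v_k
    return vs

def roughness(vs):
    """Mean absolute step of log v_k, a crude measure of how fast the chain moves."""
    return sum(abs(math.log(b / c)) for c, b in zip(vs, vs[1:])) / (len(vs) - 1)

smooth = sample_chain(500, a=100.0, a_z=100.0)
rough = sample_chain(500, a=2.0, a_z=2.0)
print(roughness(smooth), roughness(rough))
```

Larger shape parameters couple neighbouring variances more tightly, so the log-variance trajectory becomes smoother.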

Gamma Chains: typical draws

[Figure: three panels of $\log v_k$ (-20 to 20) against k (100 to 1000), with parameter settings (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains with changepoints: typical draws

[Figure: three panels of $\log v_k$ (-20 to 20) against k (100 to 1000), with parameter settings (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains

• The joint can be written as product of singleton and pairwise potentials of form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)}$$

[Graphical model: chain z_1, ..., v_{k-1}, z_k, v_k, z_{k+1}, ..., with edges labelled alternately a_z, a, a_z.]

$$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1}) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies

[Figure: example model topologies.]

Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: $\psi_{0,1}$, z_1, $\psi_{1,1}$, v_1, $\psi_{1,2}$, z_2, $\psi_{2,2}$, v_2, ..., with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$.]

Updates for $v_k$:

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_{k-1}, z_k, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

Updates for $z_k$:

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_k)\, q^{(\tau)}(v_{k+1})}\big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$
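A minimal Gibbs sampler for the chain can be sketched under extra assumptions that are not stated on the slides: a zero-mean Gaussian observation model $y_k \sim \mathcal{N}(0, v_k)$, and a fixed hypothetical value `v0` standing in for the left neighbour of $z_1$. With the chain defined by $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$ and $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$, both full conditionals work out to inverse Gamma densities; below they are written with a standard shape and rate, i.e. a density proportional to $x^{-(\text{shape}+1)} \exp(-\text{rate}/x)$.

```python
import random

def draw_inv_gamma(rng, shape, rate):
    """Standard inverse gamma draw: density proportional to x^-(shape+1) exp(-rate/x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def gibbs_chain(y, a=5.0, a_z=5.0, v0=1.0, sweeps=200, seed=0):
    """Posterior mean of the variances v_k given y, estimated by Gibbs sampling."""
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K
    mean_v = [0.0] * K
    for _ in range(sweeps):
        # z_k | v_{k-1}, v_k: shape a_z + a, rate a_z / v_{k-1} + a / v_k
        for k in range(K):
            prev_v = v[k - 1] if k > 0 else v0
            z[k] = draw_inv_gamma(rng, a_z + a, a_z / prev_v + a / v[k])
        # v_k | z_k, z_{k+1}, y_k: the Gaussian likelihood adds 1/2 to the shape
        # and y_k^2 / 2 to the rate; the last v_K has no right neighbour.
        for k in range(K):
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = draw_inv_gamma(rng, shape, rate)
        for k in range(K):
            mean_v[k] += v[k] / sweeps
    return mean_v

# Quiet segment followed by a loud segment.
y = [0.1] * 10 + [5.0] * 10
mean_v = gibbs_chain(y)
print(mean_v[:3], mean_v[-3:])
```

Because the z variables are conditionally independent given all v, and vice versa, each sweep can update the two groups in turn, which is the conjugacy benefit the slides point to.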

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: six log-spectrogram panels (colour scale -14 to 0), titled "Noisy" (X), "Original" (Xorg), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997".]

Denoising - Music

[Figure: five log-spectrogram panels (colour scale -18 to 0), titled "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208".]

"Tristram (Matt Uelmen)" + ~0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal, tie across time (harmonic continuity)
• Source 2: Vertical, tie across frequency (transients, percussive sounds)

[Figure: spectrogram panels Xorg, Shor and Sver; frequency bin (ν) against time (τ).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                     s1                      s2
               SDR    SIR    SAR       SDR    SIR    SAR
VB            -474   -328    567      -158   1546   -137
Gibbs          -45   -262    457       105   1246    161
GibbsEM       -423   -242    482       134   1313    185
Pre-trained   -404   -315    813       356   1144    464
Oracle         614   1716    658      1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                     s1                      s2
               SDR    SIR    SAR       SDR    SIR    SAR
VB             -78   -622    453      -235    184   -225
Gibbs         -846   -753    693      -404   1459   -383
GibbsEM       -774   -619    462      -114   1662   -097
Pre-trained    -64   -539    695        38   1639    414
Oracle         121    329   1214      2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrogram panels Xorg, Shor and Sver; frequency bin (ν) against time (τ).]

Audio examples, two sets: (Original) (Hor) (Vert)

[Audio: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ (-12 to 0) against frequency ν (Hz, 0 to 4000).]

Chord Detection

[Figure: two panels against time τ (s). "MDCT of piano chord 41, 48, 51, 56": frequency (Hz, 0 to 4000). $\log \sum_\nu v_{\nu,j,\tau}$ over MIDI note j (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\ a_\lambda, b_\lambda)$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\ \nu/2,\ 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\ 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\ a_m^\top s_{k,1:N},\ r_m)$$

$$a_m \sim \mathcal{N}(a_m;\ \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\ \cdots)$$

with sources n = 1, ..., N, channels m = 1, ..., M and a plate over k = 1, ..., K.

Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrograms, f/Hz (0 to 10000) against t/sec, labelled (Speech), (Piano), (Guitar) and (Mix).]

[Audio: s1.wav, s2.wav, s3.wav, x1.wav]

Reconstructions

[Figure: three spectrograms, f/Hz (0 to 10000) against t/sec, labelled (Speech), (Piano) and (Guitar).]

[Audio: var_se1.wav, var_se2.wav, var_se3.wav]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hands
  - Create a quantized, human-readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in a HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; the bar ends for 3/4 time and 4/4 time are marked]

• Each dot denotes a state x = (m, n) (score position, tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels showing the transition density p(x_2 | x_1) over bar position (horizontal, 1 to 8) and tempo level (vertical, 1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over bar position (horizontal) and tempo level (vertical)]

Bar position Pointer - k = 2

[Figure: p(x_2) over bar position and tempo level]

Bar position Pointer - k = 3

[Figure: p(x_3) over bar position and tempo level]

Bar position Pointer - k = 4

[Figure: p(x_4) over bar position and tempo level]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, after observing y_5 = 0]

Bar position Pointer - k = 10

[Figure: p(x_10) over bar position and tempo level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for two rhythmic patterns, labelled Triplet Rhythm and Duplet Rhythm]
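The bar-pointer slides above describe a discrete state x = (m, n) with a Poisson observation model, so filtering reduces to the standard HMM forward recursion. The following is a minimal sketch, not the authors' implementation: the transition rule (tempo level n advances the pointer by n + 1 cells, and the tempo changes by one level with probability `p_change`), the intensity profile `mu`, and all function names are assumptions made for illustration.

```python
import numpy as np
from math import lgamma

def make_transition(M, N, p_change=0.1):
    """T[i, j] = p(x_k = j | x_{k-1} = i), states indexed as i = m * N + n (assumed layout)."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(N):
            m_next = (m + n + 1) % M  # assumed rule: tempo level n moves the pointer n + 1 cells
            for dn, p in ((-1, p_change / 2), (0, 1 - p_change), (1, p_change / 2)):
                n_next = min(max(n + dn, 0), N - 1)  # tempo level stays inside its range
                T[m * N + n, m_next * N + n_next] += p
    return T

def poisson_loglik(y, mu):
    """log PO(y; mu) for a scalar count y and a vector of intensities mu."""
    return y * np.log(mu) - mu - lgamma(y + 1)

def forward_filter(y, T, mu, prior):
    """Return alpha[k] = p(x_k | y_{1:k}) for every frame k."""
    alpha = np.zeros((len(y), len(prior)))
    pred = prior
    for k, yk in enumerate(y):
        post = pred * np.exp(poisson_loglik(yk, mu))  # Bayes update with the Poisson likelihood
        alpha[k] = post / post.sum()
        pred = alpha[k] @ T                           # one-step prediction
    return alpha

# Hypothetical toy setup: 8 bar positions, 3 tempo levels, strong intensity on positions 0 and 4.
M, N = 8, 3
mu = np.repeat(np.where(np.arange(M) % 4 == 0, 4.0, 0.2), N)
prior = np.full(M * N, 1.0 / (M * N))
y = np.tile([4, 0], 20)  # an onset burst on every second frame
alpha = forward_filter(y, make_transition(M, N), mu, prior)
tempo_marginal = alpha[-1].reshape(M, N).sum(axis=0)
bar_marginal = alpha[-1].reshape(M, N).sum(axis=1)
```

In this toy run the onsets arrive every second frame, which only the middle tempo level can explain consistently, so the filtered tempo marginal concentrates there. Smoothing would add a backward pass over the same quantities.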

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_k, θ_k, m_k, r_k for k = 0, ..., 3, with intensities λ_k and observations y_k for k = 1, 2, 3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure, four panels against frame index k: observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo (quarter notes per min); p(r_k | y_{1:k}) over rhythmic pattern (Triplets, Duplets)]

Smoothing

[Figure, four panels against frame index k: observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo (quarter notes per min); p(r_k | y_{1:K}) over rhythmic pattern (Triplets, Duplets)]

Time Signature

[Figure, four panels: observed data (sample value against time/s); log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo (quarter notes per min); p(θ_k | z_{1:K}) over time signature (4/4, 3/4); the last three against frame index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: a notated score aligned with the audio waveform x_t against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_k, r_k, λ_k and, inside a plate over frequencies ν = 1, ..., W, chains v_{ν,k} and s_{ν,k}, for k = 1, ..., K; r_k points to a score position (1, ..., 8 in the illustration)]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν/Hz (0 to 4000 Hz)]

Score-Performance matching

[Figure, two panels: Spectrogram Data (frequency/Hz against time/s); MIDI Data (MIDI note against score position)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels against time/s: log p(r_τ | s_τ) over MIDI note number; log λ, with the panel labelled $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$; MDCT of audio (frequency/Hz; source: Daniel-Ben Pienaar)]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-frequency energy distributions

• Ongoing work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, polyphonic transcription

∗ Musical score guided source separation

– Prior structures for other observation models, NMF


Generative Models for audition

• Computer audition ⇔ inverse synthesis via Bayesian inference

$$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$$

Goal: developing flexible prior structures for modelling nonstationary sources

∗ source separation, transcription

∗ restoration, interpolation, localisation, identification

∗ coding, compression, resynthesis, cross synthesis

Bayesian Source Separation

• Joint estimation of sources given observations

[Graphical model: source prior parameters v; sources s_{k,1}, ..., s_{k,n}, ..., s_{k,N}; observations x_{k,1}, ..., x_{k,M}; plate over k = 1, ..., K; observation parameters λ]

Source model, v: parameters of the source prior

Observation model, λ: channel noise, mixing system

$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$

Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis

[Figure: waveform of 500 samples with gaps of missing samples]

Polyphonic Music Transcription

• from sound

[Figure: spectrogram (frequency f/Hz against time t/sec, 0 to 8 s) of the audio example (S)]

• to score

Audio example: Decimated_chopin.wav

Modelling and Computational issues

• Hierarchical

– Signal level: pitch, onsets, timbre

– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice

– Cognitive level: expression, genre, form, style, mood, emotion

• Uncertainty

– Parameter learning: which pitch, rhythm, tempo, meter, time signature?

– Model selection: how many notes, harmonics, onsets, sections?

Generative Models for Music

[Figure: generative pipeline with stages labelled Score, Expression, Piano-Roll and Signal]

Hierarchical Modeling of Music

[Graphical model over time slices 1, 2, ..., t: chains v, k, h and m; per-voice chains g_j, r_j, n_j, x_j and y_j inside a plate labelled M; observations y_1, y_2, ..., y_t]

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space, dynamical models

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

– Flexible, physically realistic

– Analysis down to sample precision; computationally quite heavy

• Transform domain – Fourier representations, Generalised Linear model

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second-order systems

• Audio signals can often be compactly represented by sinusoids

$$\text{(real)}\quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)}\quad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$

State space Parametrisation

$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C}\, x_n$$

[Graphical model: state chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]
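The state space parametrisation above is meant to reproduce the complex sinusoidal model exactly. A minimal sketch that checks this numerically follows; the function names and the example values for c, γ and ω are chosen here for illustration only.

```python
import numpy as np

def sinusoid_sum(c, gamma, omega, N):
    """Direct evaluation of y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n for n = 0, ..., N-1."""
    n = np.arange(N)[:, None]
    poles = np.exp(-gamma + 1j * omega)[None, :]
    return (c[None, :] * poles ** n).sum(axis=1)

def state_space(c, gamma, omega, N):
    """Same signal from x_{n+1} = A x_n, y_n = C x_n with diagonal A and x_0 = c."""
    A = np.diag(np.exp(-gamma + 1j * omega))
    C = np.ones(len(c))
    x = c.astype(complex)
    y = np.empty(N, dtype=complex)
    for n in range(N):
        y[n] = C @ x   # observation: sum of all state components
        x = A @ x      # each component rotates by omega_k and decays by exp(-gamma_k)
    return y

# Illustrative values (assumptions, not taken from the slides).
c = np.array([1.0 + 0.5j, -0.3j])
gamma = np.array([0.01, 0.05])
omega = np.array([0.3, 1.1])
y_ss = state_space(c, gamma, omega, 50)
y_direct = sinusoid_sum(c, gamma, omega, 50)
```

Both routes give the same samples, which is why Kalman-type recursions can be applied to sinusoidal models once noise terms are added.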


Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

H ≡ (parameters, hidden states)

[Graphical model: H is the parent of x_{¬κ} (missing) and x_κ (observed)]

[Figure: waveform of 500 samples with gaps of missing samples]
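When the parameters in H are held fixed, the integral collapses and the missing samples follow by Gaussian conditioning. The sketch below illustrates this for one simple case that is an assumption of this example, not the model used in the talk: a second-order autoregressive signal x[n] = a1 x[n-1] + a2 x[n-2] + e[n] with known coefficients and a flat prior on the first two samples. The function name `ar2_interpolate` is hypothetical.

```python
import numpy as np

def ar2_interpolate(x_obs, missing, a1, a2):
    """Posterior mean of the missing samples under a known AR(2) model."""
    N = len(x_obs)
    # Prediction-error operator: row n-2 computes e[n] = x[n] - a1*x[n-1] - a2*x[n-2].
    B = np.zeros((N - 2, N))
    for n in range(2, N):
        B[n - 2, n] = 1.0
        B[n - 2, n - 1] = -a1
        B[n - 2, n - 2] = -a2
    Q = B.T @ B  # precision matrix of x, up to the innovation variance
    obs = ~missing
    x_hat = x_obs.copy()
    # Gaussian conditioning: E[x_m | x_o] = -Q_mm^{-1} Q_mo x_o
    rhs = Q[np.ix_(missing, obs)] @ x_obs[obs]
    x_hat[missing] = -np.linalg.solve(Q[np.ix_(missing, missing)], rhs)
    return x_hat

# A noise-free damped sinusoid obeys the AR(2) recursion exactly,
# with a1 = 2*rho*cos(omega) and a2 = -rho**2.
rho, omega, N = 0.995, 0.3, 200
x_true = rho ** np.arange(N) * np.cos(omega * np.arange(N) + 0.4)
missing = np.zeros(N, dtype=bool)
missing[60:80] = True
x_obs = np.where(missing, 0.0, x_true)
x_hat = ar2_interpolate(x_obs, missing, a1=2 * rho * np.cos(omega), a2=-rho ** 2)
```

For this noise-free test signal the gap is filled exactly. With unknown parameters, the integral over H has to be approximated, which is where the inference methods on the following slides come in.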

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band ν = 0, ..., W − 1, parameters A_ν and Q_ν drive the chain s^ν_0, ..., s^ν_k, ..., s^ν_{K−1}; the bands jointly generate the observations x_0, ..., x_k, ..., x_{K−1}]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A_\nu s^\nu_{k-1},\, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$

Inference: Structured Variational Bayes

[Graphical model of the approximating distribution: factors q(A_α), q(Q_α) and ∏_k q(s^α_k | s^α_{k−1}) for each band α ∈ C, and q(x_k)]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
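The "subtract the other bands, fit the residual, iterate" recipe can be illustrated on a plain least-squares problem, where it coincides with Gauss-Seidel applied to the normal equations. The sketch below is a simplified stand-in for the structured VB update (no dynamics, no noise model), and the basis of three cosines is an assumption made for the example.

```python
import numpy as np

def residual_fitting(x, C, sweeps=20):
    """Cyclic coordinate fits; equivalent to Gauss-Seidel on C^T C s = C^T x."""
    s = np.zeros(C.shape[1])
    for _ in range(sweeps):
        for a in range(C.shape[1]):
            # Remove the current prediction of every band except band a.
            residual = x - C @ s + C[:, a] * s[a]
            # Least-squares fit of band a to that residual.
            s[a] = C[:, a] @ residual / (C[:, a] @ C[:, a])
    return s

# Illustrative basis: three cosines at distinct frequencies (hypothetical values).
n = np.arange(256)
C = np.stack([np.cos(w * n) for w in (0.2, 0.5, 0.9)], axis=1)
x = C @ np.array([1.0, -0.5, 2.0])
s = residual_fitting(x, C)
s_direct = np.linalg.lstsq(C, x, rcond=None)[0]
```

Because the columns are nearly orthogonal, a handful of sweeps already reproduces the direct least-squares solution; strongly overlapping bands would converge more slowly.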

Restoration

• Piano

– Signal with missing samples (37%)

– Reconstruction, 7.68 dB improvement

– Original

• Trumpet

– Signal with missing samples (37%)

– Reconstruction, 7.10 dB improvement

– Original

Audio examples: piano_missing.wav, piano_kalman.wav, piano_clean.wav, trumpet_missing.wav, trumpet_kalman.wav, trumpet_clean.wav

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each component ν = 1, ..., W, an indicator chain r^ν_0, ..., r^ν_k, ..., r^ν_K and a state chain θ^ν_0, ..., θ^ν_k, ..., θ^ν_K; observations y_k, ..., y_K]

• Generalises source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: when a changepoint occurs, the state vector θ is reinitialized

⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {“mute”, “sound”} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure, two panels over 1000 samples: the audio waveform and the filtered pitch label estimate]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: waveform over 3500 samples with the result of exact inference; audio example (S)]

Audio example: d1.wav

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over 500 frames]

• Intractable; need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll (frequency ν against time index k) above the generated signal x_k]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll” (S1) (S2) (S3)

Audio examples: montuno1.wav, montuno2.wav, montuno3.wav

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R)$$

$$C \equiv [\, C_1 \;\cdots\; C_i \;\cdots\; C_W \,]$$

[Graphical model: indicators r_1, ..., r_W; sources s_1, ..., s_W; observation x]

• Generalized Linear Model: the columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
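For small W the mixture of 2^W Gaussians can be enumerated directly, which makes the exponential cost concrete. The sketch below assumes scalar sources with a common prior variance `sigma2` (the slide allows a general Σ) and uses hypothetical helper names; each indicator configuration contributes one zero-mean Gaussian marginal likelihood.

```python
import itertools
import numpy as np

def log_gauss(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(cov, x))

def indicator_posterior(x, C, sigma2, R, pi_on):
    """Exact p(r | x) by enumerating all 2^W on/off configurations."""
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    logp = []
    for r in configs:
        r = np.array(r)
        # Sources that are "off" are exactly zero, so only active columns add variance.
        cov = sigma2 * (C * r) @ C.T + R
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        logp.append(log_gauss(x, cov) + log_prior)
    logp = np.array(logp)
    post = np.exp(logp - logp.max())
    return configs, post / post.sum()

# Toy problem (assumed values): 4 candidate basis vectors, components 0 and 2 are active.
C = np.eye(8)[:, :4]
x = np.array([2.0, 0.0, -1.5, 0.0, 0.0, 0.0, 0.0, 0.0])
configs, post = indicator_posterior(x, C, sigma2=1.0, R=0.01 * np.eye(8), pi_on=0.3)
best = configs[int(np.argmax(post))]
```

The loop visits 2^W configurations, so this is only feasible for a handful of components; with a full piano range the enumeration is out of reach, which motivates the approximate methods discussed later.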

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{Latent state}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\,\theta_{k,1:W},\, R) \qquad \text{Observation}$$

[Graphical model: for each ν = 1, ..., W, an indicator chain r^ν_0, ..., r^ν_k, ..., r^ν_K and a state chain s^ν_0, ..., s^ν_k, ..., s^ν_K; observations y_k, ..., y_K]

Synthetic Data

[Figure: synthetic data; piano-roll panels (frequency ν against time index k) and the signal x; audio example (S)]

Audio example: audio_example.wav

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable (computations with large matrices)

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms (frequency f/Hz against time t/sec) labelled (Speech) and (Piano)]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Audio examples: s1.wav, s2.wav
Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j} S_{j\tau} \quad \text{(summation over } j \text{ implied)}$$

Spectrogram = Spectral Templates × Excitations

[Figure: a spectrogram shown as the product of a template matrix and an excitation matrix]

– however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

[Figure: a spectrogram shown as the sum of two masked source spectrograms]

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
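The posterior above is a per-coefficient Wiener-style split of the observation. A minimal sketch, assuming the prior variances are already given as an array `v` of shape (K, N); the function name `separate` and the numbers are illustrative.

```python
import numpy as np

def separate(x, v):
    """Posterior mean and variance of each source coefficient.

    x: (K,) observed mixture coefficients, v: (K, N) prior variances of the N sources.
    """
    kappa = v / v.sum(axis=1, keepdims=True)  # responsibilities kappa[k, n]
    mean = kappa * x[:, None]                 # each source gets its share of x[k]
    var = v * (1 - kappa)                     # remaining posterior uncertainty
    return mean, var

# Illustrative numbers: three coefficients, two sources.
x = np.array([1.0, -2.0, 0.5])
v = np.array([[9.0, 1.0], [1.0, 1.0], [0.1, 0.9]])
mean, var = separate(x, v)
```

The posterior means always add up to the observation, so the quality of the separation depends entirely on how well the variances v are estimated; that is the role of the prior structures introduced next.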

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
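Maximum likelihood under this Poisson model amounts to minimising the generalised Kullback-Leibler divergence between the data and the product of templates and excitations. The sketch below uses the standard multiplicative updates of Lee and Seung for that objective; it is a generic NMF routine rather than code from the talk, and the matrix names `T` and `E` simply mirror the template and excitation symbols.

```python
import numpy as np

def kl_divergence(X, V):
    """Generalised KL divergence D(X || V), the Poisson negative log-likelihood up to constants."""
    return np.sum(X * np.log((X + 1e-12) / V) - X + V)

def nmf_kl(X, n_sources, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    T = rng.random((X.shape[0], n_sources)) + 0.1  # templates t[nu, n]
    E = rng.random((n_sources, X.shape[1])) + 0.1  # excitations e[n, tau]
    history = []
    for _ in range(n_iter):
        V = T @ E
        T *= ((X / V) @ E.T) / E.sum(axis=1)            # template update
        V = T @ E
        E *= (T.T @ (X / V)) / T.sum(axis=0)[:, None]   # excitation update
        history.append(kl_divergence(X, T @ E))
    return T, E, history

# Synthetic non-negative "spectrogram" of rank 3 (assumed test data).
rng = np.random.default_rng(1)
X = (rng.random((20, 3)) + 0.5) @ (rng.random((3, 30)) + 0.5)
T, E, history = nmf_kl(X, n_sources=3)
```

Each update keeps the factors non-negative and does not increase the divergence, which is why `history` is a non-increasing sequence.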

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure, two panels of p(x) against x on [0, 5]: left, curves for (a = 0.9, b = 1), (a = 1, b = 1), (a = 1.3, b = 1), (a = 2, b = 1); right, curves for (a = 1, b = 1), (a = 1, b = 0.5), (a = 2, b = 1)]

$$\mathcal{G}(x;\, a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x;\, a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
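The two densities can be transcribed directly from the definitions above. The sketch below does so with hypothetical function names and uses two sanity checks: the Gamma density integrates to one on a fine grid, and the reciprocal of a G(a, z) variable has density IG(a, z) by a change of variables.

```python
from math import lgamma
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) in the parametrisation above (z acts as a scale)."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) in the parametrisation above."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)

# Numerical check that G(x; 2, 1) integrates to one (trapezoidal rule on a fine grid).
grid = np.linspace(1e-6, 60.0, 600001)
pdf = np.exp(log_gamma_pdf(grid, 2.0, 1.0))
mass = np.sum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))

# Change of variables: if x ~ G(a, z) then y = 1/x has log-density log G(1/y) - 2 log y.
y = np.array([0.3, 1.0, 2.5])
lhs = log_inv_gamma_pdf(y, 2.0, 0.5)
rhs = log_gamma_pdf(1.0 / y, 2.0, 0.5) - 2 * np.log(y)
```

The reciprocal relation is also the easiest way to draw inverse Gamma samples with a standard Gamma generator.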

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k / a_z)$$

[Graphical model: chain z_1, ..., v_{k−1}, z_k, v_k, z_{k+1}, ..., with edges into z labelled a_z and edges into v labelled a]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
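Draws from the chain are easy to generate once the inverse Gamma is expressed through a Gamma sampler. The sketch below assumes the parametrisation IG(x; a, z) defined on the previous slide, under which a draw is the reciprocal of a Gamma variable with shape a and scale z; the starting value `z1` and the function names are choices made for the example.

```python
import numpy as np

def sample_inv_gamma(rng, a, z):
    """Draw from IG(x; a, z): the reciprocal of a Gamma(shape=a, scale=z) variable."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_chain(K, a, a_z, z1=1.0, seed=0):
    """Sample v_1, ..., v_K from the inverse Gamma-Markov chain."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = sample_inv_gamma(rng, a, z / a)       # v_k | z_k
        z = sample_inv_gamma(rng, a_z, v[k] / a_z)   # z_{k+1} | v_k
    return v

# Larger shape parameters couple neighbours more tightly, giving smoother log-variance paths.
v_rough = sample_chain(500, a=2.0, a_z=2.0)
v_smooth = sample_chain(500, a=100.0, a_z=100.0)
rough_step = np.std(np.diff(np.log(v_rough)))
smooth_step = np.std(np.diff(np.log(v_smooth)))
```

Plotting `np.log(v)` against k for different (a, a_z) pairs gives pictures of the kind shown on the "typical draws" slides.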

Gamma Chains: typical draws

[Figure, three panels of log v_k against k (1 to 1000): a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints: typical draws

[Figure, three panels of log v_k against k (1 to 1000): a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

[Graphical model: chain z_1, ..., v_{k−1}, z_k, v_k, z_{k+1}, ..., with edges into z labelled a_z and edges into v labelled a]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with observation factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$
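A worked instance of the Gibbs step for v_k may make the role of conjugacy concrete. It assumes a zero-mean Gaussian observation, p(y_k | v_k) = N(y_k; 0, v_k), which is one possible choice and is not fixed by the slide; the chain terms are the definitions v_k | z_k ~ IG(v_k; a, z_k/a) and z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k/a_z). Collecting the powers of v_k and the terms in 1/v_k gives another inverse Gamma:

```latex
\begin{aligned}
p(v_k \mid z_k, z_{k+1}, y_k)
 &\propto \mathcal{N}(y_k;\, 0, v_k)\;
          \mathcal{IG}(v_k;\, a, z_k/a)\;
          \mathcal{IG}(z_{k+1};\, a_z, v_k/a_z) \\
 &\propto v_k^{-1/2}\, e^{-y_k^2/(2 v_k)}
          \cdot v_k^{-(a+1)}\, e^{-a/(z_k v_k)}
          \cdot v_k^{-a_z}\, e^{-a_z/(z_{k+1} v_k)} \\
 &= v_k^{-(a + a_z + 1/2) - 1}
    \exp\!\Big(-\Big(\frac{a}{z_k} + \frac{a_z}{z_{k+1}} + \frac{y_k^2}{2}\Big) v_k^{-1}\Big) \\
 &\propto \mathcal{IG}\Big(v_k;\; a + a_z + \tfrac{1}{2},\;
    \Big(\frac{a}{z_k} + \frac{a_z}{z_{k+1}} + \frac{y_k^2}{2}\Big)^{-1}\Big)
\end{aligned}
```

Under these assumptions every Gibbs step is a draw from a standard distribution, and the corresponding VB update has the same functional form with the quantities 1/z_k and 1/z_{k+1} replaced by their expectations.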

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with observation factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure, six log-spectrogram panels: Noisy (X); Original (Xorg); reconstructions Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68) and Xg (SNR 19.97)]

Denoising – Music

[Figure, five log-spectrogram panels: Original; Noisy; reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (printed as "SNR208"; the decimal point is not recoverable from the source)]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: horizontal; tie across time; harmonic continuity

• Source 2: vertical; tie across frequency; transients, percussive sounds

[Figure: spectrogram panels (frequency bin ν against time τ) labelled Xorg, Shor and Sver]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

               s1                      s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -474   -328   567      -158   1546   -137
Gibbs          -45    -262   457      105    1246   161
Gibbs EM       -423   -242   482      134    1313   185
Pre-trained    -404   -315   813      356    1144   464
Oracle         614    1716   658      1266   1995   136

(Entries are reproduced as found; the decimal points of the original table are missing.)

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

               s1                      s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -78    -622   453      -235   184    -225
Gibbs          -846   -753   693      -404   1459   -383
Gibbs EM       -774   -619   462      -114   1662   -097
Pre-trained    -64    -539   695      38     1639   414
Oracle         121    329    1214     2113   3389   2137

(Entries are reproduced as found; the decimal points of the original table are missing.)

Harmonic-Transient Decomposition

[Figure: spectrogram panels (frequency bin ν against time τ) labelled Xorg, Shor and Sver]

Two audio demonstrations, each with (Original), (Hor) and (Vert) versions

Audio examples: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν/Hz (0 to 4000 Hz)]

Chord Detection

[Figure, two panels against time τ/s: MDCT of a piano chord with MIDI notes 41, 48, 51, 56 (frequency/Hz, 0 to 4000); $\log \sum_\nu v_{\nu,j,\tau}$ over MIDI note j (40 to 65)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$$
\begin{aligned}
\lambda_n &\sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), & n &= 1, \dots, N \\
v_{k,n} &\sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)) \\
s_{k,n} &\sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n}) \\
x_{k,m} &\sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), & k &= 1, \dots, K \\
a_m &\sim \mathcal{N}(a_m;\, \cdots) \\
r_m &\sim \mathcal{IG}(r_m;\, \cdots), & m &= 1, \dots, M
\end{aligned}
$$

Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source $n$

Source Separation

[Figure: spectrograms of the three sources (Speech), (Piano), (Guitar) and of the (Mix); axes: t (sec) vs. f (Hz), 0 to 10000 Hz.]

Reconstructions

[Figure: spectrograms of the reconstructed sources (Speech), (Piano), (Guitar); axes: t (sec) vs. f (Hz), 0 to 10000 Hz.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of $a$, $\lambda$ and $r$ over Epoch 0 to 2000.]

Tempo tracking and score performance matching

• Given expressive music data (onsets, detections, spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized, human-readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with score position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; bar lengths for 3/4 time and 4/4 time are marked.]

• Each dot denotes a state $x = (m, n)$ (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$; axes: Bar Position (1 to 8) vs. Tempo Level (1 to 5).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, over Bar Position and Tempo Level.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ as a function of bar position $m_k$ (0 to 1000), for a Triplet Rhythm (top) and a Duplet Rhythm (bottom).]
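The filtering recursion implied by these slides can be sketched on a toy problem. The following is a minimal illustration, not the bar pointer model itself: it assumes a small discrete state space (four bar positions on a ring, no tempo variable), a hypothetical transition matrix, and hypothetical Poisson intensities. The names `poisson_pmf` and `forward_filter` are illustrative.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of the count y under intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def forward_filter(prior, transition, intensity, observations):
    """Return the filtering distributions p(x_k | y_{1:k}) of a discrete HMM.

    prior[i]         : p(x_1 = i)
    transition[i][j] : p(x_{k+1} = j | x_k = i)
    intensity[i]     : Poisson intensity of the observation in state i
    """
    n = len(prior)
    belief = list(prior)
    filtered = []
    for k, y in enumerate(observations):
        if k > 0:
            # Prediction: push the current belief through the transition model.
            belief = [sum(belief[i] * transition[i][j] for i in range(n))
                      for j in range(n)]
        # Update: weight by the Poisson likelihood and renormalise.
        weighted = [belief[j] * poisson_pmf(y, intensity[j]) for j in range(n)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        filtered.append(belief)
    return filtered

# Toy setting (assumed values): the pointer advances one position per frame
# with probability 0.9 and stays put otherwise; onsets are expected at position 0.
n_pos = 4
transition = [[0.0] * n_pos for _ in range(n_pos)]
for i in range(n_pos):
    transition[i][(i + 1) % n_pos] = 0.9
    transition[i][i] = 0.1
prior = [1.0 / n_pos] * n_pos
intensity = [3.0, 0.2, 0.2, 0.2]
observations = [0, 0, 4, 0, 0, 0, 3]
posteriors = forward_filter(prior, transition, intensity, observations)
```

In this toy run the large count in the third frame concentrates the posterior on position 0, which is the qualitative behaviour the sequence of density plots above illustrates.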

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$.]

• $\theta$: Time signature indicator (e.g. 3/4, 4/4); $r$: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo (Quarter notes per min: 60, 120, 180); $p(r_k \mid y_{1:k})$ for Triplets vs. Duplets.]

Smoothing

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo (Quarter notes per min: 60, 120, 180); $p(r_k \mid y_{1:K})$ for Triplets vs. Duplets.]

Time Signature

[Figure: Observed Data (sample value vs. time in s, 0 to 12); then, against Frame Index $k$ (100 to 500): $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo (Quarter notes per min: 52, 103, 155); $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4.]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: score excerpt and audio signal $x_t$ against $t$.]

Score-Performance matching - Graphical Model

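[Graphical model: chains $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; and, inside a plate $\nu = 1, \dots, W$, $v_{\nu,1}, \dots, v_{\nu,K}$ and $s_{\nu,1}, \dots, s_{\nu,K}$. An inset shows score positions $r_k = 1, \dots, 8$.]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$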

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency $\nu$ (Hz), from 0 to 4000 Hz.]

Score-Performance matching

[Figure: Spectrogram Data (Time in s, 0 to 14, vs. Frequency in Hz, 0 to 4000) and MIDI Data (Score position, 50 to 450, vs. MIDI note, 55 to 85).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time (s), 1 to 4: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\log \lambda$, computed from $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$; MDCT of audio (source: Daniel-Ben Pienaar), Frequency 0 to 4000 Hz.]

Summary

• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-Frequency Energy distributions
• Ongoing Work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  – Prior structures for other observation models, NMF


Generative Models for audition

• Computer audition ⇔ inverse synthesis via Bayesian inference

$$p(\text{Structure} \mid \text{Observations}) \propto p(\text{Observations} \mid \text{Structure})\, p(\text{Structure})$$

Goal: Developing flexible prior structures for modelling nonstationary sources

∗ source separation, transcription
∗ restoration, interpolation, localisation, identification
∗ coding, compression, resynthesis, cross synthesis

Bayesian Source Separation

• Joint estimation of Sources given Observations

Source Model: $v$, parameters of the source prior

Observation Model: $\lambda$, channel noise and mixing system

[Graphical model: sources $s_{k,1}, \dots, s_{k,n}, \dots, s_{k,N}$ with parent $v$; observations $x_{k,1}, \dots, x_{k,M}$ with parent $\lambda$; plate $k = 1, \dots, K$.]

$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$

Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis

[Figure: signal over sample indices 0 to 500.]

Polyphonic Music Transcription

• from sound

[Figure: spectrogram of the audio example; axes: t (sec), 0 to 8, vs. f (Hz), 0 to 5000.]

• to score

Modelling and Computational issues

• Hierarchical
  – Signal level: pitch, onsets, timbre
  – Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  – Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  – Parameter Learning: Which pitch, rhythm, tempo, meter, time signature?
  – Model Selection: How many notes, harmonics, onsets, sections?

Generative Models for Music

[Diagram with three levels: Score and Expression; Piano-Roll; Signal.]

Hierarchical Modeling of Music

[Graphical model over time slices $1, 2, \dots, t$: a global variable $M$; per-slice variables $v$, $k$, $h$, $m$; per-voice variables $g_j$, $r_j$, $n_j$, $x_j$, $y_j$; and observations $y_1, y_2, \dots, y_t$.]

Modelling levels

• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space dynamical models
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  – Flexible, physically realistic
  – Analysis down to sample precision; computationally quite heavy
• Transform domain – Fourier representations, Generalised Linear model
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance
• Cascade of second order systems
• Audio signals can often be compactly represented by sinusoids

$$\text{(real)}\qquad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)}\qquad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
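As a small numerical check of the sinusoidal model, the sketch below builds the basis matrix $F$ of damped complex exponentials and compares the real part of $y = Fc$ with the real-valued sum, under the assumption that $c_k = \alpha_k e^{j\phi_k}$. All parameter values and function names are illustrative and not taken from the talk.

```python
import cmath
import math

def basis_matrix(gammas, omegas, n_samples):
    """F[n][k] = exp((-gamma_k + j*omega_k) * n): damped complex exponentials."""
    return [[cmath.exp((-g + 1j * w) * n) for g, w in zip(gammas, omegas)]
            for n in range(n_samples)]

def synthesize_complex(gammas, omegas, coeffs, n_samples):
    """y = F(gamma, omega) c."""
    F = basis_matrix(gammas, omegas, n_samples)
    return [sum(f * c for f, c in zip(row, coeffs)) for row in F]

def synthesize_real(alphas, gammas, omegas, phis, n_samples):
    """y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return [sum(a * math.exp(-g * n) * math.cos(w * n + p)
                for a, g, w, p in zip(alphas, gammas, omegas, phis))
            for n in range(n_samples)]

# Illustrative parameters (assumed, not from the slides).
alphas = [1.0, 0.5]
gammas = [0.01, 0.02]
omegas = [0.3, 0.9]
phis = [0.0, 1.2]
# Assumed link between the two forms: c_k = alpha_k * exp(j * phi_k).
coeffs = [a * cmath.exp(1j * p) for a, p in zip(alphas, phis)]

y_complex = synthesize_complex(gammas, omegas, coeffs, 64)
y_real = synthesize_real(alphas, gammas, omegas, phis, 64)
max_gap = max(abs(yc.real - yr) for yc, yr in zip(y_complex, y_real))
```

Because $F$ depends on the signal only through $(\gamma, \omega)$, the model is linear in $c$ once the frequencies and damping factors are fixed.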

State space Parametrisation

$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C}\, x_n$$

[Graphical model: state chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$.]
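The state space form can be checked against the closed-form sum of damped complex exponentials. The following is a minimal sketch with a diagonal $A$, $C = (1 \cdots 1)$ and $x_0 = c$; the parameter values and the name `simulate_state_space` are assumptions made for illustration.

```python
import cmath

def simulate_state_space(gammas, omegas, coeffs, n_samples):
    """Iterate x_{n+1} = A x_n (A diagonal) and read out y_n = C x_n, C = (1 ... 1)."""
    poles = [cmath.exp(-g + 1j * w) for g, w in zip(gammas, omegas)]  # diagonal of A
    state = list(coeffs)                                              # x_0 = c
    outputs = []
    for _ in range(n_samples):
        outputs.append(sum(state))                     # y_n = C x_n
        state = [p * x for p, x in zip(poles, state)]  # x_{n+1} = A x_n
    return outputs

# Illustrative values (assumed).
gammas = [0.01, 0.02]
omegas = [0.3, 0.9]
coeffs = [1.0 + 0.0j, 0.2 - 0.4j]
y = simulate_state_space(gammas, omegas, coeffs, 32)

# Closed form for comparison: y_n = sum_k c_k (exp(-gamma_k + j*omega_k))^n
closed = [sum(c * cmath.exp((-g + 1j * w) * n)
              for c, g, w in zip(coeffs, gammas, omegas))
          for n in range(32)]
max_err = max(abs(a - b) for a, b in zip(y, closed))
```

The recursion needs only the current state, which is what makes Kalman-style filtering and smoothing applicable once noise terms are added.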

Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis

[Figure: signal over sample indices 0 to 500.]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

$H \equiv$ (parameters, hidden states)

[Graphical model: $H$ is the common parent of $x_{\neg\kappa}$ (Missing) and $x_\kappa$ (Observed).]

[Figure: signal over sample indices 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: parameters $A^\nu$, $Q^\nu$; state chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ inside a plate $\nu = 0, \dots, W-1$; observations $x_0, \dots, x_k, \dots, x_{K-1}$.]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A^\nu s^\nu_{k-1},\, Q^\nu), \qquad A^\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$

Inference: Structured Variational Bayes

[Graphical model of the structured approximation: factors $q(A^\alpha)$ and $q(Q^\alpha)$; chain factors $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each $\alpha \in C$; and $q(x_k)$.]

• Intuitive algorithm
  – Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
  – Compute a fit for $\alpha$ to this residual and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
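For readers unfamiliar with Gauss-Seidel, the following is a minimal sketch of the method on a small linear system. It is not the phase vocoder inference itself: it only shows the "remove the contribution of all other components, then refit this one" pattern described above. The matrix `A`, the vector `b` and the function name are illustrative assumptions.

```python
def gauss_seidel(A, b, n_sweeps=50):
    """Solve A x = b by Gauss-Seidel sweeps (A is assumed diagonally dominant)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(n_sweeps):
        for i in range(n):
            # Residual after removing the contribution of all other components,
            # always using their most recently updated values.
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Refit component i to that residual.
            x[i] = residual / A[i][i]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
b = [6.0, 17.0, 22.0]   # chosen so that the exact solution is (1, 2, 3)
x = gauss_seidel(A, b)
```

Each inner step plays the role of fitting one band to the residual left by the others, and the sweeps are repeated until the solution stops changing.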

Restoration

• Piano
  – Signal with missing samples (37)
  – Reconstruction: 768 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction: 710 dB improvement
  – Original

Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Graphical model: indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and state chain $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$ inside a plate $\nu = 1, \dots, W$; observations $y_k, \dots, y_K$.]

• Generalises Source-filter models

Harmonic model with changepoints

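$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$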

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ Number of Gaussian kernels to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  – Intuition: When a changepoint occurs, the state vector $\theta$ is reinitialized ⇒ Number of Gaussian kernels grows only polynomially (See e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: indicators $r_1 = 1$, $r_2 = 0$, $r_3 = 0$, $r_4 = 1$, $r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator $m$
• At each time $k$, the process can be in one of the {“mute”, “sound”} × $M$ states

[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels over time index 100 to 1000: signal amplitude (top) and pitch label index (bottom).]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: signal over sample indices 500 to 3500, with the result of exact inference; audio example.]

Tracking Pitch Variations

• Allow $m$ to change with $k$

[Figure: signal over sample indices 50 to 500.]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano-roll style plot of frequency index $\nu$ against time index $k$, with the signal $x_k$ below.]

• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W},\, R), \qquad C \equiv [\, C_1 \;\cdots\; C_i \;\cdots\; C_W \,]$$

[Graphical model: indicators $r_1, \dots, r_W$; coefficients $s_1, \dots, s_W$; observation $x$.]

• Generalized Linear Model – Columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman; Attias; ...)

Factorial Switching State space model

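$$
\begin{aligned}
r_{0,\nu} &\sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu}) \\
\theta_{0,\nu} &\sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu) \\
r_{k,\nu} \mid r_{k-1,\nu} &\sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) && \text{Changepoint indicator} \\
\theta_{k,\nu} \mid \theta_{k-1,\nu} &\sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)) && \text{Latent state} \\
y_k \mid \theta_{k,1:W} &\sim \mathcal{N}(y_k;\, C_k\,\theta_{k,1:W},\, R) && \text{Observation}
\end{aligned}
$$

[Graphical model: indicator chain $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and state chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$ inside a plate $\nu = 1, \dots, W$; observations $y_k, \dots, y_K$.]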

Synthetic Data

[Figure: synthetic example with panels indexed by frequency $\nu$ against time $k$, and the signal $x$; audio example.]

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable – computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain – state space dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature Based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k\, \phi_k(t)$$

[Figure: spectrograms of (Speech) and (Piano); axes: t (sec) vs. f (Hz), 0 to 10000 Hz.]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j}\, S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
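The non-additivity remark can be made concrete with two complex transform coefficients at the same time-frequency atom. The sketch below uses arbitrary illustrative values; the cross term is what an additive model on power spectrograms ignores.

```python
import cmath

# Two source coefficients at one time-frequency atom (illustrative values).
a = 1.0 * cmath.exp(1j * 0.3)
b = 0.8 * cmath.exp(1j * 2.5)

power_of_mix = abs(a + b) ** 2               # what the mixture spectrogram shows
sum_of_powers = abs(a) ** 2 + abs(b) ** 2    # what an additive model assumes
cross_term = 2.0 * (a * b.conjugate()).real  # phase-dependent interference
```

The identity $|a + b|^2 = |a|^2 + |b|^2 + 2\,\mathrm{Re}(a\bar{b})$ holds exactly, so the additive assumption is correct only when the cross term vanishes, for instance on average over random phases.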

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate $k = 1, \dots, K$.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes’ theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
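The posterior above is straightforward to compute per time-frequency atom once the variances are given. The following is a minimal sketch for one atom; the function name `separate_gaussian` and the numbers are illustrative assumptions.

```python
def separate_gaussian(x, variances):
    """Posterior of each source coefficient at one atom, given the mix x.

    variances[n] is the prior variance v_n of source n.
    Returns (responsibilities, posterior means, posterior variances).
    """
    total = sum(variances)
    kappas = [v / total for v in variances]     # kappa_n = v_n / sum_n' v_n'
    means = [kappa * x for kappa in kappas]     # E[s_n | x, v] = kappa_n * x
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars

kappas, means, post_vars = separate_gaussian(x=2.0, variances=[3.0, 1.0])
```

The posterior means always add up to the observation, so the separation is conservative: each source receives a share of $x$ proportional to its prior variance.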

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate $k = 1, \dots, K$.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$

Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: densities $p(x)$ for $x$ from 0 to 5; left panel: Gamma for several values of $a$ with $b = 1$; right panel: Inverse Gamma for several values of $a$ and $b$.]

$$\mathcal{G}(x;\, a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x;\, a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
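The two densities can be coded directly from their exponential-form definitions. The sketch below reads $a$ as the shape and $z$ as the scale, as in the definitions on this slide, and checks numerically that both integrate to one; the midpoint integrator and the chosen parameter values are illustrative.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

def integrate(log_pdf, lo, hi, steps=200000):
    """Midpoint rule; used only to check normalisation numerically."""
    h = (hi - lo) / steps
    return sum(math.exp(log_pdf(lo + (i + 0.5) * h)) for i in range(steps)) * h

gamma_mass = integrate(lambda x: log_gamma_pdf(x, 2.0, 1.0), 0.0, 60.0)
inv_gamma_mass = integrate(lambda x: log_inv_gamma_pdf(x, 3.0, 0.5), 0.0, 200.0)
```

With this parametrisation, $x \sim \mathcal{IG}(x; a, z)$ is the same statement as $1/x \sim \mathcal{G}(\cdot\,; a, z)$, which is convenient for sampling.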

Gamma Chains

We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$

[Chain: $z_1 \cdots v_{k-1}$ – $z_k$ – $v_k$ – $z_{k+1} \cdots$, with edges labelled alternately $a_z$ and $a$.]

• Variance variables $v$ are priors for sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
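Draws from the chain can be generated with the standard library alone. The sketch below reads the definition as $v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$ and $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$, uses the fact that $x \sim \mathcal{IG}(a, z)$ means $1/x$ is Gamma with shape $a$ and scale $z$, and assumes a fixed starting value $z_1 = 1$ because the slide does not specify a prior for $z_1$. Function names are illustrative.

```python
import math
import random

def sample_inverse_gamma(rng, shape, scale):
    """Draw x ~ IG(x; shape, scale), i.e. 1/x ~ Gamma(shape, scale)."""
    return 1.0 / rng.gammavariate(shape, scale)

def sample_gamma_chain(n, a, az, z1=1.0, seed=0):
    """Draw v_1, ..., v_n from the inverse Gamma-Markov chain.

    z1 is an assumed fixed initial value (not specified on the slide).
    """
    rng = random.Random(seed)
    z = z1
    vs = []
    for _ in range(n):
        v = sample_inverse_gamma(rng, a, z / a)     # v_k | z_k
        vs.append(v)
        z = sample_inverse_gamma(rng, az, v / az)   # z_{k+1} | v_k
    return vs

log_v = [math.log(v) for v in sample_gamma_chain(1000, a=10.0, az=10.0)]
```

Larger shape parameters tie neighbouring variables more tightly, so `log_v` wanders more slowly; unequal values of `a` and `az` can be tried to see how the draws change.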

Gamma Chains: typical draws

[Figure: three typical draws of $\log v_k$ for $k$ = 1 to 1000 (vertical range −20 to 20), with panel labels “a = 10 az = 10”, “a = 10 az = 4” and “a = 10 az = 40”.]

Gamma Chains with changepoints: typical draws

[Figure: three typical draws of $\log v_k$ for $k$ = 1 to 1000 (vertical range −20 to 20), with panel labels “a = 10 az = 10”, “a = 10 az = 4” and “a = 10 az = 40”.]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$

[Chain: $z_1 \cdots v_{k-1}$ – $z_k$ – $v_k$ – $z_{k+1} \cdots$, with edges labelled alternately $a_z$ and $a$.]

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic
  – Markov Chain Monte Carlo, Gibbs sampler
  – Sequential Monte Carlo, Particle Filtering
• Deterministic
  – Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: $\psi_{01}$ – $z_1$ – $\psi_{11}$ – $v_1$ – $\psi_{12}$ – $z_2$ – $\psi_{22}$ – $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$.]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$

VB or Gibbs

[Factor graph: $\psi_{01}$ – $z_1$ – $\psi_{11}$ – $v_1$ – $\psi_{12}$ – $z_2$ – $\psi_{22}$ – $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$.]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
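Because of conjugacy, the Gibbs full conditionals are again inverse Gamma. The sketch below writes out their parameters under two assumptions that are not stated on these slides: the chain is the inverse Gamma-Markov chain defined earlier ($v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$), and the observation is a zero-mean Gaussian coefficient $s_k \sim \mathcal{N}(0, v_k)$. Multiplying the relevant factors gives $z_k \mid v_{k-1}, v_k \sim \mathcal{IG}\big(a_z + a,\, (a_z/v_{k-1} + a/v_k)^{-1}\big)$ and $v_k \mid z_k, z_{k+1}, s_k \sim \mathcal{IG}\big(a + a_z + \tfrac12,\, (a/z_k + a_z/z_{k+1} + s_k^2/2)^{-1}\big)$. Function names are illustrative.

```python
import random

def z_conditional(v_prev, v_cur, a, az):
    """(shape, scale) of p(z_k | v_{k-1}, v_k) = IG(z_k; shape, scale)."""
    return az + a, 1.0 / (az / v_prev + a / v_cur)

def v_conditional(z_cur, z_next, s, a, az):
    """(shape, scale) of p(v_k | z_k, z_{k+1}, s_k), assuming s_k ~ N(0, v_k)."""
    return a + az + 0.5, 1.0 / (a / z_cur + az / z_next + 0.5 * s * s)

def draw_inverse_gamma(rng, shape, scale):
    """x ~ IG(x; shape, scale) means 1/x ~ Gamma(shape, scale)."""
    return 1.0 / rng.gammavariate(shape, scale)

rng = random.Random(0)
z_shape, z_scale = z_conditional(v_prev=2.0, v_cur=4.0, a=10.0, az=10.0)
v_shape, v_scale = v_conditional(z_cur=1.0, z_next=1.0, s=2.0, a=10.0, az=10.0)
z_sample = draw_inverse_gamma(rng, z_shape, z_scale)
v_sample = draw_inverse_gamma(rng, v_shape, v_scale)
```

A full Gibbs sweep would alternate these two draws over all $k$; the VB updates have the same functional form, with the neighbouring values replaced by expectations under the current approximation.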

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: spectrograms (log scale, −14 to 0) of the Noisy signal (X), the Original (Xorg), and four reconstructions with panel titles “Xh SNR1998”, “Xv SNR2079”, “Xb SNR1968” and “Xg SNR1997”.]

Denoising – Music

[Figure: spectrograms (log scale, −18 to 0) of the Original, the Noisy signal, and three reconstructions with panel titles “PF SNR853”, “Gibbs SNR866” and “VB SNR208”.]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1, Horizontal: tie across time (harmonic continuity)
• Source 2, Vertical: tie across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor and Sver; axes: Time (τ) vs. Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -474   -328   567    -158   1546   -137
Gibbs         -45    -262   457    105    1246   161
GibbsEM       -423   -242   482    134    1313   185
Pre-trained   -404   -315   813    356    1144   464
Oracle        614    1716   658    1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters az and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -78    -622   453    -235   184    -225
Gibbs         -846   -753   693    -404   1459   -383
GibbsEM       -774   -619   462    -114   1662   -097
Pre-trained   -64    -539   695    38     1639   414
Oracle        121    329    1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: time-frequency grids, Time (τ) against Frequency Bin (ν), for Xorg, Shor and Sver; two sets of audio examples labelled (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σν against frequency ν (Hz), 0 to 4000 Hz; vertical axis −12 to 0]

Chord Detection

[Figure, top: MDCT of piano chord 41 48 51 56; Frequency (Hz, 0 to 4000) against Time τ (s)]

[Figure, bottom: log Σν vν,j,τ for each MIDI note j (40 to 65) against Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda,\, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \qquad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• λn can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms, f (Hz, 0 to 10000) against t (sec), of the sources (Speech), (Piano), (Guitar) and of the (Mix)]

Reconstructions

[Figure: spectrograms, f (Hz, 0 to 10000) against t (sec), of the reconstructed (Speech), (Piano) and (Guitar) sources]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with bar position mk = 1 ... 8 on the horizontal axis and tempo level nk = 1 ... 3 on the vertical axis; bar lengths marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x2|x1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: p(x5|y1:5) over Bar Position and Tempo Level, with y5 = 0]

Bar position Pointer - k = 10

[Figure: p(x10) over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(yk|xk): Poisson intensity

[Figure: Poisson intensity µk against bar position mk (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
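A minimal sketch of how the bar pointer HMM could be filtered, assuming a simple transition rule (bar position advances by the tempo level, tempo occasionally moves to a neighbouring level) and a hypothetical "duplet" intensity profile. The function names, the grid size and the rate values are illustrative choices, not taken from the talk.

```python
import numpy as np

def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over states x = (m, n), flattened as index m * N + (n - 1).

    Assumed dynamics (illustrative): the bar position m advances by the tempo
    level n (mod M); the tempo stays put with probability 1 - p_change and
    otherwise moves to a neighbouring level, clipped to the range 1..N.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M
            moves = ((n, 1.0 - p_change), (n - 1, p_change / 2), (n + 1, p_change / 2))
            for n_next, p in moves:
                n_next = min(max(n_next, 1), N)
                T[m * N + (n - 1), m_next * N + (n_next - 1)] += p
    return T

def forward_filter(y, T, rates):
    """Filtered posteriors p(x_k | y_{1:k}) for Poisson counts y_k."""
    alpha = np.full(T.shape[0], 1.0 / T.shape[0])   # uniform prior p(x_1)
    posteriors = []
    for k, y_k in enumerate(y):
        if k > 0:
            alpha = alpha @ T                       # prediction step
        # Poisson likelihood up to the state-independent factor 1 / y_k!
        alpha = alpha * np.exp(y_k * np.log(rates) - rates)
        alpha = alpha / alpha.sum()
        posteriors.append(alpha)
    return np.array(posteriors)

M, N = 16, 3
T = bar_pointer_transition(M, N)
# Hypothetical duplet intensity: high on every fourth bar position, low elsewhere.
rates = np.repeat(np.where(np.arange(M) % 4 == 0, 1.0, 0.05), N)
y = np.array([1, 0] * 20)                           # an onset every second frame
post = forward_filter(y, T, rates)
tempo_marginal = post[-1].reshape(M, N).sum(axis=0)  # p(n_K | y_{1:K})
```

Summing each filtered posterior over bar positions gives the tempo marginal, and summing over tempo levels gives the bar position marginal, which is the kind of quantity the filtering plots in this part of the talk display.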

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n0 ... n3, θ0 ... θ3, m0 ... m3 and r0 ... r3, with intensities λ1 ... λ3 and observations y1 ... y3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k (0 to 450): Observed Data yk; log p(mk|y1:k) over bar position mk; log p(nk|y1:k) over tempo in quarter notes per min (60, 120, 180); p(rk|y1:k) over Triplets and Duplets]

Smoothing

[Figure: four panels against Frame Index k (0 to 450): Observed Data yk; log p(mk|y1:K) over bar position mk; log p(nk|y1:K) over tempo in quarter notes per min (60, 120, 180); p(rk|y1:K) over Triplets and Duplets]

Time Signature

[Figure: Observed Data (sample value against time in s, 0 to 12); log p(mk|z1:K) over bar position mk; log p(nk|z1:K) over tempo in quarter notes per min (52, 103, 155); p(θk|z1:K) over 4/4 and 3/4; lower panels against Frame Index k (100 to 500)]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score and audio signal xt against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t1 ... tK, r1 ... rK and λ1 ... λK; variances vν,1 ... vν,K and coefficients sν,1 ... sν,K on a plate ν = 1 ... W; score positions rk numbered 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σν against frequency ν (Hz), 0 to 4000 Hz; vertical axis −12 to 0]

Score-Performance matching

[Figure, top: Spectrogram Data, Frequency (Hz, 0 to 4000) against Time (s, 0 to 14)]

[Figure, bottom: MIDI Data, MIDI note (55 to 85) against Score position (50 to 450)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, top: log p(rτ|sτ), MIDI note number (60 to 80) against Time (s)]

[Figure, middle: log λ, computed as Σi w(i)τ λ(i)τ, against Time (s)]

[Figure, bottom: MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz, 0 to 4000) against Time (s)]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients, Energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models: NMF


Bayesian Source Separation

• Joint estimation of Sources, given Observations

Source Model: v ≡ Parameters of Source prior

Observation Model: λ ≡ Channel noise, mixing system

[Graphical model: v feeds the sources sk,1 ... sk,n ... sk,N; the sources and λ feed the observations xk,1 ... xk,M; plate k = 1 ... K]

$$p(\text{Src} \mid \text{Obs}) \propto \int d\lambda\, dv\; p(\text{Obs} \mid \text{Src}, \lambda)\, p(\text{Src} \mid v)\, p(v)$$

Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal waveform over samples 0 to 500]

Polyphonic Music Transcription

• from sound

[Figure: spectrogram, f (Hz, 0 to 5000) against t (sec, 0 to 8)] (S)

• to score

Modelling and Computational issues

• Hierarchical

– Signal level: pitch, onsets, timbre

– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice

– Cognitive level: expression, genre, form, style, mood, emotion

• Uncertainty

– Parameter Learning: Which pitch, rhythm, tempo, meter, time signature?

– Model Selection: How many notes, harmonics, onsets, sections?


Generative Models for Music

[Diagram labels: Score, Expression, Piano-Roll, Signal]

Hierarchical Modeling of Music

[Graphical model: top node M; time slices 1, 2, ..., t with variable chains v, k, h, m, gj, rj, nj, xj, yj and observations y1, y2, ..., yt]

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space, dynamical models

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

– Flexible, Physically realistic

– Analysis down to sample precision, Computationally quite heavy

• Transform domain – Fourier representations, Generalised Linear model

– Models on (orthogonal) transform coefficients, Energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

$$\text{(real)} \qquad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)} \qquad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$

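State space Parametrisation

$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 & 1 \end{pmatrix}}_{C} x_n$$

[Graphical model: state chain x0, x1, ..., xk−1, xk, ..., xK with observations y1, ..., yk−1, yk, ..., yK]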
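The equivalence between the sum of damped complex exponentials and the diagonal state space recursion can be checked numerically. The sketch below is only an illustration: the parameter values and function names are made up, and the complex amplitudes are taken as c_k = α_k exp(jφ_k) so that the real part reproduces the real-valued sinusoidal model.

```python
import numpy as np

def closed_form(c, gamma, omega, n_samples):
    """y = F(gamma, omega) c with F[n, k] = exp((-gamma_k + j omega_k) n)."""
    n = np.arange(n_samples)[:, None]
    F = np.exp((-gamma + 1j * omega)[None, :] * n)
    return F @ c

def state_space(c, gamma, omega, n_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with diagonal A and x_0 = c."""
    A = np.diag(np.exp(-gamma + 1j * omega))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x
        x = A @ x
    return y

# Illustrative parameters for p = 3 damped sinusoids.
gamma = np.array([0.01, 0.02, 0.005])   # damping
omega = np.array([0.3, 0.9, 1.7])       # angular frequency (rad/sample)
amp = np.array([1.0, 0.5, 0.8])
phase = np.array([0.0, 1.0, -0.5])
c = amp * np.exp(1j * phase)

y_closed = closed_form(c, gamma, omega, 200)
y_state = state_space(c, gamma, omega, 200)

# Real-valued model: sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k).
n = np.arange(200)[:, None]
y_real = (amp * np.exp(-gamma * n) * np.cos(omega * n + phase)).sum(axis=1)
```

Writing the model as a linear dynamical system is what makes Kalman-style inference applicable once noise terms are added to the state and observation equations.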

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal waveform over samples 0 to 500]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

H ≡ (parameters, hidden states)

[Graphical model: H is the parent of x¬κ (Missing) and xκ (Observed)]

[Figure: signal waveform over samples 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: parameters Aν, Qν; state chains sν0 ··· sνk ··· sνK−1 on a plate ν = 0 ... W − 1; observations x0 ... xk ... xK−1]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A_\nu s^\nu_{k-1},\, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi \right)$$

Inference: Structured Variational Bayes

[Graphical model: factors q(Aα) and q(Qα); chain ··· sαk−1, sαk, sαk+1 ··· with factor ∏k q(sαk|sαk−1) for α ∈ C; observation factor q(xk)]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
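The "subtract the other bands, refit this band, iterate" scheme can be illustrated on a static analogue. The sketch below is not the phase vocoder itself: it assumes a plain regularised least-squares problem x ≈ Σ_α C_α s_α with hypothetical basis blocks C_α, and shows that cycling through the blocks (block Gauss-Seidel) reaches the same solution as solving the full linear system at once.

```python
import numpy as np

def block_gauss_seidel(C_blocks, x, lam=1e-3, n_iter=200):
    """Fit each block to the residual left by all other blocks, and iterate."""
    s = [np.zeros(C.shape[1]) for C in C_blocks]
    for _ in range(n_iter):
        for a, C_a in enumerate(C_blocks):
            # Residual: observed signal minus the prediction of the other blocks.
            residual = x - sum(C_b @ s[b] for b, C_b in enumerate(C_blocks) if b != a)
            # Ridge-regularised least-squares fit of block a to the residual.
            s[a] = np.linalg.solve(C_a.T @ C_a + lam * np.eye(C_a.shape[1]),
                                   C_a.T @ residual)
    return np.concatenate(s)

rng = np.random.default_rng(0)
C_blocks = [rng.standard_normal((50, 4)) for _ in range(3)]   # hypothetical bases
x = rng.standard_normal(50)

s_gs = block_gauss_seidel(C_blocks, x)

# Direct solution of the same normal equations for comparison.
C_full = np.hstack(C_blocks)
s_direct = np.linalg.solve(C_full.T @ C_full + 1e-3 * np.eye(C_full.shape[1]),
                           C_full.T @ x)
```

The iterative form never builds or factorises the full system, which is the practical appeal when the number of frequency bands is large.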

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: chains rν0 ··· rνk ··· rνK and θν0 ··· θνk ··· θνK on a plate ν = 1 ... W; observations yk ... yK]

• Generalises Source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

damping factor 0 < ρk < 1, framelength N and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over rk does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density p(rk, θk|y1:k) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized ⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators r1 = 1, r2 = 0, r3 = 0, r4 = 1, r5 = 0; states θ0 ... θ5; observations y1 ... y5]

• The same structure can be exploited for the MMAP problem arg max over r1:k of p(r1:k|y1:k)

⇒ Trajectories r(i)1:k which are dominated in terms of conditional evidence p(y1:k, r(i)1:k) can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of {“mute”, “sound”} × M states

[Graphical model: chains r0, r1, ..., rT; m0, m1, ..., mT; s0, s1, ..., sT; observations y1, ..., yT]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(rk, mk|y1:k)

[Figure: signal (−100 to 50) and filtered pitch label posterior (labels 5 to 15) against sample index 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: signal over samples 500 to 3500]

Exact inference (S)

Tracking Pitch Variations

• Allow m to change with k

[Figure: signal over samples 50 to 500]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano roll of frequency index ν against k, above the signal xk]

• Each latent changepoint process ν = 1 ... W corresponds to a “piano key”. Indicators r1:W,1:K encode a latent “piano roll” (S1) (S2) (S3)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W},\, R)$$

$$C \equiv [\, C_1 \;\dots\; C_i \;\dots\; C_W \,]$$

[Graphical model: indicators r1 ... rW are parents of s1 ... sW, which are parents of x]

• Generalized Linear Model – Columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
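For small W the mixture over all 2^W indicator configurations can be enumerated directly, which makes the exponential cost concrete. The sketch below assumes scalar coefficients s_i with a common prior variance, isotropic observation noise and a Bernoulli prior on each indicator; these simplifications and all names are illustrative rather than taken from the talk.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """Log density of N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def posterior_over_indicators(x, C, sigma2=1.0, noise=0.01, p_on=0.2):
    """Enumerate all 2^W on/off configurations r and return p(r | x).

    With s integrated out, x | r ~ N(0, sigma2 * C diag(r) C^T + noise * I).
    """
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = []
    for r in configs:
        r = np.array(r)
        cov = sigma2 * (C * r) @ C.T + noise * np.eye(len(x))
        log_prior = np.sum(r * np.log(p_on) + (1 - r) * np.log(1 - p_on))
        log_post.append(log_gauss_zero_mean(x, cov) + log_prior)
    log_post = np.array(log_post)
    probs = np.exp(log_post - log_post.max())
    return configs, probs / probs.sum()

rng = np.random.default_rng(1)
C = rng.standard_normal((20, 6))                 # hypothetical basis, W = 6
s_true = np.array([1.5, 0.0, 0.0, -2.0, 0.0, 0.0])
x = C @ s_true                                   # only columns 0 and 3 are active
configs, probs = posterior_over_indicators(x, C)
r_map = configs[int(np.argmax(probs))]
```

With W = 6 there are 64 terms; each additional basis vector doubles the count, which is why approximate inference is needed for realistic W.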

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{Latent state}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\theta_{k,1:W},\, R) \qquad \text{Observation}$$

[Graphical model: chains rν0 ··· rνk ··· rνK and sν0 ··· sνk ··· sνK on a plate ν = 1 ... W; observations yk ... yK]

Synthetic Data

[Figure: signal x and piano-roll panels of frequency index ν against k] (S)

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φk(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms, f (Hz, 0 to 10000) against t (sec), for (Speech) and (Piano)]

• Spectrogram displays log |sk| or |sk|² (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

[Figure: spectrogram image = templates image × excitations image]

– however, spectrograms are not additive (a² + b² ≠ (a + b)²)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

[Figure: spectrogram image = masked source 0 image + masked source 1 image]

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients sk and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances vk,1 ... vk,N are parents of sources sk,1 ... sk,N, which are parents of xk; plate k = 1 ... K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes’ theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \qquad \text{(Responsibilities)}$$

• Each source coefficient sn gets a fraction κn of the observation x
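The posterior for the Gaussian source model is cheap to compute once the variances are known. The sketch below applies the responsibility formula per time-frequency atom; the variance and observation arrays are random placeholders chosen only to exercise the function.

```python
import numpy as np

def separate_gaussian_sources(x, v):
    """Posterior mean and variance of each source coefficient.

    x : array of shape (K,), observed mixture coefficients x_k
    v : array of shape (K, N), prior variances v_{k,n} of the N sources
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    post_mean = kappa * x[:, None]             # kappa_{k,n} * x_k
    post_var = v * (1.0 - kappa)               # v_{k,n} * (1 - kappa_{k,n})
    return post_mean, post_var, kappa

rng = np.random.default_rng(0)
v = rng.gamma(shape=2.0, scale=1.0, size=(100, 3))   # placeholder variances
x = rng.standard_normal(100)                         # placeholder observations
post_mean, post_var, kappa = separate_gaussian_sources(x, v)
```

Because the responsibilities sum to one over the sources, the posterior means always add back up to the observation; all of the modelling effort therefore goes into the prior on the variances v.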

One channel source separation: Poisson source model

[Graphical model: intensities vk,1 ... vk,N are parents of sources sk,1 ... sk,N, which are parents of xk; plate k = 1 ... K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template} \times \text{Excitation)}$$
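Under the Poisson model, maximum likelihood estimation of templates and excitations corresponds to minimising the generalised KL divergence between the data and the product of the two factors. The sketch below uses the standard multiplicative update rules for that objective as one possible way to fit the model; the matrix sizes, the synthetic data and the function names are assumptions made for illustration.

```python
import numpy as np

def kl_divergence(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V), summed over all entries."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))

def nmf_kl(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ T @ E under the KL objective."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    T = rng.random((n_freq, n_components)) + 0.1    # templates t_{nu,n}
    E = rng.random((n_components, n_time)) + 0.1    # excitations e_{n,tau}
    history = []
    for _ in range(n_iter):
        V = T @ E + eps
        T *= ((X / V) @ E.T) / (E.sum(axis=1) + eps)
        V = T @ E + eps
        E *= (T.T @ (X / V)) / (T.sum(axis=0)[:, None] + eps)
        history.append(kl_divergence(X, T @ E))
    return T, E, history

# Synthetic count data drawn from a known template/excitation pair.
rng = np.random.default_rng(1)
T_true = rng.gamma(2.0, 1.0, size=(30, 3))
E_true = rng.gamma(2.0, 1.0, size=(3, 40))
X = rng.poisson(T_true @ E_true)

T_hat, E_hat, history = nmf_kl(X, n_components=3)
```

This point-estimate view has no prior on the intensities; the hierarchical models in the talk instead tie the variances or intensities together through Gamma chains and fields.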

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure, left: Gamma densities p(x) against x (0 to 5) for a = 0.9, b = 1; a = 1, b = 1; a = 1.3, b = 1; a = 2, b = 1]

[Figure, right: Inverse Gamma densities for a = 1, b = 1; a = 1, b = 0.5; a = 2, b = 1]

$$\mathcal{G}(x;\, a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x;\, a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
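The two densities can be written down directly from the definitions on this slide and sanity-checked numerically. In this parametrisation z acts as the scale of the Gamma density, and x follows IG(a, z) exactly when 1/x follows G(a, z). The sketch below is illustrative; the grid and parameter values are arbitrary.

```python
import math
import numpy as np

def gamma_logpdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1) * np.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def inv_gamma_logpdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

def trapezoid(f, x):
    """Trapezoidal rule on a possibly non-uniform grid."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

# Both densities should integrate to one (log-spaced grid covers the tails).
grid = np.logspace(-6, 6, 400001)
gamma_mass = trapezoid(np.exp(gamma_logpdf(grid, 2.0, 1.0)), grid)
inv_gamma_mass = trapezoid(np.exp(inv_gamma_logpdf(grid, 2.0, 0.5)), grid)

# Change of variables: IG(x; a, z) = G(1/x; a, z) / x^2.
xs = np.array([0.1, 0.5, 1.0, 3.0])
lhs = inv_gamma_logpdf(xs, 2.0, 0.5)
rhs = gamma_logpdf(1.0 / xs, 2.0, 0.5) - 2.0 * np.log(xs)
```

The change-of-variables identity is also the practical sampling recipe: draw a Gamma variate with shape a and scale z, then take its reciprocal.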

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$

[Chain: z1 ··· vk−1, zk, vk, zk+1 ···, with edges labelled az, a, az]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and az describe coupling strength and drift of the chain
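Typical draws from such a chain can be generated by ancestral sampling. The sketch below reads the second argument of IG as z_k / a and v_k / a_z respectively and uses the convention in which x follows IG(a, z) when 1/x follows a Gamma density with shape a and scale z. The starting value z_1 = 1 is an arbitrary assumption, since no prior on z_1 is stated here.

```python
import numpy as np

def sample_inverse_gamma(rng, a, z):
    """Draw x with 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of z_1, v_1, z_2, v_2, ..., v_K."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K)
    z[0] = z1                                    # assumed fixed starting value
    for k in range(K):
        v[k] = sample_inverse_gamma(rng, a, z[k] / a)
        if k + 1 < K:
            z[k + 1] = sample_inverse_gamma(rng, a_z, v[k] / a_z)
    return v, z

# Strong coupling (large shapes) gives a slowly varying log-variance,
# weak coupling (small shapes) gives a much rougher trajectory.
v_smooth, _ = sample_gamma_chain(2000, a=100.0, a_z=100.0)
v_rough, _ = sample_gamma_chain(2000, a=2.0, a_z=2.0)
step_smooth = np.std(np.diff(np.log(v_smooth)))
step_rough = np.std(np.diff(np.log(v_rough)))
```

Plotting log v_k against k for a few choices of a and a_z reproduces the qualitative behaviour of the "typical draws" figures: equal shapes give a driftless random walk on the log scale, unequal shapes add a drift.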

Gamma Chains: typical draws

[Figure: three panels of log vk (−20 to 20) against k (100 to 1000) for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains with changepoints: typical draws

[Figure: three panels of log vk (−20 to 20) against k (100 to 1000) for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)}$$

[Chain: z1 ··· vk−1, zk, vk, zk+1 ···, with edges labelled az, a, az]

$$\phi^z_k = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: ψ01, z1, ψ11, v1, ψ12, z2, ψ22, v2 ···, with observation factors p(y1|v1), p(y2|v2)]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

[Factor graph: ψ01, z1, ψ11, v1, ψ12, z2, ψ22, v2 ···, with observation factors p(y1|v1), p(y2|v2)]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$
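A minimal Gibbs sampler for the chain can be sketched under some explicit assumptions: observations are zero-mean Gaussian with variance v_k, the chain is v_k | z_k ~ IG(a, z_k / a) and z_{k+1} | v_k ~ IG(a_z, v_k / a_z), and z_1 is held fixed. Under these assumptions both full conditionals are inverse Gamma, and because the v variables are conditionally independent given z (and vice versa), each half of a sweep can be drawn in one vectorised call. Function and variable names are illustrative.

```python
import numpy as np

def gibbs_gamma_chain(y, a=10.0, a_z=10.0, z1=1.0, n_sweeps=300, burn_in=100, seed=0):
    """Posterior mean of the variances v_k given y_k ~ N(0, v_k)."""
    rng = np.random.default_rng(seed)
    K = len(y)
    z = np.full(K, z1)                      # z[0] stays fixed at z1 (assumption)
    v_sum = np.zeros(K)
    # Shape of p(v_k | z_k, z_{k+1}, y_k); the last v_K has no z_{K+1} neighbour.
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5
    for sweep in range(n_sweeps):
        # v_k | rest is inverse Gamma: 1 / v_k ~ Gamma(shape_v, scale = 1 / rate_v).
        rate_v = a / z + 0.5 * y ** 2
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        # z_k | v_{k-1}, v_k is inverse Gamma with shape a_z + a.
        rate_z = a_z / v[:-1] + a / v[1:]
        z[1:] = 1.0 / rng.gamma(a_z + a, 1.0 / rate_z)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)

# Synthetic data: a quiet segment followed by a loud one.
rng = np.random.default_rng(1)
y = np.concatenate([0.1 * rng.standard_normal(100), 3.0 * rng.standard_normal(100)])
v_hat = gibbs_gamma_chain(y)
```

The variational update has the same structure: the sampled values of the neighbours are replaced by expectations under the current q distributions, which conjugacy keeps in closed form.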

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels Noisy (X), Original (Xorg), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968) and Xg (SNR1997); horizontal axis 20 to 120, vertical axis 50 to 500, colour scale −14 to 0]

Denoising – Music

[Figure: five spectrogram panels (axis ticks 50 to 250 and 50 to 500, colour scale −18 to 0): Original, Noisy, and reconstructions labelled PF SNR853, Gibbs SNR866, VB SNR208]

“Tristram (Matt Uelmen)” + ∼ 0dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (horizontal): tie across time, harmonic continuity
• Source 2 (vertical): tie across frequency, transients and percussive sounds

[Figure: three panels Xorg, Shor, Sver; horizontal axis Time (τ), vertical axis Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

               s1                   s2
               SDR    SIR    SAR    SDR    SIR    SAR
VB             -474   -328   567    -158   1546   -137
Gibbs          -45    -262   457    105    1246   161
GibbsEM        -423   -242   482    134    1313   185
Pre-trained    -404   -315   813    356    1144   464
Oracle         614    1716   658    1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

               s1                   s2
               SDR    SIR    SAR    SDR    SIR    SAR
VB             -78    -622   453    -235   184    -225
Gibbs          -846   -753   693    -404   1459   -383
GibbsEM        -774   -619   462    -114   1662   -097
Pre-trained    -64    -539   695    38     1639   414
Oracle         121    329    1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor, Sver; horizontal axis Time (τ), vertical axis Frequency Bin (ν); two rows of audio examples labelled (Original), (Hor), (Vert)]

Audio examples: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz, vertical scale −12 to 0]

Chord Detection

[Figure, top: MDCT of piano chord 41485156; horizontal axis Time τ (s), vertical axis Frequency (Hz), 0 to 4000 Hz]

[Figure, bottom: log Σ_ν v_{ν,j,τ}; horizontal axis Time τ (s), vertical axis MIDI note j, 40 to 65]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$
$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)\big)$$
$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$
$$x_{k,m} \sim \mathcal{N}\big(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m\big), \qquad k = 1, \dots, K, \quad m = 1, \dots, M$$
$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms (t/sec against f/Hz, 0 to 10000 Hz) of the three sources (Speech), (Piano), (Guitar) and of the (Mix)]

Audio examples: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: spectrograms (t/sec against f/Hz, 0 to 10000 Hz) of the reconstructed sources (Speech), (Piano), (Guitar)]

Audio examples: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r over 2000 epochs]

Tempo tracking and score performance matching

• Given expressive music data (onsets / detections / spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized, human-readable score
  – ...
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM
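As a reminder of what online inference in an HMM amounts to, here is a minimal forward (filtering) recursion for a generic discrete-state HMM. It is a textbook sketch, not code from the talk; the function name and the list-based layout of `prior`, `transition` and `likelihoods` are choices made for this illustration.

```python
def forward_filter(prior, transition, likelihoods):
    """Return the filtered distributions p(x_k | y_1:k) for k = 1..K.

    prior[i]          = p(x_1 = i)
    transition[i][j]  = p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] = p(y_k | x_k = i), already evaluated at the observed y_k
    """
    n = len(prior)
    filtered = []
    predicted = list(prior)
    for lik in likelihoods:
        # Update: weight the prediction by the likelihood and normalise.
        unnorm = [predicted[i] * lik[i] for i in range(n)]
        total = sum(unnorm)
        alpha = [u / total for u in unnorm]
        filtered.append(alpha)
        # Predict: propagate through the transition model.
        predicted = [sum(alpha[i] * transition[i][j] for i in range(n))
                     for j in range(n)]
    return filtered
```

Offline (batch) processing would add a backward pass over the same quantities to obtain smoothed marginals.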

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: lattice of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; markers above the lattice indicate 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (score position, tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level: p(x_1), p(x_2), p(x_3), p(x_4); at k = 5, with observation y_5 = 0, p(x_5 | y_{1:5}); and p(x_10)]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for a Triplet Rhythm and for a Duplet Rhythm]

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

[Figure: dynamic Bayesian network with chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure, four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) with tempo in quarter notes per min (60 to 180); p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure, four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) with tempo in quarter notes per min (60 to 180); p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure, four panels: observed data (sample value against time in s, 0 to 12); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) with tempo in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4; the last three against frame index k (100 to 500)]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: audio signal x_t against time t]

Score-Performance matching - Graphical Model

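[Figure: graphical model with chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K and, for each frequency ν = 1, ..., W, variances v_{ν,1}, ..., v_{ν,K} and coefficients s_{ν,1}, ..., s_{ν,K}; inset showing r_k over score positions 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$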

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz, vertical scale −12 to 0]

Score-Performance matching

[Figure: Spectrogram Data (Time in s, 0 to 14, against Frequency in Hz, 0 to 4000) and MIDI Data (Score position, 50 to 450, against MIDI note, 55 to 85)]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure, three panels against Time (s), 1 to 4: log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, computed as Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar) over Frequency (Hz), 0 to 4000]

Summary

• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients, energy compaction
  – Practical, can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical score guided source separation
  – Prior structures for other observation models, NMF

Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal plotted over samples 0 to 500]

Polyphonic Music Transcription

• from sound ...

[Figure: spectrogram (S), t/sec (0 to 8) against f/Hz (0 to 5000)]

• ... to score

Audio example: Decimated_chopin.wav

Modelling and Computational issues

• Hierarchical
  – Signal level: pitch, onsets, timbre
  – Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  – Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  – Parameter learning: which pitch, rhythm, tempo, meter, time signature?
  – Model selection: how many notes, harmonics, onsets, sections?

Generative Models for Music

[Figure: levels labelled Score, Expression, Piano-Roll, Signal]

Hierarchical Modeling of Music

[Figure: graphical model M with chains over time steps 1, 2, ..., t for the variables v, k, h, m, g_j, r_j, n_j, x_j, y_j and observations y_1, y_2, ..., y_t]

Modelling levels

• Physical - acoustical
• Time domain – state space, dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space, dynamical models
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  – Flexible, physically realistic
  – Analysis down to sample precision; computationally quite heavy
• Transform domain – Fourier representations, Generalised Linear model
  – Models on (orthogonal) transform coefficients, energy compaction
  – Practical, can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance
• Cascade of second-order systems
• Audio signals can often be compactly represented by sinusoids

$$\text{(real)} \qquad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$
$$\text{(complex)} \qquad y_n = \sum_{k=1}^{p} c_k \big(e^{-\gamma_k + j\omega_k}\big)^n$$
$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$

State space Parametrisation

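$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$
$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$

[Figure: state space model with hidden chain x_0, x_1, ..., x_{k−1}, x_k, ..., x_K and observations y_1, ..., y_{k−1}, y_k, ..., y_K]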
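The equivalence between the sinusoidal sum and the diagonal state space recursion can be checked numerically. The sketch below is illustrative only; the function names are made up for this example, and the link to the real-valued form assumes c_k = α_k e^{jφ_k}, so that the real part of the complex signal is the damped cosine sum.

```python
import cmath
import math

def closed_form(c, gamma, omega, n):
    """y_n = sum_k c_k * exp((-gamma_k + j*omega_k) * n)."""
    return sum(ck * cmath.exp((-g + 1j * w) * n)
               for ck, g, w in zip(c, gamma, omega))

def state_space(c, gamma, omega, n_steps):
    """Run x_{n+1} = A x_n, y_n = C x_n with diagonal A and C = (1 ... 1)."""
    poles = [cmath.exp(-g + 1j * w) for g, w in zip(gamma, omega)]  # diagonal of A
    x = list(c)                                   # x_0 = (c_1, ..., c_p)
    ys = []
    for _ in range(n_steps):
        ys.append(sum(x))                         # y_n = C x_n
        x = [p * xi for p, xi in zip(poles, x)]   # x_{n+1} = A x_n
    return ys

def real_form(alpha, phi, gamma, omega, n):
    """y_n = sum_k alpha_k * exp(-gamma_k n) * cos(omega_k n + phi_k)."""
    return sum(a * math.exp(-g * n) * math.cos(w * n + p)
               for a, p, g, w in zip(alpha, phi, gamma, omega))
```

Since A is diagonal, each state component is an independently rotating, decaying phasor, and the observation simply sums them.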

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal plotted over samples 0 to 500]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$
$$H \equiv (\text{parameters, hidden states})$$

[Figure: graphical model with H as common parent of x_{¬κ} (Missing) and x_κ (Observed); signal plotted over samples 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Figure: graphical model; for each frequency ν = 0, ..., W − 1, parameters A_ν, Q_ν and state chain s^ν_0, ..., s^ν_k, ..., s^ν_{K−1}; observations x_0, ..., x_k, ..., x_{K−1}]

$$s_k^\nu \sim \mathcal{N}\big(s_k^\nu;\, A_\nu s_{k-1}^\nu,\, Q_\nu\big), \qquad A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi \right)$$

Inference Structured Variational Bayes

[Figure: structured variational approximation; for each band α ∈ C, factors q(A_α), q(Q_α) and a chain ∏_k q(s^α_k | s^α_{k−1}); factors q(x_k) for the signal]

• Intuitive algorithm
  – Subtract from the observed signal x the prediction of the frequency bands in ¬α
  – Compute a fit for α to this residual and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
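For readers unfamiliar with Gauss-Seidel, the following generic sketch shows the iteration the last bullet refers to. It is the standard textbook method on a small dense system, not the structured variational algorithm itself; the function name and the fixed sweep count are choices made for this illustration. Each coordinate update subtracts the current contribution of all other unknowns from the right-hand side and refits the remaining one, which mirrors the "subtract the other bands, fit this band to the residual" description above.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by Gauss-Seidel sweeps (A as a list of rows).

    Convergence is guaranteed for e.g. strictly diagonally dominant
    or symmetric positive definite A; no check is made here.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Residual of equation i after removing all other unknowns,
            # using the most recent values of x.
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = residual / A[i][i]
    return x
```

For example, `gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])` converges to the solution of 4x + y = 1, x + 3y = 2.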

Restoration

• Piano
  – Signal with missing samples (37)
  – Reconstruction, 768 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction, 710 dB improvement
  – Original

Audio examples: piano_missing.wav, piano_kalman.wav, piano_clean.wav, trumpet_missing.wav, trumpet_kalman.wav, trumpet_clean.wav

Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Figure: for each ν = 1, ..., W, an indicator chain r^ν_0, ..., r^ν_k, ..., r^ν_K and a state chain θ^ν_0, ..., θ^ν_k, ..., θ^ν_K, with observations y_k, ..., y_K]

• Generalises source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$
$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$
$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$
$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^{2} & & \\ & & \ddots & \\ & & & G_\omega^{H} \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H.

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

[Figure]

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector θ is reinitialized
  ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Figure: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0, states θ_0, ..., θ_5 and observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {“mute”, “sound”} × M states

[Figure: graphical model with chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T and observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels over samples 100 to 1000, with vertical scales −100 to 50 and 5 to 15]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result (S) over samples 500 to 3500]

Audio example: d1.wav

Tracking Pitch Variations

• Allow m to change with k

[Figure: result over samples 50 to 500]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: frequency index ν against time k, with the signal x_k plotted below]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll” (S1) (S2) (S3)

Audio examples: montuno1.wav, montuno2.wav, montuno3.wav

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$
$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$
$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R)$$
$$C \equiv [\, C_1 \;\dots\; C_i \;\dots\; C_W \,]$$

[Figure: graphical model with indicators r_1, ..., r_W, coefficients s_1, ..., s_W and observation x]

• Generalized Linear Model – columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)
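The 2^W blow-up can be made concrete with a brute-force enumeration. The sketch below is a toy under simplifying assumptions that are not in the slides: the observation x is a scalar, each basis "vector" c_i is a scalar, all active coefficients share one prior variance `sigma2`, and each indicator is on with the same probability `pi_on`. With the coefficients integrated out, x given r is zero-mean Gaussian with variance R + σ² Σ_i r_i c_i², so the posterior over r follows by normalising over all configurations.

```python
import itertools
import math

def posterior_over_indicators(x, c, pi_on, sigma2, noise_var):
    """Exact p(r | x) by enumerating all 2^W on/off configurations."""
    W = len(c)
    weights = {}
    for r in itertools.product((0, 1), repeat=W):
        # Marginal variance of x once the active coefficients are integrated out.
        var = noise_var + sigma2 * sum(ri * ci ** 2 for ri, ci in zip(r, c))
        log_lik = -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        weights[r] = math.exp(log_lik + log_prior)
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}
```

The dictionary has 2^W entries, one per mixture component, which is why enumeration is only feasible for small W.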

Factorial Switching State space model

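$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$
$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$
$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})\big) \qquad \text{(Changepoint indicator)}$$
$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)\big) \qquad \text{(Latent state)}$$
$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}\big(y_k;\, C_k\,\theta_{k,1:W},\, R\big) \qquad \text{(Observation)}$$

[Figure: for each ν = 1, ..., W, an indicator chain r^ν_0, ..., r^ν_k, ..., r^ν_K and a state chain s^ν_0, ..., s^ν_k, ..., s^ν_K, with observations y_k, ..., y_K]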

Synthetic Data

[Figure: synthetic example (S); panels indexed by frequency ν against time k, with the signal x]

Audio example: audio_example.wav

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable – computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain – state space, dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms (t/sec against f/Hz, 0 to 10000 Hz) of (Speech) and (Piano)]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Audio examples: s1.wav, s2.wav

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
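The non-additivity remark is easy to verify on a single time-frequency coefficient. The sketch below assumes complex transform coefficients (as for an STFT) and uses made-up values; it only illustrates that the power of a sum depends on the relative phase of the two sources, so it generally differs from the sum of the powers.

```python
def power(coefficient):
    """Squared magnitude of one transform coefficient."""
    return abs(coefficient) ** 2

def mix_power_gap(a, b):
    """Power of the mixture minus the sum of the individual powers."""
    return power(a + b) - (power(a) + power(b))
```

For two in-phase unit coefficients the mixture has power 4 rather than 2; for opposite phases the mixture cancels to 0; only for a 90 degree phase difference do the powers add.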

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big( \prod_k p(s_k \mid v_k) \Big)\, p(v)$$

One channel source separation Gaussian source model

[Figure: graphical model with variances v_{k,1}, ..., v_{k,N}, sources s_{k,1}, ..., s_{k,N} and observation x_k, for k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$
$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big)$$
$$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \qquad \text{(Responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
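The posterior above translates directly into a few lines of code for a single time-frequency atom. This is a minimal sketch with known variances; the function name `separate` and the example values are illustrative only.

```python
def separate(x_k, variances):
    """Posterior means and variances of the sources for one atom k.

    variances[n] is v_{k,n}; x_k is the observed mixture coefficient.
    """
    total = sum(variances)
    kappas = [v / total for v in variances]            # responsibilities
    means = [kappa * x_k for kappa in kappas]          # kappa_{k,n} * x_k
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars
```

For instance, `separate(2.0, [3.0, 1.0])` assigns three quarters of the observation to the first source. The posterior means always add up to x_k, so the separation is consistent with the mixture.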

One channel source separation Poisson source model

[Figure: graphical model with intensities v_{k,1}, ..., v_{k,N}, sources s_{k,1}, ..., s_{k,N} and observation x_k, for k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$$
$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) on 0 ≤ x ≤ 5, Gamma (left) and inverse Gamma (right), for several values of a and b]

$$\mathcal{G}(x;\, a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$
$$\mathcal{IG}(x;\, a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$$
$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$

[Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ···, with edges labelled a_z, a, a_z]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe coupling strength and drift of the chain
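A draw from such a chain can be generated with a standard Gamma sampler. The sketch below assumes the chain is IG(v_k; a, z_k/a), IG(z_{k+1}; a_z, v_k/a_z) in the IG(x; a, z) notation of the talk, under which 1/x is Gamma distributed with shape a and scale z; the start value `z1` and the function name are hypothetical choices for this illustration.

```python
import random

def sample_gamma_chain(K, a, a_z, z1, rng):
    """Draw v_1..v_K and z_1..z_K from the inverse Gamma-Markov chain."""
    v, z = [], [z1]
    for k in range(K):
        # v_k | z_k ~ IG(a, z_k / a): 1 / v_k ~ Gamma(shape=a, scale=z_k / a)
        v.append(1.0 / rng.gammavariate(a, z[k] / a))
        # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
        z.append(1.0 / rng.gammavariate(a_z, v[k] / a_z))
    return v, z[:K]
```

Under this parameterisation E[1/v_k | z_k] = z_k and E[1/z_{k+1} | v_k] = v_k, so successive variances v_k and v_{k+1} are positively correlated through the auxiliary variable, and larger shape parameters give a stiffer chain.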

Gamma Chains typical draws

[Figure: typical draws of log v_k (vertical scale −20 to 20) against k = 1, ..., 1000 for a = 10, a_z = 10; a = 10, a_z = 4; and a = 10, a_z = 40]

Gamma Chains with changepoints typical draws

[Figure: typical draws of log v_k (vertical scale −20 to 20) against k = 1, ..., 1000, with changepoints, for a = 10, a_z = 10; a = 10, a_z = 4; and a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$

[Figure: chain z_1 ··· v_{k−1} – z_k – v_k – z_{k+1} ···, with edges labelled a_z, a, a_z]

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$
$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

[Factor graph: chain z_1, v_1, z_2, v_2, ··· linked by the potentials ψ_{0,1}, ψ_{1,1}, ψ_{1,2}, ψ_{2,2}, with observation factors p(y_1|v_1), p(y_2|v_2)]

• VB

\[ q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right) \]

• Gibbs

\[ v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)}) \]

VB or Gibbs

[Factor graph: chain z_1, v_1, z_2, v_2, ··· linked by the potentials ψ_{0,1}, ψ_{1,1}, ψ_{1,2}, ψ_{2,2}, with observation factors p(y_1|v_1), p(y_2|v_2)]

• VB

\[ q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right) \]

• Gibbs

\[ z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)}) \]
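The Gibbs updates for the chain can be sketched in a few lines. This is a minimal illustration rather than the implementation behind the talk, and it rests on assumptions not stated on the slides: a zero-mean Gaussian observation y_k ∼ N(0, v_k), a fixed value v0 standing in for the left boundary potential ψ_{0,1}, and the helper names `sample_ig_rate` and `gibbs_sweep`, which are hypothetical. Under these assumptions both full conditionals are inverse Gamma, which is where conjugacy helps.

```python
import numpy as np

def sample_ig_rate(rng, shape, rate):
    """Inverse Gamma draw with density proportional to x^-(shape+1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape=shape, scale=1.0 / rate)

def gibbs_sweep(y, v, z, a, a_z, v0, rng):
    """One in-place sweep over z_1..z_K and v_1..v_K (arrays are 0-based)."""
    K = len(y)
    for k in range(K):
        v_prev = v0 if k == 0 else v[k - 1]      # v0 is an assumed boundary value
        # z_k | v_{k-1}, v_k : shape a + a_z, rate a / v_k + a_z / v_{k-1}
        z[k] = sample_ig_rate(rng, a + a_z, a / v[k] + a_z / v_prev)
        # v_k | z_k, z_{k+1}, y_k with the assumed likelihood y_k ~ N(0, v_k)
        shape = a + 0.5
        rate = a / z[k] + 0.5 * y[k] ** 2
        if k + 1 < K:                             # the last v_K has no right neighbour
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = sample_ig_rate(rng, shape, rate)
    return v, z

# Toy data: a quiet first half followed by a loud second half.
rng = np.random.default_rng(0)
y = rng.normal(size=200) * np.where(np.arange(200) < 100, 0.1, 3.0)
v = np.ones(200)
z = np.ones(200)
for _ in range(50):
    gibbs_sweep(y, v, z, a=10.0, a_z=10.0, v0=1.0, rng=rng)
```

The scan order (z_k, then v_k, left to right) is one valid choice among many; any order that visits every variable gives a valid Gibbs sampler.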

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: log-magnitude spectrogram panels: X (noisy), Xorg (original), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]

Denoising - Music

[Figure: log-magnitude spectrogram panels: Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]

"Tristram (Matt Uelmen)" + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time (harmonic continuity)
• Source 2: Vertical. Tie across frequency (transients, percussive sounds)

[Figure: panels Xorg, Shor, Sver; axes: Time (τ), Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                    s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328    567     -158   1546   -137
Gibbs          -45   -262    457      105   1246    161
GibbsEM       -423   -242    482      134   1313    185
Pre-trained   -404   -315    813      356   1144    464
Oracle         614   1716    658     1266   1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                    s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -78   -622    453     -235    184   -225
Gibbs         -846   -753    693     -404   1459   -383
GibbsEM       -774   -619    462     -114   1662   -097
Pre-trained    -64   -539    695       38   1639    414
Oracle         121    329   1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor, Sver; axes: Time (τ), Frequency Bin (ν). Two sets of audio examples: (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure, top panel: MDCT of piano chord 41485156; axes: Time τ (s), Frequency (Hz)]

[Figure, bottom panel: log Σ_ν v_{ν,j,τ}; axes: Time τ (s), MIDI note j]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

\[ \lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N \]
\[ v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n)) \]
\[ s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}) \]
\[ x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M \]
\[ a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots) \]

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrogram panels (Speech), (Piano), (Guitar), (Mix); axes: t/sec, f/Hz]

Reconstructions

[Figure: reconstructed spectrogram panels (Speech), (Piano), (Guitar); axes: t/sec, f/Hz]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets, detections, spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
  - ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: grid of states; horizontal axis: bar position m_k = 1, ..., 8; vertical axis: tempo level n_k = 1, 2, 3; markers for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)
• Directed arcs denote state transitions with positive probability
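A filtering pass over such a grid of states can be sketched as follows. The transition rule used here (the position advances by the tempo level modulo the bar length, and the tempo level moves up or down by one with a small probability) is an assumption about the dynamics, since the slide shows only the grid. The Poisson observation and its intensities `mu` are likewise illustrative, and `build_transition` and `forward_filter` are hypothetical names.

```python
import numpy as np
from math import lgamma

def build_transition(M, N, p_change=0.1):
    """T[i, j] = p(x_{k+1} = j | x_k = i), state index i = m * N + (n - 1)."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M                       # assumed: advance by the tempo level
            moves = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= N]
            for n2 in moves:                           # tempo level changes by one step
                T[i, m_next * N + (n2 - 1)] = p_change / 2
            T[i, m_next * N + (n - 1)] = 1.0 - (p_change / 2) * len(moves)
    return T

def forward_filter(y, T, mu, N, prior=None):
    """Filtered p(x_k | y_{1:k}) with Poisson counts y_k of intensity mu[m_k]."""
    S = T.shape[0]
    rate = np.repeat(mu, N)                            # intensity depends on position only
    alpha = np.full(S, 1.0 / S) if prior is None else prior
    out = np.zeros((len(y), S))
    for k, yk in enumerate(y):
        if k > 0:
            alpha = alpha @ T                          # prediction step
        loglik = yk * np.log(rate) - rate - lgamma(yk + 1)
        alpha = alpha * np.exp(loglik)                 # update step
        alpha = alpha / alpha.sum()
        out[k] = alpha
    return out

T = build_transition(M=8, N=3)
mu = np.full(8, 0.1)
mu[[0, 4]] = 2.0                                       # assumed: onsets likely at two positions
filtered = forward_filter([1, 0, 0, 0, 1, 0, 0, 0], T, mu, N=3)
```

Each row of `filtered` is a distribution over the M × N states; summing over the tempo index gives the marginal over bar position.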

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1); axes: Bar Position (1 to 8), Tempo Level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level: p(x_1), p(x_2), p(x_3), p(x_4), then p(x_5 | y_{1:5}) with y_5 = 0, then p(x_10)]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity μ_k against m_k (0 to 1000); panels: Triplet Rhythm, Duplet Rhythm]

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) (quarter notes per min); p(r_k | y_{1:k}) (Triplets, Duplets); horizontal axis: Frame Index k]

Smoothing

[Figure: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) (quarter notes per min); p(r_k | y_{1:K}) (Triplets, Duplets); horizontal axis: Frame Index k]

Time Signature

[Figure: Observed Data (sample value against time/s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) (quarter notes per min); p(θ_k | z_{1:K}) (4/4, 3/4); horizontal axis: Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score aligned with the audio signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; and, inside a plate ν = 1, ..., W, v_{ν,1}, ..., v_{ν,K} and s_{ν,1}, ..., s_{ν,K}; the score position r_k takes the values 1, ..., 8]

\[ v_{\nu,\tau} \sim \mathcal{IG}\left(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\right) \]

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure, top panel: Spectrogram Data; axes: Time (s), Frequency (Hz)]

[Figure, bottom panel: MIDI Data; axes: Score position, MIDI note]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: log p(r_τ | s_τ), MIDI note number against Time (s); Σ_i w_τ^{(i)} λ_τ^{(i)}, log λ against Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) against Time (s)]

Summary

• "Time Domain": Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• "Transform Domain": Gamma Fields
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions
• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Polyphonic Music Transcription

• from sound

[Figure: spectrogram; axes: t/sec (0 to 8), f/Hz (0 to 5000)]

• to score

Modelling and Computational issues

• Hierarchical
  - Signal level: pitch, onsets, timbre
  - Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice
  - Cognitive level: expression, genre, form, style, mood, emotion
• Uncertainty
  - Parameter Learning: Which pitch, rhythm, tempo, meter, time signature?
  - Model Selection: How many notes, harmonics, onsets, sections?


Generative Models for Music

[Diagram with the levels: Score, Expression; Piano-Roll; Signal]

Hierarchical Modeling of Music

[Graphical model (plate label M): chains over time steps 1, 2, ..., t for the variables v, k, h, m, g_j, r_j, n_j, x_j, y_j and y]

Modelling levels

• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain: state space dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision, computationally quite heavy
• Transform domain: Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance
• Cascade of second order systems
• Audio signals can often be compactly represented by sinusoids

\[ \text{(real)} \quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k) \]

\[ \text{(complex)} \quad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n \]

\[ y = F(\gamma_{1:p}, \omega_{1:p})\, c \]
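The matrix form y = F(γ, ω) c can be checked numerically. The sketch below builds F for two damped sinusoids and verifies that the real part of F c matches the real-valued sum when c_k = α_k e^{jφ_k}. The parameter values and the helper name `basis_matrix` are made up for illustration.

```python
import numpy as np

def basis_matrix(gamma, omega, n_samples):
    """F[n, k] = (exp(-gamma_k + 1j * omega_k)) ** n for n = 0, ..., n_samples - 1."""
    n = np.arange(n_samples)[:, None]
    poles = np.exp(-np.asarray(gamma) + 1j * np.asarray(omega))[None, :]
    return poles ** n

# Illustrative parameters (not from the talk)
gamma = np.array([0.01, 0.02])      # damping
omega = np.array([0.3, 0.7])        # angular frequency, radians per sample
alpha = np.array([1.0, 0.5])        # amplitudes
phi = np.array([0.0, 1.2])          # phases

F = basis_matrix(gamma, omega, 256)
c = alpha * np.exp(1j * phi)        # complex amplitudes
y_complex = F @ c

# Real form, term by term
n = np.arange(256)
y_real = sum(a * np.exp(-g * n) * np.cos(w * n + p)
             for a, g, w, p in zip(alpha, gamma, omega, phi))
```

Given γ and ω, the model is linear in c, which is what makes the amplitudes easy to estimate once the poles are fixed.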

State space Parametrisation

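\[ x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix} \]

\[ y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n \]

[Graphical model: state chain x_0, x_1, ···, x_{k-1}, x_k, ···, x_K with observations y_1, ···, y_{k-1}, y_k, ···, y_K]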
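As a quick check of this parametrisation, the recursion x_{n+1} = A x_n, y_n = C x_n can be run and compared with the direct sum of damped complex exponentials. The numerical values below are illustrative assumptions.

```python
import numpy as np

# Illustrative parameters (not from the talk)
gamma = np.array([0.01, 0.02])
omega = np.array([0.3, 0.7])
c = np.array([1.0 + 0.0j, 0.5 * np.exp(1.2j)])

A = np.diag(np.exp(-gamma + 1j * omega))   # diagonal transition matrix
C = np.ones((1, len(c)))                   # observation row vector of ones

x = c.copy()                               # x_0 holds the complex amplitudes
y = []
for n in range(256):
    y.append((C @ x)[0])                   # y_n = C x_n
    x = A @ x                              # x_{n+1} = A x_n
y = np.array(y)

# Direct evaluation of sum_k c_k (exp(-gamma_k + j omega_k))^n
direct = np.array([np.sum(c * np.exp((-gamma + 1j * omega) * n))
                   for n in range(256)])
```

Writing the signal this way is what opens the door to Kalman filtering once process and observation noise are added.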

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones
• Restoration, concatenative expressive speech synthesis

[Figure: waveform with missing samples, sample index 0 to 500]

Audio Interpolation

\[ p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H) \]

H ≡ (parameters, hidden states)

[Graphical model: H is the parent of x_{¬κ} (Missing) and x_κ (Observed)]

[Figure: waveform with missing samples, sample index 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each ν = 0, ..., W - 1, parameters A_ν, Q_ν and a state chain s_0^ν ··· s_k^ν ··· s_{K-1}^ν; observations x_0, ..., x_k, ..., x_{K-1}]

\[ s_k^\nu \sim \mathcal{N}(s_k^\nu; A_\nu s_{k-1}^\nu, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi \right) \]

Inference Structured Variational Bayes

[Graphical model: for each α ∈ C, factors q(A_α), q(Q_α) and a chain \prod_k q(s_k^α | s_{k-1}^α) over ··· s_{k-1}^α, s_k^α, s_{k+1}^α ···; observation factor q(x_k)]

• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α
  - Compute a fit for α to this residual, and iterate
• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations

Restoration

• Piano
  - Signal with missing samples (37)
  - Reconstruction, 768 dB improvement
  - Original
• Trumpet
  - Signal with missing samples (37)
  - Reconstruction, 710 dB improvement
  - Original

Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Graphical model: for each ν = 1, ..., W, chains r_0^ν ··· r_k^ν ··· r_K^ν and θ_0^ν ··· θ_k^ν ··· θ_K^ν; observations y_k, ..., y_K]

• Generalises Source-filter models

Harmonic model with changepoints

\[ r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\} \]

\[ \theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}} \]

\[ y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R) \]

\[ A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix} \]

damping factor 0 < ρ_k < 1, framelength N, and damped sinusoidal basis matrix C of size N × 2H
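A draw from a model of this kind can be simulated directly. The sketch below simplifies the slide in several labelled ways: framelength N = 1 (one scalar observation per step), independent Bernoulli changepoints instead of a Markov prior p(r_k | r_{k-1}), isotropic Q, S and R, and an observation vector that reads the first coordinate of each harmonic. All names and numerical values are hypothetical.

```python
import numpy as np

def rotation_block(omega, rho):
    """G_omega = rho * rotation by angle omega."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def block_diag(blocks):
    """Place square blocks along the diagonal of a zero matrix."""
    size = sum(b.shape[0] for b in blocks)
    out = np.zeros((size, size))
    i = 0
    for b in blocks:
        n = b.shape[0]
        out[i:i + n, i:i + n] = b
        i += n
    return out

def simulate(K, omega, H=3, rho=0.995, p_new=0.01, q=1e-6, s=1.0, r=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    G = rotation_block(omega, rho)
    # Harmonic h uses G^h, a rotation by h * omega damped by rho^h
    A = block_diag([np.linalg.matrix_power(G, h) for h in range(1, H + 1)])
    C = np.tile([1.0, 0.0], H)                   # assumed observation vector
    theta = np.sqrt(s) * rng.standard_normal(2 * H)
    r_seq = np.zeros(K, dtype=int)
    y = np.zeros(K)
    for k in range(K):
        r_seq[k] = rng.random() < p_new          # assumed i.i.d. changepoints
        if r_seq[k]:                             # "new": reinitialise the state
            theta = np.sqrt(s) * rng.standard_normal(2 * H)
        else:                                    # "reg": damped rotation plus noise
            theta = A @ theta + np.sqrt(q) * rng.standard_normal(2 * H)
        y[k] = C @ theta + np.sqrt(r) * rng.standard_normal()
    return r_seq, y

r_seq, y = simulate(K=500, omega=0.1)
```

Between changepoints the signal is a sum of decaying harmonics; at a changepoint the state is redrawn, which sounds like a new note onset.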

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
    ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
    ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector θ is reinitialized
    ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem \arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})
    ⇒ Trajectories r_{1:k}^{(i)} that are dominated in terms of conditional evidence p(y_{1:k}, r_{1:k}^{(i)}) can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m
• At each time k, the process can be in one of {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: waveform (samples 100 to 1000) and the filtered pitch estimate below it]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result, samples 500 to 3500]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track, samples 50 to 500]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano roll over frequency index ν and time k, with the signal x_k below]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators r_{1:W,1:K} encode a latent "piano roll"

Single time slice - Bayesian Variable Selection

\[ r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}}) \]
\[ s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i) \]
\[ x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R) \]
\[ C \equiv [\, C_1 \; \cdots \; C_i \; \cdots \; C_W \,] \]

[Graphical model: indicators r_1, ..., r_W are parents of s_1, ..., s_W, which are parents of x]

• Generalized Linear Model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)
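For small W the 2^W mixture components can simply be enumerated, which also shows why the computation becomes intractable as W grows. The sketch below assumes scalar coefficients s_i with a common variance sigma2 (so each C_i is a single column) and a shared prior probability pi_on. The function names and the toy data are hypothetical.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """log N(x; 0, cov) for a positive definite covariance."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def posterior_over_indicators(x, C, sigma2, R, pi_on):
    """Exact p(r | x) by enumerating all 2^W on/off configurations."""
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = []
    for r in configs:
        r = np.array(r)
        # Marginalising s gives x ~ N(0, C diag(sigma2 * r) C^T + R)
        cov = (C * (sigma2 * r)) @ C.T + R
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        log_post.append(log_prior + log_gauss_zero_mean(x, cov))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()

# Toy problem: three basis vectors in four dimensions, only the first is active.
C = np.eye(4)[:, :3]
x = np.array([3.0, 0.0, 0.0, 0.0])
configs, post = posterior_over_indicators(x, C, sigma2=4.0, R=0.1 * np.eye(4), pi_on=0.2)
best = configs[int(np.argmax(post))]
```

The loop visits 2^W configurations, so doubling W squares the work; this is the cost that approximate inference is meant to avoid.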

Factorial Switching State space model

\[ r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu}) \]
\[ \theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu) \]
\[ r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \quad \text{(Changepoint indicator)} \]
\[ \theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \quad \text{(Latent state)} \]
\[ y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \quad \text{(Observation)} \]

[Graphical model: for each ν = 1, ..., W, chains r_0^ν ··· r_k^ν ··· r_K^ν and s_0^ν ··· s_k^ν ··· s_K^ν; observations y_k, ..., y_K]

Synthetic Data

[Figure: synthetic data; panels over frequency index ν and time k, with the signal x]

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

\[ x(t) = \sum_k s_k \phi_k(t) \]

[Figure: spectrogram panels (Speech), (Piano); axes: t/sec, f/Hz]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

\[ X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau} \]

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

\[ X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau} \]

Spectrogram = Mask × Source0 + (1 - Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

\[ p(s \mid v)\, p(v) = \Big( \prod_k p(s_k \mid v_k) \Big)\, p(v) \]

One channel source separation Gaussian source model

[Graphical model: for k = 1, ..., K, variances v_{k,1}, ..., v_{k,N} are parents of sources s_{k,1}, ..., s_{k,N}, which are parents of x_k]

\[ s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}) \]
\[ x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n} \]

• Straightforward application of Bayes' theorem yields

\[ p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})) \]
\[ \kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \quad \text{(Responsibilities)} \]

• Each source coefficient s_n gets a fraction κ_n of the observation x
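The posterior above is easy to evaluate for a whole spectrogram at once. In the following sketch the function name `posterior_sources` and the toy numbers are illustrative; the formulas are the responsibilities and posterior moments from the slide.

```python
import numpy as np

def posterior_sources(x, v):
    """x: (K,) mixture coefficients; v: (K, N) prior variances of the N sources.

    Returns the posterior means and variances of s_{k,n}, each of shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} * x_k
    var = v * (1.0 - kappa)                    # v_{k,n} * (1 - kappa_{k,n})
    return mean, var

x = np.array([2.0, -1.0])
v = np.array([[3.0, 1.0],
              [1.0, 1.0]])
mean, var = posterior_sources(x, v)
```

The posterior means always add up to the observation, so the separation conserves the mixture exactly; only the split between sources depends on the variances.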

One channel source separation Poisson source model

[Graphical model: for k = 1, ..., K, intensities v_{k,1}, ..., v_{k,N} are parents of sources s_{k,1}, ..., s_{k,N}, which are parents of x_k]

\[ s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}) \]
\[ x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n} \]

• This is the generative model for the NMF when we write

\[ v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)} \]

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x (0 to 5) for several values of a and b]

\[ \mathcal{G}(x; a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right) \]
\[ \mathcal{IG}(x; a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right) \]

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
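The two densities, in the parametrisation written above, can be coded as log-densities and cross-checked against each other. The function names are hypothetical. The check uses the change of variables u = 1/x, under which IG(x; a, z) = G(1/x; a, z) / x^2.

```python
from math import lgamma
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z): shape a, scale z."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)

x = np.linspace(0.1, 5.0, 50)
lhs = log_inv_gamma_pdf(x, 2.0, 1.5)
rhs = log_gamma_pdf(1.0 / x, 2.0, 1.5) - 2.0 * np.log(x)   # Jacobian of u = 1/x
```

With a = 1 and z = 1 the Gamma density reduces to exp(-x), a convenient sanity check on the normalisation.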

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

\[ v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a) \]
\[ z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z) \]

[Diagram: chain z_1 ··· v_{k-1}, z_k, v_k, z_{k+1} ···, with consecutive links labelled a_z, a, a_z]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe coupling strength and drift of the chain
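Draws from such a chain can be generated by ancestral sampling. The sketch below uses the IG(x; a, z) parametrisation from the earlier slide, under which 1/x is Gamma distributed with shape a and scale z. The fixed start value z1 and the function names are assumptions made for illustration.

```python
import numpy as np

def sample_ig(rng, a, z):
    """Draw from IG(x; a, z) as defined on the slides: 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def draw_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of z_1, v_1, z_2, ..., v_K; z1 is an assumed start value."""
    rng = np.random.default_rng(seed)
    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1
    for k in range(K):
        v[k] = sample_ig(rng, a, z[k] / a)               # v_k | z_k
        if k + 1 < K:
            z[k + 1] = sample_ig(rng, a_z, v[k] / a_z)   # z_{k+1} | v_k
    return z, v

z, v = draw_gamma_chain(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v)    # the quantity plotted in the "typical draws" figures
```

Because each step inverts its parent on average (the mean of 1/v_k given z_k is z_k, and the mean of 1/z_{k+1} given v_k is v_k), two steps bring v_{k+1} back near v_k; larger a and a_z tighten this coupling.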

Gamma Chains: typical draws

[Figure: typical draws of log v_k against k (k = 1 to 1000); three panels: a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

\psi_{ij} = \exp(-a_{ij} \xi_i^{-1} \xi_j^{-1})   (Pairwise)

\phi_i = \exp((\sum_j a_{ij} + 1) \log \xi_i^{-1})   (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain \psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, ..., with observation factors p(y_1|v_1) and p(y_2|v_2) attached to v_1 and v_2]

• VB

q^{(\tau)}(v_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k) q^{(\tau)}(z_{k+1})})

• Gibbs

v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k|v_k) \, \psi_{k,k}(z_k^{(\tau)}) \, \psi_{k,k+1}(z_{k+1}^{(\tau)})

VB or Gibbs

[Factor graph: chain \psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, ..., with observation factors p(y_1|v_1) and p(y_2|v_2) attached to v_1 and v_2]

• VB

q^{(\tau)}(z_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1}) q^{(\tau)}(v_k)})

• Gibbs

z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)}) \, \psi_{k,k}(v_k^{(\tau)})
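The Gibbs updates for v and z can be sketched in a few lines once an observation model is fixed. The sketch below assumes zero-mean Gaussian observations y_k ~ N(0, v_k), which the slides leave generic as p(y_k|v_k), and the chain conditionals IG(v_k; a, z_k/a) and IG(z_{k+1}; a_z, v_k/a_z) with 1/x Gamma distributed under IG. Under these assumptions both full conditionals are again inverse Gamma. The function name, the fixed z_1 and the toy data are hypothetical.

```python
import numpy as np

def gibbs_sweep(y, v, z, a, a_z, rng):
    """One in-place Gibbs sweep over z_2..z_K and v_1..v_K (z_1 is held fixed)."""
    K = len(y)
    for k in range(K):
        if k > 0:
            # 1/z_k | v_{k-1}, v_k ~ Gamma(shape a_z + a, rate a_z/v_{k-1} + a/v_k)
            rate = a_z / v[k - 1] + a / v[k]
            z[k] = 1.0 / rng.gamma(a_z + a, 1.0 / rate)
        # 1/v_k | z_k, z_{k+1}, y_k ~ Gamma(shape, rate); no z_{k+1} term at the chain end
        shape = a + 0.5
        rate = a / z[k] + 0.5 * y[k] ** 2
        if k + 1 < K:
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = 1.0 / rng.gamma(shape, 1.0 / rate)
    return v, z

rng = np.random.default_rng(0)
K = 200
y = rng.normal(size=K)            # toy observations
v, z = np.ones(K), np.ones(K)     # arbitrary initial state
for _ in range(50):
    gibbs_sweep(y, v, z, a=10.0, a_z=10.0, rng=rng)
```

Conjugacy is what keeps each step a single Gamma draw; a VB version would replace the sampled neighbours by their expected inverse values.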

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six image panels with horizontal axis 20 to 120, vertical axis 50 to 500 and colour scale from −14 to 0: Noisy (X), Original (Xorg), Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising – Music

[Figure: five image panels with horizontal axis 50 to 250, vertical axis 50 to 500 and colour scale from −18 to 0: Original, Noisy, PF SNR853, Gibbs SNR866, VB SNR208]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: horizontal, tied across time (harmonic continuity)

• Source 2: vertical, tied across frequency (transients, percussive sounds)

[Figure: panels Xorg, Shor and Sver, with axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -474   -328   567     -158   1546   -137
Gibbs         -45    -262   457     105    1246   161
GibbsEM       -423   -242   482     134    1313   185
Pre-trained   -404   -315   813     356    1144   464
Oracle        614    1716   658     1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -78    -622   453     -235   184    -225
Gibbs         -846   -753   693     -404   1459   -383
GibbsEM       -774   -619   462     -114   1662   -097
Pre-trained   -64    -539   695     38     1639   414
Oracle        121    329    1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: two rows of panels (Original), (Hor) and (Vert), labelled Xorg, Shor and Sver, with axes Time (τ) and Frequency Bin (ν)]


Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (0 to 4000 Hz), vertical axis from −12 to 0]

Chord Detection

[Figure: two panels against time τ (s). First panel: MDCT of piano chord 41485156, frequency axis 0 to 4000 Hz. Second panel: \log \sum_\nu v_{\nu,j,\tau} for MIDI notes j from 40 to 65]

Multichannel Source Separation

• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)

\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda),   n = 1, ..., N

v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))

s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})

x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m),   m = 1, ..., M,   k = 1, ..., K

a_m \sim \mathcal{N}(a_m; \cdots)

r_m \sim \mathcal{IG}(r_m; \cdots)

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrogram panels (Speech), (Piano), (Guitar) and (Mix), with axes t/sec and f/Hz (0 to 10000 Hz)]


Reconstructions

[Figure: three spectrogram panels of the reconstructions (Speech), (Piano) and (Guitar), with axes t/sec and f/Hz (0 to 10000 Hz)]


Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; markers above the grid indicate 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1), with Bar Position (1 to 8) on the horizontal axis and Tempo Level (1 to 5) on the vertical axis]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}) over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_{10}) over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: two panels of the intensity μ_k against m_k (0 to 1000), titled Triplet Rhythm and Duplet Rhythm]
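The bar pointer construction can be illustrated with a toy HMM: states are pairs (bar position, tempo level), the pointer advances by the tempo level at each step, and onset counts are Poisson with an intensity that depends on the bar position. Everything concrete below (grid size, the tempo-change probability, the intensity pattern, the wrap-around at the end of the bar) is a hypothetical choice for illustration, not the model of Whiteley, Cemgil and Godsill.

```python
import numpy as np
from math import lgamma

M, N = 8, 3        # bar positions and tempo levels (toy sizes)
P_CHANGE = 0.1     # hypothetical probability of a tempo change per step

def transition_matrix():
    """T[i, j] = p(x_k = j | x_{k-1} = i), states indexed as i = m * N + (n - 1)."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M                      # pointer advances by the tempo level
            for dn, p in ((0, 1 - P_CHANGE), (-1, P_CHANGE / 2), (1, P_CHANGE / 2)):
                n_next = min(max(n + dn, 1), N)       # clip at slowest/fastest tempo
                T[i, m_next * N + (n_next - 1)] += p
    return T

def poisson_loglik(y, mu):
    # log Poisson(y; mu) for a scalar count y and an array of intensities mu
    return y * np.log(mu) - mu - lgamma(y + 1)

def forward_filter(y, T, mu, prior):
    """Return the filtering distributions p(x_k | y_{1:k}) as rows."""
    alpha = prior.copy()
    out = []
    for k, yk in enumerate(y):
        if k > 0:
            alpha = alpha @ T                         # predict
        alpha = alpha * np.exp(poisson_loglik(yk, mu))  # update with the new count
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)

T = transition_matrix()
mu = np.repeat(np.where(np.arange(M) % 4 == 0, 3.0, 0.2), N)  # high intensity on "beats"
prior = np.full(M * N, 1.0 / (M * N))
y = [3, 0, 0, 0, 2, 0, 0, 0]
post = forward_filter(y, T, mu, prior)
```

Reshaping a row of post to (M, N) gives a picture comparable to the position/tempo panels; a backward pass over the same quantities would give the smoothed (offline) estimates.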

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) with tempo in quarter notes per min (60, 120, 180); p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against frame index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) with tempo in quarter notes per min (60, 120, 180); p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: four panels: Observed Data, sample value against time (0 to 12 s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) with tempo in quarter notes per min (52, 103, 155); p(θ_k | z_{1:K}) for 4/4 and 3/4; the last three against frame index k (100 to 500)]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, t_2, ..., t_K; r_1, r_2, ..., r_K; λ_1, λ_2, ..., λ_K; and, for ν = 1, ..., W, chains v_{ν,1}, ..., v_{ν,K} and s_{ν,1}, ..., s_{ν,K}; r_k points to one of the numbered score positions 1 to 8]

v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (0 to 4000 Hz), vertical axis from −12 to 0]

Score-Performance matching

[Figure: two panels. Spectrogram Data: frequency (0 to 4000 Hz) against time (0 to 14 s). MIDI Data: MIDI note (55 to 85) against score position (50 to 450)]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: three panels against time (1 to 4 s): log p(r_\tau | s_\tau) over MIDI note number (60 to 80); \sum_i w_\tau^{(i)} \lambda_\tau^{(i)} shown as log λ (−10 to 10); MDCT of audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz]

Summary

• "Time Domain" – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• "Transform Domain" – Gamma Fields

– Models on (orthogonal) transform coefficients, energy compaction

– Practical, can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, polyphonic transcription

∗ Musical score guided source separation

– Prior structures for other observation models, NMF


Modelling and Computational issues

• Hierarchical

– Signal level: pitch, onsets, timbre

– Symbolic level: melody, motives, harmony, chords, tonality, rhythm, beat, tempo, articulation, instrumentation, voice

– Cognitive level: expression, genre, form, style, mood, emotion

• Uncertainty

– Parameter learning: which pitch, rhythm, tempo, meter, time signature?

– Model selection: how many notes, harmonics, onsets, sections?

Generative Models for Music


Generative Models for Music

[Diagram with the labels Score, Expression, Piano-Roll and Signal]

Hierarchical Modeling of Music

[Graphical model over time steps 1, 2, ..., t with top-level variable M and chains v_t, k_t, h_t, m_t, g_{j,t}, r_{j,t}, n_{j,t}, x_{j,t}, y_{j,t} and y_t]

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space, dynamical models

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

– Flexible, physically realistic

– Analysis down to sample precision, computationally quite heavy

• Transform domain – Fourier representations, Generalised Linear model

– Models on (orthogonal) transform coefficients, energy compaction

– Practical, can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoidals

(real)   y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)

(complex)   y_n = \sum_{k=1}^{p} c_k (e^{-\gamma_k + j\omega_k})^n

y = F(\gamma_{1:p}, \omega_{1:p}) c
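The link between the real and complex forms, and the matrix form y = F c, can be checked numerically. In the sketch below the parameter values and the function name basis are arbitrary illustrations; the only identity used is that with c_k = α_k exp(jφ_k) the real part of the complex model equals the real model.

```python
import numpy as np

def basis(gamma, omega, n_samples):
    """F[n, k] = (exp(-gamma_k + 1j * omega_k)) ** n, a damped complex exponential."""
    n = np.arange(n_samples)[:, None]
    return np.exp((-gamma + 1j * omega)[None, :] * n)

# Arbitrary illustrative parameters for p = 2 components
gamma = np.array([0.01, 0.02])                 # damping
omega = 2 * np.pi * np.array([0.05, 0.11])     # angular frequency per sample
alpha = np.array([1.0, 0.5])                   # amplitudes
phi = np.array([0.3, -1.2])                    # phases

c = alpha * np.exp(1j * phi)                   # complex amplitudes
F = basis(gamma, omega, 200)
y_complex = F @ c                              # y = F(gamma, omega) c

n = np.arange(200)
y_real = sum(a * np.exp(-g * n) * np.cos(w * n + p)
             for a, g, w, p in zip(alpha, gamma, omega, phi))
```

Because y is linear in c once γ and ω are fixed, estimating the amplitudes for given frequencies and dampings reduces to a linear regression on the columns of F.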

State space Parametrisation

x_{n+1} = A x_n,   A = \mathrm{diag}(e^{-\gamma_1 + j\omega_1}, \ldots, e^{-\gamma_p + j\omega_p}),   x_0 = (c_1, c_2, \ldots, c_p)^\top

y_n = C x_n,   C = (1 \; 1 \; \cdots \; 1)

[Graphical model: state chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]
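A short simulation shows that the state space recursion reproduces the sum of damped complex exponentials. The numerical values are arbitrary and the variable names are chosen only for this sketch.

```python
import numpy as np

gamma = np.array([0.01, 0.02])
omega = 2 * np.pi * np.array([0.05, 0.11])
c = np.array([1.0 + 0.5j, 0.3 - 0.2j])

A = np.diag(np.exp(-gamma + 1j * omega))   # diagonal transition matrix
C = np.ones((1, len(c)))                   # observation row vector of ones

x = c.copy()                               # x_0 = (c_1, ..., c_p)
y = []
for n in range(100):
    y.append((C @ x)[0])                   # y_n = C x_n
    x = A @ x                              # x_{n+1} = A x_n
y = np.array(y)

# Closed form: y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n
closed_form = np.array([np.sum(c * np.exp((-gamma + 1j * omega) * n))
                        for n in range(100)])
```

Writing the model this way is what makes Kalman-filter style inference applicable once noise terms are added to the state and observation equations.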

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: audio signal with missing segments, samples 0 to 500]

Audio Interpolation

p(x_{\neg\kappa} | x_\kappa) \propto \int dH \, p(x_{\neg\kappa} | H) \, p(x_\kappa | H) \, p(H)

H \equiv (parameters, hidden states)

[Graphical model: H is the parent of x_{¬κ} (Missing) and x_κ (Observed)]

[Figure: audio signal, samples 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each ν = 0, ..., W − 1, parameters A_ν and Q_ν and the chain s_0^ν, ..., s_k^ν, ..., s_{K-1}^ν; observations x_0, ..., x_k, ..., x_{K-1}]

s_k^\nu \sim \mathcal{N}(s_k^\nu; A_\nu s_{k-1}^\nu, Q_\nu)

A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi \right)

Inference: Structured Variational Bayes

[Graphical model of the approximating distribution: factors q(A_α) and q(Q_α), the chain factor \prod_k q(s_k^\alpha | s_{k-1}^\alpha) for α ∈ C, and q(x_k)]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
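The Gauss-Seidel analogy can be made concrete with the textbook iteration for a linear system A x = b: each unknown is refitted to the residual left after subtracting the current contribution of all the others. This is a generic sketch on a small diagonally dominant system chosen for illustration, not the phase vocoder update itself.

```python
import numpy as np

def gauss_seidel(A, b, n_iter=100):
    """Solve A x = b by sweeping over the unknowns one at a time."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(n_iter):
        for i in range(len(b)):
            # Residual after subtracting the current contribution of all other unknowns
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = residual / A[i, i]      # refit unknown i to that residual
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 5.0]])
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```

In the structured VB setting the role of a single unknown is played by a whole frequency band α, and the "refit" is a Kalman smoothing pass on the residual signal.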

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original


Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for ν = 1, ..., W, chains r_0^ν, ..., r_k^ν, ..., r_K^ν and θ_0^ν, ..., θ_k^ν, ..., θ_K^ν; observations y_k, ..., y_K]

• Generalises source-filter models

Harmonic model with changepoints

r_k | r_{k-1} \sim p(r_k | r_{k-1}),   r_k \in \{0, 1\}

\theta_k | \theta_{k-1}, r_k \sim \underbrace{[r_k = 0] \, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1] \, \mathcal{N}(0, S)}_{\text{new}}

y_k | \theta_k \sim \mathcal{N}(C\theta_k, R)

A = \mathrm{blkdiag}(G_\omega, G_\omega^2, \ldots, G_\omega^H)^N

G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H
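A toy simulation may help to see what the switch r_k does. The sketch below makes several simplifying assumptions that are not in the slides: a single harmonic (H = 1), one sample per frame (N = 1), independent Bernoulli changepoints in place of p(r_k | r_{k-1}), isotropic covariances Q = q I and S = s I, and C = (1, 0). All names and values are hypothetical.

```python
import numpy as np

def rotation(omega, rho):
    """Damped rotation matrix G_omega = rho * R(omega)."""
    return rho * np.array([[np.cos(omega), -np.sin(omega)],
                           [np.sin(omega),  np.cos(omega)]])

def simulate(K, omega, rho, p_new, q, s, r_obs, seed=0):
    rng = np.random.default_rng(seed)
    A = rotation(omega, rho)
    C = np.array([1.0, 0.0])
    theta = np.zeros(2)
    r = np.zeros(K, dtype=int)
    y = np.zeros(K)
    for k in range(K):
        r[k] = rng.random() < p_new
        if r[k] == 1:
            # "new": the state is reinitialised, e.g. at a note onset
            theta = rng.normal(0.0, np.sqrt(s), size=2)
        else:
            # "reg": damped rotation of the previous state plus small noise
            theta = A @ theta + rng.normal(0.0, np.sqrt(q), size=2)
        y[k] = C @ theta + rng.normal(0.0, np.sqrt(r_obs))
    return r, y

r, y = simulate(K=500, omega=0.3, rho=0.99, p_new=0.01, q=1e-4, s=1.0, r_obs=1e-3)
```

The output looks like a sequence of decaying sinusoid bursts, each restarted with a fresh amplitude and phase at a changepoint.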

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: when a changepoint occurs, the state vector θ is reinitialized

⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: switches r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem \arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})

⇒ Trajectories r_{1:k}^{(i)} which are dominated in terms of conditional evidence p(y_{1:k}, r_{1:k}^{(i)}) can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels against k (100 to 1000): the signal (−100 to 50) and the estimated pitch label (5 to 15)]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: signal over samples 500 to 3500; Exact inference (S)]


Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over k = 50 to 500]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: two panels against k: frequency index ν and the signal x_k]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators r_{1:W,1:K} encode a latent "piano roll" (S1) (S2) (S3)


Single time slice - Bayesian Variable Selection

r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})

s_i | r_i \sim [r_i = on] \, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq on] \, \delta(s_i)

x | s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)

C \equiv [\, C_1 \; \ldots \; C_i \; \ldots \; C_W \,]

[Graphical model: indicators r_1, ..., r_W, coefficients s_1, ..., s_W, observation x]

• Generalized Linear Model – the columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
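For small W the 2^W mixture can simply be enumerated, which makes the exponential cost visible. The sketch below assumes scalar coefficients with Σ = sigma2, isotropic noise R = r_var · I and independent indicators with prior probability pi_on of being on; these simplifications and all names are assumptions made for the example.

```python
import itertools
import numpy as np

def config_posterior(x, C, sigma2, r_var, pi_on):
    """Posterior probability of every on/off configuration r in {0, 1}^W."""
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    log_w = []
    for r in configs:
        r = np.array(r)
        # Marginal of x given r: N(0, sigma2 * C diag(r) C^T + r_var * I)
        cov = sigma2 * (C * r) @ C.T + r_var * np.eye(len(x))
        _, logdet = np.linalg.slogdet(cov)
        quad = x @ np.linalg.solve(cov, x)
        log_lik = -0.5 * (logdet + quad + len(x) * np.log(2 * np.pi))
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        log_w.append(log_lik + log_prior)
    log_w = np.array(log_w)
    w = np.exp(log_w - log_w.max())        # subtract the max for numerical stability
    return configs, w / w.sum()

rng = np.random.default_rng(0)
D, W = 6, 4
C = rng.normal(size=(D, W))
s_true = np.array([1, 0, 1, 0]) * rng.normal(size=W)   # two active components
x = C @ s_true + 0.1 * rng.normal(size=D)
configs, weights = config_posterior(x, C, sigma2=1.0, r_var=0.01, pi_on=0.3)
```

With W = 4 there are 16 terms; at W = 88 (one indicator per piano key) the same loop would have 2^88 terms, which is why approximate inference is needed.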

Factorial Switching State space model

r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})

\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)

r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))   (Changepoint indicator)

\theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k) \theta_{k-1,\nu}, Q_\nu(r_k))   (Latent state)

y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)   (Observation)

[Graphical model: for ν = 1, ..., W, chains r_0^ν, ..., r_k^ν, ..., r_K^ν and s_0^ν, ..., s_k^ν, ..., s_K^ν; observations y_k, ..., y_K]

Synthetic Data

[Figure: synthetic data panels against k, with vertical axes ν, x and frequency (S)]


Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

x(t) = \sum_k s_k \phi_k(t)

[Figure: spectrogram panels (Speech) and (Piano), with axes t/sec and f/Hz (0 to 10000 Hz)]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)


Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

X_{\nu\tau} = W_{\nu j} S_{j\tau}

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive (a^2 + b^2 \neq (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

X_{\nu\tau} = [r_{\nu\tau} = 0] \, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1] \, S^{(1)}_{\nu\tau}

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

p(s|v) \, p(v) = \left( \prod_k p(s_k | v_k) \right) p(v)

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1}, ..., v_{k,N}, sources s_{k,1}, ..., s_{k,N} and observation x_k, for k = 1, ..., K]

s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})

x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}

• Straightforward application of Bayes' theorem yields

p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))

\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}   (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
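Given the variances, the posterior above is a per-coefficient weighting that is easy to compute. The sketch below evaluates the posterior mean κ x and variance v(1 − κ) for synthetic data; the array shapes, the function name separate and the Gamma-distributed toy variances are assumptions for the example.

```python
import numpy as np

def separate(x, v):
    """x: (K,) mixture coefficients, v: (K, N) source variances.
    Returns the posterior mean and variance of each source, both of shape (K, N)."""
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities, rows sum to one
    mean = kappa * x[:, None]                  # each source gets a fraction of x
    var = v * (1.0 - kappa)
    return mean, var

rng = np.random.default_rng(0)
v = rng.gamma(2.0, 1.0, size=(5, 3))           # toy variances for K = 5, N = 3
s = rng.normal(0.0, np.sqrt(v))                # sources drawn from the model
x = s.sum(axis=1)                              # single-channel mixture
mean, var = separate(x, v)
```

The posterior means always add up to the observation, so the separation quality depends entirely on how well the variances v are estimated, which is where the prior on v comes in.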

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1}, ..., v_{k,N}, sources s_{k,1}, ..., s_{k,N} and observation x_k, for k = 1, ..., K]

s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})

x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}

• This is the generative model for NMF when we write

v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}   (Template × Excitation)
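The connection to NMF can be tried out numerically: data are drawn from the Poisson source model with intensities t × e, and templates and excitations are then fitted with the well-known multiplicative updates for the generalised KL divergence (Lee and Seung), whose objective matches the Poisson likelihood up to constants. The update rule is a standard technique swapped in here for illustration and is not taken from the slides; sizes, seeds and names are hypothetical.

```python
import numpy as np

def kl_div(X, L):
    """Generalised KL divergence between data X and intensities L (0 log 0 = 0)."""
    mask = X > 0
    return np.sum(X[mask] * np.log(X[mask] / L[mask])) - X.sum() + L.sum()

def nmf_kl(X, N, n_iter=200, eps=1e-12, seed=1):
    rng = np.random.default_rng(seed)
    t = rng.random((X.shape[0], N)) + 0.1      # templates, frequency x source
    e = rng.random((N, X.shape[1])) + 0.1      # excitations, source x time
    history = []
    for _ in range(n_iter):
        t *= ((X / (t @ e + eps)) @ e.T) / (e.sum(axis=1)[None, :] + eps)
        e *= (t.T @ (X / (t @ e + eps))) / (t.sum(axis=0)[:, None] + eps)
        history.append(kl_div(X, t @ e + eps))
    return t, e, history

# Draw data from the Poisson source model: s_{k,n} ~ PO(t_{nu,n} e_{tau,n}), x_k = sum_n s_{k,n}
rng = np.random.default_rng(0)
F, T, N = 20, 30, 3
t_true = rng.gamma(1.0, 1.0, size=(F, N))
e_true = rng.gamma(1.0, 1.0, size=(N, T))
s = rng.poisson(t_true[:, :, None] * e_true[None, :, :])   # shape (F, N, T)
X = s.sum(axis=1)                                          # shape (F, T)

t_hat, e_hat, history = nmf_kl(X, N)
```

Since a sum of independent Poisson variables is again Poisson, X has intensity t_true @ e_true, which is why maximum likelihood in this model coincides with KL-NMF.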

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of p(x) against x (0 to 5). First panel legend: "a = 09 b =1", "a = 1 b =1", "a = 13 b =1", "a = 2 b =1". Second panel legend: "a=1 b=1", "a=1 b=05", "a=2 b=1"]

\mathcal{G}(x; a, z) \equiv \exp((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a))

\mathcal{IG}(x; a, z) \equiv \exp((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a))

• Gamma: conjugate prior for the Gaussian precision, the Poisson intensity and the Inverse Gamma scale

• Inverse Gamma: conjugate prior for the Gaussian variance and the Gamma scale
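The two densities are related by a change of variables: if u = 1/x follows G(u; a, z), then x follows IG(x; a, z), with the Jacobian contributing a factor x^{-2}. The sketch below checks this numerically on a grid; the function names and parameter values are chosen for the example only.

```python
import numpy as np
from math import lgamma

def log_gamma_pdf(x, a, z):
    # log G(x; a, z) = (a - 1) log x - x / z + a log(1/z) - log Gamma(a)
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    # log IG(x; a, z) = (a + 1) log(1/x) - 1 / (z x) + a log(1/z) - log Gamma(a)
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)

x = np.linspace(0.1, 5.0, 50)
lhs = log_inv_gamma_pdf(x, a=2.0, z=0.5)
rhs = log_gamma_pdf(1.0 / x, a=2.0, z=0.5) - 2.0 * np.log(x)   # Jacobian |du/dx| = x^-2
```

In this parametrisation z is a scale for the Gamma density and an inverse scale for the Inverse Gamma density, which is worth keeping in mind when mapping to library conventions.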

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains: typical draws

[Figure: three panels of $\log v_k$ (vertical axis, -20 to 20) against k (100 to 1000), for (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains with changepoints: typical draws

[Figure: three panels of $\log v_k$ (vertical axis, -20 to 20) against k (100 to 1000), for (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp\left((a_z + a + 1) \log z_k^{-1}\right)$ (Singletons)

[Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with edge labels $a_z$, $a$, $a_z$.]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic

- Markov Chain Monte Carlo, Gibbs sampler

- Sequential Monte Carlo, Particle Filtering

• Deterministic

- Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, $\cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
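Because every factor is conjugate, both Gibbs conditionals are again inverse Gamma. The following is a minimal sketch of one sweep, under assumptions that are not stated on the slides: the observation model is taken to be $y_k \sim \mathcal{N}(0, v_k)$, the chain conditionals are read as $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$, and $z_1$ is held fixed. Under these assumptions the density of $v_k$ is proportional to $v^{-(a + a_z + 1/2 + 1)} \exp(-(y_k^2/2 + a/z_k + a_z/z_{k+1})/v)$ and that of $z_k$ to $z^{-(a_z + a + 1)} \exp(-(a_z/v_{k-1} + a/v_k)/z)$. All function names are hypothetical.

```python
import random

def v_conditional(y_k, z_k, z_next, a, a_z):
    """Shape and rate of p(v_k | z_k, z_{k+1}, y_k), density ∝ v^-(shape+1) exp(-rate / v)."""
    shape = a + 0.5
    rate = 0.5 * y_k ** 2 + a / z_k
    if z_next is not None:            # the last v in the chain has no right neighbour
        shape += a_z
        rate += a_z / z_next
    return shape, rate

def z_conditional(v_prev, v_k, a, a_z):
    """Shape and rate of p(z_k | v_{k-1}, v_k)."""
    return a_z + a, a_z / v_prev + a / v_k

def draw_inverse_gamma(rng, shape, rate):
    # If 1/x ~ Gamma(shape, scale=1/rate), x has density ∝ x^-(shape+1) exp(-rate / x).
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def gibbs_sweep(y, v, z, a, a_z, rng):
    """One in-place sweep over v_1..v_K and z_2..z_K; z_1 is held fixed (an assumption)."""
    K = len(y)
    for k in range(K):
        z_next = z[k + 1] if k + 1 < K else None
        v[k] = draw_inverse_gamma(rng, *v_conditional(y[k], z[k], z_next, a, a_z))
    for k in range(1, K):
        z[k] = draw_inverse_gamma(rng, *z_conditional(v[k - 1], v[k], a, a_z))
```

Updating all v and then all z is valid here because the v variables are conditionally independent given z, and vice versa.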

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels (horizontal axis 20 to 120, vertical axis 50 to 500, colour scale -14 to 0). Panel titles as printed: "Noisy" (X), "Original" (Xorg), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968", "Xg SNR1997".]

Denoising - Music

[Figure: five spectrogram panels (horizontal axis 50 to 250, vertical axis 50 to 500, colour scale -18 to 0). Panel titles as printed: "Original", "Noisy", "PF SNR853", "Gibbs SNR866", "VB SNR208".]

"Tristram (Matt Uelmen)" + ~0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time (harmonic continuity)

• Source 2: Vertical. Tie across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor and Sver; axes: Time (τ) and Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                 s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -474   -328   567      -158   1546   -137
Gibbs          -45    -262   457      105    1246   161
GibbsEM        -423   -242   482      134    1313   185
Pre-trained    -404   -315   813      356    1144   464
Oracle         614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                 s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -78    -622   453      -235   184    -225
Gibbs          -846   -753   693      -404   1459   -383
GibbsEM        -774   -619   462      -114   1662   -097
Pre-trained    -64    -539   695      38     1639   414
Oracle         121    329    1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor and Sver; axes: Time (τ) and Frequency Bin (ν). Two sets of audio examples, each labelled (Original), (Hor), (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ (vertical axis, -12 to 0) against frequency ν in Hz (0 to 4000).]

Chord Detection

[Figure: top panel, MDCT of piano chord 41, 48, 51, 56 (time τ in s, 0.5 to 2.5; frequency in Hz, 0 to 4000). Bottom panel, $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j (40 to 65) against time τ in s.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, n = 1, ..., N

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, m = 1, ..., M, k = 1, ..., K

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrograms with axes t/sec and f/Hz (up to 10000), labelled (Speech), (Piano), (Guitar) and (Mix), each with an audio example.]

Reconstructions

[Figure: three spectrograms with axes t/sec and f/Hz (up to 10000), labelled (Speech), (Piano) and (Guitar), each with an audio example.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three traces, of a, λ and r, against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

- Determine the position of a performance on a score

- Determine where a human listener would clap her hand

- Create a quantized, human-readable score

- ...

• Online (real-time) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position $m_k$ = 1, ..., 8 on the horizontal axis and tempo level $n_k$ = 1, 2, 3 on the vertical axis; markers above the grid indicate 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (score position, tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over bar position and tempo level: $p(x_1)$, $p(x_2)$, $p(x_3)$, $p(x_4)$, then $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, and $p(x_{10})$.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: two panels of the Poisson intensity $\mu_k$ (0 to 4) against bar position $m_k$ (0 to 1000), one for a triplet rhythm and one for a duplet rhythm.]
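Filtering in a model of this kind is the standard HMM forward recursion over the joint state (m, n). The sketch below is a simplified illustration and not the exact model of Whiteley, Cemgil and Godsill: it assumes that the pointer advances by n positions per frame modulo M, that the tempo level moves by at most one step with total probability `p_change`, that the prior is uniform, and that positions are indexed from 0. All names and parameter values are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def tempo_moves(n, N, p_change):
    """Tempo stays with prob. 1 - p_change, else moves one level up or down (clipped to 1..N)."""
    moves = [(n, 1.0 - p_change)]
    for step in (-1, 1):
        moves.append((min(max(n + step, 1), N), p_change / 2))
    return moves

def bar_pointer_filter(ys, M, N, intensity, p_change=0.1):
    """Return p(m_k, n_k | y_{1:k}) for each k as a dict keyed by (m, n).

    m is the bar position (0..M-1), n the tempo level (1..N),
    and intensity[m] the Poisson rate of the observation at position m.
    """
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    belief = {s: 1.0 / len(states) for s in states}      # uniform prior (assumption)
    history = []
    for k, y in enumerate(ys):
        if k > 0:                                        # prediction step
            predicted = {s: 0.0 for s in states}
            for (m, n), p in belief.items():
                m_next = (m + n) % M                     # pointer advances by the tempo
                for n_next, w in tempo_moves(n, N, p_change):
                    predicted[(m_next, n_next)] += p * w
            belief = predicted
        # update step with the Poisson observation model, then normalise
        belief = {s: p * poisson_pmf(y, intensity[s[0]]) for s, p in belief.items()}
        total = sum(belief.values())
        belief = {s: p / total for s, p in belief.items()}
        history.append(belief)
    return history
```

With an intensity that peaks at two positions of an 8-position bar and onsets observed every second frame, the filtered tempo marginal concentrates on the level that advances two positions per frame.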

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \ldots, n_3$; $\theta_0, \ldots, \theta_3$; $m_0, \ldots, m_3$; $r_0, \ldots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$ and observations $y_1, y_2, y_3$.]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$ with tempo in quarter notes per min (60 to 180); and $p(r_k \mid y_{1:k})$ over {Triplets, Duplets}.]

Smoothing

[Figure: four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$ with tempo in quarter notes per min (60 to 180); and $p(r_k \mid y_{1:K})$ over {Triplets, Duplets}.]

Time Signature

[Figure: observed data (sample value against time in s, 0 to 12); $\log p(m_k \mid z_{1:K})$; $\log p(n_k \mid z_{1:K})$ with tempo in quarter notes per min; and $p(\theta_k \mid z_{1:K})$ over {4/4, 3/4}, against frame index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score and audio signal $x_t$ against t.]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, \ldots, t_K$; $r_1, \ldots, r_K$; $\lambda_1, \ldots, \lambda_K$; and, for each ν = 1, ..., W, variances $v_{\nu,1}, \ldots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$. An inset shows score positions $r_k$ numbered 1 to 8.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (vertical axis, -12 to 0) against frequency ν in Hz (0 to 4000).]

Score-Performance matching

[Figure: top panel, spectrogram data (time in s, 0 to 14; frequency in Hz, 0 to 4000). Bottom panel, MIDI data (MIDI note, 55 to 85, against score position, 50 to 450).]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure: three panels against time in s (1 to 4): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\log \lambda$, with panel title $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ (-10 to 10); and the MDCT of the audio (source: Daniel-Ben Pienaar), frequency in Hz (0 to 4000).]

Summary

• "Time Domain": Switching State Space Models

- State space modeling

- Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

- Analysis down to sample precision (if required)

- Computationally quite heavy

• "Transform Domain": Gamma Fields

- Models on (orthogonal) transform coefficients; energy compaction

- Practical: can make use of fast transforms (FFT, MDCT, ...)

- Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

- Time-Frequency Energy distributions

• Ongoing Work

- Comparison of inference methods (VB, MCMC, SMC)

- Learning

- Applications

  * Chord detection, polyphonic transcription

  * Musical-score-guided source separation

- Prior structures for other observation models, NMF


Generative Models for Music

[Diagram of the generative hierarchy, with levels labelled Score, Expression, Piano-Roll and Signal.]

Hierarchical Modeling of Music

[Graphical model: chains over time indices 1, 2, ..., t for the variables v, k, h, m, $g_j$, $r_j$, $n_j$, $x_j$, $y_j$ and the observations y, with a plate labelled M.]

Modelling levels

• Physical - acoustical

• Time domain: state space, dynamical models

• Transform domain: Fourier representations, Generalised Linear model

• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain: state space, dynamical models

- Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

- Flexible, physically realistic

- Analysis down to sample precision; computationally quite heavy

• Transform domain: Fourier representations, Generalised Linear model

- Models on (orthogonal) transform coefficients; energy compaction

- Practical: can make use of fast transforms (FFT, MDCT, ...)

- Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second-order systems

• Audio signals can often be compactly represented by sinusoids

(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$

(complex) $y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$

$y = F(\gamma_{1:p}, \omega_{1:p})\, c$

State Space Parametrisation

$x_{n+1} = \underbrace{\mathrm{diag}\left(e^{-\gamma_1 + j\omega_1}, \ldots, e^{-\gamma_p + j\omega_p}\right)}_{A}\, x_n$, with $x_0 = (c_1, c_2, \ldots, c_p)^\top$

$y_n = \underbrace{(1\ 1\ \cdots\ 1)}_{C}\, x_n$

[Graphical model: state chain $x_0, x_1, \ldots, x_{k-1}, x_k, \ldots, x_K$ with observations $y_1, \ldots, y_{k-1}, y_k, \ldots, y_K$.]
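The equivalence between the complex sinusoidal sum and a state-space recursion with a diagonal transition matrix can be checked numerically. This is a minimal sketch with illustrative function names and parameter values that are not part of the talk; it evaluates the noise-free model in both forms.

```python
import cmath

def synth_direct(c, gamma, omega, n_samples):
    """y_n = sum_k c_k * exp((-gamma_k + j*omega_k) * n), evaluated term by term."""
    return [sum(ck * cmath.exp((-g + 1j * w) * n) for ck, g, w in zip(c, gamma, omega))
            for n in range(n_samples)]

def synth_state_space(c, gamma, omega, n_samples):
    """Same signal from x_{n+1} = A x_n, y_n = C x_n with diagonal A and C = (1 ... 1)."""
    poles = [cmath.exp(-g + 1j * w) for g, w in zip(gamma, omega)]
    x = list(c)                                   # x_0 holds the complex amplitudes
    y = []
    for _ in range(n_samples):
        y.append(sum(x))                          # observation: sum of the state entries
        x = [p * xi for p, xi in zip(poles, x)]   # each entry is rotated and damped
    return y
```

The recursive form is the one that extends naturally to process noise and observation noise, which is what makes Kalman-type inference applicable.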

Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal waveform, sample index 0 to 500.]

Audio Interpolation

$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$

$H \equiv$ (parameters, hidden states)

[Graphical model: H is the parent of $x_{\neg\kappa}$ (Missing) and $x_\kappa$ (Observed).]

[Figure: signal waveform, sample index 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band ν = 0, ..., W-1, a state chain $s_0^\nu \cdots s_k^\nu \cdots s_{K-1}^\nu$ with parameters $A^\nu$ and $Q^\nu$; observations $x_0, \ldots, x_k, \ldots, x_{K-1}$.]

$s_k^\nu \sim \mathcal{N}(s_k^\nu; A^\nu s_{k-1}^\nu, Q^\nu)$

$A^\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$

Inference: Structured Variational Bayes

[Graphical model of the approximating distribution: factors $q(A^\alpha)$ and $q(Q^\alpha)$, the chain $\prod_k q(s_k^\alpha \mid s_{k-1}^\alpha)$ for each α ∈ C, and $q(x_k)$.]

• Intuitive algorithm

- Subtract from the observed signal x the prediction of the frequency bands in ¬α

- Compute a fit for α to this residual, and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
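The Gauss-Seidel analogy can be made concrete on a small linear system. This is a generic sketch of the Gauss-Seidel iteration itself, not of the phase vocoder inference: each unknown plays the role of one band α, and the other unknowns play the role of the bands in ¬α whose current prediction is subtracted before refitting. The function name and the example system are illustrative.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by cyclically re-solving each equation for its own unknown."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # contribution of all other unknowns, using their most recent values
            rest = sum(A[i][j] * x[j] for j in range(n) if j != i)
            # fit unknown i to the remaining residual
            x[i] = (b[i] - rest) / A[i][i]
    return x
```

For a diagonally dominant system such as 4x + y = 1, 2x + 3y = 2 the iteration converges to x = 0.1, y = 0.6.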

Restoration

• Piano

- Signal with missing samples (37)

- Reconstruction, 7.68 dB improvement

- Original

• Trumpet

- Signal with missing samples (37)

- Reconstruction, 7.10 dB improvement

- Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each ν = 1, ..., W, an indicator chain $r_0^\nu \cdots r_k^\nu \cdots r_K^\nu$ and a state chain $\theta_0^\nu \cdots \theta_k^\nu \cdots \theta_K^\nu$; observations $y_k, \ldots, y_K$.]

• Generalises source-filter models

Harmonic model with changepoints

$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1})$, $r_k \in \{0, 1\}$

$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$

$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \mathrm{blkdiag}\left(G_\omega, G_\omega^2, \ldots, G_\omega^H\right)^N$, $G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$

Damping factor $0 < \rho_k < 1$, frame length N, and damped sinusoidal basis matrix C of size N × 2H.

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

- Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

- Intuition: when a changepoint occurs, the state vector θ is reinitialized

⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustaffson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \ldots, \theta_5$; observations $y_1, \ldots, y_5$.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r_{1:k}^{(i)}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r_{1:k}^{(i)})$ can be discarded without destroying optimality
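The counting argument behind the polynomial growth can be illustrated by brute force. This is a minimal sketch, assuming that a changepoint resets θ independently of the past and that $r_1 = 1$ as in the example above, so that the Gaussian for $\theta_k$ given an indicator path depends only on the time of the most recent changepoint. The function name is hypothetical.

```python
from itertools import product

def count_components(k):
    """Return (number of indicator paths r_{1:k}, number of distinct Gaussian kernels at time k)."""
    paths = 0
    last_changepoints = set()
    for tail in product((0, 1), repeat=k - 1):
        r = (1,) + tail                      # r_1 = 1: the state is initialised at the first frame
        paths += 1
        # theta_k given r_{1:k} depends only on the data since the most recent changepoint
        last_changepoints.add(max(i for i, r_i in enumerate(r, start=1) if r_i == 1))
    return paths, len(last_changepoints)
```

For k = 10 there are 512 indicator paths but only 10 distinct kernels, one per possible position of the most recent changepoint.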

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains $r_0, r_1, \ldots, r_T$; $m_0, m_1, \ldots, m_T$; $s_0, s_1, \ldots, s_T$; observations $y_1, \ldots, y_T$.]

Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels over sample index 100 to 1000.]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result over sample index 500 to 3500, with audio example (S).]

Tracking Pitch Variations

• Allow m to change with k

[Figure: single panel over sample index 50 to 500.]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent indicators over ν (frequency) against k, shown above the signal $x_k$.]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". The indicators $r_{1:W,1:K}$ encode a latent "piano roll" (audio examples S1, S2, S3)

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1\ \ldots\ C_i\ \ldots\ C_W \,]$

[Graphical model: indicators $r_1, \ldots, r_W$; coefficients $s_1, \ldots, s_W$; observation x.]

• Generalized Linear Model: the columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
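The $2^W$ growth is easy to see by enumerating every on/off configuration. The sketch below is a simplified scalar illustration under assumptions not made on the slide: the observation x and each basis entry $C_i$ are scalars, Σ and R are scalar variances, and $\pi_{\text{off}} = 1 - \pi_{\text{on}}$. With the active coefficients integrated out, $x \mid r \sim \mathcal{N}(0, R + \sum_i r_i C_i^2 \Sigma)$. All names are hypothetical.

```python
import math
from itertools import product

def normal_logpdf(x, var):
    return -0.5 * (math.log(2 * math.pi * var) + x * x / var)

def configuration_posterior(x, basis, prior_var, noise_var, pi_on):
    """Posterior p(r | x) over all 2^W on/off configurations for a scalar observation x."""
    log_w = {}
    for r in product((0, 1), repeat=len(basis)):
        # marginal variance of x once the active coefficients are integrated out
        var = noise_var + sum(ri * ci * ci * prior_var for ri, ci in zip(r, basis))
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        log_w[r] = log_prior + normal_logpdf(x, var)
    top = max(log_w.values())                 # subtract the maximum for numerical stability
    norm = sum(math.exp(lw - top) for lw in log_w.values())
    return {r: math.exp(lw - top) / norm for r, lw in log_w.items()}
```

The returned dictionary has one entry per configuration, so its size doubles with every added basis vector, which is why exhaustive enumeration is only feasible for small W.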

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$ (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (Observation)

[Graphical model: for each ν = 1, ..., W, an indicator chain $r_0^\nu \cdots r_k^\nu \cdots r_K^\nu$ and a state chain $s_0^\nu \cdots s_k^\nu \cdots s_K^\nu$; observations $y_k, \ldots, y_K$.]

Synthetic Data

[Figure: synthetic example with panels indexed by ν, x and frequency against k, with audio example (S).]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not numerically stable: computations with large matrices

- Need advanced techniques from linear algebra

- Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain: state space, dynamical models

• Transform domain: Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around a time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: two spectrograms with axes t/sec and f/Hz (up to 10000), labelled (Speech) and (Piano), each with an audio example.]

• The spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of the STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

- however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

- however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$

One channel source separation: Gaussian source model

[Graphical model: for k = 1, ..., K, variances $v_{k,1}, \ldots, v_{k,N}$, sources $s_{k,1}, \ldots, s_{k,N}$ and observation $x_k$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• A straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
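For a single time-frequency atom the posterior can be computed in a few lines. This is a minimal sketch with an illustrative function name; it takes the prior variances as known, whereas in the talk they are themselves latent and inferred.

```python
def posterior_sources(x, v):
    """Posterior mean and variance of each source coefficient given x = sum of sources.

    x: observed coefficient at one time-frequency atom
    v: list of prior variances v_n, one per source
    """
    total = sum(v)
    kappa = [v_n / total for v_n in v]                       # responsibilities
    means = [k_n * x for k_n in kappa]                       # each source gets its share of x
    variances = [v_n * (1.0 - k_n) for v_n, k_n in zip(v, kappa)]
    return kappa, means, variances
```

The posterior means always add up to the observation, so the decomposition behaves like a soft mask whose weights are set by the relative prior variances.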

One channel source separation: Poisson source model

[Graphical model: for k = 1, ..., K, intensities $v_{k,1}, \ldots, v_{k,N}$, sources $s_{k,1}, \ldots, s_{k,N}$ and observation $x_k$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu n} \times e_{\tau n}$ (Template × Excitation)
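The link to NMF can be spelled out numerically. The sketch below relies on two standard facts about Poisson variables rather than on anything specific to the talk: a sum of independent Poisson variables is Poisson with the summed intensity, and given the sum the individual counts are multinomial with probabilities proportional to the intensities. The function names and the list-of-lists layout (templates indexed by [ν][n], excitations by [τ][n]) are assumptions made for illustration.

```python
def nmf_intensities(templates, excitations):
    """v[nu][tau][n] = t[nu][n] * e[tau][n] for every time-frequency atom and source."""
    return [[[t_n * e_n for t_n, e_n in zip(t_row, e_row)]
             for e_row in excitations]
            for t_row in templates]

def expected_spectrogram(templates, excitations):
    """E[x] at each atom: the summed intensity, i.e. the NMF product of templates and excitations."""
    return [[sum(atom) for atom in row] for row in nmf_intensities(templates, excitations)]

def expected_sources(x, v):
    """E[s_n | x] = x * v_n / sum(v), the mean of the multinomial split of the count x."""
    total = sum(v)
    return [x * v_n / total for v_n in v]
```

The expected spectrogram is exactly the matrix product of the template matrix with the transposed excitation matrix, which is the NMF reconstruction.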

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints: typical draws

[Figure: three panels of log v_k against k (1 to 1000), with panel titles "a = 10 az = 10", "a = 10 az = 4" and "a = 10 az = 40".]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

[Graphical model: chain z_1 · · · v_{k−1}, z_k, v_k, z_{k+1} · · ·, with edges labelled a_z, a, a_z.]
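The potentials can be read off from the two conditionals in which z_k appears. The following worked step is a sketch that uses only the definition of IG(x; a, z) and the chain conditionals given in the talk:

```latex
\begin{align*}
\mathcal{IG}(v_k;\, a,\, z_k/a)
  &= \exp\big((a+1)\log v_k^{-1} - a\, z_k^{-1} v_k^{-1}
      + a \log (a\, z_k^{-1}) - \log\Gamma(a)\big) \\
\mathcal{IG}(z_k;\, a_z,\, v_{k-1}/a_z)
  &= \exp\big((a_z+1)\log z_k^{-1} - a_z\, v_{k-1}^{-1} z_k^{-1}
      + a_z \log (a_z\, v_{k-1}^{-1}) - \log\Gamma(a_z)\big)
\end{align*}
% Terms that depend on z_k alone:
%   a \log z_k^{-1} + (a_z + 1) \log z_k^{-1} = (a_z + a + 1) \log z_k^{-1}
% Term that couples z_k and v_k:
%   -a\, z_k^{-1} v_k^{-1}
```

Collecting the terms in z_k alone gives the singleton potential, and the cross term gives the pairwise potential between z_k and v_k.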

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{i,j} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, · · ·, with likelihood factors p(y_1 | v_1), p(y_2 | v_2).]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, · · ·, with likelihood factors p(y_1 | v_1), p(y_2 | v_2).]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
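Because of conjugacy, both Gibbs conditionals are inverse Gamma and can be sampled exactly. The sketch below assumes NumPy and, as an illustrative choice not fixed by the slides, a zero-mean Gaussian observation model y_k ∼ N(0, v_k); z_1 is held fixed and all function names are hypothetical. Under these assumptions, multiplying the factors that contain z_k gives shape a + a_z and rate a_z/v_{k−1} + a/v_k, and for v_k shape a + a_z + 1/2 and rate y_k²/2 + a/z_k + a_z/z_{k+1}.

```python
import numpy as np

def sample_ig(rng, shape, rate):
    """Draw x with density proportional to x**-(shape + 1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def gibbs_sweep(y, v, z, a, a_z, rng):
    """One in-place Gibbs sweep over z_2..z_K and v_1..v_K.

    Assumes y_k ~ N(0, v_k) (an illustrative likelihood) and keeps z_1 fixed.
    """
    K = len(y)
    for k in range(K):
        if k > 0:
            # z_k | v_{k-1}, v_k
            z[k] = sample_ig(rng, a + a_z, a_z / v[k - 1] + a / v[k])
        # v_k | z_k, z_{k+1}, y_k; the last v has no right neighbour
        shape = a + 0.5
        rate = 0.5 * y[k] ** 2 + a / z[k]
        if k + 1 < K:
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = sample_ig(rng, shape, rate)
    return v, z
```

Averaging `v` over many sweeps after a burn-in period gives a Monte Carlo estimate of the posterior mean variance at each k.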

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: time-frequency panels of the noisy signal ("X") and the original ("Xorg"), followed by four reconstructions with panel titles "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997".]

Denoising - Music

[Figure: time-frequency panels with titles "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208".]

"Tristram" (Matt Uelmen) + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: panels Xorg, Shor and Sver, frequency bin (ν) against time (τ).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai" (King Crimson) + Drums "Territory" (Sepultura) = Mix

                 s1                      s2
                 SDR    SIR    SAR      SDR    SIR    SAR
VB               -474   -328   567      -158   1546   -137
Gibbs            -45    -262   457      105    1246   161
GibbsEM          -423   -242   482      134    1313   185
Pre-trained      -404   -315   813      356    1144   464
Oracle           614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet" (Anglagard) + "Moby Dick" (Led Zeppelin) = Mix

                 s1                      s2
                 SDR    SIR    SAR      SDR    SIR    SAR
VB               -78    -622   453      -235   184    -225
Gibbs            -846   -753   693      -404   1459   -383
GibbsEM          -774   -619   462      -114   1662   -097
Pre-trained      -64    -539   695      38     1639   414
Oracle           121    329    1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver, frequency bin (ν) against time (τ), with two rows of sound examples labelled (Original), (Hor), (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (Hz), from 0 to 4000 Hz.]

Chord Detection

[Figure: top panel, "MDCT of piano chord 41485156", frequency (Hz, 0 to 4000) against time τ (s). Bottom panel, log Σ_ν v_{ν,j,τ} for each MIDI note j (40 to 65) against time τ (s).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)\big)$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}\big(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m\big), \qquad k = 1, \dots, K, \quad m = 1, \dots, M$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrogram panels, frequency (Hz, 0 to 10000) against time, labelled (Speech), (Piano), (Guitar) and (Mix), each with a sound example.]

Reconstructions

[Figure: three spectrogram panels of the reconstructed sources, frequency (Hz, 0 to 10000) against time, labelled (Speech), (Piano) and (Guitar), each with a sound example.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets / detections / spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
  - …

• Online (real-time) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM
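For the online case, inference in an HMM reduces to the standard forward (filtering) recursion. The sketch below is a generic version assuming NumPy; the function name and the array conventions are illustrative and do not reproduce the specific tempo-tracking model of the talk.

```python
import numpy as np

def hmm_filter(pi0, A, lik):
    """Forward filtering for a discrete HMM.

    pi0 : (S,) prior over the state at the first step
    A   : (S, S) transition matrix, A[i, j] = p(x_{k+1} = j | x_k = i)
    lik : (K, S) observation likelihoods, lik[k, s] = p(y_k | x_k = s)
    Returns alpha with alpha[k, s] = p(x_k = s | y_{1:k}).
    """
    K, S = lik.shape
    alpha = np.empty((K, S))
    predicted = np.asarray(pi0, dtype=float)
    for k in range(K):
        # Update: weight the prediction by the likelihood and normalise
        unnormalised = predicted * lik[k]
        alpha[k] = unnormalised / unnormalised.sum()
        # Predict: propagate the filtered density through the transition model
        predicted = alpha[k] @ A
    return alpha
```

The offline (batch) case adds a backward pass over the same quantities to obtain smoothed marginals.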

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1, …, 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; the bar lengths for 3/4 time and 4/4 time are marked.]

• Each dot denotes a state x = (m, n) (score position, tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over bar position and tempo level: p(x_1), p(x_2), p(x_3), p(x_4), then p(x_5 | y_{1:5}) with y_5 = 0, then p(x_10).]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for a triplet rhythm and for a duplet rhythm.]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, …, n_3; θ_0, …, θ_3; m_0, …, m_3; r_0, …, r_3; intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3.]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), in quarter notes per minute; p(r_k | y_{1:k}) over Triplets and Duplets.]

Smoothing

[Figure: four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), in quarter notes per minute; p(r_k | y_{1:K}) over Triplets and Duplets.]

Time Signature

[Figure: four panels: observed data (sample value against time in s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), in quarter notes per minute; p(θ_k | z_{1:K}) over 4/4 and 3/4. The lower three panels are plotted against frame index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: a musical score and the audio signal x_t against time t.]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, …, t_K; r_1, …, r_K; λ_1, …, λ_K; and, for each ν = 1, …, W, variances v_{ν,1}, …, v_{ν,K} and coefficients s_{ν,1}, …, s_{ν,K}. The score positions r_k are numbered 1 to 8.]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (Hz), from 0 to 4000 Hz.]

Score-Performance matching

[Figure: top panel, "Spectrogram Data", frequency (Hz, 0 to 4000) against time (s, 0 to 14). Bottom panel, "MIDI Data", MIDI note (55 to 85) against score position.]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure: three panels against time (s, 1 to 4): log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, computed as Σ_i w_τ^{(i)} λ_τ^{(i)}; and the MDCT of the audio (source: Daniel-Ben Pienaar), frequency (Hz, 0 to 4000).]

Summary

• "Time Domain": Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain": Gamma Fields
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, …)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-frequency energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical score guided source separation
  - Prior structures for other observation models: NMF


Generative Models for Music

[Diagram with nodes: Score, Expression, Piano-Roll, Signal.]

Hierarchical Modeling of Music

[Graphical model over time slices 1, 2, …, t with variables M, v_t, k_t, h_t, m_t, g_{j,t}, r_{j,t}, n_{j,t}, x_{j,t}, y_{j,t} and y_t.]

Modelling levels

• Physical, acoustical

• Time domain: state space dynamical models

• Transform domain: Fourier representations, Generalised Linear model

• Feature based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain: state space dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision; computationally quite heavy

• Transform domain: Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, …)
  - Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

$$\text{(real)} \qquad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)} \qquad y_n = \sum_{k=1}^{p} c_k \big(e^{-\gamma_k + j\omega_k}\big)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
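The matrix form y = F(γ, ω) c makes the model linear in the complex amplitudes c once the dampings and frequencies are fixed. A minimal sketch, assuming NumPy; the function name `sinusoid_basis` is illustrative:

```python
import numpy as np

def sinusoid_basis(gamma, omega, n_samples):
    """Build F with F[n, k] = (exp(-gamma_k + 1j * omega_k)) ** n."""
    n = np.arange(n_samples)[:, None]                         # sample index, as a column
    poles = np.exp(-np.asarray(gamma) + 1j * np.asarray(omega))
    return poles[None, :] ** n                                # shape (n_samples, p)
```

Given F, a signal is synthesised as `F @ c`, and for known γ and ω the amplitudes can be recovered by linear least squares, for example with `np.linalg.lstsq(F, y, rcond=None)`.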

State space Parametrisation

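$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \cdots & 1 \end{pmatrix}}_{C} x_n$$

[Graphical model: state chain x_0, x_1, …, x_{k−1}, x_k, …, x_K with observations y_1, …, y_{k−1}, y_k, …, y_K.]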
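The state space form produces the same signal as the sum of damped complex sinusoids, one sample at a time. The sketch below assumes NumPy and a noise-free recursion; the function name is illustrative.

```python
import numpy as np

def simulate_state_space(gamma, omega, c, n_samples):
    """Run x_{n+1} = A x_n, y_n = C x_n with diagonal A and C = (1, ..., 1)."""
    A = np.diag(np.exp(-np.asarray(gamma) + 1j * np.asarray(omega)))
    C = np.ones(len(c))
    x = np.asarray(c, dtype=complex)      # x_0 holds the complex amplitudes
    y = np.empty(n_samples, dtype=complex)
    for n in range(n_samples):
        y[n] = C @ x                      # observation: sum of the state components
        x = A @ x                         # each component rotates and decays
    return y
```

Each diagonal entry of A rotates its state component by ω_k and shrinks it by exp(−γ_k) per sample, which is why y_n equals Σ_k c_k (e^{−γ_k + jω_k})^n.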

State Space Parametrisation


Audio Restoration / Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, …

[Figure: audio signal over samples 0 to 500.]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

H ≡ (parameters, hidden states)

[Graphical model: H is the parent of x_{¬κ} (missing) and x_κ (observed).]

[Figure: audio signal over samples 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band ν = 0, …, W − 1, parameters A_ν, Q_ν and a state chain s^ν_0 · · · s^ν_k · · · s^ν_{K−1}; observations x_0, …, x_k, …, x_{K−1}.]

$$s^\nu_k \sim \mathcal{N}\big(s^\nu_k;\, A_\nu s^\nu_{k-1},\, Q_\nu\big), \qquad A_\nu \sim \mathcal{N}\left(A_\nu;\, \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$

Inference: Structured Variational Bayes

[Graphical model of the approximating distribution: for each α ∈ C, factors q(A^α), q(Q^α) and a chain ∏_k q(s^α_k | s^α_{k−1}); and q(x_k) for the observations.]

• Intuitive algorithm
  - Subtract from the observed signal x the prediction of the frequency bands in ¬α
  - Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
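The Gauss-Seidel analogy can be made concrete on a small linear system: each unknown is refitted to the residual left by the current values of all the others. This is a generic sketch assuming NumPy, not the phase vocoder update itself; the function name and iteration count are illustrative.

```python
import numpy as np

def gauss_seidel(A, b, n_iter=100):
    """Solve A x = b by cyclically refitting one unknown at a time."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(n_iter):
        for i in range(len(b)):
            # Residual of equation i once the contribution of the other unknowns is removed
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            # Refit unknown i to that residual, using the newest values of the others
            x[i] = residual / A[i, i]
    return x
```

Convergence is guaranteed for, among others, strictly diagonally dominant or symmetric positive definite matrices.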

Restoration

• Piano
  - Signal with missing samples (37)
  - Reconstruction, 768 dB improvement
  - Original

• Trumpet
  - Signal with missing samples (37)
  - Reconstruction, 710 dB improvement
  - Original

[Sound examples for each item above.]

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each ν = 1, …, W, an indicator chain r^ν_0 · · · r^ν_k · · · r^ν_K and a state chain θ^ν_0 · · · θ^ν_k · · · θ^ν_K; observations y_k, …, y_K.]

• Generalises source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \mathrm{blkdiag}\big(G_\omega, G_\omega^2, \dots, G_\omega^H\big)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H.

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time and space
  - Intuition: when a changepoint occurs, the state vector θ is reinitialized
  ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0, states θ_0, …, θ_5 and observations y_1, …, y_5.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, …, r_T; m_0, m_1, …, m_T; s_0, s_1, …, s_T; observations y_1, …, y_T.]

Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels against sample index (100 to 1000).]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: result of exact inference against sample index (500 to 3500), with sound example (S).]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track against sample index (50 to 500).]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, i.e. Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent processes ν (frequency) against time index k, above the signal x_k.]

• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". The indicators r_{1:W,1:K} encode a latent "piano roll". Sound examples: (S1), (S2), (S3)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W},\, R), \qquad C \equiv [\, C_1 \;\cdots\; C_i \;\cdots\; C_W \,]$$

[Graphical model: indicators r_1, …, r_W; coefficients s_1, …, s_W; observation x.]

• Generalized Linear Model: the columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, …)
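For small W the mixture over all 2^W indicator configurations can be enumerated explicitly, which also shows why the cost explodes as W grows. The sketch below assumes NumPy, scalar coefficients s_i with a common prior variance `sigma2`, isotropic noise R = `noise_var` · I and a common prior probability `p_on`; all of these, and the function names, are illustrative assumptions.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def posterior_over_indicators(x, C, p_on, sigma2, noise_var):
    """Enumerate all 2**W on/off configurations r and return p(r | x)."""
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for idx, r in enumerate(configs):
        r = np.array(r)
        Cr = C[:, r == 1]                               # basis vectors that are switched on
        # With s integrated out, x | r is zero-mean Gaussian with this covariance
        cov = sigma2 * Cr @ Cr.T + noise_var * np.eye(len(x))
        log_prior = np.sum(r * np.log(p_on) + (1 - r) * np.log(1 - p_on))
        logp[idx] = log_prior + log_gauss_zero_mean(x, cov)
    post = np.exp(logp - logp.max())                    # normalise in a numerically stable way
    return configs, post / post.sum()
```

The loop runs 2^W times, so this direct approach is only practical for a handful of basis vectors.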

Factorial Switching State space model

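$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})\big) \qquad \text{(Changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)\big) \qquad \text{(Latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}\big(y_k;\, C_k\,\theta_{k,1:W},\, R\big) \qquad \text{(Observation)}$$

[Graphical model: for each ν = 1, …, W, an indicator chain r^ν_0 · · · r^ν_k · · · r^ν_K and a state chain s^ν_0 · · · s^ν_k · · · s^ν_K; observations y_k, …, y_K.]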

Synthetic Data

[Figure: synthetic data example, panels indexed by ν and frequency against time index k, with sound example (S).]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical, acoustical

• Time domain: state space dynamical models

• Transform domain: Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions φ_k(t) centered around the time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: two spectrogram panels, frequency (Hz, 0 to 10000) against time, labelled (Speech) and (Piano), each with a sound example.]

• The spectrogram displays log |s_k| or |s_k|^2 (of the STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$

One channel source separation: Gaussian source model

[Graphical model: for each k = 1, …, K, the variances v_{k,1}, …, v_{k,N} generate the sources s_{k,1}, …, s_{k,N}, which sum to the observation x_k.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
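The posterior above is cheap to evaluate once the variances are known. A minimal sketch for a single time-frequency atom, assuming NumPy; the function name is illustrative.

```python
import numpy as np

def source_posterior(x, v):
    """Posterior mean and variance of each source coefficient at one atom.

    x : observed coefficient (sum of the sources)
    v : (N,) prior variances of the N sources at this atom
    """
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum()          # responsibilities
    mean = kappa * x             # each source gets a fraction kappa_n of x
    var = v * (1.0 - kappa)      # remaining uncertainty about each source
    return kappa, mean, var
```

For example, with x = 2 and variances (1, 3) the responsibilities are (0.25, 0.75), so the posterior means are (0.5, 1.5) and both posterior variances equal 0.75. The posterior means always add up to the observation.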

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints: typical draws

[Figure: three typical draws of log v_k against k (100 … 1000), with panel titles a = 10 az = 10, a = 10 az = 4 and a = 10 az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big)$ (Singletons)

[Chain diagram: z_1 · · · v_{k−1}, z_k, v_k, z_{k+1} · · ·, with edges labelled a_z, a, a_z]
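
A worked step showing where these potentials come from, assuming the chain is defined with scale arguments z_k/a and v_{k-1}/a_z in the Inverse Gamma convention used in these slides. Taking logarithms of the two factors that contain z_k:

```latex
\log \mathcal{IG}(v_k; a, z_k/a)
  = (a+1)\log v_k^{-1} - a\, z_k^{-1} v_k^{-1} + a \log z_k^{-1} + a\log a - \log\Gamma(a)
\\[4pt]
\log \mathcal{IG}(z_k; a_z, v_{k-1}/a_z)
  = (a_z+1)\log z_k^{-1} - a_z\, v_{k-1}^{-1} z_k^{-1} + a_z \log v_{k-1}^{-1} + a_z\log a_z - \log\Gamma(a_z)
```

Collecting the terms in $\log z_k^{-1}$ gives the exponent $a_z + a + 1$ of the singleton, and the cross term $-a\, z_k^{-1} v_k^{-1}$ is the logarithm of the pairwise potential between z_k and v_k.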

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{i,j} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\big((\sum_j a_{ij} + 1)\log \xi_i^{-1}\big)$ (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

  – Markov Chain Monte Carlo, Gibbs sampler

  – Sequential Monte Carlo, Particle Filtering

• Deterministic

  – Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2 · · ·, with observation factors p(y_1|v_1) and p(y_2|v_2) attached to v_1 and v_2]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2 · · ·, with observation factors p(y_1|v_1) and p(y_2|v_2) attached to v_1 and v_2]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
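
The Gibbs updates for a Gamma chain can be sketched as follows. This is an illustration under stated assumptions, not the implementation behind the reported results: the observation model is taken to be a directly observed Gaussian coefficient s_k with variance v_k, the chain uses scale arguments z_k/a and v_k/a_z, and z_1 is given a hypothetical prior IG(z_1; a_z, b_z). Under these assumptions every full conditional is again Inverse Gamma, which is where conjugacy helps.

```python
import random

def gibbs_sweep(s, v, z, a, a_z, b_z, rng):
    """One in-place sweep: all z_k given v, then all v_k given z and s."""
    K = len(s)
    for k in range(K):
        # z_k | v_{k-1}, v_k is IG with shape a_z + a and rate from both neighbours
        prev_rate = 1.0 / b_z if k == 0 else a_z / v[k - 1]
        rate = prev_rate + a / v[k]
        z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)
    for k in range(K):
        # v_k | z_k, z_{k+1}, s_k: the Gaussian likelihood adds 1/2 to the shape
        shape = a + 0.5
        rate = 0.5 * s[k] ** 2 + a / z[k]
        if k + 1 < K:                      # the last v has no right neighbour
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)

def posterior_mean_variances(s, a=2.0, a_z=2.0, b_z=1.0,
                             sweeps=300, burn_in=100, seed=0):
    """Monte Carlo estimate of E[v_k | s_1..s_K] from the Gibbs samples."""
    rng = random.Random(seed)
    K = len(s)
    v, z = [1.0] * K, [1.0] * K
    acc = [0.0] * K
    for it in range(sweeps):
        gibbs_sweep(s, v, z, a, a_z, b_z, rng)
        if it >= burn_in:
            for k in range(K):
                acc[k] += v[k]
    return [x / (sweeps - burn_in) for x in acc]
```

Sampling all z and then all v is a valid two-block scheme here because, given the v variables, the z variables are conditionally independent of each other, and vice versa.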

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels titled Noisy (X), Original (Xorg), Xh SNR1998, Xv SNR2079, Xb SNR1968 and Xg SNR1997, each with a colour scale from −14 to 0]

Denoising - Music

[Figure: five spectrogram panels titled Original, Noisy, PF SNR853, Gibbs SNR866 and VB SNR208, each with a colour scale from −18 to 0]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal, tie across time (harmonic continuity)

• Source 2: Vertical, tie across frequency (transients, percussive sounds)

[Figure: spectrogram panels Xorg, Shor and Sver, with axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                      s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB            -474   -328    567    -158   1546   -137
Gibbs          -45   -262    457     105   1246    161
GibbsEM       -423   -242    482     134   1313    185
Pre-trained   -404   -315    813     356   1144    464
Oracle         614   1716    658    1266   1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                      s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB             -78   -622    453    -235    184   -225
Gibbs         -846   -753    693    -404   1459   -383
GibbsEM       -774   -619    462    -114   1662   -097
Pre-trained    -64   -539    695      38   1639    414
Oracle         121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrogram panels Xorg, Shor and Sver, with axes Time (τ) and Frequency Bin (ν); two sets of audio examples labelled (Original), (Hor) and (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν in Hz, from 0 to 4000 Hz]

Chord Detection

[Figure: two panels against time τ in s. First panel: MDCT of piano chord 41485156, frequency in Hz (0 to 4000). Second panel: $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

[Graphical model: λ_1 … λ_N, then v_{k,1} … v_{k,N}, s_{k,1} … s_{k,N} and x_{k,1} … x_{k,M} inside a plate over k = 1 … K, with mixing vectors a_1 … a_M and noise variances r_1 … r_M]

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: four spectrograms, frequency f in Hz against time t in sec, titled (Speech), (Piano), (Guitar) and (Mix), with accompanying audio examples]

Reconstructions

[Figure: three spectrograms, frequency f in Hz against time t in sec, titled (Speech), (Piano) and (Guitar), with accompanying audio examples]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, from 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets, detections, spectral features)

  – Determine the position of a performance on a score

  – Determine where a human listener would clap her hand

  – Create a quantized, human-readable score

  – …

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[State diagram: a grid of dots with score position m_k = 1 … 8 on the horizontal axis and tempo level n_k = 1 … 3 on the vertical axis; bar lines are marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels showing p(x_2 | x_1) over Bar Position (1 … 8) and Tempo Level (1 … 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}) over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10) over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity µ_k against m_k (0 … 1000) for a Triplet Rhythm and a Duplet Rhythm]
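
The bar pointer idea can be sketched as forward filtering in a small discrete HMM. This is a toy illustration under assumptions that are not from the slides: the position advances by the tempo level each frame (modulo the bar length), the tempo follows a lazy random walk with a hypothetical stay probability `p_stay`, and the Poisson intensity depends only on the position through a hypothetical table `mu`.

```python
import math

def bar_pointer_filter(y, M, N, mu, p_stay=0.8):
    """Return the filtered marginals p(m_k, n_k | y_1..y_k), one dict per frame."""
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    alpha = {x: 1.0 / len(states) for x in states}      # uniform prior on x_0
    out = []
    for yk in y:
        pred = dict.fromkeys(states, 0.0)
        for (m, n), w in alpha.items():
            # tempo stays or moves one level; moves outside 1..N are reflected
            moves = {n: p_stay}
            for n2 in (n - 1, n + 1):
                n2 = min(max(n2, 1), N)
                moves[n2] = moves.get(n2, 0.0) + (1.0 - p_stay) / 2.0
            for n2, p in moves.items():
                pred[((m + n) % M, n2)] += w * p        # position advances by n
        for (m, n) in states:
            lam = mu[m]                                 # Poisson intensity at m
            pred[(m, n)] *= math.exp(-lam) * lam ** yk / math.factorial(yk)
        total = sum(pred.values())
        alpha = {x: w / total for x, w in pred.items()}
        out.append(alpha)
    return out

# Onsets expected on every fourth position of an 8-position bar.
mu = [3.0 if m % 4 == 0 else 0.1 for m in range(8)]
filtered = bar_pointer_filter([0, 3] * 8, M=8, N=3, mu=mu)
```

Because the state is the pair (position, tempo), a single forward pass tracks both quantities jointly; smoothing would add a backward pass over the same transition structure.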

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0 … n_3, θ_0 … θ_3, m_0 … m_3 and r_0 … r_3, with intensities λ_1 … λ_3 and observations y_1 … y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 … 450): Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against frame index k (0 … 450): Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: four panels: Observed Data (sample value against time in s, 0 to 12); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4, the last three against frame index k (100 … 500)]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: audio signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1 … t_K, r_1 … r_K and λ_1 … λ_K, with v_{ν,1} … v_{ν,K} and s_{ν,1} … s_{ν,K} inside a plate over ν = 1 … W; the score position r_k takes values 1 … 8]

$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν in Hz, from 0 to 4000 Hz]

Score-Performance matching

[Figure: two panels. Spectrogram Data: frequency in Hz (0 to 4000) against time in s (0 to 14). MIDI Data: MIDI note (55 to 85) against score position (50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against time in s (1 to 4): log p(r_τ | s_τ) over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as log λ; MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz (0 to 4000)]

Summary

• “Time Domain” – Switching State Space Models

  – State space modeling

  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  – Analysis down to sample precision (if required)

  – Computationally quite heavy

• “Transform Domain” – Gamma Fields

  – Models on (orthogonal) transform coefficients, energy compaction

  – Practical, can make use of fast transforms (FFT, MDCT, …)

  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  – Time-Frequency Energy distributions

• Ongoing Work

  – Comparison of inference methods (VB, MCMC, SMC)

  – Learning

  – Applications

    ∗ Chord detection, Polyphonic transcription

    ∗ Musical Score guided source separation

  – Prior structures for other observation models, NMF

Hierarchical Modeling of Music

[Graphical model: chains over time steps 1, 2, …, t for the variables v, k, h, m, g_j, r_j, n_j, x_j, y_j and y, with a plate labelled M]

Modelling levels

• Physical - acoustical

• Time domain – state space dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space dynamical models

  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

  – Flexible, physically realistic

  – Analysis down to sample precision; computationally quite heavy

• Transform domain – Fourier representations, Generalised Linear model

  – Models on (orthogonal) transform coefficients, energy compaction

  – Practical, can make use of fast transforms (FFT, MDCT, …)

  – Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$

(complex) $y_n = \sum_{k=1}^{p} c_k \big(e^{-\gamma_k + j\omega_k}\big)^n$

$y = F(\gamma_{1:p}, \omega_{1:p})\, c$
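
A minimal sketch relating the two forms of the damped sinusoidal model. It assumes the complex amplitudes are chosen as c_k = α_k e^{jφ_k}, in which case the real form is the real part of the complex form; the function names are hypothetical.

```python
import cmath
import math

def real_form(n, alpha, gamma, omega, phi):
    """y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)."""
    return sum(a * math.exp(-g * n) * math.cos(w * n + p)
               for a, g, w, p in zip(alpha, gamma, omega, phi))

def complex_form(n, c, gamma, omega):
    """y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n."""
    return sum(ck * cmath.exp(complex(-g, w)) ** n
               for ck, g, w in zip(c, gamma, omega))

alpha, gamma = [1.0, 0.5], [0.01, 0.02]
omega, phi = [0.3, 0.9], [0.0, 1.0]
c = [a * cmath.exp(1j * p) for a, p in zip(alpha, phi)]
print(real_form(5, alpha, gamma, omega, phi), complex_form(5, c, gamma, omega).real)
```

For fixed dampings and frequencies the signal is linear in the amplitudes c, which is what the matrix form y = F(γ, ω) c expresses.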

State space Parametrisation

$x_{n+1} = \underbrace{\mathrm{diag}\big(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p}\big)}_{A}\, x_n, \qquad x_0 = (c_1, c_2, \dots, c_p)^\top$

$y_n = \underbrace{(1 \;\; 1 \;\; \dots \;\; 1)}_{C}\, x_n$

[Graphical model: state chain x_0, x_1, …, x_{k−1}, x_k, …, x_K with observations y_1, …, y_{k−1}, y_k, …, y_K]
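
The recursion can be sketched directly, assuming a noise-free model with diagonal A and C = (1 … 1) as written above; the function name `simulate` is hypothetical. Each state component is simply multiplied by its own complex pole at every step.

```python
import cmath

def simulate(c, gamma, omega, n_steps):
    """Run x_{n+1} = A x_n, y_n = C x_n from x_0 = c and return y_0..y_{n_steps-1}."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gamma, omega)]  # diagonal of A
    x = list(c)                                    # x_0
    ys = []
    for _ in range(n_steps):
        ys.append(sum(x))                          # y_n = C x_n
        x = [a * xi for a, xi in zip(poles, x)]    # x_{n+1} = A x_n
    return ys

ys = simulate([1.0 + 0j, 0.5j], gamma=[0.01, 0.02], omega=[0.3, 0.9], n_steps=100)
```

Unrolling the recursion gives y_n = Σ_k c_k (e^{-γ_k + jω_k})^n, so the state space form and the complex sinusoidal form describe the same signal.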

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis

[Figure: signal plotted over sample indices 0 to 500]

Audio Interpolation

$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$

$H \equiv$ (parameters, hidden states)

[Graphical model: H with children x_{¬κ} (Missing) and x_κ (Observed)]

[Figure: signal plotted over sample indices 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: parameters A_ν and Q_ν, state chain s^ν_0 · · · s^ν_k · · · s^ν_{K−1} inside a plate over ν = 0 … W − 1, and observations x_0 … x_k … x_{K−1}]

$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(\begin{pmatrix}\cos(\omega_\nu) & -\sin(\omega_\nu)\\ \sin(\omega_\nu) & \cos(\omega_\nu)\end{pmatrix}, \Psi\right)$

Inference: Structured Variational Bayes

[Graphical model of the approximation: factors q(A_α) and q(Q_α), a chain · · · s^α_{k−1}, s^α_k, s^α_{k+1} · · · with factor $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ inside a plate over α ∈ C, and q(x_k) for each x_k]

• Intuitive algorithm

  – Subtract from the observed signal x the prediction of the frequency bands in ¬α

  – Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
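
The Gauss-Seidel connection can be illustrated on a generic linear system. This sketch is not the phase vocoder inference itself; it only shows the same pattern of removing the current contribution of all other components and refitting one component to the residual. The function name and the example system are hypothetical.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by cyclic single-coordinate updates (A as a list of rows)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # residual of equation i after subtracting the other components
            r = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = r / A[i][i]          # refit component i to that residual
    return x

x = gauss_seidel([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

Convergence of this plain iteration is guaranteed for symmetric positive definite or strictly diagonally dominant matrices, which the small example satisfies.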

Restoration

• Piano

  – Signal with missing samples (37)

  – Reconstruction, 768 dB improvement

  – Original

• Trumpet

  – Signal with missing samples (37)

  – Reconstruction, 710 dB improvement

  – Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: chains r^ν_0 · · · r^ν_k · · · r^ν_K and θ^ν_0 · · · θ^ν_k · · · θ^ν_K inside a plate over ν = 1 … W, with observations y_k … y_K]

• Generalises Source-filter models

Harmonic model with changepoints

$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$

$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\,\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\,\mathcal{N}(0, S)}_{\text{new}}$

$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \mathrm{blkdiag}\big(G_\omega, G_\omega^2, \dots, G_\omega^H\big)^N, \qquad G_\omega = \rho_k \begin{pmatrix}\cos(\omega) & -\sin(\omega)\\ \sin(\omega) & \cos(\omega)\end{pmatrix}$

Damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

  – Conditional Gaussians are not closed under marginalization

  ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

  ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

  – Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0 … θ_5; observations y_1 … y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {“mute”, “sound”} × M states

[Graphical model: chains r_0, r_1, …, r_T; m_0, m_1, …, m_T; s_0, s_1, …, s_T; observations y_1, …, y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels over sample indices 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result over samples 500 to 3500] (S)

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over k = 50 … 500]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: panels indexed by ν and by frequency against k, together with the signal x_k]

• Each latent changepoint process ν = 1 … W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll” (S1) (S2) (S3)

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$

[Graphical model: indicators r_1 … r_W, coefficients s_1 … s_W, observation x]

• Generalized Linear Model – the columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman; Attias; …)
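
The exponential growth in the number of mixture components can be made concrete by brute-force enumeration. The sketch below assumes a simplified scalar version of the model (scalar observation x, scalar basis weights c_i, common prior variance `sigma2`, and independent on-probabilities `pi_on`); all of these names are hypothetical simplifications, not the slide's general matrix model.

```python
import itertools
import math

def posterior_over_indicators(x, c, sigma2, R, pi_on):
    """Exact p(r_1..r_W | x) by enumerating all 2^W on/off configurations."""
    W = len(c)
    post = {}
    for r in itertools.product((0, 1), repeat=W):
        # marginally x | r ~ N(0, R + sigma2 * sum of c_i^2 over active i)
        var = R + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        log_lik = -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        post[r] = math.exp(log_lik + log_prior)
    Z = sum(post.values())
    return {r: p / Z for r, p in post.items()}

post = posterior_over_indicators(x=5.0, c=[1.0, 1.0, 1.0], sigma2=4.0, R=0.1, pi_on=0.3)
print(len(post))
```

The dictionary has one entry per configuration, so its size doubles with every added component; this is the reason approximate inference is needed when W is large.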

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$ (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (Observation)

[Graphical model: chains r^ν_0 · · · r^ν_k · · · r^ν_K and s^ν_0 · · · s^ν_k · · · s^ν_K inside a plate over ν = 1 … W, with observations y_k … y_K]

Synthetic Data

[Figure: synthetic data panels (x, frequency and ν against k)] (S)

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

  – Need advanced techniques from linear algebra

  – Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: two spectrograms, frequency f in Hz against time t in sec, titled (Speech) and (Piano), with accompanying audio examples]

• Spectrogram displays log |s_k| or |s_k|² (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1} … v_{k,N}, sources s_{k,1} … s_{k,N} and observation x_k, inside a plate over k = 1 … K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$

$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
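
The posterior above is simple enough to compute directly for one time-frequency atom. A minimal sketch, assuming the variances v are known; the function name is hypothetical.

```python
def source_posterior(x, v):
    """Posterior means and variances of N Gaussian sources that sum to x."""
    total = sum(v)
    kappa = [vn / total for vn in v]                         # responsibilities
    means = [k * x for k in kappa]                           # kappa_n * x
    variances = [vn * (1.0 - k) for vn, k in zip(v, kappa)]  # v_n (1 - kappa_n)
    return kappa, means, variances

kappa, means, variances = source_posterior(x=2.0, v=[3.0, 1.0])
print(kappa, means, variances)
```

The posterior means always add up to the observation, so the separation redistributes x among the sources in proportion to their prior variances.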

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints: typical draws

[Figure: three panels of log v_k against k = 1, ..., 1000, for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \quad \text{(Pairwise)}$$

[Graphical model: chain z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with links labelled alternately a_z and a]

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\big(-a_{ij}\,\xi_i^{-1}\xi_j^{-1}\big) \quad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with observation factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}\big(z_k^{(\tau)}\big)\,\psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with observation factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k-1} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}\big(v_{k-1}^{(\tau)}\big)\,\psi_{k,k}\big(v_k^{(\tau)}\big)$$
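Because every factor is inverse Gamma in the variable being updated, the Gibbs conditionals are available in closed form. The sketch below is one possible reading, not the author's implementation: it assumes the chain v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z), a hypothetical zero-mean Gaussian observation y_k ~ N(0, v_k), and interior nodes only (the chain ends have one neighbour fewer). All function names are illustrative.

```python
import random

def sample_inv_gamma(rng, shape, rate):
    """Draw x with density proportional to x^-(shape+1) * exp(-rate / x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def gibbs_v(rng, y_k, z_k, z_next, a, a_z):
    # Collect powers of v_k and coefficients of 1/v_k from the three factors
    # N(y_k; 0, v_k), IG(v_k; a, z_k/a) and IG(z_{k+1}; a_z, v_k/a_z).
    shape = a + a_z + 0.5
    rate = a / z_k + a_z / z_next + 0.5 * y_k * y_k
    return sample_inv_gamma(rng, shape, rate)

def gibbs_z(rng, v_prev, v_k, a, a_z):
    # Factors IG(z_k; a_z, v_{k-1}/a_z) and IG(v_k; a, z_k/a).
    shape = a_z + a
    rate = a_z / v_prev + a / v_k
    return sample_inv_gamma(rng, shape, rate)

rng = random.Random(0)
draws = [gibbs_v(rng, y_k=0.0, z_k=1.0, z_next=1.0, a=10.0, a_z=10.0)
         for _ in range(20000)]
mean_v = sum(draws) / len(draws)   # theory: rate / (shape - 1) = 20 / 19.5
z_draw = gibbs_z(rng, v_prev=1.0, v_k=2.0, a=10.0, a_z=10.0)
```

A full sweep would alternate gibbs_z and gibbs_v along the chain; the VB update has the same form with the neighbouring values replaced by expectations of their reciprocals.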

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six log-magnitude spectrogram panels: noisy input X, original Xorg, and reconstructions labelled Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising – Music

[Figure: five log-magnitude spectrogram panels: Original, Noisy, and reconstructions labelled PF SNR853, Gibbs SNR866, VB SNR208]

"Tristram (Matt Uelmen)" + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal, tied across time (harmonic continuity)

• Source 2: Vertical, tied across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

             s1                    s2
             SDR    SIR    SAR     SDR    SIR    SAR
VB           -474   -328   567     -158   1546   -137
Gibbs        -45    -262   457     105    1246   161
GibbsEM      -423   -242   482     134    1313   185
Pre-trained  -404   -315   813     356    1144   464
Oracle       614    1716   658     1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

             s1                    s2
             SDR    SIR    SAR     SDR    SIR    SAR
VB           -78    -622   453     -235   184    -225
Gibbs        -846   -753   693     -404   1459   -383
GibbsEM      -774   -619   462     -114   1662   -097
Pre-trained  -64    -539   695     38     1639   414
Oracle       121    329    1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν). Two sets of audio examples: (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure: top panel, MDCT of piano chord 41485156, Frequency (Hz) against Time τ (s); bottom panel, log Σ_ν v_{ν,j,τ} for each MIDI note j against Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\big)$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}\big(x_{k,m}; a_m^\top s_{k,1:N}, r_m\big), \quad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m; \cdots)$$

$$r_m \sim \mathcal{IG}(r_m; \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms, f (Hz) against t (sec), of the sources (Speech), (Piano), (Guitar) and of the (Mix), each with an audio example]

Reconstructions

[Figure: spectrograms, f (Hz) against t (sec), of the reconstructed (Speech), (Piano), (Guitar), each with an audio example]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
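Once a task is cast as an HMM, the online (filtering) answer comes from the standard forward recursion. The sketch below is a generic illustration with a hypothetical two-state model and made-up likelihood values; it is not the bar pointer model itself.

```python
def forward_filter(prior, trans, likelihoods):
    """Return p(x_k | y_{1:k}) for each k.

    prior[i]          = p(x_1 = i)
    trans[i][j]       = p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] = p(y_k | x_k = i)
    """
    n = len(prior)
    pred = list(prior)
    filtered = []
    for lik in likelihoods:
        # Update: weight the prediction by the observation likelihood, then normalise
        unnorm = [pred[i] * lik[i] for i in range(n)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        filtered.append(post)
        # Predict: push the filtered distribution through the transition model
        pred = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return filtered

filtered = forward_filter(
    prior=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[[0.8, 0.2], [0.8, 0.2], [0.1, 0.9]],
)
```

Offline (batch) smoothing adds a backward pass over the same quantities.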

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with score position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lengths marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1); axes Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1); axes Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2); axes Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3); axes Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4); axes Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) with y_5 = 0; axes Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10); axes Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000), one panel for a Triplet Rhythm and one for a Duplet Rhythm]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3, r_0 ... r_3, with intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: observed data y_k, and filtering distributions against frame index k: log p(m_k | y_{1:k}) over bar position m_k, log p(n_k | y_{1:k}) over tempo (quarter notes per min), and p(r_k | y_{1:k}) over Triplets/Duplets]

Smoothing

[Figure: observed data y_k, and smoothing distributions against frame index k: log p(m_k | y_{1:K}) over bar position m_k, log p(n_k | y_{1:K}) over tempo (quarter notes per min), and p(r_k | y_{1:K}) over Triplets/Duplets]

Time Signature

[Figure: observed data (sample value against time in s), and smoothing distributions against frame index k: log p(m_k | z_{1:K}) over bar position m_k, log p(n_k | z_{1:K}) over tempo (quarter notes per min), and p(θ_k | z_{1:K}) over 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt and audio signal x_t against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1 ... t_K, r_1 ... r_K, λ_1 ... λ_K and, for each frequency band ν = 1, ..., W, chains v_{ν,1} ... v_{ν,K} and s_{ν,1} ... s_{ν,K}; score positions r_k numbered 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data, Frequency (Hz) against Time (s); MIDI Data, MIDI note against Score position]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time (s): log p(r_τ | s_τ) over MIDI note number; log λ estimate Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz)]

Summary

• "Time Domain" – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• "Transform Domain" – Gamma Fields

– Models on (orthogonal) transform coefficients, energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Research Questions

What kinds of prior knowledge and modelling techniques are useful? How can we do efficient inference?

Signal Models for Audio

• Time domain – state space, dynamical models

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models

– Flexible, physically realistic

– Analysis down to sample precision, computationally quite heavy

• Transform domain – Fourier representations, Generalised Linear model

– Models on (orthogonal) transform coefficients, energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

$$\text{(real)} \quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)} \quad y_n = \sum_{k=1}^{p} c_k \big(e^{-\gamma_k + j\omega_k}\big)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
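The real and complex forms are linked by taking c_k = α_k exp(jφ_k), in which case the real form is the real part of the complex one. The check below is a small sketch with arbitrary, made-up parameter values; the function names are illustrative.

```python
import cmath
import math

def real_sinusoids(n, alpha, gamma, omega, phi):
    # y_n = sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)
    return sum(a * math.exp(-g * n) * math.cos(w * n + p)
               for a, g, w, p in zip(alpha, gamma, omega, phi))

def complex_sinusoids(n, c, gamma, omega):
    # y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n
    return sum(ck * cmath.exp(complex(-g, w)) ** n
               for ck, g, w in zip(c, gamma, omega))

alpha, gamma = [1.0, 0.5], [0.01, 0.02]
omega, phi = [0.3, 0.7], [0.0, 1.0]
c = [a * cmath.exp(1j * p) for a, p in zip(alpha, phi)]  # amplitude and phase in one number

max_err = max(abs(real_sinusoids(n, alpha, gamma, omega, phi)
                  - complex_sinusoids(n, c, gamma, omega).real)
              for n in range(50))
```

For fixed dampings and frequencies the signal is linear in c, which is what the matrix form y = F c expresses.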

State space Parametrisation

$$x_{n+1} = \underbrace{\mathrm{diag}\big(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p}\big)}_{A}\, x_n, \qquad x_0 = (c_1, c_2, \dots, c_p)^\top$$

$$y_n = \underbrace{(1 \;\; 1 \;\; \dots \;\; 1)}_{C}\, x_n$$

[Graphical model: hidden chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]
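With a diagonal transition matrix the state recursion can be run elementwise, and it reproduces the sum of damped complex exponentials. A minimal sketch, with made-up pole and amplitude values and a hypothetical function name:

```python
import cmath

def simulate(poles, c, num_steps):
    """Run x_{n+1} = A x_n with A = diag(poles), x_0 = c, and y_n = C x_n, C a row of ones."""
    x = list(c)
    y = []
    for _ in range(num_steps):
        y.append(sum(x))                         # observation: sum of the state entries
        x = [p * xi for p, xi in zip(poles, x)]  # diagonal A acts elementwise
    return y

poles = [cmath.exp(complex(-0.01, 0.3)), cmath.exp(complex(-0.02, 0.7))]
c = [1.0 + 0.0j, 0.5j]
y = simulate(poles, c, 20)

# Closed form for comparison: y_n = sum_k c_k * pole_k ** n
closed_form = [sum(ck * p ** n for ck, p in zip(c, poles)) for n in range(20)]
max_err = max(abs(a - b) for a, b in zip(y, closed_form))
```

Adding state and observation noise to this recursion turns it into a linear dynamical system on which Kalman filtering applies.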

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: signal over samples 0 to 500]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

H ≡ (parameters, hidden states)

[Graphical model: H with children x_{¬κ} (Missing) and x_κ (Observed)]

[Figure: signal over samples 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each band ν = 0, ..., W − 1, parameters A^ν, Q^ν and a state chain s_0^ν ... s_k^ν ... s_{K−1}^ν; observations x_0 ... x_k ... x_{K−1}]

$$s_k^\nu \sim \mathcal{N}\big(s_k^\nu; A^\nu s_{k-1}^\nu, Q^\nu\big), \qquad A^\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$$

Inference: Structured Variational Bayes

[Graphical model of the approximating distribution: factors q(A^α), q(Q^α) and chain factors ∏_k q(s_k^α | s_{k−1}^α) for each α ∈ C, together with q(x_k)]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
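The subtract-and-refit pattern is the same one used by Gauss-Seidel on a linear system: each unknown is re-solved against the residual left by the current values of all the others. The sketch below shows that generic method on a made-up 2 × 2 system; it is an analogy for the band-wise updates, not the phase vocoder inference itself.

```python
def gauss_seidel(A, b, num_sweeps=50):
    """Solve A x = b by sweeping over coordinates (converges e.g. for diagonally dominant A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(num_sweeps):
        for i in range(n):
            # Residual after subtracting the contribution of all other coordinates
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Fit coordinate i to that residual, using the newest values of the others
            x[i] = residual / A[i][i]
    return x

x = gauss_seidel([[4.0, 1.0], [2.0, 3.0]], [1.0, 2.0])
```

For this system the exact solution is x = (0.1, 0.6), and the sweeps approach it geometrically.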

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each ν = 1, ..., W, chains r_0^ν ... r_k^ν ... r_K^ν and θ_0^ν ... θ_k^ν ... θ_K^ν; observations y_k ... y_K]

• Generalises Source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\,\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\,\mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \mathrm{blkdiag}\big(G_\omega, G_\omega^2, \dots, G_\omega^H\big)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized ⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem arg max_{r_{1:k}} p(r_{1:k} | y_{1:k})

⇒ Trajectories r^{(i)}_{1:k} which are dominated in terms of conditional evidence p(y_{1:k}, r^{(i)}_{1:k}) can be discarded without destroying optimality
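The intuition can be checked by counting. If a changepoint redraws θ independently of the past, then the Gaussian for θ_k given an indicator sequence depends only on where the most recent changepoint occurred. The sketch below enumerates all binary sequences for a small k and counts the distinct cases; it is a counting illustration under that reset assumption, with hypothetical function names, and it performs no inference.

```python
from itertools import product

def last_changepoint(r):
    """1-based index of the most recent changepoint in r, or 0 if there is none."""
    for k in range(len(r), 0, -1):
        if r[k - 1] == 1:
            return k
    return 0

def count_components(k):
    """Return (number of indicator sequences, number of distinct filtering kernels)."""
    sequences = list(product((0, 1), repeat=k))
    distinct = {last_changepoint(r) for r in sequences}
    return len(sequences), len(distinct)

num_sequences, num_kernels = count_components(5)
```

For k = 5 there are 32 sequences but only 6 distinct kernels, one per possible position of the last changepoint plus the no-changepoint case.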

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels over samples 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result over samples 500 to 3500, with an audio example]

Tracking Pitch Variations

• Allow m to change with k

[Figure: horizontal axis 50 to 500]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: frequency ν against k, and signal x_k]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators r_{1:W,1:K} encode a latent "piano roll" (three audio examples)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$

$$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$$

[Graphical model: indicators r_1, ..., r_W; coefficients s_1, ..., s_W; observation x]

• Generalized Linear Model – columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
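For small W the 2^W mixture can be enumerated directly, which also makes the exponential cost visible. The sketch below is a toy scalar version with made-up numbers: x is a scalar, each C_i is a scalar, and the on and off prior probabilities are taken as pi_on and 1 − pi_on for every component. These simplifications and the function names are assumptions for illustration.

```python
import math
from itertools import product

def gauss_logpdf(x, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def indicator_posterior(x, C, sigma2, R, pi_on):
    """Exact p(r | x) by enumerating all 2^W on/off configurations."""
    configs = list(product((0, 1), repeat=len(C)))
    log_w = []
    for r in configs:
        # Given r, x is zero-mean Gaussian; each active component adds C_i^2 * sigma2
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, C))
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        log_w.append(log_prior + gauss_logpdf(x, var))
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]   # subtract the max for numerical stability
    total = sum(w)
    return {r: wi / total for r, wi in zip(configs, w)}

post = indicator_posterior(x=3.0, C=[1.0, 1.0, 1.0], sigma2=4.0, R=0.1, pi_on=0.2)
```

With W = 3 there are 8 configurations; the all-off configuration gets almost no mass here because the noise variance alone cannot explain x = 3.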

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}\big(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})\big) \quad \text{Changepoint indicator}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}\big(\theta_{k,\nu}; A_\nu(r_k)\,\theta_{k-1,\nu}, Q_\nu(r_k)\big) \quad \text{Latent state}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}\big(y_k; C_k\,\theta_{k,1:W}, R\big) \quad \text{Observation}$$

[Graphical model: for each ν = 1, ..., W, chains r_0^ν ... r_k^ν ... r_K^ν and s_0^ν ... s_k^ν ... s_K^ν; observations y_k ... y_K]

Synthetic Data

[Figure: synthetic data, panels indexed by frequency ν against k, with the signal x and an audio example]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms, f (Hz) against t (sec), of (Speech) and (Piano), each with an audio example]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

[Figure: spectrogram image = template image × excitation image]

– however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\,S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\,S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

[Figure: spectrogram image written as the sum of masked source images]

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\,p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$

One channel source separation: Gaussian source model

[Graphical model: for k = 1, ..., K, variances v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \quad \text{(Responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
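For a single time-frequency atom the posterior is a two-line computation. The sketch below evaluates the responsibilities, posterior means and posterior variances for made-up variances; the function name is illustrative.

```python
def source_posterior(x, v):
    """Posterior of each source coefficient given the mixture x and the variances v."""
    total = sum(v)
    kappa = [v_n / total for v_n in v]                      # responsibilities, sum to 1
    means = [k * x for k in kappa]                          # each source gets a share of x
    variances = [v_n * (1.0 - k) for v_n, k in zip(v, kappa)]
    return kappa, means, variances

kappa, means, variances = source_posterior(x=2.0, v=[3.0, 1.0])
```

Here the source with three times the variance receives three quarters of the observation, and the posterior means add up to x, so the reconstructed sources are consistent with the mixture.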

One channel source separation: Poisson source model

[Graphical model: for k = 1, ..., K, intensities v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$
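A sum of independent Poisson variables is again Poisson with the summed intensity, so under this model x at atom (ν, τ) has intensity Σ_n t_{ν,n} e_{τ,n}, which is the NMF product of templates and excitations. The sketch below computes that intensity matrix for made-up templates and excitations; the array layout and function name are assumptions.

```python
def nmf_intensity(templates, excitations):
    """templates[nu][n] = t_{nu,n}; excitations[tau][n] = e_{tau,n}.

    Returns lam with lam[nu][tau] = sum_n t_{nu,n} * e_{tau,n}.
    """
    return [[sum(t_n * e_n for t_n, e_n in zip(t_row, e_row))
             for e_row in excitations]
            for t_row in templates]

templates = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]   # 3 frequency bins, 2 sources
excitations = [[2.0, 0.0], [1.0, 1.0]]             # 2 time frames, 2 sources
lam = nmf_intensity(templates, excitations)
```

Fitting t and e by maximum likelihood under this Poisson model corresponds to NMF with a Kullback-Leibler type cost.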

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\Big(\sum_j a_{ij} + 1\Big) \log \xi_i^{-1}\Big) \quad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: $\psi_{01} - z_1 - \psi_{11} - v_1 - \psi_{12} - z_2 - \psi_{22} - v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

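[Factor graph: $\psi_{01} - z_1 - \psi_{11} - v_1 - \psi_{12} - z_2 - \psi_{22} - v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$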
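The Gibbs updates for the chain can be sketched in closed form, because each full conditional is again inverse Gamma. The sketch below assumes the chain $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$ and a Gaussian observation $s_k \sim \mathcal{N}(0, v_k)$. Holding $z_1$ fixed, the toy data and all function names are assumptions of this example, not the author's implementation.

```python
import numpy as np


def inv_gamma(rng, shape, rate):
    """Draw x with density proportional to x^-(shape+1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape=shape, scale=1.0 / rate)


def gibbs_sweep(rng, s, v, z, a, a_z):
    """One in-place Gibbs sweep over z_2..z_K and v_1..v_K; z_1 stays fixed."""
    K = len(s)
    for k in range(K):
        if k > 0:
            # z_k | v_{k-1}, v_k has exponent
            # (a_z + a + 1) log(1/z) - (a_z / v_{k-1} + a / v_k) / z
            z[k] = inv_gamma(rng, a_z + a, a_z / v[k - 1] + a / v[k])
        # v_k | z_k, z_{k+1}, s_k: the Gaussian likelihood adds 1/2 to the shape
        shape = a + 0.5
        rate = a / z[k] + 0.5 * s[k] ** 2
        if k + 1 < K:
            shape += a_z
            rate += a_z / z[k + 1]
        v[k] = inv_gamma(rng, shape, rate)
    return v, z


rng = np.random.default_rng(1)
K = 200
# Toy coefficients: quiet first half, loud second half
s = rng.normal(size=K) * np.where(np.arange(K) < 100, 0.1, 3.0)
v = np.ones(K)
z = np.ones(K)
for _ in range(200):
    v, z = gibbs_sweep(rng, s, v, z, a=10.0, a_z=10.0)
```

After the sweeps, `v` holds one posterior sample of the variance track; averaging samples over later sweeps would give a smoother estimate.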


Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels (horizontal axis 20 to 120, vertical axis 50 to 500, colour scale -14 to 0) labelled Noisy (X), Original (Xorg), Xh SNR1998, Xv SNR2079, Xb SNR1968 and Xg SNR1997.]

Denoising - Music

[Figure: five spectrogram panels (horizontal axis 50 to 250, vertical axis 50 to 500, colour scale -18 to 0) labelled Original, Noisy, PF SNR853, Gibbs SNR866 and VB SNR208.]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: spectrogram panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -474   -328   567     -158   1546   -137
Gibbs         -45    -262   457     105    1246   161
GibbsEM       -423   -242   482     134    1313   185
Pre-trained   -404   -315   813     356    1144   464
Oracle        614    1716   658     1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -78    -622   453     -235   184    -225
Gibbs         -846   -753   693     -404   1459   -383
GibbsEM       -774   -619   462     -114   1662   -097
Pre-trained   -64    -539   695     38     1639   414
Oracle        121    329    1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrogram panels Xorg (Original), Shor (Hor) and Sver (Vert), shown for two examples; axes Time (τ) and Frequency Bin (ν).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz (0 to 4000 Hz), vertical axis from -12 to 0.]

Chord Detection

[Figure: top panel, MDCT of piano chord 41485156, Time τ/s against Frequency/Hz (0 to 4000 Hz); bottom panel, $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note $j$ (40 to 65) against Time τ/s.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source $n$

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes t/sec and f/Hz (0 to 10000 Hz).]

Reconstructions

[Figure: spectrograms of the reconstructed sources (Speech), (Piano) and (Guitar); axes t/sec and f/Hz (0 to 10000 Hz).]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of $a$, $\lambda$ and $r$ against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets, detections, spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
  - ...

• Online (real-time) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM
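The mapping to HMM inference can be illustrated with the standard forward (filtering) recursion. This is a generic sketch, not the bar pointer model itself: the two-state transition matrix and likelihoods are made-up illustrative values, and a joint state such as $x = (m, n)$ is assumed to be flattened into a single index.

```python
import numpy as np


def forward_filter(pi0, A, lik):
    """Return filtered marginals p(x_k | y_{1:k}) for a discrete HMM.

    pi0: (S,) initial state distribution p(x_1)
    A:   (S, S) transition matrix with A[i, j] = p(x_{k+1} = j | x_k = i)
    lik: (K, S) observation likelihoods lik[k, j] = p(y_k | x_k = j)
    """
    K, S = lik.shape
    alpha = np.empty((K, S))
    pred = pi0
    for k in range(K):
        unnorm = pred * lik[k]            # correct with the observation at step k
        alpha[k] = unnorm / unnorm.sum()  # normalise to get p(x_k | y_{1:k})
        pred = alpha[k] @ A               # predict p(x_{k+1} | y_{1:k})
    return alpha


pi0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
lik = np.array([[0.8, 0.2],
                [0.8, 0.2]])
alpha = forward_filter(pi0, A, lik)
```

Online (filtering) use only needs this forward pass; offline (smoothing) use would add a backward pass over the same quantities.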


Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with score position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; the bar lengths for 3/4 time and 4/4 time are marked.]

• Each dot denotes a state $x = (m, n)$ (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels showing the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, over Bar Position and Tempo Level.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and for a Duplet Rhythm.]

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$.]

• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$ with tempo in quarter notes per min; $p(r_k \mid y_{1:k})$ for Triplets and Duplets.]

Smoothing

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$ with tempo in quarter notes per min; $p(r_k \mid y_{1:K})$ for Triplets and Duplets.]

Time Signature

[Figure: Observed Data (sample value against time/s, 0 to 12 s); then, against Frame Index $k$ (100 to 500): $\log p(m_k \mid z_{1:K})$; $\log p(n_k \mid z_{1:K})$ with tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4.]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: musical score and audio signal $x_t$ against $t$.]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; $v_{\nu,1}, \dots, v_{\nu,K}$; $s_{\nu,1}, \dots, s_{\nu,K}$; plate over $\nu = 1, \dots, W$; $r_k$ indexes score positions 1 to 8.]

$$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz (0 to 4000 Hz), vertical axis from -12 to 0.]

Score-Performance matching

[Figure: Spectrogram Data, Time/s (0 to 14 s) against Frequency/Hz (0 to 4000 Hz); MIDI Data, Score position (50 to 450) against MIDI note (55 to 85).]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80) against Time/s; $\log \lambda$, computed from $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$, against Time/s; MDCT of audio (source: Daniel-Ben Pienaar), Time/s against Frequency/Hz (0 to 4000 Hz).]

Summary

• “Time Domain”: Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• “Transform Domain”: Gamma Fields
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Signal Models for Audio

• Time domain: state space, dynamical models
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA), switching state space models
  - Flexible, physically realistic
  - Analysis down to sample precision, computationally quite heavy

• Transform domain: Fourier representations, Generalised Linear model
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

$$\text{(real)} \quad y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$$

$$\text{(complex)} \quad y_n = \sum_{k=1}^{p} c_k \left(e^{-\gamma_k + j\omega_k}\right)^n$$

$$y = F(\gamma_{1:p}, \omega_{1:p})\, c$$
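The relation between the real and complex forms can be checked numerically. The sketch below assumes the choice $c_k = \alpha_k e^{j\phi_k}$, in which case the real form is the real part of the complex form; all parameter values are illustrative and not taken from the slides.

```python
import numpy as np

n = np.arange(256)
alpha = np.array([1.0, 0.5])      # amplitudes (illustrative values)
gamma = np.array([0.01, 0.02])    # damping coefficients
omega = np.array([0.2, 0.45])     # angular frequencies in radians per sample
phi = np.array([0.3, -1.0])       # phases

# Real form: sum_k alpha_k exp(-gamma_k n) cos(omega_k n + phi_k)
y_real = (alpha[:, None] * np.exp(-gamma[:, None] * n)
          * np.cos(omega[:, None] * n + phi[:, None])).sum(axis=0)

# Complex form y = F c with c_k = alpha_k exp(j phi_k)
c = alpha * np.exp(1j * phi)
poles = np.exp(-gamma + 1j * omega)
F = poles[None, :] ** n[:, None]  # basis matrix, F[n, k] = pole_k ** n
y_complex = F @ c
```

Here `y_complex.real` agrees with `y_real` up to rounding, and `F` plays the role of the basis matrix $F(\gamma_{1:p}, \omega_{1:p})$, which is linear in the coefficients $c$.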


State space Parametrisation

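$$x_{n+1} = \underbrace{\begin{pmatrix} e^{-\gamma_1 + j\omega_1} & & \\ & \ddots & \\ & & e^{-\gamma_p + j\omega_p} \end{pmatrix}}_{A} x_n, \qquad x_0 = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_p \end{pmatrix}$$

$$y_n = \underbrace{\begin{pmatrix} 1 & 1 & \dots & 1 & 1 \end{pmatrix}}_{C} x_n$$

[Graphical model: state chain $x_0, x_1, \dots, x_{k-1}, x_k, \dots, x_K$ with observations $y_1, \dots, y_{k-1}, y_k, \dots, y_K$.]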
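A short numerical check shows that the state space recursion reproduces the complex sinusoidal sum. This is a minimal sketch with illustrative parameter values; it starts the output at $n = 0$, whereas the graphical model indexes observations from $y_1$.

```python
import numpy as np

gamma = np.array([0.01, 0.02])            # damping coefficients (illustrative)
omega = np.array([0.2, 0.45])             # angular frequencies (illustrative)
c = np.array([1.0 + 0.5j, -0.3 + 0.2j])   # complex amplitudes (illustrative)

A = np.diag(np.exp(-gamma + 1j * omega))  # diagonal transition matrix
C = np.ones((1, len(c)))                  # observation row vector of ones

x = c.copy()                              # x_0 = (c_1, ..., c_p)
y_state_space = []
for _ in range(64):
    y_state_space.append((C @ x)[0])      # y_n = C x_n
    x = A @ x                             # x_{n+1} = A x_n
y_state_space = np.array(y_state_space)

# Direct evaluation of sum_k c_k (exp(-gamma_k + j omega_k))^n for comparison
n = np.arange(64)
poles = np.exp(-gamma + 1j * omega)
y_direct = (c[None, :] * poles[None, :] ** n[:, None]).sum(axis=1)
```

Because `A` is diagonal, each state component evolves independently as one damped complex exponential, and `C` simply sums them.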


State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: waveform over samples 0 to 500.]

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

$H \equiv$ (parameters, hidden states)

[Graphical model: $H$ with children $x_{\neg\kappa}$ (Missing) and $x_\kappa$ (Observed).]

[Figure: waveform over samples 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: parameters $A_\nu$, $Q_\nu$; state chain $s_0^\nu \cdots s_k^\nu \cdots s_{K-1}^\nu$; plate over $\nu = 0, \dots, W - 1$; observations $x_0, \dots, x_k, \dots, x_{K-1}$.]

$$s_k^\nu \sim \mathcal{N}(s_k^\nu; A_\nu s_{k-1}^\nu, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$$

Inference Structured Variational Bayes

[Graphical model: variational factors $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s_k^\alpha \mid s_{k-1}^\alpha)$ for each $\alpha \in C$, and $q(x_k)$.]

• Intuitive algorithm
  - Subtract from the observed signal $x$ the prediction of the frequency bands in $\neg\alpha$
  - Compute a fit for $\alpha$ to this residual and iterate

• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
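The Gauss-Seidel analogy can be made concrete with the textbook iteration for a linear system. This is a generic sketch of Gauss-Seidel itself, not of the structured variational algorithm; the matrix, right-hand side and function name are illustrative assumptions.

```python
import numpy as np


def gauss_seidel(A, b, iters=100):
    """Solve A x = b by sweeping through the coordinates one at a time."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        for i in range(len(b)):
            # Residual of equation i after removing every other coordinate's contribution
            r = b[i] - A[i] @ x + A[i, i] * x[i]
            x[i] = r / A[i, i]            # refit coordinate i to that residual
    return x


A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])  # symmetric positive definite, so the sweep converges
b = np.array([1.0, 2.0, 3.0])
x = gauss_seidel(A, b)
```

Each inner step has the same shape as the intuitive algorithm: remove what the other components explain, then fit the current component to what is left.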


Restoration

• Piano
  - Signal with missing samples (37)
  - Reconstruction, 768 dB improvement
  - Original

• Trumpet
  - Signal with missing samples (37)
  - Reconstruction, 710 dB improvement
  - Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: indicator chain $r_0^\nu \cdots r_k^\nu \cdots r_K^\nu$ and state chain $\theta_0^\nu \cdots \theta_k^\nu \cdots \theta_K^\nu$; plate over $\nu = 1, \dots, W$; observations $y_k, \dots, y_K$.]

• Generalises source-filter models

Harmonic model with changepoints

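$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor $0 < \rho_k < 1$, frame length $N$ and damped sinusoidal basis matrix $C$ of size $N \times 2H$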

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  - Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized

⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r_{1:k}^{(i)}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r_{1:k}^{(i)})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator $m$

• At each time $k$, the process can be in one of the {“mute”, “sound”} × $M$ states

[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels over samples 100 to 1000, with vertical axes from -100 to 50 and from 5 to 15.]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al 2006, IEEE TSALP)

[Figure: exact inference result over samples 500 to 3500.]

Tracking Pitch Variations

• Allow $m$ to change with $k$

[Figure: pitch track over frames 50 to 500.]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent processes indexed by ν (frequency) against $k$, shown above the signal $x_k$.]

• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”

Single time slice - Bayesian Variable Selection

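$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$

$$C \equiv [\, C_1 \ \dots \ C_i \ \dots \ C_W \,]$$

[Graphical model: indicators $r_1, \dots, r_W$; coefficients $s_1, \dots, s_W$; observation $x$.]

• Generalized Linear Model: the columns of $C$ are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When $W$ is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)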
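For small $W$ the $2^W$ mixture components can be enumerated directly, which makes the cost of exact inference visible. The sketch below assumes scalar coefficients $s_i$ with a shared variance `sigma2` and a shared prior probability `pi_on`; these simplifications, the toy data and the function names are assumptions of this example.

```python
import itertools
import numpy as np


def log_gauss(x, cov):
    """Log density of a zero-mean multivariate Gaussian at x."""
    sign, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)


def indicator_posterior(x, C, sigma2, R, pi_on):
    """Enumerate all 2^W on/off configurations and return p(r | x).

    Assumes scalar coefficients s_i with variance sigma2 when r_i is on.
    """
    W = C.shape[1]
    configs = list(itertools.product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for idx, r in enumerate(configs):
        r = np.array(r)
        # Marginal covariance of x with the active coefficients integrated out
        cov = sigma2 * (C * r) @ C.T + R
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        logp[idx] = log_prior + log_gauss(x, cov)
    p = np.exp(logp - logp.max())
    return configs, p / p.sum()


rng = np.random.default_rng(0)
W, N = 4, 8
C = rng.normal(size=(N, W))
x = 2.0 * C[:, 1] + 0.1 * rng.normal(size=N)   # only basis vector 1 is active
configs, post = indicator_posterior(x, C, sigma2=4.0, R=0.01 * np.eye(N), pi_on=0.2)
best = configs[int(np.argmax(post))]
```

The loop runs over `2 ** W` configurations, so this exact enumeration is only feasible for a handful of basis vectors.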


Factorial Switching State space model

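$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \quad \text{(Changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \quad \text{(Latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \quad \text{(Observation)}$$

[Graphical model: indicator chain $r_0^\nu \cdots r_k^\nu \cdots r_K^\nu$ and state chain $s_0^\nu \cdots s_k^\nu \cdots s_K^\nu$; plate over $\nu = 1, \dots, W$; observations $y_k, \dots, y_K$.]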

Synthetic Data

[Figure: synthetic data example; panels indexed by ν against $k$, together with the signal $x$ and its frequency content.]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable (computations with large matrices)
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain: state space, dynamical models

• Transform domain: Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms of (Speech) and (Piano); axes t/sec and f/Hz (0 to 10000 Hz).]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

[Figure: spectrogram image = templates image × excitations image.]

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
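The non-additivity remark can be made explicit with a short worked identity. This is a sketch under the assumption, not stated on the slide, that $a$ and $b$ denote the complex transform coefficients of two sources at the same time-frequency atom.

```latex
|a + b|^2 = |a|^2 + |b|^2 + 2\,\mathrm{Re}(a \bar{b})
% Example: a = 1, b = -1 gives |a|^2 + |b|^2 = 2 but |a + b|^2 = 0.
```

The cross term vanishes only on average, for example when the relative phase of the two coefficients is uniformly distributed, so the power spectrogram of a mixture is not in general the sum of the source spectrograms.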


Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S_{\nu\tau}^{(0)} + [r_{\nu\tau} = 1]\, S_{\nu\tau}^{(1)}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

[Figure: the same decomposition shown as spectrogram images.]

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$

• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

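One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate over $k = 1, \dots, K$.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \quad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$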
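The posterior formulas translate directly into a few array operations. This is a minimal sketch with made-up variances and observations; the function name `posterior_sources` is hypothetical.

```python
import numpy as np


def posterior_sources(x, v):
    """Posterior mean and variance of each source coefficient.

    x: (K,) observed mixture coefficients
    v: (K, N) prior variances v[k, n] of the N sources
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities, rows sum to 1
    mean = kappa * x[:, None]                  # E[s_kn | x_k] = kappa_kn * x_k
    var = v * (1.0 - kappa)                    # Var[s_kn | x_k]
    return mean, var


x = np.array([2.0, -1.0])
v = np.array([[3.0, 1.0],
              [1.0, 1.0]])
mean, var = posterior_sources(x, v)
```

The posterior means of the sources always add up to the observation, so the separation redistributes $x_k$ among the sources in proportion to their prior variances.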


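One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate over $k = 1, \dots, K$.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$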
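As a worked counterpart to the Gaussian case, a standard property of independent Poisson variables gives the posterior over the source counts once their sum is observed. The responsibilities $\kappa_{k,n}$ and the multinomial symbol $\mathcal{M}$ (with $x_k$ trials) are notation assumed for this sketch, not taken from the slide.

```latex
p(s_{k,1:N} \mid x_k, v_{k,1:N}) = \mathcal{M}(s_{k,1:N}; x_k, \kappa_{k,1:N}),
\qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}},
\qquad \mathbb{E}[s_{k,n} \mid x_k] = \kappa_{k,n}\, x_k
```

As in the Gaussian model, each source receives on average a fraction $\kappa_{k,n}$ of the observation, here with intensities playing the role of variances.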


Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: two panels of densities $p(x)$ for $0 \le x \le 5$; the first with $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$, the second with $(a, b) = (1, 1), (1, 0.5), (2, 1)$.]

$$\mathcal{G}(x; a, z) \equiv \exp((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a))$$

$$\mathcal{IG}(x; a, z) \equiv \exp((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a))$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity and Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
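The two definitions can be transcribed into log-density functions and checked numerically. This is a minimal sketch; the function names, the integration grid and the test parameters $a = 2$, $z = 1$ are choices made for this example.

```python
import math
import numpy as np


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z, following the slide definition."""
    return (a - 1) * np.log(x) - x / z - a * np.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z); under this definition 1/x is Gamma with shape a and scale z."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - math.lgamma(a)


def trapezoid(f, x):
    """Plain trapezoidal rule, written out to avoid depending on a NumPy version."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))


# Both densities should integrate to approximately one on a wide, fine grid
x = np.linspace(1e-4, 200.0, 400_001)
mass_gamma = trapezoid(np.exp(log_gamma_pdf(x, a=2.0, z=1.0)), x)
mass_inv_gamma = trapezoid(np.exp(log_inv_gamma_pdf(x, a=2.0, z=1.0)), x)
```

Working with log densities avoids overflow in $\Gamma(a)$ for large shape parameters and is the form in which the conjugate updates are usually derived.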

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

[Diagram: factor graph of the chain $z_1, v_1, z_2, v_2, \dots$ with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}$ and observation factors $p(y_1|v_1)$, $p(y_2|v_2)$]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k|v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Diagram: factor graph of the chain $z_1, v_1, z_2, v_2, \dots$ with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}$ and observation factors $p(y_1|v_1)$, $p(y_2|v_2)$]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
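
The Gibbs step for z_k can be sketched in a few lines. This is only an illustration, assuming the chain is defined by v_k | z_k ∼ IG(a, z_k/a) and z_k | v_{k-1} ∼ IG(a_z, v_{k-1}/a_z) in a shape/scale convention where x ∼ IG(a, z) means 1/x is Gamma distributed with shape a and scale z. The function name `sample_z_given_v` and the numerical values are hypothetical.

```python
import random

def sample_z_given_v(v_prev, v_curr, a, a_z, rng):
    """Draw z_k from p(z_k | v_{k-1}, v_k) for the inverse Gamma chain.

    Multiplying the singleton and the two pairwise potentials gives
    exp((a_z + a + 1) log(1/z) - (a_z / v_prev + a / v_curr) / z),
    so 1/z is Gamma with shape a_z + a and rate a_z/v_prev + a/v_curr.
    """
    shape = a_z + a
    rate = a_z / v_prev + a / v_curr
    # random.gammavariate takes (shape, scale), hence scale = 1 / rate
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

rng = random.Random(1)
draws = [sample_z_given_v(2.0, 3.0, a=10.0, a_z=4.0, rng=rng) for _ in range(20000)]
mean_inverse = sum(1.0 / z for z in draws) / len(draws)
expected_inverse = (4.0 + 10.0) / (4.0 / 2.0 + 10.0 / 3.0)  # shape / rate
```

Because the conditional stays in the inverse Gamma family, each Gibbs sweep only needs standard Gamma draws; the same sufficient statistics would appear in the corresponding VB update.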

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels: Noisy (X), Original (Xorg), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]

Denoising – Music

[Figure: spectrogram panels: Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time (harmonic continuity)

• Source 2: Vertical. Tie across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor and Sver; axes: Time (τ) against Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                  s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB           -474   -328    567    -158   1546   -137
Gibbs         -45   -262    457     105   1246    161
GibbsEM      -423   -242    482     134   1313    185
Pre-trained  -404   -315    813     356   1144    464
Oracle        614   1716    658    1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters az and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                  s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -78   -622    453    -235    184   -225
Gibbs        -846   -753    693    -404   1459   -383
GibbsEM      -774   -619    462    -114   1662   -097
Pre-trained   -64   -539    695      38   1639    414
Oracle        121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg (Original), Shor (Hor) and Sver (Vert); axes: Time (τ) against Frequency Bin (ν)]

[Audio examples: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure, top: MDCT of piano chord 41485156; Frequency (Hz) against Time τ (s)]

[Figure, bottom: $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j against Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$, $n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$, $m = 1, \dots, M$, $k = 1, \dots, K$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
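
The hierarchy can be read as a recipe for ancestral sampling. The sketch below draws from it for a single observation channel, assuming the second inverse Gamma argument reads 2/(νλ_n) and that G and IG use a shape/scale convention in which x ∼ IG(a, z) means 1/x ∼ G(a, z). The function name, the fixed mixing vector and all numerical values are hypothetical; the priors on a_m and r_m, which the slide leaves unspecified, are replaced by fixed values.

```python
import random

def sample_hierarchical_prior(K, N, a_lam, b_lam, nu, mixing, r, seed=0):
    """Ancestral sampling of lambda, v, s and one observed channel x."""
    rng = random.Random(seed)
    # Per-source overall "volume": lambda_n ~ G(a_lam, b_lam)
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    v, s, x = [], [], []
    for _ in range(K):
        # v_{k,n} ~ IG(nu/2, 2/(nu*lambda_n)): invert a Gamma draw
        v_k = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        # s_{k,n} ~ N(0, v_{k,n})
        s_k = [rng.gauss(0.0, v_kn ** 0.5) for v_kn in v_k]
        # x_k ~ N(a^T s_k, r) with a fixed (assumed) mixing vector a
        mean = sum(a_n * s_kn for a_n, s_kn in zip(mixing, s_k))
        x.append(rng.gauss(mean, r ** 0.5))
        v.append(v_k)
        s.append(s_k)
    return lam, v, s, x

lam, v, s, x = sample_hierarchical_prior(
    K=200, N=3, a_lam=2.0, b_lam=1.0, nu=4.0, mixing=[1.0, 0.5, -0.7], r=0.01)
```

Integrating out v_{k,n} in this construction yields heavy-tailed source coefficients, which is why a single λ_n can act as a scale for the whole source.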

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes: t/sec against f/Hz]

[Audio examples: s1.wav, s2.wav, s3.wav, x1.wav]

Reconstructions

[Figure: spectrograms of the reconstructed sources (Speech), (Piano), (Guitar); axes: t/sec against f/Hz]

[Audio examples: var_se1.wav, var_se2.wav, var_se3.wav]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
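
As a concrete illustration of online inference in an HMM, the following sketch implements the standard forward (filtering) recursion for a discrete state space. The four-state cyclic "bar position" chain, the onset probabilities and the observation sequence are hypothetical toy values, not the model used in the talk.

```python
def forward_filter(prior, transition, likelihoods):
    """Return filtered posteriors p(x_k | y_{1:k}) for a discrete HMM.

    prior[i]          : p(x_1 = i)
    transition[i][j]  : p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] : p(y_k | x_k = i)
    """
    n = len(prior)
    filtered = []
    predicted = list(prior)
    for lik in likelihoods:
        # Update: weight the prediction by the observation likelihood
        unnorm = [predicted[i] * lik[i] for i in range(n)]
        total = sum(unnorm)
        alpha = [u / total for u in unnorm]
        filtered.append(alpha)
        # Predict: propagate through the transition matrix
        predicted = [sum(alpha[i] * transition[i][j] for i in range(n)) for j in range(n)]
    return filtered

# Toy example: 4 bar positions visited cyclically, onsets likelier at position 0
transition = [[0.1, 0.9, 0.0, 0.0],
              [0.0, 0.1, 0.9, 0.0],
              [0.0, 0.0, 0.1, 0.9],
              [0.9, 0.0, 0.0, 0.1]]
prior = [0.25] * 4
onset_prob = [0.8, 0.1, 0.3, 0.1]   # hypothetical p(onset | position)
observations = [1, 0, 0, 0, 1, 0, 0, 0]
likelihoods = [[p if y else 1.0 - p for p in onset_prob] for y in observations]
posteriors = forward_filter(prior, transition, likelihoods)
```

Offline (batch) processing would add a backward pass over the same quantities to obtain smoothed posteriors.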

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: grid of states with tempo level n_k = 1, 2, 3 on the vertical axis and bar position m_k = 1, ..., 8 on the horizontal axis; markers for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}) over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10) over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity μ_k against m_k (0 to 1000); panels: Triplet Rhythm, Duplet Rhythm]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Diagram: dynamic Bayesian network over k = 0, ..., 3 with variables n_k, θ_k, m_k, r_k, intensities λ_k and observations y_k]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure, against Frame Index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure, against Frame Index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: Observed Data (sample value against time in s); then, against Frame Index k (100 to 500): log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt and audio signal x_t against t]

Score-Performance matching - Graphical Model

[Diagram: chains t_k, r_k, λ_k for k = 1, ..., K and, on a plate ν = 1, ..., W, variables v_{ν,k} and s_{ν,k}; a score-position diagram with states 1 to 8 indexed by r_k]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure, top: Spectrogram Data; Frequency (Hz) against Time (s)]

[Figure, bottom: MIDI Data; MIDI note against Score position]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: log p(r_τ | s_τ), MIDI note number against Time (s); log λ, shown as $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ against Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) against Time (s)]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients, Energy compaction

– Practical, can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Sinusoidal Modeling

• Sound is primarily about oscillations and resonance

• Cascade of second order systems

• Audio signals can often be compactly represented by sinusoids

(real) $y_n = \sum_{k=1}^{p} \alpha_k e^{-\gamma_k n} \cos(\omega_k n + \phi_k)$

(complex) $y_n = \sum_{k=1}^{p} c_k (e^{-\gamma_k + j\omega_k})^n$

$y = F(\gamma_{1:p}, \omega_{1:p})\, c$

State space Parametrisation

$x_{n+1} = A x_n$, with $A = \mathrm{diag}(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p})$ and $x_0 = (c_1, c_2, \dots, c_p)^\top$

$y_n = C x_n$, with $C = (1\ 1\ \cdots\ 1)$

[Diagram: state chain x_0, x_1, ..., x_{k-1}, x_k, ..., x_K with observations y_1, ..., y_{k-1}, y_k, ..., y_K]
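
A small numerical check can make the equivalence between the state space recursion and the sum of damped complex exponentials concrete. The sketch below assumes a diagonal A with entries exp(−γ_k + jω_k), an all-ones C and x_0 = c; the function names and the parameter values are hypothetical.

```python
import cmath

def simulate_state_space(gammas, omegas, c, n_samples):
    """Iterate x_{n+1} = A x_n with diagonal A, and read out y_n = C x_n (C all ones)."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gammas, omegas)]
    x = list(c)                      # x_0 holds the complex amplitudes
    y = []
    for _ in range(n_samples):
        y.append(sum(x))             # y_n = sum of the state entries
        x = [p * xi for p, xi in zip(poles, x)]  # diagonal state update
    return y

def direct_sum(gammas, omegas, c, n_samples):
    """Evaluate y_n = sum_k c_k (exp(-gamma_k + j omega_k))^n directly."""
    return [sum(ck * cmath.exp(complex(-g, w) * n) for g, w, ck in zip(gammas, omegas, c))
            for n in range(n_samples)]

gammas, omegas, c = [0.01, 0.02], [0.3, 0.9], [1.0 + 0.5j, 0.2 - 0.1j]
y_ss = simulate_state_space(gammas, omegas, c, 50)
y_direct = direct_sum(gammas, omegas, c, 50)
max_gap = max(abs(a - b) for a, b in zip(y_ss, y_direct))
```

The recursive form is what makes Kalman-style inference applicable once noise terms are added to the state and observation equations.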

State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis

[Figure: waveform over samples 0 to 500]

Audio Interpolation

$p(x_{\neg\kappa} | x_\kappa) \propto \int dH\, p(x_{\neg\kappa}|H)\, p(x_\kappa|H)\, p(H)$

H ≡ (parameters, hidden states)

[Diagram: H with children x_{¬κ} (Missing) and x_κ (Observed)]

[Figure: waveform over samples 0 to 500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Diagram: for ν = 0, ..., W − 1, a chain $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ with parameters $A^\nu$, $Q^\nu$, and observations $x_0, \dots, x_k, \dots, x_{K-1}$]

$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A^\nu s^\nu_{k-1},\, Q^\nu)$

$A^\nu \sim \mathcal{N}\left(\begin{pmatrix}\cos(\omega_\nu) & -\sin(\omega_\nu)\\ \sin(\omega_\nu) & \cos(\omega_\nu)\end{pmatrix},\, \Psi\right)$

Inference: Structured Variational Bayes

[Diagram: for each band α ∈ C, factors $q(A^\alpha)$, $q(Q^\alpha)$ and the chain factor $\prod_k q(s^\alpha_k | s^\alpha_{k-1})$ over $\dots, s^\alpha_{k-1}, s^\alpha_k, s^\alpha_{k+1}, \dots$; observation factor $q(x_k)$]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual, and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
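
To illustrate the "subtract the other components, refit, iterate" pattern in its simplest form, here is a plain Gauss-Seidel solver for a small linear system. It is a generic sketch of the iterative method named above, not the structured VB algorithm itself; the matrix and right-hand side are hypothetical.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by updating one coordinate at a time with the rest held fixed."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Residual of equation i after subtracting the other coordinates' contribution
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Best fit of coordinate i to that residual
            x[i] = residual / A[i][i]
    return x

# Diagonally dominant toy system, for which the iteration converges
A = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 0.2],
     [0.5, 0.2, 5.0]]
b = [1.0, 2.0, 3.0]
x = gauss_seidel(A, b)
residuals = [sum(A[i][j] * x[j] for j in range(3)) - b[i] for i in range(3)]
```

Each coordinate update uses the most recent values of the others, which mirrors refitting one frequency band against the residual left by the remaining bands.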

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original

[Audio examples: piano_missing.wav, piano_kalman.wav, piano_clean.wav, trumpet_missing.wav, trumpet_kalman.wav, trumpet_clean.wav]

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Diagram: for ν = 1, ..., W, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$, with observations $y_k, \dots, y_K$]

• Generalises Source-filter models

Harmonic model with changepoints

$r_k | r_{k-1} \sim p(r_k | r_{k-1})$, $r_k \in \{0, 1\}$

$\theta_k | \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$

$y_k | \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \mathrm{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)^N$

$G_\omega = \rho_k \begin{pmatrix}\cos(\omega) & -\sin(\omega)\\ \sin(\omega) & \cos(\omega)\end{pmatrix}$

damping factor $0 < \rho_k < 1$, framelength N, and damped sinusoidal basis matrix C of size N × 2H
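
The role of the damped rotation block can be seen by iterating it on a two-dimensional state: the first coordinate traces a damped cosine. The sketch assumes a single block of the form ρ times a rotation by ω, with a constant damping factor; the helper names and the values of ω and ρ are hypothetical.

```python
import math

def damped_rotation(omega, rho):
    """2x2 matrix rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = math.cos(omega), math.sin(omega)
    return [[rho * c, -rho * s], [rho * s, rho * c]]

def apply(G, theta):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return [G[0][0] * theta[0] + G[0][1] * theta[1],
            G[1][0] * theta[0] + G[1][1] * theta[1]]

omega, rho = 2 * math.pi / 16, 0.98   # hypothetical frequency and damping
G = damped_rotation(omega, rho)
theta = [1.0, 0.0]
trajectory = []
for n in range(32):
    trajectory.append(theta[0])   # first coordinate equals rho^n * cos(omega * n)
    theta = apply(G, theta)
```

Stacking such blocks for the harmonics of a fundamental gives a state whose projection is a sum of damped sinusoids; a changepoint simply redraws the state.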

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMM’s or KFM’s, summing over r_k does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density $p(r_k, \theta_k | y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized

⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Diagram: indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the “mute”, “sound” × M states

[Diagram: chains r_0, r_1, ..., r_T; m_0, m_1, ..., m_T; s_0, s_1, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k | y_{1:k})$

[Figure: two panels over samples 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: Exact inference (S), over samples 500 to 3500]

[Audio example: d1.wav]

Tracking Pitch Variations

• Allow m to change with k

[Figure: over frames 50 to 500]

• Intractable; need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano roll over frequency index ν against k, with the signal x_k]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll” (S1) (S2) (S3)

[Audio examples: montuno1.wav, montuno2.wav, montuno3.wav]

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i;\, \pi_{on}, \pi_{off})$

$s_i | r_i \sim [r_i = on]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq on]\, \delta(s_i)$

$x | s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R)$

$C \equiv [\, C_1\ \dots\ C_i\ \dots\ C_W \,]$

[Diagram: indicators r_1, ..., r_W; coefficients s_1, ..., s_W; observation x]

• Generalized Linear Model – Columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$

$r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)

$\theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu},\, Q_\nu(r_k))$ (Latent state)

$y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k \theta_{k,1:W},\, R)$ (Observation)

[Diagram: for ν = 1, ..., W, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$, with observations $y_k, \dots, y_K$]

Synthetic Data

[Figure: synthetic data; panels over frequency index ν against k, and the signal x (S)]

[Audio example: audio_example.wav]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: spectrograms (Speech) and (Piano); axes: t/sec against f/Hz]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

[Audio examples: s1.wav, s2.wav]

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s|v)\, p(v) = \left(\prod_k p(s_k|v_k)\right) p(v)$

One channel source separation: Gaussian source model

[Diagram: variances v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate k = 1, ..., K]

$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes’ theorem yields

$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
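
For a single time-frequency atom, the posterior above amounts to a few arithmetic operations. The sketch below computes the responsibilities, posterior means and posterior variances for given source variances; the function name and the numbers are hypothetical.

```python
def posterior_sources(x, variances):
    """Posterior mean and variance of each source coefficient given their sum x."""
    total = sum(variances)
    kappas = [v / total for v in variances]          # responsibilities
    means = [k * x for k in kappas]                  # each source gets a share of x
    post_vars = [v * (1.0 - k) for v, k in zip(variances, kappas)]
    return kappas, means, post_vars

kappas, means, post_vars = posterior_sources(x=2.0, variances=[4.0, 1.0, 0.5])
```

The posterior means always add up to the observation, so the separation quality rests entirely on how well the variances are estimated.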

One channel source separation: Poisson source model

[Diagram: intensities v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate k = 1, ..., K]

$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of p(x) against x (0 to 5); legend of the first panel: a = 09, b = 1; a = 1, b = 1; a = 13, b = 1; a = 2, b = 1; legend of the second panel: a = 1, b = 1; a = 1, b = 05; a = 2, b = 1]

$\mathcal{G}(x;\, a, z) \equiv \exp((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a))$

$\mathcal{IG}(x;\, a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a))$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
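
The two log densities can be written down directly, and the change of variables y = 1/x links them. The sketch below assumes the shape/scale reading of G(x; a, z) and IG(x; a, z) given above; the function names and test values are hypothetical.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z), written so that 1/x ~ G(a, z)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Change of variables y = 1/x: IG(x; a, z) = G(1/x; a, z) * x^(-2)
x, a, z = 0.7, 2.0, 1.5
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
```

Working with log densities avoids overflow when the shape parameter is large, which matters for strongly coupled chains.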

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$v_k | z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$

$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$

[Diagram: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with edge labels $a_z$, $a$, $a_z$]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
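
A draw from such a chain can be simulated by alternating the two conditionals. This is a minimal sketch, assuming the second argument of each IG reads z_k/a and v_k/a_z respectively, and that x ∼ IG(a, z) means 1/x is Gamma distributed with shape a and scale z. The function names, the initial value z1 and the seed are hypothetical.

```python
import math
import random

def sample_inverse_gamma(a, z, rng):
    """x ~ IG(a, z) in the shape/scale convention, i.e. 1/x ~ Gamma(shape a, scale z)."""
    return 1.0 / rng.gammavariate(a, z)

def sample_chain(K, a, a_z, z1=1.0, seed=0):
    """Alternate v_k | z_k ~ IG(a, z_k / a) and z_{k+1} | v_k ~ IG(a_z, v_k / a_z)."""
    rng = random.Random(seed)
    z, v_path = z1, []
    for _ in range(K):
        v = sample_inverse_gamma(a, z / a, rng)
        v_path.append(v)
        z = sample_inverse_gamma(a_z, v / a_z, rng)
    return v_path

log_v = [math.log(v) for v in sample_chain(K=1000, a=10.0, a_z=10.0)]
```

Larger shape parameters make each conditional more concentrated, so successive values of log v_k move in smaller steps; unequal a and a_z introduce a systematic drift.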

Gamma Chains: typical draws

[Figure: typical draws of log v_k against k (100 to 1000); panels: a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising – Music

[Figure: spectrogram panels Original, Noisy, PF (SNR853), Gibbs (SNR866) and VB (SNR208).]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: spectrograms Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -474   -328    567     -158   1546   -137
Gibbs         -45   -262    457      105   1246    161
GibbsEM      -423   -242    482      134   1313    185
Pre-trained  -404   -315    813      356   1144    464
Oracle        614   1716    658     1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78   -622    453     -235    184   -225
Gibbs        -846   -753    693     -404   1459   -383
GibbsEM      -774   -619    462     -114   1662   -097
Pre-trained   -64   -539    695       38   1639    414
Oracle        121    329   1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν). Two sets of audio examples: (Original) (Hor) (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), from 0 to 4000 Hz.]

Chord Detection

[Figure, top panel: MDCT of piano chord (MIDI notes 41, 48, 51, 56); Frequency (Hz) against Time τ (s). Bottom panel: $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j against Time τ (s).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, for n = 1 ... N

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, for m = 1 ... M and k = 1 ... K

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$
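The hierarchical prior can be read top-down as a sampling recipe. The following is a sketch under assumptions: the slide leaves the priors on a_m and r_m unspecified, so the mixing matrix `A` and noise variances `r` are passed in as fixed hypothetical values, and the default hyperparameters are arbitrary. G and IG are taken in the scale convention used elsewhere in the talk, so that 1/v is Gamma distributed with shape ν/2 and scale 2/(νλ_n).

```python
import random


def sample_mixture(K, A, r, nu=2.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Draw source scales, K source vectors and K mixture vectors.

    A: M x N mixing matrix (list of rows), r: M noise variances.
    """
    rng = random.Random(seed)
    M, N = len(A), len(A[0])
    # lambda_n ~ G(a_lam, b_lam): overall scale of source n
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    S, X = [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu * lambda_n)), i.e. 1/v_kn is Gamma distributed
        v = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        # s_kn ~ N(0, v_kn); random.gauss takes a standard deviation
        s = [rng.gauss(0.0, v[n] ** 0.5) for n in range(N)]
        # x_km ~ N(a_m^T s_k, r_m)
        x = [
            rng.gauss(sum(A[m][n] * s[n] for n in range(N)), r[m] ** 0.5)
            for m in range(M)
        ]
        S.append(s)
        X.append(x)
    return lam, S, X
```

Integrating out v gives each source coefficient a heavy-tailed marginal, which is what makes the prior favour sparse sources.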

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms (Speech), (Piano), (Guitar) and (Mix); axes t/sec and f/Hz. Audio examples accompany each panel.]

Reconstructions

[Figure: reconstructed spectrograms (Speech), (Piano) and (Guitar); axes t/sec and f/Hz. Audio examples accompany each panel.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r over 2000 epochs.]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Diagram: grid of states with tempo level n_k ∈ {1, 2, 3} on the vertical axis and bar position m_k ∈ {1, ..., 8} on the horizontal axis, annotated for 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2|x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level.]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level.]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level.]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level.]

Bar position Pointer - k = 5

[Figure: p(x_5|y_{1:5}) with y_5 = 0, over Bar Position and Tempo Level.]

Bar position Pointer - k = 10

[Figure: p(x_10) over Bar Position and Tempo Level.]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k|x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm.]
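Filtering in the bar pointer model is the standard HMM forward recursion over the states x = (m, n) with a Poisson observation model. The sketch below is illustrative only: the grid size, the tempo change probability `P_CHANGE`, the clamping at the tempo limits and the rhythm template in `intensity` are all hypothetical choices, not values from the talk.

```python
import math

M, N = 8, 3        # bar positions and tempo levels (small illustrative grid)
P_CHANGE = 0.2     # hypothetical probability that the tempo level changes


def transition(m, n):
    """Return {(m_next, n_next): probability} for state (m, n), m in 0..M-1."""
    m_next = (m + n) % M                  # the pointer advances by the tempo
    moves = {}
    for dn, p in ((0, 1.0 - P_CHANGE), (-1, P_CHANGE / 2), (1, P_CHANGE / 2)):
        n_next = min(max(n + dn, 1), N)   # clamp at the slowest/fastest level
        moves[(m_next, n_next)] = moves.get((m_next, n_next), 0.0) + p
    return moves


def intensity(m):
    """Hypothetical rhythm template: onsets are likely on positions 0 and 4."""
    return 3.0 if m % 4 == 0 else 0.2


def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)


def forward_filter(ys):
    """Return the filtering distributions p(x_k | y_1:k) as a list of dicts."""
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    belief = {x: 1.0 / len(states) for x in states}   # uniform initial belief
    filtered = []
    for y in ys:
        predicted = dict.fromkeys(states, 0.0)
        for x, p in belief.items():                   # prediction step
            for x_next, t in transition(*x).items():
                predicted[x_next] += p * t
        weighted = {x: predicted[x] * poisson_pmf(y, intensity(x[0])) for x in states}
        total = sum(weighted.values())                # update and normalise
        belief = {x: w / total for x, w in weighted.items()}
        filtered.append(belief)
    return filtered
```

Marginalising each returned dictionary over n gives p(m_k|y_{1:k}), and over m gives p(n_k|y_{1:k}).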

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

[Graph: dynamic Bayesian network with chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3 and r_0 ... r_3, intensities λ_1 ... λ_3 and observations y_1 ... y_3.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure, four panels against Frame Index k: Observed Data y_k; log p(m_k|y_{1:k}); log p(n_k|y_{1:k}) with tempo in quarter notes per min; p(r_k|y_{1:k}) over Triplets and Duplets.]

Smoothing

[Figure, four panels against Frame Index k: Observed Data y_k; log p(m_k|y_{1:K}); log p(n_k|y_{1:K}) with tempo in quarter notes per min; p(r_k|y_{1:K}) over Triplets and Duplets.]

Time Signature

[Figure, four panels: Observed Data (sample value against time/s); log p(m_k|z_{1:K}); log p(n_k|z_{1:K}) with tempo in quarter notes per min; p(θ_k|z_{1:K}) over 4/4 and 3/4, each against Frame Index k.]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score aligned with the audio signal x_t over time t.]

Score-Performance matching - Graphical Model

[Graph: chains t_1 ... t_K, r_1 ... r_K and λ_1 ... λ_K; for each frequency ν = 1 ... W, variances v_{ν,1} ... v_{ν,K} and coefficients s_{ν,1} ... s_{ν,K}; r_k indexes the score positions 1 to 8.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), from 0 to 4000 Hz.]

Score-Performance matching

[Figure, top panel: Spectrogram Data, Frequency (Hz) against Time (s). Bottom panel: MIDI Data, MIDI note against Score position.]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels: log p(r_τ|s_τ), MIDI note number against Time (s); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$, log λ against Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) against Time (s).]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients, Energy compaction

– Practical, can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF

State space Parametrisation

$x_{n+1} = \underbrace{\mathrm{diag}\left(e^{-\gamma_1 + j\omega_1}, \dots, e^{-\gamma_p + j\omega_p}\right)}_{A}\, x_n, \qquad x_0 = (c_1, c_2, \dots, c_p)^\top$

$y_n = \underbrace{(1\ 1\ \dots\ 1)}_{C}\, x_n$

[Graph: state chain x_0, x_1, ..., x_{k−1}, x_k, ..., x_K with observations y_1, ..., y_{k−1}, y_k, ..., y_K.]
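Unrolling this recursion shows that the output is a sum of damped complex sinusoids, y_n = Σ_i c_i exp((−γ_i + jω_i) n). A minimal noise-free simulation, with the function name `simulate` and the parameter values in the usage note chosen only for illustration:

```python
import cmath


def simulate(gammas, omegas, c, n_samples):
    """Iterate x_{n+1} = A x_n with diagonal A and read out y_n = C x_n."""
    poles = [cmath.exp(complex(-g, w)) for g, w in zip(gammas, omegas)]
    x = list(c)                      # x_0 holds the initial amplitudes c_i
    ys = []
    for _ in range(n_samples):
        ys.append(sum(x))            # C is a row of ones
        x = [p * xi for p, xi in zip(poles, x)]   # A is diagonal
    return ys
```

For example, `simulate([0.01, 0.02], [0.3, 0.7], [1.0, 0.5], 10)` returns ten complex samples whose first value is the sum of the amplitudes.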


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: waveform of 500 samples with a missing segment.]

Audio Interpolation

$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$

H ≡ (parameters, hidden states)

[Graph: H is the common parent of x_{¬κ} (Missing) and x_κ (Observed).]

[Figure: waveform of 500 samples with a missing segment.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graph: for each frequency band ν = 0 ... W − 1, parameters A_ν and Q_ν govern a state chain s^ν_0 ··· s^ν_k ··· s^ν_{K−1}; all bands feed the observations x_0, ..., x_k, ..., x_{K−1}.]

$s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu) \qquad A_\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi\right)$

Inference Structured Variational Bayes

[Graph: structured variational approximation with factors q(A_α), q(Q_α) and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each cluster α ∈ C, and q(x_k).]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
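For reference, the generic Gauss-Seidel iteration has the same "subtract the other components, then refit this one" structure. The sketch below is the textbook method for a small dense system A x = b, not the structured variational algorithm itself; it converges, for instance, when A is strictly diagonally dominant.

```python
def gauss_seidel(A, b, n_iter=50):
    """Solve A x = b by cyclically refitting one unknown at a time."""
    n = len(b)
    x = [0.0] * n
    for _ in range(n_iter):
        for i in range(n):
            # Residual after subtracting the contribution of all other unknowns,
            # using their most recent values.
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = residual / A[i][i]
    return x
```

With `A = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]` the iteration converges to x = (1/11, 7/11).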

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original

[Audio examples: missing, reconstructed and clean signals for piano and trumpet.]

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graph: for each component ν = 1 ... W, an indicator chain r^ν_0 ··· r^ν_k ··· r^ν_K and a state chain θ^ν_0 ··· θ^ν_k ··· θ^ν_K; all components feed the observations y_k, ..., y_K.]

• Generalises Source-filter models

Harmonic model with changepoints

$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$

$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$

$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \mathrm{blkdiag}\left(G_\omega, G_\omega^2, \dots, G_\omega^H\right)^N \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$

damping factor 0 < ρ_k < 1, frame length N and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density p(r_k, θ_k|y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized
⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graph: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0 over the chain θ_0 ... θ_5 with observations y_1 ... y_5.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {“mute”, “sound”} × M states

[Graph: chains r_0 ... r_T, m_0 ... m_T and s_0 ... s_T with observations y_1 ... y_T.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k|y_{1:k})

[Figure: signal waveform (1000 samples) and the filtered pitch label estimate below it.]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: result of exact inference on a signal of about 3500 samples, with an audio example.]

Tracking Pitch Variations

• Allow m to change with k

[Figure: tracked pitch over 500 frames.]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter, i.e. Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll (frequency index ν against time k) above the observed signal x_k.]

• Each latent changepoint process ν = 1 ... W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll”

[Audio examples: (S1) (S2) (S3).]

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \ \dots \ C_i \ \dots \ C_W \,]$

[Graph: indicators r_1 ... r_W are parents of s_1 ... s_W, which are all parents of x.]

• Generalized Linear Model – Columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
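The 2^W mixture can be made concrete by brute-force enumeration for small W. The sketch below simplifies the model to a scalar observation x with scalar basis weights c_i and a common source variance `sigma2`, which is an assumption made here to avoid matrix algebra; the function name and arguments are illustrative. For each on/off configuration r the sources integrate out analytically, giving x|r ∼ N(0, R + Σ_i r_i c_i² σ²).

```python
import itertools
import math


def posterior_over_indicators(x, c, sigma2, R, pi_on):
    """Exact posterior p(r | x) over all 2**W on/off configurations."""
    W = len(c)
    weights = {}
    for r in itertools.product((0, 1), repeat=W):
        # Marginal variance of x once the active sources are integrated out.
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, c))
        prior = math.prod(pi_on if ri else 1.0 - pi_on for ri in r)
        likelihood = math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)
        weights[r] = prior * likelihood
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}
```

The dictionary has 2^W entries, so the loop doubles in cost with every additional component, which is the intractability noted in the bullets.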

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$   (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$   (Observation)

[Graph: for each ν = 1 ... W, an indicator chain r^ν_0 ··· r^ν_k ··· r^ν_K and a state chain s^ν_0 ··· s^ν_k ··· s^ν_K; all chains feed the observations y_k, ..., y_K.]

Synthetic Data

[Figure: synthetic data; signal x and piano roll (frequency index ν against time k), with an audio example (S).]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: spectrograms (Speech) and (Piano); axes t/sec and f/Hz, with audio examples.]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$

One channel source separation Gaussian source model

[Graph: for k = 1 ... K, variances v_{k,1} ... v_{k,N} are parents of sources s_{k,1} ... s_{k,N}, which are all parents of the observation x_k.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes’ theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
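For a single time-frequency atom with known variances, the posterior above reduces to a few lines of arithmetic. A small sketch, with the function name `separate` chosen for illustration:

```python
def separate(x, variances):
    """Posterior means and variances of the sources given their sum x."""
    total = sum(variances)
    kappas = [v / total for v in variances]                 # responsibilities
    means = [kappa * x for kappa in kappas]                 # kappa_n * x
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars
```

With x = 2 and variances (1, 3) the responsibilities are (0.25, 0.75) and the posterior means are (0.5, 1.5). The means always add up to the observation, so the estimate redistributes x among the sources in proportion to their prior variances.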

One channel source separation Poisson source model

[Graph: for k = 1 ... K, intensities v_{k,1} ... v_{k,N} are parents of sources s_{k,1} ... s_{k,N}, which are all parents of the observation x_k.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)

Gamma G(x a b) and Inverse Gamma IG(x a b)

[Figure, left: Gamma densities p(x) on 0 ≤ x ≤ 5 for a = 0.9, 1, 1.3, 2 with b = 1. Right: inverse Gamma densities for (a = 1, b = 1), (a = 1, b = 0.5), (a = 2, b = 1).]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a \log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
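The two definitions translate directly into log-density functions. The sketch below follows the slide's parameterisation, in which z is a scale for the Gamma and an inverse scale for the inverse Gamma; the function names are illustrative.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

The two are related by a change of variables: if 1/x follows G(a, z) then x follows IG(a, z), so log IG(x; a, z) = log G(1/x; a, z) − 2 log x.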

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$

[Graph: chain z_1 ··· v_{k−1}, z_k, v_k, z_{k+1} ···, with edges labelled a_z, a, a_z.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
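Draws from the chain follow by ancestral sampling. This is a sketch assuming the IG(x; a, z) convention in which 1/x is Gamma distributed with shape a and scale z; the starting value `z1` is a hypothetical choice, since the initial distribution is not given here.

```python
import random


def sample_inverse_gamma(rng, shape, z):
    """Draw x ~ IG(x; shape, z): 1/x is Gamma with the given shape and scale z."""
    return 1.0 / rng.gammavariate(shape, z)


def sample_chain(K, a, a_z, z1=1.0, seed=0):
    """Return (z_1..z_K, v_1..v_K) drawn from the inverse Gamma Markov chain."""
    rng = random.Random(seed)
    z, v = [z1], []
    for k in range(K):
        v.append(sample_inverse_gamma(rng, a, z[k] / a))        # v_k | z_k
        z.append(sample_inverse_gamma(rng, a_z, v[k] / a_z))    # z_{k+1} | v_k
    return z[:K], v
```

Each v_k is close to 1/z_k and each z_{k+1} is close to 1/v_k, so consecutive variances are positively correlated; larger shape parameters tighten the coupling.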

Gamma Chains typical draws

[Figure: three draws of log v_k against k (1 to 1000), for (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains with changepoints typical draws

[Figure: three draws of log v_k against k (1 to 1000) with changepoints, for (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (Pairwise)

[Graph: chain z_1 ··· v_{k−1}, z_k, v_k, z_{k+1} ···, with edges labelled a_z, a, a_z.]

$\phi^z_k = \exp((a_z + a + 1) \log z_k^{-1})$   (Singletons)

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$   (Singletons)

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                      s1                    s2
              SDR    SIR    SAR     SDR    SIR    SAR
VB            -78   -622    453    -235    184   -225
Gibbs        -846   -753    693    -404   1459   -383
GibbsEM      -774   -619    462    -114   1662   -097
Pre-trained   -64   -539    695      38   1639    414
Oracle        121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrogram panels Xorg, Shor, Sver; axes: Time (τ), Frequency Bin (ν). Two sets of audio examples: (Original), (Hor), (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν versus Frequency ν (Hz), from 0 to 4000 Hz; vertical range −12 to 0.]

Chord Detection

[Figure, top panel: "MDCT of piano chord 41485156"; axes: Time τ (s), Frequency (Hz), from 0 to 4000 Hz. Bottom panel: log Σ_ν v_{ν,j,τ} over Time τ (s) and MIDI note j, from 40 to 65.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

    \lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \quad n = 1, \dots, N

    v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))

    s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})

    x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M

    a_m \sim \mathcal{N}(a_m;\, \cdots)

    r_m \sim \mathcal{IG}(r_m;\, \cdots)

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the three sources (Speech), (Piano), (Guitar) and of the (Mix); axes: t/sec, f/Hz. An audio example accompanies each panel.]

Reconstructions

[Figure: spectrograms of the reconstructed sources (Speech), (Piano), (Guitar); axes: t/sec, f/Hz. An audio example accompanies each panel.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of the parameters a, λ and r over 2000 epochs.]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with score position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lines mark 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels showing p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level: p(x_1), p(x_2), p(x_3), p(x_4); at k = 5, with observation y_5 = 0, p(x_5 | y_{1:5}); at k = 10, p(x_10).]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity µ_k as a function of bar position m_k, from 0 to 1000, for a Triplet Rhythm and a Duplet Rhythm.]
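The bar pointer slides can be illustrated with a small HMM filter. This is only a sketch under stated assumptions, not the model of Whiteley, Cemgil and Godsill: the grid sizes `M` and `N_TEMPO`, the tempo step probability `p_change`, the uniform prior and the intensity pattern `mu_m` are hypothetical values chosen for the example.

```python
import numpy as np
from math import lgamma

M, N_TEMPO = 16, 3    # number of bar positions and tempo levels (assumed)
p_change = 0.1        # assumed probability of moving one tempo level up or down

def transition_matrix():
    """T[i, j] = p(x_next = j | x = i) for states x = (m, n) stored as i = m * N_TEMPO + n."""
    S = M * N_TEMPO
    T = np.zeros((S, S))
    for m in range(M):
        for n in range(N_TEMPO):
            i = m * N_TEMPO + n
            m_next = (m + n + 1) % M      # the position advances by n + 1 cells per frame
            for dn, p in ((-1, p_change), (0, 1 - 2 * p_change), (1, p_change)):
                n_next = min(max(n + dn, 0), N_TEMPO - 1)   # clamp at the edges
                T[i, m_next * N_TEMPO + n_next] += p
    return T

# Assumed Poisson intensity: high on every fourth bar position, low elsewhere.
mu_m = np.where(np.arange(M) % 4 == 0, 3.0, 0.2)
mu = np.repeat(mu_m, N_TEMPO)             # the intensity depends on m only

def poisson_loglik(y):
    """log p(y | x) for every state x, with y a non-negative integer count."""
    return y * np.log(mu) - mu - lgamma(y + 1)

def forward_filter(ys):
    """Return the filtering distributions p(x_k | y_{1:k}) as rows of an array."""
    T = transition_matrix()
    alpha = np.full(M * N_TEMPO, 1.0 / (M * N_TEMPO))   # uniform prior p(x_1)
    out = []
    for k, y in enumerate(ys):
        if k > 0:
            alpha = T.T @ alpha                   # predict
        alpha = alpha * np.exp(poisson_loglik(y)) # update with the observation
        alpha = alpha / alpha.sum()
        out.append(alpha)
    return np.array(out)

posterior = forward_filter([3, 0, 0, 0, 2, 0, 0, 0])
```

Each row of `posterior` can be reshaped to `(M, N_TEMPO)` to obtain a picture over bar position and tempo level similar in spirit to the p(x_k) panels.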

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels over Frame Index k: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), with the tempo axis in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets.]

Smoothing

[Figure: four panels over Frame Index k: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), with the tempo axis in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets.]

Time Signature

[Figure: four panels: Observed Data (sample value versus time/s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), with the tempo axis in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4; horizontal axis of the last three panels: Frame Index k.]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score aligned with the audio signal x_t over time t.]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; and, for ν = 1, ..., W, v_{ν,1}, ..., v_{ν,K} and s_{ν,1}, ..., s_{ν,K}. The score position r_k moves through states 1, ..., 8 in the example.]

    v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau)))

Score-Performance matching - Signal model

[Figure: log σ_ν versus Frequency ν (Hz), from 0 to 4000 Hz; vertical range −12 to 0.]

Score-Performance matching

[Figure: Spectrogram Data (Time/s versus Frequency/Hz) and MIDI Data (Score position versus MIDI note).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels over Time (s): log p(r_τ | s_τ) by MIDI note number; log λ, computed from Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar) with Frequency (Hz) from 0 to 4000.]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; Energy compaction

– Practical, can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


State Space Parametrisation


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis, ...

[Figure: audio waveform, samples 0 to 500.]

Audio Interpolation

    p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\, p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)

    H \equiv (parameters, hidden states)

[Graphical model: H is the parent of x_{¬κ} (Missing) and x_κ (Observed).]

[Figure: audio waveform, samples 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band ν = 0, ..., W − 1, a chain s^ν_0, ..., s^ν_k, ..., s^ν_{K−1} with parameters A_ν, Q_ν; observations x_0, ..., x_k, ..., x_{K−1}.]

    s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A_\nu s^\nu_{k-1},\, Q_\nu)

    A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi \right)
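As a rough illustration of the state equation, the sketch below simulates a few bands in which A_ν is fixed to the rotation matrix (the mean of its prior, i.e. the limit Ψ → 0). The observation rule (sum of the first state component of every band plus white noise) and the values of `q` and `r` are assumptions made for the example; the slide does not specify them.

```python
import numpy as np

def rotation(omega):
    """2 x 2 rotation by angle omega, the prior mean of A_nu."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s], [s, c]])

def simulate_band(omega, K, q=1e-4, rng=None):
    """Simulate s_k = A s_{k-1} + N(0, q I) for one frequency band."""
    rng = np.random.default_rng(0) if rng is None else rng
    A = rotation(omega)
    s = np.zeros((K, 2))
    s[0] = [1.0, 0.0]                       # assumed initial state
    for k in range(1, K):
        s[k] = A @ s[k - 1] + np.sqrt(q) * rng.standard_normal(2)
    return s

def simulate_signal(omegas, K, r=1e-3, rng=None):
    """Assumed observation: x_k = sum over bands of the first state component + noise."""
    rng = np.random.default_rng(0) if rng is None else rng
    states = [simulate_band(w, K, rng=rng) for w in omegas]
    clean = sum(s[:, 0] for s in states)
    return clean + np.sqrt(r) * rng.standard_normal(K)

x = simulate_signal([0.1, 0.25, 0.4], K=500)
```

With `q = 0` each band is a pure sinusoid of constant amplitude; a positive `q` lets amplitude and phase drift slowly.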

Inference: Structured Variational Bayes

[Graphical model of the approximation: for each α ∈ C, factors q(A_α), q(Q_α) and a chain \prod_k q(s^\alpha_k \mid s^\alpha_{k-1}); factor q(x_k) for the observations.]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
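The "subtract the other bands, fit the residual, iterate" idea can be sketched on a plain least-squares problem, where cyclic residual fitting is exactly Gauss-Seidel applied to the normal equations. This is an analogy only: the basis matrix `B` and the one-column-per-band setup are assumptions, not the structured variational update itself.

```python
import numpy as np

def backfit(B, x, n_sweeps=200):
    """Cyclic residual fitting for min_w ||x - B w||^2, one column (band) at a time.

    Each inner step equals one Gauss-Seidel update on (B^T B) w = B^T x.
    """
    n_bands = B.shape[1]
    w = np.zeros(n_bands)
    for _ in range(n_sweeps):
        for a in range(n_bands):
            # residual after subtracting the prediction of all other bands
            resid = x - B @ w + B[:, a] * w[a]
            # least-squares fit of band a to that residual
            w[a] = (B[:, a] @ resid) / (B[:, a] @ B[:, a])
    return w

rng = np.random.default_rng(0)
B = rng.standard_normal((20, 4))
x = rng.standard_normal(20)
w_iterative = backfit(B, x)
w_direct = np.linalg.lstsq(B, x, rcond=None)[0]
```

On a well-conditioned problem such as this one, the iterative and the direct solutions should agree closely.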

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for ν = 1, ..., W, chains r^ν_0, ..., r^ν_k, ..., r^ν_K and θ^ν_0, ..., θ^ν_k, ..., θ^ν_K; observations y_k, ..., y_K.]

• Generalises Source-filter models

Harmonic model with changepoints

    r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}

    \theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}

    y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)

    A = \mathrm{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)^N

    G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}

damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized

⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, ..., θ_5; observations y_1, ..., y_5.]

• The same structure can be exploited for the MMAP problem \arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})

⇒ Trajectories r^{(i)}_{1:k} which are dominated in terms of conditional evidence p(y_{1:k}, r^{(i)}_{1:k}) can be discarded without destroying optimality
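The polynomial growth can be made concrete with a toy scalar model in which every mixture component is indexed by the time of the most recent changepoint, so that step k needs only k components. The model below (independent Bernoulli changepoints, random-walk state, parameters `p_cp`, `q`, `S`, `R`) is an assumption for illustration and much simpler than the harmonic model of the talk.

```python
import numpy as np

def changepoint_filter(ys, p_cp=0.05, q=0.01, S=4.0, R=0.1):
    """Exact filtering for an assumed scalar changepoint model.

    r_k ~ Bernoulli(p_cp); theta_k = theta_{k-1} + N(0, q) if r_k = 0,
    theta_k ~ N(0, S) if r_k = 1; y_k = theta_k + N(0, R).
    The first step is treated as a changepoint.
    """
    logw = np.array([0.0])     # log weights, one per "last changepoint" time
    mean = np.array([0.0])     # component means
    var = np.array([S])        # component variances
    sizes, p_change = [], []
    for k, y in enumerate(ys):
        if k > 0:
            # every old component continues; one new component models a reset
            total = np.logaddexp.reduce(logw)
            logw = np.append(logw + np.log1p(-p_cp), total + np.log(p_cp))
            mean = np.append(mean, 0.0)
            var = np.append(var + q, S)
        # Kalman update of every component with the scalar observation y
        s2 = var + R
        logw = logw - 0.5 * (np.log(2 * np.pi * s2) + (y - mean) ** 2 / s2)
        gain = var / s2
        mean = mean + gain * (y - mean)
        var = (1.0 - gain) * var
        logw = logw - np.logaddexp.reduce(logw)       # normalise
        sizes.append(len(logw))
        # the last component is the one whose changepoint happened at step k
        p_change.append(float(np.exp(logw[-1])))
    return np.array(sizes), np.array(p_change)

rng = np.random.default_rng(0)
ys = np.concatenate([2.0 + 0.1 * rng.standard_normal(50),
                     -2.0 + 0.1 * rng.standard_normal(50)])
sizes, p_change = changepoint_filter(ys)
```

Here `sizes` grows linearly with k, and `p_change[k]` is the filtered probability p(r_k = 1 | y_{1:k}). In practice components with negligible weight would be pruned.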

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the “mute”, “sound” × M states

[Graphical model: chains r_0, ..., r_T; m_0, ..., m_T; s_0, ..., s_T; observations y_1, ..., y_T.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels over sample index 100 to 1000: the audio signal and the filtered pitch estimate.]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: transcription result over samples 500 to 3500, labelled "Exact inference"; audio example (S).]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over samples 50 to 500.]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano roll (frequency index ν versus time k) above the signal x_k.]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators r_{1:W,1:K} encode a latent “piano roll”. Audio examples: (S1), (S2), (S3)

Single time slice - Bayesian Variable Selection

    r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})

    s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)

    x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W},\, R)

    C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]

[Graphical model: indicators r_1, ..., r_W are parents of s_1, ..., s_W, which are parents of x.]

• Generalized Linear Model – Columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
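For small W the mixture of 2^W Gaussians can be enumerated directly, which also shows why the approach does not scale. The sketch assumes scalar coefficients s_i with prior variance `sigma2`, isotropic noise with variance `r_noise` and a prior on-probability `pi_on`; these parameter names and values are hypothetical.

```python
import itertools
import numpy as np

def exact_posterior(x, C, pi_on=0.2, sigma2=1.0, r_noise=0.1):
    """Posterior p(r | x) over all 2^W on/off configurations, by enumeration."""
    N, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    logp = np.empty(len(configs))
    for idx, r in enumerate(configs):
        r = np.array(r)
        # covariance of x once the active coefficients are integrated out
        Cr = C[:, r == 1]
        cov = sigma2 * Cr @ Cr.T + r_noise * np.eye(N)
        _, logdet = np.linalg.slogdet(cov)
        quad = x @ np.linalg.solve(cov, x)
        loglik = -0.5 * (N * np.log(2 * np.pi) + logdet + quad)
        n_on = r.sum()
        logprior = n_on * np.log(pi_on) + (W - n_on) * np.log(1 - pi_on)
        logp[idx] = loglik + logprior
    logp -= np.logaddexp.reduce(logp)          # normalise over configurations
    return configs, np.exp(logp)

rng = np.random.default_rng(1)
C = rng.standard_normal((8, 4))
x = C @ np.array([3.0, 0.0, 0.0, -3.0]) + 0.1 * rng.standard_normal(8)
configs, post = exact_posterior(x, C)
inclusion = np.array(configs).T @ post         # p(r_i = on | x) for each basis vector
```

The loop runs 2^W times, so W = 4 means 16 terms while W = 88 piano keys would already be far out of reach.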

Factorial Switching State space model

    r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})

    \theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)

    r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu}))    (Changepoint indicator)

    \theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu},\, Q_\nu(r_k))    (Latent state)

    y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\theta_{k,1:W},\, R)    (Observation)

[Graphical model: for ν = 1, ..., W, chains r^ν_0, ..., r^ν_k, ..., r^ν_K and s^ν_0, ..., s^ν_k, ..., s^ν_K; observations y_k, ..., y_K.]

Synthetic Data

[Figure: synthetic example with panels indexed by frequency ν and time k, including the signal x; audio example (S).]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

    x(t) = \sum_k s_k \phi_k(t)

[Figure: spectrograms of (Speech) and (Piano); axes: t/sec, f/Hz. An audio example accompanies each panel.]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

    X_{\nu\tau} = W_{\nu j} S_{j\tau}    (summation over the repeated index j)

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

    X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

    p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)

One channel source separation: Gaussian source model

[Graphical model: for k = 1, ..., K, variances v_{k,1}, ..., v_{k,N} are parents of sources s_{k,1}, ..., s_{k,N}, which are parents of x_k.]

    s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})

    x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}

• Straightforward application of Bayes' theorem yields

    p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))

    \kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}    (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
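The posterior formula translates almost line by line into array code. In this sketch the function name `posterior_sources` and the array layout (one row per atom k, one column per source n) are choices made for the example.

```python
import numpy as np

def posterior_sources(x, v):
    """Posterior mean and variance of each source coefficient given the mixture.

    x: array of shape (K,), observed mixture coefficients x_k.
    v: array of shape (K, N), prior variances v_{k,n}.
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return mean, var

x = np.array([1.0, -2.0, 0.5])
v = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [0.1, 0.9]])
mean, var = posterior_sources(x, v)
```

Because the responsibilities sum to one over n, the posterior means of the sources always add up to the observed coefficient, which is the familiar Wiener-filter behaviour.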

One channel source separation: Poisson source model

[Graphical model: for k = 1, ..., K, intensities v_{k,1}, ..., v_{k,N} are parents of sources s_{k,1}, ..., s_{k,N}, which are parents of x_k.]

    s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})

    x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}

• This is the generative model for the NMF when we write

    v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}    (Template × Excitation)
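Under the Poisson model, maximising the likelihood of x over templates and excitations is the same as minimising the generalised KL divergence between X and the product of the two factors. A minimal sketch using the standard multiplicative updates for that objective follows; the function names, the initialisation and the small constant `eps` are assumptions for the example, and this is a point-estimate procedure rather than the Bayesian treatment of the talk.

```python
import numpy as np

def kl_div(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V)."""
    return np.sum(X * np.log((X + eps) / (V + eps)) - X + V)

def nmf_kl(X, J, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ Tm @ E with non-negative factors."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    Tm = rng.random((F, J)) + 0.1      # templates t_{nu, j}
    E = rng.random((J, T)) + 0.1       # excitations e_{j, tau}
    history = []
    for _ in range(n_iter):
        V = Tm @ E + eps
        Tm *= ((X / V) @ E.T) / (E.sum(axis=1) + eps)
        V = Tm @ E + eps
        E *= (Tm.T @ (X / V)) / (Tm.sum(axis=0)[:, None] + eps)
        history.append(kl_div(X, Tm @ E))
    return Tm, E, history

rng = np.random.default_rng(0)
X = rng.poisson(20 * rng.random((12, 3)) @ rng.random((3, 20))).astype(float)
Tm, E, history = nmf_kl(X, J=3)
```

The list `history` records the divergence after each sweep and can be plotted to check convergence.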

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) for x from 0 to 5, one curve per setting of a and b.]

    \mathcal{G}(x;\, a, z) \equiv \exp\left( (a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a) \right)

    \mathcal{IG}(x;\, a, z) \equiv \exp\left( (a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a) \right)

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

    v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)

    z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)

[Graphical model: chain z_1, ..., v_{k−1}, z_k, v_k, z_{k+1}, ... with edges labelled alternately a_z and a.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
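Draws from such a chain are easy to generate once the parametrisation is fixed. The sketch assumes the convention IG(x; a, z) ∝ x^{-(a+1)} exp(-1/(z x)), under which 1/x is Gamma distributed with shape a and scale z, and it assumes a fixed start value `z1`, which the slide does not specify.

```python
import numpy as np

def sample_ig(rng, a, z):
    """Draw x with 1/x ~ Gamma(shape a, scale z) (assumed IG convention)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_chain(K, a=10.0, a_z=10.0, z1=1.0, seed=0):
    """Sample v_1..v_K and z_1..z_K from the inverse Gamma-Markov chain."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K)
    z[0] = z1                                   # assumed fixed start value
    for k in range(K):
        v[k] = sample_ig(rng, a, z[k] / a)      # v_k | z_k
        if k + 1 < K:
            z[k + 1] = sample_ig(rng, a_z, v[k] / a_z)   # z_{k+1} | v_k
    return v, z

v, z = sample_chain(1000)
log_v = np.log(v)     # the quantity usually plotted against k
```

Larger values of `a` and `a_z` tie neighbouring variables more tightly, so the path of `log_v` changes more slowly.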

Gamma Chains: typical draws

[Figure: three panels of log v_k versus k (k up to 1000, vertical range −20 to 20), titled "a = 10 az = 10", "a = 10 az = 4" and "a = 10 az = 40".]

Gamma Chains with changepoints: typical draws

[Figure: three panels of log v_k versus k (k up to 1000, vertical range −20 to 20), titled "a = 10 az = 10", "a = 10 az = 4" and "a = 10 az = 40".]

Gamma Chains

• The joint can be written as product of singleton and pairwise potentials of form

    \psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})    (Pairwise)

    \phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})    (Singletons)

[Graphical model: chain z_1, ..., v_{k−1}, z_k, v_k, z_{k+1}, ... with edges labelled alternately a_z and a.]

Gamma Fields

• The joint can be written as product of singleton and pairwise potentials

    \psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})    (Pairwise)

    \phi_i = \exp\left( \left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1} \right)    (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ...; observation factors p(y_1 | v_1), p(y_2 | v_2).]

• VB

    q^{(\tau)}(v_k) \leftarrow \exp\left( \phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})} \right)

• Gibbs

    v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ...; observation factors p(y_1 | v_1), p(y_2 | v_2).]

• VB

    q^{(\tau)}(z_k) \leftarrow \exp\left( \phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)} \right)

• Gibbs

    z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})
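The Gibbs updates can be made concrete for a single chain with zero-mean Gaussian observations y_k ∼ N(0, v_k). Under the convention IG(x; a, z) ∝ x^{-(a+1)} exp(-1/(z x)), both full conditionals are inverse Gamma. The sketch below is an illustration only: the observation model, the fixed boundary value `v0` feeding z_1, the sweep order and all default settings are assumptions.

```python
import numpy as np

def gibbs_chain(y, a=10.0, a_z=10.0, v0=1.0, n_sweeps=200, burn_in=100, seed=0):
    """Gibbs sampler for v_{1:K} in an inverse Gamma-Markov chain with y_k ~ N(0, v_k).

    Returns the posterior mean of v estimated from the sweeps after burn-in.
    """
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.ones(K)
    z = np.ones(K)
    v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k: inverse Gamma, shape a_z + a
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = rate / rng.gamma(a_z + a)
            # v_k | z_k, z_{k+1}, y_k: inverse Gamma
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = rate / rng.gamma(shape)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)

rng = np.random.default_rng(0)
y = np.concatenate([0.1 * rng.standard_normal(60), 3.0 * rng.standard_normal(60)])
v_hat = gibbs_chain(y, n_sweeps=120, burn_in=60)
```

An inverse Gamma draw with a given shape and rate is obtained as `rate / rng.gamma(shape)`. The estimate `v_hat` can then be plugged into the Gaussian source posterior as the prior variance.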

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: log-magnitude spectrogram panels titled "X" (Noisy), "Xorg" (Original), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997"; color scale from −14 to 0.]

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

[Figure: spectrograms (t/sec against f/Hz) of the three sources (Speech), (Piano), (Guitar) and of the (Mix).]


Reconstructions

[Figure: spectrograms (t/sec against f/Hz) of the reconstructed sources (Speech), (Piano), (Guitar).]


Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: sampler traces of a, λ and r against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states; horizontal axis: bar position $m_k$ = 1, ..., 8; vertical axis: tempo level $n_k$ = 1, 2, 3; the bar lengths for 3/4 time and 4/4 time are marked.]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 | x_1)$; Bar Position (1 to 8) against Tempo Level (1 to 5).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$; Bar Position against Tempo Level.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$; Bar Position against Tempo Level.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$; Bar Position against Tempo Level.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$; Bar Position against Tempo Level.]

Bar position Pointer - k = 5

[Figure: $p(x_5 | y_{1:5})$ with observation $y_5 = 0$; Bar Position against Tempo Level.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$; Bar Position against Tempo Level.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k | x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and for a Duplet Rhythm.]
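A minimal sketch of filtering in a bar pointer style HMM with Poisson observations. The slides do not spell out the transition rule or the intensity function, so everything concrete here is an assumption made for illustration: the bar position advances by the tempo level modulo the bar length, the tempo level changes by one step with probability `p_change`, and the intensity `mu` is high on every fourth bar position. The names `bar_pointer_transition`, `forward_filter` and the toy onset counts `y_counts` are hypothetical.

```python
import numpy as np
from math import lgamma

def bar_pointer_transition(M, N, p_change=0.1):
    """T[i, j] = p(x_{k+1} = j | x_k = i), state index i = m * N + (n - 1)."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M  # assumed rule: position advances by the tempo level
            moves = ((n, 1.0 - p_change), (n - 1, p_change / 2), (n + 1, p_change / 2))
            for n_next, p in moves:
                n_next = min(max(n_next, 1), N)  # clip at the tempo boundaries
                T[i, m_next * N + (n_next - 1)] += p
    return T

def poisson_loglik(y, mu):
    """log PO(y; mu) for one count y and a vector of intensities mu."""
    return y * np.log(mu) - mu - lgamma(float(y) + 1.0)

def forward_filter(y, T, mu):
    """Return p(x_k | y_{1:k}) for every k, starting from a uniform p(x_1)."""
    S = T.shape[0]
    alpha = np.full(S, 1.0 / S)
    filtered = np.empty((len(y), S))
    for k, yk in enumerate(y):
        if k > 0:
            alpha = T.T @ alpha                         # predict
        alpha = alpha * np.exp(poisson_loglik(yk, mu))  # update with p(y_k | x_k)
        alpha /= alpha.sum()
        filtered[k] = alpha
    return filtered

M, N = 16, 3                                            # bar positions, tempo levels
T = bar_pointer_transition(M, N)
mu_m = 0.05 + 2.0 * (np.arange(M) % 4 == 0)             # assumed intensity per position
mu = np.repeat(mu_m, N)                                 # intensity depends on m only
y_counts = np.array([2, 0, 0, 0, 1, 0, 0, 0, 3, 0, 0, 0])
filtered = forward_filter(y_counts, T, mu)
```

Reshaping a row of `filtered` to `(M, N)` gives a Bar Position by Tempo Level grid of the kind shown in the filtering figures.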

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains n0, ..., n3; θ0, ..., θ3; m0, ..., m3; r0, ..., r3; intensities λ1, λ2, λ3 and observations y1, y2, y3.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure, four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k | y_{1:k})$; $\log p(n_k | y_{1:k})$ in quarter notes per min; $p(r_k | y_{1:k})$ over Triplets and Duplets.]

Smoothing

[Figure, four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k | y_{1:K})$; $\log p(n_k | y_{1:K})$ in quarter notes per min; $p(r_k | y_{1:K})$ over Triplets and Duplets.]

Time Signature

[Figure, four panels: Observed Data (sample value against time in s); $\log p(m_k | z_{1:K})$; $\log p(n_k | z_{1:K})$ in quarter notes per min; $p(\theta_k | z_{1:K})$ over 4/4 and 3/4; the last three against Frame Index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: a musical score and the audio signal $x_t$ against time t.]

Score-Performance matching - Graphical Model

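[Figure: graphical model with score positions t1, t2, ..., tK; indicators r1, r2, ..., rK; λ1, λ2, ..., λK; variances vν1, vν2, ..., vνK and coefficients sν1, sν2, ..., sνK in a plate over ν = 1, ..., W; below it a score whose positions 1 to 8 are indexed by $r_k$.]

\[ v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))) \]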

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν (Hz), from 0 to 4000 Hz.]

Score-Performance matching

[Figure, two panels: Spectrogram Data (Time in s against Frequency in Hz, 0 to 4000 Hz); MIDI Data (Score position against MIDI note).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels against Time (s): $\log p(r_\tau | s_\tau)$ over MIDI note number; $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as $\log \lambda$; MDCT of audio (source: Daniel-Ben Pienaar), Frequency 0 to 4000 Hz.]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Audio Restoration/Interpolation

• Estimate missing samples given observed ones

• Restoration, concatenative expressive speech synthesis

[Figure: waveform with missing segments; sample index 0 to 500.]

Audio Interpolation

\[ p(x_{\neg\kappa} | x_\kappa) \propto \int dH \, p(x_{\neg\kappa} | H) \, p(x_\kappa | H) \, p(H) \]

H ≡ (parameters, hidden states)

[Figure: graphical model in which H generates the missing samples $x_{\neg\kappa}$ and the observed samples $x_\kappa$; below it a waveform with missing segments, sample index 0 to 500.]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Figure: graphical model with parameters $A_\nu$, $Q_\nu$ and latent states $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_{K-1}$ in a plate over ν = 0, ..., W − 1, and observations $x_0, \dots, x_k, \dots, x_{K-1}$.]

\[ s^\nu_k \sim \mathcal{N}(s^\nu_k; A_\nu s^\nu_{k-1}, Q_\nu), \qquad A_\nu \sim \mathcal{N}\left( \begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix}, \Psi \right) \]

Inference: Structured Variational Bayes

[Figure: structured approximation with factors $q(A_\alpha)$, $q(Q_\alpha)$ and $\prod_k q(s^\alpha_k | s^\alpha_{k-1})$ for each band α ∈ C, and $q(x_k)$.]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
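The "subtract the other components, refit one, iterate" pattern is the Gauss-Seidel iteration named on the slide. The sketch below shows that iteration on a small generic linear system; it is not the phase vocoder inference itself, and the matrix `A_sys` and vector `b_sys` are made-up illustrative values.

```python
import numpy as np

def gauss_seidel(A, b, num_sweeps=100):
    """Solve A x = b by Gauss-Seidel sweeps (A assumed symmetric positive definite)."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(num_sweeps):
        for i in range(n):
            # Residual of equation i once the contribution of all other
            # coordinates (at their most recent values) is subtracted.
            residual = b[i] - A[i, :] @ x + A[i, i] * x[i]
            # Refit coordinate i to that residual, then move to the next one.
            x[i] = residual / A[i, i]
    return x

A_sys = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
b_sys = np.array([1.0, 2.0, 3.0])
x_gs = gauss_seidel(A_sys, b_sys)
```

Each inner step uses the newest values of the other coordinates, which is what distinguishes Gauss-Seidel from a Jacobi sweep.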

Restoration

• Piano

– Signal with missing samples (37)

– Reconstruction, 768 dB improvement

– Original

• Trumpet

– Signal with missing samples (37)

– Reconstruction, 710 dB improvement

– Original


Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Figure: graphical model with latent chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $\theta^\nu_0, \dots, \theta^\nu_k, \dots, \theta^\nu_K$ in a plate over ν = 1, ..., W, and observations $y_k, \dots, y_K$.]

• Generalises Source-filter models

Harmonic model with changepoints

\[ r_k | r_{k-1} \sim p(r_k | r_{k-1}), \qquad r_k \in \{0, 1\} \]

\[ \theta_k | \theta_{k-1}, r_k \sim \underbrace{[r_k = 0] \, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1] \, \mathcal{N}(0, S)}_{\text{new}} \]

\[ y_k | \theta_k \sim \mathcal{N}(C\theta_k, R) \]

\[ A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix} \]

with damping factor $0 < \rho_k < 1$, frame length N, and damped sinusoidal basis matrix C of size N × 2H
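A minimal simulation sketch of the harmonic model with changepoints. Several choices are assumptions made only for illustration: a single scalar damping factor `rho` stands in for $\rho_k$, the changepoint indicator is drawn independently at each frame (a special case of $p(r_k | r_{k-1})$), Q, S and R are isotropic with hypothetical variances, and row t of C is taken to read out the first coordinate of each harmonic after t sample steps. The function names are hypothetical.

```python
import numpy as np

def harmonic_matrices(omega, rho, H, N):
    """Build the frame transition A (2H x 2H) and the basis C (N x 2H)."""
    G = rho * np.array([[np.cos(omega), -np.sin(omega)],
                        [np.sin(omega),  np.cos(omega)]])
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        # Block h is G^h: rotation by h*omega, damped by rho^h.
        B[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G, h)
    A = np.linalg.matrix_power(B, N)            # advance by one frame of N samples
    pick = np.kron(np.eye(H), [[1.0, 0.0]])     # first coordinate of each harmonic
    C = np.vstack([(pick @ np.linalg.matrix_power(B, t)).sum(axis=0)
                   for t in range(N)])
    return A, C

def simulate_harmonic(K, omega=0.3, rho=0.999, H=3, N=16, p_change=0.05, seed=0):
    rng = np.random.default_rng(seed)
    A, C = harmonic_matrices(omega, rho, H, N)
    q_var, s_var, r_var = 1e-4, 1.0, 1e-2       # hypothetical variances for Q, S, R
    theta = rng.normal(0.0, np.sqrt(s_var), 2 * H)
    r = np.zeros(K, dtype=int)
    Y = np.zeros((K, N))
    for k in range(K):
        r[k] = rng.random() < p_change
        if r[k] == 1:
            theta = rng.normal(0.0, np.sqrt(s_var), 2 * H)               # "new"
        else:
            theta = A @ theta + rng.normal(0.0, np.sqrt(q_var), 2 * H)   # "reg"
        Y[k] = C @ theta + rng.normal(0.0, np.sqrt(r_var), N)
    return r, Y

A_h, C_h = harmonic_matrices(omega=0.3, rho=0.999, H=3, N=16)
r_sim, Y_sim = simulate_harmonic(K=50)
```

Concatenating the rows of `Y_sim` gives a damped harmonic waveform whose phase and amplitude are reset at frames where `r_sim` is 1.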

Exact Inference in switching state space models is intractable

• In general, exact inference is NP hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density

⇒ Number of Gaussian kernels to represent the exact filtering density $p(r_k, \theta_k | y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: When a changepoint occurs, the state vector θ is reinitialized

⇒ Number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Figure: graphical model with r1 = 1, r2 = 0, r3 = 0, r4 = 1, r5 = 0, states θ0, ..., θ5 and observations y1, ..., y5.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the “mute”, “sound” × M states

[Figure: graphical model with chains r0, r1, ..., rT; m0, m1, ..., mT; s0, s1, ..., sT and observations y1, ..., yT.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k | y_{1:k})$

[Figure, two panels against sample index (100 to 1000): the signal, and the filtered pitch label estimate.]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: transcription obtained by exact inference; sample index 500 to 3500.]

Tracking Pitch Variations

• Allow m to change with k

[Figure: tracked pitch against frame index (50 to 500).]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll (frequency ν against time k) above the signal $x_k$.]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”

Single time slice - Bayesian Variable Selection

\[ r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}}) \]

\[ s_i | r_i \sim [r_i = \text{on}] \, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}] \, \delta(s_i) \]

\[ x | s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R) \]

\[ C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,] \]

[Figure: graphical model with indicators r1, ..., rW, coefficients s1, ..., sW and observation x.]

• Generalized Linear Model – Columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
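For small W the $2^W$ mixture components can simply be enumerated, which makes the exponential cost concrete. The sketch below assumes scalar coefficients $s_i$ with a common variance `sigma2` in place of Σ, isotropic noise `noise_var` in place of R, and independent indicators with prior probability `pi_on`; all names and the toy data are hypothetical.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """log N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)

def indicator_posterior(x, C, pi_on, sigma2, noise_var):
    """Enumerate all 2^W on/off configurations r and return p(r | x)."""
    D, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = np.empty(len(configs))
    for idx, r in enumerate(configs):
        r = np.array(r)
        # With s integrated out, x | r is zero-mean Gaussian with covariance
        # C diag(r * sigma2) C^T + noise_var * I.
        cov = (C * (r * sigma2)) @ C.T + noise_var * np.eye(D)
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1.0 - pi_on))
        log_post[idx] = log_prior + log_gauss_zero_mean(x, cov)
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()

rng = np.random.default_rng(0)
D, W = 8, 4
C_basis = rng.normal(size=(D, W))
r_true = np.array([1, 0, 0, 1])
s_true = r_true * rng.normal(size=W)
x_obs = C_basis @ s_true + 0.1 * rng.normal(size=D)
configs, post = indicator_posterior(x_obs, C_basis, pi_on=0.3, sigma2=1.0, noise_var=0.01)
```

The list `configs` doubles in length with every additional basis vector, which is why this enumeration is only feasible for small W.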

Factorial Switching State space model

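\[ r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu}) \]

\[ \theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu) \]

\[ r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator} \]

\[ \theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \qquad \text{Latent state} \]

\[ y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \qquad \text{Observation} \]

[Figure: graphical model with chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$ in a plate over ν = 1, ..., W, and observations $y_k, \dots, y_K$.]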

Synthetic Data

[Figure: synthetic data example; panels over frequency index ν and time index k, together with the signal x.]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

\[ x(t) = \sum_k s_k \phi_k(t) \]

[Figure: spectrograms (t/sec against f/Hz) of (Speech) and (Piano).]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

\[ X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau} \]

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

\[ X_{\nu\tau} = [r_{\nu\tau} = 0] \, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1] \, S^{(1)}_{\nu\tau} \]

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

\[ p(s|v) \, p(v) = \left( \prod_k p(s_k | v_k) \right) p(v) \]

One channel source separation: Gaussian source model

[Figure: graphical model with variances $v_{k,1}, \dots, v_{k,N}$, source coefficients $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$ in a plate over k = 1, ..., K.]

\[ s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n}) \]

\[ x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n} \]

• Straightforward application of Bayes' theorem yields

\[ p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})) \]

\[ \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)} \]

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
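The posterior above is cheap to evaluate once the variances are known. A small sketch, assuming the variances `v_demo` are given (in the models that follow they are latent and must be inferred); the function name and the toy numbers are illustrative only.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior of each source coefficient given the mixture coefficient.

    x: (K,) observed coefficients x_k
    v: (K, N) variances v_{k,n}
    Returns responsibilities kappa, posterior means and posterior variances,
    each of shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # kappa_{k,n} = v_{k,n} / sum_n' v_{k,n'}
    mean = kappa * x[:, None]                  # kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # v_{k,n} (1 - kappa_{k,n})
    return kappa, mean, var

x_demo = np.array([2.0, -1.0, 0.5])
v_demo = np.array([[1.0, 3.0],
                   [2.0, 2.0],
                   [9.0, 1.0]])
kappa, mean, var = separate_gaussian(x_demo, v_demo)
```

Because the responsibilities sum to one over n, the posterior means of the sources add up exactly to the observed coefficient.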

One channel source separation: Poisson source model

[Figure: graphical model with intensities $v_{k,1}, \dots, v_{k,N}$, source coefficients $s_{k,1}, \dots, s_{k,N}$ and observation $x_k$ in a plate over k = 1, ..., K.]

\[ s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n}) \]

\[ x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n} \]

• This is the generative model for the NMF when we write

\[ v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)} \]
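Under the Poisson model the observation is Poisson with intensity $\sum_n t_{\nu,n} e_{\tau,n}$, so maximum likelihood estimation of templates and excitations is equivalent to minimising the generalised KL divergence between X and the product. One standard way to do this is the multiplicative update scheme of Lee and Seung, sketched below; this is a swapped-in textbook algorithm rather than anything specified on the slides, and the sizes, seeds and the small constant `EPS` are arbitrary.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)

def kl_divergence(X, V):
    """Generalised KL divergence D(X || V), summed over all entries."""
    return np.sum(X * np.log((X + EPS) / (V + EPS)) - X + V)

def nmf_kl(X, J, num_iters=200, seed=0):
    """Factorise X (F x T) as templates (F x J) times excitations (J x T)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    templates = rng.random((F, J)) + 0.1
    excitations = rng.random((J, T)) + 0.1
    history = []
    for _ in range(num_iters):
        V = templates @ excitations
        templates *= ((X / (V + EPS)) @ excitations.T) / excitations.sum(axis=1)
        V = templates @ excitations
        excitations *= (templates.T @ (X / (V + EPS))) / templates.sum(axis=0)[:, None]
        history.append(kl_divergence(X, templates @ excitations))
    return templates, excitations, history

# Toy data drawn from the Poisson model itself.
rng = np.random.default_rng(1)
true_t = rng.random((20, 3)) * 5.0
true_e = rng.random((3, 30)) * 5.0
X_counts = rng.poisson(true_t @ true_e).astype(float)
templates, excitations, history = nmf_kl(X_counts, J=3)
```

The divergence recorded in `history` should not increase from one iteration to the next, which is a convenient sanity check on an implementation.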

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x (0 to 5) for several settings of the shape a and scale b.]

\[ \mathcal{G}(x; a, z) \equiv \exp\left( (a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a) \right) \]

\[ \mathcal{IG}(x; a, z) \equiv \exp\left( (a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a) \right) \]

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

\[ v_k | z_k \sim \mathcal{IG}(v_k; a, z_k / a) \]

\[ z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z) \]

[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with edges labelled alternately by $a_z$ and $a$.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
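A short sketch of drawing from the chain, using the convention of the density definition IG(x; a, z) given earlier, under which 1/x is Gamma distributed with shape a and scale z. The starting value `z1`, the seed and the function names are assumptions for illustration.

```python
import numpy as np

def sample_inverse_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z) in the slides' convention: 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K and z_1..z_K from the inverse Gamma-Markov chain."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K)
    z[0] = z1                                   # assumed fixed starting value
    for k in range(K):
        v[k] = sample_inverse_gamma(rng, a, z[k] / a)
        if k + 1 < K:
            z[k + 1] = sample_inverse_gamma(rng, a_z, v[k] / a_z)
    return v, z

v_draw, z_draw = sample_gamma_chain(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v_draw)
lag1_corr = np.corrcoef(log_v[:-1], log_v[1:])[0, 1]
```

In this parameterisation each step multiplies $v_k$ by a ratio of two normalised Gamma draws, so $\log v_k$ behaves like a random walk: larger shape parameters give smaller steps (stronger coupling), and unequal $a$ and $a_z$ give a systematic drift.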

Gamma Chains: typical draws

[Figure: three draws of $\log v_k$ against k (100 to 1000) for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40.]

Gamma Chains with changepoints: typical draws

[Figure: three draws of $\log v_k$ against k (100 to 1000) for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40.]

Gamma Chains

• The joint can be written as product of singleton and pairwise potentials of form

\[ \psi_{k,k} = \exp(-a z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)} \]

\[ \phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1}) \qquad \text{(Singletons)} \]

[Figure: chain $z_1, \dots, v_{k-1}, z_k, v_k, z_{k+1}, \dots$ with edges labelled alternately by $a_z$ and $a$.]

Gamma Fields

• The joint can be written as product of singleton and pairwise potentials

\[ \psi_{ij} = \exp(-a_{ij} \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)} \]

\[ \phi_i = \exp\left( \left( \sum_j a_{ij} + 1 \right) \log \xi_i^{-1} \right) \qquad \text{(Singletons)} \]

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Figure: factor graph of the chain ψ01, z1, ψ11, v1, ψ12, z2, ψ22, v2, ... with observation factors p(y1|v1), p(y2|v2).]

• VB

\[ q^{(\tau)}(v_k) \leftarrow \exp\left( \phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k) \, q^{(\tau)}(z_{k+1})} \right) \]

• Gibbs

\[ v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k | v_k) \, \psi_{k,k}(z_k^{(\tau)}) \, \psi_{k,k+1}(z_{k+1}^{(\tau)}) \]

VB or Gibbs

[Figure: factor graph of the chain ψ01, z1, ψ11, v1, ψ12, z2, ψ22, v2, ... with observation factors p(y1|v1), p(y2|v2).]

• VB

\[ q^{(\tau)}(z_k) \leftarrow \exp\left( \phi_k + \langle \log \psi_{k-1,k} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1}) \, q^{(\tau)}(v_k)} \right) \]

• Gibbs

\[ z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)}) \, \psi_{k,k}(v_k^{(\tau)}) \]
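A minimal Gibbs sampler sketch for a single gamma chain. It assumes a zero-mean Gaussian observation $y_k \sim \mathcal{N}(0, v_k)$ as the observation factor and a fixed boundary value `v0` feeding $z_1$; both are illustrative choices, as are the shape parameters and the toy data. With the IG(x; a, z) convention used for the chain, multiplying the relevant factors gives inverse Gamma full conditionals: $z_k$ has shape $a_z + a$ and rate $a_z / v_{k-1} + a / v_k$, and $v_k$ has shape $a + a_z + 1/2$ and rate $a / z_k + a_z / z_{k+1} + y_k^2 / 2$ (without the $a_z$ terms at the last index).

```python
import numpy as np

def gibbs_gamma_chain(y, a=2.0, a_z=2.0, v0=1.0, num_sweeps=200, seed=0):
    """Return the posterior mean of log v_k estimated by Gibbs sampling."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.ones(K)
    z = np.ones(K)
    burn_in = num_sweeps // 2
    log_v_sum = np.zeros(K)
    for sweep in range(num_sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k : inverse Gamma, drawn as rate / Gamma(shape, 1)
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = rate / rng.gamma(a_z + a)
        for k in range(K):
            # v_k | z_k, z_{k+1}, y_k : inverse Gamma
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = rate / rng.gamma(shape)
        if sweep >= burn_in:
            log_v_sum += np.log(v)
    return log_v_sum / (num_sweeps - burn_in)

# Toy data: a quiet segment followed by a loud one.
rng = np.random.default_rng(1)
y_toy = np.concatenate([rng.normal(0.0, 0.1, 100), rng.normal(0.0, 10.0, 100)])
log_v_mean = gibbs_gamma_chain(y_toy)
```

Given v, the z variables are conditionally independent of each other (and vice versa), so each sweep can update all z and then all v; the loops are kept explicit here for readability.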

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels: Noisy (X), Original (Xorg), and the reconstructions Xh (SNR 1998), Xv (SNR 2079), Xb (SNR 1968), Xg (SNR 1997).]

Denoising – Music

[Figure: five spectrogram panels: Original, Noisy, PF (SNR 853), Gibbs (SNR 866), VB (SNR 208).]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: spectrograms, Time (τ) against Frequency Bin (ν), of the mixture Xorg and the two sources Shor and Sver.]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -474   -328    567     -158   1546   -137
Gibbs         -45   -262    457      105   1246    161
GibbsEM      -423   -242    482      134   1313    185
Pre-trained  -404   -315    813      356   1144    464
Oracle        614   1716    658     1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78   -622    453     -235    184   -225
Gibbs        -846   -753    693     -404   1459   -383
GibbsEM      -774   -619    462     -114   1662   -097
Pre-trained   -64   -539    695       38   1639    414
Oracle        121    329   1214     2113   3389   2137

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: grid of states with bar position m_k = 1, …, 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lines mark 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (score position, tempo level)

• Directed arcs denote state transitions with positive probability
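A minimal sketch of how such a state space and its transition arcs could be built. The update rule (the position advances by the current tempo level and wraps at the end of the bar; the tempo stays or moves by one level), the function name `bar_pointer_transitions` and the probability `p_keep` are illustrative assumptions, not details given on the slides. Positions are numbered from 0 here rather than from 1.

```python
from itertools import product

def bar_pointer_transitions(M=8, N=3, p_keep=0.8):
    """Sparse transition table T[x][x_next] for states x = (m, n).

    Assumed dynamics (hypothetical, for illustration only):
      m_next = (m + n) mod M     position advances by the tempo level
      n_next in {n-1, n, n+1}    tempo is kept with probability p_keep
    """
    states = list(product(range(M), range(1, N + 1)))
    T = {}
    for m, n in states:
        tempos = [n2 for n2 in (n - 1, n, n + 1) if 1 <= n2 <= N]
        m_next = (m + n) % M
        row = {}
        for n2 in tempos:
            if len(tempos) == 1:
                p = 1.0
            elif n2 == n:
                p = p_keep
            else:
                # Share the remaining mass among the neighbouring tempo levels.
                p = (1.0 - p_keep) / (len(tempos) - 1)
            row[(m_next, n2)] = p
        T[(m, n)] = row
    return states, T

states, T = bar_pointer_transitions()
print(len(states), T[(6, 3)])
```

Each row of `T` lists only the arcs with positive probability, which mirrors the sparse arc structure of the diagram.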

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1), each plotted over bar position (1–8) and tempo level (1–5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence: densities over bar position and tempo level at successive steps: p(x_1), p(x_2), p(x_3), p(x_4), then p(x_5 | y_{1:5}) with the observation y_5 = 0, and p(x_10)]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k as a function of bar position m_k (0–1000), one panel for a triplet rhythm and one for a duplet rhythm]
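The filtering densities shown on the preceding slides follow from the usual HMM forward recursion: predict with the transition model, then reweight by the Poisson likelihood. The sketch below is a generic version on a toy four-state chain; the transition probabilities, the intensities in `mu` and the observation sequence are made-up values, not those of the bar pointer model.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of count y under intensity mu > 0."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def forward_filter(prior, trans, mu, ys):
    """Return the filtering densities p(x_k | y_1..y_k) for k = 1..K.

    prior[i]    : p(x_0 = i)
    trans[i][j] : p(x_k = j | x_{k-1} = i)
    mu[j]       : Poisson intensity of the observation in state j
    """
    S = len(prior)
    alpha = list(prior)
    filtered = []
    for y in ys:
        # Predict: push the previous density through the transition model.
        pred = [sum(alpha[i] * trans[i][j] for i in range(S)) for j in range(S)]
        # Update: weight by the Poisson likelihood and renormalise.
        w = [pred[j] * poisson_pmf(y, mu[j]) for j in range(S)]
        total = sum(w)
        alpha = [wj / total for wj in w]
        filtered.append(alpha)
    return filtered

# Toy example (all numbers hypothetical): 4 positions, onsets expected at position 0.
S = 4
trans = [[0.0] * S for _ in range(S)]
for i in range(S):
    trans[i][(i + 1) % S] = 0.9   # advance to the next position
    trans[i][i] = 0.1             # or stay
mu = [3.0, 0.2, 0.2, 0.2]
filtered = forward_filter([0.25] * S, trans, mu, ys=[3, 0, 0, 0, 2])
```

Smoothing would add a backward pass over the same quantities; only the forward pass is sketched here.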

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

[Graphical model: chains n_k, θ_k, m_k and r_k for k = 0, …, 3, with intensities λ_k and observations y_k for k = 1, 2, 3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: filtering results over frame index k (0–450). Panels: observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo in quarter notes per minute (60–180); p(r_k | y_{1:k}) for triplets vs. duplets]

Smoothing

[Figure: smoothing results over frame index k (0–450). Panels: observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo in quarter notes per minute (60–180); p(r_k | y_{1:K}) for triplets vs. duplets]

Time Signature

[Figure: observed data (sample value vs. time, 0–12 s) and results over frame index k (100–500): log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo in quarter notes per minute; p(θ_k | z_{1:K}) for 4/4 vs. 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt shown together with the audio signal x_t over time t]

Score-Performance matching - Graphical Model

[Graphical model: nodes t_k, r_k and λ_k for k = 1, …, K and, for each frequency band ν = 1, …, W, nodes v_{ν,k} and s_{ν,k}; a score inset with positions 1–8 indexes r_k]

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σ_ν plotted against frequency ν (0–4000 Hz)]

Score-Performance matching

[Figure: spectrogram data (frequency 0–4000 Hz vs. time 0–14 s) and MIDI data (MIDI note 55–85 vs. score position)]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: three panels over time (1–4 s): log p(r_τ | s_τ) over MIDI note number (60–80); log λ, computed as Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of the audio (frequency 0–4000 Hz; source: Daniel-Ben Pienaar)]

Summary

• "Time Domain" – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• "Transform Domain" – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, polyphonic transcription

∗ Musical score guided source separation

– Prior structures for other observation models: NMF

Audio Interpolation

$$p(x_{\neg\kappa} \mid x_\kappa) \propto \int dH\; p(x_{\neg\kappa} \mid H)\, p(x_\kappa \mid H)\, p(H)$$

H ≡ (parameters, hidden states)

[Graphical model: H is the common parent of x_{¬κ} (missing) and x_κ (observed)]

[Figure: waveform plotted over sample indices 0–500]

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band ν = 0, …, W − 1, a chain of states s^ν_0 ⋯ s^ν_k ⋯ s^ν_{K−1} with parameters A^ν and Q^ν; the states generate the samples x_0, …, x_k, …, x_{K−1}]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A^\nu s^\nu_{k-1},\, Q^\nu), \qquad A^\nu \sim \mathcal{N}\!\left(\begin{pmatrix}\cos\omega_\nu & -\sin\omega_\nu\\ \sin\omega_\nu & \cos\omega_\nu\end{pmatrix},\, \Psi\right)$$

Inference: Structured Variational Bayes

[Graphical model of the variational approximation: factors q(A^α) and q(Q^α) for the parameters, chain factors ∏_k q(s^α_k | s^α_{k−1}) over ⋯ s^α_{k−1}, s^α_k, s^α_{k+1} ⋯ for each α ∈ C, and q(x_k) for the samples]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of the frequency bands in ¬α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
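For readers unfamiliar with Gauss-Seidel, a minimal sketch on a generic linear system A x = b is given below. It shows the same pattern as the intuitive algorithm: remove the contribution of all other unknowns, fit the remaining one to the residual, and iterate. The 2 × 2 system is a made-up example and is not derived from the phase vocoder model.

```python
def gauss_seidel(A, b, sweeps=50):
    """Solve A x = b by Gauss-Seidel sweeps (converges e.g. for diagonally dominant A)."""
    n = len(b)
    x = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # Residual of equation i after subtracting the current values of the other unknowns.
            residual = b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)
            # Fit unknown i to that residual, using the newest values immediately.
            x[i] = residual / A[i][i]
    return x

A = [[4.0, 1.0],
     [1.0, 3.0]]
b = [1.0, 2.0]
x = gauss_seidel(A, b)   # exact solution is (1/11, 7/11)
```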

Restoration

• Piano

– Signal with missing samples (37%)

– Reconstruction, 7.68 dB improvement

– Original

• Trumpet

– Signal with missing samples (37%)

– Reconstruction, 7.10 dB improvement

– Original

Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each component ν = 1, …, W, an indicator chain r^ν_0 ⋯ r^ν_k ⋯ r^ν_K and a state chain θ^ν_0 ⋯ θ^ν_k ⋯ θ^ν_K; the components jointly generate the observations y_k, …, y_K]

• Generalises source-filter models

Harmonic model with changepoints

$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\,\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\,\mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix}\cos\omega & -\sin\omega\\ \sin\omega & \cos\omega\end{pmatrix}$$

with damping factor 0 < ρ_k < 1, frame length N, and damped sinusoidal basis matrix C of size N × 2H
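To make the generative reading concrete, here is a small simulation sketch of a sample-by-sample variant of this model (frame length N = 1). Several choices are assumptions made only for illustration: the state noise Q is set to zero, the reinitialisation covariance is S = I / h² for harmonic h, the observation sums the first component of each harmonic's state, and all numeric defaults and the function name are hypothetical.

```python
import math
import random

def simulate_harmonic_changepoints(K=300, H=3, omega=0.2, rho=0.995, p_new=0.01, seed=0):
    """Draw indicators r_k and observations y_k from a toy changepoint harmonic model."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(H)]   # one 2-D state per harmonic
    r, y = [], []
    for k in range(K):
        # A changepoint is forced at k = 0 so that the state gets initialised.
        r_k = 1 if (k == 0 or rng.random() < p_new) else 0
        for h in range(1, H + 1):
            if r_k == 1:
                # "new": reinitialise the state, theta ~ N(0, S) with S = I / h^2 (assumed)
                theta[h - 1] = [rng.gauss(0.0, 1.0 / h), rng.gauss(0.0, 1.0 / h)]
            else:
                # "reg": apply G_omega^h, a rotation by h*omega damped by rho^h (Q = 0 assumed)
                c, s = math.cos(h * omega), math.sin(h * omega)
                a, b = theta[h - 1]
                d = rho ** h
                theta[h - 1] = [d * (c * a - s * b), d * (s * a + c * b)]
        r.append(r_k)
        # Observation: sum of the first state components plus small noise (R assumed)
        y.append(sum(t[0] for t in theta) + rng.gauss(0.0, 0.01))
    return r, y

r, y = simulate_harmonic_changepoints()
```

Between changepoints the output is a sum of damped sinusoids at multiples of ω; each changepoint restarts the amplitudes and phases, which is the behaviour of a note onset.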

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: when a changepoint occurs, the state vector θ is reinitialized

⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: example with r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; states θ_0, …, θ_5; observations y_1, …, y_5]

• The same structure can be exploited for the MMAP problem arg max_{r_{1:k}} p(r_{1:k} | y_{1:k})

⇒ Trajectories r^{(i)}_{1:k} that are dominated in terms of conditional evidence p(y_{1:k}, r^{(i)}_{1:k}) can be discarded without destroying optimality

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, r_1, …, r_T; m_0, m_1, …, m_T; s_0, s_1, …, s_T; observations y_1, …, y_T]

Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels plotted over time index 100–1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference on an audio example, samples 500–3500]

Tracking Pitch Variations

• Allow m to change with k

[Figure: plot over time index 50–500]

• Intractable; need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano roll (frequency ν vs. time k) shown with the signal x_k]

• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". The indicators r_{1:W,1:K} encode a latent "piano roll".

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W},\, R)$$

$$C \equiv [\, C_1 \;\cdots\; C_i \;\cdots\; C_W \,]$$

[Graphical model: indicators r_1, …, r_W; coefficients s_1, …, s_W; observation x]

• Generalized Linear Model – columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
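The 2^W growth can be seen by brute-force enumeration on a tiny example. The sketch below assumes a scalar observation x, scalar coefficients s_i with prior variance `sigma2`, scalar basis weights `C[i]` and independent indicators with prior probability `pi_on`; these simplifications and all names are illustrative assumptions. Integrating out s gives x | r ~ N(0, R + Σ_i r_i C_i² σ²), so the posterior over indicator configurations is available in closed form, one Gaussian term per configuration.

```python
import math
from itertools import product

def posterior_over_indicators(x, C, sigma2, R, pi_on):
    """Exact p(r | x) by enumerating all 2^W on/off configurations."""
    W = len(C)
    post = {}
    for r in product((0, 1), repeat=W):
        # Marginal variance of x once the active coefficients are integrated out.
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, C))
        log_prior = sum(math.log(pi_on if ri else 1.0 - pi_on) for ri in r)
        log_lik = -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)
        post[r] = math.exp(log_prior + log_lik)
    total = sum(post.values())
    return {r: p / total for r, p in post.items()}

post = posterior_over_indicators(x=3.0, C=[1.0, 1.0, 1.0], sigma2=4.0, R=0.1, pi_on=0.2)
```

With W = 3 the table has 8 entries; each additional component doubles it, which is why enumeration stops being practical for large W.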

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{(changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\,\theta_{k-1,\nu},\, Q_\nu(r_k)) \qquad \text{(latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\,\theta_{k,1:W},\, R) \qquad \text{(observation)}$$

[Graphical model: for each ν = 1, …, W, chains r^ν_0 ⋯ r^ν_k ⋯ r^ν_K and s^ν_0 ⋯ s^ν_k ⋯ s^ν_K; observations y_k, …, y_K]

Synthetic Data

[Figure: synthetic data example with the signal x and piano-roll panels (frequency ν vs. time k)]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear Model

• Feature based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k\, \phi_k(t)$$

[Figure: spectrograms (t/sec vs. f/Hz) of Speech and Piano]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$

One channel source separation Gaussian source model

[Graphical model: for each atom k = 1, …, K, variances v_{k,1}, …, v_{k,N}, source coefficients s_{k,1}, …, s_{k,N} and the observation x_k]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big), \qquad \kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
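The posterior above is cheap to evaluate once the variances are known. The following sketch applies it to a single time-frequency atom; the function name and the numbers are illustrative only.

```python
def separate_gaussian(x, v):
    """Posterior mean and variance of each source coefficient given their sum x.

    v[n] is the prior variance of source n at this time-frequency atom.
    """
    total = sum(v)
    kappa = [vn / total for vn in v]                    # responsibilities
    mean = [k * x for k in kappa]                       # each source gets a fraction of x
    var = [vn * (1.0 - k) for vn, k in zip(v, kappa)]   # remaining uncertainty
    return kappa, mean, var

kappa, mean, var = separate_gaussian(x=2.0, v=[3.0, 1.0])
# kappa = [0.75, 0.25], mean = [1.5, 0.5], var = [0.75, 0.75]
```

The posterior means always add up to the observation, so the separation is consistent with the mixture by construction; all the modelling effort goes into estimating the variances v.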

One channel source separation Poisson source model

[Graphical model: for each atom k = 1, …, K, intensities v_{k,1}, …, v_{k,N}, source coefficients s_{k,1}, …, s_{k,N} and the observation x_k]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
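Maximising the Poisson likelihood of this model over templates and excitations is equivalent to minimising the generalised KL divergence between the data and the product of the two factors. One common way to do that is with multiplicative updates; the sketch below uses that standard technique as an illustration and is not taken from the slides. The names `nmf_kl`, `T` (templates) and `E` (excitations) and the toy matrix are assumptions.

```python
import math
import random

def reconstruct(T, E):
    """V[f][k] = sum_n T[f][n] * E[n][k]."""
    return [[sum(T[f][n] * E[n][k] for n in range(len(E)))
             for k in range(len(E[0]))] for f in range(len(T))]

def kl_divergence(X, V):
    """Generalised KL divergence D(X || V); equals the Poisson negative log-likelihood up to terms in X only."""
    d = 0.0
    for row_x, row_v in zip(X, V):
        for x, v in zip(row_x, row_v):
            d += (x * math.log(x / v) - x + v) if x > 0 else v
    return d

def nmf_kl(X, N, iters=50, seed=0):
    """Multiplicative updates for X ~ T E with N components."""
    rng = random.Random(seed)
    F, K = len(X), len(X[0])
    T = [[rng.random() + 0.1 for _ in range(N)] for _ in range(F)]
    E = [[rng.random() + 0.1 for _ in range(K)] for _ in range(N)]
    for _ in range(iters):
        V = reconstruct(T, E)
        for f in range(F):
            for n in range(N):
                num = sum(X[f][k] / V[f][k] * E[n][k] for k in range(K))
                T[f][n] *= num / sum(E[n])
        V = reconstruct(T, E)
        for n in range(N):
            col_sum = sum(T[f][n] for f in range(F))
            for k in range(K):
                num = sum(T[f][n] * X[f][k] / V[f][k] for f in range(F))
                E[n][k] *= num / col_sum
    return T, E

X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # rank-1 toy "spectrogram"
T, E = nmf_kl(X, N=1)
```

On this rank-1 example a single component reproduces X essentially exactly; on real spectrograms the fit is only approximate and depends on the initialisation.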

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) for 0 ≤ x ≤ 5; first panel with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); second panel with (a, b) = (1, 1), (1, 0.5), (2, 1)]

$$\mathcal{G}(x;\, a, z) \equiv \exp\big((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x;\, a, z) \equiv \exp\big((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
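In this parametrisation z is the scale of the Gamma density, and IG(x; a, z) is the density of the reciprocal of a G(a, z) variable. A short numerical check of both statements, written as a sketch with hypothetical helper names, is:

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): density of 1/y when y ~ G(a, z)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

def integrate(log_pdf, a, z, upper=100.0, n=100000):
    """Midpoint-rule integral of exp(log_pdf) over (0, upper)."""
    h = upper / n
    return sum(math.exp(log_pdf((i + 0.5) * h, a, z)) for i in range(n)) * h

area_gamma = integrate(log_gamma_pdf, 2.0, 1.0)          # close to 1
area_inv_gamma = integrate(log_inv_gamma_pdf, 2.0, 1.0)  # close to 1 (small tail beyond `upper`)

# Change of variables y = 1/x: IG(x) = G(1/x) * x^-2
x = 0.7
lhs = log_inv_gamma_pdf(x, 2.0, 1.0)
rhs = log_gamma_pdf(1.0 / x, 2.0, 1.0) - 2.0 * math.log(x)
```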

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, …, K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$

[Graphical model: chain z_1 ⋯ v_{k−1}, z_k, v_k, z_{k+1} ⋯, with edges into z labelled a_z and edges into v labelled a]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
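Draws from such a chain can be generated by ancestral sampling, using the fact that an IG(x; a, z) variable is the reciprocal of a Gamma variable with shape a and scale z. The sketch below relies on `random.gammavariate(shape, scale)` from the Python standard library; the starting value `z1 = 1.0` and the function name are assumptions made for illustration.

```python
import random

def sample_gamma_chain(K, a, az, z1=1.0, seed=0):
    """Ancestral sampling of z_1, v_1, z_2, v_2, ..., z_K, v_K."""
    rng = random.Random(seed)
    z = z1
    z_list, v_list = [], []
    for _ in range(K):
        # v_k | z_k ~ IG(v_k; a, z_k / a): reciprocal of a Gamma(shape a, scale z_k / a) draw
        v = 1.0 / rng.gammavariate(a, z / a)
        z_list.append(z)
        v_list.append(v)
        # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
        z = 1.0 / rng.gammavariate(az, v / az)
    return z_list, v_list

z_draw, v_draw = sample_gamma_chain(K=1000, a=10.0, az=10.0)
```

Since 1/v_k has mean z_k and 1/z_{k+1} has mean v_k, consecutive v values stay close to each other; larger shape parameters tighten this coupling and give smoother trajectories.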

Gamma Chains typical draws

[Figure: typical draws of log v_k for k = 1, …, 1000 with (a, a_z) = (10, 10), (10, 4) and (10, 40)]

Gamma Chains with changepoints typical draws

[Figure: typical draws of log v_k with changepoints for k = 1, …, 1000 with (a, a_z) = (10, 10), (10, 4) and (10, 40)]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(pairwise)}$$

[Graphical model: chain z_1 ⋯ v_{k−1}, z_k, v_k, z_{k+1} ⋯, with edges into z labelled a_z and edges into v labelled a]

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\,\xi_i^{-1}\xi_j^{-1}) \qquad \text{(pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ in a chain, with likelihood factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ in a chain, with likelihood factors p(y_1 | v_1), p(y_2 | v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k-1,k} + \log\psi_{k,k} \big\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\Big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$$
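For a concrete instance of the Gibbs updates, the sketch below assumes a Gaussian observation model s_k ~ N(0, v_k) on top of the inverse Gamma-Markov chain. Under that assumption both full conditionals are inverse Gamma: z_k given its neighbours has shape a_z + a and rate a_z / v_{k−1} + a / v_k, and v_k has shape a + a_z + 1/2 and rate a / z_k + a_z / z_{k+1} + s_k² / 2 (without the a_z terms at the end of the chain). The fixed start `v0`, the defaults and the function names are illustrative assumptions.

```python
import random

def inv_gamma(rng, shape, rate):
    """Draw x with density proportional to x^-(shape+1) * exp(-rate / x)."""
    return rate / rng.gammavariate(shape, 1.0)

def gibbs_gamma_chain(s, a=2.0, az=2.0, sweeps=300, burn=100, v0=1.0, seed=0):
    """Posterior mean estimate of the variances v_k given coefficients s_k ~ N(0, v_k)."""
    rng = random.Random(seed)
    K = len(s)
    v = [1.0] * K
    z = [1.0] * K
    v_sum = [0.0] * K
    for it in range(sweeps):
        # The z_k are conditionally independent given all v, so they are drawn in one pass.
        for k in range(K):
            v_prev = v[k - 1] if k > 0 else v0
            z[k] = inv_gamma(rng, az + a, az / v_prev + a / v[k])
        # Likewise the v_k given all z and the observations.
        for k in range(K):
            shape = a + 0.5
            rate = a / z[k] + 0.5 * s[k] ** 2
            if k + 1 < K:
                shape += az
                rate += az / z[k + 1]
            v[k] = inv_gamma(rng, shape, rate)
        if it >= burn:
            for k in range(K):
                v_sum[k] += v[k]
    return [t / (sweeps - burn) for t in v_sum]

# Toy data: a quiet segment followed by a loud one.
data_rng = random.Random(1)
s = [data_rng.gauss(0.0, 0.1) for _ in range(30)] + [data_rng.gauss(0.0, 10.0) for _ in range(30)]
v_hat = gibbs_gamma_chain(s)
```

A VB version would replace each draw by an update of the corresponding inverse Gamma factor using expectations of 1/z and 1/v; conjugacy keeps both schemes in closed form.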

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels (frequency bin 50–500 vs. frame 20–120, log scale −14 to 0): noisy input X, original X_org, and reconstructions X_h (SNR 19.98), X_v (SNR 20.79), X_b (SNR 19.68), X_g (SNR 19.97)]

Denoising – Music

[Figure: spectrogram panels (frequency bin 50–500 vs. frame 50–250, log scale −18 to 0): Original, Noisy, and reconstructions by PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]

"Tristram (Matt Uelmen)" + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: horizontal, tie across time; harmonic continuity

• Source 2: vertical, tie across frequency; transients, percussive sounds

[Figure: spectrograms (frequency bin ν vs. time τ) of X_org, S_hor and S_ver]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                      s1                       s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -4.74   -3.28    5.67    -1.58   15.46   -1.37
Gibbs         -4.5    -2.62    4.57     1.05   12.46    1.61
GibbsEM       -4.23   -2.42    4.82     1.34   13.13    1.85
Pre-trained   -4.04   -3.15    8.13     3.56   11.44    4.64
Oracle         6.14   17.16    6.58    12.66   19.95   13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                      s1                       s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -7.8    -6.22    4.53    -2.35   18.4    -2.25
Gibbs         -8.46   -7.53    6.93    -4.04   14.59   -3.83
GibbsEM       -7.74   -6.19    4.62    -1.14   16.62   -0.97
Pre-trained   -6.4    -5.39    6.95     3.8    16.39    4.14
Oracle        12.1    32.9    12.14    21.13   33.89   21.37

Harmonic-Transient Decomposition

[Figure: spectrograms (frequency bin ν vs. time τ) of X_org, S_hor and S_ver; two audio examples, each with (Original), (Hor) and (Vert) versions]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν plotted against frequency ν (0–4000 Hz)]

Chord Detection

[Figure: MDCT of a piano chord with MIDI notes 41, 48, 51, 56 (frequency 0–4000 Hz vs. time τ, 0.5–2.5 s), and log Σ_ν v_{ν,j,τ} over MIDI note j (40–65) and time τ]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}\big(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n)\big)$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \qquad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms (time t/sec vs. frequency f/Hz) of the three sources Speech, Piano and Guitar, and of the Mix]


Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 19: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Probabilistic Phase Vocoder (Cemgil and Godsill 2005)

[Graphical model: for each frequency band $\nu = 0, \dots, W-1$, a latent chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_{K-1}$ with parameters $A^\nu$, $Q^\nu$; observations $x_0, \dots, x_k, \dots, x_{K-1}$]

$$s^\nu_k \sim \mathcal{N}(s^\nu_k;\, A^\nu s^\nu_{k-1},\, Q^\nu), \qquad A^\nu \sim \mathcal{N}\left(\begin{pmatrix} \cos(\omega_\nu) & -\sin(\omega_\nu) \\ \sin(\omega_\nu) & \cos(\omega_\nu) \end{pmatrix},\, \Psi\right)$$
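The rotation-matrix prior mean of $A^\nu$ can be illustrated with a minimal sketch. It assumes a single band, an exact rotation matrix (no draw from the prior), a hypothetical initial state (1, 0), and that the first state coordinate is read out as the band's contribution; none of these choices come from the slides.

```python
import math
import random


def rotation(omega):
    """2x2 rotation matrix for angular frequency omega (radians per sample)."""
    c, s = math.cos(omega), math.sin(omega)
    return ((c, -s), (s, c))


def simulate_band(omega, num_steps, q_std=0.0, seed=0):
    """Simulate s_k = A s_{k-1} + noise and return the first state coordinate."""
    rng = random.Random(seed)
    a = rotation(omega)
    state = (1.0, 0.0)  # hypothetical initial state s_0
    out = []
    for _ in range(num_steps):
        out.append(state[0])
        nxt0 = a[0][0] * state[0] + a[0][1] * state[1] + rng.gauss(0.0, q_std)
        nxt1 = a[1][0] * state[0] + a[1][1] * state[1] + rng.gauss(0.0, q_std)
        state = (nxt0, nxt1)
    return out


# With q_std = 0 the band is a pure cosine with a period of 8 samples.
samples = simulate_band(omega=2 * math.pi / 8, num_steps=16)
```

Setting `q_std` above zero lets the phase and amplitude wander, which is the role the state noise $Q^\nu$ plays in the model.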

Inference: Structured Variational Bayes

[Graphical model of the variational approximation: factors $q(A^\alpha)$, $q(Q^\alpha)$ and $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ for each band $\alpha \in C$, and $q(x_k)$]

• Intuitive algorithm

– Subtract from the observed signal x the prediction of all frequency bands other than α

– Compute a fit for α to this residual and iterate

• For fixed A, Q this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
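The "subtract the other components, refit this one, iterate" pattern is the same as a Gauss-Seidel sweep. The sketch below shows plain Gauss-Seidel on a small linear system; the matrix and right-hand side are hypothetical and are not derived from the audio model.

```python
def gauss_seidel(matrix, rhs, num_sweeps=50):
    """Solve matrix @ x = rhs by repeatedly refitting one unknown at a time."""
    n = len(rhs)
    x = [0.0] * n
    for _ in range(num_sweeps):
        for i in range(n):
            # Residual of equation i after subtracting all other unknowns,
            # always using their most recent values.
            residual = rhs[i] - sum(matrix[i][j] * x[j] for j in range(n) if j != i)
            x[i] = residual / matrix[i][i]
    return x


# Hypothetical diagonally dominant system whose exact solution is (1, 1, 1).
matrix = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
rhs = [5.0, 5.0, 3.0]
solution = gauss_seidel(matrix, rhs)
```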

Restoration

• Piano

– Signal with missing samples (37%)

– Reconstruction: 7.68 dB improvement

– Original

• Trumpet

– Signal with missing samples (37%)

– Reconstruction: 7.10 dB improvement

– Original


Hierarchical Factorial Models

• Each component models a latent process

• The observations are projections

[Graphical model: for each $\nu = 1, \dots, W$, chains $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and $\theta^\nu_0 \cdots \theta^\nu_k \cdots \theta^\nu_K$; observations $y_k, \dots, y_K$]

• Generalises source-filter models

Harmonic model with changepoints

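$$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \qquad r_k \in \{0, 1\}$$

$$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$$

$$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$$

$$A = \begin{pmatrix} G_\omega & & & \\ & G_\omega^2 & & \\ & & \ddots & \\ & & & G_\omega^H \end{pmatrix}^{N}, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$$

with damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.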

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, Ó Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Graphical model: changepoint indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
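The difference in growth can be made concrete with a small counting sketch. It assumes a binary indicator and, for the changepoint model, that θ is reset independently of the past at each changepoint, so that a filtering component is identified by the time of the most recent changepoint alone. The function names are hypothetical.

```python
def kernels_general_switching(num_steps, num_regimes=2):
    """One Gaussian per indicator trajectory r_{1:k} in a general switching model."""
    return num_regimes ** num_steps


def kernels_changepoint(num_steps):
    """One Gaussian per position of the most recent changepoint, or none so far."""
    return num_steps + 1


growth = [(k, kernels_general_switching(k), kernels_changepoint(k)) for k in (1, 5, 10, 20)]
```

Under these assumptions the changepoint count is linear in k per filtering step, which is one way to read "polynomial" on the slide.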

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of {“mute”, “sound”} × M states

[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels against sample index, 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: exact inference result against sample index, 500 to 3500 (S)]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track against frame index, 50 to 500]

• Intractable; need to resort to approximate inference (Mixture Kalman Filter, i.e. Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll (frequency ν against time k) and the signal $x_k$]

• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll” (S1) (S2) (S3)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\, \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i;\, 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\, C s_{1:W}, R)$$

$$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$$

[Graphical model: indicators $r_1, \dots, r_W$; coefficients $s_1, \dots, s_W$; observation $x$]

• Generalized Linear Model: columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, …)
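The $2^W$ growth can be seen by brute-force enumeration in a toy case. The sketch below assumes a scalar observation, scalar coefficients with a shared prior variance `sigma2`, and independent indicators with prior probability `pi_on`; these simplifications and all names are illustrative and not taken from the slides.

```python
import itertools
import math


def normal_pdf(x, var):
    """Density of a zero-mean Gaussian with variance var, evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_over_indicators(x, basis, sigma2, noise_var, pi_on):
    """Enumerate all 2^W on/off configurations and return p(r | x)."""
    posterior = {}
    for r in itertools.product((0, 1), repeat=len(basis)):
        prior = math.prod(pi_on if r_i else 1.0 - pi_on for r_i in r)
        # Integrating out the active coefficients gives a zero-mean Gaussian
        # whose variance is the noise variance plus the active contributions.
        marginal_var = noise_var + sum(c * c * sigma2 for c, r_i in zip(basis, r) if r_i)
        posterior[r] = prior * normal_pdf(x, marginal_var)
    norm = sum(posterior.values())
    return {r: p / norm for r, p in posterior.items()}


post = posterior_over_indicators(x=5.0, basis=[1.0, 1.0, 1.0], sigma2=4.0, noise_var=0.1, pi_on=0.3)
```

The loop visits every configuration once, so its cost doubles with each extra basis vector; this is the intractability the slide refers to for large W.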

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\, \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\, \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\, \pi_\nu(r_{k-1,\nu})) \qquad \text{(Changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\, A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \qquad \text{(Latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\, C_k\theta_{k,1:W}, R) \qquad \text{(Observation)}$$

[Graphical model: for each $\nu = 1, \dots, W$, chains $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; observations $y_k, \dots, y_K$]

Synthetic Data

[Figure: synthetic data; panels indexed by frequency ν and time k, with the signal x (S)]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable: computations with large matrices

– Need advanced techniques from linear algebra

– Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical: acoustical

• Time domain: state space, dynamical models

• Transform domain: Fourier representations, Generalised Linear Model

• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms of (Speech) and (Piano); axes t/sec and f/Hz]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-negative matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; …)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; …)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; for $k = 1, \dots, K$]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes’ theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
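The responsibility formula can be checked on a single coefficient. This is a minimal sketch of the posterior mean and variance stated above, for known variances; the function name and the example numbers are hypothetical.

```python
def separate_gaussian(x, variances):
    """Posterior means and variances of the sources given their sum x."""
    total = sum(variances)
    kappas = [v / total for v in variances]  # responsibilities
    means = [kappa * x for kappa in kappas]
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars


# Two sources with prior variances 3 and 1, observed mixture coefficient 2.
kappas, means, post_vars = separate_gaussian(x=2.0, variances=[3.0, 1.0])
```

The posterior means always add up to the observation, so in this model the separation amounts to a Wiener-style split of each coefficient.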

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; for $k = 1, \dots, K$]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
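Since a sum of independent Poisson variables is Poisson with the summed intensity, the observation $x_k$ has intensity $\sum_n t_{\nu,n} e_{\tau,n}$, which is the NMF product. The sketch below computes that intensity matrix; the array layout and the example values are assumptions made for illustration.

```python
def nmf_intensity(templates, excitations):
    """Return intensity[nu][tau] = sum_n templates[nu][n] * excitations[tau][n]."""
    return [
        [sum(t_n * e_n for t_n, e_n in zip(t_row, e_row)) for e_row in excitations]
        for t_row in templates
    ]


# Hypothetical example: 2 frequency bins, 3 time frames, 2 sources.
templates = [[1.0, 0.0], [0.5, 2.0]]                 # indexed [nu][n]
excitations = [[2.0, 1.0], [0.0, 3.0], [4.0, 0.0]]   # indexed [tau][n]
intensity = nmf_intensity(templates, excitations)
```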

Gamma $\mathcal{G}(x;\, a, b)$ and Inverse Gamma $\mathcal{IG}(x;\, a, b)$

[Figure: two panels of densities p(x) for x between 0 and 5; first panel with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); second panel with (a, b) = (1, 1), (1, 0.5), (2, 1)]

$$\mathcal{G}(x;\, a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$$

$$\mathcal{IG}(x;\, a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, and inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
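The two definitions can be transcribed directly into log-density functions. This is a sketch under the parameterisation written above, where z acts as a scale for the Gamma and the inverse Gamma is the law of 1/y for a Gamma-distributed y; the function names are hypothetical.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of 1/y when y ~ G(a, z)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


# Change of variables check: IG(x; a, z) = G(1/x; a, z) / x^2.
x, a, z = 0.7, 2.5, 1.3
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
```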

Gamma Chains

We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k / a_z)$$

[Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with edges labelled $a_z$, $a$, $a_z$]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
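A draw from such a chain can be simulated with the standard library alone. The sketch assumes the chain reads $v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$ and $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$ in the slide's parameterisation (so an inverse Gamma draw is the reciprocal of a Gamma draw with the same shape and scale), and it fixes $z_1 = 1$, which the slides do not specify.

```python
import random


def sample_inv_gamma(rng, a, z):
    """Draw from IG(a, z): the reciprocal of a Gamma(shape a, scale z) draw."""
    return 1.0 / rng.gammavariate(a, z)


def sample_chain(num_steps, a, a_z, z_1=1.0, seed=0):
    """Return v_1, ..., v_K from the inverse Gamma Markov chain."""
    rng = random.Random(seed)
    z = z_1  # assumed fixed starting value
    vs = []
    for _ in range(num_steps):
        v = sample_inv_gamma(rng, a, z / a)
        vs.append(v)
        z = sample_inv_gamma(rng, a_z, v / a_z)
    return vs


draws = sample_chain(num_steps=200, a=10.0, a_z=10.0)
```

Each v is roughly the reciprocal of its z and each z roughly the reciprocal of the preceding v, so consecutive v values are positively correlated; larger shape parameters tighten that coupling.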

Gamma Chains: typical draws

[Figure: three sample paths of $\log v_k$ against k (100 to 1000), for $a = 10, a_z = 10$; $a = 10, a_z = 4$; $a = 10, a_z = 40$]

Gamma Chains with changepoints: typical draws

[Figure: three sample paths of $\log v_k$ against k (100 to 1000), for $a = 10, a_z = 10$; $a = 10, a_z = 4$; $a = 10, a_z = 40$]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)}$$

[Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with edges labelled $a_z$, $a$, $a_z$]

$$\phi^z_k = \exp((a_z + a + 1)\log z_k^{-1}) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\sum_j a_{ij} + 1\big) \log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$

• Gibbs

$$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$$

VB or Gibbs

[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$$
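For the auxiliary variable the Gibbs step has a closed form. Assuming the chain potentials $\exp(-a_z z_k^{-1} v_{k-1}^{-1})$ and $\exp(-a z_k^{-1} v_k^{-1})$ together with the singleton $\exp((a_z + a + 1)\log z_k^{-1})$, the full conditional of $z_k$ is inverse Gamma with shape $a_z + a$ and inverse scale $a_z / v_{k-1} + a / v_k$. The sketch below is a derivation under those assumptions, not code from the talk, and the function name is hypothetical.

```python
import random


def sample_z_given_neighbours(rng, v_prev, v_curr, a, a_z):
    """Gibbs draw of z_k given v_{k-1} and v_k in the inverse Gamma chain."""
    shape = a_z + a
    rate = a_z / v_prev + a / v_curr  # coefficient of 1/z_k in the exponent
    # z_k = 1/y with y ~ Gamma(shape, scale = 1/rate)
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)


rng = random.Random(0)
z_draws = [sample_z_given_neighbours(rng, 2.0, 2.0, a=10.0, a_z=10.0) for _ in range(20000)]
z_mean = sum(z_draws) / len(z_draws)  # theoretical mean: rate / (shape - 1) = 10 / 19
```

Conjugacy is what makes this step a single Gamma draw; the update for $v_k$ additionally involves the observation factor $p(y_k \mid v_k)$ and is not covered here.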

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrograms of the noisy signal (X), the original (Xorg), and four reconstructions with panel titles Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising – Music

[Figure: spectrograms with panel titles Original, Noisy, PF SNR853, Gibbs SNR866, VB SNR208]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: horizontal, tied across time (harmonic continuity)

• Source 2: vertical, tied across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB            -474     -328     567      -158     1546     -137
Gibbs         -45      -262     457      105      1246     161
GibbsEM       -423     -242     482      134      1313     185
Pre-trained   -404     -315     813      356      1144     464
Oracle        614      1716     658      1266     1995     136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

Method        s1 SDR   s1 SIR   s1 SAR   s2 SDR   s2 SIR   s2 SAR
VB            -78      -622     453      -235     184      -225
Gibbs         -846     -753     693      -404     1459     -383
GibbsEM       -774     -619     462      -114     1662     -097
Pre-trained   -64      -539     695      38       1639     414
Oracle        121      329      1214     2113     3389     2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν)]

Audio examples, two excerpts: (Original) (Hor) (Vert)

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against frequency ν/Hz, 0 to 4000 Hz]

Chord Detection

[Figure: MDCT of piano chord 41, 48, 51, 56 (frequency/Hz, 0 to 4000, against time τ/s) and $\log \sum_\nu v_{\nu,j,\tau}$ (MIDI note j, 40 to 65, against time τ/s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

For sources $n = 1, \dots, N$, channels $m = 1, \dots, M$ and coefficients $k = 1, \dots, K$:

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of (Speech), (Piano), (Guitar) and (Mix); axes t/sec and f/Hz]

Reconstructions

[Figure: reconstructed spectrograms of (Speech), (Piano) and (Guitar); axes t/sec and f/Hz]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: sampler traces of $a$, $\lambda$ and $r$ against epoch, 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized, human-readable score

– …

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: grid of states with tempo level $n_k \in \{1, 2, 3\}$ against bar position $m_k \in \{1, \dots, 8\}$; bar boundaries marked for 3/4 time and 4/4 time]

• Each dot denotes a state $x = (m, n)$ (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$ on the grid of tempo level (1 to 5) against bar position (1 to 8)]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ on the grid of tempo level against bar position]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ on the grid of tempo level against bar position]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ on the grid of tempo level against bar position]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ on the grid of tempo level against bar position]

Bar position Pointer - k = 5

[Figure: $y_5 = 0$; $p(x_5 \mid y_{1:5})$ on the grid of tempo level against bar position]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ on the grid of tempo level against bar position]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
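Filtering in a model of this kind reduces to the standard HMM forward recursion on the (position, tempo) grid. The sketch below is a toy version under loudly stated assumptions: positions are 0-indexed, the position advances by the current tempo level modulo the bar length, the tempo moves by at most one level with a hypothetical probability `p_change`, and the Poisson intensity depends only on the position. None of the numbers come from the talk.

```python
import math


def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)


def transition(state, num_positions, num_tempi, p_change=0.1):
    """Return {next_state: probability} for state = (m, n)."""
    m, n = state
    m_next = (m + n) % num_positions  # the pointer advances by the tempo level
    probs = {}
    for n_next, p in ((n - 1, p_change), (n, 1.0 - 2.0 * p_change), (n + 1, p_change)):
        n_next = min(max(n_next, 1), num_tempi)  # clip at the tempo range
        key = (m_next, n_next)
        probs[key] = probs.get(key, 0.0) + p
    return probs


def forward_filter(observations, num_positions, num_tempi, intensity):
    """Return the filtering distribution p(x_K | y_{1:K}) as a dict."""
    states = [(m, n) for m in range(num_positions) for n in range(1, num_tempi + 1)]
    belief = {s: 1.0 / len(states) for s in states}  # uniform prior (assumed)
    for k, y in enumerate(observations):
        if k > 0:  # prediction step
            predicted = {s: 0.0 for s in states}
            for s, p_s in belief.items():
                for s_next, p_t in transition(s, num_positions, num_tempi).items():
                    predicted[s_next] += p_s * p_t
            belief = predicted
        # update step with the Poisson observation model
        weighted = {s: p * poisson_pmf(y, intensity(s[0])) for s, p in belief.items()}
        norm = sum(weighted.values())
        belief = {s: w / norm for s, w in weighted.items()}
    return belief


def tempo_marginal(belief):
    out = {}
    for (_, n), p in belief.items():
        out[n] = out.get(n, 0.0) + p
    return out


# Hypothetical intensity: onsets are likely on positions 0 and 4 of an 8-step bar.
intensity = lambda m: 3.0 if m % 4 == 0 else 0.2
belief = forward_filter([2, 0, 2, 0, 2, 0, 2, 0], num_positions=8, num_tempi=2, intensity=intensity)
tempo_posterior = tempo_marginal(belief)
```

In this toy setting onsets arrive every second frame, so the faster tempo level, which reaches a beat position every two frames, collects most of the posterior mass.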

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:k})$ over rhythmic pattern (Triplets, Duplets)]

Smoothing

[Figure: four panels against frame index k (0 to 450): observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:K})$ over rhythmic pattern (Triplets, Duplets)]

Time Signature

[Figure: four panels: observed data (sample value against time/s, 0 to 12); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ over time signature (4/4, 3/4); the last three against frame index k (100 to 500)]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt above the audio signal $x_t$ against time t]

Score-Performance matching - Graphical Model

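[Graphical model: score positions $t_1, t_2, \dots, t_K$; indicators $r_1, r_2, \dots, r_K$; $\lambda_1, \lambda_2, \dots, \lambda_K$; for each $\nu = 1, \dots, W$, variances $v_{\nu,1}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \dots, s_{\nu,K}$; score positions numbered 1 to 8 index $r_k$]

$$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$$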

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against frequency ν/Hz, 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data (frequency/Hz, 0 to 4000, against time/s, 0 to 14) and MIDI Data (MIDI note, 55 to 85, against score position, 50 to 450)]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: three panels against time/s (1 to 4): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ with vertical axis $\log \lambda$; MDCT of audio over frequency/Hz, 0 to 4000 (source: Daniel-Ben Pienaar)]

Summary

• “Time Domain”: Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain”: Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, …)

– Inherent limitations (analysis windows, frequency resolution)

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 20: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Inference: Structured Variational Bayes

[Figure: structured variational approximation. For each $\alpha \in C$: factors $q(A^\alpha)$ and $q(Q^\alpha)$, and a chain $\prod_k q(s^\alpha_k \mid s^\alpha_{k-1})$ over $\cdots, s^\alpha_{k-1}, s^\alpha_k, s^\alpha_{k+1}, \cdots$; a factor $q(x_k)$ for each $x_k$.]

• Intuitive algorithm
  – Subtract from the observed signal $x$ the prediction of the frequency bands other than $\alpha$ ($\neg\alpha$)
  – Compute a fit for $\alpha$ to this residual, and iterate
• For fixed $A$, $Q$ this is equivalent to Gauss-Seidel, an iterative method for solving linear systems of equations
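The "subtract the other components, refit this one, iterate" scheme is the Gauss-Seidel pattern. A minimal sketch on a generic linear system follows; the matrix `A_sys`, the vector `b` and the sweep count are hypothetical and are not taken from the talk.

```python
import numpy as np

def gauss_seidel(A, b, n_sweeps=50):
    """Solve A x = b by updating one coordinate at a time."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b)
    for _ in range(n_sweeps):
        for i in range(len(b)):
            # Residual of equation i after subtracting the current
            # prediction of every other coordinate.
            residual = b[i] - A[i] @ x + A[i, i] * x[i]
            # Refit coordinate i to that residual.
            x[i] = residual / A[i, i]
    return x

# Hypothetical, diagonally dominant test system.
A_sys = np.array([[4.0, 1.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [0.0, 2.0, 6.0]])
b = np.array([1.0, 2.0, 3.0])
x_gs = gauss_seidel(A_sys, b)
```

Each inner step uses the freshest values of the other coordinates, which is what distinguishes Gauss-Seidel from a Jacobi sweep.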

Restoration

• Piano
  – Signal with missing samples (37)
  – Reconstruction 768 dB improvement
  – Original
• Trumpet
  – Signal with missing samples (37)
  – Reconstruction 710 dB improvement
  – Original


Hierarchical Factorial Models

• Each component models a latent process
• The observations are projections

[Graphical model: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \cdots, r^\nu_k, \cdots, r^\nu_K$ and a latent state chain $\theta^\nu_0, \cdots, \theta^\nu_k, \cdots, \theta^\nu_K$; observations $y_k, \dots, y_K$.]

• Generalises source-filter models

Harmonic model with changepoints

$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$

$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\, \mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\, \mathcal{N}(0, S)}_{\text{new}}$

$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \operatorname{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$

Damping factor $0 < \rho_k < 1$, frame length $N$, and damped sinusoidal basis matrix $C$ of size $N \times 2H$.
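A minimal simulation sketch of this generative model. Several choices are assumptions made only for illustration: isotropic covariances $Q = qI$, $S = sI$, $R = rI$, a constant damping `rho`, i.i.d. changepoints with probability `p_change` in place of $p(r_k \mid r_{k-1})$, and a basis $C$ built by reading out the first coordinate of each harmonic block.

```python
import numpy as np

def rotation(omega, rho):
    """Damped 2x2 rotation: rho * [[cos, -sin], [sin, cos]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def harmonic_matrices(omega, rho, H, N):
    """Per-sample transition B = blkdiag(G, G^2, ..., G^H), frame
    transition A = B^N, and an N x 2H damped sinusoidal basis C."""
    B = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        G_h = np.linalg.matrix_power(rotation(omega, rho), h)
        B[2 * (h - 1):2 * h, 2 * (h - 1):2 * h] = G_h
    A = np.linalg.matrix_power(B, N)
    read_out = np.tile([1.0, 0.0], H)   # assumed: observe first coordinate of each block
    C = np.stack([read_out @ np.linalg.matrix_power(B, n) for n in range(N)])
    return A, C

def simulate(K, omega=0.3, rho=0.999, H=3, N=32, p_change=0.1,
             q=1e-4, s=1.0, r=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    A, C = harmonic_matrices(omega, rho, H, N)
    theta = rng.normal(0.0, np.sqrt(s), 2 * H)
    r_seq, frames = [], []
    for _ in range(K):
        r_k = rng.random() < p_change
        if r_k:   # "new": reinitialise the state from N(0, S)
            theta = rng.normal(0.0, np.sqrt(s), 2 * H)
        else:     # "reg": damped rotation plus small state noise
            theta = A @ theta + rng.normal(0.0, np.sqrt(q), 2 * H)
        frames.append(C @ theta + rng.normal(0.0, np.sqrt(r), N))
        r_seq.append(int(r_k))
    return np.array(r_seq), np.concatenate(frames)

r_seq, y = simulate(K=20)
```

Each frame of `y` is a sum of `H` damped sinusoids whose amplitudes and phases are redrawn whenever `r_seq` is 1.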

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  – Conditional Gaussians are not closed under marginalization
  ⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density
  ⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  – Intuition: when a changepoint occurs, the state vector $\theta$ is reinitialized
  ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; Ó Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$.]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$
  ⇒ Trajectories $r^{(i)}_{1:k}$ that are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
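The difference between the two regimes can be illustrated by brute-force counting. The sketch below enumerates all indicator trajectories and counts how many distinct Gaussian components the filtering density needs. It assumes that a changepoint fully resets the state, so a component is identified by the position of the most recent changepoint; the function name is hypothetical.

```python
from itertools import product

def count_components(k, changepoint_reset):
    """Count distinct Gaussian components after k steps by enumerating
    every indicator trajectory r_1..r_k in {0, 1}^k."""
    distinct = set()
    for r in product((0, 1), repeat=k):
        if changepoint_reset:
            # A reset forgets everything before the most recent r = 1,
            # so only the position of that changepoint matters.
            last = max((i for i, r_i in enumerate(r) if r_i == 1), default=-1)
            distinct.add(last)
        else:
            # Generic switching model: each trajectory has its own Gaussian.
            distinct.add(r)
    return len(distinct)

for k in (2, 4, 8):
    print(k, count_components(k, False), count_components(k, True))
```

In the generic case the count doubles with every step, while with resets it grows by one per step.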

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator $m$
• At each time $k$, the process can be in one of the {"mute", "sound"} × $M$ states

[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$.]

Monophonic Pitch Tracking

Monophonic pitch tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels plotted against time index (100 to 1000).]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: exact inference result over sample index 500 to 3500, with audio example (S).]


Tracking Pitch Variations

• Allow $m$ to change with $k$

[Figure: pitch track over frame index 50 to 500.]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent indicators (frequency $\nu$ against time $k$) above the signal $x_k$.]

• Each latent changepoint process $\nu = 1, \dots, W$ corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll" (audio examples S1, S2, S3)


Single time slice - Bayesian Variable Selection

$r_i \sim C(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$

[Graphical model: indicators $r_1, \dots, r_W$; sources $s_1, \dots, s_W$; observation $x$.]

• Generalized Linear Model – columns of $C$ are the basis vectors
• The exact posterior is a mixture of $2^W$ Gaussians
• When $W$ is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias, ...)
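For small $W$ the $2^W$-component posterior can be computed by direct enumeration, which also makes the intractability for large $W$ concrete. The sketch assumes scalar sources with $\Sigma = \sigma^2$, $R = r_{\text{noise}} I$ and a common prior probability `pi_on`; the basis and data are random placeholders.

```python
import numpy as np
from itertools import product

def log_gauss(x, cov):
    """Log density of a zero-mean Gaussian N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def posterior_over_indicators(x, C, pi_on, sigma2, r_noise):
    """Enumerate all 2^W on/off configurations r and return p(r | x)."""
    W = C.shape[1]
    configs = list(product((0, 1), repeat=W))
    log_post = []
    for r in configs:
        r = np.array(r)
        # With s integrated out: x | r ~ N(0, sigma2 * C_on C_on^T + r_noise * I)
        C_on = C[:, r == 1]
        cov = sigma2 * C_on @ C_on.T + r_noise * np.eye(len(x))
        log_prior = np.sum(np.where(r == 1, np.log(pi_on), np.log(1 - pi_on)))
        log_post.append(log_gauss(x, cov) + log_prior)
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())   # normalise stably
    return configs, post / post.sum()

rng = np.random.default_rng(0)
C = rng.normal(size=(8, 4))                    # hypothetical basis, W = 4
x = C[:, [0, 2]] @ np.array([2.0, -1.5]) + 0.05 * rng.normal(size=8)
configs, post = posterior_over_indicators(x, C, pi_on=0.3, sigma2=1.0, r_noise=0.01)
best = configs[int(np.argmax(post))]
```

The loop visits every configuration once, so the cost doubles with each additional source.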

Factorial Switching State space model

$r_{0,\nu} \sim C(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim C(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$   (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$   (Observation)

[Graphical model: for each $\nu = 1, \dots, W$, an indicator chain $r^\nu_0, \cdots, r^\nu_k, \cdots, r^\nu_K$ and a state chain $s^\nu_0, \cdots, s^\nu_k, \cdots, s^\nu_K$; observations $y_k, \dots, y_K$.]

Synthetic Data

[Figure: synthetic data; panels indexed by frequency $\nu$ against time $k$, together with the signal $x$; audio example (S).]


Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable – computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain – state space, dynamical models
• Transform domain – Fourier representations, Generalised Linear model
• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: two spectrograms, (Speech) and (Piano); axes t/sec and f/Hz.]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

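As an illustration of the two displayed quantities, here is a small STFT sketch. The frame length, hop size, Hann window, sample rate and the 440 Hz test tone are all assumptions chosen for the example.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Windowed FFT frames; rows are frequency bins, columns are time frames."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[t * hop:t * hop + frame_len] * window
                       for t in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)

fs = 8000                                   # assumed sample rate in Hz
t = np.arange(fs) / fs
x_tone = np.sin(2 * np.pi * 440 * t)        # hypothetical 440 Hz test tone
S = stft(x_tone)
power = np.abs(S) ** 2                      # |s_k|^2
log_mag = np.log(np.abs(S) + 1e-12)         # log |s_k|; offset avoids log(0)
peak_hz = np.argmax(power.mean(axis=1)) * fs / 256
```

The frequency resolution is `fs / frame_len` per bin, which is one of the inherent limitations of transform-domain models.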

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$

One channel source separation: Gaussian source model

[Graphical model: for $k = 1, \dots, K$, variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
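The posterior above translates directly into a few lines of array code. The sketch assumes the variances are known; the numbers are made up for illustration.

```python
import numpy as np

def separate(x, v):
    """Posterior mean and variance of each source given the mixture.
    x: (K,) observed coefficients; v: (K, N) prior variances v_{k,n}."""
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} * x_k
    var = v * (1.0 - kappa)                    # v_{k,n} * (1 - kappa_{k,n})
    return mean, var

x_mix = np.array([1.0, -2.0, 0.5])
v_known = np.array([[1.0, 3.0], [2.0, 2.0], [9.0, 1.0]])
mean, var = separate(x_mix, v_known)
```

The posterior means always add up to the observation, so the split is conservative: no energy is created or lost.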

One channel source separation: Poisson source model

[Graphical model: for $k = 1, \dots, K$, intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)
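Maximum likelihood under this Poisson model corresponds to NMF with the KL divergence, which can be fitted with the standard Lee-Seung multiplicative updates. A minimal sketch follows; the data are synthetic, and the function names, iteration count and initialisation are assumptions.

```python
import numpy as np

def nmf_kl(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative updates for X ~ T @ E under the KL divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    T = rng.random((n_freq, n_components)) + 0.1   # templates t_{nu,n}
    E = rng.random((n_components, n_time)) + 0.1   # excitations e_{n,tau}
    for _ in range(n_iter):
        V = T @ E + eps
        T *= ((X / V) @ E.T) / (E.sum(axis=1) + eps)
        V = T @ E + eps
        E *= (T.T @ (X / V)) / (T.sum(axis=0)[:, None] + eps)
    return T, E

def kl_div(X, V, eps=1e-12):
    """Generalised KL divergence between nonnegative arrays."""
    return np.sum(X * np.log((X + eps) / (V + eps)) - X + V)

rng = np.random.default_rng(1)
X_counts = rng.poisson(rng.random((20, 2)) @ rng.random((2, 30)) * 20).astype(float)
T_fit, E_fit = nmf_kl(X_counts, n_components=2)
T_one, E_one = nmf_kl(X_counts, n_components=2, n_iter=1)
```

Both factors stay nonnegative because each update multiplies by a nonnegative ratio.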

Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: densities $p(x)$ for $x \in [0, 5]$. Gamma panel: $a = 0.9, 1, 1.3, 2$ with $b = 1$. Inverse Gamma panel: $(a, b) = (1, 1), (1, 0.5), (2, 1)$.]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
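The two log densities can be written down term by term from these definitions. The sketch below also checks numerically that both integrate to about one; the midpoint-rule integrator and its limits are crude assumptions meant only as a sanity check.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z); in this parameterisation 1/x follows G(a, z)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

def integrate(log_pdf, lo=1e-6, hi=200.0, n=200_000):
    """Midpoint rule on [lo, hi]; ignores the small tail beyond hi."""
    h = (hi - lo) / n
    return sum(math.exp(log_pdf(lo + (i + 0.5) * h)) for i in range(n)) * h

mass_g = integrate(lambda x: log_gamma_pdf(x, 2.0, 1.0))
mass_ig = integrate(lambda x: log_inv_gamma_pdf(x, 2.0, 1.0))
```

Note that $z$ enters the inverse Gamma through $z^{-1}x^{-1}$, so a larger $z$ pushes the mass towards smaller $x$.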

Gamma Chains

We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$

[Chain: $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$, with edges labelled $a_z$, $a$, $a_z$.]

• Variance variables $v$ are priors for sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
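A sketch of drawing from such a chain, using the parameterisation in which $x \sim \mathcal{IG}(x; a, z)$ means $1/x$ is Gamma distributed with shape $a$ and scale $z$. Fixing the first auxiliary variable to `z1` is an assumption, since the prior on $z_1$ is not specified here.

```python
import numpy as np

def sample_inv_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z), i.e. 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_chain(K, a=10.0, a_z=10.0, z1=1.0, seed=0):
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1                                   # assumed fixed starting value
    for k in range(K):
        v[k] = sample_inv_gamma(rng, a, z / a)        # v_k | z_k
        z = sample_inv_gamma(rng, a_z, v[k] / a_z)    # z_{k+1} | v_k
    return v

log_v = np.log(sample_chain(1000))
```

Larger shape parameters tie neighbouring variances more tightly, so `log_v` wanders more slowly.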

Gamma Chains: typical draws

[Figure: three panels of $\log v_k$ against $k$ (100 to 1000), for $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$.]

Gamma Chains with changepoints: typical draws

[Figure: three panels of $\log v_k$ against $k$ (100 to 1000), for $a = 10, a_z = 10$; $a = 10, a_z = 4$; and $a = 10, a_z = 40$.]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (Pairwise)

$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})$   (Singletons)

[Chain: $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$, with edges labelled $a_z$, $a$, $a_z$.]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{i,j} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$   (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic
  – Markov Chain Monte Carlo, Gibbs sampler
  – Sequential Monte Carlo, Particle Filtering
• Deterministic
  – Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$, with likelihood terms $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$ attached to $v_1$, $v_2$.]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v^{(\tau)}_k \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z^{(\tau)}_k)\, \psi_{k,k+1}(z^{(\tau)}_{k+1})$

VB or Gibbs

[Factor graph: chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$, with likelihood terms $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$ attached to $v_1$, $v_2$.]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z^{(\tau)}_k \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v^{(\tau)}_{k-1})\, \psi_{k,k}(v^{(\tau)}_k)$
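A sketch of a Gibbs sampler for the chain $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$, under an assumed observation model $y_k \sim \mathcal{N}(0, v_k)$. By conjugacy every full conditional is inverse Gamma. The improper prior $p(z_1) \propto 1/z_1$, the initialisation, the burn-in rule and the test signal are assumptions made for the example.

```python
import numpy as np

def inv_gamma_rate(rng, shape, rate):
    """Draw x with density proportional to x^-(shape+1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape=shape, scale=1.0 / rate)

def gibbs_gamma_chain(y, a=2.0, a_z=2.0, n_sweeps=300, seed=0):
    """Posterior-mean estimate of the variances v_k given y_k ~ N(0, v_k)."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.full(K, np.mean(y ** 2) + 1e-9)   # crude initialisation
    z = 1.0 / v                              # z_k roughly tracks 1 / v_k here
    v_sum = np.zeros(K)
    burn_in = n_sweeps // 2
    for sweep in range(n_sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k
            shape, rate = a, a / v[k]        # first node: prior 1/z_1 assumed
            if k > 0:
                shape += a_z
                rate += a_z / v[k - 1]
            z[k] = inv_gamma_rate(rng, shape, rate)
        for k in range(K):
            # v_k | z_k, z_{k+1}, y_k
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k < K - 1:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = inv_gamma_rate(rng, shape, rate)
        if sweep >= burn_in:                 # discard the first half
            v_sum += v
    return v_sum / (n_sweeps - burn_in)

rng = np.random.default_rng(1)
true_v = np.concatenate([np.full(100, 0.01), np.full(100, 9.0)])
y_obs = rng.normal(0.0, np.sqrt(true_v))
v_hat = gibbs_gamma_chain(y_obs)
```

Stronger coupling (larger `a`, `a_z`) smooths the estimate more, but it also makes the sampler mix more slowly, so more sweeps would be needed.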

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: six spectrogram panels: X (Noisy), Xorg (Original), and four reconstructions labelled Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997).]

Denoising – Music

[Figure: five spectrogram panels: Original, Noisy, and reconstructions labelled PF (SNR853), Gibbs (SNR866), VB (SNR208).]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal; tie across time; harmonic continuity
• Source 2: Vertical; tie across frequency; transients, percussive sounds

[Figure: panels Xorg, Shor, Sver; axes Time ($\tau$) and Frequency Bin ($\nu$).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -474   -328   567    -158   1546   -137
Gibbs         -45    -262   457    105    1246   161
GibbsEM       -423   -242   482    134    1313   185
Pre-trained   -404   -315   813    356    1144   464
Oracle        614    1716   658    1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -78    -622   453    -235   184    -225
Gibbs         -846   -753   693    -404   1459   -383
GibbsEM       -774   -619   462    -114   1662   -097
Pre-trained   -64    -539   695    38     1639   414
Oracle        121    329    1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor, Sver; axes Time ($\tau$) and Frequency Bin ($\nu$). Two sets of audio examples: (Original), (Hor), (Vert).]


Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ (from −12 to 0) against frequency $\nu$/Hz (0 to 4000).]

Chord Detection

[Figure, top: MDCT of piano chord 41, 48, 51, 56; axes Time $\tau$/s and Frequency/Hz (0 to 4000). Bottom: $\log \sum_\nu v_{\nu,j,\tau}$; axes Time $\tau$/s and MIDI note $j$ (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M$

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$

Source Separation

[Figure: four spectrograms, (Speech), (Piano), (Guitar) and (Mix); axes t/sec and f/Hz.]


Reconstructions

[Figure: three reconstructed spectrograms, (Speech), (Piano) and (Guitar); axes t/sec and f/Hz.]


Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of $a$, $\lambda$ and $r$ against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized/human-readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with bar position $m_k$ = 1 to 8 (horizontal) and tempo level $n_k$ = 1 to 3 (vertical); markers for 3/4 time and 4/4 time.]

• Each dot denotes a state $x = (m, n)$ (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence: distributions over Bar Position and Tempo Level at successive steps: $p(x_1)$, $p(x_2)$, $p(x_3)$, $p(x_4)$, $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, and $p(x_{10})$.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: intensity $\mu_k$ (0 to 4) against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm.]
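A toy sketch of HMM filtering on a bar-pointer state space with Poisson observations. The transition rule (position advances by the tempo level, and tempo changes by one level with probability `p_change`), the intensity pattern and the observation sequence are hypothetical simplifications and not the model from the cited work.

```python
import numpy as np
from math import lgamma

def transition_matrix(M, N, p_change=0.1):
    """T[i, j] = p(x_{k+1} = j | x_k = i); state (m, n) is flattened
    as m * N + (n - 1) for m in 0..M-1 and n in 1..N."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M                   # advance by the tempo level
            for n_next, p in ((n, 1 - p_change),
                              (n - 1, p_change / 2),
                              (n + 1, p_change / 2)):
                n_next = min(max(n_next, 1), N)    # clamp at the boundaries
                T[m * N + n - 1, m_next * N + n_next - 1] += p
    return T

def poisson_loglik(y, mu):
    """Log of the Poisson pmf PO(y; mu), evaluated for a vector of mu."""
    return y * np.log(mu) - mu - lgamma(y + 1)

def forward_filter(ys, T, mu_per_state):
    """Return p(x_k | y_{1:k}) for each k, one row per time step."""
    alpha = np.full(T.shape[0], 1.0 / T.shape[0])  # uniform initial belief
    out = []
    for y in ys:
        alpha = alpha @ T                                        # predict
        alpha = alpha * np.exp(poisson_loglik(y, mu_per_state))  # update
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)

M, N = 8, 3
mu_m = np.where(np.arange(M) % 4 == 0, 3.0, 0.2)   # hypothetical intensity pattern
mu_per_state = np.repeat(mu_m, N)                   # intensity depends on m only
T_bar = transition_matrix(M, N)
ys = np.array([3, 0, 0, 0, 2, 0, 0, 0, 4, 0, 0, 0])
filt = forward_filter(ys, T_bar, mu_per_state)
tempo_marginal = filt.reshape(len(ys), M, N).sum(axis=1)   # p(n_k | y_{1:k})
```

Summing the filtered distribution over bar positions gives the tempo marginal, and summing over tempo levels gives the position marginal.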

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$.]

• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per min (60, 120, 180); $p(r_k \mid y_{1:k})$ over Triplets/Duplets.]

Smoothing

[Figure: four panels against Frame Index $k$ (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min (60, 120, 180); $p(r_k \mid y_{1:K})$ over Triplets/Duplets.]

Time Signature

[Figure: four panels: Observed Data (sample value against time/s, 0 to 12); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min (52, 103, 155); $p(\theta_k \mid z_{1:K})$ over 4/4 and 3/4; the last three against Frame Index $k$ (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: audio signal $x_t$ against time $t$.]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, t_2, \dots, t_K$; $r_1, r_2, \dots, r_K$; $\lambda_1, \lambda_2, \dots, \lambda_K$; and, for each $\nu = 1, \dots, W$, variances $v_{\nu,1}, v_{\nu,2}, \dots, v_{\nu,K}$ and coefficients $s_{\nu,1}, s_{\nu,2}, \dots, s_{\nu,K}$.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (from −12 to 0) against frequency $\nu$/Hz (0 to 4000).]

Score-Performance matching

[Figure, top: Spectrogram Data; axes Time/s (0 to 14) and Frequency/Hz (0 to 4000). Bottom: MIDI Data; axes Score position (50 to 450) and MIDI note (55 to 85).]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure, three panels against Time/s (1 to 4): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as $\log \lambda$; MDCT of audio (source: Daniel-Ben Pienaar) over Frequency/Hz (0 to 4000).]

Summary

• "Time Domain" – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform Domain" – Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-Frequency Energy distributions
• Ongoing Work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  – Prior structures for other observation models: NMF

Page 21: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Restoration

bull Piano

ndash Signal with missing samples (37)

ndash Reconstruction 768 dB improvement

ndash Original

bull Trumpet

ndash Signal with missing samples (37)

ndash Reconstruction 710 dB improvement

ndash Original

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 20

piano_missingwav
Media File (audiowav)
piano_kalmanwav
Media File (audiowav)
piano_cleanwav
Media File (audiowav)
trumpet_missingwav
Media File (audiowav)
trumpet_kalmanwav
Media File (audiowav)
trumpet_cleanwav
Media File (audiowav)

Hierarchical Factorial Models

bull Each component models a latent process

bull The observations are projections

rν0 middot middot middot rν

k middot middot middot rνK

θν0 middot middot middot θ

νk middot middot middot θ

νK

ν = 1 W

yk yK

bull Generalises Source-filter models

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21

Harmonic model with changepoints

rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1

θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸

reg

+ [rk = 1]N (0 S)︸ ︷︷ ︸

new

yk|θk sim N (Cθk R)

A =

G2ω

GH

ω

N

Gω = ρk

(cos(ω) minus sin(ω)sin(ω) cos(ω)

)

damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22

Exact Inference in switching state space models is intractable

bull In general exact inference is NP hard

ndash Conditional Gaussians are not closed under marginalization

rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity

rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially

minus7903666343

076292

minus103422

minus101982minus2393

minus27957

minus04593

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

  – Intuition: when a changepoint occurs, the state vector θ is reinitialized

  ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: indicators $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$; observations $y_1, \dots, y_5$]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

  ⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
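The polynomial growth can be seen in a toy case. The sketch below is an illustration under assumptions, not the algorithm from the cited papers: it uses a scalar state, an independent Bernoulli changepoint prior, and made-up parameter values. Each mixture component is indexed by the time of the most recent changepoint, so after k observations the exact filtering density has k + 1 Gaussian components instead of $2^k$.

```python
import numpy as np

def gauss_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def changepoint_filter(y, a=0.95, q=0.01, s=1.0, r_var=0.1, p_change=0.1):
    """Exact filtering for theta_k = a*theta_{k-1} + noise, reset to N(0, s) at changepoints."""
    w = np.array([1.0])      # component weights
    m = np.array([0.0])      # component means
    P = np.array([s])        # component variances
    counts = []
    for yk in y:
        # Predict: every component survives without a change; all changed paths
        # collapse into ONE fresh component because the state is reinitialised.
        m_pred = np.concatenate([a * m, [0.0]])
        P_pred = np.concatenate([a * a * P + q, [s]])
        w_pred = np.concatenate([(1 - p_change) * w, [p_change * w.sum()]])
        # Update: scalar Kalman step per component, reweighted by its evidence.
        S = P_pred + r_var
        gain = P_pred / S
        w = w_pred * gauss_pdf(yk, m_pred, S)
        w = w / w.sum()
        m = m_pred + gain * (yk - m_pred)
        P = (1 - gain) * P_pred
        counts.append(len(w))
    return w, m, P, counts

w, m, P, counts = changepoint_filter(np.array([0.1, 0.2, 3.0, 2.9, 3.1]))
```

The list `counts` grows by one per time step, which is the linear growth in the number of kernels that makes exact inference feasible here.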

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of {“mute”, “sound”} × M states

[Graphical model: chains $r_0, r_1, \dots, r_T$; $m_0, m_1, \dots, m_T$; $s_0, s_1, \dots, s_T$; observations $y_1, \dots, y_T$]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels over time index 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TSALP)

[Figure: exact inference result over samples 500 to 3500, with audio example]

Tracking Pitch Variations

• Allow m to change with k

[Figure: example over time index 50 to 500]

• Intractable; need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent processes ν (frequency) against time k, with the observed signal $x_k$]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll” (audio examples S1, S2, S3)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$

$$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$$

[Graphical model: indicators $r_1, \dots, r_W$; sources $s_1, \dots, s_W$; observation $x$]

• Generalized Linear Model – columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
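For small W the $2^W$-component posterior can be enumerated directly, which makes the exponential cost concrete. This is a minimal sketch under assumptions: scalar sources with a shared prior variance `sigma2` in place of Σ, isotropic noise, a single prior probability `pi_on`, and hypothetical function names.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """Log density of N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def posterior_over_indicators(x, C, sigma2, noise_var, pi_on):
    """Enumerate all 2^W on/off configurations and return p(r | x)."""
    M, W = C.shape
    configs = list(itertools.product([0, 1], repeat=W))
    log_post = []
    for r in configs:
        r = np.array(r)
        # Integrating out the active sources gives x | r ~ N(0, sum_i r_i sigma2 C_i C_i^T + R).
        cov = sigma2 * (C * r) @ C.T + noise_var * np.eye(M)
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        log_post.append(log_prior + log_gauss_zero_mean(x, cov))
    log_post = np.array(log_post)
    post = np.exp(log_post - log_post.max())
    return configs, post / post.sum()

C = np.eye(4)[:, :3]                       # three basis vectors in four dimensions
x = np.array([3.0, 0.0, -3.0, 0.0])        # only the first and third basis vectors are active
configs, post = posterior_over_indicators(x, C, sigma2=4.0, noise_var=0.01, pi_on=0.3)
best = configs[int(np.argmax(post))]
```

The loop runs over `2 ** W` configurations, so this brute-force approach is only usable for a handful of components.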

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \qquad \text{Changepoint indicator}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \qquad \text{Latent state}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \qquad \text{Observation}$$

[Graphical model: for each ν = 1, ..., W, chains $r^\nu_0, \dots, r^\nu_k, \dots, r^\nu_K$ and $s^\nu_0, \dots, s^\nu_k, \dots, s^\nu_K$; observations $y_k, \dots, y_K$]

Synthetic Data

[Figure: synthetic data example; panels indexed by component ν and frequency against time k, with audio example]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable – computations with large matrices

  – Need advanced techniques from linear algebra

  – Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain – state space, dynamical models

• Transform domain – Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms of (Speech) and (Piano); axes t/sec and f/Hz, with audio examples]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
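As a concrete instance of the expansion above, a log-power spectrogram can be computed from short-time Fourier coefficients. This is a small sketch with hypothetical parameter choices (Hann window, frame length 256, hop 128); it only illustrates what a spectrogram display contains.

```python
import numpy as np

def log_spectrogram(x, frame_len=256, hop=128, eps=1e-12):
    """Return log |s_k|^2 with shape (frequency bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    s = np.fft.rfft(frames, axis=1)          # transform coefficients s_k
    return np.log(np.abs(s) ** 2 + eps).T

fs = 8000                                     # assumed sampling rate in Hz
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 1000.0 * t)            # 1 kHz tone, exactly on bin 32
S = log_spectrogram(x)
peak_bins = S.argmax(axis=0)
```

Each column of `S` is one time slice τ and each row one frequency ν, so the array is indexed by the time-frequency atom k(ν, τ).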

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\,S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\,S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\,p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes’ theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
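The posterior above is cheap to evaluate when the variances are known. The following sketch (function and variable names are hypothetical) computes the responsibilities and the posterior mean and variance of every source coefficient for a batch of K atoms.

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """x: (K,) observed coefficients; v: (K, N) prior variances of the N sources."""
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # posterior mean kappa * x
    var = v * (1.0 - kappa)                    # posterior variance v (1 - kappa)
    return kappa, mean, var

x = np.array([2.0, -1.0])
v = np.array([[3.0, 1.0],
              [1.0, 1.0]])
kappa, mean, var = gaussian_source_posterior(x, v)
```

The posterior means of the sources always add up to the observation, so the reconstruction is an exact split of x in proportion to the prior variances.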

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template × Excitation)}$$
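Under this Poisson model, maximum likelihood estimation of templates and excitations amounts to minimising the KL divergence between the spectrogram and the product of the two factors. The sketch below uses the standard Lee and Seung multiplicative updates for that objective, swapped in here for illustration; the slides do not state which fitting procedure was used, and all names and sizes are hypothetical.

```python
import numpy as np

def kl_divergence(X, V):
    """Generalised KL divergence; equals the negative Poisson log-likelihood up to a constant."""
    return np.sum(X * np.log(X / V) - X + V)

def nmf_kl(X, n_components, n_iter=200, seed=0, eps=1e-12):
    """Fit X (frequency x time) by templates T (F x J) times excitations E (J x T)."""
    rng = np.random.default_rng(seed)
    F, n_time = X.shape
    T = rng.random((F, n_components)) + eps
    E = rng.random((n_components, n_time)) + eps
    for _ in range(n_iter):
        V = T @ E + eps
        T *= ((X / V) @ E.T) / (E.sum(axis=1) + eps)
        V = T @ E + eps
        E *= (T.T @ (X / V)) / (T.sum(axis=0)[:, None] + eps)
    return T, E

rng = np.random.default_rng(1)
X = (rng.random((20, 3)) @ rng.random((3, 30))) + 0.01   # synthetic nonnegative data
T1, E1 = nmf_kl(X, n_components=3, n_iter=1)
T200, E200 = nmf_kl(X, n_components=3, n_iter=200)
kl_start = kl_divergence(X, T1 @ E1)
kl_end = kl_divergence(X, T200 @ E200)
```

Both runs start from the same random initialisation, so comparing `kl_start` with `kl_end` shows how far the updates reduce the objective.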

Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: two panels of densities p(x) for 0 ≤ x ≤ 5. First panel: (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1). Second panel: (a, b) = (1, 1), (1, 0.5), (2, 1)]

$$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a \log z^{-1} - \log\Gamma(a)\right)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$$

[Chain: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with edges labelled alternately $a_z$, $a$, $a_z$]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe the coupling strength and drift of the chain
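Draws from such a chain are easy to generate by ancestral sampling. The sketch below assumes the inverse Gamma parameterisation defined earlier in the talk, under which x ~ IG(x; a, z) is equivalent to 1/x being Gamma distributed with shape a and scale z; the fixed starting value `z1` and the function names are hypothetical.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Sample v_1..v_K from v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z)."""
    rng = np.random.default_rng(seed)

    def inv_gamma(shape, z):
        # x ~ IG(x; shape, z)  <=>  1/x ~ Gamma(shape, scale=z)
        return 1.0 / rng.gamma(shape, z)

    v = np.empty(K)
    z = z1
    for k in range(K):
        v[k] = inv_gamma(a, z / a)        # E[1/v_k] = z_k, so v_k is close to 1/z_k
        z = inv_gamma(a_z, v[k] / a_z)    # E[1/z_{k+1}] = v_k, so v_{k+1} tracks v_k
    return v

v_smooth = sample_igmc(K=50, a=1e4, a_z=1e4)   # strong coupling: nearly constant
v_rough = sample_igmc(K=50, a=2.0, a_z=2.0)    # weak coupling: large fluctuations
```

Large shape parameters tie neighbouring variances closely together, while small ones let the chain wander, which is the sense in which a and a_z control coupling strength.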

Gamma Chains: typical draws

[Figure: three panels of $\log v_k$ against k (100 to 1000, vertical range −20 to 20), labelled a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains with changepoints: typical draws

[Figure: three panels of $\log v_k$ against k (100 to 1000, vertical range −20 to 20), labelled a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \qquad \text{(Pairwise)}$$

[Chain: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with edges labelled alternately $a_z$, $a$, $a_z$]

$$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1}) \qquad \text{(Singletons)}$$

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\Big(\sum_j a_{ij} + 1\Big) \log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

  – Markov Chain Monte Carlo, Gibbs sampler

  – Sequential Monte Carlo, Particle Filtering

• Deterministic

  – Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: $\psi_{01}, z_1, \psi_{11}, v_1, \psi_{12}, z_2, \psi_{22}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

[Factor graph: $\psi_{01}, z_1, \psi_{11}, v_1, \psi_{12}, z_2, \psi_{22}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$
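Because of conjugacy, both Gibbs conditionals are inverse Gamma and can be sampled in closed form. The sketch below is an illustration under assumptions rather than the author's code: it takes the chain $v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$ with a hypothetical prior $z_1 \sim \mathcal{IG}(a_z, b_z)$, and assumes the observation factor is $p(y_k \mid v_k) = \mathcal{N}(y_k; 0, v_k)$. Collecting the powers and exponents of each variable gives shape $a_z + a$ for $z_k$ and shape $a + a_z + 1/2$ for interior $v_k$.

```python
import numpy as np

def gibbs_igmc(y, a=2.0, a_z=2.0, b_z=1.0, n_sweeps=200, seed=0):
    """Blocked Gibbs sampler; returns the posterior mean of log v_k."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.ones(K)
    log_v_sum = np.zeros(K)
    kept = 0
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k: the z's are independent of each other given all v's.
        rate_z = a / v
        rate_z[0] += 1.0 / b_z               # prior on z_1 (assumption)
        rate_z[1:] += a_z / v[:-1]
        z = 1.0 / rng.gamma(a_z + a, 1.0 / rate_z)
        # v_k | z_k, z_{k+1}, y_k: the v's are independent of each other given all z's.
        shape_v = np.full(K, a + 0.5)
        rate_v = a / z + 0.5 * y ** 2
        shape_v[:-1] += a_z                  # the last v_K has no right neighbour
        rate_v[:-1] += a_z / z[1:]
        v = 1.0 / rng.gamma(shape_v, 1.0 / rate_v)
        if sweep >= n_sweeps // 2:           # discard the first half as burn-in
            log_v_sum += np.log(v)
            kept += 1
    return log_v_sum / kept

rng = np.random.default_rng(1)
y = np.concatenate([0.1 * rng.normal(size=50), 10.0 * rng.normal(size=50)])
log_v = gibbs_igmc(y)
```

On this synthetic signal the estimated log-variance is low over the quiet first half and high over the loud second half, with a smooth transition imposed by the chain prior.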

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: log-spectrograms of the noisy input (X) and the original (Xorg), followed by four reconstructions labelled Xh (SNR 19.98), Xv (SNR 20.79), Xb (SNR 19.68) and Xg (SNR 19.97)]

Denoising – Music

[Figure: log-spectrograms labelled Original, Noisy, PF (SNR 8.53), Gibbs (SNR 8.66) and VB (SNR 2.08)]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal – tie across time: harmonic continuity

• Source 2: Vertical – tie across frequency: transients, percussive sounds

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                 s1                       s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB           -4.74   -3.28    5.67   -1.58   15.46   -1.37
Gibbs        -4.5    -2.62    4.57    1.05   12.46    1.61
GibbsEM      -4.23   -2.42    4.82    1.34   13.13    1.85
Pre-trained  -4.04   -3.15    8.13    3.56   11.44    4.64
Oracle        6.14   17.16    6.58   12.66   19.95   13.6

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                 s1                       s2
              SDR     SIR     SAR     SDR     SIR     SAR
VB           -7.8    -6.22    4.53   -2.35   18.4    -2.25
Gibbs        -8.46   -7.53    6.93   -4.04   14.59   -3.83
GibbsEM      -7.74   -6.19    4.62   -1.14   16.62   -0.97
Pre-trained  -6.4    -5.39    6.95    3.8    16.39    4.14
Oracle       12.1    32.9    12.14   21.13   33.89   21.37

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν). Two sets of audio examples: (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ (range −12 to 0) against Frequency ν/Hz (0 to 4000)]

Chord Detection

[Figure, top: MDCT of piano chord 41, 48, 51, 56; axes Time τ/s and Frequency/Hz (0 to 4000). Bottom: $\log \sum_\nu v_{\nu,j,\tau}$; axes Time τ/s and MIDI note j (40 to 65)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \qquad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m; \cdots), \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes t/sec and f/Hz, with audio examples]

Reconstructions

[Figure: spectrograms of the reconstructed (Speech), (Piano) and (Guitar); axes t/sec and f/Hz, with audio examples]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  – Determine the position of a performance on a score

  – Determine where a human listener would clap her hands

  – Create a quantized, human-readable score

  – ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with tempo level $n_k$ = 1, 2, 3 on the vertical axis and bar position $m_k$ = 1, ..., 8 on the horizontal axis; bar boundaries marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: intensity $\mu_k$ (0 to 4) against bar position $m_k$ (0 to 1000); panels Triplet Rhythm and Duplet Rhythm]
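With a discrete grid of (position, tempo) states and a Poisson observation model, filtering reduces to the standard HMM forward recursion. The sketch below is a toy version under assumptions, not the model of Whiteley, Cemgil and Godsill: the transition rule (tempo level n advances the position by n + 1 cells, tempo moves to a neighbouring level with small probability), the intensity pattern and all names are hypothetical.

```python
import math
import numpy as np

def bar_pointer_transition(M, N, p_stay=0.9):
    """T[i, j] = p(x_{k+1} = j | x_k = i), with state index i = m * N + n."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(N):
            m_next = (m + n + 1) % M                  # position wraps around the bar
            neighbours = [t for t in (n - 1, n + 1) if 0 <= t < N]
            stay = p_stay if neighbours else 1.0
            T[m * N + n, m_next * N + n] += stay
            for t in neighbours:
                T[m * N + n, m_next * N + t] += (1 - p_stay) / len(neighbours)
    return T

def poisson_loglik(y, mu):
    """log PO(y; mu) for a scalar count y and a vector of intensities mu."""
    return y * np.log(mu) - mu - math.lgamma(y + 1)

def forward_filter(y, T, intensity, prior):
    """Return p(x_k | y_{1:k}) for every k."""
    alpha = prior.copy()
    out = []
    for yk in y:
        alpha = alpha @ T                                     # predict
        alpha = alpha * np.exp(poisson_loglik(yk, intensity)) # update
        alpha = alpha / alpha.sum()
        out.append(alpha)
    return np.array(out)

M, N = 8, 3
T = bar_pointer_transition(M, N)
mu_by_position = np.where(np.arange(M) % 4 == 0, 3.0, 0.2)   # duplet-like pattern (assumed)
intensity = np.repeat(mu_by_position, N)                      # same intensity for every tempo level
prior = np.full(M * N, 1.0 / (M * N))
y = np.array([3, 0, 0, 0, 2, 0, 0, 0, 4, 0])
filtered = forward_filter(y, T, intensity, prior)
```

Reshaping a row of `filtered` to `(M, N)` gives the kind of position-by-tempo heat map shown on the preceding slides.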

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$ in quarter notes per min (60, 120, 180); $p(r_k \mid y_{1:k})$ over Triplets and Duplets]

Smoothing

[Figure: four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$ in quarter notes per min (60, 120, 180); $p(r_k \mid y_{1:K})$ over Triplets and Duplets]

Time Signature

[Figure: Observed Data (sample value against time/s, 0 to 12), followed by three panels against Frame Index k (100 to 500): $\log p(m_k \mid z_{1:K})$; $\log p(n_k \mid z_{1:K})$ in quarter notes per min (52, 103, 155); $p(\theta_k \mid z_{1:K})$ over 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt aligned with the audio signal $x_t$ against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, t_2, \dots, t_K$; $r_1, r_2, \dots, r_K$; $\lambda_1, \lambda_2, \dots, \lambda_K$; and, for each ν = 1, ..., W, $v_{\nu,1}, v_{\nu,2}, \dots, v_{\nu,K}$ and $s_{\nu,1}, s_{\nu,2}, \dots, s_{\nu,K}$; score positions $r_k$ numbered 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (range −12 to 0) against Frequency ν/Hz (0 to 4000)]

Score-Performance matching

[Figure, top: Spectrogram Data; axes Time/s (0 to 14) and Frequency/Hz (0 to 4000). Bottom: MIDI Data; axes Score position (50 to 450) and MIDI note (55 to 85)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels against Time/s (1 to 4): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as $\log \lambda$; MDCT of audio (source: Daniel-Ben Pienaar) over Frequency/Hz (0 to 4000)]

Summary

• “Time Domain” – Switching State Space Models

  – State space modeling

  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  – Analysis down to sample precision (if required)

  – Computationally quite heavy

• “Transform Domain” – Gamma Fields

  – Models on (orthogonal) transform coefficients; energy compaction

  – Practical: can make use of fast transforms (FFT, MDCT, ...)

  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  – Time-frequency energy distributions

• Ongoing work

  – Comparison of inference methods (VB, MCMC, SMC)

  – Learning

  – Applications

    ∗ Chord detection, polyphonic transcription

    ∗ Musical score guided source separation

  – Prior structures for other observation models: NMF

Page 22: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Hierarchical Factorial Models

bull Each component models a latent process

bull The observations are projections

rν0 middot middot middot rν

k middot middot middot rνK

θν0 middot middot middot θ

νk middot middot middot θ

νK

ν = 1 W

yk yK

bull Generalises Source-filter models

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 21

Harmonic model with changepoints

rk|rkminus1 sim p(rk|rkminus1) rk isin 0 1

θk|θkminus1 rk sim [rk = 0]N (Aθkminus1 Q)︸ ︷︷ ︸

reg

+ [rk = 1]N (0 S)︸ ︷︷ ︸

new

yk|θk sim N (Cθk R)

A =

G2ω

GH

ω

N

Gω = ρk

(cos(ω) minus sin(ω)sin(ω) cos(ω)

)

damping factor 0 lt ρk lt 1 framelength N and damped sinusoidal basis matrix C of size N times 2H

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 22

Exact Inference in switching state space models is intractable

bull In general exact inference is NP hard

ndash Conditional Gaussians are not closed under marginalization

rArr Unlike HMMrsquos or KFMrsquos summing over rk does not simplify the filteringdensity

rArr Number of Gaussian kernels to represent exact filtering density p(rk θk|y1k)increases exponentially

minus7903666343

076292

minus103422

minus101982minus2393

minus27957

minus04593

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 23

Exact Inference for Changepoint detection

bull Exact inference is achievable in polynomial timespace

ndash Intuition When a changepoint occurs the state vector θ is reinitializedrArr Number of Gaussians kernels grows only polynomially (See eg Barry and Hartigan

1992 Digalakis et al 1993 O Ruanaidh and Fitzgerald 1996 Gustaffson 2000 Fearnhead 2003 Zoeter and

Heskes 2006)

r1 = 1 r2 = 0 r3 = 0 r4 = 1 r5 = 0

θ0 θ1 θ2 θ3 θ4 θ5

y1 y2 y3 y4 y5

bull The same structure can be exploited for the MMAP problem arg maxr1kp(r1k|y1k)

rArr Trajectories of r(i)1k which are dominated in terms of conditional evidence

p(y1k r(i)1k) can be discarded without destroying optimality

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 24

Monophonic model (Cemgil et al 2006)

bull We introduce a pitch label indicator m

bull At each time k the process can be in one of the ldquomuterdquo ldquosoundrdquo timesM states

r0 r1 rT

m0 m1 mT

s0 s1 sT

y1 yT

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 25

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)

100 200 300 400 500 600 700 800 900 1000minus100

minus50

0

50

100 200 300 400 500 600 700 800 900 1000

5

10

15

bull If pitch is constant exact inference is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26

Transcription

bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE

TSALP)

500 1000 1500 2000 2500 3000 3500

Exact inference (S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27

d1wav
Media File (audiowav)

Tracking Pitch Variations

bull Allow m to change with k

50 100 150 200 250 300 350 400 450 500

bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28

Factorial Generative models for Analysis of Polyphonic Audio

νfr

eque

ncy

k

x k

bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29

montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)

Single time slice - Bayesian Variable Selection

ri sim C(ri πon πoff)

si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)

x|s1W sim N (x Cs1W R)

C equiv [ C1 Ci CW ]

r1 rW

s1 sW

x

bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors

bull The exact posterior is a mixture of 2W Gaussians

bull When W is large computation of posterior features becomes intractable

bull Sparsity by construction (Olshausen and Millman Attias )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30

Factorial Switching State space model

r0ν sim C(r0ν π0ν)

θ0ν sim N (θ0ν microν Pν)

rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator

θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state

yk|θk1W sim N (yk Ckθk1W R) Observation

rν0 middot middot middot rν

k middot middot middot rνK

sν0 middot middot middot sν

k middot middot middot sνK

ν = 1 W

yk yK

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31

Synthetic Data

νx

freq

ν

ν

k

(S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32

audio_examplewav
Media File (audiowav)

Technical Difficulties

bull Inference is quite heavy

bull Vanilla Kalman filtering methods are not stable ndash computations with

large matrices

ndash Need advance techniques from linear algebra

ndash Interesting links to subspace methods

bull Hyperparameter learning is necessary

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33

Modelling levels

bull Physical - acoustical

bull Time domain ndash state space dynamical models

bull Transform domain ndash Fourier representations Generalised Linear

model

bull Feature Based

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: two spectrograms, (Speech) and (Piano); axes t/sec and f/Hz; with sound examples]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

– however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$

One channel source separation Gaussian source model

[Graphical model: for k = 1, ..., K, variances $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes’ theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = \dfrac{v_{k,n}}{\sum_{n'} v_{k,n'}}$ (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
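The responsibility formula for the Gaussian source model can be sketched numerically. This is a minimal illustration that assumes the variances are known; the function name `separate_gaussian` and the toy numbers are hypothetical, not taken from the talk.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior over sources given their sum.

    x : array (K,)   observed coefficients, x_k = sum_n s_kn
    v : array (K, N) prior variances v_kn (assumed known here)
    Returns the posterior mean and variance, each of shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_kn
    mean = kappa * x[:, None]                  # kappa_kn * x_k
    var = v * (1.0 - kappa)                    # v_kn * (1 - kappa_kn)
    return mean, var

# Toy example: two atoms, two sources
x = np.array([2.0, -1.0])
v = np.array([[3.0, 1.0], [1.0, 1.0]])
mean, var = separate_gaussian(x, v)
```

The posterior means of the sources add up to the observation at every atom, which is the sense in which each source "gets a fraction" of x.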

One channel source separation Poisson source model

[Graphical model: for k = 1, ..., K, intensities $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
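A small simulation can make the Poisson source model concrete. The sketch below draws from the generative model with template-times-excitation intensities; the sizes `F`, `T`, `N` and the Gamma draws used for the templates and excitations are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, N = 4, 5, 2                       # frequency bins, frames, sources (toy sizes)
t = rng.gamma(1.0, 1.0, size=(F, N))    # templates t[nu, n] (hypothetical values)
e = rng.gamma(1.0, 1.0, size=(T, N))    # excitations e[tau, n] (hypothetical values)

# Intensities v[nu, tau, n] = t[nu, n] * e[tau, n]
v = t[:, None, :] * e[None, :, :]
s = rng.poisson(v)                      # s_kn ~ PO(v_kn)
x = s.sum(axis=-1)                      # x_k = sum_n s_kn

# Given x, the expected share of source n is kappa * x (multinomial posterior)
kappa = v / v.sum(axis=-1, keepdims=True)
s_mean = kappa * x[..., None]
```

Summing the intensities over sources gives the matrix product of templates and excitations, so the expected spectrogram has the NMF form.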

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x on [0, 5], each for several settings of a and b]

$\mathcal{G}(x; a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
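The two densities can be written down directly from these definitions. In this sketch a is read as the shape and z as the scale of the Gamma density, and IG(x; a, z) as the density of x when 1/x is Gamma distributed with the same parameters; the function names are hypothetical.

```python
import math
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1) * np.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1) * np.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Crude numerical check that both densities integrate to about 1
grid = np.linspace(1e-4, 200.0, 400_001)
dx = grid[1] - grid[0]
area_g = np.exp(log_gamma_pdf(grid, 2.0, 1.0)).sum() * dx
area_ig = np.exp(log_inv_gamma_pdf(grid, 2.0, 1.0)).sum() * dx
```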

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$

[Chain: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with links labelled alternately $a_z$ and $a$]

• Variance variables v serve as priors for the sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and $a_z$ describe the coupling strength and drift of the chain
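Draws from such a chain are easy to simulate. The sketch below reads the chain as $v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$ and $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$, with IG(x; a, z) meaning that 1/x is Gamma distributed with shape a and scale z. The fixed starting value `z1` is an assumption, since no prior on $z_1$ is stated here.

```python
import numpy as np

def sample_inv_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z), i.e. 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw (z_1..z_K, v_1..v_K) from the inverse Gamma-Markov chain."""
    rng = np.random.default_rng(seed)
    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1                                   # assumed fixed initial value
    for k in range(K):
        v[k] = sample_inv_gamma(rng, a, z[k] / a)               # v_k | z_k
        if k + 1 < K:
            z[k + 1] = sample_inv_gamma(rng, a_z, v[k] / a_z)   # z_{k+1} | v_k
    return z, v

z, v = sample_igmc(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v)
```

On the log scale the variances follow a random-walk-like path, so neighbouring values of `log_v` are strongly positively correlated; choosing a different from a_z introduces a drift.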

Gamma Chains typical draws

[Figure: three typical draws of $\log v_k$ against k = 1, ..., 1000, for (a, a_z) = (10, 10), (10, 4) and (10, 40)]

Gamma Chains with changepoints typical draws

[Figure: three typical draws of $\log v_k$ with changepoints against k = 1, ..., 1000, for (a, a_z) = (10, 10), (10, 4) and (10, 40)]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$ (Singletons)

[Chain: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with links labelled alternately $a_z$ and $a$]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain $\psi_{01}, z_1, \psi_{11}, v_1, \psi_{12}, z_2, \psi_{22}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Factor graph: chain $\psi_{01}, z_1, \psi_{11}, v_1, \psi_{12}, z_2, \psi_{22}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
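The two Gibbs updates can be combined into a small sampler. The sketch below is one possible instantiation, not the implementation used in the talk: it assumes a Gaussian observation $y_k \sim \mathcal{N}(0, v_k)$, a prior $z_1 \sim \mathcal{IG}(z_1; a_z, b_z)$ with a hypothetical parameter `b_z`, and the chain $v_k \mid z_k \sim \mathcal{IG}(a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(a_z, v_k/a_z)$. Under these assumptions every full conditional is inverse Gamma, so each update is a single Gamma draw.

```python
import numpy as np

def gibbs_igmc(y, a, a_z, b_z=1.0, n_sweeps=300, seed=0):
    """Gibbs sampler for y_k ~ N(0, v_k) with an inverse Gamma-Markov chain on v.

    Returns the average of v over all sweeps (no burn-in is discarded).
    """
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.ones(K)
    v_sum = np.zeros(K)
    for _ in range(n_sweeps):
        # z_k | v_{k-1}, v_k: density ~ z^-(a_z + a + 1) * exp(-rate_z / z)
        rate_z = a / v
        rate_z[0] += 1.0 / b_z            # contribution of the assumed prior on z_1
        rate_z[1:] += a_z / v[:-1]
        z = rate_z / rng.gamma(a_z + a, size=K)
        # v_k | z_k, z_{k+1}, y_k: the Gaussian likelihood adds 1/2 to the shape
        shape_v = np.full(K, a + 0.5)
        shape_v[:-1] += a_z               # the last v has no successor z
        rate_v = 0.5 * y**2 + a / z
        rate_v[:-1] += a_z / z[1:]
        v = rate_v / rng.gamma(shape_v)
        v_sum += v
    return v_sum / n_sweeps

# Toy data: a quiet segment followed by a loud one
rng = np.random.default_rng(1)
true_v = np.concatenate([np.full(100, 0.01), np.full(100, 4.0)])
y = rng.normal(0.0, np.sqrt(true_v))
v_hat = gibbs_igmc(y, a=2.0, a_z=2.0)
```

Because the z variables are conditionally independent given v, and vice versa, each sweep updates all of z and then all of v in one vectorised step.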

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels (colour scale 0 to −14): Noisy X, Original Xorg, and estimates Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]

Denoising – Music

[Figure: spectrogram panels (colour scale 0 to −18): Original, Noisy, and estimates PF (SNR853), Gibbs (SNR866), VB (SNR208). Source: “Tristram (Matt Uelmen)” + ∼ 0 dB white noise]

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal – tie across time (harmonic continuity)

• Source 2: Vertical – tie across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                       s1                       s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -474    -328     567     -158    1546    -137
Gibbs          -45    -262     457      105    1246     161
GibbsEM       -423    -242     482      134    1313     185
Pre-trained   -404    -315     813      356    1144     464
Oracle         614    1716     658     1266    1995     136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                       s1                       s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB             -78    -622     453     -235     184    -225
Gibbs         -846    -753     693     -404    1459    -383
GibbsEM       -774    -619     462     -114    1662    -097
Pre-trained    -64    -539     695       38    1639     414
Oracle         121     329    1214     2113    3389    2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor and Sver, shown for two examples as (Original), (Hor), (Vert); axes Time (τ) and Frequency Bin (ν); with sound examples]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against frequency ν/Hz, from 0 to 4000 Hz]

Chord Detection

[Figure: top, MDCT of piano chord 41485156 (frequency/Hz against time τ/s); bottom, $\log \sum_\nu v_{\nu,j,\tau}$ for each MIDI note j against time τ/s]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, n = 1, ..., N

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, k = 1, ..., K, m = 1, ..., M

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes t/sec and f/Hz; with sound examples]

Reconstructions

[Figure: reconstructed spectrograms (Speech), (Piano), (Guitar); axes t/sec and f/Hz; with sound examples]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: sampled values of a, λ and r against epoch, from 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized, human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Diagram: lattice of states with bar position $m_k$ = 1, ..., 8 on the horizontal axis and tempo level $n_k$ = 1, 2, 3 on the vertical axis; vertical markers indicate the bar length for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer – k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level: $p(x_1)$, $p(x_2)$, $p(x_3)$, $p(x_4)$; the filtered density $p(x_5 \mid y_{1:5})$ after observing $y_5 = 0$; and $p(x_{10})$]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
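Filtering on the bar-pointer lattice is ordinary HMM forward recursion over the states x = (m, n). The sketch below fills in details that are not given here and should be read as assumptions: the pointer advances by n + 1 positions per frame and wraps around the bar, the tempo level moves up or down by one with total probability `p_change`, and the Poisson intensity `mu` is high only at the downbeat.

```python
import numpy as np
from math import lgamma

def bar_pointer_filter(y, M=8, N=3, p_change=0.1, mu=None):
    """Forward filtering p(m_k, n_k | y_1:k) on the (bar position, tempo) lattice."""
    if mu is None:
        mu = np.full(M, 0.1)
        mu[0] = 3.0                     # hypothetical: onsets are likely at the downbeat
    # Transition tensor T[m, n, m2, n2] = p(m2, n2 | m, n)
    T = np.zeros((M, N, M, N))
    for m in range(M):
        for n in range(N):
            m2 = (m + n + 1) % M        # tempo level n advances the pointer by n + 1
            for dn, pr in ((0, 1 - p_change), (-1, p_change / 2), (1, p_change / 2)):
                n2 = min(max(n + dn, 0), N - 1)   # clip at the lowest/highest level
                T[m, n, m2, n2] += pr
    alpha = np.full((M, N), 1.0 / (M * N))        # uniform prior over states
    out = []
    for yk in y:
        pred = np.einsum('mn,mnij->ij', alpha, T)          # predict
        loglik = yk * np.log(mu) - mu - lgamma(yk + 1)     # Poisson log p(y_k | m_k)
        alpha = pred * np.exp(loglik)[:, None]             # update
        alpha /= alpha.sum()
        out.append(alpha)
    return np.array(out)

y = [3, 0, 0, 0, 3, 0, 0, 0, 3, 0, 0, 0]    # toy onset counts per frame
post = bar_pointer_filter(y)
```

Each `post[k]` is a normalised table over (bar position, tempo level), the kind of density shown in the filtering plots.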

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: observed data $y_k$; filtering densities $\log p(m_k \mid y_{1:k})$ and $\log p(n_k \mid y_{1:k})$ (quarter notes per min); $p(r_k \mid y_{1:k})$ over Triplets/Duplets; all against frame index k]

Smoothing

[Figure: observed data $y_k$; smoothed densities $\log p(m_k \mid y_{1:K})$ and $\log p(n_k \mid y_{1:K})$ (quarter notes per min); $p(r_k \mid y_{1:K})$ over Triplets/Duplets; all against frame index k]

Time Signature

[Figure: observed data (sample value against time/s); $\log p(m_k \mid z_{1:K})$ and $\log p(n_k \mid z_{1:K})$ (quarter notes per min); $p(\theta_k \mid z_{1:K})$ over 4/4 and 3/4; all against frame index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt aligned with the audio signal $x_t$ against t]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; and, for ν = 1, ..., W, $v_{\nu,1}, \dots, v_{\nu,K}$ and $s_{\nu,1}, \dots, s_{\nu,K}$; score positions $r_k$ numbered 1 to 8]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\,\lambda\,\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against frequency ν/Hz, from 0 to 4000 Hz]

Score-Performance matching

[Figure: top, Spectrogram Data (frequency/Hz against time/s); bottom, MIDI Data (MIDI note against score position)]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number against time/s); $\log \lambda$ estimate $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time/s; MDCT of audio (source: Daniel-Ben Pienaar), frequency/Hz against time/s]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients, energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Harmonic model with changepoints

$r_k \mid r_{k-1} \sim p(r_k \mid r_{k-1}), \quad r_k \in \{0, 1\}$

$\theta_k \mid \theta_{k-1}, r_k \sim \underbrace{[r_k = 0]\,\mathcal{N}(A\theta_{k-1}, Q)}_{\text{reg}} + \underbrace{[r_k = 1]\,\mathcal{N}(0, S)}_{\text{new}}$

$y_k \mid \theta_k \sim \mathcal{N}(C\theta_k, R)$

$A = \mathrm{blkdiag}(G_\omega, G_\omega^2, \dots, G_\omega^H)^N, \qquad G_\omega = \rho_k \begin{pmatrix} \cos(\omega) & -\sin(\omega) \\ \sin(\omega) & \cos(\omega) \end{pmatrix}$

damping factor $0 < \rho_k < 1$, frame length N, and damped sinusoidal basis matrix C of size N × 2H
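The structure of the transition matrix can be checked numerically. The sketch below assumes A is block diagonal with one damped rotation block per harmonic h = 1, ..., H, each block being the h-th power of $G_\omega$, and the whole matrix raised to the frame length N. The fundamental of 440 Hz, the 8 kHz sampling rate, and the function names are hypothetical.

```python
import numpy as np

def rotation_block(omega, rho):
    """G_omega = rho * [[cos w, -sin w], [sin w, cos w]]."""
    c, s = np.cos(omega), np.sin(omega)
    return rho * np.array([[c, -s], [s, c]])

def transition_matrix(omega, rho, H, N=1):
    """Block-diagonal A with blocks G^h (h = 1..H), raised to the power N."""
    G = rotation_block(omega, rho)
    A = np.zeros((2 * H, 2 * H))
    for h in range(1, H + 1):
        # (G^h)^N = G^(h*N), so the power can be applied block by block
        A[2 * h - 2:2 * h, 2 * h - 2:2 * h] = np.linalg.matrix_power(G, h * N)
    return A

A = transition_matrix(omega=2 * np.pi * 440 / 8000, rho=0.999, H=3, N=1)
eig_moduli = np.sort(np.abs(np.linalg.eigvals(A)))
```

Under this reading the h-th block rotates at h times the fundamental and decays as rho to the power h, so higher harmonics are damped faster.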

Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard

– Conditional Gaussians are not closed under marginalization

⇒ Unlike HMMs or KFMs, summing over $r_k$ does not simplify the filtering density

⇒ The number of Gaussian kernels needed to represent the exact filtering density $p(r_k, \theta_k \mid y_{1:k})$ increases exponentially

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

– Intuition: when a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Diagram: example with $r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0$; states $\theta_0, \dots, \theta_5$ and observations $y_1, \dots, y_5$]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} \mid y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
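The polynomial growth can be seen in a scalar toy version of the changepoint model. In the sketch below all parameter values are hypothetical, changepoints are assumed to occur independently with probability `p_c`, and the prior on $\theta_0$ is taken to be N(0, S). Because a changepoint resets the state to the same N(0, S) regardless of the past, all histories ending in a changepoint merge exactly into one component, so step k carries k + 1 Gaussians rather than $2^k$.

```python
import numpy as np

def changepoint_filter(y, A=0.95, Q=0.01, S=4.0, C=1.0, R=0.1, p_c=0.05):
    """Exact filtering for a scalar changepoint model.

    Returns, for each k, a list of (weight, mean, variance) components of
    p(theta_k | y_1:k); each component corresponds to one 'last changepoint' time.
    """
    comps = [(1.0, 0.0, S)]              # assumed prior on theta_0
    history = []
    for yk in y:
        total = sum(w for w, _, _ in comps)
        # r_k = 0: propagate every component through the regular dynamics
        pred = [(w * (1 - p_c), A * m, A * A * P + Q) for w, m, P in comps]
        # r_k = 1: all histories collapse into one fresh component N(0, S)
        pred.append((total * p_c, 0.0, S))
        new = []
        for w, m, P in pred:
            s2 = C * P * C + R                       # predictive variance of y_k
            lik = np.exp(-0.5 * (yk - C * m) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
            gain = P * C / s2                        # Kalman gain
            new.append((w * lik, m + gain * (yk - C * m), (1 - gain * C) * P))
        norm = sum(w for w, _, _ in new)
        comps = [(w / norm, m, P) for w, m, P in new]
        history.append(comps)
    return history

y = np.concatenate([np.full(20, 1.5), np.full(20, -1.5)])   # level shift at k = 21
hist = changepoint_filter(y)
n_components = [len(c) for c in hist]
post_mean = sum(w * m for w, m, _ in hist[-1])
```

The same bookkeeping, with pruning of dominated components, is what makes the MMAP search tractable in this setting.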

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {“mute”, “sound”} × M states

[Graphical model: chains $r_0, \dots, r_T$; $m_0, \dots, m_T$; $s_0, \dots, s_T$; observations $y_1, \dots, y_T$]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels against time index, from 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: exact inference result over samples 500 to 3500, with sound example (S)]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over frames 50 to 500]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, i.e. Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: frequency index ν against time k, with signal $x_k$]

• Each latent changepoint process ν = 1, ..., W corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”. Sound examples: (S1) (S2) (S3)

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\,\mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\,\delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \;\dots\; C_i \;\dots\; C_W \,]$

[Graphical model: indicators $r_1, \dots, r_W$ generate sources $s_1, \dots, s_W$, which generate the observation x]

• Generalized Linear Model – columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
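For small W the $2^W$ mixture can be enumerated directly, which also shows why the approach stops scaling. The sketch below assumes scalar sources with a common variance `sigma2`, isotropic observation noise with variance `R`, and a common prior probability `pi_on`; these simplifications and the function name are illustrative.

```python
import itertools
import numpy as np

def exact_posterior(x, C, sigma2, R, pi_on):
    """Enumerate all 2^W on/off patterns r and return p(r | x).

    Scalar sources: s_i ~ N(0, sigma2) if r_i is on, s_i = 0 otherwise.
    """
    D, W = C.shape
    patterns = list(itertools.product([0, 1], repeat=W))
    log_post = np.empty(len(patterns))
    for j, r in enumerate(patterns):
        r = np.array(r)
        # Marginal covariance of x given r: only active columns of C contribute
        cov = sigma2 * (C * r) @ C.T + R * np.eye(D)
        _, logdet = np.linalg.slogdet(cov)
        quad = x @ np.linalg.solve(cov, x)
        log_lik = -0.5 * (logdet + quad + D * np.log(2 * np.pi))
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        log_post[j] = log_lik + log_prior
    post = np.exp(log_post - log_post.max())    # subtract the max for stability
    return patterns, post / post.sum()

C = np.eye(3)                                   # toy basis: one column per source
x = np.array([3.0, 0.0, -3.0])
patterns, post = exact_posterior(x, C, sigma2=4.0, R=0.1, pi_on=0.5)
best = patterns[int(np.argmax(post))]
```

With this toy input the most probable pattern switches on the first and third sources only; the loop visits every one of the $2^W$ patterns, so the cost doubles with each added source.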

Factorial Switching State space model

r0ν sim C(r0ν π0ν)

θ0ν sim N (θ0ν microν Pν)

rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator

θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state

yk|θk1W sim N (yk Ckθk1W R) Observation

rν0 middot middot middot rν

k middot middot middot rνK

sν0 middot middot middot sν

k middot middot middot sνK

ν = 1 W

yk yK

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31

Synthetic Data

νx

freq

ν

ν

k

(S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32

audio_examplewav
Media File (audiowav)

Technical Difficulties

bull Inference is quite heavy

bull Vanilla Kalman filtering methods are not stable ndash computations with

large matrices

ndash Need advance techniques from linear algebra

ndash Interesting links to subspace methods

bull Hyperparameter learning is necessary

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33

Modelling levels

bull Physical - acoustical

bull Time domain ndash state space dynamical models

bull Transform domain ndash Fourier representations Generalised Linear

model

bull Feature Based

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34

Spectrogram

bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT

x(t) =sum

k

skφk(t)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

bull Spectrogram displays log |sk| or |sk|2 (of STFT)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)

Models for time-frequency Energy distributions

bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003

Virtanen 2003 Abdallah Plumbley 2004 )

Xντ = WνjSjτ

Spectrogram = Spectral Templatestimes Excitations

= times

ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36

Models for time-frequency Energy distributions

bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )

Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)

ντ

Spectrogram = Masktimes Source0 + (1minusMask)times Source1

= + +

ndash however sources do overlap in time and frequency

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37

Prior structures on time-frequency Energy distributions

bull Main Idea Spectrogram is a point estimate of the energy at a

time-frequency atom k(ν τ)

bull We place a suitable prior on the variance of transform coefficients sk

and tie the prior variances across harmonically and temporally related

time-frequency atoms

p(s|v)p(v) =

(prod

k

p(sk|vk)

)

p(v)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38

One channel source separation Gaussian source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim N (skn 0 vkn)

xk|sk1N =sumN

n=1 skn

bull Straightforward application of Bayesrsquo theorem yields

p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))

κkn = vknsum

nprime

vknprime (Responsibilities)

bull Each source coefficient sn gets a fraction κn of the observation x

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\left(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\right) \quad \text{(Pairwise)}$$

$$\phi_i = \exp\left(\Big(\sum_j a_{ij} + 1\Big) \log \xi_i^{-1}\right) \quad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering
• Deterministic
  - Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain ψ_{01} - z_1 - ψ_{11} - v_1 - ψ_{12} - z_2 - ψ_{22} - v_2 - ..., with likelihood factors p(y_1|v_1), p(y_2|v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}\big(z_k^{(\tau)}\big)\, \psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$$

VB or Gibbs

[Factor graph: chain ψ_{01} - z_1 - ψ_{11} - v_1 - ψ_{12} - z_2 - ψ_{22} - v_2 - ..., with likelihood factors p(y_1|v_1), p(y_2|v_2) attached to v_1, v_2]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
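The Gibbs updates above can be sketched for one concrete observation model. This is a minimal illustration under stated assumptions, not the implementation behind the reported results: it assumes each coefficient is observed directly as y_k ~ N(0, v_k), it assumes a prior z_1 ~ IG(z_1; a_z, b_z) with a hypothetical parameter `b_z`, and it uses the slides' IG parametrization (1/x is Gamma with shape a and scale z). Under these assumptions both full conditionals are inverse Gamma, which is the conjugacy the slides refer to.

```python
import random


def gibbs_variance_estimate(y, a=10.0, a_z=10.0, b_z=1.0,
                            sweeps=200, burn_in=50, seed=0):
    """Gibbs sampler for an inverse Gamma-Markov chain with y_k ~ N(0, v_k) (assumed).

    Returns the Monte Carlo estimate of the posterior mean of each v_k.
    """
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K
    v_sum = [0.0] * K
    for sweep in range(sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k: inverse Gamma with shape a_z + a.
            # For k = 0 the assumed prior IG(z_1; a_z, b_z) replaces the v_{k-1} term.
            rate = a / v[k] + (a_z / v[k - 1] if k > 0 else 1.0 / b_z)
            z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)
            # v_k | z_k, z_{k+1}, y_k: inverse Gamma; the Gaussian likelihood
            # adds 1/2 to the shape and y_k^2 / 2 to the rate.
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:                      # the last v_K has no z_{K+1} neighbour
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= burn_in:
            for k in range(K):
                v_sum[k] += v[k]
    return [s / (sweeps - burn_in) for s in v_sum]
```

For a denoising setting with additive noise, the likelihood term would change, but the z-updates would stay the same.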

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: spectrogram panels with a log-scale colour bar from -14 to 0, titled Noisy (X), Original (Xorg), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997"]

Denoising - Music

[Figure: spectrogram panels with a log-scale colour bar from -18 to 0, titled Original, Noisy, "PF SNR853", "Gibbs SNR866" and "VB SNR208"]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time (harmonic continuity)
• Source 2: Vertical. Tie across frequency (transients, percussive sounds)

[Figure: time (τ) against frequency bin (ν) grids for Xorg, Shor and Sver]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                      s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB            -474   -328    567    -158   1546   -137
Gibbs          -45   -262    457     105   1246    161
GibbsEM       -423   -242    482     134   1313    185
Pre-trained   -404   -315    813     356   1144    464
Oracle         614   1716    658    1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                      s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB             -78   -622    453    -235    184   -225
Gibbs         -846   -753    693    -404   1459   -383
GibbsEM       -774   -619    462    -114   1662   -097
Pre-trained    -64   -539    695      38   1639    414
Oracle         121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: time (τ) against frequency bin (ν) panels for Xorg, Shor and Sver, captioned (Original), (Hor) and (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure: two panels. Top: MDCT of piano chord 41485156, time τ (s) against frequency (Hz). Bottom: log Σ_ν v_{ν,j,τ}, time τ (s) against MIDI note j]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\ a_\lambda,\ b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\ \nu/2,\ 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\ 0,\ v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\ a_m^\top s_{k,1:N},\ r_m), \qquad k = 1, \dots, K,\quad m = 1, \dots, M$$

$$a_m \sim \mathcal{N}(a_m;\ \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\ \cdots)$$

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms (t/sec against f/Hz) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]

Reconstructions

[Figure: spectrograms (t/sec against f/Hz) of the reconstructed (Speech), (Piano) and (Guitar)]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
• Online (real-time) or offline (batch)
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: grid of states with score position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lines mark 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figures: p(x_1), p(x_2), p(x_3), p(x_4), p(x_5 | y_{1:5}) with y_5 = 0, and p(x_10), each shown over bar position and tempo level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for two patterns, Triplet Rhythm and Duplet Rhythm]
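To make the HMM view concrete, here is a toy forward filter over states x = (m, n) with a Poisson observation model. It is a sketch under assumptions rather than the model of Whiteley, Cemgil and Godsill: the position advances by the tempo level modulo the bar length, the tempo level changes by at most one step with hypothetical probability `p_change`, and the intensity profile `mu` (high on the downbeat, low elsewhere) is invented for illustration.

```python
import math


def poisson_pmf(y, mu):
    """Poisson probability of observing count y with intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def bar_pointer_filter(y, M=8, N=3, p_change=0.1, mu=None):
    """Forward filtering p(m_k, n_k | y_{1:k}) on a small position/tempo grid.

    M bar positions, N tempo levels; all parameter values are illustrative.
    """
    if mu is None:
        # Assumed intensity: onsets are most likely on the downbeat (m = 0).
        mu = [2.0 if m == 0 else 0.2 for m in range(M)]
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    belief = {s: 1.0 / len(states) for s in states}   # uniform prior on x_1
    filtered = []
    for k, y_k in enumerate(y):
        if k > 0:
            # Prediction: position advances by the tempo, tempo drifts by at most 1.
            predicted = dict.fromkeys(states, 0.0)
            for (m, n), w in belief.items():
                m_next = (m + n) % M
                for dn, p in ((-1, p_change), (0, 1.0 - 2.0 * p_change), (1, p_change)):
                    n_next = min(max(n + dn, 1), N)   # clip at the grid boundary
                    predicted[(m_next, n_next)] += w * p
            belief = predicted
        # Update with the Poisson likelihood, which depends on the position only.
        unnorm = {s: w * poisson_pmf(y_k, mu[s[0]]) for s, w in belief.items()}
        total = sum(unnorm.values())
        belief = {s: w / total for s, w in unnorm.items()}
        filtered.append(belief)
    return filtered
```

Feeding a count sequence with an onset every fourth frame, for example `[2, 0, 0, 0] * 10 + [2]`, should concentrate the filtered tempo marginal on the level that traverses the eight positions in four frames.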

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k. Observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min.; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against frame index k. Observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min.; p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: four panels. Observed data (sample value against time in s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min.; p(θ_k | z_{1:K}) for 4/4 and 3/4, against frame index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; v_{ν,1}, ..., v_{ν,K}; s_{ν,1}, ..., s_{ν,K}; plate over ν = 1, ..., W; score positions r_k numbered 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}\left(v_{\nu,\tau};\ a,\ 1/(a \lambda \sigma_\nu(r_\tau))\right)$$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data (time in s against frequency in Hz) and MIDI Data (score position against MIDI note)]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure: three panels. log p(r_τ | s_τ), MIDI note number against time (s); log λ with Σ_i w_τ^{(i)} λ_τ^{(i)} against time (s); MDCT of audio (source: Daniel-Ben Pienaar), time (s) against frequency (Hz)]

Summary

• "Time Domain": Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• "Transform Domain": Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions
• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Exact Inference in switching state space models is intractable

• In general, exact inference is NP-hard
  - Conditional Gaussians are not closed under marginalization
    ⇒ Unlike HMMs or KFMs, summing over r_k does not simplify the filtering density
    ⇒ The number of Gaussian kernels needed to represent the exact filtering density p(r_k, θ_k | y_{1:k}) increases exponentially

[Figure: illustration of the growing mixture of Gaussian kernels]

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space
  - Intuition: When a changepoint occurs, the state vector θ is reinitialized
    ⇒ The number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992, Digalakis et al. 1993, O Ruanaidh and Fitzgerald 1996, Gustafsson 2000, Fearnhead 2003, Zoeter and Heskes 2006)

[Diagram: changepoint indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0 above latent states θ_0, ..., θ_5 and observations y_1, ..., y_5]

• The same structure can be exploited for the MMAP problem arg max_{r_{1:k}} p(r_{1:k} | y_{1:k})
    ⇒ Trajectories r^{(i)}_{1:k} that are dominated in terms of conditional evidence p(y_{1:k}, r^{(i)}_{1:k}) can be discarded without destroying optimality
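The polynomial growth can be seen in a small exact filter. The sketch below assumes a scalar model that is not taken from the slides: with probability `p_change` the state is redrawn from N(m0, P0), otherwise it follows a random walk with variance `q`, and y_k ~ N(θ_k, R). Because a changepoint forgets the past, all histories that share the same last changepoint time collapse into one Gaussian, so the filtering density at time k has exactly k components.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a scalar Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def changepoint_filter(y, p_change=0.05, m0=0.0, P0=10.0, q=0.0, R=1.0):
    """Exact filtering for a scalar changepoint model (illustrative parameters).

    Each component (weight, mean, var) is indexed by the time of the last changepoint.
    Returns the final mixture and the number of components after each step.
    """
    comps = []
    sizes = []
    for y_k in y:
        # No changepoint: every existing component is propagated.
        predicted = [(w * (1.0 - p_change), m, P + q) for (w, m, P) in comps]
        # Changepoint: the state is reinitialized, so all histories merge into
        # one new component. The first step is treated as a changepoint (r_1 = 1).
        new_weight = p_change * sum(w for w, _, _ in comps) if comps else 1.0
        predicted.append((new_weight, m0, P0))
        # Kalman measurement update per component, weighted by its evidence.
        updated = []
        for w, m, P in predicted:
            S = P + R
            gain = P / S
            updated.append((w * normal_pdf(y_k, m, S),
                            m + gain * (y_k - m),
                            (1.0 - gain) * P))
        total = sum(w for w, _, _ in updated)
        comps = [(w / total, m, P) for w, m, P in updated]
        sizes.append(len(comps))
    return comps, sizes
```

In a general switching model the `predicted` list would instead double at every step, which is the exponential growth described on the previous slide.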

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m
• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: chains r_0, ..., r_T; m_0, ..., m_T; s_0, ..., s_T; observations y_1, ..., y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(r_k, m_k | y_{1:k})

[Figure: two panels against sample index (100 to 1000)]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: result of exact inference, sample index 500 to 3500]

Tracking Pitch Variations

• Allow m to change with k

[Figure: panel against sample index (50 to 500)]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter, Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent processes indexed by ν (frequency) against k, and the signal x_k]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators r_{1:W,1:K} encode a latent "piano roll"

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i;\ \pi_{\text{on}},\ \pi_{\text{off}})$$

$$s_i \mid r_i \sim [r_i = \text{on}]\ \mathcal{N}(s_i;\ 0,\ \Sigma) + [r_i \neq \text{on}]\ \delta(s_i)$$

$$x \mid s_{1:W} \sim \mathcal{N}(x;\ C s_{1:W},\ R)$$

$$C \equiv [\, C_1\ \dots\ C_i\ \dots\ C_W \,]$$

[Graphical model: r_1, ..., r_W above s_1, ..., s_W, all pointing to x]

• Generalized Linear Model: the columns of C are the basis vectors
• The exact posterior is a mixture of 2^W Gaussians
• When W is large, computation of posterior features becomes intractable
• Sparsity by construction (Olshausen and Millman, Attias)
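The 2^W growth is easy to see by brute-force enumeration for a small W. The following sketch simplifies the slide's model to a scalar observation x = Σ_i c_i s_i + noise with a common source variance, so that the marginal likelihood of each indicator configuration is a scalar Gaussian; the parameter names and default values are hypothetical.

```python
import itertools
import math


def variable_selection_posterior(x, c, sigma2=1.0, noise=0.1, p_on=0.2):
    """Exact posterior p(r | x) over all 2^W on/off configurations (scalar x).

    c: basis coefficients c_i; sigma2: variance of an active source;
    noise: observation noise variance; p_on: prior probability that r_i is on.
    """
    W = len(c)
    log_post = {}
    for r in itertools.product((0, 1), repeat=W):
        # Integrating out the sources: x | r ~ N(0, noise + sum_i r_i c_i^2 sigma2).
        var = noise + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, c))
        log_lik = -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)
        log_prior = sum(math.log(p_on if ri else 1.0 - p_on) for ri in r)
        log_post[r] = log_lik + log_prior
    # Normalize in a numerically stable way.
    top = max(log_post.values())
    total = sum(math.exp(lp - top) for lp in log_post.values())
    return {r: math.exp(lp - top) / total for r, lp in log_post.items()}
```

The loop visits every configuration, so its cost doubles with each additional basis vector; this is the reason approximate inference is needed when W is the number of piano keys.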

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu};\ \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu};\ \mu_\nu,\ P_\nu)$$

$$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu};\ \pi_\nu(r_{k-1,\nu})) \quad \text{(Changepoint indicator)}$$

$$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu};\ A_\nu(r_k)\,\theta_{k-1,\nu},\ Q_\nu(r_k)) \quad \text{(Latent state)}$$

$$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k;\ C_k\,\theta_{k,1:W},\ R) \quad \text{(Observation)}$$

[Graphical model: for each ν = 1, ..., W, an indicator chain r^ν_0, ..., r^ν_k, ..., r^ν_K above a state chain s^ν_0, ..., s^ν_k, ..., s^ν_K; observations y_k, ..., y_K]

Synthetic Data

[Figure: panels indexed by ν (frequency) against k, together with the signal x]

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain - state space, dynamical models
• Transform domain - Fourier representations, Generalised Linear model
• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k\, \phi_k(t)$$

[Figure: spectrograms (t/sec against f/Hz) of (Speech) and (Piano)]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)

$$X_{\nu\tau} = \sum_j W_{\nu j}\, S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive (a^2 + b^2 ≠ (a + b)^2)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: v_{k,1}, ..., v_{k,N} above s_{k,1}, ..., s_{k,N}, all pointing to x_k; plate over k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\ 0,\ v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\ \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(Responsibilities)}$$

• Each source coefficient s_n gets a fraction κ_n of the observation x
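For a single time-frequency atom, the posterior above reduces to a few lines of arithmetic. This is a minimal sketch of the stated formulas with a hypothetical function name; it takes the variances v_{k,1:N} as known, whereas in the talk they are latent and inferred.

```python
def separate_coefficient(x, v):
    """Posterior of each source coefficient given the mixture x and variances v.

    Returns (kappa, means, variances) with kappa_n = v_n / sum(v),
    mean_n = kappa_n * x and variance_n = v_n * (1 - kappa_n).
    """
    total = sum(v)
    kappa = [v_n / total for v_n in v]
    means = [k_n * x for k_n in kappa]
    variances = [v_n * (1.0 - k_n) for v_n, k_n in zip(v, kappa)]
    return kappa, means, variances
```

For example, with x = 2.0 and v = [3.0, 1.0] the responsibilities are 0.75 and 0.25, so the posterior means are 1.5 and 0.5 and always add up to the observation.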

One channel source separation: Poisson source model

[Graphical model: v_{k,1}, ..., v_{k,N} above s_{k,1}, ..., s_{k,N}, all pointing to x_k; plate over k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\ v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x on 0 to 5, one for the Gamma and one for the Inverse Gamma, each for several settings of a and b]

$$\mathcal{G}(x;\ a, z) \equiv \exp\left((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$$

$$\mathcal{IG}(x;\ a, z) \equiv \exp\left((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
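The two definitions translate directly into log-density functions. The sketch below follows the slide's parametrization term by term (z is a scale for the Gamma, and the Inverse Gamma is the density of the reciprocal of a Gamma variable); the function names are hypothetical.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

A quick consistency check is the change of variables x → 1/x: the value of `log_inv_gamma_pdf(x, a, z)` should equal `log_gamma_pdf(1 / x, a, z) - 2 * math.log(x)`.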

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν versus Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure, top: MDCT of piano chord 41485156; axes: Time τ (s) vs. Frequency (Hz), 0 to 4000 Hz]

[Figure, bottom: log Σ_ν v_{ν,j,τ}; axes: Time τ (s) vs. MIDI note j]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, $n = 1 \dots N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, $m = 1 \dots M$, $k = 1 \dots K$

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$
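The source part of this prior can be simulated by ancestral sampling. The following is a minimal sketch, not the authors' code. It assumes the deck's own convention that G(x; a, b) has shape a and scale b and that IG(x; a, b) means 1/x is Gamma distributed with shape a and scale b. The function name and all default hyperparameter values are illustrative assumptions. The priors on a_m and r_m are elided in the slide, so the mixing stage is left out.

```python
import random


def sample_hierarchical_prior(N=3, K=1000, a_lam=2.0, b_lam=1.0, nu=4.0, seed=0):
    """Ancestral sampling of (lambda_n, v_kn, s_kn); all defaults are illustrative."""
    rng = random.Random(seed)
    lam, v, s = [], [], []
    for _ in range(N):
        # lambda_n ~ G(a_lam, b_lam): shape a_lam, scale b_lam (assumed convention)
        lam_n = rng.gammavariate(a_lam, b_lam)
        # v_kn ~ IG(nu/2, 2/(nu*lambda_n)): draw the precision 1/v from a Gamma, then invert
        v_n = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam_n)) for _ in range(K)]
        # s_kn ~ N(0, v_kn): zero-mean Gaussian with variance v_kn
        s_n = [rng.gauss(0.0, vk ** 0.5) for vk in v_n]
        lam.append(lam_n)
        v.append(v_n)
        s.append(s_n)
    return lam, v, s
```

Under this reading the mean of 1/v_kn equals 1/λ_n, so each source's variances scatter around a single source-wide level set by λ_n.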

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes: t/sec vs. f/Hz]

Reconstructions

[Figure: reconstructed spectrograms (Speech), (Piano), (Guitar); axes: t/sec vs. f/Hz]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r over 2000 epochs]

Tempo tracking and score performance matching

• Given expressive music data (onsets / detections / spectral features)

  - Determine the position of a performance on a score

  - Determine where a human listener would clap her hand

  - Create a quantized, human-readable score

  - ...

• Online (realtime) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: grid of states with bar position m_k = 1 ... 8 on the horizontal axis and tempo level n_k = 1 ... 3 on the vertical axis; bar lengths marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1); axes: Bar Position (1 to 8) vs. Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1); axes: Bar Position vs. Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2); axes: Bar Position vs. Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3); axes: Bar Position vs. Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4); axes: Bar Position vs. Tempo Level]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) with y_5 = 0; axes: Bar Position vs. Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10); axes: Bar Position vs. Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k as a function of bar position m_k (0 to 1000), for a Triplet Rhythm and a Duplet Rhythm]
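The bar pointer slides describe an HMM over states x = (m, n) with a Poisson observation model, so online tracking reduces to the standard forward recursion. The sketch below illustrates that recursion under assumptions that are not in the slides: the pointer advances by the tempo level at each step, the tempo level moves by at most one with a hypothetical probability `p_change`, and the intensity pattern is made up. It is a toy instance, not the model of Whiteley, Cemgil and Godsill.

```python
import math


def poisson_pmf(y, mu):
    """Probability of observing count y under a Poisson with intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def bar_pointer_filter(ys, M=8, N=3, p_change=0.1, intensity=None):
    """Forward filtering p(m_k, n_k | y_1..k) on an M x N grid (assumed dynamics)."""
    if intensity is None:
        # Hypothetical rhythm pattern: onsets are likely at positions 0 and M/2.
        intensity = [2.0 if m % (M // 2) == 0 else 0.1 for m in range(M)]
    # Uniform prior over (bar position, tempo level); tempo levels are 1..N.
    alpha = {(m, n): 1.0 / (M * N) for m in range(M) for n in range(1, N + 1)}
    history = []
    for k, y in enumerate(ys):
        if k > 0:
            # Prediction step: push the filtered mass through the transition model.
            pred = dict.fromkeys(alpha, 0.0)
            for (m, n), w in alpha.items():
                m_next = (m + n) % M  # the pointer advances by the tempo level
                for dn, p in ((-1, p_change), (0, 1 - 2 * p_change), (1, p_change)):
                    n_next = min(max(n + dn, 1), N)  # clamp at the tempo boundaries
                    pred[(m_next, n_next)] += w * p
            alpha = pred
        # Update step: weight by the Poisson likelihood of the observed count.
        alpha = {x: w * poisson_pmf(y, intensity[x[0]]) for x, w in alpha.items()}
        z = sum(alpha.values())
        alpha = {x: w / z for x, w in alpha.items()}
        history.append(alpha)
    return history
```

Summing the returned dictionaries over m gives a filtered tempo marginal, and summing over n gives a filtered bar-position marginal, which is what the Filtering plots display for the full model.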

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

[Graphical model: chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3, r_0 ... r_3, with intensities λ_1 ... λ_3 and observations y_1 ... y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure, four panels over Frame Index k: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), in quarter notes per min; p(r_k | y_{1:k}) over Triplets and Duplets]

Smoothing

[Figure, four panels over Frame Index k: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), in quarter notes per min; p(r_k | y_{1:K}) over Triplets and Duplets]

Time Signature

[Figure, four panels: Observed Data (sample value vs. time/s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), in quarter notes per min; p(θ_k | z_{1:K}) over 4/4 and 3/4; horizontal axis of the lower panels: Frame Index k]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: signal x_t versus time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1 ... t_K, r_1 ... r_K, λ_1 ... λ_K and, for each frequency ν = 1 ... W, v_{ν,1} ... v_{ν,K} and s_{ν,1} ... s_{ν,K}]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν versus Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure, top: Spectrogram Data; axes: Time (s) vs. Frequency (Hz)]

[Figure, bottom: MIDI Data; axes: Score position vs. MIDI note]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure, top: log p(r_τ | s_τ); axes: Time (s) vs. MIDI note number]

[Figure, middle: Σ_i w_τ^{(i)} λ_τ^{(i)}, shown as log λ; horizontal axis: Time (s)]

[Figure, bottom: MDCT of audio (source: Daniel-Ben Pienaar); axes: Time (s) vs. Frequency (Hz)]

Summary

• “Time Domain”: Switching State Space Models

  - State space modeling

  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  - Analysis down to sample precision (if required)

  - Computationally quite heavy

• “Transform Domain”: Gamma Fields

  - Models on (orthogonal) transform coefficients; energy compaction

  - Practical: can make use of fast transforms (FFT, MDCT, ...)

  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  - Time-Frequency Energy distributions

• Ongoing Work

  - Comparison of inference methods (VB, MCMC, SMC)

  - Learning

  - Applications

    * Chord detection, Polyphonic transcription

    * Musical score guided source separation

  - Prior structures for other observation models, NMF

Exact Inference for Changepoint detection

• Exact inference is achievable in polynomial time/space

  - Intuition: When a changepoint occurs, the state vector θ is reinitialized ⇒ the number of Gaussian kernels grows only polynomially (see e.g. Barry and Hartigan 1992; Digalakis et al. 1993; O Ruanaidh and Fitzgerald 1996; Gustafsson 2000; Fearnhead 2003; Zoeter and Heskes 2006)

[Graphical model: changepoint indicators r_1 = 1, r_2 = 0, r_3 = 0, r_4 = 1, r_5 = 0; latent states θ_0 ... θ_5; observations y_1 ... y_5]

• The same structure can be exploited for the MMAP problem $\arg\max_{r_{1:k}} p(r_{1:k} | y_{1:k})$

⇒ Trajectories $r^{(i)}_{1:k}$ which are dominated in terms of conditional evidence $p(y_{1:k}, r^{(i)}_{1:k})$ can be discarded without destroying optimality
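The intuition can be made concrete with a small exact filter. This is a minimal sketch under assumptions the slide does not state: θ is a scalar level that stays constant between changepoints, changepoints occur independently with a hypothetical probability `p_cp`, a changepoint redraws θ from N(m0, P0), and observations are y_k ~ N(θ_k, R). Because a reset erases the past, one Gaussian per position of the most recent changepoint is enough, so the mixture has k components at time k rather than 2^k.

```python
import math


def gauss_pdf(y, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def changepoint_filter(ys, p_cp=0.05, m0=0.0, P0=10.0, R=0.5):
    """Exact filtering for a piecewise-constant level theta with resets.

    Returns, for each k, a list of (weight, mean, var, last_changepoint)
    tuples describing the Gaussian mixture p(theta_k | y_1..k).
    """
    comps = []  # one component per possible position of the last changepoint
    out = []
    for k, y in enumerate(ys, start=1):
        if comps:
            # r_k = 0: theta is carried over unchanged; r_k = 1: theta is reinitialised.
            pred = [(w * (1 - p_cp), m, P, j) for (w, m, P, j) in comps]
            pred.append((p_cp, m0, P0, k))
        else:
            pred = [(1.0, m0, P0, k)]  # r_1 = 1 by convention, as in the diagram
        new = []
        for w, m, P, j in pred:
            S = P + R      # predictive variance of y_k under this component
            gain = P / S   # scalar Kalman gain
            new.append((w * gauss_pdf(y, m, S), m + gain * (y - m), (1 - gain) * P, j))
        z = sum(c[0] for c in new)
        comps = [(w / z, m, P, j) for (w, m, P, j) in new]
        out.append(comps)
    return out
```

The component with the largest weight at the final step gives the most probable position of the last changepoint. Pruning low-weight components would turn this exact scheme into an approximation.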

Monophonic model (Cemgil et al 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the “mute”, “sound” × M states

[Graphical model: chains r_0 ... r_T, m_0 ... m_T, s_0 ... s_T, with observations y_1 ... y_T]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k | y_{1:k})$

[Figure: two panels over samples 100 to 1000]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: exact inference result over samples 500 to 3500]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track over frames 50 to 500]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll, frequency ν versus time k, above the signal x_k]

• Each latent changepoint process ν = 1 ... W corresponds to a “piano key”. Indicators $r_{1:W,1:K}$ encode a latent “piano roll”

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})$

$s_i | r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$

$x | s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \dots C_i \dots C_W \,]$

[Graphical model: indicators r_1 ... r_W, coefficients s_1 ... s_W, observation x]

• Generalized Linear Model: the columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman; Attias; ...)
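The 2^W growth is easy to see by brute-force enumeration. The sketch below is a deliberately reduced instance and an assumption-laden illustration: the observation x is a scalar, each basis "vector" c_i is a scalar gain, and Σ and R are scalar variances. The function name and default values are hypothetical. With the coefficients s_i integrated out, x given r is zero-mean Gaussian with variance R + Σ_i r_i c_i² Σ, so the posterior over indicator configurations follows from Bayes' rule.

```python
import itertools
import math


def posterior_over_indicators(x, c, sigma2=1.0, R=0.1, pi_on=0.2):
    """Enumerate all 2^W on/off configurations for a scalar observation x."""
    W = len(c)
    post = {}
    for r in itertools.product((0, 1), repeat=W):
        # Prior probability of this configuration (independent indicators).
        prior = 1.0
        for ri in r:
            prior *= pi_on if ri else 1.0 - pi_on
        # Marginal variance of x once the active coefficients are integrated out.
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, c))
        lik = math.exp(-0.5 * x * x / var) / math.sqrt(2 * math.pi * var)
        post[r] = prior * lik
    z = sum(post.values())
    return {r: p / z for r, p in post.items()}
```

The loop visits every configuration, which is exactly why the slide calls the computation intractable for large W: at W = 88 piano keys the dictionary would need 2^88 entries.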

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (Changepoint indicator)

$\theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k))$   (Latent state)

$y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\theta_{k,1:W}, R)$   (Observation)

[Graphical model: for each ν = 1 ... W, chains r_{ν,0} ... r_{ν,k} ... r_{ν,K} and s_{ν,0} ... s_{ν,k} ... s_{ν,K}, with observations y_k ... y_K]

Synthetic Data

[Figure: synthetic data; panels over frequency ν and time k]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable: computations with large matrices

  - Need advanced techniques from linear algebra

  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain: state space dynamical models

• Transform domain: Fourier representations, Generalised Linear Model

• Feature based

Spectrogram

• Basis functions $\phi_k(t)$, centered around a time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: spectrograms (Speech) and (Piano); axes: t/sec vs. f/Hz]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: The spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s|v)\,p(v) = \left(\prod_k p(s_k | v_k)\right) p(v)$

One channel source separation Gaussian source model

[Graphical model: variances v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N}, observation x_k, for k = 1 ... K]

$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
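The posterior above is simple enough to compute directly for one time-frequency atom. The following sketch only transcribes the two formulas; the function name is an illustrative choice.

```python
def separate_coefficient(x, variances):
    """Posterior mean and variance of each source coefficient given the mix x."""
    total = sum(variances)
    kappa = [v / total for v in variances]  # responsibilities, which sum to one
    means = [k * x for k in kappa]          # each source receives its share of x
    post_vars = [v * (1 - k) for v, k in zip(variances, kappa)]
    return kappa, means, post_vars


# Two sources with prior variances 3 and 1 sharing an observed coefficient x = 2.
kappa, means, post_vars = separate_coefficient(2.0, [3.0, 1.0])
```

Here the first source takes three quarters of the observation (posterior mean 1.5) and the second takes one quarter (posterior mean 0.5). The posterior means always add up to x, so once the variances are known, separation amounts to a soft mask.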

One channel source separation Poisson source model

[Graphical model: intensities v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N}, observation x_k, for k = 1 ... K]

$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)
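Under a Poisson model with intensities t × e, maximum-likelihood estimation of templates and excitations corresponds to minimising the generalised KL divergence, for which the Lee and Seung multiplicative updates are a standard algorithm. The sketch below uses those updates as a stand-in; the slides do not say which estimator the authors use. It assumes a strictly positive input matrix `X` (a zero reconstruction would divide by zero), and the function name, initialisation and iteration count are illustrative.

```python
import random


def kl_nmf(X, J, iters=100, seed=0):
    """Multiplicative KL-NMF updates: X[f][t] is approximated by sum_j Tm[f][j] * E[j][t]."""
    rng = random.Random(seed)
    F, T = len(X), len(X[0])
    Tm = [[rng.random() + 0.1 for _ in range(J)] for _ in range(F)]  # templates
    E = [[rng.random() + 0.1 for _ in range(T)] for _ in range(J)]   # excitations

    def recon():
        return [[sum(Tm[f][j] * E[j][t] for j in range(J)) for t in range(T)]
                for f in range(F)]

    for _ in range(iters):
        V = recon()
        for f in range(F):
            for j in range(J):
                # Template update: ratio X/V projected onto the excitations.
                num = sum(X[f][t] / V[f][t] * E[j][t] for t in range(T))
                Tm[f][j] *= num / sum(E[j])
        V = recon()
        for j in range(J):
            col = sum(Tm[f][j] for f in range(F))
            for t in range(T):
                # Excitation update: ratio X/V projected onto the templates.
                num = sum(X[f][t] / V[f][t] * Tm[f][j] for f in range(F))
                E[j][t] *= num / col
    return Tm, E
```

The updates keep both factors non-negative because they only multiply positive quantities.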

Gamma G(x a b) and Inverse Gamma IG(x a b)

[Figure: example densities p(x) on 0 ≤ x ≤ 5 for several settings of a and b; one panel for the Gamma and one for the Inverse Gamma]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
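The two definitions translate directly into log-density functions. This sketch follows the parameterisation on the slide, where the second argument z enters as z^{-1} x (a scale parameter for the Gamma); the function names are illustrative.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z, as defined on the slide."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

The two functions are linked by a change of variables: log IG(x; a, z) equals log G(1/x; a, z) minus 2 log x. In practice this means an Inverse Gamma draw can be obtained by inverting a Gamma draw with the same two parameters.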

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$v_k | z_k \sim \mathcal{IG}(v_k; a, z_k/a)$

$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$

[Graphical model: chain z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with links labelled a_z, a, a_z]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
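A draw from the chain can be generated by alternating the two conditionals. This is a minimal sketch assuming the deck's convention that x ~ IG(a, z) means 1/x is Gamma distributed with shape a and scale z. The starting value `z1` is a hypothetical choice, since the slide does not give a prior for z_1.

```python
import math
import random


def sample_inv_gamma(rng, a, z):
    """Draw x ~ IG(a, z) in the slides' convention: 1/x ~ Gamma(shape a, scale z)."""
    return 1.0 / rng.gammavariate(a, z)


def sample_gamma_chain(K=1000, a=10.0, a_z=10.0, z1=1.0, seed=0):
    """Return one draw v_1..v_K from the inverse Gamma-Markov chain."""
    rng = random.Random(seed)
    z, v = z1, []
    for _ in range(K):
        vk = sample_inv_gamma(rng, a, z / a)       # v_k | z_k
        v.append(vk)
        z = sample_inv_gamma(rng, a_z, vk / a_z)   # z_{k+1} | v_k
    return v


log_v = [math.log(x) for x in sample_gamma_chain(K=1000, a=10.0, a_z=10.0)]
```

Plotting `log_v` against k for different (a, a_z) pairs gives curves of the kind shown as typical draws. Each v_k is roughly the inverse of z_k and each z_{k+1} roughly the inverse of v_k, which is how the auxiliary variables produce positive correlation between successive variances.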

Gamma Chains typical draws

[Figure: three typical draws of log v_k for k = 1 ... 1000, with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints typical draws

[Figure: three typical draws of log v_k with changepoints for k = 1 ... 1000, with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a z_k^{-1} v_k^{-1})$   (Pairwise)

[Graphical model: chain z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with links labelled a_z, a, a_z]

$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})$   (Singletons)

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\,\xi_i^{-1}\xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left((\sum_j a_{ij} + 1)\log\xi_i^{-1}\right)$   (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

  - Markov Chain Monte Carlo, Gibbs sampler

  - Sequential Monte Carlo, Particle Filtering

• Deterministic

  - Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ...; likelihood terms p(y_1 | v_1), p(y_2 | v_2)]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k | v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ...; likelihood terms p(y_1 | v_1), p(y_2 | v_2)]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$
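Because every conditional in the chain is an Inverse Gamma, the Gibbs sweep needs only Gamma draws. The sketch below estimates the variances v_k of observed zero-mean Gaussian coefficients s_k. The full conditionals in the comments are derived here from the chain definition v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z), under the deck's convention that x ~ IG(a, z) means 1/x ~ Gamma(shape a, scale z). They are not quoted from the slides. The boundary value `v0`, the number of sweeps and the burn-in rule are hypothetical choices.

```python
import random


def gibbs_gamma_chain(s, a=10.0, a_z=10.0, v0=1.0, sweeps=200, seed=0):
    """Gibbs sampler for the variances v_k of zero-mean Gaussian coefficients s_k."""
    rng = random.Random(seed)
    K = len(s)
    v = [1.0] * K
    z = [1.0] * K
    mean_v = [0.0] * K
    kept = sweeps - sweeps // 2
    for it in range(sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k ~ IG(a_z + a, 1 / (a_z / v_{k-1} + a / v_k))
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)
        for k in range(K):
            # v_k | z_k, z_{k+1}, s_k: the Gaussian likelihood adds 1/2 to the shape
            # and s_k^2 / 2 to the rate; the last site has no z_{k+1} neighbour.
            shape = a + 0.5
            rate = a / z[k] + 0.5 * s[k] ** 2
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if it >= sweeps // 2:  # discard the first half of the sweeps as burn-in
            for k in range(K):
                mean_v[k] += v[k] / kept
    return mean_v
```

Given the v's, the z's are conditionally independent of one another, and vice versa, so each inner loop is a valid blocked update. The returned values are Monte Carlo estimates of the posterior mean variances, which could feed the responsibilities of the Gaussian source model.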

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels Noisy (X), Original (Xorg), and reconstructions labelled Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising - Music

[Figure: spectrogram panels Original, Noisy, and reconstructions labelled PF SNR853, Gibbs SNR866, VB SNR208]

“Tristram (Matt Uelmen)” + ∼ 0dB white noise


Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 26: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Monophonic model (Cemgil et al. 2006)

• We introduce a pitch label indicator m

• At each time k, the process can be in one of the {"mute", "sound"} × M states

[Graphical model: indicator chain $r_0, r_1, \ldots, r_T$; pitch labels $m_0, m_1, \ldots, m_T$; latent states $s_0, s_1, \ldots, s_T$; observations $y_1, \ldots, y_T$.]

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of $p(r_k, m_k \mid y_{1:k})$

[Figure: two panels against the time index (100 to 1000).]

• If pitch is constant, exact inference is possible

Transcription

• Detecting onsets, offsets and pitch to sample precision (Cemgil et al. 2006, IEEE TASLP)

[Figure: exact inference result over samples 500 to 3500, with sound example (S).]

Tracking Pitch Variations

• Allow m to change with k

[Figure: pitch track against the time index (50 to 500).]

• Intractable: need to resort to approximate inference (Mixture Kalman Filter / Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: piano-roll style plot of frequency index ν against time k, shown with the signal $x_k$.]

• Each latent changepoint process ν = 1, ..., W corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll". Sound examples: (S1) (S2) (S3)

Single time slice - Bayesian Variable Selection

$r_i \sim \mathcal{C}(r_i; \pi_{\text{on}}, \pi_{\text{off}})$

$s_i \mid r_i \sim [r_i = \text{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \text{on}]\, \delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \ \ldots \ C_i \ \ldots \ C_W \,]$

[Graphical model: indicators $r_1, \ldots, r_W$; coefficients $s_1, \ldots, s_W$; observation $x$.]

• Generalized Linear Model - columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
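The $2^W$-component posterior can be made concrete with a brute-force enumeration. The sketch below is a scalar simplification assumed here for illustration (a scalar observation, scalar coefficients with a shared variance `sigma2`, and an independent prior `p_on` per indicator); the function and parameter names are hypothetical and not taken from the slides.

```python
import itertools
import math

def gaussian_logpdf(x, var):
    """Log density of a zero-mean scalar Gaussian with variance `var`."""
    return -0.5 * (math.log(2.0 * math.pi * var) + x * x / var)

def indicator_posterior(x, c, sigma2, noise_var, p_on):
    """Exact posterior over all on/off configurations r (assumed scalar model).

    s_i ~ N(0, sigma2) if r_i is on, else s_i = 0;
    x = sum_i c[i] * s_i + noise, with noise ~ N(0, noise_var).
    """
    W = len(c)
    log_joint = {}
    for r in itertools.product((0, 1), repeat=W):  # 2**W configurations
        n_on = sum(r)
        log_prior = n_on * math.log(p_on) + (W - n_on) * math.log(1.0 - p_on)
        # Integrating out s: x | r ~ N(0, noise_var + sigma2 * sum of active c_i^2)
        var = noise_var + sigma2 * sum(ci * ci for ci, ri in zip(c, r) if ri)
        log_joint[r] = log_prior + gaussian_logpdf(x, var)
    m = max(log_joint.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(v - m) for v in log_joint.values())
    return {r: math.exp(v - m) / norm for r, v in log_joint.items()}

post = indicator_posterior(x=3.0, c=[1.0, 0.5, 2.0], sigma2=1.0,
                           noise_var=0.1, p_on=0.2)
```

The loop visits every configuration once, so the cost doubles with each added basis vector, which is why the enumeration is only feasible for small W.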

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$ (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$ (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$ (Observation)

[Graphical model: for each $\nu = 1, \ldots, W$, an indicator chain $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ and a state chain $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$; observations $y_k, \ldots, y_K$.]
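To see what the generative model produces, one can sample from it. The sketch below draws a single scalar process (W = 1) with a two-state indicator; all numerical values (`p_switch`, `A`, `Q`, `R`) are hypothetical choices made for illustration, not parameters from the talk.

```python
import random

def sample_switching_model(K, seed=0):
    """Sample indicators, latent states and observations for one scalar process."""
    rng = random.Random(seed)
    # Hypothetical parameters; 0 plays the role of "mute", 1 of "sound".
    p_switch = {0: 0.05, 1: 0.10}   # P(r_k != r_{k-1} | r_{k-1})
    A = {0: 0.0, 1: 0.98}           # transition coefficient A(r_k)
    Q = {0: 1e-6, 1: 0.05}          # state noise variance Q(r_k)
    R = 0.01                        # observation noise variance
    r = rng.choice((0, 1))          # r_0
    theta = rng.gauss(0.0, 1.0)     # theta_0 ~ N(0, 1)
    rs, thetas, ys = [], [], []
    for _ in range(K):
        if rng.random() < p_switch[r]:   # Markov changepoint indicator
            r = 1 - r
        # Conditionally linear-Gaussian state update given r_k
        theta = A[r] * theta + rng.gauss(0.0, Q[r] ** 0.5)
        y = theta + rng.gauss(0.0, R ** 0.5)   # observation with C_k = 1
        rs.append(r)
        thetas.append(theta)
        ys.append(y)
    return rs, thetas, ys

rs, thetas, ys = sample_switching_model(200)
```

Conditioned on the indicator sequence the model is an ordinary linear dynamical system, which is what makes Kalman-type recursions usable inside an approximate inference scheme.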

Synthetic Data

[Figure: synthetic data; panels indexed by frequency ν against time k, together with the signal x, and a sound example (S).]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable - computations with large matrices

  - Need advanced techniques from linear algebra

  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain - state space, dynamical models

• Transform domain - Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around a time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: two spectrograms, f/Hz (0 to 10000) against t/sec; panels: (Speech), (Piano).]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \ldots, v_{k,N}$; sources $s_{k,1}, \ldots, s_{k,N}$; observation $x_k$; plate over $k = 1, \ldots, K$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
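The posterior above is cheap to evaluate once the variances are known. A minimal sketch for a single time-frequency atom follows; the function name and the example numbers are illustrative assumptions.

```python
def separate_gaussian_sources(x, variances):
    """Posterior means and variances of the sources for one observed coefficient x.

    `variances` holds the prior variances v_1..v_N of the N sources.
    """
    total = sum(variances)
    kappa = [v / total for v in variances]            # responsibilities
    means = [k * x for k in kappa]                    # posterior means kappa_n * x
    post_vars = [v * (1.0 - k) for v, k in zip(variances, kappa)]
    return kappa, means, post_vars

# Two sources with prior variances 3 and 1, observed mixture coefficient x = 2
kappa, means, post_vars = separate_gaussian_sources(2.0, [3.0, 1.0])
```

The posterior means always sum to the observation, so the reconstruction is a soft split of each coefficient in proportion to the source variances.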

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \ldots, v_{k,N}$; sources $s_{k,1}, \ldots, s_{k,N}$; observation $x_k$; plate over $k = 1, \ldots, K$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
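Maximum likelihood estimation of templates and excitations under this Poisson model amounts to minimising the generalised KL divergence between the data and the product of the two factors. The sketch below uses the standard multiplicative updates for that objective (Lee and Seung), which is a technique assumed here for illustration and not necessarily the inference method used in the talk; the data matrix is made up.

```python
import math
import random

def kl_divergence(X, T, E):
    """Generalised KL divergence between X and the product T E."""
    d = 0.0
    for f, row in enumerate(X):
        for k, x in enumerate(row):
            v = sum(T[f][n] * E[n][k] for n in range(len(E)))
            d += (x * math.log(x / v) if x > 0 else 0.0) - x + v
    return d

def nmf_kl(X, N, iters=50, seed=0):
    """Multiplicative updates for templates T (F x N) and excitations E (N x K)."""
    rng = random.Random(seed)
    F, K = len(X), len(X[0])
    T = [[rng.uniform(0.5, 1.5) for _ in range(N)] for _ in range(F)]
    E = [[rng.uniform(0.5, 1.5) for _ in range(K)] for _ in range(N)]

    def model():
        return [[sum(T[f][n] * E[n][k] for n in range(N)) for k in range(K)]
                for f in range(F)]

    history = [kl_divergence(X, T, E)]
    for _ in range(iters):
        V = model()
        T = [[T[f][n] * sum(X[f][k] / V[f][k] * E[n][k] for k in range(K))
              / sum(E[n]) for n in range(N)] for f in range(F)]
        V = model()
        E = [[E[n][k] * sum(T[f][n] * X[f][k] / V[f][k] for f in range(F))
              / sum(T[f][n] for f in range(F)) for k in range(K)]
             for n in range(N)]
        history.append(kl_divergence(X, T, E))
    return T, E, history

X = [[4.0, 1.0, 4.0, 1.0],
     [2.0, 6.0, 2.0, 6.0],
     [4.0, 3.0, 4.0, 3.0]]
T, E, history = nmf_kl(X, N=2)
```

Each update keeps the factors non-negative and does not increase the divergence, so `history` can be used to monitor convergence.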

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x (0 to 5), each showing curves for several settings of the parameters a and b.]

$\mathcal{G}(x; a, z) \equiv \exp((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a))$

$\mathcal{IG}(x; a, z) \equiv \exp((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a))$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$

[Chain diagram: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with edges labelled alternately $a_z$ and $a$.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
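Draws from such a chain are easy to generate by ancestral sampling. In the parametrisation used for IG(x; a, z) in these slides, x follows IG(x; a, z) exactly when 1/x is Gamma distributed with shape a and scale z, which is what the sketch below relies on. The fixed starting value `z1` is an assumption made here; the slides do not specify the initial distribution.

```python
import math
import random

def sample_inverse_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z), i.e. 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gammavariate(a, z)

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of v_1..v_K and z_1..z_K from the chain."""
    rng = random.Random(seed)
    z = z1                                            # assumed fixed start
    v_path, z_path = [], []
    for _ in range(K):
        v = sample_inverse_gamma(rng, a, z / a)       # v_k | z_k
        z_path.append(z)
        v_path.append(v)
        z = sample_inverse_gamma(rng, a_z, v / a_z)   # z_{k+1} | v_k
    return v_path, z_path

v_path, z_path = sample_gamma_chain(K=200, a=10.0, a_z=10.0)
log_v = [math.log(v) for v in v_path]
```

Larger shape parameters give smaller steps in `log_v`, and unequal values of `a` and `a_z` introduce a systematic drift, in line with the last bullet above.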

Gamma Chains: typical draws

[Figure: three panels of $\log v_k$ (-20 to 20) against k (100 to 1000), labelled "a = 10 az = 10", "a = 10 az = 4" and "a = 10 az = 40".]

Gamma Chains with changepoints: typical draws

[Figure: three panels of $\log v_k$ (-20 to 20) against k (100 to 1000), labelled "a = 10 az = 10", "a = 10 az = 4" and "a = 10 az = 40".]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

[Chain diagram: $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$, with edges labelled alternately $a_z$ and $a$.]

$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$ (Singletons)

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$ (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

  - Markov Chain Monte Carlo, Gibbs sampler

  - Sequential Monte Carlo, Particle Filtering

• Deterministic

  - Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

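[Factor graph: chain $\psi_{0,1}$ - $z_1$ - $\psi_{1,1}$ - $v_1$ - $\psi_{1,2}$ - $z_2$ - $\psi_{2,2}$ - $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$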

VB or Gibbs

[Factor graph: chain $\psi_{0,1}$ - $z_1$ - $\psi_{1,1}$ - $v_1$ - $\psi_{1,2}$ - $z_2$ - $\psi_{2,2}$ - $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log \psi_{k,k-1} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
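Because the chain is conjugate, both Gibbs conditionals are inverse Gamma distributions and can be sampled directly. The sketch below assumes a zero-mean Gaussian observation model $p(y_k \mid v_k) = \mathcal{N}(y_k; 0, v_k)$ and a fixed boundary value `v0` in place of a prior on $z_1$; both are assumptions made here for illustration. Under them, $z_k$ given its neighbours has shape $a_z + a$ and rate $a_z / v_{k-1} + a / v_k$, and $v_k$ given its neighbours and $y_k$ has shape $a + a_z + 1/2$ and rate $a / z_k + a_z / z_{k+1} + y_k^2 / 2$.

```python
import random

def draw_ig(rng, shape, rate):
    """Inverse Gamma draw with density proportional to x**-(shape+1) * exp(-rate/x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def gibbs_gamma_chain(y, a=10.0, a_z=10.0, v0=1.0, sweeps=200, burn_in=50, seed=0):
    """Posterior mean estimate of the variances v_1..v_K by Gibbs sampling."""
    rng = random.Random(seed)
    K = len(y)
    v = [1.0] * K
    z = [1.0] * K
    v_mean = [0.0] * K
    for sweep in range(sweeps):
        # All z_k are conditionally independent given v
        for k in range(K):
            v_prev = v[k - 1] if k > 0 else v0      # assumed fixed boundary
            z[k] = draw_ig(rng, a_z + a, a_z / v_prev + a / v[k])
        # All v_k are conditionally independent given z and y
        for k in range(K):
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] ** 2
            if k + 1 < K:                           # the last v has no z_{k+1}
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = draw_ig(rng, shape, rate)
        if sweep >= burn_in:
            for k in range(K):
                v_mean[k] += v[k] / (sweeps - burn_in)
    return v_mean

# Quiet first half, loud second half
y = [0.1 * (-1) ** k for k in range(40)] + [5.0 * (-1) ** k for k in range(40)]
v_mean = gibbs_gamma_chain(y)
```

The estimated variances follow the energy of the observations while changing smoothly along the chain, which is the behaviour the prior is meant to encode.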

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels (horizontal axis 20 to 120, vertical axis 50 to 500, colour scale -14 to 0), titled "Noisy" (X), "Original" (Xorg), "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997".]

Denoising - Music

[Figure: five spectrogram panels (horizontal axis 50 to 250, vertical axis 50 to 500, colour scale -18 to 0), titled "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208".]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal - tie across time (harmonic continuity)

• Source 2: Vertical - tie across frequency (transients, percussive sounds)

[Figure: three spectrogram panels, frequency bin (ν) against time (τ): Xorg, Shor, Sver.]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -474   -328   567       -158   1546   -137
Gibbs         -45    -262   457       105    1246   161
GibbsEM       -423   -242   482       134    1313   185
Pre-trained   -404   -315   813       356    1144   464
Oracle        614    1716   658       1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              s1                      s2
              SDR    SIR    SAR       SDR    SIR    SAR
VB            -78    -622   453       -235   184    -225
Gibbs         -846   -753   693       -404   1459   -383
GibbsEM       -774   -619   462       -114   1662   -097
Pre-trained   -64    -539   695       38     1639   414
Oracle        121    329    1214      2113   3389   2137

Harmonic-Transient Decomposition

[Figure: three spectrogram panels, frequency bin (ν) against time (τ): Xorg, Shor, Sver. Two sets of sound examples: (Original) (Hor) (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: spectral template $\log \sigma_\nu$ (-12 to 0) against frequency ν (0 to 4000 Hz).]

Chord Detection

[Figure: two panels against time τ (s). Panel "MDCT of piano chord 41485156": frequency (0 to 4000 Hz). Panel $\log \sum_\nu v_{\nu,j,\tau}$: MIDI note j (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, for sources $n = 1, \ldots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, for channels $m = 1, \ldots, M$ and $k = 1, \ldots, K$

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrograms, f/Hz (0 to 10000) against t/sec; panels: (Speech), (Piano), (Guitar), (Mix), each with a sound example.]

Reconstructions

[Figure: three reconstructed spectrograms, f/Hz (0 to 10000) against t/sec; panels: (Speech), (Piano), (Guitar), each with a sound example.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three traces against Epoch (0 to 2000): a, λ and r.]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  - Determine the position of a performance on a score

  - Determine where a human listener would clap her hand

  - Create a quantized, human-readable score

  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: grid of states with bar position $m_k$ (1 to 8) on the horizontal axis and tempo level $n_k$ (1 to 3) on the vertical axis; bar lengths are marked for 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition model $p(x_2 \mid x_1)$, tempo level (1 to 5) against bar position (1 to 8).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$, tempo level against bar position.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$, tempo level against bar position.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$, tempo level against bar position.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$, tempo level against bar position.]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ with $y_5 = 0$, tempo level against bar position.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$, tempo level against bar position.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: two panels of the intensity $\mu_k$ (0 to 4) against bar position $m_k$ (0 to 1000); panels: Triplet Rhythm, Duplet Rhythm.]
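The transition and observation models combine into a standard HMM forward filter over states x = (m, n). The sketch below is a toy version under assumptions made here: a small grid, a pointer that advances by n positions per frame, a tempo that changes by one level with a small probability, and a two-valued intensity profile. All parameter names and values are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Probability of count y under a Poisson distribution with intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def bar_pointer_filter(ys, M=16, N=2, p_tempo_change=0.1,
                       mu_beat=2.0, mu_off=0.1, beat_every=4):
    """Forward filtering p(x_k | y_1:k) for a toy bar pointer model."""
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]

    def intensity(m):
        # Assumed rhythmic pattern: high intensity on every `beat_every`-th position
        return mu_beat if m % beat_every == 0 else mu_off

    def successors(m, n):
        # Pointer advances by the current tempo; tempo stays or moves one level
        moves = [(n, 1.0 - p_tempo_change),
                 (n - 1, p_tempo_change / 2.0),
                 (n + 1, p_tempo_change / 2.0)]
        for n2, p in moves:
            if not 1 <= n2 <= N:   # out-of-range moves fall back to "stay"
                n2 = n
            yield ((m + n) % M, n2), p

    belief = {s: 1.0 / len(states) for s in states}   # uniform prior p(x_1)
    filtered = []
    for y in ys:
        # Update with the Poisson likelihood, then normalise
        post = {s: belief[s] * poisson_pmf(y, intensity(s[0])) for s in states}
        total = sum(post.values())
        post = {s: p / total for s, p in post.items()}
        filtered.append(post)
        # Predict the next state distribution
        belief = {s: 0.0 for s in states}
        for (m, n), p in post.items():
            for s2, q in successors(m, n):
                belief[s2] += p * q
    return filtered

ys = [1, 0, 0, 0] * 6                      # an onset every fourth frame
filtered = bar_pointer_filter(ys)
p_slow = sum(p for (m, n), p in filtered[-1].items() if n == 1)
```

Here `p_slow` is the filtered probability of the slower tempo level after the last frame; smoothing would add a backward pass over the same quantities.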

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \ldots, n_3$; $\theta_0, \ldots, \theta_3$; $m_0, \ldots, m_3$; $r_0, \ldots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$.]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: filtering results against frame index k (0 to 450), four panels: observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:k})$ over rhythmic pattern (Triplets, Duplets).]

Smoothing

[Figure: smoothing results against frame index k (0 to 450), four panels: observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:K})$ over rhythmic pattern (Triplets, Duplets).]

Time Signature

[Figure: four panels: observed data (sample value against time in s, 0 to 12); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ over time signature (4/4, 3/4). The lower three panels are plotted against frame index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score excerpt above the audio signal $x_t$ against time t.]

Score-Performance matching - Graphical Model

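[Graphical model: chains $t_1, t_2, \ldots, t_K$; $r_1, r_2, \ldots, r_K$; $\lambda_1, \lambda_2, \ldots, \lambda_K$; and, inside a plate over $\nu = 1, \ldots, W$, variances $v_{\nu,1}, \ldots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$. An inset shows score positions 1 to 8 and the indicator $r_k$.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$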

Score-Performance matching - Signal model

[Figure: spectral template $\log \sigma_\nu$ (-12 to 0) against frequency ν (0 to 4000 Hz).]

Score-Performance matching

[Figure: two panels. Spectrogram Data: frequency (0 to 4000 Hz) against time (0 to 14 s). MIDI Data: MIDI note (55 to 85) against score position (50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against time (1 to 4 s): $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ (vertical axis $\log \lambda$, -10 to 10); MDCT of audio (source: Daniel-Ben Pienaar), frequency 0 to 4000 Hz.]

Summary

• "Time Domain" - Switching State Space Models

  - State space modeling

  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  - Analysis down to sample precision (if required)

  - Computationally quite heavy

• "Transform Domain" - Gamma Fields

  - Models on (orthogonal) transform coefficients: energy compaction

  - Practical: can make use of fast transforms (FFT, MDCT, ...)

  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  - Time-Frequency Energy distributions

• Ongoing Work

  - Comparison of inference methods (VB, MCMC, SMC)

  - Learning

  - Applications

    * Chord detection, Polyphonic transcription

    * Musical Score guided source separation

  - Prior structures for other observation models: NMF

Page 27: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Monophonic Pitch Tracking

Monophonic Pitch Tracking = Online estimation (filtering) of p(rkmk|y1k)

100 200 300 400 500 600 700 800 900 1000minus100

minus50

0

50

100 200 300 400 500 600 700 800 900 1000

5

10

15

bull If pitch is constant exact inference is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 26

Transcription

bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE

TSALP)

500 1000 1500 2000 2500 3000 3500

Exact inference (S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27

d1wav
Media File (audiowav)

Tracking Pitch Variations

bull Allow m to change with k

50 100 150 200 250 300 350 400 450 500

bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28

Factorial Generative models for Analysis of Polyphonic Audio

νfr

eque

ncy

k

x k

bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29

montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)

Single time slice - Bayesian Variable Selection

ri sim C(ri πon πoff)

si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)

x|s1W sim N (x Cs1W R)

C equiv [ C1 Ci CW ]

r1 rW

s1 sW

x

bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors

bull The exact posterior is a mixture of 2W Gaussians

bull When W is large computation of posterior features becomes intractable

bull Sparsity by construction (Olshausen and Millman Attias )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$   (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k\, \theta_{k,1:W}, R)$   (Observation)

[Graphical model: for each ν = 1, ..., W, an indicator chain $r_{\nu,0}, \dots, r_{\nu,K}$ drives a latent state chain (drawn as $s_{\nu,0}, \dots, s_{\nu,K}$); all chains jointly generate the observations $y_k, \dots, y_K$]

Synthetic Data

[Figure: synthetic data example (S), panels over ν, frequency and x against k. Audio example: audio_example.wav]

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical / acoustical
• Time domain: state space dynamical models
• Transform domain: Fourier representations, Generalised Linear model
• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around a time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k\, \phi_k(t)$

[Figure: spectrograms of (Speech) and (Piano), axes t/sec and f/Hz]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
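As a rough illustration of how the coefficients s_k and the two displayed quantities can be computed, here is a minimal STFT sketch in NumPy. The Hann window, the frame length of 256, the hop of 128 and the 1 kHz test tone are assumptions chosen for the example; they are not taken from the talk.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Coefficients s_k on a (frequency, time) grid: windowed frames, then FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T   # rows: frequency bins, columns: frames

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000.0 * t)         # one second of a 1 kHz tone

S = stft(x)
log_spec = np.log(np.abs(S) + 1e-12)       # log |s_k|, small constant avoids log(0)
power_spec = np.abs(S) ** 2                # |s_k|^2
peak_bin = int(np.argmax(power_spec.mean(axis=1)))
peak_hz = peak_bin * fs / 256              # bin index converted to Hz
```

Each column of `S` holds the coefficients of one analysis frame, so the index pair (row, column) plays the role of the atom index k(ν, τ).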


Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j}\, S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
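The non-additivity can be checked numerically. The sketch below uses random complex numbers as stand-ins for the transform coefficients of two sources (an assumption made purely for illustration) and compares the power of the sum with the sum of the powers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical coefficients of two independent sources at 1000 atoms.
s1 = rng.normal(size=1000) + 1j * rng.normal(size=1000)
s2 = rng.normal(size=1000) + 1j * rng.normal(size=1000)

power_of_sum = np.abs(s1 + s2) ** 2
sum_of_powers = np.abs(s1) ** 2 + np.abs(s2) ** 2
# |s1 + s2|^2 = |s1|^2 + |s2|^2 + 2 Re(s1 conj(s2)); the last term breaks additivity.
cross_term = 2.0 * np.real(s1 * np.conj(s2))
```

For independent zero-mean coefficients the cross term averages out, so power spectrograms add only on average, not atom by atom.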


Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$

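One channel source separation: Gaussian source model

[Graphical model: $v_{k,1}, \dots, v_{k,N}$ → $s_{k,1}, \dots, s_{k,N}$ → $x_k$, for k = 1, ..., K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x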
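The posterior above is cheap to evaluate once the variances are given. A minimal sketch, assuming the variances are stored as a K × N array `v` and the observations as a length-K vector `x` (both names and the toy numbers are illustrative):

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """Posterior mean and variance of each source coefficient given x and v."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} * x_k
    var = v * (1.0 - kappa)                    # v_{k,n} * (1 - kappa_{k,n})
    return kappa, mean, var

x = np.array([2.0, -1.0])                      # two atoms
v = np.array([[1.0, 3.0], [2.0, 2.0]])         # two sources per atom
kappa, mean, var = gaussian_source_posterior(x, v)
```

Because the responsibilities sum to one over n, the posterior means of the sources always add up to the observation, so the hard part of separation is estimating v, not applying this formula.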


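One channel source separation: Poisson source model

[Graphical model: $v_{k,1}, \dots, v_{k,N}$ → $s_{k,1}, \dots, s_{k,N}$ → $x_k$, for k = 1, ..., K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)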
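To connect the Poisson model with NMF in practice, the sketch below draws a count spectrogram from assumed templates and excitations and then fits them back with the standard multiplicative updates for the generalized KL divergence (Lee and Seung), which correspond to maximum likelihood under Poisson observations. This is a generic NMF routine written for illustration, not the inference method of the talk; the excitations are stored as an N × T array (the transpose of the slide's $e_{\tau,n}$), and all sizes and names are assumptions.

```python
import numpy as np

def generalized_kl(X, V):
    """sum X log(X / V) - X + V, with the convention 0 log 0 = 0."""
    mask = X > 0
    return np.sum(X[mask] * np.log(X[mask] / V[mask])) - X.sum() + V.sum()

def kl_nmf(X, n_components, n_iter=200, seed=0):
    """Multiplicative updates for X ~ t @ e under the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    t = rng.random((n_freq, n_components)) + 0.1    # templates t_{nu,n}
    e = rng.random((n_components, n_time)) + 0.1    # excitations e_{n,tau}
    eps = 1e-12
    for _ in range(n_iter):
        V = t @ e + eps
        t *= ((X / V) @ e.T) / (e.sum(axis=1) + eps)
        V = t @ e + eps
        e *= (t.T @ (X / V)) / (t.sum(axis=0)[:, None] + eps)
    return t, e

# Draw x_{nu,tau} = sum_n s_{nu,tau,n} with s ~ PO(t * e); the sum of independent
# Poisson variables is Poisson with the summed intensity, so sample x directly.
rng = np.random.default_rng(1)
t_true = 5.0 * rng.random((20, 2))
e_true = 5.0 * rng.random((2, 30))
X = rng.poisson(t_true @ e_true).astype(float)

t_one, e_one = kl_nmf(X, 2, n_iter=1)
t_fit, e_fit = kl_nmf(X, 2, n_iter=200)
```

Both calls start from the same random initialisation, so comparing the divergence after 1 and after 200 iterations shows the fit improving.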


Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x from 0 to 5; legends (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1) and (a, b) = (1, 1), (1, 0.5), (2, 1)]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1} x + a \log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log\Gamma(a)\right)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
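The two definitions translate directly into log-density functions. The sketch below follows the parameterisation on the slide (shape a, and z entering through $z^{-1}$); the function names are illustrative.

```python
from math import lgamma
import numpy as np

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1.0) * np.log(x) - x / z - a * np.log(z) - lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * np.log(x) - 1.0 / (z * x) - a * np.log(z) - lgamma(a)

x_grid = np.linspace(0.1, 5.0, 50)
lhs = log_inv_gamma_pdf(x_grid, 2.0, 0.5)
# If u = 1/x is Gamma distributed, x has density G(1/x; a, z) * x^(-2).
rhs = log_gamma_pdf(1.0 / x_grid, 2.0, 0.5) - 2.0 * np.log(x_grid)
```

The agreement of `lhs` and `rhs` reflects the fact that, in this parameterisation, x is inverse Gamma with parameters (a, z) exactly when 1/x is Gamma with the same parameters.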


Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$

[Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with edge labels $a_z$, $a$, $a_z$]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and $a_z$ describe coupling strength and drift of the chain
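Draws from such a chain can be simulated by alternating the two conditionals. In the sketch below the slide's IG(x; a, z), whose exponent is $-z^{-1} x^{-1}$, is sampled as (1/z) divided by a unit-scale Gamma(a) variate. The fixed starting value `z1` and all names are assumptions, since the slide does not state how $z_1$ is initialised.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw (z, v) from the inverse Gamma-Markov chain, started at z_1 = z1."""
    rng = np.random.default_rng(seed)
    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1
    for k in range(K):
        # v_k | z_k ~ IG(v_k; a, z_k / a): density proportional to
        # v^-(a+1) exp(-(a / z_k) / v), sampled as (a / z_k) / Gamma(a).
        v[k] = (a / z[k]) / rng.gamma(a)
        if k + 1 < K:
            # z_{k+1} | v_k ~ IG(z_{k+1}; a_z, v_k / a_z)
            z[k + 1] = (a_z / v[k]) / rng.gamma(a_z)
    return z, v

_, v_smooth = sample_igmc(1000, a=100.0, a_z=100.0, seed=1)
_, v_rough = sample_igmc(1000, a=2.0, a_z=2.0, seed=1)
step_smooth = np.std(np.diff(np.log(v_smooth)))
step_rough = np.std(np.diff(np.log(v_rough)))
lag1_corr = np.corrcoef(np.log(v_smooth[:-1]), np.log(v_smooth[1:]))[0, 1]
```

Each v_k is roughly 1/z_k and each z_{k+1} is roughly 1/v_k, so consecutive variances end up positively correlated; larger shape parameters tighten the coupling and make log v_k move in smaller steps.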


Gamma Chains typical draws

[Figure: typical draws of log v_k against k (k up to 1000), three panels with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints typical draws

[Figure: typical draws of log v_k against k (k up to 1000) with changepoints, three panels with a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (Pairwise)

[Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with edge labels $a_z$, $a$, $a_z$]

$\phi_{z_k} = \exp\left((a_z + a + 1) \log z_k^{-1}\right)$   (Singletons)

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$   (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo: Gibbs sampler
  - Sequential Monte Carlo: Particle Filtering
• Deterministic
  - Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: chain $z_1 - v_1 - z_2 - v_2 \cdots$ with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \dots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2), \dots$]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

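[Factor graph: chain $z_1 - v_1 - z_2 - v_2 \cdots$ with potentials $\psi_{0,1}, \psi_{1,1}, \psi_{1,2}, \psi_{2,2}, \dots$ and observation factors $p(y_1 \mid v_1), p(y_2 \mid v_2), \dots$]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$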
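Because every v_k has only z neighbours and every z_k has only v neighbours, a Gibbs sweep can update all z jointly and then all v jointly. The sketch below works this out for an inverse Gamma-Markov chain $v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$, $z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$ under two assumptions that are not on the slides: the coefficients are observed directly as $y_k \sim \mathcal{N}(0, v_k)$, and $z_1$ has an inverse Gamma prior with shape $a_z$ and scale `b0`. The full conditionals are then inverse Gamma by conjugacy; the function name and the toy data are illustrative.

```python
import numpy as np

def gibbs_igmc(y, a=10.0, a_z=10.0, b0=1.0, n_iter=300, seed=0):
    """Posterior mean of the variances v_k, averaged over the second half of the run."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.full(K, np.var(y) + 1e-6)
    v_sum = np.zeros(K)
    kept = 0
    for it in range(n_iter):
        # z_k | v_{k-1}, v_k: shape a_z + a, scale a_z / v_{k-1} + a / v_k
        # (for k = 1 the assumed prior contributes b0 instead of a_z / v_0).
        z_scale = a / v
        z_scale[1:] += a_z / v[:-1]
        z_scale[0] += b0
        z = z_scale / rng.gamma(np.full(K, a_z + a))
        # v_k | z_k, z_{k+1}, y_k: shape a + a_z + 1/2,
        # scale a / z_k + a_z / z_{k+1} + y_k^2 / 2 (no z_{K+1} term at the end).
        v_shape = np.full(K, a + a_z + 0.5)
        v_shape[-1] = a + 0.5
        v_scale = a / z + 0.5 * y ** 2
        v_scale[:-1] += a_z / z[1:]
        v = v_scale / rng.gamma(v_shape)
        if it >= n_iter // 2:
            v_sum += v
            kept += 1
    return v_sum / kept

# Toy data: a quiet segment followed by a loud one.
rng = np.random.default_rng(1)
y = np.concatenate([0.1 * rng.normal(size=100), 3.0 * rng.normal(size=100)])
v_hat = gibbs_igmc(y)
```

An inverse Gamma variate with shape α and scale β is drawn here as β divided by a unit-scale Gamma(α) variate, which is all the sampler needs.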


Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: spectrogram panels Noisy (X), Original (Xorg), and reconstructions Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]

Denoising - Music

[Figure: spectrogram panels Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]

"Tristram (Matt Uelmen)" + ~0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (horizontal): tie across time, harmonic continuity
• Source 2 (vertical): tie across frequency, transients, percussive sounds

[Figure: spectrograms Xorg, Shor and Sver, frequency bin (ν) against time (τ)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                   s1                   s2
              SDR    SIR   SAR     SDR    SIR   SAR
VB           -474   -328   567    -158   1546  -137
Gibbs         -45   -262   457     105   1246   161
GibbsEM      -423   -242   482     134   1313   185
Pre-trained  -404   -315   813     356   1144   464
Oracle        614   1716   658    1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                   s1                   s2
              SDR    SIR   SAR     SDR    SIR   SAR
VB            -78   -622   453    -235    184  -225
Gibbs        -846   -753   693    -404   1459  -383
GibbsEM      -774   -619   462    -114   1662  -097
Pre-trained   -64   -539   695      38   1639   414
Oracle        121    329  1214    2113   3389  2137

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg, Shor and Sver, frequency bin (ν) against time (τ), for two examples, each labelled (Original), (Hor), (Vert)]

Audio examples: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: signal model, log σ_ν against frequency ν/Hz from 0 to 4000]

Chord Detection

[Figure: two panels against time τ/s. Top: MDCT of piano chord 41485156, frequency/Hz from 0 to 4000. Bottom: $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note j from 40 to 65]

Multichannel Source Separation

• Hierarchical prior model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$,   n = 1, ..., N

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$,   k = 1, ..., K,  m = 1, ..., M

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$
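The middle two layers of this prior form a scale mixture of Gaussians. Reading IG(v; ν/2, 2/(νλ)) as a density with exponent $-(\nu\lambda/2)/v$, integrating out v gives a Student-t marginal for s with ν degrees of freedom and squared scale λ, a standard result for scale mixtures. The sketch below samples from the two layers for one source with fixed, illustrative values of ν and λ and compares the sample variance with the Student-t value λν/(ν − 2); the function name is hypothetical.

```python
import numpy as np

def sample_sources(n_samples, nu, lam, seed=0):
    """Draw v ~ IG(v; nu/2, 2/(nu*lam)) and s | v ~ N(0, v)."""
    rng = np.random.default_rng(seed)
    # Inverse Gamma with shape nu/2 and scale nu*lam/2, drawn as scale / Gamma(shape).
    v = (nu * lam / 2.0) / rng.gamma(nu / 2.0, size=n_samples)
    s = rng.normal(0.0, np.sqrt(v))
    return v, s

nu, lam = 10.0, 2.0                       # illustrative values
v, s = sample_sources(200_000, nu, lam)
expected_var = lam * nu / (nu - 2.0)      # variance of the Student-t marginal
```

Small ν gives heavy tails and hence sparse coefficients, while λ sets the overall level of the source.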


Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms of (Speech), (Piano), (Guitar) and (Mix), axes t/sec and f/Hz]

Audio examples: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: reconstructed spectrograms of (Speech), (Piano) and (Guitar), axes t/sec and f/Hz]

Audio examples: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch, from 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets / detections / spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized / human readable score
  - ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with tempo level n_k = 1, 2, 3 on the vertical axis and bar position m_k = 1, ..., 8 on the horizontal axis; bar boundaries marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)
• Directed arcs denote state transitions with positive probability
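A toy version of such a lattice and its filtering recursion can be sketched as follows. The specific transition rule (the position advances by the tempo level, the tempo level changes by at most one with probability `p_change`) and the Poisson intensities are assumptions made for illustration in the spirit of the bar pointer model; they are not the values used in the cited work, and all names are hypothetical.

```python
import numpy as np

def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over states (m, n), flattened as index m * N + n."""
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(N):
            m_next = (m + n + 1) % M           # tempo index n advances n + 1 positions
            for dn, p in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                n_next = min(max(n + dn, 0), N - 1)
                T[m * N + n, m_next * N + n_next] += p
    return T

def forward_filter(y, T, rate):
    """Filtered p(x_k | y_{1:k}) for Poisson counts y_k with state-dependent rate."""
    n_states = T.shape[0]
    alpha = np.full(n_states, 1.0 / n_states)  # uniform p(x_1)
    filtered = []
    for k, y_k in enumerate(y):
        if k > 0:
            alpha = alpha @ T                  # predict
        log_lik = y_k * np.log(rate) - rate    # Poisson log-likelihood up to log(y_k!)
        alpha = alpha * np.exp(log_lik - log_lik.max())
        alpha = alpha / alpha.sum()            # update and normalise
        filtered.append(alpha)
    return np.array(filtered)

M, N = 8, 3
T = bar_pointer_transition(M, N)
rate_per_position = np.where(np.arange(M) == 0, 5.0, 0.1)  # strong events on the downbeat
rate = np.repeat(rate_per_position, N)                      # same rate for every tempo level
true_m = (2 * np.arange(40)) % M                            # pointer moving 2 positions per frame
y = np.where(true_m == 0, 5, 0)                             # idealised observations
filtered = forward_filter(y, T, rate)
tempo_post = filtered[-1].reshape(M, N).sum(axis=0)
position_post = filtered[-1].reshape(M, N).sum(axis=1)
```

The tempo is never observed directly; it is inferred from how quickly the pointer has to move to line up with the observed events.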


Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1), tempo level (1 to 5) against bar position (1 to 8)]

Bar position Pointer - k = 1

[Figure: p(x_1), tempo level against bar position]

Bar position Pointer - k = 2

[Figure: p(x_2), tempo level against bar position]

Bar position Pointer - k = 3

[Figure: p(x_3), tempo level against bar position]

Bar position Pointer - k = 4

[Figure: p(x_4), tempo level against bar position]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) with y_5 = 0, tempo level against bar position]

Bar position Pointer - k = 10

[Figure: p(x_10), tempo level against bar position]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000), one panel for Triplet Rhythm and one for Duplet Rhythm]

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: filtering results against frame index k: observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), tempo in quarter notes per min; p(r_k | y_{1:k}) over Triplets and Duplets]

Smoothing

[Figure: smoothing results against frame index k: observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), tempo in quarter notes per min; p(r_k | y_{1:K}) over Triplets and Duplets]

Time Signature

[Figure: observed data, sample value against time/s; then, against frame index k: log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), tempo in quarter notes per min; p(θ_k | z_{1:K}) over 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score aligned with the audio signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; and, for each ν = 1, ..., W, variances v_{ν,1}, ..., v_{ν,K} and coefficients s_{ν,1}, ..., s_{ν,K}; r_k indexes score positions 1 to 8]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\, \lambda\, \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: signal model, log σ_ν against frequency ν/Hz from 0 to 4000]

Score-Performance matching

[Figure: two panels. Spectrogram Data: frequency/Hz (0 to 4000) against time/s (0 to 14). MIDI Data: MIDI note (55 to 85) against score position (50 to 450)]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure: three panels against time/s. $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ shown as log λ; MDCT of audio (source: Daniel-Ben Pienaar), frequency/Hz from 0 to 4000]

Summary

• "Time Domain": Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy
• "Transform Domain": Gamma Fields
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-frequency energy distributions
• Ongoing work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, polyphonic transcription
    * Musical score guided source separation
  - Prior structures for other observation models, NMF

Page 28: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Transcription

bull Detecting onsets offsets and pitch to sample precision (Cemgil et al 2006 IEEE

TSALP)

500 1000 1500 2000 2500 3000 3500

Exact inference (S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 27

d1wav
Media File (audiowav)

Tracking Pitch Variations

bull Allow m to change with k

50 100 150 200 250 300 350 400 450 500

bull Intractable need to resort to approximate inference (Mixture Kalman Filter -Rao-Blackwellized Particle Filter)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 28

Factorial Generative models for Analysis of Polyphonic Audio

νfr

eque

ncy

k

x k

bull Each latent changepoint process ν = 1 W corresponds to a ldquopiano keyrdquoIndicators r1W1K encode a latent ldquopiano rollrdquo (S1) (S2) (S3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29

montuno1wav
Media File (audiowav)
montuno2wav
Media File (audiowav)
montuno3wav
Media File (audiowav)

Single time slice - Bayesian Variable Selection

ri sim C(ri πon πoff)

si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)

x|s1W sim N (x Cs1W R)

C equiv [ C1 Ci CW ]

r1 rW

s1 sW

x

bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors

bull The exact posterior is a mixture of 2W Gaussians

bull When W is large computation of posterior features becomes intractable

bull Sparsity by construction (Olshausen and Millman Attias )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30

Factorial Switching State space model

r0ν sim C(r0ν π0ν)

θ0ν sim N (θ0ν microν Pν)

rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator

θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state

yk|θk1W sim N (yk Ckθk1W R) Observation

rν0 middot middot middot rν

k middot middot middot rνK

sν0 middot middot middot sν

k middot middot middot sνK

ν = 1 W

yk yK

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31

Synthetic Data

νx

freq

ν

ν

k

(S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32

audio_examplewav
Media File (audiowav)

Technical Difficulties

bull Inference is quite heavy

bull Vanilla Kalman filtering methods are not stable ndash computations with

large matrices

ndash Need advance techniques from linear algebra

ndash Interesting links to subspace methods

bull Hyperparameter learning is necessary

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33

Modelling levels

bull Physical - acoustical

bull Time domain ndash state space dynamical models

bull Transform domain ndash Fourier representations Generalised Linear

model

bull Feature Based

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34

Spectrogram

bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT

x(t) =sum

k

skφk(t)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

bull Spectrogram displays log |sk| or |sk|2 (of STFT)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)

Models for time-frequency Energy distributions

bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003

Virtanen 2003 Abdallah Plumbley 2004 )

Xντ = WνjSjτ

Spectrogram = Spectral Templatestimes Excitations

= times

ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36

Models for time-frequency Energy distributions

bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )

Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)

ντ

Spectrogram = Masktimes Source0 + (1minusMask)times Source1

= + +

ndash however sources do overlap in time and frequency

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37

Prior structures on time-frequency Energy distributions

bull Main Idea Spectrogram is a point estimate of the energy at a

time-frequency atom k(ν τ)

bull We place a suitable prior on the variance of transform coefficients sk

and tie the prior variances across harmonically and temporally related

time-frequency atoms

p(s|v)p(v) =

(prod

k

p(sk|vk)

)

p(v)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38

One channel source separation Gaussian source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim N (skn 0 vkn)

xk|sk1N =sumN

n=1 skn

bull Straightforward application of Bayesrsquo theorem yields

p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))

κkn = vknsum

nprime

vknprime (Responsibilities)

bull Each source coefficient sn gets a fraction κn of the observation x

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

[Factor graph: $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2 \cdots$, with observation factors $p(y_1|v_1)$, $p(y_2|v_2)$ attached to $v_1$, $v_2$.]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$
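
A minimal sketch of one Gibbs sweep for an inverse Gamma Markov chain with directly observed Gaussian coefficients $s_k \sim \mathcal{N}(0, v_k)$. It assumes the chain $v_k|z_k \sim \mathcal{IG}(a, z_k/a)$, $z_{k+1}|v_k \sim \mathcal{IG}(a_z, v_k/a_z)$, for which every full conditional is again inverse Gamma. The boundary prior on the first z (`z1_rate`) and all function names are assumptions made for illustration.

```python
import random

def sample_ig(shape, rate, rng):
    """Inverse Gamma (shape, rate) draw: reciprocal of Gamma(shape, scale=1/rate)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / rate)

def gibbs_sweep(s, v, z, a, az, rng, z1_rate=1.0):
    """One in-place sweep: all z given v, then all v given z and s.

    Lists are 0-based: v[k] couples to z[k] with strength a and to
    z[k+1] with strength az. z1_rate is an assumed fixed prior rate
    for the first z; the boundary condition is not given in the slides.
    """
    K = len(s)
    for k in range(K):
        # z_k | v_{k-1}, v_k is inverse Gamma with shape az + a
        left = az / v[k - 1] if k > 0 else z1_rate
        z[k] = sample_ig(az + a, left + a / v[k], rng)
    for k in range(K):
        # v_k | z_k, z_{k+1}, s_k: the Gaussian term adds 1/2 and s^2/2
        shape = a + 0.5
        rate = a / z[k] + 0.5 * s[k] ** 2
        if k + 1 < K:
            shape += az
            rate += az / z[k + 1]
        v[k] = sample_ig(shape, rate, rng)

rng = random.Random(0)
s = [0.1, -0.2, 0.15, 3.0, -2.5, 2.8]   # toy coefficients with a jump in scale
v = [1.0] * len(s)
z = [1.0] * len(s)
for _ in range(50):
    gibbs_sweep(s, v, z, a=2.0, az=2.0, rng=rng)
```

Because the z variables are conditionally independent given all v (and vice versa), updating each group in one pass is a valid blocked Gibbs step.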

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels (colour scale 0 to -14): X (Noisy), Xorg (Original), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997).]

Denoising - Music

[Figure: five spectrogram panels (colour scale 0 to -18): Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208).]

"Tristram (Matt Uelmen)" + ~0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: three spectrogram panels Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -474   -328   567    -158   1546   -137
Gibbs         -45    -262   457    105    1246   161
GibbsEM       -423   -242   482    134    1313   185
Pre-trained   -404   -315   813    356    1144   464
Oracle        614    1716   658    1266   1995   136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters a_z and a trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -78    -622   453    -235   184    -225
Gibbs         -846   -753   693    -404   1459   -383
GibbsEM       -774   -619   462    -114   1662   -097
Pre-trained   -64    -539   695    38     1639   414
Oracle        121    329    1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: three spectrogram panels Xorg, Shor, Sver; axes Time (τ) and Frequency Bin (ν). Audio examples for two excerpts: (Original), (Hor), (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against Frequency ν (Hz), 0 to 4000 Hz.]

Chord Detection

[Figure, top panel: MDCT of piano chord 41, 48, 51, 56; Frequency (Hz, 0 to 4000) against Time τ (s). Bottom panel: $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note j (40 to 65) against Time τ (s).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1 \dots N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad m = 1 \dots M, \quad k = 1 \dots K$$

$$a_m \sim \mathcal{N}(a_m; \cdots)$$

$$r_m \sim \mathcal{IG}(r_m; \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrogram panels (Speech), (Piano), (Guitar), (Mix); axes t/sec and f/Hz (0 to 10000 Hz).]

Reconstructions

[Figure: three reconstructed spectrogram panels (Speech), (Piano), (Guitar); axes t/sec and f/Hz (0 to 10000 Hz).]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

• Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized/human-readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
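
Once a task is cast as an HMM, the online (realtime) answer is the filtered distribution $p(x_k | y_{1:k})$. A minimal sketch of the standard forward recursion for a discrete state space; the argument layout is an assumption chosen for readability, not the talk's implementation.

```python
def forward_filter(prior, trans, lik):
    """Filtered distributions p(x_k | y_1..k) for a discrete HMM.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i), already evaluated at the observed y_k
    """
    n = len(prior)
    pred = list(prior)
    filtered = []
    for row in lik:
        # Update: multiply the prediction by the likelihood and normalise.
        post = [pred[i] * row[i] for i in range(n)]
        norm = sum(post)
        post = [p / norm for p in post]
        filtered.append(post)
        # Predict: push the filtered distribution through the transitions.
        pred = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
    return filtered

out = forward_filter(
    prior=[0.5, 0.5],
    trans=[[0.9, 0.1], [0.1, 0.9]],
    lik=[[0.8, 0.2], [0.8, 0.2]],
)
```

For long sequences the products should be carried in the log domain or renormalised as here; the offline (batch) variant adds a backward pass over the same quantities.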

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: lattice of states with bar position $m_k$ = 1 to 8 on the horizontal axis and tempo level $n_k$ = 1 to 3 on the vertical axis; markers indicate 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability
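
A minimal sketch of how such a state lattice can be turned into a transition table. The dynamics below are an assumption for illustration only: the position advances by the current tempo level and wraps at the end of the bar, and the tempo level either stays or moves to a neighbouring level with total probability `p_change`. The slides only state that arcs mark transitions with positive probability.

```python
def bar_pointer_transitions(M, N, p_change=0.1):
    """Sparse transition table over states (m, n), m in 1..M, n in 1..N.

    Returns {state: {next_state: probability}}.
    """
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            # Advance by n positions and wrap around at the bar end.
            m_next = (m + n - 1) % M + 1
            neighbours = [c for c in (n - 1, n + 1) if 1 <= c <= N]
            if neighbours:
                moves = {n: 1.0 - p_change}
                for c in neighbours:
                    moves[c] = p_change / len(neighbours)
            else:
                moves = {n: 1.0}  # single tempo level: nothing to change to
            trans[(m, n)] = {(m_next, c): p for c, p in moves.items()}
    return trans

trans = bar_pointer_transitions(M=8, N=3)
```

Keeping the table sparse matters in practice: each state has at most three successors, so a filtering step costs on the order of M times N rather than its square.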

Bar position Pointer - transition model

[Figure: three panels of $p(x_2 | x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level at successive steps: $p(x_1)$, $p(x_2)$, $p(x_3)$, $p(x_4)$, $p(x_5 | y_{1:5})$ with $y_5 = 0$, and $p(x_{10})$.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k | x_k)$: Poisson intensity

[Figure: intensity $\mu_k$ against $m_k$ (0 to 1000), two panels: Triplet Rhythm and Duplet Rhythm.]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0 \dots n_3$, $\theta_0 \dots \theta_3$, $m_0 \dots m_3$, $r_0 \dots r_3$, with intensities $\lambda_1 \dots \lambda_3$ and observations $y_1 \dots y_3$.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k | y_{1:k})$; $\log p(n_k | y_{1:k})$ in quarter notes per min (60 to 180); $p(r_k | y_{1:k})$ over Triplets and Duplets.]

Smoothing

[Figure: four panels against Frame Index k (0 to 450): Observed Data $y_k$; $\log p(m_k | y_{1:K})$; $\log p(n_k | y_{1:K})$ in quarter notes per min (60 to 180); $p(r_k | y_{1:K})$ over Triplets and Duplets.]

Time Signature

[Figure: Observed Data (sample value against time, 0 to 12 s); $\log p(m_k | z_{1:K})$; $\log p(n_k | z_{1:K})$ in quarter notes per min; $p(\theta_k | z_{1:K})$ over 4/4 and 3/4, against Frame Index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt shown with the audio signal $x_t$ against t.]

Score-Performance matching - Graphical Model

[Graphical model: chains $t_1 \dots t_K$, $r_1 \dots r_K$, $\lambda_1 \dots \lambda_K$, and for ν = 1 ... W the variances $v_{\nu,1} \dots v_{\nu,K}$ and coefficients $s_{\nu,1} \dots s_{\nu,K}$; $r_k$ indexes score positions 1 to 8.]

$$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν (Hz), 0 to 4000 Hz.]

Score-Performance matching

[Figure: Spectrogram Data, Frequency (Hz, 0 to 4000) against Time (s, 0 to 14); MIDI Data, MIDI note (55 to 85) against Score position (50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: $\log p(r_\tau | s_\tau)$, MIDI note number (60 to 80) against Time (s); $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ shown as $\log \lambda$ against Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz, 0 to 4000) against Time (s).]

Summary

• "Time Domain" - Switching State Space Models

  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields

  - Models on (orthogonal) transform coefficients: Energy compaction
  - Practical, can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  - Time-Frequency Energy distributions

• Ongoing Work

  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Tracking Pitch Variations

• Allow m to change with k

[Figure: horizontal axis k, 50 to 500.]

• Intractable, need to resort to approximate inference (Mixture Kalman Filter - Rao-Blackwellized Particle Filter)

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: latent piano roll, frequency index ν against k, above the signal $x_k$ against k.]

• Each latent changepoint process ν = 1 ... W corresponds to a "piano key". Indicators $r_{1:W,1:K}$ encode a latent "piano roll". Audio examples: (S1), (S2), (S3)

Single time slice - Bayesian Variable Selection

$$r_i \sim \mathcal{C}(r_i; \pi_{on}, \pi_{off})$$

$$s_i | r_i \sim [r_i = on]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq on]\, \delta(s_i)$$

$$x | s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$$

$$C \equiv [\, C_1 \dots C_i \dots C_W \,]$$

[Graphical model: indicators $r_1 \dots r_W$, sources $s_1 \dots s_W$, observation x.]

• Generalized Linear Model - Columns of C are the basis vectors

• The exact posterior is a mixture of $2^W$ Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)
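
The $2^W$ growth is easy to see by brute force. A minimal sketch that enumerates every on/off configuration for a toy case, assuming a scalar observation x, scalar basis weights c_i, a shared source variance `sigma2` and independent on-probabilities `pi_on`; these simplifications and all names are illustrative.

```python
import itertools
import math

def config_posterior(x, c, sigma2, R, pi_on):
    """Exact posterior over the 2^W indicator configurations.

    With the sources integrated out, x | r ~ N(0, R + sum_i r_i c_i^2 sigma2).
    """
    W = len(c)
    logw = {}
    for r in itertools.product((0, 1), repeat=W):
        var = R + sum(ri * ci * ci * sigma2 for ri, ci in zip(r, c))
        loglik = -0.5 * math.log(2.0 * math.pi * var) - 0.5 * x * x / var
        n_on = sum(r)
        logprior = n_on * math.log(pi_on) + (W - n_on) * math.log(1.0 - pi_on)
        logw[r] = loglik + logprior
    # Normalise in a numerically safe way.
    top = max(logw.values())
    norm = sum(math.exp(l - top) for l in logw.values())
    return {r: math.exp(l - top) / norm for r, l in logw.items()}

post = config_posterior(x=1.2, c=[1.0, 0.5, 2.0], sigma2=1.0, R=0.1, pi_on=0.3)
```

With W = 3 there are 8 terms; with a piano-sized W = 88 the same loop would need about 3 x 10^26 terms, which is why approximate inference is needed.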

Factorial Switching State space model

$$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$$

$$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$$

$$r_{k,\nu} | r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu})) \quad \text{Changepoint indicator}$$

$$\theta_{k,\nu} | \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\theta_{k-1,\nu}, Q_\nu(r_k)) \quad \text{Latent state}$$

$$y_k | \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R) \quad \text{Observation}$$

[Graphical model: for ν = 1 ... W, indicator chains $r^\nu_0 \cdots r^\nu_k \cdots r^\nu_K$ above state chains $s^\nu_0 \cdots s^\nu_k \cdots s^\nu_K$, with observations $y_k \dots y_K$.]

Synthetic Data

[Figure: panels indexed by ν against k, with the signal x and a frequency axis. Audio example: (S)]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable: computations with large matrices

  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain - state space, dynamical models

• Transform domain - Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms of (Speech) and (Piano); axes t/sec and f/Hz (0 to 10000 Hz).]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
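
A minimal sketch of the quantity a spectrogram displays: the log power $\log |s_k|^2$ of short-time Fourier coefficients. It uses a plain DFT from the standard library on short Hann-windowed frames to stay self-contained; frame length, hop size and the small floor added before the logarithm are illustrative choices, and a real system would use an FFT or MDCT.

```python
import cmath
import math

def stft_log_power(x, frame_len, hop):
    """Log power of a Hann-windowed DFT; one row of frequency bins per frame."""
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    rows = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [x[start + n] * window[n] for n in range(frame_len)]
        row = []
        for f in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * f * n / frame_len)
                        for n in range(frame_len))
            row.append(math.log(abs(coeff) ** 2 + 1e-12))  # floor avoids log(0)
        rows.append(row)
    return rows

# A sinusoid that falls exactly on bin 8 of a 64-point frame.
signal = [math.sin(2.0 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_log_power(signal, frame_len=64, hop=32)
peak_bins = [row.index(max(row)) for row in spec]
```

Each row holds the energies of the atoms k(ν, τ) for one time index τ, which is the grid on which the variance priors in this talk are placed.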

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 - Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s|v)\, p(v) = \Big( \prod_k p(s_k | v_k) \Big)\, p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1} \dots v_{k,N}$, sources $s_{k,1} \dots s_{k,N}$, observation $x_k$, for k = 1 ... K.]

$$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$$

$$\kappa_{k,n} = v_{k,n} \Big/ \sum_{n'} v_{k,n'} \quad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation x
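
A minimal sketch of the responsibility computation for a single time-frequency atom, assuming the variances are known; the function name and the toy numbers are illustrative.

```python
def posterior_sources(x, v):
    """Posterior of each source coefficient given x = sum of the sources.

    v[n] is the prior variance of source n. Returns the responsibilities
    kappa[n], the posterior means kappa[n] * x and the posterior variances
    v[n] * (1 - kappa[n]).
    """
    total = sum(v)
    kappa = [vn / total for vn in v]
    means = [k * x for k in kappa]
    variances = [vn * (1.0 - k) for vn, k in zip(v, kappa)]
    return kappa, means, variances

kappa, means, variances = posterior_sources(x=2.0, v=[3.0, 1.0])
```

The posterior means always add up to the observation, so the separation step itself is a Wiener-style split; all of the modelling effort goes into estimating the variances v.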

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1} \dots v_{k,N}$, sources $s_{k,1} \dots s_{k,N}$, observation $x_k$, for k = 1 ... K.]

$$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$
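
A minimal sketch of sampling from this generative model: each source draws Poisson counts with intensity template × excitation, and the observation is their sum. The Poisson sampler is Knuth's multiplication method (suitable for small intensities only); the array layout and names are assumptions for illustration.

```python
import math
import random

def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for small rates."""
    limit, k, p = math.exp(-rate), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_mixture(templates, excitations, rng):
    """templates[n][nu] and excitations[n][tau] -> (x[nu][tau], sources)."""
    n_src = len(templates)
    n_freq, n_time = len(templates[0]), len(excitations[0])
    sources = [[[sample_poisson(templates[n][nu] * excitations[n][tau], rng)
                 for tau in range(n_time)]
                for nu in range(n_freq)]
               for n in range(n_src)]
    x = [[sum(sources[n][nu][tau] for n in range(n_src))
          for tau in range(n_time)]
         for nu in range(n_freq)]
    return x, sources

rng = random.Random(1)
templates = [[1.0, 0.0], [0.5, 2.0]]               # two sources, two bins
excitations = [[1.0, 3.0, 0.0], [2.0, 0.0, 1.0]]   # two sources, three frames
x, sources = sample_mixture(templates, excitations, rng)
```

Since a sum of independent Poisson variables is Poisson, x has intensity equal to the sum over sources of template × excitation, which is the NMF reconstruction.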

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) for x from 0 to 5, each with curves for several settings of a and b.]

$$\mathcal{G}(x; a, z) \equiv \exp((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a))$$

$$\mathcal{IG}(x; a, z) \equiv \exp((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a))$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
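
A minimal sketch of the two log densities in exactly this parametrization, where z multiplies the Gamma variable as a scale and enters the inverse Gamma through $z^{-1}x^{-1}$. The function names are illustrative. The checks use the change of variables y = 1/x and a midpoint-rule integral.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# If x ~ G(a, z) then y = 1/x ~ IG(a, z): densities differ by the Jacobian 1/y^2.
a, z, y = 2.0, 1.5, 0.4
jacobian_gap = log_inv_gamma_pdf(y, a, z) - (log_gamma_pdf(1.0 / y, a, z) - 2.0 * math.log(y))

# Midpoint-rule check that G(x; a, z) integrates to one.
h = 0.01
total = sum(math.exp(log_gamma_pdf((i + 0.5) * h, a, z)) * h for i in range(6000))
```

In this convention the mean of G(x; a, z) is a·z, so a variable drawn as IG(a, z/a) has a reciprocal with mean z.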

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$$v_k | z_k \sim \mathcal{IG}(v_k; a, z_k/a)$$

$$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$$

[Diagram: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with successive edges labelled $a_z$, $a$, $a_z$.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
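
A minimal sketch of drawing one path from such a chain by ancestral sampling. It assumes the parametrization in which x ~ IG(a, z) exactly when 1/x is Gamma distributed with shape a and scale z, and it fixes the starting value `z1`, which the definition leaves open; names are illustrative.

```python
import random

def sample_ig(a, z, rng):
    """Draw x ~ IG(x; a, z): the reciprocal of a Gamma(shape a, scale z) draw."""
    return 1.0 / rng.gammavariate(a, z)

def sample_chain(K, a, az, rng, z1=1.0):
    """Return (v, z) with v[k] | z[k] ~ IG(a, z[k]/a), z[k+1] | v[k] ~ IG(az, v[k]/az)."""
    z, v = [z1], []
    for k in range(K):
        v.append(sample_ig(a, z[k] / a, rng))
        z.append(sample_ig(az, v[k] / az, rng))
    return v, z[:K]

rng = random.Random(0)
v, z = sample_chain(K=5, a=1000.0, az=1000.0, rng=rng)
```

Each z is close to the reciprocal of the preceding v, and each v close to the reciprocal of its z, so consecutive variances are positively correlated. Large shape values give a nearly constant path, while small ones let the path wander.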

Gamma Chains: typical draws

[Figure: three typical draws of $\log v_k$ against k (k = 1 to 1000), one panel each for (a = 10, a_z = 10), (a = 10, a_z = 4) and (a = 10, a_z = 40).]


Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 30: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Factorial Generative models for Analysis of Polyphonic Audio

[Figure: axes frequency ν and time index k; signal x_k]

• Each latent changepoint process ν = 1, …, W corresponds to a "piano key". The indicators r_{1:W, 1:K} encode a latent "piano roll". (S1) (S2) (S3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 29

Media files (audio/wav): montuno1.wav, montuno2.wav, montuno3.wav

Single time slice - Bayesian Variable Selection

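$r_i \sim \mathcal{C}(r_i; \pi_{\mathrm{on}}, \pi_{\mathrm{off}})$

$s_i \mid r_i \sim [r_i = \mathrm{on}]\, \mathcal{N}(s_i; 0, \Sigma) + [r_i \neq \mathrm{on}]\, \delta(s_i)$

$x \mid s_{1:W} \sim \mathcal{N}(x; C s_{1:W}, R)$

$C \equiv [\, C_1 \; \dots \; C_i \; \dots \; C_W \,]$

[Graphical model: indicators r_1, …, r_W are parents of s_1, …, s_W, which are parents of x]

• Generalized Linear Model - the columns of C are the basis vectors

• The exact posterior is a mixture of 2^W Gaussians

• When W is large, computation of posterior features becomes intractable

• Sparsity by construction (Olshausen and Millman, Attias, ...)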
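The 2^W mixture can be made concrete for a small W by brute-force enumeration. The sketch below is only an illustration under simplifying assumptions that are not on the slide: each s_i is a scalar with a shared prior variance `sigma2`, the prior on r_i is Bernoulli with probability `pi_on`, and all numerical values are made up. Integrating out s gives x | r ~ N(0, C diag(r · sigma2) Cᵀ + R), which is evaluated for every configuration.

```python
import itertools
import numpy as np

def log_gauss_zero_mean(x, cov):
    """Log density of N(x; 0, cov)."""
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + quad)

def exact_indicator_posterior(x, C, R, sigma2, pi_on):
    """Enumerate all 2^W on/off configurations r and return p(r | x)."""
    W = C.shape[1]
    configs = np.array(list(itertools.product([0, 1], repeat=W)))
    log_post = np.empty(len(configs))
    for j, r in enumerate(configs):
        # Marginal covariance of x once the active coefficients are integrated out.
        cov = (C * (r * sigma2)) @ C.T + R
        log_prior = np.sum(r * np.log(pi_on) + (1 - r) * np.log(1 - pi_on))
        log_post[j] = log_prior + log_gauss_zero_mean(x, cov)
    log_post -= np.logaddexp.reduce(log_post)  # normalise over the 2^W terms
    return configs, np.exp(log_post)

# Hypothetical toy problem: W = 4 basis vectors in D = 8 dimensions.
rng = np.random.default_rng(0)
W, D = 4, 8
C = rng.standard_normal((D, W))
R = 0.01 * np.eye(D)
r_true = np.array([1, 0, 0, 1])
s_true = r_true * rng.standard_normal(W) * 2.0
x = C @ s_true + 0.1 * rng.standard_normal(D)

configs, post = exact_indicator_posterior(x, C, R, sigma2=4.0, pi_on=0.3)
best = configs[np.argmax(post)]
```

The loop visits 16 configurations here; at W = 88 (a piano keyboard) the same loop would need 2^88 terms, which is the intractability the slide refers to.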

Factorial Switching State space model

$r_{0,\nu} \sim \mathcal{C}(r_{0,\nu}; \pi_{0,\nu})$

$\theta_{0,\nu} \sim \mathcal{N}(\theta_{0,\nu}; \mu_\nu, P_\nu)$

$r_{k,\nu} \mid r_{k-1,\nu} \sim \mathcal{C}(r_{k,\nu}; \pi_\nu(r_{k-1,\nu}))$   (Changepoint indicator)

$\theta_{k,\nu} \mid \theta_{k-1,\nu} \sim \mathcal{N}(\theta_{k,\nu}; A_\nu(r_k)\, \theta_{k-1,\nu}, Q_\nu(r_k))$   (Latent state)

$y_k \mid \theta_{k,1:W} \sim \mathcal{N}(y_k; C_k \theta_{k,1:W}, R)$   (Observation)

[Graphical model: for each ν = 1, …, W, chains r^ν_0 ⋯ r^ν_k ⋯ r^ν_K and s^ν_0 ⋯ s^ν_k ⋯ s^ν_K; observations y_k, …, y_K]
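A generative model of this form can be simulated directly, which is a useful sanity check before attempting inference. The sketch below is an assumption-laden illustration, not the parameterisation used in the talk: each latent state is a two-dimensional damped oscillator, A_ν(r) is taken to be a rotation scaled by a damping factor that depends on the indicator, the indicators follow a symmetric two-state Markov chain, and C_k sums the first coordinate of every oscillator. All names and default values are hypothetical.

```python
import numpy as np

def rotation(omega):
    """2x2 rotation matrix for an oscillator with angular frequency omega (radians per sample)."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s], [s, c]])

def simulate_switching_ssm(K, omegas, p_flip=0.01, rho_on=0.999, rho_off=0.9,
                           q=1e-4, obs_var=1e-3, seed=0):
    """Draw indicators r (K x W) and observations y (K,) from the assumed model."""
    rng = np.random.default_rng(seed)
    W = len(omegas)
    A = [rotation(w) for w in omegas]
    r = np.zeros((K, W), dtype=int)
    r[0] = rng.random(W) < 0.5              # initial indicators
    theta = rng.standard_normal((W, 2))     # initial latent states
    y = np.empty(K)
    for k in range(K):
        if k > 0:
            flip = rng.random(W) < p_flip   # Markov switching of each indicator
            r[k] = np.where(flip, 1 - r[k - 1], r[k - 1])
            for nu in range(W):
                # Assumed form of A_nu(r_k): weak damping when "on", strong damping when "off".
                rho = rho_on if r[k, nu] == 1 else rho_off
                theta[nu] = rho * A[nu] @ theta[nu] + np.sqrt(q) * rng.standard_normal(2)
        # Assumed C_k: sum the first coordinate of every oscillator.
        y[k] = theta[:, 0].sum() + np.sqrt(obs_var) * rng.standard_normal()
    return r, y

omegas = 2 * np.pi * np.array([220.0, 330.0, 440.0]) / 8000.0
r, y = simulate_switching_ssm(K=500, omegas=omegas)
```

The returned array `r` plays the role of the latent piano roll, and `y` is the corresponding audio-like signal.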

Synthetic Data

[Figure: synthetic data example; panels indexed by frequency ν and time index k, with signal x] (S)

Media file (audio/wav): audio_example.wav

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable - computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain - state space dynamical models

• Transform domain - Fourier representations, Generalised Linear model

• Feature based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: spectrograms of (Speech) and (Piano); axes t/sec and f/Hz]

• Spectrogram displays log |s_k| or |s_k|^2 (of STFT)
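As a minimal sketch of how such a display is computed, the code below frames a signal, applies a Hann window and takes the FFT of each frame, then forms both log |s_k| and |s_k|^2. The frame length, hop size, sampling rate and the 1000 Hz test tone are arbitrary choices made for the illustration.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform; returns an array of shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)],
        axis=1,
    )
    return np.fft.rfft(frames, axis=0)

fs = 8000                                # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                   # one second of audio
x = np.sin(2 * np.pi * 1000.0 * t)       # test tone at 1000 Hz

S = stft(x)                              # coefficients s_k, indexed by (frequency, time)
log_magnitude = np.log(np.abs(S) + 1e-12)
power = np.abs(S) ** 2

peak_bin = int(np.argmax(power.mean(axis=1)))
peak_hz = peak_bin * fs / 256            # centre frequency of the strongest bin
```

Either `log_magnitude` or `power` can be passed to an image plot to obtain the familiar time-frequency picture.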

Media files (audio/wav): s1.wav, s2.wav

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

[Figure: a spectrogram shown as the product of spectral templates and excitations]

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

[Figure: a spectrogram shown as the sum of masked sources]

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of the transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\, p(v) = \left( \prod_k p(s_k \mid v_k) \right) p(v)$

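One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1}, …, v_{k,N} are parents of sources s_{k,1}, …, s_{k,N}, which are parents of x_k; plate over k = 1, …, K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x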
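The posterior above is cheap to evaluate once the variances are known. The following sketch computes the responsibilities, posterior means and posterior variances for all coefficients at once; the toy variances and the function name are invented for the example.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior over sources given the mixture x (K,) and prior variances v (K, N)."""
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # posterior means kappa_{k,n} x_k
    var = v * (1.0 - kappa)                    # posterior variances v_{k,n} (1 - kappa_{k,n})
    return kappa, mean, var

# Hypothetical example with K = 6 coefficients and N = 3 sources.
rng = np.random.default_rng(0)
K, N = 6, 3
v = rng.gamma(2.0, 1.0, size=(K, N))
s = rng.standard_normal((K, N)) * np.sqrt(v)
x = s.sum(axis=1)

kappa, mean, var = separate_gaussian(x, v)
```

Because the responsibilities sum to one over n, the posterior means always add up to the observed coefficient, so the reconstruction is exact on the mixture even when the individual sources are uncertain.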

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1}, …, v_{k,N} are parents of sources s_{k,1}, …, s_{k,N}, which are parents of x_k; plate over k = 1, …, K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)
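Under this Poisson model, maximising the likelihood of X with respect to templates and excitations is equivalent to minimising the generalised Kullback-Leibler divergence between X and the product of the two factors. A common way to do this, used here as an illustrative choice rather than as the method of the talk, is the multiplicative update scheme of Lee and Seung. Function and variable names are hypothetical.

```python
import numpy as np

def kl_divergence(X, V):
    """Generalised KL divergence between non-negative arrays X and V."""
    return np.sum(X * np.log((X + 1e-12) / (V + 1e-12)) - X + V)

def nmf_kl(X, n_components, n_iter=200, seed=0):
    """Factorise X (F x T) into templates (F x n) and excitations (n x T)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    templates = rng.random((F, n_components)) + 0.1
    excitations = rng.random((n_components, T)) + 0.1
    history = []
    for _ in range(n_iter):
        V = templates @ excitations + 1e-12
        templates *= ((X / V) @ excitations.T) / excitations.sum(axis=1)
        V = templates @ excitations + 1e-12
        excitations *= (templates.T @ (X / V)) / templates.sum(axis=0)[:, None]
        history.append(kl_divergence(X, templates @ excitations))
    return templates, excitations, history

# Synthetic data drawn from the Poisson model with two sources.
rng = np.random.default_rng(0)
T_true = rng.gamma(1.0, 1.0, size=(20, 2))
E_true = rng.gamma(1.0, 5.0, size=(2, 30))
X = rng.poisson(T_true @ E_true).astype(float)

templates, excitations, history = nmf_kl(X, n_components=2)
```

The list `history` records the divergence after each sweep, which is a convenient way to monitor convergence.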

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: Gamma and inverse Gamma densities p(x) against x for several settings of a and b]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a)\right)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity and inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, …, K as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$

[Graphical model: chain z_1 ⋯ v_{k-1} → z_k → v_k → z_{k+1} ⋯, with edges into z labelled a_z and edges into v labelled a]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
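Draws from such a chain are easy to generate by ancestral sampling. The sketch below relies on the parameterisation IG(x; a, z) ∝ x^{-(a+1)} exp(-1/(z x)), under which 1/x follows a Gamma distribution with shape a and scale z. The starting value `z1`, the seed and the function names are assumptions made for the example.

```python
import numpy as np

def sample_inverse_gamma(rng, a, z):
    """Draw x with 1/x ~ Gamma(shape=a, scale=z), i.e. x ~ IG(x; a, z) in the notation above."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sampling of z_1, v_1, z_2, v_2, ..., z_K, v_K."""
    rng = np.random.default_rng(seed)
    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1                                   # assumed fixed starting value
    for k in range(K):
        v[k] = sample_inverse_gamma(rng, a, z[k] / a)
        if k + 1 < K:
            z[k + 1] = sample_inverse_gamma(rng, a_z, v[k] / a_z)
    return z, v

z, v = sample_gamma_chain(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v)
lag1_corr = np.corrcoef(log_v[:-1], log_v[1:])[0, 1]
```

With large shape parameters the sequence log v_k behaves like a slowly drifting random walk, so neighbouring variances are strongly positively correlated; smaller values give rougher paths.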

Gamma Chains: typical draws

[Figure: three panels of log v_k against k = 1, …, 1000, labelled a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints: typical draws

[Figure: three panels of log v_k against k = 1, …, 1000, labelled a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (Pairwise)

$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$   (Singletons)

[Graphical model: chain z_1 ⋯ v_{k-1} → z_k → v_k → z_{k+1} ⋯, with edges into z labelled a_z and edges into v labelled a]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{i,j} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right) \log \xi_i^{-1}\right)$   (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

[Factor graph: ψ_{0,1} - z_1 - ψ_{1,1} - v_1 - ψ_{1,2} - z_2 - ψ_{2,2} - v_2 - ⋯, with likelihood factors p(y_1 | v_1) and p(y_2 | v_2) attached to v_1 and v_2]

Update for v_k:

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

Update for z_k:

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \langle \log \psi_{k-1,k} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
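The Gibbs updates have closed forms because every full conditional is inverse Gamma. The sketch below works them out for one concrete case that is an assumption of this example: a zero-mean Gaussian observation y_k with variance v_k, the chain v_k | z_k ~ IG(a, z_k/a), z_{k+1} | v_k ~ IG(a_z, v_k/a_z), and a fixed hypothetical value `v0` standing in for the factor in front of z_1. Collecting the terms in 1/v_k and 1/z_k gives shape a + a_z + 1/2 and rate a/z_k + a_z/z_{k+1} + y_k^2/2 for v_k (without the z_{k+1} terms at k = K), and shape a_z + a and rate a_z/v_{k-1} + a/v_k for z_k.

```python
import numpy as np

def draw_inv_gamma(rng, shape, rate):
    """Standard inverse Gamma draw: density proportional to x^-(shape+1) * exp(-rate / x)."""
    return 1.0 / rng.gamma(shape, 1.0 / rate)

def gibbs_gamma_chain(y, a, a_z, v0=1.0, n_sweeps=200, seed=0):
    """Blocked Gibbs sampler; returns the posterior mean of v over the second half of the sweeps."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.maximum(y ** 2, 1e-3)                 # crude initialisation
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                        # last variance has no successor z_{K+1}
    v_sum = np.zeros(K)
    n_kept = 0
    for sweep in range(n_sweeps):
        # All z_k are conditionally independent given v, so they are drawn together.
        v_prev = np.concatenate(([v0], v[:-1]))
        z = draw_inv_gamma(rng, a_z + a, a_z / v_prev + a / v)
        # All v_k are conditionally independent given z and y.
        rate_v = a / z + 0.5 * y ** 2
        rate_v[:-1] += a_z / z[1:]
        v = draw_inv_gamma(rng, shape_v, rate_v)
        if sweep >= n_sweeps // 2:
            v_sum += v
            n_kept += 1
    return v_sum / n_kept

# Hypothetical data: a quiet segment followed by a loud one.
rng = np.random.default_rng(1)
v_true = np.concatenate([np.full(100, 0.01), np.full(100, 4.0)])
y = np.sqrt(v_true) * rng.standard_normal(200)
v_hat = gibbs_gamma_chain(y, a=10.0, a_z=10.0)
```

The estimate `v_hat` is a smoothed energy envelope: the coupling through z pulls neighbouring variances together while the data term lets the envelope follow the change in level.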

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels labelled X (Noisy), Xorg (Original), Xh SNR1998, Xv SNR2079, Xb SNR1968 and Xg SNR1997]

Denoising - Music

[Figure: five spectrogram panels labelled Original, Noisy, PF SNR853, Gibbs SNR866 and VB SNR208]

"Tristram (Matt Uelmen)" + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal - tie across time, harmonic continuity

• Source 2: Vertical - tie across frequency, transients and percussive sounds

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              |        s1         |        s2
              | SDR   SIR   SAR   | SDR   SIR   SAR
VB            | -474  -328  567   | -158  1546  -137
Gibbs         | -45   -262  457   | 105   1246  161
GibbsEM       | -423  -242  482   | 134   1313  185
Pre-trained   | -404  -315  813   | 356   1144  464
Oracle        | 614   1716  658   | 1266  1995  136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              |        s1         |        s2
              | SDR   SIR   SAR   | SDR   SIR   SAR
VB            | -78   -622  453   | -235  184   -225
Gibbs         | -846  -753  693   | -404  1459  -383
GibbsEM       | -774  -619  462   | -114  1662  -097
Pre-trained   | -64   -539  695   | 38    1639  414
Oracle        | 121   329   1214  | 2113  3389  2137

Harmonic-Transient Decomposition

[Figure: two rows of panels Xorg (Original), Shor (Hor) and Sver (Vert); axes Time (τ) and Frequency Bin (ν)]

Media files (audio/wav): s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν/Hz, from 0 to 4000 Hz]

Chord Detection

[Figure: top panel, MDCT of piano chord 41485156 (Frequency/Hz against Time τ/s); bottom panel, log Σ_ν v_{ν,j,τ} (MIDI note j against Time τ/s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$,   n = 1, …, N

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$,   m = 1, …, M,   k = 1, …, K

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms of (Speech), (Piano), (Guitar) and (Mix); axes t/sec and f/Hz]

Media files (audio/wav): s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: spectrograms of the reconstructed (Speech), (Piano) and (Guitar); axes t/sec and f/Hz]

Media files (audio/wav): var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, from 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1, …, 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lines for 3/4 time and 4/4 time are marked]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over Bar Position and Tempo Level: p(x_1), p(x_2), p(x_3), p(x_4), p(x_5 | y_{1:5}) with y_5 = 0, and p(x_10)]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity μ_k against bar position m_k (0 to 1000) for a Triplet Rhythm and for a Duplet Rhythm]
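Since the bar pointer lives on a finite grid, filtering reduces to the standard HMM forward recursion with a Poisson likelihood. The sketch below is a simplified illustration and not the exact model of Whiteley, Cemgil and Godsill: it assumes that the position advances by the current tempo level and wraps at the bar line, that the tempo level moves to a neighbouring level with probability `p_change`, and that the Poisson intensity depends on the bar position only. The intensity pattern and the observation sequence are made up.

```python
import math
import numpy as np

def predict(alpha, p_change):
    """One step of the assumed transition model on the (position, tempo) grid."""
    M, N = alpha.shape
    moved = np.zeros_like(alpha)
    for j in range(N):
        # Tempo level j + 1 advances the position by j + 1 grid points, wrapping at the bar line.
        moved[:, j] = np.roll(alpha[:, j], j + 1)
    out = (1.0 - p_change) * moved
    for j in range(N):
        for dj in (-1, 1):
            target = min(max(j + dj, 0), N - 1)   # moves beyond the limits stay at the limit
            out[:, target] += 0.5 * p_change * moved[:, j]
    return out

def poisson_loglik(y, mu):
    """log PO(y; mu) for an integer count y and an array of intensities mu."""
    return y * np.log(mu) - mu - math.lgamma(y + 1)

def forward_filter(y, mu, n_tempi, p_change=0.1):
    """Return the filtering densities p(x_k | y_1:k) as an array of shape (K, M, n_tempi)."""
    M = len(mu)
    alpha = np.full((M, n_tempi), 1.0 / (M * n_tempi))   # uniform prior over states
    filtered = []
    for k, yk in enumerate(y):
        if k > 0:
            alpha = predict(alpha, p_change)
        alpha = alpha * np.exp(poisson_loglik(yk, mu))[:, None]
        alpha = alpha / alpha.sum()
        filtered.append(alpha)
    return np.array(filtered)

# Hypothetical bar with 16 grid points: strong downbeat, weaker mid-bar beat.
M = 16
mu = np.full(M, 0.05)
mu[0], mu[8] = 3.0, 1.0
positions = (2 * np.arange(64)) % M            # performance played at tempo level 2
y = np.where(positions == 0, 3, np.where(positions == 8, 1, 0))

filtered = forward_filter(y, mu, n_tempi=3)
tempo_marginal = filtered[-1].sum(axis=0)      # p(n_K | y_1:K)
```

Summing the filtering density over positions gives the tempo marginal, and summing over tempo levels gives the position marginal; a backward pass over the same grid would give the smoothed quantities.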

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, …, n_3; θ_0, …, θ_3; m_0, …, m_3; r_0, …, r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against Frame Index k: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: Observed Data (sample value against time/s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4; the last three against Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: audio signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: plate over ν = 1, …, W; chains t_1, t_2, …, t_K; r_1, r_2, …, r_K; λ_1, λ_2, …, λ_K; v_{ν,1}, v_{ν,2}, …, v_{ν,K}; s_{ν,1}, s_{ν,2}, …, s_{ν,K}. Inset: score positions 1 to 8 indexed by r_k]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν/Hz, from 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data (Frequency/Hz against Time/s) and MIDI Data (MIDI note against Score position)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time/s: log p(r_τ | s_τ) over MIDI note number; Σ_i w^{(i)}_τ λ^{(i)}_τ with axis log λ; MDCT of audio (source: Daniel-Ben Pienaar), Frequency/Hz]

Summary

• "Time Domain" - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields
  - Models on (orthogonal) transform coefficients, energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF

Page 31: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Single time slice - Bayesian Variable Selection

ri sim C(ri πon πoff)

si|ri sim [ri = on]N (si 0 Σ) + [ri 6= on]δ(si)

x|s1W sim N (x Cs1W R)

C equiv [ C1 Ci CW ]

r1 rW

s1 sW

x

bull Generalized Linear Model ndash Columnrsquos of C are the basis vectors

bull The exact posterior is a mixture of 2W Gaussians

bull When W is large computation of posterior features becomes intractable

bull Sparsity by construction (Olshausen and Millman Attias )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 30

Factorial Switching State space model

r0ν sim C(r0ν π0ν)

θ0ν sim N (θ0ν microν Pν)

rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator

θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state

yk|θk1W sim N (yk Ckθk1W R) Observation

rν0 middot middot middot rν

k middot middot middot rνK

sν0 middot middot middot sν

k middot middot middot sνK

ν = 1 W

yk yK

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31

Synthetic Data

(Figure: synthetic data; panels plotted over frequency index ν and time index k. An audio example accompanies the slide.)

Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable – computations with large matrices
  – Need advanced techniques from linear algebra
  – Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical – acoustical
• Time domain – state space, dynamical models
• Transform domain – Fourier representations, Generalised Linear Model
• Feature based

Spectrogram

• Basis functions $\phi_k(t)$ centered around time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

(Figure: spectrograms of two example signals, (Speech) and (Piano); axes t/sec and f/Hz.)

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
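As an illustration of how the coefficients and the displayed energies relate, here is a small self-contained sketch that computes short-time Fourier coefficients with a direct DFT and squares their magnitudes. The frame length, hop size, Hann window and test tone are all arbitrary choices made for this example, not settings taken from the talk; a real implementation would use an FFT.

```python
import cmath
import math


def stft(x, frame_len=64, hop=32):
    """Return coefficients s[tau][nu] of a Hann-windowed short-time Fourier transform."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * nu * n / frame_len)
                for n in range(frame_len))
            for nu in range(frame_len // 2 + 1)
        ])
    return frames


def power_spectrogram(coeffs):
    """Energy |s_k|^2 at every time-frequency atom."""
    return [[abs(s) ** 2 for s in frame] for frame in coeffs]


fs = 1024                                   # hypothetical sampling rate in Hz
x = [math.sin(2 * math.pi * 128 * n / fs) for n in range(512)]
P = power_spectrogram(stft(x))
peak_bin = max(range(len(P[0])), key=lambda nu: P[0][nu])
```

With these settings each frequency bin is 16 Hz wide, so the 128 Hz tone concentrates its energy around bin 8.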


Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

(Figure: a spectrogram shown as the product of a template matrix and an excitation matrix.)

– however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
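The non-additivity remark can be spelled out for two complex transform coefficients $a$ and $b$ that fall on the same time-frequency atom (the symbols follow the slide; the cross term is the standard expansion):

$$|a + b|^2 = |a|^2 + |b|^2 + 2\,\mathrm{Re}(a \bar{b})$$

The energy of the mixture equals the sum of the source energies only when the cross term $2\,\mathrm{Re}(a\bar{b})$ vanishes, which is not guaranteed when the sources overlap in time and frequency.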


Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

(Figure: a spectrogram shown as the sum of two masked source spectrograms.)

– however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big( \prod_k p(s_k \mid v_k) \Big)\, p(v)$$

One channel source separation Gaussian source model

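(Graphical model: variances $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate over $k = 1, \dots, K$.)

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$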
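The posterior above is cheap to evaluate once the variances are known. The sketch below applies the responsibility formula to a single time-frequency atom; the function name and the example numbers are made up for illustration.

```python
def separate_gaussian(x_k, variances):
    """Posterior mean and variance of each source coefficient given their sum x_k."""
    total = sum(variances)
    kappa = [v / total for v in variances]                 # responsibilities
    means = [k * x_k for k in kappa]                       # kappa_n * x_k
    post_vars = [v * (1.0 - k) for v, k in zip(variances, kappa)]
    return kappa, means, post_vars


kappa, means, post_vars = separate_gaussian(x_k=2.0, variances=[3.0, 1.0])
# The source with the larger prior variance receives the larger share of x_k,
# and the posterior means always add up to the observation.
```

All of the modelling effort therefore goes into the prior on the variances, since the separation step itself is a fixed formula.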


One channel source separation Poisson source model

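(Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$; sources $s_{k,1}, \dots, s_{k,N}$; observation $x_k$; plate over $k = 1, \dots, K$.)

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$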
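To make the generative reading concrete, the following sketch draws source counts from Poisson distributions whose intensities are template times excitation, and sums them into a mixture. The Poisson sampler uses Knuth's multiplication method, chosen here only to keep the example dependency-free; the array layout (`templates[nu][n]`, `excitations[tau][n]`) and the numbers are assumptions for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    limit, count, prod = math.exp(-rate), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count


def sample_nmf_spectrogram(templates, excitations, seed=0):
    """Return (sources s[nu][tau][n], mixture x[nu][tau]) from the Poisson model."""
    rng = random.Random(seed)
    sources, mixture = [], []
    for t_nu in templates:
        src_row, mix_row = [], []
        for e_tau in excitations:
            # intensity of source n at atom (nu, tau) is t[nu][n] * e[tau][n]
            s = [sample_poisson(t * e, rng) for t, e in zip(t_nu, e_tau)]
            src_row.append(s)
            mix_row.append(sum(s))
        sources.append(src_row)
        mixture.append(mix_row)
    return sources, mixture


templates = [[5.0, 0.0], [0.5, 4.0]]               # 2 frequency bins, 2 sources
excitations = [[1.0, 0.0], [0.2, 1.0], [0.0, 2.0]]  # 3 time frames, 2 sources
sources, mixture = sample_nmf_spectrogram(templates, excitations)
```

Because a sum of independent Poisson variables is again Poisson, each mixture entry has intensity $\sum_n t_{\nu,n} e_{\tau,n}$, which is the NMF product.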


Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

(Figure: two panels of densities p(x) against x on [0, 5]. First panel: a = 0.9, 1, 1.3 and 2, each with b = 1. Second panel: (a, b) = (1, 1), (1, 0.5) and (2, 1).)

$$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
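The two definitions translate directly into log-density functions. The sketch below follows the parametrization written on the slide, in which $z$ acts as a scale for the Gamma and $1/x$ is Gamma distributed under the Inverse Gamma; the function names are chosen for this example.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


# Change of variables: if y ~ G(a, z) then x = 1/y ~ IG(a, z),
# so log IG(x) should equal log G(1/x) - 2 log x.
x, a, z = 0.7, 2.5, 1.3
gap = log_inv_gamma_pdf(x, a, z) - (log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x))
```

The quantity `gap` should vanish up to rounding, which is a quick consistency check that the two formulas describe reciprocal variables.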


Gamma Chains

We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$$

(Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with edges labelled $a_z$, $a$, $a_z$.)

• Variance variables $v$ are priors for sources
• Auxiliary variables $z$ are needed for conjugacy and positive correlation
• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
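Draws from the chain can be generated by alternating the two conditionals. This sketch reads the conditionals as $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$ in the parametrization where $1/x$ is Gamma distributed with shape $a$ and scale $z$; fixing the first auxiliary variable to `z1 = 1.0` is an assumption made only for the example.

```python
import random


def sample_inverse_gamma(a, z, rng):
    """Draw x ~ IG(x; a, z), i.e. 1/x ~ Gamma(shape=a, scale=z)."""
    return 1.0 / rng.gammavariate(a, z)


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Sample an inverse Gamma-Markov chain z_1, v_1, z_2, v_2, ..., z_K, v_K."""
    rng = random.Random(seed)
    z, v = [z1], []
    for k in range(K):
        v.append(sample_inverse_gamma(a, z[k] / a, rng))      # v_k | z_k
        z.append(sample_inverse_gamma(a_z, v[k] / a_z, rng))  # z_{k+1} | v_k
    return z[:K], v


z, v = sample_igmc(K=1000, a=10.0, a_z=10.0)
```

Each conditional has $E[1/v_k \mid z_k] = z_k$ and $E[1/z_{k+1} \mid v_k] = v_k$, so consecutive variances stay close to each other; larger shape values tighten that coupling.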


Gamma Chains: typical draws

(Figure: three panels of $\log v_k$ against $k = 1, \dots, 1000$, for $(a, a_z)$ = (10, 10), (10, 4) and (10, 40).)

Gamma Chains with changepoints: typical draws

(Figure: three panels of $\log v_k$ against $k = 1, \dots, 1000$, for $(a, a_z)$ = (10, 10), (10, 4) and (10, 40).)

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1}) \quad \text{(Pairwise)}$$

(Graphical model: chain $z_1 \cdots v_{k-1}, z_k, v_k, z_{k+1} \cdots$ with edges labelled $a_z$, $a$, $a_z$.)

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \quad \text{(Singletons)}$$
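These potentials can be recovered by writing out the log of the two chain conditionals, reading them as $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_k; a_z, v_{k-1}/a_z)$ and dropping terms that do not depend on $v$ or $z$. The following is a sketch of that bookkeeping, not a derivation taken from the slides:

$$\log \mathcal{IG}(v_k; a, z_k/a) = (a + 1)\log v_k^{-1} - a\, z_k^{-1} v_k^{-1} + a \log z_k^{-1} + \text{const}$$

$$\log \mathcal{IG}(z_k; a_z, v_{k-1}/a_z) = (a_z + 1)\log z_k^{-1} - a_z\, v_{k-1}^{-1} z_k^{-1} + a_z \log v_{k-1}^{-1} + \text{const}$$

Collecting the terms that involve only $z_k$ gives $(a_z + 1)\log z_k^{-1} + a \log z_k^{-1} = (a_z + a + 1)\log z_k^{-1}$, which is the singleton potential. The remaining cross terms $-a\, z_k^{-1} v_k^{-1}$ and $-a_z\, v_{k-1}^{-1} z_k^{-1}$ are the pairwise potentials linking $z_k$ to its two neighbours.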


Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}) \quad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \quad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic
  – Markov Chain Monte Carlo, Gibbs sampler
  – Sequential Monte Carlo, Particle Filtering
• Deterministic
  – Variational Bayes

In all of these, conjugacy helps.

VB or Gibbs

(Factor graph: $\psi_{01}$, $z_1$, $\psi_{11}$, $v_1$, $\psi_{12}$, $z_2$, $\psi_{22}$, $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$.)

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

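(Factor graph: $\psi_{01}$, $z_1$, $\psi_{11}$, $v_1$, $\psi_{12}$, $z_2$, $\psi_{22}$, $v_2 \cdots$, with observation factors $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$.)

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k,k-1} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$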
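Because of conjugacy, both Gibbs steps reduce to inverse Gamma draws. The sketch below works this out for the simplest observation model, $y_k \sim \mathcal{N}(0, v_k)$, reading the chain conditionals as $\mathcal{IG}(v_k; a, z_k/a)$ and $\mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$. The boundary treatment (a fixed hypothetical `v0` before the first link), the initialisation and the sweep counts are assumptions of this example, not details given in the slides.

```python
import random


def gibbs_igmc(y, a=10.0, a_z=10.0, v0=1.0, n_sweeps=200, burn_in=100, seed=0):
    """Posterior mean of v_k for y_k ~ N(0, v_k) under an inverse Gamma-Markov chain prior."""
    rng = random.Random(seed)
    K = len(y)
    v = [yk * yk + 1e-3 for yk in y]   # crude initialisation from the data
    z = [1.0] * K
    mean_v = [0.0] * K
    for sweep in range(n_sweeps):
        for k in range(K):
            # z_k | v_{k-1}, v_k is inverse Gamma: shape a_z + a, rate a_z/v_{k-1} + a/v_k
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = rate / rng.gammavariate(a_z + a, 1.0)
        for k in range(K):
            # v_k | z_k, z_{k+1}, y_k is inverse Gamma; the Gaussian likelihood
            # adds 1/2 to the shape and y_k^2 / 2 to the rate
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] * y[k]
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = rate / rng.gammavariate(shape, 1.0)
        if sweep >= burn_in:
            for k in range(K):
                mean_v[k] += v[k] / (n_sweeps - burn_in)
    return mean_v


data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 0.1) for _ in range(100)] + [data_rng.gauss(0.0, 10.0) for _ in range(100)]
mean_v = gibbs_igmc(y)
```

On this toy signal the estimated variances in the second half should come out far larger than in the first half, following the jump in signal level.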


Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

(Figure: spectrogram panels labelled X (Noisy), Xorg (Original), Xh SNR1998, Xv SNR2079, Xb SNR1968 and Xg SNR1997.)

Denoising – Music

(Figure: spectrogram panels labelled Original, Noisy, PF SNR853, Gibbs SNR866 and VB SNR208.)

"Tristram (Matt Uelmen)" + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal – tie across time (harmonic continuity)
• Source 2: Vertical – tie across frequency (transients, percussive sounds)

(Figure: spectrograms Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν).)

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                 s1                     s2
                 SDR    SIR    SAR      SDR    SIR    SAR
    VB           -474   -328   567      -158   1546   -137
    Gibbs        -45    -262   457      105    1246   161
    GibbsEM      -423   -242   482      134    1313   185
    Pre-trained  -404   -315   813      356    1144   464
    Oracle       614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                 s1                     s2
                 SDR    SIR    SAR      SDR    SIR    SAR
    VB           -78    -622   453      -235   184    -225
    Gibbs        -846   -753   693      -404   1459   -383
    GibbsEM      -774   -619   462      -114   1662   -097
    Pre-trained  -64    -539   695      38     1639   414
    Oracle       121    329    1214     2113   3389   2137

Harmonic-Transient Decomposition

(Figure: spectrograms Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν). Two sets of audio examples labelled (Original), (Hor) and (Vert).)

Chord Detection - Signal model (with Paul Peeling)

(Figure: $\log \sigma_\nu$ against frequency ν in Hz, from 0 to 4000 Hz.)

Chord Detection

(Figure: top panel, MDCT of piano chord 41485156, frequency in Hz against time τ in s; bottom panel, $\log \sum_\nu v_{\nu j \tau}$ for MIDI note $j$ against time τ in s.)

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda), \quad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m), \quad m = 1, \dots, M, \quad k = 1, \dots, K$$

$$a_m \sim \mathcal{N}(a_m; \cdots)$$

$$r_m \sim \mathcal{IG}(r_m; \cdots)$$

Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall "volume" of source $n$

Source Separation

(Figure: spectrograms of the three sources (Speech), (Piano), (Guitar) and of the (Mix); axes t/sec and f/Hz. Audio examples accompany the slide.)

Reconstructions

(Figure: spectrograms of the reconstructed sources (Speech), (Piano) and (Guitar); axes t/sec and f/Hz. Audio examples accompany the slide.)

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

(Figure: sampler traces of $a$, $\lambda$ and $r$ against Epoch, from 0 to 2000.)

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized/human-readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

(Figure: grid of states with score position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; bar lengths are marked for 3/4 time and 4/4 time.)

• Each dot denotes a state $x = (m, n)$ (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

(Figure: three panels of the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).)

Bar position Pointer - k = 1

(Figure: $p(x_1)$ over Bar Position and Tempo Level.)

Bar position Pointer - k = 2

(Figure: $p(x_2)$ over Bar Position and Tempo Level.)

Bar position Pointer - k = 3

(Figure: $p(x_3)$ over Bar Position and Tempo Level.)

Bar position Pointer - k = 4

(Figure: $p(x_4)$ over Bar Position and Tempo Level.)

Bar position Pointer - k = 5

(Figure: $p(x_5 \mid y_{1:5})$ over Bar Position and Tempo Level, with $y_5 = 0$.)

Bar position Pointer - k = 10

(Figure: $p(x_{10})$ over Bar Position and Tempo Level.)

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

(Figure: Poisson intensity $\mu_k$ against $m_k$ from 0 to 1000, for a Triplet Rhythm and a Duplet Rhythm.)
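The filtering recursion on this state space is an ordinary HMM forward pass. The sketch below is a toy version under assumptions that are not stated on the slides: the pointer advances by $n$ positions per frame and wraps at the end of the bar, the tempo level moves to a neighbouring level with a small hypothetical probability `p_tempo_change`, and the Poisson intensity is high only at the start of the bar (`rate_on` versus `rate_off`).

```python
import math


def poisson_pmf(y, rate):
    """Probability of count y under a Poisson distribution with the given rate."""
    return math.exp(-rate) * rate ** y / math.factorial(y)


def bar_pointer_filter(y, M=8, N=3, p_tempo_change=0.1, rate_on=3.0, rate_off=0.2):
    """Forward filtering p(m_k, n_k | y_{1:k}) on an M x N grid of (position, tempo) states."""
    alpha = {(m, n): 1.0 / (M * N) for m in range(M) for n in range(1, N + 1)}
    history = []
    for k, y_k in enumerate(y):
        if k > 0:
            pred = dict.fromkeys(alpha, 0.0)
            for (m, n), p in alpha.items():
                m_next = (m + n) % M   # pointer advances by the tempo level
                moves = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= N]
                stay = 1.0 - p_tempo_change if moves else 1.0
                pred[(m_next, n)] += p * stay
                for n2 in moves:
                    pred[(m_next, n2)] += p * p_tempo_change / len(moves)
            alpha = pred
        # Poisson observation: high intensity at the start of the bar (m == 0)
        upd = {s: p * poisson_pmf(y_k, rate_on if s[0] == 0 else rate_off)
               for s, p in alpha.items()}
        total = sum(upd.values())
        alpha = {s: p / total for s, p in upd.items()}
        history.append(alpha)
    return history


y = [3 if k % 4 == 0 else 0 for k in range(17)]   # an event every 4 frames
history = bar_pointer_filter(y)
best_state = max(history[-1], key=history[-1].get)
```

With an event every four frames and eight positions per bar, the tempo level that returns to the bar start every four frames is $n = 2$, so the filtered mass should concentrate on that level at the bar start.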


Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

(Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; observations $y_1, y_2, y_3$.)

• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator

Filtering

(Figure, all panels against Frame Index k: Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:k})$ for Triplets and Duplets.)

Smoothing

(Figure, all panels against Frame Index k: Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:K})$ for Triplets and Duplets.)

Time Signature

(Figure: Observed Data, sample value against time in s; then, against Frame Index k: $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4.)

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

(Figure: a score excerpt and the audio signal $x_t$ against $t$.)

Score-Performance matching - Graphical Model

(Graphical model: chains $t_1, \dots, t_K$; $r_1, \dots, r_K$; $\lambda_1, \dots, \lambda_K$; and, for each $\nu = 1, \dots, W$, $v_{\nu,1}, \dots, v_{\nu,K}$ and $s_{\nu,1}, \dots, s_{\nu,K}$. Score positions $r_k$ are numbered 1 to 8.)

$$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau}; a, 1/(a \lambda \sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

(Figure: $\log \sigma_\nu$ against frequency ν in Hz, from 0 to 4000 Hz.)

Score-Performance matching

(Figure: Spectrogram Data, frequency in Hz against time in s; MIDI Data, MIDI note against score position.)

Online (filtering) or offline (smoothing) processing is possible.

Transcription

(Figure: $\log p(r_\tau \mid s_\tau)$, MIDI note number against time in s; $\log \lambda$ with the estimate $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ against time in s; MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz against time in s.)

Summary

• "Time Domain" – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• "Transform Domain" – Gamma Fields
  – Models on (orthogonal) transform coefficients, energy compaction
  – Practical, can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-Frequency Energy distributions
• Ongoing Work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  – Prior structures for other observation models, NMF

Page 32: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Factorial Switching State space model

r0ν sim C(r0ν π0ν)

θ0ν sim N (θ0ν microν Pν)

rkν|rkminus1ν sim C(rkν πν(rtminus1ν)) Changepoint indicator

θkν|θkminus1ν sim N (θkν Aν(rk)θkminus1ν Qν(rk)) Latent state

yk|θk1W sim N (yk Ckθk1W R) Observation

rν0 middot middot middot rν

k middot middot middot rνK

sν0 middot middot middot sν

k middot middot middot sνK

ν = 1 W

yk yK

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 31

Synthetic Data

νx

freq

ν

ν

k

(S)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 32

audio_examplewav
Media File (audiowav)

Technical Difficulties

bull Inference is quite heavy

bull Vanilla Kalman filtering methods are not stable ndash computations with

large matrices

ndash Need advance techniques from linear algebra

ndash Interesting links to subspace methods

bull Hyperparameter learning is necessary

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 33

Modelling levels

bull Physical - acoustical

bull Time domain ndash state space dynamical models

bull Transform domain ndash Fourier representations Generalised Linear

model

bull Feature Based

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34

Spectrogram

bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT

x(t) =sum

k

skφk(t)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

bull Spectrogram displays log |sk| or |sk|2 (of STFT)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)

Models for time-frequency Energy distributions

bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003

Virtanen 2003 Abdallah Plumbley 2004 )

Xντ = WνjSjτ

Spectrogram = Spectral Templatestimes Excitations

= times

ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36

Models for time-frequency Energy distributions

bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )

Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)

ντ

Spectrogram = Masktimes Source0 + (1minusMask)times Source1

= + +

ndash however sources do overlap in time and frequency

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37

Prior structures on time-frequency Energy distributions

bull Main Idea Spectrogram is a point estimate of the energy at a

time-frequency atom k(ν τ)

bull We place a suitable prior on the variance of transform coefficients sk

and tie the prior variances across harmonically and temporally related

time-frequency atoms

p(s|v)p(v) =

(prod

k

p(sk|vk)

)

p(v)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38

One channel source separation Gaussian source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim N (skn 0 vkn)

xk|sk1N =sumN

n=1 skn

bull Straightforward application of Bayesrsquo theorem yields

p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))

κkn = vknsum

nprime

vknprime (Responsibilities)

bull Each source coefficient sn gets a fraction κn of the observation x

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising - Music

[Figure: five log-spectrogram panels: Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: spectrograms over Time (τ) and Frequency Bin (ν): Xorg, Shor, Sver]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -474   -328   567    -158   1546   -137
Gibbs         -45    -262   457    105    1246   161
GibbsEM       -423   -242   482    134    1313   185
Pre-trained   -404   -315   813    356    1144   464
Oracle        614    1716   658    1266   1995   136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters az and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -78    -622   453    -235   184    -225
Gibbs         -846   -753   693    -404   1459   -383
GibbsEM       -774   -619   462    -114   1662   -097
Pre-trained   -64    -539   695    38     1639   414
Oracle        121    329    1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: spectrograms over Time (τ) and Frequency Bin (ν): Xorg (Original), Shor (Hor), Sver (Vert), shown for two examples]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σν versus Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure: MDCT of piano chord 41485156, Frequency (Hz) versus Time τ (s); below it, log Σν vν,j,τ for each MIDI note j versus Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \qquad k = 1, \dots, K, \quad m = 1, \dots, M$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$

Equivalent Gamma MRF

• A tree for each source

• λn can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms (f in Hz versus t in s) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]

Reconstructions

[Figure: spectrograms (f in Hz versus t in s) of the reconstructed sources (Speech), (Piano), (Guitar)]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r versus Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  - Determine the position of a performance on a score

  - Determine where a human listener would clap her hand

  - Create a quantized/human readable score

  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: grid of states with bar position mk = 1, ..., 8 on the horizontal axis and tempo level nk = 1, 2, 3 on the vertical axis; bar lengths are marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability
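A minimal sketch of one way to build such a state grid and its arcs. The specific rule used here is an assumption for illustration, not taken from the slide: the position advances by the current tempo level and wraps at the bar line, and the tempo level moves up or down by one with a hypothetical probability `p_change`.

```python
def bar_pointer_transitions(M, N, p_change=0.2):
    """Return trans[(m, n)] = {(m2, n2): prob} for positions m = 1..M, tempo levels n = 1..N.

    Assumed rule (illustrative only): advance n positions per step, wrap at the
    bar line, and let the tempo level change by +/-1 with total probability p_change.
    """
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            m2 = (m + n - 1) % M + 1  # advance n positions, wrapping at the bar line
            row = {}
            for n2, p in ((n - 1, p_change / 2), (n, 1 - p_change), (n + 1, p_change / 2)):
                n2 = min(max(n2, 1), N)  # keep the tempo level inside 1..N
                row[(m2, n2)] = row.get((m2, n2), 0.0) + p
            trans[(m, n)] = row
    return trans


trans = bar_pointer_transitions(M=8, N=3)
```

Each entry of `trans` lists only the arcs with positive probability, which mirrors the sparse arc structure of the diagram.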

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x2|x1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence: the state distribution over Bar Position and Tempo Level at successive steps: p(x1), p(x2), p(x3), p(x4), then p(x5|y1:5) after observing y5 = 0, and p(x10)]

Bar position Pointer - observation model (Poisson)

• Observation model p(yk|xk): Poisson intensity

[Figure: Poisson intensity µk as a function of bar position mk (0 to 1000), for a Triplet Rhythm and for a Duplet Rhythm]
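A minimal sketch of HMM forward filtering with a Poisson observation model, to illustrate how p(xk|y1:k) can be computed recursively. The function names, the dictionary layout and the two-state toy example are assumptions for illustration; in the bar pointer setting the states would be (m, n) pairs and the intensity would depend on the bar position.

```python
import math


def poisson_pmf(y, mu):
    """Probability of count y under a Poisson distribution with intensity mu > 0."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def forward_filter(prior, trans, intensity, ys):
    """Return the filtering distributions p(x_k | y_1:k) as a list of dicts.

    prior:     {state: prob}, the distribution of x_1
    trans:     {state: {next_state: prob}}
    intensity: {state: Poisson intensity of the observation in that state}
    """
    filtered = []
    pred = dict(prior)
    for y in ys:
        # Update: weight the prediction by the Poisson likelihood, then normalise.
        post = {x: p * poisson_pmf(y, intensity[x]) for x, p in pred.items()}
        norm = sum(post.values())
        post = {x: p / norm for x, p in post.items()}
        filtered.append(post)
        # Predict: push the filtered mass through the transition arcs.
        pred = {}
        for x, p in post.items():
            for x2, t in trans[x].items():
                pred[x2] = pred.get(x2, 0.0) + p * t
    return filtered


# Toy example: two alternating states with high and low onset intensity.
toy_trans = {"on": {"off": 1.0}, "off": {"on": 1.0}}
toy_filtered = forward_filter({"on": 0.5, "off": 0.5}, toy_trans,
                              {"on": 3.0, "off": 0.1}, [4, 0, 5])
```

Smoothing, p(xk|y1:K), would add a backward pass over the same quantities; that part is omitted here.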

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

[Graphical model over time slices k = 0, ..., 3 with variables nk, θk, mk, rk, intensities λ1, λ2, λ3 and observations y1, y2, y3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: Observed Data yk; filtering densities log p(mk|y1:k), log p(nk|y1:k) (quarter notes per min) and p(rk|y1:k) (Triplets versus Duplets), all versus Frame Index k]

Smoothing

[Figure: Observed Data yk; smoothing densities log p(mk|y1:K), log p(nk|y1:K) (quarter notes per min) and p(rk|y1:K) (Triplets versus Duplets), all versus Frame Index k]

Time Signature

[Figure: Observed Data (sample value versus time in s); log p(mk|z1:K), log p(nk|z1:K) (quarter notes per min) and p(θk|z1:K) (4/4 versus 3/4), versus Frame Index k]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: a score excerpt and the audio signal xt versus t]

Score-Performance matching - Graphical Model

[Graphical model: for k = 1, ..., K, variables tk, rk, λk and, on a plate over ν = 1, ..., W, variances vν,k and coefficients sν,k; rk is illustrated on a score with positions numbered 1 to 8]

$$v_{\nu\tau} \sim \mathcal{IG}\big(v_{\nu\tau};\, a,\, 1/(a\lambda\sigma_\nu(r_\tau))\big)$$

Score-Performance matching - Signal model

[Figure: log σν versus Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data (Frequency in Hz versus Time in s); MIDI Data (MIDI note versus Score position)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: log p(rτ|sτ) over MIDI note number versus Time (s); log λ, computed as Σi w(i)τ λ(i)τ, versus Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) versus Time (s)]

Summary

• “Time Domain” - Switching State Space Models

  - State space modeling

  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  - Analysis down to sample precision (if required)

  - Computationally quite heavy

• “Transform Domain” - Gamma Fields

  - Models on (orthogonal) transform coefficients: energy compaction

  - Practical: can make use of fast transforms (FFT, MDCT, ...)

  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  - Time-Frequency Energy distributions

• Ongoing Work

  - Comparison of inference methods (VB, MCMC, SMC)

  - Learning

  - Applications

    * Chord detection, Polyphonic transcription

    * Musical Score guided source separation

  - Prior structures for other observation models: NMF


Synthetic Data

[Figure: synthetic data example (S), frequency ν versus index k]

Technical Difficulties

• Inference is quite heavy

• Vanilla Kalman filtering methods are not stable: computations with large matrices

  - Need advanced techniques from linear algebra

  - Interesting links to subspace methods

• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical

• Time domain - state space dynamical models

• Transform domain - Fourier representations, Generalised Linear model

• Feature Based

Spectrogram

• Basis functions φk(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: spectrograms (f in Hz versus t in s) of (Speech) and (Piano)]

• Spectrogram displays log |sk| or |sk|^2 (of STFT)

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: The spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients sk and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \Big(\prod_k p(s_k \mid v_k)\Big)\, p(v)$$

One channel source separation Gaussian source model

[Graphical model: variances vk,1 ... vk,N, source coefficients sk,1 ... sk,N and observation xk, replicated for k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n})\big)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \qquad \text{(Responsibilities)}$$

• Each source coefficient sn gets a fraction κn of the observation x
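A minimal sketch of the posterior above for a single time-frequency atom, assuming the prior variances are known. The function name `separate_gaussian` is a hypothetical helper chosen for illustration.

```python
def separate_gaussian(x, v):
    """Posterior of each source coefficient given the mixture x and prior variances v.

    Returns (kappa, mean, var) with kappa_n = v_n / sum(v),
    mean_n = kappa_n * x and var_n = v_n * (1 - kappa_n).
    """
    total = sum(v)
    kappa = [vn / total for vn in v]                      # responsibilities
    mean = [kn * x for kn in kappa]                       # posterior means
    var = [vn * (1.0 - kn) for vn, kn in zip(v, kappa)]   # posterior variances
    return kappa, mean, var


kappa, mean, var = separate_gaussian(x=2.0, v=[1.0, 3.0])
```

The posterior means always add up to the observation x, so the estimated sources reconstruct the mixture exactly.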

One channel source separation Poisson source model

[Graphical model: intensities vk,1 ... vk,N, source coefficients sk,1 ... sk,N and observation xk, replicated for k = 1, ..., K]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n}), \qquad x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for the NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \qquad \text{(Template $\times$ Excitation)}$$
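A small sketch of sampling from this generative model, to make the Template × Excitation structure concrete. The nested-list layout, the function names and the use of Knuth's multiplication method for Poisson draws are choices made for this illustration; they are not part of the talk.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small intensities)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_poisson_nmf(templates, excitations, seed=0):
    """templates[nu][n] and excitations[tau][n] give (intensity, x), both indexed [nu][tau]."""
    rng = random.Random(seed)
    n_sources = len(templates[0])
    intensity, x = [], []
    for t_row in templates:
        int_row, x_row = [], []
        for e_row in excitations:
            v = [t_row[n] * e_row[n] for n in range(n_sources)]  # v_{k(nu,tau),n}
            s = [sample_poisson(rng, vn) for vn in v]            # s_{k,n} ~ PO(v_{k,n})
            int_row.append(sum(v))                               # total intensity of x_k
            x_row.append(sum(s))                                 # x_k = sum_n s_{k,n}
        intensity.append(int_row)
        x.append(x_row)
    return intensity, x


intensity, x = sample_poisson_nmf([[1.0, 0.0], [0.0, 2.0]], [[3.0, 1.0], [0.0, 5.0]])
```

The returned `intensity` is the product of the template and excitation matrices, which is the expected value of the observed counts `x`.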

Gamma G(x a b) and Inverse Gamma IG(x a b)

[Figure: Gamma densities p(x) on 0 ≤ x ≤ 5 for (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1), and inverse Gamma densities for (a, b) = (1, 1), (1, 0.5), (2, 1)]

$$\mathcal{G}(x; a, z) \equiv \exp\big((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\big)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\big((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\big)$$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
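The two definitions can be transcribed directly into log-density functions. This is a sketch in the parameterisation above (shape a, scale-like parameter z); the function names are hypothetical.

```python
import math


def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a - 1) log x - x / z - a log z - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a + 1) log x - 1 / (z x) - a log z - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


# Change of variables: if 1/x follows G(a, z), then x follows IG(a, z).
x, a, z = 0.7, 2.5, 1.3
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
```

The quantities `lhs` and `rhs` agree up to rounding, which is the relation that lets an inverse Gamma draw be produced as the reciprocal of a Gamma draw.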

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$$

[Graphical model: chain z1 ··· vk−1, zk, vk, zk+1 ···, with links labelled az, a, az]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and az describe coupling strength and drift of the chain
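A minimal sketch of drawing a path from this chain by ancestral sampling. It assumes the IG(x; a, z) parameterisation of the talk, under which 1/x is Gamma distributed with shape a and scale z, and it fixes the initial value z1 to a constant because the prior on z1 is not given here. The function names are hypothetical.

```python
import random


def sample_inv_gamma(rng, a, z):
    """Draw x from IG(x; a, z), i.e. 1/x is Gamma with shape a and scale z."""
    return 1.0 / rng.gammavariate(a, z)


def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Draw (v_1..v_K, z_1..z_K) from the inverse Gamma-Markov chain.

    z1 is held fixed (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    z = z1
    vs, zs = [], []
    for _ in range(K):
        zs.append(z)
        v = sample_inv_gamma(rng, a, z / a)       # v_k | z_k
        vs.append(v)
        z = sample_inv_gamma(rng, a_z, v / a_z)   # z_{k+1} | v_k
    return vs, zs


vs, zs = sample_igmc(K=1000, a=10.0, a_z=10.0)
```

Plotting the logarithm of `vs` against k for different values of a and az gives a feel for how the shape parameters control the smoothness and drift of the chain.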

Gamma Chains typical draws

[Figure: typical draws of log vk for k = 1, ..., 1000, with panels a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains with changepoints typical draws

[Figure: typical draws of log vk with changepoints for k = 1, ..., 1000, with panels a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$$\psi_{k,k} = \exp\big(-a\, z_k^{-1} v_k^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_{z_k} = \exp\big((a_z + a + 1)\log z_k^{-1}\big) \qquad \text{(Singletons)}$$

[Graphical model: chain z1 ··· vk−1, zk, vk, zk+1 ···, with links labelled az, a, az]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$$\psi_{ij} = \exp\big(-a_{ij}\, \xi_i^{-1} \xi_j^{-1}\big) \qquad \text{(Pairwise)}$$

$$\phi_i = \exp\Big(\big(\textstyle\sum_j a_{ij} + 1\big)\log \xi_i^{-1}\Big) \qquad \text{(Singletons)}$$

Possible Model Topologies


Approximate Inference

• Stochastic

  - Markov Chain Monte Carlo, Gibbs sampler

  - Sequential Monte Carlo, Particle Filtering

• Deterministic

  - Variational Bayes

In all these, conjugacy helps

VB or Gibbs

[Graphical model: chain ψ01, z1, ψ11, v1, ψ12, z2, ψ22, v2, ···, with observation terms p(y1|v1), p(y2|v2)]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\Big(\phi_k + \big\langle \log\psi_{k,k} + \log\psi_{k,k+1} \big\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\Big)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$
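A minimal sketch of one Gibbs sweep over the chain, under assumptions that go beyond the slide: the observation model is taken to be yk ∼ N(0, vk), the chain is the inverse Gamma-Markov chain defined earlier in the talk, and the first auxiliary variable is tied to a fixed pseudo-neighbour `v0`. With these choices both full conditionals are inverse Gamma. All names are hypothetical.

```python
import random


def sample_inv_gamma_std(rng, shape, scale):
    """Draw from the density proportional to x^-(shape+1) * exp(-scale / x)."""
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)


def gibbs_sweep(y, v, z, a, a_z, rng, v0=1.0):
    """Update z and v in place with one sequential Gibbs sweep and return them."""
    K = len(y)
    for k in range(K):
        # z_k | v_{k-1}, v_k: shape a_z + a, scale a_z / v_{k-1} + a / v_k
        left = v[k - 1] if k > 0 else v0
        z[k] = sample_inv_gamma_std(rng, a_z + a, a_z / left + a / v[k])
        # v_k | z_k, z_{k+1}, y_k: the Gaussian likelihood adds 1/2 to the shape
        # and y_k^2 / 2 to the scale; the last v_k has no right neighbour.
        shape = a + 0.5
        scale = a / z[k] + 0.5 * y[k] ** 2
        if k + 1 < K:
            shape += a_z
            scale += a_z / z[k + 1]
        v[k] = sample_inv_gamma_std(rng, shape, scale)
    return v, z


# Toy data: a quiet segment followed by a loud one.
rng = random.Random(0)
y = [0.1 * (-1) ** k for k in range(50)] + [10.0 * (-1) ** k for k in range(50)]
v, z = [1.0] * len(y), [1.0] * len(y)
for _ in range(200):
    v, z = gibbs_sweep(y, v, z, a=10.0, a_z=10.0, rng=rng)
```

After the sweeps, the sampled variances in the loud segment sit well above those in the quiet segment, while neighbouring variances stay positively correlated through the auxiliary variables.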

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

[Figure, four panels: Observed Data (sample value against time / s, 0 to 12); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), with the tempo axis in quarter notes per min; p(θ_k | z_{1:K}) over 4/4 and 3/4; the last three against Frame Index k (0 to 500)]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt and audio signal x_t against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, t_2, ..., t_K; r_1, r_2, ..., r_K; λ_1, λ_2, ..., λ_K; and, inside a plate over ν = 1 ... W, variances v_{ν,1}, ..., v_{ν,K} and coefficients s_{ν,1}, ..., s_{ν,K}. Inset: state diagram of r_k over score positions 1 to 8]

$v_{\nu\tau} \sim \mathcal{IG}(v_{\nu\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν / Hz (0 to 4000)]

Score-Performance matching

[Figure, two panels: Spectrogram Data (Frequency / Hz, 0 to 4000, against Time / s, 0 to 14); MIDI Data (MIDI note, 55 to 85, against Score position)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels against Time / s (1 to 4): log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, computed as $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$; MDCT of audio (source: Daniel-Ben Pienaar), Frequency / Hz (0 to 4000)]

Summary

• "Time Domain" - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Technical Difficulties

• Inference is quite heavy
• Vanilla Kalman filtering methods are not stable: computations with large matrices
  - Need advanced techniques from linear algebra
  - Interesting links to subspace methods
• Hyperparameter learning is necessary

Modelling levels

• Physical - acoustical
• Time domain - state space, dynamical models
• Transform domain - Fourier representations, Generalised Linear model
• Feature Based

Spectrogram

• Basis functions φ_k(t) centered around time-frequency atom k = k(ν, τ) = (Frequency, Time), such as STFT or MDCT

$x(t) = \sum_k s_k \phi_k(t)$

[Figure: spectrograms (t / sec against f / Hz, 0 to 10000) of two example signals, (Speech) and (Piano)]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)

[Audio: s1.wav, s2.wav]
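The quantities a spectrogram displays, log |s_k| or |s_k|^2, can be obtained from short-time Fourier coefficients. The sketch below is one simple way to compute them with NumPy, assuming a Hann window, a frame length of 256 samples, a hop of 128 samples and a synthetic 1 kHz tone sampled at 8 kHz; all of these settings and the helper name `stft_coefficients` are illustrative assumptions, not the analysis parameters used for the figures in the talk.

```python
import numpy as np

def stft_coefficients(x, frame_len=256, hop=128):
    """Short-time Fourier coefficients s[nu, tau] of a real signal x."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    # Cut the signal into overlapping, windowed frames (one frame per column)
    frames = np.stack(
        [x[t * hop : t * hop + frame_len] * window for t in range(n_frames)],
        axis=1,
    )
    # One-sided FFT along the frame axis: rows are frequency bins nu
    return np.fft.rfft(frames, axis=0)

fs = 8000                                 # sampling rate in Hz (assumed)
t = np.arange(fs) / fs                    # one second of samples
x = np.sin(2 * np.pi * 1000 * t)          # 1 kHz test tone
s = stft_coefficients(x)

power = np.abs(s) ** 2                    # |s_k|^2
log_mag = np.log(np.abs(s) + 1e-12)       # log |s_k|, guarded against log(0)
peak_bin = int(power.mean(axis=1).argmax())
peak_hz = peak_bin * fs / 256             # bin index converted to Hz
```

Each column of `power` is one time frame τ and each row one frequency bin ν, so a single entry corresponds to one time-frequency atom k(ν, τ).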

Models for time-frequency Energy distributions

• Non-Negative Matrix factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive ($a^2 + b^2 \neq (a+b)^2$)
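The factorisation Spectrogram = Spectral Templates × Excitations can be fitted with multiplicative updates. The sketch below uses the standard updates for the generalised Kullback-Leibler divergence; this particular algorithm, the helper names `kl_divergence` and `nmf_kl`, the number of iterations and the random toy data are assumptions chosen for illustration and are not claimed to be the procedure of any of the works cited on the slide.

```python
import numpy as np

def kl_divergence(X, V, eps=1e-12):
    """Generalised KL divergence D(X || V) between non-negative arrays."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))

def nmf_kl(X, J, n_iter=200, seed=0):
    """Factorise non-negative X (F x T) as W (F x J) times S (J x T)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, J)) + 0.1          # spectral templates
    S = rng.random((J, T)) + 0.1          # excitations
    eps = 1e-12
    for _ in range(n_iter):
        R = X / (W @ S + eps)             # ratio of data to current model
        W *= (R @ S.T) / (S.sum(axis=1) + eps)
        R = X / (W @ S + eps)
        S *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
    return W, S

# Toy "spectrogram" of exact rank 2 (illustrative data)
rng = np.random.default_rng(1)
X = rng.random((6, 2)) @ rng.random((2, 10)) + 0.01
W1, S1 = nmf_kl(X, J=2, n_iter=1)         # same initialisation, one iteration
W, S = nmf_kl(X, J=2, n_iter=200)
```

Because the updates only multiply by non-negative ratios, W and S stay non-negative without any explicit projection step.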

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency
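The mask model assigns every time-frequency atom to exactly one source. A small numerical sketch follows, assuming two synthetic energy maps `S0` and `S1` and a mask that picks whichever source is larger in each atom; the data, the sizes and the dominance rule are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
S0 = rng.gamma(shape=1.0, scale=1.0, size=(4, 6))   # energy of source 0
S1 = rng.gamma(shape=1.0, scale=1.0, size=(4, 6))   # energy of source 1

r = (S1 > S0).astype(int)        # indicator r = 1 where source 1 dominates
X = np.where(r == 0, S0, S1)     # mask model: one active source per atom

mask = (r == 0).astype(float)    # Mask selects the atoms of source 0
est0 = mask * X                  # Mask x spectrogram
est1 = (1.0 - mask) * X          # (1 - Mask) x spectrogram
```

Under this model the energy of the non-selected source in an atom is discarded entirely, which is the limitation the slide points out when sources overlap in time and frequency.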

Prior structures on time-frequency Energy distributions

• Main Idea: Spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s|v)\, p(v) = \left( \prod_k p(s_k | v_k) \right) p(v)$

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1 ... K]

$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$   (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
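The posterior above is cheap to evaluate once the variances are known: every source receives the share κ of the observed coefficient. A minimal sketch in NumPy, where the function name `separate_gaussian` and the toy values of x and v are assumptions for illustration:

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior mean and variance of each source coefficient.

    x : (K,)   observed mixture coefficients x_k
    v : (K, N) source variances v_{k,n}
    """
    kappa = v / v.sum(axis=1, keepdims=True)   # responsibilities kappa_{k,n}
    mean = kappa * x[:, None]                  # kappa_{k,n} * x_k
    var = v * (1.0 - kappa)                    # v_{k,n} * (1 - kappa_{k,n})
    return mean, var

x = np.array([2.0, -1.0, 0.5])
v = np.array([[1.0, 3.0],
              [2.0, 2.0],
              [0.1, 0.9]])
mean, var = separate_gaussian(x, v)
```

The posterior means always add up to the observation, so the separation redistributes x across sources rather than inventing energy.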

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1}, ..., v_{k,N}; sources s_{k,1}, ..., s_{k,N}; observation x_k; plate over k = 1 ... K]

$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$   (Template × Excitation)
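The link to NMF can be made concrete by sampling from the model: with intensities given by template times excitation, the sum of the Poisson sources has intensity equal to the matrix product of templates and excitations. A small sketch under assumed toy sizes and randomly drawn templates and excitations (none of these values come from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, N = 5, 8, 2                          # frequency bins, frames, sources (toy sizes)
templates = 10.0 * rng.random((F, N))      # t[nu, n]
excitations = rng.random((T, N))           # e[tau, n]

# Intensities v[nu, tau, n] = t[nu, n] * e[tau, n]
v = templates[:, None, :] * excitations[None, :, :]

s = rng.poisson(v)                         # latent sources s[nu, tau, n]
x = s.sum(axis=2)                          # observation x[nu, tau]

# A sum of independent Poisson variables is Poisson with the summed intensity,
# and summing v over n gives exactly the NMF product of the two factors.
intensity = templates @ excitations.T
```

In this view NMF estimates `templates` and `excitations` from `x` alone, while the individual sources `s` remain latent.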

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure, two panels of densities p(x) against x (0 to 5): one with legend a = 0.9, 1, 1.3, 2 and b = 1; one with legend (a, b) = (1, 1), (1, 0.5), (2, 1)]

$\mathcal{G}(x;\, a, z) \equiv \exp\left( (a-1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a) \right)$

$\mathcal{IG}(x;\, a, z) \equiv \exp\left( (a+1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a) \right)$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
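The two log densities translate directly into code. The sketch below follows the parametrisation written on the slide (shape a, scale z) using only the Python standard library; the function names `log_gamma_pdf` and `log_inv_gamma_pdf` are illustrative.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z + a log(1/z) - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = (a+1) log(1/x) - 1/(z x) + a log(1/z) - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# With a = 1 and z = 1 the Gamma density reduces to exp(-x)
value = log_gamma_pdf(2.0, 1.0, 1.0)

# If 1/x follows G(a, z) then x follows IG(a, z); the change of variables
# contributes the Jacobian term -2 log x.
x, a, z = 0.7, 2.5, 1.3
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
```

The relation between `lhs` and `rhs` is the practical recipe for sampling: draw from a Gamma distribution with the same shape and scale and take the reciprocal.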

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows

$v_k | z_k \sim \mathcal{IG}(v_k;\, a,\, z_k / a)$

$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k / a_z)$

[Chain: z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with consecutive links labelled a_z, a, a_z]

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and a_z describe coupling strength and drift of the chain
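A draw from the chain can be generated by ancestral sampling, alternating between the two conditionals. The sketch below uses the slide's parametrisation, in which x follows IG(a, z) when 1/x follows a Gamma distribution with shape a and scale z. The slide does not state how z_1 is initialised, so fixing it to a constant `z1` is an assumption made here, as are the function name `sample_igmc` and the parameter values in the example.

```python
import numpy as np

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Ancestral sample (z_1..z_K, v_1..v_K) from the inverse Gamma chain."""
    rng = np.random.default_rng(seed)

    def draw_ig(shape, scale):
        # x ~ IG(shape, scale)  <=>  1/x ~ Gamma(shape, scale)
        return 1.0 / rng.gamma(shape, scale)

    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1                                   # assumed fixed starting value
    for k in range(K):
        v[k] = draw_ig(a, z[k] / a)             # v_k | z_k
        if k + 1 < K:
            z[k + 1] = draw_ig(a_z, v[k] / a_z) # z_{k+1} | v_k
    return z, v

z, v = sample_igmc(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v)                               # quantity shown in the "typical draws" plots

# Very large shape parameters tie neighbouring variables tightly together
z_tight, v_tight = sample_igmc(K=50, a=1e4, a_z=1e4)
```

Larger values of a and a_z make consecutive variables more strongly coupled, so the path of log v_k changes more slowly.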

Gamma Chains: typical draws

[Figure, three panels of log v_k against k (1 to 1000): a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints: typical draws

[Figure, three panels of log v_k against k (1 to 1000): a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as product of singleton and pairwise potentials of form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$   (Pairwise)

[Chain: z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with consecutive links labelled a_z, a, a_z]

$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$   (Singletons)

Gamma Fields

• The joint can be written as product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$   (Pairwise)

$\phi_i = \exp\left( \left( \sum_j a_{ij} + 1 \right) \log \xi_i^{-1} \right)$   (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ..., with observation factors p(y_1 | v_1) and p(y_2 | v_2) attached to v_1 and v_2]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left( \phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})} \right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k | v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Factor graph: ψ_{01}, z_1, ψ_{11}, v_1, ψ_{12}, z_2, ψ_{22}, v_2, ..., with observation factors p(y_1 | v_1) and p(y_2 | v_2) attached to v_1 and v_2]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left( \phi_k + \langle \log \psi_{k,k-1} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)} \right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
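Because of conjugacy, both Gibbs conditionals are inverse Gamma distributions and can be sampled exactly. The sketch below works this out for one concrete case, assuming a zero-mean Gaussian observation model y_k ~ N(0, v_k) and an inverse Gamma prior IG(z_1; a_z, b_z) on the first auxiliary variable; the observation model, the prior on z_1, the function name `gibbs_igmc` and the synthetic data are assumptions made for illustration and are not taken from the talk. Under these assumptions, collecting the terms in each variable gives a density proportional to x^-(shape+1) exp(-rate / x) with

- z_k: shape a + a_z, rate a / v_k + a_z / v_{k-1} (the second term is replaced by 1 / b_z for k = 1)
- v_k: shape a + a_z + 1/2, rate y_k^2 / 2 + a / z_k + a_z / z_{k+1} (the a_z terms are dropped for k = K)

```python
import numpy as np

def gibbs_igmc(y, a, a_z, b_z=1.0, n_sweeps=200, seed=0):
    """Posterior mean of the variances v_k, estimated by Gibbs sampling."""
    rng = np.random.default_rng(seed)
    K = len(y)
    v = np.ones(K)
    z = np.ones(K)
    v_sum = np.zeros(K)
    n_kept = 0

    def draw_ig(shape, rate):
        # density proportional to x^-(shape + 1) * exp(-rate / x)
        return rate / rng.gamma(shape)

    for sweep in range(n_sweeps):
        # Block 1: all z_k given v (conditionally independent of each other)
        for k in range(K):
            rate = a / v[k] + (a_z / v[k - 1] if k > 0 else 1.0 / b_z)
            z[k] = draw_ig(a + a_z, rate)
        # Block 2: all v_k given z and y (conditionally independent of each other)
        for k in range(K):
            shape = a + 0.5
            rate = 0.5 * y[k] ** 2 + a / z[k]
            if k + 1 < K:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = draw_ig(shape, rate)
        if sweep >= n_sweeps // 2:          # discard the first half as burn-in
            v_sum += v
            n_kept += 1
    return v_sum / n_kept

# Synthetic data: quiet first half, loud second half
rng = np.random.default_rng(3)
K = 100
true_std = np.where(np.arange(K) < K // 2, 0.1, 10.0)
y = rng.normal(0.0, true_std)
v_hat = gibbs_igmc(y, a=2.0, a_z=2.0)
```

A variational Bayes version would keep the same shape parameters and replace the sampled values 1 / z_k and 1 / v_k in the rates by their expectations under the current q distributions.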

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: log-spectrogram panels (colour scale -14 to 0): X (Noisy), Xorg (Original), and four reconstructions labelled "Xh SNR1998", "Xv SNR2079", "Xb SNR1968", "Xg SNR1997"]

Denoising - Music

[Figure: log-spectrogram panels (colour scale -18 to 0): Original, Noisy, and three reconstructions labelled "PF SNR853", "Gibbs SNR866", "VB SNR208"]

"Tristram (Matt Uelmen)" + approximately 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal; tie across time: harmonic continuity
• Source 2: Vertical; tie across frequency: transients, percussive sounds

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                     s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB            -474   -328    567    -158   1546   -137
Gibbs          -45   -262    457     105   1246    161
GibbsEM       -423   -242    482     134   1313    185
Pre-trained   -404   -315    813     356   1144    464
Oracle         614   1716    658    1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters a_z and a trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                     s1                    s2
               SDR    SIR    SAR     SDR    SIR    SAR
VB             -78   -622    453    -235    184   -225
Gibbs         -846   -753    693    -404   1459   -383
GibbsEM       -774   -619    462    -114   1662   -097
Pre-trained    -64   -539    695      38   1639    414
Oracle         121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Two audio examples, each as (Original), (Hor), (Vert)

[Audio: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν / Hz (0 to 4000)]

Chord Detection

[Figure, two panels against Time τ / s (0.5 to 2.5): MDCT of piano chord 41, 48, 51, 56 (Frequency / Hz, 0 to 4000); $\log \sum_\nu v_{\nu j \tau}$ over MIDI note j (40 to 65)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$,   n = 1 ... N

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$,   m = 1 ... M,   k = 1 ... K

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: spectrograms (t / sec against f / Hz, 0 to 10000) of the sources (Speech), (Piano), (Guitar) and of the (Mix)]

[Audio: s1.wav, s2.wav, s3.wav, x1.wav]

Reconstructions

[Figure: spectrograms (t / sec against f / Hz, 0 to 10000) of the reconstructed (Speech), (Piano) and (Guitar)]

[Audio: var_se1.wav, var_se2.wav, var_se3.wav]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three panels showing traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized/human readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in a HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

      3  • • • • • • • •
n_k   2  • • • • • • • •
      1  • • • • • • • •
         1 2 3 4 5 6 7 8
                m_k

(bar boundaries marked for 3/4 time and 4/4 time)

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)
• Directed Arcs denote state transitions with positive probability
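A transition model over the lattice of states x = (m, n) can be stored as a sparse-looking matrix in which each row has only a few non-zero entries, one per directed arc. The sketch below is a simplified illustration in the spirit of the bar pointer idea, assuming that the pointer advances by the current tempo level and wraps around at the end of the bar, and that the tempo level stays the same or moves one step up or down. The grid size, the wrap-around rule, the probability `p_change` and the function name `bar_pointer_transition` are all assumptions; the transition probabilities used in the talk are not given on the slide.

```python
import numpy as np

def bar_pointer_transition(M=8, N=3, p_change=0.2):
    """Transition matrix over states (m, n), flattened as m * N + (n - 1).

    m in 0..M-1 is the bar position, n in 1..N is the tempo level.
    """
    A = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M                     # advance by the tempo level
            moves = ((n, 1.0 - p_change),            # keep the tempo
                     (n - 1, p_change / 2.0),        # slow down
                     (n + 1, p_change / 2.0))        # speed up
            for n_next, prob in moves:
                n_clipped = min(max(n_next, 1), N)   # stay inside the tempo range
                A[m * N + (n - 1), m_next * N + (n_clipped - 1)] += prob
    return A

A = bar_pointer_transition()
p1 = np.full(A.shape[0], 1.0 / A.shape[0])           # uniform p(x_1)
p2 = p1 @ A                                          # one-step prediction p(x_2)
```

Repeating the prediction step without observations spreads the probability mass along the bar, and each observation then reweights it, which is the behaviour the sequence of slides for k = 1, 2, ... visualises.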

Bar position Pointer - transition model

[Figure, three panels: transition density p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}) over Bar Position and Tempo Level]

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 35: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Modelling levels

bull Physical - acoustical

bull Time domain ndash state space dynamical models

bull Transform domain ndash Fourier representations Generalised Linear

model

bull Feature Based

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 34

Spectrogram

bull Basis functions φk(t) centered around time-frequency atom k = k(ν τ) =(Frequency Time ) such as STFT or MDCT

x(t) =sum

k

skφk(t)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

bull Spectrogram displays log |sk| or |sk|2 (of STFT)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 35

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)

Models for time-frequency Energy distributions

bull Non-Negative Matrix factorisation (Sha Saul Lee 2002 Smaragdis Brown 2003

Virtanen 2003 Abdallah Plumbley 2004 )

Xντ = WνjSjτ

Spectrogram = Spectral Templatestimes Excitations

= times

ndash however spectrograms are not additive (a2 + b2 6= (a+ b)2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 36

Models for time-frequency Energy distributions

bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )

Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)

ντ

Spectrogram = Masktimes Source0 + (1minusMask)times Source1

= + +

ndash however sources do overlap in time and frequency

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37

Prior structures on time-frequency Energy distributions

bull Main Idea Spectrogram is a point estimate of the energy at a

time-frequency atom k(ν τ)

bull We place a suitable prior on the variance of transform coefficients sk

and tie the prior variances across harmonically and temporally related

time-frequency atoms

p(s|v)p(v) =

(prod

k

p(sk|vk)

)

p(v)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38

One channel source separation Gaussian source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim N (skn 0 vkn)

xk|sk1N =sumN

n=1 skn

bull Straightforward application of Bayesrsquo theorem yields

p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))

κkn = vknsum

nprime

vknprime (Responsibilities)

bull Each source coefficient sn gets a fraction κn of the observation x

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

Factor graph: $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with an observation factor $p(y_k \mid v_k)$ attached to each $v_k$.

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\, \psi_{k,k}\big(v_k^{(\tau)}\big)$$
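The two Gibbs updates can be sketched in code once an observation model is fixed. The sketch below assumes a zero-mean Gaussian observation, s_k ~ N(0, v_k), in place of the generic p(y_k | v_k), and a fixed hypothetical start value `v0` in place of the boundary factor ψ_{0,1}; the function name and defaults are illustrative, not taken from the talk. Under these assumptions both full conditionals are inverse Gamma, so each draw is a scaled reciprocal of a Gamma variate.

```python
import random


def gibbs_gamma_chain(s, a=10.0, a_z=10.0, v0=1.0, n_sweeps=200, seed=0):
    """Gibbs sampler for variances v_1..v_K with s_k ~ N(0, v_k) (assumed likelihood).

    Prior: v_k | z_k ~ IG(a, z_k / a), z_{k+1} | v_k ~ IG(a_z, v_k / a_z),
    with z_1 conditioned on a fixed, hypothetical v0.
    Returns the average of v over the second half of the sweeps.
    """
    rng = random.Random(seed)
    K = len(s)
    v = [max(x * x, 1e-6) for x in s]  # crude initialisation from the data
    z = [1.0] * K
    acc = [0.0] * K
    kept = 0
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k is inverse Gamma with
        # shape a_z + a and rate a_z / v_{k-1} + a / v_k.
        for k in range(K):
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = rate / rng.gammavariate(a_z + a, 1.0)
        # v_k | z_k, z_{k+1}, s_k is inverse Gamma; the Gaussian term adds
        # 1/2 to the shape and s_k^2 / 2 to the rate.
        for k in range(K):
            shape = a + 0.5
            rate = a / z[k] + 0.5 * s[k] ** 2
            if k + 1 < K:  # the last v has no right-hand neighbour
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = rate / rng.gammavariate(shape, 1.0)
        if sweep >= n_sweeps // 2:
            kept += 1
            for k in range(K):
                acc[k] += v[k]
    return [total / kept for total in acc]
```

Because all z are conditionally independent given v (and vice versa), each sweep updates the two groups as blocks. Calling `gibbs_gamma_chain` on a sequence that is quiet at first and loud later should give variance estimates that rise along the sequence.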

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: six spectrogram panels on a common log scale: Noisy (X), Original (Xorg), and the reconstructions Xh (SNR1998), Xv (SNR2079), Xb (SNR1968) and Xg (SNR1997).]

Denoising – Music

[Figure: five spectrogram panels on a common log scale: Original, Noisy, PF (SNR853), Gibbs (SNR866) and VB (SNR208).]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (Horizontal): tie across time, for harmonic continuity
• Source 2 (Vertical): tie across frequency, for transients and percussive sounds

[Figure: Time (τ) against Frequency Bin (ν); panels Xorg, Shor and Sver.]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -474   -328    567     -158   1546   -137
Gibbs         -45   -262    457      105   1246    161
GibbsEM      -423   -242    482      134   1313    185
Pre-trained  -404   -315    813      356   1144    464
Oracle        614   1716    658     1266   1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate
• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                     s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78   -622    453     -235    184   -225
Gibbs        -846   -753    693     -404   1459   -383
GibbsEM      -774   -619    462     -114   1662   -097
Pre-trained   -64   -539    695       38   1639    414
Oracle        121    329   1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: Time (τ) against Frequency Bin (ν); panels Xorg (Original), Shor (Hor) and Sver (Vert), shown for two examples.]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz, from 0 to 4000 Hz.]

Chord Detection

[Figure: two panels against Time τ/s. Top: MDCT of piano chord 41, 48, 51, 56, with Frequency/Hz from 0 to 4000. Bottom: $\log \sum_\nu v_{\nu,j,\tau}$ over MIDI note $j$ from 40 to 65.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

For sources n = 1, ..., N, channels m = 1, ..., M and atoms k = 1, ..., K:

$$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$$

$$v_{k,n} \sim \mathcal{IG}\left(v_{k,n}; \nu/2, 2/(\nu\lambda_n)\right)$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}\left(x_{k,m}; a_m^{\top} s_{k,1:N}, r_m\right)$$

$$a_m \sim \mathcal{N}(a_m; \cdots) \qquad r_m \sim \mathcal{IG}(r_m; \cdots)$$
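One way to read the hierarchy is as a recipe for ancestral sampling, from the top-level λ_n down to the mixtures x_{k,m}. The sketch below follows the distributions as written, using the fact that v ~ IG(v; a, z) is the reciprocal of a Gamma(shape a, scale z) draw. The priors on a_m and r_m are elided on the slide, so the standard-normal mixing weights and the fixed noise variance `r` used here are placeholders chosen for illustration only, as are the function name and default values.

```python
import math
import random


def sample_multichannel(K=100, N=3, M=2, nu=4.0, a_lam=2.0, b_lam=1.0, r=0.01, seed=0):
    """Ancestral sampling from the hierarchical source model (illustrative sketch)."""
    rng = random.Random(seed)
    # lambda_n ~ G(a_lam, b_lam): overall scale of source n (b_lam is a scale).
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    # Assumption: a_m ~ N(0, I); the slide leaves this prior unspecified.
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    sources, mixtures = [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu*lambda_n)), i.e. 1/v_kn ~ Gamma(nu/2, scale 2/(nu*lambda_n)).
        v = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam_n)) for lam_n in lam]
        # s_kn ~ N(0, v_kn); random.gauss takes a standard deviation.
        s = [rng.gauss(0.0, math.sqrt(v_n)) for v_n in v]
        # x_km ~ N(a_m^T s_k, r); assumption: the same fixed r for every channel.
        x = [
            sum(a_mn * s_n for a_mn, s_n in zip(A[m], s)) + rng.gauss(0.0, math.sqrt(r))
            for m in range(M)
        ]
        sources.append(s)
        mixtures.append(x)
    return lam, A, sources, mixtures
```

With M < N, as in the defaults, the draw is an underdetermined mixture: the returned `mixtures` have fewer channels than there are `sources`.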

Equivalent Gamma MRF

• A tree for each source
• $\lambda_n$ can be interpreted as the overall “volume” of source n

Source Separation

[Figure: four spectrogram panels, (Speech), (Piano), (Guitar) and (Mix); axes t/sec and f/Hz.]

Reconstructions

[Figure: three reconstructed spectrogram panels, (Speech), (Piano) and (Guitar); axes t/sec and f/Hz.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, from 0 to 2000.]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized/human readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position $m_k$ = 1, ..., 8 on the horizontal axis and tempo level $n_k$ = 1, 2, 3 on the vertical axis; markers indicate 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level.]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ over Bar Position and Tempo Level, with $y_5 = 0$.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: two panels of the intensity $\mu_k$ against bar position $m_k$ (0 to 1000), one for a Triplet Rhythm and one for a Duplet Rhythm.]
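The bar pointer state, its transitions and the Poisson observation model together define an HMM, so filtering reduces to the standard forward recursion. The sketch below is a toy version under stated assumptions: the bar position advances by the tempo level at each frame and wraps around the bar, the tempo level moves by ±1 with probability `p_change`, and the Poisson intensity is `mu_on` on every `beat_every`-th position and `mu_off` elsewhere. These transition and intensity choices, and all names and defaults, are illustrative and not the exact model of the cited work.

```python
import math


def poisson_pmf(y: int, mu: float) -> float:
    """Poisson probability of count y at intensity mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def bar_pointer_filter(y, M=16, N=3, p_change=0.05, beat_every=4, mu_on=2.0, mu_off=0.1):
    """Forward filtering p(x_k | y_1:k) for x = (m, n), starting from a uniform p(x_0)."""
    states = [(m, n) for m in range(M) for n in range(1, N + 1)]
    alpha = {x: 1.0 / len(states) for x in states}
    history = []
    for y_k in y:
        # Predict: position advances by the tempo level; tempo may change by one level.
        pred = dict.fromkeys(states, 0.0)
        for (m, n), w in alpha.items():
            m_next = (m + n) % M
            for n_next, p in ((n, 1.0 - p_change), (n - 1, p_change / 2), (n + 1, p_change / 2)):
                if not 1 <= n_next <= N:
                    n_next = n  # moves outside the tempo range keep the current level
                pred[(m_next, n_next)] += w * p
        # Update: weight by the Poisson likelihood of the observed count.
        post = {}
        for (m, n), w in pred.items():
            mu = mu_on if m % beat_every == 0 else mu_off
            post[(m, n)] = w * poisson_pmf(y_k, mu)
        norm = sum(post.values())
        alpha = {x: w / norm for x, w in post.items()}
        history.append(alpha)
    return history


def tempo_marginal(alpha, N=3):
    """Marginal over the tempo level n of a filtered distribution."""
    return {n: sum(w for (_, level), w in alpha.items() if level == n) for n in range(1, N + 1)}
```

For onset counts that alternate 0, 1, 0, 1, ... the filtered tempo marginal should concentrate on level 2, the only level whose pointer lands on a beat position every second frame. Smoothing would add a backward pass over the stored `history`.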

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \ldots, n_3$; $\theta_0, \ldots, \theta_3$; $m_0, \ldots, m_3$; $r_0, \ldots, r_3$; with intensities $\lambda_1, \lambda_2, \lambda_3$ and observations $y_1, y_2, y_3$.]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k: Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$, in quarter notes per min; and $p(r_k \mid y_{1:k})$ for Triplets and Duplets.]

Smoothing

[Figure: four panels against Frame Index k: Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$, in quarter notes per min; and $p(r_k \mid y_{1:K})$ for Triplets and Duplets.]

Time Signature

[Figure: four panels: Observed Data (sample value against time/s); $\log p(m_k \mid z_{1:K})$; $\log p(n_k \mid z_{1:K})$, in quarter notes per min; and $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4. The lower three panels are plotted against Frame Index k.]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: a musical score aligned with the audio signal $x_t$ against $t$.]

Score-Performance matching - Graphical Model

[Graphical model: $t_1, \ldots, t_K$; $r_1, \ldots, r_K$; $\lambda_1, \ldots, \lambda_K$; and, inside a plate over ν = 1, ..., W, the variances $v_{\nu,1}, \ldots, v_{\nu,K}$ and coefficients $s_{\nu,1}, \ldots, s_{\nu,K}$.]

$$v_{\nu,\tau} \sim \mathcal{IG}\left(v_{\nu,\tau};\ a,\ 1/(a\,\lambda\,\sigma_\nu(r_\tau))\right)$$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz, from 0 to 4000 Hz.]

Score-Performance matching

[Figure: two panels. Spectrogram Data: Frequency/Hz (0 to 4000) against Time/s. MIDI Data: MIDI note against Score position.]

Online (filtering) or Offline (smoothing) processing is possible.

Transcription

[Figure: three panels against Time/s: $\log p(r_\tau \mid s_\tau)$ over MIDI note number; $\log \lambda$, labelled $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$; and the MDCT of the audio (source: Daniel-Ben Pienaar), with Frequency/Hz from 0 to 4000.]

Summary

• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients; energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-Frequency Energy distributions
• Ongoing Work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  – Prior structures for other observation models, NMF

Spectrogram

• Basis functions $\phi_k(t)$ centered around the time-frequency atom $k = k(\nu, \tau)$ = (Frequency, Time), such as STFT or MDCT

$$x(t) = \sum_k s_k \phi_k(t)$$

[Figure: two spectrogram panels, (Speech) and (Piano); axes t/sec and f/Hz.]

• Spectrogram displays $\log |s_k|$ or $|s_k|^2$ (of STFT)
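The expansion x(t) = Σ_k s_k φ_k(t) can be made concrete for a single analysis frame. The sketch below uses an orthonormal cosine (DCT-II) basis as a stand-in for the STFT or MDCT atoms named on the slide, which is an assumption made to keep the example short; the function names are hypothetical. With an orthonormal basis, analysis is an inner product with each φ_k and synthesis recovers the frame exactly.

```python
import math


def cosine_basis(T):
    """Orthonormal DCT-II basis for one frame: basis[k][t] = phi_k(t)."""
    basis = []
    for k in range(T):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / T)
        basis.append([scale * math.cos(math.pi * (t + 0.5) * k / T) for t in range(T)])
    return basis


def analyse(x, basis):
    """Coefficients s_k = sum_t x(t) phi_k(t)."""
    return [sum(p * x_t for p, x_t in zip(phi, x)) for phi in basis]


def synthesise(s, basis):
    """Reconstruction x(t) = sum_k s_k phi_k(t)."""
    T = len(basis[0])
    return [sum(s_k * phi[t] for s_k, phi in zip(s, basis)) for t in range(T)]


def log_energy(s, floor=1e-12):
    """log |s_k|^2 per coefficient, floored to avoid log(0)."""
    return [math.log(s_k * s_k + floor) for s_k in s]
```

Stacking `log_energy` of successive frames column by column gives a picture analogous to the spectrogram panels, with the cosine index playing the role of frequency.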

Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004)

$$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$$

Spectrogram = Spectral Templates × Excitations

  – however, spectrograms are not additive ($a^2 + b^2 \neq (a + b)^2$)
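The non-additivity remark can be spelled out for two source coefficients at the same time-frequency atom. Writing the mixture coefficient as the sum of complex coefficients $s_1$ and $s_2$ (notation introduced here for illustration), the mixture energy carries a cross term:

```latex
|s_1 + s_2|^2 = |s_1|^2 + |s_2|^2 + 2\,\mathrm{Re}\left(s_1 \bar{s}_2\right)
```

The energies add only when the cross term vanishes. For independent zero-mean sources this holds in expectation, but not for an individual coefficient.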

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005)

$$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  – however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom $k(\nu, \tau)$
• We place a suitable prior on the variance of the transform coefficients $s_k$ and tie the prior variances across harmonically and temporally related time-frequency atoms

$$p(s \mid v)\, p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$$

One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \ldots, v_{k,N}$ generate sources $s_{k,1}, \ldots, s_{k,N}$, which sum to the observation $x_k$; plate over k = 1, ..., K.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• Straightforward application of Bayes' theorem yields

$$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}\left(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n})\right)$$

$$\kappa_{k,n} = \frac{v_{k,n}}{\sum_{n'} v_{k,n'}} \quad \text{(Responsibilities)}$$

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$
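The posterior above is simple enough to compute directly. The sketch below evaluates the responsibilities and the posterior mean and variance of each source coefficient at one time-frequency atom, given the variances; the function name `posterior_sources` is a hypothetical label for this illustration.

```python
def posterior_sources(x_k, v_k):
    """Posterior of each source coefficient given the mixture x_k and variances v_k.

    Returns (kappa, means, variances) where
    kappa_n = v_n / sum(v), mean_n = kappa_n * x_k, var_n = v_n * (1 - kappa_n).
    """
    total = sum(v_k)
    kappa = [v / total for v in v_k]
    means = [k * x_k for k in kappa]
    variances = [v * (1.0 - k) for v, k in zip(v_k, kappa)]
    return kappa, means, variances


# Two sources with variances 3 and 1 sharing an observation x = 2:
kappa, means, variances = posterior_sources(2.0, [3.0, 1.0])
# kappa is [0.75, 0.25], means is [1.5, 0.5], variances is [0.75, 0.75]
```

The posterior means always sum to the observation, so the split is a soft allocation of x between the sources in proportion to their variances.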

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \ldots, v_{k,N}$ generate sources $s_{k,1}, \ldots, s_{k,N}$, which sum to the observation $x_k$; plate over k = 1, ..., K.]

$$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$$

$$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$$

• This is the generative model for NMF when we write

$$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n} \quad \text{(Template × Excitation)}$$
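The link to NMF follows from the superposition property of independent Poisson variables. A short derivation, writing $T$ and $E$ for the matrices with entries $t_{\nu,n}$ and $e_{\tau,n}$ (matrix names introduced here for illustration):

```latex
x_k = \sum_{n=1}^{N} s_{k,n}, \quad s_{k,n} \sim \mathcal{PO}(v_{k,n}) \ \text{independently}
\;\Longrightarrow\;
x_k \sim \mathcal{PO}\Big(\sum_{n=1}^{N} v_{k,n}\Big)

x_{k(\nu,\tau)} \sim \mathcal{PO}\Big(\sum_{n=1}^{N} t_{\nu,n}\, e_{\tau,n}\Big)
= \mathcal{PO}\big([T E^{\top}]_{\nu\tau}\big)

\log p(x \mid T, E) = \sum_{\nu,\tau} \Big( x_{\nu\tau} \log [T E^{\top}]_{\nu\tau}
- [T E^{\top}]_{\nu\tau} - \log \Gamma(x_{\nu\tau} + 1) \Big)
```

Maximising this log-likelihood over non-negative $T$ and $E$ is a factorisation of the observed matrix into templates and excitations.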

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: two panels of densities p(x) against x on [0, 5], one for each family, for several settings of a and b.]

$$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a)\right)$$

$$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a)\right)$$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale
• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
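The two definitions translate directly into log-density functions, which makes the parametrisation easy to check: z is a scale for the Gamma, and x ~ IG(x; a, z) exactly when 1/x ~ G(a, z). The sketch below uses only the standard library; the function names and the numerical integration helper are hypothetical and serve only as a sanity check that each density integrates to one.

```python
import math


def log_gamma_pdf(x: float, a: float, z: float) -> float:
    """log G(x; a, z): Gamma with shape a and scale z."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inv_gamma_pdf(x: float, a: float, z: float) -> float:
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)


def integrate_density(log_pdf, lo=1e-8, hi=1e4, n=20000):
    """Midpoint rule for the integral of exp(log_pdf(x)) over [lo, hi], in u = log x."""
    u_lo, u_hi = math.log(lo), math.log(hi)
    du = (u_hi - u_lo) / n
    total = 0.0
    for i in range(n):
        x = math.exp(u_lo + (i + 0.5) * du)
        total += math.exp(log_pdf(x)) * x * du  # dx = x du
    return total
```

For example, `integrate_density(lambda x: log_inv_gamma_pdf(x, 2.0, 1.5))` should come out very close to 1.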

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k / a)$$

$$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k / a_z)$$

Chain: $z_1 \cdots v_{k-1} - z_k - v_k - z_{k+1} \cdots$, with edges labelled $a_z$, $a$, $a_z$

• Variance variables v are priors for sources
• Auxiliary variables z are needed for conjugacy and positive correlation
• Shape parameters a and $a_z$ describe coupling strength and drift of the chain
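Drawing from the chain is a matter of alternating the two conditionals. The sketch below relies on the parametrisation of IG(x; a, z) given earlier, under which x is the reciprocal of a Gamma(shape a, scale z) variate. The fixed start value `z1` and the function names are assumptions made for illustration, since the prior on z_1 is not stated here.

```python
import random


def sample_inverse_gamma(rng, a, z):
    """Draw x ~ IG(x; a, z) as the reciprocal of a Gamma(shape=a, scale=z) draw."""
    return 1.0 / rng.gammavariate(a, z)


def sample_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1..v_K from the inverse Gamma-Markov chain, starting from a fixed z_1."""
    rng = random.Random(seed)
    z = z1
    v = []
    for _ in range(K):
        v_k = sample_inverse_gamma(rng, a, z / a)      # v_k | z_k ~ IG(a, z_k / a)
        v.append(v_k)
        z = sample_inverse_gamma(rng, a_z, v_k / a_z)  # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
    return v
```

Each z is roughly the reciprocal of the preceding v, and each v roughly the reciprocal of its z, so successive variances stay close on a log scale. Larger shape parameters tighten the coupling, and plotting the logarithm of `sample_gamma_chain(1000, 10.0, 10.0)` against k gives a slowly wandering path.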

Gamma Chains: typical draws

[Figure: three panels of typical draws, $\log v_k$ against $k$ (k = 1 to 1000), for $a = 10$ with $a_z = 10$, $a_z = 4$ and $a_z = 40$.]

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

[Figure, four panels against Frame Index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo in quarter notes per min; p(r_k | y_{1:k}) for Triplets versus Duplets.]

Smoothing

[Figure, four panels against Frame Index k (0 to 450): Observed Data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo in quarter notes per min; p(r_k | y_{1:K}) for Triplets versus Duplets.]
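The difference between the filtered marginals p(x_k | y_{1:k}) and the smoothed marginals p(x_k | y_{1:K}) can be illustrated with a generic discrete HMM. The sketch below is a textbook forward-backward recursion under assumed toy parameters (`pi`, `A`, `lik` are made up), not the bar pointer implementation itself.

```python
import numpy as np

def forward_backward(pi, A, lik):
    """Filtered and smoothed state marginals of a discrete HMM.

    pi[i]     = p(x_1 = i)
    A[i, j]   = p(x_{k+1} = j | x_k = i)
    lik[k, i] = p(y_k | x_k = i)
    """
    K, S = lik.shape
    filt = np.zeros((K, S))
    alpha = pi * lik[0]
    filt[0] = alpha / alpha.sum()
    for k in range(1, K):
        alpha = (filt[k - 1] @ A) * lik[k]   # predict, then update
        filt[k] = alpha / alpha.sum()
    smooth = np.zeros((K, S))
    smooth[-1] = filt[-1]                    # last step: both coincide
    for k in range(K - 2, -1, -1):
        pred = filt[k] @ A                   # p(x_{k+1} | y_{1:k})
        ratio = np.divide(smooth[k + 1], pred,
                          out=np.zeros_like(pred), where=pred > 0)
        smooth[k] = filt[k] * (A @ ratio)
    return filt, smooth

# Toy two-state example (values are arbitrary).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
lik = np.array([[0.8, 0.2], [0.8, 0.2], [0.3, 0.7]])
filt, smooth = forward_backward(pi, A, lik)
```

Filtering only needs the forward pass and suits online use; smoothing adds the backward pass and needs the whole sequence.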

Time Signature

[Figure, four panels: Observed Data (sample value against time in s, 0 to 12); log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 versus 3/4. The lower three panels are plotted against Frame Index k (100 to 500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: a notated score shown above the audio signal x_t plotted against time t.]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1 ... t_K, r_1 ... r_K and λ_1 ... λ_K; inside a plate over frequency ν = 1 ... W, variances v_{ν,1} ... v_{ν,K} and coefficients s_{ν,1} ... s_{ν,K}. The score positions r_k are numbered 1 to 8.]

$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\ a,\ 1/(a \lambda \sigma_\nu(r_\tau))\big)$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz.]

Score-Performance matching

[Figure, two panels: Spectrogram Data (Frequency in Hz, 0 to 4000, against Time in s, 0 to 14); MIDI Data (MIDI note, 55 to 85, against Score position, 50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, three panels against Time (s): log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, computed as Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar), Frequency in Hz (0 to 4000).]

Summary

• "Time Domain" - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models: NMF


Models for time-frequency Energy distributions

• Non-Negative Matrix Factorisation (Sha, Saul, Lee 2002; Smaragdis, Brown 2003; Virtanen 2003; Abdallah, Plumbley 2004; ...)

$X_{\nu\tau} = \sum_j W_{\nu j} S_{j\tau}$

Spectrogram = Spectral Templates × Excitations

  - however, spectrograms are not additive: $a^2 + b^2 \neq (a + b)^2$
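For readers who want to see the factorisation Spectrogram = Templates × Excitations in action, here is a minimal sketch using the standard multiplicative updates for the KL (generalised I-divergence) cost. The slides do not say which NMF algorithm is meant, so the update rule, the function names and the synthetic input are assumptions made for illustration.

```python
import numpy as np

def nmf_kl(X, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix X (F x T) as W (F x rank) @ S (rank x T)."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, rank)) + 0.1   # spectral templates
    S = rng.random((rank, T)) + 0.1   # excitations
    for _ in range(n_iter):
        R = X / (W @ S + eps)                              # element-wise ratio
        S *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = X / (W @ S + eps)
        W *= (R @ S.T) / (S.sum(axis=1)[None, :] + eps)
    return W, S

def kl_divergence(X, Y, eps=1e-12):
    """Generalised KL divergence between non-negative arrays X and Y."""
    return float(np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y))

# Synthetic "spectrogram" with an exact rank-3 structure.
rng = np.random.default_rng(1)
X = rng.random((5, 3)) @ rng.random((3, 7))
W, S = nmf_kl(X, rank=3)
```

The updates keep both factors non-negative by construction, since they only multiply non-negative quantities.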

Models for time-frequency Energy distributions

• Mask models (Roweis 2001; Reyes-Gomez, Jojic, Ellis 2005; ...)

$X_{\nu\tau} = [r_{\nu\tau} = 0]\, S^{(0)}_{\nu\tau} + [r_{\nu\tau} = 1]\, S^{(1)}_{\nu\tau}$

Spectrogram = Mask × Source0 + (1 − Mask) × Source1

  - however, sources do overlap in time and frequency

Prior structures on time-frequency Energy distributions

• Main Idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s|v)\, p(v) = \Big( \prod_k p(s_k | v_k) \Big)\, p(v)$

One channel source separation: Gaussian source model

[Graphical model: variances v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N} and observation x_k, inside a plate k = 1 ... K.]

$s_{k,n} | v_{k,n} \sim \mathcal{N}(s_{k,n};\ 0,\ v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes' theorem yields

$p(s_{k,n} | v_{k,1:N}, x_k) = \mathcal{N}\big(s_{k,n};\ \kappa_{k,n} x_k,\ v_{k,n}(1 - \kappa_{k,n})\big)$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
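The posterior above is cheap to compute once the variances are known. The following sketch evaluates the responsibilities, posterior means and posterior variances for a small made-up example; the function name and the numbers are illustrative only.

```python
import numpy as np

def separate_gaussian(x, v):
    """Posterior of each source coefficient given the mixture and the variances.

    x: (K,)   observed mixture coefficients x_k
    v: (K, N) source variances v_{k,n}
    Returns responsibilities kappa, posterior means and posterior variances,
    each of shape (K, N).
    """
    kappa = v / v.sum(axis=1, keepdims=True)
    mean = kappa * x[:, None]          # each source gets a fraction of x_k
    var = v * (1.0 - kappa)
    return kappa, mean, var

x = np.array([2.0, -1.0])
v = np.array([[1.0, 3.0],
              [2.0, 2.0]])
kappa, mean, var = separate_gaussian(x, v)
```

Because the responsibilities sum to one, the posterior means of the sources always add up to the observed coefficient.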

One channel source separation: Poisson source model

[Graphical model: intensities v_{k,1} ... v_{k,N}, sources s_{k,1} ... s_{k,N} and observation x_k, inside a plate k = 1 ... K.]

$s_{k,n} | v_{k,n} \sim \mathcal{PO}(s_{k,n};\ v_{k,n})$

$x_k | s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
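A short simulation makes the link to NMF concrete: drawing each source from a Poisson with intensity template × excitation and summing over sources gives an observation whose total intensity is the matrix product of templates and excitations. The sizes and the Gamma draws used for the templates and excitations below are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
F, T, N = 6, 8, 2                        # frequency bins, frames, sources (assumed sizes)

t = rng.gamma(2.0, 1.0, size=(F, N))     # templates t[nu, n]
e = rng.gamma(2.0, 1.0, size=(T, N))     # excitations e[tau, n]

v = t[:, None, :] * e[None, :, :]        # v[nu, tau, n] = t[nu, n] * e[tau, n]
s = rng.poisson(v)                       # latent sources s[nu, tau, n]
x = s.sum(axis=2)                        # observed "spectrogram"

# By superposition of Poisson variables, x[nu, tau] is Poisson with this intensity:
intensity = t @ e.T
```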

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure, two panels of densities p(x) against x (0 to 5): Gamma densities for several shape parameters a with b = 1, and inverse Gamma densities for several (a, b) settings.]

$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a)\big)$

$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a)\big)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
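The two densities translate directly into code. The sketch below follows the parameterisation written on the slide, in which z enters through z^{-1}; the function names are ours.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# With a = 1 and z = 1 the Gamma density reduces to exp(-x).
value = log_gamma_pdf(2.0, 1.0, 1.0)
```

In this convention, if g follows G(g; a, z) then 1/g follows IG(.; a, z), which is the change of variables the chain construction relies on.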

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 ... K as follows:

$v_k | z_k \sim \mathcal{IG}(v_k;\ a,\ z_k / a)$

$z_{k+1} | v_k \sim \mathcal{IG}(z_{k+1};\ a_z,\ v_k / a_z)$

[Graphical model: chain z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with edges labelled a_z, a, a_z.]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
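Draws from such a chain can be simulated by alternating the two conditionals. In the sketch below the initial value z_1 = 1 is an assumption (this slide does not give the prior on z_1), and inverse Gamma draws are obtained as reciprocals of Gamma draws, which matches the IG(x; a, z) convention used in these slides.

```python
import numpy as np

def sample_inverse_gamma(rng, a, z):
    """Draw from IG(x; a, z): the reciprocal of a Gamma(shape=a, scale=z) variable."""
    return 1.0 / rng.gamma(shape=a, scale=z)

def sample_igmc(K, a, a_z, z1=1.0, seed=0):
    """Simulate z_1, v_1, z_2, v_2, ... for an inverse Gamma Markov chain."""
    rng = np.random.default_rng(seed)
    z = np.empty(K)
    v = np.empty(K)
    z[0] = z1                                   # assumed starting value
    for k in range(K):
        v[k] = sample_inverse_gamma(rng, a, z[k] / a)
        if k + 1 < K:
            z[k + 1] = sample_inverse_gamma(rng, a_z, v[k] / a_z)
    return z, v

z, v = sample_igmc(K=1000, a=10.0, a_z=10.0)
log_v = np.log(v)
```

Each v_k is roughly 1/z_k and each z_{k+1} roughly 1/v_k, so consecutive variances v_k and v_{k+1} end up positively correlated; larger shape parameters give tighter coupling.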

Gamma Chains: typical draws

[Figure: three draws of log v_k against k (1 to 1000), for a = 10 with a_z = 10, a_z = 4 and a_z = 40.]

Gamma Chains with changepoints: typical draws

[Figure: three draws of log v_k against k (1 to 1000), for a = 10 with a_z = 10, a_z = 4 and a_z = 40.]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp\big((a_z + a + 1) \log z_k^{-1}\big)$ (Singletons)

[Graphical model: chain z_1 ... v_{k-1}, z_k, v_k, z_{k+1} ..., with edges labelled a_z, a, a_z.]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\big((\sum_j a_{ij} + 1) \log \xi_i^{-1}\big)$ (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ..., with observation factors p(y_1 | v_1), p(y_2 | v_2).]

Update for the variance variables v_k (neighbours z_k and z_{k+1}):

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k | z_k, z_{k+1}, y_k) \propto p(y_k | v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

Update for the auxiliary variables z_k (neighbours v_{k-1} and v_k):

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log \psi_{k,k-1} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k | v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
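Conjugacy means both Gibbs conditionals are again inverse Gamma. Multiplying the chain conditionals IG(z_k; a_z, v_{k-1}/a_z) and IG(v_k; a, z_k/a) as functions of z_k gives IG(z_k; a_z + a, 1/(a_z/v_{k-1} + a/v_k)). The sketch below draws from this and from the analogous conditional for v_k. The observation model used for v_k, a directly observed Gaussian coefficient s_k with variance v_k, is an assumption made to keep the example closed-form; the function names are ours.

```python
import numpy as np

def gibbs_draw_z(rng, v_prev, v_k, a, a_z, size=None):
    """z_k | v_{k-1}, v_k ~ IG(a_z + a, 1 / (a_z / v_{k-1} + a / v_k))."""
    shape = a_z + a
    rate = a_z / v_prev + a / v_k
    return 1.0 / rng.gamma(shape=shape, scale=1.0 / rate, size=size)

def gibbs_draw_v(rng, z_k, z_next, s_k, a, a_z, size=None):
    """v_k | z_k, z_{k+1}, s_k, assuming s_k ~ N(0, v_k) is observed.

    The Gaussian likelihood adds 1/2 to the shape and s_k^2 / 2 to the rate.
    """
    shape = a + a_z + 0.5
    rate = a / z_k + a_z / z_next + 0.5 * s_k ** 2
    return 1.0 / rng.gamma(shape=shape, scale=1.0 / rate, size=size)

rng = np.random.default_rng(0)
z_draws = gibbs_draw_z(rng, v_prev=2.0, v_k=2.0, a=10.0, a_z=10.0, size=20000)
v_draws = gibbs_draw_v(rng, z_k=2.0, z_next=2.0, s_k=0.0, a=10.0, a_z=10.0, size=20000)
```

A full sweep would alternate these two draws over k; the same sufficient statistics (expectations of the reciprocals) are what a variational update would propagate instead of samples.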

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six log-spectrogram panels (frequency bins 50 to 500 against frames 20 to 120, colour scale −14 to 0): Noisy (X), Original (Xorg), and four reconstructions with panel titles as extracted: "Xh SNR1998", "Xv SNR2079", "Xb SNR1968", "Xg SNR1997".]

Denoising - Music

[Figure: five log-spectrogram panels (frequency bins 50 to 500 against frames 50 to 250, colour scale −18 to 0): Original, Noisy, and three reconstructions with panel titles as extracted: "PF SNR853", "Gibbs SNR866", "VB SNR208".]

"Tristram (Matt Uelmen)" + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal; tie across time (harmonic continuity)

• Source 2: Vertical; tie across frequency (transients, percussive sounds)

[Figure: spectrograms Xorg, Shor and Sver over Time (τ) and Frequency Bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328   567      -158   1546   -137
Gibbs         -45    -262   457      105    1246   161
GibbsEM       -423   -242   482      134    1313   185
Pre-trained   -404   -315   813      356    1144   464
Oracle        614    1716   658      1266   1995   136

(Values as extracted; the decimal points are missing in the source.)

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78    -622   453      -235   184    -225
Gibbs         -846   -753   693      -404   1459   -383
GibbsEM       -774   -619   462      -114   1662   -097
Pre-trained   -64    -539   695      38     1639   414
Oracle        121    329    1214     2113   3389   2137

(Values as extracted; the decimal points are missing in the source.)

Harmonic-Transient Decomposition

[Figure: spectrograms Xorg (Original), Shor (Hor) and Sver (Vert) over Time (τ) and Frequency Bin (ν), with audio examples for the original, horizontal and vertical components.]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz.]

Chord Detection

[Figure, two panels against Time τ (s): "MDCT of piano chord 41485156", Frequency in Hz (0 to 4000); log Σ_ν v_{ν,j,τ} over MIDI note j (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\ a_\lambda,\ b_\lambda)$, n = 1 ... N

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\ \nu/2,\ 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\ 0,\ v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\ a_m^\top s_{k,1:N},\ r_m)$, m = 1 ... M, k = 1 ... K

$a_m \sim \mathcal{N}(a_m;\ \cdots)$

$r_m \sim \mathcal{IG}(r_m;\ \cdots)$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrogram panels, f (Hz, 0 to 10000) against t (sec): (Speech), (Piano), (Guitar), (Mix), with audio examples for the three sources and the mixture.]

Reconstructions

[Figure: three spectrogram panels, f (Hz, 0 to 10000) against t (sec): (Speech), (Piano), (Guitar), with audio examples for the three reconstructions.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets / detections / spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized, human-readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: grid of states with tempo level n_k = 1, 2, 3 (vertical) and bar position m_k = 1 ... 8 (horizontal); vertical bars mark the bar boundaries for 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability
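One way to picture the state grid is to build its transition matrix. The sketch below assumes a common form of the bar pointer dynamics: the position advances by the current tempo level modulo the bar length, and the tempo level stays put or moves to a neighbouring level with a small probability. The parameter `p_change` and the boundary handling are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def bar_pointer_transition(M, N, p_change=0.1):
    """Transition matrix over states x = (m, n).

    m in 0..M-1 is the bar position, n in 1..N the tempo level.
    State (m, n) is stored at index m * N + (n - 1).
    """
    def idx(m, n):
        return m * N + (n - 1)

    A = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M               # position advances by the tempo
            moves = {n: 1.0 - p_change}        # tempo usually stays the same
            for n_next in (n - 1, n + 1):
                if 1 <= n_next <= N:
                    moves[n_next] = p_change / 2
                else:                          # no level beyond the boundary
                    moves[n] += p_change / 2
            for n_next, p in moves.items():
                A[idx(m, n), idx(m_next, n_next)] += p
    return A

A = bar_pointer_transition(M=8, N=3)
```

Most entries of the matrix are zero, which is why inference on this grid stays tractable even for fine discretisations.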

Bar position Pointer - transition model

[Figure: three panels of the transition density p(x_2 | x_1) over Bar Position (1 to 8, horizontal) and Tempo Level (1 to 5, vertical).]

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 38: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Models for time-frequency Energy distributions

bull Mask models (Roweis 2001 Reyes-Gomez Jojic Ellis 2005 )

Xντ = [rντ = 0]S(0)ντ + [rντ = 1]S(1)

ντ

Spectrogram = Masktimes Source0 + (1minusMask)times Source1

= + +

ndash however sources do overlap in time and frequency

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 37

Prior structures on time-frequency Energy distributions

bull Main Idea Spectrogram is a point estimate of the energy at a

time-frequency atom k(ν τ)

bull We place a suitable prior on the variance of transform coefficients sk

and tie the prior variances across harmonically and temporally related

time-frequency atoms

p(s|v)p(v) =

(prod

k

p(sk|vk)

)

p(v)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 38

One channel source separation Gaussian source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim N (skn 0 vkn)

xk|sk1N =sumN

n=1 skn

bull Straightforward application of Bayesrsquo theorem yields

p(skn|vk1N xk) = N (skn κknxk vkn(1minus κkn))

κkn = vknsum

nprime

vknprime (Responsibilities)

bull Each source coefficient sn gets a fraction κn of the observation x

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 39

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising – Music

[Figure: spectrogram panels "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208"]

“Tristram (Matt Uelmen)” + ∼ 0dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (horizontal): tie across time, harmonic continuity

• Source 2 (vertical): tie across frequency, transients and percussive sounds

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                       s1                    s2
                SDR    SIR    SAR     SDR    SIR    SAR
VB             -474   -328    567    -158   1546   -137
Gibbs           -45   -262    457     105   1246    161
GibbsEM        -423   -242    482     134   1313    185
Pre-trained    -404   -315    813     356   1144    464
Oracle          614   1716    658    1266   1995    136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                       s1                    s2
                SDR    SIR    SAR     SDR    SIR    SAR
VB              -78   -622    453    -235    184   -225
Gibbs          -846   -753    693    -404   1459   -383
GibbsEM        -774   -619    462    -114   1662   -097
Pre-trained     -64   -539    695      38   1639    414
Oracle          121    329   1214    2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver; axes Time (τ) and Frequency Bin (ν)]

Audio examples, two rows of (Original) (Hor) (Vert): s2.wav, ss_s2_est1.wav, ss_s2_est2.wav; original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν / Hz, from 0 to 4000 Hz]

Chord Detection

[Figure, top: "MDCT of piano chord 41485156"; axes Time τ / s and Frequency / Hz]

[Figure, bottom: $\log \sum_\nu v_{\nu,j,\tau}$; axes Time τ / s and MIDI note j]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n; a_\lambda, b_\lambda)$, $n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n}; \nu/2, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m}; a_m^\top s_{k,1:N}, r_m)$, $m = 1, \dots, M$, $k = 1, \dots, K$

$a_m \sim \mathcal{N}(a_m; \cdots)$

$r_m \sim \mathcal{IG}(r_m; \cdots)$

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrogram panels (Speech), (Piano), (Guitar) and (Mix); axes t / sec and f / Hz]

Audio examples: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: spectrogram panels (Speech), (Piano) and (Guitar); axes t / sec and f / Hz]

Audio examples: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, from 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets, detections, spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized, human-readable score

– ...

• Online (real-time) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM
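Since all of these tasks reduce to HMM inference, the online case amounts to the standard forward (filtering) recursion. The sketch below is a generic illustration, not the bar pointer model itself: the two-state transition matrix and the likelihood values are made up for the example.

```python
def forward_filter(prior, transition, likelihoods):
    """Return the filtered distributions p(x_k | y_{1:k}) for k = 1..K.

    prior[i]          = p(x_1 = i)
    transition[i][j]  = p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] = p(y_k | x_k = i), already evaluated at the observed y_k
    """
    n_states = len(prior)
    predicted = list(prior)
    filtered = []
    for lik in likelihoods:
        # Update: weight the prediction by the likelihood, then normalise.
        unnorm = [p * l for p, l in zip(predicted, lik)]
        norm = sum(unnorm)
        alpha = [u / norm for u in unnorm]
        filtered.append(alpha)
        # Predict: push the filtered distribution through the transition model.
        predicted = [
            sum(alpha[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return filtered

# Hypothetical two-state example.
filtered = forward_filter(
    prior=[0.5, 0.5],
    transition=[[0.9, 0.1], [0.1, 0.9]],
    likelihoods=[[0.8, 0.2], [0.8, 0.2]],
)
```

Offline (smoothing) estimates need an additional backward pass over the same quantities.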

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar boundaries marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1); axes Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figures, one slide per k: p(x_1), p(x_2), p(x_3), p(x_4), then p(x_5 | y_{1:5}) with y_5 = 0, then p(x_10); axes Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity μ_k against m_k (0 to 1000) for two panels, "Triplet Rhythm" and "Duplet Rhythm"]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: nodes n_k, θ_k, m_k, r_k for k = 0, ..., 3 and λ_k, y_k for k = 1, 2, 3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: panels "Observed Data" (y_k), log p(m_k | y_{1:k}), log p(n_k | y_{1:k}) in quarter notes per min, and p(r_k | y_{1:k}) over Triplets and Duplets; horizontal axis Frame Index k]

Smoothing

[Figure: panels "Observed Data" (y_k), log p(m_k | y_{1:K}), log p(n_k | y_{1:K}) in quarter notes per min, and p(r_k | y_{1:K}) over Triplets and Duplets; horizontal axis Frame Index k]

Time Signature

[Figure: panels "Observed Data" (sample value against time / s), log p(m_k | z_{1:K}), log p(n_k | z_{1:K}) in quarter notes per min, and p(θ_k | z_{1:K}) over 4/4 and 3/4; horizontal axis Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt aligned with the audio signal x_t over time t]

Score-Performance matching - Graphical Model

[Graphical model: nodes t_k, r_k, λ_k, v_{ν,k}, s_{ν,k} for k = 1, ..., K, with a plate over ν = 1, ..., W]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν / Hz, from 0 to 4000 Hz]

Score-Performance matching

[Figure: panels "Spectrogram Data" (Frequency / Hz against Time / s) and "MIDI Data" (MIDI note against Score position)]

Online (filtering) or offline (smoothing) processing is possible

Transcription

[Figure: panels log p(r_τ | s_τ) (MIDI note number against Time / s), $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ (log λ against Time / s), and "MDCT of audio (source: Daniel-Ben Pienaar)" (Frequency / Hz against Time / s)]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients: energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models: NMF

Prior structures on time-frequency Energy distributions

• Main idea: the spectrogram is a point estimate of the energy at a time-frequency atom k(ν, τ)

• We place a suitable prior on the variance of transform coefficients s_k and tie the prior variances across harmonically and temporally related time-frequency atoms

$p(s \mid v)\,p(v) = \left(\prod_k p(s_k \mid v_k)\right) p(v)$

One channel source separation: Gaussian source model

[Graphical model: v_{k,1}, ..., v_{k,N} → s_{k,1}, ..., s_{k,N} → x_k, with a plate over k = 1, ..., K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n}; 0, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes’ theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n}; \kappa_{k,n} x_k, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} / \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient s_n gets a fraction κ_n of the observation x
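The posterior above is easy to evaluate for a single time-frequency atom. The following sketch computes the responsibilities, posterior means and posterior variances from given source variances; the function name and the numbers are illustrative assumptions.

```python
def posterior_source_moments(x, variances):
    """Posterior of each Gaussian source coefficient given the mixture x.

    Returns (kappas, means, post_vars), where kappa_n = v_n / sum(v),
    mean_n = kappa_n * x and var_n = v_n * (1 - kappa_n).
    """
    total = sum(variances)
    kappas = [v / total for v in variances]
    means = [kappa * x for kappa in kappas]
    post_vars = [v * (1.0 - kappa) for v, kappa in zip(variances, kappas)]
    return kappas, means, post_vars

# Two sources with prior variances 3 and 1 share the observation x = 2.
kappas, means, post_vars = posterior_source_moments(x=2.0, variances=[3.0, 1.0])
```

The posterior means always sum to the observation, so separation quality depends entirely on how well the variances v are estimated.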

One channel source separation: Poisson source model

[Graphical model: v_{k,1}, ..., v_{k,N} → s_{k,1}, ..., s_{k,N} → x_k, with a plate over k = 1, ..., K]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n}; v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
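As a small illustration of the template times excitation structure, the sketch below builds the per-source intensities and splits an observed count in proportion to them. The proportional split uses the standard fact that independent Poisson sources conditioned on their sum are multinomial; the function names and numbers are assumptions made for this example.

```python
def source_intensities(templates, excitations):
    """v[nu][tau][n] = templates[nu][n] * excitations[tau][n]."""
    return [
        [[t_nu[n] * e_tau[n] for n in range(len(t_nu))] for e_tau in excitations]
        for t_nu in templates
    ]

def expected_sources(x, intensities):
    """E[s_n | x] for independent Poisson sources that sum to x."""
    total = sum(intensities)
    return [x * v_n / total for v_n in intensities]

# Two frequency bins, one time frame, two sources (made-up values).
templates = [[1.0, 2.0], [0.5, 0.0]]
excitations = [[2.0, 1.0]]
v = source_intensities(templates, excitations)
shares = expected_sources(6, v[0][0])
```

Summing v over the source index n gives the familiar NMF reconstruction, the product of a template matrix and an excitation matrix.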

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: Gamma densities (left) and inverse Gamma densities (right) for several settings of a and b; axes x and p(x)]

$\mathcal{G}(x; a, z) \equiv \exp\left((a - 1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a)\right)$

$\mathcal{IG}(x; a, z) \equiv \exp\left((a + 1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a)\right)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
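The two definitions translate directly into log-density functions. The sketch below follows the slide's parametrisation, in which z acts as a scale for the Gamma and 1/x is Gamma distributed under the inverse Gamma; the function names are assumptions.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) with shape a and scale z."""
    return (a - 1) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z): the density of x when 1/x ~ G(a, z)."""
    return -(a + 1) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)

# Change of variables check: IG(x; a, z) = G(1/x; a, z) / x^2.
x, a, z = 2.0, 3.0, 0.5
lhs = log_inv_gamma_pdf(x, a, z)
rhs = log_gamma_pdf(1.0 / x, a, z) - 2.0 * math.log(x)
```

The change-of-variables identity in the last lines is what allows inverse Gamma draws to be produced by inverting Gamma draws.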

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$

[Chain: z_1, ···, v_{k−1}, z_k, v_k, z_{k+1}, ···, with edges labelled a_z, a, a_z]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters a and a_z describe coupling strength and drift of the chain
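A draw from the chain can be generated by ancestral sampling, alternating between v and z. This is a minimal sketch under the parametrisation IG(x; a, z), where 1/x is Gamma distributed with shape a and scale z; the starting value z1 and the function name are assumptions, since the prior on the first auxiliary variable is not given here.

```python
import random

def sample_chain(K, a, a_z, z1, rng):
    """Draw v_1..v_K from the inverse Gamma-Markov chain, starting from z_1 = z1."""
    v = []
    z = z1
    for _ in range(K):
        # v_k | z_k ~ IG(a, z_k / a): 1/v_k is Gamma with shape a, scale z_k / a.
        v_k = 1.0 / rng.gammavariate(a, z / a)
        v.append(v_k)
        # z_{k+1} | v_k ~ IG(a_z, v_k / a_z).
        z = 1.0 / rng.gammavariate(a_z, v_k / a_z)
    return v

rng = random.Random(1)
draw = sample_chain(K=1000, a=10.0, a_z=10.0, z1=1.0, rng=rng)
```

Each v_k is pulled towards 1/z_k and each z_{k+1} towards 1/v_k, so consecutive variances end up positively correlated; larger shape parameters tighten the coupling.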

Gamma Chains: typical draws

[Figure: three panels of log v_k against k (1 to 1000), for a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains with changepoints: typical draws

[Figure: three panels of log v_k against k (1 to 1000), for a = 10, a_z = 10; a = 10, a_z = 4; a = 10, a_z = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})$ (Singletons)

[Chain: z_1, ···, v_{k−1}, z_k, v_k, z_{k+1}, ···, with edges labelled a_z, a, a_z]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\,\xi_i^{-1}\xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\left(\left(\sum_j a_{ij} + 1\right)\log \xi_i^{-1}\right)$ (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

[Figure: three panels against time in s. MDCT of audio (source: Daniel-Ben Pienaar; frequency in Hz, 0 to 4000 Hz); $\log p(r_\tau \mid s_\tau)$ over MIDI note number; and $\log \lambda$, titled $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$.]

Summary

• “Time Domain” - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• “Transform Domain” - Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models (NMF)

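One channel source separation: Gaussian source model

[Graphical model: variances $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$, for $k = 1, \dots, K$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• Straightforward application of Bayes’ theorem yields

$p(s_{k,n} \mid v_{k,1:N}, x_k) = \mathcal{N}(s_{k,n};\, \kappa_{k,n} x_k,\, v_{k,n}(1 - \kappa_{k,n}))$

$\kappa_{k,n} = v_{k,n} \big/ \sum_{n'} v_{k,n'}$ (Responsibilities)

• Each source coefficient $s_n$ gets a fraction $\kappa_n$ of the observation $x$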
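The posterior above can be evaluated directly once the variances are known. The following is a minimal sketch, assuming the variances are supplied as a $K \times N$ array; the function name and array layout are illustrative choices, not taken from the slides.

```python
import numpy as np

def gaussian_source_posterior(x, v):
    """Posterior over source coefficients in the Gaussian source model.

    x : (K,) observed mixture coefficients x_k
    v : (K, N) source variances v_{k,n} (assumed known here)
    Returns the responsibilities kappa, the posterior means kappa * x
    and the posterior variances v * (1 - kappa), each of shape (K, N).
    """
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    kappa = v / v.sum(axis=1, keepdims=True)  # kappa_{k,n} = v_{k,n} / sum_n' v_{k,n'}
    mean = kappa * x[:, None]                 # each source gets a fraction of x_k
    var = v * (1.0 - kappa)                   # posterior variance per source
    return kappa, mean, var
```

Because the responsibilities sum to one over the sources, the posterior means always add up to the observed coefficient.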

One channel source separation: Poisson source model

[Graphical model: intensities $v_{k,1}, \dots, v_{k,N}$ generate sources $s_{k,1}, \dots, s_{k,N}$, which sum to the observation $x_k$, for $k = 1, \dots, K$.]

$s_{k,n} \mid v_{k,n} \sim \mathcal{PO}(s_{k,n};\, v_{k,n})$

$x_k \mid s_{k,1:N} = \sum_{n=1}^{N} s_{k,n}$

• This is the generative model for the NMF when we write

$v_{k(\nu,\tau),n} = t_{\nu,n} \times e_{\tau,n}$ (Template × Excitation)
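A short sketch of drawing data from this generative model may help. It assumes templates and excitations are stored as matrices `T` (frequency by source) and `E` (source by time); the Gamma draws used to fill them, the sizes and the function name are illustrative assumptions only.

```python
import numpy as np

def sample_poisson_nmf(T, E, rng):
    """Draw sources and their mixture from the Poisson source model.

    T : (W, N) nonnegative templates t[nu, n]
    E : (N, K) nonnegative excitations e[n, tau]
    Returns S of shape (W, K, N) and X = S.sum(axis=-1) of shape (W, K).
    """
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=float)
    V = np.einsum("fn,nt->ftn", T, E)  # intensities v[nu, tau, n] = t[nu, n] * e[n, tau]
    S = rng.poisson(V)                 # s ~ PO(s; v), independently per source
    X = S.sum(axis=-1)                 # x = sum_n s_n
    return S, X

# Illustrative sizes and parameters (assumptions, not from the slides).
rng = np.random.default_rng(0)
T = rng.gamma(1.0, 1.0, size=(6, 3))
E = rng.gamma(1.0, 1.0, size=(3, 8))
S, X = sample_poisson_nmf(T, E, rng)
```

Since a sum of independent Poisson variables is again Poisson, each entry of `X` is Poisson with intensity given by the corresponding entry of the matrix product `T @ E`.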

Gamma $\mathcal{G}(x; a, b)$ and Inverse Gamma $\mathcal{IG}(x; a, b)$

[Figure: two panels of densities $p(x)$ for $0 \le x \le 5$; legend values $(a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1)$ in the first panel and $(1, 1), (1, 0.5), (2, 1)$ in the second.]

$\mathcal{G}(x; a, z) \equiv \exp\big((a - 1) \log x - z^{-1} x + a \log z^{-1} - \log \Gamma(a)\big)$

$\mathcal{IG}(x; a, z) \equiv \exp\big((a + 1) \log x^{-1} - z^{-1} x^{-1} + a \log z^{-1} - \log \Gamma(a)\big)$

• Gamma: conjugate prior for Gaussian precision, Poisson intensity and Inverse Gamma scale

• Inverse Gamma: conjugate prior for Gaussian variance and Gamma scale
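The two log-densities can be transcribed term by term. This is a small sketch using only the standard library; the function names are illustrative.

```python
import math

def log_gamma_pdf(x, a, z):
    """log G(x; a, z) in the parametrisation above (shape a, scale z)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)

def log_inv_gamma_pdf(x, a, z):
    """log IG(x; a, z) in the parametrisation above."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

In this parametrisation, if $x \sim \mathcal{G}(x; a, z)$ then $1/x \sim \mathcal{IG}(\cdot\,; a, z)$, which gives a convenient way to draw inverse Gamma samples from a Gamma generator.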

Gamma Chains

We define an inverse Gamma-Markov chain for $k = 1, \dots, K$ as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k;\, a,\, z_k/a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1};\, a_z,\, v_k/a_z)$

[Chain: $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$ with links labelled $a_z$, $a$, $a_z$.]

• Variance variables $v$ are priors for sources

• Auxiliary variables $z$ are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
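The chain definition can be simulated by alternating the two conditional draws. The sketch below is a minimal illustration: the slides do not give a distribution for $z_1$, so it is simply fixed to a constant here (an assumption), and the function name is hypothetical.

```python
import numpy as np

def sample_igmc(K, a, az, z1=1.0, seed=0):
    """Draw (v_1..v_K, z_1..z_K) from the inverse Gamma-Markov chain.

    z1 is fixed by hand because no initial distribution is stated (assumption).
    """
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = np.empty(K)
    z[0] = z1
    for k in range(K):
        # x ~ IG(x; shape, scale) is equivalent to 1/x ~ G(shape, scale)
        v[k] = 1.0 / rng.gamma(a, z[k] / a)            # v_k | z_k ~ IG(a, z_k / a)
        if k + 1 < K:
            z[k + 1] = 1.0 / rng.gamma(az, v[k] / az)  # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
    return v, z
```

Large values of $a$ and $a_z$ tie neighbouring variances tightly together, while small values let $\log v_k$ wander; choosing $a \neq a_z$ introduces a systematic drift in $\log v_k$.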

Gamma Chains: typical draws

[Figure: three draws of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$.]

Gamma Chains with changepoints: typical draws

[Figure: three draws of $\log v_k$ for $k = 1, \dots, 1000$, with $(a, a_z) = (10, 10)$, $(10, 4)$ and $(10, 40)$.]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

[Chain: $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$ with links labelled $a_z$, $a$, $a_z$.]

$\phi_{z_k} = \exp\big((a_z + a + 1) \log z_k^{-1}\big)$ (Singletons)

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp\big((\sum_j a_{ij} + 1) \log \xi_i^{-1}\big)$ (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic
  - Markov Chain Monte Carlo, Gibbs sampler
  - Sequential Monte Carlo, Particle Filtering

• Deterministic
  - Variational Bayes

In all these, conjugacy helps.

VB or Gibbs

[Factor graph: a chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\big(\phi_k + \langle \log \psi_{k,k} + \log \psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\big)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

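[Factor graph: a chain $\psi_{0,1}, z_1, \psi_{1,1}, v_1, \psi_{1,2}, z_2, \psi_{2,2}, v_2, \cdots$ with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log \psi_{k,k-1} + \log \psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k,k-1}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$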
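For the inverse Gamma-Markov chain both Gibbs conditionals are again inverse Gamma, so a sampler is short to write. The sketch below is one possible instance under stated assumptions: the observation model is taken to be $y_k \sim \mathcal{N}(y_k; 0, v_k)$, the prior on $z_1$ is $\mathcal{IG}(z_1; a_z, v_0/a_z)$ with a fixed hypothetical constant `v0`, and the function names are illustrative. Under these assumptions $z_k \mid v_{k-1}, v_k$ has shape $a_z + a$ and $v_k \mid z_k, z_{k+1}, y_k$ has shape $a + a_z + 1/2$.

```python
import numpy as np

def sample_ig(rng, shape, beta):
    """Draw x with density proportional to x^-(shape+1) * exp(-beta / x)."""
    return 1.0 / rng.gamma(shape, 1.0 / beta)

def gibbs_igmc(y, a=10.0, az=10.0, v0=1.0, n_sweeps=200, burn_in=100, seed=0):
    """Posterior mean of the variances v_k given y_k ~ N(0, v_k) (assumed model)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size
    v = np.ones(K)
    v_mean = np.zeros(K)
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k: shape a_z + a, beta = a_z / v_{k-1} + a / v_k
        v_prev = np.concatenate(([v0], v[:-1]))
        z = sample_ig(rng, az + a, az / v_prev + a / v)
        # v_k | z_k, z_{k+1}, y_k: shape a + a_z + 1/2,
        # beta = y_k^2 / 2 + a / z_k + a_z / z_{k+1}
        shape = np.full(K, a + az + 0.5)
        shape[-1] = a + 0.5            # the last v_K has no z_{K+1} neighbour
        beta = 0.5 * y ** 2 + a / z
        beta[:-1] += az / z[1:]
        v = sample_ig(rng, shape, beta)
        if sweep >= burn_in:
            v_mean += v
    return v_mean / (n_sweeps - burn_in)
```

Because the $z$ variables are conditionally independent given all $v$, and vice versa, each half of the sweep can be drawn in one vectorised call.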

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrogram panels Noisy (X), Original (Xorg) and four reconstructions with titles, as extracted with decimal points missing, “Xh SNR1998”, “Xv SNR2079”, “Xb SNR1968” and “Xg SNR1997”.]

Denoising - Music

[Figure: spectrogram panels titled Original, Noisy and three reconstructions with titles, as extracted with decimal points missing, “PF SNR853”, “Gibbs SNR866” and “VB SNR208”.]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (Horizontal): tie across time, harmonic continuity

• Source 2 (Vertical): tie across frequency, transients and percussive sounds

[Figure: time-frequency grids (frequency bin $\nu$ against time $\tau$) for $X_{org}$, $S_{hor}$ and $S_{ver}$.]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

[Table values as extracted; decimal points are missing in the source text.]

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328   567      -158   1546   -137
Gibbs         -45    -262   457      105    1246   161
GibbsEM       -423   -242   482      134    1313   185
Pre-trained   -404   -315   813      356    1144   464
Oracle        614    1716   658      1266   1995   136

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters $a_z$ and $a$, trained on isolated sources

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

[Table values as extracted; decimal points are missing in the source text.]

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78    -622   453      -235   184    -225
Gibbs         -846   -753   693      -404   1459   -383
GibbsEM       -774   -619   462      -114   1662   -097
Pre-trained   -64    -539   695      38     1639   414
Oracle        121    329    1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: time-frequency grids (frequency bin $\nu$ against time $\tau$) for $X_{org}$, $S_{hor}$ and $S_{ver}$, with two rows of embedded audio examples labelled (Original), (Hor) and (Vert).]

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against frequency $\nu$ in Hz (0 to 4000 Hz).]

Chord Detection

[Figure: MDCT of a piano chord (frequency in Hz against time $\tau$ in s; chord label extracted as “41485156”) and $\log \sum_\nu v_{\nu,j,\tau}$ over MIDI note $j$ against time $\tau$ in s.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda,\, b_\lambda)$, for $n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$, for $m = 1, \dots, M$ and $k = 1, \dots, K$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source $n$

Source Separation

[Figure: spectrograms ($f$/Hz against $t$/sec) of the three sources (Speech), (Piano) and (Guitar) and of the (Mix), with embedded audio examples.]

Reconstructions

[Figure: spectrograms ($f$/Hz against $t$/sec) of the reconstructed sources (Speech), (Piano) and (Guitar), with embedded audio examples.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of $a$, $\lambda$ and $r$ against epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized/human readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in a HMM
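Online and offline processing in an HMM correspond to the forward (filtering) and forward-backward (smoothing) recursions. The following is a generic sketch for a discrete-state HMM, not the specific models of these slides; the argument names and array layout are assumptions for illustration.

```python
import numpy as np

def forward_backward(pi, A, lik):
    """Filtering and smoothing in a discrete HMM.

    pi  : (S,) initial state distribution p(x_1)
    A   : (S, S) transitions, A[i, j] = p(x_{k+1} = j | x_k = i)
    lik : (K, S) observation likelihoods, lik[k, j] = p(y_k | x_k = j)
    Returns alpha[k] = p(x_k | y_{1:k}) and gamma[k] = p(x_k | y_{1:K}).
    """
    pi = np.asarray(pi, dtype=float)
    A = np.asarray(A, dtype=float)
    lik = np.asarray(lik, dtype=float)
    K, S = lik.shape
    alpha = np.zeros((K, S))
    pred = pi
    for k in range(K):                   # forward pass (online, filtering)
        alpha[k] = pred * lik[k]
        alpha[k] /= alpha[k].sum()
        pred = alpha[k] @ A              # one-step-ahead prediction
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):       # backward pass (offline, smoothing)
        beta[k] = A @ (lik[k + 1] * beta[k + 1])
        beta[k] /= beta[k].sum()         # rescale for numerical stability
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return alpha, gamma
```

The filtered and smoothed distributions coincide at the last frame, since no future observations remain there.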

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; labels “3/4 time” and “4/4 time” mark positions along the bar axis.]

• Each dot denotes a state $x = (m, n)$ (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability
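A transition matrix over such a grid of states can be built explicitly. The rule used below is an assumption made for illustration and is not stated on the slides: the bar position advances by the current tempo level (wrapping at the end of the bar), and the tempo level moves up or down by one with a small probability `p_change`. The function name and state indexing are hypothetical.

```python
import numpy as np

def bar_pointer_transition(M=8, N=3, p_change=0.1):
    """Transition matrix over states (m, n), m in 0..M-1 and n in 1..N.

    States are indexed as m * N + (n - 1). Assumed rule (illustrative):
    m advances by n modulo M; n changes by +-1 with total probability
    p_change, and probability mass for moves outside 1..N stays at n.
    """
    S = M * N
    A = np.zeros((S, S))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M
            moves = {n: 1.0 - p_change}
            for n_alt in (n - 1, n + 1):
                if 1 <= n_alt <= N:
                    moves[n_alt] = p_change / 2.0
                else:
                    moves[n] += p_change / 2.0  # reflect at the boundary
            for n_next, p in moves.items():
                A[i, m_next * N + (n_next - 1)] += p
    return A
```

The resulting matrix is sparse: each state has at most three successors, which is what keeps exact inference on such grids affordable.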

Bar position Pointer - transition model

[Figure: three panels showing the transition distribution $p(x_2 \mid x_1)$ on the grid of bar position (1 to 8) against tempo level (1 to 5).]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ on the grid of bar position against tempo level.]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ on the grid of bar position against tempo level.]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ on the grid of bar position against tempo level.]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ on the grid of bar position against tempo level.]

Bar position Pointer - k = 5

[Figure: $p(x_5 \mid y_{1:5})$ on the grid of bar position against tempo level, with $y_5 = 0$.]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ on the grid of bar position against tempo level.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a triplet rhythm and for a duplet rhythm.]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$, $\theta_0, \dots, \theta_3$, $m_0, \dots, m_3$ and $r_0, \dots, r_3$, with intensities $\lambda_1, \lambda_2, \lambda_3$ and observations $y_1, y_2, y_3$.]

• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator

Filtering

[Figure: observed data $y_k$ and, against frame index $k$, the filtering distributions $\log p(m_k \mid y_{1:k})$ over bar position, $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per minute, and $p(r_k \mid y_{1:k})$ over the rhythmic pattern (triplets or duplets).]

Smoothing

[Figure: observed data $y_k$ and, against frame index $k$, the smoothing distributions $\log p(m_k \mid y_{1:K})$ over bar position, $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per minute, and $p(r_k \mid y_{1:K})$ over the rhythmic pattern (triplets or duplets).]


Page 41: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

One channel source separation Poisson source model

vk1 vkN

sk1 skN

xk

k = 1 K

skn|vkn sim PO(skn vkn)

xk|sk1N =sumN

n=1 skn

bull This is the generative model for the NMF when we write

vk(ντ)n = tνn times eτn (Templatetimes Excitation)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 40

Gamma G(x a b) and Inverse Gamma IG(x a b)

0 1 2 3 4 50

02

04

06

08

1

12a = 09 b =1

a = 1 b =1

a = 13 b =1

a = 2 b =1

x

p(x)

0 1 2 3 4 50

02

04

06

08

1

12

14

a=1 b=1

a=1 b=05

a=2 b=1

G(x a z) equiv exp((aminus 1) log xminus zminus1x+ a log zminus1 minus log Γ(a))

IG(x a z) equiv exp((a+ 1) log xminus1 minus zminus1xminus1 + a log zminus1 minus log Γ(a))

bull Gamma Conjugate prior for Gaussian precision Poisson intensity Inverse Gamma scale

bull Inverse Gamma Conjugate prior for Gaussian variance and Gamma scale

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 41

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human-readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position $m_k$ = 1, ..., 8 on the horizontal axis and tempo level $n_k$ = 1, 2, 3 on the vertical axis; markers for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability
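A state space of this kind can be written down as a sparse transition table. The sketch below is illustrative only: the slide fixes the state x = (m, n) and the existence of arcs, but not the dynamics, so the rule used here (the position advances by n steps and wraps around at the bar line, the tempo level stays put with probability `p_stay` or moves to an adjacent level) is an assumption, as are the function name and the default values.

```python
def bar_pointer_transitions(num_positions=8, num_tempi=3, p_stay=0.8):
    """Return trans[(m, n)] = {(m2, n2): probability} for m in 1..M, n in 1..N.

    Assumed dynamics (not taken from the slides): the position advances by n
    steps modulo the bar length; the tempo level keeps its value with
    probability p_stay and otherwise moves to an adjacent level.
    """
    trans = {}
    for m in range(1, num_positions + 1):
        for n in range(1, num_tempi + 1):
            m_next = (m - 1 + n) % num_positions + 1
            neighbours = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= num_tempi]
            if not neighbours:
                # Only one tempo level exists, so the tempo cannot change.
                trans[(m, n)] = {(m_next, n): 1.0}
                continue
            row = {(m_next, n): p_stay}
            for n2 in neighbours:
                row[(m_next, n2)] = (1.0 - p_stay) / len(neighbours)
            trans[(m, n)] = row
    return trans
```

Each entry of the returned dictionary corresponds to the outgoing arcs of one dot in the grid; only arcs with positive probability are stored.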

Bar position Pointer - transition model

[Figure: three panels of the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: $y_5 = 0$, $p(x_5 \mid y_{1:5})$ over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: Poisson intensity $\mu_k$ against bar position $m_k$ (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
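With a Poisson observation model, online tracking reduces to the standard HMM forward recursion. The following is a minimal sketch, assuming a finite state space indexed 0, ..., S-1 and one Poisson intensity per state; the function names and the dense transition matrix are illustrative choices, not the implementation behind the slides.

```python
import math


def poisson_logpmf(y, mu):
    """log of the Poisson probability of count y under intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def forward_filter(init, trans, intensity, observations):
    """Return the filtering distributions p(x_k | y_{1:k}) for k = 1..K.

    init         : init[i] = p(x_1 = i)
    trans        : trans[i][j] = p(x_{k+1} = j | x_k = i)
    intensity    : intensity[i] = Poisson intensity when the state is i
    observations : non-negative integer counts y_k
    """
    num_states = len(init)
    predicted = list(init)
    filtered = []
    for y in observations:
        # Update: weight the prediction by the Poisson likelihood of y.
        weights = [predicted[i] * math.exp(poisson_logpmf(y, intensity[i]))
                   for i in range(num_states)]
        norm = sum(weights)
        posterior = [w / norm for w in weights]
        filtered.append(posterior)
        # Predict: propagate the posterior through the transition model.
        predicted = [sum(posterior[i] * trans[i][j] for i in range(num_states))
                     for j in range(num_states)]
    return filtered
```

For the bar pointer model, each state index would stand for one (m, n) pair and `intensity` would be read off a rhythm template such as the triplet or duplet pattern.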

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains $n_k$, $\theta_k$, $m_k$, $r_k$ for k = 0, ..., 3 and $\lambda_k$, $y_k$ for k = 1, 2, 3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$; $\log p(n_k \mid y_{1:k})$ in quarter notes per min; $p(r_k \mid y_{1:k})$ for Triplets and Duplets; all against Frame Index k]

Smoothing

[Figure: Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$; $\log p(n_k \mid y_{1:K})$ in quarter notes per min; $p(r_k \mid y_{1:K})$ for Triplets and Duplets; all against Frame Index k]

Time Signature

[Figure: Observed Data (sample value against time/s); $\log p(m_k \mid z_{1:K})$; $\log p(n_k \mid z_{1:K})$ in quarter notes per min; $p(\theta_k \mid z_{1:K})$ for 4/4 and 3/4; against Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: audio signal $x_t$ against time t]

Score-Performance matching - Graphical Model

[Figure: graphical model with chains $t_k$, $r_k$, $\lambda_k$, $v_{\nu,k}$, $s_{\nu,k}$ for k = 1, ..., K, inside a plate ν = 1, ..., W; score positions $r_k$ numbered 1 to 8]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau}; a, 1/(a\lambda\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz (0 to 4000)]

Score-Performance matching

[Figure: Spectrogram Data (Frequency/Hz against Time/s) and MIDI Data (MIDI note against Score position)]

Online (filtering) or Offline (smoothing) processing is possible
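The difference between the online and offline modes can be made concrete with a generic forward-backward pass. This is a minimal sketch for a finite-state HMM with precomputed likelihood values; the function name and the dense list-of-lists representation are assumptions made for illustration.

```python
def forward_backward(init, trans, likelihoods):
    """Return (filtered, smoothed) state distributions of a finite HMM.

    init        : init[i] = p(x_1 = i)
    trans       : trans[i][j] = p(x_{k+1} = j | x_k = i)
    likelihoods : likelihoods[k][i] = p(y_k | x_k = i)
    """
    num_states = len(init)
    # Forward pass (online): p(x_k | y_{1:k}).
    filtered, predicted = [], list(init)
    for lik in likelihoods:
        weights = [predicted[i] * lik[i] for i in range(num_states)]
        norm = sum(weights)
        post = [w / norm for w in weights]
        filtered.append(post)
        predicted = [sum(post[i] * trans[i][j] for i in range(num_states))
                     for j in range(num_states)]
    # Backward pass (offline): p(x_k | y_{1:K}) is proportional to filtered * beta.
    smoothed = [None] * len(likelihoods)
    beta = [1.0] * num_states
    for k in range(len(likelihoods) - 1, -1, -1):
        weights = [filtered[k][i] * beta[i] for i in range(num_states)]
        norm = sum(weights)
        smoothed[k] = [w / norm for w in weights]
        # beta_{k-1}(i) = sum_j trans[i][j] * p(y_k | j) * beta_k(j), rescaled.
        beta = [sum(trans[i][j] * likelihoods[k][j] * beta[j]
                    for j in range(num_states)) for i in range(num_states)]
        scale = sum(beta)
        beta = [b / scale for b in beta]
    return filtered, smoothed
```

The filtered and smoothed distributions agree at the last frame; at earlier frames the smoothed one also uses the observations that follow.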

Transcription

[Figure: $\log p(r_\tau \mid s_\tau)$ (MIDI note number against Time/s); $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$ ($\log\lambda$ against Time/s); MDCT of audio (source: Daniel-Ben Pienaar) (Frequency/Hz against Time/s)]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF

Gamma G(x; a, b) and Inverse Gamma IG(x; a, b)

[Figure: densities p(x) for 0 ≤ x ≤ 5; first panel with (a, b) = (0.9, 1), (1, 1), (1.3, 1), (2, 1); second panel with (a, b) = (1, 1), (1, 0.5), (2, 1)]

$\mathcal{G}(x; a, z) \equiv \exp((a-1)\log x - z^{-1}x + a\log z^{-1} - \log\Gamma(a))$

$\mathcal{IG}(x; a, z) \equiv \exp((a+1)\log x^{-1} - z^{-1}x^{-1} + a\log z^{-1} - \log\Gamma(a))$

• Gamma: Conjugate prior for Gaussian precision, Poisson intensity, Inverse Gamma scale

• Inverse Gamma: Conjugate prior for Gaussian variance and Gamma scale
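The two densities translate directly into code. The sketch below follows the parametrization written on this slide (shape a and scale-like parameter z); the function names are hypothetical. It also makes the link between the two explicit: x follows IG(x; a, z) exactly when 1/x follows G(·; a, z).

```python
import math


def log_gamma_density(x, a, z):
    """log G(x; a, z) = (a-1) log x - x/z - a log z - log Gamma(a)."""
    return (a - 1.0) * math.log(x) - x / z - a * math.log(z) - math.lgamma(a)


def log_inverse_gamma_density(x, a, z):
    """log IG(x; a, z) = -(a+1) log x - 1/(z x) - a log z - log Gamma(a)."""
    return -(a + 1.0) * math.log(x) - 1.0 / (z * x) - a * math.log(z) - math.lgamma(a)
```

By the change of variables u = 1/x, `log_inverse_gamma_density(x, a, z)` equals `log_gamma_density(1.0 / x, a, z) - 2.0 * math.log(x)`, which gives a quick consistency check.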

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1, ..., K as follows:

$v_k \mid z_k \sim \mathcal{IG}(v_k; a, z_k/a)$

$z_{k+1} \mid v_k \sim \mathcal{IG}(z_{k+1}; a_z, v_k/a_z)$

[Figure: chain $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$ with edges labelled $a_z$, $a$, $a_z$]

• Variance variables v are priors for sources

• Auxiliary variables z are needed for conjugacy and positive correlation

• Shape parameters $a$ and $a_z$ describe coupling strength and drift of the chain
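Drawing from such a chain only needs a Gamma sampler. The sketch below assumes that the second arguments of the two inverse Gamma densities read z_k/a and v_k/a_z, and that IG(x; a, z) means 1/x is Gamma distributed with shape a and scale z, as in the definition used in this talk. The fixed starting value `z1` is an assumption, since the prior on z_1 is not given here; the function name is hypothetical.

```python
import random


def sample_ig_chain(num_steps, a, a_z, z1=1.0, seed=0):
    """Sample (z_1..z_K, v_1..v_K) from the inverse Gamma-Markov chain."""
    rng = random.Random(seed)
    z = z1
    zs, vs = [], []
    for _ in range(num_steps):
        # v_k | z_k ~ IG(a, z_k / a): 1/v_k ~ Gamma(shape a, scale z_k / a)
        v = 1.0 / rng.gammavariate(a, z / a)
        zs.append(z)
        vs.append(v)
        # z_{k+1} | v_k ~ IG(a_z, v_k / a_z): 1/z_{k+1} ~ Gamma(shape a_z, scale v_k / a_z)
        z = 1.0 / rng.gammavariate(a_z, v / a_z)
    return zs, vs
```

Under this reading the mean of 1/v_k is z_k and the mean of 1/z_{k+1} is v_k, so v_k and z_k are negatively correlated while consecutive v_k are positively correlated. Larger shape parameters make each step more concentrated and the chain smoother.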

Gamma Chains: typical draws

[Figure: three draws of $\log v_k$ against k (1 to 1000), for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains with changepoints: typical draws

[Figure: three draws of $\log v_k$ against k (1 to 1000), for a = 10, az = 10; a = 10, az = 4; a = 10, az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a z_k^{-1} v_k^{-1})$ (Pairwise)

$\phi_{z_k} = \exp((a_z + a + 1)\log z_k^{-1})$ (Singletons)

[Figure: chain $z_1, \cdots, v_{k-1}, z_k, v_k, z_{k+1}, \cdots$ with edges labelled $a_z$, $a$, $a_z$]

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\xi_i^{-1}\xi_j^{-1})$ (Pairwise)

$\phi_i = \exp((\sum_j a_{ij} + 1)\log\xi_i^{-1})$ (Singletons)

Possible Model Topologies

Approximate Inference

• Stochastic

– Markov Chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, Particle Filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps

VB or Gibbs

[Figure: factor graph of the chain $z_1, v_1, z_2, v_2, \cdots$ with potentials $\psi_{01}$, $\psi_{11}$, $\psi_{12}$, $\psi_{22}$ and likelihood terms $p(y_1 \mid v_1)$, $p(y_2 \mid v_2)$]

Update of $v_k$:

• VB

$q^{(\tau)}(v_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})})$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}(z_k^{(\tau)})\,\psi_{k,k+1}(z_{k+1}^{(\tau)})$

Update of $z_k$:

• VB

$q^{(\tau)}(z_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)})$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\,\psi_{k,k}(v_k^{(\tau)})$
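Because every full conditional of the chain is again inverse Gamma, a Gibbs sweep needs only Gamma draws. The sketch below makes several assumptions that are not stated on the slides: a zero-mean Gaussian likelihood y_k ~ N(0, v_k), chain conditionals v_k | z_k ~ IG(a, z_k/a) and z_{k+1} | v_k ~ IG(a_z, v_k/a_z), and a fixed pseudo-variable `v0` in place of the unspecified prior on z_1. The function name and the choice to average the second half of the sweeps are illustrative.

```python
import random


def gibbs_variance_chain(y, a, a_z, num_sweeps=200, v0=1.0, seed=0):
    """Estimate posterior means of v_1..v_K given y_k ~ N(0, v_k) by Gibbs sampling."""
    rng = random.Random(seed)
    num = len(y)
    v = [max(yk * yk, 1e-6) for yk in y]   # crude initialisation
    z = [1.0] * num
    mean_v = [0.0] * num
    kept = 0
    for sweep in range(num_sweeps):
        for k in range(num):
            # z_k | v_{k-1}, v_k: 1/z_k ~ Gamma(shape a_z + a, rate a_z/v_{k-1} + a/v_k)
            v_prev = v[k - 1] if k > 0 else v0
            rate = a_z / v_prev + a / v[k]
            z[k] = 1.0 / rng.gammavariate(a_z + a, 1.0 / rate)
        for k in range(num):
            # v_k | z_k, z_{k+1}, y_k: 1/v_k ~ Gamma(shape, rate) by conjugacy
            shape = a + 0.5
            rate = a / z[k] + 0.5 * y[k] * y[k]
            if k + 1 < num:
                shape += a_z
                rate += a_z / z[k + 1]
            v[k] = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if sweep >= num_sweeps // 2:
            # Discard the first half of the sweeps as burn-in.
            kept += 1
            for k in range(num):
                mean_v[k] += v[k]
    return [m / kept for m in mean_v]
```

All z_k are conditionally independent given the v's and vice versa, so each sweep updates the two groups in turn. The returned averages can serve as variance estimates for a subsequent denoising step.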

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: spectrograms of the Noisy signal (X), the Original (Xorg) and four reconstructions labelled Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising – Music

[Figure: spectrograms labelled Original, Noisy, PF SNR853, Gibbs SNR866, VB SNR208]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: Xorg, Shor and Sver over Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328   567      -158   1546   -137
Gibbs         -45    -262   457      105    1246   161
GibbsEM       -423   -242   482      134    1313   185
Pre-trained   -404   -315   813      356    1144   464
Oracle        614    1716   658      1266   1995   136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters $a_z$ and $a$, trained on isolated sources
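The oracle setting can be illustrated with a small sketch. It assumes, as one common reading rather than something stated on the slide, that each mixture coefficient is x = s1 + s2 with independent zero-mean Gaussian sources of variances v1 and v2, so that the posterior mean of s1 is x · v1 / (v1 + v2). The oracle then plugs in the squared true coefficients as variances. Function names are hypothetical.

```python
def posterior_mean_sources(x, v1, v2):
    """Return (E[s1 | x], E[s2 | x]) per coefficient for x = s1 + s2, s_i ~ N(0, v_i)."""
    est1, est2 = [], []
    for xk, var1, var2 in zip(x, v1, v2):
        total = var1 + var2
        # With no energy in either source the split is arbitrary; use one half.
        gain = var1 / total if total > 0.0 else 0.5
        est1.append(gain * xk)
        est2.append((1.0 - gain) * xk)
    return est1, est2


def oracle_separation(s1, s2):
    """Separate the mixture s1 + s2 using squared true coefficients as variances."""
    x = [p + q for p, q in zip(s1, s2)]
    return posterior_mean_sources(x, [p * p for p in s1], [q * q for q in s2])
```

The two estimates always add up to the mixture. Even with oracle variances the reconstruction is exact only where one source dominates a coefficient, which is why the oracle row is an upper reference rather than a perfect score.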

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78    -622   453      -235   184    -225
Gibbs         -846   -753   693      -404   1459   -383
GibbsEM       -774   -619   462      -114   1662   -097
Pre-trained   -64    -539   695      38     1639   414
Oracle        121    329    1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: two examples, each showing Xorg (Original), Shor (Hor) and Sver (Vert) over Time (τ) and Frequency Bin (ν)]

Audio examples: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz (0 to 4000)]

Chord Detection

[Figure: MDCT of piano chord 41, 48, 51, 56 (Frequency/Hz against Time τ/s); $\log \sum_\nu v_{\nu,j,\tau}$ (MIDI note j against Time τ/s)]

Page 43: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Gamma Chains

We define an inverse Gamma-Markov chain for k = 1 K as follows

vk|zk sim IG(vk a zka)

zk+1|vk sim IG(zk+1 az vkaz)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

bull Variance variables v are priors for sources

bull Auxillary variables z are needed for conjugacy and positive correlation

bull Shape parameters a and az describe coupling strength and drift of the chain

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 42

Gamma Chains typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 43

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

bull Additive Gaussian noise with unknown variance

bull Inference Variational Bayes

Noisy Original

X

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xorg

20 40 60 80 100 120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xh SNR1998

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xv SNR2079

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xb SNR1968

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Xg SNR1997

20406080100120

50

100

150

200

250

300

350

400

450

500

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 51

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: panels Xorg, Shor and Sver over Time (τ) and Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB              -474    -328     567    -158    1546    -137
Gibbs            -45    -262     457     105    1246     161
GibbsEM         -423    -242     482     134    1313     185
Pre-trained     -404    -315     813     356    1144     464
Oracle           614    1716     658    1266    1995     136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources
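
The slide does not say how the oracle variances are turned into source estimates. A minimal sketch, assuming a zero-mean Gaussian model s ∼ N(0, v) for each transform coefficient and an additive mix x = s1 + s2: the posterior mean of source 1 is then x · v1 / (v1 + v2). The function name `oracle_separate` and the guard `eps` are hypothetical, introduced only for this illustration.

```python
import numpy as np

def oracle_separate(s1, s2, eps=1e-12):
    """Oracle separation sketch: variances are the squared true coefficients."""
    x = s1 + s2                      # observed mixture coefficients
    v1, v2 = s1 ** 2, s2 ** 2        # oracle latent variance estimates
    gain = v1 / (v1 + v2 + eps)      # posterior-mean gain for source 1 (eps avoids 0/0)
    s1_hat = gain * x
    return s1_hat, x - s1_hat        # the two estimates add up to the mixture

s1 = np.array([1.0, 0.0, 2.0])
s2 = np.array([0.0, 3.0, 2.0])
s1_hat, s2_hat = oracle_separate(s1, s2)
```

Where only one source is active the gain is close to 0 or 1, so that coefficient is assigned almost entirely to the active source.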

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

Method        s1 SDR  s1 SIR  s1 SAR  s2 SDR  s2 SIR  s2 SAR
VB               -78    -622     453    -235     184    -225
Gibbs           -846    -753     693    -404    1459    -383
GibbsEM         -774    -619     462    -114    1662    -097
Pre-trained      -64    -539     695      38    1639     414
Oracle           121     329    1214    2113    3389    2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver over Time (τ) and Frequency Bin (ν), with audio examples (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure: MDCT of piano chord 41, 48, 51, 56, Frequency (Hz) against Time τ (s); log Σ_ν v_{ν,j,τ} for each MIDI note j against Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda,\, b_\lambda), \quad n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
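
The hierarchy above can be read top-down as a recipe for generating data. A minimal sketch follows, under assumptions that are not on the slide: G(·; a, b) is taken as a gamma with shape a and scale b, IG(v; a, b) is taken to mean that 1/v is gamma with shape a and scale b, the mixing vectors a_m get a standard normal prior, and a single fixed noise variance r replaces the r_m. The function name and all default values are hypothetical.

```python
import numpy as np

def sample_mixture(K, N, M, a_lam=2.0, b_lam=1.0, nu=2.0, r=0.01, seed=0):
    """Draw sources and mixtures from the hierarchical prior (illustrative only)."""
    rng = np.random.default_rng(seed)
    # lambda_n ~ G(a_lam, b_lam): one overall scale per source
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)
    # v_{k,n} ~ IG(nu/2, 2/(nu*lambda_n)): 1/v is gamma with that shape and scale
    inv_v = rng.gamma(shape=nu / 2.0, scale=2.0 / (nu * lam), size=(K, N))
    v = 1.0 / inv_v
    # s_{k,n} ~ N(0, v_{k,n})
    s = rng.normal(0.0, np.sqrt(v))
    # Rows of A are the mixing vectors a_m (standard normal prior assumed here)
    A = rng.normal(size=(M, N))
    # x_{k,m} ~ N(a_m^T s_k, r)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))
    return lam, v, s, A, x

lam, v, s, A, x = sample_mixture(K=500, N=3, M=2)
```

With M = 2 channels and N = 3 sources the draw is underdetermined, which is the regime the talk is concerned with.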

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms, t (sec) against f (Hz), of the sources (Speech), (Piano), (Guitar) and of the (Mix), with audio examples]

Reconstructions

[Figure: spectrograms, t (sec) against f (Hz), of the reconstructed (Speech), (Piano) and (Guitar), with audio examples]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

- Determine the position of a performance on a score

- Determine where a human listener would clap her hand

- Create a quantized/human-readable score

- ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lengths marked for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1) over Bar Position (1 to 8) and Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2) over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3) over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4) over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) with y_5 = 0, over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10) over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]
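
Taken together, the state lattice, the transition model and the Poisson observation model define an HMM that can be filtered recursively. The sketch below is a simplified reading, not the published model: the bar position advances by n + 1 bins per frame at tempo level n and wraps at the bar end, the tempo level moves up or down one step with total probability `p_change`, and the Poisson intensity depends on bar position only. The function name, `p_change` and the example intensities are hypothetical.

```python
import numpy as np
from math import lgamma

def bar_pointer_filter(y, mu, n_tempo, p_change=0.1):
    """Forward filtering p(m_k, n_k | y_{1:k}) for a simplified bar pointer HMM."""
    M = len(mu)
    alpha = np.full((M, n_tempo), 1.0 / (M * n_tempo))  # uniform prior on x_1
    out = []
    for k, yk in enumerate(y):
        if k > 0:
            pred = np.zeros_like(alpha)
            for n in range(n_tempo):
                # Position moves n + 1 bins forward, wrapping around the bar
                moved = np.roll(alpha[:, n], n + 1)
                up, down = min(n + 1, n_tempo - 1), max(n - 1, 0)
                pred[:, n] += (1.0 - p_change) * moved
                pred[:, up] += 0.5 * p_change * moved
                pred[:, down] += 0.5 * p_change * moved
            alpha = pred
        # Poisson log-likelihood of the count y_k at every bar position
        log_lik = yk * np.log(mu) - mu - lgamma(yk + 1)
        alpha = alpha * np.exp(log_lik)[:, None]
        alpha /= alpha.sum()
        out.append(alpha.copy())
    return np.array(out)

mu = np.array([4.0, 0.5, 0.5, 0.5, 2.0, 0.5, 0.5, 0.5])  # strong beats at positions 0 and 4
y = [3, 0, 0, 0, 1, 0, 0, 0]
post = bar_pointer_filter(y, mu, n_tempo=3)
```

Each `post[k]` is a normalised table over (bar position, tempo level), the same kind of object shown in the p(x_k) panels.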

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3 and r_0 ... r_3, with intensities λ_1 ... λ_3 and observations y_1 ... y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets; all against Frame Index k]

Smoothing

[Figure: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets; all against Frame Index k]
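
Filtering conditions each state on the data up to frame k, smoothing on all K frames. For a generic discrete HMM both can be computed with the standard forward-backward recursions; the sketch below is a textbook version with hypothetical names, not the implementation behind the figures.

```python
import numpy as np

def forward_backward(pi, A, lik):
    """Return filtered p(x_k | y_{1:k}) and smoothed p(x_k | y_{1:K}).

    pi: (S,) initial distribution; A[i, j] = p(x_{k+1} = j | x_k = i);
    lik[k, i] = p(y_k | x_k = i).
    """
    K, S = lik.shape
    alpha = np.empty((K, S))
    pred = pi
    for k in range(K):                    # forward pass (filtering)
        alpha[k] = pred * lik[k]
        alpha[k] /= alpha[k].sum()
        pred = alpha[k] @ A
    beta = np.ones(S)
    gamma = np.empty((K, S))
    for k in range(K - 1, -1, -1):        # backward pass (smoothing)
        gamma[k] = alpha[k] * beta
        gamma[k] /= gamma[k].sum()
        beta = A @ (lik[k] * beta)
        beta /= beta.sum()                # rescale to avoid underflow
    return alpha, gamma

pi = np.array([0.5, 0.5])
A = np.eye(2)                             # states never change in this toy case
lik = np.array([[0.5, 0.5], [0.9, 0.1]])  # only the second frame is informative
alpha, gamma = forward_backward(pi, A, lik)
```

In this toy case the filtered estimate at the first frame is still uniform, while the smoothed one already reflects the informative second frame.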

Time Signature

[Figure: Observed Data, sample value against time (s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4; the last three against Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score and audio signal x_t against time t]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1 ... t_K, r_1 ... r_K, λ_1 ... λ_K, v_{ν,1} ... v_{ν,K} and s_{ν,1} ... s_{ν,K}, with a plate over ν = 1, ..., W; score positions r_k numbered 1 to 8]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure: Spectrogram Data, Frequency (Hz) against Time (s); MIDI Data, MIDI note against Score position]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: log p(r_τ | s_τ), MIDI note number against Time (s); log λ, shown as Σ_i w_τ^{(i)} λ_τ^{(i)}, against Time (s); MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) against Time (s)]

Summary

• “Time Domain”: Switching State Space Models

- State space modeling

- Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

- Analysis down to sample precision (if required)

- Computationally quite heavy

• “Transform Domain”: Gamma Fields

- Models on (orthogonal) transform coefficients: energy compaction

- Practical: can make use of fast transforms (FFT, MDCT, ...)

- Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

- Time-Frequency Energy distributions

• Ongoing Work

- Comparison of inference methods (VB, MCMC, SMC)

- Learning

- Applications

  * Chord detection, Polyphonic transcription

  * Musical Score guided source separation

- Prior structures for other observation models (NMF)


Gamma Chains: typical draws

[Figure: three draws of log v_k against k = 1, ..., 1000, labelled a = 10 az = 10, a = 10 az = 4 and a = 10 az = 40]

Gamma Chains with changepoints: typical draws

[Figure: three draws of log v_k against k = 1, ..., 1000, labelled a = 10 az = 10, a = 10 az = 4 and a = 10 az = 40]

Gamma Chains

• The joint can be written as a product of singleton and pairwise potentials of the form

$\psi_{k,k} = \exp(-a\, z_k^{-1} v_k^{-1})$ (Pairwise)

[Figure: chain ⋯ v_{k-1}, z_k, v_k, z_{k+1} ⋯ with coupling parameters a_z, a, a_z on successive edges]

$\phi_{z_k} = \exp((a_z + a + 1) \log z_k^{-1})$ (Singletons)
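
Potentials of this form arise from a chain of inverse-gamma conditionals. A minimal sketch for drawing such a chain, assuming a parameterisation inferred from the potentials rather than stated on the slide: v_k | z_k ∼ IG(a, z_k / a) and z_{k+1} | v_k ∼ IG(a_z, v_k / a_z), where IG(x; a, b) means that 1/x is gamma distributed with shape a and scale b. The function name and the starting value `z1` are hypothetical.

```python
import numpy as np

def draw_gamma_chain(K, a, a_z, z1=1.0, seed=0):
    """Draw v_1, ..., v_K from an inverse-gamma Markov chain (illustrative only)."""
    rng = np.random.default_rng(seed)
    v = np.empty(K)
    z = z1
    for k in range(K):
        # v_k | z_k ~ IG(a, z_k / a): 1 / v_k is gamma with shape a, scale z_k / a
        v[k] = 1.0 / rng.gamma(shape=a, scale=z / a)
        # z_{k+1} | v_k ~ IG(a_z, v_k / a_z)
        z = 1.0 / rng.gamma(shape=a_z, scale=v[k] / a_z)
    return v

log_v = np.log(draw_gamma_chain(K=1000, a=10.0, a_z=10.0))
```

Larger values of a and a_z make each conditional more concentrated, so successive v_k are tied more strongly and the draw of log v_k is smoother.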

Gamma Fields

• The joint can be written as a product of singleton and pairwise potentials

$\psi_{ij} = \exp(-a_{ij}\, \xi_i^{-1} \xi_j^{-1})$ (Pairwise)

$\phi_i = \exp((\sum_j a_{ij} + 1) \log \xi_i^{-1})$ (Singletons)

Possible Model Topologies


Approximate Inference

• Stochastic

- Markov Chain Monte Carlo, Gibbs sampler

- Sequential Monte Carlo, Particle Filtering

• Deterministic

- Variational Bayes

In all these, conjugacy helps

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ with observation factors p(y_1 | v_1), p(y_2 | v_2)]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k,k} + \log\psi_{k,k+1} \rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})})$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$

VB or Gibbs

[Factor graph: ψ_{0,1}, z_1, ψ_{1,1}, v_1, ψ_{1,2}, z_2, ψ_{2,2}, v_2, ⋯ with observation factors p(y_1 | v_1), p(y_2 | v_2)]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)})$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
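
Because the chain is conjugate, both Gibbs conditionals are inverse gamma and can be sampled directly. A minimal sketch of one sweep, assuming v_k | z_k ∼ IG(a, z_k / a), z_k | v_{k-1} ∼ IG(a_z, v_{k-1} / a_z) and a Gaussian observation y_k ∼ N(0, v_k), where IG(x; a, b) means 1/x is gamma with shape a and scale b. The fixed boundary value `v0` and the function name are hypothetical; the observation model in the talk may differ.

```python
import numpy as np

def gibbs_sweep(y, v, z, a, a_z, v0=1.0, rng=None):
    """One systematic-scan Gibbs sweep; v and z are updated in place."""
    rng = np.random.default_rng() if rng is None else rng
    K = len(y)
    for k in range(K):
        # z_k | v_{k-1}, v_k: inverse gamma, shape a_z + a, rate a_z/v_{k-1} + a/v_k
        v_prev = v0 if k == 0 else v[k - 1]
        rate_z = a_z / v_prev + a / v[k]
        z[k] = rate_z / rng.gamma(shape=a_z + a)
        # v_k | z_k, z_{k+1}, y_k: inverse gamma; the last site has no z_{k+1}
        shape_v = a + 0.5
        rate_v = a / z[k] + 0.5 * y[k] ** 2
        if k + 1 < K:
            shape_v += a_z
            rate_v += a_z / z[k + 1]
        v[k] = rate_v / rng.gamma(shape=shape_v)
    return v, z

rng = np.random.default_rng(0)
y = rng.normal(size=200)
v, z = np.ones(200), np.ones(200)
for _ in range(100):
    gibbs_sweep(y, v, z, a=10.0, a_z=10.0, rng=rng)
```

An inverse-gamma variable with shape α and rate β is drawn here as β divided by a unit-scale gamma draw with shape α. The VB update has the same form, with the neighbouring samples replaced by expectations of 1/z and 1/v under the current factors.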

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: panels Noisy (X), Original (Xorg), Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]


Page 45: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Gamma Chains with changepoints typical draws

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 10

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 4

100 200 300 400 500 600 700 800 900 1000

minus20

0

20

log

v k

a = 10 az = 40

k

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 44

Gamma Chains

bull The joint can be written as product of singleton and pairwise

potentials of form

ψkk = exp(minusazminus1k vminus1

k ) (Pairwise)

z1 middot middot middot vkminus1 zk vk zk+1 middot middot middotaz a az

φzk = exp((az + a+ 1) log zminus1

k ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 45

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(vk) larr exp(φk + 〈logψkk + logψkk+1〉q(τ)(zk)q(τ)(zk+1))

bull Gibbs

v(τ)k sim p(vk|zkminus1 zk yk) prop p(yk|vk)ψkk(z

(τ)k )ψkk+1(z

(τ)k+1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 49

VB or Gibbs

[Figure: factor graph of the chain $\psi_{0,1} - z_1 - \psi_{1,1} - v_1 - \psi_{1,2} - z_2 - \psi_{2,2} - v_2 \cdots$, with observation factors $p(y_1|v_1)$, $p(y_2|v_2)$ attached to $v_1$, $v_2$]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\big(\phi_k + \langle \log\psi_{k-1,k} + \log\psi_{k,k} \rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\big)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$
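As a worked step, the full conditional of $z_k$ can be written out explicitly. The derivation below assumes that the factor linking $v_{k-1}$ and $z_k$ has the same form as the pairwise potential given for gamma chains, with coupling $a_z$, and it uses a shape/scale convention $\mathcal{IG}(z; \alpha, \beta) \propto z^{-(\alpha+1)} e^{-\beta/z}$ that is an assumption here rather than something stated on the slides.

```latex
\begin{align*}
p(z_k \mid v_{k-1}, v_k)
 &\propto \phi^z_k \,
    \exp\!\big(-a_z\, z_k^{-1} v_{k-1}^{-1}\big)\,
    \exp\!\big(-a\, z_k^{-1} v_k^{-1}\big) \\
 &= z_k^{-(a_z + a + 1)}
    \exp\!\Big(-\big(a_z v_{k-1}^{-1} + a\, v_k^{-1}\big)\, z_k^{-1}\Big) \\
 &\propto \mathcal{IG}\big(z_k;\ \alpha = a_z + a,\ \beta = a_z / v_{k-1} + a / v_k\big)
\end{align*}
```

The conditional stays in the inverse-gamma family, which is the sense in which conjugacy helps both the Gibbs and the VB updates.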

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six spectrogram panels (log scale from -14 to 0): Noisy (X), Original (Xorg), and the reconstructions Xh (SNR1998), Xv (SNR2079), Xb (SNR1968), Xg (SNR1997)]

Denoising - Music

[Figure: five spectrogram panels (log scale from -18 to 0): Original, Noisy, PF (SNR853), Gibbs (SNR866), VB (SNR208)]

"Tristram (Matt Uelmen)" + ~0 dB white noise
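The slides report an SNR for each reconstruction but do not state how it was computed. A common definition, given here only as an assumed sketch, is the ratio of the energy of the clean signal to the energy of the reconstruction error, in decibels. The function name `snr_db` is illustrative.

```python
import math

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a clean reference and an estimate."""
    signal = sum(x * x for x in reference)
    # Energy of the reconstruction error (must be non-zero).
    noise = sum((x - e) ** 2 for x, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / noise)
```

For example, an estimate whose error energy is one hundredth of the signal energy scores 20 dB.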

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity

• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: time (τ) against frequency bin (ν) for Xorg, Shor and Sver]

Single Channel Source Separation with IGMCs

E-guitar "Matte Kudasai (King Crimson)" + Drums "Territory (Sepultura)" = Mix

                      s1                        s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB            -474    -328     567     -158    1546    -137
Gibbs          -45    -262     457      105    1246     161
GibbsEM       -423    -242     482      134    1313     185
Pre-trained   -404    -315     813      356    1144     464
Oracle         614    1716     658     1266    1995     136

• Oracle: We use the square of the source coefficient as the latent variance estimate

• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources

Single Channel Source Separation with IGMCs

"Vandringar I Vilsenhet (Anglagard)" + "Moby Dick (Led Zeppelin)" = Mix

                      s1                        s2
               SDR     SIR     SAR      SDR     SIR     SAR
VB             -78    -622     453     -235     184    -225
Gibbs         -846    -753     693     -404    1459    -383
GibbsEM       -774    -619     462     -114    1662    -097
Pre-trained    -64    -539     695       38    1639     414
Oracle         121     329    1214     2113    3389    2137

Harmonic-Transient Decomposition

[Figure: time (τ) against frequency bin (ν) for Xorg, Shor and Sver; two examples, each shown as (Original), (Hor), (Vert)]

Embedded audio: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν (Hz), from 0 to 4000 Hz, vertical range -12 to 0]

Chord Detection

[Figure: (top) MDCT of piano chord 41, 48, 51, 56: time τ (s) against frequency (Hz, up to 4000); (bottom) $\log \sum_\nu v_{\nu,j,\tau}$ for MIDI note j (40 to 65) against time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$, n = 1 ... N

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$, k = 1 ... K, m = 1 ... M

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
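The hierarchy above can be read as a recipe for ancestral sampling. The sketch below is an illustration under several assumptions: $\mathcal{G}(a, b)$ is read as shape $a$ and scale $b$; $\mathcal{IG}(a, b)$ is read as "the reciprocal is gamma with shape $a$ and scale $b$"; and, because the slide elides the priors on $a_m$ and $r_m$, the mixing vectors are drawn from a standard normal and the noise variances are fixed at 0.01 purely as placeholders. All names and default values are hypothetical.

```python
import math
import random

def sample_hierarchical_mixture(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Ancestral sampling from the hierarchical prior (illustrative values)."""
    rng = random.Random(seed)
    # lambda_n ~ G(a_lam, b_lam), read here as shape a_lam and scale b_lam (assumed).
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    # Placeholders: the slide does not specify the priors on a_m and r_m.
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    r = [0.01] * M
    v, s, x = [], [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu*lam_n)): draw 1/v_kn from a gamma with that shape and scale.
        v_k = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        # s_kn ~ N(0, v_kn); random.gauss expects a standard deviation.
        s_k = [rng.gauss(0.0, math.sqrt(v_k[n])) for n in range(N)]
        # x_km ~ N(a_m^T s_k, r_m)
        x_k = [rng.gauss(sum(A[m][n] * s_k[n] for n in range(N)), math.sqrt(r[m]))
               for m in range(M)]
        v.append(v_k)
        s.append(s_k)
        x.append(x_k)
    return lam, v, s, x
```

Calling `sample_hierarchical_mixture(K=5, N=3, M=2)` returns three source scales, and five frames each of variances, source coefficients, and two-channel observations, which is an underdetermined mixture (M < N).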

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall "volume" of source n

Source Separation

[Figure: four spectrogram panels, t (sec) against f (Hz, 0 to 10000): (Speech), (Piano), (Guitar), (Mix)]

Embedded audio: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: three reconstructed spectrogram panels, t (sec) against f (Hz, 0 to 10000): (Speech), (Piano), (Guitar)]

Embedded audio: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  - Determine the position of a performance on a score
  - Determine where a human listener would clap her hand
  - Create a quantized/human readable score
  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1 to 8 on the horizontal axis and tempo level n_k = 1 to 3 on the vertical axis; the bar lengths for 3/4 time and 4/4 time are marked]

• Each dot denotes a state x = (m, n) (Score Position - Tempo level)

• Directed arcs denote state transitions with positive probability
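Because the state x = (m, n) is discrete, tracking reduces to standard HMM filtering. The sketch below is a simplified illustration, not the model of the cited paper: it assumes the position advances by the tempo level modulo the bar length, that the tempo level changes by at most one step with a small probability `p_change`, and that observations are Poisson counts with a state-dependent intensity. Positions are 0-indexed, and all function names and parameter values are hypothetical.

```python
import math

def bar_pointer_transition(M, N, p_change=0.1):
    """Map each state (m, n) to a dict of successor states and probabilities."""
    T = {}
    for m in range(M):
        for n in range(1, N + 1):
            m_next = (m + n) % M  # position advances by the tempo level
            moves = {}
            for dn, p in ((0, 1.0 - p_change), (-1, p_change / 2), (1, p_change / 2)):
                n_next = min(max(n + dn, 1), N)  # clamp at the tempo limits
                key = (m_next, n_next)
                moves[key] = moves.get(key, 0.0) + p
            T[(m, n)] = moves
    return T

def poisson_pmf(y, mu):
    return math.exp(-mu) * mu ** y / math.factorial(y)

def forward_filter(ys, T, intensity, prior):
    """Return the filtering distributions p(x_k | y_1..y_k) as a list of dicts."""
    alpha = dict(prior)
    out = []
    for k, y in enumerate(ys):
        if k > 0:
            # Prediction step: push the previous posterior through the transitions.
            pred = {}
            for x, p in alpha.items():
                for x2, t in T[x].items():
                    pred[x2] = pred.get(x2, 0.0) + p * t
            alpha = pred
        # Update step: weight by the Poisson likelihood and normalise.
        alpha = {x: p * poisson_pmf(y, intensity(x)) for x, p in alpha.items()}
        Z = sum(alpha.values())
        alpha = {x: p / Z for x, p in alpha.items()}
        out.append(alpha)
    return out
```

A typical call builds `T = bar_pointer_transition(8, 3)`, starts from a uniform prior over the 24 states, and uses an intensity function that is high at the bar position where onsets are expected.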

Bar position Pointer - transition model

[Figure: three panels of the transition distribution p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over bar position and tempo level]

Bar position Pointer - k = 2

[Figure: p(x_2) over bar position and tempo level]

Bar position Pointer - k = 3

[Figure: p(x_3) over bar position and tempo level]

Bar position Pointer - k = 4

[Figure: p(x_4) over bar position and tempo level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}) over bar position and tempo level]

Bar position Pointer - k = 10

[Figure: p(x_10) over bar position and tempo level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains n_0 ... n_3, θ_0 ... θ_3, m_0 ... m_3, r_0 ... r_3, intensities λ_1 ... λ_3 and observations y_1 ... y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (up to 450): observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against frame index k (up to 450): observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]
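The difference between the filtered quantities $p(\cdot \mid y_{1:k})$ and the smoothed quantities $p(\cdot \mid y_{1:K})$ can be made explicit with the standard HMM recursions. The notation below, with $x_k$ standing for the full discrete state, is a generic textbook form and is not taken from the slides.

```latex
\begin{align*}
\text{Prediction:}\quad
p(x_k \mid y_{1:k-1}) &= \sum_{x_{k-1}} p(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1}) \\
\text{Filtering:}\quad
p(x_k \mid y_{1:k}) &\propto p(y_k \mid x_k)\, p(x_k \mid y_{1:k-1}) \\
\text{Smoothing:}\quad
p(x_k \mid y_{1:K}) &= p(x_k \mid y_{1:k})
  \sum_{x_{k+1}} \frac{p(x_{k+1} \mid x_k)\, p(x_{k+1} \mid y_{1:K})}{p(x_{k+1} \mid y_{1:k})}
\end{align*}
```

Marginals such as $p(m_k \mid y_{1:K})$ follow by summing the joint state posterior over the remaining components of $x_k$.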

Time Signature

[Figure: observed data (sample value against time, 0 to 12 s); then, against frame index k (up to 500): log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: a musical score above the audio signal x_t plotted against time t]

Score-Performance matching - Graphical Model

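[Figure: graphical model with a plate over ν = 1 ... W and chains t_1, t_2, ..., t_K; r_1, r_2, ..., r_K; λ_1, λ_2, ..., λ_K; v_{ν,1}, v_{ν,2}, ..., v_{ν,K}; s_{ν,1}, s_{ν,2}, ..., s_{ν,K}; shown next to a score whose positions 1 to 8 are indexed by r_k]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau)))$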

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν (Hz), from 0 to 4000 Hz, vertical range -12 to 0]

Score-Performance matching

[Figure: (top) Spectrogram Data: time (s, 0 to 14) against frequency (Hz, 0 to 4000); (bottom) MIDI Data: score position (up to 450) against MIDI note (55 to 85)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against time (s): log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, computed as $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$; MDCT of audio (source: Daniel-Ben Pienaar), frequency up to 4000 Hz]

Summary

• "Time Domain" - Switching State Space Models
  - State space modeling
  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  - Analysis down to sample precision (if required)
  - Computationally quite heavy

• "Transform Domain" - Gamma Fields
  - Models on (orthogonal) transform coefficients; energy compaction
  - Practical: can make use of fast transforms (FFT, MDCT, ...)
  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  - Time-Frequency Energy distributions

• Ongoing Work
  - Comparison of inference methods (VB, MCMC, SMC)
  - Learning
  - Applications
    * Chord detection, Polyphonic transcription
    * Musical Score guided source separation
  - Prior structures for other observation models, NMF


Page 47: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Gamma Fields

bull The joint can be written as product of singleton and pairwise

potentials

ψij = exp(minusaijξminus1i ξminus1

j ) (Pairwise)

φi = exp((sum

j

aij + 1) log ξminus1i ) (Singletons)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 46

Possible Model Topologies

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 47

Approximate Inference

• Stochastic

– Markov chain Monte Carlo, Gibbs sampler

– Sequential Monte Carlo, particle filtering

• Deterministic

– Variational Bayes

In all of these, conjugacy helps.
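To illustrate why conjugacy helps, here is a small sketch of the standard inverse gamma / Gaussian pair: if a variance v has an inverse gamma prior and the data are zero-mean Gaussian with variance v, the posterior is again inverse gamma with updated parameters. The function name and the shape/scale parameterisation are assumptions for illustration.

```python
import numpy as np

def inverse_gamma_posterior(shape, scale, y):
    """Posterior (shape, scale) of v ~ IG(shape, scale) after observing y_i ~ N(0, v).

    Each observation adds 1/2 to the shape and y_i^2 / 2 to the scale.
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    return shape + 0.5 * y.size, scale + 0.5 * np.sum(y ** 2)
```

The same closed-form update is what a Gibbs sampler draws from and what a variational update computes in expectation.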

VB or Gibbs

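[Figure: factor graph of the chain $\psi_{0,1}$, $z_1$, $\psi_{1,1}$, $v_1$, $\psi_{1,2}$, $z_2$, $\psi_{2,2}$, $v_2$, ···, with observation factors $p(y_1 \mid v_1)$ and $p(y_2 \mid v_2)$ attached to $v_1$ and $v_2$.]

• VB

$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\,q^{(\tau)}(z_{k+1})}\right)$

• Gibbs

$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\,\psi_{k,k}\big(z_k^{(\tau)}\big)\,\psi_{k,k+1}\big(z_{k+1}^{(\tau)}\big)$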
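A minimal mean-field sketch of the VB update, under a concrete chain that is assumed here for illustration (it is consistent with the pairwise and singleton potentials, but the talk does not spell it out): z_k | v_{k-1} ~ IG(shape a_z, scale a_z / v_{k-1}), v_k | z_k ~ IG(shape a, scale a / z_k), y_k ~ N(0, v_k), with v_0 fixed. Under these assumptions every factor q(v_k), q(z_k) is inverse gamma, and only the expectations ⟨1/v⟩ and ⟨1/z⟩ are needed. The function name and default values are hypothetical.

```python
import numpy as np

def vb_igmc(y, a=4.0, a_z=4.0, v0=1.0, n_iter=300):
    """Mean-field VB for the assumed inverse gamma Markov chain.

    Returns the posterior means E[v_k] under q(v_k).
    """
    y = np.asarray(y, dtype=float)
    K = y.size
    inv_v = np.ones(K)                     # <1/v_k> under q(v_k)
    shape_z = a_z + a
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                  # the last v_k has no successor z_{k+1}
    for _ in range(n_iter):
        # q(z_k): z_k couples v_{k-1} and v_k
        inv_v_prev = np.concatenate(([1.0 / v0], inv_v[:-1]))
        scale_z = a_z * inv_v_prev + a * inv_v
        inv_z = shape_z / scale_z          # <1/z_k> for an inverse gamma factor
        # q(v_k): v_k couples z_k, z_{k+1} and the observation y_k
        inv_z_next = np.concatenate((inv_z[1:], [0.0]))
        scale_v = a * inv_z + a_z * inv_z_next + 0.5 * y ** 2
        inv_v = shape_v / scale_v
    return scale_v / (shape_v - 1.0)
```

Larger coupling values `a`, `a_z` tie neighbouring variances more tightly and also slow the convergence of the fixed-point iteration.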

VB or Gibbs

[Figure: the same chain as on the previous slide.]

• VB

$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\,q^{(\tau)}(v_k)}\right)$

• Gibbs

$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}\big(v_{k-1}^{(\tau)}\big)\,\psi_{k,k}\big(v_k^{(\tau)}\big)$
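The Gibbs sweep can be sketched in the same way. The chain below is an assumption made for illustration (z_k | v_{k-1} ~ IG(shape a_z, scale a_z / v_{k-1}), v_k | z_k ~ IG(shape a, scale a / z_k), y_k ~ N(0, v_k), v_0 fixed); the function name and defaults are hypothetical. Given all v, the z_k are conditionally independent, and vice versa, so each half-sweep can be drawn in one vectorised call.

```python
import numpy as np

def gibbs_igmc(y, a=4.0, a_z=4.0, v0=1.0, n_sweeps=600, burn_in=300, seed=0):
    """Gibbs sampler for the assumed chain; returns a Monte Carlo estimate of E[v_k | y]."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    K = y.size
    v = np.ones(K)
    shape_v = np.full(K, a + a_z + 0.5)
    shape_v[-1] = a + 0.5                  # the last v_k has no successor z_{k+1}
    v_sum = np.zeros(K)
    for sweep in range(n_sweeps):
        # z_k | v_{k-1}, v_k ~ IG(a_z + a, a_z / v_{k-1} + a / v_k)
        v_prev = np.concatenate(([v0], v[:-1]))
        scale_z = a_z / v_prev + a / v
        z = scale_z / rng.gamma(a_z + a, size=K)   # scale / Gamma(shape, 1) is IG(shape, scale)
        # v_k | z_k, z_{k+1}, y_k ~ IG(shape_v, a / z_k + a_z / z_{k+1} + y_k^2 / 2)
        inv_z_next = np.concatenate((1.0 / z[1:], [0.0]))
        scale_v = a / z + a_z * inv_z_next + 0.5 * y ** 2
        v = scale_v / rng.gamma(shape_v)
        if sweep >= burn_in:
            v_sum += v
    return v_sum / (n_sweeps - burn_in)
```

The number of sweeps and the burn-in are placeholders; in practice they would be chosen by monitoring the sampled variances.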

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance

• Inference: Variational Bayes

[Figure: six image panels. Top row: "Noisy" (X) and "Original" (Xorg). Remaining panels are labelled "Xh SNR1998", "Xv SNR2079", "Xb SNR1968" and "Xg SNR1997".]
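A rough sketch of the per-coefficient denoising step and of an SNR figure of merit. It assumes x = s + noise with s ~ N(0, v) and noise ~ N(0, r), and treats v and r as given, whereas in the talk both are inferred. The SNR definition used here (signal energy over error energy, in dB) is a common one and is an assumption about how the panel labels were computed.

```python
import numpy as np

def posterior_mean_source(x, v, r):
    """E[s | x] for x = s + n, s ~ N(0, v), n ~ N(0, r): a Wiener-style gain v / (v + r)."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    return v / (v + r) * x

def snr_db(s, s_hat):
    """Signal-to-noise ratio of an estimate s_hat against the clean signal s, in dB."""
    s = np.asarray(s, dtype=float)
    s_hat = np.asarray(s_hat, dtype=float)
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum((s - s_hat) ** 2))
```

Coefficients whose inferred variance v is small relative to r are shrunk towards zero, which is where a structured prior on v does its work.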

Denoising – Music

[Figure: five image panels, labelled "Original", "Noisy", "PF SNR853", "Gibbs SNR866" and "VB SNR208".]

“Tristram (Matt Uelmen)” + ∼0 dB white noise

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (horizontal): tied across time; harmonic continuity

• Source 2 (vertical): tied across frequency; transients, percussive sounds

[Figure: panels Xorg, Shor and Sver; axes: time (τ) vs. frequency bin (ν).]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                    s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB           -474   -328    567     -158   1546   -137
Gibbs         -45   -262    457      105   1246    161
GibbsEM      -423   -242    482      134   1313    185
Pre-trained  -404   -315    813      356   1144    464
Oracle        614   1716    658     1266   1995    136

(Values are reproduced as they appear in this transcript, which has lost the decimal points.)

• Oracle: we use the square of the source coefficient as the latent variance estimate

• Pre-trained: we use the best coupling parameters a_z and a, trained on isolated sources
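The "Oracle" row can be read as the following sketch: take the squared true coefficients as the variances and split each mixture coefficient in proportion to them. This assumes a noise-free sum x = s1 + s2 with independent zero-mean Gaussian sources, so that E[s_i | x] = v_i / (v_1 + v_2) · x; the function name and the small constant `eps` are illustrative choices.

```python
import numpy as np

def oracle_separation(x, s1, s2, eps=1e-12):
    """Split mixture coefficients x using oracle variances v_i = s_i^2."""
    x = np.asarray(x, dtype=float)
    v1 = np.asarray(s1, dtype=float) ** 2
    v2 = np.asarray(s2, dtype=float) ** 2
    total = v1 + v2 + eps          # eps avoids 0 / 0 where both sources vanish
    return x * v1 / total, x * v2 / total
```

Since the true coefficients are not available in practice, this serves as an upper reference for what the variance-based estimators can reach.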

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                    s1                     s2
              SDR    SIR    SAR      SDR    SIR    SAR
VB            -78   -622    453     -235    184   -225
Gibbs        -846   -753    693     -404   1459   -383
GibbsEM      -774   -619    462     -114   1662   -097
Pre-trained   -64   -539    695       38   1639    414
Oracle        121    329   1214     2113   3389   2137

(Values are reproduced as they appear in this transcript, which has lost the decimal points.)
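For reference, a simplified sketch of how SDR, SIR and SAR are commonly defined: the estimate is split into a target part, an interference part (lying in the span of the other true sources) and an artifact part. This version allows only a gain per source (no filtering, no noise term); whether it matches the evaluation code used for the tables is an assumption, and the function name is hypothetical.

```python
import numpy as np

def bss_metrics(est, sources, j):
    """Return (SDR, SIR, SAR) in dB for an estimate of source j.

    sources has shape (n_sources, n_samples); est has shape (n_samples,).
    """
    S = np.asarray(sources, dtype=float)
    est = np.asarray(est, dtype=float)
    target = S[j]
    # projection of the estimate onto the true target source
    s_target = (est @ target) / (target @ target) * target
    # projection onto the span of all true sources
    coeffs, *_ = np.linalg.lstsq(S.T, est, rcond=None)
    proj_all = S.T @ coeffs
    e_interf = proj_all - s_target
    e_artif = est - proj_all

    def db(num, den):
        return 10.0 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar
```

A perfect estimate makes the denominators zero, so a production version would need to guard those cases.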

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor and Sver; axes: time (τ) vs. frequency bin (ν). Two sets of audio examples, each labelled (Original), (Hor), (Vert).]

Embedded audio files: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav

Chord Detection - Signal model (with Paul Peeling)

[Figure: $\log \sigma_\nu$ plotted against frequency ν (Hz), from 0 to 4000 Hz.]

Chord Detection

[Figure, first panel: MDCT of piano chord 41, 48, 51, 56; time τ (s) vs. frequency (Hz), 0 to 4000 Hz. Second panel: $\log \sum_\nu v_{\nu,j,\tau}$; time τ (s) vs. MIDI note j, 40 to 65.]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda)$, for $n = 1 \dots N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m)$, for $k = 1 \dots K$ and $m = 1 \dots M$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
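A small ancestral-sampling sketch of this hierarchy, with the mixing vectors a_m and noise variances r_m passed in rather than sampled, since their priors are not given here. Two conventions are assumed: the gamma prior on λ_n is taken as shape/scale, and the second argument of the inverse gamma is read as an inverse scale, so that v_{k,n} has scale νλ_n/2 and λ_n acts as an overall level for source n. All names and default values are hypothetical.

```python
import numpy as np

def sample_hierarchical_prior(K, A, r, a_lam=2.0, b_lam=1.0, nu=4.0, seed=0):
    """Draw (lam, v, s, x) from the assumed hierarchy.

    A has shape (M, N) with rows a_m; r has shape (M,) with noise variances r_m.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    r = np.asarray(r, dtype=float)
    M, N = A.shape
    lam = rng.gamma(shape=a_lam, scale=b_lam, size=N)
    # v_kn ~ inverse gamma with shape nu/2 and scale nu * lam_n / 2
    v = (0.5 * nu * lam) / rng.gamma(shape=0.5 * nu, size=(K, N))
    s = rng.normal(0.0, np.sqrt(v))                       # s_kn ~ N(0, v_kn)
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))  # x_km ~ N(a_m^T s_k, r_m)
    return lam, v, s, x
```

With these conventions, integrating out v leaves heavy-tailed (Student-t) source coefficients, which is what makes the prior sparse.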

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source n

Source Separation

[Figure: four spectrogram panels, t (sec) vs. f (Hz), labelled (Speech), (Piano), (Guitar) and (Mix).]

Embedded audio files: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: three spectrogram panels of the reconstructions, t (sec) vs. f (Hz), labelled (Speech), (Piano) and (Guitar).]

Embedded audio files: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hands

– Create a quantized, human-readable score

– ...

• Online (real-time) or offline (batch)

• All of these problems can be mapped to inference problems in an HMM
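Since these tasks reduce to HMM inference, the online case can be sketched with the standard forward (filtering) recursion. The function name and argument layout are illustrative; `trans[i, j]` is assumed to hold p(x_{k+1} = j | x_k = i).

```python
import numpy as np

def hmm_filter(log_lik, trans, prior):
    """Forward recursion for a discrete HMM.

    log_lik: (K, S) array of log p(y_k | x_k = i)
    trans:   (S, S) row-stochastic transition matrix
    prior:   (S,) distribution of x_1
    Returns the filtered marginals p(x_k | y_{1:k}) and log p(y_{1:K}).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    K, S = log_lik.shape
    alpha = np.empty((K, S))
    log_evidence = 0.0
    pred = np.asarray(prior, dtype=float)
    for k in range(K):
        shift = log_lik[k].max()                 # for numerical stability
        w = pred * np.exp(log_lik[k] - shift)    # predict, then weight by likelihood
        norm = w.sum()
        alpha[k] = w / norm
        log_evidence += np.log(norm) + shift
        pred = alpha[k] @ trans                  # one-step-ahead prediction
    return alpha, log_evidence
```

Each step costs O(S²) for a dense transition matrix; a sparse transition structure brings this down.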

Bar position Pointer (Whiteley Cemgil Godsill 2006)

[Figure: lattice of states with bar position $m_k$ = 1 to 8 on the horizontal axis and tempo level $n_k$ = 1 to 3 on the vertical axis; markers above the lattice indicate 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (score position, tempo level)

• Directed arcs denote state transitions with positive probability
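A simplified sketch of a transition matrix on such a lattice. The dynamics are assumed for illustration and are not taken from the cited model: the bar position advances by the current tempo level and wraps around at the end of the bar, while the tempo level stays put or moves one step up or down. Positions are 0-based in the code; `p_change` and the clipping at the tempo boundaries are hypothetical choices.

```python
import numpy as np

def bar_pointer_transition(M=8, N=3, p_change=0.2):
    """Transition matrix over states (m, n), m in 0..M-1, n in 1..N.

    State (m, n) is stored at index m * N + (n - 1).
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M                   # pointer advances by the tempo level
            for dn, p in ((-1, p_change / 2), (0, 1.0 - p_change), (1, p_change / 2)):
                n_next = min(max(n + dn, 1), N)    # clip at the slowest/fastest level
                T[i, m_next * N + (n_next - 1)] += p
    return T
```

Each row has at most three non-zero entries, so filtering over the lattice is much cheaper than the dense O(S²) bound suggests.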

Bar position Pointer - transition model

[Figure: three panels showing the transition density $p(x_2 \mid x_1)$ over bar position (1 to 8) and tempo level (1 to 5).]

Bar position Pointer - k = 1, 2, 3, 4, 5, 10

[Figure sequence over bar position and tempo level: $p(x_1)$, $p(x_2)$, $p(x_3)$, $p(x_4)$; at k = 5, with observation $y_5 = 0$, $p(x_5 \mid y_{1:5})$; and $p(x_{10})$.]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: intensity $\mu_k$ as a function of bar position $m_k$ (0 to 1000), in two panels: triplet rhythm and duplet rhythm.]
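A minimal sketch of a Poisson observation term: given an intensity for every state, it returns the log-likelihood table that a forward recursion would consume. The function name is illustrative, and any particular intensity template (for instance, higher values at on-beat bar positions) is an assumption rather than the pattern used in the talk.

```python
import numpy as np
from math import lgamma

def poisson_log_lik(y, mu):
    """log p(y_k | state i) for counts y (length K) and intensities mu (length S).

    Uses log Poisson(y; mu) = y * log(mu) - mu - log(y!).
    """
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    log_fact = np.array([lgamma(c + 1.0) for c in y])   # log(y!)
    return y[:, None] * np.log(mu)[None, :] - mu[None, :] - log_fact[:, None]
```

For example, `mu = np.where(np.arange(8) % 4 == 0, 3.0, 0.2)` would give a hypothetical duplet-like template over eight bar positions.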

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model over time slices with nodes $n_k$, $\theta_k$, $m_k$, $r_k$ for k = 0 to 3, and $\lambda_k$, $y_k$ for k = 1 to 3.]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k. Observed data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per minute; $p(r_k \mid y_{1:k})$ for triplets vs. duplets.]

Smoothing

[Figure: four panels against frame index k. Observed data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per minute; $p(r_k \mid y_{1:K})$ for triplets vs. duplets.]

Time Signature

[Figure: four panels. Observed data: sample value against time (s); $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per minute; $p(\theta_k \mid z_{1:K})$ for 4/4 vs. 3/4; the last three against frame index k.]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal $x_t$ against time t.]

Score-Performance matching - Graphical Model

[Figure: graphical model with nodes $t_k$, $r_k$, $\lambda_k$, $v_{\nu,k}$, $s_{\nu,k}$ for k = 1 to K, with a plate over ν = 1 to W; an inset labelled $r_k$ shows the values 1 to 8.]

$v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ plotted against frequency ν (Hz), from 0 to 4000 Hz.]

Score-Performance matching

[Figure: two panels. Spectrogram data: time (s) vs. frequency (Hz), 0 to 4000 Hz. MIDI data: score position vs. MIDI note.]

Online (filtering) or offline (smoothing) processing is possible.
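The offline case can be sketched with the standard forward-backward recursion, which returns both the filtered marginals p(x_k | y_{1:k}) and the smoothed marginals p(x_k | y_{1:K}). The function name and argument layout are illustrative; `trans[i, j]` is assumed to hold p(x_{k+1} = j | x_k = i).

```python
import numpy as np

def hmm_smooth(log_lik, trans, prior):
    """Forward-backward smoothing for a discrete HMM.

    log_lik: (K, S) array of log p(y_k | x_k = i)
    Returns (filtered, smoothed), each of shape (K, S).
    """
    log_lik = np.asarray(log_lik, dtype=float)
    K, S = log_lik.shape
    # rescale each row for stability; constant factors cancel after normalisation
    lik = np.exp(log_lik - log_lik.max(axis=1, keepdims=True))
    alpha = np.empty((K, S))
    pred = np.asarray(prior, dtype=float)
    for k in range(K):                       # forward pass
        alpha[k] = pred * lik[k]
        alpha[k] /= alpha[k].sum()
        pred = alpha[k] @ trans
    beta = np.ones((K, S))
    for k in range(K - 2, -1, -1):           # backward pass
        beta[k] = trans @ (lik[k + 1] * beta[k + 1])
        beta[k] /= beta[k].sum()
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return alpha, gamma
```

At the final frame the two outputs coincide; earlier frames differ because smoothing also uses the observations that come later.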

Transcription

[Figure: three panels against time (s). $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80); $\log \lambda$, with the estimate $\sum_i w_\tau^{(i)} \lambda_\tau^{(i)}$; MDCT of audio (source: Daniel-Ben Pienaar), frequency (Hz) from 0 to 4000 Hz.]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical score guided source separation

– Prior structures for other observation models: NMF


Page 49: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Approximate Inference

bull Stochastic

ndash Markov Chain Monte Carlo Gibbs sampler

ndash Sequential Monte Carlo Particle Filtering

bull Deterministic

ndash Variational Bayes

In all these conjugacy helps

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 48

VB or Gibbs

[Figure: chain z_1, v_1, z_2, v_2, ... with pairwise potentials ψ_{0,1}, ψ_{1,1}, ψ_{1,2}, ψ_{2,2} between neighbours and observation terms p(y_1|v_1), p(y_2|v_2)]

• VB

$$q^{(\tau)}(v_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k,k} + \log\psi_{k,k+1} \right\rangle_{q^{(\tau)}(z_k)\, q^{(\tau)}(z_{k+1})}\right)$$

• Gibbs

$$v_k^{(\tau)} \sim p(v_k \mid z_k, z_{k+1}, y_k) \propto p(y_k \mid v_k)\, \psi_{k,k}(z_k^{(\tau)})\, \psi_{k,k+1}(z_{k+1}^{(\tau)})$$

VB or Gibbs

[Figure: chain z_1, v_1, z_2, v_2, ... with pairwise potentials ψ_{0,1}, ψ_{1,1}, ψ_{1,2}, ψ_{2,2} between neighbours and observation terms p(y_1|v_1), p(y_2|v_2)]

• VB

$$q^{(\tau)}(z_k) \leftarrow \exp\left(\phi_k + \left\langle \log\psi_{k-1,k} + \log\psi_{k,k} \right\rangle_{q^{(\tau)}(v_{k-1})\, q^{(\tau)}(v_k)}\right)$$

• Gibbs

$$z_k^{(\tau)} \sim p(z_k \mid v_{k-1}, v_k) \propto \psi_{k-1,k}(v_{k-1}^{(\tau)})\, \psi_{k,k}(v_k^{(\tau)})$$

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: six spectrogram panels with a shared colour bar from −14 to 0: X (Noisy), Xorg (Original), Xh SNR1998, Xv SNR2079, Xb SNR1968, Xg SNR1997]

Denoising – Music

[Figure: five spectrogram panels with a shared colour bar from −18 to 0: Original, Noisy, PF SNR853, Gibbs SNR866, VB SNR208]

“Tristram (Matt Uelmen)” + ∼ 0 dB white noise
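The denoising panels are compared by SNR. The slides do not state which definition was used, so the sketch below assumes the common one: the energy of the clean signal over the energy of the residual, in dB. The function name `snr_db` and the synthetic test signal are illustrative only.

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR in dB: 10 * log10(||clean||^2 / ||clean - estimate||^2)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    residual_energy = np.sum((clean - estimate) ** 2)
    return 10.0 * np.log10(np.sum(clean ** 2) / residual_energy)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)

# White noise rescaled to the same energy as the signal, i.e. about 0 dB input SNR.
noise = rng.standard_normal(clean.size)
noise *= np.sqrt(np.sum(clean ** 2) / np.sum(noise ** 2))
noisy = clean + noise

input_snr = snr_db(clean, noisy)
```

A denoiser's output would be scored the same way, with `estimate` set to the reconstructed signal.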

Single Channel Source Separation (with Onur Dikmen)

• Source 1: Horizontal. Tie across time: harmonic continuity
• Source 2: Vertical. Tie across frequency: transients, percussive sounds

[Figure: coupling structure over Time (τ) and Frequency Bin (ν) for Xorg, Shor and Sver]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                      s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB           -474   -328    567   -158   1546   -137
Gibbs         -45   -262    457    105   1246    161
GibbsEM      -423   -242    482    134   1313    185
Pre-trained  -404   -315    813    356   1144    464
Oracle        614   1716    658   1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources
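The Oracle row fixes each latent variance to the squared true source coefficient. How the sources are then reconstructed is not spelled out on the slide. The sketch below assumes independent zero-mean Gaussian sources, for which the posterior mean given the mix is a Wiener-style gain. The function name and the small `eps` guard against division by zero are illustrative assumptions.

```python
import numpy as np

def oracle_separate(s1, s2, eps=1e-12):
    """Reconstruct two sources from x = s1 + s2 with oracle variances v_i = s_i**2.

    Assumes s_i ~ N(0, v_i) independently, so E[s1 | x] = v1 / (v1 + v2) * x.
    """
    s1 = np.asarray(s1, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    x = s1 + s2                      # observed mixture coefficients
    v1, v2 = s1 ** 2, s2 ** 2        # oracle latent variances
    gain1 = v1 / (v1 + v2 + eps)     # Wiener-style gain for source 1
    s1_hat = gain1 * x
    return s1_hat, x - s1_hat        # the two estimates always sum to the mix

s1 = np.array([3.0, 0.0, 1.0])
s2 = np.array([0.0, 2.0, 1.0])
s1_hat, s2_hat = oracle_separate(s1, s2)
```

Where only one source is active in a coefficient, the gain is close to 0 or 1 and that coefficient is assigned almost entirely to the active source. Where both have equal magnitude, the mix is split evenly.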

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                      s1                   s2
              SDR    SIR    SAR    SDR    SIR    SAR
VB            -78   -622    453   -235    184   -225
Gibbs        -846   -753    693   -404   1459   -383
GibbsEM      -774   -619    462   -114   1662   -097
Pre-trained   -64   -539    695     38   1639    414
Oracle        121    329   1214   2113   3389   2137

Harmonic-Transient Decomposition

[Figure: two rows of spectrogram panels over Time (τ) and Frequency Bin (ν): Xorg (Original), Shor (Hor), Sver (Vert)]


Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Chord Detection

[Figure, top: MDCT of piano chord (41, 48, 51, 56), Frequency (Hz) against Time τ (s)]
[Figure, bottom: log Σ_ν v_{ν,j,τ}, MIDI note j against Time τ (s)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \qquad n = 1, \dots, N$$

$$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$$

$$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$$

$$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \qquad k = 1, \dots, K, \quad m = 1, \dots, M$$

$$a_m \sim \mathcal{N}(a_m;\, \cdots), \qquad r_m \sim \mathcal{IG}(r_m;\, \cdots)$$
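The hierarchical prior above can be read as a recipe for generating data. The following sketch draws one synthetic data set by ancestral sampling. The conventions are assumptions, not confirmed by the slide: G(x; a, b) is taken as shape a and scale b, and IG(v; a, b) as meaning 1/v ~ G(a, b). The priors on a_m and r_m are elided in the source, so a standard normal mixing matrix and a fixed noise variance stand in as placeholders.

```python
import numpy as np

def sample_mixture(rng, K, N, M, a_lam=2.0, b_lam=1.0, nu=4.0, r=0.01):
    """Ancestral sampling: lambda -> v -> s -> x.

    Assumed conventions (illustrative):
      G(x; a, b)  has shape a and scale b
      IG(v; a, b) means 1/v ~ G(a, b)
    A and r are placeholders for the elided priors on a_m and r_m.
    """
    lam = rng.gamma(a_lam, b_lam, size=N)                        # lambda_n
    inv_v = rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))   # 1 / v_{k,n}
    v = 1.0 / inv_v
    s = rng.standard_normal((K, N)) * np.sqrt(v)                 # s_{k,n} ~ N(0, v_{k,n})
    A = rng.standard_normal((M, N))                              # row m is a_m
    x = s @ A.T + rng.standard_normal((K, M)) * np.sqrt(r)       # x_{k,m}
    return lam, v, s, A, x

rng = np.random.default_rng(1)
lam, v, s, A, x = sample_mixture(rng, K=20_000, N=3, M=2)
```

Under these conventions the mean of 1/v_{k,n} is 1/λ_n, so λ_n sets the overall scale of source n.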

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: four spectrogram panels, f/Hz against t/sec: (Speech), (Piano), (Guitar), (Mix)]


Reconstructions

[Figure: three spectrogram panels, f/Hz against t/sec: (Speech), (Piano), (Guitar)]


Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three traces against Epoch (0 to 2000): a, λ and r]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hands
  – Create a quantized, human-readable score
  – ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
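Once a problem is cast as an HMM, online (real-time) inference amounts to the standard forward recursion. The sketch below is a generic log-domain forward filter for a discrete HMM. It is not the talk's implementation, and the argument names and array layout are assumptions for illustration.

```python
import numpy as np

def forward_filter(log_prior, log_trans, log_lik):
    """Filtering distributions log p(x_k | y_{1:k}) for a discrete HMM.

    log_prior: (S,)    log p(x_1)
    log_trans: (S, S)  log p(x_{k+1} = j | x_k = i), stored at [i, j]
    log_lik:   (K, S)  log p(y_k | x_k) for each frame k and state
    """
    K, S = log_lik.shape
    log_alpha = np.empty((K, S))
    pred = log_prior
    for k in range(K):
        joint = pred + log_lik[k]                              # update with y_k
        log_alpha[k] = joint - np.logaddexp.reduce(joint)      # normalize
        # Predict: log sum_i p(x_k = i | y_{1:k}) p(x_{k+1} = j | x_k = i)
        pred = np.logaddexp.reduce(log_alpha[k][:, None] + log_trans, axis=0)
    return log_alpha

# Tiny two-state example.
log_prior = np.log(np.array([0.5, 0.5]))
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_lik = np.log(np.array([[0.8, 0.2], [0.8, 0.2]]))
filtered = np.exp(forward_filter(log_prior, log_trans, log_lik))
```

The offline (batch) case adds a backward pass over the same quantities to obtain smoothed distributions.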

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with tempo level n_k = 1, 2, 3 on the vertical axis and bar position m_k = 1, ..., 8 on the horizontal axis; markers indicate the bar lengths for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability
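A state grid of this kind translates directly into an HMM transition matrix. The sketch below builds one under assumed dynamics that are illustrative rather than taken from the slide: the bar position advances by the current tempo level and wraps at the end of the bar, and the tempo level either stays put or moves to a neighbouring level. The function name, the flattening of (m, n) into one index, and `p_change` are all hypothetical.

```python
import numpy as np

def bar_pointer_transitions(M=8, N=3, p_change=0.2):
    """Transition matrix over states x = (m, n), flattened as m * N + (n - 1).

    Assumed dynamics (illustrative only):
      m' = (m + n) mod M   position advances by the tempo level, wrapping at the bar end
      n' = n with probability 1 - p_change, otherwise a neighbouring tempo level
    Positions m run over 0..M-1 and tempo levels n over 1..N.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            src = m * N + (n - 1)
            m_next = (m + n) % M
            neighbours = [c for c in (n - 1, n + 1) if 1 <= c <= N]
            stay = 1.0 - p_change if neighbours else 1.0
            T[src, m_next * N + (n - 1)] += stay
            for c in neighbours:
                T[src, m_next * N + (c - 1)] += p_change / len(neighbours)
    return T

T = bar_pointer_transitions()
```

Each row has at most three non-zero entries, which keeps filtering cheap even when the position grid is fine.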

Bar position Pointer - transition model

[Figure: three panels showing p(x_2 | x_1), Tempo Level (1 to 5) against Bar Position (1 to 8)]

Bar position Pointer - k = 1

[Figure: p(x_1), Tempo Level against Bar Position]

Bar position Pointer - k = 2

[Figure: p(x_2), Tempo Level against Bar Position]

Bar position Pointer - k = 3

[Figure: p(x_3), Tempo Level against Bar Position]

Bar position Pointer - k = 4

[Figure: p(x_4), Tempo Level against Bar Position]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}), Tempo Level against Bar Position]

Bar position Pointer - k = 10

[Figure: p(x_10), Tempo Level against Bar Position]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: two panels of intensity μ_k against bar position m_k (0 to 1000): Triplet Rhythm, Duplet Rhythm]
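With a Poisson observation model, each state contributes a log-likelihood term that depends only on its intensity. The sketch below evaluates this term for every state at once. The intensity template `mu` is a made-up example with high values on every second bar position. It is not the triplet or duplet pattern from the figure.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """log p(y | x) for a count y under Poisson intensities mu (one entry per state)."""
    mu = np.asarray(mu, dtype=float)
    return y * np.log(mu) - mu - lgamma(y + 1)

# Hypothetical intensity template over 8 bar positions.
mu = np.where(np.arange(8) % 2 == 0, 2.0, 0.1)

log_lik_onset = poisson_loglik(1, mu)     # one onset observed in the frame
log_lik_silence = poisson_loglik(0, mu)   # no onset observed in the frame
```

An observed onset favours high-intensity positions, while a silent frame favours low-intensity ones, so both kinds of frame carry information about the bar position.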

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3 and observations y_1, y_2, y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k: Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against Frame Index k: Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]

Time Signature

[Figure: Observed Data, sample value against time (s); then three panels against Frame Index k: log p(m_k | z_{1:K}); log p(n_k | z_{1:K}), in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt and audio signal x_t against time t]

Score-Performance matching - Graphical Model

[Figure: graphical model with t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; v_{ν,1}, ..., v_{ν,K}; s_{ν,1}, ..., s_{ν,K}, in a plate over ν = 1, ..., W; state diagram for r_k with states 1 to 8]

$$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau)))$$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν (Hz), 0 to 4000 Hz]

Score-Performance matching

[Figure, top: Spectrogram Data, Frequency (Hz) against Time (s)]
[Figure, bottom: MIDI Data, MIDI note against Score position]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, top: log p(r_τ | s_τ), MIDI note number against Time (s)]
[Figure, middle: Σ_i w_τ^{(i)} λ_τ^{(i)}, log λ against Time (s)]
[Figure, bottom: MDCT of audio (source: Daniel-Ben Pienaar), Frequency (Hz) against Time (s)]



VB or Gibbs

ψ01

z1

ψ11

v1

ψ12

z2

ψ22

v2 middot middot middot

p(y1|v1) p(y2|v2)

bull VB

q(τ)(zk) larr exp(φk + 〈logψkkminus1 + logψkk〉q(τ)(vk)q(τ)(vk+1))

bull Gibbs

z(τ)k sim p(zk|vkminus1 vk) prop ψkkminus1(v

(τ)kminus1)ψkk(v

(τ)k )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 50

Denoising - Speech (VB)

• Additive Gaussian noise with unknown variance
• Inference: Variational Bayes

[Figure: spectrogram panels titled Noisy (X), Original (Xorg), Xh SNR1998, Xv SNR2079, Xb SNR1968 and Xg SNR1997]

Denoising – Music

[Figure: spectrogram panels titled Original, Noisy, PF SNR853, Gibbs SNR866 and VB SNR208]

“Tristram (Matt Uelmen)” + ∼ 0dB white noise
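
The panel titles report SNR values, and the music example adds white noise at roughly 0 dB. A minimal sketch of how such figures are commonly computed, assuming the usual energy-ratio definition in decibels (the slides do not state which definition was used). Both function names are illustrative.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a clean reference and an estimate."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / np.sum(error ** 2))

def add_white_noise(signal, target_snr_db, seed=0):
    """Add white Gaussian noise scaled so that the mixture has the target SNR."""
    rng = np.random.default_rng(seed)
    signal = np.asarray(signal, dtype=float)
    noise = rng.standard_normal(signal.shape)
    # Choose the gain so that sum(signal^2) / sum((gain * noise)^2) = 10^(SNR/10).
    gain = np.sqrt(np.sum(signal ** 2) / (np.sum(noise ** 2) * 10.0 ** (target_snr_db / 10.0)))
    return signal + gain * noise
```

At 0 dB the added noise carries the same energy as the signal, which is why the noisy panel looks almost uniformly filled.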

Single Channel Source Separation (with Onur Dikmen)

• Source 1 (horizontal): tie across time, harmonic continuity
• Source 2 (vertical): tie across frequency, transients and percussive sounds

[Figure: panels Xorg, Shor, Sver; axes Time (τ) vs Frequency Bin (ν)]

Single Channel Source Separation with IGMCs

E-guitar “Matte Kudasai (King Crimson)” + Drums “Territory (Sepultura)” = Mix

                      s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB            -474   -328    567     -158   1546   -137
Gibbs          -45   -262    457      105   1246    161
GibbsEM       -423   -242    482      134   1313    185
Pre-trained   -404   -315    813      356   1144    464
Oracle         614   1716    658     1266   1995    136

• Oracle: We use the square of the source coefficient as the latent variance estimate
• Pre-trained: We use the best coupling parameters a_z and a, trained on isolated sources
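
One way to read the Oracle row: if the mixture coefficient is x = s1 + s2 with independent zero-mean Gaussian sources of variances v1 and v2, the posterior mean of s1 is x · v1 / (v1 + v2). The sketch below plugs in the squared true coefficients as the variances, as the Oracle bullet describes. The function name and the small `eps` guard are assumptions added for illustration.

```python
import numpy as np

def oracle_separate(x, s1, s2, eps=1e-12):
    """Split mixture coefficients x using squared true coefficients as variances."""
    x = np.asarray(x, dtype=float)
    v1 = np.asarray(s1, dtype=float) ** 2   # oracle variance estimate for source 1
    v2 = np.asarray(s2, dtype=float) ** 2   # oracle variance estimate for source 2
    gain = v1 / (v1 + v2 + eps)             # Wiener-style gain in [0, 1]
    s1_hat = gain * x
    s2_hat = x - s1_hat                     # the two estimates sum back to the mixture
    return s1_hat, s2_hat
```

This gives an upper reference point for the model, since the variances come from the true sources rather than from inference.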

Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

                      s1                     s2
               SDR    SIR    SAR      SDR    SIR    SAR
VB             -78   -622    453     -235    184   -225
Gibbs         -846   -753    693     -404   1459   -383
GibbsEM       -774   -619    462     -114   1662   -097
Pre-trained    -64   -539    695       38   1639    414
Oracle         121    329   1214     2113   3389   2137

Harmonic-Transient Decomposition

[Figure: panels Xorg, Shor, Sver; axes Time (τ) vs Frequency Bin (ν). Two sets of audio examples labelled (Original), (Hor), (Vert)]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against Frequency ν/Hz, 0 to 4000 Hz]

Chord Detection

[Figure: two panels. MDCT of piano chord 41, 48, 51, 56 (Time τ/s vs Frequency/Hz); log Σ_ν v_{ν,j,τ} (Time τ/s vs MIDI note j)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

  $\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \quad n = 1, \dots, N$

  $v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu\lambda_n))$

  $s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

  $x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M$

  $a_m \sim \mathcal{N}(a_m;\, \cdots)$

  $r_m \sim \mathcal{IG}(r_m;\, \cdots)$
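
The hierarchy above can be read as a recipe for generating a synthetic mixture. The sketch below assumes that G(·; a, b) is a gamma density with shape a and scale b, and that IG(v; a, b) means 1/v is gamma with shape a and scale b. These conventions are not stated on the slide. The slide also elides the priors on a_m and r_m, so the standard-normal mixing rows and the fixed noise variance `r` are placeholders, and all names are illustrative.

```python
import numpy as np

def sample_mixture(K=1000, N=3, M=2, nu=4.0, a_lam=2.0, b_lam=1.0, r=0.01, seed=0):
    """Draw one synthetic data set from the assumed hierarchical prior."""
    rng = np.random.default_rng(seed)

    # lambda_n ~ G(a_lam, b_lam): overall scale of source n
    lam = rng.gamma(a_lam, b_lam, size=N)

    # v_kn ~ IG(nu/2, 2/(nu*lambda_n)): draw 1/v_kn as a gamma variable
    inv_v = rng.gamma(nu / 2.0, 2.0 / (nu * lam), size=(K, N))
    v = 1.0 / inv_v

    # s_kn ~ N(0, v_kn): source coefficients, heavy-tailed once v is integrated out
    s = rng.normal(0.0, np.sqrt(v))

    # Mixing rows a_m (placeholder prior) and observations x_km ~ N(a_m^T s_k, r)
    A = rng.normal(size=(M, N))
    x = s @ A.T + rng.normal(0.0, np.sqrt(r), size=(K, M))
    return lam, v, s, A, x
```

With M < N, as in the default arguments, the mixing is underdetermined, so the sources cannot be recovered by inverting A alone.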

Equivalent Gamma MRF

• A tree for each source
• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: spectrograms of the sources (Speech), (Piano), (Guitar) and of the (Mix); axes t/sec vs f/Hz]

Reconstructions

[Figure: spectrograms of the reconstructed (Speech), (Piano), (Guitar); axes t/sec vs f/Hz]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against Epoch, 0 to 2000]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized/human-readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Diagram: grid of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; markers for 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability
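
A small transition matrix over the states x = (m, n) can make the arcs concrete. The rule used below is an assumption for illustration, not taken from the slide: the bar position advances by the current tempo level and wraps around at the end of the bar, while the tempo level either stays put or moves to a neighbouring level with a small probability `p_change`.

```python
import numpy as np

def bar_pointer_transitions(M=8, N=3, p_change=0.1):
    """Transition matrix over states (m, n), flattened as index m * N + (n - 1).

    m = 0..M-1 is the bar position, n = 1..N the tempo level.
    """
    T = np.zeros((M * N, M * N))
    for m in range(M):
        for n in range(1, N + 1):
            i = m * N + (n - 1)
            m_next = (m + n) % M                  # advance by the tempo, wrap at the bar end
            for n_next in (n - 1, n, n + 1):      # tempo stays or moves one level
                if 1 <= n_next <= N:
                    p = 1.0 - p_change if n_next == n else p_change / 2.0
                    T[i, m_next * N + (n_next - 1)] += p
            T[i] /= T[i].sum()                    # renormalise at the tempo boundaries
    return T
```

Each row has at most three non-zero entries, which is what keeps exact HMM inference cheap even on a fine position grid.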

Bar position Pointer - transition model

[Figure: three panels of p(x_2 | x_1); axes Bar Position (1 to 8) vs Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1); axes Bar Position vs Tempo Level]

Bar position Pointer - k = 2

[Figure: p(x_2); axes Bar Position vs Tempo Level]

Bar position Pointer - k = 3

[Figure: p(x_3); axes Bar Position vs Tempo Level]

Bar position Pointer - k = 4

[Figure: p(x_4); axes Bar Position vs Tempo Level]

Bar position Pointer - k = 5

[Figure: y_5 = 0, p(x_5 | y_{1:5}); axes Bar Position vs Tempo Level]

Bar position Pointer - k = 10

[Figure: p(x_10); axes Bar Position vs Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: intensity μ_k against m_k (0 to 1000) for a Triplet Rhythm and a Duplet Rhythm]

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: nodes n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; λ_1, ..., λ_3; y_1, ..., y_3]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k. Observed Data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}) in quarter notes per min; p(r_k | y_{1:k}) for Triplets and Duplets]

Smoothing

[Figure: four panels against Frame Index k. Observed Data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}) in quarter notes per min; p(r_k | y_{1:K}) for Triplets and Duplets]
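
The difference between the filtered marginals p(x_k | y_{1:k}) and the smoothed marginals p(x_k | y_{1:K}) can be sketched for a generic discrete HMM with Poisson observations. This is a standard forward pass followed by a backward correction, not the exact implementation behind the plots. The state intensities `mu`, the transition matrix `T` and the function names are illustrative assumptions.

```python
import numpy as np
from math import lgamma

def poisson_loglik(y, mu):
    """log Poisson(y; mu) for one count y and a vector of state intensities mu."""
    return y * np.log(mu) - mu - lgamma(y + 1)

def filter_and_smooth(y, T, mu, prior):
    """Return filtered p(x_k | y_1:k) and smoothed p(x_k | y_1:K) marginals."""
    T = np.asarray(T, dtype=float)
    mu = np.asarray(mu, dtype=float)
    K, S = len(y), len(mu)

    # Forward pass: predict with T, then reweight by the Poisson likelihood.
    alpha = np.zeros((K, S))
    pred = np.asarray(prior, dtype=float)
    for k in range(K):
        a = pred * np.exp(poisson_loglik(y[k], mu))
        alpha[k] = a / a.sum()
        pred = alpha[k] @ T

    # Backward pass: correct each filtered marginal using the next smoothed one.
    gamma = np.zeros((K, S))
    gamma[-1] = alpha[-1]
    for k in range(K - 2, -1, -1):
        pred = alpha[k] @ T
        ratio = np.divide(gamma[k + 1], pred, out=np.zeros(S), where=pred > 0)
        gamma[k] = alpha[k] * (T @ ratio)
    return alpha, gamma
```

Filtering suits the online setting because it only needs past data. Smoothing revisits every frame once the whole sequence is available, which usually sharpens the marginals near changes.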

Time Signature

[Figure: four panels. Observed Data (sample value against time/s); log p(m_k | z_{1:K}); log p(n_k | z_{1:K}) in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4; horizontal axis Frame Index k]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: score excerpt and signal x_t against t]

Score-Performance matching - Graphical Model

[Graphical model: nodes t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; v_{ν,1}, ..., v_{ν,K}; s_{ν,1}, ..., s_{ν,K}; plate over ν = 1, ..., W]

  $v_{\nu,\tau} \sim \mathcal{IG}\big(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau))\big)$

Score-Performance matching - Signal model

[Figure: log σ_ν against Frequency ν/Hz, 0 to 4000 Hz]

Score-Performance matching

[Figure: two panels. Spectrogram Data (Time/s vs Frequency/Hz); MIDI Data (Score position vs MIDI note)]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time/s. log p(r_τ | s_τ) over MIDI note number; log λ for Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio over Frequency/Hz (source: Daniel-Ben Pienaar)]

Summary

• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients: energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-frequency energy distributions
• Ongoing work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, polyphonic transcription
    ∗ Musical score guided source separation
  – Prior structures for other observation models, NMF

Page 53: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Denoising ndash MusicOriginal

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Noisy

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

PF SNR853

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

Gibbs SNR866

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

VB SNR208

50 100 150 200 250

50

100

150

200

250

300

350

400

450

500

minus18

minus16

minus14

minus12

minus10

minus8

minus6

minus4

minus2

0

ldquoTristram (Matt Uelmen)rdquo + sim 0dB white noise

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 52

Single Channel Source Separation (with Onur Dikmen)

bull Source 1 Horizontal Tie across time harmonic continuity

bull Source 2 Vertical Tie across frequency transients percussive sounds

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 53

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005; Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda,\, b_\lambda), \quad n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0,\, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
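The hierarchical prior on this slide can be read as a recipe for generating data. The following is a minimal sketch of that reading, not the authors' code. It assumes that G(·; a, b) is a shape/scale Gamma and that IG(v; a, b) means 1/v is Gamma with shape a and scale b, so that E[1/v_{k,n}] = 1/λ_n. The priors on a_m and r_m are elided on the slide; the standard normal mixing coefficients and the fixed noise variance below are placeholders, as are the function name `sample_mixture` and all default values.

```python
import random

def sample_mixture(K, N, M, a_lam=2.0, b_lam=1.0, nu=4.0, noise_var=0.01, seed=0):
    """Draw K frames of N sources and M channels from the hierarchical prior.

    Assumed conventions (not stated on the slide): Gamma is shape/scale, and
    IG(v; a, b) means 1/v ~ Gamma(shape=a, scale=b).
    """
    rng = random.Random(seed)
    # lambda_n ~ G(a_lam, b_lam): overall scale ("volume") of source n
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    # Placeholder prior for the mixing vectors a_m (elided on the slide)
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    # Placeholder: fixed observation noise variance r_m instead of an IG draw
    r = [noise_var] * M
    S, X = [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu*lambda_n)), drawn as the reciprocal of a Gamma
        v = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        # s_kn ~ N(0, v_kn)
        s = [rng.gauss(0.0, v[n] ** 0.5) for n in range(N)]
        # x_km ~ N(a_m^T s_k, r_m)
        x = [rng.gauss(sum(A[m][n] * s[n] for n in range(N)), r[m] ** 0.5)
             for m in range(M)]
        S.append(s)
        X.append(x)
    return lam, A, S, X
```

With M smaller than N (for example `sample_mixture(K=100, N=3, M=2)`) the sketch produces an underdetermined mixture of the kind discussed on the following slides.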

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure: four spectrograms, frequency f in Hz (0 to 10000) against time t in sec; panels (Speech), (Piano), (Guitar), (Mix).]

Reconstructions

[Figure: three spectrograms of the reconstructed sources, frequency f in Hz (0 to 10000) against time t in sec; panels (Speech), (Piano), (Guitar).]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: traces of a, λ and r against epoch (0 to 2000).]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  – Determine the position of a performance on a score

  – Determine where a human listener would clap her hand

  – Create a quantized/human-readable score

  – ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
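For the online (real-time) case, inference in an HMM reduces to the standard forward (filtering) recursion. The sketch below is a generic textbook version for a small discrete state space, not code from the talk; the function name `forward_filter` and the list-based representation of the prior, transition matrix and likelihoods are illustrative choices.

```python
def forward_filter(prior, trans, lik):
    """Discrete HMM filtering.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i) for the observed y_k
    Returns a list whose k-th entry is p(x_k | y_{1:k}).
    """
    n_states = len(prior)
    filtered = []
    pred = list(prior)  # p(x_k | y_{1:k-1}); equals the prior for the first frame
    for frame_lik in lik:
        # Update: weight the prediction by the likelihood, then normalize
        unnorm = [pred[i] * frame_lik[i] for i in range(n_states)]
        z = sum(unnorm)
        post = [u / z for u in unnorm]
        filtered.append(post)
        # Predict: propagate the posterior through the transition model
        pred = [sum(post[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)]
    return filtered
```

In a tempo-tracking setting the state index would enumerate (score position, tempo) pairs, and each row of `lik` would come from the onset or spectral features of one frame.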

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis; bar lengths marked for 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability
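The slide specifies only the state space and that arcs mark transitions with positive probability. As a rough illustration, the sketch below assumes one commonly used form of bar-pointer dynamics: the position advances by the current tempo level and wraps at the end of the bar, and the tempo level stays put or moves to a neighbouring level with a small probability. The wrap-around rule, the clamping at the tempo boundaries, the parameter `p_change` and the function name are all assumptions made for this example.

```python
def bar_pointer_transition(m, n, n_positions=8, n_tempi=3, p_change=0.2):
    """Return {(m_next, n_next): probability} for the state (m, n).

    Positions are 1..n_positions and tempo levels 1..n_tempi, as on the slide.
    The dynamics themselves are an assumed, simplified form.
    """
    # Advance the bar pointer by n positions, wrapping around at the bar end
    m_next = (m - 1 + n) % n_positions + 1
    probs = {}
    moves = ((0, 1.0 - p_change), (-1, p_change / 2.0), (1, p_change / 2.0))
    for step, p in moves:
        # Clamp at the lowest and highest tempo level; the probability of an
        # impossible move is folded into staying at the boundary level
        n_next = min(max(n + step, 1), n_tempi)
        key = (m_next, n_next)
        probs[key] = probs.get(key, 0.0) + p
    return probs
```

For example, `bar_pointer_transition(7, 2)` moves the pointer across the bar line to position 1 and spreads the probability over tempo levels 1, 2 and 3.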

Bar position Pointer - transition model

[Figure: three panels showing the transition density p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5).]

Bar position Pointer - k = 1

[Figure: p(x_1) over bar position and tempo level.]

Bar position Pointer - k = 2

[Figure: p(x_2) over bar position and tempo level.]

Bar position Pointer - k = 3

[Figure: p(x_3) over bar position and tempo level.]

Bar position Pointer - k = 4

[Figure: p(x_4) over bar position and tempo level.]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) over bar position and tempo level, with observation y_5 = 0.]

Bar position Pointer - k = 10

[Figure: p(x_10) over bar position and tempo level.]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure: Poisson intensity μ_k against bar position m_k (0 to 1000) for two rhythmic patterns; panels Triplet Rhythm and Duplet Rhythm.]
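To make the Poisson observation model concrete, the sketch below evaluates log p(y_k | x_k) for an onset count y_k given an intensity μ that depends on the bar position. The Poisson log-probability is standard; the shape of the intensity function (a small baseline plus Gaussian bumps at the expected onset positions) and the peak positions used for the duplet and triplet patterns are invented for illustration and are not read off the slide.

```python
import math

def poisson_logpmf(y, mu):
    """log p(y | mu) for a Poisson count y >= 0 with intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def intensity(m, peaks, n_positions=1000, base=0.1, height=4.0, width=20.0):
    """Hypothetical intensity mu(m): baseline plus a bump at each peak position."""
    mu = base
    for p in peaks:
        d = abs(m - p)
        d = min(d, n_positions - d)  # distance measured around the bar
        mu += height * math.exp(-0.5 * (d / width) ** 2)
    return mu

# Illustrative peak positions within a bar of 1000 positions (assumed)
DUPLET_PEAKS = [0, 250, 500, 750]
TRIPLET_PEAKS = [0, 333, 667]
```

With these placeholder patterns, a frame with several onsets at bar position 250 is far more probable under the duplet intensity than under the triplet one, which is the mechanism that lets the rhythm indicator be inferred from the data.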

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains n_0, ..., n_3; θ_0, ..., θ_3; m_0, ..., m_3; r_0, ..., r_3; intensities λ_1, λ_2, λ_3; observations y_1, y_2, y_3.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:k}) over bar position m_k; log p(n_k | y_{1:k}) over tempo in quarter notes per min (60 to 180); p(r_k | y_{1:k}) for Triplets and Duplets.]

Smoothing

[Figure: four panels against frame index k (0 to 450): observed data y_k; log p(m_k | y_{1:K}) over bar position m_k; log p(n_k | y_{1:K}) over tempo in quarter notes per min (60 to 180); p(r_k | y_{1:K}) for Triplets and Duplets.]
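Filtering conditions each frame on the data seen so far, p(x_k | y_{1:k}), while smoothing conditions on the whole recording, p(x_k | y_{1:K}). For a discrete HMM both can be computed with the standard forward-backward recursions. The sketch below is a generic textbook version, not the implementation behind the plots; the function name and the list-based inputs are illustrative.

```python
def forward_backward(prior, trans, lik):
    """Return (filtered, smoothed) marginals of a discrete HMM.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i)
    filtered[k] = p(x_k | y_{1:k}), smoothed[k] = p(x_k | y_{1:K})
    """
    n_states, n_frames = len(prior), len(lik)
    # Forward pass: alternate update and predict steps
    filtered = []
    pred = list(prior)
    for frame_lik in lik:
        unnorm = [pred[i] * frame_lik[i] for i in range(n_states)]
        z = sum(unnorm)
        post = [u / z for u in unnorm]
        filtered.append(post)
        pred = [sum(post[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)]
    # Backward pass: beta[i] is proportional to p(y_{k+1:K} | x_k = i)
    beta = [1.0] * n_states
    smoothed = [None] * n_frames
    for k in range(n_frames - 1, -1, -1):
        gamma = [filtered[k][i] * beta[i] for i in range(n_states)]
        z = sum(gamma)
        smoothed[k] = [g / z for g in gamma]
        # Move the backward message one frame earlier (rescaled for stability)
        new_beta = [sum(trans[i][j] * lik[k][j] * beta[j] for j in range(n_states))
                    for i in range(n_states)]
        zb = sum(new_beta)
        beta = [b / zb for b in new_beta]
    return filtered, smoothed
```

At the last frame the two estimates coincide; at earlier frames the smoothed marginals can be much sharper because later evidence is taken into account.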

Time Signature

[Figure: observed data (sample value against time in s, 0 to 12), followed by three panels against frame index k (100 to 500): log p(m_k | z_{1:K}) over bar position m_k; log p(n_k | z_{1:K}) over tempo in quarter notes per min; p(θ_k | z_{1:K}) for 4/4 and 3/4.]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: musical score and audio signal x_t against time t.]

Score-Performance matching - Graphical Model

[Graphical model: chains t_1, ..., t_K; r_1, ..., r_K; λ_1, ..., λ_K; and, inside a plate over ν = 1, ..., W, v_{ν,1}, ..., v_{ν,K} and s_{ν,1}, ..., s_{ν,K}. Score positions r_k are numbered 1 to 8.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a\,\lambda\,\sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν (from −12 to 0) against frequency ν in Hz (0 to 4000).]

Score-Performance matching

[Figure, top panel: Spectrogram Data; frequency in Hz (0 to 4000) against time in s (0 to 14).]

[Figure, bottom panel: MIDI Data; MIDI note (55 to 85) against score position (50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure, panel 1: log p(r_τ | s_τ); MIDI note number (60 to 80) against time in s (1 to 4).]

[Figure, panel 2: Σ_i w_τ^{(i)} λ_τ^{(i)}; log λ (from −10 to 10) against time in s.]

[Figure, panel 3: MDCT of audio (source: Daniel-Ben Pienaar); frequency in Hz (0 to 4000) against time in s.]

Summary

• “Time Domain” – Switching State Space Models

  – State space modeling

  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  – Analysis down to sample precision (if required)

  – Computationally quite heavy

• “Transform Domain” – Gamma Fields

  – Models on (orthogonal) transform coefficients; energy compaction

  – Practical: can make use of fast transforms (FFT, MDCT, ...)

  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  – Time-Frequency Energy distributions

• Ongoing Work

  – Comparison of inference methods (VB, MCMC, SMC)

  – Learning

  – Applications

    ∗ Chord detection, Polyphonic transcription

    ∗ Musical Score guided source separation

  – Prior structures for other observation models, NMF


Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

log p(rτ |sτ )

MID

Inot

enum

ber

Time s1 2 3 4

60

65

70

75

80

1 2 3 4minus10

minus5

0

5

10

sum

i w(i)τ λ

(i)τ

Time s

logλ

MDCT of audio (source Daniel-Ben Pienaar)

Time s

Fre

quen

cy

Hz

1 2 3 40

500

1000

1500

2000

2500

3000

3500

4000

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 83

Summary

bull ldquoTime Domainrdquo ndash Switching State Space Models

ndash State space modeling

ndash Conditional Linear Dynamical Systems Gaussian processes (eg

AR ARMA)

ndash Analysis down to sample precision (if required)

ndash Computationally quite heavy

bull ldquoTransform Domainrdquo ndash Gamma Fields

ndash Models on (orthogonal) transform coefficients Energy compaction

ndash Practical can make use of fast transforms (FFT MDCT )

ndash Inherent limitations (analysis windows frequency resolution)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 84

Summary

bull Gamma chains and fields a flexible stochastic volatility prior for

ndash Time-Frequency Energy distributions

bull Ongoing Work

ndash Comparison of inference methods (VB MCMC SMC)

ndash Learning

ndash Applications

lowast Chord detection Polyphonic transcription

lowast Musical Score guided source separation

ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 55: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Single Channel Source Separation with IGMCs

E-guitar ldquoMatte Kudasai (King Crimson)rdquo + Drums ldquoTerritory (Sepultura)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -474 -328 567 -158 1546 -137

Gibbs -45 -262 457 105 1246 161

GibbsEM -423 -242 482 134 1313 185

Preminus trained -404 -315 813 356 1144 464

Oracle 614 1716 658 1266 1995 136

bull Oracle We use the square of the source coefficient as the latent variance estimate

bull Pre-trained We use the best coupling parameters az and a trained on isolatedsources

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 54

Single Channel Source Separation with IGMCs

ldquoVandringar I Vilsenhet (Anglagard)rdquo + ldquoMoby Dick (Led Zeppelin)rdquo = Mix

s1 s2

SDR SIR SAR SDR SIR SAR

VB -78 -622 453 -235 184 -225

Gibbs -846 -753 693 -404 1459 -383

GibbsEM -774 -619 462 -114 1662 -097

Preminus trained -64 -539 695 38 1639 414

Oracle 121 329 1214 2113 3389 2137

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 55

Harmonic-Transient Decomposition

Time (τ)

Fre

quen

cy B

in (ν

)

Xorg

Shor

Sver

(Original) (Hor) (Vert)

(Original) (Hor) (Vert)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 56

s2wav
Media File (audiowav)
ss_s2_est1wav
Media File (audiowav)
ss_s2_est2wav
Media File (audiowav)
originalwav
Media File (audiowav)
sig1wav
Media File (audiowav)
sig2wav
Media File (audiowav)

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal x_t against time t]

Score-Performance matching - Graphical Model

[Graphical model: plate over frequency bins ν = 1 ... W; chains t_1, t_2, ..., t_K; r_1, r_2, ..., r_K; λ_1, λ_2, ..., λ_K; v_{ν,1}, v_{ν,2}, ..., v_{ν,K}; s_{ν,1}, s_{ν,2}, ..., s_{ν,K}. Inset: score positions 1 to 8 indexed by r_k]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: log σ_ν against frequency ν/Hz (0 to 4000)]

Score-Performance matching

[Figure, two panels: spectrogram data (frequency/Hz, 0 to 4000, against time/s, 0 to 14); MIDI data (MIDI note, 55 to 85, against score position, 50 to 450)]

Online (filtering) or offline (smoothing) processing is possible.

Transcription

[Figure, three panels against time/s (1 to 4): log p(r_τ | s_τ) over MIDI note number (60 to 80); log λ, with Σ_i w_τ^{(i)} λ_τ^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar), frequency/Hz (0 to 4000)]

Summary

• “Time Domain”: Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain”: Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)
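The energy compaction property mentioned for transform-domain models can be checked numerically. The sketch below is an illustration only: it uses a directly computed orthonormal (unitary) DFT instead of an MDCT, and a hypothetical test signal whose frequency falls exactly on a DFT bin.

```python
import cmath
import math

def dft_ortho(x):
    """Orthonormal DFT computed directly from its definition (O(N^2))."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            / math.sqrt(N)
            for k in range(N)]

N = 64
# Hypothetical test signal: a cosine with exactly 5 periods in the frame.
x = [math.cos(2 * math.pi * 5 * n / N) for n in range(N)]
X = dft_ortho(x)

# An orthonormal transform preserves total energy ...
energy_time = sum(v * v for v in x)
energy_freq = sum(abs(c) ** 2 for c in X)
# ... but concentrates it in very few coefficients.
top2 = sum(sorted((abs(c) ** 2 for c in X), reverse=True)[:2])
```

Here two of the 64 coefficients carry essentially all of the energy, which is what makes sparse or heavy-tailed priors on transform coefficients attractive. A sinusoid that does not fall on a bin would leak into neighbouring coefficients, one of the window and resolution limitations listed above.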

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Single Channel Source Separation with IGMCs

“Vandringar I Vilsenhet (Anglagard)” + “Moby Dick (Led Zeppelin)” = Mix

              s1                     s2
Method        SDR    SIR    SAR      SDR    SIR    SAR
VB            -78    -622   453      -235   184    -225
Gibbs         -846   -753   693      -404   1459   -383
GibbsEM       -774   -619   462      -114   1662   -097
Pre-trained   -64    -539   695      38     1639   414
Oracle        121    329    1214     2113   3389   2137
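SDR, SIR and SAR are energy ratios in dB between a target signal and different error components of its estimate. The sketch below shows only the simplest variant, assuming the entire difference between estimate and reference counts as distortion; it does not perform the decomposition into interference and artifact terms that separates SIR from SAR. The function name `sdr_db` and the test signals are hypothetical.

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB, treating all error as distortion."""
    signal = sum(r * r for r in reference)
    error = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    return 10.0 * math.log10(signal / error)

# Hypothetical reference source and an estimate that is 10% too quiet.
reference = [math.sin(0.1 * t) for t in range(200)]
estimate = [0.9 * r for r in reference]
value = sdr_db(reference, estimate)
```

An error with 1% of the signal energy corresponds to 20 dB; negative values mean that the error carries more energy than the target.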

Harmonic-Transient Decomposition

[Figure: time-frequency panels X_org (Original), S_hor (Hor), S_ver (Vert); frequency bin ν against time τ]

[Embedded audio: s2.wav, ss_s2_est1.wav, ss_s2_est2.wav, original.wav, sig1.wav, sig2.wav]

Chord Detection - Signal model (with Paul Peeling)

[Figure: log σ_ν against frequency ν/Hz (0 to 4000)]

Chord Detection

[Figure, two panels against time τ/s (0.5 to 2.5): MDCT of piano chord 41, 48, 51, 56 (frequency 0 to 4000 Hz); log Σ_ν v_{ν,j,τ} for MIDI note j (40 to 65)]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\, a_\lambda, b_\lambda), \quad n = 1, \dots, N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\, \nu/2,\, 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\, 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\, a_m^\top s_{k,1:N},\, r_m), \quad k = 1, \dots, K, \quad m = 1, \dots, M$

$a_m \sim \mathcal{N}(a_m;\, \cdots)$

$r_m \sim \mathcal{IG}(r_m;\, \cdots)$
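The hierarchical prior on this slide defines a generative process that can be simulated by ancestral sampling, top to bottom. The sketch below is a minimal illustration under stated assumptions: G(a, b) is taken to use shape a and scale b, IG(a, b) is taken to mean that the reciprocal is G(a, b) distributed, and the mixing weights and noise variances use hypothetical values because the slide elides their hyperparameters. The function name `sample_mixture` is likewise hypothetical.

```python
import random

def sample_mixture(K, N, M, nu=4.0, a_lam=2.0, b_lam=1.0, seed=0):
    """Ancestral sampling from the hierarchical source model.

    Assumed conventions (not stated on the slide): G(a, b) uses shape a and
    scale b, and IG(a, b) means the reciprocal is G(a, b) distributed.
    """
    rng = random.Random(seed)
    # Overall "volume" of each source: lambda_n ~ G(a_lam, b_lam).
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    # Hypothetical mixing weights and noise variances; the hyperparameters
    # of their priors are elided on the slide.
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    r = [0.01] * M
    v, s, x = [], [], []
    for _ in range(K):
        # Per-coefficient variances: v_kn ~ IG(nu/2, 2/(nu * lambda_n)).
        vk = [1.0 / rng.gammavariate(nu / 2.0, 2.0 / (nu * lam[n]))
              for n in range(N)]
        # Source coefficients: s_kn ~ N(0, v_kn); gauss takes a std deviation.
        sk = [rng.gauss(0.0, vk[n] ** 0.5) for n in range(N)]
        # Observations: x_km ~ N(a_m^T s_k, r_m).
        xk = [rng.gauss(sum(A[m][n] * sk[n] for n in range(N)), r[m] ** 0.5)
              for m in range(M)]
        v.append(vk)
        s.append(sk)
        x.append(xk)
    return lam, v, s, x

# Three sources observed through two channels (an underdetermined mixture).
lam, v, s, x = sample_mixture(K=100, N=3, M=2)
```

Marginalising v_{k,n} under these conventions gives heavy-tailed (Student-t) source coefficients, which is what makes the prior suitable for sparse transform-domain sources.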

Equivalent Gamma MRF

• A tree for each source

• λ_n can be interpreted as the overall “volume” of source n

Source Separation

[Figure, four spectrogram panels (frequency f/Hz, 0 to 10000, against time t/sec): Speech, Piano, Guitar, Mix]

[Embedded audio: s1.wav, s2.wav, s3.wav, x1.wav]

Reconstructions

[Figure, three spectrogram panels (frequency f/Hz, 0 to 10000, against time t/sec): reconstructed Speech, Piano, Guitar]

[Embedded audio: var_se1.wav, var_se2.wav, var_se3.wav]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: sampler traces of a, λ and r over 2000 epochs]
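Of the techniques named on this slide, tempering is the easiest to illustrate: raising a multimodal density to a power β < 1 flattens it, so a sampler can cross the low-probability valley between modes. The sketch below evaluates a tempered density on a grid; the bimodal target, the grid and the value of β are hypothetical and stand in for a posterior over a mixing coefficient a.

```python
import math

def bimodal(a):
    """Hypothetical unnormalised posterior with modes at a = -1 and a = +1."""
    return (math.exp(-0.5 * ((a - 1.0) / 0.2) ** 2)
            + math.exp(-0.5 * ((a + 1.0) / 0.2) ** 2))

def tempered_grid(beta, grid):
    """Normalised p(a)^beta on a grid; beta = 1 recovers the target."""
    w = [bimodal(a) ** beta for a in grid]
    z = sum(w)
    return [v / z for v in w]

grid = [-2.0 + 0.01 * i for i in range(401)]
target = tempered_grid(1.0, grid)
flat = tempered_grid(0.05, grid)
valley = 200  # grid index of a = 0, midway between the modes
```

Under the tempered density the valley at a = 0 receives orders of magnitude more mass than under the target, which is the effect that annealing and tempering schemes exploit by moving β gradually towards 1.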

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

– Determine the position of a performance on a score

– Determine where a human listener would clap her hand

– Create a quantized/human readable score

– ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1 ... 8 on the horizontal axis and tempo level n_k = 1 ... 3 on the vertical axis; bar lengths for 3/4 time and 4/4 time are marked]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)

• Directed arcs denote state transitions with positive probability
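A transition matrix over the joint states x = (m, n) can be built explicitly for a small grid like the one on this slide. The sketch below is an illustration under assumed dynamics that are not spelled out on the slide: the bar position advances by the current tempo level and wraps around at the end of the bar, and the tempo level either stays or moves to a neighbouring level with total probability `p_change` (a hypothetical parameter).

```python
def bar_pointer_transitions(M=8, N=3, p_change=0.2):
    """Build p(x_{k+1} | x_k) over states x = (m, n), m in 1..M, n in 1..N.

    Assumed dynamics (an illustration, not taken from the slide): the bar
    position advances by the current tempo level and wraps around the bar,
    while the tempo level stays put or moves to a neighbouring level.
    """
    states = [(m, n) for n in range(1, N + 1) for m in range(1, M + 1)]
    index = {x: i for i, x in enumerate(states)}
    trans = [[0.0] * len(states) for _ in states]
    for (m, n), i in index.items():
        m_next = (m + n - 1) % M + 1
        neighbours = [c for c in (n - 1, n + 1) if 1 <= c <= N]
        # With a single tempo level there is nowhere to move, so always stay.
        stay = 1.0 - p_change if neighbours else 1.0
        trans[i][index[(m_next, n)]] += stay
        for c in neighbours:
            trans[i][index[(m_next, c)]] += p_change / len(neighbours)
    return states, trans

states, trans = bar_pointer_transitions()
```

Each row has at most three nonzero entries, so the matrix is very sparse; the directed arcs in the diagram correspond to exactly these nonzero entries.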

Bar position Pointer - transition model

[Figure, three panels: transition probabilities p(x_2 | x_1) over bar position (1 to 8) and tempo level (1 to 5)]

Bar position Pointer - k = 1

[Figure: p(x_1) over bar position and tempo level]

Bar position Pointer - k = 2

[Figure: p(x_2) over bar position and tempo level]

Bar position Pointer - k = 3

[Figure: p(x_3) over bar position and tempo level]

Bar position Pointer - k = 4

[Figure: p(x_4) over bar position and tempo level]

Bar position Pointer - k = 5

[Figure: p(x_5 | y_{1:5}) with y_5 = 0, over bar position and tempo level]

Bar position Pointer - k = 10

[Figure: p(x_10) over bar position and tempo level]

Bar position Pointer - observation model (Poisson)

• Observation model p(y_k | x_k): Poisson intensity

[Figure, two panels: Poisson intensity µ_k (0 to 4) against bar position m_k (0 to 1000) for a triplet rhythm and for a duplet rhythm]
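With a Poisson observation model, each rhythmic pattern is a different intensity curve µ(m) over the bar, and an observed onset count is scored by its Poisson log-likelihood. The sketch below is a hypothetical parameterisation for illustration: Gaussian bumps at equally spaced bar positions, with `beats`, `peak`, `floor` and `width` all assumed values rather than the curves plotted on the slide.

```python
import math

def intensity(m, M, beats, peak=4.0, floor=0.1, width=0.02):
    """Hypothetical intensity mu(m): Gaussian bumps at equally spaced positions."""
    mu = floor
    for b in range(beats):
        centre = b * M / beats
        # Circular distance, since the bar wraps around.
        d = min(abs(m - centre), M - abs(m - centre))
        mu += peak * math.exp(-0.5 * (d / (width * M)) ** 2)
    return mu

def log_poisson(y, mu):
    """log p(y | mu) for a Poisson observation with intensity mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

M = 1000
# Two hypothetical patterns: 4 and 6 equally spaced onsets per bar.
duplet = [intensity(m, M, beats=4) for m in range(M)]
triplet = [intensity(m, M, beats=6) for m in range(M)]
```

An onset count observed at bar position 250 is far more likely under the first pattern than under the second, because only the first has a bump there. This likelihood difference is what drives the posterior over the rhythmic pattern indicator.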



Page 58: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Chord Detection - Signal model (with Paul Peeling)

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 57

Chord Detection

Time τ s

Fre

quen

cy

Hz

MDCT of piano chord 41485156

05 1 15 2 250

500

1000

1500

2000

2500

3000

3500

4000

Time τ s

MID

Inot

ej

logsum

ν vνjτ

05 1 15 2 2540

45

50

55

60

65

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 58

Multichannel Source Separation

bull Hierarchical Prior Model (Fevotte and Godsill 2005 Cemgil et al 2006)

λ1 λn λN sim G(λn aλ bλ)

vk1 vkn middot middot middot vkN sim IG(vkn ν2 2(νλn))

sk1 skn skN sim N (skn 0 vkn)

xk1 xkM

k = 1 K

sim N (xkma⊤msk1N rm)

a1 r1 aM

sim N (am middot middot middot )

rM

sim IG(rm middot middot middot )

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 59

Equivalent Gamma MRF

bull A tree for each source

bull λn can be interpreted as the overall ldquovolumerdquo of source n

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 60

Source Separation

tsecfH

z

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

5

10

15

20

25

(Guitar)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

(Mix)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 61

s1wav
Media File (audiowav)
s2wav
Media File (audiowav)
s3wav
Media File (audiowav)
x1wav
Media File (audiowav)

Reconstructions

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

10

15

20

25

30

(Speech)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

0

10

20

30

(Piano)

tsec

fHz

0 200 400 600 800 1000 12000

2000

4000

6000

8000

10000

5

10

15

20

25

(Guitar)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 62

var_se1wav
Media File (audiowav)
var_se2wav
Media File (audiowav)
var_se3wav
Media File (audiowav)

Multimodality

bull Typically underdetermined (Channels lt Sources) rArr Multimodal posterior

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 63

Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

[Figure: four panels against Frame Index k (0 to 450). Observed Data: $y_k$ (0 to 2). $\log p(m_k \mid y_{1:k})$ over bar position $m_k$ (ticks 200 to 800). $\log p(n_k \mid y_{1:k})$ over tempo in Quarter notes per min (ticks 60, 120, 180). $p(r_k \mid y_{1:k})$ over rhythmic pattern (Triplets, Duplets).]
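The filtering densities $p(x_k \mid y_{1:k})$ plotted here can be computed with the standard forward recursion of a discrete-state HMM. The sketch below assumes the joint state (for example bar position, tempo and rhythm pattern) has been flattened into a single index; the function name and argument layout are illustrative and not taken from the talk.

```python
def forward_filter(prior, trans, lik):
    """Filtering densities p(x_k | y_{1:k}) for a discrete-state HMM.

    prior: prior[i] = p(x_1 = i)
    trans: trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik:   lik[k][i] = p(y_k | x_k = i), one row per frame
    """
    S = len(prior)
    filtered = []
    pred = list(prior)
    for k, l in enumerate(lik):
        if k > 0:
            # predict: push the previous filtered density through the transitions
            prev = filtered[-1]
            pred = [sum(prev[i] * trans[i][j] for i in range(S)) for j in range(S)]
        # update: weight by the likelihood of y_k, then renormalise
        unnorm = [pred[i] * l[i] for i in range(S)]
        Z = sum(unnorm)
        filtered.append([u / Z for u in unnorm])
    return filtered
```

Each step uses only the observations up to frame k, which is what makes online (real-time) operation possible.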

Smoothing

[Figure: four panels against Frame Index k (0 to 450). Observed Data: $y_k$ (0 to 2). $\log p(m_k \mid y_{1:K})$ over bar position $m_k$ (ticks 200 to 800). $\log p(n_k \mid y_{1:K})$ over tempo in Quarter notes per min (ticks 60, 120, 180). $p(r_k \mid y_{1:K})$ over rhythmic pattern (Triplets, Duplets).]
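Smoothed marginals $p(x_k \mid y_{1:K})$ condition every frame on the whole sequence. A minimal forward-backward sketch for a discrete-state HMM follows, again assuming a flattened state index; the function name and argument layout are illustrative.

```python
def forward_backward(prior, trans, lik):
    """Smoothed marginals p(x_k | y_{1:K}) for a discrete-state HMM."""
    S, K = len(prior), len(lik)
    # forward pass: alpha[k][i] = p(x_k = i | y_{1:k})
    alpha = []
    for k in range(K):
        if k == 0:
            pred = prior
        else:
            pred = [sum(alpha[-1][i] * trans[i][j] for i in range(S)) for j in range(S)]
        a = [pred[i] * lik[k][i] for i in range(S)]
        Z = sum(a)
        alpha.append([v / Z for v in a])
    # backward pass: beta[k][i] proportional to p(y_{k+1:K} | x_k = i)
    beta = [[1.0] * S for _ in range(K)]
    for k in range(K - 2, -1, -1):
        b = [sum(trans[i][j] * lik[k + 1][j] * beta[k + 1][j] for j in range(S))
             for i in range(S)]
        Z = sum(b)
        beta[k] = [v / Z for v in b]
    # combine the two passes and renormalise per frame
    gamma = []
    for k in range(K):
        g = [alpha[k][i] * beta[k][i] for i in range(S)]
        Z = sum(g)
        gamma.append([v / Z for v in g])
    return gamma
```

At the last frame the smoothed and filtered marginals coincide; earlier frames are revised by later evidence, which is why this mode suits offline (batch) processing.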

Time Signature

[Figure: four panels. Observed Data: sample value (-1 to 1) against time (0 to 12 s). Then, against Frame Index k (100 to 500): $\log p(m_k \mid z_{1:K})$ over bar position $m_k$ (ticks 200 to 800); $\log p(n_k \mid z_{1:K})$ over tempo in Quarter notes per min (ticks 52, 103, 155); and $p(\theta_k \mid z_{1:K})$ over time signature (3/4, 4/4).]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: signal $x_t$ against $t$.]

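Score-Performance matching - Graphical Model

[Figure: graphical model with a plate over $\nu = 1 \dots W$ and chains $t_1, t_2, \dots, t_K$; $r_1, r_2, \dots, r_K$; $\lambda_1, \lambda_2, \dots, \lambda_K$; $v_{\nu,1}, v_{\nu,2}, \dots, v_{\nu,K}$; $s_{\nu,1}, s_{\nu,2}, \dots, s_{\nu,K}$; together with positions numbered 1 to 8 labeled $r_k$.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\ a,\ 1/(a \lambda \sigma_\nu(r_\tau)))$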

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (-12 to 0) against Frequency $\nu$ (0 to 4000 Hz).]

Score-Performance matching

[Figure: two panels. Spectrogram Data: Frequency (0 to 4000 Hz) against Time (0 to 14 s). MIDI Data: MIDI note (55 to 85) against Score position (50 to 450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time (1 to 4 s). $\log p(r_\tau \mid s_\tau)$ over MIDI note number (60 to 80). $\log \lambda$, labeled $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ (-10 to 10). MDCT of audio (source: Daniel-Ben Pienaar): Frequency (0 to 4000 Hz).]

Summary

• “Time Domain” - Switching State Space Models

- State space modeling

- Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

- Analysis down to sample precision (if required)

- Computationally quite heavy

• “Transform Domain” - Gamma Fields

- Models on (orthogonal) transform coefficients: Energy compaction

- Practical: can make use of fast transforms (FFT, MDCT)

- Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

- Time-Frequency Energy distributions

• Ongoing Work

- Comparison of inference methods (VB, MCMC, SMC)

- Learning

- Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

- Prior structures for other observation models: NMF


Chord Detection

[Figure: two panels against Time $\tau$ (0.5 to 2.5 s). MDCT of piano chord 41, 48, 51, 56: Frequency (0 to 4000 Hz). $\log \sum_\nu v_{\nu,j,\tau}$ over MIDI note $j$ (40 to 65).]

Multichannel Source Separation

• Hierarchical Prior Model (Fevotte and Godsill 2005, Cemgil et al. 2006)

$\lambda_n \sim \mathcal{G}(\lambda_n;\ a_\lambda, b_\lambda)$, for sources $n = 1 \dots N$

$v_{k,n} \sim \mathcal{IG}(v_{k,n};\ \nu/2,\ 2/(\nu \lambda_n))$

$s_{k,n} \sim \mathcal{N}(s_{k,n};\ 0, v_{k,n})$

$x_{k,m} \sim \mathcal{N}(x_{k,m};\ a_m^\top s_{k,1:N},\ r_m)$, for $k = 1 \dots K$ and channels $m = 1 \dots M$

$a_m \sim \mathcal{N}(a_m;\ \cdots)$

$r_m \sim \mathcal{IG}(r_m;\ \cdots)$
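To make the hierarchy concrete, here is a sketch that draws synthetic data from it by ancestral sampling. Several points are assumptions rather than content of the slide: $\mathcal{G}(\cdot; a, b)$ is read as shape $a$ and scale $b$, $\mathcal{IG}(v; a, b)$ is read as "$1/v$ is Gamma with shape $a$ and scale $b$", and because the priors on $a_m$ and $r_m$ are elided in the source, a standard normal and a fixed value are used as placeholders. All names and default values are hypothetical.

```python
import random

def sample_inverse_gamma(rng, shape, scale):
    """Draw v such that 1/v ~ Gamma(shape, scale) (assumed parameterisation)."""
    return 1.0 / rng.gammavariate(shape, scale)

def sample_mixture(N=3, M=2, K=5, a_lam=2.0, b_lam=1.0, nu=4.0, seed=0):
    """Ancestral sampling: N sources, M channels, K transform coefficients."""
    rng = random.Random(seed)
    # per-source scale lambda_n ~ G(a_lam, b_lam)
    lam = [rng.gammavariate(a_lam, b_lam) for _ in range(N)]
    # placeholders for the elided priors on the mixing vectors and noise variances
    A = [[rng.gauss(0.0, 1.0) for _ in range(N)] for _ in range(M)]
    r = [0.01] * M
    sources, mixtures = [], []
    for _ in range(K):
        # v_kn ~ IG(nu/2, 2/(nu * lambda_n)), then s_kn ~ N(0, v_kn)
        v = [sample_inverse_gamma(rng, nu / 2.0, 2.0 / (nu * lam[n])) for n in range(N)]
        s = [rng.gauss(0.0, v[n] ** 0.5) for n in range(N)]
        # x_km ~ N(a_m^T s_k, r_m); random.gauss takes a standard deviation
        x = [rng.gauss(sum(A[m][n] * s[n] for n in range(N)), r[m] ** 0.5)
             for m in range(M)]
        sources.append(s)
        mixtures.append(x)
    return lam, A, sources, mixtures
```

With the defaults there are fewer channels than sources (M = 2, N = 3), so recovering `sources` from `mixtures` is an underdetermined problem.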

Equivalent Gamma MRF

• A tree for each source

• $\lambda_n$ can be interpreted as the overall “volume” of source $n$

Source Separation

[Figure: four spectrogram panels, f/Hz (0 to 10000) against t/sec: (Speech), (Piano), (Guitar), (Mix). Embedded audio files: s1.wav, s2.wav, s3.wav, x1.wav.]

Reconstructions

[Figure: three spectrogram panels, f/Hz (0 to 10000) against t/sec: (Speech), (Piano), (Guitar). Embedded audio files: var_se1.wav, var_se2.wav, var_se3.wav.]

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three traces against Epoch (0 to 2000), for $a$, $\lambda$ and $r$.]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

- Determine the position of a performance on a score

- Determine where a human listener would clap her hand

- Create a quantized/human readable score

- ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: lattice of states with score position $m_k = 1, \dots, 8$ on the horizontal axis and tempo level $n_k = 1, 2, 3$ on the vertical axis; markers indicate 3/4 time and 4/4 time.]

• Each dot denotes a state $x = (m, n)$ (Score Position - Tempo level)

• Directed Arcs denote state transitions with positive probability

Bar position Pointer - transition model

[Figures: three panels showing the transition density $p(x_2 \mid x_1)$ over Bar Position (1 to 8) and Tempo Level (1 to 5).]
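A transition model over the lattice of states $(m, n)$ could be sketched as follows. The rule used here is an assumption for illustration and is not stated on the slides: the position advances by the tempo level and wraps at the end of the bar, and the tempo level moves to a neighbouring level with a small total probability `p_change`. The function name and default values are hypothetical.

```python
def bar_pointer_transitions(M=8, N=3, p_change=0.1):
    """Sparse transition probabilities over states (m, n), m in 1..M, n in 1..N.

    Returns a dict mapping each state to a dict {next_state: probability}.
    """
    trans = {}
    for m in range(1, M + 1):
        for n in range(1, N + 1):
            # position advances by the tempo level, wrapping at the bar line
            m_next = (m + n - 1) % M + 1
            # tempo may move to an adjacent level that stays inside 1..N
            neighbours = [n2 for n2 in (n - 1, n + 1) if 1 <= n2 <= N]
            stay = 1.0 - p_change if neighbours else 1.0
            row = {(m_next, n): stay}
            for n2 in neighbours:
                row[(m_next, n2)] = p_change / len(neighbours)
            trans[(m, n)] = row
    return trans
```

Each state has at most three successors, so the transition matrix is very sparse; only arcs with positive probability are stored.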


ndash Prior structures for other observation models NMF

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 85

Page 62: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Source Separation

[Figure: four spectrogram panels, frequency f/Hz against time t/sec: (Speech), (Piano), (Guitar), (Mix)]

Embedded audio: s1.wav, s2.wav, s3.wav, x1.wav

Reconstructions

[Figure: three spectrogram panels, frequency f/Hz against time t/sec: (Speech), (Piano), (Guitar)]

Embedded audio: var_se1.wav, var_se2.wav, var_se3.wav

Multimodality

• Typically underdetermined (Channels < Sources) ⇒ Multimodal posterior

Multimodality

Annealing, Bridging, Overrelaxation, Tempering

[Figure: three traces, a, λ and r, plotted against Epoch (0 to 2000)]

Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)
  – Determine the position of a performance on a score
  – Determine where a human listener would clap her hand
  – Create a quantized/human readable score
  – ...
• Online-Realtime or Offline-Batch
• All of these problems can be mapped to inference problems in an HMM
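As a rough illustration of what "inference in an HMM" means for the online case, the sketch below implements the standard forward (filtering) recursion for a discrete-state HMM. The function name `forward_filter` and the list-based layout of the prior, transition matrix and likelihoods are assumptions made for this example; the slides do not specify an implementation.

```python
def forward_filter(prior, transition, likelihoods):
    """Return filtered posteriors p(x_k | y_{1:k}) for a discrete HMM.

    prior[i]          : p(x_1 = i)
    transition[i][j]  : p(x_{k+1} = j | x_k = i)
    likelihoods[k][i] : p(y_k | x_k = i), already evaluated at the data
    """
    n = len(prior)
    filtered = []
    predicted = list(prior)
    for lik in likelihoods:
        # Update: weight the prediction by the observation likelihood.
        unnorm = [predicted[i] * lik[i] for i in range(n)]
        z = sum(unnorm)
        posterior = [u / z for u in unnorm]
        filtered.append(posterior)
        # Predict: push the posterior through the transition matrix.
        predicted = [
            sum(posterior[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return filtered
```

In a tempo tracking setting the state index would enumerate the joint (position, tempo) pairs, so the same recursion applies once the transition and observation models are tabulated.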

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position $m_k$ (1 to 8) on the horizontal axis and tempo level $n_k$ (1 to 3) on the vertical axis; markers indicate 3/4 time and 4/4 time]

• Each dot denotes a state x = (m, n) (Score Position – Tempo level)
• Directed arcs denote state transitions with positive probability
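The state grid above can be turned into an explicit transition table. The sketch below is one plausible reading, not the model from the cited paper: it assumes the position advances by the current tempo level and wraps at the end of the bar, and that the tempo level stays put or moves by one step with a small probability. The function name `bar_pointer_transitions` and the parameter `p_change` are hypothetical.

```python
def bar_pointer_transitions(num_positions=8, num_tempi=3, p_change=0.1):
    """Map each state (m, n) to a dict of successor states and probabilities.

    m in 1..num_positions is the bar position, n in 1..num_tempi the tempo level.
    Assumed dynamics (illustrative only): m advances by n with wraparound,
    n performs a lazy random walk clipped to the valid range.
    """
    transitions = {}
    for m in range(1, num_positions + 1):
        for n in range(1, num_tempi + 1):
            # Position advances by the current tempo and wraps at the bar end.
            m_next = (m - 1 + n) % num_positions + 1
            successors = {}
            for n_next in (n - 1, n + 1):
                if 1 <= n_next <= num_tempi:
                    successors[(m_next, n_next)] = p_change
            # Remaining probability mass keeps the tempo unchanged.
            successors[(m_next, n)] = 1.0 - sum(successors.values())
            transitions[(m, n)] = successors
    return transitions
```

Each entry of the returned dictionary corresponds to the outgoing arcs of one dot in the grid; only arcs with positive probability are stored.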

Bar position Pointer - transition model

[Figure: three panels showing $p(x_2 \mid x_1)$ on the grid of Bar Position (1 to 8) against Tempo Level (1 to 5)]

Bar position Pointer - k = 1

[Figure: $p(x_1)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 2

[Figure: $p(x_2)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over Bar Position and Tempo Level]

Bar position Pointer - k = 5

[Figure: $y_5 = 0$, $p(x_5 \mid y_{1:5})$ over Bar Position and Tempo Level]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over Bar Position and Tempo Level]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k \mid x_k)$: Poisson intensity

[Figure: two panels showing the intensity $\mu_k$ against bar position $m_k$ (0 to 1000): Triplet Rhythm and Duplet Rhythm]
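To make the Poisson observation model concrete, the sketch below evaluates the log-likelihood of an onset count under a position-dependent intensity. The intensity shape used here (Gaussian bumps at equally spaced subdivisions of the bar) and all parameter values are assumptions chosen for illustration; the slides only show plots of the intensities for the triplet and duplet patterns.

```python
import math


def poisson_log_likelihood(y, mu):
    """log p(y | mu) for a Poisson count y >= 0 with intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def rhythm_intensity(m, num_positions, subdivisions,
                     peak=1.0, floor=0.1, width=0.02):
    """Hypothetical intensity mu(m): bumps at equally spaced bar subdivisions.

    subdivisions=3 mimics a triplet pattern, subdivisions=2 a duplet pattern.
    """
    phase = (m % num_positions) / num_positions
    mu = floor
    for s in range(subdivisions + 1):
        # Gaussian bump centered on each subdivision (including the bar end).
        d = phase - s / subdivisions
        mu += peak * math.exp(-0.5 * (d / width) ** 2)
    return mu
```

For example, an onset observed one third of the way through the bar is better explained by the triplet intensity than by the duplet one, which is the kind of evidence the filter accumulates over time for the rhythmic pattern indicator.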

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; intensities $\lambda_1, \lambda_2, \lambda_3$; and observations $y_1, y_2, y_3$]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: filtering results against Frame Index k. Panels: Observed Data $y_k$; $\log p(m_k \mid y_{1:k})$ over bar position $m_k$; $\log p(n_k \mid y_{1:k})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:k})$ over Triplets and Duplets]

Smoothing

[Figure: smoothing results against Frame Index k. Panels: Observed Data $y_k$; $\log p(m_k \mid y_{1:K})$ over bar position $m_k$; $\log p(n_k \mid y_{1:K})$ over tempo in quarter notes per min; $p(r_k \mid y_{1:K})$ over Triplets and Duplets]

Time Signature

[Figure: Panels: Observed Data, sample value against time/s; $\log p(m_k \mid z_{1:K})$ over bar position $m_k$; $\log p(n_k \mid z_{1:K})$ over tempo in quarter notes per min; $p(\theta_k \mid z_{1:K})$ over the time signatures 4/4 and 3/4. The lower panels are plotted against Frame Index k]

Score-Performance matching (ISMIR) 2007

• Given a musical score, associate note events with the audio

[Figure: signal $x_t$ plotted against $t$]

Score-Performance matching - Graphical Model

[Figure: graphical model with nodes $t_k$, $r_k$, $\lambda_k$ for $k = 1, \dots, K$, and nodes $v_{\nu,k}$, $s_{\nu,k}$ indexed by $\nu = 1, \dots, W$; an inset shows the values 1 to 8 of $r_k$]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ against Frequency ν/Hz (0 to 4000)]

Score-Performance matching

[Figure: two panels. Spectrogram Data: Frequency/Hz against Time/s. MIDI Data: MIDI note against Score position]

Online (filtering) or Offline (smoothing) processing is possible
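The difference between the two processing modes can be sketched with a generic discrete HMM: filtering conditions each state on the data seen so far, while smoothing adds a backward pass so that every state is conditioned on the whole recording. The code below is a minimal forward-backward sketch under that generic assumption; the function name `forward_backward` and the data layout are choices made for this example, not taken from the slides.

```python
def forward_backward(prior, transition, likelihoods):
    """Return (filtered, smoothed) marginals for a discrete HMM.

    filtered[k][i] = p(x_k = i | y_{1:k})
    smoothed[k][i] = p(x_k = i | y_{1:K})
    likelihoods[k][i] = p(y_k | x_k = i), already evaluated at the data
    """
    n = len(prior)
    filtered, predicted = [], list(prior)
    for lik in likelihoods:
        unnorm = [predicted[i] * lik[i] for i in range(n)]
        z = sum(unnorm)
        post = [u / z for u in unnorm]
        filtered.append(post)
        predicted = [
            sum(post[i] * transition[i][j] for i in range(n)) for j in range(n)
        ]

    # Backward pass: beta[i] is proportional to p(y_{k+1:K} | x_k = i).
    beta = [1.0] * n
    smoothed = [None] * len(likelihoods)
    for k in range(len(likelihoods) - 1, -1, -1):
        unnorm = [filtered[k][i] * beta[i] for i in range(n)]
        z = sum(unnorm)
        smoothed[k] = [u / z for u in unnorm]
        lik = likelihoods[k]
        new_beta = [
            sum(transition[i][j] * lik[j] * beta[j] for j in range(n))
            for i in range(n)
        ]
        scale = sum(new_beta)
        beta = [b / scale for b in new_beta]
    return filtered, smoothed
```

At the final frame the two marginals coincide; at earlier frames the smoothed marginal can differ noticeably because it also uses later observations.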

Transcription

[Figure: three panels against Time/s. $\log p(r_\tau \mid s_\tau)$ over MIDI note number; $\sum_i w^{(i)}_\tau \lambda^{(i)}_\tau$ shown as $\log \lambda$; MDCT of audio (source: Daniel-Ben Pienaar), Frequency/Hz (0 to 4000)]

Summary

• “Time Domain” – Switching State Space Models
  – State space modeling
  – Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)
  – Analysis down to sample precision (if required)
  – Computationally quite heavy
• “Transform Domain” – Gamma Fields
  – Models on (orthogonal) transform coefficients, Energy compaction
  – Practical: can make use of fast transforms (FFT, MDCT, ...)
  – Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for
  – Time-Frequency Energy distributions
• Ongoing Work
  – Comparison of inference methods (VB, MCMC, SMC)
  – Learning
  – Applications
    ∗ Chord detection, Polyphonic transcription
    ∗ Musical Score guided source separation
  – Prior structures for other observation models, NMF



Multimodality

Annealing Bridging Overrelaxation Tempering

0 500 1000 1500 2000

minus08024

08295

20375

a

0 500 1000 1500 2000

72408

251398362295

λ

0 500 1000 1500 2000

0545118648

r

Epoch

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 64

Tempo tracking and score performance matching

bull Given expressive music data (onsetsdetectionsspectral features)

ndash Determine the position of a performance on a score

ndash Determine where a human listener would clap her hand

ndash Create a quantizedhuman readable score

ndash

bull Online-Realtime or Offline-Batch

bull All of these problems can be mapped to inference problems in a HMM

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 65

Bar position Pointer (Whiteley Cemgil Godsill 2006)

| | |

3 bull bull bull bull bull bull bull bull

nk 2 bull bull bull bull bull bull bull bull

1 bull bull bull bull bull bull bull bull

1 2 3 4 5 6 7 8

mk

34 time

44 time

bull Each dot denotes a state x = (mn) (Score Position ndash Tempo level)

bull Directed Arcs denote state transitions with positive probability

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 66

Bar position Pointer - transition model

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Tem

po L

evel

Bar Position

p(x2| x

1)

1 2 3 4 5 6 7 8

1

2

3

4

5

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 67

Bar position Pointer - k = 1

Tem

po L

evel

Bar Position

p(x1)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 68

Bar position Pointer - k = 2

Tem

po L

evel

Bar Position

p(x2)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 69

Bar position Pointer - k = 3

Tem

po L

evel

Bar Position

p(x3)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 70

Bar position Pointer - k = 4

Tem

po L

evel

Bar Position

p(x4)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 71

Bar position Pointer - k = 5

Tem

po L

evel

Bar Position

y5 = 0 p(x

5| y

15)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 72

Bar position Pointer - k = 10

Tem

po L

evel

Bar Position

p(x10

)

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 73

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with variable chains n_k, θ_k, m_k, r_k for k = 0, 1, 2, 3, and intensities λ_k and observations y_k for k = 1, 2, 3.]

• θ: Time signature indicator (e.g. 3/4, 4/4); r: Rhythmic pattern indicator

Filtering

[Figure: four panels against Frame Index k (0-450): Observed Data y_k; log p(m_k | y_{1:k}) over bar position m_k (200-800); log p(n_k | y_{1:k}) over tempo in quarter notes per minute (60-180); p(r_k | y_{1:k}) for the rhythmic patterns Triplets and Duplets.]

Smoothing

[Figure: four panels against Frame Index k (0-450): Observed Data y_k; log p(m_k | y_{1:K}) over bar position m_k (200-800); log p(n_k | y_{1:K}) over tempo in quarter notes per minute (60-180); p(r_k | y_{1:K}) for the rhythmic patterns Triplets and Duplets.]

Time Signature

[Figure: four panels: Observed Data (sample value, -1 to 1, against time in s, 0-12); log p(m_k | z_{1:K}) over bar position m_k (200-800); log p(n_k | z_{1:K}) over tempo in quarter notes per minute (52-155); p(θ_k | z_{1:K}) for the time signatures 4/4 and 3/4. The lower three panels are plotted against Frame Index k (100-500).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: signal x_t plotted against time t.]

Score-Performance matching - Graphical Model

[Figure: graphical model with chains t_k, r_k, λ_k for k = 1, ..., K and, inside a plate over ν = 1, ..., W, the variables v_{ν,k} and s_{ν,k}.]

v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\ a,\ 1/(a \lambda \sigma_\nu(r_\tau)))

Score-Performance matching - Signal model

[Figure: log σ_ν (-12 to 0) against Frequency ν in Hz (0-4000).]

Score-Performance matching

[Figure: two panels: Spectrogram Data (Frequency in Hz, 0-4000, against Time in s, 0-14); MIDI Data (MIDI note, 55-85, against Score position, 50-450).]

Online (filtering) or Offline (smoothing) processing is possible

Transcription

[Figure: three panels against Time in s (1-4): log p(r_τ | s_τ) over MIDI note number (60-80); log λ (-10 to 10), labelled \sum_i w_\tau^{(i)} \lambda_\tau^{(i)}; MDCT of audio (source: Daniel-Ben Pienaar), Frequency in Hz (0-4000).]

Summary

• "Time Domain": Switching State Space Models

  - State space modeling

  - Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

  - Analysis down to sample precision (if required)

  - Computationally quite heavy

• "Transform Domain": Gamma Fields

  - Models on (orthogonal) transform coefficients; energy compaction

  - Practical: can make use of fast transforms (FFT, MDCT, ...)

  - Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

  - Time-Frequency Energy distributions

• Ongoing Work

  - Comparison of inference methods (VB, MCMC, SMC)

  - Learning

  - Applications

    * Chord detection, Polyphonic transcription

    * Musical Score guided source separation

  - Prior structures for other observation models, NMF


Tempo tracking and score performance matching

• Given expressive music data (onsets/detections/spectral features)

  - Determine the position of a performance on a score

  - Determine where a human listener would clap her hand

  - Create a quantized/human readable score

  - ...

• Online-Realtime or Offline-Batch

• All of these problems can be mapped to inference problems in an HMM
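As a rough illustration of what "inference problems in an HMM" means for the online and offline cases, the sketch below computes filtered marginals p(x_k | y_{1:k}) with a forward pass and smoothed marginals p(x_k | y_{1:K}) with a backward correction pass. It is a generic discrete-state routine in plain Python; the function name `forward_backward`, the argument layout, and the toy two-state numbers are assumptions for illustration and are not taken from the talk.

```python
def forward_backward(prior, trans, lik):
    """Filtered and smoothed state marginals of a discrete HMM.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i) for the observed y_k
    """
    S, K = len(prior), len(lik)

    # Forward pass (online / filtering): p(x_k | y_{1:k}).
    filtered = []
    pred = list(prior)
    for k in range(K):
        alpha = [pred[i] * lik[k][i] for i in range(S)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]
        filtered.append(alpha)
        pred = [sum(alpha[i] * trans[i][j] for i in range(S)) for j in range(S)]

    # Backward pass (offline / smoothing): p(x_k | y_{1:K}).
    smoothed = [None] * K
    smoothed[K - 1] = filtered[K - 1]
    for k in range(K - 2, -1, -1):
        pred = [sum(filtered[k][i] * trans[i][j] for i in range(S)) for j in range(S)]
        smoothed[k] = [
            filtered[k][i]
            * sum(trans[i][j] * smoothed[k + 1][j] / pred[j]
                  for j in range(S) if pred[j] > 0)
            for i in range(S)
        ]
    return filtered, smoothed

# Toy example with two states; the last observation favours state 1.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
lik = [[0.8, 0.2], [0.8, 0.2], [0.1, 0.9]]
filtered, smoothed = forward_backward(prior, trans, lik)
```

In the toy example the smoothed marginals at the earlier frames shift toward state 1 relative to the filtered ones, because smoothing also uses the later observation.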

Bar position Pointer (Whiteley, Cemgil, Godsill 2006)

[Figure: grid of states with bar position m_k = 1, ..., 8 on the horizontal axis and tempo level n_k = 1, 2, 3 on the vertical axis, with markers labelled 3/4 time and 4/4 time.]

• Each dot denotes a state x = (m, n) (Score Position, Tempo level)

• Directed arcs denote state transitions with positive probability
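A simplified sketch of such a state space, in Python: the bar position advances by the current tempo level and wraps around at the end of the bar, while the tempo level either stays the same or moves to a neighbouring level. The wrap-around rule, the tempo change probability `p_change`, the clamping at the lowest and highest tempo levels, and the function name `bar_pointer_successors` are assumptions chosen for illustration; the slide itself only states that arcs mark transitions with positive probability.

```python
def bar_pointer_successors(m, n, M=8, N=3, p_change=0.2):
    """Successor states of x = (m, n) and their probabilities.

    m: bar position in 1..M, n: tempo level in 1..N.
    Returns a dict mapping (m_next, n_next) to its transition probability.
    """
    # Position advances by n steps and wraps around at the end of the bar.
    m_next = (m - 1 + n) % M + 1
    probs = {}
    for dn, p in ((0, 1.0 - p_change), (-1, p_change / 2), (1, p_change / 2)):
        # Tempo level stays, or moves one level down or up (clamped to 1..N).
        n_next = min(max(n + dn, 1), N)
        key = (m_next, n_next)
        probs[key] = probs.get(key, 0.0) + p
    return probs

# From the last bar position at tempo level 2 the pointer wraps to position 2.
example = bar_pointer_successors(8, 2)
```

Collecting these successor dictionaries for every (m, n) gives a sparse transition matrix, so a forward pass over the grid only needs to visit the few arcs leaving each state.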





Bar position Pointer - k = 2

[Figure: $p(x_2)$ over bar position (horizontal axis) and tempo level (vertical axis).]

Bar position Pointer - k = 3

[Figure: $p(x_3)$ over bar position (horizontal axis) and tempo level (vertical axis).]

Bar position Pointer - k = 4

[Figure: $p(x_4)$ over bar position (horizontal axis) and tempo level (vertical axis).]

Bar position Pointer - k = 5

[Figure: $p(x_5 | y_{1:5})$ with $y_5 = 0$, over bar position (horizontal axis) and tempo level (vertical axis).]

Bar position Pointer - k = 10

[Figure: $p(x_{10})$ over bar position (horizontal axis) and tempo level (vertical axis).]

Bar position Pointer - observation model (Poisson)

• Observation model $p(y_k | x_k)$: Poisson intensity

[Figure: two panels, "Triplet Rhythm" and "Duplet Rhythm", each showing the intensity $\mu_k$ (0 to 4) against bar position $m_k$ (0 to 1000).]
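The Poisson observation model above can be sketched in a few lines. This is only an illustration under stated assumptions, not the model used in the talk: the function `intensity`, its Gaussian bump shape, the peak and floor values, and the pulse layouts `DUPLET` and `TRIPLET` are hypothetical. Only the bar length of 1000 positions and the intensity range of roughly 0 to 4 are read off the figure axes.

```python
import math

M = 1000  # number of bar positions (taken from the figure axis)

def intensity(m, pulses, peak=4.0, floor=0.05, width=20.0):
    """Hypothetical intensity mu(m): a Gaussian bump at each pulse position."""
    mu = floor
    for p in pulses:
        d = min(abs(m - p), M - abs(m - p))  # circular distance within the bar
        mu += peak * math.exp(-0.5 * (d / width) ** 2)
    return mu

def poisson_loglik(y, mu):
    """log p(y | mu) for a Poisson count y with intensity mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

# Assumed pulse layouts: four evenly spaced pulses per bar versus three
DUPLET = [0, 250, 500, 750]
TRIPLET = [0, M / 3, 2 * M / 3]

def loglik(y, m, pulses):
    """log p(y_k = y | m_k = m) under the chosen rhythmic pattern."""
    return poisson_loglik(y, intensity(m, pulses))
```

With these assumed layouts, observing onsets at bar position 250 favours the duplet pattern, while observing none there favours the triplet pattern, which is the kind of evidence the rhythm indicator can be inferred from.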

Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Graphical model: chains $n_0, \dots, n_3$; $\theta_0, \dots, \theta_3$; $m_0, \dots, m_3$; $r_0, \dots, r_3$; with $\lambda_1, \lambda_2, \lambda_3$ and observations $y_1, y_2, y_3$.]

• $\theta$: time signature indicator (e.g. 3/4, 4/4); $r$: rhythmic pattern indicator
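To make the roles of the variables in the graphical model concrete, the sketch below simulates a heavily simplified chain. Everything specific in it is an assumption for illustration rather than the published model: the tempo $n_k$ follows a bounded random walk, the bar position $m_k$ advances by $n_k$ and wraps at the end of the bar, the intensity $\lambda_k$ is a hypothetical bump function of $m_k$, and $y_k$ is drawn as a Poisson count. The indicators $\theta$ and $r$ are held fixed and not simulated.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw a Poisson count with Knuth's method (adequate for small intensities)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_bar_pointer(K=200, M=1000, n0=10, n_max=40, p_change=0.1, seed=0):
    """Simulate bar position m_k, tempo n_k and counts y_k (illustrative only)."""
    rng = random.Random(seed)
    pulses = [0, 250, 500, 750]  # assumed onset positions within the bar
    m_prev, n_prev = 0, n0
    m, n, y = [], [], []
    for _ in range(K):
        # Tempo: unchanged with probability 1 - p_change, else one step up or down
        u = rng.random()
        step = -1 if u < p_change / 2 else (1 if u < p_change else 0)
        n_k = min(max(n_prev + step, 1), n_max)
        # Bar pointer: advance by the current tempo, wrap at the end of the bar
        m_k = (m_prev + n_k) % M
        # Intensity: high near an assumed pulse position, low elsewhere
        d = min(min(abs(m_k - p), M - abs(m_k - p)) for p in pulses)
        lam = 0.05 + 4.0 * math.exp(-0.5 * (d / 20.0) ** 2)
        m.append(m_k)
        n.append(n_k)
        y.append(sample_poisson(lam, rng))
        m_prev, n_prev = m_k, n_k
    return m, n, y
```

Calling `simulate_bar_pointer(K=300)` returns three lists of equal length; plotting `y` against the frame index gives sparse onset counts clustered around the assumed pulse positions.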

Filtering

[Figure: four panels against frame index $k$ (0 to 450). "Observed Data": $y_k$ (0 to 2). $\log p(m_k | y_{1:k})$: bar position $m_k$ (200 to 800). $\log p(n_k | y_{1:k})$: tempo in quarter notes per minute (60 to 180). $p(r_k | y_{1:k})$: rhythmic pattern (Triplets, Duplets).]

Smoothing

[Figure: four panels against frame index $k$ (0 to 450). "Observed Data": $y_k$ (0 to 2). $\log p(m_k | y_{1:K})$: bar position $m_k$ (200 to 800). $\log p(n_k | y_{1:K})$: tempo in quarter notes per minute (60 to 180). $p(r_k | y_{1:K})$: rhythmic pattern (Triplets, Duplets).]
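The difference between the filtered marginals $p(x_k | y_{1:k})$ and the smoothed marginals $p(x_k | y_{1:K})$ can be illustrated on a generic discrete-state chain. The sketch below is a standard forward filter followed by a backward correction pass. It is not the inference code behind the figures; the function names and the toy interface (`prior`, `trans`, `lik` as nested lists) are assumptions chosen for the example. In the bar pointer setting the state would be a discretised joint of bar position, tempo and rhythm indicator.

```python
def forward_filter(prior, trans, lik):
    """Filtered marginals p(x_k | y_{1:k}) for a discrete-state chain.

    prior[i]    = p(x_1 = i)
    trans[i][j] = p(x_{k+1} = j | x_k = i)
    lik[k][i]   = p(y_k | x_k = i)
    """
    S = len(prior)
    filtered, pred = [], list(prior)
    for lk in lik:
        # Update: weight the prediction by the likelihood and normalise
        post = [pred[i] * lk[i] for i in range(S)]
        z = sum(post)
        post = [p / z for p in post]
        filtered.append(post)
        # Predict: push the filtered marginal through the transition model
        pred = [sum(post[i] * trans[i][j] for i in range(S)) for j in range(S)]
    return filtered

def backward_smooth(filtered, trans):
    """Smoothed marginals p(x_k | y_{1:K}) computed from the filtered ones."""
    S, K = len(filtered[0]), len(filtered)
    smoothed = [None] * K
    smoothed[-1] = filtered[-1]  # at the last frame both coincide
    for k in range(K - 2, -1, -1):
        pred = [sum(filtered[k][i] * trans[i][j] for i in range(S)) for j in range(S)]
        smoothed[k] = [
            filtered[k][i] * sum(trans[i][j] * smoothed[k + 1][j] / pred[j]
                                 for j in range(S) if pred[j] > 0)
            for i in range(S)
        ]
    return smoothed
```

Filtering uses only the data up to frame $k$ and so can run online; smoothing needs the whole sequence but revises earlier frames in the light of later observations.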

Time Signature

[Figure: four panels. "Observed Data": sample value (-1 to 1) against time (s, 0 to 12). The remaining panels are against frame index $k$ (100 to 500). $\log p(m_k | z_{1:K})$: bar position $m_k$ (200 to 800). $\log p(n_k | z_{1:K})$: tempo in quarter notes per minute (52 to 155). $p(\theta_k | z_{1:K})$: time signature (4/4, 3/4).]

Score-Performance matching (ISMIR 2007)

• Given a musical score, associate note events with the audio

[Figure: audio signal $x_t$ against time $t$.]

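Score-Performance matching - Graphical Model

[Graphical model: chains $t_1, t_2, \dots, t_K$; $r_1, r_2, \dots, r_K$; $\lambda_1, \lambda_2, \dots, \lambda_K$; and, for $\nu = 1, \dots, W$, $v_{\nu,1}, v_{\nu,2}, \dots, v_{\nu,K}$ and $s_{\nu,1}, s_{\nu,2}, \dots, s_{\nu,K}$. A second diagram shows states 1 to 8 for $r_k$.]

$v_{\nu,\tau} \sim \mathcal{IG}(v_{\nu,\tau};\, a,\, 1/(a \lambda \sigma_\nu(r_\tau)))$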

Score-Performance matching - Signal model

[Figure: $\log \sigma_\nu$ (-12 to 0) against frequency $\nu$ (Hz, 0 to 4000).]

Page 75: Hierarchical Bayesian Models for Audio and Music Signal ...cemgil/papers/talks/... · Signal Processing A. Taylan Cemgil Signal Processing and Communications Lab. ... Cemgil Hierarchical

Bar position Pointer - observation model (Poisson)

bull Observation model p(yk|xk) Poisson intensity

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Triplet Rhythm

0 100 200 300 400 500 600 700 800 900 10000

2

4

mk

micro k

Duplet Rhythm

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 74

Tempo Rhythm Meter analysis

Bar Pointer Model (Whiteley Cemgil Godsill 2006)

n0 n1 n2 n3

θ0 θ1 θ2 θ3

m0 m1 m2 m3

r0 r1 r2 r3

λ1 λ2 λ3

y1 y2 y3

bull θ Time signature indicator (eg 34 44) r Rhytmic pattern indicator

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 75

Filtering

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1k)

50 100 150 200 250 300 350 400 450

800

600

400

200

minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1k)

50 100 150 200 250 300 350 400 450

180

120

60minus4

minus2

0

p(rk|y

1k)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets

002040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 76

Smoothing

0 50 100 150 200 250 300 350 400 4500

1

2

y k

Observed Data

mk

log p(mk|y

1K)

50 100 150 200 250 300 350 400 450

800

600

400

200 minus10

minus5

0Q

uart

er n

otes

per

min

log p(nk|y

1K)

50 100 150 200 250 300 350 400 450

180

120

60minus10

minus5

0

p(rk|y

1K)

Frame Index k

50 100 150 200 250 300 350 400 450

Triplets

Duplets 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 77

Time Signature

0 2 4 6 8 10 12

minus1

0

1

sam

ple

valu

e

time s

Observed Data

mk

log p(mk|z

1K)

100 200 300 400 500

800

600

400

200 minus10

minus5

0

Qua

rter

not

es p

er m

in log p(n

k|z

1K)

100 200 300 400 500

155

103

52minus10

minus5

0

p(θk|z

1K)

Frame Index k

100 200 300 400 500

44

34 02040608

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 78

Score-Performance matching (ISMIR) 2007

bull Given a musical score associate note events with the audio

4

t

x t

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 79

Score-Performance matching - Graphical Model

ν = 1 W

t1 t2 tK

r1 r2 rK

λ1 λ2 λK

vν1 vν2 vνK

sν1 sν2 sνK

6 7 81 2 53 4

rk

vντ sim IG(vντ a 1(aλσν(rτ)))

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 80

Score-Performance matching - Signal model

0 500 1000 1500 2000 2500 3000 3500 4000minus12

minus10

minus8

minus6

minus4

minus2

0

Frequency ν Hz

log

σ ν

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 81

Score-Performance matching

Spectrogram Data

Time s

Fre

quen

cy

Hz

0 2 4 6 8 10 12 140

1000

2000

3000

4000

50 100 150 200 250 300 350 400 45055

60

65

70

75

80

85MIDI Data

Score position

MID

I not

e

Online (filtering) or Offline (smoothing) processing is possible

Cemgil Hierarchical Bayesian Models for Music Signal Analysis Nips 2007 Workshops Whistler Canada 1 December 2007 82

Transcription

[Figure, three panels against time in s: log p(r_τ | s_τ) over MIDI note number; Σ_i w_τ^{(i)} λ_τ^{(i)} (vertical axis: log λ); MDCT of audio (source: Daniel-Ben Pienaar), frequency in Hz]

Summary

• “Time Domain” – Switching State Space Models

– State space modeling

– Conditional Linear Dynamical Systems, Gaussian processes (e.g. AR, ARMA)

– Analysis down to sample precision (if required)

– Computationally quite heavy

• “Transform Domain” – Gamma Fields

– Models on (orthogonal) transform coefficients; energy compaction

– Practical: can make use of fast transforms (FFT, MDCT, ...)

– Inherent limitations (analysis windows, frequency resolution)

Summary

• Gamma chains and fields: a flexible stochastic volatility prior for

– Time-Frequency Energy distributions

• Ongoing Work

– Comparison of inference methods (VB, MCMC, SMC)

– Learning

– Applications

∗ Chord detection, Polyphonic transcription

∗ Musical Score guided source separation

– Prior structures for other observation models, NMF


Tempo, Rhythm, Meter analysis

Bar Pointer Model (Whiteley, Cemgil, Godsill 2006)

[Figure: graphical model with chains n_0, …, n_3; θ_0, …, θ_3; m_0, …, m_3; r_0, …, r_3; λ_1, …, λ_3 and observations y_1, …, y_3]

• θ: time signature indicator (e.g. 3/4, 4/4); r: rhythmic pattern indicator

Filtering

[Figure, four panels against frame index k: observed data y_k; log p(m_k | y_{1:k}); log p(n_k | y_{1:k}), tempo in quarter notes per min; p(r_k | y_{1:k}), triplets vs. duplets]

Smoothing

[Figure, four panels against frame index k: observed data y_k; log p(m_k | y_{1:K}); log p(n_k | y_{1:K}), tempo in quarter notes per min; p(r_k | y_{1:K}), triplets vs. duplets]










