machine learning for music

Machine Learning for Music Faculty of Mathematics and Informatics, SU Petko Nikolov April 8, 2015

Upload: petko-nikolov

Post on 17-Jul-2015


TRANSCRIPT

Page 1: Machine learning for Music

Machine Learning for Music

Faculty of Mathematics and Informatics, SU

Petko Nikolov

April 8, 2015

Page 2: Machine learning for Music

About Me

Machine Learning

Music Information Retrieval

Machine Learning / Automated Data Science

Page 3: Machine learning for Music

What’s Music Information Retrieval?

Musicology

Computer Science

Signal Processing

Machine Learning

MIR

Page 4: Machine learning for Music
Page 5: Machine learning for Music
Page 6: Machine learning for Music

Music Recommendations

Page 7: Machine learning for Music

Recommending tags

Page 8: Machine learning for Music

Spotify’s Shuffle Mode

● Not really random

● Certainly some processing

● Probably some MIR behind

Page 9: Machine learning for Music

Pandora’s Music Genome Project

● started in 2000

● 800 000 tracks manually annotated by music experts

● 450 attributes to describe music

● 25 minutes per track to label

Page 10: Machine learning for Music

MIREX

Music Information Retrieval Evaluation eXchange

annual competition featuring more than 20 tasks

state-of-the-art algorithms compete against each other

Page 11: Machine learning for Music

Structured Information

Retrieval

Synthesis

Page 12: Machine learning for Music

fingerprinting
cover song detection
genre recognition
instrument recognition
mood detection
transcription
playlist generation

beat tracking
key detection
pitch tracking
vocal detection
recommendation
audio similarity
source separation

Page 13: Machine learning for Music

genre recognition

instrument recognition

mood detection

vocal detection

audio similarity

Page 18: Machine learning for Music

MIR Architecture

Audio

Segmentation and Preprocessing

Feature Extraction

Machine Learning

classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4

Page 19: Machine learning for Music

MIR Architecture

Audio

Segmentation and Preprocessing

Deep Learning

classical, piano, romantic, Beethoven, by Daniel Barenboim, 2/4

Page 20: Machine learning for Music

Audio signal

Page 21: Machine learning for Music

Audio signal

human hearing: 20 Hz to 20 kHz

Page 22: Machine learning for Music

Segmentation

Page 29: Machine learning for Music

Segmentation

Frame: 52 ms

f1 f2 f3 f4 ... fn
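
The framing step can be sketched as follows. This is a minimal illustration, not the speaker's code: it assumes non-overlapping 52 ms frames, a 16 kHz sampling rate, and a hypothetical helper name `split_into_frames`. Real pipelines often use overlapping frames with a hop size smaller than the frame length.

```python
import numpy as np

def split_into_frames(signal, sample_rate, frame_ms=52.0):
    """Cut a 1-D signal into consecutive, non-overlapping frames (assumed layout)."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    n_frames = len(signal) // frame_len          # the incomplete tail is dropped
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

sample_rate = 16000                              # assumed sampling rate in Hz
signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s of noise
frames = split_into_frames(signal, sample_rate)
print(frames.shape)                              # (number of frames, samples per frame)
```

Each row of `frames` plays the role of one of f1, f2, ..., fn on the slide.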

Page 30: Machine learning for Music

Spectrum - on frame level

Discrete Fourier Transform (DFT)

time frequency
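
A frame-level spectrum can be computed with NumPy's FFT routines. The sketch below is illustrative only: the Hann window, the 16 kHz sampling rate, and the helper name `magnitude_spectrum` are assumptions, not details taken from the slides.

```python
import numpy as np

def magnitude_spectrum(frame, sample_rate):
    """Return (frequencies in Hz, magnitudes) for one real-valued frame."""
    window = np.hanning(len(frame))              # taper the edges to reduce leakage
    magnitudes = np.abs(np.fft.rfft(frame * window))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitudes

sample_rate = 16000                              # assumed sampling rate in Hz
t = np.arange(832) / sample_rate                 # one 52 ms frame
frame = np.sin(2 * np.pi * 1000.0 * t)           # a pure 1 kHz tone
freqs, magnitudes = magnitude_spectrum(frame, sample_rate)
peak_hz = freqs[np.argmax(magnitudes)]
print(peak_hz)                                   # frequency of the strongest bin
```

For a pure tone the strongest bin sits at, or right next to, the tone's frequency.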

Page 31: Machine learning for Music

Feature extraction

f x

Page 32: Machine learning for Music

Spectral Centroid

the ‘center of mass’ of the spectrum: the magnitude-weighted mean frequency
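
As a rough sketch, the centroid is the sum of frequency times magnitude divided by the sum of magnitudes. The function name and the toy three-bin spectrum below are made up for illustration.

```python
import numpy as np

def spectral_centroid(freqs, magnitudes):
    """Magnitude-weighted mean frequency of one frame's spectrum."""
    total = magnitudes.sum()
    if total == 0:
        return 0.0                               # silent frame: no meaningful centroid
    return float((freqs * magnitudes).sum() / total)

freqs = np.array([100.0, 200.0, 300.0])          # toy bin frequencies in Hz
magnitudes = np.array([1.0, 2.0, 1.0])           # symmetric around 200 Hz
centroid = spectral_centroid(freqs, magnitudes)
print(centroid)  # 200.0
```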

Page 33: Machine learning for Music

Spectral Slope

fit a linear regression to the spectrum and take the slope coefficient
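
One way to sketch this is a degree-1 least-squares fit of magnitude against frequency with `np.polyfit`. The function name and the synthetic, perfectly linear spectrum are assumptions for illustration; some implementations fit on log-magnitudes instead.

```python
import numpy as np

def spectral_slope(freqs, magnitudes):
    """Slope of the least-squares line magnitude = slope * freq + intercept."""
    slope, _intercept = np.polyfit(freqs, magnitudes, deg=1)
    return float(slope)

freqs = np.linspace(0.0, 8000.0, 100)            # toy bin frequencies in Hz
magnitudes = 5.0 - 0.0005 * freqs                # energy falls off with frequency
slope = spectral_slope(freqs, magnitudes)
print(slope)                                     # close to -0.0005
```

A more negative slope means the energy is concentrated in the low frequencies.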

Page 37: Machine learning for Music

Spectral Correlation / Variation

Spectral Correlation is the cosine similarity between the frequency vectors of two consecutive frames.

Variation is (1.0 - correlation).
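
A minimal sketch of both features, computing the correlation as the normalized dot product (cosine) of two consecutive magnitude spectra and the variation as one minus that value. The function names and toy vectors are hypothetical.

```python
import numpy as np

def spectral_correlation(prev_mag, cur_mag):
    """Cosine of the angle between two consecutive magnitude spectra."""
    denom = np.linalg.norm(prev_mag) * np.linalg.norm(cur_mag)
    if denom == 0:
        return 0.0                               # at least one frame is silent
    return float(np.dot(prev_mag, cur_mag) / denom)

def spectral_variation(prev_mag, cur_mag):
    """1.0 - correlation: 0 for an unchanged spectrum, 1 for disjoint spectra."""
    return 1.0 - spectral_correlation(prev_mag, cur_mag)

steady = np.array([1.0, 2.0, 3.0])
low_only = np.array([1.0, 0.0, 0.0])
high_only = np.array([0.0, 0.0, 1.0])
print(spectral_variation(steady, 2.0 * steady))  # same shape, louder
print(spectral_variation(low_only, high_only))   # no shared frequencies
```

Because the measure is normalized, a change in loudness alone leaves the variation near zero.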

Page 39: Machine learning for Music

Feature extraction - Result

f11 f12 f13 f14 f15 ……… f1m

f21 f22 f23 f24 f25 ……… f2m

Columns are features (e.g. centroid, correlation); rows are frames.

The number of frames varies across audio recordings.

Page 40: Machine learning for Music

Universal Background Model

Page 41: Machine learning for Music

Gaussian Mixture Model

frame feature vector

Page 42: Machine learning for Music

Gaussian Mixture Model

Multivariate Gaussian Distribution

Page 48: Machine learning for Music

Gaussian Mixture Model - per track

[𝛍1,𝛍2,𝛍3,𝛍4]
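
The idea of turning a variable number of frames into one fixed-length vector of component means can be sketched with scikit-learn's `GaussianMixture`. This is a simplified stand-in, not the speaker's method: it fits a background model on pooled frames and then refits a per-track model initialised from it, whereas a full UBM system would typically use MAP adaptation. The random data, the 4 components, the 5 features, and the helper name `track_supervector` are all assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Hypothetical data: three tracks with different frame counts, 5 features per frame.
tracks = [rng.standard_normal((n, 5)) for n in (120, 200, 90)]

# Background model fitted on the frames of all tracks pooled together.
ubm = GaussianMixture(n_components=4, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(tracks))

def track_supervector(frames, ubm, n_iter=5):
    """Refit a GMM for one track, starting from the background model (not MAP)."""
    gmm = GaussianMixture(
        n_components=ubm.n_components,
        covariance_type="diag",
        weights_init=ubm.weights_,
        means_init=ubm.means_,
        precisions_init=ubm.precisions_,
        max_iter=n_iter,
    )
    gmm.fit(frames)
    return gmm.means_.ravel()                    # [mu1, mu2, mu3, mu4] concatenated

X = np.array([track_supervector(t, ubm) for t in tracks])
print(X.shape)  # (3, 20)
```

Starting every track from the same background model keeps the components in a consistent order, so position k in the vector refers to the same region of feature space for every track.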

Page 49: Machine learning for Music

Classification - Example Neural Net

Layers: Input, Hidden, Output

Input: feature vector

Output: Likelihood of Rock?

Diagram labels: aik, wk
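
The forward pass of such a network can be sketched in a few lines of NumPy. Everything specific here is an assumption for illustration: the sigmoid activations, the layer sizes, and the random, untrained weights, so the printed value is not a meaningful prediction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Input -> hidden -> output; returns a value in (0, 1)."""
    hidden = sigmoid(W1 @ x + b1)                # hidden-layer activations
    return float(sigmoid(W2 @ hidden + b2))      # single output unit: "rock" score

rng = np.random.default_rng(0)
n_features, n_hidden = 20, 8                     # assumed sizes
W1 = rng.standard_normal((n_hidden, n_features)) * 0.1
b1 = np.zeros(n_hidden)
W2 = rng.standard_normal(n_hidden) * 0.1
b2 = 0.0

x = rng.standard_normal(n_features)              # stand-in for a track's feature vector
p_rock = forward(x, W1, b1, W2, b2)
print(p_rock)
```

Training would adjust `W1`, `b1`, `W2`, and `b2` so that the output is high for rock tracks and low for the rest.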

Page 52: Machine learning for Music

What’s Deep Learning?

(defn deep-learning? [neural-net] (hidden-layer? neural-net))

we are trying to learn new high-level representations by using many more hidden layers

the input is as raw as possible

Page 53: Machine learning for Music

Mel-spectrum

Page 57: Machine learning for Music

Deep Neural Network

Backpropagation: the gradient fades quickly as it passes back through many layers
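
A toy illustration of why the gradient fades: in a chain of sigmoid units, each layer multiplies the backpropagated gradient by a factor of at most 0.25 times the weight. The scalar one-unit-per-layer chain and the unit weights below are simplifying assumptions, not a model from the talk.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_wrt_input(x, weights):
    """d(output)/d(input) for a chain a_k = sigmoid(w_k * a_{k-1})."""
    activations = []
    a = x
    for w in weights:                            # forward pass
        a = sigmoid(w * a)
        activations.append(a)
    grad = 1.0
    for w, a in zip(reversed(weights), reversed(activations)):
        grad *= a * (1.0 - a) * w                # chain rule, one layer at a time
    return grad

for depth in (2, 5, 10):
    print(depth, gradient_wrt_input(0.5, [1.0] * depth))
```

The printed gradient shrinks roughly geometrically with depth, which is what makes the early layers of a deep network slow to train with plain backpropagation.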

Page 58: Machine learning for Music

Deep Belief Network

Input (Mel spectrum)

Hidden Layer 1

Hidden Layer 2

Hidden Layer 3

Output: Rock, Jazz, Punk, Electronic

Each pair of adjacent layers below the output forms a Restricted Boltzmann Machine (RBM).

Page 59: Machine learning for Music

Deep Belief Network

Input (Mel spectrum)

Hidden Layer 1

Restricted Boltzmann Machine

Page 63: Machine learning for Music

Deep Auto Encoders

Input: Mel spectrum

Output: Mel spectrum

Used for denoising

Page 64: Machine learning for Music

Tools

essentia - audio retrieval algorithms

theano - CPU/GPU symbolic optimization

scikit-learn - machine learning in Python