
1 Copyright 2011 G.Tzanetakis

Music Information Retrieval

George Tzanetakis ([email protected])
Associate Professor, IEEE Senior Member
Tier II Canada Research Chair
Computer Science Department (also in Music, ECE)
University of Victoria, Canada


MIR

‣ Interdisciplinary science of retrieving information from music

‣ ISMIR - Int. Symposium -> Int. Conf. on MIR -> Int. Conf. of the Society of MIR

‣ First ISMIR in 2000

‣ Increasing presence in ICASSP, ICME, ACMM, TMM, TASLP, MMTA

‣ All proceedings are freely available online


Connections

Machine Learning

Signal Processing

Psychology

Computer Science

Information Science

Human-Computer Interaction

MUSIC


Music today

Music is produced, distributed and consumed digitally

2011 digital music sales > physical album sales


Industry

[Three images, not recoverable from the transcript]


Music Collections

‣ Personal music collections ~ thousands

‣ Streaming music sites, stores ~ millions

‣ Great celestial jukebox in the sky ~ all of recorded music in human history

‣ A 5-minute music track is digitally represented using approximately 26 million floating point numbers
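
The 26 million figure can be checked with simple arithmetic. The sketch below assumes CD-quality audio (44.1 kHz, two channels); the slide does not state the format, so both defaults are assumptions.

```python
def sample_count(minutes, sample_rate_hz=44_100, channels=2):
    """Number of samples needed to represent a track of the given length."""
    # Sample rate and channel count are assumed CD-quality defaults.
    return int(minutes * 60 * sample_rate_hz * channels)

print(sample_count(5))  # 26460000, i.e. roughly 26 million numbers
```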


Overview

Focus on signal processing and audio

Audio Feature Extraction: Timbre, Pitch, Rhythm

Analysis: Similarity, Classification, Modelling Time

Tasks: Similarity, Genre classification, Tag annotation, Query-by-Humming, Audio-Score Alignment


Audio Feature Extraction

Sound and sine waves

Timbral Features: Short Time Fourier Transform (STFT), Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Audio Compression

Pitch and Harmony

Rhythm


Linear Systems and Sinusoids

Linearity: in1 -> out1, in2 -> out2, in1 + in2 -> out1 + out2

Sinusoid parameters: amplitude, period = 1 / frequency, phase (0, 180, 360 degrees)

True sine waves last forever

sine wave -> LTI -> new sine wave
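
Both properties on this slide can be checked numerically. The sketch below uses an 8-tap moving average as a stand-in for an LTI system; the filter, the sample rate and the test frequencies are illustrative assumptions, not taken from the slides.

```python
import numpy as np

SR = 8000                                  # sample rate in Hz (assumed)
t = np.arange(SR) / SR                     # one second of sample times

def lti_system(signal):
    """An 8-tap moving average: a simple linear time-invariant filter."""
    kernel = np.ones(8) / 8
    return np.convolve(signal, kernel, mode="valid")   # steady-state part only

def dominant_freq(signal, sr=SR):
    """Frequency in Hz of the largest peak in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    return np.argmax(spectrum) * sr / len(signal)

in1 = np.sin(2 * np.pi * 200 * t)
in2 = np.sin(2 * np.pi * 450 * t)
out1, out2 = lti_system(in1), lti_system(in2)

# Superposition: the response to in1 + in2 equals out1 + out2.
print(np.allclose(lti_system(in1 + in2), out1 + out2))

# A 200 Hz sine comes out as a 200 Hz sine; only amplitude and phase change.
print(dominant_freq(in1), dominant_freq(out1), out1.max())
```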


Fourier Transform

Joseph Fourier, 1768-1830


Short Time Fourier Transform

Time-varying spectra

Fast Fourier Transform (FFT)

[Figure: input over time (t, t+1, t+2) -> filters / oscillators -> output; axes: amplitude, frequency]
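
A minimal sketch of the idea: cut the signal into short overlapping frames, window each one, and take its FFT, giving one spectrum per time step. The frame size, hop size and Hann window are assumed values, and `stft_magnitude` is a hypothetical helper name.

```python
import numpy as np

def stft_magnitude(signal, frame_size=512, hop_size=256):
    """Magnitude spectrogram: one row per frame, one column per frequency bin."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop_size
    frames = np.stack([
        signal[i * hop_size : i * hop_size + frame_size] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 8000                                  # sample rate in Hz (assumed)
t = np.arange(sr) / sr
# A time-varying test signal: 500 Hz for half a second, then 1500 Hz.
sig = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 1500 * t))

spec = stft_magnitude(sig)
peak_hz = spec.argmax(axis=1) * sr / 512   # strongest frequency in each frame
print(spec.shape, peak_hz[0], peak_hz[-1])
```

The first frames peak at 500 Hz and the last ones at 1500 Hz, which is the change over time that a single FFT of the whole signal would hide.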


Spectrum and Shape Descriptors

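Centroid, Rolloff, Flux, Bandwidth, Moments, ...

[Figure: magnitude spectrum (M against F) with the centroid marked; the descriptors together form a feature vector, a point in feature space]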
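
The descriptors can be sketched as follows. These are common textbook definitions, not necessarily the exact ones used in the talk; the 85% rolloff fraction and the unnormalised flux are assumptions, and `spectral_features` is a hypothetical helper.

```python
import numpy as np

def spectral_features(mag, prev_mag, sr, rolloff_fraction=0.85):
    """Feature vector [centroid, rolloff, flux, bandwidth] for one magnitude spectrum."""
    freqs = np.linspace(0, sr / 2, len(mag))       # bin centre frequencies in Hz
    total = mag.sum() + 1e-12                      # guard against silent frames
    centroid = (freqs * mag).sum() / total         # "centre of gravity" of the spectrum
    bandwidth = np.sqrt((((freqs - centroid) ** 2) * mag).sum() / total)
    cumulative = np.cumsum(mag)
    # Rolloff: frequency below which the given fraction of the magnitude lies.
    rolloff = freqs[np.searchsorted(cumulative, rolloff_fraction * cumulative[-1])]
    # Flux: squared change relative to the previous frame's spectrum.
    flux = np.sum((mag - prev_mag) ** 2)
    return np.array([centroid, rolloff, flux, bandwidth])

# A spectrum with all its energy in one bin (500 Hz at sr = 8000).
mag = np.zeros(257)
mag[32] = 1.0
features = spectral_features(mag, np.zeros(257), sr=8000)
print(features)
```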


Mel Frequency Cepstral Coefficients

Mel scale: 13 linearly-spaced filters, 27 log-spaced filters

Linearly-spaced filters: CF - 130, CF, CF + 130

Log-spaced filters: CF / 1.0718, CF, CF * 1.0718

Mel-filtering -> Log -> DCT -> MFCCs
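
The chain above (mel-filtering, log, DCT) can be sketched in a few functions. The filter counts, the 130 Hz step and the 1.0718 ratio come from the slide; placing the lowest centre frequency at 130 Hz, the triangular filter shape, the Hann window and keeping 13 coefficients are assumptions, and all function names are hypothetical.

```python
import numpy as np

def filter_centres(n_linear=13, n_log=27, step_hz=130.0, ratio=1.0718):
    """Centre frequencies: linearly spaced first, then logarithmically spaced."""
    linear = step_hz * np.arange(1, n_linear + 1)        # 130, 260, ..., 1690 (assumed start)
    log = linear[-1] * ratio ** np.arange(1, n_log + 1)
    return np.concatenate([linear, log])

def triangular_filterbank(n_bins, sr, centres, step_hz=130.0, ratio=1.0718):
    """One triangular filter per centre, spanning its two neighbouring centres."""
    freqs = np.linspace(0, sr / 2, n_bins)
    edges = np.concatenate([[centres[0] - step_hz], centres, [centres[-1] * ratio]])
    bank = np.zeros((len(centres), n_bins))
    for i in range(len(centres)):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0, None)
    return bank

def dct_ii(x):
    """Unnormalised type-II discrete cosine transform."""
    n = len(x)
    k = np.arange(n)
    basis = np.cos(np.pi * np.outer(k, 2 * k + 1) / (2 * n))
    return basis @ x

def mfcc(frame, sr, n_coeffs=13):
    """Mel-filtering -> log -> DCT for a single audio frame."""
    mag = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    bank = triangular_filterbank(len(mag), sr, filter_centres())
    log_energies = np.log(bank @ mag + 1e-10)            # small offset avoids log(0)
    return dct_ii(log_energies)[:n_coeffs]

sr = 44100
frame = np.sin(2 * np.pi * 440 * np.arange(1024) / sr)
coeffs = mfcc(frame, sr)
print(coeffs.shape)
```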


Audio Feature Extraction


Traditional Music Representations


Pitch content

Harmony, melody = pitch concepts

Music Theory: Score = Music

Bridge to symbolic MIR

Automatic music transcription

Non-transcriptive arguments

Split the octave into discrete logarithmically spaced intervals
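
As an illustration of logarithmic spacing, the sketch below divides one octave into equal frequency ratios. Twelve steps per octave (equal temperament) and the C4 reference of 261.63 Hz are assumptions; the slide does not fix either.

```python
def octave_steps(base_hz, steps_per_octave=12):
    """Frequencies from base_hz up to one octave above, equally spaced in log-frequency."""
    return [base_hz * 2 ** (k / steps_per_octave) for k in range(steps_per_octave + 1)]

freqs = octave_steps(261.63)               # C4 up to C5 (assumed reference pitch)
for k, f in enumerate(freqs):
    print(k, round(f, 2))
```

Each step multiplies the frequency by the same factor, 2 ** (1 / 12), so the steps are evenly spaced on a logarithmic axis rather than in Hz.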


Pitch Detection

Approaches: time-domain, frequency-domain, perceptual

Pitch is a PERCEPTUAL attribute, correlated with but not equivalent to fundamental frequency


Time Domain

[Figure panels: C4 clarinet note, C4 sine wave]

The number of zero-crossings is sensitive to noise; the signal needs a low-pass filter (LPF) first
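
The noise sensitivity is easy to reproduce. The sketch below counts sign changes in a clean sine, a noisy sine, and the noisy sine after a crude low-pass filter; the noise level and the 64-tap moving average used as the LPF are assumptions chosen for illustration.

```python
import numpy as np

def zero_crossing_pitch(signal, sr):
    """Pitch estimate in Hz: a periodic signal crosses zero twice per period."""
    signs = np.signbit(signal)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    return crossings * sr / (2 * len(signal))

def moving_average(signal, width):
    """A crude low-pass filter (assumed here; any LPF would do)."""
    return np.convolve(signal, np.ones(width) / width, mode="same")

sr = 44100
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 261.63 * t)               # C4 sine wave
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(sr)        # add white noise

clean_est = zero_crossing_pitch(clean, sr)
noisy_est = zero_crossing_pitch(noisy, sr)
filtered_est = zero_crossing_pitch(moving_average(noisy, 64), sr)
print(clean_est, noisy_est, filtered_est)
```

The noisy estimate is far too high because noise adds spurious crossings whenever the sine is near zero; after low-pass filtering the estimate returns close to 261.63 Hz.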


AutoCorrelation

Efficient computation possible for powers of 2 using FFT

F(f) = FFT(X(t))
S(f) = F(f) F*(f)
R(l) = IFFT(S(f))
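
The three steps map directly onto code. In this sketch the signal is zero-padded to a power of two so that the circular correlation computed by the FFT matches the ordinary one; the pitch search range of 50 to 1000 Hz and the helper names are assumptions.

```python
import numpy as np

def autocorrelation_fft(x):
    """Autocorrelation R(l) for lags 0..len(x)-1, computed through the FFT."""
    n = len(x)
    size = 1 << (2 * n - 1).bit_length()   # power of two >= 2n - 1 (zero-padding)
    F = np.fft.rfft(x, size)               # F(f) = FFT(X(t))
    S = F * np.conj(F)                     # S(f) = F(f) F*(f)
    R = np.fft.irfft(S, size)              # R(l) = IFFT(S(f))
    return R[:n]

def pitch_from_autocorrelation(x, sr, fmin=50.0, fmax=1000.0):
    """Pitch in Hz from the strongest autocorrelation peak in the allowed lag range."""
    r = autocorrelation_fft(x - np.mean(x))
    lag_min = int(sr / fmax)
    lag_max = int(sr / fmin)
    lag = lag_min + np.argmax(r[lag_min:lag_max + 1])
    return sr / lag

sr = 44100
t = np.arange(2048) / sr
f0 = 261.63                                # C4
tone = (np.sin(2 * np.pi * f0 * t)
        + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
        + 0.25 * np.sin(2 * np.pi * 3 * f0 * t))
estimate = pitch_from_autocorrelation(tone, sr)
print(estimate)
```

The estimate is quantised to whole-sample lags, so it lands near 261.63 Hz rather than exactly on it.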


Frequency Domain

Fundamental frequency (as well as pitch) will correspond to peaks in the Spectrum. The fundamental does not necessarily have the highest amplitude.

[Figure panels: sine C4, clarinet C4]
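
A small synthetic example shows why picking the largest spectral peak is not enough. The tone below has a weak fundamental and a strong third harmonic; the amplitudes, the 10% peak threshold and the "lowest significant peak" heuristic are all assumptions made for illustration, not a method from the slides.

```python
import numpy as np

sr = 8000                                  # sample rate in Hz (assumed)
t = np.arange(sr) / sr
f0 = 250.0
# Weak fundamental, stronger second harmonic, strongest third harmonic.
x = (0.3 * np.sin(2 * np.pi * f0 * t)
     + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
     + 1.0 * np.sin(2 * np.pi * 3 * f0 * t))

mag = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.arange(len(mag)) * sr / len(x)  # bin frequencies in Hz

strongest = freqs[np.argmax(mag)]          # the loudest partial, not the fundamental

# Local maxima that reach at least 10% of the largest peak.
inner = mag[1:-1]
is_peak = (inner > mag[:-2]) & (inner > mag[2:]) & (inner > 0.1 * mag.max())
lowest_peak = freqs[1:-1][is_peak][0]      # the lowest significant peak

print(strongest, lowest_peak)
```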


Chroma – Pitch perception


Automatic Rhythm Description


Beat Histograms

Tzanetakis et al., AMTA01

Beat histogram features: max(h(i)), argmax(h(i))
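
The two features can be read off any histogram of periodicity strength. The sketch below builds a simplified histogram from the autocorrelation of an onset envelope; it is not the procedure of the cited paper, and the envelope rate, the tempo range and the function names are assumptions.

```python
import numpy as np

def beat_histogram(envelope, envelope_rate, bpm_min=40, bpm_max=200):
    """Periodicity strength h(i) for each candidate lag, with the matching tempo in BPM."""
    env = envelope - envelope.mean()
    r = np.correlate(env, env, mode="full")[len(env) - 1:]   # autocorrelation, lags >= 0
    lags = np.arange(int(60 * envelope_rate / bpm_max),
                     int(60 * envelope_rate / bpm_min) + 1)
    h = np.maximum(r[lags], 0.0)           # keep only positive correlation
    bpms = 60.0 * envelope_rate / lags     # lag in envelope samples -> beats per minute
    return bpms, h

def beat_histogram_features(bpms, h):
    """max(h(i)) and the tempo at argmax(h(i))."""
    i = int(np.argmax(h))
    return float(h[i]), float(bpms[i])

# Synthetic onset envelope sampled at 100 Hz: one pulse every 0.5 s.
envelope = np.zeros(2000)
envelope[::50] = 1.0
strength, tempo = beat_histogram_features(*beat_histogram(envelope, envelope_rate=100))
print(strength, tempo)
```

A pulse every half second corresponds to 120 beats per minute, which is where the histogram peaks.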


Analysis Overview

Musical Piece

Trajectory

Point

Cloud


Content-based Similarity Retrieval (or query-by-example)

Point

Input: Query example

Output: Ranked list of similar audio files based on feature vector similarity
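
A minimal sketch of query-by-example, assuming each file is already summarised by one feature vector (a point in feature space). Euclidean distance on standardised features is an assumed similarity measure, and the file names are hypothetical.

```python
import numpy as np

def rank_by_similarity(query, collection):
    """Return (name, distance) pairs, most similar first."""
    names = list(collection)
    vectors = np.array([collection[name] for name in names], dtype=float)
    # Standardise each feature so that no single dimension dominates the distance.
    mean = vectors.mean(axis=0)
    std = vectors.std(axis=0) + 1e-12
    scaled = (vectors - mean) / std
    scaled_query = (np.asarray(query, dtype=float) - mean) / std
    distances = np.linalg.norm(scaled - scaled_query, axis=1)
    order = np.argsort(distances)
    return [(names[i], float(distances[i])) for i in order]

# Hypothetical collection: file name -> feature vector.
collection = {
    "a.wav": [0.0, 0.0],
    "b.wav": [1.0, 1.0],
    "c.wav": [5.0, 5.0],
}
ranking = rank_by_similarity([0.9, 1.1], collection)
print([name for name, _ in ranking])
```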


Classification

Decision boundary

Partitioning of feature space

Generative vs discriminative models

P(class | x) = p(x | class) * P(class) / p(x)

Example classes: Music, Speech
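
A generative classifier applies Bayes' rule directly: model p(x | class) for each class, multiply by the prior P(class), and normalise by p(x). The sketch below assumes independent Gaussian features per class, which is one common modelling choice and not necessarily the one used in the talk; the training data is synthetic.

```python
import numpy as np

class GaussianBayes:
    """Generative classifier with a diagonal Gaussian per class (an assumed model)."""

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])            # P(class)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.variances = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        return self

    def posterior(self, x):
        x = np.asarray(x, dtype=float)
        # log p(x | class) under independent Gaussians
        log_likelihood = -0.5 * np.sum(
            np.log(2 * np.pi * self.variances) + (x - self.means) ** 2 / self.variances,
            axis=1,
        )
        log_joint = log_likelihood + np.log(self.priors)
        joint = np.exp(log_joint - log_joint.max())    # shifted for numerical stability
        return joint / joint.sum()                     # dividing by p(x)

    def predict(self, x):
        return self.classes[np.argmax(self.posterior(x))]

rng = np.random.default_rng(0)
music = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))     # synthetic features
speech = rng.normal(loc=[-2.0, -2.0], scale=1.0, size=(100, 2))
X = np.vstack([music, speech])
y = np.array(["music"] * 100 + ["speech"] * 100)

model = GaussianBayes().fit(X, y)
print(model.predict([2.0, 1.5]), model.posterior([2.0, 1.5]))
```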


Classification

Genre/Style

Emotion/Mood

Artist

Instrument

MIREX 2007

10 genres, 700 30-second clips per genre


Multi-tag annotation

Free-form tags (female voice, woman singing)

Multi-label classification problems with twists

Issues: synonyms, subpart relations, sparse, noisy

Cold start problem

Typically each tag is treated independently as a classification problem

Inverse also interesting (query-by-keywords)
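
Treating each tag independently means training one binary classifier per tag. The sketch below uses a nearest-mean rule as a deliberately simple stand-in for whatever classifier would be used in practice; the features, tags and function names are hypothetical.

```python
import numpy as np

def train_tag_models(features, tag_sets):
    """One binary model per tag: mean feature vector with the tag and without it."""
    X = np.asarray(features, dtype=float)
    models = {}
    for tag in sorted(set().union(*tag_sets)):
        has_tag = np.array([tag in tags for tags in tag_sets])
        if has_tag.all() or not has_tag.any():
            continue                        # need both positive and negative examples
        models[tag] = (X[has_tag].mean(axis=0), X[~has_tag].mean(axis=0))
    return models

def annotate(x, models):
    """Tags whose positive mean is closer to x than their negative mean."""
    x = np.asarray(x, dtype=float)
    return [tag for tag, (pos, neg) in models.items()
            if np.linalg.norm(x - pos) < np.linalg.norm(x - neg)]

# Hypothetical training clips: feature vectors and free-form tags.
features = [[0, 0], [0, 1], [5, 5], [5, 6]]
tag_sets = [{"female voice"}, {"female voice", "guitar"}, {"drums"}, {"drums", "guitar"}]

models = train_tag_models(features, tag_sets)
print(annotate([0.2, 0.1], models))
```

Because each tag has its own model, synonyms such as "female voice" and "woman singing" would be learned separately, which is one of the issues listed above.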


Stacking


Polyphonic Audio-Score Alignment

Representation: Time Series of Chroma

Matching Procedure: Dynamic Time Warping
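
Dynamic time warping finds the lowest-cost monotonic alignment between two sequences of frames. The sketch below is the basic textbook form with Euclidean frame distance and no path constraints; the one-hot "chroma" vectors are a toy stand-in for real chroma features.

```python
import numpy as np

def dtw(seq_a, seq_b):
    """Total alignment cost and warping path between two sequences of feature vectors."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    n, m = len(a), len(b)
    # Local cost: Euclidean distance between every pair of frames.
    cost = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated cost with a border of infinities
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
    # Backtrack from the end to recover which frames were matched.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return float(acc[n, m]), path[::-1]

# Toy chroma sequences (one-hot pitch classes): C C E G versus C E E G G.
chroma = np.eye(12)
performance = chroma[[0, 0, 4, 7]]
score = chroma[[0, 4, 4, 7, 7]]
total_cost, path = dtw(performance, score)
print(total_cost, path)
```

The two sequences contain the same notes at different speeds, so the alignment cost is zero even though their lengths differ.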


Dynamic Time Warping

Aligned performances of the same orchestral piece

Attempting to align two different orchestral pieces


Query-by-humming

User sings a melody

Computer searches database for song containing the melody

The challenge of difficult queries


The MUSART system

Query preprocessing: pitch contour extraction (audio), note segmentation (symbolic)

Target preprocessing (symbolic): theme extraction, model-forming, representation

Search to find approximate match: Dynamic Time Warping, HMMs


Conclusions

Through a combination of digital signal processing and machine learning techniques, a variety of music information retrieval tasks have been explored in the literature.

The tasks covered in this presentation are representative of existing work, and there are already commercial implementations for them. Many more are actively being investigated.

Music is a complex and fascinating signal, and we are just beginning to understand it better using computers.