Automatic Transcription of Piano Music - Presentation at ICME 2011



Automatic Transcription of Piano Music by Sparse Representation of Magnitude Spectra

Cheng-Te Lee, Yi-Hsuan Yang, and Homer Chen

National Taiwan University

ICME 2011 Oral Presentation

2011/07/14

Speaker: Cheng-Te Lee




Outline

Introduction

Proposed System

Performance Analysis & Demo




I. Introduction




Automatic Transcription

Music signal (in WAVE format) → Musical score (in MIDI format)

Goal: convert a music signal into a musical score

Main drawbacks of previous work

Training data is difficult to generate

The spectral shape of each note is assumed to be constant




Spectral Shape of Piano Sound

Spectra of note C4 (MIDI number 60) produced by 6 pianos




ADSR Model

Attack, Decay, Sustain, Release

The spectral shape of a note varies with time

[Figure: note C4 in the time domain with its A, D, S, and R phases marked; spectra of the note over time, frame by frame]
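The ADSR model is usually stated as an amplitude envelope. The sketch below assumes a simple piecewise-linear envelope and labels each time instant with its phase, which is one way to decide which frames of a recorded note belong to which phase. All durations and the sustain level are hypothetical parameters; real piano tones decay continuously rather than holding a flat sustain.

```python
def _held_level(t, attack, decay, sustain_level):
    """Envelope level while the key is held (Attack, Decay, Sustain)."""
    if t < attack:
        return t / attack  # Attack: rise linearly from 0 to 1
    if t < attack + decay:
        # Decay: fall linearly from 1 to the sustain level
        return 1.0 - (1.0 - sustain_level) * (t - attack) / decay
    return sustain_level  # Sustain: hold the level


def adsr_envelope(t, attack, decay, sustain_level, note_off, release):
    """Piecewise-linear ADSR amplitude at time t (seconds).

    Hypothetical parameters: attack, decay, release are durations in seconds,
    sustain_level is in [0, 1], note_off is the key-release time.
    """
    if t < 0:
        return 0.0
    if t < note_off:
        return _held_level(t, attack, decay, sustain_level)
    # Release: fall linearly from the level reached at note_off down to 0
    start = _held_level(note_off, attack, decay, sustain_level)
    return max(0.0, start * (1.0 - (t - note_off) / release))


def adsr_phase(t, attack, decay, note_off, release):
    """Label time t as 'A', 'D', 'S', 'R', or 'off'."""
    if t < 0 or t >= note_off + release:
        return "off"
    if t >= note_off:
        return "R"
    if t < attack:
        return "A"
    if t < attack + decay:
        return "D"
    return "S"
```

For example, with a 10 ms attack, 200 ms decay, sustain level 0.4, key release at 1.0 s, and a 300 ms release, the envelope is 0.5 at 5 ms, 0.7 at 110 ms, 0.4 at 0.5 s, and 0.2 at 1.15 s.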



Design Consideration

Exploit an online repository of piano notes as a database, so that the transcription system can

work without generating training data

adapt to a new piano easily

Adopt the ADSR model

[Figure: keyboard; database of individual piano notes; synthesized mixture; input signal]



II. Proposed System




System Overview

WAVE file → Volume normalization → Tuning factor estimation → Frame decomposition → FFT analysis → Note candidate selection → Sparse representation computation → Noise elimination → HMM post-processing → MIDI file

Piano sound database → Database tuning (using the estimated tuning factor) → Tuned piano sound database
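The front end of the pipeline (volume normalization, frame decomposition, FFT analysis) can be sketched as follows, using 100 ms frames and a 10 ms hop. Normalizing to unit peak amplitude and applying a Hann window are assumptions; the slides specify neither.

```python
import numpy as np


def magnitude_spectra(signal, sr, frame_ms=100.0, hop_ms=10.0):
    """Return magnitude spectra, one row per frame.

    Assumptions (not from the slides): peak normalization and a Hann window.
    """
    x = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak  # volume normalization to unit peak amplitude

    frame_len = int(round(sr * frame_ms / 1000.0))
    hop = int(round(sr * hop_ms / 1000.0))
    if len(x) < frame_len:
        return np.empty((0, frame_len // 2 + 1))
    n_frames = 1 + (len(x) - frame_len) // hop

    # Frame decomposition: overlapping windowed frames
    window = np.hanning(frame_len)
    frames = np.stack([x[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # FFT analysis: rfft keeps the frame_len // 2 + 1 non-negative bins
    return np.abs(np.fft.rfft(frames, axis=1))
```

At a 16 kHz sampling rate each frame has 1,600 samples, so the bin spacing is 10 Hz; one second of a 440 Hz sine yields 91 frames whose peaks all fall in bin 44.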




Note Candidate Selection




Note Candidate Selection

Notes an octave apart are easily mistaken for each other because their spectra are similar

Avoid octave errors by note candidate selection

Leverage the harmonic structure of piano sounds

Spectra of note C4 (MIDI number 60) from two pianos: one with a strong fundamental, one with a weak fundamental
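The slides do not spell out the selection rule. One simple way to use harmonic structure is harmonic summation: score each of the 88 keys by summing the spectrum at its first few harmonics and keep the keys whose score is close to the best. The five-harmonic limit, nearest-bin lookup, and 0.5 ratio below are assumptions; this is a sketch, not the authors' rule.

```python
def midi_to_hz(m):
    """Equal-tempered frequency of MIDI note m (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((m - 69) / 12.0)


def harmonic_score(spectrum, sr, n_fft, midi, n_harmonics=5):
    """Sum the magnitude at the first n_harmonics harmonics of a note."""
    f0 = midi_to_hz(midi)
    score = 0.0
    for k in range(1, n_harmonics + 1):
        b = int(round(k * f0 * n_fft / sr))  # nearest FFT bin of harmonic k
        if b < len(spectrum):
            score += spectrum[b]
    return score


def select_candidates(spectrum, sr, n_fft, ratio=0.5, n_harmonics=5):
    """Keep keys whose harmonic score is at least `ratio` times the best."""
    scores = {m: harmonic_score(spectrum, sr, n_fft, m, n_harmonics)
              for m in range(21, 109)}  # 88 piano keys: A0 (21) .. C8 (108)
    best = max(scores.values())
    return sorted(m for m, s in scores.items() if best > 0 and s >= ratio * best)
```

Summing several harmonics is what protects against octave errors here: for a synthetic C4 spectrum with a weak fundamental (0.2) but four strong upper harmonics (1.0 each), C4 scores 4.2 while C5, which shares only two of those peaks, scores 2.0, so only C4 survives.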



Illustration of Candidate Selection

[Figure panels: strong fundamental; weak fundamental]



Sparse Representation Computation




Sparsity of Played Notes

A piano has 88 keys

But the keys played at any one time form a sparse subset of all the keys

Only about 4 notes are voiced at a time on average




Sparse Representation

Problem formulation

y: vector of the magnitude spectrum of a frame

A: matrix of bases; each column of A is the magnitude spectrum of a note candidate

x*: vector of sparse representation coefficients

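$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_0 \quad \text{subject to} \quad \mathbf{y} = A\mathbf{x} \qquad (1)$$

where ||x||_0 is the number of nonzero entries of x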




Illustration of Sparse Representation

[Figure: y (frame spectrum), A (spectra of note candidates), x* (coefficient vector)]

Solving (1) is NP-hard in general
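To see why (1) is hard, here is a brute-force sketch that tries every support of size 1, 2, 3, ... and returns the first one whose least-squares fit reproduces y exactly. It is only feasible for tiny problems; it is shown for illustration, not as part of the proposed system.

```python
import itertools

import numpy as np


def l0_exhaustive(A, y, tol=1e-8):
    """Solve min ||x||_0 s.t. y = A x by exhaustive search over supports.

    Tries all C(n, k) column subsets for k = 1, 2, ...; the number of subsets
    grows combinatorially with n, which is what makes (1) intractable.
    """
    n = A.shape[1]
    if np.linalg.norm(y) <= tol:
        return np.zeros(n)  # the all-zero vector already fits
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - y) <= tol:
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None  # y is not in the column space of A
```

Even restricting each frame to 4 of the 88 keys leaves C(88, 4) = 2,331,890 candidate supports, and the dictionary has more columns than keys once several bases per note are used.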




Sparse Representation (cont'd)

If the solution of (1) is sparse enough, it is close to the solution of the ℓ1-regularized problem

The ℓ1-regularized problem can be solved in polynomial time, O(n^1.2)

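$$\mathbf{x}^* = \arg\min_{\mathbf{x}} \|\mathbf{y} - A\mathbf{x}\|_2^2 + \lambda \|\mathbf{x}\|_1$$

where λ > 0 weights sparsity against reconstruction error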
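A compact way to minimize the ℓ1-regularized objective ||y - Ax||_2^2 + λ||x||_1 is ISTA (iterative shrinkage-thresholding, i.e. proximal gradient descent). The slides do not name their solver, so ISTA is a swapped-in technique chosen for brevity; the optional non-negativity constraint is also an assumption, though a natural one when the bases are magnitude spectra.

```python
import numpy as np


def ista(A, y, lam, n_iter=500, nonneg=False):
    """Approximately minimize ||y - A x||_2^2 + lam * ||x||_1 with ISTA.

    nonneg=True additionally enforces x >= 0 (an optional assumption).
    """
    # Lipschitz constant of the gradient 2 A^T (A x - y)
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        v = x - (2.0 * A.T @ (A @ x - y)) / L  # gradient step of size 1/L
        if nonneg:
            # Proximal step for lam * ||x||_1 restricted to x >= 0
            x = np.maximum(v - lam / L, 0.0)
        else:
            # Soft thresholding: proximal step for lam * ||x||_1
            x = np.sign(v) * np.maximum(np.abs(v) - lam / L, 0.0)
    return x
```

Larger λ drives more coefficients to exactly zero; with a small λ and noiseless data the result stays close to the sparse coefficients that generated y.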




HMM Post-Processing




HMM Post-Processing

Model each note with a two-state (on/off) HMM (88 HMMs for the 88 keys of a piano)

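Given a frame sequence X = x_1 x_2 … x_n, t ∈ [1, n], find the on/off state sequence S = s_1 s_2 … s_n of a note that maximizes P(S | X)

Because P(S | X) = P(X | S) P(S) / P(X) and P(X) does not depend on S, we maximize

$$\prod_{t=1}^{n} P(x_t \mid s_t)\, P(s_t \mid s_{t-1})$$

(P(s_1 | s_0) denotes the initial-state probability)

P(s_t | s_{t-1}), the transition probabilities: learnt from MIDI files

P(x_t | s_t), the observation probabilities: estimated from the sparse representation coefficients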
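The maximization for one key can be carried out with the Viterbi algorithm. Below is a minimal log-domain sketch for a two-state (off/on) HMM. It assumes the per-frame observation probabilities are already available; how they are derived from the sparse coefficients, and the transition values used in the example, are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_on_off(obs_on, obs_off, p_stay_on, p_stay_off, p_init_on=0.5):
    """Most likely on/off sequence for one key under a two-state HMM.

    obs_on[t], obs_off[t]: P(x_t | on) and P(x_t | off) for frame t (assumed given).
    p_stay_on = P(on -> on), p_stay_off = P(off -> off): transition probabilities.
    Returns one boolean per frame (True = note on).
    """
    n = len(obs_on)
    if n == 0:
        return []
    # State 0 = off, state 1 = on; trans[i][j] = log P(s_t = j | s_{t-1} = i)
    trans = [[_log(p_stay_off), _log(1 - p_stay_off)],
             [_log(1 - p_stay_on), _log(p_stay_on)]]
    emit = [[_log(p) for p in obs_off], [_log(p) for p in obs_on]]
    delta = [_log(1 - p_init_on) + emit[0][0], _log(p_init_on) + emit[1][0]]
    backptr = []
    for t in range(1, n):
        prev_best, new_delta = [], []
        for j in (0, 1):
            # Best predecessor state for being in state j at frame t
            i = max((0, 1), key=lambda s: delta[s] + trans[s][j])
            prev_best.append(i)
            new_delta.append(delta[i] + trans[i][j] + emit[j][t])
        backptr.append(prev_best)
        delta = new_delta
    # Trace back from the best final state
    state = max((0, 1), key=lambda s: delta[s])
    path = [state]
    for prev_best in reversed(backptr):
        state = prev_best[state]
        path.append(state)
    return [s == 1 for s in reversed(path)]
```

With sticky transitions (0.9 for staying in either state), a one-frame dip of P(x_t | on) to 0.4 in the middle of a note, which a 0.5 threshold would turn into a spurious gap, is smoothed over and the note stays on.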



Result of HMM Post-Processing

[Figure: (a) before HMM post-processing; (b) after HMM post-processing. Legend: true positive, false positive, false negative, true negative]



III. Performance Analysis & Demo




Frame-Level Evaluation

70.2% F-measure on 10 one-minute classical music recordings

Each frame is 100 ms long; the hop size is 10 ms

59,910 frames, 211,082 notes, 3.54 average polyphony

Significant improvement over two state-of-the-art systems

One-tailed t-test, p < 0.05

System                 F-measure   Precision   Recall
Proposed system        70.2%       74.4%       66.5%
Klapuri's system [2]   62.2%       72.4%       54.6%
Marolt's system [1]    66.1%       78.6%       57.1%
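Frame-level precision, recall, and F-measure compare the set of active (frame, note) pairs in the output with the ground truth. The sketch below assumes counts are pooled over all frames; the slides do not say how they were aggregated across recordings.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def frame_level_scores(reference, estimate):
    """Precision, recall, F-measure over active (frame_index, midi_note) pairs.

    Pooling the counts over all frames is an assumption.
    """
    ref, est = set(reference), set(estimate)
    tp = len(ref & est)  # pairs active in both ground truth and output
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall, f_measure(precision, recall)
```

As a consistency check, f_measure(74.4, 66.5) is about 70.23, in line with the proposed system's 70.2% frame-level F-measure.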

[1] M. Marolt, "A connectionist approach to automatic transcription of polyphonic piano music," IEEE Trans. Multimedia, vol. 6, no. 3, pp. 439-449, 2004.

[2] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. ISMIR, Victoria, Canada, pp. 216-221, Oct. 2006.



Note-Level Evaluation

73.0% F-measure

Only note onsets are considered

An onset counts as correct if it lies within 100 ms of the ground-truth onset

4,937 notes

Significant improvement over the best system in MIREX 2010 F0 tracking [3]


System             F-measure   Precision   Recall
Proposed system    73.0%       74.6%       71.6%
Yeh's system [3]   67.1%       57.2%       81.1%
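The onset-only criterion above can be sketched as a one-to-one matching between estimated and reference notes of the same pitch whose onsets differ by at most 100 ms. The greedy closest-onset-first matching is an assumption; the slides state only the tolerance.

```python
def note_level_scores(reference, estimate, tol=0.1):
    """Onset-only note matching: same pitch, onset within `tol` seconds.

    reference, estimate: lists of (midi_note, onset_seconds).
    Each note is matched at most once, closest onsets first (an assumption).
    """
    pairs = sorted(
        (abs(r_on - e_on), i, j)
        for i, (r_pitch, r_on) in enumerate(reference)
        for j, (e_pitch, e_on) in enumerate(estimate)
        if r_pitch == e_pitch and abs(r_on - e_on) <= tol
    )
    used_ref, used_est = set(), set()
    for _, i, j in pairs:
        if i not in used_ref and j not in used_est:
            used_ref.add(i)
            used_est.add(j)
    tp = len(used_ref)
    precision = tp / len(estimate) if estimate else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f
```

In a small example with reference onsets for notes 60, 64, 67 and an output that places note 64 200 ms late and adds a spurious note 72, two of four estimates match, giving precision 0.5 and recall 2/3.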

[3] C. Yeh and A. Roebel, "Multiple-F0 estimation for MIREX 2010," Music Information Retrieval Evaluation eXchange, 2010. [Online]. Available: http://www.music-ir.org/mirex/abstracts/2010/AR1.pdf



Analysis of System Components




Number of Base Elements

Because we adopt the ADSR model, each note has more than one base element

F-measure improves from 64.6% (88 base elements) to 70.2% (646 base elements)
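With 646 base elements for 88 notes (about 7.3 per note on average), two bookkeeping steps are needed: building several bases per note and collapsing their coefficients back into one activation per note. The sketch below averages a note's frame spectra within hypothetical segment boundaries and sums coefficients per note; both the segmentation and the summing rule are assumptions, not the authors' documented procedure.

```python
from collections import defaultdict


def segment_bases(frame_spectra, boundaries):
    """Average one note's frame spectra within each segment.

    frame_spectra: list of equal-length magnitude spectra, one per frame.
    boundaries: strictly increasing frame indices where segments start,
    e.g. [0, 3, 10, 40] for four ADSR-like segments (hypothetical values).
    """
    ends = list(boundaries[1:]) + [len(frame_spectra)]
    bases = []
    for start, end in zip(boundaries, ends):
        segment = frame_spectra[start:end]
        # Column-wise mean over the frames of this segment
        bases.append([sum(col) / len(segment) for col in zip(*segment)])
    return bases


def note_activations(coefficients, base_note):
    """Collapse per-base coefficients into one activation per note.

    base_note[k] is the MIDI note whose spectrum forms column k of A.
    Summing is an assumption; taking the maximum would also be reasonable.
    """
    totals = defaultdict(float)
    for coef, note in zip(coefficients, base_note):
        totals[note] += coef
    return dict(totals)
```

Keeping an explicit column-to-note map lets the sparse solver treat every base independently while the later stages (noise elimination, HMM post-processing) still work with one value per key.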




Conclusion

We have presented an automatic transcription system that

exploits the sparsity of played keys

adapts to a new piano easily

adopts the ADSR model to improve accuracy

Significant improvement over state-of-the-art systems



Live Demo

Song                                                 Composer    F-measure
Prelude and Fugue No. 2 in C Minor                   Bach        78.2%
Sonata No. 8 "Pathétique" in C Minor, 3rd Movement   Beethoven   74.6%
Moments Musicaux No. 4                               Schubert    67.0%
Sonata K. 333 in B-flat Major, 1st Movement          Mozart      78.4%



Thanks for your attention

Q&A

